Regex or IndexOf?

sam

I have a long string "AB100123485;AB10064279293-IP-1-KNPO;AB473898487-MM41". I have to extract the integer value after "IP-", i.e. 1 (only). What is the most efficient way? I am using C#.
 
Efficient for the programmer or efficient for the computer?
 
I'd use neither. Try this:
C#:
            string str = "AB100123485;AB10064279293-IP-1-KNPO;AB473898487-MM41";
            string result = str.Split('-').Skip(2).FirstOrDefault();
 
With the Split() call, you will end up with 6 allocations (1 for the array of strings, and 5 for the split strings).
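To make that count concrete, here is a minimal sketch that splits the string from the question and exposes the pieces. SplitDemo, Pieces and ThirdPiece are made-up names for illustration, not anything from the posts above.

```csharp
using System;
using System.Linq;

static class SplitDemo   // hypothetical name, for illustration only
{
    // Returns the pieces produced by Split('-'). Each piece is its own string
    // allocation, and the array that holds them is one more.
    public static string[] Pieces(string input) => input.Split('-');

    // For this particular layout, the third piece (index 2) is the value after "IP-".
    public static string ThirdPiece(string input) =>
        input.Split('-').Skip(2).FirstOrDefault();
}
```

For the string in the question, SplitDemo.Pieces(...) has 5 elements (the string contains 4 hyphens), which together with the array gives the 6 allocations mentioned above, and SplitDemo.ThirdPiece(...) returns "1". Note that Skip(2) depends on "IP" being the second piece, so it only suits input with exactly this shape.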

With IndexOf(), you would need to make two calls to IndexOf() with no allocation overhead, and then make one allocation when you call Substring().

With Regex, you would need to allocate memory for all the infrastructure that runs the state machine the .NET Framework builds, and also pay the cost of generating that state machine in the first place. Then, when Match() is called, it has to allocate the Match object. Internally, that allocates a GroupCollection, which in turn holds an allocation for the matched string.

So when looking at efficiency for the computer, IndexOf() would be the most efficient, followed by Split(), with Regex trailing.
 
You know my love for Regex will cause my defensive side to flare up if you give it a fair hammering lol. And we've had this discussion before about how minuscule the differences are. They're only millisecond or fractional differences in speed and performance compared with the more commonly used, traditional system-provided methods. Except for one hiccup with regex.

Using RegexOptions.Compiled sets the memory in stone. It executes faster, but the memory it holds is a loss of system resources unless the regex is going to run continuously. BUT on the plus side, this option is actually quicker than the other RegexOptions values and improves speed by 25% to 35%.
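If the pattern really were run over and over, one common way to use RegexOptions.Compiled is to build the Regex once and keep it in a static field, so the construction cost is paid a single time. A rough sketch, assuming a pattern of "IP-" followed by digits; IpPattern and Extract are made-up names:

```csharp
using System.Text.RegularExpressions;

static class IpPattern   // hypothetical name, for illustration only
{
    // Built once and reused. RegexOptions.Compiled costs more to construct
    // in exchange for faster matching on later calls.
    private static readonly Regex Pattern =
        new Regex("IP-(?<IPValue>[0-9]+)", RegexOptions.Compiled);

    // Returns the digits after "IP-", or null when the input is null
    // or contains no match.
    public static string Extract(string input)
    {
        if (input == null)
            return null;

        Match match = Pattern.Match(input);
        return match.Success ? match.Groups["IPValue"].Value : null;
    }
}
```

For a one-off extraction like the one in the question, this setup is probably not worth it; it only starts to pay off when Extract() is called many times.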

To quote myself, for those interested in the options: "What RegexOptions.Compiled does is compile the expression to explicit MSIL code, rather than to the regular expression engine's internal instructions. RegexOptions.Compiled allows .NET's just-in-time compiler to convert your expression to native machine code for "better performance". However, this option comes with a small consequence: once it's loaded in, there is apparently no way to unload the resources used by the compiled expression." With all that said, I'd probably still do it the way I just did.

That's not to say that Regex is not my preference, because it surely has its benefits in sticky situations. It could be used here too, although it's not really needed, since it's likely only going to be executed one time. But you make some good points. ;)

There is a good article, "To Compile or Not To Compile", from our buddies at Coding Horror.
 
And if you want to look at developer efficiency, then Regex looks to be the winner in the code below, assuming that you and your succeeding maintainers understand regular expressions.

C#:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

static class StringExtensions
{
    public static IEnumerable<string> Tokens(this string input, params char [] delims)
    {
        var sb = new StringBuilder();

        foreach(char ch in input)
        {
            if (delims.Contains(ch))
            {
                yield return sb.ToString();
                sb.Clear();
            }
            else
            {
                sb.Append(ch);
            }
        }

        if (sb.Length > 0)
            yield return sb.ToString();
    }
}

class Program
{
    string GetIpUsingIndexOf(string input)
    {
        if (input == null)
            return null;

        string prefix = "IP-";
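        // Reuse the prefix variable and compare ordinally; IndexOf(string) is culture-sensitive by default.
        int start = input.IndexOf(prefix, StringComparison.Ordinal);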
        if (start < 0)
            return null;
        start += prefix.Length;
        int end = input.IndexOf('-', start);
        end = end >= 0 ? end : input.Length;
        return end > start ? input.Substring(start, end - start) : null;
    }

    string GetIpUsingTokens(string input)
    {
        return input?.Tokens('-')
                    .SkipWhile(s => s != "IP")
                    .Skip(1)
                    .FirstOrDefault();
    }

    string GetIpUsingSplit(string input)
    {
        return input?.Split('-')
                    .SkipWhile(s => s != "IP")
                    .Skip(1)
                    .FirstOrDefault();
    }

    string GetIpUsingRegex(string input)
    {
        if (input == null)
            return null;

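        // Match() never returns null; a failed match yields Match.Empty,
        // so check Success to return null like the other methods do.
        var match = Regex.Match(input, "IP-(?<IPValue>[0-9]+)");
        return match.Success ? match.Groups["IPValue"].Value : null;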
    }

    void Run()
    {
        var tests = new string []
            {
                "sometext-IP-123-745",
                "sometext-IP-123",
                "IP-123",
                "sometext-IP-123-moretext",
                "IP-123-moretext",
                "sometext-IX-123-moretext",
                "",
                null
            };

        foreach(var input in tests)
        {
            Console.WriteLine(GetIpUsingIndexOf(input));
            Console.WriteLine(GetIpUsingTokens(input));
            Console.WriteLine(GetIpUsingSplit(input));
            Console.WriteLine(GetIpUsingRegex(input));
        }
    }

    static void Main()
    {
        new Program().Run();
    }
}

The code for StringExtensions.Tokens() is an attempt to allocate the strings on demand, as opposed to Split(), which splits the entire string all at once.
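To see the on-demand behaviour, here is a small sketch of the same iterator idea with a counter bolted on. LazyTokensDemo, CountingTokens, TokensBuilt and ValueAfterIp are made-up names, and the counter exists only to make the laziness visible.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

static class LazyTokensDemo   // hypothetical name, for illustration only
{
    // Number of token strings built during the last call to ValueAfterIp().
    public static int TokensBuilt;

    // Same approach as Tokens() above, limited to one delimiter and
    // counting each string it creates.
    static IEnumerable<string> CountingTokens(string input, char delim)
    {
        var sb = new StringBuilder();
        foreach (char ch in input)
        {
            if (ch == delim)
            {
                TokensBuilt++;
                yield return sb.ToString();
                sb.Clear();
            }
            else
            {
                sb.Append(ch);
            }
        }

        if (sb.Length > 0)
        {
            TokensBuilt++;
            yield return sb.ToString();
        }
    }

    // Returns the token after "IP", or null if there is none.
    // FirstOrDefault() stops the enumeration as soon as that token is produced.
    public static string ValueAfterIp(string input)
    {
        TokensBuilt = 0;
        if (input == null)
            return null;

        return CountingTokens(input, '-')
            .SkipWhile(s => s != "IP")
            .Skip(1)
            .FirstOrDefault();
    }
}
```

With the string from the original question, ValueAfterIp() returns "1" and TokensBuilt ends up at 3, whereas Split('-') would have created all 5 pieces plus the array before any of them could be inspected.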

Update: I forgot to mention that after getting the return value of GetIpUsing*(), int.TryParse() should be called to ensure that a valid integer has been collected.
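That last step might look something like the sketch below. IpValueParser and ToInt are made-up names, and returning a nullable int for "no valid number" is just one possible choice.

```csharp
using System;

static class IpValueParser   // hypothetical name, for illustration only
{
    // Converts the extracted text to an int. Returns null when the text is
    // null, empty, or not a valid integer (int.TryParse returns false for all of these).
    public static int? ToInt(string extracted)
    {
        return int.TryParse(extracted, out int value) ? value : (int?)null;
    }
}
```

So IpValueParser.ToInt("1") gives 1, while IpValueParser.ToInt("KNPO") and IpValueParser.ToInt(null) both give null instead of throwing.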
 