I have a problem and I already solved it, but I am searching for enhancement.

pythonatSheriff · Dec 18, 2021

The problem is :

((Have the function MathChallenge(str) take the str parameter being passed and determine if there is some substring K that can be repeated N > 1 times to produce the input string exactly as it appears. Your program should return the longest substring K, and if there is none it should return the string -1.

For example: if str is "abcababcababcab" then your program should return abcab because that is the longest substring that is repeated 3 times to create the final string. Another example: if str is "abababababab" then your program should return ababab because it is the longest substring. If the input string contains only a single character, your program should return the string -1. ))

My solutions is :

my solution to math problem:

using System;
using System.Text;
using System.Collections.Generic;
using System.Linq;


class MainClass
{

  public static void Main(string[] arg1)
  {
        //abcababcababcab
        string str = "abababababab";

        int count = 0;
        for(var i=0; i<str.Length; i++)
        {
            count = count + 1;
        }
        Console.WriteLine(count);

        int division = 0;
        if(count%2 == 0)
        {
            //Console.WriteLine("can divide on 2");
            division = 2;
            
        }
        else if(count%3 == 0)
        {
            //Console.WriteLine("can divide on 3");
            division = 3;
        }


        char[] LooseLetters = str.ToCharArray();
        int LettersCount = (LooseLetters.Length)/(division);
        StringBuilder part1 = new StringBuilder(str);
        if(division == 2)
        {
            part1.Insert(LettersCount, ",");
            Console.WriteLine(part1);
        }
        else if(division == 3)
        {
            part1.Insert(LettersCount, ",");
            part1.Insert((2*LettersCount)+1, ",");
            Console.WriteLine(part1);
        }

        string strConverter = part1.ToString();

        List<string> DuplicateString = strConverter.Split(",").ToList();
        // foreach(var item in DuplicateString)
        // {
        //     Console.WriteLine(item);
        // }

        string result = "";

        if(division == 2)
        {
            if(DuplicateString[0] == DuplicateString[1])
            result = DuplicateString[0];
            else
            result = "-1";
        }
        else if(division == 3)
        {
            if(DuplicateString[0] == DuplicateString[1] && DuplicateString[0] == DuplicateString[2])
            result = DuplicateString[0];
            else
            result = "-1";
        }

        Console.WriteLine(result);

  }

}

Now, I feel like it can be done using (Linq) methods, although I read most of the methods, or there is a simpler solution, so I would like to know to increase my knowledge.
Thanks in advance.

Skydiver · Dec 19, 2021

Yes, but isn't AllPartsEqual() called in a loop?

JohnH · Dec 19, 2021

That loop just identifies the search terms, ie the part, where input string is divisible by part.

I don't think KMP applies, this is not a single wild search for a pattern anywhere in a string. Here each possible part is a new search term, and the search only examines specific segments of the input string (multiples of part). If using KMP the whole input string have to be LPS computed for each different possible part, and then doing searches that doesn't even apply to the problem, and finally check if all multiples of the part length are included in returned match indexes.
Found a KMP code here: Knuth–Morris–Pratt Algorithm Implementation in C# - Programming Algorithms

Anyway, these are the results I get with simple Stopwatch testing, time in ticks:

48 (string.Split)
646 (Enumerable.Repeat)
764 (AllPartsEqual3)
1185 (AllPartsEqual2)
2497 (pure Regex)
4444 (Regex match repeat)
10123 (AllPartsEqual1)
13297 (KMP)

Skydiver · Dec 19, 2021

Part of the solution is here:

Find if a given string can be represented from a substring by iterating the substring “n” times - GeeksforGeeks

Your All-in-One Learning Portal: GeeksforGeeks is a comprehensive educational platform that empowers learners across domains-spanning computer science and programming, school education, upskilling, commerce, software tools, competitive exams, and more.

www.geeksforgeeks.org

Only the LPS part of KMP needs to be done.

Unfortunately, that generates the shortest string that can be repeated to build the original string. The problem above asks for the longest string.

If n % (n - len) != 0 then there is no such string.
If n / (n - len) is odd, then the first n - len characters of the pattern is the longest string.
if n / (n - len) is even then the first n / 2 characters is the longest string.

Skydiver · Dec 19, 2021

BenchmarkDotNet results for me:

C#:

|           Method |         Mean |       Error |      StdDev |
|----------------- |-------------:|------------:|------------:|
| SubstringMatcher |     466.3 us |     2.95 us |     2.61 us |
|     SplitMatcher |     749.4 us |     7.83 us |     7.32 us |
|     RegexMatcher |  23,371.9 us |    70.32 us |    58.72 us |
|      LinqMatcher |  24,440.0 us |   355.01 us |   314.71 us |
| PureRegexMatcher | 616,405.8 us | 3,398.95 us | 3,013.08 us |
|    OffsetMatcher |     500.0 us |     0.30 us |     0.25 us |
|    ModuloMatcher |   1,215.6 us |     2.90 us |     2.72 us |
|       KmpMatcher |     523.1 us |     8.42 us |     7.88 us |

I was using the following code to test:

C#:

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

namespace MatchChallenge
{
    public interface IMatcher
    {
        string MatchChallenge(string input);
    }

    public abstract class SubstringPartsMatcher : IMatcher
    {
        public string MatchChallenge(string input)
        {
            for (int i = 2; i <= input.Length / 2; i++)
            {
                if (input.Length % i == 0)
                {
                    var partLength = input.Length / i;
                    var part = input.Substring(0, partLength);
                    if (AllPartsEqual(input, part))
                        return part;
                }
            }
            return "-1";
        }

        protected abstract bool AllPartsEqual(string input, string part);
    }

    public class SubstringMatcher : SubstringPartsMatcher
    {
        protected override bool AllPartsEqual(string input, string part)
        {
            for (int i = 2; i < input.Length / part.Length; i++)
            {
                if (input.Substring(i * part.Length, part.Length) != part)
                    return false;
            }
            return true;
        }
    }

    public class SplitMatcher : SubstringPartsMatcher
    {
        protected override bool AllPartsEqual(string input, string part)
            => input.Split(new string[] { part }, StringSplitOptions.RemoveEmptyEntries).Length == 0;
    }

    public class RegexMatcher : SubstringPartsMatcher
    {
        protected override bool AllPartsEqual(string input, string part)
        {
            for (int i = 2; i < input.Length / part.Length; i++)
            {
                if (Regex.IsMatch(input, $"({part}){{{i}}}"))
                    return false;
            }
            return true;
        }
    }

    public class LinqMatcher : SubstringPartsMatcher
    {
        protected override bool AllPartsEqual(string input, string part)
        {
            for (int i = 2; i < input.Length / part.Length; i++)
            {
                if (string.Concat(Enumerable.Repeat(part, i)) == input)
                    return false;
            }
            return true;
        }
    }

    public class PureRegexMatcher : IMatcher
    {
        public string MatchChallenge(string input)
        {
            var m = Regex.Match(input, "(.+)\\1+");
            if (m.Success && m.Length == input.Length)
                return m.Groups[1].Value;
            return "-1";
        }
    }

    public abstract class StringIndexMatcher : IMatcher
    {
        public string MatchChallenge(string input)
        {
            for (int i = 2; i <= input.Length / 2; i++)
            {
                if (input.Length % i == 0)
                {
                    var partLength = input.Length / i;
                    if (AllPartsEqual(input, partLength))
                        return input.Substring(0, partLength);
                }
            }
            return "-1";
        }

        protected abstract bool AllPartsEqual(string input, int partLength);
    }

    public class OffsetMatcher : StringIndexMatcher
    {
        protected override bool AllPartsEqual(string input, int partLength)
        {
            for (int i = 0; i < input.Length / partLength - 1; i++)
            {
                for (int j = 0; j < partLength; j++)
                {
                    int index = i * partLength + j;
                    if (input[index] != input[index + partLength])
                        return false;
                }
            }
            return true;
        }
    }

    public class ModuloMatcher : StringIndexMatcher
    {
        protected override bool AllPartsEqual(string input, int partLength)
        {
            for (int i = partLength; i < input.Length; i++)
            {
                if (input[i] != input[i % partLength])
                    return false;
            }
            return true;
        }
    }

    public class KmpMatcher : IMatcher
    {
        public string MatchChallenge(string input)
        {
            if (string.IsNullOrEmpty(input))
                return "-1";

            var lps = ComputeLpsArray(input);
            int suffixLength = lps[input.Length - 1];
            if (suffixLength == 0)
                return "-1";

            int partLength = input.Length - suffixLength;
            if (input.Length % partLength != 0)
                return "-1";

            if (partLength % 2 == 0)
                partLength = input.Length / 2;
            return input.Substring(0, partLength);
        }

        int [] ComputeLpsArray(ReadOnlySpan<char> input)
        {
            var lps = new int[input.Length];
            int len = 0;
            int i = 1;

            while (i < input.Length)
            {
                if (input[i] == input[len])
                {
                    lps[i++] = ++len;
                }
                else
                {
                    if (len != 0)
                        len = lps[len - 1];
                    else
                        i++;
                }
            }

            return lps;
        }
    }

    public class MatcherBenchmarks
    {
        const string alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
        static Random _rnd = new Random();

        string _good;
        string _bad;
        IMatcher _substringMatcher = new SubstringMatcher();
        IMatcher _splitMatcher = new SplitMatcher();
        IMatcher _regexMatcher = new RegexMatcher();
        IMatcher _linqMatcher = new LinqMatcher();
        IMatcher _pureRegexMatcher = new PureRegexMatcher();
        IMatcher _offsetMatcher = new OffsetMatcher();
        IMatcher _moduloMatcher = new ModuloMatcher();
        IMatcher _kmpMatcher = new KmpMatcher();

        public MatcherBenchmarks()
        {
            var sb = new StringBuilder();
            for (int i = 0; i < 257; i++)
                sb.Append(alphabet[_rnd.Next(alphabet.Length)]);
            var value = sb.ToString();

            sb.Clear();
            for (int i = 0; i < 257; i++)
                sb.Append(value);

            _good = sb.ToString();

            sb.Clear();
            for (int i = 0; i < 257 * 257; i++)
                sb.Append(alphabet[_rnd.Next(alphabet.Length)]);
            _bad = sb.ToString();
        }

        string Test(IMatcher matcher)
        {
            matcher.MatchChallenge(_bad);
            return matcher.MatchChallenge(_good);
        }

        [Benchmark]
        public string SubstringMatcher() => Test(_substringMatcher);

        [Benchmark]
        public string SplitMatcher() => Test(_splitMatcher);

        [Benchmark]
        public string RegexMatcher() => Test(_regexMatcher);

        [Benchmark]
        public string LinqMatcher() => Test(_linqMatcher);

        [Benchmark]
        public string PureRegexMatcher() => Test(_pureRegexMatcher);

        [Benchmark]
        public string OffsetMatcher() => Test(_offsetMatcher);

        [Benchmark]
        public string ModuloMatcher() => Test(_moduloMatcher);

        [Benchmark]
        public string KmpMatcher() => Test(_kmpMatcher);
    }

    public class Program
    {
        public static void Main()
        {
            BenchmarkRunner.Run<MatcherBenchmarks>();
            // Console.WriteLine(new MatcherBenchmarks().KmpMatcher());
        }
    }
}

As a side note, the regex matchers should have a preparation step to escape regex special characters in the search strings prior to trying to find a match, but I didn't implement that in the code above. I just happened to run across a crash when my original alphabet included the '+' and '/' characters and my random test string happened to have a "++" in it.

JohnH · Dec 20, 2021

Excellent testing, I got similar results. Both Regex versions is very slow, and SplitMatcher not quite as fast as I measured.

SubstringMatcher = AllPartsEqual1
OffsetMatcher = AllPartsEqual2
ModuloMatcher = AllPartsEqual3

I also added [MemoryDiagnoser]

C#:

|           Method |         Mean |     Error |    StdDev |     Gen 0 |     Gen 1 |     Gen 2 |    Allocated |
|----------------- |-------------:|----------:|----------:|----------:|----------:|----------:|-------------:|
| SubstringMatcher |     132.1 us |   0.57 us |   0.48 us |   16.3574 |         - |         - |    138,288 B |
|     SplitMatcher |     262.1 us |   2.26 us |   2.01 us |   41.5039 |   41.5039 |   41.5039 |    134,956 B |
|     RegexMatcher |  10,276.8 us |  56.00 us |  49.64 us |  125.0000 |   15.6250 |         - |  1,088,624 B |
|      LinqMatcher |   9,972.0 us | 187.22 us | 222.87 us | 6171.8750 | 6171.8750 | 6171.8750 | 33,852,374 B |
| PureRegexMatcher | 159,565.9 us | 485.07 us | 430.01 us |         - |         - |         - |        786 B |
|    OffsetMatcher |     161.9 us |   1.12 us |   1.05 us |         - |         - |         - |        536 B |
|    ModuloMatcher |     374.7 us |   1.19 us |   1.12 us |         - |         - |         - |        536 B |
|       KmpMatcher |     266.9 us |   2.64 us |   2.47 us |  166.5039 |  166.5039 |  166.5039 |    529,040 B |

For the longer test strings memory use gets a lot worse for RegexMatcher and LinqMatcher. PureRegexMatcher surprisingly does well memory-wise, but has very bad performance.
Interesting to see ModuloMatcher vs OffsetMatcher, the % operation obviously adds more computation than multiply index.
Also tested with one of the short sample strings (and a short bad one) got these:

C#:

|           Method |        Mean |    Error |   StdDev |  Gen 0 | Allocated |
|----------------- |------------:|---------:|---------:|-------:|----------:|
| SubstringMatcher |    40.93 ns | 0.759 ns | 0.874 ns | 0.0124 |     104 B |
|     SplitMatcher |   409.65 ns | 1.360 ns | 1.272 ns | 0.1040 |     872 B |
|     RegexMatcher | 1,659.53 ns | 6.118 ns | 5.424 ns | 0.0324 |     272 B |
|      LinqMatcher |    81.48 ns | 1.635 ns | 1.883 ns | 0.0191 |     160 B |
| PureRegexMatcher | 1,827.29 ns | 9.028 ns | 8.445 ns | 0.0782 |     664 B |
|    OffsetMatcher |    69.78 ns | 0.576 ns | 0.481 ns | 0.0038 |      32 B |
|    ModuloMatcher |    93.41 ns | 0.272 ns | 0.213 ns | 0.0038 |      32 B |
|       KmpMatcher |    60.40 ns | 0.980 ns | 0.917 ns | 0.0248 |     208 B |

So one have to choose between best performance or memory use, or a combination. Maybe also expected input strings.

I have to look into KMP again, I definitely didn't apply that right. (the code I copied had a different layout)

Skydiver · Dec 20, 2021

The original brute force substring matcher seems to be the best in terms of simplicity and speed.

Skydiver · Dec 20, 2021

JohnH said:
Interesting to see ModuloMatcher vs OffsetMatcher, the % operation obviously adds more computation than multiply index.

My gut feel is that the ModuloMatcher is not as cache or predictor friendly as OffsetMatcher. The OffsetMatcher is always looking forward in memory, while the ModuloMatcher has to peek back into the beginning of the memory block.

JohnH · Dec 20, 2021

Skydiver said:
The original brute force substring matcher seems to be the best in terms of simplicity and speed.

Unfortunately that one has serious bug, it is cheating by starting the loop at index 2 (the third part), if there is only two parts in input it skip the test and just return true

So the for loop in SubstringMatcher must start with 1. (I updated post 3 now also for this)

Further there is a couple of big bugs in your RegexMatcher and LinqMatcher, neither runs in a loop, they match input in one go by "repeat tricks", their AllPartsEqual overrides must be changed to:

RegexMatcher:

var partsCount = input.Length / part.Length;
return Regex.IsMatch(input, $"({part}){{{partsCount}}}");

LinqMatcher:

var partsCount = input.Length / part.Length;
return string.Concat(Enumerable.Repeat(part, partsCount)) == input;

Actually the partCount here is recalculated, the i suggested in post 6 was the i from outer loop (in SubstringPartsMatcher).

Then the results change considerably for RegexMatcher and LinqMatcher, and almost no change for SubstringMatcher:

C#:

|           Method |        Mean |    Error |   StdDev |    Gen 0 |    Gen 1 |    Gen 2 | Allocated |
|----------------- |------------:|---------:|---------:|---------:|---------:|---------:|----------:|
| SubstringMatcher |    133.2 us |  1.17 us |  0.91 us |  16.3574 |        - |        - | 138,824 B |
|     SplitMatcher |    267.8 us |  3.65 us |  3.41 us |  41.5039 |  41.5039 |  41.5039 | 134,956 B |
|     RegexMatcher |    170.8 us |  0.81 us |  0.76 us |   0.2441 |        - |        - |   2,176 B |
|      LinqMatcher |    239.6 us |  1.57 us |  1.31 us |  83.0078 |  83.0078 |  83.0078 | 265,420 B |
| PureRegexMatcher | 10,900.7 us | 95.01 us | 84.22 us |        - |        - |        - |     504 B |
|    OffsetMatcher |    162.6 us |  0.95 us |  0.89 us |        - |        - |        - |     536 B |
|    ModuloMatcher |    375.5 us |  1.23 us |  1.15 us |        - |        - |        - |     536 B |
|       KmpMatcher |    265.8 us |  0.95 us |  0.80 us | 166.5039 | 166.5039 | 166.5039 | 529,040 B |

Skydiver · Dec 20, 2021

I have a pending speed boost for the KMP matcher but using stackalloc instead of the heap. Code is currently in:

GitHub - azureskydiver/MatchChallenge

Contribute to azureskydiver/MatchChallenge development by creating an account on GitHub.

github.com

I have a problem and I already solved it, but I am searching for enhancement.

pythonatSheriff

Full Stack Web Developer

Skydiver

JohnH

C# Forum Moderator

Skydiver

Find if a given string can be represented from a substring by iterating the substring “n” times - GeeksforGeeks

Skydiver

JohnH

C# Forum Moderator

Skydiver

Skydiver

JohnH

C# Forum Moderator

Skydiver

GitHub - azureskydiver/MatchChallenge

Similar threads

Share this page

Latest posts