Gradient RegEx Fractal Index and length Error FIXED

Boleeman

Member
Joined
Jan 15, 2025
Messages
9
Programming Experience
10+
Hi All.

I have converted a Gradient RegEx Fractal code from Java to CSharp in which you type in a RegEx string like 1*2*0 and it creates a fancy gradient pattern, saved to a png file. It used a nuget package called Fare which implements dk.briks.automation.

The Java version was originally here:
GitHub - SSODelta/GradientRegexImages: Creating pretty images from regular expressions using edit distance
and the Fare nuget for CSharp is here:

The
Fare did not have implemented GetStrings, but in the Java version of dk briks automation it was there.

In Java version:

Java GetString Implementation:
private static void getStrings(State s, Set<String> strings, StringBuilder path, int length) {
if (length == 0) {
if (s.accept)
strings.add(path.toString());
} else
for (Transition t : s.transitions)
for (int n = t.min; n <= t.max; n++) {
path.append((char)n);
getStrings(t.to, strings, path, length - 1);
path.deleteCharAt(path.length() - 1);
}
} 
 
and also 

public static Set<String> getStrings(Automaton a, int length) {
HashSet<String> strings = new HashSet<String>();
if (a.isSingleton && a.singleton.length() == length)
strings.add(a.singleton);
else if (length >= 0)
getStrings(a.initial, strings, new StringBuilder(), length);
return strings;


I tried to make a C# version of GetStrings that matches the Java code but I get the error message "Index and length must refer to a location within the string".
Googling suggested:
The error occurs in C# when you are trying to perform an operation (like accessing or removing characters) on a string or StringBuilder and the provided index and/or length is out of bounds for the current string or string-like object, but I still cannot find where that error is (getting really frustrated). Any help or hints would be greatly appreciated. I would really like to get it working in CSharp. At present the program compiles to an exe and is run in the console with say a test RegEx string 1*2*0 but I get the error message "Index and length must refer to a location within the string".

CSHarp GetString:
public static HashSet<string> GetStrings(Automaton automaton, int length)
{
    HashSet<string> strings = new HashSet<string>();
    if (automaton.IsSingleton && automaton.Singleton.Length == length)
        strings.Add(automaton.Singleton);
    else if (length >= 0)
        GetStrings(automaton.Initial, strings, new StringBuilder(), length);
    return strings;
}

private static void GetStrings(State s, HashSet<string> strings, StringBuilder path, int length)
{
    if (length == 0)
    {
        if (s.Accept)
            strings.Add(path.ToString());
    }
    else
    {
        foreach (Transition t in s.Transitions)
        {
            for (int n = t.Min; n <= t.Max; n++)
            {
                path.Append((char)n);
                GetStrings(t.To, strings, path, length - 1);
                
                // Only remove the last character if the path is not empty
                if (path.Length > 0)
                    path.Remove(path.Length - 1, 1);
            }
        }
    }
}


The problem might be too hard to solve, but I thought I would ask for help.
 

Attachments

  • Gradient RegEx updated.zip
    283.1 KB · Views: 230
  • testimage from Java.png
    testimage from Java.png
    35.4 KB · Views: 226
Also another thing to consider is that in both Java and C# almost every array access will do a bounds check. Bounds checks are branches. Branches cost time even if the pipeline predictor is correct. There are some ways to do micro-optimizations in C# so that the JIT optimizer will skip the bounds check but these tend to make the code look very ugly. I don't know enough about Java if it's possible to tell it to not do any bounds checking. In C and C++ there are no bounds checking done when accessing standard C arrays and C++ vector via the [] operator.
 
Another part of the genius of NYA's code is that he encodes the possible strings as integers instead of strings. He can store up to 9 characters in a single 32 bit integer. So instead of pulling 18 bytes for a string plus another 2 bytes for the length of a string from main memory, he only needs to pull 4 bytes. The less memory you need to move around the faster your code. Even if the CPU had the memory already in the L1 or L2 cache, less is still faster. And even better, 32 bits can fit into a single register. So when computes the Hamming distance between two strings, it is all done in CPU after the initial memory read into the registers. (Even with my partially optimized code, computing the Hamming distance is the bottleneck.)

I still have not figured out why NYA chose to use base 10 to encode his strings. With base 5 he could double the length of his strings without going to 64 bit integers. Or I wonder if it just simplified debugging.
 
The other version his code rxfrac.cpp does the version of the RegEx fractal of the Daily Programmer competition.

With the distrix.cpp were you able to get it to compile?

I tried the standard GNU C++ compiler but was unsuccessful (Also installed gpp General-purpose preprocessor but that did not work).
Not sure, but was he using Linux and macOS as the listed gpp script had #!/usr/bin/env bash ?

Yes NYA was a smart guy in encoding the possible strings as integers instead of strings. Would have been nice to implement his ideas into the CSharp program, without using dk.automation.

With the java version, I managed to get it running by installing the Eclipse IDE and adding the automation.jar (and changing the path to that jar file). Took a while to work it out, but in the end theSSODelta java code worked. For longer RegEx strings both the CSharp version and the Java version do bottlenecks, as you mentioned above. I wonder how the distrix code would perform. On the Daily programmer site someone did the other version of RegEx Fractals (not gradient) in C using POSIX regular expressions and stated "It uses very little memory, practically constant-space. As-is it can output up to a 4 billion by 4 billion image (232 x 232) using only a few bytes of memory. It's also very fast.". Not sure how true that statement is, but it seems both C and C++ can perform well.
 
Last edited:
I see. Thanks for the info!

The <vector> library is for C++ dynamically managed arrays. They are smarter than the standard C array. He just uses the standard regex class and supporting regex_match() function to do the regular expression matching supplied by the <regex> library.

I'm guessing that the <extra\task.h>. just provide him with multi-threaded support in C++ before it became a standard. He tries to do 4 quadrants in parallel, as opposed to the Java code which does all the quadrants sequentially.

gpp is not a special type of C++. It's merely a build script that uses the standard GNU C++ compiler g++. See his gpp in GitHub.

Hi Skydiver, I came across this and thought it may be the utils.hpp as distrix would need to use RegEx somewhere.


utils.hpp
#ifndef UTILS_HPP
#define UTILS_HPP

#include <vector>
#include <string>


std::vector<std::string>
split_at_regex(const std::string& input, const std::string& regex);

std::string trim_zeroes(std::string in);
std::string _trim_zeroes(std::string in);

#endif // UTILS_HPP
 
Also there is an Awesome header-only C++ libraries at GitHub - p-ranav/awesome-hpp: A curated list of awesome header-only C++ libraries
and under Concurrency there are some header libraries
Perhaps one of these might be useful in replacing/substituting the task.h as the seem to do parallel tasking.
 
I just a straight transliteration the code to C# with minimal changes to the logic or structure. Pretty disappointing perf as well as the generated images doesn't seem to match what SSO Delta had even if I customize the RGB gradient values.


C#:
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Drawing;
using System.Drawing.Imaging;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

// Transliteration of distrx.cpp from C++ to C#

namespace DistRegex.Literal
{
    record struct RGB(byte R, byte G, byte B)
    {
        public readonly Color ARGB => Color.FromArgb(R, G, B);
    }

    class Mat
    {
        readonly int _size;
        readonly int[] _data;
        readonly byte _depth;

        public Mat(int n)
        {
            _depth = (byte) n;
            _size = (int) Math.Pow(2, n);
            _data = Enumerable.Repeat(0, _size * _size).ToArray();

            Populate(_data, _size, _size);
        }

        static void Populate(Span<int> mat, int size, int rawSize)
        {
            if (size <= 1)
                return;

            for (int i = 0; i < size; ++i)
            {
                for (int j = 0; j < size; ++j)
                {
                    mat[j + i * rawSize] *= 10;
                    mat[j + i * rawSize] += (i < size / 2) ? ((j < size / 2) ? 1 : 2)
                                                           : ((j < size / 2) ? 3 : 4);
                }
            }

            Populate(mat[0..],                          size / 2, rawSize);
            Populate(mat[(size / 2)..],                 size / 2, rawSize);
            Populate(mat[(size / 2 * rawSize)..],       size / 2, rawSize);
            Populate(mat[(size / 2 * (rawSize + 1))..], size / 2, rawSize);
        }

        static RGB D_To_Col(RGB[] grad, double norm_d)
        {
            return new RGB(Comp(norm_d, grad[0].R, grad[1].R),
                           Comp(norm_d, grad[0].G, grad[1].G),
                           Comp(norm_d, grad[0].B, grad[1].B));

            static byte Comp(double norm, byte snd, byte fst)
            {
                if (fst > snd)
                    norm = 1 - norm;
                return (byte)(Math.Min(snd, fst) + norm * Math.Abs(snd - fst));
            }
        }

        static byte N_Edits(int fst, int snd)
        {
            byte dist = 0;

            while (fst != 0)
            {
                if ((fst % 10) != (snd % 10))
                    dist++;
                fst /= 10;
                snd /= 10;
            }
            return dist;
        }

        static byte Min_Distance(int px, IEnumerable<int> matches, byte depth)
        {
            byte min = depth;
            foreach(var it in matches)
            {
                var distance = N_Edits(it, px);
                if (distance == 0)
                    return distance;
                else if (distance < min)
                    min = distance;
            }
            return min;
        }

        public Img Apply(string restr, RGB [] grad)
        {
            var re = new Regex(restr, RegexOptions.Compiled);

            string format = $"D{_depth}";
            var d0 = _data.AsParallel()
                          .WithDegreeOfParallelism(4)
                          .Where(val => re.IsMatch(val.ToString(format)))
                          .ToList();

            var res = new RGB[_data.Length];
            var partioner = Partitioner.Create(0, _data.Length);
            Parallel.ForEach(partioner,
                             new ParallelOptions() { MaxDegreeOfParallelism = 4 },
                             (range, _) =>
                             {
                                 for (int i = range.Item1; i < range.Item2; i++)
                                     res[i] = D_To_Col(grad, Min_Distance(_data[i], d0, _depth) / (double)_depth);
                             });
            return new Img(res);
        }
    }

    class Img(IEnumerable<RGB> pxs)
    {
        readonly RGB[] _pxs = pxs.ToArray();

        public void Save(string @out)
        {
            var size = (int) Math.Sqrt(_pxs.Length);
            using var image = new Bitmap(size, size);
            for(int i = 0; i < size; ++i)
            {
                for (int j = 0; j < size; ++j)
                    image.SetPixel(i, j, _pxs[i * size + j].ARGB);
            }
            image.Save(@out, ImageFormat.Png);
        }
    }

    class DistRx
    {
        public static void Main(string[] args)
        {
            if (args == null || args.Length <= 0)
            {
                PrintHelp();
                return;
            }

            var fname = args[0];
            var depth = Convert.ToInt32(args[1]);
            var rx = args[2];

            var grad = new RGB[2]
            {
                new(0x00, 0x00, 0x00),
                new(0xFF, 0xFF, 0xFF)
            };

            if (args.Length == 9)
            {
                grad[0] = new RGB(Convert.ToByte(args[3]), Convert.ToByte(args[4]), Convert.ToByte(args[5]));
                grad[1] = new RGB(Convert.ToByte(args[6]), Convert.ToByte(args[7]), Convert.ToByte(args[8]));
            }

            new Mat(depth).Apply(rx, grad).Save(fname);

            static void PrintHelp()
            {
                Console.Error.WriteLine($$"""
USAGE   {{Environment.ProcessPath}} file depth regx [grad_params]
    depth<unsigned>: recursion depth of the canvas generator, determines the image size (size: 2^depth, 2^10 is 1024)
    regex<string>: self explanatory"
SCOPE: https://ssodelta.wordpress.com/2015/01/26/gradient-images-from-regular-expression
""");
            }
        }
    }
}
 
Back
Top Bottom