Array Acces Speed

SilverShaded

Well-known member
Joined
Mar 7, 2020
Messages
93
Programming Experience
10+
Hello, i have a program which is heavy on array access, so i've been tyring to optimise the speed. This is important for the overall program which is highly iterative with lots of matrix manipulation. I was hoping to get any criticism of the code below or any further suggestions to speed it up more.

Current run times (in release mode) are respectively, in ms, the relative time differences are relatively constant, at the moment option 4 is looking like the best possible, which represent a significant increase in speed over option 2 which is the current method in the progam. Before i go changing a whole bunch of arrays are there any obvious improvements possible or better ways to do it or any problems with the code below that may give odd results?

elapsedMs1 128
elapsedMs2 29
elapsedMs3 228
elapsedMs4 7

Array Access Speed Test:
 [TestMethod]
        [MethodImpl(MethodImplOptions.NoInlining)]
        public unsafe void TestDoubleDoubleArrayWithPts()
        {

            double[,] Array = new double[10000,1000];
            var watch = Stopwatch.StartNew();
            int UpperBound0 = Array.GetUpperBound(0);
            int UpperBound1 = Array.GetUpperBound(1);

            watch.Restart();
            for (int i = 0; i < Array.GetUpperBound(0); i++)
                for (int y = 0; y < Array.GetUpperBound(1); y++)
                    Array[i,y] = 10D * 10D;

            var elapsedMs1 = watch.ElapsedMilliseconds;

            watch.Restart();
            for (int i = 0; i < UpperBound0; i++)
                for (int y = 0; y < UpperBound1; y++)
                    Array[i, y] = 10D * 10D;

            var elapsedMs2 = watch.ElapsedMilliseconds;

            watch.Restart();
            fixed (double* ptr = Array)
            {
                for (int i = 0; i < Array.GetUpperBound(0); i++)
                    for (int y = 0; y < Array.GetUpperBound(1); y++)
                        *(ptr + i * Array.GetUpperBound(1) + y) = 10D * 10D;    
            }

            var elapsedMs3 = watch.ElapsedMilliseconds;

            watch.Restart();
            fixed (double* ptr = Array)
            {
                for (int i = 0; i < UpperBound0; i++)
                    for (int y = 0; y < UpperBound1; y++)
                        *(ptr + i*UpperBound1 + y) = 10D * 10D;      
            }

            var elapsedMs4 = watch.ElapsedMilliseconds;

            System.Windows.Forms.MessageBox.Show(elapsedMs1.ToString() + " " + elapsedMs2.ToString() + " " + elapsedMs3.ToString() + " " + elapsedMs4.ToString());
       
        }
 
Solution
Yes your right, it looks odd now you mention it, when i tested it the results looked ok, late here so ill check it tommorrow. If the results are actually correct maybe that routine can be simplified, the final matrix should be symmetrical around the diagonal, some odd thing in the math maybe. Could just be wrong though. Should be able to cut the sqrt calls in half actually, as its symmetrical, but i was too knackered to get it working today. Need to think about it with a fresh head on.

parralell might be interesting!

i can’t reduce the number of calls to this function for pure tower simulations but definately can for streams and general flash calculations.
So without using any parallelism yet, I've managed to get these timings on my laptop running on battery for 10000x10000 matrices:
C#:
Original pointers: Average: 729 ms
Back to arrays: Average: 916 ms
Back to arrays 2: Average: 650 ms
Back to arrays 3: Average: 585 ms
Spans: Average: 875 ms
Spans 2: Average: 610 ms
Spans 3: Average: 308 ms

Original pointers: Average: 801 ms
Back to arrays: Average: 893 ms
Back to arrays 2: Average: 749 ms
Back to arrays 3: Average: 580 ms
Spans: Average: 665 ms
Spans 2: Average: 571 ms
Spans 3: Average: 310 ms

(I'll try running the code on my desktop PC sometime tomorrow since that tends to give more consistent numbers without the CPU being throttled to save energy.)

Using the following variations of the code from post #12. One of the key things to notice is that I didn't want to be distracted by the memory allocation times, so I broke those out to the non-*Inner implementations.. Hopefully, I didn't goof as I refactored from one variation to the next. The *2 variations remove the conditional out of the first loop, and uses a temporary variable for computing the row sum before finally storing the value in the second loop. The *3 variations merges the first and second loops so that only one row is worked on at a time.

C#:
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Program
{
    static public unsafe double CalcAmix(int n, double[] XY, double[] ai, double[,] kijm, out double[,] Aij, out double[] sumAi)
    {
        Aij = new double[n, n];
        sumAi = new double[n];
        return CalcAmixInner(n, XY, ai, kijm, Aij, sumAi);
    }

    static public unsafe double CalcAmixInner(int n, double[] XY, double[] ai, double[,] kijm, double[,] Aij, double[] sumAi)
    {
        double amix = 0;
        int UpperBound0 = Aij.GetUpperBound(0);
        int UpperBound1 = Aij.GetUpperBound(1);

        fixed (double* ptr = Aij, aiPtr = ai, kijmPtr = kijm)
        {
            double* element = ptr;
            double* kijelement = kijmPtr;
            double* aielement = aiPtr;
            for (int i = 0; i <= UpperBound0; i++)
            {
                for (int j = 0; j <= UpperBound1; j++)
                {
                    if (i == j)
                        *element = *aielement;
                    else
                        *element = Math.Sqrt(*(aiPtr + i) * *(aiPtr + j)) * (1 - *kijelement);

                    element++;
                    kijelement++;
                }

                aielement++;
            }
        }

        fixed (double* ptrAij = Aij, ptrXY = XY, ptrsumAI = sumAi)
        {
            double* elementAij = ptrAij;
            double* elementsumAI = ptrsumAI;
            double* elementXy = ptrXY;

            for (int i = 0; i < n; i++)
            {
                for (int j = 0; j < n; j++)
                {
                    *elementsumAI += *elementXy * *elementAij;
                    elementXy++;
                    elementAij++;
                }

                amix += *elementsumAI * *(ptrXY + i);
                elementXy = ptrXY;
                elementsumAI++;
            }
        }
        return amix;
    }

    static public double CalcAmixArrays(int n, double[] XY, double[] ai, double[,] kijm, out double[,] Aij, out double[] sumAi)
    {
        Aij = new double[n, n];
        sumAi = new double[n];
        return CalcAmixArraysInner(n, XY, ai, kijm, Aij, sumAi);
    }

    static public double CalcAmixArraysInner(int n, double[] XY, double[] ai, double[,] kijm, double[,] Aij, double[] sumAi)
    {
        for (int i = 0; i < n; i++)
        {
            for (int j = 0; j < n; j++)
            {
                if (i == j)
                    Aij[i,j] = ai[i];
                else
                    Aij[i,j] = Math.Sqrt(ai[i] * ai[j]) * (1 - kijm[i,j]);
            }
        }

        double amix = 0;
        for (int i = 0; i < n; i++)
        {
            for (int j = 0; j < n; j++)
                sumAi[i] += XY[j] * Aij[i,j];

            amix += sumAi[i] * XY[i];
        }
        return amix;
    }

    static public double CalcAmixArraysInner2(int n, double[] XY, double[] ai, double[,] kijm, double[,] Aij, double[] sumAi)
    {
        for (int i = 0; i < n; i++)
        {
            var aii = ai[i];
            for (int j = 0; j < n; j++)
                Aij[i, j] = Math.Sqrt(aii * ai[j]) * (1 - kijm[i, j]);

            Aij[i, i] = aii;
        }

        double amix = 0;
        for (int i = 0; i < n; i++)
        {
            double sum = 0;
            for (int j = 0; j < n; j++)
                sum += XY[j] * Aij[i, j];

            sumAi[i] = sum;
            amix += sum * XY[i];
        }
        return amix;
    }

    static public double CalcAmixArraysInner3(int n, double[] XY, double[] ai, double[,] kijm, double[,] Aij, double[] sumAi)
    {
        double amix = 0;

        for (int i = 0; i < n; i++)
        {
            var aii = ai[i];
            double sum = XY[i] * aii;

            for (int j = 0; j < n; j++)
            {
                var value = Math.Sqrt(aii * ai[j]) * (1 - kijm[i, j]);
                Aij[i, j] = value;
                sum += XY[j] * value;
            }

            sum -= XY[i] * Aij[i, i];
            Aij[i, i] = aii;

            sumAi[i] = sum;
            amix += sum * XY[i];
        }

        return amix;
    }

    static public double CalcAmixSpansInner(int n,
                                            ReadOnlySpan<double> XY,
                                            ReadOnlySpan<double> ai,
                                            double[,] kijm,
                                            double[,] Aij,
                                            Span<double> sumAi)
    {
        for (int i = 0; i < n; i++)
        {
            var aii = ai[i];
            for (int j = 0; j < n; j++)
                Aij[i, j] = Math.Sqrt(aii * ai[j]) * (1 - kijm[i, j]);

            Aij[i, i] = aii;
        }

        double amix = 0;
        for (int i = 0; i < n; i++)
        {
            double sum = 0;
            for (int j = 0; j < n; j++)
                sum += XY[j] * Aij[i, j];

            sumAi[i] = sum;
            amix += sum * XY[i];
        }
        return amix;
    }

    static public unsafe double CalcAmixSpansInner2(int n,
                                             ReadOnlySpan<double> XY,
                                             ReadOnlySpan<double> ai,
                                             double[,] kijm,
                                             double[,] Aij,
                                             Span<double> sumAi)
    {
        fixed (double * kijmPtr = kijm, AijPtr = Aij)
        {
            var kijmSpan = new ReadOnlySpan<double>(kijmPtr, kijm.Length);
            var kijmRowISpan = kijmSpan.Slice(0);
            var AijSpan = new Span<double>(AijPtr, Aij.Length);
            var AijRowISpan = AijSpan.Slice(0);

            for (int i = 0; i < n; i++)
            {
                var aii = ai[i];
                for (int j = 0; j < n; j++)
                    AijRowISpan[j] = Math.Sqrt(aii * ai[j]) * (1 - kijmRowISpan[j]);

                AijRowISpan[i] = aii;
                kijmRowISpan = kijmRowISpan.Slice(n);
                AijRowISpan = AijRowISpan.Slice(n);
            }

            double amix = 0;
            AijRowISpan = AijSpan.Slice(0);
            for (int i = 0; i < n; i++)
            {
                double sum = 0;
                for (int j = 0; j < n; j++)
                    sum += XY[j] * AijRowISpan[j];

                sumAi[i] = sum;
                amix += sum * XY[i];
                AijRowISpan = AijRowISpan.Slice(n);
            }
            return amix;
        }
    }

    static public unsafe double CalcAmixSpansInner3(int n,
                                                    ReadOnlySpan<double> XY,
                                                    ReadOnlySpan<double> ai,
                                                    double[,] kijm,
                                                    double[,] Aij,
                                                    Span<double> sumAi)
    {
        fixed (double* kijmPtr = kijm, AijPtr = Aij)
        {
            var kijmSpan = new ReadOnlySpan<double>(kijmPtr, kijm.Length);
            var kijmRowISpan = kijmSpan.Slice(0);
            var AijSpan = new Span<double>(AijPtr, Aij.Length);
            var AijRowISpan = AijSpan.Slice(0);
            double amix = 0;

            for (int i = 0; i < n; i++)
            {
                var aii = ai[i];
                double sum = XY[i] * aii;

                for (int j = 0; j < n; j++)
                {
                    var value = Math.Sqrt(aii * ai[j]) * (1 - kijmRowISpan[j]);
                    AijRowISpan[j] = value;
                    sum += XY[j] * value;
                }

                sum -= XY[i] * AijRowISpan[i];
                AijRowISpan[i] = aii;

                sumAi[i] = sum;
                amix += sum * XY[i];

                kijmRowISpan = kijmRowISpan.Slice(n);
                AijRowISpan = AijRowISpan.Slice(n);
            }

            return amix;
        }
    }

    static Random _random = new Random(123);

    static double[] MakeRandomOneDimensionalArray(int n)
        => Enumerable.Range(0, n)
                     .Select(i => _random.NextDouble())
                     .ToArray();

    static double[,] MakeRandomTwoDimensionalArray(int n)
    {
        var arr = new double[n, n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                arr[i, j] = _random.NextDouble();
        return arr;
    }

    static void TimeIt(string caption, Action action)
    {
        const int iterationCount = 5;
        var stopwatch = new Stopwatch();
        stopwatch.Start();
        for(int i = 0; i < iterationCount; i++)
            action();
        stopwatch.Stop();
        Console.WriteLine($"{caption}: Average: {stopwatch.ElapsedMilliseconds / iterationCount} ms");
    }

    static void Main()
    {
        int n = 10000;
        var xy = MakeRandomOneDimensionalArray(n);
        var ai = MakeRandomOneDimensionalArray(n);
        var kijm = MakeRandomTwoDimensionalArray(n);
        var aij = new double[n, n];
        var sumai = new double[n];

        TimeIt("Original pointers", () => CalcAmixInner(n, xy, ai, kijm, aij, sumai));
        TimeIt("Back to arrays", () => CalcAmixArraysInner(n, xy, ai, kijm, aij, sumai));
        TimeIt("Back to arrays 2", () => CalcAmixArraysInner2(n, xy, ai, kijm, aij, sumai));
        TimeIt("Back to arrays 3", () => CalcAmixArraysInner3(n, xy, ai, kijm, aij, sumai));
        TimeIt("Spans", () => CalcAmixSpansInner(n, xy, ai, kijm, aij, sumai));
        TimeIt("Spans 2", () => CalcAmixSpansInner2(n, xy, ai, kijm, aij, sumai));
        TimeIt("Spans 3", () => CalcAmixSpansInner3(n, xy, ai, kijm, aij, sumai));
    }
}
 
My results are just slightly different in that "Back to arrays 3" is coming out as the fastest. So i combined that with some pointers to get the following which is now the fastest on my PC. Not sure why the last Span Test came out so different.



option 3a:
    static unsafe public double CalcAmixArraysInner3a(int n, double[] XY, double[] ai, double[,] kijm, double[,] Aij, double[] sumAi)
        {
            double amix = 0;

            fixed (double* ptr = Aij, aiPtr = ai, kijmPtr = kijm)
            {
                double* elementIJ = ptr;
                double* kijelement = kijmPtr;
                double* aielement = aiPtr;
                double value;

                for (int i = 0; i < n; i++)
                {
                    var aii = ai[i];
                    double sum = XY[i] * aii;

                    for (int j = 0; j < n; j++)
                    {
                        kijelement++;
                        aielement++;
                        elementIJ++;

                        if (i > j)
                        {
                            value = *elementIJ;
                            Aij[j, i] = value;
                        }
                        else
                        {
                            value = Math.Sqrt(aii * *aielement) * (1 - *kijelement);
                            *elementIJ = value;
                        }

                        sum += XY[j] * value;
                    }

                    aielement = aiPtr;

                    sum -= XY[i] * Aij[i, i];
                    Aij[i, i] = aii;

                    sumAi[i] = sum;
                    amix += sum * XY[i];
                }
            }

            return amix;
        }
 
You are getting faster speeds with 3a because you are doing half as many square root computations, and just mirroring into the lower half. This isn't an apples to apples comparison anymore unless you update the other variants to also just do the same mirroring trick.

Also, branches in tight loops kill performance. There is a reason why my 2 variants removed the conditional in the first loop. You've now re-introduced a branch in the inner loop with that check to see if you are below the diagonal.

Anyway, if that gets you the speed you need, have fun with it. It was fun twiddling with the code to try to make it faster.
 
I meant your CalcAmixArraysInner3 was showing up as the fastest on my PC before i changed anything, the Span3 wasn't.

Anyway thanks for your help, i would never have got this far without it and you made several things much clearer for me and I will be apply them on some other arrays as well.

A big Thanks!
 
And here's results from my PC (which is circa 2010). I like the plain old arrays time with the leet number. :)
C#:
Original pointers: Average: 1377 ms
Back to arrays: Average: 1337 ms
Back to arrays 2: Average: 1146 ms
Back to arrays 3: Average: 867 ms
Spans: Average: 1164 ms
Spans 2: Average: 1024 ms
Spans 3: Average: 821 ms

Original pointers: Average: 1335 ms
Back to arrays: Average: 1336 ms
Back to arrays 2: Average: 1154 ms
Back to arrays 3: Average: 868 ms
Spans: Average: 1163 ms
Spans 2: Average: 1026 ms
Spans 3: Average: 820 ms

Amazing how even a modern laptop running on battery conservation mode manages to outpace an old PC running at full performance. :)
 
You still get Spans 3 to be the fastest. I dont get that, different processor maybe? (AMD Ryzen 9 3900X). Anyway I agree the arrays version is simpler and easier to maintain and easy to apply the same methodology to my other arrays. Adding a few pointers makes it less readable.

Amazing what a relatively small change made, program is getting pretty decent speedwise.
 
As a follow-up, the only way I could get the "back to arrays 3" to run fastest is to run the Debug build instead of the Release build. This was consistent between my laptop (Ryzen 7), my old PC (AMD Phenom II), and my work laptop (iCore 7). Please always do your timings with Release builds.

Also I saw in your OP that you were running these as unit tests. Don't do that because the VS test engine throws all kinds of overhead. Hopefully you've switched over to using the standalone Console app like I had above.

I should really start using BenchmarkDotNet to guarantee better perf testing because it compile a Release build separately and then time that release build. That's among other good things that BenchmarkDotNet does. The downside is that I can't just copy and paste code anymore for others to try without having them first install the BenchmarkDotNet Nuget package, and actually benchmarking will take longer. Maybe I can do best of both worlds -- use my quick TimeIt() code for quick tests, and then get the full benchmark using BenchmarkDotNet.
 
All were running in realease mode, I just tried your original code in a console and still get the Span 3 being slower than Back To Arrays 3. Anyway i think maybe weve optimised this routine as far as it can realtistically go, i have some other areas to re-visit, once again thanks for your valuable help!

2021-01-15.png
 
That's odd, these are my results running @Skydiver 's test code from post 16 in Release build on a AMD Ryzen 3600X:
C#:
Original pointers: Average: 503 ms
Back to arrays: Average: 592 ms
Back to arrays 2: Average: 424 ms
Back to arrays 3: Average: 325 ms
Spans: Average: 464 ms
Spans 2: Average: 300 ms
Spans 3: Average: 232 ms
 
That's odd, these are my results running @Skydiver 's test code from post 16 in Release build on a AMD Ryzen 3600X:
C#:
Original pointers: Average: 503 ms
Back to arrays: Average: 592 ms
Back to arrays 2: Average: 424 ms
Back to arrays 3: Average: 325 ms
Spans: Average: 464 ms
Spans 2: Average: 300 ms
Spans 3: Average: 232 ms
I think i got it, your running in x64 mode?
 
Yes. If you want the most performance or of your 64 bit OS running on a 64 bit machine, you should run as a 64 bit process. If you are running as a 32 but process, the 64 bit OS has to pay overhead costs for thunking to and from a child 32 not process.
 
I get similar results for all AnyCPU (both x64 and prefer 32 bit), x64 and x86.
 
Yes. If you want the most performance or of your 64 bit OS running on a 64 bit machine, you should run as a 64 bit process. If you are running as a 32 but process, the 64 bit OS has to pay overhead costs for thunking to and from a child 32 not process.
Yeah i had it left as Any Cpu, for no particular reason. It’s all a bit faster now.
 
What an interesting result... Trying to do just the top diagonal of the matrix and then trying to mirror it to the lower diagonal seems to be slower than just sequentially computing all the matrix elements. I would have never expected doing half the work to actually take longer. Goes to show how working with the CPU cache instead of against it makes a difference.

C#:
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.685 (2004/?/20H1)
AMD Ryzen 7 PRO 2700U w/ Radeon Vega Mobile Gfx, 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.102
  [Host]     : .NET Core 5.0.2 (CoreCLR 5.0.220.61120, CoreFX 5.0.220.61120), X64 RyuJIT
  DefaultJob : .NET Core 5.0.2 (CoreCLR 5.0.220.61120, CoreFX 5.0.220.61120), X64 RyuJIT


|                        Method |     Mean |    Error |   StdDev |   Median |
|------------------------------ |---------:|---------:|---------:|---------:|
| DoAllItemsDiagonalSpecialCase | 532.2 ms | 10.55 ms | 29.77 ms | 540.6 ms |
|           DoAllItemsSegmented | 462.0 ms |  9.22 ms | 13.80 ms | 464.9 ms |
|        DoAllItemsDiagonalLast | 499.7 ms |  9.93 ms | 15.47 ms | 498.9 ms |
|        MirrorItemsConditional | 745.8 ms | 14.39 ms | 21.55 ms | 744.3 ms |
|      MirrorItemsDiagonalFirst | 559.7 ms | 10.97 ms | 10.26 ms | 563.0 ms |
|       MirrorItemsDiagonalLast | 564.0 ms | 11.19 ms | 14.16 ms | 563.1 ms |
|                    DoJustHalf | 247.5 ms |  5.29 ms | 15.43 ms | 246.7 ms |
|  DoJustHalfThenMirrorByColumn | 742.9 ms | 14.71 ms | 17.52 ms | 741.0 ms |
|     DoJustHalfThenMirrorByRow | 778.9 ms | 12.73 ms | 27.66 ms | 773.8 ms |

That was from benchmarking the following code:
C#:
using System;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

namespace AMixBenchmark
{
    public static class ArrayMaker
    {
        static Random _random = new Random(123);

        public static double[] MakeRandomOneDimensionalArray(int n)
            => Enumerable.Range(0, n)
                         .Select(i => _random.NextDouble())
                         .ToArray();

        public static double[,] MakeRandomTwoDimensionalArray(int n)
        {
            var arr = new double[n, n];
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    arr[i, j] = _random.NextDouble();
            return arr;
        }
    }

    public class BenchArray
    {
        int n = 10000;
        double[] ai;
        double[,] kijm;
        double[,] Aij;

        public BenchArray()
        {
            ai = ArrayMaker.MakeRandomOneDimensionalArray(n);
            kijm = ArrayMaker.MakeRandomTwoDimensionalArray(n);
            Aij = new double[n, n];
        }

        [Benchmark]
        public void DoAllItemsDiagonalSpecialCase()
        {
            for (int i = 0; i < n; i++)
            {
                var aii = ai[i];
                for (int j = 0; j < n; j++)
                {
                    if (i == j)
                        Aij[i, j] = aii;
                    else
                        Aij[i, j] = Math.Sqrt(aii * ai[j]) * (1 - kijm[i, j]);
                }
            }
        }

        [Benchmark]
        public void DoAllItemsSegmented()
        {
            for (int i = 0; i < n; i++)
            {
                var aii = ai[i];
                for (int j = 0; j < i; j++)
                    Aij[i, j] = Math.Sqrt(aii * ai[j]) * (1 - kijm[i, j]);

                Aij[i, i] = aii;

                for (int j = i + 1; j < n; j++)
                    Aij[i, j] = Math.Sqrt(aii * ai[j]) * (1 - kijm[i, j]);
            }
        }

        [Benchmark]
        public void DoAllItemsDiagonalLast()
        {
            for (int i = 0; i < n; i++)
            {
                var aii = ai[i];
                for (int j = 0; j < n; j++)
                    Aij[i, j] = Math.Sqrt(aii * ai[j]) * (1 - kijm[i, j]);

                Aij[i, i] = aii;
            }
        }

        [Benchmark]
        public void MirrorItemsConditional()
        {
            for (int i = 0; i < n; i++)
            {
                var aii = ai[i];
                for (int j = 0; j < n; j++)
                {
                    if (i > j)
                        Aij[j, i] = Aij[i, j];
                    else
                        Aij[i, j] = Math.Sqrt(aii * ai[j]) * (1 - kijm[i, j]);
                }
                Aij[i, i] = aii;
            }
        }

        [Benchmark]
        public void MirrorItemsDiagonalFirst()
        {
            for (int i = 0; i < n; i++)
            {
                var aii = ai[i];
                Aij[i, i] = aii;
                for (int j = i + 1; j < n; j++)
                {
                    var value = Math.Sqrt(aii * ai[j]) * (1 - kijm[i, j]);
                    Aij[i, j] = value;
                    Aij[j, i] = value;
                }
            }
        }

        [Benchmark]
        public void MirrorItemsDiagonalLast()
        {
            for (int i = 0; i < n; i++)
            {
                var aii = ai[i];
                for (int j = i + 1; j < n; j++)
                {
                    var value = Math.Sqrt(aii * ai[j]) * (1 - kijm[i, j]);
                    Aij[i, j] = value;
                    Aij[j, i] = value;
                }
                Aij[i, i] = aii;
            }
        }

        [Benchmark]
        public void DoJustHalf()
        {
            for (int i = 0; i < n; i++)
            {
                var aii = ai[i];
                Aij[i, i] = aii;
                for (int j = i + 1; j < n; j++)
                    Aij[i, j] = Math.Sqrt(aii * ai[j]) * (1 - kijm[i, j]);
            }
        }

        [Benchmark]
        public void DoJustHalfThenMirrorByColumn()
        {
            for (int i = 0; i < n; i++)
            {
                var aii = ai[i];
                Aij[i, i] = aii;
                for (int j = i + 1; j < n; j++)
                    Aij[i, j] = Math.Sqrt(aii * ai[j]) * (1 - kijm[i, j]);
            }

            for (int i = 0; i < n; i++)
            {
                for (int j = i + 1; j < n; j++)
                    Aij[j, i] = Aij[i, j];
            }
        }

        [Benchmark]
        public void DoJustHalfThenMirrorByRow()
        {
            for (int i = 0; i < n; i++)
            {
                var aii = ai[i];
                Aij[i, i] = aii;
                for (int j = i + 1; j < n; j++)
                    Aij[i, j] = Math.Sqrt(aii * ai[j]) * (1 - kijm[i, j]);
            }

            for (int i = 0; i < n; i++)
            {
                for (int j = 0; j < i; j++)
                    Aij[i, j] = Aij[j, i];
            }
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            var summary = BenchmarkRunner.Run<BenchArray>();
        }
    }
}
 
So now i'm running x64 (release mode) I found that the size of the arrays makes a difference in the relative speed. For large arrays (10000 x 10000) the modified Back to Arrays 3 method is the fastest and Span 3 is a close second.

However, in my particular case the array sizes will be relatively small (but called many thousands of times), array size is determined by the number of trays in a tower which around 50 would be a kind of median (tallest towers would have maybe 100 trays, beyond that they may be split into two physical towers, these are not so common).

If i run the test with an array size of 50 I find "Back to Arrays 3" & "Modifed back to arrays 3 (or 3a)" are equal in performance and the fastest.

Implication being that the best appproach for this particular problem was optimise the array handling using 'simple' C# code (Skydiver thanks for doing that!) and no need for spans or pointers! There are other arrays were i can try the same re-arranging of the code.

The program is now solving 8 distillation towers and a simple flowsheet in about 3.9s, which is PDG!

Now i can move on to re-write the basic flash calculations elsewhere to be 'inside out' (and hence have less calls to the complex thermodynamics routines).
 
Back
Top Bottom