Array Acces Speed

SilverShaded

Active member
Joined
Mar 7, 2020
Messages
30
Programming Experience
10+
Hello, i have a program which is heavy on array access, so i've been tyring to optimise the speed. This is important for the overall program which is highly iterative with lots of matrix manipulation. I was hoping to get any criticism of the code below or any further suggestions to speed it up more.

Current run times (in release mode) are respectively, in ms, the relative time differences are relatively constant, at the moment option 4 is looking like the best possible, which represent a significant increase in speed over option 2 which is the current method in the progam. Before i go changing a whole bunch of arrays are there any obvious improvements possible or better ways to do it or any problems with the code below that may give odd results?

elapsedMs1 128
elapsedMs2 29
elapsedMs3 228
elapsedMs4 7

Array Access Speed Test:
 [TestMethod]
        [MethodImpl(MethodImplOptions.NoInlining)]
        public unsafe void TestDoubleDoubleArrayWithPts()
        {

            double[,] Array = new double[10000,1000];
            var watch = Stopwatch.StartNew();
            int UpperBound0 = Array.GetUpperBound(0);
            int UpperBound1 = Array.GetUpperBound(1);

            watch.Restart();
            for (int i = 0; i < Array.GetUpperBound(0); i++)
                for (int y = 0; y < Array.GetUpperBound(1); y++)
                    Array[i,y] = 10D * 10D;

            var elapsedMs1 = watch.ElapsedMilliseconds;

            watch.Restart();
            for (int i = 0; i < UpperBound0; i++)
                for (int y = 0; y < UpperBound1; y++)
                    Array[i, y] = 10D * 10D;

            var elapsedMs2 = watch.ElapsedMilliseconds;

            watch.Restart();
            fixed (double* ptr = Array)
            {
                for (int i = 0; i < Array.GetUpperBound(0); i++)
                    for (int y = 0; y < Array.GetUpperBound(1); y++)
                        *(ptr + i * Array.GetUpperBound(1) + y) = 10D * 10D;    
            }

            var elapsedMs3 = watch.ElapsedMilliseconds;

            watch.Restart();
            fixed (double* ptr = Array)
            {
                for (int i = 0; i < UpperBound0; i++)
                    for (int y = 0; y < UpperBound1; y++)
                        *(ptr + i*UpperBound1 + y) = 10D * 10D;      
            }

            var elapsedMs4 = watch.ElapsedMilliseconds;

            System.Windows.Forms.MessageBox.Show(elapsedMs1.ToString() + " " + elapsedMs2.ToString() + " " + elapsedMs3.ToString() + " " + elapsedMs4.ToString());
       
        }
 
Solution
Yes your right, it looks odd now you mention it, when i tested it the results looked ok, late here so ill check it tommorrow. If the results are actually correct maybe that routine can be simplified, the final matrix should be symmetrical around the diagonal, some odd thing in the math maybe. Could just be wrong though. Should be able to cut the sqrt calls in half actually, as its symmetrical, but i was too knackered to get it working today. Need to think about it with a fresh head on.

parralell might be interesting!

i can’t reduce the number of calls to this function for pure tower simulations but definately can for streams and general flash calculations.

Skydiver

Staff member
Joined
Apr 6, 2019
Messages
2,416
Location
Chesapeake, VA
Programming Experience
10+

SilverShaded

Active member
Joined
Mar 7, 2020
Messages
30
Programming Experience
10+
interesting, yes im not familiar with Spans, a quick search seems to give mostly examples with one dimensional arrays, are they as good for two dimensional?
 

Skydiver

Staff member
Joined
Apr 6, 2019
Messages
2,416
Location
Chesapeake, VA
Programming Experience
10+
Multidimensional arrays (as opposed to jagged arrays) are laid out contiguously in memory. That is why your pointer code on lines 36-41 works. A span just represents a contiguous range of memory.

Lines 36-41 could be re-written as:
C#:
fixed (double* ptr = Array)
{
    double * dst = ptr;
    for (int i = 0; i < UpperBound0; i++)
        for (int y = 0; y < UpperBound1; y++)
            *dst++ = 10D * 10D;
}
to show the contiguous nature of the array elements.
 

Skydiver

Staff member
Joined
Apr 6, 2019
Messages
2,416
Location
Chesapeake, VA
Programming Experience
10+
im not familiar with Spans
I mentioned Spans in your other thread where you were also asking about optimizing matrix access:
 

SilverShaded

Active member
Joined
Mar 7, 2020
Messages
30
Programming Experience
10+
I mentioned Spans in your other thread where you were also asking about optimizing matrix access:
Thank yes, it probably didn't fully register with me at the time, im kind of slow to pick up new programming tricks and tend to just plow on with the bulk of the code. Its a 'spare time' project so paid work (chemical engineering/thermodynamics) often interrupts my train of thought.

Actualy im quite impressed with how fast C# actually is with some of these tricks, my program performace isnt quite up to commercially available program performace yet (e.g. stuff probably written in C) but it is getting there for this particular application.
 

Skydiver

Staff member
Joined
Apr 6, 2019
Messages
2,416
Location
Chesapeake, VA
Programming Experience
10+
Also as I noted in the other thread, always choose to try to find a more efficient algorithm before you start going down the path of micro-optimizations. Go beyond the Newton-Raphson method, and use something more efficient.

Out of curiosity, are you using simulated annealing as an optimization method? Is this why you have to do multiple iterations of what is sounding like a simulation? Why doesn't the simplex method for optimization work for your problem space?

If also need to do matrix multiplication and the profiler is identifying it as the bottleneck, make note of the comment I made in the other thread with regards to creating 2 versions of the matrix: one in the standard row major layout, and a transformed version that is effectively column major. This will speed things up because the memory accesses will be contiguous and help take advantage of the processor's memory cache as well as its predictors regarding what memory to bring into the cache.

Micro-optimizations may speed up processing, but it slows down the developer because the code becomes much harder to parse and understand. With more complexity, it's easier for bugs to sneak in. And also with more complexity, it is harder to have a big mental picture of all the moving pieces. Without a big picture understanding, it's harder to see where other optimizations could be made, or where refactoring can be done.
 

SilverShaded

Active member
Joined
Mar 7, 2020
Messages
30
Programming Experience
10+
Also as I noted in the other thread, always choose to try to find a more efficient algorithm before you start going down the path of micro-optimizations. Go beyond the Newton-Raphson method, and use something more efficient.

Out of curiosity, are you using simulated annealing as an optimization method? Is this why you have to do multiple iterations of what is sounding like a simulation? Why doesn't the simplex method for optimization work for your problem space?

If also need to do matrix multiplication and the profiler is identifying it as the bottleneck, make note of the comment I made in the other thread with regards to creating 2 versions of the matrix: one in the standard row major layout, and a transformed version that is effectively column major. This will speed things up because the memory accesses will be contiguous and help take advantage of the processor's memory cache as well as its predictors regarding what memory to bring into the cache.

Micro-optimizations may speed up processing, but it slows down the developer because the code becomes much harder to parse and understand. With more complexity, it's easier for bugs to sneak in. And also with more complexity, it is harder to have a big mental picture of all the moving pieces. Without a big picture understanding, it's harder to see where other optimizations could be made, or where refactoring can be done.

Yes you are correct its a distillation tower simulation. Matricies come into it in various aspects, one of the main industray standard approaches is an 'inside out' method where the inner problem is solved with simplified thermodynamics and the outer problem iterates around by updating the inner simplified equations. There are matrices to solve the component balance for every component (so for a crude oil tower that would be ~100 matrices) solved direclty by matrix inversion. There is a jacobian matrix which contains the gradients (evaluated by pertubing the 100 component matrices * the number of variables, upto possible 100 for a large complex tower). This jacobian is then solved by typical newtown raphson type iterations until all the constraints are met. Currently i'm inverting the jacobian at every step rather than using the Sherman-Morrison update formula.

The slowest step in all of this is actually in the thermodynamics even though its the outer loop of the inside out method and not in the component matrices or jacobian matrix inversion and updating. (this may change as i try larger problems).

There is a small function which was consuming about 20% or so of the overall runtime (tower solving time is roughly 300ms to 1s, this will grow with bigger problems) This small function is buried in the thermodynamics and is used throughout the simulation not just in tower problems. So this is where i'm currently targeting improvement(s).

Ok so 300 ms doesn't sound a lot to solve a tower (i recently heard from an expert that the 1980's mantra was 'a tower an hour'), but its still significant if there is a simulation with several towers and there are recycles in the flowsheet from one tower to the next and back.


Slow Step:
        static public double CalcAmix(Components cc, double[] XY, double[] ai, double[,] kijm, out double[,] Aij, out double[] sumAi)
        {
           // Debug.Print(": CalcAmix");
            //
            // T is in units of Kelvin; Tc in Kelvin; Pc in Pa
            //
            int n = cc.components.Count;
            double amix = 0;
            Aij = new double[n,n];
            sumAi = new double[n];

            for (int i = 0; i < n; i++)
            {
                for (int j = 0; j < n; j++)
                {
                    if (i == j)
                        Aij[i, j] = ai[i];
                    else if (i > j)
                        Aij[i, j] = Aij[j, i];
                    else
                        Aij[i, j] = Math.Sqrt(ai[i] * ai[j]) * (1 - kijm[i, j]);
                }
            }

            for (int i = 0; i < n; i++)
            {
                for (int j = 0; j < n; j++)
                    sumAi[i] += XY[j] * Aij[i, j];
               
                amix += sumAi[i] * XY[i];
            }

            return amix;
        }
 

Skydiver

Staff member
Joined
Apr 6, 2019
Messages
2,416
Location
Chesapeake, VA
Programming Experience
10+
Yes, I recall, that my father used to say "a tower an hour", but then he used to say it with such glee. It's because he was a chemical engineer working with oil refineries since the 60's, so to bring down the simulation to an hour with a lot of fidelity was a major improvement from their overnight runs on the mainframe with just rough numbers. I also recall him having tons of fun of being able to do optimization problem solving with Lotus 1-2-3 and a 1980's PC clone, and what a time difference adding in a math co-processor made to the calculations.
 

SilverShaded

Active member
Joined
Mar 7, 2020
Messages
30
Programming Experience
10+
Yes, I recall, that my father used to say "a tower an hour", but then he used to say it with such glee. It's because he was a chemical engineer since the 60's, so to bring down the simulation to an hour with a lot of fidelity was a major improvement from their overnight runs on the mainframe with just rough numbers. I also recall him having tons of fun of being able to do optimization problem solving with Lotus 1-2-3 and a 1980's PC clone, and what a time difference adding in a math co-processor made to the calculations.

It has moved on a long way and these days simulation is allmost treated like a simple calculator, instant accurate (unrealistically so) answers are often demanded.

I am intrigued that there may may be better alternatives to Newton Raphson, hadn't really thought about it much. NR can have a tendency to diverge on this problem without good intial guesses. I spent a lot of time improving the original published algorithm (which often requires good guesses for starting conditions) so that guesses are not required, they are now estimated by the program (this is a (small) improvement over several commercial simulators). However a more robust method than NR (or better vrsion of NR rather than my own) wouldn't do any harm at all.
 

SilverShaded

Active member
Joined
Mar 7, 2020
Messages
30
Programming Experience
10+
Found time to try it out, its no longer the slowest step.

Actually calling Math.Sqrt is now my slowest step and its called 35,700 times, consuming 10% of the total run time... so running out of significant potential improvements maybe...

Speeded up version:
   static public unsafe double CalcAmix(Components cc, double[] XY, double[] ai, double[,] kijm, out double[,] Aij, out double[] sumAi)
        {
            // Debug.Print(": CalcAmix");
            //
            // T is in units of Kelvin; Tc in Kelvin; Pc in Pa
            //
            // Pointer Examples
         
            int n = cc.components.Count;
            double amix = 0;
            Aij = new double[n,n];
            sumAi = new double[n];
            int UpperBound0 = Aij.GetUpperBound(0);
            int UpperBound1 = Aij.GetUpperBound(1);

            fixed (double* ptr = Aij, aiPtr= ai, kijmPtr=kijm)
            {
                double* element = ptr;
                double* kijelement = kijmPtr;
                double* aielement = aiPtr;
                for (int i = 0; i <= UpperBound0; i++)
                {                
                    for (int j = 0; j <= UpperBound1; j++)
                    {
                        if (i == j)
                            *element = *aielement;
                        //else if (i > j)
                        //    *(ptr + i * UpperBound1 + j) = *(ptr + j * UpperBound1 + i);
                        else
                            *element = Math.Sqrt(*(aiPtr + i) * *(aiPtr + j)) * (1 - *kijelement);

                        element++;
                        kijelement++;
                    }
                   
                    aielement++;
                }
            }

            fixed (double* ptrAij = Aij, ptrXY = XY, ptrsumAI = sumAi)
            {
                double* elementAij = ptrAij;
                double* elementsumAI = ptrsumAI;
                double* elementXy = ptrXY;

                for (int i = 0; i < n; i++)
                {                
                    for (int j = 0; j < n; j++)
                    {
                        *elementsumAI += *elementXy * *elementAij;
                        elementXy++;
                        elementAij++;
                    }

                    amix += *elementsumAI * *(ptrXY+i);
                    elementXy = ptrXY;
                    elementsumAI++;
                }
            }
            return amix;
        }
 

Skydiver

Staff member
Joined
Apr 6, 2019
Messages
2,416
Location
Chesapeake, VA
Programming Experience
10+
I think that there are opportunities for parallelism so get more speed out of it because it looks like your outputs don't overlap. I'll look at it a bit more when I'm off work.

Unless your can roll your own faster lookup based square root function, I don't think you can get square root to go any faster.

But in the meantime, I suspect that there is something wrong with the elements that you are accessing to pass into the square root:
C#:
Math.Sqrt(*(aiPtr + i) * *(aiPtr + j)) * (1 - *kijelement)
translates to array notation:
C#:
Math.Sqrt(Aij[0,i] * Aij[0,j] * (1 - kijm[i,j])
I doubt that you always want just the first row of the Aij matrix.
 

SilverShaded

Active member
Joined
Mar 7, 2020
Messages
30
Programming Experience
10+
Yes your right, it looks odd now you mention it, when i tested it the results looked ok, late here so ill check it tommorrow. If the results are actually correct maybe that routine can be simplified, the final matrix should be symmetrical around the diagonal, some odd thing in the math maybe. Could just be wrong though. Should be able to cut the sqrt calls in half actually, as its symmetrical, but i was too knackered to get it working today. Need to think about it with a fresh head on.

parralell might be interesting!

i can’t reduce the number of calls to this function for pure tower simulations but definately can for streams and general flash calculations.
 
Solution

Skydiver

Staff member
Joined
Apr 6, 2019
Messages
2,416
Location
Chesapeake, VA
Programming Experience
10+
I think that there are opportunities for parallelism so get more speed out of it because it looks like your outputs don't overlap. I'll look at it a bit more when I'm off work.

Unless your can roll your own faster lookup based square root function, I don't think you can get square root to go any faster.

But in the meantime, I suspect that there is something wrong with the elements that you are accessing to pass into the square root:
C#:
Math.Sqrt(*(aiPtr + i) * *(aiPtr + j)) * (1 - *kijelement)
translates to array notation:
C#:
Math.Sqrt(Aij[0,i] * Aij[0,j] * (1 - kijm[i,j])
I doubt that you always want just the first row of the Aij matrix.
My mistake. I was confused by your naming convention. So on closer look now that I'm off work, this:
C#:
Math.Sqrt(*(aiPtr + i) * *(aiPtr + j)) * (1 - *kijelement)
translates over to:
C#:
Math.Sqrt(ai[i] * ai[j]) * (1 - kijm[i,j])
which now looks kind of more reasonable.
 

Skydiver

Staff member
Joined
Apr 6, 2019
Messages
2,416
Location
Chesapeake, VA
Programming Experience
10+
So without using any parallelism yet, I've managed to get these timings on my laptop running on battery for 10000x10000 matrices:
C#:
Original pointers: Average: 729 ms
Back to arrays: Average: 916 ms
Back to arrays 2: Average: 650 ms
Back to arrays 3: Average: 585 ms
Spans: Average: 875 ms
Spans 2: Average: 610 ms
Spans 3: Average: 308 ms

Original pointers: Average: 801 ms
Back to arrays: Average: 893 ms
Back to arrays 2: Average: 749 ms
Back to arrays 3: Average: 580 ms
Spans: Average: 665 ms
Spans 2: Average: 571 ms
Spans 3: Average: 310 ms

(I'll try running the code on my desktop PC sometime tomorrow since that tends to give more consistent numbers without the CPU being throttled to save energy.)

Using the following variations of the code from post #12. One of the key things to notice is that I didn't want to be distracted by the memory allocation times, so I broke those out to the non-*Inner implementations.. Hopefully, I didn't goof as I refactored from one variation to the next. The *2 variations remove the conditional out of the first loop, and uses a temporary variable for computing the row sum before finally storing the value in the second loop. The *3 variations merges the first and second loops so that only one row is worked on at a time.

C#:
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Program
{
    static public unsafe double CalcAmix(int n, double[] XY, double[] ai, double[,] kijm, out double[,] Aij, out double[] sumAi)
    {
        Aij = new double[n, n];
        sumAi = new double[n];
        return CalcAmixInner(n, XY, ai, kijm, Aij, sumAi);
    }

    static public unsafe double CalcAmixInner(int n, double[] XY, double[] ai, double[,] kijm, double[,] Aij, double[] sumAi)
    {
        double amix = 0;
        int UpperBound0 = Aij.GetUpperBound(0);
        int UpperBound1 = Aij.GetUpperBound(1);

        fixed (double* ptr = Aij, aiPtr = ai, kijmPtr = kijm)
        {
            double* element = ptr;
            double* kijelement = kijmPtr;
            double* aielement = aiPtr;
            for (int i = 0; i <= UpperBound0; i++)
            {
                for (int j = 0; j <= UpperBound1; j++)
                {
                    if (i == j)
                        *element = *aielement;
                    else
                        *element = Math.Sqrt(*(aiPtr + i) * *(aiPtr + j)) * (1 - *kijelement);

                    element++;
                    kijelement++;
                }

                aielement++;
            }
        }

        fixed (double* ptrAij = Aij, ptrXY = XY, ptrsumAI = sumAi)
        {
            double* elementAij = ptrAij;
            double* elementsumAI = ptrsumAI;
            double* elementXy = ptrXY;

            for (int i = 0; i < n; i++)
            {
                for (int j = 0; j < n; j++)
                {
                    *elementsumAI += *elementXy * *elementAij;
                    elementXy++;
                    elementAij++;
                }

                amix += *elementsumAI * *(ptrXY + i);
                elementXy = ptrXY;
                elementsumAI++;
            }
        }
        return amix;
    }

    static public double CalcAmixArrays(int n, double[] XY, double[] ai, double[,] kijm, out double[,] Aij, out double[] sumAi)
    {
        Aij = new double[n, n];
        sumAi = new double[n];
        return CalcAmixArraysInner(n, XY, ai, kijm, Aij, sumAi);
    }

    static public double CalcAmixArraysInner(int n, double[] XY, double[] ai, double[,] kijm, double[,] Aij, double[] sumAi)
    {
        for (int i = 0; i < n; i++)
        {
            for (int j = 0; j < n; j++)
            {
                if (i == j)
                    Aij[i,j] = ai[i];
                else
                    Aij[i,j] = Math.Sqrt(ai[i] * ai[j]) * (1 - kijm[i,j]);
            }
        }

        double amix = 0;
        for (int i = 0; i < n; i++)
        {
            for (int j = 0; j < n; j++)
                sumAi[i] += XY[j] * Aij[i,j];

            amix += sumAi[i] * XY[i];
        }
        return amix;
    }

    static public double CalcAmixArraysInner2(int n, double[] XY, double[] ai, double[,] kijm, double[,] Aij, double[] sumAi)
    {
        for (int i = 0; i < n; i++)
        {
            var aii = ai[i];
            for (int j = 0; j < n; j++)
                Aij[i, j] = Math.Sqrt(aii * ai[j]) * (1 - kijm[i, j]);

            Aij[i, i] = aii;
        }

        double amix = 0;
        for (int i = 0; i < n; i++)
        {
            double sum = 0;
            for (int j = 0; j < n; j++)
                sum += XY[j] * Aij[i, j];

            sumAi[i] = sum;
            amix += sum * XY[i];
        }
        return amix;
    }

    static public double CalcAmixArraysInner3(int n, double[] XY, double[] ai, double[,] kijm, double[,] Aij, double[] sumAi)
    {
        double amix = 0;

        for (int i = 0; i < n; i++)
        {
            var aii = ai[i];
            double sum = XY[i] * aii;

            for (int j = 0; j < n; j++)
            {
                var value = Math.Sqrt(aii * ai[j]) * (1 - kijm[i, j]);
                Aij[i, j] = value;
                sum += XY[j] * value;
            }

            sum -= XY[i] * Aij[i, i];
            Aij[i, i] = aii;

            sumAi[i] = sum;
            amix += sum * XY[i];
        }

        return amix;
    }

    static public double CalcAmixSpansInner(int n,
                                            ReadOnlySpan<double> XY,
                                            ReadOnlySpan<double> ai,
                                            double[,] kijm,
                                            double[,] Aij,
                                            Span<double> sumAi)
    {
        for (int i = 0; i < n; i++)
        {
            var aii = ai[i];
            for (int j = 0; j < n; j++)
                Aij[i, j] = Math.Sqrt(aii * ai[j]) * (1 - kijm[i, j]);

            Aij[i, i] = aii;
        }

        double amix = 0;
        for (int i = 0; i < n; i++)
        {
            double sum = 0;
            for (int j = 0; j < n; j++)
                sum += XY[j] * Aij[i, j];

            sumAi[i] = sum;
            amix += sum * XY[i];
        }
        return amix;
    }

    static public unsafe double CalcAmixSpansInner2(int n,
                                             ReadOnlySpan<double> XY,
                                             ReadOnlySpan<double> ai,
                                             double[,] kijm,
                                             double[,] Aij,
                                             Span<double> sumAi)
    {
        fixed (double * kijmPtr = kijm, AijPtr = Aij)
        {
            var kijmSpan = new ReadOnlySpan<double>(kijmPtr, kijm.Length);
            var kijmRowISpan = kijmSpan.Slice(0);
            var AijSpan = new Span<double>(AijPtr, Aij.Length);
            var AijRowISpan = AijSpan.Slice(0);

            for (int i = 0; i < n; i++)
            {
                var aii = ai[i];
                for (int j = 0; j < n; j++)
                    AijRowISpan[j] = Math.Sqrt(aii * ai[j]) * (1 - kijmRowISpan[j]);

                AijRowISpan[i] = aii;
                kijmRowISpan = kijmRowISpan.Slice(n);
                AijRowISpan = AijRowISpan.Slice(n);
            }

            double amix = 0;
            AijRowISpan = AijSpan.Slice(0);
            for (int i = 0; i < n; i++)
            {
                double sum = 0;
                for (int j = 0; j < n; j++)
                    sum += XY[j] * AijRowISpan[j];

                sumAi[i] = sum;
                amix += sum * XY[i];
                AijRowISpan = AijRowISpan.Slice(n);
            }
            return amix;
        }
    }

    static public unsafe double CalcAmixSpansInner3(int n,
                                                    ReadOnlySpan<double> XY,
                                                    ReadOnlySpan<double> ai,
                                                    double[,] kijm,
                                                    double[,] Aij,
                                                    Span<double> sumAi)
    {
        fixed (double* kijmPtr = kijm, AijPtr = Aij)
        {
            var kijmSpan = new ReadOnlySpan<double>(kijmPtr, kijm.Length);
            var kijmRowISpan = kijmSpan.Slice(0);
            var AijSpan = new Span<double>(AijPtr, Aij.Length);
            var AijRowISpan = AijSpan.Slice(0);
            double amix = 0;

            for (int i = 0; i < n; i++)
            {
                var aii = ai[i];
                double sum = XY[i] * aii;

                for (int j = 0; j < n; j++)
                {
                    var value = Math.Sqrt(aii * ai[j]) * (1 - kijmRowISpan[j]);
                    AijRowISpan[j] = value;
                    sum += XY[j] * value;
                }

                sum -= XY[i] * AijRowISpan[i];
                AijRowISpan[i] = aii;

                sumAi[i] = sum;
                amix += sum * XY[i];

                kijmRowISpan = kijmRowISpan.Slice(n);
                AijRowISpan = AijRowISpan.Slice(n);
            }

            return amix;
        }
    }

    static Random _random = new Random(123);

    static double[] MakeRandomOneDimensionalArray(int n)
        => Enumerable.Range(0, n)
                     .Select(i => _random.NextDouble())
                     .ToArray();

    static double[,] MakeRandomTwoDimensionalArray(int n)
    {
        var arr = new double[n, n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                arr[i, j] = _random.NextDouble();
        return arr;
    }

    static void TimeIt(string caption, Action action)
    {
        const int iterationCount = 5;
        var stopwatch = new Stopwatch();
        stopwatch.Start();
        for(int i = 0; i < iterationCount; i++)
            action();
        stopwatch.Stop();
        Console.WriteLine($"{caption}: Average: {stopwatch.ElapsedMilliseconds / iterationCount} ms");
    }

    static void Main()
    {
        int n = 10000;
        var xy = MakeRandomOneDimensionalArray(n);
        var ai = MakeRandomOneDimensionalArray(n);
        var kijm = MakeRandomTwoDimensionalArray(n);
        var aij = new double[n, n];
        var sumai = new double[n];

        TimeIt("Original pointers", () => CalcAmixInner(n, xy, ai, kijm, aij, sumai));
        TimeIt("Back to arrays", () => CalcAmixArraysInner(n, xy, ai, kijm, aij, sumai));
        TimeIt("Back to arrays 2", () => CalcAmixArraysInner2(n, xy, ai, kijm, aij, sumai));
        TimeIt("Back to arrays 3", () => CalcAmixArraysInner3(n, xy, ai, kijm, aij, sumai));
        TimeIt("Spans", () => CalcAmixSpansInner(n, xy, ai, kijm, aij, sumai));
        TimeIt("Spans 2", () => CalcAmixSpansInner2(n, xy, ai, kijm, aij, sumai));
        TimeIt("Spans 3", () => CalcAmixSpansInner3(n, xy, ai, kijm, aij, sumai));
    }
}
 

SilverShaded

Active member
Joined
Mar 7, 2020
Messages
30
Programming Experience
10+
My results are just slightly different in that "Back to arrays 3" is coming out as the fastest. So i combined that with some pointers to get the following which is now the fastest on my PC. Not sure why the last Span Test came out so different.



option 3a:
    static unsafe public double CalcAmixArraysInner3a(int n, double[] XY, double[] ai, double[,] kijm, double[,] Aij, double[] sumAi)
        {
            double amix = 0;

            fixed (double* ptr = Aij, aiPtr = ai, kijmPtr = kijm)
            {
                double* elementIJ = ptr;
                double* kijelement = kijmPtr;
                double* aielement = aiPtr;
                double value;

                for (int i = 0; i < n; i++)
                {
                    var aii = ai[i];
                    double sum = XY[i] * aii;

                    for (int j = 0; j < n; j++)
                    {
                        kijelement++;
                        aielement++;
                        elementIJ++;

                        if (i > j)
                        {
                            value = *elementIJ;
                            Aij[j, i] = value;
                        }
                        else
                        {
                            value = Math.Sqrt(aii * *aielement) * (1 - *kijelement);
                            *elementIJ = value;
                        }

                        sum += XY[j] * value;
                    }

                    aielement = aiPtr;

                    sum -= XY[i] * Aij[i, i];
                    Aij[i, i] = aii;

                    sumAi[i] = sum;
                    amix += sum * XY[i];
                }
            }

            return amix;
        }
 

Skydiver

Staff member
Joined
Apr 6, 2019
Messages
2,416
Location
Chesapeake, VA
Programming Experience
10+
You are getting faster speeds with 3a because you are doing half as many square root computations, and just mirroring into the lower half. This isn't an apples to apples comparison anymore unless you update the other variants to also just do the same mirroring trick.

Also, branches in tight loops kill performance. There is a reason why my 2 variants removed the conditional in the first loop. You've now re-introduced a branch in the inner loop with that check to see if you are below the diagonal.

Anyway, if that gets you the speed you need, have fun with it. It was fun twiddling with the code to try to make it faster.
 

SilverShaded

Active member
Joined
Mar 7, 2020
Messages
30
Programming Experience
10+
I meant your CalcAmixArraysInner3 was showing up as the fastest on my PC before i changed anything, the Span3 wasn't.

Anyway thanks for your help, i would never have got this far without it and you made several things much clearer for me and I will be apply them on some other arrays as well.

A big Thanks!
 

Skydiver

Staff member
Joined
Apr 6, 2019
Messages
2,416
Location
Chesapeake, VA
Programming Experience
10+
And here's results from my PC (which is circa 2010). I like the plain old arrays time with the leet number. :)
C#:
Original pointers: Average: 1377 ms
Back to arrays: Average: 1337 ms
Back to arrays 2: Average: 1146 ms
Back to arrays 3: Average: 867 ms
Spans: Average: 1164 ms
Spans 2: Average: 1024 ms
Spans 3: Average: 821 ms

Original pointers: Average: 1335 ms
Back to arrays: Average: 1336 ms
Back to arrays 2: Average: 1154 ms
Back to arrays 3: Average: 868 ms
Spans: Average: 1163 ms
Spans 2: Average: 1026 ms
Spans 3: Average: 820 ms

Amazing how even a modern laptop running on battery conservation mode manages to outpace an old PC running at full performance. :)
 
Top Bottom