Array Speed

SilverShaded

Well-known member
Joined
Mar 7, 2020
Messages
93
Programming Experience
10+
is there any possible way to speed up the following method?

baset is 298.15K, baset2 is baset*baset etc

I tried converting the divides to multipliers which made no difference.

C#:
        private static double IdealGasMolarEnthalpy(Temperature Tk, BaseComp sc)
        {
            double VE = 0;
            double Tk1 = Tk;
            double Tk2 = Tk1 * Tk1; // slightly faster
            double[] cp = sc.IdealVapCP;

            if (cp != null)
            {
                VE = cp[0] * (Tk1 - baset1)
                   + cp[1] * (Tk2 - baset2) / 2
                   + cp[2] * (Tk2 * Tk1 - baset3) / 3
                   + cp[3] * (Tk2 * Tk2 - baset4) / 4
                   + cp[4] * (Tk1 * Tk2 * Tk2 - baset5) / 5;
            }
            return VE * sc.MW;
        }
 
Where are baset, baset2, etc. computer? Are you counting the time for those as well?

What profiling tool are you using?
 
startic readonly double Baset = 273.15 +25;
static readonly double Baset2 = Baset * Baset;
etc...

The function is pretty small but gets called 9.6 million times typically.

For the speed test I am just running a series of tests using the VS test enviroment, in .Net 6 i get 45s, in .Net 7 i get 60.5s. I guess it could be the test enironment that is slower?
 
Are you timing stuff using a Release build, or using a debug build?

If release, and you are still finding it slow, there is a blog post about "micro-optimizations" to get the C# compiler to stop putting in as many array range checks, but I doubt that will help considering that you have hardcoded array indices (which should be optimized) instead of dynamic values which must be range checked.
 
Anyway, adding another Tk4 may help reduce one more repeated operation.
 
How parallel is it already?

Good point! It isn't at the moment. Some times there are 10 trays and 70 components, other times 200 trays and 2 components, so which part and when, to be made parrallel, will need a test/switch, there is some overhead in making things parralell so i found in the past it only helps under some circumstances - ill try it!
 
If i was running the sub-routine 96,000,000 times there is a huge speed up using parralell, but in reality its called in batches of around 100, so makeing it parrallel actually slows it down considerably due to the overhead of calling the parrallel routines.

I did shave off a few ms in other ways, but i think i'm into rapidly diminishing returns now.
 
Does doing the maths using integers (eg x1000 so 273150 rather than 273.15) help any? I expect the gains would be minimal..

Do you need the answers soon after the question is asked, or is there mileage in waiting and arranging larger batches?

Can you revise the code so the GPU does the work rather than the CPU?
 
Last edited:
Does doing the maths using integers (eg x1000 so 273150 rather than 273.15) help any? I expect the gains would be minimal..

Do you need the answers soon after the question is asked, or is there mileage in waiting and arranging larger batches?

Can you revise the code so the GPU does the work rather than the CPU?

Its a distillation tower simulation, which would be part of a potentially much larger simulations with recycles, it just needs to iterate around as fast as possible really. The speed is actually pretty decent allready, its not bad compared to some very expensive commercial simulators in C/fortran, but getting it faster is better of course.

Using the GPU is an interesting option, it has been at the back of my mind for a while, e.g. there are matrices that need to be inverted and solved multiple times for example, although they are not limiting performance at the moment.

I need to relook at solving the tower trays in parralell, i tried it but it actually stopped solving properly, i guess some variables were located in the wrong place or something.
 
How many trays in the simulated tower?
 
It varies, between 10 and 200, depending on the problem. I have a test bed of about 50 different tower designs, to make sure after any changes they still all converge. The number of components can also very from 2 as a minimum to about 75 on each tray. Usualy though, the columns with more trays have fewer components.

So total number of matrices on average would be lets say 50 trays x 75 componenets (no 3750) (size 75*75) + 1 jacobian matrix for each iteration. Solving the column could take a dozen iterations, sometimes less/more.
 
And to think that my dad used to do those simulations by hand (on a smaller scale) to feel out whether he was on the right track with this tower design before he would submit some full blown jobs on punch cards for the mainframe to run the simulation, and he would get back piles of printout out paper the following day.

Part of me wishes I paid more attention when he was trying to teach me how to do the calculations. I'm quite sure he would have a blast if I called him up this weekend and asked for how to do things. :)

Anyway, the calculation you have above almost looks like Runge-Kutta style calculation. It may be worth reading through some computer gaming books to see how they manage to do things in games in "real time", but still have realistic physics.
 
I got these interesting results:
Code:
|                    Method |     Mean |     Error |    StdDev |   Median |
|-------------------------- |---------:|----------:|----------:|---------:|
|                  UseLoops | 5.673 us | 0.4007 us | 1.1235 us | 5.167 us |
|     UseLoopsPreRangeCheck | 4.699 us | 0.0926 us | 0.1669 us | 4.690 us |
|                    Plain0 | 3.369 us | 0.0641 us | 0.0810 us | 3.319 us |
|                     Plain | 3.441 us | 0.0686 us | 0.1354 us | 3.405 us |
|        PlainPreRangeCheck | 3.295 us | 0.0656 us | 0.1534 us | 3.301 us |
|              PlainReverse | 3.857 us | 0.1763 us | 0.4736 us | 3.713 us |
| PlainPreRangeCheckReverse | 3.491 us | 0.0525 us | 0.0664 us | 3.460 us |

using the code in the spoiler below.
C#:
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Buffers.Text;

namespace EnthalpyCalc
{
    public record struct Temperature(double Value)
    {
        public static implicit operator double(Temperature temp) => temp.Value;
    }

    public record class BaseComp(double [] IdealVapCP, double MW);

    public class Program
    {
        const double BaseT = 273.15 + 25;
        const double BaseT1 = BaseT;
        const double BaseT2 = BaseT1 * BaseT1;
        const double BaseT3 = BaseT2 * BaseT1;
        const double BaseT4 = BaseT2 * BaseT2;
        const double BaseT5 = BaseT4 * BaseT1;

        static readonly Temperature TestTemp;
        static readonly BaseComp TestBaseComp;

        static Program()
        {
            TestTemp = new Temperature(256);
            TestBaseComp = new BaseComp(new double[] { 1234, 4547, 2341, 9097, 8697 }, 5235.0);
        }

        static double StandardRun(Func<Temperature, BaseComp, double> computeEnthalpy)
        {
            double x = 0;
            for(int i = 0; i < 500; i++)
                x += computeEnthalpy(TestTemp, TestBaseComp);
            return x;
        }

        static double UseLoops(Temperature Tk, BaseComp sc)
        {
            double VE = 0;
            double temp = Tk;
            double baseT = BaseT;
            double[] cp = sc.IdealVapCP;

            if (cp != null)
            {
                for (int i = 0; i < 5; i++)
                {
                    VE += cp[i] * (temp - baseT) / (i + 1);
                    temp *= Tk;
                    baseT *= BaseT;
                }
            }
            return VE * sc.MW;
        }

        static double UseLoopsPreRangeCheck(Temperature Tk, BaseComp sc)
        {
            double VE = 0;
            double temp = Tk;
            double baseT = BaseT;
            double[] cp = sc.IdealVapCP;

            if (cp != null && cp.Length >= 5)
            {
                for (int i = 0; i < 5; i++)
                {
                    VE += cp[i] * (temp - baseT) / (i + 1);
                    temp *= Tk;
                    baseT *= BaseT;
                }
            }
            return VE * sc.MW;
        }

        static double Plain0(Temperature Tk, BaseComp sc)
        {
            double VE = 0;
            double Tk1 = Tk;
            double[] cp = sc.IdealVapCP;

            if (cp != null)
            {
                VE = cp[0] * (Tk1 - BaseT1)
                   + cp[1] * (Tk1 * Tk1 - BaseT2) / 2
                   + cp[2] * (Tk1 * Tk1 * Tk1 - BaseT3) / 3
                   + cp[3] * (Tk1 * Tk1 * Tk1 * Tk1 - BaseT4) / 4
                   + cp[4] * (Tk1 * Tk1 * Tk1 * Tk1 * Tk1 - BaseT5) / 5;
            }
            return VE * sc.MW;
        }

        static double Plain(Temperature Tk, BaseComp sc)
        {
            double VE = 0;
            double Tk1 = Tk;
            double Tk2 = Tk1 * Tk1; // slightly faster
            double[] cp = sc.IdealVapCP;

            if (cp != null)
            {
                VE = cp[0] * (Tk1 - BaseT1)
                   + cp[1] * (Tk2 - BaseT2) / 2
                   + cp[2] * (Tk2 * Tk1 - BaseT3) / 3
                   + cp[3] * (Tk2 * Tk2 - BaseT4) / 4
                   + cp[4] * (Tk1 * Tk2 * Tk2 - BaseT5) / 5;
            }
            return VE * sc.MW;
        }

        static double PlainPreRangeCheck(Temperature Tk, BaseComp sc)
        {
            double VE = 0;
            double Tk1 = Tk;
            double Tk2 = Tk1 * Tk1; // slightly faster
            double[] cp = sc.IdealVapCP;

            if (cp != null && cp.Length >= 5)
            {
                VE = cp[0] * (Tk1 - BaseT1)
                   + cp[1] * (Tk2 - BaseT2) / 2
                   + cp[2] * (Tk2 * Tk1 - BaseT3) / 3
                   + cp[3] * (Tk2 * Tk2 - BaseT4) / 4
                   + cp[4] * (Tk1 * Tk2 * Tk2 - BaseT5) / 5;
            }
            return VE * sc.MW;
        }

        static double PlainReverse(Temperature Tk, BaseComp sc)
        {
            double VE = 0;
            double Tk1 = Tk;
            double Tk2 = Tk1 * Tk1; // slightly faster
            double[] cp = sc.IdealVapCP;

            if (cp != null && cp.Length >= 5)
            {
                VE = cp[4] * (Tk1 * Tk2 * Tk2 - BaseT5) / 5
                   + cp[3] * (Tk2 * Tk2 - BaseT4) / 4
                   + cp[2] * (Tk2 * Tk1 - BaseT3) / 3
                   + cp[1] * (Tk2 - BaseT2) / 2
                   + cp[0] * (Tk1 - BaseT1);
            }
            return VE * sc.MW;
        }

        static double PlainPreRangeCheckReverse(Temperature Tk, BaseComp sc)
        {
            double VE = 0;
            double Tk1 = Tk;
            double Tk2 = Tk1 * Tk1; // slightly faster
            double[] cp = sc.IdealVapCP;

            if (cp != null && cp.Length >= 5)
            {
                VE = cp[4] * (Tk1 * Tk2 * Tk2 - BaseT5) / 5
                   + cp[3] * (Tk2 * Tk2 - BaseT4) / 4
                   + cp[2] * (Tk2 * Tk1 - BaseT3) / 3
                   + cp[1] * (Tk2 - BaseT2) / 2
                   + cp[0] * (Tk1 - BaseT1);
            }
            return VE * sc.MW;
        }

        [Benchmark] public void UseLoops() => StandardRun(UseLoops);
        [Benchmark] public void UseLoopsPreRangeCheck() => StandardRun(UseLoopsPreRangeCheck);
        [Benchmark] public void Plain0() => StandardRun(Plain0);
        [Benchmark] public void Plain() => StandardRun(Plain);
        [Benchmark] public void PlainPreRangeCheck() => StandardRun(PlainPreRangeCheck);
        [Benchmark] public void PlainReverse() => StandardRun(PlainReverse);
        [Benchmark] public void PlainPreRangeCheckReverse() => StandardRun(PlainPreRangeCheckReverse);

        public static void Main()
        {
            Console.WriteLine("Benchmarking...");
            BenchmarkRunner.Run(typeof(Program).Assembly);
        }
    }
}
 
Back
Top Bottom