Array Speed

SilverShaded · Feb 26, 2023

Is there any possible way to speed up the following method?

`baset1` is 298.15 K, `baset2` is `baset1 * baset1`, and so on.
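For reference, the method below evaluates the integral of an ideal-gas heat-capacity polynomial from the reference temperature $T_0 = 298.15$ K (the `baset` powers) up to $T$. Reading `cp[i]` as the coefficient of $T^i$ in $C_p$ is an inference from the /2 to /5 divisors, not something stated in the thread:

$$
\Delta H(T) = \int_{T_0}^{T} \left(c_0 + c_1\tau + c_2\tau^2 + c_3\tau^3 + c_4\tau^4\right) d\tau
= \sum_{i=0}^{4} \frac{c_i}{i+1}\left(T^{i+1} - T_0^{i+1}\right)
$$

with $c_i$ = `cp[i]`; the method returns this value multiplied by `sc.MW`.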

I tried converting the divisions to multiplications, which made no difference.

C#:

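        // Enthalpy relative to the 298.15 K reference: the integral of
        // Cp = cp[0] + cp[1]*T + cp[2]*T^2 + cp[3]*T^3 + cp[4]*T^4 from baset1 to Tk.
        // baset1..baset5 are static members holding 298.15, 298.15^2, ..., 298.15^5.
        private static double IdealGasMolarEnthalpy(Temperature Tk, BaseComp sc)
        {
            double VE = 0;
            double Tk1 = Tk;            // Temperature converts implicitly to double
            double Tk2 = Tk1 * Tk1;     // reuse T^2 for the higher powers (slightly faster)
            double[] cp = sc.IdealVapCP;

            if (cp != null)
            {
                VE = cp[0] * (Tk1 - baset1)
                   + cp[1] * (Tk2 - baset2) / 2
                   + cp[2] * (Tk2 * Tk1 - baset3) / 3
                   + cp[3] * (Tk2 * Tk2 - baset4) / 4
                   + cp[4] * (Tk1 * Tk2 * Tk2 - baset5) / 5;
            }
            return VE * sc.MW;
        }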
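One option not benchmarked in this thread is to fold the divisors into the coefficients once per component and evaluate the antiderivative with Horner's rule: H(T) = T(c0 + T(c1/2 + T(c2/3 + T(c3/4 + T c4/5)))) needs five multiplications and four additions, and H(298.15) can be cached, so the enthalpy is H(T) - H(298.15). This is a minimal sketch assuming plain `double[]` coefficients; the class `ScaledCp` and its members are hypothetical, and whether it beats the original on a given CPU would need the same kind of benchmarking used in the thread. The two forms agree mathematically, but only to within floating-point rounding.

```csharp
using System;

// Hypothetical helper: precomputes c[i] / (i + 1) and the antiderivative at the
// reference temperature once, so each evaluation has no divisions.
public sealed class ScaledCp
{
    public const double BaseT = 298.15; // reference temperature, K

    private readonly double a1, a2, a3, a4, a5; // c0/1, c1/2, c2/3, c3/4, c4/5
    private readonly double hBase;              // H(BaseT), cached

    public ScaledCp(double[] cp)
    {
        if (cp == null || cp.Length < 5)
            throw new ArgumentException("Expected five Cp coefficients.", nameof(cp));
        a1 = cp[0];
        a2 = cp[1] / 2;
        a3 = cp[2] / 3;
        a4 = cp[3] / 4;
        a5 = cp[4] / 5;
        hBase = Antiderivative(BaseT);
    }

    // H(T) = a1*T + a2*T^2 + a3*T^3 + a4*T^4 + a5*T^5 via Horner's rule:
    // five multiplications and four additions.
    public double Antiderivative(double t) =>
        t * (a1 + t * (a2 + t * (a3 + t * (a4 + t * a5))));

    // Enthalpy relative to BaseT.
    public double Enthalpy(double t) => Antiderivative(t) - hBase;
}
```

Usage would be along the lines of building one `ScaledCp` per component when its coefficients are loaded, then calling `scaled.Enthalpy(tk) * mw` in the hot loop.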

cjard · Mar 1, 2023

If you ditch the array coming back from `sc.IdealVapCP` and just use five variables (e.g. in a tuple), does it cost less?
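A minimal sketch of that suggestion, assuming the five coefficients are passed as a value tuple instead of an array. The class name, tuple field names and the `BaseT1`..`BaseT5` stand-ins for the thread's `baset` values are hypothetical:

```csharp
using System;

public static class TupleCp
{
    // Hypothetical stand-ins for baset1..baset5 (powers of 298.15 K).
    private const double BaseT1 = 298.15;
    private static readonly double BaseT2 = BaseT1 * BaseT1;
    private static readonly double BaseT3 = BaseT2 * BaseT1;
    private static readonly double BaseT4 = BaseT2 * BaseT2;
    private static readonly double BaseT5 = BaseT4 * BaseT1;

    // Coefficients arrive as a value tuple: no array indexing, no bounds checks.
    public static double Enthalpy(double tk,
        (double C0, double C1, double C2, double C3, double C4) cp, double mw)
    {
        double tk2 = tk * tk;
        double ve = cp.C0 * (tk - BaseT1)
                  + cp.C1 * (tk2 - BaseT2) / 2
                  + cp.C2 * (tk2 * tk - BaseT3) / 3
                  + cp.C3 * (tk2 * tk2 - BaseT4) / 4
                  + cp.C4 * (tk2 * tk2 * tk - BaseT5) / 5;
        return ve * mw;
    }
}
```

The tuple is copied by value on each call (five doubles), so this trades array access for a slightly larger argument.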

SilverShaded · Mar 1, 2023

cjard said:
If you ditch the array coming back from `sc.IdealVapCP` and just use five variables (e.g. in a tuple), does it cost less?

I just tried it and it didn't help, which actually surprises me; I would have thought the array access had some overhead. It's the last case below; I added two more cases to Skydiver's set.

For the next-to-last case, I realised that all I need in the brackets are fixed numbers (updated periodically).

New case:

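        static double PlainPreRangeCheckPassAllTks(Temperature Tk, BaseComp sc)
        {
            // Assumption: DT1..DT5 hold the bracketed temperature terms of the original
            // method, precomputed when the temperature changes
            // (e.g. DT2 = (Tk^2 - baset2) / 2), so Tk is not used inside this method.
            double VE = 0;
            double[] cp = sc.IdealVapCP;

            if (cp != null && cp.Length >= 5)
            {
                VE = cp[0] * DT1
                   + cp[1] * DT2
                   + cp[2] * DT3
                   + cp[3] * DT4
                   + cp[4] * DT5;
            }
            return VE * sc.MW;
        }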
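A sketch of how DT1 to DT5 might be refreshed whenever the temperature changes, assuming each one holds the full bracketed term from the original method, including its divisor. The class `TempTerms` and its `Update` method are hypothetical; only the DT names come from the post:

```csharp
using System;

public static class TempTerms
{
    private const double BaseT1 = 298.15; // reference temperature, K
    private static readonly double BaseT2 = BaseT1 * BaseT1;
    private static readonly double BaseT3 = BaseT2 * BaseT1;
    private static readonly double BaseT4 = BaseT2 * BaseT2;
    private static readonly double BaseT5 = BaseT4 * BaseT1;

    // Bracketed terms shared by every component at the current temperature.
    public static double DT1, DT2, DT3, DT4, DT5;

    // Hypothetical: call once per temperature update, not once per component.
    public static void Update(double tk)
    {
        double tk2 = tk * tk;
        DT1 = tk - BaseT1;
        DT2 = (tk2 - BaseT2) / 2;
        DT3 = (tk2 * tk - BaseT3) / 3;
        DT4 = (tk2 * tk2 - BaseT4) / 4;
        DT5 = (tk2 * tk2 * tk - BaseT5) / 5;
    }

    // Per-component work is then five multiply-adds and the MW scaling.
    public static double Enthalpy(double[] cp, double mw) =>
        (cp[0] * DT1 + cp[1] * DT2 + cp[2] * DT3 + cp[3] * DT4 + cp[4] * DT5) * mw;
}
```

Static fields keep the sketch short but are not thread-safe; holding the DT values on a per-solver object would avoid that if columns are ever solved in parallel.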

SilverShaded · Mar 1, 2023

Skydiver said:
And to think that my dad used to do those simulations by hand (on a smaller scale) to feel out whether he was on the right track with this tower design before he would submit some full-blown jobs on punch cards for the mainframe to run the simulation, and he would get back piles of printout paper the following day.

Part of me wishes I paid more attention when he was trying to teach me how to do the calculations. I'm quite sure he would have a blast if I called him up this weekend and asked for how to do things.

Anyway, the calculation you have above almost looks like a Runge-Kutta style calculation. It may be worth reading through some computer gaming books to see how they manage to do things in games in "real time", but still have realistic physics.

There used to be, and probably still are, shortcut hand/graphical methods for solving the problem. Luckily I just missed the graphics card stage (graduated 1986). A tower an hour was the goal back then, I believe. Today some of the simpler columns solve in 14 ms, which is mind-blowing, and the slowest in my test bed takes about 5 s (although there are more difficult columns I'm not testing).

Even 5 seconds is annoying, though: a full flowsheet could have dozens of columns and multiple recycles.

One alternative solution method is equation-oriented, which means all the equations are opened up and solved simultaneously in one massive matrix; it is often used for real-time optimisation applications on process plants. However, these are huge applications that are difficult to maintain and have several drawbacks (for example, they can have big problems converging).

Most of my time was spent getting the iterative column algorithm 'reliable', and writing the simulator interface and flowsheet solver.

I really appreciate your suggestions for optimising the code speed, guys; I would never have tried some of these things without them!

Skydiver · Mar 1, 2023

Notice, though, that in both your results and mine, pre-computing Tk2 was actually slower than just using Tk1 and multiplying it multiple times (see Plain0 vs Plain). My suspicion is that reading Tk2 from memory and then multiplying is slower than reading Tk1 from memory once into a register and then multiplying it several times.
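The two styles being compared can be sketched as below. These are illustrative functions, not the thread's Plain0 and Plain benchmark code; they compute the same quantity (up to rounding), and which one runs faster depends on the CPU and JIT, as the next post's results on an older machine show. The names `PowerStyles`, `CachedSquare` and `RepeatedMultiply` are hypothetical:

```csharp
using System;

public static class PowerStyles
{
    private const double T0 = 298.15; // reference temperature, K
    private static readonly double T0Pow2 = T0 * T0;
    private static readonly double T0Pow3 = T0Pow2 * T0;
    private static readonly double T0Pow4 = T0Pow2 * T0Pow2;
    private static readonly double T0Pow5 = T0Pow4 * T0;

    // Style 1: cache T^2 in a local and build the higher powers from it.
    public static double CachedSquare(double t, double[] c)
    {
        double t2 = t * t;
        return c[0] * (t - T0)
             + c[1] * (t2 - T0Pow2) / 2
             + c[2] * (t2 * t - T0Pow3) / 3
             + c[3] * (t2 * t2 - T0Pow4) / 4
             + c[4] * (t2 * t2 * t - T0Pow5) / 5;
    }

    // Style 2: no cached square; every power is a chain of multiplications by t.
    public static double RepeatedMultiply(double t, double[] c)
    {
        return c[0] * (t - T0)
             + c[1] * (t * t - T0Pow2) / 2
             + c[2] * (t * t * t - T0Pow3) / 3
             + c[3] * (t * t * t * t - T0Pow4) / 4
             + c[4] * (t * t * t * t * t - T0Pow5) / 5;
    }
}
```

The JIT is free to keep either local in a register, so the only reliable way to pick one is to benchmark both on the target hardware.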

Skydiver · Mar 1, 2023

Interesting... on my 12-year-old machine with an old AMD Phenom II processor, Plain0 vs Plain flipped to the expected result, where more multiplication is slower than reading from memory.

Skydiver · Mar 1, 2023

@cjard was onto something regarding ditching the array. See PlainFlat below:

Code:

BenchmarkDotNet=v0.13.5, OS=Windows 10 (10.0.19044.2604/21H2/November2021Update)
AMD Phenom(tm) II X4 965 Processor, 1 CPU, 4 logical and 4 physical cores
Frequency=14318180 Hz, Resolution=69.8413 ns, Timer=HPET
.NET SDK=7.0.103
  [Host]     : .NET 7.0.3 (7.0.323.6910), X64 RyuJIT SSE3
  DefaultJob : .NET 7.0.3 (7.0.323.6910), X64 RyuJIT SSE3


|                    Method |      Mean |     Error |    StdDev |    Median |
|-------------------------- |----------:|----------:|----------:|----------:|
|                  UseLoops | 16.124 us | 0.0200 us | 0.0187 us | 16.122 us |
|     UseLoopsPreRangeCheck | 16.024 us | 0.0901 us | 0.0843 us | 16.070 us |
|                    Plain0 |  7.082 us | 0.0400 us | 0.0374 us |  7.070 us |
|                 PlainFlat |  6.736 us | 0.0335 us | 0.0314 us |  6.722 us |
|                     Plain |  7.014 us | 0.0421 us | 0.0394 us |  7.046 us |
|        PlainPreRangeCheck |  6.992 us | 0.0382 us | 0.0357 us |  6.980 us |
|              PlainReverse |  7.852 us | 0.0483 us | 0.0452 us |  7.843 us |
| PlainPreRangeCheckReverse |  7.863 us | 0.0450 us | 0.0399 us |  7.860 us |

C#:

public record class BaseCompFlat(double IdealVapCP0,
                                 double IdealVapCP1,
                                 double IdealVapCP2,
                                 double IdealVapCP3,
                                 double IdealVapCP4,
                                 double MW);

// BaseT1..BaseT5 (defined elsewhere) are the powers of the 298.15 K reference
// temperature; the coefficients are read straight from the record, with no array.
static double PlainFlat(Temperature Tk, BaseCompFlat sc)
{
    double Tk1 = Tk;
    double Tk2 = Tk1 * Tk1;
    double VE = sc.IdealVapCP0 * (Tk1 - BaseT1)
              + sc.IdealVapCP1 * (Tk2 - BaseT2) / 2
              + sc.IdealVapCP2 * (Tk2 * Tk1 - BaseT3) / 3
              + sc.IdealVapCP3 * (Tk2 * Tk2 - BaseT4) / 4
              + sc.IdealVapCP4 * (Tk2 * Tk2 * Tk1 - BaseT5) / 5;
    return VE * sc.MW;
}

SilverShaded · Mar 6, 2023

Thanks, guys. I'm still working through lots of these suggestions. Here's a video showing the nascent app, just for interest really.

cjard · Mar 7, 2023

Can the divide-by-2 and divide-by-4 be done as bit shifts?

Skydiver · Mar 7, 2023

Not for floating point.
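A short illustration of why: a right shift halves an integer, but for a double, dividing by 2 or 4 is already exact for normal values, so `x / 2`, `x * 0.5` and `Math.ScaleB(x, -1)` give the same result and there is no shift to exploit. Whether the JIT rewrites `x / 2` as `x * 0.5` itself is an implementation detail; the result is identical either way. The class and method names are hypothetical:

```csharp
using System;

public static class HalvingDemo
{
    // Integer halving: arithmetic right shift (rounds toward negative infinity,
    // unlike integer division, which truncates toward zero).
    public static int HalveInt(int n) => n >> 1;

    // Floating-point halving: these agree for normal values because
    // 2 and 0.5 are exact powers of two.
    public static double HalveDiv(double x) => x / 2;
    public static double HalveMul(double x) => x * 0.5;
    public static double HalveScale(double x) => Math.ScaleB(x, -1); // .NET Core 3.0+

    // Shifting the raw IEEE 754 bits does NOT halve a double: the sign and
    // exponent fields move too, producing an unrelated value.
    public static double HalveBitsWrong(double x) =>
        BitConverter.Int64BitsToDouble(BitConverter.DoubleToInt64Bits(x) >> 1);
}
```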

cjard · Mar 7, 2023

In this app, can floating point be done away with too?

(And not in the Quake 3 fast inverse square root sense.)

Skydiver · Mar 7, 2023

I know some game developers built their own fixed-point math libraries to speed up their flight and racing simulations back in the '80s and '90s, so it is a possibility. I think the trend from the 2010s onwards is to make use of CPU vector operations and/or take advantage of the GPU by making the shaders do the math.

cjard · Mar 7, 2023

I'm thinking along the lines of "if, say, 6 decimals is accurate/precise enough, could the math be done in integers that are stored/processed naturally and then just divided by 1_000_000 for display?"

(I've no idea what kind of values will be in these doubles)
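A rough sketch of that idea, storing values in micro-units (scale 1_000_000) in a `long`; the class `Micro` and its methods are hypothetical. It also shows the main obstacle at these temperatures: squaring 298.15 K needs an intermediate product of about 8.9e16, which fits, but multiplying that square by T again needs about 2.65e19, past `long.MaxValue` (about 9.22e18). A practical version would need 128-bit intermediates or per-term rescaling.

```csharp
using System;

public static class Micro
{
    public const long Scale = 1_000_000; // six decimal places

    public static long FromDouble(double x) => (long)Math.Round(x * Scale);
    public static double ToDouble(long f) => (double)f / Scale;

    // Fixed-point multiply: the raw product carries Scale twice, so divide once
    // (integer division truncates toward zero). 'checked' makes an overflowing
    // intermediate throw OverflowException instead of silently wrapping.
    public static long Mul(long a, long b) => checked(a * b) / Scale;
}
```

With `t = Micro.FromDouble(298.15)`, `Micro.Mul(t, t)` succeeds, while `Micro.Mul(Micro.Mul(t, t), t)` throws. Also, if any higher-order Cp coefficients in this app are very small, six decimals could round them to zero.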
