Resolved Two nearly identical threads, big performance difference

Henrik Kragh

I have a big multithreaded system in C#, and I realized that performance was very different between two threads.
Now I have designed two nearly identical threads, where one performs 4-5 times faster (and the gap scales linearly if you change the number of loops they have to run).
And the difference?
One clumsy condition surrounding the actual heavy code of one of them.
It makes no sense to me, and I feel powerless in optimizing going forward, if such a minor detail can have such a huge impact.
This was tested in Unity, and as such it could be that the result is different in other environments.

ThreadA finish time: 2.8 seconds.
ThreadB finish time: 0.6 seconds.


Mind you "ThreadB" is the one that has a condition (Which will evaluate to true instantly on first while iteration).
How can such a stupid addition to the code make the actual payload (The for loops and the number crunching) perform so much faster?
Also, if I replace the "delay" variable with a literal "0.0" directly in the condition of ThreadB, it performs like ThreadA again.
In other words: One single double, and whether it is a hardcoded value, or references a variable, makes a difference in performance of a factor of 4-5.

Never mind the actual algorithm, which is only there to make the computer crunch some numbers.
I know I am comparing the same data again and again; that is beside the point.

I am no compilation nerd, and I have no way of probing how this differs in actual machine/assembler code.
I just know that the difference is huge, and nonsensical to me.
What am I missing?
I discovered this by accident, and in the future I may not have any way of knowing a given thread performs at 20% possible speed, and that one slight change could solve it.

Please.
I need an expert to make this go from pure magic to "Oh, that's why...!
Now I know how to avoid it in the future...".
I know compilation of C# is surrounded by layers of managed stuff, but there must be a logical reason. Right?

Here is some test code with some simple structs to support it. If anyone has the time to check if they get the same results as I, I would be happy.


C#:
using System.Threading;

public class ThreadTest
{
    Thread threadA;
    Thread threadB;

    bool runThreadA = false;
    bool runThreadB = false;

    System.Diagnostics.Stopwatch stopWatch;

    double elapsedTimeA = 0;
    double elapsedTimeB = 0;
    
    public ThreadTest()
    {
        stopWatch = new System.Diagnostics.Stopwatch();
        StartThreads();
    }

    public void StartThreads ()
    {
        stopWatch.Reset();
        stopWatch.Start();
        threadA = new Thread(ThreadA);
        threadB = new Thread(ThreadB);

        runThreadA = true;
        runThreadB = true;

        elapsedTimeA = 0;
        elapsedTimeB = 0;
        threadA.Start();
        threadB.Start();
    }

    void ThreadA ()
    {
        while (runThreadA)
        {
            
            runThreadA = false;
            double preTicks = stopWatch.ElapsedTicks;

            Line3Double lineA = new Line3Double(new Vector3DoublePrecision(10, 20, 30), new Vector3DoublePrecision(100, 140, 180));
            Line3Double lineB = new Line3Double(new Vector3DoublePrecision(-10, -20, -30), new Vector3DoublePrecision(-100, -140, -180));

            int lines = 1000;

            for (int i = 0; i < 8; i++)
            {
                for (int j = 0; j < lines; j++)
                {
                    double aStartX = lineA.startX;
                    double aStartY = lineA.startY;
                    double aStartZ = lineA.startZ;

                    double aEndX = lineA.endX;
                    double aEndY = lineA.endY;
                    double aEndZ = lineA.endZ;

                    double aDirX = lineA.dirX;
                    double aDirY = lineA.dirY;
                    double aDirZ = lineA.dirZ;

                    double aDotSelf = lineA.dotSelf;

                    for (int k = 0; k < 8; k++)
                    {
                        for (int l = 0; l < lines; l++)
                        {
                            double wX = aStartX - lineB.startX;
                            double wY = aStartY - lineB.startY;
                            double wZ = aStartZ - lineB.startZ;

                            double b = aDirX * lineB.dirX + aDirY * lineB.dirY + aDirZ * lineB.dirZ;
                            double d = aDirX * wX + aDirY * wY + aDirZ * wZ;
                            double e = lineB.dirX * wX + lineB.dirY * wY + lineB.dirZ * wZ;

                            double D = aDotSelf * lineB.dotSelf - b * b;
                            double sc, tc;
                            if (D < 0.0000001)
                            {
                                // the lines are almost parallel
                                sc = 0.0f;
                                tc = (b > lineB.dotSelf ? d / b : e / lineB.dotSelf);
                            }
                            else
                            {
                                sc = (b * e - lineB.dotSelf * d) / D;
                                tc = (aDotSelf * e - b * d) / D;
                            }

                            double shortestX = wX + (sc * aDirX) - (tc * lineB.dirX);
                            double shortestY = wY + (sc * aDirY) - (tc * lineB.dirY);
                            double shortestZ = wZ + (sc * aDirZ) - (tc * lineB.dirZ);

                            double distance = shortestX * shortestX + shortestY * shortestY + shortestZ * shortestZ;
                        }
                    }
                }
            }

            double postTicks = stopWatch.ElapsedTicks;
            double time = ((postTicks - preTicks) / System.Diagnostics.Stopwatch.Frequency) * 1000;
            elapsedTimeA = time;
        }
    }

    void ThreadB()
    {
        long startTicks = stopWatch.ElapsedTicks;
        double delay = 0;

        while (runThreadB)
        {
            if ((double)(stopWatch.ElapsedTicks - startTicks) / System.Diagnostics.Stopwatch.Frequency >= delay)
            {
                runThreadB = false;
                double preTicks = stopWatch.ElapsedTicks;

                Line3Double lineA = new Line3Double(new Vector3DoublePrecision(10, 20, 30), new Vector3DoublePrecision(100, 140, 180));
                Line3Double lineB = new Line3Double(new Vector3DoublePrecision(-10, -20, -30), new Vector3DoublePrecision(-100, -140, -180));

                int lines = 1000;

                for (int i = 0; i < 8; i++)
                {
                    for (int j = 0; j < lines; j++)
                    {
                        double aStartX = lineA.startX;
                        double aStartY = lineA.startY;
                        double aStartZ = lineA.startZ;

                        double aEndX = lineA.endX;
                        double aEndY = lineA.endY;
                        double aEndZ = lineA.endZ;

                        double aDirX = lineA.dirX;
                        double aDirY = lineA.dirY;
                        double aDirZ = lineA.dirZ;

                        double aDotSelf = lineA.dotSelf;

                        for (int k = 0; k < 8; k++)
                        {
                            for (int l = 0; l < lines; l++)
                            {
                                double wX = aStartX - lineB.startX;
                                double wY = aStartY - lineB.startY;
                                double wZ = aStartZ - lineB.startZ;

                                double b = aDirX * lineB.dirX + aDirY * lineB.dirY + aDirZ * lineB.dirZ;
                                double d = aDirX * wX + aDirY * wY + aDirZ * wZ;
                                double e = lineB.dirX * wX + lineB.dirY * wY + lineB.dirZ * wZ;

                                double D = aDotSelf * lineB.dotSelf - b * b;
                                double sc, tc;
                                if (D < 0.0000001)
                                {
                                    // the lines are almost parallel
                                    sc = 0.0f;
                                    tc = (b > lineB.dotSelf ? d / b : e / lineB.dotSelf);
                                }
                                else
                                {
                                    sc = (b * e - lineB.dotSelf * d) / D;
                                    tc = (aDotSelf * e - b * d) / D;
                                }

                                double shortestX = wX + (sc * aDirX) - (tc * lineB.dirX);
                                double shortestY = wY + (sc * aDirY) - (tc * lineB.dirY);
                                double shortestZ = wZ + (sc * aDirZ) - (tc * lineB.dirZ);

                                double distance = shortestX * shortestX + shortestY * shortestY + shortestZ * shortestZ;
                            }
                        }
                    }
                }

                double postTicks = stopWatch.ElapsedTicks;
                double time = ((postTicks - preTicks) / System.Diagnostics.Stopwatch.Frequency) * 1000;
                elapsedTimeB = time;
            }
        }
    }
}
public struct Vector3DoublePrecision
{
    public double x;
    public double y;
    public double z;

    public Vector3DoublePrecision(double x, double y, double z)
    {
        this.x = x;
        this.y = y;
        this.z = z;
    }
}

public struct Line3Double
{
    public double startX;
    public double startY;
    public double startZ;
    public double endX;
    public double endY;
    public double endZ;

    public double dirX;
    public double dirY;
    public double dirZ;

    public double dotSelf;

    public Line3Double(Vector3DoublePrecision start, Vector3DoublePrecision end)
    {
        startX = start.x;
        startY = start.y;
        startZ = start.z;

        endX = end.x;
        endY = end.y;
        endZ = end.z;

        dirX = end.x - start.x;
        dirY = end.y - start.y;
        dirZ = end.z - start.z;

        dotSelf = dirX * dirX + dirY * dirY + dirZ * dirZ;
    }
}
 
Solution
Thanks for all your inputs and especially thanks to Skydiver for testing it out. Your low times made me accept something was more "off" than just the difference in compilation between the two threads.
And so, I really dug into the whole compilation setup of my Unity project, and looked into settings I normally don't touch (never had to, never cared to). Now that it has become a Unity-specific problem, I may as well talk about Unity-specific solutions. Because a solution HAS been found. And a major one at that.
So, in Project Settings > Player one can choose between Mono and IL2CPP as the "scripting backend". The default is Mono,
which gives me both these high times AND the difference between the two threads:
2.8s on A
0.6s on B

When changed...
I'm not seeing a huge difference.
C#:
Thread A: 60.38267433430785
Thread B: 59.519296446894785

Below is the code that I used. Whenever you are doing timings, be sure to compile for Release:
C#:
using System;
using System.Threading;

public class ThreadTest
{
    Thread threadA;
    Thread threadB;

    bool runThreadA = false;
    bool runThreadB = false;

    System.Diagnostics.Stopwatch stopWatch;

    double elapsedTimeA = 0;
    double elapsedTimeB = 0;

    public (double, double) Run()
    {
        stopWatch = new System.Diagnostics.Stopwatch();
        StartThreads();
        threadA.Join();
        threadB.Join();
        return (elapsedTimeA, elapsedTimeB);
    }

    public void StartThreads()
    {
        stopWatch.Reset();
        stopWatch.Start();
        threadA = new Thread(ThreadA);
        threadB = new Thread(ThreadB);

        runThreadA = true;
        runThreadB = true;

        elapsedTimeA = 0;
        elapsedTimeB = 0;
        threadA.Start();
        threadB.Start();
    }

    void ThreadA()
    {
        while (runThreadA)
        {

            runThreadA = false;
            double preTicks = stopWatch.ElapsedTicks;

            Line3Double lineA = new Line3Double(new Vector3DoublePrecision(10, 20, 30), new Vector3DoublePrecision(100, 140, 180));
            Line3Double lineB = new Line3Double(new Vector3DoublePrecision(-10, -20, -30), new Vector3DoublePrecision(-100, -140, -180));

            int lines = 1000;

            for (int i = 0; i < 8; i++)
            {
                for (int j = 0; j < lines; j++)
                {
                    double aStartX = lineA.startX;
                    double aStartY = lineA.startY;
                    double aStartZ = lineA.startZ;

                    double aEndX = lineA.endX;
                    double aEndY = lineA.endY;
                    double aEndZ = lineA.endZ;

                    double aDirX = lineA.dirX;
                    double aDirY = lineA.dirY;
                    double aDirZ = lineA.dirZ;

                    double aDotSelf = lineA.dotSelf;

                    for (int k = 0; k < 8; k++)
                    {
                        for (int l = 0; l < lines; l++)
                        {
                            double wX = aStartX - lineB.startX;
                            double wY = aStartY - lineB.startY;
                            double wZ = aStartZ - lineB.startZ;

                            double b = aDirX * lineB.dirX + aDirY * lineB.dirY + aDirZ * lineB.dirZ;
                            double d = aDirX * wX + aDirY * wY + aDirZ * wZ;
                            double e = lineB.dirX * wX + lineB.dirY * wY + lineB.dirZ * wZ;

                            double D = aDotSelf * lineB.dotSelf - b * b;
                            double sc, tc;
                            if (D < 0.0000001)
                            {
                                // the lines are almost parallel
                                sc = 0.0f;
                                tc = (b > lineB.dotSelf ? d / b : e / lineB.dotSelf);
                            }
                            else
                            {
                                sc = (b * e - lineB.dotSelf * d) / D;
                                tc = (aDotSelf * e - b * d) / D;
                            }

                            double shortestX = wX + (sc * aDirX) - (tc * lineB.dirX);
                            double shortestY = wY + (sc * aDirY) - (tc * lineB.dirY);
                            double shortestZ = wZ + (sc * aDirZ) - (tc * lineB.dirZ);

                            double distance = shortestX * shortestX + shortestY * shortestY + shortestZ * shortestZ;
                        }
                    }
                }
            }

            double postTicks = stopWatch.ElapsedTicks;
            double time = ((postTicks - preTicks) / System.Diagnostics.Stopwatch.Frequency) * 1000;
            elapsedTimeA = time;
        }
    }

    void ThreadB()
    {
        long startTicks = stopWatch.ElapsedTicks;
        double delay = 0;

        while (runThreadB)
        {
            if ((double)(stopWatch.ElapsedTicks - startTicks) / System.Diagnostics.Stopwatch.Frequency >= delay)
            {
                runThreadB = false;
                double preTicks = stopWatch.ElapsedTicks;

                Line3Double lineA = new Line3Double(new Vector3DoublePrecision(10, 20, 30), new Vector3DoublePrecision(100, 140, 180));
                Line3Double lineB = new Line3Double(new Vector3DoublePrecision(-10, -20, -30), new Vector3DoublePrecision(-100, -140, -180));

                int lines = 1000;

                for (int i = 0; i < 8; i++)
                {
                    for (int j = 0; j < lines; j++)
                    {
                        double aStartX = lineA.startX;
                        double aStartY = lineA.startY;
                        double aStartZ = lineA.startZ;

                        double aEndX = lineA.endX;
                        double aEndY = lineA.endY;
                        double aEndZ = lineA.endZ;

                        double aDirX = lineA.dirX;
                        double aDirY = lineA.dirY;
                        double aDirZ = lineA.dirZ;

                        double aDotSelf = lineA.dotSelf;

                        for (int k = 0; k < 8; k++)
                        {
                            for (int l = 0; l < lines; l++)
                            {
                                double wX = aStartX - lineB.startX;
                                double wY = aStartY - lineB.startY;
                                double wZ = aStartZ - lineB.startZ;

                                double b = aDirX * lineB.dirX + aDirY * lineB.dirY + aDirZ * lineB.dirZ;
                                double d = aDirX * wX + aDirY * wY + aDirZ * wZ;
                                double e = lineB.dirX * wX + lineB.dirY * wY + lineB.dirZ * wZ;

                                double D = aDotSelf * lineB.dotSelf - b * b;
                                double sc, tc;
                                if (D < 0.0000001)
                                {
                                    // the lines are almost parallel
                                    sc = 0.0f;
                                    tc = (b > lineB.dotSelf ? d / b : e / lineB.dotSelf);
                                }
                                else
                                {
                                    sc = (b * e - lineB.dotSelf * d) / D;
                                    tc = (aDotSelf * e - b * d) / D;
                                }

                                double shortestX = wX + (sc * aDirX) - (tc * lineB.dirX);
                                double shortestY = wY + (sc * aDirY) - (tc * lineB.dirY);
                                double shortestZ = wZ + (sc * aDirZ) - (tc * lineB.dirZ);

                                double distance = shortestX * shortestX + shortestY * shortestY + shortestZ * shortestZ;
                            }
                        }
                    }
                }

                double postTicks = stopWatch.ElapsedTicks;
                double time = ((postTicks - preTicks) / System.Diagnostics.Stopwatch.Frequency) * 1000;
                elapsedTimeB = time;
            }
        }
    }
}
public struct Vector3DoublePrecision
{
    public double x;
    public double y;
    public double z;

    public Vector3DoublePrecision(double x, double y, double z)
    {
        this.x = x;
        this.y = y;
        this.z = z;
    }
}

public struct Line3Double
{
    public double startX;
    public double startY;
    public double startZ;
    public double endX;
    public double endY;
    public double endZ;

    public double dirX;
    public double dirY;
    public double dirZ;

    public double dotSelf;

    public Line3Double(Vector3DoublePrecision start, Vector3DoublePrecision end)
    {
        startX = start.x;
        startY = start.y;
        startZ = start.z;

        endX = end.x;
        endY = end.y;
        endZ = end.z;

        dirX = end.x - start.x;
        dirY = end.y - start.y;
        dirZ = end.z - start.z;

        dotSelf = dirX * dirX + dirY * dirY + dirZ * dirZ;
    }
}

class Shell
{
    static void Main()
    {
        Console.WriteLine("Running...");
        var (a, b) = new ThreadTest().Run();
        Console.WriteLine($"Thread A: {a}");
        Console.WriteLine($"Thread B: {b}");
    }
}
 
This is code running in Unity. And as such I am not compiling for release, I just "Build the project" ;)

But it was for sure good to mention, because I had not built it yet (Just tested directly in the editor). But when I do, the problem actually gets even bigger.
Times in Unity build:
ThreadA: 8.77 seconds
ThreadB: 0.65 seconds

Same code as already posted. A further 4x increase in time usage.

Seems whatever is compiling the code is doing a better job on your end.
That for sure makes it a lot harder to debug, if this differs across environments.
Any idea how to approach this?

Thank you for taking a look!
 
So, in post #2, I had to add code to make sure that both threads had already finished before reading the elapsed time variables that you set within the thread. Are you sure you are waiting for the second thread to finish before you are pulling the time from it?
 
Yes :) Otherwise I also guess my result would be 0. Because the two double values holding times are either 0, or an actual result.
I prepared the code for this forum to not be too Unity bound, and in that process I removed my way of pulling out the result.
Because of the time it takes (a physically felt time of seconds), I don't rely only on the values, but also on how fast ThreadB finishes compared to ThreadA. I have to physically wait for the results to come in.
It seems your computer also performs way faster than mine overall, as both your threads finish in 60 ms. On mine it actually takes a "felt" amount of time...
 
Please note that the machine I tested on was built around 2010 and was a mid-class gaming machine back then.
 
I am pretty sure unity doesn't like multi threading as it is already pretty multi threaded itself and relies a lot on refraction.

In the past when building complex systems for unity I created external Dll's for this purpose so that unity just had to call them not re fracture them.

I am unable to test your code as my computer decided to implode and I will be without one for at least the next week or two.

One of the strategies I found useful in a case like this is to test the speed of each side independently... then look at where they might block each other.

I read the code and didn't see anything wrong at first glance.
 
Moved to third party product forum.
As my code didn't invoke any Unity functionality or come even close to the Unity API, I didn't figure this to be related to Unity at all. Now I am not so sure. I have filed a bug report with Unity; maybe they can figure out what's going on.
 
Please note that the machine I tested on was built around 2010 and was a mid-class gaming machine back then.
My computer is brand new, with an 18-core Intel i9 processor. So if you can get it to run these threads in 60ms, while my times start at 600ms, something is clearly wrong.
The loops should get the actual calculation to run 64,000,000 times. Are you sure your old computer should be able to do that in 60ms?
Maybe I am getting used to low performance because Unity makes ALL my code run slow. I am simply not nerdy enough to judge stuff like that.
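
The iteration count above follows directly from the loop bounds (8 x 1000 x 8 x 1000), and dividing a measured time by it gives an average cost per inner-loop iteration, which is one way to judge whether a timing is plausible. A minimal sketch of that arithmetic; `IterationBudget` and its method names are hypothetical helpers, not part of the posted code:

```csharp
using System;

static class IterationBudget
{
    // Total inner-loop executions for the posted nesting:
    // outer * lines * outer * lines.
    public static long TotalIterations(int outer, int lines)
    {
        return (long)outer * lines * outer * lines;
    }

    // Average time per inner-loop iteration, in nanoseconds.
    public static double NanosecondsPerIteration(double elapsedMs, long iterations)
    {
        return elapsedMs * 1e6 / iterations;
    }
}

class BudgetDemo
{
    static void Main()
    {
        long n = IterationBudget.TotalIterations(8, 1000);
        Console.WriteLine($"iterations: {n}");

        // Timings quoted in this thread, in milliseconds.
        foreach (double ms in new[] { 60.0, 600.0, 2800.0 })
        {
            double ns = IterationBudget.NanosecondsPerIteration(ms, n);
            Console.WriteLine($"{ms} ms -> {ns} ns per iteration");
        }
    }
}
```

For the figures quoted in this thread that works out to roughly 0.94 ns per iteration at 60 ms, 9.4 ns at 600 ms, and 44 ns at 2.8 s.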

EDIT: Could it be that your compiler simply strips away the whole calculation due to the fact that the "distance" calculation never leaves the scope? I have no idea how much compilers do to optimize, but as the "distance" calculation isn't really used for anything, in a real world scenario it would be great to have the compiler simply strip such nonsensical stuff from the final machine code. Right?
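
One way to test that hypothesis is to make the result observable, so that an optimizer cannot legally discard the loop. The following is only a sketch: `Workload.Run` and the `sink` accumulator are hypothetical names, and the arithmetic is a simplified stand-in for the posted line-distance calculation, not a copy of it.

```csharp
using System;
using System.Diagnostics;

static class Workload
{
    // Simplified stand-in for the posted calculation. The return value
    // depends on every iteration, so the loop has an observable result.
    public static double Run(int iterations)
    {
        double sink = 0.0;
        for (int i = 0; i < iterations; i++)
        {
            double w = i * 0.5;
            double distance = w * w + 1.0;
            sink += distance; // consuming the value keeps the computation live
        }
        return sink;
    }
}

class DeadCodeCheck
{
    static void Main()
    {
        var sw = Stopwatch.StartNew();
        double result = Workload.Run(64000000);
        sw.Stop();

        // Printing the result makes it externally visible.
        Console.WriteLine($"sum = {result}, elapsed = {sw.Elapsed.TotalMilliseconds} ms");
    }
}
```

If the same change applied to the real benchmark (accumulating `distance` and printing the sum) makes a fast run noticeably slower, that would suggest the earlier version was at least partly optimized away.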
 
I don't know what "refraction" means (well, I do know the word, and how it relates to the way EM waves transition between materials ;)). But it sounds like you know stuff. It never entered my mind that Unity would have a problem with stuff like that, as in my head Unity only deals with their own API, and how we interact with that, and not the general compilation or execution of the more general code. But as already said, I am simply not very low-level techy.
 
I didn't figure this to be related to Unity at all. Now I am not so sure.
Your project entirely dictates which forum it belongs to. Since your code and reported issue relate to threading within Unity, and not just C#, your problem is not strictly a C# issue. Moving it here doesn't mean we can't help you with it.

I'll try to look over this later if I get time, as I'm the only Unity developer on this board afaik. You should note however, Unity doesn't need a threading class of its own so why did you feel you needed to add it?

I'm working long hours so it might be long before I get time to revisit this topic.
 
You should note however, Unity doesn't need a threading class of its own so why did you feel you needed to add it?

This is not a "Unity threading class". This is standard threading (System.Threading).
I need it because I need some calculations that don't run in Unity's main thread. Some stuff might be heavier, and so can't be performed within a standard frame time of Unity.
So unless I want the game to "stutter", then I need to do it in separate threads. And unless I need some result each frame, there is no reason to try and do it within a frame. This way I don't hurt my frame rate.
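
The pattern described here can be sketched outside Unity as a worker thread that publishes a result while a main loop keeps ticking and merely polls a flag. Everything below is a hypothetical illustration (`BackgroundJob`, `IsDone` and the harmonic-sum workload are invented for the example), and the `Main` loop only simulates frames; it does not use the Unity API.

```csharp
using System;
using System.Threading;

class BackgroundJob
{
    private volatile bool done;   // written by the worker, read by the main loop
    private double result;
    private Thread worker;

    public bool IsDone { get { return done; } }
    public double Result { get { return result; } }

    public void Start(int n)
    {
        done = false;
        worker = new Thread(() =>
        {
            double sum = 0.0;
            for (int i = 1; i <= n; i++) sum += 1.0 / i;
            result = sum;  // store the result first...
            done = true;   // ...then publish it via the volatile flag
        });
        worker.IsBackground = true; // do not keep the process alive on exit
        worker.Start();
    }
}

class FrameLoopDemo
{
    static void Main()
    {
        var job = new BackgroundJob();
        job.Start(50000000);

        int frames = 0;
        while (!job.IsDone)
        {
            frames++;          // stand-in for per-frame work
            Thread.Sleep(16);  // roughly one frame at 60 Hz
        }
        Console.WriteLine($"result {job.Result} after {frames} simulated frames");
    }
}
```

The flag is declared `volatile` so the main loop is guaranteed to see the update and to read `Result` only after it has been written; a plain `bool` field, as in the posted test class, carries no such guarantee.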
 
I saved the code from ThreadA() and ThreadB() into a.cs and b.cs respectively and ran `git diff -w a.cs b.cs` to try to see the difference between the two. This is the result that came out:
C#:
diff --git a/a.cs b/b.cs
index fa46078..449d9ed 100644
--- a/a.cs
+++ b/b.cs
@@ -1,9 +1,13 @@
-    void ThreadA()
-    {
-        while (runThreadA)
+    void ThreadB()
     {
+        long startTicks = stopWatch.ElapsedTicks;
+        double delay = 0;

-            runThreadA = false;
+        while (runThreadB)
+        {
+            if ((double)(stopWatch.ElapsedTicks - startTicks) / System.Diagnostics.Stopwatch.Frequency >= delay)
+            {
+                runThreadB = false;
                 double preTicks = stopWatch.ElapsedTicks;

                 Line3Double lineA = new Line3Double(new Vector3DoublePrecision(10, 20, 30), new Vector3DoublePrecision(100, 140, 180));
@@ -67,6 +71,7 @@

                 double postTicks = stopWatch.ElapsedTicks;
                 double time = ((postTicks - preTicks) / System.Diagnostics.Stopwatch.Frequency) * 1000;
-            elapsedTimeA = time;
+                elapsedTimeB = time;
+            }
         }
     }

So essentially the only difference between ThreadA() and ThreadB() is:
C#:
while (runThreadA)
{
    runThreadA = false;
vs.
C#:
long startTicks = stopWatch.ElapsedTicks;
double delay = 0;

while (runThreadB)
{
    if ((double)(stopWatch.ElapsedTicks - startTicks) / System.Diagnostics.Stopwatch.Frequency >= delay)
    {
        runThreadB = false;

Since the delay is zero, the condition in the if statement should always be true (unless the elapsed number of ticks wraps around back to zero between the time it took to set the value of startTicks and the time it takes to execute the check in the if-statement). But that wrap-around case should result in Thread B taking longer than Thread A, which is the opposite of the result you are seeing, where Thread A takes longer than Thread B.

At this point, there are only two things that I can think of:
1) The code that you are testing for speed, is not the code that you posted here for us to try out.
2) The time it takes to JIT (Just-In-Time) compile on Unity is very slow and currently Thread A, is paying the price to perform the JIT operation.
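
The second possibility can be probed by timing the same method twice in a row: any one-time compilation cost should show up in the first call only. This is a rough sketch under stated assumptions: `Work` is a hypothetical placeholder for the code under test, and the sketch says nothing about how Unity's Mono backend actually compiles the posted methods.

```csharp
using System;
using System.Diagnostics;

static class JitWarmup
{
    // Hypothetical placeholder for the code under test.
    public static double Work(int n)
    {
        double sum = 0.0;
        for (int i = 0; i < n; i++) sum += Math.Sqrt(i);
        return sum;
    }

    // Times a single call to Work and returns the elapsed milliseconds.
    public static double TimeOnce(int n, out double result)
    {
        var sw = Stopwatch.StartNew();
        result = Work(n);
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }
}

class JitDemo
{
    static void Main()
    {
        double r;
        double first = JitWarmup.TimeOnce(1000000, out r);  // may include one-time JIT cost
        double second = JitWarmup.TimeOnce(1000000, out r); // method is already compiled
        Console.WriteLine($"first call:  {first} ms");
        Console.WriteLine($"second call: {second} ms (result {r})");
    }
}
```

A large gap between the first and second call would point toward compilation overhead; similar timings would suggest looking elsewhere.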
 
Test them independently, first only A, then only B. Then compare the times.
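
A sketch of that suggestion: run each body on its own thread, one after the other, and join each before starting the next so the two cannot compete for cores or caches. `BodyA` and `BodyB` are hypothetical stand-ins for the posted `ThreadA` and `ThreadB` methods, and `TimeOnOwnThread` is an invented helper.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class IsolatedTiming
{
    // Runs the body on a dedicated thread, waits for it, returns elapsed ms.
    public static double TimeOnOwnThread(Action body)
    {
        double elapsedMs = 0.0;
        var t = new Thread(() =>
        {
            var sw = Stopwatch.StartNew();
            body();
            sw.Stop();
            elapsedMs = sw.Elapsed.TotalMilliseconds;
        });
        t.Start();
        t.Join(); // after Join returns, the thread's write to elapsedMs is visible
        return elapsedMs;
    }
}

class IsolatedDemo
{
    static double sink; // results are stored so the loops are not dead code

    static void BodyA() { double s = 0; for (int i = 0; i < 10000000; i++) s += i * 0.5; sink = s; }
    static void BodyB() { double s = 0; for (int i = 0; i < 10000000; i++) s += i * 0.25; sink = s; }

    static void Main()
    {
        Console.WriteLine($"A alone: {IsolatedTiming.TimeOnOwnThread(BodyA)} ms");
        Console.WriteLine($"B alone: {IsolatedTiming.TimeOnOwnThread(BodyB)} ms");
        Console.WriteLine($"last sink value: {sink}");
    }
}
```

If the difference between A and B persists when each runs alone, contention between the two threads can be ruled out as the cause.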
 