Resolved: How to access an element from a non-sequential 2D array

SankaUD

Hi All,

This may sound stupid to you, as I'm a casual programmer. I have a two-dimensional array of values where the rows are angles in decimal degrees; the increments can be 1 degree, 5 degrees, or even 10. The columns are organized in a similar manner. I know how to access a generic two-dimensional array, but I'm not sure how to go about this one, say if I want to access a value referenced at 5 degrees (row) and 100 degrees (col). Do I need to store the row and column values in two different matrices and use a for loop to iterate through the data? Any suggestions?

Ex: x represents a decimal number

     0   10   20   30   40
 5   x    x    x    x    x
10   x    x    x    x    x
15   x    x    x    x    x
20   x    x    x    x    x
25   x    x    x    x    x
30   x    x    x    x    x
 
Yikes! Floating point numbers can't be used as indexes into arrays. The row and column headers are floating point numbers. You will need to make those integers if you wish to use them as array indices.

If you want to use floating point, then for correctness, you will have to compare against some epsilon. This is where skip lists or sparse arrays may help.
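For illustration, a minimal sketch of that epsilon comparison (the FindRowIndex helper is hypothetical, not from this thread):

C#:
// Hypothetical helper: find the index of a row header that matches `target`
// within a small tolerance, since == on floating point values parsed from
// text is unreliable.
static int FindRowIndex(double[] rowAngles, double target, double epsilon = 1e-9)
{
    for (int i = 0; i < rowAngles.Length; i++)
    {
        if (Math.Abs(rowAngles[i] - target) < epsilon)
            return i; // matched within tolerance
    }
    return -1; // no header within tolerance
}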
 

The angles do not need to be floating point numbers. They can be converted to integers. I believe it's a rare case that someone conducts a test in smaller steps.
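For instance, a tiny sketch of that conversion (assuming one decimal place of precision is enough):

C#:
// Scale by 10 and round, so 5.0 degrees becomes 50 and 92.5 becomes 925;
// the resulting integers can be used as dictionary keys or array indices.
static int AngleKey(double angleDegrees) => (int)Math.Round(angleDegrees * 10);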
 
My take on it:

C#:
var file = @"0.0 5.0 10.0 15.0 20.0 25.0 30.0 35.0 40.0 45.0 50.0 55.0 60.0 65.0 70.0 75.0 80.0 85.0 90.0 95.0 100.0 105.0
110.0 115.0 120.0 125.0 130.0 135.0 140.0 145.0 150.0 155.0 160.0 165.0 170.0 175.0 180.0
0.0 90.0 180.0 270.0 360.0
1400.734 1398.456 1374.237 1333.317 1276.078 1202.066 1110.461 1006.302 892.741 774.296 654.736 537.447 425.874 321.486
225.278 139.505 69.087 21.752 1.288 0.671 1.035 1.758 2.496 3.384 4.353 5.237 6.225 7.106
8.049 8.949 9.726 10.572 11.275 12.057 12.680 13.094 13.268
1400.734 1384.893 1362.734 1324.096 1268.993 1197.711 1108.523 1005.510 892.906 775.262 656.559 540.323 429.048 324.650
228.363 142.541 70.829 21.450 0.517 0.000 0.000 0.296 1.220 2.301 3.231 4.334 5.278 6.294
7.270 8.144 9.073 9.831 10.733 11.464 12.147 12.679 13.268
1400.734 1396.554 1370.509 1327.727 1268.035 1192.095 1098.487 991.733 876.026 756.559 636.745 520.386 409.607 305.437
209.379 124.615 56.134 14.073 0.000 0.000 0.000 0.775 1.854 2.841 3.960 4.973 6.082 7.088
8.049 9.008 9.823 10.702 11.487 12.199 12.844 13.267 13.268
1400.734 1379.348 1351.499 1307.695 1247.560 1170.390 1076.484 970.428 856.312 738.308 620.713 506.334 397.666 296.162
203.108 121.258 54.462 15.163 1.017 0.993 1.431 1.991 2.797 3.618 4.481 5.444 6.283 7.229
8.037 8.838 9.577 10.310 11.102 11.743 12.414 12.900 13.268
1400.734 1398.456 1374.237 1333.317 1276.078 1202.066 1110.461 1006.302 892.741 774.296 654.736 537.447 425.874 321.486
225.278 139.505 69.087 21.752 1.288 0.671 1.035 1.758 2.496 3.384 4.353 5.237 6.225 7.106
8.049 8.949 9.726 10.572 11.275 12.057 12.680 13.094 13.268";
       
        var lines = file.Split("\r\n".ToCharArray(), StringSplitOptions.RemoveEmptyEntries); //like File.ReadAllLines
       
        // horizontal headers span the first two lines; multiply by 10 to get integer keys
        var h = lines[0].Split().Concat(lines[1].Split()).Select(s=>(int)(Decimal.Parse(s)*10)).ToArray();
       
        // vertical headers are the third line
        var v = lines[2].Split().Select(s=>(int)(Decimal.Parse(s)*10)).ToArray();

        var dict = new Dictionary<(int H, int V), decimal>();
        for(int vi = 0; vi < v.Length; vi++){
            var start = (vi+1)*3; // each vertical's data is a block of 3 lines, after the 3 header lines
            var data = string.Join(" ", lines[start..(start+3)]).Split().Select(Decimal.Parse).ToArray();
           
            for(int hi = 0; hi < h.Length; hi++){
                dict[(h[hi], v[vi])] = data[hi];
            }
        }
       
        Console.WriteLine("Measurement at 155.0,90.0 is "+dict[(1550,900)]);


Tell you what: you show us yours, maybe @Skydiver will do a sparse array approach, and then I'll do a BenchmarkDotNet test on the 2 or 3 approaches and you can see the numbers. I'd expect the sparse array to be the fastest, the dictionary to be close behind, and looping-for-indexes to be considerably slower than the other two.
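For anyone curious, a minimal BenchmarkDotNet harness is roughly this (the class, method, and dictionary contents are illustrative placeholders, not the actual benchmark; it requires the BenchmarkDotNet NuGet package and a Release build):

C#:
using System.Collections.Generic;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser]
public class LookupBenchmarks
{
    // Placeholder data; the real test would load the parsed dataset here.
    private readonly Dictionary<(int H, int V), decimal> dict =
        new() { [(1550, 900)] = 13.268m };

    [Benchmark]
    public decimal DictLookup() => dict[(1550, 900)];
}

public class Program
{
    public static void Main() => BenchmarkRunner.Run<LookupBenchmarks>();
}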
 

Here is the code that has worked for me so far. Basically, save the data to a text file with new-line termination. I don't believe this is the most efficient way, because I'm a casual programmer.

C#:
using System;
using System.IO;
using System.Collections.Generic;
using System.Linq;

namespace ReadFile
{
    internal class Program
    {
        static void Main(string[] args)
        {           
            readData();           
        }     

        static List<string> readFile(string filePath)
        {
            List<string> lines = new List<string>();

            using (StreamReader reader = new StreamReader(filePath))
            {               
                while (!reader.EndOfStream)
                {
                    string line = reader.ReadLine();                  
                   
                    lines.Add(line);                      
                }                  
               
            }
            return lines;
        }
       
        static void readData()
        {           
            Dictionary<int, double> vertRead = new Dictionary<int, double>();
            Dictionary<int, double> horiRead = new Dictionary<int, double>();           

            string filePath = "Test.txt";

            List<string> lines = readFile(filePath);
          
            string[] vertString = lines[0].Split((char[])null, StringSplitOptions.RemoveEmptyEntries); // null delimiter array splits on whitespace
            for (int i = 0; i < vertString.Length; i++)
            {
                vertRead.Add(i, Convert.ToDouble(vertString[i]));
            }

            string[] horiString = lines[1].Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
            for (int i = 0; i < horiString.Length; i++)
            {
                horiRead.Add(i, Convert.ToDouble(horiString[i]));
            }

            double[,] dataTable = new double[horiRead.Count,vertRead.Count];

            int colIndex = 0;
            for (int i = 2; i < lines.Count; i++)
            {
                string[] dataString = lines[i].Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

                for (int j = 0; j < vertRead.Count; j++)
                {
                    dataTable[colIndex, j] = Convert.ToDouble(dataString[j]);                   
                }
                colIndex++;
            }
                       
            Console.WriteLine("Enter Horizontal Angle :");
            double horiAngle = Convert.ToInt32(Console.ReadLine());
            Console.WriteLine("Enter Vertical Angle :");
            double vertAngle = Convert.ToInt32(Console.ReadLine());

            Console.WriteLine("Value at Horizontal Angle {0} and Vertical angle {1} is :{2} :", horiAngle, vertAngle, dataTable[horiRead.FirstOrDefault(x => x.Value==horiAngle).Key, vertRead.FirstOrDefault(y => y.Value == vertAngle).Key]);

            Console.ReadKey();
        }
    }
}
 

Attachments

  • Test.txt (1.5 KB)
*Sigh*

So, the file you've just posted is nothing like the file you originally claimed to have when you said "below is an extract of an actual data set"
 

Confused. It is from the actual dataset. I copied the data from the file, so it should be like for like.
 
The first file you posted looks like:

Code:
Line 1: 60% of the vertical headers
Line 2: 40% of the vertical headers
Line 3: 100% of the horizontal headers
Line 4: First 40% of the data for the first vertical
Line 5: Second 40% of the data for the first vertical
Line 6: Final 20% of the data for the first vertical
Line 7: First 40% of the data for the second vertical..

I.e. it's essentially split into blocks of 3 lines, and 5 verticals requires an 18 line file (5*3 data, plus 3 header lines)

The second file looks like:

Code:
Line 1: 100% of the vertical headers
Line 2: 100% of the horizontal headers
Line 3: 100% of the data for the first vertical
Line 4: 100% of the data for the second vertical ...

Lines are not split into blocks of 3.
5 verticals are described in 7 lines (5*1 lines for data plus two header lines)

First file: (screenshot of the 18-line layout)
Second file (what I see when I open the file on my cellphone then paste into .NET Fiddle to count lines): (screenshot of the 7-line layout)

Same data, totally different positional presentation; the code I wrote against the first file (and your description of it) will totally fail to work against the second file. The code you wrote is very different from your description, and will totally fail against the first file. For reference, this is how the file was described:

As you can note, the first two lines have vertical angle measurements taken every 5 degrees, and the horizontal plane, the third line, is only done every 90 degrees. following four datasets, terminated with newline characters, represent measurements taken at each vertical angle for four horizontal planes.
(I assumed the "four planes" was a typo)

---------
Moving on to the actual code, though: most critically, your code builds its dictionary the wrong way round. This means that when you have a number that is the Value and you want the number that is its Key, you have to look through every item in the dictionary to find the Value, then pull out its Key:

C#:
horiRead.FirstOrDefault(x => x.Value==horiAngle).Key

And the same for the vertRead dictionary. This is the bit that destroys performance.

If you built the dictionary the other way round (and don't use doubles; you might as well leave it as a string for all it matters, if the user will type "135.0" for the search angle) like this:

C#:
vertRead.Add(vertString[i], i);

Then the dictionary can search in the direction it was designed for: "know Key, lookup Value".

Never arrange a dictionary such that it's "know Value, lookup Key". If you have scenarios where sometimes the search is e.g. Angle to Index and other times Index to Angle, then keep two dictionaries.
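To make that concrete, an illustrative sketch (reusing the vertString header array from the code above; everything else is made up for the example):

C#:
// One dictionary per lookup direction, each used as "know Key, lookup Value".
var angleToIndex = new Dictionary<string, int>();
var indexToAngle = new Dictionary<int, string>();

for (int i = 0; i < vertString.Length; i++)
{
    angleToIndex[vertString[i]] = i; // angle text -> column index
    indexToAngle[i] = vertString[i]; // column index -> angle text
}

// O(1)-ish lookup instead of scanning every entry:
if (angleToIndex.TryGetValue("135.0", out int vertIndex))
    Console.WriteLine($"135.0 degrees is column {vertIndex}");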

For this particular application, where the slowest part by many orders of magnitude is the user typing in their search terms, using the dictionary the wrong way round isn't going to matter the slightest iota. If you're building something that will be queried hundreds or thousands of times a second, then it matters.

So, out of curiosity, let's benchmark these approaches. From a perf-test point of view the data presentation doesn't matter, because the thing being tested (the parsed result) is the same. I'll post back in a bit.
 
I wrote another couple of things, and in the end tested your "look up keys by looping dictionary values, then look up in a 2D array" approach against a dictionary with a tuple key, and a sparse 2D array. I also converted yours to use decimal rather than double, for fairness against the other decimal approaches.

Then I also pondered: "what happens if one day there is a file with tenth-of-a-degree precision on both axes?"

Here are the results:
Code:
BenchmarkDotNet v0.13.6, Windows 10 (10.0.19044.3086/21H2/November2021Update)
Intel Core i7-7820HQ CPU 2.90GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET SDK 7.0.306
  [Host]     : .NET 7.0.9 (7.0.923.32018), X64 RyuJIT AVX2
  DefaultJob : .NET 7.0.9 (7.0.923.32018), X64 RyuJIT AVX2


|                Method |         Mean |      Error |     StdDev |   Gen0 | Allocated |
|---------------------- |-------------:|-----------:|-----------:|-------:|----------:|
|          Cjard2dArray |     28.58 ns |   0.358 ns |   0.335 ns |      - |         - |
|    Cjard2dArrayBigger |     56.34 ns |   0.689 ns |   0.576 ns |      - |         - |
|             CjardDict |     49.82 ns |   0.656 ns |   0.513 ns |      - |         - |
|       CjardDictBigger |    247.97 ns |   4.157 ns |   3.889 ns |      - |         - |
|               SankaUD |    411.65 ns |   8.169 ns |   8.023 ns | 0.0648 |     272 B |
|       SankaUD_Decimal |    508.33 ns |   3.820 ns |   3.573 ns | 0.0725 |     304 B |
| SankaUD_DecimalBigger | 48,769.85 ns | 562.027 ns | 498.222 ns | 0.0610 |     304 B |

The sparse 2D array is fastest, and this is entirely expected: there is very little work to do in performing a lookup, and each lookup takes about 30 nanoseconds. The dictionary with a tuple key is slower; there is more to do in creating a tuple, hashing it, storing it in an array, and jumping if there are collisions. All in, it's around 60% slower, at 50 nanoseconds per operation. They're both an order of magnitude faster than the "looping the dictionary" approach.

The really interesting thing is how the picture changes when we work on much bigger datasets. At tenth-of-a-degree precision we've gone from about 200 measurements to 6.5 million, and if we're considering datasets generated by machine, that might even be a small dataset.

With bigger arrays and bigger dictionaries comes a performance punch on the nose: the 2D array is half the speed, at 60 ns per lookup, and the dictionary with a tuple key is a fifth the speed, at 250 ns per lookup, but these are drops in the ocean next to the looping approach, which takes 95 times longer, at nearly 49,000 nanoseconds per lookup...

---

And what does it look like if you build the dictionary the other way round and use it as it was intended, rather than looping through its values looking for your hit? That 49,000 ns comes down to 135 ns (not present in the above table).


The key take-away here is that nested loops are a very slow way to do things. You need to understand that calling FirstOrDefault on a dictionary is just going to loop through it one by one, looking for whatever entry satisfies the predicate passed in. On average, assuming random lookup values, it will find the wanted entry after searching half the values, so in our bigger dataset a random lookup will result in 3.25 million values being checked before the wanted one is found.
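For contrast, here are the two lookup styles side by side; horiAngleToIndex is a hypothetical Dictionary<double, int> built the right way round (angle -> index), not code from this thread:

C#:
// Linear scan: FirstOrDefault visits entries one by one until the predicate matches.
int slowKey = horiRead.FirstOrDefault(x => x.Value == horiAngle).Key;

// Hash lookup: the bucket is computed from the key directly.
if (horiAngleToIndex.TryGetValue(horiAngle, out int fastKey))
    Console.WriteLine($"Index for {horiAngle} is {fastKey}");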

Dictionary was designed to be able to calculate where a wanted value is and go straight(ish) to it. It does some checking that it found what it wanted, because it's possible that a Dictionary will end up wanting to store two different keys in the same place; it has a strategy for mitigating that, but it costs a bit of time. A sparse 2D array is guaranteed to be able to calculate where the wanted value is, and there will never be a collision with another value, but you cannot always apply it, and it might burn a lot of memory in some cases. Dictionary is usually a good trade-off between time and space consumption.
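As a sketch of why the sparsely populated 2D array can compute the location directly (the scale-by-10 trick from earlier; sizes assume 0.0–360.0 by 0.0–180.0 at one-decimal precision, and the stored value is just an example):

C#:
// The slot is computed arithmetically: no hashing, no collisions, but every
// possible angle gets a slot, so this burns ~100 MB even if mostly empty.
var sparse = new decimal[3601, 1801];

static int Slot(decimal angle) => (int)(angle * 10);

sparse[Slot(90.0m), Slot(155.0m)] = 13.268m;   // store
var value = sparse[Slot(90.0m), Slot(155.0m)]; // O(1) load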
 

WOW! That was amazing, and I truly appreciate the time and effort you have put into helping me with this task. I have to admit that I made a mistake by not mentioning that the text file containing the data is space-delimited and terminated with a new line for each set.

I have realized that I also made a mistake with using a dictionary of key-value sets. I guess being a casual programmer is to blame. To be 100% honest with you, I didn't even know there was such a thing as sparse arrays until this post. I think it's time to dig a little deeper into these things. For now, I will continue to develop my solution, correcting the Dictionary lookup for that extra performance boost, until I understand how to get this working with sparse arrays. Thanks again for the tremendous effort you have put into this, and I will mark this post as solved!
 
The big 2D array I was suggesting is a "sparsely populated array" that wastes a lot of memory. There is another data structure called a "sparse array" that is not as wasteful with memory. It is basically a linked list of indices and elements. Some implementations even take a hybrid approach of a linked list of smaller 2D arrays, if it is known ahead of time that there are islands of populated elements.
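A toy illustration of that "linked list of indices and elements" idea (purely a sketch, not a real implementation): each node records where its value lives, so unpopulated cells cost nothing, at the price of a linear scan on lookup.

C#:
class SparseNode
{
    public int Row, Col;
    public decimal Value;
    public SparseNode Next;
}

class SparseList2D
{
    private SparseNode head;

    // Prepend a node; a real implementation would replace existing entries.
    public void Set(int row, int col, decimal value) =>
        head = new SparseNode { Row = row, Col = col, Value = value, Next = head };

    public decimal? Get(int row, int col)
    {
        for (var n = head; n != null; n = n.Next)
            if (n.Row == row && n.Col == col)
                return n.Value; // found a populated cell
        return null;            // cell was never populated
    }
}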
 