This may sound stupid to you, as I'm a casual programmer. I have a two-dimensional array of values where the rows are angles in decimal degrees; the increments can be 1 degree, 5 degrees, even 10. The columns are organized in a similar manner. I know how to access a generic two-dimensional array, but I'm not sure how to go about this one, say if I want to access the value referenced at 5 degrees (row) and 100 degrees (col). Do I need to store the row and column values in two different matrices and use a for loop to iterate through the data? Any suggestions?
Ex: (x) represents a decimal number
5     0    10    20    30    40
10    x    x     x     x     x
15    x    x     x     x     x
20    x    x     x     x     x
25    x    x     x     x     x
30    x    x     x     x     x
Yikes! Floating point numbers can't be used as indices into arrays. The first three lines are floating point numbers. You will need to make those integers if you wish to use them as array indices.
If you want to use floating point, then for correctness, you will have to compare against some epsilon. This is where skip lists or sparse arrays may help.
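If the steps really are whole degrees (or a fixed fraction of a degree), another option is to scale the angle into an integer index instead of indexing by the float itself. A minimal sketch, assuming, say, rows from 5 to 30 in 5-degree steps and columns from 0 to 40 in 10-degree steps (hypothetical bounds, purely for illustration):
C#:
using System;

class AngleIndexDemo
{
    // Hypothetical bounds: rows 5..30 (step 5), columns 0..40 (step 10).
    static int RowIndex(double rowAngle) => (int)Math.Round((rowAngle - 5.0) / 5.0);
    static int ColIndex(double colAngle) => (int)Math.Round(colAngle / 10.0);

    static void Main()
    {
        double[,] table = new double[6, 5];

        // Store and read back the value at 15 degrees (row), 20 degrees (col)
        // using the same angle-to-index mapping both times.
        table[RowIndex(15.0), ColIndex(20.0)] = 1.23;
        Console.WriteLine(table[RowIndex(15.0), ColIndex(20.0)]);
    }
}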
The angles do not need to be floating point numbers. They can be converted to integers. I believe it's a rare case that someone conducts a test in smaller steps.
Tell you what, you show us yours, maybe @Skydiver will do a sparse array approach and then I'll do a BenchmarkDotNet test on the 2 or 3 approaches and you can see the numbers. I'd expect sparse array to be the fastest, dictionary to be shortly behind and looping-for-indexes to be considerably slower than the other two
Here is the code that has worked for me so far. Basically, I save the data to a text file with new-line termination. I don't believe this is the most efficient way, because I'm a casual programmer.
C#:
using System;
using System.IO;
using System.Collections.Generic;
using System.Linq;

namespace ReadFile
{
    internal class Program
    {
        static void Main(string[] args)
        {
            readData();
        }

        // Read the whole file into a list of lines.
        static List<string> readFile(string filePath)
        {
            List<string> lines = new List<string>();
            using (StreamReader reader = new StreamReader(filePath))
            {
                while (!reader.EndOfStream)
                {
                    string line = reader.ReadLine();
                    lines.Add(line);
                }
            }
            return lines;
        }

        static void readData()
        {
            // Maps index -> angle (note: the index is the key, the angle is the value).
            Dictionary<int, double> vertRead = new Dictionary<int, double>();
            Dictionary<int, double> horiRead = new Dictionary<int, double>();

            string filePath = "Test.txt";
            List<string> lines = readFile(filePath);

            // First line: vertical angle headers (space delimited).
            string[] vertString = lines[0].Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
            for (int i = 0; i < vertString.Length; i++)
            {
                vertRead.Add(i, Convert.ToDouble(vertString[i]));
            }

            // Second line: horizontal angle headers.
            string[] horiString = lines[1].Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
            for (int i = 0; i < horiString.Length; i++)
            {
                horiRead.Add(i, Convert.ToDouble(horiString[i]));
            }

            // Remaining lines: one row of data per horizontal angle.
            double[,] dataTable = new double[horiRead.Count, vertRead.Count];
            int colIndex = 0;
            for (int i = 2; i < lines.Count; i++)
            {
                string[] dataString = lines[i].Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
                for (int j = 0; j < vertRead.Count; j++)
                {
                    dataTable[colIndex, j] = Convert.ToDouble(dataString[j]);
                }
                colIndex++;
            }

            Console.WriteLine("Enter Horizontal Angle :");
            double horiAngle = Convert.ToDouble(Console.ReadLine());
            Console.WriteLine("Enter Vertical Angle :");
            double vertAngle = Convert.ToDouble(Console.ReadLine());

            // Find the indices by scanning the dictionary values for the entered angles.
            Console.WriteLine("Value at Horizontal Angle {0} and Vertical angle {1} is :{2} :",
                horiAngle, vertAngle,
                dataTable[horiRead.FirstOrDefault(x => x.Value == horiAngle).Key,
                          vertRead.FirstOrDefault(y => y.Value == vertAngle).Key]);
            Console.ReadKey();
        }
    }
}
The first file looks like:
Line 1: 60% of the vertical headers
Line 2: 40% of the vertical headers
Line 3: 100% of the horizontal headers
Line 4: First 40% of the data for the first vertical
Line 5: Second 40% of the data for the first vertical
Line 6: Final 20% of the data for the first vertical
Line 7: First 40% of the data for the second vertical ...
I.e. it's essentially split into blocks of 3 lines, and 5 verticals requires an 18-line file (5*3 data lines, plus 3 header lines)
The second file looks like:
Line 1: 100% of the vertical headers
Line 2: 100% of the horizontal headers
Line 3: 100% of the data for the first vertical
Line 4: 100% of the data for the second vertical ...
Lines are not split into blocks of 3.
5 verticals are described in 7 lines (5*1 lines for data plus two header lines)
First file:
Second file (what I see when I open the file on my cellphone then paste into .net fiddle to count lines):
Same data, totally different positional presentation; the code I wrote against the first file (and your description of it) will totally fail to work against the second file. The code you wrote is very different from your description, and will totally fail against the first file. For reference, this is how the file was described:
As you can note, the first two lines have vertical angle measurements taken every 5 degrees, and the horizontal plane, the third line, is only done every 90 degrees. The following four datasets, terminated with newline characters, represent measurements taken at each vertical angle for four horizontal planes.
---------
Moving on to the actual code though, most critically your code builds its dictionary the wrong way round. This means when you have a number that is the Value and you want the number that is the Key, you have to look through every item in the dictionary to find the Value, then pull out its Key:
And the same for the vertRead dictionary. This is the bit that destroys performance.
If you built the dictionary the other way round (and don't use doubles; you might as well leave it as a string for all it matters, if the user will type "135.0" for the search angle), like this:
C#:
vertRead.Add(vertString[i], i);
Then the dictionary can search in the direction it was designed for: "know Key, look up Value".
Never arrange a dictionary such that it's "know Value, look up Key". If you have scenarios where sometimes the search is e.g. Angle to Index and other times Index to Angle, then keep two dictionaries.
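If you do need both directions, a minimal sketch of the two-dictionary idea, built from the same header strings as above (the names here are illustrative, not from your code):
C#:
// Two dictionaries, one per lookup direction.
var angleToIndex = new Dictionary<string, int>();
var indexToAngle = new Dictionary<int, string>();

for (int i = 0; i < vertString.Length; i++)
{
    angleToIndex.Add(vertString[i], i);   // know angle, look up index
    indexToAngle.Add(i, vertString[i]);   // know index, look up angle
}

// Both of these are hash lookups in the intended direction, no scanning:
int row = angleToIndex["135.0"];
string angle = indexToAngle[row];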
For this particular application, where the slowest part by many orders of magnitude is the user typing in their search terms, using the dictionary the wrong way round isn't going to matter one iota. If you're building something that will be queried hundreds or thousands of times a second, then it matters.
So, out of curiosity, let's benchmark these approaches. From a perf-test point of view the data presentation doesn't matter, because the thing being tested (the parsed result) is the same. I'll post back in a bit.
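The harness will be roughly this shape, a minimal sketch only; the data and benchmark bodies below are placeholders, not the real test code behind the numbers that follow:
C#:
using System.Collections.Generic;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class LookupBenchmarks
{
    private double[,] _grid;
    private Dictionary<(decimal h, decimal v), double> _byAngle;

    [GlobalSetup]
    public void Setup()
    {
        // Placeholder data; the real benchmark parses the measurement file here.
        _grid = new double[360, 180];
        _grid[90, 15] = 1.23;
        _byAngle = new Dictionary<(decimal, decimal), double> { [(90m, 15m)] = 1.23 };
    }

    [Benchmark]
    public double SparseArrayLookup() => _grid[90, 15];

    [Benchmark]
    public double TupleDictionaryLookup() => _byAngle[(90m, 15m)];
}

public class Program
{
    public static void Main() => BenchmarkRunner.Run<LookupBenchmarks>();
}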
I wrote another couple of things, and in the end tested your "look up keys by looping dictionary values, then look up in the 2D array" approach against a dictionary with a tuple key, and a sparse 2D array. I also converted yours to use decimal rather than double, for fairness against the other decimal approaches.
Then I also pondered "what happens if one day there is a file with tenth of a degree precision on both axes"
The sparse 2D array is fastest, and this is entirely expected. There is very little work to do in performing a lookup, and each lookup takes about 30 nanoseconds. The dictionary with a tuple key is slower; there is more to do in creating a tuple, hashing it, storing it in an array and jumping if there are collisions. All in, it's around 60% slower, at 50 nanoseconds per operation. They're both an order of magnitude faster than the "looping the dictionary" approach.
The really interesting thing is how the picture changes when we work on much bigger datasets. At tenth-of-a-degree precision we've gone from about 200 measurements to 6.5 million, and if we're considering datasets generated by a machine, that might even be a small dataset.
With bigger arrays and bigger dictionaries comes a performance punch on the nose; the 2D array is half the speed at 60 ns per lookup, and the dictionary with a tuple key is a fifth the speed at 250 ns per lookup, but these are drops in the ocean next to the looping approach, which takes 95 times longer at nearly 49,000 nanoseconds per lookup...
---
And what does it look like if you built the dictionary the other way round and used it as it was intended, rather than looping through its values looking for your hit? That 49,000 ns comes down to 135ns (not present in the above table).
The key take-away here is that nested loops are a very slow way to do things. You need to understand that doing a FirstOrDefault on a dictionary is just going to loop through it one by one, looking for whatever entry satisfies the predicate passed in. On average, assuming random lookup values, that is going to find the wanted entry after searching half the values, so in our bigger dataset a random lookup will result in 3.25 million values being checked before the wanted one is found.
Dictionary was designed to be able to calculate where a wanted value is and go straight(ish) to it. It does some checking that it found what it wanted, because it's possible that Dictionary will end up wanting to store two different keys in the same place, so it has a strategy for mitigating that but it costs a bit of time. A sparse 2D array is guaranteed able to calculate where the wanted value is and there will never be a collision with another value, but you cannot always universally apply it and it might burn a lot of memory in some cases. Dictionary is usually a good tradeoff between time and space consumption
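To make that memory tradeoff concrete, the "sparsely populated" 2D array at tenth-of-a-degree precision is essentially just index arithmetic, something like this sketch (the 0–360 / 0–180 ranges are assumed for illustration, not taken from the actual test data):
C#:
// Tenth-of-a-degree precision: 0.0..359.9 horizontal, 0.0..180.0 vertical.
// Every cell is allocated up front (~6.5 million doubles, ~52 MB), populated or not.
double[,] grid = new double[3600, 1801];

int H(decimal angle) => (int)(angle * 10);   // 37.5 degrees -> index 375
int V(decimal angle) => (int)(angle * 10);

grid[H(37.5m), V(12.3m)] = 1.23;
double value = grid[H(37.5m), V(12.3m)];     // pure index math: no hashing, no collisions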
WOW! That was amazing, and I truly appreciate the time and effort you have put into helping me with this task. I need to admit that I have made a mistake by not mentioning that the text file containing the data is space delimited and terminated with a new line for each set.
I have realized that I also made a mistake with using a dictionary of key-value sets. I guess being a casual programmer is to blame. To be 100% honest with you, I didn't even know there was such a thing as sparse arrays until this post. I think it's time to dig a little deeper into these things. For now, I will continue to develop my solution, correcting the dictionary lookup for that extra performance boost, until I understand how to get this working with sparse arrays. Thanks again for the tremendous effort you have put into this, and I will mark this post as solved!
The big 2D array I was suggesting is a "sparsely populated array" that wastes a lot of memory. There is another data structure called a "sparse array" that is not as wasteful with memory. It is basically a linked list of indices and elements. Some implementations even take a hybrid approach of a linked list of smaller 2D arrays if it's known ahead of time that there are islands of populated elements.
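A very rough sketch of that idea, just to show the shape of it (illustrative only, not a production implementation):
C#:
using System.Collections.Generic;

// One row of a sparse structure: only the populated cells are stored,
// as (column index, value) pairs, instead of allocating every column.
class SparseRow
{
    private readonly List<(int Col, double Value)> _cells = new List<(int, double)>();

    public void Set(int col, double value)
    {
        int i = _cells.FindIndex(c => c.Col == col);
        if (i >= 0) _cells[i] = (col, value);   // overwrite an existing cell
        else _cells.Add((col, value));          // append a new cell
    }

    public bool TryGet(int col, out double value)
    {
        foreach (var (c, v) in _cells)
        {
            if (c == col) { value = v; return true; }
        }
        value = 0;
        return false;
    }
}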