Resolved How to get values from elements in JSON without indexing?

WeyardWiz

I have the following code that extracts JSON element values and writes them to a CSV file:

C#:
// ChoJSONReader and ChoCSVWriter come from the Cinchoo ETL library (namespace ChoETL).
using System.Linq;
using ChoETL;

public static void Json_to_Csv(string jsonInputFile, string csvFile)
{
    // Read every element of the "readResults" array.
    using (var p = new ChoJSONReader(jsonInputFile).WithJSONPath("$..readResults"))
    {
        using (var w = new ChoCSVWriter(csvFile).WithFirstLineHeader())
        {
            w.Write(p
                .Select(r1 =>
                {
                    var lines = (dynamic[])r1.lines;
                    return new
                    {
                        FileName = jsonInputFile,
                        Page = r1.page,
                        // Positional lookups: these break if a file has an extra line.
                        PracticeName = lines[2].text,
                        OwnerFullName = lines[4].text,
                        OwnerEmail = lines[6].text,
                    };
                }));
        }
    }
}

csv output:

File Name,Page,Practice Name,Owner Full Name,Owner Email
file1.json,1,Some Practice Name,Bob Lee,Bob@someemail.com

Currently there is no other contextual information on each item to reference, so the only way to get a value is by index, e.g. lines[2].

This works for now, but other JSON files may have an extra field, in which case the values pulled will be wrong.

To handle that scenario, how can I pull the values contextually instead of indexing into the lines?

I've tried
C#:
PracticeName = lines["Practice Name"].text

but I get a "Cannot implicitly convert type 'string' to 'int'" error.
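
The error arises because lines is a dynamic[] array, and a C# array can only be indexed by an integer, not by a label string. One possible workaround, sketched here under the assumption that each label line is immediately followed by its value line, is to search for the label text and take the next line. The LabelLookup class and ValueAfter method are hypothetical names, not part of ChoETL.

```csharp
using System;
using System.Collections.Generic;

public static class LabelLookup
{
    // Returns the text of the line that immediately follows the line whose
    // text equals `label`, or null when the label is missing or is the last line.
    // Assumption: the OCR output always emits a value right after its label.
    public static string ValueAfter(IReadOnlyList<string> lineTexts, string label)
    {
        for (int i = 0; i < lineTexts.Count - 1; i++)
        {
            if (string.Equals(lineTexts[i], label, StringComparison.OrdinalIgnoreCase))
            {
                return lineTexts[i + 1];
            }
        }
        return null;
    }
}
```

Inside the Select projection, the line texts could be collected once, for example with lines.Select(l => (string)l.text).ToList(), and then passed to LabelLookup.ValueAfter(texts, "Practice Name").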


file1.json sample:

JSON:
{
  "status": "succeeded",
  "createdDateTime": "2020-10-22T19:35:35Z",
  "lastUpdatedDateTime": "2020-10-22T19:35:36Z",
  "analyzeResult": {
    "version": "3.0.0",
    "readResults": [
      {
        "page": 1,
        "angle": 0,
        "width": 8.5,
        "height": 11,
        "unit": "inch",
        "lines": [        
          {
            "boundingBox": [
              0.5016,
              1.9141,
              2.5726,
              1.9141,
              2.5726,
              2.0741,
              0.5016,
              2.0741
            ],          
            "text": "Account Information",
            "words": [
              {
                "boundingBox": [
                  0.5016,
                  1.9345,
                  1.3399,
                  1.9345,
                  1.3399,
                  2.0741,
                  0.5016,
                  2.0741
                ],
                "text": "Account",
                "confidence": 1
              },
              {
                "boundingBox": [
                  1.3974,
                  1.9141,
                  2.5726,
                  1.9141,
                  2.5726,
                  2.0741,
                  1.3974,
                  2.0741
                ],
                "text": "Information",
                "confidence": 1
              }
            ]
          },
          {
            "boundingBox": [
              1.7716,
              2.4855,
              2.8793,
              2.4855,
              2.8793,
              2.6051,
              1.7716,
              2.6051
            ],
            "text": "Practice Name",
            "words": [
              {
                "boundingBox": [
                  1.7716,
                  2.4855,
                  2.3803,
                  2.4855,
                  2.3803,
                  2.6051,
                  1.7716,
                  2.6051
                ],
                "text": "Practice",
                "confidence": 1
              },
              {
                "boundingBox": [
                  2.4362,
                  2.4948,
                  2.8793,
                  2.4948,
                  2.8793,
                  2.6051,
                  2.4362,
                  2.6051
                ],
                "text": "Name",
                "confidence": 1
              }
            ]
          },
          {
            "boundingBox": [
              2.9993,
              2.5257,
              4.7148,
              2.5257,
              4.7148,
              2.714,
              2.9993,
              2.714
            ],
            "text": "Some Practice Name",
            "words": [
              {
                "boundingBox": [
                  3.0072,
                  2.5385,
                  3.6546,
                  2.5284,
                  3.6516,
                  2.7131,
                  3.0105,
                  2.712
                ],
                "text": "Some",
                "confidence": 0.984
              },
              {
                "boundingBox": [
                  3.6887,
                  2.5281,
                  4.2112,
                  2.5262,
                  4.2028,
                  2.7159,
                  3.6854,
                  2.7132
                ],
                "text": "Practice",
                "confidence": 0.986
              },
              {
                "boundingBox": [
                  4.2453,
                  2.5263,
                  4.7223,
                  2.5297,
                  4.7091,
                  2.72,
                  4.2366,
                  2.7161
                ],
                "text": "Name",
                "confidence": 0.986
              }
            ]
          },
          {
            "boundingBox": [
              1.6116,
              2.9999,
              2.8816,
              2.9999,
              2.8816,
              3.1158,
              1.6116,
              3.1158
            ],
            "text": "Owner Full Name",
            "words": [
              {
                "boundingBox": [
                  1.6116,
                  3.0039,
                  2.1026,
                  3.0039,
                  2.1026,
                  3.1157,
                  1.6116,
                  3.1157
                ],
                "text": "Owner",
                "confidence": 1
              },
              {
                "boundingBox": [
                  2.1541,
                  2.9999,
                  2.3784,
                  2.9999,
                  2.3784,
                  3.1158,
                  2.1541,
                  3.1158
                ],
                "text": "Full",
                "confidence": 1
              },
              {
                "boundingBox": [
                  2.4384,
                  3.0052,
                  2.8816,
                  3.0052,
                  2.8816,
                  3.1155,
                  2.4384,
                  3.1155
                ],
                "text": "Name",
                "confidence": 1
              }
            ]
          },
          {
            "boundingBox": [
              2.9993,
              3.0242,
              3.6966,
              3.0242,
              3.6966,
              3.2125,
              2.9993,
              3.2014
            ],
            "text": "Bob Lee",
            "words": [
              {
                "boundingBox": [
                  3.0063,
                  3.0303,
                  3.3439,
                  3.0349,
                  3.3461,
                  3.2125,
                  3.007,
                  3.2081
                ],
                "text": "Bob",
                "confidence": 0.987
              },
              {
                "boundingBox": [
                  3.3788,
                  3.0349,
                  3.6931,
                  3.0326,
                  3.697,
                  3.2121,
                  3.3813,
                  3.2125
                ],
                "text": "Lee",
                "confidence": 0.983
              }
            ]
          },
          {
            "boundingBox": [
              1.945,
              3.5063,
              2.8748,
              3.5063,
              2.8748,
              3.6261,
              1.945,
              3.6261
            ],
            "text": "Owner Email",
            "words": [
              {
                "boundingBox": [
                  1.945,
                  3.5143,
                  2.4359,
                  3.5143,
                  2.4359,
                  3.6261,
                  1.945,
                  3.6261
                ],
                "text": "Owner",
                "confidence": 1
              },
              {
                "boundingBox": [
                  2.4874,
                  3.5063,
                  2.8748,
                  3.5063,
                  2.8748,
                  3.6259,
                  2.4874,
                  3.6259
                ],
                "text": "Email",
                "confidence": 1
              }
            ]
          },
          {
            "boundingBox": [
              3.0104,
              3.5005,
              4.6042,
              3.5005,
              4.6042,
              3.6888,
              3.0104,
              3.6777
            ],
            "text": "bob@gmail.com",
            "words": [
              {
                "boundingBox": [
                  3.0212,
                  3.5047,
                  4.5837,
                  3.5039,
                  4.5769,
                  3.6886,
                  3.0129,
                  3.6787
                ],
                "text": "bob@gmail.com",
                "confidence": 0.951
              }
            ]
          }
        ]
      }
    ]
  }
}
 
Solution
The following outputs the pairs to the console:
C#:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Newtonsoft.Json;

class RootObject
{
    [JsonProperty("analyzeResult")]
    public AnalyzeResult AnalyzeResult { get; set; }
}

class AnalyzeResult
{
    [JsonProperty("readResults")]
    public ReadResults[] ReadResults { get; set; }
}

class ReadResults
{
    [JsonProperty("lines")]
    public Line[] Lines { get; set; }
}

class Line
{
    [JsonProperty("text")]
    public string Text { get; set; }

    public override string ToString() => Text;
}

public static class IEnumerableExtensions
{
    public static IEnumerable<KeyValuePair<T, T>> Pairs<T>(this IEnumerable<T> items)
    {
        var...
Recall in post #8, I said:
If you know that each pair of lines is always a name-value pair, then you can just ingest the lines in pairs and setup the values to go out into the CSV.

Your data doesn't meet that condition.

Take time to read about how LINQ pipelining and fluent interfaces work. You can add extra filters and modifiers to change how the data flows down the pipeline, and try to adjust the enumeration of the fields.

Personally, I think you are taking the wrong approach by doing OCR first and then trying to massage the data that comes out. My recommendation is to first run some AI classification on the scanned PDFs or papers to sort the different kinds of forms into buckets. Then, for each bucket, apply the Strategy pattern and use a custom mask with the OCR so that it only scans the data that matters to you. As the data from each bucket comes out, it gets fed into a POCO with the shape of the data that you eventually want to save to your CSV. My gut says that you'll have to go down this path either way, because you'll have variances: some forms may say "Owner Full Name", while others will have "Owner's Name" or just "Name". Other forms may have the fields in a different order (so that SkipWhile() will end up skipping over important data). Also consider what happens when you have to deal with RTL languages, where the field labels will be on the right and the values on the left.
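
As a rough illustration of the Strategy-pattern idea, the sketch below picks one extraction strategy per form bucket and feeds the result into a POCO. Everything in it is hypothetical: the FormRecord, IFormStrategy, LabelThenValueStrategy and FormExtractor names and the bucket names are made up for the example, and the classification and OCR-mask steps are not shown.

```csharp
using System;
using System.Collections.Generic;

// POCO with the shape that is eventually written to the CSV.
public class FormRecord
{
    public string PracticeName { get; set; }
    public string OwnerFullName { get; set; }
}

// One strategy per form bucket; each one knows its own labels and layout.
public interface IFormStrategy
{
    FormRecord Extract(IReadOnlyList<string> lineTexts);
}

// Hypothetical bucket: every label is immediately followed by its value,
// but the owner label differs from form to form.
public class LabelThenValueStrategy : IFormStrategy
{
    private readonly string _ownerLabel;

    public LabelThenValueStrategy(string ownerLabel)
    {
        _ownerLabel = ownerLabel;
    }

    public FormRecord Extract(IReadOnlyList<string> lineTexts)
    {
        return new FormRecord
        {
            PracticeName = Next(lineTexts, "Practice Name"),
            OwnerFullName = Next(lineTexts, _ownerLabel),
        };
    }

    // Returns the line after the label, or null if the label is not found.
    private static string Next(IReadOnlyList<string> texts, string label)
    {
        for (int i = 0; i < texts.Count - 1; i++)
        {
            if (texts[i] == label) return texts[i + 1];
        }
        return null;
    }
}

public static class FormExtractor
{
    // The bucket name would come from the classification step (not shown).
    public static FormRecord Extract(
        string bucket,
        IReadOnlyList<string> lineTexts,
        IReadOnlyDictionary<string, IFormStrategy> strategies)
    {
        return strategies[bucket].Extract(lineTexts);
    }
}
```

Adding support for a new form variant then means registering another strategy rather than touching the existing extraction code.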
 
This is a very good point. I've reviewed it with the team, and they said this form will be the official template, so we won't have to worry about variances.
Since that's the case, does the output I'm seeking from the JSON still not meet the condition in post #8? In other words, could the above code only work for JSON items that meet that condition, like Owner Full Name?
 
Even if there are no variances, the data still doesn't meet the condition that adjacent pairs of lines are related to each other. The "Server Setup" line is followed by 4 lines which are all related to it. The "Number of Locations" has 2 lines which are related to it.
 
Perhaps try grouping the lines by the second number in boundingBox, which seems to be the line's Y coordinate. A proximity comparison of ±0.12 would be needed; see "LINQ (Or pseudocode) to group items by proximity".
Then you could handle each group, where the first text would be the label and the second the value. The "Server Setup" line would need special treatment to find out which text follows the "V" selection.
 
Follow up to my suggestion, this adds to @Skydiver's code in post 10.
Line class:
[JsonProperty("boundingBox")]
public double[] BoundingBox { get; set; }
IEnumerableExtensions:
// Splits consecutive lines into groups whose Y coordinate (boundingBox[1])
// lies within `threshold` of the first line in the current group.
internal static IEnumerable<IEnumerable<Line>> GroupByProximity(this IEnumerable<Line> source, double threshold)
{
    var g = new List<Line>();
    foreach (var x in source)
    {
        if ((g.Count != 0) && (!x.BoundingBox[1].IsProximity(g[0].BoundingBox[1], threshold)))
        {
            yield return g;
            g = new List<Line>();
        }
        g.Add(x);
    }
    // Avoid yielding an empty group when the source has no lines.
    if (g.Count != 0)
    {
        yield return g;
    }
}

private static bool IsProximity(this double value, double compareTo, double threshold)
{
    return value >= compareTo - threshold && value <= compareTo + threshold;
}
example usage:
var lines = GetLines(File.ReadAllText("response.json"));
var groups = lines.GroupByProximity(0.12);

var dictionary = groups.ToDictionary(g => g.First().Text, g => g.Skip(1).Select(line => line.Text));
var value1 = dictionary["Practice Name"].First();
var value2 = dictionary["Server Setup"].SkipWhile(s => s != "V").Skip(1).FirstOrDefault();
var value3 = string.Join(" ", dictionary["Number of Locations Enrolling"]);
ToDictionary requires that the labels are distinct; if they are not, you could use ToLookup instead.
The value2 example allows for no selection; it will be null in that case.
value3 is just an example of combining all line values into a single string.
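
If the labels can repeat, a ToLookup variant could look like the sketch below. It is only one possible shape: GroupLookup and ToLabelLookup are hypothetical names, and the input is assumed to be the line texts of each proximity group. Each group is flattened into (label, value) pairs first, so that lookup[label] yields the value strings directly. Note that indexing an ILookup with a missing key returns an empty sequence instead of throwing, so First() on it fails with "Sequence contains no elements".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupLookup
{
    // Builds a lookup from each group's first text (the label) to the
    // remaining texts (the values). Repeated labels are merged, not rejected.
    public static ILookup<string, string> ToLabelLookup(IEnumerable<IEnumerable<string>> groups)
    {
        return groups
            .Select(g => g.ToList())
            .Where(g => g.Count > 0)
            // One (label, value) pair per value, so lookup[label] is a flat sequence.
            .SelectMany(g => g.Skip(1).Select(value => new { Label = g[0], Value = value }))
            .ToLookup(p => p.Label, p => p.Value);
    }
}
```

With the groups from the usage example, the call might be GroupLookup.ToLabelLookup(groups.Select(g => g.Select(line => line.Text))), followed by lookup["Practice Name"].FirstOrDefault().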
 
That is brilliant! I just tested it out and it's working like a charm!
A couple of questions:
  1. If I change ToDictionary to ToLookup, I get an exception for the values: "Sequence contains no elements". Of course, using FirstOrDefault() suppresses the exception, but I end up with null output anyway. Do I need to adjust something in the values as well for them to print out?
  2. I've run Azure Vision on an image that has pretty much the same format as the PDF file, except that it's a JPEG file.
The API interprets the server selection as "I" instead of "V" (I think it parses the right edge of the square box as "I").

[Attached image: 1604034524121.png]


So the resulting JSON is something like this:

JSON:
{
  "status": "succeeded",
  "createdDateTime": "2020-10-22T19:35:35Z",
  "lastUpdatedDateTime": "2020-10-22T19:35:36Z",
  "analyzeResult": {
    "version": "3.0.0",
    "readResults": [
      {
        "page": 1,
        "angle": 0,
        "width": 8.5,
        "height": 11,
        "unit": "inch",
        "lines": [    
          {
            "boundingBox": [
              0.5016,
              1.9141,
              2.5726,
              1.9141,
              2.5726,
              2.0741,
              0.5016,
              2.0741
            ],      
            "text": "Account Information",
            "words": [
              {
                "boundingBox": [
                  0.5016,
                  1.9345,
                  1.3399,
                  1.9345,
                  1.3399,
                  2.0741,
                  0.5016,
                  2.0741
                ],
                "text": "Account",
                "confidence": 1
              },
              {
                "boundingBox": [
                  1.3974,
                  1.9141,
                  2.5726,
                  1.9141,
                  2.5726,
                  2.0741,
                  1.3974,
                  2.0741
                ],
                "text": "Information",
                "confidence": 1
              }
            ]
          },
          {
            "boundingBox": [
              1.7716,
              2.4855,
              2.8793,
              2.4855,
              2.8793,
              2.6051,
              1.7716,
              2.6051
            ],
            "text": "Practice Name",
            "words": [
              {
                "boundingBox": [
                  1.7716,
                  2.4855,
                  2.3803,
                  2.4855,
                  2.3803,
                  2.6051,
                  1.7716,
                  2.6051
                ],
                "text": "Practice",
                "confidence": 1
              },
              {
                "boundingBox": [
                  2.4362,
                  2.4948,
                  2.8793,
                  2.4948,
                  2.8793,
                  2.6051,
                  2.4362,
                  2.6051
                ],
                "text": "Name",
                "confidence": 1
              }
            ]
          },
          {
            "boundingBox": [
              2.9993,
              2.5257,
              4.7148,
              2.5257,
              4.7148,
              2.714,
              2.9993,
              2.714
            ],
            "text": "Some Practice Name",
            "words": [
              {
                "boundingBox": [
                  3.0072,
                  2.5385,
                  3.6546,
                  2.5284,
                  3.6516,
                  2.7131,
                  3.0105,
                  2.712
                ],
                "text": "Some",
                "confidence": 0.984
              },
              {
                "boundingBox": [
                  3.6887,
                  2.5281,
                  4.2112,
                  2.5262,
                  4.2028,
                  2.7159,
                  3.6854,
                  2.7132
                ],
                "text": "Practice",
                "confidence": 0.986
              },
              {
                "boundingBox": [
                  4.2453,
                  2.5263,
                  4.7223,
                  2.5297,
                  4.7091,
                  2.72,
                  4.2366,
                  2.7161
                ],
                "text": "Name",
                "confidence": 0.986
              }
            ]
          },
          {
            "boundingBox": [
              1.6116,
              2.9999,
              2.8816,
              2.9999,
              2.8816,
              3.1158,
              1.6116,
              3.1158
            ],
            "text": "Owner Full Name",
            "words": [
              {
                "boundingBox": [
                  1.6116,
                  3.0039,
                  2.1026,
                  3.0039,
                  2.1026,
                  3.1157,
                  1.6116,
                  3.1157
                ],
                "text": "Owner",
                "confidence": 1
              },
              {
                "boundingBox": [
                  2.1541,
                  2.9999,
                  2.3784,
                  2.9999,
                  2.3784,
                  3.1158,
                  2.1541,
                  3.1158
                ],
                "text": "Full",
                "confidence": 1
              },
              {
                "boundingBox": [
                  2.4384,
                  3.0052,
                  2.8816,
                  3.0052,
                  2.8816,
                  3.1155,
                  2.4384,
                  3.1155
                ],
                "text": "Name",
                "confidence": 1
              }
            ]
          },
          {
            "boundingBox": [
              2.9993,
              3.0242,
              3.6966,
              3.0242,
              3.6966,
              3.2125,
              2.9993,
              3.2014
            ],
            "text": "Bob Lee",
            "words": [
              {
                "boundingBox": [
                  3.0063,
                  3.0303,
                  3.3439,
                  3.0349,
                  3.3461,
                  3.2125,
                  3.007,
                  3.2081
                ],
                "text": "Bob",
                "confidence": 0.987
              },
              {
                "boundingBox": [
                  3.3788,
                  3.0349,
                  3.6931,
                  3.0326,
                  3.697,
                  3.2121,
                  3.3813,
                  3.2125
                ],
                "text": "Lee",
                "confidence": 0.983
              }
            ]
          },
          {
            "boundingBox": [
              1.945,
              3.5063,
              2.8748,
              3.5063,
              2.8748,
              3.6261,
              1.945,
              3.6261
            ],
            "text": "Owner Email",
            "words": [
              {
                "boundingBox": [
                  1.945,
                  3.5143,
                  2.4359,
                  3.5143,
                  2.4359,
                  3.6261,
                  1.945,
                  3.6261
                ],
                "text": "Owner",
                "confidence": 1
              },
              {
                "boundingBox": [
                  2.4874,
                  3.5063,
                  2.8748,
                  3.5063,
                  2.8748,
                  3.6259,
                  2.4874,
                  3.6259
                ],
                "text": "Email",
                "confidence": 1
              }
            ]
          },
          {
            "boundingBox": [
              3.0104,
              3.5005,
              4.6042,
              3.5005,
              4.6042,
              3.6888,
              3.0104,
              3.6777
            ],
            "text": "bob@gmail.com",
            "words": [
              {
                "boundingBox": [
                  3.0212,
                  3.5047,
                  4.5837,
                  3.5039,
                  4.5769,
                  3.6886,
                  3.0129,
                  3.6787
                ],
                "text": "bob@gmail.com",
                "confidence": 0.951
              }
            ]
          },
          {
            "boundingBox": [
              863,
              1690,
              1082,
              1692,
              1082,
              1729,
              863,
              1727
            ],
            "text": "Server Setup",
            "words": [
              {
                "boundingBox": [
                  866,
                  1691,
                  970,
                  1691,
                  970,
                  1729,
                  867,
                  1727
                ],
                "text": "Server",
                "confidence": 0.985
              },
              {
                "boundingBox": [
                  977,
                  1691,
                  1081,
                  1692,
                  1081,
                  1730,
                  977,
                  1729
                ],
                "text": "Setup",
                "confidence": 0.986
              }
            ]
          },
          {
            "boundingBox": [
              1203,
              1679,
              1361,
              1688,
              1358,
              1737,
              1201,
              1733
            ],
            "text": "Icloud",
            "words": [
              {
                "boundingBox": [
                  1235,
                  1680,
                  1358,
                  1685,
                  1355,
                  1738,
                  1233,
                  1734
                ],
                "text": "Icloud",
                "confidence": 0.641
              }
            ]
          },
          {
            "boundingBox": [
              1514,
              1703,
              1643,
              1700,
              1644,
              1734,
              1514,
              1735
            ],
            "text": "ocation",
            "words": [
              {
                "boundingBox": [
                  1516,
                  1705,
                  1643,
                  1700,
                  1643,
                  1735,
                  1515,
                  1735
                ],
                "text": "ocation",
                "confidence": 0.985
              }
            ]
          },
          {
            "boundingBox": [
              1808,
              1698,
              2140,
              1697,
              2141,
              1736,
              1808,
              1737
            ],
            "text": "Central (multi-location)",
            "words": [
              {
                "boundingBox": [
                  1809,
                  1702,
                  1916,
                  1699,
                  1915,
                  1736,
                  1808,
                  1735
                ],
                "text": "Central",
                "confidence": 0.981
              },
              {
                "boundingBox": [
                  1923,
                  1699,
                  2140,
                  1698,
                  2138,
                  1737,
                  1922,
                  1736
                ],
                "text": "(multi-location)",
                "confidence": 0.889
              }
            ]
          },
          {
            "boundingBox": [
              0.6466,
              7.0788,
              2.8775,
              7.0788,
              2.8775,
              7.2388,
              0.6466,
              7.2388
            ],
            "text": "Number of Locations Enrolling",
            "words": [
              {
                "boundingBox": [
                  0.6466,
                  7.0832,
                  1.2496,
                  7.0832,
                  1.2496,
                  7.1991,
                  0.6466,
                  7.1991
                ],
                "text": "Number",
                "confidence": 1
              },
              {
                "boundingBox": [
                  1.2969,
                  7.0788,
                  1.4364,
                  7.0788,
                  1.4364,
                  7.1988,
                  1.2969,
                  7.1988
                ],
                "text": "of",
                "confidence": 1
              },
              {
                "boundingBox": [
                  1.4892,
                  7.0793,
                  2.2013,
                  7.0793,
                  2.2013,
                  7.1988,
                  1.4892,
                  7.1988
                ],
                "text": "Locations",
                "confidence": 1
              },
              {
                "boundingBox": [
                  2.2576,
                  7.0793,
                  2.8775,
                  7.0793,
                  2.8775,
                  7.2388,
                  2.2576,
                  7.2388
                ],
                "text": "Enrolling",
                "confidence": 1
              }
            ]
          },
          {
            "boundingBox": [
              3.4421,
              7.0342,
              3.6413,
              7.0453,
              3.6413,
              7.3001,
              3.4421,
              7.289
            ],
            "text": "1",
            "words": [
              {
                "boundingBox": [
                  3.4757,
                  7.0352,
                  3.6451,
                  7.0446,
                  3.631,
                  7.299,
                  3.4616,
                  7.2896
                ],
                "text": "1",
                "confidence": 0.987
              }
            ]
          },
          {
            "boundingBox": [
              4.1835,
              7.0896,
              7.8999,
              7.0896,
              7.8999,
              7.2158,
              4.1835,
              7.2158
            ],
            "text": "*If more than 1 location, add info on the locations form",
            "words": [
              {
                "boundingBox": [
                  4.1835,
                  7.0896,
                  4.3291,
                  7.0896,
                  4.3291,
                  7.1979,
                  4.1835,
                  7.1979
                ],
                "text": "*If",
                "confidence": 1
              },
              {
                "boundingBox": [
                  4.3611,
                  7.1193,
                  4.725,
                  7.1193,
                  4.725,
                  7.1988,
                  4.3611,
                  7.1988
                ],
                "text": "more",
                "confidence": 1
              },
              {
                "boundingBox": [
                  4.7701,
                  7.0936,
                  5.0809,
                  7.0936,
                  5.0809,
                  7.1988,
                  4.7701,
                  7.1988
                ],
                "text": "than",
                "confidence": 1
              },
              {
                "boundingBox": [
                  5.1307,
                  7.0985,
                  5.1613,
                  7.0985,
                  5.1613,
                  7.1979,
                  5.1307,
                  7.1979
                ],
                "text": "1",
                "confidence": 1
              },
              {
                "boundingBox": [
                  5.2006,
                  7.09,
                  5.7803,
                  7.09,
                  5.7803,
                  7.2158,
                  5.2006,
                  7.2158
                ],
                "text": "location,",
                "confidence": 1
              },
              {
                "boundingBox": [
                  5.8268,
                  7.0936,
                  6.102,
                  7.0936,
                  6.102,
                  7.1988,
                  5.8268,
                  7.1988
                ],
                "text": "add",
                "confidence": 1
              },
              {
                "boundingBox": [
                  6.1394,
                  7.0896,
                  6.3896,
                  7.0896,
                  6.3896,
                  7.1988,
                  6.1394,
                  7.1988
                ],
                "text": "info",
                "confidence": 1
              },
              {
                "boundingBox": [
                  6.435,
                  7.1193,
                  6.6005,
                  7.1193,
                  6.6005,
                  7.1988,
                  6.435,
                  7.1988
                ],
                "text": "on",
                "confidence": 1
              },
              {
                "boundingBox": [
                  6.6481,
                  7.0936,
                  6.865,
                  7.0936,
                  6.865,
                  7.1988,
                  6.6481,
                  7.1988
                ],
                "text": "the",
                "confidence": 1
              },
              {
                "boundingBox": [
                  6.9081,
                  7.09,
                  7.5365,
                  7.09,
                  7.5365,
                  7.1988,
                  6.9081,
                  7.1988
                ],
                "text": "locations",
                "confidence": 1
              },
              {
                "boundingBox": [
                  7.5783,
                  7.0896,
                  7.8999,
                  7.0896,
                  7.8999,
                  7.1988,
                  7.5783,
                  7.1988
                ],
                "text": "form",
                "confidence": 1
              }
            ]
          }
        ]
      }
    ]
  }
}

How can the code be optimized further to handle an edge case like that with the server selection?

3. I am printing the values to the console like Console.WriteLine(value1);
How can I print the values together with their keys, maybe using something like
C#:
IEnumerable<KeyValuePair<T, T>> Pairs<T>(this IEnumerable<T> items)

I've tried:
C#:
dictionary.Select(i => $"{i.Key}: {i.Value.FirstOrDefault()}").ToList().ForEach(Console.WriteLine);
However, I am getting fields like Account Information printed out, and Server Setup prints V for its value. That is expected, since some keys and values need special handling, such as only printing from Practice Name as the first key, or the Server Setup case. Is this why you set individual values for each key? In other words, I can't do something dynamic like the code I attempted above?
 
If I change ToDictionary to ToLookup, I get an exception for the values: "Sequence contains no elements". Of course, using FirstOrDefault() suppresses the exception, but I end up with null output anyway. Do I need to adjust something in the values as well for them to print out?
Yes, remember a Lookup can have multiple value series for each key, so the 'value' of one key will be IEnumerable<IEnumerable<string>>.
C#:
var value1 = lookup["Practice Name"].First().First(); //first set, first value
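A minimal sketch of printing every key with its values from a lookup, assuming the rows look roughly like the (key, values) pairs built earlier in the thread. The names LookupDemo, BuildLookup and FirstValueOrNull and the sample rows are made up for illustration. Flattening with SelectMany before FirstOrDefault avoids the "Sequence contains no elements" exception when a key has an empty value series.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LookupDemo
{
    // Hypothetical stand-in for the (key, values) rows built from the OCR lines.
    public static ILookup<string, IEnumerable<string>> BuildLookup()
    {
        var rows = new (string Key, string[] Values)[]
        {
            ("Practice Name", new[] { "Some Practice Name" }),
            ("Owner Full Name", new[] { "Bob Lee" }),
            ("Account Information", new string[0]),          // heading with no value
            ("Practice Name", new[] { "Second Practice" }),  // a lookup accepts duplicate keys
        };
        return rows.ToLookup(r => r.Key, r => (IEnumerable<string>)r.Values);
    }

    // First value across all series of a key, or null when there is none.
    // An unknown key yields an empty sequence, so this never throws.
    public static string FirstValueOrNull(ILookup<string, IEnumerable<string>> lookup, string key)
    {
        return lookup[key].SelectMany(series => series).FirstOrDefault();
    }

    public static void Main()
    {
        var lookup = BuildLookup();
        foreach (var group in lookup)
        {
            // Flatten all value series that share this key.
            var values = group.SelectMany(series => series).ToList();
            if (values.Count == 0) continue; // skip headings such as "Account Information"
            Console.WriteLine($"{group.Key}: {string.Join(", ", values)}");
        }
    }
}
```

Keys that need special handling, such as Server Setup, would still need their own rule; this only covers the plain key/value rows.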
 
I've run Azure Vision on an image that has pretty much the same format as the PDF file, except it's a JPEG file.
The API interprets the server selection as "I" instead of "V" (I think it parses the right edge of the square box as an I)
Image OCR is rather unreliable from what I've seen.
So the resulting JSON
is something like this:
"something like this", right :rolleyes: The coordinate system does not match, it seems you have mixed sections from different data files.
1,9141 Account Information
2,4855 Practice Name
2,5257 Some Practice Name
2,9999 Owner Full Name
3,0242 Bob Lee
3,5063 Owner Email
3,5005 bob@gmail.com
1690 Server Setup
1679 Icloud
1703 ocation
1698 Central (multi-location)
7,0788 Number of Locations Enrolling
7,0342 1
7,0896 *If more than 1 location, add info on the locations form
or maybe it was just an example of the strings it produces?
 
The API interprets the server selection as "I" instead of "V" (i think it parses the right edge of the square box as I)
No it's because your image is at an angle, and so the check mark is now actually vertical and the OCR sees it as an 'I'. The OCR is not smart enough to rotate the entire image first, then perform the OCR process.
 
No it's because your image is at an angle, and so the check mark is now actually vertical and the OCR sees it as an 'I'. The OCR is not smart enough to rotate the entire image first, then perform the OCR process.
Ohhh, that's a good explanation, I never thought about it that way. But even with a non-tilted image, I still got an "I"cloud.
 
Image OCR is rather unreliable from what I've seen.

"something like this", right :rolleyes: The coordinate system does not match, it seems you have mixed sections from different data files.

or maybe it was just an example of the strings it produces?
My bad, you're right, it was just an example of the strings it produces.

Here is an un-tilted version of the image and the resulting JSON:

1604073819198.png


JSON:
{
  "status": "succeeded",
  "createdDateTime": "2020-10-30T15:56:11Z",
  "lastUpdatedDateTime": "2020-10-30T15:56:12Z",
  "analyzeResult": {
    "version": "3.0.0",
    "readResults": [
      {
        "page": 1,
        "angle": 0.086,
        "width": 684,
        "height": 272,
        "unit": "pixel",
        "lines": [
          {
            "boundingBox": [
              7,
              6,
              196,
              5,
              196,
              24,
              7,
              25
            ],
            "text": "Account Information",
            "words": [
              {
                "boundingBox": [
                  10,
                  7,
                  83,
                  7,
                  81,
                  24,
                  7,
                  26
                ],
                "text": "Account",
                "confidence": 0.981
              },
              {
                "boundingBox": [
                  87,
                  7,
                  196,
                  6,
                  196,
                  24,
                  85,
                  24
                ],
                "text": "Information",
                "confidence": 0.939
              }
            ]
          },
          {
            "boundingBox": [
              120,
              56,
              223,
              57,
              223,
              70,
              120,
              70
            ],
            "text": "Practice Name",
            "words": [
              {
                "boundingBox": [
                  120,
                  57,
                  176,
                  57,
                  176,
                  70,
                  120,
                  71
                ],
                "text": "Practice",
                "confidence": 0.982
              },
              {
                "boundingBox": [
                  179,
                  57,
                  222,
                  57,
                  222,
                  71,
                  179,
                  70
                ],
                "text": "Name",
                "confidence": 0.985
              }
            ]
          },
          {
            "boundingBox": [
              236,
              62,
              390,
              62,
              390,
              77,
              236,
              77
            ],
            "text": "Some Practice Name",
            "words": [
              {
                "boundingBox": [
                  236,
                  62,
                  277,
                  62,
                  277,
                  78,
                  236,
                  78
                ],
                "text": "Some",
                "confidence": 0.987
              },
              {
                "boundingBox": [
                  280,
                  62,
                  340,
                  62,
                  341,
                  78,
                  280,
                  77
                ],
                "text": "Practice",
                "confidence": 0.984
              },
              {
                "boundingBox": [
                  343,
                  62,
                  390,
                  62,
                  390,
                  78,
                  344,
                  78
                ],
                "text": "Name",
                "confidence": 0.987
              }
            ]
          },
          {
            "boundingBox": [
              107,
              102,
              223,
              102,
              223,
              115,
              107,
              115
            ],
            "text": "Owner Full Name",
            "words": [
              {
                "boundingBox": [
                  108,
                  103,
                  151,
                  102,
                  151,
                  116,
                  107,
                  116
                ],
                "text": "Owner",
                "confidence": 0.985
              },
              {
                "boundingBox": [
                  154,
                  102,
                  177,
                  102,
                  176,
                  116,
                  153,
                  116
                ],
                "text": "Full",
                "confidence": 0.954
              },
              {
                "boundingBox": [
                  180,
                  102,
                  224,
                  103,
                  223,
                  116,
                  179,
                  116
                ],
                "text": "Name",
                "confidence": 0.987
              }
            ]
          },
          {
            "boundingBox": [
              237,
              104,
              298,
              104,
              298,
              119,
              237,
              119
            ],
            "text": "Bob Lee",
            "words": [
              {
                "boundingBox": [
                  238,
                  104,
                  266,
                  104,
                  266,
                  119,
                  238,
                  120
                ],
                "text": "Bob",
                "confidence": 0.987
              },
              {
                "boundingBox": [
                  269,
                  104,
                  298,
                  105,
                  298,
                  120,
                  269,
                  119
                ],
                "text": "Lee",
                "confidence": 0.987
              }
            ]
          },
          {
            "boundingBox": [
              136,
              147,
              223,
              147,
              223,
              160,
              137,
              161
            ],
            "text": "Owner Email",
            "words": [
              {
                "boundingBox": [
                  137,
                  148,
                  181,
                  147,
                  181,
                  161,
                  137,
                  162
                ],
                "text": "Owner",
                "confidence": 0.985
              },
              {
                "boundingBox": [
                  184,
                  147,
                  224,
                  147,
                  224,
                  161,
                  184,
                  161
                ],
                "text": "Email",
                "confidence": 0.985
              }
            ]
          },
          {
            "boundingBox": [
              239,
              144,
              361,
              144,
              361,
              162,
              239,
              162
            ],
            "text": "bob@gmail.com",
            "words": [
              {
                "boundingBox": [
                  240,
                  145,
                  362,
                  146,
                  361,
                  163,
                  240,
                  163
                ],
                "text": "bob@gmail.com",
                "confidence": 0.974
              }
            ]
          },
          {
            "boundingBox": [
              137,
              193,
              224,
              193,
              224,
              208,
              137,
              208
            ],
            "text": "Server Setup",
            "words": [
              {
                "boundingBox": [
                  137,
                  194,
                  179,
                  194,
                  179,
                  208,
                  137,
                  208
                ],
                "text": "Server",
                "confidence": 0.985
              },
              {
                "boundingBox": [
                  182,
                  194,
                  224,
                  194,
                  224,
                  209,
                  182,
                  208
                ],
                "text": "Setup",
                "confidence": 0.985
              }
            ]
          },
          {
            "boundingBox": [
              276,
              188,
              340,
              192,
              339,
              211,
              275,
              209
            ],
            "text": "Icloud",
            "words": [
              {
                "boundingBox": [
                  297,
                  192,
                  339,
                  194,
                  339,
                  211,
                  297,
                  211
                ],
                "text": "Icloud",
                "confidence": 0.933
              }
            ]
          },
          {
            "boundingBox": [
              376,
              187,
              461,
              191,
              460,
              212,
              376,
              211
            ],
            "text": "Location",
            "words": [
              {
                "boundingBox": [
                  394,
                  191,
                  460,
                  196,
                  459,
                  211,
                  394,
                  211
                ],
                "text": "Location",
                "confidence": 0.844
              }
            ]
          },
          {
            "boundingBox": [
              500,
              189,
              666,
              192,
              665,
              212,
              499,
              211
            ],
            "text": "LIcentral (multi-location)",
            "words": [
              {
                "boundingBox": [
                  501,
                  190,
                  567,
                  195,
                  567,
                  212,
                  500,
                  212
                ],
                "text": "LIcentral",
                "confidence": 0.665
              },
              {
                "boundingBox": [
                  572,
                  195,
                  665,
                  195,
                  665,
                  212,
                  571,
                  212
                ],
                "text": "(multi-location)",
                "confidence": 0.899
              }
            ]
          },
          {
            "boundingBox": [
              21,
              238,
              224,
              238,
              223,
              255,
              21,
              253
            ],
            "text": "Number of Locations Enrolling",
            "words": [
              {
                "boundingBox": [
                  21,
                  239,
                  76,
                  239,
                  76,
                  253,
                  21,
                  253
                ],
                "text": "Number",
                "confidence": 0.985
              },
              {
                "boundingBox": [
                  79,
                  239,
                  92,
                  239,
                  92,
                  253,
                  79,
                  253
                ],
                "text": "of",
                "confidence": 0.983
              },
              {
                "boundingBox": [
                  95,
                  239,
                  161,
                  239,
                  161,
                  254,
                  95,
                  253
                ],
                "text": "Locations",
                "confidence": 0.981
              },
              {
                "boundingBox": [
                  164,
                  239,
                  224,
                  239,
                  223,
                  256,
                  163,
                  254
                ],
                "text": "Enrolling",
                "confidence": 0.983
              }
            ]
          },
          {
            "boundingBox": [
              273,
              237,
              289,
              239,
              288,
              257,
              272,
              255
            ],
            "text": "1",
            "words": [
              {
                "boundingBox": [
                  278,
                  237,
                  290,
                  239,
                  287,
                  257,
                  276,
                  255
                ],
                "text": "1",
                "confidence": 0.981
              }
            ]
          },
          {
            "boundingBox": [
              337,
              239,
              670,
              239,
              670,
              253,
              337,
              252
            ],
            "text": "*If more than 1 location, add info on the locations form",
            "words": [
              {
                "boundingBox": [
                  338,
                  239,
                  347,
                  239,
                  347,
                  252,
                  338,
                  252
                ],
                "text": "*If",
                "confidence": 0.874
              },
              {
                "boundingBox": [
                  350,
                  239,
                  384,
                  239,
                  384,
                  253,
                  350,
                  252
                ],
                "text": "more",
                "confidence": 0.983
              },
              {
                "boundingBox": [
                  386,
                  239,
                  416,
                  239,
                  416,
                  253,
                  386,
                  253
                ],
                "text": "than",
                "confidence": 0.986
              },
              {
                "boundingBox": [
                  419,
                  239,
                  422,
                  239,
                  422,
                  253,
                  419,
                  253
                ],
                "text": "1",
                "confidence": 0.635
              },
              {
                "boundingBox": [
                  425,
                  239,
                  478,
                  239,
                  478,
                  253,
                  425,
                  253
                ],
                "text": "location,",
                "confidence": 0.955
              },
              {
                "boundingBox": [
                  481,
                  239,
                  506,
                  239,
                  506,
                  253,
                  481,
                  253
                ],
                "text": "add",
                "confidence": 0.986
              },
              {
                "boundingBox": [
                  509,
                  239,
                  533,
                  239,
                  533,
                  253,
                  509,
                  253
                ],
                "text": "info",
                "confidence": 0.981
              },
              {
                "boundingBox": [
                  535,
                  239,
                  551,
                  239,
                  552,
                  253,
                  535,
                  253
                ],
                "text": "on",
                "confidence": 0.988
              },
              {
                "boundingBox": [
                  554,
                  239,
                  574,
                  239,
                  575,
                  253,
                  554,
                  253
                ],
                "text": "the",
                "confidence": 0.987
              },
              {
                "boundingBox": [
                  577,
                  239,
                  634,
                  239,
                  634,
                  253,
                  577,
                  253
                ],
                "text": "locations",
                "confidence": 0.973
              },
              {
                "boundingBox": [
                  636,
                  239,
                  666,
                  240,
                  666,
                  253,
                  637,
                  253
                ],
                "text": "form",
                "confidence": 0.986
              }
            ]
          }
        ]
      }
    ]
  }
}
 
That's possible. Check out how it became "LIcentral" for the last checkbox.
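One way to spot readings like "LIcentral" might be the per-word confidence in the JSON. Below is a rough sketch, assuming System.Text.Json is available and the result has the analyzeResult / readResults / lines / words shape shown above. ConfidenceCheck, LowConfidenceWords, SampleJson and the 0.9 threshold are made up for illustration, not part of the Azure API.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class ConfidenceCheck
{
    // Trimmed-down sample in the same shape as the JSON above.
    public const string SampleJson = @"{ ""analyzeResult"": { ""readResults"": [ { ""lines"": [
        { ""text"": ""Icloud"", ""words"": [ { ""text"": ""Icloud"", ""confidence"": 0.933 } ] },
        { ""text"": ""LIcentral (multi-location)"", ""words"": [
            { ""text"": ""LIcentral"", ""confidence"": 0.665 },
            { ""text"": ""(multi-location)"", ""confidence"": 0.899 } ] } ] } ] } }";

    // Returns every word whose confidence is below the given threshold.
    public static List<(string Text, double Confidence)> LowConfidenceWords(string json, double threshold)
    {
        var flagged = new List<(string Text, double Confidence)>();
        using (var doc = JsonDocument.Parse(json))
        {
            var pages = doc.RootElement.GetProperty("analyzeResult").GetProperty("readResults");
            foreach (var page in pages.EnumerateArray())
            {
                foreach (var line in page.GetProperty("lines").EnumerateArray())
                {
                    // Defensive: skip a line that has no "words" array.
                    if (!line.TryGetProperty("words", out var words)) continue;
                    foreach (var word in words.EnumerateArray())
                    {
                        double confidence = word.GetProperty("confidence").GetDouble();
                        if (confidence < threshold)
                            flagged.Add((word.GetProperty("text").GetString(), confidence));
                    }
                }
            }
        }
        return flagged;
    }

    public static void Main()
    {
        foreach (var (text, confidence) in LowConfidenceWords(SampleJson, 0.9))
            Console.WriteLine($"{text}: {confidence}");
    }
}
```

Note that "Icloud" comes back with a confidence of 0.933, so a threshold alone would not catch it; it only helps flag the more obviously shaky words for a manual check.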
 
:)

OCR == Optical Character Recognition

Image recognition is a slightly different branch of AI.
 
Hmm, according to the Azure Computer Vision REST API docs:
extract printed and handwritten text from an image using the new OCR technology available as part of the Computer Vision 3.1 REST API. With the new Read and Get Read Result methods, you can detect text in an image and extract recognized characters into a machine-readable character stream.
That is why I've just been referring to it as OCR, since "Azure Computer Vision Read API" is a pretty long description lol
 