Resolved How to get values from elements in JSON without indexing?

WeyardWiz

I have the following code that extracts JSON element values and writes them to a CSV file:

C#:
public static void Json_to_Csv(string jsonInputFile, string csvFile)
{
    using (var p = new ChoJSONReader(jsonInputFile).WithJSONPath("$..readResults")) // "readResults": [
    {
        using (var w = new ChoCSVWriter(csvFile).WithFirstLineHeader())
        {
            w.Write(p
                .Select(r1 =>
                {
                    var lines = (dynamic[])r1.lines;
                    return new
                    {
                        FileName = jsonInputFile,
                        Page = r1.page,
                        PracticeName = lines[2].text,
                        OwnerFullName = lines[4].text,
                        OwnerEmail = lines[6].text,
                    };
                }));
        }
    }
}

csv output:

File Name,Page,Practice Name,Owner Full Name,Owner Email
file1.json,1,Some Practice Name,Bob Lee,Bob@someemail.com

Currently there is no other contextual information on each item to reference, so the only way to get a value is by index, e.g. lines[2].

This works for now, but other JSON files may contain an extra field, in which case the indexes shift and the wrong values are pulled.

To handle that scenario, how can I pull the values contextually instead of by line index?

I've tried
C#:
PracticeName = lines["Practice Name"].text

but I get a "Cannot implicitly convert type 'string' to 'int'" error.


file1.json sample:

JSON:
{
  "status": "succeeded",
  "createdDateTime": "2020-10-22T19:35:35Z",
  "lastUpdatedDateTime": "2020-10-22T19:35:36Z",
  "analyzeResult": {
    "version": "3.0.0",
    "readResults": [
      {
        "page": 1,
        "angle": 0,
        "width": 8.5,
        "height": 11,
        "unit": "inch",
        "lines": [        
          {
            "boundingBox": [
              0.5016,
              1.9141,
              2.5726,
              1.9141,
              2.5726,
              2.0741,
              0.5016,
              2.0741
            ],          
            "text": "Account Information",
            "words": [
              {
                "boundingBox": [
                  0.5016,
                  1.9345,
                  1.3399,
                  1.9345,
                  1.3399,
                  2.0741,
                  0.5016,
                  2.0741
                ],
                "text": "Account",
                "confidence": 1
              },
              {
                "boundingBox": [
                  1.3974,
                  1.9141,
                  2.5726,
                  1.9141,
                  2.5726,
                  2.0741,
                  1.3974,
                  2.0741
                ],
                "text": "Information",
                "confidence": 1
              }
            ]
          },
          {
            "boundingBox": [
              1.7716,
              2.4855,
              2.8793,
              2.4855,
              2.8793,
              2.6051,
              1.7716,
              2.6051
            ],
            "text": "Practice Name",
            "words": [
              {
                "boundingBox": [
                  1.7716,
                  2.4855,
                  2.3803,
                  2.4855,
                  2.3803,
                  2.6051,
                  1.7716,
                  2.6051
                ],
                "text": "Practice",
                "confidence": 1
              },
              {
                "boundingBox": [
                  2.4362,
                  2.4948,
                  2.8793,
                  2.4948,
                  2.8793,
                  2.6051,
                  2.4362,
                  2.6051
                ],
                "text": "Name",
                "confidence": 1
              }
            ]
          },
          {
            "boundingBox": [
              2.9993,
              2.5257,
              4.7148,
              2.5257,
              4.7148,
              2.714,
              2.9993,
              2.714
            ],
            "text": "Some Practice Name",
            "words": [
              {
                "boundingBox": [
                  3.0072,
                  2.5385,
                  3.6546,
                  2.5284,
                  3.6516,
                  2.7131,
                  3.0105,
                  2.712
                ],
                "text": "Some",
                "confidence": 0.984
              },
              {
                "boundingBox": [
                  3.6887,
                  2.5281,
                  4.2112,
                  2.5262,
                  4.2028,
                  2.7159,
                  3.6854,
                  2.7132
                ],
                "text": "Practice",
                "confidence": 0.986
              },
              {
                "boundingBox": [
                  4.2453,
                  2.5263,
                  4.7223,
                  2.5297,
                  4.7091,
                  2.72,
                  4.2366,
                  2.7161
                ],
                "text": "Name",
                "confidence": 0.986
              }
            ]
          },
          {
            "boundingBox": [
              1.6116,
              2.9999,
              2.8816,
              2.9999,
              2.8816,
              3.1158,
              1.6116,
              3.1158
            ],
            "text": "Owner Full Name",
            "words": [
              {
                "boundingBox": [
                  1.6116,
                  3.0039,
                  2.1026,
                  3.0039,
                  2.1026,
                  3.1157,
                  1.6116,
                  3.1157
                ],
                "text": "Owner",
                "confidence": 1
              },
              {
                "boundingBox": [
                  2.1541,
                  2.9999,
                  2.3784,
                  2.9999,
                  2.3784,
                  3.1158,
                  2.1541,
                  3.1158
                ],
                "text": "Full",
                "confidence": 1
              },
              {
                "boundingBox": [
                  2.4384,
                  3.0052,
                  2.8816,
                  3.0052,
                  2.8816,
                  3.1155,
                  2.4384,
                  3.1155
                ],
                "text": "Name",
                "confidence": 1
              }
            ]
          },
          {
            "boundingBox": [
              2.9993,
              3.0242,
              3.6966,
              3.0242,
              3.6966,
              3.2125,
              2.9993,
              3.2014
            ],
            "text": "Bob Lee",
            "words": [
              {
                "boundingBox": [
                  3.0063,
                  3.0303,
                  3.3439,
                  3.0349,
                  3.3461,
                  3.2125,
                  3.007,
                  3.2081
                ],
                "text": "Bob",
                "confidence": 0.987
              },
              {
                "boundingBox": [
                  3.3788,
                  3.0349,
                  3.6931,
                  3.0326,
                  3.697,
                  3.2121,
                  3.3813,
                  3.2125
                ],
                "text": "Lee",
                "confidence": 0.983
              }
            ]
          },
          {
            "boundingBox": [
              1.945,
              3.5063,
              2.8748,
              3.5063,
              2.8748,
              3.6261,
              1.945,
              3.6261
            ],
            "text": "Owner Email",
            "words": [
              {
                "boundingBox": [
                  1.945,
                  3.5143,
                  2.4359,
                  3.5143,
                  2.4359,
                  3.6261,
                  1.945,
                  3.6261
                ],
                "text": "Owner",
                "confidence": 1
              },
              {
                "boundingBox": [
                  2.4874,
                  3.5063,
                  2.8748,
                  3.5063,
                  2.8748,
                  3.6259,
                  2.4874,
                  3.6259
                ],
                "text": "Email",
                "confidence": 1
              }
            ]
          },
          {
            "boundingBox": [
              3.0104,
              3.5005,
              4.6042,
              3.5005,
              4.6042,
              3.6888,
              3.0104,
              3.6777
            ],
            "text": "bob@gmail.com",
            "words": [
              {
                "boundingBox": [
                  3.0212,
                  3.5047,
                  4.5837,
                  3.5039,
                  4.5769,
                  3.6886,
                  3.0129,
                  3.6787
                ],
                "text": "bob@gmail.com",
                "confidence": 0.951
              }
            ]
          }
        ]
      }
    ]
  }
}
 
You would use LINQ to Objects to filter down to the object instance that you want. That means you need to deserialize your JSON into objects first. I'm not familiar with ChoJSONReader, but I already don't like it because it does not follow the .NET Framework naming conventions.
 
I tried the deserialization route, but for what I'm trying to do it seemed too complicated to implement. I went with ChoJSONReader as an alternative because so far it's been the only way I could achieve what I want.
I would appreciate it if you could show me an example of what you mentioned above, as that may help improve my current design and make things much more flexible.
 
If you know the contents of your file, you can populate your data into a class. From that class, you can then serialise to your CSV file rather easily. One method I showed Skydiver in another topic was the populate method (Populate an Object), which he admitted was useful. Serialising and deserialising are covered on those pages, among other useful examples.
 
Under normal circumstances, we will have a property, and then give it a value, like this:

C#:
public string Test { get; set; }

public Program()
{
    Test = "Test";
}

Then we can get the value based on the property name in other places.

But in this json, "Owner Full Name" and "Bob Lee" are not the relationship between property and value, but the values of Text property in two unrelated objects, like this:

C#:
public class Line
{
    public float[] BoundingBox { get; set; }
    public string Text { get; set; }
    public Word[] Words { get; set; }
}

// ***********************

new Line() { Text = "Owner Full Name" };
new Line() { Text = "Bob Lee" };

We can't establish a connection between them, except by specifying it manually as in my original code.

Therefore, I'd have to reconstruct a properly structured JSON before attempting to export it to a CSV file, but the problem is that this JSON is the response from the Azure Computer Vision Read API.

I guess my current code using ChoJSONReader is the only way to accomplish this.
 
If you can be bothered to read the documentation on the links I gave you, you will see it is more than possible and very easy.
 
I have read it.
Populating the object works in the Account class example they show because it's a property/attribute-to-value relationship, so deserializing it dynamically is easy. However, with the JSON in my post this method does not work, because the attributes and the supposed "values" have no connection. As a human, I can tell, for example, that "Bob Lee" is the value of the "Owner Full Name" property, but the program cannot make that distinction because nothing in the data links them.

The only way is to populate a class for every file manually, which defeats the purpose of using a program to do this since I can just fill the data manually in the csv directly by reading the original pdf file.
 
Looking at the JSON there, it looks like it roughly maps to the following class structure:
C#:
class RootObject
{
    public AnalyzeResult AnalyzeResult { get; set; }
}

class AnalyzeResult
{
    // "readResults" is a JSON array, so this is an array as well.
    public ReadResults[] ReadResults { get; set; }
}

class ReadResults
{
    public Line[] Lines { get; set; }
}

class Line
{
    public string Text { get; set; }
    public Word[] Words { get; set; }
}

class Word
{
    public string Text { get; set; }
}

If you know that each pair of lines is always a name-value pair, then you can just ingest the lines in pairs and set up the values to go out into the CSV.

As an aside, I took a glance at the source code for ChoJSONReader on GitHub. It's just a wrapper around the Newtonsoft Json.NET library.
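
Since ChoJSONReader sits on top of Json.NET, the same JSONPath idea can be tried with Json.NET directly. The following is only a minimal sketch, assuming the Newtonsoft.Json package is referenced; the inline JSON is a trimmed stand-in for the real response, and the class and method names (LineTextDemo, GetLineTexts) are made up for illustration:

```csharp
using System;
using System.Linq;
using Newtonsoft.Json.Linq;

static class LineTextDemo
{
    // Returns the "text" of every entry in every "lines" array, in document order.
    // Word-level "text" values are not matched because they sit under "words".
    public static string[] GetLineTexts(string json) =>
        JObject.Parse(json)
               .SelectTokens("$..lines[*].text")
               .Select(t => (string)t)
               .ToArray();

    static void Main()
    {
        // Trimmed stand-in for the Read API response (bounding boxes and words omitted).
        const string json = @"{
          ""analyzeResult"": { ""readResults"": [ { ""page"": 1, ""lines"": [
            { ""text"": ""Account Information"" },
            { ""text"": ""Practice Name"" },
            { ""text"": ""Some Practice Name"" }
          ] } ] } }";

        foreach (var text in GetLineTexts(json))
            Console.WriteLine(text);
    }
}
```

This only flattens the line texts into a sequence; deciding which text is a label and which is a value still has to be done afterwards.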
 
Interesting... could you demonstrate "ingesting the lines in pairs and setting up the values to go out into the CSV"? I think what compelled me to use ChoJSONReader was ChoCSVWriter, since ultimately that's what I want: to write properties/values to CSV.
If it's just a wrapper around the Json.NET library, does this mean it's possible to pull the values contextually instead of indexing the lines? And if so, how?

Mod edit : No need to quote in whole or quote the person directly above you.
 
The following outputs the pairs to the console:
C#:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Newtonsoft.Json;

class RootObject
{
    [JsonProperty("analyzeResult")]
    public AnalyzeResult AnalyzeResult { get; set; }
}

class AnalyzeResult
{
    [JsonProperty("readResults")]
    public ReadResults[] ReadResults { get; set; }
}

class ReadResults
{
    [JsonProperty("lines")]
    public Line[] Lines { get; set; }
}

class Line
{
    [JsonProperty("text")]
    public string Text { get; set; }

    public override string ToString() => Text;
}

public static class IEnumerableExtensions
{
    public static IEnumerable<KeyValuePair<T, T>> Pairs<T>(this IEnumerable<T> items)
    {
        // Dispose the enumerator when iteration ends, as foreach would.
        using (var enumerator = items.GetEnumerator())
        {
            while (enumerator.MoveNext())
            {
                var name = enumerator.Current;
                if (enumerator.MoveNext())
                    yield return new KeyValuePair<T, T>(name, enumerator.Current);
                else
                    throw new InvalidDataException("Odd number of items found in IEnumerable<T>");
            }
        }
    }
}

class Program
{
    static IEnumerable<Line> GetLines(string jsonText)
    {
        var root = JsonConvert.DeserializeObject<RootObject>(jsonText);
        return root.AnalyzeResult
                   .ReadResults
                   .SelectMany(r => r.Lines);
    }

    static void Main(string[] args)
    {
        var lines = GetLines(File.ReadAllText("response.json"));

        // Skip(1) to skip over the "Account Information" Line.
        var pairs = lines.Skip(1).Pairs();

        foreach(var pair in pairs)
            Console.WriteLine($"{pair.Key}: {pair.Value}");
    }
}

which produces the following output:
Code:
Practice Name: Some Practice Name
Owner Full Name: Bob Lee
Owner Email: bob@gmail.com
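
Since the end goal is a CSV file, the pairs can be turned into a header row and a value row. This is a sketch using only the base class library; ToCsv and Quote are names invented here, and the quoting follows the common convention of doubling embedded quotes, which may need adjusting for whatever consumes the file:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PairsToCsv
{
    // Wrap a field in quotes when it contains a comma, quote, or line break.
    static string Quote(string field) =>
        field.IndexOfAny(new[] { ',', '"', '\n', '\r' }) >= 0
            ? "\"" + field.Replace("\"", "\"\"") + "\""
            : field;

    // Keys become the header row, values become the single data row.
    public static string ToCsv(IEnumerable<KeyValuePair<string, string>> pairs)
    {
        var list = pairs.ToList();
        var header = string.Join(",", list.Select(p => Quote(p.Key)));
        var row = string.Join(",", list.Select(p => Quote(p.Value)));
        return header + Environment.NewLine + row;
    }

    static void Main()
    {
        var pairs = new[]
        {
            new KeyValuePair<string, string>("Practice Name", "Some Practice Name"),
            new KeyValuePair<string, string>("Owner Full Name", "Bob Lee"),
            new KeyValuePair<string, string>("Owner Email", "bob@gmail.com"),
        };
        Console.WriteLine(ToCsv(pairs));
    }
}
```

With the Line-based pairs from the code above, projecting each pair to its Key.Text and Value.Text first would give the string pairs this sketch expects.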
 
This is awesome, thank you Skydiver. However, doesn't the fact that I have to explicitly skip over a certain Line, i.e. "Account Information", mean that I'm still technically tied to the JSON structure? In other words, if I had another JSON file with more fields preceding Account Information, wouldn't I have to adjust the skip again to make sure the first field it reads is Practice Name?
Is there a way to make it more dynamic, so that it goes directly to Practice Name instead of having to Skip() over a fixed number of Lines?
Something like:
C#:
var pairs = lines.SkipAllUntil("Practice Name").Pairs();
 
I only hard-coded the skip there because I was trying to highlight pulling the JSON elements in pairs, since that is what you asked about. With software anything is possible. It just depends on how much time, energy, and money you want to invest. To answer your new question, you can operate on any IEnumerable using LINQ's SkipWhile().
 
I see. So I've tried the following:
C#:
var pairs = lines.SkipWhile(r => r == "Practice Name").Pairs();
However, I am getting "Operator '==' cannot be applied to operands of type 'Line' and 'string'".

I think I understand what this error means: Line is not of type string, so a direct comparison like that is not possible.

So I figured, easy enough, I just have to convert each line to a string:
C#:
var pairs = lines.SkipWhile(r => r.ToString() == "Practice Name").Pairs();

However, this not only printed "Account Information", but the pairing was also messed up in the console output, and then I got an exception:
"Unreachable code, 'Odd number of items found in IEnumerable<T>'"

But anyway, wouldn't this mean that with SkipWhile the lines would always have to be == to Practice Name, so that only that part of the JSON gets processed?

Pardon my asking a lot of questions; the last time I programmed in C# was 6 years ago and I've only recently had to start using it again. Almost there though, and I truly appreciate your guidance so far!
 
Flip the logic.
C#:
var pairs = lines.SkipWhile(l => l.Text != "Practice Name").Pairs();
foreach(var pair in pairs)
    Console.WriteLine($"{pair.Key}: {pair.Value}");
seems to do the right thing for me.
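
The reason the == version misbehaves is that SkipWhile() drops items only while the predicate stays true and stops at the first false result. With ==, the very first line ("Account Information") already fails the test, so nothing is skipped, all seven lines get paired starting from the wrong one, and the leftover seventh line triggers the odd-count exception. A small sketch, with plain strings standing in for the Line objects and SkipWhileDemo being a made-up class name:

```csharp
using System;
using System.Linq;

static class SkipWhileDemo
{
    public static readonly string[] Texts =
    {
        "Account Information",
        "Practice Name", "Some Practice Name",
        "Owner Full Name", "Bob Lee",
        "Owner Email", "bob@gmail.com",
    };

    static void Main()
    {
        // Predicate is false for the first item, so nothing is skipped.
        var wrong = Texts.SkipWhile(t => t == "Practice Name").ToList();

        // Predicate is true until the label is reached, so only the heading is dropped.
        var right = Texts.SkipWhile(t => t != "Practice Name").ToList();

        Console.WriteLine($"{wrong.Count} {wrong[0]}"); // 7 Account Information
        Console.WriteLine($"{right.Count} {right[0]}"); // 6 Practice Name
    }
}
```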
 
Awesome, this does the trick indeed :)
By the way, this JSON is generated by the Azure Computer Vision REST API from a PDF input like this:

[Image: 1603905939908.png]

I mention this because I understand what the IEnumerable code is doing, but there is one edge case that may not conform to how it operates. Basically, my understanding is that:

Scan of JSON seems to show that the .text of odd numbered lines is the name of a field
and the .text of even numbered lines is the value of that field.
For example:
If lines[3].text is "Owner Full Name",
then lines[3+1] is "Bob Lee"

The skipped variable would be the 'lines' input with everything prior to the field of interest
removed. We then just skip over the field name line and return the .text property of the next line.
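
That reading can be turned into a direct lookup: find the label, then take the line right after it. The following is a sketch only; ValueAfter is a made-up helper name, plain strings stand in for each line's .text, and it assumes the value is always the line immediately following its label. It returns null when the label is missing or is the last line:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LabelLookup
{
    // Returns the text of the line that follows the first line equal to label, or null.
    public static string ValueAfter(IEnumerable<string> texts, string label) =>
        texts.SkipWhile(t => t != label)  // drop everything before the label
             .Skip(1)                     // drop the label itself
             .FirstOrDefault();           // next line, or null if there is none

    static void Main()
    {
        var texts = new[]
        {
            "Account Information",
            "Practice Name", "Some Practice Name",
            "Owner Full Name", "Bob Lee",
            "Owner Email", "bob@gmail.com",
        };

        Console.WriteLine(ValueAfter(texts, "Owner Full Name")); // Bob Lee
        Console.WriteLine(ValueAfter(texts, "Owner Email"));     // bob@gmail.com
    }
}
```

Because each lookup starts from its own label, an extra line elsewhere in the file does not shift the result the way a fixed index or a fixed Skip() count would.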

The full JSON from the picture/pdf is derived by the API as:

JSON:
{
  "status": "succeeded",
  "createdDateTime": "2020-10-22T19:35:35Z",
  "lastUpdatedDateTime": "2020-10-22T19:35:36Z",
  "analyzeResult": {
    "version": "3.0.0",
    "readResults": [
      {
        "page": 1,
        "angle": 0,
        "width": 8.5,
        "height": 11,
        "unit": "inch",
        "lines": [       
          {
            "boundingBox": [
              0.5016,
              1.9141,
              2.5726,
              1.9141,
              2.5726,
              2.0741,
              0.5016,
              2.0741
            ],         
            "text": "Account Information",
            "words": [
              {
                "boundingBox": [
                  0.5016,
                  1.9345,
                  1.3399,
                  1.9345,
                  1.3399,
                  2.0741,
                  0.5016,
                  2.0741
                ],
                "text": "Account",
                "confidence": 1
              },
              {
                "boundingBox": [
                  1.3974,
                  1.9141,
                  2.5726,
                  1.9141,
                  2.5726,
                  2.0741,
                  1.3974,
                  2.0741
                ],
                "text": "Information",
                "confidence": 1
              }
            ]
          },
          {
            "boundingBox": [
              1.7716,
              2.4855,
              2.8793,
              2.4855,
              2.8793,
              2.6051,
              1.7716,
              2.6051
            ],
            "text": "Practice Name",
            "words": [
              {
                "boundingBox": [
                  1.7716,
                  2.4855,
                  2.3803,
                  2.4855,
                  2.3803,
                  2.6051,
                  1.7716,
                  2.6051
                ],
                "text": "Practice",
                "confidence": 1
              },
              {
                "boundingBox": [
                  2.4362,
                  2.4948,
                  2.8793,
                  2.4948,
                  2.8793,
                  2.6051,
                  2.4362,
                  2.6051
                ],
                "text": "Name",
                "confidence": 1
              }
            ]
          },
          {
            "boundingBox": [
              2.9993,
              2.5257,
              4.7148,
              2.5257,
              4.7148,
              2.714,
              2.9993,
              2.714
            ],
            "text": "Some Practice Name",
            "words": [
              {
                "boundingBox": [
                  3.0072,
                  2.5385,
                  3.6546,
                  2.5284,
                  3.6516,
                  2.7131,
                  3.0105,
                  2.712
                ],
                "text": "Some",
                "confidence": 0.984
              },
              {
                "boundingBox": [
                  3.6887,
                  2.5281,
                  4.2112,
                  2.5262,
                  4.2028,
                  2.7159,
                  3.6854,
                  2.7132
                ],
                "text": "Practice",
                "confidence": 0.986
              },
              {
                "boundingBox": [
                  4.2453,
                  2.5263,
                  4.7223,
                  2.5297,
                  4.7091,
                  2.72,
                  4.2366,
                  2.7161
                ],
                "text": "Name",
                "confidence": 0.986
              }
            ]
          },
          {
            "boundingBox": [
              1.6116,
              2.9999,
              2.8816,
              2.9999,
              2.8816,
              3.1158,
              1.6116,
              3.1158
            ],
            "text": "Owner Full Name",
            "words": [
              {
                "boundingBox": [
                  1.6116,
                  3.0039,
                  2.1026,
                  3.0039,
                  2.1026,
                  3.1157,
                  1.6116,
                  3.1157
                ],
                "text": "Owner",
                "confidence": 1
              },
              {
                "boundingBox": [
                  2.1541,
                  2.9999,
                  2.3784,
                  2.9999,
                  2.3784,
                  3.1158,
                  2.1541,
                  3.1158
                ],
                "text": "Full",
                "confidence": 1
              },
              {
                "boundingBox": [
                  2.4384,
                  3.0052,
                  2.8816,
                  3.0052,
                  2.8816,
                  3.1155,
                  2.4384,
                  3.1155
                ],
                "text": "Name",
                "confidence": 1
              }
            ]
          },
          {
            "boundingBox": [
              2.9993,
              3.0242,
              3.6966,
              3.0242,
              3.6966,
              3.2125,
              2.9993,
              3.2014
            ],
            "text": "Bob Lee",
            "words": [
              {
                "boundingBox": [
                  3.0063,
                  3.0303,
                  3.3439,
                  3.0349,
                  3.3461,
                  3.2125,
                  3.007,
                  3.2081
                ],
                "text": "Bob",
                "confidence": 0.987
              },
              {
                "boundingBox": [
                  3.3788,
                  3.0349,
                  3.6931,
                  3.0326,
                  3.697,
                  3.2121,
                  3.3813,
                  3.2125
                ],
                "text": "Lee",
                "confidence": 0.983
              }
            ]
          },
          {
            "boundingBox": [
              1.945,
              3.5063,
              2.8748,
              3.5063,
              2.8748,
              3.6261,
              1.945,
              3.6261
            ],
            "text": "Owner Email",
            "words": [
              {
                "boundingBox": [
                  1.945,
                  3.5143,
                  2.4359,
                  3.5143,
                  2.4359,
                  3.6261,
                  1.945,
                  3.6261
                ],
                "text": "Owner",
                "confidence": 1
              },
              {
                "boundingBox": [
                  2.4874,
                  3.5063,
                  2.8748,
                  3.5063,
                  2.8748,
                  3.6259,
                  2.4874,
                  3.6259
                ],
                "text": "Email",
                "confidence": 1
              }
            ]
          },
          {
            "boundingBox": [
              3.0104,
              3.5005,
              4.6042,
              3.5005,
              4.6042,
              3.6888,
              3.0104,
              3.6777
            ],
            "text": "bob@gmail.com",
            "words": [
              {
                "boundingBox": [
                  3.0212,
                  3.5047,
                  4.5837,
                  3.5039,
                  4.5769,
                  3.6886,
                  3.0129,
                  3.6787
                ],
                "text": "bob@gmail.com",
                "confidence": 0.951
              }
            ]
          },
          {
            "boundingBox": [
              1.945,
              6.5768,
              2.8886,
              6.5768,
              2.8886,
              6.7271,
              1.945,
              6.7271
            ],
            "text": "Server Setup",
            "words": [
              {
                "boundingBox": [
                  1.945,
                  6.5768,
                  2.4165,
                  6.5768,
                  2.4165,
                  6.6884,
                  1.945,
                  6.6884
                ],
                "text": "Server",
                "confidence": 1
              },
              {
                "boundingBox": [
                  2.4643,
                  6.5768,
                  2.8886,
                  6.5768,
                  2.8886,
                  6.7271,
                  2.4643,
                  6.7271
                ],
                "text": "Setup",
                "confidence": 1
              }
            ]
          },
          {
            "boundingBox": [
              3.5085,
              6.5025,
              3.7298,
              6.5136,
              3.7188,
              6.7351,
              3.4974,
              6.7241
            ],
            "text": "V",
            "words": [
              {
                "boundingBox": [
                  3.5672,
                  6.5046,
                  3.7293,
                  6.5128,
                  3.7183,
                  6.734,
                  3.5561,
                  6.7259
                ],
                "text": "V",
                "confidence": 0.984
              }
            ]
          },
          {
            "boundingBox": [
              3.7471,
              6.6145,
              4.1792,
              6.6145,
              4.1792,
              6.7304,
              3.7471,
              6.7304
            ],
            "text": "Cloud",
            "words": [
              {
                "boundingBox": [
                  3.7471,
                  6.6145,
                  4.1792,
                  6.6145,
                  4.1792,
                  6.7304,
                  3.7471,
                  6.7304
                ],
                "text": "Cloud",
                "confidence": 1
              }
            ]
          },
          {
            "boundingBox": [
              4.904,
              6.6105,
              5.5344,
              6.6105,
              5.5344,
              6.7301,
              4.904,
              6.7301
            ],
            "text": "Location",
            "words": [
              {
                "boundingBox": [
                  4.904,
                  6.6105,
                  5.5344,
                  6.6105,
                  5.5344,
                  6.7301,
                  4.904,
                  6.7301
                ],
                "text": "Location",
                "confidence": 1
              }
            ]
          },
          {
            "boundingBox": [
              6.2924,
              6.6037,
              7.8618,
              6.6037,
              7.8618,
              6.752,
              6.2924,
              6.752
            ],
            "text": "Central (multi-location)",
            "words": [
              {
                "boundingBox": [
                  6.2924,
                  6.6145,
                  6.8385,
                  6.6145,
                  6.8385,
                  6.7301,
                  6.2924,
                  6.7301
                ],
                "text": "Central",
                "confidence": 1
              },
              {
                "boundingBox": [
                  6.8929,
                  6.6037,
                  7.8618,
                  6.6037,
                  7.8618,
                  6.752,
                  6.8929,
                  6.752
                ],
                "text": "(multi-location)",
                "confidence": 1
              }
            ]
          },
          {
            "boundingBox": [
              0.6466,
              7.0788,
              2.8775,
              7.0788,
              2.8775,
              7.2388,
              0.6466,
              7.2388
            ],
            "text": "Number of Locations Enrolling",
            "words": [
              {
                "boundingBox": [
                  0.6466,
                  7.0832,
                  1.2496,
                  7.0832,
                  1.2496,
                  7.1991,
                  0.6466,
                  7.1991
                ],
                "text": "Number",
                "confidence": 1
              },
              {
                "boundingBox": [
                  1.2969,
                  7.0788,
                  1.4364,
                  7.0788,
                  1.4364,
                  7.1988,
                  1.2969,
                  7.1988
                ],
                "text": "of",
                "confidence": 1
              },
              {
                "boundingBox": [
                  1.4892,
                  7.0793,
                  2.2013,
                  7.0793,
                  2.2013,
                  7.1988,
                  1.4892,
                  7.1988
                ],
                "text": "Locations",
                "confidence": 1
              },
              {
                "boundingBox": [
                  2.2576,
                  7.0793,
                  2.8775,
                  7.0793,
                  2.8775,
                  7.2388,
                  2.2576,
                  7.2388
                ],
                "text": "Enrolling",
                "confidence": 1
              }
            ]
          },
          {
            "boundingBox": [
              3.4421,
              7.0342,
              3.6413,
              7.0453,
              3.6413,
              7.3001,
              3.4421,
              7.289
            ],
            "text": "1",
            "words": [
              {
                "boundingBox": [
                  3.4757,
                  7.0352,
                  3.6451,
                  7.0446,
                  3.631,
                  7.299,
                  3.4616,
                  7.2896
                ],
                "text": "1",
                "confidence": 0.987
              }
            ]
          },
          {
            "boundingBox": [
              4.1835,
              7.0896,
              7.8999,
              7.0896,
              7.8999,
              7.2158,
              4.1835,
              7.2158
            ],
            "text": "*If more than 1 location, add info on the locations form",
            "words": [
              {
                "boundingBox": [
                  4.1835,
                  7.0896,
                  4.3291,
                  7.0896,
                  4.3291,
                  7.1979,
                  4.1835,
                  7.1979
                ],
                "text": "*If",
                "confidence": 1
              },
              {
                "boundingBox": [
                  4.3611,
                  7.1193,
                  4.725,
                  7.1193,
                  4.725,
                  7.1988,
                  4.3611,
                  7.1988
                ],
                "text": "more",
                "confidence": 1
              },
              {
                "boundingBox": [
                  4.7701,
                  7.0936,
                  5.0809,
                  7.0936,
                  5.0809,
                  7.1988,
                  4.7701,
                  7.1988
                ],
                "text": "than",
                "confidence": 1
              },
              {
                "boundingBox": [
                  5.1307,
                  7.0985,
                  5.1613,
                  7.0985,
                  5.1613,
                  7.1979,
                  5.1307,
                  7.1979
                ],
                "text": "1",
                "confidence": 1
              },
              {
                "boundingBox": [
                  5.2006,
                  7.09,
                  5.7803,
                  7.09,
                  5.7803,
                  7.2158,
                  5.2006,
                  7.2158
                ],
                "text": "location,",
                "confidence": 1
              },
              {
                "boundingBox": [
                  5.8268,
                  7.0936,
                  6.102,
                  7.0936,
                  6.102,
                  7.1988,
                  5.8268,
                  7.1988
                ],
                "text": "add",
                "confidence": 1
              },
              {
                "boundingBox": [
                  6.1394,
                  7.0896,
                  6.3896,
                  7.0896,
                  6.3896,
                  7.1988,
                  6.1394,
                  7.1988
                ],
                "text": "info",
                "confidence": 1
              },
              {
                "boundingBox": [
                  6.435,
                  7.1193,
                  6.6005,
                  7.1193,
                  6.6005,
                  7.1988,
                  6.435,
                  7.1988
                ],
                "text": "on",
                "confidence": 1
              },
              {
                "boundingBox": [
                  6.6481,
                  7.0936,
                  6.865,
                  7.0936,
                  6.865,
                  7.1988,
                  6.6481,
                  7.1988
                ],
                "text": "the",
                "confidence": 1
              },
              {
                "boundingBox": [
                  6.9081,
                  7.09,
                  7.5365,
                  7.09,
                  7.5365,
                  7.1988,
                  6.9081,
                  7.1988
                ],
                "text": "locations",
                "confidence": 1
              },
              {
                "boundingBox": [
                  7.5783,
                  7.0896,
                  7.8999,
                  7.0896,
                  7.8999,
                  7.1988,
                  7.5783,
                  7.1988
                ],
                "text": "form",
                "confidence": 1
              }
            ]
          }
        ]
      }
    ]
  }
}

I think the API is not perfect, so we have an edge case when a bounding box contains multiple elements. For example, the API interprets "Server Setup" the same way it interprets "Owner Full Name": it simply follows the convention that the .text of odd-numbered lines is the name of a field and the .text of even-numbered lines is the value of that field.
It fails to place the values of that field into an "inner" bounding box inside the "Server Setup" text, so we end up with output like this:

Practice Name: Some Practice Name
Owner Full Name: Bob Lee
Owner Email: bob@gmail.com
Server Setup: V
Cloud: Location
Central (multi-location): Number of Locations Enrolling
1: *If more than 1 location, add info on the locations form

While the Practice Name, Owner Full Name, and Owner Email fields and values are correct, the Server Setup field and its values unfortunately are not. That is understandable, because the JSON is structured that way to begin with: it lacks the "child"-like element dependency that we can otherwise observe in the PDF/image.
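
To make the problem concrete, here is a minimal sketch of the odd/even pairing convention described above. It works on plain strings rather than the reader's dynamic objects, and the helper name `PairByPosition` is made up for illustration. The line texts are copied from the sample, starting at the "Practice Name" label.

```csharp
using System;
using System.Collections.Generic;

// Line texts from the sample, in document order, starting at "Practice Name".
string[] lines =
{
    "Practice Name", "Some Practice Name",
    "Owner Full Name", "Bob Lee",
    "Owner Email", "bob@gmail.com",
    "Server Setup", "V",
    "Cloud", "Location",
    "Central (multi-location)", "Number of Locations Enrolling",
    "1", "*If more than 1 location, add info on the locations form"
};

// Hypothetical helper: pairs each line with the one after it
// (label, value, label, value, ...). A trailing unpaired line is ignored.
List<KeyValuePair<string, string>> PairByPosition(IReadOnlyList<string> texts)
{
    var result = new List<KeyValuePair<string, string>>();
    for (int i = 0; i + 1 < texts.Count; i += 2)
    {
        result.Add(new KeyValuePair<string, string>(texts[i], texts[i + 1]));
    }
    return result;
}

foreach (var pair in PairByPosition(lines))
{
    Console.WriteLine($"{pair.Key}: {pair.Value}");
}
```

Because the checkmark is read as an extra "V" line, every pair after "Server Setup" is shifted by one, which is exactly the misalignment in the output above.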

1603906337179.png


Note: "V" represents the checkmark, since it looks like the API cannot represent symbols in the JSON.

The ideal output however should be this:

Practice Name: Some Practice Name
Owner Full Name: Bob Lee
Owner Email: bob@gmail.com
Server Setup: Cloud
Location: Central (multi-location)
Number of Locations Enrolling: 1

How do I adjust the IEnumerable code to accommodate this edge case (if that's even possible)?
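
One possible direction, sketched here under assumptions rather than offered as a definitive fix, is to stop relying on position and look up each field by its label text, taking the next line that is neither noise nor another label as the value. The label set is copied from the sample form and would need adjusting for other forms. Treating a lone "V" as checkmark noise is a heuristic, and `IsNoise` and `ExtractFields` are hypothetical helper names. The sketch works on plain strings. In the reader code, the texts would presumably come from each line's `.text`.

```csharp
using System;
using System.Collections.Generic;

// Assumption: the field labels exactly as they appear in the sample form.
var knownLabels = new HashSet<string>
{
    "Practice Name", "Owner Full Name", "Owner Email",
    "Server Setup", "Location", "Number of Locations Enrolling"
};

// Line texts from the sample, in document order, starting at "Practice Name".
string[] lines =
{
    "Practice Name", "Some Practice Name",
    "Owner Full Name", "Bob Lee",
    "Owner Email", "bob@gmail.com",
    "Server Setup", "V", "Cloud",
    "Location", "Central (multi-location)",
    "Number of Locations Enrolling", "1",
    "*If more than 1 location, add info on the locations form"
};

// Hypothetical noise filter: in the sample the checkmark is read as a lone "V".
bool IsNoise(string text) => text == "V";

// For each known label, take the next line that is neither noise nor a label.
List<KeyValuePair<string, string>> ExtractFields(
    IReadOnlyList<string> texts, ISet<string> labels)
{
    var result = new List<KeyValuePair<string, string>>();
    for (int i = 0; i < texts.Count; i++)
    {
        if (!labels.Contains(texts[i])) continue;

        // Skip over noise lines that sit between the label and its value.
        int j = i + 1;
        while (j < texts.Count && IsNoise(texts[j])) j++;

        // Leave the field out if the label is last or is followed by another label.
        if (j < texts.Count && !labels.Contains(texts[j]))
        {
            result.Add(new KeyValuePair<string, string>(texts[i], texts[j]));
        }
    }
    return result;
}

foreach (var field in ExtractFields(lines, knownLabels))
{
    Console.WriteLine($"{field.Key}: {field.Value}");
}
```

On the sample lines this prints the six label/value pairs listed as the ideal output. The footnote line is dropped because it is not a known label, and an extra field in another file would not shift the other values. A value that happens to equal a label, or a genuine value of "V", would be misread, so the label set and the noise filter need to match the real forms.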
 