Parsing City, State, Zip Lines

HLeyes

Member | Joined: Feb 10, 2016 | Messages: 15 | Programming Experience: 10+
I have a text file with names, addresses, city-state-zip lines, and extra lines. Each record doesn't necessarily occupy a fixed number of lines (unlike a label format that always has, say, 6 lines). Is there a way to detect the line that has the City, State, Zip, no matter how many blank lines or other lines are in the record? Parsing that line is no issue once I can detect it. Thanks in advance. See example below:

John Brown
123 Gateway Blvd.
Fort Myers, FL 33913

Susie Smith
456 First Street
Suite 5
Orlando, FL 44444
call #32

Daniel Boone
999 Miami Highway
Gainesville, FL 84848
some line
some line 2

...



HLeyes
 
If there is a text pattern that is unique to that line, you could use a regular expression (regex).
Based on your examples, that pattern could be:
(some text), a comma, a space, a two-letter uppercase state identifier (A-Z), a space, and a 5-digit ZIP code
In regex that can be expressed as:
C#:
.+, [A-Z]{2} \d{5}
See .NET Framework Regular Expressions | Microsoft Docs
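A minimal C# sketch of that idea, assuming the lines have already been read (for example with File.ReadLines). Two things are additions to the pattern above: it is anchored with ^ and $ so the whole line must match, and it has named groups for city, state and ZIP so the line can be split at the same time. FindCityStateZipLines is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Pattern from the post above, anchored to the whole line, with named groups added.
var cityStateZip = new Regex(@"^(?<city>.+), (?<state>[A-Z]{2}) (?<zip>\d{5})$");

// Returns every line shaped like "City, ST 12345", however many blank
// or extra lines surround it in the record.
List<string> FindCityStateZipLines(IEnumerable<string> lines)
{
    var hits = new List<string>();
    foreach (var line in lines)
    {
        if (cityStateZip.IsMatch(line.Trim()))
            hits.Add(line.Trim());
    }
    return hits;
}

// Part of the sample from the question.
var sample = new[]
{
    "John Brown", "123 Gateway Blvd.", "Fort Myers, FL 33913", "",
    "Susie Smith", "456 First Street", "Suite 5", "Orlando, FL 44444", "call #32", "",
};

foreach (var hit in FindCityStateZipLines(sample))
{
    var m = cityStateZip.Match(hit);
    Console.WriteLine($"{m.Groups["city"].Value} | {m.Groups["state"].Value} | {m.Groups["zip"].Value}");
}
```

This only looks at the shape of a line, so any other line with the same shape (such as a floor/room line) will also be reported.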
 
My advice would be to seek an alternative way of getting these names and addresses. How did they get into a text file to begin with? If this is something you can control and prevent, seek an alternative, such as a Person class, creating one Person object per person and managing their data from there. Is this something you could do, or are you forced to work with a text file?

Because of the way the file is currently structured, it is rather difficult to do what you want without something like regex, which would be useful here and is definitely better than the alternative I am about to propose.

However, if you didn't want to go the regex route, your file would need to have some structure. For example, your text file would need to be in this format:
Name: John Brown
Address: 123 Gateway Blvd
Next Address Line: Fort Myers
Postcode: FL 33913

And so on. Then, as your (assumed) StreamReader reads each line, you would check that the line is not empty and that it starts with "Name:". If it does, you would use string.Substring() to get the name value without the "Name:" text and do what you please with the person's actual name. You would then repeat the cycle for the remaining lines. What I suggested at the top of this post would be the correct way to approach this.
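A rough sketch of that labelled-line approach, using the example labels above (Name:, Address:, Next Address Line:, Postcode:). Each line is checked with StartsWith and the label is dropped with Substring. A Dictionary stands in here for the Person class mentioned earlier; ParsePeople is a hypothetical helper, and it assumes every person's block starts with a Name: line.

```csharp
using System;
using System.Collections.Generic;

// Field labels from the example format above; each must start its line.
string[] labels = { "Name:", "Address:", "Next Address Line:", "Postcode:" };

// One dictionary per person; a "Name:" line starts a new person.
List<Dictionary<string, string>> ParsePeople(IEnumerable<string> lines)
{
    var people = new List<Dictionary<string, string>>();
    Dictionary<string, string>? current = null;

    foreach (var raw in lines)
    {
        var line = raw.Trim();
        if (line.Length == 0) continue; // skip blank separator lines

        foreach (var label in labels)
        {
            if (!line.StartsWith(label, StringComparison.Ordinal)) continue;

            // Substring drops the label, leaving only the value.
            var value = line.Substring(label.Length).Trim();
            if (label == "Name:")
            {
                current = new Dictionary<string, string>();
                people.Add(current);
            }
            // Labelled lines before the first "Name:" have no person to attach to.
            if (current != null)
                current[label.TrimEnd(':')] = value;
            break;
        }
    }
    return people;
}

var parsed = ParsePeople(new[]
{
    "Name: John Brown",
    "Address: 123 Gateway Blvd",
    "Next Address Line: Fort Myers",
    "Postcode: FL 33913",
    "",
    "Name: Susie Smith",
    "Address: 456 First Street",
});

foreach (var person in parsed)
    Console.WriteLine($"{person["Name"]}: {person["Address"]}");
```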
 
Also, if you could guarantee that each name-and-address entry in your text file has exactly 6 lines before the next empty line, you could do something like this:
C#:
using System;
using System.Diagnostics;
using System.IO;

var pathDir = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
var fileName = "CFile";
var fileExt = ".txt";
var incrementer = 0; // number of non-empty lines read so far for the current entry

using (StreamReader sr = new StreamReader(Path.Combine(pathDir, fileName + fileExt)))
{
    var eachLine = string.Empty;
    while ((eachLine = sr.ReadLine()) != null)
    {
        // Blank lines only separate entries, so skip them.
        if (string.IsNullOrWhiteSpace(eachLine))
            continue;

        incrementer++;

        /* This is where you do what you want with each of the six lines */
        Debug.WriteLine(eachLine);

        if (incrementer == 6)
        {
            // All six lines of this entry have been read; start the count over for the next one.
            incrementer = 0;
        }
    }
}
But I don't see this as a very good way of doing it either, even though it does work and outputs as planned:
Input (CFile.txt):
John Brown
123 Gateway Blvd.
Fort Myers
FL 33913
Some line 1
Some line 2

Susie Smith
456 First Street
Suite 5
Orlando, FL 44444
call #32
some line

Daniel Boone
999 Miami Highway
Gainesville, FL 84848
some line
some line 2
some line 3
Console Output:
John Brown
123 Gateway Blvd.
Fort Myers 
FL 33913
Some line 1
Some line 2
Susie Smith
456 First Street
Suite 5
Orlando, FL 44444
call #32
some line 
Daniel Boone
999 Miami Highway
Gainesville, FL 84848
some line
some line 2
some line 3
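If, as in both sample files in this thread, every record is followed by at least one blank line, the fixed count of six isn't needed: records of any length can be grouped on blank lines, and the city/state/zip line then found inside each record with the regex suggested earlier. A sketch under that assumption (SplitIntoRecords is a hypothetical name, and treating the first line as the name is also an assumption):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Groups lines into records, using blank lines as separators.
List<List<string>> SplitIntoRecords(IEnumerable<string> lines)
{
    var records = new List<List<string>>();
    var current = new List<string>();
    foreach (var raw in lines)
    {
        var line = raw.Trim();
        if (line.Length == 0)
        {
            // A blank line closes the current record; runs of blank lines are collapsed.
            if (current.Count > 0)
            {
                records.Add(current);
                current = new List<string>();
            }
            continue;
        }
        current.Add(line);
    }
    // The last record may not be followed by a blank line.
    if (current.Count > 0)
        records.Add(current);
    return records;
}

// The pattern suggested earlier in the thread, anchored to the whole line.
var cityStateZip = new Regex(@"^.+, [A-Z]{2} \d{5}$");

// Sample from the original question; in a real program this could be
// File.ReadLines(Path.Combine(pathDir, "CFile.txt")).
var fileLines = new[]
{
    "John Brown", "123 Gateway Blvd.", "Fort Myers, FL 33913", "",
    "Susie Smith", "456 First Street", "Suite 5", "Orlando, FL 44444", "call #32", "",
    "Daniel Boone", "999 Miami Highway", "Gainesville, FL 84848", "some line", "some line 2",
};

foreach (var entry in SplitIntoRecords(fileLines))
{
    // The city/state/zip line is found by pattern, not by its position in the record.
    var csz = entry.Find(l => cityStateZip.IsMatch(l));
    Console.WriteLine($"{entry[0]} -> {csz ?? "(no city/state/zip line found)"}");
}
```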
 
Won't a regular expression get tricked by:

Code Monkey
500 Rocky Stream Ave.
Floor 5, RM 53321
Seattle, WA 98106
 
Tricked how?

A Person class would be better, if the OP is in fact able to avoid storing the data in a text file, wherever they receive it from. @HLeyes Is this an option for you?

If the only way forward for the OP is to use a text file, then regex would be the next best solution, provided the correct patterns are used.
 
Would it be possible to get this as a data file such as XML, a delimited file, or at least CSV? Preferably with each part of the address as its own field.
For example, if you could get the file with these columns (fields): FirstName|LastName|Address1|Address2|City|State|Zip|Other1|Other2
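With a delimited file in that column order, each line maps straight onto named fields and no line detection is needed. A minimal sketch, assuming pipe-delimited lines, no header row, and no | characters inside the values (ParseRow is a hypothetical helper, and the sample row is made up from Susie Smith's record):

```csharp
using System;
using System.Collections.Generic;

// Column order as proposed above.
string[] columns = { "FirstName", "LastName", "Address1", "Address2", "City", "State", "Zip", "Other1", "Other2" };

// Turns one pipe-delimited line into a column-name -> value map.
// Empty fields (e.g. no Address2) stay as empty strings.
Dictionary<string, string> ParseRow(string line)
{
    var fields = line.Split('|');
    if (fields.Length != columns.Length)
        throw new FormatException($"Expected {columns.Length} fields but found {fields.Length}: {line}");

    var row = new Dictionary<string, string>();
    for (int i = 0; i < columns.Length; i++)
        row[columns[i]] = fields[i].Trim();
    return row;
}

var susie = ParseRow("Susie|Smith|456 First Street|Suite 5|Orlando|FL|44444|call #32|");
Console.WriteLine($"{susie["City"]}, {susie["State"]} {susie["Zip"]}");
```

For real CSV with quoted fields, a proper CSV reader (for example Microsoft.VisualBasic.FileIO.TextFieldParser) is safer than String.Split.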
 
The regex approach would be tricked by the second address line with the floor and room number, because "Floor 5, RM 53321" satisfies the same conditions as the regex for city, state, and ZIP.
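One way to narrow (not eliminate) that false match, offered as a sketch rather than a fix: also require the two captured letters to be a real USPS state abbreviation. "RM" is not one, so "Floor 5, RM 53321" is rejected while "Seattle, WA 98106" still passes. The list below covers only the 50 states plus DC; leaving out territory and military codes is an assumption. LooksLikeCityStateZip is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Same shape as the pattern suggested earlier, anchored, with the state captured.
var cityStateZip = new Regex(@"^(?<city>.+), (?<state>[A-Z]{2}) (?<zip>\d{5})$");

// USPS abbreviations for the 50 states plus DC (territories and military codes omitted).
var stateCodes = new HashSet<string>
{
    "AL","AK","AZ","AR","CA","CO","CT","DE","FL","GA","HI","ID","IL","IN","IA",
    "KS","KY","LA","ME","MD","MA","MI","MN","MS","MO","MT","NE","NV","NH","NJ",
    "NM","NY","NC","ND","OH","OK","OR","PA","RI","SC","SD","TN","TX","UT","VT",
    "VA","WA","WV","WI","WY","DC",
};

// True only if the line has the right shape AND the two letters are a known state code.
bool LooksLikeCityStateZip(string line)
{
    var m = cityStateZip.Match(line.Trim());
    return m.Success && stateCodes.Contains(m.Groups["state"].Value);
}

foreach (var sampleLine in new[] { "Floor 5, RM 53321", "Seattle, WA 98106" })
    Console.WriteLine($"{sampleLine}: {LooksLikeCityStateZip(sampleLine)}");
```

A line like "Floor 5, FL 33913" would still slip through, so this remains a heuristic; getting the data in a structured format is still the safer option.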
 
Thanks to you all for your input. Looks like the best answer would be to get the source data in a structured format.
 