Remove substring in line when neither starting nor ending position is known

nesco88

I have a C# application that, depending on certain database conditions, creates a new file, opens it, and writes a record from an input file into the file.

Each field in the record is separated with an "|". Under certain conditions the application has to read an input file and create an output file in which several of the fields (Name, ID, and TINType) are removed while the "|" delimiters around them are kept. Several of the records can be different lengths, so I don't know the position of either of the substrings. I'm using IndexOf, Substring, and Remove to find the parts that need to be removed, but the entire line seems to be repeated once I get to Name, the first field that needs to be removed. Any ideas?

Here is the code:

switch (cCond)
{
    case "1":
        if (Regex.IsMatch(line, "\\|" + dUCode + "\\|", RegexOptions.IgnoreCase) && Regex.IsMatch(line, "\\|" + tType + "\\|", RegexOptions.IgnoreCase))
        {
            int pos = line.IndexOf("|");
            int pos2 = line.IndexOf("|", pos + 1);
            string DocUnitCode = line.Substring(pos + 1, pos2 - (pos + 1));
            pos = 0;
            pos2 = 0;
            pos = line.IndexOf("|"); //start of file
            lineA = lineA + line.Substring(0, pos); //add payment request code
            pos = line.IndexOf("|"); //start of file
            lineA = lineA + line.Substring(0, pos); //add department code
            //pos = line.IndexOf("|", pos + 1); //end of department code
            pos2 = line.IndexOf("|", pos + 1); //end of doc code
            lineA = lineA + line.Substring(pos + 1, pos2 - (pos + 1)); //doc code
            pos = pos2;
            pos2 = line.IndexOf("|", pos + 1); //end of file unit code
            lineA = lineA + line.Substring(pos + 1, pos2 - (pos + 1)); //file unit code
            pos = pos2;
            pos2 = line.IndexOf("|", pos + 1); //end of payment reference number
            lineA = lineA + line.Substring(pos + 1, pos2 - (pos + 1)); //payment reference number
            pos = pos2;
            pos2 = line.IndexOf("|", pos + 1); //end of bank code
            lineA = lineA + line.Substring(pos + 1, pos2 - (pos + 1)); //bank code
            pos = pos2;
            pos2 = line.IndexOf("|", pos + 1); //end of payment code
            lineA = lineA + line.Substring(pos + 1, pos2 - (pos + 1)); //payment code
            pos = pos2;
            pos2 = line.IndexOf("|", pos + 1); //end of confirmation Code
            lineA = lineA + line.Substring(pos + 1, pos2 - (pos + 1)); //confirmation Code
            pos = pos2;
            pos2 = line.IndexOf("|", pos + 1); //end of Transaction Type
            lineA = lineA + line.Substring(pos + 1, pos2 - (pos + 1)); //Transaction Type
            pos = pos2;
            pos2 = line.IndexOf("|", pos + 1); //end of Payment Amount
            lineA = lineA + line.Substring(pos + 1, pos2 - (pos + 1)); //Payment Amount
            pos = pos2;
            pos2 = line.IndexOf("|", pos + 1); //end of EFT Tracking Number
            lineA = lineA + line.Substring(pos + 1, pos2 - (pos + 1)); //EFT Tracking Number
            pos = pos2;
            pos2 = line.IndexOf("|", pos + 1); //end of Name
            lineA = lineA + line.Substring(pos + 1, pos2 - (pos + 1)); //Name
            lineA = lineA + line.Remove(pos + 1, pos2 - (pos + 1)); //Remove name
            pos = pos2;
            pos2 = line.IndexOf("|", pos + 1); //end of ID Number
            lineA = lineA + line.Substring(pos + 1, pos2 - (pos + 1)); //ID Number
            lineA = lineA + line.Remove(pos + 1, pos2 - (pos + 1)); //ID Number
            pos = pos2;
            pos2 = line.IndexOf("|", pos + 1); //end of TIN Type
            lineA = lineA + line.Substring(pos + 1, pos2 - (pos + 1)); //TIN Type
            lineA = lineA + line.Remove(pos + 1, pos2 - (pos + 1)); //Remove TIN Type
            pos = pos2;
            pos2 = line.IndexOf("|", pos + 1); //end of Return Reason Code
            lineA = lineA + line.Substring(pos + 1, pos2 - (pos + 1)); //Return Reason Code
            pos = pos2;
            pos2 = line.IndexOf("|", pos + 1); //end of Return Reason Description
            lineA = lineA + line.Substring(pos + 1, pos2 - (pos + 1)); //Return Reason Description
            pos = pos2;
            pos2 = line.IndexOf("|", pos + 1); //end of Addenda Information
            lineA = lineA + line.Substring(pos + 1, pos2 - (pos + 1)); //Addenda Information
            pos = pos2;
            pos2 = line.IndexOf("|", pos + 1); //end of EFT Settlement/Cleared date
            lineA = lineA + line.Substring(pos + 1, pos2 - (pos + 1)) + "|"; //EFT Settlement/Cleared date
            pos = pos2;
            pos2 = line.IndexOf("|", pos + 1); //end of Offset flag
            if (pos2 < 0)
            {
                pos2 = Convert.ToInt32(line.Length - 1);
                lineA = lineA + line.Substring(pos + 1, pos2 - pos) + "|";
            }
            else
            {
                lineA = lineA + line.Substring(pos + 1, pos2 - (pos + 1)) + "|"; //Offset flag
            }
            pos = pos2;
            pos2 = line.IndexOf("|", pos + 1); //end of Offset flag
            if (pos2 < 0)
            {
                pos2 = Convert.ToInt32(line.Length - 1);
                lineA = lineA + line.Substring(pos + 1, pos2 - pos);
            }
            else
            {
                lineA = lineA + line.Substring(pos + 1, pos2 - (pos + 1));
            }
            string fName = strOutputFileDir + oFile + ".txt";
            writeToFile(fName, line);
            breakInnerLoop = true;
        }
        break;
 
So, I pretty much stopped reading at "Several of the records can be different lengths, so I don't know the position of either of the substrings." You cannot consistently and reliably parse a file that does not adhere to a known schema. If you have the ability to modify the program that creates your file, then you should change it so that its output is consistent. Typically, if that program writes a record that is missing a field, it still writes the delimiter for that field, in your case the pipe | character. That way, you can reliably parse the file based on the delimiters.
 
The database copies a line from an input file. This line might look like this, separated by the delimiter "|":

29K|375|48625|NOK SPA|3811241496|S|888|None|10/10/2014

Then it is copied into an output file. In the output file though, several fields need to be removed (but the delimiters kept in) so that it looks like this:

29K|375|48625||||888|None|10/10/2014

But fields like the first one might be different sizes, so another example might be like this:

x59K|375|48625|NOK SPA|3811241496|S|888|None|10/10/2014

And the output needs to have the same three fields emptied as in the example above. I've been able to add delimiters to fields based on IndexOf, so can all text between two delimiters be removed?
 
If you are modifying the file directly, rather than creating a new file with the modified values, then you would probably be best off doing the following:

  • Read file line by line
  • Store current line in variable (or even better create a class that represents the object)
  • Modify the values needed via the variable
  • Perform a replace on the line using the variable value as the replacement string

I think this will be much easier than trying to only update the individual segment within each line
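
The "class that represents the object" idea could look roughly like the sketch below. `PipeRecord` is a hypothetical name, not something from the original application, and the field positions 3, 4, and 5 are taken from the sample line posted above; adjust both to the real schema.

```csharp
using System;

// Hypothetical wrapper around one pipe-delimited line (not from the original code).
public sealed class PipeRecord
{
    private readonly string[] _fields;

    public PipeRecord(string line)
    {
        // Split keeps empty entries by default, so field positions stay stable.
        _fields = line.Split('|');
    }

    public int FieldCount
    {
        get { return _fields.Length; }
    }

    // Read or overwrite a field by its zero-based position.
    public string this[int index]
    {
        get { return _fields[index]; }
        set { _fields[index] = value ?? string.Empty; }
    }

    // Rebuild the line with the same number of delimiters it started with.
    public override string ToString()
    {
        return string.Join("|", _fields);
    }
}

public static class PipeRecordDemo
{
    public static void Main()
    {
        var record = new PipeRecord("x59K|375|48625|NOK SPA|3811241496|S|888|None|10/10/2014");
        record[3] = string.Empty; // Name (position assumed from the sample line)
        record[4] = string.Empty; // ID
        record[5] = string.Empty; // TIN type
        Console.WriteLine(record); // x59K|375|48625||||888|None|10/10/2014
    }
}
```

Because the fields are addressed by position rather than by character offset, the length of any individual field no longer matters.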
 
I'm copying the line from the input file:

29K|375|48625|NOK SPA|3811241496|S|888|None|10/10/2014

Then pasting the line into a separate output file:

29K|375|48625||||888|None|10/10/2014

So the values NOK SPA, 3811241496, and S need to be removed in the output file, and the "|" stays. Which lines in the input file get copied and pasted with fields removed varies depending on database conditions.

pos = pos2;
pos2 = line.IndexOf("|", pos + 1); //end of Name
lineA = lineA + line.Substring(pos + 1, pos2 - (pos + 1)); //Name
lineA = lineA + line.Remove(pos + 1, pos2 - (pos + 1)); //Remove name

So rather than try the above, would something like this be better?

pos = pos2;
pos2 = line.IndexOf("|", pos + 1); //end of Name
string line2 = line.Substring(pos + 1, pos2 - (pos + 1));
line2 = " ";

Thanks.
 
Divide and conquer? What if you read lines and split each line by delimiter, modify field values (set some fields to empty), then join the fields back together with delimiter?
var line = "29K|375|48625|NOK SPA|3811241496|S|888|None|10/10/2014";
var fields = line.Split('|');
fields[3] = string.Empty;
fields[4] = string.Empty;
fields[5] = string.Empty;
line = string.Join("|", fields); //= "29K|375|48625||||888|None|10/10/2014"
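
As a side note on the symptom in the first post: the repeated line is most likely explained by how `String.Remove(int, int)` works. It returns a copy of the whole string with the given span deleted, not the deleted span, so `lineA + line.Remove(...)` appends nearly the entire line again. The sketch below reproduces that with the sample line; the method names are made up for illustration, and `lineA` is simplified to "everything up to the delimiter before Name".

```csharp
using System;

public static class RemoveDemo
{
    // Mirrors the pattern from the first post: lineA + line.Remove(...).
    public static string AppendRemoveResult(string line, int pos, int pos2)
    {
        string lineA = line.Substring(0, pos + 1); // text up to and including the delimiter before the field
        // Remove returns the WHOLE line minus the field, so the line is duplicated.
        return lineA + line.Remove(pos + 1, pos2 - (pos + 1));
    }

    // Index-based alternative: skip the field and continue from its closing delimiter.
    public static string SkipField(string line, int pos, int pos2)
    {
        return line.Substring(0, pos + 1) + line.Substring(pos2);
    }

    public static void Main()
    {
        string line = "29K|375|48625|NOK SPA|3811241496|S|888|None|10/10/2014";
        int pos = line.IndexOf('|');
        pos = line.IndexOf('|', pos + 1);
        pos = line.IndexOf('|', pos + 1);      // delimiter before Name
        int pos2 = line.IndexOf('|', pos + 1); // delimiter after Name

        Console.WriteLine(AppendRemoveResult(line, pos, pos2));
        // 29K|375|48625|29K|375|48625||3811241496|S|888|None|10/10/2014
        Console.WriteLine(SkipField(line, pos, pos2));
        // 29K|375|48625||3811241496|S|888|None|10/10/2014
    }
}
```

The Split/Join version above avoids this bookkeeping entirely, which is why it is usually the easier route.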
 
Hi JohnH,

I like this idea much better than trying to modify each line in a certain place. Only one problem: the input files contain multiple lines and are read with a foreach statement, as shown below (the code then reads the conditions from the database):

C#:
foreach (var line in File.ReadLines(fileName))
{
    if (line.Trim().Length != 0)
    {
        foreach (DataRow dr in dt.Rows)
        {
            string dUCode = Convert.ToString(dr["Doc_Code"]);
            string oFile = Convert.ToString(dr["Output_File"]);
            string cCond = Convert.ToString(dr["Chk_Condition"]);

So the Join action can't be assigned to line. Should I use a for variable instead, like this:

for(line=0; line < File.ReadLines(fileName); line++)

Or a while statement?


while((line = file.ReadLines(fileName))!=null)

Thanks!
 
I would use File.ReadAllLines, loop the lines, then File.WriteAllLines.

If the file is too large to hold in memory, loop over File.ReadLines while writing the new lines to a new temp file, and replace the original afterwards.
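
Both options might be sketched as follows. This is only an illustration under assumptions: the class and method names are hypothetical, the blanked positions 3, 4, and 5 come from the sample line earlier in the thread, and the database condition checks are left out. Note that the foreach variable stays read-only; the modified text simply goes into a new local variable.

```csharp
using System;
using System.IO;

public static class LineRewriter
{
    // Hypothetical helper: empty the fields at positions 3, 4 and 5, keeping the delimiters.
    private static string BlankFields(string line)
    {
        string[] fields = line.Split('|');
        for (int i = 3; i <= 5 && i < fields.Length; i++)
        {
            fields[i] = string.Empty;
        }
        return string.Join("|", fields);
    }

    // Option 1: whole file in memory. Array elements can be reassigned in a for loop.
    public static void RewriteInMemory(string inputPath, string outputPath)
    {
        string[] lines = File.ReadAllLines(inputPath);
        for (int i = 0; i < lines.Length; i++)
        {
            if (lines[i].Trim().Length != 0)
            {
                lines[i] = BlankFields(lines[i]);
            }
        }
        File.WriteAllLines(outputPath, lines);
    }

    // Option 2: stream line by line into a temp file, then replace the original.
    public static void RewriteStreaming(string path)
    {
        string tempPath = path + ".tmp"; // assumed temp-file naming
        using (var writer = new StreamWriter(tempPath))
        {
            foreach (string line in File.ReadLines(path))
            {
                // 'line' cannot be assigned to, so the result goes into 'output'.
                string output = line.Trim().Length != 0 ? BlankFields(line) : line;
                writer.WriteLine(output);
            }
        }
        File.Delete(path);
        File.Move(tempPath, path);
    }

    public static void Main(string[] args)
    {
        if (args.Length == 2)
        {
            RewriteInMemory(args[0], args[1]);
        }
    }
}
```

If the output is a separate file, as in the original application, the second option can write straight to that file and skip the delete-and-move step.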
 