Obtaining the lines preceding a match, and after, up to another string match

G-Oker

Hello,
I am trying to search an IIS log file for a specific email address (entered by the user at runtime).
I can match the address fine, but what I need to do is grab the preceding lines back to the string "0 EHLO - +", and then all the lines from there up to the next occurrence of that same string.
I am lost on how best to do this.
Can anyone advise?

A sample log entry is below. I would be searching for the PersonToSendTo@anotherDomain.com address, but want to get all the data back (as below).

Thank you in advance.



C#:
2021-03-19 00:08:31 xx.198.1.xx OWNEROR-PC SMTPSVC1 OWNEROR-PC xx.198.1.xxx 0 EHLO - +OWNEROR-PC 250 0 246 20 0 SMTP - - - -
2021-03-19 00:08:31 xx.198.1.xxx OWNEROR-PC SMTPSVC1 OWNEROR-PC xx.198.1.xxx 0 AUTH - OWNEROR-PC 235 0 18 51 0 SMTP - - - -
2021-03-19 00:08:31 xx.198.1.xxx OWNEROR-PC SMTPSVC1 OWNEROR-PC xx.198.1.xxx 0 MAIL - +FROM:<senders@address.co.uk> 250 0 54 41 0 SMTP - - - -
2021-03-19 00:08:31 xx.198.1.xxx OWNEROR-PC SMTPSVC1 OWNEROR-PC xx.198.1.xxx 0 RCPT - +TO:<PersonToSendTo@anotherDomain.com> 250 0 35 32 0 SMTP - - - -
2021-03-19 00:08:31 xx.198.1.xxx OWNEROR-PC SMTPSVC1 OWNEROR-PC xx.198.1.xxx 0 RCPT - +TO:<CCaddress@domain.co.uk> 250 0 43 40 0 SMTP - - - -
2021-03-19 00:08:31 xx.198.1.xxx OWNEROR-PC SMTPSVC1 OWNEROR-PC xx.198.1.xxx 0 DATA - <OWNEROR-PCGNso0002e105@smtp-relay.com> 250 0 135 5269 47 SMTP - - - -
2021-03-19 00:08:31 xx.198.1.xxx OWNEROR-PC SMTPSVC1 OWNEROR-PC xx.198.1.xxx 0 QUIT - OWNEROR-PC 240 47 71 4 0 SMTP - - - -
2021-03-19 00:08:31 xxx.xxx.217.xxx OutboundConnectionResponse SMTPSVC1 OWNEROR-PC - 25 - - 220+eu-smtp-1.mimecast.com+ESMTP;+Fri,+19+Mar+2021+00:08:31++0000 0 0 65 0 16 SMTP - - - -
2021-03-19 00:08:31 xxx.xxx.217.xxx OutboundConnectionCommand SMTPSVC1 OWNEROR-PC - 25 EHLO - smtp-relay.com 0 0 4 0 16 SMTP - - - -
2021-03-19 00:08:31 xxx.xxx.217.xxx OutboundConnectionResponse SMTPSVC1 OWNEROR-PC - 25 - - 250-eu-smtp-1.mimecast.com+Hello+[xx.198.1.xxx] 0 0 47 0 31 SMTP - - - -
2021-03-19 00:08:31 xxx.xxx.217.xxx OutboundConnectionCommand SMTPSVC1 OWNEROR-PC - 25 MAIL - FROM:<senders@address.co.uk> 0 0 4 0 31 SMTP - - - -
2021-03-19 00:08:31 xxx.xxx.217.xxx OutboundConnectionResponse SMTPSVC1 OWNEROR-PC - 25 - - 250+Sender+OK+[1bb1wKQqP42RYekw68P00w.uk40] 0 0 43 0 63 SMTP - - - -
2021-03-19 00:08:31 xxx.xxx.217.xxx OutboundConnectionCommand SMTPSVC1 OWNEROR-PC - 25 RCPT - TO:<CCaddress@domain.co.uk> 0 0 4 0 63 SMTP - - - -
2021-03-19 00:08:31 xxx.xxx.217.xxx OutboundConnectionResponse SMTPSVC1 OWNEROR-PC - 25 - - 250+Recipient+OK+[1bb1wKQqP42RYekw68P00w.uk40] 0 0 46 0 109 SMTP - - - -
2021-03-19 00:08:31 xxx.xxx.217.xxx OutboundConnectionCommand SMTPSVC1 OWNEROR-PC - 25 DATA - - 0 0 4 0 109 SMTP - - - -
2021-03-19 00:08:31 xxx.xxx.217.xxx OutboundConnectionResponse SMTPSVC1 OWNEROR-PC - 25 - - 354+Start+mail+data,+end+with+CRLF.CRLF+[1bb1wKQqP42RYekw68P00w.uk40] 0 0 69 0 125 SMTP - - - -
2021-03-19 00:08:31 xxx.xxx.57.xxx OutboundConnectionResponse SMTPSVC1 OWNEROR-PC - 25 - - 220+DM6N.mail.nam1.somedomainname.com+name+ESMTP+MAIL+Service+ready+at+Fri,+19+Mar+2021+00:08:31++0000 0 0 115 0 125 SMTP - - - -
2021-03-19 00:08:31 xxx.xxx.57.xxx OutboundConnectionCommand SMTPSVC1 OWNEROR-PC - 25 EHLO - smtp-relay.com 0 0 4 0 125 SMTP - - - -
2021-03-19 00:08:31 xxx.xxx.57.xxx OutboundConnectionResponse SMTPSVC1 OWNEROR-PC - 25 - - 250-DM6N.mail.protection.somedomainname.com+Hello+[xx.198.1.xxx] 0 0 66 0 249 SMTP - - - -
2021-03-19 00:08:31 xxx.xxx.57.xxx OutboundConnectionCommand SMTPSVC1 OWNEROR-PC - 25 MAIL - FROM:<renders@address.co.uk>+SIZE=5609 0 0 4 0 249 SMTP - - - -
2021-03-19 00:08:31 xxx.xxx.57.xxx OutboundConnectionResponse SMTPSVC1 OWNEROR-PC - 25 - - 250+2.1.0+Sender+OK 0 0 19 0 374 SMTP - - - -
2021-03-19 00:08:31 xxx.xxx.57.xxx OutboundConnectionCommand SMTPSVC1 OWNEROR-PC - 25 RCPT - TO:<PersonToSendTo@anotherDomain.com> 0 0 4 0 374 SMTP - - - -
2021-03-19 00:08:31 xxx.xxx.57.xxx OutboundConnectionResponse SMTPSVC1 OWNEROR-PC - 25 - - 250+2.1.5+Recipient+OK 0 0 22 0 515 SMTP - - - -
2021-03-19 00:08:31 xxx.xxx.57.xxx OutboundConnectionCommand SMTPSVC1 OWNEROR-PC - 25 BDAT - 5609+LAST 0 0 4 0 515 SMTP - - - -
2021-03-19 00:08:32 xxx.xxx.217.xxx OutboundConnectionResponse SMTPSVC1 OWNEROR-PC - 25 - - 250+SmtpThread-8306465-1616112512239@uk-xxx-xx.uk.mimecast.lan+Received+OK+[1bb1wKQqP42RYekw68P00w.uk40] 0 0 104 0 936 SMTP - - - -
2021-03-19 00:08:32 xxx.xxx.217.xxx OutboundConnectionCommand SMTPSVC1 OWNEROR-PC - 25 QUIT - - 0 0 4 0 936 SMTP - - - -
2021-03-19 00:08:32 xxx.xxx.217.xxx OutboundConnectionResponse SMTPSVC1 OWNEROR-PC - 25 - - 221+Service+closing+transmission+channel+[_-P-hI3lOxW3_giuQ6Ldbg.uk40] 0 0 70 0 967 SMTP - - - -
2021-03-19 00:08:34 xxx.xxx.57.xxx OutboundConnectionResponse SMTPSVC1 OWNEROR-PC - 25 - - 250+2.6.0+<OWNEROR-PCGNso0002e105@smtp-relay.com>+[InternalId=54649163886184,+Hostname=DM6NAM11HT152.eop-nam1.somedomainname.com]+12114+bytes+in+0.797,+14.825+KB/sec+Queued+mail+for+delivery+->+250+2.1.5 0 0 226 0 2449 SMTP - - - -
2021-03-19 00:08:34 xxx.xxx.57.xxx OutboundConnectionCommand SMTPSVC1 OWNEROR-PC - 25 QUIT - - 0 0 4 0 2449 SMTP - - - -
2021-03-19 00:08:34 xxx.xxx.57.xxx OutboundConnectionResponse SMTPSVC1 OWNEROR-PC - 25 - - 221+2.0.0+Service+closing+transmission+channel 0 0 46 0 2574 SMTP - - - -
2021-03-19 00:09:15 157.xxx.5.xxx DCQPC SMTPSVC1 OWNEROR-PC xx.198.1.xxx 0 QUIT - DCQPC 240 86113 71 4 0 SMTP - - - -
 
Is this a one time search? Or will you need to do this for every email address you found in your other thread?
 
Hi again Skydiver. I would be looking for all occurrences of the email address in the log file. This would run on a separate button push from the Count option (from the other thread). Thank you.
 
I understood that part. My question is if you will only ever look for a single email address, or if you will have to do this for all email addresses, or a large set of email addresses.

Here is why I am asking:

If it's a single email address, then it's not too expensive to find the line that has the email address, then go backwards and find the line that has the corresponding EHLO or HELO from that same IP address, as well as go forward and find the line that has the QUIT from the same IP address. Then you can just grab all the lines between the EHLO or HELO up to the QUIT which have the matching IP address.
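
The single-address approach described above could be sketched roughly as follows. This is only an illustration, not a tested solution: `SessionSlice`, `Field`, and `AroundMatch` are made-up names, and the field positions (remote IP at index 2 and SMTP verb at index 8 when splitting on spaces) are an assumption read off the sample log, so check them against your own log's layout.

```csharp
using System;
using System.Collections.Generic;

public static class SessionSlice
{
    // Assumed layout (from the sample log): field 2 = remote IP, field 8 = SMTP verb.
    private static string Field(string line, int index)
    {
        string[] parts = line.Split(' ');
        return index < parts.Length ? parts[index] : "";
    }

    // Given the index of the line that contains the email address, returns the
    // lines for the same IP from the preceding EHLO/HELO to the following QUIT.
    public static List<string> AroundMatch(string[] lines, int matchIndex)
    {
        string ip = Field(lines[matchIndex], 2);

        // Walk backwards to the greeting from the same IP.
        // If none is found, the slice starts at the matching line itself.
        int start = matchIndex;
        for (int i = matchIndex; i >= 0; i--)
        {
            string verb = Field(lines[i], 8);
            if (Field(lines[i], 2) == ip && (verb == "EHLO" || verb == "HELO"))
            {
                start = i;
                break;
            }
        }

        // Walk forwards to the QUIT from the same IP.
        // If none is found, the slice ends at the matching line itself.
        int end = matchIndex;
        for (int i = matchIndex; i < lines.Length; i++)
        {
            if (Field(lines[i], 2) == ip && Field(lines[i], 8) == "QUIT")
            {
                end = i;
                break;
            }
        }

        // Keep only the lines in that range that belong to the same IP.
        var session = new List<string>();
        for (int i = start; i <= end; i++)
        {
            if (Field(lines[i], 2) == ip)
            {
                session.Add(lines[i]);
            }
        }
        return session;
    }
}
```

Note that this sketch does not try to tell apart two overlapping sessions from the same IP address; it simply takes the nearest greeting before the match and the nearest QUIT after it.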

Now if you had to keep doing that again and again for other email addresses, then it becomes quite expensive. If you have to handle all the email addresses anyway, the way to do it is to process the entire log file line by line. Each time you find an EHLO or HELO line, you fire up a state machine for that specific IP address. Each subsequent line is fed into the state machine for the corresponding IP address. The state machine stores the lines that belong to the session, notes when it sees an email address and stashes it away for later use, and knows to stop collecting lines when it sees the QUIT message, at which point it fires off a notification that it is done. On that notification, you grab the lines and email address and store them in a data structure. With this approach, it is a single pass through the entire file, instead of having to search forward to find the email address, then back to the EHLO or HELO, then forward to find the QUIT, then back again to collect from the EHLO or HELO until the QUIT.
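
A minimal sketch of that single-pass idea might look like the following. The names `SmtpSession` and `SessionCollector` are hypothetical, the "state machine" is reduced to one open session per IP address held in a dictionary, and the field positions (index 2 = remote IP, index 8 = verb, index 10 = argument) are assumptions taken from the sample log. Instead of a notification, finished sessions are simply appended to a list.

```csharp
using System;
using System.Collections.Generic;

// One collected session: the lines seen for an IP plus any addresses found.
public sealed class SmtpSession
{
    public string Ip { get; }
    public List<string> Lines { get; } = new List<string>();
    public List<string> Addresses { get; } = new List<string>();
    public SmtpSession(string ip) { Ip = ip; }
}

public static class SessionCollector
{
    public static List<SmtpSession> Collect(IEnumerable<string> lines)
    {
        var open = new Dictionary<string, SmtpSession>(); // sessions still collecting, keyed by IP
        var done = new List<SmtpSession>();               // sessions that have finished

        foreach (string line in lines)
        {
            string[] f = line.Split(' ');
            if (f.Length < 11) continue;                  // header, comment, or short line

            string ip = f[2], verb = f[8], arg = f[10];

            if (verb == "EHLO" || verb == "HELO")
            {
                // A new greeting while a session is still open: close the old one first.
                if (open.TryGetValue(ip, out SmtpSession stale)) done.Add(stale);
                open[ip] = new SmtpSession(ip);
            }

            if (!open.TryGetValue(ip, out SmtpSession session)) continue;
            session.Lines.Add(line);

            if (verb == "MAIL" || verb == "RCPT")
            {
                // Addresses appear as FROM:<...> or TO:<...> in the argument field.
                int lt = arg.IndexOf('<'), gt = arg.IndexOf('>');
                if (lt >= 0 && gt > lt) session.Addresses.Add(arg.Substring(lt + 1, gt - lt - 1));
            }

            if (verb == "QUIT")
            {
                open.Remove(ip);
                done.Add(session);
            }
        }

        done.AddRange(open.Values);                       // sessions that never saw a QUIT
        return done;
    }
}
```

Because collection stops at QUIT, as described above, any response line logged after the QUIT for that IP address is not included. Once the sessions are collected, looking up any one address is just a search through the `Addresses` lists.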
 
Ahh, OK. Apologies. It would only be for a single, specific email address at a time, which the user (me) would put in a search box. No wildcards (*hotmail.com, Brad*.*, etc.), just the data from PersonToSendTo@anotherDomain.com matches, so I can see the times and associated flow when sending the emails.

I have it looking for the address; I just need to build on it to get the previous lines, and then the lines up to the next EHLO.

C#:
        private void button1_Click_1(object sender, EventArgs e)
        {
            if (searchParam.Text != "")
            {
                if (fname == "")
                {
                    MessageBox.Show("No file selected.\nClick the LOAD button to select a file first.", "No File Selected", MessageBoxButtons.OK, MessageBoxIcon.Warning);
                }
                else
                {
                    foreach (string line in File.ReadLines(fname))
                    {
                        if (line.Contains(searchParam.Text))
                        {
                            mainWindow.AppendText(line + Environment.NewLine);
                            mainWindow.AppendText("-------------------" + Environment.NewLine + Environment.NewLine);
                        }
                    }
                }
            }
            else
            {
                MessageBox.Show("Search box content is empty.\nPlease select something to search for.", "No Criteria", MessageBoxButtons.OK, MessageBoxIcon.Warning);
            }
        }
 
Put the lines in a variable so you can refer back to them, and use a for loop so you have line indexes. In the loop, also search for EHLO and remember its line index. When you find the email, you have both the current line index and the last EHLO line index, and can grab those lines right there.
 
And the reason why the line index is significant is because File.ReadAllLines() returns an array. You can use that index on the array.
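
As a rough sketch of the index-based idea, assuming the whole file fits in memory and that the "0 EHLO - +" string from the original question marks the start of each block: the helper below takes the array you would get from `File.ReadAllLines(fname)` and returns every block that contains the search text. `EhloBlockSearch` and `FindBlocks` are made-up names, and the case-insensitive comparison is my own choice rather than something the thread asked for.

```csharp
using System;
using System.Collections.Generic;

public static class EhloBlockSearch
{
    // Marker string from the original question; assumed to begin each block.
    public const string Marker = "0 EHLO - +";

    // Returns each block of lines (from one marker line up to, but not
    // including, the next marker line) that contains searchText.
    public static List<string[]> FindBlocks(string[] lines, string searchText)
    {
        var blocks = new List<string[]>();
        int blockStart = -1;   // index of the most recent marker line, -1 if none yet
        bool matched = false;  // has the current block contained searchText?

        // Loop one step past the end so the final block is flushed too.
        for (int i = 0; i <= lines.Length; i++)
        {
            bool atEnd = i == lines.Length;
            bool isMarker = !atEnd && lines[i].Contains(Marker);

            if (atEnd || isMarker)
            {
                if (matched && blockStart >= 0)
                {
                    // Copy lines[blockStart .. i-1] out of the array.
                    var block = new string[i - blockStart];
                    Array.Copy(lines, blockStart, block, 0, block.Length);
                    blocks.Add(block);
                }
                if (isMarker) blockStart = i;
                matched = false;
            }

            if (!atEnd && blockStart >= 0 &&
                lines[i].IndexOf(searchText, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                matched = true;
            }
        }
        return blocks;
    }
}
```

Each returned block could then be written to the text box line by line, followed by the separator line that the existing button handler already prints.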

Granted, if your log file is several gigabytes in size, I doubt that you'll want to read it all into memory with File.ReadAllLines(). You may have to implement a more sophisticated wrapper around File.ReadLines() which gives you a window over the lines.
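
One possible shape for such a wrapper, offered only as a sketch: buffer the lines seen since the last marker and hand the buffer back only if the search text turned up in it. It accepts any `IEnumerable<string>`, so the lazy sequence from `File.ReadLines(fname)` can be passed straight in and only one block is held in memory at a time. `StreamingBlockSearch` and `FindBlocks` are hypothetical names, and the default marker is the "0 EHLO - +" string from the original question.

```csharp
using System;
using System.Collections.Generic;

public static class StreamingBlockSearch
{
    // Lazily yields each marker-delimited block that contains searchText.
    public static IEnumerable<List<string>> FindBlocks(
        IEnumerable<string> lines, string searchText, string marker = "0 EHLO - +")
    {
        var buffer = new List<string>(); // lines of the block currently being read
        bool inBlock = false;            // becomes true once the first marker is seen
        bool matched = false;            // has the current block contained searchText?

        foreach (string line in lines)
        {
            if (line.Contains(marker))
            {
                // A new block starts here; emit the previous one if it matched.
                if (matched) yield return buffer;
                buffer = new List<string>();
                inBlock = true;
                matched = false;
            }

            if (!inBlock) continue;      // skip anything before the first marker

            buffer.Add(line);
            if (line.IndexOf(searchText, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                matched = true;
            }
        }

        // Emit the final block if it matched.
        if (matched) yield return buffer;
    }
}
```

The trade-off is that a block is only reported once the next marker (or the end of the file) is reached, and memory use is bounded by the largest single block rather than by the file size.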
 