How to search sequence of bytes of a bin file?

Gokuland (New member; joined Feb 9, 2018; 2 messages; programming experience: Beginner)
Hello, this is my first post. I'm fairly new to C#, especially to file handling, but I've been wanting to write a program that does something like this:



  1. Load a file
  2. Input start and end offset addresses defining the range to scan
  3. Scan that offset range for a specific sequence of bytes (such as "05805A6C")
  4. Retrieve the offset of every match and write them to a .txt file
[Image: 2zelef5.png]
As the picture shows, I need to search the file for "05805A6C" and then write the offset "0x21F0" to a .txt file.

So far I can only do the first step, loading the file and converting it into a byte array, but I have no clue how to search the array. Could anyone give me some guidance or advice?

Thanks

 
You would actually achieve steps 1 and 2 in one go. You could do them separately, but that would mean reading data that you don't need. You would create a FileStream and either call its Seek method or set its Position property to the start of the range you are interested in. You would then call the Read method of that FileStream, passing a byte array and specifying a number of bytes to read equal to the length of that range. You would then have a populated byte array and would have avoided reading any data before or after the range of interest.
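A minimal sketch of that approach, written as a top-level program (C# 9 or later). ReadRange is a hypothetical helper name, and the loop around Read is there because FileStream.Read may return fewer bytes than requested, so a single call is not guaranteed to fill the buffer.

```csharp
using System;
using System.IO;

// Hypothetical helper: read `length` bytes starting at byte offset `start`.
static byte[] ReadRange(string path, long start, int length)
{
    using var stream = new FileStream(path, FileMode.Open, FileAccess.Read);
    stream.Seek(start, SeekOrigin.Begin); // same effect as stream.Position = start

    var buffer = new byte[length];
    int total = 0;
    // Read can return fewer bytes than asked for, so keep going until the
    // buffer is full or the end of the file is reached.
    while (total < length)
    {
        int read = stream.Read(buffer, total, length - total);
        if (read == 0) break; // end of file reached before the range ended
        total += read;
    }

    // Shrink the buffer if the file ended early.
    if (total < length) Array.Resize(ref buffer, total);
    return buffer;
}

// Usage: write a small demo file, then read 4 bytes starting at offset 2.
string demoPath = Path.GetTempFileName();
File.WriteAllBytes(demoPath, new byte[] { 0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66 });
byte[] range = ReadRange(demoPath, 2, 4);
Console.WriteLine(BitConverter.ToString(range)); // prints "22-33-44-55"
File.Delete(demoPath);
```

The start and length would come from the offsets the user types in (step 2), and the returned array is what step 3 searches.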

Step 3 is the meat of the problem. I don't have any code for you but I would advise that you read up on pattern-matching algorithms for binary data. I haven't had to do something like that for a long time but I do recall learning about such algorithms in my university days. Once you have an understanding of the algorithm you're going to use, you can implement it in any language. If you have specific issues while trying to do that in C# then we can help with those but you should know EXACTLY what your code is trying to achieve before starting to write it.

The values of the offsets will follow from implementing step 3 and, to be frank, you don't need anyone to tell you how to write to a text file. I shudder to think how many times that must have been written about on the web. You can even implement that part before step 3, because you can write and test a method that accepts an offset value and writes it to a file without having code that generates genuine offsets.
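For example, a hypothetical WriteOffsets helper like this one can be written and tested with made-up offsets before any search code exists. The "0x" plus uppercase hex line format is an assumption based on the "0x21F0" in the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper: write each offset on its own line as "0x" + uppercase hex.
static void WriteOffsets(string path, IEnumerable<long> offsets)
{
    File.WriteAllLines(path, offsets.Select(o => "0x" + o.ToString("X")));
}

// Usage with made-up offsets.
string outPath = Path.Combine(Path.GetTempPath(), "offsets.txt");
WriteOffsets(outPath, new long[] { 0x21F0, 0x3000 });
Console.WriteLine(File.ReadAllText(outPath));
```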
 

Yeah, the first two steps aren't really a problem now; step 3 is what makes up the main functionality of this program. I don't have an issue with generating a txt file, either.

What I have so far is this:

C#:
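// Read the whole file, then turn it into a hex string such as "0A1B2C..." (two characters per byte)
byte[] test = System.IO.File.ReadAllBytes(openFileDialog1.FileName);
string hex = BitConverter.ToString(test).Replace("-", string.Empty);

// Character index of the first match, or -1 if there is none.
// Note: a match at an odd character index straddles two bytes, so it is not a real byte match.
int indice = hex.IndexOf("05805A6C", StringComparison.Ordinal);
if (indice >= 0)
{
    // Two hex characters per byte, so (indice + 8) / 2 is the byte offset just past the 4-byte pattern
    int index = (indice + 8) / 2;
    string outputHex = index.ToString("X");
    MessageBox.Show("0x" + outputHex);
}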


I load the file using openFileDialog and read it into a byte array, then convert its data into hexadecimal. To see whether I can search for a pattern, I get the index of the first match, convert that index into hexadecimal, and show it as an offset address. This method works, but it only finds one match. What I'm aiming for is to search the entire file and find every match. Not to mention that the patterns I'm looking for will come from a list.
 
I would have thought you would actually need to match bytes but, if you don't, then I suggest that you read the documentation for the String.IndexOf method that you're calling and see what overloads it has. That is the key to finding multiple instances.
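A sketch of that idea, assuming the hex string is built the same way as in the code above (BitConverter.ToString plus removing the hyphens). The IndexOf overload that takes a start index lets a loop resume just after each hit. FindAllHex is a hypothetical helper name; it skips matches at odd character indices, because those straddle a byte boundary, and it returns the offset where each match starts (add pattern.Length / 2 if you want the offset just past the match). Written as a top-level program (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: byte offsets of every occurrence of `pattern` (e.g. "05805A6C")
// in `hex`, where hex = BitConverter.ToString(bytes).Replace("-", string.Empty).
static List<int> FindAllHex(string hex, string pattern)
{
    if (pattern.Length == 0) throw new ArgumentException("Pattern must not be empty.", nameof(pattern));

    var offsets = new List<int>();
    int i = hex.IndexOf(pattern, 0, StringComparison.Ordinal);
    while (i >= 0)
    {
        // Only even character indices line up with whole bytes.
        if (i % 2 == 0)
            offsets.Add(i / 2); // two hex characters per byte
        // Resume searching one character after the previous hit.
        i = hex.IndexOf(pattern, i + 1, StringComparison.Ordinal);
    }
    return offsets;
}

// Usage: the first "05805A6C" in this string starts at an odd index (a false match).
byte[] data = { 0x10, 0x58, 0x05, 0xA6, 0xC0, 0x05, 0x80, 0x5A, 0x6C };
string hex = BitConverter.ToString(data).Replace("-", string.Empty);
var patterns = new List<string> { "05805A6C", "A6C0" }; // patterns could come from a list
foreach (string p in patterns)
    Console.WriteLine(p + ": " + string.Join(", ", FindAllHex(hex, p)));
```

The odd-index check matters: in the example, the hex text contains "05805A6C" twice, but only the second occurrence is actually those four bytes.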
 
Thank you. This was really helpful.
 
Yikes; round-tripping the bytes through a string (1 megabyte of bytes becomes a 3+ megabyte string), then through another string to remove the hyphens (3 megabytes becomes 2 megabytes), and then mapping the match back to a byte offset, all just so you can search it, is ridiculously inefficient.

The most basic algorithm for finding a sequence of bytes in another sequence of bytes genuinely is ridiculously simple: iterate over the haystack looking for the first byte of the needle, and when you find it, check that the following bytes of the haystack match the remaining bytes of the needle.
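A minimal sketch of that naive algorithm as a top-level program (C# 9 or later); FindAll is a hypothetical name. It reports every start position, including overlapping matches.

```csharp
using System;
using System.Collections.Generic;

// Naive search: at each start position, check the first byte, then the rest.
static List<int> FindAll(byte[] haystack, byte[] needle)
{
    var matches = new List<int>();
    if (needle.Length == 0) return matches;

    // Stop where the needle would run past the end of the haystack.
    for (int i = 0; i <= haystack.Length - needle.Length; i++)
    {
        if (haystack[i] != needle[0]) continue; // first byte doesn't match

        int j = 1;
        while (j < needle.Length && haystack[i + j] == needle[j]) j++;
        if (j == needle.Length) matches.Add(i); // every byte matched
    }
    return matches;
}

byte[] data = { 0x00, 0x05, 0x80, 0x5A, 0x6C, 0x11, 0x05, 0x80, 0x5A, 0x6C };
byte[] pattern = { 0x05, 0x80, 0x5A, 0x6C };
foreach (int offset in FindAll(data, pattern))
    Console.WriteLine("0x" + offset.ToString("X")); // prints 0x1, then 0x6
```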


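Look at calling Array.IndexOf(haystack, needle[0], offset) repeatedly to jump from one occurrence of the needle's first byte to the next, and consider needle[1..].SequenceEqual(haystack[(offset + 1)..(offset + needle.Length)]) for checking whether the successive bytes are equal, if you aren't going to roll your own with a pair of nested loops (marginally more efficient). Only do that check when offset + needle.Length <= haystack.Length; otherwise the range runs past the end of the array and throws.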
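A sketch of the Array.IndexOf plus SequenceEqual approach, as a top-level program (C# 9 or later; the ranges need C# 8). FindAllWithIndexOf is a hypothetical name. Limiting the IndexOf count to the last position where the needle still fits keeps the range slices in bounds.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static List<int> FindAllWithIndexOf(byte[] haystack, byte[] needle)
{
    var matches = new List<int>();
    if (needle.Length == 0) return matches;

    byte[] rest = needle[1..];                  // the needle minus its first byte (a copy)
    int last = haystack.Length - needle.Length; // last start position where the needle still fits
    int pos = 0;
    while (pos <= last)
    {
        // Next occurrence of the needle's first byte within [pos, last].
        pos = Array.IndexOf(haystack, needle[0], pos, last - pos + 1);
        if (pos < 0) break;

        // Compare the following bytes; on arrays the range creates a copy.
        if (rest.SequenceEqual(haystack[(pos + 1)..(pos + needle.Length)]))
            matches.Add(pos);

        pos++; // resume one byte later so overlapping matches are found too
    }
    return matches;
}

byte[] data = { 0x00, 0x05, 0x80, 0x5A, 0x6C, 0x11, 0x05, 0x80, 0x5A, 0x6C };
byte[] pattern = { 0x05, 0x80, 0x5A, 0x6C };
Console.WriteLine(string.Join(", ", FindAllWithIndexOf(data, pattern)));
```

Because each range slice on an array allocates a new array, comparing spans (for example with AsSpan) avoids those copies if it matters.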

Of course, a quick Google for "find one byte array inside another" brings up everyone's favorite site (Find an array (byte[]) inside another array?), the top answer being the "pair of nested loops" approach. You can adapt that answer to accept an offset to start searching from by adding a parameter that is used to initialize i, rather than starting i at 0. Then you can call the search repeatedly, passing in the last offset found, so that each search begins from a location within the array instead of from the start.
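A sketch in that nested-loop style with a start parameter (this is not the code from the linked answer; Search is a hypothetical name). Written as a top-level program (C# 9 or later).

```csharp
using System;

// Returns the index of the first match at or after `start`, or -1 if there is none.
static int Search(byte[] haystack, byte[] needle, int start)
{
    if (needle.Length == 0) throw new ArgumentException("Needle must not be empty.", nameof(needle));

    for (int i = start; i <= haystack.Length - needle.Length; i++)
    {
        bool match = true;
        for (int j = 0; j < needle.Length; j++)
        {
            if (haystack[i + j] != needle[j]) { match = false; break; }
        }
        if (match) return i;
    }
    return -1;
}

byte[] data = { 0x00, 0x05, 0x80, 0x5A, 0x6C, 0x11, 0x05, 0x80, 0x5A, 0x6C };
byte[] pattern = { 0x05, 0x80, 0x5A, 0x6C };

int offset = Search(data, pattern, 0);
while (offset >= 0)
{
    Console.WriteLine("0x" + offset.ToString("X"));
    offset = Search(data, pattern, offset + 1); // resume just after the last hit
}
```

Resuming at offset + 1 also finds overlapping matches; resuming at offset + pattern.Length would skip them.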

If you want to boost performance further, look at "Boyer-Moore-Horspool Algorithm for All Matches (Find Byte array inside Byte array)".
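For reference, here is a compact sketch of the Boyer-Moore-Horspool idea for finding all matches (not the code from that article; FindAllHorspool is a hypothetical name). It compares the needle right to left, and after each attempt it shifts by an amount looked up from the haystack byte aligned with the needle's last byte. Written as a top-level program (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;

static List<int> FindAllHorspool(byte[] haystack, byte[] needle)
{
    var matches = new List<int>();
    int n = needle.Length;
    if (n == 0 || n > haystack.Length) return matches;

    // Bad-character table: bytes not in needle[0..n-2] allow a full shift of n;
    // otherwise shift so the last such occurrence lines up with the window's end.
    var shift = new int[256];
    Array.Fill(shift, n);
    for (int k = 0; k < n - 1; k++)
        shift[needle[k]] = n - 1 - k;

    int i = 0;
    while (i <= haystack.Length - n)
    {
        int j = n - 1;
        while (j >= 0 && haystack[i + j] == needle[j]) j--; // compare right to left
        if (j < 0) matches.Add(i);                          // full match at i
        i += shift[haystack[i + n - 1]];                    // always at least 1
    }
    return matches;
}

byte[] data = { 0x00, 0x05, 0x80, 0x5A, 0x6C, 0x11, 0x05, 0x80, 0x5A, 0x6C };
byte[] pattern = { 0x05, 0x80, 0x5A, 0x6C };
Console.WriteLine(string.Join(", ", FindAllHorspool(data, pattern)));
```

The shift table is built once per pattern, so if the patterns come from a list, building one table per pattern and reusing it across files is the natural fit.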
 
It's actually much worse. Recall that each character in .NET is a UTF-16 code unit, which is 16 bits, i.e. 2 bytes per character. So 1MB of bytes becomes a roughly 6MB string (3 characters per byte, 2 bytes per character).
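The arithmetic can be checked directly; this top-level program (C# 9 or later) counts only the character data and ignores per-object overhead.

```csharp
using System;

byte[] oneMegabyte = new byte[1024 * 1024];
string withHyphens = BitConverter.ToString(oneMegabyte);   // "00-00-...": 3 chars per byte, minus the final hyphen
string noHyphens = withHyphens.Replace("-", string.Empty); // 2 chars per byte

// Each .NET char is a 16-bit UTF-16 code unit, so sizeof(char) is 2.
Console.WriteLine($"bytes: {oneMegabyte.Length}");
Console.WriteLine($"chars with hyphens: {withHyphens.Length} (~{withHyphens.Length * sizeof(char)} bytes)");
Console.WriteLine($"chars without hyphens: {noHyphens.Length} (~{noHyphens.Length * sizeof(char)} bytes)");
```

Both strings exist in memory at the same time during the Replace call, on top of the original byte array.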