Optimizing Memory Usage when Processing Data in C#


Jul 12, 2024
Programming Experience
Hello everyone, I'm currently tasked with processing 10 text files, each containing 500,000 lines. Each line consists of 10 keywords separated by ';'. My job is to read these files and find the top 10 most frequently appearing keywords across all files (essentially the top 10 keywords out of 50 million keywords).

A strict requirement is to limit memory usage to no more than 1GB. I've tried using StreamReader to read the files and storing counts of each keyword in a Dictionary, but this approach doesn't meet the memory requirement. Is there a way to reduce the memory footprint for this task? Thank you.
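For reference, the baseline described above (read line by line, count each keyword in a Dictionary) might look roughly like the sketch below. The class and method names are hypothetical, and it assumes keywords are compared case-sensitively and that empty fields can be ignored; the original code was not posted.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class KeywordCounter
{
    // Reads one line at a time, so only the current line and the
    // dictionary of distinct keywords are held in memory.
    public static void CountKeywords(TextReader reader, Dictionary<string, int> counts)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            foreach (string keyword in line.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries))
            {
                // TryGetValue leaves 'current' at 0 when the keyword is new.
                counts.TryGetValue(keyword, out int current);
                counts[keyword] = current + 1;
            }
        }
    }

    // Returns the n most frequent keywords, highest count first.
    public static List<KeyValuePair<string, int>> TopN(Dictionary<string, int> counts, int n)
    {
        return counts.OrderByDescending(pair => pair.Value).Take(n).ToList();
    }
}
```

To run it over the files, open each one with `new StreamReader(path)` inside a `using` block, pass the reader to `CountKeywords` with the same dictionary each time, and then call `TopN(counts, 10)`. Memory use then depends mostly on how many distinct keywords there are, not on the total of 50 million.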
When hashing, can the hash codes collide?

If you know your incoming data intimately, you can design a hashing function that provides "perfect hashing", where there are no collisions. In general, though, most generic hashing algorithms will have collisions. Normally, the more bits available in the resulting hash, the lower the probability of a collision (assuming a well-designed hashing algorithm).
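To put rough numbers on the effect of hash width, here is a small sketch using the standard birthday-bound approximation, p ≈ 1 - exp(-n(n-1) / 2^(b+1)), for n distinct keys and a b-bit hash. It assumes an ideal hash that spreads keys uniformly; the figure of one million distinct keywords is made up for illustration and does not come from the original question.

```csharp
using System;

public static class HashCollisionEstimate
{
    // Approximate probability that at least two of 'distinctKeys' keys
    // share a hash value, for an ideal uniform hash with 'bits' output bits.
    public static double CollisionProbability(double distinctKeys, int bits)
    {
        double buckets = Math.Pow(2.0, bits);
        double x = distinctKeys * (distinctKeys - 1.0) / (2.0 * buckets);
        // For very small x, 1 - exp(-x) loses precision in double arithmetic,
        // so fall back to the first-order approximation 1 - exp(-x) ~ x.
        return x < 1e-6 ? x : 1.0 - Math.Exp(-x);
    }

    public static void Main()
    {
        double n = 1000000; // hypothetical number of distinct keywords
        foreach (int bits in new[] { 32, 64, 128 })
        {
            Console.WriteLine($"{bits}-bit hash: p is about {CollisionProbability(n, bits):E2}");
        }
    }
}
```

Under these assumptions a 32-bit hash is almost certain to have at least one collision among a million keys, while a 64-bit hash makes it very unlikely. Note that `Dictionary<string, int>` stays correct even when hash codes collide, because it compares the keys themselves with `Equals`; collisions only matter for correctness if you store the hash in place of the keyword.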