Question How would you compare an excel with data in the database table?

raysefo

Well-known member
Joined
Feb 22, 2019
Messages
361
Programming Experience
10+
Hello friends,

I am working on a Blazor Server application. I would like to upload an excel file, consisting of a couple of columns. (ReferenceId-guid, Order Date-datetime, Product Code-string, and Product Description-string) This excel file roughly has 200000 rows of data.
I am gonna use EPPlus non-commercial package. I will compare the referenceId's from the excel with the database but there might be duplicate referenceId's. So I am somehow grouping those referenceId's in excel and then iterating it comparing them with the database records.
One last requirement is to write down both matching and nonmatching referenceId's in an excel file. I think when comparing the referenceId's adding those values into two separate lists (matching, non-matching) and then writing those records inside of those lists in excel with EPPlus will solve my problem.

Can I have your comments about this solution? Can it be improved?

Thank you.
 
How many duplicates do you expect to find in the Excel file? The overhead of sorting and grouping may not be worth the cost of doing a few extra duplicate queries against the database. Recall that databases are designed to return queries quickly.

On the flip side, there maybe some potential gains form sorting and grouping the Excel data, and then asking for the data from the database also in sorted order. The comparisons will just end up as if you were doing a merge sort where the Excel data is one bin, and the database is the other bin.

The only true way to find out is to do some profiling on your program to see which approach gives you better efficiency.

Why an excel file for the output file of matching and non-matching reference ids? Do you expect to do any formatting on them? If not, a CSV file should be good enough since Excel is usually setup to be the default viewer for CSV files on most machines.
 
Ugh, this is awkward in Blazor server because there isn't a good way to stream an Excel file, so you end up loading it all into memory. Converting it to CSV could be helpful. Deduping it on the fly would be helpful too, and by that I mean deduping it against itself before approaching the db to see if records have been seen before

Also, I'm not sure if I'd use EPPlus just as a reader. It's kinda like buying a Ferrari when you need a car to go to the shop once a week. EPP is great at making excels, but your use case looks limited to needing a lightweight, fast, reader only.
 
Take a look at borisdj's EF core bulk extensions. They're good for loading a large amount of data into a db, upsert style (create or update depending if it exists or not)

The last project I had for big volume data was looking at ECG readings, and my Blazor secret app would stream a CSV uniquely into a DB at about 75,000 records a second which was reasonable for the size of the datasets
 
Back
Top Bottom