Interaction with Excel and SQL: what is the quickest and most robust approach?

SharpDunce

New member
Joined
Sep 10, 2014
Messages
2
Programming Experience
Beginner
Hello all,

I'm interning at a company and I've been tasked with writing a command-line C# program that:
- Takes in data from two Excel files
- Converts it to a SQL-friendly format after error checking and standardizing
- Checks the database for changes, removals and new additions
- Consolidates the information into the server in one transaction

It has to be relatively quick and robust (error checking, crash-free, etc.). The Excel sheets will have a few thousand entries (under 5,000) that will mostly remain the same, with changes and additions here and there.

I've spent most of the day researching, and using the ADO.NET libraries seems like a good way to do it. I made a test program that converted a simple Excel table to a DataTable object and uploaded it to a table I made in the SQL database, but it was very simple and had no error checking or validation.
What is the best approach in your opinion? I'm pretty inexperienced with software development, and I haven't done anything past the regular class assignments.
 
Using ADO.NET to read the Excel data is certainly a viable option. It does have a few gotchas though, e.g. it only uses the first 8 rows to make a best guess at the data type in each column. That can be a problem if the data is mixed or empty in those rows.
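For example, a minimal sketch of reading a sheet into a DataTable with the ACE OLE DB provider (the file path and sheet name are placeholders; IMEX=1 makes the driver treat mixed-type columns as text instead of guessing from those first rows):

using System.Data;
using System.Data.OleDb;

// A minimal sketch, assuming the ACE OLE DB provider is installed and the
// data is on a sheet named "Sheet1". File path and sheet name are placeholders.
string connStr = @"Provider=Microsoft.ACE.OLEDB.12.0;" +
                 @"Data Source=C:\data\input.xlsx;" +
                 @"Extended Properties=""Excel 12.0 Xml;HDR=YES;IMEX=1""";

var excelTable = new DataTable();
using (var conn = new OleDbConnection(connStr))
using (var adapter = new OleDbDataAdapter("SELECT * FROM [Sheet1$]", conn))
{
    adapter.Fill(excelTable); // Fill opens and closes the connection itself
}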

The specifics of the validation would depend on the specifics of the data and the business rules. You're basically going to need to write a method that returns a Boolean for each rule and then loop through the rows in the table and call each method. If a method returns False then validation fails. What to do then is also up to you, e.g. fail the whole file or just that row.
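As a rough sketch of that pattern (the column names and rules here are invented for illustration):

using System;
using System.Collections.Generic;
using System.Data;

// One method per business rule, each returning a Boolean.
static bool HasPartNumber(DataRow row)
{
    return !string.IsNullOrWhiteSpace(row.Field<string>("PartNumber"));
}

static bool HasValidQuantity(DataRow row)
{
    int quantity;
    return int.TryParse(row.Field<string>("Quantity"), out quantity) && quantity >= 0;
}

// Loop through the rows and collect every failure; an empty list means the file passed.
static List<string> Validate(DataTable table)
{
    var errors = new List<string>();
    for (int i = 0; i < table.Rows.Count; i++)
    {
        DataRow row = table.Rows[i];
        if (!HasPartNumber(row))
            errors.Add(string.Format("Row {0}: missing part number", i + 1));
        if (!HasValidQuantity(row))
            errors.Add(string.Format("Row {0}: bad quantity", i + 1));
    }
    return errors;
}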

To save the data you have two options:

1. Retrieve all the data from SQL Server into a DataTable, make changes to that DataTable based on the contents of the other DataTable and then save back to SQL Server.

2. Loop through the rows of the original DataTable and make changes directly to the database for each one.

The first option has the overhead of retrieving all the data in the first place, but would otherwise be more efficient.
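A rough sketch of option 1 (the table name and connection string are placeholders, and the table needs a primary key for the command builder to work), using a SqlDataAdapter so all the changes are saved back in one call:

using System.Data;
using System.Data.SqlClient;

using (var conn = new SqlConnection("YourConnectionStringHere"))
using (var adapter = new SqlDataAdapter("SELECT * FROM Parts", conn))
using (var builder = new SqlCommandBuilder(adapter))
{
    adapter.MissingSchemaAction = MissingSchemaAction.AddWithKey; // bring key info over

    var dbTable = new DataTable();
    adapter.Fill(dbTable);

    // ... compare with the Excel DataTable and apply changes to dbTable here ...

    adapter.Update(dbTable); // command builder generates INSERT/UPDATE/DELETE
}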
 
In C#
(depending on the structure, 1 DataTable could be enough):
1 DataTable for Excel 1
1 DataTable for Excel 2

Data validation on the DataTable(s) after filling them from Excel, OR on the Excel data before inserting it into the DataTable(s).

This step should take only a few milliseconds.

After the data validation you can BulkCopy the data of the DataTable to your SQL Server (http://msdn.microsoft.com/de-de/library/ex21zs8x(v=vs.110).aspx).
This is by far the fastest way to write mass data to the server (120 MB can be written within 2 seconds or faster, depending on your network).
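A bare-bones sketch of that step (destination table name and connection string are placeholders; the import table's columns are assumed to match the DataTable's):

using System.Data;
using System.Data.SqlClient;

using (var conn = new SqlConnection("YourConnectionStringHere"))
{
    conn.Open();
    using (var bulk = new SqlBulkCopy(conn))
    {
        bulk.DestinationTableName = "ImportTable";
        bulk.WriteToServer(excelTable); // "excelTable" is the validated DataTable
    }
}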

If you need to validate the Excel data against the SQL data (deleted records are maybe a bit hard to find, but changes and new records are simple), you should, in the BulkCopy step above, insert the data into a temp or import table. If you want to check the data yourself by selecting from a table, use an import table that you truncate right before the BulkCopy step.

After that you could execute a MERGE on the server:
SqlCommand myCommand = new SqlCommand("YourMergeStatementGoesHere", conn);
myCommand.ExecuteNonQuery(); (http://msdn.microsoft.com/de-de/lib...nt.sqlcommand.executenonquery(v=vs.110).aspx)

If you need to track changes within the database (updated and inserted data), take a look at MERGE (Transact-SQL), which explains the MERGE query
(and pay attention to the OUTPUT clause, which you can use to track the changes).
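For illustration, a merge run from C# could look something like this (all table and column names are invented; the OUTPUT clause returns one row per insert/update/delete so you can see exactly what changed):

using System;
using System.Data.SqlClient;

const string mergeSql = @"
MERGE dbo.Parts AS target
USING dbo.ImportTable AS source
    ON target.PartNumber = source.PartNumber
WHEN MATCHED AND target.Description <> source.Description THEN
    UPDATE SET Description = source.Description
WHEN NOT MATCHED BY TARGET THEN
    INSERT (PartNumber, Description)
    VALUES (source.PartNumber, source.Description)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE
OUTPUT $action, inserted.PartNumber, deleted.PartNumber;";

using (var conn = new SqlConnection("YourConnectionStringHere"))
using (var cmd = new SqlCommand(mergeSql, conn))
{
    conn.Open();
    using (SqlDataReader reader = cmd.ExecuteReader()) // OUTPUT rows come back as a result set
    {
        while (reader.Read())
            Console.WriteLine("{0} {1} {2}", reader[0], reader[1], reader[2]);
    }
}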

All in all (depending on how much data is already in the SQL table), these steps should take less than 2 seconds for that small amount of data.

I'm doing it that way (including a more complex data validation) for hundreds of text files each day.

So, long story short:
- Excel -> DataTable
- DataTable -> validation
- Truncate the import table on SQL Server
- BulkCopy the DataTable to the SQL Server import table
- MERGE the import table into the large SQL table with the current data


But there is an even faster / better way: SSIS packages (no need for a console app; it runs completely on your server).
We get gigabytes delivered in text format (it doesn't matter whether it is text or Excel, you can use both as a data source in SSIS) and import them with SSIS packages within seconds,
completely automated, and it takes about the same time to write an SSIS package as it takes to write the console app.
But it uses the advantages of SQL Server much better than .NET (though it is harder to debug, in my personal opinion).
 
Thanks guys, I appreciate it. I'll look into all of that and get into implementing soon.

I forgot to mention above that the way I pushed my C# DataTable to SQL was with SqlBulkCopy. It seemed like the best way to do all the actions in one go, and its constructor can also take a transaction object so I can ensure the copy is atomic (one of the requirements).
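Roughly the shape of what I mean (connection string and table name are placeholders):

using System.Data;
using System.Data.SqlClient;

using (var conn = new SqlConnection("YourConnectionStringHere"))
{
    conn.Open();
    using (SqlTransaction tran = conn.BeginTransaction())
    {
        try
        {
            using (var bulk = new SqlBulkCopy(conn, SqlBulkCopyOptions.Default, tran))
            {
                bulk.DestinationTableName = "ImportTable";
                bulk.WriteToServer(excelTable);
            }
            tran.Commit(); // all rows go in, or none do
        }
        catch
        {
            tran.Rollback();
            throw;
        }
    }
}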

I also need to log errors (for example: missing or badly formatted column values in the Excel files). Is there an easy-to-use library I can use? I've never done any kind of logging before, again because thus far my university's assignments have been fairly disjointed and I didn't get much exposure to more comprehensive software development.
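Something as bare-bones as this would probably do while I look (the log path is just a placeholder), but a proper library would be nicer:

using System;
using System.IO;

static void LogError(string message)
{
    // Append a timestamped line to a plain text file; path is made up.
    string line = string.Format("{0:yyyy-MM-dd HH:mm:ss} ERROR {1}{2}",
                                DateTime.Now, message, Environment.NewLine);
    File.AppendAllText(@"C:\logs\import.log", line);
}

// e.g. LogError("Row 12: missing part number");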

EDIT: The data will have about 10-11 columns, all of them strings.
 