oledb is slow.

Anwind

Well-known member
Joined
Nov 6, 2021
Messages
48
Programming Experience
1-3
hello

CSV File size : 500MB
The CSV file is read using the above code method.
oledb is slow.
Is there any way to speed it up?
Please teach me.
 
Last edited:
OLEDB is just a standard interface for different databases. As a standard, there various native database engines needs to convert data to data that confirms to the interface. It is not OLEDB that is slow but rather the database engine that is implementing the OLEDB interface that is slow.

What database are you using? The ACE or the JET engine?

What non-OLEDB database driver are you comparing with?

I hope you realize that anything backed by a CSV, the plaintext file needs to parsed to determine the rows and columns values, then try to detect the type of data for that column, then convert the column your to that massive type for each row, as the rows are being returned. Compare that to a real database where the column values are already in native types, and already stored in rows.

Also, by chance are you writing it to the console within your while loop in the code above? If so, the console is slow. Write out to a log file using a real logging system. Do not kludge together an ad hoc logging system where a file keeps on being opened and closed per write -- you would only barely be faster than the console.
 
Also, did you try out the Sylvan CSV data reader linked to by @cjard ? It has a very good reputation for being very very fast for a CSV reader.
 
I have to read a csv using System.Data.OleDb.
Why?

In your initial threads, you only said that you needed DbDataReader. My understanding is that the Sylvan CSV data reader, can present itself as a DbDataReader for those want to use ADO.NET code.

And then later you revealed that you need to be able to use a SELECT statement, but it turns out that you only needed that SELECT statement because you needed to get row numbers. We showed you in those threads that you can generate the row numbers yourself.

Again, tell us what is forcing the need for OLEDB?
 
Is there any way to speed up the read speed?
It is not OLEDB that is slow. It is the ACE database driver exposing an OLEDB interface that is slow. You are basically at the mercy of Microsoft. Considering that they abandoned ACE and went with JET, you'll likely not get much help there. The ACE database driver was written primarily for use with Access databases. It was only tangentially, expanded by the Excel development team to also accept a few other data formats so that data imports could be performed. It (as well as JET) was not meant to be a way for you to use a CSV files (or Excel files) as a full blown database.
 
I have to read a csv using OleDb.
The number of columns in the csv file is 43. All are text type.
Again, why do you need to use OLEDB? If all the column types are text, then the TextFieldParser should be all you need, since you generate your own row numbers anyway.

So in saying that all are type text, and in your other thread you said that you have an ID column, so the data in that column is not a numeric value. So that column can be sorted as text, and not as a number. That makes implement using a TextFieldParser even easier.

In pseudo code:
C#:
var parser = new TextFieldParser(csvFile)
{
    parser.TextFieldType = FieldType.Delimited;
    parser.Delimiters = new[] { "," };
    parser.CommentTokens = new[] { "#" };
};

// Read the first row to find out the ID column so that we can sort later
var headers = parser.ReadFields();
var idColumn = headers.IndexOf("ID");

var sortedRowsWithRowNumber = 
        GetRows(parser)
            .OrderBy(row => row[idColumn])
            .Select((row, index) => new { RowNumber = index + 1, Data = row})
            .ToList();

foreach(var row with sortedRowsWithRowNumber)
{
    // Print out row number and data in column 13 as example of using the data
    Console.WriteLine($"{row.RowNumber} : {row.Data[12]})
}

C#:
<IEnumerable<string[]> GetRows(TextFieldParser parser)
{
    while (!parser.EndOfData)
        yield return parser.ReadFields();
}
 
The JET driver is only available when running 32-bit. That is why you are getting that error about the driver not being registered.
 
I am migrating the existing sqlite( System.Data.SQLite.dll ) to oledb( System.Data.Oledb.dll ).
This is linked to System.Data.Common (System.Data.SQLite.dll & System.Data.Oledb.dll).
But why are you migrating?
 
The JET driver is only available when running 32-bit. That is why you are getting that error about the driver not being registered.
My PC : Windows10 (64bit)
Visual Studio2019, C#.NET compile AnyCPU & 32bit
No matter how much I tried to install the JET driver, it didn't install.
 
My PC : Windows10 (64bit)
Visual Studio2019, C#.NET compile AnyCPU & 32bit
No matter how much I tried to install the JET driver, it didn't install.
There is no need to install it. It comes built into .NET Framework. You need to make sure that you are building for 32-bit. Show us a screenshot of your build options page.
 
I have to do a code migration.
So you are migrating the code. Why are you migrating the data from a perfectly good database to a CSV file?
 
And if you are migrating the data from SQLite because of "performance" why would you migrate to something that have even worse performance. If the performance your are seeking is "memory efficiency" or "CPU usage", do a bit more research. There is a reason why SQLite is the database of choice for small devices like phones which have limited memory and CPU power (beyond the zero configuration features).
 
Is there any way to speed it up?
Other than moving the .CSV file to a faster drive, turning off all antivirus, and upgrading your CPU, no.

Does switching to a JET driver speed it up?
No.

Correction to my post #6 above: The JET driver came first. It was the ACE driver that followed afterwards. (Notice that I'm not going back and changing my previous posts or deleting them like someone seems to love doing.)

I still don't understand why doing a code migration is forcing a change from using SQLite to OLEDB. You can migrate the code and still keep the same database engine.
 
Last edited:
Or does the destination have to even be OLEDB common interface? Can it be the ODBC common interface? I believe that there is an SQLite ODBC driver.

Or is the concern about using the original SQLite database wrapper? You could use the Microsoft SQLite driver instead.
 
Your Answer) Other than moving the .CSV file to a faster drive
==> what is that drive?
"Drive" as in hard drive (HD) or solid state drive (SSD).
 
Back
Top Bottom