Click here to Skip to main content
15,886,110 members
Please Sign up or sign in to vote.
0.00/5 (No votes)
I often have to import data from multiple sources, including text and XML files. Sometimes I ask myself, why is the SQL server so extremely slow? Currently, a SQL bulkcopy command is freezing my application after a few seconds (at about 600 data sets out of 15,000! 11MB csv file), while loading, parsing and converting the data into a dataset consumes so little time, that you do not even notice. The only bottleneck is the sql bulkcopy command. Annoying! Of course, XML databases are not a substitute for relational databases, which also have certain advantages. SQL Servers can not be dismissed - think of the transaction concept, standardized query syntax, stored procedures and other advanced / specialized features. But, considering the performance of a non-tuned / optimized SQL Server compared to reading a csv- file, seems like a bad a joke to me. On the local file system, XML databases perform much better - despite the huge XML syntax overhead. The usage of simple text files, accessed over SAN, exceeds any DB Solution in reaction and transaction speed and size, as well as total memory usage over time (my experiences and I am talking about a dumb and non- cached solution).

So my question to you would be, what du you say on the subject, what alternatives do you use (Apache Derby, Berkeley, in-memory and XML, IBM DB2 pureXML, etc.) and has anyone ever done some test on this. Is there a .net Database with full Entity- Framework support out there?

OK, guys, I am glad hearing from you and get it going...
Posted
Updated 11-Oct-16 11:57am
v3

The problem generally isn't that SQL Server is a poor choice for bulk inserts, the problem actually lies in having to rebalance the indexes after each record is inserted. The more indexes you have, the more work the database engine has to do to cope with it. A common solution to this is to drop the indexes, bulk upload the data and then reapply the indexes - recreating the indexes once is much faster.

This problem doesn't just lie with SQL Server, just doing a bulk upload into any of the big Indexed databases has the same effect.
 
Share this answer
 
Comments
Oliver Bleckmann 23-May-12 6:07am    
Well, as I said, "a non-tuned / optimized SQL Server", means no db index involved here and, neither text files are cached, nor paged, fragmented or presorted!
 
Share this answer
 
I think that you have stopped investigating the problem too early. As Pete pointed out, there are factors that affect the overall performance. One is the indexing, of course the network may cause delay, data type conversions (especially implicit) and so on.

So my advice wouldn't be to change the product before you dig deeper and find out the root cause. Also instead of using bulk copy, the performance may (or may not) be better if you use type definition instead. For that, have a look at CREATE TYPE[^] or for an example How to pass multiple records to a Stored Procedure[^]
 
Share this answer
 
Well, I did some Improvements on my code, et voila, the bulkcopy routine (without changes!) runs faster than ever (about 6 seconds for 11 MB data in 15000 rows)! Conclusion: Update your code bevore complaining ;-)
Seriously, that did not change my mind. SQL performance is poor, hence the insertion takes 6 seconds, but loading and parsing the csv takes no time at all.

1. Use OleDb over ODBC like:
// cahnage 
OdbcConnection, OdbcCommand, OdbcDataReader 
// to -> 
OleDbConnection, OleDbCommand, OleDbDataReader 
// and so on...

2. Use ACE over Jet for 64 bit support - install from redistributable if neccessary, search for AccessDatabaseEngine_X64 for 64 bit version
3. and coose the right connection string like:
// not this connection string is for csv files, a schema.ini may be needed for correct results
@"Provider=Microsoft.ACE.OLEDB.12.0;Data Source="   CsvDirectory.Trim()   @";Extended Properties=""text;HDR=yes;FMT=Delimited""

C#
// These are depricated and for use with odbc/ jet
"Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq="   CsvDirectory.Trim()   ";Extensions=asc,csv,tab,txt;Persist Security Info=False";
// or
"Driver={Microsoft Access Text Driver (*.txt, *.csv)};Extensions=asc,csv,tab,txt;Persist Security Info=False;Dbq="   CsvDirectory.Trim()  


3. now we are able to take it into x64 architecture
4. I feel like I forgot something...
 
Share this answer
 
Comments
Pete O'Hanlon 23-May-12 10:12am    
Why aren't you using SqlConnection, SqlCommand and SqlDataReader? They are much faster than the more generic command/connection tyeps. Also, why don't you use SqlBulkCopy? http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy.aspx.
Oliver Bleckmann 24-May-12 2:15am    
Well, correct me if I am wrong, but SqlConnection is for SQL Server Connections only, so in my case, there is no chance to connect a csv or text file. Provide me a connection string for csv and I will give it a try. And yes, you maybe right, but elsewhere I do use SqlConnection...
Pete O'Hanlon 24-May-12 4:01am    
My apologies. Part 1 of your list here looked like you were using these for the SQL Server connections. To be honest though - most people wouldn't attempt to bulk upload into SQL Server like this - SQL Server provides bulk copy routines that you would be better off using.
Four years later knowing some other solutions, I can tell that some of the really huge companies ended up writing their own storage strategy to fit their need. That's where solutions like HBase, BigTable, etc. came from. Companies tend to go hybrid or non-relational. The later doesn't mean there no transaction concept involved or cannot be used for critical data. The variety of databases is really frightening and there is no consolidation in sight. In times of IoT and BigData the choice depends rather on an existing infrastructure than features or performance considerations. OMG...
 
Share this answer
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900