Click here to Skip to main content
15,884,879 members
Articles / All Topics

CSV: Carelessly Saved Values

Rate me:
Please Sign up or sign in to vote.
0.00/5 (No votes)
25 Oct 2010Apache2 min read 6.7K   1  
CSV: Carelessly Saved Values

I currently work with some numeric methods that accept large quantities of data as inputs. Normally the input comes from a database. As I was writing the tests, I needed a suitable external format to represent relatively large quantities of data. CSV seemed like a natural choice, because it is widely used and supported by SQL Management Studio and Excel. However, I found CSV very inconvenient, for the following reasons:

  • There is no standard way to export a data table to CSV. CSV files can be read by OLE DB Jet driver, but apparently this driver cannot create new tables/files. Google is full of samples on how to write to CSV files by hand.
  • There is no standard data format. Should the data be included in quotes? How do you handles strings with embedded commas? What is the date format? Apparently, there is no such things as “CSV standard” and everyone does what they please.
  • There is no type information. OLE DB driver can use schema.ini, but it is non-standard and cumbersome to work with. Also, it combines type information for multiple tables. Without explicit type information, OLE DB driver tries to “guess” the types, but often does it wrong. It seems that it looks at the first couple of dozen rows, and makes its decision based on that. If further rows are incompatible, they will suffer. The driver will either complain (“cannot convert string to integer”), or silently truncate doubles to ints, etc., which is even worse.
  • There is no null value. Empty strings are treated as NULLs on read, even if included in quotes. However, SQL Management Studio when exporting NULL values puts literal NULL into the file, which is read back as string.

All this makes CSV format somewhat difficult to use – it is very hard to guarantee that the data read will be exactly equivalent (in type and value) to the data written. I gave up and switched to DataTable.WriteXml() and WriteXmlSchema(), but this leads to much bigger files and is not compatible with anything but .NET. I am still looking for an alternative.

This article was originally posted at http://www.ikriv.com/blog?p=605

License

This article, along with any associated source code and files, is licensed under The Apache License, Version 2.0


Written By
Technical Lead Thomson Reuters
United States United States
Ivan is a hands-on software architect/technical lead working for Thomson Reuters in the New York City area. At present I am mostly building complex multi-threaded WPF application for the financial sector, but I am also interested in cloud computing, web development, mobile development, etc.

Please visit my web site: www.ikriv.com.

Comments and Discussions

 
-- There are no messages in this forum --