Click here to Skip to main content
15,886,689 members
Articles / Programming Languages / C#

Converting text files (CSV) to DataTable

Rate me:
Please Sign up or sign in to vote.
4.85/5 (12 votes)
21 Mar 2017CPOL3 min read 30.1K   1.1K   14   8
An article on converting text files or csv-files to a System.Data.DataSet based on regular expressions

Download textfiledataset_src.zip - 6 KB
Download textfiledataset_demo.zip - 15 KB

Introduction

(Originally posted on 18 Feb 2008)
This code allows you to generate and fill a DataTable from a text file. You are able to:

  • define your own column names or have them created automatically
  • validate data with regular expressions
  • use RegexColumnBuilder to construct the DataTable with no regular expression knowledge
  • specify columntypes for the resulting DataTable, not just the type string
  • process any kind of text file, not just delimited files
  • have a collection of misread lines of data
  • process large files

Background

Every now and then I need to read some text file for importing data. Most of the times what I really need is a dataset with that data. But how to convert that text file to a dataset quickly? There are several articles here on codeproject that describe how to convert a csv file or a text file to a database or dataset (e.g. FinalCSVReader, DataSetFrmDelimTxt). But none of them were flexible enough, or offered data-validation.

Using the code

Getting started

To get started right away let us assume that we have a delimited file birthday.txt with the following content from which we want to create a DataTable:

1,Chris,12-07-1972
2,Dave,03-01-1974
3,John,03-19-1980,Drummer
4,Mark,12-02-1980
5,Eric,09-18-1981

A quick way to get the DataSet would be:

C#
// open the file 
FileStream fileStream = new FileStream("birthday.txt", FileMode.Open, FileAccess.Read);
// create an instance of MyTextFileDataSet
TextFileDataSet MyTextFileDataSet = new TextFileDataSet();
// specify the regular expression for validating and recognising columns
MyTextFileDataSet.ContentExpression = new Regex("^(?<ID>[^,]+),(?<Name>[^,]+),(?<Date>[^,]+)$");
// fill the dataset
MyTextFileDataSet.Fill(fileStream);
// close the file
fileStream .Close();

The resulting MyTextFileDataSet will contain one DataTable with three DataColumns: ID, Name and Date. Ofcourse it will be filled with four rows of data. Four? Yes, four rows. Row number 3 is not valid according to the regular expression. You can find this row in the property MyTextFileDataSet.MisReads. This property is of type List<string>.

Wondering where the column names came from, just take a look at the used regular expression:

C#
^(?<ID>[^,]+),(?<Name>[^,]+),(?<Date>[^,]+)$

You can see that the names for the DataColumns are provided. If you did not specify the names but made your expression something like ^([^,]+),([^,]+),([^,]+)$ The names of the columns would be 1,2 and 3.

Without regular expressions

If you are not familiar with regular expressions another approach is implemented to define the columns.
This is achieved by using the RegexColumnBuilder. Take a look at this sample:

C#
RegexColumnBuilder MyColumBuilder = new RegexColumnBuilder();
MyColumBuilder.AddColumn("ID", ',');
MyColumBuilder.AddColumn("DATE", ',');
MyColumBuilder.AddColumn("NAME", ',');
Regex MyRegex = MyColumBuilder.CreateRegularExpression();

The MyRegex can be used to put in the MyTextFileDataSet.RegularExpression, or just place the complete RegexColumnBuilder in the TextFileDataSet.

MyTextFileDataSet.ColumnBuilder = MyColumnBuilder;

Specify column type

In the previous sample a column is defined with only a suitable delimiter. At this time there are four defined types available: INT, DOUBLE, DATE and STRING. The way to use these is:

MyColumBuilder.AddColumn("ID", ',', RegexColumnType.INTEGER);
MyColumBuilder.AddColumn("DATE", ',', RegexColumnType.DATE);

To have the specified types present in the resulting dataset one must place the MyColumnBuilder in the MyTextFileDataSet.ColumnBuilder property, otherwise the specifications are always of type string.

Large files

TextFileDataSet can handle large files because internally it is using System.IO.Stream as way of input. This also has the extra benefit of not having to have the file on disk per se.

Points of Interest

The TextFileDataSet inherits the System.Data.DataSet, so anything you can do with a System.Data.DataSet you can do with TextFileDataSet.

If you are not familiar with regular expressions, check out these articles and get started:

You can also get this code on github : https://github.com/cvanbergen/TextFileDataSet

 

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer (Senior) NoNerds
Netherlands Netherlands
This member has not yet provided a Biography. Assume it's interesting and varied, and probably something to do with programming.

Comments and Discussions

 
GeneralMy vote of 4 Pin
Jim_Snyder22-Mar-17 9:44
professionalJim_Snyder22-Mar-17 9:44 
GeneralRe: My vote of 4 Pin
Christiaan van Bergen22-Mar-17 10:59
professionalChristiaan van Bergen22-Mar-17 10:59 
GeneralRe: My vote of 4 Pin
Jim_Snyder23-Mar-17 2:50
professionalJim_Snyder23-Mar-17 2:50 
GeneralRe: My vote of 4 Pin
Christiaan van Bergen23-Mar-17 3:02
professionalChristiaan van Bergen23-Mar-17 3:02 
GeneralMy vote of 5 Pin
kevinr18821-Mar-17 14:01
professionalkevinr18821-Mar-17 14:01 
GeneralRe: My vote of 5 Pin
Christiaan van Bergen21-Mar-17 22:48
professionalChristiaan van Bergen21-Mar-17 22:48 
Generalconvert to DataTable, not to Dataset Pin
Mr.PoorEnglish20-Mar-17 10:56
Mr.PoorEnglish20-Mar-17 10:56 
GeneralRe: convert to DataTable, not to Dataset Pin
Christiaan van Bergen20-Mar-17 23:38
professionalChristiaan van Bergen20-Mar-17 23:38 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.