Using LINQ with CSV Files

In my last post, I wrote how you can convert a CSV to XML. I stated that the reason that you would do this is so that you can use LINQ on CSV files.

Using my CsvToXml class you can easily use LINQ on CSV. You can see the source for CsvToXml in my C# CommonLib library.

Let’s assume your CSV file looks like the following:

'Product','Price','DateStocked'
'Pepsi','4.50','2010-05-04'
'Coke','3.00','2010-09-22'
'Cheetos','7.25','2009-01-13'

You can then convert like so:

var csv2xml = new CsvToXml("mycsvfile.csv", true);
csv2xml.TextQualifier = '\'';
csv2xml.Convert();

You can then do cool LINQ queries using the “Records” property of the CsvToXml class:

var records = from rec in csv2xml.Records
			where (decimal)rec.Element("Price") > 3.5m
			orderby (string)rec.Element("Product")
			select rec;

or use LINQ like…

var records = from rec in csv2xml.Records
			where Convert.ToDateTime((string)rec.Element("DateStocked")) > new DateTime(2010, 9, 1)
			orderby (string)rec.Element("Product")
			select rec;

Then to access the data of the returned “records” object you can do this:

var myrec = records.ElementAt(0);
var firstProduct = (string)myrec.Element("Product");

Cool, huh? But we can do better. It’s annoying to cast your objects all of the time and to call Element(“CSV_COLUMN_NAME”). Why not take advantage of .NET 4.0 and use its ‘dynamic’ data type? Admittedly, I got this idea from this article: Creating Wrappers with DynamicObject.

Now we can use LINQ like this:

var records = from rec in csv2xml.DynamicRecords
			where (decimal)rec.Price > 3.5m
			orderby (string)rec.Product
			select rec;
			
//OR

records = from rec in csv2xml.DynamicRecords
		where Convert.ToDateTime((string)rec.DateStocked) > new DateTime(2010, 9, 1)
		orderby (string)rec.Product
		select rec;

A bit better. Now you don’t have to call “Element” method everytime, you still have to annoyingly cast. But, in each CSV column, we already know that the types should be consistent. Let’s define our types up front:

var csv2xml = new CsvToXml("mycsvfile.csv", true);
csv2xml.TextQualifier = '\'';

csv2xml.ColumnTypes.Add("Product", typeof(string));
csv2xml.ColumnTypes.Add("Price", typeof(decimal));
csv2xml.ColumnTypes.Add("DateStocked", typeof(DateTime));
			
csv2xml.Convert();

Now we can use LINQ like this:

records = from rec in csv2xml.DynamicRecords
			where rec.Price > 3.5m
			orderby rec.Product
			select rec;

//OR
records = from rec in csv2xml.DynamicRecords
			where rec.DateStocked > new DateTime(2010, 9, 1)
			orderby rec.Product
			select rec;

This is truly clean code. All you need to do is define your types for each column and you’re set. At this time, the major disadvantage is that you can’t lazily evaluate your CSV. You must read it entirely into memory. This class could be modified for lazy evaluation and then you would be able to use LINQ on large CSV files.

Are you a Git user? Let me help you make project management with Git simple. Checkout Gitpilot.

I hope you’ve enjoyed this. You should read my blog on entrepreneurship: Techneur or follow me on Twitter: @jprichardson.

-JP

Convert a CSV to XML using C#

I wrote a simple class that will help you convert a CSV file to XML. I can hear you asking “that’s nice JP, but why would I want to convert my simple CSV to that messy text format from the early 2000’s?” You would do this because you want to use LINQ with CSV files… that’s why.

Here is the crux of the algorithm:

public void Convert() {
	var tempLines = File.ReadAllLines(this.CsvFile);
	string[] lines = null;
	_columnNames = null;
	
	if (this.HasColumnNames) {
		_columnNames = Csv.RecordSplit(tempLines[0], this.RecordDelimiter, this.TextQualifier);
	
		lines = new string[tempLines.Length - 1];
		Array.Copy(tempLines, 1, lines, 0, lines.Length);
	} else {
		var columnCount = Csv.RecordSplit(tempLines[0], this.RecordDelimiter, this.TextQualifier).Length;
		_columnNames = new string[columnCount];
		for (int x = 0; x < _columnNames.Length; ++x)
			_columnNames[x] = "Column" + (x+1);
	
		lines = tempLines;
	}
	
	this.XmlData = new XElement("Records",
		from line in lines
		let fields = Csv.RecordSplit(line, this.RecordDelimiter, this.TextQualifier)
		select new XElement("Record",
			from fieldData in fields
			let i = fields.ToList().FindIndex(f => f == fieldData)
			select new XElement(_columnNames[i], fieldData)
		)
	);
}

You can look at the full code in my C# CommonLib library. As you can see, the code is really straightforward.

Let’s assume that your CSV file looks like this:

'Product','Price','DateStocked'
'Pepsi','4.50','2010-05-04'
'Coke','3.00','2010-09-22'
'Cheetos','7.25','2009-01-13'

You can then run Csv2Xml like this:

var csv = new CsvToXml(csvFile);
csv.RecordDelimiter = ','; 
csv.TextQualifier = '\'';
csv.HasColumnNames = false;
csv.Convert();

var actualXml = csv.XmlString;

The output XML will look like this:

<?xml version="1.0" encoding="utf-8"?>
<Records>
	<Record>
		<Column1>Pepsi</Column1>
		<Column2>4.50</Column2>
		<Column3>2010-05-04</Column3>
	</Record>
	<Record>
		<Column1>Coke</Column1>
		<Column2>3.00</Column2>
		<Column3>2010-09-22</Column3>
	</Record>
	<Record>
		<Column1>Cheetos</Column1>
		<Column2>7.25</Column2>
		<Column3>2009-01-13</Column3>
	</Record>
</Records>

If you set “HasColumnNames” to “True”, your XML output will look like this:

<?xml version="1.0" encoding="utf-8"?>
<Records>
	<Record>
		<Product>Pepsi</Product>
		<Price>4.50</Price>
		<DateStocked>2010-05-04</DateStocked>
	</Record>
	<Record>
		<Product>Coke</Product>
		<Price>3.00</Price>
		<DateStocked>2010-09-22</DateStocked>
	</Record>
	<Record>
		<Product>Cheetos</Product>
		<Price>7.25</Price>
		<DateStocked>2009-01-13</DateStocked>
	</Record>
</Records>

Are you a Git user? Let me help you make project management with Git simple. Checkout Gitpilot.

Pretty nifty huh? In the next post, I’ll explain how you can use this with LINQ. In the mean time follow me on Twitter (@jprichardson) or read my blog on entrepreneurship: Techneur.

-JP