Insides of LINQ

Tariq A Karim

Jun 17, 2008

CPOL

This article describes LINQ and other related language extensions.

Introduction

Today's programming languages work with various storage structures that hold data on a permanent or temporary basis, such as relational databases, XML, collections, and arrays. These storage structures expose different APIs for manipulating data; e.g., to interact with a relational data source, we use ADO/SQL interfaces, and we use an XML/XPath library for XML data. Some of these storage options expose very rich query APIs, like SQL and XPath, while others, like collections and arrays, provide only a very simple interface. All of these storage options are powerful, yet there is a gap between the way they are queried and the general-purpose programming languages that use them.

LINQ (the result of a long-term research project at Microsoft) provides a unified, yet powerful, interface for manipulating data across different data sources. It exposes a query language, similar to SQL, that works whether the data is stored in arrays, collections, XML, or data sets. Unlike other data access interfaces such as XPath and SQL, where queries are typically passed around as strings, LINQ provides a strongly typed interface. Therefore, the compilers of strongly typed languages like C# can verify that applications using LINQ are type safe.

LINQ is designed to be an extensible technology, and the current release targets relational databases, data sets, collections, arrays, and any object implementing the IEnumerable interface. Along with LINQ, Microsoft also introduced a few language extensions that make LINQ more powerful and easier to use. So, let's take an overview of these language extensions before we really start learning LINQ.

Implicitly Typed Variables and Arrays

In the Microsoft .NET platform, all variables need to be declared and must have a type before they can be used. C# 3.0 lets you declare an implicitly typed variable, where the type of the variable is inferred by the compiler.

public void LetsDeclareSomeImplicitlyTypedVariables()
{
  var myVar1 = 0;
  var myVar2 = true;
  var myVar3 = "Lazy fox jumps over the brown gate";
}

Implicitly typed variables are declared with the keyword var. Note that var does not mean a variant variable as in Visual Basic; the keyword simply asks the compiler to infer the type of the variable from its initial value, and that type is then fixed. In the above example, myVar1 is inferred as an int, myVar2 as a bool, and myVar3 as a string. The CLR knows nothing about var, so if you disassemble code that uses implicitly typed variables, you will see that the C# compiler has substituted the proper type for each of them during compilation. The disassembled version of the above example is as follows:

[Disassembled via Reflector]

public void LetsDeclareSomeImplicitlyTypedVariables()
{
    int    myVar1 = 0;
    bool   myVar2 = true;
    string myVar3 = "Lazy fox jumps over the brown gate";
}

You can define implicitly typed variables for any type as long as the compiler can determine the type of the variable during compilation. All of the following are valid implicitly typed variables:

var myIntegerArray = new int[] { 1, 2, 3, 4, 6, 6, 7, 8 };	Compiler converts to `int[] myIntegerArray = new int[] { 1, 2, 3, 4, 6, 6, 7, 8 };`
var myGenericListOfTypeMyClass = new List<MyClass>();	Compiler converts to `List<MyClass> myGenericListOfTypeMyClass = new List<MyClass>();`
var myClass = new MyClass();	Compiler converts to `MyClass myClass = new MyClass();`
int[] myIntegerArray = new int[] { 1, 2, 3, 4, 6, 6, 7, 8 }; foreach (var item in myIntegerArray) { Console.WriteLine("Item value: {0}", item); }	Compiler converts to `int[] myIntegerArray = new int[] { 1, 2, 3, 4, 6, 6, 7, 8 }; foreach (int item in myIntegerArray) { Console.WriteLine("Item value: {0}", item); }`

Similar to implicitly typed variables, C# 3.0 lets you define an implicitly typed array as follows:

var myArray = new[] { 1, 10, 100, 1000 };

The above statement will be compiled as int[] myArray = new int[] { 1, 10, 100, 1000 };.

There are some restrictions associated with implicitly typed variables, as follows:

Implicitly typed variables can only be declared local to a method, and must be initialized in the same statement that declares them.
It is illegal to initialize an implicitly typed local variable with null alone, because the compiler cannot infer a type from it.
It is illegal to use the var keyword to define return values, parameters, or field data of a type.
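The restrictions above can be sketched in code as follows. This is only an illustration: the class and method names (VarRulesDemo, DescribeInferredTypes) are made up for this example, and the declarations that the compiler would reject are left as comments so that the rest still compiles.

```csharp
using System;
using System.Collections.Generic;

public class VarRulesDemo
{
    // private var counter = 0;             // error: a field cannot be declared with var
    // public var Create() { return 0; }    // error: var is not allowed as a return type
    // public void Use(var value) { }       // error: var is not allowed as a parameter type

    public static string DescribeInferredTypes()
    {
        var count = 10;                     // inferred as int
        var names = new List<string>();     // inferred as List<string>
        int? maybe = null;                  // a nullable value is declared with its explicit type

        // var nothing = null;              // error: null alone gives the compiler no type to infer
        // var later;                       // error: an initializer is required

        names.Add("Alpha");
        foreach (var name in names)         // name is inferred as string
        {
            count += name.Length;           // still an int: 10 + 5
        }

        return count.GetType().Name + ", " + maybe.HasValue;
    }

    public static void Main()
    {
        Console.WriteLine(DescribeInferredTypes());
    }
}
```

Uncommenting any of the commented declarations should produce a compile-time error, which is a quick way to see each rule being enforced.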

Anonymous Types

In earlier versions of C#, you needed to declare either a struct or a class to encapsulate data. However, there are situations where you want to encapsulate data locally, without any associated methods or events. For such situations, defining a class or a struct can be quite tedious. C# 3.0 introduces a feature called "Anonymous types" that lets you encapsulate data without explicitly defining a class or a struct. The following example creates an anonymous type:

public void LetsDeclareSomeAnonymousType()
{
  var BoxAnonymousType = new { Color = "Blue", Weight = 24.0, Height = 45, Width=45.34 };
}

In the above example, we create an anonymous type with these members: Color, Weight, Height, and Width. As with implicitly typed variables, the CLR does not have any concept of anonymous types. During compilation, the C# compiler declares a class with a unique name and with four read-only properties: Color, Weight, Height, and Width. The implicitly typed variable BoxAnonymousType is initialized with an instance of the compiler generated class. If you disassemble the above code, via ILDASM or Reflector, you will see the compiler generated class.

All compiler generated classes for anonymous types derive from System.Object and override the Equals, GetHashCode, and ToString methods. The ToString implementation of a compiler generated class just builds a string from each name/value pair. Thus, BoxAnonymousType.ToString() will return a string such as { Color = Blue, Weight = 24, Height = 45, Width = 45.34 } (the values are formatted with their own ToString methods, so string values are not quoted and number formatting follows the current culture). GetHashCode computes the hash code from the property values; thus, if two objects of a compiler generated class have exactly the same value for every property, GetHashCode returns the same value for both. The Equals method compares the values of each property, and therefore returns true if both objects of the compiler generated class have exactly the same values. Please note that the == operator (as default behavior) still compares the references of the two objects, not their values. With these default implementations, anonymous types are well suited to be used as keys in a hashtable.

One more important aspect of anonymous types is that, within an assembly, the compiler generates a single class for all similar anonymous types (i.e., those having the same property names and types, declared in the same order).
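The behavior described above can be checked with a small experiment. The following is a minimal sketch; the names AnonymousTypeDemo and Compare are hypothetical and exist only for this illustration.

```csharp
using System;

public static class AnonymousTypeDemo
{
    // Returns, in order: Equals result, hash codes equal, == result, same generated type.
    public static bool[] Compare()
    {
        var box1 = new { Color = "Blue", Height = 45 };
        var box2 = new { Color = "Blue", Height = 45 };

        return new bool[]
        {
            box1.Equals(box2),                         // value comparison of every property
            box1.GetHashCode() == box2.GetHashCode(),  // hash is computed from the property values
            box1 == box2,                              // reference comparison of two distinct objects
            box1.GetType() == box2.GetType()           // both use one compiler generated class
        };
    }

    public static void Main()
    {
        foreach (bool result in Compare())
        {
            Console.WriteLine(result);
        }
    }
}
```

Only the third result should be False, since box1 and box2 are two separate objects that happen to hold equal values.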

Extension Methods

Once a type is compiled, its definition is more or less final. The only way to add new functionality or a new member is to re-code and re-compile it. Hence, if you do not have access to the source code, you cannot add members to the compiled type. Let's assume you want to add a method to the System.Object class that returns the assembly name of the object. As all types directly or indirectly inherit from System.Object, adding a method to System.Object means adding a method to all types. Since you do not have access to the code-base of System.Object, adding this functionality seems quite impossible. C# 3.0 introduces a new feature known as "Extension Methods" that allows you to add a new method to an already compiled type. The following example uses an extension method to add a new method to the System.Object class.

public static class MyExtensionClass 
{
  public static string AssemblyName(this System.Object obj)
  {
    return obj.GetType().Assembly.FullName;
  }
}

All extension methods must be static methods defined in a static class. The above example creates an Extension Method named AssemblyName. The this modifier on the first parameter signifies that it is an extension method, and the type of that parameter indicates the type the extension method belongs to, i.e., System.Object. Since this method was not originally a part of System.Object, we need a parameter that holds the reference to the object the method is called on. In the above example, obj holds that reference. Through this reference, the extension method can access the object's public properties and methods.

Once an extension method is defined, and the namespace of its static class is in scope, you can call it like any other System.Object member, and it also appears in the member list of Visual Studio.

[Note: Down arrow next to AssemblyName indicates that it is an extension method]

Extension methods can also take parameters. The following extension method, added to the type int, compares the int with the given value.

public static class MyExtensionClass
{
    public static bool iSGreater(this int currentInt, int value)
    {
        return currentInt > value;
    }
}

// Usage:
int myInt = 450;
bool b = myInt.iSGreater(350);

Like the other two features of C# 3.0, Extension Methods is also a language extension and has no significance in the CLR. During compilation, C# replaces the extension method call with a regular call of a static method. Therefore, myInt.iSGreater (350) would be replaced with MyExtensionClass.iSGreater(myInt, 350). Consequently, the C# compiler and Visual Studio together make the developers feel that AssemblyName is defined in the System.Object class.
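To see that an extension method call is really just a static method call, both forms can be written side by side. The sketch below uses hypothetical names (IntExtensions, IsGreaterThan, IsNullOrBlank, ExtensionCallDemo) chosen for this illustration. The second method shows a consequence of the rewriting: because no instance method is actually invoked, calling an extension method on a null reference does not by itself throw a NullReferenceException.

```csharp
using System;

public static class IntExtensions
{
    public static bool IsGreaterThan(this int current, int value)
    {
        return current > value;
    }

    // The receiver is an ordinary first argument, so it is allowed to be null.
    public static bool IsNullOrBlank(this string text)
    {
        return text == null || text.Trim().Length == 0;
    }
}

public static class ExtensionCallDemo
{
    public static void Main()
    {
        int myInt = 450;

        bool viaExtensionSyntax = myInt.IsGreaterThan(350);              // what you write
        bool viaStaticSyntax = IntExtensions.IsGreaterThan(myInt, 350);  // what the compiler emits

        Console.WriteLine(viaExtensionSyntax == viaStaticSyntax);

        string missing = null;
        Console.WriteLine(missing.IsNullOrBlank());  // the method body handles the null receiver
    }
}
```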

LINQ uses Extension Methods extensively; therefore, before jumping into LINQ, let's build a small library of Extension Methods that, down the road, will help us understand LINQ. Following is the template that we will use to build the library:

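using System;
using System.Collections;
using System.Collections.Generic;

namespace MyLibrary
{
    public static class MyExtensionMethods
    {
        // The extension methods below go here.
    }
}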

Extension Method: `FilterCollectionsBasedOnType`

A non-generic collection class can hold objects of different types. Our first Extension Method is for the IEnumerable interface, and it allows you to filter a collection based on the given type. Here is the code:

public static System.Collections.Generic.IEnumerable<T> 
       FilterCollectionsBasedOnType<T>(this System.Collections.IEnumerable list) 
       where T : class
{
    System.Collections.Generic.List<T> filteredList = new List<T>();
    foreach (System.Object obj in list)
    {
        if (obj is T)
            filteredList.Add(obj as T);
    }
    return filteredList;
}

Example:

System.Collections.ArrayList myArray = new ArrayList();
myArray.Add(new Car());
myArray.Add(new Box());
myArray.Add(new Person());
IEnumerable<Car> enumerable = myArray.FilterCollectionsBasedOnType<Car>();

Extension Method: `FilterCollectionsBasedOnPredicate`

The following Extension Method filters the collection based on the result of the given predicate delegate:

public static System.Collections.IEnumerable 
       FilterCollectionsBasedOnPredicate<T>(this System.Collections.Generic.IEnumerable<T> list, 
       Predicate<T> predicate)
{
    System.Collections.Generic.List<T> filteredList = new List<T>();
    foreach (T item in list)
    {
        if (predicate(item))
            filteredList.Add(item);
    }
    return filteredList;
}

Example:

System.Collections.Generic.List<int> numbersList = new List<int>();
numbersList.Add(23);
numbersList.Add(45);
numbersList.Add(56);
numbersList.Add(87);
IEnumerable evenNumbers = numbersList.FilterCollectionsBasedOnPredicate(IsEven);
// OR, with a lambda expression instead of a named method
// (for more information about lambda expressions, read my blog):
// IEnumerable evenNumbers = numbersList.FilterCollectionsBasedOnPredicate(num => num % 2 == 0);
public bool IsEven(int num)
{
    return num % 2 == 0;
}

Extension Method: `FindMinimum`

The FindMinimum extension method applies to any sequence of ints, including an array of ints, and returns the minimum value.

public static int FindMinimum(this System.Collections.Generic.IEnumerable<int> list)
{
    // Start from the largest possible int so that any item can replace it.
    int min = int.MaxValue;
    foreach (int item in list)
    {
        if (item < min) min = item;
    }
    return min;
}

Example:

int[] myIntArray = new int[] { 3, 1, 45, 67 };
int minimumValue = myIntArray.FindMinimum();

Extension Method: `FindMaximum`

The FindMaximum extension method applies to any sequence of ints, including an array of ints, and returns the maximum value.

public static int FindMaximum(this System.Collections.Generic.IEnumerable<int> list)
{
    // Start from the smallest possible int so that any item can replace it.
    int max = int.MinValue;
    foreach (int item in list)
    {
        if (item > max) max = item;
    }
    return max;
}

Example:

int[] myIntArray = new int[] { 3, 1, 45, 67 };
int maximumValue = myIntArray.FindMaximum();

Note: you can write overloaded methods for other numeric types like float, double, etc.
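As an alternative to writing one overload per numeric type, a single generic method constrained to IComparable<T> can cover them all. The following is only a sketch of that idea, not part of the library built above; the names GenericMinMaxExtensions and FindMinimumOf are hypothetical, and throwing on an empty sequence is a design choice made for this example.

```csharp
using System;
using System.Collections.Generic;

public static class GenericMinMaxExtensions
{
    // Works for int, float, double, decimal, string, and any other IComparable<T> type.
    public static T FindMinimumOf<T>(this IEnumerable<T> list) where T : IComparable<T>
    {
        using (IEnumerator<T> enumerator = list.GetEnumerator())
        {
            if (!enumerator.MoveNext())
            {
                throw new InvalidOperationException("The sequence contains no elements.");
            }

            // Seed with the first item instead of a type-specific sentinel value.
            T min = enumerator.Current;
            while (enumerator.MoveNext())
            {
                if (enumerator.Current.CompareTo(min) < 0)
                {
                    min = enumerator.Current;
                }
            }
            return min;
        }
    }
}

public static class GenericMinMaxDemo
{
    public static void Main()
    {
        double[] prices = new double[] { 3.5, 1.25, 45.0 };
        int[] numbers = new int[] { 3, 1, 45, 67 };

        Console.WriteLine(prices.FindMinimumOf());
        Console.WriteLine(numbers.FindMinimumOf());
    }
}
```

Seeding the minimum with the first element avoids the need for a starting constant such as int.MaxValue, which has no generic equivalent.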

LINQ (Language Integrated Query)

You may be surprised to learn that you have already covered the core basics of LINQ. Primarily, LINQ exposes a large library of Extension Methods (the standard query operators, most of them with several overloads) like the ones we built in the earlier exercise. The following examples use a few of those Extension Methods.

The following code finds the minimum value from the array of integers:

int[] myIntArray = new int[] { 71, 45, 67, 23, 89, 101 };
int minimumValue = myIntArray.Min();

The following code finds the average value from the array of integers:

int[] myIntArray = new int[] { 71, 45, 67, 23, 89, 101 };
double averageValue = myIntArray.Average();

The following code finds those students older than 11 by using a delegate:

System.Collections.Generic.List<Student> studentCollection = new List<Student>();

Student student1 = new Student();
student1.Name = "Alpha";
student1.Age = 14;
studentCollection.Add(student1); 
Student student2 = new Student();
student2.Name = "Beta";
student2.Age = 13;
studentCollection.Add(student2);
Student student3 = new Student();
student3.Name = "Gamma";
student3.Age = 10;
studentCollection.Add(student3);

IEnumerable<Student> enumerable = 
  studentCollection.Where<Student>(IsStudentOlderThanEleven);

foreach (Student student in enumerable)
{
  MessageBox.Show(student.Name);
}
private bool IsStudentOlderThanEleven(Student student)
{
    return student.Age > 11;
}

The following code finds those students older than 11 by using a Lambda expression:

System.Collections.Generic.List<Student> studentCollection = new List<Student>();
Student student1 = new Student();
student1.Name = "Alpha";
student1.Age = 14;
studentCollection.Add(student1); 
Student student2 = new Student();
student2.Name = "Beta";
student2.Age = 13;
studentCollection.Add(student2);
Student student3 = new Student();
student3.Name = "Gamma";
student3.Age = 10;
studentCollection.Add(student3);
IEnumerable<Student> enumerable = 
  studentCollection.Where<Student>(student=> student.Age>11);
foreach (Student student in enumerable)
{
  MessageBox.Show(student.Name);
}

The following code finds those students older than 11 by using a Lambda expression and var:

System.Collections.Generic.List<Student> studentCollection = new List<Student>();
Student student1 = new Student();
student1.Name = "Alpha";
student1.Age = 14;
studentCollection.Add(student1); 
Student student2 = new Student();
student2.Name = "Beta";
student2.Age = 13;
studentCollection.Add(student2);
Student student3 = new Student();
student3.Name = "Gamma";
student3.Age = 10;
studentCollection.Add(student3);
var enumerable=studentCollection.Where<Student>(student=> student.Age>11);
foreach (Student student in enumerable)
{
  MessageBox.Show(student.Name);
}

Most of the LINQ Extension Methods target the generic IEnumerable<T> interface, while a few, such as OfType and Cast, extend the non-generic IEnumerable. I would suggest you browse the other Extension Methods introduced by LINQ to get an idea of what is available. The C# compiler further simplifies the usage of these extensions by introducing a simpler, SQL-like syntax for calling them.

The following code uses LINQ query syntax to find those students older than 11:

System.Collections.Generic.List<Student> studentCollection = new List<Student>();
Student student1 = new Student();
student1.Name = "Alpha";
student1.Age = 14;
studentCollection.Add(student1); 
Student student2 = new Student();
student2.Name = "Beta";
student2.Age = 13;
studentCollection.Add(student2);
Student student3 = new Student();
student3.Name = "Gamma";
student3.Age = 10;
studentCollection.Add(student3); 
var enumerable = from student in studentCollection where student.Age > 11 select student;
foreach (Student student in enumerable)
{
  MessageBox.Show(student.Name);
}

During compilation, the C# compiler translates the query syntax into extension method calls. The following table summarizes the keywords that make up a LINQ query:

`from`, `in`	Define the container that needs to be filtered
`where`	Condition to filter
`select`	Select an object from the container
`join`, `on`, `equals`, `into`	Executes `join`s
`orderby`, `ascending`, `descending`	Sorts the result
`group`, `by`	Groups the data by the given key

Remember, each LINQ query is eventually converted into calls to multiple Extension Methods.
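To make this translation concrete, the sketch below writes the same query twice: once in query syntax and once as the chain of Extension Method calls it corresponds to. The class and method names (QueryTranslationDemo, WithQuerySyntax, WithMethodSyntax) are assumptions made for this illustration; Where, OrderByDescending, and SequenceEqual are standard LINQ Extension Methods from System.Linq.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QueryTranslationDemo
{
    public static IEnumerable<int> WithQuerySyntax(int[] numbers)
    {
        return from n in numbers
               where n > 50
               orderby n descending
               select n;
    }

    // The equivalent chain of extension method calls.
    // The trailing "select n" needs no Select call because it returns each item unchanged.
    public static IEnumerable<int> WithMethodSyntax(int[] numbers)
    {
        return numbers.Where(n => n > 50).OrderByDescending(n => n);
    }

    public static void Main()
    {
        int[] numbers = new int[] { 71, 45, 67, 23, 89, 101 };

        // Both forms produce the same items in the same order.
        Console.WriteLine(WithQuerySyntax(numbers).SequenceEqual(WithMethodSyntax(numbers)));

        foreach (int n in WithQuerySyntax(numbers))
        {
            Console.WriteLine(n);
        }
    }
}
```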
