Generic Vector<T> Class Implementation With Use of Expression Trees

Alexandr Stefek

5.00/5 (4 votes)

Jul 21, 2014

CPOL

5 min read

19703

361

This article show another way to implement generic operators. Vector is in middle of attention.

Content

Introduction
Expression Trees
Dive Deeply into Implementation
Generic T[] Binary Operator
Time Tests
Vector<T> Class Implementation
Code Use and Limitations
Future Development
History
References

Introduction

Using generics in Vector like class implementation is neverending story. This article depictes another way for implementation.

Main problem with generic Vector implementation is that there does not exists possibility to directly implement something like

class Vector<T>
{

    public static Vector<T> operator +(Vector<T> a, Vector<T> b)
    {
        int cnt = a.Lenght;
        Vector<T> result = new Vector(new T[cnt]);
        for(int i = 0; i < cnt; i++)
            result[i] = a[i] + b[i];   
        return result;
    }
}

the compiler will throw an error at line

            result[i] = a[i] + b[i];

So if you still want to implement it you have to solve this problem and avoid by some way the line code above.

One way which can help in this problem is use of lambda expression as implementation if operators are very simillar to lambda expressions. Let consider line code

Func<double, double, double> add = (a, b) => (a + b);

it behaves exactly as we will expect fot the + operator over double type. But if generic version will be typed compiler will throw an error, even if expression Func<T, T, T> add; is for compiler OK.

Func<T, T, T> add = (a, b) => (a + b); //give compiler error

So the question is how to fill variable Func<T, T, T> add; with proper generic form?

Expression Trees

Expression trees represents set of classes which are useable for building structures which can be compiled into delegates. Some articles describing the process, structures, classes etc. can be found here:

Abul Kayes, http://www.codeproject.com/Articles/235860/Expression-Tree-Basics
Sacha Barber, http://www.codeproject.com/Articles/30604/A-Journey-into-Expressions
Alexandra Rusina, http://blogs.msdn.com/b/csharpfaq/archive/2009/11/19/debugging-expression-trees-in-visual-studio-2010.aspx
Pranay Rana, http://www.codeproject.com/Tips/438804/Expression-Tree

Going directly to the goal of this article let show very simple expression tree use

ParameterExpression ap = Expression.Parameter(typeof(double), "a");
ParameterExpression bp = Expression.Parameter(typeof(double), "b");

Expression operationResult = Expression.Add(ap, bp);

Expression<Func<double, double, double>> lambda = Expression.Lambda<Func<double, double, double>>(operationResult, ap, bp);
Func<double, double, double> add = lambda.Compile();

this code will create Func<double, double, double> add delegate with same functionallity as

Func<double, double, double> add = (a, b) => (a + b);

Dive Deeply into Implementation

So the non generic form is implemented. What about generic form? If the double keyword will be replaced in implementation it will still works? (header and footer were added to define full method)

Func<T, T, T> CreateAdd()
{
    ParameterExpression ap = Expression.Parameter(typeof(T), "a");
    ParameterExpression bp = Expression.Parameter(typeof(T), "b");

    Expression operationResult = Expression.Add(ap, bp);

    Expression<Func<T, T, T>> lambda = Expression.Lambda<Func<T, T, T>>(operationResult, ap, bp);

    Func<T, T, T> add = lambda.Compile();
    return add;
}

This implementation is nearly same as

Func<T, T, T> add = (a, b) => (a + b);

The difference is that CreateAdd method is OK for compiler and the lambda.Compile() expression will be called at runtime which can throw an error if operator + is not defined for type T. Second implementation is compiled at compile time and the generic avalability of operator + is tested (and failed).

Method CreateAdd can be generalized for all operators so only one implementation is enought for any kind of binary operator.

public static class FuncGenerator<T>
{
    /// <summary>
    /// Convert BinaryExpression into Func which acts as operator on types T and T. T = T op T, where op is provided by f param.
    /// </summary>
    /// <param name="f">Func which provides BinaryExpression.</param>
    /// <returns>Func&LT;T, T, T&GT; </returns>
    public static Func<T, T, T> ExpressionToFunc(Func<ParameterExpression, ParameterExpression, BinaryExpression> f)
    {
        ParameterExpression ap = Expression.Parameter(typeof(T), "a");
        ParameterExpression bp = Expression.Parameter(typeof(T), "b");
        Expression operationResult = f(ap, bp);

        Expression<Func<T, T, T>> lambda = Expression.Lambda<Func<T, T, T>>(operationResult, ap, bp);
        return lambda.Compile();
    }

}

Now it is easy to call method FuncGenerator<T>.ExpressionToFunc() for creating appropriate Func delegate. But still the user must be familliar with expression trees.

Func<double, double, double> add = FuncGenerator<double>.ExpressionToFunc((a, b) => Expression.Add(a, b));
Func<double, double, double> sub = FuncGenerator<double>.ExpressionToFunc((a, b) => Expression.Subtract(a, b));

Now we have generic implementation of binary operators and the rest is quite easy. Or not?

Generic T[] Binary Operator

Bellow is implemented generic operator implementation for array binary operations (see method ExpressionToFuncArray). This implementation use for loop and evaluation of Func method returned by FuncGenerator<T>.ExpressionToFunc().

public static class FuncGenerator<T>
{
    /// <summary>
    /// Convert BinaryExpression into Func which acts as operator on types T and T. T = T op T, where op is provided by f param.
    /// </summary>
    /// <param name="f">Func which provides BinaryExpression.</param>
    /// <returns>Func&LT;T, T, T&GT; </returns>
    public static Func<T, T, T> ExpressionToFunc(Func<ParameterExpression, ParameterExpression, BinaryExpression> f)
    {
        ParameterExpression ap = Expression.Parameter(typeof(T), "a");
        ParameterExpression bp = Expression.Parameter(typeof(T), "b");
        Expression operationResult = f(ap, bp);

        Expression<Func<T, T, T>> lambda = Expression.Lambda<Func<T, T, T>>(operationResult, ap, bp);
        return lambda.Compile();
    }

    public static Func<T[], T[], T[]> ExpressionToFuncArray(Func<ParameterExpression, ParameterExpression, BinaryExpression> f)
    {
        Func<T, T, T> op = ExpressionToFunc(f);
        return (a, b) =>
        {
            int len = a.Length;
            T[] result = new T[len];
            for (int i = 0; i < len; i++)
                result[i] = op(ap[i], bp[i]);
            return result;
        };
    }
}

Focus now on line

                result[i] = op(ap[i], bp[i]);

op delegate is called n times per one array operation which is really time consuming. So we have generic implementation but as can be tested it is very slow implementation. Back to expression trees and hope that it give us some solution.

    public static Func<double[], double[], double[]> ArrayExpressionToFunc()
    {
        return (a, b) =>
        {
            int len = a.Length;
            double[] result = new double[len];
            for (int i = 0; i < len; i++)
                result[i] = ap[i] + bp[i];
            return result;
        };
    }

Above is depicted nongeneric implementation for double type. This implementation is faster than generic implementation introduced earlier. Now lets try to construct expression tree which behaves exactly as this nongeneric implementation.

    /// <summary>
    /// Convert BinaryExpression into Func which acts as operator on types double[] and double[]. double[] = double[] op double[], where op is provided by f param.
    /// </summary>
    /// <param name="f"></param>
    /// <returns></returns>
    private static Func<double[], double[], double[]> ArrayExpressionToFunc(Func<IndexExpression, IndexExpression, BinaryExpression> f)
    {
        //a, b are input parametres for returned Func<double[], double[], double[]> delegate
        //c is output (result)
        //i is loop variable
        //
        // //implementation looks like:
        // for(int i = a.Length; i > -1; i--)
        //     c[i] = a[i] op b[i];
        // return c;
        //
        ParameterExpression apA = Expression.Parameter(typeof(double[]), "a");
        ParameterExpression bpA = Expression.Parameter(typeof(double[]), "b");
        ParameterExpression operationResult = Expression.Parameter(typeof(double[]), "c");
        ParameterExpression iA = Expression.Parameter(typeof(int), "i");

        LabelTarget labelReturn = Expression.Label(typeof(double[]));

        //this block represent block inside loop
        Expression innerBlock = Expression.Block(
            Expression.SubtractAssign(iA, Expression.Constant(1)),
            Expression.IfThen(Expression.Equal(iA, Expression.Constant(-1)),
            Expression.Return(labelReturn, operationResult)),
            Expression.Assign(Expression.ArrayAccess(operationResult, iA), f(Expression.ArrayAccess(apA, iA), Expression.ArrayAccess(bpA, iA)))
            );

        //expression for easy implementation of new double[i] constructor
        Expression<Func<int, double[]>> newTA = (i) => new double[i];

        //main body of Func. Variable initialization and loop execution
        Expression addeA = Expression.Block(
            new[] { iA, operationResult },
            Expression.Assign(iA, Expression.ArrayLength(apA)),
            Expression.Assign(operationResult, Expression.Invoke(newTA, iA)),
            Expression.Loop(innerBlock, labelReturn)
            );

        //Compilation to get result.
        Expression<Func<double[], double[], double[]>> lambdaA = Expression.Lambda<Func<double[], double[], double[]>>(addeA, apA, bpA);
        return lambdaA.Compile();
    }

This implementation returns delegate which can be used for binary operators application on double arrays. Possible use is:

Func<double[], double[], double[]> addArray = FuncGenerator<double>.ArrayExpressionToFunc((a, b) => Expression.Add(a, b));
Func<double[], double[], double[]> subArray = FuncGenerator<double>.ArrayExpressionToFunc((a, b) => Expression.Subtract(a, b));

Now the last step is needed for generic implementation. Just replace double keyword at appropriate places with T keyword.

    /// <summary>
    /// Convert BinaryExpression into Func which acts as operator on types T[] and T[]. T[] = T[] op T[], where op is provided by f param.
    /// </summary>
    /// <param name="f"></param>
    /// <returns></returns>
    private static Func<T[], T[], T[]> ArrayExpressionToFunc(Func<IndexExpression, IndexExpression, BinaryExpression> f)
    {
        //a, b are input parametres for returned Func<T[], T[], T[]> delegate
        //c is output (result)
        //i is loop variable
        //
        // //implementation looks like:
        // for(int i = a.Length; i > -1; i--)
        //     c[i] = a[i] op b[i];
        // return c;
        //
        ParameterExpression apA = Expression.Parameter(typeof(T[]), "a");
        ParameterExpression bpA = Expression.Parameter(typeof(T[]), "b");
        ParameterExpression operationResult = Expression.Parameter(typeof(T[]), "c");
        ParameterExpression iA = Expression.Parameter(typeof(int), "i");

        LabelTarget labelReturn = Expression.Label(typeof(T[]));

        //this block represent block inside loop
        Expression innerBlock = Expression.Block(
            Expression.SubtractAssign(iA, Expression.Constant(1)),
            Expression.IfThen(Expression.Equal(iA, Expression.Constant(-1)),
            Expression.Return(labelReturn, operationResult)),
            Expression.Assign(Expression.ArrayAccess(operationResult, iA), f(Expression.ArrayAccess(apA, iA), Expression.ArrayAccess(bpA, iA)))
            );

        //expression for easy implementation of new T[i] constructor
        Expression<Func<int, T[]>> newTA = (i) => new T[i];

        //main body of Func. Variable initialization and loop execution
        Expression addeA = Expression.Block(
            new[] { iA, operationResult },
            Expression.Assign(iA, Expression.ArrayLength(apA)),
            Expression.Assign(operationResult, Expression.Invoke(newTA, iA)),
            Expression.Loop(innerBlock, labelReturn)
            );

        //Compilation to get result.
        Expression<Func<T[], T[], T[]>> lambdaA = Expression.Lambda<Func<T[], T[], T[]>>(addeA, apA, bpA);
        return lambdaA.Compile();
    }

The code can be used for definition array operators:

Func<int[], int[], int[]> addArray = FuncGenerator<int>.ArrayExpressionToFunc((a, b) => Expression.Add(a, b));
Func<int[], int[], int[]> subArray = FuncGenerator<int>.ArrayExpressionToFunc((a, b) => Expression.Subtract(a, b));

Time Tests

For testing and comparison of implementation above set of tests was implemented.

Simple implementation (full compile time specification expected to be faster implementation)
Func<double, double, double> in for loop (use of delegate for + operator of type T, expected to be slow)
Func<double[], double[], double[]> created at compile time (use of delegate for + operator of type T[], expected to be fast)
Func<double[], double[], double[]> created at runtime (use of expression trees to get operator + of type T[], hope to be fast enought)

Code for all tests is bellow:

int lastTick = 0;
private void Msg(string msg)
{
    int thisTick = Environment.TickCount;
    int delta = thisTick - lastTick;
    textBox1.AppendText(delta.ToString() + "ms\t" + msg + "\r\n");
    lastTick = thisTick;
}

private void TimeTests()
{
    double[] a = new double[] { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
    double[] b = new double[] { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 };
    double[] r = new double[] { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };

    Func<double, double, double> Add = (ap, bp) => (ap + bp);
    Func<double[], double[], double[]> AddA = (ap, bp) =>
        {
            int len = ap.Length;
            double[] result = new double[len];
            for (int i = 0; i < len; i++)
                result[i] = ap[i] + bp[i];
            return result;
        };
    Func<double[], double[], double[]> AddA2 = FuncGenerator<double>.CreateArrayOperatorFuncAdd();

    Msg("beginLoops");
    int cnt = 1000000; //1M

    //Simple implementation
    for (int i = 0; i < cnt; i++)
    {
        r = new double[a.Length];
        for (int j = 0; j < a.Length; j++)
            r[j] = a[j] + b[j];
    }
    Msg("Simple implementation");

    //Func<double, double, double> in for loop
    for (int i = 0; i < cnt; i++)
        for (int j = 0; j < a.Length; j++)
            r[j] = Add(a[j], b[j]);
    Msg("Func<double, double, double> in for loop");

    //Func<double[], double[], double[]> created at compile time
    for (int i = 0; i < cnt; i++)
        r = AddA(a, b);
    Msg("Func<double[], double[], double[]> created at compile time");

    //Func<double[], double[], double[]> created at runtime
    for (int i = 0; i < cnt; i++)
        r = AddA2(a, b);
    Msg("Func<double[], double[], double[]> created at runtime");
}

Result of this test:

0ms      beginLoops
109ms    Simple implementation
282ms    Func<double, double, double> in for loop
109ms    Func<double[], double[], double[]> without loop
109ms    Func<double[], double[], double[]> created runtime

"Simple implementation" is fastest and it is on par with "created runtime". "Func in for loop" is slowest. This prove that goal was reached and this implementation is very good. Interesting is the difference between slowest and fastest implementation as this difference is spend on repeated calls of Func<double, double, double> delegate.

Please take into account that during test garbage collection sometimes go to collect memory so test results can vary but still the introduced implementation is on par with "Simple implementation".

Vector<T> Class Implementation

All partial jobs were done and now lets introduce template for full implementation. Full implementation is not goal so just main parts are depicted.

/// <summary>
/// Generic class for arithmetic operator overloading demonstration
/// </summary>
/// <typeparam name="T"></typeparam>
class Vector<T>
{
    //static operator delegates
    private static Func<T, T, T> Add = FuncGenerator<T>.CreateOperatorFuncAdd();
    private static Func<T[], T[], T[]> AddT = FuncGenerator<T>.CreateArrayOperatorFuncAdd();

    //http://stackoverflow.com/questions/1717444/combining-two-lamba-expressions-in-c-sharp

    //the ideas for implementation was based on article
    //http://blogs.msdn.com/b/csharpfaq/archive/2009/11/19/debugging-expression-trees-in-visual-studio-2010.aspx
    //

    private T[] values;
    public Vector(int dim)
    {
        this.values = new T[dim];
    }

    public Vector(IEnumerable<T> initialValues)
    {
        values = initialValues.ToArray();
    }

    public static Vector<T> operator +(Vector<T> a, Vector<T> b)
    {
        return AddT(a.values, b.values);
    }

    /// <summary>
    /// Allows easily convert an array to vector of same item type. Can be used with operator overloading.
    /// </summary>
    /// <param name="a">array to be converted</param>
    /// <returns></returns>
    public static implicit operator Vector<T>(T[] a)
    {
        return new Vector<T>(a);
    }

    /// <summary>
    /// Some kind of conversion of Vector into string
    /// </summary>
    /// <returns></returns>
    public override string ToString()
    {
        string result = "";

        result = values.Select(t => t.ToString()).Aggregate((a, b) => a + ";" + b);
        return "[" + result + "]";

    }

    /// <summary>
    /// Some kind of parsing string into Vector. Items must be delimited by ';' char. On left side '[' char is expected, on right side ']' char is expected.
    /// This is just demonstration class so I am not proud of this implementation as some strange string forms will be successfuly parsed into T[].
    /// </summary>
    /// <param name="value">String value which will be parsed.</param>
    /// <param name="parse">Parse delegate for conversion from string into T value.</param>
    /// <returns></returns>
    public static T[] Parse(string value, Func<string, T> parse)
    {
        if (value[0] != '[' | value[value.Length - 1] != ']')
            throw new FormatException(string.Format("{0}  is not valid format for type {1}", value, typeof(T[]).ToString()));

        string tmpStr = value.Substring(1, value.Length - 2).Trim();

        string[] items = tmpStr.Split(new char[] { ';' });
        var values = items.Select(s => parse(s.Trim()));
        return values.ToArray();
    }
}

Code Use and Limitations

Basic form of generic Vector<T> was introduced and now lets show possible way to use. Firstly there are set of numerical types which works with this implementation very well.

/// <summary>
/// Initialization values for numeric types
/// </summary>
string aValue = "[0; 1; 2; 3; 4; 5; 6; 7; 8; 9]";
string bValue = "[1; 1; 1; 1; 1; 1; 1; 1; 1; 1]";

/// <summary>
/// Generic type test
/// </summary>
/// <typeparam name="T"></typeparam>
/// <param name="parser"></param>
private void TypeTest<T>(Func<string, T> parser)
{
    Msg("Test for " + typeof(T).ToString());
    Vector<T> a = Vector<T>.Parse(aValue, parser);
    Vector<T> b = Vector<T>.Parse(bValue, parser);
    Vector<T> result = a + b;

    Msg(string.Format("c = a + b; => {0} = {1} + {2}", result.ToString(), a.ToString(), b.ToString()));
}

/// <summary>
/// Test of native numeric types
/// </summary>
private void ImplementationTest()
{

    TypeTest<int>(s => int.Parse(s));
    TypeTest<uint>(s => uint.Parse(s));
    TypeTest<short>(s => short.Parse(s));
    TypeTest<ushort>(s => ushort.Parse(s));
    TypeTest<long>(s => long.Parse(s));
    TypeTest<ulong>(s => ulong.Parse(s));

    TypeTest<double>(s => double.Parse(s));
    TypeTest<float>(s => float.Parse(s));
    TypeTest<decimal>(s => decimal.Parse(s));

}

Result of this test:

0ms    Test for System.Int32
0ms    c = a + b; => [1;2;3;4;5;6;7;8;9;10] = [0;1;2;3;4;5;6;7;8;9] + [1;1;1;1;1;1;1;1;1;1]
15ms   Test for System.UInt32
0ms    c = a + b; => [1;2;3;4;5;6;7;8;9;10] = [0;1;2;3;4;5;6;7;8;9] + [1;1;1;1;1;1;1;1;1;1]
0ms    Test for System.Int16
0ms    c = a + b; => [1;2;3;4;5;6;7;8;9;10] = [0;1;2;3;4;5;6;7;8;9] + [1;1;1;1;1;1;1;1;1;1]
16ms   Test for System.UInt16
0ms    c = a + b; => [1;2;3;4;5;6;7;8;9;10] = [0;1;2;3;4;5;6;7;8;9] + [1;1;1;1;1;1;1;1;1;1]
0ms    Test for System.Int64
16ms   c = a + b; => [1;2;3;4;5;6;7;8;9;10] = [0;1;2;3;4;5;6;7;8;9] + [1;1;1;1;1;1;1;1;1;1]
0ms    Test for System.UInt64
0ms    c = a + b; => [1;2;3;4;5;6;7;8;9;10] = [0;1;2;3;4;5;6;7;8;9] + [1;1;1;1;1;1;1;1;1;1]
15ms   Test for System.Double
0ms    c = a + b; => [1;2;3;4;5;6;7;8;9;10] = [0;1;2;3;4;5;6;7;8;9] + [1;1;1;1;1;1;1;1;1;1]
16ms   Test for System.Single
0ms    c = a + b; => [1;2;3;4;5;6;7;8;9;10] = [0;1;2;3;4;5;6;7;8;9] + [1;1;1;1;1;1;1;1;1;1]
16ms   Test for System.Decimal
0ms    c = a + b; => [1;2;3;4;5;6;7;8;9;10] = [0;1;2;3;4;5;6;7;8;9] + [1;1;1;1;1;1;1;1;1;1]

Unfortunately there are types which cannot be used by this way even if someone can suppose that they will work. Presented implementation does not work for byte and sbyte types.

Test on user definded types is last test which have to be run with this implementation. For this test UserNumericType (with + operator overloaded) was prepared and hypothesis about structure Vector<Vector<double>> was set.

/// <summary>
/// Test of types (with overloaded operator) defined by user
/// </summary>
private void UserTypeTest()
{
    Vector<UserNumericType> a = new UserNumericType[] { new UserNumericType(0, 1), new UserNumericType(10, 10) };
    Vector<UserNumericType> b = new UserNumericType[] { new UserNumericType(0, 1), new UserNumericType(20, 20) };

    Vector<UserNumericType> result = a + b;
    Msg("Test for UserNumericType");
    Msg(string.Format("c = a + b; => {0} = {1} + {2}", result.ToString(), a.ToString(), b.ToString()));

    Vector<Vector<double>> avvd = new Vector<double>[] { new double[] { 0, 1, 2 }, new double[] { 10, 10, 10 } };
    Vector<Vector<double>> bvvd = new Vector<double>[] { new double[] { 1, 1, 1 }, new double[] { 21, 21, 21 } };
    Vector<Vector<double>> resultvvd = avvd + bvvd;

    Msg("Test for Vector<Vector<double>> type");
    Msg(string.Format("c = a + b; => {0} = {1} + {2}", resultvvd.ToString(), avvd.ToString(), bvvd.ToString()));

}

Results for this test:

0ms    Test for UserNumericType
16ms   c = a + b; => [(0; 2);(30; 30)] = [(0; 1);(10; 10)] + [(0; 1);(20; 20)]
15ms   Test for Vector<Vector<double>> type
0ms    c = a + b; => [[1;2;3];[31;31;31]] = [[0;1;2];[10;10;10]] + [[1;1;1];[21;21;21]]

Future Development

Many times was in article mentioned that goal is not implement perfect class Vector<T> so it is not done. Just some principles which create solid basement for future development was set. If someone want it is possible to find many implementations of some kind of vectors over internet with friendly licence and reform them to principles presented in this article.

History

2014-07-21 First implementation after few days of investigation.

References

Abul Kayes, http://www.codeproject.com/Articles/235860/Expression-Tree-Basics
Sacha Barber, http://www.codeproject.com/Articles/30604/A-Journey-into-Expressions
Alexandra Rusina, http://blogs.msdn.com/b/csharpfaq/archive/2009/11/19/debugging-expression-trees-in-visual-studio-2010.aspx
Pranay Rana, http://www.codeproject.com/Tips/438804/Expression-Tree