Methods are first class objects in C#

RajeshRaushan


Mar 31, 2013

CPOL


In C#, delegates can behave like first-class objects by forming closures over outer variables.

Download demo - 26.6 KB

Introduction

Modern C# code depends heavily on lambda expressions. Callbacks built on delegates such as Action and Func<T> are everywhere, and we don't hesitate to access variables declared outside the lambda expression. This is handy, and it enables a programming model that is quite a challenge to build with conventional object-oriented constructs alone. Here I discuss the construct called a closure and the memory footprint it leaves behind. Being aware of these basic concepts certainly helps in writing better code. My idea is to discuss the core concept without going too deep into internal details.

Before we get into this, let's touch on the basics.

What is a class? It's a template for defining objects. Wikipedia gives this definition:

In object-oriented programming, a class is a construct that is used to create instances of itself – referred to as class instances, class objects, instance objects or simply objects. A class defines constituent members which enable its instances to have state and behavior. Data field members (member variables or instance variables) enable a class instance to maintain state. Other kinds of members, especially methods, enable the behavior of class instances. Classes define the type of their instances.

If you are a C# developer there is nothing new here; we all know this. There are other things related to classes, such as static classes, sealed classes, nested classes, generics and so on. Then there are the object-oriented features they provide, such as abstraction, overloading and encapsulation.

The key differences between structured programming and object-oriented programming are the introduction of concepts like encapsulation, association, aggregation and composition. For the time being let's set aside abstraction and polymorphism, although they are equally important, if not more so.

OOP, in the most basic sense, is about keeping data and functionality together and restricting unnecessary exposure of data. A class does that by marking members private, so that only methods defined in that class can operate on them directly. The modifier applies to behavior as well: functionality that is not relevant to the outside world is not visible from outside. This helps maintain a good design.

The data contained in an object at any point in time is known as its state. The state can change at any moment at run time. Storing this state requires some memory, either on the heap or on the stack. When an object is no longer useful, or can no longer be reached from code, it should die and the memory occupied by its state should be freed and made available to others.

Before we proceed further, let's discuss how memory gets freed.

Languages in the C family are block structured. Before block structure, programs relied on jump and goto statements, which are arbitrary transfers of control: you could go to any label from anywhere, so the execution flow was unpredictable and extremely hard to follow. The introduction of blocks is one of the nicest features of high-level languages. In the C family, and in most other languages as well, blocks are delimited by curly braces { }. (In modern C# the block is often inferred.) A block defines the scope of whatever is inside it.

In C, a block can exist at function level and, inside that, in loops, branches and anything else. Blocks within blocks (nested blocks) are just fine. The function is the top-most level at which a block can exist; whatever you define outside it becomes global. By including a header file, variables declared at file level become available globally. In today's terms we can think of these as static. So everything was static; the concept of an instance was not there.

To bring instances (or OOP) into the picture, just one thing was needed: allowing blocks outside functions. This came in the form of the class. A class has a name that defines a scope, and whatever we define inside it is limited to that class. A class is not an execution unit like a function; it is an entity with some state that offers some behavior, services or functionality. This new concept is termed OOP. It then offers additional things such as inheritance, encapsulation, abstraction, polymorphism, static members and so on. But the introduction of blocks beyond the function boundary is one of the most fundamental differences between structured programming and object-oriented programming. System-defined types like int, char and float were already there; now we got the freedom to define our own types as well.

I have a bad habit of drifting off topic! Let's come back to the concept behind freeing resources (memory in particular).

If a variable or object is defined inside a function, it does not exist until someone calls that function. When the function is called, the variable is created somewhere in memory and used; as soon as the function finishes its job and returns, the variable is of no use. No code can reach it: it has gone out of scope. This is the time when the memory claimed by that variable should be freed. The same holds true for a variable defined in any scope. As soon as the scope dies, the resources it claimed should be freed.

C++ destructors work that way: when an object with automatic storage goes out of scope, its destructor is called immediately. For some reasons (memory becoming fragmented after the program has run for a while was the main one) this wasn't always the best approach, so languages such as Java and C# clean up with garbage collection instead. It cleans periodically rather than immediately, and for large systems it helps improve performance as well as keep a cleaner memory footprint.

Forget about the timing of the clean-up. The basic point is that when a variable or object goes out of scope, the resources it claimed should be freed, either immediately or after some time.

To make this happen, the framework must know that something is now out of scope. That is easiest to determine for inner blocks and functions, and it is fine for classes too, but with closures it gets a bit complicated. We will see why, but first we must know what a closure is.

What is CLOSURE?

Let’s see how it has been defined in Wikipedia.

In computer science, a closure (also lexical closure or function closure) is a function or reference to a function together with a referencing environment—a table storing a reference to each of the non-local variables (also called free variables) of that function. A closure—unlike a plain function pointer—allows a function to access those non-local variables even when invoked outside of its immediate lexical scope.

The concept of closures was developed in the 1960s and was first fully implemented in 1975 as a language feature in the Scheme programming language to support lexically scoped first-class functions. The explicit use of closures is associated with functional programming languages such as Lisp and ML, as traditional imperative languages such as Algol, C and Pascal did not support returning nested functions as results of higher-order functions and thus did not require supporting closures either. Many modern garbage-collected imperative languages support closures, such as Smalltalk (the first object-oriented language to do so) and C#. Support for closures in Java is planned for Java 8.

So it is not a new thing, it is not an OOP concept either, and it existed before object-oriented programming. But it has a really nice feature: it treats a function as an entity with some state, and this is now supported by object-oriented languages like C#. In the basic sense it means treating a function as an object and expecting it to remember the environment in which it was created. Functions are passed around as objects that hold some state as well.

But C# is an object-oriented language, so it needs to treat these functions as objects somehow. Let's see an example now.
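As a minimal sketch of a function that remembers its environment (the names ClosureSketch and MakeCounter are made up for illustration and are not part of the demo project), consider a method that returns a delegate over one of its own locals:

```csharp
using System;

public static class ClosureSketch
{
    // Returns a delegate that keeps using the local variable 'count'
    // even after MakeCounter itself has returned.
    public static Func<int> MakeCounter()
    {
        int count = 0;          // outer variable captured by the lambda
        return () => ++count;   // every call updates the same captured variable
    }

    public static void Main()
    {
        Func<int> next = MakeCounter();
        Console.WriteLine(next());   // 1
        Console.WriteLine(next());   // 2

        // A second call to MakeCounter creates a separate environment.
        Func<int> other = MakeCounter();
        Console.WriteLine(other());  // 1
    }
}
```

Each delegate carries its own state, which is exactly the "function plus referencing environment" idea from the definition above.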

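public class BasicClosure
{
    private IntOne one = new IntOne() { name = "IntOne+one", value = 30 };
    private IntOne two = new IntOne() { name = "IntOne+two", value = 30 };

    // Private instance method that matches the Action delegate signature.
    void IWillBeExposed()
    {
        one.value += 10;
    }

    // Hands the private method to the caller as a delegate; the delegate
    // keeps a reference to this instance.
    public Action AddValue()
    {
        return IWillBeExposed;
    }

    // Finalizer: logs when the garbage collector reclaims this object.
    ~BasicClosure()
    {
        LogDetail.DebugLogs.Add(new LogDetail() { Name = "BasicClosure",
            Method = "~BasicClosure", Variable = "one", Value = one.value.ToString() });
    }
}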
 
public class IntOne
{
    public int value = 0;
    public string name = "";


    ~IntOne()
    {
        LogDetail.DebugLogs.Add(new LogDetail() { Name = name, Method = 
          "~IntOne", Variable = "value", Value = value.ToString() });
    }
}




public class LogDetail
{
    public string Name = "";
    public string Method = "";
    public string Variable = "";
    public string Value = "";


    public override string ToString()
    {
        return string.Format(
           "Name : {0}, Method : {1}, Variable : {2}, Value : {3}", Name, Method, Variable, Value);
    }




    public static List<LogDetail> DebugLogs = new List<LogDetail>();
}
 
public partial class TestBasicClosure : Form
{
    
    public TestBasicClosure()
    {
        InitializeComponent();
    }


    Action addValue = null;
    
    private void btnStart_Click(object sender, EventArgs e)
    {
        BasicClosure fbc = new BasicClosure();
        addValue = fbc.AddValue();
    }

    private void btnCall_Click(object sender, EventArgs e)
    {
        addValue();
    }

    private void btnLog_Click(object sender, EventArgs e)
    {
        rtbLog.Clear();
        rtbLog.Lines = LogDetail.DebugLogs.Select(dl => dl.ToString()).ToArray();
        rtbLog.Refresh();
    }

    private void btnGC_Click(object sender, EventArgs e)
    {
        GC.Collect();
    }

    private void btnClr_Click(object sender, EventArgs e)
    {
        addValue = null;
    }
}

The BasicClosure class has a private void method, IWillBeExposed, that conforms to the signature of the Action delegate. Another method, AddValue, returns a reference to that private method as an Action.

In the TestBasicClosure class, btnStart_Click creates a BasicClosure object, calls AddValue and stores the returned Action in the instance variable addValue. The local variable fbc goes out of scope as soon as btnStart_Click ends.

This is bad design: methods should not be exposed outside their class this way. What happens here is that the delegate holds a reference to the object, so the lifetime of the object created as fbc is extended and its memory is not released for as long as the delegate exists. The test class has another method, btnGC_Click, that asks the garbage collector to collect right away. Even after calling it, you can see that the finalizer of the fbc object does not run. btnLog_Click displays the log entries written by the finalizers of collected objects (I created this for easy tracing).

In the code above no direct reference to the fbc object exists anywhere, so we cannot perform any operation on it apart from calling the method that was exposed (and even that never goes through the object itself). The method uses one instance variable, namely one. There is another variable, two, that the exposed method does not access, so logically it could be cleaned up. The lifetime of one has to be extended because it is still reachable, but as two is not used there is no harm in cleaning it. In a truly function-oriented language that is how it would happen, but C# is an object-oriented language, so it does not. It knows one thing: a member of the object is still reachable, so the whole object cannot be collected.

The exposed method is basically behaving like a closure here: a function acting as an entity and extending the lifetime of the context in which it was created. Up to this point that is not really helpful. Once a reference to any instance method leaves the class, the lifetime of the object is extended even if the method does not touch any instance variable.

This does not happen with static methods, as they are not tied to an instance.
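The difference between instance and static methods can be observed through the Target property of a delegate. The sketch below uses a made-up Holder class (not part of the demo project): a delegate created from an instance method references the object it was taken from, while a delegate created from a static method group has no target at all.

```csharp
using System;

// Hypothetical class used only for this illustration.
public class Holder
{
    public void InstanceMethod() { }
    public static void StaticMethod() { }
}

public static class TargetSketch
{
    public static void Main()
    {
        var holder = new Holder();

        Action fromInstance = holder.InstanceMethod; // bound to 'holder'
        Action fromStatic = Holder.StaticMethod;     // bound to no object

        // The instance delegate keeps 'holder' reachable through Target.
        Console.WriteLine(ReferenceEquals(fromInstance.Target, holder)); // True

        // The static delegate references no instance.
        Console.WriteLine(fromStatic.Target == null);                    // True
    }
}
```

As long as fromInstance is reachable, so is the Holder object, which is the same effect that keeps the BasicClosure object alive in the example above.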

Now let's move to the next example:

public class TestClosure
{
    // Create an instance of TestClosure and call this method to test the behavior
    public void Test() 
    { 
        // this call will get three Action delegates
        // these delegates are aware of environment in which they were created.
        var shout = Shout(new string[] {"John", "Bill", "Danish"} );

        shout[0].Invoke(); // it shows John as name would get tied to the delegate
        shout[2].Invoke();
        shout[1].Invoke();
    }

    List<Action> Shout(string[] names)
    {
        List<Action> actions = new List<Action>();

        foreach (string currName in names)
        {
            // This copy matters on compilers older than C# 5, where currName
            // is a single variable shared by every iteration: delegates closing
            // over it would all end up showing the last name.
            string name = currName;
            actions.Add(
                            // name is the outer variable over which
                            // the Action delegate forms a closure
                            () => MessageBox.Show(name)
                            //() => MessageBox.Show(currName)
                            // un-comment this and comment out the line above to see
                            // the difference (on C# 5 and later both behave the same)
                       );
            // CLOSURE can be formed by anonymous delegates also.
            //actions.Add(delegate() 
            //            {
            //                MessageBox.Show(name);
            //            });
        }
        return actions;
    }
}

Here the Shout method builds a list of Action delegates dynamically, one for each string supplied in the names argument. The MessageBox.Show call inside the delegate body accesses the outer variable name, which is declared inside the loop.

When you run the Test method it shows the names in message boxes one by one, in the order the delegates are invoked: John, Danish, Bill.

One thing to note is that name is declared inside the scope of the loop body, so the variable captured by each Action delegate is a different one every time. In other words, each action has its own copy of the outer variable. You can see the difference if you show currName instead of name. In compilers before C# 5, currName is a single variable that lives outside the loop body, so it is not created afresh on each iteration, and every action shows the last value of currName. From C# 5 onward the foreach iteration variable is fresh for each iteration, so both versions show the individual names; the variable of a for loop is still shared.

When a method or delegate forms a closure over an outer variable, it does not copy the value. It captures the variable itself. When the delegate is invoked, the captured variable is read and whatever value it holds at that moment is used. This is why, with a shared variable such as currName before C# 5, it is always the last value that shows up: all the actions are invoked well after the loop has ended.

That covers scope and how outer variables are associated with lambda expressions and anonymous methods. Now let's see how an object-oriented language like C# achieves this and how memory management works with all of it. How are methods treated as first-class objects? Here is another sample:
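Capturing the variable rather than its value can be reproduced on any C# version with a plain for loop, whose loop variable is shared by all iterations. This is a small sketch with made-up names (CaptureSketch, SharedVariable, CopyPerIteration), independent of the demo project:

```csharp
using System;
using System.Collections.Generic;

public static class CaptureSketch
{
    // Every delegate captures the one loop variable 'i'.
    public static List<Func<int>> SharedVariable()
    {
        var funcs = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            funcs.Add(() => i);
        }
        return funcs;
    }

    // Every delegate captures its own 'copy', declared inside the loop body.
    public static List<Func<int>> CopyPerIteration()
    {
        var funcs = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            int copy = i;
            funcs.Add(() => copy);
        }
        return funcs;
    }

    public static void Main()
    {
        // The loop has finished, so the shared variable holds 3.
        foreach (var f in SharedVariable()) Console.Write(f() + " ");   // 3 3 3
        Console.WriteLine();

        foreach (var f in CopyPerIteration()) Console.Write(f() + " "); // 0 1 2
        Console.WriteLine();
    }
}
```

The second method uses the same trick as the name = currName assignment in Shout.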

public class FCO
{
    public string name = "FCO";
    IntOne y = new IntOne() { name = "Y", value = 20 };

    public void TestLifeTime(out Func<int> func1, out Func<int> func2) {

        IntOne i = new IntOne() { name = "I", value = 10 };
        IntOne j = new IntOne() { name = "J", value = y.value };
        IntOne k = new IntOne() { name = "K", value = y.value }; // not part of any closure

        // closure on i
        Func<int> add10 = () => 
        { 
            i.value += 10; // outer variable
            return i.value; 
        };
        // closure on i and j
        Func<int> addFive = () => 
        { 
            j.value = i.value + 5; // outer variable i and j
            return j.value; 
        };

        func1 = add10;
        func2 = addFive;

        k.value++; // simple operation
    }

    ~FCO(){
        y = new IntOne() { name = "new Y" };
        LogDetail.DebugLogs.Add(new LogDetail() {Name = name,  Method = "~FCO", Variable = "y", Value = y.value.ToString() }); 
    }
}

Here is a Windows Form to test this class:

public partial class TestFCO : Form
{
    public TestFCO()
    {
        InitializeComponent();
    }

    Func<int> f1 = null;
    Func<int> f2 = null;


    private void btnStart_Click(object sender, EventArgs e)
    {
        FCO fco = new FCO();
        fco.TestLifeTime(out f1, out f2);
    }

    private void btnAddTen_Click(object sender, EventArgs e)
    {
        if (f1 != null)
        {
            lblOne.Text = f1().ToString();
        }
    }
 
    private void btnAddFive_Click(object sender, EventArgs e)
    {
        if (f2 != null)
        {
            lblTwo.Text = f2().ToString();
        }
    }

    private void btnGC_Click(object sender, EventArgs e)
    {
        GC.Collect();
    }

    private void btnLog_Click(object sender, EventArgs e)
    {
        rtbLog.Clear();
        rtbLog.Lines = LogDetail.DebugLogs.Select(dl => dl.ToString()).ToArray();
        rtbLog.Refresh();
    }

    private void btnClrTen_Click(object sender, EventArgs e)
    {
        f1 = null;
    }

    private void btnClrFive_Click(object sender, EventArgs e)
    {
        f2 = null;
    }
}

We create an instance of FCO in btnStart_Click and receive two Func<int> delegates through out parameters. The handler then ends, so the locally created object goes out of scope. Unlike the first sample, we are not exposing any instance member of FCO here, so there is no reason why its finalizer should not fire. As the garbage collector may take some time, we request an immediate clean-up using GC.Collect(). The log then shows which objects have been released. Here is what I get if I run btnStart_Click, then btnGC_Click, then btnLog_Click:

Name : FCO, Method : ~FCO, Variable : y, Value : 0
Name : Y, Method : ~IntOne, Variable : value, Value : 20
Name : K, Method : ~IntOne, Variable : value, Value : 21

If I collect once more and then get the log again, I get one more line:

Name : new Y, Method : ~IntOne, Variable : value, Value : 0

So what is happening here? The FCO finalizer gets called. y is an instance member and nobody closes over it, so its finalizer also gets called. In the FCO finalizer a new object is created and stored in y. k is a local variable of the method TestLifeTime and nobody closes over it (nothing depends on it in the future), so its finalizer also gets called.

The newly created y object is not referenced from anywhere reachable either, but it was created during finalization, after the first collection had already run, so it survives that first cycle. The next GC.Collect() call collects it, and that is why we get one extra log line after the second call to GC.Collect().

Now we have two closures. The FCO object has already died, but the closures survive in TestFCO as the member variables f1 and f2. The lifetimes of i and j are extended because:

f1 closes over i, and

f2 closes over both i and j.

i and j were local variables of the method in which they were defined, and that method has already finished executing, so how can i and j still survive? What object are they tied to? Apart from the local variables of an executing method, state has to be associated with a reachable object, or else the GC will clean it up. And what happens if we set the f2 reference in TestFCO to null? Logically j should be collected: f1 does not close over j, it needs only i, so holding on to j serves no purpose. In my demo project I do this with the Clear 5 button, but to my surprise j does not get collected. Only when I set f1 to null as well do both i and j get collected.

This behavior is here to stay, and it comes from the object-oriented construct. It does not pose a big threat, though.

When you create a closure, the compiler generates a nested class at compile time for each scope whose variables are accessed as outer variables by the closing function or delegate. But the compiler isn't a dumb piece of software: only the variables that somebody closes over become members of the nested class. Viewed in the IL Disassembler, c_displayClass5 is the generated type that assists the closure (the compiler names such types along the lines of <>c__DisplayClass5). i and j are both members of this class, but k is not.

An instance of this class is associated with the delegate, so the instance lives as long as any reference to the delegate exists. In this case the same instance is shared by both delegates, f1 and f2, so even though f2 dies and j becomes unreachable through it, j still exists because it is a member of the same object that i belongs to.

This is how a function-oriented construct is supported in an object-oriented environment. It may extend the life of some unwanted variables as well (j in this case), but it is still useful. With careful design that keeps these concepts in mind the problem can mostly be avoided, and in most cases its impact is negligible.

This is why we say that methods in C# are treated as first-class objects. They act like simple objects; in fact the compiler creates simple objects to hold their required state when they form a closure over some outer variable. These objects are simple and do not go to the lengths of abstraction and polymorphism, as that is not required in these cases.

With this I am bringing closure to this article. I hope I haven't missed any important part. Thanks.
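To make the shared instance concrete, here is a hand-written approximation of what the compiler might generate for TestLifeTime. It is only a sketch: HandWrittenDisplayClass, IntBox and DisplayClassSketch are hypothetical names, and the real generated type and its layout are implementation details of the compiler.

```csharp
using System;

// Simplified stand-in for the article's IntOne (no finalizer, no name).
public class IntBox { public int value; }

// Hypothetical equivalent of the generated closure class: it holds only
// the captured locals (i and j, not k) plus the two lambda bodies.
public sealed class HandWrittenDisplayClass
{
    public IntBox i;
    public IntBox j;

    public int Add10() { i.value += 10; return i.value; }
    public int AddFive() { j.value = i.value + 5; return j.value; }
}

public static class DisplayClassSketch
{
    public static void TestLifeTime(out Func<int> func1, out Func<int> func2)
    {
        // One object represents the whole scope and is shared by both delegates.
        var scope = new HandWrittenDisplayClass
        {
            i = new IntBox { value = 10 },
            j = new IntBox { value = 20 }
        };
        func1 = scope.Add10;
        func2 = scope.AddFive;
    }

    public static void Main()
    {
        Func<int> f1, f2;
        TestLifeTime(out f1, out f2);

        Console.WriteLine(f1()); // 20
        Console.WriteLine(f2()); // 25

        // Both delegates point at the same scope object.
        Console.WriteLine(ReferenceEquals(f1.Target, f2.Target)); // True

        // Dropping f2 does not free j: f1.Target still references the
        // object whose field holds it.
        f2 = null;
    }
}
```

Written this way, it is easier to see why j stays alive until f1 is released as well.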
