Calculating Optimistic Memory Footprint of Managed Object

PFalkowski

Feb 17, 2015

CPOL

A way to estimate the amount of memory occupied by an object in C#/.NET

Introduction

C# is a modern, high-level, general-purpose language with only two flaws compared to C++: generics and the sizeof operator. Anyone coming from the C++ world who tried to use C# generics the way C++ templates are used was quickly disappointed. You cannot access static members of a generic type parameter, which makes policy-based design awkward (you need to create an instance of the generic type), and a generic type parameter is (almost) useless unless a constraint (the where clause) gives it some interface. That would not be such a pain, except that you cannot even add two instances of a generic type, even when that type implements the IConvertible interface (which all primitives implement). There is the dynamic keyword, which enables this kind of duck typing, but as the name says it is dynamic: there is no compile-time type checking, and there is a slight performance cost. So there it is: as a programmer rather than a C# language implementer, one cannot do much about it. The layers of abstraction meticulously built on top of the language make changing it such an incredibly risky project that it would be pure madness to even try. But there is the second thing, the sizeof operator, and we can do something about that. This tip shows a method (actually a helper class) that calculates the optimistic size of an object in memory, meaning the object occupies at least the returned number of bytes. It relies on reflection, but that is what reflection is for.

Background

There have been many approaches to calculating the size of a managed object living in the CLR. They can be categorized as follows:

Calculate the difference in memory before and after the object is allocated:

This one is straightforward:

// Force a collection first so the baseline excludes garbage.
var before = GC.GetTotalMemory(true);
// do some allocation
var after = GC.GetTotalMemory(true);
Console.Write("Memory used = {0}", after - before);

The obvious drawbacks: it fails in a multithreaded environment. Even on a single thread, the rest of the CLR would have to stay completely frozen, and even then a negative result when measuring small objects should not be a surprise (oops, the GC just collected some garbage). In short, it is neither reliable nor to the point.
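One common way to reduce the noise of this approach, though not to remove it, is to allocate many instances and average the difference. The following is a minimal sketch under that assumption; the GcMeasure class and its method name are hypothetical and not part of the original code.

```csharp
using System;

public static class GcMeasure
{
    // Estimates the average heap bytes per object by allocating `count`
    // instances and comparing GC.GetTotalMemory before and after.
    // The result is approximate: other threads and the runtime itself
    // may allocate or free memory while the measurement runs.
    public static long AverageBytesPerObject(Func<object> factory, int count)
    {
        var keepAlive = new object[count];      // allocated before the baseline
        long before = GC.GetTotalMemory(true);  // true = collect garbage first
        for (int i = 0; i < count; i++)
        {
            keepAlive[i] = factory();
        }
        long after = GC.GetTotalMemory(true);
        GC.KeepAlive(keepAlive);                // instances stay reachable until here
        return (after - before) / count;
    }
}
```

A call such as GcMeasure.AverageBytesPerObject(() => new byte[1000], 1000) should typically report a little more than 1000, because every array also carries a header. The number still depends on the runtime and on whatever else is allocating at the time.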

The second approach, proposed here and here, relies on serialization. There are three types of serialization in C#, which I will not describe here; see the MSDN article Serialization (C# and Visual Basic). This approach is not optimal because serialization is not meant to measure an object's size but to persist it or send it elsewhere. Measuring the bytes of a serialized object therefore also measures the names of variables and the XML overhead or, in the case of binary serialization, a compact encoded form of the object (lossless, but not its in-memory layout).
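To make the overhead concrete, here is a small sketch that compares the XML-serialized length of an object with the raw payload of its fields. The Client type and the SerializedSize helper are hypothetical names used only for illustration; XmlSerializer is the standard class from System.Xml.Serialization.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical sample type, not taken from the article's code.
public class Client
{
    public int Age;
    public string Name;
}

public static class SerializedSize
{
    // Number of bytes produced by XML-serializing the value.
    public static long XmlBytes<T>(T value)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var stream = new MemoryStream())
        {
            serializer.Serialize(stream, value);
            return stream.Length;
        }
    }

    // Raw payload of a Client: one int plus two bytes per character of the name.
    public static long RawPayloadBytes(Client c)
    {
        return sizeof(int) + sizeof(char) * (c.Name == null ? 0 : c.Name.Length);
    }
}
```

For a Client with a short name, XmlBytes comes out far larger than RawPayloadBytes, since the output contains the XML declaration, namespace attributes and the element names Age and Name.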

The third approach is to calculate the sizes of all fields the object references, add them up, and voila. Or at least it seems that simple. One way to do it in .NET is to create an interface exposing a method size() that returns an int representing the number of bytes occupied by an instance. This is great, however: (1) one cannot measure built-in or third-party objects this way, and (2) it is only a contract, so a class implementing the interface need not implement it as one would expect, and even if it does, this is error-prone. The better way is the divide-and-conquer way (or at least it resembles it): we start from the sizes we know, which include all primitives, decimal, string (str.Length * sizeof(char)) and a few more. Then the question must be asked: are there any (managed) objects that do not rely entirely on those primitives? The answer is (arguably) no, or at least we are not interested in the others. Consider an example: a class reads data (clients, stock quotes) from a database into an internal buffer, say a list of client instances, and manages the DB connection. The client class has a couple of standard fields (int age, string name, a reference to a product), and a product can be referenced by many clients and internally holds a list of references to all clients that bought it.

In this example we have all four dangers one has to address when calculating memory size:

1. An unmanaged resource, which is the database.
2. An object referencing other objects (the product) that the user of the Size function did not necessarily intend to include in the calculation.
3. An object without a one-to-one relation: the same product can be referenced by many customers, so its size could be counted more than once.
4. And the final boss: circular references back from the product to (!) all customers that bought it. What could go wrong?

How to address these problems:

1. Unmanaged resources do not interest us. Simple. What does interest us is the connection object, whose size should be calculated: mostly the connection string, probably.
2. Object referencing other objects: the assumption is straightforward: you reference it, you own it. This may not be good reasoning in other situations, but here it saves a lot of trouble finding the object that actually created it.
3. No one-to-one relation: a HashSet of references. It requires the function to keep state between calls, so for a recursive function a helper wrapper class is needed, but that is all. Bonus: it also solves the problem with circular references.
4. See 3.

Additionally, because only fields are counted, every property with an underlying field is counted once, and computed properties, e.g. FullName { get { return Forename + Surname; } }, are not counted (Forename and Surname are counted separately).

There is also a not-so-obvious problem with System.Reflection.Pointer, which is discussed in Points of Interest.
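The effect of a visited set on shared and circular references can be sketched on the customer/product example. This is only an illustration: Customer, Product, IdentityComparer and GraphWalk are hypothetical names, and the sketch counts objects instead of bytes to keep it short. It compares by reference identity, which is an assumption made here so that Equals overrides cannot merge distinct objects.

```csharp
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical model: customers share a product,
// and the product points back at its customers.
public class Product
{
    public string Name;
    public List<Customer> Buyers = new List<Customer>();
}

public class Customer
{
    public int Age;
    public string Name;
    public Product Bought;
}

// Compares objects by identity, ignoring Equals/GetHashCode overrides.
public sealed class IdentityComparer : IEqualityComparer<object>
{
    public new bool Equals(object x, object y) { return ReferenceEquals(x, y); }
    public int GetHashCode(object obj) { return RuntimeHelpers.GetHashCode(obj); }
}

public static class GraphWalk
{
    // Counts each reachable Customer and Product exactly once.
    public static int CountDistinct(Customer start)
    {
        var visited = new HashSet<object>(new IdentityComparer());
        Visit(start, visited);
        return visited.Count;
    }

    private static void Visit(object node, HashSet<object> visited)
    {
        // Add returns false when the object was seen before, which stops
        // both double counting and infinite recursion on cycles.
        if (node == null || !visited.Add(node)) return;

        var customer = node as Customer;
        if (customer != null) Visit(customer.Bought, visited);

        var product = node as Product;
        if (product != null)
        {
            foreach (var buyer in product.Buyers) Visit(buyer, visited);
        }
    }
}
```

With two customers who bought the same product, and the product listing both of them as buyers, CountDistinct returns 3 when started from either customer: the walk terminates despite the cycle, and the shared product is visited once.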

Using the Code

using System;
using System.Collections;
using System.Collections.Generic;
using System.Reflection;

public static class Utilities
    {
        /// <summary>
        /// Nice way to calculate the size of managed object!
        /// </summary>
        /// <typeparam name="TT"></typeparam>
        internal class Size<TT>
        {
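            private readonly TT _obj;
            // Tracks visited objects by identity, so that value equality
            // (boxed values, equal strings, Equals overrides) cannot hide a field.
            private readonly HashSet<object> references;
            // Pointer size of the running process, not of the operating system.
            private static readonly int PointerSize = IntPtr.Size;

            private sealed class ReferenceComparer : IEqualityComparer<object>
            {
                public new bool Equals(object x, object y) { return ReferenceEquals(x, y); }
                public int GetHashCode(object obj)
                {
                    return System.Runtime.CompilerServices.RuntimeHelpers.GetHashCode(obj);
                }
            }

            public Size(TT obj)
            {
                _obj = obj;
                references = new HashSet<object>(new ReferenceComparer()) { _obj };
            }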
            public long GetSizeInBytes()
            {
                return this.GetSizeInBytes(_obj);
            }

            // The core functionality. Calls itself recursively for every field of an
            // object until all fields have been visited or were already counted.
            private long GetSizeInBytes<T>(T obj)
            {
                if (obj == null) return sizeof(int);
                var type = obj.GetType();

                if (type.IsPrimitive)
                {
                    switch (Type.GetTypeCode(type))
                    {
                        case TypeCode.Boolean:
                        case TypeCode.Byte:
                        case TypeCode.SByte:
                            return sizeof(byte);
                        case TypeCode.Char:
                            return sizeof(char);
                        case TypeCode.Single:
                            return sizeof(float);
                        case TypeCode.Double:
                            return sizeof(double);
                        case TypeCode.Int16:
                        case TypeCode.UInt16:
                            return sizeof(Int16);
                        case TypeCode.Int32:
                        case TypeCode.UInt32:
                            return sizeof(Int32);
                        case TypeCode.Int64:
                        case TypeCode.UInt64:
                        default:
                            return sizeof(Int64);
                    }
                }
                else if (obj is decimal)
                {
                    return sizeof(decimal);
                }
                else if (obj is string)
                {
                    return sizeof(char) * obj.ToString().Length;
                }
                else if (type.IsEnum)
                {
                    return sizeof(int);
                }
                else if (type.IsArray)
                {
                    long size = PointerSize;
                    var casted = (IEnumerable)obj;
                    foreach (var item in casted)
                    {
                        size += this.GetSizeInBytes(item);
                    }
                    return size;
                }
                else if (obj is System.Reflection.Pointer)
                {
                    return PointerSize;
                }
                else
                {
                    long size = 0;
                    var t = type;
                    while (t != null)
                    {
                        size += PointerSize;
                        var fields = t.GetFields(BindingFlags.Instance | BindingFlags.Public | 
                                BindingFlags.NonPublic | BindingFlags.DeclaredOnly);
                        foreach (var field in fields)
                        {
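                            var tempVal = field.GetValue(obj);
                            // Nulls and value-type fields are always counted; only
                            // reference-type instances are de-duplicated.
                            if (tempVal == null || field.FieldType.IsValueType
                                || references.Add(tempVal))
                            {
                                size += this.GetSizeInBytes(tempVal);
                            }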
                        }
                        t = t.BaseType;
                    }
                    return size;
                }
            }
        }

// The actual, exposed method:
        public static long SizeInBytes<T>(this T SomeObject)
        {
            var temp = new Size<T>(SomeObject);
            var tempSize = temp.GetSizeInBytes();
            return tempSize;
        }
    }

Points of Interest

The trickiest part, once you have embraced the whole recursive reference-following checked against the HashSet, is System.Reflection.Pointer. It is a hellish creature to meet as a field when using reflection, because it is not CLS-compliant, and unless it is handled explicitly it quickly causes a stack overflow.

Also note that generic collections, and even ArrayList, are not arrays in the sense of Type.IsArray. That is actually good: such objects fall through to the last case, where all fields are counted, for example the size field, which is kept internally and incremented/decremented behind the scenes.
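The difference can be checked with a few lines of reflection. In this sketch, CollectionShape is a hypothetical helper name. The field names it returns for framework collections are implementation details and may differ between runtime versions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public static class CollectionShape
{
    // Names of the instance fields declared directly on the given type,
    // using the same binding flags as the size calculation.
    public static string[] DeclaredFieldNames(Type type)
    {
        return type
            .GetFields(BindingFlags.Instance | BindingFlags.Public |
                       BindingFlags.NonPublic | BindingFlags.DeclaredOnly)
            .Select(f => f.Name)
            .ToArray();
    }

    // True only for real array types such as int[], not for List<T> or ArrayList.
    public static bool IsRealArray(Type type)
    {
        return type.IsArray;
    }
}
```

IsRealArray(typeof(int[])) is true, while IsRealArray(typeof(List<int>)) is false, so a List<int> is measured through its private fields, which DeclaredFieldNames(typeof(List<int>)) lists (typically a backing array and a count).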

History

The code in its previous form was posted by me here.