Using AppDomain Storage for Large Data Collections

Roger500

Feb 12, 2015

CPOL

How to use an AppDomain to store large data collections

Download source - 335.1 KB

Introduction

Do you have a 32 bit operating system? Do you write 32 bit applications? You may be interested in getting more memory for your application; no hardware needed! The secret? AppDomains!

Background

I will be talking about virtual memory only in this article. The relationship between real and virtual memory is outside the scope of this article and is extensively covered by many other articles.

A default domain is created for you by the Common Language Runtime (CLR) before your application runs. The CLR automatically loads mscorlib.dll, your program and, possibly, other assemblies. Consequently, although the default domain has 4 GB of addressable memory, applications have a smaller amount of memory to work with; typically two to three gigabytes (GB) of virtual storage.

Your program may create additional domains. I will refer to domains created by your application as secondary domains. A secondary domain on a 32 bit system will have 2 GB of addressable memory, with usable memory being somewhere between 1 and 2 GB after mscorlib.dll and, possibly, other assemblies have been loaded. You may have multiple secondary domains, and a secondary domain may create other domains.

What can we use the additional memory for?

Most articles and books talk about using a secondary domain to create an isolated environment where untrusted applications execute. I want to use the additional memory to store data. Storing large collections and data caches is an excellent use for a secondary AppDomain.

How do we use AppDomains and what are the limits?

The .NET CLR creates the default domain, but you must create any secondary domains yourself. You must also write the code that stores and accesses data in the secondary domain's memory. I will show you how to create and dispose of the secondary domain as well as how to add executable code to the secondary domain. The code you write must take the following limitations into account.

Each AppDomain is isolated from the others. Code in one domain cannot directly reference objects that live in another domain. Data must be explicitly passed between domains.
Data passed between domains must be serializable. Fortunately, many of the standard collections are already marked as serializable. You may mark your own classes and structures with the [Serializable] attribute.
Iterators and enumerators are not serializable; foreach and collection.ForEach statements do not work across domain barriers. The alternatives are array subscripting or working with subsets of the collection. "AddRange(elementList);" and "List<string> elements = GetRange(from, to);" are examples of methods to save or fetch subsets of the collection.
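Constructors cannot have parameters when the object is created with the two-argument CreateInstanceAndUnwrap overload used in this article, because that overload calls the parameterless constructor. You will need to use properties and functions to pass initialization parameters to the object. Other functions may need to be modified to ensure the object has been completely initialized before they run.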
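
The limitations above can be sketched in a small class. This is a minimal illustration, not the author's DomainList<T>: RangeStore, Initialize and the domain name "RangeStoreDomain" are hypothetical names, and the code assumes the .NET Framework (AppDomain.CreateDomain is not supported on .NET Core or .NET 5 and later). It shows a parameterless constructor, initialization through a method, and chunked retrieval in place of foreach over the proxy.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the article's DomainList<T>; not the author's code.
public class RangeStore : MarshalByRefObject
{
    private readonly List<string> items = new List<string>();

    // Parameterless constructor: the one the two-argument
    // CreateInstanceAndUnwrap overload calls.
    public RangeStore() { }

    // Initialization data arrives through a method instead of the constructor.
    public void Initialize(int capacity)
    {
        items.Capacity = Math.Max(capacity, items.Count);
    }

    public int Count { get { return items.Count; } }

    // string[] is serializable, so a whole batch crosses the boundary in one call.
    public void AddRange(string[] batch)
    {
        items.AddRange(batch);
    }

    // Returns a copy of 'count' elements starting at 'index'.
    public string[] GetRange(int index, int count)
    {
        return items.GetRange(index, count).ToArray();
    }
}

public static class RangeStoreDemo
{
    public static void Main()
    {
        AppDomain domain = AppDomain.CreateDomain("RangeStoreDomain");
        try
        {
            RangeStore store = (RangeStore)domain.CreateInstanceAndUnwrap(
                typeof(RangeStore).Assembly.FullName,
                typeof(RangeStore).FullName);
            store.Initialize(1000);
            store.AddRange(new[] { "   1", "   2", "   3" });

            // Walk the collection in chunks instead of enumerating the proxy.
            const int chunk = 2;
            int total = store.Count;
            for (int i = 0; i < total; i += chunk)
            {
                // The returned array is a local copy, so foreach is safe here.
                foreach (string s in store.GetRange(i, Math.Min(chunk, total - i)))
                    Console.WriteLine(s);
            }
        }
        finally
        {
            AppDomain.Unload(domain);
        }
    }
}
```

The chunk size of 2 is only for demonstration; a real application would pick a size that balances the number of cross domain calls against the size of each serialized batch.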

Costs

One of the traditional trade-offs is using more memory in exchange for faster execution speed. That trade-off does not hold in this case; serializing and deserializing data takes time. The application will run slower than if the data were stored in the same domain as the code processing the data. However, it may be possible to shift the processing of the data to the secondary domain, which would eliminate much of the overhead.
Exception handling must be carefully planned. Exceptions thrown in a secondary domain must be handled in the secondary domain. If not handled, the default domain receives notification of the exception via the AppDomain.UnhandledException event (an UnhandledExceptionEventHandler delegate) and the application terminates.

Using a Cross-domain Object

The sample programs use a generic collection of strings to demonstrate the cross domain object. The sample programs create and process the string collection either in the primary domain's local memory or in a secondary domain. The generic collection is named DomainList<T>. DomainList<T> is a wrapper around the generic List<T> class. The sample programs create a set of random numbers that are converted to fixed length strings. For example, the number 4567 is converted to the right justified, blank filled literal, " 4567". The literal strings are stored in an instance of the DomainList<T> class. Normally, this would be coded as:

1. DomainList<string> storedText = new DomainList<string>();
2. storedText.Add(rightJustifiedNumber);
3. storedText.Dispose();

Statement 1 creates an instance of the DomainList object tailored for string data.

Statement 2 adds a string to the object. The string will be stored in the current domain's memory.

Statement 3 disposes of the storedText object and releases memory associated with the object.

To store the data in a different domain's memory, you need to create an AppDomain and create an instance of the DomainList<T> object in the newly created domain. The following code is the simplest way to accomplish those actions.

1. AppDomain storageDomain = AppDomain.CreateDomain("SecondaryDomain");
2. DomainList<string> storedText =
       (DomainList<string>)storageDomain.CreateInstanceAndUnwrap(
           typeof(DomainList<string>).Assembly.FullName,
           typeof(DomainList<string>).FullName);
3. storedText.Add(rightJustifiedNumber);
4. storedText.Dispose();
5. AppDomain.Unload(storageDomain);

Statement 1 creates a new domain and gives the new domain the specified name. The domain's name should be suitable for display.

Statement 2 creates an instance of the DomainList<T> object in the secondary domain and returns a proxy for it to the current domain. The proxy provides the cross domain communication that allows you to pass data between the two domains. More than one object may be created in the secondary domain.

Statement 3 adds a string to the object. The string will be stored in the secondary domain's memory.

Statement 4 disposes of the storedText object and releases memory associated with the object. The AppDomain must be disposed of separately.

Statement 5 disposes of the secondary domain. You may have more than one cross domain object in the secondary domain. Dispose of the domain after all cross domain objects in the domain have been disposed of.
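
The five statements can be wrapped in try/finally so the domain is unloaded even if an operation fails. The following is a minimal sketch under stated assumptions: StringStore and its HostDomainName property are hypothetical stand-ins for DomainList<string>, the width of 8 for the padded string is arbitrary, and the code requires the .NET Framework (AppDomain.CreateDomain is not supported on .NET Core or .NET 5 and later).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal store, standing in for DomainList<string>.
public class StringStore : MarshalByRefObject
{
    private readonly List<string> items = new List<string>();

    public void Add(string item) { items.Add(item); }

    public int Count { get { return items.Count; } }

    // Reports the name of the domain in which this instance lives.
    public string HostDomainName
    {
        get { return AppDomain.CurrentDomain.FriendlyName; }
    }
}

public static class StorageDomainDemo
{
    public static void Main()
    {
        AppDomain storageDomain = AppDomain.CreateDomain("SecondaryDomain");
        try
        {
            // The instance is created in storageDomain; storedText is a proxy.
            StringStore storedText = (StringStore)storageDomain.CreateInstanceAndUnwrap(
                typeof(StringStore).Assembly.FullName,
                typeof(StringStore).FullName);

            // Right justified, blank filled; the width of 8 is an assumption.
            storedText.Add("4567".PadLeft(8));

            Console.WriteLine("Caller domain: " + AppDomain.CurrentDomain.FriendlyName);
            Console.WriteLine("Store domain:  " + storedText.HostDomainName);
            Console.WriteLine("Count:         " + storedText.Count);
        }
        finally
        {
            // Unload the domain even if one of the calls above throws.
            AppDomain.Unload(storageDomain);
        }
    }
}
```

Printing HostDomainName next to the caller's domain name is a quick way to confirm that the object really lives in the secondary domain rather than in the caller's.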

The DomainList<T> Class

The DomainList<T> object is a wrapper around the generic List<T> collection. The List<T> class provides the functions to operate on data in the collection. DomainList<T> provides the required information and function definitions for the cross domain communication and enhances some of the functionality. A wrapper is needed for several reasons.

Inheritance is not feasible. DomainList<T> already inherits from MarshalByRefObject or an equivalent. C# does not support multiple inheritance of classes, so I wrapped List<T> by creating an instance of the List<T> object within the DomainList<T> object. I then wrote enhanced versions of the functions I needed. The end user application calls DomainList<T>'s enhanced function, which then calls List<T>'s equivalent function to perform the work.
Data must be serializable. Enumerators and iterators are not serializable. You may want to provide alternate means of iteration such as subscripting or data retrieval by range. I created Add(mySublist), Remove(startIndex, endIndex) and Retrieve(startIndex, endIndex) functions to speed up high volume data exchanges. The Sort() function runs in the secondary domain, thus eliminating a great deal of cross domain data serialization/deserialization.
You cannot use throw/catch logic across domain boundaries to handle exceptions. You must use other techniques to communicate error conditions. The enhanced DomainList<T> functions use try/catch logic within the functions to capture exceptions thrown by List<T>. The exception is passed back to the end user via DomainList<T>'s LastException property. DomainList<T> will typically create a new Exception and use the Source property to identify the function being performed when the exception was thrown. The new exception's InnerException property returns the original error. An out-of-range return value is also used where feasible. For example, x.Count returns -1 if the number of elements in List<T> could not be determined. The disadvantage is obvious: the caller must check the LastException property or the error goes unnoticed.

The concept of the LastException property works fine for a single-threaded class but not for a multi-threaded class. You may miss an error in one thread or receive error notification in a thread that did not make an erroneous call. Error feedback for subscript notation remains problematic.
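
The LastException pattern described above might look roughly like the following. This is a hedged sketch, not the code from the download: GuardedList<T>, GetAt and the Source strings are hypothetical, and the cross domain part of Main assumes the .NET Framework (AppDomain.CreateDomain is not supported on .NET Core or .NET 5 and later).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper illustrating the LastException idea.
public class GuardedList<T> : MarshalByRefObject
{
    private readonly List<T> items = new List<T>();

    // Holds the error from the most recent call, or null if it succeeded.
    public Exception LastException { get; private set; }

    public void Add(T item)
    {
        LastException = null;
        items.Add(item);
    }

    // Returns default(T) on failure instead of throwing; details go to LastException.
    public T GetAt(int index)
    {
        LastException = null;
        try
        {
            return items[index];
        }
        catch (Exception ex)
        {
            // Wrap the original error and record which function failed.
            LastException = new Exception("GetAt failed for index " + index, ex)
            {
                Source = "GuardedList.GetAt"
            };
            return default(T);
        }
    }
}

public static class GuardedListDemo
{
    public static void Main()
    {
        AppDomain domain = AppDomain.CreateDomain("GuardedDomain");
        try
        {
            GuardedList<string> list = (GuardedList<string>)domain.CreateInstanceAndUnwrap(
                typeof(GuardedList<string>).Assembly.FullName,
                typeof(GuardedList<string>).FullName);
            list.Add("    4567");

            string value = list.GetAt(5);   // index is out of range

            // The caller must check LastException or the error is lost.
            Exception error = list.LastException;
            if (error != null)
                Console.WriteLine(error.Source + ": " + error.InnerException.GetType().Name);
        }
        finally
        {
            AppDomain.Unload(domain);
        }
    }
}
```

Because LastException is reset at the start of every call, two threads sharing one instance can overwrite each other's error, which is the multi-threading weakness noted above.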

Coding the DomainList<T> class is very similar to coding a normal class. The class declaration is:

[Serializable]
public class DomainList<T> : MarshalByRefObject, IDisposable

The serializable attribute marks the class as serializable and is required.

MarshalByRefObject is the base class for objects that communicate across application domain boundaries via a proxy. Objects within the same domain communicate directly; no proxy is necessary.

IDisposable is highly recommended so cross domain resources may be freed as soon as possible.

The CrossAppDomainObject

Disposing of an object inheriting from MarshalByRefObject will cause a memory leak in a cross domain environment. The domain must be freed to release all resources. You should inherit from the CrossAppDomainObject class rather than from MarshalByRefObject. Nathan B. Evans wrote the CrossAppDomainObject class and published it at http://nbevans.wordpress.com/2011/04/17/memory-leaks-with-an-infinite-lifetime-instance-of-marshalbyrefobject/

His explanation is far better than mine would be, but I will point out the differences in coding. Change

public class DomainList<T> : MarshalByRefObject, IDisposable

to

public class DomainList<T> : CrossAppDomainObject, IDisposable

IDisposable is optional when inheriting from CrossAppDomainObject, as CrossAppDomainObject contains dispose code. If you have objects in your class that need to be disposed of, you need to override the Dispose method in CrossAppDomainObject. See the DomainListCD<T> object in the sample projects for a complete coding example.

Sample Programs

I have provided three sample projects. I compiled all samples with a platform target of an x86 processor. The samples will use 32 bit addressing regardless of operating system mode. This is especially important for the third sample project. All projects will work in 64 bit mode but memory limitations are not as easy to demonstrate.
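
If you are unsure which mode a build actually runs in, you can ask the runtime. The following is a small sketch, assuming .NET Framework 4 or later (where Environment.Is64BitProcess and Environment.Is64BitOperatingSystem are available); BitnessCheck is a hypothetical helper and is not part of the sample projects.

```csharp
using System;

// Hypothetical helper; not part of the article's download.
public static class BitnessCheck
{
    // Summarizes the addressing mode of the OS and of the current process.
    public static string Describe()
    {
        return string.Format(
            "64 bit OS: {0}, 64 bit process: {1}, pointer size: {2} bytes",
            Environment.Is64BitOperatingSystem,
            Environment.Is64BitProcess,
            IntPtr.Size);
    }

    public static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```

A build with a platform target of x86 should report a 32 bit process and a pointer size of 4 bytes even on a 64 bit operating system.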

DomainMemoryDemo and DomainMemoryDemoCrossAppDomain are the same program except that DomainList<T> inherits from MarshalByRefObject and DomainListCD<T> inherits from CrossAppDomainObject. You may run the test with the object in the default domain or in a secondary domain. The "Test the object" checkbox will perform several operations against the generated data to demonstrate some coding techniques. The "Pre-allocate memory" checkbox will provide a capacity to the collection before populating the collection. I added this checkbox just to see how big a difference it made in load time.

The generated reports will show elapsed time and CPU time for various operations. For example, the CPU time for the sort is significant and will show up in the domain where the data is stored. You will also see a major difference in data load time; 3.22 elapsed seconds for the default domain versus 16.27 for the secondary domain. This is the overhead for using a secondary domain.
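
A rough way to observe this overhead yourself is to time the same load against a local object and against a proxy. The sketch below is an illustration only: TimedStore and LoadTimer are hypothetical names, the element count and batch size are arbitrary, and the timings will differ from the figures quoted above. It assumes the .NET Framework (AppDomain.CreateDomain is not supported on .NET Core or .NET 5 and later).

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical store used only for timing.
public class TimedStore : MarshalByRefObject
{
    private readonly List<string> items = new List<string>();
    public void Add(string item) { items.Add(item); }
    public void AddRange(string[] batch) { items.AddRange(batch); }
    public int Count { get { return items.Count; } }
}

public static class LoadTimer
{
    // Adds 'total' strings with one call per element.
    public static TimeSpan LoadOneByOne(TimedStore store, int total)
    {
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < total; i++)
            store.Add(i.ToString().PadLeft(8));
        watch.Stop();
        return watch.Elapsed;
    }

    // Adds 'total' strings with one call per batch of up to 'batchSize' elements.
    public static TimeSpan LoadInBatches(TimedStore store, int total, int batchSize)
    {
        Stopwatch watch = Stopwatch.StartNew();
        for (int start = 0; start < total; start += batchSize)
        {
            int n = Math.Min(batchSize, total - start);
            string[] batch = new string[n];
            for (int i = 0; i < n; i++)
                batch[i] = (start + i).ToString().PadLeft(8);
            store.AddRange(batch);
        }
        watch.Stop();
        return watch.Elapsed;
    }

    private static TimedStore CreateIn(AppDomain domain)
    {
        return (TimedStore)domain.CreateInstanceAndUnwrap(
            typeof(TimedStore).Assembly.FullName,
            typeof(TimedStore).FullName);
    }

    public static void Main()
    {
        const int total = 100000;   // arbitrary element count
        Console.WriteLine("Local, one by one:  " + LoadOneByOne(new TimedStore(), total));

        AppDomain domain = AppDomain.CreateDomain("TimingDomain");
        try
        {
            Console.WriteLine("Remote, one by one: " + LoadOneByOne(CreateIn(domain), total));
            Console.WriteLine("Remote, batched:    " + LoadInBatches(CreateIn(domain), total, 1000));
        }
        finally
        {
            AppDomain.Unload(domain);
        }
    }
}
```

Batching reduces the number of cross domain calls, which is the same idea behind the range functions in DomainList<T>.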

The third sample project is DomainMemoryLeakTest. This test will create and dispose of DomainMemory objects until memory is exhausted or you stop the test. Running "Normal Dispose" with "Create domain once" should fail on a 32 bit system within a minute or so. All other combinations will continue to execute until you press the "Stop" button.

Points of Interest

These objects also work well in a 64 bit environment. Regardless of the environment, objects are limited to not more than 2GB in size unless gcAllowVeryLargeObjects is set (.NET 4.5+ and 64 bit OS only). See

To add an application configuration file to your C# project

On the menu bar, choose Project, Add New Item.
The Add New Item dialog box appears.
Expand Installed, expand Visual C# Items, and then choose the Application Configuration File template.
In the Name text box, enter a name, and then choose the Add button. A file that's named app.config is added to your project.
Add the line containing gcAllowVeryLargeObjects to your configuration file.

A sample configuration file is:

<configuration>
  <runtime>
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>

History

This is version 1.0.0.0

Credits and Acknowledgements

All samples were developed and tested using Microsoft Visual Studio Community 2013 and Microsoft Visual Studio 2010 Express, targeting the .NET Framework 4 Client Profile. Thanks to Nathan B. Evans for his excellent work on the CrossAppDomainObject. See

http://nbevans.wordpress.com/2011/04/17/memory-leaks-with-an-infinite-lifetime-instance-of-marshalbyrefobject/