Introduction
By using a single-instance string cache, you can significantly reduce the memory footprint of your application. We discovered the value of this while doing performance and memory tuning of Gibraltar, our commercial application monitoring product. The overhead in processor time is minimal, and the memory improvement tends to increase as your application manages more data, which can significantly improve your ability to perform operations in memory. One simple static class lets you swap each string for a single common instance, ensuring that each distinct string value is held in RAM only once.
Performance in the Real World
Using one of the sample applications that we ship with Gibraltar, we created a specific test application that lets us enable and disable the string cache to validate performance both in memory savings and in processor usage. What we found was that for a processor penalty of 5% (which did not translate into any runtime performance change in our case because of the way we use multithreading), we were able to reduce the memory footprint of the Gibraltar Agent, particularly in certain extreme cases where clients were stretching the capabilities of the Agent. Here's a chart that shows the observable difference in memory usage with the StringReference class enabled and disabled:
This was done on a system with no memory pressure; when we examined the internal details, it was clear that the difference was more stark: the number of strings in memory dropped by 90%, consuming about 6MB of memory for the test instead of around 70MB. In the above test, the agent stored over 2.8 million log messages and metrics during the interval profiled.
You can duplicate these results for yourself: Attached is the sample application we used to run these tests. It has a checkbox that can enable and disable the single instance string cache so you can watch the effect on RAM. Just compile it and crank up the log message generation rates to maximum to quickly see the difference in memory footprint. Here's what the sample application looks like as it runs:
Because we wanted to be able to show exactly the tests we ran, the sample uses Reflection to reach into our agent assembly and disable the cache. It's an internal object because we don't anticipate anyone wanting to disable it in production use and we want to keep our API as clean as possible. You can use Reflector if you want to see that it is exactly the same source code as the StringReference class we've attached.
Conceptual Background
Virtually every piece of data your application works with ends up as a string - to be serialized to a display, log, or file. This is so common that ToString is an intrinsic feature of every object. As your application works with more data, you'll discover that the most common objects, and the ones tying up the most memory, are strings. Because your application is working within a common problem domain, you'll tend to have substantial repetition of values. Each time a value is repeated, it uses up the same amount of memory again. Additionally, having string objects with different lifetimes scattered throughout the heap forces the Garbage Collector to relocate objects more often. While it's very difficult to prevent duplicate strings from being created, if they can be immediately exchanged for a single common reference copy, the duplicates can be garbage collected quickly and without fragmenting memory.
Fortunately, .NET Strings are immutable. This means that once they're created, they can never be changed: any attempt to change them results in a new String with the changes applied. This is one of the reasons that you can create real performance problems in your application by doing innocuous things like composing a string through a series of appends. While this immutability can cause performance problems in environments where you want to do a lot of string manipulation, it creates a golden opportunity for memory optimization: since a String can't ever be changed, any two String objects that have the same value are interchangeable.
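To make the distinction between equal values and shared instances concrete, here is a minimal sketch. The class name StringIdentityDemo is hypothetical and exists only for illustration; the strings are built at run time so the compiler cannot fold them into a single literal.

```csharp
using System;

// Hypothetical demo class, not part of the StringReference source.
public static class StringIdentityDemo
{
    public static void Main()
    {
        // Build the same value twice at run time; each call allocates a new object.
        string a = new string(new[] { 'o', 'r', 'd', 'e', 'r' });
        string b = new string(new[] { 'o', 'r', 'd', 'e', 'r' });

        Console.WriteLine(a == b);                        // True: same value
        Console.WriteLine(object.ReferenceEquals(a, b));  // False: two objects in memory

        // Because strings are immutable, pointing b at a's object is always safe.
        b = a;
        Console.WriteLine(object.ReferenceEquals(a, b));  // True: one shared object
    }
}
```

Once b is redirected, the second copy has no remaining references and becomes eligible for garbage collection.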
Sounds Great, Doesn't .NET do this Already?
Indeed, .NET has a capability called string interning. With it, it's easy to create a string and then intern it, swapping it for an existing copy (if there is one) or adding it to the interned string store for future reference. There's one big problem: interned strings live at least for the duration of the AppDomain (and, according to the .NET documentation, possibly until the runtime itself shuts down). That means any string you intern will not be removed from memory until then. This is generally fine for compile-time constants (which are interned automatically), but for most applications it would have the opposite of the effect we're looking for - no string would ever be released, and our memory consumption would continually increase. What we want is to keep each string in memory only as long as it is in use by an active object.
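For comparison, this is roughly what the built-in mechanism looks like. The sketch below uses only String.Intern and String.IsInterned from the framework; the class name InternDemo and the sample value are made up for illustration.

```csharp
using System;

// Hypothetical demo class showing the framework's own interning API.
public static class InternDemo
{
    public static void Main()
    {
        // Two separate objects holding the same value.
        string a = new string(new[] { 'd', 'e', 'm', 'o', '-', 'k', 'e', 'y' });
        string b = new string(new[] { 'd', 'e', 'm', 'o', '-', 'k', 'e', 'y' });

        // Intern returns the pooled instance, adding the string if it is not there yet.
        string ia = string.Intern(a);
        string ib = string.Intern(b);

        Console.WriteLine(object.ReferenceEquals(ia, ib)); // True: both resolve to the pooled copy
        Console.WriteLine(string.IsInterned(b) != null);   // True: the value is now in the pool

        // There is no API to remove an entry from the intern pool,
        // which is the drawback described above.
    }
}
```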
How it Works
What we want is a way to have a dictionary of strings that are currently in memory so we can get the single reference copy of any string already there. But we need the string to be garbage collected if no one else has a reference to it. That means the dictionary of strings itself can't hold a reference to the string, yet it needs to be able to return a reference when requested. So, we need something that isn't a full .NET reference - something closer to an old-fashioned pointer that we can follow, but where the object may not be available anymore because it has been garbage collected.
Enter the WeakReference. A WeakReference is an object that has a property that will return the referenced object (if it's still available), or null if the object has been collected. Outstanding, that's half the problem: we can keep a list of strings we've been asked to manage without that list itself keeping them in memory.
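A minimal sketch of that behavior, using the framework's System.WeakReference class (the class name WeakReferenceDemo is hypothetical). Because the timing of garbage collection is up to the runtime, the sketch does not try to force a collection; it only shows the pattern of reading Target and checking for null.

```csharp
using System;

// Hypothetical demo class illustrating System.WeakReference.
public static class WeakReferenceDemo
{
    public static void Main()
    {
        string value = new string('x', 1000);

        // The weak reference tracks the string without keeping it alive.
        var weak = new WeakReference(value);

        // While a strong reference still exists, Target hands back the same object.
        string viaWeak = weak.Target as string;
        Console.WriteLine(object.ReferenceEquals(value, viaWeak)); // True

        // Once no strong references remain and a collection has run, Target
        // returns null. Always copy Target to a local and null-check the local,
        // rather than testing IsAlive and then reading Target separately.
        string maybe = weak.Target as string;
        Console.WriteLine(maybe != null ? "still available" : "collected");
    }
}
```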
The second half of the problem is that we can't just use a Dictionary with the string for a key: if we did, it'd keep a copy of the string itself so it could perform lookups, and that copy would be a strong reference that would prevent the String from ever being released. Therefore, to make this work, we need an efficient way of doing a lookup that doesn't in any way create a strong reference to the string. We did this by implementing a hash lookup to a linked list, using the GetHashCode method built into the String object. If there are multiple strings with the same hash code (which will happen if you have enough strings), then it does a linear search to find a match. This allows complete accuracy without requiring any strong references.
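The lookup described above can be sketched as follows. This is not the attached StringReference source; it is a simplified illustration under stated assumptions: the class name WeakStringCacheSketch is hypothetical, buckets are keyed by the integer hash code (so no string is strongly referenced), each bucket is a LinkedList of WeakReference objects, and a single lock provides thread safety.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of a weak, hash-bucketed string cache.
// Not the actual StringReference implementation.
public static class WeakStringCacheSketch
{
    private static readonly object s_Lock = new object();

    // Keyed by hash code (an int), so the table never holds a strong string reference.
    private static readonly Dictionary<int, LinkedList<WeakReference>> s_Buckets =
        new Dictionary<int, LinkedList<WeakReference>>();

    public static string GetReference(string candidate)
    {
        if (string.IsNullOrEmpty(candidate))
            return candidate;

        int hash = candidate.GetHashCode();
        lock (s_Lock)
        {
            LinkedList<WeakReference> bucket;
            if (!s_Buckets.TryGetValue(hash, out bucket))
            {
                bucket = new LinkedList<WeakReference>();
                s_Buckets.Add(hash, bucket);
            }

            // Linear search within the bucket handles hash collisions.
            foreach (WeakReference weak in bucket)
            {
                string existing = weak.Target as string;
                if (existing != null && string.Equals(existing, candidate, StringComparison.Ordinal))
                    return existing; // hand back the shared copy
            }

            // First time this value is seen: track it weakly and return it unchanged.
            bucket.AddLast(new WeakReference(candidate));
            return candidate;
        }
    }
}
```

In this sketch, entries whose strings have been collected stay in their buckets until something prunes them, so a real implementation needs a cleanup step as well.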
Usage
All of the necessary code to implement our single instance string store is contained in the static StringReference class. As a static class, it can be accessed easily anywhere in your code with a straightforward syntax.
There are two ways that strings can be exchanged for a central, common copy:
SwapReference: Takes the original string as a reference and exchanges it for an existing copy within the String store, if found, or returns the original if it's a new string. This is most efficient when there is a key moment in your process where you want to fix strings to their common representation, as in this example:
private string m_TypeName;
private string m_Message;
public string TypeName { get { return m_TypeName; } set { m_TypeName = value; } }
public string Message { get { return m_Message; } set { m_Message = value; } }
public void FixData()
{
    StringReference.SwapReference(ref m_TypeName);
    StringReference.SwapReference(ref m_Message);
}
GetReference: Takes a string and supplies the correct single instance string as its return value. This can create simple code in property accessors and other situations, as in the following example:
private string m_TypeName;
private string m_Message;
public string TypeName { get { return m_TypeName; }
    set { m_TypeName = StringReference.GetReference(value); } }
public string Message { get { return m_Message; }
    set { m_Message = StringReference.GetReference(value); } }
The StringReference class is fully thread-safe internally, so no external locking is necessary.
Additional Features
There are two additional features of the StringReference class that can come in handy: a Disabled property that lets the cache be switched on and off seamlessly, and a Pack method that can speed up garbage collection in very large string scenarios.
Disabling the StringReference
The main use case for the Disabled property is for testing performance and compatibility. You can incorporate the StringReference class in your code and then use this property to globally disable it without changing any other code. If you suspect that the class is causing a problem, or you just want to see what it's doing for you, then use this property to turn the class on and off. When disabled, it simply returns the original string every time, and the Pack feature is disabled.
Packing the StringReference
As the StringReference class is used, it will end up using memory of its own for the bookkeeping necessary to track the weak references. This isn't much compared to the strings themselves, but in scenarios where strings are relatively short lived and there are a very large number of unique strings, it can add up. To free up this memory, you can periodically call the Pack method, which finds all weak references pointing to objects that have been garbage collected and therefore no longer need to be tracked. In most applications, there are key moments where a lot of strings are freed up - such as when a large form is closed or a business process completes. Relatively soon after these actions, the GC will tend to release the objects, and their entries can then be removed from the StringReference class.
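A pruning pass of this kind might look like the sketch below. It is an illustration only, not the attached Pack implementation: the class name PackSketch is hypothetical, and it assumes the bookkeeping is a dictionary of hash code to linked list of WeakReference objects, with the caller holding whatever lock protects that table.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of a Pack-style cleanup pass.
public static class PackSketch
{
    // Removes entries whose target has been collected and returns how many were dropped.
    // Assumption: the caller already holds the lock guarding 'buckets'.
    public static int Pack(Dictionary<int, LinkedList<WeakReference>> buckets)
    {
        int removed = 0;
        var emptyKeys = new List<int>();

        foreach (KeyValuePair<int, LinkedList<WeakReference>> pair in buckets)
        {
            LinkedListNode<WeakReference> node = pair.Value.First;
            while (node != null)
            {
                LinkedListNode<WeakReference> next = node.Next; // capture before removal
                if (!node.Value.IsAlive)
                {
                    pair.Value.Remove(node);
                    removed++;
                }
                node = next;
            }

            if (pair.Value.Count == 0)
                emptyKeys.Add(pair.Key);
        }

        // Drop empty buckets after enumeration so the dictionary is not modified mid-loop.
        foreach (int key in emptyKeys)
            buckets.Remove(key);

        return removed;
    }
}
```

A string that is collected between the IsAlive check and the end of the pass simply stays in the table until the next call, which is harmless.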
Conclusion
For a processor impact of about five percent, you can significantly reduce the memory footprint of most applications. This can be a significant consideration for 32-bit processes, which are limited to about 1.5GB of usable data memory. And because the more strings there are, the higher the probability that the next one is already in the list, the memory savings grow as the amount of data held in memory grows.
Revision History
- 2009-07-12: Initial version.
- 2009-07-13: Updated to include the complete demonstration app used for the original testing.