
Single Instance String Store for .NET

16 Jul 2009 · CPOL · 8 min read
By implementing a single instance string store, you can significantly reduce the memory footprint of your application.

Introduction

By using a single instance string cache, you can significantly reduce the memory footprint of your application. We discovered the value of this while doing performance and memory tuning of Gibraltar, our commercial application monitoring product. The overhead in processor time is minimal, and the memory improvement tends to increase as your application manages more data, which can significantly improve your ability to perform operations in memory. Just use one simple static class to easily swap strings for a single common value ensuring that each string is only in RAM once.

Performance in the Real World

Using one of the sample applications that we ship with Gibraltar, we created a specific test application that lets us enable and disable the string cache to validate performance both in memory savings and in processor usage. What we found was that for a processor penalty of 5% (which did not translate into any runtime performance change in our case because of the way we use multithreading), we were able to substantially reduce the memory footprint of the Gibraltar Agent, particularly in certain extreme cases where clients were stretching the capabilities of the Agent. Here's a chart that shows the observable difference in memory usage with the StringReference class enabled and disabled:

Memory Utilization Comparison showing 50% memory reduction with StringReference enabled

This was done on a system with no memory pressure; when we examined the internal details, it was clear that the difference was more stark: the number of strings in memory dropped by 90%, consuming about 6MB of memory for the test instead of around 70MB. In the above test, the agent stored over 2.8 million log messages and metrics during the interval profiled.

You can duplicate these results for yourself: Attached is the sample application we used to run these tests. It has a checkbox that can enable and disable the single instance string cache so you can watch the effect on RAM. Just compile it and crank up the log message generation rates to maximum to quickly see the difference in memory footprint. Here's what the sample application looks like as it runs:

Sample application running with two threads logging as fast as possible.

Because we wanted to be able to show exactly the tests we ran, the sample uses Reflection to reach into our agent assembly and disable the cache. It's an internal object because we don't anticipate anyone wanting to disable it in production use and we want to keep our API as clean as possible. You can use Reflector if you want to see that it is exactly the same source code as the StringReference class we've attached.

Conceptual Background

Virtually every piece of data your application works with ends up as a string - to be serialized to a display, log, or file. This is so common that ToString is an intrinsic feature of every object. As your application works with more data, you'll discover that the most common objects, and the ones tying up the most memory, are strings. Because your application is working within a common problem domain, you'll tend to have substantial repetition of values. Each repeated copy of a value uses up the same amount of memory all over again. Additionally, having string objects with different lifetimes scattered throughout the heap causes the Garbage Collector to relocate objects more often. While it's very difficult to prevent duplicate strings from being created, if each one can be immediately exchanged for a single common reference copy, the duplicate can be garbage collected quickly and without fragmenting memory.

Fortunately, .NET Strings are immutable. This means that once they're created, they can never be changed: any attempt to change them results in a new String with the changes applied. This is one of the reasons that you can create real performance problems in your application by doing innocuous things like composing a string through a series of appends. While this immutability can cause performance problems in environments where you want to do a lot of string manipulation, it creates a golden opportunity for memory optimization: since a String can't ever be changed, any two String objects that have the same value are interchangeable.
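To make the "interchangeable" point concrete, here is a minimal sketch using only standard framework calls. The class name StringIdentityDemo is just for illustration. It builds the same value twice at run time and shows that the two objects are equal by value but distinct in memory, which is exactly the duplication a single instance store is meant to remove.

```csharp
using System;

public static class StringIdentityDemo
{
    public static void Main()
    {
        // Build the same value twice at run time so the compiler cannot merge the two.
        string a = new string(new[] { 'o', 'r', 'd', 'e', 'r' });
        string b = new string(new[] { 'o', 'r', 'd', 'e', 'r' });

        Console.WriteLine(a == b);                        // True: same value
        Console.WriteLine(object.ReferenceEquals(a, b));  // False: two objects in memory

        // "Changing" a string produces a new object; the original is untouched.
        string upper = a.ToUpperInvariant();
        Console.WriteLine(a);                                 // order
        Console.WriteLine(object.ReferenceEquals(a, upper));  // False

        // Because neither copy can ever change, one can safely stand in for the other.
        b = a;
        Console.WriteLine(object.ReferenceEquals(a, b));  // True: one shared instance
    }
}
```

After the final assignment, the second copy has no remaining references and becomes eligible for garbage collection.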

Sounds Great, Doesn't .NET do this Already?

Indeed, .NET does have a capability called string interning. With this, it's easy to create a string and then intern it, swapping it for an existing copy (if there is one) or putting it in the interned string store for future reference. There's one big problem: an interned string lives at least as long as the AppDomain (the runtime's intern pool may hold on to it even longer). That means any string that you store will not be removed from memory while your application is running. This is generally fine for compile-time constants (which are interned automatically), but for most applications, this would have the opposite effect from the one we're looking for - no string would ever be released, and our memory consumption would continually increase. What we want is to keep each string in memory only as long as it is in use by an active object.
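For comparison, this is roughly what the built-in approach looks like. It is a small sketch using the framework's String.Intern and String.IsInterned methods; the class name and sample value are made up for illustration.

```csharp
using System;

public static class InternDemo
{
    public static void Main()
    {
        // Two run-time strings with the same value start out as separate objects.
        string first = new string(new[] { 'g', 'i', 'b', '-', '4', '2' });
        string second = new string(new[] { 'g', 'i', 'b', '-', '4', '2' });
        Console.WriteLine(object.ReferenceEquals(first, second));              // False

        // Intern returns the pooled instance, adding the string if it is not pooled yet.
        string pooledFirst = string.Intern(first);
        string pooledSecond = string.Intern(second);
        Console.WriteLine(object.ReferenceEquals(pooledFirst, pooledSecond));  // True

        // IsInterned only looks up: it returns the pooled instance, or null if absent.
        Console.WriteLine(string.IsInterned(second) != null);                  // True
    }
}
```

The swap works, but the pool holds a strong reference to every string it has ever been given, so nothing added this way is released while the application keeps running.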

How it Works

What we want is a way to have a dictionary of strings that are currently in memory so we can get the single reference copy of any string already there. But, we need the string to be garbage collected if no one has a reference to it. That means, the dictionary of strings itself can't have a reference to the string, but it needs to be able to return a reference when requested. So, we need something that isn't a full .NET reference - something closer to an old fashioned pointer where we can walk it, but the object may not be available anymore because it has been garbage collected.

Enter the WeakReference. A WeakReference is an object that has a property that will return the referenced object (if it's still available), or null if the object has been collected. Outstanding, that's half the problem: we can keep a list of strings we've been asked to manage without that list itself keeping them in memory.
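The behavior described above can be sketched with the framework's non-generic WeakReference class. The TryGet helper and the class name are hypothetical, added only for illustration. The sketch does not try to force a collection, because when the GC actually reclaims an object is up to the runtime.

```csharp
using System;

public static class WeakReferenceDemo
{
    // Copies Target into a local before testing it, so the object cannot be
    // collected between the null check and the use.
    public static bool TryGet(WeakReference weak, out string value)
    {
        value = weak.Target as string;
        return value != null;
    }

    public static void Main()
    {
        string original = new string('x', 1000);
        WeakReference weak = new WeakReference(original);

        // While a strong reference exists, Target hands back the very same object.
        string recovered;
        bool found = TryGet(weak, out recovered);
        Console.WriteLine(found);
        Console.WriteLine(object.ReferenceEquals(original, recovered));

        // Once every strong reference is gone and a collection has run,
        // Target returns null and TryGet reports false.
        GC.KeepAlive(original);
    }
}
```

The weak reference by itself never keeps the string alive; only the strong references held by the application's own objects do.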

The second half of the problem is that we can't just use a Dictionary with the string for a key: if we did, the dictionary would hold on to the key string itself so it could perform lookups, and that would be a strong reference that would prevent the String from ever being released. Therefore, to make this work, we need an efficient way of doing a lookup that doesn't in any way create a strong reference to the string. We did this by implementing a hash lookup to a linked list, using the String object's built-in GetHashCode method. If there are multiple strings with the same hash code (which will happen if you have enough strings), then it does a linear search of that list to find a match. This allows complete accuracy without requiring any strong references.
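The lookup described above can be sketched as follows. This is a simplified illustration written for this article's explanation, not the StringReference source that ships with the download. The class name WeakStringStoreSketch, the use of Dictionary and LinkedList for the buckets, and the single lock are all assumptions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch only; the attached StringReference class may differ in every detail.
public static class WeakStringStoreSketch
{
    private static readonly object s_Lock = new object();

    // Keyed by hash code (an int), so the table never holds a strong reference to a string.
    private static readonly Dictionary<int, LinkedList<WeakReference>> s_Buckets =
        new Dictionary<int, LinkedList<WeakReference>>();

    public static string GetReference(string candidate)
    {
        if (candidate == null) return null;

        int hash = candidate.GetHashCode();
        lock (s_Lock)
        {
            LinkedList<WeakReference> bucket;
            if (!s_Buckets.TryGetValue(hash, out bucket))
            {
                bucket = new LinkedList<WeakReference>();
                s_Buckets.Add(hash, bucket);
            }

            // Linear search: different strings can share a hash code.
            foreach (WeakReference entry in bucket)
            {
                string existing = entry.Target as string; // null once collected
                if (existing != null && string.Equals(existing, candidate, StringComparison.Ordinal))
                    return existing;
            }

            // First sighting: track it weakly and hand the caller's own instance back.
            bucket.AddLast(new WeakReference(candidate));
            return candidate;
        }
    }

    public static void SwapReference(ref string value)
    {
        value = GetReference(value);
    }

    public static void Main()
    {
        string a = new string(new[] { 'l', 'o', 'g' });
        string b = new string(new[] { 'l', 'o', 'g' });

        string first = GetReference(a);   // not seen before: a itself comes back
        string second = GetReference(b);  // same value: the tracked copy (a) comes back

        Console.WriteLine(object.ReferenceEquals(first, second)); // True
        GC.KeepAlive(a);
    }
}
```

In this sketch dead entries simply stay in their bucket until something removes them, which is the bookkeeping cost the Pack feature described later is meant to address.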

Usage

All of the necessary code to implement our single instance string store is contained in the static StringReference class. As a static class, it can be accessed easily anywhere in your code with a straightforward syntax.

There are two ways that strings can be exchanged for a central, common copy:

SwapReference: Takes the original string as a reference and exchanges it for an existing copy within the String store, if found, or returns the original if it's a new string. This is most efficient when there is a key moment in your process where you want to fix strings to their common representation, as in this example:

C#
private string m_TypeName;
private string m_Message;

public string TypeName { get { return m_TypeName; } set { m_TypeName = value; } }
public string Message { get { return m_Message; } set { m_Message = value; } }

public void FixData()
{
    // Swap all strings for a common string reference
    StringReference.SwapReference(ref m_TypeName);
    StringReference.SwapReference(ref m_Message);
}

GetReference: Takes a string and supplies the correct single instance string as its return value. This can create simple code in property accessors and other situations, as in the following example:

C#
private string m_TypeName;
private string m_Message;

public string TypeName { get { return m_TypeName; } 
       set { m_TypeName = StringReference.GetReference(value); } }
public string Message { get { return m_Message; } 
       set { m_Message = StringReference.GetReference(value); } }

The StringReference class is fully thread safe internally, so no external locking is necessary.

Additional Features

There are two additional features of the StringReference class that can come in handy: a Disabled property that lets the cache be switched on and off seamlessly, and a Pack method that frees the store's own bookkeeping memory in scenarios with a very large number of strings.

Disabling the StringReference

The main use case for the Disabled property is for testing performance and compatibility. You can incorporate the StringReference class in your code and then use this property to globally disable it without changing any other code. If you suspect that the class is causing a problem, or you just want to see what it's doing for you, then use this property to turn the class on and off. When disabled, it simply returns the original string every time, and the Pack feature is disabled.

Packing the StringReference

As the StringReference class is used, it will end up using memory of its own for the bookkeeping necessary to track the weak references. This isn't much compared to the strings themselves, but in scenarios where strings are relatively short lived and there are a very large number of unique strings, it can add up. To free up this memory, you can periodically call the Pack method, which finds all weak references pointing to objects that have been garbage collected and stops tracking them. In most applications, there are key moments where a lot of strings are freed up - such as when a large form is closed or a business process completes. Shortly after these moments, the GC will tend to collect the strings, and a call to Pack can then release their entries from the StringReference class.
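As an illustration of what such a packing pass involves, here is a minimal sketch. It assumes the bookkeeping is a dictionary of linked lists of WeakReference objects keyed by hash code; that layout, the PackSketch class, and the Pack signature shown here are assumptions for the example rather than the attached implementation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of a packing pass over hash buckets of weak references.
public static class PackSketch
{
    // Drops entries whose targets have been collected and removes empty buckets.
    // Returns the number of entries dropped. A real store would call this under its lock.
    public static int Pack(Dictionary<int, LinkedList<WeakReference>> buckets)
    {
        int removed = 0;
        List<int> emptyKeys = new List<int>();

        foreach (KeyValuePair<int, LinkedList<WeakReference>> pair in buckets)
        {
            LinkedListNode<WeakReference> node = pair.Value.First;
            while (node != null)
            {
                LinkedListNode<WeakReference> next = node.Next; // capture before removal
                if (!node.Value.IsAlive)
                {
                    pair.Value.Remove(node);
                    removed++;
                }
                node = next;
            }
            if (pair.Value.Count == 0) emptyKeys.Add(pair.Key);
        }

        // Remove empty buckets after enumeration; the dictionary must not change during it.
        foreach (int key in emptyKeys) buckets.Remove(key);
        return removed;
    }

    public static void Main()
    {
        var buckets = new Dictionary<int, LinkedList<WeakReference>>();

        // A literal stays alive for the life of the program; a null target stands in
        // for a string that has already been collected.
        buckets[1] = new LinkedList<WeakReference>();
        buckets[1].AddLast(new WeakReference("alive"));
        buckets[1].AddLast(new WeakReference(null));
        buckets[2] = new LinkedList<WeakReference>();
        buckets[2].AddLast(new WeakReference(null));

        Console.WriteLine(Pack(buckets));  // 2
        Console.WriteLine(buckets.Count);  // 1
    }
}
```

A pass like this touches every tracked entry, so it makes sense to run it at the quiet moments described above rather than on every lookup.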

Conclusion

For a processor impact of less than five percent, you can significantly reduce the memory footprint of most applications. This can be a significant consideration with 32-bit processes, which are limited to about 1.5GB of usable data memory. The benefit also grows with scale: the more strings there are, the higher the probability that the next one is already in the store, so the amount of memory saved increases with the amount of memory used.

Revision History

  • 2009-07-12: Initial version.
  • 2009-07-13: Updated to include the complete demonstration app used for the original testing.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Founder Gibraltar Software
United States
Kendall Miller has been designing, creating, and deploying information systems (hardware, software, and networks) since 1993.  Currently, Kendall is one of the founders of Gibraltar Software, creating developer tools for .NET developers. 

Prior to working at Gibraltar Software, Kendall helped get two Software as a Service startups off the ground, creating complete IT infrastructure from the ground up. He got his career start at John Deere working on Microsoft software and strategies for the worldwide John Deere dealership network.

You can follow the Gibraltar development team at RockSolid.GibraltarSoftware.com.

Kendall lives near Baltimore, MD. 

Comments and Discussions

 
Answer: Re: Great! A question too.
Kendall Miller, 22-Sep-11 7:14

Constants (any of the things you put in your code in quotes) will be interned already. But, the minute you append or manipulate them *at all* you get a copy. So for example, every time someone calls DebugLog it’s definitely creating a new string. Your use of StringBuilder means it’s just making one.

Now, the single instance string store would save you memory only if those strings (strMsg) were staying around in RAM. If you’re writing them to a file immediately and then dumping them, there are no savings. On the other hand, if they’re buffering in RAM and eventually writing out, or you keep a circular buffer or something similar, then it would.