Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / C#

Faster String Building

4.42/5 (38 votes)
30 Mar 2008CPOL9 min read 5  
An alternative to native string and StringBuilder options

Introduction

So many articles have been written about the string versus StringBuilder topic that there really isn't anything left for anyone to write about. Or is there? Is there something we all overlooked? Let’s recap: don't use string for a lot of string operations on a string, do use StringBuilder when speed and efficiency is needed while doing a multitude of string operations. All is good and well; expect that StringBuilder has a major drawback. One that we never seem to talk about, memory usage.

Background

I was in the situation recently where I had to write an application that would perform a huge number of operations on strings in the least amount of time. As usual, I had to design, develop and implement the code as soon as it is humanly possible to do, so needless to say, not a lot of time was invested into the design aspect of the code. To cut a long story short, I wrote the application and to my amazement (in retrospect) I had used the string class for all string operations. Big mistake. At first the problems weren't noticeable, the program ran as it should (bar a few normal haste-bugs) but I noticed that the longer it ran, the slower my strings were being built. On top of this, it seemed like I had a memory leak or something was using an obscene amount of memory. In comes the StringBuilder class. At first it seemed to be the answer to all my problems and would solve all the obvious design flaws I had ignorantly implemented. Using the StringBuilder class immediately gave me a huge performance increase (see the statistical graphs below for some benchmarking) with strings being built in a fraction of time it used to take. The memory problem however was not solved. In fact I started getting System out of memory errors with my application crashing after a certain number of hours. This new problem was due to the way in which StringBuilder operates. Let me discuss how the string and StringBuilder classes work.

Strings are immutable. There, I had to say it at least once in this article to give it that certain bit of credibility. What this means is that when a value is assigned to a string, that value can't ever be changed. Sure the string can change, but only because the initial value it was given is disregarded and substituted with the new one. Look at the following example:

C++
string myString = "This is my string";
myString += ", that is now longer.";

In the first statement, myString gets set to a value. In the second statement, the string is extended with another string which should give us a string: This is my string, that is now longer., but what actually happens is that the original value is destroyed and a new string is created with the old and new values joined. So in effect two strings were created where we intended on using only one. StringBuilder is better in this regard. It has an internal string of size x (initially and by default 16 characters long) which it appends any new strings without creating a temporary string and it only copies its string once the size of the internal and new string exceeds the size of its current buffer. When this happens, the StringBuilder will increase its internal string’s size by doubling the current size. And this is where my problem originates from. My memory usage shot up by a value of 2 exponent with every increase of the internal buffer, which in my case with lots of big strings was devastating to the machine’s memory. Have a look at the following screenshot to see how StringBuilder’s internal size increases over a three second period. You will see that if left unchecked, StringBuilder would run amok with your computer’s resources, because after 3 seconds my internal string size was just over 33.5 million characters long.

The size increase over time of the StringBuilder object

Fair enough, I could have set the initial size of my StringBuilder’s internal string but the problem is that I don't know beforehand what the size of the string would end up being and I couldn't allocate that amount of memory for every string I was going to build (on average there are 400 strings being built at the same time). The solution should be obvious at this point: instead of increasing the internal string size by double, why not increase it with a smaller amount? Well, easier said than done. The String<code>Builder class is a sealed class it would turn out, which means that it can't be inherited from. The reason for this is unknown to me but might be due to the fact the StringBuilder class is a special class that gets handled by the runtime and JIT compiler and thus has optimization etc. in place that might be lost during inheritance. Which in itself is probably not too bad, because that is the reason why the StringBuilder is such an amazingly fast class, but that doesn't help me with my problem.

It was then suggested to me that I use some obscure COM component found on the internet, but that was quickly dismissed as it was up to ten times slower than the StringBuilder class. I had no other option, but to write my own StringBuilder equivalent class. I did my fair share of searching for someone who has written something like this, but came up empty handed. So I started writing my FastStringBuilder class and funnily enough was finished after a good four hour’s worth of development. The subsequent class component I wrote is fast, very fast. I should add that it isn't as fast as the StringBuilder class, but comes closer to it than anything I have come across. It also solved my memory problem, because I wrote the capability to specify the increase size of the internal string into my component. I also mimicked the way in which the StringBuilder worked to minimize the changes to my code and to ensure ease of use with the component. Thus, people who can use the StringBuilder class can use my class.

Using the Code

I should add at this point that I haven't written anything else into the code that I did not need and that the component was tailored for my needs, but being a normal .NET class could be extended upon very easily. As it stands, I have two supported methods: Append and ToString. So let’s look at some code by comparing the three major options you are faced with when doing string operations with a focus on appending to a string and getting the constructed string back. Let’s look at string, StringBuilder and FastStringBuilder.

String

C#
String myString;
myString = "This is the first part";
myString += ", this is the second part"
myString += "and the last part";
Console.WriteLine(myString);

StringBuilder

C#
StringBuilder myString = new StringBuilder();
myString.Append("This is the first part");
myString.Append(",this is the second part");
myString.Append("and the last part.");
Console.WriteLine(myString.ToString());

FastStringBuilder

C#
FastStringBuilder.StringBuilder myString = new FastStringBuilder.StringBuilder();
myString.Append("This is the first part");
myString.Append(",this is the second part");
myString.Append("and the last part");
Console.WriteLine(myString.ToString());

You can see that, except for the FastStringBuilder part, there is virtually no difference between the way you would use the StringBuilder class to the way you would use mine. Where the difference comes in is the fact that you can specify the increase size during the initialization phase as such:

C#
myString = new FastStringBuilder.StringBuilder(100);

This tells the FastStringBuilder class to increase the internal string size (when an overflow will occur) with 100 characters. Obviously you would want to find the right balance between time spent increasing the buffer versus memory used by the internal string, but with my component, you at least have the ability to find a balance.

The Internals

What happens internally in my component is that an array of characters will be kept in memory and every time you append a string to it, it will add the characters in the string into the internal character array. When a would-be overflow is detected, the internal buffer is increased by the set number of characters and the string is added as usual. I have also written in the ability for you to specify the initial size of the internal character array as well as the initial value of the string. Just to provide you with a bit more options. When the ToString method is called the class will simply return the string representation of the character array. Et voila. You have a StringBuilder class that has the speed of its counterpart coupled with a flexible way of allocating memory, and if you don't believe me on the speed issue, have a look at the following section.

Points of Interest

Test1: Append a string containing 100 characters to an initially empty string, 10000 times.

String

Using the following code:

C#
string hundredAstring = new string('a', 100);
string myString = "";
Console.WriteLine(" Started with 'string' test ...");     
  
DateTime startTime = DateTime.Now;
for (int i = 0; i < 10000; i++)
    myString += hundredAstring;
TimeSpan timeTaken = new TimeSpan(DateTime.Now.Ticks - startTime.Ticks);

Console.WriteLine(" Finished");
Console.WriteLine(" Seconds to complete : " + timeTaken.Seconds); 

The size increase over time of the StringBuilder object

StringBuilder

Using the following code:

C#
string hundredAstring = new string('a', 100);
StringBuilder myString = new StringBuilder();
Console.WriteLine(" Started with 'StringBuilder' test ...");

DateTime startTime = DateTime.Now;
for (int i = 0; i < 10000; i++)
    myString.Append(hundredAstring);
TimeSpan timeTaken = new TimeSpan(DateTime.Now.Ticks - startTime.Ticks);

Console.WriteLine(" Finished");
Console.WriteLine(" Seconds to complete : " + timeTaken.Seconds);

The size increase over time of the StringBuilder object

FastStringBuilder

Using the following code:

C#
string hundredAstring = new string('a', 100);
FastStringBuilder.StringBuilder myString = new FastStringBuilder.StringBuilder();
Console.WriteLine(" Started with 'FastStringBuilder' test ...");

DateTime startTime = DateTime.Now;
for (int i = 0; i < 10000; i++)
    myString.Append(hundredAstring);
TimeSpan timeTaken = new TimeSpan(DateTime.Now.Ticks - startTime.Ticks);

Console.WriteLine(" Finished");
Console.WriteLine(" Seconds to complete : " + timeTaken.Seconds);

The size increase over time of the StringBuilder object

Not too bad, but not good enough. Remember the flexibility aspect I was talking about and the tweaking that you can possibly do? Well, I ran the same test but this time I set the increase size to 5000 because the default increase size is a 1000 and this was the result:

Using the following code:

C#
string hundredAstring = new string('a', 100);
FastStringBuilder.StringBuilder myString = new 
FastStringBuilder.StringBuilder(5000);
Console.WriteLine(" Started with 'FastStringBuilder' test ...");

DateTime startTime = DateTime.Now;
for (int i = 0; i < 10000; i++)
    myString.Append(hundredAstring);
TimeSpan timeTaken = new TimeSpan(DateTime.Now.Ticks - startTime.Ticks);

Console.WriteLine(" Finished");
Console.WriteLine(" Seconds to complete : " + timeTaken.Seconds + " 
			(milliseconds : " + timeTaken.Milliseconds + ")"); 

The size increase over time of the StringBuilder object

Findings: StringBuilder is still the fastest of the lot, but as it will be shown in the next sample loses the plot somewhat. FastStringBuilder is second by being ~20 times faster than normal string operations and ~20 times slower than StringBuilder. Normal string operations are left at the back of the pack with a bad 30 seconds start to finish time.

Test2: Append a string containing 100 characters to an initially empty string an infinite amount of times for a maximum of 30 seconds (In all, the results below the code were kept the same as above except the iteration condition was changed so it would never leave the for loop).

String

The size increase over time of the StringBuilder object

StringBuilder

The size increase over time of the StringBuilder object

FastStringBuilder

The size increase over time of the StringBuilder object

Findings: This is where FastStringBuilder showcases the fact that it has the best features from both the string and StringBuilder worlds. Both FastStringBuilder and string operations used its fair share in CPU cycles as well as its allocated memory. They both ran for 30 seconds at nearly the same stats, but from the previous tests we know that FastStringBuilder would have processed many more appends than the string method could. StringBuilder failed to run for 30 seconds, and after just (let's round it off) 5 seconds it threw an “Out of memory” exception. Here you can see that StringBuilder merely thunders along its processing path, not caring about the fact that with every increase, it doubles the memory footprint

Conclusion

StringBuilder is still hands down the fastest way of appending to strings, although it is the least memory efficient when dealing with a large number of big size strings. String operators such as the + and += should only be used if your operations won't occur too frequently and the strings you are operating on aren't bigger than the most basic of strings. My custom created component appends to strings the same way in which StringBuilder does but allows for more control over the increase size so subsequently memory usage is much better than the traditional StringBuilder. I am sure once you start playing around with the component, you will want to make some tweaks or identify possible feature additions which would make the component even more usable or faster, but if you want the speed minus the memory usage, there is simply no other choice out, there at the time this document was written.

History

  • 30th March, 2008: Initial post

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)