Introduction
So many articles have been written about the string versus StringBuilder topic that there really isn't anything left for anyone to write about. Or is there? Is there something we have all overlooked? Let’s recap: don't use string for a large number of string operations; do use StringBuilder when speed and efficiency are needed while performing a multitude of string operations. All is good and well, except that StringBuilder has a major drawback, one that we never seem to talk about: memory usage.
Background
I was recently in a situation where I had to write an application that would perform a huge number of operations on strings in the least amount of time. As usual, I had to design, develop and implement the code as soon as humanly possible, so needless to say, not a lot of time was invested in the design aspect of the code. To cut a long story short, I wrote the application and, to my amazement (in retrospect), I had used the string class for all string operations. Big mistake. At first the problems weren't noticeable; the program ran as it should (bar a few normal haste-bugs), but I noticed that the longer it ran, the slower my strings were being built. On top of this, it seemed like I had a memory leak, or something was using an obscene amount of memory. In comes the StringBuilder class. At first it seemed to be the answer to all my problems and would solve all the obvious design flaws I had ignorantly implemented. Using the StringBuilder class immediately gave me a huge performance increase (see the statistical graphs below for some benchmarking), with strings being built in a fraction of the time it used to take. The memory problem, however, was not solved. In fact, I started getting System.OutOfMemoryException errors, with my application crashing after a certain number of hours. This new problem was due to the way in which StringBuilder operates. Let me discuss how the string and StringBuilder classes work.
Strings are immutable. There, I had to say it at least once in this article to give it that certain bit of credibility. What this means is that once a value is assigned to a string, that value can't ever be changed. Sure, a string variable can change, but only because the value it was initially given is discarded and replaced with a new one. Look at the following example:
string myString = "This is my string";
myString += ", that is now longer.";
In the first statement, myString gets set to a value. In the second statement, the string is extended with another string, which should give us the string "This is my string, that is now longer.", but what actually happens is that the original value is left behind for the garbage collector and a new string is created with the old and new values joined. So in effect two strings were created where we intended to use only one. StringBuilder is better in this regard. It keeps an internal buffer (16 characters long by default) to which it appends new strings without creating temporary strings, and it only copies its contents once the combined length of the existing content and the new string exceeds the size of its current buffer. When this happens, StringBuilder increases the size of its internal buffer by doubling it. And this is where my problem originates. My memory usage doubled with every increase of the internal buffer, which in my case, with lots of big strings, was devastating to the machine’s memory. Have a look at the following screenshot to see how StringBuilder’s internal size increases over a three-second period. You will see that, if left unchecked, StringBuilder would run amok with your computer’s resources: after 3 seconds, my internal buffer was just over 33.5 million characters long.
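Both behaviors described above can be observed directly. The following is a minimal sketch, assuming C# 9 or later for top-level statements: it checks that += yields a new string object instead of modifying the old one, and records every change to StringBuilder.Capacity while appending 100-character strings (RecordCapacityGrowth is a hypothetical helper name). The capacities you see depend on the runtime: the doubling described above is how the .NET Framework of the time behaved, and later .NET versions reworked StringBuilder's internals, so the sequence may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// 1. Strings are immutable: += allocates a new string object.
string myString = "This is my string";
string original = myString;
myString += ", that is now longer.";
Console.WriteLine(ReferenceEquals(original, myString)); // False: a new object was created
Console.WriteLine(original);                            // the old value is untouched

// 2. Record every capacity change while appending 100-character strings.
List<int> capacities = RecordCapacityGrowth(appends: 1000, pieceLength: 100);
Console.WriteLine(string.Join(", ", capacities));

// Hypothetical helper: returns the initial capacity followed by each new capacity observed.
static List<int> RecordCapacityGrowth(int appends, int pieceLength)
{
    var builder = new StringBuilder();           // default capacity is 16 characters
    string piece = new string('a', pieceLength);
    var seen = new List<int> { builder.Capacity };
    for (int i = 0; i < appends; i++)
    {
        builder.Append(piece);
        if (builder.Capacity != seen[seen.Count - 1])
            seen.Add(builder.Capacity);          // this append enlarged the buffer
    }
    return seen;
}
```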
Fair enough, I could have set the initial size of my StringBuilder’s internal buffer, but the problem is that I don't know beforehand what size the string will end up being, and I couldn't allocate that amount of memory for every string I was going to build (on average there are 400 strings being built at the same time). The solution should be obvious at this point: instead of doubling the internal buffer, why not increase it by a smaller amount? Well, easier said than done. The StringBuilder class, it turns out, is sealed, which means that it can't be inherited from. The reason for this is unknown to me, but it might be because StringBuilder is a special class that gets handled by the runtime and JIT compiler and thus has optimizations in place that might be lost through inheritance. That in itself is probably not too bad, because it is likely part of why StringBuilder is such an amazingly fast class, but it doesn't help me with my problem.
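The sealed modifier can be confirmed through reflection. The snippet below is a small sketch that relies only on the standard Type.IsSealed property; because inheritance is ruled out, any custom growth policy has to live in a separate class, which is the route the next paragraphs describe.

```csharp
using System;
using System.Text;

// StringBuilder is declared sealed, so "class MyBuilder : StringBuilder" will not compile.
bool isSealed = typeof(StringBuilder).IsSealed;
Console.WriteLine($"StringBuilder is sealed: {isSealed}"); // prints "StringBuilder is sealed: True"
```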
It was then suggested to me that I use some obscure COM component found on the internet, but that was quickly dismissed, as it was up to ten times slower than the StringBuilder class. I had no option but to write my own StringBuilder equivalent class. I did my fair share of searching for someone who had written something like this, but came up empty-handed. So I started writing my FastStringBuilder class and, funnily enough, was finished after a good four hours’ worth of development. The resulting component is fast, very fast. I should add that it isn't as fast as the StringBuilder class, but it comes closer to it than anything else I have come across. It also solved my memory problem, because I built into the component the ability to specify the size by which the internal buffer grows. I also mimicked the way in which StringBuilder works, to minimize the changes to my code and to make the component easy to use. Thus, anyone who can use the StringBuilder class can use my class.
Using the Code
I should add at this point that I haven't written anything into the code that I did not need, and that the component was tailored to my needs; being a normal .NET class, however, it can easily be extended. As it stands, it supports two methods: Append and ToString. So let’s look at some code by comparing the three major options you are faced with when doing string operations, focusing on appending to a string and getting the constructed string back: string, StringBuilder and FastStringBuilder.
String
// Each += allocates a new string holding the old and new text.
String myString;
myString = "This is the first part";
myString += ", this is the second part";
myString += " and the last part.";
Console.WriteLine(myString);
StringBuilder
using System.Text; // StringBuilder lives in the System.Text namespace
StringBuilder myString = new StringBuilder();
myString.Append("This is the first part");
myString.Append(", this is the second part");
myString.Append(" and the last part.");
Console.WriteLine(myString.ToString());
FastStringBuilder
FastStringBuilder.StringBuilder myString = new FastStringBuilder.StringBuilder();
myString.Append("This is the first part");
myString.Append(", this is the second part");
myString.Append(" and the last part.");
Console.WriteLine(myString.ToString());
You can see that, apart from the FastStringBuilder part, there is virtually no difference between the way you would use the StringBuilder class and the way you would use mine. Where the difference comes in is that you can specify the increase size during initialization, like this:
myString = new FastStringBuilder.StringBuilder(100);
This tells the FastStringBuilder class to increase the internal buffer by 100 characters whenever an append would overflow it. Obviously you would want to find the right balance between time spent increasing the buffer and memory used by the internal buffer, but with my component you at least have the ability to find that balance.
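The trade-off can be put into numbers with a small model. The sketch below is a simplified simulation, not a measurement of either class: it assumes a buffer that starts at 16 characters, grows either by doubling (taking at least the required size) or by a fixed increment, and copies its existing contents once per growth. SimulateGrowth and FixedStep are hypothetical names.

```csharp
using System;

// Model: 10,000 appends of 100 characters (the workload used in Test1 below).
var doubling = SimulateGrowth(10_000, 100, 16, (cap, need) => Math.Max(cap * 2, need));
var step1000 = SimulateGrowth(10_000, 100, 16, (cap, need) => FixedStep(cap, need, 1000));
var step5000 = SimulateGrowth(10_000, 100, 16, (cap, need) => FixedStep(cap, need, 5000));

Console.WriteLine($"doubling:   {doubling}");
Console.WriteLine($"+1000 step: {step1000}");
Console.WriteLine($"+5000 step: {step5000}");

// Grow by a fixed number of characters until the required length fits.
static long FixedStep(long capacity, long required, long step)
{
    while (capacity < required) capacity += step;
    return capacity;
}

// Returns (number of growths, final capacity, total characters copied during growths).
static (int Growths, long FinalCapacity, long CharsCopied) SimulateGrowth(
    int appends, int pieceLength, long initialCapacity, Func<long, long, long> grow)
{
    long capacity = initialCapacity, length = 0, copied = 0;
    int growths = 0;
    for (int i = 0; i < appends; i++)
    {
        long required = length + pieceLength;
        if (required > capacity)
        {
            capacity = grow(capacity, required);
            copied += length; // existing characters are copied into the new buffer
            growths++;
        }
        length = required;
    }
    return (growths, capacity, copied);
}
```

In this model, doubling needs only 15 growths and copies about 1.6 million characters, but ends with 638,400 characters of unused capacity. A 1,000-character step leaves just 16 spare characters but performs 1,000 growths that copy 499.5 million characters, while a 5,000-character step cuts that to 200 growths and 99.5 million characters.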
The Internals
Internally, my component keeps an array of characters in memory, and every time you append a string to it, it copies the characters of that string into the internal character array. When a would-be overflow is detected, the internal buffer is increased by the set number of characters and the string is added as usual. I have also added the ability to specify the initial size of the internal character array as well as the initial value of the string, just to give you a few more options. When the ToString method is called, the class simply returns the string representation of the character array. Et voilà. You have a fast StringBuilder-style class coupled with a flexible way of allocating memory, and if you don't believe me on the speed issue, have a look at the following section.
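The description above translates into a fairly small class. What follows is a minimal sketch written to illustrate the idea, not the original FastStringBuilder source: the class name FixedGrowthStringBuilder, its constructor overloads, the 16-character starting size and the default step of 1,000 characters (the default mentioned in the tests) are all assumptions.

```csharp
using System;

var sb = new FixedGrowthStringBuilder(increaseSize: 100);
sb.Append("This is the first part");
sb.Append(", this is the second part");
sb.Append(" and the last part.");
Console.WriteLine(sb.ToString());

// Hypothetical reconstruction: a char-array builder that grows by a fixed step.
public sealed class FixedGrowthStringBuilder
{
    private char[] _buffer;
    private int _length;
    private readonly int _increaseSize;

    public FixedGrowthStringBuilder(int increaseSize = 1000)
        : this(initialCapacity: 16, increaseSize: increaseSize, initialValue: "") { }

    public FixedGrowthStringBuilder(int initialCapacity, int increaseSize, string initialValue)
    {
        if (increaseSize <= 0) throw new ArgumentOutOfRangeException(nameof(increaseSize));
        if (initialCapacity < 0) throw new ArgumentOutOfRangeException(nameof(initialCapacity));
        _increaseSize = increaseSize;
        _buffer = new char[Math.Max(initialCapacity, initialValue?.Length ?? 0)];
        Append(initialValue);
    }

    public int Length => _length;
    public int Capacity => _buffer.Length;

    public FixedGrowthStringBuilder Append(string value)
    {
        if (string.IsNullOrEmpty(value)) return this;
        int required = _length + value.Length;
        if (required > _buffer.Length)
        {
            // Grow in whole steps of _increaseSize until the new text fits.
            int steps = (required - _buffer.Length + _increaseSize - 1) / _increaseSize;
            Array.Resize(ref _buffer, _buffer.Length + steps * _increaseSize);
        }
        value.CopyTo(0, _buffer, _length, value.Length);
        _length = required;
        return this;
    }

    public override string ToString() => new string(_buffer, 0, _length);
}
```

Rounding the growth up to whole multiples of the increase size means a single Append that is longer than the step still needs only one reallocation.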
Points of Interest
Test1: Append a string containing 100 characters to an initially empty string, 10000 times.
String
Using the following code:
string hundredAstring = new string('a', 100);
string myString = "";
Console.WriteLine(" Started with 'string' test ...");
DateTime startTime = DateTime.Now;
for (int i = 0; i < 10000; i++)
    myString += hundredAstring;
TimeSpan timeTaken = DateTime.Now - startTime;
Console.WriteLine(" Finished");
// TotalSeconds is the whole duration; Seconds is only the 0-59 seconds component
Console.WriteLine(" Seconds to complete : " + timeTaken.TotalSeconds);
StringBuilder
Using the following code:
string hundredAstring = new string('a', 100);
StringBuilder myString = new StringBuilder();
Console.WriteLine(" Started with 'StringBuilder' test ...");
DateTime startTime = DateTime.Now;
for (int i = 0; i < 10000; i++)
    myString.Append(hundredAstring);
TimeSpan timeTaken = DateTime.Now - startTime;
Console.WriteLine(" Finished");
Console.WriteLine(" Seconds to complete : " + timeTaken.TotalSeconds);
FastStringBuilder
Using the following code:
string hundredAstring = new string('a', 100);
FastStringBuilder.StringBuilder myString = new FastStringBuilder.StringBuilder();
Console.WriteLine(" Started with 'FastStringBuilder' test ...");
DateTime startTime = DateTime.Now;
for (int i = 0; i < 10000; i++)
    myString.Append(hundredAstring);
TimeSpan timeTaken = DateTime.Now - startTime;
Console.WriteLine(" Finished");
Console.WriteLine(" Seconds to complete : " + timeTaken.TotalSeconds);
Not too bad, but not good enough. Remember the flexibility aspect I was talking about and the tweaking that you can do? Well, I ran the same test, but this time I set the increase size to 5000 (the default increase size is 1000), and this was the result:
Using the following code:
string hundredAstring = new string('a', 100);
// The constructor argument sets the increase size to 5000 characters
FastStringBuilder.StringBuilder myString = new FastStringBuilder.StringBuilder(5000);
Console.WriteLine(" Started with 'FastStringBuilder' test ...");
DateTime startTime = DateTime.Now;
for (int i = 0; i < 10000; i++)
    myString.Append(hundredAstring);
TimeSpan timeTaken = DateTime.Now - startTime;
Console.WriteLine(" Finished");
Console.WriteLine(" Seconds to complete : " + timeTaken.TotalSeconds +
    " (milliseconds : " + timeTaken.TotalMilliseconds + ")");
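The measurements above subtract two DateTime.Now readings. A possible alternative is sketched below, assuming System.Diagnostics.Stopwatch, which is designed for measuring elapsed time and uses a high-resolution timer where Stopwatch.IsHighResolution is true. It runs the work once untimed so that JIT compilation is excluded, then times a second run; Measure is a hypothetical helper, not part of the original test code.

```csharp
using System;
using System.Diagnostics;
using System.Text;

string hundredAstring = new string('a', 100);

Measure("StringBuilder", () =>
{
    var builder = new StringBuilder();
    for (int i = 0; i < 10000; i++)
        builder.Append(hundredAstring);
});

// Hypothetical helper: runs the action once untimed (so JIT compilation is not measured),
// then times a second run with Stopwatch and prints the elapsed milliseconds.
static TimeSpan Measure(string label, Action action)
{
    action(); // warm-up run
    var stopwatch = Stopwatch.StartNew();
    action();
    stopwatch.Stop();
    Console.WriteLine($" {label}: {stopwatch.Elapsed.TotalMilliseconds:F1} ms");
    return stopwatch.Elapsed;
}
```

Swapping the lambda body for the string or FastStringBuilder loop gives comparable figures for the other two approaches.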
Findings: StringBuilder is still the fastest of the lot but, as the next sample will show, it loses the plot somewhat. FastStringBuilder comes second, being ~20 times faster than normal string operations and ~20 times slower than StringBuilder. Normal string operations are left at the back of the pack, with a poor 30-second start-to-finish time.
Test2: Append a string containing 100 characters to an initially empty string an unlimited number of times, for a maximum of 30 seconds. The code behind the results below was kept the same as above, except that the iteration condition was changed so that it would never leave the for loop.
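The modified loop is not shown in the article. A minimal sketch of a time-bounded version could look like the following; it assumes the run ends either when the time limit passes or when an OutOfMemoryException is thrown, and RunUntil is a hypothetical helper name. The append is passed in as a delegate so the same loop can drive string, StringBuilder or FastStringBuilder.

```csharp
using System;
using System.Diagnostics;

string hundredAstring = new string('a', 100);
string myString = "";

// Short demo run with plain string concatenation; the article's test used a 30-second limit.
var (appends, outOfMemory) = RunUntil(TimeSpan.FromSeconds(1), () => myString += hundredAstring);
Console.WriteLine($" Appends: {appends}, length: {myString.Length}, out of memory: {outOfMemory}");

// Repeats 'step' until 'limit' has elapsed or an allocation fails.
static (long Iterations, bool OutOfMemory) RunUntil(TimeSpan limit, Action step)
{
    var stopwatch = Stopwatch.StartNew();
    long iterations = 0;
    try
    {
        while (stopwatch.Elapsed < limit)
        {
            step();
            iterations++;
        }
        return (iterations, false);
    }
    catch (OutOfMemoryException)
    {
        return (iterations, true);
    }
}
```

For the 30-second runs, the call becomes RunUntil(TimeSpan.FromSeconds(30), ...) with the StringBuilder or FastStringBuilder append as the step.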
[Result screenshots: String, StringBuilder, FastStringBuilder]
Findings: This is where FastStringBuilder shows that it has the best features of both the string and StringBuilder worlds. Both FastStringBuilder and string operations used their fair share of CPU cycles as well as their allocated memory. They both ran for 30 seconds with nearly the same statistics, but from the previous tests we know that FastStringBuilder would have processed many more appends than the string method could. StringBuilder failed to run for 30 seconds: after just (let's round it off) 5 seconds, it threw an OutOfMemoryException. Here you can see that StringBuilder merely thunders along its processing path, not caring that with every increase it doubles its memory footprint.
Conclusion
StringBuilder is still hands down the fastest way of appending to strings, although it is the least memory-efficient option when dealing with a large number of big strings. String operators such as + and += should only be used if your operations won't occur too frequently and the strings you are operating on are no bigger than the most basic of strings. My custom component appends to strings in the same way as StringBuilder, but allows more control over the increase size, so memory usage is consequently much better than with the traditional StringBuilder. I am sure that once you start playing around with the component, you will want to make some tweaks or identify feature additions that would make it even more usable or faster, but if you want the speed minus the memory usage, there was simply no other choice out there at the time this document was written.
History
- 30th March, 2008: Initial post