
Is String Simple?

19 Mar 2004
A look behind .NET's System.String type.

Introduction

This article will attempt to help you better understand the System.String type and how it works behind the scenes.

A string is an immutable sequence of characters. Yes, it's true: when we make any change to a string, we don't get back an altered version of the string we started with, but rather a new string that includes our changes.
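To see immutability in action, here is a minimal C# sketch (plain console code; the variable names are illustrative only). It uses Object.ReferenceEquals to check that each "modification" yields a different object while the original string stays unchanged.

```csharp
using System;

string original = "hello";
string upper = original.ToUpperInvariant(); // returns a NEW string

Console.WriteLine(original);                         // hello (unchanged)
Console.WriteLine(upper);                            // HELLO
Console.WriteLine(ReferenceEquals(original, upper)); // False

// += also builds a new string and rebinds the variable to it.
string before = original;
original += " world";
Console.WriteLine(ReferenceEquals(before, original)); // False
Console.WriteLine(before);                            // hello
```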

A string is an object: string is a reference type, not a value type. Yet when we compare strings with the == or != operators, we are comparing their contents. The reason for this behavior is that String overloads == and != to call String.Equals, which performs an ordinal, character-by-character comparison. String does not overload the relational operators (<, >, <=, >=), so an expression such as a > b does not compile; ordering comparisons go through String.Compare or CompareTo instead. To compare references, cast both operands to object or call Object.ReferenceEquals.
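A small sketch of content versus reference comparison, assuming only the System namespace. Because identical string literals are interned into a single object, new string(char[]) is used here to force a second, separate instance with the same characters.

```csharp
using System;

string a = "text";
string b = new string(new[] { 't', 'e', 'x', 't' }); // same characters, new object

Console.WriteLine(a == b);                 // True: String's == compares contents
Console.WriteLine(a.Equals(b));            // True: the same ordinal comparison
Console.WriteLine(ReferenceEquals(a, b));  // False: two separate objects
Console.WriteLine((object)a == (object)b); // False: object's == compares references

// There is no string > operator; ordering goes through Compare/CompareOrdinal.
Console.WriteLine(string.CompareOrdinal(a, "zzz") < 0); // True
```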

The string type has 8 public constructors; those that take pointers are marked as not CLS-compliant, which is important for interoperability with other .NET languages. The remaining constructors turn out to be less interesting on closer analysis.
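For reference, a short example using three of the pointer-free constructors. The pointer-based ones are left out because they require unsafe code; the sample data is arbitrary.

```csharp
using System;

char[] letters = { 'S', 't', 'r', 'i', 'n', 'g' };

string repeated = new string('9', 5);        // a char repeated count times
string whole    = new string(letters);       // the entire char array
string slice    = new string(letters, 1, 3); // startIndex 1, length 3

Console.WriteLine(repeated); // 99999
Console.WriteLine(whole);    // String
Console.WriteLine(slice);    // tri
```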

Now, a little more about the String.Compare function. In the framework version examined here, all of its overloads are based on the CultureInfo class. Please don't confuse this with the AssemblyCulture attribute, which is used to distinguish main assemblies from satellite assemblies. Comparison results for the same pair of strings may differ depending on the selected culture.

For example, this is how the CultureInfo passed as an argument is used for a case-sensitive comparison:

C#
return culture.CompareInfo.Compare(strA, strB, CompareOptions.None);

and for a case-insensitive one:

C#
return culture.CompareInfo.Compare(strA, strB, 
         CompareOptions.IgnoreCase);
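As a rough illustration, the sketch below calls the public String.Compare(string, string, bool, CultureInfo) overload next to the CompareInfo.Compare calls quoted above. CultureInfo.InvariantCulture is an assumption chosen so the output does not depend on the machine's regional settings; with other cultures the exact return values may differ.

```csharp
using System;
using System.Globalization;

CultureInfo culture = CultureInfo.InvariantCulture;
string strA = "Hello";
string strB = "HELLO";

// Public overload: ignoreCase = false.
int viaString = string.Compare(strA, strB, false, culture);

// The CompareInfo calls shown above.
int caseSensitive = culture.CompareInfo.Compare(strA, strB, CompareOptions.None);
int caseIgnored   = culture.CompareInfo.Compare(strA, strB, CompareOptions.IgnoreCase);

Console.WriteLine(Math.Sign(viaString) == Math.Sign(caseSensitive)); // True
Console.WriteLine(caseSensitive != 0); // True: the strings differ in case
Console.WriteLine(caseIgnored == 0);   // True: equal when case is ignored
```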

Another interesting point is how strings are compared without regard to culture or language. For a case-insensitive comparison, the implementation uses the CaseInsensitiveComHelper function (written in C++). If the string includes characters greater than 0x80, it always returns false.

It is interesting how the strings are brought to the same case. Lowercase and uppercase ASCII letters differ only in the 0x20 bit. So, once an XOR test shows that a character is not lowercase, a bitwise OR brings it to lowercase, and only then is the comparison performed. The comparison itself simply increments pointers through the two character arrays. If any of the characters is greater than 0x7F, an ArgumentException is thrown.
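The sketch below mimics the ASCII case-folding idea under stated assumptions: CompareAsciiIgnoreCase is a hypothetical helper, not the CLR's CaseInsensitiveComHelper, and it uses a simple range check for 'A' to 'Z' instead of the XOR test described above. Restricting the OR to letters matters, because OR-ing 0x20 into a non-letter such as '[' (0x5B) would turn it into '{' (0x7B).

```csharp
using System;

// Hypothetical helper: ASCII-only, case-insensitive comparison.
static int CompareAsciiIgnoreCase(string strA, string strB)
{
    int length = Math.Min(strA.Length, strB.Length);
    for (int i = 0; i < length; i++)
    {
        int a = strA[i];
        int b = strB[i];

        // Reject any examined character outside the ASCII range.
        if (a > 0x7F || b > 0x7F)
            throw new ArgumentException("Only ASCII characters are supported.");

        // Upper- and lowercase ASCII letters differ only in bit 0x20,
        // so OR-ing that bit into an uppercase letter lowercases it.
        if (a >= 'A' && a <= 'Z') a |= 0x20;
        if (b >= 'A' && b <= 'Z') b |= 0x20;

        if (a != b)
            return a - b; // first differing (case-folded) character decides
    }
    // All compared characters match: the shorter string sorts first.
    return strA.Length - strB.Length;
}

Console.WriteLine(CompareAsciiIgnoreCase("Hello", "hELLO"));     // 0
Console.WriteLine(CompareAsciiIgnoreCase("apple", "APPLY") < 0); // True
```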

A case-sensitive comparison is a simple loop that compares the strings character by character. If the lengths of the compared strings differ, the number of iterations is bounded by the length of the shorter string.
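A hypothetical helper, CompareOrdinalSketch, showing the case-sensitive loop. When one string is a prefix of the other, this sketch lets the length difference decide the result; that tie-break is an assumption, though it agrees in sign with String.CompareOrdinal for the inputs below.

```csharp
using System;

// Hypothetical helper: case-sensitive comparison as a plain character loop.
static int CompareOrdinalSketch(string strA, string strB)
{
    // The loop runs only as far as the shorter string.
    int length = Math.Min(strA.Length, strB.Length);
    for (int i = 0; i < length; i++)
    {
        if (strA[i] != strB[i])
            return strA[i] - strB[i]; // first differing character decides
    }
    return strA.Length - strB.Length; // common prefix: shorter string first
}

Console.WriteLine(CompareOrdinalSketch("abc", "abd") < 0); // True
Console.WriteLine(CompareOrdinalSketch("abc", "ab") > 0);  // True
Console.WriteLine(CompareOrdinalSketch("abc", "abc"));     // 0
```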

In .NET, string concatenation is implemented in a more sophisticated way than in Visual Basic 6. The first step is to allocate memory for a character array whose length equals the sum of the lengths of the strings being concatenated. The result array is then filled with the contents of each string.
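A minimal sketch of this two-step approach, assuming a hypothetical ConcatSketch helper. It works on a managed char[] rather than raw memory, and null handling is omitted for brevity.

```csharp
using System;

// Hypothetical helper: allocate once, then copy each part into place.
static string ConcatSketch(params string[] parts)
{
    // Step 1: size the buffer to the total length of all parts.
    int total = 0;
    foreach (string part in parts)
        total += part.Length;

    char[] buffer = new char[total];

    // Step 2: copy each part's characters into the buffer, in order.
    int position = 0;
    foreach (string part in parts)
    {
        part.CopyTo(0, buffer, position, part.Length);
        position += part.Length;
    }
    return new string(buffer);
}

Console.WriteLine(ConcatSketch("Is ", "String ", "simple?")); // Is String simple?
```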

The last thing I'd like to discuss is the Replace function, more precisely the implementation of Replace(string, string). The first step is error handling: a null oldValue causes an ArgumentNullException, an empty oldValue causes an ArgumentException, and a null newValue is treated as an empty string. If there is nothing to replace, the function returns the original string without any further action.

The next step is to build an index of all the positions that need replacing and store it in an integer array. Then we simply walk through the source, copying characters into the result array until we reach an indexed position. There the new value is inserted, the index counter is incremented, and the iteration continues. All of this is done at a low level, with direct memory allocation.
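The validation, indexing, and copying steps can be sketched as follows. ReplaceSketch is a hypothetical helper: it uses a List&lt;int&gt; instead of a fixed integer array and managed char[] copies instead of raw memory, and it matches ordinally, scanning left to right for non-overlapping occurrences.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: index every match first, then build the result in one pass.
static string ReplaceSketch(string source, string oldValue, string newValue)
{
    if (oldValue == null) throw new ArgumentNullException(nameof(oldValue));
    if (oldValue.Length == 0) throw new ArgumentException("oldValue must not be empty.", nameof(oldValue));
    newValue ??= string.Empty; // a null replacement means "remove"

    // Step 1: record the index of every non-overlapping occurrence.
    var indexes = new List<int>();
    int found = source.IndexOf(oldValue, StringComparison.Ordinal);
    while (found >= 0)
    {
        indexes.Add(found);
        found = source.IndexOf(oldValue, found + oldValue.Length, StringComparison.Ordinal);
    }
    if (indexes.Count == 0) return source; // nothing to replace

    // Step 2: allocate the result once and copy, inserting newValue at each index.
    char[] result = new char[source.Length + indexes.Count * (newValue.Length - oldValue.Length)];
    int from = 0, to = 0;
    foreach (int index in indexes)
    {
        source.CopyTo(from, result, to, index - from);   // text before the match
        to += index - from;
        newValue.CopyTo(0, result, to, newValue.Length); // the replacement
        to += newValue.Length;
        from = index + oldValue.Length;                  // skip the old text
    }
    source.CopyTo(from, result, to, source.Length - from); // the tail
    return new string(result);
}

Console.WriteLine(ReplaceSketch("a-b-c", "-", "+")); // a+b+c
```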

A little more about Replace: when your job needs a simple and frequently repeated replace operation, consider using regular expressions instead of the Replace function. The performance differences can be tremendous, depending on how the pattern is written. Here are examples of simple code that may help you see this better:

  1. Using the Replace function of String (C#):
    C#
    using System;
    using System.Diagnostics;

    // Stopwatch measures elapsed time more precisely than DateTime.Now.
    Stopwatch timer = Stopwatch.StartNew();
    for (int i = 0; i < 100; i++)
    {
        String digit = "9";
        String before = new String('9', 65000);
        String after = before.Replace(digit, "");
    }
    timer.Stop();
    Console.WriteLine(timer.Elapsed);

    This code ran in 0.38 seconds on average.

  2. Using a naive regular expression (C#):
    C#
    using System;
    using System.Diagnostics;
    using System.Text.RegularExpressions;

    Stopwatch timer = Stopwatch.StartNew();
    for (int i = 0; i < 100; i++)
    {
        // The Regex is constructed on every iteration, as in the original measurement.
        Regex digitregex = new Regex("(?<digit>[9])");
        String before = new String('9', 65000);
        String after = digitregex.Replace(before, "");
    }
    timer.Stop();
    Console.WriteLine(timer.Elapsed);

    This code ran in 17.5 seconds on average. Conclusion: don't use a regular expression written this way in this type of case.

    Now, a small change to the pattern reduces the time to 0.38 seconds:

    C#
    Regex digitregex = new Regex("(?<digit>[9])*");

    And a final change brings it down to 0.24 seconds:

    C#
    Regex digitregex = new Regex("(?<digit>[9])+");
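One plausible explanation for the speedup, offered as a sketch rather than a profiler result: the quantified patterns let a single match consume a whole run of 9s, so Regex.Replace has far fewer matches to process. The snippet below simply counts the matches for each pattern on the same 65,000-character input.

```csharp
using System;
using System.Text.RegularExpressions;

string before = new string('9', 65000);

var naive = new Regex("(?<digit>[9])");  // one match per character
var star  = new Regex("(?<digit>[9])*"); // a whole run per match, plus empty matches
var plus  = new Regex("(?<digit>[9])+"); // one match per non-empty run of 9s

Console.WriteLine(naive.Matches(before).Count); // 65000
Console.WriteLine(plus.Matches(before).Count);  // 1
Console.WriteLine(star.Matches(before).Count);  // only a handful

// All three patterns give the same result for this input.
Console.WriteLine(naive.Replace(before, "") == "" &&
                  star.Replace(before, "") == "" &&
                  plus.Replace(before, "") == ""); // True
```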

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.



Written By
Software Developer KCS
Israel

Comments and Discussions

Comparing strings without considering of culture or language
Spaider, 21-Mar-04 22:34

Re: Comparing strings without considering of culture or language
Anonymous, 23-Mar-04 3:05

Case insensitive Replace
sytelus, 20-May-03 4:11
Quite surprisingly, the native String object has a way to compare strings case-insensitively but no way to replace strings with a case-insensitive option. You must use regular expressions for that!

Regards,
Shital.
http://www.ShitalShah.com

Re: Case insensitive Replace
Uri Gorobets, 20-May-03 4:25
