The Oft Forgotten Middle Trim

4 May 2016, BSD license, 4 min read

Two Most Popular Ways to Trim

Trimming whitespace from data has become ubiquitous. Data should almost never have whitespace at the front or at the end, and this is accepted nearly everywhere in the industry.

  • Front Trim (also called left trim) = Remove leading whitespace (space, tab, new line, carriage return) from the front of data.
  • Back Trim (also called right trim) = Remove trailing whitespace (space, tab, new line, carriage return) from the back of data.

What does this mean? Look at the following data example:

"  White space at front"      <-- space
"	White space at front" <-- tab
"
White space at front"         <-- new line or carriage return
"White space at back   "      <-- space
"White space at back	"     <-- tab
"White space at back
"                             <-- new line or carriage return

When extra whitespace is added to the front or back of data, it should almost always be trimmed.
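In C#, front and back trim map onto String.TrimStart() and String.TrimEnd(), and String.Trim() does both at once. Here is a minimal sketch; the class name TrimExamples and its method names are made up for illustration.

```csharp
public static class TrimExamples
{
    // Front trim: removes leading whitespace (space, tab, new line, carriage return).
    public static string FrontTrim(string value) => value.TrimStart();

    // Back trim: removes trailing whitespace.
    public static string BackTrim(string value) => value.TrimEnd();

    // Front and back trim in a single call.
    public static string FrontAndBackTrim(string value) => value.Trim();
}
```

For example, TrimExamples.FrontTrim("\tWhite space at front") returns "White space at front".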

The Third Way to Trim – Middle Trim

There is a third type of trimming that should be done for many fields. It is not as popular and many developers forget about it. It is the last item in the list below.

  • Front Trim (also called left trim) = Remove whitespace (space, tab, new line, carriage return) from the front of data.
  • Back Trim (also called right trim) = Remove whitespace (space, tab, new line, carriage return) from the back of data.
  • Middle Trim (also called center trim) = Remove extra whitespace (space, tab, new line, carriage return) from between words of data.

Note: Extra whitespace could mean different things depending on the field. In this post, it means more than one space. However, if we were dealing with names of objects in code that should not have any middle spaces at all, then even one middle space could be considered an extra space.
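The two meanings of "extra" in the note can be sketched as two small C# helpers. This is only an illustration; the class MiddleTrimPolicies and both method names are hypothetical, and which policy applies is a per-field decision.

```csharp
using System.Text.RegularExpressions;

public static class MiddleTrimPolicies
{
    // Policy 1: "extra" means more than one.
    // Each run of whitespace between words collapses to a single space.
    public static string CollapseMiddle(string value) =>
        Regex.Replace(value.Trim(), @"\s+", " ");

    // Policy 2: "extra" means any at all (e.g. names of objects in code).
    // Every whitespace character is removed.
    public static string RemoveMiddle(string value) =>
        Regex.Replace(value, @"\s+", "");
}
```

With these, CollapseMiddle("Awesome     Company LLC") gives "Awesome Company LLC", while RemoveMiddle("my Object\tName") gives "myObjectName".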

Perhaps “Middle Trim” is not something you have heard of before. Front and back trim involve only removing characters if they exist. Middle Trim involves either removing or replacing characters if they exist. Because of this, some might argue that Middle Trim is an incorrect phrase. From a certain point of view, I would agree. However, to properly link the task to front trim and back trim, the phrase Middle Trim makes a lot of sense.

"Extra     white space in middle"      <-- space
"Extra 	white space in middle"          <-- tab
"Extra
white space in middle"         <-- new line or carriage return

This one actually takes some thought, because it doesn’t apply to every field as often as front trim and back trim do. However, for many fields, middle trim is just as valid.

  • Address Lines (When there is one field per line)
  • City
  • Country
  • Name (Pretty much any type of name)
    • Account
    • Business
    • Contact
    • Company
    • Course
    • Customer
    • First
    • Last
    • Middle
    • Part
    • Partner
    • Product
    • School
    • Spouse
    • Street
    • User
  • Order Identifiers
  • State
  • etc.

Names should not have extra whitespace at the front, end, or middle. State or Country names should never have extra whitespace at the front, middle, or end. Many types of input should be cleaned of extra whitespace at the front, middle, and end.

"Awesome     Company LLC"  <-- space
"Washington	D.C."      <-- tab
"United States of
America"                   <-- new line or carriage return

All of the above are wrong. I could quote First Normal Form to you, but really common sense should be enough. These spaces make the data wrong.

Now, each field may be different. You may not want middle trim if your field is a blob of text that has paragraphs. In that case, you certainly want to leave carriage returns.
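For such a paragraph field, one possible compromise is to clean each line separately and keep the line breaks. The sketch below is an assumption about how that could look, not something from a library; ParagraphTrim and TrimLines are made-up names, and the choice to normalize every line ending to "\n" is mine.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ParagraphTrim
{
    public static string TrimLines(string text)
    {
        // Split on line breaks first so they are never treated as "extra" whitespace.
        // "\r\n" is listed first so a Windows line ending counts as one break.
        string[] lines = text.Split(new[] { "\r\n", "\n", "\r" }, StringSplitOptions.None);

        // Within each line: front and back trim, then collapse runs of spaces and tabs.
        var cleaned = lines.Select(line => Regex.Replace(line.Trim(), @"[ \t]+", " "));

        // Re-join with "\n" (this normalizes the line endings).
        return string.Join("\n", cleaned);
    }
}
```

Blank lines between paragraphs survive as empty lines, so the paragraph structure is preserved.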

Implementing Middle Trim in C#

Middle trim isn’t exactly easy to implement. Some languages have features, such as Regex, which make it easy. Others do not.

Why isn’t Middle Trim extremely common and more easily implemented? Perhaps middle trim is forgotten because there isn’t a clear method for it like there is with String.Trim() and so it is often left out?

Many languages, like C#, make front and back trimming easy. In C#, you can simply call String.Trim() and it will trim whitespace from the front and back. However, it doesn’t clean up extra whitespace in the middle.

Doing all three trims in C# is most easily done with Regex and an extension method.

using System.Text.RegularExpressions;

public static class StringExtensions
{
    public static string TrimAll(this string value)
    {
        // Remove whitespace from the front and the back.
        var newstring = value.Trim();
        // Replace each run of whitespace in the middle with a single space.
        return Regex.Replace(newstring, @"\s+", " ");
    }
}

If you want to avoid regex, you could roll your own like this:

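using System.Text;

public static class StringExtensions
{
    public static string TrimAll(this string value)
    {
        var trimmedValue = new StringBuilder();
        char previousChar = (char)0;
        foreach (char c in value)
        {
            // Skip whitespace, but remember that it was seen.
            if (char.IsWhiteSpace(c))
            {
                previousChar = c;
                continue;
            }
            // Emit one space for a whitespace run, unless the run was at the front.
            if (char.IsWhiteSpace(previousChar) && trimmedValue.Length > 0)
            {
                trimmedValue.Append(' ');
            }
            trimmedValue.Append(c);
            previousChar = c;
        }
        // Trailing whitespace is never appended, so the back is trimmed too.
        return trimmedValue.ToString();
    }
}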

You would use either method the same way.

var newstring = " This string     has extra whitespace in the      front, middle and the end.   ";
newstring = newstring.TrimAll();
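A third option, not from the original code above, is to split on whitespace and join the words back together. The sketch below relies on String.Split treating a null separator array as "split on whitespace"; the names SplitJoinTrim and TrimAllBySplit are hypothetical.

```csharp
using System;

public static class SplitJoinTrim
{
    public static string TrimAllBySplit(string value)
    {
        // A null separator array makes Split break on whitespace characters.
        // RemoveEmptyEntries drops the empty pieces produced by runs of whitespace
        // and by whitespace at the front or back.
        string[] words = value.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

        // Joining with one space performs the middle trim.
        return string.Join(" ", words);
    }
}
```

Called on the sample string above, it should produce "This string has extra whitespace in the front, middle and the end." It allocates an array of words, so for very large inputs the StringBuilder version may be the better fit.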

Implementing Middle Trim in MSSQL

MSSQL has LTRIM (left trim) and RTRIM (right trim), but no middle trim. Middle Trim is even harder to write in MSSQL because there is no Regex. So you have to replace whitespace characters with spaces, then remove multiple spaces.

Here is what it looks like to add a name to a person and to do all three trims: front, back, middle. Wow! It is ugly.

SQL
INSERT INTO PERSON (NAME) VALUES (
	LTRIM(RTRIM(
		REPLACE(
			REPLACE(
				REPLACE(
					REPLACE(
						REPLACE(
							REPLACE(@str, CHAR(9), ' ')   -- tab -> space
							, CHAR(10), ' ')              -- new line -> space
						, CHAR(13), ' ')                  -- carriage return -> space
					, '  ', ' ' + CHAR(7))                -- two spaces -> space + bell
				, CHAR(7) + ' ', '')                      -- bell + space -> empty string
			, CHAR(7), '')                                -- leftover bell -> empty string
	))
)

This first replaces tabs, new lines, and carriage returns with spaces. Then it uses the bell character (because bell is basically never used) to replace any double spaces, char(32)+char(32), with space bell, char(32)+char(7). Then it replaces any instance of char(7)+char(32) with '', an empty string. That might leave a few space bell sequences, so we need one more replace of bell, char(7), with '', an empty string. Finally, it does the right trim and left trim. The trims go last because LTRIM and RTRIM remove only spaces, so a leading tab or new line must be turned into a space before it can be trimmed.
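To see why three replaces are enough to collapse a run of any length, the same bell-character chain can be traced in C#. This is only a sketch of the technique for experimenting outside the database; BellCollapse and CollapseSpaces are made-up names, and it assumes the input contains no bell characters of its own.

```csharp
public static class BellCollapse
{
    public static string CollapseSpaces(string value)
    {
        const string bell = "\a"; // char(7)

        return value
            // Every pair of spaces becomes space + bell: "a    b" -> "a \a \ab".
            .Replace("  ", " " + bell)
            // A bell followed by a space is dropped: "a \a \ab" -> "a \ab".
            .Replace(bell + " ", "")
            // Any remaining bell is dropped: "a \ab" -> "a b".
            .Replace(bell, "");
    }
}
```

After the first replace, every space except the first one in a run is either a bell or directly preceded by a bell, which is why the second and third replaces can remove them all and leave exactly one space.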

How To Know Which Type of Trimming You Need?

This is very simple. Just ask questions:

  • Front trim – Will extra whitespace at the front ever be valid?
  • Back trim – Will extra whitespace at the back ever be valid?
  • Middle trim – Will extra whitespace in the middle ever be valid? Are middle spaces allowed? If so, should they always be a single space?

If the answer to any of those questions is “no,” then you need to do that type of trim. However, it is clear that Middle Trim has more questions as it is more complex.
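One way to record the answers per field is a flags value that says which trims to apply. The following is a rough sketch under that assumption; TrimKinds, FieldTrimmer, and Apply are hypothetical names, and middle trim here means "collapse to a single space".

```csharp
using System;
using System.Text.RegularExpressions;

[Flags]
public enum TrimKinds { None = 0, Front = 1, Back = 2, Middle = 4 }

public static class FieldTrimmer
{
    public static string Apply(string value, TrimKinds kinds)
    {
        if (kinds.HasFlag(TrimKinds.Middle))
        {
            // Collapse only whitespace runs that sit between two non-whitespace
            // characters, so the front and back are left for their own flags.
            value = Regex.Replace(value, @"(?<=\S)\s+(?=\S)", " ");
        }
        if (kinds.HasFlag(TrimKinds.Front)) value = value.TrimStart();
        if (kinds.HasFlag(TrimKinds.Back)) value = value.TrimEnd();
        return value;
    }
}
```

A name field would typically pass TrimKinds.Front | TrimKinds.Back | TrimKinds.Middle, while a paragraph field might pass only TrimKinds.Front | TrimKinds.Back.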

License

This article, along with any associated source code and files, is licensed under The BSD License


Written By
Software Developer (Senior), LANDesk Software
United States
I write two things: Code and Fiction:
http://www.rhyous.com
http://www.jabrambarneck.com

I am a technology expert that loves both open source and C# (an interesting combination).

I am an expert at WPF and MVVM. I've been working a lot with WCF and Entity Framework lately.

Comments and Discussions

Suggestion: I don't like the name "TrimAll()"
Chrris Dale, 19-Mar-16 10:26
Why? Because you still have a single space in the sentence that separates the words. A better method name would be ToSingleSpace().