Posted 24 Jun 2009

Lost and Found Identity Proper Case Format Provider (IFormatProvider implementation)

24 Jun 2009 | MPL | 2 min read
Provides administrators with a way to re-format biographical data from case-insensitive data-sources.

Introduction

When working on Identity Management projects, administrators often have to re-format users' biographical data. When last names and titles are stored in an undesirable, case-insensitive format, Identity Management administrators have to restore their proper capitalization. Last names and titles make simple first-letter capitalization difficult because of non-standard name spellings and acronyms.

What problem does this solution solve?

"Proper case" capitalization of user biographical data or any other string.

How does this help someone else?

System administrators and Identity Management professionals could use this Format Provider to process data that is stored in a case-insensitive data-source.

Using the code

To use this format provider, include the Lost and Found Identity Proper Case Format Provider in your project and call it in the following way:

C#
string improper = " MRs. De'MArLeY-Smith mccarthy IV Sr. PhD "; 

Without McOption

C#
string result = string.Format(new LafiProperCaseFormatProvider(), "{0:p}", improper); 

Result:

Mrs. De'Marley-Smith Mccarthy IV Sr. PhD

With McOption

C#
string result = string.Format(new LafiProperCaseFormatProvider(), "{0:mc}", improper); 

Result:

Mrs. De'Marley-Smith McCarthy IV Sr. PhD

How does the code actually work?

The Lost and Found Identity Proper Case Format Provider is an implementation of the IFormatProvider interface.
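
As a rough illustration of that interface contract, the sketch below shows the handshake that string.Format performs with any custom provider: it first asks GetFormat for an ICustomFormatter, then calls Format once per format item. ShoutFormatProvider, the "u" format code, and Demo.Render are made-up names for this example only; they are not part of the Lost and Found Identity code.

```csharp
using System;

// Hypothetical provider, for illustration only.
public sealed class ShoutFormatProvider : IFormatProvider, ICustomFormatter
{
    // string.Format asks the provider for an ICustomFormatter.
    public object GetFormat(Type formatType)
    {
        return formatType == typeof(ICustomFormatter) ? this : null;
    }

    // string.Format then calls this method once per format item, e.g. "{0:u}".
    public string Format(string format, object arg, IFormatProvider formatProvider)
    {
        string value = arg == null ? string.Empty : arg.ToString();

        // No format string is supplied for a plain item such as "{0}".
        if (string.IsNullOrEmpty(format))
        {
            return value;
        }

        return format == "u" ? value.ToUpperInvariant() : value;
    }
}

public static class Demo
{
    // Formats the same argument twice: once with the "u" code, once without.
    public static string Render(string name)
    {
        return string.Format(new ShoutFormatProvider(), "{0:u} {0}", name);
    }

    public static void Main()
    {
        Console.WriteLine(Render("smith")); // prints "SMITH smith"
    }
}
```

The proper case provider follows the same shape, with "p" and "mc" as its format codes.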

The format provider first collapses excess white space and splits the string on the space character. Each resulting word is then tested against several patterns for special capitalization rules (Roman numerals, salutations, and titles like PhD, etc.). Finally, words that contain hyphens or apostrophes are split on those characters, so that each part of a hyphenated word or of a compound word with an apostrophe in the middle is capitalized properly.
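
The basic flow can be sketched in a few lines. This is a simplified stand-in, not the provider itself: it leaves out the special patterns entirely, and it walks the characters of each word instead of splitting recursively on separators as the full listing does. TokenSketch and ProperCase are hypothetical names.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class TokenSketch
{
    public static string ProperCase(string input)
    {
        // Trim, collapse runs of whitespace to one space, and lower-case everything.
        string normalized = Regex.Replace(input.Trim(), @"\s+", " ").ToLowerInvariant();
        StringBuilder output = new StringBuilder();

        foreach (string token in normalized.Split(' '))
        {
            // Capitalize the first character of the word and any
            // character that directly follows a hyphen or an apostrophe.
            bool capitalizeNext = true;
            foreach (char c in token)
            {
                output.Append(capitalizeNext ? char.ToUpperInvariant(c) : c);
                capitalizeNext = c == '-' || c == '\'';
            }

            output.Append(' ');
        }

        return output.ToString().Trim();
    }

    public static void Main()
    {
        // prints "De'Marley-Smith O'Neil"
        Console.WriteLine(ProperCase("  de'MArLeY-Smith   o'neil "));
    }
}
```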

If the user specifies the format "m", "mac", or "mc" (McOption), Irish/Scottish names are included in the pattern analysis. This option is particularly tricky, since it can produce undesirable results on non-Irish/Scottish names. Consider "MacDonald" vs. "Machado": with the McOption enabled, the format provider will capitalize "Machado" into "MacHado", which is generally undesirable. To solve this problem within an Identity Management project, use attribute flow precedence and a dedicated data-source that contains the exceptions to case capitalization. (See my blog: http://kdmitry.spaces.live.com/blog/.)
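
To make the exceptions idea concrete, here is a minimal in-process sketch, assuming the exceptions are available as a simple lookup table. The article's actual recommendation relies on attribute flow precedence in the Identity Management system, which this sketch does not model. McOptionSketch, FormatSurname, and the sample exception entries are hypothetical; only the Mc/Mac regular expression is taken from the provider.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class McOptionSketch
{
    // Hypothetical exception list; in practice it would be loaded
    // from a dedicated data-source rather than hard-coded.
    private static readonly Dictionary<string, string> Exceptions =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "machado", "Machado" },
            { "mackey", "Mackey" }
        };

    // Same Mc/Mac pattern as in the provider's InitializeDictionary method.
    private static readonly Regex McAndMac = new Regex(
        @"^(ma?c)(?!s[ead]$)((.+))$",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    public static string FormatSurname(string surname)
    {
        // Known exceptions take precedence over the Mc/Mac rule.
        string preferred;
        if (Exceptions.TryGetValue(surname, out preferred))
        {
            return preferred;
        }

        string lower = surname.ToLowerInvariant();
        Match match = McAndMac.Match(lower);
        if (!match.Success)
        {
            return Capitalize(lower);
        }

        // Group 1 is the "mc"/"mac" prefix, group 2 is the rest of the name.
        return Capitalize(match.Groups[1].Value) + Capitalize(match.Groups[2].Value);
    }

    private static string Capitalize(string value)
    {
        return value.Length == 0
            ? value
            : char.ToUpperInvariant(value[0]) + value.Substring(1);
    }

    public static void Main()
    {
        Console.WriteLine(FormatSurname("MACDONALD")); // prints "MacDonald"
        Console.WriteLine(FormatSurname("MACHADO"));   // prints "Machado"
    }
}
```

Without the "machado" entry, the same call would return "MacHado", which is the failure mode described above.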

Attention:

This code was designed and tested in the context of Identity Management proper case formatting. Applying this format provider to general text could produce undesirable results.
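
One way to see why general text is risky is to run ordinary words through the provider's Roman numeral pattern. The sketch below copies that pattern from the listing and checks a few tokens; RomanNumeralCheck and LooksRoman are hypothetical names used only for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RomanNumeralCheck
{
    // The Roman numeral pattern copied from the provider's InitializeDictionary method.
    private const string RomanNumerals =
        @"^((?=[MDCLXVI])((M{0,3})((C[DM])|(D?C{0,3}))?" +
        @"((X[LC])|(L?XX{0,2})|L)?((I[VX])|(V?(II{0,2}))|V)?)),?$";

    public static bool LooksRoman(string token)
    {
        return Regex.IsMatch(
            token,
            RomanNumerals,
            RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
    }

    public static void Main()
    {
        // "iv" is the intended match; "mix" and "li" match as well, "smith" does not.
        foreach (string word in new[] { "iv", "mix", "li", "smith" })
        {
            Console.WriteLine("{0}: {1}", word, LooksRoman(word));
        }
    }
}
```

A token that matches this pattern, and no pattern checked before it, is written in all capitals, so the word "mix" or the surname "Li" would come out as "MIX" and "LI".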

C#
//-----------------------------------------------------------------------
// <copyright file="LafiProperCaseFormatProvider.cs" company="LostAndFoundIdentity">
// Copyright (c) 2007 LostAndFoundIdentity.com | All rights reserved.
// </copyright>
// <author> Dmitry Kazantsev </author>
//-----------------------------------------------------------------------
[assembly: System.CLSCompliant(true)]
namespace LostAndFoundIdentity.Text
{
    using System;
    using System.Collections.Generic;
    using System.Diagnostics.CodeAnalysis;
    using System.Globalization;
    using System.Text;
    using System.Text.RegularExpressions;

    /// <summary>
    /// Represents Lost and Found Identity class LafiProperCaseFormatProvider
    /// which is responsible for providing custom proper case string formatting
    /// </summary>
    [SuppressMessage("Microsoft.Naming", 
      "CA1704:IdentifiersShouldBeSpelledCorrectly", 
      MessageId = "Lafi", 
      Justification = "Nothing wrong with 'Lafi'; it stands for Lost and Found Identity")]
    public class LafiProperCaseFormatProvider : ICustomFormatter, IFormatProvider
    {
        #region Fields
        /// <summary>
        /// String representing space character
        /// </summary>
        private const string Space = " ";
        
        /// <summary>
        /// Boolean containing value representing user's desire
        /// to look for and format strings with Mc/Mac formatting algorithm
        /// </summary>
        private bool mcOption;
        
        /// <summary>
        /// Dictionary containing pattern name and regex pattern
        /// </summary>
        private Dictionary<Pattern, string> patternDictionary;
        #endregion Fields
        #region Constructors
        /// <summary>
        /// Initializes a new instance of the LafiProperCaseFormatProvider class
        /// </summary>
        public LafiProperCaseFormatProvider()
        {
            this.InitializeDictionary();
        }
        #endregion Constructors
        #region Enums
        /// <summary>
        /// Name of the pattern that could be present in the string
        /// </summary>
        private enum Pattern
        {
            /// <summary>
            /// No pattern found
            /// </summary>
            None = 0,
            
            /// <summary>
            /// Represents pattern where all letters must be capitalized
            /// </summary>
            AllUpperCase = 1,
           
            /// <summary>
            /// Represents pattern where first and last
            /// letter of the word must be capitalized
            /// </summary>
            FirstAndLastCapitals = 2,
            
            /// <summary>
            /// Represents pattern where Mc and Mac must be distinguished from
            /// the rest of the word by capitalizing character following the Mc or Mac
            /// </summary>
            McAndMac = 8,
            
            /// <summary>
            /// Represents patterns where string is a Roman Numeral
            /// </summary>
            RomanNumerals = 16,
            
            /// <summary>
            /// Represents pattern where string is a salutation
            /// </summary>
            Salutation = 32
        }
        #endregion Enums
        #region Properties
        /// <summary>
        /// Gets or sets a value indicating whether user wishes
        /// to look for Mc/Mac in the formatted string or not
        /// </summary>
        private bool McOption
        {
            get
            {
                return this.mcOption;
            }

            set
            {
                this.mcOption = value;
            }
        }
        
        /// <summary>
        /// Gets the Dictionary containing pattern name
        /// and correlated RegEx pattern "formula"
        /// </summary>
        private Dictionary<Pattern, string> PatternDictionary
        {
            get
            {
                return this.patternDictionary;
            }
        }
        #endregion Properties
        #region Interface implementation
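        /// <summary>
        /// Formats provided string with a pre-defined template
        /// </summary>
        /// <param name="format">Name of the format presented as {0:x}</param>
        /// <param name="arg">Value to be formatted</param>
        /// <param name="formatProvider">The format provider class</param>
        /// <returns>Formatted string</returns>
        public string Format(string format, object arg, IFormatProvider formatProvider)
        {
            //// Guard against a null argument and a missing format string such as "{0}"
            string value = arg == null ? string.Empty : arg.ToString();
            switch ((format ?? string.Empty).ToUpperInvariant())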
            {
                default:
                {
                    return value;
                }

                case "M":
                case "MAC":
                case "MC":
                {
                    this.McOption = true;
                    return this.FormatProperCase(value);
                }

                case "P":
                {
                    this.McOption = false;
                    return this.FormatProperCase(value);
                }
            }
        }
        
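        /// <summary>
        /// Gets the formatter for the requested format type
        /// </summary>
        /// <param name="formatType">Format type in question</param>
        /// <returns>This instance when ICustomFormatter is requested; otherwise null</returns>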
        public object GetFormat(Type formatType)
        {
            if (formatType == typeof(ICustomFormatter))
            {
                return this;
            }
            else
            {
                return null;
            }
        }
        #endregion
        #region Methods
        /// <summary>
        /// Removes excess white-space from the string in question
        /// </summary>
        /// <param name="value">String to be processed</param>
        /// <returns>Reformatted string without excess whitespace</returns>
        private static string ProcessWhitespace(string value)
        {
            //// Strip leading and trailing whitespace(s)
            value = value.Trim();
            //// Replace all multiple occurrences of whitespace
            //// characters (middle of the string) with a single space.
            value = Regex.Replace(value, @"\s+", Space);
            return value;
        }
        
        /// <summary>
        /// Determines which RegEx pattern is applicable for a given string
        /// </summary>
        /// <param name="value">The string to be examined</param>
        /// <returns>The Enum value of the pattern detected in the string</returns>
        private Pattern DetectPattern(string value)
        {
            foreach (KeyValuePair<Pattern, string> pair in this.PatternDictionary)
            {
                if (Regex.IsMatch(value, pair.Value, 
                    RegexOptions.IgnoreCase | 
                    RegexOptions.CultureInvariant))
                {
                    return pair.Key;
                }
            }

            return Pattern.None;
        }
        
        /// <summary>
        /// Reformats provided value into properly capitalized string
        /// </summary>
        /// <param name="value">String to be formatted</param>
        /// <returns>Properly capitalized string</returns>
        [SuppressMessage("Microsoft.Globalization", 
          "CA1308:NormalizeStringsToUppercase", 
          Justification = "By design")]
        private string FormatProperCase(string value)
        {
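            //// String that will store the formatted output
            StringBuilder output = new StringBuilder();
            //// Remove excess white space from the value
            value = ProcessWhitespace(value);
            //// Nothing to format; avoids calling Substring(0, 1) on an empty token
            if (value.Length == 0)
            {
                return string.Empty;
            }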
            //// Process Each Word (separated by a single space)
            foreach (string token in value.ToLowerInvariant().Split(' '))
            {
                //// Create temporary token 
                string tempToken = string.Empty;
                Pattern pattern = this.DetectPattern(token);
                switch (pattern)
                {
                    case Pattern.Salutation:
                    {
                        //// Capitalizing first character in the current token
                        tempToken = token.Substring(0, 1).ToUpperInvariant() + 
                                    token.Substring(1);
                        break;
                    }

                    case Pattern.FirstAndLastCapitals:
                    {
                        //// Capitalizing first and Last characters of the string
                        Match matchedToken = Regex.Match(token, 
                          this.PatternDictionary[Pattern.FirstAndLastCapitals], 
                          RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
                        tempToken = matchedToken.ToString().ToLowerInvariant();
                        tempToken = tempToken.Replace("p", "P");
                        tempToken = tempToken.Replace("l", "L");
                        tempToken = tempToken.Replace("d", "D");
                        break;
                    }

                    case Pattern.RomanNumerals:
                    case Pattern.AllUpperCase:
                    {
                        //// Capitalizing all characters of the current token
                        tempToken = token.ToUpperInvariant();
                        break;
                    }

                    case Pattern.McAndMac:
                    {
                       // Check whether Mc/Mac option is requested
                        if (this.McOption)
                        {
                            // Capitalizing First "M" and first
                            // character after the 'Mc' or 'Mac' of the current token
                            Match matchedToken = Regex.Match(token, 
                              this.PatternDictionary[Pattern.McAndMac], 
                              RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
                            tempToken = 
                              matchedToken.Groups[1].Value.Substring(0, 1).ToUpperInvariant();
                            tempToken += matchedToken.Groups[1].Value.Substring(1);
                            tempToken += 
                              matchedToken.Groups[2].Value.Substring(0, 1).ToUpperInvariant();
                            tempToken += matchedToken.Groups[2].Value.Substring(1);
                        }
                        else
                        {
                            //// Capitalizing first character in the current token
                            tempToken = token.Substring(0, 1).ToUpperInvariant() + 
                                        token.Substring(1);
                        }

                        break;
                    }

                    case Pattern.None:
                    {
                        //// Capitalizing first character of the current token
                        tempToken = token.Substring(0, 1).ToUpperInvariant() + 
                                    token.Substring(1);
                        break;
                    }
                }
                // Looking for the << - >> character
                // as an indicator of "separated" token
                if (token.IndexOf(@"-", StringComparison.OrdinalIgnoreCase) > -1)
                {
                    //// Calling FormatSeparatedValue with separator character "-"
                    tempToken = this.FormatSeparatedValue(token, '-');
                }

                if (token.IndexOf(@"'", StringComparison.OrdinalIgnoreCase) > -1)
                {
                    //// Calling FormatSeparatedValue with separator character "'"
                    tempToken = this.FormatSeparatedValue(token, '\'');
                }
                output.AppendFormat(CultureInfo.CurrentCulture, 
                                    "{0}{1}", tempToken, Space);
            }
            // Returning trimmed value
            return output.ToString().Trim();
        }
        
        /// <summary>
        /// Formats "separated" string to ensure that hyphenated
        /// and apostrophe-separated strings are properly capitalized
        /// </summary>
        /// <param name="value">Value to be processed</param>
        /// <param name="separator">A separator character</param>
        /// <returns>Properly formatted "separated" string</returns>
        private string FormatSeparatedValue(string value, char separator)
        {
            string[] multiPartValue = value.Split(separator);
            StringBuilder result = new StringBuilder();
            int lastPart = multiPartValue.Length - 1;
            for (int i = 0; i < lastPart; i++)
            {
                if (multiPartValue[i].Length == 0)
                {
                    result.Append(separator.ToString());
                }
                else
                {
                    result.AppendFormat(CultureInfo.InvariantCulture, "{0}{1}", 
                      this.FormatProperCase(multiPartValue[i]), 
                      separator.ToString(CultureInfo.InvariantCulture));
                }
            }

            if (multiPartValue[lastPart].Length > 0)
            {
                result.Append(this.FormatProperCase(multiPartValue[lastPart]));
            }

            return result.ToString();
        }

        /// <summary>
        /// Initializes dictionary of pattern names and regex "formulas"
        /// </summary>
        private void InitializeDictionary()
        {
            // a regular expression to define salutations for the proper case function
            string salutations = 
               @"(^m(r|s)\.?$)|(^mrs\.?$)|(^mi(s){2}\.?$)|(^(j|s)r\.?,?$)";
            // a regular expression string to match PhD or LegD and any variants with periods
            string firstLastCap = @"(^leg\.?d\.?,?$)|(^ph\.?d\.?,?$)";
           // a regular expression string that matches degrees and professional designations
            //// and ensures that they are in all caps
            //// this will match: MVP and MCP, DSC, CNA, CCNA
            //// and CCNP, MCSE and MCSA and MCSD, CISM and CISA
            //// DDS, RN, MD and OD, BA and MA, CISSP
            string allUpperCase = @"(^m(v|c)p\,?\.?$)|(^dsc\.?\,?$)|(^cna\.?\," + 
               @"?$)|(^c{2}n(a|p)\.?\,?$)|(^mcs[ead]\.?\,?$)|(^cis(a|m)\.?\,?$)|" + 
               @"(^d{2}s\.?\,?$)|(^rn\.?\,?$)|(^(m|o)\.?d\.?\,?$" + 
               @")|(^(b|m)\.?a\.?\,?$)|(^cis{2}p\.?\,?$)";
            //// a regular expression to match the Mc's
            //// and Mac's of the world, but NOT MCSE MCSD or MCSA.
            //// this uses negative look ahead to rule out those possibilities.
            string mcAndMac = @"^(ma?c)(?!s[ead]$)((.+))$";
            //// a regular expression to match Roman numerals
            string romanNumerals = @"^((?=[MDCLXVI])((M{0,3})((C[DM])|(D?" + 
               @"C{0,3}))?((X[LC])|(L?XX{0,2})|L)?((I[VX])|(V?(II{0,2}))|V)?)),?$";
            this.patternDictionary = new Dictionary<Pattern, string>();
            this.patternDictionary.Add(Pattern.AllUpperCase, allUpperCase);
            this.patternDictionary.Add(Pattern.FirstAndLastCapitals, firstLastCap);
            this.patternDictionary.Add(Pattern.McAndMac, mcAndMac);
            this.patternDictionary.Add(Pattern.RomanNumerals, romanNumerals);
            this.patternDictionary.Add(Pattern.Salutation, salutations);
        }
        #endregion Methods
    }
}

License

This article, along with any associated source code and files, is licensed under The Mozilla Public License 1.1 (MPL 1.1)

About the Author

kdmitry
Architect
United States