
Case-Insensitive String Search

1 Dec 1999
A case-insensitive substring search function that doesn't require changing the case of either string and is DBCS (double-byte character set) friendly.

The C runtime includes a function called strstr, which finds an exact (case-sensitive) substring within a main string. To perform a case-insensitive search, you must first force both strings to either lower case or upper case before you call strstr. I wanted a function that didn't require changing the case of either string and was also DBCS (double-byte character set) friendly.
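For comparison, the conventional approach described above (force the case of both strings, then call strstr) might look like the following minimal sketch. It is an illustration only, not code from the article's download: the names LowerCopy and StrStrLowerCopy are hypothetical, std::tolower stands in for CharLower, and it handles single-byte characters only (no DBCS support).

```cpp
#include <cctype>
#include <cstring>
#include <string>

// Lower-case a copy of a single-byte string.
// Assumption: std::tolower is used here in place of the Win32 CharLower.
static std::string LowerCopy(const char* psz)
{
    std::string s(psz);
    for (char& c : s)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

// Hypothetical helper: the "force the case, then strstr" approach.
// Returns the offset of the first match in pszMain, or -1 if there is none.
// An offset is returned because a pointer into the lower-cased copy would
// dangle once the copy is destroyed; the copy has the same length as
// pszMain, so the offset is also valid for the original string.
long StrStrLowerCopy(const char* pszMain, const char* pszSub)
{
    const std::string lowerMain = LowerCopy(pszMain);
    const std::string lowerSub  = LowerCopy(pszSub);
    const char* hit = std::strstr(lowerMain.c_str(), lowerSub.c_str());
    return hit ? static_cast<long>(hit - lowerMain.c_str()) : -1L;
}
```

The cost of this approach is visible in the sketch: two string copies plus a case conversion of every character in both strings, even when the match is found within the first few bytes.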

Currently, Windows 2000 targets only the Intel platform, and Windows 95/98 of course runs only on Intel. That makes assembly language more appealing than it was in the past, when NT also ran on PowerPC and Alpha CPUs. Visual Studio doesn't include a separate assembler, and even if you add one, the debugger doesn't do a very good job of debugging assembly source files. The easiest way to include assembly language in your project is therefore to use an _asm block. In the string search functions, I let the C compiler handle the outer layer:

char* __fastcall stristrA(const char* pszMain, const char* pszSub)
{
    // The entire body is a single _asm block (see the source download).
}

Everything inside the function is handled within an _asm block. Note that one MSDN article says you should never use the __fastcall calling convention for functions containing _asm blocks, because the compiler could use any register for the function arguments. Another MSDN article states that the first two arguments (when they are DWORD-sized or smaller, as these two pointers are) are always passed in the ECX and EDX registers. I'm going to believe the second article and ignore the first; if you are concerned, you can change the calling convention and load the ESI and EDI registers we use from the stack instead. The advantage of __fastcall is that the arguments don't have to be pushed onto the stack and read back from it, which makes the function faster to execute.

The first thing we do is save both a lower-case and an upper-case version of the first character of the substring. If that character is non-alphabetic, the two versions are identical and we end up checking it twice. That costs only about 3 additional clock cycles per character, whereas the savings from not having to call CharLower for every character are significant.
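The first step can be illustrated with a minimal portable sketch. This is not the article's assembly: the function name FindFirstCandidate and the variable upperch are hypothetical (lowerch borrows the name from the watch list in the sample project), std::tolower/std::toupper stand in for CharLower/CharUpper, and the scan advances one byte at a time rather than using CharNext, so it is not DBCS-aware.

```cpp
#include <cctype>

// Hypothetical sketch of the first step: cache the lower- and upper-case
// forms of the substring's first character once, then scan the main string
// comparing each byte against both, without converting the main string.
// Returns a pointer to the first candidate position, or nullptr if none
// (including when pszSub is empty, since '\0' never matches inside the loop).
const char* FindFirstCandidate(const char* pszMain, const char* pszSub)
{
    const unsigned char first = static_cast<unsigned char>(*pszSub);
    const char lowerch = static_cast<char>(std::tolower(first));
    const char upperch = static_cast<char>(std::toupper(first));

    // For a non-alphabetic first character lowerch == upperch, so the second
    // comparison is redundant but harmless (the "checked twice" case).
    for (const char* p = pszMain; *p != '\0'; ++p) {
        if (*p == lowerch || *p == upperch)
            return p;
    }
    return nullptr;
}
```

Only two case conversions happen per search, regardless of how long the main string is.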

Once the first character is set up, we store the address of the CharNext function in the EDI register. This halves the overhead of the actual call, which matters because we call CharNext for every character in the main string (we use CharNext because it correctly moves the pointer past a DBCS character). We then walk through the main string, checking every character against the upper-case and lower-case versions of the substring's first character. As soon as we find a first-character match, we switch to a loop that checks every remaining character of the substring against the current position in the main string. If there is a mismatch, we convert both the main-string character and the substring character to lower case and compare again.

In the sample project, we search for the word "god" in the string "What hath God wrought?". The 'G' matches the cached upper-case first character, and the rest of the word ("od") matches exactly, so CharLower is never called. If the main string were "What hath GOD wrought?", we would call CharLower for both 'O' and 'D'.
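The whole search loop described above can be sketched in portable C++ as follows. This is a hypothetical approximation of the algorithm, not a translation of the author's assembly: StristrSketch and ToLowerByte are made-up names, std::tolower/std::toupper replace CharLower, and the pointer advances with ++p instead of CharNext, so the sketch handles single-byte strings only and is not DBCS-aware.

```cpp
#include <cctype>

// Stand-in for CharLower on a single byte (assumption: "C" locale rules).
static inline char ToLowerByte(char c)
{
    return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}

// Hypothetical sketch of the stristr algorithm for single-byte strings.
// Returns a pointer to the first case-insensitive match in pszMain, or nullptr.
const char* StristrSketch(const char* pszMain, const char* pszSub)
{
    if (*pszSub == '\0')
        return pszMain;                      // same convention as strstr

    // Cache both cases of the substring's first character once.
    const char lowerch = ToLowerByte(*pszSub);
    const char upperch =
        static_cast<char>(std::toupper(static_cast<unsigned char>(*pszSub)));

    for (const char* p = pszMain; *p != '\0'; ++p) {   // original: CharNext
        if (*p != lowerch && *p != upperch)
            continue;                        // cheap first-character filter

        // First character matched: compare the rest of the substring.
        const char* m = p + 1;
        const char* s = pszSub + 1;
        while (*s != '\0') {
            if (*m == '\0')
                return nullptr;              // main string too short for any later match
            // Exact comparison first; lower-case both only on a mismatch.
            if (*m != *s && ToLowerByte(*m) != ToLowerByte(*s))
                break;
            ++m;
            ++s;
        }
        if (*s == '\0')
            return p;                        // entire substring matched
    }
    return nullptr;
}
```

With the article's example, StristrSketch("What hath God wrought?", "god") returns a pointer to "God wrought?". Because the exact comparison is tried first, ToLowerByte is only reached inside the loop for "GOD", mirroring the CharLower behavior described above.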

The source code provides both an ANSI and a Unicode version, called stristrA and stristrW respectively. In your header file, you can specify which version you want to use with the following:

#ifdef UNICODE
#define stristr stristrW
#else
#define stristr stristrA
#endif

char*  __fastcall stristrA(const char* pszMain, const char* pszSub);
WCHAR* __fastcall stristrW(const WCHAR* pszMain, const WCHAR* pszSub);

In your code, you simply call stristr, and the correct function is chosen depending on whether or not you are compiling for Unicode.
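The assembly versions have to be maintained as two separate functions. For a portable C++ build without _asm, one alternative (not the author's approach) is a single template instantiated for both char and wchar_t. In the sketch below, StristrT and the FoldCase overloads are hypothetical names, std::tolower and std::towlower stand in for CharLower, and characters are compared one code unit at a time, so it is not DBCS-aware.

```cpp
#include <cctype>
#include <cwctype>

// Hypothetical case-folding overloads, one per character type.
inline char FoldCase(char c)
{
    return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}
inline wchar_t FoldCase(wchar_t c)
{
    return static_cast<wchar_t>(std::towlower(static_cast<std::wint_t>(c)));
}

// One body serves both the ANSI (char) and Unicode (wchar_t) cases.
// Returns a pointer to the first case-insensitive match, or nullptr.
template <typename CharT>
const CharT* StristrT(const CharT* pszMain, const CharT* pszSub)
{
    for (;; ++pszMain) {
        const CharT* m = pszMain;
        const CharT* s = pszSub;
        // Exact comparison first; fold case only when the characters differ.
        while (*s != CharT() && *m != CharT() &&
               (*m == *s || FoldCase(*m) == FoldCase(*s))) {
            ++m;
            ++s;
        }
        if (*s == CharT())
            return pszMain;   // whole substring matched
        if (*m == CharT())
            return nullptr;   // main string exhausted: no later match possible
    }
}
```

Because the template argument is deduced from the string type, StristrT(L"What hath GOD wrought?", L"god") selects the wchar_t instantiation without any #ifdef UNICODE mapping.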

The sample project doesn't do anything useful, but it does call both the ANSI and the Unicode versions of the function, so you can step through the code in the debugger to see how it works. In the ANSI function, adding the following expressions to the Watch window makes it easier to see what is happening:

al
lowerch,c
(char*) esi,s
(char*) edi,s
pszTmp1,s

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.



Written By
Web Developer
United States
