Windows-based Version of grep in C#

28 Mar 2021 · CPOL · 5 min read
grep in C# - Windows-based version
In this article, you will learn about the Windows-based version of the very popular GNU Unix utility grep, written in C#.

Introduction

The name largely speaks for itself: this is a Windows-based version of the very popular GNU Unix utility grep, written in C#. I needed grep's functionality in a Windows environment, so I decided to write my own solution.

For those who don't know grep: it is a command-line utility for searching plain-text data sets for lines that match a regular expression. Its name comes from the ed command g/re/p (globally search for a regular expression and print matching lines), which has the same effect. grep was originally developed for the Unix operating system, but later became available for all Unix-like systems and some others, such as OS-9.

For more information, go to https://en.wikipedia.org/wiki/Grep.

Usage / Syntax

BAT
wingrep [--color] [options] [-e pattern1 -e pattern2 ... | -f pattern_file | pattern] [filelist]

Options:

BAT
-c : This prints only count of the lines that match a pattern
-h : Display the matched lines, but do not display the filenames.
-i : Ignores case for matching
-l : Displays list of filenames only.
-n : Display the matched lines and their line numbers.
-v : This prints out all the lines that do not match the pattern
-e exp : Specifies expression with this option. Can use multiple times.
-f file : Takes patterns from file, one per line.
-w : Match whole word
-r : Search directory recursively
-o : Print only the matched parts of a matching line, 
     with each such part on a separate output line.
-A n : Prints searched line and n lines after the result.
-B n : Prints searched line and n lines before the result.
-C n : Prints searched line and n lines after and before the result.

--color : Print match in yellow. This option needs to be first.

Minimum number of arguments is 2 - pattern and at least one filename. In that case, a default option of -h (display matched lines) is assumed.

-e is used to specify multiple expressions, like this:

BAT
wingrep --color -n -e "pattern" -e "another_pattern" file1.txt

-f is used to specify a file containing the patterns. There can be any number of patterns in the file. Each line in the file is considered a new pattern.

BAT
wingrep --color -n -f "patterns.txt" file1.txt

Patterns.txt

Pattern1
Pattern2
Pattern3
...

Options A, B and C cannot be combined with options c, h, l, n, v or o. A, B and C are also mutually exclusive, as are c, h, l, n, v and o. All of them can be combined with i, w or r.

There is an additional option to print out the matches inside the lines in yellow. If chosen, this option needs to go first, i.e.:

BAT
wingrep --color -n "pattern" file1.txt

Filelist can be either a regular filename or a Windows wildcard filename pattern.

BAT
wingrep -n "pattern" *.txt

Example Output

file.txt

-c : This prints only count of the lines that match a pattern
-h : Display the matched lines, but do not display the filenames.
-i : Ignores case for matching
-l : Displays list of filenames only.
-n : Display the matched lines and their line numbers.
-v : This prints out all the lines that do not match the pattern
-e exp : Specifies expression with this option. Can use multiple times.
-f file : Takes patterns from file, one per line.
-w : Match whole word
-r : Search directory recursively
-o : Print only the matched parts of a matching line, 
     with each such part on a separate output line.
-A n : Prints searched line and n lines after the result.
-B n : Prints searched line and n line before the result.
-C n : Prints searched line and n lines after and before the result.

--color : Print match in yellow. This option needs to be first.

Option -c

Prints only the count of the lines that match a pattern.

BAT
wingrep -c "Display" file.txt
------------------------------

D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:3

Option -h

Prints only the matched lines.

BAT
wingrep --color -h "Display" file.txt
--------------------------------------

-h : Display the matched lines, but do not display the filenames.
-l : Displays list of filenames only.
-n : Display the matched lines and their line numbers.

Option -l

Prints only the filenames containing matches.

BAT
wingrep -l "Display" file.txt
--------------------------------------

D:\VS Projects\_temp\wingrep\bin\Debug\file.txt

Option -n

Displays the filename, the line number and the line containing matches, separated by colons.

BAT
wingrep --color -n "Display" file.txt
--------------------------------------

D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:2:-h : Display the matched lines, but do not display the filenames.
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:4:-l : Displays list of filenames only.
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:5:-n : Display the matched lines and their line numbers.

Option -v

Displays the filename and all the non-matching lines (inverse search).

BAT
wingrep -v "Display" file.txt
--------------------------------------

D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:-c : This prints only count of the lines that match a pattern
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:-i : Ignores case for matching
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:-v : This prints out all the lines that do not match the pattern
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:-e exp : Specifies expression with this option. Can use multiple times.
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:-f file : Takes patterns from file, one per line.
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:-E : Treats pattern as an extended regular expression (ERE)
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:-w : Match whole word
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:-r : Search directory recursively
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:-o : Print only the matched parts of a matching line, with each such part on a separate output line.
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:-A n : Prints searched line and n lines after the result.
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:-B n : Prints searched line and n line before the result.
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:-C n : Prints searched line and n lines after and before the result.
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:--color : Print match in yellow. This option needs to be first.

Option -o

Prints the filename and only the matched parts of a matched line.

BAT
wingrep --color -o "Display" file.txt
--------------------------------------

D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:Display
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:Display
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:Display

Options A, B, C

These options are used to print the lines that contain matches plus a desired number of lines before and/or after the matching line.

Each of them takes the number of lines as its argument.

Option -A

Print n lines after the match.

BAT
wingrep --color -A 2 "Display" file.txt
--------------------------------------

D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:2:-h : Display the matched lines, but do not display the filenames.
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:3:-i : Ignores case for matching
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:4:-l : Displays list of filenames only.
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:4:-l : Displays list of filenames only.
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:5:-v : This prints out all the lines that do not match the pattern
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:6:-e exp : Specifies expression with this option. Can use multiple times.
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:5:-n : Display the matched lines and their line numbers.
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:7:-w : Match whole word
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:7:-r : Search directory recursively

Option -B

Print n lines before the match.

BAT
wingrep --color -B 2 "Display" file.txt
--------------------------------------

D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:2:-c : This prints only count of the lines that match a pattern
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:2:-h : Display the matched lines, but do not display the filenames.
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:3:-h : Display the matched lines, but do not display the filenames.
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:3:-i : Ignores case for matching
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:4:-l : Displays list of filenames only.
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:4:-i : Ignores case for matching
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:4:-l : Displays list of filenames only.
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:5:-n : Display the matched lines and their line numbers.

Option -C

Print n lines before AND after the match.

BAT
wingrep --color -C 2 "Display" file.txt
--------------------------------------

D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:2:-c : This prints only count of the lines that match a pattern
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:2:-h : Display the matched lines, but do not display the filenames.
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:3:-i : Ignores case for matching
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:4:-l : Displays list of filenames only.
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:3:-h : Display the matched lines, but do not display the filenames.
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:3:-i : Ignores case for matching
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:4:-l : Displays list of filenames only.
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:5:-v : This prints out all the lines that do not match the pattern
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:6:-e exp : Specifies expression with this option. Can use multiple times.
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:4:-i : Ignores case for matching
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:4:-l : Displays list of filenames only.
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:5:-n : Display the matched lines and their line numbers.
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:6:-w : Match whole word
D:\VS Projects\_temp\wingrep\bin\Debug\file.txt:7:-r : Search directory recursively

Using the code

The code is fairly short and simple. The pseudo-code would look something like this:

  • Parse arguments
  • Initialize Regex
  • Foreach file
    • Foreach line
      • Search for regex matches
      • Print based on option
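
The outline above can be sketched roughly as follows. This is a simplified illustration and not the article's actual code: the class name GrepSketch and the method FindMatches are hypothetical, the input is an in-memory sequence of lines rather than a list of files, and only the -n style of output is produced.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GrepSketch
{
    // Returns "lineNumber:line" for every line that matches the pattern.
    // Line numbers are 1-based.
    public static List<string> FindMatches(IEnumerable<string> lines, string pattern, bool ignoreCase)
    {
        // Initialize the Regex once, before the loop
        RegexOptions options = ignoreCase ? RegexOptions.IgnoreCase : RegexOptions.None;
        var rx = new Regex(pattern, options);

        var result = new List<string>();
        long lineNumber = 0;
        foreach (string line in lines)               // foreach line
        {
            lineNumber++;
            if (rx.IsMatch(line))                    // search for regex matches
                result.Add(lineNumber + ":" + line); // "print" based on option (here: -n)
        }
        return result;
    }
}
```

A caller would run FindMatches once per file and prefix each result with the file path.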

parseArgs Method

C#
static void parseArgs(string[] args, ref ArrayList options, 
ref ArrayList patterns, ref ArrayList files, ref bool color)

The arguments are parsed one by one in a separate method, and the options, patterns and files are returned through ref ArrayList variables. The bool color flag indicates whether match coloring should be used.

The options are gathered in two elements of ArrayList options. The first element [0] holds option A, B or C together with its number of lines, e.g., A4 or C3. The second element [1] holds all other options (chlnvoiwr).

If there is an illegal combination of options, an error is returned (method error is called).
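
As an illustration of the combination rules described in the Usage section, here is a minimal sketch of such a check. The class OptionRules and the method IsLegalCombination are hypothetical names and not part of the article's code; the sketch assumes the option letters have already been collected into one string and does not validate unknown letters.

```csharp
using System.Linq;

public static class OptionRules
{
    // 'letters' holds the single-letter options the user passed, e.g. "ni" or "Ai".
    // At most one option from the combined set A, B, C, c, h, l, n, v, o is allowed;
    // i, w and r may be added freely.
    public static bool IsLegalCombination(string letters)
    {
        int contextOptions = letters.Count(c => "ABC".IndexOf(c) >= 0);
        int outputOptions = letters.Count(c => "chlnvo".IndexOf(c) >= 0);
        return contextOptions + outputOptions <= 1;
    }
}
```

For example, "Ai" and "nwr" would be accepted, while "An", "AB" and "cn" would be rejected.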

Regex

All of the patterns are put together in a single regular expression.

C#
// join all patterns into a single alternation: (pattern1|pattern2|...)
string allpatterns = "(" + string.Join("|", _patterns.Cast<string>()) + ")";
rx = new Regex(allpatterns, rxo);

Also, if option -i is used, Regex is initialized with RegexOptions.IgnoreCase.

For the -r option, all the subdirectories are searched recursively, using SearchOption.AllDirectories.

If option -w is used, only whole words are matched: each pattern is wrapped in a lookbehind (?<=[ \t\n]|^) and a lookahead (?=[ \t\n]|$), so a match must be preceded by a space, a horizontal tab, a newline or the start of the line, and followed by a space, a horizontal tab, a newline or the end of the line.

C#
if (_options[1].ToString().Contains("w"))
    for (int i = 0; i < _patterns.Count; i++)
        _patterns[i] = "(?<=[ \\t\\n]|^)" + _patterns[i] + "(?=[ \\t\\n]|$)";
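
To see what these lookarounds do in practice, here is a small self-contained sketch. The class WholeWord and the method Build are hypothetical names used only for illustration; the wrapping strings are the same ones used above.

```csharp
using System.Text.RegularExpressions;

public static class WholeWord
{
    // Wraps a pattern so that it only matches when it is delimited by
    // a space, a tab, a newline, or the start/end of the input.
    public static Regex Build(string pattern)
    {
        string wrapped = "(?<=[ \\t\\n]|^)" + pattern + "(?=[ \\t\\n]|$)";
        return new Regex(wrapped);
    }
}
```

With this wrapping, Build("Display") matches in "-h : Display the matched lines" but not in "-l : Displays list", because the trailing "s" is not whitespace. Note that punctuation does not count as a delimiter here, so "Display," is not matched either, which differs from a \b word boundary.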

Do the Work

First, for each file pattern, all the files that match the pattern are enumerated. Then, each file in the enumeration is read line by line, and the line is matched against the regex. If there are matches, method print is called with the proper arguments (depending on the selected option).

For option -l, we don't need to check all of the lines: as soon as one line matches the regex, we break out of the loop, print the filename and go to the next file.

For option -c, we only count the matching lines; we do not print them out.
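
The difference between the two options can be sketched as follows. This is an illustration under simplifying assumptions, not the article's code: CountAndList, CountMatchingLines and ContainsMatch are hypothetical names, and the lines come from any enumerable source instead of a StreamReader.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CountAndList
{
    // -c: every line has to be inspected, but nothing is printed per line
    public static int CountMatchingLines(IEnumerable<string> lines, Regex rx)
    {
        int count = 0;
        foreach (string line in lines)
        {
            if (rx.IsMatch(line))
                count++;
        }
        return count;
    }

    // -l: stops reading at the first matching line
    public static bool ContainsMatch(IEnumerable<string> lines, Regex rx)
    {
        foreach (string line in lines)
        {
            if (rx.IsMatch(line))
                return true; // no need to look at the rest of the file
        }
        return false;
    }
}
```

With a lazy source such as File.ReadLines(path), ContainsMatch would stop reading the file after the first hit.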

print method

C#
static void print(string path, long line_number, string line,
MatchCollection matches, ConsoleColor color, string option)
{
    if (option == "o")
    {
        foreach (Match m in matches)
        {
            Console.Write(path + ":");
            Console.ForegroundColor = color;
            Console.WriteLine(m.Value);
            Console.ResetColor();
        }
    }
    else if (option == "h")
    {
        printLineWithColor(line, matches, color);
    }
    else if (option == "l")
    {
        Console.WriteLine(path);
    }
    else if (stringMatches(option, "ABCn") > 0)
    {
        Console.Write(path + ":" + line_number.ToString() + ":");
        printLineWithColor(line, matches, color);
    }
    else if (option == "v")
    {
        Console.WriteLine(path + ":" + line);
    }
}

static void printLineWithColor(string line, MatchCollection rxmc, ConsoleColor MATCH_COLOR)
{
    Console.Write(line.Substring(0, rxmc[0].Index));
    for (int i = 1; i < rxmc.Count; i++)
    {
        Console.ForegroundColor = MATCH_COLOR;
        Console.Write(rxmc[i - 1]);
        Console.ResetColor();
        Console.Write(line.Substring(rxmc[i - 1].Index + rxmc[i - 1].Length,
        rxmc[i].Index - rxmc[i - 1].Index - rxmc[i - 1].Length));
    }
    Console.ForegroundColor = MATCH_COLOR;
    Console.Write(rxmc[rxmc.Count - 1]);
    Console.ResetColor();
    Console.WriteLine(line.Substring(rxmc[rxmc.Count - 1].Index + rxmc[rxmc.Count - 1].Length));
}
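
The walk that printLineWithColor performs, alternating between plain text and matched text, can also be written as a function that returns the pieces instead of printing them. The following is a sketch with hypothetical names (Highlighter, Split), not the article's implementation; unlike the method above, it also handles a line with no matches.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class Highlighter
{
    // Splits 'line' into consecutive pieces; the bool is true for matched text.
    public static List<KeyValuePair<string, bool>> Split(string line, Regex rx)
    {
        var parts = new List<KeyValuePair<string, bool>>();
        int pos = 0; // index of the first character not yet emitted
        foreach (Match m in rx.Matches(line))
        {
            if (m.Index > pos) // plain text between the previous match and this one
                parts.Add(new KeyValuePair<string, bool>(line.Substring(pos, m.Index - pos), false));
            parts.Add(new KeyValuePair<string, bool>(m.Value, true));
            pos = m.Index + m.Length;
        }
        if (pos < line.Length) // plain text after the last match
            parts.Add(new KeyValuePair<string, bool>(line.Substring(pos), false));
        return parts;
    }
}
```

A caller could then loop over the pieces, set Console.ForegroundColor before writing a matched piece and call Console.ResetColor() afterwards.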

Points of Interest

Peekahead and peekbehind

In order to print the lines before a match (options B and C), I needed a variable that always stores the last n lines. The best fit was a Queue: it offers the simplest way of adding new lines (Enqueue) and removing the oldest line, which sits at the front of the queue (Dequeue).
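
A minimal sketch of such a bounded "last n lines" buffer is shown below. The class LastLines and its members are hypothetical names chosen for this illustration; the article keeps the queue as a plain variable in the main loop instead.

```csharp
using System.Collections.Generic;

public class LastLines
{
    private readonly Queue<string> _queue = new Queue<string>();
    private readonly int _capacity;

    public LastLines(int capacity)
    {
        _capacity = capacity;
    }

    // Remembers a line, dropping the oldest one when the buffer is full.
    public void Add(string line)
    {
        if (_capacity <= 0)
            return;
        if (_queue.Count == _capacity)
            _queue.Dequeue(); // the oldest line is at the front of the queue
        _queue.Enqueue(line);
    }

    // Returns the remembered lines, oldest first.
    public string[] Snapshot()
    {
        return _queue.ToArray();
    }
}
```

After adding "a", "b" and "c" to a buffer with capacity 2, Snapshot() returns "b" and "c".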

On the other hand, for options A and C, I needed to ensure a possibility of looking ahead for n lines in the stream. This is not supported by the StreamReader object I am using to read the stream, so I needed to be creative. Therefore, I introduced another Queue object, which would hold all the lines that I read but haven't used yet (they haven't been compared to the pattern).

So, each time we run into a line that contains a match, we print that line and the n lines after it, consuming those n lines from the stream. We put those lines in the _peekahead queue. At the beginning of each iteration, we first check whether there are any lines in the _peekahead queue: if there are, we use them first, and only if _peekahead is empty do we consume another line from the stream.

C#
// if there are lines in the peekahead queue, read them first
if (_peekahead.Count > 0)
    _line = _peekahead.Dequeue();
else // otherwise read a new line from the file
{
    _line = _sr.ReadLine();
    if (_line == null)
        break;
}
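
The same idea can be packaged as a small wrapper around a reader. This is a sketch of one possible design, not the code used in wingrep: the class PeekableLineReader and its methods are hypothetical, and here the look-ahead is done by an explicit Peek call instead of inside the printing loop.

```csharp
using System.Collections.Generic;
using System.IO;

public class PeekableLineReader
{
    private readonly TextReader _reader;
    private readonly Queue<string> _peekahead = new Queue<string>();

    public PeekableLineReader(TextReader reader)
    {
        _reader = reader;
    }

    // Returns the next line to process: buffered lines first, then the stream.
    // Returns null at the end of the input.
    public string ReadLine()
    {
        return _peekahead.Count > 0 ? _peekahead.Dequeue() : _reader.ReadLine();
    }

    // Returns up to n upcoming lines without consuming them.
    public List<string> Peek(int n)
    {
        while (_peekahead.Count < n)
        {
            string line = _reader.ReadLine();
            if (line == null)
                break; // fewer than n lines left
            _peekahead.Enqueue(line);
        }
        var result = new List<string>();
        foreach (string line in _peekahead)
        {
            if (result.Count == n)
                break;
            result.Add(line);
        }
        return result;
    }
}
```

Because peeked lines stay in the queue, they are still returned by later ReadLine calls and can be matched against the pattern in turn.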

Here is the code where the printing for options A, B and C is handled:

C#
else if (_options[0].ToString() != "") // ABC - print lines before/after match
{
    if (stringMatches(_options[0].ToString(), "BC") > 0) // for B and C options, 
                                                         // write out previous n lines
    {
        // peekbehind is used to remember n lines before the match
        if (match)
        {
            _tmpcnt = _peekbehind.Count;
            for (int i = 0; i < Math.Min(_ABC_nlines, _tmpcnt); i++)
            {
                _tmpline = _peekbehind.Dequeue();
                Console.WriteLine(path + ":" + 
                (_cnt_lines - _peekbehind.Count).ToString() + ":" + _tmpline);
                _peekbehind.Enqueue(_tmpline);
            }
            if (!_options[0].ToString().StartsWith("C")) // to avoid writing the 
                                                         // line twice in case of option C
                print(path, _cnt_lines, _line, rxmc, _color ? 
                      MATCH_COLOR : Console.ForegroundColor, "B");
        }
        if (_peekbehind.Count == _ABC_nlines)
            _peekbehind.Dequeue();
        _peekbehind.Enqueue(_line);
    }
    if (stringMatches(_options[0].ToString(), "AC") == 1 && match)
    {
        for (int i = 0; i <= _ABC_nlines; i++)
        {
            if (i == 0)
                print(path, _cnt_lines, _line, rxmc, _color ? 
                      MATCH_COLOR : Console.ForegroundColor, "A");
            else
                Console.WriteLine(path + ":" + (_cnt_lines + i).ToString() + ":" + _line);
            _line = _sr.ReadLine();
            if (_line == null)
                break;
            // using lookahead queue not to waste the lines we consume with AC options
            _peekahead.Enqueue(_line);
        }
    }
}
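
For comparison, here is a simplified sketch of context printing that works on lines already loaded into memory and emits each line at most once, even when the context windows of neighbouring matches overlap. It is an alternative illustration, not the streaming approach used above; ContextSketch and WithContext are hypothetical names, and holding the whole file in memory is an assumption made to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ContextSketch
{
    // Returns "lineNumber:line" for each matching line plus 'before' lines
    // of leading context and 'after' lines of trailing context.
    public static List<string> WithContext(IList<string> lines, Regex rx, int before, int after)
    {
        var result = new List<string>();
        int lastPrinted = -1; // index of the last line already emitted
        for (int i = 0; i < lines.Count; i++)
        {
            if (!rx.IsMatch(lines[i]))
                continue;
            // never go back over lines that were already emitted
            int start = Math.Max(i - before, lastPrinted + 1);
            int end = Math.Min(i + after, lines.Count - 1);
            for (int j = start; j <= end; j++)
                result.Add((j + 1) + ":" + lines[j]);
            lastPrinted = Math.Max(lastPrinted, end);
        }
        return result;
    }
}
```

Passing after = 0 gives the behaviour of -B, before = 0 gives -A, and equal values give -C.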

History

  • 28th March, 2021: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
User Interface Analyst Raiffeisenbank Austria
Croatia Croatia
I earned a Master's degree in computing science at the Faculty of Electrical Engineering and Computing in Zagreb, Croatia in 2009. Following my studies, I got a job in the Croatian branch of the Austrian-based CEE Raiffeisen Bank as an MIS (management information system) analyst.
I have been working there since 2010, as an IT expert within the Controlling department, maintaining the Oracle's OFSA system, underlying interfaces and databases.
Throughout that time, I have worked with several different technologies, which include SQL & PL/SQL (mostly), postgres, Cognos BI, Apparo, Datastage, ODI, Jenkins, Qlik, ...
I am doing a lot of automation with scripting in batch / shell and VBscript (mostly) - data analysis and processing, automated DB imports and exports, Jenkins automation etc.
Privately, I was mostly doing Windows Forms and Console app tools in Visual Studio, C#.
