
PERL: Removing Text Within Parentheses

18 Aug 2010


I recently wanted a list of major landmarks, but the text list I had gave the name of each landmark followed by its location in parentheses, e.g. Eiffel Tower (Paris, France). I just wanted the names of the landmarks without the text in the parentheses, so I had to figure out the command to remove all the parenthesized text.

Perl was the winner as tool of choice. There was a small trick to doing this seemingly trivial task, so document it we shall.

Original: Eiffel Tower (Paris, France)
Desired: Eiffel Tower

There are two ways to do it, depending on what you want:

perl -p -e 's#\(.*\)##g' textfile

You may have seen 's/oldtext/newtext/g' as the usual syntax and wonder why I am using hash marks (pound signs) instead. The forward slash is only the conventional delimiter; Perl accepts others. Using hash marks lets you put a forward slash in the search text without escaping it, and it can make the expression easier to read. Now, on to the command. The \( says to look for a literal left parenthesis. Then comes the critical .* which matches any number of any characters and is greedy, so it grabs as many as it can. Finally, \) closes it off with a literal right parenthesis. Together they match everything from the first left parenthesis to the last right parenthesis on the line.

perl -p -e 's#\([^)]*\)##g' textfile

This solution does the same thing to our Original text string, but it works slightly differently. The [^)] means "any character that is not a right parenthesis." The caret (^) at the start of the brackets negates the set, which is useful for building an exclusion set. You can list several characters: [^$)?] matches any character except $, ), or ?. As a result, each match stops at the first right parenthesis it reaches.

Since the two commands work the same for the given example, let's show how the commands will vary in different situations:

If textfile contains:

  1. Paris (France,) Hilton (Hotel)
  2. Paris (France (Hilton) Hotel)

Using:

perl -p -e 's#\(.*\)##g' textfile

The results would be:

  1. Paris
  2. Paris

Note the danger here: in line 1, Hilton is not in parentheses, yet it gets removed, because .* runs all the way to the last right parenthesis on the line. This may not be the expected or intended operation.

Next, using:

perl -p -e 's#\([^)]*\)##g' textfile

The results would be:

  1. Paris Hilton
  2. Paris Hotel)

The result for line 1 may be what we were expecting, but line 2 doesn't look good: the match ends at the first right parenthesis, so the nested group is cut short and Hotel) is left behind. The moral here is to understand what you are trying to do and choose the command that performs the appropriate operation.

This article was originally posted at http://www.chiefsandendians.com/feeds/posts/default

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Chief Technology Officer, Chiefs And Endians
United States
Come visit us at http://www.chiefsandendians.com

A compilation of varied technical gems learned over many years of experience in the industry.

Hint for the confused: Endian is a Computer Science term--the title is a play on words, not a misspelling.
