Parsing a file in Perl

Question

Hi,

I have been learning Perl for less than two weeks; I am a C++ programmer.
A portion of my data is attached below. The data is in file1.txt. I would like to move the data from file1.txt to file2.txt, but I only want to keep the numeric columns.

E.g., I want row 1 to look like this:

1       1549367 11     8       3      11      0       -12.00  6.00    -0.25   -3.00   0.00    -1.67   -12.00  6.00    -0.64

Instead of this:

1       Chr26   1549367 11      GGGGGGGAAGA     8       3       Transition      11      0       -12.00  6.00    -0.25   -3.00   0.00    -1.67   -12.00  6.00    -0.64

This is what I have done so far (file1.txt will be in @ARGV):

open FILE2, "+>file2.txt" or die "Cant not open file2.txt!";
my $line;
while($line = readline(ARGV))
{
        print FILE2 $line;
}

The code above only copies the content of file1.txt (ARGV) into file2.txt.
I tried to use 'seek' and 'tell()' to solve my problem, but I got confused :(

I also tried this:

open(FILE, "file1.txt");
@theFile = <FILE>;

This puts every row in the array @theFile. But how can I now modify the elements of one row? (I'm still a novice Perl programmer.)

Thank you for your help

/………………………………………………………………………………………../
The file portion

1       Chr26   1549367 11      GGGGGGGAAGA     8       3       Transition      11      0       -12.00  6.00    -0.25   -3.00   0.00    -1.67   -12.00  6.00    -0.64
1       Chr26   1549501 15      ccCctctccccctCC 12      3       Transition      3       12      -17.00  6.00    0.50    1.00    6.00    2.67    -17.00  6.00    0.93
1       Chr26   1549552 14      AagAAaaAAAagga  11      3       Transition      6       8       -31.00  6.00    -2.09   -12.00  3.00    -5.67   -31.00  6.00    -2.86
1       Chr26   1549563 14      tAAaaAAAattat^Ft        9       5       Transversion    5       9       -7.00   6.00    0.22    -64.00  4.00    -18.40  -64.00  6.00    -6.43
1       Chr26   1549726 14      TtTtctTtTtTTTT  13      1       Transition      8       6       -3.00   6.00    1.92    6.00    6.00    6.00    -3.00   6.00    2.21
2       Chr26   1549737 16      T+1Atttt+1aT+1At+1aTt+1aT+1AT+1AT+1AT+1AtT+1A^FA        15      11      Transversion    16      10      -64.00  6.00    -35.67  -64.00  6.00    -46.18  -64.00  6.00    -40.12
2       Chr26   1549815 9       CtCTTTTTT       7       2       Transition      8       1       -3.00   6.00    -0.14   -9.00   0.00    -4.50   -9.00   6.00    -1.11
1       Chr26   1549914 12      gGGGGGGGAGgg    11      1       Transition      9       3       -9.00   6.00    1.18    -4.00   -4.00   -4.00   -9.00   6.00    0.75
1       Chr26   1550018

Posted 19-Oct-11 3:14am

The_Real_Chubaka

Updated 19-Oct-11 4:55am

Mehdi Gholam

v2

1 solution

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

fjdiewornncalwe · Accepted Answer · 2011-10-19T03:45:00

You could do this in a couple of different ways.

The best would be to write a regular expression that cleans the data the way you want. Another solution is to split each line into columns, as described in the FAQ entry quoted here, and then check each column for non-numeric characters. I found this here (http://perdoc.perl.org):

How do I extract selected columns from a string?
(contributed by brian d foy)
If you know the columns that contain the data, you can use substr to extract a single column.

PERL

my $column = substr( $line, $start_column, $length );
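As a small illustration of the substr call above, here is a sketch on a made-up fixed-layout line. The sample string and the offsets are assumptions chosen for the example, not values from the question's data.

```perl
use strict;
use warnings;

# Hypothetical fixed-layout line: each name sits in an 8-character slot.
my $line   = 'fred    barney  betty   ';
my $column = substr $line, 8, 6;    # start at offset 8 (0-based), take 6 characters
print "$column\n";
```

This only works when every line uses exactly the same column positions, which is not the case for the tab-separated data in the question.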

You can use split if the columns are separated by whitespace or some other delimiter, as long as whitespace or the delimiter cannot appear as part of the data.

PERL

my $line = ' fred barney betty ';
my @columns = split /\s+/, $line;
# ( '', 'fred', 'barney', 'betty' );

my $line = 'fred||barney||betty';
my @columns = split /\|/, $line;
# ( 'fred', '', 'barney', '', 'betty' );
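Applied to the data in the question, split can be combined with a per-column numeric check. The following is a minimal sketch, assuming the columns are separated by whitespace and that "numeric" means an optionally signed integer or decimal; the helper name numeric_fields is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: return only the fields of a line that look like plain numbers.
sub numeric_fields {
    my ($line) = @_;
    # split ' ' is the special case that splits on runs of whitespace
    # and ignores leading whitespace, so no empty first field appears.
    my @columns = split ' ', $line;
    return grep { /^[+-]?\d+(?:\.\d+)?$/ } @columns;
}

# Shortened version of the first row from the question.
my $row = "1\tChr26\t1549367\t11\tGGGGGGGAAGA\t8\t3\tTransition\t11\t0\t-12.00";
print join( "\t", numeric_fields($row) ), "\n";
```

Fields such as Chr26, the base string, and Transition fail the pattern and are dropped, while the remaining columns keep their original order.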

If you want to work with comma-separated values, don't do this since that format is a bit more complicated. Use one of the modules that handle that format, such as Text::CSV, Text::CSV_XS, or Text::CSV_PP.

If you want to break apart an entire line of fixed columns, you can use unpack with the A (ASCII) format. By using a number after the format specifier, you can denote the column width. See the pack and unpack entries in perlfunc for more details.

PERL

my @fields = unpack( "A8 A8 A8 A16 A4", $line );   # template first, then the data

Note that spaces in the format argument to unpack do not denote literal spaces. If you have space separated data, you may want split instead.
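To make the unpack description concrete, here is a small sketch on a made-up fixed-width record. The record layout (two 8-character fields and one 4-character field) is an assumption for the example; the question's data is tab-separated, so split is probably the better fit there.

```perl
use strict;
use warnings;

# Hypothetical fixed-width record: two 8-character fields and one 4-character field.
my $record = sprintf '%-8s%-8s%-4s', '1549367', 'Chr26', '11';

# unpack takes the template first and the data second.
# The A format strips trailing spaces from each extracted field.
my @fields = unpack 'A8 A8 A4', $record;
print join( '|', @fields ), "\n";
```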

I don't have time at the moment to write the regex for you (that would be my primary choice) or to adapt the code above to fit your question completely, but hopefully this gets you on the right path. I'll update my answer if I get a chance in the next couple of hours.
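For reference, the regular-expression route could look roughly like the sketch below. It is not the answerer's code. It assumes that a value to keep is a whitespace-delimited token made up entirely of an optionally signed integer or decimal, and the sub name copy_numeric_tokens is hypothetical. The output file name file2.txt comes from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in_path to $out_path, keeping only numeric tokens.
sub copy_numeric_tokens {
    my ( $in_path, $out_path ) = @_;
    open my $in,  '<', $in_path  or die "Cannot open $in_path: $!";
    open my $out, '>', $out_path or die "Cannot open $out_path: $!";
    while ( my $line = <$in> ) {
        # The lookarounds require whitespace (or a line edge) on both sides,
        # so the "26" in "Chr26" and the "1" in "T+1A" are not matched.
        my @numbers = $line =~ /(?<!\S)[+-]?\d+(?:\.\d+)?(?!\S)/g;
        print {$out} join( "\t", @numbers ), "\n";
    }
    close $in;
    close $out or die "Cannot close $out_path: $!";
    return;
}

# Usage: perl script.pl file1.txt   (writes file2.txt)
copy_numeric_tokens( $ARGV[0], 'file2.txt' ) if @ARGV;
```

Lexical filehandles and three-argument open are used here instead of the bareword FILE2 handle from the question; both styles work, but the lexical form closes automatically when it goes out of scope and avoids name clashes.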