The following Perl script currently reads in an HTML file and strips out what I don't need. It also opens a CSV document, which is blank.
My problem is that I want to import the stripped-down results into the CSV's 3 fields, using Name as field 1, Lives in as field 2 and Commented as field 3.
The results are displayed in the cmd prompt but are not written to the CSV.

use warnings; 
use strict;  
use DBI;
use HTML::TreeBuilder;  
use Text::CSV;
open (FILE, 'punter.htm'); 
#open (my $fh, ">punter.csv") || die "couldn't open the file!";

 my $csv = Text::CSV->new (); 
 
$csv->column_names('field1', 'field2', 'field3'); 
open my $fh, ">", "punter.csv" or die "new.csv $!"; 
while ( my $l = $csv->getline_hr(my $fh)) { 
    next if ($l->{'field1'} =~ /xxx/); 
    printf "Field1: %s Field2: %s Field3: %s\n", $l->{'field1'}, $l->{'field2'}, $1->{'field3'}; 
$csv->print (my $fh, [my $name, my $location, my $comment]);
} 
close my $fh1 or die "$!"; 
my $tree = HTML::TreeBuilder->new_from_content(     do { local $/; <FILE> } ); 

for ( $tree->look_down( 'class' => 'postbody' ) )
{
    my $location = $_->look_down( 'class' => 'posthilit' )->as_trimmed_text;
    my $comment  = $_->look_down( 'class' => 'content' )->as_trimmed_text;
    my $name     = $_->look_down( '_tag'  => 'h3' )->as_text;
    $name =~ s/^Re:\s*//;
    $name =~ s/\s*$location\s*$//;
    print "Name: $name\nLives in: $location\nCommented: $comment\n";
}


An example of the html is -
<div class="postbody"> <h3><a href "foo">Re: John Smith <span class="posthilit">England</span></a></h3> <div class="content">Is C# better than Visula Basic?</div> </div>



How can I get the results into a CSV?

I believe your error is rooted in not really understanding the meaning of my. You are using it all over the place, but when you do that you are creating a new variable in the enclosing block. You should really go back and check every instance of my to see if that is what you really want to do.

Specifically in (but not limited to) the line:
$csv->print (my $fh, [my $name, my $location, my $comment]);


You are:
  • creating a new variable $fh (masking the $fh from your open) where print is expecting you to give it an IO handle
  • creating three new variables $name, $location, $comment where print is expecting to get an arrayref

and note that none of these new variables has a value, so it is no wonder nothing is being printed. The only reason your close is not giving you a warning is that you mistyped $fh as $fh1. Just fix that and you should see the warning "my" variable $fh masks earlier declaration in same scope.
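To make the masking effect concrete, here is a minimal sketch. The describe helper and the sample value are hypothetical, made up purely for illustration; only core Perl is used.

```perl
use strict;
use warnings;

# Hypothetical stand-in for any function that expects a filled-in value.
sub describe {
    my ($value) = @_;
    return defined $value ? "got '$value'" : "got undef";
}

my $name = 'John Smith';

# Passing the existing variable hands over its value.
my $with_existing = describe($name);

# Writing "my" inside the call declares a brand-new, empty $name that
# hides the outer one until the enclosing block ends.
my $with_fresh;
{
    $with_fresh = describe(my $name);
}

print "$with_existing\n";   # got 'John Smith'
print "$with_fresh\n";      # got undef
```

The same thing happens with $csv->print (my $fh, ...): the call receives a fresh, undefined $fh rather than the handle returned by open.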

The CSV section would be better as something like this (untested):
my @names = qw(name location comment);
$csv->column_names(@names);
open my $fh, ">", "punter.csv" or die "punter.csv: $!";
while ( my $l = $csv->getline_hr($fh)) {
    next if ($l->{'name'} =~ /xxx/);
    # show each named field on screen
    for(@names) { print "$_: ", $l->{$_}, "\n" }
    # print expects an arrayref, so take a hash slice in column order
    $csv->print ($fh, [ @{$l}{@names} ]);
}
close $fh or die "$!";


This takes advantage of the column naming feature also, and should cope with any number of fields. By the way, I'd suggest never using $l as a variable name in Perl as it looks too much like $1 in many fonts, which of course has a special regex meaning.
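For the original goal of getting the parsed results into the CSV, one possible arrangement is to open the output file once and call print inside the loop that walks the HTML tree. The following is only a sketch under some assumptions: the HTML is supplied inline instead of being read from punter.htm, a header row is written first, and the @rows array exists only so the result can be inspected afterwards.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use Text::CSV;

# Inline sample standing in for the contents of punter.htm (assumption).
my $html = <<'HTML';
<div class="postbody">
  <h3><a href="foo">Re: John Smith <span class="posthilit">England</span></a></h3>
  <div class="content">Is C# better than Visual Basic?</div>
</div>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# binary => 1 allows non-ASCII text; eol => "\n" ends every printed row.
my $csv = Text::CSV->new({ binary => 1, eol => "\n" })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag;

open my $out, '>', 'punter.csv' or die "Cannot open punter.csv: $!";
$csv->print($out, [qw(name location comment)]);    # header row

my @rows;    # kept only so the result can be checked afterwards
for my $post ($tree->look_down(class => 'postbody')) {
    my $location = $post->look_down(class => 'posthilit')->as_trimmed_text;
    my $comment  = $post->look_down(class => 'content')->as_trimmed_text;
    my $name     = $post->look_down(_tag => 'h3')->as_trimmed_text;
    $name =~ s/^Re:\s*//;                  # drop the "Re:" prefix
    $name =~ s/\s*\Q$location\E\s*$//;     # drop the trailing location

    push @rows, [$name, $location, $comment];
    $csv->print($out, $rows[-1]);          # one CSV row per post
}

close $out or die "Cannot close punter.csv: $!";
$tree->delete;
```

Each variable is declared with my exactly once, at the point where it first receives a value, and the same $out handle is reused for every print.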
 
Comments
LamboLambo 20-Jul-11 11:07am    
Great stuff, works a treat. Thanks for the input about the usage of 'my'; I understand how it can conflict if used too often.
If you just want a plain implementation, use a single print statement to a file. A CSV file is nothing but fields separated by commas.
open (MYFILE, ">>$tempFile") or die "Cannot open $tempFile: $!";
print MYFILE  "field1,field2,field3\n";



If you want to use Text::CSV, then I don't think you should use "my" again in "my $fh" here:
while ( my $l = $csv->getline_hr(my $fh)) {
 
Comments
Uilleam 19-Jul-11 12:53pm    
While I do sometimes use simple prints for CSVs too, it is a bad habit and will break easily. In this case, the second and third fields in particular could contain commas (e.g. location of "Dallas, TX"), and Text::CSV will quote that properly to avoid issues when reading it later.
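A small sketch of the difference, using the "Dallas, TX" location from the comment above. The other two field values are made-up sample data.

```perl
use strict;
use warnings;
use Text::CSV;

# Sample row; only "Dallas, TX" comes from the discussion above.
my @fields = ('John Smith', 'Dallas, TX', 'Is C# better than Visual Basic?');

# Naive approach: join on commas, then split the line back apart.
my $naive_line = join ',', @fields;
my @naive_back = split /,/, $naive_line;    # the location is cut in two

# Text::CSV quotes the field containing a comma, so it survives a round trip.
my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag;
$csv->combine(@fields) or die "combine failed";
my $safe_line = $csv->string;
$csv->parse($safe_line) or die "parse failed";
my @safe_back = $csv->fields;

printf "naive: %d fields, Text::CSV: %d fields\n",
    scalar @naive_back, scalar @safe_back;
```

The naive line reads back as four fields instead of three, while the line built by Text::CSV reads back as the original three.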
harish85 20-Jul-11 19:42pm    
Thanks. Yes, what I meant was that if the OP (after seeing the redeclaration of variables with my everywhere) just requires a plain CSV implementation, they can print to a file directly. But I don't consider it a bad habit not to use Text::CSV; you can customize that writing with your own implementation too.
You have explained it very well in your post. Neat work, my 5!
Thanks,-Harish

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


