Extract Values from a file and perform calculations

Question

Hi guys,

I'm working on a project that involves extracting values from a file in .npd format (reading that format is not my problem) and then performing some calculations on them.
The file contains several columns of data without any headings. I have to give each column a name and do some calculations.

The raw file looks like this:

1 $Abc 21:00:55 20 33 56 1 $Abc 21:00:56 22 34 56 
2 $Abc 22:00:34 43 30 45 2 $Abc 22:00:35 44 36 45
3 $Abc 22:23:23 19 88 67 3 $Abc 22:23:24 12 82 63
4 $Abc 23:40:29 20 20 20 4 $Abc 23:40:30 26 26 24
5 $Abc 23:55:00 34 21 28 5 $Abc 23:55:01 37 26 29

1) I have to perform calculations on the 2nd, 3rd and 4th columns and print the results to a file with headings (which I managed to do just now). For example, add 1 to each value and give the columns headings.
2) I have to delete the values starting from the 6th column! I cannot use the tokenize method with "$" as the delimiter, because there are 2 columns starting with the symbol "$". And I want the values starting
The last 6 columns must be completely removed.

Heading1 Heading 2 Heading 3
21       34        57         
44       31        46
20       89        67 
21       21        21
35       22        29

3) Then find the mean of each column.

Mean of column 1:
Mean of column 2: 
Mean of column 3:

Also, one of the columns contains 'N', and the column to be calculated is to the left of that column.
I know how to open the file, but sorting out the data is a bit confusing. Please help; examples would be very helpful!
[Update]

I used this code to get the value from the 4th column, but I cannot do any calculations on the value. For example:

C++

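#include <atlstr.h>
#include <stdio.h>
#include <tchar.h>

int _tmain()
{
    CAtlString str(_T("%21:00:55 $GPGGA 210055 6102.00399 six"));
    CAtlString resToken;
    int curPos = 24;   // index where the 4th field starts
    double val;        // meant to hold the converted value
    resToken = str.Tokenize(_T(" "), curPos);
    _tprintf_s(_T("Resulting token: %s\n"), resToken.GetString());
}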

This prints the 4th column value, but I have to do some calculations on it, for which I have to convert the string to a double and then back to a string to print the result. Is this feasible?

[/Update]

Posted 22-May-12 2:29am

Sumal.V

Updated 24-May-12 22:31pm

v12

Comments

Maximilien 22-May-12 8:57am

What is an npd file?
Which software does it come from?
Is it ASCII or binary?
Do you need to display the data to the user? (I assume so, if you have to add headers.)
What platform/SDK are you using? Windows/MFC/STL?

Sumal.V 22-May-12 9:06am

NPD: Novell Printer Definition (NPD) files appear in the Printer Type list when you are creating a Printer Agent using the Novell Gateway's Print Device Subsystem (PDS).(Googled it : http://www.novell.com/fr-fr/documentation/nw51/docui/#../ndps_enu/data/hxiyy2w0.html)

Yes I'm working on windows and use visual C++. Yes I have to give a name to each column and display to the user,in a text format.

scottgp 23-May-12 11:28am

In a case like this, would it make more sense to use a multi-purpose tool like R (http://www.r-project.org/), or some reporting tool, rather than writing a single use program?

Maciej Los 23-May-12 13:09pm

Some text was moved from comment...

2 solutions

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Aescleal · Answer 1 · 2012-05-23T04:40:00

Solution 2

First off forget all this ATL/MFC rubbish, you're making stuff far more complicated than it needs to be. Just back away, hands where I can see them - reach for a CString and you're history.

The algorithm for this is pretty simple isn't it?

- Open an output file
- Output the column headings

- Open the input file
- Until you hit the end of file:
    - Extract a string (the date) and discard it
    - Read three ints
    - add them to three running totals
    - output them to the output file
    - update a line counter

- Divide each running total by the number of lines
- Output the means to the results file

So how would you do that in C++?

Well, the input and output files are just std::ifstream and std::ofstream objects. Outputting data is just operator<< on the output file. Reading the data is just operator>> on the input file. The rest is just arithmetic.
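
The steps above can be sketched roughly as follows. This is only a minimal sketch, not Aescleal's actual code: the function name `copyColumnsAndAverage` is made up for illustration, and it assumes every line matches the sample in the question (a row number, a `$Abc` tag, a time, then three integers), with anything after the sixth field simply ignored.

```cpp
#include <array>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: copies the three wanted columns from `in` to `out`
// and returns their means. Assumed line layout (taken from the question's
// sample): row number, "$Abc" tag, time, three integers, then unwanted data.
std::array<double, 3> copyColumnsAndAverage(std::istream& in, std::ostream& out)
{
    out << "Heading1 Heading2 Heading3\n";

    std::array<long, 3> totals{};   // running totals, zero-initialised
    long lines = 0;                 // number of lines parsed successfully

    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string index, tag, time;   // read, then deliberately unused
        int a, b, c;
        if (!(fields >> index >> tag >> time >> a >> b >> c))
            continue;                   // skip blank or malformed lines

        totals[0] += a;
        totals[1] += b;
        totals[2] += c;
        out << a << ' ' << b << ' ' << c << '\n';
        ++lines;
        // Everything left in `fields` (the repeated columns) is never read.
    }

    std::array<double, 3> means{};      // stays 0.0 if no line was parsed
    if (lines > 0)
        for (int i = 0; i < 3; ++i)
            means[i] = static_cast<double>(totals[i]) / lines;

    for (int i = 0; i < 3; ++i)
        out << "Mean of column " << (i + 1) << ": " << means[i] << '\n';
    return means;
}
```

A `std::ifstream` opened on the data file and a `std::ofstream` for the results can be passed in directly, since the function only relies on the stream base classes. The "add 1 to each value" step from the question would go just before the totals are updated.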

Posted 23-May-12 4:40am

Aescleal

Comments

Sumal.V 23-May-12 11:49am

And for the output, is it better to use a .csv file? I have to save the values in columns and it's really confusing!

Aescleal 23-May-12 13:54pm

That's just a detail change for the "output them to the output file".

Sumal.V 25-May-12 4:33am

Yes, I managed to open the source file, read the values and print them to the destination file with headings. But I have to discard the last set of values, and performing the calculations on top of that is a little confusing.

Aescleal 25-May-12 5:18am

If you mean you have to discard the last line of data in the file, then you can delay writing the values to the file and updating the totals and line counter by one line.

Ah, hang on, I see what the problem is from the data file format. To get rid of the data in the columns you don't want, read them into variables but don't do anything with them.

Sumal.V 25-May-12 5:30am

Exactly. To skip that data I have to set some kind of condition, such as discarding a column based on its format, but each row has 2 values of the same format.

And I really don't understand what you mean by delaying ..

CPallini · Answer 2 · 2012-05-22T03:47:00

Solution 1

For each column, use a std::vector<int> to store read values and then perform all calculations you like on such vectors' data.
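
As a rough illustration of that idea (the helper name `columnMean` is made up, and returning 0.0 for an empty column is an arbitrary choice), the mean of one stored column could be computed like this:

```cpp
#include <numeric>
#include <vector>

// Hypothetical helper: arithmetic mean of one column of values.
// Returns 0.0 for an empty column to avoid dividing by zero.
double columnMean(const std::vector<int>& column)
{
    if (column.empty())
        return 0.0;
    // Accumulate into a long so the sum does not overflow as easily.
    const long total = std::accumulate(column.begin(), column.end(), 0L);
    return static_cast<double>(total) / column.size();
}
```

With one vector per column, each value is appended with push_back while the file is read, and `columnMean` is called once per vector at the end.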

[update]

Member 8446342 wrote:
1 $Abc 21:00:55 20 33 56 1 $Abc 21:00:56 22 34 56
2 $Abc 22:00:34 43 30 45 2 $Abc 22:00:35 44 36 45
3 $Abc 22:23:23 19 88 67 3 $Abc 22:23:24 12 82 63
4 $Abc 23:40:29 20 20 20 4 $Abc 23:40:30 26 26 24
5 $Abc 23:55:00 34 21 28 5 $Abc 23:55:01 37 26 29

Since that is a pretty fixed format, you may quickly extract the needed values using sscanf, e.g.:

C++

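int col[3];
char tmp[MAX_PATH];  // scratch buffer for the three fields that are skipped
CStringA s = "1 $Abc 21:00:55 20 33 56 1 $Abc 21:00:56 22 34 56";
// %259s caps each skipped field at MAX_PATH - 1 characters; sscanf returns
// the number of fields it converted, which must be 6 here
if ( sscanf(s, "%259s %259s %259s %d %d %d", tmp, tmp, tmp, &col[0], &col[1], &col[2]) != 6 )
{
  // handle error
}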

[/update]
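
For readers unfamiliar with sscanf, here is a small variation on the same idea, offered as a sketch rather than as the code from the answer. The function name `parseRow` is hypothetical. It uses the standard `%*s` form, which reads a whitespace-delimited field and throws it away, so no scratch buffer is needed and only the three integers count towards the return value.

```cpp
#include <cstdio>

// Hypothetical helper: extracts the 4th, 5th and 6th fields of one row.
// "%*s" skips one whitespace-delimited field without storing it;
// "%d" converts one integer. sscanf returns how many values it stored,
// so anything other than 3 means the row did not match the layout.
bool parseRow(const char* row, int (&col)[3])
{
    return std::sscanf(row, "%*s %*s %*s %d %d %d",
                       &col[0], &col[1], &col[2]) == 3;
}
```

Scanning stops after the sixth field, so the repeated columns at the end of each row are simply never looked at.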

Posted 22-May-12 3:47am

CPallini

Updated 24-May-12 23:06pm

v3

Comments

Sumal.V 22-May-12 9:54am

But how can I separate the columns? There are no commas, or any other delimiters between each column..

Maximilien 22-May-12 10:17am

"space" is a delimiter.

Sumal.V 22-May-12 10:56am

In some of my code, in order to ignore comment lines, I can write:
if (line.Find(_T("%")) == 0)
    return;
// where line is a CString and "%" marks comment lines in .csv files
But now I have to find several spaces, i.e. to get to the 4th column I have to get past 3 spaces in a row, so how can I do that?

CPallini 23-May-12 4:49am

You may use the CString::Tokenize method to extract the values from the input line. Have a look at the documentation:
http://msdn.microsoft.com/en-us/library/k4ftfkd2.aspx

Sumal.V 23-May-12 6:33am

Thanks. With the Tokenize function I can use a space as the delimiter, but there is a space after every column in the file and I have to extract one particular column, say the 5th. How can I do that?

CPallini 23-May-12 6:36am

Uhm, counting?
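
To spell out the counting approach, here is a minimal sketch using standard streams instead of CString::Tokenize. The function name `nthField` is invented for this example, and fields are assumed to be separated by whitespace only.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: returns the n-th whitespace-separated field of
// `line`, counting from 1, or an empty string if there are fewer fields.
std::string nthField(const std::string& line, std::size_t n)
{
    std::istringstream fields(line);
    std::string field;
    for (std::size_t i = 0; i < n; ++i)
        if (!(fields >> field))     // ran out of fields before reaching n
            return std::string();
    return field;
}
```

The same loop works with Tokenize: call it n times with a space as the delimiter and keep the last token.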

Sumal.V 23-May-12 6:40am

:)

CPallini 23-May-12 6:56am

You may also use a regex class: see, for instance
http://stackoverflow.com/questions/181624/c-what-regex-library-should-i-use
though, I think, it would be overkill.

Sumal.V 23-May-12 9:46am

I used this code to get the value from the 4th column, but I cannot do any calculations on the value. For example:
int _tmain()
{
    CAtlString str(_T("%21:00:55 $GPGGA 210055 6102.00399 six"));
    CAtlString resToken;
    int curPos = 24;   // index where the 4th field starts
    double val;        // meant to hold the converted value

    resToken = str.Tokenize(_T(" "), curPos);
    _tprintf_s(_T("Resulting token: %s\n"), resToken.GetString());
}

This prints the 4th column value, but I have to do some calculations on it, for which I have to convert the string to a double and then back to a string to print the result. Is this feasible?

CPallini 23-May-12 17:10pm

Yes, it is feasible, using strtod or sscanf (check out the documentation).
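
A rough sketch of the strtod route, for illustration only: the function name `addToToken` is made up, and formatting the result with five decimal places is an assumption chosen to match the sample token 6102.00399.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper: converts `token` to a double, adds `delta`, and
// formats the sum back into text. Returns false if `token` is not
// entirely a number (for example "six").
bool addToToken(const std::string& token, double delta, std::string& result)
{
    char* end = nullptr;
    const double value = std::strtod(token.c_str(), &end);
    if (end == token.c_str() || *end != '\0')
        return false;               // nothing parsed, or trailing junk

    char buffer[64];
    // snprintf never writes past the buffer; five decimals is an assumption.
    std::snprintf(buffer, sizeof buffer, "%.5f", value + delta);
    result = buffer;
    return true;
}
```

strtod reports where parsing stopped through its second argument, which is what makes it possible to reject tokens that are only partly numeric.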

Sumal.V 25-May-12 4:30am

I just managed to print the headings to the file. I cannot use Tokenize because of the format of the text in the file, as updated in my question. I can't point to a particular column in a row and delete all the values after that point.

Sumal.V 25-May-12 6:39am

Thanks I will try that!. Is that your phone number printed by mistake ?

CPallini 25-May-12 6:49am

>"Is that your phone number printed by mistake ?"
Uh?!

Sumal.V 25-May-12 6:55am

;) Nope in the solution, there is a phone number printed..
Is that yours ?

CPallini 25-May-12 7:30am

I see NO phone number.

Sumal.V 25-May-12 7:42am

OOps! In my screen there is a number! something is wrong with my screen then!

CPallini 25-May-12 7:46am

"There's someone in my head but it's not me."
:-D

Sumal.V 25-May-12 7:48am

Haha that's cool... will ignore :P

Sumal.V 28-May-12 4:37am

Hi sorry, I really did not understand that code above, could you please explain?

CPallini 28-May-12 13:20pm

Reading sscanf documentation would do the trick.
