Click here to Skip to main content
15,885,754 members
Please Sign up or sign in to vote.
4.00/5 (1 vote)
See more:
Hello. I made a program in C and I'm finish it. My problem is the following though.
I read a xxx.in file that contains 1.000.000 numbers and store them in an dynamic array using scanf. Also I first read the first number in the file, with that number the file contain 1.000.001 nums, with the command "fscanf(fp, "%d", &t_size);" to know, just for case, that we want to create an array with size of 1.000.000. Then I store the values with the below code:

C
int i =0; 
for ( i = 0; i < t_size; i++ ) 
{ 
fscanf(fp, "%d",&my_array[i]); 
} 



My problem isn't the coding is the flexibility of my program. I want the procedure from the beginning through the store of values to be 1s and this can't happen with fscanf.

Can someone tell me if we can store so many nums in an dynamic array in 1 second (using fread or fets) and how can i make it?
Posted
Updated 18-Nov-14 6:14am
v8
Comments
Sergey Alexandrovich Kryukov 18-Nov-14 9:19am    
Use binary data instead.
—SA
Sergey Alexandrovich Kryukov 18-Nov-14 9:47am    
Don't use text format, it is inherently much slower than binary. Do you have an option to change the format of the file?
—SA
Sergey Alexandrovich Kryukov 18-Nov-14 9:59am    
You did not answer my question...
—SA
Sergey Alexandrovich Kryukov 18-Nov-14 12:01pm    
Oh, you don't know if you have to opportunity to suggest a different file format? Then, if someone gave you this problem to solve, ask this person about it. This is because you are facing the performance problem of reading a million of items, but text format seriously slows it down...
—SA
Sergey Alexandrovich Kryukov 18-Nov-14 15:47pm    
It'/s not about how to read it (isn't it obvious?). This is about the format of the file...
—SA

Try reading the entire file in one go, that should be fairly quick for a file of the size you're referring to.

Then, when you have everything in memory, it should be possible to do 1,000,000 rows in one second. Below is a naive implementation that first reads everything into a single char*, then splits that by line into a array of char* and then lastly converts each line to an int by calling atoi.
(Note that this code isn't tidied up, and probably needs to ).

This program takes the file to read as a command line parameter;

C
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <assert.h>
#include <time.h>

int is_end(char* input) {
	return *input == 0;
}

int is_linebreak(char* input) {
	return *input == '\r' || *input == '\n' || *input == ' ';
}

char* eat_linebreaks(char* input) {
	while (is_linebreak(input))
		++input;

	return input;
}

size_t count_lines(char* input) {
	char* p = input;
	size_t rows = 1;

	if (is_end(p))
		return 0;
	
	while (!is_end(p)) {
		if (is_linebreak(p)) {
			++rows;
			p = eat_linebreaks(p);
		}
		else {
			++p;
		}
	}
	return rows;
}

/* split string by lines */
char** get_lines(char* input, size_t line_count) {
	char* p = input;
	char* from = input;
	size_t length = 0;
	size_t line = 0;
        int i;
	char** lines = (char**)malloc(line_count * sizeof(char*));

	do {
		if (is_end(p) || is_linebreak(p)) {
			lines[line] = (char*)malloc(length + 1);
			for (i = 0; i < length; ++i)
				lines[line][i] = *(from + i);

			lines[line][length] = 0;
			length = 0;
			++line;
			p = eat_linebreaks(p);
			from = p;
			
		}
		else {
			++length;
			++p;
		}
	} while (!is_end(p));

	// Copy the last line as well in case the input doesn't end in line-break
	lines[line] = (char*)malloc(length + 1);
	for (i = 0; i < length; ++i)
		lines[line][i] = *(from + i);

	lines[line][length] = 0;
	++line;


	return lines;
}

int main(int argc, char* argv[]) {
	clock_t start;
	unsigned long microseconds;
	float seconds;
	char** lines;
	size_t size;
	size_t number_of_rows;
	int count;
	int* my_array;
	start = clock();

	FILE *stream;
	char *contents;
	int fileSize = 0;
        int i;

	// Open file, find the size of it
	stream = fopen(argv[1], "rb");
	fseek(stream, 0L, SEEK_END);
	fileSize = ftell(stream);
	fseek(stream, 0L, SEEK_SET);

	// Allocate space for the entire file content
	contents = (char*)malloc(fileSize + 1);

	// Stream file into memory
	size = fread(contents, 1, fileSize, stream);
	contents[size] = 0; 
	fclose(stream);

	// Count rows in content
	number_of_rows = count_lines(contents);

	// Get array of char*, one for each line
	lines = get_lines(contents, number_of_rows);
	
	// Get the numbers out of the lines
	count = atoi(lines[0]); // First row has count
	my_array = (int*)malloc(count * sizeof(int));
	for (i = 0; i < count; ++i) {
		my_array[i] = atoi(lines[i + 1]);
	}

	microseconds = clock() - start;
	seconds = microseconds / 1000000.0f;
	printf("Took %fs", seconds);


	return 0;
}




Hope this helps,
Fredrik
 
Share this answer
 
v3
Comments
Fredrik Bornander 19-Nov-14 7:17am    
You pass the filename in as a command line argument, if you don't want to do that just change
fopen(argv[1], "rb")
to
fopen("in.txt", "rb").

It's true that you don't need to read the entire file as you only need to read the number of lines specified in the first line, but unless the file has a lot more lines after the last number, that's not going to be an issue. Reading the file is fast.
Fredrik Bornander 19-Nov-14 7:34am    
Did you give the full path to your file?
Has the file got lines that aren't numbers in it?
[no name] 19-Nov-14 7:47am    
the file is in the same directory as the program so i just add the filename. The program starts but after a while it crass. The file contains only two lines. First line is only one number the size of array and in the second one is the numbers to put. How can I edit the code to read the first number of the first line generate the array and put the elements of the second one in the array using your code? And how can I put my procedure (see above) in the code?
Fredrik Bornander 19-Nov-14 8:06am    
What does your file look like?
How does it crash?
Are you debugging this?
Fredrik Bornander 19-Nov-14 9:31am    
Oh, ok, I thought the file was line based, it's space separated as well. I'll update the code.
The title of your question is probably misleading. You are not looking for a more flexible program, but for more speed.

fscanf is a relatively complex function and hence relatively slow. It is worth a try to use fgets and then strtol instead. But even this might take longer than 1 second per a million lines.

What would make your input processing much faster is to use binary format instead of text format (See the comment of Sergey). In a binary file you could store your numbers bit by bit in the same way it resides in your dynamic array. And so you could input this array in a single call to fread. That would be orders of magnitude faster.

However: A binary file cannot be read and modified by a text editor. So, if your client or teacher request that the input be a text file, that technique won't work.
 
Share this answer
 
Comments
Sergey Alexandrovich Kryukov 18-Nov-14 15:48pm    
Well, we agree; a 5.
—SA
nv3 18-Nov-14 17:01pm    
Thank you, Sergey.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900