It's sheer coincidence that the out-of-memory exception is thrown where you marked it in your code. You essentially read the file one line at a time and append each line to a StringBuilder instance. That means at some point so much memory is consumed that you'll run into that kind of exception either while reading a new line or while trying to append it to the StringBuilder.
The first question would be: do you really need all those lines in memory at once? The program does nothing useful in that loop; it just pours everything into that buffer.
The second question is: how much memory does your system have, and is it a 32-bit or a 64-bit OS?
Regards,
— Manfred
"I had the right to remain silent, but I didn't have the ability!"
Ron White, Comedian
Hi,
It's OK even if I read chunks of data, but I need to read entire lines. My text file has data that can span onto the next line to form one complete sentence.
For example:
12/12/2013 John 03/12/1978
New York USA 1803-345-233
The above data can also be on a single line:
12/12/2013 John 03/12/1978 New York USA 1803-345-233
So I want to read one complete line even if the file is read in chunks.
RAM: 2 GB
OS type: 64-bit
BR,
Arjun
Arjun Mourya wrote: So I want to read one complete line even if the file is read in chunks.
That still doesn't mean you need the entire block in memory. Put what you read into a buffer until you reach the end of the sentence. Process the sentence, clear the buffer, and continue reading.
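The buffer-and-clear approach described above can be sketched roughly like this. Note that `SentenceReader` and the end-of-record predicate are hypothetical names for illustration; you would substitute whatever actually marks the end of one complete record in your data (in the sample data from the thread, a line ending in the phone number):

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SentenceReader
{
    // Assembles complete "sentences" from lines that may wrap onto the
    // next line, without ever holding the whole file in memory. The
    // isEndOfSentence predicate is an assumption: plug in whatever rule
    // identifies the last line of a record in your file.
    public static IEnumerable<string> ReadSentences(
        IEnumerable<string> lines, Func<string, bool> isEndOfSentence)
    {
        var buffer = new StringBuilder();
        foreach (string line in lines)
        {
            if (buffer.Length > 0) buffer.Append(' ');
            buffer.Append(line);
            if (isEndOfSentence(line))
            {
                yield return buffer.ToString();
                buffer.Clear(); // reuse the buffer; memory stays bounded
            }
        }
        if (buffer.Length > 0)
            yield return buffer.ToString(); // trailing partial record, if any
    }
}
```

You would feed it `File.ReadLines(path)` so only one buffered record is in memory at a time, e.g. `SentenceReader.ReadSentences(File.ReadLines(path), l => EndsWithPhoneNumber(l))` with your own end-of-record test.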
Bastard Programmer from Hell
If you can't read my code, try converting it here[^]
 Hi,
I was able to read the file and insert each line into the database. To show progress to the user, I used a BackgroundWorker with a ProgressBar. Below is my code:
const string dataFile = @"F:\Bharath CS\Document1.txt";

public Form1()
{
    InitializeComponent();
    InitializeBackgroundWorker();
}

private void InitializeBackgroundWorker()
{
    backgroundWorker1.DoWork +=
        new DoWorkEventHandler(backgroundWorker1_DoWork);
    backgroundWorker1.RunWorkerCompleted +=
        new RunWorkerCompletedEventHandler(
            backgroundWorker1_RunWorkerCompleted);
    backgroundWorker1.ProgressChanged +=
        new ProgressChangedEventHandler(
            backgroundWorker1_ProgressChanged_1);
}

private void button1_Click(object sender, EventArgs e)
{
    backgroundWorker1.RunWorkerAsync();
}

private void backgroundWorker1_DoWork(object sender, DoWorkEventArgs e)
{
    int count = 0;
    string prev = "";
    foreach (string line in File.ReadLines(dataFile))
    {
        if (backgroundWorker1.CancellationPending)
        {
            break;
        }
        backgroundWorker1.ReportProgress(count);
        try
        {
            MySqlConnection conn1 = new MySqlConnection("server=demo;port=1234;database=demodb;userid=xyz;pwd=xyz");
            conn1.Open();
            MySqlCommand cmd1 = new MySqlCommand();
            cmd1.Connection = conn1;
            string s = line.Replace("\"", "");
            if (s.Length > 0 && !(s.Contains("-")))
            {
                if (s.Contains("ED5."))
                {
                    cmd1.CommandText = "insert into yashomati_demo values('" + s + "')";
                    cmd1.ExecuteNonQuery();
                    s = "";
                    count++;
                }
                cmd1.Dispose();
                conn1.Close();
                conn1.Dispose();
            }
        }
        catch (Exception ex) { throw ex; }
    }
}

private void backgroundWorker1_RunWorkerCompleted(object sender, RunWorkerCompletedEventArgs e)
{
    if (e.Cancelled)
    {
        MessageBox.Show("You've cancelled the backgroundworker!");
    }
    else
    {
        progressBar1.Value = 100;
        MessageBox.Show("Done");
    }
}

private void backgroundWorker1_ProgressChanged_1(object sender, ProgressChangedEventArgs e)
{
    progressBar1.Value = e.ProgressPercentage;
}

private void button2_Click(object sender, EventArgs e)
{
    backgroundWorker1.CancelAsync();
}
}
The problem I'm actually facing is that every line gets inserted twice.
For example, if the file has three lines, all three lines get inserted, and then the same three lines are inserted again (that is, the file is read again and all lines are inserted a second time).
BR,
Arjun
Firstly, there are two things to notice here:
1) The maximum size of any object in .NET is 2GB - so any contiguous block of memory that the system tries to allocate (regardless of whether you are operating in a 32 or 64 bit environment) must be less than 2GB.
2) StringBuilder works by allocating a chunk of memory and copying your new data into it every time you Append anything. If the space is too small, then the memory buffer is doubled, and the existing content copied in before the new data is added.
So, if you are trying to read 1.5GB, it will try to allocate memory well and truly in excess of that. Not only will this be pretty slow, but it will need a full 2GB chunk to hold your data, and a lot of Very Large objects will be created along the way.
Secondly, how long do you think it is going to take to display 1.5GB of rich text in a RichTextBox? And what earthly use would it be to the user to do that? Do you want to sit there and scroll through that much text looking for the bit you want?
I would very strongly suggest that you reconsider this whole approach, and look at creating something that the user can actually use without wanting to beat you over the head with his keyboard...
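The reallocate-and-copy behaviour described above can be observed directly by watching `StringBuilder.Capacity` while appending. A small sketch (the exact growth policy and internal representation vary between .NET versions, so treat the printed numbers as illustrative, not guaranteed):

```csharp
using System;
using System.Text;

class CapacityDemo
{
    static void Main()
    {
        var sb = new StringBuilder(); // default capacity is 16 characters
        int lastCapacity = sb.Capacity;
        Console.WriteLine($"start: capacity {lastCapacity}");

        for (int i = 0; i < 1_000_000; i++)
        {
            sb.Append('x');
            if (sb.Capacity != lastCapacity)
            {
                // Each time the content outgrows the reserved space, the
                // builder allocates a larger buffer; the reported capacity
                // grows roughly geometrically as the string gets longer.
                Console.WriteLine($"length {sb.Length,9}: capacity grew to {sb.Capacity}");
                lastCapacity = sb.Capacity;
            }
        }
    }
}
```

With gigabyte-sized content, each of those growth steps means allocating and filling another very large buffer, which is why appending a huge file line by line gets both slow and memory-hungry.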
Hi,
I don't mind pushing the whole data into any database (MySQL/MSSQL).
I had written this code for demo purposes only (just to check how long it takes to read a large text file).
Please point me to a link or provide some sample code.
BR,
Arjun
Even then, you start doing silly things: a database is worse in many ways, because there is no easy way in SQL to return a "chunk" of a column.
What are you trying to do with the data in the real world?
We try to read data from the file, insert it into a database, and generate reports from the data: which number was dialled from which extension.
But I am failing at the reading itself.
BR,
Arjun
So start by looking at the data - how is it organised?
Hopefully (since it has a .TXT extension) it is line based; if so, it should be pretty easy to handle.
Have you tried:
string[] lines = File.ReadAllLines(datafile);
If that works (and it should, even on a 64-bit system!) it gives you a chance to process each line and transfer it to a separate row in SQL, which would be a lot easier to work with!
I have a similar problem to solve.
My requirement is to read one large file and split it into two files depending on the content.
The file format is a flat file with one record per line.
Depending on the record, it goes into either the first or the second file.
What I currently do is read the file one line at a time, check it, and write it to one of the two new files.
With this approach, it takes around 3 hours for a file size of 900 MB.
I would like to improve the logic for faster processing.
Can anyone suggest a better approach?
Well, this did it in 59.413 seconds with a 900MB text file, but it also removed all the duplicate lines (of which there were a lot; I don't keep huge text files lying around!)
string origPath = @"D:\Temp\MyHugeText.txt";
string inPath = @"D:\Temp\MyHugeTextIn.txt";
string notInPath = @"D:\Temp\MyHugeTextOut.txt";
var lines = File.ReadLines(origPath);
var isIn = lines.Where(l => l.Contains("raise"));
var notIn = lines.Except(isIn);
File.WriteAllLines(inPath, isIn);
File.WriteAllLines(notInPath, notIn);
Even with the select test reversed so that the duplicates are still written to disk, we are talking about 96.693 seconds, showing it's a bit disk limited!
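One caveat with the `Where`/`Except` snippet above: it enumerates the `File.ReadLines` sequence twice (once per output file), and `Except` also removes duplicates. A single-pass variant, which reads each line from disk exactly once and keeps duplicates, could look roughly like this (paths and the `"raise"` filter are carried over from the snippet above; substitute your own):

```csharp
using System.IO;

public static class Splitter
{
    // Single-pass split of a flat file into two files: each input line is
    // read once and written to exactly one of the two outputs, so the
    // source sequence is never enumerated twice and duplicates survive.
    public static void Split(string origPath, string inPath, string notInPath)
    {
        using (var isIn = new StreamWriter(inPath))
        using (var notIn = new StreamWriter(notInPath))
        {
            foreach (string line in File.ReadLines(origPath))
            {
                if (line.Contains("raise"))
                    isIn.WriteLine(line);
                else
                    notIn.WriteLine(line);
            }
        }
    }
}
```

Because only one line is in memory at a time, this stays flat in memory use regardless of file size, and it sidesteps the double-enumeration issue entirely.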
Never underestimate the power of stupid things in large numbers
--- Serious Sam
Hi, this is executing fast, but I will check how long it takes if I try processing each and every line.
BR,
Arjun
Thanks OriginalGriff. I will try this out.
BTW, isn't this method using LINQ?
In my deployment scenario there is no .NET 4.0 framework installed, so I may have to target 3.0 or an earlier framework.
I can modify this logic to not use LINQ, right?
It uses a LINQ method, yes, but that was introduced in version 3.5, over six years ago!
Yes, you can do it yourself, but it probably won't be as quick (or as easy).
Never underestimate the power of stupid things in large numbers
--- Serious Sam
Yes. But the client is still using VB6 apps and trying to interface them with new .NET services.
I'll try both ways and see how the performance is.
Hi,
Thanks for the post. It's reading really fast, but I am getting an error: "Cannot read from a closed TextReader".
string origPath = @"D:\Temp\MyHugeText.txt";
string inPath = @"D:\Temp\MyHugeTextIn.txt";
string notInPath = @"D:\Temp\MyHugeTextOut.txt";
var lines = File.ReadLines(origPath);
var isIn = lines.Where(l => l.Contains("raise"));
var notIn = lines.Except(isIn);
File.WriteAllLines(inPath, isIn);
File.WriteAllLines(notInPath, notIn);
BR,
Arjun
Strange. I just ran it again, and I don't. Have you got any other code in there?
Never underestimate the power of stupid things in large numbers
--- Serious Sam
I have changed only the "Contains" part and the origPath, inPath, and notInPath values. Please see below:
string origPath = @"F:\Bharath CS\tickets.txt";
string inPath = @"F:\Bharath CS\ticketsIn.txt";
string notInPath = @"F:\Bharath CS\ticketsOut.txt";
var lines = File.ReadLines(origPath);
var isIn = lines.Where(l => l.Contains("ED5"));
var notIn = lines.Except(isIn);
File.WriteAllLines(inPath, isIn);
File.WriteAllLines(notInPath, notIn);
If I try the above code it does not work. So I tried something else and it started to work; I found this on Google:
public static IEnumerable<string> MyReadLines(string path)
{
    using (var stream = new StreamReader(path))
    {
        string line;
        while ((line = stream.ReadLine()) != null)
        {
            yield return line;
        }
    }
}
I used the above method instead of File.ReadLines and it worked.
I tried File.ReadLines(path) and I was able to read the file and insert into the database.
To show the progress of reading and inserting into the database, I used a BackgroundWorker. Below is my code:
const string dataFile = @"F:\Bharath CS\Document1.txt";

public Form1()
{
    InitializeComponent();
    InitializeBackgroundWorker();
}

private void InitializeBackgroundWorker()
{
    backgroundWorker1.DoWork +=
        new DoWorkEventHandler(backgroundWorker1_DoWork);
    backgroundWorker1.RunWorkerCompleted +=
        new RunWorkerCompletedEventHandler(
            backgroundWorker1_RunWorkerCompleted);
    backgroundWorker1.ProgressChanged +=
        new ProgressChangedEventHandler(
            backgroundWorker1_ProgressChanged_1);
}

private void button1_Click(object sender, EventArgs e)
{
    backgroundWorker1.RunWorkerAsync();
}

private void backgroundWorker1_DoWork(object sender, DoWorkEventArgs e)
{
    int count = 0;
    string prev = "";
    foreach (string line in File.ReadLines(dataFile))
    {
        if (backgroundWorker1.CancellationPending)
        {
            break;
        }
        backgroundWorker1.ReportProgress(count);
        try
        {
            MySqlConnection conn1 = new MySqlConnection("server=demo;port=3306;database=demodb;userid=xyz;pwd=xyz");
            conn1.Open();
            MySqlCommand cmd1 = new MySqlCommand();
            cmd1.Connection = conn1;
            string s = line.Replace("\"", "");
            if (s.Length > 0 && !(s.Contains("-")))
            {
                if (s.Contains("ED5."))
                {
                    cmd1.CommandText = "insert into yashomati_demo values('" + s + "')";
                    cmd1.ExecuteNonQuery();
                    s = "";
                    count++;
                }
                cmd1.Dispose();
                conn1.Close();
                conn1.Dispose();
            }
        }
        catch (Exception ex) { throw ex; }
    }
}

private void backgroundWorker1_RunWorkerCompleted(object sender, RunWorkerCompletedEventArgs e)
{
    if (e.Cancelled)
    {
        MessageBox.Show("You've cancelled the backgroundworker!");
    }
    else
    {
        progressBar1.Value = 100;
        MessageBox.Show("Done");
    }
}

private void backgroundWorker1_ProgressChanged_1(object sender, ProgressChangedEventArgs e)
{
    progressBar1.Value = e.ProgressPercentage;
}

private void button2_Click(object sender, EventArgs e)
{
    backgroundWorker1.CancelAsync();
}
}
But the lines are getting inserted twice. For example, if there are 3 lines, all three lines get inserted, and then the same three lines get inserted again. I mean to say that after the file is completely read and inserted, the whole process of reading and inserting runs once more.
I am not able to find where exactly I am going wrong.
BR,
Arjun
Two things spring to mind:
Either
1) Your text file contains repeated data
Or
2) You are running the background worker twice.
The second is easy to check, just add a couple of lines to your button1 click event:
if (backgroundWorker1.IsBusy)
{
MessageBox.Show("Already running");
return;
}
I'd start with the first one: create a dummy file that contains just a dozen lines, and run it into an empty DB. Check the lines it should have against the actual DB table content. If it doesn't duplicate, then you need to look at your actual data and check it for duplicates. (Or modify your code to check for existing values before you insert a new row.)
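As a sketch of that last suggestion, the insert in the worker loop could be made both parameterised and duplicate-safe. This assumes the target column in `yashomati_demo` has a UNIQUE index, so that MySQL's `INSERT IGNORE` silently skips rows that already exist; the connection and table name are taken from the code earlier in the thread, and this is one possible approach, not the only one:

```csharp
using MySql.Data.MySqlClient;

public static class Inserter
{
    // Duplicate-safe, parameterised insert. INSERT IGNORE requires a
    // UNIQUE index on the column (an assumption here); the parameter
    // also removes the need for the line.Replace("\"", "") workaround
    // and protects against broken SQL from quotes in the data.
    public static int InsertLine(MySqlConnection conn, string line)
    {
        using (var cmd = new MySqlCommand(
            "INSERT IGNORE INTO yashomati_demo VALUES (@line)", conn))
        {
            cmd.Parameters.AddWithValue("@line", line);
            return cmd.ExecuteNonQuery(); // 1 if inserted, 0 if already present
        }
    }
}
```

With this in place, even if the file (or the worker) does run twice, the table ends up with each line only once; the return value also lets you count genuinely new rows for the progress report.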
Never underestimate the power of stupid things in large numbers
--- Serious Sam
OriginalGriff wrote: You are running the background worker twice
Isn't that impossible?
From the documentation[^]:
If the background operation is already running, calling RunWorkerAsync again will raise an InvalidOperationException.
It should be impossible, yes. But you clearly trust the documentation more than I do!
(I haven't tried it; it's just the only other way of getting into that code that I can think of.)
Never underestimate the power of stupid things in large numbers
--- Serious Sam
Hi,
I want to ask about your experience, please:
If you had the option to choose between XtraReport and RDLC for a business application, which one would you decide to use?
Thanks,
Jassim
Technology News @ www.JassimRahma.com
That is easy. I would choose the one that satisfies the needs of the application.
HTML.
Then again, it would also depend on the business, the expectations of the customers, technical limitations, and budget. If you had to create a report by tomorrow morning, which of the two would you choose?
Start with that one. You can always "add in" the second choice later.
Bastard Programmer from Hell
If you can't read my code, try converting it here[^]