|
Luc Pattyn wrote: Having two threads doing nothing but writing a file does not make sense to me
My thoughts exactly.
The actual I/O to the disk is the primary bottleneck in the process described, so I doubt that even having each thread write its own file would help.
modified on Thursday, February 10, 2011 12:57 PM
|
|
|
|
|
Ok I just put this to the last one to make it easier... all of you have good points so I'll put this into one message.
First, this is for an application that will be used by about 3 people, so do I have to have it threaded? No. I've done a few apps in VB, but they all just had a basic single worker reporting progress back to the GUI in the form of a progress bar, nothing involved. So yes, I'm using this one as a learning experience and to help speed it up some (it works fine now but takes a long time, over 2 min to run). Now I know 2+ min isn't a long time for some things, and I already have it down to 1:45 with just the 2 workers I have going now. However, the longest part of the app is the part I'm trying to do now, which is around 1:10 (if I remember right; it's been a day or so since I last ran it with no threads).
Next: is there a way to lock a file like you can with a database table? This way I'd be able to check whether the file is locked and, if so, wait some number of seconds and try again, or just keep beating away till it's free?
The other option of writing to two separate files has another exciting part: the file/files are on a network drive. As of now it does write pretty quickly, but that is writing to just one file.
I think I got everything there... I'll keep plugging away in the mean time.
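On the "lock a file and retry" question, here is a minimal sketch of one way it could look in C#, not a definitive answer. It assumes that opening the file with `FileShare.None` is enough to keep other writers out, and that a failed open shows up as an `IOException`; how reliably that holds on a network drive depends on the share. The class name `ExclusiveAppender` and its parameters are made up for illustration.

```csharp
using System;
using System.IO;
using System.Threading;

public static class ExclusiveAppender
{
    // Appends one line to the file, retrying while the open fails.
    // All names here are hypothetical, not from the original application.
    public static bool TryAppendLine(string path, string line, int maxAttempts, int delayMs)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                // FileShare.None requests exclusive access until the stream is closed.
                using (var stream = new FileStream(path, FileMode.Append, FileAccess.Write, FileShare.None))
                using (var writer = new StreamWriter(stream))
                {
                    writer.WriteLine(line);
                    return true;
                }
            }
            catch (IOException)
            {
                // Most likely a sharing violation: wait, then try again.
                Thread.Sleep(delayMs);
            }
        }
        return false; // gave up after maxAttempts
    }
}
```

Something like `ExclusiveAppender.TryAppendLine(path, csvLine, 10, 500)` would then try for roughly five seconds before giving up. Note that this only serializes the writers; it does not make the disk or the network any faster.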
|
|
|
|
|
Ok just ran into something interesting, I'm trying to add the 3rd worker to my form and it's telling me it doesn't exist in the current context:
public Form1()
{
    InitializeComponent();
    InitializeBackgroundWorker();
    InitializeBackgroundWorker2();
    InitializeBackgroundWorker3();
and then here
private void InitializeBackgroundWorker2()
{
    backgroundWorker2.DoWork += new DoWorkEventHandler(backgroundWorker2_DoWork);
    backgroundWorker2.RunWorkerCompleted += new RunWorkerCompletedEventHandler(backgroundWorker2_RunWorkerCompleted);
    backgroundWorker2.ProgressChanged += new ProgressChangedEventHandler(backgroundWorker2_ProgressChanged);
}
private void InitializebackgroundWorker3()
{
    backgroundWorker3.DoWork += new DoWorkEventHandler(backgroundWorker3_DoWork);
    backgroundWorker3.RunWorkerCompleted += new RunWorkerCompletedEventHandler(backgroundWorker3_RunWorkerCompleted);
    backgroundWorker3.ProgressChanged += new ProgressChangedEventHandler(backgroundWorker3_ProgressChanged);
}
As you can see there is nothing different between them, I'm a bit confused...
modified on Thursday, February 10, 2011 2:20 PM
|
|
|
|
|
I dunno, I don't think I've seen that error.
Where are backgroundWorker2 and backgroundWorker3 declared?
|
|
|
|
|
namespace MailTest
{
    public partial class Form1 : Form
    {
        string strServer;
        string strPort;
        string strUser;
        string strPassword;
        string strOutput;
        public Form1()
        {
            InitializeComponent();
            InitializeBackgroundWorker();
            InitializeBackgroundWorker2();
            InitializeBackgroundWorker3();
            btnGetMessageInfo.Enabled = false;
            btnCancelConnection.Enabled = false;
        }
That is where they are "Initialized"... and then further down are the private void InitializeBackgroundWorker3() for each of them.
Though I think I could just do the following where you have InitializeBackgroundWorker(); & InitializeBackgroundWorker2(); & InitializeBackgroundWorker3();
BackgroundWorker bgw1 = new BackgroundWorker();
BackgroundWorker bgw2 = new BackgroundWorker();
......
the working example I found for C# has it as I have it now, VB I have it like the section I just posted...
|
|
|
|
|
MacRaider4 wrote: As you can see there is nothing different between them
My sight tells otherwise.
I suggest you start believing the error messages you are getting.
Luc Pattyn [Forum Guidelines] [My Articles] Nil Volentibus Arduum
Please use <PRE> tags for code snippets, they preserve indentation, improve readability, and make me actually look at the code.
|
|
|
|
|
Luc Pattyn wrote: My sight tells otherwise.
I suggest you start believing the error messages you are getting.
Ok I've been staring at this for much too long, and have looked at it many times and I'm not seeing the difference other than one has a 2 and the other a 3.
|
|
|
|
|
one BaCKgroUNDworKEr isn't the other.
Luc Pattyn [Forum Guidelines] [My Articles] Nil Volentibus Arduum
|
|
|
|
|
MacRaider4 wrote: ran it with no threads
How'd you manage that?
I guess I'd need a higher-level view of the process. The data gathering may benefit from multi-threading, but the writing to disk is less likely to, so somehow have the data-gathering threads pass the gathered data to the writing thread. There are a number of ways to accomplish that.
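One of those ways, sketched here rather than prescribed, is a producer/consumer queue: the gathering threads add finished lines to a queue and a single task drains it and does all the writing. The sketch assumes .NET 4 (for `BlockingCollection<T>` and `Task`); the class name `SingleWriterPipeline` and the `producers` delegates are stand-ins for the real data-gathering code.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

public static class SingleWriterPipeline
{
    // Runs every producer on its own task and funnels the resulting lines
    // through one queue to a single writing loop. Names are hypothetical.
    public static void Run(Func<string>[] producers, TextWriter output)
    {
        using (var queue = new BlockingCollection<string>(boundedCapacity: 100))
        {
            // The only code that ever touches the writer.
            Task writerTask = Task.Factory.StartNew(() =>
            {
                foreach (string line in queue.GetConsumingEnumerable())
                    output.WriteLine(line);
            });

            var producerTasks = new Task[producers.Length];
            for (int i = 0; i < producers.Length; i++)
            {
                Func<string> produce = producers[i]; // local copy for the closure
                producerTasks[i] = Task.Factory.StartNew(() => queue.Add(produce()));
            }

            Task.WaitAll(producerTasks);
            queue.CompleteAdding(); // tells the writer loop there is nothing more to come
            writerTask.Wait();
        }
    }
}
```

Because only one task writes, no file locking is needed, and the order of lines in the output is simply the order in which the producers happen to finish.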
|
|
|
|
|
I copied the original application and then created another that was threaded... and gave it another name.
Actually the initial gathering takes about 2 seconds, the parsing of the data and the writing is what takes forever. I've actually decided that I'm going to split up the file creation part into two workers as well. Ok I'll explain what I'm doing in full maybe that will help...
- This app logs onto a mail server
- reads the number of messages on it and reports the number
- then it writes the information I need from those messages to text files (currently just reports progress back from a thread)
- then it goes through those files and parses the information even more, thinning down the data, and writes that to the csv file
It does more after that, but that's working fine so far.
Does this help more? And I thought the POP stuff was hard.
|
|
|
|
|
Rather than having all those text files, maybe you should consider having a single database, storing the relevant data in appropriately typed fields as soon as possible.
Anyway, if parsing text files is taking that long, I'd venture you're doing it wrong. I wouldn't be surprised if you were using lots of regexes, a prime tool for slowing things down and obfuscating your intentions.
Luc Pattyn [Forum Guidelines] [My Articles] Nil Volentibus Arduum
|
|
|
|
|
What's taking so long is that I have to search for particular lines in the email, and they aren't always in the same order, on the same line or anything. So I have to search for the message ID, the From line, the subject line, the actual text of the email, whether there is an attachment and what the name of said attachment could be, and then I have to watch out for the end-of-email marker, or if I've found everything I just end it there. So yes, there are some regexes in there, but I think only a few lines for the one thing I'm looking for (I'd need to go look at that part again to see what it's for).
Though I am doing that line by line, isn't there a way with the StreamReader to search for whatever it is you are looking for (let's say I need to find "From: ", not "Received: from ")? This is why I'm doing it line by line, as I'm able to look at the start of the line and determine if that is what I need. This is where I'm writing that "master file" rather than the individual ones.
I could probably skip the write to the csv file and just go straight to the database, but for some reason when I first wrote this 9 months ago I had some problems with something and thus the writing to the csv file (I don't remember what they were).
This is also my first large application in C#, up till 1 1/2 years ago I was mainly doing VB, VBA & Access in M$ land.
|
|
|
|
|
One should not perform multiple passes on a (text) file; just read it once, or read a part, skip some, read some more, and never go back. Anything else is bound to be slow. If you need searching back and forth, store it all in memory or use a database. From your description it really sounds like a DB is in order. IMO you should thoroughly rethink the whole approach; a sub-optimal approach will not get fixed by throwing in some multi-threading.
Luc Pattyn [Forum Guidelines] [My Articles] Nil Volentibus Arduum
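To make the "read it once and never go back" advice concrete, here is a rough single-pass sketch. It only covers the header block and assumes each wanted header fits on one line (folded continuation lines are ignored); the body text and attachment names mentioned earlier in the thread are not handled. `HeaderScanner` and the choice of three headers are illustrative assumptions, not the original code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class HeaderScanner
{
    // The headers we care about; an assumption based on the thread's description.
    static readonly string[] Wanted = { "Message-ID:", "From:", "Subject:" };

    // Reads the text once, top to bottom, and returns the first value seen for each wanted header.
    public static Dictionary<string, string> Scan(TextReader reader)
    {
        var found = new Dictionary<string, string>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0) break; // a blank line ends the header block

            foreach (string name in Wanted)
            {
                // Testing the start of the line means "Received: from ..." never matches "From:".
                if (!found.ContainsKey(name) && line.StartsWith(name, StringComparison.OrdinalIgnoreCase))
                {
                    found[name] = line.Substring(name.Length).Trim();
                    break;
                }
            }

            if (found.Count == Wanted.Length) break; // everything found, stop reading early
        }
        return found;
    }
}
```

Passing a `StreamReader` over the message works the same way as the `StringReader` one would use in a quick test, since both are `TextReader`s.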
|
|
|
|
|
Isn't some of that available as properties or something?
Without having seen what the emails look like, I'd recommend reading the whole email into one string and using one RegEx to extract what you need.
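As a sketch of the "one string, one Regex" idea, under the assumption that the wanted fields are the Message-ID, From and Subject headers: the pattern below is anchored to the start of a line, so "Received: from" is not picked up as "From". `HeaderRegex` and `Find` are hypothetical names, and the pattern would also match a line in the message body that happens to start with one of those names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HeaderRegex
{
    // Multiline makes ^ match at the start of every line; [^\r\n]* keeps the value on that line.
    static readonly Regex Pattern = new Regex(
        @"^(?<name>Message-ID|From|Subject):[ \t]*(?<value>[^\r\n]*)",
        RegexOptions.Multiline | RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns the value of the first matching header line, or null if there is none.
    public static string Find(string message, string headerName)
    {
        foreach (Match m in Pattern.Matches(message))
        {
            if (string.Equals(m.Groups["name"].Value, headerName, StringComparison.OrdinalIgnoreCase))
                return m.Groups["value"].Value.Trim();
        }
        return null;
    }
}
```

The whole message is scanned by a single `Matches` call, so one pass over the string yields every header of interest.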
|
|
|
|
|
Luc Pattyn wrote: consider having a single database
I concur.
Luc Pattyn wrote: lots of regexes
Hadn't thought of that, but yeah, good point.
|
|
|
|
|
PIEBALDconsult wrote: Luc Pattyn wrote:
lots of regexes
here are the only lines
Regex objLongDollar = new Regex("\\d+,\\d+\\.\\d+");
Regex objNumRd = new Regex("\\d+rd");
Regex objNumSt = new Regex("\\d+st");
This is in the parse message section...
|
|
|
|
|
The result will depend on how large the search object is, and how often you execute such regexes.
When I care about performance, I avoid the Regex class. I use string methods, maybe a StringBuilder, maybe a character array, maybe several nested loops, but no regexes. Regexes are good for compact code when performance does not matter at all, and readability is not a primary concern either.
Here is the report on a little experiment I once performed.
Luc Pattyn [Forum Guidelines] [My Articles] Nil Volentibus Arduum
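For illustration only, this is roughly what replacing a pattern like `\d+rd` with plain string methods might look like. It answers the same yes/no question as `Regex.IsMatch` with that pattern (is there at least one digit directly followed by the suffix?). `OrdinalFinder` is a made-up name, the suffix is assumed to be non-empty, and whether this is actually faster for a given workload would need measuring.

```csharp
using System;

public static class OrdinalFinder
{
    // True when the text contains a digit immediately followed by the suffix,
    // which is what the pattern \d+rd tests for when suffix is "rd".
    // Assumes suffix is not empty.
    public static bool ContainsNumberWithSuffix(string text, string suffix)
    {
        int index = text.IndexOf(suffix, StringComparison.Ordinal);
        while (index >= 0)
        {
            // One digit before the suffix is enough; \d+ only requires "at least one".
            if (index > 0 && char.IsDigit(text[index - 1])) return true;
            index = text.IndexOf(suffix, index + 1, StringComparison.Ordinal);
        }
        return false;
    }
}
```

The same helper covers the `\d+st` case by passing "st" as the suffix.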
|
|
|
|
|
That seems reasonable. I don't know anything about reading messages from a mail server, but...
Each thread can read and parse one message and report back the result to be written. Whether or not the thread also downloads the message I don't know, but that should be doable.
So you can have a class that distributes work to a bunch of threads.
The process on the thread performs the work and reports back when finished.
For writing, you can have an event handler that locks a stream when it writes.
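A bare-bones sketch of that last point, with made-up names throughout (`MessageParser`, `MessageParsedEventArgs`, `ResultWriter`): the parser raises an event on whatever thread it runs on, and the handler takes a lock before touching the shared writer. The parsing itself is only a placeholder.

```csharp
using System;
using System.IO;

public sealed class MessageParsedEventArgs : EventArgs
{
    public MessageParsedEventArgs(string csvLine) { CsvLine = csvLine; }
    public string CsvLine { get; private set; }
}

public sealed class MessageParser
{
    // Raised on the calling (worker) thread when one message has been processed.
    public event EventHandler<MessageParsedEventArgs> MessageParsed;

    public void Parse(string rawMessage)
    {
        string csvLine = rawMessage.Trim(); // placeholder for the real parsing work
        EventHandler<MessageParsedEventArgs> handler = MessageParsed;
        if (handler != null) handler(this, new MessageParsedEventArgs(csvLine));
    }
}

public sealed class ResultWriter
{
    private readonly TextWriter _output;
    private readonly object _gate = new object();

    public ResultWriter(TextWriter output) { _output = output; }

    // Safe to subscribe to events raised from several threads at once:
    // the lock lets only one thread write at a time.
    public void OnMessageParsed(object sender, MessageParsedEventArgs e)
    {
        lock (_gate)
        {
            _output.WriteLine(e.CsvLine);
        }
    }
}
```

Wiring it up is one line, `parser.MessageParsed += writer.OnMessageParsed;`, after which any number of threads can call `parser.Parse(...)`.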
|
|
|
|
|
I don't "download" the message, just read it from the server and write the needed info to the file.
This is an example of the first section of the mail that I'm working with...
+OK 670581 octets
Return-Path: <email address>
Received: from hrndva-omtalb.email host([ip address])
by hrndva-imta01.email hostwith ESMTP
id <20100324163246244.LLFZ11363@hrndva-imta01.email host>
for <email it's going to>; Wed, 24 Mar 2010 16:32:46 +0000
Return-Path: <email address>
X-Authority-Analysis: v=1.0 c=1 a=Y--C8wIrtp4A:10 a=ed-Ggqp32-PxgnFQ28IA:9 a=gFYqYUHr3cvJf5tUtWv3jj12YwYA:4 a=wPNLvfGTeEIA:10 a=SSmOFEACAAAA:8 a=Xz8RjLcVAAAA:8 a=bvyAQD6M8USi_luE8VwA:9 a=zkXRgtjM-mmOsYMX5XAA:7 a=lSBj04H3UYGbvfZ5gLKUj7ga3v4A:4 a=TQY7aazGoy4vupPYzM8A:9 a=A9QQSRYdmLSsclXicRDPQfuie2oA:4 a=IKIoO-ieCDEA:10 a=l42U5Vqe35IA:10 a=OU-3oeRcviPOZ7V7:21 a=r3OCwUNA-PGXxiAt:21
X-Cloudmark-Score: 0
X-Originating-IP: IP Address
Received: from [IP Address] ([IP Address] helo=computer it's from (I think))
by hrndva-oedge02.email host (envelope-from <email address>)
(ecelerity 2.2.2.39 r()) with ESMTP
id 8E/A4-28072-8AE3AAB4; Wed, 24 Mar 2010 16:32:45 +0000
Received: from 127.0.0.1 (AVG SMTP 8.5.437 [271.1.1/2767]); Wed, 24 Mar 2010 12:31:41 -0500
Message-ID: <006301cacb77$ddf18460$ae02a8c0@pc it's from>
From: "Name" <emailaddress>
To: "person it's going to" <their email>
Subject: kinda obvious but using this data
Date: Wed, 24 Mar 2010 12:31:41 -0500
MIME-Version: 1.0
Content-Type: multipart/mixed;
boundary="----=_NextPart_000_005F_01CACB4D.F4FC82B0"
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2800.1983
Disposition-Notification-To: "Person from" <email address>
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1983
There is a lot more after that but should give you an idea... hope I replaced all the stuff I should have
|
|
|
|
|
Ok, so to speed up that section you are suggesting creating a class that passes work to, let's say, 4 background workers? That sounds really good, but I've never done anything like that, and how would I then return that info back to the Form? With a return? I would also lose my updates on the progressBar, would I not?
Ok my brain is really starting to hurt now, thankfully I've only got 10 min left in my day right now... will have to get back to this tomorrow!
Thank you all for everything thus far...
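On getting results and progress back to the Form: a `BackgroundWorker` can already do both, via `e.Result` in `DoWork` and `ReportProgress`. A minimal sketch follows, assuming the per-message work can be reduced to a loop; `WorkerSetup.Create` and the placeholder strings are invented for the example. When `RunWorkerAsync` is called from a WinForms UI thread, the `ProgressChanged` and `RunWorkerCompleted` handlers run on that UI thread, so they can update a progress bar directly.

```csharp
using System.Collections.Generic;
using System.ComponentModel;

public static class WorkerSetup
{
    // Builds a worker that "processes" messageCount messages, reports percent
    // progress, and hands the collected results to the completed handler.
    public static BackgroundWorker Create(
        int messageCount,
        ProgressChangedEventHandler onProgress,
        RunWorkerCompletedEventHandler onCompleted)
    {
        var worker = new BackgroundWorker { WorkerReportsProgress = true };

        worker.DoWork += (sender, e) =>
        {
            var results = new List<string>();
            for (int i = 1; i <= messageCount; i++)
            {
                results.Add("message " + i); // placeholder for reading and parsing one message
                worker.ReportProgress(i * 100 / messageCount);
            }
            e.Result = results; // becomes RunWorkerCompletedEventArgs.Result
        };

        worker.ProgressChanged += onProgress;
        worker.RunWorkerCompleted += onCompleted;
        return worker;
    }
}
```

In the form, the progress handler would set something like `progressBar1.Value = e.ProgressPercentage`, and the completed handler would cast `e.Result` back to `List<string>`.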
|
|
|
|
|
MacRaider4 wrote: return that info back to the Form
Well, I question the use of a form at all; I'd use a Windows Service, but that's just me. You can have a Service that pulls the data into the database and then the form pulls it (already fluffed and folded) from there.
Or you could use an event.
|
|
|
|
|
I'll have to look up services, as I've never done anything with them before. Though I will say doing this project has made me a better C# programmer; at this rate, in another year I'll be answering some of these questions for other people.
Someone else mentioned just doing this all in one pass; now that I'm looking back at my code, I think that is a very good idea. Is this something I could do with the service or event?
I could then do my initial pass to get the number, then have a couple of workers work on the list, storing the data in arrays. Once those are done, combine the arrays, or better yet just have the arrays loaded straight into the database, which should take no time at all even with checking to make sure that message isn't already there?
|
|
|
|
|
MacRaider4 wrote: have the arrays loaded straight into the database
Right. The Service would periodically (once a minute?) query the email server for messages; if there are some, get them, process them, and stick the results in the database. You could still use a thread to process each message if necessary.
Depending on your needs, you could then have the same Service host a WCF Web Service that your client application can use to get the data.
|
|
|
|
|
That was my original intent once I got it working, just didn't know about the service part.
So let me see if I have this right now:
1. Log into the server and get the number of messages
2. Decide if I need to use a bgw and how many
3. Do the work with no or a couple workers:
a. have the worker/s log in with the number of account/s each is processing
b. process the "entire" message all at once and store in an array
4. Update the database
5. Have the form check for updates?
Does this sound about right?
|
|
|
|
|
Yeah, basically. But remember that I'm not familiar with reading messages from an email server, so I don't understand the "a. have the worker/s log in with the number of account/s each is processing" part.
I would have a worker read a message, process it, and stick it in the database; then maybe get another.
Or read all the available messages and pass them to the workers.
There are many ways to skin this cat.
|
|
|
|