|
You can share code across threads as much as you like, and you can share read-only data. As soon as shared data starts to change, though, you need to take precautions. Having both threads write to a single file is a recipe for trouble. Without precautions you might get any of the following:
1,2,3,4,5,50000001,50000002,50000003,50000004,50000005,6...
1,50000001,2,3,50000002,4,5,50000003...
1,5002,3,00001,50000002,4,5,50000003...
The third line should worry you.
Anyway, threads help when the problem is too much calculation (non-blocking CPU activity) or too much uncertainty in the exact sequence of operations (blocking activity, as in I/O). They don't help much (or at all) when the problem is memory bandwidth or peripheral bandwidth. Having two threads doing nothing but writing a file does not make sense to me: it will either go wrong or be slower than a single thread doing the same. Having multiple threads doing many things, including writing to a single file, needs a different approach, probably one where data gets collected, not written, by several threads, then filed by a single thread.
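A minimal sketch of that "collected by several threads, filed by a single thread" idea, assuming .NET 4's BlockingCollection is available. The class name SingleWriterSketch, the Run method and the generated line contents are all made up for illustration; this is one possible arrangement, not the only one.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

public static class SingleWriterSketch
{
    // Hypothetical helper: 'producers' tasks each generate 'linesPerProducer'
    // lines; only the single writer task ever touches 'output'.
    public static void Run(TextWriter output, int producers, int linesPerProducer)
    {
        using (var queue = new BlockingCollection<string>(boundedCapacity: 1000))
        {
            // The one and only writer: drains the queue until it is marked complete.
            Task writerTask = Task.Factory.StartNew(() =>
            {
                foreach (string line in queue.GetConsumingEnumerable())
                    output.WriteLine(line);
            }, TaskCreationOptions.LongRunning);

            var workers = new Task[producers];
            for (int p = 0; p < producers; p++)
            {
                int id = p; // private copy for the closure
                workers[p] = Task.Factory.StartNew(() =>
                {
                    for (int i = 0; i < linesPerProducer; i++)
                        queue.Add(id + ":" + i); // collect, don't write
                });
            }

            Task.WaitAll(workers);
            queue.CompleteAdding(); // lets the writer's loop end
            writerTask.Wait();
        }
    }
}
```

Lines from different producers may still interleave with each other, but each line arrives whole, because only one thread ever writes.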
Luc Pattyn [Forum Guidelines] [My Articles] Nil Volentibus Arduum
Please use <PRE> tags for code snippets, they preserve indentation, improve readability, and make me actually look at the code.
|
Good point, and you picked up on the question I had thought of during lunch (obviously I'm still new to multi-threading). So what I'm thinking is perhaps two separate files, then appending one to the other once they are both done. Which brings me to my next question: how am I going to tell when both of these are done? Would it be a third function that acts like a master, checking on the workers every so many seconds with a "worker1, are you done? worker2, are you done?" and, once they have completed, going off to do something with the results?
Thanks for the answers thus far; they have been really helpful. I wish books covered this kind of thing; most just give you some useless example and forget to mention multiple workers or working with classes...
|
One of my older articles[^] may give you something to ponder (or give you indigestion), but I don't think it will help much for the process you describe.
Is this task something you really need to do? Or are you just trying to learn about using threads?
|
You could have a common RunWorkerCompleted handler and keep a counter of outstanding BGWs, which you would increment before launching a BGW and decrement in the Completed handler. That would all happen on the same thread, so a volatile numeric value type would behave as you'd expect. Alternatively, you could use the Interlocked class and its Increment and Decrement methods.
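A rough sketch of the counter idea, using the Interlocked variant so it also works when the Completed handler does not run on a single GUI thread (for instance in a console test). WorkerTracker, LaunchAll and the onAllDone callback are hypothetical names. One deliberate deviation: the full count is set before any worker starts, so a fast worker cannot bring the counter to zero while others are still waiting to be launched.

```csharp
using System;
using System.ComponentModel;
using System.Threading;

public class WorkerTracker
{
    private int outstanding;            // BackgroundWorkers still running
    private readonly Action onAllDone;  // called once, by the last worker to finish

    public WorkerTracker(Action onAllDone)
    {
        this.onAllDone = onAllDone;
    }

    public void LaunchAll(params DoWorkEventHandler[] jobs)
    {
        if (jobs.Length == 0) { onAllDone(); return; }

        outstanding = jobs.Length; // full count is in place before anything starts
        foreach (DoWorkEventHandler job in jobs)
        {
            var bgw = new BackgroundWorker();
            bgw.DoWork += job;
            bgw.RunWorkerCompleted += OnCompleted; // one common handler
            bgw.RunWorkerAsync();
        }
    }

    private void OnCompleted(object sender, RunWorkerCompletedEventArgs e)
    {
        // Decrement returns the new value; zero means this was the last worker.
        if (Interlocked.Decrement(ref outstanding) == 0)
            onAllDone();
    }
}
```

In a form you would put the "go do this with the results" step in the callback instead of polling the workers every few seconds.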
You don't have to gather both halves of the output and then combine them, though; it suffices to gather a reasonable amount (say 100KB) of output in each thread separately and then write it out while holding a single shared lock. That avoids problems without paying too much overhead.
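Something along these lines, as a sketch only: each thread keeps its own StringBuilder and takes the shared lock only when its buffer is full. ChunkedWriter and its members are invented names, and the threshold is counted in characters, not bytes, so 100 * 1024 is only roughly "100KB".

```csharp
using System;
using System.IO;
using System.Text;

public class ChunkedWriter
{
    private readonly TextWriter output;
    private readonly object gate = new object(); // the one lock all threads share
    private readonly int flushThreshold;         // in characters

    public ChunkedWriter(TextWriter output, int flushThreshold)
    {
        this.output = output;
        this.flushThreshold = flushThreshold;
    }

    // Each thread owns its own StringBuilder and passes it in; no lock is
    // needed while appending because the buffer is never shared.
    public void Append(StringBuilder localBuffer, string line)
    {
        localBuffer.AppendLine(line);
        if (localBuffer.Length >= flushThreshold)
            Flush(localBuffer);
    }

    // Writes the whole chunk in one go while holding the lock, then empties it.
    public void Flush(StringBuilder localBuffer)
    {
        if (localBuffer.Length == 0) return;
        lock (gate)
        {
            output.Write(localBuffer.ToString());
        }
        localBuffer.Length = 0;
    }
}
```

Each worker would call Append as it goes and Flush once more when it finishes, so the last partial chunk is not lost.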
|
If you are looking to speed up file system access, the short answer is that you're not going to be able to. If you write to 2 files, you're just going to slow the process down, because now the hard drive will have to seek between each file rather than stream to a single file. If they were on different hard drives, that might work, but then there'd be no way to combine them into a single file in a quick manner (well, maybe a virtual file system of some sort could help with that, but that's probably not something you're going to be considering).
|
Luc Pattyn wrote: Having two threads doing nothing but writing a file does not make sense to me
My thoughts exactly.
The actual I/O to the disk is the primary bottleneck in the process described, so I doubt that even having each thread write its own file would help.
modified on Thursday, February 10, 2011 12:57 PM
|
OK, I just attached this to the last post to make it easier... all of you have good points, so I'll put this into one message.
First, this is for an application that will be used by about 3 people, so do I have to have it threaded? No. I've done a few apps in VB, but they were all just a basic single worker reporting progress back to the GUI in the form of a progress bar, nothing involved. So yes, I'm using this one as a learning experience and to help speed it up some (it works fine now but takes a long time, over 2 min to run). Now I know 2+ min isn't a long time for some things, but I already have it down to 1:45 with just the 2 workers I have going now. However, the longest part of the app is the part I'm trying to do now, which is around 1:10 (if I remember right; it's been a day or so since I last ran it with no threads).
Next: is there a way to lock a file like you can with a database table? That way I'd be able to check whether the file is locked and, if so, wait however many seconds and try again, or just keep beating away till it's free.
The other option, writing to two separate files, has another exciting part: the file/files are on a network drive. As of now it does write pretty quickly, but that is writing to just one file.
I think I got everything there... I'll keep plugging away in the meantime.
|
Ok just ran into something interesting, I'm trying to add the 3rd worker to my form and it's telling me it doesn't exist in the current context:
public Form1()
{
    InitializeComponent();
    InitializeBackgroundWorker();
    InitializeBackgroundWorker2();
    InitializeBackgroundWorker3();
and then here
private void InitializeBackgroundWorker2()
{
    backgroundWorker2.DoWork += new DoWorkEventHandler(backgroundWorker2_DoWork);
    backgroundWorker2.RunWorkerCompleted += new RunWorkerCompletedEventHandler(backgroundWorker2_RunWorkerCompleted);
    backgroundWorker2.ProgressChanged += new ProgressChangedEventHandler(backgroundWorker2_ProgressChanged);
}
private void InitializebackgroundWorker3()
{
    backgroundWorker3.DoWork += new DoWorkEventHandler(backgroundWorker3_DoWork);
    backgroundWorker3.RunWorkerCompleted += new RunWorkerCompletedEventHandler(backgroundWorker3_RunWorkerCompleted);
    backgroundWorker3.ProgressChanged += new ProgressChangedEventHandler(backgroundWorker3_ProgressChanged);
}
As you can see there is nothing different between them, I'm a bit confused...
modified on Thursday, February 10, 2011 2:20 PM
|
I dunno, I don't think I've seen that error.
Where are backgroundWorker2 and backgroundWorker3 declared?
|
namespace MailTest
{
    public partial class Form1 : Form
    {
        string strServer;
        string strPort;
        string strUser;
        string strPassword;
        string strOutput;
        public Form1()
        {
            InitializeComponent();
            InitializeBackgroundWorker();
            InitializeBackgroundWorker2();
            InitializeBackgroundWorker3();
            btnGetMessageInfo.Enabled = false;
            btnCancelConnection.Enabled = false;
        }
That is where they are "Initialized"... and then further down are the private void InitializeBackgroundWorker...() methods for each of them.
Though I think I could just do the following where you have InitializeBackgroundWorker();, InitializeBackgroundWorker2(); and InitializeBackgroundWorker3();
BackgroundWorker bgw1 = new BackgroundWorker();
BackgroundWorker bgw2 = new BackgroundWorker();
......
The working example I found for C# has it as I have it now; in VB I have it like the section I just posted...
|
MacRaider4 wrote: As you can see there is nothing different between them
My sight tells otherwise.
I suggest you start believing the error messages you are getting.
|
Luc Pattyn wrote: My sight tells otherwise.
I suggest you start believing the error messages you are getting.
Ok I've been staring at this for much too long, and have looked at it many times and I'm not seeing the difference other than one has a 2 and the other a 3.
|
one BaCKgroUNDworKEr isn't the other.
|
MacRaider4 wrote: ran it with no threads
How'd you manage that?
I guess I'd need a higher-level view of the process. The data gathering may benefit from multi-threading, but the writing to disk is less likely to, so somehow have the data-gathering threads pass the gathered data to the writing thread. There are a number of ways to accomplish that.
|
I copied the original application and then created another that was threaded... and gave it another name.
Actually the initial gathering takes about 2 seconds, the parsing of the data and the writing is what takes forever. I've actually decided that I'm going to split up the file creation part into two workers as well. Ok I'll explain what I'm doing in full maybe that will help...
- This app logs onto a mail server
- It reads the number of messages on it and reports the number
- Then it writes the information I need from those messages to text files (currently it just reports progress back from a thread)
- Then it goes through those files and parses the information even more, thinning down the data, and writes that to the csv file
It does more after that, but that's working fine so far.
Does this help more? And I thought the POP stuff was hard.
|
Rather than having all those text files, maybe you should consider having a single database, storing the relevant data in appropriately typed fields as soon as possible.
Anyway, if parsing text files is taking that long, I'd venture you're doing it wrong. I wouldn't be surprised if you were using lots of regexes, a prime tool for slowing down and obfuscating your intentions.
|
What's taking so long is that I have to search for particular lines in the email, and they aren't always in the same order, on the same line or anything. So I have to search for the messageID, the From line, the subject line, the actual text of the email, whether there is an attachment and what the name of said attachment could be. Then I have to watch out for the end-of-email marker, or if I've found everything I just end it there. So yes, there are some regexes in there, but I think only a few lines for the one thing I'm looking for (I'd need to go look at that part again to see what it's for).
Though I am doing that line by line, isn't there a way with the StreamReader to search for whatever it is you are looking for (let's say I need to find "From: ", not "Received: from ")? This is why I'm doing it line by line, as I'm able to look at the start of the line and determine if that is what I need. This is where I'm writing that "master file" rather than the individual ones.
I could probably skip the write to the csv file and just go straight to the database, but when I first wrote this 9 months ago I had some problems with something, and thus the writing to the csv file (I don't remember what they were).
This is also my first large application in C#; up till 1 1/2 years ago I was mainly doing VB, VBA & Access in M$ land.
|
One should not perform multiple passes on a (text) file. Just read it once; or read a part, skip some, read some more, and never go back. Anything else is bound to be slow. If you need to search back and forth, store it all in memory or use a database. From your description it really sounds like a DB is in order. IMO you should thoroughly rethink the whole approach; a sub-optimal approach will not get fixed by throwing in some multi-threading.
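To make the "read it once and never go back" idea concrete, here is a hedged sketch of a single forward pass that picks out a few header lines by checking how each line starts. HeaderScan and ReadHeaders are invented names, and the three prefixes are only guesses based on the fields mentioned earlier in the thread; body text and attachments are left out.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class HeaderScan
{
    // Assumed set of interesting headers; adjust to whatever the app needs.
    private static readonly string[] Wanted = { "Message-ID: ", "From: ", "Subject: " };

    // One forward pass: each line is looked at exactly once.
    public static Dictionary<string, string> ReadHeaders(TextReader reader)
    {
        var found = new Dictionary<string, string>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0) break; // a blank line ends the header block

            foreach (string prefix in Wanted)
            {
                // Ordinal match at the start of the line, so
                // "Received: from ..." is not mistaken for "From: ".
                if (line.StartsWith(prefix, StringComparison.Ordinal))
                {
                    found[prefix.TrimEnd(' ', ':')] = line.Substring(prefix.Length);
                    break;
                }
            }

            if (found.Count == Wanted.Length) break; // everything found: stop early
        }
        return found;
    }
}
```

Because the reader is only ever moved forward, the same method works on a StreamReader over a file or directly on the stream coming from the mail server.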
|
Isn't some of that available as properties or something?
Without having seen what the emails look like, I'd recommend reading the whole email into one string and using one RegEx to extract what you need.
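For what it's worth, a sketch of the "whole email in one string, one Regex" suggestion might look like the following. The pattern, the class name OneRegexScan and the choice of three header names are assumptions, not something taken from the actual application. Note that it scans the body as well, so it keeps only the first occurrence of each header.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class OneRegexScan
{
    // Multiline makes ^ match at the start of every line; [^\r\n]* keeps the
    // trailing carriage return out of the captured value.
    private static readonly Regex HeaderLine = new Regex(
        @"^(?<name>Message-ID|From|Subject): (?<value>[^\r\n]*)",
        RegexOptions.Multiline | RegexOptions.Compiled);

    public static Dictionary<string, string> Extract(string wholeEmail)
    {
        var found = new Dictionary<string, string>();
        foreach (Match m in HeaderLine.Matches(wholeEmail))
        {
            string name = m.Groups["name"].Value;
            if (!found.ContainsKey(name)) // first occurrence wins
                found[name] = m.Groups["value"].Value;
        }
        return found;
    }
}
```

Whether this is faster or slower than a line-by-line scan would have to be measured on real messages.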
|
Luc Pattyn wrote: consider having a single database
I concur.
Luc Pattyn wrote: lots of regexes
Hadn't thought of that, but yeah, good point.
|
PIEBALDconsult wrote: Luc Pattyn wrote: lots of regexes
Here are the only lines:
Regex objLongDollar = new Regex("\\d+,\\d+\\.\\d+");
Regex objNumRd = new Regex("\\d+rd");
Regex objNumSt = new Regex("\\d+st");
This is in the parse message section...
|
The result will depend on how large the search object is, and how often you execute such regexes.
When I care about performance, I avoid the Regex class. I use string methods, maybe a StringBuilder, maybe a character array, maybe several nested loops, but no regexes. Regexes are good for compact code when performance does not matter at all, and readability is not a primary concern either.
Here[^] is the report on a little experiment I once performed.
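As an illustration of the string-methods route, here is a sketch of a check that does roughly what a pattern like \d+rd does, using only IndexOf and a character comparison. HasNumberWithSuffix is a made-up helper; it assumes a non-empty suffix and only treats ASCII '0' to '9' as digits, whereas \d in .NET also accepts other Unicode digits. No speed claim is made here; measure before swapping it in.

```csharp
using System;

public static class SuffixCheck
{
    // True when 'text' contains at least one ASCII digit immediately
    // followed by 'suffix' (for example "3rd" or "21st").
    // Assumes 'suffix' is not empty.
    public static bool HasNumberWithSuffix(string text, string suffix)
    {
        int pos = text.IndexOf(suffix, StringComparison.Ordinal);
        while (pos >= 0)
        {
            // One digit right before the suffix is enough: \d+ needs at least one.
            if (pos > 0 && text[pos - 1] >= '0' && text[pos - 1] <= '9')
                return true;

            pos = text.IndexOf(suffix, pos + 1, StringComparison.Ordinal);
        }
        return false;
    }
}
```

For example, HasNumberWithSuffix(line, "rd") would stand in for objNumRd.IsMatch(line) under those assumptions.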
|
That seems reasonable. I don't know anything about reading messages from a mail server, but...
Each thread can read and parse one message and report back the result to be written. Whether or not the thread also downloads the message I don't know, but that should be doable.
So you can have a class that distributes work to a bunch of threads.
The process on the thread performs the work and reports back when finished.
For writing, you can have an event handler that locks a stream when it writes.
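The outline above could be sketched roughly like this. All names (WorkDistributor, ItemFinished, LockedWriter) are hypothetical, the "work" is passed in as a delegate because the real parsing code isn't shown, and plain threads are used to keep the sketch independent of any form.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;

// Hands items out to a fixed number of threads and reports each result.
public class WorkDistributor
{
    // Raised on the worker thread that finished the item.
    public event Action<string> ItemFinished;

    public void Run(IList<string> items, Func<string, string> process, int threadCount)
    {
        int next = -1; // index of the last item handed out
        var threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++)
        {
            threads[t] = new Thread(() =>
            {
                int i;
                // Each thread atomically claims the next unprocessed index.
                while ((i = Interlocked.Increment(ref next)) < items.Count)
                {
                    string result = process(items[i]);
                    Action<string> handler = ItemFinished;
                    if (handler != null) handler(result);
                }
            });
            threads[t].Start();
        }
        foreach (Thread th in threads) th.Join(); // blocks until all work is done
    }
}

// Event handler that locks around the shared writer.
public class LockedWriter
{
    private readonly TextWriter output;
    private readonly object gate = new object();

    public LockedWriter(TextWriter output) { this.output = output; }

    public void OnItemFinished(string result)
    {
        lock (gate) { output.WriteLine(result); }
    }
}
```

Run blocks its caller until every item is processed, so in a form it would have to be called from a background thread, not from the GUI thread.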
|
I don't "download" the message, just read it from the server and write the needed info to the file.
This is an example of the first section of the mail that I'm working with...
+OK 670581 octets
Return-Path: <email address>
Received: from hrndva-omtalb.email host([ip address])
by hrndva-imta01.email hostwith ESMTP
id <20100324163246244.LLFZ11363@hrndva-imta01.email host>
for <email it's going to>; Wed, 24 Mar 2010 16:32:46 +0000
Return-Path: <email address>
X-Authority-Analysis: v=1.0 c=1 a=Y--C8wIrtp4A:10 a=ed-Ggqp32-PxgnFQ28IA:9 a=gFYqYUHr3cvJf5tUtWv3jj12YwYA:4 a=wPNLvfGTeEIA:10 a=SSmOFEACAAAA:8 a=Xz8RjLcVAAAA:8 a=bvyAQD6M8USi_luE8VwA:9 a=zkXRgtjM-mmOsYMX5XAA:7 a=lSBj04H3UYGbvfZ5gLKUj7ga3v4A:4 a=TQY7aazGoy4vupPYzM8A:9 a=A9QQSRYdmLSsclXicRDPQfuie2oA:4 a=IKIoO-ieCDEA:10 a=l42U5Vqe35IA:10 a=OU-3oeRcviPOZ7V7:21 a=r3OCwUNA-PGXxiAt:21
X-Cloudmark-Score: 0
X-Originating-IP: IP Address
Received: from [IP Address] ([IP Address] helo=computer it's from (I think))
by hrndva-oedge02.email host (envelope-from <email address>)
(ecelerity 2.2.2.39 r()) with ESMTP
id 8E/A4-28072-8AE3AAB4; Wed, 24 Mar 2010 16:32:45 +0000
Received: from 127.0.0.1 (AVG SMTP 8.5.437 [271.1.1/2767]); Wed, 24 Mar 2010 12:31:41 -0500
Message-ID: <006301cacb77$ddf18460$ae02a8c0@pc it's from>
From: "Name" <emailaddress>
To: "person it's going to" <their email>
Subject: kinda obvious but using this data
Date: Wed, 24 Mar 2010 12:31:41 -0500
MIME-Version: 1.0
Content-Type: multipart/mixed;
boundary="----=_NextPart_000_005F_01CACB4D.F4FC82B0"
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2800.1983
Disposition-Notification-To: "Person from" <email address>
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1983
There is a lot more after that but should give you an idea... hope I replaced all the stuff I should have
|
OK, so to speed up that section you are suggesting I create a class that passes work to, let's say, 4 background workers? That sounds really good, but I've never done anything like that, and how would I then return that info back to the Form? With a return? I would also lose my updates on the progressBar, would I not?
Ok my brain is really starting to hurt now, thankfully I've only got 10 min left in my day right now... will have to get back to this tomorrow!
Thank you all for everything thus far...
|