I have a program that manipulates strings taken from an input file, then writes them to an output file. The files are large, so it takes a long time to run, and it's calculation-bound, not IO-bound. The heart of the process is a For-Next loop that's ripe to be rewritten as a Parallel.For in order to take advantage of the idle processor power.

I'd never written a parallel program, so I read what I could find and experimented, with success. I've written a version of the program that can be run either as a serial For-Next or a Parallel.For (with Lambda expression) with only a couple of comment changes needed to switch from one to the other.

Run serially, it works perfectly. Run in parallel, it drops some of the output, and does so inconsistently: the output file sizes are wrong and vary from run to run for the same input. This sounds like a thread safety issue to my naive brain, though I could be humming the wrong tune.

The only shared data that is written is an output array of strings. Each loop only writes to one position in it, defined by the counter in the For, so I had thought that would be thread-safe. All other variables that are written are defined inside the Lambda and should be thread-safe, I think.

I've done a lot of experimenting with it, including (but not limited to):

* using a SyncLock when writing to the output array.
* pulling the code in from two small function calls.
* making thread-local copies of all data that is read, not just written.
* using a SyncLock when making the thread-local data.

I've obviously blundered somewhere. The code isn't difficult, but it's too long to post, so if I can get some general recommendations on what to try or read, or where to start looking, I'd be thankful.

The question is hard to answer without a code sample.

However, you should understand that parallel processing can give results fully equivalent to the ones for sequential processing only under certain conditions. For the parallel loops, roughly speaking, it means that the result of each iteration should not affect the results of other iterations. One simple example is filling in a previously created array (not collection!) with values, when none of the values depends on the values in other positions of the array.
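As a rough illustration of that "simple example", here is a minimal VB sketch. It is not the poster's code: `Transform`, `ProcessAll` and the sample strings are hypothetical names made up for the illustration. The output array is allocated before the loop, and each iteration writes only to its own index and reads nothing that another iteration writes.

```vb
Imports System
Imports System.Threading.Tasks

Module IndependentIterations
    ' Hypothetical stand-in for the real per-line string manipulation.
    Function Transform(s As String) As String
        Return s.ToUpperInvariant()
    End Function

    Function ProcessAll(input As String()) As String()
        ' The array is created up front; it is never resized inside the loop.
        Dim output(input.Length - 1) As String
        Parallel.For(0, input.Length,
                     Sub(i As Integer)
                         ' Declared inside the lambda, so private to this iteration.
                         Dim result As String = Transform(input(i))
                         ' Each iteration touches only its own slot.
                         output(i) = result
                     End Sub)
        Return output
    End Function

    Sub Main()
        Dim lines = ProcessAll({"alpha", "beta", "gamma"})
        Console.WriteLine(String.Join(",", lines)) ' prints ALPHA,BETA,GAMMA
    End Sub
End Module
```

If a loop shaped like this still gives different results from run to run, the dependency is probably hiding in whatever plays the role of `Transform`, or in a variable declared outside the lambda.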

The related problems are known under the umbrella term "race condition", which I tend to express more precisely as "incorrect dependency on the order of execution". This aspect is totally unrelated to the problem of the locking of the shared resource; that is: you may have locking and yet have the race condition.
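To make the point about locking concrete, here is a small hedged sketch (all names are hypothetical). `CollectLocked` protects a shared list with a SyncLock, so no item is lost, yet the order of the items still depends on which iteration gets the lock first. `CollectIndexed` needs no lock at all, because each iteration owns one array slot, and its result is the same on every run.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Threading.Tasks

Module LockedButOrderDependent
    ' Locking keeps the list intact, but the order of the items is
    ' decided by thread scheduling and can change from run to run.
    Function CollectLocked(n As Integer) As List(Of Integer)
        Dim results As New List(Of Integer)
        Dim gate As New Object()
        Parallel.For(0, n,
                     Sub(i As Integer)
                         SyncLock gate
                             results.Add(i)
                         End SyncLock
                     End Sub)
        Return results
    End Function

    ' No lock needed: iteration i writes only results(i),
    ' so the outcome does not depend on execution order.
    Function CollectIndexed(n As Integer) As Integer()
        Dim results(n - 1) As Integer
        Parallel.For(0, n, Sub(i As Integer) results(i) = i)
        Return results
    End Function

    Sub Main()
        Console.WriteLine(String.Join(" ", CollectLocked(10)))  ' order may vary
        Console.WriteLine(String.Join(" ", CollectIndexed(10))) ' always 0 to 9 in order
    End Sub
End Module
```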

(Please see the Wikipedia article on race conditions: http://en.wikipedia.org/wiki/Race_condition.)

—SA
 
Comments
Guitar Zero 23-Sep-14 16:39pm    
Yes, a race condition is created when the results depend on who gets there first, and this sounds like a race. In my case, the only output is for each iteration of the loop to write to its unique position in the output array, which is only written to the file outside the loop, after it has finished. The only thing the iterations of the loop share is the input data, and I've even declared thread-local copies of that and put a SyncLock on when I set them to the (global?) values outside the loop.
Sergey Alexandrovich Kryukov 23-Sep-14 17:02pm    
All right. Follow this logic and draw your conclusions for your particular case.
Will you accept the answer formally (green "Accept" button)?
—SA
OK, so I figured out what was wrong, but I don't know why. The For-Next loop in the middle of this part was not creating bitstr correctly, so non-unique entries were being made, and those were stripped in a subsequent operation in another, outer loop. outG() was correct, but this concatenation method to build the string did not work.
VB
' Convert the output vector back to a character string.
      cc_str = Nothing
      ' Start with the bit string.
      bitstr = bobinbitsl
      For jl = 0 To en0
        bitstr &= CStr(outG(jl, enl))
      Next
      bitstr &= eobinbitsl
      bitcount = bitstr.Length
      If bitcount Mod 6 <> 0 Then Stop ' got a problem

My solution to the problem was to streamline the creation of bitstr and make it at the same time it's creating outG (used for some analysis I didn't include).

I'm still mystified about why that loop fails. All of the variables are thread-local, and the code methods are clumsy but not unconventional. I even tried a SyncLock around the loop, and that didn't help, either. I'll go through this process with a fine-tooth comb and clean it up. I just hope I don't create another weird bug in the process!
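For anyone comparing notes, below is a hedged sketch of one way to build such a string with nothing shared between iterations. It is not the code from my program: `BuildBitString`, `BuildAll`, the "01"/"10" markers and the sample rows are made-up stand-ins. The one thing worth checking in a loop like mine is where each variable is declared. In VB, a variable declared with Dim outside the lambda is captured and shared by all iterations, while one declared inside the lambda (or inside a function it calls) is private to that iteration. I can't say that this is what went wrong in my case.

```vb
Imports System
Imports System.Text
Imports System.Threading.Tasks

Module LocalBitString
    ' Builds prefix + bits + suffix using only parameters and locals.
    Function BuildBitString(prefix As String, bits As Integer(), suffix As String) As String
        Dim sb As New StringBuilder(prefix)
        For Each b As Integer In bits
            sb.Append(b)
        Next
        sb.Append(suffix)
        Return sb.ToString()
    End Function

    Function BuildAll(rows As Integer()()) As String()
        Dim output(rows.Length - 1) As String
        Parallel.For(0, rows.Length,
                     Sub(i As Integer)
                         ' bitstr is declared inside the lambda, so every
                         ' iteration gets its own copy.
                         Dim bitstr As String = BuildBitString("01", rows(i), "10")
                         output(i) = bitstr
                     End Sub)
        Return output
    End Function

    Sub Main()
        Dim rows = New Integer()() {New Integer() {1, 0}, New Integer() {0, 1}}
        Console.WriteLine(String.Join(",", BuildAll(rows))) ' prints 011010,010110
    End Sub
End Module
```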
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


