Thanks, that's the best insight yet.
This is the equivalent of the code I'm using (hopefully clearer):
class Classifier
{
    public int[] posPoints;
    public int[] negPoints;
}

class Image
{
    public double[,] imageData;   // 2D in the real code; the loops below treat it as flat
}

void doCalculations(List<Image> images, List<Classifier> classifiers)
{
    foreach (Image image in images)
    {
        foreach (Classifier classifier in classifiers)
        {
            double value = 0;
            foreach (int p in classifier.posPoints)
            {
                value += image.imageData[p];   // p is a linear pixel index (pseudocode)
            }
            foreach (int p in classifier.negPoints)
            {
                value -= image.imageData[p];
            }
        }
    }
}
This method "doCalculations" would typically be called 500 times during the entire process (so that's where the 500 came from )
In case you were wondering, the reason I need classes is because extra data associated with each image or classifier is stored in the class (which I have omitted here).
tvb
OK.
The calculations are simple enough; it is all about getting the data in.
In its current form this is a performance challenge, as there is a lot of data involved: the images (5000 × 576 × 8 B ≈ 23 MB) and the classifiers (200,000 × 20 ints ≈ 16 MB) both go well beyond the level-2 cache of a modern CPU. And neither is size-dominant right now.
You may not like it much, but for maximum performance you need to reduce the number of data bytes involved.
Hence:
1. Why store pixels as double (i.e. 8 B)? I trust no physical system can discern more than 2^16 shades of gray, so use (u)short for storage and int for calculation. (If that is unacceptable, consider int or float.)
2. When images have only 576 pixels, why use int (i.e. 4 B) indexes in posPoints/negPoints? A short would do.
Point 1 cuts the image data to a quarter of its size, making it a prime candidate for your inner loop;
Point 2 halves the classifier data, making the outer loop faster.
Even if the classifiers end up somewhat smaller than the total image data, I would keep them in the outer loop, as they have less "locality of reference": they point to arrays (posPoints/negPoints) that could be anywhere in memory, so better to load those only once.
Further ideas:
- Drop all the 2D stuff; linearize your images once and for all (probably even when using pointers).
- Merge posPoints/negPoints into a single array; maybe use negative indexes for the negPoints. Slightly more calculation, better locality. May win, may lose, depending on your typical data.
- This one you will hate: merge the posPoints/negPoints of several/all classifiers, e.g. put them all in one big array, and let each classifier hold only a begin and an end index into that array (see the sketch after this list).
- You haven't revealed what is common among the 500 calculations; if a lot, reorganize to take advantage of that!
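To make that merge concrete, here is a minimal sketch (my names, untested; it assumes points 1 and 2 above, i.e. ushort pixels and short indexes):

class Classifier
{
    public int begin, end;   // range [begin, end) into PointStore.allPoints
    // ...plus whatever extra per-classifier data you have
}

static class PointStore
{
    // All posPoints/negPoints of all classifiers, packed back to back.
    // A negative point p is stored as -(p + 1), so pixel index 0 stays unambiguous.
    public static short[] allPoints;

    public static double Evaluate(Classifier c, ushort[] pixels)
    {
        double value = 0;
        for (int i = c.begin; i < c.end; i++)
        {
            int p = allPoints[i];
            if (p >= 0) value += pixels[p];
            else value -= pixels[-p - 1];
        }
        return value;
    }
}

Each classifier's points now sit in one contiguous block, so the outer loop streams through memory instead of chasing scattered array references.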
Conclusion: give up some of the simple and clean design, and gain a lot of performance.
Luc Pattyn
I only read code that is properly indented, and rendered in a non-proportional font; hint: use PRE tags in forum messages
Local announcement (Antwerp region): Lange Wapper? 59.24% were sensible enough to vote NO; bye bye viaduct.
Wow, that's a big help!
Unfortunately, since I've made so many versions of my code, some of the stuff you've said I've already done (like using ints instead of doubles).
What you've really helped with is using shorts instead of ints (I hope this makes a significant difference).
Questions:
1. The problem I have with merging the pos/neg points is that those are index references into the image data (in other words, to the pixels). I can't have a negative reference into an array; e.g. image.data[-4] would be wrong. Unless you maybe meant something else?
2. Lol, you're right about that second-to-last point. I can't merge my data, since I have, for example, an errorRate (double) associated with each classifier. Would an array[] of classifiers be faster than a List<> of classifiers?
3. Do you see any safe way of implementing parallelism? I've tried Parallel.For(0, bound, x => {}) instead of the foreach statements for each of the respective loops, but I always get corrupted results. Could you expand on "or better yet have a counter for each thread, then accumulate them when all have finished"?
Oh, and a last point: each of the 500 iterations is started at an independent time, so there's nothing in common between them.
tvb
tvbarnard wrote: some of the stuff you've said I've already done
Getting good performance is very similar to getting correct results: you may theoretically improve one aspect (fix one bottleneck, or one bug) and still not reach the final goal. That does not mean the step you took was right or wrong; it only means you haven't reached the finish yet.
In your case: if you don't need doubles, don't use them, unless comfort (not sure of which kind) is more important than performance. If you apply a sufficient number of optimizations you will see the improvement; one by one they may not be noticeable or effective at all. That does not mean you should apply them all at once: apply them one by one, make sure everything still works correctly, and move on until the goal has been reached (again, just like debugging).
Answers:
1. Using negative indexes for negPoints requires code similar to this (a negative point p is stored as -(p + 1), so that pixel index 0 can still be used as a positive point):

int index = points[i];
if (index >= 0) value += data[index];   // positive point
else value -= data[-index - 1];         // negative point, decoded
2. I did not imply merging the classifiers, only merging their point arrays. The classifiers continue to exist, except that they now hold a start and an end index into the new overall points array, instead of their own posPoints array. Their having a double errorRate doesn't change a thing.
3. I haven't looked into the official parallel package in any detail. What I mean, in principle, is this: let each thread do its job on its own part of the data and store results in its own variables (one could use "thread-local data" for that); then sum all the thread results at the end. This avoids the continuous data contention you get when all threads try to increment a single shared counter. As I said, I don't know how that maps onto Parallel.For at all. Anyway, unless they did a very good job I wouldn't use it; what you really need is to split the data into N parts and create N threads, where N is the number of concurrent threads your hardware supports (probably the number of cores).
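With plain threads and invented names, the skeleton would be roughly this (an untested sketch; EvaluateClassifier stands for your two inner point loops):

using System;
using System.Collections.Generic;
using System.Threading;

// Split the classifiers into one contiguous slice per core; each thread
// writes only to its own slots of the results array, so the slices never
// overlap and no locking is needed.
void RunParallel(List<Image> images, Classifier[] classifiers, double[] results)
{
    int n = Environment.ProcessorCount;
    Thread[] threads = new Thread[n];
    int chunk = (classifiers.Length + n - 1) / n;

    for (int t = 0; t < n; t++)
    {
        int from = t * chunk;
        int to = Math.Min(from + chunk, classifiers.Length);
        threads[t] = new Thread(() =>
        {
            for (int c = from; c < to; c++)
            {
                double sum = 0;
                foreach (Image image in images)
                    sum += EvaluateClassifier(classifiers[c], image);  // hypothetical helper
                results[c] = sum;
            }
        });
        threads[t].Start();
    }
    foreach (Thread th in threads) th.Join();   // wait for all slices, then combine
}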
Luc Pattyn
Oh, I see.
But wouldn't using another statement to check a value's sign require even more computation? Or would the gain from merging the points outweigh this?
For the threading part, do you think something like the following would work:

void thread1()
{
    for (int c = 0; c < classifiers.Count / 2; c++)
        foreach (Image image in images) { /* ... */ }
}
void thread2()
{
    for (int c = classifiers.Count / 2; c < classifiers.Count; c++)
        foreach (Image image in images) { /* ... */ }
}

I'm not too familiar with threading, so I'm not sure about the syntax.
tvb
tvbarnard wrote: require even more computation?
You seem to have missed my point: your code runs slowly because it is continuously waiting for data, not because it contains lots of calculations. So a few extra cycles can easily be compensated for by saving memory or improving locality.
tvbarnard wrote: the threading part
That would be the concept. However, you don't have to write the code multiple times; different threads can run the same code (one worker method, given a different range per thread, as in the skeleton I sketched earlier), just like you can work on two MS Word documents in parallel without copying winword.exe.
Luc Pattyn
Okay, thanks once again, that helped a lot.
tvb
You're welcome.
Luc Pattyn
Thought you'd like to know I sat up the whole of last night testing combinations of for loops, pointers, and foreach statements; these were my findings:
Hands down, the fastest combination was:
int boundC = classifiers.Count;
int boundI = images.Count;
for (int c = 0; c < boundC; c++)
{
    for (int i = 0; i < boundI; i++)
    {
        // evaluate classifier c on image i
    }
}
I then tried your idea of combining all images into one array, with a separate array to keep track of each respective image's data (roughly as sketched below).
That reduced the time by about a factor of 2! And the best thing about this method was that as I added more images, it reduced the time by an even larger factor (compared to using the List<Image>).
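Roughly what that looks like (a simplified sketch, not my actual code):

class FlatImages
{
    public ushort[] allPixels;   // all images packed back to back
    public int[] imageStart;     // imageStart[i] = offset of image i in allPixels

    public double Evaluate(Classifier c, int imageIndex)
    {
        int offset = imageStart[imageIndex];
        double value = 0;
        foreach (int p in c.posPoints) value += allPixels[offset + p];
        foreach (int p in c.negPoints) value -= allPixels[offset + p];
        return value;
    }
}

My guess is the speedup comes from the pixels now being one contiguous block instead of thousands of separate arrays.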
I then tried using a parallel loop (threads) and discovered that the only way to protect the data was to use "lock (allClassifiers) { }". Unfortunately, using the lock gives the same processing speed as not using parallelism/threads at all. My assumption is that locking the data prevents simultaneous access, nullifying the benefit of the threads.
tvb
Yep. No big surprise, except that locking the classifiers makes no sense to me: it kills all parallelism, and I don't see the need. IMO you either did something wrong, or forgot to tell me something...
Did you compact the data with shorts?
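As said, I have no experience with Parallel.For, but if each iteration writes only to its own slot of a results array, I don't see what a lock would protect; something along these lines (my names, untested):

using System.Threading.Tasks;

double[] results = new double[classifiers.Count];

// Each iteration writes only results[c]; the images and classifiers are
// read but never modified, so no lock should be necessary at all.
Parallel.For(0, classifiers.Count, c =>
{
    double sum = 0;
    foreach (Image image in images)
        sum += EvaluateClassifier(classifiers[c], image);  // hypothetical helper
    results[c] = sum;
});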
Luc Pattyn
Yes, I used ushorts for all the data. I can't say exactly what the effect was for the classifiers, but making the image data ushorts definitely helped.
Do you maybe know how I could use parallelism without locking the data? Locking seems the only way to keep it safe and, as you say, it nullifies the point of using parallelism...
tvb
tvbarnard wrote: how I could use parallelism
I already told you: I would use N threads (N = number of cores) and give each of them 1/Nth of the classifiers. I have no experience with Parallel.For.
Luc Pattyn
Yeah, I was actually asking if you could elaborate on that, but now that you put it like that, I understand.
tvb
tvbarnard wrote: the system uses at least 200,000 classifiers
What is a classifier? What thing in your source code equals 200,000?
Can you please explain, and maybe give some numeric examples? I want to see to what extent "locality of data" applies.
And how long does it take to do everything you want done for just one image?
Luc Pattyn
Sorry for the confusion.
In the context of my system, a classifier describes the way an image is evaluated and "classified" as containing a certain object.
In my case, a classifier is a set of points in the image whose pixel values respectively get added or subtracted, producing a real value. Using a threshold, one can then decide, according to that real value, whether the image represents the object or not.
For example, say you have the classifier:

int[] classifier = new int[] { 5, 6, 22 };

then the image is evaluated by adding the pixel values at indexes {5, 6, 22} (assuming the image has been reshaped into a vector, not a matrix).
Hope this is clear enough!
The point of the whole exercise is to evaluate a set of classifiers on all images (each one) and determine which classifier is best. What makes the exercise so computationally intensive is that there are at least 200,000 classifiers.
To answer your question about how long one image takes: I haven't timed it, since it doesn't take very long. But the problem quickly becomes much harder as you increase the number of images, to say 3000. I would estimate that 3000 images with 200,000 classifiers take about 4 min. That doesn't sound long, but it must be done 500 times: 500 × 4 min = 33 hrs!
The reason I need to optimize this is, firstly, that it's not supposed to take this long (many papers report much faster times), and secondly, that I personally don't have the time to wait 30+ hrs.
tvb
Sorry, I still:
- don't understand how your textual explanation fits your code;
- don't know what "f" represents in your code;
- don't know which array/list/whatever holds a classifier;
- and now also wonder where that "500" is coming from.
And I fail to understand it at the higher level, i.e. what is really happening.
But I gave you another reply, about cache efficiency and parallelism errors; maybe that helps you.
If you need more advice, make sure to show sufficient code and provide explanations that refer to actual variables; in my experience, if you fail to explain something properly, chances are huge that you will fail to make it run efficiently too.
Luc Pattyn
Very sorry; the problem is that the theory behind the code is very complex, so I try not to refer to it too much.
To answer:
1. A classifier is exactly the same thing as a feature, hence I use the terms interchangeably (sorry about that).
2. The 500 refers to the number of iterations the entire process must be carried out for (where "the entire process" means evaluating all classifiers/features on all images).
Thanks for your willingness to help (I know it's a bit confusing).
If you give me a few minutes, I'll come up with a more generic/descriptive code example for you.
tvb
Sorry, to answer your questions:
I load the images at the start of the program, using pointers to traverse each image and store its pixels into the array one by one. After that, I never reference the files themselves again, so how I load the data shouldn't make any difference.
tvb
tvbarnard wrote: 2. Parallel processing the images (but I ended up with errors).
What were the errors? How did you code it? Did you use the Parallel Framework[^]?
only two letters away from being an asset
I am not sure which parallel libraries you are referring to, but to be specific, I used the following code:

Parallel.For(0, bound, delegate(int i)
{
    // loop body
});

It compiled without errors; I just got errors in my results. I know the results were wrong because they immediately differed from the single-threaded ones.
Are there any other parallel approaches you would suggest?
I also tried using normal array references instead of pointers, but I still got incorrect results.
tvb
Only one Parallel Framework I'm aware of.
only two letters away from being an asset
A couple of suggestions...
Doubles usually take longer to process than ints; can you use ints for your pixels?
A good algorithm for discovering classifiers is the ID3 algorithm (http://en.wikipedia.org/wiki/ID3_algorithm[^]). It analyzes your training set and automatically produces a decision tree that will classify a new case; each branch of the decision tree compares a feature (in this case a pixel) of the new case with a threshold.
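To give an idea of the shape of ID3's output (the learning step itself is described on the wiki page), applying a learned tree might look like this (hypothetical sketch):

// One node of a learned decision tree: a leaf carries the class label,
// an internal node compares a single pixel against a learned threshold.
class TreeNode
{
    public int pixelIndex;          // feature (pixel) tested at this node
    public double threshold;        // split value produced by the learner
    public TreeNode below, above;   // subtrees; both null at a leaf
    public bool label;              // classification result at a leaf

    public bool Classify(double[] pixels)
    {
        if (below == null && above == null) return label;   // leaf reached
        return pixels[pixelIndex] <= threshold
            ? below.Classify(pixels)
            : above.Classify(pixels);
    }
}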
Oh yes, I already did that.
Thanks, I'll have a look at ID3. Unfortunately I'm running out of time, so I can't change or add too much code.
tvb
What are their functions? I am completely confused.
Is WinPcap really necessary for sending ARP messages on a LAN?
What is it all about?
Can anyone help me?
Thanks in advance!
I've had a look at the documentation on their site, and it appears to be all C++, not C#. You will need to write a wrapper and use P/Invoke to call the functions, and write equivalent data structures etc. so that data can be marshalled correctly, unless there is already a .NET wrapper for it, such as this[^].
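For illustration, a single P/Invoke declaration for one WinPcap entry point might look like this (a minimal sketch; pcap_open_live and its signature come from the standard pcap documentation):

using System;
using System.Runtime.InteropServices;
using System.Text;

static class WinPcapNative
{
    // pcap_t* pcap_open_live(const char* device, int snaplen, int promisc,
    //                        int to_ms, char* errbuf);
    [DllImport("wpcap.dll", CallingConvention = CallingConvention.Cdecl, CharSet = CharSet.Ansi)]
    public static extern IntPtr pcap_open_live(string device, int snaplen,
                                               int promisc, int to_ms,
                                               StringBuilder errbuf);
}

The errbuf argument must be pre-allocated by the caller, e.g. new StringBuilder(256), matching pcap's PCAP_ERRBUF_SIZE.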