dokhinahaoa wrote: do you know any good parsers name
I don't name the parsers I create; all it takes is some ten lines of code: scan the string for the begin and end of each identifier, then look it up in a replacement dictionary.
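For what it's worth, a minimal sketch of such a parser (all names here are made up; it assumes identifiers consist of letters, digits and underscores):

```csharp
using System.Collections.Generic;
using System.Text;

class IdentifierReplacer
{
    // Scans the input for identifiers (begin = first letter/underscore,
    // end = first non-identifier character) and replaces each one that
    // appears in the dictionary; everything else is copied verbatim.
    public static string Replace(string input, Dictionary<string, string> map)
    {
        var result = new StringBuilder();
        int i = 0;
        while (i < input.Length)
        {
            if (char.IsLetter(input[i]) || input[i] == '_')
            {
                int start = i;   // begin of identifier
                while (i < input.Length &&
                       (char.IsLetterOrDigit(input[i]) || input[i] == '_'))
                    i++;         // end of identifier
                string ident = input.Substring(start, i - start);
                result.Append(map.TryGetValue(ident, out string repl) ? repl : ident);
            }
            else
            {
                result.Append(input[i++]);
            }
        }
        return result.ToString();
    }
}
```

With a dictionary mapping, say, "empno" to "eno", feeding it a select statement rewrites just those identifiers and leaves the rest of the string alone.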
Luc Pattyn
I only read code that is properly indented, and rendered in a non-proportional font; hint: use PRE tags in forum messages
Local announcement (Antwerp region): Lange Wapper? 59.24% were sensible enough to vote NO; bye bye viaduct.
Oops, sorry...
Can you share your parser code and how to use it?
I will be grateful to you...
dokhinahaoa wrote: see here empno,empname,deptname,sal is one word. Because there is no space here
Then separate it by comma, not space.
From all of your posts it's clear you really have no clue what you are doing. I would suggest you turn this project over to someone who does.
only two letters away from being an asset
dokhinahaoa wrote: That means after the operation the SQL should be as follows:
select eno, empname, deptname, sal from employee,department where employee.deptid=department.deptid
dokhinahaoa wrote: I have to do it using c#.
dokhinahaoa wrote: I have to replace / create a new SQL select statement by...
These statements contradict each other. What do you actually want?
You can think of the SQL as a string here; just the format of the string is like an SQL select statement, because I'm not sending the SQL to the database before formatting it, as I mentioned. And I have to do the formatting using C#.
Good day
This is a relatively advanced problem...
I am writing training software for image recognition.
When the training program starts, it loads a set of a few thousand (roughly 5000) images of size 24x24 pixels (greyscale).
I decided to load all images into my own class type, using a list. In other words, if my class is called "class Image", I have a List<Image>.
The image data is stored as a 2D double array in the class.
On each training round I need to process all images, and I am finding this is taking very long. Is there a way to make processing faster?
This is what I have already tried without much reduction in time:
1. Using unsafe code and pointers to access each image object.
2. Parallel processing the images (but I ended up with errors).
3. Using a "for loop" instead of "foreach" to iterate through the list (not much improvement).
I am also considering using structs instead of classes to store the images, but I'm not sure whether this would overload the stack.
Please ask if any more information needed (I didn't give more since there is too much to say).
Any help would be appreciated please
tvb
Do you really need all those images in memory to be processed all at once? Even with just one byte per pixel, 24 x 24 x 5000 is almost 3GB... make that 3MB.
modified on Sunday, October 18, 2009 9:22 AM
Thanks for the reply.
I think it would be 2.8 Meg? 24 x 24 x 5000 = 2.88E+6
Memory isn't my biggest problem, since I have 4GB of RAM.
The reason I load them all into memory is that I assumed it would be faster. Do you have any other suggestions?
tvb
That's what happens when I try to think on a Sunday!
tvbarnard wrote: The image data is stored as a 2D double array in the class.
Something like this?
List<MyImage> myImages;

class MyImage
{
    Double[,] data;
}

How are these created, and how does the image get into this array? Have you tried timing the code, to see where the bottlenecks are?
I are Troll
Yes, that would be how it's done.
I have used a profiler, and it shows that my bottleneck is when I am processing each image array.
The code is very abstract, but I'll post it in any case.
unsafe
{
    fixed (int* pPosPoints = allFeatures[f].pointsPos, pNegPoints = allFeatures[f].pointsNeg)
    {
        for (int iP = 0; iP < boundPosImages; iP++)
        {
            featureValue = 0;
            fixed (double* pIntVec = posTrainingSet[iP].imageIntegralVectorDouble)
            {
                for (int i = 0; i < boundPointsPos; i++)
                {
                    featureValue += *( pIntVec + *( pPosPoints + i ) );
                }
                for (int i = 0; i < boundPointsNeg; i++)
                {
                    featureValue -= *( pIntVec + *( pNegPoints + i ) );
                }
            }
            if (allFeatures[f].polarity)
            {
                if (featureValue < allFeatures[f].threshold)
                    allFeatures[f].weightedError += posTrainingSet[iP].weight;
            }
            else if (allFeatures[f].threshold < featureValue)
            {
                allFeatures[f].weightedError += posTrainingSet[iP].weight;
            }
        }
    }
}
To explain: the "imageIntegralVectorDouble" is my image array which I have reshaped to a vector (so the vector is size 24x24= 576).
What is happening here is I am using a classifer that contains two arrays of points ("allFeatures[f].pointsPos" & "allFeatures[f].pointsNeg") that need to be either added or subtracted in that image. In other words each point is a location reference.
In this specific code, I decided to use pointers to reference the images and points (as can hopefully be seen).
What I might have left out from my original question (in order to prevent confusion) is that the system uses at least 200,000 classifiers, implying each image has to be evaluated by all of those classifiers, which makes the reason for trying to optimize the code fairly obvious.
Most of the processing time is spent on the first two for-loops. I suppose this is the area to try and optimize.
tvb
tvbarnard wrote: Most of the processing time is spent on the first two for-loops. I suppose this is the area to try and optimize.
I have no idea how you'd optimize the code within those loops. Have you tried doing them in parallel?
I actually did, but somehow my results are different when I use parallel, which means they are incorrect. I don't know how one would go about debugging this...
(Btw, the parallel significantly improves performance).
Maybe you can tell me which would be better:
1. run for each image (5000 loops), compute classifiers (200,000 loops), or
2. for each classifier (200,000 loops), compute each image (5000 loops)
I don't see how it would make a difference, unless one of the two is stored on the heap instead of the stack.
tvb
Hi,
yes, it can make a big difference; it is all about cache efficiency.
Assume you have two arrays a[] and b[], and you are asked to count how many elements appear in both.
count = 0;
foreach (int ia in a) {
    foreach (int ib in b) {
        if (ia == ib) {
            count++;
            break;
        }
    }
}
Let's first assume there aren't many matches, a[] is small (say 1KB) and b[] is large (say 100MB).
So the inner loop will read most of the large array on each outer iteration, hence all of b[] gets loaded from memory to cache over and over; the cache isn't really working.
Now assume a[] is large (100MB) and b[] is small (1KB); the outer loop works its way through the large array, but does so only once, while the inner loop is getting all of b[] from the cache. This will be faster by several orders of magnitude.
BTW: if you run the above code in a parallelized way without taking any precautions, the result may be wrong, as the threads are all operating on the single variable "count". You would have to remedy this by:
- locking the variable (bad idea: it takes resources and will slow things down);
- using Interlocked.Increment();
- or, better yet, having a counter for each thread, then accumulating them when all have finished.
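A sketch of that last option, one counter per thread, using plain threads (the names here are illustrative, not from anyone's actual code):

```csharp
using System;
using System.Threading;

class ParallelCount
{
    // Counts how many elements of a[] appear in b[], using one partial
    // counter per thread; the partials are summed after Join, so the hot
    // loop needs no lock and no Interlocked call.
    public static int CountMatches(int[] a, int[] b, int nThreads)
    {
        int[] partial = new int[nThreads];
        Thread[] threads = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++)
        {
            int id = t;  // capture the loop variable for the closure
            threads[t] = new Thread(() =>
            {
                int local = 0;
                // each thread takes a strided slice of a[]
                for (int i = id; i < a.Length; i += nThreads)
                    foreach (int ib in b)
                        if (a[i] == ib) { local++; break; }
                partial[id] = local;  // each thread writes only its own slot
            });
            threads[t].Start();
        }
        int count = 0;
        for (int t = 0; t < nThreads; t++)
        {
            threads[t].Join();
            count += partial[t];
        }
        return count;
    }
}
```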
Luc Pattyn
Thanks, that's the best insight yet.
This is the equivalent of the code I'm using (hopefully clearer):
class Classifier
{
    int[] posPoints;
    int[] negPoints;
}

class Image
{
    double[] imageData;  // the 24x24 image reshaped to a vector
}

void doCalculations(List<Image> images, List<Classifier> classifiers)
{
    foreach (Image image in images)
    {
        foreach (Classifier classifier in classifiers)
        {
            double value = 0;
            foreach (int p in classifier.posPoints)
            {
                value += image.imageData[p];
            }
            foreach (int p in classifier.negPoints)
            {
                value -= image.imageData[p];
            }
        }
    }
}
This method "doCalculations" would typically be called 500 times during the entire process (so that's where the 500 came from).
In case you were wondering, the reason I need classes is because extra data associated with each image or classifier is stored in the class (which I have omitted here).
tvb
OK
The calculations are simple enough, it is all about getting the data in.
In its current form this is a performance challenge, as there is a lot of data involved: the images (5000 * 576 * 8B = 23MB) and the classifiers (200,000 times 20 ints = 16MB) each far exceed the level-2 cache of a modern CPU. And neither is size-dominant right now.
you may not like it much, but for maximum performance you need to reduce the amount of data bytes involved.
Hence:
1. why store pixels as double (i.e. 8B)? I trust no physical system can discern more than 2^16 shades of gray, so use (u)shorts for storage and int for calculation (if that is unacceptable, consider int or float).
2. when images are 576 pixels, why use int (i.e. 4B) indexes in posPoints/negPoints? short would do.
1. reduces the total image size by a factor of 4, so that now is a prime candidate for your inner loop;
2. reduces the classifier data, making the outer loop faster.
Even when the classifiers would be somewhat smaller than the total images, I would keep them in the outer loop, as they have less "locality of reference": they point to arrays (posPoints/negPoints) that could be anywhere in memory, so better to load those only once.
Further ideas:
- drop all the 2D stuff; linearize your images once and for all (probably even when using pointers).
- merge posPoints/negPoints into a single array; maybe use negative indexes for negPoints. Slightly more calculation, better locality. May win, may lose, depends on your typical data.
- this you will hate: merge posPoints/negPoints from several/all classifiers; e.g. put them all in one big array, and let the classifier only hold a begin and end index into that array.
- you haven't revealed what is common among the 500 calculations; if a lot, reorganize to take advantage of it!
Conclusion: give up on some of the simple and clean design and gain a lot in performance.
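The merged-points idea could look something like this (a sketch only; the field names are invented, and a negative-weight point p is assumed to be encoded as -(p+1)):

```csharp
// Each classifier holds only a [begin, end) range into one shared
// points array instead of its own posPoints/negPoints arrays.
class Classifier
{
    public int begin, end;  // range into FeatureData.points
}

static class FeatureData
{
    public static short[] points;  // merged points of all classifiers

    // A posPoint p is stored as p, a negPoint p as -(p + 1),
    // so a single pass adds and subtracts as it goes.
    public static double Evaluate(Classifier c, ushort[] imageData)
    {
        double value = 0;
        for (int i = c.begin; i < c.end; i++)
        {
            int p = points[i];
            if (p >= 0) value += imageData[p];       // positive point
            else        value -= imageData[-p - 1];  // negative point, decoded
        }
        return value;
    }
}
```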
Luc Pattyn
Wow, that's a big help!
Unfortunately, since I've made so many versions of my code, some of the stuff you've said I've already done (like using ints i.s.o. doubles).
What you've really helped with is using shorts i.s.o. ints (I hope this makes a significant difference).
Questions:
1. The problem I have with merging the pos/neg points, is that those are index references to the image data (in other words to the pixels). I can't have a negative reference to an array... eg. image.data[-4] would be wrong. Unless you maybe meant something else?
2. Lol, you're right about that second-last point. I can't merge my data, since I have, for example, an "errorRate (double)" associated with each classifier. Would an array[] of classifiers be faster than a List<> of classifiers?
3. Do you see any safe way of implementing parallelism? I've tried Parallel.For(0, bound, x=>{}) instead of the foreach statements for each of the respective loops, but I always get corrupted results. Could you expand on "or better yet have a counter for each thread, then accumulate them when all have finished"?
Oh, and a last point, each of the 500 iterations are called at independent times, so there's nothing in common between them.
tvb
tvbarnard wrote: some of the stuff you've said I've already done
Now getting good performance is very similar to getting correct results: you may theoretically improve one aspect (fix one bottleneck or one bug) but still not reach the final goal. That does not mean the step you made was right or wrong, it only says you haven't reached the finish yet.
In your case, if you don't need doubles, don't use them, unless comfort (not sure of which kind) is more important than performance. If you apply a sufficient number of optimizations, you will see the improvement; one by one they may not be noticeable or effective at all. That does not mean you should apply them all at once: apply them one by one, make sure it still works correctly, and move on until the goal has been reached (again, same as debugging).
Answers:
1. if using negative indexes for negPoints (storing a negative point p as -p-1), that would require code similar to this:
int index = points[i];
if (index >= 0) value += data[index];
else value -= data[-index - 1];
2.
I did not imply merging the classifiers, only merge their point arrays. So the classifiers continue to exist, except they now hold a start and an end index into the new overall points array, instead of their own posPoints array. Them having a double errorRate doesn't change a thing.
3.
I haven't looked into the official parallel package in any detail. What I mean in principle is this:
let each thread perform its job on its part of the data, and store results in its own variables (one could use "thread local data" for it); then sum all the thread results. This avoids continuous data contention when all threads try to increment a single counter. As I said, I don't know how that maps onto Parallel.For at all. Anyway, unless they did a very good job, I wouldn't use it; what you really need is to split the data in N parts and create N threads, where N is the number of concurrent threads your hardware supports (probably the number of cores).
Luc Pattyn
Oh I see,
But wouldn't using another statement to check a value's sign require even more computation? Or would the fact that the points are merged outweigh this?
For the threading part, do you think something like the following would work:
thread1()
{
    for (int i = 0; i < classifiers.Count / 2; i++)
        foreach (image) ...
}
thread2()
{
    for (int i = classifiers.Count / 2; i < classifiers.Count; i++)
        foreach (image) ...
}
I'm not too familiar with threading, so not too sure about the syntax.
tvb
tvbarnard wrote: require even more computations?
you seem to have missed my point: your code is running slowly because it is continuously waiting for data, not because it contains lots of calculations. So adding a few cycles could be compensated for by saving memory or improving locality.
tvbarnard wrote: the threading part
That would be the concept. However, you don't have to write the code multiple times; different threads can run the same code, just like you can work on two MS Word documents in parallel without copying winword.exe.
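In sketch form (plain threads, illustrative names; each thread runs the same worker on its own slice of the classifiers, so no two threads ever write to the same element and no lock is needed):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

class RangeWorker
{
    // Computes one value per classifier; thread t handles classifiers
    // [t*chunk, (t+1)*chunk), so results[c] is written by exactly one
    // thread.
    public static double[] Process(List<double[]> images,
                                   List<int[]> classifiers, int nThreads)
    {
        double[] results = new double[classifiers.Count];
        var threads = new Thread[nThreads];
        int chunk = (classifiers.Count + nThreads - 1) / nThreads;
        for (int t = 0; t < nThreads; t++)
        {
            int from = t * chunk;                               // this thread's slice
            int to = Math.Min(from + chunk, classifiers.Count);
            threads[t] = new Thread(() =>
            {
                for (int c = from; c < to; c++)
                    foreach (double[] image in images)
                        foreach (int p in classifiers[c])
                            results[c] += image[p];
            });
            threads[t].Start();
        }
        foreach (var th in threads) th.Join();
        return results;
    }
}
```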
Luc Pattyn
Okay, thanks once again, that helped a lot
tvb
you're welcome.
Luc Pattyn
Thought you'd like to know: I sat up the whole of last night testing combinations of for-loops/pointers/foreach statements, and these were my findings:
Hands down, the fastest combination was:
int boundC = classifiers.Count;
int boundI = images.Count;
for (int c = 0; c < boundC; c++)
{
    for (int i = 0; i < boundI; i++)
    {
    }
}
I then tried your idea to combine all the images into one array, and use a separate array to hold each respective image's data.
That reduced the time by about a factor of 2! And the best thing with this method is that as I added more images, it reduced the time by an even larger factor (compared to using the List<Image>).
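For reference, the combined layout could be sketched like this (a guess at the poster's arrangement; 576 = 24x24 pixels per image, and the names are illustrative):

```csharp
// All pixels live in one contiguous array; image i starts at
// offset i * PixelsPerImage, keeping memory accesses sequential.
static class ImageStore
{
    public const int PixelsPerImage = 576;  // 24 x 24
    public static ushort[] allPixels;       // length = imageCount * PixelsPerImage

    // Sums the pixels of one image at the given point offsets.
    public static double SumPoints(int imageIndex, int[] points)
    {
        int baseOffset = imageIndex * PixelsPerImage;
        double value = 0;
        foreach (int p in points)
            value += allPixels[baseOffset + p];
        return value;
    }
}
```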
I then tried using a parallel loop (threads) and discovered that the only way to protect the data is to use "lock (allClassifiers) { }". Unfortunately, using lock gives the same processing speed as not using parallel/threads at all. My assumption is that locking the data prevents simultaneous access, nullifying the benefit of threads.
tvb
yep. No big surprise, except locking the classifiers makes no sense to me: it kills all parallelism, and I don't see the need. IMO you either did something wrong, or forgot to tell me something...
Did you compact the data with shorts?
Luc Pattyn
Yes, I used ushorts for all the data. I can't say exactly what the effect was for the classifiers, but making the image data ushorts definitely helped.
Do you maybe know how I could use parallelism without locking the data? Locking seems the only way to keep it safe, and as you say, that nullifies the point of using parallelism...
tvb