
The picture was taken while on my brother's stag week, and it's the first and only time I've fired a real gun. I'd no idea it would elicit that kind of response!





The only one of these functions I ever used was OleLoadPicture(), which I used to load and display a JPEG image. It required the file to be read into memory first and then accessed via an LPSTREAM pointer.
You may find that OleLoadFromStream() offers similar capabilities. To be brutally frank, I don't fully understand these functions; I just did a lot of poking around until I got it to work.
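For anyone who finds this later, the memory-plus-stream dance looks roughly like this. This is only a sketch of the approach described above (error handling abbreviated, names illustrative), not the original poster's code:

```cpp
// Sketch: load an image file via OleLoadPicture (Win32/COM).
#include <windows.h>
#include <olectl.h>

IPicture* LoadPictureFromFile(const TCHAR* path)
{
    HANDLE hFile = CreateFile(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                              OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile == INVALID_HANDLE_VALUE)
        return NULL;

    // Read the whole file into a movable global memory block.
    DWORD size = GetFileSize(hFile, NULL);
    HGLOBAL hGlobal = GlobalAlloc(GMEM_MOVEABLE, size);
    void* pData = GlobalLock(hGlobal);
    DWORD bytesRead = 0;
    ReadFile(hFile, pData, size, &bytesRead, NULL);
    GlobalUnlock(hGlobal);
    CloseHandle(hFile);

    // Wrap the memory in an IStream; TRUE tells the stream to free
    // the HGLOBAL when it is released.
    LPSTREAM pStream = NULL;
    IPicture* pPicture = NULL;
    if (SUCCEEDED(CreateStreamOnHGlobal(hGlobal, TRUE, &pStream)))
    {
        OleLoadPicture(pStream, (LONG)size, FALSE, IID_IPicture,
                       (LPVOID*)&pPicture);
        pStream->Release();
    }
    return pPicture;  // draw with IPicture::Render; Release() when done
}
```

(Being Windows-only COM code, this obviously won't compile outside a Win32 project.)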





Thanks for the info. It sounds like I might be able to use the OleLoad() function to do what I want. Someone really ought to write a book on what's going on inside OLE.





They did. It's called The NecronomiCOM.





Silly Hoplophobe.
The latest nation. Procrastination.





(this is a long one, i apologize)
i'm working on an image filter. the basic operation is:
1. run across a row of pixels (one BYTE per component); perform a calculation and store the result in an array of floats. the output array is the same width as a pixel row.
2. run across the same row, but in the opposite direction; perform a similar (but not identical) calculation. store the results in a different array of floats.
3. sum each pair from the two float arrays and store the result in a temporary float image of the same dimensions as the input image, but rotate the output 90 degrees. in other words: output rows as columns in the temp image.
4. using the same basic method as 1 and 2, process the floating point temp image. sum the results and write them to an output 8bit image.
so, again: there are two calculation loops and a summation loop per row. after all input rows are done, the process is repeated using the temp as input.
Loop over 8-bit Rows
    Loop over columns of row[y], left to right => tempRow1
    Loop over columns of row[y], right to left => tempRow2
    tempRow1 + tempRow2 => temp f.p. image row[y]
Loop over temp f.p. Rows
    Loop over columns of row[y], left to right => tempRow1
    Loop over columns of row[y], right to left => tempRow2
    tempRow1 + tempRow2 => output 8-bit image row[y]
just for reference, here's the first calculation loop:
for (x = 4; x < width; x++)
{
    pTempPos[x] =
          numerator_factors_positive[0] * pCurPix[0]
        + numerator_factors_positive[1] * pCurPix[iChannels]
        + numerator_factors_positive[2] * pCurPix[2 * iChannels]
        + numerator_factors_positive[3] * pCurPix[3 * iChannels]
        - denominator_factors_positive[1] * pTempPos[x - 1]
        - denominator_factors_positive[2] * pTempPos[x - 2]
        - denominator_factors_positive[3] * pTempPos[x - 3]
        - denominator_factors_positive[4] * pTempPos[x - 4];
    pCurPix += uChannels;
}
that's the first loop. the third loop looks identical, except that pCurPix is a pointer to a floating point value in the third loop, whereas it is a BYTE pointer here. the 2nd and 4th loops are very similar to that, and are also identical to each other, again except for the pCurPix data type.
also, i wrote this code as a template, so i can switch the floating point data type from float to double for testing (the "factor" arrays, the temp row buffers and the temp image).
a little more info:
one of the parameters ("sigma") to the function is used to set the values in those factor arrays. and the algorithm is constant complexity with respect to that parameter: sigma changes the values that pixels are multiplied by, not the number of times the calculations happen. the only thing that influences the amount of calculation performed is the size of the input image. in theory...
another parameter is the number of color channels in the input image (1=grayscale, 3=RGB, 4=RGBA, etc.)
and, finally, here's the problem:
when:
1. the class is using float s for the f.p. data type
2. the image is a single channel
3. the value of sigma is near 3...
the third loop (again, which looks exactly like what i've posted above) slows down to where it's literally ten times slower than all the other loops. as i move sigma away from three, the performance quickly goes back to where it should be: by 6, loop #3 is as fast as the rest, and they all stay the same speed, as far as i can tell, for all other values of sigma: 50 is as fast as 6, and 70 is as fast as 6.
so it would seem the solution is to use doubles. but an array of doubles is 2x as big as an array of floats. and, even worse, the float version of this is 2x faster than the double version!
here are the overall timings for the float version (sigma in the 1st column, time for 50 reps in the 2nd):
0.10 0.44
0.60 0.56
1.10 0.76
1.60 0.80
2.10 1.29
2.60 1.48 <- spike, around 3.0
3.10 1.44
3.60 1.25
4.10 0.97
4.60 0.66
5.10 0.45
5.60 0.39
6.10 0.34
6.50 0.34
.. and then it stays at 0.34s until sigma = 93.5, when it totally blows up and does this:
93.00 0.34
93.10 0.36
93.20 0.34
93.30 0.34
93.40 0.34
93.50 27.22
93.60 27.38
93.70 27.19
93.80 0.34
93.90 0.36
94.00 0.36
these timings are on a Core2 at 2.4GHz. but i can duplicate the slowdown on a single-core Pentium D at 2.8GHz.
anybody have any idea what could be going on?
update: ok, the 93.5 issue is when some of the array values go to infinity... i don't see anything like that at 3.0, though.
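As a cheap sanity check for the infinity case, the factor arrays can be scanned for non-finite values before the filter runs. This is a hypothetical helper, not part of the poster's code; just a sketch of the idea:

```cpp
#include <cmath>
#include <cstddef>

// Returns true if every coefficient is a finite number (no INF/NaN).
// Catches blow-ups like the sigma = 93.5 case before the loops run.
template <typename FloatType>
bool factors_are_finite(const FloatType* factors, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i)
        if (!std::isfinite(factors[i]))
            return false;
    return true;
}
```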
modified on Wednesday, September 30, 2009 4:59 PM





Odd; according to the manual, the speeds of fmul, fadd and fsub are the same for all normal inputs, which is everything except denormals, infinities and NaNs.
Have you checked for denormals?
edit: it changes a little if you're using SSE, but not much. I would still check for denormals, just in case.
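To make "check for denormals" concrete: std::fpclassify (C++11; older MSVC has _fpclass) will classify each intermediate value. A minimal sketch, assuming IEEE single precision:

```cpp
#include <cmath>

// True if x is a denormal (subnormal): nonzero but smaller in magnitude
// than the smallest normal float (~1.18e-38 for IEEE single precision).
// Denormal operands are exactly what trigger the slow microcoded path
// on most x86 FPUs.
inline bool is_denormal(float x)
{
    return std::fpclassify(x) == FP_SUBNORMAL;
}
```

Sprinkling a counter of is_denormal() hits over pTempPos[] per row would show whether the sigma = 3 spike correlates with subnormal feedback values.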





yeah, i've been looking for that. it was blowing up at sigma = 93.5 because of a div by zero in the factors calculations (which put a bunch of #INFs and #INDs into the calcs). but everything looks reasonable in the sigma = 3 range.





Any chance you can show the code that includes your use of the sigma function parameter? Does your code produce the correct values?





sure.
here's most of it:
void CalcIIRFactors (FloatType sigma)
{
    const FloatType b0 = (FloatType)1.783;
    const FloatType b1 = (FloatType)1.723;
    const FloatType w0 = (FloatType)0.6318;
    const FloatType w1 = (FloatType)1.997;

    const FloatType b0OverSigma = b0 / sigma;
    const FloatType b1OverSigma = b1 / sigma;
    const FloatType w0OverSigma = w0 / sigma;
    const FloatType w1OverSigma = w1 / sigma;

    const FloatType pi = (FloatType)acos(-1.0);
    const FloatType scale = sqrt(2 * pi) * sigma;

    const FloatType a0 = (FloatType)1.680 / scale;
    const FloatType a1 = (FloatType)3.735 / scale;
    const FloatType c0 = (FloatType)0.6803 / scale;
    const FloatType c1 = (FloatType)0.2598 / scale;

    numerator_factors_positive [0] = a0 + c0;
    numerator_factors_positive [1] =
        (exp(b1OverSigma)*(c1*sin(w1OverSigma) -
        (c0+2*a0)*cos(w1OverSigma)) +
        exp(b0OverSigma)*(a1*sin(w0OverSigma) -
        (2*c0+a0)*cos(w0OverSigma)));
    numerator_factors_positive [2] =
        (2 * exp(b0OverSigma+b1OverSigma) *
        ((a0+c0)*cos(w1OverSigma)*cos(w0OverSigma) -
        a1*cos(w1OverSigma)*sin(w0OverSigma) -
        c1*cos(w0OverSigma)*sin(w1OverSigma)) +
        c0*exp(2*b0OverSigma) + a0*exp(2*b1OverSigma));
    numerator_factors_positive [3] =
        (exp(b1OverSigma+2*b0OverSigma) * (c1*sin(w1OverSigma) -
        c0*cos(w1OverSigma)) + exp(b0OverSigma+2*b1OverSigma) *
        (a1*sin(w0OverSigma) - a0*cos(w0OverSigma)));
    numerator_factors_positive [4] = 0.0;

    denominator_factors_positive [0] = 0.0;
    denominator_factors_positive [1] =
        2 * exp(b1OverSigma) * cos(w1OverSigma) -
        2 * exp(b0OverSigma) * cos(w0OverSigma);
    denominator_factors_positive [2] =
        4 * cos(w1OverSigma) * cos(w0OverSigma) *
        exp(b0OverSigma + b1OverSigma) +
        exp(2 * b1OverSigma) + exp(2 * b0OverSigma);
    denominator_factors_positive [3] =
        2 * cos(w0OverSigma) * exp(b0OverSigma + 2*b1OverSigma) -
        2 * cos(w1OverSigma) * exp(b1OverSigma + 2*b0OverSigma);
    denominator_factors_positive [4] = exp(2*b0OverSigma + 2*b1OverSigma);

    int i;
    for (i = 0; i < FACTORS; i++)
    {
        denominator_factors_negative[i] = denominator_factors_positive[i];
    }

    numerator_factors_negative[0] = 0.0;
    for (i = 1; i < FACTORS; i++)
    {
        numerator_factors_negative[i] =
            numerator_factors_positive[i] -
            denominator_factors_positive[i] *
            numerator_factors_positive[0];
    }
there's a little more after this, but it's working on things that aren't used inside the offending loop. and, yes, the values are correct: even when slow, it's giving good results.
also, testing with finer precision, it looks like the slowdown isn't centered at 3.0 exactly; rather, it's looking more like sigma = 2.7 is the key. e, maybe??
i wish i could post the whole thing, but... well, it's proprietary.
modified on Wednesday, September 30, 2009 6:11 PM





I understand about the proprietary requirements.
Your code produces normal values for the numerator/denominator factors, over the range of sigma you mentioned, in my experiments.
The only suggestion I have at this point is to instrument your runtime statistics at a finer level, to see which portions of your calculations could be responsible for the efficiency reduction.
If you find the solution, it would be nice if you could post a followup to the thread.
Good luck.





Since performance is an issue, you could probably achieve a speed improvement by computing some more temporary variables, like the sin, cos, and exp of various expressions, because several of those are recomputed more than once. For example, cos(w1OverSigma) is computed seven times in that function.





the thing is, those are calculated ahead of the actual processing, and don't change. the actual work is pretty much 4x the loop i posted.





Hello. I'm writing some C code for an embedded application, and I've run into a problem wherein a compare against an enumerated value is not being executed correctly. Take the following code snippet, for example:
typedef unsigned int UINT16;

typedef enum enum_items_tag
{
    ITEM_1,
    ITEM_2,
    ITEM_3,
    /* ... */
    ITEM_918,
    MAX_ENUM_ITEMS
} enum_items_t;

UINT16 n;

for ( n = 0; n < MAX_ENUM_ITEMS; n++ )
{
}
The code executes as expected, until n is incremented to equal MAX_ENUM_ITEMS, at which time the compare fails, and execution continues within the loop (when it should have exited). I've done things like this in the past without any problems.
I've tried retyping n as enum_items_t (i.e. declaring n as "enum_items_t n"), as well as casting MAX_ENUM_ITEMS to UINT16. The only other thing I can think of at this point is that maybe there is an issue with the number of items in my enumerated type (919). Does anyone know if there are such constraints on enumerated types? I'm using a GCC based compiler. Or, if you have any other ideas, it would be much appreciated. Thanks.





Why not output the value of MAX_ENUM_ITEMS to see what it is? You may have missed something in the middle of the enum.
919 items in total? Then MAX_ENUM_ITEMS should be 918.





You're right, good catch! The enumerated type enum_items_t has a total of 920 items, including MAX_ENUM_ITEMS. I was thinking in terms of offsets, or what n would be as the loop repeats.
Unfortunately, outputting constant information with embedded programming can be a little tricky, especially when you're stuck using a ***** compiler/IDE. I'll see what I can do, though. Thanks.





Just curiosity: what is the usefulness of such an enum?
If the Lord God Almighty had consulted me before embarking upon the Creation, I would have recommended something simpler.
 Alfonso the Wise, 13th Century King of Castile.
This is going on my arrogant assumptions. You may have a superb reason why I'm completely wrong.
 Iain Clarke
[My articles]





It's used for quickly indexing into an array, rather than having to use processor time to search for an element.





Well, I cannot see any speed advantage. Would you please elaborate a bit?





I found the problem. As usual, it turned out to be really simple. My INVALID_ITEM item was positioned at the beginning of the enumeration. When I moved it to the end, after MAX_ENUM_ITEMS, things lined up and started working.
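For readers who hit the same thing: the count-sentinel trick only works if every enumerator before the sentinel takes the default consecutive values. A small sketch (illustrative names, not the original enum) of how a leading INVALID item shifts the sentinel:

```cpp
// Sentinel counts only the real items when extras come after it:
enum good_t { G_ITEM_1, G_ITEM_2, G_ITEM_3, G_MAX, G_INVALID };

// A leading INVALID enumerator shifts everything up by one, so the
// sentinel no longer equals the number of usable items, and every
// named item's array index is off by one:
enum bad_t  { B_INVALID, B_ITEM_1, B_ITEM_2, B_ITEM_3, B_MAX };
```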





I did a test on the size changes of an MFC exe.
When a line of code or a small resource bitmap is deleted from, or added to, the program, the exe size stays unchanged.
So I guess the exe is aligned in size, similar to the way a class is aligned in size by its members.
Do you think this is correct, or do you have more info about it?





Yes. By default, applications are aligned on 512-byte boundaries. (Due to a quirk in the Windows 98 paging mechanism, before Visual Studio 2008, Win32 applications were aligned on 4096-byte boundaries unless you added the /OPT:NOWIN98 switch to the linker. The linker for VS 2008 and later now uses 512-byte alignment by default.)
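If you want to confirm the alignment a given exe was actually linked with, you can dump its PE headers (assuming the Visual Studio tools are on your path, and `myapp.exe` stands in for your binary):

```shell
dumpbin /headers myapp.exe | findstr /i "alignment"
```

The output includes both the section alignment (in memory) and the file alignment (on disk); it's the file alignment that governs the size-granularity effect described above.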





Hi Joe,
The /OPT:NOWIN98 switch should be added in which place of the project settings? How?





In Visual Studio 2003/2005, it's in Properties -> Linker -> Optimization. Select "No (/OPT:NOWIN98)". Only set this for Release mode.
In Visual Studio 2008, this will generate a warning, so it should be set to Default.
In Visual C++ 6.0, go to Settings -> Link. In the Project Options edit control, go to the end and manually add "/OPT:NOWIN98".





Joe, thanks.
The minimum change is from 4096 to 512 bytes now.
Cheers.



