|
Member 4399131 wrote: It seems the the test code wasn't appropriate
Member 4399131 wrote: Do you think that this test code is more accurate?
The only really accurate test set would exercise all possible inputs...which is obviously infeasible
What's more important is to understand why your code is quicker with some test sets and not with others and to determine what the optimal approach actually is (for example, if you can work out roughly where your code becomes more efficient than Microsoft's, you could switch between the two implementations dynamically).
Member 4399131 wrote: And about the MS Lookup table I look at it too and I wonder if that's quite legal, I mean that according to my assembly book the instruction BTS sets a bit in the target pointed by its position BUT MAX POSITION IS 31!!! in this case they set bits in a larger memory block by setting positions up to 255 - I wonder how this works, it does obviously?! Maybe my assembly book is obsolete
My x86 reference (pukka Intel reference manual) says that if you use an immediate operand, you're limited to 0-31. If you use a register operand, the limits disappear (well, they're -2^31 to 2^31-1, which is near as damn it unlimited)
Java, Basic, who cares - it's all a bunch of tree-hugging hippy cr*p
|
|
|
|
|
|
Member 4399131 wrote: I seeeee! this detail wasn't written in my source - that's a reason to find some more complete one.
I got mine from Intel[^] - downloadable PDFs these days. When I got them (close to 10 years ago!) you could order free paper manuals and they'd dispatch them to you, again free. I guess the cost of that was an easy target when reducing costs, as paper copies of the manuals now cost money (and lots of it!).
I see (by downloading the PDFs) that BTS actually gives you offsets of up to +/- 2^63 bits on the IA-64 (== x64) platform. Should be enough, even for Unicode lookup tables
Java, Basic, who cares - it's all a bunch of tree-hugging hippy cr*p
|
|
|
|
|
Wow! thank you man! for some time I was wondering where I can get such a detailed description of the architecture and each instruction(I didn't even know how to search for it), and not thanks to you I finaly have it!!! - MY REGARDS
Bay the way I didn't bother myself ordering it - I've downloaded it directly in PDFs 
|
|
|
|
|
I did it!
I used the lookup table algorithm implemented in my own coding technique (which is nothing special just few tweaks). And now I have FindOneOf that is 15 - 25% faster than CString::FindOneOf. I did many tests and it doesen't drop bellow 15%
I couldn't make it without your help! THANK YOU!!
It would be even faster but BT, BTS as well as SHR, SHL, ROR, ROL (and others) instructions are generally slower than MOV, ADD and so on ... instructions!
I noticed that in my not very long assembly practice. I think that is because they are all bitwise operations.
|
|
|
|
|
Without knowing the logic behind the standard implementation, it is impossible to know why your algorithm is being beat. Perhaps the standard string stores data about specific searches so that, in your loop, it loads known results instead of recalculating them (ie, you are searching the same string for the same charset, so why not cache the result?). Perhaps it realized that your set of input characters is consecutive, and instead of iterating through them it tests each character against a range. Perhaps the standard implementation starts from the end of the string and searches forward. Perhaps the compiler realized that the standard call to FindOneOf in your loop could be optimized away since you are not using any values of that method call except the final one, but your inline-assembly could not be optimized away. The moral is, without doing more testing I have no idea why your algorithm is slower (or your test method is inappropriate). I would only recommend additional test cases, or to make one or more of the modifications that I said were possibilities of what the standard code does. You could also try reordering the loops, but I doubt you are going to see significant timing differences resulting from such a change.
[BEGIN EDIT]
The lookup table idea is a great one for ASCII strings... I was only thinking in terms of unicode.
[END EDIT]
Sounds like somebody's got a case of the Mondays
-Jeff
|
|
|
|
|
Skippums wrote: The lookup table idea is a great one for ASCII strings... I was only thinking in terms of unicode.
The Microsoft Unicode version of that function (wcspbrk) doesn't use a lookup table - it uses nested loops, same as the OP, in straight C rather than assembly language - so Unicode would probably yield comparable performance.
Java, Basic, who cares - it's all a bunch of tree-hugging hippy cr*p
|
|
|
|
|
"you are searching the same string for the same charset, so why not cache the result?"
- This is a good idea but I think it will make a major difference in my test code which tasts the came string and the same charset, and there aren't much cases like that in the real world. but I'll try it - I don't have much options anyway!
"Perhaps it realized that your set of input characters is consecutive, and instead of iterating through them it tests each character against a range."
- I've tested it with mixed characters and letters - it's the same story, so I don't think that's the case - but I appreciate the idea!
"Perhaps the standard implementation starts from the end of the string and searches forward."
- I thought about that, it's quite possible. Why I didn't test it with mirrored string already?
"Perhaps the compiler realized that the standard call to FindOneOf in your loop could be optimized away since you are not using any values of that method call except the final one"
- Well I looked at the disassembly so I can tell that's not the case - for each iteration of the test loop it calls CString::FindOneOf which then calls a bunch of other functions, somewhere in those functions I met call of "EnterCriticalSection" which blown my mind... ?!
Otherwise you are quite right that standard functions are often optimized (as I mentioned about strlen) and the self-coded ones are not, especially those with __asm in them!
I think the lookup table has a potential!
|
|
|
|
|
I think Stuart's advice to use the C runtime in this case is the best alternative.
An algorithm can be optimized for different purposes and can also be better or worse in some scenarios. I guess the standard C runtime implementation is rather generic and predictable.
Try out the following algorithm and you'll see what I mean.
It is twice as fast as CString::FindOneOf() with the strings you're using for testing and measurement when it's release-built. On the other hand, adding a 'u' in the search string will make the algorithm 50% slower than CString::FindOneOf() . Surely you can optimize it further, but it will still have its weaknesses.
The key in the algorithm below is to do as little as possible with each char in the string to be searched.
int FindOneOf( const char* pString, const char* pSearch )
{
register char cMask = -1;
register char cPattern = *pSearch;
register int nPos;
int nSPos;
int nRet = -1;
for( nPos = 1; pSearch[nPos]; ++nPos )
{
cMask &= ~(pSearch[nPos - 1] ^ pSearch[nPos]);
cPattern |= pSearch[nPos];
}
for( nPos = 0; pString[nPos]; ++nPos )
{
if( (~(pString[nPos] ^ cPattern) & cMask) == cMask )
{
for( nSPos = 0; pSearch[nSPos]; ++nSPos )
{
if( pString[nPos] == pSearch[nSPos] )
{
nRet = nPos;
break;
}
}
if( nRet != -1 )
{
break;
}
}
}
return nRet;
}
Beware! The above is just an example at the top of my head!
"It's supposed to be hard, otherwise anybody could do it!" - selfquote "High speed never compensates for wrong direction!" - unknown
|
|
|
|
|
You actually bother yourself to write and test this code to help me - thank you so much!
I'll study it carefully!
|
|
|
|
|
Member 4399131 wrote: You actually bother yourself to write and test this code to help me
Well, yesterday must have been one of my good days and I guess the Fairy of Kindness must have hit my forehead with her wand since I got a bump...
Seriously, the algorithm is not very good in my opinion. Given certain preconditions it is actually faster than CString::FindOneOf() , but it can also be awfully slow. The whole idea with the algorithm I provided is to find at least one bit that are common for all characters to be searched for that is not very common in the string to be searched. In practice this means that searching for a digit (ASCII 0x30 - 0x39) in a string that only contains letters (ASCII >0x41) is quite fast, but searching for a 'b' in a string with small letters will be rather slow.
My point was that an algorithm can be optimized for certain preconditions and may prove faster than a generic one if those preconditions are satisfied, but may be much slower than the generic one if they are not satisfied.
That's why I think Stuart's advice is the best one, i.e. to go with the C runtime implementation. It is likely implemented to give the best overall performance with an algorithm far better than mine.
"It's supposed to be hard, otherwise anybody could do it!" - selfquote "High speed never compensates for wrong direction!" - unknown
|
|
|
|
|
Well I think I beat the standard runtime library!!!
I used the lookup table algorithm of strspn but I implemented it using my own tweaks. As result I got FindOneOf 15 - 25% faster than the standard!
and its idea is pretty close to yours.
in a 256 bits wide bit table(which is all zeroed) they set a bit for each ASCII simbol found. They use the character's ASCII code as index in this bit table. And then they just test for set bits using the target string's characters as indices in the bit table
modified on Wednesday, June 24, 2009 7:53 AM
|
|
|
|
|
Member 4399131 wrote: Well I think I beat the standard runtime library!!!
Well done!
However, since I read your answer to Stuart I understand you've implemented this using inline assembler. It would be really nice to write it in C/C++ in such way that the compiler would generate the same tight code. That would make it more portable.
The algorithm used is quite interesting...
I guess using a bit table will result in a lot of calculations in order to get the correct bit if not using inline assembler. If size of the table is not an issue I would try using chars or ints instead because then indirect addressing can be used, which I assume will faster.
For the sake of my own curiosity (and amusement) I'll try that and compare it.
I'm not aware of any statement in C that would make use of assembly bit manipulation instructions with bit numbers higher than e.g. 31 on a 32-bit system. So I guess inline assembler is the only choice.
"It's supposed to be hard, otherwise anybody could do it!" - selfquote "High speed never compensates for wrong direction!" - unknown
|
|
|
|
|
Roger Stoltz wrote: It would be really nice to write it in C/C++ in such way that the compiler would generate the same tight code. That would make it more portable.
YES! Not only more portable, if the code is in C/C++ the compiler will have a far better chance to build-in the entire inline function!
All of my functions are declared inline, but since I use inline assembly inside them, I make explicit use of specific registers, thus not allowing the compiler to use the currently free ones! In effect the compiler couldn't build-in the function, but compiles it as a normal procedure and CALLs it with all stack operations necessary. Fortunately the runtime library is just the same story! It's full of CALLs and PUSH-POPs , for instance: before CString::FindOneOf gets ready to call of strspn it passes through whole bunch of them!
And I already had some issues using inline assembly, for example: when I installed a new version of NOD32 and scaned the PC, it detected my test application as unknown virus, obviously it somehow detected the unusual assembly code and alarmed. NOD32 is actually known assembly code analyser, that's probably because the inline assembly is mostly used by virus writers these days. I used the option to send them the EXE for analysis to see that it's not bad at all!
I considered writing my algorithms in C/C++ too great effort (at least for now). A lot of other functions await me to implement them after all. And I'm willing to give them just as much attention as I gave to FindOneOf if necessary. Let me see the entire conception working and then I'll try the "siplusplusing" (I just got this term )
Roger Stoltz wrote: I'm not aware of any statement in C that would make use of assembly bit manipulation instructions with bit numbers higher than e.g. 31 on a 32-bit system.
Why don't you try to write some yourself?
inline void operator [] (void* pBitTable, int index)
{
__asm
{
mov EAX, index
bts pBitTable, EAX
}
};
inline bool operator [] (void* pBitTable, int index)
{
__asm
{
mov EAX, index
bt pBitTable, EAX
jb True
xor EAX, EAX
jmp Skip
True: mov EAX, 1
Skip:
}
};
Well these hardly will give better rsults than pure assembly code, but who knows what miracles the compiler/optimizer are capable of! I think there is a GOOD chance for the compiler to build these in 'cause only EAX is named directly and you know it's reserved for the return value and the compiler keeps it free before any call
(the code above is just to give you the idea I haven't eaven tested it! But I do think it's a good idea and I'm pretty sure it'll work fine)
modified on Thursday, June 25, 2009 10:44 AM
|
|
|
|
|
Hi All,
I am trying to post data to server and get the response for the same through xmlhttp request.but i am getting status code 415 in response below is my code snippet:
hr=pIXMLHTTPRequest->
open("POST", "http://www.lowcostholidays.com/webservices/search.asmx",false);
SUCCEEDED(hr) ? 0 : throw hr;
int ssttr11 = ssttr.GetLength();
_bstr_t mylong = (_variant_t)(long)ssttr11;
pIXMLHTTPRequest->setRequestHeader("Content-type", "text/html;charset=utf-8");
pIXMLHTTPRequest->setRequestHeader("Content-length",mylong);
hr=pIXMLHTTPRequest->send(SomeURL)
SUCCEEDED(hr) ? 0 : throw hr;
int aa = pIXMLHTTPRequest->readyState;
int bb = pIXMLHTTPRequest->status;
BSTR stext,rtext;
pIXMLHTTPRequest->get_statusText(&stext);
hr = pIXMLHTTPRequest->get_responseText(&rtext)
Thanks A Ton
Ash_VCPP
walking over water is just knowing where the stones are.....
|
|
|
|
|
I guess the server told you "415" => Unsupported media type
comment this out, it isnt needed
pIXMLHTTPRequest->setRequestHeader("Content-length",mylong);
Press F1 for help or google it.
Greetings from Germany
|
|
|
|
|
Hallo,
i need to display a timer in the statusbar to show how much time the application need to do
a task, if the application is Idle the time shown in the statusbar is 00:00:00(i alreday have this done)
but if the applicatioon is busy the timer should display the time has benn requierd.
How can i do this and in which files of my SDI-Application please.
|
|
|
|
|
Start a timer (have it fire every 1000 ms) when your application starts the task (Should be done on a separate thread if it is lengthy. Because timer messages won't pile up). Use thread synchronization to know when the thread exits and kill your timer there. The timer event handler must have the code to display the time taken.
It is a crappy thing, but it's life -^ Carlo Pallini
|
|
|
|
|
See here.
"Old age is like a bank account. You withdraw later in life what you have deposited along the way." - Unknown
"Fireproof doesn't mean the fire will never come. It means when the fire comes that you will be able to withstand it." - Michael Simmons
|
|
|
|
|
Is it possible to access element by index (like in array or vector), when using multiset?
|
|
|
|
|
no
but you can walk through the collection with an iterator.
|
|
|
|
|
I found, that it is possible only to iterate in this way:
multiset<double> s;
multiset<double>::iterator it;
it = s.begin();
for (int i = 0; i < s.size(); ++i)
it++;
I tried to do the following:
it += k;
But that doesn't work. Is there any oppotunity to "jump" to the position?
|
|
|
|
|
as you've already started discussion on same topic, please continue your discussion in previous thread.
multiset is a node based container. It's not implemented using contigeous memory allocation. Please check the documentation of multiset.
-Sarath.
"Great hopes make everything great possible" - Benjamin Franklin
|
|
|
|
|
Hi all,
could somebody please suggest an API which can split wave files into small chunks of userspecified timespans
Eg: a.wav -60 seconds.
after splitting we should get
a1.wav -20 sec
a2.wav -20 sec
a3.wav -20 sec
respectively
any help could be appreciated.
thanks.
|
|
|
|
|
The header of the wavefile contains the samplefrequency (and the number of bits of one sample, mono, stereo). Next the samplepoints follow. From the data in the header you can figure out which part of the samples you need... (i must somewhere have a description of the wavefile format...)
Rozis
|
|
|
|