|
Not quite, the thread below the thread below...
take a step away from keyboard......
|
That isn't a thread - it's just a single post.
Anyway, I don't use a keyboard, I just use my psychic powers to make the words appear on the screen.
|
I'm writing a floating-point package in C++ that provides:
- A full implementation of the binary part of the IEEE-754-2008 Standard for Floating-Point Arithmetic (single-, double- and quad-precision)
- Implementation of higher-precision formats, compatible with the Standard (up to binary1024).
I have a basic implementation written using the "standard" algorithms, and would like some idea of where to invest time on improvements. Obviously, spending a lot of time on an operation that is rarely executed is not the best use of my time...
If you have an important point to make, don't try to be subtle or clever. Use a pile driver. Hit the point once. Then come back and hit it again. Then hit it a third time - a tremendous whack.
--Winston Churchill
|
Daniel Pfeffer wrote: I'm writing a floating-point package in C++
That was not clear from your original question.
So I will dig in here:
I would not think about that. All basic operations will be used often (more or less) and should therefore be optimised as far as possible.
Because division is the slowest operation, it might be the first candidate, even though it is probably used less than the other operations. When a calculation uses divisions, a better division implementation would probably reduce the overall calculation time by a greater factor than optimising only addition and multiplication.
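To put rough numbers on that argument, here is a minimal sketch. The operation counts and cycle costs are entirely made up for illustration (they are not measurements of any real package), and `time_shares` is a hypothetical helper name. The point is only that a rarely used but slow operation can still dominate the total time.

```python
def time_shares(counts, costs):
    """Return each operation's share of total run time, given how often it
    runs (counts) and how many cycles one execution takes (costs)."""
    totals = {op: counts[op] * costs[op] for op in counts}
    grand_total = sum(totals.values())
    return {op: t / grand_total for op, t in totals.items()}

# Hypothetical workload: these numbers are assumptions, not measurements.
counts = {"add": 50, "mul": 40, "div": 10}   # executions per 100 operations
costs = {"add": 1, "mul": 2, "div": 20}      # assumed cycles per execution

for op, share in time_shares(counts, costs).items():
    print(f"{op}: {share:.0%} of total time")
```

With these assumed figures, division is only 10% of the operations but about 61% of the time, so it would be the first place to look. With real profiling data the picture may well be different.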
|
OK, that makes sense. Thanks.
|
You are welcome.
It is an interesting and challenging topic.
Do you plan to publish it as an article?
|
Eventually - yes.
The code works for the few problems that I've thrown at it, but that's not good enough (see the Pentium bug...). My biggest problem is finding an appropriate test suite; most of them cost an arm and a leg, and I can't justify spending that sort of money on a hobby.
|
The distribution of operations depends on the problem set. However, you might be able to take some general guidelines from the evolution of computers themselves. Addition/subtraction came first, with floating point units being added later. If you look at those floating point units, you'll probably see that later ones implemented more operators.
On the other hand, if you look at GPUs, they've always had floating point hardware -- those problem sets were never tractable in real time until floating point hardware existed.
As for testing, the best way I found was to look at the architecture of the hardware, and design a test that tested it. For example, the old VAX FPUs used a nibble lookup table for multiplication, so I concluded that I needed to test every pattern in that lookup table to know if the hardware was OK. That did not reliably happen by simply pounding a lot of math-happy code at the FPU -- it required a specially created dataset that could be proven to be exercising each entry in the lookup table. If your hardware doesn't use a nibble lookup table, that test would likely be useless since it might not achieve full coverage.
We can program with only 1's, but if all you've got are zeros, you've got nothing.
|
The most floating point math I have seen recently was in a mapping package.
It was heavily loaded with trigonometry functions, as you can imagine. I could see the optimizations for those functions varying heavily depending on the bit size (per your 1024-bit precision capability).
|
Even "real" numbers can turn out to be misleading if the background for the figures is not completely understood. For example: 30+ years ago I was working on a computer which had an extreme FPU (it filled about half a square meter of circuit board). It was so fast that for integer multiply and divide, the 32-bit integer value was internally converted to a 64-bit floating point value, the operation performed by the FPU, and the result converted back to integer format. So a count of FP multiply/divide operations would count integer operations as well.
Another case: At my university, the IT people running the huge mainframe (this was many years ago) attached a counter to the Divide by Zero flag, and discovered that every single day, literally tens of millions of divisions by zero were performed. For a few days, there was a big uproar in the IT department over the "low code quality" causing so many exceptions, until one of the mechanical engineering guys noticed the worries and explained that this was quite normal and expected: Some of the standard matrix operations would generate partial results where some number was indeed divided by zero, but the algorithm did not make use of those partial results. So there was no "real" need to perform those divisions at all; it was just a consequence of using a standard matrix library operating on all elements rather than only those actually used.
If you didn't know, you might have spent lots of time speeding up the processing of the Divide by Zero exception, which might have been a waste.
When you ask for other people's use of a certain mechanism, you will not know the context from which these figures were drawn. If you collect data from two dozen independent sources, you might get an idea about the "typical" figures, but they might be completely off for one specific application domain.
To illustrate: This machine with the half-square-meter FPU was mostly used in engineering applications, where FP performance was at a premium. For business use, you could choose the BCD option. Business applications hardly do division at all, so there was no BCD divide hardware - it was implemented purely in microcode, and it was dead slow! But no customer complained about it: They never discovered it, because they never used BCD divide.
For comparison: FP divide started with a table lookup for the two operands, giving the first 11 bits correct, followed by 1-clock-cycle iterations, each iteration doubling the number of correct result bits. Finally, 1 cycle was required for normalizing the result value. So the total time for a 32-bit divide was four clock cycles, and for a 64-bit divide, five clock cycles. They won a couple of design awards for that FPU in the early 1980s (and a number of prestigious engineering contracts, like with CERN and the F16 fighter project).
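The "table lookup, then iterations that double the correct bits" description sounds like Newton-Raphson reciprocal refinement, although the post does not name the method, so treat that as an assumption. A minimal sketch of the idea, using exact fractions so the doubling is visible, with a truncated reciprocal standing in for the lookup table (`reciprocal_steps` and `correct_bits` are hypothetical names):

```python
from fractions import Fraction
import math

def reciprocal_steps(d, seed_bits=11, steps=3):
    """Approximate 1/d by Newton-Raphson: x <- x * (2 - d * x).
    Returns the seed followed by the result of each iteration."""
    d = Fraction(d)
    # Stand-in for the lookup table (an assumption, not the real hardware):
    # truncate the true reciprocal to seed_bits fractional bits.
    scale = 1 << seed_bits
    x = Fraction(math.floor((1 / d) * scale), scale)
    history = [x]
    for _ in range(steps):
        x = x * (2 - d * x)
        history.append(x)
    return history

def correct_bits(d, x):
    """Number of correct bits, measured as -log2 of the relative error."""
    err = abs(1 - Fraction(d) * x)
    return math.inf if err == 0 else -math.log2(err)

d = Fraction(3, 2)
for i, x in enumerate(reciprocal_steps(d)):
    print(f"after {i} iterations: about {correct_bits(d, x):.0f} correct bits")
```

Each step squares the relative error, since 1 - d*x*(2 - d*x) = (1 - d*x)^2, which is where the doubling comes from. Starting from roughly 11 bits, two iterations are enough for a 24-bit significand and three for a 53-bit one, which fits the four and five cycle totals quoted above once the lookup and normalization cycles are added.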
|
Wouldn't that vary from one application to the next, depending on the purpose of the app?
If you think 'goto' is evil, try writing an Assembly program without JMP.
|
THE MOST AND LEAST FREQUENT DIGITS IN THE NUMBERS FROM 1 TO 1000
It is not about writing code (though that is possible), but about finding a nice logical explanation...
So which is the most frequent digit in the list of numbers from 1 to 1000?
And the least frequent?
Why?
And even though Google is our friend, it would be nice not to ask him about it...
Skipper: We'll fix it.
Alex: Fix it? How you gonna fix this?
Skipper: Grit, spit and a whole lotta duct tape.
|
OK, I'll be the first in.
1 is the most only because you are going from 1 to 1000. If it was 1 to 999 or 2 to 1000, there would be the same number of 1's as all other non-zero numbers.
0 is the least because numbers do not start with a 0. (except 0 which is not included).
Brent
|
dbrenth wrote: 1 is the most only because you are going from 1 to 1000
That does not convince me... That's sloppy reasoning at best...
dbrenth wrote: 0 is the least because numbers do not start with a 0.
But 1000 brings in three zeroes...
|
For the most frequent digit, the problem actually scales down to which digit is most frequent between 1 and 10. There are two ones. It is the same idea for 1 to 1 x 10^n.
You wouldn't notice the least frequent until you get to 11 to 20. In this case, 1 shoots out ahead because it makes up half of the digits. In 21 to 30, 2 catches up to 1. But there is no equivalent section of numbers for 0. In 1 to 100 there are only 11 0's, which puts it far behind the other digits, and it never catches up.
Brent
|
Now, that's much better!!!
|
I agree with Brent.
Take all arrangements of 3-digit numbers (1000 arrangements, 000 to 999). All digits are equally represented. Remove any leading zeroes ('0' or '00') from the list. Remove '000' from the list. We have fewer zeroes now, but all other digits are still equally represented. Adding '1000' at the end will not make up the deficit.
So there are fewer zeroes, but 1 has the greatest score because of '1000'.
David
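The padded-arrangement argument can be checked in a few lines. This is just a sketch of that reasoning, not anything posted in the thread; the variable names are my own.

```python
from collections import Counter

# All 1000 arrangements 000..999, written with leading zeroes.
padded = [f"{n:03d}" for n in range(1000)]
padded_counts = Counter("".join(padded))

# Zeroes that disappear when leading zeroes (and '000' itself) are removed.
leading_zeroes = sum(len(s) - len(s.lstrip("0")) for s in padded)

zeroes_1_to_999 = padded_counts["0"] - leading_zeroes
zeroes_1_to_1000 = zeroes_1_to_999 + 3      # '1000' adds three zeroes
ones_1_to_1000 = padded_counts["1"] + 1     # '1000' adds one 1

print("each digit in 000..999:", padded_counts["0"])
print("leading zeroes removed:", leading_zeroes)
print("zeroes in 1..1000:", zeroes_1_to_1000)
print("ones in 1..1000:", ones_1_to_1000)
```

Every digit appears 300 times in the padded list. Stripping the padding removes 111 zeroes, so the three zeroes of 1000 only bring 0 back up to 192, while 1 ends on 301 and the digits 2 to 9 stay at 300.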
|
Without thinking much:
The less is obviously the zero.
The most probably the one because 1000 is the only 4-digit number.
|
Jochen Arndt wrote: The less is obviously the zero.
I see nothing obvious there... You may say, that the line does not start with zero so it is at least one appearance behind, but 1000 adds three more zeroes!
Jochen Arndt wrote: The most probably the one because 1000 is the only 4-digit number.
OK. But that implies that no other digit had an advantage before then... Why is that?
|
You did not write "list of numbers from 0001 to 1000".
So it is obvious that the zero occurs less often than other digits.
1-9: Each digit except zero occurs once
10-99: Each digit except zero occurs 10 times (10's place) plus 9 times (1's place); zero 9 times
>= 100: Zero is now present at the 10's place like the other digits but not at the 100's place.
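For anyone who does want to write the code, a brute-force tally per range is short. A minimal sketch (`digit_counts` is a hypothetical helper name):

```python
from collections import Counter

def digit_counts(lo, hi):
    """Count how often each decimal digit appears in lo..hi inclusive."""
    counts = Counter()
    for n in range(lo, hi + 1):
        counts.update(str(n))   # adds one count per character of the number
    return {d: counts.get(d, 0) for d in "0123456789"}

for lo, hi in [(1, 9), (10, 99), (100, 999), (1, 1000)]:
    print(f"{lo}-{hi}:", digit_counts(lo, hi))
```

Over 1 to 1000 this gives 192 zeroes, 301 ones and 300 of each other digit, which matches the range-by-range breakdown above.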
|
Much better - it is like the second version of some code... more robust and trustworthy...
|
Well, you've got 1-9, where all digits except 0 appear once.
Then you've got 10-19, where 1 occurs 11 times, every other digit, including 0 now, once.
Then you've got 20-29, where 2 occurs 11 times, etc.
By 90-99, we have all digits level in count except 0, which lags by 10 (every digit occurs twice in each set of 10 except 0, which occurs once; you've got 10 sets, so 0 lags by 10; every other digit in the set x0-x9 occurs 11 times).
100 - 109 - Now 0 makes up for a lost digit, but loses out again in the 1n0-1n9 (where 10 > n > 0)
Ultimately, at 1000, 0 should still be the least frequent digit, and 1 gets a head start on everyone else.
I think I thought that through properly, but my brain is still fried mapping XML to bizarre property fields in strange class relationships that someone else wrote and where all the rules are embedded in the business logic for creating said entity containers.
Marc
|
Not bad for an old chap Marc...
The only thing is that 0 is behind by 11 in the 1-99 range, so the three zeroes of 1000 can't help it out...
|