|
hello
I want to find a fast algorithm for computing 1/d, where d is a double (although it can be converted to an integer). Of the many algorithms (SRT, Goldschmidt, Newton-Raphson, ...), which is the best for this case? I'm writing my program in the C language.
thanks in advance.
|
|
|
|
|
It depends: on the quality of the implementation, the number of bits that you care about, the platform, and even the compiler. There's no simple answer. You'll just have to try them, unless you tell us more about the intended target etc.
Some more options might be:
- Avoiding the division altogether. Calculate with fractions.
- Using hardware-supplied division, if it exists. On some processors it's not half bad.
- Using hardware-supplied approximate reciprocal, optionally with refinement step. x86 has had RCPSS[^] for a while now, it's a lot faster than actually dividing, even if you do a refinement step. 3DNow! actually had better support for fast reciprocals but it's AMD specific and obsolete.
- Using crazy IEEE-format-specific tricks such as this[^]
|
|
|
|
|
I'm writing a fairly large program, and within it there are some divisions. The program must be implemented on an FPGA, and the FPGA has no divider, so I must write a fast algorithm for all the divisions. I don't have any access to the hardware and must implement it at the software level.
I have read some algorithms from here, but I don't know which one to use.
|
|
|
|
|
So, you can't add a hardware divider to the FPGA then? Or fast reciprocal support?
Anyway it depends. Does it have fast multiplication? If not, well, that's a problem, you could only implement the slow methods then.
If you have fast multiplication and IEEE floats, you can use the weird trick I linked to in my previous post with a couple of refinement steps. That's really just Newton-Raphson division with a simpler calculation for the initial approximation (though afaik it still takes only 3 refinements for single-precision floats, just like the regular initial approximation). Fast reciprocal support works that way too: give a fast initial approximation (handle the exponent and take significant bits from a lookup table; 12 significant bits means you need only one refinement step for single precision, and 13 bits are enough to reach double precision in 2 steps), and optionally provide instructions that help implement the refinement step (like AMD's PFRCPIT1 and PFRCPIT2), for example to calculate Y = (1 - D*X) and to calculate X + X * Y.
Even without those tricks Newton–Raphson division is still not bad, with the linear approximation it takes only 4 refinements for double-precision floats, but it also takes some annoying exponent adjustments to get in the right range first (in hardware that wouldn't be half as annoying).
Goldschmidt division is, afaik, roughly equivalent in performance and might have a slightly less complex implementation. It's really the same sort of deal - trickery with the exponent to get in the right range, the "2 - something" estimation trick (which is rearranged in Newton-Raphson division, but it's really the same thing), and doing the refinement step until all the bits are right. It just looks a little different.
|
|
|
|
|
Thanks harold for your complete and helpful reply.
I cannot add hardware to the FPGA (the program would get complicated) and don't have any reciprocal support. I wrote some algorithms as follows (from the pages you sent me and from others):
void test1(){
    const long MAX_Iter = 10000000;
    int *ar1 = new int[MAX_Iter];
    double *R = new double[MAX_Iter];
    double *R2 = new double[MAX_Iter];
    double *ERROR = new double[MAX_Iter];
    double max_error = pow(10, -100.0);
    for (int i = 0; i < MAX_Iter; i++){
        ar1[i] = rand() + 2 + clock();
    }
    cout<<"===================================BESME Allah============================="<<endl<<endl;
    cout<<"==========================regular algorithm 1/integer============================="<<endl;
    clock_t Start = clock();
    for (int i = 0; i < MAX_Iter; i++){
        R[i] = 1.0 / ar1[i];
    }
    cout<<" "<<clock() - Start<<endl;
    cout<<"==========================your suggestion algorithm (ASM):1/integer============================="<<endl;
    Start = clock();
    float inv;
    for (int i = 0; i < MAX_Iter; i++){
        float x = ar1[i];
        unsigned int *y = (unsigned int*)&x;
        *y = 0x7EF39252 - *y;
        inv = x;
        for (int j = 0; j < 3; j++){
            inv = inv * (2 - inv * ar1[i]);
        }
        R2[i] = inv;
    }
    cout<<" "<<clock() - Start<<endl;
    max_error = pow(10, -100.0);
    for (int i = 0; i < MAX_Iter; i++){
        ERROR[i] = abs(R[i] - R2[i]);
        if (max_error < ERROR[i])
            max_error = ERROR[i];
    }
    cout<<" maximmum error="<<max_error<<endl;
    cout<<"========================newton raphson division algorithm time=================="<<endl;
    Start = clock();
    int iter = 15;
    for (int i = 0; i < MAX_Iter; i++){
        int j = 32;
        double x = 0.0000000004656612873077392578125;   // 2^-31
        while (!((ar1[i] << (32 - j)) & 2147483648)){
            j--;
            x *= 2;
        }
        for (int j = 0; j < iter; j++){
            x = x * (2.0 - ar1[i] * x);
        }
        R2[i] = x;
    }
    cout<<" "<<clock() - Start<<endl;
    max_error = pow(10, -100.0);
    for (int i = 0; i < MAX_Iter; i++){
        ERROR[i] = abs(R[i] - R2[i]);
        if (max_error < ERROR[i])
            max_error = ERROR[i];
    }
    cout<<" maximmum error="<<max_error<<endl;
    cout<<"===========================goldschmidt division algorithm=========================="<<endl;
#define BASE 31
    typedef unsigned int uint32;
    typedef unsigned long long int uint64;
    Start = clock();
    for (int i = 0; i < MAX_Iter; i++){
        int shl = 0;
        uint32 nfp = ar1[i];
        while ((nfp & 0x80000000) == 0){
            nfp <<= 1;
            shl++;
        }
        uint64 q = 0x100000000ULL;
        uint64 e = 0x100000000ULL - (uint64)nfp;
        int j;
        for (j = 0; j < 5; j++){
            q += (q*e) >> (uint64)32;
            e = (e*e) >> (uint64)32;
        }
        R2[i] = (uint32)(q >> (64 - 2*BASE - shl));
    }
    cout<<" "<<clock() - Start<<endl;
    max_error = pow(10, -100.0);
    for (int i = 0; i < MAX_Iter; i++){
        ERROR[i] = abs(R[i] - R2[i]);
        if (max_error < ERROR[i])
            max_error = ERROR[i];
    }
    cout<<" maximmum error="<<max_error<<endl;
}
It gives me these answers:
===================================BESME Allah=============================
==========================regular algorithm 1/integer=============================
 101
==========================your suggestion algorithm (ASM):1/integer=============================
 280
 maximmum error=1.15959e-010
========================newton raphson division algorithm time==================
 2146
 maximmum error=4.12998e-006
===========================goldschmidt division algorithm==========================
 2343
 maximmum error=0.00328947
Press any key to continue . . .
As you see, the best algorithm so far is the ASM algorithm (that's my name for it), but I have a problem with it: I can't use any pointers!
I wrote the same algorithm without a pointer, but it goes wrong (and has no logic by which it could give the correct answer):
void test3(){
float x;
float inv;
cout<<endl;
float main_x;
while(cin>>x){
main_x=x;
unsigned int *y = (unsigned int*)&x;
*y=0x7EF39252-*y;
inv=x;
for(int j=0;j<3;j++){
inv=inv*(2-inv*main_x);
}
cout<<"1/x="<<(1.0/main_x)<<" our inv="<<inv<<" error="<<(1.0/main_x)-inv<<endl;
}
}
void test4(){
float x;
float inv;
cout<<endl;
float main_x;
while(cin>>x){
main_x=x;
x=(0x7EF39252-(unsigned int)x);
inv=x;
cout<<"1/x="<<(1.0/main_x)<<" our inv="<<inv<<" error="<<(1.0/main_x)-inv<<endl;
}
}
test3 works correctly but test4 doesn't! Why? How does it work? Can I use a trick to use this procedure without any pointers? (I can only use for, if, while, and some simple commands; pointer implementation on an FPGA would be hard.)
thanks in advance;
modified 2-Jul-13 8:50am.
|
|
|
|
|
"test4" is doing a value-converting cast, while the algorithm is meant to be used with a reinterpret_cast: take the bit pattern of the float and use it as though it were an integer. Depending on the available commands, pointers may not be necessary to implement that.
Perhaps if you tell me how this FPGA will work I can work out some way to implement that algorithm on it.
I'm not sure the Goldschmidt division is right - well, actually the implementation is probably right and I've seen it before, but what it's doing (calculating the fixed-point inverse of a fixed-point input, I think?) doesn't match how it's used there. That would also explain its super-high error.
|
|
|
|
|
Actually, I don't know any more about its implementation; my project manager tells me what he wants and I must implement the desired algorithm. I also haven't studied anything about FPGAs or hardware implementation.
What I do know is that he wants to speed up his program by writing it in C, for example a 6x6 matrix inversion routine in which there are some 1/d operations, where d is a float (we can convert it to an integer by multiplying it by 2^n). He wrote his code in LabVIEW and wants to convert it to the C language; then it will be implemented on an FPGA.
|
|
|
|
|
Ok, now I don't understand. So you're making code that will be implemented in the FPGA fabric itself, not as a program run on a processor that you're turning the FPGA into? Then why not use the special tricks that work well in hardware?
|
|
|
|
|
I was using code similar to the following to calculate the sum from offset (where offset >= 0) to length:
int offset = 2;
int length = 364;
float sum = 0.0f;
for(int i= offset; i<=length; i++)
{
sum += (float) i;
}
However, this seemed a naïve approach, so I worked out (from the formula for the sum from 0 to a number) that I could use the following:
where
f=offset
n=length
sum = ((1+n) * (n/2.0)) - ((1+f) * (f/2.0) - f)
which seems to give the correct answer.
Out of curiosity what algorithm would normally be used for this kind of thing?
|
|
|
|
|
|
Your link to Wolfram Alpha is pointless.
Where is the answer you found?
|
|
|
|
|
Wolfram confirmed that the algorithm in my above question is the one to use.
To work out the algorithm, start with the algorithm for (sum i from i=0 to number) which is:
(n/2.0) * (n+1)
Then we need to subtract the offset using the same algorithm:
((n/2.0) * (n+1)) - ((f/2.0) * (f+1))
And finally, since the offset itself should be included in the sum, add it back:
((n/2.0) * (n+1)) - (((f/2.0) * (f+1))-f)
So the final algorithm is:
((n/2.0) * (n+1)) - (((f/2.0) * (f+1))-f)
Incidentally, the same idea applies for i^2, i^3, i^4, etc.; only the formula changes.
modified 18-Jun-13 19:22pm.
|
|
|
|
|
But does it have a name, this algorithm?
|
|
|
|
|
Yes, Partial sums or Finite series.
|
|
|
|
|
C'mon, that's just the formula Gauss found for calculating the sum from 1 to n. OK, in your case you have to subtract the sum from 1 to offset from that sum (and make sure that you include/exclude the offset itself correctly). But that's a simple thing.
|
|
|
|
|
Oh. I was thinking it looked more like one of Fermat's theorems.
|
|
|
|
|
All the way at the bottom of wiki:interpolation_search[^] there's this text
Each iteration of the above code requires between five and six comparisons (the extra is due to the repetitions needed to distinguish the three states of < > and = via binary comparisons in the absence of a three-way comparison) plus some messy arithmetic, while the binary search algorithm can be written with one comparison per iteration and uses only trivial integer arithmetic. It would thereby search an array of a million elements with no more than twenty comparisons (involving accesses to slow memory where the array elements are stored); to beat that the interpolation search as written above would be allowed no more than three iterations. (emphasis mine)
Is that even true? If so, why?
Because that makes exactly no sense to me. It appears to imply that 1) accessing the same array element twice means reloading it, or 2) reloading the same element takes as much time as loading it the first time.
Neither of which bears any resemblance to reality.
So, what's going on here? Is wikipedia just wrong? Am I missing something?
|
|
|
|
|
It seems like the interpolation search does more work per iteration, so you could do more binary search iterations in the same time.
The extra work is done in the calculation of mid:
mid = low +
((toFind - sortedArray[low]) * (high - low)) /
(sortedArray[high] - sortedArray[low]);
while the binary search is leaner: http://en.wikipedia.org/wiki/Binary_search[^]
It looks like the calculation of mid requires on the order of ten extra instructions per iteration, and one of these is a division, which is relatively slow. Using the same array element a second time won't take twice the time, because the value will be in the cache after the first access.
One factor that may slow down the binary search is the recursive call at each iteration, although it could be written iteratively.
So the Wikipedia claim could be correct.
But the number of iterations needed for the interpolation search is dependent on how uniform the distribution of numbers is. For a near-uniform distribution, you'd only need one iteration of interpolation search.
|
|
|
|
|
Alan Balkany wrote: So the Wikipedia claim could be correct. Maybe. It's certainly slower - a factor of 7 still seems highly unlikely to me. Especially considering the essentially random nature of the memory accesses and of the branches.
|
|
|
|
|
Hi guys, I have a list of numbers and I wrote an algorithm to find the smallest number in that list:
#include <iostream>
using namespace std;

int SmalList(int a[], int size);   // declare before use in main

int main(){
    int a[] = { 2, 4, 8, 3, 1 };   // the example list
    int size = 5;
    cout << "The Smallest Number on that list is: " << SmalList(a, size) << endl;
}

int SmalList(int a[], int size){
    int xsmall = a[0];
    for (int i = 0; size > i; i++){
        if (a[i] < xsmall){
            xsmall = a[i];
        }
    }
    return xsmall;
}
but I'm trying to find an algorithm that can find the n smallest numbers in that list.
For example, if the list is A[] = {2, 4, 8, 3, 1},
how can I find the three smallest numbers in that list?
The function should return a list with three numbers.
|
|
|
|
|
You could use the same as your existing loop, but have an array to hold the numbers, or sort them into order first.
Use the best guess
|
|
|
|
|
This is the question:
Write an algorithm that finds the m smallest numbers in a list of n numbers.
I tried to answer it with the algorithm below; can you fix it? I know I'm missing something in this code:
int getsmallest(int A[n], int m){
int smalist =[m];
small=A[0];
for(i=1;i<n;i++){
small=A[i]
}
return small;
}
|
|
|
|
|
I don't think that would even compile.
Use the best guess
|
|
|
|
|
I know, it's only a simple design for the algorithm; I don't need to write complete code,
only the algorithm's steps.
|
|
|
|
|
OK, but try following it through with some sample values and see what happens.
Use the best guess
|
|
|
|