14,981,768 members
Articles / General Programming / Algorithms
Article
Posted 20 Dec 2020

17.7K views
6 bookmarked

Integer Factorization: Dreaded List of Primes

Rate me:
Using a large list of primes with Trial Division algorithm and how to handle the list

Introduction

The Trial Division algorithm is about factorizing an integer by checking all possible factors, and all interesting factors are prime numbers. The problem is how to handle a large list of prime numbers.

First Approach: Plain List of Prime Numbers

When using a list of primes, the first approach is to use a simple array of integers containing the prime numbers.

Under 64K, a prime number fits in an `int16` (2 bytes), beyond, a prime number rather fits in an `int32` (4 bytes).

Figures for a simple list of primes:

 Upper limit Number of primes Size of array 65,536 6,542 13,084 100,000 9,592 38,368 500,000 41,538 166,152 1,000,000 78,498 313,992 5,000,000 348,513 1,394,052 10,000,000 664,579 2,658,316 100,000,000 5,761,455 23,045,820, 2^32-1 4,294,967,295 203,280,221 813,120,884

As one can see, things get really ugly when the list is huge.

Sample code:

C++
```// Trial Division: Square Root + Prime list + Wheel
// Check all numbers until Square Root, start with a list of primes
// and Skip useless factors with a wheel
long long TD_SRPW(long long Cand) {
long long SPrimes[] = { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43,
47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113,
127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191,
193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263,
269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347,
349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421,
431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499,
503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593,
599, 601, 0 };
long long Wheel[] = { 6, 4, 2, 4, 2, 4, 6, 2, 0 };
long long Div;

Count = 0;
long long Top = sqrt(Cand);
// Check small primes
for (long long Ind = 0; SPrimes[Ind] != 0; Ind++) {
Div = SPrimes[Ind]; // for debug purpose
if (SPrimes[Ind] > Top) {
return Cand;
}
Count++;
if (Cand % SPrimes[Ind] == 0) {
return SPrimes[Ind];
}
}
// Start the Wheel
Div = 601;
while (Div <= Top) {
for (long long Ind = 0; Wheel[Ind] != 0; Ind++) {
if (Div > Top) {
return Cand;
}
Count++;
if (Cand % Div == 0) {
return Div;
}
Div += Wheel[Ind];
}
}
return Cand;
}```
dBase
```    //  Trial Division Square Root + Wheel + list of primes
// with a large list of primes
// May 2019
function TD_SRW2(Prod)
local D, Top, SPrimes, Wheel, W
// Check small primes
SPrimes= {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67,
71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139,
149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211,
223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293}
Wheel= {6, 4, 2, 4, 2, 4, 6, 2}

Top= int(sqrt(Prod))
for each D in SPrimes
if Top > D
return Prod
endif
if Prod % D = 0
return D
endif
next
// Start the wheel
D= 301
while D <= Top
for each W in wheel
D += W
if Top > D
return Prod
endif
if Prod % D = 0
return D
endif
next
enddo
return Prod```

Second Approach: Compress the List of Prime Numbers by Encoding in a Bitfield Array

Rather than a list of primes, the principle is to encode if a number is a prime or not, it is boolean information and fits in a single bit.

With this approach, the same optimizations than in the classic variants of Trial Division algorithm can apply and the saving is huge.

Bitfield of All Numbers

Very simple minded, 1 bit per number, or 8 numbers encoded in a byte.

The figures are as follows:

 Upper limit Number of primes Size of array 65,536 6,542 8,192 100,000 9,592 12,500 500,000 41,538 62,500 1,000,000 78,498 125,000 5,000,000 348,513 625,000 10,000,000 664,579 1,250,000 100,000,000 5,761,455 12,500,000, 2^32-1 4,294,967,295 203,280,221 536,870,912

Bitfield of All Odd Numbers

A little more optimized, 1 bit per odd number, or 16 numbers (8 odd numbers) encoded in a byte.

The figures are as shown below:

 Upper limit Number of primes Size of array 65,536 6,542 4,096 100,000 9,592 6,250 500,000 41,538 31,250 1,000,000 78,498 62,500 5,000,000 348,513 312,500 10,000,000 664,579 ,625,000 100,000,000 5,761,455 6,250,000, 2^32-1 4,294,967,295 203,280,221 268,435,456

Bitfield of All Numbers Filtered With the Wheel 30

Then really more optimized, with the wheel, only 8 possible primes per slice of 30, or 30 numbers (8 possible primes) encoded in a byte.

The figures are:

 Upper limit Number of primes Size of array 65,536 6,542 2,188 100,000 9,592 3,333 500,000 41,538 16,667 1,000,000 78,498 33,333 5,000,000 348,513 166,667 10,000,000 664,579 333,334 100,000,000 5,761,455 3,333,334, 2^32-1 4,294,967,295 203,280,221 143,165,577

Code exploiting the list of primes (stored in header file).

C++
```// Trial Division: Square Root + Long Prime list + Wheel
// Check all numbers until Square Root, start with a list of primes
// and Skip useless factors with a wheel
long long TD_SRLPW(long long Cand) {
long long WPrimes[] = { 2, 3, 5, 0 };
long long Wheel[] = { 6, 4, 2, 4, 2, 4, 6, 2, 0 };
// unsigned char EPrimes[] = { 0xfe, 0xdf, 0xef, 0x7e, 0xb6, 0xdb, 0x3d, ...
// Encoded list moved to header file

long long Div;

Count = 0;
long long Top = sqrt(Cand);
// Check small primes
for (long long Ind = 0; WPrimes[Ind] != 0; Ind++) {
Div = WPrimes[Ind]; // for debug purpose
if (WPrimes[Ind] > Top) {
return Cand;
}
Count++;
if (Cand % WPrimes[Ind] == 0) {
return WPrimes[Ind];
}
}
// start the wheel with encoded primes
Div = 1;
int EInd = 0;
while (Div <= Top) {
unsigned char Flag = EPrimes[EInd];
// if (Flag == 0) {
if (EInd >= EPrimesLen) {
break;
}
for (long long Ind = 0; Wheel[Ind] != 0; Ind++) {
if (Div > Top) {
break;
}
Count++;
if (Cand % Div == 0) {
return Div;
}
}
Div += Wheel[Ind];
}
EInd++;
}

// Start the Wheel after encoded primes
while (Div <= Top) {
for (long long Ind = 0; Wheel[Ind] != 0; Ind++) {
if (Div > Top) {
return Cand;
}
Count++;
if (Cand % Div == 0) {
return Div;
}
Div += Wheel[Ind];
}
}
return Cand;
}```

Encoding the List of Primes

The Wheel of 30 has 8 slots, a char have 8 bits, so both are in synch, it simplifies its usage. Each slot is assigned a bit. The value of a `char` is the sum of bits of slots which are Primes.

Here is how the list is encoded:

```0x01    0x02    0x04    0x08    0x10    0x20    0x40    0x80    code
7       11      13      17      19      23      29      0xfe
31      37      41      43      47              53      59      0xdf
61      67      71      73              79      83      89      0xef
97      101     103     107     109     113             0x7e
127     131             137     139             149     0xb6
151     157             163     167             173     179     0xdb
181             191     193     197     199                     0x3d
211                     223     227     229     233     239     0xf9
241             251             257             263     269     0xd5
271     277     281     283                     293             0x4f
307     311     313     317                             0x1e
331     337                     347     349     353     359     0xf3
367             373             379     383     389     0xea
397     401                     409             419     0xa6
421             431     433             439     443     449     0xed
457     461     463     467                     479     0x9e
487     491                     499     503     509     0xe6
521     523                                     0x0c
541     547                     557             563     569     0xd3
571     577                     587             593     599     0xd3```

The code showing how the list is encoded is as given below:

C++
```void TD_EncodeDtl(int Max) {
long long Wheel[] = { 6, 4, 2, 4, 2, 4, 6, 2, 0 };
long long Cand = 1;
cout << "0x01\t0x02\t0x04\t0x08\t0x10\t0x20\t0x40\t0x80\tcode" << endl;
while (Cand < Max) {
int Code = 0;
int Index = 1;
int Ind = 0;
do {
if ((TD_SRW(Cand) == Cand) && (Cand != 1)) {
Code |= Index;
cout << dec << Cand;
}
cout << "\t";
Cand += Wheel[Ind];
Index = Index << 1;
Ind++;
} while (Wheel[Ind] != 0);
cout << "0x" << setfill('0') << setw(2) << hex << Code << dec << endl;
}
}```

And the code to generate the array is as follows:

C++
```// encode Prime Numbers as array
void TD_EncodeArray(int Max) {
long long Wheel[] = { 6, 4, 2, 4, 2, 4, 6, 2, 0 };
long long Cand = 1;
cout << "Encodage liste nombres premiers compressée" << endl;
cout << "{";
while (Cand < Max) {
int Code = 0;
int Index = 1;
int Ind = 0;
do {
if ((TD_SRW(Cand) == Cand) && (Cand != 1)) {
Code |= Index;
}
Cand += Wheel[Ind];
Index = Index << 1;
Ind++;
} while (Wheel[Ind] != 0);
cout << "0x";
cout << setfill('0') << setw(2) << hex << Code << ",";
}
cout << "0}" << dec << endl;
}```

Benchmark

We are counting divisions/modulos to measure the efficiency of the variantes. The encoded list of Primes goes up to 1,000,000 and is stored in a 33kB array.

```Number				TD_SR	TD_SREF	TD_SRW	TD_SRLPW	Delta
1,009				30		16		11		11			0
2,003				43		22		14		14			0
3,001				53		27		17		16			1
4,001				62		32		19		18			1
5,003				69		35		20		19			1
6,007				76		39		23		21			2
7,001				82		42		25		23			2
8,009				88		45		26		24			2
9,001				93		47		27		24			3
10,007				99		50		28		25			3
20,011				140		71		40		34			6
30,011				172		87		49		40			9
40,009				199		100		56		46			10
49,999				222		112		62		48			14
1,000,003			999		500		268		168			100
4,000,021			1,800	901		483		279			204
9,000,011			2,999	1,500	802		430			372
16,000,057			3,999	2,000	1,068	550			518
25,000,009			4,999	2,500	1,336	669			667
36,000,007			5,999	3,000	1,602	783			819
49,000,027			6,999	3,500	1,868	900			968
64,000,031			7,999	4,000	2,136	1,007		1,129
81,000,001			8,999	4,500	2,402	1,117		1,285
100,000,007			9,999	5,000	2,668	1,229		1,439
121,000,003			10,999	5,500	2,936	1,335		1,601
144,000,001			11,999	6,000	3,202	1,438		1,764
169,000,033			12,999	6,500	3,468	1,547		1,921
196,000,001			13,999	7,000	3,736	1,652		2,084
225,000,011			14,999	7,500	4,002	1,754		2,248
256,000,001			15,999	8,000	4,268	1,862		2,406
289,000,001			16,999	8,500	4,536	1,960		2,576
10,000,000,019		99,999	50,000	26,668	9,592		17,076
1,000,000,000,039	999,999	500,000	266,668	78,498		188,170```

Delta is here to highlight the savings of the huge list of primes.

C++
```long long Test[] = { 101, 1009, 2003, 3001, 4001, 5003, 6007, 7001, 8009,
9001, 10007, 20011, 30011, 40009, 49999, 1000003, 4000021, 9000011,
16000057, 25000009, 36000007, 49000027, 64000031, 81000001,
100000007, 121000003, 144000001, 169000033, 196000001, 225000011,
256000001, 289000001, 10000000019, 1000000000039, 0 };
cout << "Comparaison Variantes" << endl;
cout << "Number\tTD_SR\tTD_SREF\tTD_SRW\tTD_SRLPW\tDelta" << endl;
// test after first 0
for (Ind = 0; Test[Ind] != 0; Ind++) {
cout << Test[Ind] << "\t";
TD_SR(Test[Ind]);
cout << Count << "\t";
TD_SREF(Test[Ind]);
cout << Count << "\t";
TD_SRW(Test[Ind]);
int C1 = Count;
cout << Count << "\t";
TD_SRLPW(Test[Ind]);
cout << Count << "\t";
cout << C1 - Count << endl;
}
cout << endl;
```

Accelerating IsPrime with the Compressed List of Primes

We can take advantage of huge list of Primes and use it as a loopup table.

With a plain list, we need to do a dichotomy search.

With an encoded list, a direct reading is possible.

C++
```// IsPrime is taking advantage of long list of primes table
int IsPrimeLP(long long Cand) {
if (Cand <= EPrimesMax) {
Count = 1;
// Cand is within list of primes
long long x = Cand / WSize; // position in wheel list
long long y = Cand % WSize; // offset in wheel
if (y == 1)
return (EPrimes[x] | 0x01);
else if (y == 7)
return (EPrimes[x] | 0x02);
else if (y == 11)
return (EPrimes[x] | 0x04);
else if (y == 13)
return (EPrimes[x] | 0x08);
else if (y == 17)
return (EPrimes[x] | 0x10);
else if (y == 19)
return (EPrimes[x] | 0x20);
else if (y == 23)
return (EPrimes[x] | 0x40);
else if (y == 29)
return (EPrimes[x] | 0x80);
else if (Cand == 2)
return 1;
else if (Cand == 3)
return 1;
else if (Cand == 5)
return 1;
else
return 0; // multiple of 2, 3 or 5
}
// Search a Factor
return (TD_SRLPW(Cand) == Cand);
}```

Points of Interest

Trial Division algorithm improves as the list of primes get bigger, and the compression helps a lot.

This work is original. If you know something similar, please drop a link in the forum below.

History

• 18th December, 2020: First version
• 25th December, 2020: Second version - Corrections with improvements
• 6th February, 2021: Added `IsPrime` code + corrections

Share

 Database Developer France
I am a professional programmer.
Problem analyse is certainly what I am best at.
My main programming expertise is in the xBase languages (dBase, Clipper, FoxPro, Harbour, xHarbour), then VBA for Excel and advanced Excel WorkBooks.

I also have knowledge on C/C++, d language, HTML, SVG, XML, XSLT, Javascript, PHP, BASICs, Python, COBOL, Assembly.
My personal interests goes to algorithm optimization and puzzles.

 First Prev Next
 What about quickly *calculating* whether a number > 2^48 is prime? Visual Herbert2-Apr-21 4:08 Visual Herbert 2-Apr-21 4:08
 Re: What about quickly *calculating* whether a number > 2^48 is prime? Patrice T2-Apr-21 7:20 Patrice T 2-Apr-21 7:20
 Re: What about quickly *calculating* whether a number > 2^48 is prime? Lee Carpenter5-Apr-21 9:17 Lee Carpenter 5-Apr-21 9:17
 Re: What about quickly *calculating* whether a number > 2^48 is prime? Patrice T5-Apr-21 10:22 Patrice T 5-Apr-21 10:22
 Question re fcn SRLPW ( ) Member 117206811-Jan-21 13:42 Member 11720681 1-Jan-21 13:42
 Re: Question re fcn SRLPW ( ) Patrice T1-Jan-21 14:43 Patrice T 1-Jan-21 14:43
 Gone too far - or not far enough? Stefan_Lang22-Dec-20 8:54 Stefan_Lang 22-Dec-20 8:54
 Re: Gone too far - or not far enough? Patrice T26-Dec-20 19:26 Patrice T 26-Dec-20 19:26
 functions TD_EncodeDtl ( ) and TD_EncodeArray() are 99% identical..Why do you need two functions ?? Member 1172068122-Dec-20 8:41 Member 11720681 22-Dec-20 8:41
 Re: functions TD_EncodeDtl ( ) and TD_EncodeArray() are 99% identical..Why do you need two functions ?? Patrice T22-Dec-20 10:56 Patrice T 22-Dec-20 10:56
 Re: functions TD_EncodeDtl ( ) and TD_EncodeArray() are 99% identical..Why do you need two functions ?? Member 1172068123-Dec-20 5:20 Member 11720681 23-Dec-20 5:20
 Re: functions TD_EncodeDtl ( ) and TD_EncodeArray() are 99% identical..Why do you need two functions ?? Patrice T23-Dec-20 5:50 Patrice T 23-Dec-20 5:50
 Re: functions TD_EncodeDtl ( ) and TD_EncodeArray() are 99% identical..Why do you need two functions ?? Member 1172068123-Dec-20 6:45 Member 11720681 23-Dec-20 6:45
 Re: functions TD_EncodeDtl ( ) and TD_EncodeArray() are 99% identical..Why do you need two functions ?? Patrice T23-Dec-20 7:16 Patrice T 23-Dec-20 7:16
 My vote of 4 Michael Thompson 12321-Dec-20 16:25 Michael Thompson 123 21-Dec-20 16:25
 Re: My vote of 4 Patrice T21-Dec-20 17:01 Patrice T 21-Dec-20 17:01
 Re: My vote of 4 Patrice T28-Dec-20 9:52 Patrice T 28-Dec-20 9:52
 Re: My vote of 4 Michael Thompson 12328-Dec-20 16:11 Michael Thompson 123 28-Dec-20 16:11
 My vote of 5 PhilipOakley21-Dec-20 4:38 PhilipOakley 21-Dec-20 4:38
 Re: My vote of 5 Patrice T21-Dec-20 5:14 Patrice T 21-Dec-20 5:14
 Last Visit: 31-Dec-99 18:00     Last Update: 3-Aug-21 13:07 Refresh 1