I am new to OpenMP. If anyone could suggest changes to make this program run faster, it would be very helpful. In this program, values.txt is a file containing about 60 crore (600 million) lines, each holding a 32-bit permutation of sixteen 0s and sixteen 1s.
Explanation:
It is hard to explain here, but I will try.
I have generated all 32-bit permutations of sixteen 0s and sixteen 1s, one per line, in a text file, values.txt.
E.g.:
00000000000000001111111111111111
00000000000000010111111111111111
00000000000000011011111111111111
00000000000000011101111111111111
00000000000000011110111111111111
00000000000000011111011111111111
00000000000000011111101111111111
00000000000000011111110111111111
00000000000000011111111011111111
and so on....
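As a side note, the same sequence can be produced in memory instead of being read from a file. The following is a minimal sketch using the bit trick commonly known as Gosper's hack; the function names `next_same_popcount` and `to_line` are made up for this illustration and are not part of my program. Starting from 0xFFFF, the first values it yields match the first lines listed above.

```cpp
#include <cstdint>
#include <string>

// Next larger integer with the same number of set bits (Gosper's hack).
// Precondition: x != 0. 64-bit arithmetic is used so that the step past
// the last 32-bit value does not overflow.
std::uint64_t next_same_popcount(std::uint64_t x)
{
    std::uint64_t c = x & (~x + 1);   // lowest set bit of x
    std::uint64_t r = x + c;          // carry ripples through the lowest run of 1s
    return (((x ^ r) >> 2) / c) | r;  // remaining 1s of that run move to the bottom
}

// Render the low 32 bits as a line in the same format as values.txt
// (most significant bit first).
std::string to_line(std::uint64_t x)
{
    std::string s(32, '0');
    for (int i = 0; i < 32; ++i)
        if ((x >> (31 - i)) & 1u)
            s[i] = '1';
    return s;
}
```

A loop of the form `for (std::uint64_t x = 0xFFFF; x < (1ULL << 32); x = next_same_popcount(x))` would then visit every 32-bit word with sixteen 1s in increasing order.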
Consider each line of the text file to be a Boolean function (a 32-entry truth table for 5 input bits).
I need to check whether this function is reversible on a domain.
To do this, I take the first line from the text file and store it in a 32x1 column matrix, a[][].
Inside the nested for loops I generate the domain values, each in the form of a 3x3 matrix, for which I need to check the reversibility of the function.
I created a 3x3 matrix g[][] that stores the binary representation of each number from 0 to 2^9 - 1. E.g.:
For 0, matrix g would be:
0 0 0
0 0 0
0 0 0
For 1, matrix g would be:
0 0 0
0 0 0
0 0 1
For 2, matrix g would be:
0 0 0
0 0 0
0 1 0
and so on up to 2^9 - 1.
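The mapping from a number to its g matrix can be sketched as below. This is only an illustration under the convention shown in the examples above (most significant bit in the top-left cell, least significant bit in the bottom-right cell); `Grid3` and `to_grid` are hypothetical names.

```cpp
#include <array>

using Grid3 = std::array<std::array<int, 3>, 3>;

// Spread the low 9 bits of n over a 3x3 grid in row-major order,
// most significant bit first, so bit 0 of n lands in g[2][2].
Grid3 to_grid(unsigned n)
{
    Grid3 g{};
    for (int i = 0; i < 9; ++i)
        g[i / 3][i % 3] = static_cast<int>((n >> (8 - i)) & 1u);
    return g;
}
```

With this convention, `to_grid(1)` sets only g[2][2] and `to_grid(2)` sets only g[2][1], as in the matrices written out above.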
For each matrix g generated above, I compute a new 3x3 matrix u[][] based on my function.
This is done by reading, for each element of the matrix, the element itself and its four adjacent values.
E.g., consider the g matrix to be:
0 0 0
0 1 1
1 0 0
I pick the first element, i.e., g[0][0], and compute a new value for it from five values: the one above, the one to the left, the element itself, the one to the right, and the one below, namely g[2][0], g[0][2], g[0][0], g[0][1], g[1][0]. (Neighbours wrap around at the edges, which is why the value above g[0][0] is g[2][0].) Read in that order, these five values form a binary number. I calculate its decimal equivalent, which gives the row of matrix a[][] whose value I store in u[0][0].
I repeat this process for every element of g and end up with a 3x3 matrix u.
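The update described above can be written with modular indexing instead of nine hand-written index lists. This is a minimal sketch, assuming the bit order top, left, self, right, below given above; `Grid3`, `Table` and `apply_rule` are names invented for the illustration.

```cpp
#include <array>

using Grid3 = std::array<std::array<int, 3>, 3>;
using Table = std::array<int, 32>;   // a[k] = output of the function for 5-bit input k

// Apply the 5-neighbour rule to every cell of g, wrapping around at the edges.
Grid3 apply_rule(const Grid3& g, const Table& a)
{
    Grid3 u{};
    for (int r = 0; r < 3; ++r) {
        for (int c = 0; c < 3; ++c) {
            int top   = g[(r + 2) % 3][c];   // (r - 1) mod 3
            int left  = g[r][(c + 2) % 3];   // (c - 1) mod 3
            int self  = g[r][c];
            int right = g[r][(c + 1) % 3];
            int below = g[(r + 1) % 3][c];
            int k = 16 * top + 8 * left + 4 * self + 2 * right + below;
            u[r][c] = a[k];
        }
    }
    return u;
}
```

For the example matrix above, cell g[0][0] reads the bits 1, 0, 0, 0, 0, i.e. binary 10000, so u[0][0] takes the value in row 16 of a. The same loop would extend to a 16x8 grid by replacing the two 3s with the grid dimensions.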
This complete process was for one matrix, namely the matrix corresponding to 0.
In the same way, for every g[][] matrix from 0 to 2^9 - 1, I create a u[][] matrix, 2^9 matrices in total.
If at any point two different g[][] matrices give the same u[][] matrix, I abort the check for this function, read the next line of the text file, and begin the process again, i.e., I am not interested in functions that produce duplicate matrices. If all 2^9 matrices are different, I write the corresponding function (the line from the text file) to another text file.
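The whole check for one function can be sketched as a single routine. Note that this is not how my program below does it: the sketch packs each u matrix into a 9-bit integer and records it in a `std::bitset<512>` rather than inserting strings into a set, which is one possible way to make the duplicate test cheaper. The name `is_reversible` is hypothetical, and the line is assumed to have at least 32 characters.

```cpp
#include <bitset>
#include <string>

// Returns true when the 32-character truth table `line` maps all 512
// 3x3 inputs to 512 distinct 3x3 outputs. Assumes line.size() >= 32.
bool is_reversible(const std::string& line)
{
    std::bitset<512> seen;   // seen[v] is set once output matrix v has occurred
    for (unsigned n = 0; n < 512; ++n) {
        // Cell (rr, cc) of the input matrix encoded by n, top-left = bit 8.
        auto cell = [n](int rr, int cc) { return (n >> (8 - (3 * rr + cc))) & 1u; };
        unsigned out = 0;    // output matrix packed into 9 bits, row-major
        for (int i = 0; i < 9; ++i) {
            int r = i / 3, c = i % 3;
            unsigned k = 16 * cell((r + 2) % 3, c) + 8 * cell(r, (c + 2) % 3)
                       + 4 * cell(r, c) + 2 * cell(r, (c + 1) % 3)
                       + cell((r + 1) % 3, c);
            out = (out << 1) | static_cast<unsigned>(line[k] - '0');
        }
        if (seen.test(out))
            return false;    // duplicate output: stop early
        seen.set(out);
    }
    return true;
}
```

For instance, the first line of values.txt copies the upper neighbour into each cell, which only shifts the matrix, so the sketch reports it as reversible.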
So, summing up, I need to create a total of 60 crore * 2^9 matrices for the overall computation.
The thing is that, for a particular function from the text file, the 2^9 matrices are calculated one after another. If I could somehow parallelize them, I would reduce the computation time greatly, and that is where I need help.
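One arrangement that might be worth trying, offered as an untested sketch rather than a confirmed fix: since the check for a single function is small and often stops early, the work could be split across functions (lines) instead of across the 2^9 matrices of one function. The helper `filter_batch` and its `pred` parameter are hypothetical; `pred` stands for whatever per-line reversibility check is used and must be safe to call from several threads at once.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Evaluate pred on every line of a batch, possibly in parallel, and return
// the lines that pass, in their original order.
template <typename Pred>
std::vector<std::string> filter_batch(const std::vector<std::string>& lines, Pred pred)
{
    // One char per line: each thread writes only its own slots.
    // (std::vector<bool> would pack bits together and is not safe here.)
    std::vector<char> keep(lines.size(), 0);
    const long n = static_cast<long>(lines.size());

    // If the compiler is not run with OpenMP enabled, the pragma is ignored
    // and the loop simply runs serially.
    #pragma omp parallel for schedule(dynamic, 1024)
    for (long i = 0; i < n; ++i)
        keep[i] = pred(lines[i]) ? 1 : 0;

    // Collect results serially so the output order matches the input order.
    std::vector<std::string> out;
    for (long i = 0; i < n; ++i)
        if (keep[i])
            out.push_back(lines[i]);
    return out;
}
```

The idea would be to read a batch of lines from values.txt, call `filter_batch` on it, write the returned lines to the output file, and repeat, so that file reading and writing stay in one thread.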
I hope the method is clear.
Also, I used three nested loops because this is a sample program. In the actual program I need to calculate not 2^9 matrices but a total of 2^128. My actual g matrix would be 16x8, and so would the u matrix. Since there is no built-in 128-bit data type, and OpenMP only supports the standard integer data types, I thought of using nested loops.
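For what it is worth, a 128-bit counter does not strictly need nested loops. The sketch below builds one from two 64-bit halves; `Counter128` is a hypothetical helper type, not something from my program. Boost.Multiprecision, which the program already includes, also provides a `uint128_t` type that could play the same role. Keep in mind that 2^128 is roughly 3.4 x 10^38, so a loop that really visits every value would not finish in any practical time.

```cpp
#include <cstdint>

// A 128-bit counter built from two 64-bit halves (hypothetical helper type).
struct Counter128 {
    std::uint64_t hi = 0;
    std::uint64_t lo = 0;

    // Add one; returns false when the counter wraps around to zero.
    bool increment()
    {
        if (++lo != 0)
            return true;     // no carry out of the low half
        return ++hi != 0;    // carry into the high half
    }

    // Bit i of the counter, where i = 0 is the least significant bit.
    int bit(int i) const
    {
        return i < 64 ? static_cast<int>((lo >> i) & 1u)
                      : static_cast<int>((hi >> (i - 64)) & 1u);
    }
};
```

Each of the 128 cells of a 16x8 matrix could then be read with `bit(i)`, in the same way the sample program reads the 9 bits of f.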
Updated program:
#include <algorithm>
#include <fstream>
#include <iostream>
#include <string>
#include <math.h>
using namespace std;
#include <boost/multiprecision/cpp_int.hpp>
using namespace boost::multiprecision;
#include <boost/lexical_cast.hpp>
#include <cctype>
#include <boost/assign/list_of.hpp>
#include <set>
#include <stdint.h>
#include <omp.h>
#define convertToString(x) #x
using namespace boost::assign;
int main()
{
    ifstream infile("values.txt");
    ofstream outfile("haha.txt");
    short a[32][1];                 // truth table: a[k][0] is the output for 5-bit input k
    string STRING;
    // Loop on the read itself instead of eof(): eof() only becomes true after
    // a read hits the end of the file, so with a trailing newline the old loop
    // ran once more on an empty string and STRING.at(0) threw.
    while (getline(infile, STRING))
    {
        if (STRING.size() < 32)
        {
            continue;               // skip blank or truncated lines
        }
        set<string> SET;            // u matrices seen so far for this function
        int count = 0;              // set to 1 when a duplicate u matrix appears
        for (int i = 0; i < 32; i++)
        {
            a[i][0] = STRING.at(i) - '0';
        }
        int g[9];                   // 3x3 input matrix, row-major
        int u[9];                   // 3x3 output matrix, row-major
        char buffer[10];
        buffer[9] = 0;
        uint16_t f = 0;             // its low 9 bits are the current g matrix
        int max = 1 << 3;           // 2^3 per loop level, 2^9 matrices in total
        for (int r = 0; r < max && count != 1; r++)
        {
            for (int s = 0; s < max && count != 1; s++)
            {
                for (int t = 0; t < max && count != 1; t++)
                {
                    for (int i = 0; i < 9; ++i)
                    {
                        g[i] = (f & (1 << (8 - i))) != 0;
                    }
                    ++f;
                    // Row of a: 16*top + 8*left + 4*self + 2*right + below,
                    // with neighbours wrapping around the 3x3 grid.
                    u[0] = a[16*g[6] + 8*g[2] + 4*g[0] + 2*g[1] + g[3]][0];
                    u[1] = a[16*g[7] + 8*g[0] + 4*g[1] + 2*g[2] + g[4]][0];
                    u[2] = a[16*g[8] + 8*g[1] + 4*g[2] + 2*g[0] + g[5]][0];
                    u[3] = a[16*g[0] + 8*g[5] + 4*g[3] + 2*g[4] + g[6]][0];
                    u[4] = a[16*g[1] + 8*g[3] + 4*g[4] + 2*g[5] + g[7]][0];
                    u[5] = a[16*g[2] + 8*g[4] + 4*g[5] + 2*g[3] + g[8]][0];
                    u[6] = a[16*g[3] + 8*g[8] + 4*g[6] + 2*g[7] + g[0]][0];
                    u[7] = a[16*g[4] + 8*g[6] + 4*g[7] + 2*g[8] + g[1]][0];
                    u[8] = a[16*g[5] + 8*g[7] + 4*g[8] + 2*g[6] + g[2]][0];
                    for (int i = 0; i < 9; ++i)
                    {
                        buffer[i] = '0' + u[i];
                    }
                    // insert() reports false in .second when u was already present.
                    if (!SET.insert(string(buffer)).second)
                    {
                        count = 1;
                    }
                }
            }
        }
        if (count == 0)
        {
            outfile << STRING << "\n";
            cout << STRING << "\n";
        }
    }
    infile.close();
    outfile.close();
    return 0;
}
What I have tried:
I have tried many ways of parallelizing this code and ended up with some modifications, but I still feel I am not getting the timing I want.