I am optimizing a naive version of matrix multiplication with OpenMP, SIMD and loop reordering. The following code is my attempt. My question is how I can modify the code to avoid the expensive memory write, i.e. the store in the innermost loop.

What I have tried:

C++
#include <immintrin.h>

// Row-major matrix: element (r, c) is data[r * cols + c]
typedef struct matrix{
    int rows;
    int cols;
    double* data;
}matrix;

// result += mat1 * mat2 (result must be zero-initialized by the caller)
void mul_matrix(matrix *result, matrix *mat1, matrix *mat2){
    int I = mat1->rows;
    int J = mat2->cols;
    int K = mat2->rows;   // must equal mat1->cols
    #pragma omp parallel for
    for(int i = 0; i < I; i++){
        for(int k = 0; k < K; k++){
            // broadcast mat1[i][k] into all four lanes
            __m256d vA = _mm256_set1_pd(mat1->data[i * K + k]);
            for(int j = 0; j < J / 4 * 4; j += 4){
                __m256d sum = _mm256_loadu_pd(result->data + i * J + j);
                __m256d vB = _mm256_loadu_pd(mat2->data + k * J + j);
                __m256d intermediate = _mm256_mul_pd(vA, vB);
                sum = _mm256_add_pd(sum, intermediate);
                // this load/store of result is repeated for every k
                _mm256_storeu_pd(result->data + i * J + j, sum);
            }
            // scalar tail for the last J % 4 columns
            for(int x = J / 4 * 4; x < J; x++){
                result->data[i * J + x] += mat1->data[i * K + k] * mat2->data[k * J + x];
            }
        }
    }
}
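One common way to take the store out of the innermost loop is to reorder the loops as i, then a 4-column tile of j, then k. The running sum for a tile then stays in a register for the whole k loop and is written to result only once. The sketch below is not the poster's code or the answer's method; it assumes the same row-major struct, and the name mul_matrix_store_once is hypothetical. The AVX path is compiled only when __AVX__ is defined (for example with -mavx); otherwise a plain scalar path with four local accumulators is used. It overwrites result instead of adding to it, so the caller does not need to zero it.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Same layout as in the question: row-major, rows x cols doubles.
typedef struct matrix {
    int rows;
    int cols;
    double *data;
} matrix;

// Hypothetical name. result = mat1 * mat2 (overwrites result).
// Loop order i -> j tile (4 columns) -> k: the partial sums of one tile
// stay in registers across the whole k loop and are stored once.
void mul_matrix_store_once(matrix *result, const matrix *mat1, const matrix *mat2) {
    assert(mat1->cols == mat2->rows);
    const int I = mat1->rows;
    const int J = mat2->cols;
    const int K = mat2->rows;
    const int J4 = J / 4 * 4;
    #pragma omp parallel for
    for (int i = 0; i < I; i++) {
        const double *a = mat1->data + (std::size_t)i * K;  // row i of mat1
        double *r = result->data + (std::size_t)i * J;      // row i of result
        for (int j = 0; j < J4; j += 4) {
#if defined(__AVX__)
            __m256d sum = _mm256_setzero_pd();
            for (int k = 0; k < K; k++) {
                __m256d vA = _mm256_set1_pd(a[k]);
                __m256d vB = _mm256_loadu_pd(mat2->data + (std::size_t)k * J + j);
                sum = _mm256_add_pd(sum, _mm256_mul_pd(vA, vB));
            }
            _mm256_storeu_pd(r + j, sum);  // single store per tile
#else
            double s0 = 0, s1 = 0, s2 = 0, s3 = 0;  // scalar "register tile"
            for (int k = 0; k < K; k++) {
                const double *b = mat2->data + (std::size_t)k * J + j;
                s0 += a[k] * b[0]; s1 += a[k] * b[1];
                s2 += a[k] * b[2]; s3 += a[k] * b[3];
            }
            r[j] = s0; r[j + 1] = s1; r[j + 2] = s2; r[j + 3] = s3;
#endif
        }
        // leftover J % 4 columns, also accumulated locally and stored once
        for (int j = J4; j < J; j++) {
            double s = 0;
            for (int k = 0; k < K; k++) s += a[k] * mat2->data[(std::size_t)k * J + j];
            r[j] = s;
        }
    }
}
```

A trade-off of this order is that each k step reads a different row of mat2, so for large matrices practical kernels usually also block over k and compute several rows or tiles at once to keep the data in cache.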
Updated 9-Aug-20 7:26am

1 solution

Check that you only calculate what you need, move ANY constant value out of the loops, and use const where possible.

For example, in
C++
for(int x = J / 4 * 4; x < J; x++){
  result->data[i * J + x] += mat1 -> data[i * K + k] * mat2 -> data[k * J + x];
}
the values that do not depend on x can be computed once, before the loop:
C++
const int J4 = J / 4 * 4;
double *mr = result->data + i * J;          // row i of result
const double m1 = mat1->data[i * K + k];    // same value for every x
const double *m2 = mat2->data + k * J;      // row k of mat2

for(int x = J4; x < J; x++){
     mr[x] += m1 * m2[x];
}
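Applying the same advice to the whole function gives something like the following minimal sketch (mul_matrix_hoisted is a hypothetical name). It keeps the question's i-k-j order but computes the row pointers and mat1[i][k] once per loop level, so the innermost loop is a plain contiguous update that the compiler can auto-vectorize. __restrict is a compiler extension (supported by GCC, Clang and MSVC), used here on the assumption that the result rows never overlap the input rows. Each result row is zeroed first, so the caller does not have to.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Same layout as in the question: row-major, rows x cols doubles.
typedef struct matrix {
    int rows;
    int cols;
    double *data;
} matrix;

// Hypothetical name. result = mat1 * mat2, i-k-j order, invariants hoisted.
void mul_matrix_hoisted(matrix *result, const matrix *mat1, const matrix *mat2) {
    assert(mat1->cols == mat2->rows);
    const int I = mat1->rows;
    const int J = mat2->cols;
    const int K = mat2->rows;
    const double *A = mat1->data;
    const double *B = mat2->data;
    double *C = result->data;
    #pragma omp parallel for
    for (int i = 0; i < I; i++) {
        double *__restrict ci = C + (std::size_t)i * J;   // row i of result
        const double *ai = A + (std::size_t)i * K;        // row i of mat1
        std::memset(ci, 0, sizeof(double) * (std::size_t)J);
        for (int k = 0; k < K; k++) {
            const double aik = ai[k];                              // constant in the j loop
            const double *__restrict bk = B + (std::size_t)k * J;  // row k of mat2
            for (int j = 0; j < J; j++)
                ci[j] += aik * bk[j];  // contiguous, vectorizable update
        }
    }
}
```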
A more advanced step is to generate the assembly code (for example with the -S flag in GCC or Clang) and check whether it contains unexpected operations such as type conversions or unaligned data accesses.
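On the alignment point: the question uses the unaligned _mm256_loadu_pd/_mm256_storeu_pd, while the aligned _mm256_load_pd requires a 32-byte boundary. The sketch below shows a hypothetical helper (alloc_matrix, free_matrix and row_is_aligned are illustrative names, not part of the question) that allocates matrix storage on a 32-byte boundary with C++17 std::aligned_alloc, plus a check of whether a given row starts aligned. Only the base pointer is guaranteed aligned; a row starts on a 32-byte boundary only when cols is a multiple of 4.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Same layout as in the question: row-major, rows x cols doubles.
typedef struct matrix {
    int rows;
    int cols;
    double *data;
} matrix;

// Hypothetical helper: zero-filled storage whose base address is 32-byte
// aligned (the width of one __m256d). std::aligned_alloc requires the size
// to be a multiple of the alignment, so the byte count is rounded up.
bool alloc_matrix(matrix *m, int rows, int cols) {
    assert(rows >= 0 && cols >= 0);
    const std::size_t align = 32;
    std::size_t bytes = sizeof(double) * (std::size_t)rows * (std::size_t)cols;
    bytes = (bytes + align - 1) / align * align;
    if (bytes == 0) bytes = align;
    m->data = static_cast<double *>(std::aligned_alloc(align, bytes));
    if (m->data == nullptr) return false;
    for (std::size_t n = 0; n < bytes / sizeof(double); n++) m->data[n] = 0.0;
    m->rows = rows;
    m->cols = cols;
    return true;
}

void free_matrix(matrix *m) {
    std::free(m->data);  // memory from aligned_alloc is released with free
    m->data = nullptr;
}

// True when row r starts on a 32-byte boundary, i.e. when aligned loads
// such as _mm256_load_pd could be used at the start of that row.
bool row_is_aligned(const matrix *m, int r) {
    auto addr = reinterpret_cast<std::uintptr_t>(m->data + (std::size_t)r * m->cols);
    return addr % 32 == 0;
}
```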
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


