Inner Product Experiment: C# vs. C/C++

20 May 2008 | GPL3 | 2 min read
This article demonstrates the speed of the inner product operation performed with shorts, ints, longs, floats, doubles and decimals in C#, compared to C/C++.

Introduction

The inner product (also called the dot product or scalar product) is one of the central operations in digital signal processing. It is used everywhere: Fourier transforms (FFT, DCT), wavelet analysis, filtering operations and so on. After writing a similar article on the inner product in C/C++, Inner Product Experiment: CPU, FPU vs. SSE*, I wondered how the same code would perform when written in C#. I repeated the inner product operations using the C# types short, int, long, float, double and decimal.

Background

Inner Product Experiment: CPU, FPU vs. SSE*

Using the code

Just run inner.exe, passing as its argument the size of the vectors whose inner product you want to compute. Make sure timer.dll is in the same directory as the executable. It provides the tic() and toc() functions, which implement a precision time counter in milliseconds. The DLL is wrapped by the static PerformanceCounter class in the functions PerformanceCounter.Tic() and PerformanceCounter.Toc().

C#
static public class PerformanceCounter
{        
//Constructors

//Enums, Structs, Classes

//Properties

//Methods
//operators
//operations
        [DllImport("timer")]
        static extern void tic();
        [DllImport("timer")]
        static extern long toc(); 

        static public long Tic()
        {
                try
                {
                        tic();
                        return 0;
                }
                catch (Exception e)
                {
                        Console.WriteLine(String.Format("PerformanceCounter.Tic() {0}", e.Message));
                        return -1;
                }
        }

        static public long Toc()
        {
                try
                {                        
                        return toc();
                }
                catch (Exception e)
                {
                        Console.WriteLine(String.Format("PerformanceCounter.Toc() {0}", e.Message));
                        return -1;
                }
        }

//access
//inquiry

//Fields       
}
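
For readers who do not have timer.dll at hand, the same Tic()/Toc() pattern could be sketched with System.Diagnostics.Stopwatch, which is part of .NET 2.0 and later. This is not the wrapper used for the measurements in this article; the class name StopwatchCounter and the demo class are hypothetical and only illustrate the idea.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical alternative to the timer.dll wrapper (an assumption, not the article's code).
static class StopwatchCounter
{
    static readonly Stopwatch watch = new Stopwatch();

    // Restart the timer; plays the role of PerformanceCounter.Tic().
    public static void Tic()
    {
        watch.Reset();
        watch.Start();
    }

    // Whole milliseconds elapsed since the last Tic(); plays the role of PerformanceCounter.Toc().
    public static long Toc()
    {
        return watch.ElapsedMilliseconds;
    }
}

class StopwatchDemo
{
    static void Main()
    {
        StopwatchCounter.Tic();
        Thread.Sleep(50);   // stand-in for the work being measured
        Console.WriteLine(" slept: {0} ms", StopwatchCounter.Toc());
    }
}
```

Because Stopwatch needs no native DLL, the try/catch around the P/Invoke calls is not needed in this variant.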

The body of the console program is shown below. Only the doubles function is included here to save space:

C#
class Program
{
        static int size = 1000000;

        static void Main(string[] args)
        {
                try
                {
                        if (args.Length >= 1)
                                size = (int)Convert.ToUInt32(args[0]);
                }
                catch (Exception e)
                {
                        Console.WriteLine(String.Format("Can not convert {0} to uint32: {1}", args[0], e.Message));
                        size = 1000000;
                }

                shorts();
                ints();
                longs();
                floats();
                doubles();
                decimals();
        }

        //...
        
        static void doubles()
        {
                double[] a = new double[size];
                double[] b = new double[size];

                Random rnd = new Random();
                for (int i = 0; i < size; i++)
                {
                        a[i] = rnd.NextDouble() - 0.5;
                        b[i] = rnd.NextDouble() - 0.5;
                }

                PerformanceCounter.Tic();

                double c = 0.0;
                for (int i = 0; i < size; i++)
                        c += a[i] * b[i];

                Console.WriteLine(String.Format(" doubles: {0} ms", PerformanceCounter.Toc()));

                a = null;
                b = null;
        }

        //...

Below is an example of the console output for 5,000,000-dimensional vectors.

C#
>inner.exe 5000000
 shorts: 16 ms
 ints: 7 ms
 longs: 69 ms
 floats: 9 ms
 doubles: 9 ms
 decimals: 2569 ms

I was actually stunned to see floats and doubles in C# performing 1.3 to 3.3 times faster than in C/C++, even with SSE optimization. It should not be so: the code is managed and compiled at run time, and it runs on the same CPU/FPU, so how can it possibly be faster? If you know the answer, post it here. See the Inner Product Experiment: CPU, FPU vs. SSE* article for the performance times of the corresponding numeric types in C/C++.

Ints perform a little faster than floats, but the gain may not justify quantizing floats to fixed-point arithmetic, and here again C# outperforms C/C++, running 2.28 times faster. Shorts and longs, however, run quite slowly. Shorts in C# perform as fast as in C/C++, but SSE2 intrinsics still outperform C#. Avoid decimals unless you need high precision after the decimal point; otherwise the computation will take forever.

With all those amenities in C# programming, should we not migrate DSP applications from C++?

Update (7 Apr 2008)

Sadly for C# adherents, and to the great delight of C++ gurus, the labours we spent on C/C++ were not in vain after all. The C# compiler does indeed optimize the code so that computations whose results are never used are somehow avoided, and that led me astray. To restore the tarnished glory of C++, here is an example of the C# output for 5,000,000-element vectors, with each computed result printed under its timing:

C#
>inner.exe 5000000
 shorts: 16 ms 
  27006 
 ints: 18 ms 
  1240761 
 longs: 72 ms 
  -5610477 
 floats: 30 ms 
  33,548 
 doubles: 35 ms 
  198,949191315363 
 decimals: 2936 ms 
  138,23876271661179995948054686

It still leaves some room for dispute, however, as to why the unused for() loops were not removed for shorts and longs. Doubles run slower than floats, in contrast to C++, where doubles outperform floats.
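
The pitfall described in this update can be avoided by making the result of the loop observable. The following is a minimal sketch under stated assumptions, not the code shipped with the article: the class name LiveResultBenchmark and the method Dot are hypothetical, and System.Diagnostics.Stopwatch is used in place of timer.dll.

```csharp
using System;
using System.Diagnostics;

static class LiveResultBenchmark
{
    // Plain inner product. The sum is returned so that the caller can use it.
    public static double Dot(double[] a, double[] b)
    {
        double c = 0.0;
        for (int i = 0; i < a.Length; i++)
            c += a[i] * b[i];
        return c;
    }

    static void Main(string[] args)
    {
        int size = 1000000;
        int parsed;
        if (args.Length >= 1 && int.TryParse(args[0], out parsed) && parsed > 0)
            size = parsed;

        double[] a = new double[size];
        double[] b = new double[size];
        Random rnd = new Random();
        for (int i = 0; i < size; i++)
        {
            a[i] = rnd.NextDouble() - 0.5;
            b[i] = rnd.NextDouble() - 0.5;
        }

        Stopwatch sw = Stopwatch.StartNew();
        double c = Dot(a, b);
        sw.Stop();

        Console.WriteLine(" doubles: {0} ms", sw.ElapsedMilliseconds);
        // Printing the sum means the loop has a visible effect and cannot be dropped as unused.
        Console.WriteLine("  {0}", c);
    }
}
```

The timer is stopped before the result is printed, so console output does not count toward the measured time.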

Update (6 May 2008)

Unfolding (unrolling) the for() loops did provide a speed-up, but only when unfolding 4 times. The same trick did not increase performance in the C++ code. This is how I did the unfolding:

C#
...
float c = 0.0f;
int ii = 0;
for (int i = 0; i < size / 4; i++)
{
        c += a[ii] * b[ii];
        ii++;
        c += a[ii] * b[ii];
        ii++;
        c += a[ii] * b[ii];
        ii++;
        c += a[ii] * b[ii];
        ii++;
}
...

And the results are shown below:

C#
>inner.exe 5000000
 shorts: 16 ms
  -24687
 shorts 4loop: 14 ms
  7038
 ints: 18 ms
  19686
 ints 4loop: 16 ms
  9090795
 longs: 71 ms
  -870676
 longs 4loop: 75 ms
  -8263341
 floats: 32 ms
  43,41741
 floats 4loop: 15 ms
  11,02298
 doubles: 34 ms
  194,810329249757
 doubles 4loop: 24 ms
  -495,312642682424
 doubles unsafe: 32 ms
  -283,031436372233
 decimals: 2550 ms
  368,82465505657333076693624932
 decimals 4loop: 2611 ms
  -50,405825071718589646106671809
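
Note that the unfolded loop shown above visits only size / 4 groups of four elements, so it covers every element only when size is a multiple of 4 (as 5000000 is). A self-contained sketch that also handles the leftover elements might look like this; the class name Unrolled and the method Dot4 are hypothetical and are not taken from the article's source.

```csharp
using System;

static class Unrolled
{
    // Inner product with the loop body unfolded 4 times, plus a tail loop
    // for the 0 to 3 elements left over when the length is not a multiple of 4.
    public static float Dot4(float[] a, float[] b)
    {
        int n = a.Length;
        int limit = n - (n % 4);   // largest multiple of 4 not exceeding n
        float c = 0.0f;
        int i = 0;

        for (; i < limit; i += 4)
        {
            c += a[i] * b[i];
            c += a[i + 1] * b[i + 1];
            c += a[i + 2] * b[i + 2];
            c += a[i + 3] * b[i + 3];
        }

        // Tail: remaining elements, processed one at a time.
        for (; i < n; i++)
            c += a[i] * b[i];

        return c;
    }

    static void Main()
    {
        float[] a = { 1f, 2f, 3f, 4f, 5f, 6f };
        float[] b = { 1f, 1f, 1f, 1f, 1f, 1f };
        Console.WriteLine(Dot4(a, b));   // prints 21
    }
}
```

The additions happen in the same order as in the simple loop, so the floating-point result should match the non-unfolded version for the same input.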

License

This article, along with any associated source code and files, is licensed under The GNU General Public License (GPLv3).


Written By
Engineer
Russian Federation
Highly skilled engineer with 14 years of experience in academia, R&D and commercial product development, supporting the full software life cycle from idea to implementation and further support. During my academic career I succeeded in the MIT Computers in Cardiology 2006 international challenge; as an R&D and software engineer I gained CodeProject MVP status and found algorithmic solutions that quickly resolved tough customer problems and met product requirements under tight deadlines. My key areas of expertise include object-oriented analysis and design (OOAD), OOP, machine learning, natural language processing, face recognition, computer vision and image processing, wavelet analysis, and digital signal processing in cardiology.

Comments and Discussions

 
Re: Stephen Hewitt and reinux code results. Look here anyone please before blaiming my article
Marcin Śmiałek, 2-Oct-08 21:51
Interesting article. And I have some thoughts to share... In fact, that's quite a lot of text, so if you don't feel like reading everything, just scroll to the bottom.

If you want to be certain about the results, please ensure that you have AMD CPU drivers installed.
I remember that when I was evaluating CUDA performance on an AMD machine (AMD X2, XP 32-bit), I was getting weird, unpredictable results, including negative running times. Of course that's just wrong.

When I installed the drivers and the dual core optimizer, which I downloaded from the AMD site, the results became more reliable. The variance decreased a lot and I finally started to obtain the expected running time readouts. For timing in .NET 1.1, I was using a simple wrapper around DllImport and the QueryPerformanceCounter & QueryPerformanceFrequency combo; in .NET 2.0 and later there's System.Diagnostics.Stopwatch, which does more or less the same.

Another thing that may increase precision is calling Thread.Sleep just before the benchmark. Even Thread.Sleep(0) should help. The program gets CPU access for some time, and after that period another process may get the CPU while your program is suspended. Calling Thread.Sleep makes your test more likely to start right at the beginning of this "CPU is yours" time interval. You can also set the thread priority to high or even real time to further decrease the probability that something interrupts your test. It will only decrease the probability (and in fact the frequency) of your program getting interrupted. The number of loop iterations has to be high enough that small delays in CPU access won't disturb the results too much. These might be other programs wanting something to do, I/O, etc. With a significant loop count, you can also forget about timer calibration.

In the case of multi-core processors, you might also consider using an affinity lock. Core switching slows things down considerably and may result in erroneous benchmarks, especially with such short running times.

Finally: CPU optimizations. It's not uncommon that if the compiler finds that some computations don't depend on each other, they're modified to be done in parallel. If you're doing some vector or matrix operations, then such an optimization would be present in both the real-world program and your benchmark, but if you compare a heavily optimized test to a program that runs sequentially, the results may differ a lot. Obviously, the best practice is to try to make the test mimic the production environment. It's always recommended to check the executable with Reflector. It won't tell you exactly what the machine code will look like, but it's still really useful.

People often blame .NET for being slow, while usually it's just high startup time and a poorly written program with tons of functions that appear to be simple but in fact are not.

To sum things up:
Install AMD drivers (& dual core optimizer), affinity lock, high/rt priority, Thread.Sleep, huge loop count, think about optimizations
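
The checklist above could be sketched in C# roughly as follows. This is only an illustration of the suggestions in this comment: the class name BenchmarkSetup and its methods are hypothetical, pinning to the first logical core is an arbitrary choice, and the priority and affinity calls can be refused by the operating system, so every step is treated as best-effort.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class BenchmarkSetup
{
    // Best-effort tuning before a measurement; returns false if the platform refused a step.
    public static bool Prepare()
    {
        bool ok = true;
        try
        {
            Process self = Process.GetCurrentProcess();
            self.PriorityClass = ProcessPriorityClass.High;          // "high", not real time
            self.ProcessorAffinity = (IntPtr)1;                      // affinity lock: first logical core only
            Thread.CurrentThread.Priority = ThreadPriority.Highest;
        }
        catch (Exception e)
        {
            Console.WriteLine("Could not adjust scheduling: {0}", e.Message);
            ok = false;
        }
        Thread.Sleep(0);   // yield once before timing starts, as suggested above
        return ok;
    }

    // Times a loop with many iterations so that small scheduling delays matter less.
    public static long TimeLoop(int iterations)
    {
        double sink = 0.0;
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            sink += i * 0.5;
        sw.Stop();
        Console.WriteLine("  {0}", sink);   // use the result so the loop is not unused work
        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        Prepare();
        Console.WriteLine(" loop: {0} ms", TimeLoop(50000000));
    }
}
```

Real-time priority is deliberately left out of the sketch, since a busy real-time process can starve the rest of the system.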