
CUDA

Great Reads

by CodeProject
Version 2.6.2. Our fast, free, self-hosted Artificial Intelligence Server for any platform, any language
by Carlos Jiménez de Parga
A reusable Visual C++ framework for real-time volumetric cloud rendering, animation and morphing
by Maxim Kartavenkov
This article describes how to make an H.264 video encoder DirectShow filter in C# using the NVIDIA encoder API
by Nick Kopp
This article builds on the earlier "High Performance Queries: GPU vs. PLINQ vs. LINQ", ports it to also support OpenCL devices, and adds benchmarking so you can easily compare performance.

Latest Articles

by CodeProject
Version 2.6.2. Our fast, free, self-hosted Artificial Intelligence Server for any platform, any language
by Robert Mueller-Albrecht
Using the Intel® oneAPI Math Kernel Library SYCL API
by Ryan Scott White
CudaPAD is a PTX/SASS viewer for NVIDIA CUDA kernels and provides an on-the-fly view of your CUDA code.
by Carlos Jiménez de Parga
A reusable Visual C++ framework for real-time volumetric cloud rendering, animation and morphing

All Articles

29 Feb 2024 by CodeProject
Version 2.6.2. Our fast, free, self-hosted Artificial Intelligence Server for any platform, any language
3 Apr 2022 by Carlos Jiménez de Parga
A reusable Visual C++ framework for real-time volumetric cloud rendering, animation and morphing
16 Jul 2012 by Maxim Kartavenkov
This article describes how to make an H.264 video encoder DirectShow filter in C# using the NVIDIA encoder API
16 Sep 2013 by Nick Kopp
This article builds on the earlier "High Performance Queries: GPU vs. PLINQ vs. LINQ", ports it to also support OpenCL devices, and adds benchmarking so you can easily compare performance.
28 Mar 2023 by Ryan Scott White
CudaPAD is a PTX/SASS viewer for NVIDIA CUDA kernels and provides an on-the-fly view of your CUDA code.
22 May 2013 by John Michael Hauck
It has never been easier for C# desktop developers to write code that takes advantage of the amazing computing performance of modern graphics cards. In this post I will share some techniques for solving a simple (but still interesting) image analysis problem. Source Code https://www.assembla.com/co
2 May 2017 by Arthur V. Ratz
This article is a practical guide to using the Intel® Threading Building Blocks (TBB) and OpenMP libraries for C++, based on the example of delivering scalable parallel code that implements the Burrows-Wheeler Transformation (BWT) algorithm.
9 Feb 2013 by Debdatta Basu
Examine the various approaches to implementing Radix sort on the GPU
16 Feb 2016 by Max R McCarty
OWASP's #6 security risk has to do with keeping secrets secret.
16 Sep 2013 by Nick Kopp
Ultra high quality frequency domain image rotation on a GPU.
28 Nov 2011 by Adnan Boz
In this blog post, I’m diving deeper into Thrust usage scenarios with a simple implementation of Monte Carlo Simulation.
22 May 2013 by John Michael Hauck
Some ad hoc performance test results for a simple program written in C# as obtained from my current desktop computer: Dell Precision T3600, 16GB RAM, Intel Xeon E5-2665 0 @ 2.40GHz, NVidia GTX Titan.
16 Jan 2021 by Shao Voon Wong
How to convert parallel C++ ray-tracing code to CUDA, then to SYCL 2020 via Intel® DPC++
3 Jan 2014 by OriginalGriff
I think to be honest that your teachers are right: as you say CUDA / OpenCL are not easy, and despite tutorials being available, I don't think you will be able to do anything that you would find interesting and that would be acceptable to your teachers in the time you have left to do it....
29 Oct 2019 by Patrice T
You show no attempt to solve the problem yourself and you have no question; your main effort is pasting the requirement. You just want us to do your homework. Homework problems are simplified versions of the kind of problems you will have to solve in real life; their purpose is learning and...
13 Apr 2024 by Richard MacCutchan
Please post your question in the CodeProject.AI Discussions forum.
12 Oct 2012 by John Michael Hauck
These posts are meant to inspire you to enter into the world of graphics processor programming.
12 Oct 2012 by John Michael Hauck
These posts are meant to inspire you to enter into the world of graphics processor programming.
23 Jul 2013 by John Michael Hauck
Writing massively parallel Windows software in C++ that takes full advantage of the processing power found in the video cards of today’s gaming computers.
2 Jan 2014 by Member 10501094
Hello, I am a student in high school (not university) and technically I don't study programming, but I can do my project on programming too. My teachers suggested that I make some useless sorting algorithm programs, but that is not really something I would like to do. I would like to create some...
30 Apr 2015 by Dave Kreskowiak
Start reading these. You're really not going to get any help on this out of a forum environment because the discussions would be huge and nobody is going to write this code for you.
3 May 2017 by Intel
In this blog post, we highlight one particular class of low precision networks named binarized neural networks (BNNs), the fundamental concepts underlying this class, and introduce a Neon CPU and GPU implementation.
12 May 2019 by Michael Haephrati
I tested your code with Visual Studio 2017 Ultimate, and there are no warnings or issues. 1. Create a Console application 2. Place the following code in your main .cpp file // Test1.cpp : This file contains the 'main' function. Program execution begins and ends there. // #include "pch.h"...
10 Nov 2020 by Jeremy C. Ong
A quick 5-minute introduction to porting a CUDA app to Data Parallel C++ (DPC++)
11 Mar 2024 by Dave Kreskowiak
Ask your question in the dedicated CodeProject.AI Discussions forum.
2 Apr 2012 by manythreads
The sixth article in a series on portable multithreaded programming using OpenCL™, in which Rob Farber discusses how to calculate data in OpenCL™ and render it with OpenGL within the same application.
27 Dec 2012 by abdo21080
Can anyone help me: how can I multiply a vector (1*N) by a matrix (N*M) and store the result in a new vector (1*M) using CUDA C++?
15 Jan 2013 by John Michael Hauck
CodeMash at the Kalahari Convention Center in Sandusky, OH, from January 8th through January 11th, 2013.
18 May 2013 by John Michael Hauck
“Programming Massively Parallel Processors (second edition)” by Kirk and Hwu is a very good second book for those interested in getting started with CUDA.
18 Jul 2013 by Harshil Sharma
Hi. I'm currently learning CUDA C from Udacity and I'm stuck at Lesson 1. I've written this code for color to grey-scale conversion but it's converting only a thin strip of pixels from the top. Please tell me where the fault lies: in the grid-size calculation or in the kernel itself. Here's the...
29 Jul 2013 by GLStarter
Dear Experts, I'm developing a CUDA application, but I'm getting a compilation error from nvcc: "visual studio configuration file '(null)' could not be found". Why this error? I tried to compile the sample application provided in the CUDA SDK. It also has the same errors. Please find the...
11 Nov 2018 by Subash_e-et-t
I want to increase the number of particles in my calculation. Up to now I have been able to calculate 1 million particles using a single GPU. Is it possible to increase the particle calculation to 2 million using multiple GPUs?
13 Sep 2013 by Subash_e-et-t
Does the GTX 560 Ti graphics card support P2P coding? If yes, can you please provide me with sample code for P2P coding?
18 Oct 2013 by zlristovski
I want to parallelize a function in CUDA C that counts all vectors whose elements add up to a given sum and are no bigger than k. For example, if the number of vector elements n is 5, sum=10 and k=3, then the number of vectors that satisfy this condition is 101. I've already made this function in...
12 Dec 2013 by amsainju
Can anyone please help me with how to use the CUB library? I found out that it has a function that helps me radix sort on the basis of key-value pairs, but I am unable to use it even for a sample demo that I could later use in my real project. My requirement for the demo is: ...
3 Jan 2014 by CPallini
You might write a useful parallelized sorting algorithm.
16 Sep 2014 by Member 11087543
I need a quick brush-up on how I can conditionally (pre-process) code and check to see what GPU is installed or the like. I need this because I am writing a program that is expected to be very cross-platform and capable of having workarounds on almost all modern hardware. For GPGPU...
17 Sep 2014 by Manikandan10
See this link: http://stackoverflow.com/questions/1090261/get-the-graphics-card-model
8 Oct 2014 by SanjaySMK
I have a cv::KeyPoint class in the caller function on the CPU. I want to pass a vector of it by reference to a CUDA kernel function. How can I access its members in a CUDA kernel function? I am trying to implement this on CUDA 6.0. I Googled for a solution, but didn't succeed. Please...
25 Sep 2014 by Richard MacCutchan
Try the support forum at http://www.nvidia.com/page/support.html.
19 Feb 2015 by John Patel
When I try to call a CUDA function, my code gives an "unresolved externals" error.
20 Feb 2015 by KarstenK
To understand CUDA you should consult the documentation and try some sample code, which I found in a few seconds of research.
24 Apr 2015 by Member 11640624
Hi, how can I get rid of this error: "IntelliSense: expected an expression"? It's bothering me a lot :( #include "cuda.h" #include "cuda_runtime.h" #include "device_launch_parameters.h" #include __global__ void helloworldcuda_Kernel(){ printf("Hello world cuda"); ...
16 Jun 2015 by Member 11640624
Hello, I very much need to convert this genetic algorithm C program to CUDA... :( http://www-cs-students.stanford.edu/~jl/Essays/geneticAlgorithm1.c
16 Jun 2015 by CPallini
There is "no royal way to geometry": you have to know both C and CUDA, understand the algorithm and implement it. We can help on specific issues. Happy coding!
22 Nov 2015 by John Michael Hauck
It has never been easier for C# desktop developers to write code that takes advantage of the amazing computing performance of modern graphics cards.
17 Dec 2015 by Sergey Alexandrovich Kryukov
You can read about bank-switched memory here: https://en.wikipedia.org/wiki/Bank_switching. Your arithmetic concerns can hardly be explained based on just two sentences taken out of context. Such sentences can always be misunderstood, and books can contain errors. No matter...
17 Feb 2016 by Member 12332702
I want to perform the Canny edge detection algorithm using CUDA in Java, for which I am using JCuda. I am confused about what to write in the kernel call (.cu) and what in the .java file. Can anyone please suggest something? What I have tried: I tried it in plain Java using the NetBeans IDE, but I am...
18 Mar 2016 by taha93
I'm trying to use CUDAfy.NET in a web application which will be further called from a web form. When it tries to initiate a CudafyModule it gives the error shown in the picture below. The code was working perfectly in a console application. Is there any way to get rid of this...
26 Jun 2016 by malang5
What could be the reason behind a cblas_sgemm call taking much less time for matrices with a large number of zeros as compared to the same cblas_sgemm call for dense matrices? I know gemv is designed for matrix-vector multiplication but why can't I use gemm for vector-matrix multiplication if...
26 Jun 2016 by KarstenK
Peter is right. A good library looks for optimizations before it starts the heavy calculation. On matrices, skipping 0 values is the primary optimization step, in which the matrix gets simplified. Here is a fine article from CMSoft which discusses the whole issue and they call it...
29 Aug 2016 by Member 12710061
How do I use Eclipse for JCUDA? It's urgent. What I have tried: I can find this for Linux and Mac, but I want this on a Windows machine.
29 Aug 2016 by Mehdi Gholam
Google is your friend. Also start here: GPU Computing Using CUDA, Eclipse, and Java with JCuda; jcuda.org - Java bindings for CUDA; jCUDA - Java library for CUDA Windows support - NVIDIA Developer Forums
14 Sep 2016 by Mike Lanzetta
In this post, I'll walk you through how to get one of the most popular toolkits up and running on Windows, and run through and explain some fun examples.
4 May 2017 by Intel
Theano is a Python library developed at the LISA lab to define, optimize, and evaluate mathematical expressions, including the ones with multi-dimensional arrays (numpy.ndarray)
18 Sep 2017 by Intel
TotalView includes a set of tools that provide scientific and academic developers with control over processes and thread execution, along with deep visibility into program states and data.
16 May 2018 by Javier Luis Lopez
It is very hard to use the GPU because the user has to handle memory segmentation and transfer and the use of local memory, and in most applications only a low performance increase of 10-20x is reached. On the other hand, using multiple threads is easy and fast. It would be better to use 1280 threads in parallel...
16 May 2018 by KarstenK
It depends on what you want to do. Even multithreading isn't optimal when a lot of short threads are running, because multithreading also means overhead on the CPU. Graphical output and low-level computations are best done on the GPU, computations also when the usage of the GPU leads to less...
30 Jul 2018 by suraty
Hello, I want to install tensorflow-gpu on Windows. I searched the internet and found these steps: 1- install the NVIDIA driver 2- install CUDA 3- install cuDNN 4- install tensorflow-gpu. Is this correct? I have noticed that some newer TensorFlow versions are incompatible with older CUDA and...
17 Jul 2018 by Richard MacCutchan
You need to check the documentation or websites for each product.
30 Jul 2018 by suraty
Thank you very much. I should check this link: https://www.tensorflow.org/install/install_windows for Requirements to run TensorFlow with GPU support.
11 Nov 2018 by tugrulGtx
If you can parallelize an algorithm and map it onto a GPU, then you can easily do a similar task to map it onto multiple GPUs. You can simply do this by allocating half of the work to GPU1 and the other half to GPU2. You just need to use streams to overlap the two GPUs' working timelines. This way you can reduce...
10 Dec 2018 by Apriorit Inc, Vadym Zhernovyi
The experience of improving Mask R-CNN performance six to ten times by applying TensorRT
10 Feb 2019 by tugrulGtx
2 contains 1. true + true. 1 also contains an exception: broadcasting. If not only 2 but all threads access the same bank, it is broadcast to all items of the warp. 1 is true because shared memory serializes access to the same bank if it's not intended to broadcast. 2 is true because shared memory serves...
10 Feb 2019 by tugrulGtx
Why are you using both a uniform grid (I understand this from your "26 neighbor ..") and an octree at the same time for the same task (collision detection)? You could have a broad octree for broadphase collisions (particle - mesh). Then in each broad octree node that is filled, have a sub-octree to hold...
14 Jun 2021 by MohammadrezaMC2
Hello everyone, I am new to CUDA. I have two arrays: double* A = new double[]{1,2,3,4,5}; double* B = new double[]{2,2,2,3,3,3,4,4,4}; I want to find the index of the value of each element in A that is equal to each element in B, which in...
14 Jun 2021 by KarstenK
I am not sure about the code. I would write: findIndex(9, BB,AA, CC); else you must store the result and set some useful return value. You must compare each scalar of the array with the searched element value. I recommend that you work with...
11 Oct 2021 by OriginalGriff
While we are more than willing to help those that are stuck, that doesn't mean that we are here to do it all for you! We can't do all the work, you are either getting paid for this, or it's part of your grades and it wouldn't be at all fair for...
17 Nov 2021 by Deepesh Mhatre 2021
I am trying to use the TensorFlow Object Detection API, but the cuDNN version that I am getting on Colab is 8.0 and I want to use 8.1, so how do I upgrade the cuDNN library on Colab? Below is the script I wrote, but it doesn't seem to have any effect, what...
27 Nov 2022 by xoxo grace
I am running the YOLOv5 object detector on my workstation with an NVIDIA GeForce RTX 2080 Ti. I followed all the procedures for installing the necessary requirements: CUDA 10.1, cuDNN 7.6, tensorflow-gpu 2.2, Visual Studio, Python 3.7.0. But still my machine is not...
3 Dec 2022 by OriginalGriff
Quote: I apologize I am not sure where to get the code from. And we are even more in the dark! What you have said is "it don't work" and expect people with no access at all to your system to fix it for you. We can't do that. Remember that we...
27 Jun 2023 by Robert Mueller-Albrecht
Using the Intel® oneAPI Math Kernel Library SYCL API
8 Sep 2023 by Moharram
I have a signal processing algorithm that uses FFT... The algorithm is implemented in a module or the same class and has the ability to be parallelized so that up to n Instances can be made from it and run in parallel... So far, the...
23 May 2023 by G0dm0de
Hi all. I've been working on this a while and have not been able to get my Yolo6.2 object detection to work on my GPU; it insists on using the CPU. I have CodeProject installed on an ESX Windows VM and have passed through a 980 Ti. It's working with my Agent...
11 Mar 2024 by Xeno666
Hello, I can't seem to get Project AI Object Detection to enable my newly installed Tesla T4. Any idea what I'm doing wrong? YOLOv5 Doesn't have the option to enable the GPU and only the CPU shows. 17:57:27:System: Windows...
13 Apr 2024 by Member 16044303
My question: is it possible to have this configuration and have the system work correctly? My GPU is always at 0% usage; I have tested a lot of driver, CUDA and CodeProject.AI versions and even so my GPU is at 0% usage... I don't know if...
13 Oct 2012 by Maxim Kartavenkov
This article describes how to make DirectShow filters in .NET; it consists of BaseClasses and a couple of samples
9 Dec 2016 by Arthur V. Ratz
In this article, we'll demonstrate an approach that increases the performance (by up to 600%) of code that implements the conventional distribution counting algorithm (DCA), using the NVIDIA CUDA 8.0 Runtime API
22 Jul 2016 by Afzaal Ahmad Zeeshan
In this post, I am going to walk you through creating your own central hub to allow your connected devices to authenticate people using a facial recognition system.
16 Sep 2013 by Nick Kopp
An introduction to using Cudafy.NET to perform processing on a GPU
2 Nov 2018 by Vangos
This post will show you how to build OpenCV for Windows with CUDA.
20 Sep 2015 by Bartlomiej Filipek
A little guide about modern OpenGL and why it gives us so much value.
16 Sep 2013 by Nick Kopp
How to get 30x performance increase for queries by using your Graphics Processing Unit (GPU) instead of LINQ and PLINQ.
26 May 2014 by CatchExAs
How to make best use of current technology for computationally intensive applications?
22 Sep 2011 by Adnan Boz
Massively Parallel Random Number Generation using CUDA C, Thrust and C#
13 Oct 2012 by Alesiani Marco
A Wave PDE simulation using GPGPU capabilities
25 Oct 2010 by hax_
Introduction to the open-source hxGrid library for distributed computing. Main benefits of the library: cluster uses only idle time of Windows 2000/XP/Vista workstation (no dedicated workstations required); easy to use; free.
10 Jan 2011 by phoaivu
GPU Implementation of Extended Gaussian mixture model for Background Subtraction
8 Aug 2011 by Adnan Boz
An entry level example of how to use NVIDIA CUDA technology to achieve better performance within C# with minimum possible amount of code
14 Jan 2012 by Adnan Boz
How to set up Amazon EC2 Windows GPU instance for NVIDIA CUDA development
10 May 2010 by Kevin Drzycimski
Unroll loops at compile time, deduced by a template argument.
13 Mar 2008 by billconan, kavinguy
This article describes the implementation of a neural network with CUDA.
16 Sep 2013 by Nick Kopp
Performing base64 encoding on a graphics processing unit using CUDAfy.NET (CUDA in .NET).