CUDA

Great Reads

CodeProject.AI Server: AI the easy way.

by CodeProject

Version 2.6.2. Our fast, free, self-hosted Artificial Intelligence Server for any platform, any language

Nimbus SDK: A Tiny Framework for C++ Real-Time Volumetric Cloud Rendering, Animation and Morphing

by Carlos Jiménez de Parga

A reusable Visual C++ framework for real-time volumetric cloud rendering, animation and morphing

H.264 CUDA Encoder DirectShow Filter in C#

by Maxim Kartavenkov

This article describes how to make an H.264 video encoder DirectShow filter using the NVIDIA encoder API in C#

CUDA Programming Model on AMD GPUs and Intel CPUs

by Nick Kopp

This article builds upon the earlier High Performance Queries: GPU vs. PLINQ vs. LINQ and ports this to also support OpenCL devices and adds benchmarking so you can easily compare performance.

Latest Articles

CodeProject.AI Server: AI the easy way.

by CodeProject

Version 2.6.2. Our fast, free, self-hosted Artificial Intelligence Server for any platform, any language

How to Move from CUDA Math Library Calls to oneMKL

by Robert Mueller-Albrecht

Using the Intel® oneAPI Math Kernel Library SYCL API

CudaPAD

by Ryan Scott White

CudaPAD is a PTX/SASS viewer for NVIDIA Cuda kernels and provides an on-the-fly view of your Cuda code.

Nimbus SDK: A Tiny Framework for C++ Real-Time Volumetric Cloud Rendering, Animation and Morphing

by Carlos Jiménez de Parga

A reusable Visual C++ framework for real-time volumetric cloud rendering, animation and morphing

All Articles

A Brief Test on the Code Efficiency of CUDA and Thrust

27 Jun 2010 by Wayne Wood

Verify the execution efficiency of a short CUDA program when using the Thrust library

VC9.0

C++

Win7

A Brief Test on the Efficiency of a .NET 4.0 Parallel Code Example

1 Jul 2010 by Wayne Wood

Verify the execution efficiency of a series of short .NET 4.0 parallel programming samples

A C# SMTP server (receiver)

17 Sep 2012 by ObiWan_MCC

A C# SMTP server (receiver).

A Neural Network on GPU

13 Mar 2008 by billconan, kavinguy

This article describes the implementation of a neural network with CUDA.

A question about shared memory in studing CUDA Programming

17 Dec 2015 by virusx1984

I have some questions about shared memory (the book I am reading is by Shane Cook): 1. The book says "In Fermi the choice is 16K or 48K in favor of the L1 or shared memory" and then "Shared memory is a bank-switched architecture. On Fermi it is 32 banks wide" "Each bank of data is 4 bytes...

CUDA

A question about shared memory in studing CUDA Programming

17 Dec 2015 by Sergey Alexandrovich Kryukov

You can read about bank-switched memory here: https://en.wikipedia.org/wiki/Bank_switching. Your arithmetic concerns can hardly be explained based on just two sentences taken out of context. Such sentences can always be understood incorrectly, and books can contain errors. No matter...

CUDA

Accelerating Neural Networks with Binary Arithmetic

3 May 2017 by Intel

In this blog post, we highlight one particular class of low precision networks named binarized neural networks (BNNs), the fundamental concepts underlying this class, and introduce a Neon CPU and GPU implementation.

CUDA

hardware

machine-learning

Assertion Error in VS 2012 .. DEBUG Assertion Failed! ..PLEASE HELP

23 Nov 2014 by Chethan Sharma1

'cudaDecodeGL.exe' (Win32): Loaded 'C:\Windows\SysWOW64\opengl32.dll'. Symbols loaded. 'cudaDecodeGL.exe' (Win32): Loaded 'C:\Windows\SysWOW64\glu32.dll'. Symbols loaded. 'cudaDecodeGL.exe' (Win32): Loaded 'C:\Windows\SysWOW64\ddraw.dll'. Symbols loaded. 'cudaDecodeGL.exe' (Win32): Loaded...

CUDA

VisualC++

Avoiding the Trials and Tribulations of CUDA Development on Windows

8 Sep 2010 by Dan Buskirk

Understanding the organization of a Visual Studio project for CUDA development

Base64 Encoding on a GPU

16 Sep 2013 by Nick Kopp

Performing base64 encoding on a graphics processing unit using CUDAfy.NET (CUDA in .NET).

Building OpenCV Libraries for Linux on Windows Using MinGW and MSYS

23 Jan 2013 by Vijay Rajanna

Procedure for generating OpenCV libraries for the Linux operating system using the MinGW and MSYS toolkit on Windows.

CUDA

OpenCV

C# final year project idea about GPU acceleration

2 Jan 2014 by Member 10501094

Hello, I am a student in high school (not university) and technically I don't study programming, but I can do my project on programming too. My teachers suggested that I make some useless sorting algorithm programs, but that is not really something I would like to do. I would like to create some...

C# final year project idea about GPU acceleration

3 Jan 2014 by CPallini

You might write a useful parallelized sorting algorithm.

C# final year project idea about GPU acceleration

3 Jan 2014 by OriginalGriff

I think to be honest that your teachers are right: as you say CUDA / OpenCL are not easy, and despite tutorials being available, I don't think you will be able to do anything that you would find interesting and that would be acceptable to your teachers in the time you have left to do it....

Can't get GPU working for YOLO

23 May 2023 by OriginalGriff

This would be better posted in the dedicated forum: CodeProject.AI Discussions

CUDA

GPU

Can't get GPU working for YOLO

23 May 2023 by G0dm0de

Hi all. I've been working on this a while and have not been able to get my Yolo6.2 object detection to work on my GPU; it insists on using the CPU. I have CodeProject installed on an ESX Windows VM and have passed through a 980TI. It's working with my Agent...

CUDA

GPU

Can't get GPU working for YOLO

24 Sep 2023 by Member 16099679

Add your IDE to the Windows programs that turn on with the GPU.

CUDA

GPU

Cblas gemm performance for sparse matrices

26 Jun 2016 by malang5

What could be the reason behind a cblas_sgemm call taking much less time for matrices with a large number of zeros as compared to the same cblas_sgemm call for dense matrices? I know gemv is designed for matrix-vector multiplication but why can't I use gemm for vector-matrix multiplication if...

CUDA

C++

Cblas gemm performance for sparse matrices

26 Jun 2016 by KarstenK

Peter is right. A good library looks for optimizations before it starts the heavy calculation. And on matrices, skipping 0 values is the primary optimization step, in which the matrix gets simplified. Here is a fine article from CMSoft which discusses the whole issue and they call it...

CUDA

C++

CodeMash 2013

15 Jan 2013 by John Michael Hauck

CodeMash at the Kalahari Convention Center in Sandusky, OH, from January 8th through January 11th, 2013.

CUDA

Dev

machine-learning

CodeProject.AI Server: AI the easy way.

29 Feb 2024 by CodeProject

Version 2.6.2. Our fast, free, self-hosted Artificial Intelligence Server for any platform, any language

CUDA

artificial-intelligence

Compute FFT on GPU cores

8 Sep 2023 by Moharram

I have a signal processing algorithm that uses FFT... The algorithm is implemented in a module or the same class and has the ability to be parallelized so that up to n Instances can be made from it and run in parallel... So far, the...

Convert Xilinx FPGA/CPLD to C Source

28 May 2011 by grilialex

Flow and tools to convert Xilinx bitstreams to C source code for programming FPGA/CPLD

CUDA 12.3 driver 545.29.06 cudnn: 8.9.6 GTX 1650

13 Apr 2024 by Member 16044303

My question: is it possible to have this configuration and have the system work correctly? My GPU is always at 0% usage. I have tested a lot of driver, CUDA and CodeProject.AI versions, and even then my GPU is at 0% usage... I don't know if...

CUDA

Linux

docker

CUDA 12.3 driver 545.29.06 cudnn: 8.9.6 GTX 1650

13 Apr 2024 by Richard MacCutchan

Please post your question in the CodeProject.AI Discussions forum.

CUDA

Linux

docker

CUDA 3.0 and Nexus Parallel Nsight on VS 2010

6 Jun 2010 by GPUToaster™

CUDA 3.2 on VS2010 in 9 steps

11 May 2011 by Kerem Kat

Compile and run CUDA 3.2 projects from VS2010 in 9 easy steps

CUDA Programming Model on AMD GPUs and Intel CPUs

16 Sep 2013 by Nick Kopp

This article builds upon the earlier High Performance Queries: GPU vs. PLINQ vs. LINQ and ports this to also support OpenCL devices and adds benchmarking so you can easily compare performance.

CUDA

nvidia

OpenCL

CUDA: is the grid size calculated correctly?

18 Jul 2013 by Harshil Sharma

Hi. I'm currently learning CUDA C from Udacity and I'm stuck at Lesson 1. I've written this code for color to grey-scale conversion but it's converting only a thin strip of pixels from the top. Please tell me where the fault lies: in the grid-size calculation or in the kernel itself. Here's the...

CUDA

Cudafy Me: Part 1 of 4

12 Oct 2012 by John Michael Hauck

These posts are meant to inspire you to enter into the world of graphics processor programming.

CUDA

All-Topics

nvidia

Cudafy Me: Part 2 of 4

12 Oct 2012 by John Michael Hauck

These posts are meant to inspire you to enter into the world of graphics processor programming.

CUDA

.NET

Dev

Cudafy Me: Part 3 of 4

12 Oct 2012 by John Michael Hauck

These posts are meant to inspire you to enter into the world of graphics processor programming.

Cudafy Me: Part 4 of 4

12 Oct 2012 by John Michael Hauck

These posts are meant to inspire you to enter into the world of graphics processor programming.

CUDA

Dev

nvidia

CudaPAD

28 Mar 2023 by Ryan Scott White

CudaPAD is a PTX/SASS viewer for NVIDIA Cuda kernels and provides an on-the-fly view of your Cuda code.

Deep Learning on Windows: A Getting Started Guide

14 Sep 2016 by Mike Lanzetta

In this post, I'll walk you through how to get one of the most popular toolkits up and running on Windows, and run through and explain some fun examples.

Differences of Nvidia CUDA & OpenCL

26 Aug 2013 by Buddhi Chaturanga

I want to distinguish between these two technologies in terms of their technological aspects. What are the major differences and uses of each of them? Pros and cons? How can we handle processing on the GPU using each of them? How can those technologies be implemented for 3D game programming?

Differences of Nvidia CUDA & OpenCL

27 Aug 2013 by Stefan_Lang

Basically, CUDA only works for NVIDIA cards, whereas OpenCL supports all. In theory. To my knowledge the only usable implementation for OpenCL is by AMD; NVIDIA also implemented OpenCL for their cards, but didn't put a lot of effort into it. CUDA only is more powerful than the more generalized...

DirectShow Filters Development Part 3: Transform Filters

15 Mar 2011 by Roman Ginzburg

A text overlay filter and a JPEG/JPEG2000 encoder using transform filters.

Distributed Computing in Small and Medium Sized Offices

25 Oct 2010 by hax_

Introduction to the open-source hxGrid library for distributed computing. Main benefits of the library: cluster uses only idle time of Windows 2000/XP/Vista workstation (no dedicated workstations required); easy to use; free.

Do You Really Mind What's Inside Your Computer? (An Introduction to High Performance Computing)

28 Aug 2017 by Ravimal Bandara

An introduction to high performance computing

Extended GMM for Background Subtraction on GPU

10 Jan 2011 by phoaivu

GPU Implementation of Extended Gaussian mixture model for Background Subtraction

Facial biometric authentication on your connected devices

22 Jul 2016 by Afzaal Ahmad Zeeshan

In this post, I am going to walk you through creating your own central hub to allow your connected devices to authenticate people using facial recognition system.

HTML

CUDA

.NET

artificial-intelligence

Fast Image Blurring with CUDA

10 Sep 2009 by ChaoJui

High performance and good quality of image blurring

Faster copies to CUDA GPUs

21 Jun 2012 by Nick Kopp

How to both simplify CUDA applications and improve PCI-Express performance.

Faster JPEG2000 Viewer

6 Jan 2014 by Adam Wojnar

Simple .jp2/.j2k viewer using Kakadu executables demonstration pack for decoding

Faster Sorting in C# by Utilizing GPU with NVIDIA CUDA

8 Aug 2011 by Adnan Boz

An entry level example of how to use NVIDIA CUDA technology to achieve better performance within C# with minimum possible amount of code

GCN Assembler for AMD GPUs

17 Apr 2016 by Ryan Scott White

An assembler/compiler for AMD’s GCN (Graphics Core Next architecture) assembly language

Getting an error not sure how to fix.

3 Dec 2022 by Jordan Reardon

CUDA error: an illegal memory access was encountered. This is the error I get when trying to use the GPU for object detection. What I have tried: Reinstalled everything multiple times: Windows, Blue Iris, CodeProject, CUDA Toolkit, etc.

CUDA

Getting an error not sure how to fix.

3 Dec 2022 by OriginalGriff

Quote: I apologize I am not sure where to get the code from. And we are even more in the dark! What you have said is "it don't work" and expect people with no access at all to your system to fix it for you. We can't do that. Remember that we...

CUDA

Getting Started with Intel® Software Optimization for Theano and Intel® Distribution for Python

4 May 2017 by Intel

Theano is a Python library developed at the LISA lab to define, optimize, and evaluate mathematical expressions, including the ones with multi-dimensional arrays (numpy.ndarray)

GPGPU on Accelerating Wave PDE

13 Oct 2012 by Alesiani Marco

A Wave PDE simulation using GPGPU capabilities

GPGPU Papyrus Demo

22 May 2013 by John Michael Hauck

It has never been easier for C# desktop developers to write code that takes advantage of the amazing computing performance of modern graphics cards. In this post I will share some techniques for solving a simple (but still interesting) image analysis problem. Source Code https://www.assembla.com/co

CUDA

nvidia

OpenCL

GPGPU Performance Tests

22 May 2013 by John Michael Hauck

Some ad hoc performance test results for a simple program written in C# as obtained from my current desktop computer: Dell Precision T3600, 16GB RAM, Intel Xeon E5-2665 0 @ 2.40GHz, NVidia GTX Titan.

GPU Computing Using CUDA, Eclipse, and Java with JCuda

21 Sep 2013 by Mark H Bishop

Tutorial: GPU computing with JCuda and Nsight (Eclipse)

GPU Performance Tests

22 Nov 2015 by John Michael Hauck

It has never been easier for C# desktop developers to write code that takes advantage of the amazing computing performance of modern graphics cards.

GPU-Quicksort in OpenCL 2.0: Nested Parallelism and Work-Group Scan Functions

20 Jan 2015 by Android on Intel

This tutorial shows how to use two powerful features of OpenCL™ 2.0: enqueue_kernel functions that allow you to enqueue kernels from the device and work_group_scan_exclusive_add and work_group_scan_inclusive_add

H.264 CUDA Encoder DirectShow Filter in C#

16 Jul 2012 by Maxim Kartavenkov

This article describes how to make an H.264 video encoder DirectShow filter using the NVIDIA encoder API in C#

High Performance Queries: GPU vs. PLINQ vs. LINQ

16 Sep 2013 by Nick Kopp

How to get 30x performance increase for queries by using your Graphics Processing Unit (GPU) instead of LINQ and PLINQ.

High-performance finite elements with C#

25 Jul 2016 by Igor Gribanov

Performing linear static analysis on a tetrahedral mesh with a little bit of help from a third-party solver.

how can multiply vector by matrix using cuda c++

27 Dec 2012 by abdo21080

Can anyone help me? How can I multiply a vector (1*N) by a matrix (N*M) and store the result in a new vector (1*M) using CUDA C++?

CUDA

How do I check what GPU is active on Windows?

16 Sep 2014 by Member 11087543

I need a quick brush-up on how I can conditionally compile (pre-process) code and check to see what GPU is installed or the like. I need this because I am writing a program that is expected to be very cross-platform and capable of having workarounds on almost all modern hardware. For GPGPU...

How do I check what GPU is active on Windows?

17 Sep 2014 by Manikandan10

See this link: http://stackoverflow.com/questions/1090261/get-the-graphics-card-model

How do I complete the rest of the code?

29 Oct 2019 by Patrice T

You show no attempt to solve the problem yourself, you have no question, your main effort is pasting the requirement, you just want us to do your HomeWork. HomeWork problems are simplified versions of the kind of problems you will have to solve in real life, their purpose is learning and...

CUDA

C++

How do I find the index corresponding to an specific value in an array with cuda C++

14 Jun 2021 by MohammadrezaMC2

Hello everyone, I am new to CUDA. I have two arrays: double* A = new double[]{1,2,3,4,5}; double* B = new double[]{2,2,2,3,3,3,4,4,4}; I want to find the index of the value of each element in A that is equal to each element in B, which in...

CUDA

C++

indexes

How do I find the index corresponding to an specific value in an array with cuda C++

14 Jun 2021 by KarstenK

I am not sure about the code. I would write: findIndex(9, BB,AA, CC); else you must store the result and set some useful return value. You must compare each scalar of the array with the searched element value. I recommend that you work with...

CUDA

C++

indexes

How Do I Get Rid Of This Error

24 Apr 2015 by Member 11640624

Hi, how can I get rid of this error: IntelliSense: expected an expression? It's bothering me a lot :( #include "cuda.h" #include "cuda_runtime.h" #include "device_launch_parameters.h" #include __global__ void helloworldcuda_Kernel(){ printf("Hello world cuda"); ...

CUDA

VS2010

How Do I Get Rid Of This Error

25 Apr 2015 by KarstenK

This syntax looks strange and the compiler says it isn't valid. You should check the CUDA documentation to write working code. I found some solutions with Google. There is also a link to a tutorial. You should study it.

CUDA

VS2010

How do I use eclipse for JCUDA (JAVA+CUDA), its urgent?

29 Aug 2016 by Member 12710061

How do I use Eclipse for JCUDA? It's urgent. What I have tried: I can find this for Linux and Mac, but I want this on a Windows machine.

CUDA

Eclipse

How do I use eclipse for JCUDA (JAVA+CUDA), its urgent?

29 Aug 2016 by Mehdi Gholam

Google is your friend. Also start here: GPU Computing Using CUDA, Eclipse, and Java with JCuda; jcuda.org - Java bindings for CUDA; jCUDA - Java library for CUDA Windows support - NVIDIA Developer Forums

CUDA

Eclipse

How to access members of structure, passed as a vector, inside a function in C++?

8 Oct 2014 by SanjaySMK

I have a cv::KeyPoint class in the caller function on the CPU. I want to pass a vector of it by reference to a CUDA kernel function. How can I access its members in a CUDA kernel function? I am trying to implement this on CUDA 6.0. I Googled for a solution, but didn't succeed. Please...

CUDA

C++

OpenCV

How to access members of structure, passed as a vector, inside a function in C++?

25 Sep 2014 by Richard MacCutchan

Try the support forum at http://www.nvidia.com/page/support.html.

CUDA

C++

OpenCV

How to Build OpenCV for Windows with CUDA

2 Nov 2018 by Vangos

This post will show you how to build OpenCV for Windows with CUDA.

How to call CUDA function of .cu file from standard c file ?

19 Feb 2015 by John Patel

When I try to call a CUDA function, my code gives an "unresolved externals" error.

CUDA

C++

How to call CUDA function of .cu file from standard c file ?

20 Feb 2015 by KarstenK

To understand CUDA you should consult the documentation and try some sample code, which I found in a few seconds of research.

CUDA

C++

how to convert this c program to cuda c ?

16 Jun 2015 by Member 11640624

Hello, I very much need to convert this genetic C program to CUDA ... :( http://www-cs-students.stanford.edu/~jl/Essays/geneticAlgorithm1.c

CUDA

how to convert this c program to cuda c ?

16 Jun 2015 by CPallini

There is "no royal way to geometry": you have to know both C and CUDA, understand the algorithm and implement it. We can help on specific issues. Happy coding!

CUDA

How to Get Started with TensorFlow

24 Oct 2017 by Packt Publishing

In this section, we'll take our first steps in using the low-level TensorFlow API.

How to Move from CUDA Math Library Calls to oneMKL

27 Jun 2023 by Robert Mueller-Albrecht

Using the Intel® oneAPI Math Kernel Library SYCL API

How to perform cuda program in java

17 Feb 2016 by Member 12332702

I want to perform the Canny edge detection algorithm using CUDA in Java, for which I am using JCuda. I am confused about what to write in the kernel (.cu) file and what in the .java file. Can anyone please suggest something? What I have tried: I tried it in plain Java using the NetBeans IDE, but I am...

CUDA

Java

image

How to set up Amazon EC2 Windows GPU instance for NVIDIA CUDA development

14 Jan 2012 by Adnan Boz

How to set up Amazon EC2 Windows GPU instance for NVIDIA CUDA development

CUDA

All-Topics

nvidia

how to store triangles in an octree

10 Feb 2019 by pramithas dhakal

I am doing a High Speed Data Calculation project for CFD software. For this I have constructed an octree and populated it with particles. For collision detection, for each particle, I have already calculated its neighbouring particles. For this purpose I first calculated in which octree cube...

CUDA

C++

how to store triangles in an octree

10 Feb 2019 by tugrulGtx

Why are you using both a uniform grid (I understand this from your "26 neighbor ..") and an octree at the same time for the same task (collision detection)? You could have a broad octree for broadphase collisions (particle - mesh). Then in each broad octree node that is filled, have a sub-octree to hold...

CUDA

C++

How to upgrade CUDNN to 8.2 in Google colab ?

17 Nov 2021 by Deepesh Mhatre 2021

I am trying to use the TensorFlow Object Detection API, but the CUDNN version that I am getting on Colab is 8.0 and I want to use 8.1, so how do I upgrade the CUDNN library on Colab? Below is the script I wrote, but it doesn't seem to have any effect, what...

CUDA

tensorflow

How to use CUDA programming to calculate and process the correct number

20 Apr 2018 by Member 13789251

Bandwidth test - test memory bandwidth. Especially important for PCIE capability. Different MBs have different PCIE capabilities. The CUDA adaptor performance depends on the capability of PCIE. It could be the performance bottleneck. On the following programming drills, the number of clock...

CUDA

memory

How to use CUDA programming to calculate and process the correct number

20 Apr 2018 by Member 13789251

It is better that you tell me about the errors than that I tell you all the answers.

CUDA

memory

HyCuda, a Hybrid Framework Code-Generator for CUDA

18 Dec 2013 by Joren Heit

A Hybrid Framework Code-Generator for CUDA

CUDA

C++

templates

I need help in creating a cuda code for kronecker product. Can anyone help?

12 May 2019 by Member 14172841

can anyone convert this c code into cuda?? // C code to find the Kronecker Product of two // matrices and stores it as matrix C #include // rowa and cola are no of rows and columns // of matrix A // rowb and colb are no of rows and columns // of matrix B const int cola = 2,...

CUDA

C++

I need help in creating a cuda code for kronecker product. Can anyone help?

12 May 2019 by Michael Haephrati

I tested your code with Visual Studio 2017 Ultimate, and there are no warnings or issues. 1. Create a Console application 2. Place the following code in your main .cpp file // Test1.cpp : This file contains the 'main' function. Program execution begins and ends there. // #include "pch.h"...

CUDA

C++

I need help in creating a cuda code for kronecker product. Can anyone help?

12 May 2019 by Stefan_Lang

You still haven't posted the code that causes the errors, and the full text of the error (although in case of MSB3721 this may not be helpful - it is a generic error code). I can only say that the indices you use to assign elements in C are incorrect! You may not believe so since your program...

CUDA

C++

I want to write a CUDA program to make a negative image using cuda and c. I have a partial code but need help feeling in the rest

11 Oct 2021 by ericka jones

#include #include "image.h" __global__ void negative_kernal (unsigned char *pixel, unsigned char max_value, int n) { /******************/ /* your code here */ /******************/ /* calculate thread ID */ /*...

CUDA

I want to write a CUDA program to make a negative image using cuda and c. I have a partial code but need help feeling in the rest

11 Oct 2021 by OriginalGriff

While we are more than willing to help those that are stuck, that doesn't mean that we are here to do it all for you! We can't do all the work, you are either getting paid for this, or it's part of your grades and it wouldn't be at all fair for...

CUDA

I'm running yolov5 object detection in geforce RTX 2080 ti, installed CUDA, CUDNN, tensorflow-gpu, visual studio but still my program is not using GPU

27 Nov 2022 by xoxo grace

I am running the Yolov5 object detector on my workstation with an Nvidia GeForce RTX 2080 Ti. I followed all the procedures for installing the necessary requirements: CUDA 10.1, CUDNN 7.6, tensorflow-gpu 2.2, Visual Studio, Python 3.7.0. But still my machine is not...

Image Filters Using CPU, GPU, and C++ AMP

10 Jul 2012 by Kerem Kat

Process webcam images on the CPU and GPU with OpenCV, CUDA and C++ AMP

Implement SQLite on GPU using CUDA.Net

30 Apr 2015 by Member 10219452

Hi all, how do I implement a subset of the SQLite command processor directly on the GPU using CUDA.Net on Windows? Thank you.

Implement SQLite on GPU using CUDA.Net

30 Apr 2015 by Dave Kreskowiak

Start reading these. You're really not going to get any help on this out of a forum environment because the discussions would be huge and nobody is going to write this code for you.

Implementing Parallel Scalable Distribution Counting Algorithm (DCA) with CUDA 8.0 Runtime API

9 Dec 2016 by Arthur V. Ratz

In this article, we'll demonstrate an approach that increases the performance (by up to 600%) of code that implements the conventional distribution counting algorithm (DCA) using the NVIDIA CUDA 8.0 Runtime API

Improving the Performance of Mask R-CNN Using TensorRT

10 Dec 2018 by Apriorit Inc, Vadym Zhernovyi