Conversation
Pull Request Overview
This PR implements GPU support for neural network operations using CUDA. The core change replaces CPU-based tensor operations with GPU-accelerated equivalents, including matrix operations, activation functions, and memory management.
Key Changes:
- Added CUDA infrastructure with tensor_gpu.hpp/.cu for GPU operations
- Modified Tensor class to support both CPU and GPU execution modes
- Updated all tensor operations to use GPU kernels when enabled
- Removed deprecated scalar activation functions in favor of tensor-based operations
Reviewed Changes
Copilot reviewed 30 out of 30 changed files in this pull request and generated 5 comments.
Summary per file:
| File | Description |
|---|---|
| src/model/tensor_gpu.hpp/.cu | New CUDA implementation for GPU tensor operations and activation functions |
| src/model/tensor.cpp | Modified tensor implementation to support GPU mode with conditional execution paths |
| include/tensor.hpp | Updated tensor interface to remove deprecated methods and add GPU support |
| src/model/activations.cpp | Removed scalar activation functions and added GPU kernel calls |
| src/networks/fnn/ | Updated to use new tensor API with getValue/setValue methods |
| CMakeLists.txt | Added CUDA language support and removed sanitizers |
Comments suppressed due to low confidence (1)

src/model/tensor.cpp:171

[nitpick] The variable name 'j' is not descriptive. Consider renaming it to 'offset' or 'paramOffset' to clarify its purpose as an offset into the parameters.

```cpp
const size_t N = cpu_data.size();
```
```cpp
#include <cstddef>
#include <stdexcept>

namespace nn::global::tensor_gpu {
```
The CUDA_CHECK macro uses fprintf to stderr and throws exceptions, which can be problematic in CUDA kernels. Consider using a more robust error handling mechanism or ensuring this macro is only used in host code.
Suggested change:

```cpp
namespace nn::global::tensor_gpu {
// Ensure CUDA_CHECK is only available in host code
#ifndef __CUDA_ARCH__
```
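For reference, a minimal sketch of a host-only check along the lines the suggestion implies; the actual body of the PR's CUDA_CHECK is not shown here, so the macro below is an assumption about its shape, not the PR's code:

```cpp
#include <cstdio>
#include <stdexcept>
#include <cuda_runtime.h>

// Sketch: guard the macro so it only exists in host compilation passes.
// fprintf and throw are fine on the host but unusable in device code.
#ifndef __CUDA_ARCH__
#define CUDA_CHECK(call)                                                    \
    do {                                                                    \
        cudaError_t err_ = (call);                                          \
        if (err_ != cudaSuccess) {                                          \
            std::fprintf(stderr, "CUDA error: %s at %s:%d\n",               \
                         cudaGetErrorString(err_), __FILE__, __LINE__);     \
            throw std::runtime_error(cudaGetErrorString(err_));             \
        }                                                                   \
    } while (0)
#endif
```

Used as `CUDA_CHECK(cudaMalloc(&ptr, bytes));` from host code only; any use inside a `__global__` or `__device__` function then fails to compile instead of misbehaving at runtime.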
```cpp
// Softmax
// ==================================================
__global__ void softmaxKernel(const ValueType* input, ValueType* output, std::size_t count) {
    extern __shared__ ValueType shared[];
```
The softmax implementation uses a naive approach for finding max and sum across threads. Consider using more efficient parallel reduction techniques (e.g., warp shuffles or tree reduction) for better performance on larger tensors.
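To make the suggestion concrete, here is a minimal sketch of a shared-memory tree reduction for the max step, written with plain float instead of the PR's ValueType; the kernel name and per-block output buffer are hypothetical:

```cpp
#include <cstddef>
#include <cfloat>

// Sketch: each block reduces its slice of the input to a single maximum
// in O(log blockDim.x) steps. Assumes blockDim.x is a power of two and
// that a second pass (or a host-side loop) combines the per-block results.
__global__ void blockMaxKernel(const float* input, float* blockMax, std::size_t count) {
    extern __shared__ float shared[];
    const unsigned tid = threadIdx.x;
    const std::size_t idx = static_cast<std::size_t>(blockIdx.x) * blockDim.x + tid;

    // One element per thread; out-of-range threads contribute -FLT_MAX.
    shared[tid] = (idx < count) ? input[idx] : -FLT_MAX;
    __syncthreads();

    // Tree reduction: halve the active thread range each iteration.
    for (unsigned stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) {
            shared[tid] = fmaxf(shared[tid], shared[tid + stride]);
        }
        __syncthreads();
    }

    if (tid == 0) {
        blockMax[blockIdx.x] = shared[0];
    }
}
```

Launch with `blockMaxKernel<<<numBlocks, threadsPerBlock, threadsPerBlock * sizeof(float)>>>(...)` so the dynamic shared array holds one value per thread. The same pattern applies to the sum step, with `+` in place of `fmaxf` and 0 as the padding value.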
src/model/tensor.cpp
Outdated
```cpp
ValueType *temp = gpu_data;
if (gpu_data_size != other.gpu_data_size) {
    temp = (ValueType *)tensor_gpu::allocate(other.gpu_data_size * sizeof(ValueType));
    gpu_data_size = other.gpu_data_size;
}
tensor_gpu::copyDeviceToDevice(gpu_data, other.gpu_data, gpu_data_size * sizeof(ValueType));

if (gpu_data_size != other.gpu_data_size) {
    tensor_gpu::deallocate(gpu_data);
    gpu_data = temp;
}
```
This copy writes into the old gpu_data buffer even though gpu_data_size has already been reassigned to other.gpu_data_size. If the sizes differ, copying other.gpu_data_size elements into a buffer sized for the old count causes a buffer overflow; the copy should target the newly allocated buffer or be moved after the pointer swap.
Suggested change:

```cpp
if (gpu_data_size != other.gpu_data_size) {
    ValueType *new_gpu_data = (ValueType *)tensor_gpu::allocate(other.gpu_data_size * sizeof(ValueType));
    tensor_gpu::copyDeviceToDevice(new_gpu_data, other.gpu_data, other.gpu_data_size * sizeof(ValueType));
    tensor_gpu::deallocate(gpu_data);
    gpu_data = new_gpu_data;
    gpu_data_size = other.gpu_data_size;
} else {
    tensor_gpu::copyDeviceToDevice(gpu_data, other.gpu_data, gpu_data_size * sizeof(ValueType));
}
```
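One property of the suggested version worth calling out: the new buffer is allocated and filled before the old one is freed, so if tensor_gpu::allocate or the device-to-device copy fails partway, gpu_data and gpu_data_size still describe the original, intact allocation rather than a dangling or half-initialized one.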
tests/binary_test.cpp
Outdated
```cpp
// nn::global::Tensor give_me_a_name({5, 3});
// printf("test: \n");
// give_me_a_name.fill(5);
// give_me_a_name.setValue({2, 1}, 5);
//
// nn::global::Tensor give_me_a_name1({5, 3});
// printf("test: \n");
// give_me_a_name1.fill(3);
// give_me_a_name1.setValue({2, 1}, 10);
// give_me_a_name1 += give_me_a_name;
// printf("test: %f\n", give_me_a_name1.getValue({2,1}));
// return 0;
```
There is a large block of commented-out test code (lines 80-91). This should either be removed if no longer needed or converted to proper test cases if it serves a purpose.
Suggested change: delete the commented-out block.
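If the block is meant to survive, the second option the comment mentions could look like the following sketch, derived directly from the commented-out code; the header path and exact Tensor API are assumptions based on this PR's file summary:

```cpp
#include <cassert>
#include "tensor.hpp"  // assumed include path, per include/tensor.hpp above

// Element {2, 1} should hold 5 + 10 = 15 after the in-place addition,
// and every other element should hold 5 + 3 = 8.
void testTensorInPlaceAdd() {
    nn::global::Tensor a({5, 3});
    a.fill(5);
    a.setValue({2, 1}, 5);

    nn::global::Tensor b({5, 3});
    b.fill(3);
    b.setValue({2, 1}, 10);

    b += a;
    assert(b.getValue({2, 1}) == 15);
    assert(b.getValue({0, 0}) == 8);
}
```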
src/model/activations.cpp
Outdated
```cpp
for (auto &value : metrix) {
    if (value > max) {
        max = value;
if (metrix.isGpu) {
```
The GPU path for the maxVector function is incomplete: the if block is empty. This causes incorrect behavior in GPU mode, since execution falls through to the CPU code even when isGpu is true.
Suggested change:

```cpp
if (metrix.isGpu) {
    // Assuming tensor_gpu::max exists and returns the max value on the GPU
    return tensor_gpu::max(metrix);
```
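Putting the suggestion in context, a sketch of the complete function with both paths; the signature, ValueType, and tensor_gpu::max are assumptions based on the snippets above:

```cpp
#include <limits>

// Dispatch to the GPU helper when the tensor lives on the device,
// otherwise fall back to the original CPU scan.
ValueType maxVector(nn::global::Tensor &metrix) {
    if (metrix.isGpu) {
        return tensor_gpu::max(metrix);  // assumed GPU-side reduction
    }
    ValueType max = std::numeric_limits<ValueType>::lowest();
    for (auto &value : metrix) {
        if (value > max) {
            max = value;
        }
    }
    return max;
}
```

Initializing max to `std::numeric_limits<ValueType>::lowest()` keeps the CPU path correct even when every value is negative.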