
adding gpu support #2

Merged
dredstone1 merged 40 commits into main from gpuCode
Aug 9, 2025

Conversation

@dredstone1
Owner

No description provided.

dredstone1 requested a review from Copilot August 9, 2025 10:03
Contributor

Copilot AI left a comment


Pull Request Overview

This PR implements GPU support for neural network operations using CUDA. The core change replaces CPU-based tensor operations with GPU-accelerated equivalents, including matrix operations, activation functions, and memory management.

Key Changes:

  • Added CUDA infrastructure with tensor_gpu.hpp/.cu for GPU operations
  • Modified Tensor class to support both CPU and GPU execution modes
  • Updated all tensor operations to use GPU kernels when enabled
  • Removed deprecated scalar activation functions in favor of tensor-based operations

Reviewed Changes

Copilot reviewed 30 out of 30 changed files in this pull request and generated 5 comments.

Summary per file:

src/model/tensor_gpu.hpp/.cu: New CUDA implementation for GPU tensor operations and activation functions
src/model/tensor.cpp: Modified tensor implementation to support GPU mode with conditional execution paths
include/tensor.hpp: Updated tensor interface to remove deprecated methods and add GPU support
src/model/activations.cpp: Removed scalar activation functions and added GPU kernel calls
src/networks/fnn/: Updated to use the new tensor API with getValue/setValue methods
CMakeLists.txt: Added CUDA language support and removed sanitizers
Comments suppressed due to low confidence (1)

src/model/tensor.cpp:171

  • [nitpick] The variable name 'j' is not descriptive. Consider renaming it to 'offset' or 'paramOffset' to clarify its purpose as an offset into the parameters.
		const size_t N = cpu_data.size();

#include <cstddef>
#include <stdexcept>

namespace nn::global::tensor_gpu {

Copilot AI Aug 9, 2025


The CUDA_CHECK macro uses fprintf to stderr and throws exceptions, which can be problematic in CUDA kernels. Consider using a more robust error handling mechanism or ensuring this macro is only used in host code.

Suggested change
-namespace nn::global::tensor_gpu {
+namespace nn::global::tensor_gpu {
+// Ensure CUDA_CHECK is only available in host code
+#ifndef __CUDA_ARCH__

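(The suggestion above is truncated and would also need a closing #endif. For reference, a complete host-only guard might look like the sketch below; the macro body is an assumption, since the PR's actual CUDA_CHECK definition is not shown in this diff.)

#ifndef __CUDA_ARCH__
// Host-only error check: turns a failed CUDA runtime call into a C++
// exception. Sketch only; the macro body in the PR may differ.
// Requires <cstdio>, <stdexcept>, and cuda_runtime.h.
#define CUDA_CHECK(call)                                                  \
	do {                                                                  \
		cudaError_t err__ = (call);                                       \
		if (err__ != cudaSuccess) {                                       \
			std::fprintf(stderr, "CUDA error %s at %s:%d\n",              \
			             cudaGetErrorString(err__), __FILE__, __LINE__);  \
			throw std::runtime_error(cudaGetErrorString(err__));          \
		}                                                                 \
	} while (0)
#endif // __CUDA_ARCH__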
// Softmax
// ==================================================
__global__ void softmaxKernel(const ValueType* input, ValueType* output, std::size_t count) {
extern __shared__ ValueType shared[];

Copilot AI Aug 9, 2025


The softmax implementation uses a naive approach for finding max and sum across threads. Consider using more efficient parallel reduction techniques (e.g., warp shuffles or tree reduction) for better performance on larger tensors.

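(For reference, the warp-shuffle variant of the max step could look like the sketch below. This is an illustrative pattern, not the PR's kernel; ValueType follows the snippet above, and a full softmax would still need to combine the per-warp results across the block.)

// Warp-level max reduction via shuffles: no shared memory needed within
// a warp. Each step halves the distance; after log2(32) = 5 steps,
// lane 0 holds the maximum of all 32 lanes.
__device__ ValueType warpMax(ValueType val) {
	for (int offset = 16; offset > 0; offset >>= 1) {
		ValueType other = __shfl_down_sync(0xffffffffu, val, offset);
		val = other > val ? other : val;
	}
	return val;
}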
Comment on lines 83 to 94
ValueType *temp = gpu_data;
if (gpu_data_size != other.gpu_data_size) {
	temp = (ValueType *)tensor_gpu::allocate(other.gpu_data_size * sizeof(ValueType));

	gpu_data_size = other.gpu_data_size;
}
tensor_gpu::copyDeviceToDevice(gpu_data, other.gpu_data, gpu_data_size * sizeof(ValueType));

if (gpu_data_size != other.gpu_data_size) {
	tensor_gpu::deallocate(gpu_data);
	gpu_data = temp;
}

Copilot AI Aug 9, 2025


This line copies to gpu_data using the current object's gpu_data_size before checking if sizes match. If gpu_data_size != other.gpu_data_size, this could cause a buffer overflow. The copy should use other.gpu_data_size or be moved after the size check.

Suggested change
-ValueType *temp = gpu_data;
-if (gpu_data_size != other.gpu_data_size) {
-	temp = (ValueType *)tensor_gpu::allocate(other.gpu_data_size * sizeof(ValueType));
-	gpu_data_size = other.gpu_data_size;
-}
-tensor_gpu::copyDeviceToDevice(gpu_data, other.gpu_data, gpu_data_size * sizeof(ValueType));
-if (gpu_data_size != other.gpu_data_size) {
-	tensor_gpu::deallocate(gpu_data);
-	gpu_data = temp;
-}
+if (gpu_data_size != other.gpu_data_size) {
+	ValueType *new_gpu_data = (ValueType *)tensor_gpu::allocate(other.gpu_data_size * sizeof(ValueType));
+	tensor_gpu::copyDeviceToDevice(new_gpu_data, other.gpu_data, other.gpu_data_size * sizeof(ValueType));
+	tensor_gpu::deallocate(gpu_data);
+	gpu_data = new_gpu_data;
+	gpu_data_size = other.gpu_data_size;
+} else {
+	tensor_gpu::copyDeviceToDevice(gpu_data, other.gpu_data, gpu_data_size * sizeof(ValueType));
+}

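(A common C++ alternative here is the copy-and-swap idiom, which makes the assignment exception-safe by construction. A minimal sketch, assuming Tensor has a correct copy constructor and a destructor that frees the GPU buffer; only the two members from the snippet above are shown.)

#include <utility> // std::swap

Tensor &Tensor::operator=(const Tensor &other) {
	Tensor tmp(other);                 // may throw; *this is untouched on failure
	std::swap(gpu_data, tmp.gpu_data);
	std::swap(gpu_data_size, tmp.gpu_data_size);
	return *this;                      // tmp's destructor frees the old buffer
}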
Comment on lines 80 to 91
// nn::global::Tensor give_me_a_name({5, 3});
// printf("test: \n");
// give_me_a_name.fill(5);
// give_me_a_name.setValue({2, 1}, 5);
//
// nn::global::Tensor give_me_a_name1({5, 3});
// printf("test: \n");
// give_me_a_name1.fill(3);
// give_me_a_name1.setValue({2, 1}, 10);
// give_me_a_name1 += give_me_a_name;
// printf("test: %f\n", give_me_a_name1.getValue({2,1}));
// return 0;

Copilot AI Aug 9, 2025


There is a large block of commented-out test code (lines 80-91). This should either be removed if no longer needed or converted to proper test cases if it serves a purpose.

Suggested change
-// nn::global::Tensor give_me_a_name({5, 3});
-// printf("test: \n");
-// give_me_a_name.fill(5);
-// give_me_a_name.setValue({2, 1}, 5);
-//
-// nn::global::Tensor give_me_a_name1({5, 3});
-// printf("test: \n");
-// give_me_a_name1.fill(3);
-// give_me_a_name1.setValue({2, 1}, 10);
-// give_me_a_name1 += give_me_a_name;
-// printf("test: %f\n", give_me_a_name1.getValue({2,1}));
-// return 0;
(the suggestion simply deletes the block)

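(If the block still serves a purpose, it could instead become a small self-contained smoke test, e.g. the sketch below. It assumes the Tensor({rows, cols}) / fill / setValue / getValue API shown in the commented-out code and plain assert; the include path is a guess based on include/tensor.hpp.)

#include <cassert>
#include "tensor.hpp"

// Smoke test for element-wise +=, rewritten from the commented-out code.
int main() {
	nn::global::Tensor a({5, 3});
	a.fill(5);
	a.setValue({2, 1}, 5);

	nn::global::Tensor b({5, 3});
	b.fill(3);
	b.setValue({2, 1}, 10);

	b += a;
	assert(b.getValue({2, 1}) == 15); // 10 + 5
	assert(b.getValue({0, 0}) == 8);  // 3 + 5
	return 0;
}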
for (auto &value : metrix) {
	if (value > max) {
		max = value;
if (metrix.isGpu) {

Copilot AI Aug 9, 2025


The GPU path for the maxVector function is incomplete: there is an empty if block. This will cause incorrect behavior in GPU mode, since execution falls through to the CPU code even when isGpu is true.

Suggested change
-if (metrix.isGpu) {
+if (metrix.isGpu) {
+	// Assuming tensor_gpu::max exists and returns the max value on the GPU
+	return tensor_gpu::max(metrix);

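(tensor_gpu::max in the suggestion above is hypothetical. If no such helper exists yet, the device side could be sketched along these lines: a single-block shared-memory tree reduction, with ValueType as in the earlier snippets and blockDim.x assumed to be a power of two.)

// Illustrative max reduction: each thread scans a strided slice of the
// input, then the per-thread maxima are combined in shared memory.
// Launch with shared memory sized blockDim.x * sizeof(ValueType).
__global__ void maxKernel(const ValueType *input, ValueType *result, std::size_t count) {
	extern __shared__ ValueType shared[];
	unsigned tid = threadIdx.x;
	ValueType best = input[0];
	for (std::size_t i = tid; i < count; i += blockDim.x)
		best = input[i] > best ? input[i] : best;
	shared[tid] = best;
	__syncthreads();
	// Tree reduction: halve the number of active threads each step.
	for (unsigned stride = blockDim.x / 2; stride > 0; stride >>= 1) {
		if (tid < stride && shared[tid + stride] > shared[tid])
			shared[tid] = shared[tid + stride];
		__syncthreads();
	}
	if (tid == 0)
		*result = shared[0]; // host copies this back and returns it
}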
dredstone1 marked this pull request as ready for review August 9, 2025 18:15
@dredstone1
Owner Author

merging

dredstone1 closed this Aug 9, 2025
dredstone1 reopened this Aug 9, 2025
dredstone1 merged commit a71a330 into main Aug 9, 2025
0 of 3 checks passed