@govindchari
Member

@govindchari govindchari commented Dec 16, 2025

Currently, only the factorization and solves are performed on the GPU; all other operations run on the CPU. The resulting cudaMemcpy transfers between host and device incur significant overhead, so this PR seeks to remove all cudaMemcpy calls.

Track down all get_data_vectorf() calls. These should ideally return GPU pointers, but currently they just return CPU pointers.

Then write kernels that operate directly on QOCOVectorf rather than accessing raw pointers in memory.

The main code for QOCO should not touch pointers at all and should just use the QOCOVectorf abstraction.
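
As a rough illustration of the pointer-free style described above, here is a minimal sketch in plain C. Everything in it is hypothetical: `SketchVec` and the `sketch_vec_*` functions are stand-ins invented for this example, not QOCO's actual `QOCOVectorf` API, and `double` is assumed as the float type. The storage here lives on the host so the sketch runs anywhere; in a CUDA backend the same interface could hold a device pointer and launch a kernel inside each function, so the calling code would not need to change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a vector abstraction such as QOCOVectorf.
 * The data pointer is a backend detail: callers never dereference it.
 * A CUDA backend could store a device pointer here instead. */
typedef struct {
    double *data; /* backend-owned storage (host memory in this sketch) */
    size_t len;   /* number of elements */
} SketchVec;

/* Allocate a vector and copy initial values in once, at setup time.
 * Returns NULL if allocation fails. */
SketchVec *sketch_vec_new(const double *host_values, size_t len) {
    SketchVec *v = malloc(sizeof *v);
    if (v == NULL) return NULL;
    v->data = malloc(len * sizeof *v->data);
    if (v->data == NULL) {
        free(v);
        return NULL;
    }
    for (size_t i = 0; i < len; ++i) v->data[i] = host_values[i];
    v->len = len;
    return v;
}

void sketch_vec_free(SketchVec *v) {
    if (v != NULL) {
        free(v->data);
        free(v);
    }
}

/* y <- a * x + y. In a GPU backend this loop would be a kernel launch. */
void sketch_vec_axpy(double a, const SketchVec *x, SketchVec *y) {
    assert(x->len == y->len);
    for (size_t i = 0; i < x->len; ++i) y->data[i] += a * x->data[i];
}

/* Returns the inner product of x and y. In a GPU backend this would be
 * a device-side reduction, with only the scalar result copied back. */
double sketch_vec_dot(const SketchVec *x, const SketchVec *y) {
    assert(x->len == y->len);
    double sum = 0.0;
    for (size_t i = 0; i < x->len; ++i) sum += x->data[i] * y->data[i];
    return sum;
}
```

Solver code written against an interface like this only calls `sketch_vec_axpy` and `sketch_vec_dot`, so whether the data sits in host or device memory is decided in one place rather than at every call site.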

@github-actions

Download benchmark artifacts

Benchmark Summary

  • Baseline solved: 129 problems
  • Diff branch solved: 129 problems

Runtime regressions (> 5.0%)

  • HS35: diff=0.0000s, baseline=0.0000s, Δ=+70.0%

Runtime improvements (> 5.0%)

  • CVXQP1_S: diff=0.0005s, baseline=0.0005s, Δ=-6.7%
  • DPKLO1: diff=0.0001s, baseline=0.0001s, Δ=-8.1%
  • QADLITTL: diff=0.0004s, baseline=0.0004s, Δ=-7.6%
  • QAFIRO: diff=0.0001s, baseline=0.0002s, Δ=-7.2%

@github-actions

Download benchmark artifacts

Benchmark Summary

  • Baseline solved: 129 problems
  • Diff branch solved: 129 problems

Runtime regressions (> 5.0%)

  • DUALC5: diff=0.0007s, baseline=0.0006s, Δ=+10.9%
  • HS268: diff=0.0000s, baseline=0.0000s, Δ=+6.3%
  • HS53: diff=0.0000s, baseline=0.0000s, Δ=+6.3%
  • HS76: diff=0.0000s, baseline=0.0000s, Δ=+6.7%
  • QSHARE1B: diff=0.0023s, baseline=0.0021s, Δ=+6.7%
  • S268: diff=0.0000s, baseline=0.0000s, Δ=+6.3%
  • ZECEVIC2: diff=0.0000s, baseline=0.0000s, Δ=+8.3%

Runtime improvements (> 5.0%)

  • DUALC8: diff=0.0016s, baseline=0.0020s, Δ=-20.9%
  • HS35: diff=0.0000s, baseline=0.0000s, Δ=-65.6%
  • PRIMALC2: diff=0.0018s, baseline=0.0020s, Δ=-7.0%
  • QPTEST: diff=0.0000s, baseline=0.0000s, Δ=-60.7%

@github-actions

Download benchmark artifacts

Benchmark Summary

  • Baseline solved: 129 problems
  • Diff branch solved: 8 problems

Differences in solved problems

  • Baseline solved additional problems: AUG2DCQP, AUG2DQP, AUG3DCQP, AUG3DQP, BOYD1, BOYD2, CONT-050, CONT-100, CONT-101, CONT-200, CONT-201, CONT-300, CVXQP1_M, CVXQP1_S, CVXQP2_L, CVXQP2_M, CVXQP2_S, CVXQP3_M, CVXQP3_S, DTOC3, DUAL1, DUAL2, DUAL3, DUAL4, DUALC1, DUALC2, DUALC5, DUALC8, EXDATA, GOULDQP2, GOULDQP3, HS118, HS21, HS268, HS35, HS35MOD, HS53, HS76, HUES-MOD, HUESTIS, KSIP, LASER, LISWET2, LISWET3, LISWET4, LISWET5, LISWET6, LISWET7, LOTSCHD, MOSARQP1, MOSARQP2, POWELL20, PRIMAL1, PRIMAL2, PRIMAL3, PRIMAL4, PRIMALC1, PRIMALC2, PRIMALC5, PRIMALC8, Q25FV47, QADLITTL, QAFIRO, QBANDM, QBEACONF, QBORE3D, QBRANDY, QCAPRI, QE226, QETAMACR, QFFFFF80, QFORPLAN, QGFRDXPN, QGROW15, QGROW22, QGROW7, QISRAEL, QPCBLEND, QPCBOEI1, QPCBOEI2, QPCSTAIR, QPILOTNO, QPTEST, QRECIPE, QSC205, QSCAGR25, QSCAGR7, QSCFXM1, QSCFXM2, QSCFXM3, QSCORPIO, QSCRS8, QSCSD1, QSCSD6, QSCSD8, QSCTAP1, QSCTAP2, QSCTAP3, QSEBA, QSHARE1B, QSHARE2B, QSHELL, QSHIP04L, QSHIP04S, QSHIP08L, QSHIP08S, QSHIP12L, QSHIP12S, QSIERRA, QSTAIR, QSTANDAT, S268, STADAT1, STADAT2, STADAT3, STCQP1, STCQP2, TAME, UBH1, VALUES, ZECEVIC2

Runtime regressions (> 5.0%)

  • AUG3D: diff=0.0019s, baseline=0.0016s, Δ=+20.9%
  • DPKLO1: diff=0.0002s, baseline=0.0001s, Δ=+48.9%
  • GENHS28: diff=0.0000s, baseline=0.0000s, Δ=+25.0%
  • HS51: diff=0.0000s, baseline=0.0000s, Δ=+66.7%
  • HS52: diff=0.0000s, baseline=0.0000s, Δ=+33.3%

@govindchari govindchari marked this pull request as draft December 17, 2025 05:24
@govindchari govindchari changed the title from "Move entire solve to GPU" to "[DRAFT] Move solve to GPU" Dec 17, 2025