Conversation

@thowell commented Dec 17, 2025

As part of the effort to implement sparse Jacobians (#88), this PR implements a sparse representation for flexedge_J.
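
For context, MuJoCo stores sparse matrices row-compressed, with per-row nonzero counts (rownnz), row start addresses (rowadr), and column indices (colind) alongside the flat values. A minimal NumPy sketch of a matrix-vector product in that layout (an illustration of the storage scheme only, not the actual mjwarp kernels):

```python
import numpy as np

# Row-compressed sparse matrix-vector product, y = J @ x, using
# MuJoCo-style rownnz/rowadr/colind index arrays. Illustration only.
def sparse_mulvec(J, rownnz, rowadr, colind, x):
  y = np.zeros(len(rownnz))
  for r in range(len(rownnz)):    # one row (flex edge) at a time
    adr = rowadr[r]               # start of row r in the flat arrays
    for k in range(rownnz[r]):    # visit only the stored nonzeros
      y[r] += J[adr + k] * x[colind[adr + k]]
  return y
```

For a flex edge, only the degrees of freedom of its two endpoint vertices contribute, so the per-row nonzero count is small and independent of nv.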

Results for is_sparse == True with benchmark/cloth/scene.xml:

```
Loading model from: benchmark/cloth/scene.xml...
  nbody: 918 nv: 2706 ngeom: 921 nu: 0 is_sparse: True
  broadphase: SAP_TILE broadphase_filter: PLANE|SPHERE|OBB
  solver: CG cone: PYRAMIDAL iterations: 100 iterative linesearch iterations: 50
  integrator: EULER graph_conditional: True
Data
  nworld: 256 naconmax: 128000 njmax: 4000

Rolling out 10 steps at dt = 0.005...
```

Summary for 256 parallel rollouts
  • improved performance of smooth.flex: 252202.40 ns -> 196.00 ns
  • memory savings: flexedge_J drops from 38.34% to < 1% of reported utilized memory, ~6.8 GB saved

Benchmark command:

```
mjwarp-testspeed benchmark/cloth/scene.xml --nworld=256 --nconmax=500 --njmax=4000 --nstep=10 --event_trace=True --memory=True
```
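
For scale: a dense flexedge_J holds nworld × nflexedge × nv floats. A back-of-the-envelope check against the reported 6820.49 MB (assuming float32; nflexedge ≈ 2581 is inferred from that number, not read from the model file):

```python
nworld, nv = 256, 2706      # from the model printout above
nflexedge = 2581            # assumed; inferred from the 6820.49 MB report
dense_bytes = nworld * nflexedge * nv * 4
print(dense_bytes / 2**20)  # ≈ 6820.5 MB, matching the report
```

Sparse storage replaces each nv-wide row with a handful of nonzeros plus the index arrays, which is where the ~6.8 GB goes.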

This PR:

```
Rolling out 10 steps at dt = 0.005...

Summary for 256 parallel rollouts

Total JIT time: 0.22 s
Total simulation time: 1.88 s
Total steps per second: 1,364
Total realtime factor: 6.82 x
Total time per step: 733113.06 ns
Total converged worlds: 256 / 256

Event trace:

step: 730710.40
  forward: 730423.20
    fwd_position: 165132.40
      kinematics: 618.65
      com_pos: 339.20
      camlight: 42.80
      flex: 196.00
      crb: 244.00
      tendon_armature: 4.80
      collision: 880.00
        sap_broadphase: 782.60
        convex_narrowphase: 5.20
        primitive_narrowphase: 62.80
      make_constraint: 162742.80
      transmission: 5.60
    sensor_pos: 6.00
    fwd_velocity: 2840.40
      com_vel: 348.80
      passive: 1820.80
      rne: 636.80
      tendon_bias: 5.20
    sensor_vel: 6.00
    fwd_actuation: 15.60
    fwd_acceleration: 1934.80
      xfrc_accumulate: 1696.00
    solve: 560426.00
      mul_m: 92.40
      solve_m: 94.00
    sensor_acc: 5.20
  euler: 270.80

Total memory: 11002.88 MB / 48640.12 MB (22.62%)
Model memory (0.38%):
 (no field >= 1% of utilized memory)
Data memory (99.62%):
 efc.J: 10625.00 MB (96.57%)

```

main (f2f7957):

```

Rolling out 10 steps at dt = 0.005...

Summary for 256 parallel rollouts

Total JIT time: 0.20 s
Total simulation time: 3.07 s
Total steps per second: 833
Total realtime factor: 4.17 x
Total time per step: 1200190.00 ns
Total converged worlds: 256 / 256

Event trace:

step: 1198163.98
  forward: 1197873.60
    fwd_position: 633293.60
      kinematics: 622.34
      com_pos: 338.80
      camlight: 42.80
      flex: 252202.40
      crb: 279.60
      tendon_armature: 5.60
      collision: 878.40
        sap_broadphase: 781.60
        convex_narrowphase: 5.60
        primitive_narrowphase: 64.00
      make_constraint: 378861.60
      transmission: 5.60
    sensor_pos: 6.00
    fwd_velocity: 2841.20
      com_vel: 345.20
      passive: 1832.80
      rne: 628.80
      tendon_bias: 5.20
    sensor_vel: 5.60
    fwd_actuation: 16.00
    fwd_acceleration: 1906.80
      xfrc_accumulate: 1668.00
    solve: 559744.00
      mul_m: 92.80
      solve_m: 94.80
    sensor_acc: 5.60
  euler: 271.60

Total memory: 17788.09 MB / 48640.12 MB (36.57%)
Model memory (0.24%):
 (no field >= 1% of utilized memory)
Data memory (99.76%):
 flexedge_J: 6820.49 MB (38.34%)
 efc.J: 10625.00 MB (59.73%)
```

@thowell requested a review from @quagla, December 17, 2025 11:22
@quagla left a comment:
Amazing work!

```python
d.qLD = wp.zeros((nworld, mjm.nv, mjm.nv), dtype=float)

if mujoco.mj_isSparse(mjm):
  d.flexedge_J_rownnz = wp.zeros((nworld, mjm.nflexedge), dtype=int)
```
I guess all the dense branches of flexedge_J will be deleted once we make it always sparse in mujoco?

@quagla commented Dec 17, 2025

Also, does this allow for a larger batch size in the cloth benchmark?

@thowell linked an issue (JacobianType.SPARSE) on Dec 18, 2025 that may be closed by this pull request
@thowell mentioned this pull request on Dec 19, 2025
@thowell commented Dec 19, 2025

created an issue (#938) to update the relevant benchmarks once the improved sparsity lands

@thowell commented Jan 8, 2026

  • updated the test to use an assert instead of an if/else
  • in io.py, moved the flexedge_J fields from the mujoco.mj_isSparse(mjm) block to the m.opt.is_sparse block (see the sketch below)
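
A minimal sketch of that second change, assuming the surrounding code matches the review hunk above (the exact context in io.py may differ):

```python
# before: allocation gated on the host MuJoCo model's sparsity setting
if mujoco.mj_isSparse(mjm):
  d.flexedge_J_rownnz = wp.zeros((nworld, mjm.nflexedge), dtype=int)

# after: gated on mjwarp's own option instead
if m.opt.is_sparse:
  d.flexedge_J_rownnz = wp.zeros((nworld, mjm.nflexedge), dtype=int)
```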

@thowell commented Jan 8, 2026

@quagla we should get the sparse efc_J changes (#934) merged before updating the benchmarks. Then yes, we should be able to significantly increase the batch size: this PR has the sparsity changes integrated and enables increasing nworld to 4096 for benchmark/cloth/scene.xml.
