
DragMesh: Interactive 3D Generation Made Easy

Official repository for the paper

DragMesh: Interactive 3D Generation Made Easy.

Tianshan Zhang*, Zeyu Zhang*†, Hao Tang#

*Equal contribution. †Project lead. #Corresponding author.

Note

GAPartNet (https://pku-epic.github.io/GAPartNet/) is the canonical dataset source for all articulated assets used in DragMesh.

teaser.mp4

🧾 Citation

If you find DragMesh helpful, please cite:

@article{zhang2025dragmesh,
  title={DragMesh: Interactive 3D Generation Made Easy},
  author={Zhang, Tianshan and Zhang, Zeyu and Tang, Hao},
  journal={arXiv preprint arXiv:2512.06424},
  year={2025}
}

✨ Intro

While generative models excel at creating static 3D content, building systems that understand how objects move and respond to interaction remains a fundamental challenge. Current methods for articulated motion sit at a crossroads: they are either physically consistent but too slow for real-time use, or generative but prone to violating basic kinematic constraints. We present DragMesh, a robust framework for real-time interactive 3D articulation built around a lightweight motion generation core. Our core contribution is a novel decoupled framework for kinematic reasoning and motion generation. First, we infer the latent joint parameters by separating semantic intent reasoning, which determines the joint type, from geometric regression, which determines the axis and origin via our Kinematics Prediction Network (KPP-Net). Second, to exploit the compact, continuous, and singularity-free representation of rigid body motion that dual quaternions provide, we develop a novel Dual Quaternion VAE (DQ-VAE). The DQ-VAE takes the predicted priors, along with the original user drag, and generates a complete, plausible motion trajectory. To ensure strict adherence to kinematics, we inject the joint priors at every layer of the DQ-VAE's non-autoregressive Transformer decoder using FiLM (Feature-wise Linear Modulation) conditioning. This persistent, multi-scale guidance is complemented by a numerically stable cross-product loss that guarantees axis alignment. The decoupled design lets DragMesh run in real time and produce plausible, generative articulation on novel objects without retraining, offering a practical step toward generative 3D intelligence.
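
To make the FiLM conditioning concrete, here is a minimal PyTorch sketch of the general mechanism: a conditioning vector (standing in for the joint-prior embedding) is projected to per-channel scale and shift parameters that modulate a decoder layer's features. Module names and dimensions are illustrative assumptions, not the actual modules/model_v2.py implementation.

import torch
import torch.nn as nn

class FiLMLayer(nn.Module):
    """Feature-wise Linear Modulation: scale and shift features per channel.

    A minimal sketch of the general FiLM mechanism; names and sizes are
    hypothetical, not the DragMesh implementation.
    """
    def __init__(self, cond_dim: int, feat_dim: int):
        super().__init__()
        # Project the conditioning vector to per-channel gamma (scale) and beta (shift).
        self.to_gamma_beta = nn.Linear(cond_dim, 2 * feat_dim)

    def forward(self, feats: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # feats: (batch, seq_len, feat_dim), cond: (batch, cond_dim)
        gamma, beta = self.to_gamma_beta(cond).chunk(2, dim=-1)
        return gamma.unsqueeze(1) * feats + beta.unsqueeze(1)

# Applying the same joint prior at every decoder layer keeps the kinematic
# constraint visible throughout decoding instead of only at the input.
film = FiLMLayer(cond_dim=32, feat_dim=256)
feats = torch.randn(4, 16, 256)    # e.g. 16 motion frames of 256-d features
joint_prior = torch.randn(4, 32)   # hypothetical joint-prior embedding
print(film(feats, joint_prior).shape)  # torch.Size([4, 16, 256])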

📰 News

✅ TODO

  • Upload the DragMesh paper and project page.
  • Release the training and inference code.
  • Provide GAPartNet processing pipeline and LMDB builder.
  • Share checkpoints on Hugging Face.
  • Create an interactive presentation.
  • Publish a Hugging Face Space for browser-based manipulation.

⚡ Quick Start

🧩 Environment Setup

The environment targets Python 3.10, CUDA 12.1, and PyTorch 2.4.1:

conda env create -f environment.yml
conda activate dragmesh
conda env update -f environment.yml --prune

The spec already installs trimesh, pyrender, pygltflib, viser, Objaverse, SAPIEN, pytorch3d, and tiny-cuda-nn.
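
Optionally, a quick sanity check (from inside the dragmesh environment) that the pinned stack resolved correctly:

# Run inside the dragmesh environment.
import torch
print(torch.__version__)          # expected: 2.4.1
print(torch.version.cuda)         # expected: 12.1
print(torch.cuda.is_available())  # expected: True on a working GPU setup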

🛠️ Native Extensions

Chamfer distance kernels are required for the VAE loss. Clone and build the upstream project:

git clone https://github.com/ThibaultGROUEIX/ChamferDistancePytorch.git
cd ChamferDistancePytorch
python setup.py install
cd ..
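
To verify the build, a minimal smoke test, assuming the upstream module layout (chamfer3D.dist_chamfer_3D) and a CUDA device; run it from the ChamferDistancePytorch checkout or with that directory on PYTHONPATH. This is a sketch, not part of the DragMesh codebase:

# Smoke test for the compiled Chamfer kernels (upstream layout assumed).
import torch
import chamfer3D.dist_chamfer_3D as dist_chamfer_3D

cham = dist_chamfer_3D.chamfer_3DDist()
a = torch.rand(2, 1024, 3).cuda()
b = torch.rand(2, 2048, 3).cuda()
dist_a, dist_b, idx_a, idx_b = cham(a, b)  # squared NN distances + indices
print(dist_a.mean().item(), dist_b.mean().item())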

📦 Data Preparation (GAPartNet)

Note

Prebuilt LMDB train and validation datasets are available at the following link; if you don't want to build them yourself, you can download them directly.

  1. Visit https://pku-epic.github.io/GAPartNet/ and download the articulated assets for the categories listed in config/category_split_v2.json.
  2. Arrange files so that each object folder contains mobility_annotation_gapartnet.urdf, meta.json, and textured meshes (*.obj). Example:
    data/gapartnet/<object_id>/
      |- mobility_annotation_gapartnet.urdf
      |- meta.json
      |- textured_objs/*.obj
    
  3. Convert to LMDB for fast training IO:
    python utils/build_lmdb.py \
      --dataset_root data/gapartnet \
      --output_prefix data/dragmesh \
      --config config/category_split_v2.json \
      --num_frames 16 \
      --num_points 4096
    # Produces data/dragmesh_train.lmdb and data/dragmesh_val.lmdb
    Optional knobs:
    • --joint_selection largest_motion: chooses a representative joint by motion span × moving geometry scale.
    • --joint_selection first / random: deterministic / random joint selection.
  4. Use utils/balanced_dataset_utils.get_motion_type_weights with WeightedRandomSampler if you need balanced revolute/prismatic sampling, as sketched below.
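
A runnable toy version of step 4; the inverse-frequency weighting mirrors the idea behind get_motion_type_weights, but the labels and sizes here are made up for illustration (check utils/balanced_dataset_utils.py for the actual helper and dataset class):

import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Toy stand-in: 0 = revolute, 1 = prismatic, with a 3:1 class imbalance.
labels = torch.tensor([0] * 900 + [1] * 300)
dataset = TensorDataset(torch.randn(1200, 8), labels)

# Inverse-frequency weights: rarer motion types get proportionally larger weights.
class_counts = torch.bincount(labels).float()
weights = (1.0 / class_counts)[labels]

sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)
loader = DataLoader(dataset, batch_size=32, sampler=sampler)
# Each batch now draws revolute and prismatic samples at roughly equal rates.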

🧠 Training

Dual Quaternion VAE

python scripts/train_vae_v2.py \
  --lmdb_train_path data/dragmesh_train.lmdb \
  --lmdb_val_path data/dragmesh_val.lmdb \
  --data_split_json_path config/category_split_v2.json \
  --output_dir outputs/vae \
  --num_epochs 300 \
  --batch_size 16 \
  --latent_dim 256 \
  --num_frames 16 \
  --mesh_recon_weight 10.0 \
  --cd_weight 30.0 \
  --kl_weight 0.001 \
  --kl_anneal_epochs 80 \
  --use_tensorboard --use_wandb
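
For background on the motion representation this VAE reconstructs: a rigid transform with rotation quaternion q_r and translation t packs into a unit dual quaternion (q_r, q_d) with q_d = 0.5 · (0, t) ⊗ q_r, where ⊗ is the quaternion product. A self-contained sketch of that standard construction (not the modules/data_loader_v2.py code):

import torch

def quat_mul(p: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
    """Hamilton product of quaternions stored as (w, x, y, z)."""
    pw, px, py, pz = p.unbind(-1)
    qw, qx, qy, qz = q.unbind(-1)
    return torch.stack([
        pw * qw - px * qx - py * qy - pz * qz,
        pw * qx + px * qw + py * qz - pz * qy,
        pw * qy - px * qz + py * qw + pz * qx,
        pw * qz + px * qy - py * qx + pz * qw,
    ], dim=-1)

def to_dual_quaternion(q_r: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """Pack unit rotation q_r and translation t into an 8-d dual quaternion
    (q_r, q_d) with q_d = 0.5 * (0, t) * q_r."""
    t_quat = torch.cat([torch.zeros_like(t[..., :1]), t], dim=-1)
    q_d = 0.5 * quat_mul(t_quat, q_r)
    return torch.cat([q_r, q_d], dim=-1)

# Identity rotation plus a translation of (1, 0, 0):
dq = to_dual_quaternion(torch.tensor([1.0, 0.0, 0.0, 0.0]),
                        torch.tensor([1.0, 0.0, 0.0]))
print(dq)  # tensor([1.0, 0.0, 0.0, 0.0, 0.0, 0.5, 0.0, 0.0])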

Kinematics Prediction Network (KPP-Net)

python scripts/train_predictor.py \
  --lmdb_train_path data/dragmesh_train.lmdb \
  --lmdb_val_path data/dragmesh_val.lmdb \
  --data_split_json_path config/category_split_v2.json \
  --output_dir outputs/kpp \
  --batch_size 32 \
  --num_epochs 200 \
  --encoder_type attention \
  --head_type decoupled \
  --predict_type True

Both scripts log to TensorBoard and optionally Weights & Biases. Check modules/loss.py and modules/predictor_loss.py for objective details.
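
As background on the cross-product axis loss mentioned in the intro: for unit vectors, |a × b| = sin θ, so penalizing the cross-product norm drives the axes toward alignment while keeping gradients finite where acos(a · b) would blow up near perfect alignment. A generic sketch of that idea (the sign-invariance choice below is an assumption, not necessarily the exact modules/predictor_loss.py term):

import torch
import torch.nn.functional as F

def axis_alignment_loss(pred_axis: torch.Tensor, gt_axis: torch.Tensor) -> torch.Tensor:
    """Cross-product axis loss: |a x b| = sin(angle) for unit vectors.

    Unlike acos(dot(a, b)), the gradient stays finite as the axes align.
    Note |a x b| treats an axis and its negation identically, which suits
    an undirected joint axis (an assumption of this sketch).
    """
    a = F.normalize(pred_axis, dim=-1)
    b = F.normalize(gt_axis, dim=-1)
    return torch.linalg.cross(a, b).norm(dim=-1).mean()

pred = torch.tensor([[0.0, 0.98, 0.2]], requires_grad=True)
gt = torch.tensor([[0.0, 1.0, 0.0]])
loss = axis_alignment_loss(pred, gt)
loss.backward()  # well-behaved gradients even for nearly aligned axes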

🧪 Inference

Batch Sweep (dataset mode)

python inference_animation.py \
  --dataset_root data/gapartnet \
  --checkpoint best_model.pth \
  --sample_id 40261 \
  --output_dir results_deterministic \
  --num_samples 5 \
  --num_frames 16 \
  --fps 5 \
  --loop_mode pingpong

Outputs MP4, GIF, and an animated GLB per object.

Batch Sweep (KPP-driven joint parameters)

python inference_animation_kpp.py \
  --dataset_root data/gapartnet \
  --checkpoint outputs/vae/best_model.pth \
  --kpp_checkpoint outputs/kpp/best_model_kpp.pth \
  --sample_id 40261 \
  --output_dir results_kpp_anim \
  --num_samples 5 \
  --num_frames 16 \
  --fps 5 \
  --loop_mode pingpong

Custom mesh manipulation (manual input)

python inference_pipeline.py \
  --mesh_file assets/cabinet.obj \
  --mask_file assets/cabinet_vertex_labels.npy \
  --mask_format vertex \
  --drag_point 0.12,0.48,0.05 \
  --drag_vector 0.0,0.0,0.2 \
  --manual_joint_type revolute \
  --kpp_checkpoint best_model_kpp.pth \
  --vae_checkpoint best_model.pth \
  --output_dir outputs/cabinet_demo \
  --num_samples 3 \
  --fps 5 \
  --loop_mode pingpong

Supply drag points and vectors directly through the CLI (no viewer UI). Use --manual_joint_type revolute or --manual_joint_type prismatic to force a specific motion family when needed. If you omit the manual override, the pipeline first trusts KPP-Net's prediction and, when --llm_endpoint and --llm_api_key are provided, falls back to the LLM-based classifier described in inference_pipeline.py. Outputs share the same MP4/GIF/GLB format as the batch pipeline.
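
If you need to produce the vertex mask expected by --mask_file, a minimal sketch with trimesh and numpy, assuming a 0/1 integer label per vertex; the height-threshold selection rule here is purely illustrative, so substitute your own part segmentation in practice:

# Build a per-vertex binary mask for --mask_file / --mask_format vertex.
# The selection rule (everything above a z threshold) is only an illustration.
import numpy as np
import trimesh

mesh = trimesh.load("assets/cabinet.obj", force="mesh")
labels = (mesh.vertices[:, 2] > 0.3).astype(np.int64)  # 1 = movable part
np.save("assets/cabinet_vertex_labels.npy", labels)
print(labels.shape, labels.sum(), "movable vertices")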

👀 Visualization

  • GIF/MP4 export depends on pyrender and imageio. On headless systems or remote servers, set PYOPENGL_PLATFORM=osmesa (see the snippet after this list).
  • inference_animation.py also exports animated GLB files for direct use in GLTF viewers.
  • For additional visualization tooling (e.g., rerun or Blender scripts), see inference_animation.py and inference_pipeline.py.
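
On a headless machine, the platform must be selected before pyrender (and PyOpenGL) is first imported, e.g.:

# Must run before any pyrender/PyOpenGL import in the process.
import os
os.environ["PYOPENGL_PLATFORM"] = "osmesa"

import pyrender  # now uses the OSMesa software backend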

👩‍💻 Case Study

Scenario       | Description
-------------- | ------------------------------------------------------------------
Drawer opening | Translational motion predicted entirely from drag cues.
Microwave door | Revolute joint inference with FiLM-conditioned motion generation.
Bucket handle  | High-curvature rotations showing the benefit of dual quaternions.

🎬 Demo Gallery

Translational drags

ImageToStl.com_22508.mp4
ImageToStl.com_27044.mp4
ImageToStl.com_32601.mp4
ImageToStl.com_100051.mp4
ImageToStl.com_102996.mp4
ImageToStl.com_29921.mp4

Rotational drags

ImageToStl.com_10040.mp4
ImageToStl.com_32086.mp4
ImageToStl.com_41003.mp4
ImageToStl.com_45087.mp4
ImageToStl.com_100234.mp4
ImageToStl.com_100431.mp4

Self-spin / free-spin

ImageToStl.com_102528.mp4
ImageToStl.com_103048.mp4
ImageToStl.com_103514.mp4

🗂️ Repository Tour

Path                                                | Content
--------------------------------------------------- | ----------------------------------------------------------
modules/model_v2.py                                 | Dual Quaternion VAE (encoder, decoder, FiLM Transformer).
modules/predictor.py                                | KPP-Net architecture.
modules/data_loader_v2.py                           | GAPartNet parsing and dual quaternion labels.
utils/balanced_dataset_utils.py                     | LMDB dataset builder and balanced sampling utilities.
scripts/train_vae_v2.py, scripts/train_predictor.py | Training entry points.
inference_animation*.py, inference_pipeline.py      | Inference pipelines (batch and interactive).
ChamferDistancePytorch/                             | CUDA kernels for Chamfer distance and auxiliary metrics.

🌳 Project Tree (annotated)

DragMesh/
├── assets/                       # Logos, teaser figures, future demo media
│   ├── dragmesh_logo.png
│   └── teaser.png
├── checkpoints/
│   ├── dqvae.pth
│   └── kpp.pth
├── ChamferDistancePytorch/       # CUDA/C++ Chamfer distance implementation (build with setup.py)
├── config/
│   └── category_split_v2.json    # GAPartNet in-domain split definition
├── modules/
│   ├── model_v2.py               # Dual Quaternion VAE architecture
│   ├── predictor.py              # KPP-Net for kinematic reasoning
│   ├── loss.py                   # VAE objectives (Chamfer, dual quaternions, constraints)
│   ├── predictor_loss.py         # Loss terms for KPP-Net
│   └── data_loader_v2.py         # GAPartNet loader + dual quaternion ground-truth builder
├── scripts/
│   ├── train_vae_v2.py           # Training loop for the VAE motion prior
│   └── train_predictor.py        # Training loop for KPP-Net
├── utils/
│   ├── balanced_dataset_utils.py # LMDB dataset class + balanced sampling helper
│   ├── dataset_utils.py          # Category-aware dataset wrappers
│   └── build_lmdb.py             # CLI to build LMDBs from GAPartNet folders
├── partnet/
│   └── Hunyuan3D-Part/           # External resources (P3-SAM, XPart docs)
├── results_deterministic/        # Placeholder for inference outputs (MP4/GIF/GLB)
├── inference_animation.py        # Batch evaluation + GLB export
├── inference_animation_kpp.py    # Dataset-driven animation tests (legacy interface)
├── inference_pipeline.py         # Interactive mesh manipulation pipeline
├── requirements.txt              # Python dependencies
└── README.md

🙏 Acknowledgement

We thank the GAPartNet team for the articulated dataset, and upstream projects such as ChamferDistancePytorch, Objaverse, SAPIEN, and PyTorch3D for their open-source contributions.

🌟 Star History

Star History Chart
