Official repository for the paper
DragMesh: Interactive 3D Generation Made Easy.
Tianshan Zhang*, Zeyu Zhang*†, Hao Tang#

*Equal contribution. †Project lead. #Corresponding author.
Note
GAPartNet (link above) is the canonical dataset source for all articulated assets used in DragMesh.
teaser.mp4
If you find DragMesh helpful, please cite:
@article{zhang2025dragmesh,
title={DragMesh: Interactive 3D Generation Made Easy},
author={Zhang, Tianshan and Zhang, Zeyu and Tang, Hao},
journal={arXiv preprint arXiv:2512.06424},
year={2025}
}

While generative models have excelled at creating static 3D content, the pursuit of systems that understand how objects move and respond to interactions remains a fundamental challenge. Current methods for articulated motion lie at a crossroads: they are either physically consistent but too slow for real-time use, or generative but prone to violating basic kinematic constraints. We present DragMesh, a robust framework for real-time interactive 3D articulation built around a lightweight motion generation core. Our core contribution is a decoupled kinematic reasoning and motion generation framework. First, we infer the latent joint parameters by decoupling semantic intent reasoning, which determines the joint type, from geometric regression, which determines the axis and origin via our Kinematics Prediction Network (KPP-Net). Second, to exploit the compact, continuous, and singularity-free properties of dual quaternions for representing rigid-body motion, we develop a Dual Quaternion VAE (DQ-VAE). The DQ-VAE receives these predicted priors, along with the original user drag, and generates a complete, plausible motion trajectory. To ensure strict adherence to kinematics, we inject the joint priors at every layer of the DQ-VAE's non-autoregressive Transformer decoder using FiLM (Feature-wise Linear Modulation) conditioning. This persistent, multi-scale guidance is complemented by a numerically stable cross-product loss that enforces axis alignment. The decoupled design allows DragMesh to run in real time and to produce plausible, generative articulation on novel objects without retraining, offering a practical step toward generative 3D intelligence.
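For intuition on the dual quaternion representation, the sketch below builds the unit dual quaternion of a revolute motion (a rotation about an axis through a given origin). It is a minimal illustration under standard screw-motion conventions, not the repository's implementation; the actual label construction lives in modules/data_loader_v2.py.

import numpy as np

def quat_mul(a, b):
    """Hamilton product of quaternions in (w, x, y, z) order."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return np.array([
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    ])

def revolute_dual_quaternion(axis, origin, angle):
    """Unit dual quaternion for a rotation by `angle` about the line
    through `origin` with direction `axis` (a zero-pitch screw motion)."""
    axis = np.asarray(axis, dtype=np.float64)
    axis = axis / np.linalg.norm(axis)
    # Real part: the rotation quaternion.
    q_r = np.concatenate(([np.cos(angle / 2.0)], np.sin(angle / 2.0) * axis))
    w, x, y, z = q_r
    R = np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])
    # Translation induced by rotating about a displaced axis: t = p - R p.
    t = np.asarray(origin, dtype=np.float64) - R @ np.asarray(origin, dtype=np.float64)
    # Dual part: q_d = 0.5 * (0, t) ⊗ q_r.
    q_d = 0.5 * quat_mul(np.concatenate(([0.0], t)), q_r)
    return q_r, q_d

q_r, q_d = revolute_dual_quaternion(axis=[0, 0, 1], origin=[0.3, 0.0, 0.0], angle=np.pi / 4)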
- Upload the DragMesh paper and project page.
- Release the training and inference code.
- Provide GAPartNet processing pipeline and LMDB builder.
- Share checkpoints on Hugging Face.
- Create an interactive presentation.
- Publish a Hugging Face Space for browser-based manipulation.
The conda environment targets Python 3.10, CUDA 12.1, and PyTorch 2.4.1:
conda env create -f environment.yml
conda activate dragmesh
conda env update -f environment.yml --prune

The spec already installs trimesh, pyrender, pygltflib, viser, Objaverse, SAPIEN, pytorch3d, and tiny-cuda-nn.
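An optional quick check that the activated environment sees the intended PyTorch and CUDA versions:

# Run inside the activated dragmesh environment.
import torch

print("PyTorch:", torch.__version__)             # expected: 2.4.1
print("CUDA toolkit:", torch.version.cuda)        # expected: 12.1
print("GPU available:", torch.cuda.is_available())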
Chamfer distance kernels are required for the VAE loss. Clone and build the upstream project:
git clone https://github.com/ThibaultGROUEIX/ChamferDistancePytorch.git
cd ChamferDistancePytorch
python setup.py install
cd ..
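After the build, you can verify that the CUDA kernels import and run. The module and class names below follow the upstream ChamferDistancePytorch README and may differ slightly depending on the version you built.

# Smoke test for the Chamfer distance CUDA extension (requires a GPU).
import torch
from chamfer3D.dist_chamfer_3D import chamfer_3DDist  # module path per the upstream README

chamfer = chamfer_3DDist()
a = torch.rand(2, 1024, 3, device="cuda")
b = torch.rand(2, 2048, 3, device="cuda")
dist_a, dist_b, idx_a, idx_b = chamfer(a, b)  # per-point squared distances and nearest-neighbor indices
print(dist_a.mean().item(), dist_b.mean().item())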
Prebuilt LMDB train and validation datasets are available at the following link; if you prefer not to build them yourself, you can download them directly.
- Visit https://pku-epic.github.io/GAPartNet/ and download the articulated assets for the categories listed in config/category_split_v2.json.
- Arrange files so that each object folder contains mobility_annotation_gapartnet.urdf, meta.json, and textured meshes (*.obj). Example:

  data/gapartnet/<object_id>/
  |- mobility_annotation_gapartnet.urdf
  |- meta.json
  |- textured_objs/*.obj

- Convert to LMDB for fast training IO:

  python utils/build_lmdb.py \
    --dataset_root data/gapartnet \
    --output_prefix data/dragmesh \
    --config config/category_split_v2.json \
    --num_frames 16 \
    --num_points 4096
  # Produces data/dragmesh_train.lmdb and data/dragmesh_val.lmdb

  Optional knobs:
  - --joint_selection largest_motion: chooses a representative joint by motion span × moving geometry scale.
  - --joint_selection first / random: deterministic / random joint selection.
- Use utils/balanced_dataset_utils.get_motion_type_weights with WeightedRandomSampler if you need balanced revolute/prismatic sampling; a minimal sketch follows this list.
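The sketch below shows one way to wire that balanced sampling together. It assumes get_motion_type_weights(dataset) returns one weight per sample and that the LMDB-backed dataset class also lives in utils/balanced_dataset_utils; check that module for the actual names and signatures.

# Sketch only: balanced revolute/prismatic sampling for training.
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler
from utils.balanced_dataset_utils import get_motion_type_weights

def make_balanced_loader(train_dataset, batch_size=16):
    # One weight per sample, assumed to up-weight the rarer motion type.
    weights = torch.as_tensor(get_motion_type_weights(train_dataset), dtype=torch.double)
    sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)
    return DataLoader(train_dataset, batch_size=batch_size, sampler=sampler, num_workers=4)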
python scripts/train_vae_v2.py \
--lmdb_train_path data/dragmesh_train.lmdb \
--lmdb_val_path data/dragmesh_val.lmdb \
--data_split_json_path config/category_split_v2.json \
--output_dir outputs/vae \
--num_epochs 300 \
--batch_size 16 \
--latent_dim 256 \
--num_frames 16 \
--mesh_recon_weight 10.0 \
--cd_weight 30.0 \
--kl_weight 0.001 \
--kl_anneal_epochs 80 \
--use_tensorboard --use_wandb

python scripts/train_predictor.py \
--lmdb_train_path data/dragmesh_train.lmdb \
--lmdb_val_path data/dragmesh_val.lmdb \
--data_split_json_path config/category_split_v2.json \
--output_dir outputs/kpp \
--batch_size 32 \
--num_epochs 200 \
--encoder_type attention \
--head_type decoupled \
--predict_type True

Both scripts log to TensorBoard and, optionally, Weights & Biases. See modules/loss.py and modules/predictor_loss.py for the objective details.
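The CLI weights map onto a combined VAE objective roughly as sketched below. This is an illustrative reconstruction under simple assumptions (a linear KL warm-up over --kl_anneal_epochs), not a copy of modules/loss.py.

# Illustrative only: how the training flags plausibly combine the VAE loss terms.
def vae_objective(losses, epoch, args):
    """losses: dict with 'mesh_recon', 'chamfer', and 'kl' scalar terms."""
    # Linear KL warm-up: 0 -> kl_weight over the first kl_anneal_epochs epochs.
    kl_scale = args.kl_weight * min(1.0, epoch / max(1, args.kl_anneal_epochs))
    return (
        args.mesh_recon_weight * losses["mesh_recon"]   # --mesh_recon_weight 10.0
        + args.cd_weight * losses["chamfer"]            # --cd_weight 30.0
        + kl_scale * losses["kl"]                       # --kl_weight 0.001, --kl_anneal_epochs 80
    )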
python inference_animation.py \
--dataset_root data/gapartnet \
--checkpoint best_model.pth \
--sample_id 40261 \
--output_dir results_deterministic \
--num_samples 5 \
--num_frames 16 \
--fps 5 \
--loop_mode pingpong

Outputs an MP4, a GIF, and an animated GLB per object.
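To double-check an exported GLB, something like the following works with the already-installed pygltflib; the file path is a placeholder for whichever GLB your run produced.

# Inspect an exported animated GLB (path is a placeholder).
from pygltflib import GLTF2

gltf = GLTF2().load("results_deterministic/sample.glb")
print("animations:", len(gltf.animations))
print("meshes:", len(gltf.meshes), "| nodes:", len(gltf.nodes))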
python inference_animation_kpp.py \
--dataset_root data/gapartnet \
--checkpoint outputs/vae/best_model.pth \
--kpp_checkpoint outputs/kpp/best_model_kpp.pth \
--sample_id 40261 \
--output_dir results_kpp_anim \
--num_samples 5 \
--num_frames 16 \
--fps 5 \
--loop_mode pingpong

python inference_pipeline.py \
--mesh_file assets/cabinet.obj \
--mask_file assets/cabinet_vertex_labels.npy \
--mask_format vertex \
--drag_point 0.12,0.48,0.05 \
--drag_vector 0.0,0.0,0.2 \
--manual_joint_type revolute \
--kpp_checkpoint best_model_kpp.pth \
--vae_checkpoint best_model.pth \
--output_dir outputs/cabinet_demo \
--num_samples 3 \
--fps 5 \
--loop_mode pingpong

Supply drag points/vectors directly through the CLI (no viewer UI). Use --manual_joint_type revolute or --manual_joint_type prismatic to force a specific motion family when needed. If you omit the manual override, the pipeline first trusts KPP-Net and, when --llm_endpoint and --llm_api_key are provided, falls back to the LLM-based classifier described in inference_pipeline.py. Outputs use the same MP4/GIF/GLB formats as the batch pipeline.
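The joint-type resolution order described above can be summarized as in this sketch; the function and helper names are hypothetical stand-ins for the logic in inference_pipeline.py.

# Sketch of the joint-type resolution order (hypothetical names, not the actual code).
def resolve_joint_type(args, kpp_joint_type, llm_classify=None):
    # 1) An explicit override always wins.
    if args.manual_joint_type is not None:            # --manual_joint_type revolute|prismatic
        return args.manual_joint_type
    # 2) Otherwise trust KPP-Net's prediction.
    if kpp_joint_type is not None:
        return kpp_joint_type
    # 3) Back off to the LLM classifier when an endpoint and key are configured.
    if llm_classify is not None and args.llm_endpoint and args.llm_api_key:
        return llm_classify(args.mesh_file, args.drag_point, args.drag_vector)
    raise ValueError("No joint type available: pass --manual_joint_type or configure the LLM fallback.")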
- GIF/MP4 export depends on pyrender and imageio. For systems without a display or on remote servers, it is recommended to set PYOPENGL_PLATFORM=osmesa.
- inference_animation.py also exports animated GLB files for direct use in glTF viewers.
- For additional visualization tooling (e.g., rerun or Blender scripts), see inference_animation.py and inference_pipeline.py.
| Scenario | Description |
|---|---|
| Drawer opening | Translational motion predicted entirely from drag cues. |
| Microwave door | Revolute joint inference with FiLM conditioned motion generation. |
| Bucket handle | High curvature rotations showing the benefit of dual quaternions. |
Translational drags
ImageToStl.com_22508.mp4
ImageToStl.com_27044.mp4
ImageToStl.com_32601.mp4
ImageToStl.com_100051.mp4
ImageToStl.com_102996.mp4
ImageToStl.com_29921.mp4
Rotational drags
ImageToStl.com_10040.mp4
ImageToStl.com_32086.mp4
ImageToStl.com_41003.mp4
ImageToStl.com_45087.mp4
ImageToStl.com_100234.mp4
ImageToStl.com_100431.mp4
Self-spin / free-spin
ImageToStl.com_102528.mp4
ImageToStl.com_103048.mp4
ImageToStl.com_103514.mp4
| Path | Content |
|---|---|
| modules/model_v2.py | Dual Quaternion VAE (encoder, decoder, FiLM Transformer). |
| modules/predictor.py | KPP-Net architecture. |
| modules/data_loader_v2.py | GAPartNet parsing and dual quaternion labels. |
| utils/balanced_dataset_utils.py | LMDB dataset builder and balanced sampling utilities. |
| scripts/train_vae_v2.py, scripts/train_predictor.py | Training entry points. |
| inference_animation*.py, inference_pipeline.py | Inference pipelines (batch and interactive). |
| ChamferDistancePytorch/ | CUDA kernels for Chamfer distance and auxiliary metrics. |
DragMesh/
├── assets/                        # Logos, teaser figures, future demo media
│   ├── dragmesh_logo.png
│   └── teaser.png
├── checkpoints/
│   ├── dqvae.pth
│   └── kpp.pth
├── ChamferDistancePytorch/        # CUDA/C++ Chamfer distance implementation (build with setup.py)
├── config/
│   └── category_split_v2.json     # GAPartNet in-domain split definition
├── modules/
│   ├── model_v2.py                # Dual Quaternion VAE architecture
│   ├── predictor.py               # KPP-Net for kinematic reasoning
│   ├── loss.py                    # VAE objectives (Chamfer, dual quaternions, constraints)
│   ├── predictor_loss.py          # Loss terms for KPP-Net
│   └── data_loader_v2.py          # GAPartNet loader + dual quaternion ground truth builder
├── scripts/
│   ├── train_vae_v2.py            # Training loop for the VAE motion prior
│   └── train_predictor.py         # Training loop for KPP-Net
├── utils/
│   ├── balanced_dataset_utils.py  # LMDB dataset class + balanced sampling helper
│   ├── dataset_utils.py           # Category-aware dataset wrappers
│   └── build_lmdb.py              # CLI to build LMDBs from GAPartNet folders
├── partnet/
│   └── Hunyuan3D-Part/            # External resources (P3-SAM, XPart docs)
├── results_deterministic/         # Placeholder for inference outputs (MP4/GIF/GLB)
├── inference_animation.py         # Batch evaluation + GLB export
├── inference_animation_kpp.py     # Dataset-driven animation tests (legacy interface)
├── inference_pipeline.py          # Interactive mesh manipulation pipeline
├── requirements.txt               # Python dependencies
└── README.md
We thank the GAPartNet team for the articulated dataset, and upstream projects such as ChamferDistancePytorch, Objaverse, SAPIEN, and PyTorch3D for their open-source contributions.
