Reproduction metrics differ from paper (official data + pretrained models, defaults) #5

@jh-source

Description

Hi authors, thanks for releasing the code and models. I attempted to reproduce the paper’s main results following the README, using the official datasets and pretrained checkpoints. Most settings were left at defaults. I’m observing noticeable deviations on several metrics (table below).

Reproduction command

python -m scripts.predict \
  --batch_size 2 \
  --relax_batch_size 2 \
  --seed 42 \
  --use_fast_sampling \
  --docking_ckpt best_inference_epoch_model.pt \
  --use_ema_weights \
  --model_in_old_version \
  --filtering_ckpt best_model.pt \
  --run_relaxation \
  --relax_ckpt best_inference_epoch_model.pt \
  --samples_per_complex 10 \
  --inference_steps 20 \
  --pocket_reduction \
  --pocket_buffer 20.0 \
  --pocket_min_size 1 \
  --only_nearby_residues_atomic
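
One thing I was unsure about is whether runs with --seed 42 are expected to be fully reproducible, or whether some run-to-run variance is normal for the diffusion sampler. For reference, this is the kind of determinism I have in mind (a minimal sketch assuming PyTorch as the backend; it is not something I passed to scripts.predict):

import random

import numpy as np
import torch

def set_deterministic(seed: int = 42) -> None:
    # Seed all RNGs that the sampling pipeline might touch.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Prefer deterministic CUDA kernels where they exist.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    torch.use_deterministic_algorithms(True, warn_only=True)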

Data and models

  • Dataset: official dataset linked in README
  • Pretrained models: official docking/filtering/relaxation checkpoints linked in README
  • All other parameters: defaults unless explicitly shown above

Results vs. paper

Metric | Paper | Reproduced
Ligand RMSD < 2 Å (%) ↑ | 39.7% | 36.7%
Median Ligand RMSD (Å) ↓ | 2.5 | 2.8
All-Atom RMSD < 1 Å (%) ↑ | 41.7% | 43.0%
PB valid (%) ↑ | 72.9% | 41.2%
PB valid and L-RMSD < 2 Å (%) ↑ | 33.7% | 21.4%
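
In case part of the gap comes from how I aggregate per-complex results, this is roughly how I compute the two RMSD summary metrics (a sketch with placeholder numbers; I take one RMSD per complex from the top-ranked of the 10 samples, and I am not certain that matches the paper's protocol):

import numpy as np

# Placeholder values; in practice this holds one ligand RMSD (Å) per complex,
# taken from the top-ranked of the 10 generated samples.
rmsds_top1 = np.asarray([1.2, 3.4, 0.9, 2.6, 1.8])

frac_below_2 = 100.0 * float((rmsds_top1 < 2.0).mean())  # "Ligand RMSD < 2 Å (%)"
median_rmsd = float(np.median(rmsds_top1))               # "Median Ligand RMSD (Å)"
print(f"RMSD < 2 Å: {frac_below_2:.1f}%   median RMSD: {median_rmsd:.2f} Å")

If the reported numbers use a different selection (for example best-of-10 rather than the top-ranked sample), that alone could explain part of the difference, so it would help to know which protocol the paper's table uses.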

Environment

  • OS: Linux (RHEL8 kernel 4.18)
  • GPU: NVIDIA A100
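
I can share exact library versions if useful; this is how I would pull them (assuming the CUDA build of PyTorch):

import torch

print("torch :", torch.__version__)
print("cuda  :", torch.version.cuda)
print("cudnn :", torch.backends.cudnn.version())
if torch.cuda.is_available():
    print("gpu   :", torch.cuda.get_device_name(0))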

Thanks in advance for helping identify what’s missing to match the paper’s numbers!
