This solution uses a weighted ensemble of diverse models built on different molecular representations. The final submission combines predictions from 26 unique models, optimized per task.

Model families:
- Uni-Mol-2-84M: 84M-parameter pretrained model with Focal Loss
- Uni-Mol-2-310M: 310M-parameter pretrained model with Focal Loss
  - Both variants use 3D molecular conformations
- ChemProp: Standard and Focal Loss variants
- Multitask ChemProp: Joint training on related tasks
- Chemeleon: SELFIES-based transformers with LightGBM/CatBoost heads
- ChemBERTa: Pretrained chemical language model with CatBoost
- LightGBM: With Optuna hyperparameter optimization (see the tuning sketch after this list)
- CatBoost: Various feature combinations
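As referenced above, the LightGBM models were tuned with Optuna. Below is a minimal sketch of that setup; the search space, the `roc_auc` metric, and the placeholder data are illustrative assumptions, not the exact configuration used in this repo.

```python
import lightgbm as lgb
import numpy as np
import optuna
from sklearn.model_selection import cross_val_score

# Placeholder data: 2048-bit fingerprints and binary labels (assumption,
# stands in for the real featurized training set).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 2048)).astype(float)
y = rng.integers(0, 2, size=500)

def objective(trial: optuna.Trial) -> float:
    # Illustrative search space; the actual ranges were likely task-specific.
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 100, 1000),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "num_leaves": trial.suggest_int("num_leaves", 15, 255),
        "min_child_samples": trial.suggest_int("min_child_samples", 5, 100),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.5, 1.0),
    }
    model = lgb.LGBMClassifier(**params, verbose=-1)
    # 5-fold CV, matching the validation scheme used across the solution
    return cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```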
Molecular representations:
- 3D Conformations: Uni-Mol-2
- Molecular Graphs: ChemProp
- SMILES/SELFIES: Transformer models
- Molecular Descriptors: RDKit descriptors and Morgan/ECFP fingerprints (see the featurization sketch after this list)
- Learned Embeddings: ChemBERTa, Uni-Mol-2
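For the descriptor-based models, here is a minimal featurization sketch using RDKit. The radius-2 / 2048-bit ECFP settings and the particular descriptors (`MolWt`, `MolLogP`, `TPSA`) are common defaults assumed for illustration, not necessarily the exact feature set used here.

```python
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import Descriptors, rdFingerprintGenerator

# Radius-2 Morgan generator (ECFP4), 2048 bits -- assumed defaults.
_GEN = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)

def featurize(smiles: str) -> np.ndarray:
    """2048-bit ECFP4 fingerprint plus a few global RDKit descriptors."""
    mol = Chem.MolFromSmiles(smiles)
    fp = np.zeros(2048, dtype=np.float64)
    DataStructs.ConvertToNumpyArray(_GEN.GetFingerprint(mol), fp)
    desc = np.array([
        Descriptors.MolWt(mol),    # molecular weight
        Descriptors.MolLogP(mol),  # lipophilicity estimate
        Descriptors.TPSA(mol),     # topological polar surface area
    ])
    return np.concatenate([fp, desc])

print(featurize("CCO").shape)  # (2051,)
```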
Ensembling: task-specific weighted averages, with each model's weight derived from its CV score (see the sketch below).
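A minimal sketch of the per-task weighting idea. The proportional mapping from CV score to weight shown here is an assumption (the actual weights may have been tuned or rank-based), and the model names and scores are placeholders.

```python
import numpy as np

def weighted_ensemble(preds: dict[str, np.ndarray],
                      cv_scores: dict[str, float]) -> np.ndarray:
    """Average model predictions, weighting each model by its CV score."""
    weights = np.array([cv_scores[name] for name in preds])
    weights = weights / weights.sum()  # normalize weights to sum to 1
    stacked = np.stack([preds[name] for name in preds])
    return (weights[:, None] * stacked).sum(axis=0)

# Placeholder predictions and CV scores for three hypothetical models.
preds = {"unimol2_310m": np.array([0.9, 0.2]),
         "chemprop_focal": np.array([0.8, 0.3]),
         "lgbm_ecfp": np.array([0.7, 0.4])}
cv = {"unimol2_310m": 0.85, "chemprop_focal": 0.82, "lgbm_ecfp": 0.78}
print(weighted_ensemble(preds, cv))
```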
Key techniques:
- Focal Loss for class imbalance (see the sketch after this list)
- Multitask learning
- 5-fold cross-validation
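For reference, a minimal PyTorch sketch of binary focal loss as commonly defined (Lin et al., 2017). The `alpha`/`gamma` values are the paper's defaults, not necessarily the ones tuned for these tasks.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
               alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """Down-weights easy examples so training focuses on hard minority cases."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)  # model's probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()

# Toy usage: three examples with raw logits and binary labels.
logits = torch.tensor([2.0, -1.0, 0.5])
targets = torch.tensor([1.0, 0.0, 1.0])
print(focal_loss(logits, targets))
```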
See notebooks/final_submission_composition.ipynb for a visualization of the model weights.