Preprocess command fails: missing args, logging bug, undefined var, parser/pipeline key mismatch, and cache filename inconsistency


### Environment
- OS: Linux (EL8, 4.18)
- Python: 3.11 (conda)
- PyTorch: 2.1.2+cu121
- PyG: 2.6.1
- Lightning: 2.4.0
- torchmetrics: 1.8.1
- RDKit: 2024.3.6
- fair-esm: 2.0.0

### Command
```bash
python -m scripts.preprocess.preprocess_data \
  --dataset pdbbind \
  --data_dir /path/to/PDBBIND_atomCorrected \
  --cache_path /path/to/processed/cache_xxx \
  --split_path /path/to/timesplit_xxx \
  --esm_embeddings_path /path/to/esm/esm_embeddings \
  --num_workers 20
```

### Observed errors (from running the command)
- Missing CLI arg in preprocess script
  - `AttributeError: 'Namespace' object has no attribute 'bb_random_prior'` (referenced in `parse_args`, but the argument is never defined)
- Incorrect logging usage in training pipeline
  - TypeError: 'module' object is not callable (uses `logging(...)` instead of `logging.info(...)`)
- Wrong output path variable in training pipeline
  - Uses `self.full_cache_path` (undefined) when writing `complex_names.pkl`; should use `self.config.cache_path`
- Missing attribute in training pipeline
  - AttributeError: 'TrainingDataPipeline' object has no attribute 'dataset'
- Parser/pipeline dict key mismatch
  - `ComplexParser.parse_protein()` expects `apo_rec_path`/`holo_rec_path`, but `TrainingDataPipeline.prepare_input_files()` produces `apo_protein_file`/`holo_protein_file`, causing:
  - ValueError: Apo Path=None and Holo Path=None not found
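
For anyone triaging, the first two errors can be reproduced in isolation with the standard library alone. This is a minimal sketch, not the repository code: the parser below is a stand-in that defines only `--dataset`, and the log message is made up.

```python
import argparse
import logging

# Stand-in parser (assumption): it defines --dataset but not --bb_random_prior.
parser = argparse.ArgumentParser()
parser.add_argument("--dataset", default="pdbbind")
args = parser.parse_args([])

try:
    # Reading an attribute that was never registered on the parser fails.
    args.bb_random_prior
except AttributeError as exc:
    missing_arg_error = str(exc)

try:
    # Calling the module itself instead of logging.info(...) fails.
    logging("Processing complexes")
except TypeError as exc:
    logging_error = str(exc)

print(missing_arg_error)
print(logging_error)
```

The exact wording of the `TypeError` message can differ slightly between Python versions.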

### Suggested fixes
- Add missing arg in `scripts/preprocess/preprocess_data.py`:
  - `parser.add_argument('--bb_random_prior', action='store_true', default=False, ...)`
- Replace `logging(...)` with `logging.info(...)` in `flexdock/data/modules/training/pipeline.py`
- Replace `self.full_cache_path` with `self.config.cache_path` when writing `complex_names.pkl`
- Set `self.dataset = config.dataset` in `TrainingDataPipeline.__init__`
- Unify keys between pipeline and parser:
  - Use `apo_rec_path`/`holo_rec_path` in `prepare_input_files()` (or make parser accept both)
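
A rough sketch of what the argument fix and the "accept both" option for the key mismatch could look like. `build_parser`, `KEY_ALIASES`, and `normalize_keys` are hypothetical names for illustration and are not part of the repository; the help text and file names are placeholders.

```python
import argparse
import logging

logging.basicConfig(level=logging.INFO)


def build_parser() -> argparse.ArgumentParser:
    """Stand-in for the preprocess script's parser (assumption)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--dataset", default="pdbbind")
    # action="store_true" already implies default=False.
    parser.add_argument("--bb_random_prior", action="store_true",
                        help="placeholder help text")
    return parser


# Hypothetical alias table: pipeline-style key -> parser-style key.
KEY_ALIASES = {
    "apo_protein_file": "apo_rec_path",
    "holo_protein_file": "holo_rec_path",
}


def normalize_keys(files: dict) -> dict:
    """Return a copy of `files` with pipeline-style keys renamed."""
    return {KEY_ALIASES.get(key, key): value for key, value in files.items()}


args = build_parser().parse_args([])
# logging.info(...) rather than logging(...)
logging.info("bb_random_prior=%s", args.bb_random_prior)

files = normalize_keys({"apo_protein_file": "apo.pdb", "holo_protein_file": None})
```

Renaming the keys at the source in `prepare_input_files()` is probably the cleaner fix; an alias table like this only helps if other callers still rely on the old names. If a dict ever carried both spellings of the same key, the later one would silently win.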

