A collection of well-annotated Python notebooks for PDB file analysis and manipulation, designed for Google Colab environments. These tools have been battle-tested in real research workflows and provide practical solutions for common protein structure tasks.
A curated set of Jupyter notebooks that handle various aspects of protein structure analysis:
- Structure manipulation - Chain renaming, merging, splitting
- Quality control - ANISOU record removal, missing atom detection, validation
- Format conversion - PDB to FASTA, robust input handling
- Analysis tools - Polar/charged residue analysis, structural assessments
- ProteinMPNN utilities - Selective labeling, design preparation
- Specialized workflows - Ubiquitin analysis, PyRosetta integration
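As an illustration of the chain renaming mentioned above, here is a minimal sketch that relies only on the fixed-column PDB layout, where the chain ID sits in column 22. The function name `rename_chain` is hypothetical and is not taken from the notebooks, which may approach this differently.

```python
def rename_chain(pdb_text, old_id, new_id):
    """Return PDB text with chain `old_id` renamed to `new_id`.

    Hypothetical helper for illustration. Only records that carry a
    chain ID in column 22 (index 21) are touched.
    """
    if len(old_id) != 1 or len(new_id) != 1:
        raise ValueError("PDB chain IDs must be exactly one character")

    chain_records = ("ATOM", "HETATM", "ANISOU", "TER")
    renamed = []
    for line in pdb_text.splitlines():
        if line.startswith(chain_records) and len(line) > 21 and line[21] == old_id:
            # Swap the single chain-ID character, leave everything else as-is
            line = line[:21] + new_id + line[22:]
        renamed.append(line)
    return "\n".join(renamed) + "\n"
```

Calling `rename_chain(text, "A", "X")` leaves every other chain untouched, so it can be applied repeatedly to remap several chains one at a time.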
- Open in Google Colab - Each notebook is designed to run directly in Colab
- Upload your PDB files - Use Colab's file upload or Google Drive integration
- Configure settings - Edit the clearly marked configuration sections
- Run and download - Execute cells and download processed files
- Chain operations - Renaming, merging, splitting protein chains
- File cleaning - Removing unwanted records, standardizing formats
- Robust conversion - PDB to FASTA with error handling
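A PDB to FASTA conversion with basic error handling could be sketched as follows. This is an assumption-laden illustration, not the notebooks' implementation: `pdb_to_fasta` and `THREE_TO_ONE` are hypothetical names, only the 20 standard amino acids are mapped, and the sequence is read from `ATOM` records, so residues missing from the coordinates will also be missing from the output.

```python
# Standard three-letter to one-letter amino acid codes
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def pdb_to_fasta(pdb_text, name="structure"):
    """Build one FASTA record per chain from the ATOM records of the first model."""
    chains = {}   # chain ID -> list of one-letter codes, in file order
    seen = set()  # (chain ID, residue number + insertion code) already counted

    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # stop after the first model of a multi-model file
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        chain_id = line[21]
        residue_key = (chain_id, line[22:27])
        if residue_key in seen:
            continue  # another atom of a residue we already recorded
        seen.add(residue_key)
        res_name = line[17:20].strip().upper()
        # Unknown or modified residues become "X" instead of raising
        chains.setdefault(chain_id, []).append(THREE_TO_ONE.get(res_name, "X"))

    if not chains:
        raise ValueError("No ATOM records found; is this a valid PDB file?")

    records = []
    for chain_id, codes in chains.items():
        records.append(">" + name + "_" + chain_id + "\n" + "".join(codes) + "\n")
    return "".join(records)
```

Mapping unknown residues to `X` keeps the sequence length aligned with the structure, which tends to matter more for downstream design tools than rejecting the file outright.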
- Structure validation - Missing atoms, coordinate issues
- Residue analysis - Polar, charged, and structural properties
- ANISOU removal - Stripping anisotropic temperature factor (ANISOU) records
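Two of the checks above lend themselves to a short sketch: dropping the `ANISOU` records that hold anisotropic temperature factors, and reporting residues that lack backbone atoms. The helper names `strip_anisou` and `missing_backbone_atoms` are hypothetical, and the backbone set (N, CA, C, O) is an assumption about what "missing atoms" covers; the notebooks may check side chains as well.

```python
BACKBONE = {"N", "CA", "C", "O"}  # assumed definition of a complete backbone


def strip_anisou(pdb_text):
    """Return PDB text with every ANISOU record removed."""
    kept = [line for line in pdb_text.splitlines() if not line.startswith("ANISOU")]
    return "\n".join(kept) + "\n"


def missing_backbone_atoms(pdb_text):
    """Map (chain, residue number, residue name) to its missing backbone atoms.

    Residues with a complete backbone are left out of the result.
    """
    found = {}  # residue key -> set of atom names seen for that residue
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and len(line) >= 27:
            key = (line[21], line[22:27].strip(), line[17:20].strip())
            found.setdefault(key, set()).add(line[12:16].strip())

    report = {}
    for key, atoms in found.items():
        missing = BACKBONE - atoms
        if missing:
            report[key] = sorted(missing)
    return report
```

An empty dictionary from `missing_backbone_atoms` means every residue seen in the file has all four backbone atoms.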
- Selective labeling - Distance-based FIXED residue assignment
- Design preparation - Input formatting for sequence design
- Validation tools - Post-design analysis and verification
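The distance-based labeling idea can be sketched as below. Everything here is an assumption for illustration: the function names, the 8.0 Å default cutoff, and in particular the direction of the rule (residues near the partner chain are labeled `DESIGN`, all others `FIXED`). The notebooks may use a different cutoff, atom selection, or the opposite convention, and this sketch does not produce any ProteinMPNN input file.

```python
import math


def parse_atoms(pdb_text, chain_id):
    """Return (residue number, (x, y, z)) for every ATOM record of one chain."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and len(line) >= 54 and line[21] == chain_id:
            res_num = int(line[22:26])
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((res_num, xyz))
    return atoms


def label_by_distance(pdb_text, design_chain, partner_chain, cutoff=8.0):
    """Label each residue of `design_chain` as "DESIGN" or "FIXED".

    Assumed rule: a residue is "DESIGN" if any of its atoms lies within
    `cutoff` angstroms of any atom in `partner_chain`, otherwise "FIXED".
    """
    partner_xyz = [xyz for _, xyz in parse_atoms(pdb_text, partner_chain)]
    if not partner_xyz:
        raise ValueError("No atoms found for partner chain " + repr(partner_chain))

    labels = {}
    for res_num, xyz in parse_atoms(pdb_text, design_chain):
        if any(math.dist(xyz, other) <= cutoff for other in partner_xyz):
            labels[res_num] = "DESIGN"           # one close atom is enough
        else:
            labels.setdefault(res_num, "FIXED")  # keep an earlier DESIGN label
    return labels
```

The all-against-all distance loop is quadratic in the number of atoms, which is fine for a single complex but would call for a NumPy or KD-tree approach on large batches.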
- Ubiquitin analysis - Domain-specific structural tools
- PyRosetta integration - Interface with Rosetta workflows
- Custom protocols - Task-specific analysis pipelines
- Google Colab Ready - No local installation required
- Well Documented - Extensive comments and usage instructions
- Production Tested - Used extensively in real research projects
- Error Handling - Robust file processing with clear error messages
- Flexible Configuration - Easy-to-modify settings sections
- Claude AI Enhanced - Developed with AI assistance for optimal usability
Many notebooks follow this structure:
# ===== CONFIGURATION SECTION =====
INPUT_FILES = "path/to/your/files"
OUTPUT_FORMAT = "desired_format"
PROCESSING_OPTIONS = {...}
# ================================
# Processing code with detailed annotations
# Clear section headers and progress indicators
# Error handling and validation
# Results summary and download links
- Google Colab (recommended) or Jupyter environment
- Python 3.8+
- Common libraries (mostly pre-installed in Colab):
  - Biopython (for enhanced PDB parsing; typically needs `pip install biopython` in Colab)
  - NumPy, Pandas (for data handling)
  - Standard library modules
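To confirm that a session meets the requirements above before running a notebook, a quick check along these lines may help. The function `check_environment` is a hypothetical helper, not part of the notebooks; `Bio` is the import name of the Biopython package.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 8), modules=("Bio", "numpy", "pandas")):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        problems.append("Python " + wanted + "+ required, found " + sys.version.split()[0])
    for name in modules:
        # find_spec returns None when a top-level module cannot be located
        if importlib.util.find_spec(name) is None:
            problems.append("Missing module: " + name)
    return problems


for problem in check_environment():
    print(problem)
```

If `Bio` is reported missing, installing the `biopython` package with pip in a notebook cell should resolve it.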
- Browse the collection - Find notebooks relevant to your task
- Read the headers - Each notebook has detailed usage instructions
- Configure settings - Edit the marked configuration sections
- Run step-by-step - Execute cells sequentially
- Download results - Use provided download links
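The configure-then-run steps above might look like this in practice. The variable names follow the configuration template in this README, but the values, the `SUPPORTED_FORMATS` set, the `remove_anisou` option, and the `validate_config` helper are all hypothetical placeholders chosen for illustration.

```python
from pathlib import Path

# ===== CONFIGURATION SECTION =====
INPUT_FILES = "*.pdb"                          # hypothetical glob pattern
OUTPUT_FORMAT = "fasta"                        # hypothetical output format
PROCESSING_OPTIONS = {"remove_anisou": True}   # hypothetical option
# ================================

SUPPORTED_FORMATS = {"pdb", "fasta"}  # assumption, not the notebooks' actual list


def validate_config(input_pattern, output_format, work_dir="."):
    """Check the settings early and return the matching input files."""
    if output_format not in SUPPORTED_FORMATS:
        raise ValueError(
            "Unsupported OUTPUT_FORMAT " + repr(output_format)
            + "; expected one of " + str(sorted(SUPPORTED_FORMATS))
        )
    paths = sorted(Path(work_dir).glob(input_pattern))
    if not paths:
        raise FileNotFoundError(
            "No files match " + repr(input_pattern) + " in " + str(work_dir)
            + "; upload your PDB files first"
        )
    return paths
```

Validating the settings in the first cell means a typo in the configuration fails immediately with a readable message rather than partway through processing.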
Each notebook includes:
- Clear purpose statement - What the tool does
- Usage instructions - How to configure and run
- Example workflows - Typical use cases
- Error handling - What to do when things go wrong
- Output descriptions - What files you'll get
These tools have been used for:
- Protein design validation
- Structure-function analysis
- Design pipeline preparation
- Quality control workflows
- Custom analysis protocols
- AI-Assisted Development - Created with Claude AI for optimal functionality
- User-Tested - Refined through extensive real-world usage
- Iterative Improvement - Continuously updated based on practical needs
- Community Focused - Designed for easy sharing and collaboration
- Google Colab Optimized - Some notebooks may require modification for local use
- File Upload Required - Most tools expect you to upload PDB files to the session
- Session Temporary - Remember to download results before closing Colab sessions
- Resource Limits - Large datasets may hit Colab's computational limits
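Because Colab sessions are temporary, one way to avoid losing output is to bundle the results into a single archive before the session ends. The sketch below assumes the results live in one directory; `bundle_results` is a hypothetical helper. Inside Colab it triggers a browser download through `google.colab.files.download`, and elsewhere it only creates the archive.

```python
import shutil


def bundle_results(output_dir, archive_name="results"):
    """Zip `output_dir` and return the archive path.

    `archive_name` is the path of the archive without the ".zip" suffix.
    Keep it outside `output_dir` so the archive does not include itself.
    """
    archive_path = shutil.make_archive(archive_name, "zip", root_dir=output_dir)
    try:
        from google.colab import files  # only importable inside Colab
    except ImportError:
        return archive_path  # local Jupyter: the archive stays on disk
    files.download(archive_path)
    return archive_path
```

A single archive also sidesteps the browser prompts that can appear when many files are downloaded one by one.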
These notebooks are provided as-is, with comprehensive documentation. The extensive comments and annotations should guide you through most use cases. For complex modifications, refer to the configuration sections and example workflows provided in each notebook.
MIT License - feel free to use, modify, and share these tools in your research.
A collection of practical protein structure analysis tools, refined through real research applications and enhanced with AI-assisted development for maximum usability.