A comprehensive web application for managing and analyzing microhaplotype genetic data, designed for researchers and institutions working with population genetics and genomic analysis.
MicrohapDB is a specialized database system for storing, managing, and analyzing microhaplotypes - short DNA sequences that contain multiple closely linked polymorphic sites. These genetic markers are valuable for:
- Population genetics studies
- Ancestry inference
- Forensic genetics
- Evolutionary research
- Biogeographical analysis
Location: microhapDB-frontend/
A modern, responsive web interface built with Vue.js that provides:
- Data Upload & Management: Upload PAV (Presence/Absence Variation) and MADC (Microhaplotype Allele Data Collection) files
- Interactive Visualizations: Charts and graphs for genetic data analysis
- User Authentication: ORCID-based login system for researchers
- Admin Dashboard: Administrative tools for data management
- Search & Filter: Advanced search capabilities across genetic datasets
Key Features:
- Real-time data validation during upload
- Batch processing for large datasets
- Export functionality for analysis results
- Role-based access control (Admin/User)
- Integration with ORCID for researcher authentication
Location: microhapDB-backend/
A robust REST API built with FastAPI and Python that handles:
- Database Management: PostgreSQL database with optimized schemas
- File Processing: Handles PAV and MADC file formats with validation
- Authentication: ORCID OAuth integration
- Data Analysis: Statistical computations and genetic analysis algorithms
- API Documentation: Auto-generated OpenAPI/Swagger documentation
- Performance: Optimized queries and caching for large datasets
Key Features:
- RESTful API with comprehensive endpoints
- Database migrations with Alembic
- Background task processing
- Data validation and error handling
- Comprehensive logging and monitoring
- Docker containerization for deployment
- Backend: AWS EC2 instance with Docker containers
- Frontend: AWS S3 + CloudFront for global CDN
- Database: PostgreSQL (configurable for AWS RDS)
- CI/CD: GitHub Actions for automated deployment
- Infrastructure: Terraform for Infrastructure as Code
- GitHub Actions (Recommended): Automated CI/CD pipeline
- Terraform: Infrastructure as Code management
- Manual EC2: Direct deployment to AWS EC2 instances
- Binary genetic variation data
- Population frequency information
- Geographic metadata
- Detailed allele frequency data
- Multi-population comparisons
- Statistical analysis results
- ORCID Authentication: Secure researcher authentication
- Role-based Access: Admin and user permission levels
- Data Validation: Comprehensive input validation and sanitization
- Secure File Upload: Validated file processing with size limits
- Environment Variables: Secure configuration management
- Vue.js 3: Progressive JavaScript framework
- Vue Router: Client-side routing
- Axios: HTTP client for API communication
- Chart.js: Data visualization
- Bootstrap/CSS: Responsive styling
- FastAPI: Modern Python web framework
- SQLAlchemy: Database ORM
- Alembic: Database migrations
- PostgreSQL: Primary database
- Pydantic: Data validation
- Uvicorn: ASGI server
- Docker: Containerization
- GitHub Actions: CI/CD pipeline
- Terraform: Infrastructure as Code
- AWS: Cloud infrastructure (EC2, S3, CloudFront)
- Nginx: Reverse proxy (in production)
- Upload and manage genetic datasets
- Perform population genetics analysis
- Visualize allele frequencies and distributions
- Export data for further analysis
- Collaborate with other researchers
- Centralized genetic data repository
- Multi-user access with role management
- Standardized data formats and validation
- Scalable infrastructure for large datasets
- Integration with existing research workflows
- Population Geneticists: Researchers studying genetic variation in populations
- Forensic Scientists: Professionals using genetic markers for identification
- Evolutionary Biologists: Scientists studying genetic evolution and ancestry
- Academic Institutions: Universities and research centers
- Government Agencies: Organizations involved in genetic research and policy
- Visit the deployed application
- Sign in with your ORCID account
- Upload your genetic data files (PAV/MADC format)
- Explore visualizations and analysis tools
- Export results for your research
- Clone the repository
- Set up the backend (see
microhapDB-backend/README.md) - Set up the frontend (see
microhapDB-frontend/README.md) - Configure environment variables
- Run locally for development
We welcome contributions from the genetics and bioinformatics community! Please see our contributing guidelines and feel free to submit issues or pull requests.
This project is licensed under the MIT License - see the LICENSE file for details.
- Live Application: [Deployed URL]
- API Documentation: [Backend URL]/docs
- GitHub Repository: https://github.com/tylerslonecki/microhapDB
- Issues & Support: GitHub Issues
MicrohapDB is developed to support the global genetics research community in advancing our understanding of human genetic diversity and evolution.