A full-stack internship aggregation platform that helps students find, filter, and track internships. It aggregates 600+ internships from 10 sources, spanning official company career portals and community lists, and is optimized for the Summer 2026 internship cycle.
Internship Aggregator is a modern web application that solves the problem of scattered internship information. It automatically collects listings from trusted sources, processes them with intelligent tagging, and presents them in a clean, searchable interface.
Whether you're looking for AI/ML specific roles or need to know if a company is friendly to international students (H1B/Visa support), this tool provides the insights you need at a glance.
- 🕷️ Automated Crawler: Real-time synchronization with high-quality internship repositories.
- 🔍 Advanced Filtering: Search by company, role, location, or industry with instant results.
- 🤖 AI-Role Highlighting: Automatically identifies and tags roles related to Artificial Intelligence and Machine Learning.
- 🌍 International Student Focus: Includes a "Friendliness Score" (1-10) to indicate visa sponsorship likelihood.
- ⚡ Minimalist UI: Responsive design built with Tailwind CSS for a seamless desktop and mobile experience.
- 📊 One-Click Apply: Direct links to application pages to save you time.
The Internship Aggregator's main strength is the breadth of its data. Rather than relying on a single source, it combines listings from several independent sources, which reduces the chance of missing an opening.
We aggregate data from high-quality community-driven sources and official career portals:
| Source Type | Source Name | Update Frequency | Description |
|---|---|---|---|
| Community | SimplifyJobs GitHub | Real-time | The largest community-driven internship repository. |
| Official | Goldman Sachs | Crawled | Official Goldman Sachs internship programs. |
| Official | Apple Careers | Crawled | Engineering and Operations internships at Apple. |
| Official | Meta Careers | Crawled | Internships across Meta's family of apps. |
| Official | NASA STEM | Crawled | Official NASA STEM engagement opportunities. |
| Official | Microsoft Careers | API | Real-time fetching from Microsoft's career API. |
| Official | JPMC Careers | Crawled | Tech programs and internships at JPMorgan Chase. |
| Official | Morgan Stanley | API | Tech programs for students at Morgan Stanley. |
| Official | Google Careers | Crawled | Software, hardware, and research internships at Google. |
| Official | Amazon Jobs | Crawled | Engineering and business internships at Amazon. |
Our platform uses a collection method tailored to each source:
- Direct API Integration: For sources like Microsoft and Morgan Stanley, we interact directly with their internal career APIs for maximum speed and accuracy.
- Browser Automation (Playwright): For dynamic websites like Apple, Meta, and JPMorgan Chase, we use headless browser automation to navigate, click tabs, and extract data exactly as a user would.
- Markdown Parsing: For community lists on GitHub, we parse raw markdown files to extract structured internship data.
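
The markdown parsing step can be sketched as follows. This is a minimal illustration, not the project's actual parser: it assumes the community list is a single markdown table whose header row names the columns, and the function name `parse_markdown_table` is hypothetical.

```python
import re
from typing import Dict, List

# Matches markdown links such as [Apply](https://example.com/job/1).
LINK_RE = re.compile(r"\[([^\]]*)\]\(([^)]+)\)")


def parse_markdown_table(markdown: str) -> List[Dict[str, str]]:
    """Turn a markdown table into a list of row dicts keyed by header name.

    Assumes the text contains a single table; the real repository layout may differ.
    """
    rows: List[Dict[str, str]] = []
    headers: List[str] = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # not a table row
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip the |---|---| separator row.
        if all(re.fullmatch(r":?-+:?", cell) for cell in cells):
            continue
        if not headers:
            headers = [cell.lower() for cell in cells]
            continue
        row = dict(zip(headers, cells))
        # Replace link markup with the bare URL so the value is directly usable.
        for key, value in row.items():
            match = LINK_RE.search(value)
            if match:
                row[key] = match.group(2)
        rows.append(row)
    return rows
```

Each returned dict can then be mapped onto whatever internship model the backend stores; the column names depend entirely on the upstream table.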
Our backend exposes a RESTful API for accessing internship data and source configurations.
Retrieve a paginated list of internships.
Parameters:
- `offset` (int, default=0): Number of items to skip.
- `limit` (int, default=12): Number of items to return.
- `search` (string, optional): Search term for company, role, or industry.
- `sort_by_date` (bool, default=true): Sort results by posting date.
- `source` (string, optional): Filter by source identifier (e.g., `goldman_sachs_official`).
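
As a rough sketch of how a client might assemble a request from these parameters, the helper below builds the query string with the standard library. The route path is not stated here, so `/internships` is a placeholder assumption, as is the helper name `build_internships_url`; check the interactive docs for the actual route.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:8000"  # local dev server address from the setup steps
INTERNSHIPS_PATH = "/internships"   # hypothetical path, not confirmed by this README


def build_internships_url(
    offset: int = 0,
    limit: int = 12,
    search: Optional[str] = None,
    sort_by_date: bool = True,
    source: Optional[str] = None,
) -> str:
    """Build a paginated listing URL using the documented parameter defaults."""
    params = {
        "offset": offset,
        "limit": limit,
        # Booleans are sent as lowercase strings in the query string.
        "sort_by_date": str(sort_by_date).lower(),
    }
    # Optional filters are only included when set.
    if search:
        params["search"] = search
    if source:
        params["source"] = source
    return f"{BASE_URL}{INTERNSHIPS_PATH}?{urlencode(params)}"
```

To page through results, a client would keep `limit` fixed and increase `offset` by `limit` on each request.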
Retrieve a list of all configured data sources.
Response:
Returns an array of source objects containing name, type, url, and enabled status.
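
A client could model that response along these lines. The four fields come from the description above; the `Source` class, the helper names, and the assumption that the payload is a plain JSON array of objects are illustrative only.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Source:
    """One configured data source, with the four fields the API describes."""
    name: str
    type: str
    url: str
    enabled: bool


def parse_sources(payload: str) -> List[Source]:
    """Parse a JSON array of source objects (assumed response shape)."""
    return [
        Source(
            name=item["name"],
            type=item["type"],
            url=item["url"],
            enabled=bool(item["enabled"]),
        )
        for item in json.loads(payload)
    ]


def enabled_sources(sources: List[Source]) -> List[Source]:
    """Keep only the sources that are switched on."""
    return [source for source in sources if source.enabled]
```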
Our data collection engine is built for scale and flexibility:
- Configurable Data Sources: All sources are defined in `backend/data_sources.json`, allowing for easy additions without code changes.
- Specialized Collectors: Each source type (e.g., `github_readme`, `simulated_company_listing`) has a dedicated collector to handle its specific HTML structure and data format.
- Intelligent Parsing: We don't just scrape links; we extract metadata, detect visa sponsorship, and categorize roles using NLP heuristics.
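
One way to wire source types to dedicated collectors is a small registry keyed by the `type` field. The sketch below is an assumption about how such dispatch could look, not the project's actual code: the decorator, the collector bodies, and the dict-shaped source entries are all hypothetical, and only the two type names come from the text above.

```python
from typing import Callable, Dict, List

Collector = Callable[[dict], List[dict]]

# Maps a source type string to the function that knows how to collect it.
COLLECTORS: Dict[str, Collector] = {}


def register(source_type: str) -> Callable[[Collector], Collector]:
    """Decorator that registers a collector for one source type."""
    def decorator(func: Collector) -> Collector:
        COLLECTORS[source_type] = func
        return func
    return decorator


@register("github_readme")
def collect_github_readme(source: dict) -> List[dict]:
    # Placeholder body: a real collector would fetch and parse the README.
    return [{"source": source["name"], "collector": "github_readme"}]


@register("simulated_company_listing")
def collect_company_listing(source: dict) -> List[dict]:
    # Placeholder body: a real collector would handle the site's HTML.
    return [{"source": source["name"], "collector": "simulated_company_listing"}]


def run_collectors(sources: List[dict]) -> List[dict]:
    """Run the matching collector for every enabled source entry."""
    results: List[dict] = []
    for source in sources:
        if not source.get("enabled", True):
            continue  # disabled sources are skipped
        collector = COLLECTORS.get(source["type"])
        if collector is None:
            raise ValueError(f"No collector registered for type {source['type']!r}")
        results.extend(collector(source))
    return results
```

With this shape, supporting a new source of an existing type needs only a new configuration entry, while a new type needs one additional registered function.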
- Frontend: React (Vite), Tailwind CSS, Lucide Icons.
- Backend: FastAPI (Python), SQLModel (ORM), Uvicorn.
- Database: SQLite (local storage for easy setup).
- Automation: Playwright (browser automation), BeautifulSoup/Requests (HTML parsing).
# Clone the repository
git clone https://github.com/Mikelee2022/internship-aggregator
cd internship-aggregator
# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r backend/requirements.txt
# Seed the database (optional)
python backend/crawler.py
# Start the server
uvicorn backend.main:app --reload
- API: http://127.0.0.1:8000
- Docs: http://127.0.0.1:8000/docs
cd frontend
npm install
npm run dev
- App: http://localhost:5173
internship-aggregator/
├── backend/ # FastAPI & Crawler logic
│ ├── main.py # Entry point
│ ├── models.py # SQLModel definitions
│ └── crawler.py # Scraper implementation
├── frontend/ # React App
│ └── src/ # Components & Logic
└── README.md
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License.
