A comprehensive educational data visualization and presentation system that generates Google Slides presentations from assessment data (NWEA, STAR, iReady). The system includes AI-powered chart analysis, intelligent chart selection, and automated slide generation.
- Overview
- Repository Structure
- Components
- Setup Instructions
- Determination Engine & Training Data
- Key Workflows
- Environment Variables
This application provides an end-to-end solution for:
- Data Ingestion: Pulling assessment data from BigQuery
- Chart Generation: Creating visualizations for NWEA, STAR, and iReady assessments
- AI Analysis: Generating insights using GPT-4 with Emergent Learning framework
- Slide Creation: Automatically creating Google Slides presentations
- Intelligent Selection: Using LLM to determine which charts to include based on user prompts
parsec_slides_app/
├── backend/ # Flask backend API
│ ├── app.py # Main Flask application
│ ├── celery_app.py # Celery task queue configuration
│ ├── requirements.txt # Python dependencies
│ └── python/ # Core Python modules
│ ├── data_ingestion.py # BigQuery data ingestion
│ ├── chart_analyzer.py # AI-powered chart analysis
│ ├── decision_llm.py # Determination engine (LLM-based decisions)
│ ├── bigquery_client.py # BigQuery client wrapper
│ ├── google_slides_client.py # Google Slides API client
│ ├── google_drive_upload.py # Google Drive upload utilities
│ ├── nwea/ # NWEA chart generation
│ ├── star/ # STAR chart generation
│ ├── iready/ # iReady chart generation
│ ├── slides/ # Slide creation logic
│ ├── reference_decks/ # Reference PDFs for training
│ └── tasks/ # Celery background tasks
├── frontend/ # Next.js frontend
│ ├── src/
│ │ ├── app/ # Next.js app router pages
│ │ ├── components/ # React components
│ │ ├── hooks/ # Custom React hooks
│ │ ├── lib/ # Utility libraries
│ │ └── types/ # TypeScript type definitions
│ └── package.json # Node.js dependencies
└── README.md # This file
Main REST API endpoints:
- `/health` - Health check
- `/config/assessment-filters` - Get available assessment filters
- `/config/student-groups` - Get student group definitions
- `/data/ingest` - Ingest assessment data from BigQuery
- `/charts/generate` - Generate charts for assessments
- `/slides/create` - Create Google Slides presentation
- Pulls data from BigQuery for NWEA, STAR, and iReady assessments
- Handles filtering by district, school, grade, subject, quarters (BOY/MOY/EOY)
- Quarter Mapping: BOY → Fall, MOY → Winter, EOY → Spring (for data filtering)
- Supports district-only filtering option
- Normalizes column names and data formats
- Supports concurrent queries for performance (up to 5 parallel BigQuery queries)
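The quarter mapping and the concurrency cap described above can be sketched as follows. This is a minimal illustration, not the code in `data_ingestion.py`: the function names and the `query_fn` callable are hypothetical stand-ins for the real BigQuery call.

```python
from concurrent.futures import ThreadPoolExecutor

# User-facing quarter names mapped to the season values used for data filtering.
QUARTER_TO_SEASON = {"BOY": "Fall", "MOY": "Winter", "EOY": "Spring"}

# Concurrency cap stated above (up to 5 parallel BigQuery queries).
MAX_PARALLEL_QUERIES = 5


def quarters_to_seasons(quarters):
    """Translate BOY/MOY/EOY selections into Fall/Winter/Spring filter values."""
    return [QUARTER_TO_SEASON[q.upper()] for q in quarters]


def run_queries_concurrently(query_fn, query_specs):
    """Run query_fn over each spec using at most MAX_PARALLEL_QUERIES threads.

    query_fn is a hypothetical callable that executes one query; results are
    returned in the same order as query_specs.
    """
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL_QUERIES) as pool:
        return list(pool.map(query_fn, query_specs))
```

A thread pool suits this case because each query spends most of its time waiting on the network rather than using the CPU.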
- NWEA Charts (`backend/python/nwea/nwea_charts.py`): Generates year-over-year trend charts, student group comparisons, grade-level dashboards
  - Uses BOY/MOY/EOY naming in titles and filenames
  - Always generates individual charts per grade (no consolidated charts for cohort trends)
- STAR Charts (`backend/python/star/star_charts.py`): Generates performance progression charts, benchmark achievement charts, growth metrics
  - Uses BOY/MOY/EOY naming in titles and filenames
  - Always generates individual charts per grade (no consolidated charts for cohort trends or SGP growth)
  - Each grade gets its own separate chart file
- iReady Charts (`backend/python/iready/iready_charts.py`): Generates diagnostic results, growth tracking, placement charts
  - Uses BOY/MOY/EOY naming in titles and filenames
  - Always generates individual charts per grade
- Uses GPT-4 (text-only, data-based analysis) - NO image analysis
- Analyzes charts using structured JSON data files generated alongside chart images
- Implements Emergent Learning framework:
  - Ground Truths: Observable facts with specific numbers
  - Insights: Patterns and meanings derived from data
  - Hypotheses: Forward-looking predictions
  - Opportunities: Actionable recommendations at classroom/grade/school/system levels
- Uses reference decks for context and style guidance (filtered by deck type: BOY/MOY/EOY)
- Optimized for Performance:
  - Batches multiple charts per API call (8 charts per call by default)
  - Parallel processing (up to 3 concurrent API calls)
  - Token estimation and data summarization for large datasets
  - Automatic model selection (GPT-4 vs GPT-3.5-turbo) based on prompt size
- Generates structured JSON output with analysis results
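The batching and parallelism settings above (8 charts per call, up to 3 concurrent calls) can be sketched like this. It is an illustration under assumptions, not the logic in `chart_analyzer.py`: `analyze_batch` is a hypothetical callable that takes a list of charts and returns one result per chart.

```python
from concurrent.futures import ThreadPoolExecutor

CHARTS_PER_CALL = 8       # default batch size stated above
MAX_CONCURRENT_CALLS = 3  # concurrency cap stated above


def make_batches(charts, batch_size=CHARTS_PER_CALL):
    """Split the chart list into consecutive batches of at most batch_size."""
    return [charts[i:i + batch_size] for i in range(0, len(charts), batch_size)]


def analyze_all(charts, analyze_batch):
    """Analyze every chart, one API call per batch, preserving chart order.

    analyze_batch is a hypothetical stand-in for the real API call.
    """
    batches = make_batches(charts)
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_CALLS) as pool:
        results_per_batch = pool.map(analyze_batch, batches)
        # Flatten the per-batch results back into a single list.
        return [result for batch in results_per_batch for result in batch]
```

With 50 charts this produces 7 batches (six of 8 and one of 2), so 7 API calls instead of 50.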
Two main functions:
a) should_use_ai_insights()
- Determines whether to use AI insights based on user prompt
- Considers: chart count, user preferences, cost implications
- Returns: `use_ai`, `reasoning`, `confidence`, `analysis_focus`
- Note: Does NOT use reference decks - makes decisions based on prompt only
b) parse_chart_instructions()
- Parses user prompts to determine which charts to include and their order
- Handles natural language instructions like:
  - "all graphs" → includes all charts
  - "grades 1-4 math and reading" → filters by grade/subject
  - "show Hispanic student trends" → includes demographic charts
- Returns: `chart_selection`, `instructions`, `reasoning`
- Note: Uses layouts learned from reference decks as context when they are available, and falls back to hardcoded ordering priorities (section3 > section1 > section4 > section2 > section0 > section6) when no reference decks are available
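The hardcoded section priority listed above can be expressed as a sort key. The sketch below is illustrative only: the function names are hypothetical, and placing unrecognized sections last is an assumption rather than documented behavior.

```python
import re

# Section priority from the note above, highest priority first.
SECTION_PRIORITY = ["section3", "section1", "section4", "section2", "section0", "section6"]
_RANK = {name: index for index, name in enumerate(SECTION_PRIORITY)}
_SECTION_RE = re.compile(r"section\d+")


def section_rank(filename):
    """Return the priority rank of a chart filename (lower sorts first).

    Filenames with no recognized section sort last (an assumption).
    """
    match = _SECTION_RE.search(filename)
    section = match.group(0) if match else None
    return _RANK.get(section, len(SECTION_PRIORITY))


def order_charts(filenames):
    """Order charts by section priority; sorted() keeps ties in input order."""
    return sorted(filenames, key=section_rank)
```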
- Creates Google Slides presentations
- Slide Types:
  - Single Chart Slides: One chart per slide with title and summary
  - Dual Chart Slides: Math + Reading pairs on same slide (same grade, same scope)
  - No Triple Chart Slides: Removed - only single or dual charts supported
- Handles chart pairing logic (matches math/reading charts by grade and scope)
- Manages slide layout and formatting
- Integrates AI insights into slides (if enabled)
- Uploads charts to Google Drive in batches
- Filters charts based on user prompt and reference deck patterns
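The pairing rule above (math and reading charts share a slide only when grade and scope match) might look like the following. This is a sketch under assumptions: the `Chart` dataclass and `build_slides` function are hypothetical and do not mirror the real modules in `slides/`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chart:
    """Hypothetical chart record used only for this illustration."""
    scope: str    # "district" or "school"
    grade: int
    subject: str  # "math" or "reading"
    path: str


def build_slides(charts):
    """Group charts into slides of one or two charts each.

    A math chart and a reading chart are paired when scope and grade match;
    anything left over gets its own single-chart slide.
    """
    groups = {}
    for chart in charts:
        groups.setdefault((chart.scope, chart.grade), []).append(chart)

    slides = []
    for group in groups.values():
        math_charts = [c for c in group if c.subject == "math"]
        reading_charts = [c for c in group if c.subject == "reading"]
        paired = min(len(math_charts), len(reading_charts))
        # zip stops at the shorter list, yielding exactly `paired` dual slides.
        slides.extend(zip(math_charts, reading_charts))
        leftovers = math_charts[paired:] + reading_charts[paired:]
        leftovers += [c for c in group if c.subject not in ("math", "reading")]
        slides.extend((c,) for c in leftovers)
    return slides
```

No slide ever holds more than two charts, which matches the removal of triple chart slides.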
- Background task processing for slide creation
- Handles long-running operations asynchronously
- Provides task status tracking
- Configurable time limits (default: 30 minutes for slide creation)
- Error handling for timeout scenarios
- Pages:
  - `/dashboard` - Main dashboard
  - `/create-deck` - Deck creation interface
  - `/sign-in` - Authentication (Clerk)
- Features:
- Assessment filter selection (NWEA, STAR, iReady)
- Quarter selection (BOY, MOY, EOY)
- District-only filtering toggle
- Grade, subject, student group, and race/ethnicity selection
- User prompt input for chart selection and AI insights
- `/bigquery/*` - BigQuery data fetching
- `/data/ingest` - Data ingestion trigger
- `/slides/create` - Slide creation trigger
- `/tasks/*` - Task status tracking

- `useAssessmentFilters.ts` - Assessment filter management
- `useDistrictsAndSchools.ts` - District/school selection
- `useStudentGroups.ts` - Student group selection
- `useFormOptions.ts` - Form option management
- Reusable UI components (buttons, cards, selects, etc.)
- Built with Radix UI and Tailwind CSS
- Python 3.12 or later
- Node.js 18+ (or Bun)
- Google Cloud Platform account with BigQuery access
- Google Cloud credentials (service account JSON)
- OpenAI API key
- Supabase account (for database)
- Redis (for Celery task queue)
- Create virtual environment:
cd backend
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Set up environment variables:
Create `backend/.env` or `.env` in project root:
# Google Cloud
GOOGLE_APPLICATION_CREDENTIALS=path/to/service_account.json
GOOGLE_CLOUD_PROJECT=your-project-id
# OpenAI
OPENAI_API_KEY=your-openai-api-key
# Supabase
SUPABASE_URL=your-supabase-url
SUPABASE_KEY=your-supabase-key
# Redis (for Celery)
REDIS_URL=redis://localhost:6379/0
# Google Drive/Slides
DEFAULT_SLIDES_FOLDER_ID=your-google-drive-folder-id
- Run Flask development server:
cd backend
python app.py
# Or with gunicorn:
gunicorn app:app --bind 0.0.0.0:5000
- Run Celery worker (for background tasks):
cd backend
celery -A celery_app worker --loglevel=info
- Install dependencies:
cd frontend
bun install # or npm install
- Set up environment variables:
Create `frontend/.env.local`:
NEXT_PUBLIC_BACKEND_URL=http://localhost:5000
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=your-clerk-key
CLERK_SECRET_KEY=your-clerk-secret
NEXT_PUBLIC_SUPABASE_URL=your-supabase-url
NEXT_PUBLIC_SUPABASE_ANON_KEY=your-supabase-anon-key
- Run development server:
cd frontend
bun dev # or npm run dev

The Determination Engine (`decision_llm.py`) uses GPT-3.5-turbo to make intelligent decisions about:
- Whether to use AI insights (based on user preferences and chart count)
- Which charts to include in presentations (based on natural language instructions)
Reference decks are organized by deck type in separate folders:
- `backend/python/reference_decks/BOY-DECKS/` - Beginning of Year reference decks
- `backend/python/reference_decks/MOY-DECKS/` - Middle of Year reference decks
- `backend/python/reference_decks/EOY-DECKS/` - End of Year reference decks
The system automatically selects the appropriate reference deck folder based on the selected quarters (BOY, MOY, or EOY) when creating a presentation. This ensures that:
- BOY decks train on BOY reference patterns
- MOY decks train on MOY reference patterns
- EOY decks train on EOY reference patterns
To improve the determination engine's accuracy, you need to:
Chart filenames should follow a consistent pattern for the LLM to parse:
{scope}_{section}_{details}_{subject}_{window}_{type}.png
Examples:
- `district_section1_boy_trends.png` (or `fall_trends.png` for internal data filtering)
- `school_section3_grade_1_math_boy_trends.png`
- `district_section2_hispanic_reading_boy_trends.png`
Required components:
- `scope`: `district` or `school`
- `section`: `section0`, `section1`, `section2`, `section3`, etc.
- `subject`: `math`, `reading`, `math_reading` (for dual charts)
- `window`: `boy`, `moy`, `eoy` (display names) - maps to `fall`, `winter`, `spring` for data filtering
- `type`: `trends`, `cohort`, `sgp`, `growth`, etc.
- `grade`: Individual grade numbers (e.g., `grade_1`, `grade_2`) - no consolidated grade ranges
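A filename following this pattern can be split into its components with a small parser. The sketch below is one possible reading of the convention, not the project's parser: `parse_chart_filename` is a hypothetical name, and treating every token before the window that is neither a subject nor a grade as `details` is an assumption.

```python
from pathlib import Path

WINDOWS = {"boy", "moy", "eoy"}
SUBJECTS = {"math", "reading"}


def parse_chart_filename(filename):
    """Split a chart filename into scope, section, grade, subject, details, window, type."""
    tokens = Path(filename).stem.lower().split("_")
    if len(tokens) < 4 or tokens[0] not in ("district", "school"):
        raise ValueError(f"unrecognized chart filename: {filename}")
    if not tokens[1].startswith("section"):
        raise ValueError(f"missing section in: {filename}")

    scope, section, rest = tokens[0], tokens[1], tokens[2:]
    window_idx = next((i for i, t in enumerate(rest) if t in WINDOWS), None)
    if window_idx is None:
        raise ValueError(f"missing window (boy/moy/eoy) in: {filename}")

    before = rest[:window_idx]
    chart_type = "_".join(rest[window_idx + 1:])
    # "math" followed by "reading" yields "math_reading" for dual charts.
    subject = "_".join(t for t in before if t in SUBJECTS) or None

    grade, details, i = None, [], 0
    while i < len(before):
        token = before[i]
        if token == "grade" and i + 1 < len(before) and before[i + 1].isdigit():
            grade = int(before[i + 1])
            i += 2
            continue
        if token not in SUBJECTS:
            details.append(token)  # e.g. a student group such as "hispanic"
        i += 1

    return {
        "scope": scope, "section": section, "grade": grade, "subject": subject,
        "details": "_".join(details) or None, "window": window_idx is not None and rest[window_idx],
        "type": chart_type,
    }
```

For example, `parse_chart_filename("school_section3_grade_1_math_boy_trends.png")` yields scope `school`, section `section3`, grade `1`, subject `math`, window `boy`, and type `trends`.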
The system should handle common variations:
- "All charts": `all graphs`, `all charts`, `all of them`, `everything`, `include all`, `show all`, `output all`
- Grade ranges: `grades 1-4`, `grade 1-4`, `grades 1 through 4`
- Subjects: `math`, `mathematics`, `reading`, `ela`
- Demographics: `Hispanic`, `Latino`, `Black`, `African American`, `White`, `demographic`, `student group`
- Sections: `section1`, `section 1`, `trends`, `year to year`
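The determination engine relies on the LLM to interpret these variations, but a rule-based pre-check can be useful for testing or as a fallback. The sketch below is hypothetical: the function names are illustrative, and mapping `ela` to `reading` is an assumption that is not confirmed by the text above.

```python
import re

ALL_CHART_PHRASES = (
    "all graphs", "all charts", "all of them", "everything",
    "include all", "show all", "output all",
)

# Assumption: "ela" is treated as an alias for reading.
SUBJECT_ALIASES = {"math": "math", "mathematics": "math", "reading": "reading", "ela": "reading"}

_GRADE_RANGE_RE = re.compile(r"\bgrades?\s+(\d+)\s*(?:-|through)\s*(\d+)\b", re.IGNORECASE)


def wants_all_charts(prompt):
    """True when the prompt contains one of the 'all charts' phrases."""
    text = prompt.lower()
    return any(phrase in text for phrase in ALL_CHART_PHRASES)


def extract_grades(prompt):
    """Return the inclusive list of grades named by a range, or [] if none."""
    match = _GRADE_RANGE_RE.search(prompt)
    if not match:
        return []
    start, end = int(match.group(1)), int(match.group(2))
    return list(range(min(start, end), max(start, end) + 1))


def extract_subjects(prompt):
    """Return canonical subjects in order of first mention, without duplicates."""
    found = []
    for word in re.findall(r"[a-z]+", prompt.lower()):
        subject = SUBJECT_ALIASES.get(word)
        if subject and subject not in found:
            found.append(subject)
    return found
```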
Reference decks are organized by deck type in separate folders:
BOY-DECKS/ (Beginning of Year):
- `ADAMSAMPLE_Bridges BOY 2025 UPDATE.pdf`
- `PATRICKSAMPLE_2026_Insight Deck_BOY_Plumas Charter.pdf`
- `PATRICKSAMPLE_2026_Insights Deck_BOY_Big Picture.pdf`
- `2026_Insight Deck_BOY_YPICS.pdf`
- `Bridges Charter 2024_Insight Deck_Q2_Fall NWEA.pdf`
MOY-DECKS/ (Middle of Year):
- `2025_Insight Deck_MOY_Alta Public Schools.pdf`
- `2025-26 MOY Insight Deck LPS.pdf`
- `2025-26 MOY Insight Deck Strathmore.pdf`
- `2026_Insight Deck_MOY_Monterey Bay Charter School.pdf`
EOY-DECKS/ (End of Year):
- `PATRICKSAMPLE_2025_Insight Deck_EOY_Chowchilla Elementary.pdf`
- `2025_Insight Deck_EOY_San Ardo Union.pdf`
- `2025_Insight Deck_EOY_Twin Ridges.pdf`
- `2025_Insight Deck_EOY_Yosemite USD.pdf`
- `Santa Rita EOY 2024-25.pdf`
Naming Convention:
- Follow consistent naming: `{PARTNER}_{ASSESSMENT}_{QUARTER}_{YEAR}_{SCOPE}.pdf`
- Contain example insights following Emergent Learning framework
- Include structured analysis examples (ground truths, insights, hypotheses, opportunities)
Deck Type Selection:
- When creating a deck, the system automatically detects deck type from selected quarters
- BOY quarters → uses BOY-DECKS folder
- MOY quarters → uses MOY-DECKS folder
- EOY quarters → uses EOY-DECKS folder
- If multiple quarters selected, prioritizes: EOY > MOY > BOY
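The selection rule above, including the EOY > MOY > BOY priority, can be sketched as follows. The function names and the default `root` path argument are illustrative assumptions; only the folder names and the priority order come from this document.

```python
# Highest priority first, as described above.
DECK_PRIORITY = ("EOY", "MOY", "BOY")


def detect_deck_type(quarters):
    """Return the deck type for the selected quarters, or None if none match."""
    selected = {q.strip().upper() for q in quarters}
    for deck_type in DECK_PRIORITY:
        if deck_type in selected:
            return deck_type
    return None


def reference_deck_folder(quarters, root="backend/python/reference_decks"):
    """Return the reference deck folder for the selected quarters, or None."""
    deck_type = detect_deck_type(quarters)
    return f"{root}/{deck_type}-DECKS" if deck_type else None
```

Selecting both BOY and MOY therefore resolves to the `MOY-DECKS` folder.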
To improve the determination engine, collect and structure:
Create a dataset of user prompts → decisions:
{
"user_prompt": "Show me all math charts for grades 1-4",
"chart_count": 50,
"expected_decision": {
"use_ai": false,
"reasoning": "User wants specific subset, no analysis needed",
"chart_selection": ["section3_grade1_math_*.png", "section3_grade2_math_*.png", ...]
},
"actual_decision": {...}
}

Examples of natural language → chart selection:
{
"user_prompt": "I want to see Hispanic student performance trends",
"available_charts": ["district_section2_hispanic_math_fall_trends.png", ...],
"expected_selection": ["district_section2_hispanic_*.png", "school_section2_hispanic_*.png"],
"expected_order": ["district", "school"]
}

Examples of good vs. bad insights:
{
"chart_type": "section3_grade1_math_fall_trends",
"good_insight": {
"finding": "Math scores increased 8% from 2022 to 2023",
"implication": "Instructional changes are showing positive impact",
"recommendation": "Continue current math curriculum approach"
},
"bad_insight": {
"finding": "There are some numbers on the chart",
"implication": "This is data",
"recommendation": "Look at the chart"
}
}

- Collect Training Data:
  - Log all user prompts and decisions
  - Collect feedback on chart selections
  - Track insight quality ratings
- Fine-tune Prompts:
  - Update `decision_prompt` in `should_use_ai_insights()` based on common failure modes
  - Update `selection_prompt` in `parse_chart_instructions()` to handle edge cases
- Add Reference Examples:
  - Add more reference decks with diverse analysis styles
  - Include examples of different assessment types (NWEA, STAR, iReady)
  - Cover different quarters (BOY, MOY, EOY)
- Implement Feedback Loop:
  - Store user corrections to chart selections
  - Track which insights were most useful
  - Use feedback to improve prompt engineering
- Consider Fine-tuning:
  - If you have enough training data (1000+ examples), consider fine-tuning GPT-3.5-turbo
  - Create a fine-tuning dataset with prompt-completion pairs
  - Deploy fine-tuned model for better accuracy
- Layout Learning from Reference Decks (IMPLEMENTED):
  - ✅ New Feature: Reference decks are now analyzed to learn layout patterns
  - ✅ Deck Type Filtering: System automatically uses BOY-DECKS, MOY-DECKS, or EOY-DECKS based on selected quarters
  - ✅ Extracts slide order from reference PDFs using PDF text parsing (PyPDF2, pdfplumber, pymupdf)
  - ✅ Learns section ordering, scope preferences, and subject pairing patterns
  - ✅ Learns chart selection patterns (required vs optional charts)
  - ✅ Learns chart groupings and presentation flow
  - ✅ Updates `parse_chart_instructions()` to use learned layouts as context
  - ✅ Filters charts based on reference deck patterns (omits charts not in reference decks)
  - ✅ Falls back to hardcoded ordering if no reference decks available
  - Usage: Run `python backend/python/test_layout_learner.py` to test layout extraction
  - Deck Type Detection: Automatically detects from `quarters` parameter (BOY/MOY/EOY)
  - Chart Analysis: Uses reference decks to provide context for AI chart analysis (Emergent Learning framework)

- No fine-tuning: Currently uses zero-shot GPT-3.5-turbo/GPT-4
- Limited reference decks: Only reference PDFs in BOY-DECKS, MOY-DECKS, EOY-DECKS folders
- No feedback mechanism: No way to learn from user corrections
- Prompt-based only: No structured training data pipeline
- Layout learning: Reference decks ARE used for chart selection and ordering, but extraction accuracy depends on PDF quality
- No image analysis: Chart analysis uses structured JSON data only (no GPT-4o Vision)
- Single worker optimization: Optimized for single-worker environments (3 concurrent API calls, 8 charts per batch)
User selects filters → Frontend calls /data/ingest → Backend queries BigQuery →
Data normalized → Stored in memory/temp files → Charts generated
Assessment data → Chart generation module (nwea/star/iready) →
Matplotlib charts created → Saved as PNG → Chart metadata saved as JSON
User selects filters + provides prompt → Data ingested from BigQuery →
Charts generated (individual per grade) → Decision LLM determines AI usage →
Chart selection LLM filters charts based on prompt + reference deck patterns →
Charts uploaded to Drive in batches → Slides created (single or dual charts only) →
AI insights added (if enabled) → Presentation complete
Chart + JSON data file → Chart analyzer → Load reference decks (filtered by deck type) →
Build Emergent Learning prompt → Batch analysis (8 charts per API call) →
Parallel processing (3 concurrent calls) → GPT-4/GPT-3.5-turbo analysis →
Structured JSON insights → Added to slides
| Variable | Description | Required |
|---|---|---|
| `GOOGLE_APPLICATION_CREDENTIALS` | Path to GCP service account JSON | Yes |
| `GOOGLE_CLOUD_PROJECT` | GCP project ID | Yes |
| `OPENAI_API_KEY` | OpenAI API key | Yes |
| `SUPABASE_URL` | Supabase project URL | Yes |
| `SUPABASE_KEY` | Supabase service role key | Yes |
| `REDIS_URL` | Redis connection URL | Yes |
| `DEFAULT_SLIDES_FOLDER_ID` | Google Drive folder ID for slides | Yes |
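
Because every backend variable is required, a startup check can fail fast when one is missing. The helper below is a hypothetical sketch and is not part of `app.py`; only the variable names come from the table above.

```python
import os

# Required backend variables, taken from the table above.
REQUIRED_BACKEND_VARS = (
    "GOOGLE_APPLICATION_CREDENTIALS",
    "GOOGLE_CLOUD_PROJECT",
    "OPENAI_API_KEY",
    "SUPABASE_URL",
    "SUPABASE_KEY",
    "REDIS_URL",
    "DEFAULT_SLIDES_FOLDER_ID",
)


def missing_env_vars(environ=None):
    """Return the required variables that are unset or empty.

    Pass a dict to check it instead of the real process environment.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_BACKEND_VARS if not env.get(name)]
```

Calling `missing_env_vars()` before starting Flask or the Celery worker and raising an error on a non-empty result gives a clearer message than a failure deep inside a request.
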
| Variable | Description | Required |
|---|---|---|
| `NEXT_PUBLIC_BACKEND_URL` | Backend API URL | Yes |
| `NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY` | Clerk auth publishable key | Yes |
| `CLERK_SECRET_KEY` | Clerk auth secret key | Yes |
| `NEXT_PUBLIC_SUPABASE_URL` | Supabase project URL | Yes |
| `NEXT_PUBLIC_SUPABASE_ANON_KEY` | Supabase anonymous key | Yes |
- Changed from Fall/Winter/Spring to BOY/MOY/EOY for user-facing selections
- Internal data filtering still uses Fall/Winter/Spring mapping
- All chart titles and filenames use BOY/MOY/EOY naming
- Removed consolidated charts: All cohort trends and SGP growth charts are now individual per grade
- No triple chart slides: Removed support for 3-chart slides - only single and dual charts supported
- Individual grade charts: Each grade gets its own separate chart file (no grouping)
- Batched chart analysis: 8 charts analyzed per API call (reduces API calls by ~87%)
- Parallel processing: Up to 3 concurrent API calls for faster analysis
- Optimized for single-worker: Designed for Render.com single-worker environments
- Token management: Automatic data summarization and model selection based on prompt size
- Data-only analysis: Uses structured JSON data files instead of image analysis
- Reference deck filtering: Automatically selects appropriate reference decks based on deck type (BOY/MOY/EOY)
- Layout learning: Extracts chart ordering and selection patterns from reference PDFs
When adding new features:
- Chart Types: Add generation logic in respective `{assessment}/` folder
- AI Analysis: Update `chart_analyzer.py` prompts
- Determination Engine: Update `decision_llm.py` prompts and add training examples
- Reference Decks: Add new PDFs to appropriate `reference_decks/{BOY|MOY|EOY}-DECKS/` folder following naming convention
- Chart Generation: Always generate individual charts per grade (no consolidated charts)
- Slide Creation: Only create single or dual chart slides (no triple charts)
[Add your license here]