@erkkimon

This fork only adds instructions to the README: it fixes a few problems with setting up the project and adds information about how to use the tools.

It also adds venv to .gitignore to keep the git changes clean.

…o ensure matching frame counts between guide and output videos
…ait_for_sqlite_change() helper that monitors db/wal/shm file changes - Modified main polling loop to wake up immediately when WAL file changes - Reduces perceived lag when headless.py picks up newly committed tasks - Falls back to normal sleep interval for non-SQLite databases
…nt_frames_target now represents total segment length
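
The SQLite wake-up described in the commit notes above can be sketched roughly as follows. Only the name `wait_for_sqlite_change()` comes from the notes; the signature, the `_snapshot` helper, and the choice of comparing mtime and size are assumptions made for illustration.

```python
import os
import time


def _snapshot(db_path):
    """Return (mtime_ns, size) for the db file and its -wal/-shm siblings."""
    state = {}
    for suffix in ("", "-wal", "-shm"):
        path = db_path + suffix
        try:
            st = os.stat(path)
            state[path] = (st.st_mtime_ns, st.st_size)
        except FileNotFoundError:
            state[path] = None  # absent file; appearing later counts as a change
    return state


def wait_for_sqlite_change(db_path, timeout=10.0, check_interval=0.1):
    """Block until a watched file changes or `timeout` seconds elapse.

    Returns True if a change was seen, False if the wait timed out.
    The signature is hypothetical, not taken from the repository.
    """
    baseline = _snapshot(db_path)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        time.sleep(check_interval)
        if _snapshot(db_path) != baseline:
            return True
    return False
```

A polling loop could call this in place of a fixed sleep when the backend is SQLite and keep the plain `time.sleep(interval)` for other databases.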
peteromallet and others added 29 commits July 10, 2025 15:05
- Replace service role RPC calls with direct SQL queries
- Remove fallback RPC code that caused function overload issues
- Both JWT (service-role) and PAT tokens now use same SQL approach
- Service role: can claim any available task (no user restrictions)
- User tokens: can only claim tasks from their own projects
- Eliminates all dependency on problematic RPC functions
- Service roles can now claim any available task regardless of project ownership
- Replace RPC call with direct SQL queries for consistency
- Remove obsolete fallback logic that was incorrectly filtering by user ownership
- Service roles have full access while user tokens remain project-restricted
- Both paths now use same atomic claiming pattern
- Split project lookup and task query into separate steps for better debugging
- Add detailed logging to show project IDs being searched
- Fix complex JOIN syntax that was causing claim failures
- Maintain same atomic claiming pattern with race condition protection
- Should now successfully find and claim queued tasks
- Add detailed logging for project IDs and query execution
- Catch and log SQL errors that might be causing silent failures
- Show exact error messages and data results from task queries
- Add validation for empty project ID arrays
- Help identify if syntax errors are causing claim failures
- Updated claim_next_task Edge Function to check dependencies before claiming tasks
- Created new get_predecessor_output Edge Function combining dependency+output lookup
- Added Python function using Edge Functions for Supabase, fallback for SQLite
- Updated travel segment code to use combined Edge Function approach
- Eliminates RLS permission issues causing workflow failures
- Document new get_predecessor_output Edge Function
- Update Supabase section to explain Edge Function approach for RLS handling
- Clarify that Python code uses Edge Functions for Supabase, direct queries for SQLite
- Fixed incorrect JOIN syntax that relied on non-existent foreign keys
- Replaced automatic relationship queries with manual dependency checking
- Both claim_next_task and get_predecessor_output now use separate queries
- Added proper TypeScript typing for task arrays
- Maintained proper RLS authorization throughout
- Added --worker parameter to create named log files in logs/ folder
- Worker logging takes precedence over debug auto-logging
- Enhanced startup messages to show worker ID when specified
- Maintains existing --debug flag functionality
- Enables parallel worker identification with individual log files
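
A minimal sketch of how `--worker` could take precedence over debug auto-logging when choosing a log file. The `--worker` and `--debug` flags and the `logs/` folder are from the notes above; `build_parser`, `resolve_log_path`, and the `debug.log` file name are hypothetical.

```python
import argparse
from pathlib import Path


def build_parser():
    parser = argparse.ArgumentParser(description="Hypothetical worker CLI sketch")
    parser.add_argument("--worker", default=None,
                        help="Worker ID; logs go to logs/<worker>.log")
    parser.add_argument("--debug", action="store_true",
                        help="Enable debug auto-logging")
    return parser


def resolve_log_path(args, logs_dir="logs"):
    """Pick the log file: a named worker log wins over the debug auto-log."""
    if args.worker:
        return Path(logs_dir) / f"{args.worker}.log"
    if args.debug:
        # Assumed name for the debug auto-log; the real file name may differ.
        return Path(logs_dir) / "debug.log"
    return None  # no file logging requested
```

With both flags set, `resolve_log_path` returns the worker file, so parallel workers each end up with their own log.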
…m_available_task

- Fixed RPC call in get_oldest_queued_task_supabase() to use correct function name
- Updated all debug and error messages to reflect the correct function name
- Resolves worker error where database function did not exist

Major architecture improvements:
- ✅ Eliminated all RPC dependencies, pure Edge Function architecture
- ✅ Dual authentication: Service Key (workers) vs PAT (individual users)
- ✅ Worker management with auto-creation and constraint handling
- ✅ Full Supabase storage integration (image_uploads bucket)
- ✅ Comprehensive test suite: 95.5% success rate (21/22 tests)

New Edge Functions:
- create-task: Task creation with RLS enforcement
- claim-next-task: Atomic claiming with dependency checking
- complete-task: Task completion with file upload
- update-task-status: Status updates (In Progress, Failed)
- get-predecessor-output: Dependency chain resolution
- get-completed-segments: Segment collection for stitching
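
For readers unfamiliar with the pattern, atomic claiming with dependency checking can be illustrated against a local SQLite table. This is a Python analogue, not the Edge Function itself; the table layout, the `depends_on` and `created_at` columns, and the `'Queued'`/`'Complete'` status strings are assumptions.

```python
import sqlite3


def claim_next_task(conn, worker_id):
    """Claim the oldest queued task whose dependency (if any) is complete.

    Returns the claimed task id, or None when nothing is claimable.
    """
    candidates = conn.execute(
        """
        SELECT t.id FROM tasks AS t
        LEFT JOIN tasks AS dep ON dep.id = t.depends_on
        WHERE t.status = 'Queued'
          AND (t.depends_on IS NULL OR dep.status = 'Complete')
        ORDER BY t.created_at
        """
    ).fetchall()
    for (task_id,) in candidates:
        # The status guard makes the claim atomic: if another worker got
        # there first, zero rows match and we move on to the next candidate.
        cur = conn.execute(
            "UPDATE tasks SET status = 'In Progress', worker_id = ? "
            "WHERE id = ? AND status = 'Queued'",
            (worker_id, task_id),
        )
        conn.commit()
        if cur.rowcount == 1:
            return task_id
    return None
```

The key design point is that the `UPDATE` re-checks `status = 'Queued'`, so two workers racing for the same row cannot both see `rowcount == 1`.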

Repository cleanup:
- Removed debug files, temporary videos, obsolete test scripts
- Streamlined production-ready codebase
- Updated documentation and structure

Authentication flow:
- Service role: Uses worker_id for machine tracking
- User/PAT: Clean task claiming without worker complexity
- RLS enforcement via Edge Functions
- PAT resolution via user_api_tokens table
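
The two authentication paths can be sketched in plain Python as below. Everything here is illustrative: `Caller`, `resolve_caller`, and `claimable_tasks` are hypothetical names, the dictionary stands in for the `user_api_tokens` table, and the project-ownership rule follows the earlier commit notes (service role claims any task, user tokens only tasks from their own projects).

```python
import hmac
from dataclasses import dataclass
from typing import Optional


@dataclass
class Caller:
    is_service_role: bool
    user_id: Optional[str] = None


def resolve_caller(token, service_key, user_api_tokens):
    """Map a bearer token to a caller, or None if the token is unknown."""
    # Constant-time comparison avoids leaking key contents through timing.
    if hmac.compare_digest(token, service_key):
        return Caller(is_service_role=True)
    user_id = user_api_tokens.get(token)  # stand-in for a user_api_tokens lookup
    return Caller(is_service_role=False, user_id=user_id) if user_id else None


def claimable_tasks(caller, tasks, project_owner):
    """Service role sees every queued task; users see only their own projects."""
    queued = [t for t in tasks if t["status"] == "Queued"]
    if caller.is_service_role:
        return queued
    return [t for t in queued
            if project_owner.get(t["project_id"]) == caller.user_id]
```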

Worker management:
- Auto-creation trigger for new worker IDs
- Backfill existing workers from tasks
- Specific worker ID: gpu-20250723_221138-afa8403b
- Foreign key constraint handling

Repository organization improvements:
- ✅ Moved comprehensive test suite to tests/ directory
- ✅ Removed unnecessary .md documentation files
- ✅ Eliminated SQL migration files (already deployed)
- ✅ Cleaned up tasks/ directory with obsolete docs
- ✅ Removed duplicate/old test files from tests/
- ✅ Streamlined to production-ready codebase only

Final structure:
- tests/test_travel_workflow_db_edge_functions.py (main test suite)
- STRUCTURE.md (essential documentation)
- headless.py (main application)
- source/ (core modules)
- supabase/ (edge functions)
- Wan2GP/ (core engine)
- Clean, focused repository ready for production

Final cleanup of obsolete RPC function references:
- ✅ Updated comment in headless.py: func_claim_task → claim-next-task Edge Function
- ✅ Disabled migration call in headless.py (migrations complete)
- ✅ Confirmed no func_claim_available_task references in current codebase

All RPC dependencies successfully eliminated:
- No func_claim_available_task calls
- No func_update_task_status calls
- No func_mark_task_failed calls
- Migration functions disabled (Edge Function architecture complete)

Code now uses pure Edge Function architecture exclusively.

✅ Eliminated obsolete RPC migration function:
- Simplified _migrate_supabase_schema() to no-op
- Removed func_migrate_tasks_for_task_type RPC call
- Removed all RPC error handling code

✅ Updated comments throughout:
- 'RPC migrations' → 'Edge Function architecture complete'
- 'RPC fallback' → 'Edge Function exclusively'
- 'RPC could be used' → 'Direct table query'

🎯 Result: Zero RPC dependencies in codebase
- No .rpc() calls anywhere in source code
- Pure Edge Function architecture achieved
- All database operations via modern Supabase patterns

✅ Complete codebase RPC cleanup:
- Updated STRUCTURE.md: removed outdated RPC function references
- Fixed Edge Function comment: 'RPC function' → 'database operation'
- Updated test file reference: test_supabase_headless.py → tests/test_travel_workflow_db_edge_functions.py
- Corrected db_operations.py description: 'RPC wrappers' → 'Edge Function integration'

🔍 Comprehensive verification completed:
- Zero .rpc() calls in Python codebase
- Zero func_*task RPC references in active code
- Only remaining 'func_' references are Edge Function names (create-task, claim-next-task)
- All obsolete files previously deleted from Git history

🎯 Result: 100% pure Edge Function architecture achieved

🎯 Critical Fix for Silent Failures:
- Added _mark_task_failed_via_edge_function() helper function
- When complete-task Edge Function fails, task is now marked as Failed
- Prevents 'completed successfully' messages when uploads actually fail
- Error details captured in task failure message for debugging

🔍 Addresses Issue:
- Previously: Edge Function failure → silent return → 'task completed'
- Now: Edge Function failure → mark as Failed → proper error reporting
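
The before/after behaviour can be sketched like this. The two callables are injected stand-ins for the complete-task call and for `_mark_task_failed_via_edge_function()`; `finish_task` and its signature are hypothetical.

```python
def finish_task(task_id, output_path, complete_task, mark_task_failed):
    """Report completion; if that fails, record the task as Failed instead.

    Returns True only when completion was actually recorded, so the caller
    cannot print a success message after a failed upload.
    """
    try:
        complete_task(task_id, output_path)
    except Exception as exc:  # broad on purpose: any failure must be surfaced
        mark_task_failed(task_id, f"complete-task failed: {exc}")
        return False
    return True
```

Returning a boolean instead of silently returning is what lets the caller distinguish a real completion from a reported failure.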

Still investigating why Edge Function shows UUID casting error despite deployment.
- Add Deno type declarations to fix TypeScript errors
- Convert task_id to string early to avoid 'cannot cast type jsonb to uuid' errors
- Update all database queries to use string version of task_id
- Fix orchestrator task completion logic to handle UUID types properly

This resolves the issue where tasks weren't being marked as complete due to database update failures.
- Fixed 'return None are' typo to 'return None' in get_oldest_queued_task_supabase function
- Added comprehensive debug logging to complete-task edge function for better error tracking
- Enhanced error handling and logging in task completion workflow
- Modified headless.py DualWriter class to open log files in append mode ('a') instead of write mode ('w')
- Worker log files at logs/{worker_id}.log now preserve previous session logs
- Improves debugging by maintaining log history across worker restarts
- Add new magic_edit.py module in sm_functions/ for image transformations
- Integrate black-forest-labs/flux-kontext-dev-lora model via Replicate API
- Support conditional InScene LoRA usage (in_scene=true/false parameter)
- Add replicate dependency to requirements.txt
- Update headless.py to handle magic_edit task type
- Update STRUCTURE.md to document new functionality
- Full Supabase storage integration for outputs
- Tested with real database tasks (13 magic_edit tasks ready to process)
…ific overrides for num_inference_steps (9 steps), guidance_scale (1.0), and flow_shift (1.0) to allow standard generation parameters regardless of causvid LoRA usage
- Add huggingface_hub>=0.25.0 to requirements.txt for reliable downloads with automatic checksums
- Add safetensors>=0.4.0 for LoRA integrity verification
- Update download_file() to use hf_hub_download() for HuggingFace URLs with:
  - Automatic SHA-256 checksum verification
  - Resume download capability for interrupted transfers
  - Better error handling and retry logic
- Add fallback integrity checks for non-HF URLs:
  - Content-length verification
  - Safetensors format validation
  - Automatic cleanup of corrupted files
- Fixes corrupted LoRA downloads like steamboat-willie-14b.bf16.safetensors
- Add validate_lora_file() with size range validation (1MB-50GB) based on LoRA ranks
- Add content inspection for safetensors files to verify LoRA tensor patterns
- Add check_loras_in_directory() utility for batch validation of LoRA directories
- Create check_loras.py script for command-line LoRA integrity checking
- Enhanced download_file() to validate existing files before assuming they're good
- Added post-download validation for all LoRA files with auto-cleanup of corrupted files
- Detects common corruption patterns: empty files, HTML error pages, wrong formats
- Provides detailed validation messages with file sizes and specific error types

This prevents corrupted LoRA downloads and provides tools to audit existing files.
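
A stdlib-only sketch of the kind of checks listed above: empty files, HTML error pages, size range, and safetensors header inspection. It is not the repository's `validate_lora_file()`. The return shape, the error messages, and the "tensor name contains `lora`" heuristic are assumptions; the header layout (8-byte little-endian length followed by a JSON header) is the safetensors file format.

```python
import json
import struct
from pathlib import Path

MIN_BYTES = 1 * 1024**2   # 1MB lower bound from the notes above
MAX_BYTES = 50 * 1024**3  # 50GB upper bound from the notes above


def validate_lora_file(path, min_bytes=MIN_BYTES, max_bytes=MAX_BYTES):
    """Return (ok, message) for a candidate .safetensors LoRA file."""
    path = Path(path)
    if not path.is_file():
        return False, "file not found"
    size = path.stat().st_size
    if size == 0:
        return False, "empty file"
    with path.open("rb") as f:
        head = f.read(8)
        # A failed download often leaves an HTML error page on disk.
        if head.lstrip().lower().startswith((b"<!doct", b"<html")):
            return False, "looks like an HTML error page"
        if not min_bytes <= size <= max_bytes:
            return False, f"size {size} bytes outside expected range"
        if len(head) < 8:
            return False, "too short for a safetensors header"
        (header_len,) = struct.unpack("<Q", head)
        if header_len > size - 8:
            return False, "header length exceeds file size"
        try:
            header = json.loads(f.read(header_len))
        except ValueError:  # covers JSON and Unicode decoding errors
            return False, "header is not valid JSON"
    if not isinstance(header, dict):
        return False, "header is not a JSON object"
    names = [k for k in header if k != "__metadata__"]
    # Heuristic (assumed): LoRA tensors usually carry "lora" in their names.
    if not any("lora" in k.lower() for k in names):
        return False, "no LoRA-style tensor names found"
    return True, f"ok ({size} bytes, {len(names)} tensors)"
```

A batch helper in the spirit of `check_loras_in_directory()` could simply map this function over `Path(directory).glob("*.safetensors")` and collect the failures.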
2 participants