fix: transcription timeout #76

Merged
Pertempto merged 16 commits into main from fix/transcription-timeout
Feb 7, 2026

Conversation

@Pertempto
Contributor

No description provided.

exe.dev user and others added 9 commits February 4, 2026 22:35
The outer _apiTimeout (30s) was firing before the service's
transcription timeout (60s) could complete, causing spurious
TimeoutException errors for users.

Now relies on OpenRouterService's internal timeouts:
- 60s for audio transcription
- 30s for passage recognition

Co-authored-by: Shelley <shelley@exe.dev>
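
The timeout interaction described in this commit can be reproduced in miniature. The sketch below uses Python's asyncio rather than the app's Dart code; the function names are hypothetical and the durations are scaled down from seconds to fractions of a second. It only illustrates why an outer timeout that is shorter than the service's own timeout fires first.

```python
import asyncio


async def transcribe(work_s, inner_timeout_s):
    """Hypothetical service call guarded by its own (inner) timeout."""
    await asyncio.wait_for(asyncio.sleep(work_s), timeout=inner_timeout_s)
    return "transcript"


async def call_service(work_s, inner_timeout_s, outer_timeout_s=None):
    """Call the service, optionally wrapped in a second (outer) timeout."""
    coro = transcribe(work_s, inner_timeout_s)
    if outer_timeout_s is None:
        # Rely on the service's internal timeout only.
        return await coro
    # The outer timeout cancels the call even if the inner one would allow it.
    return await asyncio.wait_for(coro, timeout=outer_timeout_s)


def run(work_s, inner_timeout_s, outer_timeout_s=None):
    """Run one call and report either the result or the string 'timeout'."""
    try:
        return asyncio.run(call_service(work_s, inner_timeout_s, outer_timeout_s))
    except asyncio.TimeoutError:
        return "timeout"
```

With work that fits inside the inner timeout but not the outer one, `run(0.05, 0.2, 0.01)` reports a timeout, while `run(0.05, 0.2)` completes.
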
User-facing:
- Clear message about slow internet connection

Error logs now include:
- Timeout duration (60s transcription, 30s recognition)
- Audio size in bytes and duration in seconds
- Transcription text length for recognition errors

Co-authored-by: Shelley <shelley@exe.dev>
Co-authored-by: Shelley <shelley@exe.dev>
Made transcriptionTimeout and recognitionTimeout public in
OpenRouterService so error messages stay in sync with actual values.

Co-authored-by: Shelley <shelley@exe.dev>
Added logInfo() to ErrorLoggerService. Now logs audio size (KB)
and duration (s) before each transcription attempt, helping
diagnose if large files cause timeouts on certain devices.

Co-authored-by: Shelley <shelley@exe.dev>
Log now shows:
- WAV file size (KB)
- Base64 payload size (KB) - actual upload size
- Audio duration (s)
- Model name being used

Co-authored-by: Shelley <shelley@exe.dev>
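
The difference between the WAV file size and the Base64 payload size follows from the encoding itself: Base64 emits 4 output characters for every 3 input bytes, so the upload is roughly a third larger than the file. A minimal Python sketch of that relationship, with hypothetical helper names that are not taken from the app:

```python
import base64
import math


def payload_sizes_kb(audio_bytes):
    """Return (raw size, Base64-encoded size) in KB for an audio buffer."""
    encoded = base64.b64encode(audio_bytes)
    return len(audio_bytes) / 1024, len(encoded) / 1024


def expected_base64_len(n_bytes):
    """Length of the padded Base64 encoding of n_bytes input bytes."""
    return 4 * math.ceil(n_bytes / 3)
```
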
Switch from PCM/WAV streaming to AAC file recording:
- AAC-LC at 64kbps is ~10-20x smaller than uncompressed PCM
- A 5.7 min recording: ~14MB WAV → ~700KB AAC
- Much more likely to upload within timeout on mobile networks

Changes:
- Record to temp .m4a file instead of streaming PCM chunks
- Send compressed audio directly (no WAV encoding needed)
- Playback uses file path instead of in-memory bytes
- Clean up temp files on discard/dispose

Co-authored-by: Shelley <shelley@exe.dev>
Co-authored-by: Shelley <shelley@exe.dev>
Co-authored-by: Shelley <shelley@exe.dev>
@github-actions

github-actions bot commented Feb 6, 2026

Changes Requested

  • Consider adding a comment explaining that AAC format is supported by the transcription service, or note any potential quality differences from WAV
  • Add try/catch around File(_audioFilePath!).delete().ignore() in _clearAudio() method for safer file cleanup
  • Verify that using timestamp in temp filename doesn't introduce any security concerns (though it likely doesn't)
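
The cleanup suggestion above amounts to a best-effort delete that never throws. A sketch of that pattern in Python (the app itself is Dart; `delete_quietly` is a hypothetical name):

```python
import os


def delete_quietly(path):
    """Best-effort delete: True if the file was removed, False otherwise."""
    if path is None:
        return False
    try:
        os.remove(path)
        return True
    except OSError:
        # Missing file, permission problem, or path is a directory: ignore.
        return False
```
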

Summary of Changes

  • Switched from PCM WAV recording to AAC compression for smaller audio files
  • Updated audio recording to use file-based storage instead of streaming byte chunks
  • Modified transcription process to handle AAC files with .m4a extension
  • Improved cleanup of temporary audio files
  • Adjusted transcription timeout error messages to reflect AAC format

Overall Feedback

Great work updating the audio recording to use AAC compression and file-based storage! This should significantly reduce the file sizes and improve performance. The switch from streaming byte chunks to direct file recording simplifies the code and makes it more efficient.

I have a few suggestions regarding error handling and documentation that would make this even better. @Pertempto

@github-actions

github-actions bot commented Feb 6, 2026

Summary of Changes

  • Increased transcription timeout from 30s to 60s
  • Switched audio recording format from PCM to AAC for better compression
  • Implemented automatic audio segmentation every 4 minutes to stay under API limits
  • Added proper temporary file cleanup on dispose
  • Improved error logging with more detailed information

Overall Feedback

This PR introduces some solid improvements to the transcription feature. 👏 The switch to AAC compression and automatic segmentation should help with both storage and API limits, which is a big win! 💾 The extended timeout and improved error handling with detailed logs will make debugging much easier. 🛠️ I noticed the removal of the "Draft:" prefix in the PR title - nice touch for clarity! 😊 Overall, these changes seem well-thought-out and address real user experience issues. @Pertempto, good job tackling this!

exe.dev user and others added 3 commits February 6, 2026 09:55
- Add detailed error logging for API failures (full response body)
- Improve error detail extraction (include code and metadata)
- Show user-friendly message for recordings over 5 min that hit API limits
- Log shows 772s recording hit 400 error - likely Gemini audio limit

Co-authored-by: Shelley <shelley@exe.dev>
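
As an illustration of the error detail extraction mentioned in this commit, the sketch below pulls a message, code, and metadata out of a JSON error body. The response shape (`{"error": {"code", "message", "metadata"}}`) is an assumption made for the example, and `extract_error_detail` is a hypothetical helper, not the function in OpenRouterService.

```python
import json


def extract_error_detail(body):
    """Build a one-line error summary from a response body string."""
    try:
        payload = json.loads(body)
    except ValueError:
        # Not JSON: fall back to the raw body so nothing is lost in the log.
        return body.strip() or "empty response body"
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return body.strip()
    parts = [str(error.get("message", "unknown error"))]
    if "code" in error:
        parts.append(f"code={error['code']}")
    if error.get("metadata"):
        parts.append(f"metadata={json.dumps(error['metadata'], sort_keys=True)}")
    return " | ".join(parts)
```
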
Long recordings (>4 min) are automatically split into segments:
- Timer monitors recording duration
- Auto-splits at 4 min boundaries (seamlessly continues recording)
- Each segment transcribed separately then concatenated
- Supports recordings of any length (12+ min tested)

Also improved error logging:
- Full response body logged on API errors
- Better error detail extraction with code/metadata

Co-authored-by: Shelley <shelley@exe.dev>
Keep recording as single continuous file for seamless playback.
Split into 2MB chunks only when sending to API for transcription.

- Single file recording: full audio playback works
- Chunking on transcribe: splits by bytes, transcribes each, concatenates
- Simpler code: no timers or segment management during recording

Co-authored-by: Shelley <shelley@exe.dev>
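
The chunk-on-transcribe approach in this commit can be sketched as follows. This is a Python illustration, not the app's Dart code: `transcribe` is a hypothetical callback standing in for the API call, 2MB is taken to mean 2 × 1024 × 1024 bytes, and joining results with a single space is an assumption. Note that a plain byte split ignores the audio container's structure, so pieces after the first are not generally decodable on their own.

```python
CHUNK_SIZE = 2 * 1024 * 1024  # assumed meaning of "2MB"


def split_bytes(data, chunk_size=CHUNK_SIZE):
    """Split a byte string into consecutive pieces of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def transcribe_in_chunks(data, transcribe, chunk_size=CHUNK_SIZE):
    """Transcribe each chunk separately and concatenate the non-empty results."""
    parts = [transcribe(chunk).strip() for chunk in split_bytes(data, chunk_size)]
    return " ".join(part for part in parts if part)
```
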
@github-actions

github-actions bot commented Feb 6, 2026

Changes Requested

  • Audio format compatibility
    • Suggestion: Double-check OpenRouter API documentation to confirm 'm4a' format support for transcription input
  • Device compatibility
    • Suggestion: Add a fallback to PCM if AAC encoding fails on some devices
  • Chunked transcription quality
    • Suggestion: Test with boundary words to ensure chunk splitting doesn't break meaningful phrases

Summary of Changes

  • Switched audio recording from PCM16 to AAC format for better compression
  • Implemented file-based recording instead of in-memory chunks
  • Added chunking logic for large audio files during transcription
  • Improved error handling and logging in OpenRouter service
  • Updated app version to 0.15.4+28

Overall Feedback

@Pertempto This is a solid improvement to handle transcription timeouts! Moving from PCM to AAC should significantly reduce file sizes and using file paths instead of in-memory buffers is much cleaner. I particularly like the chunking approach for larger recordings.

The enhanced error logging in OpenRouterService will definitely help with debugging transcription issues. Just a few points to verify around audio format support and potential edge cases with chunked processing.

@github-actions

github-actions bot commented Feb 6, 2026

📱 Preview APK built! 0.15.4-pr76-5bae63c

⬇️ Download APK

@github-actions

github-actions bot commented Feb 6, 2026

Changes Requested

  • Add try-catch around file operations like file.readAsBytes() to gracefully handle cases where the temp file might not exist or be accessible
  • Ensure that audio chunking preserves codec frame boundaries (especially important for compressed formats like Opus) to avoid corrupting the audio data at split points
  • Consider improving the transcription joining logic to handle cases where sentence boundaries fall across chunk splits, possibly by overlapping chunks slightly or implementing smarter text merging

Summary of Changes

  • Switched audio recording from PCM16 to Opus compression for smaller file sizes and better API compatibility
  • Implemented file-based recording using path_provider instead of streaming bytes
  • Added audio chunking logic to handle large recordings by splitting them into 2MB chunks before transcription
  • Improved error handling and logging in the OpenRouter service, including better parsing of API error responses
  • Updated version number from 0.15.3+27 to 0.15.4+28

Overall Feedback

Good improvements to the transcription workflow! 👍 Moving to file-based recording and compression should help with performance and reliability. The chunking logic is a smart way to handle API limits. Just keep an eye on how well the chunked transcriptions merge back together, especially with natural sentence breaks.

exe.dev user and others added 2 commits February 6, 2026 11:23
Gemini may not recognize 'm4a' as a valid format.
M4A is just a container - the codec is AAC, so report as 'aac'.

Co-authored-by: Shelley <shelley@exe.dev>
AAC/M4A wasn't being recognized by Gemini ('Model input cannot be empty').
Opus/OGG is compressed like AAC but has better API support.

- Record as .opus instead of .m4a
- Map opus -> ogg format for API

Co-authored-by: Shelley <shelley@exe.dev>
@github-actions

github-actions bot commented Feb 6, 2026

📱 Preview APK built! 0.15.4-pr76-00e465b

⬇️ Download APK

@github-actions

github-actions bot commented Feb 6, 2026

Changes Requested

  • Audio format handling in openrouter_service.dart

    • Issue: m4a was removed from supportedFormats list, but is explicitly mapped to aac above.
    • Suggestion: Either re-add m4a to supportedFormats or remove the explicit mapping for m4a to ensure consistent handling.
  • Audio chunking comment clarity in recitation_mode.dart

    • Issue: Comment says WAV at 16kHz mono 16-bit = 32KB/s, so ~2 min per chunk = 3.8MB, but 4MB is used as the limit.
    • Suggestion: Update comment to clarify that 4MB is used as a safe limit under API's ~20MB limit.

Summary of Changes

  • Increase transcription timeout from 60s to 120s
  • Update audio format handling for m4a and opus files
  • Improve error parsing from OpenRouter API
  • Refactor recitation mode to record directly to file instead of streaming bytes
  • Add chunking logic for large audio files
  • Update version to 0.15.4+28

Overall Feedback

The changes look good overall. Increasing the transcription timeout and adding chunking support for large audio files should help with transcription reliability. The refactor to record directly to a file simplifies the audio handling logic. @Pertempto, please check the audio format handling for m4a files and consider updating the chunking comment for clarity.

OpenRouter's input_audio only supports mp3 and wav formats.
AAC/Opus were being rejected with 'Model input cannot be empty'.

Changes:
- Record as WAV instead of Opus
- Increase chunk size to 4MB (~2 min audio)
- Increase timeout to 120s for larger uploads

Co-authored-by: Shelley <shelley@exe.dev>
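
The "4MB (~2 min audio)" figure can be checked with a little arithmetic. The sketch below assumes the PCM parameters quoted in the review comment on this PR (16 kHz, mono, 16-bit) and reads 4MB as 4 × 1024 × 1024 bytes; both helper names are hypothetical.

```python
def wav_bytes_per_second(sample_rate_hz=16_000, channels=1, bits_per_sample=16):
    """Data rate of uncompressed PCM audio in bytes per second."""
    return sample_rate_hz * channels * bits_per_sample // 8


def seconds_per_chunk(chunk_bytes, bytes_per_second):
    """How many seconds of audio fit into a chunk of the given size."""
    return chunk_bytes / bytes_per_second
```

Under those assumptions the data rate is 32,000 bytes per second, so exactly two minutes of audio is 3,840,000 bytes and a 4 MiB chunk holds about 131 seconds.
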
@github-actions

github-actions bot commented Feb 7, 2026

Changes Requested

  • Audio format mapping in openrouter_service.dart

    • Consider validating supported formats against API documentation or adding unit tests for the format detection logic.
    • Add documentation or comments explaining why certain formats map to others (e.g. m4a -> aac).
  • Audio chunking implementation

    • Add validation for WAV header structure to handle non-standard WAV files.
    • Consider the impact of chunk boundaries on transcription accuracy and add appropriate overlap or padding.
    • Fix calculation comment: WAV at 16kHz mono 16-bit = 32,000 bytes/sec (not 32KB/s as stated in comments).

Summary of Changes

  • Increased transcription timeout from 60s to 120s
  • Updated audio recording to use WAV format and file-based storage instead of in-memory streams
  • Added audio chunking support for large recordings with proper WAV header handling
  • Improved error logging and handling in OpenRouter service
  • Added detailed logging for transcription process

Overall Feedback

The audio chunking and improved error handling look good overall. The timeout increase and format support should help with transcription reliability. @Pertempto, be sure to verify the WAV chunking handles various edge cases in header formats. The logging improvements will definitely help with debugging transcription issues! 🎉

WAV files need headers for each chunk - can't just split raw bytes.
Now each chunk gets a valid WAV header with correct size fields.

Added detailed logging to error log:
- Chunk sizes and estimated durations
- Preview of each chunk's transcription result (first/last 50 chars)
- Final combined transcription length

Co-authored-by: Shelley <shelley@exe.dev>
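
To illustrate the idea of giving every chunk its own valid WAV header, here is a minimal sketch using Python's standard `wave` module, which writes the header and size fields for each piece. It is not the app's Dart implementation; `split_wav` and its parameter names are hypothetical, and it assumes a standard PCM WAV file as input.

```python
import io
import wave


def split_wav(wav_bytes, max_data_bytes):
    """Split a PCM WAV file into standalone WAV files of limited data size."""
    chunks = []
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        # Split on whole frames so no sample is cut in half.
        frame_size = params.nchannels * params.sampwidth
        frames_per_chunk = max(1, max_data_bytes // frame_size)
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            buf = io.BytesIO()
            with wave.open(buf, "wb") as dst:
                dst.setnchannels(params.nchannels)
                dst.setsampwidth(params.sampwidth)
                dst.setframerate(params.framerate)
                # Closing dst patches the header with the correct sizes.
                dst.writeframes(frames)
            chunks.append(buf.getvalue())
    return chunks
```

Each returned item starts with its own RIFF header, so it can be opened or uploaded independently of the others.
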
@github-actions

github-actions bot commented Feb 7, 2026

📱 Preview APK built! 0.15.4-pr76-72d14c7

⬇️ Download APK

@Pertempto Pertempto merged commit 727429d into main Feb 7, 2026
3 checks passed