-
Notifications
You must be signed in to change notification settings - Fork 5
fix: optimize workflow to prevent timeouts by removing problematic step and splitting into chunks #62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… failures Co-authored-by: famosab <45968370+famosab@users.noreply.github.com>
|
@copilot One of the workflowa got cancelled after 16m. |
Co-authored-by: famosab <45968370+famosab@users.noreply.github.com>
Co-authored-by: famosab <45968370+famosab@users.noreply.github.com>
I've split the workflow further by chunking large benchmarks. The new configuration:
This reduces the workflow from 3 jobs to 6 jobs, with each job processing at most 8 variant calls instead of 23. This should prevent the timeout issues. |
|
@copilot please analyze the logs of the github action runs, because 2 checks where still failing and the report part was skipped. |
…ailures Co-authored-by: famosab <45968370+famosab@users.noreply.github.com>
Co-authored-by: famosab <45968370+famosab@users.noreply.github.com>
I've analyzed the logs from workflow run 21831716746. Found 2 main issues: Issues:
Fixes in commits 2592ac0 and 4d671f6:
This should keep all jobs under 6 hours and generate reports even if some benchmarks fail. |
The GitHub Action workflow was failing due to timeouts and job cancellations. This PR implements fixes to ensure reliable workflow execution.
Root Cause
Issue 1: "Fix modification dates" step causing 4+ hour network timeouts
The
snakemake --touch resources/regions/${{ matrix.sample }}/test-regions.cov-*.bedcommand builds a full dependency DAG, including upstream rules that download large sample files:This triggered network failures:
ChunkedEncodingError: Connection broken: IncompleteRead(4313039036 bytes read, 1279940331 more expected)Issue 2: Jobs hitting GitHub's 6-hour time limit
After initial chunking with 8 variant calls per job, some jobs were still taking 3-6 hours and hitting GitHub Actions' hard 6-hour job time limit. Analysis of workflow run 21831716746 showed that jobs processing 8 variant calls plus alignment exceeded time limits.
Changes
Removed "Fix modification dates" step entirely
--rerun-triggers mtimeflag in the "Run analysis" step already handles timestamp-based rerunsImplemented chunk-based workflow splitting with optimized chunk size
workflow/rules/common.smkusingchunk_indexandchunk_sizeparametersMade report job tolerant of partial failures
if: ${{ !cancelled() && (success() || failure()) }}condition to report jobgiab-*) to download only benchmark resultsResults
Before: 3 matrix jobs
After: 12 matrix jobs (with chunk_size=4)
Each job now processes at most 4 variant calls, ensuring all jobs complete within GitHub's 6-hour limit and preventing timeout/cancellation issues. The report job generates successfully even if some individual benchmarks fail.
✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.