


@lakhmanisahil commented Jan 5, 2026

Fixes #201

Summary

This PR implements a 15-minute timeout for builds to prevent them from remaining indefinitely in the RUNNING state. It introduces a new TIMED_OUT build state and enforces timeout handling independently in the builder and progress updater, following maintainer guidance.


Changes Made

1. Core Timeout Infrastructure

  • Added BuildState.TIMED_OUT to represent builds terminated due to timeout
  • Introduced a shared timeout constant (BUILD_TIMEOUT_SECONDS = 900) in common/config.py
  • Added time_started_running to BuildInfo to track when a build actually begins execution
  • Added error_message to BuildInfo to store timeout and failure details
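A minimal sketch of the new state and fields described above, assuming a dataclass-style BuildInfo. The enum values, the field defaults and the elapsed_running() helper are hypothetical. Only TIMED_OUT, time_started_running, error_message, time_created, the state names used elsewhere in this thread and the 900-second constant come from this PR.

```python
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

BUILD_TIMEOUT_SECONDS = 900  # shared timeout constant: 15 minutes


class BuildState(Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"
    ERROR = "ERROR"
    TIMED_OUT = "TIMED_OUT"  # new: build terminated because it ran too long


@dataclass
class BuildInfo:
    build_id: str
    state: BuildState = BuildState.PENDING
    time_created: float = field(default_factory=time.time)
    # Set when the build actually enters RUNNING (not when it is queued).
    time_started_running: Optional[float] = None
    # Human-readable timeout or failure details.
    error_message: Optional[str] = None

    def elapsed_running(self, now: Optional[float] = None) -> Optional[float]:
        """Seconds spent in RUNNING, or None if the build never started."""
        if self.time_started_running is None:
            return None
        current = time.time() if now is None else now
        return current - self.time_started_running
```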

2. Builder (builder/builder.py)

  • Enforced timeout on all build subprocess steps (configure, clean, build) using timeout=BUILD_TIMEOUT_SECONDS
  • Added handling for subprocess.TimeoutExpired:
    • Terminates the running process
    • Aborts the build cleanly
    • Records a clear timeout error message
  • Added __check_if_timed_out() to detect if the progress updater has already marked a build as TIMED_OUT
  • Builder checks timeout state between build steps for early termination
  • Skips archive generation for timed-out builds
  • Ensures cleanup runs even when a build times out or fails
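A minimal sketch of the per-step subprocess timeout, assuming the steps are launched with subprocess.run(). The helper names run_build_step() and run_build() are hypothetical. Note that subprocess.run() already kills and reaps the child before re-raising TimeoutExpired; code built directly on Popen would have to call terminate() or kill() itself.

```python
import subprocess
import sys
from typing import List, Optional


def run_build_step(cmd: List[str], timeout: float) -> Optional[str]:
    """Run one build step; return an error message, or None on success.

    On timeout, subprocess.run() kills the child and waits for it before
    raising TimeoutExpired, so no explicit terminate() is needed here.
    """
    try:
        subprocess.run(cmd, check=True, timeout=timeout,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except subprocess.TimeoutExpired:
        return f"Build step {cmd[0]!r} timed out after {timeout} seconds"
    except subprocess.CalledProcessError as exc:
        return f"Build step {cmd[0]!r} failed with exit code {exc.returncode}"
    return None


def run_build(steps: List[List[str]], timeout: float) -> Optional[str]:
    """Run configure/clean/build in order and stop at the first error."""
    for cmd in steps:
        error = run_build_step(cmd, timeout)
        if error is not None:
            # Abort: later steps (and archive generation) are skipped.
            return error
    return None


if __name__ == "__main__":
    # Quick demo with a trivial step that finishes well within the timeout.
    print(run_build([[sys.executable, "-c", "pass"]], timeout=5))
```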

3. Build Manager (build_manager/manager.py)

  • Added mark_build_timed_out() to safely transition builds to TIMED_OUT
  • Prevents overriding terminal states (SUCCESS, FAILURE)
  • Automatically records time_started_running when a build enters the RUNNING state
  • Extended BuildInfo.to_dict() using getattr() to maintain backward compatibility with existing Redis entries
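A sketch of the guarded transition, using a hypothetical in-memory store in place of the Redis-backed manager. Treating ERROR and TIMED_OUT as terminal alongside SUCCESS and FAILURE is an assumption.

```python
import logging
from enum import Enum
from typing import Dict

logger = logging.getLogger("build_manager")


class BuildState(Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"
    ERROR = "ERROR"
    TIMED_OUT = "TIMED_OUT"


# Assumption: every finished state is protected, not only SUCCESS/FAILURE.
TERMINAL_STATES = {BuildState.SUCCESS, BuildState.FAILURE,
                   BuildState.ERROR, BuildState.TIMED_OUT}


class InMemoryBuildStore:
    """Hypothetical stand-in for the Redis-backed build manager."""

    def __init__(self) -> None:
        self.states: Dict[str, BuildState] = {}
        self.errors: Dict[str, str] = {}

    def mark_build_timed_out(self, build_id: str, message: str) -> bool:
        """Move a build to TIMED_OUT unless it has already finished."""
        current = self.states.get(build_id)
        if current is None:
            logger.warning("Cannot time out unknown build %s", build_id)
            return False
        if current in TERMINAL_STATES:
            # Never overwrite a final result such as SUCCESS or FAILURE.
            return False
        self.states[build_id] = BuildState.TIMED_OUT
        self.errors[build_id] = message
        return True
```

In a Redis-backed version the read and the write are separate round trips, so a build that finishes between them could still be overwritten unless the check-and-set is made atomic, for example with a Redis transaction.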

4. Progress Updater (build_manager/progress_updater.py)

  • Added __check_build_timeout() invoked from the existing periodic update loop
  • Timeout is measured from time_started_running (not time_created) for accuracy
  • Handles edge cases where time_started_running is not yet available
  • Marks builds as TIMED_OUT once the timeout threshold is exceeded
  • Added handling for TIMED_OUT in state and progress update paths
  • Includes clear logging for timeout detection
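A sketch of the periodic check, assuming states are plain strings and the current time is passed in so the logic is testable. check_build_timeout() is a hypothetical name standing in for __check_build_timeout().

```python
import logging
import time
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("progress_updater")
BUILD_TIMEOUT_SECONDS = 900


@dataclass
class BuildInfo:
    build_id: str
    state: str  # "RUNNING", "TIMED_OUT", ...
    time_started_running: Optional[float] = None
    error_message: Optional[str] = None


def check_build_timeout(info: BuildInfo, now: Optional[float] = None,
                        timeout: float = BUILD_TIMEOUT_SECONDS) -> bool:
    """Mark a RUNNING build TIMED_OUT once it exceeds `timeout` seconds.

    Returns True only if the build was marked timed out on this call.
    """
    if info.state != "RUNNING":
        return False
    if info.time_started_running is None:
        # Edge case: start time not recorded yet; try again next cycle.
        return False
    current = time.time() if now is None else now
    elapsed = current - info.time_started_running  # measured from RUNNING
    if elapsed <= timeout:
        return False
    info.state = "TIMED_OUT"
    info.error_message = (f"Build exceeded timeout of {timeout} seconds "
                          f"({elapsed:.0f}s elapsed)")
    logger.warning("Build %s timed out after %.0f s", info.build_id, elapsed)
    return True
```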

5. Web API (web/app.py)

  • Added /api/builds/<build_id>/status endpoint for lightweight polling
  • Returns build state, progress, and timeout error information
  • Improved robustness of get_all_builds() by skipping and logging individual build errors instead of failing the entire request
  • Updated API usage to align with recent changes (remote_name / commit_ref)
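A framework-agnostic sketch of the status payload and of the skip-and-log behaviour of get_all_builds(). The field names and the fetch callback are assumptions; a route handler for /api/builds/<build_id>/status would return build_status_payload() serialized as JSON.

```python
import logging
from typing import Any, Callable, Dict, Iterable, List

logger = logging.getLogger("web")


def build_status_payload(info: Dict[str, Any]) -> Dict[str, Any]:
    """Small JSON-ready dict suitable for lightweight status polling."""
    return {
        "build_id": info["build_id"],
        "state": info["state"],
        "progress": info.get("progress", 0),
        "error_message": info.get("error_message"),  # e.g. timeout details
    }


def get_all_builds(build_ids: Iterable[str],
                   fetch: Callable[[str], Dict[str, Any]]
                   ) -> List[Dict[str, Any]]:
    """Collect all builds, logging and skipping any that fail to load."""
    builds = []
    for build_id in build_ids:
        try:
            builds.append(build_status_payload(fetch(build_id)))
        except Exception:
            # One bad entry should not fail the whole request.
            logger.exception("Skipping build %s: failed to load", build_id)
    return builds
```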

Implementation Details

As suggested by the maintainer, timeout handling is implemented independently in two places:

  1. Builder subprocess timeout
    Each subprocess call is bounded by BUILD_TIMEOUT_SECONDS. If exceeded, the process is terminated and the build is aborted.

  2. Progress updater timeout detection
    The existing periodic task checks whether any RUNNING build has exceeded the timeout since entering the running state and marks it as TIMED_OUT.

Both mechanisms coordinate via BuildState.TIMED_OUT:

  • Either mechanism may trigger the timeout
  • Builder exits early if the progress updater has already marked the build as timed out
  • No direct coupling between the two workflows

This preserves separation of responsibilities while keeping behavior consistent.
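A sketch of how the builder can exit early without being coupled to the progress updater: the only shared information is whether the build's state is TIMED_OUT. run_steps_with_early_exit() and the is_timed_out callback are hypothetical names.

```python
from typing import Callable, List, Optional


def run_steps_with_early_exit(
    steps: List[Callable[[], None]],
    is_timed_out: Callable[[], bool],
) -> Optional[str]:
    """Run build steps in order, stopping if the build is marked TIMED_OUT.

    `is_timed_out` would read the shared build state; it is the only point
    of contact with the progress updater.
    """
    for index, step in enumerate(steps):
        if is_timed_out():
            return f"Stopped before step {index}: build already marked TIMED_OUT"
        step()
    return None
```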


Testing

  • Built and tested locally using Docker
  • Verified subprocess timeouts terminate long-running builds
  • Verified progress updater correctly marks timed-out builds
  • Confirmed both mechanisms operate independently
  • Verified timed-out builds do not generate archives
  • Confirmed timeout errors are logged and exposed via API
  • No circular imports or startup errors observed

Backward Compatibility

  • New fields (time_started_running, error_message) are accessed via getattr()
  • No schema or Redis migration required
  • Existing builds continue to function normally
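A sketch of why getattr() helps, assuming old entries are restored by a mechanism such as pickle that skips __init__, so objects saved before this change lack the new attributes. That storage detail is an assumption, not something this PR states.

```python
from typing import Any, Dict, Optional


class BuildInfo:
    def __init__(self, build_id: str, state: str) -> None:
        self.build_id = build_id
        self.state = state
        # Fields added by this PR; objects stored earlier may lack them.
        self.time_started_running: Optional[float] = None
        self.error_message: Optional[str] = None

    def to_dict(self) -> Dict[str, Any]:
        return {
            "build_id": self.build_id,
            "state": self.state,
            # getattr() with a default tolerates pre-existing entries.
            "time_started_running": getattr(self, "time_started_running", None),
            "error_message": getattr(self, "error_message", None),
        }
```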

Future Work (not included)

  • Make timeout configurable via environment variable
  • UI support for retrying timed-out builds

Supporting Images (checked for 2 minutes)

[Three screenshots, taken 2026-01-05 at 22:43:46, 22:44:02 and 22:47:47]

@lakhmanisahil (author)

Hello @shiv-tyagi,
I’ve reviewed all the suggested changes. I’ll address them and push an update shortly.

Fixes ArduPilot#201.

Signed-off-by: Sahil <lakhmanisahil8@gmail.com>
- Add CBS_BUILD_TIMEOUT_SEC environment variable (defaults to 900s/15min)
- Use env var in both builder and progress_updater for consistency
- Move state transition logic from manager to progress_updater
- Fix time_started persistence by passing build_info parameter
- Add subprocess timeout protection in builder
- Remove common/config.py in favor of environment variable
- Add example timeout value (120s) in .env.sample for testing
- Update docker-compose.yml to pass timeout env var to containers
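A minimal sketch of reading CBS_BUILD_TIMEOUT_SEC with the 900-second fallback. get_build_timeout() is a hypothetical helper, and falling back to the default on a malformed or non-positive value (rather than failing at startup) is a design assumption.

```python
import os
from typing import Mapping

DEFAULT_BUILD_TIMEOUT_SEC = 900  # 15 minutes


def get_build_timeout(environ: Mapping[str, str] = os.environ) -> int:
    """Read CBS_BUILD_TIMEOUT_SEC, falling back to 900 s if unset or invalid."""
    raw = environ.get("CBS_BUILD_TIMEOUT_SEC")
    if raw is None or raw.strip() == "":
        return DEFAULT_BUILD_TIMEOUT_SEC
    try:
        value = int(raw)  # int() ignores surrounding whitespace
    except ValueError:
        return DEFAULT_BUILD_TIMEOUT_SEC
    return value if value > 0 else DEFAULT_BUILD_TIMEOUT_SEC
```

Reading the variable through one helper keeps the builder and the progress updater on the same value, which is the consistency the commit message aims for.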
@lakhmanisahil force-pushed the feat/add-build-timeout branch from 9c6c662 to 02ee09e on January 16, 2026, 18:22
@lakhmanisahil (author) commented Jan 16, 2026

Hello @shiv-tyagi,

I’ve updated the PR as per your review:

  • Introduced CBS_BUILD_TIMEOUT_SEC (default 900s / 15 min) and removed common/config.py.
  • Used the env var consistently in progress_updater.py and builder.py.
  • time_started is now set only on PENDING → RUNNING in the progress updater and persisted correctly.
  • Timeout detection lives entirely in the progress updater; the builder only executes subprocesses.
  • Updated docker-compose.yml and added CBS_BUILD_TIMEOUT_SEC=120 to .env.sample for testing.
  • Added optional build_info to update_build_progress_state() so the progress updater can persist time_started directly, avoiding Redis refetches and ensuring correct timestamp storage.

@shiv-tyagi (member) left a comment


It is taking good shape now. Please address the comments I have posted. One more thing: in web/static/js/index.js, there is a piece of code that decides the label colour for each build state. Show the TIMED_OUT state in red, along with the FAILURE and ERROR states.

new_state (BuildState): The new state to set for the build.
"""
build_info = self.get_build_info(build_id=build_id)
if build_info is None:

Don't do this. Not in scope of this PR.

Comment on lines +290 to +297
# Set time_started when transitioning to RUNNING
if current_state != BuildState.RUNNING and new_state == BuildState.RUNNING:
    build_info.time_started = time.time()
    self.logger.info(
        f"Build {build_id} transitioned to RUNNING state at "
        f"{build_info.time_started}"
    )


Remove this from here and move it to the __refresh_running_build_state method.

)

# Check for timeout
if build_info.time_started is not None:

Set time_started here if not set.

if build_info.time_started is None:
    build_info.time_started = time.time()

# --- check timeout logic ---
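The reviewer's suggestion can be sketched as follows, with refresh_running_build_state() standing in for __refresh_running_build_state and an injectable clock for testing. Persisting time_started back to Redis is omitted, and all names here are illustrative.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class BuildInfo:
    state: str
    time_started: Optional[float] = None


def refresh_running_build_state(info: BuildInfo, timeout_sec: float,
                                clock: Callable[[], float] = time.time) -> None:
    """Hypothetical stand-in for __refresh_running_build_state."""
    if info.state != "RUNNING":
        return
    now = clock()
    if info.time_started is None:
        # First time this build is seen running: record its start time.
        info.time_started = now
    # --- check timeout logic ---
    if now - info.time_started > timeout_sec:
        info.state = "TIMED_OUT"
```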

 build_id: str,
-new_state: BuildState) -> None:
+new_state: BuildState,
+build_info: BuildInfo=None) -> None:

Please don't pass this here. Only change what this PR actually needs; your changes will still work without this one.

A tip for future open-source contributions: don't touch code that doesn't need changing. Unnecessary changes make reviewing harder.


Development: successfully merging this pull request may close the issue "Introduce a Timed-Out state for builds".

2 participants