Conversation

@dejanb
Contributor

@dejanb dejanb commented Dec 17, 2025

Propose new CLI for manually pruning SBOMs

Summary by Sourcery

Documentation:

  • Add ADR describing a proposed trustify CLI with a prune subcommand for bulk SBOM deletion, including requirements, design, and future phases for automated lifecycle management.

@dejanb dejanb requested a review from ctron December 17, 2025 13:36
@sourcery-ai
Contributor

sourcery-ai bot commented Dec 17, 2025

Reviewer's Guide

Adds a new ADR describing a proposed trustify CLI binary focused on SBOM pruning, including architecture, CLI interface, core algorithm, implementation steps, and future phases for automated pruning and batch deletion.

Sequence diagram for trustify prune SBOM pruning workflow

sequenceDiagram
  actor Admin
  participant trustify_cli
  participant auth_module
  participant http_client
  participant trustify_server
  participant oidc_provider

  Admin ->> trustify_cli: run prune with endpoint, filter, options
  trustify_cli ->> auth_module: load_or_refresh_token
  auth_module ->> oidc_provider: validate_or_refresh_token
  oidc_provider -->> auth_module: token_or_error
  auth_module -->> trustify_cli: token

  loop for_each_page_of_sboms
    trustify_cli ->> http_client: get_sboms(filter, batch_size, offset)
    http_client ->> trustify_server: GET /v2/sbom?q=filter&limit=batch_size&offset=offset
    trustify_server -->> http_client: list_of_sboms
    http_client -->> trustify_cli: list_of_sboms

    alt dry_run_mode
      trustify_cli ->> Admin: display_sboms_to_be_deleted
      break if_no_more_pages
        trustify_cli -->> Admin: dry_run_summary
      end
    else delete_mode
      alt total_sboms_gt_threshold_and_not_yes_flag
        trustify_cli ->> Admin: prompt_confirmation
        Admin -->> trustify_cli: confirm_or_cancel
        opt cancel
          trustify_cli -->> Admin: abort_operation
          trustify_cli ->> trustify_cli: return_from_prune
        end
      end

      par concurrent_deletions_up_to_max_concurrent
        loop for_each_sbom
          trustify_cli ->> http_client: delete_sbom(id)
          http_client ->> trustify_server: DELETE /v2/sbom/id
          alt transient_error
            trustify_server -->> http_client: 500_or_502_or_503_or_504
            http_client ->> trustify_server: retry_with_backoff
            trustify_server -->> http_client: success_or_failure
          else success_or_non_transient_failure
            trustify_server -->> http_client: delete_result
          end
          http_client -->> trustify_cli: delete_result
          trustify_cli ->> trustify_cli: update_progress_and_log
        end
      end
    end
  end

  trustify_cli -->> Admin: final_summary
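
The workflow in the diagram can be sketched in plain Rust as follows. This is a minimal illustration under stated assumptions, not the ADR's implementation: the `SbomApi` trait, `Outcome`, `InMemoryApi`, the retry count of 3, and the 1 ms base delay are all hypothetical, deletions run sequentially rather than concurrently, and the confirmation prompt is omitted. One deliberate difference from the diagram: the sketch collects every matching ID before deleting anything, because deleting while advancing `offset` would shift the remaining pages.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical abstraction over the two REST calls shown in the diagram.
pub trait SbomApi {
    /// Stands in for GET /v2/sbom?q=filter&limit=batch_size&offset=offset.
    fn list(&self, filter: &str, limit: usize, offset: usize) -> Vec<String>;
    /// Stands in for DELETE /v2/sbom/id; Err carries the HTTP status code.
    fn delete(&mut self, id: &str) -> Result<(), u16>;
}

#[derive(Debug, PartialEq)]
pub struct Outcome {
    pub found: usize,
    pub deleted: usize,
    pub failed: usize,
}

/// The statuses the diagram treats as transient.
pub fn is_transient(status: u16) -> bool {
    matches!(status, 500 | 502 | 503 | 504)
}

/// Page through the listing until an empty page comes back.
pub fn collect_ids<A: SbomApi>(api: &A, filter: &str, batch_size: usize) -> Vec<String> {
    let mut ids = Vec::new();
    loop {
        let page = api.list(filter, batch_size, ids.len());
        if page.is_empty() {
            return ids;
        }
        ids.extend(page);
    }
}

/// Retry transient failures with exponential backoff (base, 2*base, 4*base, ...).
pub fn delete_with_retry<A: SbomApi>(
    api: &mut A,
    id: &str,
    max_attempts: u32,
    base: Duration,
) -> Result<(), u16> {
    let mut attempt: u32 = 0;
    loop {
        match api.delete(id) {
            Ok(()) => return Ok(()),
            Err(status) if is_transient(status) && attempt + 1 < max_attempts => {
                sleep(base * 2u32.pow(attempt.min(16)));
                attempt += 1;
            }
            Err(status) => return Err(status),
        }
    }
}

/// Dry run reports what was found; otherwise every collected ID is deleted.
pub fn prune<A: SbomApi>(api: &mut A, filter: &str, batch_size: usize, dry_run: bool) -> Outcome {
    let ids = collect_ids(&*api, filter, batch_size);
    let mut out = Outcome { found: ids.len(), deleted: 0, failed: 0 };
    if dry_run {
        return out;
    }
    for id in &ids {
        match delete_with_retry(&mut *api, id, 3, Duration::from_millis(1)) {
            Ok(()) => out.deleted += 1,
            Err(_) => out.failed += 1,
        }
    }
    out
}

/// In-memory stand-in for the server, for illustration only.
pub struct InMemoryApi {
    pub ids: Vec<String>,
    /// Number of upcoming DELETE calls that fail with 503.
    pub failures_left: u32,
}

impl SbomApi for InMemoryApi {
    fn list(&self, _filter: &str, limit: usize, offset: usize) -> Vec<String> {
        self.ids.iter().skip(offset).take(limit).cloned().collect()
    }
    fn delete(&mut self, id: &str) -> Result<(), u16> {
        if self.failures_left > 0 {
            self.failures_left -= 1;
            return Err(503);
        }
        match self.ids.iter().position(|x| x.as_str() == id) {
            Some(i) => {
                self.ids.remove(i);
                Ok(())
            }
            None => Err(404),
        }
    }
}
```

Calling `prune` with `dry_run` set to `true` leaves the store untouched and only reports the number of matches; a real client would replace `InMemoryApi` with HTTP calls and run the deletions with bounded concurrency.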

Class diagram for trustify prune CLI components

classDiagram
  class Main {
    +main()
  }

  class PruneArgs {
    +String endpoint
    +Option~String~ token
    +String filter
    +bool dry_run
    +bool yes
    +Option~PathBuf~ log_file
    +usize batch_size
    +usize max_concurrent
  }

  class PruneCommand {
    +run(args PruneArgs)
    +execute_dry_run(args PruneArgs, client TrustifyClient)
    +execute_delete(args PruneArgs, client TrustifyClient)
    +prompt_confirmation(total usize) bool
    +log_result(log_file Option~PathBuf~, result PruneResult)
  }

  class TrustifyClient {
    +String endpoint
    +String token
    +get_sboms(filter String, batch_size usize, offset usize) SbomPage
    +delete_sbom(id String) DeleteResult
  }

  class AuthManager {
    +load_token(env_token Option~String~) Option~String~
    +refresh_token(refresh_token String) String
  }

  class ProgressTracker {
    +usize total_found
    +usize deleted
    +usize failed
    +usize in_progress
    +start()
    +increment_deleted()
    +increment_failed()
    +finish()
  }

  class PruneResult {
    +usize total_found
    +usize deleted
    +usize failed
  }

  Main --> PruneArgs : parses
  Main --> PruneCommand : invokes
  PruneCommand --> TrustifyClient : uses
  PruneCommand --> AuthManager : obtains_token
  PruneCommand --> ProgressTracker : tracks_progress
  PruneCommand --> PruneResult : produces
  TrustifyClient --> PruneResult : contributes_to_result
  ProgressTracker --> PruneResult : aggregates_counts
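
A few of the types in the class diagram could look roughly like this in Rust. Field names follow the diagram; the method bodies are an assumption, since the diagram gives only names. In particular, `start` taking the total as an argument and the `begin` helper are hypothetical additions needed to make `in_progress` meaningful.

```rust
use std::path::PathBuf;

/// Arguments of the prune subcommand, mirroring the diagram's PruneArgs.
#[derive(Debug, Clone)]
pub struct PruneArgs {
    pub endpoint: String,
    pub token: Option<String>,
    pub filter: String,
    pub dry_run: bool,
    pub yes: bool,
    pub log_file: Option<PathBuf>,
    pub batch_size: usize,
    pub max_concurrent: usize,
}

/// Final counts reported to the user.
#[derive(Debug, PartialEq, Eq)]
pub struct PruneResult {
    pub total_found: usize,
    pub deleted: usize,
    pub failed: usize,
}

/// Running counters updated while deletions are in flight.
#[derive(Debug, Default)]
pub struct ProgressTracker {
    pub total_found: usize,
    pub deleted: usize,
    pub failed: usize,
    pub in_progress: usize,
}

impl ProgressTracker {
    /// Assumption: the total is known up front and passed in here.
    pub fn start(total_found: usize) -> Self {
        Self { total_found, ..Self::default() }
    }

    /// Hypothetical helper (not in the diagram): mark one deletion as in flight.
    pub fn begin(&mut self) {
        self.in_progress += 1;
    }

    pub fn increment_deleted(&mut self) {
        self.in_progress = self.in_progress.saturating_sub(1);
        self.deleted += 1;
    }

    pub fn increment_failed(&mut self) {
        self.in_progress = self.in_progress.saturating_sub(1);
        self.failed += 1;
    }

    /// Consume the tracker and produce the summary counts.
    pub fn finish(self) -> PruneResult {
        PruneResult {
            total_found: self.total_found,
            deleted: self.deleted,
            failed: self.failed,
        }
    }
}
```

Having `finish` consume the tracker is one way to express the diagram's "aggregates_counts" arrow: once the result exists, the counters can no longer change.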

File-Level Changes

Change: Document the design and behavior of a new trustify SBOM pruning CLI tool as an ADR.
Details:
  • Describe context, problem statement, and requirements for bulk SBOM pruning in Trustify deployments.
  • Define the new standalone trustify CLI binary, its prune subcommand, and how it communicates with the Trustify REST API using OIDC authentication.
  • Specify CLI arguments, dry-run behavior, confirmation safeguards, batch and concurrency controls, and logging support for pruning operations.
  • Outline the core pruning workflow including SBOM querying, deletion strategy with retries, progress tracking, and error handling.
  • Provide implementation details such as suggested Rust module layout, testing strategy, documentation expectations, and phased future enhancements like background services and batch delete APIs.
Files: docs/adrs/00012-sbom-pruning-cli.md

Contributor

@sourcery-ai sourcery-ai bot left a comment

Hey there - I've reviewed your changes - here's some feedback:

  • The ADR header number (# 00010. SBOM Pruning CLI Tool) does not match the filename (00012-sbom-pruning-cli.md); align these to avoid confusion in ADR indexing.
  • Before committing to a new trustify binary and top-level trustify/ crate, consider how this fits with existing binaries and workspace layout (e.g., reusing or extending an existing CLI crate) to avoid fragmentation of client tooling.
@@ -0,0 +1,243 @@
# 00010. SBOM Pruning CLI Tool
Contributor

issue (typo): ADR number in the title does not match the filename and may be a typo.

The file is named 00012-sbom-pruning-cli.md, but the heading uses 00010. Please align the number in the heading with the filename so the ADR can be referenced consistently.

Suggested change
- # 00010. SBOM Pruning CLI Tool
+ # 00012. SBOM Pruning CLI Tool

Comment on lines +181 to +184
1. **Project Setup** (`trustify/` directory)
- Create binary structure
- Add to workspace Cargo.toml
- Setup clap CLI framework
Contributor

nitpick (typo): Use “Set up” (verb) instead of “Setup” (noun) in this action item.

Here it’s functioning as a verb phrase (an action to perform), so it should read “Set up clap CLI framework,” not the noun form “Setup.”

Suggested change
  1. **Project Setup** (`trustify/` directory)
     - Create binary structure
     - Add to workspace Cargo.toml
-    - Setup clap CLI framework
+    - Set up clap CLI framework

@codecov

codecov bot commented Dec 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 68.23%. Comparing base (648d488) to head (f2917bb).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2186      +/-   ##
==========================================
- Coverage   68.24%   68.23%   -0.01%     
==========================================
  Files         376      376              
  Lines       21208    21208              
  Branches    21208    21208              
==========================================
- Hits        14473    14472       -1     
+ Misses       5868     5864       -4     
- Partials      867      872       +5     


### Core Algorithm

1. Authenticate with Trustify API using OIDC token
Contributor

@ruromero ruromero Dec 17, 2025

Having users retrieve and provide OIDC tokens will be frustrating given their short lifespan (10 mins), and it is not practical for unattended tasks (for example, Jobs).

For unattended tasks like Jobs or CI, I guess we can either provide an offline_token (retrieved from the trustify UI, but that requires a specific OIDC client with offline_access) or a client_id + client_secret that will interact directly with the IdP to retrieve a valid token.
For standard CLI usage, as with oc or podman, users can use the Device Auth Flow (the token will be exposed by the UI) or even involve a browser and do the Authorization Code Flow (I don't think that's worth it).

Contributor

I'd recommend having a full-blown OIDC client in there, using the client ID/secret in the process. That allows creating tokens on demand without running into any issues.

Contributor

It depends on the purpose of the client and who is going to use it. Managing clients in the IdP and in Trustify is not trivial and requires configuration and restarts; that's why an offline_token can be useful in such cases.
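
Whichever credential type is chosen in this thread, "tokens on demand" usually comes down to caching the access token and fetching a new one shortly before it expires. The sketch below is a hypothetical illustration of that idea only: `TokenSource`, the `fetch` closure (standing in for a client-credentials or offline-token exchange with the IdP), and the `skew` margin are assumptions, and no real OIDC calls are made.

```rust
use std::time::{Duration, Instant};

/// Hypothetical token cache: `fetch` stands in for whatever exchange with the
/// IdP is chosen and returns (access_token, lifetime).
pub struct TokenSource<F: FnMut() -> (String, Duration)> {
    fetch: F,
    /// Refresh this long before the token actually expires.
    skew: Duration,
    cached: Option<(String, Instant)>,
}

impl<F: FnMut() -> (String, Duration)> TokenSource<F> {
    pub fn new(fetch: F, skew: Duration) -> Self {
        Self { fetch, skew, cached: None }
    }

    /// Return a token that is still valid at `now`, fetching a new one if the
    /// cached token is missing or about to expire.
    pub fn token(&mut self, now: Instant) -> String {
        if let Some((value, expires_at)) = &self.cached {
            if now + self.skew < *expires_at {
                return value.clone();
            }
        }
        let (value, ttl) = (self.fetch)();
        self.cached = Some((value.clone(), now + ttl));
        value
    }
}
```

With a 10-minute token lifetime and a 30-second skew, a long-running prune would call `token` before each request and transparently pick up a fresh token roughly every nine and a half minutes.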


### Requirements

**Immediate need**: Manual tool for bulk SBOM deletion to reduce storage costs

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think a CLI tool is the immediate need.
I think what is required are some control settings that allow an admin user to define the retention period for each document type, and we could start with SBOMs. Then an automated process would run at a predefined frequency, look for documents that are outside the retention period, and trigger the SBOM delete API for each one.
It would be nice if the retention period control(s) were exposed (only to admin users) via the UI.
But it could just be a deployment config setting to start with.
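
The retention check described in this comment is small enough to sketch. This is an illustration under assumptions, not an existing Trustify API: `SbomRecord` and `expired_ids` are hypothetical names, and the retention period is passed in as a plain `Duration` as it might be read from a deployment setting.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical minimal view of a stored SBOM.
pub struct SbomRecord {
    pub id: String,
    pub ingested: SystemTime,
}

/// Return the IDs of records whose age at `now` exceeds the retention period.
pub fn expired_ids(records: &[SbomRecord], retention: Duration, now: SystemTime) -> Vec<&str> {
    records
        .iter()
        .filter(|r| match now.duration_since(r.ingested) {
            Ok(age) => age > retention,
            // An ingestion time in the future is never treated as expired.
            Err(_) => false,
        })
        .map(|r| r.id.as_str())
        .collect()
}
```

A periodic job would call `expired_ids` with the configured retention and pass each returned ID to the SBOM delete API.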

2. Query SBOMs using filter: `GET /v2/sbom?q={filter}&limit={batch_size}&offset={offset}`
3. For dry-run: Display list of SBOMs that would be deleted, exit
4. If not dry-run and count > 100: Prompt for confirmation (unless `--yes`)
5. Delete SBOMs concurrently (up to max_concurrent): `DELETE /v2/sbom/{id}`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would have to test the concurrency in the context of ensuring orphaned packages get deleted.
For example, if the same package is referenced by only 2 SBOMs and both SBOMs are deleted concurrently, would each delete transaction still see the package as referenced by the other SBOM, and therefore treat it as not orphaned and not eligible for deletion?

Contributor

#2191 (comment)
This issue describes the analysis of this problem.
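
The interleaving raised in this thread can be reproduced with a small deterministic model. This is a toy model, not Trustify's actual deletion logic: it assumes each delete transaction decides which packages are orphaned from a snapshot taken when it starts, and whether the real system behaves this way depends on the database isolation level and on how orphan cleanup is implemented. All names (`Store`, `Plan`, `plan_delete`, `commit`, `orphans`) are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Toy model: package name -> set of SBOM ids that reference it.
#[derive(Clone, Default)]
pub struct Store {
    pub refs: BTreeMap<String, BTreeSet<String>>,
}

/// What one delete transaction decides, based only on the snapshot it read.
pub struct Plan {
    sbom: String,
    packages_to_delete: Vec<String>,
}

/// Read phase: a package is considered orphaned only if this SBOM is its sole referrer.
pub fn plan_delete(snapshot: &Store, sbom: &str) -> Plan {
    let packages_to_delete = snapshot
        .refs
        .iter()
        .filter(|(_, sboms)| sboms.len() == 1 && sboms.contains(sbom))
        .map(|(pkg, _)| pkg.clone())
        .collect();
    Plan { sbom: sbom.to_string(), packages_to_delete }
}

/// Write phase: drop this SBOM's references, then the packages the plan marked.
pub fn commit(store: &mut Store, plan: &Plan) {
    for sboms in store.refs.values_mut() {
        sboms.remove(&plan.sbom);
    }
    for pkg in &plan.packages_to_delete {
        store.refs.remove(pkg);
    }
}

/// Packages left behind with no referencing SBOM.
pub fn orphans(store: &Store) -> Vec<String> {
    store
        .refs
        .iter()
        .filter(|(_, sboms)| sboms.is_empty())
        .map(|(pkg, _)| pkg.clone())
        .collect()
}
```

In this model, running `plan_delete` and `commit` for one SBOM and then for the other removes the shared package, while planning both deletions from the same snapshot and then committing both leaves the package behind with no referrers.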

@ctron
Contributor

ctron commented Dec 19, 2025

I like the approach. What might get tricky is the amount of connection information required (endpoint, issuer URL, client ID and secret). I think it would make sense to store this in a configuration file, with profiles (for multiple instances). Then use --profile or TRUSTIFY_PROFILE (env-var), with the option to have a default one.

This file could be created manually at first. But later on, we could have trustify login as a command, performing those steps and managing the creation of this file.
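
Profile selection as described here could be resolved along these lines. This is a sketch under assumptions: the `Profile` and `Config` shapes and the precedence order (the `--profile` flag first, then `TRUSTIFY_PROFILE`, then the configured default) are hypothetical, and reading and parsing the configuration file is left out.

```rust
use std::collections::HashMap;

/// Hypothetical connection settings for one Trustify instance.
#[derive(Debug, Clone, PartialEq)]
pub struct Profile {
    pub endpoint: String,
    pub issuer_url: String,
    pub client_id: String,
    pub client_secret: String,
}

/// Hypothetical parsed configuration file.
pub struct Config {
    pub default_profile: Option<String>,
    pub profiles: HashMap<String, Profile>,
}

/// Pick a profile: CLI flag wins, then the environment value, then the default.
pub fn resolve_profile<'a>(
    config: &'a Config,
    cli_flag: Option<&str>,
    env_value: Option<&str>,
) -> Result<&'a Profile, String> {
    let name = cli_flag
        .or(env_value)
        .or(config.default_profile.as_deref())
        .ok_or_else(|| "no profile selected and no default configured".to_string())?;
    config
        .profiles
        .get(name)
        .ok_or_else(|| format!("unknown profile: {}", name))
}
```

In a real binary, `env_value` would come from `std::env::var("TRUSTIFY_PROFILE").ok()` and `cli_flag` from the argument parser.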

@bxf12315
Contributor

Regarding this ADR, I have three questions.
First, the deletion here should be a physical deletion rather than a soft deletion, right?
Second, is it sufficient to record deletions only through logs, without using a history table to capture who deleted the historical data, why the data was deleted, and related information, in order to form a complete, auditable deletion trail?
Third, do we need to consider re-ingesting the deleted records?

@dejanb
Contributor Author

dejanb commented Dec 19, 2025

Thinking a bit more about it, maybe a better approach would be to create an 'admin' API instead of the CLI. The functionality would be completely the same, we would just avoid thinking about all these auth topics and publishing/distributing the binary. Also, users wouldn't need to install anything and could just use the feature using curl/httpie.

So, the example usage could be something like

http POST https://trustify.example.com/v1/admin/sbom/prune filter=="ingested<90 days ago&label:env=staging" batch-size==500 dry-run==true

@ruromero @PhilipCattanach @ctron wdyt?
@PhilipCattanach An automated process is the end goal, but it would take much more time to develop. Creating a manual CLI/endpoint is a step we need to take anyway (it contains the actual pruning logic), and it will provide users with something useful much sooner. Additionally, a tool for manual management is useful on its own for cleaning up after one-off incidents and for testing automated configurations.
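
In HTTPie, `name==value` items become URL query parameters, so the example request above amounts to a POST with three query parameters. The sketch below shows how a client might build the same URL by hand. It assumes the endpoint path and parameter names exactly as written in the example, which are only a proposal in this comment; `percent_encode` and `prune_url` are hypothetical helpers.

```rust
/// Percent-encode everything except RFC 3986 unreserved characters.
pub fn percent_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for byte in input.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Build the URL for the proposed prune endpoint (path and parameter names
/// are taken from the example in the comment above, not from an existing API).
pub fn prune_url(base: &str, filter: &str, batch_size: usize, dry_run: bool) -> String {
    format!(
        "{}/v1/admin/sbom/prune?filter={}&batch-size={}&dry-run={}",
        base.trim_end_matches('/'),
        percent_encode(filter),
        batch_size,
        dry_run
    )
}
```

Encoding matters here because the example filter itself contains `&`, `=`, `<`, and spaces, which would otherwise be read as query-string syntax.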

@ruromero
Contributor

> Thinking a bit more about it, maybe a better approach would be to create an 'admin' API instead of the CLI. The functionality would be completely the same, we would just avoid thinking about all these auth topics and publishing/distributing the binary. Also, users wouldn't need to install anything and could just use the feature using curl/httpie.
>
> So, the example usage could be something like
>
> http POST https://trustify.example.com/v1/admin/sbom/prune filter=="ingested<90 days ago&label:env=staging" batch-size==500 dry-run==true
>
> @ruromero @PhilipCattanach @ctron wdyt? @PhilipCattanach Automated process is the end goal, but it would take much more time to develop. Creating manual cli/endpoint is the step we need to do anyway (it contains the actual pruning logic) and it will provide users with something useful much sooner. Additionally, having a tool for manual management is useful on its own for cleaning up one-off incidents and testing automated configurations.

I think this is a required step whether we eventually implement the CLI or not. The CLI must interact with the REST API in any case, so we can first clarify how the admin API should be defined, implement it, and then decide whether we need the CLI or not.
If the only purpose is to call this admin endpoint for purge, a CLI might not be fully justified.

6 participants