Greptile Summary
This PR introduces a
Confidence Score: 2/5
| Filename | Overview |
|---|---|
| api/v1alpha1/sentinelrunjob_types.go | New CRD type definition for SentinelRunJob. The struct is clean but is out of sync with the generated CRD YAML and deepcopy (which include format and jobSpec fields not present in the Go type). |
| api/v1alpha1/zz_generated.deepcopy.go | Generated deepcopy functions for SentinelRunJob types. Out of sync with the Go types — references a JobSpec field that doesn't exist in the current SentinelRunJobSpec struct. |
| internal/controller/sentinelrunjob_controller.go | Core reconciler for SentinelRunJob CRs. Has issues with finalizer logic (no cleanup on delete, no early return), and the overall flow continues even during deletion. |
| internal/controller/sentinelrunjob_job.go | Job generation logic for sentinel runs. Contains a switch fallthrough bug in ensureDefaultVolumeMounts that prevents filtering duplicate volume mounts. Largely duplicated from pkg/controller/sentinel/job.go. |
| internal/controller/sentinelrunjob_secret.go | Secret reconciliation for sentinel run jobs. Contains a critical bug: hasRunSecretData uses envConsoleURL instead of envRunID to check the run ID, causing unnecessary secret updates on every reconcile. |
| pkg/controller/sentinel/reconciler.go | Modified sentinel reconciler that now creates SentinelRunJob CRs. Contains a bug where reconcileSentinelRunJobCR always tries to Create the CR even when it already exists, causing AlreadyExists errors on every re-reconcile. |
| cmd/agent/kubernetes.go | Registers the new SentinelRunJobReconciler controller with the manager. Follows the same pattern as other controller registrations in this file. |
| charts/deployment-operator/crds/deployments.plural.sh_sentinelrunjobs.yaml | Helm chart CRD for SentinelRunJob. Out of sync with Go types — includes format (required) and jobSpec fields not present in the Go struct. |
| config/crd/bases/deployments.plural.sh_sentinelrunjobs.yaml | Base CRD definition for SentinelRunJob. Same sync issue as the Helm chart CRD — generated from a different version of the type that had additional fields. |
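The summary mentions a "switch fallthrough bug" in `ensureDefaultVolumeMounts` but does not show the code. One common Go pitfall matches that description: unlike C, an empty `case` in Go does not fall through to the next case, so `case a:` followed by `case b: continue` only skips `b`. The sketch below assumes that is the bug. `VolumeMount` is a simplified stand-in for `corev1.VolumeMount`, and the mount names and paths are hypothetical.

```go
package main

// VolumeMount is a simplified, hypothetical stand-in for corev1.VolumeMount.
type VolumeMount struct {
	Name      string
	MountPath string
}

// Hypothetical names of the mounts the controller always injects.
const (
	defaultTmpVolume  = "default-tmp"
	defaultHomeVolume = "default-home"
)

// filterMountsBuggy shows the pitfall: the empty case for defaultTmpVolume does
// NOT fall through to the next case in Go, so a user-supplied "default-tmp"
// mount is kept and later duplicated.
func filterMountsBuggy(mounts []VolumeMount) []VolumeMount {
	out := make([]VolumeMount, 0, len(mounts))
	for _, m := range mounts {
		switch m.Name {
		case defaultTmpVolume:
		case defaultHomeVolume:
			continue // skips only defaultHomeVolume
		}
		out = append(out, m)
	}
	return out
}

// ensureDefaultVolumeMounts drops user-supplied mounts that collide with the
// defaults, then appends the defaults exactly once.
func ensureDefaultVolumeMounts(mounts []VolumeMount) []VolumeMount {
	defaults := []VolumeMount{
		{Name: defaultTmpVolume, MountPath: "/tmp"},
		{Name: defaultHomeVolume, MountPath: "/home/sentinel"},
	}
	out := make([]VolumeMount, 0, len(mounts)+len(defaults))
	for _, m := range mounts {
		switch m.Name {
		case defaultTmpVolume, defaultHomeVolume: // both values in one case
			continue // continue applies to the enclosing for loop
		}
		out = append(out, m)
	}
	return append(out, defaults...)
}
```

Listing both values in a single `case` is the idiomatic fix; an explicit `fallthrough` after the first case would also work but is easier to misread.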
Sequence Diagram
```mermaid
sequenceDiagram
    participant Console as Console API
    participant SR as SentinelReconciler (webhook)
    participant K8s as Kubernetes API
    participant SRJC as SentinelRunJobReconciler
    participant Job as Kubernetes Job
    Console->>SR: Poll / WebSocket event (pending run)
    SR->>K8s: Create SentinelRunJob CR
    K8s-->>SRJC: Reconcile event
    SRJC->>Console: GetSentinelRunJob(runID)
    Console-->>SRJC: SentinelRunJobFragment
    SRJC->>K8s: Create/Get Secret (env vars)
    SRJC->>K8s: Create/Get Job
    SRJC->>K8s: Set OwnerRef (Job→Secret)
    SRJC->>K8s: Set ControllerRef (CR→Job)
    SRJC->>SRJC: Check Job health status
    SRJC->>Console: UpdateSentinelRunJobStatus
    SRJC->>K8s: Patch CR status (conditions)
    Note over SRJC,Job: On Job status change (Owns), re-reconcile
    Job-->>SRJC: Job status update event
    SRJC->>Console: UpdateSentinelRunJobStatus (failed if degraded)
```
Last reviewed commit: f603c04
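The summary notes that `reconcileSentinelRunJobCR` always calls Create, so every re-reconcile of the same pending run (the "Create SentinelRunJob CR" step above) fails with AlreadyExists. A minimal sketch of one common fix, treating AlreadyExists as success, is below. It uses stdlib-only stand-ins: `creator`, `fakeStore`, and `ensureRunJobCR` are hypothetical names, and in a real controller the check would be `apierrors.IsAlreadyExists(err)` from `k8s.io/apimachinery/pkg/api/errors`.

```go
package main

import (
	"errors"
	"fmt"
)

// errAlreadyExists stands in for the API server's AlreadyExists error; a real
// controller would check apierrors.IsAlreadyExists(err) instead.
var errAlreadyExists = errors.New("already exists")

// creator is a minimal, hypothetical stand-in for the controller-runtime client.
type creator interface {
	Create(name string) error
}

// fakeStore is an in-memory creator used only to demonstrate the pattern.
type fakeStore struct {
	objects map[string]bool
	creates int // number of successful creates
}

func (s *fakeStore) Create(name string) error {
	if s.objects[name] {
		return fmt.Errorf("sentinelrunjobs %q: %w", name, errAlreadyExists)
	}
	s.objects[name] = true
	s.creates++
	return nil
}

// ensureRunJobCR creates the CR for a pending run and treats "already exists"
// as success, so re-reconciling the same run is a no-op instead of an error.
func ensureRunJobCR(c creator, runID string) error {
	if err := c.Create(runID); err != nil && !errors.Is(err, errAlreadyExists) {
		return err
	}
	return nil
}
```

A Get-then-Create flow would also work, but the Create call still has to tolerate AlreadyExists because another reconcile can create the CR between the two calls.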
1e2b71e to a9b14bd
20845e1 to 36d883a
Are you posting the right job reference back to the Plural API? It seems the UI isn't loading it correctly, at least for stacks.
```go
}

var status *console.SentinelRunJobStatus
if health != nil {
```
When the job completes, we should clean up the SentinelRunJob resource, and that should cascade to deleting the Kubernetes Job as well.
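A minimal sketch of the completion check this cleanup would need. The condition type strings `Complete` and `Failed` and the status `True` match the batch/v1 Job API, but the Go types here are simplified, hypothetical stand-ins for `batchv1.Job`. If the CR is the Job's controller owner (the "Set ControllerRef (CR→Job)" step in the diagram), deleting the CR should let Kubernetes garbage collection remove the Job, and through it the Secret the Job owns, without extra deletion code.

```go
package main

// JobConditionType and ConditionStatus mirror the string values used by
// batch/v1 Job conditions; the Go types themselves are simplified stand-ins.
type JobConditionType string
type ConditionStatus string

const (
	JobComplete   JobConditionType = "Complete"
	JobFailed     JobConditionType = "Failed"
	ConditionTrue ConditionStatus  = "True"
)

// JobCondition is a hypothetical stand-in for batchv1.JobCondition.
type JobCondition struct {
	Type   JobConditionType
	Status ConditionStatus
}

// Job is a hypothetical stand-in for batchv1.Job.
type Job struct {
	Conditions []JobCondition // stands in for job.Status.Conditions
}

// isJobFinished reports whether the Job has a terminal condition
// (Complete or Failed) with status True, and which one it was.
func isJobFinished(job Job) (bool, JobConditionType) {
	for _, c := range job.Conditions {
		if (c.Type == JobComplete || c.Type == JobFailed) && c.Status == ConditionTrue {
			return true, c.Type
		}
	}
	return false, ""
}
```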
```diff
 // Reconcile StackRun's Job ensure that Console stays in sync with Kubernetes cluster.
-func (r *StackRunJobReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
+func (r *StackRunJobReconciler) Reconcile(ctx context.Context, req ctrl.Request) (_ reconcile.Result, retErr error) {
```
Is there any logic here to ensure the stack run job is cleaned up once it completes?
(We don't want these CRs living forever on the cluster, especially since they're reconciled again on every pod restart.)
Good point, I need to add this logic.
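A minimal sketch of how that cleanup logic could be ordered in a reconcile pass, also covering the summary's note that the finalizer path has no early return during deletion. It is written as a pure decision function over hypothetical stand-in types; `runJob`, `nextAction`, the action names, and the finalizer name are all illustrative, not taken from the PR. It assumes the controller keeps a finalizer on the CR and that the final status has already been reported to Console before the CR is deleted.

```go
package main

// runJob is a simplified, hypothetical stand-in for a SentinelRunJob/StackRunJob CR.
type runJob struct {
	Deleting    bool     // stands in for !obj.GetDeletionTimestamp().IsZero()
	Finalizers  []string // stands in for obj.GetFinalizers()
	JobFinished bool     // e.g. the result of a terminal-condition check on the Job
}

type action string

const (
	actionNone            action = "none"
	actionRemoveFinalizer action = "remove-finalizer"
	actionAddFinalizer    action = "add-finalizer"
	actionDeleteCR        action = "delete-cr"
	actionSync            action = "sync"
)

// runJobFinalizer is a hypothetical finalizer name.
const runJobFinalizer = "deployments.plural.sh/run-job-cleanup"

func hasFinalizer(finalizers []string, name string) bool {
	for _, f := range finalizers {
		if f == name {
			return true
		}
	}
	return false
}

// nextAction picks the single step a reconcile pass should take.
func nextAction(rj runJob) action {
	if rj.Deleting {
		// Return early: never recreate the Secret or Job for a CR being deleted.
		if hasFinalizer(rj.Finalizers, runJobFinalizer) {
			return actionRemoveFinalizer // after any cleanup has run
		}
		return actionNone
	}
	if !hasFinalizer(rj.Finalizers, runJobFinalizer) {
		return actionAddFinalizer
	}
	if rj.JobFinished {
		// Assumes final status was already reported to Console; deleting the
		// CR lets owner references cascade the deletion to the Job.
		return actionDeleteCR
	}
	return actionSync // ensure Secret/Job and report status to Console
}
```

Keeping the decision separate from the client calls makes the ordering easy to unit-test, and deleting finished CRs avoids re-reconciling them on every pod restart.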
Introduce `SentinelRunJob` and `StackRunJob` CRDs and controllers to manage the job lifecycle via controller-runtime, improving reconciliation reliability, ownership handling, and status syncing back to Console. (From Michael: this may also fix PROD-4454; confirm with an e2e test.)
Test Plan
Test environment: https://console.your-env.onplural.sh/
Checklist