From c69739523afe1d7762afb6c9f6096d7eb0567d73 Mon Sep 17 00:00:00 2001
From: Alejandro Acevedo
Date: Fri, 11 Apr 2025 13:49:52 +0200
Subject: [PATCH 1/8] STAC-22541: Derived state monitors

---
 SUMMARY.md                                 |  1 +
 use/alerting/k8s-derived-state-monitors.md | 39 ++++++++++++++++++++++
 use/alerting/kubernetes-monitors.md        | 22 +++++-------
 3 files changed, 49 insertions(+), 13 deletions(-)
 create mode 100644 use/alerting/k8s-derived-state-monitors.md

diff --git a/SUMMARY.md b/SUMMARY.md
index 32de888c6..0d3a1ea80 100644
--- a/SUMMARY.md
+++ b/SUMMARY.md
@@ -32,6 +32,7 @@
  * [Troubleshooting](use/alerting/notifications/troubleshooting.md)
  * [Customize](dynamic/customize-alerting.md)
  * [Add a monitor using the CLI](use/alerting/k8s-add-monitors-cli.md)
+ * [Derived State monitor](use/alerting/k8s-derived-state-monitors.md)
  * [Override monitor arguments](use/alerting/k8s-override-monitor-arguments.md)
  * [Write a remediation guide](use/alerting/k8s-write-remediation-guide.md)
 
diff --git a/use/alerting/k8s-derived-state-monitors.md b/use/alerting/k8s-derived-state-monitors.md
new file mode 100644
index 000000000..7ccb01ca8
--- /dev/null
+++ b/use/alerting/k8s-derived-state-monitors.md
@@ -0,0 +1,39 @@
+---
+description: SUSE Observability
+---
+
+# Derived State Monitors
+
+## Overview
+
+In Observability scenarios where logical (business) components lack direct monitors but are affected by issues in their technical dependencies, you can use the derived-state-monitor to propagate health states to them.
+This monitor traverses component dependencies and selects the most critical health state based on direct observations (e.g., from metrics), ignoring any already-derived states. It starts from a group of components defined by `componentTypes` and propagates health upwards to the top-level logical components.
+During traversal, only components with observed (non-derived) health states are considered for health propagation. Components with derived states are skipped in evaluation but still traversed to reach deeper dependencies—for example, logical components depending on other logical components.
+
+## Derived Health State Monitor example
+
+A Monitor implemented using the `derived-state-monitor` function looks like:
+
+```
+  - _type: "Monitor"
+    name: "Aggregated health state of a Deployment, StatefulSet, ReplicaSet and DaemonSet"
+    tags:
+      - deployments
+      - replicasets
+      - statefulsets
+      - daemonsets
+      - derived
+      - propagated
+    identifier: "urn:custom:monitor:..."
+    status: "DISABLED"
+    description: "Description"
+    function: {{ get "urn:stackpack:common:monitor-function:derived-state-monitor" }}
+    arguments:
+      componentTypes: "deployment, replicaset, statefulset, daemonset"
+    intervalSeconds: 30
+    remediationHint: "Investigate component [{{ causeName }}](/#/components/{{ causeComponentUrnForUrl }}) as is causing the workload to be unhealthy."
+```
+* The function has a single argument `componentTypes` where you can express the different component types as a single string of `,` separated values
+* The function offers two values to use in the remediation guide, `causeName` being the component name where the state is propagated from and its `causeComponentUrnForUrl` to be able to create a link
+
+The monitor can be implement using the guide at [Add a threshold monitor to components using the CLI](/use/alerting/k8s-add-monitors-cli.md)
\ No newline at end of file
diff --git a/use/alerting/kubernetes-monitors.md b/use/alerting/kubernetes-monitors.md
index 038d1a75b..f45845804 100644
--- a/use/alerting/kubernetes-monitors.md
+++ b/use/alerting/kubernetes-monitors.md
@@ -144,22 +144,18 @@ Cluster doesn't have any health itself. But a cluster is build from few componen
 - all nodes
 and then takes the most critical health state.
 
-### Aggregated health state of a DaemonSet
+### Derived Workloads health state (Deployment, DaemonSet, ReplicaSet, StatefulSet)
 
-The monitor aggregates states of all children Pods and then returns the most critical health state.
+The monitor aggregates states of all top-most dependencies and then returns the most critical health state based on direct observations (e.g., from metrics).
+This approach ensures that health signals propagate from low-level technical components (like Pods) to higher-level logical components, but only when the component itself lacks an observed health state.
+To use this monitor effectively, make sure that some or all of following health checks are disabled:
+* Deployment desired replicas
+* DaemonSet desired replicas
+* ReplicaSet desired replicas
+* StatefulSet desired replicas
 
-### Aggregated health state of a Deployment
+If you have a use case where logical components have no direct monitors then you can use the [Derived State Monitor](/use/alerting/k8s-derived-state-monitors.md) function to infer their health based on the technical components they depend on.
 
-The monitor aggregates states of all children ReplicaSets and then returns the most critical health state. ReplicaSets have
-the similar Monitor, so eventually this one aggregates health states of all children ReplicaSets and Pods.
-
-### Aggregated health state of a ReplicaSet
-
-The monitor aggregates states of all children Pods and then returns the most critical health state.
-
-### Aggregated health state of a StatefulSet
-
-The monitor aggregates states of all children Pods and then returns the most critical health state.
 
 ## See also
 

From a1d7c5778baf83109d256ee60cc19d5c337aa388 Mon Sep 17 00:00:00 2001
From: Alejandro Acevedo
Date: Tue, 15 Apr 2025 08:27:42 +0200
Subject: [PATCH 2/8] STAC-22541: Address review comments

---
 use/alerting/k8s-derived-state-monitors.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/use/alerting/k8s-derived-state-monitors.md b/use/alerting/k8s-derived-state-monitors.md
index 7ccb01ca8..fd708f7c4 100644
--- a/use/alerting/k8s-derived-state-monitors.md
+++ b/use/alerting/k8s-derived-state-monitors.md
@@ -6,9 +6,9 @@ description: SUSE Observability
 
 ## Overview
 
-In Observability scenarios where logical (business) components lack direct monitors but are affected by issues in their technical dependencies, you can use the derived-state-monitor to propagate health states to them.
-This monitor traverses component dependencies and selects the most critical health state based on direct observations (e.g., from metrics), ignoring any already-derived states. It starts from a group of components defined by `componentTypes` and propagates health upwards to the top-level logical components.
-During traversal, only components with observed (non-derived) health states are considered for health propagation. Components with derived states are skipped in evaluation but still traversed to reach deeper dependencies—for example, logical components depending on other logical components.
+In Observability scenarios where logical (business) components lack direct monitors but are affected by issues in their technical dependencies, you can use the derived-state-monitor function to derive a state from the connected technical components for the logical component.
+This monitor traverses component dependencies and selects the most critical health state based on direct observations (e.g., from metrics), ignoring any already-derived states. It will apply the derived state to all components selected through the `componentTypes` parameter.
+During traversal, only components with observed (non-derived) health states are considered for health derivation. Components with derived states are skipped in evaluation but still traversed to reach deeper dependencies—for example, logical components depending on other logical components.
 
 ## Derived Health State Monitor example
 
@@ -34,6 +34,6 @@ A Monitor implemented using the `derived-state-monitor` function looks like:
     remediationHint: "Investigate component [{{ causeName }}](/#/components/{{ causeComponentUrnForUrl }}) as is causing the workload to be unhealthy."
 ```
 * The function has a single argument `componentTypes` where you can express the different component types as a single string of `,` separated values
-* The function offers two values to use in the remediation guide, `causeName` being the component name where the state is propagated from and its `causeComponentUrnForUrl` to be able to create a link
+* The function offers two values to use in the remediation guide, `causeComponentName` being the component name where the state is propagated from and its `causeComponentUrnForUrl` to be able to create a link
 
-The monitor can be implement using the guide at [Add a threshold monitor to components using the CLI](/use/alerting/k8s-add-monitors-cli.md)
\ No newline at end of file
+The monitor can be implemented using the guide at [Add a threshold monitor to components using the CLI](/use/alerting/k8s-add-monitors-cli.md)
\ No newline at end of file

From 5592703ac56e9d15cf4b82e3e3fe0907deeed68e Mon Sep 17 00:00:00 2001
From: Daniel Barra
Date: Tue, 15 Apr 2025 14:02:52 -0300
Subject: [PATCH 3/8] STAC-22628: Add release notes 2.3.2 release

---
 SUMMARY.md                    |  1 +
 setup/release-notes/v2.3.2.md | 23 +++++++++++++++++++++++
 2 files changed, 24 insertions(+)
 create mode 100644 setup/release-notes/v2.3.2.md

diff --git a/SUMMARY.md b/SUMMARY.md
index c48d37d6a..5017660a6 100644
--- a/SUMMARY.md
+++ b/SUMMARY.md
@@ -134,6 +134,7 @@
  * [v2.2.1 - 10/Dec/2024](setup/release-notes/v2.2.1.md)
  * [v2.3.0 - 30/Jan/2025](setup/release-notes/v2.3.0.md)
  * [v2.3.1 - 17/Mar/2025](setup/release-notes/v2.3.1.md)
+ * [v2.3.2 - 14/Apr/2025](setup/release-notes/v2.3.2.md)
  * [Upgrade SUSE Observability](setup/upgrade-stackstate/README.md)
  * [Migration from StackState](setup/upgrade-stackstate/migrate-from-6.md)
  * [Steps to upgrade](setup/upgrade-stackstate/steps-to-upgrade.md)
diff --git a/setup/release-notes/v2.3.2.md b/setup/release-notes/v2.3.2.md
new file mode 100644
index 000000000..438c47cb6
--- /dev/null
+++ b/setup/release-notes/v2.3.2.md
@@ -0,0 +1,23 @@
+---
+description: SUSE Observability Self-hosted
+---
+
+# v2.3.2 - 14/April/2025
+
+## Release Notes: SUSE Observability Helm Chart v2.3.2
+
+### New Features & Enhancements
+
+* **Analytics Deprecation:** The Analytics feature is now deprecated and disabled by default for all users. To re-enable it, grant the `access-analytics` and `execute-scripts` permissions to the relevant users or roles.
+* **Restricted Scope Metrics:** Users with a restricted scope will now only have visibility into metrics collected after the platform upgrade. Historical metrics prior to the upgrade will not be accessible.
+* **Log Noise Reduction:** Implemented a fix to suppress `x-forwarded-for` errors in logs when an IP:Port combination is used in the forwarding configuration.
+
+### Bug Fixes
+
+* **Traces in HA Profile:** Resolved an issue where the Traces functionality was partially disabled in the `4000-ha` profile.
+* **Broken Link Fixes:** Fixed various broken links identified throughout the product user interface and documentation.
+
+## Agent Bug Fixes
+
+* **Static Pod Log Scraping:** The agent has been enhanced to now scrape logs for static pods. This is achieved by utilizing the `kubernetes.io/config.mirror` annotation for system pods.
+* **Secret Environment Variables:** Fixed an issue to ensure proper support for the `global.extraEnv.secret` configuration, allowing the addition of secret environment variables to the agent pods.
\ No newline at end of file

From f73f98c87757f1e48a1a0167c0b708e36d7f72c3 Mon Sep 17 00:00:00 2001
From: Daniel Barra
Date: Wed, 16 Apr 2025 07:33:19 -0300
Subject: [PATCH 4/8] Remove unecessary information on release note

---
 setup/release-notes/v2.3.2.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/setup/release-notes/v2.3.2.md b/setup/release-notes/v2.3.2.md
index 438c47cb6..f12816144 100644
--- a/setup/release-notes/v2.3.2.md
+++ b/setup/release-notes/v2.3.2.md
@@ -9,7 +9,6 @@ description: SUSE Observability Self-hosted
 ### New Features & Enhancements
 
 * **Analytics Deprecation:** The Analytics feature is now deprecated and disabled by default for all users. To re-enable it, grant the `access-analytics` and `execute-scripts` permissions to the relevant users or roles.
-* **Restricted Scope Metrics:** Users with a restricted scope will now only have visibility into metrics collected after the platform upgrade. Historical metrics prior to the upgrade will not be accessible.
 * **Log Noise Reduction:** Implemented a fix to suppress `x-forwarded-for` errors in logs when an IP:Port combination is used in the forwarding configuration.
 
 ### Bug Fixes

From 2f62140668a02d93454a5b7306b7523a0b76d04e Mon Sep 17 00:00:00 2001
From: Alejandro Acevedo
Date: Thu, 17 Apr 2025 09:39:08 +0200
Subject: [PATCH 5/8] STAC-22541: Update binding name and add extra one

---
 use/alerting/k8s-derived-state-monitors.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/use/alerting/k8s-derived-state-monitors.md b/use/alerting/k8s-derived-state-monitors.md
index fd708f7c4..aa31210bb 100644
--- a/use/alerting/k8s-derived-state-monitors.md
+++ b/use/alerting/k8s-derived-state-monitors.md
@@ -31,9 +31,11 @@ A Monitor implemented using the `derived-state-monitor` function looks like:
     arguments:
       componentTypes: "deployment, replicaset, statefulset, daemonset"
     intervalSeconds: 30
-    remediationHint: "Investigate component [{{ causeName }}](/#/components/{{ causeComponentUrnForUrl }}) as is causing the workload to be unhealthy."
+    remediationHint: "Investigate component [{{ causeComponentName }}](/#/components/{{ causeComponentUrnForUrl }}) as is causing the workload to be unhealthy."
 ```
 * The function has a single argument `componentTypes` where you can express the different component types as a single string of `,` separated values
-* The function offers two values to use in the remediation guide, `causeComponentName` being the component name where the state is propagated from and its `causeComponentUrnForUrl` to be able to create a link
+* The function offers three values to use in the remediation guide
+  * `componentName` being the name of the logical component.
+  * `causeComponentName` being the component name where the state is propagated from and its `causeComponentUrnForUrl` to be able to create a link.
 
 The monitor can be implemented using the guide at [Add a threshold monitor to components using the CLI](/use/alerting/k8s-add-monitors-cli.md)
\ No newline at end of file

From 10bb84544372a0b96e77e22f43ed307cf49f8025 Mon Sep 17 00:00:00 2001
From: Bram Schuur
Date: Thu, 17 Apr 2025 10:10:00 +0200
Subject: [PATCH 6/8] STAC-22628: Late addition of release note

---
 setup/release-notes/v2.3.2.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/setup/release-notes/v2.3.2.md b/setup/release-notes/v2.3.2.md
index f12816144..e348bedcc 100644
--- a/setup/release-notes/v2.3.2.md
+++ b/setup/release-notes/v2.3.2.md
@@ -19,4 +19,5 @@ description: SUSE Observability Self-hosted
 ## Agent Bug Fixes
 
 * **Static Pod Log Scraping:** The agent has been enhanced to now scrape logs for static pods. This is achieved by utilizing the `kubernetes.io/config.mirror` annotation for system pods.
-* **Secret Environment Variables:** Fixed an issue to ensure proper support for the `global.extraEnv.secret` configuration, allowing the addition of secret environment variables to the agent pods.
\ No newline at end of file
+* **Secret Environment Variables:** Fixed an issue to ensure proper support for the `global.extraEnv.secret` configuration, allowing the addition of secret environment variables to the agent pods.
+* **Private Image Registry** Fixed various issues around providing a private docker repository in the rancher UI.
\ No newline at end of file

From 739220c98fa31f900f9496bfb7024c97d3da7c39 Mon Sep 17 00:00:00 2001
From: Daniel Barra
Date: Tue, 22 Apr 2025 15:05:14 -0300
Subject: [PATCH 7/8] STAC-22628: Update release notes 2.3.2

---
 SUMMARY.md                    | 2 +-
 setup/release-notes/v2.3.2.md | 7 ++++++-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/SUMMARY.md b/SUMMARY.md
index 5017660a6..f2e9c0517 100644
--- a/SUMMARY.md
+++ b/SUMMARY.md
@@ -134,7 +134,7 @@
  * [v2.2.1 - 10/Dec/2024](setup/release-notes/v2.2.1.md)
  * [v2.3.0 - 30/Jan/2025](setup/release-notes/v2.3.0.md)
  * [v2.3.1 - 17/Mar/2025](setup/release-notes/v2.3.1.md)
- * [v2.3.2 - 14/Apr/2025](setup/release-notes/v2.3.2.md)
+ * [v2.3.2 - 22/Apr/2025](setup/release-notes/v2.3.2.md)
  * [Upgrade SUSE Observability](setup/upgrade-stackstate/README.md)
  * [Migration from StackState](setup/upgrade-stackstate/migrate-from-6.md)
  * [Steps to upgrade](setup/upgrade-stackstate/steps-to-upgrade.md)
diff --git a/setup/release-notes/v2.3.2.md b/setup/release-notes/v2.3.2.md
index e348bedcc..b0e7b28f6 100644
--- a/setup/release-notes/v2.3.2.md
+++ b/setup/release-notes/v2.3.2.md
@@ -10,14 +10,19 @@ description: SUSE Observability Self-hosted
 
 * **Analytics Deprecation:** The Analytics feature is now deprecated and disabled by default for all users. To re-enable it, grant the `access-analytics` and `execute-scripts` permissions to the relevant users or roles.
 * **Log Noise Reduction:** Implemented a fix to suppress `x-forwarded-for` errors in logs when an IP:Port combination is used in the forwarding configuration.
+* **Derived State Monitor:** Introduced a new "Derived State Monitor" feature, allowing the derivation of a state based on the status of logical components.
 
 ### Bug Fixes
 
 * **Traces in HA Profile:** Resolved an issue where the Traces functionality was partially disabled in the `4000-ha` profile.
 * **Broken Link Fixes:** Fixed various broken links identified throughout the product user interface and documentation.
+* **STS CLI Error Handling:** The STS CLI command for uploading a new stackpack now provides more informative and actionable error messages.
+* **Private Agent Repository in Rancher:** Addressed various issues related to configuring and utilizing a private repository for the agent within the Rancher UI.
+* **Logs API Key via Header:** The API key for accessing logs is now securely passed as a header in API requests.
 
 ## Agent Bug Fixes
 
 * **Static Pod Log Scraping:** The agent has been enhanced to now scrape logs for static pods. This is achieved by utilizing the `kubernetes.io/config.mirror` annotation for system pods.
 * **Secret Environment Variables:** Fixed an issue to ensure proper support for the `global.extraEnv.secret` configuration, allowing the addition of secret environment variables to the agent pods.
-* **Private Image Registry** Fixed various issues around providing a private docker repository in the rancher UI.
\ No newline at end of file
+* **Process Agent Kernel Compatibility:** Enabled the process-agent to run on a wider range of Linux kernel versions, specifically between 5.0 and 5.11 (inclusive).
+* **Private Image Registry** Fixed various issues around providing a private docker repository in the rancher UI.

From 4c6178f0a5d9a200b4263c9281ed52aa4f290c3f Mon Sep 17 00:00:00 2001
From: Daniel Barra
Date: Tue, 22 Apr 2025 15:08:02 -0300
Subject: [PATCH 8/8] STAC-22628: Update release notes date of release

---
 setup/release-notes/v2.3.2.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/setup/release-notes/v2.3.2.md b/setup/release-notes/v2.3.2.md
index b0e7b28f6..a6a5d61a0 100644
--- a/setup/release-notes/v2.3.2.md
+++ b/setup/release-notes/v2.3.2.md
@@ -2,7 +2,7 @@
 description: SUSE Observability Self-hosted
 ---
 
-# v2.3.2 - 14/April/2025
+# v2.3.2 - 22/April/2025
 
 ## Release Notes: SUSE Observability Helm Chart v2.3.2