
Conversation

@aantn (Collaborator) commented Jan 1, 2026

Summary

  • add dedicated Ask Holmes response principles covering ranking, links, and tone
  • include the new principles in ask prompts and align style guidance to remove conflicting terse advice

Testing

  • Not run (not requested)

Eval Guidance

  • Recommended: /eval make test-llm-ask-holmes to validate ask-holmes prompt behavior

Codex Task

Summary by CodeRabbit

  • Improvements
    • Enhanced response quality with more thoughtful, evaluative analysis
    • Responses now include ranked alternatives with brief rationale and validated reference links
    • Improved clarity and readability through refined formatting and structure
    • Responses maintain conciseness while preserving essential reasoning and evidence


Signed-off-by: Codex <codex@openai.com>
coderabbitai bot (Contributor) commented Jan 1, 2026

Walkthrough

This change introduces a shared response principles template and refactors prompt guidance across multiple Holmes prompt templates. A new _ask_holmes_response_principles.jinja2 file establishes common response guidelines, which is then included in three existing prompt templates (generic_ask.jinja2, generic_ask_conversation.jinja2, and generic_ask_for_issue_conversation.jinja2). The style guide bullets are updated to emphasize preserving essential reasoning, ranking alternatives, and linking evidence while remaining concise.
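
As a minimal sketch of what such a shared partial might contain (the actual wording in the PR may differ; this is an illustration of the principles listed in the summary, not the literal diff):

```jinja2
{# holmes/plugins/prompts/_ask_holmes_response_principles.jinja2 — hypothetical wording, for illustration only #}
# Response Principles
* Answer directly and honestly; do not soften findings to please the user.
* Evaluate tool output skeptically instead of repeating it verbatim.
* When several explanations or fixes are plausible, rank them and give a brief rationale for the ordering.
* Stay concise, but keep the reasoning and evidence needed to justify the answer.
* Test links for validity and correctness before returning them.
* Before answering, check that the response actually addresses the user's question.
```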

Changes

| Cohort / File(s) | Summary |
|---|---|
| **New Response Principles Template**<br>`holmes/plugins/prompts/_ask_holmes_response_principles.jinja2` | Introduces shared response principles guiding directness, skeptical evaluation, thoroughness in ranking alternatives, conciseness, link validation, and self-checking against user questions. |
| **Generic Ask Prompt Templates**<br>`holmes/plugins/prompts/generic_ask.jinja2`, `holmes/plugins/prompts/generic_ask_conversation.jinja2`, `holmes/plugins/prompts/generic_ask_for_issue_conversation.jinja2` | All three files now include the shared principles template. Style guide bullets are updated to shift from terse, minimal guidance toward preserving essential reasoning and evidence while providing ranked alternatives with rationale and links. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

Suggested reviewers

  • mainred
  • moshemorad

Pre-merge checks

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly and clearly describes the main change: introducing new response principles for Ask Holmes. It accurately reflects the primary focus of the changeset. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8d7448a and 37248e1.

📒 Files selected for processing (4)
  • holmes/plugins/prompts/_ask_holmes_response_principles.jinja2
  • holmes/plugins/prompts/generic_ask.jinja2
  • holmes/plugins/prompts/generic_ask_conversation.jinja2
  • holmes/plugins/prompts/generic_ask_for_issue_conversation.jinja2
🧰 Additional context used
📓 Path-based instructions (1)
holmes/plugins/prompts/**/*.jinja2

📄 CodeRabbit inference engine (CLAUDE.md)

Prompts: organize as holmes/plugins/prompts/{name}.jinja2

Files:

  • holmes/plugins/prompts/_ask_holmes_response_principles.jinja2
  • holmes/plugins/prompts/generic_ask_for_issue_conversation.jinja2
  • holmes/plugins/prompts/generic_ask_conversation.jinja2
  • holmes/plugins/prompts/generic_ask.jinja2
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: HolmesGPT/holmesgpt PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-29T08:35:37.668Z
Learning: Applies to holmes/plugins/prompts/**/*.jinja2 : Prompts: organize as `holmes/plugins/prompts/{name}.jinja2`
📚 Learning: 2025-12-29T08:35:37.668Z
Learnt from: CR
Repo: HolmesGPT/holmesgpt PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-29T08:35:37.668Z
Learning: Applies to holmes/plugins/prompts/**/*.jinja2 : Prompts: organize as `holmes/plugins/prompts/{name}.jinja2`

Applied to files:

  • holmes/plugins/prompts/_ask_holmes_response_principles.jinja2
  • holmes/plugins/prompts/generic_ask_for_issue_conversation.jinja2
  • holmes/plugins/prompts/generic_ask_conversation.jinja2
  • holmes/plugins/prompts/generic_ask.jinja2
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: build
  • GitHub Check: llm_evals
  • GitHub Check: build (3.12)
  • GitHub Check: build (3.10)
  • GitHub Check: build (3.11)
🔇 Additional comments (9)
holmes/plugins/prompts/generic_ask.jinja2 (2)

14-14: LGTM! Shared principles template included correctly.

The include statement correctly references the new shared response principles template using the underscore prefix convention for partial templates.
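
For readers who have not opened the diff, an include of this kind typically looks like the line below (a sketch only; the exact path and placement in `generic_ask.jinja2` may differ):

```jinja2
{# sketch of the kind of include statement being reviewed; actual path/placement may differ #}
{% include '_ask_holmes_response_principles.jinja2' %}
```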


18-20: Style guide updates align well with new response principles.

The updated style guide bullets now emphasize:

  • Preserving essential reasoning and evidence (vs. previous terse approach)
  • Ranking alternatives with rationale and links (new guidance)
  • Avoiding unnecessary repetition (refined from previous wording)

This creates better alignment between the shared principles and template-specific style guidance.
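
Purely as an illustration of the direction described above (not the literal diff), the updated style-guide bullets might read along these lines:

```jinja2
{# hypothetical style-guide bullets reflecting the themes above; actual wording in the templates may differ #}
* Keep the essential reasoning and evidence that supports your conclusion; do not strip it for brevity.
* When multiple causes or fixes are plausible, rank them with a one-line rationale and a link to the evidence.
* Avoid unnecessary repetition of information already given earlier in the answer.
```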

holmes/plugins/prompts/generic_ask_conversation.jinja2 (2)

11-11: LGTM! Consistent include placement.

The shared response principles are correctly included after general instructions, maintaining consistency with the other ask templates.


15-17: LGTM! Style guide consistency maintained.

The style guide updates are identical to those in generic_ask.jinja2, ensuring consistent guidance across all ask templates.

holmes/plugins/prompts/generic_ask_for_issue_conversation.jinja2 (2)

27-28: LGTM! Include placement appropriate for this template's structure.

The shared response principles are included after the conversation history section, which is appropriate given this template's unique structure that front-loads investigation context.


34-36: LGTM! Style guide consistency maintained across all templates.

The style guide updates match those in the other ask templates, ensuring uniform guidance throughout.

holmes/plugins/prompts/_ask_holmes_response_principles.jinja2 (3)

1-9: LGTM! Strong foundational principles established.

The response principles provide clear, actionable guidance that promotes:

  • Direct, honest communication
  • Critical thinking and evaluation
  • Thoroughness with structured ranking
  • Conciseness and efficiency

These principles create a solid foundation for consistent "Ask Holmes" responses.


11-12: LGTM! Quality control and readability guidance.

The self-check instruction promotes accuracy, while bold formatting guidance enhances readability. Both are practical and beneficial.


10-10: No revision needed for the instruction about testing links.

Holmes provides link validation capabilities through the internet toolset's fetch_webpage tool, which the LLM can call to verify that URLs are accessible and return expected content. The instruction to "Test the links for validity and correctness before returning them" is reasonable and implementable.

The fetch_webpage tool makes HTTP requests and returns error messages (including HTTP status errors, timeouts, and connection failures) when links are invalid, enabling the LLM to validate links before including them in responses.

Likely an incorrect or invalid review comment.


Comment @coderabbitai help to get the list of available commands and usage tips.

github-actions bot (Contributor) commented Jan 1, 2026

Docker image ready for ed04b04 (built in 4m 8s)

⚠️ Warning: does not support ARM (ARM images are built on release only - not on every PR)

Use this tag to pull the image for testing.

📋 Copy commands

⚠️ Temporary images are deleted after 30 days. Copy to a permanent registry before using them:

```shell
gcloud auth configure-docker us-central1-docker.pkg.dev
docker pull us-central1-docker.pkg.dev/robusta-development/temporary-builds/holmes:ed04b04
docker tag us-central1-docker.pkg.dev/robusta-development/temporary-builds/holmes:ed04b04 me-west1-docker.pkg.dev/robusta-development/development/holmes-dev:ed04b04
docker push me-west1-docker.pkg.dev/robusta-development/development/holmes-dev:ed04b04
```

Patch Helm values in one line (choose the chart you use):

HolmesGPT chart:

```shell
helm upgrade --install holmesgpt ./helm/holmes \
  --set registry=me-west1-docker.pkg.dev/robusta-development/development \
  --set image=holmes-dev:ed04b04
```

Robusta wrapper chart:

```shell
helm upgrade --install robusta robusta/robusta \
  --reuse-values \
  --set holmes.registry=me-west1-docker.pkg.dev/robusta-development/development \
  --set holmes.image=holmes-dev:ed04b04
```

github-actions bot (Contributor) commented Jan 1, 2026

✅ Results of HolmesGPT evals

Automatically triggered by commit 37248e1 on branch codex/linear-mention-rob-129-holmesgpt-update-system-prompt

View workflow logs

Results of HolmesGPT evals

  • ask_holmes: 9/9 test cases were successful, 0 regressions
| Status | Test case | Time | Turns | Tools | Cost |
|---|---|---|---|---|---|
| | 09_crashpod | 40.3s ↑15% | 6 | 15 | $0.1822 |
| | 101_loki_historical_logs_pod_deleted | 73.8s ↑41% | 10 | 17 | $0.2466 |
| | 111_pod_names_contain_service | 47.8s ↑20% | 7 | 15 | $0.1951 |
| | 12_job_crashing | 57.7s ↑19% | 8 | 20 | $0.2203 |
| | 162_get_runbooks | 59.3s ↑16% | 8 | 17 | $0.2533 |
| | 176_network_policy_blocking_traffic_no_runbooks | 50.6s ↑30% | 8 | 18 | $0.2208 |
| | 24_misconfigured_pvc | 48.8s ↑24% | 8 | 19 | $0.1964 |
| | 43_current_datetime_from_prompt | 3.3s ±0% | 1 | | $0.0629 |
| | 61_exact_match_counting | 11.6s ↑10% | 3 | 3 | $0.0891 |
| | Total | 43.7s avg | 6.6 avg | 15.5 avg | $1.6668 |

Time/Cost columns show % change vs historical average (↑slower/costlier, ↓faster/cheaper). Changes under 10% shown as ±0%.

Historical Comparison Details

Filter: excluding branch 'codex/linear-mention-rob-129-holmesgpt-update-system-prompt'

Status: Success - 13 test/model combinations loaded

Experiments compared (30):

Comparison indicators:

  • ±0% — diff under 10% (within noise threshold)
  • ↑N%/↓N% — diff 10-25%
  • ↑N%/↓N% — diff over 25% (significant)
📖 Legend
| Icon | Meaning |
|---|---|
| | The test was successful |
| | The test was skipped |
| ⚠️ | The test failed but is known to be flaky or known to fail |
| 🚧 | The test had a setup failure (not a code regression) |
| 🔧 | The test failed due to mock data issues (not a code regression) |
| 🚫 | The test was throttled by API rate limits/overload |
| | The test failed and should be fixed before merging the PR |
🔄 Re-run evals manually

⚠️ Warning: Manual re-runs have NO default markers and will run ALL LLM tests (~100+), which can take 1+ hours. Use markers: regression or filter: test_name to limit scope.

Option 1: Comment on this PR with /eval:

```
/eval
markers: regression
```

Or with more options (one per line):

```
/eval
model: gpt-4o
markers: regression
filter: 09_crashpod
iterations: 5
```

Run evals on a different branch (e.g., master) for comparison:

```
/eval
branch: master
markers: regression
```

| Option | Description |
|---|---|
| model | Model(s) to test (default: same as automatic runs) |
| markers | Pytest markers (no default - runs all tests!) |
| filter | Pytest -k filter |
| iterations | Number of runs, max 10 |
| branch | Run evals on a different branch (for cross-branch comparison) |

Quick re-run: Use /last to re-run the most recent /eval on this PR with the same parameters.

Option 2: Trigger via GitHub Actions UI → "Run workflow"

🏷️ Valid markers
  • benchmark
  • chain-of-causation
  • compaction
  • context_window
  • coralogix
  • counting
  • database
  • datadog
  • datetime
  • easy
  • embeds
  • grafana-dashboard
  • hard
  • kafka
  • kubernetes
  • leaked-information
  • logs
  • loki
  • medium
  • metrics
  • network
  • newrelic
  • no-cicd
  • numerical
  • one-test
  • port-forward
  • prometheus
  • question-answer
  • regression
  • runbooks
  • slackbot
  • storage
  • toolset-limitation
  • traces
  • transparency
📋 Valid eval names (use with filter)

test_ask_holmes:

  • 01_how_many_pods
  • 02_what_is_wrong_with_pod
  • 03_what_is_the_command_to_port_forward
  • 04_related_k8s_events
  • 05_image_version
  • 06_explain_issue
  • 07_high_latency
  • 08_sock_shop_frontend
  • 09_crashpod
  • 100a_loki_historical_logs
  • 101_loki_historical_logs_pod_deleted
  • 102_loki_label_discovery
  • 102a_loki_logs_transparency
  • 102b_loki_multiple_pods
  • 103_logs_transparency_default_limit
  • 104a_postgres_root_issue
  • 104b_postgres_missing_index_pgstat
  • 104c_postgres_minimal_missing_index
  • 105_redis_wrong_data_structure
  • 107_log_filter_http_status_code
  • 108_logs_nearby_lines
  • 109_logs_transparency_not_found
  • 10_image_pull_backoff
  • 110_cpu_graph_robusta_runner
  • 110_k8s_events_image_pull
  • 111_disabled_datadog_traces
  • 111_pod_names_contain_service
  • 111_tool_hallucination
  • 112_find_pvcs_by_uuid
  • 114_checkout_latency_tracing_rebuild
  • 115_checkout_errors_tracing
  • 117_new_relic_tracing
  • 117b_new_relic_block_embed
  • 118_new_relic_logs
  • 119_new_relic_metrics
  • 11_init_containers
  • 120_new_relic_traces2
  • 121_new_relic_checkout_errors_tracing
  • 122_new_relic_checkout_latency_tracing_rebuild
  • 123_new_relic_checkout_errors_tracing
  • 124_checkout_latency_prometheus
  • 12_job_crashing
  • 13a_pending_node_selector_basic
  • 13b_pending_node_selector_detailed
  • 14_pending_resources
  • 151_disabled_toolsets_fallback_only
  • 156_kafka_opensearch_latency
  • 157_disk_full_statefulset
  • 158_slack_chat_correct_date
  • 159_prometheus_high_cardinality_cpu
  • 15_failed_readiness_probe
  • 160_electricity_market_bidding_bug
  • 160a_cpu_per_namespace_graph
  • 160b_cpu_per_namespace_graph_with_prom_truncation
  • 160c_cpu_per_namespace_graph_with_global_truncation
  • 161_bidding_version_performance
  • 161_conversation_compaction
  • 162_get_runbooks
  • 163_compaction_follow_up
  • 164_datadog_traces_coupon_code
  • 165_alert_with_multiple_runbooks
  • 16_failed_no_toolset_found
  • 173_coralogix_logs
  • 174_coralogix_traces_ad
  • 175_coralogix_metrics_frontend
  • 176_network_policy_blocking_traffic_no_runbooks
  • 177_grafana_home_dashboard
  • 178_grafana_search_dashboard_query
  • 179_grafana_big_dashboard_query
  • 17_oom_kill
  • 180_connectivity_check_tcp
  • 181_connectivity_check_http
  • 182_connectivity_check_http_url
  • 18_oom_kill_from_issues_history
  • 19_detect_missing_app_details
  • 20_long_log_file_search
  • 21_job_fail_curl_no_svc_account
  • 22_high_latency_dbi_down
  • 23_app_error_in_current_logs
  • 24_misconfigured_pvc
  • 25_misconfigured_ingress_class
  • 26_page_render_times
  • 27a_multi_container_logs
  • 27b_multi_container_logs
  • 28_permissions_error
  • 30_basic_promql_graph_cluster_memory
  • 32_basic_promql_graph_pod_cpu
  • 33_cpu_metrics_discovery
  • 34_memory_graph
  • 35_tempo
  • 36_argocd_find_resource
  • 37_argocd_wrong_namespace
  • 38_rabbitmq_split_head
  • 39_failed_toolset
  • 41_setup_argo
  • 42_dns_issues_result_all_tools
  • 42_dns_issues_result_new_tools
  • 42_dns_issues_result_new_tools_no_runbook
  • 42_dns_issues_result_old_tools
  • 42_dns_issues_steps_new_all_tools
  • 42_dns_issues_steps_new_tools
  • 42_dns_issues_steps_old_tools
  • 43_current_datetime_from_prompt
  • 43_slack_deployment_logs
  • 44_slack_statefulset_logs
  • 45_fetch_deployment_logs_simple
  • 46_job_crashing_no_longer_exists
  • 47_truncated_logs_context_window
  • 48_logs_since_thursday
  • 49_logs_since_last_week
  • 50_logs_since_specific_date
  • 50a_logs_since_last_specific_month
  • 51_logs_summarize_errors
  • 52_logs_login_issues
  • 53_logs_find_term
  • 54_azure_sql
  • 54_not_truncated_when_getting_pods
  • 55_kafka_runbook
  • 57_cluster_name_confusion
  • 57_wrong_namespace
  • 58_counting_pods_by_status
  • 59_label_based_counting
  • 60_count_less_than
  • 61_exact_match_counting
  • 62_fetch_error_logs_with_errors
  • 63_fetch_error_logs_no_errors
  • 64_keda_vs_hpa_confusion
  • 65_health_check_followup
  • 66_http_error_needle
  • 67_performance_degradation
  • 68_cascading_failures
  • 69_rate_limit_exhaustion
  • 70_memory_leak_detection
  • 71_connection_pool_starvation
  • 73a_time_window_anomaly
  • 73b_time_window_anomaly
  • 74_config_change_impact
  • 75_network_flapping
  • 76_service_discovery_issue
  • 77_liveness_probe_misconfiguration
  • 78a_missing_cpu_limits
  • 78b_cpu_quota_exceeded
  • 79_configmap_mount_issue
  • 80_pvc_storage_class_mismatch
  • 81_service_account_permission_denied
  • 82_pod_anti_affinity_conflict
  • 83_secret_not_found
  • 84_network_policy_blocking_traffic
  • 85_hpa_not_scaling
  • 86_configmap_like_but_secret
  • 89_runbook_missing_cloudwatch
  • 90_runbook_basic_selection
  • 91a_datadog_metrics_missing_namespace
  • 91b_datadog_metrics_pod_exists
  • 91c_datadog_metrics_deployment
  • 91d_datadog_metrics_historical_pod
  • 91e_datadog_custom_metrics
  • 91f_datadog_logs_historical_pod
  • 91g_datadog_metrics_mismatched_pod
  • 91h_datadog_logs_empty_query_with_url
  • 91i_datadog_metrics_empty_query_with_url
  • 92_cpu_graph_conversation
  • 93_calling_datadog
  • 93_events_since_specific_date
  • 94_runbook_transparency
  • 95_runbook_memory_leak_detection
  • 96_no_matching_runbook
  • 97_logs_clarification_needed
  • 99_logs_transparency_custom_time

test_investigate:

  • 01_oom_kill
  • 02_crashloop_backoff
  • 03_cpu_throttling
  • 04_image_pull_backoff
  • 05_crashpod
  • 06_job_failure
  • 07_job_syntax_error
  • 08_memory_pressure
  • 09_high_latency
  • 10_KubeDeploymentReplicasMismatch
  • 11_KubePodCrashLooping
  • 12_KubePodNotReady
  • 13_Watchdog
  • 14_tempo
  • 15_dns_resolution
  • 16_dns_resolution_no_tool
  • 17_investigate_correct_date

aantn (Collaborator, Author) commented Jan 1, 2026

```
/eval
markers: regression
branch: master
```

github-actions bot (Contributor) commented Jan 1, 2026

@aantn Your eval run has finished. ✅ Completed successfully


🧪 Manual Eval Results

| Parameter | Value |
|---|---|
| Triggered via | /eval on branch master |
| Branch | master |
| Model | bedrock/eu.anthropic.claude-sonnet-4-5-20250929-v1:0 |
| Markers | regression |
| Iterations | 1 |
| Duration | 4m 5s |
| Workflow | View logs |

Results of HolmesGPT evals (branch: master)

  • ask_holmes: 9/9 test cases were successful, 0 regressions
| Status | Test case | Time | Turns | Tools | Cost |
|---|---|---|---|---|---|
| | 09_crashpod | 46.2s ↑26% | 7 | 16 | $0.1342 |
| | 101_loki_historical_logs_pod_deleted | 43.5s ↓21% | 7 | 15 | $0.1443 |
| | 111_pod_names_contain_service | 37.6s ↓13% | 7 | 15 | $0.1215 |
| | 12_job_crashing | 45.6s ↓12% | 8 | 17 | $0.1551 |
| | 162_get_runbooks | 42.0s ↓21% | 7 | 17 | $0.1610 |
| | 176_network_policy_blocking_traffic_no_runbooks | 29.2s ↓30% | 5 | 12 | $0.1015 |
| | 24_misconfigured_pvc | 30.3s ↓24% | 6 | 15 | $0.1057 |
| | 43_current_datetime_from_prompt | 3.5s ±0% | 1 | | $0.0086 |
| | 61_exact_match_counting | 10.2s ±0% | 3 | 3 | $0.0326 |
| | Total | 32.0s avg | 5.7 avg | 13.8 avg | $0.9645 |

Time/Cost columns show % change vs historical average (↑slower/costlier, ↓faster/cheaper). Changes under 10% shown as ±0%.

Historical Comparison Details

Filter: excluding branch 'master'

Status: Success - 10 test/model combinations loaded

Experiments compared (30):

Comparison indicators:

  • ±0% — diff under 10% (within noise threshold)
  • ↑N%/↓N% — diff 10-25%
  • ↑N%/↓N% — diff over 25% (significant)
📖 Legend
| Icon | Meaning |
|---|---|
| | The test was successful |
| | The test was skipped |
| ⚠️ | The test failed but is known to be flaky or known to fail |
| 🚧 | The test had a setup failure (not a code regression) |
| 🔧 | The test failed due to mock data issues (not a code regression) |
| 🚫 | The test was throttled by API rate limits/overload |
| | The test failed and should be fixed before merging the PR |
