Add Ask Holmes response principles #1299

aantn · 2026-01-01T13:41:24Z

Summary

add dedicated Ask Holmes response principles covering ranking, links, and tone
include the new principles in ask prompts and align style guidance to remove conflicting terse advice

Testing

Not run (not requested)

Eval Guidance

Recommended: /eval make test-llm-ask-holmes to validate ask-holmes prompt behavior

Codex Task

Summary by CodeRabbit

Improvements
- Enhanced response quality with more thoughtful, evaluative analysis
- Responses now include ranked alternatives with brief rationale and validated reference links
- Improved clarity and readability through refined formatting and structure
- Responses maintain conciseness while preserving essential reasoning and evidence

_{✏️ Tip: You can customize this high-level summary in your review settings.}

Signed-off-by: Codex <codex@openai.com>

coderabbitai · 2026-01-01T13:41:36Z

Walkthrough

This change introduces a shared response principles template and refactors prompt guidance across multiple Holmes prompt templates. A new _ask_holmes_response_principles.jinja2 file establishes common response guidelines, which is then included in three existing prompt templates (generic_ask.jinja2, generic_ask_conversation.jinja2, and generic_ask_for_issue_conversation.jinja2). The style guide bullets are updated to emphasize preserving essential reasoning, ranking alternatives, and linking evidence while remaining concise.

Changes

Cohort / File(s)	Summary
New Response Principles Template `holmes/plugins/prompts/_ask_holmes_response_principles.jinja2`	Introduces shared response principles guiding directness, skeptical evaluation, thoroughness in ranking alternatives, conciseness, link validation, and self-checking against user questions.
Generic Ask Prompt Templates `holmes/plugins/prompts/generic_ask.jinja2`, `holmes/plugins/prompts/generic_ask_conversation.jinja2`, `holmes/plugins/prompts/generic_ask_for_issue_conversation.jinja2`	All three files now include the shared principles template. Style guide bullets updated to shift from terse/minimal guidance toward preserving essential reasoning and evidence while providing ranked alternatives with rationale and links.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

chore: Add ai safety prompt to system prompt #823: Modifies the same prompt templates by adding and including new prompt partials.
Fix interactive mode so it can answer without follow up questions #546: Updates response guidance in generic_ask.jinja2 instruction bullets.

Suggested reviewers

mainred
moshemorad

Pre-merge checks

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly and clearly describes the main change: introducing new response principles for Ask Holmes. It accurately reflects the primary focus of the changeset.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8d7448a and 37248e1.

📒 Files selected for processing (4)

holmes/plugins/prompts/_ask_holmes_response_principles.jinja2
holmes/plugins/prompts/generic_ask.jinja2
holmes/plugins/prompts/generic_ask_conversation.jinja2
holmes/plugins/prompts/generic_ask_for_issue_conversation.jinja2

🧰 Additional context used

📓 Path-based instructions (1)

holmes/plugins/prompts/**/*.jinja2

📄 CodeRabbit inference engine (CLAUDE.md)

Prompts: organize as holmes/plugins/prompts/{name}.jinja2

Files:

holmes/plugins/prompts/_ask_holmes_response_principles.jinja2
holmes/plugins/prompts/generic_ask_for_issue_conversation.jinja2
holmes/plugins/prompts/generic_ask_conversation.jinja2
holmes/plugins/prompts/generic_ask.jinja2

🧠 Learnings (2)

📓 Common learnings

Learnt from: CR
Repo: HolmesGPT/holmesgpt PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-29T08:35:37.668Z
Learning: Applies to holmes/plugins/prompts/**/*.jinja2 : Prompts: organize as `holmes/plugins/prompts/{name}.jinja2`

📚 Learning: 2025-12-29T08:35:37.668Z

Learnt from: CR
Repo: HolmesGPT/holmesgpt PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-29T08:35:37.668Z
Learning: Applies to holmes/plugins/prompts/**/*.jinja2 : Prompts: organize as `holmes/plugins/prompts/{name}.jinja2`

Applied to files:

holmes/plugins/prompts/_ask_holmes_response_principles.jinja2
holmes/plugins/prompts/generic_ask_for_issue_conversation.jinja2
holmes/plugins/prompts/generic_ask_conversation.jinja2
holmes/plugins/prompts/generic_ask.jinja2

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)

GitHub Check: build
GitHub Check: llm_evals
GitHub Check: build (3.12)
GitHub Check: build (3.10)
GitHub Check: build (3.11)

🔇 Additional comments (9)

holmes/plugins/prompts/generic_ask.jinja2 (2)

14-14: LGTM! Shared principles template included correctly.

The include statement correctly references the new shared response principles template using the underscore prefix convention for partial templates.

18-20: Style guide updates align well with new response principles.

The updated style guide bullets now emphasize:

Preserving essential reasoning and evidence (vs. previous terse approach)

Ranking alternatives with rationale and links (new guidance)

Avoiding unnecessary repetition (refined from previous wording)

This creates better alignment between the shared principles and template-specific style guidance.

holmes/plugins/prompts/generic_ask_conversation.jinja2 (2)

11-11: LGTM! Consistent include placement.

The shared response principles are correctly included after general instructions, maintaining consistency with the other ask templates.

15-17: LGTM! Style guide consistency maintained.

The style guide updates are identical to those in generic_ask.jinja2, ensuring consistent guidance across all ask templates.

holmes/plugins/prompts/generic_ask_for_issue_conversation.jinja2 (2)

27-28: LGTM! Include placement appropriate for this template's structure.

The shared response principles are included after the conversation history section, which is appropriate given this template's unique structure that front-loads investigation context.

34-36: LGTM! Style guide consistency maintained across all templates.

The style guide updates match those in the other ask templates, ensuring uniform guidance throughout.

holmes/plugins/prompts/_ask_holmes_response_principles.jinja2 (3)

1-9: LGTM! Strong foundational principles established.

The response principles provide clear, actionable guidance that promotes:

Direct, honest communication

Critical thinking and evaluation

Thoroughness with structured ranking

Conciseness and efficiency

These principles create a solid foundation for consistent "Ask Holmes" responses.

11-12: LGTM! Quality control and readability guidance.

The self-check instruction promotes accuracy, while bold formatting guidance enhances readability. Both are practical and beneficial.

10-10: Remove the suggestion to revise the instruction about testing links.

Holmes provides link validation capabilities through the internet toolset's fetch_webpage tool, which the LLM can call to verify that URLs are accessible and return expected content. The instruction to "Test the links for validity and correctness before returning them" is reasonable and implementable.

The fetch_webpage tool makes HTTP requests and returns error messages (including HTTP status errors, timeouts, and connection failures) when links are invalid, enabling the LLM to validate links before including them in responses.

Likely an incorrect or invalid review comment.

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-01-01T13:41:38Z

✅ Docker image ready for ed04b04 (built in 4m 8s)

us-central1-docker.pkg.dev/robusta-development/temporary-builds/holmes:ed04b04

⚠️ Warning: does not support ARM (ARM images are built on release only - not on every PR)

Use this tag to pull the image for testing.

📋 Copy commands

⚠️ Temporary images are deleted after 30 days. Copy to a permanent registry before using them:

gcloud auth configure-docker us-central1-docker.pkg.dev
docker pull us-central1-docker.pkg.dev/robusta-development/temporary-builds/holmes:ed04b04
docker tag us-central1-docker.pkg.dev/robusta-development/temporary-builds/holmes:ed04b04 me-west1-docker.pkg.dev/robusta-development/development/holmes-dev:ed04b04
docker push me-west1-docker.pkg.dev/robusta-development/development/holmes-dev:ed04b04

Patch Helm values in one line (choose the chart you use):

HolmesGPT chart:

helm upgrade --install holmesgpt ./helm/holmes \
  --set registry=me-west1-docker.pkg.dev/robusta-development/development \
  --set image=holmes-dev:ed04b04

Robusta wrapper chart:

helm upgrade --install robusta robusta/robusta \
  --reuse-values \
  --set holmes.registry=me-west1-docker.pkg.dev/robusta-development/development \
  --set holmes.image=holmes-dev:ed04b04

github-actions · 2026-01-01T13:49:35Z

✅ Results of HolmesGPT evals

Automatically triggered by commit 37248e1 on branch codex/linear-mention-rob-129-holmesgpt-update-system-prompt

View workflow logs

Results of HolmesGPT evals

ask_holmes: 9/9 test cases were successful, 0 regressions

Status	Test case	Time	Turns	Tools	Cost
✅	09_crashpod	40.3s ↑15%	6	15	$0.1822
✅	101_loki_historical_logs_pod_deleted	73.8s ↑41%	10	17	$0.2466
✅	111_pod_names_contain_service	47.8s ↑20%	7	15	$0.1951
✅	12_job_crashing	57.7s ↑19%	8	20	$0.2203
✅	162_get_runbooks	59.3s ↑16%	8	17	$0.2533
✅	176_network_policy_blocking_traffic_no_runbooks	50.6s ↑30%	8	18	$0.2208
✅	24_misconfigured_pvc	48.8s ↑24%	8	19	$0.1964
✅	43_current_datetime_from_prompt	3.3s ±0%	1	—	$0.0629
✅	61_exact_match_counting	11.6s ↑10%	3	3	$0.0891
	Total	43.7s avg	6.6 avg	15.5 avg	$1.6668

Time/Cost columns show % change vs historical average (↑slower/costlier, ↓faster/cheaper). Changes under 10% shown as ±0%.

Historical Comparison Details

Filter: excluding branch 'codex/linear-mention-rob-129-holmesgpt-update-system-prompt'

Status: Success - 13 test/model combinations loaded

Experiments compared (30):

github-20639543222.1462.1 (branch: codex/linear-mention-rob-39-update-grafana-tool-to-handle-large-n2m92v)
github-20639526759.1461.1 (branch: master)
github-20639409982.1459.1 (branch: focused-benchmarks)
...and 27 more

Comparison indicators:

±0% — diff under 10% (within noise threshold)
↑N%/↓N% — diff 10-25%
↑N%/↓N% — diff over 25% (significant)

📖 Legend

Icon	Meaning
✅	The test was successful
➖	The test was skipped
⚠️	The test failed but is known to be flaky or known to fail
🚧	The test had a setup failure (not a code regression)
🔧	The test failed due to mock data issues (not a code regression)
🚫	The test was throttled by API rate limits/overload
❌	The test failed and should be fixed before merging the PR

🔄 Re-run evals manually

⚠️ Warning: Manual re-runs have NO default markers and will run ALL LLM tests (~100+), which can take 1+ hours. Use markers: regression or filter: test_name to limit scope.

Option 1: Comment on this PR with /eval:

/eval
markers: regression

Or with more options (one per line):

/eval
model: gpt-4o
markers: regression
filter: 09_crashpod
iterations: 5

Run evals on a different branch (e.g., master) for comparison:

/eval
branch: master
markers: regression

Option	Description
`model`	Model(s) to test (default: same as automatic runs)
`markers`	Pytest markers (no default - runs all tests!)
`filter`	Pytest -k filter
`iterations`	Number of runs, max 10
`branch`	Run evals on a different branch (for cross-branch comparison)

Quick re-run: Use /last to re-run the most recent /eval on this PR with the same parameters.

Option 2: Trigger via GitHub Actions UI → "Run workflow"

🏷️ Valid markers

benchmark
chain-of-causation
compaction
context_window
coralogix
counting
database
datadog
datetime
easy
embeds
grafana-dashboard
hard
kafka
kubernetes
leaked-information
logs
loki
medium
metrics
network
newrelic
no-cicd
numerical
one-test
port-forward
prometheus
question-answer
regression
runbooks
slackbot
storage
toolset-limitation
traces
transparency

📋 Valid eval names (use with filter)

test_ask_holmes:

01_how_many_pods
02_what_is_wrong_with_pod
03_what_is_the_command_to_port_forward
04_related_k8s_events
05_image_version
06_explain_issue
07_high_latency
08_sock_shop_frontend
09_crashpod
100a_loki_historical_logs
101_loki_historical_logs_pod_deleted
102_loki_label_discovery
102a_loki_logs_transparency
102b_loki_multiple_pods
103_logs_transparency_default_limit
104a_postgres_root_issue
104b_postgres_missing_index_pgstat
104c_postgres_minimal_missing_index
105_redis_wrong_data_structure
107_log_filter_http_status_code
108_logs_nearby_lines
109_logs_transparency_not_found
10_image_pull_backoff
110_cpu_graph_robusta_runner
110_k8s_events_image_pull
111_disabled_datadog_traces
111_pod_names_contain_service
111_tool_hallucination
112_find_pvcs_by_uuid
114_checkout_latency_tracing_rebuild
115_checkout_errors_tracing
117_new_relic_tracing
117b_new_relic_block_embed
118_new_relic_logs
119_new_relic_metrics
11_init_containers
120_new_relic_traces2
121_new_relic_checkout_errors_tracing
122_new_relic_checkout_latency_tracing_rebuild
123_new_relic_checkout_errors_tracing
124_checkout_latency_prometheus
12_job_crashing
13a_pending_node_selector_basic
13b_pending_node_selector_detailed
14_pending_resources
151_disabled_toolsets_fallback_only
156_kafka_opensearch_latency
157_disk_full_statefulset
158_slack_chat_correct_date
159_prometheus_high_cardinality_cpu
15_failed_readiness_probe
160_electricity_market_bidding_bug
160a_cpu_per_namespace_graph
160b_cpu_per_namespace_graph_with_prom_truncation
160c_cpu_per_namespace_graph_with_global_truncation
161_bidding_version_performance
161_conversation_compaction
162_get_runbooks
163_compaction_follow_up
164_datadog_traces_coupon_code
165_alert_with_multiple_runbooks
16_failed_no_toolset_found
173_coralogix_logs
174_coralogix_traces_ad
175_coralogix_metrics_frontend
176_network_policy_blocking_traffic_no_runbooks
177_grafana_home_dashboard
178_grafana_search_dashboard_query
179_grafana_big_dashboard_query
17_oom_kill
180_connectivity_check_tcp
181_connectivity_check_http
182_connectivity_check_http_url
18_oom_kill_from_issues_history
19_detect_missing_app_details
20_long_log_file_search
21_job_fail_curl_no_svc_account
22_high_latency_dbi_down
23_app_error_in_current_logs
24_misconfigured_pvc
25_misconfigured_ingress_class
26_page_render_times
27a_multi_container_logs
27b_multi_container_logs
28_permissions_error
30_basic_promql_graph_cluster_memory
32_basic_promql_graph_pod_cpu
33_cpu_metrics_discovery
34_memory_graph
35_tempo
36_argocd_find_resource
37_argocd_wrong_namespace
38_rabbitmq_split_head
39_failed_toolset
41_setup_argo
42_dns_issues_result_all_tools
42_dns_issues_result_new_tools
42_dns_issues_result_new_tools_no_runbook
42_dns_issues_result_old_tools
42_dns_issues_steps_new_all_tools
42_dns_issues_steps_new_tools
42_dns_issues_steps_old_tools
43_current_datetime_from_prompt
43_slack_deployment_logs
44_slack_statefulset_logs
45_fetch_deployment_logs_simple
46_job_crashing_no_longer_exists
47_truncated_logs_context_window
48_logs_since_thursday
49_logs_since_last_week
50_logs_since_specific_date
50a_logs_since_last_specific_month
51_logs_summarize_errors
52_logs_login_issues
53_logs_find_term
54_azure_sql
54_not_truncated_when_getting_pods
55_kafka_runbook
57_cluster_name_confusion
57_wrong_namespace
58_counting_pods_by_status
59_label_based_counting
60_count_less_than
61_exact_match_counting
62_fetch_error_logs_with_errors
63_fetch_error_logs_no_errors
64_keda_vs_hpa_confusion
65_health_check_followup
66_http_error_needle
67_performance_degradation
68_cascading_failures
69_rate_limit_exhaustion
70_memory_leak_detection
71_connection_pool_starvation
73a_time_window_anomaly
73b_time_window_anomaly
74_config_change_impact
75_network_flapping
76_service_discovery_issue
77_liveness_probe_misconfiguration
78a_missing_cpu_limits
78b_cpu_quota_exceeded
79_configmap_mount_issue
80_pvc_storage_class_mismatch
81_service_account_permission_denied
82_pod_anti_affinity_conflict
83_secret_not_found
84_network_policy_blocking_traffic
85_hpa_not_scaling
86_configmap_like_but_secret
89_runbook_missing_cloudwatch
90_runbook_basic_selection
91a_datadog_metrics_missing_namespace
91b_datadog_metrics_pod_exists
91c_datadog_metrics_deployment
91d_datadog_metrics_historical_pod
91e_datadog_custom_metrics
91f_datadog_logs_historical_pod
91g_datadog_metrics_mismatched_pod
91h_datadog_logs_empty_query_with_url
91i_datadog_metrics_empty_query_with_url
92_cpu_graph_conversation
93_calling_datadog
93_events_since_specific_date
94_runbook_transparency
95_runbook_memory_leak_detection
96_no_matching_runbook
97_logs_clarification_needed
99_logs_transparency_custom_time

test_investigate:

01_oom_kill
02_crashloop_backoff
03_cpu_throttling
04_image_pull_backoff
05_crashpod
06_job_failure
07_job_syntax_error
08_memory_pressure
09_high_latency
10_KubeDeploymentReplicasMismatch
11_KubePodCrashLooping
12_KubePodNotReady
13_Watchdog
14_tempo
15_dns_resolution
16_dns_resolution_no_tool
17_investigate_correct_date

aantn · 2026-01-01T13:57:37Z

/eval
markers: regression
branch: master

github-actions · 2026-01-01T14:05:10Z

@aantn Your eval run has finished. ✅ Completed successfully

🧪 Manual Eval Results

Parameter	Value
Triggered via	/eval on branch `master`
Branch	`master`
Model	`bedrock/eu.anthropic.claude-sonnet-4-5-20250929-v1:0`
Markers	`regression`
Iterations	1
Duration	4m 5s
Workflow	View logs

Results of HolmesGPT evals (branch: `master`)

ask_holmes: 9/9 test cases were successful, 0 regressions

Status	Test case	Time	Turns	Tools	Cost
✅	09_crashpod	46.2s ↑26%	7	16	$0.1342
✅	101_loki_historical_logs_pod_deleted	43.5s ↓21%	7	15	$0.1443
✅	111_pod_names_contain_service	37.6s ↓13%	7	15	$0.1215
✅	12_job_crashing	45.6s ↓12%	8	17	$0.1551
✅	162_get_runbooks	42.0s ↓21%	7	17	$0.1610
✅	176_network_policy_blocking_traffic_no_runbooks	29.2s ↓30%	5	12	$0.1015
✅	24_misconfigured_pvc	30.3s ↓24%	6	15	$0.1057
✅	43_current_datetime_from_prompt	3.5s ±0%	1	—	$0.0086
✅	61_exact_match_counting	10.2s ±0%	3	3	$0.0326
	Total	32.0s avg	5.7 avg	13.8 avg	$0.9645

Time/Cost columns show % change vs historical average (↑slower/costlier, ↓faster/cheaper). Changes under 10% shown as ±0%.

Historical Comparison Details

Filter: excluding branch 'master'

Status: Success - 10 test/model combinations loaded

Experiments compared (30):

github-20639554704.1463.1 (branch: codex/linear-mention-rob-129-holmesgpt-update-system-prompt)
github-20639543222.1462.1 (branch: codex/linear-mention-rob-39-update-grafana-tool-to-handle-large-n2m92v)
github-20639409982.1459.1 (branch: focused-benchmarks)
...and 27 more

Comparison indicators:

±0% — diff under 10% (within noise threshold)
↑N%/↓N% — diff 10-25%
↑N%/↓N% — diff over 25% (significant)

📖 Legend

Icon	Meaning
✅	The test was successful
➖	The test was skipped
⚠️	The test failed but is known to be flaky or known to fail
🚧	The test had a setup failure (not a code regression)
🔧	The test failed due to mock data issues (not a code regression)
🚫	The test was throttled by API rate limits/overload
❌	The test failed and should be fixed before merging the PR

🔄 Re-run evals manually

⚠️ Warning: Manual re-runs have NO default markers and will run ALL LLM tests (~100+), which can take 1+ hours. Use markers: regression or filter: test_name to limit scope.

Option 1: Comment on this PR with /eval:

/eval
markers: regression

Or with more options (one per line):

/eval
model: gpt-4o
markers: regression
filter: 09_crashpod
iterations: 5

Run evals on a different branch (e.g., master) for comparison:

/eval
branch: master
markers: regression

Option	Description
`model`	Model(s) to test (default: same as automatic runs)
`markers`	Pytest markers (no default - runs all tests!)
`filter`	Pytest -k filter
`iterations`	Number of runs, max 10
`branch`	Run evals on a different branch (for cross-branch comparison)

Quick re-run: Use /last to re-run the most recent /eval on this PR with the same parameters.

Option 2: Trigger via GitHub Actions UI → "Run workflow"

🏷️ Valid markers

benchmark
chain-of-causation
compaction
context_window
coralogix
counting
database
datadog
datetime
easy
embeds
grafana-dashboard
hard
kafka
kubernetes
leaked-information
logs
loki
medium
metrics
network
newrelic
no-cicd
numerical
one-test
port-forward
prometheus
question-answer
regression
runbooks
slackbot
storage
toolset-limitation
traces
transparency

📋 Valid eval names (use with filter)

test_ask_holmes:

01_how_many_pods
02_what_is_wrong_with_pod
03_what_is_the_command_to_port_forward
04_related_k8s_events
05_image_version
06_explain_issue
07_high_latency
08_sock_shop_frontend
09_crashpod
100a_loki_historical_logs
101_loki_historical_logs_pod_deleted
102_loki_label_discovery
102a_loki_logs_transparency
102b_loki_multiple_pods
103_logs_transparency_default_limit
104a_postgres_root_issue
104b_postgres_missing_index_pgstat
104c_postgres_minimal_missing_index
105_redis_wrong_data_structure
107_log_filter_http_status_code
108_logs_nearby_lines
109_logs_transparency_not_found
10_image_pull_backoff
110_cpu_graph_robusta_runner
110_k8s_events_image_pull
111_disabled_datadog_traces
111_pod_names_contain_service
111_tool_hallucination
112_find_pvcs_by_uuid
114_checkout_latency_tracing_rebuild
115_checkout_errors_tracing
117_new_relic_tracing
117b_new_relic_block_embed
118_new_relic_logs
119_new_relic_metrics
11_init_containers
120_new_relic_traces2
121_new_relic_checkout_errors_tracing
122_new_relic_checkout_latency_tracing_rebuild
123_new_relic_checkout_errors_tracing
124_checkout_latency_prometheus
12_job_crashing
13a_pending_node_selector_basic
13b_pending_node_selector_detailed
14_pending_resources
151_disabled_toolsets_fallback_only
156_kafka_opensearch_latency
157_disk_full_statefulset
158_slack_chat_correct_date
159_prometheus_high_cardinality_cpu
15_failed_readiness_probe
160_electricity_market_bidding_bug
160a_cpu_per_namespace_graph
160b_cpu_per_namespace_graph_with_prom_truncation
160c_cpu_per_namespace_graph_with_global_truncation
161_bidding_version_performance
161_conversation_compaction
162_get_runbooks
163_compaction_follow_up
164_datadog_traces_coupon_code
165_alert_with_multiple_runbooks
16_failed_no_toolset_found
173_coralogix_logs
174_coralogix_traces_ad
175_coralogix_metrics_frontend
176_network_policy_blocking_traffic_no_runbooks
177_grafana_home_dashboard
178_grafana_search_dashboard_query
179_grafana_big_dashboard_query
17_oom_kill
180_connectivity_check_tcp
181_connectivity_check_http
182_connectivity_check_http_url
18_oom_kill_from_issues_history
19_detect_missing_app_details
20_long_log_file_search
21_job_fail_curl_no_svc_account
22_high_latency_dbi_down
23_app_error_in_current_logs
24_misconfigured_pvc
25_misconfigured_ingress_class
26_page_render_times
27a_multi_container_logs
27b_multi_container_logs
28_permissions_error
30_basic_promql_graph_cluster_memory
32_basic_promql_graph_pod_cpu
33_cpu_metrics_discovery
34_memory_graph
35_tempo
36_argocd_find_resource
37_argocd_wrong_namespace
38_rabbitmq_split_head
39_failed_toolset
41_setup_argo
42_dns_issues_result_all_tools
42_dns_issues_result_new_tools
42_dns_issues_result_new_tools_no_runbook
42_dns_issues_result_old_tools
42_dns_issues_steps_new_all_tools
42_dns_issues_steps_new_tools
42_dns_issues_steps_old_tools
43_current_datetime_from_prompt
43_slack_deployment_logs
44_slack_statefulset_logs
45_fetch_deployment_logs_simple
46_job_crashing_no_longer_exists
47_truncated_logs_context_window
48_logs_since_thursday
49_logs_since_last_week
50_logs_since_specific_date
50a_logs_since_last_specific_month
51_logs_summarize_errors
52_logs_login_issues
53_logs_find_term
54_azure_sql
54_not_truncated_when_getting_pods
55_kafka_runbook
57_cluster_name_confusion
57_wrong_namespace
58_counting_pods_by_status
59_label_based_counting
60_count_less_than
61_exact_match_counting
62_fetch_error_logs_with_errors
63_fetch_error_logs_no_errors
64_keda_vs_hpa_confusion
65_health_check_followup
66_http_error_needle
67_performance_degradation
68_cascading_failures
69_rate_limit_exhaustion
70_memory_leak_detection
71_connection_pool_starvation
73a_time_window_anomaly
73b_time_window_anomaly
74_config_change_impact
75_network_flapping
76_service_discovery_issue
77_liveness_probe_misconfiguration
78a_missing_cpu_limits
78b_cpu_quota_exceeded
79_configmap_mount_issue
80_pvc_storage_class_mismatch
81_service_account_permission_denied
82_pod_anti_affinity_conflict
83_secret_not_found
84_network_policy_blocking_traffic
85_hpa_not_scaling
86_configmap_like_but_secret
89_runbook_missing_cloudwatch
90_runbook_basic_selection
91a_datadog_metrics_missing_namespace
91b_datadog_metrics_pod_exists
91c_datadog_metrics_deployment
91d_datadog_metrics_historical_pod
91e_datadog_custom_metrics
91f_datadog_logs_historical_pod
91g_datadog_metrics_mismatched_pod
91h_datadog_logs_empty_query_with_url
91i_datadog_metrics_empty_query_with_url
92_cpu_graph_conversation
93_calling_datadog
93_events_since_specific_date
94_runbook_transparency
95_runbook_memory_leak_detection
96_no_matching_runbook
97_logs_clarification_needed
99_logs_transparency_custom_time

test_investigate:

01_oom_kill
02_crashloop_backoff
03_cpu_throttling
04_image_pull_backoff
05_crashpod
06_job_failure
07_job_syntax_error
08_memory_pressure
09_high_latency
10_KubeDeploymentReplicasMismatch
11_KubePodCrashLooping
12_KubePodNotReady
13_Watchdog
14_tempo
15_dns_resolution
16_dns_resolution_no_tool
17_investigate_correct_date

Add Ask Holmes response principles

37248e1

Signed-off-by: Codex <codex@openai.com>

aantn added the codex label Jan 1, 2026 — with ChatGPT Codex Connector

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add Ask Holmes response principles #1299

Add Ask Holmes response principles #1299

Uh oh!

aantn commented Jan 1, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Jan 1, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Jan 1, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Jan 1, 2026

Uh oh!

aantn commented Jan 1, 2026

Uh oh!

github-actions bot commented Jan 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Add Ask Holmes response principles #1299

Are you sure you want to change the base?

Add Ask Holmes response principles #1299

Uh oh!

Conversation

aantn commented Jan 1, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Testing

Eval Guidance

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Jan 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Pre-merge checks

Uh oh!

github-actions bot commented Jan 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Jan 1, 2026

✅ Results of HolmesGPT evals

Results of HolmesGPT evals

Uh oh!

aantn commented Jan 1, 2026

Uh oh!

github-actions bot commented Jan 1, 2026

🧪 Manual Eval Results

Results of HolmesGPT evals (branch: master)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

aantn commented Jan 1, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jan 1, 2026 •

edited

Loading

github-actions bot commented Jan 1, 2026 •

edited

Loading

Results of HolmesGPT evals (branch: `master`)