@noCharger (Collaborator) commented Nov 17, 2025

Description

This PR adds frequently used queries to the big5 workload, based on a gap analysis between the existing benchmarks and frequently used query patterns.

New Benchmark Queries Added:

  • Predicate "Like" Query - LIKE(@message, "%sshd%") with timestamp sorting
  • Predicate Aggregate with Like - Pattern matching with stats aggregation
  • Parsing operations - parse, eval, and cast commands
  • Distinct High Cardinality - dedup on high-cardinality fields

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Louis Chu <lingzhichu.clz@gmail.com>
RyanL1997 previously approved these changes Nov 17, 2025
@LantaoJin (Member) left a comment

can you add corresponding tests in CalcitePPLBig5IT?

Signed-off-by: Louis Chu <lingzhichu.clz@gmail.com>
@noCharger (Collaborator, Author)

> can you add corresponding tests in CalcitePPLBig5IT?

test added

Summary:
asc_sort_timestamp: 12 ms
asc_sort_timestamp_can_match_shortcut: 14 ms
asc_sort_timestamp_no_can_match_shortcut: 12 ms
asc_sort_with_after_timestamp: 9 ms
bin_bins: 17 ms
bin_span_log: 41 ms
bin_span_time: 16 ms
cardinality_agg_high: 9 ms
cardinality_agg_high_2: 9 ms
cardinality_agg_low: 9 ms
coalesce_nonexistent_field_fallback: 18 ms
composite_date_histogram_daily: 14 ms
composite_terms: 76 ms
composite_terms_keyword: 27 ms
date_histogram_hourly_agg: 11 ms
date_histogram_minute_agg: 23 ms
dedup_metrics_size_field: 26 ms
default: 9 ms
desc_sort_timestamp: 10 ms
desc_sort_timestamp_can_match_shortcut: 18 ms
desc_sort_timestamp_no_can_match_shortcut: 13 ms
desc_sort_with_after_timestamp: 9 ms
keyword_in_range: 27 ms
keyword_terms: 13 ms
keyword_terms_low_cardinality: 16 ms
multi_terms_keyword: 44 ms
parse_regex_with_cast_transformation: 14 ms
query_string_on_message: 11 ms
query_string_on_message_filtered: 20 ms
query_string_on_message_filtered_sorted_num: 17 ms
range: 13 ms
range_agg_1: 15 ms
range_agg_2: 8 ms
range_auto_date_histo: 19 ms
range_auto_date_histo_with_metrics: 19 ms
range_field_conjunction_big_range_big_term_query: 11 ms
range_field_conjunction_small_range_big_term_query: 11 ms
range_field_conjunction_small_range_small_term_query: 14 ms
range_field_disjunction_big_range_small_term_query: 15 ms
range_numeric: 9 ms
range_with_asc_sort: 13 ms
range_with_desc_sort: 12 ms
script_engine_like_pattern_with_aggregation: 26 ms
script_engine_like_pattern_with_sort: 14 ms
scroll: 9 ms
sort_keyword_can_match_shortcut: 17 ms
sort_keyword_no_can_match_shortcut: 18 ms
sort_numeric_asc: 11 ms
sort_numeric_asc_with_match: 26 ms
sort_numeric_desc: 10 ms
sort_numeric_desc_with_match: 20 ms
term: 11 ms
terms_significant_1: 17 ms
terms_significant_2: 18 ms
Total 54 queries succeed. Average duration: 17 ms
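
For anyone who wants to re-check the reported average, here is a minimal Python sketch (not part of this PR) that parses summary lines in the `name: N ms` format shown above. The names `parse_summary` and `average_ms` are hypothetical, and rounding to the nearest millisecond is an assumption about how the test harness reports the average.

```python
import re

# One summary entry looks like "asc_sort_timestamp: 12 ms".
# The trailing "Total ... Average duration ..." line does not match and is skipped.
SUMMARY_LINE = re.compile(r"^(?P<name>\w+): (?P<ms>\d+) ms$")


def parse_summary(text):
    """Return {query_name: duration_ms} for every well-formed summary line."""
    durations = {}
    for line in text.splitlines():
        match = SUMMARY_LINE.match(line.strip())
        if match:
            durations[match.group("name")] = int(match.group("ms"))
    return durations


def average_ms(durations):
    """Mean duration rounded to the nearest millisecond (assumed rounding)."""
    return round(sum(durations.values()) / len(durations))


# Three entries copied from the summary above, plus the footer line.
sample = """asc_sort_timestamp: 12 ms
bin_span_log: 41 ms
composite_terms: 76 ms
Total 54 queries succeed. Average duration: 17 ms"""

durations = parse_summary(sample)
print(len(durations), average_ms(durations))
```

Applied to all 54 entries above, the durations sum to 920 ms, which is consistent with the reported average of 17 ms.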

@opensearch-trigger-bot (Contributor)

This PR is stalled because it has been open for 2 weeks with no activity.

}
*/
source = big5
| parse `log.file.path` '/var/log/(?<logType>\\w+)/(?<filename>\\w+)'
Collaborator

let's use rex command

source = big5
| parse `log.file.path` '/var/log/(?<logType>\\w+)/(?<filename>\\w+)'
| eval filename_len = length(filename)
| fields `log.file.path`, logType, filename, filename_len, `@timestamp`
Collaborator

nit: backtick is optional
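
To make concrete what the `parse` pattern in this thread extracts, here is a rough Python sketch (not part of this PR). Python spells named groups `(?P<name>...)`, so the pattern is translated from the Java-style `(?<name>...)` used in the query. The helper name `extract_fields` and the sample paths are hypothetical, and matching from the start of the field without requiring a full match is an assumption; how PPL anchors the pattern is not settled in this thread.

```python
import re

# Python translation of '/var/log/(?<logType>\w+)/(?<filename>\w+)'.
LOG_PATH = re.compile(r"/var/log/(?P<logType>\w+)/(?P<filename>\w+)")


def extract_fields(path):
    """Mimic the parse + eval steps: pull out logType and filename, then
    compute filename_len. Returns None when the path does not match."""
    match = LOG_PATH.match(path)  # anchored at the start only (assumption)
    if match is None:
        return None
    fields = match.groupdict()
    fields["filename_len"] = len(fields["filename"])
    return fields


# Hypothetical sample values, not taken from the big5 dataset.
print(extract_fields("/var/log/nginx/access.log"))
print(extract_fields("/tmp/other"))
```

Because `\w+` stops at the first character that is not a letter, digit, or underscore, a path such as `/var/log/nginx/access.log` yields `filename` equal to `access`, without the extension.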

*/
source = big5
| parse `log.file.path` '/var/log/(?<logType>\\w+)/(?<filename>\\w+)'
| eval filename_len = length(filename)
Collaborator

parse_regex_with_cast_transformation.ppl

cast is not accurate, remove it.

"query": {
"script": {
"script": {
"source": "{\"langType\":\"calcite\",\"script\":\"...\"}",
Collaborator

What is the type of the message field?

@@ -0,0 +1,37 @@
/*
{
Collaborator

Should the query body include size:10?

@aalva500-prog (Contributor)

This one can be closed; a new PR has been created: #4976.

Labels

enhancement New feature or request
