chore: Add microbenchmark for casting string to numeric #2979
base: main
Conversation
@coderfender Could you review when you have time
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
##               main    #2979     +/-   ##
============================================
+ Coverage     56.12%   59.64%   +3.51%
- Complexity      976     1375     +399
============================================
  Files           119      167      +48
  Lines         11743    15497    +3754
  Branches       2251     2569     +318
============================================
+ Hits           6591     9243    +2652
- Misses         4012     4955     +943
- Partials       1140     1299     +159
coderfender left a comment:
minor comments but LGTM @andygrove
 * `SPARK_GENERATE_BENCHMARK_FILES=1 make benchmark-org.apache.spark.sql.benchmark.CometCastStringToNumericBenchmark`
 * Results will be written to "spark/benchmarks/CometCastStringToNumericBenchmark-**results.txt".
 */
// spotless:on
@andygrove nit: perhaps we don't need to turn off spotless, given that all the other benchmarks don't?
Example comment from CometCastBenchmark:
/**
* Benchmark to measure Comet execution performance. To run this benchmark:
* {{{
* SPARK_GENERATE_BENCHMARK_FILES=1 make benchmark-org.apache.spark.sql.benchmark.CometCastBenchmark
* }}}
*
* Results will be written to "spark/benchmarks/CometCastBenchmark-**results.txt".
*/
private val castFunctions = Seq("CAST", "TRY_CAST")
private val targetTypes =
  Seq("BOOLEAN", "BYTE", "SHORT", "INT", "LONG", "FLOAT", "DOUBLE", "DECIMAL(10,2)")
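A minimal sketch of how these two sequences could be combined into the benchmark's query set, as plain Scala with no Spark dependency. The object name `CastCaseEnumeration` is illustrative, not from the PR; the query shape and table name `parquetV1Table` mirror the query shown later in this review.

```scala
// Hypothetical sketch (not the PR's actual code): enumerating the cross
// product of cast functions and target types as SQL query strings.
object CastCaseEnumeration {
  val castFunctions: Seq[String] = Seq("CAST", "TRY_CAST")
  val targetTypes: Seq[String] =
    Seq("BOOLEAN", "BYTE", "SHORT", "INT", "LONG", "FLOAT", "DOUBLE", "DECIMAL(10,2)")

  // One query per (function, type) pair: 2 functions x 8 types = 16 cases.
  def queries: Seq[String] =
    for {
      castFunc <- castFunctions
      targetType <- targetTypes
    } yield s"SELECT $castFunc(c1 AS $targetType) FROM parquetV1Table"

  def main(args: Array[String]): Unit =
    queries.foreach(println)
}
```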
nit: Perhaps one could argue that BOOLEAN isn't necessarily a numeric type? Also, could we add some higher-precision and higher-scale decimals too?
s"SELECT $castFunc(c1 AS $targetType) FROM parquetV1Table",
Map(
  SQLConf.ANSI_ENABLED.key -> "false",
  CometConf.getExprAllowIncompatConfigKey(classOf[Cast]) -> "true"))
I was wondering if we could handle cast compatibility on a case-by-case basis, which would help us better evaluate custom implementations versus DataFusion-supported operations, along with unsupported cast ops. This can be done in a follow-up PR (I will file an issue once this is merged).
Which issue does this PR close?
Part of #2955
Rationale for this change
Add benchmarks for casting strings to numeric types using both
`cast` and `try_cast`. ANSI mode is not explicitly enabled because we would either have to ensure that all input values are valid (which would not fully benchmark the validation logic) or we would just be timing how long it takes to hit the first invalid value and throw an exception, which is not helpful. Testing `try_cast` instead allows us to measure the overhead of the validation logic.
What changes are included in this PR?
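The distinction the rationale draws, with an ANSI-style cast failing fast on the first invalid value while `try_cast` validates every row and yields NULL, can be illustrated with a small Spark-free Scala sketch. The object and method names here are illustrative assumptions, not Comet code:

```scala
// Illustrative sketch (not Comet/Spark code): modeling why benchmarking
// TRY_CAST exercises the full validation path while an ANSI CAST stops at
// the first bad row by throwing.
object TryCastSketch {
  // ANSI-style cast: throws NumberFormatException on invalid input.
  def ansiCastToInt(s: String): Int = s.trim.toInt

  // TRY_CAST-style cast: runs the same validation but converts any failure
  // to None, the analogue of SQL NULL, so every row is processed.
  def tryCastToInt(s: String): Option[Int] =
    scala.util.Try(s.trim.toInt).toOption

  def main(args: Array[String]): Unit = {
    val values = Seq("123", "not-a-number", "456")
    // Every row is validated; invalid input becomes None instead of aborting.
    println(values.map(tryCastToInt))
  }
}
```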
How are these changes tested?