
Conversation

jher235 commented Dec 17, 2025

Description

  • This pull request improves the internal implementation of the primitive
    join methods by reducing unnecessary StringBuilder resizing and
    aligning capacity calculations with the actual join range.

  • The changes are limited to implementation details and do not alter
    public behavior or output.

Changes

  1. Capacity Pre-sizing

    • Initialize StringBuilder with an estimated capacity based on the
      number of elements being joined (endIndex - startIndex).
    • This avoids repeated buffer growth when appending numeric values
      and reduces allocation and copy overhead.
  2. Capacity Calculation Alignment

    • Updated capacity calculations in join(char[]), join(byte[]),
      and join(short[]) to consistently use the join range
      (endIndex - startIndex) instead of the full array length.
    • This avoids over-allocating internal buffers when joining
      sub-ranges of large arrays.
  3. Minor Loop Refactoring

    • Adjusted delimiter handling inside the loop to avoid trimming the
      final delimiter via substring(), keeping the resulting behavior
      unchanged while simplifying the control flow (a sketch of the
      resulting pattern follows this list).
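
As a rough illustration of items 1 and 3 above, the resulting pattern looks something like this. It is a minimal sketch using int[]; the class name, method name, and the per-element size estimate are assumptions for the example, not the actual StringUtils code:

```java
// Illustrative sketch only, not the patch itself.
public class JoinSketch {

    static String joinRange(final int[] array, final char delimiter,
            final int startIndex, final int endIndex) {
        if (array == null) {
            return null;
        }
        final int noOfItems = endIndex - startIndex;
        if (noOfItems <= 0) {
            return "";
        }
        // Pre-size from the join range, not array.length; assume roughly
        // 7 digits per int plus one delimiter character.
        final StringBuilder joinedValue = new StringBuilder(noOfItems * 8);
        joinedValue.append(array[startIndex]);
        for (int i = startIndex + 1; i < endIndex; i++) {
            // The delimiter goes before each subsequent element, so there is
            // no trailing delimiter to trim with substring() at the end.
            joinedValue.append(delimiter);
            joinedValue.append(array[i]);
        }
        return joinedValue.toString();
    }

    public static void main(final String[] args) {
        // Joins elements 1..3 of the array: prints "2,3,4".
        System.out.println(joinRange(new int[] {1, 2, 3, 4, 5}, ',', 1, 4));
    }
}
```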

Jira Ticket

Checklist

  • Read the contribution guidelines for this project.
  • Read the ASF Generative Tooling Guidance if you use Artificial Intelligence (AI).
  • I used AI tools for discussion, idea validation, and drafting messages; all code and changes were written, reviewed, and verified by me.
  • Ran a successful build using the default Maven goal (mvn on the command line by itself).
  • Existing tests pass; no behavioral changes were introduced.
  • Commits have meaningful and focused subject lines.

The previous implementation of join(boolean[], ...) calculated the initial StringBuilder capacity based on 'array.length' instead of the actual number of elements to be joined ('endIndex - startIndex').

This caused excessive memory allocation when joining a small range of a large array (e.g., joining 5 elements from an array of 10,000).

This commit fixes the calculation to use 'noOfItems', ensuring precise memory allocation.

Signed-off-by: jher235 <tim668666@gmail.com>
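
For scale, a back-of-the-envelope on the example above (assuming roughly 6 chars per boolean, i.e. "false" plus one delimiter): sizing the buffer from the full array reserves about 10,000 × 6 ≈ 60,000 chars, while sizing it from the join range needs only about 5 × 6 = 30 chars. These are illustrative figures, not measurements.
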
The previous implementation of join(char[], ...) calculated the initial StringBuilder capacity based on 'array.length' instead of the actual number of elements to be joined ('endIndex - startIndex').

This caused excessive memory allocation when joining a small range of a large array (e.g., joining 5 elements from an array of 10,000).

This commit fixes the calculation to use 'noOfItems', ensuring precise memory allocation.

Signed-off-by: jher235 <tim668666@gmail.com>
- Refactor `join` methods for primitive types (char, byte, short, int, long, float, double) to improve performance.
- Pre-allocate `StringBuilder` capacity based on the actual number of elements (`noOfItems`) instead of the default 16 chars or `array.length`.
- Fix inaccurate capacity calculation in existing methods (char, byte, short) that incorrectly used `array.length`, causing memory waste for sub-arrays.
- Eliminate the final `substring()` call by appending delimiters conditionally within the loop, reducing unnecessary String object allocation (the pattern being replaced is sketched below for contrast).

Signed-off-by: jher235 <tim668666@gmail.com>
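
For contrast, the append-then-trim pattern that the last bullet refers to looks roughly like the following. This is a simplified sketch; the method name is made up, and the real StringUtils code differs in details such as null handling:

```java
// Simplified illustration of the pattern being replaced, not the exact existing code.
static String joinWithTrailingTrim(final int[] array, final char delimiter,
        final int startIndex, final int endIndex) {
    if (endIndex - startIndex <= 0) {
        return "";
    }
    final StringBuilder stringBuilder = new StringBuilder(); // default 16-char capacity
    for (int i = startIndex; i < endIndex; i++) {
        stringBuilder.append(array[i]);
        stringBuilder.append(delimiter); // leaves a trailing delimiter after the last element
    }
    // Trimming the trailing delimiter with substring() copies the whole result one more time.
    return stringBuilder.substring(0, stringBuilder.length() - 1);
}
```
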
garydgregory (Member) commented Dec 17, 2025

Hello @jher235

This looks like a Claude Sonnet bot at work, right? What I get is the same if I run Sonnet, but it adds a comment.

If this is supposed to be faster, then please provide a JMH benchmark to prove it. Otherwise, less code is easier to maintain IMO.

jher235 (Author) commented Dec 18, 2025

Yes, I did use AI assistance.
As noted in the checklist, since English is not my native language, I used AI mainly to help with writing the commit messages and the pull request description.
Regarding the StringBuilder initial capacity, I also discussed the rationale with AI, as it was difficult to establish solid justification on my own.
I am currently preparing a JMH benchmark and will share it in a comment soon.
Thank you for your review; have a good day~

Additionally, I noticed that only the join(boolean[]) and join(char[]) implementations currently initialize the StringBuilder capacity, and even there the calculation does not always reflect the actual join range. One of my goals was to make this behavior more consistent across the primitive join methods.
The idea of avoiding the final substring() call came from an AI suggestion, but the resulting changes were reviewed and applied by me.

jher235 (Author) commented Dec 18, 2025

Hi @garydgregory ,

Thank you for your patience and for encouraging the benchmark.

Transparency Note

As mentioned in the checklist: I used AI assistance to help structure this PR and understand benchmarking best practices. However, the problem identification, benchmark execution, data analysis, and code implementation are my own work.


Benchmark Results

I've completed the JMH benchmarks, and the results reveal this PR should be viewed in two distinct parts:

Part 1: Critical Bug Fix (boolean and char)

The current implementation has a structural flaw where it calculates StringBuilder capacity based on the full array.length rather than the actual range being joined (endIndex - startIndex).

Impact on subset operations:

| Metric | Current (bug) | Fixed | Improvement |
| --- | --- | --- | --- |
| Time | 6,512 ns | 110 ns | 59x faster |
| Memory | 60,136 bytes | 200 bytes | 300x less allocation |

Test: Joining 10 elements from a 10,000-element boolean array

This isn't a theoretical edge case - it happens whenever the startIndex/endIndex parameters are used for their intended purpose. Even though the output is unchanged, the capacity calculation itself is wrong for sub-ranges, so it is worth fixing regardless of the other optimizations.


Part 2: General Optimization (int and other numeric types)

For primitives that currently use the default StringBuilder constructor, I tested three array sizes with realistic integer values:

| Array Size | Version | Time (ns/op) | Memory (B/op) | Observation |
| --- | --- | --- | --- | --- |
| Small (10) | Current | 233 | 280 | Baseline |
| Small (10) | Optimized | 170 | 160 | 27% faster, 43% less memory |
| Medium (100) | Current | 1,821 | 1,800 | Baseline |
| Medium (100) | Optimized | 1,714 | 1,808 | ~Same performance |
| Large (1000) | Current | 19,988 | 24,640 | Baseline |
| Large (1000) | Optimized | 19,600 | 18,104 | 26% less memory |

Key findings:

  • Small arrays: Clear wins (27% speed, 43% memory). The default 16-char buffer is immediately too small (see the arithmetic after this list).
  • Medium arrays: Neutral impact. Both approaches trigger ~1-2 resizes, resulting in comparable performance.
  • Large arrays: Modest speed gain (2%) but significant memory reduction (26%). Multiple resizes in the old approach cause memory churn.
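
As a rough illustration of the small-array case (assuming values of three to four digits): joining 10 such ints with single-char delimiters produces roughly 40-50 chars, so the default 16-char buffer has to grow (roughly doubling) at least twice before the result fits, while a range-based pre-size avoids those reallocations and copies. These figures are illustrative, not part of the benchmark.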

Bottom line: The optimization is "regression-free" - it provides meaningful gains in common cases (small and large arrays) and stays neutral in between.


Assessment

What we're fixing:

  1. Boolean/char bug: Dramatic over-allocation in subset scenarios (59x improvement).
  2. Numeric types: Situational improvements (Small/Large) with no downsides.

Trade-off:

  • Gained: Bug fix + performance improvements where it matters.
  • Cost: One extra variable (noOfItems) per method + capacity calculations.

How Should We Proceed?

Respecting your "less code is easier to maintain" philosophy, here are two paths forward:

Option A (Recommended): Accept the full PR

  • Fixes the boolean/char bug (critical).
  • Applies consistent optimization pattern across all primitives.
  • No performance regressions anywhere.
  • Slightly more code, but uniform approach.

Option B (Minimal): Only fix boolean/char

  • Just replace array.length with noOfItems in boolean/char methods.
  • Leave numeric types unchanged.
  • Minimal code change.
  • Accepts minor inefficiencies in numeric types.

I personally lean toward Option A for consistency and completeness, but I'm genuinely comfortable with Option B if you prefer to minimize changes.

What's your preference?


Environment: JMH 1.37, 3 warmup + 5 measurement iterations
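
For anyone who wants to reproduce the subset case locally, a minimal harness along these lines would work. The class name, field name, and sizes are placeholders, and this is not the benchmark file mentioned later in this thread:

```java
import java.util.concurrent.TimeUnit;

import org.apache.commons.lang3.StringUtils;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;

// Placeholder harness matching the setup described above (3 warmup + 5 measurement iterations).
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 3)
@Measurement(iterations = 5)
@State(Scope.Benchmark)
public class JoinSubsetBenchmark {

    private boolean[] largeArray;

    @Setup
    public void setup() {
        largeArray = new boolean[10_000]; // defaults to all false
    }

    @Benchmark
    public String joinTenOfTenThousand() {
        // The subset case discussed above: 10 elements out of 10,000.
        // Run with the gc profiler (-prof gc) to get allocation per op.
        return StringUtils.join(largeArray, ',', 0, 10);
    }
}
```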

garydgregory (Member) commented:

Hello @jher235
Where is the benchmark? For example:

src/test/java/org/apache/commons/lang3/StringUtilsGetDigitsBenchmark.java

garydgregory (Member) commented:

FTR, this is what Claude Sonnet 4.5 says about this change when I generate it from Eclipse's Copilot plugin using the prompt

Optimize the method org.apache.commons.lang3.StringUtils.join(boolean[], char, int, int)

Key Optimizations:

  1. Fixed Capacity Calculation
    • Before: Used array.length * 5 + array.length - 1, which incorrectly calculated based on the full array length.
    • After: Uses noOfItems * 6 where noOfItems = endIndex - startIndex, calculating based on the actual range being joined.
    • Benefit: More accurate memory allocation, especially when joining a subset of the array.
  2. Eliminated Substring Call
    • Before: Appended a delimiter after every element, then removed the last delimiter using substring(0, length - 1).
    • After: Appends the first element separately, then appends delimiter + element for the remaining items.
    • Benefit: Eliminates the overhead of creating a substring and copying characters, improving both performance and memory usage.
  3. Improved StringBuilder Usage
    • Changed from appending element then delimiter to appending delimiter then element (after the first).
    • Uses toString() instead of substring(), which is more efficient.
  4. Better Capacity Estimation
    • Uses a 6-characters-per-item estimate (5 for "false" + 1 for delimiter).
    • More accurate than the previous calculation, reducing StringBuilder resizing operations.

jher235 (Author) commented Dec 18, 2025

Hello @garydgregory,

Sorry for the earlier misunderstanding — I initially thought that running the benchmarks locally and sharing the summary would be sufficient. I’ve now pushed the JMH benchmark source code (src/test/java/.../StringUtilsJoinBenchmark.java) to this PR as requested.

Following the StringUtilsGetDigitsBenchmark example you mentioned, I expanded the benchmark scope. While my previous comments covered array sizes up to 1,000, I’ve added a 10,000-element case to the benchmark code as well.

In this larger scenario, the execution time is slightly higher, while the allocation rate is still noticeably lower. This shows that the change does not provide a universal speedup, but it does consistently reduce memory allocation even at larger sizes.

Thanks again for taking the time to review this in such detail — I really appreciate the guidance.
