Conversation


@k1832 k1832 commented Jan 6, 2026

PR Type

  • Improvement

Related Links

N/A

Description

Optimize Hesai decoder hot path (convert_returns) to reduce point cloud processing latency by ~6%.

Changes

  1. Eliminate per-channel heap allocation

    • Pre-allocate return_units_buf_ with reserve() in constructor
    • Reuse buffer via resize() (O(1) when capacity is sufficient)
  2. Cache packet-level values

    • dis_unit (distance unit multiplier)
    • return_mode (avoid repeated static_cast)
    • dual_return_distance_threshold, min_range, max_range
    • is_inside_overlap result
  3. Pre-compute distances

    • Store in distances_buf_[] to avoid redundant calculations in inner loop
  4. Add branch prediction hints

    • Use __builtin_expect(..., 0) for unlikely conditions (zero distance, out-of-range)

Performance Measurement

Measured via bpftrace on unpack function with PandarXT32 rosbag playback:

  • ~6% reduction in function execution time

Benchmark methodology

Terminal 1 - Launch driver:

ros2 launch nebula_ros nebula_launch.py sensor_model:=PandarXT32 launch_hw:=false

Terminal 2 - Play rosbag (10 iterations):

for i in {1..10}; do
  ros2 bag play /path/to/nebula/nebula_tests/data/hesai/xt32/1673400677802009732
  sleep 1
done

Terminal 3 - bpftrace measurement:

LIB="/path/to/install/nebula_decoders/lib/libnebula_decoders_hesai.so"
sudo bpftrace -e "
uprobe:$LIB:*unpack* {
    @start[tid] = nsecs;
}
uretprobe:$LIB:*unpack* /@start[tid]/ {
    \$ns = nsecs - @start[tid];
    @stats_ns[func] = stats(\$ns);
    @dist_ns[func] = hist(\$ns);
    delete(@start[tid]);
}
"

Review Procedure

  1. Review changes in hesai_decoder.hpp (main optimization)
  2. Review minor changes in hesai_sensor.hpp (caching in helper functions)
  3. Verify all unit tests pass

Remarks

  • This optimization only affects real hardware or rosbag replay with raw packets (/pandar_packets)
  • Does not affect planning_simulator or logging_simulator (they use pre-decoded point clouds)

Signed-off-by: Keita Morisaki <kmta1236@gmail.com>
Collaborator

@mojomex mojomex left a comment


Hi, thank you for the PR!
I realize that it's still in the draft phase, but I've left some comments anyway just to align with you.

I think a 6% performance improvement is sizeable and we are happy to take it!

However, there is quite a lot of boilerplate code that comes with it, so I'd be interested in finding out where the bulk of the 6% comes from. I expect that things like __builtin_expect and some of the caching can likely be excluded while keeping almost all of the performance gains.

Let me know what you think!

Comment on lines +93 to +98
// ============ PERFORMANCE OPTIMIZATION: Pre-allocated buffers ============
/// @brief Pre-allocated return unit pointers to avoid heap allocation per channel
std::vector<const typename SensorT::packet_t::body_t::block_t::unit_t *> return_units_buf_;
/// @brief Pre-computed distances for current return group
std::array<float, SensorT::packet_t::max_returns> distances_buf_;
// =========================================================================

I'd like to avoid adding even more member variables as the decoder is already getting quite large.

As std::array is stack-allocated, it can be moved into convert_returns with no performance overhead.

The std::vector could be replaced by a boost::static_vector and moved to convert_returns as well.

#include <cstdint>
#include <memory>
#include <optional>
#include <span>

Spans are a C++20 feature while the project is officially still C++17.
I would love to move to C++20 or 23, but we need to ensure that the ROS 2 buildfarm can handle that. We are planning a buildfarm release soon.


// Pre-fetch block pointers for this return group
const auto * const blocks = &packet_.body.blocks[start_block_id];


get_timestamp_ns, which uses timegm internally, is indeed slow, so caching that is a good idea. The rest of these should be instant anyway and are not needed imho.


// ====== OPTIMIZATION: Combined range check with likely/unlikely hints ======
if (__builtin_expect(
distance < SensorT::min_range || distance > SensorT::max_range ||

Does __builtin_expect measurably improve performance here? If there is not a very clear, strong performance gain, I would like to keep compiler extrinsics out of Nebula as they make the code less readable.

