TBR Pipeline Implementation & Pipeline Optimization #14

indevn · 2025-08-30T09:48:58Z

Description

This PR focuses on the following:

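Tile-Based Rasterization framework: Modeled on the TBR pipelines of mobile GPUs, this PR implements a software version of tiled rasterization on top of the current framework. It covers tile partitioning, triangle binning, and per-tile rasterization, and uses the tile as the basic unit of parallelism for OpenMP acceleration.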
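The binning step described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `Tri2D`, `BinTriangles`, and the bounding-box overlap test are assumptions chosen for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Screen-space triangle; only the x/y positions matter for binning.
// (Hypothetical type, not taken from the PR.)
struct Tri2D {
  float x[3];
  float y[3];
};

// For each tile (row-major order), collect the indices of triangles whose
// bounding box overlaps that tile. The test is conservative: a triangle can
// land in a tile it does not actually cover.
std::vector<std::vector<std::size_t>> BinTriangles(
    const std::vector<Tri2D>& tris, int width, int height, int tile_size) {
  assert(width > 0 && height > 0 && tile_size > 0);
  const int tiles_x = (width + tile_size - 1) / tile_size;  // round up
  const int tiles_y = (height + tile_size - 1) / tile_size;
  std::vector<std::vector<std::size_t>> bins(
      static_cast<std::size_t>(tiles_x) * static_cast<std::size_t>(tiles_y));

  for (std::size_t i = 0; i < tris.size(); ++i) {
    const Tri2D& t = tris[i];
    const float min_x = std::min({t.x[0], t.x[1], t.x[2]});
    const float max_x = std::max({t.x[0], t.x[1], t.x[2]});
    const float min_y = std::min({t.y[0], t.y[1], t.y[2]});
    const float max_y = std::max({t.y[0], t.y[1], t.y[2]});
    // Drop triangles whose bounding box is entirely off screen.
    if (max_x < 0.0f || max_y < 0.0f || min_x >= width || min_y >= height) {
      continue;
    }
    // Clamp the box to the screen, then convert pixels to tile coordinates.
    const int tx0 = static_cast<int>(std::max(min_x, 0.0f)) / tile_size;
    const int ty0 = static_cast<int>(std::max(min_y, 0.0f)) / tile_size;
    const int tx1 = static_cast<int>(std::min(max_x, width - 1.0f)) / tile_size;
    const int ty1 = static_cast<int>(std::min(max_y, height - 1.0f)) / tile_size;
    for (int ty = ty0; ty <= ty1; ++ty) {
      for (int tx = tx0; tx <= tx1; ++tx) {
        bins[static_cast<std::size_t>(ty * tiles_x + tx)].push_back(i);
      }
    }
  }
  return bins;
}
```

Each bin can then be rasterized independently of the others, which is what makes the tile a natural unit for a parallel loop.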
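Basic geometry-pipeline optimizations: Perspective division and the viewport transform are implemented on top of the MVP transform. The Vertex data structure is extended from the existing layout so that clip-space information is passed along with minimal intrusion. Occluded fragments are rejected early in several stages (>40%) through frustum culling, screen-space back-face culling, and similar methods, which greatly reduces wasted fragment generation and shading cost.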
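As a rough sketch of the geometry steps named above, the following assumes an OpenGL-style clip volume (-w <= x, y, z <= w), a y-down screen, and counter-clockwise front faces in NDC. All type and function names are illustrative, not the PR's.

```cpp
#include <cassert>

struct Vec4 { float x, y, z, w; };
struct ScreenPos { float x, y, z; };

// Clip space -> NDC (perspective division) -> screen space (viewport transform).
// Requires w > 0, i.e. the vertex is in front of the camera.
ScreenPos ClipToScreen(const Vec4& clip, int width, int height) {
  assert(clip.w > 0.0f);
  const float inv_w = 1.0f / clip.w;
  const float ndc_x = clip.x * inv_w;
  const float ndc_y = clip.y * inv_w;
  const float ndc_z = clip.z * inv_w;
  return ScreenPos{(ndc_x + 1.0f) * 0.5f * static_cast<float>(width),
                   (1.0f - ndc_y) * 0.5f * static_cast<float>(height),  // y down
                   (ndc_z + 1.0f) * 0.5f};                              // [0, 1]
}

// Trivial frustum reject: true when all three vertices lie outside the same
// clip plane, so the triangle cannot touch the view volume.
bool OutsideFrustum(const Vec4& a, const Vec4& b, const Vec4& c) {
  auto all = [&](auto outside) { return outside(a) && outside(b) && outside(c); };
  return all([](const Vec4& v) { return v.x < -v.w; }) ||
         all([](const Vec4& v) { return v.x > v.w; }) ||
         all([](const Vec4& v) { return v.y < -v.w; }) ||
         all([](const Vec4& v) { return v.y > v.w; }) ||
         all([](const Vec4& v) { return v.z < -v.w; }) ||
         all([](const Vec4& v) { return v.z > v.w; });
}

// Twice the signed area of a screen-space triangle.
float SignedArea2(const ScreenPos& a, const ScreenPos& b, const ScreenPos& c) {
  return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// Assumed convention: front faces are counter-clockwise in NDC. The viewport
// transform flips y, so front faces have negative signed area on screen.
// Degenerate (zero-area) triangles are culled as well.
bool IsBackFacing(const ScreenPos& a, const ScreenPos& b, const ScreenPos& c) {
  return SignedArea2(a, b, c) >= 0.0f;
}
```

Both rejections run before any fragment is generated, so a culled triangle costs no rasterization or shading work.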
Profiling framework: Fine-grained timing is implemented for each pipeline and emitted at DEBUG level to support performance evaluation.
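One common way to get this kind of per-stage timing is a scoped timer. The sketch below is an assumption about the general shape, not the PR's implementation; `StageTimings` and `ScopedTimer` are hypothetical names, and the sink is not thread-safe.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Accumulates elapsed time and call counts per stage name.
class StageTimings {
 public:
  void Add(const std::string& stage, double ms) {
    total_ms_[stage] += ms;
    ++count_[stage];
  }
  double TotalMs(const std::string& stage) const {
    auto it = total_ms_.find(stage);
    return it == total_ms_.end() ? 0.0 : it->second;
  }
  std::size_t Count(const std::string& stage) const {
    auto it = count_.find(stage);
    return it == count_.end() ? 0 : it->second;
  }

 private:
  std::map<std::string, double> total_ms_;
  std::map<std::string, std::size_t> count_;
};

// Measures the lifetime of the enclosing scope and reports it on destruction.
class ScopedTimer {
 public:
  ScopedTimer(StageTimings& sink, std::string stage)
      : sink_(sink), stage_(std::move(stage)), start_(Clock::now()) {
    assert(!stage_.empty());
  }
  ~ScopedTimer() {
    const std::chrono::duration<double, std::milli> elapsed =
        Clock::now() - start_;
    sink_.Add(stage_, elapsed.count());
  }
  ScopedTimer(const ScopedTimer&) = delete;
  ScopedTimer& operator=(const ScopedTimer&) = delete;

 private:
  using Clock = std::chrono::steady_clock;
  StageTimings& sink_;
  std::string stage_;
  Clock::time_point start_;
};
```

Wrapping a stage in `{ ScopedTimer t(timings, "binning"); ... }` records its duration; the totals can then be printed through the logger at debug level.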
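Hierarchical depth testing: Both Early-Z and Late-Z are supported, and a parameter selects the stage at which the Z-test runs. In the system_test scene, Early-Z generates 1373 fewer fragments than Late-Z.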
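The difference between the two modes can be illustrated with a toy loop that counts shader invocations. This is a sketch under assumptions: `Frag`, `DepthBuffer`, and `ShadeFragments` are hypothetical names, "less than" is the depth comparison, and the shader is assumed not to modify depth (otherwise Early-Z would not be valid).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Frag {
  int x, y;
  float depth;
};

struct DepthBuffer {
  int width, height;
  std::vector<float> depth;
  // Depth starts at 1.0f, the far end of the standard [0, 1] range.
  DepthBuffer(int w, int h)
      : width(w), height(h),
        depth(static_cast<std::size_t>(w) * static_cast<std::size_t>(h), 1.0f) {}

  // Returns true and stores the new depth if the fragment is closer.
  bool TestAndWrite(const Frag& f) {
    assert(f.x >= 0 && f.x < width && f.y >= 0 && f.y < height);
    float& stored = depth[static_cast<std::size_t>(f.y * width + f.x)];
    if (f.depth >= stored) return false;
    stored = f.depth;
    return true;
  }
};

// Returns the number of fragment-shader invocations.
std::size_t ShadeFragments(const std::vector<Frag>& frags, DepthBuffer& zbuf,
                           bool early_z) {
  std::size_t shaded = 0;
  for (const Frag& f : frags) {
    if (early_z) {
      if (!zbuf.TestAndWrite(f)) continue;  // rejected before shading
      ++shaded;                             // stand-in for running the shader
    } else {
      ++shaded;              // shade first...
      zbuf.TestAndWrite(f);  // ...then decide whether the result is kept
    }
  }
  return shaded;
}
```

Both modes leave the same depth buffer behind; Early-Z simply skips the shading work for fragments that fail the test.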
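Multi-stage buffer pre-allocation and container reuse:
- Container reuse in TBR rasterization: The rasterization stage of TBR does not need long-lived fragment storage, so a unified fragment pool would add overhead. A scratch-container design is used instead to reduce frequent memory allocation.
  Container reuse is implemented in Rasterize by giving each OpenMP thread a fixed-size fragment scratch container, which avoids frequent allocation and deallocation. To prevent races between threads, writes are strictly confined to tile boundaries.
- Two-pass binning to avoid frequent allocation: The first pass walks all triangles to determine the required capacity and calls reserve() accordingly; the second pass fills in the data with sequential writes, so no dynamic resizing happens during execution. Compared with the previous version, this shortens the average time of a single test run by 20 ms.
- Vertex pre-allocation: The previous scheme of "one vector per thread, merged afterwards" is replaced by "a single pre-allocated buffer with parallel indexed write-back", which eliminates the repeated dynamic allocations and the second copy/merge pass. Compared with before, this cuts CPU time in the vertex-processing stage by about 10 ms.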
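The two allocation patterns above (count, reserve, then fill; and a single pre-sized buffer written by index) might look like this in outline. `TileRange`, `BinTwoPass`, and `TransformIndexed` are hypothetical names for illustration, and the loops are written serially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Inclusive range of tiles touched by one triangle (hypothetical type).
struct TileRange {
  int tx0, ty0, tx1, ty1;
};

std::vector<std::vector<std::size_t>> BinTwoPass(
    const std::vector<TileRange>& ranges, int tiles_x, int tiles_y) {
  assert(tiles_x > 0 && tiles_y > 0);
  const std::size_t tile_count =
      static_cast<std::size_t>(tiles_x) * static_cast<std::size_t>(tiles_y);
  auto tile_index = [tiles_x](int tx, int ty) {
    return static_cast<std::size_t>(ty * tiles_x + tx);
  };

  // Pass 1: count how many triangles each tile will receive.
  std::vector<std::size_t> counts(tile_count, 0);
  for (const TileRange& r : ranges)
    for (int ty = r.ty0; ty <= r.ty1; ++ty)
      for (int tx = r.tx0; tx <= r.tx1; ++tx) ++counts[tile_index(tx, ty)];

  // Reserve exactly once per tile, so pass 2 never reallocates.
  std::vector<std::vector<std::size_t>> bins(tile_count);
  for (std::size_t t = 0; t < tile_count; ++t) bins[t].reserve(counts[t]);

  // Pass 2: sequential fill.
  for (std::size_t i = 0; i < ranges.size(); ++i) {
    const TileRange& r = ranges[i];
    for (int ty = r.ty0; ty <= r.ty1; ++ty)
      for (int tx = r.tx0; tx <= r.tx1; ++tx)
        bins[tile_index(tx, ty)].push_back(i);
  }
  return bins;
}

// Single pre-allocated output buffer with indexed write-back.
template <typename In, typename Out, typename Fn>
void TransformIndexed(const std::vector<In>& in, std::vector<Out>& out, Fn fn) {
  out.resize(in.size());  // one allocation up front
  // Iteration i writes only out[i], so this loop could be run in parallel
  // (for example as an OpenMP parallel for) without locks or a merge step.
  for (std::size_t i = 0; i < in.size(); ++i) out[i] = fn(in[i]);
}
```

The extra counting pass is cheap compared with repeated vector growth, because it touches the same tile ranges without moving any data.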
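SoA data layout: A Structure-of-Arrays (SoA) layout gives better cache locality; the related code is adapted so that no significant extra computation is introduced. Each stage reads only the attributes it needs (binning, for example, uses only position/clip coordinates), so compared with the previous Array-of-Structures (AoS) layout, SoA can load the relevant arrays contiguously and reduces the cache and bandwidth wasted on unrelated fields.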
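To make the layout difference concrete, here is a small sketch. The field set and the names `VertexAoS`, `VertexBufferSoA`, `ToSoA`, and `CountInsideLeftRight` are assumptions for illustration and do not mirror the PR's vertex_soa.hpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// AoS: every vertex carries all of its attributes together.
struct VertexAoS {
  float clip[4];
  float normal[3];
  float uv[2];
};

// SoA: one contiguous array per attribute component.
struct VertexBufferSoA {
  std::vector<float> clip_x, clip_y, clip_z, clip_w;
  std::vector<float> normal_x, normal_y, normal_z;
  std::vector<float> u, v;
  std::size_t size() const { return clip_x.size(); }
};

VertexBufferSoA ToSoA(const std::vector<VertexAoS>& in) {
  VertexBufferSoA out;
  for (auto* column : {&out.clip_x, &out.clip_y, &out.clip_z, &out.clip_w,
                       &out.normal_x, &out.normal_y, &out.normal_z,
                       &out.u, &out.v}) {
    column->reserve(in.size());
  }
  for (const VertexAoS& vtx : in) {
    out.clip_x.push_back(vtx.clip[0]);
    out.clip_y.push_back(vtx.clip[1]);
    out.clip_z.push_back(vtx.clip[2]);
    out.clip_w.push_back(vtx.clip[3]);
    out.normal_x.push_back(vtx.normal[0]);
    out.normal_y.push_back(vtx.normal[1]);
    out.normal_z.push_back(vtx.normal[2]);
    out.u.push_back(vtx.uv[0]);
    out.v.push_back(vtx.uv[1]);
  }
  return out;
}

// A binning-style pass that reads only clip_x and clip_w. With SoA these are
// two dense float arrays; normals and UVs never enter the cache.
std::size_t CountInsideLeftRight(const VertexBufferSoA& vb) {
  assert(vb.clip_x.size() == vb.clip_w.size());
  std::size_t inside = 0;
  for (std::size_t i = 0; i < vb.size(); ++i) {
    const bool ok = vb.clip_x[i] >= -vb.clip_w[i] && vb.clip_x[i] <= vb.clip_w[i];
    inside += static_cast<std::size_t>(ok);
  }
  return inside;
}
```

With the AoS struct above, the same pass would pull 36 bytes per vertex through the cache to use 8 of them.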
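Global framebuffer to avoid merge overhead, with optimized write-back: Previously, the multithreaded path created a framebuffer per thread, wrote into them in parallel, and then merged all thread results. That merge was very expensive. The TBR SoA pipeline now allocates a single frame-level buffer, and tiles write back to it directly. Because tiles do not overlap, there are no data races and no locking or merging is needed, which makes full use of the tile atomicity of the TBR architecture. And because Early-Z has already resolved the final value of every pixel inside a tile, the tile's row data can be copied straight into the global buffer. This optimization removed about 42% of the previous total frame time.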
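The row-wise write-back can be pictured as below. This is a sketch assuming a row-major 32-bit color buffer; `Tile` and `WriteBackTile` are hypothetical names, and the tile is assumed to lie fully inside the frame.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A finished tile: top-left corner in the frame, size, and row-major pixels.
struct Tile {
  int x0, y0, w, h;
  std::vector<std::uint32_t> color;
};

// Copies the tile into the frame one row at a time. Different tiles cover
// disjoint pixel ranges, so concurrent calls for different tiles never write
// the same element and need no lock.
void WriteBackTile(const Tile& tile, std::vector<std::uint32_t>& frame,
                   int frame_width) {
  assert(tile.color.size() ==
         static_cast<std::size_t>(tile.w) * static_cast<std::size_t>(tile.h));
  assert(tile.x0 >= 0 && tile.y0 >= 0 && tile.x0 + tile.w <= frame_width);
  for (int row = 0; row < tile.h; ++row) {
    const std::uint32_t* src =
        tile.color.data() + static_cast<std::size_t>(row * tile.w);
    std::uint32_t* dst =
        frame.data() +
        static_cast<std::size_t>((tile.y0 + row) * frame_width + tile.x0);
    std::copy(src, src + tile.w, dst);
  }
}
```

Each row is a contiguous copy, so the write-back cost is proportional to the pixel count with no per-pixel test or merge.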
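Renderer refactor: The existing pipelines are extracted into separate classes; the forward pipelines are named PerTriangle and Tile-Based after the way they organize parallelism; core function comments are switched to Doxygen style, and function descriptions are kept in the hpp files.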
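SIMD and branch-prediction optimizations:
These optimizations may hurt readability, so the complex ones are concentrated in TBR, and PerTriangle is kept as a baseline for comparison.
- Division optimization: The conventional barycentric-coordinate formulation in rasterization is replaced with edge-function half-space tests, which reduces divisions and helps vectorization; cross products are computed in relative coordinates to avoid numerical instability.
- Masking to reduce branches: To help the compiler vectorize, RasterizeTile writes each triangle directly into the tile buffer and avoids push_back; masking turns conditional branches into bitmask selects so that a batch of pixels runs through the same instruction stream as far as possible. The goal is SIMD-friendly rasterization without noticeable extra overhead.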
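A scalar sketch of the half-space test with mask-based selection follows. It is illustrative only: the function names are hypothetical, pixels are sampled at their centers, and the triangle is assumed to be oriented so that interior points give non-negative edge values.

```cpp
#include <cassert>
#include <cstdint>

struct P2 {
  float x, y;
};

// Edge function in relative coordinates: cross(b - a, p - a).
// Subtracting a first keeps the operands small, which helps precision.
inline float Edge(const P2& a, const P2& b, const P2& p) {
  return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// Coverage mask for 8 consecutive pixel centers starting at (x0, y).
// Bit i is set when pixel x0 + i is inside. No division is needed, and the
// three comparisons are combined with bitwise AND instead of branches.
std::uint8_t CoverageMask8(const P2& v0, const P2& v1, const P2& v2,
                           int x0, int y) {
  unsigned mask = 0;
  for (int i = 0; i < 8; ++i) {
    const P2 p{static_cast<float>(x0 + i) + 0.5f, static_cast<float>(y) + 0.5f};
    const unsigned inside = static_cast<unsigned>(Edge(v1, v2, p) >= 0.0f) &
                            static_cast<unsigned>(Edge(v2, v0, p) >= 0.0f) &
                            static_cast<unsigned>(Edge(v0, v1, p) >= 0.0f);
    mask |= inside << i;
  }
  return static_cast<std::uint8_t>(mask);
}

// Masked store: every lane executes the same instructions, and the mask only
// selects between the new value and the old one.
void MaskedStore8(std::uint8_t mask, float value, float* dst) {
  assert(dst != nullptr);
  for (int i = 0; i < 8; ++i) {
    const bool on = ((mask >> i) & 1u) != 0;
    dst[i] = on ? value : dst[i];
  }
}
```

Loops of this shape, with a fixed trip count and select-style updates, are the kind compilers can usually turn into vector compares and blends.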
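Shader computation optimizations:
In the VS and FS, every vertex/fragment fetches its matrices and vectors from the UniformBuffer, which means a std::unordered_map lookup plus a std::variant copy each time. The more vertices and fragments there are, the more often these calls repeat. Caching is used here to eliminate as much of this repeated work as possible. In both the VS and FS, when the cache is invalid (for example, Prepare has not been called), the code falls back to the original path.
- Cache reuse in VS calls: Before entering the vertex loop, modelMatrix/viewMatrix/projectionMatrix are fetched from the UniformBuffer and cached as references or a struct, then passed by value to the VS, eliminating repeated hashing and copying. This greatly improves vertex throughput, cutting the vertex stage from 140 ms to 25 ms.
- Cache reuse in FS calls: The caches for the matrices and for light/cameraPos are updated together. This saves about 40 ms in the rasterization stage (115 ms -> 75 ms).
- LUT support: A LUT cache for specular reflection avoids the per-frame pow computation. Copy and move constructors are implemented to ensure thread safety. This saves about 10 ms in the rasterization stage (75 ms -> 65 ms).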
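The caching ideas above can be sketched as follows, assuming C++17. The `UniformBuffer` alias here only models the "unordered_map of variants" shape described in the text; `VsUniformCache`, `PrepareVsCache`, and `SpecularLut` are hypothetical names, and the LUT uses nearest-entry lookup, which trades a little accuracy for speed.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <variant>
#include <vector>

using Mat4 = std::array<float, 16>;
using Vec3 = std::array<float, 3>;
using UniformValue = std::variant<float, Vec3, Mat4>;
using UniformBuffer = std::unordered_map<std::string, UniformValue>;

struct VsUniformCache {
  Mat4 model, view, projection;
};

// One hash lookup per matrix, done once before the vertex loop. Returns
// nullopt when a matrix is missing or has the wrong type, so the caller can
// fall back to per-vertex lookups.
std::optional<VsUniformCache> PrepareVsCache(const UniformBuffer& uniforms) {
  auto get = [&](const char* name) -> const Mat4* {
    auto it = uniforms.find(name);
    return it == uniforms.end() ? nullptr : std::get_if<Mat4>(&it->second);
  };
  const Mat4* m = get("modelMatrix");
  const Mat4* v = get("viewMatrix");
  const Mat4* p = get("projectionMatrix");
  if (!m || !v || !p) return std::nullopt;
  return VsUniformCache{*m, *v, *p};
}

// Precomputed table of x^shininess for x in [0, 1].
class SpecularLut {
 public:
  explicit SpecularLut(float shininess, std::size_t size = 256) : table_(size) {
    assert(size >= 2);
    for (std::size_t i = 0; i < size; ++i) {
      const float x = static_cast<float>(i) / static_cast<float>(size - 1);
      table_[i] = std::pow(x, shininess);
    }
  }
  // Nearest-entry lookup; the input is clamped to [0, 1].
  float Lookup(float x) const {
    x = std::fmin(std::fmax(x, 0.0f), 1.0f);
    const float scaled = x * static_cast<float>(table_.size() - 1);
    return table_[static_cast<std::size_t>(scaled + 0.5f)];
  }

 private:
  std::vector<float> table_;
};
```

The table only needs rebuilding when the shininess changes, and a shader instance that is copied per thread carries its own table, so lookups need no synchronization.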

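Although TBR was originally designed for mobile hardware, testing and optimization show that in a software implementation, tile partitioning is also well suited to CPU parallelism and performs better than triangle-centric partitioning. The TBR pipeline's per-frame time is currently clearly better than that of the traditional PerTriangle pipeline.
Testing was done on Windows 11 (WSLg, WSL2 with Ubuntu 24.04 LTS) on a mobile i9-13900H CPU; the TBR pipeline's average per-frame time stays below 100 ms.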

Test Platform

Ubuntu24.04 (in WSL2, Windows11)

TO-DO

Z-buffer access-pattern optimization
Shared-memory optimization
Lightweight tile reordering for TBR

ZzzhHe

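The overall implementation is excellent: the code is clearly structured, the logic is complete, and the feature coverage is comprehensive.

While reading, I found a few relatively minor points to improve, such as the consistency of comment placement and style, a handful of names and magic numbers that need adjusting, and some duplicated code that could be reduced by extracting utility functions.

These changes are all about readability and maintainability and do not affect functionality. Overall, the code quality is already good and only needs some polish in the details.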

src/include/renderer.h

src/include/vertex_soa.hpp

src/rasterizer.cpp

src/include/rasterizer.hpp

src/renderer.cpp

ZzzhHe

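I suggest changing the DEBUG print statements from SPDLOG_INFO to SPDLOG_DEBUG, and then adding the macro definition #define SPDLOG_ACTIVE_LEVEL SPDLOG_LEVEL_INFO before #include <spdlog/spdlog.h>, so that info output and debug output can be told apart.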

src/include/renderers/tile_based_renderer.hpp

src/renderers/tile_based_renderer.cpp

src/renderers/deferred_renderer.cpp

ZzzhHe

The changes are clear, the updates were prompt, and the overall implementation is complete!

indevn force-pushed the main branch from b26abe8 to 0c8f67f on August 30, 2025 14:46

fix typo

979a251

Signed-off-by: ZhouFANG <indevn@outlook.com>

indevn force-pushed the main branch from 0c8f67f to 6e42d3c on August 30, 2025 14:54

indevn added 6 commits September 2, 2025 22:12

Implement perspective division and viewport transformation with perspective-correct interpolation

b9f2ae8

Signed-off-by: ZhouFANG <indevn@outlook.com>

Implement tile-based rasterizer and refactor the pipeline to support multi-rendering-mode

7093d82

Signed-off-by: ZhouFANG <indevn@outlook.com>

Add Performance Profiling for Deferred Pipeline. Remove detailed debug code for TBR.

8d58a84

Signed-off-by: ZhouFANG <indevn@outlook.com>

Expand the Vertex data structure, implement frustum culling and backface culling for TBR.

d0ddf62

Signed-off-by: ZhouFANG <indevn@outlook.com>

Fix rendering consistency between TRADITIONAL and TILE_BASED modes

a4021cd

1. Add backface culling to TRADITIONAL pipeline to match TILE_BASED behavior
2. Fix depth buffer initialization from infinity to 1.0f for standard range
Signed-off-by: ZhouFANG <indevn@outlook.com>

add debug mode

70e1581

Signed-off-by: ZhouFANG <indevn@outlook.com>

indevn force-pushed the main branch from 6e42d3c to fe4c91f on September 2, 2025 15:46

Revert "add debug mode", which is not necessary for rendering tests.

4538667

This reverts commit 70e1581. Signed-off-by: ZhouFANG <indevn@outlook.com>

indevn force-pushed the main branch from fe4c91f to 4538667 on September 2, 2025 15:48

indevn added 4 commits September 6, 2025 23:16

Add Early-Z to TBR. Remove obsolete functions previously used for TBR debugging.

b57d907

Signed-off-by: ZhouFANG <indevn@outlook.com>

TBR: Pre-allocate and reuse fragment caches; add RasterizeTo; two-pass counting in Binning to eliminate frequent dynamic memory reallocations.

1d2d9a9

Signed-off-by: ZhouFANG <indevn@outlook.com>

vertex optimization: avoid data movement and multi-stage memory reallocation

8a74379

Signed-off-by: ZhouFANG <indevn@outlook.com>

TBR: Use SoA vertex layout to improve cache locality

bb5acc1

Signed-off-by: ZhouFANG <indevn@outlook.com>

ZzzhHe requested changes Sep 9, 2025

indevn added 4 commits September 13, 2025 13:09

TBR: Use global framebuffer to avoid merge overhead

0549211

Signed-off-by: ZhouFANG <indevn@outlook.com>

TBR: Optimize global buffer write-back logic

258607a

Signed-off-by: ZhouFANG <indevn@outlook.com>

Optimize perspective correction, add helper func, simplify code

ffe0d75

Signed-off-by: ZhouFANG <indevn@outlook.com>

Refactor: Extract pipeline into standalone class; rename TraditionalPipeline to PerTriangle for consistency with TileBased. Switch core function comments to Doxygen style.

957c9b0

Signed-off-by: ZhouFANG <indevn@outlook.com>

indevn force-pushed the main branch from 1db9969 to 957c9b0 on September 13, 2025 05:10

TBR: Replace barycentric coordinate computation with half-space testing to enable SIMD-friendly rasterization; use relative-coordinate cross products to ensure numerical stability.

d6e3b40

Signed-off-by: ZhouFANG <indevn@outlook.com>

ZzzhHe requested changes Sep 14, 2025

indevn added 6 commits September 15, 2025 16:15

DR: Optimize fragment collection (pre-reserve per bucket, move-insert, and per-bucket parallel merge)

30038ef

Signed-off-by: ZhouFANG <indevn@outlook.com>

Refactor: Modify the triangle binning logic in TBR to use the TileGridContext structure. Replacing hard-coded values with constants.

61e75a8

Signed-off-by: ZhouFANG <indevn@outlook.com>

Change timing-related debug messages from SPDLOG_INFO to SPDLOG_DEBUG, set the default log level to INFO

86d06ad

Signed-off-by: ZhouFANG <indevn@outlook.com>

TBR: Perform mask-based computation for TBR rasterization to achieve SIMD-friendly rasterization, and add corresponding mask statistics output.

0ea7f22

Signed-off-by: ZhouFANG <indevn@outlook.com>

VS: Optimize the vertex matrix caching in shaders by adding cache preparation and update functionality to reduce redundant computations.

b659f57

Signed-off-by: ZhouFANG <indevn@outlook.com>

FS: Cache vectors and matrices to avoid redundant computations.

e81bcff

Signed-off-by: ZhouFANG <indevn@outlook.com>

Enhanced shader class with LUT caching for specular reflection to optimize computation and eliminate redundancy. Added copy/move constructors for thread safety

b84cfd2

Signed-off-by: ZhouFANG <indevn@outlook.com>

ZzzhHe closed this Sep 17, 2025

ZzzhHe reopened this Sep 17, 2025

ZzzhHe approved these changes Sep 17, 2025

ZzzhHe merged commit f10d680 into Simple-XX:main Sep 17, 2025
3 of 5 checks passed
