mainline inclusion
from mainline-6.7-rc1
commit 9e11306
category: feature
bugzilla: RVCK-Project#221

--------------------------------

flush_tlb_range() uses a fixed stride of PAGE_SIZE and in its current form,
when a hugetlb mapping needs to be flushed, flush_tlb_range() flushes the
whole tlb: so set a stride of the size of the hugetlb mapping in order to
only flush the hugetlb mapping. However, if the hugepage is a NAPOT region,
all PTEs that constitute this mapping must be invalidated, so the stride
size must actually be the size of the PTE.

Note that THPs are directly handled by flush_pmd_tlb_range().

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Samuel Holland <samuel.holland@sifive.com>
Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> # On RZ/Five SMARC
Link: https://lore.kernel.org/r/20231030133027.19542-3-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Gao Rui <gao.rui@zte.com.cn>
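The stride rule described in the commit message above can be sketched as a small user-space model. This is only an illustration under stated assumptions, not the kernel implementation: the `model_*` names are hypothetical, the sizes assume Sv39 with 4 KiB base pages, and the NAPOT case is modelled by rounding the huge page size down to the size of the page-table level that holds its PTEs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sizes for the model: Sv39 with 4 KiB base pages (assumption). */
#define MODEL_PAGE_SIZE (1UL << 12) /* PTE-level page, 4 KiB */
#define MODEL_PMD_SIZE  (1UL << 21) /* PMD-level leaf, 2 MiB */
#define MODEL_PGD_SIZE  (1UL << 30) /* top-level leaf, 1 GiB */

/*
 * Pick the flush stride for a mapping:
 *  - not hugetlb: step by PAGE_SIZE;
 *  - hugetlb without NAPOT: step by the huge page size;
 *  - hugetlb with NAPOT: every PTE of the region must be invalidated,
 *    so step by the size of the level the PTEs live at.
 */
static unsigned long model_flush_stride(bool is_hugetlb,
                                        unsigned long huge_size,
                                        bool has_napot)
{
    if (!is_hugetlb)
        return MODEL_PAGE_SIZE;

    assert(huge_size >= MODEL_PAGE_SIZE);

    if (!has_napot)
        return huge_size;

    if (huge_size >= MODEL_PGD_SIZE)
        return MODEL_PGD_SIZE;
    if (huge_size >= MODEL_PMD_SIZE)
        return MODEL_PMD_SIZE;
    return MODEL_PAGE_SIZE;
}

/* Number of per-address invalidations needed to cover len bytes. */
static unsigned long model_flush_count(unsigned long len, unsigned long stride)
{
    assert(stride > 0);
    return (len + stride - 1) / stride;
}
```

In this model a 64 KiB NAPOT hugepage is flushed with a 4 KiB stride (16 invalidations), while a 2 MiB hugetlb page is flushed with a single 2 MiB step instead of a full TLB flush.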
mainline inclusion
from mainline-6.8-rc1
commit 54d7431
category: feature
bugzilla: RVCK-Project#221

--------------------------------

Allow deferring the TLB flush when unmapping pages, which reduces the
number of IPIs and the number of sfence.vma instructions.

The microbenchmark used in commit 43b3dfd ("arm64: support batched/deferred
tlb shootdown during page reclamation/migration"), made multithreaded to
force the usage of IPIs, shows a good performance improvement on all
platforms:

* Unmatched: ~34%
* TH1520   : ~78%
* Qemu     : ~81%

In addition, perf on qemu reports an important decrease in time spent
dealing with IPIs:

Before: 68.17%  main  [kernel.kallsyms]  [k] __sbi_rfence_v02_call
After :  8.64%  main  [kernel.kallsyms]  [k] __sbi_rfence_v02_call

* Benchmark:

    /*
     * SIZE (number of bytes to map) is not defined in the commit message;
     * pass it at build time with -DSIZE=<bytes>.
     */
    #define _GNU_SOURCE
    #include <errno.h>
    #include <pthread.h>
    #include <sched.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Pin the calling thread to core_id. */
    int stick_this_thread_to_core(int core_id) {
        int num_cores = sysconf(_SC_NPROCESSORS_ONLN);
        if (core_id < 0 || core_id >= num_cores)
            return EINVAL;

        cpu_set_t cpuset;
        CPU_ZERO(&cpuset);
        CPU_SET(core_id, &cpuset);

        pthread_t current_thread = pthread_self();
        return pthread_setaffinity_np(current_thread,
                                      sizeof(cpu_set_t), &cpuset);
    }

    /* Idle thread pinned to one core, so the mm is live on several CPUs. */
    static void *fn_thread(void *p_data)
    {
        stick_this_thread_to_core((int)(intptr_t)p_data);

        while (1) {
            sleep(1);
        }

        return NULL;
    }

    int main()
    {
        volatile unsigned char *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
                                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        pthread_t threads[4];
        int ret;

        for (int i = 0; i < 4; ++i) {
            ret = pthread_create(&threads[i], NULL, fn_thread,
                                 (void *)(intptr_t)i);
            if (ret) {
                printf("%s", strerror(ret));
            }
        }

        memset((void *)p, 0x88, SIZE);

        for (int k = 0; k < 10000; k++) {
            /* swap in */
            for (int i = 0; i < SIZE; i += 4096) {
                (void)p[i];
            }

            /* swap out */
            madvise((void *)p, SIZE, MADV_PAGEOUT);
        }

        for (int i = 0; i < 4; i++) {
            pthread_cancel(threads[i]);
        }

        for (int i = 0; i < 4; i++) {
            pthread_join(threads[i], NULL);
        }

        return 0;
    }

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Jisheng Zhang <jszhang@kernel.org>
Tested-by: Jisheng Zhang <jszhang@kernel.org> # Tested on TH1520
Tested-by: Nam Cao <namcao@linutronix.de>
Link: https://lore.kernel.org/r/20240108193640.344929-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Gao Rui <gao.rui@zte.com.cn>
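To see why deferring the flush cuts IPI traffic, here is a rough counting model. It is a sketch under simplifying assumptions, not a description of the kernel code path: it assumes each remote flush costs one IPI per remote CPU, and both the function names and the batch size used in the example are hypothetical.

```c
#include <assert.h>

/* IPIs sent when every unmapped page triggers its own remote flush. */
static unsigned long model_ipis_immediate(unsigned long pages,
                                          unsigned int remote_cpus)
{
    return pages * remote_cpus;
}

/* IPIs sent when flushes are deferred and issued once per batch of pages. */
static unsigned long model_ipis_batched(unsigned long pages,
                                        unsigned long batch_size,
                                        unsigned int remote_cpus)
{
    assert(batch_size > 0);

    /* One flush per started batch. */
    unsigned long flushes = (pages + batch_size - 1) / batch_size;

    return flushes * remote_cpus;
}
```

For example, with 65536 pages, 4 remote CPUs and an assumed batch of 32 pages, the model gives 262144 IPIs without batching and 8192 with it.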
Test started. Log: https://github.com/RVCK-Project/rvck/actions/runs/21894978777
Argument parsing result
Test complete. Detailed results: RVCK result

Kunit Test Result
[06:29:19] Testing complete. Ran 457 tests: passed: 445, skipped: 12

Kernel Build Result
Kernel build succeeded: RVCK-Project/rvck/222/
591c03e1eb5e54bec8a9067becf7b688 /srv/guix_result/0923fab5c983410da4748e639a9d3f34fd411cd6/Image

LAVA Check
args:
result: Lava check done!
lava log: https://lava.oerv.ac.cn/scheduler/job/1418
lava result count: [fail]: 175, [pass]: 1434, [skip]: 290

Check Patch Result
mainline inclusion
from mainline-6.8-rc4
commit 3951f6a
category: feature
bugzilla: RVCK-Project#221

--------------------------------

We must clear the cpumask once we have flushed the batch, otherwise CPUs
accumulate in it and we end up sending IPIs to more CPUs than needed.

Fixes: 54d7431 ("riscv: Add support for BATCHED_UNMAP_TLB_FLUSH")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Reviewed-by: Jisheng Zhang <jszhang@kernel.org>
Link: https://lore.kernel.org/r/20240130115508.105386-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Gao Rui <gao.rui@zte.com.cn>
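The accumulation problem fixed above can be reproduced with a toy model of the batch. This is a hypothetical sketch: `struct model_batch` and its helpers are made up for illustration, and a 64-bit integer stands in for the kernel cpumask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a deferred-flush batch: one bit per CPU. */
struct model_batch {
    uint64_t cpumask; /* CPUs that may still hold stale TLB entries */
};

/* Record the CPUs of an mm whose pages were just unmapped. */
static void model_batch_add(struct model_batch *b, uint64_t mm_cpus)
{
    assert(b != NULL);
    b->cpumask |= mm_cpus;
}

/*
 * Flush the batch and return how many CPUs were targeted.
 * With clear_after_flush == false the mask keeps growing across
 * batches, which is the behaviour the fix removes.
 */
static unsigned int model_batch_flush(struct model_batch *b,
                                      bool clear_after_flush)
{
    unsigned int targets = 0;

    assert(b != NULL);

    /* Count set bits: each one is a CPU that receives an IPI. */
    for (uint64_t m = b->cpumask; m != 0; m &= m - 1)
        targets++;

    if (clear_after_flush)
        b->cpumask = 0;

    return targets;
}
```

If a first batch touches CPUs 0 and 1 and a second batch touches only CPU 2, the second flush targets one CPU when the mask is cleared and three CPUs when it is not.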
mainline inclusion
from mainline-6.9-rc2
commit 674bc01
category: feature
bugzilla: RVCK-Project#221

--------------------------------

__flush_tlb_range() does not modify the provided cpumask, so its cmask
parameter can be pointer-to-const. This avoids the unsafe cast of
cpu_online_mask.

Fixes: 54d7431 ("riscv: Add support for BATCHED_UNMAP_TLB_FLUSH")
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20240301201837.2826172-1-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Gao Rui <gao.rui@zte.com.cn>
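The const-correctness point can be illustrated outside the kernel. In this sketch all names are hypothetical: `model_online_mask` stands in for a read-only global such as cpu_online_mask, and `model_count_targets()` for a function that only reads its mask argument.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified cpumask: one bit per CPU. */
struct model_cpumask {
    unsigned long bits;
};

/* Stand-in for a read-only global mask (4 CPUs online in this model). */
static const struct model_cpumask model_online_mask = { 0xfUL };

/*
 * The function only reads the mask, so the parameter is pointer-to-const.
 * Callers can then pass a const object directly, with no cast that
 * discards the qualifier.
 */
static unsigned int model_count_targets(const struct model_cpumask *cmask)
{
    unsigned int n = 0;

    assert(cmask != NULL);

    for (unsigned long m = cmask->bits; m != 0; m &= m - 1)
        n++;

    return n;
}
```

With a non-const parameter, the call `model_count_targets(&model_online_mask)` would need a cast that drops `const`, and the compiler could no longer catch an accidental write to the shared mask.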
Test started. Log: https://github.com/RVCK-Project/rvck/actions/runs/21897220829
Argument parsing result
Test complete. Detailed results: RVCK result

Kunit Test Result
[08:09:12] Testing complete. Ran 457 tests: passed: 445, skipped: 12

Kernel Build Result
Kernel build succeeded: RVCK-Project/rvck/222/
a5e3c9075e2abc19384a60d0e5595e10 /srv/guix_result/9f3c81015b46e6eea0158c3011297a1cd23aac2a/Image

LAVA Check
args:
result: Lava check done!
lava log: https://lava.oerv.ac.cn/scheduler/job/1419
lava result count: [fail]: 174, [pass]: 1435, [skip]: 290

Check Patch Result
issues: #221
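Backport of the upstream patches. Tested with the benchmark from the patch: batched unmap is much faster (real time for 10 iterations drops from 10m 56.58s to 2m 56.10s).
Before the patches are merged: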
~ # time /home/unmap_tlb_flush
Testing with memory size: 256 MB
Threads created, starting swap test...
Completed 1 iterations
Completed 2 iterations
Completed 3 iterations
Completed 4 iterations
Completed 5 iterations
Completed 6 iterations
Completed 7 iterations
Completed 8 iterations
Completed 9 iterations
Completed 10 iterations
Test completed, cleaning up...
Done.
real 10m 56.58s
user 0m 1.88s
sys 10m 53.74s
After the patches are merged:
~ # time /home/unmap_tlb_flush
Testing with memory size: 256 MB
Threads created, starting swap test...
Completed 1 iterations
Completed 2 iterations
Completed 3 iterations
Completed 4 iterations
Completed 5 iterations
Completed 6 iterations
Completed 7 iterations
Completed 8 iterations
Completed 9 iterations
Completed 10 iterations
Test completed, cleaning up...
Done.
real 2m 56.10s
user 0m 1.54s
sys 2m 46.82s
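For reference, the speedup implied by the two runs above can be worked out from the `time` readings. The helper names below are hypothetical and the snippet only does the arithmetic:

```c
#include <assert.h>

/* Convert a "<minutes>m <seconds>s" reading from time into seconds. */
static double model_to_seconds(unsigned int minutes, double seconds)
{
    return minutes * 60.0 + seconds;
}

/* Ratio of the run time before the patches to the run time after. */
static double model_speedup(double before_s, double after_s)
{
    assert(after_s > 0.0);
    return before_s / after_s;
}
```

Plugging in the real times (10m 56.58s before, 2m 56.10s after) gives 656.58 s / 176.10 s, a speedup of roughly 3.7x; the sys times (10m 53.74s and 2m 46.82s) give roughly 3.9x.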