

kaleidoscope416 (Contributor) commented Jan 8, 2026

Overview

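This PR implements the Linux-compatible mlock/mlock2/munlock/mlockall/munlockall family of system calls. These calls prevent critical memory pages from being paged out to swap. The feature matters for workloads such as databases and cryptographic applications, which need their memory to stay resident in RAM.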

Related commits (in chronological order):

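  • faf59f0 - Set the default value of RLIMIT_MEMLOCK
  • 5b55c44 - Add a mutual-exclusion check between MADV_DONTNEED and VM_LOCKED, among other changes
  • 3ea434d - Remove redundant interfaces and logic
  • 48c854d - Complete the logic for locking physical pages
  • 913a80a - Add unit tests
  • 19581f7 - Remove one lock
  • 6894c04 - Define constants separately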

Main features

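  1. Core system calls
  • mlock(addr, len) - locks the already-mapped pages in the given memory range
  • mlock2(addr, len, flags) - locking with support for the MLOCK_ONFAULT flag. Pages already mapped in the range are locked immediately, and pages not yet mapped are locked lazily when they are faulted in
  • munlock(addr, len) - unlocks the given memory range
  • mlockall(flags) - locks all memory (supports MCL_CURRENT and MCL_FUTURE)
  • munlockall() - unlocks all memory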
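  2. Page management
  • Per-page reference counting: Page::mlock_count tracks how many times a page has been locked
  • Flag management:
    • PG_MLOCKED - marks the page as locked
    • PG_UNEVICTABLE - prevents the page from being swapped out
  • Huge-page support: correctly handles locking of the subpages of 2MB/1GB huge pages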
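  3. Resource limits
  • RLIMIT_MEMLOCK check: limits the total amount of memory a process can lock
  • Default: 64KB (the historical Linux x86_64 default; Linux 5.16 and later use 8 MiB)
  • CAP_IPC_LOCK: hooks are reserved in the framework (TODO: implement the permission check)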
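  4. VMA management
  • VM_LOCKED flag: marks that the pages in the VMA should be locked
  • VM_LOCKONFAULT flag: deferred locking; pages are locked only when they are faulted in
  • fork semantics: implemented correctly; the child process does not inherit the lock state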
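The Rust sketch below shows one way the flag arguments above could map onto VMA state. It translates mlock2 and mlockall flags into VM_LOCKED/VM_LOCKONFAULT bits. It is not the PR's code. The MLOCK_ONFAULT and MCL_* values follow the Linux x86_64 UAPI. The VM_* bit values, the Errno enum, and the helper names mlock2_vma_flags/mlockall_flags are hypothetical.

```rust
// MLOCK_ONFAULT and MCL_* values follow the Linux x86_64 UAPI.
const MLOCK_ONFAULT: u32 = 0x01;
const MCL_CURRENT: u32 = 1;
const MCL_FUTURE: u32 = 2;
const MCL_ONFAULT: u32 = 4;

// Hypothetical VMA flag bits (not DragonOS's real values).
const VM_LOCKED: u64 = 1 << 0;
const VM_LOCKONFAULT: u64 = 1 << 1;

#[derive(Debug, PartialEq)]
enum Errno {
    EINVAL,
}

/// VMA flags that mlock2(addr, len, flags) would set on the covered VMAs.
/// mlock(addr, len) is the same call with flags == 0.
fn mlock2_vma_flags(flags: u32) -> Result<u64, Errno> {
    if flags & !MLOCK_ONFAULT != 0 {
        return Err(Errno::EINVAL); // unknown flag bits
    }
    let mut vm = VM_LOCKED;
    if flags & MLOCK_ONFAULT != 0 {
        vm |= VM_LOCKONFAULT; // unmapped pages get locked when faulted in
    }
    Ok(vm)
}

/// Returns (flags to set on existing VMAs, def_flags for future mappings).
fn mlockall_flags(flags: u32) -> Result<(u64, u64), Errno> {
    let valid = MCL_CURRENT | MCL_FUTURE | MCL_ONFAULT;
    // Reject 0, unknown bits, and MCL_ONFAULT without CURRENT or FUTURE.
    if flags == 0 || flags & !valid != 0 || flags == MCL_ONFAULT {
        return Err(Errno::EINVAL);
    }
    let lock_bits = if flags & MCL_ONFAULT != 0 {
        VM_LOCKED | VM_LOCKONFAULT
    } else {
        VM_LOCKED
    };
    let current = if flags & MCL_CURRENT != 0 { lock_bits } else { 0 };
    let future = if flags & MCL_FUTURE != 0 { lock_bits } else { 0 };
    Ok((current, future))
}
```

Linux also rejects MCL_ONFAULT on its own. It must be combined with MCL_CURRENT or MCL_FUTURE, and the sketch enforces the same rule.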

Note

Introduces Linux-compatible memory locking with proper accounting and limits.

  • Add kernel/src/mm/mlock.rs with mlock_page/munlock_page, page-table walking (incl. huge-page subpages), and can_do_mlock
  • New syscalls: mlock, mlock2 (MLOCK_ONFAULT), mlockall (MCL_CURRENT|MCL_FUTURE|MCL_ONFAULT), munlock, munlockall; validate ranges, align, CAP/RLIMIT checks
  • AddressSpace/VMA: track locked_vm and def_flags; implement mlock/munlock/mlockall/munlockall; apply VM_LOCKED/VM_LOCKONFAULT; do not inherit locks on fork
  • Page layer: add PG_MLOCKED, use PG_UNEVICTABLE, and per-page mlock_count
  • Fault handling: when VM_LOCKONFAULT set, lock anon pages on demand in do_anonymous_page
  • madvise: return EINVAL for MADV_DONTNEED on VM_LOCKED VMAs
  • mmap/mremap: enforce RLIMIT_MEMLOCK for MAP_LOCKED and locked VMA expansion, returning EPERM/EAGAIN/ENOMEM as appropriate
  • Process: set default RLIMIT_MEMLOCK to 64KB
  • Tests: add user/apps/c_unitest/test_mlock.c; whitelist mlock_test
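The page-table walk over huge pages works like this. For each leaf mapping (4 KiB, 2 MiB or 1 GiB) that overlaps the locked range, only the 4 KiB subpages inside the overlap are locked. The sketch below assumes a hypothetical LeafMapping record produced by the walk, with mappings naturally aligned to their size. It computes the subpage physical addresses that mlock_page would be called on. It is not DragonOS's actual walker.

```rust
const PAGE_SIZE: usize = 4096;

/// Hypothetical leaf entry reported by a page-table walk. Assumes `vaddr`
/// and `paddr` are aligned to `size` (4 KiB, 2 MiB or 1 GiB).
struct LeafMapping {
    vaddr: usize,
    paddr: usize,
    size: usize,
}

/// Physical addresses of the 4 KiB subpages of `m` that overlap [start, end).
fn subpages_in_range(m: &LeafMapping, start: usize, end: usize) -> Vec<usize> {
    let lo = start.max(m.vaddr);
    let hi = end.min(m.vaddr + m.size);
    if lo >= hi {
        return Vec::new(); // no overlap with this mapping
    }
    // Widen the overlap to whole 4 KiB subpages.
    let first = lo & !(PAGE_SIZE - 1);
    let last = (hi + PAGE_SIZE - 1) & !(PAGE_SIZE - 1);
    (first..last)
        .step_by(PAGE_SIZE)
        .map(|va| m.paddr + (va - m.vaddr))
        .collect()
}
```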

Written by Cursor Bugbot for commit a4aa5ad.

kaleidoscope416 added 9 commits January 4, 2026 20:53
  - Return EINVAL when MADV_DONTNEED is called on a VMA that has the VM_LOCKED flag set
  - Matches Linux semantics: locked memory cannot be released through madvise
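Here is a minimal sketch of the madvise check described above. It assumes a simplified Vma with only a flags field and a hypothetical VM_LOCKED bit value. MADV_DONTNEED = 4 is the Linux value.

```rust
const MADV_DONTNEED: i32 = 4; // Linux value
const VM_LOCKED: u64 = 1 << 0; // hypothetical bit

#[derive(Debug, PartialEq)]
enum Errno {
    EINVAL,
}

/// Simplified VMA: only the flags matter for this check.
struct Vma {
    flags: u64,
}

/// Locked memory must not be discarded by MADV_DONTNEED.
fn check_madvise(vma: &Vma, advice: i32) -> Result<(), Errno> {
    if advice == MADV_DONTNEED && vma.flags & VM_LOCKED != 0 {
        return Err(Errno::EINVAL);
    }
    Ok(())
}
```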

  2. sys_mlock.rs - RLIMIT_MEMLOCK resource-limit check

  - Add the RLIMIT_MEMLOCK limit check
  - Compute the number of pages already locked and the number requested
  - Return ENOMEM when the limit is exceeded
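The limit check can be sketched as below. The sketch makes two assumptions. First, locked_vm is counted in 4 KiB pages. Second, the caller has already excluded pages in the request that lie in VMAs already marked VM_LOCKED, so re-locking the same range is not double counted. check_memlock_limit is a hypothetical name. The CAP_IPC_LOCK bypass is omitted because the PR leaves it as a TODO.

```rust
const PAGE_SIZE: usize = 4096;

#[derive(Debug, PartialEq)]
enum Errno {
    ENOMEM,
}

/// `locked_pages`: current locked_vm of the process, in pages.
/// `newly_locked_pages`: pages in the request not already in VM_LOCKED VMAs.
/// `rlimit_bytes`: soft RLIMIT_MEMLOCK in bytes.
fn check_memlock_limit(
    locked_pages: usize,
    newly_locked_pages: usize,
    rlimit_bytes: usize,
) -> Result<(), Errno> {
    let limit_pages = rlimit_bytes / PAGE_SIZE;
    match locked_pages.checked_add(newly_locked_pages) {
        Some(total) if total <= limit_pages => Ok(()),
        _ => Err(Errno::ENOMEM), // over the limit, or the sum overflowed
    }
}
```

With the 64KB default, the limit is 16 pages. A process holding 10 locked pages can lock 6 more but not 7.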

  3. sys_mmap.rs - Resource limit for the MAP_LOCKED flag

  - Perform the RLIMIT_MEMLOCK check for MAP_LOCKED mappings
  - Check the can_do_mlock() permission
  - Return EAGAIN when the limit is exceeded
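MAP_LOCKED uses the same accounting, but the errors differ. It returns EPERM when locking is not permitted at all and EAGAIN when the limit would be exceeded. The sketch models can_do_mlock() on the Linux rule: locking is allowed if RLIMIT_MEMLOCK is non-zero or the caller has CAP_IPC_LOCK. The has_cap_ipc_lock parameter and the function names are hypothetical.

```rust
const PAGE_SIZE: usize = 4096;

#[derive(Debug, PartialEq)]
enum Errno {
    EPERM,
    EAGAIN,
}

/// Linux rule: mlock is allowed if RLIMIT_MEMLOCK is non-zero or the
/// caller holds CAP_IPC_LOCK.
fn can_do_mlock(has_cap_ipc_lock: bool, rlimit_bytes: usize) -> bool {
    rlimit_bytes != 0 || has_cap_ipc_lock
}

/// Admission check for mmap(..., MAP_LOCKED, ...) of `len` bytes.
fn check_map_locked(
    len: usize,
    locked_pages: usize,
    rlimit_bytes: usize,
    has_cap_ipc_lock: bool,
) -> Result<(), Errno> {
    if !can_do_mlock(has_cap_ipc_lock, rlimit_bytes) {
        return Err(Errno::EPERM); // locking not permitted at all
    }
    if has_cap_ipc_lock {
        return Ok(()); // CAP_IPC_LOCK bypasses the limit
    }
    let new_pages = len / PAGE_SIZE + usize::from(len % PAGE_SIZE != 0);
    if locked_pages.saturating_add(new_pages) > rlimit_bytes / PAGE_SIZE {
        return Err(Errno::EAGAIN); // would exceed RLIMIT_MEMLOCK
    }
    Ok(())
}
```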

  4. sys_mremap.rs - Resource check when expanding a locked VMA

  - When expanding a locked VMA, check how many additional pages are needed
  - Validate against the RLIMIT_MEMLOCK limit
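When mremap grows a VM_LOCKED VMA, only the added pages need to be charged. The sketch below uses the same 4 KiB page assumption, with hypothetical helper names. The commit message does not name the error code, so returning EAGAIN on failure is an assumption here, following Linux.

```rust
const PAGE_SIZE: usize = 4096;

#[derive(Debug, PartialEq)]
enum Errno {
    EAGAIN,
}

/// Number of pages needed to cover `len` bytes.
fn pages(len: usize) -> usize {
    len / PAGE_SIZE + usize::from(len % PAGE_SIZE != 0)
}

/// Pages added when a mapping grows from `old_len` to `new_len` (0 if shrinking).
fn mremap_extra_pages(old_len: usize, new_len: usize) -> usize {
    pages(new_len).saturating_sub(pages(old_len))
}

/// Only a VM_LOCKED VMA charges its growth to locked_vm.
fn check_mremap_locked(
    vma_locked: bool,
    old_len: usize,
    new_len: usize,
    locked_pages: usize,
    rlimit_bytes: usize,
) -> Result<(), Errno> {
    if !vma_locked {
        return Ok(());
    }
    let extra = mremap_extra_pages(old_len, new_len);
    if locked_pages.saturating_add(extra) > rlimit_bytes / PAGE_SIZE {
        return Err(Errno::EAGAIN);
    }
    Ok(())
}
```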

  5. ucontext.rs - Fork semantics and helper functions

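  - Fix fork semantics: the child must not inherit the locked_vm count (it starts from 0)
  - Fix fork semantics: the child must not inherit def_flags (the default flags set by mlockall)
  - Fix fork semantics: the child must not inherit the VM_LOCKED and VM_LOCKONFAULT flags
  - Add the vma_is_accessible() helper function
  - Check VMA accessibility before mlock (a PROT_NONE VMA should return ENOMEM)
  - Add the locked_vm() getter method
  - Apply def_flags at mmap time (supports mlockall MCL_FUTURE)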
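The fork rules above amount to stripping the lock state when the address space is duplicated. The sketch uses a toy AddressSpace/Vma model with hypothetical flag bits; these are not the ucontext.rs types. It also shows vma_is_accessible() in its Linux form, which returns true if any of the read, write, or exec bits is set.

```rust
// Hypothetical VMA flag bits for a toy model.
const VM_READ: u64 = 1 << 0;
const VM_WRITE: u64 = 1 << 1;
const VM_EXEC: u64 = 1 << 2;
const VM_LOCKED: u64 = 1 << 3;
const VM_LOCKONFAULT: u64 = 1 << 4;
const VM_LOCK_MASK: u64 = VM_LOCKED | VM_LOCKONFAULT;

#[derive(Clone, Debug, PartialEq)]
struct Vma {
    start: usize,
    end: usize,
    flags: u64,
}

struct AddressSpace {
    vmas: Vec<Vma>,
    locked_vm: usize, // pages
    def_flags: u64,   // applied to new mappings (set by mlockall(MCL_FUTURE))
}

impl AddressSpace {
    /// Duplicate for fork(): the child keeps the mappings but no lock state.
    fn fork_copy(&self) -> AddressSpace {
        let vmas = self
            .vmas
            .iter()
            .map(|v| Vma { flags: v.flags & !VM_LOCK_MASK, ..v.clone() })
            .collect();
        AddressSpace {
            vmas,
            locked_vm: 0,
            def_flags: self.def_flags & !VM_LOCK_MASK,
        }
    }
}

/// A PROT_NONE VMA has none of the access bits set.
fn vma_is_accessible(vma: &Vma) -> bool {
    vma.flags & (VM_READ | VM_WRITE | VM_EXEC) != 0
}
```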
fslongjin (Member)

bugbot run

use ida::IdAllocator;
use log::warn;
use system_error::SystemError;
use x86::current;

Unused import x86::current accidentally added

Low Severity

The import use x86::current; is added but never used anywhere in the file. The current() calls in this file refer to AddressSpace::current() and self.is_current() methods, not the x86 crate's current module. This appears to be accidentally committed code.


kaleidoscope416 (Contributor, Author)

bugbot run


cursor bot commented Jan 8, 2026

Skipping Bugbot: Unable to authenticate your request. Please make sure Bugbot is properly installed and configured for this repository.

fslongjin (Member)

bugbot run

fslongjin (Member)

bugbot run


// Perform mlock2 (supports MLOCK_ONFAULT)
let onfault = flags.contains(Mlock2Flags::MLOCK_ONFAULT);
addr_space.write().mlock(addr, aligned_len, onfault)?;

mlock2 missing address alignment before locking

High Severity

The mlock2 syscall is missing address alignment logic that exists in mlock. In sys_mlock.rs, the address is aligned down to page boundary and the length is adjusted by adding the page offset before aligning up. However, sys_mlock2.rs only aligns the length without adjusting for page offset, and passes the original unaligned addr to mlock() instead of an aligned address. When a user calls mlock2 with an unaligned address like 0x1001 and length 0x1000, the wrong memory range will be locked. The fix requires adding the same alignment logic: compute page_offset, create aligned_addr, and use adjusted_len = len.saturating_add(page_offset).

🔬 Verification Test

Why verification test was not possible: This is a kernel-level bug in a Rust OS kernel (DragonOS). It requires running within that specific kernel environment with the mlock2 syscall to demonstrate that unaligned addresses are handled incorrectly. The bug is clearly visible by comparing the alignment logic in sys_mlock.rs (lines 29-38) which properly handles alignment versus sys_mlock2.rs (lines 36, 85) which does not.
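The fix the review describes can be sketched as a shared helper that both sys_mlock and sys_mlock2 call. This is an illustration, not the PR's code. align_mlock_range is a hypothetical name, and 4 KiB pages are assumed. Instead of saturating, the sketch reports overflow as EINVAL, which is what Linux returns for a range that wraps around.

```rust
const PAGE_SIZE: usize = 4096;

#[derive(Debug, PartialEq)]
enum Errno {
    EINVAL,
}

/// Expand [addr, addr + len) to whole pages: align the start down and grow
/// the length by the page offset before rounding it up.
fn align_mlock_range(addr: usize, len: usize) -> Result<(usize, usize), Errno> {
    let page_offset = addr & (PAGE_SIZE - 1);
    let aligned_addr = addr - page_offset;
    let rounded = len
        .checked_add(page_offset)
        .and_then(|l| l.checked_add(PAGE_SIZE - 1))
        .ok_or(Errno::EINVAL)?;
    let aligned_len = rounded & !(PAGE_SIZE - 1);
    // The resulting range must not wrap around the address space.
    aligned_addr.checked_add(aligned_len).ok_or(Errno::EINVAL)?;
    Ok((aligned_addr, aligned_len))
}
```

For the reviewer's example, align_mlock_range(0x1001, 0x1000) yields (0x1000, 0x2000). The byte range 0x1001..0x2001 touches two pages, so both must be locked.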


}
}
// Update the locked_vm count so that the RLIMIT_MEMLOCK check is correct
self.locked_vm.fetch_add(total_locked, Ordering::Relaxed);

mlockall locked_vm accounting uses wrong count

Medium Severity

The mlockall() function for non-ONFAULT mode incorrectly uses the count of actually locked pages (returned by mlock_vma_pages_range) for locked_vm accounting, while mlock() uses the VMA size. This inconsistency causes locked_vm to be undercounted when not all pages in a VMA are currently mapped. The ONFAULT branch correctly uses VMA size. The total_locked variable should be calculated from VMA sizes like in the ONFAULT branch, not from the mlock_vma_pages_range return value, to maintain consistent accounting with mlock() and proper RLIMIT_MEMLOCK enforcement.

🔬 Verification Test

Why verification test was not possible: This is a kernel-level bug requiring the DragonOS kernel environment to test. The inconsistency is visible by comparing: mlock() at lines 1345-1351 uses VMA intersection size for newly_locked_pages, while mlockall() non-ONFAULT at lines 1504-1511 uses mlock_vma_pages_range() return value which only counts actually mapped pages per its documentation at line 115.
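Here is a sketch of the consistent accounting the review asks for. locked_vm is charged with the full size of every VMA that becomes locked, however many of its pages are currently mapped. The Vma model and the function name are hypothetical.

```rust
const PAGE_SIZE: usize = 4096;

/// Toy VMA: `locked` is true if VM_LOCKED was already set before mlockall().
struct Vma {
    start: usize,
    end: usize,
    locked: bool,
}

/// Pages to add to locked_vm for mlockall(MCL_CURRENT): the full size of
/// each newly locked VMA, however many of its pages are mapped right now.
fn mlockall_newly_locked_pages(vmas: &[Vma]) -> usize {
    vmas.iter()
        .filter(|v| !v.locked)
        .map(|v| (v.end - v.start) / PAGE_SIZE)
        .sum()
}
```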


if is_locked {
let page_count = page_count.data();
self.locked_vm.fetch_add(page_count, Ordering::Relaxed);
}

Missing RLIMIT check for MCL_FUTURE mmap locking

High Severity

When mlockall(MCL_FUTURE) sets def_flags to include VM_LOCKED, subsequent mmap() calls apply this flag via line 665 and increment locked_vm at lines 694-697. However, no RLIMIT_MEMLOCK check is performed in this code path. The RLIMIT check in sys_mmap.rs only triggers when map_flags.contains(MapFlags::MAP_LOCKED), which doesn't catch VM_LOCKED coming from def_flags. This allows a process to bypass its memory lock limit by calling mlockall(MCL_FUTURE) then making repeated mmap() calls without MAP_LOCKED, effectively locking unlimited memory.

🔬 Verification Test

Why verification test was not possible: This is a kernel-level security issue in DragonOS requiring kernel testing. The bug is evident from code analysis: sys_mmap.rs line 89 only checks RLIMIT when map_flags.contains(MapFlags::MAP_LOCKED), but map_inner at line 665 applies def_flags which may contain VM_LOCKED from mlockall(MCL_FUTURE), and lines 690-697 increment locked_vm without any limit check.
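One way to close this gap is to compute the effective VMA flags (the requested flags OR'd with def_flags) before the limit check. VM_LOCKED inherited from mlockall(MCL_FUTURE) is then charged exactly like MAP_LOCKED. This mirrors how Linux's mmap path checks the final vm_flags. The function name, the flag value, and the choice of EAGAIN are assumptions for illustration.

```rust
const PAGE_SIZE: usize = 4096;
const VM_LOCKED: u64 = 1 << 0; // hypothetical bit value

#[derive(Debug, PartialEq)]
enum Errno {
    EAGAIN,
}

/// Compute the final VMA flags for a new mapping and enforce RLIMIT_MEMLOCK
/// if they include VM_LOCKED, whether it came from MAP_LOCKED or def_flags.
fn mmap_vm_flags_checked(
    requested: u64,
    def_flags: u64,
    len: usize,
    locked_pages: usize,
    rlimit_bytes: usize,
) -> Result<u64, Errno> {
    let vm_flags = requested | def_flags;
    if vm_flags & VM_LOCKED != 0 {
        let new_pages = len / PAGE_SIZE + usize::from(len % PAGE_SIZE != 0);
        if locked_pages.saturating_add(new_pages) > rlimit_bytes / PAGE_SIZE {
            return Err(Errno::EAGAIN);
        }
    }
    Ok(vm_flags)
}
```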


// If the page is evictable, move it back to the normal LRU
// Note: must check whether the page can really be evicted (map_count == 0)
if page_guard.can_deallocate() {
page_guard.remove_flags(PageFlags::PG_UNEVICTABLE);

munlock_page never clears PG_UNEVICTABLE flag

High Severity

The munlock_page function uses can_deallocate() to decide whether to remove the PG_UNEVICTABLE flag when the mlock count reaches zero. However, can_deallocate() is defined as map_count() == 0 && !self.flags.contains(PageFlags::PG_UNEVICTABLE). Since PG_UNEVICTABLE is still set at this point, can_deallocate() always returns false, meaning PG_UNEVICTABLE is never removed. The comment on line 95 states the intent is to check map_count() == 0, but the code incorrectly uses can_deallocate() which creates a circular dependency. This causes any page that was ever mlock'd to permanently remain unevictable, creating a memory leak.

🔬 Verification Test

Why verification test was not possible: This is a kernel-level logic bug in DragonOS that requires kernel debugging to verify. The bug is evident from static analysis: can_deallocate() in page.rs line 600-601 checks !PG_UNEVICTABLE, but in munlock_page at line 96, PG_UNEVICTABLE hasn't been removed yet, so the check always fails and line 97 is never reached.
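Here is a sketch of munlock_page without the circular check. It uses a toy Page model whose field and constant names echo the PR but are hypothetical. When mlock_count drops to zero, PG_MLOCKED and PG_UNEVICTABLE are both cleared directly instead of consulting can_deallocate(). That check is false by definition while PG_UNEVICTABLE is set. If the real kernel can set PG_UNEVICTABLE for reasons other than mlock, those reasons would need their own check at this point.

```rust
const PG_MLOCKED: u32 = 1 << 0; // hypothetical bit values
const PG_UNEVICTABLE: u32 = 1 << 1;

/// Toy page descriptor with the fields the review mentions.
struct Page {
    flags: u32,
    mlock_count: usize,
    map_count: usize,
}

impl Page {
    /// Same definition the review quotes: true only if unmapped AND evictable.
    fn can_deallocate(&self) -> bool {
        self.map_count == 0 && self.flags & PG_UNEVICTABLE == 0
    }
}

fn mlock_page(page: &mut Page) {
    page.mlock_count += 1;
    if page.mlock_count == 1 {
        page.flags |= PG_MLOCKED | PG_UNEVICTABLE;
    }
}

fn munlock_page(page: &mut Page) {
    if page.mlock_count == 0 {
        return; // not locked; nothing to undo
    }
    page.mlock_count -= 1;
    if page.mlock_count == 0 {
        // Clear both flags directly. Gating this on can_deallocate() would
        // never succeed, because that check requires PG_UNEVICTABLE to be clear.
        page.flags &= !(PG_MLOCKED | PG_UNEVICTABLE);
    }
}
```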


kaleidoscope416 marked this pull request as draft January 10, 2026 16:04