feat(mm): add mlock-family system calls #1619

kaleidoscope416 · 2026-01-08T05:26:40Z

Overview

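This PR implements the Linux-compatible mlock/munlock/mlockall/munlockall family of system calls, which keep critical memory pages from being swapped out. This is an important feature for workloads such as databases and cryptographic applications that need their memory to stay resident.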

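Related commits (in chronological order):

faf59f0 - Set the default value of RLIMIT_MEMLOCK
5b55c44 - Mutual-exclusion check between MADV_DONTNEED and VM_LOCKED, and more
3ea434d - Remove redundant interfaces and logic
48c854d - Complete the logic for locking physical pages
913a80a - Add unit tests
19581f7 - Remove one lock
6894c04 - Define constants separately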

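Main features

Core system calls

mlock(addr, len) - locks the already-mapped pages in the specified memory range
mlock2(addr, len, flags) - locking with MLOCK_ONFAULT support: locks the already-mapped pages in the specified range and defers locking of its unmapped pages until they are faulted in
munlock(addr, len) - unlocks the specified memory range
mlockall(flags) - locks all memory (supports MCL_CURRENT and MCL_FUTURE)
munlockall() - unlocks all memory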
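As a rough illustration of the flag rules behind these calls, the sketch below validates mlock2 and mlockall flags the way Linux does: mlock2 accepts only MLOCK_ONFAULT, while mlockall needs MCL_CURRENT or MCL_FUTURE, rejects unknown bits, and rejects MCL_ONFAULT on its own. The constant values are the Linux x86_64 ones and are assumed, not confirmed, to match DragonOS; `check_mlock2_flags`, `check_mlockall_flags`, and `Errno` are hypothetical names.

```rust
/// Errors the sketch can return; the name mirrors the Linux errno value.
#[derive(Debug, PartialEq, Eq)]
pub enum Errno {
    EINVAL,
}

// Linux x86_64 values (asm-generic mman headers); assumed to match DragonOS.
pub const MLOCK_ONFAULT: u32 = 0x01;
pub const MCL_CURRENT: u32 = 1;
pub const MCL_FUTURE: u32 = 2;
pub const MCL_ONFAULT: u32 = 4;

/// mlock2: MLOCK_ONFAULT is the only valid flag.
/// Returns Ok(true) when locking of unmapped pages should wait for a fault.
pub fn check_mlock2_flags(flags: u32) -> Result<bool, Errno> {
    if (flags & !MLOCK_ONFAULT) != 0 {
        return Err(Errno::EINVAL);
    }
    Ok((flags & MLOCK_ONFAULT) != 0)
}

/// mlockall: at least MCL_CURRENT or MCL_FUTURE, no unknown bits,
/// and MCL_ONFAULT by itself is not enough.
pub fn check_mlockall_flags(flags: u32) -> Result<(), Errno> {
    let known = MCL_CURRENT | MCL_FUTURE | MCL_ONFAULT;
    if flags == 0 || (flags & !known) != 0 || flags == MCL_ONFAULT {
        return Err(Errno::EINVAL);
    }
    Ok(())
}
```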

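Page management

Per-page reference counting: Page::mlock_count tracks how many times a page has been locked
Flag management:
- PG_MLOCKED - marks the page as locked
- PG_UNEVICTABLE - prevents the page from being swapped out
Huge-page support: correctly handles locking of the subpages of 2MB/1GB huge pages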
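A minimal sketch of the per-page counting, assuming a toy `ToyPage` type (hypothetical, not DragonOS's `Page`) and made-up flag bit values: the first lock sets PG_MLOCKED and PG_UNEVICTABLE, and only the last unlock clears them. For brevity it ignores huge pages and the map_count condition the PR applies before clearing PG_UNEVICTABLE.

```rust
// Hypothetical bit values; the real PageFlags layout is not shown in this PR.
const PG_MLOCKED: u32 = 1 << 0;
const PG_UNEVICTABLE: u32 = 1 << 1;

/// Toy stand-in for a physical page, keeping only what mlock needs.
#[derive(Default)]
pub struct ToyPage {
    pub flags: u32,
    pub mlock_count: usize,
}

impl ToyPage {
    /// Take one more lock; the first one marks the page locked and unevictable.
    pub fn mlock(&mut self) {
        if self.mlock_count == 0 {
            self.flags |= PG_MLOCKED | PG_UNEVICTABLE;
        }
        self.mlock_count += 1;
    }

    /// Drop one lock; only the last one clears the flags.
    /// Unlocking a page that is not locked does nothing.
    pub fn munlock(&mut self) {
        if self.mlock_count == 0 {
            return;
        }
        self.mlock_count -= 1;
        if self.mlock_count == 0 {
            self.flags &= !(PG_MLOCKED | PG_UNEVICTABLE);
        }
    }

    pub fn is_evictable(&self) -> bool {
        (self.flags & PG_UNEVICTABLE) == 0
    }
}
```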

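Resource limits

RLIMIT_MEMLOCK check: limits the total amount of memory a process can lock
Default value: 64KB (the same as Linux x86_64)
CAP_IPC_LOCK: reserved in the framework (TODO: implement the permission check)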
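The limit check can be sketched as follows, modelled on Linux's `can_do_mlock()` and the RLIMIT_MEMLOCK test in `mlock()`: a zero limit without CAP_IPC_LOCK yields EPERM, and going over the limit yields ENOMEM unless the caller holds CAP_IPC_LOCK. It assumes 4 KiB pages; `LockCtx`, `check_memlock`, and the field names are hypothetical.

```rust
/// Errors the sketch can return; names mirror Linux errno values.
#[derive(Debug, PartialEq, Eq)]
pub enum Errno {
    EPERM,
    ENOMEM,
}

/// Assumes 4 KiB pages.
const PAGE_SHIFT: u32 = 12;

/// Hypothetical per-process inputs: in a kernel these would come from the
/// process's rlimits, its capabilities, and the address space's locked_vm.
pub struct LockCtx {
    pub rlimit_memlock_bytes: u64,
    pub has_cap_ipc_lock: bool,
    pub locked_vm_pages: u64,
}

/// Same rule as Linux's can_do_mlock(): locking is possible at all if the
/// limit is non-zero or the caller holds CAP_IPC_LOCK.
pub fn can_do_mlock(ctx: &LockCtx) -> bool {
    ctx.rlimit_memlock_bytes != 0 || ctx.has_cap_ipc_lock
}

/// Check whether `new_pages` additional pages may be locked.
pub fn check_memlock(ctx: &LockCtx, new_pages: u64) -> Result<(), Errno> {
    if !can_do_mlock(ctx) {
        return Err(Errno::EPERM);
    }
    // CAP_IPC_LOCK bypasses the limit entirely.
    if ctx.has_cap_ipc_lock {
        return Ok(());
    }
    let limit_pages = ctx.rlimit_memlock_bytes >> PAGE_SHIFT;
    let wanted = ctx.locked_vm_pages.saturating_add(new_pages);
    if wanted > limit_pages {
        return Err(Errno::ENOMEM);
    }
    Ok(())
}
```

With the 64KB default (16 pages of 4 KiB), a process that already has 10 pages locked can lock 6 more pages but not 7.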

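VMA management

VM_LOCKED flag: marks the pages in the VMA as pages that should be locked
VM_LOCKONFAULT flag: deferred locking; pages are locked only when they are faulted in
fork semantics: implemented correctly; the child process does not inherit the lock state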
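A sketch of the fork rule, using toy types and Linux's VM_LOCKED/VM_LOCKONFAULT bit values for illustration (DragonOS's actual types and values may differ): the child gets copies of the parent's VMAs with both lock flags cleared, a locked_vm of 0, and empty def_flags. `ToyVma`, `ToyAddressSpace`, and `fork_address_space` are hypothetical names.

```rust
// Linux's bit values, used here only for illustration; DragonOS's VmFlags may differ.
const VM_READ: u64 = 0x0000_0001;
const VM_LOCKED: u64 = 0x0000_2000;
const VM_LOCKONFAULT: u64 = 0x0008_0000;

#[derive(Clone, Debug, PartialEq)]
pub struct ToyVma {
    pub start: usize,
    pub end: usize,
    pub flags: u64,
}

#[derive(Debug)]
pub struct ToyAddressSpace {
    pub vmas: Vec<ToyVma>,
    /// Number of pages charged against RLIMIT_MEMLOCK.
    pub locked_vm: usize,
    /// Flags applied to every new mapping (set by mlockall(MCL_FUTURE)).
    pub def_flags: u64,
}

/// Duplicate an address space for fork() without carrying over any lock state.
pub fn fork_address_space(parent: &ToyAddressSpace) -> ToyAddressSpace {
    let vmas = parent
        .vmas
        .iter()
        .map(|vma| {
            let mut child_vma = vma.clone();
            // The mapping is copied, but the child's pages are not locked.
            child_vma.flags &= !(VM_LOCKED | VM_LOCKONFAULT);
            child_vma
        })
        .collect();
    ToyAddressSpace {
        vmas,
        locked_vm: 0, // the child starts with nothing charged
        def_flags: 0, // mlockall(MCL_FUTURE) does not carry over
    }
}
```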

Note

Introduces Linux-compatible memory locking with proper accounting and limits.

Add kernel/src/mm/mlock.rs with mlock_page/munlock_page, page-table walking (incl. huge-page subpages), and can_do_mlock
New syscalls: mlock, mlock2 (MLOCK_ONFAULT), mlockall (MCL_CURRENT|MCL_FUTURE|MCL_ONFAULT), munlock, munlockall; validate ranges, align, CAP/RLIMIT checks
AddressSpace/VMA: track locked_vm and def_flags; implement mlock/munlock/mlockall/munlockall; apply VM_LOCKED/VM_LOCKONFAULT; do not inherit locks on fork
Page layer: add PG_MLOCKED, use PG_UNEVICTABLE, and per-page mlock_count
Fault handling: when VM_LOCKONFAULT set, lock anon pages on demand in do_anonymous_page
madvise: return EINVAL for MADV_DONTNEED on VM_LOCKED VMAs
mmap/mremap: enforce RLIMIT_MEMLOCK for MAP_LOCKED and locked VMA expansion, returning EPERM/EAGAIN/ENOMEM as appropriate
Process: set default RLIMIT_MEMLOCK to 64KB
Tests: add user/apps/c_unitest/test_mlock.c; whitelist mlock_test

Written by Cursor Bugbot for commit a4aa5ad. This will update automatically on new commits.
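The madvise rule in the summary above comes down to a per-VMA check. A minimal sketch, assuming Linux's MADV_DONTNEED (4), MADV_WILLNEED (3), and VM_LOCKED values; `madvise_check` and `Errno` are hypothetical names, not DragonOS's API.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Errno {
    EINVAL,
}

// Linux values, assumed for illustration.
const MADV_WILLNEED: i32 = 3;
const MADV_DONTNEED: i32 = 4;
const VM_LOCKED: u64 = 0x2000;

/// Locked memory must not be discarded through madvise(MADV_DONTNEED);
/// other advice values are not affected by VM_LOCKED in this sketch.
pub fn madvise_check(vma_flags: u64, advice: i32) -> Result<(), Errno> {
    if advice == MADV_DONTNEED && (vma_flags & VM_LOCKED) != 0 {
        return Err(Errno::EINVAL);
    }
    Ok(())
}
```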

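- Return EINVAL when MADV_DONTNEED is called on a VMA that has VM_LOCKED set
- This matches Linux semantics: locked memory cannot be released through madvise

2. sys_mlock.rs - RLIMIT_MEMLOCK resource-limit check
- Add the RLIMIT_MEMLOCK limit check
- Compute the number of currently locked pages plus the number of requested pages
- Return ENOMEM when the limit is exceeded

3. sys_mmap.rs - resource limit for the MAP_LOCKED flag
- Perform the RLIMIT_MEMLOCK check for MAP_LOCKED mappings
- Check the can_do_mlock() permission
- Return EAGAIN when the limit is exceeded

4. sys_mremap.rs - resource check when expanding a locked VMA
- When expanding an already-locked VMA, check the number of additional pages required
- Validate against the RLIMIT_MEMLOCK limit

5. ucontext.rs - fork semantics and helper functions
- Fix fork semantics: the child should not inherit the locked_vm count (it starts from 0)
- Fix fork semantics: the child should not inherit def_flags (the default flags set by mlockall)
- Fix fork semantics: the child should not inherit the VM_LOCKED and VM_LOCKONFAULT flags
- Add the vma_is_accessible() helper
- Check VMA accessibility before mlock (PROT_NONE should return ENOMEM)
- Add a locked_vm() getter
- Apply def_flags at mmap time (supports mlockall MCL_FUTURE)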
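A sketch of the sys_mremap.rs check, assuming 4 KiB pages and hypothetical names: only the growth of a VM_LOCKED VMA is charged, because its existing pages are already counted in locked_vm. Linux returns EAGAIN in this situation; the sketch assumes the PR does the same.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Errno {
    EAGAIN,
}

/// Assumes 4 KiB pages.
const PAGE_SIZE: usize = 4096;

/// Check a growing mremap of a VMA against RLIMIT_MEMLOCK (given in pages).
pub fn check_mremap_grow(
    vma_locked: bool,
    old_len: usize,
    new_len: usize,
    locked_vm_pages: usize,
    limit_pages: usize,
    has_cap_ipc_lock: bool,
) -> Result<(), Errno> {
    // Unlocked VMAs, shrinking or same-size remaps, and privileged callers are not charged.
    if !vma_locked || new_len <= old_len || has_cap_ipc_lock {
        return Ok(());
    }
    let growth = new_len - old_len;
    // Round the growth up to whole pages without risking overflow.
    let extra_pages = growth / PAGE_SIZE + usize::from(growth % PAGE_SIZE != 0);
    if locked_vm_pages.saturating_add(extra_pages) > limit_pages {
        return Err(Errno::EAGAIN);
    }
    Ok(())
}
```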

fslongjin · 2026-01-08T05:30:02Z

bugbot run

kernel/src/mm/ucontext.rs

cursor · 2026-01-08T05:36:45Z

kernel/src/mm/ucontext.rs

 use ida::IdAllocator;
 use log::warn;
 use system_error::SystemError;
+use x86::current;


Unused import x86::current accidentally added

Low Severity

The import use x86::current; is added but never used anywhere in the file. The current() calls in this file refer to AddressSpace::current() and self.is_current() methods, not the x86 crate's current module. This appears to be accidentally committed code.

kernel/src/mm/syscall/sys_mlock2.rs

kernel/src/mm/syscall/sys_mlockall.rs

kernel/src/mm/syscall/sys_mmap.rs

kernel/src/mm/syscall/sys_mlockall.rs

kaleidoscope416 · 2026-01-08T06:16:50Z

bugbot run

cursor · 2026-01-08T06:16:52Z

Skipping Bugbot: Unable to authenticate your request. Please make sure Bugbot is properly installed and configured for this repository.

fslongjin · 2026-01-08T11:47:04Z

bugbot run

kernel/src/mm/ucontext.rs

kernel/src/mm/mlock.rs

kernel/src/mm/syscall/sys_mlock.rs

fslongjin · 2026-01-10T08:04:48Z

bugbot run

cursor · 2026-01-10T08:13:31Z

kernel/src/mm/syscall/sys_mlock2.rs

+
+        // Perform mlock2 (with MLOCK_ONFAULT support)
+        let onfault = flags.contains(Mlock2Flags::MLOCK_ONFAULT);
+        addr_space.write().mlock(addr, aligned_len, onfault)?;


mlock2 missing address alignment before locking

High Severity

The mlock2 syscall is missing address alignment logic that exists in mlock. In sys_mlock.rs, the address is aligned down to page boundary and the length is adjusted by adding the page offset before aligning up. However, sys_mlock2.rs only aligns the length without adjusting for page offset, and passes the original unaligned addr to mlock() instead of an aligned address. When a user calls mlock2 with an unaligned address like 0x1001 and length 0x1000, the wrong memory range will be locked. The fix requires adding the same alignment logic: compute page_offset, create aligned_addr, and use adjusted_len = len.saturating_add(page_offset).

🔬 Verification Test

Why verification test was not possible: This is a kernel-level bug in a Rust OS kernel (DragonOS). It requires running within that specific kernel environment with the mlock2 syscall to demonstrate that unaligned addresses are handled incorrectly. The bug is clearly visible by comparing the alignment logic in sys_mlock.rs (lines 29-38) which properly handles alignment versus sys_mlock2.rs (lines 36, 85) which does not.
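The alignment the review asks for can be sketched as a small helper, assuming 4 KiB pages. `align_lock_range` is a hypothetical name, and overflow when rounding up is reported as `None`, leaving it to the caller to choose an errno.

```rust
/// Assumes 4 KiB pages.
const PAGE_SIZE: usize = 4096;

/// Page-align an (addr, len) request as the review describes for sys_mlock.rs:
/// round addr down, grow len by the page offset, then round len up.
/// Returns None if rounding the length up would overflow.
pub fn align_lock_range(addr: usize, len: usize) -> Option<(usize, usize)> {
    let page_offset = addr & (PAGE_SIZE - 1);
    let aligned_addr = addr - page_offset;
    let adjusted_len = len.saturating_add(page_offset);
    let aligned_len = adjusted_len.checked_add(PAGE_SIZE - 1)? & !(PAGE_SIZE - 1);
    Some((aligned_addr, aligned_len))
}
```

For the review's example, an address of 0x1001 with length 0x1000 becomes the two-page range that starts at 0x1000, so the last byte at 0x2000 is covered.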

cursor · 2026-01-10T08:13:31Z

kernel/src/mm/ucontext.rs

+                        }
+                    }
+                    // Update the locked_vm count so that the RLIMIT_MEMLOCK check stays accurate
+                    self.locked_vm.fetch_add(total_locked, Ordering::Relaxed);


mlockall locked_vm accounting uses wrong count

Medium Severity

The mlockall() function for non-ONFAULT mode incorrectly uses the count of actually locked pages (returned by mlock_vma_pages_range) for locked_vm accounting, while mlock() uses the VMA size. This inconsistency causes locked_vm to be undercounted when not all pages in a VMA are currently mapped. The ONFAULT branch correctly uses VMA size. The total_locked variable should be calculated from VMA sizes like in the ONFAULT branch, not from the mlock_vma_pages_range return value, to maintain consistent accounting with mlock() and proper RLIMIT_MEMLOCK enforcement.

🔬 Verification Test

Why verification test was not possible: This is a kernel-level bug requiring the DragonOS kernel environment to test. The inconsistency is visible by comparing: mlock() at lines 1345-1351 uses VMA intersection size for newly_locked_pages, while mlockall() non-ONFAULT at lines 1504-1511 uses mlock_vma_pages_range() return value which only counts actually mapped pages per its documentation at line 115.
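To make the accounting difference concrete, here is a sketch with a toy `VmaInfo` type (hypothetical) that contrasts the two ways of charging locked_vm. Both variants skip VMAs that were already locked, so nothing is counted twice.

```rust
/// Toy description of one VMA: its size in pages, how many of those pages
/// are currently mapped, and whether it was already locked before the call.
pub struct VmaInfo {
    pub pages: usize,
    pub mapped_pages: usize,
    pub already_locked: bool,
}

/// Accounting the review flags: only pages that happen to be mapped
/// at mlockall() time are charged.
pub fn locked_vm_by_mapped(vmas: &[VmaInfo]) -> usize {
    vmas.iter()
        .filter(|v| !v.already_locked)
        .map(|v| v.mapped_pages)
        .sum()
}

/// Accounting the review recommends, consistent with mlock(): charge the
/// whole VMA, so pages faulted in later are already covered.
pub fn locked_vm_by_size(vmas: &[VmaInfo]) -> usize {
    vmas.iter()
        .filter(|v| !v.already_locked)
        .map(|v| v.pages)
        .sum()
}
```

With two newly locked VMAs of 4 and 8 pages that have 1 and 3 pages mapped, the first variant charges 4 pages and the second charges 12; under the first, the 8 pages faulted in later are never charged against RLIMIT_MEMLOCK.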

cursor · 2026-01-10T08:13:31Z

kernel/src/mm/ucontext.rs

+        if is_locked {
+            let page_count = page_count.data();
+            self.locked_vm.fetch_add(page_count, Ordering::Relaxed);
+        }


Missing RLIMIT check for MCL_FUTURE mmap locking

High Severity

When mlockall(MCL_FUTURE) sets def_flags to include VM_LOCKED, subsequent mmap() calls apply this flag via line 665 and increment locked_vm at lines 694-697. However, no RLIMIT_MEMLOCK check is performed in this code path. The RLIMIT check in sys_mmap.rs only triggers when map_flags.contains(MapFlags::MAP_LOCKED), which doesn't catch VM_LOCKED coming from def_flags. This allows a process to bypass its memory lock limit by calling mlockall(MCL_FUTURE) then making repeated mmap() calls without MAP_LOCKED, effectively locking unlimited memory.

🔬 Verification Test

Why verification test was not possible: This is a kernel-level security issue in DragonOS requiring kernel testing. The bug is evident from code analysis: sys_mmap.rs line 89 only checks RLIMIT when map_flags.contains(MapFlags::MAP_LOCKED), but map_inner at line 665 applies def_flags which may contain VM_LOCKED from mlockall(MCL_FUTURE), and lines 690-697 increment locked_vm without any limit check.
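A minimal sketch of the fix the review implies: compute the mapping's effective flags (the request's flags plus def_flags) first, then apply the RLIMIT_MEMLOCK check whenever the result contains VM_LOCKED. The EAGAIN error follows Linux's mmap behaviour and the PR's MAP_LOCKED path; the function name, its parameters, and the VM_LOCKED value (Linux's) are assumptions.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Errno {
    EAGAIN,
}

// Linux's value, used for illustration only.
const VM_LOCKED: u64 = 0x2000;

/// `vm_flags` already includes MAP_LOCKED translated to VM_LOCKED, and
/// `def_flags` is what mlockall(MCL_FUTURE) installed. The check looks at
/// their union, not at MAP_LOCKED alone. Returns the flags for the new VMA.
pub fn mmap_memlock_check(
    vm_flags: u64,
    def_flags: u64,
    new_pages: usize,
    locked_vm_pages: usize,
    limit_pages: usize,
    has_cap_ipc_lock: bool,
) -> Result<u64, Errno> {
    let effective = vm_flags | def_flags;
    let will_lock = (effective & VM_LOCKED) != 0;
    if will_lock
        && !has_cap_ipc_lock
        && locked_vm_pages.saturating_add(new_pages) > limit_pages
    {
        return Err(Errno::EAGAIN);
    }
    Ok(effective)
}
```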

cursor · 2026-01-10T08:13:31Z

kernel/src/mm/mlock.rs

+        // If the page is evictable, move it back to the normal LRU
+        // Note: must check whether the page can really be evicted (map_count == 0)
+        if page_guard.can_deallocate() {
+            page_guard.remove_flags(PageFlags::PG_UNEVICTABLE);


munlock_page never clears PG_UNEVICTABLE flag

High Severity

The munlock_page function uses can_deallocate() to decide whether to remove the PG_UNEVICTABLE flag when the mlock count reaches zero. However, can_deallocate() is defined as map_count() == 0 && !self.flags.contains(PageFlags::PG_UNEVICTABLE). Since PG_UNEVICTABLE is still set at this point, can_deallocate() always returns false, meaning PG_UNEVICTABLE is never removed. The comment on line 95 states the intent is to check map_count() == 0, but the code incorrectly uses can_deallocate() which creates a circular dependency. This causes any page that was ever mlock'd to permanently remain unevictable, creating a memory leak.

🔬 Verification Test

Why verification test was not possible: This is a kernel-level logic bug in DragonOS that requires kernel debugging to verify. The bug is evident from static analysis: can_deallocate() in page.rs line 600-601 checks !PG_UNEVICTABLE, but in munlock_page at line 96, PG_UNEVICTABLE hasn't been removed yet, so the check always fails and line 97 is never reached.
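A minimal sketch of the circular dependency, and of the direct map_count check that the review reads from the code comment, using a toy page type with a made-up PG_UNEVICTABLE bit (not DragonOS's `Page`; all names are hypothetical):

```rust
// Hypothetical bit value.
const PG_UNEVICTABLE: u32 = 1 << 1;

pub struct ToyPage {
    pub flags: u32,
    pub map_count: usize,
    pub mlock_count: usize,
}

impl ToyPage {
    /// Same shape as the can_deallocate() quoted in the review.
    pub fn can_deallocate(&self) -> bool {
        self.map_count == 0 && (self.flags & PG_UNEVICTABLE) == 0
    }
}

/// Buggy shape: asks can_deallocate() while PG_UNEVICTABLE is still set,
/// so the condition is always false and the flag is never cleared.
pub fn munlock_page_buggy(page: &mut ToyPage) {
    page.mlock_count = page.mlock_count.saturating_sub(1);
    if page.mlock_count == 0 && page.can_deallocate() {
        page.flags &= !PG_UNEVICTABLE;
    }
}

/// Direct check described by the code comment: look at map_count itself.
pub fn munlock_page_fixed(page: &mut ToyPage) {
    page.mlock_count = page.mlock_count.saturating_sub(1);
    if page.mlock_count == 0 && page.map_count == 0 {
        page.flags &= !PG_UNEVICTABLE;
    }
}
```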

kaleidoscope416 added 9 commits January 4, 2026 20:53

Initial mlock implementation

8281b18

Set the default value of RLIMIT_MEMLOCK

faf59f0

Remove redundant interfaces and logic

3ea434d

Complete the logic for locking physical pages

48c854d

Add unit tests

913a80a

Remove one lock

19581f7

Define constants separately

6894c04

fmt

1951aa1

fmt

c97ebd5

cursor bot reviewed Jan 8, 2026


kaleidoscope416 added 9 commits January 8, 2026 13:39

fmt and enable tests

55bcc91

fix: increase locked_vm accordingly after mlockall so that the rlimit check stays accurate

9e5b43e

fix: check the flags in munlock so that unlocked pages are subtracted correctly

ce78f2c

fix: stop mlock from counting already-locked pages more than once

5bb1026

fix: add rlimit checks to mlock2 and mlockall

53bc828

fix: increase locked_vm when mmap locks memory

ae1c72a

fix: require mlockall to specify current or future

723b7d7

fmt

5a2c884

Mutability

8082e31

fmt

3f9c91e

cursor bot reviewed Jan 8, 2026


kernel/src/mm/ucontext.rs (resolved)

kernel/src/mm/mlock.rs (resolved)

kernel/src/mm/syscall/sys_mlock.rs (outdated, resolved)

kaleidoscope416 added 4 commits January 8, 2026 20:49

fix: fix mlock_page failing silently and a race condition in mlock

1a76ad2

Add the CAP_IPC_LOCK check

756afbc

Fix page-table walking for huge pages

8e6f8be

Align the start address down

5e835e0

fmt

a4aa5ad

cursor bot reviewed Jan 10, 2026


kaleidoscope416 marked this pull request as draft January 10, 2026 16:04
