feat(mm): add the mlock family of system calls #1619
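- Return EINVAL when MADV_DONTNEED is called on a VMA that has the VM_LOCKED flag set
- Matches Linux semantics: locked memory cannot be released through madvise

2. sys_mlock.rs - RLIMIT_MEMLOCK resource limit check
- Add the RLIMIT_MEMLOCK limit check
- Compute the number of currently locked pages and the number of requested pages
- Return ENOMEM when the limit is exceeded

3. sys_mmap.rs - Resource limits for the MAP_LOCKED flag
- Perform the RLIMIT_MEMLOCK check for MAP_LOCKED mappings
- Check the can_do_mlock() permission
- Return EAGAIN when the limit is exceeded

4. sys_mremap.rs - Resource checks when expanding a locked VMA
- When expanding an already locked VMA, check the number of additional pages required
- Validate the request against the RLIMIT_MEMLOCK limit

5. ucontext.rs - Fork semantics and helper functions
- Fix fork semantics: the child process must not inherit the locked_vm count (it starts from 0)
- Fix fork semantics: the child process must not inherit def_flags (the default flags set by mlockall)
- Fix fork semantics: the child process must not inherit the VM_LOCKED and VM_LOCKONFAULT flags
- Add the vma_is_accessible() helper function
- Check VMA accessibility before mlock (PROT_NONE should return ENOMEM)
- Add the locked_vm() getter method
- Apply def_flags on mmap (supports mlockall MCL_FUTURE)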
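The fork semantics listed for ucontext.rs can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `Vma`, `AddressSpaceModel`, `fork_address_space`, and the flag bit values are hypothetical stand-ins, not the kernel's actual types or constants.

```rust
// Illustrative flag bits only; the kernel defines its own values.
const VM_READ: u64 = 1 << 0;
const VM_WRITE: u64 = 1 << 1;
const VM_LOCKED: u64 = 1 << 13;
const VM_LOCKONFAULT: u64 = 1 << 19;
const VM_LOCKED_MASK: u64 = VM_LOCKED | VM_LOCKONFAULT;

// Hypothetical, simplified VMA: an address range plus its flags.
#[derive(Debug, Clone, PartialEq)]
struct Vma {
    start: usize,
    end: usize,
    flags: u64,
}

// Hypothetical, simplified address space.
#[derive(Debug, Clone, PartialEq)]
struct AddressSpaceModel {
    vmas: Vec<Vma>,
    def_flags: u64,   // defaults applied to future mappings (mlockall MCL_FUTURE)
    locked_vm: usize, // number of locked pages charged to this address space
}

// Copy the parent's layout, but drop everything related to memory locking.
fn fork_address_space(parent: &AddressSpaceModel) -> AddressSpaceModel {
    AddressSpaceModel {
        vmas: parent
            .vmas
            .iter()
            .map(|v| Vma {
                flags: v.flags & !VM_LOCKED_MASK, // no VM_LOCKED / VM_LOCKONFAULT in the child
                ..v.clone()
            })
            .collect(),
        def_flags: 0, // the child does not inherit def_flags
        locked_vm: 0, // the child's locked page count starts from 0
    }
}
```

In this model, a parent with one locked read/write VMA yields a child whose VMA keeps only the read/write bits, while the parent's own accounting is untouched.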
bugbot run
kernel/src/mm/ucontext.rs
Outdated
use ida::IdAllocator;
use log::warn;
use system_error::SystemError;
use x86::current;
Unused import x86::current accidentally added
Low Severity
The import use x86::current; is added but never used anywhere in the file. The current() calls in this file refer to AddressSpace::current() and self.is_current() methods, not the x86 crate's current module. This appears to be accidentally committed code.
bugbot run
Skipping Bugbot: Unable to authenticate your request. Please make sure Bugbot is properly installed and configured for this repository.
// Perform mlock2 (supports MLOCK_ONFAULT)
let onfault = flags.contains(Mlock2Flags::MLOCK_ONFAULT);
addr_space.write().mlock(addr, aligned_len, onfault)?;
mlock2 missing address alignment before locking
High Severity
The mlock2 syscall is missing address alignment logic that exists in mlock. In sys_mlock.rs, the address is aligned down to page boundary and the length is adjusted by adding the page offset before aligning up. However, sys_mlock2.rs only aligns the length without adjusting for page offset, and passes the original unaligned addr to mlock() instead of an aligned address. When a user calls mlock2 with an unaligned address like 0x1001 and length 0x1000, the wrong memory range will be locked. The fix requires adding the same alignment logic: compute page_offset, create aligned_addr, and use adjusted_len = len.saturating_add(page_offset).
🔬 Verification Test
Why verification test was not possible: This is a kernel-level bug in a Rust OS kernel (DragonOS). It requires running within that specific kernel environment with the mlock2 syscall to demonstrate that unaligned addresses are handled incorrectly. The bug is clearly visible by comparing the alignment logic in sys_mlock.rs (lines 29-38) which properly handles alignment versus sys_mlock2.rs (lines 36, 85) which does not.
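The alignment described in this comment can be sketched as a standalone helper. This is a minimal illustration, not the PR's code: `page_align_range` and the 4 KiB `PAGE_SIZE` are assumptions, and the sketch uses `checked_add` to report overflow explicitly, whereas the comment mentions `saturating_add`.

```rust
// Assumed page size for illustration.
const PAGE_SIZE: usize = 4096;

/// Hypothetical helper: expands `[addr, addr + len)` to page boundaries.
/// Returns `(aligned_addr, aligned_len)`, or `None` if the length overflows.
fn page_align_range(addr: usize, len: usize) -> Option<(usize, usize)> {
    // Distance from the start of the containing page.
    let page_offset = addr & (PAGE_SIZE - 1);
    // Round the address down to the page boundary.
    let aligned_addr = addr - page_offset;
    // The range grew at the front by `page_offset` bytes.
    let adjusted_len = len.checked_add(page_offset)?;
    // Round the adjusted length up to a whole number of pages.
    let aligned_len = adjusted_len.checked_add(PAGE_SIZE - 1)? & !(PAGE_SIZE - 1);
    Some((aligned_addr, aligned_len))
}
```

With the example from the comment, `page_align_range(0x1001, 0x1000)` yields `(0x1000, 0x2000)`: the request touches two pages, so both must be locked. Aligning only the length would give `0x1000` bytes starting at the unaligned `0x1001`.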
        }
    }
    // Update the locked_vm count so that the RLIMIT_MEMLOCK check stays correct
    self.locked_vm.fetch_add(total_locked, Ordering::Relaxed);
mlockall locked_vm accounting uses wrong count
Medium Severity
The mlockall() function for non-ONFAULT mode incorrectly uses the count of actually locked pages (returned by mlock_vma_pages_range) for locked_vm accounting, while mlock() uses the VMA size. This inconsistency causes locked_vm to be undercounted when not all pages in a VMA are currently mapped. The ONFAULT branch correctly uses VMA size. The total_locked variable should be calculated from VMA sizes like in the ONFAULT branch, not from the mlock_vma_pages_range return value, to maintain consistent accounting with mlock() and proper RLIMIT_MEMLOCK enforcement.
🔬 Verification Test
Why verification test was not possible: This is a kernel-level bug requiring the DragonOS kernel environment to test. The inconsistency is visible by comparing: mlock() at lines 1345-1351 uses VMA intersection size for newly_locked_pages, while mlockall() non-ONFAULT at lines 1504-1511 uses mlock_vma_pages_range() return value which only counts actually mapped pages per its documentation at line 115.
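The difference between the two accounting strategies can be shown with a toy model. The `VmaModel` type and both functions below are hypothetical and only mimic the counting described in this comment; they are not taken from the PR.

```rust
// Assumed page size for illustration.
const PAGE_SIZE: usize = 4096;

// Hypothetical VMA model: its range and how many of its pages are currently mapped.
struct VmaModel {
    start: usize,
    end: usize,
    mapped_pages: usize,
}

/// Accounting by VMA size, as the comment says mlock() and the ONFAULT branch do.
fn locked_pages_by_vma_size(vmas: &[VmaModel]) -> usize {
    vmas.iter().map(|v| (v.end - v.start) / PAGE_SIZE).sum()
}

/// Accounting by pages actually mapped, as the non-ONFAULT mlockall() branch
/// is reported to do. Unmapped pages are not charged.
fn locked_pages_by_mapped(vmas: &[VmaModel]) -> usize {
    vmas.iter().map(|v| v.mapped_pages).sum()
}
```

For two VMAs of 4 and 2 pages with only 1 and 2 pages mapped, the first function charges 6 pages and the second charges 3, which is the undercount the comment warns about.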
if is_locked {
    let page_count = page_count.data();
    self.locked_vm.fetch_add(page_count, Ordering::Relaxed);
}
Missing RLIMIT check for MCL_FUTURE mmap locking
High Severity
When mlockall(MCL_FUTURE) sets def_flags to include VM_LOCKED, subsequent mmap() calls apply this flag via line 665 and increment locked_vm at lines 694-697. However, no RLIMIT_MEMLOCK check is performed in this code path. The RLIMIT check in sys_mmap.rs only triggers when map_flags.contains(MapFlags::MAP_LOCKED), which doesn't catch VM_LOCKED coming from def_flags. This allows a process to bypass its memory lock limit by calling mlockall(MCL_FUTURE) then making repeated mmap() calls without MAP_LOCKED, effectively locking unlimited memory.
🔬 Verification Test
Why verification test was not possible: This is a kernel-level security issue in DragonOS requiring kernel testing. The bug is evident from code analysis: sys_mmap.rs line 89 only checks RLIMIT when map_flags.contains(MapFlags::MAP_LOCKED), but map_inner at line 665 applies def_flags which may contain VM_LOCKED from mlockall(MCL_FUTURE), and lines 690-697 increment locked_vm without any limit check.
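One way to close the gap described here is to base the limit check on the effective flags rather than on MAP_LOCKED alone. The sketch below is a user-space model under stated assumptions: `charge_locked_mmap`, `MmapError`, the flag bit, and the `privileged` parameter (standing in for a can_do_mlock()-style capability check) are all hypothetical.

```rust
// Assumed values for illustration only.
const PAGE_SIZE: usize = 4096;
const VM_LOCKED: u64 = 1 << 13;

#[derive(Debug, PartialEq)]
enum MmapError {
    Again, // stands in for EAGAIN
}

/// Hypothetical check: charge a new mapping against RLIMIT_MEMLOCK whenever it
/// will end up locked, whether through MAP_LOCKED or through def_flags.
fn charge_locked_mmap(
    map_locked: bool,
    def_flags: u64,
    len: usize,
    locked_vm: &mut usize,
    rlimit_memlock_bytes: usize,
    privileged: bool,
) -> Result<(), MmapError> {
    // The mapping is locked if either source requests it.
    let will_lock = map_locked || (def_flags & VM_LOCKED) != 0;
    if !will_lock {
        return Ok(());
    }
    // Round the length up to whole pages without risking overflow.
    let new_pages = len / PAGE_SIZE + usize::from(len % PAGE_SIZE != 0);
    let limit_pages = rlimit_memlock_bytes / PAGE_SIZE;
    let total = locked_vm.saturating_add(new_pages);
    if !privileged && total > limit_pages {
        return Err(MmapError::Again);
    }
    // Only charge the pages once the check has passed.
    *locked_vm = total;
    Ok(())
}
```

With a 64 KiB limit (16 pages), a first 64 KiB mapping made after mlockall(MCL_FUTURE) is accepted, and the next one-page mapping is rejected even though it does not pass MAP_LOCKED.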
// If the page can be swapped out, move it back to the normal LRU
// Note: need to check whether the page can really be swapped out (map_count == 0)
if page_guard.can_deallocate() {
    page_guard.remove_flags(PageFlags::PG_UNEVICTABLE);
munlock_page never clears PG_UNEVICTABLE flag
High Severity
The munlock_page function uses can_deallocate() to decide whether to remove the PG_UNEVICTABLE flag when the mlock count reaches zero. However, can_deallocate() is defined as map_count() == 0 && !self.flags.contains(PageFlags::PG_UNEVICTABLE). Since PG_UNEVICTABLE is still set at this point, can_deallocate() always returns false, meaning PG_UNEVICTABLE is never removed. The comment on line 95 states the intent is to check map_count() == 0, but the code incorrectly uses can_deallocate() which creates a circular dependency. This causes any page that was ever mlock'd to permanently remain unevictable, creating a memory leak.
🔬 Verification Test
Why verification test was not possible: This is a kernel-level logic bug in DragonOS that requires kernel debugging to verify. The bug is evident from static analysis: can_deallocate() in page.rs line 600-601 checks !PG_UNEVICTABLE, but in munlock_page at line 96, PG_UNEVICTABLE hasn't been removed yet, so the check always fails and line 97 is never reached.
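The circular dependency can be reproduced in a small model. `PageModel` and both functions are hypothetical; `can_deallocate()` follows the definition quoted in this comment. The "fixed" variant shows only one possible resolution (clear the flag once the mlock count reaches zero, without consulting `can_deallocate()`), which is an assumption rather than the fix chosen in the PR.

```rust
// Illustrative flag bit only.
const PG_UNEVICTABLE: u32 = 1 << 0;

// Hypothetical page model with the fields the comment refers to.
struct PageModel {
    flags: u32,
    map_count: usize,
    mlock_count: usize,
}

impl PageModel {
    // Mirrors the definition quoted in the review comment.
    fn can_deallocate(&self) -> bool {
        self.map_count == 0 && (self.flags & PG_UNEVICTABLE) == 0
    }
}

/// Models the reported bug: the flag is cleared only if can_deallocate() is true,
/// but can_deallocate() is false for as long as the flag is set.
fn munlock_page_buggy(page: &mut PageModel) {
    page.mlock_count = page.mlock_count.saturating_sub(1);
    if page.mlock_count == 0 && page.can_deallocate() {
        page.flags &= !PG_UNEVICTABLE; // never reached while the flag is set
    }
}

/// One possible fix (an assumption): drop the flag when the last lock goes away.
fn munlock_page_fixed(page: &mut PageModel) {
    page.mlock_count = page.mlock_count.saturating_sub(1);
    if page.mlock_count == 0 {
        page.flags &= !PG_UNEVICTABLE;
    }
}
```

Running both variants on a page with a single lock shows the difference: the buggy version leaves PG_UNEVICTABLE set, the fixed version clears it.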
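Overview
This PR implements the Linux-compatible mlock/munlock/mlockall/munlockall family of system calls, which prevent critical memory pages from being swapped out to swap space. This is an important feature for databases, cryptographic applications, and other workloads that need their memory to stay resident.
Related commits (in chronological order):
Main features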
Note
Introduces Linux-compatible memory locking with proper accounting and limits.
- kernel/src/mm/mlock.rs with mlock_page/munlock_page, page-table walking (incl. huge-page subpages), and can_do_mlock
- mlock, mlock2 (MLOCK_ONFAULT), mlockall (MCL_CURRENT|MCL_FUTURE|MCL_ONFAULT), munlock, munlockall; validate ranges, align, CAP/RLIMIT checks
- locked_vm and def_flags; implement mlock/munlock/mlockall/munlockall; apply VM_LOCKED/VM_LOCKONFAULT; do not inherit locks on fork
- PG_MLOCKED, use PG_UNEVICTABLE, and per-page mlock_count
- VM_LOCKONFAULT set, lock anon pages on demand in do_anonymous_page
- madvise: return EINVAL for MADV_DONTNEED on VM_LOCKED VMAs
- mmap/mremap: enforce RLIMIT_MEMLOCK for MAP_LOCKED and locked VMA expansion, returning EPERM/EAGAIN/ENOMEM as appropriate
- RLIMIT_MEMLOCK to 64KB
- user/apps/c_unitest/test_mlock.c; whitelist mlock_test

Written by Cursor Bugbot for commit a4aa5ad. This will update automatically on new commits.