-
-
Notifications
You must be signed in to change notification settings - Fork 169
feat(build): replace rootfs and user space app creation with nix base scripts #1505
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
bc23e7c to
d465518
Compare
flake 2 3 4
…s build process for vfat
d465518 to
b912281
Compare
- 修复了ext4 inode读写操作中的自旋锁死锁问题 - 添加了父目录指针支持,实现parent()方法 - 改进了块设备寻址逻辑,统一使用512字节LBA - 增强了根文件系统探测机制,支持ext4和FAT自动识别 - 修复了ELF加载器中解释器路径查找问题 - 更新了another_ext4依赖版本 Signed-off-by: longjin <longjin@DragonOS.org>
Split the test runner and test data into separate derivations (runner, tests) and expose them via symlinkJoin to avoid unnecessary Rust rebuilds. Use sourceByRegex for test sources and patchelf to set the interpreter/RPATH for bundled test binaries. Harden the image builder script: ensure build dir exists, use fakeroot for tar operations, chmod temp dirs before cleanup, adjust traps, and increase the disk image size to 5G. Add .envrc, ignore .direnv, and make minor Nix formatting/comment tweaks.
|
merge的ext4支持将在最终合并本pr前清理 |
|
@fslongjin 可以看看这个状态下可以先合并,后续工作长期来做? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR introduces a Nix-based build system for rootfs and userspace application creation in DragonOS, replacing the previous build approach. The changes enable unprivileged image building, declarative userspace package management, and support for both ext4 and vfat filesystem formats.
Key changes:
- Nix flake infrastructure for reproducible builds of rootfs images and QEMU runtime scripts
- Kernel support for automatic ext4/FAT filesystem detection and mounting
- Critical ext4 filesystem fixes including LBA addressing correction, deadlock prevention in PageCache operations, and symlink support for dynamic linker resolution
- Integration of gVisor syscall tests with proper Rust toolchain management
Reviewed changes
Copilot reviewed 19 out of 24 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| flake.nix | Main Nix flake defining build-rootfs and start applications for x86_64 |
| flake.lock | Lock file with dependency versions (contains future timestamps indicating version issues) |
| user/default.nix | Rootfs image build script generator using guestfish and libguestfs |
| user/rootfs-tar.nix | Docker-based rootfs tar creation using dockerTools.buildImage |
| user/apps/default.nix | Declaration of userspace applications including busybox, curl, dropbear, and gVisor tests |
| user/apps/about/default.nix | Nix derivation for the "about" utility |
| user/apps/tests/syscall/gvisor/default.nix | gVisor syscall test runner and test data packaging |
| user/apps/tests/syscall/gvisor/runner/Cargo.lock | Added to version control for Nix reproducibility |
| user/apps/tests/syscall/gvisor/runner/.gitignore | Removed Cargo.lock from ignore list |
| tools/qemu/default.nix | QEMU launch script generator with architecture-specific configuration |
| tools/qemu/qemu-firmware.nix | QEMU firmware files extraction from Debian package |
| tools/nix-dev-shell/flake.nix | Development shell with Rust toolchain from fenix |
| tools/nix-dev-shell/flake.lock | Dev shell dependency lock file |
| tools/run-qemu.sh | Added root=/dev/vda kernel parameter for rootfs mounting |
| kernel/src/filesystem/vfs/vcore.rs | Automatic filesystem type detection and mounting with ext4/FAT fallback |
| kernel/src/filesystem/ext4/filesystem.rs | Fixed root inode initialization with self_ref and parent pointers |
| kernel/src/filesystem/ext4/inode.rs | Fixed PageCache deadlock, added parent() method, symlink read support |
| kernel/src/filesystem/ext4/gendisk.rs | Critical fix for LBA addressing to use 512-byte units consistently |
| kernel/src/libs/elf.rs | Added symlink following for dynamic linker interpreter path resolution |
| kernel/src/process/exec.rs | Improved error handling by replacing panic with proper error propagation |
| kernel/crates/kdepends/Cargo.toml | Updated another_ext4 dependency to revision 8bc384254c |
| kernel/Cargo.lock | Updated another_ext4 dependency lock |
| .gitignore | Removed Nix-related files from ignore list to allow version control |
| .envrc | Added direnv configuration to use nix-dev-shell flake |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| } | ||
| } | ||
| info!("Successfully migrate rootfs to FAT32!"); | ||
| info!("Successfully migrate rootfs to {}!", fs_name); |
Copilot
AI
Dec 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Grammar issue: "Successfully migrate" should be "Successfully migrated" to match tense consistency.
| fn parent(&self) -> Result<Arc<dyn IndexNode>, SystemError> { | ||
| // 只有目录才有父目录的概念 | ||
| // 先检查当前inode是否为目录 | ||
| let guard = self.0.lock(); | ||
|
|
||
| // 如果存储了父级指针,直接返回 | ||
| if let Some(parent) = guard.parent.upgrade() { | ||
| return Ok(parent); | ||
| } | ||
|
|
||
| Err(SystemError::ENOENT) | ||
| } |
Copilot
AI
Dec 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parent() implementation doesn't handle the root inode case correctly. For the root inode, parent points to itself (as initialized in filesystem.rs line 84), which means calling parent() on the root will return itself rather than returning an appropriate error or the root itself with clear semantics. This could cause infinite loops in code that traverses the filesystem tree upward. Consider either: (1) returning a specific error like EINVAL for root, or (2) clearly documenting that root.parent() returns root itself, which matches Unix semantics where ".." in root points to root.
| ] else [ | ||
| "-device" "virtio-serial" "-device" "virtconsole,chardev=mux" | ||
| ])); | ||
| kernelCmdlinePart = if isNographic then "console=/dev/hvc0" else ""; |
Copilot
AI
Dec 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The kernelCmdlinePart only adds "console=/dev/hvc0" when nographic is true, but it doesn't include the "root=/dev/vda" parameter that was added in tools/run-qemu.sh line 255. This inconsistency means the nix-based QEMU script may fail to mount the root filesystem. Consider adding the root parameter here: kernelCmdlinePart = if isNographic then "root=/dev/vda console=/dev/hvc0" else "root=/dev/vda";
| kernelCmdlinePart = if isNographic then "console=/dev/hvc0" else ""; | |
| kernelCmdlinePart = if isNographic then "root=/dev/vda console=/dev/hvc0" else "root=/dev/vda"; |
user/apps/default.nix
Outdated
| static.busybox | ||
| static.curl | ||
| static.dropbear | ||
| pkgs.glibc |
Copilot
AI
Dec 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commented-out code should either be removed or documented with a reason why it's kept. The commented lines for gccNGPackages_15.libstdcxx and libcxx provide no context about why they're disabled or when they might be needed. Either remove them if they're not needed, or add a comment explaining their purpose and why they're temporarily disabled.
| pkgs.glibc | |
| pkgs.glibc | |
| # Optional C++ standard libraries for debugging C++ workloads. | |
| # They are excluded from the default rootfs to keep the image small, | |
| # but can be re-enabled here if C++ apps are added. |
| runHook preInstall | ||
| find $out/${installDir}/tests -type f -name '*_test' -exec install -m755 {} $out/${installDir}/tests \; || true | ||
| runHook postInstall |
Copilot
AI
Dec 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The runHook preInstall is called after the installation phase has already started and most installation operations have been performed. The hook should be called at the beginning of the installPhase, not after the tar extraction and file operations. Move runHook preInstall to the start of the installPhase (before line 61), and ensure postInstall is at the very end.
| guestfish -a "$TEMP_IMG" <<EOF | ||
| run | ||
| part-init /dev/sda ${partitionType} | ||
| part-add /dev/sda primary 2048 -2048 | ||
| mkfs ${rootfsType} /dev/sda1 | ||
| mount /dev/sda1 / | ||
| tar-in $FINAL_TAR / | ||
| chmod 0755 / | ||
| umount / | ||
| sync | ||
| shutdown | ||
| EOF |
Copilot
AI
Dec 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The guestfish heredoc uses unquoted shell variables (${partitionType}, ${rootfsType}, $FINAL_TAR) which could be vulnerable to command injection if these values are attacker-controlled. While partitionType and rootfsType are likely safe as they come from Nix parameters, it's better to quote the variables or validate them. Consider using proper quoting: mkfs "${rootfsType}" /dev/sda1 and tar-in "$FINAL_TAR" /
| trap 'chmod +w -R "$TEMP_DIR" "$EXTRACT_DIR" && rm -rf "$TEMP_DIR" "$EXTRACT_DIR"' EXIT | ||
| # 解压原始 tar,排除 /nix/store | ||
| echo " Extracting and not filtering..." |
Copilot
AI
Dec 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo in comment: "not filtering" should be clarified or removed as the comment is misleading. The code actually does filter by excluding 'proc', 'dev', and 'sys'.
| echo " Extracting and not filtering..." | |
| echo " Extracting (excluding proc, dev, sys)..." |
| Some(RootFsKind::Fat) => Ok(FATFileSystem::new(gendisk.clone())?), | ||
| None => { | ||
| // 兜底:按常见顺序尝试初始化(ext4 -> fat),便于未来扩展 probe 或处理特殊镜像。 | ||
| Ext4FileSystem::from_gendisk(gendisk.clone()).or_else(|_| { | ||
| let fat: Arc<FATFileSystem> = FATFileSystem::new(gendisk.clone())?; | ||
| Ok(fat) |
Copilot
AI
Dec 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type mismatch in filesystem initialization. FATFileSystem::new() returns Result<Arc<FATFileSystem>, SystemError> while Ext4FileSystem::from_gendisk() returns Result<Arc<dyn FileSystem>, SystemError>. The current code wraps the result in Ok() which creates a nested Result type that doesn't match the expected type. The code should cast FATFileSystem to the trait object: Some(RootFsKind::Fat) => FATFileSystem::new(gendisk.clone()).map(|fs| fs as Arc<dyn FileSystem>)
| Some(RootFsKind::Fat) => Ok(FATFileSystem::new(gendisk.clone())?), | |
| None => { | |
| // 兜底:按常见顺序尝试初始化(ext4 -> fat),便于未来扩展 probe 或处理特殊镜像。 | |
| Ext4FileSystem::from_gendisk(gendisk.clone()).or_else(|_| { | |
| let fat: Arc<FATFileSystem> = FATFileSystem::new(gendisk.clone())?; | |
| Ok(fat) | |
| Some(RootFsKind::Fat) => { | |
| FATFileSystem::new(gendisk.clone()).map(|fs| fs as Arc<dyn FileSystem>) | |
| } | |
| None => { | |
| // 兜底:按常见顺序尝试初始化(ext4 -> fat),便于未来扩展 probe 或处理特殊镜像。 | |
| Ext4FileSystem::from_gendisk(gendisk.clone()).or_else(|_| { | |
| let fat: Arc<FATFileSystem> = FATFileSystem::new(gendisk.clone())?; | |
| Ok(fat as Arc<dyn FileSystem>) |
| Some(RootFsKind::Fat) => Ok(FATFileSystem::new(gendisk.clone())?), | ||
| None => { | ||
| // 兜底:按常见顺序尝试初始化(ext4 -> fat),便于未来扩展 probe 或处理特殊镜像。 | ||
| Ext4FileSystem::from_gendisk(gendisk.clone()).or_else(|_| { | ||
| let fat: Arc<FATFileSystem> = FATFileSystem::new(gendisk.clone())?; | ||
| Ok(fat) |
Copilot
AI
Dec 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type mismatch in fallback filesystem initialization. The same issue as line 206 - FATFileSystem::new() returns Result<Arc<FATFileSystem>, SystemError> which needs to be cast to Arc<dyn FileSystem>. The code should be: let fat = FATFileSystem::new(gendisk.clone())?; Ok(fat as Arc<dyn FileSystem>)
| Some(RootFsKind::Fat) => Ok(FATFileSystem::new(gendisk.clone())?), | |
| None => { | |
| // 兜底:按常见顺序尝试初始化(ext4 -> fat),便于未来扩展 probe 或处理特殊镜像。 | |
| Ext4FileSystem::from_gendisk(gendisk.clone()).or_else(|_| { | |
| let fat: Arc<FATFileSystem> = FATFileSystem::new(gendisk.clone())?; | |
| Ok(fat) | |
| Some(RootFsKind::Fat) => { | |
| let fat = FATFileSystem::new(gendisk.clone())?; | |
| Ok(fat as Arc<dyn FileSystem>) | |
| } | |
| None => { | |
| // 兜底:按常见顺序尝试初始化(ext4 -> fat),便于未来扩展 probe 或处理特殊镜像。 | |
| Ext4FileSystem::from_gendisk(gendisk.clone()).or_else(|_| { | |
| let fat = FATFileSystem::new(gendisk.clone())?; | |
| Ok(fat as Arc<dyn FileSystem>) |
flake.nix
Outdated
| apps.${system} = { | ||
| start.${target} = { | ||
| type = "app"; | ||
| program = "${qemuScripts.${target}}/bin/run-dragonos-x86"; | ||
| }; | ||
| build-rootfs.${target} = { | ||
| type = "app"; | ||
| program = "${pkgs.callPackage ./user/default.nix { | ||
| inherit pkgs system fenix target syscallTestDir; | ||
| rootfsType = "ext4"; | ||
| buildDir = "./bin"; | ||
| }}/bin/build-rootfs-image"; | ||
| }; | ||
| }; |
Copilot
AI
Dec 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The apps structure uses nested attribute sets (apps.${system}.start.${target}) which creates an unusual access pattern. This means users need to run nix run .#start.x86_64 rather than the more common nix run .#start-x86_64. Consider flattening the structure to apps.${system}."start-${target}" for better ergonomics and consistency with common Nix flake patterns.
Introduce mkApps to generate app entries for x86_64 and riscv64, pass diskPath through the flake and user build scripts, refactor the QEMU runner to produce arch-specific args, and add cross/musl handling for user apps and rootfs generation
|
等我补充个文档 |
Add a reproducible Sphinx flake (docs/flake.nix + flake.lock) and a set of documentation pages (nix dev guide, devcontainer, userland/rootfs, dev-skills, write-docs). Enable myst linkify and update docs requirements. Add per-app flakes and locks (user/apps/about, user/apps/tests/syscall/gvisor) and integrate gvisor tests into user/apps default.nix.
|
等待 #1509 合入再rebase |
Add a debug argument (debug ? false) to tools/qemu and only pass "-s" "-S" when debug is enabled. Update flake to set debug = true and autotest = "none"
Introduce a `testOpt` overlay to consolidate various test-related options, including syscall test configuration and autotest settings. This change centralizes these settings, making them easier to manage and pass down through the Nix expression.
Update `flake.nix` to use the correct executable names for `dragonos-run` and `dragonos-rootfs`. This commit also adds a section to `nix-skills.md` to clarify how to avoid redundant `nix run` commands in the same terminal.
f4bd17d to
3db2732
Compare
实现:
目前本方案的缺点
重复构建时,如果实时修改用户态程序且多次构建,由于每次导入都会对整个rootfs构建一次镜像,容易快速消耗存储空间,需要定期执行
nix store gc(可改进)由于同样的空间占用考虑,build-rootfs和start的产物为打包后经过链接的脚本(而非直接产物),执行脚本在同样的 bin 目录下生成 rootfs 镜像。(一般直接 nix run .#build-rootfs.x86_64 再 nix run .#start.x86_64 即可)
用法
安装nix并启用flake功能
使用 Makefile 来构建 kernel
如果磁盘占用很多,可以执行gc命令,回收 rootfs 构建时产生的镜像依赖
下一步工作