Skip to content

Conversation

@Samuka007
Copy link
Member

@Samuka007 Samuka007 commented Dec 18, 2025

实现:

  • 非特权下镜像构建
  • 声明式的用户态软件包构建
  • 对于 vfat 格式的rootfs,脚本内置实例化符号链接,代价是镜像大小来到 4.3G(可选)
  • 实现 ext4 格式 rootfs 的生成
  • 对于需要改造的用户态软件,直接使用nix构建上下文进行修改,无需自己另开仓库、另写脚本
  • 使用基于nix dockerTools的镜像构建方案,目录结构可维护性强
  • 构建系统不再局限于 ubuntu

目前本方案的缺点

重复构建时,如果实时修改用户态程序且多次构建,由于每次导入都会对整个rootfs构建一次镜像,容易快速消耗存储空间,需要定期执行 nix store gc (可改进)

由于同样的空间占用考虑,build-rootfs和start的产物为打包后经过链接的脚本(而非直接产物),执行脚本在同样的 bin 目录下生成 rootfs 镜像。(一般直接 nix run .#build-rootfs.x86_64 再 nix run .#start.x86_64 即可)

用法

  • checkout
git checkout build/nix-qemu
  • 安装nix并启用flake功能

  • 使用 Makefile 来构建 kernel

make kernel
  • (Optional) 利用nix develop构建kernel
    如果你没有配置 Rust 环境,可以先使用nix的 Rust 环境:
nix develop ./tools/nix-dev-shell

然后做一些dirty的修复,因为非rustup安装的cargo不支持 +version 的语法

grep -rlZ '+nightly-2025-08-10' ./build-scripts | xargs -0 sed -i 's/+nightly-2025-08-10//g'
grep -rlZ '+nightly-2025-08-10' ./kernel | xargs -0 sed -i 's/+nightly-2025-08-10//g'
sed -i 's/CCPREFIX=x86_64-linux-gnu-/CCPREFIX=/g' kernel/env.mk

最后就可以照常地

make kernel
  • 构建rootfs

构建kernel的同时你也可以另开一个终端构建rootfs

nix run .#build-rootfs.x86_64
  • 启动 DragonOS
nix run .#start.x86_64

如果磁盘占用很多,可以执行gc命令,回收 rootfs 构建时产生的镜像依赖

nix store gc

下一步工作

  • 减小rootfs占用nix store的体积
  • 让gvisor syscall test能正常通过启动
  • 实现对deb包的打包兼容
  • 将kernel构建纳入
  • 允许更加灵活的binary注入
  • 打包rootfs img时从从零创建的范式转向原DADK的覆盖行为(保留先前系统内修改的内容,避免宿主机重复创建镜像导致的文件系统碎片)

@github-actions github-actions bot added the ambiguous The title of PR/issue doesn't match the format label Dec 18, 2025
Samuka007 and others added 5 commits December 19, 2025 13:21
- 修复了ext4 inode读写操作中的自旋锁死锁问题
- 添加了父目录指针支持,实现parent()方法
- 改进了块设备寻址逻辑,统一使用512字节LBA
- 增强了根文件系统探测机制,支持ext4和FAT自动识别
- 修复了ELF加载器中解释器路径查找问题
- 更新了another_ext4依赖版本

Signed-off-by: longjin <longjin@DragonOS.org>
Split the test runner and test data into separate derivations (runner,
tests) and expose them via symlinkJoin to avoid unnecessary Rust
rebuilds. Use sourceByRegex for test sources and patchelf to set the
interpreter/RPATH for bundled test binaries.

Harden the image builder script: ensure build dir exists, use fakeroot
for tar operations, chmod temp dirs before cleanup, adjust traps, and
increase the disk image size to 5G. Add .envrc, ignore .direnv, and make
minor Nix formatting/comment tweaks.
@Samuka007 Samuka007 linked an issue Dec 19, 2025 that may be closed by this pull request
@Samuka007 Samuka007 changed the title Build/nix qemu feat(build): replace rootfs and user space app creation with nix base scripts Dec 19, 2025
@github-actions github-actions bot added enhancement New feature or request and removed ambiguous The title of PR/issue doesn't match the format labels Dec 19, 2025
Copy link
Member Author

image.png

使用 patchelf 为 testcases 打上动态链接 patch 后,在nix构建的rootfs环境下有几率触发segment fault

@Samuka007
Copy link
Member Author

merge的ext4支持将在最终合并本pr前清理

@Samuka007 Samuka007 marked this pull request as ready for review December 19, 2025 13:20
@Samuka007
Copy link
Member Author

@fslongjin 可以看看这个状态下可以先合并,后续工作长期来做?

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a Nix-based build system for rootfs and userspace application creation in DragonOS, replacing the previous build approach. The changes enable unprivileged image building, declarative userspace package management, and support for both ext4 and vfat filesystem formats.

Key changes:

  • Nix flake infrastructure for reproducible builds of rootfs images and QEMU runtime scripts
  • Kernel support for automatic ext4/FAT filesystem detection and mounting
  • Critical ext4 filesystem fixes including LBA addressing correction, deadlock prevention in PageCache operations, and symlink support for dynamic linker resolution
  • Integration of gVisor syscall tests with proper Rust toolchain management

Reviewed changes

Copilot reviewed 19 out of 24 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
flake.nix Main Nix flake defining build-rootfs and start applications for x86_64
flake.lock Lock file with dependency versions (contains future timestamps indicating version issues)
user/default.nix Rootfs image build script generator using guestfish and libguestfs
user/rootfs-tar.nix Docker-based rootfs tar creation using dockerTools.buildImage
user/apps/default.nix Declaration of userspace applications including busybox, curl, dropbear, and gVisor tests
user/apps/about/default.nix Nix derivation for the "about" utility
user/apps/tests/syscall/gvisor/default.nix gVisor syscall test runner and test data packaging
user/apps/tests/syscall/gvisor/runner/Cargo.lock Added to version control for Nix reproducibility
user/apps/tests/syscall/gvisor/runner/.gitignore Removed Cargo.lock from ignore list
tools/qemu/default.nix QEMU launch script generator with architecture-specific configuration
tools/qemu/qemu-firmware.nix QEMU firmware files extraction from Debian package
tools/nix-dev-shell/flake.nix Development shell with Rust toolchain from fenix
tools/nix-dev-shell/flake.lock Dev shell dependency lock file
tools/run-qemu.sh Added root=/dev/vda kernel parameter for rootfs mounting
kernel/src/filesystem/vfs/vcore.rs Automatic filesystem type detection and mounting with ext4/FAT fallback
kernel/src/filesystem/ext4/filesystem.rs Fixed root inode initialization with self_ref and parent pointers
kernel/src/filesystem/ext4/inode.rs Fixed PageCache deadlock, added parent() method, symlink read support
kernel/src/filesystem/ext4/gendisk.rs Critical fix for LBA addressing to use 512-byte units consistently
kernel/src/libs/elf.rs Added symlink following for dynamic linker interpreter path resolution
kernel/src/process/exec.rs Improved error handling by replacing panic with proper error propagation
kernel/crates/kdepends/Cargo.toml Updated another_ext4 dependency to revision 8bc384254c
kernel/Cargo.lock Updated another_ext4 dependency lock
.gitignore Removed Nix-related files from ignore list to allow version control
.envrc Added direnv configuration to use nix-dev-shell flake

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

}
}
info!("Successfully migrate rootfs to FAT32!");
info!("Successfully migrate rootfs to {}!", fs_name);
Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grammar issue: "Successfully migrate" should be "Successfully migrated" to match tense consistency.

Copilot uses AI. Check for mistakes.
Comment on lines +277 to +288
fn parent(&self) -> Result<Arc<dyn IndexNode>, SystemError> {
// 只有目录才有父目录的概念
// 先检查当前inode是否为目录
let guard = self.0.lock();

// 如果存储了父级指针,直接返回
if let Some(parent) = guard.parent.upgrade() {
return Ok(parent);
}

Err(SystemError::ENOENT)
}
Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parent() implementation doesn't handle the root inode case correctly. For the root inode, parent points to itself (as initialized in filesystem.rs line 84), which means calling parent() on the root will return itself rather than returning an appropriate error or the root itself with clear semantics. This could cause infinite loops in code that traverses the filesystem tree upward. Consider either: (1) returning a specific error like EINVAL for root, or (2) clearly documenting that root.parent() returns root itself, which matches Unix semantics where ".." in root points to root.

Copilot uses AI. Check for mistakes.
] else [
"-device" "virtio-serial" "-device" "virtconsole,chardev=mux"
]));
kernelCmdlinePart = if isNographic then "console=/dev/hvc0" else "";
Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The kernelCmdlinePart only adds "console=/dev/hvc0" when nographic is true, but it doesn't include the "root=/dev/vda" parameter that was added in tools/run-qemu.sh line 255. This inconsistency means the nix-based QEMU script may fail to mount the root filesystem. Consider adding the root parameter here: kernelCmdlinePart = if isNographic then "root=/dev/vda console=/dev/hvc0" else "root=/dev/vda";

Suggested change
kernelCmdlinePart = if isNographic then "console=/dev/hvc0" else "";
kernelCmdlinePart = if isNographic then "root=/dev/vda console=/dev/hvc0" else "root=/dev/vda";

Copilot uses AI. Check for mistakes.
static.busybox
static.curl
static.dropbear
pkgs.glibc
Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented-out code should either be removed or documented with a reason why it's kept. The commented lines for gccNGPackages_15.libstdcxx and libcxx provide no context about why they're disabled or when they might be needed. Either remove them if they're not needed, or add a comment explaining their purpose and why they're temporarily disabled.

Suggested change
pkgs.glibc
pkgs.glibc
# Optional C++ standard libraries for debugging C++ workloads.
# They are excluded from the default rootfs to keep the image small,
# but can be re-enabled here if C++ apps are added.

Copilot uses AI. Check for mistakes.
Comment on lines +71 to +73
runHook preInstall
find $out/${installDir}/tests -type f -name '*_test' -exec install -m755 {} $out/${installDir}/tests \; || true
runHook postInstall
Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The runHook preInstall is called after the installation phase has already started and most installation operations have been performed. The hook should be called at the beginning of the installPhase, not after the tar extraction and file operations. Move runHook preInstall to the start of the installPhase (before line 61), and ensure postInstall is at the very end.

Copilot uses AI. Check for mistakes.
Comment on lines +98 to +109
guestfish -a "$TEMP_IMG" <<EOF
run
part-init /dev/sda ${partitionType}
part-add /dev/sda primary 2048 -2048
mkfs ${rootfsType} /dev/sda1
mount /dev/sda1 /
tar-in $FINAL_TAR /
chmod 0755 /
umount /
sync
shutdown
EOF
Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The guestfish heredoc uses unquoted shell variables (${partitionType}, ${rootfsType}, $FINAL_TAR) which could be vulnerable to command injection if these values are attacker-controlled. While partitionType and rootfsType are likely safe as they come from Nix parameters, it's better to quote the variables or validate them. Consider using proper quoting: mkfs "${rootfsType}" /dev/sda1 and tar-in "$FINAL_TAR" /

Copilot uses AI. Check for mistakes.
trap 'chmod +w -R "$TEMP_DIR" "$EXTRACT_DIR" && rm -rf "$TEMP_DIR" "$EXTRACT_DIR"' EXIT
# 解压原始 tar,排除 /nix/store
echo " Extracting and not filtering..."
Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in comment: "not filtering" should be clarified or removed as the comment is misleading. The code actually does filter by excluding 'proc', 'dev', and 'sys'.

Suggested change
echo " Extracting and not filtering..."
echo " Extracting (excluding proc, dev, sys)..."

Copilot uses AI. Check for mistakes.
Comment on lines +206 to +211
Some(RootFsKind::Fat) => Ok(FATFileSystem::new(gendisk.clone())?),
None => {
// 兜底:按常见顺序尝试初始化(ext4 -> fat),便于未来扩展 probe 或处理特殊镜像。
Ext4FileSystem::from_gendisk(gendisk.clone()).or_else(|_| {
let fat: Arc<FATFileSystem> = FATFileSystem::new(gendisk.clone())?;
Ok(fat)
Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type mismatch in filesystem initialization. FATFileSystem::new() returns Result<Arc<FATFileSystem>, SystemError> while Ext4FileSystem::from_gendisk() returns Result<Arc<dyn FileSystem>, SystemError>. The current code wraps the result in Ok() which creates a nested Result type that doesn't match the expected type. The code should cast FATFileSystem to the trait object: Some(RootFsKind::Fat) =&gt; FATFileSystem::new(gendisk.clone()).map(|fs| fs as Arc&lt;dyn FileSystem&gt;)

Suggested change
Some(RootFsKind::Fat) => Ok(FATFileSystem::new(gendisk.clone())?),
None => {
// 兜底:按常见顺序尝试初始化(ext4 -> fat),便于未来扩展 probe 或处理特殊镜像。
Ext4FileSystem::from_gendisk(gendisk.clone()).or_else(|_| {
let fat: Arc<FATFileSystem> = FATFileSystem::new(gendisk.clone())?;
Ok(fat)
Some(RootFsKind::Fat) => {
FATFileSystem::new(gendisk.clone()).map(|fs| fs as Arc<dyn FileSystem>)
}
None => {
// 兜底:按常见顺序尝试初始化(ext4 -> fat),便于未来扩展 probe 或处理特殊镜像。
Ext4FileSystem::from_gendisk(gendisk.clone()).or_else(|_| {
let fat: Arc<FATFileSystem> = FATFileSystem::new(gendisk.clone())?;
Ok(fat as Arc<dyn FileSystem>)

Copilot uses AI. Check for mistakes.
Comment on lines +206 to +211
Some(RootFsKind::Fat) => Ok(FATFileSystem::new(gendisk.clone())?),
None => {
// 兜底:按常见顺序尝试初始化(ext4 -> fat),便于未来扩展 probe 或处理特殊镜像。
Ext4FileSystem::from_gendisk(gendisk.clone()).or_else(|_| {
let fat: Arc<FATFileSystem> = FATFileSystem::new(gendisk.clone())?;
Ok(fat)
Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type mismatch in fallback filesystem initialization. The same issue as line 206 - FATFileSystem::new() returns Result<Arc<FATFileSystem>, SystemError> which needs to be cast to Arc<dyn FileSystem>. The code should be: let fat = FATFileSystem::new(gendisk.clone())?; Ok(fat as Arc&lt;dyn FileSystem&gt;)

Suggested change
Some(RootFsKind::Fat) => Ok(FATFileSystem::new(gendisk.clone())?),
None => {
// 兜底:按常见顺序尝试初始化(ext4 -> fat),便于未来扩展 probe 或处理特殊镜像。
Ext4FileSystem::from_gendisk(gendisk.clone()).or_else(|_| {
let fat: Arc<FATFileSystem> = FATFileSystem::new(gendisk.clone())?;
Ok(fat)
Some(RootFsKind::Fat) => {
let fat = FATFileSystem::new(gendisk.clone())?;
Ok(fat as Arc<dyn FileSystem>)
}
None => {
// 兜底:按常见顺序尝试初始化(ext4 -> fat),便于未来扩展 probe 或处理特殊镜像。
Ext4FileSystem::from_gendisk(gendisk.clone()).or_else(|_| {
let fat = FATFileSystem::new(gendisk.clone())?;
Ok(fat as Arc<dyn FileSystem>)

Copilot uses AI. Check for mistakes.
flake.nix Outdated
Comment on lines 29 to 42
apps.${system} = {
start.${target} = {
type = "app";
program = "${qemuScripts.${target}}/bin/run-dragonos-x86";
};
build-rootfs.${target} = {
type = "app";
program = "${pkgs.callPackage ./user/default.nix {
inherit pkgs system fenix target syscallTestDir;
rootfsType = "ext4";
buildDir = "./bin";
}}/bin/build-rootfs-image";
};
};
Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The apps structure uses nested attribute sets (apps.${system}.start.${target}) which creates an unusual access pattern. This means users need to run nix run .#start.x86_64 rather than the more common nix run .#start-x86_64. Consider flattening the structure to apps.${system}."start-${target}" for better ergonomics and consistency with common Nix flake patterns.

Copilot uses AI. Check for mistakes.
Introduce mkApps to generate app entries for x86_64 and riscv64, pass
diskPath through the flake and user build scripts, refactor the QEMU
runner to produce arch-specific args, and add cross/musl handling for
user apps and rootfs generation
@Samuka007 Samuka007 marked this pull request as draft December 20, 2025 06:43
@Samuka007
Copy link
Member Author

等我补充个文档

Add a reproducible Sphinx flake (docs/flake.nix + flake.lock) and a set
of documentation pages (nix dev guide, devcontainer, userland/rootfs,
dev-skills, write-docs). Enable myst linkify and update docs
requirements. Add per-app flakes and locks (user/apps/about,
user/apps/tests/syscall/gvisor) and integrate gvisor tests into
user/apps default.nix.
@Samuka007 Samuka007 marked this pull request as ready for review December 20, 2025 09:47
@Samuka007
Copy link
Member Author

等待 #1509 合入再rebase

Add a debug argument (debug ? false) to tools/qemu and only pass "-s"
"-S" when debug is enabled. Update flake to set debug = true and
autotest = "none"
Introduce a `testOpt` overlay to consolidate various test-related
options, including syscall test configuration and autotest settings.
This change centralizes these settings, making them easier to manage and
pass down through the Nix expression.
Update `flake.nix` to use the correct executable names for
`dragonos-run` and `dragonos-rootfs`. This commit also adds a section to
`nix-skills.md` to clarify how to avoid redundant `nix run` commands in
the same terminal.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[RFC] 使用 nix + guestfish 替代 DADK 用户程序构建

3 participants