Add blueprint/sled agent details #9524

labbott · 2025-12-16T14:14:20Z

No description provided.

andrewjstone

Overall looks good! 🚀

nexus/db-model/src/deployment.rs

sled-agent/types/versions/src/initial/inventory.rs

nexus/reconfigurator/planning/src/blueprint_builder/builder.rs

nexus/types/src/deployment.rs

andrewjstone · 2025-12-16T23:45:09Z

nexus/types/src/deployment.rs

+)]
+#[serde(tag = "type", rename_all = "snake_case")]
+pub enum BlueprintMeasurementSetDesiredContents {
+    /// This measurement source is whatever happens to be on the sled's


Looks like this comment is copy-pasta and could use an update.

I re-read this and it looks fine. I can't tell whether I already fixed it and GitHub is being weird, or whether this was in reference to another comment.

sled-agent/types/versions/src/measurements/inventory.rs

andrewjstone · 2025-12-16T23:56:07Z

sled-agent/types/versions/src/measurements/inventory.rs

+}
+
+impl IdOrdItem for ReconciledSingleMeasurement {
+    type Key<'a> = String;


I think you can change this to `type Key<'a> = &'a str` and then return `&self.file_name` from `key(&self)`, which would get rid of the clone.
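A minimal sketch of the borrowed-key suggestion, under stated assumptions: the trait below is a local, cut-down stand-in for `IdOrdItem` (the real one comes from the `iddqd` crate and has more to it), the struct has only the `file_name` field mentioned in the comment, and `sorted_keys` is a hypothetical helper added for illustration.

```rust
// Hypothetical, reduced stand-in for `IdOrdItem`; not the real trait.
trait IdOrdItem {
    type Key<'a>: Ord
    where
        Self: 'a;

    fn key(&self) -> Self::Key<'_>;
}

// Hypothetical shape: only `file_name` is taken from the review comment.
struct ReconciledSingleMeasurement {
    file_name: String,
}

impl IdOrdItem for ReconciledSingleMeasurement {
    // The key borrows from the item, so no `String` is cloned per lookup.
    type Key<'a> = &'a str where Self: 'a;

    fn key(&self) -> Self::Key<'_> {
        &self.file_name
    }
}

// Hypothetical helper: collect and sort the borrowed keys.
fn sorted_keys(items: &[ReconciledSingleMeasurement]) -> Vec<&str> {
    let mut keys: Vec<&str> = items.iter().map(|m| m.key()).collect();
    keys.sort();
    keys
}
```

The keys returned by `sorted_keys` are slices into the items themselves, which is the point of the change: ordering and lookup work on `&str` without allocating.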

sled-agent/types/versions/src/measurements/inventory.rs

davepacheco

I haven't made it very far yet but wanted to post some early questions.

davepacheco · 2025-12-18T18:54:03Z

dev-tools/reconfigurator-cli/tests/output/cmds-example-stdout

+            path on boot disk: /fake/path/install/zones.json
+            boot disk inventory:
+                manifest generated by installinator (mupdate ID: 00000000-0000-0000-0000-000000000000)
+                no artifacts in install dataset (this should only be seen in simulated systems)
+            no non-boot disks


Is it right that this duplicates the block above it?

I should at least change the path for testing but this is going to look very similar since they are being generated in very similar circumstances. The mupdate-id does get duplicated but I found it difficult to pull the mupdate ID up a level.

davepacheco · 2025-12-18T18:54:18Z

dev-tools/reconfigurator-cli/tests/output/cmds-example-stdout

        b957d6cf-f7b2-4bee-9928-c5fde8c59e04     crucible         install-dataset 
        e246f5e3-0650-4afc-860f-ee7114d309c5     crucible         install-dataset 
+    measurements: 
+    install dataset


What does this mean? That the current set of measurements that are in-use are coming from the install dataset?

Yes, see the comment in https://rfd.shared.oxide.computer/rfd/0512#_measurements_in_reconfigurator about install dataset

davepacheco · 2025-12-18T19:03:10Z

dev-tools/reconfigurator-cli/tests/output/cmds-example-stdout

+    reference measurements:
+        (measurement set is empty)


What's "reference measurements" here? Is this the set of measurements that Nexus has told Sled Agent are allowed right now?

I think more generally I'm confused about the difference between these three sections here ("measurements", "measurement manifest", and "reference measurements").

> What's "reference measurements" here? Is this the set of measurements that Nexus has told Sled Agent are allowed right now?

Correct.

> I think more generally I'm confused about the difference between these three sections here ("measurements", "measurement manifest", and "reference measurements").

The "measurement manifest" is the set of measurements on the install dataset that get placed during a MUPdate.

The "measurements" above are either a list of hashes for a Reconfigurator-based update or a directive to use the install dataset.

The "reference measurements" are the resolved paths we should actually pass to sprockets.

davepacheco · 2025-12-18T19:09:46Z

schema/crdb/measurements/up07.sql

+   ALTER COLUMN  measurement_manifest_boot_disk_path DROP default,
+   ALTER COLUMN  measurement_manifest_source DROP default,
+   ALTER COLUMN  measurement_manifest_mupdate_id DROP default,
+   ALTER COLUMN  measurement_manifest_boot_disk_error DROP default;


This is done in other cases when we want to add a new column without a default, but we need to supply a default temporarily when we add the column in order to migrate existing data:
https://github.com/oxidecomputer/omicron/tree/main/schema/crdb#211-adding-a-new-column-without-a-default-value

davepacheco · 2025-12-18T19:11:03Z

schema/crdb/dbinit.sql

    -- similar to `usable_hardware_threads` and friends above.
    cpu_family omicron.public.sled_cpu_family NOT NULL,

+    -- The path to the boot disk image file.


What is the "boot disk image file"? (It doesn't seem to be a disk image.)

davepacheco · 2025-12-18T19:27:50Z

schema/crdb/dbinit.sql

+    -- The path to the boot disk image file.
+    measurement_manifest_boot_disk_path TEXT NOT NULL,
+    -- The source of the zone manifest on the boot disk: from installinator or
+    -- sled-agent (synthetic). NULL means there is an error reading the zone manifest.
+    measurement_manifest_source omicron.public.inv_zone_manifest_source,
+    -- The mupdate ID that created the zone manifest if this is from installinator. If
+    -- this is NULL, then either the zone manifest is synthetic or there was an
+    -- error reading the zone manifest.
+    measurement_manifest_mupdate_id UUID,
+    -- Message describing the status of the zone manifest on the boot disk. If
+    -- this is NULL, then the zone manifest was successfully read, and the
+    -- inv_zone_manifest_zone table has entries corresponding to the zone
+    -- manifest.
+    measurement_manifest_boot_disk_error TEXT,


A lot of the comments here seem to be copy/paste of the zone manifest comments. What's different about these fields?

These are tracking the measurement manifest specifically. I missed updating the comments because it does look a lot like the zone manifest.

The measurement manifest exists on both the boot disk and the non-boot disk, and each copy can have its own error (e.g. the measurement file doesn't exist). As noted before, it also tracks the same mupdate override, because trying to pull that up to a higher level was a pretty big mess.

davepacheco · 2025-12-18T19:30:18Z

schema/crdb/dbinit.sql

    host_phase_2_desired_slot_b STRING(64),
+
+    -- measurement contents
+    measurements STRING(64)[],


What is this, exactly? How big is the array? What do the elements correspond to?

I'll clarify this comment

davepacheco

I've gone through the blueprint-related types here. I have not yet gone through the inventory or the db-model/db-queries stuff but I feel like I'm getting a feel for the structure of things now!

davepacheco · 2025-12-18T23:17:43Z

nexus/types/src/deployment.rs

+        }
+    }
+}
+#[derive(


Suggested change

-#[derive(
+/// Describes the set of software measurements that should be trusted by this sled (for trust quorum)
+#[derive(

(or something -- this seems like an important type and could use a doc comment)

davepacheco · 2025-12-18T23:20:18Z

nexus/types/src/deployment.rs

+)]
+#[serde(rename_all = "snake_case")]
+pub struct BlueprintMeasurementsDesiredContents {
+    pub measurements: BTreeSet<BlueprintSingleMeasurement>,


Not a big deal but maybe contributing to some confusion: it seems like an extra level of indirection here (sled.measurements.measurements is the set of measurements)?

Yes, this is a holdover from a very early draft where we were explicitly modeling "old measurements" and "new measurements". That turned out not to work well and felt redundant. I went back and forth on collapsing it back in. I kind of like keeping the abstraction in case we need it later, but I also realize it's probably going to be just as much work. Let me see if I can remove the extra layer.

davepacheco · 2025-12-18T23:31:12Z

nexus/types/src/deployment.rs

    }
 }

+#[derive(


Suggested change

-#[derive(
+/// Identifies a specific TUF repo artifact containing measurement data
+#[derive(

?

davepacheco · 2025-12-18T23:33:22Z

nexus/types/src/deployment.rs

+    pub version: BlueprintArtifactVersion,
+    pub hash: ArtifactHash,
+    pub prune: bool,


(sorry, rust-analyzer is failing me here)

I expected this to just be an ArtifactHash. Am I understanding correctly that version is just here for nicer human-readable output?

What's prune for? I can't find a use of it besides serialization and display. Maybe it's used in the bigger PR for decision making? If so, maybe add a comment here?

Yes, the version is mostly for debugging/extra checks.

prune definitely deserves a bigger comment here. It decides which measurements get included in the active set of measurements. https://rfd.shared.oxide.computer/rfd/0512#_reconfigurator_implementation_details has an example sequence; measurements that will be dropped next time are marked as prune.

(I saw there's been some discussion about word choice for similar(?) behavior in other parts of the system. I'm open to name changes to match.)
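One possible reading of the `prune` flag, sketched under assumptions rather than taken from the PR: entries already marked `prune` fall out of the next active set, and everything else carries forward. The struct is a simplified stand-in (the real one also has `version` and uses `ArtifactHash`), and `next_active_set` is a hypothetical function; the actual sequence is described in RFD 512.

```rust
use std::collections::BTreeSet;

// Simplified, hypothetical stand-in for the blueprint's measurement entry.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct SingleMeasurement {
    hash: String,
    prune: bool,
}

// Assumed behavior: entries marked `prune` are dropped from the next
// active set; all other entries are carried forward unchanged.
fn next_active_set(
    current: &BTreeSet<SingleMeasurement>,
) -> BTreeSet<SingleMeasurement> {
    current.iter().filter(|m| !m.prune).cloned().collect()
}
```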

davepacheco · 2025-12-18T23:47:41Z

nexus/types/src/deployment.rs

+        write!(f, "{} prune: {}", self.hash, self.prune)
+    }
+}
+/// Where the measurement source is located


Suggested change

-/// Where the measurement source is located
+/// For a particular sled, which measurements should be used for trusting other members of the cluster

davepacheco · 2025-12-19T00:08:43Z

nexus/types/src/deployment.rs

+}
+
+impl BlueprintMeasurementsDesiredContents {
+    pub fn default_contents() -> Self {


I'm always worried about people interpreting "default" differently. In particular, "default" seems like something that should be safe if you don't know or care what you're doing, but this is not a safe value because it would cause a sled to trust no measurements at all. (Right?)

This seems to be mostly used in tests. How about calling this empty_for_tests()?

The other user is blueprint_read(). We could instead load the per-sled measurement data first and then construct a fully-formed BlueprintMeasurementsDesiredContents with what we read... I like this approach much better but I see that blueprint_read() is already not doing this with zones and datasets so maybe that's not worth it. In that case though maybe have this called none() so it at least doesn't suggest to people that it's a safe default?

Edit: after reading more code, I think I was wrong in interpreting the empty set of measurements as "trusting nothing" because when we convert this to an OmicronMeasurements we translate an empty set to InstallDataset. That's counterintuitive to me -- I'd strongly suggest making that more explicit with an enum with two variants ... which brings me back to feeling like: could we delete BlueprintMeasurementsDesiredContents as it currently exists and rename BlueprintMeasurementSetDesiredContents to BlueprintMeasurementsDesiredContents? Or even just BlueprintSledMeasurements? ("desired" is implied for everything in the blueprint)
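A rough sketch of what the two-variant enum could look like, assuming simplified types: the name `BlueprintSledMeasurements` follows the suggestion above, while the variant names, the `String` hash, and the shape of `OmicronMeasurements` are all hypothetical stand-ins, not the real definitions.

```rust
use std::collections::BTreeSet;

// Hypothetical blueprint-side type with the install dataset made explicit.
#[derive(Debug, Clone, PartialEq, Eq)]
enum BlueprintSledMeasurements {
    /// Use whatever measurements are in the sled's install dataset.
    InstallDataset,
    /// Use exactly this set of artifact hashes (possibly empty).
    Artifacts(BTreeSet<String>),
}

// Hypothetical sled-agent-facing counterpart.
#[derive(Debug, PartialEq, Eq)]
enum OmicronMeasurements {
    InstallDataset,
    Hashes(Vec<String>),
}

impl From<&BlueprintSledMeasurements> for OmicronMeasurements {
    fn from(m: &BlueprintSledMeasurements) -> Self {
        match m {
            BlueprintSledMeasurements::InstallDataset => {
                OmicronMeasurements::InstallDataset
            }
            // An empty set stays empty here; it is never silently
            // reinterpreted as "use the install dataset".
            BlueprintSledMeasurements::Artifacts(set) => {
                OmicronMeasurements::Hashes(set.iter().cloned().collect())
            }
        }
    }
}
```

With this shape the conversion is a one-to-one match, so a reader of the blueprint can tell "install dataset" apart from "no measurements" without knowing about the empty-set convention.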

davepacheco · 2025-12-19T00:11:16Z

nexus/types/src/inventory/display.rs

                }
            }
+
+            writeln!(indented, "reference measurements:")?;


I wonder if we can be slightly more specific so that people who aren't steeped in all the measurement work would know which measurements these are. Would it be accurate to call this "computed set of measurements acceptable from other sleds"?

davepacheco · 2025-12-19T00:17:20Z

sled-agent/src/rack_setup/plan/service.rs

                    host_phase_2:
                        BlueprintHostPhase2DesiredSlots::current_contents(),
                    remove_mupdate_override: None,
+                    measurements: BlueprintMeasurementsDesiredContents::default_contents(),


This doesn't seem right. This means "trust nothing", but after RSS, the sled should be trusting what's in the install dataset, right? That would also make it analogous to how this works for zone images. There, the per-sled value in the blueprint is more analogous to OmicronMeasurementSetDesiredContents, which lets you say InstallDataset. I think we might want to do that here (but I might be missing why OmicronMeasurementsDesiredContents exists).

Edit: I see now that an empty set here does mean InstallDataset and I've commented elsewhere that I think we should make that more explicit.

davepacheco · 2025-12-19T00:34:16Z

nexus/reconfigurator/planning/src/blueprint_editor/sled_editor.rs

+                // TODO this will come in a subsequent PR
+                measurements:
+                    BlueprintMeasurementsDesiredContents::default_contents(),


I wonder if we could better track these call sites with a separate constructor, like placeholder()? Instead of relying on these TODOs?

(I just suggested something similar to Karen on another PR)
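The separate-constructor idea might look roughly like this. It is a sketch only: the struct is a simplified stand-in with a `String` element type, and both constructor names come from suggestions in this review, not from the merged code.

```rust
use std::collections::BTreeSet;

// Simplified, hypothetical stand-in for the blueprint type.
#[derive(Debug, PartialEq, Eq)]
struct BlueprintMeasurementsDesiredContents {
    measurements: BTreeSet<String>,
}

impl BlueprintMeasurementsDesiredContents {
    /// An explicitly empty set, named so it is not mistaken for a safe
    /// default.
    fn empty_for_tests() -> Self {
        Self { measurements: BTreeSet::new() }
    }

    /// Same value today, but a distinct name: searching for
    /// `placeholder()` finds every call site still waiting on real
    /// planner logic, without relying on TODO comments.
    fn placeholder() -> Self {
        Self { measurements: BTreeSet::new() }
    }
}
```

Once the follow-up work lands, deleting `placeholder()` would turn any remaining call site into a compile error.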

davepacheco · 2025-12-19T00:35:18Z

nexus/reconfigurator/planning/src/blueprint_editor/sled_editor.rs

                    .finalize(),
                host_phase_2: self.host_phase_2.finalize(),
+                // TODO this will come in a subsequent PR
+                measurements:


In this particular case, I feel like a safer default would be to use whatever was in the BlueprintSledConfig that we started with in creating this ActiveSledEditor. I feel like you should be able to grab that in new.

labbott marked this pull request as draft December 16, 2025 14:14

labbott force-pushed the measurement_inventory_blueprints branch from e13289c to 044523e Compare December 16, 2025 20:46

andrewjstone reviewed Dec 17, 2025


labbott force-pushed the measurement_inventory_blueprints branch 3 times, most recently from 4d5b5be to 4acf358 Compare December 17, 2025 18:47

Add blueprint/sled agent details

4acf358

labbott marked this pull request as ready for review December 18, 2025 15:44

labbott mentioned this pull request Dec 18, 2025

Drop as conversion in inventory #9541

Open

davepacheco reviewed Dec 18, 2025


labbott added 5 commits December 18, 2025 21:59

only do v11 (THANK YOU KAREN FOR THE HINT)

2ddaf59

I have altered the database pray I do not alter it further

12604e4

make those explicit TODO

f648154

Fix comment coffee/paste

4f407f3

Clarify that that

2d4dab6

davepacheco reviewed Dec 19, 2025


THAT WILL HELP

c03b47f

