Skip to content

Conversation

@europaul
Copy link
Contributor

@europaul europaul commented Jan 7, 2026

Description

The main part of this PR is the integration of https://github.com/jaypipes/ghw/ library into zedagent. We use it to collect info about the hardware that EVE is running on and to send this data in the HW inventory message.

Other changes include:

  • moving the SMART info from hardwareinfo message to hardwarehealth message as per the updated EVE API
  • deprecating timer.hardwareinfo.interval config parameter, because the hardwareinfo message is not sent periodically anymore

PR dependencies

How to test and validate this PR

For now only visual inspection of the HW inventory message and comparison with the actual HW configuration
Also I added the following Eden test: lf-edge/eden#1111

Changelog notes

Make EVE report the HW it's running on for easier onboarding.

PR Backports

No.

Checklist

  • I've provided a proper description
  • I've added the proper documentation
  • I've tested my PR on amd64 device
  • I've tested my PR on arm64 device
  • I've written the test verification instructions
  • I've set the proper labels to this PR

func getKernelFlavor() string {
// Try to deduce from kernel version or /proc/version
version := getKernelVersion()
if strings.Contains(version, "-rt") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is also -hwe flavor

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the implementation to extracting flavor from kernel version string.

@github-actions github-actions bot requested review from rene, rucoder and shjala January 13, 2026 15:26
@shjala
Copy link
Member

shjala commented Jan 13, 2026

why not upstreaming these changes? why we want to start maintaining a library?

@europaul
Copy link
Contributor Author

why not upstreaming these changes? why we want to start maintaining a library?

we want to upstream when it done. It's just faster to iterate on our own fork until we get the changes working and we get the info that we need

@europaul europaul force-pushed the integrate-ghw branch 2 times, most recently from 35e04fc to 45ee957 Compare January 15, 2026 18:56
@europaul europaul marked this pull request as ready for review January 15, 2026 18:57

func getFirmwareAttributes() map[string]string {
settings := make(map[string]string)
root := "/sys/class/firmware-attributes"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which hw/BIOS vendors populate this? (I don't see it on an old Supermicro nor on my laptop).
Does something need to be enabled in the kernel to get this info?

Copy link
Contributor Author

@europaul europaul Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We tied it on our ThinkPad laptops and it was working well, but it looks like this is not available on most servers.
The AI tells me that's it's mostly populated on Lenovo and Dell systems and needs CONFIG_THINK_LMI or CONFIG_DELL_WMI_SYSMAN respectively.

But I don't think there is a better unified solution to get the BIOS settings

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have those options in our current kconfig?

We can separately check if there are other kconfig options needed which will provide this across more manufacturers. @rucoder mentioned that there might be a tool to get this, but unless I misunderstood him. even that tool relies on /sys/class/firware-attributes being populated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we seem to have the support for Dell, but not for Lenovo

Comment on lines 107 to 108
InitStrategy: StrategyDellCmd,
DisplayStrategy: StrategyLedCmd,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the introduction of "Strategy", why do you need a separate strategy for init and for display?
Here it could be StrategyDell and you set an init function and display function baseed on that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that makes more sense 👍

@codecov
Copy link

codecov bot commented Jan 16, 2026

Codecov Report

❌ Patch coverage is 0% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 27.99%. Comparing base (2281599) to head (50cc172).
⚠️ Report is 206 commits behind head on master.

Files with missing lines Patch % Lines
pkg/pillar/evetpm/tpm.go 0.00% 13 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5535      +/-   ##
==========================================
+ Coverage   19.52%   27.99%   +8.46%     
==========================================
  Files          19       19              
  Lines        3021     2322     -699     
==========================================
+ Hits          590      650      +60     
+ Misses       2310     1528     -782     
- Partials      121      144      +23     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

for _, disk := range disksInfo.Disks {
stDiskInfo := new(info.StorageDiskInfo)
if disk.CollectingStatus != types.SmartCollectingStatusSuccess {
stDiskInfo.DiskName = *proto.String(disk.DiskName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this *proto.String for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's just useless boilerplate that we have across the code base - removing.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR integrates the ghw library (github.com/zededa/ghw) to collect and report comprehensive hardware inventory data from EVE devices. The main changes move SMART disk health data from periodic hardware info messages to the hardware health reporting, deprecate the timer.hardwareinfo.interval configuration parameter, and introduce hardware inventory capabilities.

Changes:

  • Added hardware inventory collection using ghw library for PCI, USB, CPU, memory, storage, serial, network, CAN, BIOS, TPM, and watchdog devices
  • Moved SMART disk information from hardwareinfo message to hardwarehealth message
  • Deprecated timer.hardwareinfo.interval configuration parameter as hardwareinfo is no longer sent periodically

Reviewed changes

Copilot reviewed 20 out of 237 changed files in this pull request and generated no comments.

Show a summary per file
File Description
pkg/pillar/types/global.go Removes deprecated HardwareInfoInterval configuration key and adds Failure() method to SenderStatus
pkg/pillar/types/global_test.go Updates test to remove HardwareInfoInterval from expected config items
pkg/pillar/hardware/smart.go Updates import to use zededa/ghw instead of jaypipes/ghw
pkg/pillar/hardware/inventory.go New file implementing hardware inventory collection using ghw library
pkg/pillar/hardware/inventory_test.go New test file for inventory collection functionality
pkg/pillar/hardware/kernel.go New file adding kernel version and flavor detection functions
pkg/pillar/hardware/kernel_test.go New test file for kernel flavor detection
pkg/pillar/hardware/leds.go New file extracting LED control functionality from ledmanager
pkg/pillar/hardware/leds_test.go New test file for LED status presence detection
pkg/pillar/evetpm/tpm.go Updates TPM property constants to use tpm2 library definitions and adds GetSpecVersion function
pkg/pillar/cmd/zedagent/hardwareinfo.go Refactored to use inventory collection and remove SMART data handling
pkg/pillar/cmd/zedagent/handlemetrics.go Moves SMART disk data collection to hardware health reporting
pkg/pillar/cmd/zedagent/reportinfo.go Adds HwInventorySupport capability flag
pkg/pillar/cmd/ledmanager/ledmanager.go Refactored to use LED functions from hardware package
pkg/pillar/controllerconn/send.go Adds URLPath function and updates URLPathString documentation
pkg/pillar/go.mod Updates dependencies including ghw, eve-api, and related packages
pkg/pillar/Dockerfile Adds dmidecode and pciutils for test tools
pkg/installer/Dockerfile Updates eve-debug base image version
pkg/debug/spec.sh Adds error handling for missing hardwaremodel file
docs/CONFIG-PROPERTIES.md Updates documentation to reflect timer.hardwareinfo.interval deprecation

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 252 to 261
// URLPath - generate url for either v1 or v1 API path
func URLPath(server string, isV2api bool, devUUID uuid.UUID, action string) *url.URL {
u, err := url.Parse(URLPathString(server, isV2api, devUUID, action))
if err != nil {
return nil
}

return u
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// URLPath - generate url for either v1 or v1 API path
func URLPath(server string, isV2api bool, devUUID uuid.UUID, action string) *url.URL {
u, err := url.Parse(URLPathString(server, isV2api, devUUID, action))
if err != nil {
return nil
}
return u
}

I think this is unused now - was in the old PR that sent the inventory to a different http endpoint

var errStr string
for key, err := range errs {
if err != nil {
errStr += fmt.Sprintf("failed to query for %s: %v", key, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use %w here for the error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we cannot use %w here, Sprintf doesn't support this and we concatenate the errors.
but the errors are nested nicely:

querying for hardware failed: failed to query for PCI: could not retrieve PCI information: pcidb: No pci-ids DB files found (and network fetch disabled)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but fmt.Errorf can do that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but then we cannot concatenate the errors

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

package main

import (
	"errors"
	"fmt"
	"os"
)

func main() {
	errs := []error{
		fmt.Errorf("error one"),
		os.ErrNotExist,
		os.ErrPermission,
	}

	var cumulatedError error

	for _, e := range errs {
		cumulatedError = fmt.Errorf("%w\n%w", cumulatedError, e)
	}

	fmt.Printf("cumulated error: %+v\n", cumulatedError)
	fmt.Println("-----------")

	if errors.Is(cumulatedError, os.ErrNotExist) {
		fmt.Println("is ENOENT")
	}
	if errors.Is(cumulatedError, os.ErrPermission) {
		fmt.Println("is EPERM")
	}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Turns out there is already a solution for this in the standard library: errors.Join!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented the change

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Turns out there is already a solution for this in the standard library: errors.Join!

Cool, that's even better then.

return err
}

func stringToPCIAddress(str string) *info.PCIAddress {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but we are already using it inside the function. the rest is just conversion to uint32

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks right then.

IoportRange: serial.IO,
}
var irq uint64
fmt.Sscanf(serial.IRQ, "%d", &irq)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use strconv.ParseUint?
Also the returned error is not checked.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense

if strings.Contains(s, "gb") {
mult = 1000
}
// Remove non-digits
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this work with ethernet adapters providing 2.5gbit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be honest I've never seen this format with gb. On my laptop and qemu it's always just numbers like 10000

return nil
}

func getFirmwareAttributes() map[string]string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as far as I can see it provides only

  bios-vendor
  bios-version
  bios-release-date
  bios-revision

that we already have. Besides as discussed in #5535 (comment) this function might not be that useful for actual edge devices...

Comment on lines 76 to 86
func (status SenderStatus) Failure() bool {
if status == SenderStatusNone {
return false
}
if status == SenderStatusDebug {
return false
}

return true
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func (status SenderStatus) Failure() bool {
if status == SenderStatusNone {
return false
}
if status == SenderStatusDebug {
return false
}
return true
}

I think, this is not used anymore

Copy link
Contributor

@eriknordmark eriknordmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do the controllers already handle retrieving the SMART from the health message?
We'd like to have that in Eden (if it uses SMART) and in the commercial controller before we stop sending it in the hardware info message.

@europaul
Copy link
Contributor Author

Do the controllers already handle retrieving the SMART from the health message? We'd like to have that in Eden (if it uses SMART) and in the commercial controller before we stop sending it in the hardware info message.

I informed @zed-rishabh about the API change and am finishing the changes in Eden

}
}
if usb.Parent.PCI != nil {
parent.PciParent = stringToPCIAddress(fmt.Sprintf("%s:%s:%s.%s", usb.Parent.PCI.Domain, usb.Parent.PCI.Bus, usb.Parent.PCI.Device, usb.Parent.PCI.Function))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

first this string is concatenated and then a regex is used to split it up again with the same components as string and then those strings are converted to uint32

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, changed it!

@europaul
Copy link
Contributor Author

Do the controllers already handle retrieving the SMART from the health message? We'd like to have that in Eden (if it uses SMART) and in the commercial controller before we stop sending it in the hardware info message.

Let's also keep that info in ZInfoHardware for the moment until all controllers support the new EVE API. I already made the changes to the PR.

christoph-zededa and others added 10 commits January 19, 2026 17:19
- use eve-api version that includes the hardware inventory protobufs
- use our fork of ghw library

Signed-off-by: Christoph Ostarek <christoph@zededa.com>
Report is based on the info from ghw library.
Also add EVE/kernel release, flavor, version.

Signed-off-by: Paul Gaiduk <paulg@zededa.com>
According the the EVE API changes, SMART info is now part of the
hardwarehealth message leaving hardwareinfo with only static hardware
inventory data.

We'll also keep the SMART info in ZInfoHardware for a while for
backward compatibility.

Signed-off-by: Paul Gaiduk <paulg@zededa.com>
Report HW inventory support capability as part of device info message.

Signed-off-by: Paul Gaiduk <paulg@zededa.com>
The hardware info message will only be sent after onboarding and after
reboots. The hardware health message will continue to be sent
periodically as before. Therefore there is no need for a separate
interval parameter for hardware info.

Signed-off-by: Paul Gaiduk <paulg@zededa.com>
Also add tests for kernel info retrieval.

Signed-off-by: Paul Gaiduk <paulg@zededa.com>
Since we already have this capability in evetpm package,
use it instead of ghw.

Signed-off-by: Paul Gaiduk <paulg@zededa.com>
Previously, the knowledge of which hardware models possess Status LEDs
(and how to drive them) was encapsulated entirely within `ledmanager`.
To report the presence of a Status LED in the inventory, this logic
needed to be accessible to the `hardware` package.

Signed-off-by: Paul Gaiduk <paulg@zededa.com>
Add logic to set StatusLedPresent based on hardware model.

Signed-off-by: Paul Gaiduk <paulg@zededa.com>
Collect BIOS/firmware settings from the Linux
`/sys/class/firmware-attributes` class.

Signed-off-by: Paul Gaiduk <paulg@zededa.com>
@europaul
Copy link
Contributor Author

I added an Eden test for this functionality lf-edge/eden#1111

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants