-
Notifications
You must be signed in to change notification settings - Fork 25
mellanox_firmware plugin refactoring #91
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@filanov, please review and tell me what you think about this refactor I placed mellanox_firmware_suite under sos/report. for example: Error while trying to load module sos.report.plugins.base_tool: ModuleNotFoundError Error while trying to load module sos.report.plugins.cable_collector: ModuleNotFoundError |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
WalkthroughIntroduces a modular framework for Mellanox firmware diagnostics featuring device context abstraction, base tool infrastructure, factory patterns for tool instantiation, and specialized collectors for system, firmware, and cable data. Refactors the main plugin to delegate orchestration to a centralized manager. Changes
Poem
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~35 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 Pylint (4.0.4)sos/report/mellanox_firmware_suite/collectors/system_collector.pysos/report/mellanox_firmware_suite/collectors/cable_collector.pysos/report/mellanox_firmware_suite/collectors/collector_manager.py
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In `@sos/report/mellanox_firmware_suite/collectors/cable_collector.py`:
- Around line 54-111: In _collect_link_data, before calling tool.amber_collect
and tool.amber_collect_pcie, ensure the directory for the CSV paths returned by
__generate_archive_file_path exists; compute amber_csv_path and
amber_pci_csv_path as you already do, create their parent directories
(os.makedirs(pathlib.Path(amber_csv_path).parent, exist_ok=True)) and then call
tool.amber_collect and tool.amber_collect_pcie using
amber_csv_path/amber_pci_csv_path and the existing
amber_file_name/amber_pci_file_name variables so writes won't fail if the output
dirs are missing.
In `@sos/report/mellanox_firmware_suite/tools/base_tool.py`:
- Around line 37-49: _in _run_command_, guard against
exec_cmd/_collect_cmd_output returning None or a dict with status==None by
normalizing the result: after calling self.plugin.exec_cmd(...) or
self.plugin._collect_cmd_output(...), if the returned value is None set res to
an empty dict, then read status = res.get("status"); if status is None replace
it with a safe non-zero default (e.g. 1) to indicate failure, and set output =
res.get("output", ""); finally return the normalized (status, output). This
ensures callers checking rc != 0 / rc == 0 behave consistently and handles
_collect_cmd_output timeouts too.
In `@sos/report/mellanox_firmware_suite/tools/MFT/mstdump.py`:
- Around line 4-10: The mstdump_run method currently ignores the idx when
building the command, causing identical dumps; update MstdumpTool.mstdump_run so
the cmd passed to execute_cmd includes the idx (for example append a segment
flag like "--segment {idx}" or the appropriate mstdump argument) while keeping
key and filename usage, and apply the same change to the corresponding method in
mstregdump.py so each iteration invokes a distinct command per idx.
In `@sos/report/mellanox_firmware_suite/tools/MSTFlint/mstregdump.py`:
- Around line 4-10: The method MstregdumpTool.mstregdump_run currently accepts
an unused idx; either remove the idx parameter (and update its callers/loop) or
incorporate idx into the executed command so each invocation differs. Update the
call to self.execute_cmd (the cmd string built for mstregdump using
self.ctx.device) to include the idx value in the command invocation (or the
appropriate flag/segment syntax) and keep filename handling intact, or remove
idx from the signature and from the loop that calls mstregdump_run to avoid
repeating the same command.
In `@sos/report/plugins/mellanox_firmware.py`:
- Around line 164-191: The setup method appends to self.device_contexts on every
run causing duplicates when setup is called multiple times; at the start of
setup (before calling detect_tool/detect_devices) clear or reinitialize
self.device_contexts (e.g., set self.device_contexts = [] or call clear()) so
DeviceContext entries created in the loop (using DeviceContext, device_contexts,
setup, detect_tool, detect_devices) don't accumulate across repeated setup
invocations.
🧹 Nitpick comments (2)
sos/report/mellanox_firmware_suite/device_context.py (1)
1-9: Clean context abstraction with proper encapsulation.The class design is appropriate for a per-device context object. The read-only properties protect the immutable device identity while exposing a mutable
cachefor tool results.Consider adding type hints to the constructor parameters for consistency with
BaseToolwhich uses type hints:def __init__(self, device: str, pci: str, primary: bool, global_collector: bool, provider) -> None:Also, a
__repr__method could aid debugging:def __repr__(self): return f"DeviceContext(device={self._device!r}, pci={self._pci!r})"sos/report/mellanox_firmware_suite/tools/__init__.py (1)
85-97: Optional: avoid global cache growth and id reuse edge cases.Using
id(ctx)as part of the key can retain entries longer than needed and risks id reuse in long-lived processes. Consider aWeakKeyDictionarykeyed byctxto let caches expire naturally.♻️ Suggested refactor (weak-keyed cache)
+from weakref import WeakKeyDictionary-_TOOL_CACHE = {} +_TOOL_CACHE = WeakKeyDictionary() def get_tool(name, plugin, ctx): - cache_key = f"{id(ctx)}_{name}" - if name not in TOOLS_MAP: raise ValueError(f"Unknown tool: '{name}'") - if cache_key not in _TOOL_CACHE: - _TOOL_CACHE[cache_key] = TOOLS_MAP[name](plugin, ctx) + ctx_cache = _TOOL_CACHE.setdefault(ctx, {}) + if name not in ctx_cache: + ctx_cache[name] = TOOLS_MAP[name](plugin, ctx) - return _TOOL_CACHE[cache_key] + return ctx_cache[name]
| def _collect_link_data(self, plugin, tool, prefix, ctx): | ||
| file_base = f"{prefix}_{ctx.bdf}_" | ||
|
|
||
| tool.show_module( | ||
| filename=f"{file_base}--show_module" | ||
| ) | ||
|
|
||
| tool.cable_dump( | ||
| filename=f"{file_base}--cable_--dump" | ||
| ) | ||
|
|
||
| tool.cable_ddm( | ||
| filename=f"{file_base}--cable_--ddm" | ||
| ) | ||
|
|
||
| tool.show_counters( | ||
| filename=f"{file_base}--show_counters" | ||
| ) | ||
|
|
||
| tool.show_eye( | ||
| filename=f"{file_base}--show_eye" | ||
| ) | ||
|
|
||
| tool.show_fec( | ||
| filename=f"{file_base}--show_fec" | ||
| ) | ||
|
|
||
| tool.show_serdes_tx( | ||
| filename=f"{file_base}--show_serdes_tx" | ||
| ) | ||
|
|
||
| tool.show_rx_fec_histogram( | ||
| filename=f"{file_base}--rx_fec_histogram_--show_histogram" | ||
| ) | ||
|
|
||
| if ctx.primary: | ||
| self._collect_pcie_link_details(plugin, tool, file_base, ctx) | ||
|
|
||
| amber_file_name = f"{file_base}--amber_collect" | ||
| amber_csv_path = self.__generate_archive_file_path( | ||
| plugin, | ||
| file_name=f"{amber_file_name}.csv" | ||
| ) | ||
|
|
||
| tool.amber_collect( | ||
| path=amber_csv_path, filename=amber_file_name | ||
| ) | ||
|
|
||
| amber_pci_file_name = f"{file_base}--amber_collect_--pci" | ||
| amber_pci_csv_path = self.__generate_archive_file_path( | ||
| plugin, | ||
| file_name=f"{amber_pci_file_name}.csv" | ||
| ) | ||
|
|
||
| tool.amber_collect_pcie( | ||
| path=amber_pci_csv_path, | ||
| filename=amber_pci_file_name | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensure amber CSV output directory exists.
amber_collect writes directly to the CSV path; if the command output directory hasn’t been created yet, the write can fail. Create the directory before invoking.
🛠️ Suggested fix
amber_csv_path = self.__generate_archive_file_path(
plugin,
file_name=f"{amber_file_name}.csv"
)
+ os.makedirs(os.path.dirname(amber_csv_path), exist_ok=True)
tool.amber_collect(
path=amber_csv_path, filename=amber_file_name
)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def _collect_link_data(self, plugin, tool, prefix, ctx): | |
| file_base = f"{prefix}_{ctx.bdf}_" | |
| tool.show_module( | |
| filename=f"{file_base}--show_module" | |
| ) | |
| tool.cable_dump( | |
| filename=f"{file_base}--cable_--dump" | |
| ) | |
| tool.cable_ddm( | |
| filename=f"{file_base}--cable_--ddm" | |
| ) | |
| tool.show_counters( | |
| filename=f"{file_base}--show_counters" | |
| ) | |
| tool.show_eye( | |
| filename=f"{file_base}--show_eye" | |
| ) | |
| tool.show_fec( | |
| filename=f"{file_base}--show_fec" | |
| ) | |
| tool.show_serdes_tx( | |
| filename=f"{file_base}--show_serdes_tx" | |
| ) | |
| tool.show_rx_fec_histogram( | |
| filename=f"{file_base}--rx_fec_histogram_--show_histogram" | |
| ) | |
| if ctx.primary: | |
| self._collect_pcie_link_details(plugin, tool, file_base, ctx) | |
| amber_file_name = f"{file_base}--amber_collect" | |
| amber_csv_path = self.__generate_archive_file_path( | |
| plugin, | |
| file_name=f"{amber_file_name}.csv" | |
| ) | |
| tool.amber_collect( | |
| path=amber_csv_path, filename=amber_file_name | |
| ) | |
| amber_pci_file_name = f"{file_base}--amber_collect_--pci" | |
| amber_pci_csv_path = self.__generate_archive_file_path( | |
| plugin, | |
| file_name=f"{amber_pci_file_name}.csv" | |
| ) | |
| tool.amber_collect_pcie( | |
| path=amber_pci_csv_path, | |
| filename=amber_pci_file_name | |
| ) | |
| def _collect_link_data(self, plugin, tool, prefix, ctx): | |
| file_base = f"{prefix}_{ctx.bdf}_" | |
| tool.show_module( | |
| filename=f"{file_base}--show_module" | |
| ) | |
| tool.cable_dump( | |
| filename=f"{file_base}--cable_--dump" | |
| ) | |
| tool.cable_ddm( | |
| filename=f"{file_base}--cable_--ddm" | |
| ) | |
| tool.show_counters( | |
| filename=f"{file_base}--show_counters" | |
| ) | |
| tool.show_eye( | |
| filename=f"{file_base}--show_eye" | |
| ) | |
| tool.show_fec( | |
| filename=f"{file_base}--show_fec" | |
| ) | |
| tool.show_serdes_tx( | |
| filename=f"{file_base}--show_serdes_tx" | |
| ) | |
| tool.show_rx_fec_histogram( | |
| filename=f"{file_base}--rx_fec_histogram_--show_histogram" | |
| ) | |
| if ctx.primary: | |
| self._collect_pcie_link_details(plugin, tool, file_base, ctx) | |
| amber_file_name = f"{file_base}--amber_collect" | |
| amber_csv_path = self.__generate_archive_file_path( | |
| plugin, | |
| file_name=f"{amber_file_name}.csv" | |
| ) | |
| os.makedirs(os.path.dirname(amber_csv_path), exist_ok=True) | |
| tool.amber_collect( | |
| path=amber_csv_path, filename=amber_file_name | |
| ) | |
| amber_pci_file_name = f"{file_base}--amber_collect_--pci" | |
| amber_pci_csv_path = self.__generate_archive_file_path( | |
| plugin, | |
| file_name=f"{amber_pci_file_name}.csv" | |
| ) | |
| tool.amber_collect_pcie( | |
| path=amber_pci_csv_path, | |
| filename=amber_pci_file_name | |
| ) |
🤖 Prompt for AI Agents
In `@sos/report/mellanox_firmware_suite/collectors/cable_collector.py` around
lines 54 - 111, In _collect_link_data, before calling tool.amber_collect and
tool.amber_collect_pcie, ensure the directory for the CSV paths returned by
__generate_archive_file_path exists; compute amber_csv_path and
amber_pci_csv_path as you already do, create their parent directories
(os.makedirs(pathlib.Path(amber_csv_path).parent, exist_ok=True)) and then call
tool.amber_collect and tool.amber_collect_pcie using
amber_csv_path/amber_pci_csv_path and the existing
amber_file_name/amber_pci_file_name variables so writes won't fail if the output
dirs are missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not feasible—the plugin directory is created during the setup flow, and the path is generated using the appropriate system calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anton-metsner, thank you for the clarification! I understand now that the plugin framework's setup flow and system calls handle the directory creation automatically, so the explicit makedirs is unnecessary. I appreciate you explaining how the plugin infrastructure works.
✏️ Learnings added
Learnt from: anton-metsner
Repo: NVIDIA/doca-sosreport PR: 91
File: sos/report/mellanox_firmware_suite/collectors/cable_collector.py:54-111
Timestamp: 2026-01-22T22:39:45.230Z
Learning: In sos plugin framework (sos/report/plugins), the plugin directory structure is created during the setup flow. Methods like `plugin.archive.get_archive_path()` and `plugin._make_command_filename()` use appropriate system calls that ensure directories exist, so explicit `os.makedirs()` calls are not needed before writing files to paths generated by these methods.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
| class MstdumpTool(BaseTool): | ||
| def mstdump_run(self, idx, filename=None): | ||
| return self.execute_cmd( | ||
| cmd=f"mstdump {self.ctx.device}", | ||
| key=f"mstdump_{self.ctx.device}_{idx}", | ||
| filename=filename | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
cat -n sos/report/mellanox_firmware_suite/tools/MFT/mstdump.pyRepository: NVIDIA/doca-sosreport
Length of output: 416
🏁 Script executed:
# Search for calls to mstdump_run
rg "mstdump_run" --type py -B 2 -A 2Repository: NVIDIA/doca-sosreport
Length of output: 989
🏁 Script executed:
# Get full context of the mstdump_run calls
cat -n sos/report/mellanox_firmware_suite/collectors/firmware_collector.py | sed -n '1,150p'Repository: NVIDIA/doca-sosreport
Length of output: 3548
🏁 Script executed:
# Check mstregdump tool to see if it has a similar pattern
cat -n sos/report/mellanox_firmware_suite/tools/MFT/mstregdump.pyRepository: NVIDIA/doca-sosreport
Length of output: 155
🏁 Script executed:
# List MFT tools directory
ls -la sos/report/mellanox_firmware_suite/tools/MFT/Repository: NVIDIA/doca-sosreport
Length of output: 779
🏁 Script executed:
# Check BaseTool implementation
cat -n sos/report/mellanox_firmware_suite/tools/base_tool.pyRepository: NVIDIA/doca-sosreport
Length of output: 1969
🏁 Script executed:
# Check resourcedump tool to see if it uses idx in command
cat -n sos/report/mellanox_firmware_suite/tools/MFT/resourcedump.pyRepository: NVIDIA/doca-sosreport
Length of output: 401
🏁 Script executed:
# Check mlxdump tool
cat -n sos/report/mellanox_firmware_suite/tools/MFT/mlxdump.pyRepository: NVIDIA/doca-sosreport
Length of output: 373
🏁 Script executed:
# Check flint tool for comparison
cat -n sos/report/mellanox_firmware_suite/tools/MFT/flint.pyRepository: NVIDIA/doca-sosreport
Length of output: 1433
🏁 Script executed:
# Check MstFlint tools for similar patterns
ls -la sos/report/mellanox_firmware_suite/tools/MstFlint/Repository: NVIDIA/doca-sosreport
Length of output: 162
🏁 Script executed:
# Find all tool directories
find sos/report/mellanox_firmware_suite/tools -type dRepository: NVIDIA/doca-sosreport
Length of output: 200
🏁 Script executed:
# Check MSTFlint tools directory
ls -la sos/report/mellanox_firmware_suite/tools/MSTFlint/Repository: NVIDIA/doca-sosreport
Length of output: 743
🏁 Script executed:
# Check mstregdump tool to see if it uses idx differently
cat -n sos/report/mellanox_firmware_suite/tools/MSTFlint/mstregdump.pyRepository: NVIDIA/doca-sosreport
Length of output: 428
🏁 Script executed:
# Check mstresourcedump to see if it uses idx
cat -n sos/report/mellanox_firmware_suite/tools/MSTFlint/mstresourcedump.pyRepository: NVIDIA/doca-sosreport
Length of output: 432
idx parameter is unused in the mstdump command.
The method accepts idx but only applies it to the cache key and filename, not to the command itself. This causes the loop in firmware_collector.py to execute the identical mstdump {device} command three times, producing identical output. If the intent is to dump different segments with the mstdump command, add idx to the command (e.g., mstdump {self.ctx.device} --segment {idx}). Otherwise, remove the loop and extra calls. Note: The same issue exists in mstregdump.py.
🤖 Prompt for AI Agents
In `@sos/report/mellanox_firmware_suite/tools/MFT/mstdump.py` around lines 4 - 10,
The mstdump_run method currently ignores the idx when building the command,
causing identical dumps; update MstdumpTool.mstdump_run so the cmd passed to
execute_cmd includes the idx (for example append a segment flag like "--segment
{idx}" or the appropriate mstdump argument) while keeping key and filename
usage, and apply the same change to the corresponding method in mstregdump.py so
each iteration invokes a distinct command per idx.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
expected behavior,
It should be the same command for any idx
Filename generation is outside the scope of this function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anton-metsner, thank you for the clarification! I understand now—the idx parameter is intentionally used only for cache key differentiation and filename generation, while the command itself remains consistent across iterations. This makes sense for the use case where you're collecting the same diagnostic output multiple times or organizing outputs by index.
I've noted this design pattern for future reference.
✏️ Learnings added
Learnt from: anton-metsner
Repo: NVIDIA/doca-sosreport PR: 91
File: sos/report/mellanox_firmware_suite/tools/MFT/mstdump.py:4-10
Timestamp: 2026-01-22T22:41:21.556Z
Learning: In sos/report/mellanox_firmware_suite/tools/MFT/mstdump.py and mstregdump.py, the idx parameter in mstdump_run and mstregdump_run methods is intentionally used only for cache key and filename generation, not in the command itself. The command should remain the same for any idx value—this is expected behavior.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
| class MstregdumpTool(BaseTool): | ||
| def mstregdump_run(self, idx, filename=None): | ||
| return self.execute_cmd( | ||
| cmd=f"mstregdump {self.ctx.device}", | ||
| key=f"mstregdump_{self.ctx.device}_{idx}", | ||
| filename=filename | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# First, let's see the full content of the mstregdump.py file
cat sos/report/mellanox_firmware_suite/tools/MSTFlint/mstregdump.pyRepository: NVIDIA/doca-sosreport
Length of output: 358
🏁 Script executed:
# Search for callers of mstregdump_run to understand the loop pattern
rg "mstregdump_run" -B 3 -A 3Repository: NVIDIA/doca-sosreport
Length of output: 1391
🏁 Script executed:
# Look at BaseTool to understand execute_cmd and how idx might be used
find . -name "*.py" -type f | xargs grep -l "class BaseTool" | head -5Repository: NVIDIA/doca-sosreport
Length of output: 120
🏁 Script executed:
# Check if there are similar patterns in other MSTFlint tools
ls -la sos/report/mellanox_firmware_suite/tools/MSTFlint/Repository: NVIDIA/doca-sosreport
Length of output: 743
🏁 Script executed:
# Check BaseTool.execute_cmd to understand caching behavior
cat sos/report/mellanox_firmware_suite/tools/base_tool.pyRepository: NVIDIA/doca-sosreport
Length of output: 1549
🏁 Script executed:
# Look at mstlink.py to see if similar tools have the same pattern
cat sos/report/mellanox_firmware_suite/tools/MSTFlint/mstlink.pyRepository: NVIDIA/doca-sosreport
Length of output: 2479
🏁 Script executed:
# Check the firmware_collector more carefully to understand the intent
cat sos/report/mellanox_firmware_suite/collectors/firmware_collector.py | head -100Repository: NVIDIA/doca-sosreport
Length of output: 2890
🏁 Script executed:
# Check mstdump.py to see if it has similar pattern with idx not being used
cat sos/report/mellanox_firmware_suite/tools/MFT/mstdump.pyRepository: NVIDIA/doca-sosreport
Length of output: 346
🏁 Script executed:
# Search for any documentation or hints about segment/index in mstregdump context
rg "segment" sos/report/mellanox_firmware_suite/ -A 2 -B 2Repository: NVIDIA/doca-sosreport
Length of output: 2080
🏁 Script executed:
# Check all tool files to see if this idx pattern is common
rg "def.*run.*idx" sos/report/mellanox_firmware_suite/tools/Repository: NVIDIA/doca-sosreport
Length of output: 281
🏁 Script executed:
# Check if there are any comments or documentation about why idx loop is used
rg "1, 2, 3" sos/report/mellanox_firmware_suite/collectors/firmware_collector.py -B 5 -A 5Repository: NVIDIA/doca-sosreport
Length of output: 597
🏁 Script executed:
# Look at git history or comments to understand intent (if any)
git log --oneline -n 20 -- sos/report/mellanox_firmware_suite/tools/MSTFlint/mstregdump.py 2>/dev/null | head -10Repository: NVIDIA/doca-sosreport
Length of output: 133
idx doesn't affect the mstregdump invocation.
The method receives idx but does not pass it to the command, so the for idx in (1, 2, 3) loop executes the same mstregdump {device} command three times. If idx is meant to select a segment, pass it into the command; otherwise remove the parameter and loop.
🤖 Prompt for AI Agents
In `@sos/report/mellanox_firmware_suite/tools/MSTFlint/mstregdump.py` around lines
4 - 10, The method MstregdumpTool.mstregdump_run currently accepts an unused
idx; either remove the idx parameter (and update its callers/loop) or
incorporate idx into the executed command so each invocation differs. Update the
call to self.execute_cmd (the cmd string built for mstregdump using
self.ctx.device) to include the idx value in the command invocation (or the
appropriate flag/segment syntax) and keep filename handling intact, or remove
idx from the signature and from the loop that calls mstregdump_run to avoid
repeating the same command.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
expected behavior,
It should be the same command for any idx
Filename generation is outside the scope of this function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anton-metsner Thank you for the clarification! I understand now—the design intentionally runs the same mstregdump command multiple times, and the idx parameter serves to:
- Differentiate cache keys (
f"mstregdump_{self.ctx.device}_{idx}") so that each invocation bypasses the cache and executes independently - Enable unique filename generation by the caller (e.g.,
f"mstregdump_{ctx.bdf}_{idx}")
This pattern makes sense for capturing multiple snapshots of the same device state to separate files. I appreciate you explaining the rationale!
✏️ Learnings added
Learnt from: anton-metsner
Repo: NVIDIA/doca-sosreport PR: 91
File: sos/report/mellanox_firmware_suite/tools/MSTFlint/mstregdump.py:4-10
Timestamp: 2026-01-22T22:42:08.461Z
Learning: In `sos/report/mellanox_firmware_suite/tools/MSTFlint/mstregdump.py`, the `mstregdump_run` method intentionally accepts an `idx` parameter that is used only in the cache key (not in the command string) to allow the same command to run multiple times with different cache keys and caller-generated filenames. This is expected behavior for capturing multiple snapshots of device state.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
… collectors Refactored Mellanox firmware collection into a modular suite built around device contexts and collectors. Detection now creates a DeviceContext per Mellanox PCI/MST device, selects MFT vs mstflint, and orchestrates execution via CollectorManager. Introduced reusable tool wrappers (flint/mstflint, mstdump/mstregdump, resourcedump, mlxlink/mstlink, etc.) and dedicated collectors for system, firmware, and cable data. These share caching and use consistent output naming. Removed the old inline command tables from the plugin while preserving coverage, simplifying control flow, and improving logging. Signed-off-by: Anton Metsner <antonm@nvidia.com>
Collect PCIe link status and error counters for Mellanox devices using mstlink/mlxlink to improve PCIe diagnostics. Signed-off-by: Anton Metsner <antonm@nvidia.com>
filanov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
general comment about the filenames i think that it's better to avoid setting it via code and let the sdk to decide the name but up to you
| def commands(self): | ||
| """ | ||
| Build and return all commands for MFT firmware collection. | ||
| for pci_addr, is_primary in pci_devices: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure i understand this part, what does it do and why is there a break if you found one object?
| filename=f"flint_{ctx.bdf}_q_full" | ||
| ) | ||
|
|
||
| for idx in (1, 2, 3): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| for idx in (1, 2, 3): | |
| for idx in range(1,4): |
| filename=f"mstflint_{ctx.bdf}_q_full" | ||
| ) | ||
|
|
||
| for idx in (1, 2, 3): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| for idx in (1, 2, 3): | |
| for idx in range(1,4): |
Please place an 'X' inside each '[]' to confirm you adhere to our Contributor Guidelines
Summary by CodeRabbit
Release Notes
✏️ Tip: You can customize this high-level summary in your review settings.