
@kozlov721 kozlov721 commented Dec 16, 2025

Purpose

Fixes various issues with upgrading checkpoints from 0.3.11 to 0.4.x

Specification

  • Removed DebugLoader from the saved config
  • Fixed bugs in checkpoint loading
  • Fixed incorrect execution order generation
  • Fixed a bug in PPLCNet involving incorrect activations
  • Improved the upgrade command

Dependencies & Potential Impact

None / not applicable

Deployment Plan

None / not applicable

Testing & Validation

None / not applicable

@kozlov721 kozlov721 requested a review from a team as a code owner December 16, 2025 03:24
@kozlov721 kozlov721 requested review from conorsim, klemen1999 and tersekmatija and removed request for a team December 16, 2025 03:24
@github-actions github-actions bot added fix Fixing a bug CLI Changes affecting the CLI labels Dec 16, 2025
      @param opts: A list of optional CLI overrides of the config file.
      """
-     create_model(config, opts, weights=weights).infer(
+     create_model(config, opts, weights=weights, debug_mode=True).infer(
Collaborator

Is this expected to have debug_mode=True?

old_order = ckpt.get("execution_order")
new_order = get_model_execution_order(self)

for node_name, node in self.nodes.items():
Collaborator

Is there a way to make this whole for loop a bit more readable and less deeply nested? I realize there are several fallback steps, but could we introduce some helper functions and add a short comment on each fallback step so that this code is easier to understand and maintain?
Maybe we could try running it through an LLM and asking it to clean it up and make it more readable, but we need some test cases to make sure we don't actually change any logic.

Collaborator Author

Yeah, I'll look into making this more readable.
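As a rough illustration of the helper-function structure suggested above, the nested loop could become a flat chain of small fallback steps, each documented on its own and tried in order until one succeeds. This is a minimal sketch under assumptions: the helper names (match_by_name, match_by_order, resolve_key), the two fallback steps themselves, and the shape of the state dict and execution orders are hypothetical and are not the PR's actual loading logic.

```python
from typing import Callable, Optional, Sequence

# Hypothetical signature shared by every fallback step: given a parameter key
# of the new model, return the matching key in the old checkpoint, or None.
FallbackStep = Callable[[str, dict, Sequence[str], Sequence[str]], Optional[str]]


def match_by_name(key, old_state, old_order, new_order):
    """Step 1: the parameter kept the same fully qualified name."""
    return key if key in old_state else None


def match_by_order(key, old_state, old_order, new_order):
    """Step 2: the module was renamed or moved, so look it up by its
    position in the execution order and reuse the old module's name."""
    module, _, param = key.rpartition(".")
    if module not in new_order:
        return None
    position = new_order.index(module)
    if position >= len(old_order):
        return None
    candidate = f"{old_order[position]}.{param}"
    return candidate if candidate in old_state else None


# Fallback steps in the order they are tried; adding a step is one line here.
FALLBACK_STEPS: tuple[FallbackStep, ...] = (match_by_name, match_by_order)


def resolve_key(key, old_state, old_order, new_order):
    """Return the first match produced by the fallback steps, or None."""
    for step in FALLBACK_STEPS:
        match = step(key, old_state, old_order, new_order)
        if match is not None:
            return match
    return None


# Usage with hypothetical module names: Bar.conv2 in the new model used to be Foo.conv2.
old_state = {"Foo.conv1.weight": 0, "Foo.conv2.weight": 0, "Bar.conv3.weight": 0}
old_order = ["Foo.conv1", "Foo.conv2", "Bar.conv3"]
new_order = ["Foo.conv1", "Bar.conv2", "Bar.conv3"]
print(resolve_key("Bar.conv2.weight", old_state, old_order, new_order))  # Foo.conv2.weight
```

Keeping each step as a separate function also makes it straightforward to pin the current behavior with small regression tests (one per step) before refactoring, which addresses the concern about accidentally changing the logic.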


  for name, module in model.named_modules():
-     if name and list(module.parameters()):
+     if list(module.parameters()) and not list(module.children()):
Collaborator

Are the execution orders generated and saved in checkpoints so far still generally correct, or can we not use them because of the bug we had?
And what was the bug?

Collaborator Author

The bug here is that the names of parent (container) modules are saved alongside the layers that actually hold the parameters. For example:

class Foo:
  conv1 = ...
  conv2 = ...

class Bar:
  conv3 = ...

This would generate [Foo, Foo.conv1, Foo.conv2, Bar, Bar.conv3].

While this refactor:

class Foo:
  conv1 = ...
 
class Bar:
  conv2 = ...
  conv3 = ...

would generate [Foo, Foo.conv1, Bar, Bar.conv2, Bar.conv3].

Because the parent modules are included, the two orders no longer match, even though the layers themselves run in the same sequence.
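The mismatch can be reproduced with the two filter conditions from the diff above. To keep the sketch dependency-free, it uses a small hypothetical Node class that mimics the relevant behavior of torch.nn.Module: named_modules() yields a module before its children (pre-order, in registration order, with the root named ""), and parameters() recurses into children. The Foo/Bar layouts mirror the example above; Node, conv, old_order and new_order are illustrative names only.

```python
class Node:
    """Hypothetical stand-in for torch.nn.Module."""

    def __init__(self, num_params=0, **children):
        self.num_params = num_params  # parameters owned directly by this node
        self._children = children  # kwargs keep insertion (registration) order

    def children(self):
        return iter(self._children.values())

    def parameters(self):
        # Like nn.Module.parameters(), this also yields the children's parameters.
        yield from range(self.num_params)  # ints stand in for parameter tensors
        for child in self._children.values():
            yield from child.parameters()

    def named_modules(self, prefix=""):
        # Pre-order: the node itself first (the root is named ""), then its children.
        yield prefix, self
        for name, child in self._children.items():
            yield from child.named_modules(f"{prefix}.{name}" if prefix else name)


def conv():
    return Node(num_params=2)  # a leaf layer with a weight and a bias


def old_order(model):
    # Pre-fix condition: any named module that has parameters, containers included.
    return [n for n, m in model.named_modules() if n and list(m.parameters())]


def new_order(model):
    # Post-fix condition: only leaf modules that have parameters.
    return [
        n for n, m in model.named_modules()
        if list(m.parameters()) and not list(m.children())
    ]


# Two layouts of the same sequential network.
net_a = Node(Foo=Node(conv1=conv(), conv2=conv()), Bar=Node(conv3=conv()))
net_b = Node(Foo=Node(conv1=conv()), Bar=Node(conv2=conv(), conv3=conv()))

print(old_order(net_a))  # ['Foo', 'Foo.conv1', 'Foo.conv2', 'Bar', 'Bar.conv3']
print(old_order(net_b))  # ['Foo', 'Foo.conv1', 'Bar', 'Bar.conv2', 'Bar.conv3']
print(new_order(net_a))  # ['Foo.conv1', 'Foo.conv2', 'Bar.conv3']
print(new_order(net_b))  # ['Foo.conv1', 'Bar.conv2', 'Bar.conv3']
```

With the old condition, the third entry is a convolution in one layout and the Bar container in the other, so any position-based mapping between the two breaks. With leaf nodes only, both lists contain the three convolutions at the same positions.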

Collaborator

I don't quite understand this example tbh. Aren't you showing two different networks here?

Collaborator Author

Yeah, I wasn't very detailed. These would be two implementations of the same network, each using two modules in which the layers are simply run sequentially.

Collaborator

@klemen1999 klemen1999 Jan 8, 2026

OK, so if I understand correctly, the checkpoints saved so far contain the parent modules, whereas from now on we'll only have leaf nodes?
What does that mean for backward compatibility, reusing the current execution orders, etc.? When loading older weights, can we filter out these parent nodes first and then try to match them?

Collaborator Author

Compatibility between 0.3.11 and 0.4.2 (and later versions) should still work.

So re-exporting to 0.3.11 and then re-exporting again with 0.4.2 should work (as should using the luxonis_train upgrade ckpt command in 0.4.2).
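The reviewer's idea of filtering out parent nodes before matching older checkpoints could look roughly like the sketch below. It assumes, as in the Foo/Bar example in this thread, that an old execution order is a list of dotted module names in which each container appears alongside its descendants; drop_parent_nodes is a hypothetical helper, not part of luxonis_train.

```python
def drop_parent_nodes(order):
    """Remove every name that is a dotted prefix (an ancestor) of another
    name in the list, keeping only leaf modules in their original order."""
    ancestors = set()
    for name in order:
        parts = name.split(".")
        # "Bar.block.conv3" contributes the ancestors "Bar" and "Bar.block".
        for depth in range(1, len(parts)):
            ancestors.add(".".join(parts[:depth]))
    return [name for name in order if name not in ancestors]


old = ["Foo", "Foo.conv1", "Foo.conv2", "Bar", "Bar.conv3"]
print(drop_parent_nodes(old))  # ['Foo.conv1', 'Foo.conv2', 'Bar.conv3']
```

Like the new leaf-only condition in the diff, this drops any module that has children, including a container that also owns parameters directly, so such modules would need separate handling if they exist.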


3 participants