
Struggling to replicate the training curve #44

@PandaCodes

Description


Hi! Thank you for the amazing paper and for open-sourcing the code!

I was trying to reproduce the training process and ran into a few difficulties.
For some reason, the provided datasets, when loaded, have all grain masses equal to zero:
[Screenshots: loaded dataset inspection showing all grain masses equal to zero]
This, obviously, breaks the program wherever the masses are divided by.
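For reference, a quick guard like the following catches the problem before any division happens. This is a hypothetical sketch (the function name and the way masses are extracted from the dataset are my own, not from the repo):

```python
import numpy as np

def check_masses(masses):
    """Return True if every bead mass is positive; report offenders otherwise.

    `masses` is assumed to be the per-bead mass array pulled out of the
    loaded dataset (hypothetical accessor, adapt to the actual dataset API).
    """
    masses = np.asarray(masses, dtype=float)
    bad = np.count_nonzero(masses <= 0)
    if bad:
        print(f"{bad} of {masses.size} beads have non-positive mass")
        return False
    return True
```

Running this right after loading makes the zero-mass issue visible immediately instead of surfacing later as a division error.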

I tried setting the masses to None for each molecule, since in that case torchmd guesses them during Parameters building. However, it sets them all to 12 (which, I guess, corresponds to C-alpha carbon atoms). This is probably not how it is supposed to work, since it implies the wrong physics, and the training loss gets stuck around 2.8:
[Screenshot: training loss plateauing around 2.8]

After that, I mapped the resnames to the known amino-acid masses. This did improve things: the training loss started from 2 and decreased to 1 (and is still slowly on its way down):
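The mapping I used looks roughly like this. It is a sketch, not the repo's code: the mass values are approximate average amino-acid residue masses in Da (assuming one CG bead per residue), and the table below is truncated and should be extended to all twenty residues:

```python
# Approximate average residue masses (Da); extend to all 20 amino acids.
AA_MASSES = {
    "ALA": 71.08, "GLY": 57.05, "LEU": 113.16,
    "SER": 87.08, "VAL": 99.13, "TRP": 186.21,
}

def masses_from_resnames(resnames, default=12.0):
    """Map residue names to bead masses.

    Falls back to `default` (12, the carbon mass torchmd guesses when
    masses are None) for names not in the table.
    """
    return [AA_MASSES.get(name.upper(), default) for name in resnames]
```

Feeding these masses into Parameters building instead of the zeros from the dataset is what produced the improved curve below.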
[Screenshot: training loss decreasing from 2 toward 1]
However, the training curve looks nothing like the one in the example notebook, where it starts around 5 and drops to almost zero.

I am using train_ff.yaml with only "log_dir" and "device" modified.

Do you have an idea of what might be wrong?
From what I can see, the input.yaml of the newly trained model differs from the one in data/models/fastfolders, particularly in fields such as max_num_neighbors and some others, so my next step would be to try using the same values, I guess.

I would much appreciate it if you could help me replicate the results. I am eager to use the trajectory reweighting method with other CG potentials and slightly extended CG systems, and I really hope your implementation will help me a lot with that.

P.S. In order to launch, I also had to resolve the environment (the provided environment.yaml does not work as-is: it is missing certain packages, and some of the listed ones conflict with each other) and add a "timestep" key to the logger. I can open a PR with the environment.yaml that worked for me.

Edit: The run with the mapped amino-acid masses eventually approached zero loss after 5k steps. Still, it would be great to make the optimisation converge faster, as in the example notebook.
