Smaller model support #7
Awesome! I am taking a look today/tomorrow at this.
@etredal While the cosine similarity is high, there seems to be a mode collapse issue with the replacement model. I'm going to add the attention mask, set …
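For reference, the cosine-similarity check between the two models can be sketched roughly like this (function and variable names are mine, not necessarily what the PR uses; it assumes matching hidden-state tensors from the original and replacement models on the same batch):

```python
import torch

def hidden_state_cosine(h_orig: torch.Tensor, h_repl: torch.Tensor) -> float:
    """Mean cosine similarity between matching hidden states.

    h_orig, h_repl: (batch, seq, hidden) activations from the original
    and replacement models on the same input batch. A value near 1.0
    means the replacement tracks the original closely, but (as noted
    above) high similarity alone doesn't rule out mode collapse.
    """
    sims = torch.nn.functional.cosine_similarity(h_orig, h_repl, dim=-1)
    return sims.mean().item()
```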
Just did another practice run and got some clues about the repetitive text patterns. The replacement model has over 50% lower variance, resulting in a flatter distribution, which could explain the mode collapse. Recon loss is also high (1.27). I'm going to add more epochs, increase the learning rate, and look into more tweaks to reduce recon loss and preserve variance. I'll keep you updated.
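A quick way to quantify the "over 50% lower variance" observation is a logit-variance ratio between the two models; this is a hypothetical diagnostic, not code from the PR:

```python
import torch

def logit_variance_ratio(logits_orig: torch.Tensor,
                         logits_repl: torch.Tensor) -> float:
    """Ratio of the replacement model's logit variance to the original's.

    Both tensors: (batch, seq, vocab) logits on the same inputs.
    A ratio well below 1.0 means the replacement's output distribution
    is flatter, which tends to surface as repetitive, collapsed text.
    """
    v_orig = logits_orig.var(dim=-1).mean()
    v_repl = logits_repl.var(dim=-1).mean()
    return (v_repl / v_orig).item()
```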

New features

- `num_features` based on the model being used (`num_layers * hidden_size`)
- `total_loss_per_feature` to make a fairer comparison between models of different sizes

Also, I ran into some issues with the Poetry config file. I had to change some of the syntax and version constraints to work on my Linux machine.
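The two additions above amount to something like the following (a sketch; the actual function names and signatures in the PR may differ):

```python
def num_features(num_layers: int, hidden_size: int) -> int:
    # One feature per hidden unit per layer, so the feature count
    # scales automatically with whichever model is loaded.
    return num_layers * hidden_size

def total_loss_per_feature(total_loss: float,
                           num_layers: int,
                           hidden_size: int) -> float:
    # Normalizing the total loss by feature count makes runs on
    # models of different sizes directly comparable.
    return total_loss / num_features(num_layers, hidden_size)
```

For example, DistilGPT2 (6 layers, hidden size 768) gives `num_features(6, 768) == 4608`.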
I've also attached the data from running DistilGPT2 below. Some of the metrics, like sparsity loss, aren't as useful though, so I'm going to rerun it with the added loss metric mentioned above.
run_distilgpt2.zip
Let me know if this is good and what other verifications for the CLT you had in mind.