Conversation

@ssrajadh

New features

  • CLI arg to select the model (from the GPT-2 family), defaulting to gpt2 (e.g. python examples/practice_run.py --model_name distilgpt2)
  • Dynamically calculates num_features based on the model in use (num_layers * hidden_size)
  • Added a loss metric, total_loss_per_feature, for a fairer comparison between models of different sizes
  • More logging, plus model validation with error handling
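The first two items can be sketched roughly like this. The layer/hidden sizes are the standard GPT-2 family values; in the actual script they'd presumably come from the loaded model's config, and everything besides the `--model_name` flag (which matches the example command above) is illustrative:

```python
import argparse

# Known GPT-2 family shapes (num_layers, hidden_size); hardcoded here so
# the sketch stays self-contained instead of loading the model config.
GPT2_FAMILY = {
    "gpt2": (12, 768),
    "distilgpt2": (6, 768),
    "gpt2-medium": (24, 1024),
    "gpt2-large": (36, 1280),
    "gpt2-xl": (48, 1600),
}

def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Default is gpt2, as described above
    parser.add_argument("--model_name", default="gpt2", choices=sorted(GPT2_FAMILY))
    return parser.parse_args(argv)

def num_features(model_name):
    # num_features = num_layers * hidden_size for the chosen model
    num_layers, hidden_size = GPT2_FAMILY[model_name]
    return num_layers * hidden_size

args = parse_args(["--model_name", "distilgpt2"])
print(num_features(args.model_name))  # distilgpt2: 6 * 768 = 4608
```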

I also ran into some issues with the Poetry config file; I had to adjust some of the syntax and version constraints to get it working on my Linux machine.

I've also attached the data from running DistilGPT2 below. Some of the metrics, like sparsity loss, aren't as useful, so I'm going to rerun it with the added loss metric mentioned above.
run_distilgpt2.zip
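The normalization behind total_loss_per_feature could be as simple as dividing by the feature count; the exact definition is in the PR diff, so this is just an illustrative sketch of why it makes differently sized models comparable:

```python
def total_loss_per_feature(total_loss, num_layers, hidden_size):
    """Normalize total loss by feature count (num_layers * hidden_size)
    so models of different sizes can be compared on equal footing."""
    return total_loss / (num_layers * hidden_size)

# The same raw total loss means different per-feature loss for
# gpt2 (12 * 768 features) vs distilgpt2 (6 * 768 features):
print(total_loss_per_feature(9216.0, 12, 768))  # 1.0
print(total_loss_per_feature(9216.0, 6, 768))   # 2.0
```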

Let me know if this is good and what other verifications for the CLT you had in mind.

@etredal
Owner

etredal commented Oct 13, 2025

Awesome! I'll take a look at this today or tomorrow.

@etredal etredal requested a review from StickOnAStick October 13, 2025 04:31
@etredal
Owner

etredal commented Oct 13, 2025

The additional validation I'm most concerned about is right here (I believe it's printed to the terminal): what are the actual token outputs of the original model vs. the replacement model at inference time? From some research it seems we're targeting about 50% accuracy, so comparing the text: does the replacement model still produce meaningful output even when it doesn't choose the same tokens as the original model?
image
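One way to make the ~50% target concrete would be token-level agreement between the two generations. This is a hypothetical sketch (plain token-id lists standing in for tokenizer output), not anything currently in the repo:

```python
def token_agreement(original_ids, replacement_ids):
    """Fraction of positions where the replacement model chose the same
    token as the original, measured over the shorter generation."""
    n = min(len(original_ids), len(replacement_ids))
    if n == 0:
        return 0.0
    matches = sum(1 for a, b in zip(original_ids, replacement_ids) if a == b)
    return matches / n

# 3 of 4 positions match -> 0.75 agreement
print(token_agreement([1, 2, 3, 4], [1, 2, 9, 4]))
```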

@ssrajadh
Author

ssrajadh commented Oct 13, 2025

@etredal
Here are the token outputs:


Test text 1: The president of the United States lives in the White House.
  Cosine similarity: 0.9622
  Mean squared error: 1046.1991
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Shape of input_ids before generate: torch.Size([1, 12])

  Original model output: The president of the United States lives in the White House. He is the only person in the world to be elected president.”

  Replacement model output: The president of the United States lives in the White House. The White and the white-the-the-the-white-the-the-the-the-the-white-the-the-the-the-the-the-

Test text 2: Artificial intelligence systems can learn from data.
  Cosine similarity: 0.9934
  Mean squared error: 824.0825
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Shape of input_ids before generate: torch.Size([1, 9])

  Original model output: Artificial intelligence systems can learn from data. But it is not all that surprising. Artificial intelligence systems are increasingly used to generate and analyze information from data. For example, researchers at the University of Cambridge have developed a system that can learn from data.

  Replacement model output: Artificial intelligence systems can learn from data.
ArtArt art art art art art art art art art art art art art art art art art art art art art art art art art art art art art art art art art art art art art art

Test text 3: The Sahara Desert is the largest hot desert in the world.
  Cosine similarity: 0.9911
  Mean squared error: 821.0219
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Shape of input_ids before generate: torch.Size([1, 12])

  Original model output: The Sahara Desert is the largest hot desert in the world. It is a barren desert with some of the highest rainfall in the world, with over a million people living there.

The Sahara Desert is the most hot desert in the world

  Replacement model output: The Sahara Desert is the largest hot desert in the world. The Sahara is the largest. The Sahara is the Sahara.
The most advanced.
The French the French
The French
The French
The French
The British
The French

While the cosine similarity is high, the replacement model appears to suffer from mode collapse. I'm going to pass the attention mask, set pad_token_id explicitly, and add logging for the original vs. replacement model logits. Then I'll rerun the test and keep you updated on the outputs and metrics.
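On the attention-mask fix: the warnings above note the mask can't be inferred because GPT-2 reuses eos (id 50256) as the pad token, so in practice it should come from the tokenizer, which knows which tokens are padding. As a toy illustration of what that mask looks like (only trailing pads masked, since a real eos inside the text shares the same id), with everything here being a simplified stand-in for `tokenizer(...)["attention_mask"]`:

```python
EOS_TOKEN_ID = 50256  # GPT-2 reuses eos as pad, hence the warnings above

def attention_mask_for(padded_ids, pad_token_id=EOS_TOKEN_ID):
    """1 for real tokens, 0 for trailing padding. Only *trailing* pads
    are masked, because a genuine eos token uses the same id."""
    mask = [1] * len(padded_ids)
    i = len(padded_ids) - 1
    while i >= 0 and padded_ids[i] == pad_token_id:
        mask[i] = 0
        i -= 1
    return mask

print(attention_mask_for([464, 1893, 286, 50256, 50256]))  # [1, 1, 1, 0, 0]
```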

@ssrajadh
Author

Just did another practice run and got some clues about the repetitive text patterns. The replacement model's logits have over 50% lower variance, giving a flatter distribution, which could explain the mode collapse. Reconstruction loss is also high (1.27). I'm going to train for more epochs, increase the learning rate, and look into other tweaks to reduce reconstruction loss and preserve variance. I'll keep you updated.
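The variance comparison described above can be expressed as a quick diagnostic over the two models' logits (plain lists here in place of tensors; the function name and structure are illustrative, not from the repo):

```python
from statistics import pvariance

def variance_drop(original_logits, replacement_logits):
    """Relative drop in logit variance from original to replacement.
    A large drop means a flatter distribution, consistent with the
    repetitive generations seen above."""
    v_orig = pvariance(original_logits)
    v_repl = pvariance(replacement_logits)
    return (v_orig - v_repl) / v_orig

# Variance 25 -> 4 is an 84% drop, well past the >50% observation
print(variance_drop([0.0, 10.0], [0.0, 4.0]))
```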
