Thank you for your excellent work and the insightful paper. We are attempting to reproduce your results with the kk-datasets configuration, but we observe no language-mixing phenomena even after the model converges.
What is the average response length (in tokens) observed during language-mixing events?
Experiment settings:
- model: qwen2.5-7b-base
- max_response_length: 4096
- datasets: kk 5-8ppl
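For context on how we measure it on our side, a minimal sketch of computing average response length in tokens. The `encode` callable is a stand-in for any tokenizer's encode function (e.g. a Hugging Face `AutoTokenizer` for Qwen2.5); the whitespace split below is only a placeholder:

```python
from statistics import mean
from typing import Callable, List


def avg_response_length(responses: List[str],
                        encode: Callable[[str], List[str]]) -> float:
    """Average token count across responses; `encode` maps text to tokens/ids."""
    return mean(len(encode(r)) for r in responses)


# Whitespace "tokenizer" as a placeholder; in practice swap in e.g.
# AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B").encode.
print(avg_response_length(["a b c", "a b"], str.split))  # 2.5
```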