Hi!
I was inquiring about the `layer_types` config that you used for `tinyllama_lckv.json`. What was the intuition behind this config, given the model's `num_attention_heads`, `num_key_value_heads`, and `num_hidden_layers`? In particular, how did you arrive at the values for `layer_types`, `forward_passes`, and `backward_passes`?
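For context, here is a minimal sketch of the kind of config I mean. The dimensions match TinyLlama-1.1B, but the `layer_types`, `forward_passes`, and `backward_passes` values are purely illustrative placeholders on my part, not copied from the actual `tinyllama_lckv.json`:

```json
{
  "num_hidden_layers": 22,
  "num_attention_heads": 32,
  "num_key_value_heads": 4,
  "layer_types": "...",
  "forward_passes": 0,
  "backward_passes": 0
}
```

I'd like to understand how these three LCKV-specific fields should be chosen as a function of the standard attention/layer dimensions above.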