Hello, what hyperparameters should I use for a first try? For example, what learning rate, and should I use AdamW or MADGRAD as the optimizer?