Hi,
Thanks for the great work. I think gradient normalization is a reasonable way to extend GC, but I noticed two points that confuse me:
- Gradients whose size is greater than 1 are centralized by the mean, while all gradients (not filtered by size) are normalized by the std. Is this an empirically better implementation, or is it just a bug?
- The mean and the std in gradient normalization are computed over different dimensions, which is not very intuitive to me.
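To make the two points concrete, here is a minimal NumPy sketch of the behavior as I understand it. It is not the repository's code. The function name `centralize_and_normalize`, the `eps` term, and the choice of axes are my own assumptions: the mean is taken over every axis except axis 0 (per-output-channel, as in GC) and only for multi-dimensional gradients, while the std is taken over the whole tensor and applied to every gradient.

```python
import numpy as np


def centralize_and_normalize(grad, eps=1e-8):
    """Hypothetical sketch of the behavior questioned in this issue.

    Assumptions (not taken from the repository):
    - centralization subtracts the mean over every axis except axis 0,
      and only when the gradient has more than one dimension;
    - normalization divides by the std over the whole tensor,
      for every gradient regardless of its shape.
    """
    g = np.asarray(grad, dtype=np.float64)
    if g.ndim > 1:
        # Per-output-channel mean, as in Gradient Centralization.
        axes = tuple(range(1, g.ndim))
        g = g - g.mean(axis=axes, keepdims=True)
    # Applied unconditionally (1-D gradients such as biases included)
    # and computed over all elements, not over the centralization axes.
    return g / (g.std() + eps)


# A 2-D gradient ends up with zero row means and unit std, while a 1-D
# gradient is rescaled to unit std but keeps a non-zero mean.
w_grad = centralize_and_normalize([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b_grad = centralize_and_normalize([1.0, 2.0, 3.0])
print(w_grad.mean(axis=1), b_grad.mean())
```

Under these assumptions, the asymmetry is visible directly: only multi-dimensional gradients are zero-mean after the step, yet all gradients are rescaled, and the scale is measured over a different set of axes than the mean.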
Thanks in advance for your reply.