
A problem about ClassificationHead in model.py #12

@Shimao-Zhang


Thanks for your great work! I noticed that the voken classification head uses a non-linear layer with GELU, followed by a LayerNorm and then a linear layer called `decoder`. This differs from the head described in the paper, which is a linear layer followed by a softmax. Do the two perform similarly, or have I misunderstood something?
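
To make the comparison concrete, here is a minimal sketch of the two heads as described above. It assumes PyTorch. The class names `PaperStyleHead` and `CodeStyleHead`, the hidden size, the number of vokens, and the LayerNorm epsilon are all hypothetical and chosen for illustration. None of them are taken from the repository's model.py.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PaperStyleHead(nn.Module):
    """Hypothetical head: a single linear layer followed by softmax (paper description)."""

    def __init__(self, hidden_size: int, num_vokens: int):
        super().__init__()
        self.linear = nn.Linear(hidden_size, num_vokens)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Probability distribution over vokens for every token position.
        return F.softmax(self.linear(hidden_states), dim=-1)


class CodeStyleHead(nn.Module):
    """Hypothetical head: dense + GELU + LayerNorm, then a linear `decoder` (code description)."""

    def __init__(self, hidden_size: int, num_vokens: int, eps: float = 1e-12):
        super().__init__()
        self.dense = nn.Linear(hidden_size, hidden_size)
        self.layer_norm = nn.LayerNorm(hidden_size, eps=eps)
        self.decoder = nn.Linear(hidden_size, num_vokens)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        x = F.gelu(self.dense(hidden_states))  # non-linear transform
        x = self.layer_norm(x)
        # Raw logits: no explicit softmax here.
        return self.decoder(x)


if __name__ == "__main__":
    torch.manual_seed(0)
    hidden = torch.randn(2, 5, 16)  # (batch, seq_len, hidden_size), hypothetical sizes
    probs = PaperStyleHead(16, 10)(hidden)
    logits = CodeStyleHead(16, 10)(hidden)
    print(probs.shape, logits.shape)
```

In PyTorch, `F.cross_entropy` applies log-softmax internally. A head that returns raw logits is therefore trained as though it ended in a softmax. Under this reading, the remaining structural difference between the two sketches is the extra dense + GELU + LayerNorm transform before the output projection.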
