Structural downsampling and static token sparsification

Hi, this is solid and promising work, but I have some questions.
(1) In the paper, you perform average pooling with a 2 × 2 kernel after the sixth block for structural downsampling. However, Table 3 reports separate results for **_structural downsampling_**, **_static token sparsification_**, and dynamic token sparsification. What is the difference between **_structural downsampling and static token sparsification_**, given that their accuracies differ?
(2) I'm interested in the 2 × 2 average pooling. Did you run additional experiments on where to place this structural downsampling, for example after the seventh or tenth block of the ViT?
(3) Could you provide the code for reproducing the **_structural downsampling and static token sparsification_** results in Table 3, as well as the probability heat map in Figure 6?
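For reference on question (1), here is a minimal sketch of the 2 × 2 average-pooling downsampling as I understand it. It is not taken from the repository and makes several assumptions: the function name `structural_downsample` is hypothetical, the class token sits at index 0, patch tokens are in row-major order on a square grid, and the shapes (197 tokens, 384 channels, i.e. a 14 × 14 grid) are illustrative DeiT-S-like values.

```python
import numpy as np

def structural_downsample(x, grid_size=14):
    """Hypothetical 2x2 average pooling over the patch tokens of a ViT sequence.

    x: array of shape (B, 1 + grid_size**2, C); token 0 is assumed to be the class token.
    Returns an array of shape (B, 1 + (grid_size // 2)**2, C).
    """
    b, n, c = x.shape
    assert n == 1 + grid_size * grid_size, "expected a class token plus a square patch grid"
    assert grid_size % 2 == 0, "2x2 pooling needs an even grid size"
    cls_tok, patches = x[:, :1], x[:, 1:]
    # Restore the 2-D layout (B, H, W, C), assuming row-major patch order.
    grid = patches.reshape(b, grid_size, grid_size, c)
    # Split each spatial axis into (H/2, 2) and average over every 2x2 window.
    half = grid_size // 2
    pooled = grid.reshape(b, half, 2, half, 2, c).mean(axis=(2, 4))
    # Flatten back into a token sequence and re-attach the untouched class token.
    return np.concatenate([cls_tok, pooled.reshape(b, -1, c)], axis=1)

x = np.random.rand(2, 197, 384)
print(structural_downsample(x).shape)  # (2, 50, 384)
```

Under these assumptions, every block after the pooling step processes 50 tokens instead of 197, and the kept positions are the same fixed grid for every input image.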

Thanks for your help!
