About LF-VILA code in PatchEmbed3D of video encoder #36

The padding in `PatchEmbed3D` does not look right to me, or maybe I made a mistake:
```
# padding
        _, _, D, H, W = x.size() 
        if H % self.patch_size[0] != 0: 
            x = F.pad(x, (0, 0, 0, self.patch_size[1] - H % self.patch_size[1]))
        if W % self.patch_size[1] != 0:
            x = F.pad(x, (0, 0, 0, 0, 0, self.patch_size[0] - D % self.patch_size[0]))
```

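Since `patch_size=[1, 8, 8]`, where 8x8 is HxW in the implementation, shouldn't the padding be applied along the H and W dimensions, with each one checked against its own patch size?
The conditions `H % self.patch_size[0] != 0` and `W % self.patch_size[1] != 0` confuse me. Index 0 is the temporal patch size (1), so `H % 1` is always 0 and the first branch never runs. The second branch checks `W` against the H patch size, but then pads the D dimension using `D % self.patch_size[0]`.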
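To make the question concrete, here is a minimal sketch of the padding I would have expected, assuming the input is laid out as `(B, C, D, H, W)` and `patch_size` is ordered `(D, H, W)`. The helper name `pad_to_multiple` is hypothetical and not from the repo. It only computes the pad tuple, in the order `F.pad` expects (pairs starting from the last dimension), so it runs without PyTorch:

```python
def pad_to_multiple(shape, patch_size):
    """Return an F.pad-style tuple that pads D, H, W up to multiples of patch_size.

    shape:      (D, H, W) of the input volume
    patch_size: (pD, pH, pW), e.g. (1, 8, 8)  (assumed ordering)
    """
    D, H, W = shape
    pD, pH, pW = patch_size
    # (-n) % p is the amount needed to reach the next multiple of p
    # (0 when n is already a multiple of p).
    pad_d = (-D) % pD
    pad_h = (-H) % pH
    pad_w = (-W) % pW
    # F.pad lists (before, after) pairs starting from the LAST dimension:
    # W first, then H, then D.
    return (0, pad_w, 0, pad_h, 0, pad_d)


print(pad_to_multiple((4, 30, 45), (1, 8, 8)))  # -> (0, 3, 0, 2, 0, 0)
```

If this matches the intent, the resulting tuple could be passed directly as `F.pad(x, pad)`, so each dimension is compared against, and padded with, its own entry of `patch_size`.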
Thanks a lot!
