
Description
Hi,
I am trying to learn the code, and I came across the following line:

```python
tgt = torch.zeros_like(query_embed)
```
The input `tgt` of the decoder is all zeros, and I see that this all-zeros tensor is used as input in the decoder layer:

```python
q = k = self.with_pos_embed(tgt, query_pos)
```
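For context, my understanding is that `with_pos_embed` simply adds the positional embedding to its input, along these lines (a sketch, assuming the standard DETR implementation):

```python
from typing import Optional
from torch import Tensor

def with_pos_embed(tensor: Tensor, pos: Optional[Tensor]) -> Tensor:
    # Add the positional embedding to the input; when tensor is all zeros,
    # the result is exactly pos.
    return tensor if pos is None else tensor + pos
```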
Here `tgt` is all zeros and `query_pos` is a learnable embedding, so `q` and `k` are non-zero tensors (equal in value to `query_pos`), while `tgt` remains all zeros and is used as `v`. According to the computation rule of QKV attention, the output is a weighted sum of the value vectors, so if `v` is all zeros, the output is all zeros as well. Thus the self-attention module does not contribute to the model. Am I correct about this?
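To make the claim concrete, here is a quick numerical check (a minimal sketch with made-up shapes, not DETR's actual configuration):

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: 100 object queries, model dimension 256,
# batch dimension omitted for simplicity.
num_queries, d_model = 100, 256

query_pos = torch.randn(num_queries, d_model)  # learnable query embedding
tgt = torch.zeros_like(query_pos)              # all-zeros decoder input

q = k = tgt + query_pos   # with_pos_embed: q and k equal query_pos in value
v = tgt                   # value is the all-zeros tgt

attn = F.softmax(q @ k.T / d_model ** 0.5, dim=-1)  # attention weights
out = attn @ v            # weighted sum of all-zero value rows

print(out.abs().max())    # tensor(0.): the attention output is all zeros
```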