Questions Regarding Decoder Design and Integration in CrossViViT #15

@zezhishao

Description

Dear Authors,

Thank you for your excellent work, which has been incredibly valuable for our research.

However, I have a few questions about the decoder, specifically `self.temporal_transformer`. From my understanding, the time series representation `latent_ts` is fed directly into `self.temporal_transformer`. Here, `latent_ts` is a tensor of shape `[B, L, D]`, where `L` is the length of the input time series. Since `self.temporal_transformer` acts as a Transformer encoder, it also outputs a tensor of shape `[B, L, D]`.
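To make the shape concern concrete, here is a minimal single-head self-attention sketch in NumPy (my own illustration, not the authors' code): because every one of the `L` input positions attends to all `L` positions and produces one output vector, the output sequence length necessarily equals the input length.

```python
import numpy as np

def self_attention(x):
    """Single-head self-attention on x of shape [L, D]; returns [L, D]."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)              # [L, L] attention scores
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ x                          # [L, D]: one output per input step

L, D = 7, 4
out = self_attention(np.random.randn(L, D))
assert out.shape == (L, D)  # output length is tied to input length L
```

This is exactly why an encoder-only temporal module returns `[B, L, D]` for a `[B, L, D]` input, which motivates question 1 below.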

I have two main questions:

  1. Does this imply that the input length and output length must be identical? In my case, the input sequences are longer (e.g., one week) than the output sequences (e.g., one day). How can I integrate CrossViViT into this scenario?
  2. What is the rationale for using a Transformer encoder in this design? In an encoder setup, the prediction at time step T+1 appears to be directly influenced by the value at step T-L, even though multi-head self-attention does allow some information exchange across positions. Could you elaborate on the reasoning behind this design choice? For prediction tasks, a more common approach is an auto-regressive Transformer decoder. Have you experimented with that type of decoder architecture?
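Regarding question 1, one workaround I could imagine (an assumption on my part, not something taken from the CrossViViT code) is to keep the encoder as-is and map its length-`L_in` output to a shorter horizon `L_out` with a learned linear projection over the time axis, in the style of a linear forecasting head:

```python
import numpy as np

rng = np.random.default_rng(0)
B, L_in, L_out, D = 2, 168, 24, 8  # e.g. one week of hourly steps in, one day out

# Stand-in for the [B, L_in, D] output of the temporal transformer.
encoder_out = rng.standard_normal((B, L_in, D))

# Hypothetical projection over the time dimension; this matrix would be a
# learnable parameter in practice.
W_time = rng.standard_normal((L_in, L_out)) * 0.01

# Contract the time axis: [B, L_in, D] x [L_in, L_out] -> [B, L_out, D].
forecast = np.einsum("bld,lo->bod", encoder_out, W_time)
assert forecast.shape == (B, L_out, D)
```

Would such a head be compatible with your design, or do you see a better way to handle mismatched input and output lengths?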

Thank you again for your valuable insights.

Best regards!
