Dear Authors,
Thank you for your excellent work, which has been incredibly valuable for our research.
However, I have a few questions regarding the decoder, specifically self.temporal_transformer. From my understanding, the time series representations latent_ts are fed directly into self.temporal_transformer. Here, latent_ts is a tensor of shape [B, L, D], where L denotes the length of the input time series. Since self.temporal_transformer acts as a Transformer encoder, its output is also a tensor of shape [B, L, D].
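Just to make sure I read the code correctly, here is a minimal sketch of my understanding (latent_ts and temporal_transformer are the names from the repository; the nn.TransformerEncoder construction, dimensions, and hyperparameters below are placeholders of my own, not the actual configuration):

```python
import torch
import torch.nn as nn

B, L, D = 2, 7 * 24, 64  # hypothetical batch size, context length (one week, hourly), model dim

# Stand-in for self.temporal_transformer: a plain Transformer encoder.
temporal_transformer = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=D, nhead=8, batch_first=True),
    num_layers=2,
)

latent_ts = torch.randn(B, L, D)       # time series representations, [B, L, D]
out = temporal_transformer(latent_ts)  # the encoder preserves the temporal dimension
print(out.shape)                       # torch.Size([2, 168, 64]) -- same length L as the input
```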
I have two main questions:
- Does this imply that the input length and output length must be identical? In my case, I have longer input sequences (e.g., one week) and shorter output sequences (e.g., one day). How can I integrate CrossViViT into this scenario? (One workaround I am considering is sketched after this list.)
- What is the rationale behind using a Transformer encoder in this design? In an encoder setup, the value at time step T+1 appears to be directly influenced by the value at T-L, even though multi-head self-attention already allows for some level of information exchange. Could you elaborate on the reasoning behind this design choice? For prediction tasks, a more common approach is an auto-regressive Transformer decoder (a rough sketch of what I mean is also included below). Have you experimented with this type of decoder architecture?
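For my first question, this is the kind of adaptation I am currently considering on my side (purely a sketch under my own assumptions, not something taken from CrossViViT): a small head with H learned query tokens that cross-attend to the encoded one-week context and produce a one-day output.

```python
import torch
import torch.nn as nn

class HorizonQueryHead(nn.Module):
    """Hypothetical head: H learned queries cross-attend to the encoded context."""
    def __init__(self, d_model: int, horizon: int, nhead: int = 8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(horizon, d_model))
        self.cross_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.proj = nn.Linear(d_model, 1)  # single target channel, just for illustration

    def forward(self, context):                            # context: [B, L, D]
        B = context.size(0)
        q = self.queries.unsqueeze(0).expand(B, -1, -1)    # [B, H, D]
        out, _ = self.cross_attn(q, context, context)      # [B, H, D]
        return self.proj(out)                              # [B, H, 1]

context = torch.randn(2, 7 * 24, 64)   # encoded one-week context, [B, L, D]
head = HorizonQueryHead(d_model=64, horizon=24)
print(head(context).shape)             # torch.Size([2, 24, 1]) -- one-day horizon
```

Would something along these lines be compatible with your design, or is there an intended way to handle mismatched input/output lengths?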
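And for my second question, the alternative I had in mind is a standard causally masked Transformer decoder over the target window, roughly along these lines (again only a sketch with placeholder dimensions, just to clarify what I mean; I am not claiming this is how CrossViViT should be built):

```python
import torch
import torch.nn as nn

B, L, H, D = 2, 7 * 24, 24, 64  # placeholder batch, context length, horizon, model dim

decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model=D, nhead=8, batch_first=True),
    num_layers=2,
)

memory = torch.randn(B, L, D)  # encoded one-week context
tgt = torch.randn(B, H, D)     # embeddings of the one-day target (teacher forcing during training)

# Causal mask: each target position may only attend to earlier target positions,
# while cross-attending to the full encoded context.
causal_mask = torch.triu(torch.full((H, H), float("-inf")), diagonal=1)

out = decoder(tgt, memory, tgt_mask=causal_mask)
print(out.shape)  # torch.Size([2, 24, 64])
```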
Thank you again for your valuable insights.
Best regards!