Transformer
- A sequence-to-sequence model built from attention layers, introduced in "Attention Is All You Need" (Vaswani et al., 2017)
Sequence-to-sequence
- Takes a sequence as input and produces a sequence whose length is decided by the model itself
Application Scenarios
- Speech recognition, machine translation, chatbots, and text-to-speech can all be framed as sequence-to-sequence problems
Transformer's Encoder
- Architecture: a stack of identical blocks, each containing multi-head self-attention followed by a position-wise feed-forward network
- Residual: each sub-layer's input is added to its output, x + Sublayer(x)
- Norm: layer normalization is applied to each residual sum (the original post-norm design)
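The residual-plus-norm wrapper can be sketched in NumPy (a toy illustration, not the lecture's code; the sublayer here is a stand-in for self-attention or the feed-forward network):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each vector to zero mean and unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def encoder_block(x, sublayer):
    """Post-norm residual wrapper from the original Transformer:
    output = LayerNorm(x + Sublayer(x))."""
    return layer_norm(x + sublayer(x))

# Toy sublayer: a fixed linear map standing in for attention / FFN.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
y = encoder_block(rng.normal(size=(3, 4)), lambda x: x @ W)
```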
Transformer's Decoder
- Architecture: similar to the encoder's blocks, plus masked self-attention and cross attention
- Autoregressive (AT): generates one token at a time, feeding each output back in as the next input, starting from BOS and stopping at EOS
- Masked Multi-Head Attention: each position may attend only to earlier positions, matching left-to-right generation
- Non-autoregressive (NAT): generates all output tokens in parallel; faster, but usually lower quality than AT
- Cross attention: queries come from the decoder, keys and values from the encoder output
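The causal mask in the decoder's self-attention can be illustrated in NumPy (a minimal single-head sketch, with names of my choosing):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def masked_self_attention(Q, K, V):
    """Scaled dot-product attention with a causal mask:
    position i may only attend to positions j <= i."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)  # future positions
    scores[mask] = -np.inf                                  # zero weight after softmax
    return softmax(scores) @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
out = masked_self_attention(X, X, X)
```

Because the first position can attend only to itself, its output is exactly its own value vector.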
Training
- The loss function: cross-entropy between each predicted distribution and the corresponding ground-truth token (including EOS)
- Teacher forcing: during training the decoder's inputs are the ground-truth tokens, not its own (possibly wrong) predictions
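A small NumPy sketch of both ideas (the token ids and logits are made up for illustration):

```python
import numpy as np

def cross_entropy(logits, targets):
    """Average per-token cross-entropy between the decoder's predicted
    distributions and the ground-truth tokens (including EOS)."""
    logits = logits - logits.max(axis=-1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

# Teacher forcing: the decoder input at step t is the ground-truth token
# from step t-1 (shifted right, starting with BOS), not the model's output.
BOS, EOS = 0, 1
target = np.array([5, 3, EOS])          # ground-truth output sequence
decoder_input = np.array([BOS, 5, 3])   # same sequence shifted right

rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 8))        # one distribution per step
loss = cross_entropy(logits, target)
```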
Copy Mechanism
- Some tasks (e.g. chatbots, summarization) benefit from copying tokens directly from the input rather than generating everything, as in pointer networks
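One common realization is a pointer-generator style mixture; a hedged NumPy sketch (the numbers and the `p_gen` gate are illustrative, not from the lecture):

```python
import numpy as np

def copy_distribution(p_vocab, attention, source_ids, p_gen):
    """Mix generating and copying: with weight p_gen use the normal
    vocabulary distribution, with weight 1 - p_gen copy from the input
    by scattering the attention weights onto the source token ids."""
    out = p_gen * p_vocab
    np.add.at(out, source_ids, (1 - p_gen) * attention)
    return out

p_vocab = np.array([0.1, 0.2, 0.3, 0.4])
attention = np.array([0.5, 0.5])     # attention over two source tokens
source_ids = np.array([2, 0])        # their ids in the vocabulary
dist = copy_distribution(p_vocab, attention, source_ids, p_gen=0.6)
```

The result is still a valid probability distribution, since it is a convex combination of two distributions.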
Guided Attention
- For monotonic tasks such as TTS, the attention can be constrained to move monotonically, left to right, over the input
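One way to impose this is the guided-attention loss of Tachibana et al.: penalize attention mass far from the diagonal. A NumPy sketch of the penalty matrix (function name and `g` value are my choices):

```python
import numpy as np

def guided_attention_weights(T_in, T_out, g=0.2):
    """Penalty matrix W[n, t] = 1 - exp(-(n/T_in - t/T_out)^2 / (2 g^2)):
    near zero close to the diagonal, near one far from it. Multiplying it
    elementwise with the attention matrix and summing gives a loss that
    pushes the attention to stay roughly monotonic."""
    n = np.arange(T_in)[:, None] / T_in
    t = np.arange(T_out)[None, :] / T_out
    return 1.0 - np.exp(-((n - t) ** 2) / (2 * g ** 2))

W = guided_attention_weights(4, 4)
```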
Beam Search
- Keeps the k most probable partial sequences at each decoding step instead of greedily taking the single best token; helps when there is one correct answer, but can hurt open-ended generation
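A minimal beam search over a toy step model (the model and its probabilities are invented for illustration; real decoders score with a neural network):

```python
import numpy as np

def beam_search(step_log_probs, beam_size, eos, max_len):
    """Keep the beam_size best partial sequences at each step.
    step_log_probs(seq) -> log-probabilities over the vocabulary."""
    beams = [([], 0.0)]  # (token sequence, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq and seq[-1] == eos:
                candidates.append((seq, score))  # finished beam, keep as-is
                continue
            log_p = step_log_probs(seq)
            for tok, lp in enumerate(log_p):
                candidates.append((seq + [tok], score + lp))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return beams[0][0]

# Toy model over a 5-token vocabulary: strongly prefers token len(seq)+2
# for the first three steps, then prefers EOS.
EOS = 1
def toy_model(seq):
    logp = np.full(5, np.log(0.05))
    if len(seq) < 3:
        logp[len(seq) + 2] = np.log(0.8)
    else:
        logp[EOS] = np.log(0.8)
    return logp

best = beam_search(toy_model, beam_size=2, eos=EOS, max_len=5)
```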
Sampling
- Adding randomness at decode time (e.g. sampling from the output distribution instead of taking the argmax) can work better than greedy decoding for tasks such as TTS or story completion
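A common variant is temperature sampling; a small NumPy sketch (parameter names are mine):

```python
import numpy as np

def sample_token(logits, temperature=1.0, rng=None):
    """Sample from the softmax distribution instead of taking argmax.
    Lower temperature -> closer to greedy; higher -> more random."""
    rng = rng or np.random.default_rng()
    z = logits / temperature
    z = z - z.max()                     # numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return rng.choice(len(p), p=p)

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.1])
tokens = [sample_token(logits, temperature=1.0, rng=rng) for _ in range(100)]
```

At temperature 1.0 the samples spread over several tokens; at a very low temperature the behavior collapses to greedy decoding.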
Optimizing Evaluation Metrics
- Metrics such as BLEU score whole sequences and are not differentiable, so training falls back on per-token cross-entropy; optimizing BLEU directly requires reinforcement learning
Scheduled Sampling
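Teacher forcing creates exposure bias: the decoder never sees its own mistakes during training. Scheduled sampling mitigates this by sometimes feeding the model's own predictions instead of the ground truth. A hedged NumPy sketch (function and parameter names are mine; in practice `p_truth` is annealed from 1.0 toward 0.0 over training):

```python
import numpy as np

def decoder_inputs(ground_truth, model_predictions, p_truth, rng):
    """Scheduled sampling: at each step feed the ground-truth token with
    probability p_truth, otherwise the model's own previous prediction."""
    use_truth = rng.random(len(ground_truth)) < p_truth
    return np.where(use_truth, ground_truth, model_predictions)

rng = np.random.default_rng(0)
truth = np.array([5, 3, 7, 1])
preds = np.array([5, 2, 7, 4])  # what the model actually produced
mixed = decoder_inputs(truth, preds, p_truth=0.5, rng=rng)
```

With `p_truth=1.0` this reduces to plain teacher forcing; with `p_truth=0.0` the decoder trains entirely on its own outputs.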