A neural network architecture that uses self-attention mechanisms to process all positions of a sequence in parallel, forming the foundation of virtually all modern large language models.
Introduced in the 2017 paper "Attention Is All You Need," the transformer architecture revolutionized natural language processing. Unlike RNNs that process tokens sequentially, transformers use self-attention to weigh the relevance of all tokens simultaneously, enabling massive parallelization. This architecture powers GPT (decoder-only), BERT (encoder-only), and T5 (encoder-decoder) models. Transformers scale effectively with more parameters and data, leading to the current era of large language models.
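The core operation described above, in which every token weighs the relevance of every other token, can be sketched as scaled dot-product self-attention. This is a minimal illustrative implementation in NumPy, not production model code; the projection matrices and dimensions are arbitrary placeholders.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token vectors.

    X: (seq_len, d_model) token embeddings.
    Wq, Wk, Wv: learned projection matrices (placeholders here).
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Every token scores its relevance against every other token at once,
    # which is what allows transformers to parallelize across the sequence.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over each row turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Output is a relevance-weighted mixture of value vectors.
    return weights @ V

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
X = rng.normal(size=(seq_len, d_model))
# Identity projections keep the sketch simple; real models learn these.
Wq = Wk = Wv = np.eye(d_model)
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one contextualized vector per input token
```

Note that the whole sequence is processed in one matrix multiply, in contrast to an RNN, which would need `seq_len` sequential steps.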