Transformer Neural Networks are the heart of pretty much everything exciting in AI right now. ChatGPT, Google Translate, and many other cool things are based on Transformers. This StatQuest cuts through all the hype and shows you how a Transformer works, one step at a time.

NOTE: If you're interested in learning more about Backpropagation, check out these 'Quests:
The Chain Rule:
Gradient Descent:
Backpropagation Main Ideas:
Backpropagation Details Part 1:
Backpropagation Details Part 2:

If you're interested in learning more about the SoftMax function, check out:

If you're interested in learning more about Word Embedding, check out:

If you'd like to learn more about calculating similarities in the context of neural networks and the Dot Product, check out:
Cosine Similarity:
Attention:

For a complete index of all the StatQuest videos, check out:

If you'd like to support StatQuest, please consider...
Patreon:
...or...
YouTube Membership:
...buying one of my books, a study guide, a t-shirt or hoodie, or a song from the StatQuest store...
...or just donating to StatQuest!

Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on Twitter:

0:00 Awesome song and introduction
1:26 Word Embedding
7:30 Positional Encoding
12:53 Self-Attention
23:37 Encoder and Decoder defined
23:53 Decoder Word Embedding
25:08 Decoder Positional Encoding
25:50 Transformers were designed for parallel computing
27:13 Decoder Self-Attention
27:59 Encoder-Decoder Attention
31:19 Decoding numbers into words
32:23 Decoding the second token
34:13 Extra stuff you can add to a Transformer

#StatQuest #Transformer #ChatGPT











