Retrieval-Augmented Generation (RAG) is the de facto technique for letting LLMs work with any document or dataset, regardless of its size. Follow along as I cover how to parse and manipulate documents, explore how embeddings represent abstract concepts, implement a simple yet powerful way to surface the parts of a document most relevant to a given query, and ultimately build a script you can use to have a locally hosted LLM engage with your own documents. Rough sketches of each step are included after the timestamps below.

Check out my other Ollama videos:

Links:
Code from video -
Ollama Python library -
Project Gutenberg -
Nomic Embedding model (on Ollama) -
BGE Embedding model -
How to use a model from HF with Ollama -
Cosine Similarity -

Timestamps:
00:00 - Intro
00:26 - Environment Setup
00:49 - Function review
01:50 - Source Document
02:18 - Starting the project
02:37 - parse_file()
04:35 - Understanding embeddings
05:40 - Implementing embeddings
07:01 - Timing embedding
07:35 - Caching embeddings
10:06 - Prompt embedding
10:19 - Cosine similarity for embedding comparison
12:16 - Brainstorming improvements
13:15 - Giving context to our LLM
14:29 - CLI input
14:49 - Next steps

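Here is a minimal sketch of a parse_file()-style chunker, assuming the source is a plain-text Project Gutenberg book whose paragraphs are separated by blank lines. It is illustrative only; the actual code from the video is linked above.

```python
def parse_file(filename):
    """Split a plain-text file into paragraphs (blank-line delimited)."""
    paragraphs = []
    buffer = []
    with open(filename, encoding="utf-8-sig") as f:
        for line in f:
            line = line.strip()
            if line:
                buffer.append(line)
            elif buffer:
                # A blank line ends the current paragraph.
                paragraphs.append(" ".join(buffer))
                buffer = []
    if buffer:
        paragraphs.append(" ".join(buffer))
    return paragraphs
```

Paragraph-sized chunks are a reasonable default for books: small enough to embed cheaply, large enough to carry a complete thought.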


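Embedding each paragraph can be done with the Ollama Python library. A sketch, assuming the nomic-embed-text model has already been pulled locally (the helper name is hypothetical); ollama.embeddings() returns a dict whose "embedding" key holds the vector:

```python
import ollama

def get_embeddings(chunks, model="nomic-embed-text"):
    """Embed each text chunk with a local Ollama embedding model."""
    return [
        ollama.embeddings(model=model, prompt=chunk)["embedding"]
        for chunk in chunks
    ]
```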


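Since embedding an entire book takes a while, caching the vectors to disk pays off immediately: compute once, then reload on every later run. One possible approach, with a hypothetical cache-filename scheme:

```python
import json
import os

def load_or_compute_embeddings(filename, chunks):
    """Load cached embeddings if present; otherwise compute and save them."""
    cache_path = f"{filename}.embeddings.json"
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            return json.load(f)
    embeddings = get_embeddings(chunks)  # from the sketch above
    with open(cache_path, "w") as f:
        json.dump(embeddings, f)
    return embeddings
```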


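Cosine similarity scores how close the prompt's embedding is to each paragraph's embedding: the dot product of the two vectors divided by the product of their magnitudes, yielding a value between -1 and 1 where higher means more similar. A NumPy sketch:

```python
import numpy as np

def cosine_similarity(needle, haystack):
    """Score one embedding against a list of embeddings."""
    needle = np.array(needle)
    norm = np.linalg.norm
    return [
        np.dot(needle, vec) / (norm(needle) * norm(vec))
        for vec in (np.array(v) for v in haystack)
    ]
```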


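Tying it together: embed the CLI prompt, rank paragraphs by similarity, and hand the best matches to a local chat model as context. This sketch reuses the helpers above; the filename, chat model name, system prompt, and top-5 cutoff are all illustrative assumptions, not the exact values from the video.

```python
import ollama

SYSTEM_PROMPT = (
    "Answer the question using only the provided context. "
    "If the answer is not in the context, say so."
)

def main():
    chunks = parse_file("book.txt")  # assumed filename
    embeddings = load_or_compute_embeddings("book.txt", chunks)
    prompt = input("What do you want to know? -> ")
    prompt_embedding = ollama.embeddings(
        model="nomic-embed-text", prompt=prompt
    )["embedding"]
    scores = cosine_similarity(prompt_embedding, embeddings)
    # Take the five highest-scoring paragraphs as context.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:5]
    context = "\n".join(chunks[i] for i in top)
    response = ollama.chat(
        model="mistral",  # any locally pulled chat model
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT + "\n\nContext:\n" + context},
            {"role": "user", "content": prompt},
        ],
    )
    print(response["message"]["content"])

if __name__ == "__main__":
    main()
```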


