Published 1 year ago by Decoder

RAG from the Ground Up with Python and Ollama

Retrieval-Augmented Generation (RAG) is the de facto technique for giving LLMs the ability to interact with any document or dataset, regardless of its size. Follow along as I cover how to parse and manipulate documents, explore how embeddings are used to represent abstract concepts, implement a simple yet powerful way to surface the parts of a document most relevant to a given query, and ultimately build a script you can use to have a locally hosted LLM engage with your own documents. Check out my other Ollama videos. Minimal code sketches of the main steps follow the timestamps below.

Links:
- Code from video
- Ollama Python library
- Project Gutenberg
- Nomic embedding model (on Ollama)
- BGE embedding model
- How to use a model from HF with Ollama
- Cosine similarity

Timestamps:
00:00 - Intro
00:26 - Environment Setup
00:49 - Function review
01:50 - Source Document
02:18 - Starting the project
02:37 - parse_file()
04:35 - Understanding embeddings
05:40 - Implementing embeddings
07:01 - Timing embedding
07:35 - Caching embeddings
10:06 - Prompt embedding
10:19 - Cosine similarity for embedding comparison
12:16 - Brainstorming improvements
13:15 - Giving context to our LLM
14:29 - CLI input
14:49 - Next steps
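
parse_file() (02:37) splits the source document into chunks that can be embedded individually. A minimal sketch, assuming a plain-text source (e.g. a Project Gutenberg book) where blank lines separate paragraphs:

```python
def parse_file(filename):
    """Split a plain-text file into paragraphs, using blank lines as separators."""
    paragraphs = []
    buffer = []
    with open(filename, encoding="utf-8-sig") as f:
        for line in f:
            line = line.strip()
            if line:
                buffer.append(line)
            elif buffer:
                # A blank line closes out the current paragraph.
                paragraphs.append(" ".join(buffer))
                buffer = []
    if buffer:  # don't drop a final paragraph with no trailing blank line
        paragraphs.append(" ".join(buffer))
    return paragraphs
```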
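
Implementing and caching embeddings (05:40, 07:35): the Ollama Python library's ollama.embeddings() call turns a chunk of text into a vector. Because embedding a whole book takes a while, it pays to cache the results to disk. A sketch of one approach, assuming the nomic-embed-text model has been pulled; the helper names and the embeddings/ JSON cache directory are illustrative, not necessarily the video's exact code:

```python
import json
import os

import ollama


def save_embeddings(filename, embeddings):
    """Write the embedding vectors to a JSON cache file."""
    os.makedirs("embeddings", exist_ok=True)
    with open(f"embeddings/{filename}.json", "w") as f:
        json.dump(embeddings, f)


def load_embeddings(filename):
    """Return cached embeddings, or False if no cache exists yet."""
    path = f"embeddings/{filename}.json"
    if not os.path.exists(path):
        return False
    with open(path) as f:
        return json.load(f)


def get_embeddings(filename, modelname, chunks):
    """Embed each chunk with Ollama, reusing the on-disk cache when possible."""
    if (embeddings := load_embeddings(filename)) is not False:
        return embeddings
    embeddings = [
        ollama.embeddings(model=modelname, prompt=chunk)["embedding"]
        for chunk in chunks
    ]
    save_embeddings(filename, embeddings)
    return embeddings
```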
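
Cosine similarity (10:19) scores how close the prompt's embedding is to each paragraph's embedding: cos(theta) = (a . b) / (|a| |b|), which is 1 for vectors pointing the same way and falls off as they diverge. A pure-Python sketch, with a hypothetical find_most_similar() helper for ranking:

```python
from math import sqrt


def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| * |b|) for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


def find_most_similar(needle, haystack):
    """Rank haystack vectors against the needle, best match first.

    Returns (score, index) pairs so the caller can look up the
    original paragraph by index.
    """
    scores = [cosine_similarity(needle, item) for item in haystack]
    return sorted(zip(scores, range(len(haystack))), reverse=True)
```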
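
Giving context to our LLM and taking CLI input (13:15, 14:29): the best-scoring paragraphs are stuffed into a system prompt, and the user's question goes through ollama.chat(). A sketch that wires the earlier pieces together; the mistral model name, the system-prompt wording, and the top-5 cutoff are assumptions, and paragraphs / embeddings / find_most_similar refer to the sketches above:

```python
import ollama

SYSTEM_PROMPT = (
    "You are a helpful reading assistant who answers questions based on the "
    "snippets of text provided in context. Answer only using the context "
    "provided, being as concise as possible.\nContext:\n"
)

prompt = input("What do you want to know? -> ")
# Embed the question with the same model used for the document chunks.
prompt_embedding = ollama.embeddings(
    model="nomic-embed-text", prompt=prompt
)["embedding"]

# Pick the five best-matching paragraphs as context (cutoff is arbitrary).
best = find_most_similar(prompt_embedding, embeddings)[:5]
context = "\n".join(paragraphs[i] for _, i in best)

response = ollama.chat(
    model="mistral",  # assumed chat model; any pulled Ollama model works
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT + context},
        {"role": "user", "content": prompt},
    ],
)
print(response["message"]["content"])
```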