
Training a Custom LLM with Your Data Using LLaMA and LangChain

In this post, we'll walk through a step-by-step guide to building an AI application that answers questions based on your website's content using LangChain, Ollama, and ChromaDB. This approach uses Retrieval-Augmented Generation (RAG): the pre-trained language model stays unchanged, and at query time the most relevant passages from your own data are retrieved and passed to it as context. The model's answers are grounded in your data without any fine-tuning.

What We'll Use

  • LangChain – for chaining together data loading, embedding, retrieval, and prompt logic

  • Ollama – to run open-source LLMs (like LLaMA or Mistral) locally

  • ChromaDB – as an efficient vector store

  • WebBaseLoader – to extract text directly from your website's pages

  • RecursiveCharacterTextSplitter – for splitting long web content into overlapping chunks

Step 1: Install Dependencies

pip install langchain langchain-community chromadb beautifulsoup4 unstructured requests tiktoken

Step 2: Load Website Content

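from langchain_community.document_loaders import WebBaseLoader

# Pages to index; replace these with URLs from your own site
urls = [
    "https://your-website.com/page1",
    "https://your-website.com/page2",
]

# Fetch each page and return its text as a list of Document objects
loader = WebBaseLoader(urls)
documents = loader.load()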
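
To get a feel for what "loading" a page means, here is a minimal sketch that turns an HTML string into plain text using only Python's standard library. It is not WebBaseLoader's implementation; the names TextExtractor and html_to_text and the sample page are made up for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a <script> or <style> element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


# Hypothetical page content standing in for a fetched URL
page = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><h1>Services</h1><p>We build websites.</p></body></html>"
)
print(html_to_text(page))
```

The real loader additionally fetches each URL over HTTP and wraps the extracted text in Document objects together with metadata such as the source URL.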

Step 3: Split the Text with RecursiveCharacterTextSplitter

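from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,    # maximum chunk length, measured in characters by default
    chunk_overlap=50   # characters shared between neighbouring chunks
)
# Each chunk becomes its own Document and keeps the source page's metadata
docs = text_splitter.split_documents(documents)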
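
To see how chunk_size and chunk_overlap interact, here is a simplified sketch of a fixed-width sliding window over characters. It is an illustration under that assumption, not the splitter's actual algorithm: the real RecursiveCharacterTextSplitter first tries to break on paragraph, line, and word boundaries. The function name chunk_text is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "x" * 1200
pieces = chunk_text(sample, chunk_size=500, chunk_overlap=50)
print([len(p) for p in pieces])  # [500, 500, 300]
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.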

Step 4: Generate Embeddings using Ollama

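from langchain_community.embeddings import OllamaEmbeddings

# Requires a running Ollama server with the model pulled first:
#   ollama pull nomic-embed-text
embedding = OllamaEmbeddings(model="nomic-embed-text")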

Step 5: Store Embeddings in ChromaDB

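from langchain_community.vectorstores import Chroma

# Embed every chunk and write the vectors to a local directory
vectorstore = Chroma.from_documents(
    documents=docs,
    embedding=embedding,
    persist_directory="./chroma_store"
)

# Needed only on older Chroma versions; chromadb 0.4+ writes to
# persist_directory automatically and treats this call as deprecated
vectorstore.persist()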
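
Conceptually, a vector store keeps (text, vector) pairs and returns the texts whose vectors lie closest to the query's vector. The sketch below shows that idea with cosine similarity and a brute-force scan. It is not how Chroma is implemented (Chroma uses its own index and a configurable distance metric), and the three-dimensional vectors are made-up stand-ins for real embeddings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """Return the k stored texts whose vectors are most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Hypothetical 3-dimensional "embeddings"; real models produce far more dimensions
toy_store = [
    ("We offer web design services.", [0.9, 0.1, 0.0]),
    ("Our office is closed on Sundays.", [0.0, 0.2, 0.9]),
    ("We also build mobile apps.", [0.8, 0.3, 0.1]),
]
query_vec = [1.0, 0.2, 0.0]  # stands in for the embedded question
print(top_k(query_vec, toy_store, k=2))
```

With these toy values, the two service-related sentences rank above the one about opening hours, which is the behaviour the retriever in the next step relies on.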

Step 6: Build the Retrieval Chain and Ask a Question

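from langchain.chains import RetrievalQA
from langchain_community.llms import Ollama

# Pull the model first with: ollama pull mistral (or llama3)
llm = Ollama(model="mistral")

# The retriever fetches the most similar chunks; the chain passes them
# to the LLM together with the question, using LangChain's default prompt
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectorstore.as_retriever()
)

query = "What services do we offer on our website?"
# .run() is deprecated in recent LangChain releases; .invoke() returns a dict
response = qa_chain.invoke({"query": query})

print(response["result"])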
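
By default, RetrievalQA.from_chain_type uses the "stuff" strategy: the retrieved chunks are concatenated into a single context block and placed in the prompt ahead of the question. Here is a rough sketch of that assembly step. The function name build_prompt and the prompt wording are hypothetical, not LangChain's built-in template.

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Concatenate retrieved chunks into one context block and add the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Made-up chunks standing in for what the retriever would return
retrieved = [
    "We offer web design services.",
    "We also build mobile apps.",
]
prompt = build_prompt("What services do we offer on our website?", retrieved)
print(prompt)
```

Because everything is packed into one prompt, the number and size of retrieved chunks must fit within the model's context window, which is one reason the chunk size in Step 3 matters.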

Conclusion

This pipeline turns your website into a custom AI knowledge source without fine-tuning any models. With just three tools (LangChain, Ollama, and ChromaDB), you can build an assistant that answers questions from your own content and runs locally.


