
How to Run the Llama 3 Model with Ollama in a Docker Container



What is LLaMA?

In simple terms, LLaMA (Large Language Model Meta AI) is a powerful computer program developed by Meta (the company formerly known as Facebook) that can understand and generate human language. It’s like a very advanced chatbot or text assistant.

What is Ollama?

Ollama is a user-friendly tool that helps you run and manage AI models, which are computer programs designed to understand and generate human language. Think of Ollama as a smart assistant that makes it easier to use these advanced AI models without needing deep technical knowledge.

How does LLaMA work?

LLaMA is trained on a vast amount of text data from books, articles, websites, and other sources. It uses this training to learn patterns in language, such as grammar, vocabulary, and context. When you give LLaMA a task (like writing a story or answering a question), it uses what it has learned to produce a response that makes sense based on the input it receives.


In essence, LLaMA is like a super-smart text assistant that can handle a wide range of language-related tasks, making it a valuable tool for businesses, educators, writers, and anyone who works with language.


Prerequisites

Before we get started, make sure Docker is installed on your system.
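
Before proceeding, you can quickly verify that both the Docker CLI and the Docker daemon respond (the version output will differ on your system):

[root@siddhesh ~]# docker --version
[root@siddhesh ~]# docker info --format '{{.ServerVersion}}'

If either command fails, install or start Docker first.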


Step 1: Download the Ollama Docker Image

[root@siddhesh ~]# docker pull ollama/ollama
Using default tag: latest
latest: Pulling from ollama/ollama
7646c8da3324: Pull complete
d1060ab4fb75: Pull complete
e58f7d737fbb: Pull complete
Digest: sha256:4a3c5b5261f325580d7f4f6440e5094d807784f0513439dcabfda9c2bdf4191e
Status: Downloaded newer image for ollama/ollama:latest
docker.io/ollama/ollama:latest
[root@siddhesh ~]#

The command docker pull ollama/ollama downloads the official Ollama image from the Docker Hub repository. Since no tag is specified, Docker defaults to the latest tag, as shown in the output above.
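
You can confirm that the image is now available locally (the image ID and size will vary on your machine):

[root@siddhesh ~]# docker images ollama/ollama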


Step 2: Start the Ollama Container

[root@siddhesh ~]# docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
affc37587b569d3253acc076d19465f1179485afedb5dee6c867e00716c63963
[root@siddhesh ~]#

The command docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama starts a new container from the ollama/ollama image. The -d flag runs it in the background, -v mounts a named volume called ollama at /root/.ollama so that downloaded models persist across container restarts, -p maps port 11434 on your local machine to port 11434 inside the container (the port the Ollama API listens on), and --name gives the container the fixed name ollama used in the next steps.
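
To verify that the container is up, list it by name:

[root@siddhesh ~]# docker ps --filter name=ollama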


Step 3: Execute the Llama 3 Model Locally

[root@siddhesh ~]# docker exec -it ollama ollama run llama3
pulling manifest
verifying sha256 digest
writing manifest
removing any unused layers
success
[root@siddhesh ~]#

This downloads the Llama 3 model into the container (the manifest and layer messages above) and opens an interactive prompt where you can chat with the model directly; you can typically exit the session with /bye.
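
You can also check which models are now stored inside the container:

[root@siddhesh ~]# docker exec -it ollama ollama list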


Step 4: Query Llama 3 Through the REST API

[root@siddhesh ~]# curl http://localhost:11434/api/generate -d '{"model": "llama3","prompt": "What is 200 + 300?","stream": false}'
{"model":"llama3",
"created_at":"2024-07-05T08:51:11.582432818Z",
"response":"The answer to 200 + 300 is 500.",
"done":true,
"done_reason":"stop",
"context[128006,882,128007,271,3923,374,220,1049,489,220,3101,30,128009,128006,78191,128007,271,791,4320,311,220,1049,489,220,3101,374,220,2636,13,128009],
"total_duration":12737708015,
"load_duration":39566513,
"prompt_eval_count":13,
"prompt_eval_duration":4891908000,
"eval_count":13,"eval_duration":7753703000}
[root@siddhesh ~]#

This JSON response gives a detailed overview of how the model performed when generating the answer. It includes the model used, the generated text, why generation stopped (done_reason), the token IDs of the conversation context, and internal timing metrics; the *_duration fields are reported in nanoseconds.
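
As a quick way to turn those metrics into something readable, you can compute the generation speed from eval_count and eval_duration; this one-liner is a sketch that assumes jq is installed on your machine:

[root@siddhesh ~]# curl -s http://localhost:11434/api/generate -d '{"model": "llama3","prompt": "What is 200 + 300?","stream": false}' | jq '.eval_count / .eval_duration * 1e9'

For the response above, that works out to 13 tokens in about 7.75 seconds, or roughly 1.7 tokens per second.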

In the above example, we asked Llama 3 "What is 200 + 300?" and received the response "The answer to 200 + 300 is 500." Similarly, you can ask any question and receive a response locally, without relying on any remotely hosted service.
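
If you do want a streamed response, which is the Ollama API's default behavior, set "stream": true (or simply omit the field). The API then emits one JSON object per line, each carrying a fragment of the answer in its response field, with "done":true on the final line:

[root@siddhesh ~]# curl http://localhost:11434/api/generate -d '{"model": "llama3","prompt": "What is 200 + 300?","stream": true}'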

