Deploy an LLM on Hugging Face Spaces for Free Using Ollama
In this tutorial, we’ll explore how to deploy Large Language Models (LLMs) for free using Ollama and LangChain on Hugging Face Spaces. This approach allows you to leverage powerful language models without the need for expensive GPU resources or complex infrastructure.
Prerequisites
- A Hugging Face account
- Basic knowledge of Python, FastAPI, and Docker
- Familiarity with LLMs, Ollama, and LangChain
Step 1: Setting Up the Project
First, let’s create our project structure:
llm-space/
├── app.py
├── Dockerfile
├── requirements.txt
└── start.sh
Step 2: Creating the FastAPI Application
In app.py, we'll create a FastAPI application that uses Ollama and LangChain. We'll use TinyLlama from Ollama because it is only around 700 MB; since Hugging Face provides limited free hardware, TinyLlama runs comfortably and responds faster than larger models.
import os
import logging
from fastapi import FastAPI, HTTPException
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from langchain_community.llms import Ollama
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

app = FastAPI()

MODEL_NAME = 'tinyllama'

def get_llm():
    callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
    return Ollama(model=MODEL_NAME, callback_manager=callback_manager)

class Question(BaseModel):
    text: str

@app.get("/")
def read_root():
    return {"Hello": f"Welcome to {MODEL_NAME} FastAPI"}

@app.on_event("startup")
async def startup_event():
    logger.info(f"Starting up with model: {MODEL_NAME}")

@app.on_event("shutdown")
async def shutdown_event():
    logger.info("Shutting down")
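# Continuation of app.py: the /ask endpoint exercised in Step 9.
# This is a minimal sketch of the handler; the exact prompt handling and
# the {"answer": ...} response shape here are assumptions.
@app.post("/ask")
def ask_question(question: Question):
    try:
        llm = get_llm()
        # Pass the question text straight to the model and return its answer
        answer = llm.invoke(question.text)
        return {"answer": answer}
    except Exception as e:
        logger.error(f"Error while generating a response: {e}")
        raise HTTPException(status_code=500, detail=str(e))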
Step 3: Creating the Dockerfile
Create a Dockerfile to set up the environment:
FROM python:3.9-slim

# Install curl and Ollama
RUN apt-get update && apt-get install -y curl && \
    curl -fsSL https://ollama.ai/install.sh | sh && \
    apt-get clean && rm -rf /var/lib/apt/lists/*

# Set up user and environment
RUN useradd -m -u 1000 user
USER user
ENV HOME=/home/user \
    PATH="/home/user/.local/bin:$PATH"

WORKDIR $HOME/app

COPY --chown=user requirements.txt .
RUN pip install --no-cache-dir --upgrade -r requirements.txt
COPY --chown=user . .

# Make the start script executable
RUN chmod +x start.sh

CMD ["./start.sh"]
Step 4: Creating the Start Script
Create start.sh to initialize Ollama and start the FastAPI server:
#!/bin/bash

# Set environment variables for optimization
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4
export CUDA_VISIBLE_DEVICES=0

# Start Ollama in the background
ollama serve &

# Wait for the Ollama server to start up before talking to it
max_attempts=30
attempt=0
while ! curl -s http://localhost:11434/api/tags >/dev/null; do
    sleep 1
    attempt=$((attempt + 1))
    if [ $attempt -eq $max_attempts ]; then
        echo "Ollama failed to start within 30 seconds. Exiting."
        exit 1
    fi
done
echo "Ollama is ready."

# Pull the model if not already present
if ! ollama list | grep -q "tinyllama"; then
    ollama pull tinyllama
fi

# Print the API URL
echo "API is running on: http://0.0.0.0:7860"

# Start the FastAPI server
uvicorn app:app --host 0.0.0.0 --port 7860 --workers 4 --limit-concurrency 20
Step 5: Creating the Requirements File
Create requirements.txt with the necessary dependencies:
fastapi
uvicorn
langchain
langchain_community
ollama
Step 6: Setting Up Hugging Face Space
Go to Hugging Face and sign in.
Click on your profile icon and then “New Space”, or click on “Spaces” and then “Create new Space”.

Give your Space a name and set it to “Public” if you want others to access it. Choose a license or leave it blank.

Choose “Docker” then “Blank” as the Space SDK.

Choose the free Space hardware. Then select “Public”: because we will be calling the FastAPI server from outside, the Space must be public so that its URL can be used to reach the model.

After clicking “Create Space”, you will see instructions like this:

Step 7: Uploading Files to Hugging Face Space
Upload the following files to your Hugging Face Space:

Or clone the Space repository, copy these files into the local clone (recommended), and push them:
app.py
Dockerfile
requirements.txt
start.sh
git add .
git commit -m "Push"
git push
Step 8: Building and Deploying
Once you’ve uploaded all the files, Hugging Face Spaces will automatically detect the Dockerfile and start building your container. This process may take a few minutes.


Once the build completes, you will see a message like this.

Step 9: Testing Your Deployment
After the build is complete, you can test your deployment by:
- Accessing the Space URL provided by Hugging Face.
Click the three dots beside “Settings” and select “Embed this Space”. Remember, you won’t see “Embed this Space” if your Space is private.

- Using the FastAPI Swagger UI (usually available at /docs) to test the /ask endpoint, or using a small client script like the one below.
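A minimal client-side check might look like this; the Space URL below is a placeholder for your own, and the requests package is an extra dependency assumed to be installed on your machine:
import requests

# Placeholder: replace with your own Space URL (shown under "Embed this Space")
SPACE_URL = "https://your-username-your-space.hf.space"

# Send a question to the /ask endpoint and print the JSON response
response = requests.post(f"{SPACE_URL}/ask", json={"text": "What is an LLM?"}, timeout=120)
print(response.json())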


The first response is slow (around 20 seconds), but later responses get faster (10 to 15 seconds) depending on the length of the output. That is why it is better to stream the response.
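As a rough sketch of what streaming could look like, the snippet below adds a hypothetical /ask-stream endpoint to app.py. It reuses get_llm(), the Question model, and the StreamingResponse import already present in app.py, together with LangChain's stream() method; the endpoint name and plain-text output format are assumptions, not part of the original setup.
@app.post("/ask-stream")
def ask_stream(question: Question):
    llm = get_llm()

    def token_generator():
        # llm.stream() yields the answer chunk by chunk as the model generates it
        for chunk in llm.stream(question.text):
            yield chunk

    # StreamingResponse forwards each chunk to the client as soon as it is produced
    return StreamingResponse(token_generator(), media_type="text/plain")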
Feel free to make changes in the local repo and push them; Hugging Face will detect the changes and rebuild the Space. You can also swap TinyLlama for Gemma or Phi-3 Mini: the answers will be better, but the responses will be slower.
Conclusion
You’ve now successfully deployed an LLM using Ollama and LangChain on Hugging Face Spaces for free. This setup allows you to run inference on various language models without the need for expensive GPU resources.
Remember that free tier resources on Hugging Face Spaces are limited, so this setup is best for small-scale projects or demonstrations. For production use or larger workloads, you might need to consider upgrading to a paid tier or exploring other hosting options.
Happy coding and enjoy your free LLM deployment!