How to Set Up Local LLMs with Web Search and Global Access
Running large language models locally gives you complete control over your AI interactions while maintaining privacy. In this comprehensive guide, I'll show you how to set up a powerful local AI system with Ollama, create a beautiful web interface with Open WebUI, add web search capabilities with SearXNG, and make it accessible from anywhere using Cloudflare Tunnels.
What You'll Build
By the end of this tutorial, you'll have:
- 🤖 Ollama running powerful coding models locally
- 🌐 Open WebUI - A sleek chat interface accessible via your custom domain
- 🔍 SearXNG - Privacy-focused web search integration
- 🌍 Global access through Cloudflare Tunnels
- 🔒 Complete privacy - everything runs on your hardware
Prerequisites
- Ubuntu/Linux system with at least 16GB RAM (32GB recommended for larger models)
- Docker installed
- Basic familiarity with command line
- A domain managed by Cloudflare (for tunnel setup)
Step 1: Installing Ollama
Ollama makes it incredibly easy to run LLMs locally. Let's start by installing it:
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
The installation script will:
- Download and install Ollama
- Create a system service
- Start the Ollama API server on port 11434
Verify the installation:
# Check if Ollama is running
ollama list
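Ollama also exposes an HTTP API on port 11434, so you can confirm the server responds outside the CLI as well; the /api/tags endpoint returns your installed models as JSON:
# Query the Ollama API directly (same information as ollama list)
curl -s http://localhost:11434/api/tags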
Step 2: Installing AI Models
For coding tasks, Qwen2.5-Coder models are among the best available. Let's install both the 7B and 14B versions:
# Install the 7B model (faster, good for quick tasks)
ollama pull qwen2.5-coder:7b
# Install the 14B model (better quality, more capable)
ollama pull qwen2.5-coder:14b
Test your installation:
# Start an interactive session with the 14B model
ollama run qwen2.5-coder:14b
You can now chat with the model directly in your terminal. Type /bye to exit.
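For quick one-off questions you don't need an interactive session; ollama run also accepts the prompt as a command-line argument and exits after answering:
# One-shot prompt, no interactive session
ollama run qwen2.5-coder:7b "Write a Python one-liner that reverses a string"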
Step 3: Setting Up Open WebUI
Open WebUI provides a beautiful, ChatGPT-like interface for your local models. We'll run it using Docker:
# Add your user to the docker group (log out and back in for this to take effect)
sudo usermod -aG docker $USER
# Run Open WebUI with host networking for easy Ollama access
sudo docker run -d --network host \
-v open-webui:/app/backend/data \
--name open-webui \
--restart always \
-e OLLAMA_BASE_URL=http://127.0.0.1:11434 \
ghcr.io/open-webui/open-webui:main
Verify Open WebUI is running:
# Check container status
sudo docker ps | grep open-webui
# Test the web interface
curl -s -o /dev/null -w "%{http_code}" http://localhost:8080
You should see a 200 response, indicating Open WebUI is running on port 8080.
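If you see a connection error or a non-200 status instead, the container logs are the first place to look:
# Inspect recent Open WebUI logs for startup errors
sudo docker logs --tail 50 open-webui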
Step 4: Setting Up SearXNG for Web Search
To give your AI access to current information, we'll set up SearXNG, a privacy-respecting search engine:
# Create configuration directory
mkdir -p ~/searxng
# Create SearXNG configuration file
cat > ~/searxng/settings.yml << 'EOF'
use_default_settings: true
server:
  secret_key: "your-secret-key-here"
  limiter: false
  public_instance: false
search:
  formats:
    - html
    - json
engines:
  - name: google
    engine: google
    use_mobile_ui: false
  - name: bing
    engine: bing
  - name: duckduckgo
    engine: duckduckgo
    disabled: false
EOF
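Before starting the container, swap the placeholder secret_key for a real random value (SearXNG uses it to sign sessions). One way to do that, assuming openssl is available:
# Generate a random 32-byte hex secret and patch it into the config
sed -i "s|your-secret-key-here|$(openssl rand -hex 32)|" ~/searxng/settings.yml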
# Run SearXNG with custom configuration
sudo docker run -d --name searxng \
-p 8888:8080 \
-v ~/searxng/settings.yml:/etc/searxng/settings.yml \
--restart always \
searxng/searxng
Test SearXNG:
# Test the JSON API
curl -s "http://localhost:8888/search?q=test&format=json" | head -c 200
Step 5: Integrating Web Search with Open WebUI
Now we'll update Open WebUI to use SearXNG for web search capabilities:
# Stop and remove the current Open WebUI container
sudo docker stop open-webui && sudo docker rm open-webui
# Recreate with web search enabled
sudo docker run -d --network host \
-v open-webui:/app/backend/data \
--name open-webui \
--restart always \
-e OLLAMA_BASE_URL=http://127.0.0.1:11434 \
-e ENABLE_RAG_WEB_SEARCH=true \
-e RAG_WEB_SEARCH_ENGINE=searxng \
-e SEARXNG_QUERY_URL="http://localhost:8888/search?q=<query>&format=json" \
ghcr.io/open-webui/open-webui:main
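You can verify the new variables took effect by inspecting the recreated container's environment:
# The search-related variables should show up here
sudo docker exec open-webui env | grep -iE 'searxng|web_search'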
Step 6: Configuring Cloudflare Tunnels
Cloudflare Tunnels allow you to securely expose your local services to the internet without opening firewall ports.
Option 1: Using Cloudflare Dashboard (Recommended)
- Go to Cloudflare Dashboard → Zero Trust → Networks → Tunnels
- Create a new tunnel or select an existing one
- Add a Public Hostname:
  - Subdomain: chat
  - Domain: yourdomain.com
  - Service Type: HTTP
  - URL: localhost:8080
Option 2: Using CLI
# Install cloudflared (if not already installed)
curl -L --output cloudflared.deb https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb
sudo dpkg -i cloudflared.deb
# Authenticate with Cloudflare
cloudflared tunnel login
# Create a tunnel
cloudflared tunnel create my-ai-chat
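The create command prints a tunnel ID and writes a credentials JSON file under ~/.cloudflared/; that's the file referenced by the config below. If Cloudflare manages your DNS, you can also point your hostname at the tunnel straight from the CLI (using chat.yourdomain.com to match the config below):
# Create the DNS record that routes your hostname to the tunnel
cloudflared tunnel route dns my-ai-chat chat.yourdomain.com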
# Configure the tunnel
cat > ~/.cloudflared/config.yml << EOF
tunnel: my-ai-chat
credentials-file: /home/$USER/.cloudflared/your-tunnel-id.json
ingress:
  - hostname: chat.yourdomain.com
    service: http://localhost:8080
  - service: http_status:404
EOF
# Run the tunnel
cloudflared tunnel run my-ai-chat
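Running the tunnel in the foreground is fine for testing, but for a permanent setup you'll want it to start on boot. cloudflared can install itself as a systemd service (depending on your cloudflared version, you may need to copy config.yml into /etc/cloudflared/ first):
# Install cloudflared as a system service and check that it started
sudo cloudflared service install
sudo systemctl status cloudflared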
Step 7: Enable Web Search in Open WebUI
- Access your chat interface at https://chat.yourdomain.com
- Create an account (first user becomes admin)
- Go to Settings → Admin Settings → Features
- Enable "Web Search" functionality
- Configure search settings to use your SearXNG instance
Step 8: Testing Your Setup
Your AI assistant now has powerful capabilities. Test it with these examples:
Basic Chat:
"Write a Python function to calculate fibonacci numbers"
Web Search Queries:
"What's the latest news about AI developments?"
"Current Bitcoin price?"
"Find recent updates about Python 3.12"
The AI will automatically search the web when it needs current information and provide synthesized responses.
Performance Optimization
RAM Usage by Model Size:
- 7B models: ~4-5 GB RAM
- 14B models: ~8-10 GB RAM
- 32B models: ~18-20 GB RAM
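These figures are rough; actual usage depends on quantization and context length. Recent Ollama versions let you check what a loaded model is really consuming:
# Show currently loaded models and their memory footprint
ollama ps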
Model Selection Tips:
- Use 7B models for quick responses and lower resource usage
- Use 14B models for better code quality and reasoning
- Switch between models based on your current needs
Security Considerations
- Authentication: Set up user authentication in Open WebUI
- Rate Limiting: Configure rate limits to prevent abuse
- Monitoring: Monitor resource usage and access logs
- Updates: Keep all components updated regularly (see the example below)
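Since the containers run from locally pulled images, updating is a manual pull-and-recreate. A sketch for Open WebUI (the open-webui volume preserves your data across recreations):
# Pull the latest image, remove the old container, then rerun the docker run command from Step 5
sudo docker pull ghcr.io/open-webui/open-webui:main
sudo docker stop open-webui && sudo docker rm open-webui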
Troubleshooting
Common Issues:
Models not showing in Open WebUI:
# Check that the Open WebUI container can reach the Ollama API
sudo docker exec open-webui curl http://127.0.0.1:11434/api/tags
Web search not working:
# Test SearXNG API
curl "http://localhost:8888/search?q=test&format=json"
Cloudflare Tunnel issues:
- Verify tunnel status in Cloudflare Dashboard
- Check local service is running on correct port
- Review tunnel configuration
Conclusion
You now have a powerful, private AI assistant with web search capabilities accessible from anywhere. This setup gives you:
- Complete privacy - all processing happens locally
- No API costs - use your own hardware
- Custom models - choose models that fit your needs
- Web search - access to current information
- Global access - available from any device, anywhere
The combination of Ollama, Open WebUI, SearXNG, and Cloudflare Tunnels creates a production-ready AI system that rivals commercial offerings while maintaining complete control over your data and interactions.
Next Steps:
- Experiment with different models for various tasks
- Set up automated backups of your conversations
- Explore custom functions and tools in Open WebUI
- Consider adding more specialized models for specific domains
Happy AI chatting! 🤖✨
This setup was tested on Ubuntu 22.04 with 32GB RAM. Adjust resource allocation based on your hardware specifications.