How to Set Up Local LLMs with Web Search and Global Access
Running large language models locally gives you complete control over your AI interactions while maintaining privacy. In this comprehensive guide, I'll show you how to set up a powerful local AI system with Ollama, create a beautiful web interface with Open WebUI, add web search capabilities with SearXNG, and make it accessible from anywhere using Cloudflare Tunnels.
What You'll Build
By the end of this tutorial, you'll have:
- 🤖 Ollama running powerful coding models locally
- 🌐 Open WebUI - A sleek chat interface accessible via your custom domain
- 🔍 SearXNG - Privacy-focused web search integration
- 🌍 Global access through Cloudflare Tunnels
- 🔒 Complete privacy - everything runs on your hardware
Prerequisites
- Ubuntu/Linux system with at least 16GB RAM (32GB recommended for larger models)
- Docker installed
- Basic familiarity with command line
- A domain managed by Cloudflare (for tunnel setup)
Step 1: Installing Ollama
Ollama makes it incredibly easy to run LLMs locally. Let's start by installing it:
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
The installation script will:
- Download and install Ollama
- Create a system service
- Start the Ollama API server on port 11434
Verify the installation:
# Check if Ollama is running
ollama list
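Ollama also exposes an HTTP API on port 11434, so you can confirm the server responds outside the CLI as well; the /api/tags endpoint returns your installed models as JSON:
# Query the Ollama API directly (same information as ollama list)
curl -s http://localhost:11434/api/tags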
Step 2: Installing AI Models
For coding tasks, Qwen2.5-Coder models are among the best available. Let's install both the 7B and 14B versions:
# Install the 7B model (faster, good for quick tasks)
ollama pull qwen2.5-coder:7b
# Install the 14B model (better quality, more capable)
ollama pull qwen2.5-coder:14b
Test your installation:
# Start an interactive session with the 14B model
ollama run qwen2.5-coder:14b
You can now chat with the model directly in your terminal. Type /bye to exit.
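For quick one-off questions you don't need an interactive session; ollama run also accepts the prompt as a command-line argument and exits after answering:
# One-shot prompt, no interactive session
ollama run qwen2.5-coder:7b "Write a Python one-liner that reverses a string"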
Step 3: Setting Up Open WebUI
Open WebUI provides a beautiful, ChatGPT-like interface for your local models. We'll run it using Docker:
# Add your user to the docker group (log out and back in for this to take effect)
sudo usermod -aG docker $USER
# Run Open WebUI with host networking for easy Ollama access
sudo docker run -d --network host \
-v open-webui:/app/backend/data \
--name open-webui \
--restart always \
-e OLLAMA_BASE_URL=http://127.0.0.1:11434 \
ghcr.io/open-webui/open-webui:main
Verify Open WebUI is running:
# Check container status
sudo docker ps | grep open-webui
# Test the web interface
curl -s -o /dev/null -w "%{http_code}" http://localhost:8080
You should see a 200 response, indicating Open WebUI is running on port 8080.
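If you see a connection error or a non-200 status instead, the container logs are the first place to look:
# Inspect recent Open WebUI logs for startup errors
sudo docker logs --tail 50 open-webui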
Step 4: Setting Up SearXNG for Web Search
To give your AI access to current information, we'll set up SearXNG, a privacy-respecting search engine:
# Create configuration directory
mkdir -p ~/searxng
# Create SearXNG configuration file
cat > ~/searxng/settings.yml << 'EOF'
use_default_settings: true
server:
  secret_key: "your-secret-key-here"
  limiter: false
  public_instance: false
search:
  formats:
    - html
    - json
engines:
  - name: google
    engine: google
    use_mobile_ui: false
  - name: bing
    engine: bing
  - name: duckduckgo
    engine: duckduckgo
    disabled: false
EOF
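Before starting the container, swap the placeholder secret_key for a real random value (SearXNG uses it to sign sessions). One way to do that, assuming openssl is available:
# Generate a random 32-byte hex secret and patch it into the config
sed -i "s|your-secret-key-here|$(openssl rand -hex 32)|" ~/searxng/settings.yml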
# Run SearXNG with custom configuration
sudo docker run -d --name searxng \
-p 8888:8080 \
-v ~/searxng/settings.yml:/etc/searxng/settings.yml \
--restart always \
searxng/searxng
Test SearXNG:
# Test the JSON API
curl -s "http://localhost:8888/search?q=test&format=json" | head -c 200
Step 5: Integrating Web Search with Open WebUI
Now we'll update Open WebUI to use SearXNG for web search capabilities:
# Stop and remove the current Open WebUI container
sudo docker stop open-webui && sudo docker rm open-webui
# Recreate with web search enabled
sudo docker run -d --network host \
-v open-webui:/app/backend/data \
--name open-webui \
--restart always \
-e OLLAMA_BASE_URL=http://127.0.0.1:11434 \
-e ENABLE_RAG_WEB_SEARCH=true \
-e RAG_WEB_SEARCH_ENGINE=searxng \
-e SEARXNG_QUERY_URL="http://localhost:8888/search?q=<query>&format=json" \
ghcr.io/open-webui/open-webui:main
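You can verify the new variables took effect by inspecting the recreated container's environment:
# The search-related variables should show up here
sudo docker exec open-webui env | grep -iE 'searxng|web_search'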
Step 6: Configuring Cloudflare Tunnels
Cloudflare Tunnels allow you to securely expose your local services to the internet without opening firewall ports.
Option 1: Using Cloudflare Dashboard (Recommended)
- Go to Cloudflare Dashboard → Zero Trust → Networks → Tunnels
- Create a new tunnel or select an existing one
- Add a Public Hostname:
  - Subdomain: chat
  - Domain: yourdomain.com
  - Service Type: HTTP
  - URL: localhost:8080
Option 2: Using CLI
# Install cloudflared (if not already installed)
curl -L --output cloudflared.deb https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb
sudo dpkg -i cloudflared.deb
# Authenticate with Cloudflare
cloudflared tunnel login
# Create a tunnel
cloudflared tunnel create my-ai-chat
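The create command prints a tunnel ID and writes a credentials JSON file under ~/.cloudflared/; that's the file referenced by the config below. If Cloudflare manages your DNS, you can also point your hostname at the tunnel straight from the CLI (using chat.yourdomain.com to match the config below):
# Create the DNS record that routes your hostname to the tunnel
cloudflared tunnel route dns my-ai-chat chat.yourdomain.com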
# Configure the tunnel
cat > ~/.cloudflared/config.yml << EOF
tunnel: my-ai-chat
credentials-file: /home/$USER/.cloudflared/your-tunnel-id.json
ingress:
  - hostname: chat.yourdomain.com
    service: http://localhost:8080
  - service: http_status:404
EOF
# Run the tunnel
cloudflared tunnel run my-ai-chat
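Running the tunnel in the foreground is fine for testing, but for a permanent setup you'll want it to start on boot. cloudflared can install itself as a systemd service (depending on your cloudflared version, you may need to copy config.yml into /etc/cloudflared/ first):
# Install cloudflared as a system service and check that it started
sudo cloudflared service install
sudo systemctl status cloudflared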
Step 7: Enable Web Search in Open WebUI
- Access your chat interface at https://chat.yourdomain.com
- Create an account (first user becomes admin)
- Go to Settings → Admin Settings → Features
- Enable "Web Search" functionality
- Configure search settings to use your SearXNG instance
Step 8: Testing Your Setup
Your AI assistant now has powerful capabilities. Test it with these examples:
Basic Chat:
"Write a Python function to calculate fibonacci numbers"
Web Search Queries:
"What's the latest news about AI developments?"
"Current Bitcoin price?"
"Find recent updates about Python 3.12"
The AI will automatically search the web when it needs current information and provide synthesized responses.
Performance Optimization
RAM Usage by Model Size:
- 7B models: ~4-5 GB RAM
- 14B models: ~8-10 GB RAM
- 32B models: ~18-20 GB RAM
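These figures are rough; actual usage depends on quantization and context length. Recent Ollama versions let you check what a loaded model is really consuming:
# Show currently loaded models and their memory footprint
ollama ps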
Model Selection Tips:
- Use 7B models for quick responses and lower resource usage
- Use 14B models for better code quality and reasoning
- Switch between models based on your current needs
Security Considerations
- Authentication: Set up user authentication in Open WebUI
- Rate Limiting: Configure rate limits to prevent abuse
- Monitoring: Monitor resource usage and access logs
- Updates: Keep all components updated regularly (see the example below)
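Since the containers run from locally pulled images, updating is a manual pull-and-recreate. A sketch for Open WebUI (the open-webui volume preserves your data across recreations):
# Pull the latest image, remove the old container, then rerun the docker run command from Step 5
sudo docker pull ghcr.io/open-webui/open-webui:main
sudo docker stop open-webui && sudo docker rm open-webui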
Troubleshooting
Common Issues:
Models not showing in Open WebUI:
# Check that the Open WebUI container can reach the Ollama API
sudo docker exec open-webui curl http://127.0.0.1:11434/api/tags
Web search not working:
# Test SearXNG API
curl "http://localhost:8888/search?q=test&format=json"
Cloudflare Tunnel issues:
- Verify tunnel status in Cloudflare Dashboard
- Check local service is running on correct port
- Review tunnel configuration
Conclusion
You now have a powerful, private AI assistant with web search capabilities accessible from anywhere. This setup gives you:
- Complete privacy - all processing happens locally
- No API costs - use your own hardware
- Custom models - choose models that fit your needs
- Web search - access to current information
- Global access - available from any device, anywhere
The combination of Ollama, Open WebUI, SearXNG, and Cloudflare Tunnels creates a production-ready AI system that rivals commercial offerings while maintaining complete control over your data and interactions.
Next Steps:
- Experiment with different models for various tasks
- Set up automated backups of your conversations
- Explore custom functions and tools in Open WebUI
- Consider adding more specialized models for specific domains
Happy AI chatting! 🤖✨
This setup was tested on Ubuntu 22.04 with 32GB RAM. Adjust resource allocation based on your hardware specifications.