
The Era of AI: Cloud vs. Local AI

Cristian Jay Duque

Chief Content Officer @ IOL Inc.

May 27, 2024

As we stand on the brink of the AI era, it's fascinating to observe how AI is integrating seamlessly into various aspects of our lives—whether at work, school, home, or even outdoors. However, most AI applications today rely on cloud computing, which provides on-demand access to computing resources, particularly for data storage and processing power. This model has revolutionized technology use by offering scalable and powerful computational resources. Yet, it comes with its drawbacks: dependency on a reliable internet connection, potential latency issues, and concerns over data privacy. 

The Rise of Local AI 

Imagine accessing all the AI capabilities directly on your local machine without relying on the internet. Enter local AI, also known as edge AI. This approach involves running AI algorithms on the device itself, offering significant advantages like lower latency, enhanced privacy, and reduced dependence on constant internet connectivity. Local AI shines in scenarios with poor or intermittent connectivity and in applications requiring real-time processing, such as self-driving cars, medical equipment, and industrial control systems. By keeping data processing local, it also enhances data security by minimizing the need to transmit sensitive information over the internet. 

With advancements in technology, we are likely to see a hybrid model where local AI and cloud-based systems work together, creating a flexible and robust AI landscape. This integration will enable individuals and businesses to harness the full potential of AI, irrespective of internet availability or data privacy concerns. 

— 

Technical Stuff 

How to Install Your Local AI on Your Machine 

Hardware Requirements 

CPU 

  • Optimal Choice: 11th Gen Intel or Zen4-based AMD CPUs. 

  • Reason: AVX512 support accelerates AI model operations, and DDR5 support boosts performance through increased memory bandwidth. 

  • Key Consideration: CPU instruction-set support matters more than core count. 

RAM 

  • Minimum Requirement: 16GB. 

  • Purpose: Enough to run 7B-parameter models effectively; smaller models fit comfortably, while larger ones should be attempted with caution.

Disk Space 

  • Practical Minimum: 50GB. 

  • Usage: Accommodates the Docker container (around 2GB for Ollama-WebUI) and model files, covering the essentials without much extra buffer. 

GPU 

  • Recommendation: Not mandatory but beneficial for enhanced performance.

  • Model Inference: A GPU can significantly speed up inference, especially for large models. 

  • VRAM Requirements for FP16 Models:

    • 7B model: ~26GB VRAM 

  • Quantized Models: Handled efficiently with far less VRAM (a quick sizing check follows this list): 

    • 7B model: ~4GB VRAM 

    • 13B model: ~8GB VRAM 

    • 30B model: ~16GB VRAM 

    • 65B model: ~32GB VRAM 
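
As a quick sanity check, the quantized figures above work out to roughly 0.5GB of VRAM per billion parameters (4-bit weights) plus a little overhead for the runtime and context cache. If you have an Nvidia GPU, you can see how much VRAM you actually have to work with using nvidia-smi; the query flags below are standard, though output formatting may vary slightly with driver version.

# Total and currently used VRAM per GPU (Nvidia only)
nvidia-smi --query-gpu=name,memory.total,memory.used --format=csv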

Larger Models 

Running 13B+ and MoE Models: Recommended only on a Mac or on a GPU that is either large or handles quantized formats efficiently, due to their high memory and computational demands. 

Overall Recommendation 

For optimal performance with Ollama and Ollama-WebUI: 

  • CPU: Intel/AMD with AVX512 or DDR5 support. 

  • RAM: At least 16GB. 

  • Disk Space: Around 50GB. 

  • GPU: Recommended for performance boosts, especially with models at the 7B parameter level or higher. Large or quant-supporting GPUs are essential for running larger models efficiently. 

Software Requirements 

Operating System: Windows 10/11 (64-bit), a recent Linux distribution, or macOS.

Software: Docker (latest version); on Windows, WSL (available in the Microsoft Store).
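
On Windows, a minimal setup sketch (assuming Windows 10 version 2004+ or Windows 11, where `wsl --install` is available, and Docker Desktop with its default WSL 2 backend) looks like this:

# From an elevated PowerShell: install WSL with the default Ubuntu distribution
wsl --install

# After installing Docker Desktop, confirm Docker is reachable
docker --version
docker info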

How to Install 

Note: For certain Docker environments, additional configuration might be needed. If you encounter any connection issues, our detailed guide in the Open WebUI Documentation is ready to assist you.

Quick Start with Docker 

Warning: When using Docker to install Open WebUI, make sure to include `-v open-webui:/app/backend/data` in your Docker command. This step is crucial, as it ensures your database is properly mounted and prevents any loss of data. 
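
If you want to confirm the volume was created and see where Docker stores it, you can inspect it once the container has started; the volume name `open-webui` matches the `-v` flag above.

# List named volumes and inspect the one backing Open WebUI's data
docker volume ls
docker volume inspect open-webui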

Tip: If you wish to utilize Open WebUI with Ollama included or CUDA acceleration, we recommend utilizing our official tags with either `:cuda` or `:ollama`. To enable CUDA, you must install the Nvidia CUDA container toolkit on your Linux/WSL system. 
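
For reference, a minimal sketch of installing the toolkit on Ubuntu or WSL (assuming you have already added NVIDIA's apt repository as described in their documentation) looks like this:

# Install the toolkit, register it with Docker, and restart the daemon
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker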

— 

Installation with Default Configuration 

If Ollama is on Your Computer 

Use this command: 

docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
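
Before or after starting the container, you can confirm that Ollama is running on the host and that the container came up; the curl check below assumes Ollama's default port.

# Ollama answers with "Ollama is running" on its default port
curl http://127.0.0.1:11434

# Confirm the Open WebUI container is up
docker ps --filter name=open-webui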

If Ollama is on a Different Server 

To connect to Ollama on another server, change the `OLLAMA_BASE_URL` to the server's URL: 

docker run -d -p 3000:8080 -e OLLAMA_BASE_URL=https://example.com -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
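
For this to work, the remote Ollama instance has to listen on an address the container can reach, not just on localhost. A minimal sketch of doing that with a manual start (on systemd-based installs you would set the same variable in the service environment instead) is:

# On the remote server: bind Ollama to all interfaces (default port 11434)
OLLAMA_HOST=0.0.0.0 ollama serve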

To Run Open WebUI with Nvidia GPU Support 

Use this command: 

docker run -d -p 3000:8080 --gpus all --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:cuda
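
If the GPU is not picked up, it is worth verifying that Docker can see it at all. A common smoke test uses NVIDIA's base CUDA image (the tag below is only an example; any recent base tag works):

# Should print the same table as running nvidia-smi on the host
docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi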

Installation for OpenAI API Usage Only 

If you're only using the OpenAI API, use this command: 

docker run -d -p 3000:8080 -e OPENAI_API_KEY=your_secret_key -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
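
If you prefer not to put the key on the command line (where it can end up in your shell history), Docker's `--env-file` option works just as well; the file name below is only an example.

# .env contains a line like: OPENAI_API_KEY=your_secret_key
docker run -d -p 3000:8080 --env-file .env -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main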

Installing Open WebUI with Bundled Ollama Support 

This installation method uses a single container image that bundles Open WebUI with Ollama, allowing for a streamlined setup via a single command. Choose the appropriate command based on your hardware setup: 

With GPU Support 

Utilize GPU resources by running the following command: 

docker run -d -p 3000:8080 --gpus=all -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama 

For CPU Only 

If you're not using a GPU, use this command instead: 

docker run -d -p 3000:8080 -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama 

Both commands give you a hassle-free installation of Open WebUI and Ollama together, so you can get everything up and running swiftly. 

After installation, you can access Open WebUI at http://localhost:3000. Enjoy!
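
With the bundled image, Ollama runs inside the same container, so you can pull your first model through `docker exec` (this assumes the bundled image exposes the ollama CLI, and the `llama3` name below is just an example from the Ollama library):

# Download a model into the ollama volume; it then appears in the WebUI model list
docker exec -it open-webui ollama pull llama3
docker exec -it open-webui ollama list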

Other Installation Methods 

We offer various installation alternatives, including non-Docker native installation methods, Docker Compose, Kustomize, and Helm. Visit our [Open WebUI Documentation](https://docs.openwebui.io) or join our Discord community for comprehensive guidance. 

Troubleshooting 

Open WebUI: Server Connection Error 

If you're experiencing connection issues, it’s often due to the WebUI Docker container not being able to reach the Ollama server at `127.0.0.1:11434` (host.docker.internal:11434) inside the container. Use the `--network=host` flag in your Docker command to resolve this. Note that the port changes from `3000` to `8080`, resulting in the link: http://localhost:8080

Example Docker Command:

docker run -d --network=host -v open-webui:/app/backend/data -e OLLAMA_BASE_URL=http://127.0.0.1:11434 --name open-webui --restart always ghcr.io/open-webui/open-webui:main
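
To rule out Ollama itself, you can hit its API directly from the host before blaming container networking; `/api/tags` simply lists the locally installed models.

curl http://127.0.0.1:11434/api/tags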

Keeping Your Docker Installation Up-to-Date 

If you want to update your local Open WebUI container to the latest image, you can do it with Watchtower: 

docker run --rm --volume /var/run/docker.sock:/var/run/docker.sock containrrr/watchtower --run-once open-webui 
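
If you would rather update manually instead of using Watchtower, pulling the new image and recreating the container achieves the same result; your data is preserved because it lives in the `open-webui` volume.

docker pull ghcr.io/open-webui/open-webui:main
docker stop open-webui && docker rm open-webui
# Re-run the same docker run command you used for the original installation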

This setup can benefit communities and organizations with limited internet access. Enjoy!

CeeJay
