Ollama (Local / Self-Hosted)

Ollama is typically used for local or self-hosted inference. In most cases it does not require third-party cloud API keys, but it does require a reachable service endpoint.

1. Deployment Modes

  • Local mode: install and run Ollama on your own device; the service listens on 127.0.0.1:11434 by default.
  • Remote mode: deploy Ollama on your own server and expose a network-reachable endpoint.
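The two modes differ only in which endpoint the client talks to. As a minimal sketch, a client could resolve the endpoint from the `OLLAMA_HOST` environment variable and fall back to the local default; `resolve_endpoint` is a hypothetical helper written for this page, not part of any SDK:

```python
import os

DEFAULT_ENDPOINT = "http://127.0.0.1:11434"  # Ollama's local default address

def resolve_endpoint() -> str:
    """Pick the service endpoint: remote if OLLAMA_HOST is set, local otherwise.

    Hypothetical helper for illustration only.
    """
    host = os.environ.get("OLLAMA_HOST", "").strip()
    if not host:
        return DEFAULT_ENDPOINT            # local mode
    if not host.startswith(("http://", "https://")):
        host = "http://" + host            # accept bare host:port values
    return host.rstrip("/")
```

In local mode the variable is simply left unset; in remote mode it points at your server.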

2. Pre-checks

  • The target model has been pulled on the server (e.g. via `ollama pull`).
  • The endpoint address, port, and network policy (firewall rules, reverse proxy) are configured correctly.
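Both pre-checks can be probed in one call against Ollama's `/api/tags` endpoint, which lists the models pulled on that server. A sketch using only the standard library; `list_pulled_models` is an illustrative name, not an SDK function:

```python
import json
import urllib.error
import urllib.request

def list_pulled_models(endpoint: str, timeout: float = 3.0):
    """Return model names reported by the server, or None if unreachable.

    Queries Ollama's /api/tags endpoint, which lists locally pulled models.
    Illustrative helper, not part of any SDK.
    """
    try:
        with urllib.request.urlopen(endpoint.rstrip("/") + "/api/tags",
                                    timeout=timeout) as resp:
            data = json.load(resp)
        return [m["name"] for m in data.get("models", [])]
    except (urllib.error.URLError, OSError, ValueError):
        return None  # endpoint unreachable, or response not understood
```

A `None` result points at the endpoint/port/network pre-check; a list that lacks your model points at a pull that has not been done yet.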

3. Configure in Mask

  1. Select Ollama as the provider.
  2. Fill in the local or remote service endpoint (e.g. http://127.0.0.1:11434).
  3. Choose an installed model and run a test.
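The test in step 3 can also be reproduced by hand with a single non-streaming request to Ollama's `/api/generate` endpoint. The sketch below only builds the request; the model name `llama3` and the helper name are assumptions for illustration:

```python
import json
import urllib.request

def build_test_request(endpoint: str, model: str) -> urllib.request.Request:
    """Build a minimal non-streaming /api/generate request for a smoke test.

    Illustrative helper; the prompt text is arbitrary.
    """
    payload = {"model": model, "prompt": "Say hello.", "stream": False}
    return urllib.request.Request(
        endpoint.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending it requires a running server with the model pulled, e.g.:
# with urllib.request.urlopen(
#         build_test_request("http://127.0.0.1:11434", "llama3")) as resp:
#     print(json.load(resp)["response"])
```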

4. Common Issues

  • Connection failed: endpoint unreachable, firewall, or closed ports.
  • Model not found: model not downloaded on server side.
  • Slow response: insufficient local compute or excessive concurrency.
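The first two issues can usually be told apart programmatically: a transport error means the endpoint is unreachable, while a successful `/api/tags` response that lacks the model means it was never pulled. A sketch, with made-up category strings and a hypothetical `diagnose` helper:

```python
import json
import urllib.error
import urllib.request

def diagnose(endpoint: str, model: str, timeout: float = 3.0) -> str:
    """Classify the two most common failures. Illustrative helper only."""
    try:
        with urllib.request.urlopen(endpoint.rstrip("/") + "/api/tags",
                                    timeout=timeout) as resp:
            names = [m["name"] for m in json.load(resp).get("models", [])]
    except (urllib.error.URLError, OSError):
        return "connection-failed"  # unreachable endpoint, firewall, closed port
    # Tags are reported as name:tag (e.g. "llama3:latest"), so match the prefix.
    if not any(n == model or n.startswith(model + ":") for n in names):
        return "model-not-found"    # fix on the server: `ollama pull <model>`
    return "ok"
```

Slow responses have no such quick signature; they generally call for checking server load, hardware capacity, and concurrent request counts.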

5. Privacy Note

  • Local Ollama is often the easiest way to keep data on-device.
  • If using remote self-hosted nodes, data is sent to your server; you are responsible for network and storage security.