Hugging Face
Instructions for using machine learning models hosted on Hugging Face with Spice.
To use a model hosted on Hugging Face, specify its huggingface.co path in the from field and, when needed, the files to include.
Configuration
from
The from key takes the form huggingface:model_path. Two common examples of from key configuration:
- huggingface:username/modelname: Implies the latest version of modelname hosted by username.
- huggingface:huggingface.co/username/modelname:revision: Specifies a particular revision of modelname by username, including the optional domain.
The from key must match the following regular expression:
\A(huggingface:)(huggingface\.co\/)?(?<org>[\w\-]+)\/(?<model>[\w\-]+)(:(?<revision>[\w\d\-\.]+))?\z
The from key consists of five components:
- Prefix: The value must start with huggingface:.
- Domain (Optional): Optionally includes huggingface.co/ immediately after the prefix. No other Hugging Face-compatible services are currently supported.
- Organization/User: The Hugging Face organization or user (org).
- Model Name: After a /, the model name (model).
- Revision (Optional): A colon (:) followed by the git-like revision identifier (revision).
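As a rough illustration of how the five components map onto the pattern, here is a minimal Python sketch. It is not Spice's own parser: parse_from is a hypothetical helper, and the pattern is a translation of the one above into Python's syntax ((?P<name>...) for named groups, \Z instead of \z).

```python
import re

# Python translation of the documented pattern (an assumption, not Spice's code):
# named groups are written (?P<name>...) and end-of-string is \Z.
FROM_PATTERN = re.compile(
    r"\A(huggingface:)(huggingface\.co/)?"
    r"(?P<org>[\w\-]+)/(?P<model>[\w\-]+)"
    r"(:(?P<revision>[\w\d\-.]+))?\Z"
)


def parse_from(value):
    """Hypothetical helper: split a `from` value into components, or return None."""
    match = FROM_PATTERN.match(value)
    if match is None:
        return None
    return {
        "has_domain": match.group(2) is not None,  # optional huggingface.co/ part
        "org": match.group("org"),
        "model": match.group("model"),
        "revision": match.group("revision"),  # None when no :revision is given
    }


print(parse_from("huggingface:huggingface.co/spiceai/darts:latest"))
print(parse_from("huggingface:spiceai/darts"))
print(parse_from("spiceai/darts"))  # missing prefix, so no match
```

Note that the org and model groups in the pattern as written here do not admit a "." character, so a dotted model name would not match this sketch; treat it as an illustration of the pattern rather than a statement of what Spice accepts.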
name
The model name. This is used as the model ID within Spice and in Spice's endpoints (for example, https://data.spiceai.io/v1/models). It can be set to the same value as the model ID in the from field.
params
- hf_token: The Hugging Face access token.
- model_type: The architecture to load the model as. Supported values: mistral, gemma, mixtral, llama, phi2, phi3, qwen2, gemma2, starcoder2, phi3.5moe, deepseekv2, deepseekv3.
- tools: Which tools should be made available to the model. Set to auto to use all available tools.
- system_prompt: An additional system prompt used for all chat completions to this model.
files
The specific file paths to load from the Hugging Face model. For example, GGUF models require an explicit file path, while other formats (e.g. .safetensors) are inferred.
Example
models:
  - from: huggingface:huggingface.co/lmstudio-community/Qwen2.5-Coder-3B-Instruct-GGUF
    name: sloth-gguf
    files:
      - path: Qwen2.5-Coder-3B-Instruct-Q3_K_L.gguf
Access Tokens
Access tokens can be provided for Hugging Face models in two ways:
- In the Hugging Face token cache (~/.cache/huggingface/token). This is the default.
- Via model params, as in the following configuration:
models:
  - name: llama_3.2_1B
    from: huggingface:huggingface.co/meta-llama/Llama-3.2-1B
    params:
      hf_token: ${ secrets:HF_TOKEN }
Examples
Load an ML model to predict taxi trip outcomes
models:
  - from: huggingface:huggingface.co/spiceai/darts:latest
    name: hf_model
    files:
      - path: model.onnx
    datasets:
      - taxi_trips
Load an LLM to generate text
models:
  - from: huggingface:huggingface.co/microsoft/Phi-3.5-mini-instruct
    name: phi
Load a private model
models:
  - name: llama_3.2_1B
    from: huggingface:huggingface.co/meta-llama/Llama-3.2-1B
    params:
      hf_token: ${ secrets:HF_TOKEN }
For more details on authentication, see Access Tokens.
Limitations
- The throughput, concurrency, and latency of a locally hosted model vary with the underlying hardware and the model size. Spice supports Apple Metal and CUDA for accelerated inference.
- ML models are currently supported only in the ONNX file format.