Hugging Face
Instructions for using machine learning models hosted on Hugging Face with Spice.

To use a model hosted on Hugging Face, specify the huggingface.co path in the `from` field and, when needed, the files to include.
Configuration
from

The `from` key takes the form `huggingface:model_path`. Below are two common examples of `from` key configurations:

- `huggingface:username/modelname`: implies the latest version of `modelname` hosted by `username`.
- `huggingface:huggingface.co/username/modelname:revision`: specifies a particular `revision` of `modelname` by `username`, including the optional domain.
The `from` key must match the following regex:
```
\A(huggingface:)(huggingface\.co\/)?(?<org>[\w\-]+)\/(?<model>[\w\-]+)(:(?<revision>[\w\d\-\.]+))?\z
```

The `from` key consists of five components:

1. Prefix: the value must start with `huggingface:`.
2. Domain (optional): `huggingface.co/` may appear immediately after the prefix. Currently no other Hugging Face-compatible services are supported.
3. Organization/User: the Hugging Face organization or user (`org`).
4. Model Name: after a `/`, the model name (`model`).
5. Revision (optional): a colon (`:`) followed by the Git-like revision identifier (`revision`).
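For illustration, here is a minimal sketch showing both forms side by side. It reuses the `meta-llama/Llama-3.2-1B` model from the examples below; the `main` revision is an assumption included only to illustrate the optional revision component.

```yaml
models:
  # Short form: no domain, latest version of the model.
  - from: huggingface:meta-llama/Llama-3.2-1B
    name: llama_short
  # Full form: optional huggingface.co/ domain plus an explicit revision.
  # "main" is illustrative; substitute the branch, tag, or commit you need.
  - from: huggingface:huggingface.co/meta-llama/Llama-3.2-1B:main
    name: llama_pinned
```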
name

The model name. This is used as the model ID within Spice and in Spice's endpoints (e.g. https://data.spiceai.io/v1/models). It can be set to the same value as the model ID in the `from` field.
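As a brief sketch, the entry below registers `microsoft/Phi-3.5-mini-instruct` (used again in the examples later on) under the ID `phi`, which is how the model is then addressed through Spice's endpoints:

```yaml
models:
  - from: huggingface:huggingface.co/microsoft/Phi-3.5-mini-instruct
    # "phi" becomes the model ID exposed by Spice's endpoints.
    name: phi
```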
params

| Param | Description | Default |
| --- | --- | --- |
| `hf_token` | The Hugging Face access token. | - |
| `model_type` | The architecture to load the model as. Supported values: `mistral`, `gemma`, `mixtral`, `llama`, `phi2`, `phi3`, `qwen2`, `gemma2`, `starcoder2`, `phi3.5moe`, `deepseekv2`, `deepseekv3`. | - |
| `tools` | Which tools should be made available to the model. Set to `auto` to use all available tools. | - |
| `system_prompt` | An additional system prompt used for all chat completions to this model. | - |
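As a sketch of how these params combine in a single model entry, the snippet below sets an access token, pins the architecture, and enables all tools; the `system_prompt` text is an illustrative placeholder:

```yaml
models:
  - from: huggingface:huggingface.co/meta-llama/Llama-3.2-1B
    name: llama_3.2_1B
    params:
      hf_token: ${ secrets:HF_TOKEN }   # see Access Tokens below
      model_type: llama                 # one of the supported architectures
      tools: auto                       # expose all available tools
      system_prompt: You are a concise, helpful assistant.  # placeholder
```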
files

The specific file path for the Hugging Face model. For example, GGUF model formats require a specific file path; other formats (e.g. `.safetensors`) are inferred.
Example

```yaml
models:
  - from: huggingface:huggingface.co/lmstudio-community/Qwen2.5-Coder-3B-Instruct-GGUF
    name: sloth-gguf
    files:
      - path: Qwen2.5-Coder-3B-Instruct-Q3_K_L.gguf
```

Access Tokens
Access tokens can be provided for Hugging Face models in two ways:

1. In the Hugging Face token cache (i.e. `~/.cache/huggingface/token`). This is the default.
2. Via model `params`, as shown below.
```yaml
models:
  - name: llama_3.2_1B
    from: huggingface:huggingface.co/meta-llama/Llama-3.2-1B
    params:
      hf_token: ${ secrets:HF_TOKEN }
```

Examples
Load an ML model to predict taxi trip outcomes
```yaml
models:
  - from: huggingface:huggingface.co/spiceai/darts:latest
    name: hf_model
    files:
      - path: model.onnx
    datasets:
      - taxi_trips
```

Load an LLM to generate text
```yaml
models:
  - from: huggingface:huggingface.co/microsoft/Phi-3.5-mini-instruct
    name: phi
```

Load a private model
```yaml
models:
  - name: llama_3.2_1B
    from: huggingface:huggingface.co/meta-llama/Llama-3.2-1B
    params:
      hf_token: ${ secrets:HF_TOKEN }
```

For more details on authentication, see the Access Tokens section above.
Limitations
- The throughput, concurrency, and latency of a locally hosted model will vary based on the underlying hardware and model size. Spice supports Apple Metal and CUDA for accelerated inference.
- ML models currently support only the ONNX file format.