Streaming

The spicepy SDK supports streaming partial results as they become available.

This can be used to enable more efficient pipelining scenarios where processing each row of the result set can happen independently.

spicepy enables streaming through the use of the pyarrow Flight API.

The object returned from spicepy.Client.query() is a pyarrow.flight.FlightStreamReader.

>>> from spicepy import Client
>>> import os
>>> client = Client(os.environ["API_KEY"])
>>> rdr = client.query("SELECT * FROM eth.recent_blocks")
>>> rdr
<pyarrow._flight.FlightStreamReader object at 0x1059c9980>

Calling read_pandas() on the FlightStreamReader will wait for the stream to return all of the data before returning a pandas DataFrame.

To operate on partial results while the data is streaming, we will take advantage of the read_chunk() method on FlightStreamReader. This returns a FlightStreamChunk, which has a data attribute that is a RecordBatch. Once we have the RecordBatch, we can call to_pandas() on it to return the partial data as a pandas DataFrame. When the stream has ended, calling read_chunk() will raise a StopIteration exception that we can catch.
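
For example, a single chunk can be read and converted on its own. This is a minimal sketch reusing the eth.recent_blocks query from above; the printed row count is only for illustration:

from spicepy import Client
import os

client = Client(os.environ["API_KEY"])
reader = client.query("SELECT * FROM eth.recent_blocks")

# read_chunk() blocks until the next chunk of the stream is available
chunk = reader.read_chunk()

# chunk.data is a pyarrow.RecordBatch holding the partial results
batch = chunk.data

# convert only this chunk to a pandas DataFrame
partial_df = batch.to_pandas()
print(len(partial_df), "rows in the first chunk")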

In this example, we retrieve all 10,000 suppliers from the TPCH Suppliers table. This query retrieves all suppliers in a single call:

from spicepy import Client
import os

client = Client(os.environ["API_KEY"])
query = """
    SELECT s_suppkey, s_name
    FROM tpch.supplier
"""

reader = client.query(query)
suppliers = reader.read_pandas()

Alternatively, instead of waiting for the full result set, FlightStreamReader can read chunks of data as they become available with read_chunk(). Using the same query as above, the data can be processed chunk by chunk:

reader = client.query(query)

has_more = True
while has_more:
    try:
        # read the next chunk; blocks until it is available
        flight_batch = reader.read_chunk()
        # flight_batch.data is a pyarrow.RecordBatch of partial results
        record_batch = flight_batch.data
        # process_chunk is a user-defined function that handles each partial DataFrame
        process_chunk(record_batch.to_pandas())
    except StopIteration:
        # the stream has ended
        has_more = False

In contrast, the read_pandas() call in the first example is synchronous: it waits for all data to arrive before returning a single pandas DataFrame containing all 10,000 suppliers.
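
Building on the pipelining idea from the introduction, the chunk loop can be wrapped in a small generator so that downstream code consumes partial DataFrames as they arrive. This is a sketch under the same assumptions as the examples above; process the yielded DataFrames however your pipeline requires:

from spicepy import Client
import os

def stream_dataframes(reader):
    # yield one pandas DataFrame per chunk as it arrives
    while True:
        try:
            chunk = reader.read_chunk()
        except StopIteration:
            return
        yield chunk.data.to_pandas()

client = Client(os.environ["API_KEY"])
reader = client.query("SELECT s_suppkey, s_name FROM tpch.supplier")

total = 0
for partial_df in stream_dataframes(reader):
    # each partial DataFrame can be processed independently
    total += len(partial_df)
print(f"processed {total} suppliers")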
