Skip to main content
Skip to main content

ArrowFlight table engine

The ArrowFlight table engine enables ClickHouse to read from and write to remote datasets via the Apache Arrow Flight protocol. This integration allows ClickHouse to interact with external Flight-enabled servers in a columnar Arrow format with high performance.

Creating a Table

CREATE TABLE [IF NOT EXISTS] [db.]table_name (name1 [type1], name2 [type2], ...)
    ENGINE = ArrowFlight('host:port', 'dataset_name' [, 'username', 'password']);

Engine Parameters

  • host:port — Address of the remote Arrow Flight server. If the port is omitted, the default port 8815 is used. String.
  • dataset_name — Identifier of the dataset on the Flight server (used as a PATH descriptor or in a SELECT * query depending on the arrow_flight_request_descriptor_type setting). String.
  • username — Username for basic HTTP authentication. String.
  • password — Password for basic HTTP authentication. String.

If username and password are omitted, authentication is not used (this works only if the Arrow Flight server allows unauthenticated access).

The column list is optional — if omitted, the schema is inferred from the remote Arrow Flight server via GetSchema.

Named Collections

The engine supports named collections for storing connection parameters:

CREATE TABLE remote_flight_data
    ENGINE = ArrowFlight(named_collection_name);

Named collection parameters:

ParameterRequiredDefaultDescription
host or hostnameNo""Server hostname.
portYesServer port.
datasetYesDataset name or descriptor.
use_basic_authenticationNotrueEnable basic authentication.
user or usernameIf auth enabledUsername for authentication.
passwordNo""Password for authentication.
enable_sslNofalseEnable TLS encryption.
ssl_caNo""Path to the CA certificate file for TLS verification.
ssl_override_hostnameNo""Override the hostname checked during TLS verification.

Settings

  • arrow_flight_request_descriptor_type — Controls how the dataset name is sent to the Flight server. Possible values: path (default, sends as a PATH descriptor) or command (sends as a CMD descriptor with SELECT * FROM <dataset>). Use command for Flight servers that expect SQL commands (e.g., Dremio).

Usage Example

Reading data from a remote Arrow Flight server:

CREATE TABLE remote_flight_data
(
    id UInt32,
    name String,
    value Float64
) ENGINE = ArrowFlight('127.0.0.1:9005', 'sample_dataset');

SELECT * FROM remote_flight_data ORDER BY id;
┌─id─┬─name────┬─value─┐
│  1 │ foo     │ 42.1  │
│  2 │ bar     │ 13.3  │
│  3 │ baz     │ 77.0  │
└────┴─────────┴───────┘

Inserting data into a remote Arrow Flight server:

INSERT INTO remote_flight_data VALUES (4, 'qux', 99.9);

Notes

  • If columns are specified in the CREATE TABLE statement, they must match the schema returned by the Flight server.
  • If columns are omitted, the schema is inferred automatically from the remote server.
  • Both reading (SELECT) and writing (INSERT) are supported.
  • The arrow_flight_request_descriptor_type setting controls whether the dataset name is sent as a PATH descriptor or as a CMD descriptor wrapping a SELECT * query.

See Also