13 changes: 8 additions & 5 deletions docs/admin_client_guide.md
@@ -354,12 +354,12 @@ The schema client validates SQL queries and returns their output schemas without
```python
# Validate a query and get its schema
schema_response = client.schema.get_output_schema(
-    sql_query='SELECT block_num, hash, timestamp FROM eth.blocks WHERE block_num > 1000000',
-    is_sql_dataset=True
+    tables={'query_analysis': 'SELECT block_num, hash, timestamp FROM eth.blocks WHERE block_num > 1000000'},
+    dependencies={'eth': '_/eth_firehose@1.0.0'}
)

# Inspect the Arrow schema
-print(schema_response.schema)
+print(schema_response.schemas['query_analysis'].schema)
```

This is particularly useful for:
@@ -651,8 +651,11 @@ with Client(query_url=..., admin_url=..., auth_token=...) as client:

```python
# Validate before registration
-schema = client.schema.get_output_schema(sql_query, True)
-print(f"Query will produce {len(schema.schema['fields'])} columns")
+response = client.schema.get_output_schema(
+    tables={'t': sql_query},
+    dependencies={...}
+)
+print(f"Query will produce {len(response.schemas['t'].schema['fields'])} columns")
```
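
Callers that previously passed a single SQL string can keep a one-query call shape with a thin wrapper. The sketch below is only an illustration: `single_query_schema` and its default table name `'query'` are hypothetical and not part of the client, and the stub classes stand in for a real `Client` so the snippet runs on its own. It relies only on the `tables`/`dependencies` keywords and the `response.schemas[name].schema` shape shown in this diff.

```python
from types import SimpleNamespace


def single_query_schema(client, sql_query, dependencies=None, table_name="query"):
    """Hypothetical helper: validate one SQL string and return its Arrow schema dict."""
    response = client.schema.get_output_schema(
        tables={table_name: sql_query},
        dependencies=dependencies,
    )
    return response.schemas[table_name].schema


class _StubSchemaClient:
    """Stand-in for client.schema (assumption): returns a canned one-column schema per table."""

    def get_output_schema(self, tables=None, dependencies=None, functions=None):
        schemas = {
            name: SimpleNamespace(schema={"fields": [{"name": "block_num"}]}, networks=[])
            for name in (tables or {})
        }
        return SimpleNamespace(schemas=schemas)


stub_client = SimpleNamespace(schema=_StubSchemaClient())
schema = single_query_schema(
    stub_client,
    "SELECT block_num FROM eth.blocks",
    dependencies={"eth": "_/eth_firehose@1.0.0"},
)
print(f"Query will produce {len(schema['fields'])} columns")
```

With a real client, only the `single_query_schema` function would be needed; the stub exists purely to make the example self-contained.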

### 3. Version Your Datasets
49 changes: 33 additions & 16 deletions docs/api/client_api.md
@@ -598,17 +598,21 @@ Validate SQL query and get its output Arrow schema without executing it.

```python
get_output_schema(
-    sql_query: str,
-    is_sql_dataset: bool = True
-) -> models.OutputSchemaResponse
+    tables: Optional[dict[str, str]] = None,
+    dependencies: Optional[dict[str, str]] = None,
+    functions: Optional[dict[str, Any]] = None
+) -> models.SchemaResponse
```

**Parameters:**

-- `sql_query` (str): SQL query to analyze.
-- `is_sql_dataset` (bool, optional): Whether this is for a SQL dataset. Default: True.
+| Parameter | Type | Default | Description |
+| :--- | :--- | :--- | :--- |
+| `tables` | `dict[str, str]` | `None` | Optional map of table names to SQL queries |
+| `dependencies` | `dict[str, str]` | `None` | Optional map of alias -> dataset reference |
+| `functions` | `dict[str, Any]` | `None` | Optional map of function definitions |

-**Returns:** `OutputSchemaResponse` with Arrow schema.
+**Returns:** `SchemaResponse` containing schemas for all requested tables.

**Raises:**

@@ -619,11 +623,11 @@ get_output_schema(

```python
response = client.schema.get_output_schema(
-    'SELECT block_num, hash FROM eth.blocks WHERE block_num > 1000000',
-    is_sql_dataset=True
+    tables={'my_table': 'SELECT block_num FROM eth.blocks'},
+    dependencies={'eth': '_/eth_firehose@1.0.0'}
)

-print(response.schema)  # Arrow schema dict
+print(response.schemas['my_table'].schema)  # Arrow schema dict
```

---
@@ -712,13 +716,22 @@ Response from deploying a dataset.

- `job_id` (int): ID of the created job

-#### `OutputSchemaResponse`
+#### `SchemaResponse`

-Response containing Arrow schema for a query.
+Response containing schemas for one or more tables.

**Fields:**

+- `schemas` (dict[str, TableSchemaWithNetworks]): Map of table names to their schemas
+
+#### `TableSchemaWithNetworks`
+
+Response containing Arrow schema for a query and associated networks.
+
+**Fields:**
+
- `schema` (dict): Arrow schema dictionary
+- `networks` (list[str]): List of referenced networks
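
To illustrate how the two response models nest, here is a minimal sketch that walks a `SchemaResponse` and collects column names and networks per table. The dataclasses are local stand-ins that mirror only the fields listed above; they are not the library's actual model classes. The `summarize` helper, the `'name'` key inside each field entry, and the sample network `'mainnet'` are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class TableSchemaWithNetworks:
    """Stand-in for the documented model (assumption, not the real class)."""
    schema: dict[str, Any]
    networks: list[str] = field(default_factory=list)


@dataclass
class SchemaResponse:
    """Stand-in for the documented model (assumption, not the real class)."""
    schemas: dict[str, TableSchemaWithNetworks]


def summarize(response):
    """Hypothetical helper: map each table name to (column names, networks)."""
    summary = {}
    for table_name, table in response.schemas.items():
        # Assumes each entry in schema['fields'] is a dict with a 'name' key.
        columns = [f["name"] for f in table.schema.get("fields", [])]
        summary[table_name] = (columns, list(table.networks))
    return summary


response = SchemaResponse(
    schemas={
        "my_table": TableSchemaWithNetworks(
            schema={"fields": [{"name": "block_num", "type": "uint64"}]},
            networks=["mainnet"],
        )
    }
)
print(summarize(response))
```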

### Request Models

@@ -733,14 +746,15 @@ Request to register a dataset.
- `version` (str, optional): Version string
- `manifest` (dict): Dataset manifest

-#### `OutputSchemaRequest`
+#### `SchemaRequest`

-Request to get output schema for a query.
+Request for schema analysis with dependencies, tables, and functions.

**Fields:**

-- `sql_query` (str): SQL query
-- `is_sql_dataset` (bool): Whether this is for a SQL dataset
+- `dependencies` (dict[str, str], optional): External dataset dependencies
+- `tables` (dict[str, str], optional): Table definitions
+- `functions` (dict[str, Any], optional): User-defined functions
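
Since all three `SchemaRequest` fields are optional, a request body might be assembled roughly as follows. This is a sketch under assumptions: `build_schema_request` is a hypothetical helper, and omitting unset fields (rather than sending `null`) is a guess about the wire format that the diff does not confirm.

```python
import json


def build_schema_request(tables=None, dependencies=None, functions=None):
    """Hypothetical helper: build a SchemaRequest-shaped dict, skipping unset fields."""
    candidate = {
        "dependencies": dependencies,
        "tables": tables,
        "functions": functions,
    }
    # Assumption: fields left as None are dropped from the payload.
    return {key: value for key, value in candidate.items() if value is not None}


payload = build_schema_request(
    tables={"my_table": "SELECT block_num FROM eth.blocks"},
    dependencies={"eth": "_/eth_firehose@1.0.0"},
)
print(json.dumps(payload, indent=2))
```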

---

@@ -976,7 +990,10 @@ try:
print(f"Query returns {len(df)} rows")

# Validate schema
-schema = client.schema.get_output_schema(query.query, True)
+schema = client.schema.get_output_schema(
+    tables={'query': query.query},
+    dependencies={'eth': '_/eth_firehose@1.0.0'}
+)
-print(f"Schema: {schema.schema}")
+print(f"Schema: {schema.schemas['query'].schema}")

# Register and deploy