Feat/bigquery datasource [ part - 1 ] #115

Merged
anoop-narang merged 9 commits into main from feat/bigquery-datasource
Feb 10, 2026
Conversation


@anoop-narang anoop-narang commented Feb 10, 2026

Add BigQuery as a native datasource

Adds BigQuery support as a native datasource, allowing users to connect to GCP BigQuery
projects, discover tables, and query them through DataFusion.

What's included

  • BigQuery datasource implementation — connects via service account credentials, discovers
    tables from INFORMATION_SCHEMA, and fetches data using BigQuery Jobs API with pagination and
    batched Arrow writes
  • Configurable region — supports cross-dataset discovery with a region parameter (defaults to
    "us"), or scoped discovery when a specific dataset is provided
  • Inline credential support — credentials_json in the connection config is automatically
    stored as a secret and linked to the connection, matching the existing flow for Postgres
    passwords

Connection API

  {
      "name": "my_bq",
      "source_type": "bigquery",
      "config": {
          "project_id": "my-gcp-project",
          "credentials_json": "{...service account JSON...}",
          "region": "US",
          "dataset": "my_dataset"
      }
  }

Alternatively, credentials can be pre-created as a secret and referenced via secret_name or
secret_id.
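
As a rough sketch, the connection config above might map onto a struct like the following (hypothetical names and shape; the actual definition lives in the datasource code). It illustrates the region behavior from the description: defaulting to "us" when the field is omitted, and optionally scoping discovery to a single dataset:

```rust
// Hypothetical mirror of the BigQuery connection config shown above.
// Field names are taken from the JSON payload; everything else is assumed.
struct BigQueryConfig {
    project_id: String,
    credentials_json: Option<String>,
    region: Option<String>,
    dataset: Option<String>,
}

impl BigQueryConfig {
    // Region defaults to "us" when omitted, per the PR description.
    fn effective_region(&self) -> String {
        self.region.clone().unwrap_or_else(|| "us".to_string())
    }

    // Discovery is scoped to a single dataset when one is provided,
    // otherwise it is cross-dataset for the whole region.
    fn is_scoped(&self) -> bool {
        self.dataset.is_some()
    }
}
```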

Add BigQuery as a native data source with table discovery and data
fetching via the gcp-bigquery-client crate. Includes Source enum
variant, credential support, and integration into the NativeFetcher.

Replace hardcoded "region-us" with a configurable region field on the
BigQuery source config, defaulting to "us" when omitted.

Paginate through all result pages via get_query_results instead of only
reading the first page. Buffer rows and flush in 10k-row batches to
bound memory usage, matching the postgres/mysql fetcher pattern.
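
The batched-flush pattern this commit describes can be sketched as follows (assumed names; the real fetcher converts buffered rows into Arrow record batches rather than strings, and pages via get_query_results):

```rust
// Sketch of the 10k-row batched flush pattern, matching the
// postgres/mysql fetcher behavior described in the commit message.
const BATCH_SIZE: usize = 10_000;

struct BatchWriter {
    buffer: Vec<String>,
    flushed_batches: usize,
}

impl BatchWriter {
    fn new() -> Self {
        Self { buffer: Vec::new(), flushed_batches: 0 }
    }

    // Buffer a row; flush once the buffer reaches BATCH_SIZE so memory
    // stays bounded no matter how many result pages the Jobs API returns.
    fn push(&mut self, row: String) {
        self.buffer.push(row);
        if self.buffer.len() >= BATCH_SIZE {
            self.flush();
        }
    }

    // Flush any remaining rows (called again after the last page).
    // In the real fetcher this would build and emit an Arrow RecordBatch.
    fn flush(&mut self) {
        if !self.buffer.is_empty() {
            self.flushed_batches += 1;
            self.buffer.clear();
        }
    }
}
```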

BigQuery returns ARRAY, STRUCT, and JSON columns as non-string JSON
values (arrays, objects). Previously these were silently dropped as
nulls, causing Arrow validation errors on non-nullable columns. Now
they are serialized to their JSON string representation.
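
The fix described in this commit can be illustrated with a minimal, self-contained sketch. The Cell enum below is a stand-in for the JSON value type the client crate actually returns; only the mapping logic (NULL stays None, plain strings pass through, structured values get serialized) reflects the commit message:

```rust
// Hypothetical stand-in for the values BigQuery can return in a cell.
#[derive(Debug)]
enum Cell {
    Null,
    Str(String),
    Number(f64),
    Array(Vec<Cell>),
}

// Map a cell to the string stored in the Arrow column. Previously,
// non-string values (arrays, objects) were dropped as None, tripping
// Arrow validation on non-nullable columns; now they are serialized.
fn cell_to_arrow_string(cell: &Cell) -> Option<String> {
    match cell {
        Cell::Null => None,
        Cell::Str(s) => Some(s.clone()),
        Cell::Number(n) => Some(n.to_string()),
        Cell::Array(items) => {
            let parts: Vec<String> = items
                .iter()
                .map(|c| cell_to_arrow_string(c).unwrap_or_else(|| "null".into()))
                .collect();
            Some(format!("[{}]", parts.join(",")))
        }
    }
}
```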
@anoop-narang anoop-narang marked this pull request as ready for review February 10, 2026 14:30
@anoop-narang
Contributor Author

thread 'test_update_table_sync_cache_invalidation' (3845) panicked at tests/caching_catalog_tests.rs:22:10:
Failed to start Redis container: Client(PullImage { descriptor: "redis:7-alpine", err: DockerResponseServerError { status_code: 500, message: "toomanyrequests: You have reached your unauthenticated pull rate limit. https://www.docker.com/increase-rate-limit" } })

@anoop-narang changed the title from "Feat/bigquery datasource" to "Feat/bigquery datasource part - 1" Feb 10, 2026
@anoop-narang changed the title from "Feat/bigquery datasource part - 1" to "Feat/bigquery datasource [ part - 1 ]" Feb 10, 2026
The connection handler only recognized password, token, and bearer_token
as inline credential fields. BigQuery's credentials_json was not being
extracted, stored as a secret, or linked to the connection.
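
A minimal sketch of that fix (field names are taken from the commit message; the handler's actual structure is assumed):

```rust
// Config keys treated as inline credentials: extracted from the
// connection config, stored as a secret, and linked to the connection.
// "credentials_json" is the addition for BigQuery.
const INLINE_CREDENTIAL_FIELDS: [&str; 4] =
    ["password", "token", "bearer_token", "credentials_json"];

fn is_inline_credential(field: &str) -> bool {
    INLINE_CREDENTIAL_FIELDS.contains(&field)
}
```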
@anoop-narang anoop-narang merged commit ab01651 into main Feb 10, 2026
8 checks passed
@anoop-narang anoop-narang deleted the feat/bigquery-datasource branch February 10, 2026 17:43