BigQuery Query

Runs SQL queries against Google BigQuery. Returns results as Arrow RecordBatch.

Configuration

- gcp_bigquery_query:
    name: get_orders
    credentials_path: /etc/gcp/service-account.json
    project_id: my-project
    query: "SELECT * FROM `my-project.dataset.orders` WHERE date = @date"
    parameters:
      date: "{{event.data.date}}"

Fields

| Field | Type | Default | Description |
|---|---|---|---|
| name | string | required | Task name. |
| credentials_path | string | required | GCP service account credentials. |
| project_id | string | required | GCP project ID (data project). |
| job_project_id | string | | GCP project ID for billing (if different). |
| query | string/resource | required | SQL query. Supports templating and resource files. |
| parameters | map | | Named query parameters. |
| location | string | | BigQuery location (e.g., US, EU). |
| max_results | int | | Max rows per page. |
| timeout | duration | 10s | Query timeout. |
| use_query_cache | bool | true | Use BigQuery query cache. |
| use_legacy_sql | bool | false | Use legacy SQL syntax. |
| default_dataset | string | | Default dataset for unqualified table names. |
| labels | map | | Job labels. |
| depends_on | list | | Upstream task names. |
| retry | object | | Retry configuration. |
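
Example: Query with optional fields

The optional fields listed above can be combined in one task. The following is a minimal sketch, not a verified configuration: the task name, billing project, label, and all values are hypothetical, and the exact value format expected by default_dataset and timeout is assumed from the field descriptions and the 10s default. The retry field is omitted because its structure is not described here.

```yaml
- gcp_bigquery_query:
    name: eu_orders                       # hypothetical task name
    credentials_path: /etc/gcp/service-account.json
    project_id: my-project                # project that holds the data
    job_project_id: my-billing-project    # hypothetical; project billed for the query job
    location: EU                          # run the job in the EU location
    default_dataset: dataset              # assumed format; resolves the unqualified table `orders`
    query: "SELECT * FROM orders WHERE date = @date"
    parameters:
      date: "{{event.data.date}}"         # bound to the named parameter @date
    timeout: 30s                          # overrides the 10s default
    use_query_cache: false                # force a fresh run instead of cached results
    labels:
      team: analytics                     # hypothetical job label
    depends_on:
      - get_orders                        # run after the get_orders task
```

Each key under parameters is expected to match an @name placeholder in the query, as in the basic configuration, so date here supplies the value for @date.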

Example: Query with resource file

- gcp_bigquery_query:
    name: daily_report
    credentials_path: /etc/gcp/service-account.json
    project_id: my-project
    query:
      resource: queries/daily_report.sql
    parameters:
      start_date: "{{event.data.start}}"
      end_date: "{{event.data.end}}"