BigQuery Jobs

Create, monitor, and manage BigQuery jobs. Used to run load jobs that import data from Google Cloud Storage (GCS).

Operations

| Operation | Description |
|-----------|-------------|
| create | Create a load job. |
| get | Get job status. |
| cancel | Cancel a running job. |
| delete | Delete a job. |
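
The get, cancel, and delete operations act on an existing job, identified by job_id. As a minimal sketch, a cancel task might look like the following; the task name and the job ID value are hypothetical placeholders, and the remaining fields mirror the required fields listed under Fields.

```yaml
# Sketch only: cancels a previously created job.
- gcp_bigquery_job:
    name: cancel_load                  # hypothetical task name
    operation: cancel
    credentials_path: /etc/gcp/service-account.json
    project_id: my-project
    job_id: my-load-job-001            # hypothetical job ID; replace with a real one
```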

Configuration

- gcp_bigquery_job:
    name: load_from_gcs
    operation: create
    credentials_path: /etc/gcp/service-account.json
    project_id: my-project
    source_uris:
      - "gs://my-bucket/data/*.parquet"
    destination_table:
      project_id: my-project
      dataset_id: raw
      table_id: events
    source_format: parquet
    write_disposition: write_append
    autodetect: true

Fields

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| name | string | required | Task name. |
| operation | string | required | create, get, cancel, delete. |
| credentials_path | string | required | GCP service account credentials. |
| project_id | string | required | GCP project ID. |
| location | string | | BigQuery location. |
| source_uris | list/template | | GCS source URIs (for create). |
| destination_table | object | | Target table (project_id, dataset_id, table_id). |
| source_format | string | newline_delimited_json | parquet, csv, newline_delimited_json, avro. |
| write_disposition | string | write_append | write_append, write_truncate, write_empty. |
| create_disposition | string | create_if_needed | create_if_needed, create_never. |
| autodetect | bool | | Auto-detect schema from source. |
| schema | list | | Explicit schema (list of field definitions). |
| max_bad_records | int | | Max bad records before job fails. |
| job_id | string | | Job ID (for get, cancel, delete). Supports templating. |
| poll_interval | duration | 5s | Status check interval. |
| max_poll_duration | duration | 30m | Max time to wait for completion. |
| labels | map | | Job labels. |
| depends_on | list | | Upstream task names. |
| retry | object | | Retry configuration. |
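
To show how the optional fields combine, here is a sketch of a CSV load that overwrites an existing table. It uses only fields from the table above. The bucket path, label, and upstream task name are hypothetical, and the duration value 1h is an assumption based on the 5s and 30m defaults. The schema and retry fields are left out because their sub-fields are not documented here.

```yaml
# Sketch only: CSV load that replaces the contents of an existing table.
- gcp_bigquery_job:
    name: load_events_csv              # hypothetical task name
    operation: create
    credentials_path: /etc/gcp/service-account.json
    project_id: my-project
    location: US
    source_uris:
      - "gs://my-bucket/exports/*.csv" # hypothetical path
    destination_table:
      project_id: my-project
      dataset_id: raw
      table_id: events
    source_format: csv
    write_disposition: write_truncate  # overwrite existing rows
    create_disposition: create_never   # the table must already exist
    max_bad_records: 10
    poll_interval: 10s
    max_poll_duration: 1h              # assumed duration syntax
    labels:
      team: data-platform              # hypothetical label
    depends_on:
      - export_to_gcs                  # hypothetical upstream task
```

With create_never, the job is expected to fail if raw.events does not exist, which can act as a guard against loading into a mistyped table name.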