Reference

Fetching data

exception dap.api.DAPClientError

Bases: RuntimeError

class dap.api.DAPClient

Bases: object

Client proxy for the Data Access Platform (DAP) server-side API.

To invoke high-level functionality such as initializing and synchronizing a database or data warehouse, or low-level functionality such as triggering a snapshot or incremental query, you need to instantiate a client, which acts as a proxy to the DAP API.

class dap.api.AccessToken

Bases: object

A JWT access token. This object is immutable.

The access token counts as sensitive information not to be exposed (e.g. in logs).

is_expiring() → bool

Checks if the token is about to expire.

  • Returns: True if the token is about to expire.
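The reference does not say how "about to expire" is decided. As a rough illustration only, and not the library's implementation, the check could compare the `exp` claim in the JWT payload against the current time plus a safety margin. The function names and the 300-second margin below are assumptions made for this sketch.

```python
import base64
import json
import time
from typing import Optional


def jwt_expiry(token: str) -> float:
    """Return the `exp` claim (seconds since the epoch) of a JWT, without verifying it."""
    payload_b64 = token.split(".")[1]
    # base64url payloads omit padding; restore it before decoding
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(padded))
    return float(payload["exp"])


def is_expiring(token: str, margin_sec: float = 300.0, now: Optional[float] = None) -> bool:
    """True if the token expires within `margin_sec` seconds (hypothetical margin)."""
    current = time.time() if now is None else now
    return jwt_expiry(token) - current <= margin_sec
```

The sketch never prints the token, in line with the note above that access tokens are sensitive.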

class dap.api.DAPSession

Bases: object

Represents an authenticated session to DAP.

async close() → None

Closes the underlying network sockets.

async authenticate() → None

Authenticates with API key to receive a JWT.

async query_snapshot(namespace: str, table: str, query: SnapshotQuery) → TableJob | CompleteSnapshotJob | CompleteIncrementalJob | FailedJob

Starts a snapshot query.

async query_incremental(namespace: str, table: str, query: IncrementalQuery) → TableJob | CompleteSnapshotJob | CompleteIncrementalJob | FailedJob

Starts an incremental query.

async get_tables(namespace: str) → List[str]

Retrieves the list of tables available for querying.

  • Parameters: namespace – A namespace identifier such as canvas or mastery.

  • Returns: A list of tables available for querying in the given namespace.

async get_table_schema(namespace: str, table: str) → VersionedSchema

Retrieves the versioned schema of a table.

  • Parameters:

    • namespace – A namespace identifier such as canvas or mastery.

    • table – A table identifier such as submissions, quizzes, or users.

  • Returns: The schema of the table as exposed by DAP API.

async download_table_schema(namespace: str, table: str, output_directory: str) → None

Saves the schema as a JSON file into a local directory.

  • Parameters:

    • namespace – A namespace identifier such as canvas or mastery.

    • table – A table identifier such as submissions, quizzes, or users.

    • output_directory – Path to the directory to save the JSON file to.

async get_job(job_id: str) → TableJob | CompleteSnapshotJob | CompleteIncrementalJob | FailedJob

Retrieve a job by its identifier, including its current status.

async get_job_status(job_id: str) → JobStatus

Retrieve job status.

async get_objects(job_id: str) → List[Object]

Retrieve object IDs once the query is completed successfully.

async get_resources(objects: List[Object]) → Dict[str, Resource]

Retrieve URLs to data stored remotely.

async download_resources(resources: List[Resource], output_directory: str, decompress: bool = False) → List[str]

Save data stored remotely into a local directory.

  • Parameters:

    • resources – List of output resources to be downloaded.

    • output_directory – Path to the target directory to save downloaded files to.

    • decompress – If True, the file will be decompressed after downloading. Default is False.

  • Returns: A list of paths to files saved in the local file system.

async download_resource(resource: Resource, output_directory: str, decompress: bool = False) → str

Save a single remote file to a local directory.

  • Parameters:

    • resource – Resource to download.

    • output_directory – Path of the target directory to save the downloaded file.

    • decompress – If True, the file will be decompressed after downloading. Default is False.

  • Returns: A path of the file saved in the local file system.

async download_objects(objects: List[Object], output_directory: str, decompress: bool = False) → List[str]

Save data stored remotely into a local directory.

  • Parameters:

    • objects – List of output objects to be downloaded.

    • output_directory – Path to the target directory to save downloaded files to.

    • decompress – If True, the file will be decompressed after downloading. Default is False.

  • Returns: A list of paths to files saved in the local file system.

async download_object(object: Object, output_directory: str, decompress: bool = False) → str

Save a single remote file to a local directory.

  • Parameters:

    • object – Object to download.

    • output_directory – Path of the target directory to save the downloaded file.

    • decompress – If True, the file will be decompressed after downloading. Default is False.

  • Returns: A path of the file saved in the local file system.

async stream_resource(resource: Resource) → AsyncIterator[StreamReader]

Creates a stream reader for the given resource.

  • Parameters: resource – Resource to download.

  • Yields: An asynchronous generator that can be used with an asynchronous context manager.

  • Raises: DownloadError – Raised when the host returns an HTTP error response, and rejects the request.

Wait until a job terminates.

  • Parameters: job – A job that might be still running.

  • Returns: A job that has completed with success or terminated with failure.
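Waiting for a job usually amounts to polling its status until it reaches a terminal state. The sketch below shows that pattern in isolation. `wait_until_terminal`, `fetch_status`, and the polling interval are hypothetical names and values chosen for illustration. Only the terminal statuses 'complete' and 'failed' are taken from this reference.

```python
import asyncio
from typing import Awaitable, Callable

# Terminal statuses as named in this reference
TERMINAL_STATUSES = {"complete", "failed"}


async def wait_until_terminal(
    fetch_status: Callable[[str], Awaitable[str]],
    job_id: str,
    poll_interval: float = 5.0,
) -> str:
    """Poll `fetch_status` (a hypothetical callable) until the job terminates."""
    while True:
        status = await fetch_status(job_id)
        if status in TERMINAL_STATUSES:
            return status
        # Yield to the event loop between polls instead of busy-waiting
        await asyncio.sleep(poll_interval)
```

In real use, `fetch_status` would wrap a call such as `get_job_status`, converting the returned status to whatever representation the caller compares against.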

async execute_job(namespace: str, table: str, query: SnapshotQuery | IncrementalQuery) → TableJob | CompleteSnapshotJob | CompleteIncrementalJob | FailedJob

Start a query job and wait until it terminates.

async download_table_data(namespace: str, table: str, query: SnapshotQuery | IncrementalQuery, output_directory: str, decompress: bool = False) → DownloadTableDataResult

Executes a query job and downloads data to a local directory.

  • Parameters:

    • namespace – A namespace identifier such as canvas or mastery.

    • table – A table identifier such as submissions, quizzes, or users.

    • query – An object that encapsulates the parameters of the snapshot or incremental query to execute.

    • output_directory – Path to the directory to save downloaded files to.

    • decompress – If True, the file will be decompressed after downloading. Default is False.

  • Returns: Result of the query, including a list of paths to files saved in the local file system.

  • Raises: DAPClientError – Raised when the query returned an error or fetching data has failed.

async get_table_data(namespace: str, table: str, query: SnapshotQuery | IncrementalQuery) → GetTableDataResult

Executes a query job on a given table.

  • Parameters:

    • namespace – A namespace identifier such as canvas or mastery.

    • table – A table identifier such as submissions, quizzes, or users.

    • query – An object that encapsulates the parameters of the snapshot or incremental query to execute.

  • Returns: Result of the query, including metadata.

  • Raises: DAPClientError – Raised when the query returned an error or fetching data has failed.
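A minimal sketch of how `get_table_data` and `get_resources` might be combined, based only on the signatures listed above. It assumes `session` is an already authenticated `DAPSession` and `query` is a `SnapshotQuery` or `IncrementalQuery`; how these are constructed is not shown here. The helper name `fetch_resources` is hypothetical.

```python
from typing import Any, Dict


async def fetch_resources(session: Any, namespace: str, table: str, query: Any) -> Dict[str, Any]:
    """Run a query, then trade the resulting objects for pre-signed resources."""
    # Executes the job and waits for it; the result carries the generated objects
    result = await session.get_table_data(namespace, table, query)
    # Maps each object identifier to a resource holding a pre-signed URL
    return await session.get_resources(result.objects)
```

Because pre-signed URLs typically expire after about 15 minutes, it is probably safest to request resources shortly before downloading rather than caching them.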

exception dap.api.DownloadError

Bases: DAPClientError

Synchronizing a local database

class dap.replicator.sql.SQLReplicator

Bases: object

Encapsulates logic that replicates changes acquired from DAP API in a SQL database.

class dap.replicator.sql.SQLDrop

Bases: object

Encapsulates logic that drops a table from the SQL database.

async drop(namespace: str, table_name: str) → None

Drops the given database table.

Types

class dap.dap_types.URL

Bases: object

A Uniform Resource Locator (URL).

  • Parameters: url – The URL string.

class dap.dap_types.VersionedSchema

Bases: object

The state of the schema at a specific point in time.

Schemas are backwards compatible. They receive strictly monotonically increasing version numbers as schema evolution takes place.

  • Parameters:

    • schema – The JSON Schema object to validate against.

    • version – The version of the schema.

class dap.dap_types.Object

Bases: object

A reference to a binary or text object persisted in object storage, such as a CSV, JSON, or Parquet file.

The lifetime of the object depends on the operation that created it but typically lasts for 24 hours. Object identifiers can be traded for pre-signed URLs via an authenticated endpoint operation while the object exists.

  • Parameters: id – Uniquely identifies the object.

class dap.dap_types.Resource

Bases: object

A pre-signed URL to a binary or text object persisted in object storage, such as a CSV, JSON or Parquet file.

The lifetime of the pre-signed URL depends on the operation that created it but typically lasts for 15 minutes. No authentication is required to fetch the object via the pre-signed URL.

  • Parameters: url – URL to the object.

class dap.dap_types.JobStatus

Bases: Enum

Tracks the lifetime of a job from creation to termination (with success or failure).

isTerminal() → bool

Signals if a job has been terminated (with ‘complete’ or ‘failed’ status).

class dap.dap_types.TableJob

Bases: object

A data access job in progress.

  • Parameters:

    • id – Opaque unique identifier of the job.

    • status – The current status of the job.

    • expires_at – The time when the job will no longer be available.

class dap.dap_types.CompleteJob

Bases: TableJob

A data access job that has completed with success.

  • Parameters:

    • objects – The list of objects generated by the job.

    • schema_version – Version of the schema that records in the table conform to.

class dap.dap_types.CompleteSnapshotJob

Bases: CompleteJob

A snapshot query that has completed with success.

  • Parameters: at – Timestamp (in UTC) that identifies the table state. This can be used as a starting point for future incremental queries.

class dap.dap_types.CompleteIncrementalJob

Bases: CompleteJob

An incremental query that has completed with success.

  • Parameters:

    • since – Start timestamp (in UTC); only those records are returned that have been persisted since the specified date and time.

    • until – End timestamp (in UTC); only those records are returned that have been persisted before the specified date and time. This can be used as a starting point for future incremental queries.

class dap.dap_types.FailedJob

Bases: TableJob

A data access job that has terminated with failure.

  • Parameters: error – Provides more details on the error that occurred.

class dap.dap_types.TableList

Bases: object

A list of tables that exist in the organization domain.

class dap.dap_types.Format

Bases: Enum

Identifies the format of the data returned, e.g. TSV, CSV, JSON Lines, or Parquet.

Tab-separated values (TSV) is a simple tabular format in which each record (table row) occupies a single line.

  • Output always begins with a header row, which lists all metadata and data field names.

  • Fields (table columns) are delimited by tab characters.

  • Non-printable characters and special values are escaped with backslash (\).
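The TSV rules above can be illustrated with a small decoder. This is a sketch, not the library's parser. The escape table follows the common backslash sequences of the PostgreSQL COPY text format, including `\N` for NULL, and the function names are invented for this example.

```python
from typing import List, Optional

# Common backslash sequences of the PostgreSQL COPY text format
_ESCAPES = {"b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t", "v": "\v", "\\": "\\"}


def unescape_tsv_field(field: str) -> Optional[str]:
    """Decode one TSV field; the special value \\N becomes None."""
    if field == "\\N":
        return None
    out = []
    i = 0
    while i < len(field):
        ch = field[i]
        if ch == "\\" and i + 1 < len(field):
            nxt = field[i + 1]
            # Unknown sequences fall back to the escaped character itself
            out.append(_ESCAPES.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def parse_tsv_line(line: str) -> List[Optional[str]]:
    """Split a record on tab characters, then decode each field."""
    return [unescape_tsv_field(f) for f in line.rstrip("\n").split("\t")]
```

Splitting on raw tab characters before decoding is safe here because tabs inside values are escaped and so never appear literally within a field.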

Comma-separated values (CSV) output follows RFC 4180 with a few extensions:

  • Output always begins with a header row, which lists all metadata and data field names.

  • Strings are quoted with double quotes (") if they contain special characters such as the double quote itself, the comma delimiter, a newline, a carriage return, a tab character, etc., or if their string representation would be identical to a special value such as NULL.

  • Empty strings are always represented as "".

  • NULL values are represented with the unquoted literal string NULL.

  • Missing values are presented as an empty string (no characters between delimiters).

  • Each row has the same number of fields.
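Telling apart an empty string (`""`), a NULL (`NULL`) and a missing value (nothing between delimiters) requires knowing whether a field was quoted, which generic CSV readers often discard. The following hand-rolled sketch handles a single-line record under the rules listed above. The names `parse_csv_record` and `MISSING` are invented for illustration, and quoted fields that span several lines are out of scope.

```python
from typing import Any, List


class _Missing:
    def __repr__(self) -> str:
        return "MISSING"


MISSING = _Missing()  # sentinel for "no characters between delimiters"


def parse_csv_record(line: str) -> List[Any]:
    """Parse one CSV record into str, None (NULL) or MISSING values."""
    fields: List[Any] = []
    i, n = 0, len(line)
    while True:
        if i < n and line[i] == '"':
            # Quoted field: always a string, "" inside stands for one quote
            i += 1
            chars = []
            while i < n:
                if line[i] == '"':
                    if i + 1 < n and line[i + 1] == '"':
                        chars.append('"')
                        i += 2
                        continue
                    i += 1  # closing quote
                    break
                chars.append(line[i])
                i += 1
            fields.append("".join(chars))
        else:
            # Unquoted field: runs up to the next comma or end of line
            j = line.find(",", i)
            if j == -1:
                j = n
            raw = line[i:j]
            i = j
            if raw == "":
                fields.append(MISSING)
            elif raw == "NULL":
                fields.append(None)
            else:
                fields.append(raw)
        if i >= n:
            break
        i += 1  # skip the comma delimiter
    return fields
```

For example, the record `1,"",NULL,,"NULL"` would decode to the string `1`, an empty string, `None`, `MISSING`, and the literal string `NULL`.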

When the output data is represented in the JSON Lines format, each record (table row) occupies a single line. Each line is a JSON object, which can be validated against the corresponding JSON schema.

Properties with null values are omitted in JSON.
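Since null-valued properties are omitted, code that consumes JSON Lines output should not assume every key is present. A minimal reading sketch follows. The helper name `read_jsonl` is hypothetical, and the sample record in the usage note is made up.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def read_jsonl(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one dictionary per non-blank line of JSON Lines input."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)
```

Given a made-up record such as `{"key": {"id": 1}, "value": {"name": "a"}}`, a lookup like `record["value"].get("description")` yields `None` for an omitted property instead of raising `KeyError`.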

Parquet files are compatible with Spark version 3.0 and later.

TSV = 'tsv'

Tab-separated values, in compliance with PostgreSQL COPY.

CSV = 'csv'

Comma-separated values, as per RFC 4180.

JSONL = 'jsonl'

JSON lines format, with a single JSON object occupying each line.

Parquet = 'parquet'

Parquet format, as generated by Spark.

class dap.dap_types.Mode

Bases: Enum

Output generation mode controls how nested fixed-cardinality fields are expanded into columns.

Mode expanded lays out nested fixed-cardinality fields into several columns. Consider the following example for TSV:

    meta.ts meta.action key.id value.plain value.nested.sub1 value.nested.sub2 value.nested.sub3
    2023-10-23T01:02:03Z U 1 string 1 multi-\nline

Mode condensed keeps nested fields together. Observe how a nested field becomes a single JSON-valued field:

    meta.ts meta.action key.id value.plain value.nested
    2023-10-23T01:02:03Z U 1 string {"sub1": 1, "sub2": "multi-\\nline"}

In case both JSON and the output format (e.g. CSV or TSV) define escaping rules, they are applied consecutively. This is why there are multiple backslash characters in the example above: JSON escapes a newline character as \n, and then TSV escapes the backslash character to make the sequence \\n.

Properties with null values are omitted in condensed nested fields, as in JSON.

If all nested values are NULL, the tabular result is empty, not {} (empty JSON object). Specifically, TSV would write \N (NULL) and CSV would write no value (blank field).

Output generation mode does not affect fields meta and key, which are always expanded. Likewise, variable-cardinality fields (e.g. JSON array or object) are unaffected by mode, and are always exported as JSON.
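Reading a condensed field from TSV means undoing the two escaping layers in reverse order: first the TSV backslash escapes, then JSON. The sketch below assumes the PostgreSQL-style TSV escapes and `\N` for NULL. The function names are invented for illustration.

```python
import json
from typing import Any, Optional

_TSV_ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "\\": "\\"}


def unescape_tsv(field: str) -> str:
    """Undo TSV backslash escaping (outer layer)."""
    out = []
    i = 0
    while i < len(field):
        if field[i] == "\\" and i + 1 < len(field):
            out.append(_TSV_ESCAPES.get(field[i + 1], field[i + 1]))
            i += 2
        else:
            out.append(field[i])
            i += 1
    return "".join(out)


def decode_condensed_field(field: str) -> Optional[Any]:
    """Decode a condensed nested field: TSV unescape, then JSON parse."""
    if field == "\\N":
        # All nested values were NULL, so there is no JSON object at all
        return None
    return json.loads(unescape_tsv(field))
```

Applied to the condensed example value, the TSV layer turns `\\n` into `\n`, and the JSON layer then turns `\n` into an actual newline inside the `sub2` string.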

expanded = 'expanded'

Nested fixed-cardinality fields are expanded into several columns.

condensed = 'condensed'

Nested fixed-cardinality fields are exported as embedded JSON.

class dap.dap_types.TableQuery

Bases: object

Encapsulates a query request to retrieve data from a table.

  • Parameters:

    • format – The format of the data to be returned.

    • mode – Output generation mode.

class dap.dap_types.SnapshotQuery

Bases: TableQuery

Snapshot queries return the present state of the table.

Snapshot queries help populate an empty database. After the initial snapshot query, you would use incremental queries to get the most up-to-date version of the data.

class dap.dap_types.IncrementalQuery

Bases: TableQuery

Incremental queries return consolidated updates to a table, and help update a previous state to the present state.

If only a since timestamp is given (recommended), the operation returns all changes since the specified point in time. If multiple updates took place to a record since the specified time, only the most recent version of the record is returned.

If both a since and an until timestamp are given, the operation returns all records that have changed since the start timestamp of the interval but have not been altered after the end timestamp of the interval. Any records that have been updated after the until timestamp are not included in the query result. This functionality is useful to break up larger batches of changes but cannot be reliably used as a means of reconstructing a database state in the past (i.e. a point-in-time query or a backup of a previous state).

The range defined by since and until is inclusive for the since timestamp but exclusive for the until timestamp.

You would normally use incremental queries to fetch changes since a snapshot query or a previous incremental query. If issued as a follow-up to a snapshot query, the since timestamp of the incremental query would be equal to the at timestamp of the snapshot query. If issued as a follow-up to an incremental query, you would chain the until timestamp returned by the previous query job with the since timestamp of the new query request.

  • Parameters:

    • since – Start timestamp (in UTC); only those records are returned that have been persisted since the specified date and time. This typically equals at returned by a previous snapshot query job, or until returned by a previous incremental query job.

    • until – End timestamp (in UTC); only those records are returned that have not been changed after the specified date and time. If omitted (recommended), defaults to the commit time of the latest record.
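The half-open range and the chaining rule can be sketched with plain timestamps. The helpers below are hypothetical and only model the arithmetic: `in_range` mirrors the inclusive since and exclusive until, and `chained_windows` splits a long interval into consecutive windows in which each until becomes the next since.

```python
from datetime import datetime, timedelta
from typing import Iterator, Optional, Tuple


def in_range(ts: datetime, since: datetime, until: Optional[datetime] = None) -> bool:
    """Inclusive at `since`, exclusive at `until` (open-ended if until is None)."""
    return ts >= since and (until is None or ts < until)


def chained_windows(
    since: datetime, until: datetime, step: timedelta
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive (since, until) pairs that cover the whole interval."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    start = since
    while start < until:
        end = min(start + step, until)
        yield start, end
        # Chain: the end of this window is the start of the next one
        start = end
```

When following up on a real job, prefer the at or until timestamp returned by the previous job over a locally computed value, as described above.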

class dap.dap_types.ResourceResult

Bases: object

Associates object identifiers with pre-signed URLs to output resources.

  • Parameters: urls – A dictionary of key-value pairs consisting of an ObjectID and the corresponding resource URL.

class dap.dap_types.Credentials

Bases: object

Credentials to be passed to Instructure API Gateway.

All Instructure Platform Services go through the API Gateway. Access to credentials is managed via the Instructure Identity Service.

  • Parameters:

    • basic_credentials – Encoded credentials.

    • client_id – The OAuth Client ID.

    • client_region – The client’s region decoded from the Client ID key.

class dap.dap_types.TokenProperties

Bases: object

An authentication/authorization token issued by API Gateway.

  • Parameters:

    • access_token – A base64-encoded access token string with header, payload and signature parts.

    • expires_in – Expiry time (in sec) of the access token. This field is informational; the timestamp is also embedded in the access token.

    • scope – List of services accessible by the client. Informational field, as the scope is also embedded in the access token.

    • token_type – Type of the access token.

class dap.dap_types.TableDataResult

Bases: object

The result of a table query operation.

  • Parameters:

    • schema_version – Version of the schema that records in the table conform to.

    • timestamp – Timestamp (in UTC) that identifies the table state.

    • job_id – The ID of the executed backend job.

class dap.dap_types.DownloadTableDataResult

Bases: TableDataResult

The result of downloading the output of a snapshot or an incremental query to the local file system.

  • Parameters: downloaded_files – A list of paths to files containing the downloaded table data.

class dap.dap_types.GetTableDataResult

Bases: TableDataResult

The result of fetching the output of a snapshot or an incremental query.

  • Parameters: objects – The list of objects generated by the job, which can be traded for resource URLs.

Exceptions

exception dap.dap_error.ServerError

Bases: Exception

An error returned by the server.

  • Parameters: body – Unspecified content returned by the server.

exception dap.dap_error.OperationError

Bases: Exception

Encapsulates an error from an endpoint operation.

  • Parameters:

    • type – A machine-processable identifier for the error. Typically corresponds to the fully-qualified exception class, as per the type system of the language that emitted the message (e.g. Java, Python or Scala exception type).

    • uuid – Unique identifier of the error. This identifier helps locate the exact source of the error (e.g. find the log entry in the server log stream). Make sure to include this identifier when contacting support.

    • message – A human-readable description for the error for informational purposes. The exact format of the message is unspecified, and implementations should not rely on the presence of any specific information.

exception dap.dap_error.AuthenticationError

Bases: OperationError

Raised when the client fails to provide valid authentication credentials.

exception dap.dap_error.AccountNotOnboardedError

Bases: OperationError

Raised when the client is not onboarded.

exception dap.dap_error.AccountDisabledError

Bases: OperationError

Raised when the client is onboarded but access is forbidden.

class dap.dap_error.Location

Bases: object

Refers to a location in parsable text input (e.g. JSON, YAML or structured text).

  • Parameters:

    • line – Line number (1-based).

    • column – Column number w.r.t. the beginning of the line (1-based).

    • character – Character number w.r.t. the beginning of the input (1-based).
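To show how the three 1-based numbers relate, here is a small sketch that derives them from a 0-based character index into the input. The function `locate` is an illustration and not part of the library.

```python
from typing import Tuple


def locate(text: str, index: int) -> Tuple[int, int, int]:
    """Return (line, column, character), all 1-based, for a 0-based index."""
    line = text.count("\n", 0, index) + 1
    # rfind returns -1 when there is no earlier newline, which gives column = index + 1
    last_newline = text.rfind("\n", 0, index)
    column = index - last_newline
    character = index + 1
    return line, column, character
```

For the input `"ab\ncd"`, the letter `c` at index 3 is on line 2, column 1, and is character 4 of the input.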

exception dap.dap_error.ValidationError

Bases: OperationError

Raised when a JSON validation error occurs.

  • Parameters: location – Location of where invalid input was found.

exception dap.dap_error.NotFoundError

Bases: OperationError

Raised when an entity does not exist or has expired.

  • Parameters:

    • id – The identifier of the entity not found, e.g. the name of a table or the UUID of a job.

    • kind – The entity that is not found such as a namespace, table, object or job.

exception dap.dap_error.OutOfRangeError

Bases: OperationError

Raised when data is queried outside of the allowed time range.

  • Parameters:

    • since – The earliest permitted timestamp.

    • until – The latest permitted timestamp.

exception dap.dap_error.SnapshotRequiredError

Bases: OperationError

Raised when data is queried outside of the allowed time range, and the table was reloaded recently. A new snapshot is required to keep data consistency.

  • Parameters:

    • since – The earliest permitted timestamp.

    • until – The latest permitted timestamp.

exception dap.dap_error.ProcessingError

Bases: OperationError

Raised when a job has terminated due to an unexpected error.

exception dap.dap_error.GatewayTimeoutError

Bases: Exception

Raised when a timeout is received from the gateway.

  • Parameters: message – Always the same message signaling that a timeout was received.

Copyright © 2024 Instructure, Inc. All rights reserved.