Reference

Fetching data

AccessToken

A JWT access token. This object is immutable.

The access token is sensitive information and must not be exposed (e.g. in logs).

__init__ ( self, jwt_token: str ) → None

Creates a new JWT access token.

__str__ ( self ) → str

Returns the string representation of the JWT access token.

is_expiring ( self ) → bool

Checks if the token is about to expire.

Returns: (bool) - True if the token is about to expire.
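The reference does not say how "about to expire" is decided. The following is a minimal sketch of one way such a check could work, assuming the token carries a standard `exp` claim and assuming a 300-second safety margin. The names `jwt_expiry`, `is_expiring`, `make_demo_token`, and `margin_sec` are illustrative and are not part of the library.

```python
import base64
import json
import time
from typing import Optional


def jwt_expiry(jwt_token: str) -> float:
    """Return the `exp` claim (seconds since the epoch) from a JWT payload."""
    payload_b64 = jwt_token.split(".")[1]
    # JWT segments are base64url-encoded without padding; restore the padding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return float(payload["exp"])


def is_expiring(jwt_token: str, margin_sec: float = 300.0,
                now: Optional[float] = None) -> bool:
    """True if the token expires within `margin_sec` (an assumed threshold)."""
    current = time.time() if now is None else now
    return jwt_expiry(jwt_token) - current <= margin_sec


def make_demo_token(exp: float) -> str:
    """Build an unsigned demo token for this example; it is not a credential."""
    def enc(obj: dict) -> str:
        raw = json.dumps(obj).encode("utf-8")
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
    return f"{enc({'alg': 'none'})}.{enc({'exp': exp})}.signature"


demo_token = make_demo_token(exp=1000.0)
print(is_expiring(demo_token, now=800.0))  # 200 s left, inside the margin
print(is_expiring(demo_token, now=100.0))  # 900 s left, outside the margin
```

The sketch only reads the payload and does not verify the signature, so it is suitable for deciding when to refresh a token, not for trusting one.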

DAPClient

Client proxy for the Data Access Platform (DAP) server-side API.

To invoke high-level functionality such as initializing and synchronizing a database or data warehouse, or low-level functionality such as triggering a snapshot or incremental query, you need to instantiate a client, which acts as a proxy to the DAP API.

Tracking for usage analytics is done here because it is needed for both the CLI and the library use cases. This also ties tracking to operations that use the DAP service (e.g. there is no tracking for a local dropdb).

__aenter__ ( self ) → DAPSession

Initiates a new client session.

__aexit__ ( self, exc_type: Type[BaseException] | None, exc_val: BaseException | None, exc_tb: traceback | None ) → None

Terminates a client session.

__init__ ( self, base_url: str | None, credentials: Credentials | None, tracking: bool | None ) → None

Initializes a new client proxy to communicate with the DAP back-end.
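Because the client defines both an asynchronous enter and exit method, it is meant to be used with `async with`, which yields a session and terminates it on exit. The sketch below shows that protocol with stand-in classes so that it runs without the DAP library or network access. `StubClient` and `StubSession` are hypothetical; only the method names `get_tables` and `close` come from this reference, and the canned table list is made up.

```python
import asyncio
from typing import List, Optional


class StubSession:
    """Stand-in for DAPSession; returns canned data instead of calling DAP."""

    def __init__(self) -> None:
        self.closed = False

    async def get_tables(self, namespace: str) -> List[str]:
        return ["accounts", "users"]  # canned demo data

    async def close(self) -> None:
        self.closed = True


class StubClient:
    """Stand-in for DAPClient with the same async context manager protocol."""

    def __init__(self) -> None:
        self.session: Optional[StubSession] = None

    async def __aenter__(self) -> StubSession:
        # Initiates a new session and hands it to the `async with` body.
        self.session = StubSession()
        return self.session

    async def __aexit__(self, exc_type, exc_val, exc_tb) -> None:
        # Terminates the session even if the body raised an exception.
        if self.session is not None:
            await self.session.close()


async def list_tables(client: StubClient, namespace: str) -> List[str]:
    async with client as session:
        return await session.get_tables(namespace)


client = StubClient()
tables = asyncio.run(list_tables(client, "canvas"))
print(tables)
```

With the real library the body of `list_tables` would presumably look the same, with a `DAPClient` instance in place of the stub.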

DAPClientError

Bases: RuntimeError

DAPSession

Represents an authenticated session to DAP.

__init__ ( self, session: ClientSession, base_url: str, credentials: Credentials, tracking_data: TrackingData | None ) → None

Creates a new logical session by encapsulating a network connection.

authenticate ( self ) → None

Authenticates with API key to receive a JWT.

await_job ( self, job: TableJob | CompleteSnapshotJob | CompleteIncrementalJob | FailedJob ) → TableJob | CompleteSnapshotJob | CompleteIncrementalJob | FailedJob

Wait until a job terminates.

Parameters:

job (TableJob | CompleteSnapshotJob | CompleteIncrementalJob | FailedJob) - A job that might be still running.

Returns: (TableJob | CompleteSnapshotJob | CompleteIncrementalJob | FailedJob) - A job that has completed with success or terminated with failure.
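Waiting for a job amounts to polling its status until it reaches a terminal state. The following is a rough sketch of such a loop, not the library's implementation. The `Job` dataclass, the `poll_interval` parameter, and the fake `get_job` callback are assumptions made for the example; the status values mirror the JobStatus members listed in this reference.

```python
import asyncio
from dataclasses import dataclass
from enum import Enum
from typing import Awaitable, Callable, List


class JobStatus(Enum):
    Waiting = "waiting"
    Running = "running"
    Complete = "complete"
    Failed = "failed"


@dataclass(frozen=True)
class Job:
    """Simplified stand-in for the job types in this reference."""
    id: str
    status: JobStatus


TERMINAL = {JobStatus.Complete, JobStatus.Failed}


async def await_job(job: Job, get_job: Callable[[str], Awaitable[Job]],
                    poll_interval: float = 0.01) -> Job:
    """Poll `get_job` until the job has completed or failed."""
    while job.status not in TERMINAL:
        await asyncio.sleep(poll_interval)
        job = await get_job(job.id)
    return job


def make_fake_get_job(statuses: List[JobStatus]) -> Callable[[str], Awaitable[Job]]:
    """Return a callback that replays `statuses`, repeating the last one."""
    remaining = list(statuses)

    async def get_job(job_id: str) -> Job:
        status = remaining.pop(0) if len(remaining) > 1 else remaining[0]
        return Job(job_id, status)

    return get_job


fake_get_job = make_fake_get_job(
    [JobStatus.Running, JobStatus.Running, JobStatus.Complete]
)
final_job = asyncio.run(await_job(Job("job-1", JobStatus.Waiting), fake_get_job))
print(final_job.status)
```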

close ( self ) → None

Closes the underlying network sockets.

download_object ( self, object: Object, output_directory: str, decompress: bool, progress: JobProgress | None ) → str

Save a single remote file to a local directory.

Parameters:

object (Object) - Object to download.
output_directory (str) - Path of the target directory to save the downloaded file.
decompress (bool) - If True, the file will be decompressed after downloading. Default is False.
progress (JobProgress | None) - A progress bar to update during the download.

Returns: (str) - A path of the file saved in the local file system.

download_objects ( self, objects: List[Object], output_directory: str, decompress: bool ) → List[str]

Save data stored remotely into a local directory.

Parameters:

objects (List[Object]) - List of output objects to be downloaded.
output_directory (str) - Path to the target directory to save downloaded files to.
decompress (bool) - If True, the file will be decompressed after downloading. Default is False.

Returns: (List[str]) - A list of paths to files saved in the local file system.

download_resource ( self, resource: Resource, output_directory: str, decompress: bool ) → str

Save a single remote file to a local directory.

Parameters:

resource (Resource) - Resource to download.
output_directory (str) - Path of the target directory to save the downloaded file.
decompress (bool) - If True, the file will be decompressed after downloading. Default is False.

Returns: (str) - A path of the file saved in the local file system.

download_resources ( self, resources: List[Resource], output_directory: str, decompress: bool ) → List[str]

Save data stored remotely into a local directory.

Parameters:

resources (List[Resource]) - List of output resources to be downloaded.
output_directory (str) - Path to the target directory to save downloaded files to.
decompress (bool) - If True, the file will be decompressed after downloading. Default is False.

Returns: (List[str]) - A list of paths to files saved in the local file system.

download_table_data ( self, namespace: str, table: str, query: SnapshotQuery | IncrementalQuery, output_directory: str, decompress: bool ) → DownloadTableDataResult

Executes a query job and downloads data to a local directory.

Parameters:

namespace (str) - A namespace identifier such as canvas or mastery.
table (str) - A table identifier such as submissions, quizzes, or users.
query (SnapshotQuery | IncrementalQuery) - An object that encapsulates the parameters of the snapshot or incremental query to execute.
output_directory (str) - Path to the directory to save downloaded files to.
decompress (bool) - If True, the file will be decompressed after downloading. Default is False.

Returns: (DownloadTableDataResult) - Result of the query, including a list of paths to files saved in the local file system.

download_table_schema ( self, namespace: str, table: str, output_directory: str ) → None

Saves the schema as a JSON file into a local directory.

Parameters:

namespace (str) - A namespace identifier such as canvas or mastery.
table (str) - A table identifier such as submissions, quizzes, or users.
output_directory (str) - Path to the directory to save the JSON file to.

execute_job ( self, namespace: str, table: str, query: SnapshotQuery | IncrementalQuery ) → TableJob | CompleteSnapshotJob | CompleteIncrementalJob | FailedJob

Start a query job and wait until it terminates.

execute_operation_on_tables ( self, namespace: str, tables: str, operation_name: str, operation: Callable[[str, str], Awaitable[T]] ) → None

Executes the given operation on multiple tables in the given namespace. The operations currently run sequentially and independently of each other, but some error types stop the execution of subsequent operations because those operations would fail as well.

Parameters:

namespace (str) - A namespace identifier such as canvas or mastery.
tables (str) - A single table, a comma-separated list of table names, or the special value "all".
operation_name (str) - The CLI command that is being executed.
operation (Callable[[str, str], Awaitable[T]]) - The operation to execute on a single table.
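The sequential behavior described above can be pictured with the sketch below. It is an illustration under assumptions rather than the library's code: the reference does not name the error types that abort the run, so `FatalError` is a hypothetical marker for them, and the returned outcome dictionary exists only to make the behavior visible.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


class FatalError(Exception):
    """Hypothetical marker for errors that would make later tables fail too."""


async def run_on_tables(namespace: str, tables: List[str],
                        operation: Callable[[str, str], Awaitable[None]]
                        ) -> Dict[str, str]:
    """Run `operation` on each table in turn and record what happened."""
    outcome: Dict[str, str] = {}
    for table in tables:
        try:
            await operation(namespace, table)
            outcome[table] = "ok"
        except FatalError:
            # Stop here: remaining tables are not attempted.
            outcome[table] = "aborted"
            break
        except Exception as exc:
            # An ordinary failure affects only this table.
            outcome[table] = f"failed: {exc}"
    return outcome


async def demo_operation(namespace: str, table: str) -> None:
    if table == "quizzes":
        raise ValueError("bad table")
    if table == "users":
        raise FatalError("authentication failed")


outcome = asyncio.run(run_on_tables(
    "canvas", ["submissions", "quizzes", "users", "accounts"], demo_operation
))
print(outcome)
```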

get_job ( self, job_id: str ) → TableJob | CompleteSnapshotJob | CompleteIncrementalJob | FailedJob

Retrieve job status.

get_job_status ( self, job_id: str ) → JobStatus

Retrieve job status.

get_objects ( self, job_id: str ) → List[Object]

Retrieve object IDs once the query is completed successfully.

get_resources ( self, objects: List[Object] ) → Dict[str, Resource]

Retrieve URLs to data stored remotely.

get_table_data ( self, namespace: str, table: str, query: SnapshotQuery | IncrementalQuery ) → GetTableDataResult

Executes a query job on a given table.

Parameters:

namespace (str) - A namespace identifier such as canvas or mastery.
table (str) - A table identifier such as submissions, quizzes, or users.
query (SnapshotQuery | IncrementalQuery) - An object that encapsulates the parameters of the snapshot or incremental query to execute.

Returns: (GetTableDataResult) - Result of the query, including metadata.

get_table_list ( self, namespace: str, table_param: str ) → List[str]

Returns a list of tables on which an operation should be performed. In case of "all" the list of tables for that namespace is retrieved.

Parameters:

namespace (str) - A namespace identifier such as canvas or mastery.
table_param (str) - A single table, a comma-separated list of table names, or the special value "all".
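The three accepted forms of the table parameter can be resolved as in the sketch below. The function `resolve_tables` is hypothetical, and its handling of surrounding whitespace and empty entries is an assumption; in the real method the list for "all" is retrieved from the namespace rather than passed in.

```python
from typing import List


def resolve_tables(table_param: str, available: List[str]) -> List[str]:
    """Turn a table parameter into a concrete list of table names."""
    if table_param.strip() == "all":
        # Stand-in for retrieving the namespace's table list.
        return list(available)
    # A single name is just a comma-separated list with one entry.
    return [name.strip() for name in table_param.split(",") if name.strip()]


available_tables = ["accounts", "submissions", "users"]
print(resolve_tables("all", available_tables))
print(resolve_tables("users", available_tables))
print(resolve_tables("users, submissions", available_tables))
```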

get_table_schema ( self, namespace: str, table: str ) → VersionedSchema

Retrieves the versioned schema of a table.

Parameters:

namespace (str) - A namespace identifier such as canvas or mastery.
table (str) - A table identifier such as submissions, quizzes, or users.

Returns: (VersionedSchema) - The schema of the table as exposed by DAP API.

get_tables ( self, namespace: str ) → List[str]

Retrieves the list of tables available for querying.

Parameters:

namespace (str) - A namespace identifier such as canvas or mastery.

Returns: (List[str]) - A list of tables available for querying in the given namespace.

query_incremental ( self, namespace: str, table: str, query: IncrementalQuery ) → TableJob | CompleteSnapshotJob | CompleteIncrementalJob | FailedJob

Starts an incremental query.

query_snapshot ( self, namespace: str, table: str, query: SnapshotQuery ) → TableJob | CompleteSnapshotJob | CompleteIncrementalJob | FailedJob

Starts a snapshot query.

stream_resource ( self, resource: Resource ) → AsyncIterator[StreamReader]

Creates a stream reader for the given resource.

Parameters:

resource (Resource) - Resource to download.

DownloadError

Bases: DAPClientError

Synchronizing a local database

SQLDrop

Encapsulates logic that drops table(s) from the SQL database.

drop ( self, namespace: str, table_names: str ) → None

Drops the given database tables.

SQLReplicator

Encapsulates logic that replicates changes acquired from DAP API in a SQL database.

Types

CompleteIncrementalJob

Bases: CompleteJob

An incremental query that has completed with success.

Properties:

since (datetime) - Start timestamp (in UTC); only those records are returned that have been persisted since the specified date and time.
until (datetime) - End timestamp (in UTC); only those records are returned that have been persisted before the specified date and time. This can be used as a starting point for future incremental queries.

CompleteJob

Bases: TableJob

A data access job that has completed with success.

Properties:

objects (List[Object]) - The list of objects generated by the job.
schema_version (int) - Version of the schema that records in the table conform to.

CompleteSnapshotJob

Bases: CompleteJob

A snapshot query that has completed with success.

Properties:

at (datetime) - Timestamp (in UTC) that identifies the table state. This can be used as a starting point for future incremental queries.

Credentials

Credentials to be passed to Instructure API Gateway.

All Instructure Platform Services go through the API Gateway. Access to credentials is managed via the Instructure Identity Service.

Properties:

basic_credentials (str) - Encoded credentials.
client_id (str) - The OAuth Client ID.
client_region (str) - The client's region decoded from the Client ID key.

DownloadTableDataResult

Bases: TableDataResult

The result of downloading the output of a snapshot or an incremental query to the local file system.

Properties:

downloaded_files (List[str]) - A list of paths to files containing the downloaded table data.

FailedJob

Bases: TableJob

A data access job that has terminated with failure.

Properties:

error (ProcessingError) - Provides more details on the error that occurred.

Format

Identifies the format of the data returned, e.g. TSV, CSV, JSON Lines, or Parquet.

Tab-separated values (TSV) is a simple tabular format in which each record (table row) occupies a single line.

Output always begins with a header row, which lists all metadata and data field names.
Fields (table columns) are delimited by tab characters.
Non-printable characters and special values are escaped with a backslash (\).
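A reader of TSV output has to split each line on tabs and then undo the backslash escapes. The sketch below is a minimal decoder, assuming the escape set of the PostgreSQL COPY text format (which this reference names as the model for TSV) and treating `\N` as NULL. It does not handle octal or hexadecimal escapes, and the function names are illustrative.

```python
from typing import List, Optional

# Single-character escapes from the PostgreSQL COPY text format.
_ESCAPES = {"b": "\b", "f": "\f", "n": "\n", "r": "\r",
            "t": "\t", "v": "\v", "\\": "\\"}


def decode_field(raw: str) -> Optional[str]:
    """Decode one TSV field; the literal \\N stands for NULL."""
    if raw == r"\N":
        return None
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\" and i + 1 < len(raw):
            nxt = raw[i + 1]
            # Unknown escapes fall back to the escaped character itself.
            out.append(_ESCAPES.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def decode_line(line: str) -> List[Optional[str]]:
    """Split a TSV line on tabs and decode every field."""
    return [decode_field(field) for field in line.rstrip("\n").split("\t")]


row = decode_line("1\tmulti-\\nline\t\\N\n")
print(row)
```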

Comma-separated values (CSV) output follows RFC 4180 with a few extensions:

Output always begins with a header row, which lists all metadata and data field names.
Strings are quoted with double quotes (") if they contain special characters such as the double quote itself, the comma delimiter, a newline, a carriage return, a tab character, etc., or if their string representation would be identical to a special value such as NULL.
Empty strings are always represented as "".
NULL values are represented with the unquoted literal string NULL.
Missing values are presented as an empty string (no characters between delimiters).
Each row has the same number of fields.
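The CSV rules above distinguish four cases that plain RFC 4180 does not: NULL, a missing value, an empty string, and the string "NULL". The sketch below encodes a row according to those rules as a way to make them concrete. It is an illustration, not the service's encoder: `MISSING` is a hypothetical sentinel, and the set of special characters covers only the ones the list names explicitly.

```python
from typing import Any, Iterable

MISSING = object()          # hypothetical sentinel for "no value present"
SPECIAL = set('",\n\r\t')   # characters named in the rules above


def encode_field(value: Any) -> str:
    """Encode one value following the CSV rules described above."""
    if value is MISSING:
        return ""            # nothing between the delimiters
    if value is None:
        return "NULL"        # unquoted literal
    text = str(value)
    if text == "":
        return '""'          # empty strings are always quoted
    if text == "NULL" or any(ch in SPECIAL for ch in text):
        # RFC 4180: embedded double quotes are doubled.
        return '"' + text.replace('"', '""') + '"'
    return text


def encode_row(values: Iterable[Any]) -> str:
    return ",".join(encode_field(v) for v in values)


line = encode_row([1, None, "", MISSING, "NULL", "a,b", 'say "hi"'])
print(line)
```

Note that a reader built on a generic CSV parser may not be able to tell a quoted empty string from an unquoted empty field, so distinguishing the four cases on input may need a parser that exposes quoting.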

When the output data is represented in the JSON Lines format, each record (table row) occupies a single line. Each line is a JSON object, which can be validated against the corresponding JSON schema.

Properties with null values are omitted in JSON.
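Since null-valued properties are omitted, a consumer of JSON Lines output should treat an absent key as NULL rather than as an error. The sketch below shows this with made-up records; the field names `id`, `name`, and `email` are hypothetical and do not reflect any actual table schema.

```python
import json
from typing import Any, Dict, List


def read_jsonl(text: str) -> List[Dict[str, Any]]:
    """Parse JSON Lines text: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


sample = (
    '{"id": 1, "name": "Ada", "email": "ada@example.com"}\n'
    '{"id": 2, "name": "Bob"}\n'
)
records = read_jsonl(sample)

# dict.get returns None for the omitted property instead of raising KeyError.
emails = [record.get("email") for record in records]
print(emails)
```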

Parquet files are compatible with Spark version 3.0 and later.

Members:

TSV = 'tsv' - Tab-separated values, in compliance with PostgreSQL COPY.
CSV = 'csv' - Comma-separated values, as per RFC 4180.
JSONL = 'jsonl' - JSON lines format, with a single JSON object occupying each line.
Parquet = 'parquet' - Parquet format, as generated by Spark.

GetTableDataResult

Bases: TableDataResult

The result of fetching the output of a snapshot or an incremental query.

Properties:

objects (List[Object]) - The list of objects generated by the job, which can be traded for resource URLs.

IncrementalQuery

Bases: TableQuery

Incremental queries return consolidated updates to a table, and help update a previous state to the present state.

If only a since timestamp is given (recommended), the operation returns all changes since the specified point in time. If multiple updates took place to a record since the specified time, only the most recent version of the record is returned.

If both a since and an until timestamp are given, the operation returns all records that have changed since the start timestamp of the interval but have not been altered after the end timestamp of the interval. Any records that have been updated after the until timestamp are not included in the query result. This functionality is useful to break up larger batches of changes but cannot be reliably used as a means of reconstructing a database state in the past (i.e. a point-in-time query or a backup of a previous state).

The range defined by since and until is inclusive for the since timestamp but exclusive for the until timestamp.

You would normally use incremental queries to fetch changes since a snapshot query or a previous incremental query. If issued as a follow-up to a snapshot query, the since timestamp of the incremental query would be equal to the at timestamp of the snapshot query. If issued as a follow-up to an incremental query, you would chain the until timestamp returned by the previous query job with the since timestamp of the new query request.
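The semantics above can be simulated on a small in-memory change log. The sketch below is a model of the described behavior, not the service's implementation: it keeps only the most recent version of each record and returns it when its commit time falls in the half-open range from since (inclusive) to until (exclusive). The `Change` tuple and the sample data are assumptions made for the example.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple

# (record id, commit timestamp, value)
Change = Tuple[int, datetime, str]


def incremental(changes: List[Change], since: datetime,
                until: Optional[datetime] = None) -> Dict[int, str]:
    """Return the latest version of each record changed in [since, until)."""
    latest: Dict[int, Tuple[datetime, str]] = {}
    for rec_id, ts, value in changes:
        if rec_id not in latest or ts > latest[rec_id][0]:
            latest[rec_id] = (ts, value)
    return {
        rec_id: value
        for rec_id, (ts, value) in latest.items()
        if ts >= since and (until is None or ts < until)
    }


def at_hour(hour: int) -> datetime:
    return datetime(2023, 10, 23, hour, tzinfo=timezone.utc)


log = [(1, at_hour(1), "a1"), (2, at_hour(2), "b1"), (1, at_hour(3), "a2")]

# Only `since`: every change, with record 1 consolidated to its latest version.
print(incremental(log, at_hour(0)))
# With `until`: record 1 was updated at the boundary, so it is left out.
first_batch = incremental(log, at_hour(0), at_hour(3))
# Chaining: the previous `until` becomes the next `since`.
second_batch = incremental(log, at_hour(3))
print(first_batch, second_batch)
```

The first batch omits the earlier version of record 1 entirely, which is why a bounded query cannot reconstruct a past state; the chained second batch picks the record up without overlap.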

Properties:

since (datetime) - Start timestamp (in UTC); only those records are returned that have been persisted since the specified date and time. This typically equals at returned by a previous snapshot query job, or until returned by a previous incremental query job.
until (datetime | None) - End timestamp (in UTC); only those records are returned that have not been changed after the specified date and time. If omitted (recommended), defaults to the commit time of the latest record.

JobStatus

Tracks the lifetime of a job from creation to termination (with success or failure).

Members:

Waiting = 'waiting'
Running = 'running'
Complete = 'complete'
Failed = 'failed'

Mode

Output generation mode controls how nested fixed-cardinality fields are expanded into columns.

Mode expanded lays out nested fixed-cardinality fields into several columns. Consider the following example for TSV:

meta.ts               meta.action  key.id  value.plain  value.nested.sub1  value.nested.sub2  value.nested.sub3
2023-10-23T01:02:03Z  U            1       string       1                  multi-\nline       \N

Mode condensed keeps nested fields together. Observe how a nested field becomes a single JSON-valued field:

meta.ts               meta.action  key.id  value.plain  value.nested
2023-10-23T01:02:03Z  U            1       string       {"sub1": 1, "sub2": "multi-\\nline"}

In case both JSON and the output format (e.g. CSV or TSV) define escaping rules, they are applied consecutively. This is why there are multiple backslash characters in the example above: JSON escapes a newline character as \n, and then TSV escapes the backslash character to make the sequence \\n.
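The two escaping steps can be reproduced in a few lines. The sketch below serializes the nested value as JSON and then applies a simplified TSV escape; `tsv_escape` is a hypothetical helper that covers only backslash, tab, newline, and carriage return.

```python
import json


def tsv_escape(text: str) -> str:
    """Simplified TSV escaping: backslash first, then control characters."""
    return (text.replace("\\", "\\\\")
                .replace("\t", "\\t")
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


nested = {"sub1": 1, "sub2": "multi-\nline"}

# Step 1: JSON turns the newline into the two characters backslash and n.
as_json = json.dumps(nested)
# Step 2: TSV escapes that backslash, which yields backslash, backslash, n.
as_tsv = tsv_escape(as_json)
print(as_tsv)

# In expanded mode only the TSV step applies to the plain string value.
print(tsv_escape(nested["sub2"]))
```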

Properties with null values are omitted in condensed nested fields, as in JSON.

If all nested values are NULL, the tabular result is empty, not {} (empty JSON object). Specifically, TSV would write \N (NULL) and CSV would write no value (blank field).

Output generation mode does not affect fields meta and key, which are always expanded. Likewise, variable-cardinality fields (e.g. JSON array or object) are unaffected by mode, and are always exported as JSON.

Members:

expanded = 'expanded' - Nested fixed-cardinality fields are expanded into several columns.
condensed = 'condensed' - Nested fixed-cardinality fields are exported as embedded JSON.

Object

A reference to a binary or text object persisted in object storage, such as a CSV, JSON, or Parquet file.

The lifetime of the object depends on the operation that created it but typically lasts for 24 hours. Object identifiers can be traded for pre-signed URLs via an authenticated endpoint operation while the object exists.

Properties:

id (str) - Uniquely identifies the object.

Resource

A pre-signed URL to a binary or text object persisted in object storage, such as a CSV, JSON or Parquet file.

The lifetime of the pre-signed URL depends on the operation that created it but typically lasts for 15 minutes. No authentication is required to fetch the object via the pre-signed URL.

Properties:

url (URL) - URL to the object.

ResourceResult

Associates object identifiers with pre-signed URLs to output resources.

Properties:

urls (Dict[str, Resource]) - A dictionary of key-value pairs consisting of an ObjectID and the corresponding resource URL.

SnapshotQuery

Bases: TableQuery

Snapshot queries return the present state of the table.

Snapshot queries help populate an empty database. After the initial snapshot query, you would use incremental queries to get the most up-to-date version of the data.

TableDataResult

The result of a table query operation.

Properties:

schema_version (int) - Version of the schema that records in the table conform to.
timestamp (datetime) - Timestamp (in UTC) that identifies the table state.
job_id (str) - The ID of the executed backend job.

TableJob

A data access job in progress.

Properties:

id (str) - Opaque unique identifier of the job.
status (JobStatus) - The current status of the job.
expires_at (datetime | None) - The time when the job will no longer be available.

TableList

A list of tables that exist in the organization domain.

Properties:

tables (List[str]) - A list of table names.

TableQuery

Encapsulates a query request to retrieve data from a table.

Properties:

format (Format) - The format of the data to be returned.
mode (Mode | None) - Output generation mode.

TokenProperties

An authentication/authorization token issued by API Gateway.

Properties:

access_token (str) - A base64-encoded access token string with header, payload and signature parts.
expires_in (int) - Expiry time (in sec) of the access token. This field is informational, as the timestamp is also embedded in the access token.
scope (str) - List of services accessible by the client. Informational field, as the scope is also embedded in the access token.
token_type (str) - Type of the access token.

URL

A Uniform Resource Locator (URL).

Properties:

url (str) - The URL string.

VersionedSchema

The state of the schema at a specific point in time.

Schemas are backwards compatible. They receive strictly monotonically increasing version numbers as schema evolution takes place.

Properties:

schema (Dict[str, None | bool | int | float | str | Dict[str, JsonType] | List[JsonType]]) - The JSON Schema object to validate against.
version (int) - The version of the schema.

Exceptions

AccountDisabledError

Bases: OperationError

Raised when the client is onboarded but access is forbidden.

AccountNotOnboardedError

Bases: OperationError

Raised when the client is not onboarded.

AccountUnderMaintenanceError

Bases: OperationError

Raised when the account is disabled because of maintenance.

AuthenticationError

Bases: OperationError

Raised when the client fails to provide valid authentication credentials.

GatewayTimeoutError

Bases: Exception

Raised when a timeout is received from the gateway.

Properties:

message (str) - Always the same message, signaling that a timeout was received.

Location

Refers to a location in parsable text input (e.g. JSON, YAML or structured text).

Properties:

line (int) - Line number (1-based).
column (int) - Column number w.r.t. the beginning of the line (1-based).
character (int) - Character number w.r.t. the beginning of the input (1-based).

NotFoundError

Bases: OperationError

Raised when an entity does not exist or has expired.

Properties:

id (str) - The identifier of the entity not found, e.g. the name of a table or the UUID of a job.
kind (str) - The entity that is not found such as a namespace, table, object or job.

OperationError

Bases: Exception

Encapsulates an error from an endpoint operation.

Properties:

type (str) - A machine-processable identifier for the error. Typically corresponds to the fully-qualified exception class, as per the type system of the language that emitted the message (e.g. Java, Python or Scala exception type).
uuid (str) - Unique identifier of the error. This identifier helps locate the exact source of the error (e.g. find the log entry in the server log stream). Make sure to include this identifier when contacting support.
message (str) - A human-readable description for the error for informational purposes. The exact format of the message is unspecified, and implementations should not rely on the presence of any specific information.

OutOfRangeError

Bases: OperationError

Raised when data is queried outside of the allowed time range.

Properties:

since (datetime) - The earliest permitted timestamp.
until (datetime | None) - The latest permitted timestamp.

ProcessingError

Bases: OperationError

Raised when a job has terminated due to an unexpected error.

ServerError

Bases: Exception

An error returned by the server.

Properties:

body (Any) - Unspecified content returned by the server.

SnapshotRequiredError

Bases: OperationError

Raised when data is queried outside of the allowed time range, and the table was reloaded recently. A new snapshot is required to keep data consistency.

Properties:

since (datetime) - The earliest permitted timestamp.
until (datetime | None) - The latest permitted timestamp.
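One plausible way to react to this error is to fall back from an incremental query to a fresh snapshot. The sketch below shows that control flow with stand-in exception classes and callbacks so that it runs on its own. `sync_table` and the two callbacks are hypothetical, and the reference does not state that the library's own synchronization works this way.

```python
from typing import Callable


class OperationError(Exception):
    """Stand-in for the OperationError base class in this reference."""


class SnapshotRequiredError(OperationError):
    """Stand-in for the exception described above."""


def sync_table(run_incremental: Callable[[], None],
               run_snapshot: Callable[[], None]) -> str:
    """Try an incremental update; take a new snapshot if one is required."""
    try:
        run_incremental()
        return "incremental"
    except SnapshotRequiredError:
        run_snapshot()
        return "snapshot"


def failing_incremental() -> None:
    raise SnapshotRequiredError("table was reloaded")


print(sync_table(lambda: None, lambda: None))
print(sync_table(failing_incremental, lambda: None))
```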

ValidationError

Bases: OperationError

Raised when a JSON validation error occurs.

Properties:

location (Location) - Location of where invalid input was found.
