datasetinsights.io.downloader

datasetinsights.io.downloader.base

class datasetinsights.io.downloader.base.DatasetDownloader(**kwargs)

Bases: abc.ABC

This is the base class for all dataset downloaders. DatasetDownloader can be subclassed in the following way:

class NewDatasetDownloader(DatasetDownloader, protocol="protocol://")

Here, 'protocol://' should match the URI prefix that the subclass's download method accepts in source_uri, for example http:// or gs://.

abstract download(source_uri, output, **kwargs)

This method downloads a dataset stored at source_uri and stores it in the output directory.

Parameters
  • source_uri – URI that points to the dataset that should be downloaded

  • output – path to local folder where the dataset should be stored

  • **kwargs

datasetinsights.io.downloader.base.create_downloader(source_uri, **kwargs)
This function instantiates the dataset downloader after finding it with the source_uri provided.

Parameters
  • source_uri – URI used to look up the correct dataset downloader

  • **kwargs

Returns: The dataset downloader instance matching the source-uri.
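The lookup described above can be pictured as matching the start of source_uri against known protocol prefixes. This is a rough sketch only: `REGISTRY`, the stub classes, and `create_downloader_sketch` are hypothetical names, and the real function may resolve downloaders differently.

```python
class _StubDownloader:
    """Hypothetical placeholder for a concrete downloader class."""

    def __init__(self, **kwargs):
        self.options = kwargs


class HTTPStub(_StubDownloader):
    pass


class GCSStub(_StubDownloader):
    pass


# Hypothetical mapping of protocol prefix -> downloader class.
REGISTRY = {"http://": HTTPStub, "https://": HTTPStub, "gs://": GCSStub}


def create_downloader_sketch(source_uri, **kwargs):
    """Return an instance of the first class whose prefix matches source_uri."""
    for prefix, downloader_cls in REGISTRY.items():
        if source_uri.startswith(prefix):
            return downloader_cls(**kwargs)
    raise ValueError(f"no downloader registered for {source_uri!r}")
```

A caller would then write something like `create_downloader_sketch("gs://bucket/folder").` and receive a `GCSStub` instance, with any extra keyword arguments forwarded to the constructor.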

datasetinsights.io.downloader.gcs_downloader

class datasetinsights.io.downloader.gcs_downloader.GCSDatasetDownloader(**kwargs)

Bases: datasetinsights.io.downloader.base.DatasetDownloader

This class is used to download data from GCS

download(source_uri=None, output=None, **kwargs)
Parameters
  • source_uri – URI that indicates where on GCS the dataset should be downloaded from. The expected source_uri follows one of these patterns: gs://bucket/folder or gs://bucket/folder/data.zip

  • output – This is the path to the directory where the download will store the dataset.
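To illustrate the two URI patterns listed above, the sketch below splits a gs:// URI into a bucket name and an object path using only the standard library. The helper name `split_gcs_uri` is hypothetical and is not part of the documented API.

```python
from urllib.parse import urlparse


def split_gcs_uri(source_uri):
    """Split gs://bucket/path into (bucket, path); hypothetical helper."""
    parsed = urlparse(source_uri)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"not a GCS URI: {source_uri!r}")
    # The object path is everything after the bucket, without the leading slash.
    return parsed.netloc, parsed.path.lstrip("/")
```

Both a folder such as `gs://bucket/folder` and a single archive such as `gs://bucket/folder/data.zip` split the same way; telling them apart is left to the caller.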

datasetinsights.io.downloader.http_downloader

class datasetinsights.io.downloader.http_downloader.HTTPDatasetDownloader(**kwargs)

Bases: datasetinsights.io.downloader.base.DatasetDownloader

This class downloads data from any public HTTP or HTTPS URL. It also validates the checksum of the downloaded dataset if a checksum file path is provided.

download(source_uri, output, checksum_file=None, **kwargs)

This method downloads the dataset from an HTTP or HTTPS URL.

Parameters
  • source_uri (str) – URI that indicates where the dataset should be downloaded from.

  • output (str) – This is the path to the directory where the download will store the dataset.

  • checksum_file (str) – Path of the txt file that contains the checksum of the dataset to be downloaded. It can be an HTTP or HTTPS URL or a local path.

Raises

ChecksumError – Raised if the checksum does not match.
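The checksum validation described above could look roughly like the following. This reference does not state which checksum algorithm the library uses, so CRC32 is chosen here purely for illustration. `ChecksumError` is redefined locally as a stand-in, and `compute_crc32` and `validate_checksum` are hypothetical helper names.

```python
import zlib


class ChecksumError(Exception):
    """Local stand-in for the documented ChecksumError."""


def compute_crc32(path, chunk_size=1 << 20):
    """Compute the CRC32 of a file, reading it in chunks."""
    crc = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def validate_checksum(path, expected):
    """Raise ChecksumError when the file's CRC32 differs from expected."""
    actual = compute_crc32(path)
    if actual != expected:
        raise ChecksumError(f"expected {expected}, computed {actual}")
```

Reading in chunks keeps memory use flat for large dataset archives, which is the main reason not to load the whole file at once.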

datasetinsights.io.downloader.unity_simulation

UnitySimulationDownloader downloads a dataset from Unity Simulation

class datasetinsights.io.downloader.unity_simulation.Downloader(manifest_file: str, data_root: str)

Bases: object

Parse a given manifest file to download simulation output

For more on Unity Simulation, please see these docs <https://github.com/Unity-Technologies/Unity-Simulation-Docs>

manifest

the csv manifest file stored in a pandas dataframe

Type

DataFrame

data_root

root directory where the simulation output should be downloaded

Type

str

MANIFEST_FILE_COLUMNS = ('run_execution_id', 'app_param_id', 'instance_id', 'attempt_id', 'file_name', 'download_uri')

download_all()

Download all files in the manifest file.

download_binary_files()

Download all binary files.

download_captures()

Download all captures files. See captures

download_metrics()

Download all metrics files. See metrics

download_references()

Download all reference files. Reference tables are static during the simulation. They typically come from the definition of the simulation and should be created before tasks run distributed across different instances.

static match_filetypes(manifest)

Match filetypes for every row in the manifest file.

Parameters

manifest (pd.DataFrame) – the manifest csv file

Returns

a list of filetype strings
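As an illustration of the idea, the sketch below reads a manifest with the documented columns and assigns one filetype string per row based on file_name. The documented method takes a pandas DataFrame; plain dictionaries are used here to keep the example dependency-free. The patterns in `FILETYPE_PATTERNS` are hypothetical, since the real matching rules are not given in this reference.

```python
import csv
import io
import re

MANIFEST_FILE_COLUMNS = (
    "run_execution_id", "app_param_id", "instance_id",
    "attempt_id", "file_name", "download_uri",
)

# Hypothetical file-name patterns, chosen only for illustration.
FILETYPE_PATTERNS = (
    ("captures", re.compile(r"captures_\d+\.json$")),
    ("metrics", re.compile(r"metrics_\d+\.json$")),
    ("references", re.compile(r"_definitions\.json$")),
)


def read_manifest(text):
    """Parse manifest CSV text into a list of row dictionaries."""
    reader = csv.DictReader(io.StringIO(text))
    missing = set(MANIFEST_FILE_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"manifest is missing columns: {sorted(missing)}")
    return list(reader)


def match_filetypes_sketch(rows):
    """Return one filetype string per row; unmatched files count as binary."""
    filetypes = []
    for row in rows:
        for name, pattern in FILETYPE_PATTERNS:
            if pattern.search(row["file_name"]):
                filetypes.append(name)
                break
        else:
            filetypes.append("binary")
    return filetypes
```

Keeping the result as a list aligned with the manifest rows makes it straightforward to filter rows by type before downloading.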

class datasetinsights.io.downloader.unity_simulation.UnitySimulationDownloader(access_token=None, **kwargs)

Bases: datasetinsights.io.downloader.base.DatasetDownloader

This class is used to download data from Unity Simulation

For more on Unity Simulation please see these docs <https://github.com/Unity-Technologies/Unity-Simulation-Docs>

Parameters
  • access_token – Access token used to authenticate to Unity Simulation when downloading the dataset.

SOURCE_URI_PATTERN = 'usim://([^@]*)?@?([a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12})/(\\w+)'

download(source_uri, output, include_binary=False, **kwargs)
Parameters
  • source_uri

    URI that indicates where on Unity Simulation the dataset should be downloaded from.

    The expected source_uri follows one of these patterns:

    usim://access-token@project-id/run-execution-id or usim://project-id/run-execution-id

  • output – This is the path to the directory where the download will store the dataset.

  • include_binary

    Whether to download binary files such as images or LIDAR point clouds. This flag applies to datasets where metadata (e.g. annotation JSON, dataset catalog, …) can be separated from binary files.
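The documented SOURCE_URI_PATTERN can be exercised directly with Python's re module to see how the two URI forms break down. The pattern below is the documented one written as a raw string, so the `\\w` shown in the reference becomes `\w`. The helper `parse_usim_uri` is a hypothetical name for illustration and is not claimed to match the behavior of parse_source_uri.

```python
import re

# The documented pattern, written as a raw string.
SOURCE_URI_PATTERN = (
    r"usim://([^@]*)?@?"
    r"([a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12})"
    r"/(\w+)"
)


def parse_usim_uri(source_uri):
    """Return (access_token, project_id, run_execution_id).

    access_token is None when the URI carries no token.
    """
    match = re.match(SOURCE_URI_PATTERN, source_uri)
    if match is None:
        raise ValueError(f"unexpected source_uri: {source_uri!r}")
    token, project_id, run_execution_id = match.groups()
    # An absent token is captured as an empty string; normalize it to None.
    return (token or None), project_id, run_execution_id
```

The three capture groups correspond to the optional access token, the project id in UUID form, and the run execution id.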

parse_source_uri(source_uri)
Parameters
  • source_uri

datasetinsights.io.downloader.unity_simulation.download_manifest(run_execution_id, manifest_file, access_token, project_id, use_cache=True)

Download the manifest file for a single run_execution_id. For more on Unity Simulation, see these docs <https://github.com/Unity-Technologies/Unity-Simulation-Docs>

Parameters
  • run_execution_id (str) – Unity Simulation run execution id

  • manifest_file (str) – path to the destination of the manifest_file

  • access_token (str) – short lived authorization token

  • project_id (str) – Unity project id that has Unity Simulation enabled

  • use_cache (bool, optional) – indicator to skip download if manifest file already exists. Default: True.

Returns

Full path to the manifest_file

Return type

str
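The use_cache behavior described above amounts to skipping the download when the destination file already exists. Here is a minimal sketch of that check, assuming a hypothetical `fetch` callable that returns the file contents as bytes; the real function contacts Unity Simulation instead, and `download_with_cache` is an illustrative name.

```python
from pathlib import Path


def download_with_cache(destination, fetch, use_cache=True):
    """Write fetch() to destination unless a cached copy can be reused.

    `fetch` is a hypothetical zero-argument callable returning bytes.
    Returns the full path to the file as a string.
    """
    path = Path(destination)
    if use_cache and path.exists():
        # Reuse the existing file and skip the download entirely.
        return str(path.resolve())
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(fetch())
    return str(path.resolve())
```

Passing use_cache=False forces a fresh download, which is useful when a stale manifest is suspected.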

class datasetinsights.io.downloader.GCSDatasetDownloader(**kwargs)

Bases: datasetinsights.io.downloader.base.DatasetDownloader

This class is used to download data from GCS

download(source_uri=None, output=None, **kwargs)
Parameters
  • source_uri – URI that indicates where on GCS the dataset should be downloaded from. The expected source_uri follows one of these patterns: gs://bucket/folder or gs://bucket/folder/data.zip

  • output – This is the path to the directory where the download will store the dataset.

class datasetinsights.io.downloader.HTTPDatasetDownloader(**kwargs)

Bases: datasetinsights.io.downloader.base.DatasetDownloader

This class downloads data from any public HTTP or HTTPS URL. It also validates the checksum of the downloaded dataset if a checksum file path is provided.

download(source_uri, output, checksum_file=None, **kwargs)

This method downloads the dataset from an HTTP or HTTPS URL.

Parameters
  • source_uri (str) – URI that indicates where the dataset should be downloaded from.

  • output (str) – This is the path to the directory where the download will store the dataset.

  • checksum_file (str) – Path of the txt file that contains the checksum of the dataset to be downloaded. It can be an HTTP or HTTPS URL or a local path.

Raises

ChecksumError – Raised if the checksum does not match.

class datasetinsights.io.downloader.UnitySimulationDownloader(access_token=None, **kwargs)

Bases: datasetinsights.io.downloader.base.DatasetDownloader

This class is used to download data from Unity Simulation

For more on Unity Simulation please see these docs <https://github.com/Unity-Technologies/Unity-Simulation-Docs>

Parameters
  • access_token – Access token used to authenticate to Unity Simulation when downloading the dataset.

SOURCE_URI_PATTERN = 'usim://([^@]*)?@?([a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12})/(\\w+)'

download(source_uri, output, include_binary=False, **kwargs)
Parameters
  • source_uri

    URI that indicates where on Unity Simulation the dataset should be downloaded from.

    The expected source_uri follows one of these patterns:

    usim://access-token@project-id/run-execution-id or usim://project-id/run-execution-id

  • output – This is the path to the directory where the download will store the dataset.

  • include_binary

    Whether to download binary files such as images or LIDAR point clouds. This flag applies to datasets where metadata (e.g. annotation JSON, dataset catalog, …) can be separated from binary files.

parse_source_uri(source_uri)
Parameters
  • source_uri

datasetinsights.io.downloader.create_downloader(source_uri, **kwargs)
This function instantiates the dataset downloader after finding it with the source_uri provided.

Parameters
  • source_uri – URI used to look up the correct dataset downloader

  • **kwargs

Returns: The dataset downloader instance matching the source-uri.