datasetinsights.io.downloader¶
datasetinsights.io.downloader.base¶
-
class
datasetinsights.io.downloader.base.DatasetDownloader(**kwargs)¶ Bases:
abc.ABC
This is the base class for all dataset downloaders. DatasetDownloader can be subclassed in the following way:
class NewDatasetDownloader(DatasetDownloader, protocol="protocol://")
Here "protocol://" should match the URI prefix that the subclass's download method supports for source_uri, for example http:// or gs://.
-
abstract
download(source_uri, output, **kwargs)¶ Downloads the dataset stored at source_uri and stores it in the output directory.
- Parameters
source_uri – URI that points to the dataset that should be downloaded
output – path to local folder where the dataset should be stored
**kwargs –
-
abstract
-
datasetinsights.io.downloader.base.create_downloader(source_uri, **kwargs)¶ Instantiates the dataset downloader that matches the provided source_uri.
- Parameters
source_uri – URI used to look up the correct dataset downloader
**kwargs –
- Returns
The dataset downloader instance matching the source_uri.
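The subclassing form and the prefix lookup described above can be sketched as a small registry. This is only an illustration of the pattern, not the library's actual implementation: BaseDownloader, FakeGCSDownloader, make_downloader and the _registry dict are all hypothetical names, and the assumption that registration happens through __init_subclass__ is inferred from the protocol= class keyword.

```python
from abc import ABC, abstractmethod

# Hypothetical registry mapping a URI prefix to the class that handles it.
_registry = {}


class BaseDownloader(ABC):
    """Stand-in for DatasetDownloader (assumption, not the real class)."""

    def __init_subclass__(cls, protocol=None, **kwargs):
        # The protocol= keyword in the class statement arrives here.
        super().__init_subclass__(**kwargs)
        if protocol is not None:
            _registry[protocol] = cls

    @abstractmethod
    def download(self, source_uri, output, **kwargs):
        ...


class FakeGCSDownloader(BaseDownloader, protocol="gs://"):
    def download(self, source_uri, output, **kwargs):
        # A real downloader would copy data; this one only reports intent.
        return f"would copy {source_uri} into {output}"


def make_downloader(source_uri, **kwargs):
    """Return an instance of the class whose prefix matches source_uri."""
    for protocol, cls in _registry.items():
        if source_uri.startswith(protocol):
            return cls(**kwargs)
    raise ValueError(f"No downloader registered for {source_uri!r}")
```

With this sketch, make_downloader("gs://bucket/folder") returns a FakeGCSDownloader, while an unregistered prefix raises ValueError.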
datasetinsights.io.downloader.gcs_downloader¶
-
class
datasetinsights.io.downloader.gcs_downloader.GCSDatasetDownloader(**kwargs)¶ Bases:
datasetinsights.io.downloader.base.DatasetDownloader
This class is used to download data from GCS.
-
download(source_uri=None, output=None, **kwargs)¶
- Parameters
source_uri – The downloader URI that indicates where on GCS the dataset should be downloaded from. The expected source_uri follows one of these patterns: gs://bucket/folder or gs://bucket/folder/data.zip
output – The path to the directory where the downloaded dataset will be stored.
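As a rough illustration of the two source_uri patterns above, the sketch below splits a gs:// URI into a bucket name and an object path using only the standard library. The helper name split_gcs_uri is hypothetical, and how GCSDatasetDownloader actually parses its URI is not stated here.

```python
from urllib.parse import urlparse


def split_gcs_uri(source_uri):
    """Split gs://bucket/path into (bucket, path); hypothetical helper."""
    parsed = urlparse(source_uri)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"Not a gs:// URI: {source_uri!r}")
    # netloc holds the bucket; the path keeps a leading slash we drop.
    return parsed.netloc, parsed.path.lstrip("/")
```

Both a folder (gs://bucket/folder) and a single archive (gs://bucket/folder/data.zip) pass through the same split; telling the two apart would be up to the caller.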
-
datasetinsights.io.downloader.http_downloader¶
-
class
datasetinsights.io.downloader.http_downloader.HTTPDatasetDownloader(**kwargs)¶ Bases:
datasetinsights.io.downloader.base.DatasetDownloader
This class is used to download data from any public HTTP or HTTPS URL. It downloads the dataset and validates its checksum if a checksum file path is provided.
-
download(source_uri, output, checksum_file=None, **kwargs)¶ Downloads the dataset from an HTTP or HTTPS URL.
- Parameters
source_uri (str) – The downloader URI that indicates where the dataset should be downloaded from.
output (str) – The path to the directory where the downloaded dataset will be stored.
checksum_file (str) – The path of the txt file that contains the checksum of the dataset to be downloaded. It can be an HTTP or HTTPS URL or a local path.
- Raises
ChecksumError – Raised if the checksum doesn't match.
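The checksum validation step can be sketched as follows. This is a minimal illustration under stated assumptions: it uses CRC32 from zlib as the checksum algorithm, which is an assumption and may differ from what the library uses, and the ChecksumError class, crc32_of_file and validate_checksum are hypothetical stand-ins defined locally.

```python
import zlib


class ChecksumError(Exception):
    """Hypothetical stand-in for the library's ChecksumError."""


def crc32_of_file(path, chunk_size=65536):
    """Compute a CRC32 checksum of a file, reading it in chunks."""
    checksum = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            checksum = zlib.crc32(chunk, checksum)
    return checksum


def validate_checksum(path, expected):
    """Raise ChecksumError if the file's CRC32 differs from expected."""
    actual = crc32_of_file(path)
    if actual != expected:
        raise ChecksumError(
            f"Checksum mismatch for {path}: expected {expected}, got {actual}"
        )
```

Reading in chunks keeps memory use flat for large dataset archives.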
-
datasetinsights.io.downloader.unity_simulation¶
UnitySimulationDownloader downloads a dataset from Unity Simulation
-
class
datasetinsights.io.downloader.unity_simulation.Downloader(manifest_file: str, data_root: str)¶ Bases:
object
Parses a given manifest file to download simulation output.
For more on Unity Simulation, please see these docs.
-
manifest¶ the csv manifest file stored in a pandas dataframe
- Type
DataFrame
-
data_root¶ root directory where the simulation output should be downloaded
- Type
str
-
MANIFEST_FILE_COLUMNS= ('run_execution_id', 'app_param_id', 'instance_id', 'attempt_id', 'file_name', 'download_uri')¶
-
download_all()¶ Download all files in the manifest file.
-
download_binary_files()¶ Download all binary files.
-
download_references()¶ Download all reference files. All reference tables are static during the simulation. They typically come from the definition of the simulation and should be created before tasks run distributed across different instances.
-
static
match_filetypes(manifest)¶ Match filetypes for every row in the manifest file.
- Parameters
manifest (pd.DataFrame) – the manifest csv file
- Returns
a list of filetype strings
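To make the role of MANIFEST_FILE_COLUMNS concrete, the sketch below checks a manifest's header against the expected columns before reading its rows. Note the swap: the Downloader class stores the manifest in a pandas DataFrame, while this illustration uses the standard-library csv module to stay dependency-free. The function name read_manifest_rows is hypothetical.

```python
import csv
import io

# Column names copied from Downloader.MANIFEST_FILE_COLUMNS.
MANIFEST_FILE_COLUMNS = (
    "run_execution_id",
    "app_param_id",
    "instance_id",
    "attempt_id",
    "file_name",
    "download_uri",
)


def read_manifest_rows(text):
    """Parse manifest CSV text into dicts after validating the header."""
    reader = csv.DictReader(io.StringIO(text))
    missing = set(MANIFEST_FILE_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"Manifest is missing columns: {sorted(missing)}")
    return list(reader)
```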
-
-
class
datasetinsights.io.downloader.unity_simulation.UnitySimulationDownloader(access_token=None, **kwargs)¶ Bases:
datasetinsights.io.downloader.base.DatasetDownloader
This class is used to download data from Unity Simulation.
For more on Unity Simulation, please see these docs: <https://github.com/Unity-Technologies/Unity-Simulation-Docs>
- Parameters
access_token – Access token used to authenticate to Unity Simulation when downloading the dataset.
-
SOURCE_URI_PATTERN= 'usim://([^@]*)?@?([a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12})/(\\w+)'¶
-
download(source_uri, output, include_binary=False, **kwargs)¶
- Parameters
source_uri – The downloader URI that indicates where on Unity Simulation the dataset should be downloaded from. The expected source_uri follows one of these patterns: usim://access-token@project-id/run-execution-id or usim://project-id/run-execution-id
output – The path to the directory where the downloaded dataset will be stored.
include_binary – Whether to download binary files such as images or LIDAR point clouds. This flag applies to datasets where metadata (e.g. annotation JSON, dataset catalog, …) can be separated from binary files.
-
parse_source_uri(source_uri)¶
- Parameters
source_uri – Parses a source_uri in one of the following formats: usim://access-token@project-id/run-execution-id or usim://project-id/run-execution-id
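The two accepted URI forms can be checked against SOURCE_URI_PATTERN directly. The sketch below applies the documented pattern with the standard re module; the function name parse_usim_uri, the use of re.fullmatch, and the choice to return None for an absent token are assumptions for illustration, since the return value of parse_source_uri is not documented here.

```python
import re

# Same pattern as SOURCE_URI_PATTERN, written as raw-string pieces.
SOURCE_URI_PATTERN = (
    r"usim://([^@]*)?@?"
    r"([a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}"
    r"-[a-fA-F0-9]{4}-[a-fA-F0-9]{12})"
    r"/(\w+)"
)


def parse_usim_uri(source_uri):
    """Return (access_token or None, project_id, run_execution_id)."""
    match = re.fullmatch(SOURCE_URI_PATTERN, source_uri)
    if match is None:
        raise ValueError(f"Unsupported usim URI: {source_uri!r}")
    token, project_id, run_execution_id = match.groups()
    # The token group matches an empty string when no token is given.
    return (token or None), project_id, run_execution_id
```

The project id must be a UUID-shaped string of hexadecimal digits, and the run execution id must consist of word characters only, so anything else is rejected.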
-
-
datasetinsights.io.downloader.unity_simulation.download_manifest(run_execution_id, manifest_file, access_token, project_id, use_cache=True)¶ Download the manifest file for a single run_execution_id. For more on Unity Simulation, see these docs.
- Parameters
run_execution_id (str) – Unity Simulation run execution id
manifest_file (str) – path to the destination of the manifest_file
access_token (str) – short lived authorization token
project_id (str) – Unity project id that has Unity Simulation enabled
use_cache (bool, optional) – indicator to skip download if manifest file already exists. Default: True.
- Returns
Full path to the manifest_file
- Return type
str
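The use_cache behaviour described above (skip the download when the manifest file already exists) can be sketched as below. This is an assumption-laden illustration, not download_manifest itself: fetch_with_cache is a hypothetical helper, and the fetch argument is a placeholder callable standing in for the authenticated request to Unity Simulation.

```python
import os


def fetch_with_cache(path, fetch, use_cache=True):
    """Write fetch() output to path unless a cached copy can be reused.

    fetch is a zero-argument callable returning bytes (hypothetical).
    Returns the full path to the file, mirroring the documented return.
    """
    if use_cache and os.path.exists(path):
        # Cached copy found: skip the download entirely.
        return os.path.abspath(path)
    content = fetch()
    with open(path, "wb") as f:
        f.write(content)
    return os.path.abspath(path)
```

Passing use_cache=False forces a fresh download even when the file is already present.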
-
class
datasetinsights.io.downloader.GCSDatasetDownloader(**kwargs)¶ Bases:
datasetinsights.io.downloader.base.DatasetDownloader
This class is used to download data from GCS.
-
download(source_uri=None, output=None, **kwargs)¶
- Parameters
source_uri – The downloader URI that indicates where on GCS the dataset should be downloaded from. The expected source_uri follows one of these patterns: gs://bucket/folder or gs://bucket/folder/data.zip
output – The path to the directory where the downloaded dataset will be stored.
-
-
class
datasetinsights.io.downloader.HTTPDatasetDownloader(**kwargs)¶ Bases:
datasetinsights.io.downloader.base.DatasetDownloader
This class is used to download data from any public HTTP or HTTPS URL. It downloads the dataset and validates its checksum if a checksum file path is provided.
-
download(source_uri, output, checksum_file=None, **kwargs)¶ Downloads the dataset from an HTTP or HTTPS URL.
- Parameters
source_uri (str) – The downloader URI that indicates where the dataset should be downloaded from.
output (str) – The path to the directory where the downloaded dataset will be stored.
checksum_file (str) – The path of the txt file that contains the checksum of the dataset to be downloaded. It can be an HTTP or HTTPS URL or a local path.
- Raises
ChecksumError – Raised if the checksum doesn't match.
-
-
class
datasetinsights.io.downloader.UnitySimulationDownloader(access_token=None, **kwargs)¶ Bases:
datasetinsights.io.downloader.base.DatasetDownloader
This class is used to download data from Unity Simulation.
For more on Unity Simulation, please see these docs: <https://github.com/Unity-Technologies/Unity-Simulation-Docs>
- Parameters
access_token – Access token used to authenticate to Unity Simulation when downloading the dataset.
-
SOURCE_URI_PATTERN= 'usim://([^@]*)?@?([a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12})/(\\w+)'¶
-
download(source_uri, output, include_binary=False, **kwargs)¶
- Parameters
source_uri – The downloader URI that indicates where on Unity Simulation the dataset should be downloaded from. The expected source_uri follows one of these patterns: usim://access-token@project-id/run-execution-id or usim://project-id/run-execution-id
output – The path to the directory where the downloaded dataset will be stored.
include_binary – Whether to download binary files such as images or LIDAR point clouds. This flag applies to datasets where metadata (e.g. annotation JSON, dataset catalog, …) can be separated from binary files.
-
parse_source_uri(source_uri)¶
- Parameters
source_uri – Parses a source_uri in one of the following formats: usim://access-token@project-id/run-execution-id or usim://project-id/run-execution-id
-
-
datasetinsights.io.downloader.create_downloader(source_uri, **kwargs)¶ Instantiates the dataset downloader that matches the provided source_uri.
- Parameters
source_uri – URI used to look up the correct dataset downloader
**kwargs –
- Returns
The dataset downloader instance matching the source_uri.