The download Module
Utilities for downloading project source data.
This module makes it easy to download raw source data from online sources to their standard locations in a local project directory. It leverages the Pooch package to verify the integrity of the downloaded files and to avoid downloading files that are already available locally.
The top-level function src.data.download.fetch() downloads source data files.
The top-level variable src.data.download.SOURCES is a dict of all the
source data files registered for this project. Here’s a snippet:
{
    # National Fire Incident Reporting System (NFIRS) data.
    "nfirs.csv": {
        "fname": "nfirs.csv",
        "url": "https://drive.google.com/uc?id=1ENJZwazX7hJ4GwI03DKgX51y-644x-cZ",
        "known_hash": "0fcd2c4edae304dbb21c1b0dc6ca9afd17d7d65f21e51cd26571f9d42db7f825",
        "downloader": "download_from_google_drive",
    },
    ...
}
Add more entries to make new files downloadable. New entries should have:
fname: The file basename.
url: The URL to download the file from.
known_hash: The file’s SHA256 hash value, used to verify download integrity.
You can use pooch.file_hash() or hashlib to get file hash values.
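As a minimal sketch of the hashlib route, the helpers below compute a file’s SHA256 digest in chunks and assemble a new entry from it. The names sha256_of_file and make_entry are hypothetical and not part of this module; pooch.file_hash() should return the same digest for the same file.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA256 hex digest of the file at `path` (hypothetical helper)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large files never sit fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_entry(path, url):
    """Build a dict with the three required keys for a local copy of a file."""
    return {
        "fname": os.path.basename(path),
        "url": url,
        "known_hash": sha256_of_file(path),
    }
```

The resulting dict can be pasted into SOURCES under a key of your choice, after downloading the file once by hand to compute its hash.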
Optionally, a source can specify downloader and processor functions
with special instructions for downloading and processing (e.g., unzipping) a
file. The values for these items can be functions or strings that are mapped
to functions in either src.data.download.DOWNLOADERS or
src.data.download.PROCESSORS. For details, see Pooch documentation on
custom downloaders and post-processing hooks.
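The string-to-function mapping could look roughly like the sketch below. It is an illustration under assumptions, not this module’s actual code: resolve_callable, EXAMPLE_DOWNLOADERS, and download_placeholder are hypothetical stand-ins. The placeholder only mirrors the (url, output_file, pooch) call signature that Pooch documents for custom downloaders.

```python
def download_placeholder(url, output_file, pooch):
    """Stand-in with Pooch's custom-downloader signature (illustration only).

    A real downloader would fetch `url` and write the bytes to `output_file`.
    """
    raise NotImplementedError("illustration only")


# Hypothetical registry mapping names to downloader functions.
EXAMPLE_DOWNLOADERS = {"download_placeholder": download_placeholder}


def resolve_callable(value, registry):
    """Return `value` if it is None or callable, else look it up by name."""
    if value is None or callable(value):
        return value
    try:
        return registry[value]
    except KeyError:
        raise ValueError(f"unknown function name: {value!r}") from None
```

Accepting either form lets an entry stay a plain dict of strings while still allowing a function object to be supplied directly from code.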
Run this module as a script to download project data.
Downloaders
download_from_google_drive(url, output_file, …): A downloader to fetch files from Google Drive.