The download Module

Utilities for downloading project source data.

This module makes it easy to download raw source data from online sources to their standard locations in a local project directory. It leverages the Pooch package to verify the integrity of the downloaded files and to avoid downloading files that are already available locally.

The top-level function src.data.download.fetch() downloads source data files.

The top-level variable src.data.download.SOURCES is a dict of all the source data files registered for this project. Here’s a snippet:

{
    # National Fire Incident Reporting System (NFIRS) data.
    "nfirs.csv": {
        "fname": "nfirs.csv",
        "url": "https://drive.google.com/uc?id=1ENJZwazX7hJ4GwI03DKgX51y-644x-cZ",
        "known_hash": "0fcd2c4edae304dbb21c1b0dc6ca9afd17d7d65f21e51cd26571f9d42db7f825",
        "downloader": "download_from_google_drive",
    },
    ...
}

Add more entries to make new files downloadable. New entries should have:

  • fname: The file basename.
  • url: The URL to download the file from.
  • known_hash: The file’s SHA256 hash value to verify download integrity.

You can use pooch.file_hash() or hashlib to get file hash values.
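As a minimal sketch of the hashlib route, the snippet below computes a file's SHA256 hex digest in chunks and assembles a dict with the three required keys. The names sha256_of_file and make_source_entry are hypothetical helpers introduced for illustration; they are not part of this module.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large data files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_source_entry(path, url):
    """Build a dict with the required keys (hypothetical helper, not part of the module)."""
    return {
        "fname": os.path.basename(path),
        "url": url,
        "known_hash": sha256_of_file(path),
    }
```

The returned dict could then be added to SOURCES under the file's basename, following the pattern of the nfirs.csv entry.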

Optionally, a source can specify downloader and processor functions with special instructions for downloading and processing (e.g., unzipping) a file. The values for these items can be functions, or strings that are mapped to functions in src.data.download.DOWNLOADERS or src.data.download.PROCESSORS respectively. For details, see the Pooch documentation on custom downloaders and post-processing hooks.
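One way the string-to-function mapping described above could work is sketched below. The resolve_callable helper is hypothetical and only illustrates the idea; the module's actual lookup logic may differ.

```python
def resolve_callable(value, registry):
    """Return a callable for `value`, looking strings up in `registry`.

    Hypothetical helper: `registry` stands in for a dict such as
    DOWNLOADERS or PROCESSORS that maps names to functions.
    """
    # None (nothing specified) and ready-made functions pass through unchanged.
    if value is None or callable(value):
        return value
    try:
        return registry[value]
    except KeyError:
        raise KeyError(f"No function registered under the name {value!r}") from None
```

With a lookup of this kind, an entry can refer to a downloader by name, as the nfirs.csv entry does with "download_from_google_drive", which keeps the SOURCES dict free of function objects.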

Run this module as a script to download project data.

src.data.download.SOURCES

A registry of project data sources.

Type: dict

src.data.download.DOWNLOADERS

A registry of special downloading functions.

Type: dict

src.data.download.PROCESSORS

A registry of special post-processing functions.

Type: dict

Top Level Functions

fetch(url[, fname, known_hash, path, …]) Fetch a project file.

Downloaders

download_from_google_drive(url, output_file, …) A downloader to fetch files from Google Drive.
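For orientation, Pooch calls a custom downloader with the URL, the output file path, and the Pooch instance (which may be None). The sketch below shows that call shape using only the standard library. download_with_urllib is a hypothetical name, and this is not the implementation of download_from_google_drive, which likely needs extra handling specific to Google Drive.

```python
import shutil
import urllib.request


def download_with_urllib(url, output_file, pooch=None):
    """Hypothetical downloader following the (url, output_file, pooch) call shape."""
    # Stream the response to disk instead of reading the whole body into memory.
    with urllib.request.urlopen(url) as response, open(output_file, "wb") as out:
        shutil.copyfileobj(response, out)
```

A function like this could be registered in DOWNLOADERS under a string name, or passed directly as a source's downloader value.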