src.data.download.fetch

src.data.download.fetch(url, fname=None, known_hash=None, path=None, downloader=None, processor=None)

Fetch a project file.

This function downloads a source file to the project's raw data directory. It first looks for a local copy of the file in path. If it finds one, it compares the file's SHA256 hash against known_hash. If the hashes match, the local copy is treated as up to date and the download is skipped.

If there is no local copy of the file, or if the local hash doesn’t match the known hash, the function downloads the file. It then compares the SHA256 hash of the downloaded file to the known hash and raises an error if they don’t match.
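The check-then-download logic described above can be sketched with the standard library alone. This is a minimal illustration, not the project's implementation: the helper names sha256_of, needs_download, and verify_download are hypothetical, and the choice of ValueError for a mismatch is an assumption (the documentation only says an error is raised).

```python
import hashlib
from pathlib import Path


def sha256_of(file_path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(file_path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(file_path, known_hash):
    """Return True if the file is missing or its hash differs from known_hash."""
    if not Path(file_path).is_file():
        return True
    return sha256_of(file_path) != known_hash.lower()


def verify_download(file_path, known_hash):
    """Raise ValueError (an assumed error type) if the downloaded file's hash is wrong."""
    actual = sha256_of(file_path)
    if actual != known_hash.lower():
        raise ValueError(
            f"SHA256 mismatch for {file_path}: expected {known_hash}, got {actual}"
        )
```

In this sketch, a caller would download only when needs_download returns True, then call verify_download on the result before returning its path.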

See src.data.download.SOURCES for a dict of registered project files. Each value in the dict can be unpacked as keyword arguments to this function.
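To make the shape of a registry entry concrete, here is a hedged sketch of what one might look like. Everything in it is a placeholder: the URL, hash, and path are invented for illustration, and fetch_stub is a hypothetical stand-in that only mirrors the signature of fetch without downloading anything. The real contents of SOURCES may differ.

```python
import os

# Hypothetical registry entry; the URL, hash, and path are placeholders.
SOURCES = {
    "my-file.csv": {
        "url": "https://example.com/data/my-file.csv",
        "fname": "my-file.csv",
        "known_hash": "0" * 64,
        "path": os.path.join("data", "raw"),
    },
}


def fetch_stub(url, fname=None, known_hash=None, path=None,
               downloader=None, processor=None):
    """Stand-in with the same signature as fetch; performs no download.

    Returns the location where the file would be stored, to show how the
    keys of a registry entry map onto the function's parameters.
    """
    return os.path.join(path, fname)


# Unpacking an entry with ** passes each key as the matching keyword argument.
target = fetch_stub(**SOURCES["my-file.csv"])
```

Because the keys match the parameter names, an entry can omit downloader and processor and still be unpacked directly.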

>>> from src.data import download
>>> download.fetch(**download.SOURCES["my-file.csv"])
'/path/to/my-file.csv'
Parameters:
  • url (str) – The URL to download data from.
  • fname (str) – The base name of the file to fetch (e.g., “my-file.csv”).
  • known_hash (str) – The file’s SHA256 hash value.
  • path (str) – Directory in which to store the file.
  • downloader (str, callable) – A special downloading function.
  • processor (str, callable) – A special post-processing function.
Returns:

The path to the downloaded file.

Return type:

str