bossdata.remote module

Download BOSS data files from a remote server.

The remote module is responsible for downloading data files into a local filesystem using a directory layout that mirrors the remote data source. Most scripts will create a single Manager object using the default constructor for this purpose:

import bossdata.remote
mirror = bossdata.remote.Manager()

This mirror object is normally configured by the $BOSS_DATA_URL and $BOSS_LOCAL_ROOT environment variables, and no other module uses these variables except through a Manager object. These parameters can also be set by Manager constructor arguments. When neither the environment variables nor the constructor arguments are set, a default data URL appropriate for the most recent public data release (DR12) is used, and a temporary directory is created and used as the local root.
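The following is a minimal sketch of the lookup order described above: the constructor argument comes first, then the environment variable, then the built-in default. `resolve_settings` is a hypothetical helper written for illustration and is not part of bossdata. The temporary-directory prefix is also an assumption.

```python
import os
import tempfile

# Copy of the documented default (Manager.default_data_url).
DEFAULT_DATA_URL = 'http://dr12.sdss3.org'


def resolve_settings(data_url=None, local_root=None, environ=None):
    """Hypothetical helper: resolve (data_url, local_root) in the order
    constructor arg -> environment variable -> built-in default."""
    env = os.environ if environ is None else environ
    if data_url is None:
        data_url = env.get('BOSS_DATA_URL', DEFAULT_DATA_URL)
    if local_root is None:
        local_root = env.get('BOSS_LOCAL_ROOT')
    if local_root is None:
        # No root configured anywhere: fall back to a fresh temporary directory.
        local_root = tempfile.mkdtemp(prefix='bossdata_')
    if not os.path.isdir(local_root):
        raise ValueError('No such directory: {}'.format(local_root))
    return data_url, local_root
```

Passing `environ` explicitly makes the precedence easy to check without touching the real process environment.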

Manager objects have no knowledge of how data files are organized or named: use the bossdata.path module to build the paths of frequently used data files. See API Usage for recommendations on using the bossdata.path and bossdata.remote modules together.

class bossdata.remote.Manager(data_url=None, local_root=None, verbose=True)

Bases: object

Manage downloads of BOSS data via HTTP.

The default mapping from remote to local filenames is to mirror the remote file hierarchy on the local disk. The normal mode of operation is to establish the local root for the mirror using the BOSS_LOCAL_ROOT environment variable. When the constructor is called with no arguments, the values of BOSS_DATA_URL and BOSS_LOCAL_ROOT are used if they are set. Otherwise the default data URL and a temporary local root are used, as described above. A ValueError is raised if the resulting local root is not an existing directory.

Parameters:
  • data_url (str) – Base URL of all BOSS data files. A trailing / on the URL is optional. If this arg is None, then the value of the BOSS_DATA_URL environment variable will be used instead (or default_data_url, if that variable is not set).
  • local_root (str) – Local path to use as the root of the locally mirrored file hierarchy. If this arg is None, then the value of the BOSS_LOCAL_ROOT environment variable, if any, will be used instead. If a value is provided, it should identify an existing writeable directory.
Raises:

ValueError – No such directory local_root or missing data_url.
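The trailing / on data_url is optional, so the base URL and a remote path need to be joined carefully. The hypothetical helper below shows one way to do this. bossdata may build its URLs differently.

```python
def remote_url(data_url, remote_path):
    """Hypothetical helper: join a base URL and a server-relative path so that
    a trailing '/' on data_url (or a leading '/' on remote_path) is optional."""
    # Strip separators on both sides, then insert exactly one '/'.
    return data_url.rstrip('/') + '/' + remote_path.lstrip('/')
```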

default_data_url = 'http://dr12.sdss3.org'

Default to use when $BOSS_DATA_URL is not set.

See Executable scripts and API Usage for details.

download(remote_path, local_path, chunk_size=4096, progress_min_size=10)

Download a single BOSS data file.

Downloads are streamed so that the memory requirements are independent of the file size. During the download, the file is written to its final location but with ‘.downloading’ appended to the file name. This means that any download that is interrupted or fails will normally not lead to an incomplete file being returned by a subsequent call to get(). Instead, the file will be re-downloaded. There is no facility for resuming a previous partial download. After a successful download, the file is renamed to its final location and its permission bits are set to read-only (to prevent accidental modifications of files that are supposed to exactly mirror the remote file system).

Parameters:
  • remote_path (str) – The full path to the remote file relative to the remote server root, which should normally be obtained using bossdata.path methods.
  • local_path (str) – The (absolute or relative) path of the local file to write.
  • chunk_size (int) – Size of data chunks to use for the streaming download. Larger sizes will potentially download faster but also require more memory.
  • progress_min_size (int) – Display a text progress bar for any downloads whose size in Mb exceeds this value. No progress bar will ever be shown if this value is None.
Returns:

Absolute local path of the downloaded file.

Return type:

str

Raises:
  • ValueError – local_path directory does not exist.
  • RuntimeError – HTTP request returned an error status.
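The download-then-rename procedure described above can be sketched using only the standard library. `download_sketch` is hypothetical and is not bossdata's implementation. It uses urllib.request as an assumed stand-in for the library's HTTP client and omits the progress bar. It also converts HTTP error statuses to RuntimeError to match the documented behavior.

```python
import os
import stat
import urllib.error
import urllib.request


def download_sketch(url, local_path, chunk_size=4096):
    """Hypothetical stand-in for Manager.download(): stream url to local_path
    via a '.downloading' temporary file, then rename and make it read-only."""
    local_path = os.path.abspath(local_path)
    parent = os.path.dirname(local_path)
    if not os.path.isdir(parent):
        raise ValueError('Local directory does not exist: {}'.format(parent))
    try:
        response = urllib.request.urlopen(url)
    except urllib.error.HTTPError as e:
        raise RuntimeError('HTTP request returned error code {}.'.format(e.code)) from e
    partial = local_path + '.downloading'
    with response, open(partial, 'wb') as out:
        # Read fixed-size chunks so memory use does not grow with file size.
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
    # Only a complete download is moved to the final name.
    os.replace(partial, local_path)
    # Read-only for everyone, so the mirror is not modified by accident.
    os.chmod(local_path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return local_path
```

The file is written under a separate '.downloading' name and renamed only at the end. As a result, a failure mid-transfer never leaves a truncated file under the final name.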

get(remote_path, progress_min_size=10, auto_download=True, local_paths=None)

Get a local file that mirrors a remote file, downloading the file if necessary.

Parameters:
  • remote_path (str, iterable) – This arg will normally be a single string but can optionally be an iterable over strings for some advanced functionality. Strings give the full path to a remote file and should normally be obtained using bossdata.path methods. When passing an iterable, the first item specifies the desired file and subsequent items specify acceptable substitutes. If the desired file is not already available locally but at least one substitute file is, this method immediately returns the first locally available substitute without downloading the desired file. If no substitute is available, the desired file is downloaded and returned.
  • progress_min_size (int) – Display a text progress bar for any downloads whose size in Mb exceeds this value. No progress bar will ever be shown if this value is None.
  • auto_download (bool) – Automatically download the file to the local mirror if necessary. If this is False and the file is not already mirrored, a RuntimeError is raised.
  • local_paths (list) –

    When this arg is not None, the local paths corresponding to each input remote path are stored in this list, resulting in a list of the same length as the input remote_path (or length 1 if remote_path is a single string). This enables the following pattern for detecting when a substitution has occurred:

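    import bossdata.remote

    mirror = bossdata.remote.Manager()
    # the_preferred_path and a_backup_path are remote paths from bossdata.path.
    remote_paths = [the_preferred_path, a_backup_path]
    local_paths = []
    local_path = mirror.get(remote_paths, local_paths=local_paths)
    # local_paths[0] is the local path of the preferred file.
    if local_path != local_paths[0]:
        print('substituted {} for {}.'.format(local_path, local_paths[0]))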
    
Returns:

Absolute local path of the local file that mirrors the remote file.

Return type:

str

Raises:

RuntimeError – File is not already mirrored and auto_download is False.
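The sketch below summarizes the substitution rules for get(). `get_sketch` and its `fetch` callback are hypothetical and exist only to make the decision order explicit. Joining remote paths directly under the local root is an assumption about the default mirror layout.

```python
import os


def get_sketch(remote_paths, local_root, fetch, auto_download=True, local_paths=None):
    """Hypothetical sketch of the Manager.get() substitution rules.

    remote_paths: a single path string, or an iterable whose first item is the
    desired file and whose remaining items are acceptable substitutes.
    fetch: hypothetical callable(remote_path, local_path) that downloads one file.
    """
    if isinstance(remote_paths, str):
        remote_paths = [remote_paths]
    else:
        remote_paths = list(remote_paths)
    # Assumed layout: each remote path sits directly under local_root.
    mapped = [os.path.join(local_root, p.lstrip('/')) for p in remote_paths]
    if local_paths is not None:
        # Report the local path for every input, in input order.
        local_paths[:] = mapped
    # Prefer the desired file, then the first substitute already on disk.
    for path in mapped:
        if os.path.isfile(path):
            return path
    if not auto_download:
        raise RuntimeError('File is not mirrored and auto_download is False.')
    # Nothing is available locally: download only the desired (first) file.
    os.makedirs(os.path.dirname(mapped[0]), exist_ok=True)
    fetch(remote_paths[0], mapped[0])
    return mapped[0]
```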

local_path(remote_path)

Get the local path corresponding to a remote path.

Does not check that the file or its parent directory exists. Use get() to ensure that the file exists, downloading it if necessary.

Parameters:
  • remote_path (str) – The full path to the remote file relative to the remote server root, which should normally be obtained using bossdata.path methods.
Returns:

Absolute local path of the local file that mirrors the remote file.

Return type:

str

Raises:

RuntimeError – No local_root specified when this manager was created.
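The sketch below shows the remote-to-local mapping. It assumes the default layout, where the remote hierarchy is reproduced directly under the local root. `local_path_sketch` is a hypothetical stand-in and not the library's method.

```python
import os


def local_path_sketch(local_root, remote_path):
    """Hypothetical sketch of Manager.local_path(): map a server-relative
    remote path to an absolute path under local_root. No filesystem checks."""
    if local_root is None:
        raise RuntimeError('No local_root specified.')
    # Strip any leading '/' so os.path.join does not discard local_root.
    return os.path.abspath(os.path.join(local_root, remote_path.lstrip('/')))
```

As in the documented method, nothing here checks that the file or its parent directory exists.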