bossdata.meta module

Support for querying the metadata associated with BOSS observations.

class bossdata.meta.Database(finder=None, mirror=None, lite=True, quasar_catalog=False, quasar_catalog_name=None, platelist=False, verbose=False)

Bases: object

Initialize a searchable database of BOSS observation metadata.

Parameters:

finder (bossdata.path.Finder) – Object used to find the names of BOSS data files. If not specified, the default Finder constructor is used.
mirror (bossdata.remote.Manager) – Object used to interact with the local mirror of BOSS data. If not specified, the default Manager constructor is used.
lite (bool) – Use the “lite” metadata format, which is considerably faster but only provides a subset of the most commonly accessed fields. Ignored if either quasar_catalog or platelist is True.
quasar_catalog (bool) – Initialize database using the BOSS quasar catalog instead of spAll.
quasar_catalog_name (str) – The name of the BOSS quasar catalog to use, or use the default if this is None.
platelist (bool) – Initialize the database using the platelist catalog instead of spAll.

prepare_columns(column_names)

Validate column names and look up their types.

Parameters:	column_names (str) – Comma-separated list of column names or the special value ‘*’ to indicate all available columns.
Returns:	Tuple (names, dtypes) of lists of column names and corresponding numpy data types. Use `zip()` to convert the return value into a recarray dtype.
Return type:	tuple
Raises:	`ValueError` – Invalid column name.
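The `zip()` hint above can be illustrated with a stand-alone sketch. The names and dtypes lists below are hypothetical placeholders shaped like a (names, dtypes) return value; they are not what prepare_columns() actually returns for any particular column selection.

```python
import numpy as np

# Hypothetical (names, dtypes) pair, shaped like the return value of prepare_columns().
names = ['PLATE', 'MJD', 'FIBER', 'Z']
dtypes = [np.int32, np.int32, np.int32, np.float64]

# Pair each column name with its type to build a structured (recarray) dtype.
row_dtype = np.dtype(list(zip(names, dtypes)))

# Allocate a zero-filled record array with one field per selected column.
rows = np.zeros(3, dtype=row_dtype).view(np.recarray)
print(row_dtype.names)
```

The resulting dtype can then be used to allocate storage for query results with matching field names.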

select_all(what='*', where=None, sort=None, max_rows=100000)

Fetch all results of an SQL select query.

Since this method loads all the results into memory, it is not suitable for queries that are expected to return a large number of rows. Instead, use select_each() for large queries.

Parameters:
what (str) – Comma-separated list of column names to return, or ‘*’ to return all columns.
where (str) – SQL selection clause, or None for no filtering. Reserved column names such as PRIMARY must be escaped with backticks in this clause.
max_rows (int) – Maximum number of rows that will be returned.
Returns:	Table of results with column names matching those in the database and column types inferred automatically, or None if no rows are selected.
Return type:	astropy.table.Table
Raises:	`RuntimeError` – failed to execute query.
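The backtick rule for reserved names and the max_rows cap can be seen with the standard sqlite3 module alone. This is a hedged sketch, not the bossdata implementation: the table name meta, its columns, the sample rows, and the helper select_all_sketch are all hypothetical, and the sketch assumes max_rows behaves like an SQL LIMIT.

```python
import sqlite3

# In-memory stand-in for a metadata database (hypothetical table and rows).
conn = sqlite3.connect(':memory:')
# PRIMARY is an SQL keyword, so it must be quoted; SQLite accepts backticks.
conn.execute('CREATE TABLE meta (PLATE INTEGER, MJD INTEGER, FIBER INTEGER, `PRIMARY` INTEGER)')
conn.executemany('INSERT INTO meta VALUES (?, ?, ?, ?)',
                 [(4000, 55000, fiber, fiber % 2) for fiber in range(1, 11)])

def select_all_sketch(conn, what='*', where=None, max_rows=100000):
    """Load every matching row into memory, capped at max_rows."""
    sql = 'SELECT {} FROM meta'.format(what)
    if where is not None:
        sql += ' WHERE {}'.format(where)
    sql += ' LIMIT ?'
    rows = conn.execute(sql, (max_rows,)).fetchall()
    # Mirror the documented behaviour of returning None when nothing matches.
    return rows if rows else None

print(select_all_sketch(conn, what='PLATE,MJD,FIBER', where='`PRIMARY`=1', max_rows=3))
```

Because fetchall() materializes the whole result, memory use grows with the number of rows returned, which is why the cap matters for this style of query.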

select_each(what='*', where=None)

Iterate over the results of an SQL select query.

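This method is normally used as an iterator. For example, where db is a Database instance:

    for row in db.select_each(what='PLATE,MJD,FIBER'):
        # each row is a tuple of values, in the order the columns are listed in what
        plate, mjd, fiber = row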

Since this method does not load all the results of a large query into memory, it is suitable for queries that are expected to return a large number of rows. For smaller queries, the select_all() method might be more convenient.

Parameters:
what (str) – Comma-separated list of column names to return, or ‘*’ to return all columns.
where (str) – SQL selection clause, or None for no filtering. Reserved column names such as PRIMARY must be escaped with backticks in this clause.
Raises:	`sqlite3.OperationalError` – failed to execute query.
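The memory advantage described above comes from fetching rows lazily rather than all at once. A minimal sketch with the standard sqlite3 module, assuming a hypothetical table named meta and a hypothetical helper select_each_sketch: an sqlite3 cursor is itself iterable, so a generator can hand rows out one at a time.

```python
import sqlite3

def select_each_sketch(conn, what='*', where=None):
    """Yield matching rows one at a time instead of loading them all."""
    sql = 'SELECT {} FROM meta'.format(what)
    if where is not None:
        sql += ' WHERE {}'.format(where)
    # Iterating the cursor fetches rows incrementally from SQLite.
    for row in conn.execute(sql):
        yield row

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE meta (PLATE INTEGER, MJD INTEGER, FIBER INTEGER)')
conn.executemany('INSERT INTO meta VALUES (?, ?, ?)',
                 [(4000, 55000, fiber) for fiber in range(1, 1001)])

# Only a running total is kept in memory, not the full result set.
total = 0
for plate, mjd, fiber in select_each_sketch(conn, what='PLATE,MJD,FIBER', where='FIBER <= 500'):
    total += fiber
print(total)
```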

bossdata.meta.create_meta_full(catalog_path, db_path, verbose=True, primary_key='(PLATE, MJD, FIBER)')

Create the “full” meta database from a locally mirrored catalog file.

The created database renames FIBERID to FIBER and has a composite primary index on the (PLATE, MJD, FIBER) columns. Sub-array columns are also unrolled: see sql_create_table() for details. The conversion takes about 24 minutes on a laptop with sufficient memory (~4 GB). While the conversion runs, the file being written has the extension .building appended; this extension is removed (and the file is made read-only) once the conversion completes successfully. As a result, if the conversion is interrupted for any reason, it will be restarted the next time this function is called, and you are unlikely to end up with an invalid database file.

Parameters:
catalog_path (str) – Absolute local path of the “full” catalog file, which is expected to be a FITS file.
db_path (str) – Local path where the corresponding sqlite3 database will be written.
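The .building convention described above is a write-then-rename pattern. The sketch below shows that pattern with the standard library only; the helper build_read_only_db, the table layout, and the sample rows are hypothetical, and it mirrors just the file-naming behaviour described here, not bossdata's actual conversion code.

```python
import os
import sqlite3
import stat
import tempfile

def build_read_only_db(db_path, rows):
    """Write an sqlite3 file under a temporary name, then publish it read-only."""
    building_path = db_path + '.building'
    # Discard any partial file left behind by an interrupted earlier run.
    if os.path.exists(building_path):
        os.remove(building_path)
    conn = sqlite3.connect(building_path)
    try:
        conn.execute('CREATE TABLE meta (PLATE INTEGER, MJD INTEGER, FIBER INTEGER, '
                     'PRIMARY KEY (PLATE, MJD, FIBER))')
        conn.executemany('INSERT INTO meta VALUES (?, ?, ?)', rows)
        conn.commit()
    finally:
        conn.close()
    # Only a complete build is renamed to the final path.
    os.rename(building_path, db_path)
    # Remove write permission for user, group and others.
    mode = os.stat(db_path).st_mode
    os.chmod(db_path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
    return db_path

workdir = tempfile.mkdtemp()
path = build_read_only_db(os.path.join(workdir, 'meta.db'), [(4000, 55000, 1), (4000, 55000, 2)])
```

Because the final name only ever appears after a successful commit, a reader that checks for db_path never sees a half-written file.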

bossdata.meta.create_meta_lite(sp_all_path, db_path, verbose=True)

Create the “lite” meta database from a locally mirrored spAll file.

The created database has a composite primary index on the (PLATE, MJD, FIBER) columns, and the input columns MODELFLUX0..4 are renamed MODELFLUX_0..4 to be consistent with their names in the full database after sub-array unrolling.

The DR12 spAll lite file is ~115 MB and converts to a ~470 MB SQL database file. The conversion takes about 3 minutes on a laptop with sufficient memory (~4 GB). While the conversion runs, the file being written has the extension .building appended; this extension is removed (and the file is made read-only) once the conversion completes successfully. As a result, if the conversion is interrupted for any reason, it will be restarted the next time this function is called, and you are unlikely to end up with an invalid database file.

Parameters:
sp_all_path (str) – Absolute local path of the “lite” spAll file, which is expected to be a gzipped ASCII data file.
db_path (str) – Local path where the corresponding sqlite3 database will be written.
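The MODELFLUX renaming described above can be written as a renaming-rules dictionary of the kind sql_create_table() accepts. In this sketch the rename_columns helper is hypothetical, and its duplicate-name check is an extra safeguard that the documented sql_create_table() explicitly does not perform.

```python
# Rules mapping MODELFLUX0..4 to MODELFLUX_0..4, as described for the lite database.
renaming_rules = {'MODELFLUX{}'.format(i): 'MODELFLUX_{}'.format(i) for i in range(5)}

def rename_columns(names, rules):
    """Apply renaming rules, leaving names without a rule unchanged (hypothetical helper)."""
    renamed = [rules.get(name, name) for name in names]
    # Extra safeguard: reject rules that would produce duplicate column names.
    if len(set(renamed)) != len(renamed):
        raise ValueError('Renaming produced duplicate column names.')
    return renamed

print(rename_columns(['PLATE', 'MJD', 'FIBER', 'MODELFLUX0', 'MODELFLUX4'], renaming_rules))
# → ['PLATE', 'MJD', 'FIBER', 'MODELFLUX_0', 'MODELFLUX_4']
```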

bossdata.meta.get_plate_mjd_list(plate, finder=None, mirror=None)

Return the list of MJD values when a plate was observed.

Uses a query of the platelist, so this file will be automatically downloaded if necessary. Only MJD values for which the observation data quality is marked “good” will be returned.

Parameters:
plate (int) – Plate number.
finder (bossdata.path.Finder) – Object used to find the names of BOSS data files. If not specified, the default Finder constructor is used.
mirror (bossdata.remote.Manager) – Object used to interact with the local mirror of BOSS data. If not specified, the default Manager constructor is used.
Returns:	A list of MJD values when this plate was observed. The list will be empty if this plate has never been observed.
Return type:	list
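The quality filter described above amounts to selecting rows by plate number and quality flag. This sketch works on a small in-memory structured array instead of the downloaded platelist file; the column names (PLATE, MJD, PLATEQUALITY), the quality label 'good', the helper plate_mjd_list, and every value are assumptions made for illustration.

```python
import numpy as np

# Hypothetical stand-in for a few platelist rows (made-up plates and MJDs).
platelist = np.array(
    [(1, 50001, 'good'), (1, 50002, 'bad'), (1, 50003, 'good'), (2, 50001, 'good')],
    dtype=[('PLATE', 'i4'), ('MJD', 'i4'), ('PLATEQUALITY', 'U8')])

def plate_mjd_list(table, plate):
    """Return the MJDs of good-quality observations of one plate."""
    selected = (table['PLATE'] == plate) & (table['PLATEQUALITY'] == 'good')
    # Convert numpy integers to plain ints; the list is empty if the plate never appears.
    return [int(mjd) for mjd in table['MJD'][selected]]

print(plate_mjd_list(platelist, 1))
# → [50001, 50003]
```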

bossdata.meta.sql_create_table(table_name, recarray_dtype, renaming_rules={}, primary_key=None)

Prepare an SQL statement to create a database table for a numpy structured array.

Any columns in the structured array data type that are themselves arrays will be unrolled to a list of scalar columns with names COLNAME_I for element [i] of a 1D array and COLNAME_I_J for element [i,j] of a 2D array, etc, with indices I,J,... starting from zero.

Parameters:
table_name (str) – Name to give the new table.
recarray_dtype – Numpy structured array data type that defines the columns to create.
renaming_rules (dict) – Dictionary of rules for renaming columns. There are no explicit checks that these rules do not create duplicate column names or that all rules are applied.
primary_key (str) – Column name(s) to use as the primary key, after applying renaming rules. No index is created if this argument is None.
Returns:	Tuple (sql, num_cols) where sql is an executable SQL statement to create the table and num_cols is the number of columns created.
Return type:	tuple
Raises:	`ValueError` – Cannot map data type to SQL.
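The unrolling rule and the (sql, num_cols) return value can be illustrated with a stand-alone sketch. The function sketch_sql_create_table is hypothetical, not bossdata's implementation: the numpy-kind-to-SQL type mapping, applying renaming rules to the unrolled names, and the assumption that primary_key is already parenthesized (like the create_meta_full() default) are all choices made for this example.

```python
import sqlite3
import numpy as np

def sketch_sql_create_table(table_name, recarray_dtype, renaming_rules=None, primary_key=None):
    """Build a CREATE TABLE statement, unrolling sub-array fields into scalar columns."""
    renaming_rules = renaming_rules or {}
    columns = []
    for name in recarray_dtype.names:
        field = recarray_dtype[name]
        # For a sub-array field, .base is the element type and .shape its array shape.
        kind = field.base.kind
        if kind in 'biu':
            sql_type = 'INTEGER'
        elif kind == 'f':
            sql_type = 'REAL'
        elif kind in 'SU':
            sql_type = 'TEXT'
        else:
            raise ValueError('Cannot map data type {} of {} to SQL.'.format(field, name))
        # Scalars have shape (), for which np.ndindex yields a single empty index.
        for index in np.ndindex(*field.shape):
            column = name + ''.join('_{}'.format(i) for i in index)
            column = renaming_rules.get(column, column)
            columns.append('{} {}'.format(column, sql_type))
    num_cols = len(columns)
    if primary_key is not None:
        # primary_key is assumed to be parenthesized, e.g. '(PLATE, MJD, FIBER)'.
        columns.append('PRIMARY KEY {}'.format(primary_key))
    sql = 'CREATE TABLE {} ({})'.format(table_name, ', '.join(columns))
    return sql, num_cols

dtype = np.dtype([('PLATE', 'i4'), ('MJD', 'i4'), ('FIBERID', 'i4'),
                  ('FLUX', 'f4', (2,)), ('OBJTYPE', 'U6')])
sql, num_cols = sketch_sql_create_table('meta', dtype, {'FIBERID': 'FIBER'}, '(PLATE, MJD, FIBER)')
sqlite3.connect(':memory:').execute(sql)  # confirms the statement is valid SQL
print(sql)
```

Here the two-element FLUX field becomes FLUX_0 and FLUX_1, so num_cols counts six scalar columns; a 2D field would likewise produce names such as A_1_2.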