Executable scripts

For complete documentation on the command-line options of any script, use the --help option, for example:

bossquery --help

You will normally want to configure bossdata by setting some environment variables.

bossquery

Query the metadata for BOSS observations. For example:

bossquery --what PLATE,MJD,FIBER,PLUG_RA,PLUG_DEC,Z --where 'OBJTYPE="QSO"' --sort Z --save qso.dat

The --save option supports many different output formats that are automatically selected based on the file extension. In addition, this program automatically maps the .dat and .txt extensions to the ascii format.

The --what, --where and --sort options all use SQL syntax (these are in fact substituted into a SQL string).

  • --what takes a comma-separated list of column names (like SQL SELECT) and defaults to PLATE,MJD,FIBER:

    --what PLATE,MJD,FIBER,PLUG_RA,PLUG_DEC,Z
    
  • --where takes a SQL ‘WHERE’ string:

    --where '(OBJTYPE="QSO" and Z > 0.1) or CLASS="QSO"'
    
  • --sort takes a list of columns, each optionally followed by the DESC keyword to reverse its order (a la SQL ORDER BY):

    --sort 'CLASS, Z DESC'
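The substitution described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the bossquery implementation: the function name build_query, the table name meta, and the toy rows are all hypothetical, and the WHERE string uses single-quoted SQL literals, which is the portable way to write strings in sqlite3.

```python
import sqlite3

def build_query(what="PLATE,MJD,FIBER", where=None, sort=None, table="meta"):
    """Assemble a SELECT statement from the three option strings.

    Hypothetical helper: the table name 'meta' is an assumption made
    for this sketch only.
    """
    sql = "SELECT {0} FROM {1}".format(what, table)
    if where:
        sql += " WHERE {0}".format(where)
    if sort:
        sql += " ORDER BY {0}".format(sort)
    return sql

# A tiny in-memory stand-in for the metadata database (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE meta (PLATE INT, MJD INT, FIBER INT, "
    "OBJTYPE TEXT, CLASS TEXT, Z REAL)")
conn.executemany("INSERT INTO meta VALUES (?,?,?,?,?,?)", [
    (3586, 55181, 190, "QSO", "QSO", 2.3),
    (3586, 55181, 191, "GALAXY", "GALAXY", 0.5),
    (4000, 55500, 12, "GALAXY", "QSO", 1.1),
    (6641, 56383, 30, "QSO", "STAR", 0.0),
])

sql = build_query(
    what="PLATE,MJD,FIBER,Z",
    where="(OBJTYPE='QSO' and Z > 0.1) or CLASS='QSO'",
    sort="CLASS, Z DESC")
rows = conn.execute(sql).fetchall()
```

Because the option strings are pasted directly into the statement, any valid SQL expression over the available columns should work, and a malformed expression surfaces as an SQL error.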
    

This command uses an sqlite3 database of metadata that will be created if necessary. By default, the “lite” version database will be used, which provides faster queries and a smaller database file. However, the full spAll data model is also available with the --full option (resulting in slower queries and a larger database file). The “lite” and “full” databases are separate files based on different downloads. Once either has been created the first time, it will be immediately available for future queries. Note that it can take a while to create the initial database file: allow about 30 minutes for either version. Once the database has been created, you can safely delete the downloaded source file if you are short on disk space.

The columns in the lite database are a subset of those in the full database but the values are not numerically identical between them because they are truncated in the text file used to generate the lite database. However, the level of these truncation errors should be insignificant for any science applications.

There are some minor inconsistencies between the data models of the lite and full versions of the metadata provided by BOSS. In particular, the lite format uses the name FIBER while the full version uses FIBERID. We resolve this by consistently using the shorter form FIBER in both SQL databases. Also, the full format includes columns that are themselves arrays. One of these, MODELFLUX(5), is included in the lite format using names MODELFLUX0...MODELFLUX4. We normalize the mapping of array columns to scalar SQL columns using the syntax COLNAME_I for element [i] of a 1D array and COLNAME_I_J for element [i,j] of a 2D array, with indices starting from zero. This means, for example, that MODELFLUX(5) values are consistently named MODELFLUX_0...MODELFLUX_4 in both SQL databases.
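As a rough illustration of the naming convention just described, the following Python sketch expands an array column into its scalar SQL column names. The function name scalar_column_names and the RENAMES table are hypothetical and are not part of the bossdata API; only the FIBERID to FIBER rename and the COLNAME_I / COLNAME_I_J pattern come from the text above.

```python
import itertools

# Name normalization described in the text: FIBERID becomes FIBER.
RENAMES = {"FIBERID": "FIBER"}

def scalar_column_names(name, shape=()):
    """Return the scalar SQL column names for one catalog column.

    shape is () for a scalar column, (n,) for a 1D array column and
    (n, m) for a 2D array column. Indices start from zero.
    """
    name = RENAMES.get(name, name)
    if not shape:
        return [name]
    index_ranges = [range(n) for n in shape]
    return ["_".join([name] + [str(i) for i in index])
            for index in itertools.product(*index_ranges)]

modelflux_names = scalar_column_names("MODELFLUX", (5,))
# A made-up 2x2 column, used only to show the COLNAME_I_J form.
example_2d_names = scalar_column_names("EXAMPLE", (2, 2))
```

With this convention a query can select a single array element directly, for example --what PLATE,MJD,FIBER,MODELFLUX_2.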

If a query is made without specifying --full and the lite database file is not present, an attempt will be made to use the full database. If neither DB file is present, the same logic is applied to the catalog files: if present, the lite catalog file will be parsed and the lite DB created; if that is not present, the full catalog file will be parsed and the full DB created. Only after exhausting these options will a download (of the lite DB file) be attempted.

Note that specifying --full will only (and always) use the full DB or catalog file.
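The fallback order described above can be summarized as a small decision function. This is a sketch of the documented logic only, with hypothetical names (choose_metadata_source and the action labels); it does not touch the filesystem. What happens with --full when no full file is present locally is not spelled out above, so the final "download full" branch is an assumption.

```python
def choose_metadata_source(present, full=False):
    """Decide which metadata source to use.

    present is a set of (version, kind) pairs that exist locally, with
    version in {"lite", "full"} and kind in {"db", "catalog"}.
    Returns an (action, version) pair.
    """
    # With --full only the full files are considered; otherwise lite is
    # preferred and full is the fallback.
    versions = ["full"] if full else ["lite", "full"]
    # First preference: an existing database file.
    for version in versions:
        if (version, "db") in present:
            return ("query_db", version)
    # Second preference: build a database from an existing catalog file.
    for version in versions:
        if (version, "catalog") in present:
            return ("build_db_from_catalog", version)
    # Last resort: download. For full=True this branch is an assumption.
    return ("download", versions[0])
```

For example, choose_metadata_source({("full", "db")}) returns ("query_db", "full"): the full database is used in preference to building the lite one.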

The --quasar-catalog option can be used to query the BOSS quasar catalog instead of spAll. By default, the current version of the catalog will be used; use the --quasar-catalog-name option to specify an earlier version.

The --platelist option can be used to query the BOSS plate list database instead of spAll.

bossfetch

Fetch BOSS data files containing the spectra of specified observations and mirror them locally. For example:

bossfetch --verbose qso.dat

Fetched files will be placed under $BOSS_LOCAL_ROOT with paths that exactly match the URLs they are downloaded from with the prefix substitution:

$BOSS_DATA_URL => $BOSS_LOCAL_ROOT

For example, with the default configuration given above, the file at:

http://dr12.sdss3.org/sas/dr12/boss/spectro/redux/v5_7_0/spectra/lite/3586/spec-3586-55181-0190.fits

would be downloaded to:

$BOSS_LOCAL_ROOT/sas/dr12/boss/spectro/redux/v5_7_0/spectra/lite/3586/spec-3586-55181-0190.fits
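The prefix substitution amounts to a simple string operation, sketched here in Python. The function name local_path is hypothetical, the value http://dr12.sdss3.org for $BOSS_DATA_URL is inferred from the example URL above, and /data/boss is a made-up value for $BOSS_LOCAL_ROOT.

```python
import posixpath

def local_path(url, data_url, local_root):
    """Map a remote file URL to its location in the local mirror by
    replacing the data_url prefix with local_root."""
    prefix = data_url.rstrip("/")
    if not url.startswith(prefix + "/"):
        raise ValueError("URL is not under the data URL: " + url)
    relative = url[len(prefix):].lstrip("/")
    return posixpath.join(local_root, relative)

mirrored = local_path(
    "http://dr12.sdss3.org/sas/dr12/boss/spectro/redux/v5_7_0/"
    "spectra/lite/3586/spec-3586-55181-0190.fits",
    data_url="http://dr12.sdss3.org",
    local_root="/data/boss")
```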

By default, the “lite” format of each spectrum data file is downloaded, which is sufficient for many purposes and significantly (about 8x) smaller. The “lite” format contains HDUs 0-3 of the full spectrum data file and does not include the spectra of individual exposures. To download the full files instead, use the --full option. Both types of files can co-exist in your local mirror. You can also load the plate spFrame or flux-calibrated spCFrame files using the --frame or --cframe options, respectively. These files contain a half plate of spectra for a single band (blue/red) and exposure. Finally, you can load the spPlate files containing combined spectra for a whole plate using the --platefile option. See the Overview of SDSS Spectroscopic Data for details.

The --verbose option displays a progress bar showing the fraction of files already locally available. Any files that were previously fetched will not be downloaded again so it is safe and efficient to run bossfetch for overlapping lists of observations. Note that the progress bar may appear to update unevenly if some files are already mirrored and others need to be downloaded.

Each data file download is streamed to a temporary file with .downloading appended to its name, which is renamed to remove this extension after the download completes normally. If a download is interrupted or fails for some reason, the partially downloaded file will remain in the local mirror. Re-running a bossfetch command will automatically re-download any partially downloaded file.
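The download-then-rename pattern can be sketched as follows. This is a minimal illustration and not the bossfetch code: the function name fetch_to is hypothetical, and the open_remote argument stands in for whatever opens the remote stream (here an in-memory buffer, so the sketch runs without network access).

```python
import io
import os
import shutil
import tempfile

def fetch_to(local_path, open_remote, chunk_size=65536):
    """Stream a file into the local mirror unless it is already there.

    Data is written to local_path + '.downloading' and renamed only
    after the copy completes, so a file without that suffix is always
    complete. Returns True if a download happened, False if skipped.
    """
    if os.path.exists(local_path):
        return False  # already mirrored, nothing to do
    parent = os.path.dirname(local_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    partial = local_path + ".downloading"
    # Opening with 'wb' discards any leftover partial download.
    with open_remote() as src, open(partial, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
    os.replace(partial, local_path)
    return True

# Demonstration with a fake remote source and a throwaway mirror.
with tempfile.TemporaryDirectory() as mirror:
    target = os.path.join(mirror, "lite", "3586", "spec-demo.fits")
    first = fetch_to(target, lambda: io.BytesIO(b"fake spectrum bytes"))
    second = fetch_to(target, lambda: io.BytesIO(b"other bytes"))
    with open(target, "rb") as f:
        content = f.read()
    leftover = os.path.exists(target + ".downloading")
```

If the copy raises part-way through, the .downloading file is left behind and the final name never appears, which is what allows a later run to detect and repeat the download.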

By default, downloading is split between two parallel subprocesses but you can change this with the --nproc option. For downloading “lite” files, using more than 2 subprocesses will probably not improve the overall performance.

If you want to transfer a large number of files, you should consider using globus. To prepare a globus bulk data transfer file list, use the --globus option to specify the remote/local endpoint pair remote#endpoint:local#endpoint. Note that the --save option must also be used to specify an output filename. SDSS endpoints are documented here.

For example, to transfer files from lbnl#sdss3 to your local endpoint username#endpoint:

bossfetch qso.dat --globus lbnl#sdss3:username#endpoint --save globus-xfer.dat
ssh username@cli.globusonline.org transfer -s 1 < globus-xfer.dat

bossplot

Plot the spectrum of a single BOSS observation, identified by its PLATE, MJD of the observation, and the FIBER that was assigned to the target whose spectrum you want to plot. For example (these are the defaults if you omit any parameters):

bossplot --plate 6641 --mjd 56383 --fiber 30

This should open a new window containing the plot that you will need to close in order to exit the program. To also save your plot, add the --save-plot option with a filename that has a standard graphics format extension (pdf,png,...). If you omit the filename, --save-plot uses the name bossplot-{plate}-{mjd}-{fiber}.png. To save plots directly without displaying them, also use the --no-display option.

You can also save the data shown in a plot using --save-data with an optional filename (the default is bossplot-{plate}-{mjd}-{fiber}.dat). Data is saved using the ascii.basic format and only wavelengths with valid data are included in the output.

Use --wlen-range [MIN:MAX] to specify a wavelength range over which to plot (x-axis), overriding the default, auto-detected range. Similarly, --flux-range [MIN:MAX] and --wdisp-range [MIN:MAX] work for the flux (left y-axis) and dispersion (right y-axis). MIN and MAX can be either blank (which means use the default value), an absolute value (1000), or a percentage (10%), and percentages and absolute values may be mixed. Working examples:

--wlen-range [:7500]
--wlen-range [10%:90%]
--wlen-range [10%:8000]

Note that a percentage value between 0-100% is interpreted as a percentile for vertical (flux, wdisp) axes. In all other cases, percentage values specify a limit value equal to a fraction of the full range [lo:hi]:

limit = lo + fraction*(hi - lo)

and can be < 0% or >100% to include padding. Another visual option --scatter will give a scatter plot of the flux rather than the flux 1-sigma error band.
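The fractional-range rule above can be sketched as a small parser. This covers only the limit = lo + fraction*(hi - lo) case; the percentile interpretation used for the vertical axes is not implemented here. The function name parse_range is hypothetical, and lo and hi stand for the full auto-detected range.

```python
def parse_range(text, lo, hi):
    """Parse a '[MIN:MAX]' string into a (min, max) pair of floats.

    Blank entries fall back to the defaults lo and hi. Entries ending
    in '%' are fractions of the full range [lo:hi] and may lie outside
    0-100% to add padding. Anything else is an absolute value.
    """
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("Range must look like [MIN:MAX]: " + text)
    parts = text[1:-1].split(":")
    if len(parts) != 2:
        raise ValueError("Range must contain exactly one colon: " + text)

    def limit(token, default):
        token = token.strip()
        if token == "":
            return float(default)
        if token.endswith("%"):
            fraction = float(token[:-1]) / 100.0
            return lo + fraction * (hi - lo)
        return float(token)

    return limit(parts[0], lo), limit(parts[1], hi)

# Examples using a made-up full wavelength range of 0 to 1000.
open_low = parse_range("[:750]", 0.0, 1000.0)
central = parse_range("[10%:90%]", 0.0, 1000.0)
padded = parse_range("[-5%:105%]", 0.0, 1000.0)
```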

Plots include a label PLATE-MJD-FIBER by default (or PLATE-MJD-FIBER-EXPID for a single exposure). Add the --label-pos <VALIGN>-<HALIGN> option to change its position, with <VALIGN> = top, center, bottom and <HALIGN> = left, center, right. Use --label-pos none to remove the label. Use --no-grid to remove the default wavelength grid lines.

Several options are available to see data beyond just object flux. Use --show-sky to show the subtracted sky (modeled) flux, --add-sky to show the total of object flux and modeled sky flux, --show-mask to show grayed regions where data has been masked out because it is deemed invalid, and --show-dispersion to show wavelength dispersion.

You will sometimes want to see data that would normally be masked as invalid. To include pixels with a particular mask bit set, use the --allow-mask option, e.g.:

bossplot --allow-mask 'BRIGHTSKY|SCATTEREDLIGHT'

Note that multiple flags can be combined using the logical-or symbol |, but this requires quoting as shown above. To show all data, including any invalid pixels, use the --show-invalid option.
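To illustrate how a flag expression such as 'BRIGHTSKY|SCATTEREDLIGHT' can translate into a pixel selection, here is a minimal Python sketch. It is not the bossplot implementation: the function names are hypothetical and the bit positions in MASK_BITS are placeholders chosen for the example, not the real pixel mask definitions.

```python
# Placeholder bit positions, for illustration only. The real mask bit
# assignments are defined by the survey and are not reproduced here.
MASK_BITS = {"BRIGHTSKY": 0, "SCATTEREDLIGHT": 1, "NODATA": 2}

def parse_mask(expression):
    """Convert 'NAME1|NAME2|...' into an integer bit mask."""
    value = 0
    for name in expression.split("|"):
        value |= 1 << MASK_BITS[name.strip()]
    return value

def is_usable(pixel_mask, allowed=0):
    """A pixel is usable when every bit set in its mask is allowed."""
    return (pixel_mask & ~allowed) == 0

allowed = parse_mask("BRIGHTSKY|SCATTEREDLIGHT")
bright_pixel = 1 << MASK_BITS["BRIGHTSKY"]
empty_pixel = 1 << MASK_BITS["NODATA"]
```

Here is_usable(bright_pixel, allowed) is True while is_usable(empty_pixel, allowed) is False, so allowing specific bits keeps only the pixels whose remaining flags are all clear.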

The bossplot command will automatically download the appropriate data file if necessary. This is ‘conservative’: if an existing local file can be used to satisfy a request, no new files will be downloaded.

Spectra can be plotted from different data files. By default the spec-lite data file is used for a coadd or the spec file for an individual exposure. Use the --frame or --cframe options to plot a single-exposure spectrum from a plate spFrame file or its flux-calibrated equivalent spCFrame file. Use the --platefile option to plot the combined spectrum from an spPlate file. See the Overview of SDSS Spectroscopic Data for details.

To plot a single exposure, use the --exposure option to specify the sequence number (0,1,...) of the desired exposure. You can also set the --band option to either blue or red to plot a single camera’s data, or to both to superimpose the overlapping data from both cameras. Note that when displaying data from a co-added data product (spec, spec-lite, spPlate), the exposure sequence number only indexes exposures that were actually used in the final co-added spectrum. However, the spFrame and spCFrame data products include all exposures used as input to the co-add (based on a bossdata.plate.Plan) so, in cases where not all exposures are used, the --exposure option indexes a larger list of science exposures. Use the --verbose option to display information about the available exposures in either case.

This script uses the matplotlib python library, which is not required for the bossdata package and therefore not automatically installed, but is included in scientific python distributions like anaconda.