Accessing data on the compute servers

Where are the data located?

Large data sets are located in the


directory. You should typically not hard-code this path. Instead, use the environment variable DATADIR, which is set to this directory, and have your code read it to find the top-level data directory. The same holds for the individual large data sets described below. Any of these data may be moved in the future, so relying on environment variables makes your code resilient to such changes.
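As a minimal sketch of this advice, the helper below builds paths from DATADIR rather than from a hard-coded location. The function name data_path is hypothetical and not part of any of the group's packages.

```python
import os
from pathlib import Path


def data_path(*parts, env_var="DATADIR"):
    """Return a path below the directory named by the environment variable.

    data_path is a hypothetical helper name used for illustration only.
    """
    top = os.environ.get(env_var)
    if not top:
        # Fail loudly instead of silently falling back to a hard-coded path.
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return Path(top).joinpath(*parts)
```

For example, data_path("2mass") returns the 2MASS directory below whatever DATADIR currently points to, so the code keeps working if the data are moved and DATADIR is updated.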


APOGEE data

The APOGEE data are located at $DATADIR/sdss/apogee following the layout described here. To use these data, you should set an environment variable SDSS_LOCAL_SAS_MIRROR to point to $DATADIR/sdss/apogee. This will allow use of the apogee package, but will also be more generally useful.

Gaia data

The Gaia data are located at $DATADIR/Gaia following the layout of the Gaia data archive. To use these data as part of the gaia_tools package, you should set an environment variable GAIA_TOOLS_DATA to point to $DATADIR/data4code/gaia_tools.
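One way to confirm that your environment matches the settings described above is a small check like the following sketch. The function name check_data_env and the EXPECTED table are illustrative assumptions; only the variable names and relative paths come from the text.

```python
import os

# Expected locations relative to DATADIR, as given in the text above.
EXPECTED = {
    "SDSS_LOCAL_SAS_MIRROR": ("sdss", "apogee"),
    "GAIA_TOOLS_DATA": ("data4code", "gaia_tools"),
}


def check_data_env(environ=None):
    """Return {variable: (current value, expected value)} for each mismatch.

    check_data_env is a hypothetical helper; it only inspects the
    environment and does not import the apogee or gaia_tools packages.
    """
    environ = os.environ if environ is None else environ
    top = environ.get("DATADIR")
    if top is None:
        raise RuntimeError("DATADIR is not set")
    problems = {}
    for name, parts in EXPECTED.items():
        expected = os.path.join(top, *parts)
        current = environ.get(name)
        # normpath ignores harmless differences such as a trailing slash.
        if current is None or os.path.normpath(current) != os.path.normpath(expected):
            problems[name] = (current, expected)
    return problems
```

An empty dictionary means both variables are set as described. Anything else lists the variables that are missing or point elsewhere.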

The DR1 Gaia data were downloaded with the following commands:

curl -O http://cdn.gea.esac.esa.int/Gaia/tgas_source/fits/TgasSource_000-000-0[00-15].fits
curl -O http://cdn.gea.esac.esa.int/Gaia/gaia_source/fits/GaiaSource_000-0[00-19]-[000-255].fits
curl -O http://cdn.gea.esac.esa.int/Gaia/gaia_source/fits/GaiaSource_000-020-[000-110].fits
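The bracketed ranges in these commands are expanded by curl into zero-padded file numbers. The sketch below spells out the same expansion in Python, which can be useful for checking that a download is complete. It does not download anything, and the function names are hypothetical.

```python
BASE = "http://cdn.gea.esac.esa.int/Gaia"


def tgas_urls():
    """URLs matched by TgasSource_000-000-0[00-15].fits."""
    return [f"{BASE}/tgas_source/fits/TgasSource_000-000-0{i:02d}.fits"
            for i in range(16)]


def gaia_source_urls():
    """URLs matched by the two GaiaSource patterns above."""
    # GaiaSource_000-0[00-19]-[000-255].fits
    urls = [f"{BASE}/gaia_source/fits/GaiaSource_000-0{a:02d}-{b:03d}.fits"
            for a in range(20) for b in range(256)]
    # GaiaSource_000-020-[000-110].fits
    urls += [f"{BASE}/gaia_source/fits/GaiaSource_000-020-{b:03d}.fits"
             for b in range(111)]
    return urls
```

Comparing the base names of these URLs against a directory listing shows which files, if any, are missing.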

2MASS and other catalogs

2MASS data are located at $DATADIR/2mass and were downloaded directly from the 2MASS archive. The data have been loaded into a PostgreSQL database named catalogs that runs on the server (as usual, this database is owned by the user postgres). The point-source table is twomass_psc; the schema for this table is here. You can query it, for example, with psql:

> psql catalogs -U postgres
catalogs=# select count(*) as rows FROM twomass_psc;
(1 row)

In Python, you can access this database using, e.g., the psycopg2 module:
>>> import psycopg2
>>> conn= psycopg2.connect("dbname=catalogs user=postgres")
>>> cur= conn.cursor()
>>> cur.execute("select count(*) as rows FROM twomass_psc;")
>>> print(cur.fetchall())
>>> cur.close()
>>> conn.close()

With the cursor cur still open, a slightly more complicated example computes the number counts in 0.1 mag bins in J:
>>> import numpy
>>> from matplotlib.pyplot import semilogy
>>> cur.execute("select floor(j_m*10), count(*) as count from twomass_psc group by floor(j_m*10);")
>>> a= numpy.array(cur.fetchall(),dtype='float').T
>>> sindx= numpy.argsort(a[0])
>>> semilogy(a[0,sindx]/10.,a[1,sindx])
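To make clear what the grouping in this query computes, here is a minimal sketch of the same binning done in memory with numpy on a small array of magnitudes. The function name number_counts is hypothetical. Unlike the SQL query, which returns missing magnitudes as a separate group, this sketch drops non-finite values.

```python
import numpy


def number_counts(j_mag, bins_per_mag=10):
    """Count sources per magnitude bin of width 1/bins_per_mag.

    Returns (lower bin edges in mag, counts), sorted by magnitude.
    """
    j_mag = numpy.asarray(j_mag, dtype=float)
    j_mag = j_mag[numpy.isfinite(j_mag)]  # drop missing magnitudes
    # Same bin label as floor(j_m*10) in the SQL query.
    bin_index = numpy.floor(j_mag * bins_per_mag)
    labels, counts = numpy.unique(bin_index, return_counts=True)
    return labels / bins_per_mag, counts
```

numpy.unique returns the bin labels already sorted, so no separate argsort step is needed here.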

Note that these queries each take about five minutes to execute.

The 2MASS point-source catalog comes with a table twomass_psc_hp12 that holds the HEALPix index at level 12, in nested format, for each source's (RA,Dec). The HEALPix index is stored in hp12index, and the table can be joined with twomass_psc on pts_key. For example, to get all stars in pixel 100 at level 10, you can run in Python
cur.execute("select ra,decl from twomass_psc, twomass_psc_hp12 where twomass_psc.pts_key = twomass_psc_hp12.pts_key and floor(hp12index/16) = 100;")

making use of the fact that, in the nested scheme, the pixel number at each lower level is the pixel number divided by four and rounded down, so going from level 12 to level 10 is a division by 4^2 = 16.
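The index arithmetic behind this query can be sketched as follows. Both function names are hypothetical, and "level" is used as in the text above, with each level splitting every pixel into four.

```python
def parent_pixel(index, from_level, to_level):
    """Nested-scheme index of the pixel at to_level containing pixel index."""
    if to_level > from_level:
        raise ValueError("to_level must not exceed from_level")
    # Each lower level is an integer division by 4, i.e. a shift by 2 bits.
    return index >> (2 * (from_level - to_level))


def child_range(pixel, level, child_level):
    """Inclusive range of nested indices at child_level inside pixel."""
    if child_level < level:
        raise ValueError("child_level must not be below level")
    shift = 2 * (child_level - level)
    return pixel << shift, ((pixel + 1) << shift) - 1
```

For pixel 100 at level 10, child_range(100, 10, 12) gives the level-12 indices 1600 through 1615, so the condition floor(hp12index/16) = 100 selects the same rows as hp12index between 1600 and 1615. The range form may let the database use an index on hp12index, assuming one exists.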

Downloading data

If you require a large data set, download it to the DATADIR directory and keep it there. Only users in the datagrp group are allowed to write to DATADIR. An administrator adds a user to this group using

usermod -g datagrp USER