###################
Configuration Files
###################

This page outlines how to construct configuration files to run your own routines with `~skypy.pipeline.Pipeline`.

`SkyPy` is an astrophysical simulation pipeline tool that allows you to define arbitrary workflows and store data in table format. You may use the `SkyPy` `~skypy.pipeline.Pipeline` to call any function: your own implementation, a function from any compatible external software, or one from the `SkyPy` library. `SkyPy` then handles the data dependencies between these calls, and it also provides a library of functions to be used with the pipeline.

These guidelines start with an example that uses one of the `SkyPy` functions, followed by the YAML syntax you need to write your own configuration files, including ones that go beyond the `SkyPy` functions.

SkyPy example
-------------

In this section, we show how to write a configuration file that uses some of the `SkyPy` functions. In this example, we sample redshifts and magnitudes from the `SkyPy` luminosity function `~skypy.galaxies.schechter_lf`.

- `Define variables`: from the documentation, the parameters of the `~skypy.galaxies.schechter_lf` function are: ``redshift``, the characteristic absolute magnitude ``M_star``, the amplitude ``phi_star``, the faint-end slope parameter ``alpha``, the magnitude limit ``m_lim``, the sky area ``sky_area``, ``cosmology`` and ``noise``. If you plan to reuse some of these parameters, you can define them at the top level of your configuration file. In our example, we use Astropy linear and exponential models for the characteristic absolute magnitude and the amplitude, respectively. Since ``noise`` is an optional parameter, you can use its default value by omitting it.

  .. code:: yaml

     cosmology: !astropy.cosmology.default_cosmology.get []
     z_range: !numpy.linspace [0, 2, 21]
     M_star: !astropy.modeling.models.Linear1D [-0.9, -20.4]
     phi_star: !astropy.modeling.models.Exponential1D [3e-3, -9.7]
     magnitude_limit: 23
     sky_area: 0.1 deg2

- `Tables and columns`: you can create a table ``blue_galaxies`` and, for now, add the columns for redshift and magnitude. Note that ``schechter_lf`` returns two arrays (redshifts and magnitudes), which are assigned to the two columns at once:

  .. code:: yaml

     tables:
       blue_galaxies:
         redshift, magnitude: !skypy.galaxies.schechter_lf
           redshift: $z_range
           M_star: $M_star
           phi_star: $phi_star
           alpha: -1.3
           m_lim: $magnitude_limit
           sky_area: $sky_area

`Important:` if a function takes a ``cosmology`` parameter that is not set explicitly, the pipeline automatically uses the ``cosmology`` variable defined at the top level of the file.

This is what the entire configuration file looks like:

.. literalinclude:: luminosity.yml
   :language: yaml

You may now save it as ``luminosity.yml`` and run it using the `SkyPy` `~skypy.pipeline.Pipeline`:

.. plot::
   :include-source: true
   :context: close-figs

   import matplotlib.pyplot as plt
   from skypy.pipeline import Pipeline

   # Execute the SkyPy luminosity pipeline
   pipeline = Pipeline.read("luminosity.yml")
   pipeline.execute()

   # Blue population
   skypy_galaxies = pipeline['blue_galaxies']

   # Plot histograms of redshift and magnitude
   fig, axs = plt.subplots(1, 2, figsize=(9, 3))

   axs[0].hist(skypy_galaxies['redshift'], bins=50, histtype='step', color='purple')
   axs[0].set_xlabel('Redshift')
   axs[0].set_ylabel('N')
   axs[0].set_yscale('log')

   axs[1].hist(skypy_galaxies['magnitude'], bins=50, histtype='step', color='green')
   axs[1].set_xlabel('Magnitude')
   axs[1].set_yscale('log')

   plt.tight_layout()
   plt.show()

You can also run the pipeline directly from the command line and write the outputs to a FITS file:

.. code-block:: bash

   $ skypy luminosity.yml luminosity.fits

Don't forget to check out the more complete examples_!
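The configuration above relies on two conventions that are easy to picture in plain Python: a tagged entry such as ``!numpy.linspace [0, 2, 21]`` becomes a function call, and a function that accepts a ``cosmology`` argument receives the top-level ``cosmology`` when the entry leaves it unset. The following is a minimal sketch of those two rules under stated assumptions, not SkyPy's actual implementation: the helpers ``resolve``, ``invoke`` and ``call_tagged`` are hypothetical, the parsed entry is assumed to arrive as a dotted name plus a list, a dictionary or ``None``, and standard-library functions stand in for NumPy and Astropy so that it runs without extra packages.

.. code:: python

   import importlib
   import inspect

   # Hypothetical helpers: an illustration of the rules described above,
   # not SkyPy's own code.


   def resolve(qualified_name):
       """Import the object named by a dotted path such as 'math.log10'.

       The longest importable module prefix is imported first; the remaining
       parts are looked up as attributes (e.g. 'fractions.Fraction').
       """
       parts = qualified_name.split(".")
       for i in range(len(parts), 0, -1):
           try:
               obj = importlib.import_module(".".join(parts[:i]))
           except ImportError:
               continue
           for attr in parts[i:]:
               obj = getattr(obj, attr)
           return obj
       raise ImportError(f"cannot import {qualified_name!r}")


   def invoke(func, args, top_level=None):
       """Call func with a list (positional) or dict (keyword) of arguments.

       If func has a 'cosmology' parameter that args leave unset, the
       top-level 'cosmology' value (when defined) is passed in for it.
       """
       if isinstance(args, dict):
           pos, kwargs = [], dict(args)
       else:
           pos, kwargs = list(args), {}
       top_level = top_level or {}
       if "cosmology" in top_level:
           try:
               sig = inspect.signature(func)
           except (TypeError, ValueError):  # some builtins expose no signature
               sig = None
           if sig is not None and "cosmology" in sig.parameters:
               if "cosmology" not in sig.bind_partial(*pos, **kwargs).arguments:
                   kwargs["cosmology"] = top_level["cosmology"]
       return func(*pos, **kwargs)


   def call_tagged(qualified_name, args=None, top_level=None):
       """Evaluate a hypothetical parsed '!qualified_name args' entry.

       args=None mirrors a bare tag: the object is imported, not called.
       args=[] mirrors '!name []': the function is called with no arguments.
       """
       obj = resolve(qualified_name)
       if args is None:
           return obj
       return invoke(obj, args, top_level)

For example, ``call_tagged("math.log10", [100])`` plays the role of ``!math.log10 [100]`` and returns ``2.0``, while ``call_tagged("math.pi")`` mirrors a bare tag and only imports the constant. Keeping "no arguments" (``None``) distinct from "an empty argument list" (``[]``) is what lets a bare tag import an object while ``!astropy.cosmology.default_cosmology.get []`` calls a function.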
.. _examples: https://skypy.readthedocs.io/en/stable/examples/index.html

YAML syntax
-----------

YAML_ is a file format designed to be readable by both computers and humans. Fundamentally, a file written in YAML consists of a set of key-value pairs. Each pair is written as ``key: value``, where the space after the ``:`` is required. The hash character ``#`` denotes the start of a comment, and all further text on that line is ignored by the parser.

This guide introduces the YAML syntax relevant to writing a configuration file for ``SkyPy``. A configuration file typically begins with the definitions of individual variables at the top level, followed by the tables; within each table, the entries define the properties of the objects to simulate. The main keywords are ``parameters``, ``cosmology`` and ``tables``.

Variables
^^^^^^^^^

* `Variable definition`: a variable is defined as a key-value pair at the top of the file. YAML interprets numbers and booleans with the appropriate type (integer, float or boolean), and likewise interprets lists and dictionaries. In addition, SkyPy adds extra functionality to interpret and store Astropy Quantities_. Everything else is stored as a string, whether or not it is explicitly quoted.

  .. code:: yaml

     # YAML interprets
     counter: 100 # An integer
     miles: 1000.0 # A floating point number
     is_valid: true # A boolean
     name: "Joy" # A string
     planet: Earth # Another string
     mylist: [ 'abc', 789, 2.0e3 ] # A list
     mydict: { 'fruit': 'orange', 'year': 2020 } # A dictionary

     # SkyPy extra functionality
     angle: 10 deg
     distance: 300 kpc

* `Import objects`: the SkyPy configuration syntax allows objects to be imported directly from external (sub)modules using the ``!`` tag and providing neither a list of arguments nor a dictionary of keyword arguments. For example, this enables the import and use of any Astropy cosmology:

  .. code:: yaml

     # import the Planck13 object and bind it to the variable named "cosmology"
     cosmology: !astropy.cosmology.Planck13

Parameters
^^^^^^^^^^

* `Parameters definition`: parameters are variables whose values can be modified at execution time. For example:

  .. code:: yaml

     parameters:
       hubble_constant: 70
       omega_matter: 0.3

Functions
^^^^^^^^^

* `Function call`: a function call is written as the fully qualified function name tagged with an exclamation mark ``!``, followed by either a list of positional arguments or a dictionary of keyword arguments. For example, to call the NumPy_ functions ``log10()`` and ``linspace()``, you define the following key-value pairs:

  .. code:: yaml

     log_of_2: !numpy.log10 [2]
     myarray: !numpy.linspace [0, 2.5, 10]

  You can also pass the arguments of a function as a dictionary of keyword arguments. Imagine you want to compute the comoving distance for a range of redshifts using the `Astropy` Planck 2015 cosmology. To run it with the `SkyPy` pipeline, call the function and define its arguments as an indented dictionary:

  .. code:: yaml

     comoving_distance: !astropy.cosmology.Planck15.comoving_distance
       z: !numpy.linspace [ 0, 1.3, 10 ]

  Similarly, you can pass the arguments of the nested ``numpy.linspace`` call as a dictionary of keyword arguments:

  .. code:: yaml

     comoving_distance: !astropy.cosmology.Planck15.comoving_distance
       z: !numpy.linspace {start: 0, stop: 1.3, num: 10}

  `N.B.` To call a function with no arguments, you should pass an empty list of positional arguments or an empty dictionary of keyword arguments; otherwise the function object itself is imported instead of being called. For example:

  .. code:: yaml

     cosmo: !astropy.cosmology.default_cosmology.get []

* `Variable reference`: variables can be referenced by their full name tagged with a dollar sign ``$``. In the previous example, you could also define the redshifts as a variable at the top level of the file and then reference it:

  .. code:: yaml

     redshift: !numpy.linspace [ 0, 1.3, 10 ]
     comoving_distance: !astropy.cosmology.Planck15.comoving_distance
       z: $redshift

* `Cosmology`: the ``cosmology`` used by functions within the pipeline only needs to be set once, at the top level. If a function takes ``cosmology`` as an input, you need not pass it again: it is detected automatically. For example, to calculate the angular size of a galaxy with a given physical size, at a fixed redshift and for a given cosmology:

  .. code:: yaml

     cosmology: !astropy.cosmology.FlatLambdaCDM
       H0: 70
       Om0: 0.3
     size: !skypy.galaxies.morphology.angular_size
       physical_size: 10 kpc
       redshift: 0.2

* `Job completion`: ``.depends`` can be used to force any function call to wait for the completion of any other job. In the following simple example, the comoving distance must, for some reason, be computed after the angular size function has completed:

  .. code:: yaml

     cosmology: !astropy.cosmology.Planck15
     size: !skypy.galaxies.morphology.angular_size
       physical_size: 10 kpc
       redshift: 0.2
     comoving_distance: !astropy.cosmology.Planck15.comoving_distance
       z: !numpy.linspace [ 0, 1.3, 10 ]
       .depends: size

  By doing so, you force the function call ``size`` to complete before the comoving distance is computed.

Tables
^^^^^^

* `Table creation`: tables are defined under the ``tables`` keyword as a dictionary of table names, each resolving to a dictionary of column names for that table. Let us create a table called ``telescope`` with a column storing the widths of spectral lines, which follow a normal distribution:

  .. code:: yaml

     tables:
       telescope:
         spectral_lines: !scipy.stats.norm.rvs
           loc: 550
           scale: 1.6
           size: 100

* `Column addition`: you can add as many columns to a table as you need. Imagine you want to add a column for the telescope collecting surface:

  .. code:: yaml

     tables:
       telescope:
         spectral_lines: !scipy.stats.norm.rvs
           loc: 550
           scale: 1.6
           size: 100
         collecting_surface: !numpy.random.uniform
           low: 6.9
           high: 7.1
           size: 100

* `Column reference`: columns in the pipeline can be referenced by their full name, ``table.column``, tagged with a dollar sign ``$``. For example, suppose the galaxy mass follows a lognormal distribution. You can create a table ``galaxies`` with a column ``mass``, in which you sample 10000 objects, and a second column ``radius``, which also follows a lognormal distribution but whose mean depends on how massive the galaxies are:

  .. code:: yaml

     tables:
       galaxies:
         mass: !numpy.random.lognormal
           mean: 5.
           size: 10000
         radius: !numpy.random.lognormal
           mean: $galaxies.mass

* `Multi-column assignment`: several columns can be assigned at once from any 2D array, where one dimension is interpreted as the rows of the table and the other as separate columns, or from a function that returns a tuple. In the following example, we sample a two-dimensional array of values from a lognormal distribution and store them as three columns in a table:

  .. code:: yaml

     tables:
       halos:
         mass, radius, concentration: !numpy.random.lognormal
           size: [10000, 3]

* `Table initialisation`: by default, tables are initialised using ``astropy.table.Table()``; however, this can be overridden with the ``.init`` keyword to initialise the table with any function call. For example, you can stack galaxy properties such as radii and mass:

  .. code:: yaml

     radii: !numpy.logspace [ 1, 2, 100 ]
     mass: !numpy.logspace [ 9, 12, 100 ]
     tables:
       galaxies:
         .init: !astropy.table.vstack [[ $radii, $mass ]]

* `Table reference`: when a function call depends on a whole table, you need to ensure that the referenced table has been fully populated and is not empty. You can do that by listing ``<table>.complete`` under ``.depends``. For example, suppose you want to perform a very simple abundance matching, i.e. painting galaxies within your halos. You can create two tables, ``halos`` and ``galaxies``, storing the halo masses and galaxy luminosities respectively. You can then stack these two tables horizontally and store the result in a third table called ``matching``:

  .. code:: yaml

     tables:
       halos:
         halo_mass: !numpy.random.uniform
           low: 1.0e8
           high: 1.0e14
           size: 20
       galaxies:
         luminosity: !numpy.random.uniform
           low: 0.05
           high: 10.0
           size: 20
       matching:
         .init: !astropy.table.hstack
           tables: [ $halos, $galaxies ]
           .depends: [ halos.complete, galaxies.complete ]

.. _YAML: https://yaml.org
.. _NumPy: https://numpy.org
.. _Quantities: https://docs.astropy.org/en/stable/units/
.. _clone(): https://docs.astropy.org/en/stable/api/astropy.cosmology.FLRW.html?highlight=clone#astropy.cosmology.FLRW.clone
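The ``$`` references and ``.depends`` entries described above are what let the pipeline handle data dependencies: a job can only run once the jobs it refers to have finished. The sketch below pictures that as a topological sort using ``graphlib.TopologicalSorter`` from the Python standard library (Python 3.9 or later); it is an illustration, not SkyPy's own scheduler. It assumes a hypothetical, already-parsed configuration in which each job (a variable such as ``size`` or a column such as ``galaxies.mass``) maps to its arguments, and it assumes that a whole-table reference such as ``$halos`` or ``halos.complete`` stands for every column of that table. The helper names are made up for this example.

.. code:: python

   from graphlib import TopologicalSorter

   # Hypothetical helpers: a sketch of dependency ordering, not SkyPy's code.


   def find_references(value):
       """Collect the names referenced as '$name' anywhere inside value."""
       if isinstance(value, str):
           return {value[1:]} if value.startswith("$") else set()
       if isinstance(value, dict):
           items = value.values()
       elif isinstance(value, (list, tuple)):
           items = value
       else:
           return set()  # numbers, booleans, None: no references
       refs = set()
       for item in items:
           refs |= find_references(item)
       return refs


   def dependencies(spec):
       """Jobs that a spec must wait for: '$' references plus '.depends'."""
       deps = find_references(spec)
       if isinstance(spec, dict) and ".depends" in spec:
           extra = spec[".depends"]
           deps |= {extra} if isinstance(extra, str) else set(extra)
       return deps


   def expand(dep, jobs):
       """Map a dependency onto job names; a table name means all its columns."""
       if dep in jobs:
           return {dep}
       table = dep[: -len(".complete")] if dep.endswith(".complete") else dep
       columns = {job for job in jobs if job.startswith(table + ".")}
       return columns or {dep}


   def execution_order(jobs):
       """Return job names ordered so that each runs after its dependencies."""
       graph = {}
       for name, spec in jobs.items():
           graph[name] = set()
           for dep in dependencies(spec):
               graph[name] |= expand(dep, jobs)
       # static_order() raises graphlib.CycleError for circular references
       return list(TopologicalSorter(graph).static_order())


   # Hypothetical parsed form of the '.depends' and 'Table reference' examples
   jobs = {
       "size": {"physical_size": "10 kpc", "redshift": 0.2},
       "comoving_distance": {"z": [0, 1.3, 10], ".depends": "size"},
       "halos.halo_mass": {"low": 1.0e8, "high": 1.0e14, "size": 20},
       "galaxies.luminosity": {"low": 0.05, "high": 10.0, "size": 20},
       "matching.init": {
           "tables": ["$halos", "$galaxies"],
           ".depends": ["halos.complete", "galaxies.complete"],
       },
   }
   order = execution_order(jobs)

In ``order``, ``size`` comes before ``comoving_distance`` and both table columns come before ``matching.init``; jobs with no mutual dependency may appear in any relative order. Because ``static_order()`` raises ``graphlib.CycleError`` when two jobs reference each other, a sketch like this also flags circular references, which no ordering could satisfy.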
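Multi-column assignment, used above for ``mass, radius, concentration`` and in the luminosity example for ``redshift, magnitude``, boils down to splitting one result into named columns. Below is a minimal plain-Python sketch of that idea, assuming the key is split on commas and that a 2D result arrives as one inner list per table row, so that the second dimension indexes the columns; ``parse_key`` and ``assign_columns`` are hypothetical helpers, not part of SkyPy.

.. code:: python

   # Hypothetical helpers: a sketch of multi-column assignment, not SkyPy's code.


   def parse_key(key):
       """Split a multi-column key such as 'mass, radius' into column names."""
       return [name.strip() for name in key.split(",")]


   def assign_columns(names, value):
       """Split one function result into named columns (illustrative only).

       value is either a tuple holding one entry per column, or a 2D list
       holding one inner list per table row and one entry per column.
       """
       if isinstance(value, tuple):
           columns = list(value)
       else:
           # transpose rows into columns: the second dimension indexes columns
           columns = [list(column) for column in zip(*value)]
       if len(columns) != len(names):
           raise ValueError(f"expected {len(names)} columns, got {len(columns)}")
       return dict(zip(names, columns))


   # Three columns from a 2 x 3 array of rows, and two columns from a tuple
   halos = assign_columns(parse_key("mass, radius, concentration"),
                          [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
   # halos == {"mass": [1.0, 4.0], "radius": [2.0, 5.0], "concentration": [3.0, 6.0]}
   galaxies = assign_columns(parse_key("redshift, magnitude"),
                             ([0.1, 0.2], [21.0, 22.5]))

Checking that the number of columns matches the number of names catches a mismatched key early, for instance a key with three names applied to a result with only two columns.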