AST 3100 – Mini-course: Basics of Coding and Statistics (2025)
1. Syllabus
- Lectures
- AB 113, Friday 11-1. Generally a lecture-style meeting in the first hour, followed by a hands-on session.
- Lecturer
- Marten van Kerkwijk, MP 1203B, 416-946-7288, mhvk@astro.utoronto.ca
- Office hours
- Drop by my office, or by appointment
- Web page
- http://www.astro.utoronto.ca/~mhvk/STATSCODINGMINI/
Synopsis
Almost all of us use code and statistics as tools to analyze data, simulate something, design an instrument, or do semi-analytical modeling. This mini-course is aimed at giving you the basics with which to tackle programming and statistics needs. We will combine the two, discussing basic statistics and writing code in which those basics are applied and simulated (ideally in a way immediately relevant for your research). The hope is that you get into the habit of ensuring that both code and statistics are well-tested and well-documented.
Coding
- Specific topics
- Basic principles: breaking into small pieces, clear naming, consistent style; never do anything interactively more than once, tool everything;
- Version control: track history automatically; make small incremental changes;
- Automated testing: test modules, assert code expectations, bugs are new tests;
- Profiling: optimize what matters;
- Documentation: embed where possible, auto-generate, auto-test code examples;
- Texts
- A general description of how to optimize programming by Wilson et al., "Best practices for scientific computing", 2014, PLoS Biol 12(1), and also its sequel (prequel?), "Good enough practices in scientific computing", 2017, PLoS Comput Biol 13(6).
- The zen of python;
- Astropy's developer documentation, which includes not just how to contribute but also discusses coding style, documentation, and testing;
- OpenAstronomy's python packaging guide to make your own package using their template, including automatic testing and documentation building;
- Setting up Emacs as a full-fledged IDE (Emacs's org mode is what prevents me from becoming fully unorganized; started by an astronomer… but feel free to use your own preferred editor!).
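To make the coding topics above a little more concrete, here is a minimal sketch of what "embed documentation, auto-test code examples, assert code expectations, bugs are new tests" can look like in practice. The function `weighted_mean` and the two tests are hypothetical illustrations, not course material; the assumption is that the docstring example is checked with Python's `doctest` (for instance via `pytest --doctest-modules`) and the `test_*` functions with a test runner such as pytest.

```python
import math


def weighted_mean(values, errors):
    """Return the inverse-variance weighted mean and its uncertainty.

    Examples
    --------
    >>> weighted_mean([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0])
    (2.5, 0.5)
    """
    if len(values) != len(errors):
        raise ValueError("values and errors must have the same length")
    if any(error <= 0 for error in errors):
        raise ValueError("errors must be positive")
    weights = [1.0 / error**2 for error in errors]
    total_weight = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total_weight
    return mean, 1.0 / math.sqrt(total_weight)


def test_equal_errors_reduce_to_plain_mean():
    # Assert an expectation: with equal errors, we should get the plain
    # mean, with an uncertainty of sigma / sqrt(N).
    mean, uncertainty = weighted_mean([1.0, 2.0, 3.0, 4.0], [2.0] * 4)
    assert math.isclose(mean, 2.5)
    assert math.isclose(uncertainty, 2.0 / math.sqrt(4))


def test_zero_error_is_rejected():
    # A (hypothetical) bug report, "zero error gives a ZeroDivisionError",
    # becomes a permanent regression test.
    try:
        weighted_mean([1.0], [0.0])
    except ValueError:
        pass
    else:
        raise AssertionError("expected a ValueError for a zero error")
```

Keeping the example in the docstring means the documentation and the test are the same text, so they cannot drift apart unnoticed.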
Statistics
- Specific topics
- General error propagation; χ² fitting, probabilities, number of parameters, degrees of freedom. Estimating expected uncertainties. The wonderfully broad applicability of fitting a line.
- Optimal measurements; thinking clearly about what in the data we really need to extract.
- Poisson errors, maximum likelihood for Poisson-distributed data.
- Pitfalls: Resolution and binning (e.g., for X-ray spectra); number of trials (e.g., source/period finding).
- Monte-Carlo analysis.
- Statistics texts
- A good general textbook is Bayesian Logical Data Analysis for the Physical Sciences (BLDAPS), by Phil Gregory (2005, Cambridge Univ. Press), available on-line via the UofT library. This text also describes non-Bayesian analysis and shows how for quite general cases the results are very similar.
- Useful for me has been Numerical Recipes, arguably more for the text than the code, in particular the chapter on Modeling of Data.
- Often it helps most to read articles about topics closely related to what one has to do oneself. I hope to cover:
- Horne 1986PASP...98..609H: An optimal extraction algorithm for CCD spectroscopy.
- Alard & Lupton, 1998ApJ...503..325A: A Method for Optimal Image Subtraction.
- Rucinski 2002AJ....124.1746R: Radial Velocity Studies of Close Binary Stars. VII. Methods and Uncertainties.
- Cash 1979ApJ...228..939C: Parameter estimation in astronomy through application of the likelihood ratio (Poisson statistics).
- Gregory & Loredo 1992ApJ...398..146G: A new method for the detection of a periodic signal of unknown shape and period.
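As a taste of how the statistics topics above combine with code, the following is a minimal sketch, not course material, of a χ² fit of a straight line with known Gaussian errors, followed by a Monte-Carlo check that the scatter of the fitted slope matches the propagated uncertainty and that the mean χ² is close to the number of degrees of freedom. The function names, the true parameters (intercept 1, slope 2), the error of 0.5, and the number of trials are all arbitrary assumptions chosen for illustration; the fit uses the standard closed-form weighted least-squares solution.

```python
import math
import random
import statistics


def fit_line(x, y, sigma):
    """Weighted least-squares fit of y = a + b*x.

    Returns (a, b, sigma_a, sigma_b, chi2), using the closed-form
    solution for independent Gaussian errors ``sigma`` on ``y``.
    """
    w = [1.0 / s**2 for s in sigma]
    s0 = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = s0 * sxx - sx**2
    a = (sxx * sy - sx * sxy) / delta
    b = (s0 * sxy - sx * sy) / delta
    chi2 = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    return a, b, math.sqrt(sxx / delta), math.sqrt(s0 / delta), chi2


def monte_carlo_check(n_trials=2000, seed=1):
    """Simulate many data sets and compare scatter with predictions."""
    rng = random.Random(seed)  # fixed seed, so the check is reproducible
    x = [float(i) for i in range(10)]
    sigma = [0.5] * len(x)
    a_true, b_true = 1.0, 2.0  # arbitrary illustrative values
    slopes, chi2s = [], []
    for _ in range(n_trials):
        y = [a_true + b_true * xi + rng.gauss(0.0, si)
             for xi, si in zip(x, sigma)]
        _, b, _, sigma_b, chi2 = fit_line(x, y, sigma)
        slopes.append(b)
        chi2s.append(chi2)
    return {
        # sigma_b depends only on x and sigma, so it is the same each trial.
        "predicted_sigma_b": sigma_b,
        "scatter_b": statistics.stdev(slopes),
        "mean_chi2": statistics.fmean(chi2s),
        "dof": len(x) - 2,  # 10 points minus 2 fitted parameters
    }


if __name__ == "__main__":
    for name, value in monte_carlo_check().items():
        print(f"{name}: {value:.4g}")
```

If the simulated scatter and the propagated uncertainty disagree, either the code or the statistical reasoning has a bug, which is exactly why the two are worth testing together.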
Prerequisites
- Familiarity with basic python;
- A github account and a laptop set up for development (if this is the first time you do this, start with astropy and follow their instructions on getting the development version before the first class; you'll see that there is a choice for authentication in github; I recommend using SSH keys). If you encounter difficulties, write them down: you have just found a possible first contribution!
Evaluation
- Two problem sets (20% each): set 1 (due Oct 10).
- A contribution to someone else's open-source package (20%). I can help most with astropy and numpy, which each label issues that are suitable for new contributors (astropy, numpy), but other packages are fine too (e.g., Jo's galpy; my baseband, baseband-tasks or screens; Hanno's REBOUND; or check out astropy affiliated packages).
- A piece of your own code fully tested and documented (40%), ideally involving statistics. I strongly recommend following OpenAstronomy's packaging guide, as it makes it very easy to automate continuous integration and generation of documentation.