Data Analysis Methodology

Most variable star data consists of measurements of the brightness of the star at various times. Usually, the analysis of variable star data is aimed at obtaining possible periods of the star and is carried out by applying several different methods of time series analysis in conjunction with each other. One may employ one method to confirm or rule out periods obtained from another. These methods may be broadly classified as light curves, phase diagrams (or "folded" light curves), Fourier-type transforms (or power spectra), correlational analyses and wavelet analyses. These methods are discussed below, with the exception of wavelet analysis, which is not used as frequently and is beyond the scope of this discussion.

Step 1: Light Curve

A usual first step is simply to look at the star's light curve. A light curve is a plot of brightness (in magnitudes) versus time (usually Julian Date). By looking at the light curve, one can get a sense of the periodicity or irregularity of the star's variation, including whether the variation is long term or short term. You should also be able to determine a rough approximation of the characteristic timescale of the periodic behaviour by measuring the time between successive maxima (or minima).
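
The estimate from successive maxima can also be made numerically. The following is a minimal Python sketch, not a standard tool: the function name estimate_timescale and the synthetic data are assumptions made for illustration. It treats any point that is brighter than both of its neighbours as a maximum, so it is only meaningful for a smooth, well-sampled light curve; with noisy data it will pick up spurious maxima.

```python
import math


def estimate_timescale(times, mags):
    """Rough timescale: mean spacing between successive brightness maxima.

    Assumes the points are sorted by time. Magnitudes are inverted:
    a smaller magnitude means a brighter star.
    """
    # A brightness maximum is a point whose magnitude is lower than both neighbours.
    peak_times = [
        times[i]
        for i in range(1, len(mags) - 1)
        if mags[i] < mags[i - 1] and mags[i] < mags[i + 1]
    ]
    if len(peak_times) < 2:
        return None  # not enough maxima to measure a spacing
    gaps = [later - earlier for earlier, later in zip(peak_times, peak_times[1:])]
    return sum(gaps) / len(gaps)


# Synthetic example (not real data): a 40-day sinusoid sampled every 5 days.
times = [float(t) for t in range(0, 300, 5)]
mags = [8.0 + 0.5 * math.sin(2 * math.pi * t / 40.0) for t in times]
print(estimate_timescale(times, mags))  # 40.0
```

The value obtained this way is only a starting point for the methods described in the following steps.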

The primary advantages of using light curves are that they are intuitive to read and require no prior analysis to plot. However, they are of limited usefulness in many instances, for example, when determining the periods of complex variability. Another problem for this technique occurs when the spacing of the observations approaches or exceeds the period itself, or when large gaps (relative to the period) exist in the curve due to seasonal non-visibility of the star or other conditions. Having said this, generating a light curve is almost always a useful first step in any analysis, and is also useful for corroborating the period candidates determined by other methods.

Step 2: Phase Diagram

A common second step is to plot the data in a phase diagram, generated by assuming that the suspected period determined from the light curve is the correct one. The advantage of using a phase diagram as a diagnostic tool for the suspected period is that it clearly shows the shape of the periodic signal. If we use a period that is far from the true value, the phased data scatter all over the diagram instead of forming a single narrow, smooth curve. We can use this fact to adjust the suspected period toward its true value. However, this will only work if the star is regular and monoperiodic. We begin by determining the suspected period (either from the light curve or from some other source). Next, plot the resulting phase diagram. Then, modify the period and repeat until the points fall along a narrow locus. Several methods of period determination are based on this technique: the scatter in the phase diagrams of different test periods is compared, and the period associated with the least scatter is most likely the correct one.

It is relatively easy to generate a table of phase values using a spreadsheet program, such as Microsoft Excel, using the formula:

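=MOD((B1-epoch),period)/period

where B1 refers to the cell containing the time of the measurement, epoch is a chosen reference time (for example, a time of maximum brightness), and period is the trial period. The result is the phase, a fraction of a cycle between 0 and 1.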
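
The same phase calculation, together with the comparison of trial periods described earlier in this step, can be sketched in Python. This is an illustration under stated assumptions rather than any particular published method: the function names are hypothetical, the data are synthetic, and the scatter measure (the summed magnitude jumps between neighbouring points in phase order) is just one simple choice among several possible ones.

```python
import math


def phase(t, epoch, period):
    """Fraction of a cycle elapsed at time t, between 0 and 1."""
    return ((t - epoch) % period) / period


def phase_scatter(times, mags, epoch, period):
    """Sum of magnitude jumps between neighbouring points in phase order.

    A small total means the folded points trace a narrow, smooth curve.
    """
    folded = sorted(zip((phase(t, epoch, period) for t in times), mags))
    ordered = [m for _, m in folded]
    return sum(abs(b - a) for a, b in zip(ordered, ordered[1:]))


def best_period(times, mags, epoch, trial_periods):
    """Return the trial period whose phase diagram shows the least scatter."""
    return min(trial_periods, key=lambda p: phase_scatter(times, mags, epoch, p))


# An observation 10 days after the epoch, folded on a 4-day period,
# falls half-way through a cycle.
print(phase(2450010.0, 2450000.0, 4.0))  # 0.5

# Synthetic example (not real data): a 7.3-day sinusoid, unevenly sampled.
times = [1.37 * i + 0.11 * (i % 5) for i in range(120)]
mags = [10.0 + 0.4 * math.sin(2 * math.pi * t / 7.3) for t in times]
trial_periods = [6.9, 7.1, 7.3, 7.5, 7.7]
print(best_period(times, mags, times[0], trial_periods))
```

Note that a scatter measure of this kind is also small at whole-number multiples of the true period, so the trial periods should be confined to a range around the suspected value.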

Step 3: Fourier Analysis

Assuming the signal is periodic with one or more constant period(s), one may perform Fourier analysis on the time series. Fourier analysis is based on the idea that any piecewise continuous and periodic function can be represented as a (possibly infinite) sum of sine and cosine functions. In other words, one attempts to express the signal as a linear combination of sinusoids, each with a specific frequency and amplitude. The amplitude of a particular sinusoid at a given frequency shows the extent to which the signal is oscillating at that frequency. Thus, if the time series shows strong periodicity at a few periods, the amplitudes corresponding to the sine curves with those periods will be relatively large. Another measure of the presence of a frequency is power, which is proportional to the square of the amplitude. A Fourier analysis program may therefore output a plot of either amplitude or power versus frequency (or period); the latter is commonly referred to as a power spectrum. The highest peak in the power spectrum indicates the most likely period of the star.
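
As a rough illustration of how such a spectrum is computed, the following Python sketch evaluates the classical periodogram directly at a list of trial frequencies, which works for unevenly spaced observations. The function names and the synthetic data are assumptions made for this example; dedicated programs typically use refinements of this basic calculation, such as the Lomb-Scargle periodogram.

```python
import math


def power_spectrum(times, mags, frequencies):
    """Classical periodogram power at each trial frequency (cycles per day)."""
    n = len(mags)
    mean = sum(mags) / n
    resid = [m - mean for m in mags]  # remove the mean brightness first
    powers = []
    for f in frequencies:
        # Project the data onto a cosine and a sine of frequency f.
        c = sum(r * math.cos(2 * math.pi * f * t) for r, t in zip(resid, times))
        s = sum(r * math.sin(2 * math.pi * f * t) for r, t in zip(resid, times))
        powers.append((c * c + s * s) / n)
    return powers


def strongest_period(times, mags, frequencies):
    """Period (1/frequency) of the highest peak in the power spectrum."""
    powers = power_spectrum(times, mags, frequencies)
    best = max(range(len(frequencies)), key=powers.__getitem__)
    return 1.0 / frequencies[best]


# Synthetic example (not real data): a 25-day sinusoid, unevenly sampled.
times = [i + 0.3 * math.sin(i) for i in range(200)]
mags = [9.0 + 0.4 * math.sin(2 * math.pi * t / 25.0) for t in times]
frequencies = [k / 1000.0 for k in range(10, 201)]  # 0.010 to 0.200 cycles/day
print(strongest_period(times, mags, frequencies))
```

The frequency grid is a choice left to the user: its spacing should be fine compared with the reciprocal of the total time span of the data, otherwise a narrow peak can fall between grid points.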

The assumption that a variable star is strictly periodic is often too strong, which can be problematic for Fourier analysis. If the star is semiregular, exhibiting characteristic timescales of variability without having an actual period, the power spectrum will not display any spikes that are significantly stronger than the rest.

Another significant deficiency associated with Fourier analysis arises when regularly spaced gaps occur in the data (e.g. when the star cannot be seen at certain times of the year). If this is the case, a sinusoid with the wrong period can be in phase with the oscillations of the signal where there is data, and out of phase where the data is missing, producing a good fit. Furthermore, the regular spacing of observations (for example, once per night) creates "fake" alias periods offset from the actual period:

1/P_alias = 1/P_real ± N/T

where N is a whole number and T is the separation between regularly spaced points or the inherent periodicity of the times of observation. Determining which period candidate is the true period is often not a trivial task.
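
The alias relation is easy to evaluate for a candidate period. The short Python sketch below does so for the first few values of N; the function name alias_periods and its parameters are hypothetical, chosen only for this illustration. For example, a true period of 0.8 days observed once per night (T = 1 day) gives, for N = 1, alias periods of 4.0 days and about 0.444 days.

```python
def alias_periods(p_real, spacing, max_n=2):
    """Alias periods from 1/P_alias = 1/P_real +/- N/T for N = 1..max_n.

    p_real and spacing (T in the formula) must be in the same time units.
    Returns a list of (N, sign, alias period) tuples.
    """
    aliases = []
    for n in range(1, max_n + 1):
        for sign in (+1, -1):
            f_alias = 1.0 / p_real + sign * n / spacing
            if f_alias != 0:
                # A negative frequency describes the same period, so take
                # the absolute value.
                aliases.append((n, sign, abs(1.0 / f_alias)))
    return aliases


for n, sign, period in alias_periods(0.8, 1.0, max_n=1):
    print(n, sign, round(period, 3))
```

Setting the spacing to one year instead of one day gives the aliases produced by seasonal gaps in the data.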

Step 4: Self-Correlation Analysis

One method of time series analysis that does not produce alias periods is self-correlation analysis. It is especially useful for semi-regular stars and for stars with irregularly spaced observations. However, self-correlation analysis is not as helpful as Fourier analysis in determining the periods of multi-periodic stars.

This method determines the cycle-to-cycle behavior of the star, averaged over all the data. The measurements do not have to be equally spaced. The governing principle behind this method is that, for an approximately periodic star, observations that are separated by one whole period will have nearly zero difference in magnitude. Points that are separated by a half-period will, on average, have the largest difference in magnitude. Thus, for all pairs of measurements, the difference in magnitude (delta mag) and the difference in time (delta t) are calculated. Delta mag is then plotted against delta t up to some upper limit. This limit should be a few times greater than the expected timescales but less than the total time span of the data. The delta mags are binned in delta t so that, if possible, there are at least a few values in each bin; the delta mags in each bin are then averaged. The average delta mag will be a minimum at multiples of tau, the characteristic timescale (or period) of the variability.

Each minimum can be used to estimate tau. The height of the maxima is a measure of the average amplitude of the variability. If the variability were perfectly periodic and the magnitudes had no error, then the minima would fall to zero; in fact, the height of the minima is determined by the average error of the magnitudes and by the degree of irregularity.
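
The procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the code of any particular program: the function name self_correlation, its parameters and the synthetic data are assumptions, and the absolute value of the magnitude difference is assumed to be the quantity that is averaged.

```python
import math


def self_correlation(times, mags, max_lag, bin_width):
    """Average |delta mag| in bins of delta t, over all pairs of observations.

    Returns a list of (bin centre, mean |delta mag|, number of pairs),
    skipping bins that contain no pairs.
    """
    n_bins = int(max_lag / bin_width)
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(len(times)):
        for j in range(i + 1, len(times)):
            dt = abs(times[j] - times[i])
            k = int(dt / bin_width)  # index of the delta t bin
            if k < n_bins:
                sums[k] += abs(mags[j] - mags[i])
                counts[k] += 1
    return [
        ((k + 0.5) * bin_width, sums[k] / counts[k], counts[k])
        for k in range(n_bins)
        if counts[k] > 0
    ]


# Synthetic example (not real data): a 21-day sinusoid, unevenly sampled.
times = [1.37 * i + 0.4 * math.sin(1.7 * i) for i in range(150)]
mags = [10.0 + 0.5 * math.sin(2 * math.pi * t / 21.0) for t in times]
for centre, mean_dmag, n_pairs in self_correlation(times, mags, 50.0, 2.0):
    print(centre, round(mean_dmag, 3), n_pairs)
```

In the printed table, the average delta mag should dip in the bins nearest multiples of the 21-day timescale and peak near the half-period in between. The pair count in each bin shows whether a given minimum rests on enough data to be trusted.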

You will find several sources of variable star data to undertake your own research project here.

Data Analysis Methodology courtesy of Akos Bakos.