Errorbars and fitting

Intro

Here, we're going to take our plotting demo a little further and try to fit data with a curve, and plot data with errorbars.

We'll be looking at data from galcrash here; we'll consider the dataset below, taken with periapse of 0.5 kpc (note that the homework requires 1.0 kpc):

#perapse = 0.5 kpc
#mass time (Myr)  uncertainty (Myr)
0.1   1100  100
0.2    520  50
0.3    350  25
0.4    300  25
0.5    280  20

We're going to want to fit this data with something of the form t=m-1/2.

Scilab

With data this small, entering it into SciLab is pretty straightforward:

-->masses = [0.1 0.2 0.3 0.4 0.5];
-->time = [1100 520 350 300 280];
-->uncertainty = [100 50 25 25 20];

We'll now plot in mostly the same way as before, using a style to choose plotting data points rather than lines, and using rect to set the range of our plot so that it doesn't hide the first and last data points. We'll end by adding error bars; errbar plots the data with y-errorbars ranging from (y-3rd argument) to (y+4th argument); for symmetric error bars like we have here, we'll just use the same uncertainty twice:

--> clf; plot2d(masses,time,style=-1,rect=[0,200,0.6,1200]);
--> xtitle("","red galaxy mass / green galaxy mass", 
           "Merger time (Myr)", "");
--> errbar(masses,time,uncertainty,uncertainty);

Now we want to find a value of C which fits the data. A perfectly acceptable way of exploring the data to find out what parameters make sense is a little trial-and-error, choosing C yourself, plotting the function yourself and seeing how it works:

--> // plot the data.
--> clf; plot2d(masses,time,style=-1,rect=[0,200,0.6,1200]); 
--> // Mass values for the fitted curve
--> finem = [0.05:0.01:0.55];
--> // A trial "C" value
--> c = 1.; 
--> // Calculate the expected merger times using this C
--> fittedt = (453./c)*sqrt((0.5)^3)*(finem)^(-1);  
--> // Plot ...
--> plot2d(finem, fittedt);
--> //oops, C is too low (times are too low); try a new C
--> c = 2.;
--> fittedt = (453./c)*sqrt((0.5)^3)*(finem)^(-1);  
--> plot2d(finem, fittedt);
--> //...etc

To do the fitting, we need to do something we haven't done before - create scilab functions. One that gives the fitted data, given some parameters, and one that calculates the error between that and the real data.

-->function g=error(p,z)
-->  realtimes=z(1), masses=z(2);
-->  c = abs(p(1)+1.0e-6);
-->  fittimes = 453./c * sqrt((0.5)^3) * masses^(-1),
-->  g  = realtimes-fittimes
-->endfunction

(Note that we've made C be the fit parameter plus a tiny little bit, just so that during the fitting procedure, if the fitter tries c=0, things won't blow up.)

Now we do the fit:

-->p0 = [1];              // initial guess of 1
-->z  = [time; masses];  
-->[p,err] = datafit(error, z, p0);
-->p
 p  =
    1.458

So the fitting function found the best fit was 1.458. We can now plot this best fit;

--> clf; plot2d(masses,time,style=-1,rect=[0,200,0.6,1200]);
--> finem = [0.05:0.01:0.55];
--> fittedt = (453./p)*sqrt((0.5)^3)*(finem)^(-1);  
--> xtitle("","red galaxy mass / green galaxy mass", "Merger time (Myr)", "");
--> errbar(masses,time,uncertainty,uncertainty);
--> plot2d(finem, fitteddt)

Gnuplot

Gnuplot isn't as powerful as scilab, but it is also a little easier to use.

If we have a file called "data.txt" which contains the following,

#perapse = 0.5 kpc
#mass time (Myr)  uncertainty (Myr)
0.1   1100  100
0.2    520  50
0.3    350  25
0.4    300  25
0.5    280  20
then we can plot the data with errorbars as follows:

gnuplot> set xrange [0.05:0.55]
gnuplot> plot 'data.txt' with errorbars

Fitting the data is only slightly more complicated. Again, we have to define a function to fit to (and gnuplot assumes the dependant variable is called x, so we have to use that), and then we define an initial guess for the fit parameters, and off we go:

gnuplot> fittime(x)=453./c*sqrt((0.5)**3)*x**(-1)
gnuplot> c = 1
gnuplot> fit fittime(x) 'data.txt' using 1:2 via c

[...lots of iterations...]

Final set of parameters            Asymptotic Standard Error
=======================            ==========================

c               = 1.45809        +/- 0.04775     (2.76%)


correlation matrix of the fit parameters:

               c      
c               1.000 
gnuplot> 
and now that the parameter is set, you can plot it:
gnuplot> set xlabel "Red mass / Green Mass"
gnuplot> set ylabel "Merger time (Myr)"
gnuplot> plot fittime(x), 'data.txt' with errorbars

Octave/Matlab

In Octave, the procedure for plotting is very similar to that in SciLab:

octave:1> masses = [0.1 0.2 0.3 0.4 0.5];
octave:2> time = [1100 520 350 300 280];
octave:3> uncertainty = [100 50 25 25 20];
octave:4> errorbar(masses,time,uncertainty);
octave:5> xlabel("Red Mass/Green Mass")
octave:6> ylabel("Merger Time (Myr)")

To do data fitting in Octave (but not Matlab) you may need to install some optional modules; try the below first. If it doesn't work for you, you should be able to download the packages from Octave Forge: you are looking for

and then from within Octave, you can install the above packages given where they are downloaded to:
octave:9> pkg install "/Users/ljdursi/Downloads/struct-1.0.9.tar.gz"
octave:10> pkg install "/Users/ljdursi/Downloads/miscellaneous-1.0.11.tar.gz"
octave:11> pkg install "/Users/ljdursi/Downloads/optim-1.0.16.tar.gz"
One that's done, you again have to define the fitting function and invoke the fitting routine:
octave:12> fitfun = inline("453./(abs(p(1)+1.e-6)) * sqrt((0.5)^(3)) .* x.^(-1)","x","p")
octave:13> guess = [1.0, 0,0];
octave:14> [f, p, kvg, iter, corp, covp, covr, stdresid, Z, r2] = leasqr(masses, time, guess, fitfun);
octave:15> hold on  #save previous graph
octave:16> plot(masses,f) #plot fitted data