Once the data has been entered into the computer, either via a data acquisition system or by hand, it often still needs to be manipulated to ascertain the range of validity. While some of this work can be done analytically (and should, as a verification of the algorithms being used), high-speed computers can perform this analysis quickly on large batches of data, reducing the time between data acquisition and interpretation significantly.
Most of the time experiments are not measuring those quantities that are of interest to physicists. Some form of calculation must be carried out to convert the measured data into the variables that are actually being investigated. Since the measured data is not perfect, there will be some inherent uncertainty associated with the measurement, and it is important to know how these uncertainties propagate through the calculations to generate a range of uncertainty in the final quantity.
In order to calculate this final uncertainty, the uncertainty, or variance, of each measurement must be calculated. For example, suppose that the final quantity, x, is known to be a function of at least two measured variables, u and v,
|
|
x = f(u,v), |
(4.1) |
where the measured value of u satisfies
, and similarly for v. Then the most probable value for x can be calculated to be
|
|
|
(4.2) |
Note that this is not always exact, but it can be used for the vast majority of cases encountered.
The
uncertainty associated with x is essentially the same as the variance of
x, given in terms of u and v. To calculate the variance, recall that the deviation
can be written as a
power series of the deviations in u and v:
|
|
|
(4.3) |
The average deviation, by the definition of the mean, is always zero. This is because there are (on average) an equal number of points that are negative as are positive. However, we are usually only interested in the absolute difference between the point and the mean. We could use the absolute value, but this leads to discontinuities. Instead, we look at the square of the difference. Thus, the variance is defined to be
|
|
|
(4.4) |
Applying this equation to x and using equation (4.3), the variance of x is then
|
|
|
(4.5) |
Where equation (4.4) was used in the last step to simplify the terms in u and v, and the covariance is defined as
|
|
|
(4.6) |
Equation (4.5) is known as the error propagation equation. The extension of equations (4.3) and (4.5) to equations involving more than two sets of measurements should be obvious.
The first two terms in equation (4.5) are the averages of the squares of the deviations weighted by the square of the partial derivatives, and may be considered to be the averages of the squares of the deviations in x produced by the uncertainties in u and v, respectively. The third term is the average of the cross terms involving products of deviations in u and v weighted by the product of the partial derivatives. If the measurements u and v are independent of each other then, on the average, there should be equal distributions of positive and negative values for this term. Thus, this term will vanish in the limit of a large number of observations. Since the independence of the measurements cannot always be explicitly verified, calculating the covariance provides a good test of independence.
As defined, equation (4.5) cant be used in a computer calculation without knowing how to calculate the partial derivatives. Fortunately, in many cases this can be avoided. Suppose that x is determined via equation (4.2), where f is a known function of the variables u and v. Then given u and v, a function to calculate x could be easily written. Once we have this function, the change in x with respect to u is then found by calling the function, replacing u with u+su in the passed argument. The deviation in x caused by the deviation in u is then found by subtracting x from each result. The deviation in x caused by the deviation in v is calculated in a similar way. Finally, squaring each deviation term, summing them together, and taking the square root of the result results in the desired uncertainty. Notice that this assumes that the variances in u and v are symmetric, and that the covariance is negligible.
Example:
The acceleration due to gravity is given by
.
If the height is measured to be
and the time is
, then if we assume that the covariance is negligible, the
final value of the acceleration due to gravity can be computed using gravity.c.
In many experiments, instead of measuring a single quantity multiple times, a series of measurements are made of a pair of variables, (xi,yi), one measurement for each index i, where i runs from 1 to N. The goal of the experiment is to find a function y = y(x) that describes the relation between these two measured variables. If the theoretical curve describing the relationship is a either a straight line, an exponential growth or decay, or a power law, the parameters describing the curve can be found directly by plotting the data on linear, semi-log or log-log graph paper, respectively. Any other type of curve requires numerical solution.
In every case, the method of
maximum-likelihood applies. Assume
that N measurements of an independent variable xi and
a dependent variable yi were made. We want to fit this data to some function y(xi),
where the function itself is parameterized by a set of parameters aj,
where j runs from 1 to m, i.e.,
. In order to
determine the parameters, we first convert y(xi) to a
normalized probability function, evaluated at the observed value xi,
|
|
|
(4.7) |
The likelihood function L(a1,a2, ,am) is then defined as the product of these probability functions,
|
|
|
(4.8) |
To find the maximum-likelihood values of the parameters, equation (4.8) is minimized with respect to the parameters. This results in m equations in N variables that need to be solved simultaneously.
The general maximum-likelihood method is a very powerful tool for data analysis. Indeed, whole courses and books have been devoted to understanding when and how to use the maximum-likelihood method (If you are interested in more information, see Data Reduction and Error Analysis for the Physical Sciences, by Philip R. Bevington and D. Keith Robinson). However, for most applications, it is more complicated than is needed. Instead, a special case of the maximum-likelihood method can be used, a method known as the least squares fitting method.
In the least squares fitting method, we have pairs of measurements (xi,yi) of an independent variable x and dependent variable y, and we wish to fit these measurements to a power series expansion
|
|
|
(4.9) |
If the measurements are normally distributed, then the probability of making a specific measurement is
|
|
|
(4.10) |
The probability of observing the set of measurements is then the product of the individual probabilities,
|
|
|
(4.11) |
In order to find the best values for the parameters ak, the probability in equation (4.11) must be a maximum. This is accomplished by finding those values that minimize the argument of the exponent, i.e., minimizing
|
|
|
(4.12) |
The magnitude of c2 is determined by four factors:
1) Fluctuations in the measured values of the variables yi;
2) The values assigned to the uncertainties si in the measured variables yi;
3) The selection of the analytical function y(x) as an approximation to the true function y0(x); and
4) The values of the parameters of the function y(x).
Our objective is to find the best values for these parameters.
To find the values of the coefficients that yield the minimum value of c2 we must set to zero the partial derivatives of c2 with respect to each of the parameters
|
|
|
(4.13) |
This yields a set of linear equations which must be solved for the coefficients ak.
Rather than focus on how to solve equations (4.13) in general, consider the special case of a linear relationship between x and y. In other words, let
|
|
y(x) = a + bx. |
(4.14) |
In this case, solution of equations (4.13) yield
|
|
|
(4.15) |
|
|
|
(4.16) |
Example:
An experiment to determine the half-life of an unknown radioactive source measures the counting rate as a function of time. From our general knowledge of radioactive decays, the counting rate as a function of time goes as
.
This can be turned into a linear equation by taking the natural logarithm of both sides to yield
.
Comparison with equation (4.14) yields
.
Then the half-life can be calculated using the program halflife.c.
How do we know when the result is a good fit? The chi-squared parameter gives a good indication of this. In general, the value of the chi-squared parameter should be approximately equal to the number of degrees of freedom,
|
|
|
(4.17) |
where N is the number of data points and m is the number of calculated parameters. Instead of working directly with the chi-squared value, it is often useful to look at the reduced chi-square, which is defined to be
|
|
|
(4.18) |
This makes interpretation of the results much easier. If the value of <c2> is much larger than 1, this indicates that the fit is a poor one. However, do not be deceived into thinking that this means very small values indicate an excellent fit. If <c2> is very small, this usually indicates that something is missing from the data. Either your understanding of the experiment is wrong or the uncertainties have been underestimated. Values of <c2> that are approximately one are the best and indicate a good fit between the theory and the data.
There are some limitations to the least squares method. When a curve is fitted by the method to a data set, the data must first be histogrammed. This means that the range over which the independent variable changes must be divided into intervals, and the number of data points in each interval counted. If the data varies linearly with the independent variable, this is not a problem. However, if the variation is more complicated, fine details of the variation which are important can be lost if the binning interval is too course. On the other hand, if the binning interval is too fine, the number of counts in each bin might be too small to justify the assumption that a Gaussian distribution can approximate the data. In general, the choice of bin width will be a compromise between the need for sufficient statistics to maintain a small relative error and the need to preserve interesting structure in the data.
How large should the bins be? If the experiment is a counting experiment, then the distribution
is actually a Poisson distribution. In
this case, a good first order approximation for the number of data points per
bin is at least ten. This is because
the difference between the probability predictions of the two distributions for
a mean value of 10 and a standard deviation of
is small. Thus, we may be fairly confident about the
results of a fit if none of the bins have less than 10 counts and if the
chi-square value is not relied upon too heavily.
Finally, it again needs to be emphasized that least squares fitting only works for curves that can be put into some form of polynomial. When this cant be accomplished, more general methods for searching the parameter space for the maximum likelihood must be used.
If we have a set of N points that we wish to fit with a polynomial, function theory shows that the highest order available to the polynomial is N-1. Fitting such a polynomial to the data would, in all likelihood, result in a poor curve. The polynomial will obviously coincide with the data points, but it may exhibit large oscillations between the points. In addition, if there are many points in the data set, the calculations to fit the polynomial can become very difficult.
So how do we fit a function to the points smoothly? Instead of using a N-1th order polynomial, we could break the curve into smaller segments and fit each one separately. However, if we are not careful, the curve becomes disjointed at the points where the segments are joined. This results in a plot that probably does not represent the function. What is truly needed is a method of stitching the segments together to insure they are continuous. This is the method known as cubic splines.
Suppose we choose to make a series of cubic fits to successive groups of data points. In order to produce a smooth curve, we not only want the curve to pass through the data points, but also require the first and second derivatives of the function to be continuous at each point. Thus, we can consider N-1 separate cubic polynomials for the N-1 intervals on the graph and, at each point, require that the value of the left-side polynomial, along with its first and second derivatives, match the value and derivatives of the right-side polynomial.
Begin by writing the Taylor series for the cubic polynomial for interval i, expanded about the point xi,
|
|
|
(4.19) |
where yi and yi signify the first and second derivatives evaluated at x = xi, and the third derivative has been replaced by its divided difference form, which is exact for a cubic function. Setting x = xi+1 = xi + h, substituting this into equation (4.19) to get yi+1, and taking the difference yields
|
|
|
(4.20) |
Performing this same calculation for yi-1 yields
|
|
|
(4.21) |
The continuity conditions are established by taking the first derivative of equation (4.19), then setting the derivatives equal at the boundaries (xi for the i-1 i boundary, and xi+1 for the i i+1 boundary). This yields
|
|
|
(4.22) |
and
|
|
|
(4.23) |
Notice that calculating the second derivative and applying the continuity conditions leads to an identity due to the fact that the use of the divided difference form for the third derivative assures continuity of the second derivative across the boundaries. The spline equation can be found by using equations (4.22) and (4.23) to eliminate the first derivatives from equations (4.20) and (4.21),
|
|
|
(4.24) |
The right hand side is proportional to the second differences in the tabulated data, and thus are all known.
The set of linear equations inherent in equation (4.24) can be solved for the second derivatives as long as we know the values y1 and yN. There are a number of ways these endpoint values can be determined. One possibility is to set them to 0, obtaining what are known as natural splines. Alternatively, if the second derivatives are known or can be calculated numerically, the true values can be used.
Instead of using a program to solve equation (4.24), a spreadsheet can be used. In this case, when the boundary values y1 and yN are supplied, the program will automatically readjust the variables until they stabilize at the appropriate solutions. However, care must be taken no matter what method is used. It is apparent from equation (4.24) that the choice of the second derivative at the boundary may have an important effect on the interpolation at the ends of the function, and a wrong choice can produce undesirable shapes at the end of the plot. Also, although the cubic spline method produces a smooth variation between the data points, with continuity of the function along with its first and second derivatives, it cannot guarantee that there will be no particular oscillations between the points.