Physics Lab
Uncertainty, Error Analysis, and All That

Last revised June 11, 2007

This is a page of information supporting the physics laboratory courses supervised by Prof. Cox in the Department of Physics/Geosciences at Texas A&M University-Kingsville.
Written by Prof. Cox; parts based on suggestions from Prof. Suson.


What is error analysis?

First, "error analysis" has nothing to do with mistakes; "uncertainty analysis" is a much more descriptive term. Further, such an analysis is completely separate from whether your data agree with theory or expectations. This analysis is entirely about how reliable YOUR results are, based on how you determined those results. It is not concerned with anyone else's results or anyone's expectations, just your results.

Uncertainty analysis starts at the level of your measurements. No measurement is perfectly precise, and an important part of honest measurement is communicating how precise that measurement is. In everyday usage, it is generally accepted that significant digits are enough to indicate the level of precision in a measurement. In a professional science lab, that convention is generally not good enough, and in the College Physics and University Physics laboratories here, it is not acceptable.

Before going further, you need to be aware of the difference between precision and accuracy. In brief, a value is precise if it can justifiably be expressed with a lot of significant digits; one measure of precision is simply the number of justifiable significant digits. A value is accurate if it agrees with what would be obtained using standard instruments and procedures. To illustrate: suppose your measuring instrument is a steel surveyor's tape, marked in millimeters, manufactured for use indoors at temperatures of 20-25 degrees Celsius. If you must use it to measure the exterior of a shed in Antarctica in winter, your result can be quite precise, with four significant digits, but it won't be accurate, because steel contracts appreciably over that large a temperature change. (However, if you also know the temperature where you were measuring, you can apply a correction to obtain a result which is both reasonably precise and reasonably accurate. In contrast, if you use the fact that the tape has contracted as an excuse for neglecting precision, no correction can ever provide a more useful result.) On the other hand, if you use the same tape indoors to measure the length of a lively puppy, your result will be accurate but not precise, because the puppy will be wiggling.

Uncertainty analysis is most important in situations where you are nearing the reliability limits of your measurement tools. If repeated measurements of the same quantity all come out the same, then either your instruments or your methods aren't sensitive enough for really fruitful investigation of something new. In this case, you should improve either your instruments or your methods until fluctuations do appear.

For an analogy, consider the casino business. What is the outcome of a single spin of a roulette wheel? If there were any way to predict it, casinos would very soon be out of business (or at least would stop offering roulette). But what will be the net outcome of the next hundred thousand spins? That is predictable enough that casinos are consistently profitable businesses despite all the expensive glitter and the big payouts that they advertise. They will keep, consistently, a very predictable number of thousands of dollars out of each and every million dollars wagered. But if you tracked the exact amount of money going to the casino for each successive million dollars wagered, you would see variations of at least several dollars between the house's share of, say, the first million wagered one day and the first million the next day. You probably won't see those fluctuations in a casino's daily report figures; you will see them easily if you can track the casino's cash transactions to the penny.

Single Measurements

Each separate measurement value has an uncertainty, which should be recorded along with the result of the measurement. Unless some other criterion is spelled out, the accepted meaning of this uncertainty is that you assert the following: a plot of the results of a large number of measurements should give a curve, usually a "bell" curve, whose mean (average) is your value and whose standard deviation is your measurement uncertainty estimate. In more familiar terms: the true value is most likely within this range of your result, but there is roughly a one-in-three chance that the true value is farther off.

Another approach to estimating your measurement uncertainty is this: in a 'normal' distribution, a standard bell curve, the area within one standard deviation of the mean is about 68%; the area within three standard deviations is about 99.7%, nearly a certainty. Hence if you determine that the result is, say, "certainly within 1 cm" of your best value, then that margin represents, presumably, a three-standard-deviation width, and a reasonable estimate of the one-standard-deviation uncertainty would thus be a third of that, or about 0.3 cm.
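A quick numerical check of these coverage fractions may help. The sketch below is only an illustration, not part of the lab procedure: it uses Python's standard `math.erf` to compute the fraction of a normal distribution lying within k standard deviations of the mean, then applies the divide-by-three rule to the "certainly within 1 cm" example.

```python
import math

def fraction_within(k):
    """Fraction of a normal (bell-curve) distribution lying within
    k standard deviations of the mean: erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {fraction_within(k):.4f}")

# "Certainly within 1 cm", treated as a three-standard-deviation width.
certain_margin_cm = 1.0
one_sigma_cm = certain_margin_cm / 3
print(f"one-standard-deviation estimate: {one_sigma_cm:.2f} cm")
```

The three fractions come out at roughly 68%, 95%, and 99.7%, and the one-standard-deviation estimate at 0.33 cm.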

Your initial knowledge of this uncertainty comes from how you conducted the measurement and from what your measuring tools are like. Usually you have to line up a ruler or equivalent with an object. (Reading a dial is an example; here the 'ruler' is the scale next to the needle and the 'object' is the needle.) How accurately can you determine how the object is aligned with respect to the marks on the ruler? This is frequently dependent on what your measurement comes out to; if there is a ruler mark at just the 'right' place, the uncertainty should be considerably smaller than when the nearest mark is a third of a division away. In either case, this accuracy is of course somewhat subjective; some people have better eyes than other people, or think they do. But usually there will be less than a factor-of-two difference between the most optimistic and most pessimistic estimates of such an alignment uncertainty, if the value being measured is steady.

If your measuring instrument has only a digital display, and that display is steady, then what you use as reading uncertainty is easy to determine; it is +/- one-half of the interval between adjacent possible readings. Thus, for a digital stopwatch which shows hundredths of a second, the reading uncertainty is +/- 0.005 s. However, I have seen a digital thermometer which showed only even integers for the Fahrenheit temperature; the uncertainty in a reading from it would be +/- 1 degree.
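The half-interval rule is simple enough to write as a one-line function. This is only a sketch; `reading_uncertainty` is a hypothetical name, and the two calls reproduce the stopwatch and thermometer cases above.

```python
def reading_uncertainty(step):
    """Reading uncertainty of a steady digital display (hypothetical helper):
    +/- half the interval between adjacent possible readings."""
    return step / 2

print(reading_uncertainty(0.01))  # stopwatch showing hundredths of a second
print(reading_uncertainty(2))     # thermometer showing only even integers (deg F)
```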

If your needle, your digital display, or the phenomenon you are measuring is not steady, then the uncertainty is almost certainly larger than for a similar measurement that is steady. In this case you need to judge the center and the extent of the fluctuations; the center gives the reading value and the extent of the fluctuations gives you the basis for estimating the uncertainty.

Another important concept to consider at this point is systematic uncertainty. Suppose your boss wants the most accurate estimate possible of the area of the shed that blew away in a hurricane. The data you have are a good set of multiple measurements taken with that surveyor's tape discussed earlier, but no one noted the temperature that day. The data will give you a precise estimate of area, including the uncertainty associated with the random scatter among repeated measurements, but your area calculation's accuracy estimate will have to also include an allowance for the effect of expansion or contraction of the tape. This is an example of "systematic uncertainty", an uncertainty affecting each measurement in a set in the same way, as opposed to random or statistical uncertainty, where the chance that one measurement may give a result that is too high is potentially (partially) offset by the chance that another measurement may make the result low.

Systematic uncertainties are the hardest to deal with. Repeating the measurements cannot reduce them, because these effects will be present in the same way each time, unlike random effects, which will be different each time and hence will tend to cancel somewhat in averages. Systematic uncertainties can only be determined through the use of controls and calibration runs. That is, every so often, you take an instrument off its usual task and use it to measure a standard object. Typically, it will register a value that is slightly off; you then adjust it. At the next check, it will probably be off again, but differently. Perhaps some contact surface is wearing away, a few molecules at a time, or it is picking up molecules from what it is contacting, or both. For the first data run after you adjust it, it should be pretty good; for the last data run before a recheck, the adjustment that the recheck calls for should give valid data; but in between, who knows just how big the effect will be? Your allowance for this goes into your estimated systematic uncertainty. The more calibration runs you can manage, the less such 'calibration drift' hurts your data, but the greater your costs are too. In practice, you reduce your estimated systematic uncertainties as much as you can afford through frequent calibration runs, but sometimes another run costs more (sometimes more money, sometimes just more time) than everything else involved, so your options may be limited.

Multiple Measurements

If your situation allows repeating your measurement, then you can obtain a much less subjective value for uncertainty, by analyzing the scatter in repeated measurements.

For obvious reasons, if you have multiple measurements of the same quantity, then their average should be a better estimate for that quantity than any one of the measurements. But the scatter in those measurements also tells you about what you're measuring.

However, the fact that you have multiple measurements does not necessarily mean that their average is a better result. Suppose you measure the current when each of three resistors, rated at 10, 20, and 30 ohms respectively, is connected to a battery. What does the average of the three currents tell you? Answer, nothing worth reporting. These may be three measurements of current, but they are not three measurements of the current under the same circumstances.

In our labs, sometimes measurements repeated many times will show no difference. Two conclusions follow from such observations: (1) the instrument is not as sensitive as one would ideally want; and (2) the uncertainty estimate you use will be the same as the uncertainty estimate for an individual reading, as described above.

If repetition of measurements does produce variation, then you can use statistical techniques. If you have, say, ten measured values, then there are two typical accuracy-related questions that you can answer. These questions are similar, but they are different. First, what is likely about any single new measurement? Second, what is likely about the average of any similar group of new measurements?

Underlying such predictions is an assumption that the variation in your measurements is indeed due to independent random effects. Then the frequencies of the various possible outcomes will follow a 'normal' curve (a bell curve), and so should your sample. In fact, the bell curve for your sample should be a good match for the bell curve of all possible measurement outcomes. Hence we expect you to calculate the sample standard deviation, as a measure of the uncertainty in any one measurement. (Note the following, though. The smaller the sample, the more likely it is that one or both extremes of the true distribution will not be represented in the sample. Hence a "sample standard deviation" for a sample containing N items is calculated with a denominator of N-1, while a "population standard deviation" for an entire population consisting of N items would use N.) This standard deviation is the estimate of how far off any INDIVIDUAL measurement may be from the true value: a deviation of over one standard deviation has about a one in three chance of occurring.
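As a sketch of the N-1 versus N distinction: Python's standard `statistics` module provides both forms, `statistics.stdev` (denominator N-1, the sample standard deviation) and `statistics.pstdev` (denominator N, the population standard deviation). The ten readings below are made-up values for illustration only.

```python
import math
import statistics

# Ten hypothetical repeated readings of one quantity (made-up values, in s).
readings = [2.31, 2.35, 2.28, 2.40, 2.33, 2.36, 2.30, 2.34, 2.38, 2.29]
n = len(readings)
mean = statistics.mean(readings)

s_sample = statistics.stdev(readings)       # denominator N - 1
s_population = statistics.pstdev(readings)  # denominator N

# The sample value again, written out from its definition.
s_by_hand = math.sqrt(sum((x - mean) ** 2 for x in readings) / (n - 1))

print(f"mean = {mean:.3f} s")
print(f"sample standard deviation (N-1) = {s_sample:.4f} s")
print(f"population standard deviation (N) = {s_population:.4f} s")
```

The sample value is always slightly larger, by a factor of sqrt(N/(N-1)), which matters less and less as N grows.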

However, this standard deviation does NOT say how far off your AVERAGE might be from whatever the true value is. While sample averages should show random variation from one sample to another, that variation should have a smaller range than the variation from one measurement to another. Statistical theorems assert that means from N-measurement samples will have a distribution which is square-root-of-N times narrower than the distribution of single measurements. (Provided the variations are random. If your data show no variation, then your uncertainties are controlled by instrument-reading uncertainties, not random uncertainties, and statistical analysis cannot improve that.) Hence, given a sample, and assuming the variation in your sample is random, your best guess for the true value is your sample average, and the uncertainty in that best value is not the standard deviation but the standard deviation of the mean, which we frequently refer to as "the uncertainty".
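A minimal sketch of the standard deviation of the mean, assuming the scatter is random; `std_of_mean` is a hypothetical helper that divides the sample standard deviation by the square root of N. The second part works backwards from the pole-fence wording further down this page: 0.12 m and 0.02 m differ by a factor of 6, which would correspond to 36 poles. That pole count is an assumption; the page does not state it.

```python
import math
import statistics

def std_of_mean(values):
    """Standard deviation of the mean (hypothetical helper name): the
    sample standard deviation divided by sqrt(N). Only meaningful if the
    scatter among the values is random."""
    return statistics.stdev(values) / math.sqrt(len(values))

# Small made-up data set, just to exercise the function.
example = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(f"std = {statistics.stdev(example):.3f}, "
      f"std of mean = {std_of_mean(example):.3f}")

# Pole-fence numbers: 0.12 m spread per pole, with an ASSUMED 36 poles.
per_pole_sd = 0.12   # m
n_poles = 36         # assumption: the page does not give the count
print(f"average height uncertainty = {per_pole_sd / math.sqrt(n_poles):.2f} m")
```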

Another way of distinguishing these measures is to think this way: If you take a really large number of measurements and plot them on a suitable scale, you will get a broad curve. The standard deviation describes how broad this curve is. The uncertainty, however, is supposed to describe how well you know where the true center of the curve is. If you have one experiment with a hundred data points and a similar one with ten thousand data points, the bell curves should be very similar, having the same width, but you should have a lot more confidence as to exactly where the center is when you have the ten thousand points.
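The hundred-versus-ten-thousand comparison can be tried with simulated data. This sketch draws values from an assumed bell curve (center 10, standard deviation 1, both arbitrary choices) using Python's `random.Random.gauss`; the seed is fixed only so that runs repeat.

```python
import math
import random
import statistics

rng = random.Random(12345)  # fixed seed so the run is repeatable

def simulate(n, true_center=10.0, width=1.0):
    """Draw n simulated measurements from a bell curve with the given
    center and standard deviation; return (mean, std, std of mean)."""
    data = [rng.gauss(true_center, width) for _ in range(n)]
    s = statistics.stdev(data)
    return statistics.mean(data), s, s / math.sqrt(n)

results = {n: simulate(n) for n in (100, 10_000)}
for n, (mean, s, sdom) in results.items():
    print(f"N={n:>6}: mean={mean:.3f}  std={s:.3f}  std of mean={sdom:.4f}")
```

Both runs should show a standard deviation near 1 (the width of the curve), while the standard deviation of the mean for 10,000 points should be roughly ten times smaller than for 100 points.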

This uncertainty, the standard deviation of the mean, also describes what is likely for the mean of a set of new measurements, just as the standard deviation describes what is likely for a single new measurement.

An example of a wording intended to make this distinction (and your understanding of it) clear to the reader (or grader) of your results is the following hypothetical statement about a fence consisting of vertical poles: "All the poles were measured, finding that they extend (2.34 +/- 0.12) m above ground. This gives an average height for the fence of (2.34 +/- 0.02) m."

Calculated Results: Propagation of Uncertainties

It is quite rare that a relevant experiment involves a direct measurement of the quantity we are interested in. Even such a familiar quantity as speed is an example. To the everyday driver, a radar gun seems to be an instrument which directly measures speed; a physicist knows that it doesn't measure speed at all, it compares frequencies, determining speed from the difference. If you try to measure speed using familiar equipment, the indirect nature of your result is more evident: you directly measure lengths and times, you then get speed from a calculation.

Once you know the uncertainties in your data values, you turn to determining the uncertainties in your results. This is "propagation of uncertainty", or "propagation of errors." If you need to be prepared for the worst, then estimating uncertainties in the results by combining maximum and minimum cases will do. But science does not usually treat worst cases in discussing experimental results; you are normally expected to quote and use one-standard-deviation uncertainties, and 'worse' values are expected to turn out to be true approximately one time in three.

When you are reporting a final answer whose uncertainty is essentially due to only one of the inputs, then determining the final uncertainty is simple. (For instance, if you are determining acceleration with wood meter sticks and computer-monitored photogates for timing, then the effect of timing uncertainty on your results is going to be pretty negligible. The uncertainty in $d/t^2$ will be $\delta d/t^2$, where $\delta d$ is the uncertainty in $d$, and the effect of the uncertainty in $t$ is negligible.)
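To see why timing hardly matters in that case, here is a sketch with invented numbers (not from this page): a distance of 1.000 m read to +/- 0.002 m from a meter stick, and a photogate time of 0.7500 s good to +/- 0.0001 s. Each input's effect on d/t^2 is computed separately: (uncertainty in d)/t^2 for the distance, and 2d(uncertainty in t)/t^3 for the time, from the derivative of d/t^2 with respect to t.

```python
# Invented example values (not from this page).
d, delta_d = 1.000, 0.002      # distance and its uncertainty, m
t, delta_t = 0.7500, 0.0001    # time and its uncertainty, s

q = d / t**2                   # the quantity d/t^2, in m/s^2

# Effect on d/t^2 of each input's uncertainty, considered separately.
effect_of_d = delta_d / t**2               # (uncertainty in d) / t^2
effect_of_t = 2 * d * delta_t / t**3       # |derivative w.r.t. t| * delta_t

print(f"d/t^2 = {q:.4f} m/s^2")
print(f"effect of d: {effect_of_d:.5f} m/s^2")
print(f"effect of t: {effect_of_t:.5f} m/s^2")
```

With these numbers the timing effect is several times smaller than the distance effect, so quoting (uncertainty in d)/t^2 alone loses very little.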
If you have several inputs whose uncertainties are appreciable, you can determine how much output uncertainty would come from each input uncertainty considered separately, but how do you combine the effects of different inputs?
Consider an illustration. Suppose a contractor has to do a lot of concrete sidewalks in a new neighborhood, ten a day for several weeks. The developer's plans specified they should all be 6 m by 2 m, so instead of measuring each time, the contractor cut some wood into 10 sets of 6-m and 2-m lengths, cutting two at a time, to use as frames. For each sidewalk, he used a longer pair to mark the length and a shorter pair for width; at the end of the day he went back and collected the wood to use again the next day for more sidewalks. The wood lengths were supposed to be identical, but later measurement found that because he was hurrying, one long pair was cut uniformly 2% too long, two pairs were uniformly 1% long, four pairs were right, two pairs were 1% short, and one pair was 2% short; and the same percentage pattern (roughly a bell curve) was true for the shorter board pairs.
First question: what are the mean and standard deviation of the long-board lengths? Answers: mean is 6.00 m; population standard deviation, as a percentage, is approximately 1.1%. And for the short boards? 2.00 m and 1.1%.
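The board-length answers can be checked directly. This sketch encodes the percentage pattern described above (one pair +2%, two +1%, four correct, two -1%, one -2%) and uses `statistics.pstdev`, the population form, since the ten pairs are the entire population.

```python
import statistics

# Percentage errors of the ten long-board pairs, as described above.
pct_errors = [2, 1, 1, 0, 0, 0, 0, -1, -1, -2]
lengths = [6.0 * (1 + p / 100) for p in pct_errors]

mean_length = statistics.mean(lengths)
pop_sd_pct = statistics.pstdev(pct_errors)

print(f"mean length = {mean_length:.2f} m")
print(f"population standard deviation = {pop_sd_pct:.2f} %")
```

It prints a mean of 6.00 m and a population standard deviation of about 1.10%; the same pattern applied to 2.0 m gives 2.00 m and the same percentage.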
Now assume that in ten days, 100 sidewalks, he happens to use each possible combination of 6-m and 2-m pairs once. (For a random sample, this outcome is extremely unlikely. But it describes exactly the population of all possible outcomes, so I use it for this calculation.) The mean area of the sidewalks will be 12 m², because the deviations are symmetric; what will be the standard deviation of the areas?
If all the short boards were right and only long boards were off, the 1.1% deviation in length would mean a 1.1% deviation in area; a similar calculation would apply for widths. Does that mean the areas will have a standard deviation of (1.1% + 1.1%), or 2.2%?
We can calculate. Longest "6-m" boards with longest "2-m" boards happened once in ten days; 6.12 m by 2.04 m gives 12.5 m², or 4% extra area. Longest "6-m" with next-to-longest widths happened twice, since there were two next-to-longest sets, and gives 3% extra area. Etc. Overall, +/-4% happened once each; +/-3% four times each; +/-2% 12 times each (this includes +/-2% with correct as well as +/-1% with +/-1%); +/-1% 20 times each; 0% happened 26 times. The population standard deviation for these hundred cases is about 1.55%. (Calculate it.) NOT 2.2%.
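Here is one way to do the suggested calculation, assuming (as in the tally above) that the percentage errors in length and width simply add, which is a good approximation for errors of a few percent. It enumerates all 100 combinations with `itertools.product`, tallies them, and computes the population standard deviation; for comparison it also multiplies the actual dimensions.

```python
import itertools
import math
from collections import Counter

# Percentage errors for the ten board pairs of each kind (same pattern
# for the "6-m" and "2-m" boards, as described above).
pct_errors = [2, 1, 1, 0, 0, 0, 0, -1, -1, -2]

# Every long-pair/short-pair combination used exactly once: 100 sidewalks.
# Adding the two percentages approximates (1 + a)(1 + b) - 1 for small errors.
area_pct = [a + b for a, b in itertools.product(pct_errors, repeat=2)]

tally = Counter(area_pct)
print(sorted(tally.items()))

pop_sd = math.sqrt(sum(x * x for x in area_pct) / len(area_pct))  # mean is 0
print(f"population standard deviation of areas: {pop_sd:.2f} %")

# For comparison: exact percentages from multiplying the actual dimensions.
exact_pct = [((1 + a / 100) * (1 + b / 100) - 1) * 100
             for a, b in itertools.product(pct_errors, repeat=2)]
exact_mean = sum(exact_pct) / len(exact_pct)
exact_sd = math.sqrt(sum((x - exact_mean) ** 2 for x in exact_pct) / len(exact_pct))
print(f"exact multiplication: {exact_sd:.2f} %")
```

The tally matches the counts above (1, 4, 12, 20, and 26), and both standard deviations come out at about 1.55%.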

General probability theory tells us that for independent random variations, the correct estimate of the overall output uncertainty is a Pythagorean combination of the effects of the uncertainties in the different inputs. In our example, uncertainty in length contributed 1.1% to uncertainty in area; uncertainty in width contributed another 1.1%. We combine these like combining components of a vector to get the magnitude; since in our case we had two equal components of 1.1% each, the magnitude is sqrt(2) times 1.1%, which agrees with our calculated result of about 1.55%.
In more general, more mathematical, terms, if input1's uncertainty, considered by itself, would cause an output uncertainty of d(Out1), input2's uncertainty would cause d(Out2), etc., AND if the uncertainties are both independent and random, then the proper overall uncertainty estimate for the output is obtained by treating all the d(Out)'s as components of a many-dimensional vector and finding the size of that vector.
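The two ways of combining output contributions can be sketched as two small functions (hypothetical names): quadrature, the vector-magnitude rule for independent random uncertainties, and straight addition, the pessimistic case discussed in the next paragraph. The inputs are the sidewalk contributions, using the unrounded value of about 1.095% (the square root of 1.2) in place of 1.1%.

```python
import math

def combine_independent(contributions):
    """Pythagorean (quadrature) combination of the separate output
    uncertainties d(Out1), d(Out2), ...: the length of the vector they
    form. Valid only when the input uncertainties are independent and random."""
    return math.sqrt(sum(c * c for c in contributions))

def combine_worst_case(contributions):
    """Straight addition: the pessimistic case in which every
    discrepancy pushes the output the same way."""
    return sum(abs(c) for c in contributions)

# Sidewalk example: equal contributions from length and width, in percent.
effects = [math.sqrt(1.2), math.sqrt(1.2)]
print(f"quadrature: {combine_independent(effects):.2f} %")
print(f"straight addition: {combine_worst_case(effects):.2f} %")
```

Quadrature gives about 1.55%, matching the enumeration; straight addition gives about 2.2%, which overstates the spread when the errors really are independent and random.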

If the uncertainties in the inputs to your calculations are not independent or not random, then your reporting should (1) indicate why you think they are not, and (2) discuss what you used to arrive at the uncertainty in your output instead of using the approaches described here. (The "error analysis" book which we expect you to have is a recommended resource for this type of case.) The most pessimistic estimate would be that all the discrepancies will always favor being off on the same side, in which case straight addition of the output uncertainty contributions would be realistic. But Murphy is seldom that completely against you.

