Last revised June 11, 2007
This is a page of information supporting the physics laboratory courses supervised by Prof. Cox in the Department of Physics/Geosciences at Texas A&M University-Kingsville.
Written by Prof. Cox; parts based on suggestions from Prof. Suson.
Uncertainty analysis starts at the level of your measurements. No measurement is perfectly precise, and it is an important part of honest measurement to communicate how precise that measurement is. In everyday usage, the precision of a measurement is usually considered adequately indicated by the number of significant digits quoted. In a professional science lab, that practice is generally not good enough, and in the College Physics and University Physics laboratories here, it is not acceptable.
Before going further, you need to be aware of the difference between precision and accuracy. In brief, a value is precise if it can justifiably be expressed with a lot of significant digits; one measure of precision is simply the number of justifiable significant digits. A value is accurate if its result agrees with what would be obtained using standard instruments and procedures. To illustrate: suppose your measuring instrument is a steel surveyor's tape, marked in millimeters, manufactured for use in indoor locations at temperatures of 20-25 degrees Celsius. If you must use it to measure the exterior of a shed in Antarctica in winter, your result can be quite precise, with four significant digits, but it won't be accurate because steel contracts appreciably over that much temperature change. (However, if you also know the temperature where you were measuring, you can apply a correction to obtain a result which is both reasonably precise and reasonably accurate. In contrast, if you use the fact that the tape has contracted as an excuse to justify neglecting precision, no correction can ever provide a more useful result.) On the other hand, if you use the same tape indoors to measure the length of a lively puppy, your result will be accurate but not precise, because the puppy will be wiggling.
Uncertainty analysis is most important in situations where you should be nearing the reliability limits of your measurement tools. If repeated measurements of the same quantity don't come out different, then either your instruments or your methods aren't sensitive enough for really fruitful investigation of something new. In this case, you should pursue one of two options: improve either your instruments or your methods, until fluctuations do appear.
For an analogy, consider the casino business. What is the outcome of a single spin of a roulette wheel? If there were any way to predict it, casinos would be out of business (or at least would no longer offer roulette) very soon. But what will be the net outcome of the next hundred thousand spins? This is sure enough that casinos are consistent money-making businesses despite all the expensive glitter and the big payouts that they advertise. They will keep, consistently, a very predictable number of thousands of dollars out of each and every million dollars wagered. But if you track the exact amount of money going to the casino for each successive million dollars wagered, you would see variations at the level of at least several dollars between the house's shares of, say, the first million wagered one day and the first million the next day. You probably won't see those fluctuations by looking at casino daily report figures; you will see them easily if you have access to track the casino's cash transactions to the penny.
Another approach to estimating your measurement uncertainty is this: In a 'normal' distribution, a standard bell curve, the area within one standard deviation of the mean is about 68%; the area within three standard deviations is about 99.7%: nearly a certainty. Hence if you determine that the result is, say, "certainly within 1 cm" of your best value, then that margin represents, presumably, a three-standard-deviation width, and a reasonable estimate of the one-standard-deviation uncertainty would thus be a third of that, or about 0.3 cm.
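The bell-curve areas quoted here can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names are my own, chosen for illustration.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))

def sigma_from_near_certain_margin(margin):
    """Rule of thumb from the text: treat a 'certainly within' margin as three standard deviations."""
    return margin / 3.0

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {100 * fraction_within(k):.1f}%")

# A result that is "certainly within 1 cm" suggests a one-standard-deviation
# uncertainty of about a third of a centimeter.
print(f"estimated uncertainty: {sigma_from_near_certain_margin(1.0):.2f} cm")
```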
Your initial knowledge of this uncertainty comes from how you conducted the measurement and from what your measuring tools are like. Usually you have to line up a ruler or equivalent with an object. (Reading a dial is an example; here the 'ruler' is the scale next to the needle and the 'object' is the needle.) How accurately can you determine how the object is aligned with respect to the marks on the ruler? This is frequently dependent on what your measurement comes out to; if there is a ruler mark at just the 'right' place, the uncertainty should be considerably smaller than when the nearest mark is a third of a division away. In either case, this accuracy is of course somewhat subjective; some people have better eyes than other people, or think they do. But usually there will be less than a factor-of-two difference between the most optimistic and most pessimistic estimates of such an alignment uncertainty, if the value being measured is steady.
If your measuring instrument has only a digital display, and that display is steady, then what you use as reading uncertainty is easy to determine; it is +/- one-half of the interval between adjacent possible readings. Thus, for a digital stopwatch which shows hundredths of a second, the reading uncertainty is +/- 0.005 s. However, I have seen a digital thermometer which showed only even integers for the Fahrenheit temperature; the uncertainty in a reading from it would be +/- 1 degree.
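As a small illustration of the half-interval rule for a steady digital display, here is a sketch in Python. The function name is hypothetical; the two cases are the stopwatch and the thermometer described above.

```python
def digital_reading_uncertainty(step):
    """Half the interval between adjacent possible readings of a steady digital display."""
    return step / 2.0

# Stopwatch showing hundredths of a second: adjacent readings differ by 0.01 s.
print(digital_reading_uncertainty(0.01))

# Thermometer showing only even integers: adjacent readings differ by 2 degrees.
print(digital_reading_uncertainty(2.0))
```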
If your needle, or your digital display, or the phenomenon you are measuring, is not steady, then the uncertainty is almost certainly larger than for a similar measurement that is steady. In this case you need to judge the center and the extent of the fluctuations; the center gives the reading value and the extent of the fluctuations gives you the basis for estimating the uncertainty.
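One possible way to record the center and extent of a fluctuating reading is sketched below. The numbers are hypothetical, and the text does not prescribe how to convert the extent into a one-standard-deviation uncertainty; the half-range computed here is only a starting point for that judgment.

```python
def center_and_half_range(low, high):
    """Return (center, half-range) for a reading seen to fluctuate between low and high."""
    center = (low + high) / 2.0
    half_range = (high - low) / 2.0
    return center, half_range

# Hypothetical needle swinging between 4.6 and 5.0 scale units.
value, extent = center_and_half_range(4.6, 5.0)
print(f"reading value {value:.2f}, fluctuations extend about {extent:.2f} either side")
```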
Another important concept to consider at this point is systematic uncertainty. Suppose your boss wants the most accurate estimate possible of the area of the shed that blew away in a hurricane. The data you have are a good set of multiple measurements taken with that surveyor's tape discussed earlier, but no one noted the temperature that day. The data will give you a precise estimate of area, including the uncertainty associated with the random scatter among repeated measurements, but your area calculation's accuracy estimate will have to also include an allowance for the effect of expansion or contraction of the tape. This is an example of "systematic uncertainty", an uncertainty affecting each measurement in a set in the same way, as opposed to random or statistical uncertainty, where the chance that one measurement may give a result that is too high is potentially (partially) offset by the chance that another measurement may make the result low.
Systematic uncertainties are the hardest to deal with. Repeating the measurements cannot be used to reduce them, because these effects will be present in the same way each time, unlike random effects which will be different each time and hence will tend to cancel somewhat in averages. Systematic uncertainties can only be determined through use of controls and calibration runs. That is, every so often, you take an instrument off its usual task and use it to measure a standard object. Typically, it will register a value that is slightly off; you then adjust it. At the next check, probably it will be off again, but differently. Perhaps some contact surface is wearing away, a few molecules at a time, or it is picking up molecules from what it is contacting, or both. For the first data run after you adjust it, it should be pretty good; for the last data run before you do a recheck, the adjustment that the recheck calls for should give valid data; but in between, who knows just how big the effect will be? Your allowance for this goes into your estimated systematic uncertainty. The more test runs you can manage, the less such 'calibration drift' hurts your data, but the greater your costs are too. In practice, you reduce your estimated systematic uncertainties as much as you can afford through frequent calibration runs, but sometimes doing another run costs more (sometimes more money, sometimes just more time) than everything else involved, so your options may be limited.
For obvious reasons, if you have multiple measurements of the same quantity, then their average should be a better estimate for that quantity than any one of the measurements. But the scatter in those measurements also tells you about what you're measuring.
However, the fact that you have multiple measurements does not necessarily mean that their average is a better result. Suppose you measure the current when each of three resistors, rated at 10, 20, and 30 ohms respectively, is connected to a battery. What does the average of the three currents tell you? Answer, nothing worth reporting. These may be three measurements of current, but they are not three measurements of the current under the same circumstances.
In our labs, sometimes measurements repeated many times will show no difference. Two conclusions follow from such observations: (1) the instrument is not as sensitive as one would ideally want; and (2) the uncertainty estimate you use will be the same as the uncertainty estimate for an individual reading, as described above.
If repetition of measurements does produce variation, then you can use statistical techniques. If you have, say, ten measured values, then there are two typical accuracy-related questions that you can answer. These questions are similar, but they are different. First, what is likely about any single new measurement? Second, what is likely about the average of any similar group of new measurements?
Underlying such predictions is an assumption that the variation in your measurements is indeed due to independent random effects. Then the frequencies of the various possible outcomes will follow a 'normal' curve (a bell curve), and so should your sample. In fact, the bell curve for your sample should be a good match for the bell curve of all possible measurement outcomes. Hence we expect you to calculate the sample standard deviation, as a measure of the uncertainty in any one measurement. (Note the following, though. The smaller the sample, the more likely it is that one or both extremes of the true distribution will not be represented in the sample. Hence a "sample standard deviation" for a sample containing N items is calculated with a denominator of N-1, while a "population standard deviation" for an entire population consisting of N items would use N.) This standard deviation is the estimate of how far off any INDIVIDUAL measurement may be from the true value: a deviation of over one standard deviation has about a one in three chance of occurring.
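The difference between the N-1 and N denominators can be seen with Python's standard `statistics` module, where `stdev` uses N-1 and `pstdev` uses N. The readings below are invented purely for illustration.

```python
import math
import statistics

# Hypothetical repeated readings of the same quantity.
readings = [9.78, 9.82, 9.81, 9.79, 9.80]
n = len(readings)

mean = statistics.mean(readings)
sample_sd = statistics.stdev(readings)       # denominator N-1: use for a sample
population_sd = statistics.pstdev(readings)  # denominator N: use for an entire population

print(f"mean = {mean:.3f}")
print(f"sample standard deviation (N-1)   = {sample_sd:.4f}")
print(f"population standard deviation (N) = {population_sd:.4f}")

# The two differ by a factor of sqrt(N / (N-1)), which matters most for small N.
ratio = sample_sd / population_sd
print(f"ratio = {ratio:.4f}, sqrt(N/(N-1)) = {math.sqrt(n / (n - 1)):.4f}")
```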
However, this standard deviation does NOT say how far off your AVERAGE might be from whatever the true value is. While sample averages should show random variation from one sample to another, that variation should have a smaller range than the variation from one measurement to another. Statistical theorems assert that means from N-measurement samples will have a distribution which is square-root-of-N times narrower than the distribution of single measurements. (Provided the variations are random. If your data show no variation, then your uncertainties are controlled by instrument-reading uncertainties, not random uncertainties, and statistical analysis cannot improve that.) Hence, given a sample, and assuming the variation in your sample is random, your best guess for the true value is your sample average, and the uncertainty in that best value is not the standard deviation but the standard deviation of the mean, which we frequently refer to as "the uncertainty".
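A minimal sketch of the standard deviation of the mean follows, assuming the variation really is random as the paragraph requires. The function name and the data are hypothetical.

```python
import math
import statistics

def standard_deviation_of_mean(values):
    """Sample standard deviation divided by sqrt(N): the uncertainty in the average."""
    return statistics.stdev(values) / math.sqrt(len(values))

# Hypothetical repeated measurements of one quantity.
values = [1.0, 2.0, 3.0, 4.0]

best_value = statistics.mean(values)
single_measurement_sd = statistics.stdev(values)
uncertainty = standard_deviation_of_mean(values)

print(f"best value                    = {best_value:.3f}")
print(f"standard deviation (one item) = {single_measurement_sd:.3f}")
print(f"standard deviation of mean    = {uncertainty:.3f}")
```

With four measurements the uncertainty in the average is half the standard deviation of a single measurement, since sqrt(4) = 2.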
Another way of distinguishing these measures is to think this way: If you take a really large number of measurements and plot them on a suitable scale, you will get a broad curve. The standard deviation describes how broad this curve is. The uncertainty, however, is supposed to describe how well you know where the true center of the curve is. If you have one experiment with a hundred data points and a similar one with ten thousand data points, the bell curves should be very similar, having the same width, but you should have a lot more confidence as to exactly where the center is when you have the ten thousand points.
This uncertainty, the standard deviation of the mean, also describes what is likely for the mean of a set of new measurements, just as the standard deviation describes what is likely for a single new measurement.
An example of a wording intended to make this distinction (and your understanding of it) clear to the reader (or grader) of your results is the following hypothetical statement about a fence consisting of vertical poles: "All the poles were measured, finding that they extend (2.34 +/- 0.12) m above ground. This gives an average height for the fence of (2.34 +/- 0.02) m."
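The two uncertainties in the fence statement are related by the square-root-of-N rule. The statement does not say how many poles there were; the sketch below only shows what count would be consistent with the two quoted numbers, assuming the second is the first divided by sqrt(N).

```python
def implied_sample_size(std_dev, std_dev_of_mean):
    """N such that std_dev / sqrt(N) equals std_dev_of_mean."""
    return (std_dev / std_dev_of_mean) ** 2

# 0.12 m scatter among individual poles, 0.02 m uncertainty in the average height.
n_poles = round(implied_sample_size(0.12, 0.02))
print(n_poles)
```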
Once you know the uncertainties in your data values, you turn to determining the uncertainties in your results. This is "propagation of uncertainty", or "propagation of errors." If you need to be prepared for the worst, then estimating uncertainties in the results by combining maximum and minimum cases will do. But science does not usually treat worst cases in discussing experimental results; you are normally expected to quote and use one-standard-deviation uncertainties, and 'worse' values are expected to turn out to be true approximately one time in three.
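The worst-case approach of combining maximum and minimum cases can be sketched as follows. The helper name and the input values are hypothetical, and the method brackets the output only when the extreme outputs occur at extreme inputs, as they do for a simple product.

```python
from itertools import product

def worst_case_range(func, inputs):
    """Evaluate func at every combination of (value - u) and (value + u).

    inputs is a list of (value, uncertainty) pairs. Returns (lowest, highest) output.
    """
    corners = product(*[(value - u, value + u) for value, u in inputs])
    outputs = [func(*corner) for corner in corners]
    return min(outputs), max(outputs)

def area(length, width):
    return length * width

# Hypothetical rectangle: 6.00 m +/- 0.06 m by 2.00 m +/- 0.02 m.
low, high = worst_case_range(area, [(6.00, 0.06), (2.00, 0.02)])
print(f"area lies between {low:.2f} m^2 and {high:.2f} m^2 in the worst case")
```

This gives a range, not a one-standard-deviation uncertainty, so it is not what you would normally quote in a lab report.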
When you are reporting a final answer whose uncertainty is
essentially due to only one of the inputs, then determining
the final uncertainty is simple. (For instance, if
you are determining acceleration with wood meter sticks and
computer-monitored photogates for timing, then the effect of
timing uncertainty on your results is going to be pretty
negligible. The uncertainty in (d/t^2) will be
(uncertainty in d)/t^2, and the effect of uncertainty
in t is negligible.)
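Here is a sketch of that single-input case. All the numbers are invented for illustration, and the timing contribution uses the standard derivative of d/t^2 with respect to t, which is not worked out in the text above.

```python
def uncertainty_from_distance(d_unc, t):
    """Effect on d/t^2 of the uncertainty in d alone: (uncertainty in d) / t^2."""
    return d_unc / t**2

def uncertainty_from_time(d, t, t_unc):
    """Effect on d/t^2 of the uncertainty in t alone: 2 * d * (uncertainty in t) / t^3."""
    return 2.0 * d * t_unc / t**3

# Hypothetical values: meter-stick distance and photogate timing.
d, d_unc = 1.000, 0.002      # meters
t, t_unc = 0.500, 0.00001    # seconds

result = d / t**2
from_d = uncertainty_from_distance(d_unc, t)
from_t = uncertainty_from_time(d, t, t_unc)

print(f"d/t^2 = {result:.3f} m/s^2")
print(f"contribution from d: {from_d:.5f} m/s^2")
print(f"contribution from t: {from_t:.5f} m/s^2 (negligible by comparison)")
```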
If you have several inputs whose uncertainties are appreciable,
you can determine how much output uncertainty would come from
each input uncertainty considered separately, but how do you
combine the effects of different inputs?
Consider an illustration. Suppose a contractor has to do a
lot of concrete sidewalks in a new neighborhood, ten a day
for several weeks. The developer's plans specified they
should all be 6 m by 2 m, so instead of measuring each time,
the contractor cut some wood into 10 sets of 6-m and 2-m lengths,
cutting two at a time, to use as frames. For each sidewalk,
he used a longer pair to mark the length and a shorter pair
for width; at the end of the day he went back and collected
the wood to use again the next day for more sidewalks. The
wood lengths were supposed to be identical, but later
measurement found that because he was hurrying, one long
pair was cut uniformly 2% too long, two pairs were uniformly
1% long, four pairs were right, two pairs were 1% short, and
one pair was 2% short; and the same percentage pattern
(roughly a bell curve) was true for the shorter board pairs.
First question: what are the mean and standard deviation of
the long-board lengths? Answers: mean is 6.00 m; population
standard deviation, as a percentage, is approximately 1.1%.
And for the short boards? 2.00 m and 1.1%.
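These answers can be checked with a few lines of Python. The list of percentage errors encodes the ten board pairs described above; the variable names are my own.

```python
import statistics

# One pair 2% long, two pairs 1% long, four right, two 1% short, one 2% short.
percent_errors = [2, 1, 1, 0, 0, 0, 0, -1, -1, -2]

long_boards = [6.00 * (1 + p / 100) for p in percent_errors]
short_boards = [2.00 * (1 + p / 100) for p in percent_errors]

mean_long = statistics.mean(long_boards)
mean_short = statistics.mean(short_boards)

# Population standard deviation (denominator N), expressed as a percentage of the mean.
rel_sd_long = 100 * statistics.pstdev(long_boards) / mean_long
rel_sd_short = 100 * statistics.pstdev(short_boards) / mean_short

print(f"long boards:  mean {mean_long:.2f} m, standard deviation {rel_sd_long:.2f}%")
print(f"short boards: mean {mean_short:.2f} m, standard deviation {rel_sd_short:.2f}%")
```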
Now assume that in ten days, 100 sidewalks, he happens to use
each possible combination of 6-m and 2-m pairs once. (For a
random sample, this outcome is extremely unlikely. But it
describes exactly the population of all possible outcomes,
so I use it for this calculation.) The mean
area of the sidewalks will be 12 m^2, because the
deviations are symmetric; what will be the standard deviation
of the areas?
If all the short boards were right and only long boards were
off, the 1.1% deviation in length would mean a 1.1% deviation
in area; a similar calculation would apply for widths. Does
that mean the areas will have a standard deviation of
(1.1% + 1.1%), or 2.2%?
We can calculate. Longest "6-m" boards with longest "2-m"
boards happened once in ten days; 6.12 m by 2.04 m gives
12.5 m^2 or 4% extra area. Longest "6-m" with
next-to-longest widths happened twice, since there were two
next-to-longest sets, and gives 3% extra
area. Etc. Overall, +/-4% happened once each; +/-3% four times
each; +/-2% 12 times each (this includes +/-2% with correct as
well as +/-1% with +/-1%); +/-1% 20 times each; 0% happened
26 times. The population standard deviation for these hundred
cases is about 1.55%. (Calculate it.) NOT 2.2%.
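For anyone who would rather let a computer do the hundred cases, here is a sketch that builds every length-width combination and computes the population standard deviation of the areas. It uses exact products, so it includes the small second-order terms that the percentage bookkeeping above ignores.

```python
from itertools import product
import statistics

percent_errors = [2, 1, 1, 0, 0, 0, 0, -1, -1, -2]
lengths = [6.00 * (1 + p / 100) for p in percent_errors]
widths = [2.00 * (1 + p / 100) for p in percent_errors]

# Each of the 10 long pairs used once with each of the 10 short pairs.
areas = [length * width for length, width in product(lengths, widths)]

mean_area = statistics.mean(areas)
rel_sd_area = 100 * statistics.pstdev(areas) / mean_area

print(f"{len(areas)} sidewalks, mean area {mean_area:.2f} m^2")
print(f"population standard deviation of area: {rel_sd_area:.2f}%")
```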
General probability theory tells us that for independent
random variations, the correct estimate of the overall
output uncertainty is a Pythagorean combination of the
effects of the uncertainties in the different inputs. In
our example, uncertainty in length contributed 1.1% to
uncertainty in area; uncertainty in width contributed
another 1.1%. We combine these like combining components
of a vector to get the magnitude; since in our case
we had two equal components of 1.1% each, the magnitude is
sqrt(2) times 1.1%, which agrees with our calculated result of
about 1.55%.
In more general, more mathematical, terms, if
input1's uncertainty, considered by itself, would cause
an output uncertainty of d(Out1), input2's
uncertainty would cause d(Out2), etc., AND if the
uncertainties are both independent and random, then the
proper overall uncertainty estimate for the output is obtained
by treating all the d(Out)'s as components of a
many-dimensional vector and finding the size of that vector.
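The vector-magnitude rule can be written as a short function. This is a sketch that assumes, as stated above, that the contributions are independent and random; the function name is hypothetical.

```python
import math

def combine_in_quadrature(contributions):
    """Size of the vector whose components are the individual d(Out) contributions."""
    return math.sqrt(sum(c * c for c in contributions))

# Sidewalk example: length and width each contribute 1.1% to the area uncertainty.
area_uncertainty_percent = combine_in_quadrature([1.1, 1.1])
print(f"{area_uncertainty_percent:.2f}%")

# The same function handles any number of inputs.
three_inputs = combine_in_quadrature([3.0, 4.0, 12.0])
print(three_inputs)
```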
If the uncertainties in the inputs to your calculations are not independent or not random, then your reporting should (1) indicate why you think they are not, and (2) discuss what you used to arrive at the uncertainty in your output instead of using the approaches described here. (The "error analysis" book which we expect you to have is a recommended resource for this type of case.) The most pessimistic estimate would be that all the discrepancies will always favor being off on the same side, in which case straight addition of the output uncertainty contributions would be realistic. But Murphy is seldom that completely against you.
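To see how much the most pessimistic estimate differs from the independent-and-random estimate, the sketch below compares straight addition with the Pythagorean combination for the sidewalk numbers. The function names are my own.

```python
import math

def straight_sum(contributions):
    """Most pessimistic case: every contribution pushes the output the same way."""
    return sum(abs(c) for c in contributions)

def quadrature_sum(contributions):
    """Independent random case: contributions combine like vector components."""
    return math.sqrt(sum(c * c for c in contributions))

contributions = [1.1, 1.1]  # percent, from length and from width

pessimistic = straight_sum(contributions)
independent = quadrature_sum(contributions)

print(f"straight addition:       {pessimistic:.2f}%")
print(f"Pythagorean combination: {independent:.2f}%")
```

Straight addition is never smaller than the Pythagorean combination, so it serves as an upper bound when you cannot justify independence.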