When we calculate a statistic, for example a mean, a variance, a proportion, or a correlation coefficient, there is no reason to expect that this point estimate will exactly equal the true population value, even as the sample size grows. Sampling always introduces some inaccuracy, or error. Most Six Sigma projects calculate at least some descriptive statistics from sample data, and those statistics cannot be assumed to equal the population's true mean, variance, or proportion.

In many situations it is preferable instead to state an interval in which we would expect to find the true population value. This interval is called an interval estimate. A confidence interval is an interval, calculated from the sample data, that is very likely to cover the unknown mean, variance, or proportion. For example, suppose that after a process improvement a sample shows that yield has improved from 78% to 83%. What is the interval in which the population's yield lies? If the lower end of that interval is 78% or less, you cannot claim with statistical confidence that the process has significantly improved.

The difference between a sample statistic and the population value it estimates is the error of estimation. The standard error measures how much the statistic typically varies from sample to sample, and the margin of error (the half-width of the confidence interval) is a multiple of that standard error chosen to reach the desired confidence level. The next page shows a decision tree for selecting which formula to use in each situation. For example, if you are dealing with a sample mean, you do not know the population's true variance (standard deviation squared), and the sample size is less than 30, then you use the t distribution confidence interval. Each of these applications will be shown in turn.
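The formula-selection rule for a mean can be sketched in Python. This is a minimal, hedged illustration, not a reproduction of the decision tree mentioned above: it covers only the mean and assumes the common rule of thumb (z interval if σ is known, t interval if σ is unknown and n < 30, z interval with the sample standard deviation otherwise). The function names (`t_cdf`, `t_critical`, `z_critical`, `mean_confidence_interval`) and the cycle-time data are hypothetical. To stay within the standard library, the t critical value is found by integrating the t density numerically and bisecting; in practice a statistics package would normally supply it.

```python
import math
from statistics import NormalDist, mean, stdev


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x) for x >= 0, using Simpson's rule on the t density over [0, x]."""
    coef = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u: float) -> float:
        return coef * (1 + u * u / df) ** (-(df + 1) / 2)

    h = x / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3  # the density is symmetric, so P(T <= 0) = 0.5


def t_critical(confidence: float, df: int) -> float:
    """Two-sided critical value t* with P(-t* < T < t*) = confidence."""
    target = 0.5 + confidence / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # widen the bracket until it contains t*
        hi *= 2
    for _ in range(50):  # bisection on the monotone CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def z_critical(confidence: float) -> float:
    """Two-sided standard normal critical value (about 1.96 for 95%)."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def mean_confidence_interval(data, confidence=0.95, sigma=None):
    """Return (method, lower, upper) for the population mean.

    Rule of thumb assumed here: z if sigma is known; t with n - 1 degrees of
    freedom if sigma is unknown and n < 30; otherwise z with the sample s.
    """
    n = len(data)
    xbar = mean(data)
    if sigma is not None:
        method, crit, spread = "z", z_critical(confidence), sigma
    elif n < 30:
        method, crit, spread = "t", t_critical(confidence, n - 1), stdev(data)
    else:
        method, crit, spread = "z", z_critical(confidence), stdev(data)
    margin = crit * spread / math.sqrt(n)  # margin of error = critical value x standard error
    return method, xbar - margin, xbar + margin


# Hypothetical cycle-time sample (10 observations, sigma unknown), so the t branch is used
cycle_times = [12.1, 11.8, 12.6, 12.0, 11.5, 12.3, 12.9, 11.7, 12.2, 12.4]
print(mean_confidence_interval(cycle_times))
```

The margin of error is the critical value times the standard error, so a small sample (larger t multiplier, fewer observations) or a noisier process both widen the interval.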

## Confidence Intervals in Six Sigma Methodology

Confidence intervals are very important to Six Sigma methodology. To understand them better, consider this example scenario: Acme Nelson, a leading market research firm, conducts a survey of voters in the USA, asking them whom they would vote for if the election were held today. The answer is a big surprise! In addition to the Democratic and Republican candidates, a surprise independent candidate, John Doe, is expected to secure 22% of the vote. We asked Acme how sure they are; in other words, how accurate is this prediction? Their answer: “Well, we are 95% confident that John Doe will get 22% of the vote, plus or minus 2%.” In statistical terms, they are saying that the interval from 20% to 24% (the confidence interval, also known as the confidence range) was produced by a method that captures the true vote share in 95% of repeated samples (the confidence level).
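Acme's “22% plus or minus 2%” can be related to the usual normal-approximation (Wald) interval for a proportion, p̂ ± z·√(p̂(1 - p̂)/n). The scenario does not say how many voters were surveyed, so this sketch, under that assumption, works backwards to the sample size that would give a ±2% margin at 95% confidence. The formula choice and the helper names (`z_critical`, `proportion_interval`, `sample_size_for_margin`) are illustrative, not Acme's actual method.

```python
import math
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Two-sided standard normal critical value, about 1.96 for 95%."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def proportion_interval(p_hat: float, n: int, confidence: float = 0.95):
    """Normal-approximation (Wald) interval for a population proportion."""
    margin = z_critical(confidence) * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def sample_size_for_margin(p_hat: float, margin: float, confidence: float = 0.95) -> int:
    """Smallest n whose Wald margin of error is at most `margin` (hypothetical helper)."""
    z = z_critical(confidence)
    return math.ceil(z * z * p_hat * (1 - p_hat) / margin ** 2)


n = sample_size_for_margin(0.22, 0.02)  # hypothetical: Acme's real sample size is not stated
low, high = proportion_interval(0.22, n)
print(n, round(low, 3), round(high, 3))
```

Because the margin shrinks with √n, tightening it from ±2% to ±1% would need roughly four times as many respondents.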

## Definition of Confidence Intervals

According to the University of Glasgow Department of Statistics, a confidence interval is defined as follows: “A confidence interval gives an estimated range of values which is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data. If independent samples are taken repeatedly from the same population, and a confidence interval calculated for each sample, then a certain percentage (confidence level) of the intervals will include the unknown population parameter. Confidence intervals are usually calculated so that this percentage is 95%, but we can produce 90%, 99%, 99.9% (or whatever) confidence intervals for the unknown parameter.”

In our Acme research example:

- The confidence interval is the range 20% to 24%
- The confidence level is 95%
- The confidence limits are 20% (lower limit) and 24% (upper limit)
- The unknown population parameter is “What percentage of the total vote will John Doe get?”
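The repeated-sampling definition quoted above can be checked by simulation. This sketch assumes, purely for illustration, a true vote share of 22% and independent polls of 1,000 voters each; it draws many polls, builds a 95% normal-approximation interval from each, and counts how often the interval covers the true value. The function name `coverage_rate` and its default parameters are hypothetical. The long-run fraction should land close to, though not exactly at, 95%.

```python
import math
import random
from statistics import NormalDist


def coverage_rate(true_p=0.22, n=1000, confidence=0.95, trials=2000, seed=1):
    """Fraction of simulated polls whose Wald interval contains true_p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    hits = 0
    for _ in range(trials):
        # One simulated poll: each of n voters backs the candidate with probability true_p
        votes = sum(rng.random() < true_p for _ in range(n))
        p_hat = votes / n
        margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= true_p <= p_hat + margin:
            hits += 1
    return hits / trials


print(coverage_rate())
```

Any single interval either contains the true value or it does not; the 95% describes how often the procedure succeeds over many repetitions, which is the sense of the confidence level in the Acme example.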