MAT2792-ABED_1: The Normal Curve

B. The Normal Curve

A histogram is a useful tool for visualizing data. However, reading specific details off a histogram is difficult. An alternative is to analyze a smooth curve that follows the histogram's shape. In this lesson you will see how analyzing data becomes simpler when the data is represented by a normal curve.

A normal curve, sometimes called the Gaussian curve or bell curve, is a symmetrical curve that represents a normal distribution.

The next section compares a histogram made from normal data to a normal curve. The histogram and normal curve are both based on the data below.

Value	6.5	7.5	8.5	9.5	10.5	11.5	12.5
Frequency	1	6	24	38	24	6	1

The mean, median, and mode of the data are all 9.5. The mean, median, and mode all occur in the middle of the data for a normal histogram and a normal curve.
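
As a quick check of the claim above, the following Python sketch expands the frequency table into its 100 individual data values and computes the three measures of center. The variable names are illustrative only and are not part of the lesson.

```python
from statistics import mean, median, multimode

# Frequency table from the lesson
values = [6.5, 7.5, 8.5, 9.5, 10.5, 11.5, 12.5]
frequencies = [1, 6, 24, 38, 24, 6, 1]

# Repeat each value as many times as its frequency says
data = [v for v, f in zip(values, frequencies) for _ in range(f)]

data_mean = mean(data)        # arithmetic average of all 100 values
data_median = median(data)    # middle of the sorted data
data_modes = multimode(data)  # list of the most frequent value(s)

print("count:", len(data))
print("mean:", data_mean)
print("median:", data_median)
print("mode(s):", data_modes)
```

Because the frequencies are symmetric about 9.5, all three measures land on the same value.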

A histogram is built from individual pieces of data, so you can think of it as a stack of equal-sized rectangles, one for each data value. Because the rectangles are the same size, you can compare areas by counting them. The histogram uses 100 pieces of data, so its total area is 100 rectangles, and each rectangle is 1% of the whole. Similarly, the total area under any normal curve is 1, the 'whole' of the data.

You can use the areas of smaller sections to make comparisons. Notice that the highlighted area in each of the following diagrams is 24% of the total area. This means that 24% of the data lies between 10 and 11.
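
The 24% figure can be checked for both shapes with a short Python sketch. It makes two assumptions: the bar recorded at 10.5 covers the interval from 10 to 11, and the normal curve has mean 9.5 and a standard deviation of exactly 1 (the lesson gives the standard deviation only as approximately 1). The helper normal_cdf is a hypothetical name, built on the standard error function.

```python
import math

def normal_cdf(x, mu, sigma):
    """Area under the normal curve to the left of x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

values = [6.5, 7.5, 8.5, 9.5, 10.5, 11.5, 12.5]
frequencies = [1, 6, 24, 38, 24, 6, 1]

# Histogram: rectangles in the 10-to-11 bar divided by all rectangles
total = sum(frequencies)
hist_fraction = frequencies[values.index(10.5)] / total

# Normal curve: area between 10 and 11 (assumed mean 9.5, assumed sd 1)
mu, sigma = 9.5, 1.0
curve_area = normal_cdf(11, mu, sigma) - normal_cdf(10, mu, sigma)

print(f"histogram fraction between 10 and 11: {hist_fraction:.2%}")
print(f"normal curve area between 10 and 11: {curve_area:.2%}")
```

Under these assumptions the two results agree to the nearest percent, which is why the smooth curve can stand in for the histogram.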

The standard deviation of this data is approximately 1. Notice that almost all of the data is within three standard deviations of the mean. (In this case, adding the standard deviation to the mean three times gives 12.5, and subtracting it from the mean three times gives 6.5.)
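
The standard deviation can be computed directly from the frequency table. The sketch below uses the population formula (dividing by the number of data values), which is an assumption, since the lesson does not say which formula it used. It then counts how much of the data falls within three standard deviations of the mean, using the rounded value of 1 as the lesson does.

```python
import math

values = [6.5, 7.5, 8.5, 9.5, 10.5, 11.5, 12.5]
frequencies = [1, 6, 24, 38, 24, 6, 1]

n = sum(frequencies)
data_mean = sum(v * f for v, f in zip(values, frequencies)) / n

# Population variance: average squared distance from the mean
variance = sum(f * (v - data_mean) ** 2 for v, f in zip(values, frequencies)) / n
sd = math.sqrt(variance)

# Range covered by three standard deviations, with sd rounded to 1
rounded_sd = round(sd)
low = data_mean - 3 * rounded_sd
high = data_mean + 3 * rounded_sd
count_within = sum(f for v, f in zip(values, frequencies) if low <= v <= high)
fraction_within = count_within / n

print(f"standard deviation: {sd:.4f}")
print(f"three-sd range: {low} to {high}")
print(f"fraction of data inside the range: {fraction_within:.0%}")
```

The exact standard deviation comes out slightly above 1, so rounding to 1 is a reasonable simplification for this data.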

One major difference between a histogram and a normal curve is that the data in a histogram has a left and a right endpoint, the extremes of the data provided. In the example, all data is between 6 and 13. In contrast, a normal curve has no left or right endpoint; it continues forever in both directions. When data is normally distributed, the chance of encountering a value more than three standard deviations from the mean is very low (about 0.3%), and the chance of a value more than five standard deviations from the mean is negligible.
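
To put numbers on "very low" and "negligible", the sketch below computes the area under a normal curve that lies more than k standard deviations from the mean, using the complementary error function from Python's math module. The function name prob_outside is hypothetical. The result does not depend on the particular mean or standard deviation, because the distance is measured in standard deviations.

```python
import math

def prob_outside(k):
    """Chance that a normally distributed value lies more than
    k standard deviations from the mean (both tails combined)."""
    return math.erfc(k / math.sqrt(2.0))

outside_3 = prob_outside(3)
outside_5 = prob_outside(5)

print(f"outside 3 standard deviations: {outside_3:.4%}")
print(f"outside 5 standard deviations: {outside_5:.6%}")
```

The tails never reach zero, but they shrink so quickly that values far from the mean almost never occur in practice.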