Why do we use box and whisker plots?
A box and whisker plot (sometimes called a boxplot) is a graph that presents
information from a five-number summary. It does not show a distribution in as
much detail as a stem and leaf plot or histogram does, but it is especially useful
for indicating whether a distribution is skewed and whether there are potential
unusual observations (outliers) in the data set. Box and whisker plots are also very useful when large numbers of observations are involved and when two or more data sets are being compared. They are ideal for comparing distributions because the centre, spread and overall range are immediately apparent.

A box and whisker plot is a way of summarizing a set of data measured on an interval scale. It is often used in exploratory data analysis. This type of graph is used to show the shape of the distribution, its central value, and its variability. The ends of the box are the upper and lower quartiles, so the box spans the interquartile range. The whiskers are the two lines outside the box that extend to the highest and lowest observations.

Like Angela, Carl works at a computer store. He also recorded the number of sales he made each month. In the past 12 months, he sold the following numbers of computers: 51, 17, 25, 39, 7, 49, 62, 41, 20, 6, 43, 13. Give a five-number summary of Carl's and Angela's sales.
Make two box and whisker plots, one for Angela's sales and one for Carl's. Briefly describe the comparisons between their sales.

Answers

First, put the data in ascending order. Then find the median.

6, 7, 13, 17, 20, 25, 39, 41, 43, 49, 51, 62

Median = ((12 + 1) / 2)th value = 6.5th value = (25 + 39) / 2 = 32

There are six numbers below the median, namely: 6, 7, 13, 17, 20, 25.

Q1 = ((6 + 1) / 2)th value = 3.5th value = (13 + 17) / 2 = 15

There are six numbers above the median, namely: 39, 41, 43, 49, 51, 62.

Q3 = ((6 + 1) / 2)th value = 3.5th value = (43 + 49) / 2 = 46

The five-number summary for Carl's sales is 6, 15, 32, 46, 62. Using the same calculations, we can determine that the five-number summary for Angela's sales is 1, 17, 26, 42, 57.

Note that box and whisker plots can be drawn either vertically or horizontally.

Carl's highest and lowest sales are both higher than Angela's corresponding sales, and Carl's median sales figure is higher than Angela's. Carl's interquartile range (46 - 15 = 31) is also larger than Angela's (42 - 17 = 25). These results suggest that Carl generally sells more computers than Angela does, although his monthly sales vary more.

There are several ways to describe the centre and spread of a distribution. One way to present this information is with a five-number summary, which uses the median as its centre value and gives a brief picture of the other important distribution values. Another approach uses the mean and standard deviation to describe the centre and spread of the data.
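As a check on the worked example, the sketch below recomputes Carl's five-number summary in Python alongside the mean and standard deviation. The function name five_number_summary is illustrative, and the quartile rule (medians of the lower and upper halves) is an assumption chosen to match the hand calculation; statistical packages use several different quartile definitions and may report slightly different Q1 and Q3 values. The standard deviation shown is the sample version from Python's statistics module.

```python
from statistics import mean, median, stdev


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves of
    the sorted data, leaving out the overall median when the count is odd.
    Other software may use a different quartile rule.
    """
    values = sorted(data)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = values[:half]            # observations below the median
    upper = values[half + n % 2:]    # observations above the median
    return (values[0], median(lower), median(values), median(upper), values[-1])


# Carl's monthly computer sales from the example.
carl = [51, 17, 25, 39, 7, 49, 62, 41, 20, 6, 43, 13]

summary = five_number_summary(carl)
print("Five-number summary:", summary)        # (6, 15.0, 32.0, 46.0, 62)
print("Interquartile range:", summary[3] - summary[1])
print("Mean:", round(mean(carl), 2))
print("Sample standard deviation:", round(stdev(carl), 2))
```

For Carl's sales the summary comes out as 6, 15, 32, 46, 62, matching the hand calculation, and the mean of about 31.08 sits close to the median of 32. The mean and standard deviation condense the same twelve values into just two numbers.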
The mean and standard deviation approach, however, is best used with symmetrical distributions with no outliers. Despite this restriction, the mean and standard deviation are used more commonly than the five-number summary. The reason is that many natural phenomena can be approximately described by a normal distribution, and for normal distributions the mean and standard deviation are the best measures of centre and spread, respectively. The standard deviation takes every value into account, has extremely useful properties when used with a normal distribution, and is mathematically manageable. However, it is not a good measure of spread in highly skewed distributions and, in these instances, should be supplemented by other measures such as the semi-quartile range, (Q3 - Q1) / 2.

The semi-quartile range is rarely used as a measure of spread, partly because it is not as mathematically manageable as others. Still, it is a useful statistic because it is less influenced by extreme values than the standard deviation, is less subject to sampling fluctuations in highly skewed distributions, and depends on only two values, Q1 and Q3. However, it cannot stand alone as a measure of spread.

A box and whisker plot is a graphical method of displaying variation in a set of data.
In most cases a histogram provides a sufficient display; however, a box and whisker plot can provide additional detail while allowing multiple sets of data to be displayed in the same graph. One variant is the box and whisker plot with outliers, which marks unusual values separately.

Why Use a Box and Whisker Plot?

Box and whisker plots are very effective and easy to read. They summarize data from multiple sources and display the results in a single graph. They allow data from different categories to be compared, which makes decision-making easier and more effective. Use box and whisker plots when you have multiple data sets from independent sources that are related to each other in some way. Examples include test scores between schools or classrooms, data from before and after a process change, similar features on one part such as camshaft lobes, or data from duplicate machines manufacturing the same products.

A box and whisker plot is developed from five statistics: the minimum value, the lower quartile, the median, the upper quartile, and the maximum value. For example, given the following 20 data points, the five required statistics are displayed. Note that for a data set with an even number of values, the median is calculated as the average of the two middle values.

Here are the data represented in box and whisker plot format. Left: The center box represents the middle 50% of the data set (from the 25th to the 75th percentile) and is derived using the lower and upper quartile values.
The median value is displayed inside the box. The maximum and minimum values are displayed with vertical lines (whiskers) connecting them to the center box. Right: For comparison, a histogram of the data is also shown, giving the frequency of each value in the data set.

Suppose you wanted to compare the performance of three lathes responsible for the rough turning of a motor shaft. The design specification is 18.85 +/- 0.1 mm (that is, 18.75 mm to 18.95 mm). Diameter measurements from a sample of shafts taken from each roughing lathe are displayed in a box and whisker plot. Lathe 1 appears to be making good parts and is centered in the tolerance. Lathe 2 appears to have excess variation and is making shafts below the minimum diameter. Lathe 3 is performing with less variation than Lathe 2; however, it is centered on the lower side of the specification and is making shafts below specification.

Most software packages that perform statistical analysis can create box and whisker plots.
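Statistical packages draw these plots graphically. As a dependency-free sketch, the code below renders the two five-number summaries from the Carl and Angela example as horizontal text box plots on one shared axis, which is what makes the side-by-side comparison immediate. The function name text_boxplot, the 0 to 70 axis range, the 50-character width and the symbols used ("|" whisker ends, "[" and "]" box edges, "M" median) are all illustrative assumptions, not a standard format.

```python
def text_boxplot(label, summary, lo=0.0, hi=70.0, width=50):
    """Draw one horizontal box and whisker plot as a line of text.

    summary is a five-number summary (min, Q1, median, Q3, max).
    lo and hi fix a shared axis so that several plots can be compared.
    """
    if not all(lo <= v <= hi for v in summary):
        raise ValueError("all summary values must lie within [lo, hi]")

    def col(x):
        # Map a data value onto a character column of the shared axis.
        return round((x - lo) / (hi - lo) * (width - 1))

    mn, q1, med, q3, mx = (col(v) for v in summary)
    row = [" "] * width
    for i in range(mn, mx + 1):
        row[i] = "-"        # whisker line from minimum to maximum
    for i in range(q1, q3 + 1):
        row[i] = "="        # box from the lower to the upper quartile
    row[mn] = "|"           # whisker ends
    row[mx] = "|"
    row[q1] = "["           # box edges
    row[q3] = "]"
    row[med] = "M"          # median marker, drawn last so it stays visible
    return f"{label:<8}{''.join(row)}"


# Five-number summaries from the Carl and Angela example.
carl = (6, 15, 32, 46, 62)
angela = (1, 17, 26, 42, 57)

print("Shared axis: 0 to 70 computers per month")
print(text_boxplot("Carl", carl))
print(text_boxplot("Angela", angela))
```

Because both plots use the same axis, Carl's minimum, median and maximum sit to the right of Angela's and his box is wider, mirroring the written comparison in the example. Real plotting libraries add tick labels and may compute quartiles with a different convention than the hand calculation.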