Hands-On Math

Last Revision March, 2011

Last Revision March, 2011

An excerpt from the abstract follows:

_____________________________________________

Although weather conditions and pea sizes vary from year to year, it has been found over the years that about 25% of the crop are small enough to be packaged as their Gourmet brand and about 15% of the crop are large enough to be used by their sister company, Handson Soups Inc. The medium size, 60% of the crop, is packaged as their Choice brand.

A variety of machines are used in pea processing. The hulls are removed by machine. Stems, small stones, grit, dirt and chaff are removed using techniques such as flotation, air blowers, vibration and screens. Some cleaning machines can be seen here.

The peas are all machine-harvested on a single day and placed in a chilled facility for processing.

To determine the two needed size thresholds for the crop, a mixture of 30,000 peas taken equally from all of the different growers is analyzed and a binned size distribution is obtained for the mix. This season 44 bins were employed. The bins are each 0.2 mm wide with centres ranging from 0.3 mm. through 8.9 mm. The average size

It does not appear easy to interpolate this binned distribution accurately, to determine the 25% and 85% size thresholds. A plot of the accumulated bin values would serve better. See next.

From this accumulation plot one could visually guess the 25% threshold as ~ 3.89 and the 85% as ~5.6.

If sieves were set up according to these values and the sample lot was processed, 20.81% would be classified as Gourmet peas and 17.38% would be classified as soup peas. Could better thresholds be found?

A similar question relates to a deviation from the mean. Some fraction of the first selection are a centimetre or more taller than the mean height. Would that fraction be much different for a second or third such selection?

Would you expect the fraction found in a selection, for those taller than the average by two centimetres, to be smaller or larger than that found for those one centimetre taller?

Would you expect that such differences found between selections would become smaller or larger if the selections were made larger?

Answers to questions like these were first sought by carrying out experiments in which aspects of large populations of beans, people, horse races, gambling games, and the like were noted and analyzed, particular attention being given to obtaining insight into possible strategies for making wagers.

An excerpt from that reference follows:

The foregoing plot is not necessary to our objective of locating the two threshold values and is shown here for interest only.

We could plot its accumulation and read its thresholds but it is easier to employ ERF() that is also fully characterized by

The expressions contained in the outlined cells can be seen by clicking their cell addresses following:

. | . |

AI4 | AI5 | AI16 |

The results of using Goal Seek for the two thresholds are shown next.

(Had we wanted greater precision than that provided by Goal Seek, we could have written a macro for the purpose.)

Sieves were set up according to the threshold values 4.0 and 5.7 and the sample lot was processed. Now, 25.46% were classified as Gourmet peas and 14.72% were classified as Soup peas.

Using the normal distribution as a proxy for the sample distribution to obtain the thresholds provided a sorting result very much closer to the desired result than did the thresholds obtained by estimation from the accumulation of the sample distribution.

_____________________________________________

Excel's Visual Basic programming language provides a RND() function that generates pseudo random values from zero up to but not including one. RND() is designed to produce a uniform distribution on that interval, that is, all values produced have the same likelihood of occurrence.

To construct each sample, 4,000 values from RND() were added together. The central limit theorem teaches that such sums should tend to be normally distributed. (The addition can be likened to 4000 stages in the growth of a pea.) Placing 30,000 of these sums on the spreadsheet together with the average sample value and standard deviation completed the first stage. The macro follows:

A set of 30,000 values so produced should have an average value of ~2000. This is a bit large for peas. Also, a standard deviation that could apply to a pea crop was desired. Transforming these values to values suited to the pea crop was accomplished by multiplying each of the samples by one selected constant and adding another constant to the result. The objective values were a matter of mixed judgment, trial and error, and not wanting the result to appear contrived.

The means of choosing the multiplier and additive constants may be interesting. Excel's

Here X1 is to be set equal to the desired standard deviation by adjusting X3 and Y3, the desired constants, subject to the constraint that X2 be the desired average value. (

It may be of interest to see how some different sets of 30,000 samples produced by the macro match in average value and standard deviation. These are shown next for 5 consecutive sample sets.

The Law of Large Numbers seems to be very well satisfied. The averages differ little from their expected value of 2000.

Only one, NORMDIST(), of the about 80 available Excel2000 Statistics functions was employed in this topic. It was used to generate the normal distribution plot values for the bin abscissa given the sample

Top | Previous Topic | Next Topic | Topics |