# Chapter 3

## Area under a curve - An Integral defining erf(x)

A very important function in the study of Statistics is the Gaussian probability integral: A slightly different relationship to erf(x) is more frequently seen in mathematics: The error function, erf(), is used in physics, engineering etc., to determine the probability that events will occur between intervals of x.  To determine values for erf(x) we must evaluate the integral on the right of the expression.

(Excel2000 provides the function ERF(z) but our aim is to demonstrate its calculation and provide a closed form approximation for erf().  The steps of doing so provide further insight into the use of the spreadsheet for one's own purposes.)

Rectangular quadrature, representing the area of an interval under a curve by the average of the start and finish values times the width of the interval, works well when the curve is a straight line.  If the curve could be represented by an order two or higher polynomial, then other quadrature rules may produce better results. In particular, the n-point Gaussian Quadrature Rule can yield an exact result for polynomials of degree 2n-1.

The rule employs weighted values of the integrand at n points within the interval.  Points and weights for the interval from 1 to 2 for n = 4 are shown next. Note that the abscissa labels are not proportionately spaced.  Our calculation of erf(x) employs this 4 point Gaussian Quadrature Rule.

A plot of the integrand is shown next. It is a plot from columns B, for the abscissa, and G, for the ordinate, of the spreadsheet that was used to calculate it. Representative cells from that spreadsheet now follow: The expressions contained in the outlined cells can be seen by clicking their cell addresses following:

 . .
 E1 C2 E2 G2 I2 C3 D3 E3 F3 A6 B6 C6 D6 E6 F6 G6 H6 I6

Returning to the spreadsheet:  Column A is for future use. Its values run from zero to π/2 in steps of π/64000.  Briefly, this column will later serve as abscissa values for a sine series approximation to erf(x).

Column B values run from 0 to 3.2 in steps of 0.0001, (C1), and forms the intervals of x for which the integrand is evaluated.  The evaluations, (G6:G32005), employ the value for e, (G1), and four points within each interval, as seen for (C6:f6), using the weights shown at E1 and E2.

The incremental area, (H6), is formed as the product of the Gaussian Estimate of the integrand times the width of the interval, (C1), times the multiplier 2/√π, (I2).

The incremental areas are accumulated, summed, in column I to form the approximate erf(x) which
is plotted from the table as is seen next. ### How does it compare?

Using the same arguments, a table of the Excel2000 function ERF(x) was created and the point-by-point relative difference, in %, between it and erf(x) was calculated as ABS[(ERF(x)/erf(x)) -1]. The table is shown, in part, next. The average value of the relative differences is shown outlined at the top of the third column.

Although less than half of the full table is shown,  it is surmised from the discontinuities, that ERF(x) is calculated in a sequence of ranges.  For examples, the first range finishes at row 5513; the second range finishes at row 6996; and so on.  A chart of the relative differences, which appear quite strange, follows: One wonders which of erf(x) or ERF(x) is the more accurate.  Few values for Erf() to be found on the web offer more than 7 decimal digits.  IBM, at this location, provides a 10 decimal digit value, 0.8427007794 for x = 1.

In comparison:  ERF(1) = 0.842700735175, differs from the IBM value by +0.000005248%.  erf(1) = 0.842700792950, differs from the IBM value by -0.000001608%, less than 1/3 of the relative difference provided by ERF(x).

As a second check:  "Computer Approximations", The SIAM Series in Applied Mathematics, John Wiley & Sons, 1968 provides ARCERF(1/2) as 0.47693 62762 04469 87338 ..... .  To compare without employing interpolation, the step size of the table, 0.0001 was modified slightly so that xo would assume this value quite closely, 0.47693627620449, as is shown next at B4775 of the table. The corresponding value for erf() is shown at I4775 and its series approximation at J4775.  L4775 contains the Excel2000 value, ERF(B4775).  L4774 shows the relative error in percent of the table calculation with respect to 1/2. L4776 shows the relative error of ERF(B4775).

Both of the foregoing tests against other known values serve to provide more confidence in our spreadsheet table values than in the Excel2000 ERF() values.

### Symmetry, Complement and Inverse

It is clear from the definition that erf(-x) = erf(x).  Note that the Excel2000 ERF(x) always expects a positive argument.

The complement of erf(x) is expressed as erfc(x) where erfc(x) = 1 - erf(x) and is known as the error function complement.   Excel2000 does not provide erfc(x).

The inverse of erf(x), the argument x of erf(x) for which erf(x) attains a specific value, is expressed as arcerf(x).  Excel2000 does not provide an ARCERF(x).  It can be obtained, approximately, by using their Goal Seek tool from the Tools menu as is illustrated next. As seen in the foregoing, cell AL2 contains the value 1. As can be seen on the edit line, cell AM2 contains the expression =ERF(AL2) and shows the appropriate value in the cell. On selecting Goal Seek from the Tools menu, the small menu appears to allow entry of its parameters.  Here we wish to change AL2 such that AM2 will attain the value 1/3. Clicking OK brings up: Although AM2 is not exactly 1/3, it is close. Writing one’s own macro can provide a better approximation.

Goal Seek is quite a convenient tool and worth knowing.

### Approximate erf()

As it is time-consuming to look-up and interpolate in a table such as this, it can be useful to obtain an approximation for erf(x) that can be evaluated for any x in the interval.  The shape of the function suggests that a sine series could be used on the interval 0 to π/2 for the approximation as is seen next.

erf(u) = aSIN(u)+bSIN(2u)+cSIN(3u)+........+iSIN(12u)+mSIN(13u)+nSIN(14u)

Solver was used to find the coefficients a through n using the average relative error between the calculated values for erf(u) and the approximation as its objective function. The coefficients follow: The sine series employs the variable u found in column A of the spreadsheet.  Given a value for x, u can be found as: u = π * x / 6.4.

A plot of the error between the approximation and the table follows: Note the almost uniform error extrema over the entire range of x.  This effect was produced by raising each relative error between erf() and its approximation to the 12th power and using the average value of those errors as the objective function for the approximation.  What this does is greatly increase the influence of large errors on the objective function, automatically providing them with more weight and thus providing a larger target for minimization.

In many situations the near uniform error extrema of relative error is a highly desirable feature of an approximation.

As an exercise, the viewer might try to improve on the approximation by approximating with functions other than the sine series.   A polynomial series or the ratio of two polynomial series could be tried.

### The Normal or Gaussian Distribution

When the Gaussian probability integral employs a density function that has a non-zero mean and has a non-standard width, t2
is replaced by (z-a)2 / 2σ2 where a is the average or mean value of the density function and σ is a measure of its width.  The probability integral becomes: When a is zero and σ =1, p(z) regains its standard normal form.  See here for further details.

Next

The next topic will employ the erf() function in a design problem.

 Top Next Topic Topics