Two main elements in any series.
The measure of the utility of mathematical formulae.
Comparison between the utility of mathematical formulae and graphical representation.
examples in the series, giving a quantity R, where $R = r/\sqrt{n}$. This quantity R represents the
limits within which the probable value of the mean M may vary. M of course is
determined absolutely by our series, but a conceivable addition of a few examples
might alter it to a greater or less degree; R is a measure of the degree to which
this variation is likely to occur, and the value of the mean should be put down as
M ± R. R will never be large, but its variation will indicate the reliability of the value
found for M.
R, then, is in itself a measure of the stability of the type, while r serves to indicate
the conformity of the general series to that type.
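The relation between r and R may be illustrated by a short calculation. What follows is only a sketch: it assumes that R is r divided by the square root of the number of examples, and that r is the usual probable error of a single observation, namely 0.6745 times the standard deviation. The function name and the index values are hypothetical and are not taken from any series in this work.

```python
import math
import statistics

# Assumption: r is 0.6745 times the sample standard deviation, the usual
# probable error of a single observation under the normal law.
PROBABLE_ERROR_FACTOR = 0.6745


def probable_errors(values):
    """Return (M, r, R) for a series of measurements."""
    n = len(values)
    mean = statistics.mean(values)
    r = PROBABLE_ERROR_FACTOR * statistics.stdev(values)
    big_r = r / math.sqrt(n)  # probable error of the mean
    return mean, r, big_r


# Hypothetical index values, invented for illustration only.
series = [72.0, 74.5, 71.0, 76.0, 73.5, 75.0, 72.5, 74.0, 73.0]
m, r, big_r = probable_errors(series)
print(f"M = {m:.2f} +/- {big_r:.2f} (r = {r:.2f})")
```

Since R shrinks with the square root of the number of examples, quadrupling the series would only halve it.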
In the preceding paragraphs and in the notes in the Appendix we have given the
simplest mathematical formulae for testing the goodness of fit between a Frequency Curve
and the corresponding Probability Curve. These formulae are of most use when considerations
of space or convenience preclude the reproduction of the actual graphic curves.
In the present work, however, it has been found practicable to reproduce the Curves of
Frequency in such a way that the inferences to be drawn from them can be at once
pointed out. Consequently, although the results derived from the mathematical formulae
are also published, it was not really necessary to do so. They merely serve as aids and
corroborations. There is nothing recondite hidden under these cryptic symbols, a fact
which will, it is hoped, be appreciated by the attentive reader. When the two Curves,
that of Probability and that of Frequency, are given side by side it is perfectly easy to
judge of the misfit by eye, especially when, as in the present case, mechanical aid is given
to the eye by the way in which the curves are plotted. Accordingly we have employed
squared paper and have drawn the Probability Curve and the Frequency Curve together.
They enclose the same area, viz. the number of squares equal to the number of examples
in each particular series.
To obtain the precise amount of the misfit therefore it is only necessary to reckon the
area (irrespective of whether it be excess or defect) over which the two curves fail to
coincide. The proportion of this to the total area may then be expressed as so much per
cent., which will enable us to compare directly the results obtained on series containing
different numbers of examples.
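The reckoning just described may be sketched as follows. The sketch assumes that both curves are read off the squared paper as a number of examples for each class interval, so that the area of difference is the sum of the absolute differences; the function name and the counts are hypothetical. Because the two curves enclose the same area, the excess and the defect are equal, and the figure returned counts both.

```python
def misfit_percent(observed, expected):
    """Area between the two curves as a percentage of the total area.

    Both lists hold the number of examples in each class interval.
    """
    if len(observed) != len(expected):
        raise ValueError("the two curves must share the same class intervals")
    total = sum(observed)
    if total == 0:
        raise ValueError("the series is empty")
    # Excess and defect are counted alike, as the text prescribes.
    area_of_difference = sum(abs(o - e) for o, e in zip(observed, expected))
    return 100.0 * area_of_difference / total


# Hypothetical counts, invented for illustration; each list sums to 40.
frequency = [2, 5, 12, 10, 8, 3]
probability = [2, 6, 11, 11, 7, 3]
print(misfit_percent(frequency, probability))
```

Expressing the result as a percentage of the total is what allows a series of 40 examples to be set beside one of 400.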
An experiment on any one of our figures will make this clear, and the experimenter
may convince himself of the accuracy of the result by comparing it with the mathematical
expression of the same which is appended in each case.
The method by which the misfit is measured is explained in the Appendix, Note IV,
p. 128. The actual method of obtaining the estimate of the misfit except by the graphical
method is not given here. A reference may be made to Quetelet’s work already cited
(Lettres sur la Théorie des Probabilités), in which there will be found an explanation of
the method and an example of its application. Though lengthy the account is easily
intelligible, and as we have only occasionally used the method for purposes of corroboration
it is unnecessary to give it here in detail.
It is impossible to give an accurate rule for the application of the values found for the
misfit. These values depend not merely on the discrepancies between the two curves, but
also on the number of examples in the series, and on the value of the probable error. It
may, however, be stated that a small misfit, say under 10 per cent., affords strong evidence
of homogeneity. It is probable that with series of under 100 examples a misfit of 20 per
cent. should be regarded as small. In the same way large misfits, say over 30 per cent.,
are arguments in favour of heterogeneity, but these tests can only be used to strengthen
the evidence derived from other sources.
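The rough limits just stated may be set down as below. This is a sketch only, and no accurate rule is intended: the limits are those given in the text, while the function name, the treatment of values lying between the limits, and the wording of the verdicts are assumptions. The probable error, on which the reading also depends, is not taken into account.

```python
def read_misfit(misfit_pct, n_examples):
    """Rough reading of a misfit, using the approximate limits in the text."""
    # For series of under 100 examples a misfit of 20 per cent. still counts as small.
    if n_examples < 100:
        is_small = misfit_pct <= 20.0
    else:
        is_small = misfit_pct < 10.0
    if is_small:
        return "small: supports homogeneity"
    if misfit_pct > 30.0:
        return "large: suggests heterogeneity"
    # Assumption: values between the limits are left undecided.
    return "indeterminate"


print(read_misfit(8.0, 500))
print(read_misfit(20.0, 60))
print(read_misfit(35.0, 60))
```

Whatever the verdict, it serves only to strengthen evidence drawn from other sources.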
The chances that Curves may fail to fit from paucity of material.
It has been stated that if the number of observations composing a series is large,
e.g. 1000, the contours of the Probability Curve and of the Frequency Curve will
approximate very closely to one another and should indeed theoretically coincide. If the
coincidence is far from perfect then the series is to be considered as not homogeneous.
But a difficulty arises in applying the theory of probabilities to series which are made up
of individuals far under 1000 in number, and it is the more important to consider it in this
place as the series to be discussed are sometimes comparatively small. It is of course
eminently desirable that series which are to be submitted to the test of a Probability
Curve should be large, but unfortunately circumstances generally forbid it. If it is
justifiable to protest against the efforts which are constantly being made to establish the
ethnological position of a people on the evidence of a paltry half-dozen specimens, yet on
the other hand it very rarely happens that a series of crania can be obtained which is
considerable enough to exclude the possibility of ‘random sampling,’ that is to say, the
fallacious irregularity due to paucity of material. When, however, the series of observations
exceeds thirty in number the test of a Probability Curve may be applied if due attention
be paid to the checks given by the mathematical formulae, and when the number of
observations is over sixty it may be applied with confidence. In our present work the
conclusions obtained for one index can be compared with those obtained for another index
in the same period; and if the curves for several different features in the same period
yield analogous results the argument will be enormously strengthened.
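The effect of paucity of material may be tried by a simple experiment in random sampling. The sketch below draws perfectly homogeneous series of different sizes from a single normal population and reckons the average misfit of each against the Probability Curve. Everything in it is assumed for illustration: the mean of 75, the standard deviation of 3, the unit-wide class intervals from 65 to 85, and the use of the true population curve rather than one fitted to each sample.

```python
import math
import random


def normal_cdf(x, mean, sd):
    """Cumulative normal probability up to x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def simulated_misfit(n, rng, mean=75.0, sd=3.0, low=65, high=85):
    """Misfit (per cent.) of one random homogeneous series of n examples."""
    n_bins = high - low  # unit-wide class intervals
    observed = [0] * n_bins
    for _ in range(n):
        x = rng.gauss(mean, sd)
        # Values beyond the range are placed in the outermost classes.
        index = min(max(int(math.floor(x)) - low, 0), n_bins - 1)
        observed[index] += 1
    expected = []
    for i in range(n_bins):
        # The outermost classes also receive the tails of the curve.
        lower = 0.0 if i == 0 else normal_cdf(low + i, mean, sd)
        upper = 1.0 if i == n_bins - 1 else normal_cdf(low + i + 1, mean, sd)
        expected.append(n * (upper - lower))
    return 100.0 * sum(abs(o - e) for o, e in zip(observed, expected)) / n


def average_misfit(n, trials=100, seed=1):
    """Average misfit over many random series of the same size."""
    rng = random.Random(seed)
    return sum(simulated_misfit(n, rng) for _ in range(trials)) / trials


for n in (30, 60, 1000):
    print(n, round(average_misfit(n), 1))
```

The figures printed depend on the width of the class intervals chosen and are not to be compared with the limits quoted above; what the experiment shows is only that a homogeneous series yields a larger misfit the fewer examples it contains.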
If any single Frequency Curve exhibits an outline such as that of the curve on p. 69,
Fig. 14, it is clear that its deviation from the theory of probability is enormously greater
than can be explained on the mere ground of paucity of material. This is probably true
even for series of less than sixty individuals; and if several curves derived from the same
set of subjects by means of different properties of those subjects give similar results the
evidence becomes exceedingly strong.
Various methods have been suggested for testing whether the deviations in
some one particular series may be regarded as the result of random sampling; but
there can be no real safeguard so long as only one feature is considered at a time;
it is only possible to feel secure when corroborative evidence is given by the other
concurrent series.
The irregularities of the Frequency Curve cannot be smoothed with impunity.
Two more points must be noted before this chapter is closed. Many anthropologists
when making use of Frequency Curves employ one trick or another for regulating their
irregularities, and so obtain a smooth curve instead of one with an irregular jagged outline.
Such a procedure is under ordinary circumstances totally unjustifiable. The actual
statistics obtained by observation ought not to be manipulated in any way so as to alter
their value.
If this is done the meaning of various parts of the Frequency Curve is disguised
if not completely hidden. The ‘distribution’ of supposed error may well be only the
invention of new and very real error. We have already pointed out that the position of
a misfit is of great importance in attaching a meaning to it.
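How smoothing may disguise a curve can be seen from a small example. The sketch below applies a three-point moving average, chosen here merely as one common smoothing device and not as the method of any particular writer, to a hypothetical series with two peaks; the function names and the counts are invented for illustration.

```python
def moving_average(counts, window=3):
    """Smooth counts with a centred moving average (shorter window at the ends)."""
    half = window // 2
    smoothed = []
    for i in range(len(counts)):
        neighbours = counts[max(0, i - half): i + half + 1]
        smoothed.append(sum(neighbours) / len(neighbours))
    return smoothed


def interior_peaks(counts):
    """Indices of classes strictly higher than both their neighbours."""
    return [i for i in range(1, len(counts) - 1)
            if counts[i] > counts[i - 1] and counts[i] > counts[i + 1]]


# Hypothetical two-peaked series, invented for illustration only.
raw = [1, 4, 9, 5, 9, 4, 1]
smooth = moving_average(raw)
print(interior_peaks(raw))     # [2, 4]
print(interior_peaks(smooth))  # [3]
```

The two peaks of the original series vanish, and a single peak appears in the very class where the observations showed a depression.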
This more or less arbitrary smoothing of irregularities may either entirely alter the
position of the misfit or even remove it altogether. It would be quite possible, by some
of the methods used, to make almost any series of observations fit the Probability Curve
very accurately. This is not the object of the application of Mathematics to Anthropology