Classical phonetics assumes ordinal (that is, stepwise) feature values, which are often binary. A given front vowel, for example, is either rounded, like German [y:], or unrounded, like its German counterpart [i:]. In the setting used in the VokalJäger, the values do not necessarily have to be binary: the openness of back vowels can be represented by the numerical series 4, 3, 2 and 1, standing for open, mid open, mid close and close.
The situation becomes trickier when small variations around those elementary values are to be described. Classical phonetics uses diacritics for this purpose.
The floating phonetic feature and its value
The VokalJäger uses a different concept. It is assumed that, given a speech segment, a probability p can be evaluated that indicates how likely it is that, say, the vowel is indeed open, mid open, mid close or close. Given an ensemble of N numerical elementary phonetic feature values, e.g. m = 4, 3, 2 and 1, an expected value can be computed. The elementary phonetic feature values constitute an ordinal representation of the underlying elementary phonetic feature conditions, here: open, mid open, mid close and close. This expected value, mathematically $\zeta^{0} = \sum_{i=1}^{N} p_i \, m_i$, shall be called the floating phonetic feature value and be denoted by the Greek letter zeta ζ [Keil 2017, formula (35), p. 171; note the superscript 0: this particular ζ constitutes the initial floating phonetic feature value, from which, by means of order statistics, the final version is derived]. The key idea of the floating phonetic feature value is that ζ “floats” between the classical ordinal feature values (here: the elementary phonetic feature values m): diacritics are replaced by small numerical variations around the elementary values m. As a result, a floating phonetic feature is constructed with an associated numerical value ζ.
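The expected value described above can be illustrated with a minimal Python sketch. The function name `floating_feature_value` and the example probabilities are hypothetical and are not taken from Keil (2017). The sketch also assumes that the probabilities are rescaled to sum to one before averaging; whether and how the original method normalizes them is not stated here.

```python
def floating_feature_value(probabilities, values):
    """Probability-weighted mean of the elementary feature values m.

    Assumption: the probabilities are rescaled to sum to 1, so the result
    always lies between the smallest and the largest elementary value.
    """
    if len(probabilities) != len(values):
        raise ValueError("need exactly one probability per elementary value")
    total = sum(probabilities)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive number")
    return sum(p * m for p, m in zip(probabilities, values)) / total


# Elementary values for open, mid open, mid close and close (from the text).
values = [4, 3, 2, 1]
# Made-up probabilities for one speech segment: most likely mid open.
probabilities = [0.1, 0.7, 0.2, 0.0]

zeta = floating_feature_value(probabilities, values)
print(round(zeta, 2))  # close to 3, shifted slightly towards "mid close"
```

With these made-up numbers the segment lands a little below the elementary value 3, which is the kind of small deviation a diacritic would express in classical notation.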
Support and contrast sounds
The probabilities p are calculated in a particular way: it is assumed that for each elementary phonetic feature condition i, and hence for its associated elementary phonetic feature value m, certain sounds can be selected as representatives, the so-called support sounds. One may, for example, have the elementary phonetic feature condition mid open for back vowels be represented by the support sound [ɔ] and assign the elementary phonetic feature value m = 3. Correspondingly, there are so-called contrast sounds: sounds that do not possess the respective elementary phonetic feature condition. The vowels [o:] and [a:] may, for example, be chosen as contrast sounds for the elementary phonetic feature condition mid open, that is, m = 3. Hence [o:] and [a:] represent those sounds that do not show the condition mid open, i.e. those with m ≠ 3.
In practice the probabilities are calculated using mathematical models. With machine-learning techniques, it is possible to train so-called binary classifiers that detect the presence or absence of a single elementary phonetic feature condition (in other words, that detect the prevalence of support sounds).
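The split into support and contrast sounds can be pictured as a labelling step. The following Python sketch uses only the example given above (condition mid open, m = 3, support sound [ɔ], contrast sounds [o:] and [a:]); the dictionary layout and the function name `label_segments` are hypothetical and serve illustration only.

```python
# Hypothetical configuration for one elementary phonetic feature condition.
MID_OPEN = {
    "value": 3,                # elementary phonetic feature value m
    "support": {"ɔ"},          # sounds that show the condition
    "contrast": {"o:", "a:"},  # sounds that do not show the condition
}


def label_segments(sounds, condition):
    """Return (sound, label) pairs: 1 for support, 0 for contrast sounds.

    Sounds that are neither support nor contrast sounds are skipped.
    """
    labelled = []
    for sound in sounds:
        if sound in condition["support"]:
            labelled.append((sound, 1))
        elif sound in condition["contrast"]:
            labelled.append((sound, 0))
    return labelled


labelled = label_segments(["ɔ", "o:", "i:", "a:"], MID_OPEN)
print(labelled)
```

In this sketch [i:] is dropped because it was selected neither as a support nor as a contrast sound for the condition.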
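As a rough illustration of what a binary classifier of this kind does, the following Python sketch trains a one-dimensional logistic regression on made-up data. This is a stand-in technique: the text does not say which model or which acoustic features the VokalJäger uses, so the single feature, its values and all function names here are assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_binary_classifier(features, labels, lr=0.1, epochs=2000):
    """Fit weight w and bias b by batch gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(features, labels):
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict_probability(model, x):
    """Probability that a segment with feature x shows the condition."""
    w, b = model
    return sigmoid(w * x + b)


# Made-up, standardized values of one hypothetical acoustic feature.
support = [0.9, 1.1, 1.3, 0.8]       # support sounds, label 1
contrast = [-1.0, -0.7, -1.2, -0.9]  # contrast sounds, label 0
features = support + contrast
labels = [1] * len(support) + [0] * len(contrast)

model = train_binary_classifier(features, labels)
p_support = predict_probability(model, 1.0)
p_contrast = predict_probability(model, -1.0)
print(p_support > 0.5, p_contrast < 0.5)
```

A single linear feature cannot separate a condition whose contrast sounds lie on both sides of it (as with [o:] and [a:] around [ɔ]); a realistic model would need more features or a non-linear decision rule.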
Figure: The key components of the floating phonetic feature value concept [Keil 2017, figure 68, p. 172; enhanced/colored version]. The entire process is described in full detail in Keil (2017, pp. 170-213).
Contrast groups and phonetic feature differences
Once the floating phonetic feature values ζ have been evaluated for certain groups, the next question is whether those ζ differ significantly between two groups. This can be answered by examining the numerical differences statistically. First, the ζ center (mathematically, the median) of each group is evaluated; more precisely, the center of the centers within the same words [Keil 2017, formula (36), p. 183]. Then the absolute numerical difference is calculated and its statistical significance is considered [Keil 2017, formula (37), p. 187].
This measure, the phonetic feature distance Δζ, constitutes a probability-weighted phonetic difference. To qualify as “significant” in formula (37), the difference of the medians has to be statistically significant at the 95% level or higher, and the absolute numerical difference must exceed 0.25, which roughly “equates” to “half” a diacritic.
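The two steps (group center, then distance with the two significance criteria) can be sketched in Python as follows. The function names, the word labels and the ζ values are made up for illustration. The text does not name the statistical test used, so the sketch simply takes a p-value as an input computed elsewhere; how Δζ is reported when the criteria are not met is likewise an open assumption, which is why the difference and the verdict are returned separately.

```python
from statistics import median


def group_center(zeta_by_word):
    """Median of the within-word medians of the zeta values."""
    return median(median(zetas) for zetas in zeta_by_word.values())


def feature_distance(center_a, center_b, p_value, alpha=0.05, min_abs=0.25):
    """Return (absolute difference, whether it counts as significant).

    Significant means: p_value below alpha (95% level for alpha = 0.05)
    and an absolute difference above min_abs (0.25 in the text).
    """
    diff = abs(center_a - center_b)
    return diff, (p_value < alpha and diff > min_abs)


# Made-up zeta values, grouped by hypothetical word labels.
short_o = {"word_a": [3.0, 3.1, 2.9], "word_b": [3.2, 3.0]}
long_o = {"word_c": [2.0, 2.1], "word_d": [1.9, 2.0, 2.2]}

# p_value is a placeholder for the result of a significance test.
delta, significant = feature_distance(
    group_center(short_o), group_center(long_o), p_value=0.01
)
print(round(delta, 3), significant)
```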
If the groups are assembled from items that are phonetically or phonologically distinct, those groups should contrast with each other in their floating phonetic feature values ζ, i.e. they should show significant feature distances Δζ. Groups that are hypothesized to differ are called the VokalJäger’s contrast groups.
Example: Differences in High German O sounds
The following figure exemplifies the concept of the phonetic feature distance Δζ on the scale of the floating phonetic feature of openness of back vowels. High German sounds from the Kiel PHONDAT Corpus were assembled into two contrast groups, one for short O sounds, in High German mid open [ɔ], and a second for long O sounds, in High German mid close [o:].
It can clearly be observed that High German short O sounds cluster around ζ values of 3, as expected for [ɔ]. The long O sounds, in contrast, land at ζ values of 2, as expected for [o:]. In all cases the distances are highly significant at the 99% level. The between-group distances range from Δζ = 1.0 to Δζ = 1.3 in this example.
Here the VokalJäger reconfirms textbook knowledge: German short and long O sounds are qualitatively separate sounds.
Figure: Phonetic feature distances Δζ (line) between the contrast groups of High German short O (lower symbol) and long O sounds (upper symbol), plotted on the scale ζ of the floating phonetic feature openness of back vowels [Keil 2017, figure 75, p. 186; colored version]. The symbol size in the plot corresponds to the coefficient of separation (COS), a measure of the non-overlap of distributions based on the concepts of Bhattacharyya (1943).