The VokalJäger (literally, in German: hunter of vowels) is an algorithm that uses machine-learning techniques to measure phonetic differences in speech signals. Its calculation kernel is implemented in the R statistical programming language [R Development Core Team 2015].

The initial version of the VokalJäger formed the basis of my 2016 PhD thesis ‘Der VokalJäger’ at Marburg University (FB09 Germanistik und Kunstgeschichte), which was published in 2017 by Olms as Volume 122 of the Deutsche Dialektgeographie. This document, henceforth referred to as Keil (2017), gives a full introduction to and description of the algorithm, together with the results of the experiments conducted in 2014–2016. Sample pages are available here on this site.

Robust processing of acoustic variables

To process acoustic variables, the VokalJäger automatically determines the most “appropriate” set of configuration parameters for performing phonetic measurements with the software Praat [Boersma/Weenink 2015]. It then applies robust statistics to extract and normalize formant values and related phonetic variables, so that the results are independent of both corpus vocabulary and speaker physiology.
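
The idea of robust normalization can be illustrated with a minimal Python sketch. It is not the procedure of Keil (2017): it assumes, purely for illustration, that each speaker's formant values are centred on the median and scaled by the median absolute deviation (MAD), so that a single formant-tracking error does not distort the scale. The function names, the data layout and the F1 values are hypothetical.

```python
from statistics import median

def robust_z(values):
    """Centre on the median and scale by the MAD (a robust analogue of a z-score)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    # 1.4826 makes the MAD comparable to a standard deviation for normal data.
    scale = 1.4826 * mad
    if scale == 0:
        return [0.0 for _ in values]
    return [(v - med) / scale for v in values]

def normalize_by_speaker(measurements):
    """measurements: dict mapping speaker id -> list of F1 values in Hz (assumed layout)."""
    return {speaker: robust_z(vals) for speaker, vals in measurements.items()}

# Hypothetical F1 values in Hz; the 2500 Hz entry mimics a formant-tracking error.
f1 = {
    "speaker_a": [300.0, 350.0, 400.0, 650.0, 700.0, 2500.0],
    "speaker_b": [360.0, 420.0, 480.0, 780.0, 840.0],
}
normalized = normalize_by_speaker(f1)

for speaker, values in normalized.items():
    print(speaker, [round(v, 2) for v in values])
```

After this step each speaker's values are expressed relative to that speaker's own median and spread, which is one simple way to reduce the influence of individual vocal-tract differences. The outlier still receives a large score and can be flagged, but it barely affects how the remaining values are scaled.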

Machine learning

Machine learning is used to “train” the VokalJäger to detect the presence of so-called binary features: can a given phonetic feature, such as roundness or the highest degree of openness, be found in a segment? For this purpose the generic R suite CARET (Kuhn 2015) is employed. Once trained, the VokalJäger can automatically calculate the probability that the binary feature is present in an unknown signal. From this probability an expected value can be calculated, the floating phonetic feature value, here called ζ (zeta). Using this ζ-value, quantitative, probability-weighted phonetic differences, here labelled Δζ (delta-zeta), can then be derived. This makes it possible to test statistically whether two groups of speech segments, usually assembled on phonological grounds from different points in real time, differ significantly with respect to a given floating phonetic feature. From such a result one may then infer that a merger or a split has occurred in the development of the language.
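
The chain from probability to ζ to Δζ to a significance test can be sketched in Python under stated assumptions. None of this is the actual VokalJäger code. A one-variable logistic regression stands in for the CARET-trained classifiers. The binary feature is assumed to be coded 0/1, so its expected value is ζ = 1·p + 0·(1 − p) = p. Δζ is assumed to be the difference between the group means of ζ, and a permutation test stands in for whichever test Keil (2017) applies. All names and numbers are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(feature = 1 | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def zeta(x, model):
    """Expected value of a 0/1-coded feature, i.e. the predicted probability."""
    w, b = model
    return sigmoid(w * x + b)

def delta_zeta(group_a, group_b, model):
    """Difference between the mean zeta values of two groups of segments."""
    mean_a = sum(zeta(x, model) for x in group_a) / len(group_a)
    mean_b = sum(zeta(x, model) for x in group_b) / len(group_b)
    return mean_a - mean_b

def permutation_p_value(group_a, group_b, model, n_perm=2000, seed=0):
    """Two-sided permutation test on |delta zeta| (assumed test, for illustration)."""
    rng = random.Random(seed)
    observed = abs(delta_zeta(group_a, group_b, model))
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(group_a)], pooled[len(group_a):]
        if abs(delta_zeta(perm_a, perm_b, model)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical training data: one normalized acoustic variable per segment,
# labelled 1 if the binary feature is present and 0 otherwise.
train_x = [-2.0, -1.5, -1.0, -0.5, 0.3, -0.3, 0.5, 1.0, 1.5, 2.0]
train_y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
model = train_logistic(train_x, train_y)

# Hypothetical segments from an older and a newer recording.
old_segments = [1.2, 0.9, 1.5, 1.1, 0.8, 1.3]
new_segments = [-0.9, -1.2, -0.4, -1.0, -0.7, -1.1]

dz = delta_zeta(old_segments, new_segments, model)
p = permutation_p_value(old_segments, new_segments, model)
print(f"delta zeta = {dz:.3f}, permutation p-value = {p:.4f}")
```

In this toy setting, a large Δζ with a small p-value would suggest that the two groups of segments differ in the feature, which is the kind of evidence from which a merger or a split might be inferred.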

Use cases: High German and Frankfurterisch

A prototypical setting is to train the VokalJäger on High German and then to test it, or rather apply it, on dialect recordings.

In Keil (2017) the VokalJäger was used to analyze historical audio recordings of the now-legacy Frankfurt city dialect, Frankfurterisch. The study shows that certain monophthongal vowel features that are characteristic of the dialect were still measurable in the old Frankfurt Lautdenkmal recording from 1937 but are missing from the newer tapes of the REDE project [Purschke 2014 f.; Schmidt/Herrgen/Kehrein 2008 f.]. In this setting the VokalJäger was trained on the Kiel PHONDAT Corpus of Read Speech as a representation of High German [IPDS 1994].

Full documentation of the Frankfurt dialect can be found on the sister webpage.

VokalJäger 2.0: the next generation

The original VokalJäger was a monolithic standalone application, trained on the High German Kiel PHONDAT Corpus and limited to monophthongs. In an ongoing project with the Philipps-Universität Marburg, Deutscher Sprachatlas, the VokalJäger is being converted into a flexible and generic toolbox that bridges phonetics and machine learning. This “new” tool is code-named VokalJäger 2.0 – VJ.EAT (enhanced algorithmic toolbox). This version will be released to the interested public as an R package.

The new VokalJäger is currently being calibrated for and tested with several new use cases and projects:

  • Prosodic features, currently with a specific focus on elementary discourse particles [Pistor 2016]. This extends the original process with the capacity to work with F0 curves.
  • Tagesschau-Korpus. On a trial basis, the Kiel PHONDAT Corpus is replaced as the reference for the VokalJäger by a corpus derived from the German Tagesschau news.
  • Diphthongal features. While the original algorithm was limited to monophthongs, the enhancement captures the movement between two distinct vowel states.