Phoneme vs Allophone
Why This Matters
The first analytical move a phonologist makes when looking at a new
language is to determine which sound differences carry meaning and
which do not. An English speaker hears the p in pin and the p
in spin as the same sound; instrumental analysis of a recording
reveals that the first p is aspirated [pʰ] and the second is
unaspirated [p]. The two are physically distinct
acoustic events but the same phoneme — they cannot replace each
other to change meaning in any English word. In a different language
(Hindi, Korean, Thai), aspirated and unaspirated stops are distinct
phonemes, and replacing one with the other does change meaning.
This is the phoneme/allophone distinction: the same physical inventory of sounds carves up into different phonemic categories in different languages, and the carving is what determines meaning distinctions. Without this distinction, no other phonological analysis is well-formed. With it, the rest of phonology — features, rules, syllable structure, prosody — becomes possible.
The same machinery underwrites the analysis of speech-recognition systems, text-to-speech systems, and acoustic models in computational linguistics: the inventory the model needs to discriminate is the phonemic inventory; allophonic variation is either modeled explicitly (context-dependent triphones) or absorbed into the acoustic model.
Definitions
Phoneme
A phoneme is the smallest unit of sound that can distinguish
meaning in a given language. Phonemes are written between slashes:
/p/, /t/, /i/.
Allophone
An allophone is a phonetic realization of a phoneme that occurs
in a specific phonological context. Allophones are written in
square brackets, the same notation used for narrow phonetic
transcription: [pʰ], [p], [p̚].
Minimal pair
A minimal pair is a pair of words that differ in exactly one sound at the same position and have different meanings. The existence of a minimal pair is the standard evidence that the two sounds in question realize different phonemes.
Complementary distribution
Two sounds are in complementary distribution if every context in which one occurs is a context in which the other does not, and vice versa. Sounds in complementary distribution are candidates for being allophones of the same phoneme.
Free variation
Two sounds are in free variation if they can occur interchangeably in the same context without changing the meaning of the word. Sounds in free variation are also candidates for being allophones of the same phoneme. The distinction between free variation and complementary distribution turns on whether the choice is context-determined or speaker-determined.
How to Determine Whether Two Sounds Are Allophones
Given two phonetically similar sounds in a language, the standard analytical procedure is:
- Look for a minimal pair. If two words differ in exactly one sound at the same position and have different meanings, the two sounds are different phonemes. Done.
- If no minimal pair exists, list the contexts of each sound. For each occurrence of each sound, record the immediately preceding and following segment, and any larger context (syllable position, stress) that might be relevant.
- Check for complementary distribution. If every context in which sound A occurs is a context in which sound B does not, and vice versa, the two sounds are in complementary distribution and are likely allophones of one phoneme.
- Check for free variation. If the two sounds can occur in the same context without changing meaning (different speakers or different productions by the same speaker), they are in free variation and are again candidates for allophone status.
- Identify the underlying phoneme. The standard convention is to pick the allophone with the wider distribution as the underlying form, and write a phonological rule that derives the narrower-distribution allophone from the underlying form in the appropriate context.
| Sound pair | Minimal pair? | Distribution | Conclusion |
|---|---|---|---|
| English [pʰ] / [p] | No | [pʰ] in stressed syllable onsets; [p] after /s/ and in weaker positions | Allophones of /p/ |
| Hindi [pʰ] / [p] | Yes: phal / pal | Same lexical position can distinguish words | Separate phonemes |
| Korean [l] / [ɾ] | No | Lateral in coda/pre-consonantal positions; flap between vowels or before vowels | Allophones of one liquid phoneme |
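The first three steps of the procedure can be sketched in code. The toy lexicon below (four English-like words in narrow transcription) and the choice of immediately adjacent segments as "context" are illustrative assumptions, not real field data:

```python
# Steps 1-3 of the procedure: minimal-pair search and a
# complementary-distribution check over a toy lexicon.

def minimal_pairs(lexicon, a, b):
    """Return word pairs differing only in a-vs-b at one position."""
    pairs = []
    words = list(lexicon)
    for i, w1 in enumerate(words):
        for w2 in words[i + 1:]:
            if len(w1) != len(w2):
                continue
            diffs = [(s1, s2) for s1, s2 in zip(w1, w2) if s1 != s2]
            if diffs == [(a, b)] or diffs == [(b, a)]:
                pairs.append((w1, w2))
    return pairs

def contexts(lexicon, sound):
    """Collect (preceding, following) segments for each occurrence of sound."""
    ctxs = set()
    for word in lexicon:
        for i, seg in enumerate(word):
            if seg == sound:
                prev = word[i - 1] if i > 0 else "#"   # '#' = word boundary
                nxt = word[i + 1] if i < len(word) - 1 else "#"
                ctxs.add((prev, nxt))
    return ctxs

# Toy data: each word is a tuple of narrow-transcription segments.
lexicon = [
    ("pʰ", "ɪ", "n"),          # pin
    ("s", "p", "ɪ", "n"),      # spin
    ("pʰ", "æ", "t"),          # pat
    ("s", "p", "ɪ", "t"),      # spit
]

print(minimal_pairs(lexicon, "pʰ", "p"))            # [] — no minimal pair
print(contexts(lexicon, "pʰ") & contexts(lexicon, "p"))  # set() — no shared context
```

An empty context intersection over a lexicon this small is only suggestive; the real analytical claim requires checking a representative sample of the language.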
Worked Example 1: English Aspiration
In English, voiceless stops (/p/, /t/, /k/)
have aspirated and unaspirated allophones. The aspirated allophone
[pʰ] occurs at the beginning of a stressed syllable; the
unaspirated allophone [p] occurs after /s/
in the same syllable and in unstressed positions.
| Word | Broad transcription | Narrow transcription |
|---|---|---|
| pin | /pɪn/ | [pʰɪn] |
| spin | /spɪn/ | [spɪn] |
| happy | /hæpi/ | [hæpi] |
| pat | /pæt/ | [pʰæt] |
There is no English minimal pair distinguishing [pʰ] from
[p]. The two sounds are in complementary distribution: one
is stressed-syllable-initial, the other is post-/s/ or
unstressed. They are allophones of a single phoneme, conventionally
written /p/. A plain-language rule is enough:
/p/ is realized as [pʰ] at the beginning of a stressed syllable.
The stressed-syllable version is more accurate than a simple word-boundary rule: words such as appear also create an aspiration environment even though the stop is not word-initial.
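The rule can be made executable as a small sketch. Syllabification and stress assignment are separate problems, so this sketch takes a per-segment "begins a stressed syllable" flag as given rather than computing it:

```python
# /p t k/ → aspirated at the start of a stressed syllable, plain elsewhere.

ASPIRATABLE = {"p", "t", "k"}

def realize(segments, stressed_onset_flags):
    """Map phonemic segments to narrow-transcription allophones.

    stressed_onset_flags[i] is True iff segments[i] begins a stressed syllable.
    """
    out = []
    for seg, onset in zip(segments, stressed_onset_flags):
        if seg in ASPIRATABLE and onset:
            out.append(seg + "ʰ")   # add the aspiration diacritic
        else:
            out.append(seg)
    return out

# pin: /pɪn/ → [pʰɪn];  spin: /spɪn/ → [spɪn]
print(realize(["p", "ɪ", "n"], [True, False, False]))
print(realize(["s", "p", "ɪ", "n"], [False, False, False, False]))
```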
The corresponding analysis for English speakers learning Korean or Hindi is a known difficulty: a learner whose phonemic system does not distinguish aspirated from unaspirated stops must build a new phonemic distinction from scratch.
Worked Example 2: Hindi Aspiration
In Hindi (and many other languages of the Indian subcontinent), aspirated and unaspirated stops are distinct phonemes. The following pairs are minimal pairs:
| Hindi word | Transcription | Gloss |
|---|---|---|
| पल | /pal/ | "moment" |
| फल | /pʰal/ | "fruit" |
| ताल | /tal/ | "rhythm" |
| थाल | /tʰal/ | "plate" |
| काल | /kal/ | "time" |
| खाल | /kʰal/ | "skin" |
Because /pal/ and /pʰal/ are different
words with different meanings, [p] and [pʰ] are
distinct phonemes in Hindi. The contrast is phonemic in Hindi but
allophonic in English, even though the same physical sounds are
involved. This is the textbook illustration that phoneme inventories
are language-specific.
Worked Example 3: Korean Liquids
In Korean, lateral and flap realizations are often analyzed as
allophones of a single liquid phoneme, conventionally represented
as /l/ or /ɾ/ depending on the source. The distribution is
roughly:
- [l] occurs syllable-finally and before another consonant.
- [ɾ] occurs syllable-initially before a vowel and intervocalically.
| Korean word | Romanization | Transcription |
|---|---|---|
| 마을 | maeul | [maɯl] |
| 마루 | maru | [maɾu] |
| 사람 | saram | [saɾam] |
| 슬슬 | seulseul | [sɯlsɯl] |
There are no minimal pairs distinguishing [l] from
[ɾ] in Korean. The two are in complementary distribution
and are allophones of one phoneme. A Korean speaker learning
English faces the inverse of the difficulty an English speaker
faces with Hindi: [l] and [ɹ] are distinct phonemes in English
(compare light and right), and treating them as one produces the
well-attested /l/-/r/ confusion pattern.
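The distribution stated above can be made executable as a sketch. The one-character segment representation and the small vowel set are simplifications of this sketch, not claims about Korean phonology:

```python
# Korean liquid allophony: the liquid phoneme (written /l/ here) surfaces
# as the flap [ɾ] before a vowel, and as [l] elsewhere (coda,
# pre-consonantal position).

VOWELS = set("aeiouɯʌɛ")   # simplified vowel inventory

def realize_liquid(segments):
    out = []
    for i, seg in enumerate(segments):
        if seg == "l" and i + 1 < len(segments) and segments[i + 1] in VOWELS:
            out.append("ɾ")    # pre-vocalic → flap
        else:
            out.append(seg)    # elsewhere → lateral
    return out

print("".join(realize_liquid(list("maɯl"))))   # maɯl  (마을: final lateral)
print("".join(realize_liquid(list("malu"))))   # maɾu  (마루: intervocalic flap)
print("".join(realize_liquid(list("salam"))))  # saɾam (사람)
```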
ML Connection: ASR, TTS, and Phonemic Targets
Speech-recognition (ASR) and text-to-speech (TTS) systems must choose what level of representation to use as the discrete target.
Phoneme-level systems. Many traditional ASR systems, especially HMM-GMM and hybrid DNN-HMM systems, used phoneme- or senone-level targets. Allophonic variation was either modeled by context-dependent triphones (the unit is "phoneme A in the context of phoneme B preceding and phoneme C following," giving inventories of thousands of triphone states) or absorbed into the acoustic model, which learns to emit phonemic labels regardless of allophonic surface form.
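The triphone idea fits in a few lines. The "l-c+r" label notation follows the common HTK/Kaldi convention; using "#" as an utterance-boundary marker is an assumption of this sketch:

```python
# Context-dependent triphone units: each phoneme token becomes
# "left-center+right", so allophonic context is encoded in the unit
# inventory itself rather than in the acoustic model alone.

def to_triphones(phonemes):
    """Expand a phoneme sequence into context-dependent triphone labels."""
    padded = ["#"] + list(phonemes) + ["#"]   # '#' marks an utterance boundary
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]

# /p/ gets a different unit in "pin" vs "spin" — exactly the contexts
# where the aspirated and unaspirated allophones differ.
print(to_triphones(["p", "ɪ", "n"]))        # ['#-p+ɪ', 'p-ɪ+n', 'ɪ-n+#']
print(to_triphones(["s", "p", "ɪ", "n"]))   # ['#-s+p', 's-p+ɪ', 'p-ɪ+n', 'ɪ-n+#']
```

With an inventory of ~40 phonemes, naive triphone expansion yields tens of thousands of possible units, which is why real systems cluster them into shared states.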
Subphonemic units (senones, the clustered context-dependent states used in Kaldi and similar toolkits) sit below the phoneme level: a senone is a clustered context-dependent acoustic state that may correspond to a fragment of an allophone. Senones are a practical implementation choice, not a linguistic claim about discrete units.
End-to-end character or grapheme-level systems based on CTC, RNN-Transducer, or attention often target letters, byte-pair units, or grapheme strings directly and do not expose phonemes as the output vocabulary. In those systems, any phoneme/allophone organization has to be inferred from hidden representations rather than read off from explicit labels.
Self-supervised speech models (wav2vec 2.0, HuBERT) learn discrete or continuous speech representations from raw audio; large weakly supervised models such as Whisper are often grouped with them. Probing studies (Pasad et al. 2021; Conneau et al. 2020) show that the resulting representations carry phonemic information at intermediate layers: those layers cluster acoustically distinct realizations of the same phoneme together, which is evidence that a phoneme/allophone-like abstraction is present in the learned representation.
The careful summary: the phoneme/allophone distinction is what many traditional ASR label sets tried to capture explicitly. End-to-end systems may still learn phoneme-like structure, but that claim has to be tested with probes or controlled evaluations rather than assumed from the output vocabulary.
Common Mistakes
Confusing phonemic with phonetic transcription
Slashes mean phonemic; square brackets mean phonetic. The same word can have a phonemic transcription that is identical across speakers and a phonetic transcription that varies with speaker, register, and context. Mixing the two notations is the most common beginner error in phonology coursework.
Assuming the phoneme inventory is universal
Phoneme inventories are language-specific. The same physical sound can be a phoneme in one language and an allophone in another (English vs Hindi aspiration), and a sound that is phonemic in one language can be entirely absent from another (retroflex consonants in English vs Hindi; clicks in English vs Xhosa). The IPA chart is a cross-linguistic inventory of possibilities, not a list of phonemes of any particular language.
Forgetting that one minimal pair is enough
The existence of a single minimal pair is sufficient to establish that two sounds are distinct phonemes. The absence of a minimal pair is not sufficient to establish that two sounds are allophones; one must additionally check for complementary distribution or free variation, because some genuine phonemic contrasts have low functional load and produce few minimal pairs.
Treating free variation as 'random' allophone choice
Free variation is not noise. Different allophones in free variation often correlate with sociolinguistic factors (formality, register, dialect, speaker age). The choice between them is sociolinguistically structured even when it is not phonologically context-determined.
Exercises
Problem
The Spanish phoneme /d/ has two allophones: a stop [d] and a
fricative [ð] (the "th" of these). Given the words dedo
"finger" [deðo], donde "where" [donde], and cada "each"
[kaða], state the rule that determines which allophone appears
in which context.
Problem
In Korean, propose a phonological rule that derives the flap
[ɾ] from underlying /l/ in the appropriate context.
Problem
A famous case in phonological theory: in some dialects of
Standard American English, the vowels in writer and rider are
phonetically distinct even though the surface flap [ɾ] is the same
in both. Sketch the analysis: what phonemes are present in the
underlying forms, and how do the phonetic distinctions arise?
Depending on the dialect, the surface difference is one of vowel
quality (Canadian Raising) or of vowel length (pre-fortis clipping).
Problem
Self-supervised speech models such as wav2vec 2.0 produce representations that probe well for phonemic information at intermediate layers. Design an experiment to test whether these representations also encode allophonic information beyond what phoneme labels capture, and predict the result.
Finite-State Formalization Note
Phonology has a small but real finite-state literature. Kaplan and Kay (1994) and Karttunen (1993) showed that standard phonological rewrite rules, when not applied to their own output, denote regular relations and can be compiled into finite-state transducers; the foma and xfst tools implement this result. The phoneme-allophone distinction maps onto the underlying-vs-surface distinction in a finite-state cascade: the underlying (phonemic) form is the input, the surface (allophonic) form is the output, and the transducer encodes the phonological rules.
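The cascade structure can be shown in miniature. Plain regular-expression rewriting stands in here for real transducer compilation (foma or xfst would compile the same rules into composed FSTs), and the two rules below are toy English-flavoured approximations that ignore stress:

```python
# Underlying → surface as an ordered cascade of rewrite rules.

import re

def rule(pattern, replacement):
    """Package one rewrite rule as a string-to-string function."""
    compiled = re.compile(pattern)
    return lambda s: compiled.sub(replacement, s)

# Toy rules over a toy segment alphabet:
#   1. aspiration: word-initial p before a vowel → pʰ (stress ignored)
#   2. flapping:   t between vowels → ɾ
cascade = [
    rule(r"^p(?=[aeiou])", "pʰ"),
    rule(r"(?<=[aeiou])t(?=[aeiou])", "ɾ"),
]

def surface(underlying):
    """Apply the rules in order, feeding each output to the next rule."""
    for apply_rule in cascade:
        underlying = apply_rule(underlying)
    return underlying

print(surface("pati"))   # pʰaɾi — both rules fire
print(surface("spati"))  # spaɾi — no aspiration after s
```

Rule ordering matters in such cascades: composing the rules in a different order can change the surface form, which is exactly the behavior the finite-state composition operator models.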
Next Topics
- Distinctive features and natural classes: the formal feature system that makes phonological rules concise.
- Phonological rule formalism: the SPE-style rule notation and its modern descendants.
- Syllable structure and phonotactics: the supra-segmental scaffolding for phoneme distributions.
References
Canonical:
- Hayes, Bruce. Introductory Phonology (2009), Chapters 2-3.
- Ladefoged, Peter, and Keith Johnson. A Course in Phonetics (2014, 7th ed.), Chapter 2.
- Hyman, Larry M. Phonology: Theory and Analysis (1975), Chapter 1.
- Pike, Kenneth L. Phonemics: A Technique for Reducing Languages to Writing (1947).
- International Phonetic Association. Handbook of the International Phonetic Association (1999).
Computational:
- Kaplan, Ronald M., and Martin Kay. "Regular Models of Phonological Rule Systems." Computational Linguistics 20 (1994) 331-378.
- Pasad, Ankita, Ju-Chieh Chou, and Karen Livescu. "Layer-wise Analysis of a Self-Supervised Speech Representation Model." ASRU (2021).