Phoneme vs Allophone
Why This Matters
The first analytical move a phonologist makes when looking at a new
language is to determine which sound differences carry meaning and
which do not. An English speaker hears the p in pin and the p
in spin as the same sound; instrumental analysis of a recording
reveals that the first p is aspirated [pʰ] and the second is
unaspirated [p]. The two are physically distinct
acoustic events but the same phoneme — they cannot replace each
other to change meaning in any English word. In a different language
(Hindi, Korean, Thai), aspirated and unaspirated stops are distinct
phonemes, and replacing one with the other does change meaning.
This is the phoneme/allophone distinction: the same physical inventory of sounds carves up into different phonemic categories in different languages, and the carving is what determines meaning distinctions. Without this distinction, no other phonological analysis is well-formed. With it, the rest of phonology — features, rules, syllable structure, prosody — becomes possible.
The same machinery underwrites the analysis of speech-recognition systems, text-to-speech systems, and acoustic models in computational linguistics: the inventory the model needs to discriminate is the phonemic inventory; allophonic variation is either modeled explicitly (context-dependent triphones) or absorbed into the acoustic model.
Definitions
Phoneme
A phoneme is the smallest unit of sound that can distinguish
meaning in a given language. Phonemes are written between slashes:
/p/, /t/, /i/.
Allophone
An allophone is a phonetic realization of a phoneme that occurs
in a specific phonological context. Allophones are written in
square brackets, the same notation used for narrow phonetic
transcription: [pʰ], [p], [p̚].
Minimal pair
A minimal pair is a pair of words that differ in exactly one sound at the same position and have different meanings. The existence of a minimal pair is the standard evidence that the two sounds in question realize different phonemes.
Complementary distribution
Two sounds are in complementary distribution if every context in which one occurs is a context in which the other does not, and vice versa. Sounds in complementary distribution are candidates for being allophones of the same phoneme.
Free variation
Two sounds are in free variation if they can occur interchangeably in the same context without changing the meaning of the word. Sounds in free variation are also candidates for being allophones of the same phoneme. The distinction between free variation and complementary distribution turns on whether the choice is context-determined or speaker-determined.
How to Determine Whether Two Sounds Are Allophones
Given two phonetically similar sounds in a language, the standard analytical procedure is:
- Look for a minimal pair. If two words differ in exactly one sound at the same position and have different meanings, the two sounds are different phonemes. Done.
- If no minimal pair exists, list the contexts of each sound. For each occurrence of each sound, record the immediately preceding and following segment, and any larger context (syllable position, stress) that might be relevant.
- Check for complementary distribution. If every context in which sound A occurs is a context in which sound B does not, and vice versa, the two sounds are in complementary distribution and are likely allophones of one phoneme.
- Check for free variation. If the two sounds can occur in the same context without changing meaning (different speakers or different productions by the same speaker), they are in free variation and are again candidates for allophone status.
- Identify the underlying phoneme. The standard convention is to pick the allophone with the wider distribution as the underlying form, and write a phonological rule that derives the narrower-distribution allophone from the underlying form in the appropriate context.
| Sound pair | Minimal pair? | Distribution | Conclusion |
|---|---|---|---|
| English [pʰ] / [p] | No | [pʰ] in stressed syllable onsets; [p] after /s/ and in weaker positions | Allophones of /p/ |
| Hindi [pʰ] / [p] | Yes: phal / pal | Same lexical position can distinguish words | Separate phonemes |
| Korean [l] / [ɾ] | No | Lateral in coda/pre-consonantal positions; flap between vowels or before vowels | Allophones of one liquid phoneme |
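The first three steps of the procedure can be sketched in code. The toy lexicon below (four English-like words in narrow transcription) and the choice of immediately adjacent segments as "context" are illustrative assumptions, not real field data:

```python
# Steps 1-3 of the procedure: minimal-pair search and a
# complementary-distribution check over a toy lexicon.

def minimal_pairs(lexicon, a, b):
    """Return word pairs differing only in a-vs-b at one position."""
    pairs = []
    words = list(lexicon)
    for i, w1 in enumerate(words):
        for w2 in words[i + 1:]:
            if len(w1) != len(w2):
                continue
            diffs = [(s1, s2) for s1, s2 in zip(w1, w2) if s1 != s2]
            if diffs == [(a, b)] or diffs == [(b, a)]:
                pairs.append((w1, w2))
    return pairs

def contexts(lexicon, sound):
    """Collect (preceding, following) segments for each occurrence of sound."""
    ctxs = set()
    for word in lexicon:
        for i, seg in enumerate(word):
            if seg == sound:
                prev = word[i - 1] if i > 0 else "#"   # '#' = word boundary
                nxt = word[i + 1] if i < len(word) - 1 else "#"
                ctxs.add((prev, nxt))
    return ctxs

# Toy data: each word is a tuple of narrow-transcription segments.
lexicon = [
    ("pʰ", "ɪ", "n"),          # pin
    ("s", "p", "ɪ", "n"),      # spin
    ("pʰ", "æ", "t"),          # pat
    ("s", "p", "ɪ", "t"),      # spit
]

print(minimal_pairs(lexicon, "pʰ", "p"))            # [] — no minimal pair
print(contexts(lexicon, "pʰ") & contexts(lexicon, "p"))  # set() — no shared context
```

An empty context intersection over a lexicon this small is only suggestive; the real analytical claim requires checking a representative sample of the language.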
Worked Example 1: English Aspiration
In English, voiceless stops (/p/, /t/, /k/)
have aspirated and unaspirated allophones. The aspirated allophone
[pʰ] occurs at the beginning of a stressed syllable; the
unaspirated allophone [p] occurs after /s/
in the same syllable and in unstressed positions.
| Word | Broad transcription | Narrow transcription |
|---|---|---|
| pin | /pɪn/ | [pʰɪn] |
| spin | /spɪn/ | [spɪn] |
| happy | /hæpi/ | [hæpi] |
| pat | /pæt/ | [pʰæt] |
There is no English minimal pair distinguishing [pʰ] from
[p]. The two sounds are in complementary distribution: one
is stressed-syllable-initial, the other is post-/s/ or
unstressed. They are allophones of a single phoneme, conventionally
written /p/. A plain-language rule is enough:
/p/ is realized as [pʰ] at the beginning of a stressed syllable.
The stressed-syllable version is more accurate than a simple word-boundary rule: words such as appear also create an aspiration environment even though the stop is not word-initial.
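The rule can be made executable as a small sketch. Syllabification and stress assignment are separate problems, so this sketch takes a per-segment "begins a stressed syllable" flag as given rather than computing it:

```python
# /p t k/ → aspirated at the start of a stressed syllable, plain elsewhere.

ASPIRATABLE = {"p", "t", "k"}

def realize(segments, stressed_onset_flags):
    """Map phonemic segments to narrow-transcription allophones.

    stressed_onset_flags[i] is True iff segments[i] begins a stressed syllable.
    """
    out = []
    for seg, onset in zip(segments, stressed_onset_flags):
        if seg in ASPIRATABLE and onset:
            out.append(seg + "ʰ")   # add the aspiration diacritic
        else:
            out.append(seg)
    return out

# pin: /pɪn/ → [pʰɪn];  spin: /spɪn/ → [spɪn]
print(realize(["p", "ɪ", "n"], [True, False, False]))
print(realize(["s", "p", "ɪ", "n"], [False, False, False, False]))
```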
The corresponding analysis for English speakers learning Korean or Hindi is a known difficulty: a learner whose phonemic system does not distinguish aspirated from unaspirated stops must build a new phonemic distinction from scratch.
Worked Example 2: Hindi Aspiration
In Hindi (and many other languages of the Indian subcontinent), aspirated and unaspirated stops are distinct phonemes. The following pairs are minimal pairs:
| Hindi word | Transcription | Gloss |
|---|---|---|
| पल | /pal/ | "moment" |
| फल | /pʰal/ | "fruit" |
| ताल | /tal/ | "rhythm" |
| थाल | /tʰal/ | "plate" |
| काल | /kal/ | "time" |
| खाल | /kʰal/ | "skin" |
Because /pal/ and /pʰal/ are different
words with different meanings, [p] and [pʰ] are
distinct phonemes in Hindi. The contrast is phonemic in Hindi but
allophonic in English, even though the same physical sounds are
involved. This is the textbook illustration that phoneme inventories
are language-specific.
Worked Example 3: Korean Liquids
In Korean, lateral and flap realizations are often analyzed as
allophones of a single liquid phoneme, conventionally represented
as /l/ or /ɾ/ depending on the source. The distribution is
roughly:
- [l] occurs syllable-finally and before another consonant.
- [ɾ] occurs syllable-initially before a vowel and intervocalically.
| Korean word | Romanization | Transcription |
|---|---|---|
| 마을 | maeul | [maɯl] |
| 마루 | maru | [maɾu] |
| 사람 | saram | [saɾam] |
| 슬슬 | seulseul | [sɯlsɯl] |
There are no minimal pairs distinguishing [l] from
[ɾ] in Korean. The two are in complementary distribution
and are allophones of one phoneme. A Korean speaker learning
English faces the inverse of the difficulty an English speaker
faces with Hindi: [l] and [ɹ] are distinct phonemes in English
(compare light and right), and treating them as one produces the
well-attested /l/-/r/ confusion pattern.
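The distribution stated above can be made executable as a sketch. The one-character segment representation and the small vowel set are simplifications of this sketch, not claims about Korean phonology:

```python
# Korean liquid allophony: the liquid phoneme (written /l/ here) surfaces
# as the flap [ɾ] before a vowel, and as [l] elsewhere (coda,
# pre-consonantal position).

VOWELS = set("aeiouɯʌɛ")   # simplified vowel inventory

def realize_liquid(segments):
    out = []
    for i, seg in enumerate(segments):
        if seg == "l" and i + 1 < len(segments) and segments[i + 1] in VOWELS:
            out.append("ɾ")    # pre-vocalic → flap
        else:
            out.append(seg)    # elsewhere → lateral
    return out

print("".join(realize_liquid(list("maɯl"))))   # maɯl  (마을: final lateral)
print("".join(realize_liquid(list("malu"))))   # maɾu  (마루: intervocalic flap)
print("".join(realize_liquid(list("salam"))))  # saɾam (사람)
```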
ML Connection: ASR, TTS, and Phonemic Targets
Speech-recognition (ASR) and text-to-speech (TTS) systems must choose what level of representation to use as the discrete target.
Phoneme-level systems. Many traditional ASR systems, especially HMM-GMM and hybrid DNN-HMM systems, used phoneme- or senone-level targets. Allophonic variation was either modeled by context-dependent triphones (the unit is "phoneme A in the context of phoneme B preceding and phoneme C following," giving inventories of thousands of triphone states) or absorbed into the acoustic model, which learns to emit phonemic labels regardless of allophonic surface form.
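The triphone idea fits in a few lines. The "l-c+r" label notation follows the common HTK/Kaldi convention; using "#" as an utterance-boundary marker is an assumption of this sketch:

```python
# Context-dependent triphone units: each phoneme token becomes
# "left-center+right", so allophonic context is encoded in the unit
# inventory itself rather than in the acoustic model alone.

def to_triphones(phonemes):
    """Expand a phoneme sequence into context-dependent triphone labels."""
    padded = ["#"] + list(phonemes) + ["#"]   # '#' marks an utterance boundary
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]

# /p/ gets a different unit in "pin" vs "spin" — exactly the contexts
# where the aspirated and unaspirated allophones differ.
print(to_triphones(["p", "ɪ", "n"]))        # ['#-p+ɪ', 'p-ɪ+n', 'ɪ-n+#']
print(to_triphones(["s", "p", "ɪ", "n"]))   # ['#-s+p', 's-p+ɪ', 'p-ɪ+n', 'ɪ-n+#']
```

With an inventory of ~40 phonemes, naive triphone expansion yields tens of thousands of possible units, which is why real systems cluster them into shared states.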
Subphonemic units (senones, the clustered context-dependent states used in Kaldi and similar toolkits) sit below the phoneme level: a senone is a clustered context-dependent acoustic state that may correspond to a fragment of an allophone. Senones are a practical implementation choice, not a linguistic claim about discrete units.
End-to-end character or grapheme-level systems based on CTC, RNN-Transducer, or attention often target letters, byte-pair units, or grapheme strings directly and do not expose phonemes as the output vocabulary. In those systems, any phoneme/allophone organization has to be inferred from hidden representations rather than read off from explicit labels.
Self-supervised speech models (wav2vec 2.0, HuBERT) learn discrete or continuous speech representations from raw audio; large weakly supervised models such as Whisper are often grouped with them. Probing studies (Pasad et al. 2021; Conneau et al. 2020) show that the resulting representations carry phonemic information at intermediate layers: those layers cluster acoustically distinct realizations of the same phoneme together, which is evidence that a phoneme/allophone-like abstraction is present in the learned representation.
The careful summary: the phoneme/allophone distinction is what many traditional ASR label sets tried to capture explicitly. End-to-end systems may still learn phoneme-like structure, but that claim has to be tested with probes or controlled evaluations rather than assumed from the output vocabulary.
Common Mistakes
Confusing phonemic with phonetic transcription
Slashes mean phonemic; square brackets mean phonetic. The same word can have a phonemic transcription that is identical across speakers and a phonetic transcription that varies with speaker, register, and context. Mixing the two notations is the most common beginner error in phonology coursework.
Assuming the phoneme inventory is universal
Phoneme inventories are language-specific. The same physical sound can be a phoneme in one language and an allophone in another (English vs Hindi aspiration), and a sound that is phonemic in one language can be entirely absent from another (retroflex consonants in English vs Hindi; clicks in English vs Xhosa). The IPA chart is a cross-linguistic inventory of possibilities, not a list of phonemes of any particular language.
Forgetting that one minimal pair is enough
The existence of a single minimal pair is sufficient to establish that two sounds are distinct phonemes. The absence of a minimal pair is not sufficient to establish that two sounds are allophones; one must additionally check for complementary distribution or free variation, because some genuine phonemic contrasts have low functional load and produce few minimal pairs.
Treating free variation as 'random' allophone choice
Free variation is not noise. Different allophones in free variation often correlate with sociolinguistic factors (formality, register, dialect, speaker age). The choice between them is sociolinguistically structured even when it is not phonologically context-determined.
Exercises
Problem
The Spanish phoneme /d/ has two allophones: a stop [d] and a
fricative [ð] (the "th" of these). Given the words dedo
"finger" [deðo], donde "where" [donde], and cada "each"
[kaða], state the rule that determines which allophone appears
in which context.
Problem
In Korean, propose a phonological rule that derives the flap
[ɾ] from underlying /l/ in the appropriate context.
Problem
A famous case in phonological theory: in some dialects of
Standard American English, the vowels in writer and rider are
phonetically distinct even though the surface flap [ɾ] is the same
in both. Sketch the analysis: what phonemes are present in the
underlying forms, and how do the phonetic distinctions arise?
Depending on the dialect, the surface difference is one of vowel
quality (Canadian Raising) or of vowel length (pre-fortis clipping).
Problem
Self-supervised speech models such as wav2vec 2.0 produce representations that probe well for phonemic information at intermediate layers. Design an experiment to test whether these representations also encode allophonic information beyond what phoneme labels capture, and predict the result.
Finite-State Formalization Note
Phonology has a small but real finite-state literature. Kaplan and Kay (1994) and Karttunen (1993) showed that standard phonological rewrite rules, when not applied to their own output, denote regular relations and can be compiled into finite-state transducers; the foma and xfst tools implement this result. The phoneme-allophone distinction maps onto the underlying-vs-surface distinction in a finite-state cascade: the underlying (phonemic) form is the input, the surface (allophonic) form is the output, and the transducer encodes the phonological rules.
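The cascade structure can be shown in miniature. Plain regular-expression rewriting stands in here for real transducer compilation (foma or xfst would compile the same rules into composed FSTs), and the two rules below are toy English-flavoured approximations that ignore stress:

```python
# Underlying → surface as an ordered cascade of rewrite rules.

import re

def rule(pattern, replacement):
    """Package one rewrite rule as a string-to-string function."""
    compiled = re.compile(pattern)
    return lambda s: compiled.sub(replacement, s)

# Toy rules over a toy segment alphabet:
#   1. aspiration: word-initial p before a vowel → pʰ (stress ignored)
#   2. flapping:   t between vowels → ɾ
cascade = [
    rule(r"^p(?=[aeiou])", "pʰ"),
    rule(r"(?<=[aeiou])t(?=[aeiou])", "ɾ"),
]

def surface(underlying):
    """Apply the rules in order, feeding each output to the next rule."""
    for apply_rule in cascade:
        underlying = apply_rule(underlying)
    return underlying

print(surface("pati"))   # pʰaɾi — both rules fire
print(surface("spati"))  # spaɾi — no aspiration after s
```

Rule ordering matters in such cascades: composing the rules in a different order can change the surface form, which is exactly the behavior the finite-state composition operator models.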
Next Topics
- Distinctive features and natural classes: the formal feature system that makes phonological rules concise.
- Phonological rule formalism: the SPE-style rule notation and its modern descendants.
- Syllable structure and phonotactics: the supra-segmental scaffolding for phoneme distributions.
References
Canonical:
- Hayes, Bruce. Introductory Phonology (2009), Chapters 2-3.
- Ladefoged, Peter, and Keith Johnson. A Course in Phonetics (2014, 7th ed.), Chapter 2.
- Hyman, Larry M. Phonology: Theory and Analysis (1975), Chapter 1.
- Pike, Kenneth L. Phonemics: A Technique for Reducing Languages to Writing (1947).
- International Phonetic Association. Handbook of the International Phonetic Association (1999).
Computational:
- Kaplan, Ronald M., and Martin Kay. "Regular Models of Phonological Rule Systems." Computational Linguistics 20 (1994) 331-378.
- Pasad, Ankita, Ju-Chieh Chou, and Karen Livescu. "Layer-wise Analysis of a Self-Supervised Speech Representation Model." ASRU (2021).