Constituency Tests
Why This Matters
A constituent is a sequence of words that functions as a unit in syntax: it can be moved, replaced, or coordinated as a whole. The empirical question — does this sequence of words form a constituent? — is the daily work of generative-syntax analysis.
The answer is determined by constituency tests: standard diagnostics that reveal a sequence's syntactic status. A sequence that passes the tests is treated as a constituent; one that fails is not.
For the sentence The cat sat on the mat, constituency tests identify:
- the cat as a constituent (NP).
- sat on the mat as a constituent (VP).
- on the mat as a constituent (PP).
- the mat as a constituent (NP).
But not:
- cat sat — fails every test.
- the cat sat on — stranded preposition plus incomplete VP.
- sat on the — verb plus preposition plus incomplete determiner phrase.
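The constituent/non-constituent split above can be made concrete in code. A minimal sketch (our own helper names, not from any library): parse the standard bracketed analysis of the sentence and enumerate the word sequences the tree treats as constituents, then check the pass/fail cases listed above.

```python
def parse(tokens):
    """Recursively parse a bracketed tree like (S (NP ...) (VP ...))."""
    assert tokens.pop(0) == "("
    label = tokens.pop(0)
    children = []
    while tokens[0] != ")":
        if tokens[0] == "(":
            children.append(parse(tokens))
        else:
            children.append(tokens.pop(0))   # leaf word
    tokens.pop(0)                            # consume ")"
    return (label, children)

def constituents(node, out):
    """Collect the leaf string under every node of the tree."""
    label, children = node
    leaves = []
    for c in children:
        if isinstance(c, tuple):
            leaves.extend(constituents(c, out))
        else:
            leaves.append(c)
    out.append((label, " ".join(leaves)))
    return leaves

bracketed = "(S (NP (D the) (N cat)) (VP (V sat) (PP (P on) (NP (D the) (N mat)))))"
tokens = bracketed.replace("(", " ( ").replace(")", " ) ").split()
spans = []
constituents(parse(tokens), spans)
strings = {s for _, s in spans}

print("the cat" in strings)         # True  -- NP constituent
print("sat on the mat" in strings)  # True  -- VP constituent
print("cat sat" in strings)         # False -- not a constituent
```

Only sequences that correspond to a tree node come out as constituents; strings like cat sat or the cat sat on never do, which mirrors the test results.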
The tests are the empirical foundation of phrase-structure grammars (Chomsky 1957) and modern formal syntax. Every introductory syntax textbook (Carnie, Radford, Sportiche-Koopman-Stabler, Adger) teaches them as the entry point.
In ML and NLP, constituency structure is the substrate of:
- Parsing: produce the phrase-structure tree of a sentence. Constituency parsers output annotated trees that approximate the constituency-test diagnostics used in syntax.
- Probing studies: probing classifiers (probing-classifiers-for-linguistic-structure) on BERT identify representations that match constituency structure (Hewitt-Manning 2019, Tenney et al. 2019).
- Tree-LSTMs and tree-structured neural networks process inputs as constituency trees rather than flat sequences.
- Linguistic-structure evaluation of LLMs: targeted syntactic evaluation benchmarks test whether models preserve constituency-sensitive dependencies such as subject-verb agreement, filler-gap dependencies, and negative-polarity-item licensing.
The Five Standard Tests
1. Substitution
A constituent can be substituted by a single word (a pro-form) of the same type. NPs can be substituted by pronouns, VPs by do so, PPs by there/here/then.
Examples:
- The cat sat on the mat → It sat on the mat. (NP test passes.)
- The cat sat on the mat → The cat did so. (VP test passes.)
- The cat sat on the mat → The cat sat there. (PP test passes.)
Substitution failures indicate non-constituents: The cat sat on cannot be substituted by anything coherent because sat on without its complement is not a constituent.
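The mechanical half of the substitution test is just string replacement; the grammaticality judgment on the result is the human part. A minimal sketch (the helper name substitute is ours), using the examples above:

```python
def substitute(sentence, span, pro_form):
    """Replace a candidate constituent with a pro-form of the same type.

    Whether the output is grammatical is the judgment the test relies on;
    this function only performs the replacement."""
    if span not in sentence:
        raise ValueError(f"{span!r} does not occur in {sentence!r}")
    return sentence.replace(span, pro_form)

s = "the cat sat on the mat"
print(substitute(s, "the cat", "it"))             # it sat on the mat  (NP test)
print(substitute(s, "sat on the mat", "did so"))  # the cat did so     (VP test)
print(substitute(s, "on the mat", "there"))       # the cat sat there  (PP test)
```

Applying the same function to a non-constituent like sat on produces a string (the cat did so the mat, say) that no speaker accepts, which is exactly the failure the test registers.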
2. Movement (Topicalization, Clefting, Wh-questions)
A constituent can be moved to the front of the sentence for focus or question formation. Movement is one of the cleanest constituency tests.
- Topicalization: The cat sat on the mat → On the mat, the cat sat. (PP moves; passes.)
- Cleft: It is the cat that sat on the mat. (NP clefts; passes.) VPs cleft awkwardly in the it-cleft; the pseudo-cleft What the cat did was sit on the mat is the cleaner VP version.
- Wh-question: On what did the cat sit? (PP wh-moves; passes.) Who sat on the mat? (NP wh-moves; passes.)
Non-constituents do not move:
- The cat sat on → On did the cat sit the mat? — fails.
- cat sat → no coherent way to move just these two words.
3. Coordination
Constituents of the same type can coordinate with and, or, but. Two NPs can coordinate, two VPs can coordinate, two PPs can coordinate.
- The cat and the dog sat on the mat. (Two NPs.)
- The cat sat on the mat and slept on the bed. (Two VPs.)
- The cat sat on the mat or under the chair. (Two PPs.)
Non-constituents cannot coordinate:
- The cat sat on and ran past the mat — this can be parsed, but only by reanalyzing sat on and ran past as coordinated units (a right-node-raising-like structure). Apparent coordination of non-constituents usually signals such reanalysis, so the test must be applied with care.
4. Ellipsis
A constituent can be elided (omitted) with the ellipsis recoverable from context. VP-ellipsis is the cleanest case.
- The cat sat on the mat, and the dog did too. (VP elided in the second clause.)
- The cat slept; the dog did too. (VP elided.)
Non-constituents can't be elided:
- The cat sat on the mat, and the dog (sat) (on) the bed — partial-VP ellipsis like this is degraded.
5. Pro-form replacement / Sluicing
Specific pro-forms substitute for specific constituent types.
- NPs: he, she, it, they, this, that.
- VPs: did, did so, do that, did the same.
- PPs: there, here, then, in there.
- Sentences: so, that, yes, no.
- Wh-clauses: I know who, I know why.
The pro-form test combines with substitution and provides the finest granularity of constituent type.
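The inventory above is naturally represented as a lookup table. A minimal sketch (the table and helper name are ours) of how a substitution tester would propose candidate pro-forms for a hypothesized constituent type:

```python
# Pro-form inventory from the list above, keyed by constituent type.
PRO_FORMS = {
    "NP": ["he", "she", "it", "they", "this", "that"],
    "VP": ["did", "did so", "do that", "did the same"],
    "PP": ["there", "here", "then"],
    "S":  ["so", "that", "yes", "no"],
}

def candidates(phrase_type):
    """Pro-forms to try when testing a sequence hypothesized as phrase_type."""
    return PRO_FORMS.get(phrase_type, [])

print(candidates("VP"))  # ['did', 'did so', 'do that', 'did the same']
```

Because each pro-form list is type-specific, a successful substitution identifies not just constituency but the constituent's category, which is why this test gives the finest granularity.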
Tree Structure from Tests
Combining test results yields a phrase-structure tree. For The cat sat on the mat:
S
├── NP
│   ├── D  the
│   └── N  cat
└── VP
    ├── V  sat
    └── PP
        ├── P  on
        └── NP
            ├── D  the
            └── N  mat
Each node corresponds to a constituent identified by the tests. The hierarchy reflects nested-constituent relations: the mat is a constituent inside the PP on the mat, which is a constituent inside the VP sat on the mat, etc.
Different syntactic frameworks (X-bar, minimalism, dependency grammar, CCG) draw the trees differently but agree on the basic constituent structure that the tests reveal.
Limits and Edge Cases
Constituency tests are empirical diagnostics, not theorems. They sometimes give ambiguous or contradictory results.
Ambiguous attachment: The cat saw the man with the telescope. The PP with the telescope attaches either to saw (the cat used the telescope) or to the man (the man had the telescope). Constituency tests partially distinguish the two readings but the sentence is genuinely ambiguous.
Right-node raising: I bought, and you sold, the same car. The shared object the same car is at a syntactically unusual position; the standard tests strain.
Parasitic gaps: Which book did you criticize without reading? The gap after reading is parasitic on the gap after criticize; constituent structure is more complex than basic tests reveal.
Idiomatic non-compositional units: kick the bucket "die" is a single semantic unit but the syntactic tests treat the bucket as a separable NP (The bucket was kicked is grammatical, though it loses the idiomatic reading).
These edge cases are why formal syntactic frameworks (X-bar theory, minimalism) are needed: the tests give first-order data that the frameworks then organize into a coherent theory.
ML Connections
Constituency parsing
On the Penn Treebank, strong neural constituency parsers report high labeled-bracketing F1 under the standard evaluation setup. Production and research systems use:
- Chart parsers with grammar rules + statistical scores (Berkeley Parser).
- Span-based neural chart parsers with learned span scores (Stern et al. 2017; Kitaev-Klein 2018).
- Sequence-to-sequence parsers that linearize the tree as a string (Vinyals et al. 2015, Grammar as a Foreign Language).
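The linearization idea behind the sequence-to-sequence parsers is simple: emit the tree as a flat bracket string that a decoder can generate token by token. A minimal sketch (the (label, children) tuple format is our own convention):

```python
def linearize(node):
    """Flatten a (label, children) tree into a bracketed string."""
    label, children = node
    parts = [c if isinstance(c, str) else linearize(c) for c in children]
    return "(" + label + " " + " ".join(parts) + ")"

tree = ("S",
        [("NP", [("D", ["the"]), ("N", ["cat"])]),
         ("VP", [("V", ["sat"]),
                 ("PP", [("P", ["on"]),
                         ("NP", [("D", ["the"]), ("N", ["mat"])])])])])

print(linearize(tree))
# (S (NP (D the) (N cat)) (VP (V sat) (PP (P on) (NP (D the) (N mat)))))
```

The inverse mapping (bracket string back to tree) is what makes parsing expressible as string generation.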
Production NLP pipelines may expose parsing as a preprocessing step, often dependency parsing rather than full constituency parsing. LLMs do not typically expose explicit constituency trees; whether a representation contains tree information has to be tested with probes or targeted evaluations.
Probing for constituency in transformer representations
Hewitt-Manning 2019 A Structural Probe for Finding Syntax in Word Representations trained a probe to recover syntactic-tree distance between word pairs from BERT's contextual embeddings. A linear projection of BERT activations into a low-dimensional "syntax space" recovered substantial dependency-tree structure. That is evidence that tree-distance information is extractable from the representation, not proof that BERT explicitly stores a human-readable tree.
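A sketch of the probe's training objective (notation ours: h_i are contextual embeddings for sentence s, d_T is parse-tree distance, B the learned linear projection):

```latex
\min_{B} \; \sum_{s} \frac{1}{|s|^2} \sum_{i,j}
  \left| \, d_{T}(w_i, w_j) - \left\lVert B(h_i - h_j) \right\rVert_2^2 \, \right|
```

Because B is linear and low-rank, a good fit shows tree-distance information is linearly decodable from the embeddings, which is exactly the hedged reading given above.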
Tree-LSTM and tree-structured neural networks
Pre-transformer NLP models (Socher et al. 2013, Tai et al. 2015) built tree-structured RNNs that processed inputs along the constituency tree. The structural-bias gain on tasks sensitive to long-range dependencies was real but limited; transformers (with no explicit tree structure) eventually matched and exceeded tree-LSTMs by absorbing the structure implicitly.
LLM grammaticality and constituency
Targeted syntactic evaluation benchmarks (Marvin-Linzen 2018, BLiMP) test whether models preserve constituency-sensitive dependencies such as subject-verb agreement across embedded clauses, filler-gap dependencies, and negative-polarity-item licensing. Good performance is behavioral evidence; it does not settle which internal representation the model uses.
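The scoring scheme these benchmarks use is uniform: a model "passes" a minimal pair if it assigns higher probability to the grammatical member. A minimal sketch of the accuracy computation — the log-probability numbers here are invented for illustration; a real run would query an actual language model:

```python
pairs = [
    # (grammatical, ungrammatical, logp_good, logp_bad) -- scores invented
    ("the cats near the door are asleep",
     "the cats near the door is asleep", -21.3, -24.9),
    ("no author has ever written any novel",
     "the author has ever written any novel", -30.1, -28.7),
]

def accuracy(pairs):
    """Fraction of pairs where the grammatical sentence scores higher."""
    correct = sum(1 for _, _, good, bad in pairs if good > bad)
    return correct / len(pairs)

print(accuracy(pairs))  # 0.5 on these invented scores
```

The first pair probes subject-verb agreement across a PP, the second negative-polarity-item licensing; both phenomena depend on constituent structure, not linear adjacency.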
Common Mistakes
Treating one positive test as definitive
Constituency tests should agree. A sequence that passes substitution but fails movement is suspect. The convergence of multiple tests is what makes constituency assignment reliable; single-test arguments are weak.
Confusing constituents with phrases
Constituent is the empirical structural notion; phrase is the theory-internal label (NP, VP, PP). Different syntactic frameworks label the same constituent differently (an NP in X-bar theory might be a DP in modern minimalism). The tests identify what's a constituent; the framework decides what to call it.
Forgetting that English is not the universal case
Constituency tests are language-specific. The substitution test relies on English pronouns; movement uses English wh-fronting; ellipsis uses English do-support. For other languages, the analogous tests use the corresponding constructions, and the resulting constituent structure can be quite different (e.g., free-word-order languages have flatter constituent structure in many frameworks).
Treating dependency grammar as denying constituency
Dependency grammar represents syntax as head-dependent relations rather than as constituency. The two views are substantially convertible; constituency tests still apply but their interpretation differs. Modern computational-linguistics work uses dependency grammar more often than constituency, but both encode similar empirical facts.
Cross-Network Links
- LinguisticsPath internal: x-bar-theory is the formal framework that organizes constituent types; next natural topics are dependency grammar, movement, and locality.
- TheoremPath direction: constituency parsing, tree-LSTMs, and syntactic probes are the ML-theory side of the same object.
- ComputationPath: context-free grammars and CYK parsing algorithm pages provide the formal-language framing.
- DSAPath: parsing algorithms (CYK, Earley, chart parsers) use constituency structure as the underlying grammar.
References
Canonical:
- Carnie, Andrew. Syntax: A Generative Introduction (2021, 4th ed.), Chapters 3-4.
- Radford, Andrew. Analysing English Sentences (2009, 2nd ed.), Chapters 1-2.
- Sportiche, Dominique, Hilda Koopman, and Edward Stabler. An Introduction to Syntactic Analysis and Theory (2014).
- Chomsky, Noam. Syntactic Structures (1957).
- Adger, David. Core Syntax: A Minimalist Approach (2003).
- Kroeger, Paul. Analyzing Syntax: A Lexical-Functional Approach (2004).
Computational:
- Stern, Mitchell, Jacob Andreas, and Dan Klein. "A Minimal Span-Based Neural Constituency Parser." ACL (2017).
- Kitaev, Nikita, and Dan Klein. "Constituency Parsing with a Self-Attentive Encoder." ACL (2018).
- Marvin, Rebecca, and Tal Linzen. "Targeted Syntactic Evaluation of Language Models." EMNLP (2018).
- Warstadt, Alex, et al. "BLiMP: The Benchmark of Linguistic Minimal Pairs for English." TACL 8 (2020) 377-392.