Discourse Analysis
Definition
Discourse analysis in NLP studies the structure and coherence of multi-sentence texts. Rhetorical Structure Theory (RST) models how text spans, from clauses up to whole paragraphs, relate through rhetorical relations such as Elaboration, Contrast, Cause, and Evidence. Coherence models assess whether a sequence of sentences forms a coherent, well-organized text. Discourse parsing segments a text into clause-level elementary discourse units (EDUs) and identifies the relations between them, producing a hierarchical tree. Applications include text summarization (discourse structure guides extraction), essay scoring (coherence metrics), and sentiment analysis (discourse-level negation and contrast can flip sentence-level sentiment).
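To make the "hierarchical tree" concrete, here is a minimal sketch of how such a tree could be represented. The class name `DiscourseNode`, its fields, and the example sentences are illustrative assumptions, not a standard API or annotated data; real RST trees may also be non-binary.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DiscourseNode:
    """Node in a simplified, binary RST-style tree (illustrative only).

    Internal nodes are assumed to have both a left and a right child.
    """
    text: Optional[str] = None        # set on leaves (discourse units)
    relation: Optional[str] = None    # set on internal nodes, e.g. "Evidence"
    nuclearity: Optional[str] = None  # "NS", "SN", or "NN" on internal nodes
    left: Optional["DiscourseNode"] = None
    right: Optional["DiscourseNode"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None

    def units(self) -> List[str]:
        """Discourse units in text order."""
        if self.is_leaf():
            return [self.text or ""]
        return self.left.units() + self.right.units()

    def relations(self) -> List[str]:
        """Relation labels, collected left to right."""
        if self.is_leaf():
            return []
        return self.left.relations() + [self.relation or ""] + self.right.relations()


# Made-up example: a claim, its supporting evidence, and a contrasting caveat.
claim = DiscourseNode(text="The app is fast.")
support = DiscourseNode(text="Pages load in under a second.")
caveat = DiscourseNode(text="However, setup is confusing.")
tree = DiscourseNode(
    relation="Contrast",
    nuclearity="NN",
    left=DiscourseNode(relation="Evidence", nuclearity="NS", left=claim, right=support),
    right=caveat,
)
```

Here `tree.relations()` returns `["Evidence", "Contrast"]`: the claim and its evidence form one span, and that span as a whole contrasts with the caveat.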
Why It Matters
Discourse analysis matters for any NLP task that requires understanding multi-sentence text as a coherent whole. Text summarization systems that understand discourse structure produce better summaries by selecting sentences that capture the main claims and key evidence rather than just the highest-scoring individual sentences. For AI writing assistants, discourse coherence scoring evaluates whether document-level text flows logically. For customer feedback analysis, discourse-level understanding distinguishes a review that opens with a complaint but ends in praise ('The delivery was late BUT the product quality exceeded my expectations') from a straightforwardly negative review.
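The customer feedback case can be illustrated with a small sketch. It assumes a toy polarity lexicon and a simple heuristic, weighting the clause after a contrastive connective more heavily, chosen here only for illustration; the names `LEXICON`, `flat_score`, and `contrast_aware_score` are hypothetical, and a production system would use a trained model instead.

```python
import re

# Toy polarity lexicon (an assumption for this sketch, not a real resource).
LEXICON = {"late": -1.0, "slow": -1.0, "terrible": -1.0,
           "exceeded": 1.0, "great": 1.0, "love": 1.0}

# Contrastive connectives that split a review into "before" and "after".
CONTRAST = re.compile(r"\b(but|however)\b", re.IGNORECASE)


def flat_score(text: str) -> float:
    """Sum word polarities, ignoring discourse structure entirely."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0.0) for word in words)


def contrast_aware_score(text: str, after_weight: float = 2.0) -> float:
    """Weight the clause after the first contrastive connective more heavily."""
    parts = CONTRAST.split(text, maxsplit=1)
    if len(parts) == 1:  # no connective found
        return flat_score(text)
    before, _connective, after = parts
    return flat_score(before) + after_weight * flat_score(after)


review = "The delivery was late BUT the product quality exceeded my expectations"
print(flat_score(review))            # 0.0: the two clauses cancel out
print(contrast_aware_score(review))  # 1.0: the praise after BUT dominates
```

The flat score treats the review as neutral, while the contrast-aware score reflects that the writer ends on praise.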
How It Works
RST discourse parsers typically take one of two approaches: transition-based shift-reduce parsing, which makes greedy or beam-search decisions about how to attach discourse units, or chart parsing with a CKY-like dynamic program, which searches over all possible trees. Neural discourse parsers use transformer encoders for discourse unit representations and train on corpora like RST-DT (385 annotated Wall Street Journal articles, 347 of which form the standard training split). Implicit discourse relation classification (identifying relations between adjacent sentences without explicit connectives like 'because' or 'however') is particularly challenging because the intended relationship must be inferred from semantic content alone.
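The greedy shift-reduce idea can be sketched as follows. This is a minimal illustration under stated assumptions: `toy_decide` is a hypothetical rule-based stand-in for the trained classifier a real parser would use, the connective table is made up, and the example units are invented. It only handles explicit connectives, so it says nothing about the harder implicit case.

```python
from collections import deque
from typing import Callable, List, Optional, Tuple, Union

# A tree is either a leaf (the unit's text) or (relation, left, right).
Tree = Union[str, Tuple[str, "Tree", "Tree"]]
Action = Tuple[str, Optional[str]]  # ("shift", None) or ("reduce", relation)

# Hypothetical cue table; a real parser would learn this decision.
CONNECTIVE_RELATION = {"because": "Cause", "but": "Contrast"}


def toy_decide(stack: List[Tree], queue: List[str]) -> Action:
    """Reduce when the top unit opens with a known connective, else shift."""
    if len(stack) >= 2 and isinstance(stack[-1], str):
        words = stack[-1].split()
        first = words[0].strip(",.").lower() if words else ""
        if first in CONNECTIVE_RELATION:
            return "reduce", CONNECTIVE_RELATION[first]
    if queue:
        return "shift", None
    return "reduce", "Elaboration"  # fallback label when nothing else applies


def shift_reduce_parse(
    edus: List[str],
    decide: Callable[[List[Tree], List[str]], Action] = toy_decide,
) -> Tree:
    """Greedily build a binary discourse tree over the given units."""
    if not edus:
        raise ValueError("need at least one discourse unit")
    stack: List[Tree] = []
    queue = deque(edus)
    while queue or len(stack) > 1:
        action, relation = decide(stack, list(queue))
        # Override decisions that are impossible in the current state.
        if action == "reduce" and len(stack) < 2:
            action = "shift"
        if action == "shift" and not queue:
            action = "reduce"
        if action == "shift":
            stack.append(queue.popleft())
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append((relation or "Elaboration", left, right))
    return stack[0]


units = ["The plan was approved,", "because the budget allowed it,",
         "but the rollout slipped."]
parsed = shift_reduce_parse(units)
```

For these three units the parser first attaches the 'because' clause to the opening clause as Cause, then attaches the 'but' clause to that combined span as Contrast. A beam-search variant would keep several candidate stacks at each step instead of committing to one action.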
Figure: Rhetorical relations between sentences. The diagram labels the discourse relations linking these example sentences:
- The product launch was a success.
- However, delivery times were too slow.
- This led to a wave of negative reviews.
- We hired additional logistics staff to fix it.
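As a rough illustration of how explicit cues signal relations in the four example sentences, the sketch below labels each sentence by the connective it opens with. The cue table `CUES` and the relation names assigned to it are assumptions made for this example, not the labels from the original diagram.

```python
from typing import List, Optional, Tuple

# Hypothetical cue phrases mapped to relation labels.
CUES = [("however", "Contrast"), ("this led to", "Cause")]

SENTENCES = [
    "The product launch was a success.",
    "However, delivery times were too slow.",
    "This led to a wave of negative reviews.",
    "We hired additional logistics staff to fix it.",
]


def label_explicit_relations(sentences: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Label each sentence after the first with the relation its opening cue signals.

    Sentences with no recognized cue get None, meaning the relation is implicit.
    """
    labels = []
    for sentence in sentences[1:]:
        lowered = sentence.lower()
        relation = next((rel for cue, rel in CUES if lowered.startswith(cue)), None)
        labels.append((sentence, relation))
    return labels


for sentence, relation in label_explicit_relations(SENTENCES):
    print(relation or "implicit", "<-", sentence)
```

The last sentence has no opening connective, so a cue lookup cannot label it; its link to the preceding sentence has to be inferred from meaning.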
Real-World Example
An AI essay grading system uses discourse analysis to evaluate student essays for organizational coherence. The RST parser checks whether the essay contains a claim (nucleus), supporting evidence (satellite with an Evidence relation), a counterargument (Contrast relation), and a rebuttal (a further Contrast relation). Essays with well-structured discourse trees receive higher scores for 'organization'; essays where all sentences have flat Elaboration relations with no argumentative structure receive lower scores. Scoring this dimension previously required entirely manual, rubric-based evaluation.
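One way such an 'organization' score might be computed from a parser's output is sketched below. The function name `organization_score`, the choice of Evidence and Contrast as the required relations, and the equal 0.5 weights are all assumptions for illustration; an actual grading system would calibrate these against human-scored essays.

```python
from collections import Counter
from typing import List

# Relations treated as signs of argumentative structure (an assumption).
ARGUMENT_RELATIONS = ("Evidence", "Contrast")


def organization_score(relations: List[str]) -> float:
    """Score in [0, 1]: rewards argumentative relations, penalizes flat Elaboration."""
    if not relations:
        return 0.0
    counts = Counter(relations)
    # Fraction of the expected argumentative relations that appear at least once.
    coverage = sum(1 for rel in ARGUMENT_RELATIONS if counts[rel] > 0) / len(ARGUMENT_RELATIONS)
    # Share of relations that are something other than Elaboration.
    non_elaboration = 1.0 - counts["Elaboration"] / len(relations)
    return 0.5 * coverage + 0.5 * non_elaboration


structured_essay = ["Evidence", "Contrast", "Contrast", "Elaboration"]
flat_essay = ["Elaboration", "Elaboration", "Elaboration", "Elaboration"]

print(organization_score(structured_essay))  # 0.875
print(organization_score(flat_essay))        # 0.0
```

The input is just the list of relation labels found in an essay's discourse tree, so the sketch is independent of which parser produced them.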
Common Mistakes
- Applying sentence-level sentiment models to discourse-level negation: 'The product was terrible. Just kidding, I love it' requires discourse understanding
- Ignoring discourse analysis for long-document tasks: summarization and coherence scoring without discourse structure produce lower-quality outputs
- Expecting high accuracy from discourse parsers on informal text: discourse models trained on news articles perform poorly on conversational or informal text
Related Terms
Constituency Parsing
Constituency parsing breaks a sentence into nested hierarchical phrases—noun phrases, verb phrases, clauses—producing a tree structure that reveals the grammatical constituents of a sentence.
Coreference Resolution
Coreference resolution identifies all expressions in a text that refer to the same real-world entity—linking 'Sarah,' 'she,' and 'the manager' to the same person—enabling coherent multi-sentence understanding.
Text Summarization
Text summarization automatically condenses long documents into shorter versions that preserve the most important information, enabling rapid review of support tickets, articles, and conversations at scale.
Natural Language Processing (NLP)
Natural Language Processing (NLP) is the field of AI focused on enabling computers to understand, interpret, and generate human language—powering applications from chatbots and search engines to translation and sentiment analysis.
Linguistic Annotation
Linguistic annotation is the process of manually or automatically labeling text with linguistic information—such as POS tags, parse trees, named entities, or coreference chains—creating training data for supervised NLP models.