Natural Language Processing (NLP)

Discourse Analysis

Definition

Discourse analysis in NLP studies the structure and coherence of multi-sentence texts. Rhetorical Structure Theory (RST) models how sentences and paragraphs relate through rhetorical relations like Elaboration, Contrast, Cause, and Evidence. Coherence models assess whether a sequence of sentences forms a coherent, well-organized text. Discourse parsing identifies clause-level discourse units and the relations between them, producing a hierarchical tree. Applications include text summarization (discourse structure guides extraction), essay scoring (coherence metrics), and sentiment analysis (discourse-level negation and contrast can flip sentence-level sentiment).

Why It Matters

Discourse analysis is important for NLP tasks that require understanding multi-sentence text as a coherent whole. Text summarization systems that understand discourse structure produce better summaries by selecting sentences that capture the main claims and key evidence rather than just the highest-scoring individual sentences. For AI writing assistants, discourse coherence scoring evaluates whether document-level text flows logically. For customer feedback analysis, discourse-level understanding distinguishes a complaint that eventually leads to praise ('The delivery was late BUT the product quality exceeded my expectations') from a straightforwardly negative review.
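The customer feedback case can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the tiny `LEXICON`, the cue list, and the `post_weight` heuristic (weighting the clause after a contrastive connective more heavily) are hypothetical stand-ins, not a trained sentiment model or a real discourse parser.

```python
import re

# Tiny illustrative lexicon (hypothetical); a real system would use a trained model.
LEXICON = {"late": -1.0, "slow": -1.0, "terrible": -1.0,
           "exceeded": 1.0, "love": 1.0, "great": 1.0}

# Contrastive connectives that signal the second clause carries the main point.
CONTRAST_CUES = re.compile(r"\b(?:but|however)\b", re.IGNORECASE)


def clause_score(clause: str) -> float:
    """Sum lexicon scores over the words of one clause (no discourse awareness)."""
    words = re.findall(r"[a-z']+", clause.lower())
    return sum(LEXICON.get(w, 0.0) for w in words)


def discourse_aware_score(text: str, post_weight: float = 2.0) -> float:
    """Weight the clause after the first contrastive cue more heavily."""
    parts = CONTRAST_CUES.split(text, maxsplit=1)
    if len(parts) == 1:
        # No contrast cue found: fall back to the flat score.
        return clause_score(text)
    before, after = parts
    return clause_score(before) + post_weight * clause_score(after)


review = "The delivery was late BUT the product quality exceeded my expectations"
flat = clause_score(review)             # negative and positive words cancel out
aware = discourse_aware_score(review)   # the clause after BUT dominates
```

With these assumed weights, the flat score treats the review as neutral, while the discourse-aware score comes out positive because the praise follows the contrast marker.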

How It Works

RST discourse parsers generally take one of two approaches. Transition-based parsers use a shift-reduce algorithm, making greedy or beam-search decisions about how to attach discourse units. Chart-based parsers use a CKY-like dynamic program that searches over all possible spans. Neural discourse parsers use transformer encoders for discourse unit representations and train on corpora like RST-DT (385 annotated Wall Street Journal articles, 347 of which form the standard training set). Implicit discourse relation classification (identifying relations between adjacent sentences without explicit connectives like 'because' or 'however') is particularly challenging, requiring inference about the intended relationship from semantic content alone.
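A greedy shift-reduce loop can be sketched as follows. This is a simplified illustration under stated assumptions: the `Node` class, the `Policy` type, and `toy_policy` (a cue-word rule standing in for a trained classifier) are hypothetical names, and the sketch omits the nucleus/satellite distinction that real RST parsers predict.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    """Discourse tree node: a leaf holds a unit's text, an internal node a relation."""
    text: Optional[str] = None
    relation: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


# Hypothetical policy signature: given the stack and the remaining queue,
# return ("shift", None) or ("reduce", relation_label).
Policy = Callable[[List[Node], List[Node]], Tuple[str, Optional[str]]]


def shift_reduce_parse(units: List[str], policy: Policy) -> Node:
    """Greedily build a binary discourse tree from a list of discourse units."""
    if not units:
        raise ValueError("need at least one discourse unit")
    queue = [Node(text=u) for u in units]
    stack: List[Node] = []
    while queue or len(stack) > 1:
        action, label = policy(stack, queue)
        # Override the policy when its chosen action is not legal.
        if action == "shift" and not queue:
            action, label = "reduce", label or "Elaboration"
        if action == "reduce" and len(stack) < 2:
            action = "shift"
        if action == "shift":
            stack.append(queue.pop(0))
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(Node(relation=label, left=left, right=right))
    return stack[0]


def toy_policy(stack: List[Node], queue: List[Node]) -> Tuple[str, Optional[str]]:
    """Toy stand-in for a classifier: reduce whenever possible, label by cue words."""
    if len(stack) < 2:
        return ("shift", None)
    cue = (stack[-1].text or "").lower()
    if cue.startswith("however"):
        return ("reduce", "Contrast")
    if cue.startswith("this led"):
        return ("reduce", "Cause")
    return ("reduce", "Elaboration")


tree = shift_reduce_parse(
    [
        "The product launch was a success.",
        "However, delivery times were too slow.",
        "This led to a wave of negative reviews.",
        "We hired additional logistics staff to fix it.",
    ],
    toy_policy,
)
```

Because the toy policy reduces as soon as two items are on the stack, the resulting tree is left-branching. A trained policy would instead score each action from learned representations, and a beam-search variant would keep several candidate stacks alive rather than committing to one.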

Discourse Analysis: Rhetorical Relations Between Sentences

S1: The product launch was a success.
S2 (CONTRAST with S1): However, delivery times were too slow.
S3 (CAUSE, from S2): This led to a wave of negative reviews.
S4 (ELABORATION on S3): We hired additional logistics staff to fix it.

Discourse relations

CONTRAST: Ideas in opposition
CAUSE: One causes another
ELABORATION: Adds detail
CONDITION: Conditional dependency

Real-World Example

An AI essay grading system uses discourse analysis to evaluate student essays for organizational coherence. The RST parser identifies whether the essay contains a claim (nucleus), supporting evidence (satellite with an Evidence relation), a counterargument (Contrast relation), and a rebuttal (a further Contrast relation). Essays with well-structured discourse trees receive higher scores for 'organization'; essays in which every sentence is linked by a flat Elaboration relation, with no argumentative structure, receive lower scores. This dimension of scoring previously required entirely manual, rubric-based evaluation.
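One way to turn this idea into a number is sketched below. The heuristic is an assumption made for illustration, not the scoring rule of any particular grading system: it takes the relation labels found in an essay's discourse tree (assumed to come from an upstream parser) and reports the fraction that are argumentative. The `ARGUMENTATIVE` set and the `organization_score` function are hypothetical.

```python
from collections import Counter
from typing import Iterable

# Relations treated as evidence of argumentative structure (an assumption).
ARGUMENTATIVE = {"Evidence", "Contrast"}


def organization_score(relations: Iterable[str]) -> float:
    """Return the fraction of relations that are argumentative (0.0 if none given)."""
    counts = Counter(relations)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(counts[r] for r in ARGUMENTATIVE) / total


# Claim with evidence, counterargument, and rebuttal, plus one elaboration.
structured = organization_score(["Evidence", "Contrast", "Contrast", "Elaboration"])

# Every sentence simply elaborates on the previous one.
flat = organization_score(["Elaboration"] * 4)
```

Under this heuristic the structured essay scores higher than the flat one. A production system would more likely combine several tree features, such as depth and relation diversity, in a learned model.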

Common Mistakes

  • Applying sentence-level sentiment models to discourse-level negation—'The product was terrible—just kidding, I love it' requires discourse understanding
  • Ignoring discourse analysis for long-document tasks—summarization and coherence scoring without discourse structure produce lower-quality outputs
  • Expecting high accuracy from discourse parsers on informal text—discourse models trained on news articles perform poorly on conversational or informal text
