Coreference Resolution

Definition

Coreference resolution is the NLP task of grouping every mention of the same entity in a document into a single cluster. A mention can be a proper noun ('Google'), a common noun phrase ('the company'), or a pronoun ('it'). Resolving coreference is essential for tasks that require reading across sentence boundaries, such as question answering, summarization, and multi-turn dialogue understanding. Modern neural systems use span-based models that jointly learn mention detection and coreference linking, typically trained on corpora such as OntoNotes. Accuracy tends to drop on long documents and on pronouns with several plausible antecedents.

Why It Matters

In multi-turn chatbot conversations, coreference resolution prevents the bot from losing track of what the user is talking about. When a user says 'I ordered a laptop last week. It arrived broken. Can you replace it?' the bot must understand that both occurrences of 'it' refer to the laptop to fulfill the request correctly. Without coreference resolution, the system treats each sentence in isolation and cannot chain references across sentences or turns, which leads to confused or repetitive responses.

How It Works

Modern coreference resolvers use span-based neural architectures. They first enumerate candidate mention spans and prune unlikely ones, then score each mention against earlier candidate antecedents with a learned pairwise scoring function (for example a feed-forward or bilinear scorer) over the span representations. Span representations are built from contextualized embeddings, such as those from BERT or SpanBERT, that capture the surrounding context. To form clusters, each mention is typically linked to its highest-scoring antecedent, or to none, and linked mentions are merged into chains. End-to-end models learn mention detection and coreference scoring jointly. SpanBERT, pre-trained with a span-masking objective, performs particularly well on this task.
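
The scoring and linking steps described above can be illustrated with a minimal Python sketch. It is not the implementation of any particular system: the function names (bilinear_score, pick_antecedents, build_clusters), the two-dimensional span vectors, and the identity weight matrix are all hypothetical, hand-made stand-ins for representations and weights that a real model would learn. A score of 0.0 is assumed to represent the "no antecedent" option.

```python
from typing import List, Optional, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def bilinear_score(a: Vector, W: Matrix, b: Vector) -> float:
    """Return a^T W b for plain Python sequences (toy bilinear scorer)."""
    return sum(a[i] * W[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))


def pick_antecedents(spans: List[Vector], W: Matrix) -> List[Optional[int]]:
    """Greedily pick the best-scoring earlier mention for each mention.

    Assumption: 0.0 is the score of "no antecedent", so a mention only
    links backwards when some earlier mention scores above zero.
    """
    antecedents: List[Optional[int]] = []
    for i, span in enumerate(spans):
        best, best_score = None, 0.0
        for j in range(i):  # only mentions that appear earlier in the text
            score = bilinear_score(span, W, spans[j])
            if score > best_score:
                best, best_score = j, score
        antecedents.append(best)
    return antecedents


def build_clusters(antecedents: List[Optional[int]]) -> List[List[int]]:
    """Merge mentions into chains by following antecedent links."""
    cluster_of = {}
    clusters: List[List[int]] = []
    for i, ant in enumerate(antecedents):
        if ant is None:
            # No antecedent: this mention starts a new chain.
            cluster_of[i] = len(clusters)
            clusters.append([i])
        else:
            # Join the chain that the chosen antecedent belongs to.
            cluster_of[i] = cluster_of[ant]
            clusters[cluster_of[ant]].append(i)
    return clusters


# Hand-made toy vectors for the mentions: Sarah, Emma, she, her.
mentions = ["Sarah", "Emma", "she", "her"]
spans = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
W = [[1.0, 0.0], [0.0, 1.0]]  # identity matrix, purely illustrative

links = pick_antecedents(spans, W)
chains = [[mentions[i] for i in chain] for chain in build_clusters(links)]
print(chains)  # [['Sarah', 'she'], ['Emma', 'her']]
```

In this sketch a mention with no positive-scoring antecedent simply starts its own chain. Real systems also prune candidate spans before scoring, since enumerating every span pair in a long document is expensive.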

Coreference Resolution — Mention Chains

Annotated sentence

[P1 Sarah] told [P2 Emma] that [P1 she] would help [P2 her] later.

Resolved coreference chains

PERSON_1 → Sarah: Sarah, she

PERSON_2 → Emma: Emma, her

Resolved reading

"Sarah told Emma that Sarah would help Emma later."

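As a rough illustration of how the resolved reading above can be produced once the chains are known, the following Python sketch replaces each mention with the name that represents its chain. The function name resolve_text and the chain format (representative name mapped to token positions) are assumptions made for this example, and it only handles single-token mentions.

```python
from typing import Dict, List


def resolve_text(tokens: List[str], chains: Dict[str, List[int]]) -> str:
    """Replace every mention token with its chain's representative name.

    `chains` maps a representative name to the token positions of all
    mentions in that chain (a hypothetical format for this sketch).
    """
    resolved = list(tokens)  # copy so the input tokens stay unchanged
    for name, positions in chains.items():
        for position in positions:
            resolved[position] = name
    return " ".join(resolved)


tokens = "Sarah told Emma that she would help her later.".split()
chains = {"Sarah": [0, 4], "Emma": [2, 7]}  # token positions of each mention

print(resolve_text(tokens, chains))
# Sarah told Emma that Sarah would help Emma later.
```

A fuller version would need to handle multi-token mentions and grammatical adjustments, for example turning a possessive 'her' into "Emma's".
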
Real-World Example

A legal document review system uses coreference resolution to track all references to each contract party across a 50-page agreement. After resolution, the system can answer 'What are all of Vendor's obligations?' by following every reference to 'Vendor,' 'it,' 'the providing party,' and 'the supplier' throughout the document. A task that would take a paralegal hours is completed in seconds.

Common Mistakes

  • Assuming coreference is solved—error rates are still high on complex, long documents
  • Ignoring coreference in multi-turn dialogue—pronouns across turns create unresolved references
  • Using sentence-level models for document-level tasks—coreference spans entire documents
