GUM analyzes the discourse structure of documents using tree-like graphs, in an enhanced version of Rhetorical Structure Theory (RST, Mann & Thompson 1988) called eRST. The data is initially annotated for basic RST trees in the course LING-4427, and enhanced with eRST annotations outside of the class. eRST differs from RST in adding secondary, tree-breaking relations, and in anchoring discourse relations to categorized signal tokens, which characterize specific instances of discourse relations. The eRST formalism is described in detail in this paper:
Zeldes, Amir, Aoyama, Tatsuya, Liu, Yang Janet, Peng, Siyao, Das, Debopam and Gessler, Luke (2024) "eRST: A Signaled Graph Theory of Discourse Relations and Organization".
https://arxiv.org/abs/2403.13560
Before reading the guidelines, annotators should read Mann & Thompson (1988) "Rhetorical Structure Theory", which is also the reference for resolving annotation issues that: 1. are not covered in these guidelines, and 2. have no relevant examples as precedent when searching in ANNIS.
The RST-DT guidelines (Carlson & Marcu 2001) may also be consulted where these do not contradict GUM guidelines, and especially for text segmentation, for which GUM should have identical guidelines to RST-DT.
The main sentences (or equivalent fragments/utterances) identified as <s> tags in the markup phase, which are also the basis of syntactic analysis in the dependency annotation phase, are always separate segments for the purpose of RST analysis.
Additional segments may be needed, often to delineate subordinate clauses that match an RST relation function. In practice, the following types of clauses are usually made into separate segments:
Subject and object clauses are not segmented, with the exception of attribution, in which reported speech or thoughts are more central than the speech verb:
Using elaboration-additional for speech verb object clauses means that we consider the main rhetorical act to be reporting the fact that something was said; if we instead want to treat the nested rhetorical structure of what was said as the main point, we should use attribution and treat the speech itself as the nucleus, with its own discourse function.
Relative and adverbial clauses modifying nouns (using a relative pronoun, zero relative or participial clauses) are all segmented and typically act as elaboration-attribute or purpose-attribute:
To-infinitives or that clauses modifying a noun and prepositional adnominal clauses are also segmented, and analyzed similarly to relative clauses (by default as purpose-attribute, but other options are possible in context). Note that for prepositional modifiers, the headword of the adnominal clause must be a verb (typically a gerund in -ing):
Pure prepositional noun modifiers are not segmented. In the following example 'survival' is a noun and therefore not eligible for EDU status:
Infrequently, the adnominal clause can be the nucleus:
But note that if the modified noun is only a small part of the main clause, the adnominal clause is usually a satellite:
If a relative or other adnominal clause interrupts a larger EDU, we join both parts of that unit using the same-unit relation, but if such a unit is followed only by the sentence's final punctuation, there is no need for same-unit (i.e. we do not make a segment just for the final period after a sentence-final relative clause).
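The decision described above can be sketched in code. The following is only an illustration, assuming a pre-tokenized sentence and known token offsets for the embedded clause; the function name `needs_same_unit` and the example sentences are invented here and are not part of GUM or any annotation tool.

```python
import string
from typing import List


def needs_same_unit(tokens: List[str], clause_start: int, clause_end: int) -> bool:
    """Decide whether an embedded clause splits its host EDU into two parts.

    The clause is assumed to cover tokens[clause_start:clause_end].
    A same-unit relation is only needed if host material precedes the clause
    and something other than punctuation follows it.
    """
    def is_punct(token: str) -> bool:
        return all(ch in string.punctuation for ch in token)

    before = tokens[:clause_start]
    after = tokens[clause_end:]
    return bool(before) and any(not is_punct(tok) for tok in after)


# Invented example: the relative clause interrupts the main clause.
medial = "The book that I read was long .".split()
# Invented example: the relative clause is followed only by the final period.
final = "I liked the book that I read .".split()

print(needs_same_unit(medial, 2, 5))
print(needs_same_unit(final, 4, 7))
```

In the second sentence the period simply stays with the relative clause EDU, so no extra segment is created for it.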
Trailing non-opening punctuation (i.e. not '(') is attached to the modifier clause if present:
Most full clauses coordinated by 'and', 'or' or 'but' are made into independent EDUs. The coordinating conjunction belongs to the second EDU, e.g.:
Exceptions to splitting EDUs include:
Coordination is inside an object clause, which is not eligible to be an EDU:
When a coordinate VP is the to-infinitive object of another verb:
When the object is shared across both clauses (i.e. VP coordination) and no EDU intervenes, the sentence is considered a single EDU:
But when the subject is shared, we do segment:
Exception: when another EDU intervenes between two parts of a VP coordination, we segment despite shared objects, and merge the segmented units with same-unit. This also applies if we have multiple coordinate VPs:
Elliptical coordinate VPs are segmented (roughly corresponding to gapping constructions or Right-Node-Raising, or cases analyzed as orphan in Universal Dependencies):
Note also that modal and auxiliary sharing is allowed and does not prevent EDU segmentation for coordinated infinitives. Examples:
The same is not true for verbs taking a to-infinitive complement, such as 'want', as shown in the example above.
Ordinary subject and object clefts, as well as pseudo-clefts, are left unsegmented, following RST-DT guidelines:
However, adverbial or conditional clefts are segmented:
Similarly, extraposed clauses with dummy pronouns are not segmented, because the expletive 'it' can be treated as equivalent to the extraposed clause:
In this example, we treat the whole EDU as "To water plants is important". Even though 'important' is evaluative, "to water plants" is equivalent to the subject, and subject clauses are generally not segmented, meaning this is identical to the unsegmented treatment of [Watering plants is important].
Following RST-DT guidelines, parentheticals set apart by parentheses or dashes are segmented, even if they are otherwise syntactically ineligible to be EDUs, such as appositions. Compare:
Syntactically unintegrated citations are segmented, but integrated ones which function as an argument are not:
Note that parenthetical dates in article citations are not EDUs, but parenthetical dates describing dated events, birth years, etc. are EDUs:
In non-academic settings, where dates are not typically provided as part of the reference name of a work, the parenthetical is segmented as usual. This is particularly the case for names of works accompanied by a date, rather than author+year citations in academic contexts. For example:
Note that unlike in this example, we cannot say "we read Smith" to unambiguously mean "Smith (2000)" - in that sense "(2000)" is really part of the 'name' of the reference "Smith (2000)", whereas "(2006)" is not part of the name of the comedy and not normally expected, i.e. it is a satellite.
Following RST-DT guidelines:
Text fragments followed by colons are treated as separate EDUs, even when the fragment is a word or phrase, as long as the text that follows the colon provides further elaboration on the topic introduced by the colon.
Notice that this does not mean that all colons separate EDUs. Specifically, ":" inside an NP is not an EDU break point:
But colons are used as segmentation points when introducing a new idea, elaborating on a previous point, etc.:
In accordance with RST-DT guidelines, EDUs without a verbal predicate (e.g. prepositional phrases) are segmented in the presence of a strong discourse marker. The RST-DT guidelines list the following exhaustive set of markers:
Free relatives, which are predicates attached to a WH word which simultaneously occupies a grammatical function in the matrix and relative clause, are not segmented:
Note that these can be identified by the non-insertability of a relative pronoun:
Some known verbs which do or do not trigger attribution segments include:
The verb 'mean' can be attributional in some constructions, but not others:
Based on RST-DT, "figure out" is not an attribution verb, though "understand" is:
In clauses with 'every time' or 'by the time', we segment not at the relative clause boundary, but before the 'time' expression:
In other words, we do not segment [...time] [you...] and we do segment before 'every time', 'by the time', etc., which is treated the same as 'whenever'/'when', etc. (cf. two instances of by the time/every time in RST-DT). Similarly:
The complex conjunction 'as soon as' is taken to introduce a single temporal EDU (usually context-circumstance), similarly to 'when'. It is NOT segmented into [as soon] [as], but left whole:
Initial conjunctions before a subordinator are segmented, and same-unit is used to join them to their predicate:
Note that in this example, the 'and' belongs to the verb 'continued', and the circumstance could be dropped, leaving the 'and' in the main clause: "and [...] we continued". The same can happen with other coordinations and subordinating conjunctions ("but [if ...] then we will...").
These are treated as phatic organization expressions (premodifiers or postmodifiers) or sometimes evaluations (usually postmodifiers, see below), and therefore do not constitute attribution clauses, but do constitute EDUs. As a result, they cause a same-unit split when they occur medially:
Interrupted clauses form an EDU when:
Some examples of multiple EDUs:
Some examples of single EDUs:
Typically, 'enough' + 'to' constructions are segmented:
The 'to' clause is often labeled purpose-goal or mode-manner, depending on context. An adnominal adjective + enough + infinitive would be purpose-attribute:
In the last case, outrunning the police is a purpose of the car, but not necessarily a purpose of the entire clause about having the car.
Sentences with 'feel like' in the sense 'think' are segmented as attribution predicates. Note that 'like' belongs to the content EDU, similarly to the guideline for attaching 'that' to the content EDU:
Some exceptional constructions which form conventional full predications are segmented apart from an attribution verb even though they do not contain a verb. For example:
Contrast the above cases with simple nominal objects of the same predicates, which do not result in segmentation:
The expression "capable of" is taken to be the same as "can" or "able to" and is not segmented, even when its complement is a verb:
Following RST-DT's segmentation, the expression "what if" is not segmented. As a result, the "if" is not annotated as a DM in signaling annotation (since the corresponding condition relation does not exist).
In the following, W refers to the Writer (or speaker) and R refers to the addressed Reader (or hearer); S refers to a satellite and N refers to a nucleus in each relation (see also Choosing between relations further below for some tests to distinguish confusable relations):
→ if a relation can only point forward by definition; ← if only backward; →← if either forward or backward; ^ for multi-nuclear relations.
Each document should form a complete 'tree' in the sense that there are no separate groups of segments or 'islands' that are not linked by relations.
There should be a single top level span or multinuclear node spanning the entire text, i.e. with a unit index 1-N, where N is the length of the document in EDUs.
If several different topics are discussed which form encapsulated 'islands' with no relations between them, then by convention these will all be joined at the end of the annotation process using a multinuclear group dominating the islands with the joint-other relation.
Some typical scenarios of how 'islands' are formed and grouped include:
The subsections of a travel guide can form islands that should be joined by a joint-other. Typically sections like ‘getting there’ and ‘understand’ are autonomous and are analyzed internally, then joined at the top with the rest of the article, though the main heading may precede the entire joint and modify it (see below).
Lists of ingredients, destinations and other enumerations are instead joined by a joint-list relation.
In how-to guides, the subsections (preparations, tips, warnings), or different methods (method 1, method 2…) are often separate and can be analyzed internally, then connected by a joint-other (for different subsections), joint-list (for a list of methods with equivalent discourse function), or a joint-sequence unifying all chronologically ordered steps in a method, and then a higher joint-list unifying the methods.
The main progression of biographies often forms a joint-sequence (e.g. Early Life section followed by Career).
Sections in an academic paper at the same level often form islands joined by joint-other (notice that they are not equivalent or parallel, and therefore not a joint-list). If there is an abstract, often the joint-other of main sections can be seen as an elaboration on the abstract block.
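The completeness requirement (one top-level node spanning EDUs 1-N, with any remaining islands joined by joint-other) can be checked mechanically. Below is a minimal sketch, assuming a simplified node table rather than any real file format; the names `find_roots` and `is_complete_tree` and the node ids are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical simplified node table:
# node id -> (parent id or None, first EDU covered, last EDU covered)
Node = Tuple[Optional[str], int, int]


def find_roots(nodes: Dict[str, Node]) -> List[str]:
    """Return ids of nodes without a parent (one per unconnected 'island')."""
    return [nid for nid, (parent, _, _) in nodes.items() if parent is None]


def is_complete_tree(nodes: Dict[str, Node], n_edus: int) -> bool:
    """True if exactly one parentless node exists and it spans EDUs 1..n_edus."""
    roots = find_roots(nodes)
    if len(roots) != 1:
        return False
    _, first, last = nodes[roots[0]]
    return first == 1 and last == n_edus


# Two unconnected islands (EDUs 1-2 and 3-4): not yet a complete tree.
islands = {
    "e1": ("s1", 1, 1), "e2": ("s1", 2, 2), "s1": (None, 1, 2),
    "e3": ("s2", 3, 3), "e4": ("s2", 4, 4), "s2": (None, 3, 4),
}

# The same islands joined under one multinuclear joint-other node ("j1").
joined = {**islands, "s1": ("j1", 1, 2), "s2": ("j1", 3, 4), "j1": (None, 1, 4)}

print(find_roots(islands))
print(is_complete_tree(islands, 4), is_complete_tree(joined, 4))
```

A check like this only confirms that the islands are connected; whether joint-other, joint-list or joint-sequence is the right relation remains an annotation decision.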
A satellite which has a satellite will have a span grouping the lower satellite before it modifies something else. In the example below, if we think that 33 is an elaboration of 32, and 32 is an elaboration of 30, then by extension, 33 is also part of the complex elaboration of 30. This means that 32+33 need to be grouped together first under a span, and then form the higher elaboration. The bad example at the top is a case of what we call 'chaining' (a flat sequence of arrows). Relative clause elaborations are also usually grouped with their main clause into a span (30-31) before the span is modified.
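Chaining can be detected automatically. The sketch below assumes a simplified link table in which every node records its parent and whether it is a satellite of that parent; the `Link` type, the function `find_chained`, and the relation labels chosen for the example are illustrative assumptions, using the EDU numbers from the description above.

```python
from typing import Dict, List, NamedTuple, Optional


class Link(NamedTuple):
    parent: Optional[str]  # id of the node this one attaches to
    relname: str           # relation label, or "span" for the nucleus of a span
    is_satellite: bool     # True if this node is a satellite of its parent


def find_chained(links: Dict[str, Link]) -> List[str]:
    """Return satellites that directly carry a satellite of their own.

    Such nodes should first be grouped with their satellite under a span,
    and the span should then attach to the higher nucleus.
    """
    carries_satellite = {ln.parent for ln in links.values() if ln.is_satellite}
    return sorted(
        nid for nid, ln in links.items()
        if ln.is_satellite and nid in carries_satellite
    )


# Bad: 33 -> 32 -> 30 as a flat sequence of arrows.
chained = {
    "30": Link(None, "", False),
    "32": Link("30", "elaboration-additional", True),
    "33": Link("32", "elaboration-additional", True),
}

# Good: 32+33 are grouped under the span "32-33", which then elaborates
# the span "30-31" (EDU 30 plus its relative clause 31).
grouped = {
    "30-31": Link(None, "", False),
    "30": Link("30-31", "span", False),
    "31": Link("30", "elaboration-attribute", True),
    "32-33": Link("30-31", "elaboration-additional", True),
    "32": Link("32-33", "span", False),
    "33": Link("32", "elaboration-additional", True),
}

print(find_chained(chained))
print(find_chained(grouped))
```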

When a single EDU is interrupted, for example by a relative clause, a same-unit multinuclear relation is created to contain the embedded unit. That embedded unit is attached based on syntactic criteria to the part of the same-unit which has its head (for a relative clause: attach the relative clause EDU to the part that contains the noun being modified).
If a same-unit construct has a modifier which applies to the entire interrupted EDU, then it is attached above the same-unit, not inside. For example in the image below, the purpose EDU is attached to the entire same-unit, not to part 2 (the sub-unit on the right), since it would have modified the entire, single EDU if it hadn't needed the same-unit split:
Questions are typically seen as satellites to their respective answers and are linked using the topic-solutionhood relation.
When connecting individual sections or QA pairs to the main joint-other of an article, a span should be used above the entire QA/section subtree to make it clear that the entire subtree is a member of the joint. Do not link just the main segment to the joint directly if there are other segments in the subtree.
Fillers like 'you know' or 'I mean' receive their own EDUs based on the clause to predicate mapping in RST, but they are generally close to empty in content, and are always satellites. When used before a main predicate (including medially inside Same-Unit), they are seen as preparations:
When used as postmodifiers, they are still analyzed as phatic, conveying a very weak sense of the speaker assessing the nucleus as being understandable or obvious. Note that this is the only context in which an organization-type relation can point backwards:
If the filler has the form of a question, but is not soliciting an answer, it is still phatic, but otherwise it can also be a genuine question, in which case "you know" can be an attribution:
If there are multiple identical fillers, they may form restatements or joints as appropriate, and the new multinuc will serve as a preparation etc.:
Note that we use joint-other and not joint-list for multiple fillers, since their content is not additive (as a test, consider that you cannot insert a coordinating conjunction such as "??I mean and you know ...").
In cases where multiple speakers use back-channeling to respond to each other, we assume a hierarchical analysis, in which each back channel EDU refers back to the block containing its predecessor. For example here, 175 and 177 are uttered by the same speaker, and 176 by a different one, so we assume each EDU scopes over the previous block:
For phatic dependents of the same speaker's speech, we prefer to assume minimal scope - for example if an interrupted segment is repaired before continuing on to a longer paragraph or multiple turns, we attach the phatic unit immediately to the repair unit, and not to the larger section that it may begin.
If a repaired EDU (usually headed by a syntactic reparandum relation) has its own discernible function, it is handled based on general guidelines. Some special cases include:


You should not have spans that have no incoming connections except for another span or multinuc above them. Spans are there to group elements, so that they can have an incoming or outgoing connection relating to some other node. In particular, EDUs should not have a span containing only themselves:
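A check for this constraint might be sketched as follows. It assumes a constituency-style table that lists, for each span, every node the span directly groups (the nucleus together with its satellites); this table, the function name `find_redundant_spans`, and the node ids are hypothetical and do not correspond to a specific file format.

```python
from typing import Dict, List


def find_redundant_spans(members: Dict[str, List[str]]) -> List[str]:
    """Return spans that group fewer than two nodes.

    A span wrapping a single EDU, or wrapping nothing but another span,
    groups no additional material and should be removed.
    """
    return sorted(span for span, kids in members.items() if len(kids) < 2)


spans = {
    "s1": ["e1"],        # span containing only one EDU: redundant
    "s2": ["e2", "e3"],  # nucleus plus satellite: fine
    "s3": ["s2"],        # span stacked directly on another span: redundant
}

print(find_redundant_spans(spans))
```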

(e.g. [image of a magician] <-attribution-positive-- ([photo:] --organization-preparation-> [Paul Budd]))
In academic articles, the paper is often preceded by contact details for the authors, such as affiliations and e-mail addresses. These can be seen as ‘attribution-positive’ information to the entire article, and usually attach to the top level node unifying all subsequent nodes.
Some discourse markers are ambiguous or behave in ways which are initially hard to interpret. The following guidelines help with some common dilemmas.
When used with past tense predicates, 'until' is often temporal and therefore circumstantial:
But with non-past tense, it often marks contingency-condition, for example:
Is equivalent to:
'Unless' is generally seen as signaling a negative conditional:
This is similar to:
'Instead' is often indicative of adversative-antithesis, but can appear either in the antithesis satellite itself, or in the nucleus:
'Rather' or 'rather than' work very similarly to 'instead', and can also indicate adversative-antithesis in either the satellite or nucleus.
Depending clauses are often interpreted as conditionals:
This construction is conditional, as it corresponds roughly to "if the weather is a certain way...".
If an acronym for an expression within a sentence is specified in parentheses, it is considered a satellite partial restatement, but not multinuclear (since the parentheses only repeat part of the main sentence):
However, if the acronym in parentheses appears first, it is interpreted as context-background, since a satellite restatement cannot precede the restated content:
Translations can be analyzed in the same way; if they have a language specified before a colon, that is segmented based on EDU segmentation guidelines, and can be considered an organization-preparation (but only if there is a colon). Compare:
When considering two similar relations between sentences without an explicit connective like 'because' or 'if', sometimes inserting a connective or phrase can help to disambiguate. Useful phrases include:
Some examples:
Comparative correlatives are interpreted as conditional constructions:
References forming an EDU (i.e. non-syntactically integrated, see segmentation guidelines) typically function as evidence:
Parenthetical currency and other measurement unit conversions are taken to be restatements. If the parent EDU contains more than just the unit term, then the restatement is satellite-nucleus, otherwise multinuclear:
Tag questions, including negative tag questions, are interpreted as restatements (and not as contrast), since they presuppose and re-assert the initial statement:
They are usually nucleus-satellite, since the tag question conveys less explicit information than the initial statement, though it is possible to have multinuclear constructions when the tag question expresses the full content of the initial statement.
Date EDUs in parentheses can be context-circumstance if they specify the date when something happened:
Even in cases where a single NP is elaborated on, we prefer the context-circumstance relation to an embedded elaboration-attribute relation, since the parenthetical is effectively extra-syntactic, and is therefore treated in the same way as a separate sentence (see similar treatment of parenthetical restatement-partial of an NP):