¶ Rhetorical Structure Theory Annotation and eRST
GUM analyzes the discourse structure of documents using tree-like graphs, in an enhanced version of Rhetorical Structure Theory (RST, Mann & Thompson 1988) called eRST. The data is first annotated with basic RST trees in the course LING-4427, and is then enhanced with eRST annotations outside of the class. eRST differs from RST in two ways: it adds secondary, tree-breaking relations, and it anchors discourse relations to categorized signal tokens, which characterize specific instances of discourse relations. The eRST formalism is described in detail in this paper:
Zeldes, Amir, Aoyama, Tatsuya, Liu, Yang Janet, Peng, Siyao, Das, Debopam and Gessler, Luke (2024) "eRST: A Signaled Graph Theory of Discourse Relations and Organization".
https://arxiv.org/abs/2403.13560
Before reading the guidelines, annotators should read Mann & Thompson (1988) "Rhetorical Structure Theory". This paper is also the reference for resolving annotation issues that 1. are not covered in these guidelines, and 2. have no relevant examples as precedent when searching in ANNIS.
The RST-DT guidelines (Carlson & Marcu 2001) may also be consulted where they do not contradict the GUM guidelines. This applies especially to text segmentation, for which the GUM guidelines should be identical to those of RST-DT.
The main sentences (or equivalent fragments/utterances) identified as <s> tags in the markup phase, which are also the basis of syntactic analysis in the dependency annotation phase, are always separate segments for the purpose of RST analysis.
Additional segments may be needed, often to delineate subordinate clauses that match an RST relation function. In practice, the following types of clauses are usually made into separate segments:
- Conditional clauses (with 'if' or other complementizers) are typically separated and later placed in the 'conditional' relation with the segment containing the clause they modify.
- [I'll send it] [if I have time]
- Circumstance clauses, especially denoting time and place (e.g. temporal 'while' clauses), are similarly segmented apart.
- [I was at home] [while cars were driving by]
- Infinitive clauses of purpose ('in order to', or just 'to' infinitive with the same meaning):
- [We went to Turkey] [to enjoy some sunny weather]
- Infinitival objects (as opposed to adverbial purpose clauses) are NOT segmented, so a verb and its to-infinitive object form a single segment.
- Subordinate clauses (both conditional and adverbial) in cleft constructions are separated from the main clause of the construction.
- [It was when I arrived at the airport] [that I realized I forgot my passport.]
- [It is when we do not tolerate deviation from norms] [that it becomes a problem.]
- Relative clauses and reduced relatives (typically annotated as elaboration-attribute):
- [We met the woman] [who lived in the house]
- [We met the woman] [living in the house]
- [We met the woman] [ordered to live in the house]
- To-infinitives modifying a noun, e.g.:
- [we had a chance] [to succeed] (these are typically elaboration-attribute, not purpose, but some exceptions truly denote the purpose of the modified noun)
- Prepositional phrases are almost never EDUs (see exceptions below):
- NO segmentation: [We met the woman in charge of the house]
¶ Subject and object clauses
Subject and object clauses are not segmented, with the exception of attribution, in which reported speech or thoughts are more central than the speech verb:
- One segment:
- [That he said so angered Bill] (the act of saying is the main point, no attribution)
- [Bill decided to go] (not a reported speech or thought)
- Multiple segments with attribution:
- [It was wonderful,] [Bill thought] (main point is the evaluation relation, the evaluator is given as attribution)
- [Would you like to come?] [Bill replied] [Yes, I would] (main point is the positive response, 'Bill replied' is just an attribution)
- [I imagine] [it’s the same for faculty]
- Elaborations where the speech verb is more central (opposite of attribution). In the following example, the speech verb "say" is coordinated with another nucleus as part of an instruction on what to do - the speech content is an elaboration-additional satellite:
- [To annoy people in an elevator]purpose-goal [jump up at every floor]joint-list [and say]joint-list ["All aboard!"]elaboration-additional
Using elaboration-additional for speech verb object clauses means that the main rhetorical act is taken to be the report that something was said. If instead the nested rhetorical structure of what was said is the main point, we should use attribution and treat the speech itself as the nucleus, with its own discourse function.
¶ Relative and modifier clauses
Relative and adverbial clauses modifying nouns (using a relative pronoun, zero relative or participial clauses) are all segmented and typically act as elaboration-attribute or purpose-attribute:
- [I saw the girl] [who went to the beach]
- [I liked the apples] [we ate]
- [They wanted the blue kind,] [melting in the sun next to the green]
- [We chose the car] [covered in snow]
To-infinitives or that-clauses modifying a noun, as well as prepositional adnominal clauses, are also segmented and analyzed similarly to relative clauses (by default as purpose-attribute, but other options are possible in context). Note that for prepositional modifiers, the head word of the adnominal clause must be a verb (typically a gerund in -ing):
- [They made the decision] [to go]purpose-attribute
- [They developed a method] [to open bottles]purpose-attribute
- [They developed a method] [for opening bottles]purpose-attribute
Pure prepositional noun modifiers are not segmented. In the following example 'survival' is a noun and therefore not eligible for EDU status:
- [They developed a method for survival]
Infrequently, the adnominal clause can be the nucleus:
- [The conclusion was the fact]organization-preparation [that there were no alternatives]
But note that if the modified noun is only a small part of the main clause, the adnominal clause is usually a satellite:
- [Contributing factors included the weather, the economy, and the fact] [that there was no more time]elaboration-attribute
If a relative or other adnominal clause interrupts a larger EDU, we join both parts of that unit using the same-unit relation, but if such a unit is followed only by the sentence's final punctuation, there is no need for same-unit (i.e. we do not make a segment just for the final period after a sentence-final relative clause).
Trailing non-opening punctuation (i.e. not '(') is attached to the modifier clause if present:
- [The method,]same-unit [adopted in 2005,]elaboration-attribute [ensures fast delivery.]same-unit
Most full clauses coordinated by 'and', 'or' or 'but' are made into independent EDUs. The coordinating conjunction belongs to the second EDU, e.g.:
- [call the local harbor master] [and he will sort you out]
Exceptions to splitting EDUs include:
- The coordination is inside an object clause, which is not eligible to be an EDU:
- [They were ordered to call the harbor master and get sorted out]
- A coordinate VP is the to-infinitive object of another verb:
- [He wanted to go and check]
- The object is shared across both clauses (i.e. VP coordination) and no EDU intervenes, in which case the sentence is considered a single EDU:
- [The harbor master ordered and prepared the boat] (boat is a shared object, no segmentation of coordination)
- But when only the subject is shared, we do segment:
- [The harbor master called] [and sorted us out] (harbor master is subject of both verbs, but no shared objects)
- Exception: when another EDU intervenes between two parts of a VP coordination, we segment despite shared objects, and merge the segmented units with same-unit. This also applies if we have multiple coordinate VPs:
- [The harbor master paid for]same-unit [(although we knew]attribution-positive [he had no money)]adversative-concession [and prepared the boat]same-unit
Elliptical coordinate VPs are segmented (roughly corresponding to gapping constructions or Right-Node-Raising, or cases analyzed as orphan in Universal Dependencies):
- [Mary drank coffee] [and Jane tea]
Note also that modal and auxiliary sharing is allowed and does not prevent EDU segmentation for coordinated infinitives or participles. Examples:
- [Kim should call first] [and then go]
- [Kim has tried] [and succeeded]
The same is not true for verbs taking a to-infinitive complement, such as 'want', as shown in the example above.
¶ Clefts, pseudo clefts and extraposition
Clefts and pseudo clefts are left unsegmented, following RST-DT guidelines:
- [It was John who brought the flowers]
- [What John brought was flowers]
Similarly, extraposed clauses with dummy pronouns are not segmented, because the expletive 'it' can be treated as equivalent to the extraposed clause:
- [It's important to water plants]
In this example, we treat the whole EDU as "To water plants is important". Even though 'important' is evaluative, "to water plants" is equivalent to the subject, and subject clauses are generally not segmented, meaning this is identical to the unsegmented treatment of [Watering plants is important].
¶ Parentheticals and references
Following RST-DT guidelines, parentheticals set apart by parentheses or dashes are segmented, even if they are otherwise syntactically ineligible to be EDUs, such as appositions. Compare:
- [We spent $50]same-unit [(approx. 40 Euros)]restatement-partial [on food]same-unit
- [We spent $50, or approx. 40 Euros, on food]
Syntactically unintegrated citations are segmented, but integrated ones which function as an argument are not:
- [Smith (2000) has shown this convincingly]
- [This has been shown convincingly] [(Smith 2000)]explanation-evidence
Note that parenthetical dates in article citations are not EDUs, but parenthetical dates describing dated events, birth years, etc. are EDUs:
- [We read Smith (2000)]
- [Jane Smith] [(1901-1974)]context-circumstance [was a paleontologist]
In non-academic settings, where dates are not typically provided as part of the reference name of a work, the parenthetical is segmented as usual. This is particularly the case for names of works accompanied by a date, rather than author+year citations in academic contexts. For example:
- [the romantic comedy Imagine Me and You] [(2006)]context-circumstance
The difference is that we cannot say "we read Smith" to unambiguously mean "Smith (2000)": in that sense "(2000)" is really part of the 'name' of the reference "Smith (2000)", whereas "(2006)" is not part of the name of the comedy and is not normally expected, i.e. it is a satellite.
Following RST-DT guidelines:
<blockquote> Text fragments followed by colons are treated as separate EDUs, even when the fragment is a word or phrase, as long as the text that follows the colon provides further elaboration on the topic introduced by the colon </blockquote>
Notice that this does not mean that all colons separate EDUs. Specifically, ":" inside an NP is not an EDU break point:
- [We watched Star Wars: The Empire Strikes Back]
But colons are used as segmentation points when introducing a new idea, elaborating on a previous point, etc.:
- [I got the best ones:] [banana and cherry flavors]elaboration-additional
- [Examples:]organization-preparation [olive oil, butter, ghee]
In accordance with RST-DT guidelines, fragments without a verbal predicate (e.g. prepositional phrases) are segmented as EDUs in the presence of a strong discourse marker. The RST-DT guidelines list the following exhaustive set of markers:
- because of, in spite of, despite, regardless, irrespective, without, according to, as a result of, not only ... but also
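The lexical half of this rule can be sketched in a few lines of Python. The function name `has_strong_marker` and the regex-based matching are illustrative assumptions, not part of any GUM or RST-DT tooling. Whether the fragment really lacks a verbal predicate is still left to the annotator.

```python
import re

# Contiguous strong markers from the RST-DT list above (matched case-insensitively).
STRONG_MARKERS = [
    "because of", "in spite of", "despite", "regardless", "irrespective",
    "without", "according to", "as a result of",
]
# The discontinuous marker "not only ... but also" needs its own pattern.
NOT_ONLY_BUT_ALSO = re.compile(r"\bnot only\b.*\bbut also\b", re.IGNORECASE)


def has_strong_marker(fragment: str) -> bool:
    """Return True if the fragment contains one of the listed strong markers.

    Only the lexical check is done here; deciding that the fragment has no
    verbal predicate (and so needs such a marker to be an EDU) is not.
    """
    for marker in STRONG_MARKERS:
        if re.search(r"\b" + re.escape(marker) + r"\b", fragment, re.IGNORECASE):
            return True
    return bool(NOT_ONLY_BUT_ALSO.search(fragment))


print(has_strong_marker("despite the rain"))        # True
print(has_strong_marker("in charge of the house"))  # False
```

Word boundaries (`\b`) keep a marker such as "without" from matching inside longer words. "not only ... but also" only counts when both halves are present.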
Free relatives, i.e. clauses introduced by a WH word that simultaneously fills a grammatical function in both the matrix clause and the relative clause, are not segmented:
- [you can do whatever you want]
- [I saw who you were talking to]
Note that these can be identified by the non-insertability of a relative pronoun:
- [you can do whatever (*that) you want]
- [I saw who (*which) you were talking to]
Some known constructions which do or do not trigger attribution segments include:
- Passives (not segmented): [It is said that winter is cold]
- But active speech verbs, even with impersonal subjects (segmented): [people say] [that winter is cold]
The verb 'mean' can be attributional in some constructions, but not others:
- [I mean,] [they shouldn't have done it] (source of this opinion indicated by 'I mean')
- [This means there will be more debt] (not a source: 'mean' here expresses an inference)
Based on RST-DT, "figure out" is not an attribution verb, though "understand" is:
- [We figured out that the crisis is coming]
- [We understand] [that a crisis is coming]
In clauses with 'every time' or 'by the time', we segment not at the relative clause boundary, but before the 'time' expression:
- [Every time you do it]contingency-condition [something bad happens]
- [By the time you were done]context-circumstance [it was already too late]
In other words, we do not segment [...time] [you...] and we do segment before 'every time', 'by the time', etc., which is treated the same as 'whenever'/'when', etc. (cf. two instances of by the time/every time in RST-DT). Similarly:
- [the minute you arrived] [they screamed] (NOT: [the minute] [you arrived])
The complex conjunction 'as soon as' is taken to introduce a single temporal EDU (usually context-circumstance), similarly to 'when'. It is NOT segmented into [as soon] [as], but left whole:
- [They left] [as soon as they had finished eating]context-circumstance
¶ Leading 'and' before subordinator
Initial conjunctions before a subordinator are segmented, and same-unit is used to join them to their predicate:
- [and]same-unit [when the rain stopped]context-circumstance [we continued]same-unit
Note that in this example, the 'and' belongs to the verb 'continued', and the circumstance could be dropped, leaving the 'and' in the main clause: "and [...] we continued". The same can happen with other coordinating and subordinating conjunctions ("but [if ...] then we will...").
¶ Fillers 'I mean', 'you know' and 'see'
These are treated as phatic organization expressions (premodifiers or postmodifiers) or sometimes evaluations (usually postmodifiers, see below), and therefore do not constitute attribution clauses, but do constitute EDUs. As a result, they cause a same-unit split when they occur medially:
- [See,]organization-phatic [they needed to go]
- [But]same-unit [you know,]organization-phatic [that was it.]same-unit
¶ Interruptions and repairs
Interrupted clauses form an EDU when:
- They are separate sentences based on sentence segmentation guidelines (e.g. because there is a speaker change)
- They contain a clausal predicate head structure, even if it is incomplete, usually because they contain a verb
Some examples of multiple EDUs:
- [I want to] [I have to go]
- [I saw that --] [because] [you know] [I didn't know]
Some examples of single EDUs:
- [I- I- you don't have to!] (no predicate attached to "I", so it does not amount to an EDU despite the repair)
- [and -- or, actually no] ("and" did not introduce its own predicate, so we have one EDU)
Constructions with 'enough' followed by a to-infinitive are typically segmented:
- [Do it enough] [to make it come off]purpose-goal
- [Yell loud enough] [to make it audible next door]mode-manner
The 'to' clause is often labeled purpose-goal or mode-manner, depending on context. An adnominal adjective + enough + infinitive would be purpose-attribute:
- [They had a fast enough car] [to outrun the police]purpose-attribute
In the last case, outrunning the police is a purpose of the car, but not necessarily a purpose of the entire clause about having the car.
Sentences with 'feel like' in the sense 'think' are segmented as attribution predicates. Note that 'like' belongs to the content EDU, similarly to the guideline for attaching 'that' to the content EDU:
- [I feel] [like it's not always true]
Some exceptional constructions which form conventional full predications are segmented apart from an attribution verb even though they do not contain a verb. For example:
- [I thought,] [oh boy!] (note that "oh boy" is conventionally a full utterance)
- [I said no wonder.] ("no wonder" is the content of an uttered sentence, though it is a verbless construction)
Contrast the above cases with simple nominal objects of the same predicates, which do not result in segmentation:
- [I said some harsh words.]
- [I thought about the accident.]
¶ Relations and Conventions
In the following, W refers to the Writer (or speaker) and R refers to the addressed Reader (or hearer); S refers to a satellite and N refers to a nucleus in each relation (see also Choosing between relations further below for some tests to distinguish confusable relations):
→ if a relation can only point forward by definition; ← if only backward; →← if either forward or backward; ^ for multi-nuclear relations.
- Presentational
- adversative - connects discourse units for which some incompatibility is being highlighted; three subtypes are distinguished:
- adversative-antithesis (→←) - R is meant to prefer N as an alternative to S
- adversative-concession (→←) - R is meant to look past an incompatibility of N with S
- adversative-contrast (^) - W presents multiple Ns as incompatible, but of equal prominence
- context-background (→←) - S provides information to increase R's understanding of N
- explanation - S adds support for N by explaining its utterance; three subtypes are recognized:
- explanation-evidence (→←) - S provides evidence which increases R's belief in N
- explanation-justify (→←) - S increases R's acceptance of W's right to say N
- explanation-motivation (→←) - S is meant to influence R's willingness to act according to N
- organization - S makes R more prepared for the appearance of N; three subtypes are recognized:
- organization-heading (→) - is used specifically when the preparation is expressed in an explicit text organizing device such as a heading
- organization-phatic (→←) - is used when the preparation merely amounts to W holding the floor, without otherwise contributing propositional content to the discourse, including backchanneling, incomplete or repaired/aborted utterances
- organization-preparation (→) - covers all other forms of S units primarily used to signal an upcoming N
- topic - S is presented in order to steer the discourse toward N; two sub-types are distinguished:
- topic-question (→) - N is the answer to the question posed by S
- topic-solutionhood (→←) - N is a solution to a problem presented by S
- Subject matter
- attribution - S informs about the source of information in N; two subtypes are distinguished:
- attribution-positive (→←) - S states a source for the information in N
- attribution-negative (→←) - S states that a potential source is NOT a source of the information in N
- causal - S and N form a cause and result pair; two subtypes are distinguished:
- causal-cause (→←) - S is the cause of N (and N is more prominent)
- causal-result (→←) - S is the result of N (or: N is the cause of S, and N is more prominent)
- context-circumstance (→←) - S details circumstances (often spatio-temporal) under which N applies
- contingency-condition (→←) - N occurs depending on S
- elaboration - S gives additional information about N. Two subtypes are recognized:
- elaboration-attribute (←) - is used when S elaborates on a participant within N, rather than on the entire proposition in N
- elaboration-additional (←) - is used in all other cases, when S is an elaboration on N as a whole
- evaluation-comment (→←) - S provides an assessment of N by W (R does not have to share this assessment)
- mode - S supplies information about how N happens; two subtypes are distinguished:
- mode-manner (→←) - S indicates the manner in which N happens
- mode-means (→←) - S indicates the means by which N happens
- joint - connects multiple, non-contrasting discourse units of distinct context and equal prominence
- joint-disjunction (^) - W presents multiple Ns which can be regarded as interchangeable alternatives
- joint-list (^) - W presents multiple Ns in parallel which can be regarded as additive to one another (both X and Y), are of equal prominence, and do not exhibit a temporal sequence, contrast or alternative status. The elements should serve an equivalent purpose to each other (otherwise, see joint-other)
- joint-sequence (^) - Multiple Ns form a temporally ordered sequence of events
- joint-other (^) - any other collection of unlike discourse units of equal prominence at the same level of hierarchy, but of disparate (non-equivalent) discourse purpose compared to each other
- purpose - S specifies the purpose of N; two subtypes are distinguished:
- purpose-attribute (→←) - is used when S gives the purpose of a participant within N, rather than of the entire proposition in N
- purpose-goal (→←) - the proposition in N as a whole is initiated or exists in order to realize S
- restatement - connects discourse units which are roughly equivalent; two subtypes are distinguished:
- restatement-partial (←) - S partly realizes the same role and content as a previous N
- restatement-repetition (^) - Multiple Ns realize the same role and content
- same-unit (^) - indicates a discontinuous discourse unit (this is not a discourse relation)
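The direction codes above can be read as a small lookup table. The Python sketch below is a hypothetical consistency check, not part of the GUM tooling. It records each relation's code and tests whether a satellite-nucleus pair respects it. Positions are assumed to be the first EDU of each unit in text order, which works because sibling units do not overlap.

```python
# Direction codes as in the list above: "→" satellite precedes nucleus,
# "←" satellite follows nucleus, "→←" either order, "^" multinuclear.
FORWARD = {"organization-heading", "organization-preparation", "topic-question"}
BACKWARD = {"elaboration-attribute", "elaboration-additional", "restatement-partial"}
MULTINUC = {
    "adversative-contrast", "joint-disjunction", "joint-list", "joint-sequence",
    "joint-other", "restatement-repetition", "same-unit",
}
EITHER = {
    "adversative-antithesis", "adversative-concession", "context-background",
    "explanation-evidence", "explanation-justify", "explanation-motivation",
    "organization-phatic", "topic-solutionhood", "attribution-positive",
    "attribution-negative", "causal-cause", "causal-result",
    "context-circumstance", "contingency-condition", "evaluation-comment",
    "mode-manner", "mode-means", "purpose-attribute", "purpose-goal",
}

# Map every relation label to its direction code.
RELATIONS = {}
for names, code in ((FORWARD, "→"), (BACKWARD, "←"), (EITHER, "→←"), (MULTINUC, "^")):
    RELATIONS.update(dict.fromkeys(names, code))


def direction_ok(relation: str, sat: int, nuc: int) -> bool:
    """Check whether a satellite starting at EDU `sat` may attach to a nucleus
    starting at EDU `nuc` under `relation` (KeyError for unknown labels)."""
    code = RELATIONS[relation]
    if code == "^":
        raise ValueError(f"{relation} is multinuclear and has no satellite")
    if code == "→":
        return sat < nuc
    if code == "←":
        return sat > nuc
    return sat != nuc


print(direction_ok("organization-heading", 1, 2))   # True: heading before section
print(direction_ok("elaboration-attribute", 1, 2))  # False: elaborations point backward
```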
- Each document should form a complete 'tree' in the sense that there are no separate groups of segments or 'islands' that are not linked by relations.
- There should be a single top-level span or multinuclear node spanning the entire text, i.e. with a unit index 1-N, where N is the length of the document in EDUs.
- If several different topics are discussed which form encapsulated 'islands' with no relations between them, then by convention these will all be joined at the end of the annotation process using a multinuclear group dominating the islands with the joint-other relation.
Some typical scenarios of how 'islands' are formed and grouped include:
- The series of questions & answers (QAs) in an interview. If no specific rhetorical progression is found between multiple QAs, then each pair may form its own island, and these are connected by a joint-other (and not by the more explicitly coordinated joint-list relation). Items outside the QA sequence may join the tree at a higher level (e.g. headings as organization-preparation for the entire interview, or an introductory paragraph giving background about the speakers).
- Note that although the questions in an interview appear in sequence, they are not labeled as a sequence either, unless the answers themselves form a chronological succession (answer 1: "first I did X", answer 2: "later we decided to do Y"). Generally the collection of answers simply forms a joint. The figure below gives an example of this structure.
- The subsections of a travel guide can form islands that should be joined by a joint-other. Typically sections like ‘getting there’ and ‘understand’ are autonomous and are analyzed internally, then joined at the top with the rest of the article, though the main heading may precede the entire joint and modify it (see below).
- Lists of ingredients, destinations and other enumerations are instead joined by a joint-list relation.
- In how-to guides, the subsections (preparations, tips, warnings), or different methods (method 1, method 2…) are often separate and can be analyzed internally, then connected by a joint-other (for different subsections), joint-list (for a list of methods with equivalent discourse function), or a joint-sequence unifying all chronologically ordered steps in a method, and then a higher joint-list unifying the methods.
- The main progression of biographies often forms a joint-sequence (e.g. an Early Life section followed by Career).
- Sections in an academic paper at the same level often form islands joined by joint-other (notice that they are not equivalent or parallel, and therefore not a joint-list). If there is an abstract, often the joint-other of main sections can be seen as an elaboration on the abstract block.
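The first two conventions (no islands, one top node covering EDUs 1-N) can be checked mechanically. The sketch below assumes a hypothetical representation in which every node maps to its parent in the primary tree: EDUs are the integers 1..N, and spans or multinucs are string IDs. eRST secondary relations are tree-breaking and are deliberately left out. Each parentless node is the top of one island.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple, Union

NodeId = Union[int, str]  # ints are EDUs 1..N; strings are spans or multinucs


def check_single_tree(
    parent: Dict[NodeId, Optional[NodeId]], num_edus: int
) -> Tuple[bool, List[NodeId]]:
    """Return (ok, tops). ok is True if there is exactly one top node and it
    dominates every EDU 1..num_edus; tops lists all parentless nodes (islands)."""
    tops = [node for node, par in parent.items() if par is None]
    if len(tops) != 1:
        return False, tops
    # Invert the parent map so the tree can be walked downward.
    children = defaultdict(list)
    for node, par in parent.items():
        if par is not None:
            children[par].append(node)
    # Collect every EDU reachable from the single top node.
    covered, stack = set(), [tops[0]]
    while stack:
        node = stack.pop()
        if isinstance(node, int):
            covered.add(node)
        stack.extend(children[node])
    return covered == set(range(1, num_edus + 1)), tops


# Two islands: span "s1" over EDUs 1-2, and EDU 3 on its own.
islands = {1: "s1", 2: "s1", "s1": None, 3: None}
print(check_single_tree(islands, 3))  # (False, ['s1', 3])

# Joined under a joint-other multinuc "m1", following the convention above.
joined = {1: "s1", 2: "s1", "s1": "m1", 3: "m1", "m1": None}
print(check_single_tree(joined, 3))   # (True, ['m1'])
```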
A satellite that itself has a satellite is first grouped with that lower satellite under a span, and only then does the group modify something else. In the example below, if we think that 33 is an elaboration of 32, and 32 is an elaboration of 30, then by extension 33 is also part of the complex elaboration of 30. This means that 32+33 need to be grouped together first under a span, and then form the higher elaboration. The bad example at the top is a case of what we call 'chaining' (a flat sequence of arrows). Relative clause elaborations are also usually grouped with their main clause into a span (30-31) before the span is modified.
When a single EDU is interrupted, for example by a relative clause, a same-unit multinuclear relation is created to contain the embedded unit. That embedded unit is attached based on syntactic criteria to the part of the same-unit which has its head (for a relative clause: attach the relative clause EDU to the part that contains the noun being modified).
If a same-unit construct has a modifier which applies to the entire interrupted EDU, then it is attached above the same-unit, not inside. For example in the image below, the purpose EDU is attached to the entire same-unit, not to part 2 (the sub-unit on the right), since it would have modified the entire, single EDU if it hadn't needed the same-unit split:
¶ Handling questions
- Questions are typically seen as satellites to their respective answers and are linked using the topic-question relation.
- When connecting individual sections or QA pairs to the main joint-other of an article, a span should be used above the entire QA/section subtree to make it clear that the entire subtree is a member of the joint. Do not link just the main segment to the joint directly if there are other segments in the subtree.
Fillers like 'you know' or 'I mean' receive their own EDUs based on the clause-to-predicate mapping in RST, but they are generally close to empty in content and are always satellites. When used before a main predicate (including medially inside a same-unit), they are seen as preparations:
- [I mean,]organization-phatic [I had to do it] (here 'I mean' merely leads the hearer to expect some statement)
When used as postmodifiers, they are still analyzed as phatic, conveying a very weak sense of the speaker assessing the nucleus as being understandable or obvious. Note that this is the only context in which an organization-type relation can point backwards:
- [There was no other option,] [you know.]organization-phatic
If the filler has the form of a question but does not solicit an answer, it is still phatic. If it is a genuine question, however, "you know" can be an attribution:
- [There was no other option,] [you know?]organization-phatic
- ([There was no other option,] <==attribution-positive [you know right?]) topic-question==> [I know!]
If there are multiple identical fillers, they may form restatements or joints as appropriate, and the new multinuc will serve as a preparation etc.:
- [I mean,]restatement-repetition [I mean,]restatement-repetition ...
- [I mean,]joint-other [you know,]joint-other ...
Note that we use joint-other and not joint-list for multiple fillers, since their content is not additive (as a test, consider that you cannot insert a coordinating conjunction such as "??I mean and you know ...").
In cases where multiple speakers use back-channeling to respond to each other, we assume a hierarchical analysis, in which each back channel EDU refers back to the block containing its predecessor. For example here, 175 and 177 are uttered by the same speaker, and 176 by a different one, so we assume each EDU scopes over the previous block:
If a repaired EDU (usually headed by a syntactic reparandum relation) has its own discernible function, it is handled based on general guidelines. Some special cases include:
- The repaired unit is sufficiently realized to carry out the same function as its repair. In these cases use restatement-repetition: [We went there last Mon-] [we went there Monday last week.]
- The repaired unit is deficient, in which case it is seen as an organization-phatic for what was finally said: [I wanted to-]organization-phatic [I wanted to thank you]
- Ideally a satellite and nucleus should form a group covered by a span, and one nucleus should not have more than one incoming satellite. For example, if a satellite provides background to a nucleus which also has an elaboration, it may make more sense to see the background as modifying the entire span of nucleus and elaboration (since background is given for the benefit of both the other EDUs).
- In cases of two equal satellites to the same nucleus with the same function, a joint-other, joint-list or restatement-repetition multinuc can be used as appropriate. This is the preferred structure if both satellites are seen to provide a similar or closely related contribution. For example, the structure on the left is preferred to the structure on the right below, because both satellites give the same elaborating information, namely specifying the members of an organization. If the satellites have different functions or give rather different details, then they can be attached directly to the nucleus without forming a multinuc first.
You should not have spans that have no incoming connections except for another span or multinuc above them. Spans are there to group elements, so that they can have an incoming or outgoing connection relating to some other node. In particular, EDUs should not have a span containing only themselves.
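Two of these span conventions lend themselves to a simple lint. The sketch assumes a hypothetical `Span` record holding one nucleus and the satellites attached to it; it is not a GUM data format. It flags spans that group only a single node, and nuclei receiving several satellites with the same relation, which the guidelines above suggest grouping under a multinuc first. Satellites with different functions are not flagged, since they may attach directly.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Span:
    """Hypothetical span record: one nucleus plus the satellites attached to it."""
    nucleus: str
    satellites: List[Tuple[str, str]] = field(default_factory=list)  # (node id, relation)


def lint_spans(spans: Dict[str, Span]) -> List[str]:
    """Flag single-node spans and repeated same-relation satellites on one nucleus."""
    warnings = []
    for span_id, span in spans.items():
        if not span.satellites:
            warnings.append(f"{span_id}: groups only {span.nucleus}; remove the span")
        # Count satellites per relation label on this nucleus.
        for rel, n in Counter(rel for _, rel in span.satellites).items():
            if n > 1:
                warnings.append(
                    f"{span_id}: {n} {rel} satellites on {span.nucleus}; consider a multinuc"
                )
    return warnings


spans = {
    "s1": Span("7"),  # a span containing only EDU 7
    "s2": Span("30", [("31", "elaboration-attribute"), ("33", "context-background")]),
    "s3": Span("40", [("41", "elaboration-additional"), ("42", "elaboration-additional")]),
}
for warning in lint_spans(spans):
    print(warning)
```

Here `s2` passes because its two satellites have different functions, while `s1` and `s3` are reported.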
¶ Headings, dates, images and captions
- Headings are typically seen as an organization-heading for the following group of segments comprising the section under the heading. This is especially true if the heading contains no information that is not repeated in the section. The ‘organization’ relation should target an added span covering the entire section, and not just the head segment of the section (see image below).
- In some cases, the heading contains the main gist of a (usually short) section, and the section itself may be seen as an elaboration-additional of the heading.
- Images themselves do not form RST segments; however, when their captions are part of the text, the combined effect of the image and caption may be taken into consideration. Most often (or when in doubt), an image and its caption will provide context-background for the subsequent text, but under some circumstances a caption and the related image may provide explanation-evidence, serve as an elaboration-additional, or in rare cases stand in other relations.
- The words 'Figure X' or 'Table X' are also annotated as an organization-preparation if they form their own segment, or as an organization-heading if they are graphically set apart.
- If there is a heading ‘organization-heading’ followed by a ‘context-background’ image and caption at the beginning of a section, typically the caption (standing in for the image as well) is seen as giving background to the entire section, and the heading is a preparation for the group of segments containing both the section and the background caption.
- If there is a secondary caption or a caption-internal segment giving attribution, such as the photographer’s name or the name of a person quoted in a block quote, these may be seen as an ‘attribution’ of the primary caption segment. The words "image:", "photo:" or similar are often preparations for this attribution
(e.g. [image of a magician] <-attribution-positive-- ([photo:] --organization-preparation-> [Paul Budd]))
- If there is a segment detailing the date (e.g. for a news item or interview), and the date applies to the entire text, it may be seen as a ‘context-circumstance’ to the entire text. If the date is qualifying a more specific sub-part of the document it may be attached accordingly, again using the ‘context-circumstance’ relation.
- If a main heading is reiterated in the text, this is not generally seen as a ‘restatement’, but rather the heading is seen as ‘organization-preparation’ for the section in the interest of consistency.
In academic articles, the paper is often preceded by contact details for the authors, such as affiliations and e-mail addresses. These can be seen as ‘attribution-positive’ information for the entire article, and usually attach to the top-level node unifying all subsequent nodes.
- If multiple addresses have separate segments, they can be joined via ‘joint-list’.
- The title of the article, which usually precedes the addresses, is generally attached using the ‘organization-preparation’ function as usual, pointing to a higher span above the article and addresses (see the image below).
Some discourse markers are ambiguous or behave in ways which are initially hard to interpret. The following guidelines help with some common dilemmas.
When used with past tense predicates, 'until' is often temporal and therefore circumstantial:
- [They lived on the island] [until the great hurricane came]joint-sequence
But with non-past tense, it often marks contingency-condition, for example:
- [freeze it] [until you need it]
This is equivalent to:
- [keep it frozen] [unless you need it]
- [unfreeze it] [if/when you need it]
'Unless' is generally seen as signaling a negative conditional:
- [do it] [unless they object]contingency-condition
This is similar to:
- [do it] [if they don't object]contingency-condition
'Instead' often indicates adversative-antithesis, but can appear either in the antithesis satellite itself or in the nucleus:
- [Don't go alone,]adversative-antithesis [bring someone with you instead.]
- [Instead of going alone,]adversative-antithesis [bring someone with you.]
'Rather' and 'rather than' work very similarly to 'instead', and can also indicate adversative-antithesis in either the satellite or the nucleus.
'Depending on' clauses are often interpreted as conditionals:
- [Depending on the weather]contingency-condition [you may be able to go out]
This construction is conditional, as it corresponds roughly to "if the weather is a certain way...".
¶ Acronyms and translations
If an acronym for an expression within a sentence is specified in parentheses, it is considered a satellite restatement-partial rather than a multinuclear restatement (since the parentheses only repeat part of the main sentence):
- [He was caught by the Central Intelligence Agency] [(CIA)]restatement-partial
However, if the parenthetical appears before the acronym it explains, it is interpreted as context-background, since a satellite restatement cannot precede the restated content:
- [We used the]same-unit [(Light Emitting Diode)]context-background [LED bulbs]same-unit (Note: context-background points left to right here, since the parenthetical explains what LED means on the right)
Translations can be analyzed in the same way. If a language is specified before a colon, the language label is segmented following the EDU segmentation guidelines and can be considered an organization-preparation (but only if there is a colon). Compare:
- [She was born in Gdansk] [(German:]organization-preparation [Danzig)]restatement-partial
- [She was born in Gdansk] [(In German, Danzig)]restatement-partial
When deciding between two similar relations for sentences without an explicit connective like 'because' or 'if', it can help to try inserting a connective or phrase to disambiguate. Useful phrases include:
- 'because' - if you can insert 'because' between clauses, you often have a causal-cause or causal-result relationship
- 'the reason I say this...' - if you can insert this, it can indicate explanation-justify
- 'what you need to know about this...' - can indicate context-background
- 'proof of this is...' - can indicate explanation-evidence
- 'or' and 'alternatively' can indicate joint-disjunction
Some examples:
- "[IE's market share has dropped to 56%.] [Mozilla's Firefox has been actively increasing its market share]" - in this example, it's easy to insert 'because', and the relationship is causal-cause.
- If this were explanation-justify, we could say "the reason I say this is that Mozilla... "
- If it were context-background, we could say "what you need to know about this is that Mozilla..." which is also more forced
- "[York is a fairly small city -] [four days is enough to see the major sights]" - in this example, we can add "proof of this is..." between the two units, and the relation is explanation-evidence
- If it were causal-cause we could say "the city is small because four days is enough..." - but actually it is not small because of this fact
- If it were context-background, it would be as natural or more natural to say "York is small. What you need to know about this is that four days are enough..."
Comparative correlatives are interpreted as conditional constructions:
- [The more you know about your audience]contingency-condition [the better your jokes will be]
¶ Academic citations and references
References forming an EDU (i.e. non-syntactically integrated, see segmentation guidelines) typically function as evidence:
- [This has been shown in a previous study] [ [20] ]explanation-evidence
Parenthetical currency and other measurement unit conversions are taken to be restatements. If the parent EDU contains more than just the unit term, the restatement is nucleus-satellite (with the conversion as the satellite); otherwise it is multinuclear:
- [$50]restatement-repetition [(40 Euros)]restatement-repetition (multinuclear)
- [We paid $50] [(40 Euros)]restatement-partial (satellite)
Tag questions, including negative tag questions, are interpreted as restatements (and not as contrast), since they presuppose and re-assert the initial statement:
- [You want it,] [don't you?]restatement-partial
- [You want this one,] [is that right?]restatement-partial
They are usually nucleus-satellite, since the tag question conveys less explicit information than the initial statement, though it is possible to have multinuclear constructions when the tag question expresses the full content of the initial statement.
Date EDUs in parentheses can be context-circumstance if they specify the date when something happened:
- [The siege led to the starvation of the city] [(CE 410)]context-circumstance
Even in cases where a single NP is elaborated on, we prefer the context-circumstance relation to an embedded elaboration-attribute relation, since the parenthetical is effectively extra-syntactic and is therefore treated in the same way as a separate sentence (see the similar treatment of parenthetical restatement-partial of an NP):
- [She then married Jack Smith] [(1834-1881)]context-circumstance