Determiner-Established Deixis to Communicative Artifacts in Pedagogical Text
This page contains links and documentation for the datasets mentioned in the paper.
Word Senses
Here is a CSV file containing the raw annotations from both annotators. Each row represents a unique synset/gloss, and the columns contain the following:
- Column 1: label ascribed by annotator #1 ('y' for communicative artifacts; else 'n')
- Column 2: label ascribed by annotator #2 (same marking scheme)
- Column 3: word from candidate phrases in the text that brought the synset into consideration
- Column 4: synset name
- Column 5: synset gloss
The 200 rows in the above file represent the VCS set described in the paper.
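If you want to sanity-check the raw annotations, here is a rough Python sketch of loading the file and computing inter-annotator agreement. The filename is a placeholder, and the sketch assumes the CSV has no header row; neither is guaranteed by the file itself.

```python
import csv

from sklearn.metrics import cohen_kappa_score

# Placeholder filename; assumes no header row.
with open("word_senses_raw.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))

annotator1 = [row[0] for row in rows]  # column 1: 'y'/'n' from annotator #1
annotator2 = [row[1] for row in rows]  # column 2: 'y'/'n' from annotator #2

agreement = sum(a == b for a, b in zip(annotator1, annotator2)) / len(rows)
kappa = cohen_kappa_score(annotator1, annotator2)
print(f"{len(rows)} synsets, raw agreement {agreement:.2%}, Cohen's kappa {kappa:.3f}")
```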
Here is a CSV file containing the final annotations, with disagreements resolved. Each row represents a unique synset/gloss, and the columns contain the following:
- Column 1: label ('y' for communicative artifacts; else 'n')
- Column 2: word from candidate phrases in the text that brought the synset into consideration
- Column 3: synset name
- Column 4: synset gloss
The 62 'y' rows in the above file represent the CCS set described in the paper.
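As a quick illustration, a minimal sketch of recovering the CCS set from the final annotations is below. Again, the filename is a placeholder and the lack of a header row is an assumption.

```python
import csv

# Placeholder filename; assumes no header row.
with open("word_senses_final.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))

# Column 1 = label, column 3 = synset name (per the column list above).
ccs = [row[2] for row in rows if row[0] == "y"]
print(f"{len(ccs)} communicative-artifact synsets")  # expected: 62
```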
Candidate Instances
Here is a zip file containing a CSV file for each of the 122 Wikibooks included in the paper's analysis. Each row represents a match for the sought dependency patterns (i.e., a candidate instance of communicative deixis), and the columns contain the following:
- Column 1: sentence number in the text (useful only if you wish to parse the entire document)
- Column 2: one-indexed word position in the sentence of the start of the candidate instance
- Column 3: one-indexed word position in the sentence of the end of the candidate instance
- Column 4: determiner in the candidate instance
- Column 5: noun in the candidate instance
- Column 6: plain text of the sentence containing the candidate instance
- Column 7: phrase-structure parse of the sentence containing the candidate instance (note that candidate instances were identified with dependency parsing, not with this phrase-structure parse)
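Here is a rough sketch of iterating over the per-Wikibook CSVs and tallying determiner/noun pairs. The directory name is a placeholder, and the sketch assumes each file has exactly the seven columns listed above and no header row.

```python
import csv
from collections import Counter
from pathlib import Path

# Placeholder directory name; assumes seven columns per row and no header.
determiner_noun_pairs = Counter()
for path in sorted(Path("candidate_instances").glob("*.csv")):
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            sent_no, start, end, det, noun, sentence, parse = row
            determiner_noun_pairs[(det.lower(), noun.lower())] += 1

for (det, noun), count in determiner_noun_pairs.most_common(10):
    print(f"{det} {noun}: {count}")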
Wikibooks
Finally, here is a zip file containing the HTML files for the Wikibooks in the paper's analysis, along with the Markdown files generated from them. Due to their size, the CoreNLP results for the Wikibooks are available only by request; regenerating them yourself may be faster.
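If you do regenerate the CoreNLP output, a rough sketch using the stanza CoreNLP client is below. The annotator list and the input filename are assumptions for illustration, not necessarily the configuration used for the paper, and the sketch assumes a local CoreNLP installation with CORENLP_HOME set.

```python
from stanza.server import CoreNLPClient

# Placeholder filename; assumes the plain text of one Wikibook has been extracted.
text = open("some_wikibook.txt", encoding="utf-8").read()

# Annotator list is an assumption, not the paper's exact configuration.
with CoreNLPClient(annotators=["tokenize", "ssplit", "pos", "parse", "depparse"],
                   timeout=60000, memory="4G") as client:
    ann = client.annotate(text)
    for sentence in ann.sentence:
        print(len(sentence.token), "tokens")
```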
Read more about me or find my contact information here.