Determiner-Established Deixis to Communicative Artifacts in Pedagogical Text
This page contains links and documentation for the datasets mentioned in the paper.
Word Senses
Here is a CSV file containing the raw annotations from both annotators. Each row represents a unique synset/gloss, and the columns contain the following:
- Column 1: label ascribed by annotator #1 ('y' for communicative artifacts; else 'n')
- Column 2: label ascribed by annotator #2 (same marking scheme)
- Column 3: word from candidate phrases in the text that brought the synset into consideration
- Column 4: synset name
- Column 5: synset gloss
The 200 rows in the above file represent the VCS set described in the paper.
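If you want to sanity-check the raw annotations, here is a rough Python sketch of loading the file and computing inter-annotator agreement. The filename is a placeholder, and the sketch assumes the CSV has no header row; neither is guaranteed by the file itself.

```python
import csv

from sklearn.metrics import cohen_kappa_score

# Placeholder filename; assumes no header row.
with open("word_senses_raw.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))

annotator1 = [row[0] for row in rows]  # column 1: 'y'/'n' from annotator #1
annotator2 = [row[1] for row in rows]  # column 2: 'y'/'n' from annotator #2

agreement = sum(a == b for a, b in zip(annotator1, annotator2)) / len(rows)
kappa = cohen_kappa_score(annotator1, annotator2)
print(f"{len(rows)} synsets, raw agreement {agreement:.2%}, Cohen's kappa {kappa:.3f}")
```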
Here is a CSV file containing the final annotations, with disagreements resolved. Each row represents a unique synset/gloss, and the columns contain the following:
- Column 1: label ('y' for communicative artifacts; else 'n')
- Column 2: word from candidate phrases in the text that brought the synset into consideration
- Column 3: synset name
- Column 4: synset gloss
The 62 'y' rows in the above file represent the CCS set described in the paper.
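As a quick illustration, a minimal sketch of recovering the CCS set from the final annotations is below. Again, the filename is a placeholder and the lack of a header row is an assumption.

```python
import csv

# Placeholder filename; assumes no header row.
with open("word_senses_final.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))

# Column 1 = label, column 3 = synset name (per the column list above).
ccs = [row[2] for row in rows if row[0] == "y"]
print(f"{len(ccs)} communicative-artifact synsets")  # expected: 62
```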
Candidate Instances
Here is a zip file containing a CSV file for each of the 122 Wikibooks included in the paper's analysis. Each row represents a match for the sought dependency patterns (i.e., a candidate instance of communicative deixis), and the columns contain the following:
- Column 1: sentence number in the text (useful only if you wish to parse the entire document)
- Column 2: one-indexed word position in the sentence of the start of the candidate instance
- Column 3: one-indexed word position in the sentence of the end of the candidate instance
- Column 4: determiner in the candidate instance
- Column 5: noun in the candidate instance
- Column 6: plain text of the sentence containing the candidate instance
- Column 7: phrase-structure parse of the sentence containing the candidate instance (note that candidate instances were identified with dependency parsing, not with this phrase-structure parse)
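Here is a rough sketch of iterating over the per-Wikibook CSVs and tallying determiner/noun pairs. The directory name is a placeholder, and the sketch assumes each file has exactly the seven columns listed above and no header row.

```python
import csv
from collections import Counter
from pathlib import Path

# Placeholder directory name; assumes seven columns per row and no header.
determiner_noun_pairs = Counter()
for path in sorted(Path("candidate_instances").glob("*.csv")):
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            sent_no, start, end, det, noun, sentence, parse = row
            determiner_noun_pairs[(det.lower(), noun.lower())] += 1

for (det, noun), count in determiner_noun_pairs.most_common(10):
    print(f"{det} {noun}: {count}")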
Wikibooks
Finally, here is a zip file containing the HTML files for the Wikibooks in the paper's analysis, along with the Markdown files generated from them. Due to their size, the CoreNLP results for the Wikibooks are available only by request; regenerating them yourself may be faster.
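If you do regenerate the CoreNLP output, a rough sketch using the stanza CoreNLP client is below. The annotator list and the input filename are assumptions for illustration, not necessarily the configuration used for the paper, and the sketch assumes a local CoreNLP installation with CORENLP_HOME set.

```python
from stanza.server import CoreNLPClient

# Placeholder filename; assumes the plain text of one Wikibook has been extracted.
text = open("some_wikibook.txt", encoding="utf-8").read()

# Annotator list is an assumption, not the paper's exact configuration.
with CoreNLPClient(annotators=["tokenize", "ssplit", "pos", "parse", "depparse"],
                   timeout=60000, memory="4G") as client:
    ann = client.annotate(text)
    for sentence in ann.sentence:
        print(len(sentence.token), "tokens")
```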
Read more about me or find my contact information here.