Publications


This list also appears in my curriculum vitae. Papers with future publication dates have been accepted and will appear in their respective venues.

Peer-Reviewed Conference Proceedings

Understanding How to Inform Blind and Low-Vision Users about Data Privacy through Privacy Question Answering Assistants. Yuanyuan Feng, Abhilasha Ravichander, Yaxing Yao, Shikun Zhang, Rex Chen, Shomir Wilson, and Norman Sadeh. In Proceedings of the 33rd USENIX Security Symposium (USENIX), 2024.

The Sentiment Problem: A Critical Survey towards Deconstructing Sentiment Analysis. Pranav Narayanan Venkit, Mukund Srinath, Sanjana Gautam, Saranya Venkatraman, Vipul Gupta, Rebecca J. Passonneau, and Shomir Wilson. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023. Outstanding Paper Award.

Privacy Lost and Found: An Investigation at Scale of Web Privacy Policy Availability. Mukund Srinath, Soundarya Nurani Sundareswara, Pranav Venkit, C. Lee Giles, and Shomir Wilson. In Proceedings of the 23rd ACM Symposium on Document Engineering (DocEng), 2023. Best Student Paper Award.

Privacy Now or Never: Large-Scale Extraction and Analysis of Dates in Privacy Policy Text. Mukund Srinath, Lee Matheson, Pranav Venkit, Gabriela Zanfir-Fortuna, Florian Schaub, C. Lee Giles and Shomir Wilson. In Proceedings of the 23rd ACM Symposium on Document Engineering (DocEng), 2023.

Unmasking Nationality Bias: A Study of Human Perception of Nationalities in AI-Generated Articles. Pranav Narayanan Venkit, Sanjana Gautam, Ruchi Panchanadikar, Ting-Hao Huang, and Shomir Wilson. In Proceedings of the Sixth AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES), 2023.

Creation and Analysis of a Corpus of Scam Emails Targeting Universities. Grace Ciambrone and Shomir Wilson. In Companion Proceedings of the ACM Web Conference (WebConf), 2023. [data]

Nationality Bias in Text Generation. Pranav Narayanan Venkit, Sanjana Gautam, Ruchi Panchanadikar, Ting-Hao Kenneth Huang, and Shomir Wilson. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023.

A Study of Implicit Language Model Bias Against People with Disabilities. Pranav Venkit, Mukund Srinath, and Shomir Wilson. In Proceedings of the 29th International Conference on Computational Linguistics (COLING), 2022.

STAPI: An Automatic Scraper for Extracting Iterative Title-Text Structure from Web Documents. Nan Zhang, Shomir Wilson, and Prasenjit Mitra. In Proceedings of the 13th Language Resources and Evaluation Conference (LREC), 2022.

A Tale of Two Regulatory Regimes: Creation and Analysis of a Bilingual Privacy Policy Corpus. Siddhant Arora, Henry Hosseini, Christine Utz, Vinayshekhar Bannihatti Kumar, Tristan O. Dhellemmes, Abhilasha Ravichander, Peter Story, Jasmine Mangat, Rex Chen, Martin Degeling, Thomas Norton, Thomas Hupperich, Shomir Wilson, and Norman Sadeh. In Proceedings of the 13th Language Resources and Evaluation Conference (LREC), 2022.

Automated Detection of Doxing on Twitter. Younes Karimi, Anna Squicciarini, and Shomir Wilson. In Proceedings of the 25th ACM Conference on Computer-Supported Cooperative Work And Social Computing (CSCW), 2022.

Privacy at Scale: Introducing the PrivaSeer Corpus of Web Privacy Policies. Mukund Srinath, Shomir Wilson, and C. Lee Giles. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (ACL), 2021. [data]

Breaking Down Walls of Text: How Can NLP Benefit Consumer Privacy? Abhilasha Ravichander, Alan W Black, Thomas Norton, Shomir Wilson, and Norman Sadeh. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (ACL), 2021.

PrivaSeer: A Privacy Policy Search Engine. Mukund Srinath, Soundarya Nurani Sundareswara, C. Lee Giles, and Shomir Wilson. In Proceedings of the 21st International Conference on Web Engineering (ICWE), 2021.

A Large-Scale Exploration of Terms of Service Documents on the Web. Soundarya Nurani Sundareswara, Mukund Srinath, Shomir Wilson and C. Lee Giles. In Proceedings of the 21st ACM Symposium on Document Engineering (DocEng), 2021.

From Prescription to Description: Mapping the GDPR to a Privacy Policy Corpus Annotation Scheme. Ellen Poplavska, Thomas B. Norton, Shomir Wilson, and Norman Sadeh. In Proceedings of the 33rd International Conference on Legal Knowledge and Information Systems (JURIX), December 9-11, 2020. [data]

Finding a Choice in a Haystack: Automatic Extraction of Opt-Out Statements from Privacy Policy Text. Vinayshekhar Bannihatti Kumar, Roger Iyengar, Namita Nisal, Yuanyuan Feng, Hana Habib, Peter Story, Sushain Cherivirala, Margaret Hagan, Lorrie Cranor, Shomir Wilson, Florian Schaub and Norman Sadeh. In Proceedings of The Web Conference (WebConf), April 20-24, 2020. [software]

Question Answering for Privacy Policies: Combining Computational and Legal Perspectives. Abhilasha Ravichander, Alan Black, Shomir Wilson, Thomas Norton, and Norman Sadeh. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP), November 3-7, 2019.

Vaccine: Obfuscating Access Pattern Against File-Injection Attacks. Hao Liu, Boyang Wang, Nan Niu, Shomir Wilson, and Xuetao Wei. In Proceedings of the IEEE Conference on Communications and Network Security (CNS), June 10-12, 2019.

Supervised and Unsupervised Methods for Robust Separation of Section Titles and Prose Text in Web Documents. Abhijith Athreya Mysore Gopinath, Shomir Wilson, and Norman Sadeh. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Brussels, Belgium, 2018. [data and code]

Identifying the provision of choices in privacy policy text. Kanthashree Sathyendra, Shomir Wilson, Florian Schaub, Norman Sadeh, and Sebastian Zimmeck. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Copenhagen, Denmark, 2017. [data]

Automated analysis of privacy requirements for mobile apps. Sebastian Zimmeck, Ziqi Wang, Lieyong Zou, Roger Iyengar, Bin Liu, Florian Schaub, Shomir Wilson, Norman Sadeh, Steven M. Bellovin, and Joel Reidenberg. In Proceedings of the Network and Distributed System Security Symposium (NDSS), San Diego, California, March 2017.

The creation and analysis of a website privacy policy corpus. Shomir Wilson, Florian Schaub, Aswarth Abhilash Dara, Frederick Liu, Sushain Cherivirala, Pedro Giovanni Leon, Mads Schaarup Andersen, Sebastian Zimmeck, Kanthashree Mysore Sathyendra, N. Cameron Russell, Thomas B. Norton, Eduard Hovy, Joel Reidenberg, and Norman Sadeh. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL), Berlin, Germany, August 2016. [data]

Crowdsourcing annotations of websites' privacy policies: Can it really work? Shomir Wilson, Florian Schaub, Rohan Ramanath, Norman Sadeh, Fei Liu, Noah Smith and Frederick Liu. In Proceedings of the 25th International World Wide Web Conference (WWW), Montréal, Canada, April 2016. Best Paper Finalist. [corrigendum]

This table is different: A WordNet-based approach to identifying references to document entities. Shomir Wilson, Alan W Black, and Jon Oberlander. In Proceedings of The 8th International Global WordNet Conference (GWC), Bucharest, Romania, January 2016. [data]

Identifying relevant text fragments to help crowdsource privacy policy annotations. Rohan Ramanath, Florian Schaub, Shomir Wilson, Fei Liu, Norman Sadeh, and Noah Smith. In Proceedings of the Second AAAI Conference on Human Computation and Crowdsourcing (HCOMP), works-in-progress track, Pittsburgh, PA, November 2014.

Determiner-established deixis to communicative artifacts in pedagogical text. Shomir Wilson and Jon Oberlander. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL), Baltimore, USA, June 23-25, 2014. [data]

Toward automatic processing of English metalanguage. Shomir Wilson. In Proceedings of the 6th International Joint Conference on Natural Language Processing (IJCNLP), Nagoya, Japan, October 14-18, 2013.

Privacy manipulation and acclimation in a location sharing application. Shomir Wilson, Justin Cranshaw, Norman Sadeh, Alessandro Acquisti, Lorrie Cranor, Jay Springfield, Sae Young Jeong, and Arun Balasubramanian. In Proceedings of the ACM International Joint Conference on Pervasive and Ubiquitous Computing (Ubicomp), Zurich, Switzerland, September 8-12, 2013.

Tweets are forever: A large-scale quantitative analysis of deleted tweets. Hazim Almuhimedi, Shomir Wilson, Bin Liu, Norman Sadeh, and Alessandro Acquisti. In Proceedings of the 2013 ACM Conference on Computer Supported Cooperative Work (CSCW), San Antonio, TX, February 23-27, 2013.

The creation of a corpus of English metalanguage. Shomir Wilson. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL), Jeju, South Korea, July 9-11, 2012. [slides] [corpus]

Application of MCL in a dialog agent. Darsana Josyula, Scott Fults, Michael L. Anderson, Shomir Wilson, and Don Perlis. In Papers from the Third Language and Technology Conference (LTC), 2007.

Peer-Reviewed Journal Articles

Online Self-Disclosure, Social Support, and User Engagement During the COVID-19 Pandemic. Jooyoung Lee, Sarah Rajtmajer, Eesha Srivatsavaya, and Shomir Wilson. ACM Transactions on Social Computing (TSC), 2023.

Researchers' Experiences in Analyzing Privacy Policies: Challenges and Opportunities. Abraham Mhaidli, Selin Fidan, An Doan, Gina Herakovic, Mukund Srinath, Lee Matheson, Shomir Wilson, and Florian Schaub. In Proceedings on Privacy Enhancing Technologies (PoPETs), 2023. Best Student Paper Award.

Digital Inequality Through the Lens of Self-Disclosure. Jooyoung Lee, Sarah Rajtmajer, Eesha Srivatsavaya, and Shomir Wilson. In Proceedings on Privacy Enhancing Technologies (PoPETs), 2021.

Analyzing privacy policies at scale: From crowdsourcing to automated annotations. Shomir Wilson, Florian Schaub, Frederick Liu, Kanthashree Mysore Sathyendra, Sebastian Zimmeck, Rohan Ramanath, Fei Liu, Norman Sadeh, and Noah Smith. In ACM Transactions on the Web (TWEB) 13(1), December 2018.

PrivOnto: A semantic framework for the analysis of privacy policies. Alessandro Oltramari, Dhivya Piraviperumal, Florian Schaub, Shomir Wilson, Sushain Cherivirala, Thomas B. Norton, N. Cameron Russell, Peter Story, Joel Reidenberg, and Norman Sadeh. In Semantic Web Journal (SWJ), May 2017.

Nudges for privacy and security: Understanding and assisting users' choices online. Alessandro Acquisti, Idris Adjerid, Rebecca Balebako, Laura Brandimarte, Lorrie Faith Cranor, Saranga Komanduri, Pedro Giovanni Leon, Norman Sadeh, Florian Schaub, Manya Sleeper, Yang Wang, and Shomir Wilson. ACM Computing Surveys (CSUR) 50(3), August 2017.

In search of the use-mention distinction and its impact on language processing tasks. Shomir Wilson. The International Journal of Computational Linguistics and Applications 2(1-2), pp 139-154, 2011.

The pathological liar: An exclusionary approach to self-referential contradictions in natural language. Shomir Wilson. Aporia 14(2), 2004.

Peer-Reviewed AAAI Symposium Proceedings

Analyzing vocabulary intersections of expert annotations and topic models for data practices in privacy policies. Frederick Liu, Shomir Wilson, Florian Schaub and Norman Sadeh. In Proceedings of the AAAI Fall Symposium on Privacy and Language Technologies, Arlington, VA, November 2016.

Automatic extraction of opt-out choices from privacy policies. Kanthashree Mysore Sathyendra, Florian Schaub, Shomir Wilson and Norman Sadeh. In Proceedings of the AAAI Fall Symposium on Privacy and Language Technologies, Arlington, VA, November 2016.

Analyzing and predicting privacy law compliance of mobile apps. Sebastian Zimmeck, Ziqi Wang, Lieyong Zou, Roger Iyengar, Bin Liu, Florian Schaub, Shomir Wilson, Norman Sadeh, Steven M. Bellovin and Joel Reidenberg. In Proceedings of the AAAI Fall Symposium on Privacy and Language Technologies, Arlington, VA, November 2016.

The Metacognitive Loop: An architecture for building robust intelligent systems. Hamid Haidarian, Wikum Dinalankara, Scott Fults, Shomir Wilson, Don Perlis, Matt Schmill, Tim Oates, Darsana Josyula, and Michael Anderson. In Proceedings of the AAAI Fall Symposium on Commonsense Knowledge (AAAI/CSK'10), Arlington, VA, USA, November 11-13, 2010.

Toward domain-neutral human-level metacognition. Michael L. Anderson, Matt Schmill, Tim Oates, Don Perlis, Darsana Josyula, Dean Wright, and Shomir Wilson. In Proceedings of the 2007 AAAI Spring Symposium on Logical Formalizations of Commonsense Reasoning, 2007.

Book Chapters

Nudges (and Deceptive Patterns) for Privacy: Six Years Later. Alessandro Acquisti, Idris Adjerid, Laura Brandimarte, Lorrie Faith Cranor, Saranga Komanduri, Pedro Giovanni Leon, Norman Sadeh, Florian Schaub, Yang Wang, and Shomir Wilson. In Sabine Trepte and Philipp Masur (Eds.), The Routledge Handbook of Privacy and Social Media, Taylor & Francis, 2023.

A bridge from the use-mention distinction to natural language processing. Shomir Wilson. In Saka, P., Johnson, M. (Eds.), The Semantics and Pragmatics of Quotation. Springer, 2017.

The metacognitive loop and reasoning about anomalies. Matthew Schmill, Michael L. Anderson, Scott Fults, Darsana Josyula, Tim Oates, Donald Perlis, Hamid Haidarian Shahri, Shomir Wilson, and Dean Wright. In Cox, M., Raja, A. (Eds.), Metareasoning: Thinking about Thinking. MIT Press, MA, USA, 2010.

Magazine Articles

Reports on the 2019 AAAI Spring Symposium Series (Privacy-Enhancing Artificial Intelligence and Language Technologies). Shomir Wilson, et al. To appear in AI Magazine 40:3.

Reports on the 2016 AAAI Fall Symposium Series (Privacy and Language Technologies). Patrícia Alves-Oliveira, Richard G. Freedman, Dan Grollman, Laura Herlant, Laura Humphrey, Fei Liu, Ross Mead, Frank Stein, Tom Williams, and Shomir Wilson. AI Magazine 38:2, Summer 2017.

A self-help guide for autonomous systems. Michael L. Anderson, Scott Fults, Darsana P. Josyula, Tim Oates, Don Perlis, Matthew D. Schmill, Shomir Wilson, and Dean Wright. AI Magazine, Summer 2008.

Peer-Reviewed Conference Poster Abstracts

Comparing Scam Emails and Email User Education at Universities. Duo Pan, Ellen Poplavska, Nora O'Toole, and Shomir Wilson. In the Seventeenth Symposium on Usable Privacy and Security (unpublished work), held online, August 2021.

A Multilingual Comparison of Email Scams. Duo Pan, Ellen Poplavska, Yichen Yu, Susan Strauss, and Shomir Wilson. In the Sixteenth Symposium on Usable Privacy and Security (unpublished work), held online, August 2020.

Automatic Title Generation to Improve the Readability of Privacy Policies. Abhijith Athreya Mysore Gopinath, Vinayshekhar Bannihatti Kumar, Shomir Wilson, Norman Sadeh. In the Sixteenth Symposium on Usable Privacy and Security (unpublished work), held online, August 2020.

Privacy Not Found: A Study of the Availability of Privacy Policies on the Web. Soundarya Nurani Sundareswara, Shomir Wilson, Mukund Srinath, C. Lee Giles. In the Sixteenth Symposium on Usable Privacy and Security (unpublished work), held online, August 2020.

Question Answering for Privacy Policies: Combining Computational and Legal Perspectives. Abhilasha Ravichander, Alan Black, Shomir Wilson, Thomas Norton, and Norman Sadeh. In Proceedings of the Sixteenth Symposium on Usable Privacy and Security (published work), held online, August 2020.

Increasing the salience of data use opt-outs online. Namita Nisal, Sushain K. Cherivirala, Kanthashree M. Sathyendra, Margaret Hagan, Florian Schaub, Shomir Wilson, Lorrie Faith Cranor, and Norman Sadeh. In the Thirteenth Symposium on Usable Privacy and Security (unpublished work), Santa Clara, CA, June 2017.

Mobile app privacy compliance: Automated technology to help regulators, app stores and developers. Sebastian Zimmeck, Lieyong Zou, Bin Liu, Shomir Wilson, Steven M. Bellovin, Ziqi Wang, Roger Iyengar, Florian Schaub, Norman Sadeh, and Joel Reidenberg. In the Thirteenth Symposium on Usable Privacy and Security (published work), Santa Clara, CA, June 2017.

Visualization and interactive exploration of data practices in privacy policies. Sushain K. Cherivirala, Florian Schaub, Mads Schaarup Andersen, Shomir Wilson, Norman Sadeh, and Joel R. Reidenberg. In the Twelfth Symposium on Usable Privacy and Security (unpublished work), Denver, CO, June 2016.

Towards usable privacy policies: Semi-automatically extracting data practices from websites' privacy policies. Norman Sadeh, Alessandro Acquisti, Travis Breaux, Lorrie Cranor, Aleecia McDonald, Joel Reidenberg, Noah Smith, Fei Liu, N. Cameron Russell, Florian Schaub, Shomir Wilson, James Graves, Pedro Leon, Rohan Ramanath, and Ashwini Rao. In the Tenth Symposium on Usable Privacy and Security (unpublished work), Palo Alto, CA, July 2014.

Peer-Reviewed Workshop Proceedings

Automated Ableism: An Exploration of Explicit Disability Biases in AIaaS Sentiment and Toxicity Analysis Models. Pranav Narayanan Venkit, Mukund Srinath, and Shomir Wilson. In Proceedings of the Third Workshop on Trustworthy Natural Language Processing (TrustNLP), 2023. Best Short Paper Award.

Effects of Online Self-Disclosure on Receiving Social Support During the COVID-19 Pandemic. Jooyoung Lee, Sarah Rajtmajer, Eesha Srivatsavaya, and Shomir Wilson. In 1st Workshop on NLP for Positive Impact (unpublished papers) at ACL, 2021.

Demystifying privacy policies with language technologies: Progress and challenges. Shomir Wilson, Florian Schaub, Aswarth Dara, Sushain K. Cherivirala, Sebastian Zimmeck, Mads Schaarup Andersen, Pedro Giovanni Leon, Eduard Hovy, and Norman Sadeh. In Proceedings of the Workshop on Text Analytics for Cybersecurity and Online Safety (TA-COS) at LREC, Portorož, Slovenia, May 2016.

Distinguishing use and mention in natural language. Shomir Wilson. In Proceedings of the NAACL HLT Student Research Workshop, 29-33. Los Angeles, CA: Association for Computational Linguistics. 2010.

The role of metacognition in robust AI systems. Matt Schmill, Tim Oates, Michael L. Anderson, Darsana Josyula, Don Perlis, Shomir Wilson, and Scott Fults. In Papers from the Workshop on Metareasoning at the Twenty-Third AAAI Conference on Artificial Intelligence, 2008.

Ontologies for reasoning about failures in AI systems. Michael L. Anderson, Scott Fults, Darsana Josyula, Tim Oates, Don Perlis, Matt Schmill, and Shomir Wilson. In Proceedings of the First International Workshop on Metareasoning in Agent-Based Systems, Hawaii, 2007.

Dissertation

A Computational Theory of the Use-Mention Distinction in Natural Language. Shomir Wilson. University of Maryland, 2011.

Technical Reports

Towards Automatic Classification of Privacy Policy Text. Frederick Liu, Shomir Wilson, Peter Story, Sebastian Zimmeck, and Norman Sadeh. Technical Report CMU-LTI-17-010, Carnegie Mellon University, 2017.

The Usable Privacy Policy Project: Combining crowdsourcing, machine learning and natural language processing to semi-automatically answer those privacy questions users care about. Norman Sadeh, Alessandro Acquisti, Travis Breaux, Lorrie Cranor, Aleecia McDonald, Joel Reidenberg, Noah Smith, Fei Liu, N. Cameron Russell, Florian Schaub, and Shomir Wilson. Technical Report CMU-ISR-13-119, Carnegie Mellon University, 2013.

Automatic categorization of privacy policies: A pilot study. Waleed Ammar, Shomir Wilson, Norman Sadeh, and Noah A. Smith. Technical Report CMU-LTI-12-019 / CMU-ISR-12-114, Carnegie Mellon University, 2012.

Other Papers

Survey on Sociodemographic Bias in Natural Language Processing. Vipul Gupta, Pranav Narayanan Venkit, Shomir Wilson, and Rebecca J. Passonneau. arXiv:2306.08158, 2023.

Creation and Analysis of an International Corpus of Privacy Laws. Sonu Gupta, Ellen Poplavska, Nora O'Toole, Siddhant Arora, Thomas Norton, Norman Sadeh, and Shomir Wilson. arXiv:2206.14169, 2022. [data]

An Exploratory Analysis of Broadcast Police Communications in Chicago. Pranav Venkit, Chris Graziul, Miranda Goodman, Samantha Kenny, Shomir Wilson. Abstract accepted by the Penn State Annual Social Thought Conference, 2022.

Automated Detection of Doxing on Twitter. Younes Karimi, Anna Squicciarini, and Shomir Wilson. arXiv:2202.00879, 2022. Preprint of our CSCW 2022 paper.

Identification of Bias Against People with Disabilities in Sentiment Analysis and Toxicity Detection Models. Pranav Narayanan Venkit and Shomir Wilson. arXiv:2111.13259, 2021.

A 'Sourceful' Twist: Emoji Prediction Based on Sentiment, Hashtags and Application Source. Pranav Venkit, Zeba Karishma, Chi-Yang Hsu, Rahul Katiki, Kenneth Huang, Shomir Wilson, and Patrick Dudas. arXiv:2103.07833, 2021.

Privacy at Scale: Introducing the PrivaSeer Corpus of Web Privacy Policies. Mukund Srinath, Shomir Wilson, and C. Lee Giles. arXiv:2004.11131, 2020.

An active logic approach to Moore's Paradox. Shomir Wilson. Scholarly paper for M.S. in Computer Science, 2008.

Evaluation of functional-linkage networks applied to protein annotation. Shomir Wilson. Honors thesis for B.S. in Computer Science, 2005.

Wittgenstein takes the Turing Test. Shomir Wilson. Honors thesis for B.S. in Philosophy; also selected for presentation at The First Undergraduate Philosophy Conference at Northwestern University, 2005.

Construction of a crystal graph simulation engine. Shomir Wilson. Honors thesis for B.S. in Mathematics, 2004.