About Me

Who I Am

I (he/him) am an Assistant Professor in the College of Information Sciences and Technology at Penn State, where I lead the Human Language Technologies Lab. My research spans natural language processing, privacy, security, and computational social science.

Prior to becoming faculty, I held postdoctoral positions at Carnegie Mellon University and the University of Edinburgh. I received my PhD in Computer Science from the University of Maryland in 2011.

Contacting Me

You can reach me at shomir _at_ psu.edu. If you are on Penn State's University Park campus, you can stop by my office, E310 in the Westgate Building.

Students taking my classes may benefit from browsing my Guide for Interacting With Faculty before contacting me.

Students interested in joining my lab: I sometimes have openings for PhD students, MS students, and undergraduates to work on projects related to natural language processing or privacy. If you're interested in working with me, first consult my Guide for Joining My Lab and then email me with "read your recruiting note" as the subject line. Include a CV and an explanation of your specific interests in my research.

Curriculum Vitae

Here it is.

Latest News

I also sometimes post news and thoughts to Twitter or Mastodon.

2023-01-27: Our proposal "SaTC: CORE: Small: Toward Privacy Equity through Contextual Understanding of Self-Disclosure" has been awarded. I'll be working with my colleague Sarah Rajtmajer on studying relationships between socioeconomic status and individuals' privacy behaviors in social media.

2023-01-21: Our paper "An Exploratory Study of Demonym Biases in GPT-2" was accepted to appear at EACL.

2023-01-12: Our proposal "Understanding the Prevalence of Drinking Water Service Disruption through Large-Scale Analysis of News Articles and Social Media" was recently funded by Penn State's Center for Socially Responsible AI. Here's an announcement in Penn State News.

2022-12-09: This article in Penn State News describes my PhD student Younes Karimi's work to automatically identify doxing on Twitter.

2022-12-02: HLT Lab undergraduate researcher Nora O'Toole is featured in this Penn State News article.

2022-10-17: Here's an article in Penn State News about my PhD students' work identifying biases in language models against terms that describe people with disabilities.

2022-10-05: I've been selected to join the Steering Committee for Penn State's Center for Socially Responsible Artificial Intelligence.

2022-08-14: I will participate in the seminar "Privacy in Speech and Language Technology" at Dagstuhl in late August.

2022-07-01: We've released the GPI ("Government Privacy Instructions") Corpus, a collection of 1,043 privacy laws, regulations, and guidelines from 182 jurisdictions around the world. Read our arXiv paper about it and download it here.

2022-06-02: I led (with Athina Markopoulou) a breakout session titled "Privacy, Policy, and People" at the NSF SaTC PI Meeting.

2022-04-22: Thanks to Sepideh Ghanavati for virtually hosting me for a talk with her lab at the University of Maine.

2022-04-06: We have two papers accepted to LREC: "STAPI: An Automatic Scraper for Extracting Iterative Title-Text Structure from Web Documents" and "A Tale of Two Regulatory Regimes: Creation and Analysis of a Bilingual Privacy Policy Corpus".

2022-02-04: I've been elected to Penn State's University Faculty Senate for a four-year term, starting in Fall 2022 and ending in Spring 2026.

2022-01-15: Our paper "Automated Detection of Doxing on Twitter" has been accepted to CSCW 2022. Here's a preprint on arXiv.

For older news, check the archive.