About Me


Who I Am

I am an Assistant Professor in the College of Information Sciences and Technology at Penn State, where I lead the Human Language Technologies Lab. I am also a Faculty Affiliate of Penn State's Institute for Computational and Data Sciences and a member of the Social Data Analytics graduate faculty.

My research spans natural language processing, privacy, and artificial intelligence. I participate in the Usable Privacy Policy Project. You can read more about my research interests.

From 2016 until 2018, I was an Assistant Professor in the EECS Department at the University of Cincinnati. Prior to that, I was a postdoc and a lecturer in Carnegie Mellon University's School of Computer Science and an NSF International Research Fellow in the University of Edinburgh's School of Informatics. I received my PhD in Computer Science from the University of Maryland in 2011.

Contacting Me

You can reach me at shomir _at_ psu.edu. If you are on Penn State's University Park campus, you can stop by my office at E310 in the Westgate Building.

Students taking my classes may benefit from browsing my Guide for Interacting With Faculty before contacting me.

Students interested in joining my lab: I often have openings for PhD students, MS students, and undergraduates to work on projects related to natural language processing or privacy. If you're interested in working with me, first consult my Guide for Joining My Lab and then email me with "read your recruiting note" as the subject line. Include a CV and an explanation of your specific interests in my research.

Curriculum Vitae

Here it is.

Latest News

I also sometimes post news and thoughts to Twitter.

2020-10-24: Our paper "From Prescription to Description: Mapping the GDPR to a Privacy Policy Corpus Annotation Scheme" has been accepted for presentation at JURIX.

2020-08-11: My advisees presented three posters at (virtual) SOUPS. They covered our research on limitations in the availability of privacy policies on the web, automatically generating titles for sections of privacy policies, and differences in scam emails written in different languages.

2020-07-08: I co-hosted a mentoring session at (virtual) ACL 2020 about finding good advisors and mentors. Here are our slides from the session.

2020-06-16: I added Thoughts on Failure to my Advice for Students. It's a personal narrative about my biggest failures, with some advice from experience. Students who set challenging goals for themselves may find the contents of this page particularly relatable.

2020-05-20: I would be glad to work with recent PhD graduates on applications for CRA's 2020 CIFellows postdoctoral fellowship. If you're interested, please get in touch soon.

2020-05-09: Congratulations to IST's Spring 2020 graduates! I contributed to this celebratory video that our Office of Marketing and Communications put together.

2020-05-07: I added a Guide for Publishing in Computer/Information Science Venues to my Advice for Students.

2020-04-28: I'm pleased to announce the release of PrivaSeer, our search engine for the privacy policies of 1,005,781 English-language websites. You can also read this arXiv paper about the corpus behind the search engine.

2020-01-24: Here's a press release about the Penn State - University of Auckland research collaboration workshop I attended in November 2019.

2020-01-17: Our paper "Finding a Choice in a Haystack: Automatic Extraction of Opt-Out Statements from Privacy Policy Text" has been accepted for publication and oral presentation at The Web Conference.

2020-01-02: Our proposal titled "Exploring the Effects of Socioeconomic Status on Privacy Behaviors in an Online Social Network" will be funded by Penn State's Center for Social Data Analytics. Sarah Rajtmajer and I will work together on this project.

For older news, check the archive.