About Me

Who I Am

I am an Assistant Professor in the College of Information Sciences and Technology at Penn State, where I lead the Human Language Technologies Lab. I am also a Faculty Affiliate of Penn State's Institute for CyberScience and a member of the Social Data Analytics graduate faculty.

My research spans natural language processing, privacy, and artificial intelligence. I participate in the Usable Privacy Policy Project. You can read more about my research interests.

From 2016 until 2018 I was an Assistant Professor in the EECS Department at the University of Cincinnati. Prior to that I was a postdoc and a lecturer in Carnegie Mellon University's School of Computer Science and an NSF International Research Fellow in the University of Edinburgh's School of Informatics. I received my PhD in Computer Science from the University of Maryland in 2011.

Contacting Me

You can reach me at shomir _at_ psu.edu. If you are on PSU's University Park campus, you can stop by my office at E310 in the Westgate Building.

Prospective Students: I have openings for Ph.D. students, M.S. students, and undergraduates to work on projects related to natural language processing or privacy. I'm looking for students who have coursework in machine learning, NLP, or artificial intelligence, and I place a high value on good writing skills and attentiveness to detail. If you're interested in working with me, read some of my recent publications and email me with "read your recruiting note" as the subject line. Include a CV and an explanation of your specific interests in my research.

Curriculum Vitae

Here it is.

Latest News

2019-01-11: Our paper "Analyzing privacy policies at scale: From crowdsourcing to automated annotations" has been published in ACM Transactions on the Web. Here's a copy.

2018-12-05: Thanks to Jamie Blustein and Computer Science at Dalhousie University for hosting me for a talk today.

2018-11-29: I am pleased to congratulate (belatedly) my Spring 2018 M.S. graduates on their new jobs: Abhijith Athreya is now a Chief Engineer at Samsung R&D, and Baradwaj Aryasomayajula is a Software Developer at Verizon.

2018-10-31: Later this week I will be at EMNLP 2018 to present a poster on our accepted paper and to chair the Social Applications II session. I am also looking for Ph.D. students to join my lab, so please chat with me if you are interested.

2018-10-16: Thanks to IST's communications and marketing team for including a section about my photography in this article. That's my picture at the top, and the section about me is toward the end.

2018-10-11: IST has multiple faculty openings. We have an open-rank opening in security and privacy and an assistant professor opening in human-centered design. We're also looking for teaching faculty.

2018-09-25: I am now a Faculty Affiliate of Penn State's Institute for CyberScience.

2018-08-30: Our paper "Supervised and Unsupervised Methods for Robust Separation of Section Titles and Prose Text in Web Documents" has been accepted for presentation at EMNLP in November. Here's the paper. The code and the datasets are on GitHub.

2018-08-10: I am organizing a AAAI Spring Symposium titled "Privacy-Enhancing Artificial Intelligence and Language Technologies" (PAL) at Stanford University on March 25-27, 2019. Consider submitting a paper, or help me promote it with this flyer.

2018-07-26: Congrats to my students Abhijith Mysore and Baradwaj Aryasomayajula on successfully defending their M.S. theses!

2018-07-19: Last week I led a workshop on natural language processing for 20 visiting faculty from Ming Chi University of Taiwan, as part of their visit to the University of Cincinnati. Best wishes for the rest of their stay.

2018-05-07: I was a faculty marshal for the College of Engineering and Applied Sciences at UC's Spring Commencement. Congratulations again to all of our graduates!

2018-03-23: The Usable Privacy Policy Project created a series of videos to explain our research. I speak first in this one, about the natural language processing aspects of the project.

For older news, check the archive.