Assistant Professor
Stony Brook University (SUNY Stony Brook)
1422 Computer Science
Stony Brook, NY 11794-4400
(phone) 631-632-8457
(fax) 631-632-8334
Office Hours: TBD for Fall 2013
News:
- 2 papers at ACL 2013: one on a connotation lexicon, another on a new image-text parallel corpus
- New media coverage: our work on the connotation lexicon is featured in Fast Company
- 1 journal article to appear in TPAMI 2013
- Invited speaker at the Vision+NLP Workshop at NAACL 2013
- Panel speaker at the Student Research Workshop at NAACL 2013
- Interview with News for New York @ WNBC on deception cues in product reviews
- Area chair for EMNLP 2012
- Area chair for NAACL 2012
Teaching:
Research Projects:
- Integrative Models for Natural Language and Images, Language Grounding
Web data today is increasingly multi-modal, creating both the opportunity and the need for integrative models that bridge Natural Language Processing and Computer Vision.
Our recent explorations include
- Generating natural language descriptions of images by guiding object detection with a language prior [CVPR-11], by predicting likely action verbs from language-driven world knowledge [CoNLL-11], and by composing phrases retrieved by partial image matching [ACL-12].
- Understanding characteristics of visual descriptions [NAACL-12].
- Constructing a new image-text parallel corpus by reducing information misalignment between images and text [ACL-13].
- Writing Styles, Deception Detection, Personal Analytics, Forensic Language Technologies
Language is a window into people's minds. We explore data-driven approaches to statistical stylometry (i.e., the study of linguistic styles) and forensic language technologies (e.g., authorship verification, obfuscation, deception detection).
This research is naturally interdisciplinary, with broad connections to Psychology, Social Science, Cognitive Science, Psycholinguistics, and Literature.
Our recent work includes
- Detecting socio-cognitive identities, such as authorship [EMNLP-12], gender [CoNLL-11], and nationality.
- Uncovering the (hidden) intent of authors, such as deception [ACL-11, ACL-12, ICWSM-12] and textual vandalism [ACL-11].
Publications:
- Recent Papers:
2013
Generalizing Image Captions for Image-Text Parallel Corpus.
Polina Kuznetsova, Vicente Ordonez, Alexander Berg, Tamara Berg and Yejin Choi.
Association for Computational Linguistics (ACL), short, 2013.
*Data: the generalized 1M image-caption corpus will be made available before the conference.
Connotation Lexicon: A Dash of Sentiment Beneath the Surface Meaning.
Song Feng, Jun Seok Kang, Polina Kuznetsova and Yejin Choi.
Association for Computational Linguistics (ACL), 2013.
*Data: Connotation lexicon
Featured in Fast Company
BabyTalk: Understanding and Generating Simple Image Descriptions.
Girish Kulkarni, Visruth Premraj, Vicente Ordonez, Sagnik Dhar, Siming Li,
Yejin Choi, Alexander C. Berg, Tamara L Berg.
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2013.
2012
Characterizing Stylistic Elements in Syntactic Structure.
Song Feng, Ritwik Banerjee and Yejin Choi.
Empirical Methods in Natural Language Processing (EMNLP), 2012.
Syntactic Stylometry for Deception Detection.
Song Feng, Ritwik Banerjee and Yejin Choi.
Association for Computational Linguistics (ACL), short, 2012.
Collective Generation of Natural Image Descriptions.
Polina Kuznetsova, Vicente Ordonez, Alexander Berg, Tamara Berg and Yejin Choi.
Association for Computational Linguistics (ACL), 2012.
Distributional Footprints of Deceptive Product Reviews.
Song Feng, Longfei Xing, Anupam Gogar and Yejin Choi.
International AAAI Conference on Weblogs and Social Media (ICWSM), 2012.
Best paper runner-up
Featured in Investor's Business Daily (Aug 2012); MIT's Technology Review (Jun 2012); Newsweek (Sep 2012); Business Insider; Computer World; Consumerist.
Detecting Visual Text.
Jesse Dodge, Amit Goyal, Xufeng Han, Alyssa Mensch, Margaret Mitchell, Karl Stratos, Kota Yamaguchi, Yejin Choi, Hal Daumé III, Alexander C. Berg, Tamara L. Berg.
North American Chapter of the Association for Computational Linguistics (NAACL), 2012.
2011
Learning General Connotation of Words using Graph-based Algorithms.
Song Feng, Ritwik Bose, and Yejin Choi.
Empirical Methods in Natural Language Processing (EMNLP), 2011.
Domain Independent Authorship Attribution without Domain Adaptation.
Rohith Menon and Yejin Choi.
Recent Advances in Natural Language Processing (RANLP), 2011.
Composing Simple Image Descriptions using Web-scale N-grams.
Siming Li, Girish Kulkarni, Tamara Berg, Alex Berg and Yejin Choi.
Computational Natural Language Learning (CoNLL), 2011.
Gender Attribution: Tracing Stylometric Evidence Beyond Topic and Genre.
Ruchita Sarawgi, Kailash Gajulapalli and Yejin Choi.
Computational Natural Language Learning (CoNLL), 2011.
Language of Vandalism: Improving Wikipedia Vandalism Detection via Stylometric Analysis.
Manoj Harpalani, Michael Hart, Sandesh Singh, Rob Johnson and Yejin Choi.
Association for Computational Linguistics (ACL), short, 2011.
Finding Deceptive Opinion Spam by Any Stretch of the Imagination.
Myle Ott, Yejin Choi, Claire Cardie, and Jeffrey Hancock.
Association for Computational Linguistics (ACL), 2011.
Featured in WNBC News for New York (Sep 2012, 5pm news); NHPR Radio (Sep 22, 2011); Bloomberg Business Week (Oct 2011); NY Times (Aug 19, 2011).
Baby Talk: Understanding and Generating Image Descriptions.
Girish Kulkarni, Visruth Premraj, Sagnik Dhar, Siming Li, Yejin Choi, Alexander C. Berg, and Tamara Berg.
Computer Vision and Pattern Recognition (CVPR), 2011.
- Complete list of publications:
By Year
By Topic
Students:
Short Bio:
Yejin Choi received her Ph.D. in Computer Science from Cornell University and her B.S. in Computer Science and Engineering from Seoul National University.
She spent the summer of 2009 as a research intern at Yahoo! Research and joined the faculty of the Computer Science Department at Stony Brook University in September 2010.