Generating Descriptions of Images
Can we connect computer vision to language generation to produce accurate, fluent, and naturally varied descriptions of images? What issues come up, how can the visual and linguistic systems work together, and why does this matter? In this talk, I describe some of the research coming out of the 2011 Johns Hopkins CLSP summer workshop, where researchers in natural language processing and computer vision collaborated to characterize and generate visual descriptions. I will focus on our prototype vision-to-language system, Midge, which uses computer vision detections alongside statistics drawn from descriptive text to generate likely parse trees describing an image; a simplified sketch of this idea follows the paper list below. Relevant papers stemming from this work:
EACL, "Midge: Generating Image Descriptions From Computer Vision Detections", http://aclweb.org/anthology/E/E12/E12-1076.pdf (Most relevant for this talk)
NAACL, "Detecting Visual Text", http://abdn.ac.uk/~r07mm9/papers/desctext.pdf
CVPR, "Understanding and Predicting Importance in Images", http://abdn.ac.uk/~r07mm9/papers/importancefactors.pdf
Meg is a postgraduate student in the natural language generation (NLG) group at the University of Aberdeen. She is also a visiting scholar at the Center for Spoken Language Understanding at Oregon Health & Science University (OHSU) in Portland, Oregon. Meg's web page: http://www.abdn.ac.uk/~r07mm9/
- Speaker: Margaret Mitchell
- Hosted by: Kees van Deemter
- Venue: MT203