Generating Descriptions of Images
Can we connect computer vision to language generation to produce accurate, fluent, and naturally varied descriptions of images? What issues come up, how can the visual and linguistic systems work together, and why does this matter? In this talk, I describe some of the research coming out of the 2011 Johns Hopkins CLSP summer workshop, where researchers in natural language processing and computer vision collaborated to characterize and generate visual descriptions. I will focus on our prototype vision-to-language system, Midge, which uses computer vision detections alongside statistics drawn from descriptive text to generate likely parse trees describing an image; a simplified sketch of this idea follows the paper list below. Relevant papers stemming from this work:
EACL, "Midge: Generating Image Descriptions From Computer Vision Detections", http://aclweb.org/anthology/E/E12/E12-1076.pdf (Most relevant for this talk)
NAACL, "Detecting Visual Text", http://abdn.ac.uk/~r07mm9/papers/desctext.pdf
CVPR, "Understanding and Predicting Importance in Images", http://abdn.ac.uk/~r07mm9/papers/importancefactors.pdf
Meg is a postgraduate student in the natural language generation (NLG) group at the University of Aberdeen. She is also a visiting scholar at the Center for Spoken Language Understanding at Oregon Health & Science University (OHSU) in Portland, Oregon. Meg's web page: http://www.abdn.ac.uk/~r07mm9/
- Speaker: Margaret Mitchell
- Hosted by: Kees van Deemter
- Venue: MT203