Computing Science Mini-Workshop on Natural Language Processing

03 September 2014, 14:00 - 17:00

This is a past event

This is a mini-workshop on Natural Language Processing with talks by several visitors and department colleagues.

Speakers:

Hailong Cao on "Soft Dependency Matching for Hierarchical Phrase-based Machine Translation"

Abstract: In this talk, I present a soft dependency matching model for hierarchical phrase-based (HPB) machine translation. When an HPB rule is extracted, we enrich it with dependency knowledge automatically learnt from the training data. The dependency knowledge encodes not only the dependency relations between the components inside the rule, but also the dependency relations between the rule and its context. When a rule is applied to translate a sentence, the dependency knowledge is used to compute the syntactic structural consistency of the rule against the dependency tree of the sentence. We characterize the structural consistency by three features and integrate them into the standard SMT log-linear model to guide the translation process. Our method is evaluated on multiple Chinese-to-English machine translation test sets. The experimental results show that our soft matching model achieves improvements of 0.7 to 1.4 BLEU points over a strong baseline system.
Bio: Hailong Cao received his PhD from Harbin Institute of Technology (HIT) in 2006 and worked as a postdoctoral researcher at NICT, Japan, on a machine translation project. He then returned to the School of Computer Science and Technology, HIT, as a lecturer in the Machine Intelligence and Translation Lab (MITLAB). He now focuses on teaching and research in natural language processing (NLP). His interests include statistical machine translation and syntactic parsing, among other areas. His papers have appeared at ACL, COLING and EMNLP, among other venues. He is an active member of the NLP academic community and has served ACL, COLING and other conferences as a PC member.

Tiejun Zhao on "Chinese Sentence Constituent Parsing"

Abstract: My talk is divided into two parts. The first is a brief introduction to the research of faculty members in MITLab, SCST, HIT. The second presents an outline of our progress on Chinese constituent parsing. We find that the subject, predicate, object and other constituents in a sentence play important roles in text understanding. Some examples illustrate their functions in understanding a Chinese paragraph. In fact, the PTB corpora (English and Chinese) include SUBJ and OBJ tags, but these tags are discarded in most research. This talk describes our ongoing work on identifying 4 types of Chinese sentence constituents based on CTB and TCT (Tsinghua Chinese Treebank). The main contribution of this work is a new mechanism for hierarchical parsing of Chinese sentence constituents, whereas previous implementations work on sequential chunks. Preliminary experimental results support its validity.
Bio: Dr. Tiejun Zhao is a professor at the Research Center of Language Technology, School of Computer Science and Technology, Harbin Institute of Technology. He is deputy director of the MOE-MS Key Laboratory of NLP & Speech at HIT. He received his PhD degree from HIT in 1997. He now teaches courses on AI, NER and IE for master's students and undergraduates in the school. His research interests include natural language understanding, machine translation, and applied artificial intelligence. Over the past three years he has published more than 30 journal and conference papers with his students. He serves the Chinese Information Society as an associate director of the Machine Translation Subject Committee and as a member of the editorial board of the Journal of Chinese Information Processing. He also serves the international NLP community as a PC member, session chair and track chair.

Sien Moens on "Machine Understanding of Text: Advances Made in the MUSE Project"

Abstract: One of the most prominent and challenging goals in natural language processing is text understanding. Although we have developed technology for the recognition of the semantic roles of sentence constituents, for extraction of temporal and spatial information and for coreference resolution, it is still difficult for a machine to understand text in a way that is comparable to human understanding. In this talk we report on advances made in text understanding in the EU project MUSE (www.muse-project.eu/). We especially focus on processing children's stories. We will report on results obtained by porting existing models for semantic role labeling trained on news content to the domain of children's stories, on results and prospects of models that jointly recognise the semantics of a text, and on our current research in inferring semantic information left implicit in a text.
Bio: Marie-Francine (Sien) Moens is a professor at the Department of Computer Science of the Katholieke Universiteit Leuven, Belgium. She holds an M.Sc. and a Ph.D. degree in Computer Science from this university. She is head of the Language Intelligence and Information Retrieval (LIIR) research group. Her main interests are in the domain of automated content retrieval and extraction from text using a combination of statistical, machine learning and symbolic techniques, and exploiting insights from linguistic and cognitive theories.

Chenghua Lin on "Sherlock: a Semi-Automatic Quiz Generation System using Linked Data"
Roman Kutlak on "Scrutable Autonomous Systems"
Angrosh Mandya on "Lexico-syntactic Text Simplification and Compression with Typed Dependencies"
Adam Wyner on "Decomposition for Argument Identification and Extraction"

Speaker: Various
Hosted by: Chenghua Lin and Adam Wyner
Venue: Meston 2