Projects


The variety of research pursued in the NLG group is reflected in the details of our major projects of recent years, accessible via the links below.

Current Projects

  • SASSY
  • CURIOS
  • Digital Conservation
  • WikiRivers
  • MinkApp
Affecting People



This funding from the Engineering and Physical Sciences Research Council under its "platform grant" scheme provides general support for the activities of the NLG group, focussing on three strands in particular:

  • experimental studies of how readers are affected by language
  • modelling of language users, particularly with regard to "affective" aspects
  • examination of how best to construct more general NLG systems, in terms of internal structures and processes

The grant has supported a number of workshops, collaboration with research work elsewhere, further development of the SimpleNLG software, and several "mini-projects" in the area of NLG.


People


What the work is about

Limitations of Traditional NLG

In the real world, texts vary enormously both in their communicative purpose, and in the abilities and preferences of the people who read them. Much previous research in NLG has assumed that the purpose of generated texts is simply to communicate factual information to a user [17]. There has been little attention to other aims, such as persuading people [16], teaching people [9,25], helping people make decisions [18], [6], and entertaining people [19]. While texts with these other aims usually do communicate information, they do so in order to affect the reader at a deeper level, and this has an impact on how the information should be communicated (the central task of NLG). Even where the main goal is to inform, the other ways in which the language affects the reader may have an important effect on the achievement of that goal.

Traditional NLG tackles a single type of generic goal (factual information) for a general user (or one of a small number of user types). The focus needs to be broadened to a variety of types of goals for specific users. Although NLG research has begun to explore the issues of reader variability (eg [23], [1]), including user modelling (see [24] for a good review), this is at an early stage, and tends to concentrate on broad decisions about content rather than fine-grained linguistic form, the focus of our proposed work.

Our own projects have begun to address these issues. User groups have included children with linguistic difficulties (STANDUP), adults with limited literacy (SkillSum), general members of the public (STOP, ILEX [12]), and professional doctors and engineers (SumTime, [6]), sometimes with individual customisation (STOP, SkillSum). The texts have been informative (SumTime), persuasive (STOP, SkillSum), humorous (STANDUP), and entertaining (NECA).

Strategic Vision

NLG has enormous potential to achieve benefits in the real world, especially given the growing importance of eCommerce, eHealth and eGovernment, but current NLG applications exist only in niche areas. We believe that there are two main reasons for this:

  1. Firstly, many real applications challenge the assumptions of traditional NLG highlighted above (single, generic goal; general user). We would like to push forward the scientific understanding of how the attributes of an individual reader (and the reading process for them) influence the effect that particular linguistic choices have on them. This will then result in an ability to build systems which, from a model of the reader, can intelligently select linguistic forms in order to achieve increasingly ambitious effects. Hence our goal is to learn better how to affect people with natural language.
  2. Secondly, NLG can be somewhat inward-looking. As our current projects (PolicyGrid, BabyTalk) show, NLG adds value to other computational solutions and often cannot be viewed as a stand-alone technology. We would like to lead in the emergence of NLG from its small corner, as it contributes to wider research initiatives and is increasingly exploited commercially. This requires us to make use of the methodologies and knowledge of other disciplines, within and outside Computer Science, to a much greater extent than hitherto. Hence there is a need for strategic alliances with a variety of researchers and disciplines.

To address the problems highlighted above, we see the following scientific themes as especially relevant:

  1. Psychology and Reader Experiments. We need to understand the relevance to NLG of attention, perception and memory. Particularly relevant are results about human reading [15] and how humans align their language use in order to reach their hearers effectively [2]. Although we are already at the forefront of measuring the effects of NLG texts on real users (e.g. testing reading time, or task completion), collaboration with psychologists will enable us to broaden and deepen this strand, looking at more fine-grained measures of reader behaviour (eg using eye-tracking) and assessments of a wider range of effects (such as emotional impact). In general, NLG can offer psychologists the opportunity to further formalise and test their theories in more realistic settings. In return, results from psychology can inform our user and context models, as well as providing evidence about the effects of language alternatives in controlled settings.
  2. User Modelling and Affective Computing. Affective computing is computing that relates to, arises from, or deliberately influences emotions or other non-strictly-rational aspects of humans [13]. So far, however, work in "affective NLG" has aimed mainly to produce text that portrays the emotions of the writer, rather than considering how linguistic factors can affect the emotions of the reader. Work in affective computing may provide useful ways of formalising theories of emotion [10], modelling affective state and measuring effects on this state. In general, affective results may be easiest to monitor and achieve in multimodal communication systems, and this may require us to work with areas such as machine vision.
  3. NLG Architectures. The above issues (non-informative texts, reader variation) expose deficiencies in current NLG practices. Complex effects often involve a number of very different aspects of the text (e.g. sentence structuring, choice of vocabulary), interacting in non-trivial ways, and independent of the core factual content. Also, many effects arise from purely surface phenomena (eg text length, choice of words, word co-occurrences), and yet pipeline NLG architectures [17] discover surface effects only after all central decisions have been made. Abstract stylistic goals may have to be balanced against basic communicative tasks [21]; the COGENT project addresses some of these issues. There are a number of approaches to these problems: intelligent backtracking [4], 'overgeneration' architectures [5], and stochastic search [7], but such methods go beyond most current NLG architectures [8], and are still relatively untested on realistic examples.
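To make the 'overgeneration' idea concrete, here is a toy sketch (an invented example, not code from any of the cited systems): several candidate surface forms are enumerated for the same content, then ranked by a purely surface-level criterion that a pipeline architecture could only check after the fact — here, closeness to a target text length.

```python
import itertools

# Toy overgeneration-and-ranking sketch (hypothetical example): realize
# the same content in several candidate surface forms, then rank them
# by a surface-level criterion -- closeness to a target word count.

def overgenerate(subject, verb_forms, objects):
    """Enumerate candidate realizations from interchangeable choices."""
    for verb, obj in itertools.product(verb_forms, objects):
        yield f"{subject} {verb} {obj}."

def rank_by_length(candidates, target_words):
    """Prefer the candidate whose word count is closest to the target."""
    return min(candidates, key=lambda s: abs(len(s.split()) - target_words))

candidates = list(overgenerate(
    "The temperature",
    ["rose", "increased steadily"],
    ["overnight", "during the night"],
))
best = rank_by_length(candidates, target_words=4)
# best is the shortest candidate, "The temperature rose overnight."
```

A real system would score candidates with corpus statistics or a language model rather than raw length, but the architectural point is the same: surface properties drive the final choice.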

Benefits

This research can be expected to have large benefits for both science and technology. From a scientific perspective, it will lead to theoretical results about some very poorly understood aspects of language. From an engineering point of view, it will establish practical methodologies for NLG development and evaluation. From a technological perspective, our work could lead to systems that help people in numerous ways, e.g. encouraging people to change their behaviour (cf. STOP, SkillSum), teaching children and other learners (cf. STANDUP), assisting specialists to understand complex data (cf. SumTime, BabyTalk). NLG research is on the cusp of a movement from simple informative software to more general, powerful and varied communication systems. Key to this development is a better understanding of how to affect people with natural language.


Bibliography

  1. Cawsey, A., Jones, R.B., and Pearson, J., "The Evaluation of a Personalised Information System for Patients with Cancer". User Modeling and User-Adapted Interaction, vol 10, no 1, 2000.
  2. Garrod, S. and Pickering, M., "Why is conversation so easy?". Trends in Cognitive Sciences, 8(1), pp 8-11, 2004.
  3. Hovy, E.H., "Pragmatics and Natural Language Generation". Artificial Intelligence, 43(2), pp 153-198, 1990.
  4. Kamal, H. and Mellish, C., "An ATMS Approach to Systemic Sentence Generation". Procs of the Third International Conference on Natural Language Generation (INLG-04), New Forest, UK, pp 80-89, 2004.
  5. Langkilde, I. and Knight, K., "Generation that Exploits Corpus-based Statistical Knowledge". Procs of COLING/ACL, 1998.
  6. Law, A., Freer, Y., Hunter, J., Logie, R., McIntosh, N. and Quinn, J., "A Comparison of Graphical and Textual Presentations of Time Series Data to Support Medical Decision Making in the Neonatal Intensive Care Unit". Jnl of Clinical Monitoring and Computing, to appear (2005).
  7. Manurung, H., Ritchie, G., and Thompson, H., "A flexible integrated architecture for generating poetic texts". Procs of the Fourth Symposium on Natural Language Processing (SNLP 2000), Chiang Mai, Thailand, May 2000.
  8. Mellish, C. and Evans, R., "Implementation Architectures for Natural Language Generation". Natural Language Engineering, 10(3/4): pp 261-282, 2004.
  9. Moore, J., Porayska-Pomsta, K., Varges, S. and Zinn, C., "Generating Tutorial Feedback with Affect". Procs of the Seventeenth International Florida Artificial Intelligence Research Symposium Conference (FLAIRS), AAAI Press, 2004.
  10. Oatley, K. and Jenkins, J., Understanding Emotions, Blackwell, 1996.
  11. Oberlander, J. and Gill, A., "Individual differences and implicit language: Personality, parts-of-speech and pervasiveness". In Procs of the 26th Annual Conference of the Cognitive Science Society, pp 1035-1040, Chicago, August 5-7, 2004.
  12. O'Donnell, M., Knott, A., Mellish, C. and Oberlander, J., "ILEX: The Architecture of a Dynamic Hypertext Generation System". Natural Language Engineering, 7: pp 225-250, 2001.
  13. Picard, R. W., Affective Computing. MIT Press, 1997.
  14. Piwek, P., "An Annotated Bibliography of Affective Natural Language Generation". Version 1.3 available from http://www.itri.brighton.ac.uk/~Paul.Piwek/topic-papers.html
  15. Rayner, K. and Pollatsek, A., The Psychology of Reading, Lawrence Erlbaum Associates, 1995.
  16. Reed, C. and Norman, T.J. (eds), Argumentation Machines: New Frontiers in Argumentation and Computation. Dordrecht: Kluwer, 2004.
  17. Reiter, E. and Dale, R., Building Natural Language Generation Systems. Cambridge: CUP, 2000.
  18. Reiter, E., Sripada, S., Hunter, J., Yu, J. and Davy, I., "Choosing Words in Computer-Generated Weather Forecasts". Artificial Intelligence, 167(1-2), pp 137-169, 2005.
  19. Ritchie, G., "Current directions in computational humour". Artificial Intelligence Review, 16(2), pp 119-135, 2001.
  20. de Rosis, F. and Grasso, F., "Affective Natural Language Generation". In A. Paiva (ed.), Affective Interactions, Springer LNAI 1814, 2000.
  21. van Deemter, K., "Is Optimality-Theoretic Semantics Relevant for NLP?". Jnl of Semantics, 21(3), 2004.
  22. Walker, Marilyn A., Cahn, Janet E. and Whittaker, Stephen J., "Improvising linguistic style: social and affective bases for agent personality". Pp. 96 - 105 in Proc. 1st International Conference on Autonomous Agents, Marina del Rey, USA, 1997.
  23. Walker, M., Whittaker, S., Stent, A., Maloor, P., Moore, J., Johnston, M., Vasireddy, G. "Generation and Evaluation of User Tailored Responses in Multimodal Dialogue". Cognitive Science, 28(5), pp 811-840, 2003.
  24. Zukerman, I. and Litman, D. "Natural Language Processing and User Modeling: Synergies and Limitations". User Modeling and User-Adapted Interaction, 11(1-2), pp 129 - 158, 2001.
  25. Zinn, C., Moore, J. and Core, M., "Multimodal Intelligent Information Presentation". O. Stock and M. Zancanaro (eds.), Text, Speech and Language Technology, Vol. 27, pages 227-254, Kluwer Academic Publishers, 2005 (in press).
BabyTalk-Family

When a newborn baby is admitted to a neonatal intensive care unit (NICU), parents are frequently overwhelmed by the experience. The neonatal environment in which their baby is looked after can cause feelings of worry, confusion, and helplessness. Parents would often like more information about what is happening to their baby: the baby's current weight, oxygen levels, milk feeding quantities, and so on. Understanding this sort of information helps parents adapt and cope with the situation, take on their parental role, and get involved with the care of their child.

To help supply parents with this sort of information, we are developing a computer system - known as BabyTalk-Family - that can automatically generate easy-to-understand reports on the medical condition of babies in neonatal care. These reports are updated every 24 hours and made available online to the infant's parents, providing a simple summary of their child's progress.
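The flavour of such a daily report can be sketched as follows. This is an illustrative toy only — the data fields, thresholds, and wording are assumptions, not the actual BabyTalk-Family pipeline:

```python
# Illustrative sketch (assumed fields and phrasing, not the actual
# BabyTalk-Family system): turn a day's numeric observations into a
# short parent-friendly summary using simple trend templates.

def describe_trend(name, yesterday, today, unit):
    """One sentence comparing today's value with yesterday's."""
    if today > yesterday:
        change = "went up"
    elif today < yesterday:
        change = "went down"
    else:
        change = "stayed the same"
    return f"{name} {change}, from {yesterday}{unit} to {today}{unit}."

def daily_summary(observations):
    """observations: {field: (yesterday, today, unit)} -> summary text."""
    return " ".join(
        describe_trend(name, y, t, unit)
        for name, (y, t, unit) in observations.items()
    )

report = daily_summary({
    "Weight": (1850, 1870, " g"),
    "Oxygen saturation": (94, 94, "%"),
})
```

The real system must additionally interpret raw monitor signals and decide which events are worth reporting; the template step shown here is only the final surface-realization stage.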

We are currently working with parents and clinical staff to help improve this system, which will be trialled in collaboration with the neonatal unit at the Simpson Centre for Reproductive Health, Edinburgh Royal Infirmary.

Contact: Ehud Reiter


Media


People

University of Aberdeen

NHS Lothian

Digital Economy Hub

The University of Aberdeen has a long-standing tradition of cross-disciplinary research across national and international rural arenas. In the past 10 years, research income in the rural domain totalled 12 million (8.5m active).

This platform of rural research is matched by an equally vibrant and successful programme of ICT research.

Major on-going activities include the International Technology Alliance in Network & Information Sciences (2006-2016), the PolicyGrid eSocial Science Research Node (2006-2012), the Platform Grant - Affecting People with Natural Language (2007-2011) and EC Broadband for All (2004-2009).

Research is based around four interconnecting themes: Accessibility & Mobilities, Healthcare, Enterprise & Culture, and Natural Resource Conservation.

dot.rural applies digital technologies, including intelligent agents, natural language generation, knowledge graphs, the semantic web and linked data, across the above four themes.

Project Homepage: Digital Economy Hub: Rural Digital Economy

Contact: Pete Edwards

Empirical Effects of Vague Language

We have been carrying out experiments with human subjects investigating the processing of vague quantifiers in referring expressions, eg, 'few', 'many'.

Participants are presented with stimuli on screen in the form of squares containing arrays of dots, and are instructed to select one of the squares with reference to how many dots it contains. The experiments show that, under some circumstances, people make their selection faster when the referring expression uses a vague quantifier than when it uses a crisp alternative. They also show that, under some circumstances, this response time advantage can be achieved by using crisp verbal quantifiers like 'fewest' and 'most', ie, that the advantage might not be due to vagueness per se, but to the verbal format.
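The choice faced by an NLG system in this setting can be sketched as follows. The thresholds and wording below are assumptions for illustration, not those used in the experiments:

```python
# Hypothetical sketch of the generation choice studied here: refer to
# one square among several either crisply (exact count or superlative)
# or with a vague quantifier. Cutoffs and phrasing are invented.

def crisp_reference(dot_counts, target):
    """Crisp form: superlative if the target is extremal, else a count."""
    if dot_counts[target] == min(dot_counts):
        return "the square with the fewest dots"
    if dot_counts[target] == max(dot_counts):
        return "the square with the most dots"
    return f"the square with {dot_counts[target]} dots"

def vague_reference(dot_counts, target, few_cutoff=5):
    """Vague form: 'few' below an assumed cutoff, otherwise 'many'."""
    return ("the square with few dots"
            if dot_counts[target] <= few_cutoff
            else "the square with many dots")

counts = [3, 12, 25]
crisp = crisp_reference(counts, target=0)   # superlative crisp form
vague = vague_reference(counts, target=0)   # vague quantifier form
```

The experimental question is which of these forms lets a reader complete the selection task fastest, not which is easier to generate.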

The results have implications for NLG systems that must choose between different forms of linguistic referring expressions for conveying numerical information to human readers.

This work is supported by the EPSRC Platform Grant.

How Was School Today?

Supporting Narrative for Non-Speaking Children

Being able to tell stories about ourselves is a central part of the human experience and of social interaction. Most people do this naturally, for example while chatting with family members over the dinner table. But telling stories about oneself can be a real struggle for people with complex communication needs (CCN); they find it very difficult to create and articulate such stories. People with CCN (ie individuals with severe physical and communication impairments and possibly varying degrees of intellectual disability, eg due to cerebral palsy) rely on computer-generated synthetic speech. Speech generating devices are currently limited to short, pre-stored utterances or tedious preparation of text files which are output, word for word, via a speech synthesiser. Restrictions in speed and vocabulary can be a frustrating experience and are an impediment to spontaneous social conversation.

This project is a follow-on to the feasibility study "How was School Today...?", in which we wanted to see whether we could help children with CCN create stories about what they did in a day. We developed a computer tool which produces a draft story based on knowledge of the user's planned daily activities (eg from a diary) and automatically-acquired sensor data, together with an editing and narration tool which lets the user edit the draft into a story which is his or hers, and not just computer output.
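The draft-story idea can be sketched in a few lines. The event format and phrasing below are invented for illustration and are not the project's actual tool:

```python
# Sketch under assumptions (invented data format, not the project's
# tool): merge diary entries with sensor-detected events into draft
# first-person story sentences that the child can then edit.

def draft_story(diary, sensed):
    """diary: {time: planned activity}; sensed: {time: detected place}."""
    sentences = []
    for time in sorted(diary):
        sentence = f"At {time} I had {diary[time]}"
        if time in sensed:
            # sensor data adds detail the diary alone does not have
            sentence += f" in the {sensed[time]}"
        sentences.append(sentence + ".")
    return " ".join(sentences)

story = draft_story(
    {"09:00": "maths", "11:00": "music"},
    {"11:00": "music room"},
)
```

The point of the design is that the system supplies a fluent starting draft, so the child's effort goes into personalising the story rather than composing it word by word.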

Project Homepage: "How was School Today...?"

Contact: Ehud Reiter

Semantic Grid for Rural Policy Development and Appraisal (PolicyGrid)

PolicyGrid is a research Node of the National Centre for e-Social Science (NCeSS). NCeSS is funded by the Economic and Social Research Council (ESRC) to investigate how innovative and powerful computer-based infrastructure and tools developed over the past five years under the UK e-Science programme can benefit the social science research community. PolicyGrid involves a collaboration between computer scientists and social scientists at the University of Aberdeen, the Macaulay Institute (Aberdeen) and elsewhere in the UK.

The project aims to support policy-related research activities within social science by developing appropriate Grid middleware tools which meet the requirements of social science practitioners. The vision of the Semantic Grid is central to the PolicyGrid research agenda.

The first stage of PolicyGrid developed novel interfaces using NLG to allow researchers to interact with a digital repository. The project is now extending this work to produce a general “NLG service” working on semantic web data whose behaviour can be influenced by “policies” incorporating user preferences and imposed constraints from the environment and context of use.
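The idea of an NLG service over semantic web data whose behaviour is shaped by policies can be sketched as follows. The predicates, templates, and policy options here are invented for illustration; they are not the PolicyGrid service itself:

```python
# Minimal sketch (invented predicates and policy options, not the
# PolicyGrid service): verbalize RDF-style triples, letting a simple
# "policy" of user preferences influence which facts are expressed.

LEXICON = {
    "hasAuthor": "{s} was written by {o}",
    "usesMethod": "{s} uses the method {o}",
}

def verbalize(triples, policy):
    """Turn (subject, predicate, object) triples into sentences."""
    sentences = []
    for s, p, o in triples:
        if p in policy.get("suppress", set()):
            continue  # the policy can hide some predicates entirely
        template = LEXICON.get(p, "{s} " + p + " {o}")
        sentences.append(template.format(s=s, o=o) + ".")
    return " ".join(sentences)

triples = [
    ("Survey2008", "hasAuthor", "A. Smith"),
    ("Survey2008", "usesMethod", "structured interviews"),
]
text = verbalize(triples, policy={"suppress": {"usesMethod"}})
```

A production service would also aggregate related triples into single sentences and order them coherently; the policy mechanism would then influence those decisions as well as plain inclusion.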

Contact: Pete Edwards

Common Ground and Granularity of Referring Expressions

Dr Kees van Deemter is collaborating with Dr Raquel Fernandez (Amsterdam) and Dale Barr (Glasgow), with funding from the EURO-XPRAG: ESF Research Networking Programme.

EURO-XPRAG main website

What If?

We have a wealth of data about numerous aspects of our activities, yet these data are only of use when we know what they are, agree on what they mean, and understand how they relate to each other. Semantic descriptions of data, the means by which we can achieve these aims, are widely used to help exploit data in industry, academia and at home. One way of providing such meaning, or semantics, for data is through "ontologies", yet these ontologies can be hard to build, especially for the very people who are expert in the fields whose knowledge is being captured but who are not experienced in the specialised field of modelling.

In the "what if...?" project we look at the problems of creating ontologies using the Web Ontology Language (OWL). With OWL logical forms, computers can deduce knowledge that is only implied within the statements made by the modeller. So any statement made by a modeller can have a dramatic effect on what is implied. These implications can be both "good" and "bad" in terms of the aims of the modeller. Consequently, a modeller is always asking themselves "what if...?" questions as they model a field of interest. Such a question might be "what happens if I say that a planet must be orbiting a star?" or "what happens if I add in this date/time ontology?".

The aim of the "what if...?" project is to build a dialogue system allowing a person building an ontology to ask such questions and get meaningful answers. This requires getting the computer to determine what the consequences of a change in the ontology would be and getting it to present these consequences in a meaningful way. To do a good job, the system will have to understand something about what the person is trying to do and what sorts of results will be most interesting to them. For this, we need to understand more about how ontologists model a domain and interact with tools; be able to model the dialogues between a human and the authoring system; achieve responsive automated reasoning that can provide the dialogue system with the information it needs to create that dialogue.
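The kind of question the system must answer can be illustrated with a deliberately tiny example. This is not OWL reasoning — just a transitive closure over subclass axioms, with invented class names — but it shows the shape of a "what if...?" answer: the new entailments that appear when one axiom is added:

```python
# Toy illustration (not OWL; a tiny subclass-closure sketch with
# invented classes): compute which new subclass entailments appear
# when a modeller adds one axiom -- the "what if...?" question.

def closure(axioms):
    """Transitive closure of subclass axioms {(sub, super), ...}."""
    entailed = set(axioms)
    changed = True
    while changed:
        changed = False
        for a, b in list(entailed):
            for c, d in list(entailed):
                if b == c and (a, d) not in entailed:
                    entailed.add((a, d))
                    changed = True
    return entailed

before = {("Planet", "CelestialBody")}
new_axiom = ("CelestialBody", "PhysicalObject")

# The "what if" answer: entailments that are new and merely implied.
consequences = closure(before | {new_axiom}) - closure(before) - {new_axiom}
```

Here adding the new axiom implies that every Planet is also a PhysicalObject, even though the modeller never stated it. Presenting such consequences in meaningful natural language, and filtering them by what the modeller cares about, is the hard part the project addresses.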

Contact: Jeff Z. Pan

The WhatIf project is supported by the Engineering and Physical Sciences Research Council from 2012 to 2015 through grants EP/J014354/1 and EP/J014176/1.



Key Research Areas

There are four main research areas:

  • Understanding the process of ontology authoring
  • Natural dialogue systems and controlled natural languages
  • Incremental ontology reasoning
  • Reasoning enabled test-driven ontology authoring

Who We Are

University of Aberdeen

  • Chris Mellish
  • Jeff Z. Pan
  • Artemis Parvizi
  • Yuan Ren
  • Kees van Deemter

University of Manchester

  • Caroline Jay
  • Robert Stevens
  • Markel Vigo

Advisors

  • Richard Power, Open University
  • Mike Uschold, Semantic Arts Inc.
  • Peter Winstanley, Scottish Government

Documents, Presentations & Publications

Documents will be posted here in due course.


RefNet

RefNet is an EPSRC research network advancing collaboration between research communities that have tended to work separately, namely computer scientists, linguists and psychologists. The phenomenon on which the network focusses is reference.

Reference is the process of making sure that a user/receiver can identify an entity - for example a person, thing, place, or an event. Reference can be considered the "anchor" of communication. As such it is crucial for communication between people, and for many practical applications: from robotics and gaming to embodied agents, satellite navigation, and multimodal interfaces. Through the study of reference, RefNet will build a base of interdisciplinary skills and resources for research on communication.
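A standard computational treatment of reference is incremental attribute selection, in the spirit of textbook referring-expression generation algorithms. The sketch below uses invented domain objects and is illustrative only, not RefNet output:

```python
# Sketch of referring-expression generation by incremental attribute
# selection (a textbook approach; domain objects are invented): add
# properties until the target is the only object left.

def refer(target, distractors, preferred_attrs):
    """Pick attribute values that single out `target` from distractors."""
    chosen = {}
    remaining = list(distractors)
    for attr in preferred_attrs:
        value = target[attr]
        ruled_out = [d for d in remaining if d.get(attr) != value]
        if ruled_out:
            chosen[attr] = value  # this attribute does useful work
            remaining = [d for d in remaining if d.get(attr) == value]
        if not remaining:
            break  # target is now uniquely identified
    return chosen

target = {"type": "dog", "colour": "black", "size": "small"}
distractors = [
    {"type": "dog", "colour": "white", "size": "small"},
    {"type": "cat", "colour": "black", "size": "small"},
]
description = refer(target, distractors, ["type", "colour", "size"])
# description suffices for "the black dog": size is never needed.
```

Psycholinguistic evidence on how people actually choose attributes (and over- or under-specify) is exactly the kind of result the network aims to connect with such algorithms.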

Project Homepage: RefNet

Contact: Kees van Deemter


RefNet's objectives are:

  1. To promote high-quality interdisciplinary research, and research resources relating to reference, particularly involving computational linguistics and psycholinguistics.
  2. To find ways to improve practical applications in which reference plays a role.
  3. To build skills for the interdisciplinary study of language and communication.

To do this, RefNet organizes activities whose goals are networking, skywriting, consultation, training, and showcasing of research.

Previous Projects

Atlas

Textual Descriptions Access to Geo-referenced Statistical Data

Summary

A lot of data available to the public is geo-referenced. For example, census data is often aggregated over different levels of geographic regions such as counties and wards. Currently such data is presented to the public using thematic maps, such as the ones published by National Statistics showing data from the Census 2001.

Although such visual presentations of geo-referenced data work well for sighted users, they are inaccessible to visually impaired users. In particular, visually impaired users find it hard to perceive important trends and patterns in the underlying data which sighted users manage so effortlessly using visual maps. There are a number of emerging technologies to improve the accessibility of map data to visually impaired users, such as haptic maps and sonic maps.

In this project we apply Natural Language Generation (NLG) technology to automatically produce textual summaries of map data highlighting 'important' content extracted from the underlying spatial data. We hope that visually impaired users can use existing screen readers to listen to these textual summaries before exploring the data sets in detail using other access methods. We believe that textual summaries of spatial data could be useful to sighted users as well because multi-modal presentations (visual maps + textual summaries) often work better.
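The core step — picking out an 'important' pattern in geo-referenced values and stating it in screen-reader-friendly text — can be sketched as follows. The region data is invented and this is not the Atlas.txt system:

```python
# Hedged sketch (invented region data; not the Atlas.txt system): pick
# out one 'important' pattern in geo-referenced values -- the regions
# with the highest and lowest figures -- and state it in plain text
# suitable for a screen reader.

def summarise_regions(variable, values):
    """values: {region: number} -> one-sentence textual summary."""
    top = max(values, key=values.get)
    low = min(values, key=values.get)
    return (f"{variable} is highest in {top} ({values[top]}%) "
            f"and lowest in {low} ({values[low]}%).")

summary = summarise_regions(
    "Unemployment",
    {"Aberdeen City": 3.1, "Aberdeenshire": 2.4, "Moray": 4.0},
)
```

Real spatial summaries must also describe clusters and trends across neighbouring regions, which is where the project's NLG and spatial-analysis work lies.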

Objectives

  1. To develop NLG techniques for generating textual summaries of spatial data.
  2. To evaluate the utility of the textual summaries with visually impaired users, in collaboration with Grampian Society for the Blind.
  3. To evaluate the utility of the combination of textual summaries and visual maps, in collaboration with the HCI Lab, University of Maryland.

People

  1. Yaji Sripada
  2. Kavita Thomas

Publications

  1. Kavita E Thomas and Somayajulu Sripada (2010) Atlas.txt: Exploring Linguistic Grounding Techniques for Communicating Spatial Information to Blind Users, Universal Access in the Information Society. [ONLINE] DOI: 10.1007/s10209-010-0217-5 pdf
  2. Kavita E Thomas and Somayajulu Sripada (2008) What's in a message? Interpreting Geo-referenced Data for the Visually-impaired, Proceedings of the Int. Conference on NLG. pdf
  3. Kavita E Thomas, Livia Sumegi, Leo Ferres and Somayajulu Sripada (2008) Enabling Access to Geo-referenced Information: Atlas.txt, Proceedings of the Cross-disciplinary Conference on Web Accessibility. pdf
  4. Kavita E Thomas and Somayajulu Sripada (2007) Atlas.txt: Linking Geo-referenced Data to Text for NLG, Proceedings of the ENLG07 Workshop. pdf

Background

This project is part of our ongoing work on developing technology for automatically producing textual summaries of numerical data. Our work on summarising time series data as part of the SumTime project has led to the development of SumTime-Mousam, an NLG system that was deployed in industry to generate marine weather forecasts (for the offshore oil industry) from numerical weather prediction (NWP) data. As part of RoadSafe, we are currently extending this technology to generate weather forecasts for winter road maintenance applications. We are also working on summarising scuba dive computer data in the ScubaText project and clinical data from neonatal intensive care units in the BabyTalk project.

Grampian Society for the Blind

Grampian Society for the Blind is a charity providing advice and support to people with visual impairments in the North-East of Scotland. In the current project we work closely with their members to understand their requirements and to evaluate our technology.

Funded by EPSRC

BabyTalk

BabyTalk is investigating ways of summarising and presenting patient information to medical professionals and family members. Our focus is on data in the Neonatal Intensive Care Unit.

This involves the use of Intelligent Signal Processing to analyse and interpret the available information about the patient, and Natural Language Generation techniques to generate coherent, readable summaries of this information in English.

Our ultimate aim is to use this technology to provide decision support to medical professionals, who base treatment on large amounts of information. Summaries will also help to keep family members informed about the condition of their baby.

Joking Computer

The Joking Computer project ran from February 2009 until October 2010, with funding from the EPSRC (Grant EP/G020280/1), under the Partnerships for Public Engagement scheme. It was based in the Natural Language Generation Group within Computing Science at the University of Aberdeen, with collaboration from the Glasgow Science Centre and Satrosphere.

The aim was to create an interactive exhibit, for use in science centres, based on the software built by the STANDUP project, and was related to the NLG Group's EPSRC-supported work on Affecting People with Natural Language. A further outcome was a website aimed at the general public, to provide information about this type of research in a very accessible way.

The staff involved at Aberdeen were:

Publicity about this project

Media Coverage

December 2009

NEONATE

The NEONATE project has three major objectives:

  • to investigate, on a systematic basis, a comprehensive range of actions taken in the Neonatal Intensive Care Unit
  • to identify the terms used to describe patient state by staff at different levels and types of expertise
  • to use the results of these investigations to implement and evaluate computerised aids designed to support clinical decision making

Making more data available to decision makers does not necessarily of itself lead to improved care. This has been demonstrated in the neonatal intensive care unit where providing nurses and junior doctors with detailed trends of physiological information does not lead to improved patient outcomes. Our earlier studies (COGNATE project) have shown that a major reason for this finding is that the staff caring for the infants observe them closely and frequently to obtain more information than just the data shown on the monitors.

Presenting Ontologies in Natural Language

Chris Mellish and Xiantang Sun, supported by EPSRC grant GR/S62932.

  • 2004 Project poster
  • 2005 Project poster
  • Mellish, C. and Sun, X., "Natural Language Directed Inference in the Presentation of Ontologies", Procs of the 10th European Workshop on Natural Language Generation, Aberdeen, 2005. PDF version
  • Mellish, C. and Sun, X., "The Semantic Web as a Linguistic Resource: Opportunities for Natural Language Generation". Presented at the Twenty-sixth SGAI International Conference on Innovative Techniques and Applications of Artificial Intelligence, Cambridge, 2005. Also in Knowledge Based Systems Vol 19, pp298-303, 2006. PDF version
  • Pan, J. and Mellish, C., "Supporting Semi-Automatic Semantic Annotation of Multimedia Resources". Presented at the special session on "Semantics in Multimedia Analysis and Natural Language Processing" at the 3rd IFIP Conference on Artificial Intelligence Applications & Innovations (AIAI), Athens, 2006 PDF version
  • Mellish, C. and Pan, J., "Finding Subsumers for Natural Language Presentation". Presented at the DL2006 International Workshop on Description Logics, Windermere, England, 2006. PDF version
  • Sun, X. and Mellish, C., "Domain Independent Sentence Generation from RDF Representations for the Semantic Web". Presented at the ECAI06 Combined Workshop on Language-Enabled Educational Technology and Development and Evaluation of Robust Spoken Dialogue Systems, Riva del Garda, Italy, 2006. PDF version
  • Sun, X. and Mellish, C., "An Experiment on `free' Generation from Single RDF Triples". Presented at the European Workshop on Natural Language Generation, Dagstuhl, Germany, 2007. PDF version
  • Mellish, C. and Pan, J., "Natural Language Directed Inference from Ontologies". Artificial Intelligence 172(10): 1285-1315 (2008). PDF version
  • Prolog code for generating subsumers of ontology concepts for natural language presentation

Related papers:

  • Hielkema, F., Edwards, P. and Mellish, C., "Flexible Natural Language Access to Community-Driven Metadata". Submitted for publication, 2007. PDF version
ROADSAFE

RoadSafe was a collaborative project between the Computing Science Department at the University of Aberdeen and Aerospace & Marine International. It aimed to combine Aerospace & Marine International's expertise in weather forecasting with the department's expertise in building real-world Natural Language Generation systems.

The RoadSafe project:

  • used Knowledge Acquisition techniques to understand how humans write textual instructions for road maintenance vehicle routing
  • produced a system capable of automatically evaluating a region's geographical data, combined with the weather forecast for tens of thousands of points in that region, to provide textual routing and de-icer spread rate instructions
  • utilised Aerospace & Marine International's expert forecasters to post-edit generated advisory texts and thereby improve the performance of the system

The main objective of the project was for the advisory texts produced by RoadSafe to guide local councils' gritting and salting operations during the winter.
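Purely as an illustration of the kind of mapping involved, a minimal Java sketch follows. It is hypothetical: RoadSafe's actual rules came from knowledge acquisition with expert forecasters, and the temperature thresholds and spread rates below are invented.

```java
// Hypothetical sketch only: the thresholds and spread rates are invented,
// not RoadSafe's expert-derived rules.
class SpreadRateSketch {

    // Map a forecast minimum road-surface temperature (degrees Celsius)
    // to a de-icer spread-rate advisory string.
    static String advisory(double minRoadTempC) {
        if (minRoadTempC > 1.0)  return "no treatment required";
        if (minRoadTempC > -2.0) return "salt at 10 g/m2";
        if (minRoadTempC > -5.0) return "salt at 20 g/m2";
        return "salt at 40 g/m2 and monitor";
    }
}
```

In the real system a rule like this would be applied per route segment, over forecasts for many thousands of points, before being summarised into text.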

People

External Collaborator

  • Ian Davy, Aerospace & Marine International

Publications

Publicity

Demos

SCUBATEXT: Generating Textual Reports of Scuba Dive Computer Data

SCUBA divers carry out decompression stops while ascending to the surface to allow their bodies to eliminate excess nitrogen naturally. Divers can also be decompressed in decompression chambers to remove excess nitrogen. Over the years, dive tables have been used to provide guideline information about the decompression times required during the ascent of a dive, and about the rest times required between two successive dives. When used faithfully, these tables help in planning safe dives and avoiding 'the bends'.

One modern item of diving gear is the dive computer: a gadget worn on the diver's wrist (it looks more like a wristwatch than a computer) that continually monitors a dive. A dive computer continuously records data such as depth and ambient temperature. It can also generate a dive table on the fly and compare the recorded data against the table data to inform divers about required decompression stops. Dive computers therefore help divers stay continually informed and dive safely.

Dive computers record dive logs which contain time series of dive depth and tissue saturation. These data sets can be useful to:

  • clinicians - to diagnose decompression illness
  • diving instructors - to evaluate learners' dives and to provide feedback
  • dive supervisors - to monitor dives

In this project we develop techniques to produce textual (English) reports of dive data recorded by dive computers. The computer-generated report will contain the following information:

  • Issues across multiple dive profiles, such as:
    • rapid ascent incidents
    • necessary and unnecessary stops
  • Unsafe dive profiles with special patterns, such as:
    • square
    • saw-tooth
    • reverse
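One of the simplest analyses listed above, detecting rapid-ascent incidents, can be sketched as follows. This is a toy illustration, not the project's code; the sampling interval and the ascent-rate limit are parameters whose values here are arbitrary examples.

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of rapid-ascent detection from a dive-computer depth log;
// not the SCUBATEXT implementation.
class AscentCheck {

    // depths[i] is the recorded depth in metres at t = i * sampleSeconds.
    // Returns the indices of samples where the ascent rate between
    // consecutive samples exceeds maxMetresPerMinute.
    static List<Integer> rapidAscents(double[] depths, int sampleSeconds,
                                      double maxMetresPerMinute) {
        List<Integer> incidents = new ArrayList<>();
        for (int i = 1; i < depths.length; i++) {
            double metresPerMinute =
                (depths[i - 1] - depths[i]) * 60.0 / sampleSeconds;
            if (metresPerMinute > maxMetresPerMinute) {
                incidents.add(i);
            }
        }
        return incidents;
    }
}
```

A report generator would then turn the detected incident indices back into times and phrase them as English sentences.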
SkillSum

SkillSum developed an automatic assessment and reporting tool for adult basic skills (literacy and numeracy). The tool was a web-based system that allowed students newly enrolling at a college to take a basic skills assessment as part of the normal enrolment process.

When the test was completed, the tool produced a report for the user describing his or her skill level and whether this was adequate for the course about to be taken, and suggesting actions he or she could take to improve basic skills.
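The reporting step might be caricatured as follows. This sketch is entirely hypothetical: the single-threshold model and the wording are invented, not SkillSum's actual assessment or text-generation logic.

```java
// Hypothetical sketch: the threshold model and wording are invented,
// not SkillSum's actual logic.
class SkillSumSketch {

    // Produce a one-sentence report from an assessment score and the
    // (assumed) minimum score needed for the student's chosen course.
    static String report(int score, int courseThreshold) {
        if (score >= courseThreshold) {
            return "Your skills are at the level needed for your course.";
        }
        return "Your course needs a little more than you showed today; "
             + "a short brush-up class could help.";
    }
}
```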

STANDUP

The STANDUP project (System To Augment Non-speakers' Dialogue Using Puns) was a collaborative project between the School of Computing at the University of Dundee, the School of Informatics at the University of Edinburgh, and the Department of Computing Science at the University of Aberdeen, funded by EPSRC (the Engineering and Physical Sciences Research Council). The project began in October 2003 and ran until March 2007.

Overview

We have explored how humour may be used to help non-speaking children learn to use language more effectively. There is evidence to suggest that language play, including using puns and other jokes, has a beneficial effect on a child's developing language and communication skills.

Children with communication impairments are often reliant on augmented communication aids in order to carry on conversations, but these aids give little scope for generating novel language. This inhibits experimentation with language and limits the trying out of humorous ideas, which can in turn have a stultifying effect on language development.

We have begun to address this deficiency in the language environment of the non-speaking child by providing a software tool which promotes humorous language play.

Starting from our previous research on the automated generation of punning riddles, we have designed and implemented a program which allows the user to experiment with the construction of simple jokes. The user interface of this system has been specially designed to be accessible to children with communication and physical disabilities. We have carried out tests of the usability and appropriateness of the system by observing and evaluating the use of our software by children.

STANDUP is no longer an active project.

People

University of Aberdeen

University of Dundee

University of Edinburgh

Events & Progress
  • 29 July-5 August 2006 Dr. Annalu Waller and Mr. Rolf Black attended the ISAAC 2006 conference in Düsseldorf, Germany, where they presented two papers.
  • 3-6 July 2006 Dr. Graeme Ritchie attended the 18th International ISHS Humor Conference, which took place at the Danish University of Education (Danmarks Paedagogiske Universitet, DPU) in Copenhagen. Dr. Ritchie gave a presentation on STANDUP, including a demonstration of the software.
  • Summer 2006 The STANDUP software is being evaluated at Capability Scotland's Corseford School, Renfrewshire.
  • 24-26 May 2006 Dr. Ruli Manurung attended LREC 2006 in Genoa, presenting a poster and paper titled "Building a Lexical Database for an Interactive Joke-Generator".
  • 22-27 July 2005 Dr. Dave O'Mara attended HCII 2005 in Las Vegas, presenting a paper titled "Facilitating user feedback in the design of a novel joke generation system for people with severe communication impairment".
  • 28 April 2005 Dr. Dave O'Mara and Dr. Ruli Manurung were invited to speak at the AAC Special Interest Group Seminar in Perth. The talk presented some background to the STANDUP project, and concentrated on the recently developed mockup implementation and the early evaluation studies conducted.
  • April 2005 A complete working "mockup" for the first version of the STANDUP software has been implemented. It is intended to test the design and usability of the user interface. Dr. Dave O'Mara and Dr. Ruli Manurung will shortly be carrying out initial evaluation studies with domain experts and end users across Scotland.
  • February 2005 The first version of the STANDUP "backend", i.e. joke generation component, has been implemented. At the core of this component is an SQL database containing most of the lexical resources needed for joke generation. This database is generated by combining existing freely available lexical resources such as the Unisyn phonetic dictionary and WordNet, a widely used lexical semantics database.
  • 21 October 2004: Dr. Dave O'Mara was invited to be the guest speaker by the Dundee section of the British Federation of Women Graduates. The talk lasted around an hour and fifteen minutes and presented the background to why we think humour and joke generation can help children with language impairment improve their language and communication skills.
  • 6-10 October 2004: Dr. Dave O'Mara presented a paper titled "The Role of Assisted Communicators as Domain Experts in Early Software Design" at the ISAAC 2004 conference in Natal, Brazil.
  • January 2004: Abstracts submitted to ISHS 2004 and ISAAC 2004 conferences.
  • October 2003: Project commences. Research assistants Dave O'Mara and Ruli Manurung take up posts.

Workshop on Language Play and Computers (August 2006)

The STANDUP Project hosted a two-day Workshop on Language Play and Computers at the University of Dundee on 25-26 August 2006. This workshop presented a variety of recent research into developing software which gives children the opportunity for language play through interaction with a computer.

There were talks by researchers who have developed language play software, including the local creators of the STANDUP program which allows children to create their own puns. There was also time for hands-on experience of some examples of this type of software, under the supervision of its designers.

The event was aimed both at practitioners in education, including teachers and speech/language therapists, and at researchers in education, language, humour, and (especially) combinations of these.

The presentation slides are available for download below.

Programme Outline

Day 1 (Friday)

Chair: Graeme Ritchie, University of Aberdeen

09.00 - 09.30: Registration (Coffee)
09.30 - 09.45: Welcome and introduction to workshop
09.45 - 10.45: Judy Robertson (Glasgow Caledonian University, now Heriot-Watt University)
StoryStation: intelligent feedback on story writing
10.45 - 11.15: Refreshment break
11.15 - 12.15: Nicola Yuill (University of Sussex)
The Laughing PC: Using Jokes in Software to Improve Children's Reading Comprehension
12.15 - 13.15: Lunch
13.15 - 14.15: Annalu Waller (University of Dundee)
"I want to tell you a joke. Are you ready?": an introduction to the STANDUP Project
14.15 - 14.45: Refreshment break
14.45 - 16.00: Hands-on practice with software
16.00 - 16.30: Feedback and discussion session

Day 2 (Saturday)

Chair: Annalu Waller, University of Dundee

09.00 - 09.30: Registration (Coffee)
09.30 - 09.45: Welcome and introduction to 2nd day
09.45 - 10.45: Judy Robertson (Glasgow Caledonian University, now Heriot-Watt University)
Developing young people's storytelling skills through computer game design
10.45 - 11.15: Refreshment break
11.15 - 12.15: Helen Pain (University of Edinburgh)
Joke generation by children with complex communication needs: approaches to evaluation and findings in the STANDUP project
12.15 - 13.15: Lunch
13.15 - 14.15: Lisa Gjedde (Danish University of Education)
Storytelling, play and learning in an augmented interactive environment
14.15 - 14.45: Refreshment break
14.45 - 16.00: Hands-on practice with software
16.00 - 16.30: Feedback and discussion session

Content of talks

StoryStation: intelligent feedback on story writing

(Judy Robertson)
Download presentation slides here (560kb).

StoryStation is an intelligent tutoring system which gives children feedback as they write stories. The software is intended for children aged ten years and above who have a basic competency in writing, but would benefit from further help. The system provides assistance with spelling, vocabulary usage and characterization techniques, as well as tools such as word banks, a dictionary and a thesaurus. The main pedagogical philosophy behind StoryStation is to identify and praise the pupils' skills as a strategy to help them evaluate and appreciate their own work. Feedback is generated by comparing the skills a pupil has used in his or her current story with skills demonstrated in previous stories. If the pupil has not used the system before, his or her mastery of individual skills is compared to norms derived from stories previously written by other pupils of the same ability level. The feedback is presented via animated characters. This talk will describe the learner-centred development process of StoryStation, which involved extensive consultation with teachers and pupils, and will present some initial findings from a field study in an Edinburgh school.
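The praise-by-comparison strategy described in the abstract might be sketched like this. Everything here is hypothetical (the skill names, counts and wording are invented, and this is not StoryStation's implementation):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of praise-by-comparison; not StoryStation's code.
class SkillFeedbackSketch {

    // Praise each skill used more often in the current story than in the
    // pupil's previous best; for a new pupil, the second map would hold
    // norms for other pupils of the same ability level instead.
    static List<String> praise(Map<String, Integer> currentCounts,
                               Map<String, Integer> previousBest) {
        List<String> messages = new ArrayList<>();
        for (Map.Entry<String, Integer> e : currentCounts.entrySet()) {
            int before = previousBest.getOrDefault(e.getKey(), 0);
            if (e.getValue() > before) {
                messages.add("Well done: more " + e.getKey() + " than before!");
            }
        }
        return messages;
    }
}
```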

The Laughing PC: Using Jokes in Software to Improve Children's Reading Comprehension

(Nicola Yuill)
Download presentation slides here (218kb).

In this presentation I will show two pieces of technology to support language play and reading comprehension. First I describe 'Joke City', a piece of software that supports children's discussion of ambiguous language in jokes, which has been shown to improve their comprehension. Associated with this software is a suite of literacy assessment tools developed with Brighton & Hove local authority, some of which can be self-administered by children and automatically scored, using a school ICT suite. Second, I present WordCat, a piece of software we have developed that helps children classify words simultaneously by their spelling patterns and meanings. This makes use of SCOSS, a generally-applicable software interface that helps children work collaboratively.

"I want to tell you a joke. Are you ready?": an introduction to the STANDUP Project

(Annalu Waller)
Download presentation slides here (1.37mb).

The STANDUP project has developed interactive software which allows children with complex communication needs (CCN) to generate novel punning riddles. Typically developing children enjoy jokes and riddles, which offer an opportunity to practise language, conversation and social interaction skills during childhood. CCN restricts the opportunities to play with language, and this in turn restricts the development of linguistic, communicative and social skills. Children with CCN do access pre-stored humour using existing AAC devices; however, independent access to novel language is difficult. The STANDUP project has addressed this problem by designing interactive software which allows a child to generate and tell novel puns. This is done using information about concepts, words, their relationships to each other, and additional details such as rhyme. The user interface is appropriate for users with physical and language impairments, and allows different levels of complexity (of vocabulary, joke structure, etc.). For example, at the simplest level, requesting 'any joke' might result in a joke such as: "What do you call a spicy missile? -- A hot shot." At a more complex level, the user may start by choosing a topic word. This talk discusses the role of humour in the development of language skills, and introduces the audience to the techniques employed to involve therapists, teachers and adults who use aided communication in the design of a software language playground for children with CCN.
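The mechanism behind the "hot shot" example can be caricatured with a toy lexicon: take a familiar two-word phrase, replace each word with a synonym, and phrase the result as a riddle. The two-entry synonym table below is invented for illustration; the real system draws on WordNet and Unisyn and applies many more constraints (phonetic similarity, familiarity, joke schemas).

```java
import java.util.Map;

// Toy caricature of punning-riddle construction; not the STANDUP code.
class PunSketch {

    // A tiny invented synonym table; STANDUP uses WordNet-derived data.
    static final Map<String, String> SYNONYM = Map.of(
        "hot", "spicy",
        "shot", "missile");

    // Build a riddle from a two-word phrase whose parts both have
    // synonyms in the table; returns null otherwise.
    static String riddle(String first, String second) {
        String s1 = SYNONYM.get(first);
        String s2 = SYNONYM.get(second);
        if (s1 == null || s2 == null) return null;
        return "What do you call a " + s1 + " " + s2 + "? A "
             + first + " " + second + ".";
    }
}
```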

Developing young people's storytelling skills through computer game design

(Judy Robertson)
Download presentation slides here (72kb).

In this talk I will discuss the development of young people's interactive storytelling skills within a game creation environment. I will describe the Gamemaker workshop series, in which young people learn to use a computer game authoring tool called Neverwinter Nights to develop interactive, branching stories. Given this new medium for narrative expression, there is need to better understand the ways in which the young people choose to express their story ideas, in order to provide further support and scaffolding for their developing narrative skills. In particular, I will discuss the plot themes they chose to explore, and the ways in which they included interactivity in the story, particularly through dialogue.

Joke generation by children with complex communication needs: approaches to evaluation and findings in the STANDUP project

(Helen Pain)
Download presentation slides here (2.21mb).

The STANDUP project has developed interactive software which allows children with complex communication needs (CCN) to generate novel punning riddles. The project takes a user-centred design approach, with formative evaluation throughout the design process with targeted feedback provided by both Speech and Language Therapists familiar with the children in the target population, and by adults who were previously in this group. We discuss how this feedback influenced the design process. We also consider methodological issues of this approach for this group of users. We will describe the evaluation study carried out in an independent special school, using a multiple single-participant case study methodology with 9 children with CCN. Initial piloting was carried out with 10 typically developing children (TDC) to identify any problems with the study design. Further evaluation is planned with TDC. Outcomes of the evaluation study will be discussed.

Storytelling, play and learning in an augmented interactive environment

(Lisa Gjedde)
Download presentation slides here (1.3mb).

Story-based learning in an augmented interactive environment may offer learners with multiple functional deficits a new way of learning and communicating. By using storytelling and games it is possible to create an interactive learning environment that enables non-speaking learners to communicate their sense of identity in more complex ways than is usually possible. This talk will present some of the findings of a research and development project that resulted in the design of a story-based multimedia program about life in medieval times, "A Medieval Tale". The program is augmented for severely challenged learners. Based on cases and examples of use by this target group, there will be a discussion of the potential of this type of program for learning and communication.

This is a multimedia narrative learning resource about medieval life, with augmentative functions for learners with multiple functional deficits. The program uses a narrative framework to address learners of different abilities, making it an inclusive tool for learning and experiencing the world of medieval fiction and culture. The program is the first inclusive resource to span a wide target group through the use of storytelling and interactive games in an augmented computer environment.

Publications

2009

  • Waller, A., Black, R., O'Mara, D.A., Pain, H., Ritchie, G., Manurung, R. (2009) Evaluating the STANDUP Pun Generating Software with Children with Cerebral Palsy. ACM Transactions on Accessible Computing (TACCESS) Volume 1, Issue 3 (February 2009) Article No. 16. Copy on publisher's site

2008

  • Manurung, R., Ritchie, G., Pain, H., Waller, A., O'Mara, D., Black, R. (2008) The construction of a pun generator for language skills development. Applied Artificial Intelligence, 22(9), pp. 841-869. (PDF copy)
  • Manurung, R., Ritchie, G., Pain, H., Waller, A., O'Mara, D., Black, R. (2008) Adding phonetic similarity data to a lexical database. Language Resources and Evaluation, 42(3), pp. 319-324. (PDF copy) (Copy on publisher's site)

2007

  • Ritchie, G., Manurung, R., Pain, H., Waller, A., Black, R. and O'Mara, D. (2007) A practical application of computational humour. Pp. 91-98 in Proceedings of the 4th International Joint Conference on Computational Creativity, ed. Amilcar Cardoso and Geraint A. Wiggins. London. (PDF)
  • Black, R., Waller, A., Ritchie, G., Pain, H., Manurung, R. (2007) Evaluation of Joke-Creation Software with Children with Complex Communication Needs. Communication Matters 21 (1), pp. 23-28.

2006

  • Manurung, R., Ritchie, G., O'Mara, D., Waller, A., Pain, H. (2006) Combining lexical resources for an interactive language tool. In Proceedings of ISAAC 2006, the 12th Biennial International Conference of the International Society for Augmentative and Alternative Communication (CD), Düsseldorf, Germany, 29 July - 5 August 2006. (PDF)
  • O'Mara, D., Waller, A., Manurung, R., Ritchie, G., Pain, H., Black, R. (2006) Designing and evaluating joke-building software for AAC users. In Proceedings of ISAAC 2006, the 12th Biennial International Conference of the International Society for Augmentative and Alternative Communication (CD), Düsseldorf, Germany, 29 July - 5 August 2006. (PDF)
  • Manurung, R., O'Mara, D., Pain, H., Ritchie, G., Waller, A. (2006) Building a lexical database for an interactive joke-generator. In Proceedings of LREC 2006, the Fifth International Conference on Language Resources and Evaluation (CD), Genoa, Italy, 24-26 May 2006. (PDF). Poster downloadable here.
  • Ritchie, G., Manurung, R., Pain, H., Waller, A., O'Mara, D. (2006) The STANDUP Interactive Riddle Builder. IEEE Intelligent Systems 21 (2), March/April, pp. 67-69. (PDF)

2005

  • Waller, A., O'Mara, D., Manurung, R., Pain, H., and Ritchie, G. (2005) Facilitating user feedback in the design of a novel joke generation system for people with severe communication impairment. In Proceedings of HCII 2005 (CD), Vol. 5, G. Salvendy (Ed). Lawrence Erlbaum, NJ, USA. (PDF)

2004

  • O'Mara, D., Waller, A., Manurung, R., Ritchie, G., Pain, H. (2004) I say, I say, I say... Australian Group on Severe Communication Impairment News, Vol. 23, No. 2. ISSN: 1443-9107
  • Manurung, R., Low, A., Trujillo-Dennis, L., O'Mara, D., Pain, H., Ritchie, G. and Waller, A. (2004) Interactive computer generation of jokes for language skill development. Presented at 2004 Conference of International Society for Humor Studies, Dijon, France. (PDF). Slides from presentation available here.
  • O'Mara, D., Waller, A., Ritchie, G., Pain, H. and Manurung, R. (2004) The role of assisted communicators as domain experts in early software design. In Proceedings of ISAAC 2004, the 11th Biennial International Conference of the International Society for Augmentative and Alternative Communication (CD), Natal, Brazil, 6-10 October 2004. (PDF).

Background to STANDUP

  • O'Mara, D., Waller, A. and Todman, J. (2004) The recognition and use of verbal humour by children with language impairment. Presented as Emerging Scholar, 2004 Conference of International Society for Humor Studies, Dijon, France. (PDF).
  • O'Mara, D. and Waller, A. (2003). What do you get when you cross a communication aid with a riddle? The Psychologist 16(2), pp. 78-80. ISSN 0952-8229. (PDF)
  • Binsted, K., Pain, H. and Ritchie, G. (1997). Children's evaluation of computer-generated punning riddles. Pragmatics and Cognition 5(2): 305-354. (PDF, PS)
  • Binsted, K. (1996). Machine humour: An implemented model of puns. PhD thesis, University of Edinburgh, Edinburgh, Scotland. (PDF)
Publicity

This page gives details of some of the coverage of STANDUP in the wider world.

Links

 

Software

About

A number of software and data files are available from the STANDUP project. None of them is supported in any way, and all are supplied without guarantees.

The original STANDUP software

  • GO TO STANDUP 1 DOWNLOAD - see below

Newer variant - STANDUP 2 (The Joking Computer)

In December 2010, a different version of the software (STANDUP 2) was made available for download, as a result of The Joking Computer project. It has slightly different facilities from the original STANDUP software (now known as "STANDUP 1"), and is not simply an extended version. The download and installation procedure is very similar.

  • GO TO STANDUP 2 (JOKING COMPUTER) DOWNLOAD - www.abdn.ac.uk/jokingcomputer/JC_Download.shtml

Other Resources

Various programs and data produced on the STANDUP project are also available from the Resources panel below.

Resources

From this page, you can download various software, data and documentation files produced by the STANDUP and Joking Computer projects (and related to the main STANDUP system).

STANDUP 1 Files

STANDUP 1 Program (source code) and documentation

This is the STANDUP 1 Java program in full:

This is the documentation for the above program:

STANDUP 2 (Joking Computer) Files

STANDUP 2 Program (source code)

This is the STANDUP 2 Java program:

There is no complete and up to date documentation for the STANDUP 2 Java code, although much of the STANDUP 1 documentation is still applicable.

Java APIs

STANDUP 1 and STANDUP 2 differ only in the user interface facilities. They have the same backend (dictionary and joke-building mechanisms). The following APIs were constructed for STANDUP 1, but should also work with STANDUP 2.

If you want to build a Java program which uses the facilities of the STANDUP Joke Generator (without the STANDUP User Interface), then here are the files which you need for the relevant Java API (class definitions).

  • The Joke Generator API

If you want to build a Java program which uses the facilities of the STANDUP Lexical Database (without the STANDUP User Interface and without the STANDUP Joke Generator), then here are the files which you need for the relevant Java API (class definitions).

  • The Lexicon API

The Databases, and the Database Construction Kit

The files here will enable you to modify the lexical database in various ways (if you have the technical expertise and stamina):

  • The Database Files and Construction Kit - homepages.abdn.ac.uk/g.ritchie/pages/standup/downloads/resources/database/

Documents

Riddle Generation API

(Last update to software/data: 31 March 2007; this page last edited: 7 June 2007)

This page contains information on the STANDUP riddle generation API. It is a Java .jar library that provides access to the joke generation functionality of the STANDUP system.

Download

The files related to the API are as follows:

Documentation

The main package relating to joke generation is the standup.joke package. See the package summary documentation here, and in particular the package description here, for a detailed overview of the various classes and interfaces.

Contents of .jar

The standup_jokegen_v1.4.1.jar file contains Java bytecode for the following packages (in some cases, subsets of the full packages, i.e. classes that are required for joke generation):

  • standup.authoring
  • standup.authoring.dbbuild
  • standup.authoring.familiarityscoring
  • standup.authoring.jokebuilder
  • standup.authoring.wordsets
  • standup.joke
  • standup.lexicon
  • standup.profiling
  • standup.sql
  • standup.symbol
  • standup.unify
  • standup.utils
  • standup.xml

Additionally, under standup/resources/, it contains various data files required for joke generation.

Additional Files

Although not formally required, the STANDUP joke generation API is designed to work together with the STANDUP SQL lexical database.

Certain functionality requires supplementary .jar files on the Java classpath, which can be downloaded from their respective webpages; e.g.:

Example Usage

The TestJokeGen.java file is an example program that demonstrates how to use the joke generation API. To compile and run it on Windows, do the following (on Unix-like systems, use 'javac' rather than 'javac.exe', and ':' rather than ';' as the classpath separator):

javac.exe -cp standup_jokegen_v1.4.1.jar TestJokeGen.java
java -Xms384m -Xmx384m -cp .;standup_jokegen_v1.4.1.jar TestJokeGen
Lexical Database API

(Last software/data update: 31 March 2007; this page last edited: 7 June 2007)

This page contains information on the STANDUP lexical database API. It is a Java .jar library that provides access to the lexical database functionality of the STANDUP system.

Download:

The files related to the API are as follows:

Documentation:

The main package relating to the lexical database is the standup.lexicon package.
See the package summary documentation here, and in particular the package description, for a detailed overview of the various classes and interfaces.

Contents of .jar:

The standup_lexicon_v1.4.1.jar file contains Java bytecode for the following packages (in some cases, subsets of the full packages, i.e. classes that are required for lexicon manipulation):

  • standup.authoring
  • standup.authoring.dbbuild
  • standup.authoring.familiarityscoring
  • standup.authoring.wordsets
  • standup.lexicon
  • standup.profiling
  • standup.sql
  • standup.symbol
  • standup.unify
  • standup.utils
  • standup.xml

Additionally, under standup/resources/, it contains various data files required for lexical access.

Additional files:

Although not formally required, the STANDUP lexical database API is designed to work together with the STANDUP SQL lexical database.

Certain functionality requires supplementary .jar files on the Java classpath, e.g.:

You can download the latest versions from their respective webpages, or you can obtain copies from here.

Example usage:

The TestLexicon.java file is an example program that demonstrates how to use the lexical database API. To compile and run it on Windows, do the following (on Unix-like systems, use 'javac' rather than 'javac.exe', and ':' rather than ';' as the classpath separator):

javac.exe -cp standup_lexicon_v1.4.1.jar TestLexicon.java
java -Xms384m -Xmx384m -cp .;standup_lexicon_v1.4.1.jar TestLexicon
SQL Lexical Database

(Last update to software/data: 28 November 2006; this page last edited: 7 June 2007)

This page contains information on the STANDUP SQL lexical database. It is a lexicon that has the following features:

  • Integrates semantic, orthographic, and phonetic information from various lexical resources such as WordNet and Unisyn.
  • Maps specific (WordNet-based) wordsenses to AAC/literacy symbols such as Widgit Rebus and Mayer-Johnson PCS.
  • Associates wordsenses with familiarity scores, a measure of how "familiar" a word is. Several lexical resources contribute to this measure, among them the MRC Psycholinguistic Database, the British National Corpus, and SemCor.
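How evidence from several resources might be combined into one familiarity score can be sketched as a weighted average. This is an assumption for illustration only; the actual STANDUP scoring formula is not reproduced here, and the resource names and weights below are invented.

```java
import java.util.Map;

// Hypothetical sketch: a weighted combination of per-resource evidence,
// not STANDUP's actual familiarity formula.
class FamiliaritySketch {

    // Combine per-resource scores (each assumed already normalised to
    // the range 0..1) using the given per-resource weights.
    static double familiarity(Map<String, Double> resourceScores,
                              Map<String, Double> weights) {
        double total = 0.0, weightSum = 0.0;
        for (Map.Entry<String, Double> e : resourceScores.entrySet()) {
            double w = weights.getOrDefault(e.getKey(), 0.0);
            total += w * e.getValue();
            weightSum += w;
        }
        return weightSum == 0.0 ? 0.0 : total / weightSum;
    }
}
```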

It is implemented as an SQL relational database using the PostgreSQL database server.

The STANDUP lexical database was created to support the STANDUP interactive riddle generator system, but could plausibly be used in other general-purpose applications. Accordingly, we have created two instances of the database: the _joke variant and the _lex variant.

Database instances: _joke vs. _lex

We provide two instances of the STANDUP lexical database: one intended to support joke generation, and one for general-purpose lexical use.

Joke-generation database

  • Contains a subset of the lexicon, i.e. only wordsenses with a familiarity score > 0. Total lexeme count: 45506.
  • Contains cached schema instantiations for joke generation.
  • Database size: 3.32GB

General-purpose lexical database

  • Contains the full lexicon, i.e. includes wordsenses with familiarity score = 0. Total lexeme count: 130263.
  • Does not contain any joke generation-specific information.
  • Database size: 7.52GB

Database construction kit

Aside from the two instances of databases described above, we also provide a "database construction kit" that enables the creation of a customized version of the STANDUP lexical database. It consists of a collection of SQL scripts and various supplementary data files used by the scripts. Instructions on how to use this kit are detailed below .

Download:

The files related to the lexical database are as follows:

Installing PostgreSQL

  1. Download PostgreSQL and extract it to a temporary location on your hard drive. Double-click the extracted postgresql-8.1.msi file to begin the installation process.
  2. Leave the selected language as English and click "Start". Click "Next" twice.
  3. You should then see the "Installation options" screen. The default behaviour of PostgreSQL is to install itself under C:\Program Files\PostgreSQL\8.1\ -- if this presents a problem, you can change it here by clicking the 'Browse' button.
  4. Click "Next". At the next screen, you can just leave all the default settings as is. Just make sure that "Install as a service" is checked.
  5. You can enter any password you want here, but if you just leave it blank, one will be randomly generated for you. This is the password for the Windows account that will run the service, not the database superuser account (that comes later).
  6. Click Next. If it asks for confirmation whether to create the account, click Yes.
  7. You should then see the "Initialise database cluster" screen.
    Set locale to "English, United Kingdom".
    Set encoding to "UTF-8".
    Set superuser name to "postgres".
    Set password to "pgsuper!" (without the quotation marks).
    Reconfirm password: "pgsuper!".
  8. Click Next. You should then see the "Enable procedural languages" screen.
  9. Make sure "PL/pgsql" is checked and click Next. You should then see the "Enable contrib modules" screen. Leave things as is and click Next.
  10. Click Next again. This should begin the installation. It might take a few minutes.
  11. Click Finish.

Restoring an existing database

Now that PostgreSQL is installed, we need to load (in Postgres parlance, "restore") the standup_v1.4 database.

First, download either standup_v1.4_061127_joke.backup or standup_v1.4_061127_lex.backup to a temporary location on your hard drive.

There are two ways to restore the database: by entering commands at a DOS command prompt, or by using pgAdmin III, the PostgreSQL administration GUI tool. Both accomplish the same thing, so it comes down to your preference:

The command-line way

  1. Open a DOS command prompt. You can do this by going to the Start menu and choosing "Run...". In the resulting dialog box, type in "cmd" and click OK.
  2. If you haven't changed any settings above, enter this command to create the database:
    "C:\Program Files\PostgreSQL\8.1\bin\createdb.exe" -E UTF8 -U postgres "standup_v1.4"
  3. If successful, it should return with a CREATE DATABASE message. Now, to restore the database, enter this command:
    "C:\Program Files\PostgreSQL\8.1\bin\pg_restore.exe" -i -U postgres -d "standup_v1.4" -v "C:\My Documents\X.backup"
    (Where X is either standup_v1.4_061127_joke or standup_v1.4_061127_lex, and is assumed to be saved to the My Documents folder. If you saved it anywhere else, change the command above accordingly.)
  4. This can take anywhere between thirty minutes and a few hours depending on the configuration of the computer being used (in particular, hard disk speed and amount of RAM). If you spot an error saying 'could not execute query: ERROR: language "plpgsql" already exists', just ignore it -- it's perfectly normal. Once the restore process is complete, it should say something like: WARNING: errors ignored on restore: 1 -- this is simply reporting the aforementioned error.
  5. Close the DOS window by entering the command exit or pressing the 'X' icon in the top right corner.
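
The two commands in steps 2 and 3 can be assembled programmatically if you need to script the restore. The following sketch only builds the command strings; the helper name and the default paths are illustrative assumptions based on this guide, not part of the STANDUP distribution.

```python
# Sketch: assemble the createdb/pg_restore command lines from steps 2-3.
# PG_BIN and the helper name are assumptions taken from this guide's defaults.
PG_BIN = r"C:\Program Files\PostgreSQL\8.1\bin"

def build_restore_commands(backup_path, db="standup_v1.4", user="postgres"):
    """Return the createdb and pg_restore command lines as strings."""
    create = f'"{PG_BIN}\\createdb.exe" -E UTF8 -U {user} "{db}"'
    restore = f'"{PG_BIN}\\pg_restore.exe" -i -U {user} -d "{db}" -v "{backup_path}"'
    return create, restore

create, restore = build_restore_commands(r"C:\My Documents\standup_v1.4_061127_lex.backup")
print(create)
print(restore)
```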

The GUI way

  1. Launch the pgAdmin III tool: go to the Start menu, choose Programs > PostgreSQL > pgAdmin III
  2. On the left side of the window should be a list of Servers containing one entry: "PostgreSQL Database Server 8.1 (localhost:5432)". Double-click this entry.
  3. A "Connect to server" dialog box should pop up. Enter the password you entered earlier: "pgsuper!" (without the quotation marks) and click OK.
  4. Some new entries should appear: Databases, Tablespaces, Group Roles, and Login Roles. Right-click on Databases and choose "New Database".
  5. A "New Database" dialog box should pop up. Enter name: "standup_v1.4". Leave everything else as it is (everything else should be empty except Encoding, which should be "UTF8"). Click OK. This will create the "standup_v1.4" database.
  6. Now double-click the "Databases" entry to expand it. You should see the 'standup_v1.4' database there.
  7. Right-click on "standup_v1.4" and choose "Restore". The "Restore Database standup_v1.4" dialog box should pop up. Click the "..." button next to the Filename field, and locate the standup_v1.4_061127_joke.backup or standup_v1.4_061127_lex.backup file you downloaded. Click OK and the database restore process will begin. This can take anywhere between thirty minutes and a few hours depending on the configuration of the computer being used (in particular, hard disk speed and amount of RAM). If you spot an error saying 'could not execute query: ERROR: language "plpgsql" already exists', just ignore it -- it's perfectly normal.
  8. Once the restore process is complete, it should say something like: "WARNING: errors ignored on restore: 1
    Process returned exit code 1."

    -- this is simply reporting the aforementioned error.
  9. At this point, do NOT click the "OK" button! This will cause PostgreSQL to try to restore the database again, which will only serve to confuse it! Click the "Cancel" button instead.
  10. Exit the pgAdmin III application by choosing File > Exit.

Using the database construction kit

  • Unzip standup_dbkit_v1.4.zip somewhere to your hard drive.
  • Create the database, e.g. by running the following command:

    "C:\Program Files\PostgreSQL\8.1\bin\createdb.exe" -E UTF8 -U postgres "standup_v1.4"

    If successful, it should return with a CREATE DATABASE message.
  • Obtain a psql terminal to the newly created database, e.g.:

    "C:\Program Files\PostgreSQL\8.1\bin\psql.exe" -h localhost -p 5432 standup_v1.4 "postgres"

    (Make sure the database construction kit directory, e.g. /dbbuildscript, is the current directory.)
  • Execute the first stage of the database construction process by entering the following:

    \i batchscript1.sql

    Upon completion, compute the familiarity scores (FAM-scores) for the lexemes in the database:
    1. Obtain disambiguated custom lexicons needed for computing FAM-scores using the wordset disambiguation tool:

      java -cp standup_dbtools_v1.4.jar;postgresql-8.1-407.jdbc3.jar standup.authoring.wordsets.WordSetTool

      Alternatively, use the ones found in dbbuildscript/data/wordsets_20061112.zip. These are the lexicons used for FAM-score values found in the STANDUP databases above.
    2. Run the FAM-score Calculator and configure the various score sources:

      java -cp standup_dbtools_v1.4.jar;postgresql-8.1-407.jdbc3.jar standup.authoring.familiarityscoring.FScoreCalculator

      As a reference guide, the STANDUP lexicon uses the following values for the score source priorities and ranges:

      1. MRC psycholinguistic DB: age of acquisition [0,1.0]
      2. MRC psycholinguistic DB: CFI [0,1.0]
      3. Spelling list derived sets (1 to 6) [0.4, 1.0]
      4. The set of lexemes which have pictorial symbols [0.6]
      5. Frequency scores for compound nouns from the BNC [0.4, 0.9]
      6. SemCor frequency scores [0, 0.5]

      When you have configured your familiarity score sources, click the 'Process' button to fill the fscore column in the lexicon table. This will take roughly half an hour.
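
The exact combination rule used by the FScoreCalculator is not documented here, but the priority list above suggests one plausible scheme: the highest-priority source that knows a lexeme supplies its score, rescaled into that source's range. The sketch below illustrates this interpretation with toy data; the combination rule and all names are assumptions, not the tool's actual algorithm.

```python
# Sketch of one plausible way to combine prioritised familiarity-score
# sources: the highest-priority source that has a value for a lexeme wins,
# and each raw value in [0, 1] is rescaled into that source's [lo, hi] range.
# This rule is an assumption -- FScoreCalculator's real algorithm may differ.

def rescale(raw, lo, hi):
    """Map a raw score in [0, 1] into the source's output range [lo, hi]."""
    return lo + raw * (hi - lo)

def fam_score(lexeme, sources):
    """sources: list of (lookup_dict, lo, hi) in priority order."""
    for table, lo, hi in sources:
        if lexeme in table:
            return rescale(table[lexeme], lo, hi)
    return None  # no source knows this lexeme

# Toy data loosely modelled on the priority list above.
aoa = {"dog": 0.9}          # MRC age-of-acquisition, range [0, 1.0]
spelling = {"cat": 0.5}     # spelling-list derived sets, range [0.4, 1.0]
sources = [(aoa, 0.0, 1.0), (spelling, 0.4, 1.0)]
print(fam_score("dog", sources))   # supplied by the first source
print(fam_score("cat", sources))   # falls through to the second source
```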

  • Execute the second stage of the database construction process by entering the following:

    \i batchscript2.sql

    Upon completion, compute the following resources:
    • Wordform roots: the morphological roots of the wordforms found in the STANDUP lexicon can be computed from the enhortho field of the Unisyn lexicon. To create this resource, run the following:

      java -cp standup_dbtools_v1.4.jar;postgresql-8.1-407.jdbc3.jar standup.authoring.dbbuild.RootFinder

    • Pseudolexeme orthographic remainders: the STANDUP lexicon contains information about pairs of lexemes that have phonetically similar prefixes and suffixes, e.g. "spook" and "spectacles" may be paired to create a joke using the neologism "spook-tacles". To support this, it needs to compute the orthographic remainders of the pairing, e.g. "-tacles". To create this resource, run the following:

      java -cp standup_dbtools_v1.4.jar;postgresql-8.1-407.jdbc3.jar standup.authoring.dbbuild.OrthoSplitter

      and

      java -cp standup_dbtools_v1.4.jar;postgresql-8.1-407.jdbc3.jar standup.authoring.dbbuild.OrthoSplitterRear
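
The remainder computation itself is simple once the split point is known, as the sketch below shows for the "spook-tacles" example. Deciding *where* the phonetic match ends is the real work of the OrthoSplitter tools and is assumed here (the split position is given, not computed).

```python
# Sketch: given that the front of "spectacles" has been matched (phonetically)
# against "spook", compute the orthographic remainder of the pairing.
# The split position is an assumed input; the OrthoSplitter tools determine
# it from phonetic similarity, which is not shown here.

def ortho_remainder(word, split_at):
    """Return the orthographic remainder after the matched prefix."""
    return "-" + word[split_at:]

# "spook" + "spectacles" -> neologism "spook-tacles": the matched prefix
# "spec" is 4 letters long, so the remainder is "-tacles".
print(ortho_remainder("spectacles", 4))
```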

    • Serialized hashtables and indices: if you plan on using the STANDUP lexical database API with the new database, various Java serialized files must be created -- these files greatly speed up the performance of the API. Run the following:

      java -Xms384M -Xmx384M -cp standup_dbtools_v1.4.jar;postgresql-8.1-407.jdbc3.jar standup.authoring.dbbuild.SerializerAll

      java -Xms384M -Xmx384M -cp standup_dbtools_v1.4.jar;postgresql-8.1-407.jdbc3.jar standup.authoring.dbbuild.SerializerIndices
      Upon completion, import the following files into the lexical database API .jar under standup/resources/serialized:
      • c:/compiledlexemes.dat (rename to jokeonly_compiledlexemes.dat if building the 'joke' DB)
      • c:/compiledwordforms.dat (rename to jokeonly_compiledwordforms.dat if building the 'joke' DB)
      • c:/compiledconcepts.dat (rename to jokeonly_compiledconcepts.dat if building the 'joke' DB)
      • c:/widgitcodestofiles.dat
      • c:/spellingtowfid.dat
      • c:/wfidtolxid.dat
  • Execute the third stage of the database construction process. If you are building a general-purpose lexical database, enter the following:

    \i batchscript3_lex.sql

    Upon completion, run the following:

    java -Xms384m -Xmx384m -cp standup_dbtools_v1.4.jar;postgresql-8.1-407.jdbc3.jar standup.authoring.dbbuild.CustomLexiconsAndTopicBuilder

    Import the following files into the lexical database API .jar under standup/resources/xml:
    • c:/topicdb_fc1.topic
    • c:/topicdb_fc2.topic
    • c:/topicdb_fc3.topic
    • c:/topicdb_fc4.topic
    • c:/topicdb_fc5.topic
    • c:/customlex_fc1.lexicon
    • c:/customlex_fc2.lexicon
    • c:/customlex_fc3.lexicon
    • c:/customlex_fc4.lexicon
    • c:/customlex_fc5.lexicon

    ...and that's it!

    If, however, you are building a database to support joke-generation, enter the following:

    \i batchscript3_joke.sql

    Upon completion, compute the clause instantiation-filtered schema instantiations as follows:

    java -Xms384M -Xmx384M -cp standup_dbtools_v1.4.jar;postgresql-8.1-407.jdbc3.jar standup.authoring.dbbuild.SchemaFilterer

    This creates 11 C:/step17_schemafilter_(SCHEMANAME).sql files. Upon completion, place these files in the dbbuildscript folder and move on to the last stage of the database construction process.
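
Moving the generated files can be done by hand, or with a small helper like the sketch below. The function name is illustrative and the source/destination paths are the defaults used in this guide; adjust them if yours differ.

```python
# Sketch: move the generated step17 schema-filter SQL files into the
# dbbuildscript folder, as the step above requires. The helper name is
# illustrative; the default paths come from this guide.
import glob
import os
import shutil

def move_schema_files(src_dir, dest_dir):
    """Move every step17_schemafilter_*.sql file from src_dir to dest_dir."""
    moved = []
    for path in glob.glob(os.path.join(src_dir, "step17_schemafilter_*.sql")):
        shutil.move(path, os.path.join(dest_dir, os.path.basename(path)))
        moved.append(os.path.basename(path))
    return sorted(moved)

# Typical call with this guide's default locations:
# move_schema_files("C:/", "C:/dbbuildscript")
```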

  • If you are building a database to support joke-generation, execute the fourth and last stage of the database construction process:

    \i batchscript4_joke.sql

    Upon completion, run the following:

    java -Xms384m -Xmx384m -cp standup_dbtools_v1.4.jar;postgresql-8.1-407.jdbc3.jar standup.authoring.dbbuild.CustomLexiconsAndTopicBuilder

    Import the following files into the lexical database API .jar under standup/resources/xml:
    • c:/topicdb_fc1.topic (rename to jokeonly_topicdb_fc1.topic)
    • c:/topicdb_fc2.topic (rename to jokeonly_topicdb_fc2.topic)
    • c:/topicdb_fc3.topic (rename to jokeonly_topicdb_fc3.topic)
    • c:/topicdb_fc4.topic (rename to jokeonly_topicdb_fc4.topic)
    • c:/topicdb_fc5.topic (rename to jokeonly_topicdb_fc5.topic)
    • c:/customlex_fc1.lexicon (rename to jokeonly_customlex_fc1.lexicon)
    • c:/customlex_fc2.lexicon (rename to jokeonly_customlex_fc2.lexicon)
    • c:/customlex_fc3.lexicon (rename to jokeonly_customlex_fc3.lexicon)
    • c:/customlex_fc4.lexicon (rename to jokeonly_customlex_fc4.lexicon)
    • c:/customlex_fc5.lexicon (rename to jokeonly_customlex_fc5.lexicon)

    ...and that's it!

STANDUP System Download


What you need before installing STANDUP

  • a PC
  • Windows XP, Windows Vista, or a suitable Linux system
  • at least 512 MB of memory, although 1 GB is recommended for better performance
  • free disk space of 60 MB (SIMPLE) or 4 GB (FULL)
  • a processor equivalent at least to a 1.5 GHz Intel/AMD
  • an up-to-date Java Runtime Environment -- you need at least version 5.0 update 4 (sometimes known as "version 1.5"), as earlier versions give problems. On Windows Vista, it must be at least version 6 (a.k.a. "version 1.6"). On Linux, use a recent version of Sun Microsystems' Java -- the GNU Java interpreter gij is not suitable. The latest Java versions can be downloaded from http://java.com/en/download (or on Linux you may be able to use the package installer to download and install it).

(One way on Windows XP to find out which version of Java (JRE) you have is to open the Control Panel, choose Add/Remove Programs, and see which JRE is offered as available for removal).
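
If you want to check a version string programmatically (e.g. one copied from the output of "java -version"), a minimal comparison might look like the sketch below. The parsing assumes the classic Sun "1.x.y_uu" format, and for simplicity it ignores the update level, even though the XP requirement above is strictly "update 4" or later.

```python
# Sketch: check whether a Java version string in the classic Sun format
# (e.g. '1.5.0_04' or '1.6.0_13') meets a minimum (major, minor) version.
# The update-level check ("update 4") is deliberately omitted for brevity.
import re

def meets_minimum(version_string, minimum=(1, 5)):
    """True if a '1.x.y_uu' style version is at least the given (major, minor)."""
    m = re.match(r"(\d+)\.(\d+)", version_string)
    if not m:
        return False
    return (int(m.group(1)), int(m.group(2))) >= minimum

print(meets_minimum("1.5.0_04"))          # XP minimum
print(meets_minimum("1.4.2_19"))          # too old
print(meets_minimum("1.6.0_13", (1, 6)))  # Vista minimum
```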

STANDUP System - SIMPLE mode

If you want to see what STANDUP does, running it in SIMPLE mode will show you. Compared to FULL mode, it is much smaller, much easier to install, and runs quickly. All you need to do is download one ZIP folder (see below), extract all its contents, and run/open the appropriate file; fuller instructions are provided within the ZIP folder. FULL mode involves downloading some additional large files, following a careful installation procedure, and takes up about a hundred times more space on your disc. The visible behaviour of the two modes is identical. The difference between the two modes is in the internal workings:

  • SIMPLE mode does not build fresh jokes, but offers jokes from an in-built list of about 900 which it has previously generated. It therefore does not need a dictionary.
  • FULL mode has a vast dictionary of words and phrases which it uses to create jokes when required. This means it can in principle produce millions of different riddles.

Version 1.4.2: (February 2009). This download has the same functionality as Version 1.4.1, but:

  • the file and folder structure of the download (and subsequent installation) is simpler;
  • it has been rearranged to run on Windows Vista systems;
  • a Linux download is available;
  • there are more documentation files included -- including the Installation Guide;
  • some extraneous files are no longer present in the download;
  • a bug which prevented display of STANDUP's options control panel has been fixed.

Windows: Download STANDUP SIMPLE Version 1.4.2 (54 MB)
Linux: Download STANDUP SIMPLE Version 1.4.2.B (54 MB)

28 April 2009: The "B" version of the Linux download fixes a bug with getting STANDUP to open in a separate window; the system (in particular, the Java .JAR file) is otherwise unchanged from the original 1.4.2 version.

Two document files which are in the downloads (above) but which you might like to browse here are:

  • User Manual
  • Terms and Conditions of Use

(The Installation Guide is inside the download.)

STANDUP System - FULL mode

Please note:

  • STANDUP FULL mode cannot be installed on its own -- you must have STANDUP SIMPLE mode already installed on your machine before you upgrade to FULL mode.
  • STANDUP FULL mode is available for Windows, but a Linux version has not yet been tested.

To upgrade from STANDUP SIMPLE to STANDUP FULL, you need:

  • The PostgreSQL database software: This is free software, not created by the STANDUP project. A copy is included in the STANDUP FULL Upgrade download (below), or you can get the latest version from the PostgreSQL website.
    [WARNING: STANDUP has not been tested with more recent versions of PostgreSQL, and so it may be better to rely on the version downloadable from here, if this copy installs successfully on your computer.]
  • The STANDUP lexical database: This is supplied as a PostgreSQL "backup" file, to be loaded into the PostgreSQL system. The first two numbers (e.g. "1.4") in the version-number indicate which versions of STANDUP the database supports -- the 1.4 database supports STANDUP versions 1.4.1, 1.4.1.B, 1.4.2, etc.
  • The STANDUP FULL BAT file: Opening this file, once correctly installed, initiates the STANDUP FULL system. (Windows only -- not on Linux.)
  • STANDUP (FULL) Installation Guide: This describes the installation procedure, mostly involving PostgreSQL.

All of these are included in the STANDUP FULL Upgrade downloadable ZIP folder (below).

Windows: Download STANDUP FULL Upgrade, Version 1.4 (228 MB)

At the moment, there is no FULL upgrade package for the Linux version.

A document file which is in the download folder (above) but which you might like to browse here is:

  • Version 1.4 Installation guide, FULL Upgrade

Speech Output Upgrade

The FreeTTS speech output of the STANDUP system can be given an improved voice (developed at Carnegie Mellon University by Alan Black). However, this takes up considerable disk space (103 MB) and causes the STANDUP system to run very slowly (particularly at start-up), except on very powerful machines. It can be added later if you find STANDUP's default voice is not satisfactory.

There is no question of version compatibility -- the same speech upgrade can be used with any version of STANDUP, and on Windows or on Linux.

The download file contains:

  • The main file cmu_us_awb_arctic.jar.
  • Two BAT files for running STANDUP with the improved speech, either in SIMPLE or in FULL mode on Windows.
    (For Linux, see the parameters.txt documentation in the STANDUP SIMPLE download for how to run STANDUP with the improved speech file.)
  • A short text file with very simple installation instructions.

Download STANDUP Speech Upgrade (104 MB)

Uninstalling STANDUP

Both the Installation Guides (SIMPLE and FULL) include instructions for removing STANDUP software from your system.

User Manual

(For STANDUP software version 1.2.12 onwards)

(Last update: 17 January 2007)

1. Introduction

The STANDUP system is an interactive program which allows the user to create (generate) very simple punning riddles, by choosing certain requirements, such as words or subjects. It has a large dictionary (over 100,000 words and phrases) and a small number of rules about the allowable shapes of punning riddles. The software can create a novel riddle in response to just a few choices (selections) by the user.

The user interface has been designed so as to be particularly suitable for use by those with limited motor skills - as well as being usable on a conventional keyboard-and-mouse arrangement, it can run on a touchscreen or with a single-switch (scanning) device. It is also suitable for young children, as the presentation is colourful, lively and simple to use. Pictorial images can (when available) be displayed alongside words to assist those with limited literacy. Also, speech output is available for everything from menu choices to the actual jokes, so the system can function not only as a joke-creation aid, but as a joke-telling device, for users with limited oral facility.

Most of these provisions can be customised for a particular user via the system's own control panel (see Section 11 below). Also, each individual user has their own "profile", which not only records their preferred settings for use next time, but keeps track of which jokes they have already produced in previous sessions.

The address of the STANDUP project website has changed over the years. Its current location can probably be found using a search engine.

2. Overview of use

The STANDUP system is an interactive program, in which options are presented on the screen using large labelled buttons (usually depicted as clouds). There is an initial phase during which various special displays appear to allow the user to "login" to the system (Section 3 below). After that, there is a standard layout in three parts:

  • Top: Some navigation buttons for basic functions such as going back to the main menu (HOME), exiting the session (END), etc. (Section 4 below).

  • Middle: The main options - a set of choices, depicted as clouds with textual and sometimes pictorial labels (Sections 5 to 8 below).

  • Bottom: A progress chart, showing where the user has reached. (Section 9 below). (This part can be turned off, via the STANDUP control panel, if desired.)

There is also a STANDUP control panel, where a number of options controlling the system's behaviour can be adjusted; this panel appears on request (Section 11 below).

3. Getting started

When first launched, a 'splash screen' displaying the terms and conditions of the software usage is shown. Click anywhere on it to accept the terms and conditions and to proceed to the main part of the system.

The initial screen presents choices for:

  • First time: A user who has not yet used this STANDUP installation on this computer should select this item, in order to register as a user with this copy of the software.

  • Back again: A user who has already used this STANDUP installation on this computer should select this item, in order that the software can load up past settings and history.

  • Exit: This closes down the system completely (i.e. terminates the STANDUP software).

N.B.: User names are stored only with particular copies of STANDUP, so two different copies on separate computers will have a different list of known users. Also, re-installation of STANDUP may cause all user information to be lost, requiring all users to start afresh.

If the user selects "First time", the system asks for a username. Users without text input skills may need help, but this is the only obligatory part of using STANDUP which demands textual input. Any username can be chosen, but something short and simple is best. If the name chosen is already on the system, the system will report this, allowing a chance to choose a different name. Once a name is selected, there will be a pause while the system prepares things for the new user.

If the user selects "Back again", the system will offer all known usernames (in that installation of STANDUP) as menu options, so the user can select the right one.

4. The navigation buttons

Along the top of the screen are anything from 3 to 5 buttons (in the shape of small clouds) which are present independently of the central menu options.

  • Home : This takes the user back (after a check to ask if this selection is really intended) to the first screen after the logging in stage, where the main menu is displayed.

  • Help: Selecting the "Help" button means that the next selection of a button is not treated as a choice of action but as a request for more information about the selected button. This information is supplied in the form of a short text message.

  • End: This ends the current session with the current user; that is, it is like a "log out" from STANDUP. It does not close down the software, but takes the user back to the very first screen, where another user can "log in" to STANDUP (and where the "Exit" button is available if a complete close-down is required).

  • Back: This button appears only once the user has made some choices within the session. It takes the user one step back, to the previous screen display (as in a web browser).

  • Forward: This button appears only once the user has used the "Back" button. It undoes the most recent use of "Back", taking the user to the state where Back was selected.

5. The main menu (Home display)

The central area of the STANDUP display always contains the main options currently available to the user. These change as choices are made. However, one set of options has special status, as it is the initial set of options provided when the user starts a session, and it is where the "Home" button takes the user back to. These options are referred to here as the main menu, or the "Home" display.

The first of these options is always present, but the remaining four can be turned off (i.e. not made available), via the STANDUP control panel, if desired.

  • My favourites: Gives access to any items which this user (not other users) has previously (either in this session or earlier sessions) added to the "Favourites" list.

  • Any joke : Allows the user access to a joke, without regard to kind of joke, subject matter, or words used.

  • Subjects: Allows the user to select from a menu of topics (e.g. People, Animals); the joke(s) offered will then contain at least one word related to that topic (see Section 6.1).

  • Words: Allows the user to select (or type in) a word; the joke(s) offered will then contain that word (see Section 6.2).

  • Kinds of joke: Offers the user a number of types of joke, described informally in terms either of their phrasing or some central aspect of their linguistic structure (see Section 6.3).

For all of these except "My favourites", a further choice between seeing previously generated jokes (for this user) or getting a new joke (for this user) is offered as the final step before seeing the actual jokes - the "new/old" menu (Section 6.4).

6. Intermediate menus

6.1 Subjects

The "Subjects" menus offer one central choice, and a number of peripheral choices. The central choice will represent some subject area (topic), and the peripheral choices represent subsets or subclasses of that area. If there are too many subclasses available to display on one screen, a "More" indicator will appear at the right of the screen; selecting this causes the display of further subclasses, along with a "Previous" indicator at the left of the screen to allow the user to move back to the previously displayed subclasses (of the current central subject area). By selecting a peripheral choice, the user can work down to smaller and more detailed subclasses. At any stage in this process, selecting the central choice (subject area) leads directly to the "new/old" menu (Section 6.4).
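
The "More"/"Previous" paging behaviour described above can be summarised as a small paging function. The sketch below is only an illustration of the logic; the page size and names are assumptions, and the real STANDUP layout is driven by its own GUI code.

```python
# Sketch: paging a long list of subclass choices into screens, with flags
# indicating when the "Previous" and "More" indicators should appear.
# The page size (6) and function name are illustrative assumptions.

def page_of_choices(choices, page, per_screen=6):
    """Return (visible_choices, has_previous, has_more) for one screen."""
    start = page * per_screen
    visible = choices[start:start + per_screen]
    return visible, page > 0, start + per_screen < len(choices)

subclasses = [f"topic{i}" for i in range(10)]
print(page_of_choices(subclasses, 0))  # first screen: no Previous, More shown
print(page_of_choices(subclasses, 1))  # second screen: Previous shown, no More
```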

6.2 Words

The Words menu offers three routes to a word:

  • Spell it: The facilities provided on this route depend on whether the system has been set to allow the user to type in text (the "text input" option in the control panel - Section 11.1.2).

    If the "text input" option is set, then a dialogue is initiated in which the user can type a word into a text input box. As the user types, the system will attempt to predict the complete word, offering its guesses in a drop-down menu (with a scroll bar) from the text box. To select one of these offers, the user should directly select (click on) it with the pointing device. (Directly clicking on the arrow at the right end of the text box will temporarily get rid of the menu; hitting RETURN (ENTER) will then bring the menu back.) Once the desired word is complete in the text box, selecting Finished word will enter it. If the word is not in the working lexicon, the system will give a warning message at once, and return the user to the type-in dialogue. (The working lexicon consists of all words which could in principle lead to a joke with the currently loaded data. Hence the system may deny knowledge of an ordinary word which is in the main lexical database, if the pre-computation of suitability has shown that it could not lead to a joke.)

    If the system does not expect the user to be able to type in letters ("text input" option not set in STANDUP control panel), then instead of the type-in box, the user will be offered a menu of letters to choose from. By selecting letters in turn, the user can gradually spell out a word. Only words which are in the working lexicon (i.e. could in principle lead to a joke) are offered.

    Either of these ways of selecting a word lead to a word-form, which could be associated with more than one meaning (different lexemes). If the selected word is ambiguous in this way, a set of choices is offered to the user to choose a meaning. Meanings are described using the English gloss provided by WordNet, which is not always entirely useful. Also, WordNet may make distinctions between meanings which may seem over-fine to the average user. For example, for "ball", it offers "spherical object used as a plaything" versus "round object that is hit or thrown or kicked in games".

    Once an unambiguous choice of word has been completed, the next menu is the "new/old" menu (Section 6.4).

  • Alphabet: The alphabet menus allow the user to get to a word by refining choices of subranges of the alphabet (as in an alphabetically-organised phonebook or encyclopaedia). The choices presented on each screen correspond to a small number of parts of the lexicon, alphabetically arranged. For example "a to burst","bus to deep freeze", "deep freezer to furious", etc. Selecting one of these leads to display of a similar menu of choices, segmenting that part of the alphabet more finely. Eventually choices will show individual words, and the user can select a word.

    As with typing in a word, if the word is ambiguous, the system offers different possible meanings for selection; if the word is unambiguous, it proceeds directly to the "new/old" menu (Section 6.4).

  • Subjects: The Subjects menus within the word selection process are exactly like the menus presented in the choice of joke by Subject (Section 6.1), in that they represent topic areas with sub-topics. The difference is that when a (sub)topic is selected which has no further subtopics, the menus shown are for choosing words, using Alphabet menus as described above; also, at any stage the user may select the central topic, which leads to an Alphabet-style menu for words on that topic. The user can then refine the selection of a word in the same way as in the Alphabet route.

    As with the Spell-it and Alphabet routes, if an ambiguous word is selected, the system presents its meanings for further selection before moving on to the "new/old" menu (Section 6.4).
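
The Alphabet menus' segmentation of the lexicon into ranges like "a to burst" and "bus to deep freeze" can be sketched as follows. The equal-sized-chunk rule is an assumption; STANDUP's own segmentation may differ.

```python
# Sketch: segmenting an alphabetically sorted working lexicon into labelled
# ranges, as the Alphabet menus do ("a to burst", "bus to deep freeze", ...).
# The equal-chunk splitting rule is an assumption, not STANDUP's algorithm.
import math

def alphabet_ranges(words, n_ranges):
    """Split a sorted word list into n_ranges labelled 'first to last' chunks."""
    size = math.ceil(len(words) / n_ranges)
    ranges = []
    for i in range(0, len(words), size):
        chunk = words[i:i + size]
        ranges.append((f"{chunk[0]} to {chunk[-1]}", chunk))
    return ranges

lexicon = ["ant", "ball", "burst", "bus", "cat", "deep freeze"]
for label, chunk in alphabet_ranges(lexicon, 2):
    print(label)
```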

6.3 Kinds of joke

The "Kinds of joke" menus offer classes of jokes, described informally either in terms of fixed words that appear in their questions, such as "What do you get when you cross a...", or by describing some aspect of the language, such as "a joke that swaps a word with a similar sounding one". As in some of the other menus, if there are too many choices to present on one screen, a "More" indicator is at the right, and a "Previous" indicator appears at the left if the "More" choice is selected. In this way the user can "scroll" between screens of choices.

Choosing a kind of joke leads directly to the "new/old" menu (Section 6.4).

6.4 The new/old menu

After the joke requirement has been fully selected (a subject area, or a word-meaning, or a kind of joke), the system offers two choices - a new joke or an old joke - which are to meet that requirement.

A "new" joke is one which has not been shown to this user, as far as the system is aware. It will be generated anew (or, if running with a joke cache - Section 11.2.1 - may be selected from that source, which will give the same effect to the user). Selecting this choice causes the display of a "please wait" message (perhaps very briefly), and then either a message stating that no joke can be generated to this requirement (followed by return to the "new/old" menu), or the "show-joke" menu (Section 7 below).

An "old" joke is one which has been generated for the current user (as a "new" joke) either in this session or a previous session. This should not be confused with "My favourites", an explicit list of jokes chosen by the user. The set of "old" jokes consists of all jokes that have been encountered previously; the system stores these without notifying the user. Choosing "old" jokes leads to a display of all such items, using (as in other menus) the "More" and "Previous" buttons if necessary to review a large number of items. Each item is represented by the question part of the joke only. Choosing one of these leads to the show-joke menu.

7. The show-joke menu

The show-joke menu appears in two variants, depending on whether it is reached via a request for a "new" joke or for an "old" joke.

Both variants

The common part of the menu (across both variants) consists of one textual item and three choices:

  • The joke: The current joke (newly generated, or chosen from the "old" collection) is displayed separately from the menu choices.
  • Add to favourites: Selecting this either puts the joke into "My Favourites" (with a message to say this has happened) or puts up a message saying that the item was already in "My Favourites", returning to the show-joke menu in either case.
  • Say again: Selecting this causes the speech synthesiser to output the current joke (without altering the screen display).
  • Tell joke: This leads to the tell-joke menu.

New joke variant

There are choices for "Another new joke" and "Old jokes". The first of these invokes the joke generator again, while retaining the requirement most recently specified. For example, if the user has just requested a joke on topic T, then the request for "Another new joke" will attempt to find one also on topic T; similarly for a requirement based on a specific word or a kind of joke. If a joke can be generated, this will become the current joke, with display of the "show-joke" menu (new joke variant); if none can be found, a message will be displayed, and the user will return to the "show-joke" menu (with the same current joke as before). The "Old jokes" choice allows browsing of previous jokes (if any) which meet the most recently requested requirement.

Old joke variant

There are choices for "Previous joke" and "Next joke". These allow the user to scan back and forward amongst the "old" joke collection, while retaining the requirement most recently specified. For example, if the user has just chosen an old joke on topic T, then the old jokes reachable by "Previous joke" and "Next joke" will be only jokes classed as being on topic T; similarly for a requirement based on a specific word or a kind of joke. If the user scans to the end of the collection, a message will be displayed to say that the first or last joke has been reached.

8. The tell-joke menu

The tell-joke menu controls the telling of the joke via the speech output device, provided that the appropriate options are turned on in the STANDUP control panel (Section 11.1.4). It has four choice items (the current joke is also displayed on the screen):

  • Get ready: Selecting this causes a message to be spoken by the system, suitable for introducing the joke-telling or catching the attention of a listener.
  • Question: Selecting this causes the question part of the joke to be spoken by the system.
  • Answer: Selecting this causes the answer part of the joke to be spoken by the system.
  • Add to favourites: Selecting this causes the current joke to be added to "My favourites".

For the first three of these (the speech choices), the spoken words are also displayed textually on the screen, along with a button to allow the spoken form to be repeated.

9. The Progress Map

The progress map gives a rough pictorial indication of how the user is progressing through the process of producing a joke. Not every choice point is displayed, but major landmarks such as the selection of Subjects or Words are shown. As the image of the jester-robot moves along the tracks, a coloured trail (red) is left behind it, and when the jester-robot moves back (in response to the "Back" button), the re-traced portion is shown in yellow. (Hence, yellow indicates where the "Forward" button could take the robot.)

The progress map cannot be modified or controlled directly by the user. None of its parts can be selected, and it cannot be used to guide the system's progress through joke-building. It is purely a visual aid, showing (approximately) where the process has reached.

10. User Profiles and Option Files

10.1 Option Files

For each user known to the system, the software retains a user profile. This records the most recently selected options for all the possible values and settings that can be altered in the STANDUP control panel. Hence, each user can have the software behave in their own preferred manner, such as the omission of certain menu items, or the suppression of speech output for some facilities.

In addition, the user profile records the user's favourites and also every joke that they have generated in every session since being first introduced to the system. This means that each session starts from where the user left off previously.

There are a number of standard option files, which can be used as starting points for new user profiles (or to make wholesale changes to an existing profile). These files contain a number of commonly-used combinations of settings (but no user names, favourites or past jokes). By using the "Load options from file" button in the control panel, a named file containing a stored set of options can be used to instantaneously set all the control panel options to the values in that file, while not affecting the current set of favourites or old jokes.

10.2 Initial (Login) Settings

Once a user has logged into STANDUP, all the settings to control the User Interface and the Joke Generator will be taken from that user's profile; if the user is new to the system, that profile will be based on the default profile.

However, before a user logs in to the system, when it has first started up, the settings are taken from a special profile called "login". Although this will have certain default values in a new STANDUP installation, these can be altered via the STANDUP Control Panel. Any Control Panel amendments made during that initial stage of use (i.e. when the Control Panel shows the "Profile being edited" as "login") will affect that initial phase (only). Hence, to ensure that STANDUP always starts up with some particular settings (e.g. using scanning mode for user interactions), start up STANDUP and use the Control Panel during the initial login phase to make the required settings. These settings will then affect the login phase in all subsequent use of that installation of STANDUP, until altered again.

N.B.: (a) Only User Interface settings can be altered for the "login" pseudo-user. Joke Generation settings are irrelevant, as joke generation cannot occur during the login stage.
(b) Altering settings for the login phase (as here) is different from altering the "default" settings which the system will assume for a new user. Altering default user settings is done using the STANDUP Options Authoring Tool, for which a separate User Guide is available. So if the aim is to have every new user automatically get some particular settings in their profile when they start using the system, the default options should be edited with the Options Authoring Tool.

11. The Control Panel

Pressing the ESC key (top left of normal keyboard) brings up the STANDUP Control Panel, which allows the fine adjustment of many options.
N.B.: Whereas the main STANDUP menus are designed to be accessible via various input modalities, including a single-switch/scanning interface, the Control Panel can be operated only if conventional selection via a mouse (or mouse-substitute) is available; there is no provision for scanning within the Control Panel.

At the bottom of the panel are four buttons controlling the overall use of the control panel:

  • About: This brings up a new window giving background information about the origins of the STANDUP software, the version-number, etc. This window can be dismissed via an "OK" button.
  • Load options from file...: This brings up a file-opening dialogue box to allow the user to set all the options to values specified in an options file (Section 10 above). These files are to allow standard combinations of settings to be installed easily, and a few such combinations are supplied with the software. However, the user can set any combination of options simply by using the controls listed in the subsections below.
  • OK: This closes the control panel, altering the current settings to have the values which the control panel contains; that is, it causes any changes the user has just made to take effect.
  • Cancel: This closes the control panel, leaving the current settings with the values which they had before the control panel was opened; that is, it causes any changes the user has just made to be abandoned.

Above these four major buttons, the control panel is structured into a number of tabs, as follows (Sections 11.1, 11.2).

11.1 User Interface Options

This has 5 tabs within it: "Interactions", "Input devices", "Symbols", "Speech" and "Miscellaneous", all explained below.

11.1.1 Interactions

These switch on/off the available menu items, in the main menu (home screen) and also the menus for word-selection.

  • Get (any) joke: the main-menu choice of "Any joke".
  • Get joke by subject: the main-menu choice of "Subjects".
  • Get joke by word: the main-menu choice of "Words". Within this, there are four further options. The first three control the presence of choices within the Words menu:
    • Get word by spelling/typing: The "Spell it" item
      • Cluster words & letters: This affects the presentation of choices when the user, from the "Words" menu, opts to "Spell" a word (for use in a joke). When set, this option ensures that all the available choices (in the gradual spelling of the word) are fitted on to a single screen, without need for "Previous" and "More" indicators. This is achieved by grouping choices into a small number of sets, and allocating each such set to a single button; selection of this button will then allow the exploration (in a similar manner) of all those choices. When the "Cluster words & letters" option is not set, individual choices are given separate buttons, which will often need several screens for display, controlled by "Previous" and "More" indicators.
      • Show spelt words sooner: This affects the presentation of choices when the user, from the "Words" menu, opts to "Spell" a word (for use in a joke). If this option is set, then when a specific word can be spelt by adding a unique suffix to the currently spelt form, that word is presented as a button. When the option is not set, the user has to make one more selection to reach that word, since a "spelt form" button will appear, the choice of which will then lead to the word.
    • Get word by alphabet: The "Alphabet" item
    • Get word by subject: The "Subject" item
    • Choose specific meanings: If this option is set, whenever a user chooses a word (for use in a joke), either by typing/spelling, alphabet, or subject, and STANDUP can generate or retrieve jokes for more than one possible meaning of that word, the user will be presented with a choice of those meanings, complete with symbol (where available) and short dictionary description for each meaning, and will be asked to choose a specific one. If not set, STANDUP will generate or retrieve jokes with either meaning.
  • Get joke by type: the main-menu choice of "Kinds of joke".

That is, by setting or unsetting these options, the user can reduce the set of menu choices which are made available in the main menu and in the "Words" menu. If all are unset, the main menu contains just one item: "My favourites".
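
The "Cluster words & letters" behaviour described above (fitting many choices onto one screen by grouping them) can be sketched roughly as follows. This is an illustrative sketch only: the function name and the button limit are assumptions, not the actual STANDUP implementation.

```python
# Illustrative sketch of the "Cluster words & letters" option: split a
# long list of choices into a few groups so they all fit on one screen.
# The name cluster_choices and the max_buttons limit are assumptions,
# not the real STANDUP code.

def cluster_choices(choices, max_buttons=8):
    """Group choices so at most max_buttons buttons appear on screen.

    If the choices already fit, each gets its own button; otherwise
    they are partitioned into at most max_buttons contiguous groups,
    each of which would be explored via a further selection.
    """
    if len(choices) <= max_buttons:
        return [[c] for c in choices]          # one button per choice
    size = -(-len(choices) // max_buttons)     # ceiling division
    return [choices[i:i + size] for i in range(0, len(choices), size)]

letters = list("abcdefghijklmnop")             # 16 possible continuations
groups = cluster_choices(letters, max_buttons=4)
print(len(groups))                             # 4 buttons, each holding 4 letters
```

When the option is not set, the equivalent behaviour would simply be one button per choice, paged with "Previous" and "More".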

11.1.2 Input devices

These indicate what kind of input the user will use for communicating with the user interface (selecting menu items, entering text). If the "Text input via keyboard" box is set, the system will act on the basis that the user can input individual letters from a keyboard (or substitute device); if not, then letters have to be selected from menus (see the Words menu, Section 6.2 above). The user interface will alter what it offers to the user, depending on this setting.

Exactly one of the following options must also be set:

  • Mouse: This indicates that a conventional mouse (or a device which provides that functionality) is available. Moving the mouse over a menu item focusses (highlights) it, "left click" selects it. ("Right click" has no effect.)
  • Touchscreen (one touch): This is for use when the STANDUP User Interface is running on a touchscreen of some kind (e.g. a tablet PC). "One touch" indicates that a single touch on a menu item will select it.
  • Touchscreen (two touch): This alternative touchscreen option indicates that a single touch on a menu item will focus on (highlight) it, but a second touch is required to select it.
  • Switch device: This indicates that a single-switch device is being used. The user interface will deploy scanning, with each available choice on the screen being highlighted in turn. If this mode is used on a computer with a conventional mouse as its switch device, then a "right click" acts as the switch. (As a further facility, the "left click" can override the scanning and select a menu item directly; this is to allow intervention by a non-switch-user.)

If "Switch device" is set, then two further values can be set:

  • Scanner initial delay (in seconds): This value indicates how long the scanner (single switch) interface will pause before starting to scan the available choices.
  • Scanner delay (in seconds): This value indicates how long the scanner (single switch) interface will pause on each available choice when cycling through the menu.

11.1.3 Symbols

This tab provides two menus to control the pictorial symbols which can be displayed alongside words when text is displayed by the STANDUP system.

At the top, the user can select which picture library is preferred for use. Two are available: Picture Communication Symbols (PCS) and Widgit Rebus. The user can specify either that just one of these is to be used, or that one is to be preferred but the other can be used where no symbol is available from the preferred set. The current preference-ordered list is shown at the top, and the buttons below (Add, Remove, Move up, Move down) can be used to edit this list. PCS images will appear only on messages from the STANDUP system (e.g. "Choose a joke"), but not on jokes themselves or on buttons representing subject areas (topics). Widgit Rebus images will be used in all types of text, if available.

The PCS images are the property of Mayer-Johnson LLC, and are used with permission. The Widgit Rebus symbols are the property of Widgit Software and are used with permission.

The buttons lower down allow exactly one of four options to be chosen, to control the display of images alongside words. A distinction is made between user interface symbols and keyword symbols. A "user interface symbol" may be either a picture library symbol (PCS or Widgit Rebus) or an icon used within the STANDUP system (e.g. the special icon for "Any joke"), but it is a user interface symbol by virtue of being attached to a message or label by which the user interface communicates with the user. A "keyword symbol" is a pictorial symbol (from one of the picture libraries) displayed with a word in a joke; these symbols are displayed only for the central words involved in the joke, and do not apply to the fixed pieces of text such as "What do you get when you cross".

On this basis, the options are:

  • No symbols: The system will not display any symbols, either user interface symbols or keyword symbols.
  • User interface symbols only: The system will display user interface symbols (i.e. in messages, menus, etc.), but will not display any keyword symbols.
  • User interface and ambiguous words only: The system will display user interface symbols, and also display keyword symbols (where available) for words which have more than one entry (meaning) in the dictionary.
  • Show all symbols: The system will display user interface symbols, and also display keyword symbols (where available) regardless of how many meanings the corresponding words have.

In practice, some words may not have an associated picture in the STANDUP dictionary, so even setting Show all symbols will not guarantee the display of a picture for every (key)word in a joke.

11.1.4 Speech

There are 5 options controlling speech output. Any set of these can be selected, ranging from none to all.

  • Speak options when focussed: When this is set, the text labels on menu buttons will be spoken whenever the button is highlighted (i.e. when the system "focusses" them). This happens either when a mouse user hovers the mouse over the option, or when the scanning interface offers that button for selection.
  • Speak options when selected: When this is set, the text labels on menu buttons will be spoken whenever the button is chosen by the user.
  • Speak messages: When this is set, the textual messages used by the system to communicate with the user will be spoken.
  • Speak jokes on display: When this is set, any joke displayed by the system will be spoken as it is displayed.
  • Speak jokes on request: When this is set, the "tell joke" menu can be used to have the STANDUP system speak the joke in a manner controlled by the user (Section 8).

11.1.5 Miscellaneous

The "Miscellaneous" tab controls various aspects of the appearance of the screen display. At the top, time-delays can be specified:

  • Options animation offset (milliseconds): When the main choice buttons are displayed (or removed from the screen) by the STANDUP system, the separate buttons can appear/disappear together, or there can be a "staggering", where the buttons appear/disappear one after another, with a slight delay between them. The time-delay, in milliseconds, can be specified here, with 0 corresponding to simultaneous appearance/disappearance.

The other options on this tab are as follows:

  • Show progress map: This switches on/off the display of the map showing the progress of the jester-robot through the joke-production process.
  • Show message before options: When set, the message outlining what the user is to do (e.g. "Choose a joke") will appear before the menu items appear in the main area of the screen. When it is not set, the message appears roughly simultaneously with the menu options.
  • Show message banner below options: When set, the message appears lower down on the screen than the main menu choices; when not set, the message appears higher up than the menu items.
  • Enable session logging: When this is set, a session-log file will be created on disk for all the user's interactions with the system.

11.2 Joke Generator Options

11.2.1 Generation parameters

Thresholds and levels

  • Phonetic similarity threshold: When the STANDUP joke generator is comparing phrases, words, or parts of words for similarity, it rates the similarity of the pronunciations on a scale from 0.0 (completely different) to 1.0 (identical). Only those comparisons which give a value above a particular threshold (value between 0.0 and 1.0) are counted as "sufficiently similar" to use in a pun. This option allows the user to set what that threshold value is. In any given version of the STANDUP software, there will be a fixed limiting threshold, and any attempt to set the threshold below this limit will have no real effect on the system's behaviour. In version 1 of STANDUP, this limit is typically set at 0.75.
  • Word familiarity level: All the words in the STANDUP dictionary have been allocated an F-score between 0.0 and 1.0, which represents (roughly) how familiar a word is. Simple everyday words like "run" will have high scores, obscure words like "peripatetic" will have low scores. STANDUP jokes fall into F-levels, depending on the F-scores of the significant words in them, with FL1 having only very familiar words, and FL9 allowing any words at all; intermediate FL values vary the available level of familiarity. Any joke in a particular F-level will also be in any F-level with a higher number, as FL2 jokes include FL1 jokes, FL3 jokes include FL2 jokes, etc. By setting an FL value here, the user excludes any jokes which would be classed at an F-level with a higher number.
    N.B.: selecting a very restrictive level, such as FL1 or FL2, will greatly reduce the quantity of jokes available.
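
As an illustration of how these two settings constrain generation, the following sketch applies the fixed threshold limit and the nested F-levels. All names here are assumptions for illustration, not STANDUP internals.

```python
# Illustrative sketch of the threshold and familiarity settings above.
# FIXED_LIMIT reflects the typical version-1 value quoted in the text;
# the function names are assumptions, not the real STANDUP code.

FIXED_LIMIT = 0.75   # the built-in limiting threshold (version 1)

def effective_threshold(user_threshold):
    """The user's threshold cannot drop below the fixed limit."""
    return max(user_threshold, FIXED_LIMIT)

def sufficiently_similar(similarity, user_threshold):
    """Similarity is rated 0.0 (completely different) to 1.0 (identical)."""
    return similarity > effective_threshold(user_threshold)

def within_familiarity(joke_level, user_max_level):
    """F-levels nest (FL2 jokes include FL1 jokes, etc.), so a joke is
    allowed if its own level number does not exceed the setting."""
    return joke_level <= user_max_level

print(sufficiently_similar(0.9, 0.8))   # True
print(sufficiently_similar(0.7, 0.5))   # False: 0.5 is below the fixed limit
```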

Generation options

  • Generate only new jokes: The STANDUP system keeps a note of every joke that has been generated for each user. When this option is set, the system will not offer, when asked for a new joke, any joke that the user has previously been shown. When the option is not set, it is possible that a previously shown joke might reappear.
  • Enable joke cache: A joke cache contains details of a relatively small number (a few hundred) of jokes which the system has generated previously, but not for any particular user. That is, they are unseen but pre-constructed jokes, ready for display when requested. When running the full version of the STANDUP system, this option appears, allowing control of how joke generation proceeds. If this option is set, the system will, when asked for a new joke, search the cache first to see if any pre-constructed joke meets the current requirement (e.g. being on a particular subject). If its search of the cache does not yield a suitable item, the system will try to carry out normal joke generation. When this option is not set, the system will proceed directly to normal joke generation, bypassing the cache. To operate with the cache only, use the demo version of the STANDUP system.
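
The cache-first flow described above can be sketched as follows; the function and variable names are illustrative assumptions, not the actual STANDUP code.

```python
# Sketch of the generation flow: check the cache of pre-constructed
# jokes first, then fall back to normal generation, optionally
# filtering out jokes the user has already seen. All names are
# illustrative assumptions.

def get_new_joke(requirement, cache, generate, seen, only_new=True,
                 use_cache=True):
    """Return a joke meeting `requirement`, or None if none is found.

    `cache` is a list of pre-built jokes; `generate` is a function that
    attempts normal joke generation; `seen` is the set of jokes already
    shown to this user.
    """
    def acceptable(joke):
        return joke is not None and not (only_new and joke in seen)

    if use_cache:
        for joke in cache:
            if requirement(joke) and acceptable(joke):
                return joke
    # cache disabled, or no suitable cached joke: normal generation
    joke = generate(requirement)
    return joke if acceptable(joke) else None

# Usage: ask for any joke, with one cached joke already seen.
cache = ["old pun", "fresh pun"]
joke = get_new_joke(lambda j: True, cache, lambda req: "generated pun",
                    seen={"old pun"})
print(joke)   # "fresh pun" comes from the cache
```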

Allowed ambiguity

These options limit the words that can be used in jokes.

  • One unambiguous meaning: Choosing "one" means that only words with a single meaning (in the STANDUP dictionary) can be used; this is highly restrictive, and will greatly reduce the quantity of jokes available.
  • Two ambiguous meanings: Choosing "two" will allow words with two meanings. Although this will allow more jokes, the STANDUP dictionary may assign more meanings to a word than the average English speaker would expect, and so a word which is thought to have exactly two meanings might be ruled out because the STANDUP dictionary has additional (perhaps obscure) meanings.
  • Many ambiguous meanings: This allows words to have any number of meanings (including one or two). The setting "many" is probably the most appropriate under most circumstances.

11.2.2 Joke types and schemas

The list on the left shows which joke types are available to the system. This list can be edited using the Add and Remove buttons below it. Each of the "joke types" corresponds to one of the choices on the Kinds of joke menu (Section 6.3), as follows:

  Joke Type     Informal Description (in menu)
  master        a joke of any kind
  cross         starts with "What do you get when you cross ... with ..."
  call          starts with "What do you call ...."
  difference    starts with "What is the difference between ...."
  similarity    starts with "How is .... like ...."
  type          starts with "What kind of .... ...."
  juxtapose     two similar-sounding words are used one after another
  partial       part of a word sounds like another word
  substitution  swaps a word with a similar word
  idiom         uses common phrases
  spoonerism    the starts of two words are swapped around

The list on the right shows which joke schemas are available to the system. This list can be edited using the Add and Remove buttons below it. There are a total of 11 possible schemas.

A "joke schema" is an abstract description of the relationships between words, phrases and lexicon entries which are required to build a joke. Each STANDUP joke is based on exactly one schema. These do not correspond exactly to joke types, because the same schema can be phrased in words in more than one way. Users should read the technical documentation for the STANDUP software if they wish to find out more.

12. Files and folders

Most users of the STANDUP system will not need to know where and how it stores files. Some brief details are given here for those who need them.

In the folder where the main STANDUP system (.JAR file) is located (see the installation instructions), a folder called standupdata will be created (if not already there) the first time that this installation of STANDUP is run. Within that, there are folders lexicons (containing files which control the use of the STANDUP dictionary), options (containing option files as described in Section 10 above), and users (containing information about all the users known to the system, including their "profiles"). The "users" folder contains a separate folder for each user, and a further folder for a pseudo-user called "login", which holds the settings used during the STANDUP system's start-up phase, before a specific user is logged on (see Section 10.2 above). Each user's folder contains a folder called log which contains log files; i.e. traces of events during the user's sessions with the STANDUP system (if this facility has been turned on via the STANDUP Control Panel).
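
Schematically, the folder layout described above is as follows (the user name is a placeholder):

```
<folder containing the STANDUP .JAR file>/
  standupdata/
    lexicons/    (dictionary control files)
    options/     (option files, Section 10)
    users/
      <user name>/
        log/     (session log files, if enabled)
      login/     (settings for the pre-login phase, Section 10.2)
```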

There is no clean way, via the STANDUP Control Panel, to remove a user from the STANDUP system's list of known users. However, if this must be done, then closing down STANDUP and then deleting the folder corresponding to that particular user (within the "users" folder described above) should be effective. Do not delete the "login" folder!
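
For those who would rather script this manual removal (STANDUP itself provides no such command), the following Python fragment deletes a named user's folder while refusing to touch "login". The function name and path handling are illustrative assumptions; close STANDUP before running anything like this.

```python
# Hedged sketch of removing a user by deleting their folder, as
# described above. Run only while STANDUP is closed. The paths below
# follow the layout described in this section; adjust standup_dir to
# wherever your .JAR file lives.

import shutil
import tempfile
from pathlib import Path

def remove_standup_user(standup_dir, user_name):
    """Delete a user's folder under standupdata/users, refusing to
    touch the special "login" pseudo-user."""
    if user_name == "login":
        raise ValueError('never delete the "login" folder')
    user_folder = Path(standup_dir) / "standupdata" / "users" / user_name
    if user_folder.is_dir():
        shutil.rmtree(user_folder)   # removes profile, favourites, logs
        return True
    return False                     # no such user

# Example with a temporary directory standing in for the install folder:
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "standupdata" / "users" / "alice").mkdir(parents=True)
    print(remove_standup_user(d, "alice"))   # True: folder deleted
```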

Acknowledgements

The STANDUP software was developed between 2003 and 2006 at the universities of Aberdeen (Department of Computing Science), Dundee (School of Computing) and Edinburgh (School of Informatics), supported by grants GR/S15402/01 and GR/S15419/01 from the Engineering and Physical Sciences Research Council (UK). Design work was carried out by the whole team (Rolf Black, Ruli Manurung, Dave O'Mara, Helen Pain, Graeme Ritchie, Annalu Waller) and implementation was by Ruli Manurung. The system makes use of various public domain resources, including data taken from the WordNet lexical system (http://wordnet.princeton.edu), information from the Unisyn pronunciation dictionary (http://www.cstr.ed.ac.uk/projects/unisyn), the PostgreSQL relational database package (http://www.postgresql.org/), and two sets of pictorial symbols:

  • The Widgit Rebus symbol set © Widgit Software Ltd. All Rights Reserved Worldwide. Used with permission. Widgit Software & Logotron Ltd, 124 Cambridge Science Park, Milton Road, Cambridge CB4 0ZS. Website: www.widgit.com.
  • The Picture Communication Symbols © 1981-2006 by Mayer-Johnson LLC. All Rights Reserved Worldwide. Used with permission. Mayer-Johnson LLC, P.O. Box 1579, Solana Beach, CA 92075, USA. Phone: 858-550-0084. Fax: 858-550-0449. Email: mayerj@mayer-johnson.com, website: www.mayer-johnson.com.

References

Fellbaum, C. (1998). WordNet: An Electronic Lexical Database. MIT Press, Cambridge, Mass.

Miller, G.A., Beckwith, R., Fellbaum, C., Gross, D., Miller, K. & Tengi, R. (1990). Five Papers on WordNet. International Journal of Lexicography, 3, 4, Winter 1990, Revised March 1993.

Manurung, R., O'Mara, D., Pain, H., Ritchie, G., Waller, A. (2006) Building a lexical database for an interactive joke-generator. In Proceedings of LREC 2006, Genoa, May 2006.

Manurung, R., Ritchie, G., O'Mara, D., Waller, A., Pain, H. (2006) Combining lexical resources for an interactive language tool. In Proceedings of ISAAC 2006, Duesseldorf, August 2006.

O'Mara, D., Waller, A., Ritchie, G., Pain, H., Manurung, H.M. (2004). The role of assisted communicators as domain experts in early software design. In Proceedings of the 11th Biennial Conference of the International Society for Augmentative and Alternative Communication (CD), Natal, Brazil, 6-10 October 2004.

O'Mara, D., Waller, A., Manurung, R., Ritchie, G., Pain, H., Black, R. (2006) Designing and evaluating joke-building software for AAC users. In Proceedings of ISAAC 2006, Duesseldorf, August 2006.

Ritchie, G., Manurung, R., Pain, H., Waller, A., O'Mara, D. (2006) The STANDUP Interactive Riddle Builder. IEEE Intelligent Systems 21 (2), March/April. Pp. 67-69.

Waller, A., O'Mara, D., Manurung, R., Pain, H., and Ritchie, G. (2005). Facilitating User Feedback in the Design of a Novel Joke Generation System for People with Severe Communication Impairment. In HCII 2005 (CD), Vol.5, G. Salvendy (Ed). Lawrence Erlbaum, NJ, USA.

Terms and Conditions

Terms of Use of STANDUP Software

  1. The various computer programs and data files that constitute the STANDUP system and the STANDUP Options Authoring Tool are copyright 2004-2006 by the University of Edinburgh, the University of Dundee and the University of Aberdeen.
  2. The various computer programs and data files that constitute the STANDUP system and the STANDUP Options Authoring Tool may not be distributed without the permission of an appropriate representative of these universities.
  3. The STANDUP software is supplied without any warranty. The authors of the software, and their employers, are not responsible for any consequences of using this software.
  4. The Widgit Rebus symbol set is copyright by Widgit Software Ltd, all rights reserved worldwide, and is used with permission. Users may not extract Widgit Rebus image data from the STANDUP software.
  5. The Picture Communication Symbol set is copyright 1981 - 2006 by Mayer-Johnson LLC, all rights reserved worldwide, and is used with permission. Users may not extract PCS image data from the STANDUP software.
  6. STANDUP graphic icons (including the jester-robot) are copyright 2006 Regina Fernandes/Illugraphics. Users may not extract graphic image data from the STANDUP software.

STANDUP Installation Guide (FULL mode) for Version 1.4.1 Release B

(Last update: 24 May 2007)

This guide contains step-by-step installation instructions for upgrading the STANDUP system from SIMPLE to FULL mode on a Windows XP machine.

Currently, the process is rather cumbersome and requires considerable (and patient) user intervention.

Instructions for uninstalling the STANDUP FULL Upgrade are given at the end of this document.

Have you got STANDUP SIMPLE?

The files described here are merely an upgrade: you must have STANDUP already installed and runnable in SIMPLE mode on your machine in order to get STANDUP to run in FULL mode. After the upgrade, you can still run STANDUP in SIMPLE mode if you wish.

If you have STANDUP already working in SIMPLE mode on your machine, this means that you have enough memory (512 MB, although more is better), a fast enough processor (1.5 GHz) and a suitable Java environment (Version 5 or later). The upgrade installation will consume about 4 GB of disk space.

PostgreSQL

The STANDUP FULL upgrade procedure consists entirely of installing the free PostgreSQL software (not developed by the STANDUP project) and loading a large STANDUP lexical database into PostgreSQL.

The availability of PostgreSQL determines whether STANDUP FULL mode can run on your machine. As of May 2007, PostgreSQL is supported on Windows 2000, XP and 2003 (only on 32 bit systems); you can check the PostgreSQL website for the latest on this. PostgreSQL requires functionality that is not available on Windows 95/98/ME, and will not run on them. If you want to try running PostgreSQL on these platforms, the PostgreSQL developers suggest looking at the Cygwin port, which has basic support for Windows 9x platforms, but we do not know how feasible that is.
N.B.: If you're using these old operating systems, the PC in question is probably not large enough or fast enough to run STANDUP anyway.

Although in principle STANDUP should be able to run on Macs, Unix/Linux, and other Windows variants (e.g. Windows Vista) using an appropriate underlying Java and PostgreSQL setup, we currently do not provide any instructions for these systems, and offer no guarantee that the STANDUP software will run on these systems.

N.B. Once the files have been fully downloaded and extracted (unzipped), the lexical database installation procedure (see steps 1 and 2 below) takes about an hour, and requires user interaction at various stages.

The files

The download from the STANDUP website should have provided you with a ZIP (compressed) folder/file. Extract the files from that folder. These should be:

  • STANDUP Full.bat
  • README_Full.TXT
  • LICENCE.TXT
  • a ZIP folder containing the PostgreSQL system, named something like postgresql-8.1.5-1.zip
  • the STANDUP database file (a PostgreSQL "backup" file), named something like standup_v1.4_061127_joke.backup
  • a folder docs containing a number of files of documentation, including the Installation Guide for the STANDUP FULL Upgrade.

The recommended locations for these files are:

  • STANDUP Full.bat : in the folder where the other STANDUP .BAT file, from the STANDUP SIMPLE installation (STANDUP Simple.BAT) is located. This matters - the system will not work if STANDUP Full.bat is not alongside the jars folder from the SIMPLE installation.
  • README_Full.TXT, LICENCE.TXT: the location of these is not critical, but it's probably convenient to put them in the same folder as STANDUP Full.bat. LICENCE.TXT should be there already, from the SIMPLE installation.
  • the PostgreSQL ZIP file can go anywhere that is convenient (e.g. wherever you keep downloads). You will be extracting its contents and then using them to set up PostgreSQL (See "Installation" below).
  • the STANDUP database "backup" file can go anywhere that is convenient. You will have to be able to find it when running PostgreSQL to set up the lexical database.
  • the contents of the docs are best placed in the docs folder which is located in the existing STANDUP Simple folders. (But you can put them elsewhere if you prefer.)

Installation

STEP 1: Setting up PostgreSQL

Note: You will need administrative privileges on your Windows system to carry out the PostgreSQL installation.

(We will assume here you are installing PostgreSQL version 8.1, but you may need to look for different filenames if you are using a later version, such as 8.2.)

  1. Extract the files from the PostgreSQL installation ZIP folder (see above), placing them somewhere convenient on your hard drive. Double-click the postgresql-8.1.msi file to begin the installation process.
  2. Leave the selected language as English and click "Start". Click "Next" twice.
  3. You should then see the "Installation options" screen. The default behaviour of PostgreSQL is to install itself under C:\Program Files\PostgreSQL\8.1\ -- if this presents a problem, you can change it here by clicking the 'Browse' button.
  4. Click "Next". At the next screen, you can leave all the default settings as they are; just make sure that "Install as a service" is checked.
  5. You can enter any password you want here, but if you just leave it blank, one will be randomly generated for you. This is the password for the Windows account that will run the service, not the database superuser account (that comes later).
  6. Click Next. If it asks for confirmation whether to create the account, click Yes.
  7. You should then see the "Initialise database cluster" screen.
    Set locale to "English, United Kingdom".
    Set encoding to "UTF-8".
    Set superuser name to "postgres".
    Set password to "pgsuper!" (without the quotation marks).
    Reconfirm password: "pgsuper!".
  8. Click Next. You should then see the "Enable procedural languages" screen.
  9. Make sure "PL/pgsql" is checked and click Next. You should then see the "Enable contrib modules" screen. Leave things as is and click Next.
  10. Click Next again. This should begin the installation. It might take a few minutes.
  11. Click Finish.

STEP 2: Restoring the STANDUP database

Now that PostgreSQL is installed, we need to load (or, in Postgres parlance, "restore") the standup_v1.4 database, using the "backup" file that you extracted from the STANDUP FULL Upgrade ZIP folder (above).

There are two ways to restore the database: using pgAdmin III, the PostgreSQL administration GUI tool, or entering commands at an MS-DOS command prompt. Both accomplish the same thing, so it is down to your preference.

(Again, we shall assume that you have installed PostgreSQL Version 8.1, but you will need to make the obvious changes to folder names, etc., if the number is 8.2 or later.)

 

The GUI way

  1. Launch the pgAdmin III tool: go to the Start menu, choose Programs > PostgreSQL > pgAdmin III
  2. On the left side of the window should be a list of Servers containing 1 entry: "PostgreSQL Database Server 8.1 (localhost:5432)". Double-click this entry.
  3. A "Connect to server" dialog box should pop up. Enter the password you entered earlier: "pgsuper!" (without the quotation marks) and click OK.
  4. Some new entries should appear: Databases, Tablespaces, Group Roles, and Login Roles. Right-click on Databases and choose "New Database".
  5. A "New Database" dialog box should pop up. Enter name: "standup_v1.4". Leave everything else as is (everything else should be empty except Encoding, which should be "UTF8"). Click OK. This will create the "standup_v1.4" database.
  6. Now double-click the "Databases" entry to expand it. You should see the 'standup_v1.4' database there.
  7. Right-click on "standup_v1.4" and choose "Restore". The "Restore Database standup_v1.4" dialog box should pop up. Click the "..." button next to the Filename field, and locate the standup_v1.4_061127_joke.backup file you extracted from the STANDUP download (see above). Click OK and the database restore process will begin. This can take anywhere between thirty minutes and a few hours depending on the configuration of the computer being used (in particular, hard disk speed and amount of RAM). If you spot an error saying 'could not execute query: ERROR: language "plpgsql" already exists', just ignore it -- it's perfectly normal.
  8. Once the restore process is complete, it should say something like: "WARNING: errors ignored on restore: 1 Process returned exit code 1." -- this is simply reporting the aforementioned error.
  9. At this point, do NOT click the "OK" button! This will cause PostgreSQL to try and restore the database again, and this will only serve to confuse it! Click the "Cancel" button instead.
  10. Exit the pgAdmin III application by choosing File > Exit.

The command-line way

  1. Open a DOS command prompt. You can do this by going to the Start menu and choosing "Run...". In the resulting dialog box, type in "cmd" and click OK. (Alternatively, find the "Command Prompt" in the Programs menu; it's usually in "Accessories".)
  2. If you haven't changed any settings above, enter this command to create the database:
    "C:\Program Files\PostgreSQL\8.1\bin\createdb.exe" -E UTF8 -U postgres "standup_v1.4"
  3. If successful, it should return with a CREATE DATABASE message. Now, to restore the database, enter this command:
    "C:\Program Files\PostgreSQL\8.1\bin\pg_restore.exe" -i -U postgres -d "standup_v1.4" -v "C:\My Documents\standup_v1.4_061127_joke.backup"
    (This example assumes you placed the .backup file in the My Documents folder. If you placed it somewhere else, change the command above accordingly.)
  4. This can take anywhere between thirty minutes and a few hours depending on the configuration of the computer being used (in particular, hard disk speed and amount of RAM). If you spot an error saying 'could not execute query: ERROR: language "plpgsql" already exists', just ignore it -- it's perfectly normal. Once the restore process is complete, it should say something like: WARNING: errors ignored on restore: 1 -- this is simply reporting the aforementioned error.
  5. Close the DOS window by entering the command exit or pressing the 'X' icon in the top right corner.

STEP 3: Running the STANDUP system

If everything has gone OK up until now, you should be able to run the system!

To launch the system, open (double-click) the STANDUP Full.bat file (after making sure it is in the folder which contains the jars folder from the STANDUP SIMPLE installation, as explained above). If you haven't completed steps 1 and 2 above, this attempted launch will result in an error.

As with STANDUP SIMPLE, when you run the STANDUP system, it will create data in a standupdata folder alongside the BAT file you are starting from. Hence, the user running the STANDUP system MUST have permission to create files/folders in this location, otherwise the system will crash during start-up.
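
If you want to confirm in advance that the current user can create files in that location, a simple probe like the following works. This helper is a hypothetical sketch, not part of STANDUP:

```python
import os
import uuid

def can_create_files(folder):
    """Probe write permission by creating and then deleting a scratch file."""
    probe = os.path.join(folder, "write-test-" + uuid.uuid4().hex)
    try:
        with open(probe, "w"):
            pass
        os.remove(probe)
        return True
    except OSError:
        # covers missing folders as well as permission errors
        return False
```

Call it with the folder containing the BAT file; if it returns False, STANDUP will not be able to create its standupdata folder there.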

The initial set-up phase may take a couple of minutes. Once it is done, you will be presented with the STANDUP User Interface! Consult the STANDUP User Manual for more information about this. (There is a copy in the docs folder.)

If all this is working, then no further interaction with PostgreSQL should be needed. Opening the STANDUP Full.BAT file is all you need to do when you want to run STANDUP. You can still run it in SIMPLE mode using STANDUP Simple.BAT.

Optional voice upgrade

The optional voice upgrade for STANDUP is compatible with both STANDUP SIMPLE and STANDUP FULL. Hence, if you have already installed the upgraded voice, it should work with STANDUP FULL; alternatively, you can install the improved voice later.

Uninstalling the STANDUP FULL Upgrade

At present, there is no simple way to remove the STANDUP FULL Upgrade from a computer (i.e. so that only STANDUP SIMPLE is installed) -- the only method is to undo all the steps originally taken during installation.

Files to be removed

The first step involves deleting all the files you extracted from the downloaded STANDUP FULL Upgrade ZIP folder/file:

  • STANDUP Full.bat
  • README_Full.TXT
  • postgresql-8.1.5-1.zip (but you can keep this if you might be re-installing PostgreSQL in the future). Also, delete the files you extracted from this ZIP folder.
  • standup_v1.4_061127_joke.backup (but you can keep this if you might be re-installing this version of STANDUP FULL in the future)
  • the files which were in the folder docs (be careful not to delete other documentation files which were in your original STANDUP SIMPLE installation already).

After the above deletions, PostgreSQL will still be installed on your PC, and the STANDUP lexical database will still be installed within PostgreSQL. This takes up quite a bit of space.

Removing PostgreSQL

If you do not wish to make further use of the PostgreSQL system, you can remove that from your computer. How to do this is not properly part of the STANDUP uninstall procedure, but here are some suggestions:

  • Use the Add/Remove Programs in the Windows XP Control Panel to remove PostgreSQL.
  • Check the postgres folder within your Program Files folder (or wherever else you installed PostgreSQL) to see if any files and folders have been left behind in it. If so, delete these (and the postgres folder).
  • PostgreSQL may have left a "service account" defined on your Windows system, created as part of the installation procedure. On Windows XP Professional, following the menus Control Panel -> Administrative Tools -> Computer Management -> Local Users and Groups will take you to a list where the "postgres" service account can be removed. Alternatively, on Windows XP Home (or Professional), enter "net user postgres /delete" from the Command Prompt. If you changed the name of the Postgres service account during the installation, replace the account name accordingly.

After all that, the STANDUP Full Upgrade should be expunged from your computer, but STANDUP should still run in SIMPLE mode as before.

STOP - Generating Tailored Smoking Cessation Letters

Aims:

  • to develop a computer system for generating tailored letters to help people stop smoking
  • to research knowledge acquisition (KA) techniques to acquire text-planning and sentence-planning rules from domain experts
  • to evaluate the clinical effectiveness of the computer generated letters in a general practice setting
  • to evaluate the cost effectiveness of this brief smoking cessation intervention

The results of our clinical trial suggested that while sending smokers a letter could help a small but useful number of people quit, the tailored letters were no more effective in this regard than the non-tailored letters. The tailored letters may have been slightly more effective with heavy smokers and others who found it especially difficult to quit, but the evidence for this is not conclusive.

SumTime - Generating Summaries of Time Series Data

Project Summary

Currently there are many visualisation tools for time-series data, but techniques for producing textual descriptions of time-series data are much less developed. Some systems have been developed in the natural-language generation (NLG) community for tasks such as producing weather reports from weather simulations, or summaries of stock market fluctuations, but such systems have not used advanced time-series analysis techniques.

Our goal is to develop better technology for producing summaries of time-series data by integrating leading-edge time-series and NLG technology.

SumTime Parallel Corpus

SumTime-Meteo: A parallel corpus of weather data and the corresponding human-written forecast texts

Demo

SumTime-Mousam Demo - Generates only Wind Descriptions

IGR

Final Report (IGR) to EPSRC about SumTime

Publications

Links to publications

Project Team

Collaborators

Related Links


Funded by EPSRC

TUNA - Towards a UNified Algorithm for the Generation of Referring Expressions
Overview

Towards a UNified Algorithm for the Generation of Referring Expressions

TUNA was a research project funded by the UK's Engineering and Physical Sciences Research Council (EPSRC). It involved a collaboration between the Department of Computing Science, University of Aberdeen, the Open University, and the University of Tilburg. The project started in October 2003 and ended in February 2007.

Natural Language Generation programs generate text from an underlying Knowledge Base. It can be difficult to find a mapping from the information in the Knowledge Base to the words in a sentence. Difficulties arise, for example, when the Knowledge Base uses `names' (ie, database keys) that a hearer/reader does not understand. This can happen, for instance, if the Knowledge Base contains an artificial name like `#Jones083', because `Jones' alone is not uniquely distinguishing; it is also true if the Knowledge Base deals with entities for which no names at all are in common usage (eg, a specific tree or a chair). In all such cases, the program has to "invent" a description that enables the reader to identify the referent. In the case of Mr. Jones, for example, the program could give his name and address; in the case of a tree, some longer description may be necessary (eg, `the green oak on the corner of ... and ...'). The technical term for this set of problems is Generation of Referring Expressions (GRE). GRE is a key aspect of almost any Natural Language Generation system.

Existing GRE algorithms tend to focus on one particular class of referring expressions, for example conjunctions of atomic or relational properties (eg, `the black dog', `the book on the table'). Our research is aimed at designing and implementing a new algorithm for the generation of referring expressions that generates appropriate descriptions in a far greater variety of situations than any of its predecessors. The algorithm will be more complete than its predecessors because it is able to construct a greater variety of descriptions (involving negations, disjunctions, relations, vagueness, etc.). The descriptions generated should also be more appropriate (ie, more natural in the eyes of a human hearer/reader), because the algorithm will be based on empirical studies involving corpora and controlled experiments. Among other things, these empirical studies will address the question of under what circumstances descriptions should be logically under- or over-specific; they will also allow us to prune the search space (ie, the space of all descriptions), which would otherwise threaten to make the problem intractable. The project combines (psycho)linguistic, computational and logical challenges and should be of interest to people whose intellectual home is in any of these areas.

Project Members

  • Kees van Deemter (PI, University of Aberdeen)
  • Richard Power (Co-Investigator, Open University)
  • Emiel Krahmer (Visiting Fellow, University of Tilburg)
  • Ielka van der Sluis (Post-Doctoral Research Fellow)
  • Albert Gatt (Research student)
  • Sebastian Varges (Post-Doctoral Research Fellow, 2003-2005)

Background Reading

Papers that describe some of the technical background to the project (in pdf format):

TUNA Publications

Reports

Journal papers

Book chapters

  • van Deemter, K., and Krahmer, E. (2007). Graphs and Booleans. In: H. Bunt and R. Muskens (eds.), Computing Meaning III. Dordrecht: Kluwer Academic Publishers.

Conference papers

2007

2006

2003-2005

Workshop papers

2007

2003-2006

Annotated Bibliography

The bibliography is split into categories for convenience. Works may be relevant to more than one category. Each category contains links to relevant references listed in other categories. Links to papers are given where available. Some papers have an associated description.

Algorithms and meta-algorithms for GRE

Bateman, J.A. (1999). Using aggregation for selecting content when generating referring expressions. Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, ACL-99. Extends a lattice-based approach to aggregation to the case of referring expressions generation. Lattices are used to represent common properties between entities (nodes in the lattice). This results in a static representation of domain knowledge which can be processed efficiently to select identifying properties of a target referent and approximate minimal descriptions.

Dale, R. (1989). Cooking up referring expressions. Proceedings of the 27th Annual Meeting of the Association for Computational Linguistics, ACL-89. Describes the GRE algorithm implemented in the EPICURE system, which generates recipes. Input to the algorithm is a structured representation, an instance of the basic ontological category physobj containing information about the properties, quantity and state of the object, and whether it is mass or count. The algorithm proceeds to search for a distinguishing description by selecting properties on the basis of their discriminatory power, calculated in terms of the number of distractors they exclude. Based on a greedy heuristic, the algorithm seeks to satisfy the 'full brevity' interpretation of the Gricean maxim of quantity; the shortest possible distinguishing description is generated. See Oberlander and Dale (1991) for an extension of the algorithm to events.
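
The discriminatory-power strategy described in this entry can be sketched in a few lines. This is an illustration only: the entity/property encoding and names are invented for the example, and EPICURE's actual algorithm works over richer structured representations:

```python
def greedy_describe(target, domain):
    """Greedy heuristic: at each step, pick the property of the target that
    rules out the most remaining distractors; stop when none are left."""
    distractors = {e for e in domain if e != target}
    description = []
    while distractors:
        best = max(
            (p for p in domain[target] if p not in description),
            key=lambda p: sum(1 for d in distractors if p not in domain[d]),
            default=None,
        )
        # no property left, or the best one excludes nothing: no solution
        if best is None or all(best in domain[d] for d in distractors):
            return None
        description.append(best)
        distractors = {d for d in distractors if best in domain[d]}
    return description
```

Here `domain` maps each entity to its set of properties; the returned list is a distinguishing description, or None if no property combination singles out the target.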

Dale, R., and Reiter, E. (1995). Computational interpretation of the Gricean maxims in the generation of referring expressions. Cognitive Science, 19(2): 233-263. Describes previous approaches to GRE: the Full Brevity algorithm, based on the greedy heuristic, and Local Brevity. Argues for a weak interpretation of the Gricean maxim of quantity, based on psycholinguistic evidence. Demonstrates the intractability of the 'full brevity' approach to descriptions: finding a brief description is equivalent to minimal set cover, i.e. is NP-Hard. Proposes the Incremental Algorithm which performs hillclimbing along a predetermined preference ordering of descriptors, without backtracking, resulting in descriptions which contain some redundant descriptors. Redundancy is justified on the basis of psycholinguistic evidence. Complexity is linear in the number of descriptors. See Dale and Reiter (1996) for a more detailed outline of the theoretical stance on the Gricean maxims. See Jordan and Walker (2000) for an empirical evaluation of the Incremental Model relative to other theories of reference.
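
The selection loop of the Incremental Algorithm can be sketched as follows. This is a simplification: properties are treated as atomic rather than attribute-value pairs, the domain encoding is invented for the example, and the published algorithm additionally always includes the referent's type:

```python
def incremental_describe(target, domain, preference_order):
    """Walk a fixed preference order once; keep a property iff the target has
    it and it removes at least one distractor; never backtrack."""
    distractors = {e for e in domain if e != target}
    description = []
    for prop in preference_order:
        if not distractors:
            break
        if prop in domain[target]:
            excluded = {d for d in distractors if prop not in domain[d]}
            if excluded:
                description.append(prop)
                distractors -= excluded
    # fail if the full preference list still leaves distractors
    return description if not distractors else None
```

Because each property is examined once and never withdrawn, the run time is linear in the number of descriptors, as the entry notes.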

Horacek, H. (1997). An algorithm for generating referential descriptions with flexible interfaces. Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics, ACL-97.

Krahmer, E., van Erk, S., and Verleg, A. (2001). A meta-algorithm for the generation of referring expressions. Proceedings of the 8th European Workshop on Natural Language Generation.

Krahmer, E., van Erk, S., and Verleg, A. (2002). Graph-based generation of referring expressions. Computational Linguistics, 28(1). Reinterprets the GRE content selection task as a subgraph isomorphism problem, formalising the domain as a labelled directed graph D, with vertices representing entities and arcs representing properties. Atomic properties and 2-place relations are uniformly represented as disjoint subsets of the set of labels. Avoids the problem of infinite recursion in the generation of relational descriptions reported by Dale and Haddock (1991). Finding a distinguishing description for an intended referent e is a process of constructing a subgraph G of D which corresponds to e and to no other entity. A branch-and-bound algorithm is given to solve this problem: to identify a referent e, the algorithm starts with the subgraph containing only the vertex e and recursively expands the graph by adding edges from D which are adjacent to the subgraph G. It is shown that the graph-theoretic framework can accommodate other approaches, such as Dale and Reiter's Incremental Algorithm. Contains proposals to incorporate salience weightings and cost-functions to guide subgraph expansion.

Mouret, P., and Rolbert, M. (1998). Dealing with distinguishing descriptions in a guided composition system. Proceedings of the 17th Conference on Computational Linguistics, COLING/ACL-98. Approaches the GRE problem from the point of view of guided composition in user interfaces, where the user is informed at every step what the options are for completing the current utterance. In this paradigm, the GRE problem is not only to identify a description as distinguishing, but also to identify possible completions of an incomplete description. The paper offers a formalisation of Dale's (1989) notion of distinguishing descriptions, and extends it to cover cases of inclusion, where one description subsumes another either because of hyperonymy (the dog vs. the animal) or because of given information about the intended referent in prior discourse (the child who robbed -> the robber).

Reiter, E. (1990a). The computational complexity of avoiding false implicatures. Proceedings of the 28th Annual Meeting of the Association for Computational Linguistics, ACL-90. Proves the intractability of the Full Brevity algorithm of Dale (1989) (equivalent to a Minimal Set Cover problem). Proposes a version of brevity called Local Brevity, incorporating a weaker version of the Gricean maxim of Quantity. The algorithm proceeds by checking that a component of a description cannot be replaced locally by a briefer new component without loss of discriminatory power. Complexity is polynomial. See also Reiter (1990b).
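
With atomic properties, the local-brevity condition in this entry reduces to checking that a distinguishing description contains no property that could simply be dropped. A sketch under that simplification (the entity/property encoding is invented; Reiter's formulation also covers replacing a complex component by a briefer one):

```python
def locally_brief(description, target, domain):
    """True iff the description distinguishes the target and no single
    property can be removed without losing that."""
    def distinguishes(props):
        # every distractor must lack at least one of the chosen properties
        return all(any(p not in domain[d] for p in props)
                   for d in domain if d != target)
    if not distinguishes(description):
        return False
    return not any(
        distinguishes(description[:i] + description[i + 1:])
        for i in range(len(description))
    )
```

Each removal check is a simple polynomial-time scan over the domain, in line with the complexity claim above.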

Reiter, E., and Dale, R. (1992). A fast algorithm for the generation of referring expressions. Proceedings of the 14th International Conference on Computational Linguistics, COLING-92. An earlier version of the Incremental Algorithm of Dale and Reiter (1995).

Stone, M., and Webber, B. (1998). Textual economy through close coupling of syntax and semantics. Proceedings of the 9th International Workshop on Natural Language Generation. Describes the approach to generating object descriptions in the SPUD system, which combines semantics/pragmatics and their associated syntactic structure in an incremental approach to description building, using an ontologically promiscuous knowledge representation and a Lexicalised Tree Adjoining Grammar. At a particular state, the representation of a sentence consists of (a) an instantiated tree; (b) the semantic requirements associated with the tree; and (c) its semantic contributions.

van Deemter, K., and Krahmer, E. (2006). Graphs and Booleans: On the generation of referring expressions. In: H. Bunt and R. Muskens (Eds.), Computing Meaning (Vol. III). Dordrecht: Kluwer. Extends the graph-based formulation of Krahmer et al. (2003) to include Boolean operations such as complementation (negation), reference to sets, and set union (disjunction). For the latter, a graph partition algorithm is proposed, which seeks to construct descriptions in Disjunctive Normal Form, rewritten as partitions. Partitions are constructed at increasing levels (starting from level 1), until a distinguishing description is found.

Pragmatics of reference, dialogue and description planning

Appelt, D. (1985a). Some pragmatic issues in the planning of definite and indefinite noun phrases. Proceedings of the 23rd Annual Meeting of the Association for Computational Linguistics, ACL-85.

Appelt, D. (1985b). Planning English referring expressions. Artificial Intelligence, 26: 1-33. [Reprinted in: B. J. Grosz, K. Sparck Jones, and B. L. Webber (Eds.). (1986). Readings in Natural Language Processing. Los Altos, Ca.: Morgan Kaufmann]. A classic paper describing the NP generation component of the KAMP system. The generation process is modelled as an interactive process between a speaker (modelled by the system) and a hearer. Inference about speaker and hearer goals, based on Speech Act theory, guides the generation process. The section on generation of definite, referential NPs contains one of the earliest insights into the computational complexity of generating provably minimal descriptions. Appelt proposes a naive incremental model which is, historically, a precursor to Dale and Reiter (1995).

Appelt, D. (1987a). Reference and pragmatic identification. Proceedings of Theoretical Issues in Natural Language Processing, TINLAP-87.

Appelt, D. (1987b). Towards a plan-based theory of referring actions. In: G. Kempen (Ed.), Natural Language Generation: New Results in Artificial Intelligence, Psychology and Linguistics. Dordrecht: Nijhoff and NATO Scientific Affairs Division.

Dale, R., and Reiter, E. (1996). The role of the Gricean maxims in the generation of referring expressions. Proceedings of the AAAI-96 Spring Symposium on Computational Models of Conversational Implicature. A theoretical outline of the role of the Gricean maxims in GRE, with reference to the Dale and Reiter (1995) Incremental Algorithm. The argument is that, rather than directives on human communication, the Gricean maxims are post hoc descriptions of aspects of rational communicative behaviour. Hence, rather than directly modelling the maxims as constraints, GRE algorithms should satisfy their observations if they are sufficiently goal-directed.

Heeman, P.A., and Hirst, G. (1995). Collaborating on referring expressions. Computational Linguistics, 21(3).

Kronfeld, A. (1986). Donnellan's distinction and a computational model of reference. Proceedings of the 24th Annual Meeting of the Association for Computational Linguistics, ACL-86.

Kronfeld, A. (1987). Goals of referring acts. Proceedings of Theoretical Issues in Natural Language Processing, TINLAP-87.

Kronfeld, A. (1989). Conversationally relevant descriptions. Proceedings of the 27th Annual Meeting of the Association for Computational Linguistics, ACL-89. Argues for a distinction between the functional relevance of references, whereby they distinguish their referents, and their conversational relevance, which is related to whether their content is relevant to the current discourse. One of the earliest insights into the problems of relevance and perspective in reference.

Kronfeld, A. (1990). Reference and Computation: An Essay in Applied Philosophy of Language. Cambridge: CUP.

O'Donnell, M., Cheng, H., and Hitzeman, J. (1998). Integrating referring and informing in NP planning. Proceedings of the COLING-ACL workshop on the Computational Treatment of Nominals, ACL-98. Describes the GRE component of the ILEX system. Extends the standard model of GRE as 'generation of identifying descriptions' to NPs which contain attributes that are informative, but are not necessary for identification. The model is based on systemic-functional grammar. During the formation of a referring NP, the informing task influences decisions on which attributes to include in the description, the choice of head noun, and the form in which the final NP is realised (deictic, definite, etc.).

Paris, C.L., and McKeown, K.R. (1987). Discourse strategies for describing complex physical objects. In: G. Kempen (Ed.), Natural Language Generation: New Results in Artificial Intelligence, Psychology and Linguistics. Dordrecht: Nijhoff and NATO Scientific Affairs Division.

Reiter, E. (1990b). Generating descriptions that exploit a user's domain knowledge. In: R. Dale, C. Mellish, and M. Zock (Eds.), Current Research in Natural Language Generation. New York & London: Academic Press.

Empirical Approaches and Evaluation

Gupta, S., and Stent, A. J. (2005). Automatic evaluation of referring expression generation using corpora. Proceedings of the 1st Workshop on Using Corpora in NLG, Birmingham, UK. Evaluates a number of algorithms for GRE, including Dale and Reiter's (1995) Incremental algorithm and Siddharthan and Copestake's (2004) GRE algorithm. Algorithms were also augmented with a function for realising modifiers pre- and post-nominally. Evaluation was automatic, and carried out on the COCONUT and MAPTASK corpora. The output of the algorithms was compared to a baseline, which always selected type information and then arbitrarily added modifiers until the description was distinguishing. The baseline performed best on MAPTASK, but the Dale/Reiter and Siddharthan/Copestake algorithms performed better on the COCONUT data, in which domain objects tend to be more complex, requiring attribute selection.

Jordan, P., and Walker, M. (2000). Learning attribute selections for non-pronominal expressions. Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics, ACL-00.

Jordan, P., and Walker, M. (2005). Learning content selection rules for generating object descriptions in dialogue. Journal of Artificial Intelligence Research, 24: 157-194. Evaluates three competing models of GRE: Dale and Reiter's (1995) incremental model; Brennan and Clark's (1996) conceptual pacts model; and an alternative Intentional Influences model proposed by Jordan (2000), which proposes that attribute selection in a referring expression is a function of current communicative intentions and task constraints. The models were evaluated by training a machine learner on descriptions from the COCONUT corpus, annotated with the relevant information, and comparing the machine learner's output with the descriptions in the corpus. Intentional Influences provides the best fit to the data (42.4%), although a combination of all three models performs best (60%).

Relations, Spatial/Scene Descriptions, Reference to Events

Arbib, M.A., Conklin, E.J., and Hill, C. (1986). From Schema Theory to Language. Oxford: OUP. Presents an integrated psycholinguistic, neuro-cognitive and computational approach to language, based on a schema-theoretic model of cooperative computation that seeks to integrate information from different modalities, including vision. Part II contains a description of a scene description generation system built by Conklin, with some discussion of the way salience dynamics were incorporated in the content-selection process for object description.

Conklin, E.J., and McDonald, D.D. (1982). Salience: The key to the selection problem in natural language generation. Proceedings of the 20th Annual Meeting of the Association for Computational Linguistics, ACL-82.

Dale, R., and Haddock, N. (1991). Generating referring expressions containing relations. Proceedings of the 5th Conference of the European Chapter of the ACL, EACL-91. A constraint-based approach to generating distinguishing descriptions containing relations. The algorithm takes three data structures as input: (a) a referent stack with the intended referents; (b) a property set for the intended referent; (c) a constraint network containing properties for the description and the set of domain variables constituting the distractor set. The algorithm proceeds by recursively updating the constraint network until a distinguishing description is found. The search procedure is depth-first. Problems occur when the algorithm keeps trying to recursively identify the referent and the relatum, generating descriptions such as the cup on the floor which is holding the cup which is... Proposed solution: a heuristic that prevents objects from being mentioned more than once in a description.

Horacek, H. (1995). More on generating referring expressions. Proceedings of the 5th European Workshop on Natural Language Generation.

Horacek, H. (1996). A new algorithm for generating referential descriptions. Proceedings of the European Conference on Artificial Intelligence, ECAI-96. Attempts to combine the proposals of Dale and Reiter (1995) and Dale and Haddock (1991) into a unified algorithm, overcoming some of the limitations of the previous proposals by including (a) categorial expectations, i.e. the contextually motivated expectation of the category of the intended referent (which is used to rule out distractors); (b) a unified treatment of atomic and relational descriptors; (c) a parameter max on the depth of search (avoiding the infinite recursion found by Dale and Haddock) and a predicate salient for particularly salient descriptors that warrant inclusion even if they have no discriminatory power; (d) a combination of depth-first and breadth-first search via iterative deepening, favouring flat over embedded descriptions. The algorithm is composed of two sub-routines, describe-by-relation and describe-by-attribute, both of which maintain a constraint network. Input is a communicative goal to identify an intended referent. Successfully generates descriptions such as the table on which there are two bottles without going into infinite looping. Does not incorporate an account of negation and disjunction.

Neumann, B., and Novak, H-J. (1983). Event models for recognition and natural language description of events in real-world image sequences. Proceedings of the International Joint Conference on Artificial Intelligence, IJCAI-83.

Neumann, B. (1984). Natural language description of time-varying scenes. FBI-HH-B-105/84, Fachbereich Informatik, Universitat Hamburg.

Novak, H-J. (1987). Strategies for generating coherent descriptions of object movements in street scenes. In: Kempen, G. (Ed.), Natural Language Generation: New Results in Artificial Intelligence, Psychology and Linguistics. Dordrecht: Nijhoff & NATO Scientific Affairs Division. Describes the NAOS system, which generates descriptions of visual street scenes containing objects and events, after event recognition occurs through micro-analysis of the visual scene. Generation system makes use of an event hierarchy, temporal distinctions between event types (durative/non-durative/inchoative), and case frames à la Fillmore (1968). The GRE module REF is based on an open-world database. Two strategies for GRE are proposed: (a) The system generates referring expressions of an object based on its properties, taking into account whether an object of the same type has already been introduced (which triggers the use of 'other'-anaphora). (b) Objects that have the same properties are distinguished by ordinal numerals (the first X, the second X...).
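The two strategies can be illustrated with a toy sketch. This is an assumed simplification, not the NAOS implementation: a first mention gets an indefinite, a second same-type object triggers 'other'-anaphora, and further same-property objects are distinguished by ordinal numerals.

```python
ORDINALS = {1: "first", 2: "second", 3: "third", 4: "fourth"}

def refer(obj_type, introduced_counts):
    """Return a referring phrase for a newly mentioned object of
    obj_type, tracking how many same-type objects were introduced."""
    n = introduced_counts.get(obj_type, 0) + 1
    introduced_counts[obj_type] = n
    if n == 1:
        return f"a {obj_type}"               # first mention: indefinite
    if n == 2:
        return f"another {obj_type}"         # strategy (a): 'other'-anaphora
    return f"the {ORDINALS[n]} {obj_type}"   # strategy (b): ordinal numeral
```

For example, three successive cars would be referred to as "a car", "another car", "the third car".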

Novak, H-J. (1988). Generating referring phrases in a dynamic environment. In: M. Zock, and G. Sabah (Eds.), Advances in Natural Language Generation, Vol. II. USA: Pinter.

Oberlander, J., and Dale, R. (1991). Generating expressions referring to eventualities. Proceedings of the 15th Annual Meeting of the Cognitive Science Society. Extends the GRE algorithm proposed in Dale (1989) to cover reference to events, taking into account the event/process distinction. While the ontology in Dale (1989) distinguishes mass and count, this algorithm extends the ontology using the analogy between mass/count and event/process proposed by Bach (1986).

Waltz, D. (1981). Generating and understanding scene descriptions. In A. Joshi, B. Webber, and I. Sag (Eds.), Elements of Discourse Understanding. Cambridge: CUP.

Relevant links in other sections:

Salience and context-sensitive GRE

Krahmer, E., and Theune, M. (2002). Efficient generation of descriptions in context. In: K. van Deemter, and R. Kibble (Eds.), Information Sharing. Stanford: CSLI.

An extension of Dale and Reiter's (1995) Incremental Algorithm to take context into account in the generation of reduced anaphoric descriptions and pronouns. The modified Incremental Algorithm uses a salience metric to identify which entities are salient in a given discourse segment, treating only these as the distractors of the intended referent in context. The salience metric is based on a combination of Centring Theory and the Prague theory of discourse focus. The paper also contains an experimental evaluation of the hypotheses that the algorithm seeks to model.
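The core modification can be sketched as a salience filter on the distractor set before attribute selection. This is a minimal illustration with a hypothetical domain and a simple numeric threshold standing in for the Centering/Prague-school salience metric; it is not the authors' implementation.

```python
# Hypothetical toy domain of entities and their attributes.
DOMAIN = {
    "d1": {"type": "dog", "colour": "brown"},
    "d2": {"type": "dog", "colour": "black"},
    "c1": {"type": "cat", "colour": "black"},
}

def incremental(referent, salience, threshold, order=("type", "colour")):
    """Dale & Reiter-style incremental attribute selection, with the
    distractor set restricted to sufficiently salient entities."""
    # Salience filter: only salient entities count as distractors.
    distractors = {e for e, s in salience.items()
                   if e != referent and s >= threshold}
    chosen = {}
    for attr in order:
        value = DOMAIN[referent][attr]
        ruled_out = {d for d in distractors if DOMAIN[d][attr] != value}
        if ruled_out:                 # include attr only if it helps
            chosen[attr] = value
            distractors -= ruled_out
        if not distractors:
            break
    return chosen
```

With the second dog made non-salient (e.g. salience `{"d1": 10, "d2": 2, "c1": 10}` and threshold 5), the algorithm selects only the type attribute for `d1`, licensing the reduced description "the dog"; with all entities salient, it must also select the colour.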

Pattabhiraman, T., and Cercone, N. (1990). Selection: Salience, relevance and the coupling between domain-level tasks and text planning. Proceedings of the 5th International Workshop on Natural Language Generation.

Stevenson, R. (2002). The role of salience in the production of referring expressions: A psycholinguistic perspective. In: K. van Deemter, and R. Kibble (Eds.), Information Sharing. Stanford: CSLI.

TUNA corpus

About the corpus

The TUNA Reference Corpus is a semantically and pragmatically transparent corpus of identifying references to objects in visual domains. It was constructed via an online experiment, and has since been used in a number of evaluation studies on Referring Expressions Generation, as well as in two Shared Tasks: the Attribute Selection for Referring Expressions Generation task (2007), and the Referring Expression Generation task (2008).

Obtaining the TUNA Corpus

A version of the corpus was released for public distribution in October 2009. It forms part of the ELRA Language Resources Catalogue, and can be obtained by contacting ELRA directly. Alternatively, you can download the latest distribution directly.

Annotation and documentation

The following documents describe the annotation procedure and XML format of the corpus:

  1. van der Sluis, I., Gatt, A., and van Deemter, K. (2006). Manual for the TUNA Corpus: Referring expressions in two domains. Technical Report AUCS/TR0705, University of Aberdeen.
  2. Gatt, A., van der Sluis, I., and van Deemter, K. (2008). XML Format Guidelines for the TUNA Corpus. Technical Report, University of Aberdeen.

Publications related to the corpus

These papers describe evaluation studies involving the TUNA Corpus, as well as giving further details on the design of the experiment and annotation.

  1. van Deemter, K., van der Sluis, I. & Gatt, A. (2006). Building a semantically transparent corpus for the generation of referring expressions. Proceedings of the 4th International Conference on Natural Language Generation (Special Session on Data Sharing and Evaluation), INLG-06.
  2. Gatt, A., van der Sluis, I. & van Deemter, K. (2007). Assessing algorithms for the generation of referring expressions, using a semantically and pragmatically transparent corpus.
  3. van der Sluis, I., Gatt, A. & van Deemter, K. (2007). Evaluating algorithms for the generation of referring expressions: Going beyond toy domains.
  4. Gatt, A. and van Deemter, K. (2007). Incremental generation of plural descriptions: Similarity and partitioning.
  5. Gatt, A., van der Sluis, I., and van Deemter, K. (2007). Corpus-based evaluation of referring expressions generation. Workshop on Shared Tasks and Evaluation in NLG, Arlington, Virginia.