Pre-conference Workshops
NLP in CALL
3rd Eurocall workshop organised by the SIG in Language Processing
Aim of the Workshop
Schedule
Papers
(a)
(b) Syntactic and ‘semantic’ error detection
(c) Ontology Enrichment with Conceptual Structures for Cross-Linguistic Disambiguation
(d) Learner-corpora and NLP
(e) What have you done for me lately? The fickle alignment of NLP and CALL
Cost to Participants
Aim of the Workshop
The workshop title is in part borrowed from a presentation John Sinclair gave at a one-day conference on NLP in CALL in Manchester in 1998, which was jointly organised by Eurocall and the Centre for Computational Linguistics at UMIST (Manchester). Four years later, it seems appropriate to endeavour once again to assess the validity of parser-based and corpus-based approaches in CALL research and development. During this workshop, participants will be introduced to examples of Natural Language Processing (NLP) approaches in CALL and will have the chance to familiarise themselves with the application of NLP techniques in CALL. The ways in which such technology can be integrated into computer-assisted language learning will be discussed.

The Special Interest Group in Language Processing is Eurocall's newest SIG. The group organised successful pre-conference workshops for EUROCALL2000 in Dundee and EUROCALL2001 in Nijmegen. This year's (third) workshop places emphasis on two areas of language processing that are highly relevant to CALL: morpho-syntactic parsing for error diagnosis and the use of corpora for language learning and teaching. It brings together presenters from Finland, Germany, Sweden and Switzerland.
Schedule
| 09:00 - 09:30 | Registration / Coffee |
| 09:30 - 09:45 | Opening: Trude Heift (Workshop Chair), Mathias Schulze (Chair of SIGLP) |
| 09:45 - 10:30 | Anju Saxena, Lars Borin |
| 10:30 - 11:15 | Sébastien L’haire |
| 11:15 - 12:00 | Steve Legrand |
| 12:00 - 13:30 | Lunch |
| 13:30 - 14:15 | Veit Reuer |
| 14:15 - 15:00 | Lars Borin |
| 15:00 - 16:00 | Round Table Discussion with the panel of contributors |
| 16:00 - 16:30 | Coffee Break |

Anju Saxena (1), Lars Borin (1, 2)
(1) Department of Linguistics, Uppsala University, Sweden
(2) Computational Linguistics, Department of Linguistics, Stockholm University, Sweden

It is generally acknowledged that the goal of teaching grammar in Linguistics should not primarily be that students memorize definitions of concepts and grammatical constructions, but rather that they understand and learn to recognize different structural patterns. This can hardly be achieved without giving students practical training in the skill of grammatical analysis. Research has shown that hands-on problem-solving is more stimulating and thought-provoking than having information and results handed down to students during lectures. With this in mind, we formulated a project for realizing a new format for teaching courses in grammar in Linguistics, Computational Linguistics, and lesser-taught languages, where practical training and corpus-based exercises will form an integral part of the students’ learning process. (The work described here forms part of the project IT-based Collaborative Learning in Grammar, a collaboration between the universities in Uppsala and Stockholm, funded by the Swedish Agency for Distance Education (DISTUM) for the three years 2002–2004. Anju Saxena is the principal investigator for the project. See also <http://www.ling.uu.se/anjusaxena/distum.html>.) The proposed web-based training material has a modular architecture, composed of four types of modules:
One type of exercise in module 3 will use a treebank (Talbanken; Teleman 1974) together with a grammar writer’s workbench, in a refinement of an idea presented by Borin & Dahllöf (1999). We propose to use grammar rules written by students (using an existing parser frontend) as search expressions in the treebank. Given an NP rule formulated by a student, we could automatically tell how many treebank POS sequences matching the rule actually make up NPs, how many are not NPs, and how many NPs in the treebank are not described by the rule. Many elaborations of this basic scheme are conceivable; it can be seen as a more linguistically sophisticated parallel to the use of (unannotated) text corpora and concordancing software in so-called data-driven language learning (Flowerdew 1996).
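
The three counts described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tag names, the two-sentence toy "treebank", and the use of a regular expression over POS tags in place of a real grammar rule and parser frontend. It is not the Talbanken format or the project's actual tooling.

```python
import re

# Hypothetical toy treebank: each entry is (POS tags, set of gold NP spans).
# Spans are (start, end) token indices with end exclusive.
TREEBANK = [
    (["DET", "ADJ", "NOUN", "VERB", "DET", "NOUN"], {(0, 3), (4, 6)}),
    (["PRON", "VERB"], {(0, 1)}),
]

# A student's NP rule, written here as a regex over space-separated tags:
# optional determiner, any number of adjectives, then a noun.
NP_RULE = r"(DET )?(ADJ )*NOUN"


def rule_matches(tags, rule):
    """Return every (start, end) span whose tag sequence fully matches the rule."""
    pattern = re.compile(rule)
    spans = set()
    for start in range(len(tags)):
        for end in range(start + 1, len(tags) + 1):
            if pattern.fullmatch(" ".join(tags[start:end])):
                spans.add((start, end))
    return spans


def evaluate_rule(treebank, rule):
    """Count matches that are NPs, matches that are not NPs, and NPs the rule misses."""
    result = {"np": 0, "not_np": 0, "missed_np": 0}
    for tags, gold_nps in treebank:
        matched = rule_matches(tags, rule)
        result["np"] += len(matched & gold_nps)
        result["not_np"] += len(matched - gold_nps)
        result["missed_np"] += len(gold_nps - matched)
    return result


print(evaluate_rule(TREEBANK, NP_RULE))
```

On this toy data the rule finds both full noun phrases, also matches three sub-sequences that are not NPs on their own, and misses the pronoun NP, which is the kind of feedback a student could use to refine the rule.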
At the moment, we are locating and evaluating NLP resources, mainly on the web. The evaluation is to be mainly pedagogical, i.e. we will ask ourselves whether a particular resource will be suitable for the pedagogical framework that we have adopted for teaching grammar. However, usability, as the term is used in Human–Computer Interaction research, will also be an important evaluation criterion, as will the estimated effort needed to adapt the resource to our needs. As the corpora are in place already, we are now evaluating tools for the manipulation and visualization of corpus data, parsing systems, and grammar writer’s workbenches, which raises a number of compatibility and standardization issues that need to be resolved.
References
Borin, Lars & Mats Dahllöf 1999. A corpus-based grammar tutor for Education in Language and Speech Technology. EACL'99, Computer and Internet Supported Education in Language and Speech Technology. Proceedings of a Workshop Sponsored by ELSNET and The Association for Computational Linguistics. University of Bergen, Norway. 36–43.
Flowerdew, John 1996. Concordancing in language learning. The power of CALL, ed. by Martha Pennington. Houston, Texas: Athelstan.
Saxena, Anju 2000. Corpora of lesser-known languages on the internet: A pedagogical tool for the teaching of syntax. Paper presented at the workshop on IT inom språkundervisningen. Uppsala University. <http://www.ling.uu.se/anjusaxena/symposium0303.html>.
Teleman, Ulf 1974. Manual för grammatisk beskrivning av talad och skriven svenska. Lund: Liber.

What have you done for me lately? The fickle alignment of NLP and CALL
Natural Language Processing (NLP; or Computational Linguistics, CL; or Language Engineering, LE; or Language Technology, LT), which deals precisely with the use of (natural) language by computers, ought to be eagerly brought to bear on the task of developing Computer-Assisted Language Learning (CALL) applications by CALL practitioners. Similarly, NLP researchers ought to be interested in (human) first and second language learning, and in developing NLP systems in support of language development and learning.

Unfortunately, neither is actually the case. In the recent broad Survey of the state of the art in human language technology (Cole et al. 1996), there is not a single word about (human) language learning. Similarly, CALL contributions at the biennial international conference on computational linguistics (COLING) have been next to nonexistent (e.g. Borissova 1988; Zock 1996; Schneider and McCoy 1998; Burstein and Marcu 2000). Much of the work on using NLP in CALL has been pursued under the heading of Artificial Intelligence (AI; a field which overlaps minimally with mainstream NLP; see Swartz and Yazdani 1992; Holland et al. 1995), particularly in the area of Intelligent Tutoring Systems (see Frasson et al. 1998; Goettl et al. 1998). Chapelle (1997, 2001) is not optimistic about the contributions of AI/NLP to CALL, although at least in her 2001 book, the NLP work that she reviews (under the headings “Artificial intelligence” and “Computational linguistics”; 2001: 32–36) is in most cases more than a decade old, in a field which has seen very rapid development in the last ten years.

On a more positive note, there have been some international workshops on NLP and CALL, sometimes in connection with CL conferences (e.g. Jager et al. 1998; Olsen 1999; Schulze et al. 1999; Efthimiou 2000), although these, too, seem to depend on fortuitous circumstances rather than on a conviction that CALL is an important NLP application; thus, the Language resources and tools for educational applications workshop held at LREC 2000 (Efthimiou 2000) will not be repeated at the upcoming LREC 2002. Some factors that could be instrumental in fostering the attitudes in the two communities (NLP and CALL) toward each other are:
I will discuss these and other factors in more detail in the paper, and will also speculate about how to change this state of affairs.
References
Allwood, Jens and Lars Borin 2001. Datorer och språkteknologi som hjälpmedel i bevarandet av romani – Computers and language technology as an aid in the preservation of Romani. Plenary presentation at the symposium Romani as a language of education: possibilities and restrictions today, Göteborg University, 19–20 January 2001.
Amiri, Faramarz 2000. IT-literacy for language teachers: Should it include computer programming? System 28: 77–84.
Borissova, Elena 1988. Two-component teaching system that understands and corrects mistakes. COLING Budapest. Proceedings of the 12th International Conference on Computational Linguistics. Vol I. Budapest: John von Neumann Society for Computing Sciences. 68–70.
Burstein, Jill and Daniel Marcu 2000. Benefits of modularity in an automated essay scoring system. Proceedings of the COLING–2000 workshop on using toolsets and architectures to build NLP systems. Centre Universitaire, Luxembourg, 5 August 2000.
Chapelle, Carol 1997. CALL in the year 2000: Still in search of research paradigms? Language Learning & Technology 1(1): 19–43. Available on the WWW via <http://llt.msu.edu>.
Chapelle, Carol 2001. Computer applications in second language acquisition. Cambridge: Cambridge University Press.
Cole, Ron, Joseph Mariani, Hans Uszkoreit, Annie Zaenen and Victor Zue (eds.) 1996. Survey of the state of the art in human language technology. Cambridge: Cambridge University Press. Available on the WWW as <http://cslu.cse.ogi.edu/HLTsurvey/>.
Diamond, Jared 1998. Guns, germs and steel. A short history of everybody for the last 13,000 years. London: Vintage.
Efthimiou, Eleni (ed.) 2000. LREC 2000. Second international conference on language resources and evaluation. Workshop proceedings: Language resources and tools for educational applications. Athens: ILSP.
Frasson, Claude, Gilles Gautier and Alan Lesgold (eds.) 1998. Intelligent tutoring systems. Third International Conference, ITS '96. Montréal, Canada, June 12–14, 1996. Proceedings. Lecture notes in computer science 1086. Berlin: Springer.
Goettl, Barry P., Henry M. Halff, Carol L. Redfield and Valerie J. Shute (eds.) 1998. Intelligent tutoring systems. 4th International Conference, ITS '98. San Antonio, Texas, USA, August 16–19, 1998. Proceedings. Lecture notes in computer science 1452. Berlin: Springer.
Holland, V. Melissa, Jonathan D. Kaplan and Michelle R. Sams (eds.) 1995. Intelligent language tutors: theory shaping technology. Mahwah, New Jersey: Lawrence Erlbaum Associates.
Jager, Sake, John A. Nerbonne and A.J. van Essen (eds.) 1998. Language teaching and language technology. Lisse: Swets & Zeitlinger.
Olsen, Mari Broman (ed.) 1999.
Computer mediated language assessment and evaluation in natural language processing. A joint ACL–IALL symposium. Retrieved from the WWW in July 1999: <http://umiacs.umd.edu/~molsen/acl-iall/accepted.html>.
Schneider, David and Kathleen F. McCoy 1998. Recognizing syntactic errors in the writing of second language learners. COLING-ACL '98. Proceedings of the Conference, Vol. II. Montréal: Université de Montréal. 1198–1204.
Schulze, Mathias, Marie-Josée Hamel and June Thompson (eds.) 1999. ReCALL: Language processing in CALL. Proceedings of a one-day conference “Natural Language Processing in Computer-Assisted Language Learning”, a special ReCALL publication. Hull: The CTI Centre for Modern Languages, University of Hull, UK.
Sparck Jones, Karen 1996. How much has information technology contributed to linguistics? Presentation at the British Academy Symposium on Information Technology and Scholarly Disciplines, 18–19 October 1996. The page references in the text are to the electronic version available via <http://xxx.lanl.gov/cmp-lg/9702011/>.
Swartz, Merryanna L. and Masoud Yazdani (eds.) 1992. Intelligent tutoring systems for foreign language learning. Berlin: Springer-Verlag.
Zock, Michael 1996. Computational linguistics and its use in real world: the case of computer assisted-language [sic] learning. COLING–96. The 16th international conference on computational linguistics. Proceedings, vol. 2. Copenhagen, Denmark: Center for Sprogteknologi. 1002–1004.

Syntactic and ‘semantic’ error detection
In this talk, we intend to present the research conducted at the University of Geneva in the framework of the European research project FreeText, which aims at developing advanced hypermedia CALL software featuring NLP tools for a smart treatment of authentic documents and (relatively) free production exercises. The system targets intermediate to advanced learners of French. We use various NLP tools to provide the learners with intelligent feedback: a sentence structure viewer; a diagnosis tool; a speech synthesizer, which can pronounce either the software’s or the learners’ sentences; and a sentence reformulation tool.

The presentation will focus on the techniques used for error detection, the main goal of which is to give the learners appropriate feedback for production exercises. The learner’s answer is compared, if applicable, with a model answer stored in the exercise database. If the sentences do not match, we start a 3-step procedure involving spell checking, syntactic checking and ‘semantic’ checking. The procedure can be interrupted at any step as soon as an error is detected. We will not detail the well-known techniques of spell-checking.

The syntactic error detection uses three different techniques. The main one is constraint relaxation: if the parser can only give a partial analysis, we try to relax some constraints, for instance agreement rules or verb and adjective complementation, to obtain a complete analysis. The second technique is phonetic reinterpretation: we try to build a correct sentence by looking for homophones in the lexicon at the boundaries of the partial analysis chunks. The third technique is called chunk reinterpretation, where ad-hoc rules are applied. The results of the three techniques are combined in order to give intelligent feedback.

A sentence can be syntactically correct but nevertheless wrong in the context of the exercise. For the question “as-tu vu les voitures rouges?” (did you see the red cars?), if the instructions are to answer with a pronoun, the expected answer is “je les ai vues” (I’ve seen-AGR them). The learner could type “je les ai vus”, which is a correct sentence but a wrong answer, since “voiture” is a feminine noun. The pronoun “les” can be either masculine or feminine, and the past participle “vu” must agree with the pre-posed object complement. Here “les” is feminine, so the past participle must be the feminine plural “vues”.

Our semantic checker is able to detect such mismatches and works as follows: a semantic representation of both sentences is extracted, using “pseudo-semantic structures” which combine both lexical and abstract information. Then we compare the learner’s answer with the model answer. These structures remain the same regardless of the construction used (active, passive, focus etc.); only some abstract features change. Transformation exercises are therefore easy to construct. The teacher needs to enter only one sentence in the database, whereas with systems using pattern matching, formulas must be entered and all possibilities listed.

Ontology Enrichment with Conceptual Structures for Cross-Linguistic Disambiguation
Conceptual structures [1] can be understood as those structures of mind that have developed in living organisms during their evolution in interaction with changing environmental conditions. These structures are reflected in the semantics and are partially captured in the syntax of a natural language. However, natural language is by no means the only expression of those conceptual structures: all the other senses, such as hearing and vision, employ the same structures. The value of these structures lies in their universality: languages may vary, but as all human beings presumably have a similar evolutionary development behind them, those conceptual structures should vary very little from region to region and between individuals. This gives hope that some universal semantic structures encoded in syntax may, in fact, be found in all languages and could be employed productively in many natural language processing tasks such as language learning and translation.

Ontology enrichment differs from the lexico-syntactic approach to annotation in certain respects. It does not exclude the use of real-world ontologies but is designed to work in tandem with them in a framework to be created for the purpose in the current study. The framework will use PIA (Platform for Information Applications) to annotate text with functional tags based on conceptual semantics that can then be used by information agents in various transactions. These XML-compatible tags contain instructions created with the help of a Turing-complete scripting language. Although this study concentrates on semantic components, the motivation behind it is to allow the addition of real-world knowledge to semantic disambiguation with a minimum of effort. The framework will also allow component-based collaborative development of tagsets.

Cross-linguistic disambiguation uses tags incorporating lexical semantic components to disambiguate text and so boost the disambiguation accuracy of current parsers (grammatical, stochastic, rule-based, syntactical and their combinations). As the conceptual structures used are universal, they can be used as a system of interlingua between several languages [2]. The syntax of each language uses only a small subset of the possible conceptual semantic structures available, and languages differ in that respect. These differences in the use of conceptual semantic structures often form a stumbling block in language learning. For example, the fact that the Finnish sentence Minulla on nälkä (‘I have hunger’) translates into English as I am hungry may not be self-evident for a novice language learner. However, although the sentences have a different syntactic structure on the surface, they can be represented independently of that surface structure at the conceptual level. These syntactic/conceptual differences between languages can be marked automatically with the help of the tagset. The language learner or teacher can be alerted (by highlighting the relevant parts, for example) to take note of the structures to concentrate on. The differences in syntax could also be made more explicit at that stage. This would give the learner positive (learning through surprise) rather than negative (learning through failure) reinforcement and would, therefore, make learning more effective.

[1] Jackendoff, R.: Semantic Structures, The MIT Press, Cambridge, Mass., 1990
[2] Dorr, B.J., The Use of Lexical Semantics in Interlingual Machine Translation. Journal of Machine Translation, 7:3, pp. 135-193, 1992.
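
The idea of alerting the learner where two languages express the same concept with different syntax can be sketched as follows. This is only an illustration under stated assumptions: the concept labels, the construction descriptions and the data layout are all hypothetical, and the sketch uses plain Python dictionaries rather than PIA or any XML tagset.

```python
# Hypothetical aligned sentence pairs. Each pair shares one conceptual tag
# and carries a language-specific description of its surface construction.
ALIGNED = [
    {
        "concept": "STATE(experiencer, hunger)",
        "fi": {"text": "Minulla on nälkä", "construction": "possessor + 'be' + noun"},
        "en": {"text": "I am hungry", "construction": "subject + 'be' + adjective"},
    },
    {
        "concept": "EVENT(sleeper, sleep)",
        "fi": {"text": "Minä nukun", "construction": "subject + verb"},
        "en": {"text": "I sleep", "construction": "subject + verb"},
    },
]


def flag_differences(pairs, source, target):
    """Return the pairs whose shared concept is realised by different constructions."""
    flagged = []
    for pair in pairs:
        if pair[source]["construction"] != pair[target]["construction"]:
            flagged.append(pair)
    return flagged


for pair in flag_differences(ALIGNED, "fi", "en"):
    # A real system might highlight the relevant words; here we only print a note.
    print(f"Note the difference: {pair['fi']['text']} / {pair['en']['text']}")
```

Only the first pair is flagged, since both sentences in the second pair use the same construction and need no special attention from the learner.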
Learner-corpora and NLP
Veit Reuer, University of Osnabrück, Germany

There have been various approaches to grammar checking in natural language processing, i.e. the recognition and correction of errors, both in the field of language learning and in word-processing. Approaches which do not anticipate errors are usually not very efficient, so steps are taken to improve their performance. In the classical approach to PS-rule parsing, Mellish89 uses very complicated heuristics to determine possible next moves by the parser. The approach of Menzel98 weighs the constraints used in the dependency grammar. Other approaches use anticipation and encode the position and type of possible errors somewhere in the grammar or the lexicon. Only very few developers have actually used learner-corpora to determine the outcome of the parsing modifications. An exception is McCoy98.

To enhance the efficiency of non-anticipating parsers, and possibly to supply material for anticipation-based systems, we have analyzed learner-corpora for the error types occurring most frequently. Since our interest lies in German as a second language, a corpus from Heringer97 has been used, which contains 7107 German sentences marked with error codes and corrections. 403 error classes were used. A second annotated corpus collected at the Universitat de Barcelona by Oliver Strunk is currently being investigated. Note that annotating a corpus with error flags has its own difficulties, which will not be discussed here. In the following we present some conclusions that were reached.
Parsers should therefore be expected to concentrate on omission and permutation instead of treating every linearization error the same, as e.g. in Mellish89. Unfortunately, government errors were not marked as such in the corpus, but the share of errors involving, for example, a verb governing a certain case is around 8 per cent.

Although some publications about error recognition have mentioned the high number of syntactic errors in general as opposed to semantic or orthographic errors, there have not, to our knowledge, been any analyses of the frequencies of syntactic errors in the context of NLP and CALL. In our opinion, these analyses could well be used for the improvement of syntactic parsers used in CALL.
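
The kind of frequency analysis described above can be sketched in a few lines. The corpus entries and error-code names below are invented for illustration; they are not taken from the Heringer corpus, whose annotation scheme is not reproduced here.

```python
from collections import Counter

# Hypothetical annotated corpus: (sentence id, list of error codes on that sentence).
CORPUS = [
    ("s1", ["omission"]),
    ("s2", ["case_government"]),
    ("s3", ["permutation"]),
    ("s4", ["omission", "agreement"]),
    ("s5", []),  # a sentence without errors
    ("s6", ["permutation", "omission"]),
]


def error_frequencies(corpus):
    """Return (error code, count, share of all errors), most frequent first."""
    counts = Counter(code for _, codes in corpus for code in codes)
    total = sum(counts.values())
    return [(code, n, n / total) for code, n in counts.most_common()]


for code, n, share in error_frequencies(CORPUS):
    print(f"{code:16s} {n:3d} {share:6.1%}")
```

A ranking of this kind is what would let a parser developer decide which error types, such as omission or permutation, deserve dedicated handling first.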
References
Heringer, Hans Jürgen (1995): Aus Fehlern lernen, Universität Augsburg, CD-ROM for Win9x/NT
Mellish, Chris S. (1989): Some Chart-based Techniques for Parsing Ill-Formed Input, in Proc. 27th ACL-Conference, 102-109
Menzel, Wolfgang and Schröder, Ingo (1998): Constraint-based Diagnosis for Intelligent Language Tutoring Systems, Universität Hamburg, Fachbereich Informatik Report Nr. FBI-HH-B-208-98
Schneider, David and McCoy, Kathleen F. (1998): Recognizing Syntactic Errors in the Writing of Second Language Learners, Proc. 17th Int. COLING-Conference on Computational Linguistics, Montreal

Costs to Participants
Participant fee: EUR 80.00