
Natural Language Processing FAQ

This posting contains Frequently Asked Questions (FAQ) about natural language processing and their answers. It should be read by anyone who wishes to post to the newsgroup.
Last-Modified: Fri Feb  2 14:18:48 EST 2001
Posting-Frequency: Monthly
Version: 0.1
Archive-Name: natural-lang-processing-faq

This is the latest release of an FAQ (frequently asked questions and
answers) list for the newsgroup. Please don't
hesitate to send me any comments, be they positive or negative.  There
are many blank spots in the FAQ; please help fill them.

Copyright (c) 1994-2001, Dragomir R. Radev. All rights reserved.

Permission to distribute this FAQ by all volatile electronic means
(mailing lists, FTP, WWW, Usenet news, etc.) is hereby given under
the restriction that the file is not modified and all disclaimers and
acknowledgements remain intact.
This permission does NOT apply to CD-ROMs and/or commercial printed
publications. All requests for republication in this case should
be referred to the FAQ maintainer.

Many people have contributed to this FAQ. A list of credits is shown at the
end of the message.


[1] What is this FAQ all about
[2] What is Computational Linguistics
[3] What is
[4] How to get updates to this FAQ
[5] World-Wide Web resources.
[6] Which schools offer graduate programs in CL/NLP
[7] How to apply to graduate school in CL/NLP in the USA
[8] Organizations that are partly related to CL/NLP
[9] Major non-academic research laboratories
[10] What major publications exist in the field
[11] Bibliographies
[12] Electronic mailing lists
[13] Newsgroups
[14] Professional Organizations, Associations
[15] Upcoming Conferences
[16] Evaluation Competitions
[17] How to join a mailing list
[18] How to obtain files by anonymous ftp
[19] FTP repositories
[20] What are some important books in NLP
[21] Encyclopedia of Artificial Intelligence
[22] Machine Translation
[23] What are the major accomplishments of the field
[24] Publishers
[25] Credits

Disclaimers and Notes

 1. Please read this FAQ list before posting to
 2. The FAQ is a collection of materials, rather than a complete reference.
    Some of the information may be out of date, so please be careful and
    take everything with a grain of salt. The maintainer, Dragomir R. Radev,
    does not assume any responsibility for incorrect information. The list
    of contributors to the FAQ appears at the end of this document.
 3. Any comments, contributions, and corrections are more than welcome.  
    Please help make the FAQ really helpful and interesting. 

[1] What is this FAQ all about

This is an attempt to put together a list of frequently (and not so
frequently) asked questions about Natural Language Processing and their
answers. This document is in no way perfect, complete, or 100% accurate.
The maintainer cannot be held responsible for any damage resulting
directly or indirectly from using information in this FAQ.

The FAQ originated from Mark Kantrowitz's FAQ on AI. Some questions in
the present document come directly from Mark's original FAQ.

This FAQ is maintained by Dragomir R. Radev of the University of
Michigan. Please send me all your comments, suggestions, corrections,
additions, and such to my e-mail address:

[2] What is Computational Linguistics

Computational linguistics (CL) is a discipline between linguistics and 
computer science which is concerned with the computational aspects of the 
human language faculty. It belongs to the cognitive sciences and overlaps
with the field of artificial intelligence (AI), a branch of computer
science that aims at computational models of human cognition.
There are two components of CL: applied and theoretical.

The applied component of CL is more interested in the practical
outcome of modelling human language use. The goal is to create
software products that have some knowledge of human language.  Such
products are urgently needed for improving human-machine interaction,
since the main obstacle in the interaction between humans and computers
is one of communication. Today's computers do not understand our
language, and humans have difficulty understanding the computer's
language, which does not correspond to the structure of human thought.

Natural language interfaces enable the user to communicate with the 
computer in German, English or another human language.  Some applications 
of such interfaces are database queries, information retrieval from texts 
and so-called expert systems.  Current advances in recognition of spoken 
language improve the usability of many types of natural language systems.  
Communication with computers using spoken language will have a lasting
impact upon the work environment, opening up completely new areas of
application for information technology.

Although existing CL programs are far from achieving human ability, they 
have numerous possible applications. Even if the language the machine 
understands and its domain of discourse are very restricted, the use of 
human language can increase the acceptance of software and the productivity 
of its users.

Much older than communication problems between human beings and machines 
are those between people with different mother tongues.  One of the 
original goals of applied computational linguistics was fully automatic 
translation between human languages.  From bitter experience scientists 
have realized that they are far from achieving this.  Nevertheless,
computational linguists have created software systems which can simplify 
the work of human translators and clearly improve their productivity.  

The future of applied computational linguistics will be determined by the 
growing need for user-friendly software.  Even though the successful 
simulation of human language competence is not to be expected in the near 
future, computational linguists have numerous immediate research goals 
involving the design, realization and maintenance of systems which 
facilitate everyday work, such as grammar checkers for word processors.

Theoretical CL deals with formal theories about the linguistic
knowledge that a human needs for generating and understanding
language. Today these theories have
reached a degree of complexity that can only be managed by employing
computers.  Computational linguists develop formal models simulating
aspects of the human language faculty and implement them as computer
programmes. These programmes constitute the basis for the evaluation
and further development of the theories. In addition to linguistic
theories, findings from cognitive psychology play a major role in
simulating linguistic competence.  Within psychology, it is mainly the
area of psycholinguistics that examines the cognitive processes
constituting human language use.

The special attraction of computational linguistics lies in the combination 
of methods and strategies from the humanities, natural and behavioural 
sciences, and engineering.  

SEE ALSO: which contains:

* Chapter 1 of Christopher D. Manning and Hinrich Schütze, 1999,
  Foundations of Statistical Natural Language Processing, MIT Press,
  Cambridge, MA.
* Chapter 1 of Daniel Jurafsky and James H. Martin, 2000, Speech and 
  Language Processing, Prentice Hall, Upper Saddle River, New Jersey.

[3] What is

Here follows the original charter for 


Moderation:   This group will be unmoderated.

Purpose:      To discuss issues relating to natural language, especially
              computer-related issues from an AI viewpoint. The topics
              that will be discussed in this group will concentrate on, but
              are not limited to, the following:

                   *   Natural Language Understanding
                   *   Natural Language Generation
                   *   Machine Translation
                   *   Dialogue and Discourse Systems
                   *   Natural Language Interfaces
                   *   Parsing
                   *   Computational Linguistics
                   *   Computer-Aided Language Learning

              This group will avoid discussing issues that are more properly
              covered by other newsgroups. For example, speech synthesis
              should be discussed in comp.speech. However, due to the
              interdisciplinary nature of the field, there may be overlap in
              material between other groups. To try to keep this to a 
              minimum, topics should pertain to computer-related aspects
              of natural language.

Rules of Decorum:  Because of the unmoderated format, anyone with access to
                   this newsgroup will be able to post without review.
                   This is meant to encourage discussion of the topics.
                   Please refrain from "flames" or unnecessary criticism
                   of a person's viewpoints or personality in a harsh
                   or insulting manner. Criticisms should be constructive
                   and polite whenever possible.

[4] How to get updates to this FAQ

This FAQ is currently available from several newsgroups, including
comp.answers and news.answers. It is posted once a month, although
updates are made less often.

The official archive of the above newsgroups is at MIT. You can get a
copy of the FAQ from

Another major site with lots of FAQs (including this one) is 

The current copy can also be retrieved from the following URL:

[5] World-Wide Web resources.


5.1.  The Association for Computational Linguistics site:

      The Association for Computational Linguistics is the major
      international organization in the field.

5.2.  The ACL NLP/CL Universe:

      The largest index of Computational Linguistics and Natural Language
      Processing resources on the Web. It features a search engine
      which should allow you to find specific NLP-related Web pages.

5.3.  The Computation and Language E-Print Archive

      The Computation and Language E-Print Archive is a fully automated
      electronic archive and distribution server for papers on 
      computational linguistics, natural-language processing, 
      speech processing, and related fields. 

5.4.  The Survey of the State of the Art of Human Language Technology

      This book surveys the state of the art of human language
      technology. The goal of the survey is to provide an interested reader
      with an overview of the field---the main areas of work, the
      capabilities and limitations of current technology, and the technical
      challenges that must be overcome to realize the vision of graceful
      human computer interaction using natural communication skills. 

5.5.  The Linguistic Data Consortium

      The Linguistic Data Consortium is an open consortium of universities,
      companies and government research laboratories. It creates, collects
      and distributes speech and text databases, lexicons, and other
      resources for research and development purposes. The University of
      Pennsylvania is the LDC's host institution. 

5.6.  The Language Technology Helpdesk

      Frequently asked questions of the Human Communication Research
      Centre at the University of Edinburgh.


5.7.  Head-Driven Phrase Structure Grammar

      The HPSG site offers current information relating to various aspects
      of the grammar formalism and linguistic theory of Head-Driven
      Phrase Structure Grammar, a constraint-based, lexicalist
      approach to grammatical theory that seeks to model human
      languages as systems of constraints on typed feature structures.

5.8.  Lexical Functional Grammar

      This site provides access to information about various aspects
      of the grammatical theory known as Lexical Functional Grammar.

5.9.  Word Grammar

      This site houses publications on Word Grammar and has some
      information on the group and its meetings.

[6] Which schools offer graduate programs in CL/NLP

This list is, *of course*, completely preliminary. Please send me 
information about other programs. I will try to get in touch with the
editors of the ACL guide to Graduate Programs in CL for more information.
Universities are given in alphabetical order. If a certain university
is not included now and you feel it must be included, please send me
some information about it.

Australia
Melbourne, University of
Microsoft Institute of Advanced Software Technology in association with
        Macquarie University

Canada
Montreal, University of
Ottawa, University of
Simon Fraser University
Toronto, University of
Waterloo, University of

Finland
Helsinki, University of

France
Paris 7, Jussieu, University of
Provence, University of

Germany
Bonn, University of
Heidelberg, University of
Humboldt University, Berlin
Koblenz-Landau, University of
Munich, University of
Osnabrueck, University of
Potsdam, University of
Saarland, University of the
Stuttgart, University of
Tuebingen, University of

Italy
Pisa, University of
Trento, University of

Japan
Kyoto University

Korea
Pohang University of Science and Technology, Pohang

Netherlands
Amsterdam, University of
Groningen, University of
Nijmegen, University of
Tilburg, University of
Utrecht, University of

Sweden
Goteborg (Gothenburg), University of
Skoevde, University of
Uppsala, University of

Switzerland
Geneva, University of
Zurich, University of

United Kingdom
Brighton, University of
Cambridge, University of
Durham, University of
Edinburgh, University of
Essex, University of
Sheffield, University of
Sussex, University of
University of Manchester Institute of Science and Technology

USA
Brown University
Buffalo, SUNY at
California at Berkeley, University of
California at Los Angeles, University of
Carnegie-Mellon University
Columbia University
Cornell University
Delaware, University of
Duke University
Georgetown University
Georgia, University of
Georgia Institute of Technology
Harvard University
Indiana University
Information Sciences Institute (ISI) at the University of Southern California
Johns Hopkins University
Massachusetts at Amherst, University of
Massachusetts Institute of Technology
Michigan, University of
New Mexico State University
New York University
Ohio State University
Pennsylvania, University of
Rochester, University of
Southern California, University of
Stanford University
Utah, University of
Wisconsin - Milwaukee, University of
Yale University

[7] How to apply to graduate school in CL/NLP in the USA

Usually, the best timetable is as follows, where M is the month in
which your studies would start (usually September):

        M - 24 : Try to clarify your interests: is it really NLP
                 that you are interested in? What possible
                 subfields might be of interest to you? ...etc.
                 Remember: 5 years working in an area you are
                           not interested in will be very painful.
        M - 18 : Read publications in the area of your interest
                 in order to discover the best places for
                 you. Pay close attention to the specific fields of
                 research: which professors are most active in those
                 fields, and which institutions.
                 Remember: Unless you are familiar with the most
                           current research, you will not be able
                           to find the best place for you.
        M - 18 : Go to your local library and consult some of the
                 available directories (see [3-3]) - write down
                 as much information as you can about some
                 15-25 universities. These universities form your
                 preliminary list.
                 Remember: There are some 100 universities in the
                           USA offering NLP/CL programs. Some of them
                           will be more attractive to you than others.
        M - 18 : Talk to your advisers at school, talk to other
                 students, post questions on the Internet, visit
                 departmental Web sites.
                 This way you will get advice on a few more
                 universities that you might have skipped so far.
                 Remember: Others have faced what you are going
                           through. Use their experience.
        M - 15 : Send letters to the universities that you have
                 on your preliminary list. Make sure you indicate
                 when you want to start, what degree (MA, MS,
                 Ph.D.) you are interested in, whether or not
                 you will be applying for financial aid, whether
                 you will need some special visa...
                 Remember: Ask for all the information that you
                           need; give them all the information they'd
                           need to satisfy your request.
        M - 12 : Read carefully the information that you have 
                 received from the universities. Shorten your list
                 of places to the number that you will eventually
                 apply to (usually 5-8 is a good number). 
                 Remember: Make sure you include both your best choice 
                           schools and some places where you are almost
                           certain of getting accepted.
        M - 10 : Fill in all the forms that are sent to you, 
                 ask your professors to send reference letters to 
                 the schools directly.
                 Remember: Professors will probably be very busy.
                           Give them the reference forms
                           as early as possible and make sure you 
                           specify a reasonable time for them to fill
                           them in and send them out.
        M - 10 : (or earlier) - take the necessary tests (GRE,
                 TOEFL, or others) that the schools want. Make sure
                 you tell the testing service which universities
                 you want them to send your scores to.
                 Remember: Time yourself through several practice
                           tests. The GRE General test, for example,
                           is more about mastery of timing than knowledge.
        M -  9 : (approximately) - mail your forms to the schools,
                 preferably 2-3 weeks before the deadlines.
                 Remember: You don't want your applications to get there
                           at the same time as everyone else. Give the
                           admissions committee some extra time to
                           review your application.
        M -  6 : usually six months before the beginning of the semester
                 that you are applying for, you will get a letter 
                 saying whether you have been accepted.
                 Remember: Usually, thick letters, e-mails, and telegrams
                           mean acceptance. Thin one-sheet letters will
                           most likely be disappointing for you.
        M -  5 : now, you have been accepted to a few schools. Go back
                 to the same resources that you used when you were 
                 deciding where to apply (journals, catalogs,
                 directories, professors, etc.). Ask the schools that accepted
                 you to fly you in for a visit (many will do this).
                 Remember: Don't forget non-academic factors such as
                           location, financial aid, the atmosphere in
                           the department, etc.
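The timetable is expressed relative to M, so the actual calendar months depend on when you plan to start. As a small illustration (not part of the original advice), the following Python sketch converts the offsets into calendar months. It assumes a September start, merges the three M - 18 steps into a single entry, abbreviates the tasks, and uses hypothetical names (MILESTONES, month_minus, schedule) chosen only for this example.

```python
from __future__ import annotations

from datetime import date

# (months before M, task), abbreviated from the timetable above;
# the three "M - 18" steps are merged into one entry.
MILESTONES = [
    (24, "Clarify your interests"),
    (18, "Read publications, consult directories, talk to advisers"),
    (15, "Write to the universities on your preliminary list"),
    (12, "Shorten your list to the schools you will apply to"),
    (10, "Fill in forms, request references, take the tests"),
    (9, "Mail your applications"),
    (6, "Expect admission decisions"),
    (5, "Compare offers and ask about visits"),
]


def month_minus(year: int, month: int, offset: int) -> tuple[int, int]:
    """Return the (year, month) that lies `offset` months before year/month."""
    # Use a single zero-based month count so borrowing across years is plain arithmetic.
    total = year * 12 + (month - 1) - offset
    return total // 12, total % 12 + 1


def schedule(start_year: int, start_month: int = 9) -> list[tuple[date, str]]:
    """Pair each milestone with the first day of the month in which it falls."""
    return [(date(*month_minus(start_year, start_month, offset), 1), task)
            for offset, task in MILESTONES]


if __name__ == "__main__":
    # Hypothetical example: studies starting in September 2002.
    for when, task in schedule(2002):
        print(when.strftime("%B %Y"), "-", task)
```

With a September 2002 start, for instance, M - 24 falls in September 2000 and M - 9 in December 2001.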
[8] Organizations that are partly related to CL/NLP

International Association for Machine Translation (IAMT) and its regional associations AMTA, EAMT, and AAMT

ACM SIGIR (Special Interest Group in Information Retrieval)



[9] Major non-academic research laboratories

AT&T Labs - Research
BBN Systems and Technologies Corporation
DFKI (German research center for AI)
General Electric R&D
IRST, Italy
IBM T.J. Watson Research, NY
Lucent Technologies Bell Labs, Murray Hill, NJ
Microsoft Research, Redmond, WA
NEC Corporation
SRI International, Menlo Park, CA
SRI International, Cambridge, UK
Xerox, Palo Alto, CA
XRCE, Grenoble, France

[10] What major publications exist in the field


Computational Linguistics is the only publication devoted exclusively
to the design and analysis of natural language processing
systems. From this unique quarterly, university and industry
linguists, computational linguists, artificial intelligence (AI)
investigators, cognitive scientists, speech specialists, and
philosophers get information about computational aspects of research
on language, linguistics, and the psychology of language processing
and performance.

Published by The MIT Press for: The Association for Computational Linguistics. 



Dr B. K. Boguraev, IBM Thomas J. Watson Research Center, New York, USA
Professor Roberto Garigliano, University of Durham, UK
Dr John I. Tait, University of Sunderland, UK

Published: March, June, September and December. ISSN:1351-3249.

Natural Language Engineering is an international journal designed
to meet the needs of professionals and researchers working in all
areas of computerised language processing, whether from the
perspective of theoretical or descriptive linguistics, lexicology,
computer science or engineering. Its principal aim is to bridge the
gap between traditional computational linguistics research and the
implementation of practical applications with potential real-world
use. As well as publishing research articles on a broad range of
topics – from text analysis, machine translation and speech
generation and synthesis to integrated systems and multimodal
interfaces – the journal also publishes book reviews. It aims
to provide an essential link between industry and the academic community.


Editors: Prof. S.J. Young & Dr. S.E. Levinson
Send manuscripts (worldwide apart from the Americas) to:
Prof. Steve Young, Cambridge University Engineering Dept.,
Trumpington Street, Cambridge, CB2 1PZ, England. 
Send manuscripts (from the Americas) to:
Dr. Steve Levinson, Head, Linguistics Research,
AT&T Bell Laboratories, 600 Mountain Ave., 
Murray Hill, New Jersey 07974. USA. 
US Subscription rates are $170, with a personal rate of $75.
CS&L is published 4 times per year.
The address for subscription orders is:
Harcourt Brace and Company Limited,
High Street, Foots Cray, 
Sidcup, Kent, DA14 5HP, England.

Published 4 times annually. ISSN 0922-6567.
Subscriptions: Institutions $141 plus $16 postage; Individuals $55
(members of ACL $46).
Kluwer Academic Publishers, PO Box 322, 3300 AH Dordrecht, The
Netherlands, or Kluwer Academic Publishers, PO Box 358, Accord
Station, Hingham, MA 02018-0358. 

Published quarterly, since 1981.
Media Dimensions, New York, NY, USA

Published quarterly. ISSN 0167-806X
Subscriptions: Individual $59,-/Dfl.156,-; Institutional $200,-/Dfl.383,-
including p&h. Kluwer Academic Publishers
USA: Order Dept, Box 358, Accord Station, Hingham, MA 02018-0358. Phone 
(617) 871-6600; Fax (617) 871-6528; E-mail:
Other: P.O.Box 322, 3300 AH Dordrecht, The Netherlands. Phone (31) 78 
524400; Fax (31) 78 183273; Telex: kadc nl; E-mail:

Editors: Coltheart, Davies, Guttenplan, Harris, Humphreys, Leslie,
Smith, Wilson.
4 times annually
Blackwell Publishers, Oxford, UK.

Editor: Peter Gardenfors

[11] Bibliographies


   For information on a fairly complete bibliography of computational
   linguistics and natural language processing work from the 1980s, send
   mail to with the subject HELP. 

   The CSLI linguistics bibliography contains 3,300 entries in
   bib/tib/refer format. The bibliography is heavily slanted towards
   phonetics and phonology but also includes a fair amount of
   computational morphology, syntax, semantics, and psycholinguistics.
   The bibliography can be used with James Alexander's tib
   bibliography system, which is available from
   [] among other places. The bibliography itself is available
   by anonymous ftp from
   Contributions are welcome, but should be in tib format.
   For more information, contact Andras Kornai <>
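   The refer format mentioned above stores each entry as a block of lines,
   each beginning with '%' and a one-letter field key (for example %A for
   author, %T for title, %D for date), with blank lines between entries.
   As a rough sketch, assuming plain refer conventions only (tib-specific
   extensions are not handled), a few lines of Python can load such a file
   into dictionaries. The function name parse_refer is hypothetical.

```python
from __future__ import annotations


def parse_refer(text: str) -> list[dict[str, list[str]]]:
    """Parse refer-style records: '%X value' field lines, blank lines between records."""
    records: list[dict[str, list[str]]] = []
    current: dict[str, list[str]] = {}
    last_key: str | None = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current record (if any).
            if current:
                records.append(current)
            current, last_key = {}, None
        elif line.startswith("%") and len(line) >= 2:
            # Field line: '%' plus a one-letter key, then the value.
            # Repeated keys (e.g. several %A authors) are kept in order.
            last_key = line[1]
            current.setdefault(last_key, []).append(line[2:].strip())
        elif last_key is not None:
            # Continuation line: extend the most recent value of the last field.
            current[last_key][-1] += " " + line.strip()
    if current:
        records.append(current)
    return records
```

   Each record maps a field key to a list of values, so a record with two
   %A lines yields two authors under the key "A".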


   Robert Dale's Natural Language Generation (NLG) bibliography is
   available by anonymous ftp from [] 
   Note that it is formatted for A4 paper. Stick in a line 
      .94 .94 scale
   after the %! line to print on 8.5 x 11 paper. For further information,
   write to Robert Dale, University of Edinburgh, Centre for Cognitive
   Science, 2 Buccleuch Place, Edinburgh EH8 9LW Scotland, or
   <> or <>.
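   The edit described above can also be scripted. The following Python
   sketch assumes that the %! line is the first line of the file (as it
   normally is for PostScript) and inserts the FAQ's scaling line right
   after it; the function name scale_for_letter is hypothetical.

```python
SCALE_LINE = ".94 .94 scale\n"  # the line the FAQ recommends for 8.5 x 11 paper


def scale_for_letter(ps_text: str) -> str:
    """Return ps_text with SCALE_LINE inserted right after the leading %! line."""
    lines = ps_text.splitlines(keepends=True)
    if not lines or not lines[0].startswith("%!"):
        raise ValueError("first line does not start with %!; not a PostScript file?")
    # Make sure the %! line is terminated before appending the new line.
    first = lines[0] if lines[0].endswith("\n") else lines[0] + "\n"
    return first + SCALE_LINE + "".join(lines[1:])
```

   To convert a file, read it (for example with encoding="latin-1", so
   that arbitrary bytes survive), pass the contents through
   scale_for_letter, and write the result to a new file.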

   Mark Kantrowitz's Natural Language Generation (NLG) bibliography is
   available by anonymous ftp from [] 
   In addition to the tech report, the BibTeX file containing the
   bibliography is also available.  The bibliography contains more than
   1,200 entries. A searchable index to the bibliography is
   available via the URL
   Additions and corrections should be sent to 

[12] Electronic mailing lists

(This section is out of date - should be fixed for next release.)

Information Retrieval:                                                 
   irlist <>                                
Natural Language and Knowledge Representation (moderated):
   Gatewayed to the newsgroup                       

Natural Language Generation:                                             

LFG (Lexical-Functional Grammar):

Statistics, Natural Language, and Computing:                          

Colibri (weekly update on Conferences, Seminars, Jobs and Shareware in
NLP and speech)
Dependency Grammar                                                                                                                

Text Analysis and Natural Language Applications:                           
Text Corpora:                                                         

Speech production and perception:                                    
   foNETiks <>                                      



Eastern (European) Language Engineering list:
   to join, send mail to
Preprint archive mailing list

  For further information about (among other topics) submission of papers to
  the server, subscribing or canceling your subscription, requesting full
  text of any of the papers above, retrieving macro files for these papers, 
  searching past listings, or submitting comments to the server operators,
  send a message:
     Subject: help

[13] Newsgroups

alt.usage.english          English grammar, word usages, and related
                           topics.           Natural language processing by computers.     Natural Language and Knowledge Representation.
comp.speech                Research & applications in speech science & 
sci.lang                   Natural languages, communication, etc.
alt.etext                  Electronic texts.
comp.text.sgml             ISO 8879 SGML structured documents markup
                           languages Information Retrieval topics. (Moderated)  General document understanding technologies
comp.internet.library      Discussing electronic libraries. (Moderated)

[14] Professional Organizations, Associations

Membership in the Association for Computational Linguistics is for the
calendar year, regardless of when dues are paid. Membership includes a
full year of the ACL journal, Computational Linguistics, reduced
registration at most ACL-sponsored conferences, and discounts on
ACL-sponsored publications. Payments for membership dues, fund
donations, back issues, and proceedings may be made in Europe or the


(The rest of this section is not up to date; it should be fixed for the next release.)

655 Fifteenth Street, NW, Suite 310, Washington, DC 20005
Membership: $40 Associate members, $65 active members, Institutional $200,
Corporate $400. Members receive the MT News International and the
MT Yellow Pages. 

SIGNLL is the ACL Special Interest Group on Natural Language Learning
(language acquisition and related topics). To join, send mail to or use the forms on the SIGNLL home page. For
more information, see the SIGNLL home page at the URL

Membership: $50 individuals, $25 student. Add $15 overseas postage.
Members receive a copy of the journal Cognitive Science without
additional charge. Write to Alan Lesgold, Secretary/Treasurer,
Cognitive Science Society, LRDC, University of Pittsburgh, 3939
O'Hara Street, Pittsburgh, PA 15260, fax 1-412-624-9149, email 

AAAI, 445 Burgess Drive, Menlo Park, CA 94025.
phone 415-328-3123, fax 415-328-4457.
Membership includes AI Magazine, and the AI Directory:
$50 regular, $20 student, $75 institution/library (US/Canadian)
$75 regular, $45 student, $100 institution/library (Foreign)
AAAI has several special interest groups (SIGs) on medicine,
manufacturing, business, and law. (Add $10/year for each subgroup.)
Life memberships $700 (US/Canadian), $1000 (Foreign)

[15] Upcoming Conferences

        Coling 2002 will be in Taipei, Taiwan. 

        The site for ACL 2002 will be announced in 2001. It is
        supposed to be held in North America. 

        Second meeting of the NAACL (NAACL'01), Pittsburgh, PA
        (June 2-7, 2001)

        39th Annual Meeting of the ACL (ACL'01) - joint with
        EACL'01, Toulouse, France (July 6-11, 2001)

For an updated list, check:

[16] Evaluation Competitions

TREC - Text Retrieval Conference
Information retrieval using NLP/statistical techniques.

NIST Spoken Language Technology Evaluations

DUC - Document Understanding Conference

[17] How to join a mailing list

A: Most often, you have to send mail to the listserver at the site where
   the mailing list resides, and put "subscribe <listname> <yourname>" in the
   body of the mail message. In the example below, the subscribe line is what
   you have to type in.



      Subject: some text here
      subscribe LINGUIST Dragomir R. Radev
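If you prefer to script the request, the message above can be built with Python's standard email package. This is only a sketch: the listserver and sender addresses are hypothetical placeholders (the real address depends on the list), and the code only constructs the message; sending it additionally requires access to a mail server.

```python
from email.message import EmailMessage


def make_subscribe_message(listserv: str, sender: str,
                           list_name: str, full_name: str) -> EmailMessage:
    """Build a listserver request whose body is 'subscribe <listname> <yourname>'."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = listserv
    # Placeholder subject, as in the example above; the command goes in the body.
    msg["Subject"] = "some text here"
    msg.set_content(f"subscribe {list_name} {full_name}\n")
    return msg


if __name__ == "__main__":
    # Hypothetical addresses; replace them with the real listserver and your own address.
    request = make_subscribe_message("listserv@example.org", "you@example.org",
                                     "LINGUIST", "Dragomir R. Radev")
    print(request)
```

To actually send it, smtplib.SMTP(host).send_message(request) can be used with an SMTP host you are allowed to relay through.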

 [18] How to obtain files by anonymous ftp

A: There are many ways. The most common way, however, is using a local ftp
   client. Suppose you want to get the file /pub/editors/webster.tar.Z.

   Here is a sample session. You type in whatever is underlined here.

      Connected to
      220 ftp.UU.NET FTP server Thu Apr 14 15:45:10 EDT 1994) ready.
      Name ( anonymous

      331 Password required for  anonymous.
                ^^^^^^^^^^^^^^^^^^^^^  (put your email address here)

      230 Guest login ok, access restrictions apply.
      ftp> cd pub/editors
      ftp> binary
      ftp> get webster.tar.Z
      200 PORT command successful.
      150 Opening BINARY mode data connection for webster.tar.Z (148579 bytes).
      226 Transfer complete.
      local: webster.tar.Z remote: webster.tar.Z
      148579 bytes received in 2.2 seconds (67 Kbytes/s)
      ftp> quit 
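The same session can be reproduced with Python's standard ftplib module. This is a minimal sketch: the host name and path come from the session above, but that server may no longer offer anonymous access, and the helper names split_remote_path and fetch_anonymous are hypothetical. Note that retrbinary switches the connection to binary (image) mode itself, which corresponds to typing "binary" before "get".

```python
from __future__ import annotations

import posixpath
from ftplib import FTP


def split_remote_path(path: str) -> tuple[str, str]:
    """Split '/pub/editors/webster.tar.Z' into ('/pub/editors', 'webster.tar.Z')."""
    return posixpath.split(path)


def fetch_anonymous(host: str, path: str, email: str) -> str:
    """Download `path` from `host` by anonymous ftp into the current directory."""
    directory, name = split_remote_path(path)
    with FTP(host) as ftp:  # leaving the block sends QUIT
        # Anonymous login: user 'anonymous', your e-mail address as the password.
        ftp.login(user="anonymous", passwd=email)
        if directory:
            ftp.cwd(directory)  # like 'cd pub/editors'
        with open(name, "wb") as out:
            # retrbinary sends TYPE I (binary) before RETR, like 'binary' + 'get'.
            ftp.retrbinary(f"RETR {name}", out.write)
    return name
```

A call would look like fetch_anonymous("ftp.UU.NET", "/pub/editors/webster.tar.Z", "you@example.org"), with your own e-mail address as the password.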

[19] FTP repositories

(This section is out of date.)

19.1. Consortium for Lexical Research (CLR)

  The Consortium for Lexical Research is designed to serve as a
  repository for software and resources of importance to the natural
  language processing research community. Sharable resources, and the
  task of centralizing lexical data and tools, are of foremost
  concern in lexical research and computational linguistics. It
  is our objective to help alleviate the repeated recreation of
  basic software tools, and to assist in making essential data
  sources more generally available.

  CLR maintains a public ftp site, and a separate library of
  materials only for members of CLR. Currently CLR has about 60
  members, mostly academic institutions, and almost every major
  natural language processing center in the U.S. belongs. Access to
  the members-only materials is strictly regulated by password and

  Our catalog of current holdings is available by using anonymous
  ftp to

19.2. Oxford Text Archive (OTA)

  ota/textarchive.list         the current catalogue

  There are two classes of texts available from this FTP server:

  (a) texts which are in TEI format and which we can make freely
      available (these all appear as category P texts in the shortlist)

  (b) texts which are available only under our standard conditions of
      use (these all appear as category U or A in the shortlist)

19.3. University of Michigan Linguistics Archive (UMICH)

  moderator: John Lawler (

[20] What are some important books in NLP


   Allen, James F., "Natural Language Understanding", The
   Benjamin/Cummings Publishing Company, Menlo Park, California,
   (Addison-Wesley Publishing Company, Reading, Massachusetts).

   Manning, C. and Schuetze, H., "Foundations of Statistical Natural
   Language Processing", MIT Press, 1999, 680 pages. ISBN 0262133601.

   Jurafsky, D. and Martin, J., "Speech and Language Processing".

   Gazdar, G. and Mellish, C., "Natural Language Processing in Lisp:
   An Introduction to Computational Linguistics", Addison-Wesley,
   Reading, Massachusetts, 1989. (There are three different editions
   of the book, one for Lisp, one for Prolog, and one for Pop-11.)

   Michael A. Covington, "Natural Language Processing for Prolog
   Programmers", Prentice-Hall, Englewood Cliffs, NJ, 1994. ISBN


   Rustin, Randall (ed.) "Natural Language Processing", Algorithmics Press,
   New York, NY, 1973. 

   Schank, Roger C., and Colby, Kenneth M. (eds.) "Computer Models of Thought
   and Language", W.H. Freeman, San Francisco, CA, 1973, 454 pp.

   Charniak, Eugene and Wilks, Yorick A. (eds.) "Computational Semantics",
   North-Holland, Amsterdam, Netherlands, 1976, 294 pp.

   Metzing, Dieter (ed.) "Frame Conceptions and Text Understanding",
   De Gruyter, Berlin, Germany, 1980, 167 pp. 

   Tennant, Harry R., "Natural Language Processing", Petrocelli Books, New 
   York, NY, 1981.

   Lehnert, Wendy G., and Ringle, Martin H. (eds.) "Strategies for Natural
   Language Processing", Lawrence Erlbaum Associates, Hillsdale, NJ, 1982,
   533 pp.

   King, Margaret (ed.) "Parsing Natural Language", Academic Press, 
   London, England, 1983, 308 pp.

   Grosz, Barbara J., Sparck Jones, Karen, and Webber, Bonnie L., eds.
   "Readings in Natural Language Processing", Morgan Kaufmann
   Publishers, Los Altos, CA, 1986, 664 pages. ISBN 0-934613-11-7, $44.95.

   Robert C. Berwick, "Computational Linguistics", MIT Press,
   Cambridge, MA, 1989, ISBN 0-262-02266-4.

   Brady, Michael, and Berwick, Robert C., eds. "Computational Models
   of Discourse", MIT Press, Cambridge, MA, 1983.

   Ralph Grishman, "Computational Linguistics: An Introduction",
   Cambridge University Press, New York, 1986, 193 pages.

   Terry Winograd, "Language as a Cognitive Process", Addison-Wesley,
   Reading, MA, 1983.

   Schank, R. and Abelson, R.  "Scripts, Plans, Goals, and Understanding,"
   Lawrence Erlbaum Associates, Hillsdale, New Jersey, 1977.


   David Crystal, "A Dictionary of Linguistics and Phonetics", 3rd Edition,
   Basil Blackwell Publishers, New York, 1991.


   Tomita, M. (Editor), "Current Issues in Parsing Technology", 
   Kluwer Academic Publishers, Norwell, MA, 1991.

   Marcus, M.  "A Theory of Syntactic Recognition for Natural Language,"
   The MIT Press, Cambridge, MA, 1980.

   Pereira, F. and Shieber, S.  "Prolog and Natural-Language Analysis,"
   Center for the Study of Language and Information, 1987.

Probabilistic Parsing:

   Ted Briscoe and John Carroll, "Generalised Probabilistic LR Parsing of
   Natural Language (Corpora) with Unification-based Grammars",
   University of Cambridge Computer Laboratory, Technical Report Number
   224, 1991.

   Zhi Biao Wu, Loke Soo Hsu, and Chew Lim Tan, "A Survey of Statistical
   Approaches to Natural Language Processing", Technical report TRA4/92,
   Department of Information Systems and Computer Science, National
   University of Singapore, 1992

Natural Language Understanding:

   Dyer, M.  "In-Depth Understanding:  A Computer Model of Integrated
   Processing for Narrative Comprehension,"  MIT Press, Cambridge, MA, 1983.

   Aravind Joshi, Bonnie Webber and Ivan Sag, eds. "Elements of Discourse
   Understanding", Cambridge University Press, New York, 1981.

   Cohen, P. R., Morgan, J. and Pollack, M., editors, "Intentions in
   Communication", MIT Press, Cambridge, MA, 1990.

Natural Language Interfaces:

   Raymond C. Perrault and Barbara J. Grosz, "Natural Language
   Interfaces", Annual Review of Computer Science, volume 1, J.F. Traub,
   editor, pages 435-452, Annual Reviews Inc., Palo Alto, CA, 1986.

Natural Language Generation:

   McKeown, Kathleen R. and Swartout, William R., "Language
   Generation and Explanation", in Zock, M. and Sabah, G.,
   editors, Advances in Natural Language Generation, Volume 1, Pages
   1-51, Ablex Publishing Company, Norwood, NJ, 1988. (Overview of
   the state of the art in natural language generation.)

   Mann, W. & S. Thompson. Rhetorical Structure Theory: a theory of
   text organization.

Speech Processing:

   Ronnie W. Smith and D. Richard Hipp, "Spoken Natural Language
   Dialog Systems: A Practical Approach", Oxford University Press, 
   ISBN 0-19-509187-6.

   Jonathan Allen, Sharon Hunnicutt and Dennis H. Klatt, "From Text to Speech:
   The MITalk System", Cambridge University Press, 1987. [Synthesis,
   precursor of DECtalk.]

   Frank Fallside and William A. Woods (editors), "Computer Speech Processing"
   Prentice Hall, Englewood Cliffs, NJ, 1985. 

   X. D. Huang, Y. Ariki and M. A. Jack, "Hidden Markov Models for Speech
   Recognition", Edinburgh University Press, 1990. [Analysis]

   A. Nejat Ince (editor), "Digital Speech Processing: Speech Coding,
   Synthesis, and Recognition", Kluwer Academic Publishers, Boston,
   1992. [Analysis and Synthesis]

   Kai-Fu Lee, "Automatic Speech Recognition: The Development of the
   SPHINX System", Kluwer Academic Publishers, Boston, MA, 1989. [Analysis]

   Douglas O'Shaughnessy, "Speech Communication: Human and Machine"
   Addison-Wesley, MA, 1987. [Analysis and Synthesis]

   Lawrence R. Rabiner and Ronald W. Schafer, "Digital Processing of
   Speech Signals", Prentice Hall, Englewood Cliffs, NJ, 1978.
   [Analysis and Synthesis]

   Lawrence R. Rabiner and Biing-Hwang Juang, "Fundamentals of Speech
   Recognition", Prentice Hall, Englewood Cliffs, NJ, 1993.
   ISBN 0-13-015157-2. [Analysis]

   Ronald W. Schafer and John D. Markel (editors), "Speech Analysis",
   IEEE Press, New York, 1979. [Analysis]

   Alex Waibel and Kai-Fu Lee (editors), "Readings in Speech Recognition"
   Morgan Kaufmann Publishers, San Mateo, CA, 1990, 680 pages. 
   ISBN 1-55860-124-4, $49.95. [Analysis]

   Alex Waibel, "Prosody and Speech Recognition", Morgan Kaufmann
   Publishers, San Mateo, CA, 1988. [Analysis]

Machine Translation:

   W. John Hutchins and Harold L. Somers, "An Introduction to Machine
   Translation", Academic Press, San Diego, 1992. 362 pages, ISBN

   Bonnie J. Dorr, "Machine Translation: A View from the Lexicon" MIT
   Press, Cambridge, MA 1993. 432 pages, ISBN 0-262-04138-3.

   Kenneth Goodman and Sergei Nirenburg, editors, "The KBMT Project: A
   Case Study in Knowledge-Based Machine Translation", Morgan Kaufmann
   Publishers, San Mateo, CA, 1991. 331 pages, ISBN 1-55860-129-5, $34.95.

   Arnold, D.J.; Balkan, L.; Lee Humphreys, R.; Meijer, S.; and Sadler, L.
   (1994). Machine Translation: An Introductory Guide. NCC Blackwell. 

   The journal "Machine Translation" is the principal forum for
   current research.

   A review of MT systems on the market appeared in BYTE 18(1), January 1993.

Reversible Grammars:

   Tomek Strzalkowski, editor, "Reversible Grammar in Natural Language
   Processing", Kluwer Academic Publishers, 1993.

   Proceedings of the ACL Workshop on Reversible Grammar in Natural
   Language Processing, UC Berkeley, 1991. (See especially Remi
   Zajac's paper.)

Statistical Processing:

   Eugene Charniak, "Statistical Language Learning", MIT Press, Cambridge,
   Massachusetts, 1993, 170 pages.

Categorial Grammar (CG):

   M. Moortgat, "Categorial Investigations: Logical and Linguistic
   Aspects of the Lambek Calculus", Groningen-Amsterdam Studies in
   Semantics:9, Foris, Dordrecht, Holland, 1988.

   Richard T. Oehrle, Emmon Bach and Deirdre Wheeler, "Categorial
   Grammars and Natural Language Structures", Studies in Linguistics
   and Philosophy:32, D. Reidel Publishing Company, Dordrecht, 1988.

   Mary McGee Wood, "Categorial Grammars", Linguistic Theory Guides,
   Routledge, London, 1993.

Dependency Grammar:

   Igor' Aleksandrovich Mel'cuk, "Dependency Syntax: Theory and
   Practice", State University of New York Press, 1987.

Functional Grammar (aka Systemic Grammar):

   Michael A. K. Halliday, "An Introduction to Functional Grammar",
   Edward Arnold, London, 1985.

Generalized Phrase Structure Grammar (GPSG):

   Gerald Gazdar, Ewan Klein, Geoffrey Pullum and Ivan Sag, 
   "Generalized Phrase Structure Grammar", Oxford:Blackwell, 1985.

Government and Binding (GB):

   Noam Chomsky, "Lectures on Government and Binding", Foris Publications.

   Vivian J. Cook, "Chomsky's Universal Grammar: An Introduction", Basil
   Blackwell Publisher, New York, 1988, 201 pages.

   Victoria Fromkin and Robert Rodman, "An Introduction to Language",
   Holt, Rinehart, and Winston, New York, 4th edition, 1988, 474 pages.

   Liliane M.V. Haegeman, "Introduction to Government and Binding
   Theory", Basil Blackwell Publishers, Oxford, 1991, 618 pages.

   Geoffrey C. Horrocks, "Generative Grammar", Longman, London, 1987,
   339 pages. 

   Andrew Radford, "Transformational Grammar: A First Course", 
   Cambridge University Press, New York, 1988, 625 pages.

   Stabler, E.P., "The Logical Approach to Syntax", MIT Press, Cambridge,
   Massachusetts, 1992.

Head-driven Phrase Structure Grammar (HPSG):

   Carl Pollard and Ivan Sag, "Information-based Syntax and Semantics",
   Stanford:CSLI, University of Chicago Press, 1987.

   Carl Pollard and Ivan A. Sag, "Head-Driven Phrase Structure Grammar",
   Chicago: University of Chicago Press and Stanford: CSLI, 1994.

Lexical-Functional Grammar (LFG):

   Joan Bresnan (ed.), "The Mental Representation of Grammatical 
   Relations", Cambridge:MA, MIT Press, 1982.

   Dalrymple, Kaplan, Maxwell & Zaenen, eds., "Formal Issues in
   Lexical-Functional Grammar", CSLI Publications, Stanford, CA, 1995
   (distributed by Cambridge University Press).

Tree Adjoining Grammar (TAG):

   A. Joshi, L. Levy and M. Takahashi, "Tree Adjunct Grammars"
   In: Journal of Computer and System Sciences 10:136-63, 1975.

   A. Joshi, "An Introduction to Tree Adjoining Grammars"
   In: Alexis Manaster-Ramer (ed.), "The Mathematics of Language",
   Benjamins, Philadelphia, 1987.

Cognitive Grammar:

   Ronald W. Langacker, "Foundations of Cognitive Grammar", Stanford
   University Press, 1987.

Programming for NLP:

   Pereira, Fernando C.N. and Shieber, Stuart  "Prolog and Natural-Language
   Analysis," Center for the Study of Language and Information, Stanford, CA
   1987, 264 pp.

   Gazdar, Gerald and Mellish, Christopher S., "Natural Language Processing in
   Lisp: An Introduction to Computational Linguistics", Addison-Wesley,
   Reading, Massachusetts, 1989. (There are three different editions
   of the book, one for Lisp, one for Prolog, and one for Pop-11.)

   Michael A. Covington, "Natural Language Processing for Prolog
   Programmers", Prentice-Hall, Englewood Cliffs, NJ, 1994. ISBN

   Peter Norvig, "Paradigms of AI Programming".


   Gazdar, Gerald, Alex Franz, Karen Osborne, and Roger Evans (1987).
   "Natural Language Processing in the 1980s: A Bibliography",  Center for 
   the Study of Language and Information (CSLI) lecture notes no. 12, CSLI,
   Stanford, CA, 240 pp.

Computational Morphology:

   Richard Sproat, Morphology and Computation, MIT Press, Cambridge, 1992.

   Graeme D. Ritchie, Graham J. Russell, Allan W. Black, Stephen G. Pulman,
   Computational Morphology, MIT Press, Cambridge/London, 1992.


   Austin, J.L. How to do things with words.

   Searle, J.   Speech acts.

   Levinson, S. Pragmatics.

   Ross, Don, and Dan Brink (eds.) (1994) "Research in Humanities Computing 3: 
   Selected Papers from the ALLC/ACH Conference, Tempe, Arizona, March 1991," 
   Clarendon Press, Oxford, England.

   _The Multilingual PC Directory_. By Ian Tresman. 254pp.
   Stamford CT: Knowledge Computing Ltd.

   Stefan Wermter, "Hybrid Connectionist Natural Language Processing",
   Chapman & Hall Inc, 1995.

   Connectionist approaches to natural language processing.
   Edited by Ronan G. Reilly and Noel E. Sharkey.
   Lawrence Erlbaum Associates, 1992. ISBN 0-86377-179-3.

   _Natural Language Processing_.  Ed. Fernando C.N. Pereira and
   Barbara J. Grosz. A Bradford Book. Cambridge, MA, and London:
   The MIT Press, 1994. Rptd from _Artificial Intelligence: An
   International Journal_, Volume 63, Numbers 1-2 (1993).

   _Research in Humanities Computing 1: Selected Papers
   from the ALLC/ACH Conference, Toronto, June 1989_.
   Ed. Ian Lancashire. Oxford: Clarendon Press, 1991.

   Peter D. Smith, _An Introduction to Text Processing_.
   Cambridge MA and London: The MIT Press, 1990.
   ISBN 0-262-19299-3.

   Gilbert K. Krulee, "Computer Processing of Natural Language",
   Prentice Hall. ISBN 0-13-610299-3.

   Sadock, J.   Toward a linguistic theory of speech acts.

   Vanderveken, D. & J. Searle. Meaning and speech acts. (2 vols.)

[21] Encyclopedia of Artificial Intelligence


                  Stuart C. Shapiro (editor) (John Wiley & Sons, 1992)

                                    compiled by:

                                William J. Rapaport

                           Department of Computer Science
                          and Center for Cognitive Science
                       State University of New York at Buffalo
                                 Buffalo, NY 14260

AUTHOR                          TITLE                                     PAGES

                               Volume 1:

Bookman, L. A.,
  & Alterman, R.       Analog Semantic Features                           27-28
Alvarado, S. J.        Argument Comprehension                             30-52
Kucera, H.             Brown Corpus                                     128-130
Srihari, S. N.,
  & Hull, J. J.        Character Recognition                            138-150
Ballard, B.,
  & Jones, M.          Computational Linguistics                        203-224
Hardt, S. L.           Conceptual Dependency                            259-265
Hindle, D.             Deep Structure                                   328-330
Ingria, R.;
  Boguraev, B.;
  & Pustejovsky, J.    Dictionary/Lexicon                               341-365
Scha, R.;
  Bruce, B. C.;
  & Polanyi, L.        Discourse Understanding                          365-379
Tennant, H.            Ellipsis                                         445-446
Novak, V.              Fuzzy Logic: Applications to Natural Language    515-521
Woods, W. A.           Grammar, Augmented Transition Network            552-563
Bruce, B.,
  & Moser, M. G.       Grammar, Case                                    563-570
Gazdar, G.             Grammar, Generalized Phrase Structure            570-573
Joshi, A. K.           Grammar, Phrase Structure                        573-580
Burton, R.             Grammar, Semantic                                580-583
Bateman, J. A.         Grammar, Systemic                                583-592
Mallery, J. C.;
  Hurwitz, R.;
  & Duffy, G.          Hermeneutics                                     596-611
Hill, J. C.            Language Acquisition                             761-772
Fass, D.,
  & Pustejovsky, J.    Lexical Decomposition                            806-812
Pustejovsky, J.        Lexical Semantics                                812-819

                               Volume 2:

Nagao, M.              Machine Translation                              898-902
Klavans, J. L.,
  & Tzoukermann, E.    Morphology                                       963-972
McDonald, D. D.        Natural-Language Generation                      983-997
Carbonell, J. G.,
  & Hayes, P. J.       Natural-Language Understanding                  997-1016
Petrick, S.            Parsing                                        1099-1109
Small, S. L.           Parsing, Word-Expert                           1109-1116
Wilks, Y.,
  & Fass, D.           Preference Semantics                           1183-1194
Cruse, D. A.           Presupposition                                 1194-1201
Dyer, M. G.;
  Cullingford, R. E.;
  & Alvarado, S. J.    Scripts                                        1443-1460
Sowa, J. F.            Semantic Networks                              1493-1511
Devlin, K. J.          Situation Theory and Situation Semantics       1541-1547
Briscoe, E. J.         Speech Recognition                             1553-1559
Norvig, P.             Story Analysis                                 1568-1576
Alterman, R.           Text Summarization                             1579-1587
Sparck Jones, K.       Thesaurus                                      1605-1613
Knight, K.             Unification                                    1630-1636

                  Additional articles from the 1st edition (1987):

Coelho, H.             Grammar, Definite Clause                         339-342
Berwick, R.            Grammar, Transformational                        353-361
Newmeyer, F. J.        Linguistics, Competence and Performance          503-508
Wilks, Y.              Machine Translation                              564-571
Tennant, H.            Menu-Based Natural Language                      594-597
Koskenniemi, K.        Morphology                                       619-620
Bates, M.              Natural-Language Interfaces                      655-660
Riesbeck, C. K.        Parsing, Expectation-Driven                      696-701
Keyser, S. J.          Phonemes                                         744-746
Webber, B.             Question Answering                               814-822
Smith, B. C.           Self-Reference                                 1005-1010
Hirst, G.              Semantics                                      1024-1029
Woods, W.              Semantics, Procedural                          1029-1031
Allen, J. F.           Speech Acts                                    1062-1065
Allen, J.              Speech Recognition                             1065-1070
Allen, J.              Speech Synthesis                               1070-1076
Briscoe, E. J.         Speech Understanding                           1076-1083
Lehnert, W. G.         Story Analysis                                 1090-1099

[22] Machine Translation

   Globalink, Inc
   9302 Lee Highway
   Fairfax, VA, 22031, USA
   Tel: +1 703 273 5600
   Fax: +1 703 273 3866

   Archers Translation Services
   203-205 Desborough Road
   High Wycombe, Bucks., HP11 2QL, UK
   Tel: +44 494 537755
   Fax: +44 494 474001

   Gesellschaft für multilinguale Systeme (GMS)
   Balanstr. 57
   81541 Munich, Germany

[23] What are the major accomplishments of the field (only up to 1987)

Note: This section is in a very preliminary stage.


Chomsky (1957), Syntactic Structures
Weizenbaum (1966), ELIZA
Woods (1967), Procedural semantics
Thorne et al. and Woods (1968-70), ATNs
Winograd (1970), SHRDLU
Colby, Weber & Hilf, 1971; Colby, 1975, PARRY
Wilks (1972), Preference semantics
Woods et al. (1972), LSNLIS / Lunar
Charniak (1972), Frames and demons
Wilks (1973), Stanford machine translation project
Montague (1973), IL semantics (Montague Grammar) in PTQ
Grosz (1977), Focus in task-oriented dialogues
Marcus (1977), Deterministic parsing
Davey (1978)
Cohen, Phil (1979), Planning speech acts
Allen (1980), Understanding speech acts
McDonald (1980), MUMBLE
Heim/Kamp (1981), Discourse Representation Theory
McKeown (1982), TEXT
Appelt (1982), KAMP (Integration of Functional Grammar with Discourse Plans)
Shieber (1984), Non-context-freeness of NL syntax proven
   [note from Lillian Lee:
    Culy probably deserves co-credit with Shieber for the non-CFness of
    NLs (see Pullum, "Footloose and Context-Free"). Pullum also says
    there was an even earlier argument, given in Dutch (I don't have
    the article, but it's Pullum's "Nobody goes around at LSA meetings
    offering odds").]
Pollack (1986), Plan inference
Mann & Thompson (1987), Rhetorical Structure Theory

      Conceptual Dependency:

Schank (1969), Conceptual Dependency
Schank, Riesbeck, Rieger, Goldman (1975), MARGIE
Cullingford (1979), SAM
Wilensky (1979), PAM
DeJong (1980), FRUMP
Lebowitz (1980), IPP
Dyer (1982), BORIS
Lytinen (1986), MOPTRANS
Hovy (1986), PAULINE
Ram (1989), AQUA
Martin (1986), Direct Memory Access Parsing (DMAP)
Fitzgerald (1995), Indexed Concept Parsing

[24] Publishers

24.1. MIT Press  

24.2. Elsevier

24.3. Kluwer

24.4. Addison Wesley 

24.5. Cambridge University Press 

24.6. CSLI, Stanford

24.7. Springer Verlag

24.8. University of Chicago Press

24.9. Academic Press

[25] Credits

Large parts of the answers to Q. 10, 11, 14, and 20 come from Mark
Kantrowitz's FAQ. Q. 2 is due to Hans Uszkoreit; Q. 21 comes from
William Rapaport and Stuart Shapiro. Jan Daciuk compiled most of Q. 24.

Partial list of contributors (in alphabetical order):

    Avery Andrews   
    Paul Buitelaar 
    Charles Brendan Callaway
    Russell Collingham
    Jan Daciuk     
    Robert Dale    
    Mary Dalrymple 
    Barbara di Eugenio
    Dan Fass       
    John Fry                 fry@Prosit.Stanford.EDU
    Joshua Goodman 
    Malcolm Grandis
    Graeme Hirst   
    Eduard Hovy    
    Mark Kantrowitz
    Stefan Langer  
    Alberto Lavelli
    Lillian Lee              llee@CS.Cornell.EDU
    John McNaught  
    David Pautler  
    Fred Popowich  
    Daniel Radzinski
    Ashwin Ram
    William J. Rapaport
    David Reitter  
    Hinrich Schuetze         schuetze@Sante.Stanford.EDU
    Stuart Shapiro 
    Jakob Sommer   
    Kevin Thomas   
    R. M. Thomas   
    Hans Uszkoreit 
    Gertjan van Noord
    Jean Veronis
    Carl Vogel
    Ellen Voorhees
    Phil Woodland  
