Changes between Version 13 and Version 14 of WikiStart


Timestamp:
11/20/07 04:03:29 (10 years ago)
Author:
francis_bond
Comment:

Remove most of the documentation, added pointer to the DELPHIN wiki.

  • WikiStart

    v13 v14  
    1212 
    1313 
    14 = Overview = 
    15  
    16 The Jacy grammar is a broad-coverage linguistically precise grammar of Japanese. 
    17 It is based on the [http://hpsg.stanford.edu HPSG] formalism with MRS semantics. 
    18 [http://wiki.delph-in.net/moin/LkbTop LKB] is the primary grammar development environment, but the grammar processing can be efficiently done with [http://wiki.delph-in.net/moin/PetTop PET]. 
    19  
    20 The first application of the Japanese HPSG was the Verbmobil system, a spoken language machine translation project, where the Japanese HPSG was used in deep processing of appointment scheduling and travel reservation dialogues. The grammar was also used in an industrial application of automatic email response. The grammar also contributed to the EU project [http://www.dfki.de/deepthought DeepThought], where the main focus is on building applications for combined shallow and deep natural language processing. This project is multilingually oriented, such that much effort is put on multilingual approaches to grammatical phenomena and building a matrix grammar that can be used as the basis for the development of further grammars. 
    21  
    22 Current development is mainly being done by Francis Bond at [http://www2.nict.go.jp/x/x161/en/index.html NiCT], with help from Takayuki Kuribayashi and Chikara Hashimoto.  
    23  
    24 Melanie Siegel is the original principal Jacy developer.  Major contributions came from Emily Bender (University of Washington), especially concerning the MRS construction and numeral expressions. Stephan Oepen (Universitetet i Oslo & CSLI Stanford) contributed support on the grammar development environment, Japanese font encodings and inclusion of [http://chasen.naist.jp ChaSen]. Ulrich Callmeier (acrolinx GmbH) contributed the requirements for letting the grammar run on his fast and efficient PET system. 
    25 Akira Ohtani, Chikara Hashimoto (Kyoto University), Francis Bond, Sanae Fujita, Shigeko Nariyama and Takaaki Tanaka (NTT Communication Science Laboratories - Machine Translation Research Group) contributed grammar extensions, especially for verbal compounds and relative sentence constructions, and many lexicon entries. Ulrich Schaefer integrated [http://chasen.naist.jp ChaSen], Japanese Named Entity Recognition via [http://sprout.dfki.de SProUT] and [http://wiki.delph-in.net/moin/PetTop PET] with the Jacy grammar into the [http://heartofgold.dfki.de Heart of Gold] middleware for robust parsing of Japanese text, adding automatic translations of Chasen's EUC-JP byte offsets to Unicode character counts. 
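
A minimal sketch of the kind of byte-offset-to-character-count translation mentioned above. It is not the Heart of Gold implementation. It assumes the input is available as a vector of EUC-JP octets, relies only on the lead-byte rules of EUC-JP, and assumes that every EUC-JP character corresponds to exactly one Unicode character. The function names are hypothetical.

```lisp
;;; Hypothetical helpers, not part of Jacy, ChaSen or Heart of Gold.

(defun euc-jp-char-width (lead-byte)
  "Number of bytes in the EUC-JP character that starts with LEAD-BYTE."
  (cond ((< lead-byte #x80) 1)   ; ASCII
        ((= lead-byte #x8F) 3)   ; three-byte form (JIS X 0212)
        (t 2)))                  ; JIS X 0208, or #x8E half-width katakana

(defun byte-offset-to-char-count (octets byte-offset)
  "Count the characters that precede BYTE-OFFSET in the EUC-JP vector OCTETS.
Assumes BYTE-OFFSET falls on a character boundary."
  (let ((pos 0)
        (chars 0))
    (loop while (< pos byte-offset)
          do (incf pos (euc-jp-char-width (aref octets pos)))
             (incf chars))
    chars))
```

For the octets of one ASCII letter followed by two two-byte characters, `(byte-offset-to-char-count #(#x41 #xC6 #xFC #xCB #xDC) 3)` counts two characters before byte offset 3.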
    26  
    27 A presentation explaining grammar fundamentals can be [http://www.delph-in.net/jacy/jacy.pdf downloaded].  There is some on-line [http://wiki.delph-in.net/moin/JacyDoc documentation] available. 
     14The main documentation for Jacy is at the DELPHIN wiki: [http://wiki.delph-in.net/moin/JacyTop] 
    2815 
    2916Brief feedback on how you use the grammar would be appreciated. 
     
    3724Maintainer: Francis Bond <bond@ieee.org>  
    3825 
    39 == Using the Grammar == 
    40  
    41 Here are some slightly out of date instructions on how to install and run the grammar.  Currently the easiest way to install it is through the [http://wiki.delph-in.net/moin/LkbInstallation automated installation] with the option {{{--jacy}}}. 
    42  
    43  
    44 Thanks to Francis Bond, Stephan Oepen, Atsuko Shimada, Ulrich Callmeier and Yoshihiro Morimoto for helping to set these up. 
    45  
    46 The following is needed to run the Jacy grammar: 
    47  
    48     * Basic requirements: Installation of ACL6.0 with CLIM and all patches, Linux installed with Japanese, [http://www.openmotif.org Open Motif] 
    49     * The [http://wiki.delph-in.net/moin/LkbTop LKB] grammar development system (You will find detailed installation instructions there) 
    50     * The [http://chasen.naist.jp ChaSen] morphological analyzer (You will find detailed installation instructions there) 
    51          
    52  
    53 Although the lkb will run standalone, there are problems with Japanese input. The recommended way to run it is from inside emacs, using the eli interface. Install the lkb and eli (following [http://www-csli.stanford.edu/~aac/emacslkb.html these instructions]) 
    54   
    55  
    56 Problems or questions concerning LKB in general can be directed to lkb-bugs@csli.stanford.edu 
    57  
    58 You need to run Lisp with the EUC locale (ja_JP.EUC-JP) and be sure emacs uses EUC for the process encoding in the *common-lisp* buffer. Use the [http://www.delph-in.net/jacy/.emacs.jp .emacs.jp] file and adapt the paths. Then, your .emacs must be told that the .emacs.jp exists: 
    59 {{{ 
    60 (when (file-exists-p (concat user-home "/.emacs.jp")) (load (concat user-home "/.emacs.jp") nil t t)) 
    61 }}} 
    62  
    63 You will also need the file [http://www.delph-in.net/jacy/.clinit.cl .clinit.cl]. Finally, for running [incr tsdb()] and PET on the Japanese grammar, you will need [http://www.delph-in.net/jacy/.tsdbrc .tsdbrc] 
    64  
    65 Now load everything, LKB, MRS, plus [incr tsdb()]: 
    66  
    67 Open emacs 
    68  
    69 Start Lisp with 
    70  
    71 {{{ 
    72 M-x japanese 
    73 }}} 
    74  
    75 {{{ 
    76 :ld ~/src/lkb/src/general/loadup 
    77 (pushnew :lkb *features*) 
    78 (pushnew :mrs *features*) 
    79 (compile-system "tsdb" :force t) 
    80 }}} 
    81  
    82 Load the grammar with 
    83 {{{ 
    84 (read-script-file-aux "~/japanese/lkb/ascript") 
    85 }}} 
    86 (your path to the grammar). 
    87  
    88 You can parse a sentence by typing 
    89 {{{ 
    90 (do-parse-tty "SENTENCE") 
    91 }}} 
    92 in the emacs window. 
    93   
    94  
    95 If you have any questions, please write an email to: siegel.melanie@gmail.com or the [http://lists.delph-in.net/mailman/listinfo/jacy Jacy mailing list]. 
    96  
    97 == Using Jacy with [incr tsdb()] == 
    98  
    99 Install itsdb, following the instructions in the manual. 
    100  
    101 The latest version of Jacy and versions of itsdb later than 2003-05-20 should work as is with Japanese. 
    102 {{{ 
    103 M-x itsdb 
    104 }}} 
    105  
    106 Note: Japanese test sentences should be in euc-jp. 
    107  
    108 To get itsdb to count Japanese words, you need to segment the test sentences at some stage. This can be done during import. 
    109  
    110       If there is a _global_ 'preprocessing hook', [incr tsdb()] import will pipe everything through it and use the _second_ value that it returns as the 'i-length' field; e.g. 
    111 {{{ 
    112 (setf *tsdb-preprocessing-hook* "lkb::chasen-preprocess-for-pet") 
    113 }}} 
    114       will enable that hook globally, and once you use a definition of this function that counts correctly (no good doing length() on a variable _after_ using the destructive nreverse() on it :-{), you will notice that (i) imports from text files are much slower and (ii) 'Browse -- Test Items' will show Chasen word counts for the 'i-length' field. 
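
The counting pitfall mentioned above can be sketched as follows. This is an illustration only, not Jacy's actual preprocessor: `segment-and-count` is a hypothetical function that takes a list of tokens accumulated in reverse order and returns the segmented string as its first value and the word count as its second value, which is the value [incr tsdb()] would use for 'i-length'.

```lisp
;;; Hypothetical illustration, not the function shipped with Jacy.

(defun segment-and-count (reversed-tokens)
  "Return the space-separated string and the token count as two values.
REVERSED-TOKENS must be a freshly consed list, because NREVERSE may destroy it."
  ;; Take the length first: after NREVERSE the original variable may point
  ;; into the middle of the reversed list, so LENGTH on it would be wrong.
  (let ((count (length reversed-tokens)))
    (values (format nil "~{~a~^ ~}" (nreverse reversed-tokens))
            count)))
```

Calling `(segment-and-count (list "c" "b" "a"))` returns the string "a b c" and the count 3.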
    115  
    116 Note that because the import can now take noticeable time (half a second per item or so), the [incr tsdb()] progress meter should advance correctly while importing from a text file (this does not work on versions older than 2003-06). 
    117  
    118 There is an example of user-fns.lsp for JaCY that enables the *tsdb-preprocessing-hook*, when [incr tsdb()] is loaded _before_ the grammar. (You could also set this in ~/.tsdbrc, but then it would affect everything you do, no matter which grammar was used.) 
    119  
    120 from user-fns.lsp: 
    121 {{{ 
    122 ;;; 
    123 ;;; hook for [incr tsdb()] to call when preprocessing input (going to the PET 
    124 ;;; parser or when counting 'words' while importing test items from a text file). 
    125 ;;; 
    126  
    127 (defun chasen-preprocess-for-pet (input) 
    128  
    129 (preprocess-sentence-string input :verbose nil :posp t)) 
    130  
    131 #+(or :pvm :itsdb) 
    132  
    133 (setf tsdb::*tsdb-preprocessing-hook* "lkb::chasen-preprocess-for-pet") 
    134 }}} 
    135  
    136 == Using Jacy with PET == 
    137  
    138 Install PET following the instructions at [http://wiki.delph-in.net/moin/PetTop PET]. 
    139  
    140 You need to segment the Japanese, for example by preprocessing with chasen: 
    141  
    142 {{{ 
    143 chasen -F"%m " | cheap ~/japanese/japanese.grm 
    144 }}} 
    145  
    146   reading 'pet/japanese.set'... 
    147  
    148   loading 'japanese.grm' (Japanese (jan-03)) 
    149  
    150   16674 types in 1.7 s 
    151  
    152 == Using Jacy with itsdb and PET == 
    153  
    154 Install itsdb and PET. 
    155  
    156 You can run Japanese with a cpu defined in your .tsdbrc (substituting your pathnames). 
    157  
    158 After starting lkb-ja and itsdb in emacs: 
    159  
    160 Choose the cpu in the normal way by evaluating 
    161 {{{ 
    162 (tsdb::tsdb :cpu :nihongo :file t)  
    163 }}} 
    164 in the *common-lisp* buffer. 
    165  
    166 The preprocessor calls a function defined in user-fns.lsp that runs chasen on the input; the combination of "-yy" "-default-les" takes the output and produces default lexical types for unknown words. 
    167   
    168 == Using Jacy with Heart of Gold == 
    169  
    170 Wrapper modules and sample configurations for Chasen, SProUT with Japanese Named Entity Recognition and PET with Jacy are included in the [http://heartofgold.opendfki.de Heart of Gold source tree], installation is described in the [http://heartofgold.dfki.de/Documentation.html Heart of Gold user and developer documentation]. 
    171  
    172 = Jacy References = 
    173  
    174 Siegel, Melanie and Emily M. Bender (2002): Efficient Deep Processing of Japanese. In 
    175 Proceedings of the 3rd Workshop on Asian Language Resources and International 
    176 Standardization. Coling 2002 Post-Conference Workshop. Taipei, Taiwan. 
    177  
    178 Oepen, Stephan, Emily M. Bender, Uli Callmeier, Dan Flickinger and Melanie Siegel (2002): 
    179 Parallel Distributed Grammar Engineering for Practical Applications. In Proceedings of the 
    180 Workshop on Grammar Engineering and Evaluation. Coling 2002 Post-Conference Workshop. 
    181 Taipei, Taiwan. 
    182  
    183 Bender, Emily M. (2002): Number Names in Japanese: A Head-Medial Construction in a Head-Final Language. Linguistic Society of America. 
    184  
    185 Kiefer, B., H.-U. Krieger and M. Siegel (2000): An HPSG-to-CFG Approximation of Japanese. In 
    186 Proceedings of Coling 2000, Saarbrücken. 
    187  
    188 Siegel, Melanie (2000): HPSG Analysis of Japanese. In: W. Wahlster (ed.): Verbmobil: 
    189 Foundations of Speech-to-Speech Translation. Springer Verlag. 
    190  
    191 Siegel, Melanie (2000): Japanese Honorification in an HPSG Framework. In Proceedings of the 
    192 14th Pacific Asia Conference on Language, Information and Computation, ed. A. Ikeya and M. 
    193 Kawamori, 289-300. Waseda University International Conference Center, Tokyo. 
    194 Logico-Linguistic Society of Japan. 
    195  
    196 Siegel, Melanie (1999): The Syntactic Processing of Particles in Japanese Spoken Language. In: 
    197 Wang, Jhing-Fa and Wu, Chung-Hsien (eds.): Proceedings of the 13th Pacific Asia Conference on 
    198 Language, Information and Computation, Taipei 1999. 
    199  
    200 Siegel, Melanie (1998): Japanese Particles in an HPSG Grammar. Verbmobil-Report 220. 
    201 Universität des Saarlandes. 
    202   
    20326 
    20427