== Using the Grammar ==

The instructions below for installing and running the grammar are slightly out of date. Currently, the easiest way to install it is through the [http://wiki.delph-in.net/moin/LkbInstallation automated installation] with the option {{{--jacy}}}.

Thanks to Francis Bond, Stephan Oepen, Atsuko Shimada, Ulrich Callmeier and Yoshihiro Morimoto for helping to set these up.

Running the Jacy grammar requires the following:

 * Basic requirements: ACL 6.0 with CLIM and all patches, a Linux installation with Japanese support, and [http://www.openmotif.org Open Motif]
 * The [http://wiki.delph-in.net/moin/LkbTop LKB] grammar development system (detailed installation instructions are on its page)
 * The [http://chasen.naist.jp ChaSen] morphological analyzer (detailed installation instructions are on its site)

Although the LKB will run standalone, there are problems with Japanese input. The recommended way to run it is from inside emacs, using the eli interface. Install the LKB and eli following [http://www-csli.stanford.edu/~aac/emacslkb.html these instructions].

Problems or questions concerning the LKB in general can be directed to lkb-bugs@csli.stanford.edu.

You need to run Lisp with the EUC locale (ja_JP.EUC-JP) and make sure that emacs uses EUC for the process encoding in the *common-lisp* buffer. Use the [http://www.delph-in.net/jacy/.emacs.jp .emacs.jp] file and adapt the paths. Then load .emacs.jp from your .emacs (the snippet assumes that user-home is already bound to your home directory):
{{{
(when (file-exists-p (concat user-home "/.emacs.jp"))
  (load (concat user-home "/.emacs.jp") nil t t))
}}}
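
Before starting emacs it can help to confirm that the EUC-JP locale is actually installed on the machine. The following is a minimal shell sketch and not part of Jacy or the LKB: the function names normalize_locale and have_locale are made up for illustration, and it assumes that the standard locale command is available.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Jacy or the LKB).

# Lower-case a locale name and drop hyphens, so that "ja_JP.EUC-JP" and
# "ja_JP.eucjp" (a spelling that `locale -a` often prints) compare equal.
normalize_locale() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/-//g'
}

# Succeed if the locale named in $1 appears in the output of `locale -a`.
have_locale() {
    wanted=$(normalize_locale "$1")
    locale -a 2>/dev/null | tr '[:upper:]' '[:lower:]' | sed 's/-//g' |
        grep -qxF -- "$wanted"
}

# Usage (one possible way to start emacs under that locale):
#   have_locale ja_JP.EUC-JP && LANG=ja_JP.EUC-JP emacs
```

If have_locale fails, the locale has to be generated or installed with your distribution's tools before Lisp can be run with it.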

You will also need the file [http://www.delph-in.net/jacy/.clinit.cl .clinit.cl]. Finally, for running [incr tsdb()] and PET on the Japanese grammar, you will need [http://www.delph-in.net/jacy/.tsdbrc .tsdbrc].

Now load everything: LKB, MRS, plus [incr tsdb()].

Open emacs and start Lisp with
{{{
M-x japanese
}}}

Then evaluate the following at the Lisp prompt:
{{{
:ld ~/src/lkb/src/general/loadup
(pushnew :lkb *features*)
(pushnew :mrs *features*)
(compile-system "tsdb" :force t)
}}}

Load the grammar with
{{{
(read-script-file-aux "~/japanese/lkb/ascript")
}}}
substituting your own path to the grammar.

You can parse a sentence by typing
{{{
(do-parse-tty "SENTENCE")
}}}
in the emacs window.

If you have any questions, please write an email to siegel.melanie@gmail.com or to the [http://lists.delph-in.net/mailman/listinfo/jacy Jacy mailing list].

== Using Jacy with [incr tsdb()] ==

Install [incr tsdb()] (itsdb), following the instructions in its manual.

The latest version of Jacy and versions of itsdb later than 2003-05-20 should work as is with Japanese. Start itsdb from emacs with
{{{
M-x itsdb
}}}

Note: Japanese test sentences should be in euc-jp.
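
If your test sentences are stored in another encoding, they can be converted before import. The following is a minimal sketch, assuming that the file is currently UTF-8 and that the standard iconv tool is installed; the function name to_eucjp and the file names are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: convert a UTF-8 text file ($1) to EUC-JP ($2).
# iconv exits with a non-zero status if a character cannot be represented
# in EUC-JP, so a failed conversion is not silently ignored.
to_eucjp() {
    iconv -f UTF-8 -t EUC-JP "$1" > "$2"
}

# Usage (file names are examples):
#   to_eucjp items.utf8.txt items.euc.txt
```

Converting into a new file rather than in place keeps the original available in case some characters cannot be converted.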

To get itsdb to count Japanese words, you need to segment the test sentences at some stage. This can be done during import.

If there is a _global_ preprocessing hook, [incr tsdb()] import will pipe every item through it and use the _second_ value that it returns as the 'i-length' field. For example,
{{{
;; outside the tsdb package, write tsdb::*tsdb-preprocessing-hook*
(setf *tsdb-preprocessing-hook* "lkb::chasen-preprocess-for-pet")
}}}
will enable that hook globally. Once the function it names counts correctly (it is no good calling length() on a variable _after_ applying the destructive nreverse() to it), you will notice that (i) imports from text files are much slower and (ii) 'Browse -- Test Items' shows ChaSen word counts in the 'i-length' field.

Because the import can now take noticeable time (half a second per item or so), the [incr tsdb()] progress meter should advance correctly during import from a text file (this does not work on versions older than 2003-06).

There is an example user-fns.lsp for Jacy that enables the *tsdb-preprocessing-hook* when [incr tsdb()] is loaded _before_ the grammar. (You could also set this in ~/.tsdbrc, but then it would affect everything you do, no matter which grammar was used.)

From user-fns.lsp:
{{{
;;;
;;; hook for [incr tsdb()] to call when preprocessing input (going to the PET
;;; parser or when counting 'words' while importing test items from a text file).
;;;
(defun chasen-preprocess-for-pet (input)
  (preprocess-sentence-string input :verbose nil :posp t))

;;; set the hook only when :pvm or :itsdb is present in *features*
#+(or :pvm :itsdb)
(setf tsdb::*tsdb-preprocessing-hook* "lkb::chasen-preprocess-for-pet")
}}}

== Using Jacy with PET ==

Install PET following the instructions at [http://wiki.delph-in.net/moin/PetTop PET].

You need to segment the Japanese input, for example by preprocessing it with ChaSen and piping the result to cheap. The command is shown here followed by the messages that cheap prints while loading the grammar:
{{{
chasen -F"%m " | cheap ~/japanese/japanese.grm
reading 'pet/japanese.set'...
loading 'japanese.grm' (Japanese (jan-03))
16674 types in 1.7 s
}}}
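
For batch use, the pipeline above can be wrapped in a small script. This is only a sketch under stated assumptions: the function names require_tools and parse_file are hypothetical, the input file is assumed to hold one EUC-JP sentence per line, and the only chasen and cheap invocations used are the ones shown above.

```shell
#!/bin/sh
# Hypothetical wrapper around the chasen | cheap pipeline shown above.

# Succeed only if every named program is found on PATH; report the first
# missing one on stderr.
require_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing required program: $tool" >&2
            return 1
        fi
    done
}

# Segment the sentences in file $1 with ChaSen and parse them with cheap,
# using the grammar image $2.
parse_file() {
    require_tools chasen cheap || return 1
    chasen -F"%m " < "$1" | cheap "$2"
}

# Usage (paths are examples):
#   parse_file sentences.txt ~/japanese/japanese.grm
```

Checking for both programs first gives a clear error message instead of a half-run pipeline when one of them is not installed.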

== Using Jacy with itsdb and PET ==

Install itsdb and PET.

You can run Japanese with a cpu defined in your .tsdbrc (substituting your pathnames).

After starting lkb-ja and itsdb in emacs, choose the cpu in the normal way by evaluating
{{{
(tsdb::tsdb :cpu :nihongo :file t)
}}}
in the *common-lisp* buffer.

The preprocessor calls a function defined in user-fns.lsp that runs ChaSen on the input. The combination of the "-yy" and "-default-les" options then takes that output and produces default lexical types for unknown words.

== Using Jacy with Heart of Gold ==

Wrapper modules and sample configurations for ChaSen, SProUT with Japanese Named Entity Recognition, and PET with Jacy are included in the [http://heartofgold.opendfki.de Heart of Gold source tree]. Installation is described in the [http://heartofgold.dfki.de/Documentation.html Heart of Gold user and developer documentation].