Monday, August 10, 2009

MeCab - Japanese parser

MeCab is an excellent parser for Japanese text. I'm using it to auto-generate a weighted flash card list from the relevant sections of Japanese newspapers. The hardest part of studying Japanese is figuring out which words and phrases are applicable to your field of expertise.

But MeCab is a pain in the butt to install on Ubuntu: the default output isn't UTF-8, which is a total pain. Here are the fixed install instructions.

Download the main MeCab package and the IPA dictionary from here:
http://sourceforge.net/projects/mecab/files/

Then run these commands. Note that I had to monkey-fix the configure LDFLAGS for Ubuntu:

tar xvzf mecab-0.97.tar.gz mecab-0.97/
tar xzvf mecab-ipadic-2.7.0-20070801.tar.gz mecab-ipadic-2.7.0-20070801/
cd mecab-0.97/
LDFLAGS=-R/usr/local/lib ./configure  --with-charset=utf8
make
make check
sudo make install
cd ..
cd mecab-ipadic-2.7.0-20070801/
LDFLAGS=-R/usr/local/lib ./configure  --with-charset=utf8
make
sudo make install
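
Once it's installed, a quick check is to run echo "すもももももももものうち" | mecab and confirm the output is readable UTF-8. For the flash card idea mentioned above, here is a rough sketch (not my actual script) of counting how often each noun shows up in MeCab's output. It assumes the default ipadic output format, one token per line as surface, a tab, then comma-separated features with the base form in the 7th field. The function name count_nouns and the file name article.txt are made up for illustration.

```shell
# Count base forms of nouns in MeCab's default (ipadic) output read from stdin.
# Assumed token line format:
#   surface<TAB>pos,pos1,pos2,pos3,ctype,cform,base,reading,pronunciation
# Sentences end with a line containing only "EOS".
count_nouns() {
  awk -F'\t' '
    $0 == "EOS" || NF < 2 { next }   # skip sentence terminators and blank lines
    {
      split($2, f, ",")
      if (f[1] == "名詞") {           # keep nouns only
        # prefer the base form; fall back to the surface form if it is missing
        word = (f[7] != "" && f[7] != "*") ? f[7] : $1
        count[word]++
      }
    }
    END { for (w in count) printf "%d\t%s\n", count[w], w }
  ' | sort -k1,1nr -k2,2             # most frequent first
}

# Usage (assumes mecab is installed and article.txt is a UTF-8 text file):
#   mecab < article.txt | count_nouns | head -n 20
```

The counts are only a crude weight. Filtering out words you already know, or comparing against a general-purpose corpus, would be the obvious next step.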

Check here for the Japanese version of the instructions and the download links:
http://mecab.sourceforge.net/#install-unix

http://sourceforge.jp/projects/mecab/lists/archive/users/2007-January/000204.html
