MeCab
This page describes how to interact with MeCab and how to get it running.
MeCab is a word segmentation tool for Japanese and can be found at:
Like most academic software, MeCab has a few rough edges, but with some knowledge of software porting we will get you up and running in a jiffy. We will even make sure it runs from its own directory without requiring root access (MeCab has a strong desire to be installed in /usr/local, but we will dodge that).
These instructions assume that we are installing the 0.98 version of MeCab and the 2.7.0 version of the IPA dictionary.
Get the following files:
- mecab (SourceForge Link)
- mecab-ipadic (SourceForge Link)
- mecab-python (SourceForge Link)
Create a directory and extract the source code.
mkdir mecab
cd mecab
mv ${PATH_TO_MECAB_DOWNLOADS}/mecab-*.tar.gz ./
find . -name '*.tar.gz' | xargs -n 1 tar xfz
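As an alternative to the find/xargs one-liner, the extraction can be sketched as a small loop that stops at the first archive that fails to unpack. The function name extract_tarballs is hypothetical and not part of MeCab; it assumes the tarballs sit in the current directory and match mecab-*.tar.gz.

```shell
# Hypothetical helper: extract every mecab-*.tar.gz in the current directory,
# returning non-zero as soon as one archive fails to unpack.
extract_tarballs() {
    for archive in ./mecab-*.tar.gz; do
        # If the glob matched nothing, the literal pattern is left over; skip it.
        [ -e "$archive" ] || continue
        tar xzf "$archive" || return 1
    done
}
```

Calling extract_tarballs from inside the mecab directory should leave one source directory per archive next to the tarballs.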
We will install MeCab inside this directory, so we need a local directory to place it in.
mkdir local
Now, we configure, compile and install MeCab.
( cd mecab-0.98 && ./configure --prefix=`pwd`/../local --enable-utf8-only && make install clean )
Then do the same for the dictionary, with local/bin added to PATH so that configure can find mecab-config.
( cd mecab-ipadic-2.7.0-20070801 && env PATH="${PATH}:`pwd`/../local/bin" \
./configure --prefix=`pwd`/../local --with-charset=utf8 && make install clean )
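Before moving on, it can help to confirm that the install really landed in the local prefix rather than in /usr/local. The following is a minimal sketch; check_prefix is a hypothetical helper, and it assumes that the MeCab install step places the mecab and mecab-config executables under bin/ in the prefix.

```shell
# Hypothetical helper: report whether the expected executables exist under
# an install prefix. Assumes mecab and mecab-config are installed to bin/.
# Usage: check_prefix local
check_prefix() {
    prefix=$1
    rc=0
    for tool in mecab mecab-config; do
        if [ -x "$prefix/bin/$tool" ]; then
            echo "ok: $prefix/bin/$tool"
        else
            echo "missing: $prefix/bin/$tool"
            rc=1
        fi
    done
    return "$rc"
}
```

Run as check_prefix local from the mecab directory; a non-zero exit status suggests that one of the earlier configure or make steps did not complete.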
Do a quick test run with the MeCab binary.
echo '鴨かも?' | local/bin/mecab
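To see only the segmentation, the output of the command above can be filtered down to the surface forms. This is a sketch that assumes MeCab's default output format: one token per line as a surface form, a tab, and a comma-separated feature list, with a line reading EOS closing each sentence. The function name mecab_surfaces is hypothetical.

```shell
# Hypothetical helper: print one surface form per line from MeCab's default
# output read on stdin. Lines that are exactly "EOS" (sentence end) and lines
# without a tab-separated feature field are skipped.
# Usage: echo '鴨かも?' | local/bin/mecab | mecab_surfaces
mecab_surfaces() {
    awk -F '\t' '$0 != "EOS" && NF >= 2 { print $1 }'
}
```

If a custom output format has been configured, the filter would need to be adapted accordingly.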
Now we only have to build the Python SWIG bindings.
( cd mecab-python-0.98 && env PATH="${PATH}:`pwd`/../local/bin" \
python setup.py build_ext --inplace --rpath `pwd`/../local/lib )
We want to try out the bindings, but first we patch test.py, since it does not specify a source encoding.
sed -i -e '2i# -*- coding: utf-8 -*-' mecab-python-0.98/test.py
Then we are ready to go.
( cd mecab-python-0.98 && python test.py )
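To use the build from outside these directories, one option is to put local/bin on PATH and the in-place Python build on PYTHONPATH. The sketch below only prints the export lines; mecab_env is a hypothetical helper, and it assumes the directory layout created above (local/ and mecab-python-0.98/ side by side) and a root path without double quotes or dollar signs.

```shell
# Hypothetical helper: print export lines that make the local MeCab build
# usable from another shell. The argument is the directory that contains
# local/ and mecab-python-0.98/.
# Usage: eval "$(mecab_env /path/to/mecab)"
mecab_env() {
    root=$1
    # Put the locally installed binaries first on PATH.
    echo "export PATH=\"$root/local/bin:\$PATH\""
    # Prepend the in-place Python build, keeping any existing PYTHONPATH.
    echo "export PYTHONPATH=\"$root/mecab-python-0.98\${PYTHONPATH:+:\$PYTHONPATH}\""
}
```

Because the bindings were built with --rpath pointing at local/lib, no extra library path setting should be needed for the extension to find the MeCab shared library.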