Monday, August 10, 2009

japanese in mysql databases

grr... what is it today... getting nihongo (Japanese) working in MySQL

1) Verify that the issue is the database encoding; the output needs to read like this example:

mysql -u[username] -p[password] [database]
show variables like "%character%";show variables like "%collation%";
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

+----------------------+-----------------+
| Variable_name        | Value           |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database   | utf8_general_ci |
| collation_server     | utf8_general_ci |
+----------------------+-----------------+
3 rows in set (0.00 sec)

2) The fix:
sudo emacs /etc/mysql/my.cnf

Then add or edit these lines (in the correct sections):
[mysqld]
default-character-set=utf8

[mysql]
default-character-set=utf8

Then restart the MySQL server:
sudo /etc/init.d/mysql restart

Then recreate the database from scratch, or export all the data, recreate the database and import it again.
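A small Ruby sketch can do the step 1 check without eyeballing the tables. It assumes you feed it the batch-mode output of mysql -B -N -e "show variables like 'character_set%'; show variables like 'collation%'" (tab-separated name/value pairs, no header row); non_utf8_settings is a made-up helper name, not part of MySQL or any gem.

```ruby
# Flags MySQL charset/collation variables that are not UTF-8.
# Input: tab-separated "name<TAB>value" lines, e.g. from
#   mysql -B -N -e "show variables like 'character_set%'; show variables like 'collation%'"
# non_utf8_settings is a hypothetical helper, not a MySQL or gem API.
def non_utf8_settings(output)
  # These are expected to differ: filesystem names stay binary, the other is a path
  ignored = %w[character_set_filesystem character_sets_dir]
  bad = {}
  output.each_line do |line|
    name, value = line.chomp.split("\t", 2)
    next if name.nil? || value.nil? || ignored.include?(name)
    # utf8 charsets and utf8_* collations both start with "utf8"
    bad[name] = value unless value.start_with?("utf8")
  end
  bad
end
```

Usage: pass it the captured mysql output; an empty hash means every relevant setting is already utf8 / utf8_*.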

parsing a japanese file to create a word list

mecab japaneseIn.txt | egrep -v "^EOS" | egrep -v "記号|\w助詞" | sed "s/\t.*//" | sort | uniq -c | sort -n -r | grep -v "^\s*[0-9]*\s*[0-9a-zA-Z,./\\<>?_;:@{}^~。*%()\-]*$" > wordListOut.txt
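For reference, here is a rough Ruby take on the same word list. It is a sketch, not the author's own function: it assumes the default MeCab/IPA dictionary output ("surface<TAB>品詞,細分類1,..." lines, with "EOS" ending each sentence), drops all 記号 (symbols) and 助詞 (particles) rather than only the sub-categorised ones the \w助詞 pattern catches, and approximates the final grep -v by skipping ASCII-only tokens. word_list is a hypothetical name.

```ruby
# Counts surface forms in MeCab (IPA dictionary) output, skipping EOS lines,
# symbols (記号) and particles (助詞); returns [word, count], most frequent first.
# word_list is a hypothetical helper name.
def word_list(mecab_output)
  counts = Hash.new(0)
  mecab_output.each_line do |line|
    line = line.chomp
    next if line.empty? || line == "EOS"
    surface, features = line.split("\t", 2)
    next if features.nil?
    pos = features.split(",").first     # top-level part of speech
    next if pos == "記号" || pos == "助詞"
    # roughly what the last grep -v does: drop ASCII/punctuation-only tokens
    next if surface =~ /\A[[:ascii:]。]*\z/
    counts[surface] += 1
  end
  counts.sort_by { |word, n| [-n, word] }
end

# usage: ruby wordlist.rb japaneseIn.txt > wordListOut.txt   (needs mecab on the PATH)
if __FILE__ == $0 && ARGV[0]
  output = IO.popen(["mecab", ARGV[0]], &:read).force_encoding(Encoding::UTF_8)
  word_list(output).each { |word, n| puts "#{n.to_s.rjust(7)} #{word}" }
end
```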

Another variant, filtering out symbols (記号), numerals (名詞,数) and サ変接続 nouns (nouns that take する):
mecab Asic-wiki.txt | grep -v "記号,.*,\*,\*" | grep -v "名詞,数,\*,\*,\*,\*,\*" | grep -v "名詞,サ変接続,\*,\*,\*,\*,\*" | grep -v "^EOS"

verbs:
mecab Asic-wiki.txt | grep "動詞"

verb (dictionary form) occurrence count:
mecab Asic-wiki.txt | grep "動詞" | egrep "一段|五段" | sed "s/.*[一五]段[^,]*,[^,]*,\([^,]*\),.*/\1/" | sort | uniq -c | sort -n -r

verb conjugated-form occurrence count:
mecab Asic-wiki.txt | grep "動詞" | egrep "一段|五段" | sed "s/.*[一五]段[^,]*,\([^,]*\),.*/\1/" | sort | uniq -c | sort -n -r

Of course these are hacky one-liners; I have a Ruby function that parses the file and gives me the various results.

The critical column (the base/dictionary form, 原形) is given by this:
mecab Asic-wiki.txt | sed "s/[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,\([^,]*\).*/\1/"
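The verb counts can also be done in Ruby by splitting each MeCab line into named fields. This is only a sketch (not the author's function), assuming the IPA dictionary layout surface<TAB>品詞,細分類1,細分類2,細分類3,活用型,活用形,原形,読み,発音, where 原形 at index 6 is the column the sed command above cuts out. Unlike grep "動詞", it matches the part of speech exactly, so 助動詞 (auxiliary verbs) are not counted. FIELDS, parse_token and verb_counts are hypothetical names.

```ruby
# Feature names for IPA dictionary output; index 6 (:base) is 原形.
FIELDS = %i[pos sub1 sub2 sub3 conj_type conj_form base reading pron].freeze

# Splits one MeCab line into a hash; returns nil for "EOS" and blank lines.
def parse_token(line)
  surface, features = line.chomp.split("\t", 2)
  return nil if features.nil?
  { surface: surface }.merge(FIELDS.zip(features.split(",")).to_h)
end

# Counts 一段/五段 verbs by one field (:base or :conj_form), most frequent
# first, like the "sort | uniq -c | sort -n -r" pipelines above.
def verb_counts(mecab_output, field)
  counts = Hash.new(0)
  mecab_output.each_line do |line|
    t = parse_token(line)
    next unless t && t[:pos] == "動詞" && t[:conj_type].to_s =~ /[一五]段/
    counts[t[field]] += 1
  end
  counts.sort_by { |key, n| [-n, key.to_s] }
end
```

Usage: verb_counts(mecab_output, :base) mirrors the dictionary-form count and verb_counts(mecab_output, :conj_form) the conjugated-form count.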

rails "bugs": two belongs_to pointing to the same table

Two belongs_to associations pointing to the same table, and searching fails...

(rdb:1)  Customer.find(:all, :conditions => ["#{Address.table_name}.name_given = 'kdFJf6J6Mdx'"], :include => [:address, :shipping_address] )
[]
(rdb:1)  Customer.find(:all, :conditions => ["#{Address.table_name}.name_given = 'kdFJf6J6Mdx'"], :include => [:shipping_address, :address] )
[#]

Decide for yourself... (as far as I can tell, with :include the first association to the addresses table gets the plain table name in the generated SQL and the second gets an alias, so the condition only filters whichever association is listed first)
1) The problem can be fixed by using the :joins option to search both fields.
2) The problem can be exploited to eager-load the full set of child records linked via has_many while searching on a field in the children to locate the parents. Normally a search from the parent into the children limits the loaded children to only the matching ones... (which is a problem 90% of the time)

mecab - japanese parser

MeCab is an excellent parser for Japanese text. I'm using it to auto-generate a weighted flash card list from the relevant sections of Japanese newspapers... the hardest part of studying Japanese is figuring out which words and phrases are applicable to your field of expertise.

But MeCab is a pain in the butt to install on Ubuntu: the default output isn't UTF-8, which is a total pain... here are the fixed install instructions.

Download the main mecab package and the IPA dictionary (ipadic) from here:
http://sourceforge.net/projects/mecab/files/

Then run these commands... note that I had to monkey-fix the configure LDFLAGS for Ubuntu...

tar xvzf mecab-0.97.tar.gz mecab-0.97/
tar xzvf mecab-ipadic-2.7.0-20070801.tar.gz mecab-ipadic-2.7.0-20070801/
cd mecab-0.97/
LDFLAGS=-R/usr/local/lib ./configure  --with-charset=utf8
make
make check
sudo make install
cd ..
cd mecab-ipadic-2.7.0-20070801/
LDFLAGS=-R/usr/local/lib ./configure  --with-charset=utf8
make
sudo make install
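To confirm the install really produces UTF-8, a quick Ruby check can pipe a Japanese sentence through mecab and test whether the bytes that come back are valid UTF-8. This is a sketch assuming mecab is on the PATH; utf8_bytes? and mecab_utf8? are made-up helper names, and EUC-JP output will usually (not always) fail the check.

```ruby
require "open3"

# True if the raw bytes form valid UTF-8 (EUC-JP output usually won't).
def utf8_bytes?(bytes)
  bytes.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

# Runs mecab on a sample sentence and checks its output encoding.
# Raises Errno::ENOENT if mecab is not installed.
def mecab_utf8?(sample = "日本語の文章です。")
  out, status = Open3.capture2("mecab", stdin_data: sample, binmode: true)
  status.success? && utf8_bytes?(out)
end
```

Usage: puts mecab_utf8? should print true after the --with-charset=utf8 build.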

Check here for the Japanese version of the instructions and download links:
http://mecab.sourceforge.net/#install-unix

http://sourceforge.jp/projects/mecab/lists/archive/users/2007-January/000204.html

Thursday, August 6, 2009

ruby on rails - dumping the stack -hack..

maybe not the best... but I'm a semicon guy and my Rails stuff is for play, so sue me

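begin
  asd  # undefined on purpose: raises NameError just to get a backtrace
rescue => e
  logger.info ""
  # ret and shipping_flag come from the model method being debugged
  logger.info "skip_full_validation #{ret} -  #{self.shipping_flag}"
  # keep only this app's own frames ("meat-guy" is presumably the app directory)
  bt = e.backtrace.select { |l| l =~ /meat-guy/ }
  logger.info YAML::dump(bt)
end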
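A possibly less hacky alternative: Kernel#caller returns the current call stack as an array of "file:line:in ..." strings, so no exception has to be raised at all. This is a sketch; app_backtrace is a hypothetical helper and the default /meat-guy/ pattern is just the filter from the snippet above.

```ruby
# Returns the current call stack (excluding this method's own frame),
# keeping only frames whose text matches the pattern.
# app_backtrace is a hypothetical helper name.
def app_backtrace(pattern = /meat-guy/)
  caller.select { |frame| frame =~ pattern }
end
```

In the Rails case this would be used as logger.info YAML::dump(app_backtrace).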

passwordless shutdown of ubuntu

The correct way to edit the /etc/sudoers file is via "sudo visudo". Using "chmod u+w /etc/sudoers" is a no-no.

If you brick it, restart the computer with physical access, enter recovery mode, drop to the root prompt and fix the file... a reinstall is not required.

To get a passwordless shutdown, run:
sudo visudo 

Add this line and save to its default location:
%admin ALL=NOPASSWD: /sbin/shutdown

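Then, so that reboot also works without a password (note that the setuid bit lets any user run it as root):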
sudo chmod +s /sbin/reboot

disabling clear text password access to ssh

Disable Password Authentication

To disable password authentication, edit the sshd config:
sudo vi /etc/ssh/sshd_config

Find the PasswordAuthentication line (it may be commented out) and replace it with a line that looks like this:
#PasswordAuthentication yes
PasswordAuthentication no

sudo /etc/init.d/ssh restart

Make certain that you have at least one public SSH key in the relevant user's ~/.ssh/authorized_keys file (check this before the restart, or you can lock yourself out).
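Both checks can be scripted before restarting sshd. This is a sketch with made-up helper names; it leans on sshd_config(5), where keywords are case-insensitive, the first value found for a keyword wins, and lines after a "Match" only apply conditionally, so it stops at the first Match block. It assumes whitespace-separated "Keyword value" lines (not the "Keyword=value" form).

```ruby
# True if password logins would still be allowed by this sshd_config text.
# password_auth_enabled? is a hypothetical helper name.
def password_auth_enabled?(sshd_config)
  sshd_config.each_line do |line|
    line = line.strip
    next if line.empty? || line.start_with?("#")
    key, value = line.split(/\s+/, 2)
    break if key.casecmp?("Match")          # conditional settings start here
    if key.casecmp?("PasswordAuthentication")
      return value.to_s.strip.casecmp?("yes") # first value found wins
    end
  end
  true # OpenSSH's default for PasswordAuthentication is "yes"
end

# Counts non-empty, non-comment lines (one key per line) in authorized_keys.
# authorized_key_count is a hypothetical helper name.
def authorized_key_count(text)
  text.each_line.count do |l|
    l = l.strip
    !l.empty? && !l.start_with?("#")
  end
end
```

Usage: password_auth_enabled?(File.read("/etc/ssh/sshd_config")) should be false, and authorized_key_count(File.read(File.expand_path("~/.ssh/authorized_keys"))) at least 1, before running the restart.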

Wednesday, July 22, 2009

notes for a hard-disk-less laptop as a server

I was given an old T40 (its hard disk had died). I have set it up as a lightweight, low-power server running off an 8 GB USB key as its HDD. Here's how:

Old laptops (in this case an IBM T40) don't have the "pae" feature needed for an Ubuntu Server 9.04 install, so you need to install the desktop version.

Download the ISO of the i386 Ubuntu desktop LiveCD to use as a bootable install (I don't use CDs or DVDs anymore):
http://www.ubuntu.com/GetUbuntu/download

Forget the CD BS... total waste... get around it: grab a 1 GB USB key and UNetbootin the ISO onto it:
http://unetbootin.sourceforge.net/

Now to kill the junk:
sudo aptitude

Select the GNOME and X11 pieces and hit the "-" key to mark them for deletion, hit "!" to fix any mistakes it makes, and then "g" and "g" again to get the removal running.

You might need to repeat this a few times to get out of dependency hell and rip it all out. For some of the last steps you may need to pull things out piece by piece and examine the dependencies to make certain that it's not going to destabilise the system.

Also, if you take out "network-manager" you'll lose networking on the box... I was SSHed in at the time and noticed very quickly. To fix it, get a physical login up and do:
sudo vi /etc/network/interfaces

then add the lines:
auto eth0
iface eth0 inet dhcp

Next, to automatically wipe all the libs and junk that are left hanging around from GNOME, do this:
sudo apt-get install deborphan 
sudo aptitude purge `deborphan --guess-all`

https://help.ubuntu.com/community/SwapFaq
(temp) mounting swap and tmp into RAM to reduce the USB read/write stress
sudo mkswap -f /dev/ram15
sudo swapon /dev/ram15
sudo swapoff /dev/sda5
cat /proc/swaps
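If you'd rather check that from a script than read cat /proc/swaps by eye, a short Ruby sketch can parse it: the first line is a header and each following line is filename, type, size, used, priority (sizes in KiB). SwapEntry and parse_proc_swaps are hypothetical names.

```ruby
# One row of /proc/swaps; sizes are in KiB.
SwapEntry = Struct.new(:device, :type, :size_kb, :used_kb, :priority)

# Parses /proc/swaps text into SwapEntry structs, skipping the header line.
# parse_proc_swaps is a hypothetical helper name.
def parse_proc_swaps(text)
  text.lines.drop(1).map do |line|
    dev, type, size, used, prio = line.split
    next nil if dev.nil?                   # ignore blank lines
    SwapEntry.new(dev, type, size.to_i, used.to_i, prio.to_i)
  end.compact
end
```

Usage: parse_proc_swaps(File.read("/proc/swaps")).map(&:device) should list the RAM disk and no longer list /dev/sda5.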

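(perm) mounting swap in RAM: edit /etc/fstab and add the line below (the RAM disk is blank after every boot, so it also needs a mkswap before it can be swapped on):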
/dev/ram0 none swap sw 0 0

If you mount swap into RAM, YOU MUST disable hibernation: hibernation writes the RAM contents to swap, and without power the RAM data is lost.

(temp) mounting tmp into RAM to reduce thrashing
....

Sunday, May 3, 2009

Ubuntu virtual screen size

Clearly, since I'm a workaholic, I have built my own home servers. I run Ubuntu 8.04 (Hardy), basically because it's the easiest Linux flavour to admin. My high-power systems are normally screenless and keyboardless, so I manage them like every other sysadmin does: with ssh -X and the built-in VNC (Vino). The catch is that at some point they normally get set up using a cruddy old LCD that I have lying around, and this results in a screen-size crunch when I VNC to them...

You don't want to change the real size of the display to cater for the bigger VNC size, because if you brick the box you might need physical access, which means dusting off that old cruddy LCD. So you make the virtual screen size bigger instead; here is the hack...

$ cd /etc/X11
$ sudo cp xorg.conf xorg.conf.YYYYMMDD
$ sudo vi xorg.conf


Why use the date as YYYYMMDD? Basically, it's sortable... forget the crazy US date format, this is a computer. "ls -1" a directory with > 1000 files and you'll get the point...
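A tiny Ruby demonstration of the point: plain string sorting (which is what a listing by name gives you) puts YYYYMMDD names in date order, while US-style MMDDYYYY names come out scrambled across years. The file names and dates are made-up examples, and backup_name is a hypothetical helper.

```ruby
# Builds a backup file name with a sortable YYYYMMDD suffix.
def backup_name(base, time)
  "#{base}.#{time.strftime('%Y%m%d')}"
end

dates = [Time.new(2008, 12, 1), Time.new(2009, 5, 3), Time.new(2009, 1, 15)]
iso = dates.map { |t| backup_name("xorg.conf", t) }.sort
us  = dates.map { |t| "xorg.conf.#{t.strftime('%m%d%Y')}" }.sort

puts iso  # 20081201, 20090115, 20090503: oldest to newest
puts us   # 01152009, 05032009, 12012008: the 2008 file sorts last
```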

Now locate the "Screen" section and add the Virtual line... or the entire "Display" SubSection if there isn't one. If you need to add the "Display" SubSection, double-check the existing display mode from System > Preferences > Screen Resolution.

Section "Screen"
  Identifier      "Default Screen"
  Monitor         "Configured Monitor"
  Device          "Configured Video Device"
  SubSection "Display"
    Depth   24
    Modes   "960x600"
    Virtual 1280 1024
  EndSubSection
EndSection