100C
Joined: 09 Dec 2003 Posts: 34 Location: Japan
Posted: Sun Apr 18, 2004 3:25 pm Post subject: Programs that convert Kanji+Kana to Romaji
You have probably tried translating English into Japanese with a machine translation service or translation software. The output, however, is a sentence in Kanji+Kana, so to find out how it is pronounced you have to look words up in a dictionary over and over.
I found that there are programs that automate this.
One of them is a shareware browser. The other is free software that requires a little knowledge of DOS usage.
1. Browser
HIRAGANA Navi.V3 http://kids.knowledgewing.com/
It was originally developed for schoolchildren who are learning Kanji, and it can convert Kanji into Kana. It can also convert text into the Roman alphabet.
It is easy to use, but you have to pay 4,000 yen to keep using it after the 20-day trial period.
2. Free tool
KAKASI - Kanji Kana Simple Inverter http://kakasi.namazu.org/index.html.en
KAKASI is a language processing filter that converts Kanji characters to Hiragana, Katakana or Romaji, and it may help you read Japanese documents.
You run it from the command prompt (DOS window).
ex.
c:\kakasi\bin>kakasi -s -U -Ja -Ha -Ka < kanji.txt > romaji.txt
Input(kanji.txt):
KAKASIは漢字かな混じり文をひらがなやローマ字に変換することを目的として
Output(romaji.txt):
KAKASI ha KANJI kana MAJI ri BUN wohiraganaya ro^ma JI ni HENKAN surukotowo MOKUTEKI toshite
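For anyone who would rather script the command above than type it each time, here is a minimal Python sketch. It only wraps the exact flags shown in this post. The function names, the default executable name and the assumption that the input file is already saved in Shift-JIS are mine, not part of KAKASI.

```python
import subprocess
from pathlib import Path

# Flags copied from the command line shown in the post above.
KAKASI_FLAGS = ["-s", "-U", "-Ja", "-Ha", "-Ka"]


def build_command(exe="kakasi"):
    """Return the argument list for the kakasi call (exe path is an assumption)."""
    return [exe, *KAKASI_FLAGS]


def romanize_file(src, dst, exe="kakasi", encoding="shift_jis"):
    """Pipe the bytes of src through kakasi and write its output to dst.

    Assumes src is already saved in an encoding kakasi accepts
    (Shift-JIS here, which is an assumption for the Windows build).
    """
    data = Path(src).read_bytes()
    result = subprocess.run(
        build_command(exe),
        input=data,               # same as "< kanji.txt"
        stdout=subprocess.PIPE,   # same as "> romaji.txt"
        check=True,               # raise if kakasi exits with an error
    )
    Path(dst).write_bytes(result.stdout)
    return result.stdout.decode(encoding, errors="replace")
```

With the layout used in this thread, a call would look like romanize_file("kanji.txt", "romaji.txt", exe=r"c:\kakasi\bin\kakasi.exe").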
Wyckd
Joined: 12 Apr 2004 Posts: 41 Location: United States of Los Angeles
Azumi
Joined: 15 Apr 2004 Posts: 122
Posted: Sat May 01, 2004 10:40 pm
100C
Joined: 09 Dec 2003 Posts: 34 Location: Japan
Posted: Sun May 02, 2004 2:24 pm
You're welcome.
There is also a web version of it (it can convert text and web pages):
http://www.j-talk.com/nihongo/
・Kanji to spaced Hiragana
・Readings in brackets
・Kanji to Romaji
・Detailed word info (not for websites)
・Kanji with rollover Hiragana, and translation in a pop-up!
Azumi
Joined: 15 Apr 2004 Posts: 122
Posted: Sat Dec 04, 2004 5:10 am
There is no installer. Just extract it to C:\Kakasi\, which is the default path. To run it, launch the application in c:\kakasi\bin\ from a command window.
You can also create a text file in c:\kakasi\bin, paste in this text:
kakasi -s -U -Ja -Ha -Ka < kanji.txt > romaji.txt
and save the file with a .bat extension instead of .txt.
Then you just need to double-click the .bat file instead of typing the long command line every time.
I'm not sure if I'm clear enough...
sunilkkn
Joined: 07 Dec 2004 Posts: 6 Location: tvm
Posted: Tue Dec 07, 2004 7:01 pm Post subject: Programs that convert Kanji+Kana to Romaji
I have visited many sites looking for the Kakasi exe, but I couldn't find it. Also, when I extracted the source archive, no bin folder was created. Could you send me the bin folder so that I can try Kakasi? I also could not build the exe from the source files; it does not compile successfully.
Could you also tell me how to configure kakasi so that I can run the source I extracted?
I heard that Kakasi itself tokenizes the kanji before converting it into romaji. Can you explain that?
Are there any APIs available for Kakasi? I also need the Kakasi exe very urgently. Can you send the exe or tell me where to get it? I am using Windows, and Windows binaries are not available at http://kakasi.namazu.org/.
Also, are there any programs for converting hiragana and katakana to romaji?
Thanks in advance,
Sunil
groink
Joined: 01 Jan 1970 Posts: 1223
Posted: Wed Dec 08, 2004 3:12 am Post subject: Re: Programs that convert Kanji+Kana to Romaji
sunilkkn wrote: | I have visited many sites looking for the Kakasi exe, but I couldn't find it. Also, when I extracted the source archive, no bin folder was created. Could you send me the bin folder so that I can try Kakasi? I also could not build the exe from the source files; it does not compile successfully. |
Dude, you've got to put a little more effort into it...
http://www.namazu.org/win32/kakasi-2.3.4.zip
As for an API... There is a perl API being designed here:
http://search.cpan.org/dist/Text-Kakasi/
You can then run the Perl API behind a CGI script and use a web page as the user interface.
I'm going a similar route... I'm writing a BBcode MOD for phpBB (JDorama.com and d-addicts.com both use phpBB) that calls kakasi. Anyone will then be able to use it in a post. It'll look something like this in the phpBB editor:
Code: | This is a test:
[kakasi]KAKASI は、漢字かなまじり文をひらがな文やローマ字文に変換することを目的として作成したプログラムと辞書の総称です。[/kakasi] |
--- groink
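The tag handling described above could look roughly like this. It is a Python sketch and not groink's actual MOD (phpBB itself is written in PHP). The `romanize` callback is a hypothetical stand-in for whatever ends up calling kakasi.

```python
import re

# Matches [kakasi]...[/kakasi]; non-greedy so several tags in one post work.
KAKASI_TAG = re.compile(r"\[kakasi\](.*?)\[/kakasi\]", re.DOTALL | re.IGNORECASE)


def render_kakasi_tags(post, romanize):
    """Replace each [kakasi] block in post with romanize(<text inside the tag>).

    romanize is any callable taking and returning a string (hypothetical;
    in practice it would hand the text to kakasi).
    """
    return KAKASI_TAG.sub(lambda match: romanize(match.group(1)), post)
```

Text outside the tags is left untouched, so the rest of the post renders as usual.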
sunilkkn
Joined: 07 Dec 2004 Posts: 6 Location: tvm
Posted: Wed Dec 08, 2004 3:04 pm Post subject: Programs that convert Kanji+Kana to Romaji
Thanks, groink. I got the kakasi executable and it is running. The zip also includes the libraries.
Kakasi works fine. I hope I can use it for my requirements, which are to convert kanji and kana to romaji and to tokenize Japanese strings.
When I visited http://kakasi.namazu.org/, I got the kakasi source code in C. In that source code I found many main functions.
To run Kakasi, do we need only kakasi.c and kakasi.h? I am asking about C because I don't know Perl. Do you know how to build the kakasi executable from the C source?
Thanks for the kakasi executable.
Sunil
sunilkkn
Joined: 07 Dec 2004 Posts: 6 Location: tvm
100C
Joined: 09 Dec 2003 Posts: 34 Location: Japan
groink
Joined: 01 Jan 1970 Posts: 1223
Posted: Thu Dec 09, 2004 3:40 am
100C wrote: | You may have used the Kanji code "UTF-8".
kakasi only supports old JIS, new JIS, EUC, DEC and Shift-JIS,
not Unicode (UTF-8, UTF-8N, UTF-16, etc.).
So you have to save the input text file in one of these supported Kanji codes. |
That's exactly correct. Some text editors, notepad.exe in particular, may save files as UTF-8.
Also, if you download web pages to text files, make sure that the encoding used to render the page is Shift-JIS or similar. In Internet Explorer, you can force this encoding on a particular page by selecting View -> Encoding -> Japanese (Shift-JIS). Once the page has re-rendered, you can save it as a text file.
--- groink
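If the input was already saved as UTF-8, it can be re-encoded before being handed to kakasi. A minimal Python sketch, assuming the source file is UTF-8 (with or without a BOM) and that Shift-JIS is the target. The function names are only illustrative.

```python
from pathlib import Path


def transcode(data, src_encoding="utf-8-sig", dst_encoding="shift_jis"):
    """Re-encode raw bytes from src_encoding to dst_encoding.

    "utf-8-sig" also strips a leading byte order mark if one is present.
    Raises UnicodeEncodeError when a character has no Shift-JIS mapping,
    which is safer than silently writing a wrong character.
    """
    return data.decode(src_encoding).encode(dst_encoding)


def convert_file(src, dst, src_encoding="utf-8-sig", dst_encoding="shift_jis"):
    """Read src as bytes, transcode it, and write the result to dst."""
    Path(dst).write_bytes(transcode(Path(src).read_bytes(), src_encoding, dst_encoding))
```

Working on bytes rather than text mode keeps the original line endings as they are.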
Azumi
Joined: 15 Apr 2004 Posts: 122
Posted: Thu Dec 09, 2004 10:47 am
sunilkkn
Joined: 07 Dec 2004 Posts: 6 Location: tvm
groink
Joined: 01 Jan 1970 Posts: 1223
Posted: Fri Dec 10, 2004 4:14 am Post subject: Re: Tokenize japanese text
sunilkkn wrote: | Yes, thanks for your comments. I had in fact done exactly that: I saved kanji.txt in UTF-8 format, which is why I got wrong output. Now I have tested all the kanji texts with the kakasi front end and got correct results. Does anybody know how to make the API (C and header files) runnable? |
I'd first off study the perl interface I posted earlier. See how that interface communicates with kakasi, then possibly reverse engineer the code and re-work it in C++. My C++ is pretty weak seeing I haven't worked with that language in eons, so I can't help you there.
--- groink
sunilkkn
Joined: 07 Dec 2004 Posts: 6 Location: tvm
100C
Joined: 09 Dec 2003 Posts: 34 Location: Japan
Posted: Wed Dec 15, 2004 1:26 am
I do not really understand what the word "tokenizer" means. However, I can read Japanese.
(It is hard for me to read English.)
Juman reads from standard input and writes to standard output.
Of the various kanji codes, it understands only SJIS.
When I ran it as follows, it worked correctly.
Code: | C:\Program Files\juman>juman -b
脳動脈の閉塞
脳 (のう) 脳 普通名詞
動脈 (どうみゃく) 動脈 普通名詞
の (の) の 接続助詞
閉塞 (へいそく) 閉塞 サ変名詞
EOS
^Z
C:\Program Files\juman>juman -b < input.txt
脳 (のう) 脳 普通名詞
動脈 (どうみゃく) 動脈 普通名詞
の (の) の 接続助詞
閉塞 (へいそく) 閉塞 サ変名詞
EOS
C:\Program Files\juman>juman -b < input.txt > output.txt
C:\Program Files\juman>type output.txt
脳 (のう) 脳 普通名詞
動脈 (どうみゃく) 動脈 普通名詞
の (の) の 接続助詞
閉塞 (へいそく) 閉塞 サ変名詞
EOS
C:\Program Files\juman>
|
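The output above is easy to split into tokens programmatically. This is a minimal Python sketch that parses only the four columns visible in this transcript (surface form, reading in parentheses, base form, part of speech). The names are hypothetical, and Juman may print more fields when run with other options.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    surface: str  # the word as written in the input
    reading: str  # the kana reading shown in parentheses
    base: str     # dictionary (base) form
    pos: str      # part-of-speech label


# One token per line: surface (reading) base pos
LINE_RE = re.compile(r"^(\S+)\s+\((\S*)\)\s+(\S+)\s+(\S+)")


def parse_juman(output):
    """Split Juman "-b" style output into a list of sentences of Tokens.

    A line containing only "EOS" ends a sentence; lines that do not look
    like token lines (prompts, echoed input, ^Z) are ignored.
    """
    sentences, current = [], []
    for raw in output.splitlines():
        line = raw.strip()
        if line == "EOS":
            sentences.append(current)
            current = []
            continue
        match = LINE_RE.match(line)
        if match:
            current.append(Token(*match.groups()))
    if current:  # output that ended without a final EOS
        sentences.append(current)
    return sentences
```

Joining the `reading` fields of a sentence gives its kana reading, which could then be passed on to a kana-to-romaji step.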