Home-made Japanese-English Dictionary for Kobo

Hi everyone,

here is a script that lets you enrich the Japanese dictionary shipped with the Kobo firmware with English translations from Edict (and/or Japanese3).

I was using tshering's excellent dictionary, but the new firmware 3.16.10 brought a much extended Japanese-Japanese dictionary, so I decided to write a script that enriches this dictionary with English definitions.

The current status with the Japanese-Japanese dictionary from 3.16.10 is that, out of 805521 entries in total, 274464 are translated (Edict was used for 265953 of them, and Japanese3 for 8511).

The following is from the blog entry Japanese-English dictionary for Kobo.

The most current version is available via github here: Github project kobo-ja-dict-enhance, and the direct link to the script: enhance-dictionary.pl

Prerequisites

The script is neither fool-proof, nor does it do everything by itself. Furthermore, it requires a set of programs. In detail:

Unix/Linux computer: I haven't tried any of this on a Windows machine, but I am happy to receive feedback. I am working on a version that does not depend on external programs, and thus might be much more portable.
Dictionaries: A copy of the Edict dictionary; see below for details on dictionaries.
7z: A standard zip/unzip program that also takes the locale into account when unpacking (unlike the unzip that I have access to).
Perl modules: Various Perl modules that should be standard on most installations: Getopt::Long, File::Temp, File::Basename, and Cwd.

Supported dictionaries

At the moment the program can use the following two dictionaries as sources: Edict2 and Japanese3.

Edict2

The Edict dictionary is a free dictionary which is the base of most other dictionaries. Created by the JMdict/EDICT Project, it provides a very complete Japanese-English dictionary.

To use the dictionary with the current program, one needs to download the edict2.gz file and unpack it with gunzip edict2.gz. If you put it into the same directory as the Perl script, nothing else needs to be done; otherwise, use the command line option --edict PATH-TO-EDICT to specify the location of the edict2 file.
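The unpacking step can be wrapped in a small shell helper. This is only a sketch: the function name prepare_edict and its directory argument are made up for illustration, and it uses gzip -dc instead of gunzip so that the downloaded edict2.gz is kept.

```shell
# prepare_edict [DIR]: make sure DIR/edict2 exists, unpacking DIR/edict2.gz
# if needed, and print the path of the unpacked file.
# The function name is hypothetical, not part of the script.
prepare_edict() {
    edict_dir=${1:-.}
    if [ ! -f "$edict_dir/edict2" ]; then
        # gzip -dc writes to stdout, so the original edict2.gz stays in place
        gzip -dc "$edict_dir/edict2.gz" > "$edict_dir/edict2" || {
            rm -f "$edict_dir/edict2"   # do not leave a broken file behind
            return 1
        }
    fi
    printf '%s\n' "$edict_dir/edict2"
}
```

If the file lives somewhere other than next to the script, the printed path can be handed to --edict, for example: perl enhance-dictionary.pl --edict "$(prepare_edict "$HOME/dicts")" (the directory is just an example).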

Japanese3

Japanese3 is a dictionary application for iOS which provides another very complete Japanese-English dictionary. My feeling is that 90% or more of it is based on Edict, so adding it will not buy you much, but still a bit (see below for how much!).

If you have purchased this dictionary/application, and you manage to get access to your iOS device (via jailbreaking or some other tools), then you need to get the file Japanese3.db from the application folder, and then generate a file via sqlite3 as follows:

Code:

$ sqlite3 Japanese3.db
.output japanese3-data
select Entry, Furigana, Summary from entries;

If you save the generated file japanese3-data into the same directory as the Perl script, nothing else needs to be done, otherwise one can use the command line option --japanese3 PATH-TO-J3-DATA to specify the location of the Japanese3 data file.
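The interactive sqlite3 session can also be run in one go. The sketch below passes the same query as an argument to sqlite3 and redirects the output; with sqlite3's default output settings this should give the same file, but I have only reasoned this through, not compared the results. The function name export_japanese3 is made up.

```shell
# export_japanese3 DB OUT: dump the three columns the script needs into OUT.
# The function name is hypothetical; the table and column names are the
# ones from the interactive session.
export_japanese3() {
    # sqlite3 runs the SQL given as its second argument and prints the rows
    # to stdout, one per line, with columns separated by "|" by default
    sqlite3 "$1" 'select Entry, Furigana, Summary from entries;' > "$2"
}
```

Usage: export_japanese3 Japanese3.db japanese3-data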

Command line options

The program supports the following command line options (but this is due to change when new features are added!)

Code:

  -h, --help            Print this message and exit.
  -v, --version         Print version and exit.
  --info=STR            Print info found on STR in dictionaries and exit.
  -i, --input=STR       location of the original Kobo GloHD dict
                          default: dicthtml-jaxxdjs.zip
  -o, --output=STR      name of the output file
                          default: dicthtml-jaxxdjs-TIMESTAMP.zip
  --dicts=STR           dictionaries to be used, can be given multiple times
                        possible values are 'edict2' and 'japanese3'
                        The order determines the priority of the dictionary.
                        If *not* given, all found dictionaries are used.
  -e, --edict=STR       location of the edict2 file
                          default: edict2
  -j, --japanese3=STR   location of the japanese3 file
                          default: japanese3-data
  --keep-input          keep the unpacked directory
  --keep-output         keep the updated output directory
  -u, --unpacked=STR    location of an already unpacked original dictionary
  --unpackedzipped=STR  location of an already unpacked original dictionary
                        where the html files are already un-gzipped
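As an illustration of the --dicts option, the following sketch prefers Japanese3 over Edict2 by naming it first. The wrapper function name is made up, and the sketch assumes that the script and both data files sit in the current directory.

```shell
# enhance_prefer_japanese3 [OPTIONS...]: run the script with Japanese3
# taking priority over Edict2. The function name is hypothetical.
enhance_prefer_japanese3() {
    # --dicts is given twice; per the help text the order sets the priority.
    # "$@" passes any further options (for example -i or -o) through unchanged.
    perl enhance-dictionary.pl --dicts=japanese3 --dicts=edict2 "$@"
}
```

Usage: enhance_prefer_japanese3 -o dicthtml-jaxxdjs-j3first.zip (the output name is only an example).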

Note that if you pass the option --unpacked, the files must be properly named (encodings are a problem!). Furthermore, note that the unpacked directory contains lots of .gif and .html files that are actually gzip-compressed, but lack the .gz extension. If you have already un-gzipped the html files, you can use --unpackedzipped.
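Because the compressed files carry no .gz extension, gunzip refuses to touch them when called on the file name. One possible way to prepare a directory for --unpackedzipped is sketched below. The function name is hypothetical, and the sketch only handles the .html files directly inside the given directory.

```shell
# ungzip_html DIR: decompress in place every gzip-compressed *.html file
# in DIR. Files that are not gzip data are left untouched.
# The function name is hypothetical, not part of the script.
ungzip_html() {
    for html_file in "$1"/*.html; do
        [ -f "$html_file" ] || continue
        # Reading via stdin avoids gzip's check for a .gz suffix
        if gzip -t < "$html_file" 2>/dev/null; then
            gzip -dc < "$html_file" > "$html_file.tmp" &&
                mv "$html_file.tmp" "$html_file"
        fi
    done
}
```

Usage: ungzip_html PATH-TO-UNPACKED-DICT, followed by a run of the script with --unpackedzipped pointing at the same directory.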

If you want to unpack the dictionary yourself, be advised that the file names in the Zip archive are already encoded in UTF-8, but normal programs (unzip, 7z) assume that they are in some other encoding. Thus, if you are using a UTF-8 locale, it is necessary to set LC_CTYPE to C so that the file names are not converted, as in

Code:

LC_CTYPE=C 7z x ...

Typical run

In the following example we use both the Edict2 and Japanese3 dictionaries, and prefer Edict2 (which is the default). When no --dicts option is passed, the program searches for both dictionaries, which are found in this case.

Code:

$ perl enhance-dictionary.pl
Using the following dictionaries as source for translations: edict2 japanese3
loading edict2 ... done
loading Japanese3 data ... done
unpacking original dictionary ... done
loading dict files ... done
searching for words and updating ... done
total words 805521, matches: 274464 (edict: 265953, japanese3: 8511)
creating output html ... done
creating update dictionary in dicthtml-jaxxdjs-201508072201.zip ... done
$

As you can see, quite a lot of words have received a translation.
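To put the numbers from this run into perspective, the shares can be computed with a few lines of shell. The helper name pct is made up; the figures are the ones printed by the script.

```shell
# pct PART TOTAL: print PART as a percentage of TOTAL with one decimal place.
# The function name is hypothetical.
pct() {
    # LC_ALL=C keeps the decimal point independent of the user's locale
    LC_ALL=C awk -v part="$1" -v total="$2" \
        'BEGIN { printf "%.1f\n", part * 100 / total }'
}

pct 274464 805521   # share of all entries that received a translation
pct 265953 274464   # share of the translations taken from Edict2
pct 8511 274464     # share of the translations taken from Japanese3
```

This prints 34.1, 96.9 and 3.1: roughly a third of all entries now carry a translation, and Japanese3 contributes only about 3% of them.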

Installing the dictionary

After having created the enhanced dictionary, one can install the generated file as KOBO/.kobo/dicts/dicthtml-jaxxdjs.zip (where KOBO is the mount point of the eReader). The dictionary should be picked up automatically. And the next lookup should give you something like the following:

There is only one caveat: Syncing with Kobo will re-download the original dictionary and overwrite the enhanced one. There is at least one solution, which I am employing; see this post.
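Since a sync may overwrite the file, it can be handy to script the copy step so that it is quick to repeat. A minimal sketch follows; the function name is made up, and the mount point in the usage line is only an example.

```shell
# install_kobo_dict ZIP MOUNT: copy the enhanced dictionary ZIP onto the
# reader mounted at MOUNT. The function name is hypothetical.
install_kobo_dict() {
    dicts_dir="$2/.kobo/dicts"
    # Refuse to copy if the reader is not mounted where we think it is
    [ -d "$dicts_dir" ] || {
        echo "no such directory: $dicts_dir" >&2
        return 1
    }
    # The reader expects the stock file name, not the timestamped one
    cp "$1" "$dicts_dir/dicthtml-jaxxdjs.zip"
}
```

Usage: install_kobo_dict dicthtml-jaxxdjs-201508072201.zip /media/KOBOeReader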

Future plans

I am planning to get rid of the 7z dependency and use Archive::Zip for all the unpacking and packing. This would also allow doing everything in memory and thus make it faster.

Conclusions

Even if Kobo does not provide us with a decent Japanese-English dictionary, adding a large number of translations to the current dictionary is now easily possible. For serious students of Japanese who are reading or starting to read Japanese eBooks, this will be of great help.

Enjoy, and don't forget to give feedback, suggestions, and improvements either here or, better, via the Github page.
