Hi everyone,
Here is a script that lets anyone enrich the Japanese dictionary shipped with Kobo firmware with English translations from Edict (and/or Japanese3).
I was using tshering's excellent dictionary, but the new firmware 3.16.10 brought a much extended Japanese-Japanese dictionary, so I decided to write a script that enriches this dictionary with English definitions.
Current status with the Japanese-Japanese dictionary from 3.16.10: of the 805521 entries in total, 274464 (roughly 34%) are translated (Edict was used for 265953 and Japanese3 for 8511).
The following is from the blog entry Japanese-English dictionary for Kobo.
The most current version is available on GitHub: the project is kobo-ja-dict-enhance, and the direct link to the script is enhance-dictionary.pl
Prerequisites
The script is neither foolproof nor does it do everything by itself. Furthermore, it requires a set of programs. In detail:
- Unix/Linux computer: I haven't tried any of this on a Windows machine, but I would be happy to get feedback. I am working on a version that does not depend on external programs and thus might be much more portable.
- Dictionaries: A copy of the Edict dictionary; see below for details on dictionaries.
- 7z: A standard zip/unzip program that also takes the locale into account when unpacking (unlike the unzip that I have access to).
- Perl modules: Various Perl modules that should be standard on most installations: Getopt::Long, File::Temp, File::Basename, and Cwd.
Supported dictionaries
At the moment the program can use the following two dictionaries as sources: Edict2 and Japanese3.
Edict2
The Edict dictionary is a free dictionary which is the basis of most other dictionaries. Created by the JMdict/EDICT Project, it provides a very complete Japanese-English dictionary.
To use the dictionary with the current program, you need to download the edict2.gz file and unpack it with gunzip edict2.gz. If you put it into the same directory as the Perl script, nothing else needs to be done; otherwise you can use the command line option --edict PATH-TO-EDICT to specify the location of the edict2 file.
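To get a feel for what the unpacked edict2 file contains, here is a minimal Python sketch that parses one line of it. This is not how enhance-dictionary.pl reads the file (the script is Perl and I am not reproducing its code); it only assumes the published EDICT2 line layout, "KANJI-1;KANJI-2 [KANA-1;KANA-2] /gloss/gloss/.../EntL.../", and the EUC-JP encoding of the distributed file. The names parse_edict2_line and load_edict2 are made up for this illustration.

```python
import re

# One EDICT2 line looks like:
#   KANJI-1;KANJI-2 [KANA-1;KANA-2] /gloss 1/gloss 2/.../EntLnnnnnnnX/
# Kana-only entries have no [...] part.
LINE_RE = re.compile(
    r"^(?P<heads>\S+)\s+(?:\[(?P<readings>[^\]]*)\]\s+)?/(?P<glosses>.*)/\s*$"
)
TAG_RE = re.compile(r"\([^)]*\)")  # markers such as (P) or (iK) glued to a headword


def parse_edict2_line(line):
    """Return (headwords, readings, glosses), or None if the line is not an entry."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    heads = [TAG_RE.sub("", h) for h in m.group("heads").split(";")]
    readings = [TAG_RE.sub("", r) for r in (m.group("readings") or "").split(";") if r]
    # Drop empty fields and the trailing EntL... sequence number.
    glosses = [g for g in m.group("glosses").split("/") if g and not g.startswith("EntL")]
    return heads, readings, glosses


def load_edict2(path="edict2"):
    """Map every headword and reading to its glosses (first entry wins)."""
    table = {}
    with open(path, encoding="euc-jp", errors="replace") as fh:
        for line in fh:
            parsed = parse_edict2_line(line.rstrip("\n"))
            if parsed is None:
                continue
            heads, readings, glosses = parsed
            for key in heads + readings:
                table.setdefault(key, glosses)
    return table
```

Calling load_edict2("edict2") in the directory where you ran gunzip gives a plain dict you can poke at to check whether a word you care about is covered.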
Japanese3
Japanese3 is a dictionary application for iOS which provides another very complete Japanese-English dictionary. My feeling is that it is 90% or more based on Edict, so adding it will not buy you much, but still a bit (see below for how much!).
If you have purchased this dictionary/application, and you manage to get access to your iOS device (via jailbreaking or some other tools), then you need to get the file Japanese3.db from the application folder, and then generate a file via sqlite3 as follows:
Code:
$ sqlite3 Japanese3.db
.output japanese3-data
select Entry, Furigana, Summary from entries;
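If you do not have the sqlite3 command line shell at hand, the same export can be sketched with Python's built-in sqlite3 module. This is only an illustration: the function name export_japanese3 is made up, and the only thing assumed about Japanese3.db is what the query above already shows (a table entries with columns Entry, Furigana and Summary). The output imitates the shell's default list mode, i.e. one row per line with | between the columns and NULL written as an empty field.

```python
import sqlite3


def export_japanese3(db_path, out_path="japanese3-data"):
    """Write Entry|Furigana|Summary lines and return the number of rows written."""
    conn = sqlite3.connect(db_path)
    count = 0
    try:
        rows = conn.execute("SELECT Entry, Furigana, Summary FROM entries")
        with open(out_path, "w", encoding="utf-8") as out:
            for row in rows:
                # NULL columns become empty fields, as in the sqlite3 shell.
                out.write("|".join("" if v is None else str(v) for v in row) + "\n")
                count += 1
    finally:
        conn.close()
    return count
```

Usage would be export_japanese3("Japanese3.db"), which leaves japanese3-data in the current directory.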
Command line options
The program supports the following command line options (but these are likely to change as new features are added!):
Code:
-h, --help              Print this message and exit.
-v, --version           Print version and exit.
--info=STR              Print info found on STR in dictionaries and exit.
-i, --input=STR         location of the original Kobo GloHD dict
                        default: dicthtml-jaxxdjs.zip
-o, --output=STR        name of the output file
                        default: dicthtml-jaxxdjs-TIMESTAMP.zip
--dicts=STR             dictionaries to be used, can be given multiple times
                        possible values are 'edict2' and 'japanese3'
                        The order determines the priority of the dictionary.
                        If *not* given, all found dictionaries are used.
-e, --edict=STR         location of the edict2 file
                        default: edict2
-j, --japanese3=STR     location of the japanese3 file
                        default: japanese3-data
--keep-input            keep the unpacked directory
--keep-output           keep the updated output directory
-u, --unpacked=STR      location of an already unpacked original dictionary
--unpackedzipped=STR    location of an already unpacked original dictionary
                        where the html files are already un-gzipped
Note that if you pass the --unpacked option, the files must be properly named (encodings are a problem!). Furthermore, the unpacked directory contains lots of .gif and .html files that are actually gzip-compressed but lack the .gz extension. If you have already ungzipped the html files, you can use --unpackedzipped instead.
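If you are unsure whether a file in your unpacked directory is still compressed, you can look at its first two bytes: gzip data always starts with 0x1f 0x8b. A small Python sketch of that check follows; the helper name read_maybe_gzipped is made up, and this is not taken from the script itself.

```python
import gzip
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def read_maybe_gzipped(path):
    """Return the file content, decompressing it if it is gzip data in disguise."""
    data = Path(path).read_bytes()
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data
```

If the returned bytes differ from what is on disk, the file was still compressed, so --unpacked rather than --unpackedzipped is the matching option for that directory.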
If you want to unpack the dictionary yourself, be advised that the file names in the Zip archive are already encoded in UTF-8, but normal programs (unzip, 7z) assume that they are in some other encoding. Thus, if you are using a UTF-8 locale, it is necessary to set LC_CTYPE to C to make sure that the names keep their encoding, as in:
Code:
LC_CTYPE=C 7z x ...
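The same file name problem shows up if you open the archive with Python's zipfile module: when the Zip entry does not carry the "name is UTF-8" flag, zipfile decodes the name as cp437. The sketch below undoes that. It rests on the assumption that the Kobo archive stores UTF-8 names without setting that flag, which is my reading of the behaviour described above and not something I have checked against the archive; real_name and list_members are made-up helper names.

```python
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: "file name is UTF-8"


def real_name(info):
    """Return the member name, undoing zipfile's cp437 fallback where it applies."""
    if info.flag_bits & UTF8_FLAG:
        return info.filename  # already decoded as UTF-8
    try:
        return info.filename.encode("cp437").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return info.filename  # not UTF-8 after all, keep what we got


def list_members(zip_path):
    """List the member names of an archive with the UTF-8 repair applied."""
    with zipfile.ZipFile(zip_path) as zf:
        return [real_name(info) for info in zf.infolist()]
```

Running list_members("dicthtml-jaxxdjs.zip") should then print readable Japanese names if the assumption holds; if it prints garbage, the archive is encoded differently from what I assumed.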
Typical run
In the following example we use both the Edict2 and Japanese3 dictionaries, and prefer Edict2 (which is the default). When no --dicts option is passed, the program searches for both dictionaries, and both are found in this case.
Code:
$ perl enhance-dictionary.pl
Using the following dictionaries as source for translations: edict2 japanese3
loading edict2 ... done
loading Japanese3 data ... done
unpacking original dictionary ... done
loading dict files ... done
searching for words and updating ... done
total words 805521, matches: 274464 (edict: 265953, japanese3: 8511)
creating output html ... done
creating update dictionary in dicthtml-jaxxdjs-201508072201.zip ... done
$
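The priority rule (first dictionary in the order wins) can be sketched in a few lines of Python. This is a toy model under my own assumptions, not the script's actual code: the sources are plain word-to-translation dicts, and translate is a made-up name.

```python
from collections import Counter


def translate(words, sources, order=("edict2", "japanese3")):
    """Return ({word: gloss}, hits per source); earlier names in `order` win."""
    translations = {}
    hits = Counter()
    for word in words:
        for name in order:
            gloss = sources.get(name, {}).get(word)
            if gloss is not None:
                translations[word] = gloss
                hits[name] += 1
                break  # a word is counted for one source only
    return translations, hits
```

Because each word is credited to exactly one source, the per-source counts add up to the total number of matches, just as in the run above (265953 + 8511 = 274464). Passing order=("japanese3", "edict2") corresponds to giving --dicts in the opposite order.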
Installing the dictionary
After creating the enhanced dictionary, you can install the generated file as KOBO/.kobo/dicts/dicthtml-jaxxdjs.zip (where KOBO is the mount point of the eReader). The dictionary should be picked up automatically, and the next lookup should show the added English translations.

There is only one caveat: syncing with Kobo will re-download the original dictionary and overwrite the enhanced one. There is at least one workaround, which I am using myself; see this post.
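Since the copy has to be repeated whenever a sync overwrites the file, the install step could be wrapped in a small helper. The Python sketch below assumes nothing beyond the target path given above; the function name install_dictionary and the optional backup_dir argument are my own additions for illustration.

```python
import shutil
from pathlib import Path

TARGET_NAME = "dicthtml-jaxxdjs.zip"


def install_dictionary(enhanced_zip, kobo_mount, backup_dir=None):
    """Copy the enhanced dictionary onto the reader and return the target path."""
    dicts_dir = Path(kobo_mount) / ".kobo" / "dicts"
    if not dicts_dir.is_dir():
        raise FileNotFoundError(f"{dicts_dir} not found; is the reader mounted?")
    target = dicts_dir / TARGET_NAME
    if backup_dir is not None and target.exists():
        # Keep a copy of the dictionary currently on the device (hypothetical extra).
        Path(backup_dir).mkdir(parents=True, exist_ok=True)
        shutil.copy2(target, Path(backup_dir) / TARGET_NAME)
    shutil.copy2(enhanced_zip, target)
    return target
```

For example, install_dictionary("dicthtml-jaxxdjs-201508072201.zip", "/media/KOBOeReader") would do the copy, where /media/KOBOeReader is just a placeholder for wherever your reader is mounted.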
Future plans
I am planning to get rid of the 7z dependency and use Archive::Zip for all the unpacking and packing. This would also allow doing everything in memory and thus make the script faster.
Conclusions
Even if Kobo does not provide us with a decent Japanese-English dictionary, at least adding a large number of translations to the current dictionary is now easily possible. For serious students of Japanese who are reading or starting to read Japanese eBooks, this will be of great help.
Enjoy, and don't forget to give feedback, suggestions, and improvements, either here or, better, via the GitHub page.