Channel: MobileRead Forums - Kobo Developer's Corner

Home-made Japanese-English Dictionary for Kobo

Hi everyone,

Here is a script that lets anyone enrich the Japanese dictionary shipped with Kobo firmware with English translations from Edict (and/or Japanese3).

I was using tshering's excellent dictionary, but the new firmware 3.16.10 brought a much extended Japanese-Japanese dictionary, so I decided to write a script that enriches this dictionary with English definitions.

With the Japanese-Japanese dictionary from 3.16.10, the current status is that 274464 of the 805521 entries in total (about 34%) are translated: Edict was used for 265953 of them and Japanese3 for 8511.

The following is from the blog entry Japanese-English dictionary for Kobo.

The most current version is available on GitHub: project kobo-ja-dict-enhance, with a direct link to the script: enhance-dictionary.pl

Prerequisites

The script is not fool-proof, nor does it do everything by itself. It also requires a set of programs. In detail:
  • Unix/Linux computer: I haven't tried any of this on a Windows machine, but feedback is welcome. I am working on a version that does not depend on external programs and thus might be much more portable.
  • Dictionaries: A copy of the Edict dictionary; see below for details on dictionaries.
  • 7z: A standard zip/unzip program that also takes the locale into account when unpacking (unlike the unzip that I have access to).
  • Perl modules: Various Perl modules that should be standard on most installations: Getopt::Long, File::Temp, File::Basename, and Cwd.

Supported dictionaries

At the moment the program can use the following two dictionaries as sources: Edict2 and Japanese3.

Edict2

The Edict dictionary is a free dictionary which is the base of most other dictionaries. Created by the JMdict/EDICT Project, it provides a very complete Japanese-English dictionary.

To use the dictionary with the current program, you need to download the edict2.gz file and unpack it with gunzip edict2.gz. If you put it into the same directory as the Perl script, nothing else needs to be done; otherwise, use the command line option --edict PATH-TO-EDICT to specify the location of the edict2 file.
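
A minimal sketch of that unpacking step, assuming edict2.gz has already been downloaded. The helper name unpack_edict is made up for illustration, and it uses gzip -dc instead of gunzip so that the compressed original is kept:

```shell
# Hypothetical helper: unpack edict2.gz next to itself, keeping the .gz file.
# $1 = path to edict2.gz (assumed to be downloaded already)
unpack_edict() {
    gz_file=$1
    out_file=${gz_file%.gz}          # edict2.gz -> edict2
    gzip -dc "$gz_file" > "$out_file" || return 1
    printf '%s\n' "$out_file"        # print the path for use with --edict
}

# Usage (not run here):
#   edict=$(unpack_edict /path/to/edict2.gz)
#   perl enhance-dictionary.pl --edict "$edict"
```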

Japanese3

Japanese3 is a dictionary application for iOS which provides another very complete Japanese-English dictionary. My feeling is that it is 90% or more based on Edict, so adding it will not buy you much, but still a bit (see below for how much!).

If you have purchased this dictionary/application, and you manage to get access to your iOS device (via jailbreaking or some other tools), then you need to get the file Japanese3.db from the application folder, and then generate a file via sqlite3 as follows:

Code:

$ sqlite3 Japanese3.db
.output japanese3-data
select Entry, Furigana, Summary from entries;

If you save the generated file japanese3-data into the same directory as the Perl script, nothing else needs to be done; otherwise, use the command line option --japanese3 PATH-TO-J3-DATA to specify the location of the Japanese3 data file.

Command line options

The program supports the following command line options (these are subject to change as new features are added!):

Code:

  -h, --help            Print this message and exit.
  -v, --version        Print version and exit.
  --info=STR            Print info found on STR in dictionaries and exit.
  -i, --input=STR      location of the original Kobo GloHD dict
                          default: dicthtml-jaxxdjs.zip
  -o, --output=STR      name of the output file
                          default: dicthtml-jaxxdjs-TIMESTAMP.zip
  --dicts=STR          dictionaries to be used, can be given multiple times
                        possible values are 'edict2' and 'japanese3'
                        The order determines the priority of the dictionary.
                        If *not* given, all found dictionaries are used.
  -e, --edict=STR      location of the edict2 file
                          default: edict2
  -j, --japanese3=STR  location of the japanese3 file
                          default: japanese3-data
  --keep-input          keep the unpacked directory
  --keep-output        keep the updated output directory
  -u, --unpacked=STR    location of an already unpacked original dictionary
  --unpackedzipped=STR  location of an already unpacked original dictionary
                        where the html files are already un-gzipped


Note that if you pass in the option --unpacked, the files must be properly named (encodings are a problem!). Furthermore, note that the unpacked directory contains lots of .gif and .html files that are actually gzip-compressed files, but lack the .gz extension. If you have already un-gzipped the html files, you can use --unpackedzipped instead.
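
For the un-gzipping itself, something along these lines should work. This is only a sketch: the helper name ungzip_html is made up, and it assumes that only the .html files need to be decompressed, as the option description says:

```shell
# Hypothetical helper: un-gzip every .html file in a directory in place.
# Files that are not gzip data are left untouched.
# $1 = unpacked dictionary directory
ungzip_html() {
    dir=$1
    for f in "$dir"/*.html; do
        [ -f "$f" ] || continue      # no matches: skip the literal pattern
        # Read via stdin so gzip does not complain about the missing .gz suffix.
        if gzip -dc < "$f" > "$f.tmp" 2>/dev/null; then
            mv "$f.tmp" "$f"         # replace the compressed file
        else
            rm -f "$f.tmp"           # not gzip data: keep the original
        fi
    done
}

# Usage (not run here):
#   ungzip_html /path/to/unpacked-dict
#   perl enhance-dictionary.pl --unpackedzipped=/path/to/unpacked-dict
```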

If you want to unpack the dictionary yourself, be advised that the file names in the Zip archive are already encoded in UTF-8, but normal programs (unzip, 7z) assume that they are in some other encoding. Thus, if you are using a UTF-8 locale, it is necessary to set LC_CTYPE to C so that the UTF-8 names are kept as they are, as in

Code:

LC_CTYPE=C 7z x ...

Typical run

In the following example we use both the Edict2 and Japanese3 dictionaries, and prefer Edict2 (which is the default). When no --dicts option is passed in, the program searches for both dictionaries, and in this case both are found.

Code:

$ perl enhance-dictionary.pl
Using the following dictionaries as source for translations: edict2 japanese3
loading edict2 ... done
loading Japanese3 data ... done
unpacking original dictionary ... done
loading dict files ... done
searching for words and updating ... done
total words 805521, matches: 274464 (edict: 265953, japanese3: 8511)
creating output html ... done
creating update dictionary in dicthtml-jaxxdjs-201508072201.zip ... done
$

As you can see, a good number of words have received a translation.
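
To put the numbers from the summary line into perspective, here is a quick calculation. It is just a sketch; the helper name share_percent is hypothetical and the figures are copied from the run above:

```shell
# Hypothetical helper: what percentage of $1 is $2?
# LC_ALL=C keeps the decimal point independent of the user's locale.
share_percent() {
    LC_ALL=C awk -v total="$1" -v part="$2" \
        'BEGIN { printf "%.1f\n", 100 * part / total }'
}

share_percent 805521 274464    # share of all entries translated: prints 34.1
share_percent 274464 265953    # share of matches coming from Edict: prints 96.9
```

So roughly a third of all entries get a translation, and nearly all of those come from Edict.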

Installing the dictionary

After creating the enhanced dictionary, you can install the generated file as KOBO/.kobo/dicts/dicthtml-jaxxdjs.zip (where KOBO is the mount point of the eReader). The dictionary should be picked up automatically, and the next lookup should show the added English translations.



There is only one caveat: Syncing with Kobo will re-download the original dictionary and overwrite the enhanced one. There is at least one workaround, which I am using myself; see this post.
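
The copy step described above can be sketched as follows. The helper name install_dict is made up for illustration, and the mount point has to be whatever your system uses for the eReader:

```shell
# Hypothetical helper: copy an enhanced dictionary onto a mounted Kobo.
# $1 = generated zip file, $2 = mount point of the eReader (KOBO in the text)
install_dict() {
    src=$1
    dest_dir=$2/.kobo/dicts
    # Refuse to run if the device is not mounted where we expect it.
    if [ ! -d "$dest_dir" ]; then
        echo "no .kobo/dicts directory under $2" >&2
        return 1
    fi
    # The installed file must carry the original name, without the timestamp.
    cp "$src" "$dest_dir/dicthtml-jaxxdjs.zip"
}

# Usage (not run here; the mount point is an assumption):
#   install_dict dicthtml-jaxxdjs-201508072201.zip /media/KOBOeReader
```

Since a sync overwrites the file, the same call can simply be repeated afterwards.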

Future plans

I am planning to get rid of the 7z dependency and use Archive::Zip for all the unpacking and packing. This would also allow everything to be done in memory and thus make the script faster.


Conclusions

Even if Kobo does not provide us with a decent Japanese-English dictionary, adding a large number of translations to the existing dictionary is now easy. For serious students of Japanese who are reading, or starting to read, Japanese eBooks, this will be of great help.

Enjoy, and don't forget to give feedback, suggestions, and improvements either here or, better, via the GitHub page.
