---------------------------------------------------
Ace Kanji Workout - source code documentation

See also: http://alexquinn.org/kanji/ or send email
to Alex Quinn at aquinn@cs.washington.edu.

7/13/06
---------------------------------------------------

INTRODUCTION

This program is for learning to READ JAPANESE KANJI in the most efficient
way possible.  It is focused on that goal only.  Thus, it's not
suitable for learning vocabulary or learning to write kanji.  I have
no plans to expand it to any of those purposes.  I think by focusing
this program, I was able to do things with it in a specialized way
that will support the stated goal very well.

Before creating this program, I tried many other applications.  Some are
focused on kanji, but too limited.  Others have excellent learning facilities
but are difficult to adapt to the task of studying kanji.  So, I built this.


LICENSE INFORMATION

Ace Kanji Workout
Copyright (C) 2006 Alexander Quinn <aquinn@cs.washington.edu>
http://alexquinn.org/kanji/

This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.

See http://www.gnu.org/licenses/gpl.html for more information on this.


DATA FILE RULES AND OBJECTIVES

This is a little complicated, I guess.  Before making this program, I used
JFC, which is an excellent application.  JFC is more sophisticated and
mature than my application.  However, one shortcoming of JFC is that you have
to either key in all the kanji data by hand or use the defaults, which come
from Edict or Kanjidic.  If you do that, then you get a huge number of readings
and you don't know which ones are really important.  Thus, you can't really
learn them.  My mind won't focus on things unless I know they are important.

To solve these problems, we put a fair amount of work into the data files.
The data comes from Edict and Kanjidic, but we do some processing on it.
By default (this is switchable in the code), it uses only examples (a la
Edict) that are in my JLPT practice lists (albeit a little old) or are
marked as P (popular) in Edict.  Then, it only uses kanji readings which 
are BOTH listed in Kanjidic and included in some example.  Figuring out
if a reading is used in an example is difficult.  You have to be able to
split the reading of the example along the boundaries of kanji or groups
of kana.  For example, taberu becomes tabe-ru.  arigatou becomes
ari-gato-u.  shokuji becomes shoku-ji.  Once you can do that, then you can
say whether the "shoku" reading of the kanji in taberu is ever used in any
examples.  Without the splits, it's hard to say.
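
The filtering rule above can be sketched roughly as follows.  This is only
an illustration, not the code in DataFileCreator.java: the class name, the
method name, the "-" separator, and the romanized readings are assumptions
made for the example (the real data uses kana).  It pairs each segment of a
split word with the matching segment of its split reading, and keeps a
reading only if it is both listed for the kanji and seen in some example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ReadingFilterSketch {

    /**
     * Returns the readings of one kanji that are both in listedReadings
     * (standing in for the Kanjidic readings) and used in at least one example.
     *
     * Each element of splitExamples is {splitWord, splitReading}.  Both strings
     * use "-" between segments (an assumed format for this sketch).
     */
    public static Set<String> usedReadings(String kanji, Set<String> listedReadings,
                                           List<String[]> splitExamples) {
        Set<String> used = new LinkedHashSet<String>();
        for (String[] example : splitExamples) {
            String[] wordParts = example[0].split("-");
            String[] readingParts = example[1].split("-");
            if (wordParts.length != readingParts.length) {
                continue; // the split did not line up, so this example proves nothing
            }
            for (int i = 0; i < wordParts.length; i++) {
                // Segment i of the word is read as segment i of the reading.
                if (wordParts[i].equals(kanji) && listedReadings.contains(readingParts[i])) {
                    used.add(readingParts[i]);
                }
            }
        }
        return used;
    }

    public static void main(String[] args) {
        String eat = "\u98DF";   // the kanji read "shoku" in shokuji
        String thing = "\u4E8B"; // the kanji read "ji" in shokuji

        // Hypothetical list of dictionary readings for the first kanji.
        Set<String> listed = new LinkedHashSet<String>(Arrays.asList("shoku", "jiki", "ta"));

        List<String[]> examples = new ArrayList<String[]>();
        examples.add(new String[] { eat + "-" + thing, "shoku-ji" });

        // Only "shoku" is both listed and used, so this prints [shoku]
        System.out.println(usedReadings(eat, listed, examples));
    }
}
```

With only the one example loaded, "jiki" and "ta" are dropped because no
split example uses them.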


COMPILE INSTRUCTIONS

Compiling should be very simple: run "javac KanjiTrainer.java".  However, before
you can run the program, you also need to build the data files (data-examples.txt
and data-kanji.txt).  To do that, first compile the data file creator with
"javac DataFileCreator.java".  Then, make sure there's a directory called
"original-data".  In that directory, you should have "edict.txt" in EUC-JP
format, "kanjidic.txt" in EUC-JP format, and "kana-splits-edits.txt" in
UTF-8 format.  The first two files can be obtained from Jim Breen's web site
at Monash University in Australia.  The last file can be empty if you don't
have one.  It simply helps the program figure out how to split the readings
of examples on kanji boundaries, in the few cases where my algorithm falls
short.  As of 7/13/06, there were 389 entries in that file.  Once you have
the data files in the right place, run "java DataFileCreator".  On my laptop
(1.5 GHz Pentium M, 512 MB RAM), it takes about 2 minutes to finish.  Your
results may vary.
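
Before running DataFileCreator, it may help to confirm that the input files
are in place.  The following is a small standalone sketch of such a check.
It is not part of the application; the class and method names are made up
for illustration, and only the directory and file names come from the
instructions above.  It checks that the files exist, not their encoding.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class DataFilePreflightSketch {

    // Input file names as given in the compile instructions.
    public static final String[] REQUIRED = {
        "edict.txt", "kanjidic.txt", "kana-splits-edits.txt"
    };

    /** Returns the names of the required input files that are missing from dir. */
    public static List<String> missingInputs(File dir) {
        List<String> missing = new ArrayList<String>();
        for (String name : REQUIRED) {
            if (!new File(dir, name).isFile()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = missingInputs(new File("original-data"));
        if (missing.isEmpty()) {
            System.out.println("All input files found.  Next: java DataFileCreator");
        } else {
            System.out.println("Missing from original-data: " + missing);
        }
    }
}
```

Run it from the directory that contains "original-data".  Remember that
kana-splits-edits.txt may be empty, but it still has to exist for this
check to pass.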


MAIN CLASSES IN THIS APPLICATION

The main classes in this project are:
	KanjiTrainer - MAIN ENTRY POINT, contains main function, handles UI
	Pool - represents user's long term goal (e.g. all JLPT-1 kanji)
			and mediates most access to the kanjiHash (database of
			kanji information).
	PoolSpec - represents user's choices about what kanji to study
			in the long-term.
	PoolSpecDialog - dialog box for choosing your long-term goal.
			(e.g. study all kanji that are JLPT-1 and frequency >1000)
	Deck - represents the kanji that are being studied in this session.
	DeckSpec - represents user's choices about how to choose kanji
			to study for a particular session.
	DeckSpecDialog - dialog box for specifying how to choose kanji to
			study in a particular session. (e.g. choose 10 kanji I
			already know, chosen by date last seen, etc.)
	OptionsDialog - just what it sounds like.  Currently allows you to
			change the Japanese font, time limit options, number of
			examples shown, etc.
	UseCalendar - Keeps track of your daily performance on a very rough
			scale.
	UseCalendarDialog - Dialog box to show you how often you've used the
			program and how many kanji (new vs. review) you studied
			each time.
	Profile - Wrapper that includes the PoolSpec, DeckSpec, UseCalendar, and
			Options classes.  This was added late.  In the future, it would
			be smart to have this class take care of loading the profile from
			a file.  This should also take care of the performanceHash, which
			is currently accessed simply as a hashtable object.
	Kanji - Just what it sounds like.  Contains kanji character (as a String),
			frequency, indexes into various books, etc.
	Example - Just what it sounds like.  Contains the word in kana and kanji,
			kana / kanji versions that are split with colons.  Also contains
			important methods (i.e. makeSplits) which are used for creating
			the data file.
	DataFileCreator - ENTRY POINT for creating data file.  This is run from
			the command line.  The files it creates are necessary for the
			rest of the program to run.
	OutFile - Wrapper for Java's messy file access facilities to make the
			rest of the code simpler, although a little less robust.
	InFile - Similar to OutFile.
	JarInFile - Similar to InFile, but can also read files inside a jar
			file.  Will read the files from wherever the application is
			being run.
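
To give a feel for what a wrapper like InFile does, here is a minimal
sketch.  It is not the actual InFile class; the name InFileSketch and all
of its methods are assumptions for illustration.  The idea shown is the
trade-off described above: checked exceptions are turned into unchecked
ones so callers stay short, at the cost of some robustness.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class InFileSketch {
    private final BufferedReader reader;

    /** Opens a text file in the given encoding, for example "UTF-8". */
    public InFileSketch(File file, String charsetName) {
        try {
            reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), charsetName));
        } catch (IOException e) {
            throw new RuntimeException("Cannot open " + file, e);
        }
    }

    /** Returns the next line, or null at the end of the file. */
    public String readLine() {
        try {
            return reader.readLine();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public void close() {
        try {
            reader.close();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    /** Convenience helper: reads every line of the file into a list. */
    public static List<String> readAll(File file, String charsetName) {
        InFileSketch in = new InFileSketch(file, charsetName);
        try {
            List<String> lines = new ArrayList<String>();
            for (String line = in.readLine(); line != null; line = in.readLine()) {
                lines.add(line);
            }
            return lines;
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) {
        if (args.length < 1) {
            System.out.println("usage: java InFileSketch <utf8-file>");
            return;
        }
        for (String line : readAll(new File(args[0]), "UTF-8")) {
            System.out.println(line);
        }
    }
}
```

A JarInFile-style variant could presumably take its input stream from
Class.getResourceAsStream instead of FileInputStream, which is the usual
way to read a file packed inside a jar.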


MAIN FILES  (not exhaustive)

Kanji.java - Kanji class
Example.java - Example class
KanjiTrainer.java - everything about the UI and kanji selection (Pool, Deck)
	etc.
DataFileCreator.java - entry point for data file creation.  contains most of
	that stuff, but also relies on Kanji.java and Example.java.
ace-kanji-workout-profile.txt - This file is created outside the jar when the
	user starts the program for the first time.  It contains all non-static
	information, including PoolSpec, DeckSpec, options, study history, and
	user's performance history with every single kanji.  See the toString methods
	of various classes for the specifics on how it's formatted.
data-kanji.txt - Contains details about every kanji the program is capable of studying,
	(currently 2232 kanji, including all JLPT and Joyo kanji), along with a list
	of examples (in kanji form only).
data-examples.txt - Contains details about each example.  This could have been
	integrated into the data-kanji.txt file, but that would have caused a fair 
	amount of duplicated information, because some examples contain more than one
	kanji.
kt.xml - Data file for creating the wrapper with Launch4J.
KanjiTrainer.properties - table of strings used in the interface, loaded by
	ResourceBundle class.
edict.txt - Edict file, in EUC-JP encoding.  See the Edict web page:
	http://www.csse.monash.edu.au/~jwb/edict.html
kanjidic.txt - KanjiDic file, in EUC-JP encoding.  See the KanjiDic web page:
	http://www.csse.monash.edu.au/~jwb/kanjidic.html
jlpt-voc-1.txt - List of example words, along with the JLPT practice list they
	appear in.  One file for each of the 4 levels.
jlpt_kanji_level_1_base.txt - List of kanji, along with the JLPT practice list they
	appear in.  One file for each of the 4 levels.


NOTES

DATA MEMBER ACCESS:  I took a fairly liberal approach to data member
access for accessing fields of the Example and Kanji classes.  Direct access to
some data fields is needed for manipulating and displaying the data.  I didn't
want to have a large number of getters and setters, so I just kept everything in
the same package and accessed some members directly.  In hindsight, maybe it would
have been better to use the getters and setters.  For example, there's some
redundant information in the Example class (split and unsplit version) which could
be easily eliminated if it weren't for the direct member access.

MULTIPLE INSTANCES OF THE APPLICATION:  Please don't open the application more than
once at a time.  It has no checking to see if there is more than one instance open.
Thus, you may lose some of your profile (progress, options) information after the
second copy gets closed and clobbers the changes from the first instance.

INDENT - I made these files in VIM using an indent (tab-width) of 4 characters.  I didn't worry
much about the right margin.  Most of the code probably wraps at around 120 columns.
Java is wordy, so I see no need to impose a strict 76 or 80 column limit on my code,
especially for a small project like this.

JAVA VERSION - I developed and tested using Sun's JDK version 1.5.0_06.  I haven't had
an opportunity to test it anywhere else.

INTERFACE LANGUAGE - Most messages and important interface strings are in the properties
file.  In the future, it wouldn't be too hard to translate the whole interface to Japanese,
or some other language.  I can't imagine why you'd want to translate to another language,
given that all the definitions of kanji and examples are in English.  Note, however, that
there would be some changes needed.  Some window sizes were sized precariously.
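
As a rough illustration of how interface strings are looked up by key, the
sketch below builds two small bundles from in-memory text so that it runs
without any files.  The keys and values are hypothetical; the real ones are
in KanjiTrainer.properties.  In the application, the bundle would
presumably be obtained with ResourceBundle.getBundle("KanjiTrainer"), in
which case a translated file such as KanjiTrainer_ja.properties would be
picked up for a Japanese locale.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class InterfaceStringsSketch {

    /** Builds a bundle from properties-format text held in memory. */
    public static ResourceBundle fromText(String propertiesText) {
        try {
            return new PropertyResourceBundle(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical keys and values, for illustration only.
        ResourceBundle english = fromText("menu.options=Options\nbutton.next=Next\n");
        ResourceBundle german = fromText("menu.options=Optionen\nbutton.next=Weiter\n");

        // The calling code is the same for every language; only the bundle differs.
        System.out.println(english.getString("menu.options"));
        System.out.println(german.getString("menu.options"));
    }
}
```

Because translated strings are often longer than the English ones, this is
also where the tightly sized windows mentioned above would start to matter.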

APPLICATION NAME - When I started this project, I thought I'd call it Kanji Trainer.  The
idea is that if the program is effective, then most of your time will be with kanji you
don't know well, and thus, it will be hard work.  Unfortunately, there's already at least
one application called "Kanji Trainer".  So, I changed it to "Ace Kanji Workout".  Since
the user doesn't see the file names anyway, I didn't bother to change the file names and
class names in the source code.  I guess a better approach might have been to simply call
it something name-independent like "KanjiApplicationUI".  Oh well.
