Japanese Scripts

The Japanese writing system includes the following scripts:

  • Kanji: Chinese characters,
  • Hiragana: Syllabic characters,
  • Katakana: Characters used for foreign words,
  • Okurigana: Hiragana appended to Kanji to show a word’s grammatical function,
  • Furigana: Characters added to Kanji characters as an aid to their proper pronunciation,
  • Kokuji: Kanji characters developed in Japan,
  • Arabic Numerals: For numerical data,
  • Romaji: Letters of the Roman alphabet.

Challenging? I would say so. For those of us who are not Japanese speakers, the following is some information about characters and scripts and how they are used.

Kanji (Chinese characters): This is the script that originated in China some 2500 years ago and that the Japanese adopted at the end of the sixth century. Each character represents a whole word or a meaningful unit. Some Kanji characters are pictographic and others are ideographic. Pictographic characters are based on a picture of the object that they represent. They have evolved so drastically into their modern, stylized versions that it is difficult to envision the original form. The following picture illustrates how some of these evolved:

[Image: kanji development]

Ideographs are characters that suggest an object or concept symbolically rather than depicting it directly. For example, the character for book (本) is based on the character for tree (木). You can visualize a tree with its roots marked by the extra stroke at the base. The roots are the important part, the origin, and so the character for book is based on a tree with roots. The characters for up, upper or on top of (上) and for down or go down (下) are both built on a horizontal bar: in 上 the strokes rise above the bar, and in 下 they hang below it. The character for the number one (一) is a single stroke. Some say that it represents a finger, but others say that it represents a single unit.

Other characters are aggregates, i.e., they were created by combining simple elements that are often characters themselves. For example, the character for ocean or sea (海) is a combination of the element for water (氵, the abbreviated form of 水) and the character for all or every (毎), symbolizing that all water flows to the ocean, or sea. The character for Fall (秋) combines the characters for wheat or grain plants (禾) and fire (火), because grain plants turn the color of fire in the Fall. The characters for heart (心) and Fall (秋) combined create the character for melancholy or sadness (愁), because Fall is considered the season for love, and lovers are vulnerable to unrequited love and heartbreak.

About 8% of kanji characters fall under these three categories (pictographs, ideographs and aggregates). 85% of kanji characters are classified as phonetic ideographs, i.e., the combination of a semantic and a phonetic element. For example, the character 校 (phonetically KOU), the second character of the word for school (学校, phonetically GAKKOU), is formed by combining the semantic element for wood (木), representing that school buildings were made of wood, with the phonetic element 交 (also KOU), which carries the sense of exchanging ideas or mingling with different people.

Japanese script also uses phonetic loan characters, where a character is borrowed for its sound to represent the names of countries. America is pronounced BEI and represented by the borrowed Kanji character for rice (米). France is pronounced FUTSU and represented by the Kanji character for Buddha (仏), and England is pronounced EI and represented by the Kanji character for excellence (英).

[Image: kana development]

Kana (The Phonetic Alphabet): Includes up to 48 characters in each of the two scripts, hiragana and katakana, representing sounds like the English “oh” and “shi”. Their number increases to 71 with the addition of diacritical marks (comparable to the Spanish tilde, ˜). The basic sounds are laid out in the gojuonzu, meaning “table of 50 sounds”. This is the kana table. Refer to the image on the left, containing 46 sounds that may be represented using hiragana or katakana characters. Regardless of the script that is used, the symbols represent the same sounds. Some of the syllables are missing from the basic chart and others have become obsolete. The nasal consonant “n” was added later, so the table now contains 46 sounds.
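The point that hiragana and katakana represent the same sounds, and that diacritical marks extend the basic set, can be illustrated with a short Python sketch. It relies on the Unicode layout, not on anything specific to this article: the hiragana and katakana blocks run in parallel, 0x60 code points apart, and a voiced kana decomposes into a base kana plus a combining mark. The function names are hypothetical and chosen only for illustration.

```python
import unicodedata

# In Unicode, hiragana (U+3041..U+3096) and katakana (U+30A1..U+30F6) are
# laid out in parallel, so the same sound sits 0x60 code points apart.
KANA_OFFSET = 0x60


def hiragana_to_katakana(text):
    """Replace each hiragana character with the katakana for the same sound."""
    return "".join(
        chr(ord(ch) + KANA_OFFSET) if "\u3041" <= ch <= "\u3096" else ch
        for ch in text
    )


def split_voicing_mark(kana):
    """Split one kana into (base kana, Unicode name of its diacritical mark).

    The mark name is an empty string when the kana carries no mark.
    """
    decomposed = unicodedata.normalize("NFD", kana)
    base = decomposed[0]
    mark = unicodedata.name(decomposed[1]) if len(decomposed) > 1 else ""
    return base, mark


print(hiragana_to_katakana("ひらがな"))  # ヒラガナ
print(split_voicing_mark("が"))  # ('か', 'COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK')
```

The second function shows why the character count grows with diacritics: が (ga) is simply か (ka) plus a voicing mark, not an unrelated symbol.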

Kokuji (Kanji characters of Japanese invention):

Literally translated, kokuji means “national characters”, a fitting name, since they were created in Japan. Examples are the characters used to write the verbs “hataraku” (働く), meaning “to work”, and “komu” (込む), meaning “to go into” or “to be crowded”. Some, like 働, the character for “hataraku” (work), have actually been exported to Chinese, but others have remained strictly Japanese.

Okurigana: Kana characters appended to kanji to represent a word’s grammatical function. For example, the word 使う, the verb tsukau (use), consists of the kanji 使 followed by the appended character う, the final “u”. This is okurigana. The okurigana character completes the kanji and makes it a Japanese word.

Furigana: These are small kana characters (hiragana, or sometimes katakana) added to kanji characters to help with pronunciation. Furigana characters are often found in children’s books, comic books, story books and also textbooks for high school students, who are not expected to know the most difficult kanji characters. (A Japanese student is expected to know 881 kanji characters at the end of the sixth grade and 2000 upon graduation from high school. On average, a college graduate can read about 3400 characters.) Books for non-Japanese who want to learn the language often contain furigana characters as well. Furigana are also used for proper names, to ensure that they are pronounced properly.

Romaji: Roman alphabet letters used for foreign names, like the names of U.S. companies (“Apple”, “Dell”, “Microsoft”, etc.). Another example is “email” (Eメール), where “electronic” is represented with the romaji character “E”. The following are a couple of examples of Japanese phrases written phonetically in romaji:

Are wa gakkoo desu. That is a school.
Kore wa hon desu. This is a book.
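Because a single Japanese word can mix scripts, as in Eメール or 使う, it can help to see the mix spelled out character by character. The following Python sketch labels each character by the prefix of its Unicode character name. The function names and the label strings are assumptions made for this illustration, and the check is deliberately simple: full-width Latin letters, for instance, would fall under “other”.

```python
import unicodedata


def script_of(ch):
    """Label one character using the prefix of its Unicode character name."""
    name = unicodedata.name(ch, "")
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "kanji"
    if name.startswith("HIRAGANA"):
        return "hiragana"
    if name.startswith("KATAKANA"):
        # Also covers the long-vowel mark (ー), whose name begins
        # with "KATAKANA-HIRAGANA".
        return "katakana"
    if name.startswith("LATIN"):
        return "romaji"
    if name.startswith("DIGIT"):
        return "numeral"
    return "other"


def label_scripts(text):
    """Return a list of (character, script label) pairs for the text."""
    return [(ch, script_of(ch)) for ch in text]


print(label_scripts("使う"))
# [('使', 'kanji'), ('う', 'hiragana')]
print(label_scripts("Eメール"))
# [('E', 'romaji'), ('メ', 'katakana'), ('ー', 'katakana'), ('ル', 'katakana')]
```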

The spoken language

Japanese contains fewer sounds than English, and each Japanese syllable is pronounced with the same length and stress. However, the pitch is important, because it can change the meaning of a word. Careful listening is required for proper comprehension. Some words, particularly those of foreign origin, cannot be reproduced identically in Japanese. For example, phonetically, California sounds like Kariforunia, Maryland like Meriirando, violin like vaiorin and beef steak like biifusuteeki.


The CJKV languages (Chinese, Japanese, Korean and Vietnamese) are quite challenging in software localization because they are double-byte character languages. Languages written in the Roman alphabet, such as English, are single-byte, meaning that 1 character occupies 1 byte (8 bits).

The processing of languages like English is based on one byte per character, and it was necessary to overcome this paradigm in order to develop systems that could handle CJKV, where characters are represented by more than a single byte (more than 8 bits). In computer technology, CJKV are commonly described as 16-bit languages. The Japanese Standards Association (JSA), Japan’s counterpart to ANSI, the American National Standards Institute, has identified 3418 primary kanji characters and 3384 secondary kanji characters. Secondary kanji include obsolete or historical characters, such as those used in proper names.
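The difference between single-byte and multi-byte characters is easy to observe directly. The Python sketch below, offered as an illustration and not as a localization tool, encodes a character in several encodings and reports how many bytes each one needs. The function name and the choice of encodings (ASCII, Shift JIS, EUC-JP and UTF-8, all available in Python’s standard codecs) are assumptions made for this example.

```python
def encoded_lengths(ch, encodings=("ascii", "shift_jis", "euc_jp", "utf-8")):
    """Return the number of bytes a character occupies in each encoding.

    The value is None when the encoding cannot represent the character.
    """
    lengths = {}
    for enc in encodings:
        try:
            lengths[enc] = len(ch.encode(enc))
        except UnicodeEncodeError:
            lengths[enc] = None
    return lengths


print(encoded_lengths("A"))
# {'ascii': 1, 'shift_jis': 1, 'euc_jp': 1, 'utf-8': 1}
print(encoded_lengths("本"))
# {'ascii': None, 'shift_jis': 2, 'euc_jp': 2, 'utf-8': 3}
```

The Latin letter fits in one byte everywhere, while the kanji needs two bytes in the Japanese legacy encodings and three in UTF-8. Code that assumes one byte per character therefore miscounts or splits such characters.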

In the beginning, Japanese computers were limited to using kana, which seriously handicapped ordinary textual applications. In the 1960s, IBM developed the Japanese answer to ASCII in the form of a kanji code, an extension of EBCDIC (Extended Binary Coded Decimal Interchange Code), for Japanese-language interfacing with mainframes. Out of this came programming software for smaller computer systems. Text processing for double-byte character languages has taken a very big leap since then, but CJKV still requires special handling on computer systems.
