Drehersoft

Mapping HP48 Text to Unicode

July 11, 2012 - HP48 Articles

The Problem

The HP48 calculators use a text encoding based on the Latin-1 character set (a.k.a. ISO 8859-1), except for 34 of the control characters: 0x1F and 0x7F to 0x9F.  Instead of leaving these as the normal Latin-1 control codes, HP re-purposed these mostly unused codes for 34 characters better suited for display on a high-end calculator.  Problems appear when the re-purposed characters are present in HP48 text or file names that are used on a different computing platform (e.g., when transferring a file from an HP48 to a PC).  This sometimes results in garbage data, bugs, and crashes in software that doesn’t attempt to handle these special characters.

Table of the HP48 size 3 font. Blue HP48 characters differ from the Latin-1 character set.

Table of the HP48 size 2 font. Blue HP48 characters differ from the Latin-1 character set.

Table of the HP48 size 1 font. Blue HP48 characters differ from the Latin-1 character set.

Unicode has become the ubiquitous standard since the time the HP48 was originally created.  Unicode supports over 1 million possible characters.  This means that it is now possible to convert HP48 text to characters that much of the world now uses.

However, with so many similar-looking characters to choose from, the issue sometimes becomes which one to use.  For example, the number 0 and the letter O look somewhat similar, depending on what font is being used.

Solution

To convert an HP48 character to a Unicode character, use the following mapping table:

HP48                     Unicode
Decimal  Hex  I/O Char*  Name                              Char  Hex   UTF-8
31       1F              Ellipsis                          …     2026  E2 80 A6
127      7F              Medium Shade                      ▒     2592  E2 96 92
128      80   \<)        Angle                             ∠     2220  E2 88 A0
129      81   \x-        Latin Small Letter a with Macron  ā     0101  C4 81
130      82   \.V        Nabla                             ∇     2207  E2 88 87
131      83   \v/        Square Root                       √     221A  E2 88 9A
132      84   \.S        Integral                          ∫     222B  E2 88 AB
133      85   \GS        Greek Capital Letter Sigma        Σ     03A3  CE A3
134      86   \|>        Black Right-Pointing Triangle     ▶     25B6  E2 96 B6
135      87   \pi        Greek Small Letter Pi             π     03C0  CF 80
136      88   \.d        Partial Differential              ∂     2202  E2 88 82
137      89   \<=        Less-Than or Equal To             ≤     2264  E2 89 A4
138      8A   \>=        Greater-Than or Equal To          ≥     2265  E2 89 A5
139      8B   \=/        Not Equal To                      ≠     2260  E2 89 A0
140      8C   \Ga        Greek Small Letter Alpha          α     03B1  CE B1
141      8D   \->        Rightwards Arrow                  →     2192  E2 86 92
142      8E   \<-        Leftwards Arrow                   ←     2190  E2 86 90
143      8F   \|v        Downwards Arrow                   ↓     2193  E2 86 93
144      90   \|^        Upwards Arrow                     ↑     2191  E2 86 91
145      91   \Gg        Greek Small Letter Gamma          γ     03B3  CE B3
146      92   \Gd        Greek Small Letter Delta          δ     03B4  CE B4
147      93   \Ge        Greek Small Letter Epsilon        ε     03B5  CE B5
148      94   \Gn        Greek Small Letter Eta            η     03B7  CE B7
149      95   \Gh        Greek Small Letter Theta          θ     03B8  CE B8
150      96   \Gl        Greek Small Letter Lamda          λ     03BB  CE BB
151      97   \Gr        Greek Small Letter Rho            ρ     03C1  CF 81
152      98   \Gs        Greek Small Letter Sigma          σ     03C3  CF 83
153      99   \Gt        Greek Small Letter Tau            τ     03C4  CF 84
154      9A   \Gw        Greek Small Letter Omega          ω     03C9  CF 89
155      9B   \GD        Greek Capital Letter Delta        Δ     0394  CE 94
156      9C   \PI        Greek Capital Letter Pi           Π     03A0  CE A0
157      9D   \GW        Greek Capital Letter Omega        Ω     03A9  CE A9
158      9E   \[]        Black Square                      ■     25A0  E2 96 A0
159      9F   \oo        Infinity                          ∞     221E  E2 88 9E

* not all I/O Characters are listed here.
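
The I/O column can also drive a small translator for text captured in ASCII transfer mode.  The following is a minimal Python sketch, not part of the original mapping work: the names `IO_TO_UNICODE` and `io_text_to_unicode` are hypothetical, and it handles only the 32 sequences listed in the table (codes 0x80 to 0x9F).  Any other backslash sequence, including whatever the calculator uses for a literal backslash, is left untouched.

```python
import re

# Two-character suffixes of the I/O sequences for HP48 codes 0x80..0x9F,
# in table order.  Each full sequence is a backslash plus the suffix.
_SUFFIXES = (
    "<) x- .V v/ .S GS |> pi .d <= >= =/ Ga -> <- |v "
    "|^ Gg Gd Ge Gn Gh Gl Gr Gs Gt Gw GD PI GW [] oo"
).split()

# Unicode code points for HP48 codes 0x80..0x9F, in the same order.
_CODE_POINTS = [
    0x2220, 0x0101, 0x2207, 0x221A, 0x222B, 0x03A3, 0x25B6, 0x03C0,
    0x2202, 0x2264, 0x2265, 0x2260, 0x03B1, 0x2192, 0x2190, 0x2193,
    0x2191, 0x03B3, 0x03B4, 0x03B5, 0x03B7, 0x03B8, 0x03BB, 0x03C1,
    0x03C3, 0x03C4, 0x03C9, 0x0394, 0x03A0, 0x03A9, 0x25A0, 0x221E,
]

# Hypothetical lookup table: "\->" maps to "→", "\GS" maps to "Σ", and so on.
IO_TO_UNICODE = {"\\" + s: chr(cp) for s, cp in zip(_SUFFIXES, _CODE_POINTS)}

# Match only the known sequences; everything else passes through unchanged.
_PATTERN = re.compile("|".join(re.escape(seq) for seq in IO_TO_UNICODE))


def io_text_to_unicode(text: str) -> str:
    """Replace each listed I/O sequence with its Unicode character."""
    return _PATTERN.sub(lambda m: IO_TO_UNICODE[m.group(0)], text)


print(io_text_to_unicode(r"\GS x \-> \oo"))  # prints: Σ x → ∞
```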

All remaining HP48 characters can be directly mapped to Unicode.  For example, an HP48 ‘A’ is 0x41 and in Unicode is 0041.  This applies to the ranges 0x00 to 0x1E, 0x20 to 0x7E, and 0xA0 to 0xFF.
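
As an illustration, the whole mapping can be expressed as a Latin-1 decode followed by 34 overrides.  This is a minimal Python sketch under that assumption; `HP48_TO_UNICODE` and `hp48_to_unicode` are hypothetical names chosen for this example.

```python
# Unicode code points for HP48 codes 0x7F..0x9F, in table order (33 entries).
_SPECIAL_CODE_POINTS = [
    0x2592, 0x2220, 0x0101, 0x2207, 0x221A, 0x222B, 0x03A3, 0x25B6,
    0x03C0, 0x2202, 0x2264, 0x2265, 0x2260, 0x03B1, 0x2192, 0x2190,
    0x2193, 0x2191, 0x03B3, 0x03B4, 0x03B5, 0x03B7, 0x03B8, 0x03BB,
    0x03C1, 0x03C3, 0x03C4, 0x03C9, 0x0394, 0x03A0, 0x03A9, 0x25A0,
    0x221E,
]

# HP48 code -> Unicode code point for the 34 re-purposed characters.
HP48_TO_UNICODE = {0x1F: 0x2026}
HP48_TO_UNICODE.update(
    {0x7F + i: cp for i, cp in enumerate(_SPECIAL_CODE_POINTS)}
)


def hp48_to_unicode(data: bytes) -> str:
    """Convert raw HP48 text bytes to a Unicode string."""
    # Latin-1 decoding maps every byte to the code point of the same value,
    # which is the direct mapping; translate() then overrides the 34 specials.
    return data.decode("latin-1").translate(HP48_TO_UNICODE)


print(hp48_to_unicode(b"\x85 x \x8d \x9f"))  # prints: Σ x → ∞
```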

If you are using UTF-8, then it is necessary to encode each Unicode character as a 1-, 2-, or 3-byte sequence.  Details are available at http://en.wikipedia.org/wiki/Utf-8.
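
To make the 1-, 2-, and 3-byte cases concrete, here is a small Python sketch of the standard UTF-8 bit layout for code points up to U+FFFF, which covers every character in the table.  The function name `utf8_encode` is hypothetical; in practice Python's built-in `str.encode("utf-8")` does the same job.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one code point in the range U+0000..U+FFFF as UTF-8."""
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded")
    if code_point < 0x80:
        # 1 byte: 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    raise ValueError("code points above U+FFFF need 4 bytes; not handled here")


# 'A', a with macron, and angle, matching the UTF-8 column of the table.
for cp in (0x0041, 0x0101, 0x2220):
    assert utf8_encode(cp) == chr(cp).encode("utf-8")
    print(f"U+{cp:04X} -> {utf8_encode(cp).hex(' ').upper()}")
```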

Rationale

  1. Character 0x80 (angle)
    1. Instead of using ∠ 2220 for character 0x80, others have incorrectly used ∟ 221F.  This is the Right Angle character and is not intended for any generic angle.  Also, it does not visually match the HP48.
    2. While ∡ 2221 is visually an even better match, this character often does not render properly on various computer platforms and software.  In short, some users will just see empty boxes such as:

      Examples of Microsoft Visual Studio 2008 HTML editor and Android 2.3.4 default font unable to render Unicode “Measured Angle” character (2221).


  2. Character 0x81 (x-bar)
    1. In theory, Unicode allows two characters to be visually combined if the 2nd character is a “combining character”.  This would allow for the display of x̄ by using x followed by the “combining macron” character, which would be 0078 followed by 0304.  However, there are two problems with this.
      1. Combining these two characters often renders poorly or not at all, which will leave the user confused.  In the example below, the first two renderings are failures while the last two are simply difficult to read at the default settings:

        Various bad renderings of the Unicode characters "x" with "Combining Macron" (0078 and 0304) as rendered by Android 2.3.4, MS Visual Studio 2008, Windows 7 Notepad, and LibreOffice 3.4.4.

        For additional examples of how x-bar is inconsistently rendered depending on the font, see http://www.kreativekorp.com/charset/encoding.php?file=hp-48.kte&char=81.
      2. Using two characters to represent one HP48 character breaks the pattern of having a simple one-to-one mapping.  Some HP48 developers will likely introduce bugs in their code when converting back from Unicode to HP48 characters.
    2. Instead, ā 0101 is used.  It is a single Unicode character, so it is easy for HP48 developers to deal with, leading to fewer bugs.  Also, x-bar is used in statistics as the notation for average, and ā looks like an ‘a’ for average.
  3. Character 0x82 (nabla)
    1. The character ∇ 2207 was chosen over other triangles since this is the Nabla character, which is used in mathematics.  Details can be read at http://en.wikipedia.org/wiki/Nabla_symbol.
  4. Characters 0x8D through 0x90 (arrows)
    1. In Unicode, there are a large number of characters that represent arrows.  However, 2190 through 2193 were chosen because these are just simple arrow characters and don’t carry any additional implied meaning.  Also, this set of arrow characters supports all four directions, whereas some of the other sets do not.  Lastly, some of the alternative arrow characters are not rendered consistently on some computing platforms.
  5. Characters 0x85, 0x8C, 0x9B, 0x9C, 0x9D (various Greek symbols)
    1. These are Greek symbols that could have alternatively been represented by various mathematical or electrical Unicode characters.  However, there are several reasons for preferring the Greek symbols:
      1. We can gain insight into the original HP48 developers’ intentions by looking at how they translated these characters when using ASCII transfer mode 2 or 3 over a serial link.  These characters were translated into \GS, \Ga, \GD, \PI, and \GW respectively.  If we assume that “G” stands for Greek, then we can assume these translations mean Greek Capital Sigma, Greek lowercase alpha, Greek Capital Delta, Capital Pi, and Greek Capital Omega (a lowercase omega looks like a ‘w’).  This pattern holds for all the other translated Greek letters as well, except for \pi, which is clearly lowercase pi.
      2. Using all Greek symbols results in a visually clean look.  In contrast, when symbols from math, electronics, and Greek are mixed together, they often look sloppy because they don’t line up and have different line weights and drawing styles.
  6. Character 0x9E (box)
    1. Instead of using ■ 25A0, the Black Square, others have incorrectly used ▬ 25AC, which is the Black Rectangle.  This does not visually match the HP48.
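
The one-to-one argument made for character 0x81 can be illustrated with a reverse conversion.  The Python sketch below is only an example under the mapping in this article; `UNICODE_TO_HP48` and `unicode_to_hp48` are hypothetical names.  Because every HP48 code maps to exactly one code point, the reverse table is a simple inversion, while the two-code-point sequence x + combining macron has no single-character form even after NFC normalization and so cannot be converted back by a per-character lookup.

```python
import unicodedata

# Unicode code points for HP48 codes 0x7F..0x9F, in table order (33 entries).
_SPECIAL_CODE_POINTS = [
    0x2592, 0x2220, 0x0101, 0x2207, 0x221A, 0x222B, 0x03A3, 0x25B6,
    0x03C0, 0x2202, 0x2264, 0x2265, 0x2260, 0x03B1, 0x2192, 0x2190,
    0x2193, 0x2191, 0x03B3, 0x03B4, 0x03B5, 0x03B7, 0x03B8, 0x03BB,
    0x03C1, 0x03C3, 0x03C4, 0x03C9, 0x0394, 0x03A0, 0x03A9, 0x25A0,
    0x221E,
]

HP48_TO_UNICODE = {0x1F: 0x2026}
HP48_TO_UNICODE.update(
    {0x7F + i: cp for i, cp in enumerate(_SPECIAL_CODE_POINTS)}
)

# A one-to-one mapping inverts without collisions.
UNICODE_TO_HP48 = {cp: code for code, cp in HP48_TO_UNICODE.items()}


def unicode_to_hp48(text: str) -> bytes:
    """Convert a Unicode string back to HP48 bytes, one character at a time."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp in UNICODE_TO_HP48:
            out.append(UNICODE_TO_HP48[cp])
        elif cp <= 0xFF and cp not in HP48_TO_UNICODE:
            out.append(cp)  # directly mapped Latin-1 range
        else:
            raise ValueError(f"U+{cp:04X} has no HP48 equivalent in this mapping")
    return bytes(out)


print(unicode_to_hp48("\u0101 \u2260 0").hex(" "))  # prints: 81 20 8b 20 30

# x + combining macron stays two code points under NFC, so a per-character
# converter cannot map it back to the single HP48 code 0x81.
combined = unicodedata.normalize("NFC", "x\u0304")
print(len(combined))  # prints: 2
```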

Other HP48 to Unicode Mappings

  • http://www.kreativekorp.com/charset/encoding.php?file=hp-48.kte – Differs from above on HP48 characters 0x81 and 0x85.  Characters 0x1F and 0x7F are not dealt with.
  • http://www.kostis.net/charsets/hp48.htm – Differs from above on HP48 characters 0x80, 0x85, and 0x9E.  Characters 0x1F, 0x7F, and 0x81 are not dealt with.

Note: efforts are being made (or will be made) to rectify the differences.

Other Resources

  • Unicode Standard: http://unicode.org/
  • Unicode Character Name Index: http://www.unicode.org/charts/charindex.html
  • HP48 ASCII Transfer mode translations: http://holyjoe.net/hp/tiotable.htm
  • Newsgroup Post: https://groups.google.com/d/topic/comp.sys.hp48/hek271hUD-E/discussion
  • Matching Tables:
    • http://www.ascii.ca/hp48.htm

(c) 2012 Drehersoft