4499288: (cs spec) Charset terminology problems

Reviewed-by: mr, iris
2008-03-09 21:56:42 -07:00 · 2008-03-09 21:56:42 -07:00 · 13ced2b935
commit 13ced2b935
parent 86d2ba8dfc
1 changed files with 31 additions and 20 deletions
--- a/jdk/src/share/classes/java/nio/charset/Charset.java
+++ b/jdk/src/share/classes/java/nio/charset/Charset.java
@@ -212,36 +212,47 @@ import sun.security.action.GetPropertyAction;
 *
 * <h4>Terminology</h4>
 *
- * <p> The name of this class is taken from the terms used in <a
- * href="http://www.ietf.org/rfc/rfc2278.txt""><i>RFC&nbsp;2278</i></a>.  In that
- * document a <i>charset</i> is defined as the combination of a coded character
- * set and a character-encoding scheme.
+ * <p> The name of this class is taken from the terms used in
+ * <a href="http://www.ietf.org/rfc/rfc2278.txt"><i>RFC&nbsp;2278</i></a>.
+ * In that document a <i>charset</i> is defined as the combination of
+ * one or more coded character sets and a character-encoding scheme.
 * (This definition is confusing; some other software systems define
 * <i>charset</i> as a synonym for <i>coded character set</i>.)
 *
 * <p> A <i>coded character set</i> is a mapping between a set of abstract
 * characters and a set of integers.  US-ASCII, ISO&nbsp;8859-1,
- * JIS&nbsp;X&nbsp;0201, and full Unicode, which is the same as
- * ISO&nbsp;10646-1, are examples of coded character sets.
+ * JIS&nbsp;X&nbsp;0201, and Unicode are examples of coded character sets.
 *
- * <p> A <i>character-encoding scheme</i> is a mapping between a coded
- * character set and a set of octet (eight-bit byte) sequences.  UTF-8, UCS-2,
- * UTF-16, ISO&nbsp;2022, and EUC are examples of character-encoding schemes.
- * Encoding schemes are often associated with a particular coded character set;
- * UTF-8, for example, is used only to encode Unicode.  Some schemes, however,
- * are associated with multiple character sets; EUC, for example, can be used
- * to encode characters in a variety of Asian character sets.
+ * <p> Some standards have defined a <i>character set</i> to be simply a
+ * set of abstract characters without an associated assigned numbering.
+ * An alphabet is an example of such a character set.  However, the subtle
+ * distinction between <i>character set</i> and <i>coded character set</i>
+ * is rarely used in practice; the former has become a short form for the
+ * latter, including in the Java API specification.
+ *
+ * <p> A <i>character-encoding scheme</i> is a mapping between one or more
+ * coded character sets and a set of octet (eight-bit byte) sequences.
+ * UTF-8, UTF-16, ISO&nbsp;2022, and EUC are examples of
+ * character-encoding schemes.  Encoding schemes are often associated with
+ * a particular coded character set; UTF-8, for example, is used only to
+ * encode Unicode.  Some schemes, however, are associated with multiple
+ * coded character sets; EUC, for example, can be used to encode
+ * characters in a variety of Asian coded character sets.
 *
 * <p> When a coded character set is used exclusively with a single
- * character-encoding scheme then the corresponding charset is usually named
- * for the character set; otherwise a charset is usually named for the encoding
- * scheme and, possibly, the locale of the character sets that it supports.
- * Hence <tt>US-ASCII</tt> is the name of the charset for US-ASCII while
+ * character-encoding scheme then the corresponding charset is usually
+ * named for the coded character set; otherwise a charset is usually named
+ * for the encoding scheme and, possibly, the locale of the coded
+ * character sets that it supports.  Hence <tt>US-ASCII</tt> is both the
+ * name of a coded character set and of the charset that encodes it, while
 * <tt>EUC-JP</tt> is the name of the charset that encodes the
 * JIS&nbsp;X&nbsp;0201, JIS&nbsp;X&nbsp;0208, and JIS&nbsp;X&nbsp;0212
- * character sets.
+ * coded character sets for the Japanese language.
 *
 * <p> The native character encoding of the Java programming language is
- * UTF-16.  A charset in the Java platform therefore defines a mapping between
- * sequences of sixteen-bit UTF-16 code units and sequences of bytes. </p>
+ * UTF-16.  A charset in the Java platform therefore defines a mapping
+ * between sequences of sixteen-bit UTF-16 code units (that is, sequences
+ * of chars) and sequences of bytes. </p>
 *
 *
 * @author Mark Reinhold
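The revised Javadoc says that a charset in the Java platform maps sequences of UTF-16 code units (chars) to sequences of bytes, and that US-ASCII names both a coded character set and the charset that encodes it. The minimal sketch below illustrates that mapping with the standard `java.nio.charset` API. The class name `CharsetTerminologyDemo` and its `encode`/`decode` helpers are hypothetical and exist only for illustration. The sketch assumes EUC-JP may be absent at runtime, because it is not one of the charsets that every Java implementation is required to support, so it checks `Charset.isSupported` before using it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not part of the JDK.
public class CharsetTerminologyDemo {

    // A charset maps a sequence of UTF-16 code units (a String's chars) to bytes.
    // String.getBytes(Charset) replaces unmappable chars with the charset's
    // default replacement bytes instead of throwing.
    static byte[] encode(String s, Charset cs) {
        return s.getBytes(cs);
    }

    // ...and maps a byte sequence back to a sequence of UTF-16 code units.
    static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        String s = "A\u00e9"; // 'A' (U+0041) and 'e-acute' (U+00E9): two chars
        System.out.println("UTF-16 code units: " + s.length());

        // The same chars become different byte sequences under different charsets.
        Charset[] charsets = {
            StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8, StandardCharsets.UTF_16BE
        };
        for (Charset cs : charsets) {
            byte[] b = encode(s, cs);
            System.out.println(cs.name() + " -> " + b.length + " bytes " + Arrays.toString(b));
        }

        // U+00E9 is not in the US-ASCII coded character set, so it cannot round-trip.
        String ascii = decode(encode(s, StandardCharsets.US_ASCII), StandardCharsets.US_ASCII);
        System.out.println("US-ASCII round trip: " + ascii);

        // EUC-JP is optional on a Java platform, so check for it before use.
        if (Charset.isSupported("EUC-JP")) {
            Charset eucJp = Charset.forName("EUC-JP");
            String nichi = "\u65e5"; // U+65E5, a kanji from JIS X 0208
            byte[] b = encode(nichi, eucJp);
            System.out.println("EUC-JP -> " + b.length + " bytes, round-trips: "
                    + decode(b, eucJp).equals(nichi));
        }
    }
}
```

The same two chars encode to 2 bytes in ISO-8859-1, 3 bytes in UTF-8 and 4 bytes in UTF-16BE. In US-ASCII, the unmappable U+00E9 is replaced, so the round trip does not reproduce the original string.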