CCSID



CCSID is an abbreviation used by IBM for "Coded Character Set Identifier". It is a 16-bit number that identifies a specific encoding of a specific code page. For example, Unicode is a code page that has several encoding forms, such as UTF-8, UTF-16, and UTF-32, and each form is identified by its own CCSID.
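As a rough illustration of what "several encoding forms" means, the following Python sketch encodes one character in three Unicode forms. The helper name encoding_forms is hypothetical, and the big-endian variants of UTF-16 and UTF-32 are chosen here only so that no byte-order mark is emitted.

```python
def encoding_forms(char):
    """Return the bytes of one character in three Unicode encoding forms."""
    return {
        "UTF-8": char.encode("utf-8"),
        "UTF-16": char.encode("utf-16-be"),  # big-endian, no byte-order mark
        "UTF-32": char.encode("utf-32-be"),  # big-endian, no byte-order mark
    }


# The euro sign (U+20AC) is the same character in every form,
# but the stored bytes and their count differ.
for name, data in encoding_forms("\u20ac").items():
    print(name, data.hex(), len(data), "bytes")
```

The character is the same in all three cases; only the byte representation changes, which is why each form needs its own identifier.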

CCSID 37, an EBCDIC encoding, is used for many European languages. CCSID 367 is identical to ASCII. CCSID 819 is identical to ISO 8859-1. CCSID 923 is identical to ISO 8859-15. CCSID 1208 is identical to UTF-8.
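The correspondences above can be sketched as a small lookup table in Python. The table and the function encode_with_ccsid are hypothetical names introduced for illustration, and the pairing of each CCSID with a Python codec name (in particular cp037 for CCSID 37) is an assumption based on the equivalences listed above, not an IBM-defined API.

```python
# Hypothetical lookup table: CCSID -> Python codec name (assumed equivalents).
CCSID_TO_CODEC = {
    37: "cp037",         # EBCDIC
    367: "ascii",        # ASCII
    819: "latin-1",      # ISO 8859-1
    923: "iso8859_15",   # ISO 8859-15
    1208: "utf-8",       # UTF-8
}


def encode_with_ccsid(text, ccsid):
    """Encode text with the codec assumed to match the given CCSID."""
    return text.encode(CCSID_TO_CODEC[ccsid])


# The same letter is stored as different bytes under different CCSIDs.
for ccsid in (37, 367, 819, 923, 1208):
    print(ccsid, encode_with_ccsid("A", ccsid).hex())
```

Looking up a CCSID that is not in the table raises KeyError, which keeps the sketch from silently guessing an encoding.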

What Is the Difference between a Code Page and a CCSID?

The terms code page and CCSID are often used interchangeably even though they are not synonymous. A code page may be only part of what makes up a CCSID. The following definitions help to illustrate this point, from glyph to CCSID and everything in between.

A glyph is the actual physical pattern of pixels or ink that shows up on a display or printout.

A character is a concept that covers all glyphs associated with a certain symbol. For instance, an "F" printed plain, in bold, in italics, underlined, in color, or in another font is a different glyph each time, but all of these glyphs represent the same character. The various modifiers (bold, italic, underline, color, and font) do not change the F's essential F-ness.

A character set contains most or all of the characters necessary to allow a particular human to carry on a meaningful interaction with the computer. Here's where we start to separate characters into various alphabets (Latin, Arabic, Hebrew, Cyrillic, and so on) or ideographic groups (Chinese, Korean, and so on).

A code page represents a particular mapping or ordering of a character set. Each character is assigned a code point, the computer's internal representation of that character. Many characters are represented by different code points in different code pages. All code points in a code page contain the same number of bytes. Certain character sets can be adequately represented with single-byte code pages (at most 256 characters), but many require more than that.
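A minimal Python sketch of this point follows, using the codecs cp037 (EBCDIC) and latin-1 (ISO 8859-1) as example single-byte code pages. The function names code_point and character_at are hypothetical and exist only for this illustration.

```python
def code_point(char, code_page):
    """Return the numeric code point of char in a single-byte code page."""
    encoded = char.encode(code_page)
    if len(encoded) != 1:
        raise ValueError("not a single-byte code page for this character")
    return encoded[0]


def character_at(value, code_page):
    """Return the character that a single-byte code page assigns to value."""
    return bytes([value]).decode(code_page)


# One character, two different code points.
print(hex(code_point("A", "cp037")))    # EBCDIC code point
print(hex(code_point("A", "latin-1")))  # ISO 8859-1 code point

# One code point, two different characters.
print(character_at(0xC1, "cp037"), character_at(0xC1, "latin-1"))
```

Because a single byte holds only 256 distinct values, a code page of this kind cannot assign code points to more than 256 characters.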

A coded character set identifier (CCSID) contains all of the information necessary to assign and preserve the meaning and rendering of characters through various stages of processing and interchange. This information always includes at least one code page, but may include multiple code pages of differing byte-lengths. The CCSID also has an associated encoding scheme that governs how various code points are to be handled. This is the mechanism that allows mixed, bidirectional, and other complex encodings to be implemented.
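To make the idea of mixing code pages of differing byte-lengths concrete, the sketch below uses Python's shift_jis codec as a stand-in for a mixed single-byte and double-byte encoding. This choice is an assumption made for illustration; it is not tied to any particular CCSID, and the function name byte_lengths is hypothetical.

```python
def byte_lengths(text, codec="shift_jis"):
    """Return (character, encoded length in bytes) for each character."""
    return [(ch, len(ch.encode(codec))) for ch in text]


# Latin letters take one byte each; the hiragana letter (U+3042) takes two.
for ch, size in byte_lengths("AB\u3042"):
    print(repr(ch), size)
```

A decoder for such data has to know the encoding scheme in order to tell where a one-byte character ends and a two-byte character begins, which is the kind of information a CCSID carries beyond the code pages themselves.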

At a very simplistic level, you can think of a CCSID as a Swiss Army knife and a code page as the screwdriver attachment. You can do some jobs with just the code page, handling only the simplest language representations. But, to do anything really tricky, you need the corkscrew, scissors, nail file, and can opener. Only a CCSID will allow you to mix single-byte and double-byte characters, or deal with bidirectional languages.

