Extended Binary Coded Decimal Interchange Code

From Wikipedia, the free encyclopedia

Extended Binary Coded Decimal Interchange Code (EBCDIC) is an 8-bit character encoding (code page) used on IBM mainframe operating systems such as z/OS, OS/390, VM and VSE, as well as on IBM minicomputer operating systems such as OS/400 and i5/OS (see also Binary Coded Decimal). It is also used on various non-IBM platforms such as Fujitsu-Siemens' BS2000/OSD, HP MPE/iX, and Unisys MCP. It descended from punched-card codes and from the corresponding six-bit binary-coded decimal code used by most of IBM's computer peripherals of the late 1950s and early 1960s.

History

EBCDIC was devised in 1963 and 1964 by IBM and was announced with the release of the IBM System/360 line of mainframe computers. It was created to extend the Binary-Coded Decimal encoding that existed at the time. Unlike the 7-bit ASCII encoding scheme, from which it was developed separately, EBCDIC is an 8-bit character encoding.

Although IBM was a chief proponent of the ASCII standardization committee, it did not have time to prepare ASCII peripherals (such as card punch machines) to ship with its System/360 computers, so the company settled on EBCDIC. The System/360 was very successful, and EBCDIC spread along with it.

All IBM mainframe peripherals and operating systems (except Linux on zSeries or iSeries) use EBCDIC as their native encoding[citation needed], but software can translate to and from other encodings. Many hardware peripherals also provide translation, and modern mainframes (such as IBM zSeries) include processor instructions that accelerate translation between character sets in hardware.
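
Much of this translation, in hardware and software alike, is table lookup: each byte indexes a 256-entry table holding its counterpart in the other character set. The System/360 TR (Translate) instruction, for instance, replaces each byte with the entry it indexes in a 256-byte table. The sketch below illustrates the idea in Python, assuming the EBCDIC data is in CCSID 037 and using the standard library's `cp037` codec only to build the tables; the function names are hypothetical, and this is not a description of how any particular IBM peripheral or instruction is implemented.

```python
# Sketch of table-driven translation between EBCDIC and Latin-1.
# Assumption: the EBCDIC data uses CCSID 037 (Python's built-in "cp037"
# codec), which maps each of the 256 byte values to a distinct Latin-1
# character, so both tables below are full 256-entry permutations.
CODEPAGE = "cp037"
ALL_BYTES = bytes(range(256))

# Entry i holds the Latin-1 byte for EBCDIC byte i.
EBCDIC_TO_LATIN1 = ALL_BYTES.decode(CODEPAGE).encode("latin-1")
# Entry i holds the EBCDIC byte for Latin-1 byte i (the inverse table).
LATIN1_TO_EBCDIC = ALL_BYTES.decode("latin-1").encode(CODEPAGE)


def ebcdic_to_latin1(data: bytes) -> bytes:
    """Replace every byte with its table entry: one lookup per byte."""
    return data.translate(EBCDIC_TO_LATIN1)


def latin1_to_ebcdic(data: bytes) -> bytes:
    """Inverse translation, using the second table."""
    return data.translate(LATIN1_TO_EBCDIC)


if __name__ == "__main__":
    record = b"\xc8\x85\x93\x93\x96"  # "Hello" in CCSID 037
    print(ebcdic_to_latin1(record))  # b'Hello'
    print(latin1_to_ebcdic(b"Hello") == record)  # True
```

Building the tables once and reusing them keeps the per-byte work to a single indexed lookup, which is what makes the approach cheap enough for hardware.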

When it was devised, EBCDIC made it relatively easy to enter data into a computer with punched cards. Since punched cards are no longer used on mainframes, EBCDIC survives in modern mainframes solely for backward compatibility. It has no real technical advantage over ASCII-based code pages such as the ISO 8859 series or Unicode. (Each has some minor conveniences: for example, in ASCII there is one bit which indicates a normal or control character, while in EBCDIC there is one bit which indicates upper or lower case.) As with single-byte extended ASCII code pages, most EBCDIC code pages allow at most two languages (English and one other language) to be used in a database or text file.
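
The case bit can be seen in the code values themselves: for the unaccented Latin letters, each lower-case EBCDIC letter differs from its upper-case counterpart only in bit 0x40 (for example, a is 0x81 and A is 0xC1). A small sketch, assuming CCSID 037 via Python's standard `cp037` codec; `toggle_case` is a hypothetical helper that is only meaningful for the 26 basic letters.

```python
CODEPAGE = "cp037"  # assumption: CCSID 037
CASE_BIT = 0x40     # set in upper-case letters, clear in lower-case ones


def toggle_case(code: int) -> int:
    """Flip the case of a basic Latin EBCDIC letter by toggling bit 0x40."""
    return code ^ CASE_BIT


for ch in "aAzZ":
    code = ch.encode(CODEPAGE)[0]
    flipped = toggle_case(code)
    print(f"{ch}: 0x{code:02X} -> 0x{flipped:02X} "
          f"({bytes([flipped]).decode(CODEPAGE)})")
```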

Where true support for multilingual text is desired, a system supporting far more characters is needed, generally some form of Unicode. The Unicode Consortium has proposed an EBCDIC-oriented Unicode Transformation Format called UTF-EBCDIC, but it is not intended for use in open interchange environments and is almost never used, even on EBCDIC-based systems. IBM mainframes support UTF-16, but they do not support UTF-EBCDIC natively.

Technical details

EBCDIC code pages and ASCII-based code pages are incompatible with each other. Because computers store text as numbers, a code page assigns a character to each number, and the same byte value is interpreted as a different character depending on the code page used. Text stored in EBCDIC therefore requires code page conversion before it can be viewed on ASCII-based machines such as personal computers.
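
For example, the five bytes below spell "Hello" when read as EBCDIC, but decode mostly to invisible control characters when read as ISO 8859-1, as an ASCII-based machine might do without conversion. CCSID 037 is assumed here, via Python's standard `cp037` codec.

```python
data = bytes([0xC8, 0x85, 0x93, 0x93, 0x96])

as_ebcdic = data.decode("cp037")    # assumption: the text was written in CCSID 037
as_latin1 = data.decode("latin-1")  # the same bytes read with an ASCII-based code page

print(repr(as_ebcdic))  # 'Hello'
print(repr(as_latin1))  # 'È\x85\x93\x93\x96'

# Conversion is a decode from the source code page followed by an
# encode into the target one.
print(as_ebcdic.encode("ascii"))  # b'Hello'
```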

A single EBCDIC byte occupies eight bits, which are divided into two halves, or nibbles. The first (high-order) four bits are called the zone and represent the category of the character, whereas the last (low-order) four bits are called the digit and identify the specific character.
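
Splitting a byte into its zone and digit is a shift and a mask. A sketch assuming CCSID 037 (Python's `cp037` codec); `zone_and_digit` is a hypothetical helper. For the decimal digits the zone is F and the digit nibble equals the numeric value.

```python
CODEPAGE = "cp037"  # assumption: CCSID 037


def zone_and_digit(code: int) -> tuple[int, int]:
    """Return (zone, digit): the high and low four bits of an EBCDIC byte."""
    return code >> 4, code & 0x0F


for ch in "AJS7":
    code = ch.encode(CODEPAGE)[0]
    zone, digit = zone_and_digit(code)
    print(f"{ch}: 0x{code:02X}  zone={zone:X}  digit={digit:X}")
```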

There is a close correspondence between hexadecimal character codes and punched card codes in EBCDIC. This was an important feature when the EBCDIC scheme was created. An IBM keypunch machine could punch a 12-row card with up to two punches per column for an alphabetic character: the first punch in one of the top three rows (called the zone) and the second in one of the last nine rows (called the number). The zone could thus be considered a value from 0 to 3, and the number a value from 0 to 9, where zero means no punch and a non-zero value means that the corresponding row is punched. In the initial version of EBCDIC, codes were simply (15 − zone) × 16 + number, and only the lower-left 10 × 4 part of the table shown below was defined (the zone was apparently reversed so that at least the letters would be in alphabetical order).
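
The formula can be checked against the conventional card codes for letters and digits (A is punched 12-1, J is 11-1, S is 0-2, a digit is a single punch in its own row). The sketch assumes the "reversed" zone numbering no zone = 0, row 0 = 1, row 11 = 2, row 12 = 3, which is the numbering that makes the formula reproduce the letter and digit positions; it is only checked here against letters and digits, not against every card code. `initial_ebcdic_code` is a hypothetical name, and `cp037` is used only to show which character each code names.

```python
# Assumed zone numbering: no zone punch -> 0, row 0 -> 1, row 11 -> 2,
# row 12 -> 3 (the "reversed" order that puts A-I before J-R before S-Z).
ZONE_VALUE = {None: 0, "0": 1, "11": 2, "12": 3}


def initial_ebcdic_code(zone_row, number: int) -> int:
    """Compute (15 - zone) * 16 + number for a zone row and a number punch."""
    if not 0 <= number <= 9:
        raise ValueError("number punch must be 0-9")
    zone = ZONE_VALUE[zone_row]
    return (15 - zone) * 16 + number


for zone_row, number in [("12", 1), ("11", 1), ("0", 2), (None, 5)]:
    code = initial_ebcdic_code(zone_row, number)
    # Decoding with cp037 (assumed CCSID) shows which character the code names.
    print(zone_row, number, f"0x{code:02X}", bytes([code]).decode("cp037"))
```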

The Hollerith punched-card code was expanded to provide a unique card code for each byte value from 00 to FF. For the most part, this correspondence was defined by a relatively small set of encoding rules: no card code had more than one punch in rows 1 through 7, and any combination of punches in the other rows was valid. This scheme provides the 256 combinations needed for 8 bits, so any binary value (such as those found in a compiler "object" file) can be punched on cards without shifting to another mode. It also limited the number of punches in the central part of the card, making the card less likely to buckle and jam in a high-speed card reader. The "row-binary" and "column-binary" methods used with earlier IBM computers for binary punched cards did not have these properties. The logic circuitry needed to translate between EBCDIC and Extended Hollerith was relatively simple, which saved cost in the electronics of the card reader/punch control units. This would not have been the case had ASCII been used, at a time when computers of the early to mid-1960s were built from discrete components.
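
The 256 figure follows from the two rules: the five rows 12, 11, 0, 8 and 9 may be punched in any combination (2^5 = 32 ways), and rows 1 to 7 contribute either no punch or exactly one of seven (8 ways), giving 32 × 8 = 256. The sketch enumerates the punch patterns allowed by exactly those rules; it does not reproduce IBM's actual assignment of patterns to byte values, and the names are illustrative.

```python
from itertools import combinations

FREE_ROWS = ("12", "11", "0", "8", "9")             # any combination allowed
SINGLE_ROWS = ("1", "2", "3", "4", "5", "6", "7")   # at most one punch


def extended_hollerith_patterns() -> list[frozenset[str]]:
    """Enumerate every punch pattern allowed by the two rules above."""
    patterns = []
    for k in range(len(FREE_ROWS) + 1):
        for free in combinations(FREE_ROWS, k):
            for single in (None,) + SINGLE_ROWS:
                extra = {single} if single is not None else set()
                patterns.append(frozenset(free) | extra)
    return patterns


patterns = extended_hollerith_patterns()
print(len(patterns), len(set(patterns)))  # 256 256
```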

The first 64 code points (00–3F) are control characters, 33 of which have ASCII equivalents. One notable difference between the two sets concerns line endings: ASCII text files generally use the carriage return (CR) and line feed (LF) codes as end-of-line indicators, whereas EBCDIC, which also has CR and LF, adds newline (NL) and reverse newline (RNL) codes. The remaining 31 control codes are used for various terminal and device controls, mostly specific to IBM hardware.
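
Because EBCDIC text may end its lines with NL (0x15) rather than LF (0x25), a reader has to decide which byte it treats as the line terminator. A sketch that accepts either, assuming CCSID 037 via Python's `cp037` codec; `ebcdic_lines` is a hypothetical helper.

```python
CODEPAGE = "cp037"  # assumption: CCSID 037


def ebcdic_lines(data: bytes) -> list[str]:
    """Decode EBCDIC text and split it into lines on NL (0x15) or LF (0x25).

    Python's cp037 codec decodes NL to U+0085 (NEXT LINE) and LF to
    U+000A, so NL is mapped to LF before splitting.
    """
    text = data.decode(CODEPAGE).replace("\x85", "\n")
    return text.split("\n")


print(ebcdic_lines(b"\xc1\xc2\x15\xc3\xc4\x25\xc5"))  # ['AB', 'CD', 'E']
```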

There are a number of different versions of EBCDIC, customized for different countries. Some East Asian countries use a double-byte extension of EBCDIC on their mainframes to display Chinese, Japanese and Korean scripts. In the double-byte extension, the shift codes 0x0E (shift out, SO) and 0x0F (shift in, SI) switch between single-byte and double-byte modes.
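
A reader of such mixed data has to track the current mode. A minimal sketch, assuming SO switches into double-byte mode and SI back to single-byte mode; `split_mixed` is a hypothetical helper that only separates the runs and does not decode the double-byte characters, and the sample bytes are placeholders rather than real DBCS characters.

```python
SO, SI = 0x0E, 0x0F  # shift out (to double-byte mode), shift in (to single-byte)


def split_mixed(data: bytes) -> list[tuple[str, bytes]]:
    """Split a mixed single/double-byte stream into (mode, bytes) runs.

    Hypothetical helper: it tracks only the SO/SI mode switches and does
    not decode the double-byte characters themselves.
    """
    runs: list[tuple[str, bytes]] = []
    mode, current = "single", bytearray()
    for b in data:
        if b in (SO, SI):
            if current:  # close the run that was in progress
                runs.append((mode, bytes(current)))
                current = bytearray()
            mode = "double" if b == SO else "single"
        else:
            current.append(b)
    if current:
        runs.append((mode, bytes(current)))
    # Every double-byte run must consist of whole two-byte characters.
    for run_mode, chunk in runs:
        if run_mode == "double" and len(chunk) % 2:
            raise ValueError("double-byte run has an odd number of bytes")
    return runs


# 0x45 0x41 0x45 0x42 are placeholder bytes standing in for two DBCS characters.
print(split_mixed(b"\xc1\x0e\x45\x41\x45\x42\x0f\xc2"))
```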

IBM identifies each of its code pages by a number called a CCSID (Coded Character Set Identifier). Note that even the same CCSID can have different character positions in practice: for example, the newline character can have a different byte value under z/OS UNIX System Services than under other EBCDIC-based operating systems. This becomes an issue when EBCDIC text data is transferred between machines.
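
Different CCSIDs differ only at some code points, which is why text that looks correct on one system can show the wrong punctuation on another. A sketch comparing two variants that Python ships as standard codecs, `cp037` (CCSID 037) and `cp500` (CCSID 500): it lists every byte value the two map to different characters. Variable names are illustrative.

```python
ALL_BYTES = bytes(range(256))
ccsid_037 = ALL_BYTES.decode("cp037")
ccsid_500 = ALL_BYTES.decode("cp500")

# Byte values that the two code pages assign to different characters.
differences = [(b, ccsid_037[b], ccsid_500[b])
               for b in range(256) if ccsid_037[b] != ccsid_500[b]]

for b, c037, c500 in differences:
    print(f"0x{b:02X}: CCSID 037 {c037!r:>6}  CCSID 500 {c500!r:>6}")

# The same character therefore encodes to different bytes.
print("!".encode("cp037").hex(), "!".encode("cp500").hex())  # 5a 4f
```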

Code page layout

The table below is derived from CCSID 037, one of the code page variants of EBCDIC, and shows only the basic (English) EBCDIC characters. Characters 00–3F and FF are controls, 40 is space, 41 is no-break space, and CA is soft hyphen. Each character is shown with its equivalent Unicode code point on the line beneath it. The invariant alphanumeric, punctuation, and control characters are common to all EBCDIC code pages; unassigned codes are typically filled with international or region-specific characters in the various EBCDIC code page variants.

EBCDIC
      −0      −1      −2      −3      −4      −5      −6      −7      −8      −9      −A      −B      −C      −D      −E      −F
0−    NUL     SOH     STX     ETX     SEL     HT      RNL     DEL     GE      SPS     RPT     VT      FF      CR      SO      SI
      0000    0001    0002    0003            0009            007F                            000B    000C    000D    000E    000F
1−    DLE     DC1     DC2     DC3     RES/ENP NL      BS      POC     CAN     EM      UBS     CU1     IFS     IGS     IRS     IUS/ITB
      0010    0011    0012    0013                    0008            0018    0019                    001C    001D    001E    001F
2−    DS      SOS     FS      WUS     BYP/INP LF      ETB     ESC     SA      SFE     SM/SW   CSP     MFA     ENQ     ACK     BEL
                                              000A    0017    001B                                            0005    0006    0007
3−                    SYN     IR      PP      TRN     NBS     EOT     SBS     IT      RFF     CU3     DC4     NAK             SUB
                      0016                                    0004                                    0014    0015            001A
4−    SP      RSP                                                                             .       <       (       +       |
      0020    00A0                                                                            002E    003C    0028    002B    007C
5−    &                                                                               !       $       *       )       ;       ¬
      0026                                                                            0021    0024    002A    0029    003B    00AC
6−    -       /                                                                       ¦       ,       %       _       >       ?
      002D    002F                                                                    00A6    002C    0025    005F    003E    003F
7−                                                                            `       :       #       @       '       =       "
                                                                              0060    003A    0023    0040    0027    003D    0022
8−            a       b       c       d       e       f       g       h       i                                               ±
              0061    0062    0063    0064    0065    0066    0067    0068    0069                                            00B1
9−            j       k       l       m       n       o       p       q       r
              006A    006B    006C    006D    006E    006F    0070    0071    0072
A−            ~       s       t       u       v       w       x       y       z
              007E    0073    0074    0075    0076    0077    0078    0079    007A
B−    ^                                                                               [       ]
      005E                                                                            005B    005D
C−    {       A       B       C       D       E       F       G       H       I       SHY
      007B    0041    0042    0043    0044    0045    0046    0047    0048    0049    00AD
D−    }       J       K       L       M       N       O       P       Q       R
      007D    004A    004B    004C    004D    004E    004F    0050    0051    0052
E−    \               S       T       U       V       W       X       Y       Z
      005C            0053    0054    0055    0056    0057    0058    0059    005A
F−    0       1       2       3       4       5       6       7       8       9                                               EO
      0030    0031    0032    0033    0034    0035    0036    0037    0038    0039

Criticism and humor

Open source software advocate and hacker Eric S. Raymond writes in his Jargon File that EBCDIC was almost universally loathed by early hackers and programmers because of its many versions, none of which resembled the others, and that IBM produced it in direct competition with the already established ASCII.

Version 4.4.7 of the Jargon File gives the following definition:

EBCDIC: /eb´s@·dik/, /eb´see`dik/, /eb´k@·dik/, n.

[abbreviation, Extended Binary Coded Decimal Interchange Code] An alleged character set used on IBM dinosaurs. It exists in at least six mutually incompatible versions, all featuring such delights as non-contiguous letter sequences and the absence of several ASCII punctuation characters fairly important for modern computer languages (exactly which characters are absent varies according to which version of EBCDIC you're looking at). IBM adapted EBCDIC from punched card code in the early 1960s and promulgated it as a customer-control tactic (see connector conspiracy), spurning the already established ASCII standard. Today, IBM claims to be an open-systems company, but IBM's own description of the EBCDIC variants and how to convert between them is still internally classified top-secret, burn-before-reading. Hackers blanch at the very name of EBCDIC and consider it a manifestation of purest evil.

 

Another popular complaint is that the EBCDIC alphabetic characters follow an archaic punched-card encoding rather than a linear ordering like ASCII. As a result, incrementing the character code for "I" does not produce the code for "J", and there is likewise a gap between the codes for "R" and "S". Programming a simple loop that steps through character codes to visit only the alphabetic characters is therefore problematic.
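
The gaps are easy to demonstrate. A sketch assuming CCSID 037 (Python's `cp037` codec): a loop over every code from A (0xC1) to Z (0xE9), as one might write for ASCII, visits 41 byte values, only 26 of which are letters; filtering by explicit membership in A to Z recovers the alphabet. Variable names are illustrative.

```python
CODEPAGE = "cp037"  # assumption: CCSID 037
UPPERCASE = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

first = "A".encode(CODEPAGE)[0]  # 0xC1
last = "Z".encode(CODEPAGE)[0]   # 0xE9

# Naive ASCII-style loop: every code from A to Z inclusive.
naive = bytes(range(first, last + 1)).decode(CODEPAGE)

# Keep only the 26 basic capital letters; the gaps hold other characters.
letters = "".join(ch for ch in naive if ch in UPPERCASE)

print(len(naive))            # 41
print(letters == UPPERCASE)  # True
print("I".encode(CODEPAGE)[0] + 1 == "J".encode(CODEPAGE)[0])  # False
```

A more robust habit is to iterate over the characters themselves and encode each one, rather than over a range of byte values.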

These incompatibilities were also the source of many jokes. A popular one went:

Professor: So the American government went to IBM to come up with a data encryption standard, and they came up with—
Student: EBCDIC!
