Digital computers need to represent things (numerical, graphical
or textual) so that they can store and manipulate them. All digital computers
use minute switches (called transistors) to do this - each switch can
either be ON (1) or OFF (0), and each such on/off value is called a bit.
Using bits (or 1's and 0's) requires some conventions and a little
bit of mathematical 'magic'.
In this section we investigate numeric representation/symbolism (numbers)
on a digital computer. This will help in our understanding of some low-level
computer processes.
Please note: the caret (^) in this section means RAISED TO THE POWER OF.
HINDU-ARABIC (Base 10) - decimal
In Decimal there are 10 unique symbols (0123456789) that can be arranged
using a particular order to represent quantities.
Order is important (eg. 123 is not the same as 312, even though the same digits
have been used) - this order is called PLACE VALUE, and the order of
the digits greatly affects the value of the number.
Base 10 or decimal numbers are organised into columns that are powers
of 10:
...  10^4   10^3   10^2   10^1   10^0   .  10^-1   10^-2   10^-3   ...
     10000  1000   100    10     1      .  1/10    1/100   1/1000
256.3 = 2x10^2 + 5x10^1 + 6x10^0 + 3x10^-1 (this is sometimes referred
to as expanded notation)
Hindu-Arabic has some advantages for us HUMANS
- This number system uses quite large, conveniently sized groups, which
all relate back to the number of fingers (or toes) we have.
- We are well used to this system as we learn it nearly from birth.
- Groups of 10 are convenient for processing numbers of any size,
and a whole series of terms have evolved around these groups - decimal
point, ten, hundred, thousand etc.
Hindu-Arabic is totally inconvenient for digital computers, yet they
appear to be able to use this system with ease - how do they do it?
The answer to this question will come in a few steps...
BINARY (Base 2)
All information dealt with by a digital computer is stored (and often
processed) in BINARY.
A Digital Computer is a SWITCHING computer - it is made up of billions
of transistors, each of which acts as a simple two-state switch
(2 state = on/off).
In contrast to the familiar Decimal, BINARY uses ONLY 2 unique symbols
(0 and 1) arranged using place value to represent any numeric quantity.
How is this possible?
Using a familiar model, the columns (or place values) are powers of 2 (instead
of the familiar 10):
...  2^6  2^5  2^4  2^3  2^2  2^1  2^0  .  2^-1  2^-2  2^-3  ...
     64   32   16   8    4    2    1    .  1/2   1/4   1/8
The order of the digits (or bits, as they are called)
and the column they occupy determine their value. This means there
is a place value present in this system, but the places are different
from what you are used to.
It may surprise you that using this scheme, it is possible
to represent any whole number exactly, and to approximate any
fractional or irrational number as closely as needed. It just looks weird.
The following sequence represents the first 20 counting numbers (ie.
1..20) in binary
DEC  BINARY
  1       1
  2      10
  3      11
  4     100
  5     101
  6     110
  7     111
  8    1000
  9    1001
 10    1010
 11    1011
 12    1100
 13    1101
 14    1110
 15    1111
 16   10000
 17   10001
 18   10010
 19   10011
 20   10100
In the above sequence there is a definite pattern - can you spot it?
The biggest digit that can occur in any column is a '1' - to go one
more, we reset that column to '0' and carry the '1' to the next
column on the left.
You have been doing this successfully with decimal numbers most
of your life; the difference here is that the group size is different
(instead of '9' indicating that a column is full, '1' does the job).
1101 base2 = 1x2^3 + 1x2^2 + 0x2^1 + 1x2^0
= 8 + 4 + 1
= 13 base10
42 base10 = 1x2^5 + 0x2^4 + 1x2^3 + 0x2^2 + 1x2^1 + 0x2^0
= 101010 base2
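The two worked examples above can be checked with a short sketch. Python is used here purely for illustration (these notes do not assume any particular language), and the function names are our own. Binary-to-decimal sums each bit times its power of 2, exactly as in the 1101 example; for decimal-to-binary the sketch uses repeated division by 2, a common alternative to picking out the powers of 2 by hand. It handles whole numbers from 0 upwards only.

```python
def binary_to_decimal(bits: str) -> int:
    """Sum each bit times its place value (a power of 2)."""
    total = 0
    for position, bit in enumerate(reversed(bits)):
        # position 0 is the rightmost column (2^0)
        total += int(bit) * 2 ** position
    return total


def decimal_to_binary(n: int) -> str:
    """Repeatedly divide by 2; the remainders are the bits, LSB first."""
    if n == 0:
        return "0"
    bits = []
    while n > 0:
        n, remainder = divmod(n, 2)
        bits.append(str(remainder))
    # remainders come out least significant first, so reverse them
    return "".join(reversed(bits))


print(binary_to_decimal("1101"))  # 13
print(decimal_to_binary(42))      # 101010
```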
Although this grouping system appears clumsy to our Base 10 biased
minds, it is simple enough to be used with great rapidity
inside every digital device. Since 10 is NOT a power of 2, digital computers
have to work quite hard to display answers in decimal, when all operations
actually happen in binary on the processor chip.
You will notice that even quite small quantities in binary occupy MANY
digits - this is a product of the small group size.
Fractional quantities can also be represented using this base, but
many fractions (like 1/3, and even 1/10) have binary expansions that
repeat forever. They can only be stored approximately (in a fairly 'lossy'
way), and may therefore be imprecise in certain applications.
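The lossy behaviour can be seen with a sketch that expands a fraction into binary by repeated doubling: each doubling moves the next bit into the units column. Python's standard `fractions.Fraction` type keeps the arithmetic exact; the function name and the 12-bit cut-off are arbitrary choices for illustration. A fraction such as 3/8 ends after a few bits, while 1/3 and 1/10 repeat forever.

```python
from fractions import Fraction


def binary_fraction(value: Fraction, max_bits: int = 12) -> str:
    """Expand a fraction between 0 and 1 into binary by repeated doubling.

    Stops when the remainder is exactly zero, or after max_bits bits
    (a trailing "..." marks an expansion that was cut short).
    """
    bits = []
    while value != 0 and len(bits) < max_bits:
        value *= 2
        if value >= 1:
            bits.append("1")   # the doubled value reached the units column
            value -= 1
        else:
            bits.append("0")
    suffix = "..." if value != 0 else ""
    return "0." + ("".join(bits) or "0") + suffix


print(binary_fraction(Fraction(3, 8)))   # 0.011 (exact)
print(binary_fraction(Fraction(1, 3)))   # 0.010101010101...
print(binary_fraction(Fraction(1, 10)))  # 0.000110011001...
```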
OCTAL (Base 8)
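(for historical reference only !!! This will NOT be tested) Octal was
a common number system on early digital computers, especially those whose
word sizes were multiples of 3 bits (such as 12, 18 or 36 bits), since each
octal digit stands for exactly 3 bits.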
In Octal there are 8 unique symbols (01234567) arranged using place
value to represent any numeric quantity. The column system comprises
powers of 8. The order of the digits determines their place value.
...  8^6     8^5     8^4     8^3     8^2     8^1     8^0     .  8^-1    8^-2    8^-3    ...
     262144  32768   4096    512     64      8       1       .  1/8     1/64    1/512
105 base8 = 1x8^2+0x8^1+5x8^0
= 64 + 0 + 5
= 69 base10
4178 base10 = 1x8^4+0x8^3+1x8^2+2x8^1+2x8^0
= 10122 base8
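Both octal examples can be checked with a general-purpose sketch that works for any base from 2 to 16. It uses the same repeated-division idea as for binary, with the base as a parameter; `from_base` multiplies the running total by the base before adding each new digit (Horner's rule), which gives the same answer as summing each digit times its power of 8. The names `to_base`, `from_base` and `DIGITS` are hypothetical.

```python
DIGITS = "0123456789ABCDEF"


def to_base(n: int, base: int) -> str:
    """Convert a non-negative integer to a digit string in base 2..16."""
    if not 2 <= base <= 16:
        raise ValueError("base must be between 2 and 16")
    if n < 0:
        raise ValueError("this sketch handles non-negative numbers only")
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)
        digits.append(DIGITS[remainder])
    return "".join(reversed(digits))


def from_base(text: str, base: int) -> int:
    """Evaluate a digit string in the given base using place value."""
    value = 0
    for ch in text.upper():
        digit = DIGITS.index(ch)   # raises ValueError for unknown characters
        if digit >= base:
            raise ValueError(f"digit {ch!r} is not valid in base {base}")
        # shift everything one column left, then add the new digit
        value = value * base + digit
    return value


print(from_base("105", 8))  # 69
print(to_base(4178, 8))     # 10122
```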
Since we currently work with 32bit machines, and 64bit machines are becoming
more common (neither 32 nor 64 divides evenly into the 3 bit groups that
octal digits represent), we will not dwell on this number system.
HEXADECIMAL (Base 16)
Since large quantities are very cumbersome in binary, a number system
was devised with a relatively large group size.
Hexadecimal (or Hex) uses a group size of 16. In Hex there are 16 unique
symbols (0123456789ABCDEF) arranged using place value to represent any
numeric quantity.
Since we ran out of conventional mathematical symbols for quantities
at '9', some wise(?) computer scientists decided to 'recycle' letters
from the alphabet for the 6 additional symbolic quantities. The columns
in Hex are powers of 16:
...  16^4   16^3   16^2   16^1   16^0   .  16^-1   16^-2   16^-3   ...
     65536  4096   256    16     1      .  1/16    1/256   1/4096
The largest symbol that can occupy a column is 'F' (which stands for
15 in decimal)
2A base16 = 2x16^1 + Ax16^0
= 32 + 10
= 42 base10
1993 base10 = 7x16^2 + Cx16^1 + 9x16^0
= 7C9 base16
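In practice, most programming languages already provide these conversions. In Python, for example, the built-in `int()` accepts the base as a second argument, and `format()` with the `"X"`, `"o"` and `"b"` format specifications writes a number in upper-case hex, octal or binary. The variable names below are only for illustration.

```python
from_hex = int("2A", 16)          # 2x16 + 10 = 42
to_hex = format(1993, "X")        # "7C9"
to_octal = format(4178, "o")      # "10122"
to_binary = format(42, "b")       # "101010"

print(from_hex, to_hex, to_octal, to_binary)   # 42 7C9 10122 101010
```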
This large group size makes this number system convenient for representing
LARGE numbers which are commonly dealt with in digital computers.
Indeed Hexadecimal is used to represent memory addresses that routinely
range in the millions. An added bonus of the Hexadecimal system stems
from the fact that 16 is a power of 2 (16 = 2^4): each hex digit stands for
exactly 4 bits, so binary-hex conversions are VERY fast.
This number system 'feels' most alien to our base 10 biased minds
- seeing LETTERS and numbers mixed up to represent mathematical quantities
is confusing at first.
Hex presents itself in the 'oddest' of places - colours on web pages,
for example, are often defined in Hex, such as CC3399
(meaning CC of RED = 204 RED; 33 of GREEN
= 51 GREEN; 99 of BLUE = 153
BLUE). By using a Hex pair (one byte, 0..255) for each of the three additive
primary colours (red, green and blue), it is possible to specify
256 x 256 x 256 = 16,777,216 (over 16.7 million) different
colour variations (thankfully it is rare to see them all on the one
page :).
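Splitting a colour code such as CC3399 into its three bytes is a direct application of hex pairs. This sketch assumes a 6-digit code, optionally preceded by `#` as in HTML/CSS; `parse_hex_colour` is a hypothetical name.

```python
def parse_hex_colour(code: str) -> tuple[int, int, int]:
    """Split a 6-digit hex colour into its red, green and blue bytes."""
    code = code.lstrip("#")
    if len(code) != 6:
        raise ValueError("expected exactly 6 hex digits")
    # each pair of hex digits is one byte (0..255) for one primary colour
    red, green, blue = (int(code[i:i + 2], 16) for i in (0, 2, 4))
    return red, green, blue


print(parse_hex_colour("CC3399"))  # (204, 51, 153)
print(256 ** 3)                    # 16777216 possible colours
```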
We will learn more about colour models and the
computer monitor later.
Conversions between bases
Because bases 16 and 8 are both powers of 2, there are some convenient
methods of converting between them and binary:
16 ---> 2
- convert each Hex digit into a 4 bit binary number
- join all of the 4 bit groups together in the order they occur and
you are finished
2 ---> 16
- starting from the RIGHT, separate the bitstream into 4 bit groups
(pad the leftmost group with leading zeros if it has fewer than 4 bits)
- convert each 4 bit group into a Hex digit
- join up the hex digits in the order they were generated and you
are finished
Octal conversions work the same way, using 3 bit groups instead of 4.
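The two hex/binary recipes translate almost line for line into code. In this sketch (function names are hypothetical), the leftmost group is padded with zeros so every group has exactly 4 bits.

```python
HEX_DIGITS = "0123456789ABCDEF"


def hex_to_binary(hex_string: str) -> str:
    """Replace each hex digit with its 4-bit binary pattern, in order."""
    return "".join(format(HEX_DIGITS.index(d), "04b") for d in hex_string.upper())


def binary_to_hex(bits: str) -> str:
    """Group bits in fours from the RIGHT, then convert each group."""
    # pad on the left so the length is a multiple of 4
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "".join(HEX_DIGITS[int(group, 2)] for group in groups)


print(hex_to_binary("7C9"))     # 011111001001
print(binary_to_hex("101010"))  # 2A
```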
Information Basics
In these notes, we will concentrate on IBM/ISA PC hardware and associated
architectures. There are many different hardware specifications, each
with their own collection of standards.
A computer is an INFORMATION PROCESSOR - that is, it spends most of
its time moving information (stored in binary, and often written in Hex
for convenience) around.
A Programmer recognises the required STEPS by which a task is to be
completed - this is termed an ALGORITHM.
The ALGORITHM is encoded (or made computer friendly) by writing/converting
it into a computer language - the result is termed a PROGRAM.
PROGRAMS can be written in low level languages (like ASSEMBLER), but
are more commonly written in a relatively high level language (C, COBOL,
FORTRAN, Pascal ...).
All programs, however, must ultimately be translated into BINARY machine
code (often by way of ASSEMBLER) before the computer can begin processing
them at chip level.
Basic Information Unit = BIT (Binary digIT) - 2 states: ON/OFF
BITs are usually grouped into BYTES (8 CONTIGUOUS BITS = 1 BYTE)
CONTIGUOUS BYTES are often grouped into WORDS (word sizes of 8,
16, 32, 36, 64 and 128 BITS have all been used)
Computer memory used to be measured in KILOBYTES (1 KB
= 1024 bytes = a 'long' 1000)
These days, MEGABYTES (1 MB = 1024 KB = 1048576 bytes),
GIGABYTES (1 GB = 1024 MB = 1073741824 bytes), TERABYTES
(1 TB = 1024 GB) and PETABYTES (1 PB = 1024 TB)
are commonly used as measures.
This is, unfortunately, further confused by the fact that a KILOBYTE
is not always 1024 bytes - in Data Communications,
the standard metric multiples (1000, 1000000 etc.) are used with
the same names.
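The difference between the two conventions grows with each prefix. This sketch simply computes the binary multiples and compares them with the metric ones; the constant names are our own.

```python
# Binary (memory) multiples: each step up is another factor of 2^10 = 1024
KILOBYTE = 2 ** 10       # 1024 bytes
MEGABYTE = 2 ** 20       # 1048576 bytes
GIGABYTE = 2 ** 30       # 1073741824 bytes
TERABYTE = 2 ** 40
PETABYTE = 2 ** 50

# Compare with the metric (data communications) multiples of 1000
for name, binary, power in (("kilo", KILOBYTE, 1), ("mega", MEGABYTE, 2),
                            ("giga", GIGABYTE, 3), ("tera", TERABYTE, 4),
                            ("peta", PETABYTE, 5)):
    metric = 1000 ** power
    extra = (binary - metric) / metric * 100
    print(f"{name}: {binary} vs {metric} bytes ({extra:.1f}% more)")
```

The binary kilobyte is about 2.4% larger than the metric one, and the binary gigabyte about 7.4% larger.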
A byte = 8 bits. From left to right (MSB to LSB), their values are
as follows:
128  64  32  16   8   4   2   1
MSB                         LSB
Within a byte, there are 2^8 = 256 different patterns
of 1's and 0's if all 8 bits are used.
0000 0000 = 0
0000 0001 = 1
0000 0010 = 2
: :
1111 1101 = 253
1111 1110 = 254
1111 1111 = 255
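The rightmost (lowest value) bit is called the Least Significant Bit (LSB),
and the leftmost bit is termed, appropriately, the Most Significant
Bit (MSB). These terms are somewhat misleading, as they suggest that an
error in the lower end of a byte is less of a problem than one in the
higher end - this is far from the truth. In a character code or a machine
instruction, for example, changing any single bit (including the LSB)
produces a completely different symbol or instruction.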
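The column values, the number of patterns and the effect of a single-bit error can all be demonstrated in a few lines. The bit-error part borrows the ASCII code for 'A' (65), covered later in these notes, and uses XOR with a mask to flip exactly one bit; the variable names are illustrative.

```python
# Place values of the 8 bits in a byte, from the MSB (bit 7) to the LSB (bit 0)
weights = [2 ** bit for bit in range(7, -1, -1)]
patterns = 2 ** 8          # number of distinct 8-bit patterns
largest = sum(weights)     # 1111 1111

print(weights)    # [128, 64, 32, 16, 8, 4, 2, 1]
print(patterns)   # 256
print(largest)    # 255

# XOR with a mask flips exactly one bit. Using the ASCII code for 'A' (65):
letter = ord("A")                        # 0100 0001
lsb_flipped = chr(letter ^ 0b00000001)   # 0100 0000 = 64 -> "@"
bit5_flipped = chr(letter ^ 0b00100000)  # 0110 0001 = 97 -> "a"
print(lsb_flipped, bit5_flipped)         # @ a
```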
The Baudot Code Set
Early attempts to use binary patterns to represent numbers and letters
led to many conflicting conventions.
The following chart depicts the Baudot Code Set. Baudot codes were
5 bit representations of characters and other symbols, and were commonly
used in the days when input to a computer was via punched paper tape.
You would NOT be expected to learn these codes, they are included to
give you an appreciation of schemes for representation that were tried.
The leftmost bit is the Most Significant Bit (MSB), transmitted last.
The rightmost bit is the Least Significant Bit (LSB), transmitted first.
The associated LETTERS and FIGURES (case) characters are also listed,
along with the hexadecimal representation of the character.
BITS LTRS FIGS HEX
----- ---- ---- ---
00011 A - 03
11001 B ? 19
01110 C : 0E
01001 D $ 09
00001 E 3 01
01101 F ! 0D
11010 G & 1A
10100 H STOP 14
00110 I 8 06
01011 J ' 0B
01111 K ( 0F
10010 L ) 12
11100 M . 1C
01100 N , 0C
11000 O 9 18
10110 P 0 16
10111 Q 1 17
01010 R 4 0A
00101 S BELL 05
10000 T 5 10
00111 U 7 07
11110 V ; 1E
10011 W 2 13
11101 X / 1D
10101 Y 6 15
10001 Z " 11
00000 n/a n/a 00
01000 CR CR 08
00010 LF LF 02
00100 SP SP 04
11111 LTRS LTRS 1F
11011 FIGS FIGS 1B
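It can be noted that a particular code could well stand for more than
one thing, necessitating an extra transmission (a LTRS or FIGS code) to tell
the receiving device whether what followed was a letter or a figure. The shift
stayed in effect until the next shift code, so a run of letters or figures
needed only one shift, but any symbol that had to be preceded by a shift
effectively cost 10 bits.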
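The shift mechanism can be sketched as a small decoder that remembers whether it is in LETTERS or FIGURES mode. Only a handful of codes from the table are included, and the decoder assumes it starts in LETTERS mode; the names `BAUDOT` and `decode_baudot` are hypothetical. Each code is written as a 5-bit number in the same MSB-first order as the table (the LSB-first transmission order is not modelled).

```python
# A small subset of the Baudot table: code -> (LTRS meaning, FIGS meaning)
BAUDOT = {
    0b00011: ("A", "-"),
    0b00001: ("E", "3"),
    0b01010: ("R", "4"),
    0b10000: ("T", "5"),
    0b10011: ("W", "2"),
    0b00100: (" ", " "),
}
LTRS, FIGS = 0b11111, 0b11011   # the two shift codes


def decode_baudot(codes: list[int]) -> str:
    """Decode 5-bit codes, tracking the current LTRS/FIGS shift state."""
    in_figures = False            # assume the receiver starts in LETTERS
    out = []
    for code in codes:
        if code == LTRS:
            in_figures = False    # shift codes change state but print nothing
        elif code == FIGS:
            in_figures = True
        else:
            letter, figure = BAUDOT[code]
            out.append(figure if in_figures else letter)
    return "".join(out)


# W E space A R E space FIGS W E
message = [0b10011, 0b00001, 0b00100, 0b00011, 0b01010, 0b00001,
           0b00100, FIGS, 0b10011, 0b00001]
print(decode_baudot(message))   # WE ARE 23
```

A single FIGS code is enough for both digits of "23": the receiver stays in FIGURES mode until it sees the next LTRS code.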
The information on The Baudot Set, and tables of other collating sequences,
were copied, with thanks, from A Brief History of Data Communications at
www.tbu.net/~jhall/history1.html.
CHARACTERS
Symbols we can generate using a keyboard include 52 letters of the alphabet
(26 each in upper and lower case), 10 digits and many other keyboard symbols.
This basic character set usually totals fewer than 128 different
characters. It is common to encode each of these characters as a number that
can be represented by a digital computer.
By using bits 0..6 (ie. 7 bits), it is possible to represent 2^7 = 128
different patterns. ASCII (the American Standard Code
for Information Interchange) was one such scheme, and
is probably the most widespread, though not necessarily the most sensible.
7-bit ASCII (space parity: the 8th bit is always 0)
------------------------------------------------------------------------------
CHAR HEX OCTAL BINARY DEC CHAR HEX OCTAL BINARY DEC
------------------------------------------------------------------------------
A 41 101 01000001 065 P 50 120 01010000 080
B 42 102 01000010 066 Q 51 121 01010001 081
C 43 103 01000011 067 R 52 122 01010010 082
D 44 104 01000100 068 S 53 123 01010011 083
E 45 105 01000101 069 T 54 124 01010100 084
F 46 106 01000110 070 U 55 125 01010101 085
G 47 107 01000111 071 V 56 126 01010110 086
H 48 110 01001000 072 W 57 127 01010111 087
I 49 111 01001001 073 X 58 130 01011000 088
J 4A 112 01001010 074 Y 59 131 01011001 089
K 4B 113 01001011 075 Z 5A 132 01011010 090
L 4C 114 01001100 076
M 4D 115 01001101 077
N 4E 116 01001110 078
O 4F 117 01001111 079
------------------------------------------------------------------------------
a 61 141 01100001 097 p 70 160 01110000 112
b 62 142 01100010 098 q 71 161 01110001 113
c 63 143 01100011 099 r 72 162 01110010 114
d 64 144 01100100 100 s 73 163 01110011 115
e 65 145 01100101 101 t 74 164 01110100 116
f 66 146 01100110 102 u 75 165 01110101 117
g 67 147 01100111 103 v 76 166 01110110 118
h 68 150 01101000 104 w 77 167 01110111 119
i 69 151 01101001 105 x 78 170 01111000 120
j 6A 152 01101010 106 y 79 171 01111001 121
k 6B 153 01101011 107 z 7A 172 01111010 122
l 6C 154 01101100 108
m 6D 155 01101101 109
n 6E 156 01101110 110
o 6F 157 01101111 111
------------------------------------------------------------------------------
0 30 060 00110000 048 % 25 045 00100101 037
1 31 061 00110001 049 & 26 046 00100110 038
2 32 062 00110010 050 ' 27 047 00100111 039
3 33 063 00110011 051 ( 28 050 00101000 040
4 34 064 00110100 052 ) 29 051 00101001 041
5 35 065 00110101 053 * 2A 052 00101010 042
6 36 066 00110110 054 + 2B 053 00101011 043
7 37 067 00110111 055 , 2C 054 00101100 044
8 38 070 00111000 056 - 2D 055 00101101 045
9 39 071 00111001 057 . 2E 056 00101110 046
SP 20 040 00100000 032 / 2F 057 00101111 047
! 21 041 00100001 033 : 3A 072 00111010 058
" 22 042 00100010 034 ; 3B 073 00111011 059
# 23 043 00100011 035 < 3C 074 00111100 060
$ 24 044 00100100 036 = 3D 075 00111101 061
> 3E 076 00111110 062 STX 02 002 00000010 002
? 3F 077 00111111 063 ETX 03 003 00000011 003
------------------------------------------------------------------------------
CHAR HEX OCTAL BINARY DEC CHAR HEX OCTAL BINARY DEC
------------------------------------------------------------------------------
@ 40 100 01000000 064 EOT 04 004 00000100 004
[ 5B 133 01011011 091 ENQ 05 005 00000101 005
\ 5C 134 01011100 092 ACK 06 006 00000110 006
] 5D 135 01011101 093 BEL 07 007 00000111 007
^ 5E 136 01011110 094 BS 08 010 00001000 008
_ 5F 137 01011111 095 HT 09 011 00001001 009
{ 7B 173 01111011 123 LF 0A 012 00001010 010
| 7C 174 01111100 124 VT 0B 013 00001011 011
} 7D 175 01111101 125 FF 0C 014 00001100 012
~ 7E 176 01111110 126 CR 0D 015 00001101 013
DEL 7F 177 01111111 127 SO 0E 016 00001110 014
NUL 00 000 00000000 000 SI 0F 017 00001111 015
SOH 01 001 00000001 001 DLE 10 020 00010000 016
------------------------------------------------------------------------------
DC1 11 021 00010001 017
DC2 12 022 00010010 018
DC3 13 023 00010011 019
DC4 14 024 00010100 020
NAK 15 025 00010101 021
SYN 16 026 00010110 022
ETB 17 027 00010111 023
CAN 18 030 00011000 024
EM 19 031 00011001 025
SUB 1A 032 00011010 026
ESC 1B 033 00011011 027
FS 1C 034 00011100 028
GS 1D 035 00011101 029
RS 1E 036 00011110 030
US 1F 037 00011111 031
-----------------------------------------
ASCII is one of many collating sequences used for coding characters
into numerical equivalents (others include CDC and IBM's EBCDIC). Each
of the 128 ASCII codes stands for a character: 95 are 'printable'
(including the space), and the remaining 33 are control codes designed to
cause either the computer or another connected device to 'do something'.
'A' = 65, 'B' = 66, 'C' = 67....
'a' = 97, 'b' = 98, 'c' = 99....
'0' = 48, '1' = 49, '2' = 50...
bel = 07, cr = 13, lf = 10..
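The regular layout of the ASCII table turns some conversions into simple arithmetic: upper and lower case letters differ by exactly 32 (a single bit), and each digit character's code is 48 more than its value. Python's built-in `ord()` and `chr()` convert between characters and codes; `to_lower` and `digit_value` are hypothetical helper names.

```python
def to_lower(ch: str) -> str:
    """Lower-case an ASCII capital by setting bit 5 (adding 32); leave others alone."""
    if "A" <= ch <= "Z":
        return chr(ord(ch) | 0b00100000)
    return ch


def digit_value(ch: str) -> int:
    """Turn an ASCII digit character into its numeric value ('0' is code 48)."""
    if not "0" <= ch <= "9":
        raise ValueError(f"{ch!r} is not an ASCII digit")
    return ord(ch) - ord("0")


print(ord("A"), ord("a"), ord("0"), ord("\r"))       # 65 97 48 13
print("".join(to_lower(c) for c in "HELLO, World"))  # hello, world
print(digit_value("7") * 6)                          # 42
```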
Only 7 bits are used for coding standard ASCII.
If all 8 bits are used, the result is EXTENDED ASCII, a set of 256 characters
that includes accented letters and 'graphics characters' (eg. ò è È ý ...).
There are several different extended sets, but they all keep the standard
printable characters in the same places, thankfully.
NUMBERS
Mathematical quantities present their own problems, and differing schemes
are used to represent them:
BCD (Binary Coded Decimal) - where each digit in a decimal
number is converted to 4 bit binary
eg. 38 base10 = 0011 1000 in BCD
advantages: each digit has its own pattern, and even the decimal
point and negative sign can be given one - quick and easy to render numbers
digitally
disadvantages: a 'cow' to use - special rules of arithmetic
are needed even for the standard computations (eg.
how do you 'carry' in addition?)
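To see why BCD needs its own arithmetic rules, here is a sketch that packs each decimal digit into a 4-bit nibble and adds two packed-BCD numbers nibble by nibble. It assumes non-negative whole numbers only (no sign or decimal point codes), and the function names are hypothetical. A plain binary add of 0x38 and 0x47 gives 0x7F, and 'F' is not a decimal digit, so each nibble whose sum exceeds 9 must be reduced by 10 with a carry into the next nibble.

```python
def to_packed_bcd(n: int) -> int:
    """Pack each decimal digit of a non-negative integer into its own 4-bit nibble."""
    if n < 0:
        raise ValueError("this sketch handles non-negative numbers only")
    packed, shift = 0, 0
    while True:
        n, digit = divmod(n, 10)
        packed |= digit << shift   # place this digit in the next nibble up
        shift += 4
        if n == 0:
            return packed


def from_packed_bcd(packed: int) -> int:
    """Read the nibbles back as decimal digits."""
    value, place = 0, 1
    while packed:
        value += (packed & 0xF) * place
        packed >>= 4
        place *= 10
    return value


def bcd_add(a: int, b: int) -> int:
    """Add two packed-BCD numbers one nibble (decimal digit) at a time."""
    result, shift, carry = 0, 0, 0
    while a or b or carry:
        digit = (a & 0xF) + (b & 0xF) + carry
        # a nibble could hold up to 15, but a BCD digit may only be 0..9
        carry = 1 if digit > 9 else 0
        digit -= 10 * carry
        result |= digit << shift
        a, b, shift = a >> 4, b >> 4, shift + 4
    return result


x, y = to_packed_bcd(38), to_packed_bcd(47)
print(format(x, "08b"))                # 00111000 (0011 1000)
print(format(x + y, "X"))              # 7F: plain binary addition is not valid BCD
print(format(bcd_add(x, y), "X"))      # 85: nibbles read directly as decimal digits
print(from_packed_bcd(bcd_add(x, y)))  # 85
```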
UNSIGNED INTEGERS (cardinals = whole numbers from 0 upwards)
using a 2 byte word: 0000 0000 0000 0000 = 0
                 to  1111 1111 1111 1111 = 65535
SIGNED INTEGERS (whole numbers, either positive or negative) in a
2-byte word.
When representing numbers, different conventions must be invented for
distinguishing the positives from the negatives. 3 such schemes are
presented below:
- SIGN MAGNITUDE - MSB is 0 for positive numbers, 1 for
negative numbers - range +32767..+0, -0..-32767
for example,
+42 is 0000 0000 0010 1010
and -42 is 1000 0000 0010 1010
note the MSB (the left-most bit) is either '0' indicating a 'positive'
or '1' indicating a negative.
added together= 1000 0000 0101 0100 which IS NOT zero
New mathematical 'rules' are necessary to correctly compute answers
to mathematical expressions. Using this scheme, there are also 2
zeros (one being positive, the other being negative)
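The sign-magnitude example can be reproduced with a short 16-bit sketch. Python integers have no fixed width, so the sketch masks results to 16 bits itself; the helper names are hypothetical.

```python
BITS = 16
SIGN_BIT = 1 << (BITS - 1)    # 1000 0000 0000 0000
MASK = (1 << BITS) - 1        # keep results within 16 bits


def sign_magnitude(n: int) -> int:
    """Encode n in 16-bit sign-magnitude: MSB = sign, other 15 bits = |n|."""
    if abs(n) > SIGN_BIT - 1:
        raise OverflowError("magnitude does not fit in 15 bits")
    return (SIGN_BIT if n < 0 else 0) | abs(n)


def show(word: int) -> str:
    """Format a 16-bit word as four groups of 4 bits."""
    bits = format(word & MASK, "016b")
    return " ".join(bits[i:i + 4] for i in range(0, BITS, 4))


plus, minus = sign_magnitude(42), sign_magnitude(-42)
print(show(plus))                   # 0000 0000 0010 1010
print(show(minus))                  # 1000 0000 0010 1010
print(show((plus + minus) & MASK))  # 1000 0000 0101 0100, not zero
print(show(0), "and", show(SIGN_BIT))  # the two zeros: +0 and -0
```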
- ONES COMPLEMENT - the negative of a number is obtained
by inverting each of its bits - range +0..+32767, -32767..-0
for example,
+42 is 0000 0000 0010 1010
and -42 is 1111 1111 1101 0101
added together= 1111 1111 1111 1111 which IS NEGATIVE ZERO
This is 'more' acceptable, mathematically, but still presents 2
different values for zero
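The ones complement example can be reproduced with a 16-bit sketch; Python integers have no fixed width, so the sketch masks results to 16 bits itself, and the helper name is hypothetical.

```python
BITS = 16
MASK = (1 << BITS) - 1   # 1111 1111 1111 1111


def ones_complement(n: int) -> int:
    """Encode n in 16-bit ones complement: negatives are bit-inverted."""
    if abs(n) > (1 << (BITS - 1)) - 1:
        raise OverflowError("value out of range for 16-bit ones complement")
    return n if n >= 0 else (~(-n)) & MASK


plus, minus = ones_complement(42), ones_complement(-42)
total = (plus + minus) & MASK
print(format(minus, "016b"))   # 1111111111010101
print(format(total, "016b"))   # 1111111111111111 (negative zero)
```

A complete ones complement adder also adds any carry out of the MSB back into the LSB (the 'end-around carry'). Adding +42 and -42 produces no such carry, so the sketch leaves it out.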
- TWOS COMPLEMENT - the negative of a number is obtained
by flipping the bits then adding 1
The range is 0..32767, -32768..-1. Note there is only ONE zero
for example,
+42 is 0000 0000 0010 1010
and -42 is 1111 1111 1101 0101
+ 1
=1111 1111 1101 0110
+42 + -42 =0000 0000 0000 0000 which IS ZERO
Although a bit is carried out of the word, that carry is simply discarded
and the answer is recognisably zero. Twos complement remains the most
'popular' method of representing signed integers, in 2 byte words and larger.
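The twos complement recipe can be sketched in 16 bits: flip the bits and add 1, discard the carry out of the word, and read a pattern back as a signed value by giving the MSB a weight of -32768. The helper names are hypothetical.

```python
BITS = 16
MASK = (1 << BITS) - 1


def twos_complement(n: int) -> int:
    """Encode n in 16-bit twos complement: invert the bits, then add 1."""
    if not -(1 << (BITS - 1)) <= n <= (1 << (BITS - 1)) - 1:
        raise OverflowError("value out of range for 16-bit twos complement")
    return n if n >= 0 else ((~(-n)) + 1) & MASK


def decode_twos_complement(word: int) -> int:
    """Read a 16-bit pattern back as a signed value (MSB has weight -32768)."""
    return word - (1 << BITS) if word & (1 << (BITS - 1)) else word


plus, minus = twos_complement(42), twos_complement(-42)
print(format(minus, "016b"))           # 1111111111010110
total = (plus + minus) & MASK          # the carry out of bit 15 is discarded
print(total)                           # 0
print(decode_twos_complement(0x8000))  # -32768
```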
REALS
numbers containing fractional quantities - many schemes are used, all complex.
Briefly, the number is expressed in (binary) scientific notation: the MSB
indicates the sign, the next 8 bits contain the exponent, and the remainder
contains the mantissa - this often leads to rounding errors.
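One widely used scheme with exactly this layout is the 32-bit IEEE 754 single-precision format, in which the exponent is also stored with a bias of 127 (a detail not covered above). Python's standard `struct` module can pack a value into those 32 bits so the fields can be pulled apart; `float32_fields` is a hypothetical helper.

```python
import struct


def float32_fields(x: float) -> tuple[int, int, int]:
    """Split x, stored as an IEEE 754 single-precision float, into its fields.

    Layout (32 bits): 1 sign bit, 8 exponent bits (biased by 127),
    23 mantissa (fraction) bits.
    """
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    sign = word >> 31
    exponent = (word >> 23) & 0xFF
    mantissa = word & 0x7FFFFF
    return sign, exponent, mantissa


# -6.5 = -110.1 (binary) = -1.101 x 2^2 -> sign 1, exponent 2 + 127 = 129
print(float32_fields(-6.5))   # (1, 129, 5242880)

# Rounding error: 0.1 has no exact binary form (its expansion repeats)
(stored,) = struct.unpack(">f", struct.pack(">f", 0.1))
print(stored)   # slightly more than 0.1 once rounded to 24 significant bits
```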
REPRESENTATIONAL PROBLEMS
   '2'      '7'
00110010 00110111 = 12855 base10
Q: how does the computer distinguish the type of data (in this case
is the bitstream character or integer???)
A: it doesn't ! It is the job of the INSTRUCTIONS (ie. the currently
running program) to sort out what the memory contents mean (i.e. their
context and value)
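The two bytes in the example can be read either way with Python's built-in `bytes.decode()` and `int.from_bytes()`; it is the program, not the data, that decides which interpretation applies. The example above treats the first byte as the high-order byte (big-endian); Intel x86 PCs store multi-byte integers the other way round (little-endian), which would give 14130 instead.

```python
data = bytes([0b00110010, 0b00110111])    # the two bytes from the example

as_text = data.decode("ascii")            # read as two ASCII characters
as_integer = int.from_bytes(data, "big")  # read as one 16-bit unsigned integer

print(as_text)      # 27
print(as_integer)   # 12855
```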
A computer stores instructions together with data in similar places
in memory. Faulty instructions can 'grab' bytes of instructions and
interpret them as data and vice versa, leading to all sorts of problems.