IPT - A Virtual Approach - Representation

Digital computers need to represent things (numerical, graphical or textual) so that they can store and manipulate them. All digital computers use minute switches (called transistors) to do this - each switch can either be ON (1) or OFF (0), and the state of each switch is called a bit.

Using bits (or 1's and 0's) requires some conventions and a little bit of mathematical 'magic'.

In this section we investigate numeric representation/symbolism (numbers) on a digital computer. This will help in our understanding of some low-level computer processes.

Please note: the caret (^) in this section means RAISED TO THE POWER OF

HINDU-ARABIC (Base 10) - decimal

In Decimal there are 10 Unique symbols (0123456789) that can be arranged using a particular order to represent quantities.

Order is important (eg. 123 is not the same as 312 even though the same digits have been used) - this order is called PLACE VALUE, and the order of the digits greatly affects the value of the number.

Base 10 or decimal numbers are organised into columns that are powers of 10:

   ..... 10^4   10^3   10^2  10^1   10^0  .   10^-1 10^-2  10^-3   ...
         10000  1000   100   10     1     .   1/10  1/100  1/1000

256.3 = 2x10^2 + 5x10^1 + 6x10^0 + 3x10^-1 (this is sometimes referred to as expanded notation)
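
The expansion above can be checked mechanically. The following is a minimal Python sketch (the function name `expand` and its parameters are illustrative choices, not part of these notes) that rebuilds a value from its digits and their column powers:

```python
def expand(digits, base, point):
    """Sum digit x base^power over every column.

    digits: list of digit values, most significant first
    point:  how many of those digits sit to the left of the point
    """
    total = 0.0
    for position, digit in enumerate(digits):
        power = point - 1 - position      # 2, 1, 0, -1 for the digits of 256.3
        total += digit * base ** power
    return total

# 256.3, possibly with a tiny floating point rounding error
print(expand([2, 5, 6, 3], 10, 3))
```

The same function works for any other base simply by changing the `base` argument.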

Hindu-Arabic has some advantages for us HUMANS

This number system uses conveniently sized groups, which relate back to the number of fingers (or toes) we have.
We are well used to this system as we learn it nearly from birth.
Groups of 10 are convenient for processing numbers of any size, and a whole series of terms have evolved around these groups - decimal point, ten, hundred, thousand etc.

Hindu-Arabic is totally inconvenient for digital computers, yet they appear to be able to use this system with ease - how do they do it?

The answer to this question will come in a few steps...

BINARY (Base 2)

All information dealt with by a digital computer is stored (and often processed) in BINARY.

A Digital Computer is a SWITCHING computer - it is made up of billions of transistors, each of which is a simple two-state switch (2 state = on/off).

In contrast to the familiar Decimal, BINARY uses ONLY 2 unique symbols (0 and 1) arranged using place value to represent any numeric quantity. How is this possible?

Using a familiar model, columns (or place value) are powers of 2 (instead of the familiar 10)

...2^6   2^5    2^4    2^3   2^2    2^1   2^0 .  2^-1   2^-2  2^-3...
   64    32     16     8     4      2     1   .  1/2    1/4   1/8

The order of the digits (or bits as they are called), and the column they occupy determine their value. This means there is a place value present in this system, but the places are different to what you are used to.

It may surprise you that using this scheme, it is possible to represent any whole number, and to represent (or at least closely approximate) fractional quantities, including irrational numbers. It just looks weird. The following sequence represents the first 20 counting numbers (ie. 1..20) in binary:

1
10
11
100
101
110
111
1000
1001
1010
1011
1100
1101
1110
1111
10000
10001
10010
10011
10100

In the above sequence there is a definite pattern - can you spot it?

The biggest digit that can occur in any column is a '1' - to go one more, we reset that column to '0' and carry the '1' to the next column on the left.

You have been doing this successfully with decimal numbers most of your life; the only difference here is the group size (instead of the familiar '9' indicating a column is full, the '1' does the job).

   1101 base2     = 1x2^3 + 1x2^2 + 0x2^1 + 1x2^0 
                  = 8 + 4 + 1
                  = 13 base10

   42 base10      = 1x2^5 + 0x2^4 + 1x2^3 + 0x2^2 + 1x2^1 +0x2^0
                  = 101010 base2
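
The two worked examples above can be reproduced with a short Python sketch. The function names `to_binary` and `from_binary` are made up for illustration; the first uses the usual repeated-division-by-2 method, which is one of several ways of doing the conversion by hand:

```python
def to_binary(n):
    """Convert a non-negative whole number to a string of bits
    by repeatedly dividing by 2 and collecting the remainders."""
    if n == 0:
        return "0"
    bits = ""
    while n > 0:
        bits = str(n % 2) + bits   # each remainder becomes the next bit to the left
        n = n // 2
    return bits

def from_binary(bits):
    """Work left to right: double the running total (moving it one
    column to the left), then add the next bit."""
    total = 0
    for bit in bits:
        total = total * 2 + int(bit)
    return total

print(to_binary(42))                          # 101010
print(from_binary("1101"))                    # 13
print([to_binary(i) for i in range(1, 6)])    # ['1', '10', '11', '100', '101']
```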

Although this grouping system appears clumsy to our Base 10 biased minds, it represents a simplicity sufficient to be used with great rapidity inside every digital device. Since 10 is NOT a power of 2, digital computers actually work very hard to display answers in decimal, when all operations actually happen in binary on the processor chip.

You will notice that even quite small quantities in binary occupy MANY digits - this is a product of the small group size.

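Fractional quantities can also be represented using this base, but only fractions built from halves, quarters, eighths and so on are exact. Other fractions (like 1/3, or even 1/10) become endlessly repeating patterns that must be cut off somewhere, so they are represented in a fairly 'lossy' way and may be imprecise in certain applications.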
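
To see which fractions come out exactly, the digits after the binary point can be generated one at a time by doubling. This is only a sketch; `fraction_bits` is a hypothetical helper, and Python's `fractions.Fraction` is used so the arithmetic itself is exact:

```python
from fractions import Fraction

def fraction_bits(value, places):
    """Return the first `places` binary digits after the point, for 0 <= value < 1."""
    bits = ""
    for _ in range(places):
        value *= 2              # shift one column to the left
        if value >= 1:
            bits += "1"
            value -= 1          # keep only the part still to be converted
        else:
            bits += "0"
    return bits

print(fraction_bits(Fraction(1, 2), 8))     # 10000000          (exact)
print(fraction_bits(Fraction(5, 8), 8))     # 10100000          (exact: 1/2 + 1/8)
print(fraction_bits(Fraction(1, 3), 16))    # 0101010101010101  (pattern never ends)
print(fraction_bits(Fraction(1, 10), 16))   # 0001100110011001  (pattern never ends)
```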

OCTAL (Base 8)

(for historical reference only !!! This will NOT be tested) Octal was a common number system on early digital computers, particularly machines whose word sizes were multiples of 3 bits (such as 12, 24 or 36 bits).

In Octal there are 8 unique symbols (01234567) arranged using place value to represent any numeric quantity. The column system comprises powers of 8. The order of the digits determines their place value.

...8^6   8^5    8^4    8^3   8^2    8^1   8^0 .  8^-1   8^-2  8^-3...
262144   32768  4096   512   64     8     1   .  1/8    1/64  1/512


   105 base8   = 1x8^2+0x8^1+5x8^0
               = 64 + 0 + 5
               = 69 base10

   4178 base10 = 1x8^4+0x8^3+1x8^2+2x8^1+2x8^0 
               = 10122 base8
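
The same repeated-division idea works for any base. A minimal sketch follows (the names `DIGITS` and `to_base` are illustrative only); Python's built-in `int(text, 8)` is used to go the other way:

```python
DIGITS = "0123456789ABCDEF"

def to_base(n, base):
    """Write a non-negative whole number in the given base (2 to 16)
    by repeated division, collecting remainders from right to left."""
    if n == 0:
        return "0"
    out = ""
    while n > 0:
        out = DIGITS[n % base] + out
        n //= base
    return out

print(to_base(4178, 8))   # 10122
print(int("105", 8))      # 69, i.e. 64 + 0 + 5
```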

Since we currently work with 32-bit machines, and 64-bit machines are becoming more common, we will not dwell on this number system.

HEXADECIMAL (Base 16)

Since large quantities are very cumbersome in binary, a number system was devised with a relatively large group size.

Hexadecimal (or Hex) uses a group size of 16. In Hex there are 16 unique symbols (0123456789ABCDEF) arranged using place value to represent any numeric quantity.

Since we ran out of conventional mathematical symbols for quantities at '9', some wise(?) computer scientists decided to 'recycle' letters from the alphabet for the 6 additional symbolic quantities. The columns in Hex are powers of 16:

...16^4  16^3   16^2   16^1  16^0   .  16^-1  16^-2 16^-3 ...
65536    4096   256    16    1      .  1/16   1/256 1/4096

The largest symbol that can occupy a column is 'F' (which stands for 15 in decimal)

   2A base16    = 2x16^1 + Ax16^0
                = 32 + 10 
                = 42 base10

   1993 base10  = 7x16^2 + Cx16^1 + 9x16^0
                = 7C9 base16
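
For those who want to check such conversions, here is a small Python sketch. `hex_to_decimal` is a made-up helper that follows the column method by hand; `format(n, "X")` and `int(text, 16)` are Python built-ins that do the same job:

```python
HEX_DIGITS = "0123456789ABCDEF"

def hex_to_decimal(text):
    """Work left to right: multiply the running total by 16, then add the next symbol's value."""
    total = 0
    for symbol in text.upper():
        total = total * 16 + HEX_DIGITS.index(symbol)   # 'A' -> 10 ... 'F' -> 15
    return total

print(hex_to_decimal("2A"))    # 42
print(hex_to_decimal("7C9"))   # 1993
print(format(1993, "X"))       # 7C9
print(int("2A", 16))           # 42
```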

This large group size makes this number system convenient for representing LARGE numbers which are commonly dealt with in digital computers.

Indeed Hexadecimal is used to represent memory addresses that routinely range in the millions. An added bonus of the Hexadecimal system stems from the fact that 16 is a power of 2, meaning binary-hex conversions are VERY fast.

This number system 'feels' most alien to our base 10 biased minds - seeing LETTERS and numbers mixed up to represent mathematical quantities is confusing at first.

Hex presents itself in the 'oddest' of places - these words, for example, have a font colour defined in Hex: #CC3399 (meaning CC of RED = 204 RED; 33 of GREEN = 51 GREEN; 99 of BLUE = 153 BLUE). By using a Hex pair for each of the three primary colours of light (red, green and blue), it is possible to specify over 16.7 Million different colour variations (256 x 256 x 256 = 16777216; thankfully it is rare to see them all on the one page :).
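
As a quick illustration, a colour code can be pulled apart into its three components in Python. The function name `hex_colour_to_rgb` is hypothetical, and the sketch assumes a six-digit code with an optional leading '#':

```python
def hex_colour_to_rgb(colour):
    """Split a six-digit hex colour such as 'CC3399' into (red, green, blue)."""
    colour = colour.lstrip("#")
    # take the digits two at a time and read each pair as a base 16 number
    return tuple(int(colour[i:i + 2], 16) for i in (0, 2, 4))

print(hex_colour_to_rgb("CC3399"))   # (204, 51, 153)
print(256 ** 3)                      # 16777216 possible combinations
```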

We will learn more about colour models and the computer monitor later.

Conversions between bases

Because Bases 16 and 8 are also powers of 2, there are some convenient methods of converting from one base to the other:

   16 <---> 2

convert each Hex digit into a 4 bit binary number
join all of the 4 bit groups together in the order they occur and you are finished

    2 <---> 16

starting from the RIGHT, separate the bitstream into 4 bit groups
convert each 4 bit group into a Hex digit
join up the hex digits in the order they were generated and you are finished
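
Both recipes translate almost word for word into code. The following Python sketch uses invented function names, and pads the bitstream on the left with zeros when its length is not a multiple of 4 (an assumption the steps above leave implicit):

```python
def hex_to_binary(hex_text):
    """Replace each hex digit with its 4 bit pattern and join the groups in order."""
    return " ".join(format(int(digit, 16), "04b") for digit in hex_text)

def binary_to_hex(bits):
    """Group the bits in fours starting from the RIGHT, then convert each group."""
    bits = bits.replace(" ", "")
    # padding on the left has the same effect as grouping from the right
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "".join(format(int(group, 2), "X") for group in groups)

print(hex_to_binary("7C9"))       # 0111 1100 1001
print(binary_to_hex("101010"))    # 2A
```

No multiplication or division is needed at any point, which is why these conversions are so fast.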

Information Basics

In these notes, we will concentrate on IBM/ISA PC hardware and associated architectures. There are many different hardware specifications, each with their own collection of standards.

A computer is an INFORMATION PROCESSOR - that is, it spends most of its time moving information (in Binary and Hex) around.

A Programmer recognises the required STEPS by which a task is to be completed - this is termed an ALGORITHM.

The ALGORITHM is encoded (or made computer friendly) by writing/converting it into a computer language - the result is termed a PROGRAM.

PROGRAMS can be written in low level languages (like ASSEMBLER), but are more commonly written in a relatively high level language (C, COBOL, FORTRAN, Pascal ...).

All programs, however, are translated into ASSEMBLER and ultimately into BINARY before the computer can begin processing them at chip level.

Basic Information Unit = BIT (Binary digIT) - 2 states ON/OFF

BITs are usually grouped into BYTES (8 CONTIGUOUS BITS = 1 BYTE)

CONTIGUOUS BYTES are often grouped into WORDS (with 8, 16, 32, 36, 64 and 128 BIT WORDS commonly in use today)

Computer memory used to be measured in KILOBYTES (1 KB = 1024 bytes = a 'long' 1000)

These days, MEGABYTES (1 MB = 1024 KB = 1048576 bytes), GIGABYTES (1073741824 bytes), TERABYTES (a kilo of gigabytes) and PETABYTES (a kilo of terabytes) are commonly used as measures.

This is, unfortunately, further confused by the fact that a KILOBYTE is not always 1024 bytes - in Data Communications, the standard metric measures (1000, 1000000 etc) are used with the same names.
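
The gap between the two conventions grows with each step up. A small Python sketch (the function names are illustrative) prints both versions side by side:

```python
def binary_unit(power):
    """Size in bytes using groups of 1024 (kilo = 1, mega = 2, giga = 3 ...)."""
    return 1024 ** power

def metric_unit(power):
    """Size in bytes using the standard metric groups of 1000."""
    return 1000 ** power

for name, power in [("kilo", 1), ("mega", 2), ("giga", 3), ("tera", 4), ("peta", 5)]:
    print(name, binary_unit(power), metric_unit(power))
```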

Data Representation

A byte = 8 bits. From left to right, their values are as follows:

128	64	32	16	8	4	2	1
MSB							LSB

Within a byte, there are 256 uniquely different patterns of 1's and 0's if all 8 bits are used.

     0000 0000   = 0
     0000 0001   = 1
     0000 0010   = 2
       :   :
     1111 1101   = 253
     1111 1110   = 254
     1111 1111   = 255

The lowest-value (right-most) bit is called the Least Significant Bit, and the left-most bit is termed, appropriately, the Most Significant Bit. These terms are somewhat misleading, as they suggest that an error in the lower end of a byte is less of a problem than one in the higher end - this is far from the truth.
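
A short Python sketch makes the point; `flip_bit` is a hypothetical helper that simulates a single-bit error in a byte:

```python
def flip_bit(value, position):
    """Flip one bit of a byte; position 0 is the LSB, position 7 is the MSB."""
    return value ^ (1 << position)

print(format(254, "08b"))            # 11111110
print(flip_bit(0b00000010, 0))       # 3    (LSB flipped: the value changes by 1)
print(flip_bit(0b00000010, 7))       # 130  (MSB flipped: the value changes by 128)
print(chr(flip_bit(ord("B"), 0)))    # C    (one wrong low bit gives a different letter)
```

When the byte is a character code rather than a quantity, an error in the LSB is just as damaging as one in the MSB.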

The Baudot Code Set

Early attempts to use binary patterns to represent numbers and letters led to many conflicting conventions.

The following chart depicts the Baudot Code Set. Baudot codes were 5 bit representations of characters and other symbols, and were commonly used in the days when input to a computer was via punched paper tape. You would NOT be expected to learn these codes, they are included to give you an appreciation of schemes for representation that were tried.

The leftmost bit is the Most Significant Bit (MSB), transmitted last. The rightmost bit is the Least Significant Bit (LSB), transmitted first.

The associated LETTERS and FIGURES (case) characters are also listed, along with the hexadecimal representation of the character.

BITS     LTRS    FIGS      HEX
-----    ----    ----      ---
00011      A      -        03
11001      B      ?        19
01110      C      :        0E
01001      D      $        09
00001      E      3        01
01101      F      !        0D
11010      G      &        1A
10100      H      STOP     14
00110      I      8        06
01011      J      '        0B
01111      K      (        0F
10010      L      )        12
11100      M      .        1C
01100      N      ,        0C
11000      O      9        18
10110      P      0        16
10111      Q      1        17
01010      R      4        0A
00101      S      BELL     05
10000      T      5        10
00111      U      7        07
11110      V      ;        1E
10011      W      2        13
11101      X      /        1D
10101      Y      6        15
10001      Z      "        11
00000      n/a    n/a      00
01000      CR     CR       08
00010      LF     LF       02
00100      SP     SP       04
11111      LTRS   LTRS     1F
11011      FIGS   FIGS     1B

It can be noted that a particular code could well stand for more than one thing, necessitating an extra transmission (the LTRS or FIGS code) to tell the receiving device whether letters or figures were being sent - a symbol could therefore cost up to 10 bits unless it was one of the 'control' codes.
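
The role of the LTRS and FIGS codes can be sketched in Python. Only a handful of rows from the chart are included, the names `BAUDOT` and `decode` are invented for this example, and the sketch assumes that a shift stays in force until the other shift code arrives and that the receiver starts in letters case:

```python
# A few rows copied from the chart: code -> (letters case, figures case)
BAUDOT = {
    "00011": ("A", "-"),
    "00001": ("E", "3"),
    "10100": ("H", "STOP"),
    "00110": ("I", "8"),
    "10000": ("T", "5"),
    "10011": ("W", "2"),
    "00100": (" ", " "),
}
LTRS, FIGS = "11111", "11011"

def decode(codes):
    """Decode a list of 5 bit codes, keeping track of the current shift."""
    column = 0                      # 0 = letters case, 1 = figures case
    text = ""
    for code in codes:
        if code == LTRS:
            column = 0
        elif code == FIGS:
            column = 1
        else:
            text += BAUDOT[code][column]
    return text

message = ["10100", "00110", "00100", FIGS, "10011", "00001", LTRS, "10000", "00001", "00011"]
print(decode(message))              # HI 23TEA
```

Note that the codes for W and E are read as 2 and 3 only because a FIGS code came first.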

The information on The Baudot Set, and tables of other collating sequences were copied from A Brief History of Data Communications at www.tbu.net/~jhall/history1.html with thanx.

CHARACTERS

Symbols we can generate using a keyboard include 52 letters of the alphabet (26 in each of upper and lower case), 10 digits and many other keyboard symbols.

This basic character set usually totals fewer than 128 uniquely different characters. It is common to encode these characters into a number that is representable by a digital computer.

By using bits 0..6 (ie. 7 bits), it is possible to represent 128 uniquely different patterns. ASCII (the American Standard Code for Information Interchange) is one such scheme, and is probably the most widely used, though not necessarily the most sensible.

Space Parity 7-bit Ascii
------------------------------------------------------------------------------
  CHAR   HEX   OCTAL   BINARY    DEC       CHAR   HEX   OCTAL   BINARY    DEC
------------------------------------------------------------------------------
   A     41     101   01000001   065        P     50     120   01010000   080
   B     42     102   01000010   066        Q     51     121   01010001   081
   C     43     103   01000011   067        R     52     122   01010010   082
   D     44     104   01000100   068        S     53     123   01010011   083
   E     45     105   01000101   069        T     54     124   01010100   084
   F     46     106   01000110   070        U     55     125   01010101   085
   G     47     107   01000111   071        V     56     126   01010110   086
   H     48     110   01001000   072        W     57     127   01010111   087
   I     49     111   01001001   073        X     58     130   01011000   088
   J     4A     112   01001010   074        Y     59     131   01011001   089
   K     4B     113   01001011   075        Z     5A     132   01011010   090
   L     4C     114   01001100   076
   M     4D     115   01001101   077
   N     4E     116   01001110   078
   O     4F     117   01001111   079
------------------------------------------------------------------------------
   a     61     141   01100001   097        p     70     160   01110000   112
   b     62     142   01100010   098        q     71     161   01110001   113
   c     63     143   01100011   099        r     72     162   01110010   114
   d     64     144   01100100   100        s     73     163   01110011   115
   e     65     145   01100101   101        t     74     164   01110100   116
   f     66     146   01100110   102        u     75     165   01110101   117
   g     67     147   01100111   103        v     76     166   01110110   118
   h     68     150   01101000   104        w     77     167   01110111   119
   i     69     151   01101001   105        x     78     170   01111000   120
   j     6A     152   01101010   106        y     79     171   01111001   121
   k     6B     153   01101011   107        z     7A     172   01111010   122
   l     6C     154   01101100   108
   m     6D     155   01101101   109
   n     6E     156   01101110   110
   o     6F     157   01101111   111
------------------------------------------------------------------------------
   0     30     060   00110000   048        %     25     045   00100101   037
   1     31     061   00110001   049        &     26     046   00100110   038
   2     32     062   00110010   050        '     27     047   00100111   039
   3     33     063   00110011   051        (     28     050   00101000   040
   4     34     064   00110100   052        )     29     051   00101001   041
   5     35     065   00110101   053        *     2A     052   00101010   042
   6     36     066   00110110   054        +     2B     053   00101011   043
   7     37     067   00110111   055        ,     2C     054   00101100   044
   8     38     070   00111000   056        -     2D     055   00101101   045
   9     39     071   00111001   057        .     2E     056   00101110   046
   SP    20     040   00100000   032        /     2F     057   00101111   047
   !     21     041   00100001   033        :     3A     072   00111010   058
   "     22     042   00100010   034        ;     3B     073   00111011   059
   #     23     043   00100011   035        <     3C     074   00111100   060
   $     24     044   00100100   036        =     3D     075   00111101   061
   >     3E     076   00111110   062        STX   02     002   00000010   002
   ?     3F     077   00111111   063        ETX   03     003   00000011   003
------------------------------------------------------------------------------
  CHAR   HEX   OCTAL   BINARY    DEC       CHAR   HEX   OCTAL   BINARY    DEC
------------------------------------------------------------------------------
   @     40     100   01000000   064        EOT   04     004   00000100   004
   [     5B     133   01011011   091        ENQ   05     005   00000101   005
   \     5C     134   01011100   092        ACK   06     006   00000110   006
   ]     5D     135   01011101   093        BEL   07     007   00000111   007
   ^     5E     136   01011110   094        BS    08     010   00001000   008
   _     5F     137   01011111   095        HT    09     011   00001001   009
   {     7B     173   01111011   123        LF    0A     012   00001010   010
   |     7C     174   01111100   124        VT    0B     013   00001011   011
   }     7D     175   01111101   125        FF    0C     014   00001100   012
   ~     7E     176   01111110   126        CR    0D     015   00001101   013
   DEL   7F     177   01111111   127        SO    0E     016   00001110   014
   NUL   00     000   00000000   000        SI    0F     017   00001111   015
   SOH   01     001   00000001   001        DLE   10     020   00010000   016
------------------------------------------------------------------------------
   DC1   11     021   00010001   017
   DC2   12     022   00010010   018
   DC3   13     023   00010011   019
   DC4   14     024   00010100   020
   NAK   15     025   00010101   021
   SYN   16     026   00010110   022
   ETB   17     027   00010111   023
   CAN   18     030   00011000   024
   EM    19     031   00011001   025
   SUB   1A     032   00011010   026
   ESC   1B     033   00011011   027
   FS    1C     034   00011100   028
   GS    1D     035   00011101   029
   RS    1E     036   00011110   030
   US    1F     037   00011111   031
-----------------------------------------

ASCII is one of many collating sequences used for coding characters into numerical equivalents (others include CDC and IBM's EBCDIC). Each of the 128 characters has its own numeric code. 95 are 'printable' (counting the space); the rest are control codes designed to cause either the computer or another connected device to 'do something'.

      'A' = 65, 'B' = 66, 'C' = 67....
      'a' = 97, 'b' = 98, 'c' = 99....
      '0' = 48, '1' = 49, '2' = 50...
      bel = 07, cr  = 13, lf  = 10..
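
Any row of the table can be regenerated with Python's built-in `ord` and `chr` functions. The helper `ascii_row` below is an illustrative name only:

```python
def ascii_row(ch):
    """Return (hex, octal, binary, decimal) for one character, as in the table."""
    code = ord(ch)
    return (format(code, "02X"), format(code, "03o"), format(code, "08b"), code)

print(ascii_row("A"))            # ('41', '101', '01000001', 65)
print(ascii_row("a"))            # ('61', '141', '01100001', 97)
print(chr(66))                   # B
print(ord("a") - ord("A"))       # 32: upper and lower case differ in a single bit
print(ord("7") - ord("0"))       # 7: the digit characters are in numeric order
```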

Only 7 bits are used for coding standard ASCII.

If all 8 bits are used, the result is EXTENDED ASCII (a set of 256 characters, including 'graphics characters' such as ×¬òèÈý ...).

EXTENDED ASCII still has the standard printable characters in the same places, thankfully.

NUMBERS

Mathematical quantities present their own problems, and differing schemes are used to represent them:

BCD (Binary Coded Decimal) - where each digit in a decimal number is converted to 4 bit binary

      eg. 38 base10 = 0011 1000 in BCD

advantages: each digit has its own pattern, even the decimal point and negative sign have one - quick and easy to render numbers digitally

disadvantages: a 'cow' to use - special rules of arithmetic need to be defined for even the standard computations (eg. how do you 'carry' in addition?)
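
A minimal Python sketch of BCD for non-negative whole numbers follows (the names `to_bcd` and `from_bcd` are illustrative; the decimal point and sign are left out):

```python
def to_bcd(number):
    """Encode each decimal digit of a non-negative whole number as its own 4 bit group."""
    return " ".join(format(int(digit), "04b") for digit in str(number))

def from_bcd(bcd):
    """Turn each 4 bit group back into a decimal digit."""
    return int("".join(str(int(group, 2)) for group in bcd.split()))

print(to_bcd(38))               # 0011 1000
print(from_bcd("0011 1000"))    # 38
print(format(38, "08b"))        # 00100110  (plain binary, for comparison)
```

The BCD pattern and the plain binary pattern for the same quantity are quite different, which is why ordinary binary arithmetic cannot be applied to BCD directly.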

UNSIGNED INTEGERS (cardinals = positive whole nos)

      2 byte word used       0000000000000000 = 0
                             1111111111111111 = 65535

SIGNED INTEGERS (whole numbers, either positive or negative) in a 2-byte word.

When representing numbers, different conventions must be invented for distinguishing the positives from the negatives. 3 such schemes are presented below:

SIGN MAGNITUDE - MSB is 0 for positive numbers, 1 for negative numbers - range +32767..+0,-0..-32767
for example,
```
         +42 is 0000 0000 0010 1010
     and -42 is 1000 0000 0010 1010
```
note the MSB (the left-most bit) is either '0' indicating a 'positive' or '1' indicating a negative.
```
added together= 1000 0000 0101 0100 which IS NOT zero
```
New mathematical 'rules' are necessary to correctly compute answers to mathematical expressions. Using this scheme, there are also 2 zeros (one being positive, the other being negative).

ONES COMPLEMENT- the negative of a number is obtained by inverting each of its bits - range +0..32767,-32767,..-0
for example,
```
         +42 is 0000 0000 0010 1010
     and -42 is 1111 1111 1101 0101

added together= 1111 1111 1111 1111 which IS NEGATIVE ZERO
```
This is 'more' acceptable, mathematically, but still presents 2 different values for zero

TWOS COMPLEMENT-the negative of a number is obtained by flipping the bits then adding 1
The range is 0..32767,-32768..-1. Note there is only ONE zero
for example,
```
         +42 is 0000 0000 0010 1010
     and -42 is 1111 1111 1101 0101
               +                  1
               =1111 1111 1101 0110

+42 + -42      =0000 0000 0000 0000 which IS ZERO
```
Although a bit is carried out of the word, the answer is recognisably zero. Twos complement remains the most 'popular' method of representing integers in a 2 byte word.
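
The three schemes can be compared side by side with a short Python sketch. All names here (`BITS`, `MASK`, `show` and the three encoding functions) are invented for this example, and the 'addition' is ordinary binary addition of the raw bit patterns with any carry out of the word thrown away:

```python
BITS = 16
MASK = (1 << BITS) - 1        # 1111 1111 1111 1111, keeps results inside a 2 byte word

def sign_magnitude(n):
    """Magnitude in the low bits; MSB set to 1 for a negative number."""
    return (abs(n) | (1 << (BITS - 1))) if n < 0 else n

def ones_complement(n):
    """A negative number is the positive pattern with every bit inverted."""
    return (~abs(n) & MASK) if n < 0 else n

def twos_complement(n):
    """A negative number is the inverted positive pattern plus 1."""
    return ((~abs(n) + 1) & MASK) if n < 0 else n

def show(word):
    """Format a 16 bit pattern in groups of four bits."""
    bits = format(word, "016b")
    return " ".join(bits[i:i + 4] for i in range(0, BITS, 4))

for encode in (sign_magnitude, ones_complement, twos_complement):
    plus, minus = encode(42), encode(-42)
    total = (plus + minus) & MASK          # drop any bit carried out of the word
    print(encode.__name__, "-42 =", show(minus), " sum =", show(total))
```

Only the twos complement line ends with a sum of all zeros.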

REALS

numbers containing fractional quantities - many schemes used, all complex. Briefly, express the number in scientific notation: the MSB indicates the sign, the next 8 bits contain the exponent, and the remainder contains the mantissa - this often leads to rounding errors.
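
One widely used scheme with this layout is the 32 bit IEEE 754 'single precision' format (naming it here is an assumption - these notes do not say which scheme they have in mind). Python's standard `struct` module can expose the three fields; `float_fields` and `as_single` are illustrative names:

```python
import struct

def float_fields(x):
    """Split a 32 bit float into (sign, exponent, mantissa) bit strings."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))   # reinterpret the 4 bytes as an integer
    bits = format(word, "032b")
    return bits[0], bits[1:9], bits[9:]

def as_single(x):
    """Round a Python float to 32 bit precision and back again."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

print(float_fields(0.5))       # sign '0', exponent '01111110', mantissa all zeros
print(float_fields(-42.0))     # sign '1', exponent '10000100'
print(as_single(0.5) == 0.5)   # True:  0.5 is a power of 2, so it is stored exactly
print(as_single(0.1) == 0.1)   # False: 0.1 is cut off at 23 mantissa bits
```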

REPRESENTATIONAL PROBLEMS

   '2'          '7'
   00110010     00110111  = 12855 base10

Q: how does the computer distinguish the type of data (in this case is the bitstream character or integer???)

A: it doesn't ! It is the job of the INSTRUCTIONS (ie. the currently running program) to sort out what the memory contents mean (i.e. their context and value)
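
The same two bytes can be read either way in Python, depending entirely on what the program asks for. This is a sketch only; the function names are illustrative, and the integer reading assumes the left-hand byte is the more significant one, as in the example above:

```python
def as_text(raw):
    """Interpret the bytes as ASCII characters."""
    return raw.decode("ascii")

def as_integer(raw):
    """Interpret the bytes as one unsigned integer, most significant byte first."""
    return int.from_bytes(raw, "big")

raw = bytes([0b00110010, 0b00110111])   # the two bytes from the example

print(as_text(raw))       # 27
print(as_integer(raw))    # 12855
print(list(raw))          # [50, 55]  (read as two separate small integers)
```

Nothing in the bytes themselves says which reading is the 'right' one.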

A computer stores instructions together with data in similar places in memory. Faulty instructions can 'grab' bytes of instructions and interpret them as data, and vice versa, leading to all sorts of problems.
