Unit 1 - Data Representation

Number systems

Converting binary to denary

To review watch this: https://www.youtube.com/watch?v=q7nZbAUTSC4

Converting a decimal number to binary

To review watch this: https://www.youtube.com/watch?v=gGiEu7QTi68

Hexadecimal number system (base 16)

Humans are not very good at remembering long strings of binary digits. To make this easier, every group of 4 bits can be represented with a single hexadecimal digit.

Hexadecimal digits: 0 1 2 3 4 5 6 7 8 9 A B C D E F

Convert binary to hexadecimal

Example:

Convert the binary number 101010111100010 to hexadecimal

Solution:

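Split the bits into groups of four, starting from the right (the leftmost group may have fewer than four bits), and write each group as one hexadecimal digit:

101 0101 1110 0010 => 5 5 E 2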
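The grouping method can be sketched in Python. This is a minimal illustration that assumes the binary number is supplied as a string; the function name binary_to_hex is made up for this example.

```python
def binary_to_hex(bits: str) -> str:
    """Convert a binary string to hexadecimal by grouping the bits in fours."""
    # Pad with leading zeros so the length is a multiple of 4.
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    hex_digits = "0123456789ABCDEF"
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    # Each 4-bit group corresponds to exactly one hexadecimal digit.
    return "".join(hex_digits[int(group, 2)] for group in groups)


print(binary_to_hex("101010111100010"))  # 55E2
```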

Convert hexadecimal to binary

Example:

Convert the hexadecimal number EA13B to binary

E A 1 3 B => 1110 1010 0001 0011 1011
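The reverse direction can be sketched the same way: each hexadecimal digit expands to exactly four bits. The function name hex_to_binary is an assumption made for this example.

```python
def hex_to_binary(hex_number: str) -> str:
    """Expand every hexadecimal digit into its 4-bit binary group."""
    # format(value, "04b") gives the binary form padded to 4 digits.
    return " ".join(format(int(digit, 16), "04b") for digit in hex_number)


print(hex_to_binary("EA13B"))  # 1110 1010 0001 0011 1011
```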

Convert hexadecimal to denary

First, convert the hexadecimal number to binary, and then convert the binary number to denary.

Convert denary to hexadecimal

First, convert the denary number to binary, and then convert the binary number to hexadecimal.
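Both two-step conversions, going through binary, can be sketched as follows. The function names and the sample value 2F are assumptions chosen for illustration; Python's built-in int("2F", 16) and format(47, "X") give the same results directly.

```python
HEX_DIGITS = "0123456789ABCDEF"


def hex_to_denary(hex_number: str) -> int:
    # Step 1: hexadecimal -> binary (4 bits per digit).
    bits = "".join(format(int(digit, 16), "04b") for digit in hex_number)
    # Step 2: binary -> denary, doubling the total for each new bit.
    total = 0
    for bit in bits:
        total = total * 2 + int(bit)
    return total


def denary_to_hex(number: int) -> str:
    # Step 1: denary -> binary by repeated division by 2.
    bits = ""
    while number > 0:
        bits = str(number % 2) + bits
        number //= 2
    bits = bits or "0"
    # Step 2: binary -> hexadecimal, 4 bits per digit.
    bits = bits.zfill((len(bits) + 3) // 4 * 4)
    return "".join(HEX_DIGITS[int(bits[i:i + 4], 2)] for i in range(0, len(bits), 4))


print(hex_to_denary("2F"))  # 47
print(denary_to_hex(47))    # 2F
```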

Use of binary numbers in computer registers

Binary numbers are used in robotics, digital instruments and counting systems.

A register is a high-speed storage area within the CPU. It is a group of bits and we will represent it as follows:

When computers are used to control devices such as robots, digital instruments or counting systems, registers are used as part of the control system.

Example:

A robot vacuum cleaner has three wheels, A, B and C. A rotates on a spindle to allow for direction changes (as well as forward and backward movement); B and C are fixed to revolve around their axles to provide only forward and backward movement, and have an electric motor attached:

An 8-bit register is used to control the movement of the robot vacuum cleaner:

If the register contains 10101010 this means “motor B is ON and motor C is ON and both motors are turning to produce FORWARDS motion”. Effectively, the vacuum cleaner is moving forwards.

Exercise:

What would be the effect if the register contained the following values?

1 0 1 0 0 1 0 1
1 0 1 0 0 1 1 0

What would the register contain if motor B and motor C were both ON but B was turning in a backward direction and C was turning in a forward direction?

Use of hexadecimal

Colours in HTML

Colours

http://yizzle.com/whatthehex/

Media Access Control (MAC) addresses

A MAC address is a number that uniquely identifies a Network Interface Card (NIC). The MAC address is rarely changed so that a particular device can always be identified no matter where it is.

A MAC address is usually made up of 48 bits, which are shown as 6 groups of 2 hexadecimal digits:

NN-NN-NN-DD-DD-DD

or

NN:NN:NN:DD:DD:DD

where

NN:NN:NN = identity number of the manufacturer of the device

DD:DD:DD = serial number of the device

Example:

34-34-52-C4-69-B8
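As a rough sketch, the two halves of a MAC address can be separated in Python. The function name split_mac is hypothetical, and the code assumes Python 3.9 or later and that the address uses either "-" or ":" as its separator.

```python
import string


def split_mac(mac: str) -> tuple[str, str]:
    """Split a MAC address into (manufacturer part, serial number part)."""
    groups = mac.replace(":", "-").split("-")
    valid = len(groups) == 6 and all(
        len(group) == 2 and all(c in string.hexdigits for c in group)
        for group in groups
    )
    if not valid:
        raise ValueError("expected 6 groups of 2 hexadecimal digits")
    # First three groups: manufacturer; last three groups: serial number.
    return "-".join(groups[:3]), "-".join(groups[3:])


manufacturer, serial = split_mac("34-34-52-C4-69-B8")
print(manufacturer, serial)     # 34-34-52 C4-69-B8
print(len("343452C469B8") * 4)  # 48 (12 hexadecimal digits, 4 bits each)
```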

ASCII code

Assembly language and machine code

Assembly language is a low-level programming language.

Example of assembly language:

STO FFA4

Every processor or processor family has its own instruction set. Instructions are patterns of bits that correspond to different commands to the machine.

Example of machine code:

1010 0101 1110 0100 1111 1111 1010 0100

Debugging

Error checking methods

Following data transmission, there is always the risk that the data has been corrupted or changed in some way.

Methods to detect errors:

Parity checking

PARITY CHECKING is one method used to check whether data has been changed or corrupted following transmission from one device or medium to another. Before transmission takes place, one bit of each byte is allocated as a PARITY BIT. In systems that use EVEN PARITY, the parity bit is set so that the byte has an even number of 1-bits; in systems that use ODD PARITY, it is set so that the byte has an odd number of 1-bits. Consider the following seven bits of data:

1 1 0 1 1 0 0

These seven bits contain four 1-bits. If even parity is being used, the parity bit needs to be 0 so that the number of 1-bits stays even. If odd parity is being used, the parity bit needs to be 1 to make the number of 1-bits odd. Therefore, the byte just before transmission would be either

0 1 1 0 1 1 0 0 (even parity)

or

1 1 1 0 1 1 0 0 (odd parity)
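The choice of parity bit can be sketched in Python. This is only an illustration: the function name parity_bit is an assumption, the seven data bits are given as a string, and the parity bit is placed at the left as in the example above.

```python
def parity_bit(data_bits: str, parity: str = "even") -> str:
    """Return the parity bit ('0' or '1') for the given data bits."""
    ones = data_bits.count("1")
    if parity == "even":
        # Add a 1 only if that is needed to make the total even.
        return "1" if ones % 2 == 1 else "0"
    # Odd parity: add a 1 only if the count is currently even.
    return "0" if ones % 2 == 1 else "1"


data = "1101100"
print(parity_bit(data, "even") + data)  # 01101100
print(parity_bit(data, "odd") + data)   # 11101100
```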

Activities:

Find the parity bits for each of the following bytes:

a. 1 1 0 1 1 0 1 even parity used

b. 0 0 0 1 1 1 1 even parity used

c. 0 1 1 1 0 0 0 even parity used

d. 1 1 1 0 1 0 0 odd parity used

e. 1 0 1 1 0 1 1 odd parity used

If a byte has been transmitted from ‘A’ to ‘B’, and even parity is used, an error would be flagged if the byte now had an odd number of 1-bits at the receiver’s end.

Byte sent: 0 1 0 1 1 1 0 0

Byte received: 0 1 0 0 1 1 0 0

In this case, the receiver’s byte has three 1-bits, which means it now has odd parity whilst the byte from the sender had even parity (four 1-bits). This clearly means an error has occurred during the transmission of the data. The error is detected by the computer recalculating the parity of the byte sent. If even parity has been agreed between sender and receiver, then a change of parity in the received byte indicates that a transmission error has occurred.
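The receiver's check can be sketched as follows. The function name has_parity_error is an assumption, and the byte (including its parity bit) is given as a string of eight bits.

```python
def has_parity_error(byte: str, parity: str = "even") -> bool:
    """Return True if the number of 1-bits does not match the agreed parity."""
    ones = byte.count("1")
    expected_remainder = 0 if parity == "even" else 1
    return ones % 2 != expected_remainder


print(has_parity_error("01011100"))  # False (four 1-bits, even parity holds)
print(has_parity_error("01001100"))  # True  (three 1-bits, parity has changed)
```

Note that if two bits are changed during transmission the parity stays the same, so a single parity bit would not flag that error.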

Activities

Which of the following bytes have an error following data transmission?

a. 1 1 1 0 1 1 0 1 even parity used

b. 0 1 0 0 1 1 1 1 even parity used

c. 0 0 1 1 1 0 0 0 even parity used

d. 1 1 1 1 0 1 0 0 odd parity used

e. 1 1 0 1 1 0 1 1 odd parity used

In each case where an error occurs, can you work out which bit is incorrect?

Check digits

A CHECK DIGIT is the final digit included in a code; it is calculated from all the other digits in the code.

Check digits are used for barcodes, product codes, International Standard Book Numbers (ISBN) and Vehicle Identification Numbers (VIN).

Check digits are used to identify errors in data entry caused by mistyping or misscanning a barcode.

They can usually detect the following types of error:

• an incorrect digit entered, for example 5327 entered instead of 5307

• transposition errors where two numbers have changed order, for example 5037 instead of 5307

• omitted or extra digits, for example 537 instead of 5307 or 53107 instead of 5307

• phonetic errors, for example 13, thirteen, instead of 30, thirty.

An example of a check digit calculation is ISBN 13, where the 13th digit of the ISBN code is calculated using the following algorithm.

1 Add all the odd numbered digits together, excluding the check digit.

2 Add all the even numbered digits together and multiply the result by 3.

3 Add the results from 1 and 2 together and divide by 10.

4 Take the remainder. If it is zero, the check digit is zero; otherwise, subtract the remainder from 10 to find the check digit.

Example:

Find the check digit of the following ISBN:

978-1-107-57724-?

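1. 9 + 8 + 1 + 7 + 7 + 2 = 34

2. 7 + 1 + 0 + 5 + 7 + 4 = 24, and 24 * 3 = 72

3. 34 + 72 = 106, and 106 / 10 = 10 remainder 6

4. The remainder is not zero, so the check digit is 10 - 6 = 4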
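The four steps can be sketched in Python. The function name isbn13_check_digit is an assumption for this example; hyphens in the input are simply ignored.

```python
def isbn13_check_digit(first_12_digits: str) -> int:
    """Calculate the 13th (check) digit from the first 12 digits of an ISBN."""
    digits = [int(c) for c in first_12_digits if c.isdigit()]
    if len(digits) != 12:
        raise ValueError("expected 12 digits")
    odd_sum = sum(digits[0::2])        # step 1: digits in positions 1, 3, 5, ...
    even_sum = sum(digits[1::2]) * 3   # step 2: digits in positions 2, 4, 6, ...
    remainder = (odd_sum + even_sum) % 10  # step 3
    return 0 if remainder == 0 else 10 - remainder  # step 4


print(isbn13_check_digit("978-1-107-57724"))  # 4
```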

To check that an ISBN 13 digit code is correct a similar process is followed.

1 Add all the odd numbered digits together, including the check digit.

2 Add all the even numbered digits together and multiply the result by 3.

3 Add the results from 1 and 2 together and divide by 10.

4 The number is correct if the remainder is zero.
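The checking process can be sketched in the same way. The function name isbn13_is_valid is an assumption, and the ISBN used is the one from the worked example with its check digit 4 (and, for contrast, with a wrong final digit).

```python
def isbn13_is_valid(isbn: str) -> bool:
    """Return True if a 13-digit ISBN passes the check-digit test."""
    digits = [int(c) for c in isbn if c.isdigit()]
    if len(digits) != 13:
        return False
    # Odd numbered digits (including the check digit) count once,
    # even numbered digits count three times.
    total = sum(digits[0::2]) + 3 * sum(digits[1::2])
    return total % 10 == 0


print(isbn13_is_valid("978-1-107-57724-4"))  # True
print(isbn13_is_valid("978-1-107-57724-5"))  # False
```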

Activities

a Find the check digit for the ISBN 978190612400.

b Are these ISBNs correct?

i 9718780171500

ii 9781234567897

Checksum

CHECKSUM is another way to check if data has been changed or corrupted following data transmission. Data is sent in blocks and an additional value, the checksum, is also sent at the end of the block of data.

To explain how this works, we will assume the checksum of a block of data is 1 byte in length. This gives a maximum value of 2^8 – 1 (i.e. 255). The value 0000 0000 is ignored in this calculation.

Example:

If the sum of all the bytes in the transmitted block of data is <= 255, then the checksum is this value. However, if the sum of all the bytes in the data block is > 255, then the checksum is found using the following algorithm:

1. Divide the sum, X, of the bytes by 256

2. Round the answer down to the nearest whole number, Y

3. Z = Y * 256

4. Calculate the difference (X – Z)

5. This value is the checksum

When a block of data is about to be transmitted, the checksum for the bytes is first of all calculated. This value is then transmitted with the block of data. At the receiving end, the checksum is recalculated from the block of data received. This calculated value is then compared to the checksum transmitted. If they are the same value, then the data was transmitted without any errors; if the values are different, then a request is sent for the data to be retransmitted.
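The checksum algorithm and the receiver's comparison can be sketched in Python. The function names and the sample block of four bytes are assumptions made for illustration; for sums above 255 the result is the same as the remainder after dividing the sum by 256.

```python
def checksum(block: bytes) -> int:
    """One-byte checksum following the steps described above."""
    x = sum(block)
    if x <= 255:
        return x
    y = x // 256   # steps 1 and 2: divide by 256 and round down
    z = y * 256    # step 3
    return x - z   # steps 4 and 5


def received_ok(block: bytes, sent_checksum: int) -> bool:
    # The receiver recalculates the checksum and compares it.
    return checksum(block) == sent_checksum


block = bytes([200, 100, 50, 30])      # the bytes add up to 380
value = checksum(block)
print(value)                           # 124
corrupted = bytes([200, 101, 50, 30])  # one byte changed in transit
print(received_ok(block, value), received_ok(corrupted, value))  # True False
```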

Automatic Repeat Request

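AUTOMATIC REPEAT REQUEST (ARQ) is another method used to check whether data has been transmitted correctly. It uses an ACKNOWLEDGEMENT (a message sent by the receiver indicating that data has been received correctly) and a TIMEOUT (the time allowed to elapse before an acknowledgement is received).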

If an acknowledgement isn’t sent back to the sender before timeout occurs, then the message is automatically resent.
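The resend behaviour can be sketched with a simulated channel. Everything here is an assumption made for illustration: the function name send_with_arq, the transmit callback (which returns True if an acknowledgement arrived before the timeout) and the max_attempts limit, which the description above does not mention.

```python
from typing import Callable


def send_with_arq(message: str,
                  transmit: Callable[[str], bool],
                  max_attempts: int = 5) -> int:
    """Resend the message until it is acknowledged; return the attempts used."""
    for attempt in range(1, max_attempts + 1):
        acknowledged = transmit(message)  # False means the timeout occurred
        if acknowledged:
            return attempt
    raise TimeoutError("no acknowledgement received")


# Simulated channel: the first two sends time out, the third is acknowledged.
outcomes = iter([False, False, True])
print(send_with_arq("HELLO", lambda message: next(outcomes)))  # 3
```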

Compression techniques

Lossless

The data can be retrieved without any loss of the original information.

Common lossless compression techniques: keyword encoding, run-length encoding, Huffman encoding.

Keyword encoding

Replace frequently used words with a single character.

Example:

Let's encode the following paragraph:

The human body is composed of many independent systems, such as the circulatory system, the respiratory system, and the reproductive system. Not only must all systems work independently, but they must interact and cooperate as well. Overall health is a function of the well-being of separate systems, as well as how these separate systems work in concert.

We will use the following chart:

Word Symbol

as ˆ

the -

and +

that $

must &

well %

these #

The encoded paragraph is:

The human body is composed of many independent systems, such ˆ - circulatory system, - respiratory system, + - reproductive system. Not only & all systems work independently, but they & interact + cooperate ˆ %. Overall health is a function of - %-being of separate systems, ˆ % ˆ how # separate systems work in concert.

There are a total of 352 characters in the original paragraph, including spaces and punctuation. The encoded paragraph contains 317 characters, a saving of 35 characters.
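Keyword encoding can be sketched in Python using the chart above. This is a simplified illustration: the function names and the short sample sentence are assumptions, and a plain ^ is used in place of the chart's symbol for "as" to keep the code in ASCII.

```python
import re

CHART = {"as": "^", "the": "-", "and": "+", "that": "$",
         "must": "&", "well": "%", "these": "#"}


def encode(text: str) -> str:
    # Replace whole lower-case words only, so "they" and "The" are untouched.
    pattern = r"\b(" + "|".join(CHART) + r")\b"
    return re.sub(pattern, lambda match: CHART[match.group(1)], text)


def decode(text: str) -> str:
    for word, symbol in CHART.items():
        text = text.replace(symbol, word)
    return text


sample = "they must interact and cooperate as well"
encoded = encode(sample)
print(encoded)                      # they & interact + cooperate ^ %
print(len(sample) - len(encoded))   # 9 characters saved
print(decode(encoded) == sample)    # True
```

Decoding like this is only reliable when none of the symbols appear in the original text; the hyphen in "well-being", for instance, would clash with the symbol for "the".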

Lossy

Some information may be lost in the process of compression.

Lossless algorithms are typically used for text, and lossy for images and sound where a little bit of loss in resolution is often undetectable, or at least acceptable.

Temporal compression: looks for differences between consecutive frames in a video file. A key frame is chosen as the basis on which to compare the differences and its entire image is stored. For consecutive images, only the changes (called delta frames) are stored. Temporal compression is effective in video that changes little from frame to frame, such as a scene that contains little movement.
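The idea of storing only the changes between frames can be sketched as follows. This is a deliberately simplified illustration, not how a real video codec works: each frame is assumed to be a flat list of pixel values, and the function names are made up for this example.

```python
def delta(key_frame: list[int], next_frame: list[int]) -> list[tuple[int, int]]:
    """Store only (position, new value) for the pixels that changed."""
    return [(i, new)
            for i, (old, new) in enumerate(zip(key_frame, next_frame))
            if old != new]


def apply_delta(key_frame: list[int], changes: list[tuple[int, int]]) -> list[int]:
    """Rebuild the next frame from the key frame and the stored changes."""
    frame = list(key_frame)
    for position, value in changes:
        frame[position] = value
    return frame


key = [5, 5, 5, 5, 9, 9, 5, 5]
following = [5, 5, 5, 5, 5, 9, 9, 5]
changes = delta(key, following)
print(changes)                                 # [(4, 5), (6, 9)]
print(apply_delta(key, changes) == following)  # True
```

Only two pixels changed, so two entries are stored instead of eight pixel values.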

Spatial compression: removes redundant information within a frame. Spatial video compression often groups pixels into blocks (rectangular areas) that have the same colour, such as a portion of a clear blue sky. Instead of storing each pixel, the colour and the coordinates of the area are stored.

show understanding that sound (music), pictures, video, text and numbers are stored in different formats

Data Representation

Source: https://www.bbc.co.uk/bitesize/guides/zpfdwmn/revision/1

Representing text

When any key on a keyboard is pressed, it needs to be converted into a binary number so that it can be processed by the computer and the typed character can appear on the screen.

A code where each number represents a character can be used to convert text into binary. One code we can use for this is called ASCII. The ASCII code takes each character on the keyboard and assigns it a binary number.

Text characters start at denary number 0 in the ASCII code, but this covers special characters including punctuation, the return key and control characters as well as the number keys, capital letters and lower case letters.

ASCII code can only store 128 characters, which is enough for most words in English but not enough for other languages. If you want to use accents in European languages or larger alphabets such as Cyrillic (the Russian alphabet) and Chinese Mandarin then more characters are needed. Therefore another code, called Unicode, was created. This meant that computers could be used by people using different languages.
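Python's built-in ord function gives the code number of a character, which can then be written in binary. The short sketch below uses the sample text "Hi!" and the accented letter é as assumed examples; str.isascii needs Python 3.7 or later.

```python
def text_to_binary(text: str) -> list[str]:
    """Return the 8-bit binary form of each character's code number."""
    return [format(ord(character), "08b") for character in text]


print(ord("H"), ord("i"), ord("!"))  # 72 105 33
print(text_to_binary("Hi!"))         # ['01001000', '01101001', '00100001']

# Codes above 127 are outside ASCII and need a larger code such as Unicode.
print(ord("é"), "é".isascii())       # 233 False
```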

Representing images

Images also need to be converted into binary in order for a computer to process them so that they can be seen on our screen. Digital images are made up of pixels. Each pixel in an image is made up of binary numbers.

If we say that 1 is black (or on) and 0 is white (or off), then a simple black and white picture can be created using binary.

To create the picture, a grid can be set out and the squares coloured (1 – black and 0 – white). But before the grid can be created, the size of the grid needs to be known. This data is called metadata and computers need metadata to know the size of an image. If the metadata for the image to be created is 10x10, this means the picture will be 10 pixels across and 10 pixels down.

This example shows an image created in this way:

Adding colour

The system described so far is fine for black and white images, but most images need to use colours as well. Instead of using just 0 and 1, using four possible numbers will allow an image to use four colours. In binary this can be represented using two bits per pixel:

00 – white
01 – blue
10 – green
11 – red

While this is still not a very large range of colours, adding another binary digit will double the number of colours that are available:

1 bit per pixel (0 or 1): two possible colours
2 bits per pixel (00 to 11): four possible colours
3 bits per pixel (000 to 111): eight possible colours
4 bits per pixel (0000 – 1111): 16 possible colours
…
16 bits per pixel (0000 0000 0000 0000 – 1111 1111 1111 1111): over 65 000 possible colours

The number of bits used to store each pixel is called the colour depth. Images with more colours need more bits per pixel to store each available colour. This means that images that use lots of colours are stored in larger files.
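The link between colour depth, number of colours and file size can be sketched as a calculation. The function names are assumptions, and the size estimate covers only the raw pixel data, ignoring metadata and any compression.

```python
def colours_available(bits_per_pixel: int) -> int:
    # Every extra bit doubles the number of possible colours.
    return 2 ** bits_per_pixel


def image_size_bits(width: int, height: int, bits_per_pixel: int) -> int:
    # Raw pixel data only: one group of bits for every pixel.
    return width * height * bits_per_pixel


for depth in (1, 2, 3, 4, 16):
    print(depth, colours_available(depth))  # 2, 4, 8, 16, then 65536

print(image_size_bits(10, 10, 1))   # 100 bits for a 10x10 black and white image
print(image_size_bits(10, 10, 16))  # 1600 bits for the same image at 16 bits per pixel
```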

Image quality

Image quality is affected by the resolution of the image. The resolution of an image is a way of describing how tightly packed the pixels are.

In a low-resolution image, the pixels are larger so fewer are needed to fill the space. This results in images that look blocky or pixelated. An image with a high resolution has more pixels, so it looks a lot better when you zoom in or stretch it. The downside of having more pixels is that the file size will be bigger.

Representing sound

Sound needs to be converted into binary for computers to be able to process it. To do this, sound is captured - usually by a microphone - and then converted into a digital signal.

An analogue to digital converter will sample a sound wave at regular time intervals. For example, a sound wave like this can be sampled at each time sample point:

The samples can then be converted to binary. They will be recorded to the nearest whole number.

If the time samples are then plotted back onto the same graph, it can be seen that the sound wave now looks different. This is because sampling does not take into account what the sound wave is doing in between each time sample.

This means that the sound loses quality as data has been lost between the time samples. The way to increase the quality and store the sound at a quality closer to the original is to have more time samples that are closer together. This way, more detail about the sound can be collected, so when it’s converted to digital and back to analogue again it does not lose as much quality.

The frequency at which samples are taken is called the sample rate, and is measured in hertz (Hz). 1 Hz is one sample per second. Most CD-quality audio is sampled at 44 100 Hz or 48 000 Hz (44.1 kHz or 48 kHz).
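Sampling can be sketched in Python with a sine wave standing in for the sound. The function name, the amplitude of 7 and the very low sample rates are assumptions chosen to keep the output short; each sample is recorded to the nearest whole number, as described above.

```python
import math


def sample_wave(frequency_hz: float, sample_rate_hz: int,
                duration_s: float, amplitude: int = 7) -> list[int]:
    """Sample a sine wave at regular intervals, rounding each sample."""
    number_of_samples = int(sample_rate_hz * duration_s)
    return [
        round(amplitude * math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz))
        for n in range(number_of_samples)
    ]


print(sample_wave(1, 8, 1))        # [0, 5, 7, 5, 0, -5, -7, -5]
print(len(sample_wave(1, 16, 1)))  # 16 (a higher sample rate gives more samples)
```

Doubling the sample rate doubles the number of values stored for the same second of sound, which captures more detail but makes the file larger.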

File formats

MIDI (Musical Instrument Digital Interface)

MIDI is a digital standard for encoding notes and their related properties instead of actual sound.

A MIDI file consists of a list of commands that instruct a device (for example, an electronic organ, sound card in a computer or in a mobile phone) how to produce a particular sound or musical note.

Each MIDI command has a specific sequence of bytes. The first byte is the status byte - this informs the MIDI device what function to perform.

Examples of MIDI commands include:

Note on or off: this indicates that a key (on an electronic keyboard) has been pressed or released to produce or to stop producing a musical note.

Key pressure: this indicates how hard the key has been pressed (this could indicate loudness of the music note)

Two additional bytes are required, a “pitch byte”, which tells the MIDI device which note to play, and a “velocity byte”, which tells the device how loud to play the note.
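A three-byte "note on" message can be sketched in Python. The byte values are taken from the MIDI 1.0 convention rather than from these notes and should be read as assumptions here: the status byte 0x90 plus a channel number 0 to 15, followed by a pitch and a velocity that are each in the range 0 to 127. The function name note_on is made up for this example, and bytes.hex with a separator needs Python 3.8 or later.

```python
def note_on(pitch: int, velocity: int, channel: int = 0) -> bytes:
    """Build a 3-byte message: status byte, pitch byte, velocity byte."""
    if not (0 <= pitch <= 127 and 0 <= velocity <= 127 and 0 <= channel <= 15):
        raise ValueError("pitch/velocity must be 0-127 and channel 0-15")
    status = 0x90 | channel  # status byte: tells the device to start a note
    return bytes([status, pitch, velocity])


message = note_on(60, 100)  # pitch 60 is middle C
print(message.hex(" "))     # 90 3c 64
print(len(message))         # 3 bytes, however long the note sounds
```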

When music or sound is recorded on a computer system, these MIDI messages are saved in a file which is recognised by the file extension .mid.

If this .mid file is played back through a musical instrument, such as an electronic keyboard, the whole piece of music will be played back in an identical way to the original. The music will have been stored as a series of commands but no actual musical notes.

JPEG (Joint Photographic Experts Group)

JPEG is one of the file formats used to reduce photographic file sizes. Once the image is subjected to the JPEG compression algorithm, a new file is formed and the original file can no longer be reconstructed.

JPEG relies on certain properties of the human eye and, up to a point, a certain amount of file compression can take place without any real loss of quality. The human eye is limited in its ability to detect very slight differences in brightness and in colour hues.

MP3 (MPEG-1 Audio Layer 3; MPEG stands for Moving Picture Experts Group)

MP3 uses technology known as audio compression to convert music and other sounds into an MP3 file format.

This compression technology will reduce the size of a normal music file by about 90 per cent. For example, an 80 megabyte music CD can be reduced to 8 megabytes using MP3 technology.

MP4 (MPEG-4 Part 14)

This format allows the storage of multimedia files rather than just sound. Music, videos, photos and animation can all be stored in the MP4 format.