
Part 1


CSC204: INTRODUCTION TO DATA STRUCTURES

Introduction

Overview:
A data structure is an arrangement of data in a computer’s memory (or sometimes on a disk). Data
structures include arrays, linked lists, stacks, binary trees, and hash tables, among others.
Algorithms manipulate the data in these structures in various ways, such as searching for a
particular data item and sorting the data. Data structures and algorithms go hand in hand in software
development; in short, we can say that:
Program = Data Structure + Algorithm

What sorts of problems can you solve with a knowledge of data structure fundamentals? As a rough
approximation, we might divide the situations in which they’re useful into three categories:
• Real-world data storage: Many of the structures and techniques we’ll discuss are concerned
with how to handle real-world data storage (i.e. data that describes physical entities external to
the computer, e.g. a personnel record describes an actual human being).
• Programmer’s tools: Some data storage structures, however, are not meant to be accessed by
the user, but by the program itself. A programmer uses such structures, for example stacks and
queues, as tools to facilitate some other operation. The correct choice of data structure allows major
improvements in program efficiency.
• Modelling: Some data structures directly model real-world situations. The most important data
structure of this type is the graph. You can use graphs to represent airline routes between cities,
connections in an electric circuit, or tasks in a project.

Algorithm
An algorithm for a given task is “a finite sequence of instructions, each of which has a clear
meaning and can be performed with a finite amount of effort in a finite length of time”.
• As such, an algorithm must be precise enough to be understood by human beings.
• However, in order to be executed by a computer, we need a program that is written in a rigorous
formal language; and since computers are quite inflexible compared to the human mind,
programs usually need to contain more details than algorithms.
• Algorithms can obviously be described in plain English, and we will sometimes do that.
• However, for computer scientists it is usually easier and clearer to use something that comes
somewhere in between formatted English and computer program code, but is not runnable
because certain details are omitted. This is called pseudocode.
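Here is a small sketch of the extra detail a program needs compared with an algorithm. The task (finding the largest item of an array) and the names Largest and largest are hypothetical choices made for illustration. The pseudocode is kept in comments. The Java version then has to settle details the pseudocode leaves open, such as the item type and what to do with an empty array.

```java
public class Largest {
    // Pseudocode (not runnable, some details omitted):
    //   largest = first item
    //   for each remaining item x: if x > largest then largest = x
    //   return largest
    //
    // Java version: the item type is fixed to int, and the empty-array case,
    // which the pseudocode ignores, has to be handled explicitly.
    static int largest(int[] a) {
        if (a.length == 0) {
            throw new IllegalArgumentException("largest: empty array");
        }
        int max = a[0];
        for (int i = 1; i < a.length; i++) {
            if (a[i] > max) {
                max = a[i];
            }
        }
        return max;
    }

    public static void main(String[] args) {
        int[] a = {1, 4, 17, 3, 90, 79, 4, 6, 81};
        System.out.println(largest(a)); // prints 90
    }
}
```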

Fundamental questions about algorithms


Given an algorithm to solve a particular problem, we are naturally led to ask:
1. What is it supposed to do?
2. Does it really do what it is supposed to do?
3. How efficiently does it do it?
The technical terms normally used for these three aspects are:
1. Specification: details of the problem that the algorithm is trying to solve. Typically, it will have
to specify how the inputs and outputs of the algorithm are related.

2. Verification: spend some effort verifying whether the algorithm is indeed correct, i.e. whether it
satisfies its specification. Testing on a few particular inputs can be enough to show that the
algorithm is incorrect, but it is not enough to be sure that the algorithm satisfies its specification.
3. Performance analysis: the efficiency or performance of an algorithm relates to the resources
required by it, such as how quickly it will run, or how much computer memory it will use up.
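The three aspects can be seen together in one tiny example. This sketch uses a hypothetical task: summing an array. The specification is written as a comment, a few tests act as (partial) verification, and a comment on the loop notes the performance.

```java
public class SpecExample {
    /*
     * Specification (hypothetical example): given an array a of n integers,
     * return a[0] + a[1] + ... + a[n-1]; for n = 0 the result is 0.
     */
    static long sum(int[] a) {
        long total = 0;
        // Performance: the loop body runs once per item, so the running time
        // grows in proportion to n, and only two extra variables are needed.
        for (int i = 0; i < a.length; i++) {
            total += a[i];
        }
        return total;
    }

    // Verification by testing: one failing check would prove the code wrong,
    // but passing checks on a few inputs do not prove it correct.
    public static void main(String[] args) {
        check(sum(new int[] {}) == 0);
        check(sum(new int[] {5}) == 5);
        check(sum(new int[] {1, 4, 17, 3}) == 25);
        System.out.println("all checks passed");
    }

    static void check(boolean ok) {
        if (!ok) {
            throw new AssertionError("specification violated");
        }
    }
}
```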

Abstract Data Type (ADT)


• In computer science, an abstract data type (ADT) is a mathematical model of a data structure that
specifies the type of data stored, the operations supported on them, and the types of parameters
of the operations.
• A type is a collection of values. For example, the Boolean type consists of the values true and
false. The integers also form a type.
• A data type is defined by its behaviour (semantics) from the point of view of a user of the data,
specifically in terms of possible values, possible operations on data of this type, and the
behaviour of these operations.
• When we say "data type", we often refer to the primitive data types built into a language, such
as integer, real, character, and boolean.
• However, when we use integers, we do not worry at all about their internal representation, or how
their operations are implemented by the compiler in machine code.
• Additionally, we know that, even when we run our program on a different machine, the
behaviour of an integer does not change, even though its internal representation may change.
• What we know is that we can use primitive data types via their operational interfaces: '+', '-', '*'
and '/' for integers.
• The concept of ADTs helps us discuss data structures without having to worry about their
implementation in a particular language, or about how the data is stored in computer
memory.
• An ADT does not specify how the data type is implemented. The implementation details are hidden
from the user of the ADT and protected from outside access, a concept referred to as encapsulation.
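The idea can be sketched in Java using the standard java.util.Deque interface as the ADT, here used as a stack of integers. The class AdtDemo and the method popOrder are illustrative names. The method relies only on the interface operations push, pop and isEmpty. It therefore behaves the same whichever implementation is supplied; two of the standard ones are shown.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedList;

public class AdtDemo {
    // Uses only the operations of the Deque interface; how the items are
    // actually stored is hidden inside whichever implementation is passed in.
    static String popOrder(Deque<Integer> stack, int[] items) {
        for (int x : items) {
            stack.push(x); // add to the top
        }
        StringBuilder out = new StringBuilder();
        while (!stack.isEmpty()) {
            out.append(stack.pop()).append(' '); // remove from the top
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        int[] items = {3, 1, 4};
        // Two different implementations, same observable behaviour.
        System.out.println(popOrder(new ArrayDeque<>(), items)); // prints 4 1 3
        System.out.println(popOrder(new LinkedList<>(), items)); // prints 4 1 3
    }
}
```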

Data structures
• A data structure is the implementation of an ADT. In an object-oriented (OO) language such as Java,
an ADT and its implementation together make up a class.
• Each operation associated with the ADT is implemented by a member function or method.
• The variables that define the space required by a data item are referred to as data members.
• An object is an instance of a class, that is, something that is created and takes up storage during
the execution of a computer program.
• The term “data structure” often refers to data stored in a computer’s main memory.
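The sketch below separates the two roles. A hypothetical IntStack interface plays the ADT, and a hypothetical ArrayIntStack class is one data structure implementing it. Its fields are the data members and each operation is a method. The expression new ArrayIntStack() creates an object. Doubling the array when it is full is our own design choice, not something these notes prescribe.

```java
import java.util.Arrays;

public class StackDemo {
    // The ADT: which operations exist and what they mean, but no storage details.
    interface IntStack {
        void push(int x);   // put x on top
        int pop();          // remove and return the top item
        boolean isEmpty();  // true if no items are stored
    }

    // One data structure implementing the ADT, backed by an array.
    static class ArrayIntStack implements IntStack {
        // Data members: the storage each stack object needs.
        private int[] items = new int[4];
        private int size = 0;

        public void push(int x) {
            if (size == items.length) {
                items = Arrays.copyOf(items, 2 * items.length); // grow when full
            }
            items[size++] = x;
        }

        public int pop() {
            if (size == 0) {
                throw new IllegalStateException("pop on empty stack");
            }
            return items[--size];
        }

        public boolean isEmpty() {
            return size == 0;
        }
    }

    public static void main(String[] args) {
        IntStack s = new ArrayIntStack(); // an object: an instance of the class
        s.push(1);
        s.push(2);
        System.out.println(s.pop());      // prints 2
        System.out.println(s.isEmpty());  // prints false (1 is still stored)
    }
}
```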

Arrays

In computer science, the obvious way to store an ordered collection of items is as an array, which is
a numbered collection of data items all of the same type.

• Array items are typically stored in a sequence of computer memory locations, but to discuss
them, we need a convenient way to write them down on paper.
• We can just write the items in order, separated by commas and enclosed by square brackets.
Thus, [1, 4, 17, 3, 90, 79, 4, 6, 81]
is an example of an array of integers. If we call this array a, we can write it as:
a = [1, 4, 17, 3, 90, 79, 4, 6, 81]
• This array a has 9 items, and hence we say that its size is 9.
• In everyday life, we usually start counting from 1. When we work with arrays in computer
science, however, we more often (though not always) start from 0.
• Thus, for our array a, its positions are 0, 1, 2, ... , 7, 8.
• The element in position 8 is 81, and we use the notation a[8] to denote this element.
• More generally, for any integer i denoting a position, we write a[i] to denote the element in the
i-th position.
• This position i is called an index (and the plural is indices).
• Then, in the above example, a[0] = 1, a[1] = 4, a[2] = 17, and so on.
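In Java, one of the languages these notes use, the array a above can be written down and indexed as follows. The class and field names are only for illustration. Java arrays are zero-based, matching the convention described here.

```java
public class ArrayIndexing {
    // The example array a from the text.
    static final int[] A = {1, 4, 17, 3, 90, 79, 4, 6, 81};

    public static void main(String[] args) {
        int[] a = A;
        System.out.println(a.length); // 9, the size of the array
        System.out.println(a[0]);     // 1, the item at index 0
        System.out.println(a[2]);     // 17
        System.out.println(a[8]);     // 81; the last valid index is size - 1
        // a[9] would throw an ArrayIndexOutOfBoundsException at run time.
    }
}
```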

It is worth noting at this point that the symbol = is quite overloaded. In mathematics, it stands for
equality. In most modern programming languages, = denotes assignment, while equality is
expressed by ==. We will typically use = in its mathematical meaning, unless it is written as part of
code or pseudocode.
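A short Java sketch of the two meanings in code (the class and method names are illustrative). Here = copies a value into a variable and == compares two values. It also shows that assigning a new value to an array item does not affect a value copied out of it earlier.

```java
public class AssignVsEquals {
    static boolean demo() {
        int[] a = {1, 4, 17};
        int x = a[1];          // '=' is assignment: x now holds 4
        boolean same = x == 4; // '==' tests equality: true
        a[1] = 10;             // assign a new value to the array item
        // x still holds 4, because x = a[1] copied the value.
        return same && x == 4 && a[1] == 10;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints true
    }
}
```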

• We say that the individual items a[i] in the array a are accessed using their index i, and one can
move sequentially through the array by incrementing or decrementing that index, or jump
straight to a particular item given its index value.
• Algorithms that process data stored as arrays will typically need to visit systematically all the
items in the array, and apply appropriate operations on them.

Loops and Iteration


• The standard approach in most programming languages for repeating a process a certain number
of times, such as moving sequentially through an array to perform the same operations on each
item, involves a loop.
• In pseudocode, this would typically take the general form
For i = 1,...,N,
    do something
• and in programming languages like C and Java this would be written as the for-loop
for( i = 0 ; i < N ; i++ ) {
    // do something
}

in which a counter i keeps track of doing “the something” N times.

• For example, we could compute the sum of all 20 items in an array a using
for( i = 0, sum = 0 ; i < 20 ; i++ ) {
    sum += a[i];
}
We say that there is iteration over the index i.

• The general for-loop structure is:


for( INITIALIZATION ; CONDITION ; UPDATE ) {
    REPEATED PROCESS
}
in which any of the four parts are optional.

• One way to write this out explicitly is:


INITIALIZATION
if ( not CONDITION ) go to LOOP FINISHED
LOOP START
    REPEATED PROCESS
    UPDATE
    if ( CONDITION ) go to LOOP START
LOOP FINISHED
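Both the summing loop and its explicit expansion can be sketched in Java. The method names forSum and whileSum are illustrative, and the fixed bound 20 is replaced by a.length so that the code works for any array. The first method keeps the comma-separated initialization used in the text. The second spells out the expansion, using a while-loop because Java has no goto statement.

```java
public class LoopExamples {
    // The summing for-loop from the text, with 20 replaced by a.length.
    static int forSum(int[] a) {
        int i, sum;
        for (i = 0, sum = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    // The same loop written out explicitly. The jumps to LOOP START and
    // LOOP FINISHED are expressed with while, which tests the condition
    // before the first pass and again after every UPDATE.
    static int whileSum(int[] a) {
        int i = 0, sum = 0;      // INITIALIZATION
        while (i < a.length) {   // CONDITION
            sum += a[i];         // REPEATED PROCESS
            i++;                 // UPDATE
        }
        return sum;              // LOOP FINISHED
    }

    public static void main(String[] args) {
        int[] a = new int[20];
        for (int i = 0; i < a.length; i++) {
            a[i] = i + 1; // fill with 1, 2, ..., 20
        }
        System.out.println(forSum(a));   // prints 210
        System.out.println(whileSum(a)); // prints 210
    }
}
```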

Throughout this course, we will regularly make use of this basic loop structure when operating on
data stored in arrays, but it is important to remember that different programming languages use
different syntax, and there are numerous variations that check the condition to terminate the
repetition at different points.
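One common variation, the do-while loop found in both C and Java, tests the condition after each pass instead of before it. The sketch below compares it with an ordinary while-loop (the method names are illustrative). The difference shows up when the condition is false from the start.

```java
public class LoopVariations {
    // while: CONDITION is tested before each pass, so the body may run zero times.
    static int passesWhile(int n) {
        int passes = 0;
        int i = 0;
        while (i < n) {
            passes++;
            i++;
        }
        return passes;
    }

    // do-while: CONDITION is tested after each pass, so the body runs at least once.
    static int passesDoWhile(int n) {
        int passes = 0;
        int i = 0;
        do {
            passes++;
            i++;
        } while (i < n);
        return passes;
    }

    public static void main(String[] args) {
        System.out.println(passesWhile(0) + " " + passesDoWhile(0)); // prints 0 1
        System.out.println(passesWhile(3) + " " + passesDoWhile(3)); // prints 3 3
    }
}
```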

Lists

We have seen how arrays are a convenient way to store collections of items, and how loops and
iteration allow us to sequentially process those items. However, arrays are not always the most
efficient way to store collections of items. In this section, we shall see that lists may be a better way
to store collections of items, and how recursion may be used to process them. As we explore the
details of storing collections as lists, the advantages and disadvantages of doing so for different
situations will become apparent.

Linked Lists
• A list can involve virtually anything, for example, a list of integers [3, 2, 4, 2, 5], a shopping list
[apples, butter, bread, cheese], or a list of web pages each containing a picture and a link to the
next web page.
• When considering lists, we can speak about them on different levels:
o on a very abstract level (on which we can define what we mean by a list),
o on a level on which we can depict lists and communicate as humans about them,
o on a level on which computers can communicate, or
o on a machine level on which they can be implemented.

Graphical Representation
• Non-empty lists can be represented by two-cells, in each of which the first cell contains a
pointer to a list element and the second cell contains a pointer to either the empty list or another
two-cell.
• We can depict a pointer to the empty list by a diagonal bar or cross through the cell.
• For instance, the list [3, 1, 4, 2, 5] can be represented as:
[ 3 | * ]-->[ 1 | * ]-->[ 4 | * ]-->[ 2 | * ]-->[ 5 | / ]
where each box is a two-cell and / marks the pointer to the empty list.
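A two-cell maps naturally onto a small Java class with two fields. In this sketch the class and method names are our own. The element is stored in the cell directly rather than through a pointer, which is the usual choice for primitive int values. Java's null stands for the pointer to the empty list.

```java
public class TwoCells {
    // One two-cell: the element, and a pointer to the rest of the list.
    // null stands for the pointer to the empty list (the crossed-out cell).
    static final class Cell {
        final int first;
        final Cell rest;

        Cell(int first, Cell rest) {
            this.first = first;
            this.rest = rest;
        }
    }

    // Follow the rest pointers from cell to cell, collecting each element.
    static String show(Cell l) {
        StringBuilder sb = new StringBuilder("[");
        for (Cell c = l; c != null; c = c.rest) {
            sb.append(c.first);
            if (c.rest != null) {
                sb.append(", ");
            }
        }
        return sb.append("]").toString();
    }

    // The example list [3, 1, 4, 2, 5] as a chain of five two-cells.
    static Cell example() {
        return new Cell(3, new Cell(1, new Cell(4, new Cell(2, new Cell(5, null)))));
    }

    public static void main(String[] args) {
        System.out.println(show(example())); // prints [3, 1, 4, 2, 5]
    }
}
```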

Abstract Data Type “List”


On an abstract level, a list can be constructed by the two constructors:
o EmptyList, which gives you the empty list, and
o MakeList(element,list), which puts an element at the top of an existing list.

Using those, our last example list can be constructed as


MakeList(3, MakeList(1, MakeList(4, MakeList(2, MakeList(5, EmptyList))))).
and it is clearly possible to construct any list in this way.

This inductive approach to data structure creation is very powerful, and we shall use it many times
throughout this course. It starts with the “base case”, the EmptyList, and then builds up
increasingly complex lists by repeatedly applying the “induction step”, the
MakeList(element, list) operator.

It is obviously also important to be able to get back the elements of a list, and we no longer have an
item index to use like we have with an array. The way to proceed is to note that a list is always
constructed from the first element and the rest of the list. So, conversely, from a non-empty list it
must always be possible to get the first element and the rest. This can be done using the two
selectors, also called accessor methods:

o first(list), and
o rest(list).

The selectors will only work for non-empty lists (and give an error or exception on the empty list),
so we need a condition which tells us whether a given list is empty:

o isEmpty(list)
This will need to be used to check every list before passing it to a selector.

We call everything a list that can be constructed by the constructors EmptyList and MakeList,
so that with the selectors first and rest and the condition isEmpty, the following
relationships are automatically satisfied (i.e. true):
o isEmpty(EmptyList)
o not isEmpty(MakeList(x, l)) (for any x and l)
o first(MakeList(x, l)) = x
o rest(MakeList(x, l)) = l
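Here is a minimal Java sketch of this ADT for lists of integers, assuming null represents EmptyList. Java method names conventionally start with a lower-case letter, so emptyList and makeList stand in for EmptyList and MakeList. The class ListAdt and its nested Node are illustrative names. As the text requires, the selectors throw an exception on the empty list.

```java
public class ListAdt {
    // A non-empty list cell; null is used for EmptyList.
    static final class Node {
        final int first;
        final Node rest;

        Node(int first, Node rest) {
            this.first = first;
            this.rest = rest;
        }
    }

    // Constructors
    static Node emptyList() {
        return null;
    }

    static Node makeList(int x, Node l) {
        return new Node(x, l); // put x at the top of the existing list l
    }

    // Condition
    static boolean isEmpty(Node l) {
        return l == null;
    }

    // Selectors: defined only for non-empty lists
    static int first(Node l) {
        if (isEmpty(l)) {
            throw new IllegalArgumentException("first: empty list");
        }
        return l.first;
    }

    static Node rest(Node l) {
        if (isEmpty(l)) {
            throw new IllegalArgumentException("rest: empty list");
        }
        return l.rest;
    }

    public static void main(String[] args) {
        Node l = makeList(3, makeList(1, makeList(4, makeList(2, makeList(5, emptyList())))));
        Node m = makeList(7, l);
        // The four relationships from the text, for x = 7 and this l:
        System.out.println(isEmpty(emptyList())); // true
        System.out.println(!isEmpty(m));          // true
        System.out.println(first(m) == 7);        // true
        System.out.println(rest(m) == l);         // true: rest returns l itself
    }
}
```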

In addition to constructing and getting back the components of lists, one may also wish to
destructively change lists. This would be done by so-called mutators which change either the first
element or the rest of a non-empty list:
o replaceFirst(x, l)
o replaceRest(r, l)

For instance, with l = [3, 1, 4, 2, 5], applying replaceFirst(9, l) changes l to
[9, 1, 4, 2, 5], and then applying replaceRest([6, 2, 3, 4], l) changes it to
[9, 6, 2, 3, 4].
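In Java, the mutators can be sketched by assigning to the fields of the first cell. In this sketch null is assumed to represent the empty list, and the argument order (x, l) and (r, l) follows the text. The methods fromArray and show are hypothetical helpers, added only to build and print lists.

```java
public class ListMutators {
    // Fields are not final, because the mutators change them in place.
    static final class Node {
        int first;
        Node rest;

        Node(int first, Node rest) {
            this.first = first;
            this.rest = rest;
        }
    }

    // Mutator: destructively replace the first element of a non-empty list.
    static void replaceFirst(int x, Node l) {
        if (l == null) {
            throw new IllegalArgumentException("replaceFirst on empty list");
        }
        l.first = x;
    }

    // Mutator: destructively replace the rest of a non-empty list with r.
    static void replaceRest(Node r, Node l) {
        if (l == null) {
            throw new IllegalArgumentException("replaceRest on empty list");
        }
        l.rest = r;
    }

    // Helper (hypothetical): build a list from the given items, back to front.
    static Node fromArray(int... xs) {
        Node l = null;
        for (int i = xs.length - 1; i >= 0; i--) {
            l = new Node(xs[i], l);
        }
        return l;
    }

    // Helper (hypothetical): render a list as [x, y, ...].
    static String show(Node l) {
        StringBuilder sb = new StringBuilder("[");
        for (Node c = l; c != null; c = c.rest) {
            sb.append(c.first);
            if (c.rest != null) {
                sb.append(", ");
            }
        }
        return sb.append("]").toString();
    }

    public static void main(String[] args) {
        Node l = fromArray(3, 1, 4, 2, 5);
        replaceFirst(9, l);
        System.out.println(show(l)); // prints [9, 1, 4, 2, 5]
        replaceRest(fromArray(6, 2, 3, 4), l);
        System.out.println(show(l)); // prints [9, 6, 2, 3, 4]
    }
}
```

Because the change is made in place, every variable that refers to the same first cell sees the new contents. This is what makes these operations destructive, unlike MakeList, which always builds a new cell.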

We shall see that the concepts of constructors, selectors and conditions are common to virtually all
abstract data types. Throughout this course, we will be formulating our data representations and
algorithms in terms of appropriate definitions of them.

XML Representation
In order to communicate data structures between different computers and possibly different
programming languages, XML (eXtensible Markup Language) has become a quasi-standard.
The above list could be represented in XML as:

<ol>
<li>3</li>
<li>1</li>
<li>4</li>
<li>2</li>
<li>5</li>
</ol>
However, there are usually many different ways to represent the same object in XML. For instance,
a cell-oriented representation of the above list would be:
<cell>
  <first>3</first>
  <rest>
    <cell>
      <first>1</first>
      <rest>
        <cell>
          <first>4</first>
          <rest>
            <cell>
              <first>2</first>
              <rest>
                <cell>
                  <first>5</first>
                  <rest>EmptyList</rest>
                </cell>
              </rest>
            </cell>
          </rest>
        </cell>
      </rest>
    </cell>
  </rest>
</cell>

While this looks complicated for a simple list, it is not really; it is just a bit lengthy. XML is flexible enough to
represent and communicate very complicated structures in a uniform way.
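The cell-oriented form follows the structure of the list so closely that a short recursive function can generate it. In this Java sketch the names are hypothetical, the output has no indentation, and element values are assumed to need no XML escaping. It produces one <cell> element per list item, with the innermost <rest> holding EmptyList.

```java
public class ListXml {
    // A non-empty list cell; null is used for the empty list.
    static final class Node {
        final int first;
        final Node rest;

        Node(int first, Node rest) {
            this.first = first;
            this.rest = rest;
        }
    }

    // Each non-empty list becomes a <cell> holding its first element and,
    // recursively, the XML for the rest of the list.
    static String toXml(Node l) {
        if (l == null) {
            return "EmptyList"; // the marker used for the empty list in the text
        }
        return "<cell><first>" + l.first + "</first><rest>"
                + toXml(l.rest) + "</rest></cell>";
    }

    public static void main(String[] args) {
        Node l = new Node(3, new Node(1, null));
        // prints <cell><first>3</first><rest><cell><first>1</first><rest>EmptyList</rest></cell></rest></cell>
        System.out.println(toXml(l));
    }
}
```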

Implementation of Lists
There are many different implementations possible for lists, and which one is best will depend on
the primitives offered by the programming language being used.

The programming language Lisp and its derivatives, for instance, take lists as the most important
primitive data structure. In some other languages, it is more natural to implement lists as arrays.

That can be problematic, however, since lists are conceptually not limited in size, and for this
reason array-based implementations with fixed-size arrays can only approximate the general
concept. For many applications, this is not a problem because a maximal number of list members
can be determined a priori (e.g., the maximal number of students on this course is limited by the
number of students in our University). More general-purpose implementations follow a pointer-based
approach, which is close to the diagrammatic representation given above.
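Here is a sketch of a fixed-size array-based list that makes the limitation concrete (class and method names hypothetical). The capacity plays the role of the maximal number of members determined a priori. Once that capacity is reached, further additions are refused. A pointer-based version has no such built-in limit.

```java
public class BoundedArrayList {
    private final int[] items; // fixed-size storage, allocated once
    private int size = 0;      // number of list members currently stored

    BoundedArrayList(int capacity) {
        items = new int[capacity];
    }

    // Adds x at the end of the list; returns false when the array is full,
    // which is exactly where this only approximates the unbounded list concept.
    boolean add(int x) {
        if (size == items.length) {
            return false;
        }
        items[size++] = x;
        return true;
    }

    int get(int i) {
        if (i < 0 || i >= size) {
            throw new IndexOutOfBoundsException("index " + i + ", size " + size);
        }
        return items[i];
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        BoundedArrayList list = new BoundedArrayList(3);
        for (int x : new int[] {3, 1, 4, 2}) {
            System.out.println("add " + x + ": " + list.add(x));
        }
        // The first three adds succeed; adding 2 fails because the capacity is 3.
    }
}
```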

Recursion
We previously saw how iteration based on for-loops was a natural way to process collections of
items stored in arrays. When items are stored as linked lists, there is no index for each item, and
recursion provides the natural way to process them. The idea is to formulate procedures which
involve at least one step that invokes (or calls) the procedure itself.

We will now look at how to implement two important derived procedures on lists, last and
append, which illustrate how recursion works.

To find the last element of a list l we can simply keep removing the first remaining item until only
one is left. This algorithm can be written in pseudocode as:
last(l) {
    if ( isEmpty(l) )
        error('Error: empty list in procedure last.')
    elseif ( isEmpty(rest(l)) )
        return first(l)
    else
        return last(rest(l))
}

The running time of this depends on the length of the list, and is proportional to that length, since
last is called as often as there are elements in the list. We say that the procedure has linear time
complexity, that is, if the length of the list is increased by some factor, the execution time is
increased by the same factor. Compared with the constant time complexity of accessing the last
element of an array, this is quite bad. It does not mean, however, that lists are inferior to arrays
in general; it just means that lists are not the ideal data structure when a program has to access the
last element of a long list very often.
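Here is a direct Java translation of the pseudocode for last, as a sketch. It assumes null represents the empty list, and the error becomes an exception. One caveat: standard Java implementations do not eliminate tail calls, so every recursive call uses stack space. A very long list could therefore cause a StackOverflowError. A loop that follows the rest pointers would avoid this and keep the same linear running time.

```java
public class LastDemo {
    // A non-empty list cell; null is used for the empty list.
    static final class Node {
        final int first;
        final Node rest;

        Node(int first, Node rest) {
            this.first = first;
            this.rest = rest;
        }
    }

    static boolean isEmpty(Node l) {
        return l == null;
    }

    // Recursive translation of last(l) from the pseudocode.
    static int last(Node l) {
        if (isEmpty(l)) {
            throw new IllegalArgumentException("Error: empty list in procedure last.");
        } else if (isEmpty(l.rest)) {
            return l.first;      // only one item left: it is the last one
        } else {
            return last(l.rest); // one call per element, hence linear time
        }
    }

    public static void main(String[] args) {
        Node l = new Node(3, new Node(1, new Node(4, new Node(2, new Node(5, null)))));
        System.out.println(last(l)); // prints 5
    }
}
```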

Another useful procedure allows us to append one list l2 to another list l1. Again, this needs to
be done one item at a time, and that can be accomplished by repeatedly taking the first remaining
item of l1 and adding it to the front of the remainder appended to l2:
append(l1,l2) {
    if ( isEmpty(l1) )
        return l2
    else
        return MakeList(first(l1),append(rest(l1),l2))
}

The time complexity of this procedure is proportional to the length of the first list, l1, since we
have to call append as often as there are elements in l1.
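Here is a sketch of append in Java, under the same assumption that null represents the empty list. The methods fromArray and show are hypothetical helpers, added only for building and printing lists. The sketch also makes the complexity claim visible: only the cells of l1 are copied, one per recursive call, while l2 is reused unchanged as the tail of the result.

```java
public class AppendDemo {
    // A non-empty list cell; null is used for the empty list.
    static final class Node {
        final int first;
        final Node rest;

        Node(int first, Node rest) {
            this.first = first;
            this.rest = rest;
        }
    }

    // Recursive translation of append(l1, l2): copies each cell of l1
    // and reuses l2 as the tail of the result.
    static Node append(Node l1, Node l2) {
        if (l1 == null) {
            return l2;
        }
        return new Node(l1.first, append(l1.rest, l2)); // MakeList(first(l1), ...)
    }

    // Helper (hypothetical): build a list from the given items, back to front.
    static Node fromArray(int... xs) {
        Node l = null;
        for (int i = xs.length - 1; i >= 0; i--) {
            l = new Node(xs[i], l);
        }
        return l;
    }

    // Helper (hypothetical): render a list as [x, y, ...].
    static String show(Node l) {
        StringBuilder sb = new StringBuilder("[");
        for (Node c = l; c != null; c = c.rest) {
            sb.append(c.first);
            if (c.rest != null) {
                sb.append(", ");
            }
        }
        return sb.append("]").toString();
    }

    public static void main(String[] args) {
        Node l1 = fromArray(3, 1);
        Node l2 = fromArray(4, 2, 5);
        System.out.println(show(append(l1, l2))); // prints [3, 1, 4, 2, 5]
        System.out.println(show(l1));             // prints [3, 1]: l1 is unchanged
    }
}
```

Because the result shares the cells of l2, destructively changing l2 afterwards (for example with replaceFirst) would also change the appended list.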
