Emilia Mendes · Nile Mosley (Eds.)
Web Engineering
With 143 Figures and 70 Tables
Editors
Emilia Mendes
Computer Science Department
University of Auckland
Private Bag 92019
Auckland, New Zealand
emilia@cs.auckland.ac.nz
Nile Mosley
MetriQ (NZ) Ltd.
19A Clairville Crescent
Wai-O-Taiki Bay
Auckland, New Zealand
nile@metriq.biz
Library of Congress Control Number: 2005936101
ACM Computing Classification (1998): D.2, C.4, K.6
ISBN-10 3-540-28196-7 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-28196-2 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting,
reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9,
1965, in its current version, and permission for use must always be obtained from Springer. Violations
are liable for prosecution under the German Copyright Law.
Springer is a part of Springer Science+Business Media
springeronline.com
© Springer-Verlag Berlin Heidelberg 2006
Printed in Germany
The use of general descriptive names, registered names, trademarks, etc. in this publication does not
imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
Typeset by the authors using a Springer TEX macro package
Production: LE-TEX Jelonek, Schmidt & Vöckler GbR, Leipzig
Cover design: KünkelLopka Werbeagentur, Heidelberg
Printed on acid-free paper
45/3142/YL - 5 4 3 2 1 0
To: Pai, Mãe; Ma, Pa
Preface
Since its original inception the Web has changed into an environment employed for the delivery of many different types of applications, ranging from small-scale information-dissemination-like applications, typically developed by writers and artists, to large-scale commercial, enterprise-planning and scheduling, and collaborative-work applications. Numerous current Web applications are fully functional systems that provide business-to-customer and business-to-business e-commerce, with numerous services to numerous users.
As the reliance on larger and more complex Web applications increases
so does the need for using methodologies/standards/best practice guidelines to develop applications that are delivered on time, within budget,
have a high level of quality and are easy to maintain. To develop such applications Web development teams need to use sound methodologies, systematic techniques, quality assurance, rigorous, disciplined and repeatable
processes, better tools, and baselines. Web engineering aims to meet such
needs.
The focus of this book is to provide its audience with the fundamental
concepts necessary to better engineer Web applications, and also present a
set of case studies where these concepts are applied to real industrial scenarios. Chapter 1 provides an introduction to Web engineering and discusses its differences and similarities to software engineering. Ten chapters are used to introduce concepts (e.g. cost estimation, productivity
assessment, usability measurement) and details on how to apply each concept to a practical situation. Another three chapters provide readers with
introductions to statistical techniques and empirical methods.
No other book on the market examines Web engineering in such breadth and with such a practical emphasis. In terms of its audience, this book is of considerable benefit to Web practitioners and graduate students. Practitioners can immediately grasp the usefulness and benefits of Web engineering principles, since all the case studies describe real situations that may well be similar to their own practices. Graduate students and researchers are given a great opportunity to study Web engineering and to see its application in concrete examples.
Table of Contents
1 The Need for Web Engineering: An Introduction
  1.1 Introduction
  1.2 Web Applications Versus Conventional Software
    1.2.1 Web Hypermedia, Web Software, or Web Application?
    1.2.2 Web Development vs. Software Development
  1.3 The Need for an Engineering Approach
  1.4 Empirical Assessment
  1.5 Conclusions
  Acknowledgements
  References
  Authors’ Biographies

2 Web Effort Estimation
  2.1 Introduction
  2.2 Effort Estimation Techniques
    2.2.1 Expert Opinion
    2.2.2 Algorithmic Techniques
    2.2.3 Artificial Intelligence Techniques
  2.3 Measuring Effort Prediction Power and Accuracy
    2.3.1 Measuring Predictive Power
    2.3.2 Measuring Predictive Accuracy
  2.4 Which Is the Most Accurate Prediction Technique?
  2.5 Case Study
    2.5.1 Data Validation
    2.5.2 Variables and Model Selection
    2.5.3 Extraction of Effort Equation
    2.5.4 Model Validation
  2.6 Conclusions
  References
  Authors’ Biographies

3 Web Productivity Measurement and Benchmarking
  3.1 Introduction
  3.2 Productivity Measurement Method
  3.3 Case Study
    3.3.1 Productivity Measure Construction
    3.3.2 Productivity Analysis
  3.4 Conclusions
  References
  Acknowledgements
  Authors’ Biographies

4 Web Quality
  4.1 Introduction
  4.2 Different Perspectives of Quality
    4.2.1 Standards and Quality
    4.2.2 Quality Versus Quality in Use
    4.2.3 Quality and User Standpoints
    4.2.4 What is Web Quality?
  4.3 Evaluating Web Quality using WebQEM
    4.3.1 Quality Requirements Definition and Specification
    4.3.2 Elementary Measurement and Evaluation
    4.3.3 Global Evaluation
    4.3.4 Conclusions and Recommendations
    4.3.5 Automating the Process using WebQEM_Tool
  4.4 Case Study: Evaluating the Quality of Two Web Applications
    4.4.1 External Quality Requirements
    4.4.2 Designing and Executing the Elementary Evaluation
    4.4.3 Designing and Executing the Partial/Global Evaluation
    4.4.4 Analysis and Recommendations
  4.5 Concluding Remarks
  Acknowledgements
  References
  Authors’ Biographies

5 Web Usability: Principles and Evaluation Methods
  5.1 Introduction
    5.1.1 Usability in the Software Lifecycle
    5.1.2 Chapter Organisation
  5.2 Defining Web Usability
    5.2.1 Usability and Accessibility
  5.3 Web Usability Criteria
    5.3.1 Content Visibility
    5.3.2 Ease of Content Access
    5.3.3 Ease of Content Browsing
  5.4 Evaluation Methods
    5.4.1 User Testing
    5.4.2 Inspection Methods
  5.5 Automatic Tools To Support Evaluations
  5.6 Evaluation of the DEI Application
    5.6.1 Design Inspection
    5.6.2 Web Usage Analysis
  5.7 Concluding Remarks
  References
  Authors’ Biographies

6 Web System Reliability and Performance
  6.1 Introduction
  6.2 Web Application Services
    6.2.1 Web Resource Classification
    6.2.2 Web Application’s Bearing on System Resources
    6.2.3 Workload Models and Performance Requirements
  6.3 Applications Predominantly Dynamic
    6.3.1 Dynamic Request Service
    6.3.2 Software Technologies for the Application Logic
    6.3.3 System Platforms
  6.4 Testing Loop Phase
    6.4.1 Representation of the Workload Model
    6.4.2 Traffic Generation
    6.4.3 Data Collection and Analysis
  6.5 Performance Improvements
    6.5.1 System Tuning
    6.5.2 System Scale-up
    6.5.3 System Scale-out
  6.6 Case Study
    6.6.1 Service Characterisation and Design
    6.6.2 Testing Loop Phase
    6.6.3 System Consolidation and Performance Improvement
  6.7 Conclusions
  Acknowledgements
  References
  Authors’ Biographies

7 Web Application Testing
  7.1 Introduction
  7.2 Web Application Testing: Challenges and Perspectives
    7.2.1 Testing the Non-functional Requirements of a Web Application
    7.2.2 Testing the Functional Requirements of a Web Application
  7.3 Web Application Representation Models
  7.4 Unit, Integration and System Testing of a Web Application
    7.4.1 Unit Testing
    7.4.2 Integration Testing
    7.4.3 System Testing
  7.5 Strategies for Web Application Testing
    7.5.1 White Box Strategies
    7.5.2 Black Box Strategies
    7.5.3 Grey Box Testing Strategies
    7.5.4 User Session Based Testing
  7.6 Tools for Web Application Testing
  7.7 A Practical Example of Web Application Testing
  7.8 Conclusions
  References
  Authors’ Biographies

8 An Overview of Process Improvement in Small Settings
  8.1 Introduction
    8.1.1 Why Do Organisations Initiate SPI Efforts?
    8.1.2 Process Improvement Cycle
    8.1.3 Process Assessments
  8.2 Implementation in Small Settings
    8.2.1 Availability of Funds
    8.2.2 Resources For Process Improvement
    8.2.3 Process Model
    8.2.4 Training
    8.2.5 Relevance of Practices in Assessment Models
    8.2.6 Changing Behaviour
    8.2.7 Piloting Practices
    8.2.8 Where To Start
  8.3 Conclusions
  References
  Author’s Biography

9 Conceptual Modelling of Web Applications: The OOWS Approach
  9.1 Introduction
  9.2 A Method to Model Web Applications
    9.2.1 OO-Method Conceptual Modelling
    9.2.2 OOWS: Extending Conceptual Modelling to Web Environments
  9.3 A Strategy To Develop the Web Solution
  9.4 Case Study: Valencia CF Web Application
  9.5 Conclusions
  References
  Authors’ Biographies

10 Model-Based Web Application Development
  10.1 The OOHDM approach – An Overview
    10.1.1 Requirements Gathering
    10.1.2 Conceptual Design
    10.1.3 Navigational Design
    10.1.4 Abstract Interface Design
    10.1.5 Implementation
  10.2 Building an Online CD Store with OOHDM
    10.2.1 Requirements Gathering
    10.2.2 Conceptual Modelling
    10.2.3 Navigation Design
    10.2.4 Abstract Interface Design
  10.3 From Design to Implementation
  10.4 Discussion and Lessons Learned
  10.5 Concluding Remarks
  Acknowledgements
  References
  Authors’ Biography

11 W2000: A Modelling Notation for Complex Web Applications
  11.1 Introduction
  11.2 Modelling Elements
  11.3 Models
    11.3.1 Adaptability
    11.3.2 Tool Support
  11.4 Example Application
    11.4.1 Information Model
    11.4.2 Navigation Model
    11.4.3 Presentation Model
    11.4.4 Service Model
  11.5 Conclusions and Future Work
  References
  Authors’ Biographies

12 What You Need To Know About Statistics
  12.1 Describing Individual Variables
    12.1.1 Types of Variables
    12.1.2 Descriptive Statistics
  12.2 The Normal Distribution
  12.3 Overview of Sampling Theory
  12.4 Other Probability Distributions
  12.5 Identifying Relationships in the Data
    12.5.1 Chi-Square Test for Independence
    12.5.2 Correlation Analysis
    12.5.3 Regression Analysis
    12.5.4 Analysis of Variance (ANOVA)
    12.5.5 Comparing Two Estimation Models
    12.5.6 Final Comments
  Author’s Biography

13 Empirical Research Methods in Web and Software Engineering
  13.1 Introduction
  13.2 Overview of Empirical Methods
  13.3 Empirical Methods in an Improvement Context
  13.4 Controlled Experiments
    13.4.1 Introduction
    13.4.2 Design
    13.4.3 Operation
    13.4.4 Analysis and Interpretation
  13.5 Case Study
    13.5.1 Introduction
    13.5.2 Case Study Arrangements
    13.5.3 Confounding Factors and Other Aspects
  13.6 Survey
    13.6.1 Survey Characteristics
    13.6.2 Survey Purposes
    13.6.3 Data Collection
  13.7 Post-mortem Analysis
  13.8 Summary
  References
  Authors’ Biographies
List of Contributors
Abrahão, S., Assistant Professor
Department of Information Systems and Computation,
Valencia University of Technology
Camino de Vera, 46071 Valencia
Spain
Andreolini, M., Dr.
Dipartimento di Ingegneria dell'Informazione,
Università di Modena e Reggio Emilia
Via Vignolese 905
41100 Modena, Italy
Baresi, L., Associate Professor
Dipartimento di Elettronica e Informazione,
Politecnico di Milano, Italy
Via Giuseppe Ponzio, 34/5 - 20133 Milano
Italy
Carughi, G.T., BEng.
Dipartimento di Elettronica e Informazione,
Politecnico di Milano
Via Giuseppe Ponzio, 34/5 - 20133 Milano
Italy
Colajanni, M., Professor
Dipartimento di Ingegneria dell'Informazione,
Università di Modena e Reggio Emilia
Via Vignolese 905
41100 Modena, Italy
Colazzo, S.
HOC- Hypermedia Open Center,
Politecnico di Milano
Via Ponzio 34/5, 20133 Milano
Italy
Counsell, S., Dr.
School of Information Systems and Computing,
Brunel University
St John's Building, Uxbridge, UB8 3PH
UK
Covella, G., Assistant Professor
Engineering Faculty,
National University of La Pampa
Calle 9 esq. 110 – (6360) General Pico, La Pampa
Argentina
Di Lucca, G.A., Dr.
RCOST-Research Centre on Software Technology
Department of Engineering,
University of Sannio
Via Traiano Palazzo ex Poste, 82100 Benevento
Italy
El-Emam, K., Associate Professor
Faculty of Medicine, University of Ottawa,
CHEO RI 401 Smyth Road, Ottawa, Ontario K1H 8L1
Canada
Fasolino, A.R., Associate Professor
Dep. 'Informatica and Sistemistica',
University of Naples Federico II
Via Claudio 21, 80125 Naples
Italy
Fons, J., Assistant Professor
Department of Information Systems and Computation,
Valencia University of Technology
Camino de Vera, 46071 Valencia
Spain
Henningsson, K., Dr.
Dept. of Systems and Software Engineering,
School of Engineering
Blekinge Institute of Technology, Box 520, SE-372 25 Ronneby
Sweden
Höst, M., Dr.
Dept. of Communication Systems,
Lund Institute of Technology
Lund University, Box 118, SE-221 00 Lund
Sweden
Kitchenham, B.A., Professor
National ICT Australia,
Locked Bag 9013, Alexandria, NSW 1435,
Australia
Dept. of Computer Science, Keele University
Staffordshire ST5 5BG
UK
Lancellotti, R., Dr.
Dipartimento di Ingegneria dell'Informazione,
Università di Modena e Reggio Emilia
Via Vignolese 905
41100 Modena, Italy
Mainetti, L., Associate Professor
Dip. Elettronica e Informazione,
Politecnico di Milano
Via Ponzio 34/5, 20133 Milano
Italy
Matera, M., Assistant Professor
Dipartimento di Elettronica e Informazione,
Politecnico di Milano
Via Ponzio 34/5, 20133 - Milano
Italy
Maxwell, K., Dr.
Datamax
7 bis bld. Marechal Foch, 77300 Fontainebleau,
France
Mendes, E., Dr.
Department of Computer Science,
The University of Auckland
Science Centre, 38 Princes Street, Auckland
New Zealand
Morasca, S., Professor
Dipartimento di Scienze della Cultura,
Politiche e dell'Informazione
Università degli Studi dell'Insubria
Via Valleggio 11, I-22100 Como
Italy
Mosley, N., Dr.
MetriQ (NZ) Limited
19 Clairville Crescent, Glendowie, Auckland
New Zealand
Olsina, L., Associate Professor
Engineering Faculty,
National University of La Pampa
Calle 9 esq. 110 - (6360) General Pico, La Pampa
Argentina
Pastor, O., Professor
Department of Information Systems and Computation,
Valencia University of Technology
Camino de Vera, 46071 Valencia
Spain
Pelechano, V., Associate Professor
Department of Information Systems and Computation,
Valencia University of Technology
Camino de Vera, 46071 Valencia
Spain
Rizzo, F., Dr.
Human Computer Interaction Laboratory,
Politecnico di Milano
Via Ponzio 34/5, 20133 Milano
Italy
Rossi, G., Professor
LIFIA, National University of La Plata
calle 50 y 115, Primer Piso, La Plata
Argentina
Schwabe, D., Associate Professor
Computer Science Department,
Catholic University of Rio de Janeiro
Rua Marquês de São Vicente, 225 RDC, CEP 22453-900 Gávea,
Rio de Janeiro RJ
Brazil
Wohlin, C., Professor
Dept. of Systems and Software Engineering,
School of Engineering
Blekinge Institute of Technology,
Box 520, SE-372 25 Ronneby
Sweden
1 The Need for Web Engineering:
An Introduction
Emilia Mendes, Nile Mosley, Steve Counsell
Abstract: The objective of this chapter is three-fold. First, it provides an
overview of differences between Web and software development with
respect to their development processes, technologies, quality factors, and
measures. Second, it provides definitions for terms used throughout the
book. Third, it discusses the need for empirical investigations in Web engineering and presents the three main types of empirical investigations –
surveys, case studies, and formal experiments.
Keywords: Web engineering, Empirical Investigation, Case studies, Surveys, Formal experiment, Scientific principles, Engineering.
1.1 Introduction
The World Wide Web (Web) was originally conceived in 1989 as an environment to allow for the sharing of information (e.g. research reports, databases, user manuals) amongst geographically dispersed individuals. The
information itself was stored on different servers and was retrieved by
means of a single user interface (Web browser). The information consisted
primarily of text documents inter-linked using a hypertext metaphor¹ [23].
Since its original inception the Web has changed into an environment
employed for the delivery of many different types of applications. Such
applications range from small-scale information-dissemination-like applications, typically developed by writers and artists, to large-scale commercial,² enterprise-planning and scheduling, collaborative-work applications.
The latter are developed by multidisciplinary teams of people with diverse
skills and backgrounds using cutting-edge, diverse technologies [10,12,23].
Numerous current Web applications are fully functional systems that provide business-to-customer and business-to-business e-commerce, and numerous services to numerous users [23].
¹ http://www.zeltser.com/web-history/.
² The increase in the use of the Web to provide commercial applications has been motivated by several factors, such as the possible increase of an organisation’s competitive position, and the opportunity for small organisations to project their corporate presence in the same way as that of larger organisations [29].
Industries such as travel and hospitality, manufacturing, banking, education, and government have utilised Web-based applications to improve and increase their operations [12]. In addition, the Web allows for the development of corporate intranet Web applications, for use within the boundaries
of their organisations [15]. The remarkable spread of Web applications
into areas of communication and commerce makes it one of the leading
and most important branches of the software industry [23].
To date the development of Web applications has been in general ad hoc,
resulting in poor-quality applications, which are difficult to maintain [22].
The main reasons for such problems are unsuitable design and development
processes, and poor project management practices [11]. A survey on Web-based projects, published by the Cutter Consortium in 2000, revealed a
number of problems with outsourced large Web-based projects [11]:
• 84% of surveyed delivered projects did not meet business needs.
• 53% of surveyed delivered projects did not provide the required functionality.
• 79% of surveyed projects presented schedule delays.
• 63% of surveyed projects exceeded their budget.
As the reliance on larger and more complex Web applications increases
so does the need for using methodologies/standards/best practice guidelines to develop applications that are delivered on time, within budget,
have a high level of quality and are easy to maintain [29,27,20]. To develop such applications Web development teams need to use sound methodologies, systematic techniques, quality assurance, rigorous, disciplined
and repeatable processes, better tools, and baselines. Web engineering³
aims to meet such needs [12].
Web engineering is described as [21]:
“the use of scientific, engineering, and management principles and systematic approaches with the aim of successfully developing, deploying
and maintaining high quality Web-based systems and applications”.
This is a similar definition to that used to describe software engineering;
however, both disciplines differ in many ways. Such differences are discussed in Sect. 1.2.
Section 1.3 discusses the need for an engineering approach, and Sect. 1.4 provides an introduction to measurement principles and three widely used methods of investigation – surveys, case studies, and formal experiments [7]. Finally, conclusions are presented in Sect. 1.5.
³ The term “Web engineering” was first published in 1996 in a conference paper by Gellersen et al. [9]. Since then this term has been cited in numerous publications, and numerous activities devoted to discussing Web engineering have taken place (e.g. workshops, conference tracks, entire conferences).
1.2 Web Applications Versus Conventional Software
An overview of differences between Web and software development with
respect to their development processes, technologies, quality factors, and
measures is presented here. In addition, this section also provides definitions for terms used throughout the book (e.g. Web application).
1.2.1 Web Hypermedia, Web Software, or Web Application?
The Web is the best known example of a hypermedia system. To date,
numerous organisations world-wide have developed a vast array of commercial and/or educational Web applications. The Web literature uses numerous synonyms for a Web application, such as Web site, Web system,
Internet application. The IEEE Std 2001-2002 uses the term Web site defined as [17]:
“A collection of logically connected Web pages managed as a single
entity.”
However, using Web site and Web application interchangeably does not
allow one to differentiate between the physical storage of Web pages and
their application domains.
The Web has been used as the delivery platform for three types of applications: Web hypermedia applications, Web software applications, and
Web applications [4].
• Web hypermedia application – a non-conventional application characterised by the authoring of information using nodes (chunks of information), links (relations between nodes), anchors, access structures (for navigation), and delivery over the Web. Technologies commonly used for developing such applications are HTML, XML, JavaScript, and multimedia. In addition, typical developers are writers, artists, and organisations who wish to publish information on the Web and/or CD-ROM without the need to know programming languages such as Java. These applications have unlimited potential in areas such as software engineering, literature, education, and training.
• Web software application – a conventional software application that relies on the Web or uses the Web's infrastructure for execution. Typical applications include legacy information systems such as databases, booking systems, knowledge bases, etc. Many e-commerce applications fall into this category. Typically they employ development technologies (e.g. DCOM, ActiveX, etc.), database systems, and development solutions (e.g. J2EE). Developers are in general young programmers fresh from a Computer Science or Software Engineering degree course, managed by a few more senior staff.
• Web application – an application delivered over the Web that combines characteristics of both Web hypermedia and Web software applications.
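The node/link/anchor vocabulary used in the first bullet above is, at bottom, a small data model. The sketch below is our own hypothetical illustration of it in Java (none of the class or field names come from the book): nodes hold chunks of content, links relate a source node to a target via an anchor, and an index serves as a simple access structure for navigation.

import java.util.ArrayList;
import java.util.List;

// A node: a chunk of information delivered over the Web.
class Node {
    final String id;
    final String content;
    final List<Link> outgoingLinks = new ArrayList<>();

    Node(String id, String content) {
        this.id = id;
        this.content = content;
    }
}

// A link: a relation between nodes, triggered from an anchor in the source node.
class Link {
    final String anchorText;
    final Node target;

    Link(String anchorText, Node target) {
        this.anchorText = anchorText;
        this.target = target;
    }
}

// An access structure: an index listing the entry points available for navigation.
class Index {
    final List<Node> entries = new ArrayList<>();
}

public class HypermediaSketch {
    public static void main(String[] args) {
        Node home = new Node("home", "Welcome to the course notes.");
        Node lesson = new Node("lesson1", "Lesson 1: hypertext basics.");
        home.outgoingLinks.add(new Link("Start with Lesson 1", lesson));

        Index toc = new Index();
        toc.entries.add(home);
        toc.entries.add(lesson);

        Link first = home.outgoingLinks.get(0);
        System.out.println(first.anchorText + " -> " + first.target.id);
    }
}

An authoring tool for Web hypermedia applications essentially lets non-programmers populate structures like these, which are then rendered as Web pages connected by hyperlinks.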
1.2.2 Web Development vs. Software Development
Web development and software development differ in a number of areas,
which will be detailed later. However, of these, three such areas seem to
provide the greatest differences and to affect the entire Web development
and maintenance processes. These areas encompass the people involved in
development, the intrinsic characteristics of Web applications, and the
audience for which they are developed.
The development of conventional software remains dominated largely by
IT professionals where a sound knowledge of programming, database design, and project management is necessary. In contrast, Web development
encompasses a much wider variety of developers, such as amateurs with no
programming skills, graphics designers, writers, database experts, and IT
professionals, to name but a few. This is possible as Web pages can be created by anyone without the necessity for programming knowledge [3].
Web applications by default use communications technology and have
multi-platform accessibility. In addition, since they employ a hypermedia
paradigm, they are non-sequential by nature, using hyperlinks to interrelate Web pages and other documents. Therefore, navigation and pluralistic design become important aspects to take into account. Finally, the
multitude of technologies available for developing Web applications
means that developers can build a full spectrum of applications, from a
static simple Web application using HTML to a fully fledged distributed e-commerce application [29]. Conventional software can be developed using several programming languages running on a specific platform, components off the shelf (COTS), etc. It can also use communications technology to connect to and use a database system. However, the speed of implementing new technology is faster for Web development relative to non-Web-based applications.
Web applications are aimed at wide-ranging groups of users. Such groups
may be known ahead of time (e.g. applications available within the boundaries of the intranet). However, it is more often the case that Web applications are devised for an unknown group of users, making the development
of aesthetically pleasing applications more challenging [5]. In contrast,
conventional software applications are generally developed for a known
user group (e.g. department, organisation) making the explicit identification
of target users an easier task.
For the purpose of discussion, we have grouped the differences between
Web and software development into 12 areas, which are as follows:
1. Application Characteristics
2. Primary Technologies Used
3. Approach to Quality Delivered
4. Development Process Drivers
5. Availability of the Application
6. Customers (Stakeholders)
7. Update Rate (Maintenance Cycles)
8. People Involved in Development
9. Architecture and Network
10. Disciplines Involved
11. Legal, Social, and Ethical Issues
12. Information Structuring and Design
(1) Application Characteristics
Web applications are created by integrating numerous distinct elements,
such as fine-grained components (e.g. DCOM, OLE, ActiveX), interpreted
scripting languages, components off the shelf (COTS) (e.g. customised
applications, library components, third-party products), multimedia files
(e.g. audio, video, 3D objects), HTML/SGML/XML files, graphical images, mixtures of HTML and programs, and databases [5,23,26]. Components may be integrated in many different ways and present different quality attributes. In addition, their source code may be proprietary or
unavailable, and may reside on and/or be executed from different remote
computers [23]. Web applications are in the main platform-independent
(although there are exceptions, e.g. OLE, ActiveX) and Web browsers in
general provide similar user interfaces with similar functionality, freeing
users from having to learn distinct interfaces [5]. Finally, a noticeable difference between Web applications and conventional software applications
is in the use of navigational structures. Web applications use a hypermedia
paradigm where content is structured and presented using hyperlinks.
Navigational structures may also need to be customised, i.e. the dynamic
adaptation of content structure, atomic hypermedia components, and presentation styles [8].
Despite the initial attempt by the hypermedia community to develop conventional applications with a hypermedia-like interface, the vast majority of conventional software applications do not employ this technique.
Again in contrast, conventional software applications can also be developed using a wide variety of components (e.g. COTS), generally developed
using conventional programming languages such as C++, Visual Basic, and
Delphi. These applications may also use multimedia files, graphical images,
and databases. It is common that user interfaces are customised depending
on the hardware, operating system, software in use, and the target audience
[5]. There are programming languages on the market (e.g. Java) that are intentionally cross-platform; however, the majority of conventional software applications tend to be monolithic, running on a single operating system.
(2) Primary Technologies Used
Web applications are developed using a wide range of diverse technologies
such as the many flavoured Java solutions (Java servlets, Enterprise JavaBeans, applets, and JavaServer Pages), HTML, JavaScript, XML, UML,
databases, and much more. In addition, there is an increasing use of third-party components and middleware. Since Web technology is an area that
changes quickly, some authors suggest it may be difficult for developers
and organisations to keep up with what is currently available [23].
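As a concrete taste of one of the technologies listed, the following sketch shows a minimal Java servlet. It is our own illustration, assuming the classic javax.servlet API of this era; the class name and request parameter are invented, and the servlet container that maps a URL to the class (configured separately, not shown) is required to run it. The container calls doGet for each request, so the HTML page is generated dynamically rather than served as a static file:

import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// A minimal dynamic page: the servlet container invokes doGet once per request.
public class GreetingServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws IOException {
        // Read a request parameter; fall back to a default if it is absent.
        String name = request.getParameter("name");
        if (name == null) {
            name = "visitor";
        }
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        out.println("<html><body>");
        out.println("<p>Hello, " + name + "!</p>");
        out.println("</body></html>");
    }
}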
The primary technology used to develop conventional software applications is mostly represented by object-oriented methods, generators, and
languages, relational databases, and CASE tools [26]. The pace with which
new technologies are proposed is slower than that for Web applications.
(3) Approach to Quality Delivered
Web companies that operate their business on the Web rely heavily on providing applications and services of high quality so that customers return to
do repeat business. As such, these companies only see a return on investment if customers’ needs have been fulfilled. Customers who use the Web
for obtaining services have very little loyalty to the companies they do business with. This suggests that new companies providing Web applications of
a higher quality will most likely draw customers away from previously established businesses, and that quality is the principal factor that will bring repeat business. For Web development, quality is often considered as
higher priority than time to market, with the mantra “later and better” as the
mission statement for Web companies who wish to remain competitive [23].
Within the context of conventional software development, software contractors are often paid for their delivered application regardless of its quality.
Return on investment is immediate. Ironically, they are also often paid for
fixing defects in the delivered application, where these failures principally
exist because the developer did not test the application thoroughly. This has
the knock-on effect that a customer may end up paying at least twice (release
and fixing defects) the initial bid in order to make the application functional.
Here time to market takes priority over quality since it can be more lucrative
to deliver applications with plenty of defects sooner than high-quality applications later. For these companies the “sooner but worse” rule applies [23].
Another popular mechanism employed by software companies is to fix
defects and make the updated version into a new release, which is then resold to customers, bringing in additional revenue.
(4) Development Process Drivers
The dominant development process drivers for Web companies are composed of three quality criteria [23]:

• Reliability,
• Usability, and
• Security.

Followed by:

• Availability,
• Scalability,
• Maintainability, and
• Time to market.
Reliability: applications that work well, do not crash, do not provide incorrect data, etc.
Usability: an application that is simple to use. If a customer wants to use
a Web application to buy a product on-line, the application should be as
simple to use as the process of physically purchasing that product in a
shop. Many existing Web applications present poor usability despite the
extensive range of Web usability guidelines that have been published. A
Web application with poor usability will quickly be replaced by another
more usable application as soon as its existence becomes known to the
target audience [23].
Security: the handling of customer data and other information securely
so that problems such as financial loss, legal consequences, and loss of
credibility can be avoided [23].
With regards to conventional software development, the development
process driver is time to market and not quality criteria [23].
(5) Availability of the Application
Customers who use the Web expect applications to be operational throughout the whole year (24/7/365). Any downtime, no matter how short, can
be detrimental [23].
Except for a few application domains (e.g. security, safety critical, military, banking) customers of conventional software applications do not expect these applications to be available 24/7/365.
(6) Customers (Stakeholders)
Web applications can be developed for use within the boundaries of a single
organisation (intranet), a number of organisations (extranets), or for use by
people anywhere in the world. The implications are that stakeholders may
come from a wide range of groups where some may be clearly identified
(e.g. employees within an organisation) and some may remain unknown,
which is often the case [23,5,6,28]. As a consequence, Web developers are
regularly faced with the challenge of developing applications for unknown
users, whose expectations (requirements) and behaviour patterns are also
unknown at development time [5]. In this case new approaches and guidelines must be devised to better understand prospective and unknown users
such that quality requirements can be determined beforehand to deliver
high-quality applications [6]. Whenever users are unknown it also becomes
more difficult to provide aesthetically pleasing user interfaces, necessary to
be successful and stand out from the competition [5].
Some stakeholders can reside locally, in another state/province/county,
or overseas. Those who reside overseas may present different social and
linguistic backgrounds, which increases the challenge of developing successful applications [5,28]. Whenever stakeholders are unknown it is also
difficult to estimate the number of users an application will service, so
applications must also be scalable [23].
With regards to conventional software applications, it is usual for stakeholders to be explicitly identified prior to development. These stakeholders
often represent groups confined within the boundaries of departments,
divisions, or organisations [5].
(7) Update Rate (Maintenance Cycles)
Web applications are updated frequently without specific releases and with
maintenance cycles of days or even hours [23]. In addition, their content and
functionality may also change significantly from one moment to another,
and so the concept of project completion may seem unsuitable in such circumstances. Some organisations also allow non-information-systems experts to develop and modify Web applications and in such environments it is
often necessary to provide an overall management of the delivery and modification of applications to avoid confusion [28].
The maintenance cycle for conventional software applications complies
with a more rigorous process. Upon a product’s release software organisations usually initiate a cycle whereby a list of requested changes/adjustments/improvements (either from customers or from its own development
team) is prepared over a set period of time, and later incorporated as a specific version or release for distribution to all customers simultaneously.
This cycle can be as short as a week and as long as several years. It
requires more planning as it often entails other, possibly expensive activities such as marketing, sales, product shipping, and occasionally personal
installation at a customer’s site [12,23].
(8) People Involved in Development
The Web provides a broad spectrum of different types of Web applications,
varying in quality, size, complexity, and technology. This variation is also
applicable to the range of skills represented by those involved in Web development projects. Web applications can be created, for example, by artists and writers using simple HTML code or more likely one of the many
commercially available Web authoring tools (e.g. Macromedia Dreamweaver, Microsoft Frontpage), making the authoring process available to
those with no prior programming experience [28]. However, Web applications can also be very large and complex, requiring a team of people with
diverse skills and experience. Such teams consist of Web designers and
programmers, graphic designers, librarians, database designers, project
managers, network security experts, and usability experts [23].
Web designers and programmers are necessary to implement the application’s functionality using the necessary programming languages and
technology. In particular they also decide on the application’s architecture
and technologies applicable, and design the application taking into account its documents and links [5]. Graphic designers, usability experts, and
librarians provide applications pleasing to the eye, easy to navigate, and
provide good search mechanisms to obtain the required information. This
is often the case where such expertise is outsourced, and used on a project-by-project basis. Large Web applications most likely use database systems for data storage, making it important to have a team member with expertise in database
design and the necessary queries to manipulate the data. Project managers
are responsible for managing the project in a timely manner and allocating
resources adequately such that applications are developed on time, within
budget, and are of high quality. Finally, network security experts provide
solutions for various security aspects [11].
Conversely, the development of conventional software remains dominated by IT professionals where a sound knowledge of programming, database design, and project management is necessary.
(9) Architecture and Network
Web applications range from those using a simple two-tier client–server architecture, represented by Web browsers on client computers connecting to a Web server hosting the Web application, to more sophisticated configurations such as three-tier or even n-tier architectures [23]. The servers
and clients within these architectures represent computers that may have a
different operating system, software, hardware configurations, and may be
connected to each other using different network settings and bandwidth.
The introduction of more than two tiers was motivated by limitations
of the two-tier model (e.g. implementation of an application’s business
logic on the client machine, increased network load as any data processing is only carried out on the client machine). In such architectures the
business logic is moved to a separate server (middle-tier), which services
10
Emilia Mendes, Nile Mosley, Steve Counsell
client requests for data and functionality. The middle-tier then requests
and sends data to and from a (usually) separate database server. In addition, the type of networks used by the numerous stakeholders may be
unknown, so assumptions have to be made while developing these Web
applications [5].
Conventional software applications either run in isolation on a client machine or use a two-tier architecture whenever applications use data from
database systems installed on a separate server. The type of networks used
by the stakeholders is usually known in advance since most conventional
software applications are limited to specific places and organisations [5].
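To make the middle tier described above tangible, the following sketch is our own hypothetical example of business logic placed on a separate server: the client never touches the database directly; it calls this service, which in turn queries a (usually) separate database server over JDBC. The JDBC URL, table, and column names are invented for illustration:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Middle-tier service: holds the business logic and mediates all data access.
public class ProductService {
    private final String dbUrl; // e.g. "jdbc:postgresql://dbhost/shop" (hypothetical)

    public ProductService(String dbUrl) {
        this.dbUrl = dbUrl;
    }

    // Looks up a product's price on the database server; returns -1 if unknown.
    public double lookUpPrice(String productId) throws SQLException {
        String sql = "SELECT price FROM products WHERE id = ?";
        try (Connection conn = DriverManager.getConnection(dbUrl);
             PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setString(1, productId);
            try (ResultSet rs = stmt.executeQuery()) {
                return rs.next() ? rs.getDouble("price") : -1;
            }
        }
    }
}

Because the processing now runs next to the data, only the query parameters and the result cross the network to the client, which addresses the network-load limitation of the two-tier model noted above.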
(10) Disciplines Involved
To develop large and complex Web applications adequately a team of people with a wide range of skills and expertise in different areas is required.
These areas reflect distinct disciplines such as software engineering (development methodologies, project management, tools), hypermedia engineering (linking, navigation), requirements engineering, usability engineering, information engineering, graphics design, and network management
(performance measurement and tuning) [6,11,12].
Building a conventional software application involves contributions
from a smaller number of disciplines than those used for developing Web
applications, such as software engineering, requirements engineering, and
usability engineering.
(11) Legal, Social, and Ethical Issues
The Web as a distributed environment enables a vast amount of structured
(e.g. database records) and unstructured (e.g. text, images, audio) content to
be easily available to a multitude of users worldwide. This is often cited as
one of the greatest advantages of using the Web. However, this environment
is also used for the purpose of dishonest actions, such as copying content
from Web applications without acknowledging the source, distributing information about customers without their consent, infringing copyright and
intellectual property rights, and even, in some instances, identity theft [5].
The consequences that follow from the unlawful use of the Web are that
Web companies, customers, entities (e.g. W3C), and government agencies
must apply a similar paradigm to the Web as those applied to publishing,
where legal, social, and ethical issues are taken into consideration [6].
Issues referring to accessibility offered by Web applications should also
take into account special user groups such as the handicapped [5].
Conventional software applications also share a similar fate to that of
Web applications, although to a smaller extent, since these applications are
not so readily available for such a large community of users, compared to
Web applications.
(12) Information Structuring and Design
As previously mentioned, Web applications present structured and unstructured content, which may be distributed over multiple sites and use different systems (e.g. database systems, file systems, multimedia storage devices) [8]. In addition, the design of a Web application, unlike that of
conventional software applications, includes the organisation of content
into navigational structures by means of hyperlinks. These structures provide users with easily navigable Web applications. Well-designed applications should allow for suitable navigation structures [6], as well as the
structuring of content, which should take into account its efficient and
reliable management [5].
Another difference between Web and conventional applications is that
Web applications often contain a variety of specific file formats for multimedia content (e.g. graphics, sound, and animation). These files must be
integrated into any current configuration management system, and their
maintenance routine also needs to be organised, as it is likely that it will differ from the maintenance routine used for text-based documents [3]. Conventional software applications present structured content that uses file or
database systems. The structuring of such content has been addressed by
software engineering in the past so the methods employed here for information structuring and design are well known by IT professionals [5].
Reifer [26] presents a comparison between Web-based and traditional
approaches that takes into account measurement challenges for project
management (see Table 1.1). Table 1.2 summarises the differences between Web-based and conventional development contexts.
Table 1.1. Comparison between Web-based and traditional approaches

Estimating process
  Web-based approach: Ad-hoc costing of work, centred on input from the developers.
  Traditional approach: More formal costing of work based on past experience from similar projects and expert opinion.

Size estimation
  Web-based approach: No agreement upon a standard size measure for Web applications within the community.
  Traditional approach: Lines of code or function points are the standard size measures used.

Effort estimation
  Web-based approach: Effort is estimated using a bottom-up approach based on input from developers. Hardly any historical data is available from past projects.
  Traditional approach: Effort is estimated using equations built taking into account project characteristics and historical data from past projects.

Quality estimation
  Web-based approach: Quality is difficult to measure. Need for new quality measures specific for Web-based projects.
  Traditional approach: Quality is measurable using known quality measures (e.g. defect rates, system properties).
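To illustrate the “equations built taking into account project characteristics and historical data” in the traditional column, algorithmic cost models of the COCOMO family (a well-known example from the software engineering literature, not taken from this chapter) have the general form

\[ \mathrm{Effort} = a \times \mathrm{Size}^{\,b} \]

where Size is measured in, for instance, thousands of delivered lines of code (KLOC) or function points, and the coefficients a and b are calibrated against historical project data; basic COCOMO’s organic mode, for example, uses Effort = 2.4 × KLOC^1.05 person-months. The absence of an agreed size measure and of historical data, noted in the Web-based column, is precisely what blocks this style of estimation for Web projects.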
Table 1.2. Web-based versus traditional approaches to development

Application characteristics
  Web-based approach: Integration of numerous distinct components (e.g. fine-grained, interpreted scripting languages, COTS, multimedia files, HTML/SGML/XML files, databases, graphical images), distributed, cross-platform applications, and structuring of content using navigational structures with hyperlinks.
  Traditional approach: Integration of distinct components (e.g. COTS, databases, graphical images), monolithic single-platform applications.

Primary technologies used
  Web-based approach: Variety of Java solutions (Java servlets, Enterprise JavaBeans, applets, and JavaServer Pages), HTML, JavaScript, XML, UML, databases, third-party components and middleware, etc.
  Traditional approach: Object-oriented methods, generators and languages, relational databases, and CASE tools.

Approach to quality delivered
  Web-based approach: Quality is considered of higher priority than time to market.
  Traditional approach: Time to market takes priority over quality.

Development process drivers
  Web-based approach: Reliability, usability, and security.
  Traditional approach: Time to market.

Availability of the application
  Web-based approach: Throughout the whole year (24/7/365).
  Traditional approach: Except for a few application domains, no need for availability 24/7/365.

Customers (stakeholders)
  Web-based approach: Wide range of groups, known and unknown, residing locally or overseas.
  Traditional approach: Generally groups confined within the boundaries of departments, divisions, or organisations.

Update rate (maintenance cycles)
  Web-based approach: Frequent, without specific releases; maintenance cycles of days or even hours.
  Traditional approach: Specific releases; maintenance cycles ranging from a week to several years.

People involved in development
  Web-based approach: Web designers and programmers, graphic designers, librarians, database designers, project managers, network security experts, usability experts, artists, writers.
  Traditional approach: IT professionals with knowledge of programming, database design, and project management.

Architecture and network
  Web-based approach: Two-tier to n-tier clients and servers with different network settings and bandwidth, sometimes unknown.
  Traditional approach: One- to two-tier architecture; network settings and bandwidth are likely to be known in advance.

Disciplines involved
  Web-based approach: Software engineering, hypermedia engineering, requirements engineering, usability engineering, information engineering, graphics design, and network management.
  Traditional approach: Software engineering, requirements engineering, and usability engineering.

Legal, social, and ethical issues
  Web-based approach: Content can be easily copied and distributed without permission or acknowledgement of copyright and intellectual property rights. Applications should take into account all groups of users, including the handicapped.
  Traditional approach: Content can also be copied, infringing privacy, copyright, and IP rights, albeit to a smaller extent.

Information structuring and design
  Web-based approach: Structured and unstructured content; use of hyperlinks to build navigational structures.
  Traditional approach: Structured content; seldom use of hyperlinks.
As we have seen, there are several differences between Web and conventional development, and between the applications each produces. However, there are also similarities, which become more evident if we focus on the development of large and complex applications. Both need quality assurance mechanisms, development methodologies, processes, techniques for requirements elicitation, and effective testing and maintenance methods and tools [6].
The next section provides an introduction to the measurement principles used throughout the book; this is followed by an introduction to empirical assessment.
1.3 The Need for an Engineering Approach
Engineering is widely taken as a disciplined application of scientific
knowledge for the solution of practical problems. A few definitions taken
from dictionaries confirm that:
“Engineering is the application of science to the needs of humanity.
This is accomplished through knowledge, mathematics, and practical
experience applied to the design of useful objects or processes.” [30]
“Engineering is the application of scientific principles to practical
ends, as the design, manufacture, and operation of structures and
machines.” [15]
“The profession of applying scientific principles to the design, construction, and maintenance of engines, cars, machines, etc. (mechanical engineering), buildings, bridges, roads, etc. (civil engineering),
electrical machines and communication systems (electrical engineering), chemical plant and machinery (chemical engineering), or aircraft (aeronautical engineering).” [14]
In all of the above definitions, the need for “the application of scientific
principles” has been stressed, where scientific principles are the result of
applying a scientific process [13]. A process in this context means that our
current understanding, i.e. our theory of how best to develop, deploy, and
maintain high-quality Web-based systems and applications, may be modified or replaced as new evidence is found through the accumulation of
data and knowledge. This process is illustrated in Fig. 1.1 and described
below [13]:
• Observation: To observe or read about a phenomenon or set of facts. In most cases the motivation for such observation is to identify cause and effect relationships between observed items, since these entail predictable results. For example, we can observe that an increase in the development of new Web pages seems also to increase the corresponding development effort.

• Hypothesis: To formulate a hypothesis represents an attempt to explain an Observation. It is a tentative theory or assumption that is believed to explain the behaviour under investigation [7]. The items that participate in the Observation are represented by variables (e.g. number of new Web pages, development effort) and the hypothesis indicates what is expected to happen to these variables (e.g. there is a linear relationship between number of Web pages and development effort, showing that as the number of new Web pages increases so does the effort to develop these pages). These variables first need to be measured and to do so we need an underlying measurement theory.

• Prediction: To predict means to forecast the results that should be found if the rationale used in the hypothesis formulation is correct (e.g. Web applications with a larger number of new Web pages will use a larger development effort).

• Validation: To validate requires experimentation to provide evidence either to support or refute the initial hypothesis. If the evidence refutes the hypothesis then the hypothesis should be revised or replaced. If the evidence is in support of the hypothesis, then many more replications of the experiment need to be carried out in order to build a better understanding of how variables relate to each other and their cause and effect relationships.
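To make these four steps concrete, consider the following minimal sketch (in Python, with invented illustrative data rather than data from any study cited here). It encodes the example hypothesis above, a linear relationship between number of new Web pages and development effort, derives a prediction from it, and leaves validation to the later comparison with actual effort.

import numpy as np

# Invented observations: number of new Web pages and actual development
# effort (person hours) for six hypothetical projects.
pages  = np.array([10, 25, 40, 55, 70, 85])
effort = np.array([32, 70, 118, 160, 205, 251])

# Hypothesis: effort = a * pages + b (a linear relationship).
a, b = np.polyfit(pages, effort, 1)

# Prediction: a project with more new pages should require more effort.
new_pages = 100
predicted = a * new_pages + b
print(f"effort = {a:.2f} * pages + {b:.2f}; predicted({new_pages}) = {predicted:.1f}")

# Validation (toy version): compare the prediction with the actual effort
# observed once the project finishes; repeated disagreement would force the
# hypothesis to be revised or replaced.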
The scientific process supports knowledge building, which in turn involves the use of empirical studies to test hypotheses previously proposed, and to check whether the current understanding of the discipline is correct. Experimentation in Web engineering is therefore essential [1,2].
Fig. 1.1. The scientific process (Observation leads to Hypothesis, Hypothesis to Prediction, and Prediction to Validation; if the hypothesis is not validated the cycle returns to Hypothesis, otherwise a Theory results)
The extent to which scientific principles are applied to developing and
maintaining Web applications varies among organisations. More mature
organisations generally apply these principles to a larger extent than less
mature organisations, where maturity reflects an organisation’s current
development processes [7]. Some organisations have clearly defined processes that remain unchanged regardless of the people who work on the
projects. For such organisations, success is dictated by following a well-defined process, where feedback is constantly obtained using product,
process and resource measures. Other organisations have processes that are
not so clearly defined (ad hoc) and therefore the success of a project is
often determined by the expertise of the development team. In such a scenario product, process, and resource measures are rarely used and each
project represents a potential risk that may lead an organisation, if it gets it
wrong, to bankruptcy [25].
The variables used in the formulation of hypotheses represent the attributes of real-world entities that we observe. An entity represents a process,
product, or resource. A process is defined as a software-related activity.
Examples of processes are Web development, Web maintenance, Web
design, Web testing, and Web project. A product is defined as an artefact,
deliverable, or document that results from a process activity. Examples of
products are Web application, design document, testing scripts, and fault
reports. Finally, a resource represents an entity required by a process activity. Examples of resources are Web developers, development tools, and
programming languages [7].
In addition, for each entity’s attribute that is to be measured, it is also useful to identify whether the attribute is internal or external. Internal attributes can
be measured by examining the product, process, or resource on its own,
separate from its behaviour. External attributes can only be measured with
respect to how the product, process, or resource relates to its environment
[7]. For example, usability is in general an external attribute since its measurement often depends upon the interaction between user and application.
The classification of entities applied to the case study in Chap. 2 is presented in Table 1.3.
The measurement of an entity’s attributes generates quantitative descriptions of key processes, products, and resources, enabling us to understand
behaviour and results. This understanding lets us select better techniques and
tools to control and improve our processes, products, and resources [24].
The measurement theory that has been adopted in this book is the representational theory of measurement [7]. It drives the definition of measurement scales, presented in Chap. 12, and the measures presented in all remaining chapters.
Table 1.3. Classification of process, product, and resources for the Tukutuku4 dataset

PROCESS ENTITIES

PROJECT
  TYPEPROJ: Type of project (new or enhancement).
  LANGS: Implementation languages used.
  DOCPROC: If project followed defined and documented process.
  PROIMPR: If project team involved in a process improvement programme.
  METRICS: If project team part of a software metrics programme.
  DEVTEAM: Size of project’s development team.

WEB DEVELOPMENT
  TOTEFF: Actual total effort used to develop the Web application.
  ESTEFF: Estimated total effort necessary to develop the Web application.
  ACCURACY: Procedure used to record effort data.

PRODUCT ENTITY

WEB APPLICATION
  TYPEAPP: Type of Web application developed.
  TOTWP: Total number of Web pages (new and reused).
  NEWWP: Total number of new Web pages.
  TOTIMG: Total number of images (new and reused).
  NEWIMG: Total number of new images your company created.
  HEFFDEV: Minimum number of hours to develop a single function/feature by one experienced developer that is considered high (above average).
  HEFFADPT: Minimum number of hours to adapt a single function/feature by one experienced developer that is considered high (above average).
  HFOTS: Number of reused high-effort features/functions without adaptation.
  HFOTSA: Number of adapted high-effort features/functions.
  HNEW: Number of new high-effort features/functions.
  FOTS: Number of low-effort features off the shelf.
  FOTSA: Number of low-effort features off the shelf adapted.
  NEW: Number of new low-effort features/functions.

RESOURCE ENTITY

DEVELOPMENT TEAM
  TEAMEXP: Average team experience with the development language(s) employed.

4 The Tukutuku project collects data on industrial Web projects, for the development of effort estimation models and to benchmark productivity across and within Web companies. See http://www.cs.auckland.ac.nz/tukutuku.
1.4 Empirical Assessment
Validating a hypothesis or research question encompasses experimentation, which is carried out using an empirical investigation. Investigations
can be organised as a survey, case study or formal experiment [7].
• Survey: a retrospective investigation of an activity in order to confirm relationships and outcomes [7]. It is also known as “research-in-the-large” as it often samples over large groups of projects. A survey should always be carried out after the activity under focus has occurred [18]. When performing a survey, a researcher has no control over the situation at hand, i.e. the situation can be documented, compared to other similar situations, but none of the variables being investigated can be manipulated [7]. Within the scope of software and Web engineering, surveys are often used to validate the response of organisations and developers to a new development method, tool, or technique, or to reveal trends or relationships between relevant variables [7]. For example, a survey can be used to measure the success of changing from Sun’s J2EE to Microsoft’s ASP.NET throughout an organisation, because it can gather data from numerous projects. The downside of surveys is time. Gathering data can take many months or even years, and the outcome may only be available after several projects have been completed [18].
• Case study: an investigation that examines the trends and relationships using as its basis a typical project within an organisation. It is also known as “research-in-the-typical” [18]. A case study can investigate a retrospective event, but this is not the usual trend. A case study is the type of investigation of choice when wishing to examine an event that has not yet occurred and for which there is little or no control over the variables. For example, if an organisation wants to investigate the effect of an object-oriented language on the quality of the resulting Web application but cannot develop the same project using numerous object-oriented languages simultaneously, then the investigative choice is to use a case study. If the quality of the resulting Web application is higher than the organisation’s baseline, it may be due to many different reasons (e.g. chance, or perhaps bias from enthusiastic developers). Even if the object-oriented language had a legitimate effect on quality, no conclusions outside the boundaries of the case study can be drawn, i.e. the results of a case study cannot be generalised to every possible situation. Had the same application been developed several times, each time using a different object-oriented language5 (as a formal experiment), then it would be possible to have a better understanding of the relationship between language and quality, given that these variables were controlled. A case study samples from the variables, rather than over them. This means that, in relation to the variable object-oriented language, the value chosen will be the one that represents the object-oriented language usually used on most projects (e.g. J2EE). A case study is easier to plan than a formal experiment, but its results are harder to explain and, as previously mentioned, cannot be generalised outside the scope of the study [18].

5 The values for all other attributes should remain the same (e.g. developers, programming experience, development tools, computing power, type of application).
• Formal experiment: rigorous and controlled investigation of an event where important variables are identified and manipulated such that their effect on the outcome can be validated [7]. It is also known as “research-in-the-small” since it is very difficult to carry out formal experiments in software and Web engineering using numerous projects and resources. A formal experiment samples over the variable that is being manipulated, such that all possible variable values are validated, i.e. there is a single case representing each possible situation. If we use the same example used for case studies above, this means that several projects would be developed, each using a different object-oriented programming language. If one aims to obtain results that are largely applicable across various types of projects and processes, then the choice of investigation is a formal experiment. This type of investigation is most suited to the Web engineering research community. Despite the control that needs to be exerted when planning and running a formal experiment, its results cannot be generalised outside the experimental conditions. For example, if an experiment demonstrates that J2EE improves the quality of e-commerce Web applications, one cannot guarantee that J2EE will also improve the quality of educational Web applications [18].
There are other concrete issues related to using a formal experiment or a
case study that may impact the choice of study. It may be feasible to control the variables, but at the expense of a very high cost or high degree of
risk. If replication is possible but at a prohibitive cost, then a case study
should be used [7]. A summary of the characteristics of each type of empirical investigation is given in Table 1.4.
Table 1.4. Summary characteristics of the three types of empirical investigations

Scale
  Survey: Research-in-the-large.
  Case study: Research-in-the-typical.
  Formal experiment: Research-in-the-small.

Control
  Survey: No control.
  Case study: Low level of control.
  Formal experiment: High level of control.

Replication
  Survey: No.
  Case study: Low.
  Formal experiment: High.

Generalisation
  Survey: Results representative of sampled population.
  Case study: Only applicable to other projects of similar type and size.
  Formal experiment: Can be generalised within the experimental conditions.
There is a set of steps broadly common to all three types of investigations, and these are described below.
Define the Goal(s) of Your Investigation and Its Context
Goals are crucial for the success of all activities in an investigation. Thus,
it is important to allow enough time to fully understand and set the goals
so that each is clear and measurable. Goals represent the research questions, which may also be presented by a number of hypotheses. By setting
the research questions or hypotheses it becomes easier to identify the dependent and independent variables for the investigation [7]. A dependent
variable is a variable whose behaviour we want to predict or explain. An
independent variable is believed to have a causal relationship with, or have
influence upon, the dependent variable [31]. Goals also help determine
what the investigation will do, and what data is to be collected. Finally, by
understanding the goals we can also confirm if the type of investigation
chosen is the most suitable type to use [7].
Each hypothesis of an investigation will later be either supported or rejected. An example of hypotheses is given below [31]:
H0 Using J2EE produces the same quality of Web applications as using
ASP.NET.
H1 Using J2EE produces a different quality of Web applications than using ASP.NET.
H0 is called the null hypothesis, and assumes the quality of Web applications developed using J2EE is similar to that of Web applications developed using ASP.NET. In other words, it assumes that data samples for
both come from the same population. In this instance, we have two samples, one representing quality values for Web applications developed using
J2EE, and the other, quality values for Web applications developed using
ASP.NET. Here, quality is our dependent variable, and the choice of programming framework (e.g. J2EE or ASP.NET), the independent variable.
H1 is called the alternative or research hypothesis, and represents what is
believed to be true if the null hypothesis is false. The alternative hypothesis assumes that samples do not come from the same sample population.
Sometimes the direction of the relationship between dependent and independent variables is also presented as part of an alternative hypothesis. If
H1 also suggested a direction for the relationship, it could be described as:
H1 Using J2EE produces a better quality of Web applications than using
ASP.NET.
To confirm H1 it is first necessary to reject the null hypothesis and, second, show that quality values for Web applications developed using J2EE are significantly higher than quality values for Web applications developed using ASP.NET.
We have presented both null and alternative hypotheses since they are
both equally important when presenting the results of an investigation,
and, as such, both should be documented.
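As a concrete illustration, the sketch below (Python with SciPy; the quality scores are invented for illustration only) tests H0 against the directional H1 using the non-parametric Mann-Whitney U test, a common choice when normality of the samples cannot be assumed.

from scipy.stats import mannwhitneyu

# Invented quality scores (higher is better) for Web applications
# developed with each framework.
quality_j2ee   = [7.2, 8.1, 6.9, 8.4, 7.8, 8.0]
quality_aspnet = [6.5, 7.0, 6.8, 7.3, 6.1, 6.9]

# H0: both samples come from the same population.
# Directional H1: J2EE produces better quality than ASP.NET.
stat, p_value = mannwhitneyu(quality_j2ee, quality_aspnet, alternative="greater")

alpha = 0.05
if p_value < alpha:
    print(f"p = {p_value:.3f}: reject H0 in favour of H1")
else:
    print(f"p = {p_value:.3f}: fail to reject H0")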
In addition to defining the goals of an investigation, it is also important
to document the context of the investigation [19]. One suggested way to
achieve this is to provide a table (see Table 1.3) describing the entities,
attributes, and measures that are the focus of the investigation.
Prepare the Investigation
It is important to prepare an investigation carefully to obtain results from
which one can draw valid conclusions, even if these conclusions cannot be
scaled up. For case studies and formal experiments it is important to define
the variables that can influence the results, and once defined, decide how
much control one can have over them [7].
Consider the following case study which would represent a poorly prepared investigation.
The case study aims to investigate, within a given organisation, the effect of using the programming framework J2EE on the quality of the resulting Web application. Most Web projects in this organisation are developed using ASP.NET, and all the development team has experience with this framework. The type of application representative of the majority of applications this organisation undertakes is in electronic commerce (e-commerce), and a typical development team has two developers. Therefore, as part of the case study, an e-commerce application is to be developed using J2EE by two developers. Because we have stated this is a poorly prepared case study, we will assume that no other variables have been considered or measured (e.g. developers’ experience, development environment).
The e-commerce application is developed, and the results of the case
study show that the quality of the delivered application, measured as the
number of faults per Web page, is worse than that for the other similar
Web applications developed using ASP.NET. When questioned as to why these results were obtained, the investigator seemed puzzled and could not offer a clear explanation.
What is missing?
The investigator should have anticipated that other variables could also have an effect on the results of an investigation, and should therefore have been taken into account. One such variable is developers’ programming experience. Without measuring experience prior to the case study, it is impossible to discern whether the lower quality is due to J2EE or to the effects of learning J2EE as the investigation proceeds. It is possible that one or both
developers did not have experience with J2EE, and lack of experience has
interfered with the benefits of its use.
Variables such as developers’ experience should have been anticipated and, if possible, controlled; otherwise the investigator risks obtaining results that are incorrect.
To control a variable is to determine a subset of values for use within
the context of the investigation from the complete set of possible values
for that variable. For example, using the same case study presented above,
if the investigator had measured developers’ experience with J2EE (e.g.
low, medium, high), and was able to control this variable, then (s)he could
have determined that two developers experienced with J2EE should participate in the case study. If there were no developers with experience in
J2EE, two would be selected and trained.
If, when conducting a case study, it is not possible to control certain
variables, they should still be measured, and the results documented.
If, however, all variables are controllable, then the type of investigation
to use is a formal experiment.
Another important issue is to identify the population being studied and
the sampling technique used (see Chap. 12 for further details on sampling).
For example, if a survey was designed to investigate the extent to which
project managers use automatic project management tools, then a data
sample of software programmers is not going to be representative of the
population that has been initially specified.
With formal experiments, it is important to describe the process by
which experimental subjects and objects are selected and assigned to
treatments [19], where a treatment represents the new tool, programming
language, or methodology you want to evaluate. The experimental object,
also known as experimental unit, represents the object to which the treatment is to be applied (e.g. development project, Web application, code).
The control object is one that does not use, or is not affected by, the treatment [7]. In
software and Web engineering it is difficult to have a control in the same
way as in, say, formal medical experiments. For example, if you are investigating the effect of a programming framework on quality, and your
treatment is J2EE, you cannot have a control that is “no programming
framework” [19]. Therefore, many formal experiments use as their control
a baseline representing what is typical in an organisation. Using the example given previously, our control would be ASP.NET since it represents
the typical programming framework used in the organisation. The experimental subject is the “who” applying the treatment [7].
As part of the preparation of an investigation we also include the preparation and validation of data collection instruments. Examples are questionnaires, automatic measurement tools, timing sheets, etc. Each has to
be prepared carefully such that it clearly and unambiguously identifies
what is to be measured. For each variable it is important also to identify
its measurement scale and measurement unit. So, if you are measuring
effort, then you should also document its measurement unit (e.g. person
hours, person months), or else risk obtaining incorrect and conflicting data. It is
also important to document at which stage during the investigation the
data collection takes place. If an investigation gathers data on developers’
programming experience (before they develop a Web application), size
and effort used to design the application, and size and effort used to implement the application, then a diagram, such as the one in Fig. 1.2, may
be provided to all participants to help clarify what instrument(s) to use
and when to use them.
It is usual for instruments to be validated using pilot studies. A pilot
study uses similar conditions to those planned for the real investigation,
such that any possible problems can be anticipated.
Finally, it is also important to document the methods used to reduce any
bias.
Fig. 1.2. Plan detailing when to apply each instrument (the development stages Functional Requirements, Data and Navigation Design, Implementation, Testing, and Evaluation, with questionnaires 1, 2, and 3 applied at the 1st, 2nd, and 3rd data collection points along these stages)
Analysing the Data and Reporting the Results
The main aspect of this final step is to understand the data collected and to
apply statistical techniques that are suitable for the research questions or
hypotheses of the investigation. For example, if the data was measured
using a nominal or ordinal scale then statistical techniques that use the
mean cannot be applied as this would violate the principles of the representational theory of measurement. If the data is not normally distributed
then it is possible to use non-parametric or robust techniques, or transform
the data to conform to the normal distribution [7]. Further details on empirical evaluations are provided in Chap. 13. In addition, several statistical
techniques to analyse and report the data are presented throughout this
book and further detailed in Chap. 12.
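A minimal sketch of this decision (Python with SciPy; the effort values are invented) is shown below: a Shapiro-Wilk test checks normality, and the result guides the choice between a parametric and a non-parametric comparison.

import numpy as np
from scipy.stats import shapiro, ttest_ind, mannwhitneyu

# Invented effort data (person hours) from two groups of projects; the
# extreme values make the samples clearly non-normal.
group_a = np.array([12, 15, 14, 90, 13, 16, 11, 14], dtype=float)
group_b = np.array([18, 22, 19, 21, 20, 23, 17, 150], dtype=float)

# Shapiro-Wilk test: H0 = the sample comes from a normal distribution.
_, p_a = shapiro(group_a)
_, p_b = shapiro(group_b)

if p_a > 0.05 and p_b > 0.05:
    stat, p = ttest_ind(group_a, group_b)     # parametric, relies on the mean
    print(f"t-test: p = {p:.3f}")
else:
    stat, p = mannwhitneyu(group_a, group_b)  # non-parametric alternative
    print(f"Mann-Whitney U: p = {p:.3f}")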
1.5 Conclusions
This chapter discussed differences between Web and software applications, and their development processes based on the following 12 areas:
1. Application Characteristics
2. Primary Technologies Used
3. Approach to Quality Delivered
4. Development Process Drivers
5. Availability of the Application
6. Customers (Stakeholders)
7. Update Rate (Maintenance Cycles)
8. People Involved in Development
9. Architecture and Network
10. Disciplines Involved
11. Legal, Social, and Ethical issues
12. Information Structuring and Design
In addition, it discussed the need for empirical investigation in Web engineering, and introduced the three main types of empirical investigation –
surveys, case studies, and formal experiments.
Acknowledgements
We would like to thank Tayana Conte for her comments on a previous
version of this chapter.
References
1. Basili VR (1996) The role of experimentation in software engineering: past, current, and future. In: Proceedings of the 18th International Conference on Software Engineering, 25−30 March, pp 442−449
2. Basili VR, Shull F, Lanubile F (1999) Building knowledge through families of experiments. IEEE Transactions on Software Engineering, July−Aug, 25(4):456−473
3. Brereton P, Budgen D, Hamilton G (1998) Hypertext: the next maintenance mountain. Computer, December, 31(12):49−55
4. Christodoulou SP, Zafiris PA, Papatheodorou TS (2000) WWW2000: the developer's view and a practitioner's approach to Web engineering. In: Proceedings of the 2nd ICSE Workshop on Web Engineering, pp 75−92
5. Deshpande Y, Hansen S (2001) Web engineering: creating a discipline among disciplines. IEEE Multimedia, April−June, 8(2):82−87
6. Deshpande Y, Murugesan S, Ginige A, Hansen S, Schwabe D, Gaedke M, White B (2002) Web engineering. Journal of Web Engineering, October, 1(1):3−17
7. Fenton NE, Pfleeger SL (1997) Software Metrics: A Rigorous and Practical Approach, 2nd edn. PWS Publishing Company
8. Fraternali P, Paolini P (2000) Model-driven development of Web applications: the AutoWeb system. ACM Transactions on Information Systems (TOIS), October, 18(4):1−35
9. Gellersen H, Wicke R, Gaedke M (1997) WebComposition: an object-oriented support system for the Web engineering lifecycle. Journal of Computer Networks and ISDN Systems, September, 29(8−13):865−1553. Also (1996) In: Proceedings of the Sixth International World Wide Web Conference, pp 429−1437
10. Gellersen H-W, Gaedke M (1999) Object-oriented Web application development. IEEE Internet Computing, January/February, 3(1):60−68
11. Ginige A (2002) Web engineering: managing the complexity of Web systems development. In: Proceedings of the 14th International Conference on Software Engineering and Knowledge Engineering, July, pp 72−729
12. Ginige A, Murugesan S (2001) Web engineering: an introduction. IEEE Multimedia, January/March, 8(1):14−18
13. Goldstein M, Goldstein IF (1978) How We Know: An Exploration of the Scientific Process. Plenum Press, New York
14. Harper Collins Publishers (2000) Collins English Dictionary
15. Houghton Mifflin Company (1994) The American Heritage Concise Dictionary, 3rd edn.
16. Horowitz E (1998) Migrating software to the World Wide Web. IEEE Software, May/June, 15(3):18−21
17. IEEE Std. 2001−2002 (2003) Recommended Practice for the Internet Web Site Engineering, Web Site Management, and Web Site Life Cycle. IEEE
18. Kitchenham B, Pickard L, Pfleeger SL (1995) Case studies for method and tool evaluation. IEEE Software, 12(4):52−62
19. Kitchenham BA, Pfleeger SL, Pickard LM, Jones PW, Hoaglin DC, El Emam K, Rosenberg J (2002) Preliminary guidelines for empirical research in software engineering. IEEE Transactions on Software Engineering, August, 28(8):721−734
20. Lee SC, Shirani AI (2004) A component based methodology for Web application development. Journal of Systems and Software, 71(1−2):177−187
21. Murugesan S, Deshpande Y (2001) Web Engineering: Managing Diversity and Complexity of Web Application Development. Lecture Notes in Computer Science 2016, Springer-Verlag, Heidelberg
22. Murugesan S, Deshpande Y (2002) Meeting the challenges of web application development: the web engineering approach. In: Proceedings of the 24th International Conference on Software Engineering, May, pp 687−688
23. Offutt J (2002) Quality attributes of Web software applications. IEEE Software, March/April, 19(2):25−32
24. Pfleeger SL, Jeffery R, Curtis B, Kitchenham B (1997) Status report on software measurement. IEEE Software, March/April, 14(2):33−43
25. Pressman RS (1998) Can Internet-based applications be engineered? IEEE Software, September/October, 15(5):104−110
26. Reifer DJ (2000) Web development: estimating quick-to-market software. IEEE Software, November/December:57−64
27. Ricca F, Tonella P (2001) Analysis and testing of Web applications. In: Proceedings of the 23rd International Conference on Software Engineering, pp 25−34
28. Standing C (2002) Methodologies for developing Web applications. Information and Software Technology, 44(3):151−160
29. Taylor MJ, McWilliam J, Forsyth H, Wade S (2002) Methodologies and website development: a survey of practice. Information and Software Technology, 44(6):381−391
30. Wikipedia, http://en.wikipedia.org/wiki/Main_Page (accessed on 25 October 2004)
31. Wild C, Seber G (2000) Chance Encounters: A First Course in Data Analysis and Inference. John Wiley & Sons, New York
Authors’ Biographies
Dr. Emilia Mendes is a Senior Lecturer in Computer Science at the University of
Auckland (New Zealand), where she leads the WETA (Web Engineering, Technology and Applications) research group. She is the principal investigator in the
Tukutuku Research project,6 aimed at developing and comparing Web effort models using industrial Web project data, and benchmarking productivity within and
across Web companies. She has active research interests in Web measurement and
metrics, and in particular Web cost estimation, Web size measures, Web productivity and quality measurement, and Web process improvement. Dr. Mendes is on
the programme committee of numerous international conferences and workshops,
and on the editorial board of the International Journal of Web Engineering and
Technology and the Journal of Web Engineering. She has collaborated with Web
companies in New Zealand and overseas on Web cost estimation and usability
measurement. Dr. Mendes worked in the software industry for ten years before
obtaining her PhD in Computer Science from the University of Southampton
(UK), and moving to Auckland. She is a member of the New Zealand and Australian Software Measurement Associations.

6 http://www.cs.auckland.ac.nz/tukutuku/.
Dr. Nile Mosley is the Technical Director of a software development company.
He has active research interests in software measurement and metrics, and objectoriented programming languages. He obtained his PhD in Pure and Applied Mathematics from Nottingham Trent University (UK).
Steve Counsell obtained a BSc (Hons) in Computer Studies from the University
of Brighton and an MSc in Systems Analysis from the City University in 1987 and
1988, respectively. After spending some time in industry as a developer, he obtained his PhD in 2002 from the University of London and is currently a Lecturer
in the Department of Information Systems and Computing at Brunel University.
Prior to 2004, he was a Lecturer in the School of Computer Science and Information Systems at Birkbeck, University of London and between 1996 and 1998 was a
Research Fellow at the University of Southampton. In 2002, he was a BT Short-term Research Fellow. His research interests are in software engineering, more
specifically metrics and empirical studies.
2 Web Effort Estimation
Emilia Mendes, Nile Mosley, Steve Counsell
Abstract: Software effort models and effort estimates help project managers allocate resources, control costs and schedules, and improve current practices, leading to projects that are finished on time and within budget.
In the context of Web development and maintenance, these issues are also
crucial, and very challenging, given that Web projects have short schedules and a highly fluid scope. Therefore this chapter has two main objectives. The first is to introduce the concepts related to effort estimation and
in particular Web effort estimation. The second is to present a case study
where a real effort prediction model based on data from completed industrial Web projects is constructed step by step.
Keywords: Web effort estimation, Manual stepwise regression, Effort models, Web size measures, Prediction accuracy, Data analysis.
2.1 Introduction
The Web is used as a delivery platform for numerous types of Web applications, ranging from complex e-commerce solutions with back-end databases
to on-line personal static Web pages. With the sheer diversity of Web application types and technologies employed, there exists a growing number of
Web companies bidding for as many Web projects as they can accommodate.
All too often, in order to win the bid, companies estimate unrealistic schedules, leading to applications that are rarely developed on time and within budget.
Realistic effort estimates are fundamental for the successful management of software projects; the Web is no exception. Having realistic estimates at an early stage in a project's life cycle allows project managers and
development organisations to manage their resources effectively.
To this end, prediction is a necessary part of an effective process,
whether it be authoring, design, testing, or Web development as a whole.
A prediction process involves:

• The identification of measures (e.g. number of new Web pages, number of new images) that are believed to influence the effort required to develop a new Web application.
• The formulation of theories about the relationship between the selected measures and effort (e.g. the greater the number of new static Web pages, the greater the development effort for a new application).
• The capturing of historical data (e.g. size and actual effort) about past Web projects or even past development phases within the same project.
• The use of this historical data to develop effort estimation models for use in predicting effort for new Web projects.
• The assessment of how effective those effort estimation models are, i.e. the assessment of their prediction accuracy.
Cost and effort are often used interchangeably within the context of effort estimation (prediction) since effort is taken as the main component of project costs. However, given that project costs also take into account other factors such as contingency and profit [20], we will use the word “effort” and not “cost” throughout this chapter.
Numerous effort estimation techniques have been proposed and compared over the last 20 years. A classification and description of such techniques is introduced in Sect. 2.2 to help provide readers with a broader
overview. To be useful, an effort estimation technique must provide an
effort estimate for a new project that does not differ widely from the actual effort the project will require. The effectiveness of effort
estimation techniques to provide accurate effort estimates is called prediction power. Section 2.3 presents the four most commonly used measures of
prediction power and, in Section 2.4, the associated prediction accuracy.
Finally, Sect. 2.5 details a case study building an effort estimation model
using data from world-wide industrial Web projects.
2.2 Effort Estimation Techniques
The purpose of estimating effort is to predict the amount of effort to accomplish a given task, based on knowledge of other project characteristics
that are believed to be related to effort. Project characteristics (independent
variables) are the input, and effort (dependent variable) is the output we
wish to predict (see Fig. 2.1). For example, a given Web company may
find that to predict the effort necessary to implement a new Web application, it will require the following input: estimated number of new Web
pages, total number of developers who will help develop the new Web
application, developers’ average number of years of experience with the
development tools employed, and the number of functions/features (e.g.
shopping cart) to be offered by the new Web application.
A task to be estimated can be as simple as developing a single function
(e.g. creating a table on the database) or as complex as developing a large
application, and in general the one input (independent variable) assumed to
have the strongest influence on effort is size. Other independent variables
may also be influential (e.g. developers’ average experience, number of
tools employed) and these are often identified as cost drivers. Depending
on the techniques employed, we can also use data on past finished projects
to help estimate effort for new projects.
Fig. 2.1. Components of a cost model (estimated size and cost drivers enter the model as step 2, data on finished projects as step 1, and deriving an effort estimate produces the estimated effort plus effort accuracy as step 3)
Several techniques for effort estimation have been proposed over the
past 30 years in software engineering. These fall into three general categories [37]: expert opinion, algorithmic models and artificial intelligence
techniques.
2.2.1 Expert Opinion
Expert opinion represents the process of estimating effort by subjective
means, and is often based on previous experience from developing/managing similar projects. It has been and still is widely used in software and Web development.
The drawback of this technique is that it is very difficult to quantify and
to determine those factors that have been used to derive an estimate, making it difficult to repeat. However, studies show that this technique can be
an effective estimating tool when used in combination with other less subjective techniques (e.g. algorithmic models) [11,30,31].
In terms of the diagram presented in Fig. 2.1, the sequence occurs as
follows:
a) An expert looks at the estimated size and cost drivers related to a new
project for which effort needs to be estimated.
b) Based on the data obtained in a) (s)he remembers or retrieves data on
past finished projects for which actual effort is known.
c) Based on the data from a) and b) (s)he subjectively estimates effort for
the new project. Deriving an accurate effort estimate is more likely to
occur when there are completed projects similar to the one having its
effort estimated. The sequence described corresponds to steps 2, 1, and
3 in Fig. 2.1. The knowledge regarding the characteristics of a new
project is necessary to retrieve, from either memory or a database,
knowledge on finished similar projects. Once this knowledge is retrieved, effort can be estimated.
2.2.2 Algorithmic Techniques
To date, the most popular techniques described in the effort estimation
literature are algorithmic techniques. Such techniques attempt to formalise
the relationship between effort and one or more project characteristics.
The result is an algorithmic model. The central project characteristic used
in such a model is usually taken to be some notion of software size (e.g.
the number of lines of source code, number of Web pages, number of
links). This formalisation is often translated as an equation such as that
shown by Eq. 2.1, where a and b are parameters that also need to be estimated. Equation 2.1 shows that size is the main factor contributing to
effort, and can be adjusted according to an Effort Adjustment Factor
(EAF), calculated from cost drivers (e.g. developers’ experience, tools). An example of an algorithmic model that uses Eq. 2.1 is the COnstructive COst MOdel (COCOMO) [2], where parameters a and b are based on the type of project under construction, and the EAF is based on 15 cost drivers that are rated and then multiplied together.
EstimatedEffort = a \cdot EstSizeNewproj^{b} \cdot EAF  (2.1)

where:
a, b are parameters chosen based on certain criteria, such as the type of software project being developed; EstSizeNewproj is the estimated size for the new project; and EAF is the Effort Adjustment Factor.
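A minimal sketch of Eq. 2.1 in Python is given below; the parameter values are invented for illustration only (COCOMO defines its own calibrated values):

def estimated_effort(est_size, a, b, eaf):
    """Generic algorithmic model of Eq. 2.1: effort grows with size and is
    adjusted by the Effort Adjustment Factor (EAF)."""
    return a * est_size ** b * eaf

# Illustrative (non-calibrated) values: a = 3.0, b = 1.12, EAF = 0.9,
# for a new project of estimated size 20.
print(estimated_effort(20, a=3.0, b=1.12, eaf=0.9))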
Equations 2.2 and 2.3 are different examples of algorithmic equations (models), where both are obtained by applying regression analysis techniques [33] on data sets of past completed projects. Equation 2.2 assumes a linear relationship between effort and its size/cost drivers, whereas Equation 2.3 assumes a non-linear relationship. In Equation 2.3, when the exponent is < 1 we have economies of scale, i.e. larger projects use comparatively less effort than smaller projects. The opposite situation (exponent > 1) gives diseconomies of scale, i.e. larger projects use comparatively more effort than smaller projects.
EstimatedEffort = C + a_0 \, EstSizeNewproj + a_1 \, CD_1 + \cdots + a_n \, CD_n  (2.2)

EstimatedEffort = C \cdot EstSizeNewproj^{a_0} \cdot CD_1^{a_1} \cdots CD_n^{a_n}  (2.3)

where:
C is a constant denoting the initial estimated effort (assuming size and cost drivers to be zero) derived from past data;
a_0 ... a_n are parameters derived from past data; and
CD_1 ... CD_n are other project characteristics, other than size, that have an impact on effort.
The COCOMO model is an example of a generic algorithmic model, believed to be applicable to any type of software project, with suitable calibration or adjustment to local circumstances. In terms of the diagram presented in Fig. 2.1, the model uses parameter values that are based on past
project data; however, for anyone wishing to use this model, the steps to
use are 1, 2, and 3. Step 1 is used only once to calculate the initial values
for its parameters, which are then fixed from that point onwards. The single use of step 1 makes this model a generic algorithmic model.
Regression-based algorithmic models are most suitable to local circumstances such as “in-house” analysis as they are derived from past data that
often represents projects from the company itself. Regression analysis,
used to generate regression-based algorithmic models, provides a procedure for determining the “best” straight-line fit to a set of project data that
represents the relationship between effort (the response or dependent variable) and project characteristics (e.g. size, experience, tools, the predictor
or independent variables) [33]. The regression line is represented as an
equation, such as those given by Eqs. 2.1 and 2.2. The effort estimation
models we will create in Sect. 2.5 fall into this category.
Regarding the regression analysis itself, two of the most widely used
techniques are multiple regression (MR) and stepwise regression (SWR).
The difference between the two is that MR obtains a regression line using all
the independent variables at the same time, whereas SWR is a technique
that examines different combinations of independent variables, looking for
the best grouping to explain the greatest amount of variation in effort. Both
use least squares regression, where the regression line selected is the one
that reflects the minimum values of the sum of the squared errors. Errors
are calculated as the difference between actual and estimated effort and are
known as the residuals [33].
The sequence followed here is as follows:
a) Past data is used to generate a cost model.
b) This model then receives, as input, values for the new project characteristics.
c) The model generates estimated effort. The sequence described herein
corresponds to steps 1, 2, and 3 from Fig. 2.1, in contrast to that for expert opinion.
A description of regression analysis is presented in Chap. 12.
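As an illustration of a regression-based model in the style of Eq. 2.2, the sketch below (Python/NumPy; the project data is invented) derives the parameters by least squares and then predicts effort for a new project:

import numpy as np

# Invented past projects: estimated size (new pages) and one cost driver
# (average team experience, in years); effort in person hours.
size   = np.array([10, 20, 30, 40, 55, 70], dtype=float)
cd1    = np.array([ 2,  3,  2,  4,  3,  5], dtype=float)
effort = np.array([45, 80, 130, 150, 225, 245], dtype=float)

# Design matrix [1, size, CD1] for Eq. 2.2, solved by least squares.
A = np.column_stack([np.ones(len(size)), size, cd1])
params, *_ = np.linalg.lstsq(A, effort, rcond=None)
C, a0, a1 = params

# Residuals: differences between actual and estimated effort.
residuals = effort - A @ params

# Predict effort for a new project: 50 new pages, team experience 4.
print(f"estimated effort = {C + a0 * 50 + a1 * 4:.1f} person hours")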
2.2.3 Artificial Intelligence Techniques
Artificial intelligence techniques have, in the last decade, been used as a complement to, or as an alternative to, the previous two categories. Examples include fuzzy logic [22], regression trees [34], neural networks [38], and case-based reasoning [37]. We will cover case-based reasoning (CBR) and classification and regression trees (CART) in more detail as they are currently the most popular machine learning techniques employed for Web cost estimation. A useful summary of numerous machine learning techniques can also be found in [10].
Case-Based Reasoning
Case-based reasoning (CBR) provides estimates by comparing the current problem to be estimated against a library of historical information from completed projects with a known effort (case base). It involves [1]:

i. Characterising a new project p, for which an estimate is required, with attributes (features) common to those completed projects stored in the case base. In terms of software cost estimation, features represent size measures and cost drivers which have a bearing on effort. Feature values are normally standardised (between 0 and 1) such that they have the same degree of influence on the result.

ii. Use of this characterisation as a basis for finding similar (analogous) completed projects, for which effort is known. This process can be achieved by measuring the “distance” between two projects, based on the values of the number of features (k) for these projects. Although numerous techniques can be used to measure similarity, nearest neighbour algorithms using the unweighted Euclidean distance measure have been the most widely used to date in software and Web engineering.

iii. Generation of a predicted value of effort for project p based on the effort for those completed projects that are similar to p. The number of similar projects will depend on the size of the case base. For small case bases (e.g. up to 90 cases), typical values are 1, 2, and 3 closest neighbours (analogies). For larger case bases no conclusions have been reached regarding the best number of similar projects to use. The calculation of estimated effort is obtained using the same effort value as the closest neighbour, or the mean of effort for two or more analogies. This is the common choice in Web and software engineering.
The sequence of steps used with CBR is as follows:
a) The estimated size and cost drivers relating to a new project are used to
retrieve similar projects from the case base, for which actual effort is
known.
b) Using the data from a) a suitable CBR tool retrieves similar projects
and calculates estimated effort for the new project. The sequence just
described corresponds to steps 2, 1, and 3 in Fig. 2.1, similar to that
employed for expert opinion. The characteristics of a new project must
be known in order to retrieve finished similar projects. Once similar
projects are retrieved, then effort can be estimated.
When using CBR there are six parameters to consider [35]:

• Feature Subset Selection
• Similarity Measure
• Scaling
• Number of Analogies
• Analogy Adaptation
• Adaptation Rules
Feature Subset Selection
Feature subset selection involves determining the optimum subset of features that yields the most accurate estimation. Some existing CBR tools, e.g. ANGEL [36], optionally offer this functionality using a brute force algorithm, searching for all possible feature subsets. Other CBR tools (e.g. CBR-Works) have no such functionality, and therefore to obtain estimated effort, we must use all of the known features of a project to retrieve the most similar cases.
Similarity Measure
The similarity measure gauges the level of similarity between different cases, with several similarity measures proposed in the literature. The most popular in the current Web/software engineering literature [1,24,35] are the unweighted Euclidean distance, the weighted Euclidean distance, and the maximum distance. Other similarity measures are presented in [1].
Unweighted Euclidean distance: The unweighted Euclidean distance measures the Euclidean (straight-line) distance d between the points (x0, y0) and (x1, y1), given by the equation:

d = \sqrt{(x_0 - x_1)^2 + (y_0 - y_1)^2}  (2.4)
This measure has a geometrical meaning as the shortest distance between two points in an n-dimensional Euclidean space [1].
Fig. 2.2. Euclidean distance using two size attributes (a two-dimensional plot with Page-count on the x-axis and Page-complexity on the y-axis, showing the distance d between the points (x0, y0) and (x1, y1))
Figure 2.2 illustrates this distance by representing coordinates in a two-dimensional space, E2. The number of features employed determines the number of dimensions, En.
Weighted Euclidean distance: The weighted Euclidean distance is used when feature vectors are given weights that reflect the relative importance of each feature. The weighted Euclidean distance d between the points (x0, y0) and (x1, y1) is given by the following equation:

d = \sqrt{w_x (x_0 - x_1)^2 + w_y (y_0 - y_1)^2}  (2.5)

where w_x and w_y are the weights of x and y respectively.
Maximum distance: The maximum distance computes the highest feature similarity, i.e. the one to define the closest analogy. For two points (x0, y0) and (x1, y1), the maximum measure d is equivalent to the formula:

d = \max((x_0 - x_1)^2, (y_0 - y_1)^2)  (2.6)

This effectively reduces the similarity measure down to a single feature, although this feature may differ for each retrieval episode. So, for a given “new” project Pnew, the closest project in the case base will be the one that has at least one size feature with the most similar value to the same feature in project Pnew.
Scaling
Scaling (also known as standardisation) represents the transformation of
attribute values according to a defined rule, such that all attributes present
values within the same range and hence have the same degree of influence
on the results [1]. A common method of scaling is to assign zero to the
minimum observed value and one to the maximum observed value [15].
This is the strategy used by ANGEL.
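A sketch of this min-max scaling rule in Python (the feature values are invented):

def min_max_scale(values):
    """Scale feature values so the minimum observed value maps to 0 and the
    maximum observed value maps to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_scale([10, 25, 40, 70]))  # [0.0, 0.25, 0.5, 1.0]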
Number of Analogies
The number of analogies refers to the number of most similar cases that will be used to generate the estimation. With small sets of data it is reasonable to consider only a small number of analogies [1]. Several studies in software engineering have restricted their analysis to the closest analogy (k = 1) [3,30], while others have used two and three analogies [1,13,14,24,25,27,32].
Analogy Adaptation
Once the similar cases have been selected the next step is to decide how to
generate the estimation for project Pnew. Choices of analogy adaptation techniques presented in the literature vary from the nearest neighbour [3,14], the
mean of the closest analogies [36], the median [1], inverse distance weighted
mean and inverse rank weighted mean [15], to illustrate just a few. The adaptations used to date for Web engineering are the nearest neighbour, mean of
the closest analogies [24,25], and the inverse rank weighted mean [26,27].
Each adaptation is explained below:
Mean: The average of k analogies, when k > 1. This is a typical measure of
central tendency, often used in the software and Web engineering literature. It treats all analogies as being equally influential on estimated effort.
Median: The median of k analogies, when k > 2. This is also a measure of
central tendency, and has been used in the literature when the number of
closest projects increases [1].
Inverse rank weighted mean: Allows higher ranked analogies to have more influence than lower ones. If we use three analogies, for example, the closest analogy (CA) would have weight = 3, the second closest (SC) weight = 2, and the third closest (LA) weight = 1. The estimation would then be calculated as:

InverseRankWeightedMean = \frac{3\,CA + 2\,SC + LA}{6}  (2.7)
Adaptation Rules
Adaptation rules are used to adapt estimated effort, according to a given
criterion, such that it reflects the characteristics of the target project more
closely. For example, in the context of effort prediction, the estimated effort to develop an application would be adapted such that it would also
take into consideration the application’s size values.
Classification and Regression Trees
The objective of a Classification and Regression Tree (CART) model is to
develop a simple tree-structured decision process for describing the distribution of a variable r given a vector of predictors vp [5]. A CART model
represents a binary tree whose leaves suggest values for r based
on existing values of vp. For example, assume the estimated effort to develop a Web application can be determined by an estimated number of
pages (WP), number of images (IM), and number of functions (FN).
A regression tree such as the one shown in Fig. 2.3 is generated from data
obtained from past finished Web applications, taking into account their
existing values of effort, WP, IM, and FN. These are the predictors that
make up the vector vp. Once the tree has been built it is used to estimate
effort for a new project. So, to estimate effort for a new project where
WP = 25, IM = 10, and FN = 4 we would navigate down the tree structure
to find the estimated effort. In this case, 45 person hours.
Whenever the response variable is numerical the CART tree is called a regression tree, and whenever the response variable is categorical the CART tree is called a classification tree.
Fig. 2.3. Example of a regression tree for Web cost estimation (a binary tree rooted at WP, split at WP <= 50 versus WP > 50, with internal nodes testing IM and FN and leaves suggesting Effort values of 25, 45, 65, 75, and 110 person hours)
A CART model constructs a binary tree by recursively partitioning the
predictor space (set of values of each of the predictors in vector vp) into
subsets where the distribution of values for the response variable (effort) is
successively more uniform. The partition itself is determined by splitting
rules associated with each of the non-leaf nodes.
A “purity” function calculated from the predictor data is employed to
split each node. There are numerous types of “purity” functions where the
choice is determined by the software tool used to build the CART model,
the type of predictors employed, and the goals for using a CART model
(e.g. using it for cost estimation). The sequence used with CART is as
follows:
a) Past data is used to generate a CART model.
b) The model is then traversed manually in order to obtain estimated
effort, using the characteristics of the new project as input.
c) This sequence corresponds to steps 1, 2, and 3 in Fig. 2.1,
in contrast to the sequence used for expert opinion and CBR.
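A minimal sketch of this sequence, assuming scikit-learn is available; the past-project data (WP, IM, FN, effort) is fabricated purely for illustration:

# Sketch: generate a CART model from past data (step a) and traverse it
# with the new project's characteristics (steps b-c).
from sklearn.tree import DecisionTreeRegressor

# Past finished projects: [WP, IM, FN] and their actual effort values.
X = [[10, 5, 1], [30, 12, 3], [60, 8, 2], [90, 25, 6], [120, 30, 8]]
y = [20, 45, 70, 110, 160]          # effort in person hours (invented)

tree = DecisionTreeRegressor(max_depth=3, random_state=0)
tree.fit(X, y)                      # builds the binary regression tree

new_project = [[25, 10, 4]]         # WP = 25, IM = 10, FN = 4
print(tree.predict(new_project))    # estimated effort for the new project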
2.3 Measuring Effort Prediction Power and Accuracy
An effort estimation model m uses historical data of finished projects to
predict the effort of a new project. Some believe this alone is enough to
guarantee accurate estimates; however, to know how accurate a model's
estimates really are, we need to measure its predictive accuracy.
To measure a model’s predictive accuracy first calculate the predictive
power for each of a set of new projects p1 to pn that used the effort estimation model m. Once predictive power for p1 to pn has been obtained, their
values are aggregated, which gives the predictive power of model m and
hence its corresponding predictive accuracy.
This section describes how to measure the predictive power of a model,
and how to measure a model’s predictive accuracy.
2.3.1 Measuring Predictive Power
The most common approaches to date for measuring predictive power of
effort estimation models are:
• The Mean Magnitude of Relative Error (MMRE) [37]
• The Median Magnitude of Relative Error (MdMRE) [30]
• The Prediction at level l (Pred(l)) [36]
The basis for calculating MMRE and MdMRE is to use the Magnitude
of Relative Error (MRE) [16], defined as:
\[ MRE = \frac{|e - \hat{e}|}{e} \qquad (2.8) \]
where e is the actual effort and ê is the estimated effort.
The mean of all MREs is the MMRE, calculated as:
\[ MMRE = \frac{1}{n} \sum_{i=1}^{n} \frac{|e_i - \hat{e}_i|}{e_i} \qquad (2.9) \]
As the mean is calculated by taking into account the value of every estimated and actual effort from the data set employed, the result may give a
biased assessment of a model’s predictive power when there are several
projects with large MREs.
An alternative to the mean is the median, another measure of central
tendency, which is less sensitive to the existence of several large MREs.
The median of the n MRE values is called the MdMRE.
Another indicator which is commonly used is the prediction at level l,
also known as Pred(l). This measures the percentage of effort estimates
that are within l % of their actual values.
MMRE, MdMRE, and Pred(l) are taken as the de facto standard evaluation
criteria to measure the predictive power of effort estimation models [39].
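As a minimal sketch, the three measures can be implemented directly from the definitions above (the effort values used in the example are invented):

# Sketch: MRE (Eq. 2.8), MMRE (Eq. 2.9), MdMRE, and Pred(l).
from statistics import mean, median

def mre(actual, estimated):
    return abs(actual - estimated) / actual

def mmre(actuals, estimates):
    return mean([mre(a, e) for a, e in zip(actuals, estimates)])

def mdmre(actuals, estimates):
    return median([mre(a, e) for a, e in zip(actuals, estimates)])

def pred(actuals, estimates, level=25):
    """Percentage of estimates within `level` % of their actual values."""
    mres = [mre(a, e) for a, e in zip(actuals, estimates)]
    return 100.0 * sum(m <= level / 100.0 for m in mres) / len(mres)

actual = [100.0, 40.0, 250.0, 60.0, 500.0]     # invented actual efforts
estimated = [90.0, 55.0, 210.0, 58.0, 700.0]   # invented estimates
print(mmre(actual, estimated), mdmre(actual, estimated),
      pred(actual, estimated))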
2.3.2 Measuring Predictive Accuracy
In order to calculate the predictive accuracy of a given effort estimation
model m, based on a given data set of finished projects d, we do the following:
1. Divide the data set d into a training set t and a validation set v. It is
common to create training sets that use 66% of the projects from the
complete data set, leaving 34% for the validation set.
2. Using t, produce an effort estimation model m (if applicable).
3. Using m, predict the effort for each of the projects in v, simulating new
projects for which effort is unknown.
Once done, we will have, for each project in v, an estimated effort ê calculated using model m, and the actual effort e that the project actually used. We are then able to calculate the predictive power (MRE) for each
project in the validation set v. The final step, once we have obtained the
predictive power for each project, is to aggregate these values to obtain
MMRE, MdMRE, and Pred(25) for v, which are then taken to represent the predictive accuracy of the model m.
Calculated MMREs and MdMREs with values up to 0.25, and Pred(25)
at 75% or above, indicate good prediction models [6].
This splitting of a data set into training and validation sets is also known
as cross-validation. An n-fold cross-validation means the original data set
is divided into n subsets of training and validation sets. When the validation set has only one project the cross-validation is called “leave-one-out”
cross-validation. This is an approach commonly used when assessing prediction accuracy using CBR.
2.4 Which Is the Most Accurate Prediction Technique?
Section 2.2 introduced numerous techniques for obtaining effort estimates
for a new project, and all have been used, each with a varying degree of
success. Therefore the question that is often asked is: Which of the techniques provides the most accurate prediction?
To date, the answer to this question has been simply “it depends”.
Algorithmic models have some advantages over machine learning and
expert opinion, such as:
1. Allowing users to see how a model derives its conclusions, an important factor for verification as well as theory building and understanding
of the process being modelled [10].
2. The need to be specialised relative to the local environment in which
they are used [21,7].
Despite these advantages, no convergence on which effort estimation
technique has the best predictive power has yet been reached, even though
comparative studies have been carried out over the last 15 years (e.g.
[1,3,4,8–10,12–16,30,32,35–37]).
One justification is that these studies have used data sets with differing
characteristics (e.g. number of outliers,1 amount of collinearity,2 number of
variables, number of projects) and different comparative designs.
Shepperd and Kadoda [35] presented evidence for the relationship between the success of a particular technique and training set size, the nature of
the “effort estimation” function (e.g. continuous3 or discontinuous4), and
1 An outlier is a value which is far from the others.
2 Collinearity represents the existence of a linear relationship between two or more independent variables.
3 A continuous function is one in which "small changes in the input produce small changes in the output" (http://en.wikipedia.org/wiki/Continuous_function).
characteristics of the data set. They concluded that the “best” prediction
technique that can work on any type of data set may be impossible to
obtain.
Mendes et al. [28] investigated three techniques for Web effort estimation (stepwise regression, case-based reasoning, and regression trees) by
comparing the prediction accuracy of their respective models. Stepwise
regression provided the best results overall. This trend has also been
confirmed using a different data set of Web projects [29]. This is therefore
the technique to be used in Sect. 2.5 to build an effort estimation model for
estimating effort for Web projects.
2.5 Case Study
The case study we present here describes the construction and further validation of a Web effort estimation model using data on industrial Web projects, developed by Web companies worldwide, from the Tukutuku database [29].5 This database is part of the ongoing Tukutuku project,6 which
collects data on Web projects, for the development of effort estimation
models and to benchmark productivity across and within Web companies.
The database contains data on 87 Web projects: 34 and 13 come from 2
single Web companies respectively and the remaining 40 projects come
from another 23 companies. The Tukutuku database uses 6 variables to
store specifics about each company that volunteered projects, 10 variables
to store particulars about each project, and 13 variables to store data about
each Web application (see Table 2.1). Company data is obtained once and
both project and application data are gathered for each volunteered project.
All results presented were obtained using the statistical software SPSS
10.1.3 for Windows. Further details on the statistical methods used throughout this case study are given in Chap. 12. Finally, all the statistical tests
used a significance level of α = 0.05 (i.e. 95% confidence).
4 "If small changes in the input can produce a broken jump in the changes of the output, the function is said to be discontinuous (or to have a discontinuity)" (http://en.wikipedia.org/wiki/Continuous_function).
5 The raw data cannot be displayed here due to a confidentiality agreement with those companies that have volunteered data on their projects.
6 http://www.cs.auckland.ac.nz/tukutuku.
Table 2.1. Variables for the Tukutuku database7

NAME         SCALE        DESCRIPTION
COMPANY DATA
COUNTRY      Categorical  Country company belongs to.
ESTABLISHED  Ordinal      Year when company was established.
SERVICES     Categorical  Type of services company provides.
NPEOPLEWD    Ratio        Number of people who work on Web design and development.
CLIENTIND    Categorical  Industry representative of those clients to whom applications are provided.
ESTPRACT     Categorical  Accuracy of a company's own effort estimation practices.
PROJECT DATA
TYPEPROJ     Categorical  Type of project (new or enhancement).
LANGS        Categorical  Implementation languages used.
DOCPROC      Categorical  If project followed defined and documented process.
PROIMPR      Categorical  If project team involved in a process improvement programme.
METRICS      Categorical  If project team part of a software metrics programme.
DEVTEAM      Ratio        Size of project's development team.
TEAMEXP      Ratio        Average team experience with the development language(s) employed.
TOTEFF       Ratio        Actual total effort used to develop the Web application.
ESTEFF       Ratio        Estimated total effort necessary to develop the Web application.
ACCURACY     Categorical  Procedure used to record effort data.
WEB APPLICATION
TYPEAPP      Categorical  Type of Web application developed.
TOTWP        Ratio        Total number of Web pages (new and reused).
NEWWP        Ratio        Total number of new Web pages.
TOTIMG       Ratio        Total number of images (new and reused).
NEWIMG       Ratio        Total number of new images created.
HEFFDEV      Ratio        Minimum number of hours to develop a single function/feature by one experienced developer that is considered high (above average).8
HEFFADPT     Ratio        Minimum number of hours to adapt a single function/feature by one experienced developer that is considered high (above average).9
HFOTS        Ratio        Number of reused high-effort features/functions without adaptation.
HFOTSA       Ratio        Number of reused high-effort features/functions adapted.
HNEW         Ratio        Number of new high-effort features/functions.
FOTS         Ratio        Number of reused low-effort features without adaptation.
FOTSA        Ratio        Number of reused low-effort features adapted.
NEW          Ratio        Number of new low-effort features/functions.

7 The different types of measurement scale are described in Chap. 12.
8 This number is currently set to 15 hours based on the collected data.
9 This number is currently set to 4 hours based on the collected data.
The following sections describe our data analysis procedure, adapted
from [23], which consists of:
1. Data validation
2. Variables and model selection
3. Model inspection
4. Extraction of effort equation
5. Model validation
2.5.1 Data Validation
Data validation (DV) performs the first screening of the collected data. It
generally involves understanding what the variables are (e.g. purpose, scale
type, see Table 2.1) and also uses descriptive statistics (e.g. mean, median,
minimum, maximum) to help identify any missing or unusual cases.
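The case study uses SPSS; a roughly equivalent screening step, assuming the data were available as a CSV file with the column names of Table 2.1 (the file name here is hypothetical, since the raw data is confidential), could use pandas:

# Sketch: summary statistics for the numerical variables, as in Table 2.2.
import pandas as pd

df = pd.read_csv("tukutuku.csv")     # hypothetical file name
numerical = ["DEVTEAM", "TEAMEXP", "TOTWP", "NEWWP", "TOTIMG", "NEWIMG",
             "HEFFDEV", "HEFFADPT", "HFOTS", "HFOTSA", "HNEW",
             "FOTS", "FOTSA", "NEW", "TOTEFF", "ESTEFF"]

summary = df[numerical].agg(["count", "min", "max", "mean", "median", "std"])
print(summary.T)                     # one row per variable
print(df[numerical].isna().sum())    # missing values per variable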
Table 2.2 presents summary statistics for numerical variables. None of
the numerical variables seem to exhibit unusual or missing values, although this requires careful examination. For example, one would find it
strange to see zero as minimum value for Total Images (TOTIMG) or one
as minimum value for Total Web Pages (TOTWP). However, it is possible
to have either a Web application without any images or a Web application
that provides all its content and functionality within a single Web page.
Another example relates to the maximum number of Web pages, which is
2000. Although it may not seem possible at first to have such a large
number of pages, we cannot simply assume this was a data entry error;
we were unable to obtain confirmation from the source company.
However, further investigation revealed that 1980 pages were developed
from scratch, and numerous new functions/features (five high-effort and
seven low-effort) were also implemented. In addition, the development
team consisted of two people who had very little experience with the six
programming languages used. The total effort was 947 person hours,
which can correspond to a three-month project assuming both developers
worked at the same time. If we consider only the number of pages and the
effort, the ratio is roughly 27 minutes per page, which seems reasonable
given the development team's lack of experience and the number of
different languages they had to use.
Table 2.2. Descriptive statistics for numerical variables

Variables  N   Min.  Max.  Mean    Median  Std. dev.
DEVTEAM    87  1     8     2.37    2       1.35
TEAMEXP    87  1     10    3.40    2       1.93
TOTWP      87  1     2000  92.40   25      273.09
NEWWP      87  0     1980  82.92   7       262.98
TOTIMG     87  0     1820  122.54  40      284.48
NEWIMG     87  0     800   51.90   0       143.25
HEFFDEV    87  5     800   62.02   15      141.25
HEFFADPT   87  0     200   10.61   4       28.48
HFOTS      87  0     3     0.08    0       0.41
HFOTSA     87  0     4     0.29    0       0.75
HNEW       87  0     10    1.24    0       2.35
FOTS       87  0     15    1.07    0       2.57
FOTSA      87  0     10    1.89    1       2.41
NEW        87  0     13    1.87    0       2.84
TOTEFF     87  1     5000  261.73  43      670.36
ESTEFF     34  1     108   14.45   7.08    20.61
Once we have checked the numerical variables, our next step is to check
the categorical variables using their frequency tables as a tool (see Tables
2.3 to 2.7).
Tables 2.4 to 2.6 show that most projects followed a defined and documented process, and that development teams were involved in a process
improvement programme and/or part of a software metrics programme.
These positive trends are mainly due to the two single companies that together volunteered data on 47 projects (54% of our data set). They have
answered “yes” to all three categories. No unusual trends seem to exist.
Table 2.7 shows that the majority of projects (83%) had the actual effort
recorded on a daily basis, for each project and/or project task. These numbers are inflated by the two single companies where one chose category
“good” (11 projects) and the other chose category “very good” (34 projects).
The actual effort recording procedure is not an adequate effort estimator per
se; it is used here simply to show that the effort data gathered seems to be
reliable overall.
Table 2.3. Frequency table for type of project

Type of project  Frequency  %      Cumulative %
New              39         44.8   44.8
Enhancement      48         55.2   100.0
Total            87         100.0
Table 2.4. Frequency table for documented process

Documented process  Frequency  %      Cumulative %
no                  23         26.4   26.4
yes                 64         73.6   100.0
Total               87         100.0
Table 2.5. Frequency table for process improvement

Process improvement  Frequency  %      Cumulative %
no                   28         32.2   32.2
yes                  59         67.8   100.0
Total                87         100.0
Table 2.6. Frequency table for metrics programme

Metrics programme  Frequency  %      Cumulative %
no                 36         41.4   41.4
yes                51         58.6   100.0
Total              87         100.0
Table 2.7. Frequency table for companies' effort recording procedure

Actual effort recording procedure  Frequency  %      Cumulative %
Poor                               12         13.8   13.8
Medium                             3          3.4    17.2
Good                               24         27.6   44.8
Very good                          48         55.2   100.0
Total                              87         100.0
Once the data validation is complete, we are ready to move on to the
next step, namely variables and model selection.
2.5.2 Variables and Model Selection
The second step in our data analysis methodology is sub-divided into two
separate and distinct phases: preliminary analysis and model building.
Preliminary analysis allows us to choose which variables to use, discard,
modify, and, where necessary, sometimes create. Model building determines an effort estimation model based on our data set and variables.
Preliminary Analysis
This important phase is used to create variables based on existing variables, discard unnecessary variables, and modify existing variables (e.g.
joining categories). The net result of this phase is to obtain a set of variables that are ready to use in the next phase, model building. Since this
phase will construct an effort model using stepwise regression we need to
ensure that the variables comply with the assumptions underlying regression analysis, which are:
1. The input variables (independent variables) are measured without error. If this cannot be guaranteed then these variables need to be normalised.
2. The relationship between dependent and independent variables is linear.
3. No important input variables have been omitted. This ensures that
there is no specification error associated with the data set. The use of a
prior theory-based model justifying the choice of input variables ensures this assumption is not violated.
4. The variance of the residuals is the same for all combinations of input
variables (i.e. the residuals are homoscedastic rather than heteroscedastic)10.
5. The residuals must be normally distributed.
6. The residuals must be independent, i.e. not correlated.11
7. The independent variables are not linearly dependent, i.e. there are no
linear dependencies among the independent variables.
The first task within the preliminary analysis phase is to examine the entire set of variables and check if there is a significant amount of missing
values (> 60%). If yes, they should be automatically discarded as they
prohibit the use of imputation methods12 and will further prevent the identification of useful trends in the data. Table 2.2 shows that only ESTEFF
presented missing values greater than 60%. ESTEFF was gathered to give
10 Further details are provided in Chap. 12.
11 Further details are provided in Chap. 12.
12 Imputation methods are methods used to replace missing values with estimated values.
an idea of each company’s own prediction accuracy; however, it will not
be included in our analysis since it is not an effort predictor per se. Note
that a large number of zero values on certain size variables do not represent missing or rounded values.
Next we present the analyses for numerical variables first, followed by
the analyses for categorical variables.
Numerical Variables: Looking for Symptoms
Our next step is to look for symptoms (e.g. skewness13, heteroscedasticity14, and outliers15) that may suggest the need for variables to be normalised, i.e. having their values transformed such that they resemble more
closely a normal distribution. This step uses histograms, boxplots, and
scatter plots.
Histograms, or bar charts, provide a graphical display, where each bar
summarises the frequency of a single value or range of values for a given
variable. They are often used to check if a variable is normally distributed,
in which case the bars are displayed in the shape of a bell-shaped curve.
Histograms for the numerical variables (see Figs. 2.4 to 2.6) suggest that
all variables present skewed distributions, i.e. values not symmetrical
about a central value.
Next we use boxplots to check for the existence of outliers. Boxplots (see
Fig. 2.7) use the median, represented by the horizontal line in the middle
of the box, as the central value for the distribution. The box's height is the
inter-quartile range, and contains 50% of the values. The vertical lines
(whiskers) extending up and down from the edges of the box contain observations
which are less than 1.5 times the inter-quartile range away from the box.
Outliers are taken as values beyond 1.5 times the height of the box; values
beyond 3 times the box's height are called extreme outliers [19].
13 Skewness measures to what extent the distribution of data values is symmetrical about a central value.
14 Heteroscedasticity represents unstable variance of values.
15 Outliers are unusual values.
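As a minimal sketch of the outlier rule just described, assuming a plain list of values for a single variable:

# Sketch: flag outliers (beyond 1.5 x IQR) and extreme outliers
# (beyond 3 x IQR) using the boxplot rule.
import statistics

def boxplot_outliers(values):
    q1, _, q3 = statistics.quantiles(values, n=4)   # quartiles
    iqr = q3 - q1
    outliers = [v for v in values if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]
    extreme = [v for v in values if v < q1 - 3 * iqr or v > q3 + 3 * iqr]
    return outliers, extreme

print(boxplot_outliers([1, 2, 2, 3, 3, 4, 5, 6, 40]))  # 40 is flagged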
[Fig. 2.4. Distribution of values for six numerical variables: histograms of DEVTEAM, TEAMEXP, TOTWP, NEWWP, TOTIMG, and IMGNEW (N = 87), all right-skewed.]
[Fig. 2.5. Distribution of values for another six numerical variables: histograms of HEFFDEV, HEFFADPT, HFOTS, HFOTSA, HNEW, and FOTS (N = 87), all right-skewed.]
[Fig. 2.6. Distribution of values for three numerical variables: histograms of NEW, FOTSA, and TOTEFF (N = 87), all right-skewed.]
When upper and lower tails are approximately equal and the median is
in the centre of the box, the distribution is symmetric. If the distribution is
not symmetric the relative lengths of the tails and the position of the median in the box indicate the nature of the skewness. The length of the box
relative to the length of the tails gives an indication of the shape of the
distribution. So, a boxplot with a small box and long tails represents a very
peaked distribution, whereas a boxplot with a long box represents a flatter
distribution [19].
The boxplots for numerical variables (see Fig. 2.8) indicate that they
present a large number of outliers and peaked distributions that are not
symmetric.
[Fig. 2.7. Main components of a boxplot.]
Whenever outliers are present they should be investigated further, since
they may be a result of data entry error. In our study we looked at all cases,
in particular in relation to projects that exhibited very large effort values,
but did not find anything in the data to suggest they should be removed
from the data set. Note that when there are doubts about the correctness of
the data, the best solution is to contact the data source for confirmation.
Only if the source is not available should an assessment be based on consistency with other variables.
The histograms and boxplots both indicate symptoms of skewness and
outliers. When this situation arises it is common practice to normalise the
data, i.e. to transform the data trying to approximate the values to a normal
distribution. A common transformation is to take the natural log (ln),
which makes larger values smaller and brings the data values closer to
each other [23]. This is the transformation applied in our case, to all numerical variables. For consistency, all variables with a value of zero had
one added to their values prior to being transformed, as there is no natural
log of zero.
The Tukutuku database uses six variables to record the number of features/functions for each application. Their histograms (Fig. 2.5(c)–(f), Fig.
2.6(a)–(b)) indicate that each has a large number of zeros, reducing their
likelihood of being selected by the stepwise procedure. We therefore decided to group their values by creating two new variables – TOTHIGH
(summation of HFOTS, HFOTSA, and HNEW) and TOTNHIGH (summation of FOTS, FOTSA, and NEW). Their histograms are presented in Fig.
2.9(a)–(b).
[Fig. 2.8. Boxplots for the numerical variables (N = 87): (a) TOTWP, NEWWP, TOTIMG, IMGNEW; (b) the feature/function counts; (c) TOTEFF; (d) DEVTEAM, TEAMEXP, HEFFDEV, HEFFADPT. The plots show many outliers and peaked, non-symmetric distributions.]
Finally, we created a variable called NLANG, representing the number
of different implementation languages used per project, replacing the
original multi-valued variable that stored the names of the different implementation languages. The histogram is presented in Fig. 2.10.
TOTHIGH, TOTNHIGH, and NLANG were also transformed since
they presented skewness and outliers.
In the following sections, any variables that have been transformed have
their names identified by an uppercase L, followed by the name of the
variables they originated from.
The last part of the preliminary analysis is to check if the relationship
between the dependent variable (LTOTEFF) and the independent variables
is linear. The tool used to check such relationships is a scatter plot. Further
details on scatter plots are provided in Chap. 12.
[Fig. 2.9. Distribution of values for TOTHIGH and TOTNHIGH: histograms (N = 87), both right-skewed.]
[Fig. 2.10. Distribution of values for the number of different implementation languages (NLANG): histogram (N = 87), right-skewed.]
Numerical Variables: Relationship with Total Effort
Scatter plots are used to explore possible relationships between numerical
variables. They also help to identify strong and weak relationships between
two numerical variables. A strong relationship is represented by observations (data points) falling very close to or on the trend line. Examples of
such relationships are shown in Fig. 2.11(a)–(f), Fig. 2.12(d)–(f), and Fig.
2.13(a)–(d). A weak relationship is shown by observations that do not form
a clear pattern, which in our case is a straight line. Examples of such relationships are shown in Fig. 2.12(a)–(c), and Fig. 2.13(e).
[Fig. 2.11. Scatter plots showing strong relationships between LTOTEFF and several size variables: LTOTWP, LNEWWP, LTOTIMG, LIMGNEW, LTOTHIGH, and LTOTNHIG.]
[Fig. 2.12. Scatter plots for strong (d-f) and weak (a-c) relationships between LTOTEFF and several size variables: LHFOTS, LHFOTSA, LHNEW, LFOTS, LFOTSA, and LNEW.]
[Fig. 2.13. Scatter plots for strong (a-d) and weak (e) relationships between LTOTEFF and the independent variables NLANG, LDEVTEAM, LTEAMEXP, LHEFFDEV, and LHEFFADPT.]
We can also say that a relationship is positively associated when values
on the y-axis tend to increase with those on the x-axis (e.g. Fig. 2.11(a)–
(f)). When values on the y-axis tend to decrease as those on the x-axis increase we say that the relationship is negatively associated (e.g. Fig.
2.12(e) and Fig. 2.13(a)).
Figures 2.11 to 2.13 show that most variables seem to present a positive
relationship with LTOTEFF.
The scatter plots in Fig. 2.12(a)–(f) clearly show that the large number
of zero values for the independent variables causes the dependent variable
to exhibit more variability at the zero point, i.e. when independent variables have zero values, compared with non-zero values. This behaviour
violates the fourth assumption underlying linear regression. Therefore
within the context of this case study, we will exclude LHFOTS,
LHFOTSA, LHNEW, LFOTS, LFOTSA, and LNEW from any subsequent
analysis.
Our preliminary analyses for the numerical variables are now complete, so
we can move on to the categorical variables.
Categorical Variables: Relationship with Total Effort
This part of the analysis involves the creation of a table for each categorical variable where, for each of the variable's categories, we display the
mean and median effort and the number of projects they are based on. The motivation is to check whether there is a significant difference in effort between categories; if there is, we need to understand why.
Table 2.8 shows that, on average, new projects required more effort, despite being fewer in number than enhancement projects. This should not
come as a surprise, since building an application of size s from scratch
generally takes longer than enhancing an application of the same size.
Table 2.8. Mean, median effort, and number of projects per type of project category

TYPEPROJ     N   Mean effort  Median effort
New          39  329.8        100.0
Enhancement  48  206.4        18.7
Total        87  261.7        43.0
Table 2.9 shows that on average, projects that did not use any documented process used higher effort, despite being smaller in number than
projects that used a documented process. Further inspection of the data
revealed that 70% of the 23 projects that did not use any documented process are new, and that 64% of the 64 projects that used a documented process are enhancement projects. These results are in line with those shown in
Table 2.8.
Table 2.9. Mean, median effort, and number of projects per documented process category

DOCPROC  N   Mean effort  Median effort
no       23  307.5        50.0
yes      64  245.3        36.2
Total    87  261.7        43.0
Table 2.10. Mean, median effort, and number of projects per process improvement category

PROIMPR  N   Mean effort  Median effort
no       28  508.1        100.0
yes      59  144.8        25.2
Total    87  261.7        43.0
Table 2.11. Mean, median effort, and number of projects per metrics programme category

METRICS  N   Mean effort  Median effort
no       36  462.9        112.5
yes      51  119.7        21.0
Total    87  261.7        43.0
A similar pattern is observed in Tables 2.10 and 2.11: on average, projects
that are not part of a process improvement or metrics programme required
higher effort, despite being fewer in number (61% of the 28 projects that
are not part of a process improvement programme are new projects; for
projects that are not part of a metrics programme this percentage is also
61%, of 36 projects). In both cases the majority of projects that are part of
a process improvement or metrics programme are enhancement projects
(63% of 59 and 67% of 51, respectively).
Our next step is to check the relationship between the categorical variables
and effort. Note that we cannot use scatter plots, as categorical variables are
not numerical; instead we use a technique called the one-way ANOVA
(see Chap. 12 for details). Table 2.12 summarises the results of the one-way ANOVA.
Table 2.12. Results for the one-way ANOVA

Categorical variable  Significantly related to LTOTEFF?
TYPEPROJ              Yes
DOCPROC               No
PROIMPR               Yes
METRICS               Yes
DOCPROC is the only categorical variable not significantly related to
LTOTEFF; however, it will not be removed from further analysis as its
relationship with LTOTEFF may be concealed at this stage [18].
Next we build the effort model using a two-step process. The first step is
to use a manual stepwise regression based on residuals to select the categorical and numerical variables that jointly have a statistically significant
effect on the dependent variable, LTOTEFF. The second step is to use
these selected variables to build the final effort model using multivariate
regression, i.e. linear regression using more than one independent variable.
The size measures used in our case study represent early Web size
measures obtained from the results of a survey investigation [29], using
data from 133 on-line Web forms aimed at giving quotes on Web development projects. In addition, the measures were validated by an established Web company, and a second survey involving 33 Web companies in
New Zealand. Consequently it is our belief that the size measures identified are plausible effort predictors, not an ad-hoc set of variables with no
underlying rationale.
Building the Model Using a Two-Step Process
This section describes the use of a manual stepwise regression based on
residuals to build the effort model. This technique, proposed by Kitchenham [18], enables the use of information on residuals to handle relationships amongst independent variables. In addition, it only selects the input
variables that jointly have a statistically significant effect on the dependent
variable, thus avoiding any multi-collinearity problems.
The input variables to use are those selected as a result of our preliminary analyses, which are: LTOTWP, LNEWWP, LTOTIMG, LIMGNEW,
LTOTHIGH, LTOTNHIG, TYPEPROJ, DOCPROC, PROIMPR, and
METRICS.
Note: the distinct values of a categorical variable are called levels. For
example, the categorical variable DOCPROC has two levels – Yes and No.
The manual stepwise technique applied to categorical variables comprises the following steps [18]:
Step 1. Identify the categorical variable that has a statistically significant
effect on LTOTEFF and gives the smallest error term (mean square
within groups). This is obtained by applying simple analysis of
variance (ANOVA) using each categorical variable in turn (CV1).
Step 2. Remove the effect of the most significant categorical variable to
obtain residuals (ResC1). This means that for each level of the
most significant categorical variable, subtract the mean effort from
the project effort values. Note that effort represents the normalised
effort – LTOTEFF.
Web Effort Estimation
61
Step 3. Apply ANOVA using each remaining categorical variable in turn,
this time measuring their effect on ResC1.
Step 4. Any categorical variables that had a statistically significant effect
on LTOTEFF (in step 1), but have no statistically significant effect
on ResC1, are variables related to CV1 and offer no additional information about the dependent variable. They can therefore be
eliminated from the stepwise regression.
Step 5. Identify the next most significant categorical variable from step 4
(CV2). Again, if there are several statistically significant variables,
choose the one that minimises the error term.
Step 6. Remove the effect of CV2 to obtain residuals (ResC2).
Step 7. Apply ANOVA using each remaining categorical variable in turn,
this time measuring their effect on ResC2.
Step 8. Any categorical variables that had a statistically significant effect
on ResC1, but have no statistically significant effect on ResC2, are
variables related with CV2 and offer no additional information
about the dependent variable. They can therefore be eliminated
from the stepwise regression.
Step 9. Repeat the stepwise process until all statistically significant categorical variables are removed or none of the remaining variables
have a statistically significant effect on the current residuals.
The initial level means for the four categorical variables to be used in
our manual stepwise process are presented in Table 2.13.
Numerical variables can also be added to this stepwise procedure. Their
impact on the dependent variable can be assessed using linear regression,
and obtaining the mean squares for the regression model and residual.
Whenever a numerical variable is the most significant, its effect has to be
removed, i.e. the obtained residuals are the ones further analysed.
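A sketch of one cycle of this procedure for the categorical variables, assuming scipy and pandas are available and that the normalised effort is held in a DataFrame column LTOTEFF; the selection and residual steps follow steps 1 and 2 above:

# Sketch: one cycle of the manual stepwise procedure.
import pandas as pd
from scipy.stats import f_oneway

def anova_within_ms(df, var, response):
    """F-test p-value and within-groups mean square for one variable."""
    groups = [g[response].values for _, g in df.groupby(var)]
    _, p_value = f_oneway(*groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return ss_within / (len(df) - len(groups)), p_value

def remove_effect(df, var, response):
    """Step 2: subtract the level mean from each project's effort."""
    return df[response] - df.groupby(var)[response].transform("mean")

# Usage, assuming df holds the 87 projects:
# candidates = ["TYPEPROJ", "DOCPROC", "PROIMPR", "METRICS"]
# results = {v: anova_within_ms(df, v, "LTOTEFF") for v in candidates}
# significant = [v for v in candidates if results[v][1] < 0.05]
# cv1 = min(significant, key=lambda v: results[v][0])  # smallest error term
# df["ResC1"] = remove_effect(df, cv1, "LTOTEFF")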
To construct the full regression model, we apply multivariate regression
using only the variables selected by the manual stepwise procedure. At
each stage of the stepwise process we also need to verify the stability of
the model. This involves identifying large-residual and high-influence data
points (i.e. projects), and checking whether the residuals are homoscedastic
and normally distributed. Several types of plots (e.g. residual, leverage,
probability) and statistics are available in most statistics tools to
accomplish such a task.
Table 2.13. Initial level means for categorical variables

Variable/Level        No. projects  Total LTOTEFF  Mean LTOTEFF
TYPEPROJ/New          39            186.7          4.8
TYPEPROJ/Enhancement  48            154.1          3.2
DOCPROC/Yes           64            244.3          3.8
DOCPROC/No            23            96.5           4.2
PROIMPR/Yes           59            204.4          3.5
PROIMPR/No            28            136.4          4.9
METRICS/Yes           51            163.2          3.2
METRICS/No            36            177.6          4.9
The plots we have employed here are:
• A residual plot showing residuals vs. fitted values. This allows us to investigate whether the residuals are random and normally distributed. For numerical variables the plotted data points should be distributed randomly about zero, and should not exhibit patterns such as linear or non-linear trends, or increasing or decreasing variance. For categorical variables the pattern of the residuals should appear "as a series of parallel, angled lines of approximately the same length" [18].
• A normal P–P plot (probability plot) for the residuals. Normal P–P plots are generally employed to verify whether the distribution of a variable is consistent with the normal distribution. When the distribution is normal, the data points are close to linear.
• Cook's D statistic, to identify projects that jointly exhibit a large influence and a large residual [23] (a sketch of this check follows the list). Any project with D greater than 4/n, where n is the total number of projects, is considered to have a high influence on the results. When there are high-influence projects, the stability of the model is tested by removing them and observing the effect of their removal on the model. If the coefficients remain stable and the adjusted R² increases, the high-influence projects are not destabilising the model and therefore do not need to be removed.
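A sketch of the Cook's D check, assuming statsmodels is available; the stand-in data is randomly generated, whereas in the case study the DataFrame would hold the 87 projects:

# Sketch: fit the effort model and flag projects with Cook's D > 4/n.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)                 # fabricated stand-in data
df = pd.DataFrame({"LNEWWP": rng.uniform(0, 7, 87),
                   "LTOTHIGH": rng.uniform(0, 2.5, 87)})
df["LTOTEFF"] = 2 + 0.55 * df["LNEWWP"] + rng.normal(0, 1, 87)

X = sm.add_constant(df[["LNEWWP", "LTOTHIGH"]])
fit = sm.OLS(df["LTOTEFF"], X).fit()
cooks_d = fit.get_influence().cooks_distance[0]
threshold = 4 / len(df)                        # 4/87, i.e. ~0.045
print(df.index[cooks_d > threshold].tolist())  # high-influence projects
print(fit.rsquared_adj)                        # adjusted R-squared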
First Cycle
Table 2.14 shows the results of applying ANOVA to categorical and numerical variables. This is the first cycle in the stepwise procedure. The numerical
variable LNEWWP is the most significant, since it results in the smallest
error term, represented by a within-groups mean square value of 1.47.
Table 2.14. ANOVA for each categorical and numerical variable for first cycle

Variable   Level / regression equation       Mean   No. projs  Between-groups MS  Within-groups MS  F test level of significance
TYPEPROJ   New                               4.79   39         53.56              3.05              17.56, p < 0.01
           Enhancement                       3.20   48
DOCPROC    Yes                               3.82   64         2.44               3.65              0.42, n.s.
           No                                4.20   23
PROIMPR    Yes                               3.46   59         37.38              3.24              11.64, p = 0.001
           No                                4.87   28
METRICS    Yes                               3.20   51         63.54              2.93              21.67, p < 0.01
           No                                4.93   36
LTOTWP     LTOTEFF = 1.183 + 0.841 LTOTWP                      158.22             1.82              86.97, p < 0.01
LNEWWP     LTOTEFF = 2.165 + 0.731 LNEWWP                      188.21             1.47              128.36, p < 0.01
LTOTIMG    LTOTEFF = 2.428 + 0.471 LTOTIMG                     78.55              2.76              28.50, p < 0.01
LIMGNEW    LTOTEFF = 2.98 + 0.524 LIMGNEW                      104.35             2.45              42.54, p < 0.01
LTOTHIGH   LTOTEFF = 2.84 + 1.705 LTOTHIGH                     143.04             2.00              71.61, p < 0.01
LTOTNHIG   LTOTEFF = 2.954 + 0.641 LTOTNHIG                    21.12              3.43              6.15, p = 0.015
The single-variable regression equation with LTOTEFF as the dependent/response variable and LNEWWP as the independent/predictor variable
gives an adjusted R² of 0.597. Two projects were identified with Cook's D >
0.045; however, their removal did not seem to destabilise the model, i.e.
after their removal the coefficients remained stable and the adjusted R²
increased. Furthermore, there was no indication from the residual and P–P
plots that the residuals were non-normal. The residuals resulting from this
linear regression are used for the second cycle in the stepwise procedure.
Second Cycle
Table 2.15 shows the results of applying ANOVA to the categorical and
numerical variables in the second cycle of the stepwise procedure. The
numerical variable LTOTHIGH is the most significant, since it results in
the smallest error term, represented by a within-groups mean square value
of 1.118. The linear regression equation with the residual as the dependent/response variable and LTOTHIGH as the independent/predictor variable gives an
adjusted R² of 0.228. This time five projects were identified with Cook's
D > 0.045; however, their removal did not destabilise the model. In addition, the residual and P–P plots showed no evidence of non-normality.
Table 2.15. ANOVA for each categorical and numerical variable for second cycle

Variable   Level / regression equation        Mean     No. projs  Between-groups MS  Within-groups MS  F test level of significance
TYPEPROJ   New                                -0.0181  39         0.023              1.466             0.016, n.s.
           Enhancement                        0.0147   48
DOCPROC    Yes                                0.0385   64         0.359              1.462             0.246, n.s.
           No                                 -0.1072  23
PROIMPR    Yes                                -0.1654  59         5.017              1.407             3.565, n.s.
           No                                 0.3486   28
METRICS    Yes                                -0.2005  51         4.954              1.408             3.519, n.s.
           No                                 0.2840   36
LTOTWP     LTOTEFF = -0.474 + 0.146 LTOTWP                        4.749              1.410             3.367, n.s.
LTOTIMG    LTOTEFF = -0.417 + 0.132 LTOTIMG                       6.169              1.394             4.427, p = 0.038
LIMGNEW    LTOTEFF = -0.33 + 0.184 LIMGNEW                        12.915             1.314             9.826, p = 0.002
LTOTHIGH   LTOTEFF = -0.49 + 0.775 LTOTHIGH                       29.585             1.118             26.457, p < 0.01
LTOTNHIG   LTOTEFF = -0.593 + 0.395 LTOTNHIG                      8.015              1.372             5.842, p = 0.018
Table 2.15 also shows that TYPEPROJ, PROIMPR, METRICS, and
LTOTWP have no further statistically significant effect on the residuals
obtained in the previous cycle. Therefore they can all be eliminated from
the stepwise procedure.
Once this cycle is complete the remaining input variables are
DOCPROC, LTOTIMG, LIMGNEW, and LTOTNHIG.
Third Cycle
Table 2.16 shows the results of applying ANOVA to the four remaining
categorical and numerical variables. This is the third cycle in the stepwise
procedure. As shown in Table 2.16 none of the four remaining variables
have any statistically significant effect on the current residuals, and as such
the procedure finishes.
Web Effort Estimation
65
Finally, our last step is to construct the effort model using a multivariate
regression analysis with only the input variables selected using the manual
stepwise procedure – LNEWWP and LTOTHIGH. The coefficients for the
effort model are presented in Table 2.17. Its adjusted R² is 0.717, suggesting that LNEWWP and LTOTHIGH can explain 72% of the variation in
LTOTEFF.
Table 2.16. ANOVA for each categorical and numerical variable for third cycle

Variable   Level / regression equation        Mean     No. projs  Between-groups MS  Within-groups MS  F test level of significance
DOCPROC    Yes                                0.0097   64         0.023              1.118             0.021, n.s.
           No                                 -0.0272  23
LTOTIMG    LTOTEFF = -0.109 + 0.034 LTOTIMG                       0.419              1.113             0.376, n.s.
LIMGNEW    LTOTEFF = -0.162 + 0.091 LIMGNEW                       3.126              1.081             2.89, n.s.
LTOTNHIG   LTOTEFF = -0.192 + 0.128 LTOTNHIG                      0.837              1.108             0.755, n.s.
Table 2.17. Coefficients for the effort model

Variable    Coeff.  Std. error  t       P>|t|  [95% conf. interval]
(Constant)  1.959   0.172       11.355  0.000  1.616, 2.302
LNEWWP      0.553   0.061       9.003   0.000  0.431, 0.675
LTOTHIGH    1.001   0.164       6.095   0.000  0.675, 1.328
Four projects had Cook's D > 0.045 (see Table 2.18), so we followed the
procedure adopted previously: we repeated the regression analysis after
excluding these four projects from the data set. Their removal did not
result in any major changes to the model coefficients, and the adjusted R²
improved (0.757). Therefore we assume that the regression equation is
reasonably stable for this data set and that it is not necessary to omit
these four projects from the data set.
Table 2.18. Four projects that presented high Cook's distance

ID  NEWWP  TOTHIGH  TOTEFF  Cook's D
20  20     0        625     0.073
25  0      4        300     0.138
32  22     8        3150    0.116
45  280    0        800     0.078
Figure 2.14 shows three different plots, all related to residuals. The histogram (see Fig. 2.14(a)) suggests that the residuals are normally distributed, which is further corroborated by the P–P plot (see Fig. 2.14(b)). In
addition, the scatter plot of standardised residuals versus standardised predicted values (see Fig. 2.14(c)) does not show any problematic patterns in the data.
[Fig. 2.14. Several residual plots, dependent variable LTOTEFF: (a) histogram of the standardised residuals (Mean = 0.00, Std. Dev = 0.99, N = 87), (b) normal P–P plot of the standardised residuals, (c) scatter plot of standardised residuals vs. standardised predicted values.]
Once the residuals and the stability of the regression model have been
checked, we are in a position to extract the equation that represents the
model.
2.5.3 Extraction of Effort Equation
The equation that is obtained from Table 2.17 is the following:
\[ LTOTEFF = 1.959 + 0.553\,LNEWWP + 1.001\,LTOTHIGH \qquad (2.10) \]
This equation uses three variables that had previously been transformed
with the natural log, so we need to transform it back to the original scale.
Exponentiating both sides (note that e^1.959 = 7.092) gives the following equation:
\[ TOTEFF = 7.092\,(NEWWP + 1)^{0.553}\,(TOTHIGH + 1)^{1.001} \qquad (2.11) \]
In Eq. 2.11, the multiplicative value 7.092 can be interpreted as the effort required to develop one Web page.
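A sketch of using the back-transformed model for prediction; the example input values are arbitrary:

# Sketch: effort prediction with Eq. 2.11. The constant 7.092 is
# e^1.959, obtained by exponentiating both sides of Eq. 2.10.
import math

def estimate_effort(newwp, tothigh):
    return math.exp(1.959) * (newwp + 1) ** 0.553 * (tothigh + 1) ** 1.001

print(math.exp(1.959))           # ~7.092, the multiplicative constant
print(estimate_effort(25, 2))    # e.g. 25 new pages, 2 high-effort features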
Obtaining a model that has a good fit to the data and can alone explain a
large degree of the variation in the dependent variable is not enough to
assume this model will provide good effort predictions. To confirm this, it
also needs to be validated. This is the procedure explained in Sect. 2.5.4.
2.5.4 Model Validation
As described in Sect. 2.3.2, to validate a model we need to do the following:
Step 1. Divide data set d into a training set t and a validation set v.
Step 2. Use t to produce an effort estimation model te (if applicable).
Step 3. Use te to predict effort for each of the projects in v, as if these
projects were new projects for which effort was unknown.
This process is known as cross-validation. For an n-fold cross-validation, n different training/validation sets are used. In this section we
will show the cross-validation procedure using a one-fold cross-validation,
with a 66% split. This split means that 66% of our project data will be used
for model building, the remaining 34% to validate the model, i.e. the training set will have 66% of the total number of projects and the validation set
will have the remaining 34%.
Our initial data set had 87 projects. At step 1 they are split into training
and validation sets containing 58 and 29 projects respectively. Generally
projects are selected randomly.
As part of step 2 we need to create an effort model using the 58 projects
in the training set. We will create an effort model that only considers the
variables that have been previously selected and presented in Eq. 2.10.
These are: LNEWWP and LTOTHIGH. Here we do not perform the residual analysis or consider Cook’s D since it is assumed these have also been
done using the generic equation, Eq. 2.10. The model's coefficients are
presented in Table 2.19, and the transformed equation is presented in Eq.
2.12. The adjusted R² is 0.619.
Table 2.19. Coefficients for effort model using 58 projects

Variable    Coeff.  Std. error  t       P>|t|  [95% conf. interval]
(Constant)  2.714   0.264       10.290  0.000  2.185, 3.242
LNEWWP      0.420   0.073       5.749   0.000  0.273, 0.566
LTOTHIGH    0.861   0.160       5.389   0.000  0.675, 1.328

\[ TOTEFF = 15.089\,(NEWWP + 1)^{0.420}\,(TOTHIGH + 1)^{0.861} \qquad (2.12) \]
To measure this model's prediction accuracy we obtain the MMRE,
MdMRE, and Pred(25) for the validation set. The model presented as Eq.
2.12 is applied to each of the 29 projects in the validation set to obtain
estimated effort. Given the estimated effort and the actual effort (provided
by the Web companies), we can calculate MRE for each of the 29 projects,
and hence MMRE, MdMRE, and Pred(25) over the entire validation set.
This process is illustrated in Fig. 2.15.
[Fig. 2.15. Steps used in the cross-validation process: (1) the 87 projects are split into a training set (58 projects) and a validation set (29 projects); (2) the training set is used to build the model in Eq. 2.12; (3) the model is applied to each validation project; (4) estimated effort is compared with actual effort to give MRE and residual; (5) these are aggregated into MMRE, MdMRE, and Pred(25).]
Table 2.20 shows the measures of prediction accuracy, calculated from
the validation set, and is assumed to represent the entire set of 87 projects.
Table 2.20. Prediction accuracy measures using model-based estimated effort

Measure   %
MMRE      129
MdMRE     73
Pred(25)  17.24
If we assume a good prediction model has an MMRE less than or equal
to 25% and Pred(25) greater than or equal to 75% then the values presented in Table 2.20 suggest the accuracy of the effort model used is poor.
However, if instead we were to use the average actual effort (average =
261) or the median actual effort for the 87 projects (median = 43) accuracy
would be considerably worse. One viable approach for a Web company
would be to use the effort model described above to obtain an estimated
effort, and adapt the obtained values, taking into account factors such as
previous experience with similar projects and the skills of the developers.
Table 2.21. Prediction accuracy measures based on average and median effort

Measure   Average effort as estimated effort  Median effort as estimated effort
MMRE      4314%                               663%
MdMRE     1413%                               149%
Pred(25)  6.89%                               3.44%
Table 2.21 presents the results for a one-fold cross-validation. However,
research on effort estimation suggests that, to obtain unbiased results, a
cross-validation should use at least 20 folds [17]. For the data set presented
here, this would mean selecting 20 different training/validation sets,
calculating the accuracy for each of the 20 groups, and then aggregating
the resulting MMREs, MdMREs, and Pred(25)s.
2.6 Conclusions
This chapter introduced the concepts related to effort estimation, and described techniques for effort estimation using three general categories:
expert opinion, algorithmic models and artificial intelligence (AI) techniques. In addition, it discussed how to measure effort prediction power
and accuracy of effort estimation models.
This chapter also presented a case study that used data from industrial
Web projects held in the Tukutuku database, to construct and validate an
70
Emilia Mendes, Nile Mosley, Steve Counsell
effort estimation model. The size measures used in the case study represent
early Web size measures obtained from the results of a survey investigation [29], using data from 133 on-line Web forms aimed at giving quotes
for Web development projects. In addition, the measures were validated by
an established Web company, and by a second survey involving 33 Web
companies in New Zealand. Consequently we believe that the size measures identified are plausible effort predictors, not an ad-hoc set of variables
with no underlying rationale.
Furthermore, a detailed analysis of the data was provided, with details
of a manual stepwise procedure [18] used to build an effort estimation
model. The two variables that were selected by the effort estimation model
were the total number of new Web pages and the total number of high-effort features/functions in the application. Together they explained approximately 72%
of the variation in total effort. Note that the effort model constructed and
the selected variables are applicable only to projects belonging to the data
set on which they were constructed.
The case study details the mechanism that can be used by any Web
company to construct and validate its own effort estimation models. Alternatively, Web companies that do not have a data set of past projects may
be able to benefit from the cross-company effort estimation models provided within the context of the Tukutuku project, provided they are willing
to volunteer data on three of their past finished projects.
References
1. Angelis L, Stamelos I (2000) A Simulation Tool for Efficient Analogy Based Cost Estimation. Empirical Software Engineering, 5:35–68
2. Boehm B (1981) Software Engineering Economics. Prentice-Hall, Englewood Cliffs, NJ
3. Briand LC, El-Emam K, Surmann D, Wieczorek I, Maxwell KD (1999) An Assessment and Comparison of Common Cost Estimation Modeling Techniques. In: Proceedings of ICSE 1999, Los Angeles, USA, pp 313–322
4. Briand LC, Langley T, Wieczorek I (2000) A Replicated Assessment and Comparison of Common Software Cost Modeling Techniques. In: Proceedings of ICSE 2000, Limerick, Ireland, pp 377–386
5. Breiman L, Friedman J, Olshen R, Stone C (1984) Classification and Regression Trees. Wadsworth, Belmont, CA
6. Conte S, Dunsmore H, Shen V (1986) Software Engineering Metrics and Models. Benjamin/Cummings, Menlo Park, CA
7. DeMarco T (1982) Controlling Software Projects: Management, Measurement and Estimation. Yourdon, New York
8. Finnie GR, Wittig GE, Desharnais J-M (1997) A Comparison of Software Effort Estimation Techniques: Using Function Points with Neural Networks, Case-Based Reasoning and Regression Models. Journal of Systems and Software, 39:281–289
9. Gray A, MacDonell S (1997) Applications of Fuzzy Logic to Software Metric Models for Development Effort Estimation. In: Proceedings of the IEEE Annual Meeting of the North American Fuzzy Information Processing Society NAFIPS, Syracuse, NY, USA, pp 394–399
10. Gray AR, MacDonell SG (1997) A Comparison of Model Building Techniques to Develop Predictive Equations for Software Metrics. Information and Software Technology, 39:425–437
11. Gray R, MacDonell SG, Shepperd MJ (1999) Factors Systematically Associated with Errors in Subjective Estimates of Software Development Effort: The Stability of Expert Judgement. In: Proceedings of the 6th IEEE Metrics Symposium
12. Hughes RT (1997) An Empirical Investigation into the Estimation of Software Development Effort. PhD thesis, Dept. of Computing, University of Brighton
13. Jeffery R, Ruhe M, Wieczorek I (2000) A Comparative Study of Two Software Development Cost Modelling Techniques Using Multi-organizational and Company-specific Data. Information and Software Technology, 42:1009–1016
14. Jeffery R, Ruhe M, Wieczorek I (2001) Using Public Domain Metrics to Estimate Software Development Effort. In: Proceedings of the 7th IEEE Metrics Symposium, London, UK, pp 16–27
15. Kadoda G, Cartwright M, Chen L, Shepperd MJ (2000) Experiences Using Case-Based Reasoning to Predict Software Project Effort. In: Proceedings of the EASE 2000 Conference, Keele, UK
16. Kemerer CF (1987) An Empirical Validation of Software Cost Estimation Models. Communications of the ACM, 30(5):416–429
17. Kirsopp C, Shepperd M (2001) Making Inferences with Small Numbers of Training Sets. Technical Report TR02-01, Bournemouth University
18. Kitchenham BA (1998) A Procedure for Analyzing Unbalanced Datasets. IEEE Transactions on Software Engineering, 24(4):278–301
19. Kitchenham BA, MacDonell SG, Pickard LM, Shepperd MJ (2001) What Accuracy Statistics Really Measure. IEE Proceedings Software, 148(3):81–85
20. Kitchenham BA, Pickard LM, Linkman S, Jones P (2003) Modelling Software Bidding Risks. IEEE Transactions on Software Engineering, 29(6):542–554
21. Kok P, Kitchenham BA, Kirakowski J (1990) The MERMAID Approach to Software Cost Estimation. In: Proceedings of the ESPRIT Annual Conference, Brussels, pp 296–314
22. Kumar S, Krishna BA, Satsangi PS (1994) Fuzzy Systems and Neural Networks in Software Engineering Project Management. Journal of Applied Intelligence, 4:31–52
23. Maxwell K (2002) Applied Statistics for Software Managers. Prentice Hall PTR, Englewood Cliffs, NJ
24. Mendes E, Counsell S, Mosley N (2000) Measurement and Effort Prediction of Web Applications. In: Proceedings of the 2nd ICSE Workshop on Web Engineering, Limerick, Ireland, pp 57–74
25. Mendes E, Mosley N, Counsell S (2001) Web Metrics – Estimating Design and Authoring Effort. IEEE Multimedia, Special Issue on Web Engineering, January/March:50–57
26. Mendes E, Mosley N, Counsell S (2002) The Application of Case-Based Reasoning to Early Web Project Cost Estimation. In: Proceedings of COMPSAC 2002, Oxford, UK
27. Mendes E, Mosley N, Counsell S (2003) Do Adaptation Rules Improve Web Cost Estimation? In: Proceedings of the ACM Hypertext Conference 2003, Nottingham, UK
28. Mendes E, Mosley N, Counsell S (2003) A Replicated Assessment of the Use of Adaptation Rules to Improve Web Cost Estimation. In: Proceedings of the ACM/IEEE International Symposium on Empirical Software Engineering, Rome, Italy, pp 100–109
29. Mendes E, Mosley N, Counsell S (2003) Early Web Size Measures and Effort Prediction for Web Costimation. In: Proceedings of the IEEE Metrics Symposium, Sydney, Australia, pp 18–29
30. Myrtveit I, Stensrud E (1999) A Controlled Experiment to Assess the Benefits of Estimating with Analogy and Regression Models. IEEE Transactions on Software Engineering, 25(4):510–525
31. Ruhe M, Jeffery R, Wieczorek I (2003) Cost Estimation for Web Applications. In: Proceedings of ICSE 2003, Portland, USA
32. Schofield C (1998) An Empirical Investigation into Software Estimation by Analogy. PhD thesis, Dept. of Computing, Bournemouth University
33. Schroeder L, Sjoquist D, Stephan P (1986) Understanding Regression Analysis: An Introductory Guide, No. 57. In: Quantitative Applications in the Social Sciences, Sage Publications, Newbury Park, CA
34. Selby RW, Porter AA (1988) Learning from Examples: Generation and Evaluation of Decision Trees for Software Resource Analysis. IEEE Transactions on Software Engineering, 14:1743–1757
35. Shepperd MJ, Kadoda G (2001) Using Simulation to Evaluate Prediction Techniques. In: Proceedings of the IEEE 7th International Software Metrics Symposium, London, UK, pp 349–358
36. Shepperd MJ, Schofield C (1997) Estimating Software Project Effort Using Analogies. IEEE Transactions on Software Engineering, 23(11):736–743
37. Shepperd MJ, Schofield C, Kitchenham B (1996) Effort Estimation Using Analogy. In: Proceedings of ICSE-18, Berlin
38. Srinivasan K, Fisher D (1995) Machine Learning Approaches to Estimating Software Development Effort. IEEE Transactions on Software Engineering, 21:126–137
39. Stensrud E, Foss T, Kitchenham BA, Myrtveit I (2002) An Empirical Validation of the Relationship Between the Magnitude of Relative Error and Project Size. In: Proceedings of the IEEE 8th Metrics Symposium, Ottawa, pp 3–12
Authors’ Biographies
Dr. Emilia Mendes is a Senior Lecturer in Computer Science at the University of
Auckland (New Zealand), where she leads the WETA (Web Engineering, Technology and Applications) research group. She is the principal investigator in the
Tukutuku Research project,16 aimed at developing and comparing Web effort
models using industrial Web project data, and benchmarking productivity within
and across Web companies. She has active research interests in Web measurement
and metrics, and in particular Web cost estimation, Web size measures, Web productivity and quality measurement, and Web process improvement. Dr. Mendes is
on the programme committee of numerous international conferences and workshops, and on the editorial board of the International Journal of Web Engineering
and Technology and the Journal of Web Engineering. She has collaborated with
Web companies in New Zealand and overseas on Web cost estimation and usability measurement. Dr. Mendes worked in the software industry for ten years before
obtaining her PhD in Computer Science from the University of Southampton
(UK), and moving to Auckland. She is a member of the Australian Software
Measurement Association.
Dr. Nile Mosley is the Technical Director of a software development company.
He has active research interests in software measurement and metrics, and object-oriented programming languages. He obtained his PhD in Pure and Applied
Mathematics from Nottingham Trent University (UK).
Steve Counsell obtained a BSc (Hons) in Computer Studies from the University
of Brighton and an MSc in Systems Analysis from the City University in 1987 and
1988, respectively. After spending some time in industry as a developer, he obtained his PhD in 2002 from the University of London and is currently a Lecturer
in the Department of Information Systems and Computing at Brunel University.
Prior to 2004, he was a Lecturer in the School of Computer Science and Information Systems at Birkbeck, University of London and between 1996 and 1998 was a
Research Fellow at the University of Southampton. In 2002, he was a BT Short-term Research Fellow. His research interests are in software engineering, more
specifically metrics and empirical studies.
16 http://www.cs.auckland.ac.nz/tukutuku/.
3 Web Productivity Measurement
and Benchmarking
Emilia Mendes, Barbara Kitchenham
Abstract: Project managers use software productivity measures to assess
software development efficiency. Productivity is commonly measured as the
ratio of output to input. Within the context of software development, output is
often assumed to be product size and input to be effort. However, Web applications are often characterised using several different size measures and there
is no standard model for aggregating those measures into a single size measure. This makes it difficult to measure Web application productivity.
In this chapter, we present a productivity measurement method, which
allows for the use of different size measures. An advantage of the method is
that it has a built-in interpretation scale. It ensures that each project has an
expected productivity value of one. Values between zero and one indicate
lower than expected productivity; values greater than one indicate higher
than expected productivity. We demonstrate how to use the method by
analysing the productivity of Web projects from the Tukutuku database.
Keywords: Web productivity measurement, Productivity measure, Manual
stepwise regression, Size-based effort model, Data analysis.
3.1 Introduction
Productivity is commonly measured as the ratio of output to input. The
more output per unit of input, the more productive a project is assumed to
be. Within the context of software development the output of the software
production process is often taken to be product size and the input to the
process to be effort. Therefore, productivity is represented by the following
equation:
Productivity = Size / Effort    (3.1)
Equation 3.1 is simple to apply when product size is represented by a
single dominant size measure (e.g. product size measured in lines of code
or function points). However, there are circumstances when there are several different effort-related size measures and there is no standard model
for aggregating these measures. When we have more than one size measure
related to effort and no theoretical model for aggregating those measures, it
is difficult to construct a single size measure. In these circumstances,
Eq. 3.1 cannot be used to measure productivity. This is exactly the problem
we face when attempting to measure Web application productivity. The
majority of studies published in the Web sizing literature have identified
the need to use a variety of different measures to adequately characterise
the size of a Web application, but there is no widely accepted method for
aggregating the measures into a single size measure.
In this chapter we describe a case study that analyses the productivity of
87 Web projects from the Tukutuku database. This is the same subset of
projects used in Chap. 2. We adopt the productivity measurement method
suggested in Kitchenham and Mendes [3], which allows for the use of several effort-related size measures, and also provides a productivity baseline
of one. Thus, productivity values between zero and one indicate lower than
expected productivity, values greater than one indicate higher than expected productivity.
Section 3.2 presents the method used to build the productivity measure
and the assumptions underlying the productivity measure. The results of
our productivity analysis using the new productivity measurement method
are described in Sect. 3.3, followed by our conclusions in Sect. 3.4.
3.2 Productivity Measurement Method
The productivity measurement method employed in this chapter allows for
the use of multiple effort-related size measures. It is based on the idea that
any size-based effort estimation model constructed using the stepwise regression technique is by definition a function of effort-related size measures. Thus the size-based effort estimation model can be regarded as an
AdjustedSize measure, and used in the following equation to represent
productivity [3]:

Productivity = AdjustedSize / Effort    (3.2)
The AdjustedSize measure contains only size measures that together are
strongly associated with effort. In addition, the relationship between these
size measures and effort does not need to be linear.
The benefits of using this method for measuring productivity are as follows [3]:
• The standard value of productivity is one, since it is obtained using the ratio of estimated to actual effort.
• A productivity value greater than one suggests above-average productivity.
• A productivity value smaller than one suggests below-average productivity.
• The stepwise regression technique used to build a regression model that represents the AdjustedSize measure can also be employed to construct upper and lower bounds on the productivity measure. These bounds can be used to assess whether the productivity achieved by a specific project is significantly better or worse than expected.
• The productivity measure automatically allows for diseconomies (or economies) of scale before being used in a productivity analysis. This means that an investigation of factors that affect productivity will only select factors that affect the productivity of all projects. If we ignore the impact of diseconomies (or economies) of scale, we run the risk of detecting factors that differ between large and small projects rather than factors that affect the productivity of all projects.
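To make the method concrete, the sketch below shows how Eq. 3.2 could be computed in Python, assuming (purely for illustration) a pandas DataFrame df holding the Tukutuku variables TOTEFF, NEWWP and TOTHIGH that are used later in this chapter; the stepwise selection of size measures itself is described in Sect. 3.3.

import numpy as np
import pandas as pd
import statsmodels.api as sm

def productivity_measure(df: pd.DataFrame) -> pd.Series:
    # Log-transform effort and the effort-related size measures;
    # ln(x + 1) guards against zero values, as done later in this chapter.
    y = np.log(df["TOTEFF"])
    X = sm.add_constant(np.log(df[["NEWWP", "TOTHIGH"]] + 1))
    model = sm.OLS(y, X).fit()
    # The fitted size-based effort model plays the role of AdjustedSize,
    # so productivity = AdjustedSize / Effort is expected to be one.
    adjusted_size = np.exp(model.fittedvalues)
    return adjusted_size / df["TOTEFF"]

Values returned by such a sketch cluster around one by construction, which is what gives the measure its built-in interpretation scale.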
3.3 Case Study
The case study presented in this section describes the construction of a
productivity measure and its use to analyse the productivity of Web projects from the Tukutuku database.1
The database used in our analysis has data on 87 Web projects: 13 and 34 projects come from two single Web companies, respectively, and the remaining 40 projects come from another 23 companies. The Tukutuku database
uses 6 variables to store data on companies which volunteered projects, 10
variables to store data on each project and 13 variables to store data on
each Web application2 (see Table 3.1). Company data is obtained once and
both project and application data are gathered for each volunteered project.
Table 3.1. Variables for the Tukutuku database

NAME         SCALE        DESCRIPTION
Company data
COUNTRY      Categorical  Country company belongs to.
ESTABLISHED  Ordinal      Year when company was established.
SERVICES     Categorical  Type of services company provides.
NPEOPLEWD    Ratio        Number of people who work on Web design and development.
CLIENTIND    Categorical  Industry representative of those clients to whom applications are provided.
ESTPRACT     Categorical  Accuracy of a company's own effort estimation practices.
Project data
TYPEPROJ     Categorical  Type of project (new or enhancement).
LANGS        Categorical  Implementation languages used.
DOCPROC      Categorical  If project followed defined and documented process.
PROCIMPR     Categorical  If project team involved in a process improvement programme.
METRICS      Categorical  If project team part of a software metrics programme.
DEVTEAM      Ratio        Size of project's development team.
TEAMEXP      Ratio        Average team experience with the development language(s) employed.
TOTEFF       Ratio        Actual total effort in person hours used to develop the Web application.
ESTEFF       Ratio        Estimated total effort in person hours necessary to develop the Web application.
ACCURACY     Categorical  Procedure used to record effort data.
Web application
TYPEAPP      Categorical  Type of Web application developed.
TOTWP        Ratio        Total number of Web pages (new and reused).
NEWWP        Ratio        Total number of new Web pages.
TOTIMG       Ratio        Total number of images (new and reused).
NEWIMG       Ratio        Total number of new images created.
HEFFDEV      Ratio        Minimum number of hours to develop a single function/feature by one experienced developer that is considered high (above average).3
HEFFADPT     Ratio        Minimum number of hours to adapt a single function/feature by one experienced developer that is considered high (above average).4
HFOTS        Ratio        Number of reused high-effort features/functions without adaptation.
HFOTSA       Ratio        Number of adapted high-effort features/functions.
HNEW         Ratio        Number of new high-effort features/functions.
FOTS         Ratio        Number of low-effort features off the shelf.
FOTSA        Ratio        Number of low-effort features off the shelf adapted.
NEW          Ratio        Number of new low-effort features/functions.

1 The raw data cannot be displayed here due to a confidentiality agreement with those companies that have volunteered data on their projects.
2 A definition of Web application is given in Chap. 1.
3 This number is currently set to 15 hours based on the collected data.
4 This number is currently set to 4 hours based on the collected data.
All results presented here were obtained using the statistical software
SPSS 10.1.3 for Windows. Finally, all the statistical significance tests used
α = 0.05.
Two main steps are used in this case study. The first step is to build the
productivity measure using the productivity measurement method proposed in [3]. The second step is to use the productivity values (including
lower and upper bounds) obtained from step 1 to carry out a productivity
analysis.
3.3.1 Productivity Measure Construction
To build the productivity measure we will employ the same technique used
in Chap. 2, a manual stepwise regression. However, here the attributes of
interest are only size and effort measures. We will use the following steps
to carry out our data analysis [5]:
1. Data validation
2. Variables and model selection
3. Model building and inspection
4. Extraction of AdjustedSize equation
Each of these steps will be detailed below.
Data Validation
Data validation represents a first screening of the data set to become familiar with it and also to identify any missing or unusual values. It generally
involves understanding what the variables are (e.g. purpose, scale type) and
also using descriptive statistics that will help identify any unusual cases.
Table 3.2 presents a set of results that show summary values for the
size and effort variables. It might be considered unusual to see “zero” as
minimum value for TOTIMG, or “one” as minimum value for TOTWP;
however, it is possible to have a Web application without any images or
an application that provided all its information and functionality using
only one Web page.
The average size of applications is around 82 new Web pages and 51
new images. However, their corresponding medians are 7 and 0 respectively, which indicates that half the Web applications in the data set construct no more than seven new Web pages, and no new images.
Our summary statistics also show that there is at least one very large
application with 2000 Web pages. Although this value is atypical for our
data set, we cannot simply assume that this has been a data entry error.
Best practice in such circumstances is to ask the data provider to check
Table 3.2. Descriptive statistics for numerical variables

Variables  N   Minimum  Maximum  Mean    Median  Std. deviation
TOTWP      87  1        2000     92.40   25      273.098
NEWWP      87  0        1980     82.92   7       262.982
TOTIMG     87  0        1820     122.54  40      284.482
IMGNEW     87  0        800      51.90   0       143.254
HFOTS      87  0        3        0.08    0       0.410
HFOTSA     87  0        4        0.29    0       0.746
HNEW       87  0        10       1.24    0       2.352
FOTS       87  0        15       1.07    0       2.574
FOTSA      87  0        10       1.89    1       2.413
NEW        87  0        13       1.87    0       2.836
TOTEFF     87  1        5000     261.73  43      670.364
the value. However, we were unable to obtain confirmation from the
source company. If the data providers are unavailable, it is customary to
investigate whether the data is internally consistent. In this case, the
developers produced 1980 pages from scratch, and constructed numerous
new functions/features (five high-effort and seven low-effort). The development team consisted of two people who had very little experience
with the six programming languages used. The total effort was 947 person hours, which corresponds to a three-month project, assuming both
developers worked full time, and in parallel. Considering only the number of Web pages and effort, the project delivered just over 2 Web
pages per hour compared with an average of about 0.4 Web pages per
hour for the other projects. Thus, the results cast some doubt on the
internal consistency of the project values, particularly given the lack of
experience of the development team and the number of different languages they had to use. However, for the purpose of illustrating the data
analysis method we have not removed this project from the data set.
In terms of TOTEFF, the average person hours is around 261 and its
median is 43 person hours, indicating that half the applications on the data
set are relatively small with a duration close to a working week. Further
investigation of the data revealed that more than half of the projects are
enhancements of existing Web applications, which may explain the small
median for TOTEFF, NEWWP and IMGNEW.
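As an illustration of this step, the sketch below computes the summary statistics of Table 3.2 and flags projects with unusually high delivery rates for follow-up with the data provider. The file name is purely hypothetical, since the raw Tukutuku data cannot be distributed (see footnote 1).

import pandas as pd

df = pd.read_csv("tukutuku.csv")  # hypothetical file; the raw data is confidential

size_effort = ["TOTWP", "NEWWP", "TOTIMG", "IMGNEW", "HFOTS", "HFOTSA",
               "HNEW", "FOTS", "FOTSA", "NEW", "TOTEFF"]
# Summary statistics corresponding to Table 3.2.
print(df[size_effort].describe().T[["count", "min", "max", "mean", "50%", "std"]])

# Internal-consistency screening: delivery rate in Web pages per person hour,
# used above to question the very large 2000-page project.
df["pages_per_hour"] = df["TOTWP"] / df["TOTEFF"]
print(df.sort_values("pages_per_hour", ascending=False).head())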
Once the data validation is finished we are ready to move on to the next
step, namely variables and model selection.
Variables and Model Selection
The second step in our data analysis methodology is sub-divided into two
separate and distinct phases: preliminary analyses and model building.
Preliminary analyses allow us to choose which variables to use, discard, modify and sometimes create. Model building determines the best
size-based effort estimation model based on our data set and set of variables.
Preliminary Analyses
Our aim is to build an AdjustedSize measure using manual stepwise regression. The assumptions underlying stepwise regression are as follows:
1. The input variables (independent variables) are measured without error. If this cannot be guaranteed then these variables need to be normalised.
2. The relationship between dependent and independent variables is linear.
3. No important input variables have been omitted. This ensures that
there is no specification error associated with the data set. The use of a
prior theory-based model justifying the choice of input variables helps
to ensure this assumption is not violated.
4. The variance of the residuals is the same for all combinations of input variables (i.e. the residuals are homoscedastic rather than heteroscedastic).
5. The residuals are normally distributed.
6. The residuals are independent, i.e. not correlated.
7. The independent variables are not linearly dependent, i.e. there are no
linear dependencies among the independent variables.
The first task is to look at the set of variables (size measures and effort)
and see if they have a large number of missing values (> 60%). If they do,
they should be automatically discarded. Without sufficient values it is not
possible to identify useful trends and a large number of missing values also
prohibits the use of imputation methods. Imputation methods are methods
used to replace missing values with estimated values.
Table 3.2 shows that there are no variables with missing values. Even
though we have a large number of zero values on certain size variables,
these zeros do not represent a missing value or a rounded-down value.
However, a problem with many zero values is that they may cause heteroscedasticity at the zero point (see Fig. 3.6), i.e. the dependent variable exhibits more variability when the input variable is zero. It is not possible to
correct this form of heteroscedasticity by normalising the corresponding
variables.
Fig. 3.1. Example of a histogram representing a normal distribution
Our next step is to look for symptoms (e.g. skewness,5 heteroscedasticity,6 and outliers7) that may suggest the need for variables to be normalised,
i.e. to have their values transformed such that they resemble more closely
a normal distribution. This step uses histograms, boxplots and scatter plots.
Histograms, or bar charts, provide a graphical display where each bar summarises the frequency of a single value/range of values for a given variable.
They are often used to check whether a variable is normally distributed, in
which case the bars are displayed according to a bell-shaped curve (see Fig.
3.1). Figure 3.2 confirms that all variables have skewed distributions since
their data values are not symmetrical about a central value.
Next, we use boxplots to check the existence of outliers. Boxplots (see
Fig. 3.3) use the median value as the central value for the distribution. The
median is represented by the horizontal line in the middle of the box. The
length of the box corresponds to the inter-quartile range, and contains 50%
of the values. The vertical lines (whiskers) extending up and down from the edges of the box contain observations that are less than 1.5 times the inter-quartile range away from the box.
Outliers are taken as values greater than 1.5 times the length of the box. If
a value is greater than 3 times the length of the box it is called an extreme
outlier [3].
When upper and lower tails are approximately equal and the median is
in the centre of the box, the distribution is symmetric. If the distribution is
not symmetric the relative lengths of the tails and the position of the median in the box indicate the extent of the skewness. The length of the box
relative to the length of the tails gives an indication of the shape of the
distribution. A boxplot with a small box and long tails represents a very
peaked distribution, whereas a boxplot with a long box represents a flatter
distribution [3].
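For readers who prefer to compute these rules directly, the following sketch applies the 1.5- and 3-box-length thresholds to one variable of the hypothetical DataFrame df introduced earlier.

# Boxplot outlier rules for a single variable (illustrative only).
q1, q3 = df["TOTEFF"].quantile([0.25, 0.75])
iqr = q3 - q1  # inter-quartile range, i.e. the length of the box
outliers = df.loc[(df["TOTEFF"] < q1 - 1.5 * iqr) |
                  (df["TOTEFF"] > q3 + 1.5 * iqr), "TOTEFF"]
extreme = df.loc[(df["TOTEFF"] < q1 - 3 * iqr) |
                 (df["TOTEFF"] > q3 + 3 * iqr), "TOTEFF"]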
5 Skewness measures to what extent the distribution of data values is symmetrical about a central value.
6 Heteroscedasticity represents unstable variance of values.
7 Outliers are unusual values.
Fig. 3.2. Distribution of values for size and effort variables: histograms (a)–(k) for TOTWP, NEWWP, TOTIMG, IMGNEW, HFOTS, HFOTSA, HNEW, FOTS, FOTSA, NEW and TOTEFF
Fig. 3.3. Main components of a boxplot
The boxplots for size and effort measures (see Fig. 3.4) confirm that
each variable has a large number of outliers, and a peaked distribution that
is not symmetric.
Whenever outliers are present they need to be investigated since they
may be a result of data entry error. In our study, we looked at all these
cases, in particular in relation to projects that exhibited very large effort
values, and did not find anything in the data suggesting that they should
be removed from the data set. As we said earlier, whenever there are
doubts about the correctness of the data the best solution is to contact the
data provider for confirmation. Only if the source is not available should
an assessment be based on consistency with other variables.
The histograms and boxplots both show symptoms of skewness and outliers. When this situation arises, it is common practice to normalise the data,
i.e. to apply a functional transformation to the data values to make the distribution closer to a normal distribution. A common transformation is to take
the natural log (ln), which makes larger values smaller and brings the data
values closer to each other [4]. This is the procedure we have adopted, i.e. we
created new variables representing the natural log for each of our size and
effort variables. Whenever a numerical variable had zero values, we added
one to all values before applying the transformation. In the subsequent sections, we refer to logarithmically transformed variables as Lvarname, e.g. LTOTHIGH is the variable obtained by transforming TOTHIGH.
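A minimal sketch of this transformation, under the same DataFrame assumption as before:

import numpy as np

# Natural-log transformation; add one first whenever a variable contains
# zeros, as described above. The loop covers a few representative columns;
# the remaining size variables are handled the same way.
for col in ["TOTWP", "NEWWP", "TOTIMG", "IMGNEW", "TOTEFF"]:
    shift = 1 if (df[col] == 0).any() else 0
    df["L" + col] = np.log(df[col] + shift)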
Fig. 3.4. Boxplots for size and effort variables: (a) TOTWP, NEWWP, TOTIMG and IMGNEW; (b) HFOTS, HFOTSA, HNEW, FOTS, FOTSA and NEW; (c) TOTEFF
The Tukutuku database uses six variables to record the number of features/functions for each Web application. Their histograms (see Fig.
3.2(e)–(j)) indicate that each has a large number of zeros. We therefore
decided to construct two new variables, one related to high-effort functions/features, the other related to low-effort functions/features: TOTHIGH
and TOTNHIGH. TOTHIGH is the sum of HFOTS, HFOTSA and
HNEW, and TOTNHIGH is the sum of FOTS, FOTSA and NEW. Their
histograms are shown in Fig. 3.5(a)–(b).
Finally, we created two new variables: RWP and RIMG. RWP is the
difference between TOTWP and NEWWP, and RIMG is the difference
between TOTIMG and IMGNEW. RWP represents the number of reused
Web pages and RIMG the number of reused images. The motivation for
their creation was twofold: first, to be consistent with the criteria used regarding the features/functions variables; second, to enable us to check the
effect of reused Web pages and reused images on total effort. Their histograms are shown in Fig. 3.5(c)–(d). All four new variables were also transformed since they exhibit both skewness and outliers.
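Under the same assumptions, the four derived variables can be created as follows:

# Derived variables described above (illustrative sketch).
df["TOTHIGH"] = df["HFOTS"] + df["HFOTSA"] + df["HNEW"]  # high-effort features/functions
df["TOTNHIGH"] = df["FOTS"] + df["FOTSA"] + df["NEW"]    # low-effort features/functions
df["RWP"] = df["TOTWP"] - df["NEWWP"]                    # reused Web pages
df["RIMG"] = df["TOTIMG"] - df["IMGNEW"]                 # reused images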
It is important to note that creating new variables as linear combinations
of existing variables places a constraint on subsequent analyses. One assumption of multiple regression is that there are no linear combinations in
the model. This means that we must not attempt to include a constructed
variable and all the variables used to construct it in the same model, e.g.
we can attempt to include in a model only three of the following four variables: TOTHIGH, HFOTS, HFOTSA and HNEW. In fact, since the variable TOTHIGH was constructed because of problems of multiple zeros, the best approach is to exclude HFOTS, HFOTSA and HNEW from any
subsequent analysis.
Our next step is to check if the relationships between the dependent
variable (LTOTEFF, the natural logarithm of TOTEFF) and the independent variables are linear. The tool used to check such relationships is a scatter plot.
Fig. 3.5. Distribution of values for TOTHIGH, TOTNHIGH, RWP and RIMG: histograms (a)–(d)
A scatter plot is used to visualise possible relationships between numerical variables. A relationship is said to be positive when values on the
y-axis tend to increase with those on the x-axis. When values on the y-axis
tend to decrease as those on the x-axis increase the relationship is negative.
Adding a simple regression line to a scatter plot helps identify the strength
of the relationship between two numerical variables. A strong relationship
is represented by observations (data points) all falling very close to the
linear trend line.
Scatter plots of LTOTEFF against each of the independent variables are
shown in Fig. 3.6. They demonstrate very clearly that zero values are a
problem for this data set. Figure 3.6(l) shows a negative trend line between
LFOTSA and LTOTEFF. This is counterintuitive in the sense that including
more functions that need adaptation (even if the adaptation effort is low)
should not reduce total production effort. The effect occurs because the
projects with zero values have distorted the underlying relationship. Nonetheless, there are several reasonably strong relationships between LTOTEFF and the independent variables, in particular LTOTWP (Fig. 3.6(a)), LNEWWP (Fig. 3.6(b)), LHNEW (Fig. 3.6(i)) and LTOTHIGH (Fig. 3.6(j)). Some other relationships appear quite strong but are being distorted by the large number of zeros, in particular LRWP (Fig. 3.6(c)), LTOTIMG (Fig. 3.6(d)) and LIMGNEW (Fig. 3.6(e)). Other variables exhibit relationships that look as though they are solely due to multiple zero value distortions, see Fig. 3.6(g) and Fig. 3.6(m).
A potential problem with this type of analysis is that the more variables
you measure the more likely you are to detect spurious relationships. For
this reason, best practice is to include only those variables that are “a priori” plausible predictors. The size measures used in our case study represent early Web size measures obtained from the results of a survey investigation [6], using data from 133 on-line Web forms aimed at giving quotes
on Web development projects. In addition, the measures were validated by
an established Web company, and a second survey involving 33 Web
companies in New Zealand. Consequently it is our belief that the size
measures identified are plausible effort predictors, not an ad hoc set of
variables with no underlying rationale.
Our preliminary analysis is finished. Now we are ready to build the AdjustedSize measure using manual stepwise regression. Assumptions 4 to 7
will be dealt with in the next section.
Fig. 3.6. Scatter plots for LTOTEFF versus transformed independent variables: (a) LTOTWP, (b) LNEWWP, (c) LRWP, (d) LTOTIMG, (e) LIMGNEW, (f) LRIMG, (g) LHFOTS, (h) LHFOTSA, (i) LHNEW, (j) LTOTHIGH, (k) LFOTS, (l) LFOTSA, (m) LNEW, (n) LTOTNHIG
Model Building and Inspection
This section describes the use of a manual stepwise regression based on
residuals to build the AdjustedSize model. This technique, proposed by
Kitchenham [2], enables the use of information on residuals to handle relationships among independent variables. In addition, it only selects the input variables that jointly have a statistically significant effect on the dependent variable, thus avoiding any multi-collinearity problems.
The input variables to use are those selected as a result of our preliminary analyses, which are: LTOTWP, LNEWWP, LRWP, LTOTIMG,
LIMGNEW, LRIMG, LHFOTS, LHFOTSA, LHNEW, LTOTHIGH,
LFOTS, LFOTSA, LNEW, and LTOTNHIG.
The manual stepwise technique comprises the following steps:
Step 1. Construct the single variable regression equation with effort as the
dependent variable using the most highly (and significantly) correlated input variable (IV1).
Step 2. Calculate the residuals (Res1).
Step 3. Correlate the residuals with all the other input variables.
Step 4. Any input variables that were initially significantly correlated with
effort but are not significantly correlated with the residual are significantly correlated with IV1 and offer no additional information
about the dependent variable. They can therefore be eliminated
from the stepwise regression.
Step 5. Construct a single variable regression with the residuals (Res1) as
the dependent variable and the variable (IV2), of the remaining input variables, that is most highly (and significantly) correlated
with Res1.
Step 6. Calculate residuals Res2.
92
Emilia Mendes, Barbara Kitchenham
Step 7. Correlate the residuals Res2 with the remaining input variables.
Any variables that were correlated with Res1 in step 5 but are not
correlated with Res2 are eliminated from the analysis. They are
variables that are highly correlated with IV2.
Step 8. Continue in this way until there are no more input variables available for inclusion in the model or none of the remaining variables
are significantly correlated with the current residuals.
Step 9. The simplest way to construct the full regression model is then to
use simple multivariate regression with only the selected input
variables.
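The sketch below is one possible reading of this procedure in Python, assuming df holds LTOTEFF and the log-transformed candidate size variables. Pearson correlations on the transformed data stand in for the correlation tests, with 0.05 as the significance level, as elsewhere in this chapter.

import statsmodels.api as sm
from scipy import stats

def manual_stepwise(df, candidates, dep="LTOTEFF", alpha=0.05):
    selected = []
    residual = df[dep].copy()
    remaining = list(candidates)
    while remaining:
        # Steps 1 and 5: pick the variable most highly (and significantly)
        # correlated with the current residuals.
        tests = {v: stats.pearsonr(df[v], residual) for v in remaining}
        best = max(tests, key=lambda v: abs(tests[v][0]))
        if tests[best][1] >= alpha:
            break  # step 8: nothing left that is significantly correlated
        selected.append(best)
        # Steps 2 and 6: regress the residuals on the chosen variable
        # and carry the new residuals forward.
        X = sm.add_constant(df[best])
        residual = sm.OLS(residual, X).fit().resid
        # Steps 3-4 and 7: drop variables no longer significantly
        # correlated with the residuals (they are surrogates for `best`).
        remaining = [v for v in remaining if v != best and
                     stats.pearsonr(df[v], residual)[1] < alpha]
    # Step 9: final multivariate regression with the selected variables.
    X = sm.add_constant(df[selected])
    return sm.OLS(df[dep], X).fit()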
We also need to verify the stability of the regression model. This involves identifying large residual and high-influence data points (i.e. projects), and also checking whether residuals are homoscedastic and normally
distributed. Several types of plots (e.g. residual, leverage, probability) and
statistics are available in most statistics tools to accomplish this task. The
ones we have employed here, which are available in SPSS v10.1.3, are:
• A residual plot showing residuals vs. fitted values, used to investigate whether the residuals are random and normally distributed. The plotted data points should be distributed randomly about zero. They should not exhibit patterns such as linear or non-linear trends, or increasing or decreasing variance.
• A normal P–P plot (probability plot) for the residuals. Normal P–P plots are generally employed to verify whether the distribution of a variable is consistent with the normal distribution. If the distribution is normal, the data points are close to linear.
• Cook’s D statistic, used to identify projects that exhibit jointly a large influence and a large residual [5]. Any project with D greater than 4/n, where n represents the total number of projects, is considered to have high influence on the results. When there are high-influence projects the stability of the model needs to be tested by removing these projects and observing the effect their removal has on the model. If the coefficients remain stable and the adjusted R² increases, this indicates that the high-influence projects are not destabilising the model and therefore do not need to be removed.
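As an illustration of the Cook's D check, assuming model is a fitted statsmodels OLS result for the AdjustedSize regression:

import numpy as np

influence = model.get_influence()
cooks_d = influence.cooks_distance[0]   # Cook's D value per project
threshold = 4 / model.nobs              # the 4/n rule used in this chapter
high_influence = np.flatnonzero(cooks_d > threshold)
print("High-influence projects (row positions):", high_influence)
# Refit without these projects: if the coefficients stay stable and the
# adjusted R-squared increases, the projects do not destabilise the model.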
LNEWWP is the most highly and significantly correlated variable with
LTOTEFF, therefore it is the first input variable to be selected. The single
variable regression equation, with LTOTEFF as the dependent/response
variable and LNEWWP as the independent/predictor variable, gives an
adjusted R² of 0.597 (see Table 3.3). Two projects are identified with Cook’s D > 0.045, but their removal did not seem to destabilise the model, i.e. after their removal the coefficients remained stable and the adjusted R² increased. Furthermore, there was no indication from the residual and P–P plots that the residuals were non-normal.
The correlation between the residual and the remaining input variables
reveals that LTOTWP, LRIMG and LNEW are no longer significantly
correlated with the residual, therefore they are eliminated from the stepwise procedure. Since we have a linear relationship among the variables
RWP, TOTWP and NEWWP, it is appropriate that LTOTWP is removed
from the list of candidate variables once LNEWWP is selected.
Once this step is finished the remaining input variables are LRWP,
LTOTIMG, LIMGNEW, LHFOTS, LHFOTSA, LHNEW, LTOTHIGH,
LFOTS, LFOTSA and LTOTNHIG.
The next highly and significantly correlated variable with the residual is
LTOTHIGH. The single variable regression equation, with the residual as
the dependent/response variable and LTOTHIGH as the independent/predictor variable, gives an adjusted R² of 0.228 (see Table 3.3). This
time five projects are identified with Cook’s D > 0.045, but their removal
did not destabilise the model. In addition, the residual and P–P plots found
no evidence of non-normality. The correlation between the residual and the
remaining input variables reveals that LRWP, LTOTIMG, LIMGNEW,
LHFOTS, LHFOTSA, LHNEW and LTOTNHIG are no longer significantly correlated with the residual, therefore they are eliminated from the
stepwise procedure. Again, since we have a relationship among the variables TOTHIGH, HFOTS, HFOTSA and HNEW, after selecting LTOTHIGH, it is appropriate that LHFOTS, LHFOTSA and LHNEW are all removed from the list of candidate variables.
Neither of the two remaining variables, LFOTS and LFOTSA, is significantly correlated with the current residuals, therefore the procedure finishes.
Finally, our last step is to construct the AdjustedSize model using a multivariate regression analysis with only the input variables selected using
the manual stepwise procedure. The coefficients for the AdjustedSize model are presented in Table 3.4. Its adjusted R² is 0.717, suggesting that LNEWWP and LTOTHIGH can explain 72% of the variation in LTOTEFF.
Table 3.3. Summary of the manual stepwise procedure

Variable   Effect  Adj. R²  Comments
LNEWWP     +       0.597    Variables removed after correlation with residuals: LTOTWP, LRIMG, LNEW
LTOTHIGH   +       0.228    Variables removed after correlation with residuals: LRWP, LTOTIMG, LIMGNEW, LHFOTS, LHFOTSA, LHNEW and LTOTNHIG
Table 3.4. Coefficients for the AdjustedSize model

Variable    Coeff.  Std. error  t       P>|t|  [95% conf. interval]
(Constant)  1.959   0.172       11.355  0.000  1.616   2.302
LNEWWP      0.553   0.061       9.003   0.000  0.431   0.675
LTOTHIGH    1.001   0.164       6.095   0.000  0.675   1.328
Four projects had Cook’s D > 0.045 (see Table 3.5), therefore we followed the procedure adopted previously. We repeated the regression
analysis after excluding these four projects from the data set. Their removal did not result in any major changes to the model coefficients and the adjusted R² improved (0.757). Therefore, we assume that the regression
equation is reasonably stable for this data set, and it is not necessary to
omit these four projects from the data set.
Table 3.5. Four projects that presented high Cook’s distance

ID  NEWWP  TOTHIGH  TOTEFF  Cook’s D
20  20     0        625     0.073
25  0      4        300     0.138
32  22     8        3150    0.116
45  280    0        800     0.078
Figure 3.7 shows three different plots all related to residuals. The histogram (see Fig. 3.7(a)) suggests that the residuals are normally distributed,
corroborated by the P–P plot (see Fig. 3.7(b)). In addition, the scatter plot
of standardised residuals versus standardised predicted values does not
show any problematic patterns in the data.
Once the residuals and the stability of the regression model have been
checked, we are in a position to extract the equation that represents the
model.
However, before continuing, it is necessary to consider whether the accuracy of the multiple regression model is good enough to be the basis of
a subsequent productivity analysis. This is not a simple case of statistical
significance. It is possible to have a statistically significant equation that
accounts for such a small amount of the variation in the data that further
analysis would be valueless. However, there is no clear guideline on how accurate the model needs to be. Our model has an R² value of 0.72; is this good enough? In our opinion, a model needs to account for at least 70% of the variation before it can be considered viable for a subsequent productivity analysis. However, it is also important to consider the size of the data set. We are more likely to detect spurious results with a large number of variables and a small number of data points. As a rule of thumb, in addition to achieving an R² value of more than 0.7, the basic data set should
Web Productivity Measurement and Benchmarking
P-P Plot stand. Residual
Histogram
Dep. Variable: LTOTEFF
Expected Cum Prob
Dependent Variable: LTOTEFF
14
12
10
Frequency
95
8
6
4
Std. Dev = .99
2
0
Mean = 0.00
N = 87.00
1.00
.75
.50
.25
75
2.25
2. 5
7
1. 5
2
1.
5
.7
5
.2 5
-.25
-.7.25
-1.75
-1.25
-2.75
-2
0.00
0.00
.25
.50
.75
1.00
Observed Cum Prob
Regression Standardized Residual
(a)
(b)
Regression Stand. Residual
Scatterplot
Dependent Variable: LTOTEFF
3
2
1
0
-1
-2
-3
-2
-1
0
1
2
3
Regression Stand. Predicted Value
(c)
Fig. 3.7. Several residual plots
include more than 30 data points per independent variable before the
model is used for further analysis.8 Thus, our model is on the borderline
for use in a productivity analysis and we need to treat any results with
caution.
Extraction of AdjustedSize Equation
The equation that is obtained from Table 3.4 is the following:
LTOTEFF = 1.959 + 0.553 LNEWWP + 1.001 LTOTHIGH    (3.3)

8 This is an area where simulation studies are needed to provide evidence-based guidelines.
This equation uses three variables that had been previously transformed,
therefore we need to transform it back to its original state, which gives the
following equation:
TOTEFF = 7.092 (NEWWP + 1)^0.553 (TOTHIGH + 1)^1.001    (3.4)
In Eq. 3.4, the multiplicative value 7.092 can be interpreted as the effort
required to develop one Web page.
Treating Eq. 3.4 as an AdjustedSize function, we can construct a productivity measure:
Productivity = 7.092 (NEWWP + 1)^0.553 (TOTHIGH + 1)^1.001 / TOTEFF    (3.5)
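As a worked example of Eq. 3.5, consider a hypothetical project with 7 new Web pages, 1 high-effort feature and 50 person hours of actual effort (all values invented for illustration):

# Eq. 3.5 for one hypothetical project.
adjusted_size = 7.092 * (7 + 1) ** 0.553 * (1 + 1) ** 1.001
productivity = adjusted_size / 50
print(round(adjusted_size, 1), round(productivity, 2))
# prints roughly 44.8 and 0.9, i.e. slightly below the baseline of one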
Once the productivity measure has been constructed we are able to carry out a productivity analysis, as explained in the next section.
3.3.2 Productivity Analysis
The productivity values constructed using Eq. 3.5 varied from a minimum
of 0.06 to a maximum of 14.66. The mean value was 1.61, the standard
deviation was 2, and the median was 1. The distribution of the productivity
values is shown in Fig. 3.8 using boxplots (see Fig. 3.8(a)) and a histogram
(see Fig. 3.8(b)). The histogram shows that 45% of the productivity values
are between 0.5 and 1.5, representing a range of values similar to the baseline of 1. The boxplots also show a number of outliers, which may be an
indication of productivity values significantly different from one.
Fig. 3.8. Distribution of productivity values: (a) boxplot, (b) histogram
The mechanism used to check the existence of productivity values significantly different from one is to use the upper and lower bounds of the
AdjustedSize model to construct upper and lower bounds for the productivity values. The steps employed to obtain these upper and lower bounds are
the following:
Step 1. During the construction of the AdjustedSize model using a multivariate regression analysis also obtain the prediction intervals for
each individual value of the AdjustedSize measure, for a corresponding effort. SPSS creates two new variables (LICI_1 and
UICI_1), each with 87 values.
Step 2. The variables LICI_1 and UICI_1 have lower and upper values for
a predicted value LTOTEFF, therefore we need to transform them
back to the raw data scale by creating two new variables:
LICI_1_new = e^LICI_1    (3.6)
UICI_1_new = e^UICI_1    (3.7)
Step 3. Finally, divide the upper and lower bounds by total effort to get
the upper and lower productivity bounds, i.e. LICI_1_new/
TOTEFF and UICI_1_new/TOTEFF. This gives the upper and
lower bounds for the productivity value.
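The same three steps can be reproduced outside SPSS. The sketch below uses statsmodels, assuming model is the AdjustedSize regression fitted on LNEWWP and LTOTHIGH (as in Table 3.4) and df also holds TOTEFF.

import numpy as np
import statsmodels.api as sm

X = sm.add_constant(df[["LNEWWP", "LTOTHIGH"]])
# Step 1: prediction intervals for each individual value (obs_ci columns).
pred = model.get_prediction(X).summary_frame(alpha=0.05)
# Step 2: back-transform the interval limits to the raw effort scale.
lower_raw = np.exp(pred["obs_ci_lower"])
upper_raw = np.exp(pred["obs_ci_upper"])
# Step 3: divide by actual effort to obtain the productivity bounds.
df["LOWER"] = lower_raw / df["TOTEFF"]
df["UPPER"] = upper_raw / df["TOTEFF"]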
Once these bounds are obtained, the next step is to check whether there are
any productivity values either smaller than their lower bound or greater than
their upper bound. Figure 3.9 shows a line chart with lines representing
values for productivity (PRODUCTI), lower productivity bound (LOWER)
and upper productivity bound (UPPER). We used a logarithmic scale for the Y axis to better illustrate that the productivity values, represented by black squares, consistently remain between their lower and upper bounds, represented by light grey and grey squares, respectively. This means that we did not find any productivity values significantly different from one.

Fig. 3.9. Productivity values and their corresponding lower and upper bounds
After calculating suitable productivity values for each of the 87 Web
projects, we can carry out standard productivity analyses. The issues to be
investigated as part of this case study are:
Issue #1. The impact of reuse of Web pages on productivity.
Issue #2. The impact of team size on productivity.
Issue #3. The impact of number of programming languages on productivity.
Issue #4. The impact of average team experience with the programming
languages on productivity.
The Impact of Reuse of Web Pages on Productivity
We created a dummy variable to differentiate between projects that reused
Web pages and those that did not. Then we investigated the productivity
differences between the two groups of projects. The mean and median
productivity for the 48 projects that reused Web pages are 1.79 and 1.2,
respectively. The remaining 39 projects have mean and median of 1.4 and
0.92, respectively.
Figure 3.10 shows boxplots of the productivity distribution for each
group. Neither distribution is symmetric and both exhibit outliers. Since neither distribution is normally distributed, we have to compare
their productivity values using a statistical test that does not assume the
data is normally distributed. We therefore employed a non-parametric test
called the Mann–Whitney U test to assess if the difference between the
two groups (two independent samples) was significant. The results were
not significant at the 0.05 level, therefore reuse is not having a significant
effect on productivity.
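A small sketch of this comparison, assuming df holds the PRODUCTIVITY values and using RWP > 0 as one plausible way to form the two groups:

from scipy.stats import mannwhitneyu

reuse = df.loc[df["RWP"] > 0, "PRODUCTIVITY"]
no_reuse = df.loc[df["RWP"] == 0, "PRODUCTIVITY"]
stat, p = mannwhitneyu(reuse, no_reuse)
print(f"U = {stat:.1f}, p = {p:.3f}")  # p >= 0.05 here: no significant difference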
These results differ from those we obtained in [3], where, using a subset of 54 Web projects from the Tukutuku database, we found that reuse
had a significant effect on productivity. For that study the AdjustedSize
equation used LTOTWP,9 LIMGNEW10 and LTOTHIGH as its variables.
This is a different equation to the one we have constructed in this chapter; however, there are similarities between the two. For example, LTOTWP
was removed from our manual stepwise procedure when LNEWWP
was selected (see Table 3.3), thus showing that it is a surrogate for
LNEWWP. In addition, LTOTHIGH is present in both equations.
9 ln(TOTWP).
10 ln(IMGNEW + 1).
Fig. 3.10. Boxplots of the productivity of projects that reused Web pages and those that did not
The best fitting equations are likely to change when more data is gathered.
However, the similarities between both equations suggest that they are
capturing a genuine underlying phenomenon. Whenever that is the case,
some variables are included in most equations, and surrogate variables are
selected in different equations. Note that, despite the similarity between
both equations, the productivity measures obtained and the corresponding
productivity analyses carried out are dependent on the data set employed.
The Impact of Team Size on Productivity
We created a dummy variable to differentiate between the seven team-size
values we observed in the data set. Next, we investigated the productivity
differences between the projects in each of the seven groups. Figure 3.11
shows boxplots of the productivity distribution for each group. Except for
one, all the remaining distributions are asymmetric, and three exhibit outliers.
In order to compare their productivity values, we used the Kruskal–Wallis test. This is a non-parametric test that allows more than two groups to be compared. The Kruskal–Wallis test suggests that productivity is significantly different between groups (chi-squared = 14.82 with 6 degrees of freedom, p = 0.022).
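An analogous sketch for the team-size comparison, assuming df holds PRODUCTIVITY and DEVTEAM:

from scipy.stats import kruskal

# One group of productivity values per observed team size.
groups = [g["PRODUCTIVITY"].values for _, g in df.groupby("DEVTEAM")]
h, p = kruskal(*groups)
print(f"chi-squared = {h:.2f}, p = {p:.3f}")  # p < 0.05 here: groups differ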
The boxplots suggest that projects with a team size of 1 person presented the best productivity overall, with the median productivity above
the baseline of 1.
In addition to using boxplots, we also used a scatter plot to further investigate the relationship between productivity and team size. Figure
3.12(a) shows a scatter plot of productivity and team size, which suggests
a linear relationship.
Fig. 3.11. Boxplots of productivity for different team sizes
Since both productivity and team size are skewed and present outliers,
they were transformed. Figure 3.12(b) shows the scatter plot for the transformed variables. Both scatter plots indicate that productivity decreases as team size increases, i.e. the two are negatively associated.
Fig. 3.12. Scatter plots of productivity and team size before and after transforming variables
We performed a linear regression analysis using LPROD as the dependent variable and LDEVTEAM as the independent variable (see Table 3.6).
This analysis confirms that there is a statistically significant negative relationship between productivity and team size.11 This means that productivity
11 Kitchenham and Mendes [3] demonstrate that LPROD is mathematically equivalent to using the residual values of the original regression model (multiplied by –1).
decreases as team size increases. These results are supported by another
study, where, using data from European space and military projects, Briand et al. [1] provide evidence that smaller teams result in substantially
higher productivity.
Table 3.6. Coefficients for productivity model based on team size

Variable    Coeff.  Std. error  t       P>|t|  [95% conf. interval]
(Constant)  0.512   0.166       3.094   0.003  0.183   0.841
LDEVTEAM    -0.716  0.185       -3.878  0.000  -1.083  -0.349
The Impact of Number of Programming Languages on Productivity
We created a dummy variable to differentiate between the seven different
values for number of programming languages. Next, we investigated the
productivity differences between these seven groups. Figure 3.13 shows
boxplots of the productivity distribution for each group.
None of the distributions are symmetric and three exhibit outliers. In order to compare their productivity values we used the Kruskal–Wallis test,
which suggests that productivity is significantly different between groups
(chi-squared = 86 with 6 degrees of freedom, p < 0.01).
The boxplot for projects that used six languages presented the highest median, suggesting they were the most productive overall. However, since
this group contains only three projects, these results must be interpreted
with caution. The group that used seven languages also presented a median
above the baseline, but it only contained a single project.
Fig. 3.13. Boxplots of productivity for different number of languages
102
Emilia Mendes, Barbara Kitchenham
In addition to using boxplots we also used scatter plots to further investigate the relationship between productivity and number of languages. Figure 3.14(a) shows a scatter plot of productivity and number of languages,
which does not suggest a strong linear pattern.
Since both productivity and number of languages are skewed and present outliers they were transformed. Figure 3.14(b) shows the scatter plot
for the transformed variables.
None of the scatter plots indicate a significant linear relationship between productivity and number of languages. Linear regression analysis
with LPROD as the dependent variable and LNLANG as the independent
variable confirms that there is no significant linear relationship between
LPROD and LNLANG.
Fig. 3.14. Scatter plots of productivity and number of languages before and after transforming variables
The Impact of Average Team Experience with the Programming
Languages on Productivity
We created a dummy variable to differentiate between the eight different
values for average team experience. Next, we investigated the productivity
differences between these eight groups. Figure 3.15 shows boxplots of the
productivity distribution for each group. Two distributions are symmetric
and three exhibit outliers. In order to compare their productivity values we
used the Kruskal–Wallis test, which suggests that productivity is significantly different between groups (chi-squared = 86 with 7 degrees of freedom, p < 0.01).
Boxplots suggest that two groups, with average team experience of 1
and 10, respectively, are very productive with all, or nearly all, of their
data points above the baseline. However, both groups contain only two
projects each, therefore these results must be interpreted with care.

Fig. 3.15. Boxplots of productivity for different average team experiences

Two
other groups, with average team experience of 2 and 6, respectively, also
seem to contain productive projects, with a median productivity greater
than the productivity baseline. In addition, they contain at least six projects
each, which may indicate a more reliable pattern than that provided by the
two “very productive” groups.
In addition to using boxplots, we also used a scatter plot to further investigate the relationship between productivity and average team experience. Figure 3.16(a) shows a scatter plot of productivity and average team
experience, which does not suggest any linear relationship.
Fig. 3.16. Scatter plots of productivity and average team experience before and after transforming variables
Since both productivity and average team experience are skewed and
present outliers, they were transformed. Figure 3.16(b) shows the scatter
plot for the transformed variables, which suggests a weak negative association between LPROD and LTEAMEXP. However, regression analysis
confirmed that there is no statistically significant relationship between
LPROD and LTEAMEXP.
The data set used in this case study comprises data on projects volunteered by individual companies. It was not a random sample of projects
from a defined population; thus, we cannot conclude that the results of our
productivity analysis apply to other Web application projects [3]. The results apply to the specific data set under analysis and may not be stable
when more data is added. For example, a previous analysis of a smaller
subset of 54 projects from the Tukutuku data set observed a significant
reuse effect that is not found in the current data set [3].
3.4 Conclusions
This chapter presented a productivity measurement method, which allows
for the use of different size measures. An advantage of the method is that it
has a built-in interpretation scale. It ensures that each project has an expected productivity value of one. We have presented a software productivity
measure that can be used when there are several size measures jointly significantly related to effort. Such a productivity measure is easy to construct
from a regression-based effort estimation model, and it is simple to interpret. In addition, it has a built-in baseline. A value greater than one is a sign
of good productivity, and a value less than one is a sign of poor productivity.
We have also presented a case study that used the productivity measurement method to construct a productivity measure, and used this measure to analyse the productivity of Web projects from the Tukutuku database. Four issues were investigated during the productivity analysis:

• The impact of reuse of Web pages on productivity.
• The impact of team size on productivity.
• The impact of number of programming languages on productivity.
• The impact of average team experience with the programming languages on productivity.
Results showed that reuse of Web pages had no impact on productivity,
and that different team sizes, number of programming languages and average team experiences could each present significant productivity differences among projects. However, we cannot generalise these results to
other Web projects and companies since the data set used is not a random
sample from a defined population [2]. Therefore the productivity measure
is applicable only to projects belonging to the data set based upon which it
was constructed.
References

1. Briand LC, El Emam K, Wieczorek I (1999) Explaining the Cost of European Space and Military Projects. In: Proceedings of the ICSE 99 Conference, May, Los Angeles, CA, pp 303–312
2. Kitchenham BA (1998) A Procedure for Analyzing Unbalanced Datasets. IEEE Transactions on Software Engineering, 24(4):278–301
3. Kitchenham BA, Mendes E (2004) Software Productivity Measurement Using Multiple Size Measures. IEEE Transactions on Software Engineering, 30(12):1023–1035
4. Kitchenham BA, MacDonell SG, Pickard LM, Shepperd MJ (2001) What Accuracy Statistics Really Measure. IEE Proceedings Software, June, 148(3):81–85
5. Maxwell K (2002) Applied Statistics for Software Managers. Prentice Hall PTR
6. Mendes E, Mosley N, Counsell S (2003) Investigating Early Web Size Measures for Web Cost Estimation. In: Proceedings of the EASE’2003 Conference, Keele, UK, pp 1–22
Acknowledgements
We would like to thank Associate Professor Guilherme Travassos for his
comments on a previous version of this chapter.
Authors’ Biographies
Dr. Emilia Mendes is a Senior Lecturer in Computer Science at the University of
Auckland (New Zealand), where she leads the WETA (Web Engineering, Technology and Applications) research group. She is the principal investigator in the
Tukutuku Research project (http://www.cs.auckland.ac.nz/tukutuku/), aimed at developing and comparing Web effort
models using industrial Web project data, and benchmarking productivity within
and across Web companies. She has active research interests in Web measurement
and metrics, and in particular Web cost estimation, Web size measures, Web productivity and quality measurement, and Web process improvement. Dr. Mendes is
on the programme committee of numerous international conferences and workshops, and on the editorial board of the International Journal of Web Engineering
and Technology and the Journal of Web Engineering. She has collaborated with
Web companies in New Zealand and overseas on Web cost estimation and usability measurement. Dr. Mendes worked in the software industry for ten years before
obtaining her PhD in Computer Science from the University of Southampton
(UK), and moving to Auckland. She is a member of the Australian Software Measurement Association.
Barbara Kitchenham is Professor of Quantitative Software Engineering at Keele
University and currently has a part-time position as a Senior Principal Researcher
with National ICT Australia (NICTA). She has worked in software engineering for
over 20 years in both industry and academia. Her main research interest is software metrics and their application to project management, quality control, risk management and the evaluation of software technologies. She is particularly interested in
the limitations of technology and the practical problems associated with applying
measurement technologies and experimental methods to software engineering. She
is a Chartered Mathematician and Fellow of the Institute of Mathematics and Its
Applications. She is also a Fellow of the Royal Statistical Society. She is a visiting
professor at both the University of Bournemouth and the University of Ulster.
4 Web Quality
Luis Olsina, Guillermo Covella, Gustavo Rossi
Abstract: In this chapter we analyse the different quality perspectives of
software and Web applications. In particular, we review quality taking into
account the ISO (International Organization for Standardization) standards
for software product, and discuss the distinction between quality and quality in use, and how different requirements, from different users’ standpoints, should be considered as well. Moreover, we also describe Web
quality and how it can be measured and evaluated. In order to illustrate the
specific procedures and processes of an inspection evaluation methodology, a case study on the external quality of the shopping cart component of
two typical e-commerce Web applications is presented.
Keywords: Web quality, quality measurement, Logic Scoring Preference.
4.1 Introduction
The quality of an entity is easy to recognise but hard to define and evaluate. Although the term seems intuitively self-explanatory, there are actually many different perspectives on, and approaches to, measuring and evaluating quality as part of software and Web development, operation, and maintenance processes.
The meaning of quality is not simple and atomic; rather, it is a multidimensional
and abstract concept. Common practice assesses quality by means of the
quantification of lower abstraction concepts, such as attributes of entities.
The attribute can be briefly defined as a measurable property of an entity.1
An entity may have many attributes, though only some of them may be of
interest to a given project’s measurement and evaluation purposes. Therefore, quality is an abstract relationship between attributes of entities and
information needs (measurement goals).2 Figure 4.1 specifies some of
these terms and their relationships.
To illustrate these concepts let us consider the following example. One of
the goals of an organisation’s project, within a quality assurance plan, is to
“evaluate the link reliability of a Web application’s static pages”. The
1 Types of entities of interest to software and Web engineering are resource, process, product, product in use, and service.
2 In fact, quality, quality in use, cost, etc., are instances of a computable concept.
purpose is to evaluate the link reliability calculable concept for static Web
pages as the product entity, from a user’s viewpoint; we can see that the link
reliability sub-concept is a sub-characteristic related to the external quality
of a product. Considering the level of abstraction, a calculable concept can
be composed of other sub-concepts that may be represented by a concept
model (e.g. ISO 9126-1 [13] specifies the external quality model based on
characteristics and sub-characteristics). A calculable concept combines one
or more attributes of entities. Figure 4.2 shows a simple concept model
where three attributes are part of the link reliability calculable concept.
Fig. 4.1. Main terms and relationships related to the calculable concept term, where quality and quality in use are instances of it
1. Link Reliability
1.1 Internal Broken Links (IBL)
1.2 External Broken Links (EBL)
1.3 Invalid Links (IL)
Fig. 4.2. A concept model for the link reliability calculable concept
On the other hand, each attribute can be quantified by one or more metrics.3 The metric contains the definition of the selected measurement
method and scale (the conceptual model of metrics and indicators is introduced in Sect. 4.3.2).
The previous example, which does not include other external quality sub-concepts, such as efficiency, functionality, and usability, is intended to
show that the meaning of quality is not atomic but rather a complex, multidimensional concept. Quality cannot be measured directly, at least not in a
trivial and subjective way. On the other hand, the requirements for quality
can vary depending on the entity type, user’s viewpoint, and context of use.
Regarding the entity type (e.g. process, product), quality requirements
specified as quality models can differ from one another. Moreover, we can
specify different requirements, from different users’ standpoints, for the
same entity type. In addition, the quality perception for the same software or
Web product can vary depending on contexts of use for the same user type!
In Sect. 4.2, we discuss the different perspectives of quality for software. In particular, in Sect. 4.2.1 we review the state of the art of quality
according to the ISO standards for software quality; we also address the
importance of distinguishing between quality and quality in use (see Sect.
4.2.2); and how different requirements, from diverse users’ standpoints,
should be considered (see Sect. 4.2.3).
The next section describes Web quality, focusing on the quality of Web
products and the perceived quality of real users in a real context of use.
Nowadays, the Web plays a central role in diverse application domains
such as business, education, government, industry, and entertainment. The
Web’s growing importance heightens concerns about Web applications’
development and evaluation methods, and requires the systematic use of
engineering models, methods, and tools. In particular, we need sound
evaluation methods for obtaining reliable information about product quality and quality in use. There are different categories of methods (e.g. inspection, testing, inquiry, simulation) and specific types of evaluation
methods and techniques (e.g. heuristic evaluation technique [19,20], the
concept model-centred evaluation method [24]). In Sect. 4.3 we present the
Web Quality Evaluation Method (WebQEM) as a model-centred evaluation method. Using WebQEM to assess Web applications helps to meet
quality requirements in new Web development projects and to evaluate
requirements in operational phases. It also helps discover absent attributes
or poorly implemented requirements, such as interface-related designs, and
implementation drawbacks or problems with navigation, accessibility,
search mechanisms, content, reliability and performance, among others.
3 Metric and measure mean the same within the context of this book.
Section 4.4 presents a case study where the external quality of the shopping cart component of two typical e-commerce Web applications is assessed, using the specific models, procedures, and processes of the WebQEM methodology. In Sect. 4.5 concluding remarks to this chapter are
drawn.
4.2 Different Perspectives of Quality
The essential purpose-oriented evaluation of quality characteristics and
attributes for different entities is not an easy endeavour in either software
or Web engineering [18]. It is difficult to consider all the characteristics
and mandatory or desirable attributes of a process, or a product (e.g. Web
application), without using sound quality frameworks, models, and methods. These allow evaluators to specify systematically goal-oriented quality
concepts, sub-concepts, and attributes. An example of a generic quality
model is provided by the ISO standards for specifying quality requirements in the form of quality models for software processes and products.
As previously mentioned, quality requirements can vary depending on
the entity type, the users’ viewpoint, and the context of use. From a software measurement and evaluation point of view, we can identify different
entity types at a high level of abstraction, i.e. a resource, a process, a product, a service, a product or a system in use, as well as a software or Web
project. Quality requirements can be specified using a concept model representing quality or quality in use. Studies have shown that resource quality can potentially help improve process quality; process quality can help
improve product quality, which can help improve quality in use [13]. In
the same way, evaluating quality in use can provide feedback to improve
product quality; evaluating a product can provide feedback to improve
process quality; and evaluating a process can provide feedback to improve
resource quality (see Fig. 4.3).
Within the context of this chapter we focus on product quality and quality in use.
4.2.1 Standards and Quality
One standardisation milestone of the software product quality for assessment purposes occurred at the end of 1991, when ISO/IEC issued the quality model and the evaluation process model [9]. Previously, seminal work
defined software quality models and frameworks; among these were the
quality models specified by McCall, Boehm, and the US Air Force
(see [9]). The aim of the ISO/IEC organisation was to reach the required
consensus and to encourage international agreement.
The ISO/IEC 9126 standard prescribes six characteristics (sub-concepts)
that describe, with minimal overlap, software quality. In addition, it presents a set of quality sub-characteristics for each characteristic. As it also
specifies an evaluation process model, the two inputs to the quality requirement definition step are the ISO quality model and stated or implied
user needs.
The quality definition in this standard is “The totality of features and
characteristics of a software product that bears on its ability to satisfy
stated or implied needs” ([9]; this definition is adopted from the previous
ISO 8402 standard entitled “Quality – Vocabulary” issued in 1986). The
six prescribed characteristics useful to evaluate product quality are Usability, Functionality, Reliability, Efficiency, Portability, and Maintainability.
For instance, Usability is defined as “A set of attributes that bear on the
effort needed for use, and on the individual assessment of such use, by a
stated or implied set of users.” In turn, usability is broken down into three
sub-characteristics, namely: Understandability, Learnability, and Operability (e.g. operability is defined as “Attributes of software that bear on
the users’ effort for operation and operation control”).
Other aspects of this standard are as follows:
• The meaning of quality is taken as a complex, multidimensional concept that cannot be measured directly.
• Given the complexity that the quality concept embraces, a quality model to specify software product quality requirements is needed.
• The general-purpose quality model contains a minimum number of characteristics by which every type of software can be evaluated.
• For the quality requirement definition step, the stated or implied user needs are considered. In addition, the term user is acknowledged in some definitions of characteristics and sub-characteristics (e.g. usability and its sub-characteristics).
• ISO 9126 differs from traditional quality approaches that emphasise the need to meet requirements that are primarily functional (e.g. the manufacturing quality approach of ISO 9000).
As observed above, the ISO 9126 definitions acknowledge that the goal
of quality is to meet user needs. But what is not clearly stated is that the
purpose of software quality is to be “perceived with quality”: that is, with
degrees of excellence by end users in actual contexts of use. Rather, ISO
9126 suggests that quality is determined by the presence or absence of the
attributes, with the implication that these are specific attributes which can
be designed into the product. As Bevan [2] says:
“Although developers would like to know what attributes to incorporate in the code to reduce the ‘effort required for use’, presence or
absence of predefined attributes cannot assure usability, as there is
no reliable way to predict the behaviour of the users of the final
product.”
To fill this gap, the ISO 9126 standard has been revised to specify a
quality framework that distinguishes among three different approaches to
software quality − internal quality, external quality, and quality in use. The
ISO/IEC 9126-1 standard, which includes these three approaches to quality, was officially issued in 2001 [13]. The evaluation process model initially included in ISO 9126 was moved to and fully developed in the
ISO/IEC 14598 series [11,12]. The three approaches of quality in ISO
9126-1 can be summarised as follows:
• Internal quality, which is specified by a quality model (similar to the ISO 9126 model), and can be measured and evaluated by static attributes of documents such as the specification of requirements, architecture, or design; pieces of source code; and so forth. In early phases of the software lifecycle we can evaluate and control the internal quality of these early products, but assuring internal quality is not usually sufficient to assure external quality.
• External quality, which is specified by a quality model (similar to the ISO 9126 model), and can be measured and evaluated by dynamic properties of the running code in a computer system, i.e. when the module or full application is executed in a computer or network simulating as closely as possible the actual environment. In late phases of the software lifecycle (mainly during the different kinds of testing, including acceptance testing, and even in the operational state of a software or Web application), we can measure, evaluate, and control the external quality of these late products, but assuring external quality is not usually sufficient to assure quality in use.
• Quality in use, which is specified by a quality model (similar to the ISO 9241-11 model [10]), and can be measured and evaluated by the extent to which the software or Web application meets specific user needs in the actual, specific context of use.
The internal quality definition in ISO 9126-1 is “the totality of attributes of a product that determines its ability to satisfy stated and implied
needs when used under specified conditions”; the external quality definition is “the extent to which a product satisfies stated and implied needs
when used under specified conditions”; and the quality in use definition is
“the extent to which a product used by specified users meets their needs to
achieve specified goals with effectiveness, productivity and satisfaction in
Web Quality
115
specified context of use” (note that these definitions are given in the ISO/IEC 14598-1 standard [12]).
These three slightly different definitions of quality (instead of the
unique definition in the previous 9126 standard) refer particularly to the
product when it is used under specified conditions and context of use, so
making it clear that quality is not an absolute concept, but depends on specific conditions and context of use by specific users.
The same six prescribed quality characteristics have been maintained in
the revised internal and external quality models. Moreover, sub-characteristics are now prescriptive. In addition, new sub-characteristics were
added and redefined in terms of “the capability of the software” to enable
them to be interpreted as either an internal or an external perspective of
quality. For instance, the usability characteristic is defined in [13] as “The capability of the software product to be understood, learned, used and attractive to the user, when used under specified conditions.” In turn, usability is subdivided into five sub-characteristics, namely: Understandability,
Learnability, and Operability, in addition to Attractiveness and Usability
compliance (see Table 4.1 for the definition of these sub-characteristics).
Table 4.1. Definition of usability sub-characteristics prescribed in ISO 9126-1 [13] for internal and external quality

Understandability: The capability of the software product to enable the user to understand whether the software is suitable, and how it can be used for particular tasks and conditions of use.
Learnability: The capability of the software product to enable the user to learn its application.
Operability: The capability of the software product to enable the user to operate and control it.
Attractiveness: The capability of the software product to be attractive to the user.
Compliance: The capability of the software product to adhere to standards, conventions, style guides or regulations relating to usability.
External quality is ultimately the result of the combined behaviour of
the software component or application and the computer system, while
quality in use is the effectiveness, productivity, safety, and satisfaction of
specific users when performing representative tasks in a specific, realistic
working environment. By measuring and evaluating the quality in use (by
means of metrics and indicators) the external quality of the software or
Web application can be validated. Quality in use evaluates the degree of
excellence, and can be used to validate the extent to which the software or
116
Luis Olsina, Guillermo Covella, Gustavo Rossi
Web application meets specific user needs. In turn, by measuring and
evaluating external quality, a software product’s internal quality can be
validated. Similarly, taking into account suitable software/Web application attributes for internal quality is a prerequisite for achieving the required external behaviour, and considering software attributes suitable for external behaviour is a prerequisite for achieving quality in use (this dependency is suggested in Fig. 4.3).
Fig. 4.3. Framework of quality regarding different entity types and potential quality models
The basic example introduced in Figure 4.2 focuses on external quality
because we cannot measure such application attributes (i.e. IBL, EBL, IL)
without Web server and network infrastructure support.
4.2.2 Quality Versus Quality in Use
While users are becoming more and more mature in the use of IT systems
and tools, there is greater demand for the quality of software and Web
applications that match real user needs in actual working environments.
The core aim in designing an interactive (software or Web) application
is to meet the user needs; that is, to provide degrees of excellence or quality in use by interacting with the application and by performing its tasks
comfortably. Within the context of the ISO 9126-1 standard, quality in use
is the end user’s view of the quality of a running system containing software, and is measured and evaluated in terms of the result of using the
software, rather than by properties of the software itself. A software product’s internal and external quality attributes are the cause, and quality in
use attributes are the effect. According to Bevan [2]:
“Quality in use is (or at least should be) the objective, software
product quality is the means of achieving it.”
Quality in use is a broader view of the ergonomic concept of usability as
defined in ISO 9241-11 [10]. Quality in use is the combined effect of the internal
and external quality characteristics for the end user. It can be measured and
evaluated by the extent to which specified users can achieve specified
goals with effectiveness, productivity, safety, and satisfaction in specified
contexts of use. Table 4.2 shows the definition of these four characteristics, and Fig. 4.4 outlines a partial view of the quality in use (concept)
model and associated attributes.
Table 4.2. Definition of the four quality in use characteristics prescribed in ISO 9126-1

Effectiveness: The capability of the software product to enable users to achieve specified goals with accuracy and completeness in a specified context of use.
Productivity: The capability of the software product to enable users to expend appropriate amounts of resources in relation to the effectiveness achieved in a specified context of use.
Safety: The capability of the software product to achieve acceptable levels of risk of harm to people, business, software, property or the environment in a specified context of use.
Satisfaction: The capability of the software product to satisfy users in a specified context of use. Note [by ISO]: Satisfaction is the user’s response to interaction with the product, and includes attitudes towards use of the product.
In order to design and select metrics (and indicators) for assessing quality in use it is first necessary to associate attributes to the effectiveness,
productivity, safety, and satisfaction characteristics. Figure 4.4 shows attributes for two characteristics, namely effectiveness and productivity.
Quality in Use
1. Effectiveness
1.1 Task Effectiveness (TE)
1.2 Task Completeness (TC)
1.3 Error Frequency (EF)
2. Productivity
2.1 Efficiency related to Task Effectiveness (ETE)
2.2 Efficiency related to Task Completeness (ETC)
Fig. 4.4. Specifying an instance of the Quality in Use model
Note that effectiveness, productivity, safety, and satisfaction are influenced not only by the usability, functionality, reliability, and efficiency of
a software product, but also by two resource components of the context of
use. The context of use depends on both the infrastructure (i.e. the computer, network, or even the physical working medium) and the user-oriented goals (i.e. the supported application tasks and the properties of the user type, such as level of training, expertise, and cultural issues).
Care should be taken when generalising the results of any quality in use
assessment to another context of use with different types of users, tasks, or
environments [2].
As a consequence, when designing and documenting quality in use
measurement and evaluation processes, at least the following information
is needed:
• Descriptions of the components of the context of use, including user type, equipment, environment, and application tasks (tasks are the steps or sub-goals undertaken to reach an intended goal).
• Quality in use metrics and indicators for the intended purpose and measurement goal(s).
As a final remark, it can be observed that quality is not an absolute concept; there are different quality perspectives both for a product and for a
product in a context of use. Internal quality, external quality, and quality in
use can then be specified, measured and evaluated. Each of these perspectives has its own added value considering a quality assurance strategy in
the overall lifecycle. However, the final objective is the quality in use.
How a concept model (quality, quality in use) can be instantiated for different user standpoints is discussed next.
4.2.3 Quality and User Standpoints
In a measurement and evaluation process, the quality requirements specified in the form of a quality model should be agreed upon. The quality
model can be a standard-based quality model, a project or organisation’s
proprietary quality model, or a mixture of both.
Depending on the goal and scope of the evaluation, the concept model
and corresponding characteristics and attributes that might intervene
should be selected. Moreover, the importance of each characteristic varies
depending on the application’s type and domain, in addition to the user
standpoint taken into account. Therefore, the relative importance of characteristics, sub-characteristics, and attributes depends on the evaluation’s
goal and scope, the application domain, and the user’s viewpoint.
When designing an evaluation process, the assessment purpose and
scope may be manifold. For instance, the purpose can be to understand the
external quality of a whole software application or one of its components;
we might want to predict the external quality by assessing the internal
quality of a software specification, or to improve the quality in use of a
shopping cart component, or to understand and compare the external quality of two typical e-commerce Web applications to incorporate the best
features in a new development project. On the other hand, the type of applications can be at least categorised as mission-critical or non-mission-critical, and the domain can be diverse (e.g. avionics, e-commerce, e-learning, information-oriented Web applications).
Lastly, the user standpoint for evaluation purposes can be categorised as
one of an acquirer, a developer, a maintainer, a manager, or a final (end)
user. In turn, a final user can, for instance, be divided into a novice user or
an expert user. Thus, final users are mainly interested in using the software
or Web application, i.e. they are interested in the effects of the software
rather than in knowing the internal aspects of the source code or its maintainability. For this reason, when the external quality requirements are, for
example, defined from the end user’s standpoint, generally usability, functionality, reliability, and efficiency are the most important. Instead, from
the maintainer’s viewpoint, analysability, changeability, stability, and testability of application modules are the most important.
As a final comment, we would like to draw the reader’s attention to the
conceptual model shown in Fig. 4.1. That basic model is a key piece of a
set of tools we are currently building for measurement and evaluation projects. Given an entity (e.g. e-learning components to support course tasks),
it allows us to specify an evaluation information need: that is to say, the
purpose (e.g. understand), the user viewpoint (e.g. a novice student), in a
given context of use (e.g. the software is installed in the engineering school
server as support to a preparatory mathematics course for pre-enrolled
students, etc.), with the focus on a calculable concept (quality in use) and
sub-concepts (effectiveness, productivity, and satisfaction), which can be
represented by a concept model (e.g. the ISO quality in use model) and
associated attributes (as shown in Fig. 4.4).
The next section describes Web quality. The main focus is on the quality of Web products and the perceived quality of real users in a real context
of use.
4.2.4 What is Web Quality?
According to Powell [26], Web applications “involve a mixture between
print publishing and software development, between marketing and computing, between internal communications and external relations, and between art and technology”.
Nowadays, there is a greater awareness and acknowledgement in the
scientific and professional communities about the multidimensional nature
of Web applications; it encompasses technical computing, information
architecture, content authoring, navigation, presentation and aesthetics,
multiplicity of user audiences, legal and ethical issues, network performance and security, and heterogeneous operational environments.
As pointed out in Chap. 1, Web applications, taken as product, or product in use entities (without talking about distinctive features of Web development processes), have their own features, distinct from traditional software [18,26], namely:
• Web applications will continue to be content-driven and document-oriented. Most Web applications, besides the increasing support for functionalities and services, will continue aiming at showing and delivering information. This is a basic feature stemming from the early Web that is currently empowered by the Semantic Web initiative [4].
• Web applications are interactive, user-centred, hypermedia-based applications, where the user interface plays a central role; thus, Web applications will continue to be highly focused on the look and feel. Web interfaces must be easy to use, understand, and operate because thousands of users with different profiles and capabilities interact with them daily.
• The Web embodies a greater bond between art and science than that encountered in software applications. Aesthetic and visual features of Web development are not just a technical skill, but also a creative, artistic skill.
• Internationalisation and accessibility of content for users with various disabilities are real and challenging issues in Web applications.
• Searching and browsing are two basic functionalities used to find and explore documents and information content. These capabilities are inherited from hypermedia-based applications.
• Security is a central issue in transaction-oriented Web applications. Likewise, performance is also critical for many Web applications, although both are also critical features for traditional applications.
• The entire Web application, and its parts, are often evolutionary pieces of information.
• The medium where Web applications are hosted and delivered is generally more unpredictable than the medium where traditional software applications run. For instance, unpredictability in bandwidth maintenance, or in server availability, can affect the quality perceived by users.
• Content privacy and intellectual property rights of materials are current issues too. They involve ethical, cultural, and legal aspects as well. Most of the time it is very difficult to establish legal boundaries due to the heterogeneity of legislation in different countries or, even worse, its absence.
Most of the above features make a Web application a particular artefact.
However, like a software application, it also involves source and executable code, persistent structured data, and requirements, architecture, design,
and testing specifications as well.
Therefore, we argue that the ISO quality framework introduced in previous sections is also applicable to a great extent to intermediate and final
lifecycle Web products. A discussion of this statement follows, as well as
how we could adapt specific particularities of Web quality requirements
into quality models.
Like any software production line, the Web lifecycle involves its products at different stages, whether in early phases such as inception and development, or in late phases such as deployment, operation, and evolution. To assure the quality of products, we can evaluate and control quality from intermediate products through to final products. Thus, to the general question of whether we can apply the same ISO internal quality, external quality, and quality in use models, the natural answer is yes – we believe this does not need further explanation. However, to the more specific question of
whether we can use the same six prescribed quality characteristics for internal and external quality, and the four characteristics for quality in use,
our answer is yes for the latter, but some other considerations might be
taken into account for the former.
In particular, as highlighted at the beginning of this section, the very
nature of Web applications is a mixture of information (media) content,
functionalities, and services. We argue that the six quality characteristics
(i.e. Usability, Functionality, Reliability, Efficiency, Portability, and
Maintainability) are not well suited (or they were not intended) to specify
requirements for information quality. As Nielsen [19] writes regarding
Web content for informational Web applications: “Ultimately, users visit
your Web site for its contents. Everything else is just the backdrop.”
Hence, to follow the thread of our argument, the central issue is how we
can specify and gauge the quality of Web information content from the
internal and external quality perspectives.
Taking into account some contributions made in the area of information
quality [1,7,8,15,17] we have primarily identified four major sub-concepts
for the Content characteristic. The following categories can help to evaluate information quality requirements of Web applications:
• Information accuracy. This sub-characteristic addresses the very intrinsic nature of information quality. It assumes that information has its own quality per se. Accuracy is the extent to which information is correct, unambiguous, authoritative (reputable), objective, and verifiable. If a particular piece of information is believed to be inaccurate, the Web site will likely be perceived as having little added value, resulting in reduced visits.
• Information suitability. This sub-characteristic addresses the contextual nature of information quality. It emphasises the importance of conveying the appropriate information for user-oriented goals and tasks. In other words, it highlights the quality requirement that content must be considered within the context of use and the intended audience. Therefore, suitability is the extent to which information is appropriate (appropriate coverage for the target audience), complete (relevant amount), concise (shorter is better), and current.
• Accessibility. This emphasises the importance of technical aspects of Web sites and applications in order to make Web content more accessible for users with various disabilities (see, for instance, the WAI initiative [27]).
• Legal compliance. This concerns the capability of the information product to adhere to standards, conventions, and legal norms related to contents and intellectual property rights.
Besides the above categories, sub-concepts of information structure and
organisation should be addressed. Many of these sub-characteristics, such
as global understandability,4 learnability, and even internationalisation, can
be related to the Usability characteristic.
On the other hand, other particular features of Web applications, such as
search and navigation functionalities, can be specified in the Functionality
sub-characteristics (e.g. are the basic and advanced searches suitable for
4 Implemented by mechanisms that help the user quickly understand the structure and contents of the information space of a Web site, such as a table of contents, indexes, or a site map.
the end user, or are they tolerant of mis-spelled words and accurate in retrieving documents?). In the same way, we can represent link and page maturity attributes, or attributes related to deficiencies due to browser compatibility, in the Reliability sub-characteristics.
As a consequence, in order to represent software and Web applications’ information quality requirements accordingly, we propose to include the
Content characteristic in the internal and external quality model of the ISO
standard. A point worth mentioning is that in the spirit of the ISO 9126-1
standard it is stated that “evaluating product quality in practice requires
characteristics beyond the set at hand”; and as far as the requirements for
choosing the prescribed characteristics, an ISO excerpt recommended “To
form a set of not more than six to eight characteristics for reasons of clarity
and handling.”
Finally, from the “quality in use” perspective, for the Satisfaction characteristic, specific items for evaluating the quality of content as well as
items for navigation, aesthetics, functions, etc., can be included. In addition, for other quality in use characteristics such as Effectiveness and Productivity, specific user-oriented evaluation tasks that include performing
actions with content and functions can be designed and tested.
4.3 Evaluating Web Quality using WebQEM
As introduced in Sect. 4.1, the Web currently plays a central role in diverse
application domains for various types of organisations and even in the
personal life of individuals. Its growing importance heightens concerns
about Web processes being used for the development, maintenance, and
evolution of Web applications, and about the evaluation methods being
used for assuring Web quality, and ultimately argues for the systematic use
of engineering models, methods, and tools. Therefore, we need sound
evaluation methods that support efforts to meet quality requirements in
new Web development projects and assess quality requirements in operational and evolutionary phases. It is true that one size does not fit all the
needs and preferences, but an organisation might at least adopt a method or
technique in order to judge the state of its quality, for improvement purposes. We argue that a method or technique is usually not enough to assess
different information needs for diverse evaluation purposes.
In this section we present the Web Quality Evaluation Method (WebQEM) [24] as a model-centred evaluation method for the inspection category; that is, inspection of concepts, sub-concepts, and attributes stemming
from a quality or quality in use model. We have used the WebQEM methodology since the late 1990s. The underlying WebQEM strategy is evaluator-driven by domain experts rather than user-driven; quantitative and
model-centred rather than qualitative and intuition-centred; and objective
rather than subjective. Of course, a global quality evaluation (and eventual
comparison), where many characteristics and attributes, metrics, and indicators intervene, cannot entirely avoid subjectivity. Hence, a robust and
flexible evaluation methodology must properly aggregate subjective and
objective components controlled by experts.
The WebQEM process steps are grouped into four major technical
phases that are now further described:
1. Quality Requirements Definition and Specification.
2. Elementary Measurement and Evaluation (both Design and Implementation Stages).
3. Global Evaluation (both Design and Implementation Stages).
4. Conclusion and Recommendations.
Fig. 4.5. The evaluation processes underlying the WebQEM methodology. The
technical phases, main processes, and their inputs and outputs are represented
Figure 4.5 shows the evaluation process underlying the methodology,
including the phases, main processes, inputs, and outputs. This model follows to some extent the ISO’s process model for evaluators [11]. Next we
give an overview of the major technical phases and some of the models used.
4.3.1 Quality Requirements Definition and Specification
During the definition and specification of quality requirements, evaluators
clarify the evaluation goals and the intended user’s viewpoint. They select
a quality model, for instance the ISO-prescribed characteristics, in addition
to attributes customised to the Web domain. Next, they identify these
components’ relative importance to the intended audience and the extent of
coverage required.
Once the domain and product descriptions, the agreed goals, and the selected user view (i.e. the explicit and implicit user needs) are defined, the
necessary characteristics, sub-characteristics, and attributes can be specified in a quality requirement tree (such as that shown in Figs. 4.5 and 4.9).
This phase yields a quality requirement specification document.
4.3.2 Elementary Measurement and Evaluation
The elementary measurement and evaluation phase defines two major
stages (see Fig. 4.5): elementary evaluation design and execution (implementation). Regarding the elementary evaluation design, we further identify two main processes: (a) metric definition and (b) elementary indicator
definition.
In our previous work [16,25], we have represented the conceptual domain of metrics and indicators from an ontological viewpoint. The conceptual framework of metrics and indicators, which was based as much as
possible on the concepts of various ISO standards [12,14], can be useful to
support different quality assurance processes, methods, and tools. That is
the case for the WebQEM methodology and its supporting tool (WebQEM_Tool [23]), which are based on this framework.
As shown in Fig. 4.6, each attribute can be quantified by one or more
metrics. For the metric definition process we should select just one metric for
each attribute of the quality requirement tree, given a specific measurement project.
The metric contains the definition of the selected measurement and/or
calculation method and scale. The metric m represents the mapping m:
A → X, where A is an empirical attribute of an entity (the empirical world),
X the variable to which categorical or numerical values can be assigned
(the formal world), and the arrow denotes a mapping. In order to perform
this mapping a sound and precise definition of measurement activity is
needed by specifying explicitly the metric’s method and scale (see Fig.
4.6). We can apply an objective or subjective measurement method for
direct metrics, and we can perform a calculation method for indirect metrics; that is, when an equation intervenes.
To illustrate this, we examine the following direct metrics, taken from
the example shown in Fig. 4.2:
1) Internal Broken Links Count (#IBL, for short),
2) External Broken Links Count (#EBL), and
3) Invalid Links Count (#IL).
In case we need a ratio or percentage, with regard to the Total of Links
Count (#TL), the following indirect metrics can be defined:
4) %IBL = (#IBL / #TL) * 100, and so forth to
5) %EBL; and
6) %IL.
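A minimal Python sketch of how the indirect metrics derive from the direct counts follows; the function name is illustrative and not part of the methodology.

# Derive the indirect metrics %IBL, %EBL and %IL from the direct counts
# #IBL, #EBL, #IL and the total links count #TL, as defined above.
def link_percentages(ibl: int, ebl: int, il: int, tl: int) -> dict:
    if tl <= 0:
        raise ValueError("total links count (#TL) must be positive")
    return {
        "%IBL": ibl / tl * 100,
        "%EBL": ebl / tl * 100,
        "%IL": il / tl * 100,
    }

print(link_percentages(ibl=3, ebl=5, il=1, tl=200))
# {'%IBL': 1.5, '%EBL': 2.5, '%IL': 0.5}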
Fig. 4.6. Main terms and relationships with the metric concept
The scale type for the direct metrics presented above is absolute, represented by a numerical scale with integer value type. For the direct metrics
1) and 2), a specific objective measurement method can be applied (e.g. a
recursive algorithm that counts each 404 HTTP status code). In addition, to
automate the method, a software tool can be utilised; conversely, for the
direct metric 3), it is harder to find a tool to automate it. On the other hand,
for the indirect metrics 4), 5), and 6), we can use a calculation method in
order to perform the specified equation.
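As an illustration of such an objective measurement method, the following hypothetical Python sketch counts 404 responses for a list of links; it assumes the candidate URLs have already been extracted from the application’s pages, and it is only one possible realisation of the method described above.

# Hypothetical sketch of an objective measurement method: issue an HTTP
# HEAD request per link and count responses with the 404 status code.
import urllib.error
import urllib.request

def count_broken_links(urls: list[str]) -> int:
    broken = 0
    for url in urls:
        request = urllib.request.Request(url, method="HEAD")
        try:
            urllib.request.urlopen(request, timeout=10)
        except urllib.error.HTTPError as err:
            if err.code == 404:  # broken link
                broken += 1
        except urllib.error.URLError:
            pass  # unreachable host: not a 404, counted elsewhere
    return broken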
However, because the value of a particular metric will not represent the
elementary requirement’s satisfaction level, we need to define a new mapping that will yield an elementary indicator value.
In [16,25] the indicator term is stated as:
“the defined calculation method and scale in addition to the model
and decision criteria in order to provide an estimate or evaluation of
a calculable concept with respect to defined information needs.”
In particular, we define an elementary indicator as that which does not
depend upon other indicators to evaluate or estimate a concept at a lower
level of abstraction (e.g. for associated attributes to a concept model); in
addition, we define a partial or global indicator as that which is derived
from other indicators to evaluate or estimate a concept at a higher level of
abstraction (i.e. for sub-characteristics and characteristics). Therefore, the
elementary indicator represents a new mapping coming from the interpretation of the metric’s value of an attribute (the formal world) into the new
variable to which categorical or numerical values can be assigned (the new
formal world). In order to perform this mapping, a model and decision
criterion for a specific user information need is considered. Figure 4.7
represents these concepts.
Fig. 4.7. Main terms and relationships with the indicator concept
Hence, an elementary indicator for each attribute of the concept model can be defined. For the 1.1 attribute of Fig. 4.2, the elementary indicator can be named, for example, Internal Broken Links Preference Level (IBL_P). The specification of the elementary model can look like this:

IBL_P = 100%, if %IBL = 0
IBL_P = 0%, if %IBL >= Xmax
IBL_P = ((Xmax – %IBL) / Xmax) * 100, if 0 < %IBL < Xmax

where Xmax is an agreed upper threshold.
The decision criteria that a model of an indicator may have are the
agreed acceptability levels on a given scale; for instance, the level is unsatisfactory if the value lies between 0 and 40%; marginal if it is greater than 40% and less than or equal to 60%; and satisfactory otherwise.
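A small Python sketch of this elementary model and its decision criteria follows; the function names and the threshold value are illustrative.

# Map the %IBL metric onto the IBL_P preference scale (0-100%) and apply
# the agreed decision criteria, as specified above.
def ibl_preference_level(pct_ibl: float, x_max: float) -> float:
    if pct_ibl <= 0:
        return 100.0
    if pct_ibl >= x_max:
        return 0.0
    return (x_max - pct_ibl) / x_max * 100

def acceptability(indicator: float) -> str:
    if indicator <= 40:
        return "unsatisfactory"
    if indicator <= 60:
        return "marginal"
    return "satisfactory"

ibl_p = ibl_preference_level(pct_ibl=1.5, x_max=5.0)  # x_max agreed at 5%
print(ibl_p, acceptability(ibl_p))  # 70.0 satisfactory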
One fact worth mentioning is that the selected metrics are useful for a
measurement process, just as the selected indicators are useful for an
evaluation process. Indicators are ultimately the foundation for interpretation of information needs and decision-making. Finally, Fig. 4.5 depicts
the execution stage for the specified metrics and elementary indicators.
4.3.3 Global Evaluation
The global evaluation phase has two major stages: design and execution of
the partial and global quality evaluation.
Regarding the global evaluation design, we identify the definition process of partial and global indicators. In this process, an aggregation and
scoring model, and decision criteria, must be selected. The quantitative
aggregation and scoring models aim at making the evaluation process well
structured, objective, and comprehensible to evaluators. At least two types
of models exist: those based on linear additive scoring models [6], and
those based on non-linear multi-criteria scoring models [5], where different attributes and characteristic relationships can be designed. Both use
weights to consider an indicator’s relative importance. For example, if our
procedure is based on a linear additive scoring model, the aggregation and
computing of partial/global indicators (P/GI), considering relative weights
(W), is based on the following equation:
P/GI = (W1 EI1 + W2 EI2 + ... + Wm EIm)    (4.1)

such that, if the elementary indicator (EI) is on a percentage scale, the following holds: 0 <= EIi <= 100. Also, the sum of weights for an aggregation block, or group, must fulfil:

(W1 + W2 + ... + Wm) = 1, with Wi > 0, for i = 1 ... m    (4.2)

where m is the number of sub-concepts at the same level in the aggregation block’s tree.
The basic arithmetic aggregation operator for inputs is the plus (+) connector. We cannot use Equation 4.1 to model input simultaneity, or replaceability, among other limitations, as we discuss later.
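The bottom-up aggregation of Equation 4.1 can be sketched as follows; the tree, weights, and elementary indicator values are invented for illustration.

# Aggregate elementary indicators (0-100) bottom-up through a requirement
# tree using the linear additive scoring model of Equation 4.1. A leaf is
# a number; an inner node is a list of (weight, child) pairs forming one
# aggregation block whose weights must sum to 1 (Equation 4.2).
def aggregate(node) -> float:
    if isinstance(node, (int, float)):
        return float(node)
    assert abs(sum(w for w, _ in node) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(w * aggregate(child) for w, child in node)

usability = [(0.3, 80), (0.2, 90), (0.5, 60)]     # elementary indicators
functionality = [(0.6, 70), (0.4, 100)]
global_quality = [(0.5, usability), (0.5, functionality)]
print(f"Global indicator: {aggregate(global_quality):.1f}")  # 77.0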
Therefore, once we have selected a scoring model, the aggregation
process follows the hierarchical structure as defined in the quality or quality in use requirement tree (see Fig 4.4), from bottom to top. Applying a
stepwise aggregation mechanism, we obtain a global schema. This model
lets us compute partial and global indicators in the execution stage. The
global quality and ‘quality in use’ indicator ultimately represents the
global degree of satisfaction in meeting the stated requirements, from a
user’s viewpoint.
4.3.4 Conclusions and Recommendations
The conclusion of the evaluation comprises documenting Web product
components, the specification of quality requirements, metrics, indicators,
elementary and global models, and decision criteria; it also records
measures and elementary, partial, and global indicator values. Requesters
and evaluators can then analyse and understand the assessed product’s
strengths and weaknesses with regard to established information needs,
and suggest, and justify, recommendations.
4.3.5 Automating the Process using WebQEM_Tool
The evaluation and comparison processes require both methodological and
technological support. We have developed a Web-based tool (WebQEM_Tool [23]) to support the administration of evaluation projects. It
permits editing and relating non-functional requirements, and calculating indicators based on the two aggregation models previously presented. Next, by
automatically or manually editing elementary indicators, WebQEM_Tool
aggregates the elements to yield a schema and calculates a global quality
indicator for each application. This allows evaluators to assess and compare a Web product’s quality to quality in use. WebQEM_Tool relies on a
Web-based hyperdocument model that supports traceability of evaluation
projects. It shows evaluation results using linked pages with textual, tabular, and graphical information, and dynamically generates pages with these
results, obtained from tables stored in the data layer.
Currently, we are implementing a more robust measurement and evaluation framework, called INCAMI (Information Need, Concept model,
Attribute, Metric, and Indicator). Its foundation lies in the ontological
specification of metrics and indicators [16,25]. The Web-based tool related
to the INCAMI framework is called INCAMI_Tool.
4.4 Case Study: Evaluating the Quality of Two Web Applications
We have used WebQEM to evaluate the quality of Web applications in
several domains, which is documented elsewhere [3,21,22]. We discuss
here its application in an e-business domain.
4.4.1 External Quality Requirements
Many potential attributes, both general and domain-specific, can contribute
to the quality of a Web application. However, an evaluation must be focused, and purpose-oriented for a real information need. Let us establish
that the purpose is to understand and compare the external quality of the
shopping cart component of two typical e-stores, from a general visitor’s
viewpoint, in order to incorporate the best features in a new e-bookstore
development project. To this end, we chose a successful international application – Amazon (www.amazon.com/books), and a well-known regional application – Cuspide (www.cuspide.com.ar).
Figure 4.8 shows a screenshot of Cuspide’s shopping cart page with
several highlighted attributes, which intervene in the quality requirements
tree of Fig. 4.9. For the definition of the external quality requirements, we
considered four main characteristics: Usability (1), Functionality (2), Content (3), and Reliability (4), and 32 attributes related to them (see Fig. 4.9).
For instance, the Usability characteristic splits into sub-characteristics,
such as understandability (1.1), learnability (1.2), operability (1.3), and
attractiveness (1.4). We also consider another two separate characteristics:
Functionality and Content. Functionality is decomposed into function suitability (2.1) and accuracy (2.2). Content is decomposed into information
suitability (3.1) and content accessibility (3.2). As the reader can observe
(see Fig. 4.9), we relate five measurable attributes to the function suitability sub-characteristic, and three to the function accuracy. In the latter subcharacteristic, we mainly consider precision attributes to recalculate values, after making supported edit operations.
On the other hand, as mentioned in Sect. 4.2.4, information suitability
stresses the contextual nature of the information quality. It emphasises the
importance of conveying the appropriate information for user-oriented
goals and tasks.
[Screenshot: the highlighted attributes are first-time visitor help, shopping cart labeling, line item information completeness, capability to modify an item quantity, product description appropriateness, shipping and handling information completeness, proceed-to-check-out feedback appropriateness, and continue-buying feedback appropriateness]
Fig. 4.8. A screenshot of Cuspide’s shopping cart page with several attributes
INCAMI_Tool records all the information for an evaluation project. Besides the project data itself, it also saves the purpose, user viewpoint, and context description metadata to the InformationNeed class (see Fig. 4.1); for the CalculableConcept and Attribute classes, it saves their names and definitions, respectively. The ConceptModel class permits one to instantiate a specific model, i.e. the external quality model in our case, allowing evaluators to edit and relate specific concepts, sub-concepts, and attributes. The resulting model is similar to that in Fig. 4.9.
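To make the recorded structure concrete, the following minimal sketch renders these classes as Python dataclasses. The class names come from the text above; the specific fields are illustrative assumptions, not the actual INCAMI_Tool schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Attribute:
    code: str                 # e.g. "2.1.3"
    name: str                 # e.g. "Capability to modify an item quantity"
    definition: str = ""

@dataclass
class CalculableConcept:
    code: str                 # e.g. "2.1"
    name: str                 # e.g. "Function Suitability"
    sub_concepts: List["CalculableConcept"] = field(default_factory=list)
    attributes: List[Attribute] = field(default_factory=list)

@dataclass
class InformationNeed:
    purpose: str              # e.g. "understand and compare external quality"
    viewpoint: str            # e.g. "general visitor"
    context: str              # e.g. "shopping carts of two e-bookstores"

@dataclass
class ConceptModel:
    name: str                 # e.g. "external quality model"
    concepts: List[CalculableConcept] = field(default_factory=list)
```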
4.4.2 Designing and Executing the Elementary Evaluation
As mentioned in Sect. 4.3.2, the evaluators should design, for each measurable attribute of the instantiated external quality model, the basis for the elementary evaluation process, by defining each specific metric and elementary indicator accordingly. In the design phase we record all the information for the selected metrics and indicators, following the conceptual schema of the Metric and Elementary Indicator classes shown in Figs. 4.6 and 4.7, respectively.
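For example, a metric and its elementary indicator for the Broken links attribute (4.1.1.1) might be designed as below; this mapping is a plausible illustration, not the metric definition actually used in the study.

```python
def broken_links_ratio(broken: int, total: int) -> float:
    """Direct metric: proportion of broken links among all tested links."""
    return broken / total if total else 0.0

def broken_links_indicator(broken: int, total: int) -> float:
    """Elementary indicator: maps the metric value onto the 0-100
    preference scale, where 100 means no broken links were detected."""
    return 100.0 * (1.0 - broken_links_ratio(broken, total))

print(broken_links_indicator(0, 250))   # 100.0, a fully satisfactory value
```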
1. Usability
1.1. Understandability
1.1.1. Shopping cart icon/label ease of recognition
1.1.2. Shopping cart labeling appropriateness
1.2. Learnability
1.2.1. Shopping cart help (for first-time visitor)
1.3. Operability
1.3.1. Shopping cart control permanence
1.3.2. Shopping cart control stability
1.3.3. Steady behaviour of the shopping cart control
1.3.4. Steady behaviour of other related controls
1.4. Attractiveness
1.4.1. Color style uniformity (links, text, etc.)
1.4.2. Aesthetic preference
2. Functionality
2.1. Function Suitability
2.1.1. Capability to add items from anywhere
2.1.2. Capability to delete items
2.1.3. Capability to modify an item quantity
2.1.4. Capability to show totals by performed changes
2.1.5. Capability to save items for later/move to cart
2.2. Function Accuracy
2.2.1. Precision to recalculate after adding an item
2.2.2. Precision to recalculate after deleting items
2.2.3. Precision to recalculate after modifying an item quantity
3. Content
3.1. Information Suitability
3.1.1. Shopping Cart Basic Information
3.1.1.1. Line item information completeness
3.1.1.2. Product description appropriateness
3.1.2. Shopping Cart Contextual Information
3.1.2.1. Purchase Policies Related Information
3.1.2.1.1. Shipping and handling costs information completeness
3.1.2.1.2. Applicable taxes information completeness
3.1.2.1.3. Return policy information completeness
3.1.2.2. Continue-buying feedback appropriateness
3.1.2.3. Proceed-to-check-out feedback appropriateness
3.2. Content Accessibility
3.2.1. Readability by Deactivating the Browser Image Feature
3.2.1.1. Image title availability
3.2.1.2. Image title readability
3.2.2. Support for text-only version
4. Reliability
4.1. Nondeficiency (Maturity)
4.1.1. Link Errors or Drawbacks
4.1.1.1. Broken links
4.1.1.2. Invalid links
4.1.1.3. Reflective links
4.1.2. Miscellaneous Deficiencies
4.1.2.1. Deficiencies or unexpected results dependent on browsers
4.1.2.2. Deficiencies or unexpected results independent of browsers
Fig. 4.9. Specifying the external quality requirements tree of the shopping cart component from a general visitor's standpoint
Table 4.3. Summary of elementary indicators' values of the shopping cart of both applications

Code       Attribute name                                               Amazon  Cuspide
2.1.1      Capability to add items from anywhere                          50.0     50.0
2.1.2      Capability to delete items                                     66.0    100.0
2.1.3      Capability to modify an item quantity                         100.0    100.0
2.1.4      Capability to show totals by performed changes                 66.0     66.0
2.1.5      Capability to save items for later/move to cart               100.0      0.0
3.1.1.1    Line item information completeness                            100.0     33.0
3.1.1.2    Product description appropriateness                           100.0     30.0
3.1.2.1.1  Shipping and handling costs information completeness          100.0    100.0
3.1.2.1.2  Applicable taxes information completeness                     100.0    100.0
3.1.2.1.3  Return policy information completeness                        100.0     66.0
3.1.2.2    Continue-buying feedback appropriateness                      100.0     60.0
3.1.2.3    Proceed-to-check-out feedback appropriateness                 100.0    100.0
3.2.1.1    Image title availability                                       50.0     50.0
3.2.1.2    Image title readability                                       100.0     50.0
3.2.2      Support for text-only version                                   0.0      0.0
4.1.1.1    Broken links                                                  100.0    100.0
4.1.1.2    Invalid links                                                 100.0    100.0
4.1.1.3    Reflective links                                               50.0     50.0
4.1.2.1    Deficiencies or unexpected results dependent on browsers      100.0     66.0
4.1.2.2    Deficiencies or unexpected results independent of browsers     30.0     30.0
In addition, in the execution phase, we record the final values yielded for each metric and indicator as instances of the Measurement and Calculation classes. Table 4.3 contains the calculated elementary indicators' values for the shopping cart component of Amazon and Cuspide. The data collection for the measurement activity was performed from 15 to 20 November 2004.
Once evaluators have designed and implemented the elementary evaluation, they should consider not only each attribute’s relative importance, but
also whether the attribute (or sub-characteristic) is mandatory, alternative,
or neutral. For this task, we need a robust aggregation and scoring model,
described next.
4.4.3 Designing and Executing the Partial/Global Evaluation
The design and execution of the partial/global evaluation represents a
phase where we select and apply an aggregation and scoring model (see
Fig. 4.5). Arithmetic or logic operators will then relate the hierarchically
grouped attributes, sub-characteristics, and characteristics accordingly.
As mentioned earlier, we can use a linear additive or a non-linear multicriteria scoring model (or even others). We cannot use the additive scoring
model to model input simultaneity (an and relationship among inputs) or
replaceability (an or relationship), because it cannot express, for example,
simultaneous satisfaction of several requirements as inputs. Additivity
assumes that insufficient presence of a specific attribute (input) can always
be compensated by sufficient presence of any other attribute. Furthermore,
additive models cannot model mandatory requirements; that is, a necessary
attribute’s or sub-characteristic’s total absence cannot be compensated by
others’ presence.
A non-linear multi-criteria scoring model lets us deal with simultaneity,
neutrality, replaceability, and other input relationships by using aggregation operators based on the weighted power means mathematical model.
This model, called Logic Scoring of Preference (LSP) [5], is a generalisation of the additive scoring model, and can be expressed as follows:
\[
P/G(r) = \left( W_1\, EI_1^{\,r} + W_2\, EI_2^{\,r} + \cdots + W_m\, EI_m^{\,r} \right)^{1/r}
\tag{4.3}
\]

where \(-\infty \le r \le +\infty\), \(P/G(-\infty) = \min(EI_1, EI_2, \ldots, EI_m)\), and \(P/G(+\infty) = \max(EI_1, EI_2, \ldots, EI_m)\).
The power r is a parameter selected to achieve the desired logical relationship and polarisation intensity of the aggregation function. If P/G(r) is closer to the minimum, such a criterion specifies the requirement for input simultaneity. If it is closer to the maximum, it specifies the requirement for input replaceability. Equation 4.3 is additive when r = 1, which models the neutrality relationship; that is, the formula reduces to the additive scoring model. Equation 4.3 is supra-additive for r > 1, which models input disjunction or replaceability, and sub-additive for r < 1 (with r ≠ 0), which models input conjunction or simultaneity.
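As a concrete companion to Eq. 4.3, here is a minimal Python sketch of the weighted power mean (an illustration, not published WebQEM code). Weights are assumed to be normalised to sum to 1, and the r = 0 case uses the weighted geometric mean, which is the mathematical limit of the formula:

```python
import math

def lsp(weights, indicators, r):
    """Logic Scoring of Preference aggregation (Eq. 4.3).

    r = 1 models neutrality (additive), r > 1 replaceability
    (quasi-disjunction), r < 1 simultaneity (quasi-conjunction).
    Assumes weights sum to 1 and indicators lie in [0, 100].
    """
    if math.isinf(r):                 # limit cases of Eq. 4.3
        return min(indicators) if r < 0 else max(indicators)
    if r == 0:                        # limit case: weighted geometric mean
        return math.prod(ei ** w for w, ei in zip(weights, indicators))
    return sum(w * ei ** r for w, ei in zip(weights, indicators)) ** (1.0 / r)

# Neutral aggregation (operator A, r = 1) of Amazon's Function Suitability
# block (codes 2.1.1-2.1.5 in Table 4.3) with equal weights reproduces the
# 76.40 partial indicator reported in Table 4.4:
print(round(lsp([0.2] * 5, [50.0, 66.0, 100.0, 66.0, 100.0], 1), 2))  # 76.4
```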
For our case study we selected this last model and used a 17-level scale of conjunction–disjunction operators, as defined by Dujmovic [5]. Each operator in the model corresponds to a particular value of the r parameter. When r = 1 the operator is tagged with A (or the + sign). The C or conjunctive operators range from weak (C–) to strong (C+) quasi-conjunction functions, i.e. decreasing r values starting from r < 1. In general, the conjunctive operators imply that low-quality input indicators can never be well compensated by a high quality of some other input to output a high-quality indicator (in other words, a chain is as strong as its weakest link). Conversely, disjunctive operators (D operators) imply that low-quality input indicators can always be compensated by a high quality of some other input. Designing an LSP aggregation schema requires answering the following key questions (which are part of the Global Indicator Definition task in Fig. 4.5):
• What is the relationship between this group of related attributes and sub-characteristics: conjunctive, disjunctive, or neutral? (For instance, when modelling the attributes' relationship for the Function Suitability (2.1) sub-characteristic, we can agree that they are neutral or independent of each other.)
• What is the level of intensity of the logic operator, from a weak to strong conjunctive or disjunctive polarisation?
• What is the relative importance or weight of each element in the aggregation block or group?
WebQEM_Tool (which is being integrated into INCAMI_Tool) lets
evaluators select the aggregation and scoring model. When using the additive scoring model, the aggregation operator is A for all tree aggregation
blocks. If evaluators select the LSP model, they must indicate the operator
for each group.
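Enacting the schema can then be sketched as a recursive walk over the requirements tree, reusing the lsp function from the previous sketch. The tree fragment below is a small illustrative slice of Fig. 4.9; the weights are assumptions, and only the neutral operator for block 2.1 is stated in the text.

```python
# A node is either a leaf (an elementary indicator value) or an aggregation
# block carrying its own weights and its own operator, i.e. its r value.
A = 1.0   # neutral operator; conjunctive blocks would use some r < 1

function_suitability = {                 # block 2.1, Amazon (Table 4.3)
    "r": A,
    "weights": [0.2, 0.2, 0.2, 0.2, 0.2],
    "children": [50.0, 66.0, 100.0, 66.0, 100.0],
}

def aggregate(node):
    """Computes a partial/global indicator bottom-up over the tree."""
    if isinstance(node, (int, float)):   # leaf: elementary indicator
        return node
    values = [aggregate(child) for child in node["children"]]
    return lsp(node["weights"], values, node["r"])

print(round(aggregate(function_suitability), 2))   # 76.4, as in Table 4.4
```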
Figure 4.10 shows a partial view of the enacted schema for Amazon.com, as generated by our tool.
[Figure 4.10 callouts highlight the partial Usability indicator, the selected site, and a conjunctive operator.]
Fig. 4.10. Once the weights and operators were agreed and the schema checked, WebQEM_Tool yields partial and global indicators as highlighted in the right-hand pane
4.4.4 Analysis and Recommendations
Once we have performed the final execution of the evaluation, decision-makers can analyse the results and draw conclusions and recommendations. As stated in Sect. 4.4.1, one of the primary goals of this study is to understand and compare the current level of fulfilment of the required external quality characteristics and attributes (see Fig. 4.9) for the shopping carts of two typical e-commerce applications, from a general visitor's standpoint. In addition, the best features of both shopping carts can be incorporated in a new e-bookstore development project. The underlying assumption of this study is that, at least at the level of characteristics, both applications are within the satisfactory acceptability range.
Table 4.4 shows the final values for the Usability, Functionality, Content, and Reliability characteristics, and the global quality indicator, for both the Amazon and Cuspide shopping carts. The quality bars in Fig. 4.11 indicate the acceptability ranges and the quality level each shopping cart has reached. Amazon scored a higher quality level (84.32%) than Cuspide (65.73%). We suggest that scores between 40% and 60% (marginal acceptance) indicate the need for improvement. An unsatisfactory rating, i.e. a score below 40%, means that improvements must be made very soon and should take high priority. A score above 60% indicates satisfactory quality.
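These thresholds amount to a simple decision rule; a minimal sketch:

```python
def acceptability(score: float) -> str:
    """Maps an indicator value (0-100) onto the acceptability ranges
    described in the text."""
    if score < 40.0:
        return "unsatisfactory: improvements must be made very soon"
    if score <= 60.0:
        return "marginal: improvement is needed"
    return "satisfactory"

print(acceptability(84.32))   # Amazon's global indicator -> satisfactory
print(acceptability(65.73))   # Cuspide's global indicator -> satisfactory
```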
Table 4.4. Summary of partial and global indicators' values of the Amazon.com and Cuspide.com shopping carts

Code     Characteristic/Sub-characteristic name                   Amazon  Cuspide
         External Quality Indicator                                84.32    65.73
1        Usability                                                 90.1     90.1
1.1      Understandability                                         75.00    75.00
1.2      Learnability                                             100.00   100.00
1.3      Operability                                               87.50    87.50
1.4      Attractiveness                                           100.00   100.00
2        Functionality                                             87.61    80.05
2.1      Function Suitability                                      76.40    63.20
2.2      Function Accuracy                                        100.00   100.00
3        Content                                                   81.61    45.11
3.1      Information Suitability                                  100.00    47.30
3.1.1    Shopping Cart Basic Information                          100.00    31.47
3.1.2    Shopping Cart Contextual Information                     100.00    81.17
3.1.2.1  Purchase Policies Related Information                    100.00    88.68
3.2      Content Accessibility                                     56.79    41.91
3.2.1    Readability by Deactivating the Browser Image Feature     67.75    50.00
4        Reliability                                               75.34    67.61
4.1      Nondeficiency (Maturity)                                  75.34    67.61
4.1.1    Link Errors or Drawbacks                                  94.35    94.35
4.1.2    Miscellaneous Deficiencies                                58.00    44.40
Looking at the Usability and Functionality characteristics, we see similar scores in both applications, so we can emulate such attributes in a new development project. We highlight that the desirable Capability to save items for later/move to cart (2.1.5) attribute is absent in Cuspide, whereas the Capability to delete items (2.1.2) attribute is better supported in Cuspide, as users can delete several items at once from the shopping cart (see the elementary indicators in Table 4.3).
Nonetheless, the greatest score differences can be observed in the Content characteristic (see Tables 4.3 and 4.4). Cuspide must plan changes in the Shopping Cart Basic Information sub-characteristic, mainly in the 3.1.1.1 and 3.1.1.2 attributes. For instance, Line item information completeness requires that at least the author be shown besides the title, because when users add another item with the same starting title (e.g. Software Engineering …) they cannot, looking at the shopping cart, determine who is the author of each title. Even worse, users must navigate back to find out who the authors are, because they have no link to a detailed product description.
Fig. 4.11. WebQEM_Tool shows diverse information types (textual, tabular, and graphical). The graph depicts the final shopping cart ranking
With regard to the Content Accessibility sub-characteristic, we should not emulate either application, because both fall within the marginal acceptability level. On the other hand, we found Deficiencies or unexpected results independent of browsers (4.1.2.2) in both shopping carts; that is, there is no input validation in the quantity field, so a user can type decimal numbers or alphanumeric inputs, which can lead to unexpected outcomes.
Finally, we observe that the state of the art of shopping cart quality in typical e-bookstores, from the visitor's point of view, is rather high, but the wish list is not empty, owing to some poorly designed or absent attributes. Notice that elementary, partial, and global indicators reflect the results of these specific requirements for this specific audience and should not be regarded as generalised rankings. Moreover, the results of a case study are seldom intended to be interpreted as generalisations applicable to other applications.
4.5 Concluding Remarks
Developing successful Web applications with economic and quality issues
in mind requires broad perspectives and the incorporation of a number of
principles, models, methods, and techniques from diverse disciplines such
as information systems, computer science, hypertext, graphic design, information structuring, knowledge management, and ultimately software
engineering as well. Web engineering is therefore an amalgamation of
many disciplines, but with its own challenges. It has a very short history
compared with other engineering disciplines, but is rapidly evolving. Like
any other engineering science, Web engineering is concerned with the
establishment and use of sound scientific, engineering, and management
principles, and disciplined and systematic approaches to the successful
development, deployment, maintenance, and evolution of Web sites and
applications within budgetary, calendar, and quality constraints.
As mentioned above, the quality of an entity is easy to recognise but hard
to define and evaluate, and sometimes costly to incorporate in the end product. In this chapter we have discussed what quality in general, and what
Web quality in particular, is about. We adhere to the ISO approaches of
quality: that is, internal quality, external quality, and quality in use. Because
quality is not achieved at the end of a development without a carefully designed quality assurance strategy in the early stages, we argue that the three
perspectives of quality per se have their own relative importance. However,
we also adhere to the saying “Quality in use is (or at least should be) the
objective, software product quality is the means of achieving it” [2].
We have highlighted that the very nature of Web applications is a mixture of information content, functionalities, and services. Accordingly, we proposed to include Content as an additional characteristic, alongside those of the ISO 9126-1 standard, in the internal and external quality models (see Sect. 4.2.4).
On the other hand, regarding Web engineering evaluation approaches, we argued the need for sound evaluation frameworks, methods, and techniques that support efforts to meet quality requirements at different stages of a Web project. We also stated that very often a single method or technique is not enough to assess different information needs for diverse evaluation purposes. In this context, we presented WebQEM as a quantitative evaluation method of the inspection category, whose underlying strategy is evaluator-driven by domain experts rather than user-driven; quantitative and model-centred rather than qualitative and intuition-centred; and objective rather than subjective. We are aware that a global quality evaluation (and eventual comparison), in which many characteristics, attributes, metrics, and indicators intervene, cannot entirely avoid subjectivity. Therefore, a robust and flexible evaluation methodology must properly aggregate subjective and objective components controlled by experts.
In order to illustrate WebQEM and its applicability, we conducted an e-business case study by evaluating the external quality of the shopping cart components of the Amazon and Cuspide sites, taking into account a general visitor's standpoint. The data collection and evaluation were performed by two expert evaluators working simultaneously. Note the important difference between evaluating external quality and quality in use: the former generally involves only experts, while the latter always involves real end users. The advantage of using expert evaluation without extensive user involvement is that it minimises costs, time, and the potential misinterpretation of questions (end users may sometimes interpret instructions and questionnaire items differently from the way they were intended). The choice of whether or not to involve end users should be carefully planned and justified; without end-user participation, it is not feasible to conduct task testing in a real context of use. Nielsen indicates that up to five subjects per target audience commonly produce meaningful results while minimising costs: "The best results come from testing no more than 5 users and running as many small tests as you can afford" [19].
As a last remark, we are currently implementing a more robust measurement and evaluation framework called INCAMI, which stands for Information Need, Concept model, Attribute, Metric, and Indicator; its foundation lies in the ontological specification of metrics and indicators [24]. WebQEM_Tool, which is part of this measurement and evaluation framework, allows the consistent saving not only of the metadata of metrics and indicators but also of the data of specific evaluation projects. Inter- and intra-project analyses and comparisons can now be performed in a consistent way. This applied research is thoroughly discussed in a follow-up manuscript.
Acknowledgements
This research is supported by the UNLPam-09/F022 project, Argentina.
Gustavo Rossi has been partially funded by Secyt's project PICT No 13623.
References

1 Alexander J, Tate M (1999) Web Wisdom: How to Evaluate and Create Information Quality on the Web. Lawrence Erlbaum, Hillsdale, NJ
2 Bevan N (1999) Quality in Use: Meeting User Needs for Quality. Journal of Systems and Software, 49(1):89–96
3 Covella G, Olsina L (2002) Specifying Quality Attributes for Sites with E-Learning Functionality. In: Proceedings of the Ibero American Conference on Web Engineering (ICWE'02), Santa Fe, Argentina, pp 154–167
4 Davies J, Fensel D, Van Harmelen F (2003) Towards the Semantic Web: Ontology-driven Knowledge Management. John Wiley & Sons
5 Dujmovic J (1996) A Method for Evaluation and Selection of Complex Hardware and Software Systems. In: Proceedings of the 22nd International Conference for the Resource Management and Performance Evaluation of Enterprise CS, CMG 96 Proceedings, 1, pp 368–378
6 Gilb T (1976) Software Metrics. Chartwell-Bratt, Cambridge, MA
7 Herrera-Viedma E, Peis E (2003) Evaluating the Informative Quality of Documents in SGML Format from Judgements by Means of Fuzzy Linguistic Techniques Based on Computing with Words. Information Processing & Management, 39(2):233–249
8 Huang K, Lee YW, Wang RY (1999) Quality Information and Knowledge. Prentice Hall, Englewood Cliffs, NJ
9 ISO/IEC 9126 (1991) Information technology – Software product evaluation – Quality characteristics and guidelines for their use
10 ISO 9241-11 (1998) Ergonomic requirements for office work with visual display terminals (VDTs) – Part 11: Guidance on usability
11 ISO/IEC 14598-5 (1998) Information technology – Software product evaluation – Part 5: Process for evaluators
12 ISO/IEC 14598-1 (1999) Information technology – Software product evaluation – Part 1: General overview
13 ISO/IEC 9126-1 (2001) Software Engineering – Product Quality – Part 1: Quality Model
14 ISO/IEC 15939 (2002) Software Engineering – Software Measurement Process
15 Lee YW, Strong DM, Kahn BK, Wang RY (2002) AIMQ: A Methodology for Information Quality Assessment. Information & Management, 40(2):133–146
16 Martín M, Olsina L (2003) Towards an Ontology for Software Metrics and Indicators as the Foundation for a Cataloging Web System. In: Proceedings of the 1st Latin American Web Congress, Santiago de Chile, pp 103–113
17 Mich L, Franch M, Gaio L (2003) Evaluating and Designing the Quality of Web Sites. IEEE MultiMedia, 10(1):34–43
18 Murugesan S, Deshpande Y, Hansen S, Ginige A (2001) Web Engineering: A New Discipline for Development of Web-based Systems. In: Murugesan S, Deshpande Y (eds) Web Engineering: Managing Diversity and Complexity of Web Application Development, LNCS 2016, Springer, Berlin, pp 3–13
19 Nielsen J (1995–2004) The Alertbox column, http://www.useit.com/alertbox/
20 Nielsen J, Molich R, Snyder C, Farrell S (2001) E-Commerce User Experience. NN Group
21 Olsina L, Godoy D, Lafuente G, Rossi G (1999) Assessing the Quality of Academic Web Sites: a Case Study. New Review of Hypermedia and Multimedia, 5:81–103
22 Olsina L, Lafuente G, Rossi G (2000) E-commerce Site Evaluation: a Case Study. In: Proceedings of the 1st International Conference on Electronic Commerce and Web Technologies, LNCS 1875, Springer, London, UK, pp 239–252
23 Olsina L, Papa MF, Souto ME, Rossi G (2001) Providing Automated Support for the Web Quality Evaluation Methodology. In: Proceedings of the 4th Workshop on Web Engineering, at the 10th International WWW Conference, Hong Kong, pp 1–11
24 Olsina L, Rossi G (2002) Measuring Web Application Quality with WebQEM. IEEE Multimedia, 9(4):20–29
25 Olsina L, Martín M (2004) Ontology for Software Metrics and Indicators. Journal of Web Engineering, 2(4):262–281
26 Powell TA (1998) Web Site Engineering: Beyond Web Page Design. Prentice Hall
27 WWW Consortium, Web Content Accessibility Guidelines 1.0, http://www.w3.org/TR/WAI-WEBCONTENT/ (accessed on 10th November 2004)
Authors’ Biographies
Luis Olsina is an Associate Professor in the Engineering School at National University of La Pampa, Argentina, and heads the Software and Web Engineering
R&D group (GIDISWeb). His research interests include Web engineering, particularly Web metrics and indicators, quantitative evaluation methods, and ontologies for the measurement and evaluation domain. He authored the WebQEM
methodology. He earned a PhD in software engineering and an MSE from National University of La Plata, Argentina. In the last seven years, he has published
over 50 refereed papers, and participated in numerous regional and international
events both as programme committee chair and member. He is an IEEE Computer
Society member.
Guillermo Covella is an Assistant Professor in the Engineering School at National University of La Pampa, Argentina. He is currently an MSE student in the Informatics School at National University of La Plata, developing his thesis on quality in use evaluation of Web applications. His primary research interests are Web quality and quality in use, specifically in the field of e-learning.
Gustavo Rossi is a Full Professor at Universidad Nacional de La Plata, Argentina,
and heads LIFIA, a computer science research lab in the College of Informatics. His
research interests include context-awareness and Web design patterns and frameworks. He coauthored the Object-Oriented Hypermedia Design Method (OOHDM)
and is currently working on the application of design patterns in context-aware
software. He earned a PhD in Computer Science from Catholic University of Rio de
Janeiro (PUC-Rio), Brazil. He is an ACM member and IEEE member.
5 Web Usability: Principles and Evaluation Methods
Maristella Matera, Francesca Rizzo, Giovanni Toffetti Carughi
Abstract: Current Web applications are very complex and highly sophisticated software products, whose usability can greatly determine their success or failure. Defining methods for ensuring usability is one of the current goals of Web engineering research. Also, much attention is currently
paid to usability by industry, recognising the importance of adopting
methods for usability evaluation before and after application deployment.
This chapter introduces principles and evaluation methods to be adopted
during the whole application lifecycle for promoting usability. For each
evaluation method, the main features, as well as the emerging advantages
and drawbacks, are illustrated so as to support the choice of an evaluation
plan that best fits the goals to be pursued and the available resources. The
design and evaluation of a real application is also described for exemplifying the concepts and methods introduced.
Keywords: Web usability, Evaluation methods, Web usability principles,
Development process.
5.1 Introduction
The World Wide Web has had a significant impact on access to the large
quantity of information available through the Internet. Web-based applications have influenced several domains, by providing access to information
and services to a variety of users with different characteristics and backgrounds. Users visit Web applications, and return to previously accessed
applications, if they can easily find useful information, organised in a way
that facilitates access and navigation, and presented according to a wellstructured layout. In other words, the acceptability of Web applications by
users relies strictly on the applications’ usability.
Usability is one relevant factor of a Web application’s quality. Recently,
it has received great attention, and been recognised as a fundamental property for the success of Web applications. Defining methods for ensuring
usability is therefore one of the current goals of Web engineering research.
Also, much attention is currently paid to usability by industry, which is
recognising the importance of adopting usability methods during the development process, to verify the usability of Web applications before and
after their deployment. Some studies have demonstrated how the use of
such methods reduces costs, with a high cost-benefit ratio, as they reduce
the need for changes after the application is delivered [40,50].
5.1.1 Usability in the Software Lifecycle
Traditional software engineering processes do not explicitly address usability within their lifecycles. They suggest different activities, from the initial inception of an idea until product deployment, with testing conducted at the end of the cycle to check whether the application design satisfies the high-level requirements agreed with the customer, and whether it is complete and internally consistent. To achieve usable applications, it is necessary to extend the standard lifecycle to explicitly address usability issues. This objective does not imply simply adding some activities; rather, it requires appropriate techniques which span the entire lifecycle [20].
Given the emergent need for usability, traditional development processes
were extended to enable the fulfilment of usability requirements. Evaluation methods have been adopted at all stages within the process, to verify
the usability of incremental design artefacts, as well as of the final product.
This has resulted in the proposal of the so-called iterative design [58,16] for
promoting usability throughout the entire development lifecycle.
With respect to more traditional approaches, which suggest the use of a
top-down method (such as the waterfall model), iterative
design prescribes that the development process be complemented by a
bottom-up, synthetic approach, in which the requirements, the design, and
the product gradually evolve to become well defined. The essence of iterative design is that the only way to be sure about the effectiveness of design decisions is by building and evaluating application prototypes. The
design can then be modified, to correct any false assumptions detected
during the evaluation activities, or to accommodate new requirements; the
cycle represented by design, evaluation, and redesign must be repeated as
often as necessary.
In this context, usability evaluation is interpreted as an extension of testing, carried out through the use of prototypes with the aim of verifying the
application design against usability requirements. Evaluation is central to
this model: it is relevant at all the stages in the lifecycle, not just at the end
of the product development. All aspects of the application development are
in fact subject to constant evaluation, involving expert evaluators and users.
Iterative development is consistent with the real nature of design. It emphasises the role of prototyping and evaluation, the discovery of new requirements, and the importance of involving diverse stakeholders –
including users.
What makes iterative development more than merely well-intentioned trial and error? Usability engineering became the banner under which diverse methodological endeavours were carried out throughout the 1990s:
• It proposes that iterative development is managed according to explicit and measurable objectives, called "usability specifications", which must be identified early in the development process. Explicit usability goals are therefore incorporated within the design process, emphasising that the least expensive way of obtaining usable products is to consider usability issues early in the lifecycle, reducing the need to modify the design at the end of the process [44,45].
• It suggests the use of "simple usability engineering", which adopts easy-to-apply and efficient evaluation techniques, encouraging developers to consider usability issues throughout the whole development cycle [47].
5.1.2 Chapter Organisation
The aim of this chapter is to illustrate usability principles and evaluation
methods that, in the context of an iterative design process, can support the
production of usable Web applications. After introducing the general concept of usability and its specialisation for the Web, we present usability
criteria that support Web usability in two ways: first, they can guide the
design process, providing guidelines on how to organise the application by
means of usable solutions; second, they drive the evaluation process, providing benchmarks for usability assessment. We will then present evaluation methods to be tackled during the entire development process – both
during design and after application deployment based on the intervention
of usability specialists, or involvement of real users.
In order to exemplify the concepts introduced, we discuss several important usability issues during the design and evaluation of a real Web
application, developed for the Department of Electronics and Information
(DEI) at Politecnico di Milano (http://www.elet.polimi.it). The DEI application is a very large, data-intensive application, consisting of:
• A public area, publishing information about the Department staff, and their teaching and research activities. It receives about 9000 page requests per day from external users.
• An intranet area, supporting some administrative tasks available to 300 DEI members.
• A content management area, which provides Web administrators with an easy-to-use user interface front-end for creating or updating content to be published via the Web application.
5.2 Defining Web Usability
Usability is generally taken as a software quality factor that aims to provide the answer to many frustrating problems caused by the interaction
between people and technology. It describes the quality of products and systems from the point of view of their users.
Different definitions of usability have been proposed, which vary according to the models they are based on. Part 11 of the international standard ISO 9241 (Ergonomic Requirements for Office Work with Visual Display Terminals) provides guidance on usability, introducing requirements
and recommendations to be used during application design and evaluation
[29]. The standard defines usability as “the extent to which a product can
be used by specified users to achieve specified goals with effectiveness,
efficiency and satisfaction in a specified context of use”. In this definition,
effectiveness means “the accuracy and completeness with which users
achieve specified goals”, efficiency is “the resources expended in relation
to the accuracy and completeness with which users achieve goals”, and
satisfaction is described as “the comfort and acceptability of use”. Usability problems therefore refer to aspects that make the application ineffective, inefficient, and difficult to learn and use.
Although the ISO 9241-11 recommendations have become the standard
for the usability specialists’ community, the usability definition most
widely adopted is the one introduced by Nielsen [45]. It provides a detailed
model in terms of usability constituents that are suitable to be objectively
and empirically verified through different evaluation methods. According
to Nielsen’s definition, usability refers to:
• Learnability: the ease of learning the functionality and behaviour of the system.
• Efficiency: the level of attainable productivity, once the user has learned the system.
• Memorability: the ease of remembering the system functionality, so that the casual user can return to the system after a period of non-use, without needing to learn again how to use it.
• Few errors: the capability of the system to feature a low error rate, to support users making few errors during the use of the system, and, in case they do make errors, to help them recover easily.
• Users' satisfaction: the measure in which the user finds the system pleasant to use.
The previous principles can be further specialised and decomposed into
finer-grained criteria that can be verified through different evaluation methods. The resulting advantage is that more precise and measurable criteria
contribute towards setting an engineering discipline, where usability is not
just argued, but systematically approached, evaluated, and improved
[44,45].
When applying usability to Web applications, refinements need to be
applied to the general definitions, to capture the specificity of this application class. Main tasks for the Web include: finding desired information and
services by direct search, or the discovery of others by browsing; comprehending the information presented; invoking and executing services specific to certain Web applications, such as the ordering and downloading of
products. Paraphrasing the ISO definition, Web usability can therefore be
considered as the ability of Web applications to support such tasks with
effectiveness, efficiency, and satisfaction. Also, Nielsen’s usability principles mentioned above can be interpreted as follows [48]:
• Web application learnability must be interpreted as the ease for Web users to understand the contents and services made available through the application, and how to look for specific information using the available links for hypertext browsing. Learnability also means that each page in the hypertext front-end should be composed in a way such that its contents are easy to understand and navigational mechanisms are easy to identify.
• Web application efficiency means that any content can be easily reached by users through available links. Also, when users get to a page, they must be able to orient themselves and understand the meaning of this page with respect to the starting point of their navigation.
• Memorability implies that, after a period of non-use, users are still able to orient themselves within the hypertext; for example, by means of navigation bars pointing to landmark pages.
• Few errors means that when users erroneously follow a link, they are able to return to their previous location.
• Users' satisfaction refers to the situation in which users feel they are in control with respect to the hypertext, since they comprehend the available content and navigational commands.
In order to be evaluated, the previous criteria can be further refined into
more objective and measurable criteria. Section 5.3 will introduce a set of
operational criteria for Web application design and evaluation.
5.2.1 Usability and Accessibility
Recently, the concept of usability has been extended to include accessibility. Accessibility focuses on application features that support universal
access by any class of users and technology [59]. In particular, accessibility
focuses on properties of the mark-up code that make page contents "readable" by technologies assisting impaired users. Some literature gives accessibility a broader meaning: that is, the ability of an application to support any user in identifying, retrieving, and navigating its contents [26,63]. In
fact, accessible Web applications are advantageous to any users, especially
in specific contexts of use, such as adopting voice-based devices (e.g. cellular phones) while driving. According to this meaning, accessibility can
therefore be considered a particular facet of Web usability.
The W3C Web Accessibility Initiative (WAI) acts as the central point
for setting accessibility guidelines for the Web. Its work concentrates on
the production of the Web Content Accessibility Guidelines (WCAG 1.0) [72],
which focus on two main goals:
• Producing contents that must be perceivable and operable: this implies using a simple and clear language, as well as defining navigation and orientation mechanisms for supporting content access and browsing.
• Ensuring access alternatives: this means that pages must be designed and coded so they can be accessed independently from the adopted browsing technologies and devices, and from the usage environment.
The first goal is strictly related to the definition of Web usability; it can
be pursued by focusing on usability criteria that enhance the effectiveness
and efficiency of navigation and orientation mechanisms. The second goal
can be achieved via the page mark-up, and in particular:
• Separating presentation from content and navigation design, which enables an application to present the same content and navigational commands according to multiple presentation modalities, suitable for different devices.
• Augmenting multimedia content with textual descriptions, so it can be presented through alternative browsing technologies, such as screen readers for assisting impaired users.
• Creating documents that can be accessed by different types of hardware devices. For example, it should be possible to interact with page contents even through voice devices, small-size devices, or black and white screens, and when pointing devices are not available.
WCAG recommendations provide 14 guidelines, each specifying how it
can be applied within a specific context. For further details the reader is
referred to [72].
5.3 Web Usability Criteria
According to the usability engineering approach, a cost-effective way to
increase usability is for it to be addressed from the early phases of an application’s development. A solution for achieving this goal is to take into
account criteria that refine general usability principles (such as those presented in Sect. 5.2), suggesting how the application must be organised to
conform to usability requirements [45]. Such criteria drive the design activity, providing guidelines on how to restrict the space of design alternatives, thus preventing designers from adopting solutions that can lead to
unusable applications [20]. In addition, they constitute the background for
the evaluation activity.
The development of Web applications, according to several methods recently introduced in Web engineering [5,14,57], must focus on three separate dimensions: data, hypertext, and presentation design - each being accompanied by a set of criteria. Criteria so far proposed for the design of user
interfaces [28,45,53], as well as the W3C-WCAG guidelines for accessibility, work well for organising the presentation layer of Web applications
[39,49]. Table 5.1 summarises the ten “golden rules” proposed by Nielsen
in 1993 for the design and evaluation of interactive systems.
More specific criteria are, however, needed for addressing the specific
requirements, conventions and constraints characteristic of the design of
content and hypertext links in Web applications. This section therefore
proposes a set of criteria that suggest how Web applications should be
organised, at the data and hypertext level, supporting information finding,
browsing, and user orientation. These represent the three fundamental
aspects we believe have the greatest impact on usability of Web applications. The criteria have been defined in the context of a model-driven
design method [14,15]; as such, they take advantage of adopting a few
high-level conceptual abstractions for systematically planning the overall
structure of the application, avoiding implementation details and mark-up
coding of single pages. Our method focuses on the broad organisation of
the information content and the hypertext structure (“in-the-large”). In
particular, the criteria are based on the assumption that the retrieval and
fruition of content by end users is significantly affected by the way in
which the content itself is conceived, designed, and later delivered by the
hypertext interface. This assumption is also supported by recommendations coming from the fields of human-computer interaction and human factors studies [41,69,70].
Table 5.1. Nielsen's ten heuristics for user interface design and evaluation (http://www.useit.com/papers/heuristic/heuristic_list.html)

1. Visibility of system status: The system should always keep users informed about what is going on, through appropriate feedback within reasonable time.
2. Match between system and the real world: The system should speak the users' language, with words, phrases, and concepts familiar to the user, rather than system-oriented terms. Follow real-world conventions, making information appear in a natural and logical order.
3. User control and freedom: Users often choose system functions by mistake and will need a clearly marked "emergency exit" to leave the unwanted state without having to go through an extended dialogue. Support undo and redo.
4. Consistency and standards: Users should not have to wonder whether different words, situations, or actions mean the same thing. Follow platform conventions.
5. Error prevention: Even better than good error messages is a careful design which prevents a problem from occurring in the first place.
6. Recognition rather than recall: Make objects, actions, and options visible. The user should not have to remember information from one part of the dialogue to another. Instructions for use of the system should be visible or easily retrievable whenever appropriate.
7. Flexibility and efficiency of use: Accelerators, unseen by the novice user, may often speed up the interaction for the expert user such that the system can cater to both inexperienced and experienced users. Allow users to tailor frequent actions.
8. Aesthetic and minimalist design: Dialogues should not contain information which is irrelevant or rarely needed. Every extra unit of information in a dialogue competes with the relevant units of information and diminishes their relative visibility.
9. Help users recognise, diagnose, and recover from errors: Error messages should be expressed in plain language (no codes), precisely indicate the problem, and constructively suggest a solution.
10. Help and documentation: Even though it is better if the system can be used without documentation, it may be necessary to provide help and documentation. Any such information should be easy to search, focused on the user's task, list concrete steps to be carried out, and not be too large.
The usability of Web applications thus requires the complete understanding and accurate modelling of data resources. As such, and differently
from previous proposals [25,48,49], our criteria are organised as general
principles later expanded into two sets of more practical guidelines, one
suggesting how to structure content, and another proposing the definition
of usable navigation and orientation mechanisms for content access and
browsing.
5.3.1 Content Visibility
In order to understand the structure of the information offered by the application, and become oriented within hypertext, users must be able to
easily identify the main conceptual classes of contents.
Identification of Core Information Concepts
Content visibility can be supported by an appropriate content design,
where the main classes of content are identified and adequately structured.
To fulfil this requirement, the application design should start from the
identification of the information entities modelling the core concepts of the
application, which act as the application backbones, representing the best
answer to users’ information requirements [15]. Data design will be centred on such content, and will gradually evolve by detailing its structure in
terms of elementary components, and further add access and browsing
content.
Hypertext Modularity
The hypertext must be designed to support users to perceive where core
concepts are located. To this end:
• The hypertext can be organised in areas, i.e. modularisation constructs grouping pages that publish homogeneous contents. Each one should refer to a given core concept identified at a data level.
• Areas must be defined as global landmarks accessible through links, grouped in global navigation bars that are displayed in any page of the application interface.
• Within each area, the most representative pages (e.g. the area entry page, search pages, or any other page from which users can invoke relevant operations) can be defined as local landmarks, reachable through local navigation bars displayed in any page within the area. These links supply users with cornerstones to enhance their orientation within the area.
The regular use of hierarchical landmarks within pages enhances learnability and memorability: landmarks indeed provide intuitive mechanisms
for highlighting the available content and the location within the hypertext
where they are placed. Once learned, they also support orientation and
error recovery, as they are available throughout the application as the simplest mechanism for context change.
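As an in-the-large illustration of areas and landmarks, the following sketch models them as plain data structures; the representation and names are hypothetical, loosely mirroring the DEI example discussed next, and are not part of any cited design method.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Page:
    name: str
    local_landmark: bool = False   # listed in the area's local navigation bar

@dataclass
class Area:                        # an area is a global landmark
    name: str
    pages: List[Page] = field(default_factory=list)

    def local_navigation_bar(self) -> List[str]:
        return [p.name for p in self.pages if p.local_landmark]

areas = [
    Area("Research"), Area("Teaching"), Area("Industry"),
    Area("People", pages=[Page("People home", local_landmark=True),
                          Page("Search people", local_landmark=True),
                          Page("DEI Member")]),
]

# The global navigation bar, displayed on every page, links to each area:
print([a.name for a in areas])
# The local navigation bar shown on every page of the People area:
print(areas[3].local_navigation_bar())
```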
Content Visibility in the DEI Application
In the DEI application, the core concepts of the public module are the research areas, the teaching activities, the industrial projects, and the DEI
members. In accordance with this organisation of information content, the
hypertext of the Web application is organised into four areas, Research,
Teaching, Industry, and People, each corresponding to a single core concept (see Fig. 5.1).
[Figure 5.1 callouts indicate: the global navigation bar, providing links to the main application areas; the local navigation bar, providing links to landmark pages within the People area; a link for traversing a semantic interconnection; and links for browsing the DEI Member subschema.]
Fig. 5.1. Page organisation, with global and local landmarks, and core, peripheral,
and interconnection sections
Each page within the application includes a global navigation bar,
grouping links to the four application areas. Also, each page contains a
local navigation bar that groups links to local landmarks. Figure 5.1
shows a page from the People area, which displays information about a
DEI member. The global navigation bar is placed in the top region of the
page. The bar also includes a link to the non-public intranet and to the
Home Page. Landmarks defined locally for the People area are placed in
the top region of the left-side bar.
5.3.2 Ease of Content Access
Once users have identified the application’s main content classes, they
must be provided with “facilities” for accessing the specific content items
they are interested in.
Identification of Access Information Concepts
The design of access paths for retrieving core content items can be facilitated if designers augment the application content with access concepts,
which correspond to classification criteria or context over core concepts.
These enable users to move progressively from broader to narrower categories, until they locate the specific core concept of interest [49]. In general, multiple and orthogonal hierarchies of access concepts should be
related to every core concept.
Navigational Access and Search-Based Access
In order to facilitate access to specific instances of core concepts, access
concepts, defined at data level, should be used to construct navigational
access mechanisms. These typically consist of multi-level indexes, possibly distributed on several access pages, bridging pages with a high visibility (e.g. the Home Page or the entry page of each area), to pages devoted to
the publication of core concepts.
Especially in large Web applications, navigational access is often complemented with direct access, i.e. keyword-based search mechanisms,
which allow users to avoid navigation and to rapidly reach the desired
information objects. Direct access mechanisms are essential for interfaces
(such as those of mobile devices) that are not able to support multiple
navigation steps. In traditional hypertext interfaces, they enhance orientation when users “get lost” while moving along navigational access mechanisms [60,49].
Pairing navigational and direct access with explicit visibility over available categorisations and free text queries, in addition to a regular use of
these access mechanisms within the hypertext, can greatly enhance content
accessibility.
Content Access in the DEI Application
In the DEI application, each area is provided with navigational and direct
access. Figure 5.2 shows the contextual access path defined for the core
concept DEI Member. It consists of a hierarchy of indexes, developed
through different access pages, which let users move from broader People
categories, presented in the application’s Home Page (e.g. academic
staff), to pages listing narrower sub-categories (e.g. all the categories of
the academic staff). In addition, users can move to the list of members in
a selected sub-category, from which they can select a person’s name and
access her/his corresponding page. Each page also provides direct access,
by means of a keyword-based search, for directly reaching single DEI
members.
Fig. 5.2. Hierarchy of indexes in the navigational access to the DEI Member,
consisting of the Home Page (a), the Member Categories page (b), and the Category Member Index page (c). Pages (b) and (c) also include a keyword-based
search for direct access
5.3.3 Ease of Content Browsing
Users must be able to easily identify possible auxiliary content related to
each single core concept, as well as the available interconnections between
different core concepts.
Core Concepts’ Structuring and Interconnection
The ease of use and learnability of a Web application can be enhanced by
supporting users’ understanding of the content structure and the semantic
interconnections defined between different content classes. Therefore,
when a core concept is structured and complex, it is recommended that it be expanded, using a top-down design, into a composite data structure. Such a structure collectively represents the entire
core concept, and is characterised by:
• A central information content, which expresses the concept's main content and provides the means to individually identify each core concept.
• Some other peripheral information elements, which complete the concept's description.
Semantic interconnections among core concepts must be established for
producing a knowledge network through which users can easily move, and
explore the information content [41]. If defined, interconnections allow
users to comprehend a Web application’s structure and how to navigate
through it efficiently.
Organisation of Core Pages
In order to highlight the structure of each core concept, and the interconnections between different concepts, pages devoted to core concept presentation should contain at least three sections:
• A core section that clearly conveys the content associated with the core concept.
• A peripheral section that highlights auxiliary information – if any – completing the core concept.
• An interconnection section that represents links to other pages within the area, or to the core contents of other areas.
The definition of the internal structure of pages by means of these three
sections facilitates the understanding of the information in the page. If
systematically repeated through the application, it enhances consistency
among the components displayed by pages [49]. In addition, it is perceived
as a key component in helping users understand the application's hypertextual structure, and in supporting a conscious shift of focus by users. Finally, if
the structure is explicitly annotated on the page mark-up code, it can be
used to build intelligent page readers, and thus enable accessibility to any
users.
Content Browsing in the DEI Application
As an example of the organisation of content browsing in the DEI application, let us consider the DEI Member page (see Fig. 5.1). The page features
as a core section a central region that presents the attributes qualifying a
DEI Member (e-mail address, postal address, department section, biography, list of personal pages). The page then includes two link groups:
• The first refers to the page's peripheral section, and points to pages that contain further details about each DEI member (e.g. the list of publications and courses).
• The second represents the page's interconnection section, thus enabling a semantic interconnection, and points to the research area the member belongs to.
Note that such page structure also applies to pages in other application
areas.
5.4 Evaluation Methods
Applying principles for the design of usable applications is not sufficient
to ensure good usability of the final product. Even though systematic design techniques can be used, it is still necessary to check the intermediate
results, and to test the final application to verify if it actually shows the
expected features, and meets the user requirements. The role of evaluation
is to help verify such issues.
The three main goals of an evaluation are, first, to assess the application’s functionality; second, to verify the effect of the application’s interface on the user; third, to identify any specific problems with the application, such as aspects which show unexpected effects when used within the
intended context [20]. In relation to Web applications, an evaluation
should verify if the application design allows users to easily retrieve and
browse content, and to invoke available services and operations. Therefore, evaluation implies not only having the appropriate content and services available, but also making them easily reachable by users by means of adequate hypertext structures.
Depending on the phase in which an evaluation is performed, it is possible to distinguish between formative evaluation, which takes place during
the design stage, and summative evaluation, which takes place after the
product has been developed, or when a prototype is ready. During the early
design stages, the goal of a formative evaluation is to provide feedback during the design activities by checking the design team’s understanding of the
users’ requirements, and by testing design choices quickly and informally.
Later, a summative evaluation can be used to identify users’ difficulties
using the application, and help improve the final product or prototype.
Within these two broad categories, there are different methods that can
be used at different stages of the development cycle of an application. The
most commonly adopted methods are user testing, in which real users
participate, and usability inspection, which is conducted by specialists.
Recently, Web usage analysis has also emerged as a method for studying
user behaviour through the computation of access statistics, and the reconstruction of user navigation on the basis of Web access logs.
The remainder of this section illustrates the main features of these three
classes of evaluation methods, and also highlights their advantages and
drawbacks.
5.4.1 User Testing
User testing aims to investigate real users’ behaviour, observed through a representative sample of users [46]. It requires users to perform a set
of tasks using physical artefacts, which can be either prototypes or finished
applications, while an investigator observes their behaviour and gathers
data about the way users execute assigned tasks [20,55,68]. In general, the data gathered during such investigations are users’ execution times, numbers of errors, and satisfaction ratings. After the user test is complete, the collected
data are analysed and used to improve the application’s usability.
Usability testing is explicitly devoted to analysing in detail how users interact with the application while accomplishing well-defined tasks. This characteristic differentiates usability testing from beta testing, which is widely applied in industry. Beta testing is always carried out on the final product: after an application’s release, end users are asked about their satisfaction with the product. Conversely, usability testing is conducted by observing a sample of users who perform specific tasks while interacting with the application. The test is usually video recorded. The list of detected problems is reported, in addition to specific redesign suggestions.
To avoid unreliable and biased results, a user test and its execution should be carefully planned and managed. A good usability test should involve the following steps:
1. Define the goals of the test. The objective of the evaluation can be generic (e.g. to improve end users’ satisfaction with a product and its design); or it can be specific (e.g. to evaluate the effectiveness of a navigational bar for user orientation, or the readability of labels).
2. Define the user sample to participate in the test. The user sample should be representative of the population of end users that will use the application or prototype under scrutiny. Failing to do so will provide results that cannot be generalised to the population of real users. Possible criteria for defining the sample are: users’ experience (experts vs. novices), age, frequency of use of the application, and experience with similar applications. The number of participants can vary, depending on the objectives of the test. Nielsen and Molich [52] assert that 50% of the most important usability problems can be identified with three users. Other authors claim that five users enable the discovery of 90% of usability problems [47,64]. Note that the use of very small samples is not suggested by the literature on empirical investigations; thus, within the context of this book, these figures are simply informative.
3. Select tasks and scenarios. The tasks to be carried out during the test
have to be real, i.e. they have to represent the activities people would
normally perform with the application. Task scenarios can be obtained
from the requirements phase. In addition, tasks can also be intentionally prepared to test unexpected situations.
4. Define how to measure usability. Before conducting a usability test, it is important to define the attributes that will be used to measure the results. Such attributes, or measures, can be qualitative, i.e. subjective (e.g. user satisfaction, or difficulty of use), or quantitative (e.g. task completion time, number and typology of errors, number of successfully accomplished tasks, the number of times users invoke help (verbal, on-line help, manual)); a sketch of how quantitative measures can be computed is given after this list. Users’ anonymity should be guaranteed, and participants should also be provided with the test results. Besides observing, an investigator can also use other techniques for gathering data on task execution. Examples of such techniques are: the think-aloud protocol, in which a subject is required to talk out loud while executing tasks, explaining the actions (s)he is taking, the reasons behind them, and her/his expectations; the co-discovery (or collaborative) approach, in which two participants execute the tasks together, helping each other; and active intervention, in which the investigator asks participants to reflect upon the events of the test session. It is worth noting that such techniques do not provide ways of measuring users’ satisfaction. Such subjective measurement can instead be obtained through survey techniques, based on the use of questionnaires and interviews [35,58], to be answered by users after the completion of testing.
5. Prepare the material and the experimental environment. The experimental environment should be organised and equipped with a computer and a video camera for recording user activities. In addition, it is
also important to establish the roles of the investigative team members,
and prepare any supporting material (e.g. manuals, pencils, paper).
Prior to running the test, a pilot trial is necessary to check, and possibly
refine, all test procedures. Note that it is not mandatory to execute the
test in a laboratory.
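As an illustration of step 4, the sketch below computes three common quantitative measures, mean task completion time, success rate and total error count, from test records whose layout (user, task, seconds, errors, completed) is invented for the example:

import statistics

# Hypothetical test records: (user, task, seconds, errors, completed).
sessions = [
    ("u1", "find-member", 42.0, 1, True),
    ("u2", "find-member", 95.5, 3, False),
    ("u3", "find-member", 51.2, 0, True),
]

times = [s[2] for s in sessions]
completed = [s for s in sessions if s[4]]

print("mean completion time: %.1f s" % statistics.mean(times))
print("success rate: %.0f%%" % (100 * len(completed) / len(sessions)))
print("total errors:", sum(s[3] for s in sessions))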
5.4.2 Inspection Methods
User testing is considered the most effective way of assessing the use of
products and prototypes, from a real user’s point of view. However, user
testing is an expensive activity. In addition, to be useful, feedback needs to
be obtained at earlier stages in the development process, and repeated
throughout the process. Such constraints have led to the proposal of usability inspection methods, to be used by developers to predict usability problems that could be detected through user testing.
Usability inspection refers to a set of evaluation techniques that have
evolved from inspection methods, used in software engineering, to debug
and improve code. Within the context of usability, inspectors examine
usability-related aspects of an application, to detect violations of established usability principles [51], and to provide feedback to designers about
necessary design improvements. Such inspectors can be usability specialists, designers, and engineers with special expertise (e.g. knowledge of
specific domains or standards). To be effective, inspection methods rely
upon a good understanding of usability principles, how these principles
affect the specific application being analysed, and the skills of the inspector to discover problems where the main violations occur.
Usability inspection methods were proposed as a cost-effective alternative to traditional usability evaluation [8]. The cost of user test studies and
laboratory experiments became a central issue, and therefore many usability evaluation techniques were proposed, based on the involvement of specialists to supplement or even replace direct user testing [51,52].
Different methods can be used for inspecting an application [51]. The
most commonly used method is heuristic evaluation [45,51], in which
usability specialists judge if an application’s properties conform to established usability principles. Another method is cognitive walkthrough
[54,67], which uses detailed procedures for simulating users’ problem-solving processes, to assess if the functions provided by the application
are efficient for users, and can lead them to correct actions. The remainder of this section describes these two methods in more depth.
A detailed description of other inspection techniques is provided in [51].
Heuristic Evaluation
Heuristic evaluation is the most informal of the inspection methods. It prescribes that a small set of experts analyse the application against a list of recognised usability principles − the heuristics. This technique is part of the so-called discount usability methods. In fact, research has shown that it is a very efficient usability engineering method [32], with a high cost–benefit ratio [47].
During the evaluation session, each evaluator goes through the system
interface at least twice. The first step is to obtain an overall understanding
of the flow of interaction and the general scope of the application. The
second step focuses on specific objects and functionality, evaluating their
design and implementation against a list of heuristics. The output of a heuristic evaluation session is a list of usability problems with reference to the
violated heuristics (see Table 5.2 for an example). The reporting of problems caused by the violation of heuristics enables an easy generation of a
revised design. The revised design is prepared in accordance with what is
prescribed by the guidelines underlying the violated principles. Once the
evaluation has been completed, the findings of the different evaluators are
compared and aggregated.
Table 5.2. An example of a table for reporting heuristic violations

  Found problem:          Download time is not indicated
  Violated heuristic:     Feedback
  Severity:               High
  Suggested improvement:  Use a progress bar to represent the time left until the download completes
Heuristic evaluations are especially valuable when time and resources
are short, given that skilled evaluators can produce high-quality results in a
limited amount of time, without the need for real users’ involvement [34].
In principle, heuristic evaluation can be conducted by a single evaluator.
However, in an analysis of six studies, it has been found that single evaluators are able to find only 35% of the total number of existing usability problems [43], and that different evaluators tend to find different problems.
Therefore, it seems that the more experts involved in the evaluation, the
greater the number of different problems that can be identified. Figure 5.3
shows the percentage of usability problems found by number of evaluators,
as reflected by a mathematical model defined in [50]. The curve suggests
that five evaluators may be able to identify close to 75% of usability
problems; however, such results should be interpreted with caution, since
they are reliant on the data from which they were obtained.
Fig. 5.3. The percentage of usability problems found by heuristic evaluation when
using different numbers of evaluators [50]
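The model defined in [50] expresses the expected proportion of problems found by i evaluators as 1 − (1 − λ)^i, where λ is the probability that a single evaluator finds a given problem. The value of λ varies across studies; the sketch below uses λ = 0.24 purely for illustration, since it reproduces the figure of roughly 75% at five evaluators read from the curve:

LAMBDA = 0.24  # illustrative per-evaluator detection probability; see [50]

def fraction_found(i, lam=LAMBDA):
    """Expected fraction of usability problems found by i evaluators."""
    return 1.0 - (1.0 - lam) ** i

for i in (1, 3, 5, 10, 15):
    print("%2d evaluators -> %.0f%% of problems" % (i, 100 * fraction_found(i)))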
Heuristic evaluations can have a number of drawbacks, with the major
one being a high dependence on the skills and experience of the evaluators
[21,33,34]. Nielsen states that novice evaluators with no usability expertise
are poor evaluators, that usability experts are 1.8 times as good, and that
application domain and usability experts are 2.7 times as good [44,45].
These results suggest that specific experience with a specific category of
applications may significantly improve evaluators’ performance.
Cognitive Walkthrough
A cognitive walkthrough simulates the user’s problem-solving process, i.e.
what the user will do in specific situations of use and why [54]. Evaluators
go through the interface, step by step, using a task scenario, and discuss
the usability issues as they arise. In particular, the method guides evaluators in the analysis of the actions that users would perform to reach the
objectives defined in the scenario, by means of the identification of the
relationship between user goals, actions, and the visible states of the application interface [27]. As such, cognitive walkthrough is particularly suited
for the detection of problems affecting an application’s learnability.
Cognitive walkthrough is a technique largely applied to evaluating aspects of an application’s interface. Its use is recommended in the advanced
phases of Web application development, to evaluate high-fidelity prototypes for which the interaction functionalities already work. The typical
cognitive walkthrough procedure prescribes that, on the basis of selected
scenarios of use, a series of tasks are chosen to be performed by an expert
evaluator on the interface. The evaluator executes such tasks, and after the
completion of each elementary action (s)he interprets the application’s
answer, and evaluates the steps forward for the achievement of the end
user’s goal, by answering the following standard questions:
1. Are the feasible and correct actions sufficiently evident to the user, and
do the actions match her/his intention?
2. Will the user associate the correct action’s description with what (s)he
is trying to do?
3. Will the user receive feedback in the same place where (s)he has performed her/his action and in the same modality?
4. Does the user interpret the system’s response correctly: does (s)he
know if (s)he has made a right or wrong choice?
5. Does the user properly evaluate the results: is (s)he able to assess if
(s)he got closer to her/his goal?
6. Does the user understand if the intention (s)he is trying to fulfil cannot
be accomplished with the current state of the world: does (s)he find alternative goals?
During this interpretation process, it is also possible that the user/evaluator needs to change her/his initial goal because it is impossible to achieve. Each negative answer to the previous questions adds an entry to the list of detected problems. At the end of the evaluation session, the list of problems is complemented with indications of possible design amendments, and communicated back to the design team.
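As an illustration, a walkthrough session can be logged as one answer per standard question and per elementary action, with negative answers accumulating into the problem list. A minimal sketch, with abbreviated question texts and invented session data:

QUESTIONS = [
    "Correct action evident?",
    "Action description matches intention?",
    "Feedback in same place and modality?",
    "System response interpreted correctly?",
    "Progress toward goal assessable?",
    "Unreachable intentions recognised?",
]

def collect_problems(walkthrough):
    """walkthrough: list of (action, answers), one boolean per question."""
    problems = []
    for action, answers in walkthrough:
        for question, ok in zip(QUESTIONS, answers):
            if not ok:
                problems.append((action, question))
    return problems

session = [
    ("select a member category from the index",
     [True, False, True, True, True, True]),
]
for action, question in collect_problems(session):
    print("problem at '%s': %s" % (action, question))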
5.4.3 Web Usage Analysis
A recent direction in the evaluation of Web applications is called Web
usage analysis [30]. It is performed using the recorded users’ access to the
application’s Web pages, stored in a Web server log [61], according to one
of the available standard formats [71]. This technique can only be used
once a Web application is deployed, and can be used to analyse how users
exploit and browse the information provided by the application. For instance, it can help discover navigation patterns that correspond to high
Web usage, or those that correspond to users abandoning the application early.
Very often, Web logs are analysed with the aim of calculating traffic
statistics. This type of analysis can help identify the most accessed pages
and content, and may therefore highlight user preferences. These preferences may not have been previously anticipated during the design stage,
and may therefore need to be incorporated by restructuring the application’s hypertext structure.
Traffic analysis is not able to detect users’ navigational behaviour. To
allow a deeper insight into users’ navigation paths, the research community has investigated techniques to reconstruct user navigation from log
files [17,19,22,23]. Most techniques are based on extensions of Web logging mechanisms that record additional semantic information about
the content presented in the pages accessed. This information can later be
used to understand observed frequent paths and corresponding pages [6].
Such extensions exploit Semantic Web techniques, such as RDF annotations for mapping URLs into a set of ontological entities. Also, recent
work [23,56] has proposed conceptual enrichment of Web logs through the
integration of information about a page’s content and the hypertext structure deriving from the application’s conceptual specification. The reconstruction of a user navigation can then be incorporated into automatic
tools, which provide designers and evaluators with statistics about identified navigation paths. Such paths can be useful to evaluate and improve an
application’s organisation with respect to its actual usage.
User navigation paths can also be analysed by means of Web usage mining techniques, which apply data mining techniques on Web logs to identify associations between visited pages and content [17,22]. With respect
to the simple reconstruction of user navigation, Web usage mining can
discover unexpected user behaviour, not foreseen by the application designers. The user behaviour can be a symptom of a poor design, rather than
a defect. The aim is to identify possible improvements that accommodate
such user needs.
Different techniques can be used to mine Web logs. Mining of association rules is probably the one used the most. Association rules [1] are implications of the form X → Y, stating that in a given session where the X log
element (e.g. a page) is found, the Y log element is also very likely to be
found. Methods for discovering association rules can also be extended to
the problem of discovering sequential patterns. These are extensions of
association rules to the case where the relation between rule items specifies
a temporal pattern. The sequential pattern of the form X.html → Y.html
states that users, who in a session visit page X.html, are also likely to next
visit page Y.html in the same session [62].
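Both notions can be sketched over lists of navigation sessions. In the minimal example below, with invented page names, a rule X → Y is retained when X and Y co-occur in a sufficient fraction of all sessions (support) and Y occurs in a sufficient fraction of the sessions containing X (confidence); the sequential variant additionally requires Y to be visited after X:

from itertools import permutations

sessions = [  # invented navigation sessions, each a list of visited pages
    ["home", "X.html", "Y.html"],
    ["home", "X.html", "Y.html", "Z.html"],
    ["home", "Z.html"],
]

def rules(sessions, min_support=0.5, min_confidence=0.8, sequential=False):
    pages = {p for s in sessions for p in s}
    n = len(sessions)
    for x, y in permutations(pages, 2):
        if sequential:  # y must occur after x within the session
            hits = [s for s in sessions if x in s and y in s[s.index(x) + 1:]]
        else:           # x and y must simply co-occur
            hits = [s for s in sessions if x in s and y in s]
        with_x = [s for s in sessions if x in s]
        if with_x and len(hits) / n >= min_support \
                and len(hits) / len(with_x) >= min_confidence:
            yield x, y, len(hits) / n, len(hits) / len(with_x)

for x, y, sup, conf in rules(sessions, sequential=True):
    print("%s -> %s (support %.2f, confidence %.2f)" % (x, y, sup, conf))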
The discovery of association rules and sequential patterns is interesting
from the Web usage perspective, because the results produced can provide
evidence of content or pages that are frequently associated. If such associations are not supported by proper navigational structures connecting the corresponding pages, then the results can suggest possible improvements to ease content browsing.
A drawback of Web usage mining techniques is that they require a substantial amount of pre-processing, needed to extract user navigation sessions containing consistent information, and to format the data in a way suitable for analysis [17,61]. In particular, user session identification can be very demanding, since requests for pages tracked in
Web logs may be compromised due to proxy servers, which do not allow
the unique identification of users [18]. Solutions to circumvent this problem are illustrated in [18].
Comparison of Methods
User testing provides reliable evaluations, because its results are based on
user samples representative of the population of real users. It helps evaluators overcome problems, such as lack of precision of predictive models
whenever the application domain is not supported by a strong and detailed
theory. User testing, however, has a number of drawbacks. The first is the difficulty of selecting a sample representative of the population of real users, since the identification of such a population is sometimes not straightforward. A sample that does not represent the correct population provides results unlikely to be of use. The second drawback is that it can be difficult to
train users, within a limited amount of time, to master advanced features of
a Web application. This can lead to shallow conclusions, in most cases related only to the simpler application features. The third drawback is that the
limited amount of time available for user tests makes it difficult to mimic
real usage scenarios. Such scenarios require the provision of a real environment where the application is to be used, and also the motivations and
the goals that users may have in real-life situations [37]. Failure to reproduce such a context may lead to unrealistic results and conclusions. Finally,
the fourth drawback is that user observation provides little information
about the cause of a problem, since it deals primarily with the symptoms
[21]. Not understanding the underlying cause has implications for an application’s redesign. In fact, the new design can remove the original symptoms, but if the underlying cause remains, a different symptom may result.
Unlike user testing, inspection methods enable the identification of the
underlying cause of a problem. Inspectors know exactly which part of the
design violates a usability principle, and how. The main advantage of inspection methods, compared to user testing, is that they can be carried out
with a smaller number of people, i.e., they are conducted by usability and
human factor experts, who can detect problems and possible future faults
of a complex system in a limited amount of time. This is in our view a
relevant point, which strongly supports the use of usability evaluations
during the design activities. In fact, it constitutes an inexpensive add-on to
existing development practices, easily enabling the integration of usability
goals into those of software design and development [21]. Furthermore, inspection techniques can be used early in the development lifecycle, working from design specifications if necessary, whenever a prototype is not yet available.
The three main disadvantages of inspection methods are, first, the great subjectivity of the evaluation − different inspectors may produce incomparable outcomes; second, the strong dependency upon the inspectors’ skills; and third, the risk that experts misjudge the reactions of real users in two ways, i.e. by not detecting potential problems, or by discovering problems that will not be relevant to real users.
According to Brooks [12], usability inspection methods cannot replace
user testing because they are not able to analyse aspects, such as trade-offs,
the entire interface acceptability, or the accuracy of a user’s mental model.
Also, they are not suitable for selecting the most usable interface out of several, or for anything that relates to a preference. On the other hand, usability testing cannot predict whether an interface will “just do the job” or will “delight the user”; this type of information is nevertheless important in the context of a competitive market. Therefore it may be beneficial also to consider features that can turn an interface from good to excellent, rather than to focus solely on its problems, which is what usability inspection does.
The analysis of Web server logs seems to solve a series of problems in
usability evaluation, as it may reduce the need for usability testing. Also,
with respect to the experimental settings, it offers the possibility of analysing the behaviour of a higher number of users, compared to user tests, increasing the number of attributes that can be measured, and the reliability
of the detected errors. However, the use of Web server log files is not
without problems of its own. The most severe relates to the meaning of the
information collected and how much it describes real users’ behaviour.
Even when logs are effective in finding patterns in the users’ navigation
sessions, they cannot be used to infer users’ goals and expectations, which are central to a usability evaluation.
5.5 Automatic Tools To Support Evaluations
Automatic tools have been suggested as the most efficient means of handling repetitive evaluation tasks, without requiring much time or skill from human resources. There are three main categories of Web evaluation tools
[11], which cover a large set of tests for usability and accessibility:
• Tools for accessibility analysis. Measures that can be automatically collected by these tools correspond to official accessibility criteria (such as those prescribed by the W3C), and refer to properties of the HTML page coding, such as browser compatibility, use of safe colours, appropriate colour contrast, etc. Examples are Bobby [10], A-Prompt [3], and LIFT [36].
• Tools for usability analysis. These tools verify usability guidelines by analysing an application’s design. They operate predominantly at the presentation layer, with the aim of discovering problems concerning the consistency of content presentation and navigation commands (e.g. link labels, colour consistency). They often neglect structural and navigation problems, although recent proposals (see for example [23]) plan to address such issues, by focusing on the identification of structural problems in the hypertext definition. Examples are CWW [9], WebTango [31], and WebCriteria SiteProfile [65].
• Tools for Web usage analysis. These tools allow the computation of statistics about an application’s activities, and mine data about user behaviour. The majority of commercial tools (see for example [2,4]) are traffic analysers. Their functionality is limited to producing the following reports and statistics [22]:
− Site traffic reports, such as total number of visits, average number
of hits, average view time.
− Diagnostic statistics, such as server errors and pages not found.
− Referrer statistics, such as search engines accessing the application.
− User statistics, such as top geographical regions.
− Client statistics, such as users Web browsers and operating systems.
Recent research has also proposed tools to analyse user navigation paths, and to mine Web usage [7,17,42].
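Several of the report categories above reduce to simple aggregations over parsed log records. A minimal sketch, assuming an invented record layout of (URL, HTTP status, referrer):

from collections import Counter

# Invented request records: (URL, HTTP status, referrer).
requests = [
    ("/home", 200, "http://www.google.com/search?q=dei"),
    ("/members", 200, "/home"),
    ("/old-page", 404, "/home"),
    ("/home", 200, "http://www.google.com/search?q=polimi"),
]

hits = Counter(url for url, status, ref in requests)
not_found = [url for url, status, ref in requests if status == 404]
referrers = Counter(ref.split("/")[2] for url, status, ref in requests
                    if ref.startswith("http"))

print("top pages:", hits.most_common(2))  # site traffic report
print("pages not found:", not_found)      # diagnostic statistics
print("external referrers:", referrers)   # referrer statistics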
While the adoption of automatic tools for Web log analysis is mandatory, an important observation must be made about the first two categories
of tools. Such tools constitute valuable support to reduce the effort required to manually analyse an entire application with respect to all of the
possible usability problems. However, they are not able to exhaustively
verify usability issues. In particular, they cannot assess any properties that
require judgement by a human specialist (e.g. usage of natural and concise
language). Also, automatic tools cannot provide answers about the nature
of a discovered problem and the design revision that can solve it.
Automatic tools are therefore useful when their use complements the activity of human specialists, since they can execute repetitive evaluation
tasks to inspect the application, and highlight critical features that should
later be inspected by evaluators.
5.6 Evaluation of the DEI Application
The DEI application has been developed by means of an iterative development process, in which several incremental application versions have
been released, evaluated, and improved based upon problems raised by the
evaluations. Such a process has been enabled by the ease of prototype generation, due to the adoption of a modelling language, WebML [14], and its
accompanying development tool [13,66], offering a visual environment for
composing WebML-based specifications of an application’s content and
hypertext, and a solid XML and Java-based technology for automatic code
generation.
The guidelines introduced in Sect. 5.3 have been taken into account during the application design. However, in order to further validate usability,
several evaluation sessions, through different evaluation methods, have
been conducted. In particular:
• Inspection sessions to examine the hypertext specification have been conducted, using an automatic tool aimed at discovering structural problems related to the definition of erroneous or inconsistent navigation mechanisms.
• After the application delivery, Web logs have been analysed to verify whether the application structure envisioned by the application designers matched user needs, or whether some unexpected behaviours occurred.
• The released prototypes, as well as the delivered final application, have been analysed through heuristic evaluations, to further identify problems that could not easily be revealed through the analysis of design specifications.
5.6.1 Design Inspection
Design inspections have been carried out over 14 successive versions of
the DEI application, by applying different procedures to evaluate structural
properties, such as its internal consistency and the soundness of navigation
mechanisms.
Thanks to the availability of the XML-based representation of the hypertext specification, generated by the adopted development tool, the inspection was conducted automatically through the adoption of WQA (Web
Quality Analyzer) [23], an XSL-based tool able to parse the XML specification for retrieving usability problems. In particular, the tool inspects the
application design, looking for configurations that are considered potential sources of problems, and executes analysis procedures aimed at verifying whether the configurations found violate usability.
In the following we illustrate two main problems we identified within the content management area used by Web administrators.
Consistency of Operation Design
Some of our inspection procedures aimed to verify the design consistency
of content management operations, used to create and modify an application’s content, within the content management area. In particular, they had
to identify all occurrences of operations within pages, and to verify if their
invocation, and the visualisation of results after their execution, was coherent across the entire application.
Fig. 5.4 plots the history of the Modify Termination (MT) evaluation procedure along several releases of the DEI application. This procedure allowed us to evaluate and measure the consistency of the visualisation of results for content modification operations, with respect to two possible variants: visualisation of the modified content (i) in the same page where the operation was invoked (Same Page Termination variant), or (ii) in a new page (Different Page Termination variant). The procedure thus entailed:
1. Identifying all the modification operations specified in the application’s hypertext;
2. Computing the statistical variance (a value between 0 and 1) of the occurrences of the two termination variants, normalised with respect to the best-case variance (see [24] for further details).
Fig. 5.4. History of the MT computation along the different DEI application releases
The plot in Fig. 5.4 highlights four different phases (from A to D) in the
application’s development. Phase A is characterised by limited attention to design consistency. The initial high value of the computed
measures for release 1 depended on the limited number of modification
operations in the application at that time. However, as soon as the number
of modification operations in the following releases started to grow, the
consistency value for the termination of modification operations decreased,
and reached its lowest value (0.04) in release 5. At this point, the evaluators raised the problem. Therefore during phase B, the application designer
modified the application, trying to use a single design variant (the Same
Page termination variant) in almost every case. Release 6 clearly shows the
improvement obtained by the re-engineering activity, with the variance
value going from 0.04 to 0.54. Improvement is also noted in relation to the
percentages of use of the two variants in releases 5 and 6, as detailed in
Table 5.3.
Table 5.3. The percentage of the occurrences of the two Modify pattern variants within releases 5 and 6

                                Release 5    Release 6
  Different page termination    42.86%       12.5%
  Same page termination         57.14%       87.5%
Starting from release 7 (phase C), the MT measure remained constant − no changes were applied to the modification operations. From release 12 to 14 (phase D), we instead witnessed an improvement in the application’s consistency. The last computed value of the MT metric was 0.76, which corresponds to an acceptable level of consistency with respect to the usability goals that had been set.
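The exact normalisation of the MT measure is defined in [24]. As a rough, purely illustrative stand-in, the sketch below uses the squared difference between the shares of the two termination variants: it is 0 when the variants are used equally often, 1 when a single variant is used throughout, and it yields values of the same order as those reported above.

def mt_consistency(same_page, different_page):
    """Simplified consistency proxy for the Modify Termination measure.

    0.0 -> the two variants occur equally often (inconsistent design)
    1.0 -> a single variant is used everywhere (fully consistent design)
    The exact metric is defined in [24]; this is only an illustration.
    """
    total = same_page + different_page
    if total == 0:
        return 1.0
    p = same_page / total
    return (2 * p - 1) ** 2

print("release 5: %.2f" % mt_consistency(4, 3))  # shares 57.14% vs. 42.86%
print("release 6: %.2f" % mt_consistency(7, 1))  # shares 87.5% vs. 12.5%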
Identification of Dead-ends
Besides verifying consistency, some inspection tasks were performed to
discover structural weaknesses within the application’s hypertext. One
particular inspection procedure, executed on DEI’s hypertext specification,
aimed to discover dead-ends. Dead-ends are pages, reachable through various navigation paths, that prevent the user from navigating further. The only choice they give to a user is to go back (e.g. by hitting the browser’s “back” button). These pages either have no outgoing links, or activate operations that end up where they started, thus making navigation difficult.
While analysing the entire application’s structure we identified 20 occurrences of the dead-end pattern. A closer look at each occurrence revealed that all dead-ends were pages reached whenever a database update
operation failed. In such a situation, the user is presented with a page displaying an error message, and is unable to navigate further or to recover from the error. According to the “ease of browsing” usability criterion, the
designer should have inserted a link to allow a user to go back to the initial
page from which the operation had been invoked.
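Detecting dead-ends amounts to searching the hypertext graph, extracted from the design specification, for pages without usable outgoing links. A minimal sketch over an invented graph:

# Hypertext graph: page -> set of pages reachable through outgoing links.
hypertext = {
    "home": {"members", "research"},
    "members": {"member-detail"},
    "member-detail": {"home"},
    "update-error": set(),   # error page without outgoing links: a dead-end
    "research": {"home"},
}

def dead_ends(graph):
    """Pages with no outgoing links, or whose only link loops back to themselves."""
    return [page for page, links in graph.items() if not (links - {page})]

print(dead_ends(hypertext))  # ['update-error']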
It is worth noting that dead-end pages would have been difficult to find
by means of user testing. The reason is that the investigator would need to
induce database-related errors in order to obtain one of these pages. This is
therefore an example where the analysis of design specifications (automatic or not) to verify structural properties can support the identification of
usability problems that would otherwise be difficult to find.
5.6.2 Web Usage Analysis
Once the DEI application was deployed, we carried out a Web usage
analysis. To do so, we analysed Web logs to reconstruct user navigation
patterns, and to identify possible critical situations that had been encountered by users. In particular, the analysis focused on navigational access
mechanisms.
Navigation Sequences for Accessing a DEI Member’s Page
The computation of traffic statistics was used to identify that, apart from
the application’s Home Page, the other most visited URL corresponded to
the page showing the details for a single DEI member, with links to the
member’s publications, personal home page, and research areas. The navigational facilities provided by the application enabled us to examine the
means employed by users to reach the page. As can be observed in
Fig. 5.2, the page is reachable through a navigational access that takes a
user from the Home Page to an index page, thus providing two different
ways to reach a single member’s page:
• The first uses an index of member categories (e.g. professor, lecturer), which allows for the selection of a category and, in a different page, the selection of a specific member from an alphabetically ordered list.
• The second uses a search-based direct access.
Given these navigational access facilities, we wanted to monitor their
usage, to find out whether they showed any usability problems. In order to
identify navigational sequences, we adopted a module of the WQA tool
[23], which is able to analyse Web logs and to reconstruct user navigation.
The analysis of logs for 15 days showed that the navigational sequence
from the index page to the DEI member page was followed about 20,000
times during that period, and that:
• The indexes of categories and members were used less than 900 times.
• The search box was used more than 19,000 times.
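Once sessions have been reconstructed, classifying these accesses boils down to looking at the page that precedes the member page in each session. A minimal sketch with invented page names and sessions:

sessions = [  # reconstructed navigation sessions (invented)
    ["home", "search", "member"],
    ["home", "categories", "member-index", "member"],
    ["home", "search", "member"],
]

counts = {"via index": 0, "via search": 0}
for s in sessions:
    for prev, page in zip(s, s[1:]):
        if page == "member":
            counts["via index" if prev == "member-index" else "via search"] += 1

print(counts)  # {'via index': 1, 'via search': 2}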
These results suggested either that users did not know which category to
use when looking for a DEI member, or that the navigational access was
not easy to use and needed improvements. This feedback is currently being
taken into account by the application designers, to assess the merits of redesigning the application.
Another problem related to the access to the DEI member pages was
identified while carrying out a mining session on the DEI Web logs, to
discover possible association rules [42]. The mining query was aimed at
discovering the page sequences most frequently visited. Results showed
that requests for a “research area” page were later followed by a request to
a DEI member page. A possible reason for this behaviour could be that
users perceive the “research area” page as an access page for the DEI
members. This result supports the view that users are not making use of
the navigational access to DEI members, as envisioned by the application
designers.
Navigation Sequences for Accessing a DEI Member’s
Publications
Another problem also identified was related to accessing a DEI member’s publication details. The “Publication Details” page (which shows the full details of a publication) can be reached from four distinct pages: “Search Publications” (which provides a keyword-based search), “All Publications” (which displays the list of all the publications), “DEI Member Publications” (which shows the publication list for a specific DEI member), and “Project Publications” (which displays the publications produced in the context of a specific research project). Yet again, our goal was to discover the most used path to reach the “Publication Details” page. To do so, we gathered data on all the navigation sequences that contained the “Publication Details” page.
The analysis of 15 days of Web logs revealed that the “Publication Details” page had been visited 440 times during that period, organised as follows:
• The “Publication Details” page was reached 420 times from the “DEI Member Publications” page.
• Of the remaining 20 times, the “Publication Details” page was reached 8 times from “Project Publications”, 7 times from the “All Publications” page, and twice from the “Search Publications” page.
Reaching the “Publication Details” page through the “DEI Member Publications” page is the most natural path, and therefore the results were not surprising.
However, the small number of times that other pages were used to reach
the “Publication Details” page was a concern. To understand these results,
we inspected the application’s hypertext design, to consider all the navigational sequences that reached the “Publication Details” page from pages
other than the “DEI Member Publications” page. The inspection results
showed that the “All Publications” and “Search Publications” pages were
only reachable through links displayed in the “Publication Details” page.
Therefore the reason for the low usage of such pages is that they were not
visible to users.
Note that a problem such as this could not be identified solely by analysing the design, since the design suggests that both pages can be reached using links from the “Publication Details” page. The analysis of the design alone would therefore not flag the two pages as “unreachable”. In addition, this problem could not be identified using a heuristic evaluation, as the hypertext structure employed does not violate any usability principles. This is therefore an example that supports the need for observing real users’ behaviour.
5.6.3 Heuristic Evaluation of Hypertext Interface
To achieve more complete and reliable evaluation results, design inspection and Web usage analysis were complemented with a heuristic evaluation session, conducted by expert evaluators from outside the design team.
The aim of this evaluation was to assess the usability of the hypertext’s
presentation layer, which had not been addressed by the two previous
evaluations.
Nielsen’s heuristics were used as the benchmark criteria. Results
indicated problems related to the effectiveness of the language adopted.
For example, the DEI Home Page (see Fig. 5.2) shows content categories
that are related to the application core concepts. The same content categories are presented within each page using a permanent navigation bar.
However, the category “Staff” in the Home Page is displayed as “People”
in the navigation bar available in all remaining pages. The solution to the
naming inconsistency is always to use the same category name.
Another problem, also related to naming conventions, concerns the inconsistent semantics of the “Details” link within the page that shows the publications of a DEI professor (see Fig. 5.5). The link “Details” on the left
side of the navigation bar is used as a link to the DEI “Member” page.
However, the link “Details” underneath each publication is used as a link
to the detailed description of a particular publication. Interpreting the problem in the light of Nielsen’s heuristics, we can therefore observe that:
1. A system-oriented language has been employed, rather than a user-oriented language. In fact, “Details” was the term constantly used by the application designers, during the application’s development, to represent the presentation of an information entity’s detailed contents (e.g. DEI “Member” and “Publication”). This problem is therefore an example where the interface has not been user-centred. Such a problem can be solved by assigning meaningful names to links, clearly indicating the contents to be displayed in the target page.
2. Adopting the same name to denote two different concepts forces users to “remember” the implemented interaction model, rather than allowing them to “recognise” it. The interface does not make objects, actions, and options visible, thus requiring the user to remember how to reach the content across different application areas and different interaction sessions.
Fig. 5.5. Ambiguous semantics for the “Details” link name
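Naming problems of this kind can also be screened for mechanically: a label used by links pointing to different targets, or a single target reached through different labels, is a candidate inconsistency. A minimal sketch over invented (page, label, target) triples mirroring the two problems above:

from collections import defaultdict

links = [  # invented (page, link label, target page) triples
    ("member-publications", "Details", "member"),
    ("member-publications", "Details", "publication-detail"),
    ("home", "Staff", "people"),
    ("members", "People", "people"),
]

# Same label pointing at different targets (the "Details" problem).
targets_by_label = defaultdict(set)
for page, label, target in links:
    targets_by_label[label].add(target)
print({l: t for l, t in targets_by_label.items() if len(t) > 1})

# Same target reached through different labels (the "Staff"/"People" problem).
labels_by_target = defaultdict(set)
for page, label, target in links:
    labels_by_target[target].add(label)
print({t: l for t, l in labels_by_target.items() if len(l) > 1})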
5.7 Concluding Remarks
Web applications are quickly growing in number and complexity, becoming the de facto standard for distributed applications that require human
interaction. The increasing number of Web users, the diversity of application domains and content, and the complexity of hypertext structures and
interfaces all encourage the use and measurement of usability as a determining factor for the success of such applications.
The process by which engineering principles are applied to developing
Web applications started only recently [5,14,57]. Web engineering provides application designers with a collection of tools and languages to
accelerate the development process, and to enforce a level of syntactic
correctness, allowing for semi-automatic or fully automatic code generation.
Syntactic correctness prevents a designer from specifying an application
that has defects. However, a quality application is more than a piece of
defect-free code.
Applications that incorporate usability engineering into their development process are expected to comply with quality requirements. In particular [38]:
1. Evaluation is the key for assuring quality: the effort devoted to an
evaluation can directly determine the quality of the final application.
2. To be effective, an evaluation must rely upon suitable and validated
usability criteria.
This chapter provided an overview of methods currently adopted in assessing the usability of Web applications, and criteria that can be applied
to the evaluation of Web applications. In addition, this chapter also highlighted the advantages and drawbacks of different usability methods so as
to help practitioners choose the most suitable method with respect to the
evaluation goals.
Independent of the method chosen, practitioners and researchers suggest
that a sound usability evaluation plan should include the use of different,
complementary methods, to ensure the completeness of the evaluation
results. The characteristics of each method determine its effectiveness in discovering a specific class of usability problems.
The adoption of automatic tools can improve the reliability of the
evaluation process. As reported in [11], tools for automatic analysis can
address some of the issues that prevent developers from adopting evaluation methods. In particular, tools are systematic, fast, and reliable, and can
be effectively adopted for tackling repetitive and time-consuming evaluation tasks. Also, tools may allow developers to code and execute procedures for the verification of in-house guidelines, making them easily enforceable.
However, tools may help verify structural properties, but fail to assess
properties that require specialised human judgement, to provide answers
explaining the nature of a given problem, and suggestions on how to fix it.
Automatic tools are therefore very useful when their use complements the
activity of human specialists.
References
1 Agrawal R, Imielinski T, Swami A (1993) Mining Association Rules Between Sets of Items in Large Databases. In: Proceedings of ACM-SIGMOD 93, Washington, DC, May, pp 207–216
2 Analog. (2005) http://www.analog.cx (accessed 18 January 2005)
3 A-Prompt Project. (2005) http://aprompt.snow.utoronto.ca/ (accessed 18 January 2005)
4 AWSD-WebLog. (2005) http://awsd.com/scripts/weblog/index.shtml (accessed 18 January 2005)
5 Baresi L, Garzotto F, Paolini P (2001) Extending UML for Modeling Web Applications. In: Proceedings of the 34th Annual Hawaii International Conference on System Sciences, Maui, USA, January
6 Berendt B, Hotho A, Stumme G (2002) Towards Semantic Web Mining. In: Proceedings of the 1st International Semantic Web Conference, Sardinia, Italy, June. Springer, Berlin, LNCS 2342, pp 264–278
7 Berendt B, Spiliopoulou M (2000) Analysis of Navigation Behaviour in Web Sites Integrating Multiple Information Systems. J Very Large Data Bases, 9(1):56–75
8 Bias RG, Mayhew DJ (1994) Cost-Justifying Usability. Academic Press, Boston, MA
9 Blackmon MH, Polson PG, Kitajima M, Lewis C (2002) Cognitive Walkthrough for the Web. In: Proceedings of the 2002 International Conference on Human Factors in Computing Systems, Minneapolis, USA, April, pp 463–470
10 Bobby. (2005) http://bobby.watchfire.com/bobby/html/en/index.jsp (accessed
18 January 2005)
11 Brajnik G (2004) Using Automatic Tools in Accessibility and Usability Assurance. In: Proceedings of the 8th International Workshop on User Interface
for All, Vienna, June. Springer, Berlin, LNCS 3196, pp 219–234
12 Brooks P (1994) Adding Value to Usability Testing. In: Nielsen J, Mack RL
(eds) Usability Inspection Methods. Wiley, New York, pp 255–271
13 Ceri S, Fraternali, P (2003) Architectural Issues and Solutions in the Development of Data-Intensive Web Applications. In: Proceedings of the First Biennial Conference on Innovative Data Systems Research, Asilomar, USA, January
14 Ceri S, Fraternali P, Bongio A, Brambilla M, Comai S, Matera M (2003)
Designing Data-Intensive Web Applications, Morgan Kaufmann, San Francisco, CA
15 Ceri S, Fraternali P, Matera M (2002) Conceptual Modeling of Data-Intensive
Web Applications. IEEE Internet Computing, 6(4):20–30
16 Conallen J (2002) Building Web Applications with UML, Addison-Wesley,
Boston, MA
17 Cooley R (2003) The Use of Web Structures and Content to Identify Subjectively Interesting Web Usage Patterns. ACM Transactions on Internet Technology 3(2):93–116
18 Cooley R, Mobasher B, Srivastava J (1999) Data Preparation for Mining
World Wide Web Browsing Patterns. J Knowledge and Information Systems,
1(1):5–32
19 Cooley R, Tan P, Srivastava J (2000) Discovery of Interesting Usage Patterns
from Web Data. In: Proceedings of the 1999 International Workshop on Web
Usage Analysis and User Profiling, San Diego, USA, August. Springer, Berlin, LNCS 1836, pp 163–182
20 Dix A, Finlay J, Abowd G, Beale R (1998) Human-Computer Interaction, 2nd
edn. Prentice Hall, London
21 Doubleday A, Ryan M, Springett M, Sutcliffe A (1997) A Comparison of
Usability Techniques for Evaluating Design. In: Proceedings of the 1999
Symposium on Designing Interactive Systems: Processes, Practices, Methods
and Techniques, Amsterdam, the Netherlands, August, pp 101–110
22 Eirinaki M, Vazirgiannis M (2003) Web Mining for Web Personalization. ACM Transactions on Internet Technology, 3(1):1–27
23 Fraternali P, Lanzi PL, Matera M, Maurino A (2004) A Model-Driven Web
Usage Analysis for the Evaluation of Web Application Quality. Journal of Web Engineering, 3(2):124–152
24 Fraternali P, Matera M, Maurino A (2002) WQA: An XSL Framework for
Analyzing the Quality of Web Applications. In: Proceedings of the Second
International Workshop on Web-Oriented Software Technologies, Malaga,
Spain, June
25 Garzotto F, Matera M (1997) A Systematic Method for Hypermedia Usability
Inspection. New Review of Hypermedia and Multimedia, 6(3):39–65
26 Hull L (2004) Accessibility: It’s Not Just for Disabilities Any More. ACM
Interactions, 11(2):36–41
27 Hutchins EL, Hollan JD, Norman DA (1985) Direct manipulation interfaces.
Human-Computer Interaction, 1:311–338
28 IBM (2005) Ease of Use Guidelines. http://www-306.ibm.com/ibm/easy/eou_ext.nsf/publish/558 (accessed 18 January 2005)
29 ISO (1997) ISO 9241: Ergonomics Requirements for Office Work with Visual Display Terminal (VDT) Parts 1–17
30 Ivory MY, Hearst MA (2001) The State of the Art in Automating Usability
Evaluation of User Interfaces. ACM Computing Surveys, 33(4):470–516
31 Ivory MY, Sinha RR, Hearst MA (2001) Empirically Validated Web Page
Design Metrics. In: Proceedings of the ACM International Conference on
Human Factors in Computing Systems, Seattle, USA, April, pp 53–60
32 Jeffries R, Desurvire HW (1992) Usability Testing vs. Heuristic Evaluation:
Was There a Context? ACM SIGCHI Bulletin, 24(4):39–41
33 Jeffries R, Miller J, Wharton C, Uyeda KM (1991) User Interface Evaluation
in the Real World: A Comparison of Four Techniques. In: Proceedings of the
ACM International Conference on Human Factors in Computing Systems,
New Orleans, USA, pp 119–124
34 Kantner L, Rosenbaum S (1997) Usability Studies of WWW Sites: Heuristic
Evaluation vs. Laboratory Testing. In: Proceedings of the ACM 1997 International Conference on Computer Documentation, Snowbird, USA, pp 153–160
35 Lewis JR (1995) IBM Computer Usability Satisfaction Questionnaires: Psychometric Evaluation and Instruction for Use. Human-Computer Interaction,
7(1):57–78
36 LIFT. (2005) http://www.usablenet.com (accessed 18 January 2005)
37 Lim KH, Benbasat I, Todd PA (1996) An Experimental Investigation of the
Interactive Effects of Interface Style, Instructions, and Task Familiarity on
User Performance. ACM Transactions on Computer-Human Interaction,
3(1):1–37
38 Lowe D (2003) Emerging knowledge in Web Development. In Aurum A,
Jeffery R, Wohlin C, Handzic M (eds) Managing Software Engineering
Knowledge. Springer, Berlin, pp 157–175
39 Lynch P, Horton S (2001) Web Style Guide: Basic Design Principles for
Creating Web Sites, 2nd edn. Yale University Press, New Haven, CT
40 Madsen KH (1999) Special Issue on The Diversity of Usability Practices. Communications of the ACM, 42(5)
41 Marchionini G, Shneiderman B (1988) Finding Facts vs. Browsing Knowledge in Hypertext Systems. IEEE Computer, 21(1):70–80
42 Meo R, Lanzi PL, Matera M, Esposito R (2004) Integrating Web Conceptual
Modeling and Web Usage Mining. In: Proceedings of the 2002 International
ACM Workshop on Web Mining and Web Usage Analysis, Seattle, USA,
August
43 Molich R, Nielsen J (1990) Improving a Human-Computer Dialogue. Communications of the ACM, 33(3):338–348
44 Nielsen J (1992) The Usability Engineering Lifecycle. IEEE Computer, 25(3):12–22
45 Nielsen J (1993) Usability Engineering. Academic Press, Cambridge, MA
46 Nielsen J (1994) Special Issue on Usability Laboratories. Behavior and Information Technology, 13(1)
47 Nielsen J (1994) Guerrilla HCI: Using Discount Usability Engineering to Penetrate the Intimidation Barrier. In: Cost-Justifying Usability. Academic Press, Cambridge, MA
48 Nielsen J (1995) Multimedia and Hypertext: Internet and Beyond. Academic
Press, London
49 Nielsen J (2000) Web Usability. New Riders, New York
50 Nielsen J, Landauer TK (1993) A Mathematical Model of the Finding of
Usability Problems. In: Proceedings of the ACM 1993 International Conference on Human Factors in Computing Systems, Amsterdam, Netherlands,
April, pp 206–213
51 Nielsen J, Mack RL (1994) Usability Inspection Methods. Wiley, New York
52 Nielsen J, Molich R (1990) Heuristic Evaluation of User Interfaces. In: Proceedings of the ACM 1990 International Conference on Human Factors in
Computing Systems, Seattle, USA, April, pp 249–256
53 Norman DA (1991) Cognitive Artifacts. In: Carroll JM (ed) Designing Interaction: Psychology at the Human–Computer Interface. Cambridge University Press, New York, pp 17–38
54 Polson P, Lewis C, Rieman J, Wharton C (1992) Cognitive Walkthrough: A
Method for Theory-Based Evaluation of User Interfaces. International Journal of Man-Machine Studies, 36:741–773
55 Preece J, Rogers Y, Sharp H, Benyon D, Holland S, Carey T (1994) HumanComputer Interaction. Addison-Wesley, New York
56 Punin JR, Krishnamoorthy MS, Zaki MJ (2002) LOGML: Log Markup Language for Web Usage Mining. In: Proceedings of the Third International
Workshop on Web Mining and Web Usage Analysis, San Francisco, USA,
August, pp 88–112
57 Schwabe D, Rossi G (1998) An Object Oriented Approach to Web-Based
Applications Design. Theory and Practice of Object Systems, 4(4):207–225
58 Shneiderman B (1992) Designing the User Interface. Strategies for Effective
Human-Computer Interaction. Addison-Wesley, New York
59 Shneiderman B (2000) Universal Usability. Communications of the ACM,
43(5):84–91
60 Shneiderman B, Byrd D, Croft WB (1998) Sorting out searching. Communications of the ACM, 41(4):95–98
61 Srivastava J, Cooley R, Deshpande M, Tan PN (2000) Web Usage Mining:
Discovery and Applications of Usage Patterns from Web Data. ACM SIGKDD Explorations, 1(2):12–23
62 Stroulia E, Niu N, El-Ramly M (2002) Understanding Web Usage for Dynamic Web-site Adaptation: A Case Study. In: Proceedings of the 4th International Workshop on Web Site Evolution, Montreal, Canada, October, pp 53–64
63 Theofanos MF, Redish J (2003) Bridging the gap between accessibility and
usability. ACM Interactions, 10(6):36–51
64 Virzi RA (1992) Refining the Test Phase of Usability Evaluation: How Many
Subjects is Enough? Human Factors, 34(4):457–468
65 WebCriteria SiteProfile. (2005) http://www.coremetrics.com (accessed 18
January 2005)
66 WebRatio Site Development Studio. (2005) http://www.webratio.com (accessed 18 January 2005)
67 Wharton C, Rieman J, Lewis C, Polson P (1994) The Cognitive Walkthrough
Method: A Practitioner’s Guide. In: Nielsen J, Mack RL (eds) Usability Inspection Methods, Wiley, New York, pp 105–140
68 Whiteside J, Bennet J, Holtzblatt K (1988) Usability Engineering: Our Experience and Evolution. In: Helander M (ed.) Handbook of Human-Computer
Interaction. Elsevier, Amsterdam pp 791–817
69 Wilson TD (2000) Human Information Behavior. Informing Science,
3(2):49–55
70 Wurman RS (1997) Information Architects, Watson-Guptill, New York
71 W3C Consortium – Extended Log File Format. (2005) http://www.w3.org/TR/WD-logfile.html (accessed 18 January 2005)
72 W3C Consortium - WAI-Web Content Accessibility Guidelines 2.0. (2005)
W3C-WAI Working Draft. http://www.w3.org/TR/WCAG20/ (accessed 18
January 2005)
Authors’ Biographies
Maristella Matera is Assistant Professor at Politecnico di Milano, where she
teaches Databases and Computer Science Fundamentals. Her research interests
focus on design methods and tools for Web applications, and in particular concentrate on conceptual modelling quality, Web log mining, personalisation of Web
applications, context-aware Web applications, multimodal Web interfaces, Web
application usability and accessibility. She is author of about 50 papers on the
previous topics and of the book Designing Data-Intensive Web Applications,
published by Morgan Kaufmann in December 2002. She has served as co-chair for
several editions of the “Web Technologies and Applications” track at ACM SAC,
and of the CAISE Workshop UMICS (Ubiquitous Mobile Information and Collaboration Systems). She is also regularly a member of the programme committee
of several conferences and workshops in the field of Web Engineering.
A more detailed curriculum vitae and list of publications can be found at:
http://www.elet.polimi.it/people/matera.
Francesca Rizzo is a junior researcher at Politecnico di Milano, where she is a
lecturer for the Human Computer Interaction Laboratory. She obtained her PhD in
Telematics and Information Society from the University of Siena in 2003. In the
last five years she has taught human computer interaction and interaction design at
the University of Siena and at Politecnico di Milano. Her fields of interest are
human computer interaction (HCI), user-centred design (UCD), usability evaluation and activity analysis. She has worked on many European research projects. Currently, her research focuses on Web application usability and accessibility, e-learning, and the design and evaluation of storytelling technologies for children. She is the author of about 20 papers on these topics. She has served as a reviewer for several conferences and journals, such as ICWE (International Conference on Web Engineering) and JWE (Journal of Web Engineering).
Giovanni Toffetti Carughi graduated in Information Engineering at Politecnico
di Milano in 2001. His thesis work focused on the extension, through plugins, of the WebML methodology for designing and automatically generating data-intensive Web applications. He worked for three years in the software industry, both as a developer for WebRatio and as an analyst and consultant on industrial Web applications such as the Acer-Euro portals, ABI-Pattichiari, Nortel-Consip, and MetalC.
He is currently a PhD student at Politecnico di Milano and his topics of interest
are conceptual modelling, model transformation, Web application engineering,
Web log analysis, human computer interaction, rich internet applications, and
image compression.
6 Web System Reliability and Performance
Mauro Andreolini, Michele Colajanni, Riccardo Lancellotti
Abstract: Modern Web applications provide multiple services that are deployed through complex technologies. The importance and the economic impact of consumer-oriented Web applications introduce significant requirements in terms of performance and reliability. This chapter presents several methods for designing new Web applications, and improving existing ones, so that they satisfy performance requirements even under unpredictable load variations. The chapter also provides a case study that describes the application of the proposed methods to a typical consumer-oriented Web application.
Keywords: Web systems, Performance, Reliability, Design.
6.1 Introduction
The use of Web technologies to deploy many classes of services through the
Internet is becoming a de facto standard. A large number of software and
hardware technologies are available with their pros and cons in terms of
complexity, performance and cost. Because of the extreme heterogeneity of
Web-related services and technologies, it is impossible to identify, from this
universe, the solution which best suits every possible Web application. Multiple issues need addressing during the design and deployment of a Web-based service, including the efficacy of the presentation, the richness of the provided services, and security guarantees. Moreover, system performance and reliability remain key factors in the success of any Web-based service. The
popularity of a Web application perceived as too slow or presenting availability problems can decline dramatically if navigation becomes a frustrating experience for the user. There are specific aspects of the design process
that focus mainly on data organisation. The interested reader can refer to
[24] and to references therein. In this chapter, we focus instead on the architectural design of Web applications that are subject to performance and
reliability requirements. Note that we remain in the domain of best-effort Web-based services, with no strict guarantees on performance levels, unlike QoS-based applications [44,23]. The
techniques described in this chapter are hybrid in nature, but there are some
major steps that must be followed. These main procedural steps, listed below, are illustrated in Fig. 6.1, and detailed in the following sections.
Step 1. We have first to identify the main classes of services that must be
provided by the Web application.
Step 2. As we are interested in serving a large number of users with many
classes of services, it is important to define the most likely workload models that are expected to access the Web application. Each
workload model is characterised by a workload intensity, which
represents the number of requests admitted in the Web application,
and by the workload mixes, which denote the number of requests
in each class of service.
Step 3. The third step in the design phase is to define the performance and
reliability requirements of the Web application. There are many
system- and user-oriented performance parameters, as well as
many types of reliability goals, the most severe of which is the
well-known 24/7 attribute (i.e. 24 hours a day, seven days a week), which requires Web applications to be always reachable.
Step 4. Once the main characteristics and requirements for the Web application are specified, we have to choose the most appropriate software and hardware technologies to deploy it.
Step 5. After the implementation phase, we have to verify whether the
Web application works as expected and whether it respects the
performance and reliability requirements. As usual, a test can lead
to positive or negative outcomes.
Step 6. In the case of some negative results, an iterative step begins. It
aims to understand the causes of violation, remove them, and
check again until all expected performance requirements are satisfied. In the most severe cases, a negative outcome may require interventions at the system or implementation level (dashed line in
Fig. 6.1).
Step 7. Often, even a positive outcome of the tests does not conclude the work. Given the extreme variability of user patterns and the frequent updates and improvements to the classes of services, the first deployment may be followed by a consolidation phase. It can have different goals, from capacity planning tests to the verification of the performance margins of the system resources.
The remainder of the chapter is organised as follows. Section 6.2 outlines
the different types of Web applications and the main design challenges for
each class. Section 6.3 describes software technologies and hardware architectures for the deployment of Web applications that must serve mainly
dynamic Web resources. Section 6.4 focuses on the testing process of a Web
application. Section 6.5 outlines the main countermeasures to be undertaken
whenever the deployed system fails to meet performance and reliability requirements. Section 6.6 describes a case study showing how the proposed design and testing methods can be applied to a typical medium-sized e-commerce Web application. Section 6.7 concludes the chapter with some final remarks.
Fig. 6.1. Procedural steps for designing and testing Web applications with performance requirements
6.2 Web Application Services
There are so many services provided through the Web that it is difficult to
integrate all of them in a universally accepted taxonomy. We prefer to
describe the Web-based services through the following considerations:
• Each class of requests to the Web application involves one or more types
of Web resources, hence we consider that the first important step is to
classify the most important resources that characterise an application.
• Each class of requests has a different impact on the system resources of
the platform that supports the Web application. The system resources
include hardware and software components that are typically based on
distributed technologies. Knowing the available technologies and their
interactions is fundamental for the design of performance-aware Web
applications.
6.2.1 Web Resource Classification
Within the context of this chapter we recognise five basic resource types
that are provided by a Web application. Servicing each of these Web resources requires specific technologies, and has a different computational
impact on the system’s platform.
Static Web Resources
Static Web resources are stored as files. There are dozens of static resource
types, from HTML files to images, text files, archive files, etc. They are
typically managed by the file system of the same machine that hosts the
Web server. In the early days of the Web, static files were the only type of
Web resource. Servicing a static resource does not require a significant
effort from the Web system, since it is requested through a GET method of
the HTTP protocol, then fetched from a storage area or, often, from the
disk cache, and then sent to the client through the network interface. Performance problems may occur only when the static resource is very large (with current technology, “large” means megabytes and up).
Dynamic Web Resources
Dynamic Web resources are generated “on-the-fly” by the application as a
response to a client request. There are many examples of dynamic resources, such as the result Web page from a Web search, a shopping cart
visualisation in a Web store, or the dynamic generation of embedded objects
or frames. Web resources generated in a dynamic way allow the highest
flexibility and personalisation because the page code is generated as a response to each client’s request.
There are two main motivations behind the use of dynamic resources.
The first is that the traditional dynamic request comes from the necessity to
obtain answers from an organised source of information, such as a database.
The generation of this type of response requires a significant computational
effort due to data retrieval from databases, and (optional) information processing and construction of the HTTP output. The computational impact of
dynamic requests on a Web application is increased also by the fact that it is
quite difficult to take advantage of caching for the dynamic resources. The
second is that one of the new trends on the Web is the generation of dynamic
content even when this is not strictly necessary. For example, XML- and
component-based technologies [19,29] provide mechanisms for separating
structure and representation details of a Web document. As a consequence,
all the documents (even static pages) are generated dynamically from a
template through computationally expensive operations.
Volatile Web Resources
Volatile Web resources are regenerated dynamically on a periodic time
basis or when a given event occurs. This type of resource is typical of information portals that deliver up-to-date news, sport results, stock exchange information, etc. Avoiding frequent re-computation keeps the cost
of volatile resource service low, and comparable to that of static resources.
On the other hand, the Web application must be equipped with mechanisms that can regenerate resources through automated technologies similar to those used for dynamic Web resources [40,21]. Pushing methods to
the clients are sometimes utilised [41].
Secure Web Resources
Secure Web resources are static, dynamic or volatile objects transferred
over a ciphered channel, usually through the HTTPS protocol. Secure resources address the need for privacy, non-repudiation, integrity and authentication. They are typically characterised by high CPU processing
demands, due to the computational requirements of the cryptographic algorithms [18,25].
Multimedia Resources
Multimedia resources are associated with video and audio content, such as
video clips, mp3 audio files, Shockwave Flash animations and movies.
There are two main ways to use multimedia resources: download-and-play,
or play-during-download. In the former case, multimedia content is usually
transferred through the HTTP protocol. The typical size of a multimedia
resource is much larger than that of other resources, hence download traffic caused by these files has a deep impact on network bandwidth requirements. In the case of play-during-download, the download service must be
integrated with streaming-oriented protocols, such as RTP [21,43], and
well-designed technologies that provide content from the Web application
without interruption.
6.2.2 Web Application’s Bearing on System Resources
Over the years, Web applications have evolved from being simple, static
pages to applications that incorporate complex, dynamic and secure services, such as e-commerce and home-banking applications. A complete
taxonomy that takes into account every type of Web application is outside the scope of this chapter. Thus, we focus solely on application designs that take into account performance and reliability. In such a scenario, it is
important to consider the four main hardware resources of a system’s platform: CPU, disk, central memory and network interface. Moreover, we
suggest it is more important to focus on the classes of requests than on the
types of resources offered by a Web application. Indeed, each application
comprises a mix of Web resources, but for each workload mix there is a
prevalent class of requests that has a major impact on system resources. As
an example, if the workload model is predominantly characterised by
downloads of multimedia files, the network capacity has a primary impact
on system performance; hence it is important to design an architecture able
to guarantee high network throughput. For the above reasons we classify
Web applications according to their predominant request class.
Predominantly Static Applications
Currently, the design of static Web applications is not considered challenging, as existing Web technologies are able to serve an impressive volume
of static requests, even with commodity off-the-shelf hardware and software. The only requirement a static Web application has to meet relates to
the network capacity of the outbound link, which must handle the necessary volume of client requests/responses with no risk of bottleneck.
Predominantly Dynamic Applications
Dynamic Web applications offer sophisticated and interactive services,
possibly with personalised content. An idiosyncrasy of dynamic applications is the strong interaction between Web technology and information
sources (usually, databases) for nearly every client request. To provide
adequate performance to serve dynamic resources may prove to be a non-trivial task, as there are several technologies for dynamic content generation, each with advantages and drawbacks. Choosing the wrong technology
may lead to poor performance of the entire application. Section 6.3 is
dedicated to the analysis of dynamic Web applications and related technologies.
Predominantly Secure Applications
Secure Web applications provide services that are protected due to security
and privacy concerns. Examples include on-line shopping and auction
applications, and home-banking services. Purchase is the most critical
operation in secure e-commerce applications, because sensitive information (e.g. credit card number) is exchanged. When users buy, security requirements become significant and include privacy, non-repudiation, integrity and authentication rules. The transactions should be tracked
throughout the entire user session and backed up in the case of failures.
The largest part of secure applications’ content is often generated dynamically; however, even static resources may need a secure transmission.
The maximum computational requirement in secure applications is due
to the initial public-key cryptographic challenge, which is needed to perform the authentication phase [18]. This is in accordance with a previous
result [25], which confirms that the reuse of cached SSL session keys can
significantly reduce client response time (by 15% to 50%) in secure
Web-based services.
Predominantly Multimedia Applications
Multimedia Web applications are characterised by a large amount of multimedia content, such as audio and video clips, animations or slide-shows.
Examples of multimedia applications include e-learning services, a few e-commerce services specialised in music (e.g. iTunes [28]), on-line radios,
and applications that offer a download section with a repository of multimedia files.
We recall that two modes are available for delivering multimedia resources: file download or content streaming. In the former case, the primary
design challenge is the same as for static Web applications, i.e. to provide
enough network bandwidth for downloading large multimedia files. As
multimedia resources are orders of magnitude larger than static resources,
bandwidth requirements are quite critical. In the latter case, streaming protocols complicate the design of a Web application, because streaming-based delivery of multimedia content introduces real-time constraints on packet scheduling [43], and often requires a network resource reservation protocol.
6.2.3 Workload Models and Performance Requirements
Knowing the composition of each service in terms of Web resources gives
a precise idea about the functional and software requirements for the design of the Web application, but only a rough approximation for the design
of the Web platform. Indeed, service characterisation alone is not enough
to quantify the amount of system resources that are needed to meet the
requested level of performance. For example, a service requiring many
system resources, but represented by infrequent accesses, may not have an
impact on the Web application. On the other hand, another service with
low resource requirements and frequent accesses may influence the performance of the entire Web system.
For these reasons, it is necessary to characterise a set of workload models
that represent the behaviour of clients when they access each Web application’s service. The combined knowledge, obtained from both the service
and the workload characterisation, permits us to identify the system and
Web resources that will be used most intensively. As we are interested in
serving a large number of users with many classes of services, it is important to define the main workload models that are expected to access the
Web application. Each workload model is represented by the workload intensity, which corresponds to the typical number of requests admitted in the Web application, and by the workload mixes, which correspond to the number of requests in each class of service. Hence, we can consider expected
workload models that reflect the typical volume and mix of requests supposed to reach the Web system, and also worst-case workload models that
reflect the maximum amount of client requests that are admitted in the
Web application.
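To make this concrete, a workload model can be captured in a small data structure. The following Java sketch is purely illustrative (the class name, service classes and numbers are all invented):

import java.util.Map;

// Illustrative workload model: an intensity (requests per second admitted
// into the Web application) plus a mix (fraction of requests per class of
// service; the fractions are assumed to sum to 1.0).
public class WorkloadModel {
    private final double requestsPerSecond;    // workload intensity
    private final Map<String, Double> mix;     // workload mix

    public WorkloadModel(double requestsPerSecond, Map<String, Double> mix) {
        this.requestsPerSecond = requestsPerSecond;
        this.mix = mix;
    }

    // Expected request rate for a single class of service.
    public double rateFor(String serviceClass) {
        return requestsPerSecond * mix.getOrDefault(serviceClass, 0.0);
    }

    public static void main(String[] args) {
        // Expected model: 200 req/s, mostly browsing. A worst-case model
        // would use a higher intensity and, say, a heavier "buy" mix.
        WorkloadModel expected = new WorkloadModel(200.0,
                Map.of("browse", 0.70, "search", 0.25, "buy", 0.05));
        System.out.printf("buy rate: %.1f req/s%n", expected.rateFor("buy"));
    }
}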
The next critical step is to quantify the level of adequate performance
for the expected set of workload models. Only after this choice is it possible to design and size the components and the system resources of the Web
system, according to performance and reliability expectations. The problem here is that it is quite difficult to anticipate the possible offered loads
to the Web application without any previous experience. The large number
of system- and user-oriented performance parameters (some of which are
reported in Section 6.4), as well as the types of reliability goals, make it
even more complicated to define exact levels of adequate performance
without testing the system under representative workload models. In practice, the definition of the performance expectations is an iterative process.
During the design phase, the commissioner can provide a rough idea of
workload intensity and mixes to the Web application designers and architects. It should be clear to both parties that the initial proposals do not represent a formal contract. On the other hand, the designers should be aware
that it is preferable to choose a Web application architecture that guarantees a safe margin over the expected performance (twice the load initially declared by the commissioner is not unusual).
Once the requirements of the Web system are defined in terms of Web
resources, workload models and performance expectations, the design and
deployment of the Web application become a matter of choosing the right
software and hardware technology. For this purpose, it is important to
know the main strengths and weaknesses of the most popular technologies.
We review in the following section those related to the dynamic-oriented
Web applications.
6.3 Predominantly Dynamic Applications
To describe the hardware and software design of a Web application, we
consider a system servicing resources that are mainly dynamic. This type
of application is highly popular and introduces interesting design challenges, hence we consider it a representative case for describing the proposed methodology of design and testing.
6.3.1 Dynamic Request Service
An abstract view of the main steps for servicing a dynamic request is presented in Fig. 6.2. Three main entities are involved in the request service:
the client, the Internet and the Web system. As we are more interested in
the server part, we detail the Web system components. There are three
main abstract functions that contribute to service a dynamic request: HTTP
interface, application logic and information source.
The HTTP interface handles connection requests from the clients through
the standard HTTP protocol and serves static content. It is not responsible
for the generation of dynamic content. Instead, the functions offered by the
application logic are at the heart of a dynamic Web system: they define the
logic behind the generation of dynamic content and build the HTML documents that will be sent back to the clients. Usually, the construction of a
Web page requires the retrieval of further data. The information source
layer provides functions for storage of critical information that is used in the
management of dynamic requests that are passed to the application logic.
The final result of the computations is an HTML (or XML) document that is
sent back to the HTTP interface for delivery to the client.
The use of three separate levels of functions has its advantages. The
most obvious is the modularity: if the interfaces among different abstraction levels are kept consistent, changes at one level do not influence other
levels. Another advantage is the scalability: the separation of abstraction
layers makes it easier to deploy them on different nodes. It is even possible to deploy a single level over multiple, identical nodes. Section 6.3.3 provides some insights on these possibilities.
Fig. 6.2. Abstract view of a dynamic request
The management of a dynamic request is the result of the interaction of
multiple and complex functions (see Fig. 6.2). Each of them can be deployed through different software technologies that have their own
strengths and weaknesses. Furthermore, they can be mapped in different
ways to the underlying hardware. A performance-oriented design must
address both issues: choose the right software technologies and the hardware architecture for the Web system. This is a non-trivial task that must
be solved through an extensive analysis of the main alternatives at the
software and hardware levels.
From the point of view of software technologies, the real design challenge resides in the choice of the appropriate application logic, since both
the HTTP interface and information source are well established. For the
HTTP interface, Apache has become the most popular Web server [35],
followed by other products, such as MS Internet Information Services, Sun
Java System Web Server, Zeus. All of them provide the main functions of
an HTTP server, but differ in the portability and efficiency levels that depend on operating systems and system platforms. Hence, the design of the
HTTP interface becomes a simple choice among one of the aforementioned products, as long as it is compatible with the underlying platform
and adopted software technologies. Apache works better with other open
source products, such as PHP, Perl, Python. MS IIS works better with Microsoft software technologies.
Similar considerations hold for the information source layer that handles
the storage and retrieval of data. This layer consists of a database management system (DBMS) and storage elements. There are many alternatives in the DBMS world, even if all of them are based on the relational
architecture and an SQL dialect. The most common products are MySQL
[33] and PostgreSQL [38] on the open source side, and MS SQL Server,
Oracle and IBM DB2 on the proprietary side. Hence, choosing the information source basically is a matter of cost, management, operating system
constraints, internal competences and taste.
In the following section, we use a notation that is widely adopted in the
Web literature. We refer to HTTP interface, application logic and information source also as front-end layer, middle layer and back-end layer, respectively.
6.3.2 Software Technologies for the Application Logic
The application logic of the middle layer is at the heart of a dynamic Web
application. This layer computes the information which will be used to
construct documents that are sent over a protocol handler. There is a plethora of software technologies which implement different standards. Each of
them has its advantages and drawbacks with respect to performance,
modularity, scalability. Let us distinguish the scripting from the component-based technologies.
Scripting Technologies
Scripting technologies are based on a language interpreter that is integrated
in the Web server software. The interpreter processes the code that is embedded in the HTML pages and that typically accesses the database. The
script code is replaced with its output, and the resulting HTML is returned
to the client. Static HTML code (also called HTML template) is left unaltered. Examples of scripting technologies include language processors
such as PHP [37], ASP [1] and ColdFusion [20].
Scripting technologies are efficient for dynamic content generation, because they are tightly coupled with the Web server. They are ideal for medium-sized, monolithic applications that require an efficient execution
environment. Other applications that benefit from scripting technologies
are characterised by large amounts of static, template HTML code that
embeds a (relatively) small amount of dynamically generated data. An
example is the ordinary product description page of an e-commerce application, which has an HTML template that is filled with variable information, retrieved from the database.
On the other hand, the tight coupling between the front-end and the
middle layer, which is typical of scripting languages, severely limits their
use in Web-related applications that require high scalability. Indeed, to
achieve scalability, it may be necessary to add nodes, but scripting technologies often lack integrated, high-level support for coordination and
synchronisation of tasks running on different nodes. This support can be
implemented through the use of function libraries, provided with the most
popular scripting languages. However, this requires an additional, significant programming effort. For this reason, scripting technologies are seldom used to deploy highly distributed Web-based services.
Component-Based Technologies
Component-based technologies use software objects that implement the
application logic. These objects are instantiated within special execution
environments called containers. A popular component-based technology
for dynamic Web resource generation is the Java 2 Enterprise Edition
(J2EE) [29], which includes specifications for Java Servlets, Java Server
Pages (JSPs), and Enterprise Java Beans (EJBs).
Java Servlets are Java classes that implement the application logic for a
Web application. They are instantiated within a Servlet container (such as
Tomcat [44]) that has an interface with the Web server. The object-oriented nature of Java Servlets enforces better modularity in the design,
while the possibility to run distinct containers, on different nodes, facilitates a system scalability level that could not be achieved by the scripting
technologies. Java Servlets represent the building blocks of the J2EE
framework. Indeed, they only provide the low-level mechanisms for servicing dynamic requests. The programmer must take care of many details,
such as coding the HTML document template, and organising the communication with external information sources. For these reasons, Java Servlets are usually integrated with other J2EE technologies, such as JSP and
EJB technologies.
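To make the layering concrete, the following minimal sketch shows a Servlet acting as application logic: it handles an HTTP GET, queries the information source through JDBC, and emits an HTML document. The JDBC URL, credentials and schema are invented for illustration; a real deployment would obtain a pooled DataSource via JNDI rather than calling DriverManager directly.

import java.io.IOException;
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Minimal middle-layer sketch: one dynamic page backed by a database query.
public class ProductServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse res)
            throws IOException {
        String body;
        try (Connection con = DriverManager.getConnection(
                 "jdbc:mysql://db-host/shop", "user", "secret");
             PreparedStatement ps = con.prepareStatement(
                 "SELECT name, price FROM products WHERE id = ?")) {
            ps.setInt(1, Integer.parseInt(req.getParameter("id")));
            try (ResultSet rs = ps.executeQuery()) {
                body = rs.next()
                     ? "<p>" + rs.getString("name") + ": "
                             + rs.getBigDecimal("price") + "</p>"
                     : "<p>No such product</p>";
            }
        } catch (Exception e) {
            res.sendError(HttpServletResponse.SC_INTERNAL_SERVER_ERROR);
            return;
        }
        res.setContentType("text/html");
        PrintWriter out = res.getWriter();
        out.println("<html><body><h1>Product</h1>" + body + "</body></html>");
    }
}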
JSPs are a standard extension defined on top of the Java Servlet API that
permits the embedding of Java code in an HTML document. Each JSP is
automatically converted into a Java Servlet, used to serve future requests.
JSP pages try to preserve the advantages of Java Servlets, without penalising Web pages that contain a large amount of static HTML templates,
and a small amount of dynamically generated content. As a consequence,
JSP is a better solution for dynamic content generation than plain Java
Servlets, which are more suitable to data processing and client request
handling. JSP is usually the default choice for dynamic, component-based
content generation.
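As a hedged sketch, the same kind of page written as a JSP keeps the static HTML template intact and confines the dynamic parts to small embedded Java expressions. The request attributes are hypothetical and would be set by a controller Servlet before forwarding to this page:

<%-- product.jsp: a mostly static HTML template with small dynamic parts. --%>
<html>
  <body>
    <h1>Product details</h1>
    <%-- "productName" and "productPrice" are assumed to have been set
         by a controller Servlet before forwarding here. --%>
    <p>Name: <%= request.getAttribute("productName") %></p>
    <p>Price: <%= request.getAttribute("productPrice") %></p>
  </body>
</html>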
EJBs are Java-based server-side software components that enable dynamic content generation. An EJB runs in a special environment called an
EJB container, which is analogous to a Java Servlet container. EJB provides native support for atomic transactions that are useful for preserving
data consistency through commit and rollback mechanisms. Moreover,
they handle persistent information across several requests. These added
functions introduce a performance penalty due to their overhead. They
should be used only in those services which require user session persistence among different user requests to the same application. Common examples include database transactions and shopping cart services in e-commerce applications.
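As an illustration of such session persistence, the sketch below outlines a shopping-cart component. For brevity it uses the later EJB 3 annotation style rather than the home/remote interfaces and XML descriptors of the original J2EE specification; all names are invented:

import java.util.ArrayList;
import java.util.List;
import javax.ejb.Remove;
import javax.ejb.Stateful;

// Stateful session bean: the container keeps one instance per client
// session, so the cart contents persist across that client's requests.
@Stateful
public class ShoppingCartBean {
    private final List<String> items = new ArrayList<>();

    public void addItem(String productId) { items.add(productId); }

    public List<String> getItems() { return new ArrayList<>(items); }

    // Called at checkout; the container then discards this instance.
    @Remove
    public void checkout() { items.clear(); }
}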
Technology Comparison
An interesting performance comparison between scripting and component-based technologies is provided in [14]. This study compares the PHP scripting technology against Java Servlets and EJB for the implementation of a simple e-commerce application. On the same hardware architecture, PHP provides better performance than the component-based technologies. The performance gain is 30% over Java Servlets, and
more than double with respect to EJB. On the other hand, Java Servlets
outperform the scripting technology when the system platform consists of
a sufficient number of nodes.
Figure 6.3 displays a qualitative performance comparison between the two software technologies, plotting system throughput as a function of client traffic volume. From this figure we can see that scripting technologies tend to reach their maximum throughput sooner than component-based technologies, because of their more efficient execution environment. Hence, component-based technologies tend to perform worse on small-to-medium-sized Web applications, but scale better than scripting technologies and can reach even higher throughput. The main motivation lies in
their high modularity, which allows for the distribution of the application
logic among multiple nodes.
Fig. 6.3. A qualitative comparison of software technologies
6.3.3 System Platforms
Once the logical layers and the proper software technologies needed to
implement the Web application are defined, they need to be mapped onto
physical nodes. Typically, we do not have a one-to-one mapping because
many logical layers may be located on the same physical node, and a single layer may be distributed among different nodes for the sake of performance, modularity and fault tolerance.
There are two approaches to map logical layers over the physical nodes,
called vertical and horizontal replications. In a vertical replication, each
logical layer is mapped to at most one physical node. Hence, each node hosts
one or more logical layers. In a horizontal replication, multiple replicas of
the same layer are distributed across different nodes. Horizontal and vertical
replications are usually combined to reach a scalable and reliable platform.
The simplest possible hardware architecture consists of a single node,
where all logical layers (front-end, middle, back-end) are placed on the
same physical node. This architecture represents the cheapest alternative
for providing a Web-based service; on the other hand, it suffers from multiple potential bottlenecks. In particular, the system resources can be easily
exhausted by a high volume of client requests. Moreover, the lack of
hardware component replication prevents fault tolerance in a single-node architecture. Explicit countermeasures, such as RAID storage and hot-swappable redundant hardware, may reduce the risks of single points of failure, but basically there is no real opportunity for reliability. We should also
consider that placing every logical layer on the same node has a detrimental effect on system security because once the node has been corrupted, the
entire Web system is compromised. From the above considerations, we
can conclude that the single node architecture is not a viable solution for
the deployment of a dynamic Web application that intends to ensure performance and reliability.
Vertical Replication
In a vertical replication, logical layers comprising the Web-based service
are placed into different nodes. The most common distributions for dynamic resource-oriented Web applications lead to the vertical architectures
that are based on two-node, three-node and four-node schemes. Figure 6.4
shows three examples of vertical replication.
In the two-node architecture, the three logical layers are distributed over
two physical nodes. There are three possible mappings between logical
layers and physical nodes. However, the typical solution is to have the
back-end layer on one node, and the front-end and middle layers on another. There are two main motivations for this choice. First, the tasks performed by a DBMS can easily exhaust the system resources of a single
node. Second, front-end and application logic may be tightly coupled, as in
the case of the scripting technologies; this makes separation of the logical
layers very hard (if not impossible). The distribution over two nodes generally improves the performance of the Web system, with respect to the
single node architecture. Fault tolerance still remains a problem, because a
failure in any of the two nodes causes a crash of the entire Web system.
In the three-node architecture, each logical layer is placed on a distinct
node. Due to the tight coupling between front-end and middle layer in
scripting technologies, an architecture based on at least three nodes is the
best choice for component-based technologies. For example, the J2EE
specification provides inter-layer communication mechanisms that facilitate the distribution of the front-end and the middle layer among the nodes.
Scripting technologies do not natively have similar mechanisms; hence
they have to be entirely implemented if the distribution of the layers, over
more than two nodes, is a primary concern of the architectural design.
Fault tolerance is still not guaranteed by three-node architectures, since a
failure in any node hinders the generation of dynamic Web content. However, the three-node solution helps improve performance and reliability,
with respect to the two-node architecture, as shown in [14].
Fig. 6.4. Vertical replication
Four-node architectures are usually the choice for J2EE systems that
distribute the middle layer between two physical nodes: one hosting the
business logic, encapsulated into the EJB container, and the other hosting
the application functions through the JSP Servlet Engine. It is convenient
to adopt this architecture due to overheads caused by the EJB component.
Vertical replication is widely adopted, not just for performance reasons.
When security is a primary concern, this hardware architecture is useful
because it allows the deployment of a secure platform between the nodes
through the use of firewalls. The possibility of controlling and restricting
communication among the nodes of a Web system aids in detecting security breaches, and in reducing the consequences of a compromised system.
In fact, the multi-layered architectures presented in Fig. 6.4 are a simplification of real systems that include network switches, authentication servers
and other security-oriented components.
Vertical and Horizontal Replications
Higher performance and reliability purposes motivate the replication of
nodes at one or more logical layers, which is called horizontal replication.
This replication type is usually combined with vertical replication. Figure
6.5 shows the combination of horizontal and vertical replication. In particular, it shows a three-layer system, where each of the three logical levels
is hosted over a cluster of identical nodes, connected through a high-speed LAN, and running the same components. Initial request distribution is
achieved through a component called Web switch, which may be implemented in hardware or software. To achieve horizontal replication, other
mechanisms are needed for distributing requests among the nodes of each
layer.
Fig. 6.5. Vertical and horizontal replication
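As a minimal illustration (not the algorithm of any specific product), a software Web switch could spread incoming requests over the replicas of one layer with a simple round-robin policy:

import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal round-robin dispatcher, in the spirit of a software Web switch.
public class RoundRobinDispatcher {
    private final List<String> nodes;          // e.g. addresses of replicas
    private final AtomicInteger next = new AtomicInteger(0);

    public RoundRobinDispatcher(List<String> nodes) {
        this.nodes = nodes;
    }

    // Pick the next node; the atomic counter keeps this thread-safe, and
    // floorMod keeps the index valid even after counter overflow.
    public String nextNode() {
        int i = Math.floorMod(next.getAndIncrement(), nodes.size());
        return nodes.get(i);
    }
}

A fault-aware switch would additionally skip nodes reported as failed by the fault detection mechanisms discussed later in this section.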
When choosing a Web system’s architecture, it is important to know
that the horizontal replication requires different efforts depending on the
layer concerned. For example, the replication of the front-end layer causes
fewer problems because the HTTP protocol is stateless and different HTTP
requests may be handled independently by different nodes [11].
The replication of the middle layer is rather complex due to two main
factors: first, the use of user session information by most applications; and
second, the type of software technology that is adopted for the implementation. Web-based services that use session information must be equipped
with mechanisms that guarantee data consistency. Scripting technologies
usually do not support such mechanisms natively; some rely on external
modules, others are forced to store session information in the back-end,
with risks of serious slowdowns. Even in component-based technologies,
the implementation of data consistency is not always straightforward. For example, Java Servlets do not provide native persistent data support, while
this is one of the strong attributes of EJB.
It is generally difficult to horizontally replicate even the back-end layer, because replication introduces data consistency issues that must be addressed
through difficult and onerous solutions [26]. Modern DBMSs are equipped
with the necessary mechanisms that guarantee horizontal replication of
databases, but such replication is typically limited to a few nodes.
The combination of vertical and horizontal replication helps to improve
important design objectives, such as scalability and fault tolerance, which
are crucial for obtaining an adequate level of performance and reliability.
In particular, horizontal replication allows the use of dispatching algorithms that tend to distribute the load uniformly among the nodes [10,3].
Moreover, hardware and software redundancy, provided by horizontal
replication, also helps to add a level of fault tolerance to the system.
More complete fault tolerance also requires fault detection and fail-over mechanisms. Fault detection mechanisms monitor system components and check whether they are operative. When a faulty component is detected, the dispatchers may be instructed to bypass that node;
meanwhile, there are fail-over mechanisms that allow us to substitute the
faulty node on-the-fly [12].
6.4 Testing Loop Phase
Once the Web-based service has been designed and deployed, it is necessary to verify its functional correctness (functional testing) and that performance and reliability requirements are satisfied for each considered workload model (performance testing).
Functional testing aims to verify that a Web application works as expected, without regard to performance. This type of test is carried out by
defining and reproducing typical behavioural user patterns for different
operations. Each request’s output is matched against an expected, desired
template. Unexpected behaviours, or failures, imply that the Web system
has not been deployed correctly, and that appropriate software corrections
are needed. Details of Web system debugging are outside the scope of this
chapter. Rather, we focus on performance testing, which allows us to verify whether the Web system guarantees the expected performance. The
performance testing of a dynamic resource-oriented Web application is a
non-trivial activity that requires the completion of different tasks: representation of the workload model, traffic generation, data collection and
analysis. Each is detailed in subsequent sections.
6.4.1 Representation of the Workload Model
The first crucial step of testing is the translation of the workload models
into a sequence of client requests, to be used by load generation tools to
reproduce the appropriate volume of traffic. The choice of a request stream
representation depends greatly on the complexity of workload patterns.
There are two main approaches, according to the complexity of the workload model.
Simple workload patterns, such as those of static resource-oriented Web
applications, are usually well represented through file lists (with their access
frequencies) and analytical distributions. While being fairly straightforward
to implement, these two representations present drawbacks that limit their
application to more sophisticated workloads. For example, file lists lack
flexibility with respect to the workload specification, and do not provide
any support for modelling the session-oriented nature of Web traffic.
Analytical distributions allow us to define a broad characterisation, since all features are specified through mathematical models. It is an open issue to
determine whether an analytical model reflects user behaviour in a realistic
way. From our experience, we can say that the large majority of studies
has been focused on static content characterisation [8,9,10], while fewer
studies consider Web applications with prevalent dynamic and personalised content. Studies related to Web publishing applications can be found
in [6,45], and the characterisation of on-line shopping applications has
been analysed in [7,45]. In addition, preliminary results for trading and
B2B applications can be found in [31] and [45], respectively.
The modelling of more complicated browsing patterns, such as those associated with on-line transactions, may require the creation of ad-hoc
workloads, through the use of file traces and, in some cases, the definition
of finite state automata.
File traces of a workload model are based on pre-recorded (or synthetically generated) logs, derived from Web server access logs. Traces aim to
capture the behaviour of users in the most realistic way. On the other
hand, the validity of tests depends strongly on the representativeness of a
trace. Some of them may show characteristics peculiar to a specific Web
application with no general validity. Furthermore, it may be difficult to
adjust the workload described by a trace to emulate future conditions, or
varying demands, as well as to reconstruct user sessions.
The workload model may be described through finite state automata,
where each state is associated with a Web page. A transition from one state to
another occurs with a predefined probability. A user think time is modelled
as a variable delay between two consecutive state transitions. The main
advantage of finite state automata lies in the possibility of defining complicated browsing patterns, which reflect modern consumer-oriented Web
applications. On the other hand, most of these patterns need to be manually
specified, which is an error-prone operation.
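A minimal sketch of such an automaton, with invented pages, transition probabilities and think-time parameters, might look as follows:

import java.util.Map;
import java.util.Random;

// Illustrative finite state automaton for a user session: each state is a
// page, transitions occur with fixed probabilities, and a random think
// time separates consecutive requests.
public class SessionAutomaton {
    // state -> (next state -> probability); each row sums to 1.0
    private static final Map<String, Map<String, Double>> TRANSITIONS = Map.of(
        "home",    Map.of("search", 0.6, "product", 0.3, "exit", 0.1),
        "search",  Map.of("product", 0.7, "search", 0.2, "exit", 0.1),
        "product", Map.of("buy", 0.2, "search", 0.5, "exit", 0.3),
        "buy",     Map.of("exit", 1.0));

    private final Random rnd = new Random();

    String nextState(String current) {
        double p = rnd.nextDouble(), acc = 0.0;
        for (Map.Entry<String, Double> e : TRANSITIONS.get(current).entrySet()) {
            acc += e.getValue();
            if (p <= acc) return e.getKey();
        }
        return "exit";
    }

    // Exponentially distributed think time with a 7-second mean (arbitrary).
    long thinkTimeMillis() {
        return (long) (-7000.0 * Math.log(1.0 - rnd.nextDouble()));
    }
}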
6.4.2 Traffic Generation
Once the proper representation is chosen, the client request stream has to
be replayed through a traffic generator. The main goal of a traffic generator is to reproduce the specified traffic in the most accurate and scalable
way. It also has to realistically reproduce the behaviour of a fully
featured Web browser, with support for persistent HTTP connections,
cookie management, or secure connections through the HTTPS protocol.
There are four main approaches to generate a stream of Web requests:
trace-based, file list-based, analytical distribution-driven and finite-state
automata-based. Each depends on the workload model that is used as the
base for traffic generation.
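A single emulated client can be sketched in a few lines. The loop below replays a list of URLs, records per-request response times and inserts think times; the URLs are hypothetical, and a realistic generator would also handle cookies, persistent connections and HTTPS:

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

// Minimal single-client traffic generator.
public class ClientEmulator {
    public static List<Long> replay(List<String> urls, long thinkMillis)
            throws Exception {
        List<Long> responseTimes = new ArrayList<>();
        for (String u : urls) {
            long start = System.currentTimeMillis();
            HttpURLConnection con =
                (HttpURLConnection) new URL(u).openConnection();
            try (InputStream in = con.getInputStream()) {
                while (in.read() != -1) { /* drain the whole response */ }
            }
            responseTimes.add(System.currentTimeMillis() - start);
            Thread.sleep(thinkMillis);   // user think time between clicks
        }
        return responseTimes;
    }
}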
The imitation of wide area network effects is another important factor
that must be taken into consideration during the performance tests. It has
been shown that, even in the presence of static resource-oriented Web applications, the performance evaluation is significantly altered if the network is
perturbed by routing delays, packet losses or client bandwidth limitations
[34]. If these are not taken into consideration, which typically occurs if the
Web application’s performance is evaluated using a LAN, then measured
performance differs significantly from the reality, thus making the test
results almost useless.
6.4.3 Data Collection and Analysis
Data collection is strictly related to the two main goals of the performance
testing analysis: to understand if a Web system is performing adequately
and, if the expectations are not satisfied, to find the possible causes of performance slowdowns. These are the purposes of black-box and white-box testing, respectively.
Data collection addresses two main issues of sampling: the choice of
representative metrics (here synonymous with measures) for a given performance index, and the granularity
of samples. The former problem is independent of black-box versus white-box
testing. As Web workload is characterised by heavy-tailed distributions
[9], many performance indexes may assume highly variable values with
non-negligible probability. Therefore, evaluating the performance only on
the basis of mean, minimum and maximum values may not yield a representative view of a Web system’s behaviour. When performance indexes
are subject to high variability, the use of more informative statistics, such as percentiles or cumulative distributions, is highly recommended. This, in turn,
typically requires the storage of every sample.
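For instance, the following sketch computes an arbitrary percentile over stored response-time samples; unlike a running mean, it needs every collected value:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Percentile computation over stored samples.
public class Percentile {
    // p is in (0, 1]; e.g. percentile(responseTimes, 0.95) gives a far more
    // robust view of a heavy-tailed distribution than the mean alone.
    public static long percentile(List<Long> samples, double p) {
        List<Long> sorted = new ArrayList<>(samples);
        Collections.sort(sorted);
        int index = (int) Math.ceil(p * sorted.size()) - 1;
        return sorted.get(Math.max(0, index));
    }
}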
The choice of the sample granularity is related to testing goals. Sampling of performance indexes may occur at different levels of granularity,
mainly system and resource. The former is more related to black-box testing, the latter to white-box testing.
Black-box Testing
The main goal of black-box testing is to check whether the Web system is
able to meet performance and reliability requirements, for each workload
model, with safe margins. Black-box testing is related to system performance indexes that quantify the performance of the entire Web system, as
seen from the outside. These are typically coarse-grained samples that aim
to verify whether the Web system is performing adequately or not. Many
performance indexes may be obtained from the running system, for different purposes. For example, the throughput of a Web system in terms of
served pages/hits/bytes per second may be of interest for the administrator,
to check if the architecture is able to deliver the requested traffic volume.
A Web page response time, i.e. the elapsed time from a user’s click until
the arrival of the last object composing a Web resource, is of main interest
for users for whom system throughput is of no concern, but who wish to
check how long they must wait to use a given service. Both
indexes reflect the performance of the entire Web system, from different
points of view. Although we do not consider a QoS-based Web application
that has to comply with rigorous Service Level Agreements (SLAs), it is worth mentioning some soft constraints that are generally accepted in the world of
interactive Web-based services. For example, a previous study by IBM
[16] provides a ranking of performance parameters (ranging from unacceptable to excellent) in terms of response time for a typical Web page
loaded by a dial-up user. The study concludes that a Web page response
time higher than 30 seconds is unacceptable, while everything below
20 seconds is considered at least adequate. In [36], the reaction of broadband users to different Web page download times is analysed. One conclusion is that the limit to keep a user’s attention focused on the browser window is about 10 seconds. Longer-lasting Web page downloads lead users
towards other tasks.
It is important that a Web system works within safe performance margins. The consequences of this claim are twofold: first, the system must
follow the performance requirements for any expected workload; second,
when the system is subject to the maximum expected workload intensity, it
should not show signs of imminent congestion. The former requirement
can be verified through a set of independent black-box tests for each representative workload model. The latter requirement is motivated by the observation that a Web system may meet all of its performance requirements,
but with critically utilised resources. Such a situation is unacceptable
because a burst of client arrivals may easily saturate the resource, thus
slowing down the entire Web system. To avoid the risk of drawing false
conclusions about a system’s performance reliability, we can use a white-box test, or carry out black-box tests to evaluate performance trends. For
now we will remain within the context of black-box testing.
Performance trends can be evaluated as a function of different workload
mixes, or workload intensities. In the latter case, we can evaluate the page
response time as a function of an increasing traffic volume reaching the
Web system, possibly even slightly higher than the maximum expected
workload intensity.
Fig. 6.6. Analysis of performance trends of the Web system
Figure 6.6 gives an example of performance trend evaluation in a system where the maximum expected workload intensity and the adequate performance are clearly defined. Three performance curves (P1, P2, P3) are considered, in addition to response times obtained for the maximum expected workload intensity (MEWI). If we limit the black-box analysis to
the MEWI point, we can conclude that P1 and P2 are acceptable, whereas
P3 does not respect the required service level. However, a trend black-box
analysis indicates that even P2 is not safe enough. Indeed, at the maximum expected workload intensity, the P2 curve already shows exponential growth. For both the P2 and P3 cases, the black-box test should be considered failed, and we should resort to white-box testing to identify the causes of the (possible) bottleneck.
White-Box Testing
During black-box testing, it is not necessary to consider the internal parts
of a Web system. We can sample performance indexes outside the Web
system while it is used with each of the predefined workload models. Conversely, white-box testing aims to evaluate the behaviour of a Web platform’s internal system components, for different client request rates (usually, around the maximum expected workload intensity). This task can
optionally be performed after a black-box test, to be sure that the utilisation of Web system components is well below critical levels. However, it
becomes compulsory in two situations: when the system is violating the
performance requirements (e.g., the P3 curve in Fig. 6.6), and when the
trends indicated by black-box testing suggest the presence of possible bottlenecks (e.g. the P2 curve in Fig. 6.6). To find potential or actual Web
system bottlenecks, it is necessary to carry out more detailed analysis,
which takes finer-grained performance indexes into account. White-box
testing is carried out by applying the expected workload models to the
Web-based service, and by monitoring its internals to ensure that system
resource utilisation does not exceed critical levels. For this purpose, we use
resource performance indexes that measure the performance of a Web
system’s internal resources. They help identify the most utilised Web system components. Examples of resource indexes include component utilisation (CPU, disk, network, central memory) and the consumption of limited software resources (such as file and socket descriptors, or process table entries).
These fine-grain resource performance indexes require additional tools that
must be executed during the test. Some tools are publicly available within
ordinary UNIX operating systems [39,42], but they do not provide samples
for every system resource, hence modifications to the source code (when
available) are sometimes necessary.
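On Linux, for example, coarse CPU utilisation can be derived from two readings of /proc/stat, as in this sketch (Linux-specific; the fields of the first line are user, nice, system, idle, and so on):

import java.nio.file.Files;
import java.nio.file.Paths;

// White-box sampling sketch: CPU utilisation from two /proc/stat readings.
public class CpuSampler {
    private static long[] read() throws Exception {
        String line = Files.readAllLines(Paths.get("/proc/stat")).get(0);
        String[] f = line.trim().split("\\s+");
        long total = 0, idle = Long.parseLong(f[4]);   // f[0] is "cpu"
        for (int i = 1; i < f.length; i++) total += Long.parseLong(f[i]);
        return new long[] { total, idle };
    }

    public static double utilisation(long intervalMillis) throws Exception {
        long[] a = read();
        Thread.sleep(intervalMillis);
        long[] b = read();
        double busy = (b[0] - a[0]) - (b[1] - a[1]);   // delta total - delta idle
        return busy / (b[0] - a[0]);
    }
}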
Once white-box testing has indicated the nature of the bottleneck(s) affecting the Web system, it may still be necessary to collect additional information to understand the causes of the problem. This allows us to plan
appropriate actions for removing the bottleneck. An insufficient amount of
information concerning the problem limits the range of effective and efficient interventions to improve performance and reliability. To deepen the
analysis, it is necessary to inspect the Web system at an even finer granularity, that of program functions. This allows us to identify hot spots: that
is, critical sections of executable code consuming a significant amount of
bottleneck resources.
Performance indexes at the function level are associated with the functions
of each executable program, including the operating system’s kernel. Common examples include the function call frequency and the percentage of
time spent by the program in each main function. Function-level analysis
requires special tools [30] that collect statistics and provide customisable
views for function accesses and service times.
After the bottleneck removal step, a new testing phase follows (involving black-box and white-box testing), in order to verify that performance
and reliability targets have been achieved. As outlined in Section 6.1, the
entire procedure is a fix-and-test loop that may require several attempts to
achieve the desired goals. In the next section, we detail the various categories of possible interventions to remove potential bottlenecks.
6.5 Performance Improvements
Whenever a performance test fails, the Web system is not operating adequately or reliably, and proper intervention is required. As already outlined
in Section 6.4, test failures can happen in different cases. First, black-box
testing can indicate a performance that is below the expected level. Second, even if the goals for “adequate performance” are met, performance
can still be compromised by high utilisation of system resources that can
lead to a bottleneck, if the client request traffic further increases. Finally, it
is often interesting to carry out capacity planning studies that test the system under expected future workload models. These studies tend to put
more stress on the resources of the Web system, which may cause saturation and introduce new bottlenecks that need to be removed. In this section, we discuss three main interventions for improving the performance
and reliability of a Web system: system tuning, scale-up and scale-out.
6.5.1 System Tuning
System tuning aims to improve system performance by appropriately
choosing operating system and application parameters. There are two major ways: first, to increase available software resources related to the operating system and critical applications; second, to reduce hardware resource
utilisation. A typical intervention to improve the capacity of software resources is to raise the number of available file descriptors, sockets and process descriptors. Alternatively, sophisticated mechanisms, such as
caching and pooling, are adopted to limit the utilisation of critical system
hardware or software resources. Caching avoids information recomputation by preserving it in memory. Examples of commonly cached
entities include database queries and Web pages. In resource pooling, multiple software resources are generated and grouped into a set (called pool),
previous to being used, so they become immediately available upon request. They are not destroyed on release, but returned to the pool. The
main advantage of pooling is the reduced overhead of resource creation
and destruction, saving system resources. The TCP connections (especially, persistent TCP connections to a DBMS) are typical resources handled through a pool, because they are expensive to setup and destroy.
The size of caches and resource pools is a typical parameter to be tuned.
Increasing the size tends to avoid re-computation (as in the case of caches)
and to reduce set-up/destruction overheads (as in the case of pooling). In
both cases, the utilisation of critical system resources is reduced. However,
restrictions in the available amount of memory (both main and secondary)
and operating system resources (e.g. socket and file descriptors) limit the
maximum size of caches and pools.
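To make the pooling mechanism concrete, here is a minimal, self-contained sketch of a fixed-size resource pool in Python (our illustration; the dbapi driver named in the usage note is hypothetical):

import queue

class ConnectionPool:
    """Minimal fixed-size resource pool: resources are created up front and
    recycled on release instead of being destroyed."""

    def __init__(self, factory, size):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(factory())

    def acquire(self, timeout=None):
        # Blocks until a pooled resource becomes available.
        return self._pool.get(timeout=timeout)

    def release(self, resource):
        # Return the resource to the pool rather than destroying it.
        self._pool.put(resource)

# Hypothetical usage with a DB-API driver exposing connect():
# pool = ConnectionPool(lambda: dbapi.connect(host="db", user="app"), size=20)
# conn = pool.acquire(); ...run queries...; pool.release(conn)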
6.5.2 System Scale-up
Scale-up consists of upgrading one or more hardware resources, without adding new nodes to the underlying architecture. This intervention is
necessary whenever white-box testing shows (the risk of) a saturated
hardware resource (e.g. disk bandwidth or CPU power). Usually, a hardware upgrade is straightforward and does not require extensive system
analysis. However, two points are important when performing scale-up.
First, hardware upgrades are useless if an operating system resource (such
as file descriptors) is exhausted. In such a scenario, adding hardware does
not increase the capacity of the exhausted resource. Second, performance improvements may often be obtained at lower cost through the parameter tuning discussed previously.
6.5.3 System Scale-out
System scale-out interventions aim at adding nodes to the platform. This can
be achieved through vertical or horizontal replications. A vertical replication
deploys the logical layers over more nodes (e.g. it may pass from a two-node
to a three-node architecture); a horizontal replication adds nodes to one or
more layers. Both interventions improve system performance. However, horizontal replication can also be used to improve the Web system’s reliability. As redesigning the platform implies non-negligible costs in terms of time and money, scale-out should be used only when no performance improvement can be achieved through scale-up. Furthermore, not all software technologies are well suited to scale-out. For example, as Section 6.3 discusses, scripting technologies do not provide any native support for service distribution; hence, system scale-out would imply a massive redesign of the applications supporting the Web-based service.
An even greater scale-out intervention may be necessary when performance degradation is caused by the network connecting the Web system to
the Internet (the so-called first mile). Indeed, locally distributed Web
server systems may suffer from bottlenecks that affect the capacity of outbound connections [4]. Performance and scalability improvements can be
achieved through a geographically distributed architecture that is managed
by the content provider, or by resorting to outsourcing solutions. The deployment of a geographically distributed Web system is expensive and requires uncommon skills. As a consequence, only a few large organisations can afford to handle geographical scale-out by themselves. An alternative is to employ Content Delivery Networks (CDNs) [2], which, by taking over Web content and service delivery, relieve the content provider of the design and management of a complex, geographically distributed architecture. There are many aspects that cannot be exhaustively described in this chapter; for more details on geographically distributed Web applications the reader can refer to [40].
6.6 Case Study
We present a case study that illustrates the main steps introduced in Section
6.1, and detailed in the subsequent sections. After the characterisation of a
Web-based service and workload models, we show a possible design and
deployment of a Web system. We then carry out white-box and black-box
performance testing, aimed at finding and removing system bottlenecks.
6.6.1 Service Characterisation and Design
Web Resources Characterisation
The application used as a case study is an on-line shop Web application
that allows users to browse a product catalogue and to purchase goods.
These two main user interactions with the Web system illustrate the type
of Web resources that will be used. In particular, the workload mix of the
Web-based service is characterised by a few static Web resources mainly
related to product images. In addition, most HTML documents are generated dynamically. Within the context of this case study, we assume that an
external payment gateway system is used; as a consequence, the Web system does not serve secure Web resources. The Web-based service characteristics correspond to those of a predominantly dynamic Web application
(see Section 6.2.2).
Workload Model Characterisation
The set of expected workload models for a Web application captures the
most common user interactions with the Web-based service. We consider
two workload models, namely browsing and buying, which have their
workload mix shown in Table 6.1. The browsing workload model is represented predominantly by product browsing actions, which use static and
dynamic resources. The presence of static content is motivated by the high
amount of images shown during browsing. The buying workload model is
represented by purchase transactions involving a high amount of dynamic
resources. Table 6.1 also shows that no secure, volatile and multimedia
resources are present in either workload model.
The number of clients accessing the application can change at different temporal scales (daily, weekly, seasonal). However, we assume that the maximum expected workload intensity does not exceed 400 concurrent users. Whenever this threshold is reached, the Web system rejects connection requests. We will refer to this maximum workload intensity when defining the Web system’s performance requirements.
Table 6.1. Composition of the workload models

Workload model   Static resource requests (%)   Dynamic resource requests (%)
Browsing         60                             40
Buying           5                              95
Performance Requirements
Once the workload models have been defined, performance requirements
for each workload mix need to be set. The page response time was chosen
as the main system parameter. The first performance requirement to be
defined is related to user-perceived performance. A previous study [17]
showed that a Web page download time exceeding 25 seconds is perceived
as slow by ordinary dial-up users. However, due to the growing number of
x-DSL and cable modem connections, we chose to use as the basis for
performance evaluation page response times that represent faster connections (e.g. ADSL links). Nielsen [36] suggests that an acceptable response
time threshold for page downloads, using high-bandwidth Internet connections, is 10 seconds.
Due to the heavy-tailed distribution of page response times, we consider the response time’s 90th percentile: performance requirements are met only if this percentile is below the 10 second threshold. For a system-oriented view, we also evaluate system throughput, measured in pages served per second.
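As a small worked example (ours, with hypothetical measurements), the 90th percentile can be computed from collected page response times with the nearest-rank method:

def percentile(samples, p):
    """Return the p-th percentile (0 < p <= 100) of the samples,
    using the nearest-rank method."""
    ordered = sorted(samples)
    rank = max(1, int(round(p / 100.0 * len(ordered))))
    return ordered[rank - 1]

# Hypothetical page response times (in seconds) from a test run:
response_times = [0.8, 1.1, 0.9, 7.5, 1.4, 2.0, 9.2, 1.2, 1.0, 3.3]
print("90th percentile: %.1f s" % percentile(response_times, 90))
print("requirement met:", percentile(response_times, 90) <= 10.0)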
System Design
To design a Web system, software technologies for each of its three logical
layers must be chosen. Due to its critical nature, we find it convenient to
focus on the middle layer. Since the Web application is of medium size, no
extreme scalability requirements are to be met. Hence, we can assume that
this system will not use a highly distributed architecture. In addition, many
pages are represented by a fixed template, with a significant amount of
static HTML code. The application’s size and the presence of large HTML
page templates suggest that the application’s middle layer can be deployed
using a scripting technology.
We chose PHP [37] as the scripting language because of its efficiency and because it is open source, which reduces deployment costs. PHP integrates easily into the Apache Web server [5], our choice for the front-end layer. Finally, we chose MySQL [33] as the DBMS for the back-end layer, motivated by the fact that MySQL is also open source and widely adopted. Furthermore, it offers adequate performance,
considering the size of our application, and it is well supported by the
PHP interpreter.
Next, we need to map the three logical layers onto physical nodes.
Scripting technologies typically lead to a two-node vertical architecture.
Indeed, separation of the middle and the back-end layers on different
nodes is a common choice in most medium-sized Web applications. Due to
the application’s performance requirements, we can dismiss horizontally
replicated architectures, which would introduce significant complexity to
the middle layer software.
We can summarise the design choices for the deployment of the Web
application as follows. One node runs both the Apache Web server (version
2.0) and the PHP4 engine, used for the front-end and middle layers, respectively. The back-end layer is on a separate node running MySQL database
server (version 4.0). All computers are based on the Linux operating system with kernel version 2.6.8. Each node is equipped with a 2.4 GHz
hyperthreaded Xeon, 1GB of main memory, 80GB ATA disks (7200 rpm,
transfer rate 55MB/s) and a Fast Ethernet 100Mbps adapter.
6.6.2 Testing Loop Phase
Initial black-box testing was carried out to verify whether the Web system satisfies the performance requirements for all the workload models considered.
The test-bed architecture is rather simple, as shown in Fig. 6.7: a node
hosts the client emulator; the other two nodes comprise the platform that
hosts the Web application.
Fig. 6.7. Architecture of the test-bed for the experiments
The client emulator creates a fixed number of client processes, which
instantiate sessions made up of multiple requests to the e-commerce system. For each customer session, the client emulator opens a persistent
HTTP connection to the Web server, which lasts until the end of the session. Session length has a mean value of 15 minutes. Before initiating the
next request, each emulated client waits for a specified time, with an average of 7 seconds. The sequence of requests is emulated by a finite state
machine that specifies the probability of passing from one Web page request to another.
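A minimal sketch of such a client emulator is given below. This is our illustration: the page set, the transition probabilities, and the exponential think-time distribution are assumptions, not taken from the chapter.

import random
import time

# Hypothetical transition probabilities between page requests; each row sums
# to 1 and plays the role of the finite state machine described in the text.
TRANSITIONS = {
    "home":     [("browse", 0.7), ("search", 0.3)],
    "browse":   [("browse", 0.5), ("cart", 0.3), ("home", 0.2)],
    "search":   [("browse", 0.8), ("home", 0.2)],
    "cart":     [("checkout", 0.6), ("browse", 0.4)],
    "checkout": [("home", 1.0)],
}

def next_page(current):
    # Draw the next request according to the transition probabilities.
    draw, cumulative = random.random(), 0.0
    for page, probability in TRANSITIONS[current]:
        cumulative += probability
        if draw < cumulative:
            return page
    return TRANSITIONS[current][-1][0]

def emulate_session(mean_think_time=7.0, session_length=900.0):
    # Walk the state machine for about 15 minutes, pausing an exponentially
    # distributed think time (mean 7 s, an assumption) between requests.
    page, elapsed = "home", 0.0
    while elapsed < session_length:
        # A real emulator would issue an HTTP request for `page` here.
        think = random.expovariate(1.0 / mean_think_time)
        time.sleep(think)
        elapsed += think
        page = next_page(page)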
To take into account the wide area network effects, we use a network
emulator, based on the netem packet scheduler [27], that creates a virtual
link between the clients and the e-commerce system with the following
characteristics: the packet delay is normally distributed with μ = 200 ms and σ = 10 ms, and the packet drop probability is set to 1%. Bandwidth limitation in the last mile (i.e. the client–Internet connection) is provided directly
by the client emulator.
Black-Box Testing
Initially, we consider system-level measures to determine the capacity of
the Web system. We carry out tests with browsing and buying workload
models, and measure the system’s throughput and the Web page response
time for different values of the client population.
Figure 6.8 shows the system’s throughput (measured as Web pages
served per second, including embedded objects) as a function of four client
populations for both browsing and buying workload models. The browsing
workload model shows a close-to-linear throughput increase with the user
population, while the histogram of the buying workload model shows a
clear throughput saturation occurring between 300 and 400 clients. Further increases in the user population beyond 300 units do not improve the system throughput, which remains close to 40 pages per second.
Fig. 6.8. Web system’s throughput
In addition, we also assessed the system response time’s 90th percentile
for both workload models, for different client populations (see Fig. 6.9).
The browsing model shows response times well below the 10 second threshold. On the other hand, the buying workload model shows an increase of nearly one order of magnitude (from 0.9 to 8.8 seconds) in page response time as the population increases from 300 to 400
clients. The expected performance requirement is met: a response time of
8.8 seconds is still below the threshold.
However, the sudden growth in the response time, in association with a
critical throughput, is an indication that a bottleneck occurs in the system
when the number of clients is between 300 and 400. We also check whether the response time’s steep growth trend persists with a higher number of clients. For this reason we continue our black-box testing, increasing the number of clients up to 500; moving from 400 to 500 clients produces a relatively small increase in response time compared to that observed between 300 and 400 clients.
Fig. 6.9. Page response time’s 90th percentile
In addition, we observe a non-negligible number of errors in the provision of Web pages and refusals of
client requests. The reason for this behaviour cannot be solely explained
using black-box testing, thus further analysis is necessary.
The negative trend, indicated by black-box testing, and a performance
value close to the set threshold, suggest the need to plan and undertake a
countermeasure to improve the system’s performance, as described in Section 6.4. To understand the directions that must be followed, it is necessary
to carry out white-box testing.
White-Box Testing
We now present the results of white-box testing, in which we investigate
the utilisation of the Web system’s internal resources. The main goal is to
investigate the causes of performance degradation, indicated by black-box
testing, when the system is subjected to the buying workload model for a
client population between 300 and 400. Furthermore, white-box testing can
also help us to understand the reasons for the errors in client request service that were observed for a client population of 500 users. We chose to initiate the white-box analysis using a population of 400 clients, as it is close to the point where the system’s performance declined.
Finer-grained performance evaluations take into account resource performance indexes, such as CPU, disk and network utilisation. Table 6.2
shows the results for white-box testing, for different resources. Utilisation
values are reported as the sample averages throughout the entire test’s
duration. Table 6.2 suggests that the system’s bottleneck is caused by the
CPU of the node hosting the back-end layer.
This is confirmed by the curve in Fig. 6.10, which displays the CPU
utilisation of the back-end node during the experiments (the horizontal
dashed line represents the mean value). A CPU utilisation of 0.9 is a clear sign of resource over-utilisation, which may underlie a system bottleneck. The 80%–20% split between the time spent in user and kernel mode, respectively, suggests that the application-level computations of the DBMS are far more intensive than the overhead imposed by operating system calls.
White-box tests also show an unexpected result: even if the bottleneck is
related to the DBMS, the disk utilisation is quite low (0.015). The explanation lies in the size of the database, which fits almost completely within the 1 GB of main memory. We conclude that, for
this case study, the disk activity does not represent a potential bottleneck
for the back-end node hosting the database server. Instead, the bottleneck’s cause must be sought in the back-end node’s CPU. Our experiments
confirm that due to hardware improvements, even at the entry level, it is
becoming common for medium-size e-commerce databases to fit almost
completely in main memory.
Fig. 6.10. CPU utilisation of the back-end layer node
White-box testing can also help explain errors occurring when the system is subject to heavier load, as in the case of a population of 500 clients.
In this case, the finite queue of pending connection requests gets saturated.
As a consequence, future client connection attempts are refused, and the
actual amount of requests served by the system is only slightly higher than
the amount served using 400 clients. The ultimate consequence is that the
Web system tends asymptotically to saturation, as shown by the black-box
analysis. However, since the saturation of the pending connection queue only occurs after the back-end node’s bottleneck arises, we concentrate on the latter, which has the more significant impact on performance.
Table 6.2. Resource utilisation (white-box testing)

Performance index                 Front-end and middle layer   Back-end layer
• CPU utilisation                 0.31                         0.90
  − user mode                     0.21                         0.76
  − kernel mode                   0.10                         0.14
• Disk utilisation                0.003                        0.015
• Network interface utilisation   0.012                        0.002
The next step aims to eliminate the system’s bottleneck. To this end, it
is necessary to use operating system monitors to help identify what is causing the bottleneck. Monitors allow us to detect the software component
that is utilising a large part of the CPU in user mode. As there is only one
major application running on the back-end layer node, we can deduce that
the MySQL server process is the source of the bottleneck. However, if we limit the analysis to the process granularity level, we gain no further insight. It is necessary to evaluate finer-grained indexes, at the function level. These indexes will permit us to identify the source of the problem and to fully explain the causes of the inadequate performance.
We present the results of the same experiment, executed under an efficient operating system profiler [30], on the node hosting the DBMS. The
profiler output shows more than 800 DBMS functions; hence a detailed
analysis of all function access times and frequencies is quite difficult and of little use. The idea is to focus on the functions that consume the most CPU time, while aggregating the other functions, which are not significant for the bottleneck analysis. The evaluation shows that most CPU time is consumed
by consistency checks on stored data, so we can conclude that the real
cause of the bottleneck is represented by the asynchronous I/O subsystem
adopted in the MySQL process.
6.6.3 System Consolidation and Performance Improvement
The results of the white-box test show that the asynchronous I/O operations on the DBMS require more CPU power than is currently available. We have three possible interventions to address this issue: system
scale-out, system scale-up and system tuning. The goal now is to understand the most appropriate solution.
Scaling out the system is not the best approach to solve the problem.
Vertical replication is not effective in reducing the load on the back-end
node because it does not allow us to distribute the DBMS over multiple
nodes. The only viable solution would be a horizontal replication of the
DBMS, but MySQL has no native support for managing consistency in a database distributed over multiple nodes. Furthermore, such an intervention would require a mechanism to distribute queries over the multiple back-end nodes. This means a complete redesign of the back-end layer and also of significant portions of the middle layer. For these reasons, we avoid interventions based on system scale-out.
Scale-up is a viable solution: upgrading the existing hardware is a
straightforward approach, in particular if we increase the CPU speed for
the back-end node. However, it is also worthwhile to investigate if tuning
the DBMS’s parameters can be the solution. As the problem is in the asynchronous I/O subsystem, we can try to reduce the asynchronous I/O activity by decreasing the number of buffer accesses. For example, this can be
accomplished by increasing the query cache’s size.
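In MySQL, this kind of tuning is performed through server configuration variables. A hypothetical my.cnf excerpt follows; the values are purely illustrative, as the chapter does not report the configuration actually used:

# Excerpt from my.cnf (illustrative values, not the chapter's actual
# configuration): enable the query cache and enlarge it so that repeated
# read queries are answered from memory instead of being re-executed.
[mysqld]
query_cache_type = 1      # cache all cacheable SELECT results
query_cache_size = 64M    # memory reserved for cached result sets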
After this intervention, we re-evaluate the system performance with a
second test phase. We find that the CPU utilisation on the back-end node
diminishes from 0.9 to 0.6. As expected, reducing the CPU bottleneck on
the back-end node improves the performance of the overall system. Figure
6.11 shows the cumulative distributions for the system’s response time
before and after reconfiguring the system’s parameters. It confirms the
validity of the intervention, as the 90th percentile for response time drops
from 8.8 to 3.1 seconds after the database tuning.
Fig. 6.11. Cumulative distribution functions for system’s response time
6.7 Conclusions
Web applications are becoming a critical component of the information society. Modern Web-based services handle a wide, heterogeneous range of
data, including news, personal and multimedia data. Due to their growing
popularity and interactive nature, Web applications are vulnerable to increases in volume of client requests, which can hinder both performance and
reliability. Thus, it is fundamental to design Web systems that can guarantee
adequate levels of service at least within the expected traffic conditions.
Throughout this chapter we presented a methodology for designing mission-critical Web systems that takes performance and reliability requirements into account. The proposed approach is conceptually valid for every Web application, although we focus mainly on systems dominated by dynamic content, since this category presents, in our view, interesting design and implementation challenges.
Besides the design process, we describe the issues related to performance testing by showing the main steps and the goals of black-box and
white-box performance tests. We finally consider some interventions that
can be carried out whenever performance tests are not satisfactory. After
the identification of the causes of violation, we present the main interventions to improve performance: system tuning, system scale-up and system scale-out. The chapter concludes with a case study in which we apply the proposed methodology to a medium-size e-commerce application.
Acknowledgements
The authors acknowledge the support of MIUR in the framework of the
FIRB project “Performance evaluation of complex systems: techniques,
methodologies and tools” (PERF).
References
1 Active Server Pages (2004) http://msdn.microsoft.com/asp
2 Akamai Technologies (2005) http://www.akamai.com
3 Andreolini M, Colajanni M, Morselli R (2002) Performance study of dispatching algorithms in multi-tier Web architectures. ACM SIGMETRICS Performance Evaluation Review, 30(2):10–20
4 Andreolini M, Colajanni M, Nuccio M (2003) Kernel-based Web switches providing content-aware routing. In: Proceedings of the 2nd IEEE International Symposium on Network Computing and Applications (NCA), Cambridge, MA
5 Apache Web server (2005) http://httpd.apache.org
6 Arlitt MF, Jin T (2000) A workload characterization study of the 1998 World Cup Web site. IEEE Network, 14(3):30–37
7 Arlitt MF, Krishnamurthy D, Rolia J (2001) Characterizing the scalability of a large scale Web-based shopping system. ACM Transactions on Internet Technology, 1(1):44–69
8 Arlitt MF, Williamson CL (1997) Internet Web servers: workload characterization and performance implications. IEEE/ACM Transactions on Networking, 5(5):631–645
9 Barford P, Crovella M (1998) An architecture for a WWW workload generator. In: Proceedings of SIGMETRICS, Madison, WI
10 Barford P, Crovella M (1998) Generating representative Web workloads for network and server performance evaluation. In: Proceedings of SIGMETRICS 1998, Madison, WI, pp 151–160
11 Cardellini V, Casalicchio E, Colajanni M, Yu PS (2002) The state of the art in locally distributed Web-server systems. ACM Computing Surveys, 34(2):263–311
12 Cardellini V, Colajanni M, Yu PS (1999) Dynamic load balancing on Web server systems. IEEE Internet Computing, 3(3):28–39
13 Cardellini V, Colajanni M, Yu PS (2003) Request redirection algorithms for distributed Web systems. IEEE Transactions on Parallel and Distributed Systems, 14(4):355–368
14 Cecchet E, Chanda A, Elnikety S, Marguerite J, Zwaenepoel W (2003) Performance comparison of middleware architectures for generating dynamic Web content. In: Proceedings of the ACM/IFIP/USENIX International Middleware Conference, Rio de Janeiro, Brazil
15 Chen H, Mohapatra P (2002) Session-based overload control in QoS-aware Web servers. In: Proceedings of IEEE Infocom, New York, NY
16 Chiu W (2000) Design pages for performance. IBM High Volume Web Site white papers
17 Chiu W (2001) Design for scalability: an update. IBM High Volume Web Site white papers
18 Coarfa C, Druschel P, Wallach D (2002) Performance analysis of TLS Web servers. In: Proceedings of the Network and Distributed System Security Symposium (NDSS), San Diego, CA
19 Cocoon. The Apache Cocoon Project (2005) http://cocoon.apache.org
20 Cold Fusion (2004) http://www.coldfusion.com
21 Darwin Streaming Server: http://developer.apple.com/darwin/projects/streaming/
22 Edge Side Includes, ESI (2004) http://www.esi.org
23 Elnikety S, Nahum E, Tracey J, Zwaenepoel W (2004) A method for transparent admission control and request scheduling in e-commerce Web sites. In: Proceedings of the 13th International Conference on World Wide Web, New York, NY
24 Fraternali P (1999) Tools and approaches for developing data-intensive Web applications: a survey. ACM Computing Surveys, 31(3):227–263
25 Goldberg A, Buff R, Schmitt A (1998) Secure Web server performance dramatically improved by caching SSL session keys. In: Proceedings of SIGMETRICS, Madison, WI
26 Gray J, Helland P, O’Neil PE, Shasha D (1996) The dangers of replication and a solution. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, Montreal, Canada
27 Hemminger S (2004) Netem home page: http://developer.osdl.org/shemminger/netem
28 iTunes (2005) http://www.apple.com/itunes
29 Java 2 Platform Enterprise Edition, J2EE (2004) http://java.sun.com/j2ee
30 Levon J (2004) OProfile: a system profiler for Linux. http://oprofile.sourceforge.net
31 Menascé DA, Almeida VAF, Riedi R, Pelegrinelli F, Fonseca R, Meira W (2000) In search of invariants for e-business workloads. In: Proceedings of the 2nd ACM Conference on Electronic Commerce, Minneapolis, MN
32 Menascé DA, Barbará D, Dodge R (2001) Preserving QoS of e-commerce sites through self-tuning: a performance model approach. In: Proceedings of the 3rd ACM Conference on Electronic Commerce, Tampa, FL
33 MySQL database server (2005) http://www.mysql.com
34 Nahum E, Rosu MC, Seshan S, Almeida J (2001) The effects of wide-area conditions on WWW server performance. In: Proceedings of the ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, Cambridge, MA
35 Netcraft (2005) http://www.netcraft.com/survey/archive.html
36 Nielsen J (1994) Usability Engineering. Morgan Kaufmann, San Francisco, CA
37 PHP scripting language (2005) http://www.php.net
38 PostgreSQL database server (2005) http://www.postgresql.org
39 Procps: the /proc file system utilities (2005) http://procps.sourceforge.net
40 Rabinovich M, Spatscheck O (2002) Web Caching and Replication. Addison-Wesley
41 Rabinovich M, Xiao Z, Douglis F, Kalmanek C (2003) Moving edge side includes to the real edge – the clients. In: Proceedings of the 4th USENIX Symposium on Internet Technologies and Systems
42 Sar: the system activity report (2005) http://perso.wanadoo.fr/sebastien.godard
43 Schulzrinne H, Casner S, Frederick R, Jacobson V (1996) RTP: a transport protocol for real-time applications. RFC 1889
44 The Tomcat servlet engine (2005) http://jakarta.apache.org/tomcat
45 Vallamsetty U, Kant K, Mohapatra P (2003) Characterization of e-commerce
traffic. Electronic Commerce Research, 3(1–2):167–192
Authors’ Biographies
Mauro Andreolini is currently a researcher in the Department of Information
Engineering at the University of Modena, Italy. He received his master’s degree (summa cum laude) from the University of Roma “Tor Vergata” in January 2001. In
2003, he spent eight months at the IBM T.J. Watson Research Center as a visiting
research student.
His research focuses on the design, implementation and evaluation of locally
distributed Web server systems, based on a best-effort service or on guaranteed
levels of performance. He is a Standard Performance Evaluation Corporation
(SPEC) technician responsible for the University of Modena and Reggio Emilia.
He has served on the organising committee of the IFIP WG7.3 International
Symposium on Computer Performance Modelling, Measurement and Evaluation
(Performance2002).
For additional details, see: http://weblab.ing.unimo.it/people/andreoli.
Michele Colajanni is a Full Professor of Computer Engineering at the Department of Information Engineering of the University of Modena. He was formerly
an Associate Professor at the same University in the period 1998–2000, and a
Researcher at the University of Roma Tor Vergata. He received the Laurea degree
in computer science from the University of Pisa in 1987, and the PhD degree in
computer engineering from the University of Roma “Tor Vergata” in 1991. He has
held computer science research appointments with the National Research Council (CNR), and visiting scientist appointments with the IBM T.J. Watson Research Center, Yorktown Heights, New York. In 1997 he received an award from the National Research Council for the results of his research on high-performance Web systems during his sabbatical year spent at the IBM T.J. Watson Research
Center.
His research interests include scalable Web systems and infrastructures, parallel
and distributed systems, performance analysis, benchmarking and simulation. In
these fields he has published more than 100 papers in international journals, book
chapters and conference proceedings, in addition to several national conferences.
He has lectured at national and international seminars and conferences.
Michele Colajanni has served as a member of organising or programme committees of national and international conferences on system modelling, performance analysis, parallel computing and Web-based systems. He is the general chair
of the first edition of the AAA-IDEA Workshop. He is a member of the IEEE
Computer Society and the ACM.
For additional details, see: http://weblab.ing.unimo.it/people/colajanni.
Riccardo Lancellotti received the Laurea and the PhD degrees in computer engineering from the University of Modena and from the University of Roma “Tor
Vergata”, respectively. He is currently a researcher in the Department of Information Engineering at the University of Modena, Italy. In 2003, he spent eight
months at the IBM T.J. Watson Research Center as a visiting research student.
His research interests include scalable architectures for Web content delivery
and adaptation, peer-to-peer systems, distributed systems and performance evaluation. Dr. Lancellotti is a member of the IEEE Computer Society.
For additional details, see: http://weblab.ing.unimo.it/people/riccardo.
7 Web Application Testing
Giuseppe A. Di Lucca, Anna Rita Fasolino
Abstract: Web applications are characterised by peculiarities that differentiate them from other software applications. These peculiarities affect their testing in several ways, and may make it harder than traditional application testing. Suitable methods and techniques have to be defined and
used to test Web applications effectively. This chapter will present the main
differences between Web applications and traditional ones, and how these
differences impact the testing of Web applications. It also discusses recently proposed contributions in the field of Web application testing. The focus of the chapter is mainly on testing the functionality of a
Web application, although discussions about the testing of non-functional
requirements are provided too. Readers are required to have a general
knowledge of software testing and Web technologies.
Keywords: Web engineering, Web application testing, Software testing.
7.1 Introduction
In the last decade, with the wide diffusion of the Internet, a growing market demand for Web sites and applications has been recorded. As more and more organisations exploit the World Wide Web (WWW) to offer their services and to be reached by larger numbers of customers and users, the demand for high-quality Web applications satisfying security, scalability,
reliability, and accessibility requirements has grown steadily. In such a
scenario, testing Web applications to verify their quality became a crucial
problem.
Unfortunately, due to market pressure and very short time-to-market,
the testing of Web applications is often neglected by developers, as it is
considered to be time-consuming and to lack a significant payoff [11]. This trend may be reversed if testing models, methods, techniques, and tools become available that allow testing processes to be carried out effectively and in a cost-effective manner.
Although Web application testing shares similar objectives to those of
“traditional” application testing, there are some key differences between
testing a traditional software system and testing a Web application: the
specific features exhibited by Web applications, and not included in other
software systems, must be considered to comprehend these differences.
A Web application can be considered as a distributed system, with a client–
server or multi-tier architecture, including the following characteristics:
− It can be accessed concurrently by a large number of users distributed all over the world.
− It runs on complex, heterogeneous execution environments, composed
of different hardware, network connections, operating systems, Web
servers, and Web browsers.
− It has an extremely heterogeneous nature that depends on the large
variety of software components that it usually includes. These components can be built with different technologies (e.g. different programming languages and models), and can be of a different nature (e.g. new components developed from scratch, legacy ones, hypermedia components, COTS, etc.).
− It is able to generate software components at run time according to user
inputs and server status.
Each aspect described in the previous list produces new testing challenges and perspectives. As an example, effective solutions need to be
identified for executing performance and availability testing to verify a
Web application’s behaviour when accessed concurrently by a large number of users. Moreover, as users may utilise browsers with different Web content rendering capabilities, Web applications must be tested to make sure that their behaviour under different Web browsers, operating systems, and middleware is the one expected. Another critical feature of a Web application to be specifically tested is its security and
ability to be protected from unauthorised access. The different technologies used to implement Web application components influence the complexity and cost of setting up a testing environment required to test each
component. In addition, the different mechanisms used to integrate distributed components produce various levels of coupling and inter-component data flow, impacting the cost of testing them effectively. As for dynamically generated software components, the issue here is to
cope with the difficulty of generating and rerunning the same conditions
that produced each component.
Finally, Web application testing also needs to take into account failures
in the application’s required services/functionality, to verify the conformance of the application’s behaviour to specified functional requirements.
Considering that the components of a Web application are usually accessed by navigation mechanisms implemented by hyper-textual links, a
specific verification activity also needs to be devised to check link integrity, to assure that no unreachable components or pending/broken links are
included in the application.
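As an illustration of such a check (ours, not from the chapter), a minimal link checker can probe a set of hyperlink targets and report broken ones; the example.com URLs in the usage note are placeholders:

import urllib.error
import urllib.request

def check_links(urls):
    """Report pending/broken links among a set of hyperlink targets.
    A minimal sketch: a real checker would also crawl pages to collect
    URLs and detect unreachable components."""
    broken = []
    for url in urls:
        try:
            request = urllib.request.Request(url, method="HEAD")
            with urllib.request.urlopen(request, timeout=10) as response:
                if response.status >= 400:
                    broken.append((url, response.status))
        except (urllib.error.URLError, OSError) as exc:
            broken.append((url, str(exc)))
    return broken

# Hypothetical usage:
# print(check_links(["http://www.example.com/", "http://www.example.com/missing"]))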
Problems and questions regarding Web applications’ testing are, therefore, numerous and complex. In this chapter we discuss these problems
and questions and present possible solutions, proposed by researchers,
from both academic and industrial settings.
We use two separate perspectives to analyse Web application testing:
the first considers aspects related to testing the non-functional requirements of a Web application; the second considers the issue of testing the
functionality offered by Web applications.
Section 7.2 introduces several types of non-functional requirements of
Web applications and how they should be tested. From Section 7.3 onwards this chapter focuses on testing the functional requirements of Web
applications. Section 7.3 presents different categories of models used to
obtain suitable representations of the application to be tested. Section 7.4
presents different types of testing scopes for Web applications. In Section
7.5 several test strategies for designing test cases are discussed, while in
Section 7.6 the characteristic features of tools for Web application testing
are analysed. Section 7.7 shows a practical example of testing a Web application. Finally, Section 7.8 presents our conclusions and future trends.
7.2 Web Application Testing: Challenges and
Perspectives
Since the Web’s inception the goals and functionality offered by Web applications, as well as the technologies used to implement them, have
changed considerably. Early Web applications comprised a simple set of
static HTML pages. However, more recent applications offer their users a
variety of functions for manipulating data, accessing databases, and carrying out a number of productive processes. These functions are usually performed by means of software components implemented by different technologies such as Java Server Pages (JSP), Java Servlets, PHP, CGI, XML,
ODBC, JDBC, or proprietary technologies such as Microsoft’s Active
Server Pages (ASP). These components exploit a complex, heterogeneous
execution environment including hardware, software, and middleware
components.
The remainder of this chapter uses the term Web application (or simply
application) to indicate the set of software components implementing the
functionality and services the application provides to its users, while the
term running environment will indicate the whole infrastructure (composed of hardware, software and middleware components) needed to execute a Web application.
The main goal of testing a Web application is to run the application using combinations of input and state to discover failures. A failure is the
manifested inability of a system or component to perform a required function within specified performance requirements [13]. Failures can be attributed to faults in the application’s implementation. Generally, there will
be failures due mainly to faults in the application itself, and failures caused mainly by faults in the running environment or in the interface between the application and the environment on which it runs. Since a Web application is tightly interwoven with its running environment, it is not possible to test it separately to find out exactly which component is responsible for each exhibited failure. Therefore, different types of testing have to
be executed to uncover these diverse types of failures [17].
The running environment mainly affects the non-functional requirements of a Web application (e.g. performance, stability, compatibility),
while the application is responsible for the functional requirements. Thus,
Web application testing has to be considered from two distinct perspectives. One perspective identifies the different types of testing that need to
be executed to verify the conformance of a Web application with specified
non-functional requirements. The other perspective considers the problem
of testing the functional requirements of an application. It is necessary that
an application be tested from both perspectives, since they are complementary and not mutually exclusive.
Questions and challenges that characterise both testing perspectives will
be analysed in the next sub-sections.
7.2.1 Testing the Non-functional Requirements of a Web
Application
There are different non-functional requirements that a Web application,
either explicitly or implicitly, is usually required to satisfy. For each nonfunctional requirement, testing activities with specific aims will have to be
designed. A description of the verification activities that can be executed to test the main non-functional requirements of a Web application is presented below.
Performance Testing
Performance testing is carried out to verify specified system performance
(e.g. response time, service availability). Usually, performance testing is
executed by simulating hundreds, or even more, simultaneous user accesses
over a defined time interval. Information about accesses is recorded and
then analysed to estimate the load levels exhausting the system resources.
In the case of Web applications, system performance is a critical issue
because Web users do not want to wait too long for a response to their requests; moreover, they expect services to be always available.
Effective performance testing of Web applications is a critical task because it is not possible to know beforehand how many users will actually
be connected to a real-world running application. Thus, performance testing should be considered an ongoing activity, carried out by analysing data from access log files in order to tune the system adequately.
Failures that can be uncovered by performance testing are mainly due to
running environment faults (e.g. scarce resources, poorly deployed resources), even though application-level software components may also contribute to inefficiency, e.g. components implementing business rules with algorithms that are not optimised.
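A minimal sketch of a load driver of this kind is shown below (our illustration; the target URL and population size are placeholders, and a production tool would add ramp-up, think times, and percentile reporting):

import threading
import time
import urllib.request

def emulated_user(url, results, requests_per_user=10):
    # Each emulated user issues a burst of requests and records latencies.
    for _ in range(requests_per_user):
        started = time.time()
        try:
            urllib.request.urlopen(url, timeout=30).read()
            results.append(time.time() - started)
        except OSError:
            results.append(None)  # record a failed request

def run_load_test(url, concurrent_users=100):
    results = []
    threads = [threading.Thread(target=emulated_user, args=(url, results))
               for _ in range(concurrent_users)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    succeeded = [r for r in results if r is not None]
    print("completed %d/%d requests" % (len(succeeded), len(results)))
    if succeeded:
        print("mean response time: %.3f s" % (sum(succeeded) / len(succeeded)))

# Hypothetical usage: run_load_test("http://www.example.com/", concurrent_users=100)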
Load Testing
Load testing is often used as a synonym for performance testing but it differs from the latter because it requires that system performance be evaluated with a predefined load level. It aims to measure the time needed to
perform several tasks and functions under predefined conditions. These
predefined conditions include the minimum configuration and the maximum activity levels of the running application. Also, in this case, numerous simultaneous user accesses are simulated. Information is recorded and,
when the tasks are not executed within predefined time limits, failure reports are generated.
As for the difficulties of executing load testing of Web applications,
considerations similar to those made for performance testing apply. Failures found by load testing are mainly due to faults
in the running environment.
Stress Testing
Stress testing is conducted to evaluate a system or component at or beyond
the limits of its specified requirements. It is used to evaluate the system’s
response at activity peaks that can exceed system limitations, and to verify
if the system crashes or is able to recover from such conditions. Stress
testing differs from performance and load testing because the system is
executed on or beyond its breaking point, while performance and load
testing simulate regular user activity.
In the case of Web applications, stress testing difficulties are similar to
those that can be met in performance and load testing. Failures found by
stress testing are mainly due to faults in the running environment.
Compatibility Testing
Compatibility testing is carried out to determine if an application runs as
expected on a running environment that has various combinations of
hardware, software, and middleware.
In the case of Web applications, compatibility testing will have to uncover failures due to the usage of different Web server platforms or client
browsers, and corresponding releases or configurations.
The large variety of possible combinations of all the components involved in the execution of a Web application makes it infeasible to test them all; thus, usually only the most common combinations are considered. As a consequence, just a subset of possible compatibility failures
might be uncovered.
Both the application and the running environment can be responsible for
compatibility failures. A general rule for avoiding compatibility failures is
to provide Web application users with appropriate information about the
expected configuration of the running environment and with appropriate
diagnostic messages to deal with any incompatibilities found.
Usability Testing
Usability testing aims to verify to what extent an application is easy to use.
Usually, design and implementation of the user interface both affect usability. Thus, usability testing is mainly centred around testing the user
interface: issues concerning the correct content rendering (e.g. graphics,
text editing format) as well as the clarity of messages, prompts, and commands that are to be considered and verified.
Usability is a critical issue for a Web application. Indeed, it may determine the success of the application. As a consequence, an application’s
front-end and the way users interact with it often are aspects that are given
greater care and attention during the application’s development process.
When Web application usability testing is carried out, issues related to
an application’s navigation completeness, correctness, and conciseness are
also considered and verified. This type of testing should be an ongoing activity carried out to improve the usability of a Web application; user profiling techniques are usually employed to this end.
The application is mainly responsible for usability failures.
Accessibility Testing
Accessibility testing can be considered a particular type of usability testing
whose aim is to verify that the access to an application’s content is allowed
even in the presence of reduced hardware and software configurations on the
client side (e.g. browser configurations disabling graphical visualisation, or
scripting execution), or in the presence of users with disabilities, such as
visual impairment.
In the case of Web applications, accessibility rules such as those provided by the Web Content Accessibility Guidelines [24] have been established, so that accessibility testing amounts to verifying the compliance of an application with such rules. The application itself is generally the
main cause of accessibility problems, even when accessibility failures may
be due to the configuration of the running environment (e.g. browsers
where the execution of scripts is disabled).
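Parts of such rule checking can be automated. As a hedged illustration (ours, not from the chapter), the following sketch flags <img> elements that lack the alt attribute required by WCAG [24]:

from html.parser import HTMLParser

class AltTextChecker(HTMLParser):
    """Flag <img> tags lacking an alt attribute: one of the WCAG [24]
    checks that accessibility testing can automate."""

    def __init__(self):
        super().__init__()
        self.violations = 0

    def handle_starttag(self, tag, attrs):
        if tag == "img" and "alt" not in dict(attrs):
            self.violations += 1

checker = AltTextChecker()
# Hypothetical page fragment under test:
checker.feed('<p><img src="logo.gif"><img src="cart.gif" alt="Shopping cart"></p>')
print("images missing alt text:", checker.violations)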
Security Testing
Security testing aims to verify the effectiveness of the overall Web application’s defences against undesired access by unauthorised users, its capability to protect system resources from improper use, and its ability to grant authorised users access to authorised services and resources. Application
defences have to provide protection mechanisms able to avoid or reduce
damage due to intrusions, at costs that should be significantly less than the damage caused by a security breach.
Application vulnerabilities affecting security may be contained in the
application code, or in any of the different hardware, software, and middleware components. Both the running environment and the application
can be responsible for security failures.
In the case of Web applications, heterogeneous implementations and
execution technologies, together with the very large number of possible
users and the possibility of accessing them from anywhere, can make Web
applications more vulnerable than traditional applications and security
testing more difficult to accomplish.
7.2.2 Testing the Functional Requirements of a Web
Application
Testing the functional requirements of an application aims at verifying that
an application’s features and operational behaviour correspond to their
specifications. In other words, this type of testing is responsible for uncovering application failures due to faults in the functional requirements’ implementation, rather than failures due to the application’s running environment. To achieve this aim, any failures due to the running environment
should be avoided, or reduced to a minimum. Preliminary assumptions
about the running environment will have to be made before test design and
execution.
Most methods and approaches used to test the functional requirements
of “traditional” software can also be used for Web applications. Similarly
to traditional software testing, a Web application’s functionality testing has
to rely on the following basic aspects:
− Testing levels, which specify the different scope of the tests to be carried out, i.e. the collections of components to be tested.
− Test strategies, which define heuristics or algorithms to create test
cases from software representation models, implementation models, or
test models.
− Test models, which represent the relationships between a representation’s elements or a component’s implementation [3].
− Testing processes, which define the flow of testing activities, and other
decisions such as when to start testing, who is to perform the testing,
how much effort should be used, etc.
However, despite their similarity to conventional applications, Web applications also have distinguishing features that cause specific problems
for each aspect described in the previous list. For example, the definition
of testing levels for a Web application requires greater attention than that
applied to traditional software. At the unit testing level, the scope of a unit
test cannot be defined uniquely, since it depends on the existence of different types of components (e.g. Web pages, script functions, embedded objects) residing on both the client and server side of an application. In relation to integration testing, the numerous different mechanisms used to
integrate an application’s heterogeneous and distributed components can
generate several coupling levels and data flow between the components,
which have to be considered to establish a correct integration strategy.
As for the strategies for test design, the classical approaches of black
box, white box, or grey box testing may be taken into account for designing test cases, provided that preliminary considerations are defined. In
general, Web applications’ black box testing will not be different from
software applications’ black box testing. In both cases, using a predetermined coverage criterion, an adequate set of test cases is defined
based upon the specified functionality of the item to be tested. However, a
Web application’s specific features can affect test design and execution.
For example, testing of components dynamically generated by the running
application can be very expensive, due to the difficulty of identifying and
regenerating the same conditions that produced each component. Therefore, traditional testing models used to represent the behaviour of an application may have to be adapted to these characteristics and to the Web applications’ running environment.
White box testing, irrespective of an application’s nature, is usually
based on coverage criteria that take into account structural features of the
application or its components. Adequate models representing an application or component’s structure are used, and coverage criteria and test cases
are appropriately specified. The aim of white box testing is to cover the
structural elements considered. Since the architecture and components of
a Web application are largely different from those of a traditional application, appropriate models representing structural information at different
levels of granularity and abstraction are needed, and coverage criteria have
to be defined accordingly. For example, models representing navigation as
well as traditional structural aspects of an application need to be taken into
account. Coverage criteria must focus both on hyperlinks, which allow
user navigation in the application, and on inner items of an application’s
component (e.g. its code statements).
Besides black and white box testing, grey box testing can also be considered for Web applications. Grey box testing is a mixture of black and
white box testing, and considers both the application’s behaviour, from the
end user’s viewpoint (same as black box testing), and the application’s
inner structure and technology (same as white box testing). According to
[17], grey box testing is suitable for testing Web applications because it
factors in high-level design, environment, and interoperability conditions.
It is expected that this type of testing will reveal problems that are not easily identified by black box or white box analysis, in particular problems
related to end-to-end information flow and distributed hardware/software
system configuration and compatibility. Context-specific failures relevant
to Web applications are commonly uncovered using grey box testing.
Finally, for the testing processes, the classical approach for testing execution that starts from unit test and proceeds with integration, system testing, and acceptance testing can also be taken into account for Web applications. For each phase, however, differences with respect to testing
traditional software have to be detected and specific solutions have to be
designed. An important testing process issue is, for instance, to set up an
environment to execute tests at each phase: driver or stub modules are usually required to run tests at the unit or integration phase. Solutions for testing a Web application have to explicitly consider the application’s distributed running environment, and to adopt the necessary communication
mechanisms for executing the components being tested.
7.3 Web Application Representation Models
In software testing the need for models that represent essential concepts
and relationships between items being tested has been documented [3].
Models are able to support the selection of effective test cases, since they
can be used to express required behaviour or to focus on aspects of an application’s structure believed to have defects.
With regard to Web applications, models for representing their behaviour or structure have been provided by several Web application development methodologies, which have extended traditional software models to
explicitly represent Web-related software characteristics. Examples of
such models include the Relationship Management Data Model (RMDM)
used by the Relationship Management Methodology (RMM) [14], which
uses entity–relationship-based diagrams to describe objects and navigation
mechanisms of Web applications. Other methodologies, such as Object
Oriented Hypermedia (OOH) [9], integrate the traditional object-oriented
models with a navigational view and a presentation view of the application. The Object-Oriented Hypermedia Design Model (OOHDM) methodology [22] allows for the construction of customised Web applications by
adopting object-oriented primitives to build the application’s conceptual,
navigational, and interface models. WebML (Web Modelling Language)
[2] is, moreover, a specification language that proposes four types of models, Structural Model, Hypertext Model, Presentation Model, and Personalisation Model, used to specify different characteristics of complex Web
applications, irrespective of their implementation details. Finally, an extension of UML diagrams with new class stereotypes for representing specific
Web application components, such as HTML pages, forms, server pages,
is proposed in [4].
In addition to these models, other representation models explicitly
geared towards Web application testing have been proposed in the literature. Two categories are currently used to classify these models: behaviour
models and structural models. The former are used to describe the functionality of a Web application irrespective of its implementation. The latter
are derived from the implementation of the application.
Behaviour models support black box (or responsibility-based) testing.
Use case models and decision tables [6], and state machines [1], have been
used to design Web application test cases for black-box testing techniques.
Structural models are used for white box testing. Both control flow representation models of a Web application’s components [16,18,19], and
models describing an application’s organisation in terms of Web pages and
hyperlinks, have been proposed [6,19]. Further details of these representations are given in Sect. 7.5.
The meta-model of a Web application [7] is now described. This model
is presented in Fig. 7.1 using a UML class diagram where various types of
classes and associations represent several categories of a Web application’s components and their relationships. A Web application can be modelled using a UML class diagram model instantiated from this meta-model.
Fig. 7.1. The meta-model of a Web application presented in [7]
The meta-model assumes that a Web application comprises Web Pages,
which can be grouped as Server Pages, i.e. pages that are deployed on the
Web server, and Client Pages, i.e. pages that a Web server sends back in
answer to a client request. As for the Client Pages, they can be classified
as Static Pages, if their content is fixed and stored permanently, or Client
Built Pages, if their content varies over time and is generated on-the-fly by
a Server Page. A Client Page is composed of HTML Tags. A Client Page
may include a Frameset, composed of one or more Frames, and in each
Frame a different Web Page can be loaded. Client Pages may include
finer-grained items implementing processing actions, such as Client
Scripts. A Client Page may also include other Web Objects such as Java
Applets, images and Multimedia Objects (e.g. sounds, movies), Flash Objects etc. A Client Script may include Client Modules. Both Client Scripts
and Client Modules can include Client Functions, or Client Classes. A
Client Script may redirect the elaboration to another Web Page. In addition, a Client Page may be linked to another Web Page, through a hyperlink to the Web Page’s URL: a link between a Client Page and a Web Page
may be characterised by any Parameter that the Client Page provides to
the Web Page. A Client Page may also be associated with any Downloadable File, or it may include any Form, composed of different types of
Field (e.g. select, button, text-area fields). Forms are used to collect user
input and to submit the input to the Server Page responsible for its elaboration. A Server Page may be composed of any Server Script, which can
include any Server Class or Server Function, implementing any processing
action, which may either redirect the request to another Web Page, or dynamically build a Client Built Page providing the result of an elaboration.
Finally, a Server Page may include other Server Pages, and may be associated with other Interface Objects allowing the connection of the Web
application to a DBMS, a file server, a mail server, or another system.
7.4 Unit, Integration and System Testing of a Web Application
The flow of activities of a software testing process usually begins with unit
testing and proceeds with integration and system test. The aim of unit testing is to verify each application’s individual source code component, while
integration testing considers combined parts of an application to verify
how they function together. Finally, system testing aims at discovering
defects that are properties of the entire system rather than of its individual
components.
7.4.1 Unit Testing
To set up Web application unit testing it is important to choose the application components to be tested individually. If we consider the model of a
Web application as presented in Fig. 7.1, different types of unit may be
identified (e.g. Web pages, scripting modules, forms, applets, servlets).
However, the basic unit that can actually be tested is the Web page, since each of a page's elements is automatically considered for testing together with the page itself. As a consequence, pages are usually considered at the unit
testing level, although there are some differences between testing a client
or a server page. We present these differences below.
Testing Client Pages
Client pages constitute the application’s user interface. They are responsible for showing textual information and/or hyperlinks to users, for accepting user input, and for allowing user navigation throughout the application.
A client page may include scripting code modules that perform simple
functions, such as input validation or simple computations. Moreover,
client pages may be decomposed into several frames in which other client
pages can be visualised.
Testing a client page (including just HTML code) aims to verify:
− Compliance of the content displayed by the page with the content specified and expected by a user (e.g. the rendering in the browser of the textual content and of the formatting of forms, images and other Web objects will have to be verified).
− Correctness of target pages pointed to by hyperlinks, i.e. when a link is
selected, the right page should be returned.
− Existence of pending links, i.e. links to pages that do not exist.
− Correctness of the actions performed when a button, or any other active object, is selected by a user.
− Correctness of the content visualised in the frames.
If the client page includes scripting code, failures due to scripts will also
have to be verified.
Testing dynamically generated client pages (Client Built Pages) is a particular case of client page testing. The basic problem with this testing is that the availability of such pages depends on the ability to identify and repeat the same conditions (in terms of application state and user input) used to build them. A second problem is the potentially huge number of pages being generated, since the number of dynamic pages depends on the large number of possible combinations of application state and user input. Equivalence class partitioning criteria (such as those considering exemplar path executions of server pages) should be used to deal with this issue.
Unit testing of client pages can be carried out using white box, black
box, or grey box testing techniques. Several implementation-based criteria
can be used to evaluate white box test coverage, such as:
− HTML statement coverage.
− Web object coverage, i.e. each image, multimedia component, applet,
etc. will have to be tested at least once.
− Script block coverage, i.e. each block of scripting code, such as client
side functions, will have to be executed at least once.
− Statement/branch/path coverage for each script module.
− Link coverage.
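To make the link-related criteria concrete, the following minimal sketch (in Python; the page URL in the usage comment is hypothetical) extracts the hyperlinks of a client page and reports pending links, i.e. targets that cannot be retrieved. It is only an illustration of the kind of check involved, not a complete coverage tool:

import urllib.request
import urllib.error
from urllib.parse import urljoin
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    # Collects the targets of all <a href="..."> tags found in the page.
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def find_pending_links(page_url):
    # Download the client page and try each hyperlink once; a link is
    # "pending" if its target page cannot be retrieved.
    html = urllib.request.urlopen(page_url).read().decode("utf-8", "replace")
    parser = LinkExtractor()
    parser.feed(html)
    pending = []
    for link in parser.links:
        target = urljoin(page_url, link)
        try:
            urllib.request.urlopen(target)
        except (urllib.error.HTTPError, urllib.error.URLError):
            pending.append(target)
    return pending

# Hypothetical usage:
# print(find_pending_links("http://localhost/CourseManagement/index.html"))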
Testing Server Pages
The main goal of server pages is to implement an application’s business
logic, thus coordinating the execution of business rules and managing the
storing and retrieving of data into/from a database.
Usually, server pages are implemented by a mixture of technologies,
such as HTML, script languages (e.g. VBS, JSP), Java servlets, or COTS.
Typical results of server page execution are data storage into a database, or
generation of client pages based on user requests.
Testing a server page aims to identify failures of different types, such as:
− Failures in the execution of servlets or COTS components.
− Incorrect execution of data storage into a database.
− Failures due to the existence of incorrect links between pages.
− Defects in dynamically generated client pages (such as non-compliance of the client page with the output specified for the server page).
Unit testing of server pages can also be carried out using white box,
black box, or grey box techniques. White box coverage criteria include:
− Statement/branch/path coverage in script modules.
− HTML statement coverage.
− Servlet, COTS, and other Web object coverage.
− Hyperlink coverage.
− Coverage of dynamically generated pages.
Appropriate driver and stub pages have to be generated to carry out unit
page testing effectively (see Sect. 7.6 for a discussion on the generation of
such drivers and stubs).
7.4.2 Integration Testing
Integration testing is the testing of a Web application’s combined pages to
assess how they function together. An integration criterion has to be used
to choose the pages to be combined and tested together. Design documentation showing relationships between pages can be used to define an integration strategy.
As an example, the Web application model, obtained by instantiating the
meta-model presented in Fig. 7.1, can be used to identify the pages to be
combined. Pages chosen will be those linked by direct relationships, such
as hyperlinks, or by dependency relationships due to redirect or submit
statements (included either in a server or in a client page), or by build relationships between a server page and the client page produced.
Another integration criterion may consider a server page and each client
page it generates at run time as a unit to be tested. The problem of client
page explosion will have to be addressed with equivalence class partitioning criteria.
Page integration can be driven by the use cases implemented by the application, or any other description of the application’s functional requirements.
For each use case (or functional requirement), Web pages collaborating for
its implementation are to be considered for integration purposes. The identification of such Web pages can be made by analysing development documentation or by reverse engineering the application code. Reverse engineering techniques, such as the one described in [5], can be used to analyse the
relationships between pages and to identify clusters of interconnected pages
that implement a use case.
At the integration testing level, both the behaviour and the structure of
the Web application will have to be considered: knowledge of the application structure will be used to define the set of pages to be integrated, while
knowledge of the behaviour implemented by these pages will be needed to
carry out integration testing with a black box strategy. Therefore, grey box
techniques may be more suitable than pure black or white box ones to
carry out integration testing.
7.4.3 System Testing
System testing aims to discover defects related to the entire Web application. In traditional software testing, black box approaches are usually exploited to accomplish system testing and to identify failures in the externally visible behaviour of the application. However, grey box techniques
that consider the application navigation structure, in addition to its behaviour, for designing test cases may be more effective in revealing Web application failures due to incorrect navigation links among pages (such as
links connecting a page to a different one from the specified page, pending
links, or links to unreachable pages).
Depending on the testing strategy adopted, coverage criteria for system
testing will include:
− User functions/use cases coverage (if a black box approach is used).
− Page (both client and server) coverage (usable for white box or grey
box approaches).
− Link coverage (usable for white box or grey box approaches).
7.5 Strategies for Web Application Testing
Testing strategies define the approaches for designing test cases. They can
be responsibility based (also known as black box), implementation based
(or white box), or hybrid (also known as grey box) [3]. Black box techniques design test cases on the basis of the specified functionality of the
item to be tested. White box techniques rely on source code analysis to
develop test cases. Grey box testing designs test cases using both responsibility-based and implementation-based approaches.
This section discusses representative contributions presented in the literature for white box, black box, and grey box testing of Web applications.
7.5.1 White Box Strategies
White box strategies design test cases on the basis of a code representation
of the component under test (i.e. the test model), and of a coverage model
that specifies the parts of the representation that must be exercised by a test
suite. As an example, in the case of traditional software the control flow
graph is a typical test model, while statement coverage, branch coverage,
or basis-path coverage are possible code coverage models.
As for the code representation models adopted to test Web applications,
two main families of structural models are used: the first one focuses on
the level of abstraction of single statements of code components of the
application, and represents the traditional information about their control
flow or data flow. The second family considers the coarser degree of granularity of the pages of the Web application and essentially represents the navigation structure between pages of the application, with possible additional details. As for the coverage criteria, traditional ones (such as those involving nodes, edges, or notable paths from the graphical representations of these models) have been applied to both families of models.
Two white box techniques proposed in the literature to test Web applications will be presented in this section. The first technique was proposed
by Liu et al. [17] and exploits a test model that belongs to the first family
of models, while the second one was proposed by Ricca and Tonella [19,
20] and is based on two different test models, each one belonging to a different family.
The white box technique proposed by Liu et al. [17] is an example of
how data-flow testing of Web applications can be carried out. The approach is applicable to Web applications implemented in the HTML and
XML languages, including interpreted scripts as well as other kinds of
executable components (e.g. Java applets, ActiveX controls, Java beans) at
both the client and server side of the application.
The approach is based on a Web application test model, WATM, that includes an object model, and a structure model. The object model represents
the heterogeneous components of a Web application and the ways they are
interconnected using an object-based approach. The model includes three
types of objects (i.e. client pages, server pages, and components) and seven
types of relationships between objects. Each object is associated with attributes corresponding to program variables or other HTML specific document
elements (e.g. anchors, headers, or input buttons), and operations corresponding to functions written in scripting or programming languages. Relationships between objects are of seven types: inheritance, aggregation, association, request, response, navigation, and redirect. The first three have the
classical object-oriented semantics, while the last four represent specific
relationships between client and server pages. A request relationship exists
between a client and a server page when a server page is requested by a client
page; a response relationship exists between a client and a server page when
a client page is generated by a server page as a response of an elaboration; for
two client pages there is a navigation relationship if one of them includes a
hyperlink to the other page; finally, between two server pages there is a redirect relationship if one of them redirects an HTTP request to the other. The
structure model uses four types of graphs to capture various types of data
flow information on a Web application: the Control Flow Graph (CFG) of
an individual function, the Interprocedural Control Flow Graph (ICFG) that
involves more than one function and integrates the CFGs of functions that
call each other, the Object Control Flow Graph (OCFG) that integrates the
CFGs of object functions that are involved in sequences of function invocations triggered by GUI events, and, finally, the Composite Control Flow Graph (CCFG) that captures interacting pages, where one page passes data to another when the user clicks a hyperlink or submits a form, and is constructed by connecting the CFGs of the interacting Web pages.
The data flow testing approach derives test cases from three different perspectives: intra-object, inter-object, and inter-client. For each perspective,
def-use chains of variables are taken into account for defining test paths that
exercise the considered def-use chains. Five testing levels specifying different scopes of the tests to be run have been defined, namely: Function, Function Cluster, Object, Object Cluster, and Application level.
For the intra-object perspective, test paths are selected for variables that
have def-use chains within an object. The def-use chains are computed
using the control flow graphs of functions included in the object, and can be
defined at three different testing levels: single function, cluster of functions
(i.e. set of functions that interact via function calls within an object), and
object level (considering different sequences of function invocations within
an object).
For the inter-object perspective, test paths are selected for variables that
have def-use chains across objects. Def-use chains have to be defined at the object cluster level, where each cluster is composed of a set of message-passing objects.
Finally, the inter-client perspective derives test paths on the basis of def-use chains of variables that span multiple clients, since in a Web application a variable can be shared by multiple clients. This level of testing is called application level.
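To make the notion of a def-use chain concrete, the following sketch computes def-use pairs over a toy straight-line sequence of script statements, each listed with the variables it defines and uses (the statements are hypothetical); a real implementation, as discussed above, requires full control flow analysis across functions, objects, and clients:

# A minimal sketch of def-use pair computation over straight-line code.
# Each statement is (label, defined_vars, used_vars); hypothetical data.
statements = [
    ("s1", {"total"}, set()),               # total = 0
    ("s2", {"price"}, {"form"}),            # price = form.price
    ("s3", {"total"}, {"total", "price"}),  # total = total + price
    ("s4", set(), {"total"}),               # display(total)
]

def def_use_pairs(stmts):
    # In straight-line code a definition reaches a use if no intervening
    # statement redefines the variable (a definition-clear path).
    pairs = []
    last_def = {}
    for label, defs, uses in stmts:
        for var in uses:
            if var in last_def:
                pairs.append((last_def[var], label, var))
        for var in defs:
            last_def[var] = label
    return pairs

for d, u, v in def_use_pairs(statements):
    print(f"def of {v!r} at {d} reaches use at {u}")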
This testing technique is relevant since it represents a first attempt to extend the data flow testing approaches applicable to traditional software to
the field of Web applications. However, to make it actually usable in realworld Web application testing, further investigation is required. Indeed,
the effectiveness of the technique has not been validated by any experiment involving more than one example Web application: to carry out these
experiments, an automated environment for testing execution, including
code analysers, data flow analysers, and code instrumentation tools, would
be necessary. Moreover, indications about how this data flow testing approach may be integrated in a testing process would also be needed: as an
example, the various testing perspectives and levels proposed by the approach might be considered in different phases of a testing process to carry
out unit test, as well as integration or system test. However, in this case an
experimental validation and tuning would also be required.
A second proposal in the field of structural testing of Web applications
has been suggested by Ricca and Tonella [19], who proposed a first approach for white box testing of primarily static Web applications. This
approach was based on a test model named the navigational model that
focuses on HTML pages and navigational links of the application. Later,
the same authors presented an additional lower layer model, the control
flow model, representing the internal structure of Web pages in terms of
the execution flow followed [20]. This latter model has also been used to
carry out structural testing.
In the navigational model two types of HTML pages are represented:
static pages, whose content is immutable, and dynamic pages, whose content is established at run time by server computation, on the basis of user
input and server status. Server programs (such as scripts or other executable objects) running on the server side of the application, and other page
components that are relevant for navigational purposes, such as forms and
frames, are also part of the model. Hyperlinks between HTML pages and
various types of link between pages and other model components are included in this code representation.
As for the control flow model, it takes into account the heterogeneous
nature of statements written in different coding languages, and the different mechanisms used to transfer control between statements in a Web application. It is represented by a directed graph whose nodes correspond to
statements that are executed either by the Web server or by the Internet
browser on the client side, and whose edges represent control transfer.
Different types of nodes are shown in this model, according to the programming language of the respective statements.
A test case for a Web application is defined as a sequence of pages to be
visited, plus the input values to be provided to pages containing forms.
Various coverage criteria applicable to both models have been proposed to
design test cases: they include path coverage (requiring that all paths in the
Web application model are traversed in some test case), branch coverage
(requiring that all branches in the model are traversed in some test case),
and node coverage (requiring that all nodes in the model are traversed in
some test case).
Assuming that the nodes of the representation models can be annotated with definitions or uses of data variables, further data flow coverage criteria have been described too: all def-use paths (all definition-clear paths from every definition to every use of all Web application variables are traversed in some test case), all uses (at least one def-clear path, if any exists, from every definition to every use of all Web application variables is traversed in some test case), and all defs (at least one def-clear path, if any exists, from every definition to at least one use of all Web application variables is traversed in some test case).
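Node coverage on such models can be operationalised by treating the model as a directed graph and collecting page sequences from the entry page until every node has been visited. The following sketch illustrates the idea on a hypothetical navigational model; it is only an illustration of the criterion, not the algorithm implemented by the authors' tools:

from collections import deque

# Hypothetical navigational model: page -> pages reachable via hyperlinks.
nav_model = {
    "home": ["login", "catalog"],
    "login": ["home", "account"],
    "catalog": ["item", "home"],
    "item": ["catalog"],
    "account": ["home"],
}

def shortest_path(graph, start, goal):
    # Standard BFS returning one shortest page sequence from start to goal.
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

def node_coverage_suite(graph, entry):
    # Greedily add one test case (page sequence) per still-uncovered node.
    suite, covered = [], set()
    for node in graph:
        if node not in covered:
            path = shortest_path(graph, entry, node)
            if path:
                suite.append(path)
                covered.update(path)
    return suite

for test in node_coverage_suite(nav_model, "home"):
    print(" -> ".join(test))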
This testing approach is partially supported by a tool, ReWeb, that
analyses the pages of the Web application and builds the corresponding
navigational model, and another tool, TestWeb, that generates and executes test cases. However, the latter tool is not completely automated,
since user intervention is required to generate input and act as an oracle.
The main limitation of this testing approach concerns its scalability (consider the problem of path explosion in the presence of cycles in the graphs, or the unfeasibility of the all def-use paths coverage criterion).
A few considerations about the testing levels supported by white box
techniques can be made. Some approaches are applicable at the unit level,
while others are considered at the integration and system levels. For instance, the first approach proposed by Liu et al. [17] is applicable at various testing levels, ranging from unit level to integration level. As an example, the intra-object perspective can be used to obtain various types of
units to be tested, while the inter-object and inter-client perspectives can be considered for establishing the items to be tested at the integration level.
Conversely, the approaches of Ricca and Tonella are applicable exclusively at the system level. As a consequence, the choice of a testing technique to be applied in a testing process will also depend on the scope of the
test to be run.
7.5.2 Black Box Strategies
Black box techniques do not require knowledge of software implementation items under test since test cases are designed on the basis of an item’s
specified or expected functionality.
One main issue with black box testing of Web applications is the choice
of a suitable model for specifying the behaviour of the application to be
tested and to derive test cases. Indeed, this behaviour may significantly
depend on the state of data managed by the application and on user input,
with the consequence of a state explosion problem even in the presence of
applications implementing a few simple requirements.
Solutions to this problem have been investigated and presented in the
literature. Two examples of proposed solutions are discussed in this subsection. The first example is offered by the black box testing approach
proposed by Di Lucca et al. [6] that exploits decision tables as a combinatorial model for representing the behaviour of a Web application and to
produce test cases. The second example is provided by Andrews et al. [1]
where state machines are proposed to model state-dependent behaviour of
Web applications and to design test cases.
Di Lucca et al. [6] suggest a two-stage black box testing approach. The
first stage addresses unit testing of a Web application, while the second stage
considers integration testing. The scope of a unit test is a single application
page, either a client or server page, while the scope of an integration test is a
set of Web pages that collaborate to implement an application’s use case.
Unit test is carried out with a responsibility-based approach that uses
decision tables to represent page requirements, and therefore derive test
cases. A decision table can be used to represent the behaviour of software
components whose responses are each associated with a specific condition.
Usually a decision table has two parts: the condition section (listing conditions and combinations of conditions) and the action section (listing responses to be produced when corresponding combinations of conditions
are true). Each unique combination of conditions and actions is a variant,
represented as a single row in the table.
As for the unit testing of client and server pages, the approach requires
that each page under test is preliminarily associated with a decision table
describing a set of variants of the page. Each variant represents an alternative behaviour offered by the page and is defined in terms of an Input section and an output section. In the case of client pages, the input section
describes a condition in terms of input variables to the page, input actions,
and state before test where the state is defined by the values assumed, before test execution, by page variables, tag attributes, cookies, and by the
state of other Web objects used by page scripts. In the output section, the
action associated with each condition is described by the expected results,
expected output actions, and expected state after test (defined as for the
state before test). Table 7.1 shows the template of the decision table for
client page testing.
Such a specification technique may be affected by the problem of variant explosion. However, criteria for partitioning input section data into equivalence classes may be defined and used to reduce the set of variants to be taken into account.
In the case of server pages, the decision table template is slightly different (see Table 7.2): for each page variant the input section includes the
input variables field that comprises the variables provided to the server
page when it is executed, and the state before test field that is defined by
the values assumed, before test execution, by page session variables and
cookies, as well as by the state of the session objects used by the page
scripts. In the output section, the expected results field represents the values of the output variables computed by the server page scripts, the expected output field includes the actions performed by the server side
scripts (such as composing and sending an e-mail message), and the expected state after test field includes the values of variables and cookies, as
well as the state of session objects, after execution.
Table 7.1. A decision table template for client page testing

          Input Section                           Output Section
Variant   Input       Input     State before      Expected   Expected output   Expected state
          variables   actions   test              results    actions           after test
…         …           …         …                 …          …                 …
Table 7.2. A decision table template for server page testing

          Input Section                 Output Section
Variant   Input       State before     Expected   Expected output   Expected state
          variables   test             results    actions           after test
…         …           …                …          …                 …
As for the definition of the decision tables, the authors propose to compile them by analysing the development documentation (if available) or by
reverse engineering the Web application code, and focusing on the page
inner components that help to define the conditions and actions of each
variant. An object model of a Web application representing each component of the application relevant for testing purposes is specifically presented by the authors to support this type of analysis. This model is actually
an extended version of the one reported in Fig. 7.1, including additional
relevant details for the aims of testing (such as session variables).
The test case selection strategy is based on the decision tables and requires that test cases are defined in order to cover each table variant for
both true and false values. Other criteria based on partitioning the input
sets into equivalence classes are also suggested for defining test cases.
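As an illustration of how decision-table variants can be turned into test cases mechanically, the following sketch encodes two hypothetical variants of a login client page as data and derives one test case per variant, taking setup, stimulus, and oracle directly from the corresponding row:

# A minimal sketch: decision-table variants encoded as data, one test
# case per variant. The variant content is hypothetical, loosely
# modelled on a login client page.
variants = [
    {
        "input_variables": {"user": "valid", "password": "valid"},
        "input_actions": ["click Submit"],
        "state_before": "user registered",
        "expected_results": "credentials submitted to Login.asp",
        "expected_state_after": "session opened",
    },
    {
        "input_variables": {"user": "valid", "password": "invalid"},
        "input_actions": ["click Submit"],
        "state_before": "user registered",
        "expected_results": "error message displayed",
        "expected_state_after": "no session opened",
    },
]

def derive_test_cases(table):
    # One test case per variant: setup, stimulus, and oracle are taken
    # directly from the corresponding decision-table row.
    tests = []
    for i, v in enumerate(table, start=1):
        tests.append({
            "id": f"TC{i}",
            "setup": v["state_before"],
            "inputs": dict(v["input_variables"]),
            "actions": list(v["input_actions"]),
            "oracle": (v["expected_results"], v["expected_state_after"]),
        })
    return tests

for tc in derive_test_cases(variants):
    print(tc["id"], tc["inputs"], "->", tc["oracle"][0])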
In this testing approach, decision tables are also used to develop driver
and stub modules which will be needed to execute the client page testing. A
driver module will be a Web page that interacts with the client page by
populating its input forms and generating the events specified for the test
case. The driver page will include script functions, and the Document Object Model (DOM) will allow its interaction with the tested page. Stub
modules can be developed as client pages, server pages or Web objects.
The complexity of the stub will depend both on the type of interaction between the tested page and the component to be substituted, and on the complexity of the function globally implemented by the pair of components.
As for the integration testing, a fundamental question is the one of determining which Web pages have to be integrated and tested. The authors
of this approach propose to integrate the Web pages that collaborate in the implementation of each use case (or functional requirement) of the application. They propose to analyse the object model of the Web application in
order to find client and server pages to be gathered together. A valuable
support for the identification of clusters of interconnected pages may be
provided by clustering techniques, such as the one proposed in [5]. This
technique produces clusters of pages on the basis of a measure of coupling
of interconnected pages that associates different weights to different types
of relationship (Link, Submit, Redirect, Build, Load_in_Frame, Include)
between pages. Once clusters have been defined and use cases have been
associated to each of them, the set of pages included in each cluster will
make up the item to be tested. For each use case a decision table can be
defined to drive integration testing. Such a decision table can be derived
from the ones defined for the unit testing of the single pages included in
the cluster.
The second black box approach for Web application testing considered
in this section exploits Finite State Machines (FSMs) for modelling software behaviour and deriving test cases from them [1]. This approach explicitly takes into account the state-dependent behaviour of Web applications, and proposes specific solutions for addressing the problem of state
explosion.
The process for test generation comprises two phases: in the first phase,
the Web application is modelled by a hierarchical collection of FSMs,
where the bottom-level FSMs are formed by Web pages and parts of Web
pages, while a top-level FSM represents the whole application. In the second phase, test cases are generated from this representation.
The model of the Web application is obtained as follows. First, the application is partitioned into clusters that are collections of Web pages and
software modules that implement a logical function. This clustering task is
made manually and is thus subjective. Second, Web pages that include
more than one HTML form, each of which is connected to a different
back-end software module, will be modelled as multiple Logical Web
Pages (LWP), in order to facilitate testing of these modules. Third, an FSM
will be derived for each cluster, starting from bottom-level clusters containing only modules and Web pages (no nested clusters), and then aggregating lower-level FSMs into higher-level FSMs. Ultimately, an Application
FSM (AFSM) will define an FSM of the entire Web application. In each
FSM, nodes will represent clusters and edges will represent valid navigation among clusters. Moreover, edges of the FSMs will be annotated with
inputs and constraints that may be associated with the transitions. Constraints on inputs, for instance, will indicate whether input data are optional and in which order they must be provided. Information will also be propagated between
lower-level FSMs.
Annotated FSMs and aggregate FSMs are thus used to generate tests.
Tests are considered as sequences of transitions in an FSM and the associated constraints. Test sequences for lower-level FSMs are combined to
form the test sequences for the aggregate FSMs. Standard graph coverage
criteria, such as all nodes and all edges, are used to generate sequences of
transitions for clusters and to aggregate FSMs.
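The following sketch illustrates all-edges test sequence generation on a small, already flattened FSM (the states and input annotations are hypothetical); it produces one input sequence per transition by driving the machine from the initial state to the transition's source, and is only an illustration, not the aggregation algorithm of [1]:

from collections import deque

# Hypothetical flattened FSM: state -> list of (input annotation, next state).
fsm = {
    "Start": [("open home page", "Home")],
    "Home": [("select course", "Course"), ("login", "TeacherArea")],
    "Course": [("back", "Home")],
    "TeacherArea": [("logout", "Home")],
}

def path_to(machine, initial, goal):
    # BFS over states, returning the inputs of one shortest path to goal.
    queue, seen = deque([(initial, [])]), {initial}
    while queue:
        state, inputs = queue.popleft()
        if state == goal:
            return inputs
        for inp, nxt in machine.get(state, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, inputs + [inp]))
    return None

def all_edges_suite(machine, initial):
    # One test sequence per transition: reach its source state, then fire it.
    suite = []
    for src, outs in machine.items():
        for inp, _tgt in outs:
            prefix = path_to(machine, initial, src)
            if prefix is not None:
                suite.append(prefix + [inp])
    return suite

for seq in all_edges_suite(fsm, "Start"):
    print(" / ".join(seq))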
While the approach of Di Lucca et al. provides a method for both unit and integration testing, the one by Andrews et al. mainly addresses integration and system testing. Both approaches use clustering to identify groups of related pages to be integrated, although in the latter the clustering is performed manually, and this may limit the applicability of the approach when large applications are tested.
The second method can be classified as a grey box rather than a pure black box technique. Indeed, test cases are generated to cover all the transitions among the clusters of LWPs, and therefore knowledge of the internal structure of the application is needed. Grey box testing strategies will be discussed in the next subsection.
7.5.3 Grey Box Testing Strategies
Grey box testing strategies combine black box and white box testing approaches to design test cases: they aim at testing a piece of software
against its specification but using some knowledge of its internal workings.
Among the grey box strategies we will consider the ones based on the
collection of user session data. These methods can be classified as grey
box since they use collected data to test the behaviour of the application in
a black box fashion, but they also aim at verifying the coverage of any
internal component of the application, such as page or link coverage.
Two approaches based on user session data will be described here.
7.5.4 User Session Based Testing
Approaches based on data captured in user sessions transparently collect
user interactions with the Web server and transform them into test cases
using a given strategy.
Data to be captured about the user interaction with the Web server include the clients' requests, expressed in the form of URLs and name-value pairs. These data can be obtained from the log files stored by Web servers, or by adding script modules to the requested server pages that capture the name-value pairs of the exchanged parameters. Captured data about user sessions can be transformed into a set of HTTP requests, each one providing a separate test case.
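As a simple illustration of this transformation, the sketch below parses hypothetical access log lines in the Common Log Format and groups the requests of each client (a rough approximation of a session) into one replayable test case:

import re
from collections import defaultdict

# Hypothetical excerpt of a Web server access log (Common Log Format).
log_lines = [
    '10.0.0.5 - - [10/Oct/2005:13:55:36 +0200] "GET /CourseMenu.html HTTP/1.1" 200 2326',
    '10.0.0.5 - - [10/Oct/2005:13:55:49 +0200] "GET /AddCourse.html HTTP/1.1" 200 1045',
    '10.0.0.7 - - [10/Oct/2005:13:56:02 +0200] "POST /AddCourse.asp HTTP/1.1" 200 310',
]

REQUEST = re.compile(r'^(\S+) .*?"(GET|POST) (\S+) HTTP/[\d.]+"')

def sessions_to_test_cases(lines):
    # Group requests by client address (a rough session approximation);
    # each group becomes one replayable test case.
    sessions = defaultdict(list)
    for line in lines:
        m = REQUEST.match(line)
        if m:
            client, method, url = m.groups()
            sessions[client].append((method, url))
    return [{"test_case": i, "requests": reqs}
            for i, reqs in enumerate(sessions.values(), start=1)]

for tc in sessions_to_test_cases(log_lines):
    print(tc)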
The main advantage of this approach is the possibility of generating test
cases without analysing the internal structure of a Web application, thus
reducing the costs of finding inputs. In addition, generating test cases using
user session data is less dependent on the heterogeneous and fast-changing
technologies used by Web applications, which is one of the major limitations of white box testing techniques. However, it can be argued that the
effectiveness of user session techniques depends on the set of user session
data collected: the wider this set, the greater the effectiveness of the approach to detect faults; but the wider the user session data set, the greater
the cost of collecting, analysing and storing data. Therefore there is a
trade-off between test suite size and fault detection capability.
Elbaum et al. [8] propose a user session approach to test a Web application and present the results of an empirical study where the effectiveness
of white box and user session techniques was compared. In the study, the collected user session data consist of sequences of HTTP requests made by
users. Each sequence reports the pages (both client and server ones) the
user visited together with the data he/she provided as input, in addition to
the data resulting from the elaboration of requests made by the user.
The study considered two implementations of the white box testing approach proposed by Ricca and Tonella [19], and three different implementations of the user session approach. The first implementation transforms
each individual user session into a test case; the second implementation
combines interactions from different user sessions; and the third implementation inserts user session data into a white box testing technique. The
study explored the effectiveness of the techniques in terms of the fault
detection they provide, the cost-effectiveness of user-session- based techniques, and the relationship between the number of user sessions and the
effectiveness of the test suites generated based on those sessions’ interactions. As a general result, the effectiveness of white box and user session
techniques was comparable in terms of fault detection capability, although the two kinds of techniques tended to find different types of faults. In
particular, user session techniques were not able to discover faults associated with rarely entered data. The experiment also showed that the effectiveness of user session techniques improves as the number of collected
user sessions increases. However, as the authors recognised, the growth of
this number puts additional challenges on the cost of collecting and managing sessions, such as the problem of finding an oracle to establish the
expected output of each user request. The possibility of using reduction
techniques, such as the one described in [10], is suggested by the authors
as a feasible approach for reducing test suite size, but its applicability
needs further investigation. A second empirical study carried out by the
same authors and described in [8] essentially confirmed the results of the
first experiment.
Sampath et al. [21] have explored the possibility of using concept analysis to achieve scalability in user-session based testing of Web applications.
Concept analysis is a technique for clustering objects that have common
discrete attributes. It is used in [21] to reduce a set of user sessions to a
minimal test suite, which still represents actual executed user behaviour. In
particular, a user session is considered as a sequence of URLs requested by
the user, and represents a separate use case offered by the application.
Starting from an original test suite including a number of user sessions, this
test suite is reduced by finding the smallest set of user sessions that covers
all the URLs of the original test suite; the reduced suite thus still represents the URLs common to the different use cases covered by the original test suite. This technique enables an incremental approach that updates the test
suite on-the-fly, by incrementally analysing additional user sessions. The
experiments carried out showed that a considerable test suite reduction is achievable by the approach, while preserving the coverage obtained by the original user session suite, with a minimal loss of fault detection capability. The authors
have developed a framework that automates the entire testing process, from
gathering user sessions through the identification of a reduced test suite to
the reuse of that test suite for coverage analysis and fault detection. A detailed description of this framework can be found in [21].
7.6 Tools for Web Application Testing
The effectiveness of a testing process may significantly depend on the tools
used to support the process. Testing tools usually automate some tasks required by the process (e.g. test case generation, test case execution, evaluation of test case results). Moreover, testing tools may support the production of useful testing documentation and its configuration management.
A variety of tools for Web application testing has been proposed, the majority of which were designed to carry out performance and load testing, security
testing, or to implement link and accessibility checking and HTML validation. As for functional testing, the main contribution of existing tools is limited to managing manually created test case suites, and to checking test case results against a manually created oracle. Greater support for automatic test case generation would help enhance the practice of testing Web applications. User session testing can also be useful since it captures details of user interactions with the Web application. Test scripts that automatically repeat such interactions could also be created to assess the behaviour exhibited by the application. A list of more than 200 commercial or freeware testing tools for Web applications is presented in [12].
Web application testing tools can be classified using the following six
main categories:
a) Load, performance and stress test tools.
b) Web site security test tools.
c) HTML/XML validators.
d) Link checkers.
e) Usability and accessibility test tools.
f) Web functional/regression test tools.
Tools belonging to categories a), b), and e) can be used to support non-functional requirement testing, while tools from categories c) and d) are
more oriented to verifying the conformance of a Web application code to
syntactical rules, or the navigability of its structure. This functionality is
often offered by Web site management tools, used to develop Web sites
and applications. Tools from category f) support functionality testing of
Web applications and include, in addition to capture and replay tools, other
tools supporting different testing strategies such as the one we analysed in
Sect. 7.5.
Focusing on tools within category f), their main characteristics are discussed below, where the main differences from tools usable for traditional
applications testing are also highlighted.
The generic services that should be offered to aid the functional testing of a Web application include:
− Test model generation: this is necessary to produce an instance of the desired/specified test model of the subject application. This model may either be one of the models already produced in the development process, in which case the tool just has to import it, or be produced by reverse engineering the application code.
− Test Case Management: this is needed to support test case design and
testing documentation management. Utilities for the automatic generation of the test cases would be desirable.
− Driver and Stub Generation: this is required to produce automatically
the code of the Web pages implementing the driver and stub modules,
needed for test case execution.
− Code Instrumentation: this is necessary to instrument automatically the
code of the Web pages to be tested, by inserting probes that automatically collect data about test case execution.
− Test result analysis: this service will analyse and automatically evaluate test case results.
− Report generation: this service will produce adequate reports about
analysis results, such as coverage reports about the components exercised during the test.
A generic possible architecture of such a tool is depicted in Fig. 7.2,
comprising the following main components:
− Interface layer: implements a user interface providing access to the
functions offered by the tool.
− Service layer: includes the components implementing tool services.
− Repository layer: includes the persistent data structures storing the
Web application model, test cases and test logs, and the files of the instrumented Web pages, driver Web pages, stub Web pages, as well as
the test reports.
Services offered by the tool, such as driver and stub generation, as well
as code instrumentation and test model generation, are more reliant on the
specific technologies used to implement the Web application, while others
will be largely independent of the technologies. As an example, different
types of drivers and stubs will have to be generated for testing client and
server Web pages as the technology (e.g. the scripting languages used to
code the Web pages) affects the way drivers and stubs are developed.
In general, the driver of a client page has the responsibility of loading
the client page into a browser, where it is executed, while the driver of a
server page requires the execution of the page on the Web server. Stubs of
a client page have to simulate the behaviour of pages that are reachable
from the page under test by hyperlinks, or whose execution on the Web
server is required by the page. Stubs of a server page have to simulate the
behaviour of other software components whose execution is required by
the server page under test. Specific approaches have to be designed to implement drivers and stubs for Web pages created dynamically at run time.
Depending on the specific technology used to code Web pages, different
code instrumentation components also have to be implemented. Code analysers, including different language parsers, have to be used to identify
automatically the points where probes are to be inserted in the original
page code.
[Figure: Interface layer; Service layer comprising the Test Model Abstractor, Test Case Manager, Driver and Stub Generator, Code Instrumentator, Test Result Analyzer and Report Generator components, fed by the application source files; Repository layer storing the WA test model, test case documentation, instrumented Web pages and execution analysis results.]
Fig. 7.2. The layered architecture of a tool supporting Web application testing
Analogously, the test model generator component that has to reverse-engineer the application code to generate the test model is largely dependent on the technologies used to implement the application. Code analysers are also required in this case.
The other modules of such a generic tool are less affected by the Web
application technologies, and can be developed as in the case of traditional
applications.
7.7 A Practical Example of Web Application Testing
In this section we present an example where the functional requirements of
an industrial Web application are tested. The example addresses the problem of testing an existing Web application with poor development documentation. We
also present additional analysis techniques needed to support the application
testing in the case of existing and scarcely documented Web applications.
Such a problem may also exist within the development process, where the
testing of scarcely documented Web applications is often encountered.
The application’s unit, integration and system testing will be analysed
and testing approaches proposed in the literature are used to accomplish
these tasks.
The Web application presented is named “Course Management”, and
was developed to support the activities of undergraduate courses offered
by a Computer Science Department. The application provides students and
teachers with several distinct services: a teacher can publish course information, and manage examination sessions and student tutoring agendas,
while students can access course information, and register for a course or
an examination session. A registered student can also download teaching
material.
The technologies used to implement the application are HTML, ASP,
VB Script and Javascript. The application includes a Microsoft Access
database, and is composed of 106 source files whose total size is close to
500 Kbytes. As for the development documentation, just a textual description of user functions was available.
The first step is to carry out a preliminary reverse engineering process
to reconstruct design documentation that is essential for a testing activity.
Such documentation includes a specification of functional requirements
implemented by the application, design documentation providing the application’s organisation in terms of pages and their interconnection relationships, as well as traceability information.
To obtain this information, the reverse engineering approach and the
WARE tool presented in [7] were used. This tool allows the main components of a Web application, and the relationships between them, to be automatically recovered by static analysis of the source code. The tool also provides a graphical representation of this information, called the WAG (Web
Application connection Graph). Table 7.3 lists a count of the items found
in the application code for each category of components and relationships
identified by the tool, while Fig. 7.3 reports the WAG depicting all the
identified components and relationships. In this graph, which is an instantiation of the application’s meta-model, different shapes have been used to
distinguish different types of components and relationships. As an example, a box is used for drawing a Static Page, a trapezium for a Built Client
Page and a diamond for a Server Page.
Using the clustering technique described in [5] and exploiting the available documentation on user functions, the application’s use case model
was reconstructed, and groups of pages, each implementing a use case,
were identified. Figure 7.4 shows this use case model.
The testing process carried out was driven by the application’s use case
model. For each use case, a unit testing of the Web pages implementing
the case was executed, using the black box technique based on the decision
tables proposed in [6]. After the unit testing, an integration testing was
carried out. In what follows, we will refer to the use case named “Teacher
and course management” to show how testing was carried out.
The “Teacher and course management” use case implements the application behaviour permitting a registered teacher to manage his/her personal
data and data about courses he/she teaches. This use case allows a teacher
to:
− F1: register, update or delete personal data.
− F2: add a new course and associate it to the teacher for the current
academic year.
− F3: update/delete the data about a course taught by the teacher.
Figure 7.5 specifies the functional requirements of the function F2. Figure 7.6 shows, using the UML extensions from Conallen [4], an excerpt of the WAG made up of the pages implementing the function F2.
Figure 7.7 shows the rendering of the client page AddCourse.html (all the labels/prompts in the page are in Italian), which includes a form allowing the input of the data needed to register a new course and add it to the ones taught by the teacher in the current year.
The unit testing of this page has been carried out using the following
approach. We started by analysing the responsibilities of this page. The
page is in charge of visualising a form (see Fig. 7.7) that allows the input
of required data, checking that all fields have been filled in, checking the
validity of the academic year value, and submitting the input data to the
server page AddCourse.asp. Moreover, a Reset button in the page allows all the form fields to be "blanked", while a pair of radio buttons labelled YES/NO is used to ask whether the user wants to input data for more than one course.
Finally, this page automatically computes the value of the second year
of the academic year field, after the first year value has been provided.
In order to associate the Web page with the decision table required by the testing approach, domain analysis was carried out for each page input item (such as form fields, buttons, selection boxes, etc.) to identify its sets of valid and invalid values. The functional specifications reported in Fig. 7.5 were used to accomplish the domain analysis, whose results are reported in Table 7.4. In the table, the input element named "More Courses?" refers to the pair of radio buttons labelled YES/NO.
Fig. 7.3. The WAG of the “Course Management” Web application
Table 7.3. Type and count of Web application items identified by static analysis

  Item type                       Count
  Server Page                     75
  Static Page                     23
  Built Client Page               74
  Client Script                   132
  Client Function                 48
  Form                            49
  Server Script                   562
  Server Function                 0
  Redirect (in Server Scripts)    7
  Redirect (in Client Scripts)    0
  Link                            45
  Submit                          49
  Include                         57
  Load in Frame Operation         4
[Figure: use case diagram with actors Teacher, Student and User; the use cases include Teacher and Course Management, Tutoring Management, Students' Enrollment Management, Examinations Management, Teacher Management, Student Management, Bulletin Board Management, Examination Schedule Management, Students' Enrollment, General Constants Definition and Teacher Login, connected by <<include>> and <<extend>> relationships.]
Fig. 7.4. The use case model of the “Course Management” Web application
Function: F2 Creates a new course and associates it with a registered teacher for the current
academic year.
Pre-condition: The teacher has to be already registered at the Web application.
• The teacher inputs Course Code and Name, and the Academic Year. If any datum is missing,
an error message is displayed.
• The Course Code must not yet exist in the database; the Course Name may already exist in
the database but associated to a different code. If the Course Code already exists, an error
message is displayed.
• The Academic Year and the current Academic Year must coincide, otherwise an error
message is displayed.
• If all the data are valid, the new course is added into the database and a message is sent to
the client to notify the success of the operation.
Post-condition: The teacher is associated with the new course for the current academic year.
Fig. 7.5. Functional requirements of function F2
[Figure: the client pages TeacherArea.html, CourseMenu.html, AddCourse.html and AddResult.html, the server page AddCourse.asp and the Course entity, connected by <<link>>, <<submit>>, <<builds>> and <<redirect>> relationships.]
Fig. 7.6. An excerpt of the WAG using Conallen’s graphical notation
Based on the specifications in Fig. 7.5 and the information in Table 7.4, the decision table reported in Table 7.5 was obtained. This decision table reports all the admissible combinations, deduced from the page implementation, of valid/invalid sets of values of the input elements, together with the expected results and actions.
In Table 7.5, the columns reporting status before and after test have not
been shown for the sake of readability. The status before and after the test
was specified with respect to the status of the database. For each variant,
the state before test is always: “The teacher is registered in the database
and is allowed to do this operation”. The status after test is “The execution
of the test cases does not change the status of the database”.
Fig. 7.7. The rendering of the client page AddCourse.html
Table 7.4. Valid and invalid sets of values of the input elements in the client page AddCourse.html

  Input Name      Valid Values                             Invalid Values
  Course Code     The course code does not exist           The course code already exists
                  in the database                          in the database
  Course Name     The course name does not exist in        The course name already exists in
                  the database, or it already exists       the database and is associated with
                  but is not associated with the same      the same inputted Course Code
                  inputted Course Code in the database
  Academic Year   The current academic year                Not equal to the current academic year
  More Courses?   {Yes, No}                                Not in {Yes, No}
  Submit Button   {Clicked, NotClicked}                    Not in {Clicked, NotClicked}
  Reset Button    {Clicked, NotClicked}                    Not in {Clicked, NotClicked}
Page testing was carried out with the aim of verifying that the page was correctly visualised in the browser, and that validated data were sent correctly to the server page AddCourse.asp. To test the page, a driver module allowing the client page to be loaded into a browser, after a registered teacher's login, had to be developed, as well as a stub module simulating the execution of the AddCourse.asp server page. This stub module just had to verify that the received data coincided with the data sent by the client page, and notify the result by a message sent to the client.
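To make the stub's responsibility concrete, the following Python sketch shows one possible shape for it; the port, form field names and expected values are illustrative assumptions, not details taken from the original application (whose actual stub replaced an ASP page).

  # Hypothetical stub standing in for AddCourse.asp: it checks that the
  # form data received coincide with the data the client page should send,
  # and notifies the result to the client. All names/values are illustrative.
  from http.server import BaseHTTPRequestHandler, HTTPServer
  from urllib.parse import parse_qs

  EXPECTED = {"CourseCode": ["INF01"], "CourseName": ["Web Engineering"],
              "AcademicYear": ["2004/2005"], "MoreCourses": ["No"]}

  class AddCourseStub(BaseHTTPRequestHandler):
      def do_POST(self):
          length = int(self.headers.get("Content-Length", 0))
          received = parse_qs(self.rfile.read(length).decode())
          self.send_response(200)
          self.end_headers()
          self.wfile.write(b"SUBMISSION OK" if received == EXPECTED
                           else b"SUBMISSION MISMATCH")

  if __name__ == "__main__":
      HTTPServer(("localhost", 8080), AddCourseStub).serve_forever()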
Table 7.5. Decision table for testing the client page AddCourse.html

# | Course Name | Course Code | More Courses? | Academic Year (first field) | Submit | Reset | Expected results | Expected output actions
1 | DC | DC | DC | Not Valid | DC | Not Clicked | Academic Year error message | Academic Year second field filled with the right value
2 | DC | DC | DC | Valid | Clicked | Not Clicked | Data submitted to server page AddCourse.asp | Academic Year second field filled with the right value; stub notification message about submission correctness
3 | DC | DC | DC | DC | Not Clicked | Clicked | All the fields in the form have 'blank' values | The page AddCourse.html is visualised

Note: DC = Don't Care
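A decision table of this kind lends itself to data-driven test execution. The Python sketch below, with invented condition names and values, shows one way the three variants could be encoded and enumerated; “DC” cells are bound to an arbitrary valid value.

  # Each variant of the decision table becomes a data record; a driver can
  # then iterate over the records. Condition and value names are hypothetical.
  VARIANTS = [
      {"year": "not_valid", "submit": "dc",          "reset": "not_clicked",
       "expect": "Academic Year error message"},
      {"year": "valid",     "submit": "clicked",     "reset": "not_clicked",
       "expect": "data submitted to AddCourse.asp"},
      {"year": "dc",        "submit": "not_clicked", "reset": "clicked",
       "expect": "all form fields blanked"},
  ]

  def bind(condition, valid_value, invalid_value):
      # A "dc" (don't care) cell may be bound to any value; we use a valid one.
      return invalid_value if condition == "not_valid" else valid_value

  for v in VARIANTS:
      year = bind(v["year"], "2004/2005", "1999/2000")
      print(f"test case: year={year}, submit={v['submit']}, "
            f"reset={v['reset']} -> expect: {v['expect']}")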
A set of test cases was defined to exercise all the variants in Table 7.5. These test cases are not reported due to lack of space. The execution of the test cases revealed a failure in the validity check for the Academic Year values: besides the current Academic Year, the value of the Academic Year immediately following it is also accepted (e.g. if 2004/2005 is the valid value of the current Academic Year, the value 2005/2006 is accepted as valid too, while all other successive values, such as 2006/2007, are correctly refused).
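The observed behaviour can be mirrored by a hypothetical reconstruction of the check (the actual validation was implemented by JavaScript functions in the page; the Python below only reproduces the effect of the defect):

  # Reconstruction of the defective Academic Year check: it accepts both the
  # current academic year and the one immediately following it.
  def is_valid_academic_year(value, current="2004/2005"):
      start = int(current.split("/")[0])
      year = int(value.split("/")[0])
      # Faulty condition: should be 'year == start', but 'start + 1' also passes.
      return start <= year <= start + 1

  assert is_valid_academic_year("2004/2005")          # correctly accepted
  assert is_valid_academic_year("2005/2006")          # wrongly accepted: the failure
  assert not is_valid_academic_year("2006/2007")      # correctly refused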
This page was also submitted for white box testing. The AddCourse.html page includes HTML code for visualising and managing data input, and two JavaScript functions for validating the Academic Year value.
The branch coverage criterion was used to test this page, and all the
branches were covered by the set of test cases (the page was instrumented
to collect data and verify the coverage). A test model similar to the control
flow model proposed by Ricca and Tonella [19] was used to model the
structure of the AddCourse.html page and design the test cases.
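The instrumentation idea can be sketched as follows; the branch identifiers and the toy validation logic are invented for illustration. Each branch records its identifier in a shared set, and after the test run the set is compared with the complete list of branches.

  # Toy example of branch-coverage instrumentation: every branch adds its id
  # to 'covered'; full coverage means all known branch ids were reached.
  covered = set()

  def validate(code, year_ok):
      if not code:
          covered.add("B1"); return "missing course code"
      covered.add("B2")
      if year_ok:
          covered.add("B3"); return "ok"
      covered.add("B4"); return "invalid academic year"

  for args in [("", False), ("INF01", True), ("INF01", False)]:
      validate(*args)
  assert covered == {"B1", "B2", "B3", "B4"}  # all branches exercised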
As an example of server page testing, we consider the AddCourse.asp server page. The responsibility of this page is to register the new course in the database and to associate the course with its teacher, when the input values are valid.
To test a server page, its code has to be analysed to identify the input variables. The input variables found in AddCourse.asp were the ones submitted by the AddCourse.html page (i.e. the Course Code, Course Name, More Courses? and Academic Year variables), besides the session variables LoginOK and TeacherCode. Domain analysis was carried out to define valid and invalid sets of values of the input elements. As for the session variable TeacherCode, its valid values were all the ones corresponding to registered teachers stored in the database. The valid value for the variable LoginOK was the logical value TRUE, corresponding to an authorised user who has made a successful login.
Table 7.6. Decision table for testing the server page AddCourse.asp

# | Course Name | Course Code | More Courses? | Academic Year | LoginOK | Expected results | Expected output actions | Expected state after test
1 | Valid | Valid | NO | Valid | True | Data registered into the database; Success Message | The page TeacherArea.html is visualised | The new course and its association to the teacher have been added to the database
2 | Valid | Valid | YES | Valid | True | Data registered into the database | The page AddCourse.html is visualised | The new course and its association to the teacher have been added to the database
3 | DC | Not Valid | DC | DC | True | Error Message | The page CourseMenu.html is visualised | Coincides with the before-test state
4 | DC | DC | DC | DC | False | Error Message | A new login is required | Coincides with the before-test state
5 | Not Valid | Valid | DC | DC | True | Error Message | None | Coincides with the before-test state
6 | DC | DC | Not Valid | DC | True | Error Message | None | Coincides with the before-test state

Note: DC = Don't Care
To compile the decision table associated with the page, the expected results and actions were also identified. Table 7.6 shows the set of variants we used to test the AddCourse.asp page. The TeacherCode variable was not included in the table because it did not affect the page's behaviour.
As for the testing of the page, a set of test cases was defined in order to
exercise each variant for both true and false values. To execute the page
test, a driver simulating the HTTP requests by a client, as well as a stub
simulating the client pages returned to the client, were developed. No stub
was used to simulate the connection to the database, but a test database
was used. The execution of this functional testing did not reveal any failure. The same page was submitted for white box testing, using the linearly
independent paths coverage criterion. The page included four linearly independent paths which were all covered by the test cases we designed.
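The driver side of such a test can be sketched in Python as follows; the URL, the form field names and the convention of passing the session variables as cookies are assumptions made purely for illustration.

  # Hypothetical driver for AddCourse.asp: it issues the HTTP request a client
  # page would send; session variables travel as cookies in this sketch.
  import urllib.parse
  import urllib.request

  def run_variant(fields, login_ok, teacher_code):
      data = urllib.parse.urlencode(fields).encode()
      req = urllib.request.Request("http://localhost:8080/AddCourse.asp", data=data)
      req.add_header("Cookie", f"LoginOK={login_ok}; TeacherCode={teacher_code}")
      with urllib.request.urlopen(req) as resp:
          return resp.read().decode()  # the page returned to the client

  if __name__ == "__main__":
      # Variant 1 of Table 7.6: all inputs valid, LoginOK = True.
      html = run_variant({"CourseCode": "INF01", "CourseName": "Web Engineering",
                          "AcademicYear": "2004/2005", "MoreCourses": "No"},
                         login_ok=True, teacher_code="T42")
      assert "Success" in html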
At the integration testing level, these two pages were combined and retested together. Table 7.7 reports the decision table including the set of
variants used for integration testing. No more failures were observed during the integration testing.
Unit and integration testing were executed for each group of pages implementing the remaining Web application use cases. The unit testing revealed a few failures in some pages, mostly due to defects in the validation of user input data. Moreover, we observed that some client pages included JavaScript functions that were never activated during execution: they were dead code left in the page after a maintenance intervention that replaced them with new (correctly activated) functions. A similar situation was observed in a few server pages too. As for the integration testing, no additional failure was observed.
As for the unit testing of dynamically generated Web pages, additional effort was required to design the test cases of the server pages responsible for building them. These test cases had to be run so that the client pages were generated; the generated pages were captured and stored on the fly, to be tested subsequently. In some cases, the user-session-based approach was exploited to identify test cases able to generate and cover the built client pages.
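The capture step can be sketched as follows (the URL and storage scheme are illustrative assumptions): each client page generated while the server-page test cases run is saved to disk so that it can be unit tested later.

  # Capture dynamically built client pages on the fly for later testing;
  # a content hash is used as the file name to avoid storing duplicates.
  import hashlib
  import pathlib
  import urllib.request

  def capture(url, out_dir="captured_pages"):
      html = urllib.request.urlopen(url).read()
      name = hashlib.md5(html).hexdigest()[:8] + ".html"
      target_dir = pathlib.Path(out_dir)
      target_dir.mkdir(exist_ok=True)
      (target_dir / name).write_bytes(html)
      return target_dir / name

  if __name__ == "__main__":
      # e.g. replaying one request of a recorded user session
      print("stored", capture("http://localhost:8080/CourseMenu.asp?teacher=T42"))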
Table 7.7. Decision table for the integration testing of the client page AddCourse.html and the server page AddCourse.asp

# | Course Name | Course Code | More Courses? | Academic Year | Submit | Reset | Expected results | Expected output actions | Expected state after test
1 | DC | DC | DC | Not Valid | Clicked | Not Clicked | Error Message | The page AddCourse.html is visualised | The same as before the test
2 | DC | Not Valid | DC | DC | Clicked | Not Clicked | Data submitted to server page AddCourse.asp; Error Message | The page CourseMenu.html is visualised | The same as before the test
3 | Valid | Valid | NO | Valid | Clicked | Not Clicked | Data submitted to server page AddCourse.asp; Success Message | The page TeacherArea.html is visualised | The new course and its association to the teacher added to the database
4 | Valid | Valid | YES | Valid | Clicked | Not Clicked | Data submitted to server page AddCourse.asp; Success Message | The page AddCourse.html is visualised again | The new course and its association to the teacher added to the database
5 | DC | DC | DC | DC | Not Clicked | Clicked | All the fields in the form put to blank | The page AddCourse.html is visualised | The same as before the test

Note: DC = Don't Care
We also executed a system test aimed at exercising each use case implemented by the application at least once. This testing did not reveal any other failures. Moreover, the page coverage reached by this testing was evaluated: all pages were covered, except one server page that was unreachable, since it was an older version of a page replaced in a maintenance operation.
In conclusion, the functional testing of the “Course Management” Web application was accomplished successfully. Indeed, the fact that testing revealed just a few failures in the application (most of which were due to incorrectly executed maintenance interventions) can be attributed to the “maturity” of the Web application, which had been running for two years.
The testing experience also highlighted that considerable effort was required to reconstruct the design documentation needed for test design and execution. This effort might have been saved, or reduced, if this documentation had already been available before testing. This observation can be considered a strong point of similarity between the functional testing of a Web application and the functional testing of a “traditional” system.
7.8 Conclusions
The openness of Web applications to large numbers of users and the strategic value of the services they offer oblige us to take seriously the verification of both the non-functional and the functional requirements of a Web application. While new and specific approaches must necessarily be used for the verification of non-functional requirements (consider the problems of security or accessibility testing, which are specific to Web applications), most of the knowledge and expertise in the field of traditional application testing may be reused for testing the functional requirements of a Web application.
In this chapter we have reported the main differences and points of similarity between testing a Web application and testing a traditional software
application. We considered testing of the functional requirements with
respect to four main aspects, i.e. testing scopes, test models, test strategies
and testing tools. The main contributions to these topics presented in the
literature have been taken into account to carry out this analysis.
The main conclusion we can draw from this discussion is that all testing aspects that depend directly on the implementation technologies (such as test models, testing scopes and white box testing strategies) have to be substantially adapted to the heterogeneous and “dynamic” nature of Web applications, while other aspects (such as black box strategies, or the objectives of testing tools) may be reused with a reduced adaptation effort.
This finding also indicates that further research efforts should be spent to
define and assess the effectiveness of testing models, methods, techniques
and tools that combine traditional testing approaches with new and specific
ones.
A relevant issue for future work may be the definition of methods and techniques for improving the effectiveness and efficiency of a Web application testing process. As an example, the adequacy of mutation testing techniques for the automatic validation of test suites should be investigated, as well as the effectiveness of statistical testing techniques in reducing testing effort by focusing on those parts of a Web application that are most frequently used by massive user populations [16]. Moreover, the possibility of combining genetic algorithms with user session data to reduce the costs of test case generation may be a further research question to be investigated. Finally, the emerging scenario of Web services poses new research challenges, since the testing of Web services needs to be considered too.
References
1 Andrews AA, Offutt J, Alexander RT (2005) Testing Web Applications by Modeling with FSMs. Software and Systems Modeling, 4(2)
2 Bongio A, Ceri S, Fraternali P (2000) Web Modeling Language (WebML): a Modelling Language for Designing Web Sites. In: Proceedings of the 9th International Conference on the WWW (WWW9). Elsevier: Amsterdam, Holland, pp 137–157
3 Binder RV (1999) Testing Object-Oriented Systems: Models, Patterns, and Tools. Addison-Wesley: Reading, MA
4 Conallen J (1999) Building Web Applications with UML. Addison-Wesley: Reading, MA
5 Di Lucca GA, Fasolino AR, De Carlini U, Pace F, Tramontana P (2002) Comprehending Web Applications by a Clustering Based Approach. In: Proceedings of the 10th Workshop on Program Comprehension. IEEE Computer Society Press: Los Alamitos, CA, pp 261–270
6 Di Lucca GA, Fasolino AR, Faralli F, De Carlini U (2002) Testing Web Applications. In: Proceedings of the International Conference on Software Maintenance. IEEE Computer Society Press: Los Alamitos, CA, pp 310–319
7 Di Lucca GA, Fasolino AR, Tramontana P (2004) Reverse Engineering Web Applications: the WARE Approach. Journal of Software Maintenance and Evolution: Research and Practice, John Wiley and Sons Ltd., 16:71–101
8 Elbaum S, Karre S, Rothermel G (2003) Improving Web Application Testing with User Session Data. In: Proceedings of the International Conference on Software Engineering. IEEE Computer Society Press: Los Alamitos, CA, pp 49–59
9 Elbaum S, Rothermel G, Karre S, Fisher M (2005) Leveraging User-Session Data to Support Web Application Testing. IEEE Transactions on Software Engineering, 31(3):187–202
10 Gomez J, Cachero C, Pastor O (2001) Conceptual Modeling of Device-Independent Web Applications. IEEE Multimedia, 8(2):26–39
11 Harrold MJ, Gupta R, Soffa ML (1993) A Methodology for Controlling the Size of a Test Suite. ACM Transactions on Software Engineering and Methodology, 2(3):270–285
12 Hieatt E, Mee R (2002) Going Faster: Testing the Web Application. IEEE Software, 19(2):60–65
13 Hower R (2005) Web Site Test Tools and Site Management Tools. Software QA and Testing Resource Center. www.softwareqatest.com/qatWeb1.html (accessed 5 June 2005)
14 IEEE Std. 610.12-1990 (1990) Glossary of Software Engineering Terminology. In: Software Engineering Standards Collection. IEEE Computer Society Press: Los Alamitos, CA
15 Isakowitz T, Kamis A, Koufaris M (1997) Extending the Capabilities of RMM: Russian Dolls and Hypertext. In: Proceedings of the 30th Hawaii International Conference on System Sciences, Maui, HI, (6):177–186
16 Kallepalli C, Tian J (2001) Measuring and Modeling Usage and Reliability for Statistical Web Testing. IEEE Transactions on Software Engineering, 27(11):1023–1036
17 Liu C, Kung DC, Hsia P, Hsu C (2000) Object-based Data Flow Testing of Web Applications. In: Proceedings of the First Asia-Pacific Conference on Quality Software. IEEE Computer Society Press: Los Alamitos, CA, pp 7–16
18 Nguyen HQ (2000) Testing Applications on the Web: Test Planning for Internet-Based Systems. John Wiley & Sons: New York, NY
19 Ricca F, Tonella P (2001) Analysis and Testing of Web Applications. In: Proceedings of the 23rd International Conference on Software Engineering (ICSE 2001). IEEE Computer Society Press: Los Alamitos, CA, pp 25–34
20 Ricca F, Tonella P (2004) A 2-Layer Model for the White-Box Testing of Web Applications. In: Proceedings of the Sixth IEEE Workshop on Web Site Evolution. IEEE Computer Society Press: Los Alamitos, CA, pp 11–19
21 Sampath S, Mihaylov V, Souter A, Pollock L (2004) A Scalable Approach to User-Session Based Testing of Web Applications Through Concept Analysis. In: Proceedings of the 19th International Conference on Automated Software Engineering. IEEE Computer Society Press: Los Alamitos, CA, pp 132–141
22 Sampath S, Mihaylov V, Souter A, Pollock L (2004) Composing a Framework to Automate Testing of Operational Web-Based Software. In: Proceedings of the 20th International Conference on Software Maintenance. IEEE Computer Society Press: Los Alamitos, CA, pp 104–113
23 Schwabe D, Guimaraes RM, Rossi G (2002) Cohesive Design of Personalized Web Applications. IEEE Internet Computing, 6(2):34–43
24 Web Content Accessibility Guidelines 2.0 (2005) http://www.w3.org/TR/WCAG20 (accessed 5 June 2005)
Authors’ Biographies
Giuseppe A. Di Lucca received the Laurea degree in Electronic Engineering from
the University of Naples “Federico II”, Italy, in 1987 and the PhD degree in Electronic Engineering and Computer Science from the same university in 1992.
He is currently an Associate Professor of Computer Science at the Department
of “Ingegneria” of the University of Sannio. Previously, he was with the Department of 'Informatica e Sistemistica' at the University of Naples “Federico II”.
Since 1987 he has been a researcher in the field of software engineering and his
list of publications contains more than 50 papers published in journals and conference proceedings.
He serves on the programme and organising committees of conferences in the
field of software maintenance and program comprehension. His research interests
include software engineering, software maintenance, reverse engineering, software
reuse, software reengineering, Web engineering and software migration.
Anna Rita Fasolino received the Laurea degree in Electronic Engineering (cum
laude) in 1992 and the PhD degree in Electronic Engineering and Computer Science in 1996 from the University of Naples “Federico II”, Italy, where she is currently an Associate Professor of Computer Science. From 1998 to 1999 she was at
the Computer Science Department of the University of Bari, Italy.
Her research interests include software maintenance and quality, reverse engineering, Web engineering, software testing and reuse, and she has published several papers in journals and conference proceedings on these topics. She is a member of programme committees of conferences in the field of software maintenance
and evolution.
8 An Overview of Process Improvement
in Small Settings
Khaled El Emam
Abstract: Existing software process improvement approaches can be applied successfully to small projects and small organisations. However, they need to be customised and the techniques used have to be adapted for small settings. This chapter provides a pragmatic discussion of the issues involved in implementing software process improvement in small settings, covering the practical obstacles that are likely to be faced and ways to address them.
Keywords: Software process improvement, IDEAL model, Small organisations.
8.1 Introduction
Software process improvement (SPI) efforts in small settings tend to be, in
general, less successful than in large settings [4]. The approaches needed
to improve the practices of small projects are somewhat different from
those required for larger projects. Web development projects are still typically small with just a handful of developers and possibly additional resources in the form of graphics artists and technical writers.
This chapter will discuss the issues relevant to software process improvement (SPI) in a small project context and present examples of assessment and improvement approaches that have worked in the past. Some
of this knowledge is based on the research literature, and some on our own
experience working with small organisations over the past ten years, improving their software engineering practices. Some of our examples include our experience of process improvement at TrialStat Corporation,
where we were responsible for continuous process improvement over a
four-year period.
Small projects may occur in large or small organisations. If the former,
it is possible for small projects to take advantage of some of the resources
of the parent company, e.g. their training programs, internal consultants
and corporately licensed tools. However, if the latter, projects do not have
these advantages, and the organisation will be limited to one or two projects at most. In this chapter we will not differentiate between the two
cases explicitly, unless it is material to the discussion. We therefore refer
to small settings in the general case.
To start we need to be more precise about what constitutes a small setting. Numerous definitions have been used in the literature and by governments, and they all tend to be vague. European Union projects used to classify organisations as small if they had up to 50 IT staff [15], a definition also supported by [6,7]. Varkoi et al. [17] considered a company small if it had fewer than 100 employees, Cater-Steel [2] one with fewer than 20 employees, and Dyba [4] one with up to 35 employees. The US census considers companies with up to 50 employees as small [10]. Therefore we will define a small setting as one with up to 50 employees.
8.1.1 Why Do Organisations Initiate SPI Efforts?
It is important to understand what motivates organisations to start an SPI effort. These motivations influence the amount of resources organisations will make available and the degree of management commitment.
Anecdotally, our experience suggests that the three main drivers are:
• A crisis has hit the organisation or a particular project. For example, shipping a product extremely late or delivering a release with a large number of defects, with the result that important clients have complained or abandoned the product. The crisis initiates a search for solutions. In some organisations the crisis will result in some key people taking the blame and being fired. In others, the search for solutions may result in an SPI initiative.
• An important client demands that suppliers have an SPI initiative in place. In such a case, SPI is driven by the client, and the organisation or project is obliged to respond.
• In some cases it is a business requirement to demonstrate that good practices are followed, e.g. in regulated domains. For the clinical trials sector, Title 21 Code of Federal Regulations (21 CFR Part 11) is the Food and Drug Administration regulation that governs software development and operations. The interpretations of this regulation stipulate a set of software development practices to be in place. Passing an audit is a requirement for being a part of this business, and therefore management is obliged to have an SPI effort in place to ensure compliance.
In addition to our anecdotal evidence, previous research indicates further reasons for initiating SPI efforts. An analysis performed by the Software Engineering Institute, based on feedback data collected from assessment sponsors, showed that over half of the sponsors stated that the primary goals of their assessments were either to monitor the progress of their existing software process improvement programs, or to initiate new programs [3]. Furthermore, over a third of the sponsors said that validating an organisation's maturity level was a primary goal of their assessment.
In another study [9], assessment sponsors were asked the reasons for
performing a software process assessment in their organisations. The question asked was “To what extent did the following represent important reasons for performing a software process assessment?”. The responses were
measured using a 5-point scale of importance, structured as follows:
• 1 corresponds to “Very Important”.
• 2 corresponds to “Important”.
• 3 corresponds to “Somewhat Important”.
• 4 corresponds to “Not Very Important”.
• 5 corresponds to “Not At All Important”.
Table 8.1 summarises the reasons presented to the sponsors, together with the variable name used to denote each response.
Table 8.1. Reasons for performing a software process assessment
No. | Reason | Variable Name
1 | Gain market advantage | ADVANTAGE
2 | Customer demand to improve process capability | DEMAND
3 | Improve efficiency | EFFICIENCY
4 | Improve customer service | CUSTOMER
5 | Improve reliability of products | PRODREL
6 | Improve reliability of services in supporting products | SERVREL
7 | Competitive/marketing pressure to demonstrate process capability | COMPETITIVE
8 | Generate management support and buy-in for software process improvement | MANAGEMENT
9 | Generate technical staff support and buy-in for software process improvement | TECHSTAFF
10 | Establish best practices to guide organisational process improvement | BESTPRACT
11 | Establish project baseline and/or track projects' process improvement | TRACKPROJ
12 | Establish project baseline and/or track organisation's process improvement | TRACKORG
Figure 8.1 presents a range plot in which square points represent the response mean (average) for each reason. The mean is based on scores obtained from a single study, i.e. from a single sample of the population of interest. If we were to repeat this study with a different sample, it is very likely that the means of all responses would differ from those in Fig. 8.1. For this reason we have also included a 95% confidence interval, represented by upper and lower whiskers. A confidence interval delimits the range of values where the true mean is likely to lie, and 95% represents the probability of that occurring.
Assuming that a sponsor is indifferent to a given reason if the score given is 3, we can use the confidence intervals provided in the range plot to find the reasons sponsors are indifferent to. Whenever the whiskers cross the value of 3, there is evidence, at the 95% confidence level, that the mean response for that reason is not significantly different from 3. Only the two reasons SERVREL and ADVANTAGE fall into this category. This means that sponsors exhibited indifference towards “gaining market advantage” and “improving the reliability of services in supporting products” as reasons for performing an assessment.
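The indifference test can be reproduced in a few lines of Python; the scores below are invented sample data, and the 1.96 multiplier is the usual normal approximation for a 95% interval (a t-based interval would be more exact for small samples).

  # Compute the mean and an approximate 95% confidence interval per reason,
  # and flag a reason as "indifferent" if the interval contains the neutral
  # score 3. The score lists are fabricated for illustration.
  import math
  import statistics

  def ci95(scores):
      mean = statistics.mean(scores)
      half = 1.96 * statistics.stdev(scores) / math.sqrt(len(scores))
      return mean - half, mean + half

  samples = {"TRACKORG": [1, 2, 1, 2, 1, 3, 2], "ADVANTAGE": [2, 3, 4, 3, 2, 4, 3]}
  for reason, scores in samples.items():
      low, high = ci95(scores)
      verdict = "indifferent" if low <= 3 <= high else "not indifferent"
      print(f"{reason}: mean={statistics.mean(scores):.2f}, "
            f"95% CI=({low:.2f}, {high:.2f}) -> {verdict}")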
[Range plot omitted: the reasons TRACKORG, EFFICIENCY, BESTPRACT, TRACKPROJ, CUSTOMER, DEMAND, MANAGEMENT, TECHSTAFF, PRODREL, SERVREL, ADVANTAGE and COMPETITIVE plotted on the importance scale from 1.0 to 5.0]
Fig. 8.1. Mean of importance scores with 95% confidence interval
In addition, “competitive/marketing pressure to demonstrate process capability” (COMPETITIVE) was clearly not a reason for performing assessments.
The five most important reasons sponsors chose for conducting their process assessments were: to establish process capability baselines and/or track progress with project and organisational process improvement (TRACKORG, TRACKPROJ); to improve efficiency and customer service (EFFICIENCY, CUSTOMER); and to establish best practices to guide process improvement (BESTPRACT). The importance the sponsors assigned to “establishing capability baselines” clearly indicates that they tend to recognise assessments as an important measurement procedure. The choice of “improving efficiency and customer service” indicates that these sponsors believe that SPI based on the assessment would provide tangible benefits to their projects. The choice of “establish best practices to guide organisational process improvement” suggests that sponsors expected that existing good practices embodied in the models would be transferred to their organisations.
The three reasons scored in the middle range were: the need to generate
support and buy-in for process improvement among the technical staff and
management (MANAGEMENT, TECHSTAFF); customer demand to
improve process capability (DEMAND). Again, these are consistent with
the two basic reasons for performing assessments, namely to build support
for process improvement as well as to accurately measure organisational
capability.
No differences were found in these responses between small and large
organisations.
8.1.2 Process Improvement Cycle
The IDEAL(SM) model [14] (see Fig. 8.2) provides the overall framework for SPI. This model has been used in the past for successful improvements in small projects [13]. It consists of five phases:
I Initiating: to initiate the improvement program
D Diagnosing: to diagnose the current state of practice
E Establishing: to establish the plans for the improvement program
A Acting: to act on the plans and recommended improvements
L Leveraging: to leverage the lessons learned and the business results of the improvement effort
The Initiating phase establishes the business reasons for undertaking a
software process improvement effort. It identifies high-level concerns in
the organisation that can be the stimulus for addressing various aspects of
quality improvement. Communication of these concerns and business perspectives is required during the Initiating phase to gain visible executive
buy-in and sponsorship at this early stage of the improvement effort.
The Diagnosing phase is used to build a common understanding of the
current processes of the organisation, most especially the strengths and
weaknesses of the processes currently employed. It will also help identify
priorities for improving software processes. This diagnosis is based on a
software process assessment (see below).
Fig. 8.2. The SEI’s IDEAL Model for SPI (source [14]).
The Establishing phase finalises the strategy and supporting plans for
the software process improvement program. It sets the direction and guidance for the next three to five years, including strategic and tactical plans
for software process improvement.
The Acting phase takes action to effect changes in organisational systems that result in improvements to these systems. Improvements are made
in an orderly manner and in ways that will cause them to be sustained over
time. Techniques used to support and institutionalise change include defining software processes and measurements, pilot testing, and installing new
processes and measurements throughout the organisation.
The Leveraging phase completes the process improvement cycle. Lessons learned from the pilot projects and improvement efforts are documented and analysed to improve the process improvement program for the
future. The business needs determined at the beginning of the cycle are
revisited to see if they have been met. Sponsorship for the program is revisited and renewed for the next cycle of software process improvement.
8.1.3 Process Assessments
SPI efforts should begin with some form of software process assessment. A process assessment identifies the strengths and weaknesses of the overall practices of the target project(s). This is achieved through a series of interviews and document inspections (e.g. looking at project plans and test execution results) to determine how software is currently developed and maintained. Most process assessments rely on a model of best practices to drive the interviews and document inspections, such as the CMM for Software [16], the CMMI [1], and ISO/IEC 15504 [8].
Figure 8.3 depicts the context of process assessment, showing that process assessment provides the means of characterising the current process
capabilities of an organisation or project. Analysis of the assessment results is used to identify process strengths and weaknesses. For SPI, this
would ultimately lead to an improvement initiative, which identifies
changes to the processes to improve their capabilities. For capability determination, the assessment results identify whether or not the assessed
processes meet a target capability. If the processes do not match up to the
target capability, this may initiate an improvement effort. Our focus in this
chapter is on process assessment for improvement.
The performance of an assessment requires three different types of inputs:
• An assessment definition, which includes the identification of the assessment sponsor, the purpose and scope of the assessment, any relevant constraints, and the assessment responsibilities (e.g. who will be on the assessment team, and who will be interviewed).
• An assessment method that describes the activities that need to be performed during an assessment.
• An underlying best practice model, consisting of the definitions of the processes that will be assessed, the assessment criteria, and a scheme to produce quantitative ratings at the end of the assessment. (The quantitative ratings are useful for communicating the status of the process internally or externally; however, they are not necessary for a successful improvement program.)
Fig. 8.3. The context of process assessment [9]
In most projects the development team knows where the large problems
are. However, they may not have management support to do something
about it; they may not know where to start because there are so many problems; they may not know the best practices to implement to improve their
current situation. Process assessments address these three issues directly.
This is the recommended way to get started with SPI, with some exceptions mentioned below.
The assessment can be formal and conducted by a third-party according
to a well defined method; it can be informal and performed internally
within the organisation; it can be a mixture of the two.
Formal assessments have the advantage of being credible and are more
likely to have repeatable results. However, they also tend to be more expensive.
Internal assessments work well only if the internal assessor has extensive experience with SPI (e.g. from a job in a previous organisation and/or
extensive training and coaching). Then (s)he can act as a surrogate for
external assessors and would have sufficient background knowledge and
access to resources to perform a meaningful assessment.
In some cases management or other staff will not believe an internal assessment and hence will not act on recommendations. This will be partially
driven by the credibility of the internal assessor(s) and the existence of
conflicts between different groups. In such a case it would be better to hire
external consultants.
There are a number of process assessment methods that have been developed specifically for small projects and small organisations [2,12,18].
These reduce the scope of assessments to the processes that are believed to
be most relevant in small settings, and rely more upon interviews and less
on document inspections.
8.2 Implementation in Small Settings
8.2.1 Availability of Funds
Projects in small organisations are often characterised as lacking funds.
The same is true for small projects in large organisations where the project
is an independent cost centre. This makes it difficult to sanction heavy investment in process improvement. For example, there are cases where a large company invests $1m in SPI and obtains a seven-fold return on investment (ROI). Getting such returns is often contingent on making such a large investment. For a small project that can only invest, say, $10,000, the question is whether it too would get the same level of ROI. The most likely answer is no. Small projects and small organisations need to get credible returns with small investments.
The most realistic approach for SPI in small settings is incremental improvement, with each increment requiring a small investment. Each small
investment must show a value to the project otherwise it will be difficult to
sustain such an investment over time.
Many of the SPI challenges in small settings result from the availability
of limited funds.
8.2.2 Resources For Process Improvement
A typical starting point in an SPI effort is to set up a Software Engineering Process Group (SEPG) [11]. While there have been some examples in small projects where this has worked, for most projects there are insufficient resources to create a dedicated SEPG.
In practice, one or two developers or managers are assigned, on a part-time basis, to be responsible for process improvement. This may be difficult because their primary priority is to deliver the software. Therefore, unless the regular development workload is reduced to accommodate the additional responsibility, this approach may not work effectively.
Another model that has been used is for the individuals responsible for much of the SPI implementation work to be consultants external to the organisation. These consultants are hired for an extended period of time and assist the project in their SPI effort. This model raises two challenges. The first is that external consultants tend to be quite expensive. Second, unless the consultants are well integrated with the project and have management's strong support, they may not be able to effect meaningful change within the projects.
Government support has frequently alleviated the former challenge.
This reduces the financial burden on projects and makes the successful
implementation of SPI more likely.
8.2.3 Process Model
The ability to handle changing requirements is critical for software projects. Requirements change because the business processes that are being
automated change, and because users develop a better understanding of
their needs over time. An effective process model for these kinds of projects is an iterative one. This breaks up the delivery of functions into several small iterations. At the beginning of each iteration, the software requirements are prioritised and allocated to the subsequent iterations. The
users (or customers) are key participants in this prioritisation exercise.
The more successful iterative projects develop a system architecture at
the very beginning of the project. This requires an understanding of the
total functionality of the system that will be developed, and which parts of
the system need to be flexible to accommodate change.
In an iterative environment SPI should allocate improvements to each iteration. E.g. a small change to the process to improve testing practices is introduced in an iteration. After the iteration the team evaluates the change. If the evaluation is positive then the change is kept. If the experience is negative then additional changes are made for the next iteration. With such a mechanism it is possible to introduce incremental process improvements at a cost that the project can sustain. This is a very effective way to introduce change in small settings.
8.2.4 Training
Small projects often implement SPI without any formal training because of
limited funding. A study of the implementation of agile practices in small
projects found that none of the projects’ team members had received any
training or mentoring on agile methods [5]. Most of the information was
obtained from conferences, books and the Web.
The consequence of this approach is that it is fraught with trial and error. Most good practices need to be tailored to the specifics of the organisation. If the project team has a lot of patience for error, then this may
work. But the reality is that it takes time to converge to a solution that
works.
This is where government funding, technology transfer institutes and
universities can help. Such organisations can provide access to low-cost training for teams working on small projects, thereby reducing some of
the risks involved in an SPI effort.
8.2.5 Relevance of Practices in Assessment Models
Some of the practices in contemporary process assessment (best practice)
models would be hard to follow for small projects. E.g. it is frequently
required that an independent quality assurance (QA) group exist, and part
of the function of that group would be to ensure process compliance.
It is not always possible to have an independent group because of the
additional costs of the extra layers of management. A common approach is
to have the QA and development groups report to the same project manager. This is especially true in small settings.
Another requirement is for documentation. Because small projects have
small teams, informal communication can be very effective for exchanging
information. However, over-reliance on informal communication causes
problems when key members of the team leave. This results in key domain
and technical knowledge leaving the company. Therefore, a certain amount
of documentation is necessary, especially technical documentation about
the system (e.g. architecture documentation).
A good assessor with background in software engineering would be able
to modulate the requirements of the assessment model with the realities of
a small setting. This makes the selection of assessors a critical task for
small projects and small organisations.
8.2.6 Changing Behaviour
Some people work for small companies because such environments have
fewer standards and procedures to follow. These staff may resist SPI efforts since SPI is mainly about standardisation of practices to ensure repeatability.
There are two general approaches for dealing with such staff:
• To give staff key roles in the SPI initiative. This will help them get a better understanding of what the objectives are, and will help gain their support.
• To have senior management provide strong signals that the SPI effort is important for the organisation and for the projects. This will dilute attempts to derail or sabotage the SPI efforts (since it would be clear that the consequences of such actions would be much more severe).
8.2.7 Piloting Practices
It is important to pilot the necessary practices before their implementation.
This is particularly important for two reasons:
• Most practices need to be customised and it is not always possible to get the customisation correct the first time round. Having an experienced person (either internal or as a consultant) does help ensure that practices are properly customised, but one can never be sure until the practice is used.
• Sometimes there are implicit dependencies between different practices, and this may cause problems if not all of the practices are implemented at the same time or in an appropriate order. (For example, many practices in agile methodologies will not work well if refactoring is not performed regularly; therefore, not having refactoring in place at the same time can cause problems.)
It is often a good idea to document the practices that work to ensure that
there is an organisational memory. Documentation of practices should
follow a successful pilot to avoid having to continuously update the documentation and to ensure that the documented practices reflect what the
projects are actually doing. Process documentation is frequently prescriptive and does not match reality, which results in documentation with little added value. A stipulation that process documentation should be descriptive would avoid this.
8.2.8 Where To Start
One advantage of staged assessment models, such as the CMM for Software, is that they provide clear guidance as to the order in which practices
should be implemented. For small projects that are attempting to improve
their practices, there are typically many problems that need to be addressed
and it is not possible to tackle all of them at once. This is particularly true
when there are fiscal constraints. A staged model helps the SPI effort to
focus on a few key practices.
Some argue that every organisation is different and therefore one implementation order does not fit all. In practice organisations tend to follow
patterns of bad behaviour and there is a limited set of ways to remedy the
situation. E.g. low-maturity organisations first need to manage the change process, put basic project management in place, achieve repeatable releases, and obtain control over the way they make commitments to clients. Without this, it will be difficult to implement any other practices in a lasting way. This set of practices is the first step in most staged software engineering assessment models.
Even with staged models, the number of practices that an organisation
needs to focus on at any one time, even though it is a limited set, may still
seem overwhelming. Therefore, the number of practices employed may
need to be reduced further, taking into account the business objectives of
the organisation.
Recent research has identified specific practices that small projects and
small organisations find valuable. The eight practices that are the focus of
the RAPID assessment process are [2]:
• Requirements gathering
• Software development
• Project management
• Configuration management
• Quality assurance
• Problem resolution
• Risk management
• Process establishment
Another study tried to identify the CMMI (specific) practices that are
valued the most by small organisations [18]:
• Obtain an understanding of requirements (requirements management)
• Obtain commitment to requirements (requirements management)
• Identify configuration items (configuration management)
• Create or release baselines (configuration management)
• Estimate the scope of the project (project planning)
• Establish budget and schedule (project planning)
• Establish the project plan (project planning)
• Obtain plan commitment (project planning)
• Conduct progress reviews (project monitoring and control)
• Conduct milestone reviews (project monitoring and control)
• Analyze issues (project monitoring and control)
• Take corrective action (project monitoring and control)
One can also focus process improvement activities on this initial set of practices. It should be noted that the two lists converge on many of the same practices and are consistent with the recommendations of the staged models in terms of what to focus on first.
8.3 Conclusions
In general, small projects in both small and large organisations pursue process improvement for the same reasons. However, the models and methods employed need to be customised to the organisation's context, which, for small settings, includes reduced funds for investment in improvement and a rapidly changing business environment. In this chapter we reviewed some of the main issues specific to small settings, and provided pragmatic guidance for dealing with those issues, based on the literature and on our experience working in small settings.
References
1 Ahern D, Clouse A, Turner R (2003) CMMI Distilled: A Practical Introduction to Integrated Process Improvement. Addison-Wesley
2 Cater-Steel A (2004) Low-rigour, Rapid Software Process Assessments for Small Software Development Firms. In: Proceedings of the 2004 Australian Software Engineering Conference, Melbourne, pp 368–377
3 Dunnaway D, Goldenson D, Monarch I, White D (1998) How Well is CBA IPI Working? User Feedback. In: Proceedings of the 1998 Software Engineering Process Group Conference
4 Dyba T (2003) Factors of Software Process Improvement Success in Small and Large Organisations: An Empirical Study in the Scandinavian Context. In: Proceedings of the European Software Engineering Conference
5 El Emam K (2003) Finding Success in Small Software Projects. Cutter Consortium, 4(11)
6 El Emam K, Birk A (2000) Validating the ISO/IEC 15504 Measures of Software Development Process Capability. Journal of Systems and Software, 51:119–149
7 El Emam K, Birk A (2000) Validating the ISO/IEC 15504 Measures of Software Requirements Analysis Process Capability. IEEE Transactions on Software Engineering, 26:541–566
8 El Emam K, Drouin J-N, Melo W (1998) SPICE: The Theory and Practice of Software Process Improvement and Capability Determination. IEEE Computer Society Press
9 El Emam K, Goldenson D (2000) An Empirical Review of Software Process Assessments. Advances in Computers, 53:319–423
10 Fayad M, Laitinen M, Ward R (2000) Software Engineering in the Small. Communications of the ACM, 43:115–118
11 Fowler P, Rifkin S (1990) Software Engineering Process Group Guide. Software Engineering Institute, CMU/SEI-90-TR-24
12 Grunbacher P (1997) A Software Assessment Process for Small Software Enterprises. In: Proceedings of the 23rd EUROMICRO Conference, New Frontiers of Information Technology
13 Kautz K, Hansen H, Thaysen H (2000) Applying and Adjusting a Software Process Improvement Model in Practice: The Use of the IDEAL Model in a Small Software Enterprise. In: Proceedings of the International Conference on Software Engineering, June, pp 626–633
14 McFeeley B (1996) IDEAL: A User's Guide for Software Process Improvement. Software Engineering Institute, CMU/SEI-96-HB-001
15 Sanders M (1998) The SPIRE Handbook: Better, Faster, Cheaper Software Development in Small Organisations. European Commission
16 Software Engineering Institute (1995) The Capability Maturity Model: Guidelines for Improving the Software Process. Addison-Wesley
17 Varkoi T, Mäkinen T, Jaakkola H (1999) Process Improvement Priorities in Small Software Companies. In: Proceedings of PICMET '99
18 Wilkie FG, McFall D, McCaffery F (2005) An Evaluation of CMMI Process Areas for Small to Medium-sized Software Development Organisations. Software Process: Improvement and Practice, 10:189–201
Author’s Biography
Dr. El Emam is an Associate Professor at the University of Ottawa, Faculty of
Medicine, Canada Research Chair in Electronic Health Information at the University of Ottawa, and a Senior Scientist at the Children's Hospital of Eastern Ontario
Research Institute, where he is leading the eHealth research program. In addition,
Khaled is the Chief Scientist at TrialStat Corporation and a Senior Consultant with
Cutter Consortium's Agile Software Development & Project Management Practice. Previously Khaled was a senior research officer at the National Research
Council of Canada, where he was the technical lead of the Software Quality Laboratory, and prior to that he was head of the Quantitative Methods Group at the
Fraunhofer Institute for Experimental Software Engineering in Kaiserslautern,
Germany. In 2003 and 2004, Khaled was ranked as the top systems and software
engineering scholar worldwide by the Journal of Systems and Software based on
his research on measurement and quality evaluation and improvement, and ranked
second in 2002 and 2005. Currently, he is a visiting professor at the Center for
Global eHealth Innovation at the University of Toronto (University Health Network) and at the School of Business at Korea University in Seoul. He holds a
Ph.D. from the Department of Electrical and Electronics Engineering, King's College, at the University of London (UK).
9 Conceptual Modelling of Web Applications:
The OOWS Approach
Oscar Pastor, Joan Fons, Vicente Pelechano, Silvia Abrahão
Abstract: This chapter introduces a method that integrates navigational and presentational design into object-oriented conceptual modelling, and also provides systematic code generation. The essential expressiveness is provided using graphical schemas that specify navigation and presentation features using high-level abstraction primitives. Using conceptual schemas as input, a methodology is defined to systematically take a problem space to the solution space by defining a set of correspondences between conceptual modelling abstractions and the final software components. We also provide a case study that details the application of the proposed methodology.
Keywords: Web development, Conceptual model, Object-oriented model,
OOWS.
9.1 Introduction
The development of quality, reliable software applications based on their conceptual schema seems a never-ending challenge for the software engineering community. Nowadays, with the wide adoption of Model Driven Architecture (MDA), it is accepted more than ever that the right strategy is to start with a sound, precise and unambiguous description of an information system in the form of a Conceptual Schema (CS). This CS must be properly transformed into its corresponding software product by defining the mappings between conceptual primitives and software representations. The implementation of such mappings has driven the development of model compilers, and there are already interesting academic and industrial proposals for this [1,2].
The emerging Web engineering discipline [3] is making this challenge even bigger. Conventional methods have done an acceptable job of specifying static and dynamic aspects, structure and behaviour. But a Web application also requires consideration of other particular aspects that are not properly addressed by those conventional, mostly UML-based methods. Navigation and presentation become first-order citizens, and the conceptual modelling step must consider them accordingly. Conceptual modelling of Web applications has become a strong area of research that tries to provide methods and tools to overcome this problem, and an interesting set of proposals is emerging.
Basically, these approaches introduce new models and abstraction
mechanisms to capture the essentials of Web applications and to give support for the full development of a Web solution. Some representative efforts to introduce Web features into classical conceptual modelling approaches are OOHDM [4], WebML [5], UWE [6] and WSDM [7].
Our proposal provides a concrete contribution in this context. We introduce a conceptual-modelling-centred method that integrates navigational and presentational design with classical object-oriented (OO) conceptual modelling and provides systematic code generation. The essential expressiveness is introduced in graphical schemas in order to properly specify
navigation and presentation features, using high-level abstraction primitives. Taking CS as an input, a precise methodological guide is defined for
going systematically from the problem space to the solution space by defining a set of correspondences between the conceptual modelling abstractions and the final software components.
The work introduced in this chapter focuses on the extensions needed to enhance “classical” OO software production methods (in particular the OO-Method [1]) in order to define a systematic Web modelling method. It also discusses the high-level abstraction primitives that capture Web application features by extending the CS.
Last but not least, the Web CS can be used as the basic artefact for measuring the functional size of the future Web application. In this way, size measurement can be done at the earliest stages of the software production process. Considering that the CS is converted into a final application, this measurement provides the functional size of the final product from the CS.
The structure of this work is the following. Section 9.2 presents the
methodological approach to model Web applications. The conventional
models of the OO-Method are introduced, together with the extension
where two new models are defined: the navigational model, which captures the navigation semantics of a Web application, and the presentational
model, which specifies aspects related to user interfaces’ layout with a set
of basic patterns. In Sect. 9.3 the model transformation strategy to go from
the CS to the software product is briefly discussed. Section 9.4 puts all the
ideas into practice using a case study, dealing with the Web application for a Spanish soccer club.
9.2 A Method to Model Web Applications
OOWS (Object-Oriented Web Solutions) is the extension of the object-oriented software production method OO-Method [1] that introduces the required expressiveness to capture the navigational and presentational requirements of Web applications.
OOWS provides a full software development method for Web applications that defines a set of activities to be fulfilled to properly specify the
functional, navigational and presentational dimensions of Web applications’ requirements.
The proposed software production method comprises two major steps:
system specification and solution development. A full specification of a
system’s functional requirements is built in the system specification step. A
strategy oriented towards generating the software components of the solution (the final software product) is defined in the second step. This model
transformation strategy, from the system specification to the software solution, is graphically depicted in Fig. 9.1.
Fig. 9.1. Methodological approach
9.2.1 OO-Method Conceptual Modelling
OO-Method [1] is an OO software production method that provides model-based code generation capabilities and integrates formal specification techniques with conventional OO modelling notations.
In the “System Specification” step, a conceptual schema is built to represent an application’s requirements. The modelling tools that are used by
the method allow the specification of structural and functional requirements of dynamic applications by means of a set of models. Those models
are the following:
• A structural model that defines the system structure (its classes, operations and attributes) and relationships between classes (specialisation, association and aggregation) by means of a class diagram.
• A dynamic model that describes the different valid object-life sequences for each system class using State Transition Diagrams. In this model, object interactions (communications between objects) are also represented by sequence diagrams.
• A functional model that captures the semantics of state changes to define service effects using a textual formal specification [1].
As stated in [3], Web applications have additional properties that should
be modelled. We want to extend the OO-Method to deal with navigation
specification, user interface definition, and user categorisation and personalisation, in order to properly capture Web application requirements. The
following sections explain these extensions.
9.2.2 OOWS: Extending Conceptual Modelling to Web
Environments
The OOWS approach introduces three additional models (user, navigation
and presentation models) that allow developers to: (1) express the types of
users that can interact with the system and the sort of system visibility they
can have; (2) define the system’s navigational semantics; and (3) specify the
system’s presentational requirements. This section discusses the conceptual
modelling primitives that are introduced for building these three models.
User Identification and Categorisation
Before modelling the system’s navigation, the method provides a user
diagram (see Fig. 9.2) to express which kind of users can interact with the
system and what visibility they should have over class attributes and operations. This diagram provides mechanisms to properly cope with additional user management capabilities, such as user specialisation, which
allows for the definition of user taxonomies to improve navigational specification reuse [8].
Fig. 9.2. User diagram
There are three types of users, determined by how they connect to the
system:
• Anonymous users (depicted with a ‘?’ in the head): users who do not need to provide information about their identity.
• Registered users (depicted with a lock in the head): users who need to be identified to connect to the system. They must provide their user type.
• Generic users (depicted with a cross in the head): users who cannot connect to the system.
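Although OOWS prescribes no particular implementation for the user diagram, the following minimal sketch (plain Python, with illustrative names only) shows one way to represent user types, their connection kind and the specialisation mechanism through which a user type inherits its parent's navigational specification:

from dataclasses import dataclass, field

@dataclass
class UserType:
    name: str
    kind: str                  # "anonymous" ('?'), "registered" (lock) or "generic" (cross)
    parent: "UserType" = None  # specialisation: inherit the parent's navigational map [8]
    own_contexts: set = field(default_factory=set)

    def reachable_contexts(self):
        # A specialised user type reuses its parent's navigational specification.
        inherited = self.parent.reachable_contexts() if self.parent else set()
        return inherited | self.own_contexts

anonymous = UserType("Anonymous", "anonymous", own_contexts={"Last News", "Teams"})
registered = UserType("RegisteredUser", "registered", parent=anonymous, own_contexts={"Bets"})
print(registered.reachable_contexts())   # the Anonymous contexts plus "Bets"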
Representing Navigation
Once users have been identified, a structured and organised system view,
for each user type, must be specified. These views are defined over the
class diagram (structure), in terms of the visibility of class attributes, operations and relationships. Navigation specifications are captured in two
steps: the “Authoring-in-the-large” (global view) and the “Authoring-in-the-small” (detailed view).
The “Authoring-in-the-large” step refers to the specification and design
of global and structural aspects of the Web application. It is achieved by
defining a set of system user abstract interaction units and how the user
can navigate from one to another. These requirements are specified in a
navigational map that provides the system view and accessibility that each
kind of user will have. It is represented using a directed graph whose nodes
are navigational contexts or navigational subsystems (defined below) and
arcs denote navigational links or valid navigational paths (see Fig. 9.3).
Fig. 9.3. Navigational map and navigational subsystem
Navigational contexts (graphically represented as UML packages
stereotyped with the «context» keyword) represent the user interaction
units that provide a set of cohesive data and operations to perform certain
activities. Depending on the context reachability, there are contexts of two
types:
• Exploration navigational contexts (depicted with the “E” label) represent nodes that are reachable from any node (similar to the Landmark pattern in the hypermedia community). These contexts define implicit navigational links starting from any node and ending at themselves. These links, named exploration links, use dashed arrows and are explicitly represented from the root of the map, represented by the user (see Fig. 9.3), to the exploration context. One exploration context can be marked as default or home by adding an “H” label to its exploration link. This home context will be accessed automatically when the user connects to the system.
• Sequence navigational contexts (depicted with the “S” label) can only be accessed via a predefined navigational path, by selecting a sequence link (defined below).
The navigational links (navigational map arcs) represent context reachabilities or navigational paths. There are two types of navigational links:
• Sequence links or contextual links (represented with solid arrows) define a semantic navigation between contexts. Selecting a sequence link implies carrying contextual information to the target context (the object that has been selected, the source navigational context, etc.).
• Exploration links or non-contextual links (represented with dashed arrows) represent an intentional change of task by the user. When an exploration link is crossed, no contextual information is carried to the target context.
In order to cope with complex navigational models, the navigational
map is structured using navigational subsystems. A navigational subsystem
is a primitive that allows us to define a sub-graph within the full graph
(hypergraph). Recursively, the content of a subsystem is a graph defined
by a navigational map (see the right-side of Fig. 9.3).
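As an illustration of these primitives, the following sketch (plain Python with assumed names; the method itself only prescribes the graphical notation) represents a navigational map as a directed graph of exploration (“E”) and sequence (“S”) contexts, connected by exploration and sequence links, with subsystems as recursive sub-maps:

from dataclasses import dataclass, field

@dataclass
class Context:
    name: str
    kind: str              # "E" (exploration) or "S" (sequence)
    home: bool = False     # "H": accessed automatically when the user connects

@dataclass
class Link:
    source: str
    target: str
    contextual: bool       # True: sequence link (carries contextual information)

@dataclass
class NavigationalMap:
    contexts: dict = field(default_factory=dict)
    links: list = field(default_factory=list)
    subsystems: dict = field(default_factory=dict)   # recursive sub-graphs

    def add_context(self, ctx):
        self.contexts[ctx.name] = ctx
        if ctx.kind == "E":
            # Exploration contexts are implicitly linked from the root of the map.
            self.links.append(Link("root", ctx.name, contextual=False))

m = NavigationalMap()
m.add_context(Context("Last News", "E", home=True))
m.add_context(Context("News Details", "S"))
m.links.append(Link("Last News", "News Details", contextual=True))  # sequence link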
The “Authoring-in-the-small” step refers to the detailed specification of
the contents of the nodes (navigational contexts). To specify this content,
each navigational context comprises a set of abstract information units
(AIUs). An AIU represents a requirement for retrieving specific information. Contextual AIUs (labelled with a circled C) are instantiated when the system arrives at that context by following a sequence link. Non-contextual AIUs (labelled with a circled NC) do not depend on sequence links.
AIUs comprise navigational classes that represent class views (stereotyped with the «view» keyword) over class diagram classes. These classes
contain the visible attributes and executable operations that will be available for the user in this context.
Each AIU has one mandatory navigational class, called manager class,
and optional navigational classes, called complementary classes, to provide complementary information about the manager class.
Fig. 9.4. Navigational context
A service link can also be attached to a service; it represents the target navigational context that the user will reach after that service's execution. Figure 9.4 shows a service link related to the operation1() of the ManagerClass: after the execution of the operation1() operation, the system must automatically navigate to the navigational context specified in [ target context ].
In addition, a population condition filter can be specified for any navigational class. This condition defines an object retrieval condition that must be satisfied, and is described by means of an Object Constraint Language (OCL) formula.
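For instance, a population condition restricting a news view to breaking news only could be written in OCL as self.last_hour = true (the last_hour attribute is borrowed from the case study of Sect. 9.4; the formula is merely illustrative).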
All navigational classes must be related by unidirectional binary relationships, called navigational relationships. They are defined over existing aggregation/association/composition or specialisation/generalisation relationships, and represent the retrieval of related instances. When more than one structural relationship exists between two classes, the role name of the relationship must be specified (depicted as /role-attribute/) to avoid ambiguities. Two types of navigational relationships can be defined, depending on whether or not they define a navigation capability:
1. A context dependency relationship (graphically represented using
dashed arrows) represents basic information recovery by crossing a
structural relationship between classes. When a context dependency relationship is defined, all the related object instances to the origin class
object are retrieved.
2. A context relationship (graphically represented using solid arrows)
represents the same information recovery as the context dependency
relationship, in addition to navigation capability to a target navigational context, creating a sequence link in the navigational map. Context relationships have the following properties:
− A context attribute that indicates the target context of the navigation (depicted as [ target context ]).
− A link attribute that specifies the attribute (usually an attribute of
the target navigational class) used as the “anchor” to activate the
navigation to the target context.
These primitives comprise the core elements for navigational specifications. However, the specification of the navigational semantics can be enriched by introducing mechanisms to help a user explore and filter the
large amount of information inside a context. The next section presents
how to introduce advanced navigational features to the OOWS navigational model.
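To fix ideas, the sketch below (plain Python with assumed names, not the OOWS metamodel itself) captures the contents of a navigational context as just described: AIUs with a mandatory manager class, optional complementary classes, and navigational relationships that may or may not define a navigation capability:

from dataclasses import dataclass, field

@dataclass
class NavigationalClass:                 # a «view» over a class-diagram class
    base_class: str
    attributes: list                     # visible attributes
    operations: list = field(default_factory=list)   # executable operations
    population_condition: str = None     # optional OCL filter

@dataclass
class NavigationalRelationship:          # unidirectional, over a structural relationship
    source: str
    target: str
    role: str = None                     # /role-attribute/, to avoid ambiguities
    target_context: str = None           # set => context relationship (creates a sequence link)
    link_attribute: str = None           # the "anchor" that activates the navigation

@dataclass
class AIU:                               # abstract information unit
    name: str
    contextual: bool                     # C vs. NC
    manager: NavigationalClass
    complementary: list = field(default_factory=list)
    relationships: list = field(default_factory=list)

# Example loosely based on the Bet AIU of Sect. 9.4:
bet_aiu = AIU("Bet", contextual=True,
              manager=NavigationalClass("Match", ["date", "time"]),
              relationships=[NavigationalRelationship(
                  "Match", "WorkingDay",
                  target_context="Calendar", link_attribute="number")])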
Advanced Navigational Features
Navigational contexts retrieve the population of the CS classes. We define the cardinality of a navigational context as the number of instances it should retrieve. Sometimes the retrieved information is too large to manage, so mechanisms for browsing and filtering that information within a navigational context are needed. There are two main search mechanisms: indexes and filters.
Both are described at the bottom of each abstract information unit, under a
dashed line.
An index is a structure that provides an indexed access to the manager
class population. Indexes create a list of summarised information by using
an attribute or a set of attributes. If the indexed property belongs to the
manager class, it should be defined as an attribute index. If the indexed
property belongs to any complementary class, the index should be defined
as a relationship index, and the relationship must be specified. When an index is activated, a list of all possible values for the indexed attribute(s) is created. By choosing one of these values, all objects with the same value are shown in a search view. This search view describes the information available to users to help them select an instance, which then becomes active in the navigational context.
A filter defines a population condition that restricts the object instances to be retrieved. Filters are applied to attributes of the manager class (attribute filters) or to attributes of complementary classes (relationship filters). There are three types of filters:
• Exact filters, which take one attribute value and return all matching instances.
• Approximate filters, which take one attribute value and return all the instances whose attribute values include this value as a sub-string.
• Range filters, which take two values (a maximum and a minimum) and return all the instances whose attribute values fit within the range. If only one value is specified, the range is bounded on one side only.
Optionally, it is possible to define a static population condition to specify predefined filtering conditions (for instance, “retrieving all books that
are best-sellers”). When a filter is activated, the instances that fulfil the
condition become visible within the search view. This search view behaves
as an index.
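A minimal sketch of the three filter types, assuming instances are represented as Python dictionaries (a generated implementation would of course query the persistence tier instead):

def exact_filter(instances, attr, value):
    # Return all instances whose attribute equals the given value.
    return [o for o in instances if o[attr] == value]

def approximate_filter(instances, attr, value):
    # Return all instances whose attribute contains the value as a sub-string.
    return [o for o in instances if str(value).lower() in str(o[attr]).lower()]

def range_filter(instances, attr, low=None, high=None):
    # Return instances whose attribute fits in [low, high]; one bound may be omitted.
    return [o for o in instances
            if (low is None or o[attr] >= low) and (high is None or o[attr] <= high)]

teams = [{"name": "Valencia CF"}, {"name": "Deportivo"}]
print(approximate_filter(teams, "name", "val"))   # [{'name': 'Valencia CF'}]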
Presentational Modelling
Once the navigational model is built, we must specify presentational requirements of Web applications using a presentation model (see Fig. 9.1).
This model is strongly based on the navigational model and it uses its
navigational contexts (system–user interaction units) to define the presentation properties.
Presentation requirements are specified by means of patterns that are associated to the primitives of the navigational context (navigational classes,
navigational links, searching mechanisms, etc.). The basic presentation
patterns are as follows:
Information paging. This pattern allows us to define information “scrolling”. All the instances are “broken” into “logical blocks”, so that only one
block is visible at a time. Mechanisms to move forwards or backwards are
provided. This pattern can be applied to the manager class, navigational
relationship, index or filter. The required information is:
• Cardinality, which represents the number of instances that form a block.
• Access mode. Sequential access provides mechanisms to go to the next, previous, first and last logical blocks. Random access mode allows the user to go directly to a desired block.
• Circularity. When this property is active, the set of blocks behaves as a circular buffer.
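The paging semantics can be made concrete with a small sketch (illustrative only): instances are split into logical blocks of cardinality items, with sequential and random access and optional circular behaviour:

class Pager:
    def __init__(self, instances, cardinality, circular=False):
        # Break all instances into logical blocks of `cardinality` items.
        self.blocks = [instances[i:i + cardinality]
                       for i in range(0, len(instances), cardinality)]
        self.circular = circular
        self.current = 0

    def block(self):
        return self.blocks[self.current]        # only one block is visible at a time

    def next(self):                             # sequential access
        if self.current + 1 < len(self.blocks):
            self.current += 1
        elif self.circular:                     # circularity: behave as a circular buffer
            self.current = 0
        return self.block()

    def goto(self, index):                      # random access
        if 0 <= index < len(self.blocks):
            self.current = index
        return self.block()

p = Pager(list(range(20)), cardinality=6, circular=True)
p.goto(3); p.next()                             # wraps back to the first block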
Ordering criteria. This pattern defines a class population order (ASCendant or DESCendant) according to the value of one or more attributes. It
can be applied to navigational classes, to specify how retrieved instances
will be sorted, or to access structures and search mechanisms, to sort the
results.
Information layout. Four basic layout patterns are provided: register,
tabular, master-detail (with a presentation pattern for the detail) and tree.
They can be applied to the manager class or to a navigation relationship.
These presentation patterns, in addition to the specified navigation features, capture the essential requirements for the construction of Web interfaces. More specialised presentation patterns can be introduced at the
modelling stage to “beautify” the final Web user interface.
9.3 A Strategy to Develop the Web Solution
OOWS follows the OO-Method strategy for systematically moving from
the problem space to the solution space (Fig. 9.1). Although this chapter is
not focused on exploiting this, we introduce the main ideas that will guide
the reification of OOWS conceptual schemas into a software product.
A three-tier architectural style has been selected to generate final applications: a presentation tier, an application tier and a persistence tier.
The information (persistence tier) and functionality (application tier) of the Web application are generated by the OlivaNova Model Transformation Engines [2], taking the OO-Method structural and behavioural models as a basis. This tool provides an operational, MDA-compliant framework in which a model compiler transforms a CS into its corresponding software product.
Taking into account the new features introduced in the OOWS-enhanced Web schema, the generation process is enriched with a new translation process that systematically generates the presentation tier of Web applications. A brief overview of this process is presented below.
Starting from the navigational and presentational models, a group of
connected Web pages, for each type of user, can be obtained in a systematic way. These Web pages define the Web application’s user interface
(presentation tier) for navigating, visualising the data and accessing the
application’s functionality.
A Web page, created for each navigational context in the navigational
map, is responsible for retrieving the specified information in its AIUs.
This strategy divides Web pages into two logical areas:
• The information area, which presents the specific system view defined by a context. The presentation model specification is applied to obtain the layout of this area in the following way. All AIUs are placed as part of the Web page. The instances of the manager class are shown as their layout pattern determines, applying (if defined) the ordering criteria and the information paging. The instances of navigational classes related by a navigational relationship follow the same strategy.
• The navigation area, which provides navigation meta-information to the user, in order to improve quality (usability) aspects of the final application. The meta-information is as follows:
− Where the user is. States which Web page (context) is currently being shown to the user.
− How the user reached here. Shows the navigational path that has
been followed to reach that page.
− Where the user can go to. Shows a link to any exploration context.
− Which filters and index mechanisms can be used by the user.
− Applicational links. Provides additional links to navigate to the
home page, to log into the system, etc. (e.g. login, logout, home).
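The sketch below (a hypothetical helper, not something produced by the translation process itself) shows how such meta-information could be assembled for a page:

def navigation_area(current, path, exploration_contexts):
    # Assemble the navigation meta-information for the page being shown.
    return {
        "where_am_i": current,                           # context currently shown
        "how_i_got_here": " > ".join(path + [current]),  # navigational path followed
        "where_i_can_go": sorted(exploration_contexts),  # links to exploration contexts
        "application_links": ["home", "login", "logout"],
    }

print(navigation_area("Bets", ["Competitions", "Matches"],
                      ["Last News", "The Club", "Competitions"]))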
Detailed information on how to implement a Web interface using the
navigational and presentation models is described in [9].
9.4 Case Study: Valencia CF Web Application
The Valencia CF Web Application2 was developed to publicise information about the competitions in which the Valencia CF Football Team takes
part, its matches, opponent teams, players, line-ups, members and supporters, partnerships, etc. The main functionality comprises a shopping area,
tickets and season bonus tickets selling, and betting for a particular football match.
This section describes the conceptual model that led to this implementation, based on the OO-Method approach and focusing on the OOWS navigational properties.
Due to the application’s size, it is not possible to present the entire modelling in detail. Thus, we have selected a subset, which is detailed in this
section. The subset represents the functionality related to making a bet for
matches in which the Valencia CF team is going to play. A registered user should be able to bet on any match of any competition in which the Valencia CF Football Team plays. To aid in making bets, the system must provide
registered users with statistics on each team, previous results etc. Using
this information, a registered user can make a bet by predicting the final
score. At any time, users should be able to see the results of previous bets,
their betting cash, and to modify their proposed final score for forthcoming
matches.
9.4.1 Valencia CF Web Conceptual Model
Following the OO-Method/OOWS approach, the first step is to describe
the structural and behavioural aspects of the Web application. These are to
be gathered by means of a class diagram, state transition diagrams and a
functional model (see Sect. 9.4.1.1).
Section 9.4.1.2 describes the navigational properties of the Valencia CF
Web Application, by means of a User Diagram, describing the different
user types and corresponding Navigational Models describing the accessibility through the system.
Finally, Sect. 9.4.1.3 introduces abstract presentation requirements related to the specified navigational model.
Valencia CF OO-Method Conceptual Model
The first step to build an OO-Method conceptual model is to describe its
structural model (by means of a class diagram), and its behavioural model
(using a dynamic and functional model). According to the main objectives
of the Valencia CF Web application, the structural model must capture
information about competitions where the Valencia CF Football Team
plays, its matches, teams, players, tickets, partnerships, etc. The main functionality involves a shopping area, the sale of tickets and season bonus tickets, and betting on particular matches. Figure 9.5 presents the class diagram,
containing close to 50 classes and 60 relationships.
This figure emphasises the portion related to the betting information.
A RegisteredUser can make a Bet for a Match between two Teams by
specifying her/his predicted localScore and visitorScore. The betting
amount must always be less than the betting cash. To fulfil this requirement, a do_a_bet() operation is created at the Bet class, and also an agent
relationship between the RegisteredUser and the do_a_bet() constructor
operation is established to specify that this type of user is able to execute
this operation. This operation needs a precondition to avoid invalid bets:
do_a_bet(p_localScore, p_visitorScore, p_amount) if
p_localScore >= 0 AND p_visitorScore >= 0 AND Match.closedForBetting =
FALSE AND p_amount < RegisteredUser.cash
Following the same criteria, the change_a_bet() operation at the Bet
class has a similar precondition.
Fig. 9.5. Class diagram of the Valencia CF Web application
Each class of the class diagram has its own state transition diagram
(STD) to specify valid sequences. Figure 9.6 describes the STD for the Bet
class. Transitions are labelled with the agent:operation notation, specifying which agent class is allowed to call the operation.
Fig. 9.6. STD for the Bet class
Valuation rules must be specified to capture the semantics of the state changes caused by events. These rules are specified within the functional model and use a notation based on the OASIS formal language (http://www.oasis-open.org/specs/index.php). These rules follow the syntax: precondition [ event() ] postcondition.
The following are the valuation rules for the Bet class:
[ do_a_bet(p_localScore, p_visitorScore, p_amount) ]
localScore = p_localScore AND visitorScore = p_visitorScore AND amount = p_amount AND status = “undefined”
[ change_a_bet(p_localScore, p_visitorScore, p_amount) ]
localScore = p_localScore AND visitorScore = p_visitorScore AND amount = p_amount
[ win() ] status = “won”
[ lose() ] status = “lost”
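Read operationally, the precondition and valuation rules amount to the following behaviour, sketched here in Python with assumed class interfaces (match and user are plain objects exposing closedForBetting and cash):

class PreconditionViolated(Exception):
    pass

class Bet:
    def __init__(self, match, user):
        self.match, self.user = match, user
        self.localScore = self.visitorScore = self.amount = None
        self.status = None

    def do_a_bet(self, p_localScore, p_visitorScore, p_amount):
        # Precondition: non-negative scores, match open for betting, enough cash.
        if not (p_localScore >= 0 and p_visitorScore >= 0
                and not self.match.closedForBetting
                and p_amount < self.user.cash):
            raise PreconditionViolated("do_a_bet")
        # Valuation rule: the state change caused by the event.
        self.localScore, self.visitorScore = p_localScore, p_visitorScore
        self.amount = p_amount
        self.status = "undefined"

    def win(self):   self.status = "won"     # [ win() ] status = "won"
    def lose(self):  self.status = "lost"    # [ lose() ] status = "lost"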
To complete the functional description of making a bet, the following
behaviour should be specified: (1) an Administrator user can close a match
for betting, thus avoiding new bets being created; (2) after introducing the
final score of a match, the system must identify successful and unsuccessful bets.
To fulfil the first requirement, a closedForBetting attribute and a closeForBetting() operation were added to the Match class. The default for the
closedForBetting attribute is set to “FALSE”, to enable betting when a
Match is created.
The Match.closeForBetting() operation must change the value of the
closedForBetting attribute. This requirement is described by means of a
valuation formula in the Match class:
[ closeForBetting() ] closedForBetting = TRUE
Finally, we need an operation that sets a Match's final score and determines the successful and unsuccessful bets. This is not an atomic operation, so we must define a transaction within the Match class, as follows:
introduceResult(p_localScore, p_visitorScore) {
FOR ALL <Bet>
WHERE Bet.localScore = p_localScore AND Bet.visitorScore = p_visitorScore
DO Bet.win()
.
FOR ALL <Bet>
WHERE Bet.localScore <> p_localScore OR Bet.visitorScore <> p_visitorScore
DO Bet.lose()
}
Here <Bet> ranges over the bets that belong to the Match in which the transaction is being executed.
To complete the functional description of this event, a valuation rule must be specified for this operation within the Match class, to establish the values of the final scores:
[ introduceResult(p_localScore, p_visitorScore) ]
localScore = p_localScore AND visitorScore = p_visitorScore
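Operationally, the transaction and its valuation rule correspond to the following sketch (assumed Python interfaces, with bets holding the Bet objects that belong to the match):

class Match:
    def __init__(self):
        self.closedForBetting = False       # default: betting enabled on creation
        self.localScore = self.visitorScore = None
        self.bets = []                      # the Bets that belong to this Match

    def closeForBetting(self):
        # [ closeForBetting() ] closedForBetting = TRUE
        self.closedForBetting = True

    def introduceResult(self, p_localScore, p_visitorScore):
        # Valuation: localScore = p_localScore AND visitorScore = p_visitorScore
        self.localScore, self.visitorScore = p_localScore, p_visitorScore
        for bet in self.bets:               # FOR ALL <Bet> of this Match
            if (bet.localScore, bet.visitorScore) == (p_localScore, p_visitorScore):
                bet.win()                   # the final score was predicted correctly
            else:
                bet.lose()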
Valencia CF Navigational Model
Once the structural and functional requirements have been determined,
the next step is to specify the navigational capabilities through the system. Following the OOWS approach, the following diagrams must be
specified:
(1) a user diagram, describing the different types of users able to use the
application; (2) a navigational map for each user, describing her/his accessibility and visibility while navigating the system; and (3) a presentation
model, describing presentation requirements for the final Web interfaces.
There are different user types that can interact with the system. Anonymous users can explore public information, such as matches, competitions,
last results and teams. RegisteredUsers can use the shopping area, make a
bet on the Valencia CF matches, and buy tickets via the Internet. Sympathizer users have bonus season tickets, discounts at the shopping area and
special prices for the matches. Finally, the Administrator user manages the
system. Figure 9.7 shows the user diagram for the system, specifying the
four user types. The Anonymous user type is labelled with a “?” because it
does not need identification to access the system. The other three user
types are specialised from the Anonymous to inherit the navigational map
[8]. They are labelled with a “lock” as they need to be identified to enter
the system. Each of these user types is directly related to its corresponding class in the class diagram.
Fig. 9.7. Valencia CF user diagram
A navigational map is defined for each user type. This navigational map
defines the user accessibility within the system. Figure 9.8 presents the
navigational map for the RegisteredUser user type. Due to the large number of navigational contexts that belong to this user type, its navigational
map was organised as seven first-level (exploration) navigational subsystems, and nine first-level (exploration) navigational contexts. “The Club”
navigational subsystem provides several navigational contexts to show
different types of information on the Valencia CF Football Team (e.g. history, managers, best players); the “Competitions” navigational subsystem
provides information on the matches within the competitions where the
team participates. This subsystem also allows RegisteredUsers to make a bet on specific matches. All navigational nodes are exploration nodes, i.e. always accessible to this user type.
The “Last News” navigational context presents the latest and important
news about the team and special events (see Fig. 9.9). It was tagged with
an “H” to show that it is the default (home) context used when the user
logs into the system. This navigational context has ten AIUs. Nine of those
AIUs refer to advertising (e.g. Web Services, The Team 2004–2005,… ,
The Shop), and appear in every navigational context for this user. The
main AIUs are: Last Hour AIU, which provides the latest news about the
team, and Last News AIU, which provides the most recent news. Both
AIUs comprise one News navigational class that presents a view over the
class diagram’s News class. The Last Hour AIU retrieves the news
Fig. 9.8. RegisteredUser navigational map
date_time, headline and content. This AIU also has a population filter,
within the News navigational class, to describe that only the news with
attribute last_hour set to TRUE must be shown. The Last News AIU provides the news date_time and headline.
Each AIU has a contextual navigational relationship that uses the content or the headline attribute as the anchor to the News Details context
inside the News subsystem. Figure 9.13 below presents the actual Web
page that implements this context, where it is possible to observe all AIUs.
The context relationship is implemented as a link on each news item, pointing to the News Details Web page.
Fig. 9.9. Last News navigational context
To make a bet, a RegisteredUser must go to the Competitions subsystem
(see the navigational map). Figure 9.10 shows the Competitions navigational subsystem specification comprising two other subsystems: Next
Match and Matches. The former provides information related to the next
match the Valencia CF Football Team is scheduled to play. The latter provides information about future (Calendar), present (Live Match Report)
and past (Live Match Historic) matches, and also statistics, results (Results), current classification (Classification) and active bets (Bets and Your
Bets).
Fig. 9.10. Competitions and Competitions.Matches subsystems
Once inside the Competitions subsystem, a user must reach the Matches
subsystem. From there, the user can navigate to the Bets navigational context to make a bet for a specific game. This context must provide the user
with facilities to explore matches, teams and competitions’ working days.
Figure 9.10 presents the specification of this navigational context, comprising several AIUs. Marketing-oriented AIUs (at the bottom of the context) remain the same throughout the system (see Fig. 9.9).
The Bets navigational context has a main AIU (see Fig. 9.11), named
Bet, which shows information related to matches. Only matches where the
closedForBetting attribute is set to FALSE are shown. The Bet AIU comprises six navigational classes (stereotyped with the «view» keyword): the
manager class, Match, provides date and time (navigational attributes).
From Match, complementary information is provided by means of different complementary navigational classes and relationships. First, it specifies
the WorkingDay in which the match takes place and the Competition name.
Second, it specifies the name and the emblem of both local and visitor
Teams by labelling the relationship using its role attribute (see class diagram in Figure 9.5). The numbered brackets are used to differentiate between navigational classes. Finally, the do_a_bet() operation is provided to
the RegisteredUser, within the Bet complementary class, triggering the behaviour defined in the behavioural models.
Two context relationships (solid arrows) have been defined within this
context. They allow users to navigate from this context to another by selecting a piece of information, from a source context, that is further detailed in the destination context. Within the Match–Bet context relationship, selecting the number (link attribute) of a WorkingDay, takes a user to
the Calendar (context attribute) navigational context, which shows the
matches within this selected WorkingDay. As no link attribute is specified for the Match–Match context relationship, a “Match Report” text and link are used to allow the RegisteredUser to navigate to the Match Report navigational context for the selected Match and get the report details for that selected match.
The Bets navigational context has been defined as a contextual AIU (labelled with a circled C) because it is possible to navigate from the Calendar navigational context to this context by selecting a Match (see the navigational map related to this Competitions.Matches subsystem in Fig. 9.10).
In this case, the Bet AIU will be instantiated to the selected match and will
only provide information about that selected match.
The specification of advanced navigational features to improve navigability inside this navigational context is presented below the dashed lines.
These mechanisms allow searching for a specific match: the context has one index and two filters.
All three search mechanisms share the same Search view structure, which
is composed of information to aid the user search for a desired match. The
search view shows the Match date, the WorkingDay number, the Competition’s name, and the name of both teams that will play.
The index is defined as a “Relationship Index” because the indexing
property belongs to a complementary class (WorkingDay), not to the manager class (Match). When this index is activated, a list of WorkingDay
numbers and Competitions’ names is shown. When the user selects a WorkingDay number (link attribute), all the information about the matches belonging to this WorkingDay number, specified in the search view, will be
shown. Finally, by selecting a Match date, all the information specified in
the Bet AIU will be visible, allowing for a bet to be made (see Fig. 9.14
below).
The other two filters let a user search for a team on which (s)he wants to
make a bet. A user can search for a team that plays as “local” in the match,
or a team that plays as “visitor”. In this case, we want the user to introduce
the (partial) name of the required team. As Team name belongs to a class
that is not the manager class in this context, these filters have been defined
as “Relationship Filters”.
Fig. 9.11. Bets navigational context
The LocalFlt Filter has been defined over the Match–Team relationship
(see class diagram), navigating through the Local role. Using Team name
as the filter attribute and filter type being “approximated” (see filter definition in Fig. 9.11), this filter allows the RegisteredUser to search for local
teams with a name similar to the one (s)he provided. This filter retrieves
all the information specified by the search view and behaves as described
for previous indexes. The other filter will do the same, but using the Visitor role of the Match–Team relationship.
Valencia CF Presentation Model
Once the navigational model has been built, we specify the presentational requirements using the presentation model.
The Last News navigational context is responsible for retrieving the last
hour’s news and the latest news. The presentation requirements for this
context are the following: each last-hour news item must be shown according to the “Register” pattern; Last News must be presented in a table, in groups of six elements, sorted by decreasing date_time, and showing only the six most recent news items.
Fig. 9.12. Last News presentation context
Figure 9.12 shows the presentation context specified for the Last News
navigational context. Presentation requirements have been implemented, as
shown in Fig. 9.13.
9.4.2 Implemented Valencia CF Web Application
This section presents the implemented Valencia CF Web application
graphical interface and the direct relationship with its conceptual model,
described in the previous sections. Each navigational context described in
the navigational map will be converted into a Web page. Links between
pages are defined by the context relationships defined within each navigational context.
Within a Web page, an important issue is how to distribute the different
AIUs. This layout distribution can easily be included at the presentation
modelling stage.
When a RegisteredUser connects to the Valencia CF Web application using the
login functionality in the upper right of the Web page (see Fig. 9.13), (s)he
will automatically navigate to the Last News Web page. This Web page is
Fig. 9.13. Implemented Web page of Last News (home) navigational context
obtained from the Last News navigational context, placed in the navigational map (Fig. 9.8) as the Home context, and described in Fig. 9.9.
This Web page comprises a navigational area (including a navigational
menu with all the exploration contexts of the navigational map) and an
information area in which all the AIUs specified in the navigational context
are placed. The Last Hour AIU provides the date, time, headline and content
about the news that has the last_hour attribute set to true, presented according to the “Register” pattern. The Last News AIU provides the date, time
and headline for the six latest news items. The pagination cardinality is set
to six and the ordering criterion is date_time descending (see Fig. 9.12).
If the RegisteredUser wants to make a bet, (s)he must follow the Competitions link (placed at the left of the Web page, in the navigational menu),
and then the Matches link. Those links come from the navigational structure
induced by the Competitions and Matches subsystems. The navigational
menu of Fig. 9.14 shows the implementation of this subsystem’s concept.
Fig. 9.14. Implemented Web page of Bets navigational context
Once inside this Competitions.Matches subsystem, the RegisteredUser
can get to the Bets Web page by clicking on the Bets link in the navigational menu. When the user selects a match using the index, or one of the
filters, the selected match is displayed. For instance, if we select the match
between Vilareal FC (as local) and Valencia CF (as visitor) within the 39th
working day of the Spanish Football League, the Bets Web page will provide the information shown in Fig. 9.14, according to the navigational
specification in Fig. 9.11.
When the user tries to make a bet (specifying a local and a visitor
score), assuming that it is possible to achieve this functionality according
to the behaviour specification of the dynamic model (see Sect. 9.4.1.1), the
system will update its state and then navigate to the Your Bets Web page,
as specified in the service link attached to the do_a_bet() operation, within
the Bet AIU of the Bets context.
As can also be seen in Fig. 9.11, two navigational links appear within
this Web page: a link that allows the user to navigate to the Calendar by
selecting a WorkingDay (see the 39th Working Day link), and a link that
allows the user to navigate to the Match Report Web page, by clicking on
the arrow beside the Match Report text. Both links were specified in the
Bets context as context relationships.
9.5 Conclusions
Model-driven architectures are, in our view, the right approach for building Web applications. This is the main conclusion of this work. We have
shown that a conventional conceptual modelling approach (conventional here meaning one that specifies system structure and behaviour) can be extended with navigational and presentation modelling views, thus properly integrating all the involved models.
The required conceptual primitives have been presented, their graphical
representation has been introduced, and how to go from the CS to the corresponding final Web application has been studied. The approach has been
explained by introducing a case study, where all the relevant aspects of the
method have been considered.
This way of working opens the door to the implementation of real Web
model compilers. Having identified the set of mappings between conceptual primitives and their corresponding software representations, the implementation of those mappings will be the core of such a model compiler. This is one of the major contributions of the OOWS approach when compared to others. Its close link with the MDA-based OlivaNova Model Transformation Engines [2] allows us to provide a conceptual-modelling-based environment in which the model becomes the program. Thus, how to go
from the specification (model) to the implementation is fully detailed, and
can be automated.
Further work is mainly required to define the presentation aspects that should be included in the presentation model to allow for the creation of a complete Web application from an aesthetic point of view, and to enrich the model's expressiveness where needed. The implementation of fully operative Web model compilers is the subject of future work, especially to cover different software architectures and Web technologies.
References
1. Pastor O, Gomez J, Insfran E, Pelechano V (2001) The OO-Method Approach for Information Systems Modelling: From Object-Oriented Conceptual Modeling to Automated Programming. Information Systems, 26:507–534
2. CARE Technologies. OlivaNova Model Transformation Engines. http://www.care-t.com
3. Murugesan S, Deshpande Y (2001) Web Engineering, Software Engineering and Web Application Development. Lecture Notes in Computer Science – Hot Topics. Springer, Berlin
4. Rossi G, Schwabe D (2001) Object-Oriented Web Applications Modeling. Information Modeling in the New Millennium, 463–484
5. Ceri S, Fraternali P, Matera M (2002) Conceptual Modeling of Data-Intensive Web Applications. IEEE Internet Computing, 6(4):20–30
6. Knapp A, Koch N, Zhang G, Hassler HM (2004) Modeling Business Processes in Web Applications with ArgoUWE. In: Proceedings of UML 2004, LNCS 3273, Springer, Berlin, pp 69–83
7. De Troyer O (2001) Audience-driven Web Design. Information Modeling in the New Millennium, 442–462
8. Fons J, Valderas P, Pastor O (2002) Specialization in Navigational Models. In: Proceedings of the Argentine Conference on Computer Science and Operational Research, 31:16–31
9. Fons J, Pelechano V, Albert M, Pastor O (2003) Development of Web Applications from Web Enhanced Conceptual Schemas. In: Proceedings of ER 2003, LNCS 2813, Springer, Berlin, pp 232–245
Authors’ Biographies
Professor Oscar Pastor is the head of the Computation and Information Systems
Department at Valencia University of Technology (Spain). PhD in 1992. Former
researcher in HP Labs, Bristol, UK. Author of over 100 research papers in conference proceedings, journals and books. Received numerous research grants from
public institutions and private industry. Research activities focus on Web engineering, object-oriented conceptual modelling, requirements engineering, information systems and model-based software production. Leader of the project, undertaken since 1996 by the Valencia University of Technology and CONSOFT S.A.,
that has originated the OlivaNova Model Execution, an advanced MDA-based tool
that produces a final software product starting from a Conceptual Schema where
the system requirements are captured. Within this tool's scope, he is responsible for
the research team working from the university on the improvement of the underlying framework, focusing on business process modelling, Web technologies and
how to use software and architectural patterns properly to go from the problem
space to the solution space in an automated way.
He is also a member of over 50 scientific committees of well-known international conferences and workshops such as CAiSE, ER, WWW, DSV/IS, RE,
ADBIS, ICWE, CADUI, DEXA, EC-WEB, ICEIS. Member of several editorial
boards of journals and book series, a participant researcher in national and international research projects, and has been invited to give over 30 talks and presentations in different universities and research centres.
Joan Fons is Assistant Professor in the Department of Information Systems and
Computation (DSIC) at the Valencia University of Technology, Spain. His research
involves Web engineering, adaptive systems, conceptual modelling, model-driven
development and pervasive systems. He is a member of the OO-Method Research
group, and he has published several contributions to well-known international
conferences (ER, WWW, CAiSE, ICWE, AH, etc.). His PhD is on OOWS, a Web
engineering method to automatically develop a Web solution from a conceptual
model.
Vicente Pelechano is Associate Professor in the Department of Information Systems and Computation (DISC) at the Valencia University of Technology, Spain.
His research interests are Web engineering, conceptual modelling, requirements
engineering, software patterns, Web services, pervasive systems and model-driven
development. He received his PhD degree from the Valencia University of Technology in 2001. He is currently teaching software engineering, design and implementation of Web services, component-based software development and design
patterns in the Valencia University of Technology. He is a member of the OOMethod Research Group at DISC. He has published in several well-known scientific journals (Information Systems, Data & Knowledge Engineering, Information
and Software Technology, etc.) and at international conferences (ER, CAiSE,
WWW, ICWE, DEXA, etc.). He is a member of the scientific committees of well-known international conferences and workshops such as CAiSE, ICWE and IADIS.
Silvia Abrahão is Assistant Professor in the Department of Information Systems
and Computation (DSIC) at the Valencia University of Technology, Spain. She
works mainly in the domains of software metrics, functional size measurement,
empirical software engineering and Web engineering. She has published over 45
papers in these fields. She gained a PhD in Computer Science from Valencia University of Technology in 2004. Currently, she is a member of the OO-Method
Research Group at DISC and a board member of the Spanish Association of Software Metrics. She takes a keen interest in industry activities and represented Spain at the 2004 meeting of the International Software Benchmarking
Standard Group (ISBSG) in Bangalore. She also has been an editorial board member of the Spanish Journal on Process and Metrics for Information Technologies
and a program committee member of the following venues: 3rd Latin American
Web Congress (LA-Web 2005), IADIS Ibero-American Conference on
WWW/Internet 2005, Interact’2005 Workshop on User Interface Quality Models,
the Spanish Software Metrics Association Conference series, and the 2nd Software Measurement European Forum (SEMF 2005).
10 Model-Based Web Application Development
Gustavo Rossi, Daniel Schwabe
Abstract: In this chapter we present our experience with the Object-Oriented Hypermedia Design Method (OOHDM), a model-based approach
for developing Web applications. We first describe the main activities in
OOHDM and then we illustrate the application of the method with a simple example, a CD store.
Keywords: OOHDM, Web development, conceptual model, navigation
model, hypermedia development.
10.1 The OOHDM approach – An Overview
The Object-Oriented Hypermedia Design Method (OOHDM) is a model-based approach to the development of Web applications. OOHDM uses
different abstraction and composition mechanisms in an object-oriented framework to allow, on the one hand, a concise description of complex information items and, on the other, the specification of complex navigation patterns and interface transformations. OOHDM provides a
clear roadmap that allows answering the following key questions, generally asked when building Web applications:
• What constitutes an “information unit” with respect to navigation?
• How does one establish the meaningful links between information units?
• How does one organise the navigation space, i.e., establish the possible sequences of information units the user may navigate through?
• How will navigation operations be distinguished from interface operations and from “data processing” (i.e., application operations)?
In OOHDM, a hypermedia application is built in a five-step process
supporting an incremental or prototype process model. Each step focuses
on a particular design concern, and an object-oriented model is built. Classification, aggregation and generalisation/specialisation are used throughout the process to enhance abstraction power and reuse opportunities.
Table 10.1 summarises the steps, products, mechanisms and design concerns in OOHDM.
Table 10.1. Activities and formalisms in OOHDM

Requirements gathering.
Products: use cases, annotations.
Formalisms: scenarios; user interaction diagrams; design patterns.
Mechanisms: scenario and use case analysis; interviews; UID mapping to the conceptual model.
Design concerns: capture the stakeholder requirements for the application.

Conceptual design.
Products: classes, sub-systems, relationships, attribute perspectives.
Formalisms: object-oriented modelling constructs; design patterns.
Mechanisms: classification, aggregation, generalisation and specialisation.
Design concerns: model the semantics of the application domain.

Navigational design.
Products: nodes, links, access structures, navigational contexts, navigational transformations.
Formalisms: object-oriented views; object-oriented state charts; context classes; design patterns; user-centred scenarios.
Mechanisms: classification, aggregation, generalisation and specialisation.
Design concerns: take into account user profile and task, with emphasis on cognitive aspects; build the navigational structure of the application.

Abstract interface design.
Products: abstract interface objects, responses to external events, interface transformations.
Formalisms: abstract interface widgets; concrete widgets; ontologies; design patterns.
Mechanisms: mapping between navigation and perceptible objects.
Design concerns: model perceptible objects, implementing the chosen metaphors; describe the interface for navigational objects; define the layout of interface objects.

Implementation.
Products: running application.
Formalisms: those supported by the target environment.
Mechanisms: those provided by the target environment.
Design concerns: performance, completeness.
We next summarise the different OOHDM activities; detailed syntax
and semantics can be found in [3,6]. Further information about OOHDM is available online at the OOHDM Wiki (http://www.oohdm.inf.puc-rio.br:8668).
10.1.1 Requirements Gathering
The first step is to capture the stakeholders' requirements. To achieve this, it is necessary to first identify the actors (stakeholders) and the tasks they must perform. Next, scenarios are collected (or
drafted), for each task and type of actor. The scenarios are then used to form
use cases, which are represented using User Interaction Diagrams (UIDs).
These diagrams provide a concise graphical representation of the interaction
between the user and the system during the execution of a task. UIDs are
validated with the actors, and redesigned if necessary. Subsequently, a set of guidelines is applied to the UIDs to extract a conceptual model. Details
about UIDs can be found in [9].
10.1.2 Conceptual Design
During the conceptual design, an application domain’s conceptual model is
built using object-oriented modelling principles, augmented with primitives such as attribute perspectives (multiple-valued attributes, similar to
HDM perspectives). Conceptual classes may be built using aggregation
and generalisation/specialisation hierarchies. There is no concern for the
types of users and tasks, only for the application domain semantics. A conceptual schema is built out of sub-systems, classes and relationships.
OOHDM uses UML (with slight extensions) for expressing the conceptual
design.
10.1.3 Navigational Design
In OOHDM, an application is seen as a navigational view over the conceptual model. This reflects a major innovation of OOHDM, which recognises
that the objects (items) the user navigates are not the conceptual objects,
but objects that are “built” from one or more conceptual objects.
For each user profile we can define a different navigational structure,
which will reflect objects and relationships in the conceptual schema according to the tasks a user must perform. The navigational class structure
of a Web application is defined by a schema containing navigational
classes. In OOHDM, there is a set of pre-defined types of navigational
classes: nodes, links, anchors and access structures. The semantics of
nodes, links and anchors are as usual in hypermedia applications. Nodes in
OOHDM represent logical “windows” (or views) on conceptual classes,
defined during conceptual design. Links are the hypermedia realisation of
conceptual relationships, as well as task-related links. Access structures,
such as indexes, represent possible ways to start a navigation.
Different applications (in the same domain) may contain different linking
topologies according to a user’s profile. For example, in an academic Web
application we may have a view to be used by students and researchers, and
another view for use by administrators. In the second view, a professor's
node may contain salary information, which would not be visible in the
student’s view.
The main difference between our approach and others’, in relation to object viewing mechanisms, is that while others consider Web pages mainly
as user interfaces built by “observing” conceptual objects, we favour the
explicit representation of navigational objects (nodes and links) during
design.
The navigational structure of a Web application is described in terms of
navigational contexts, which are generated from navigation classes, such
as nodes, links, indices and guided tours. Navigational contexts are sets of
related nodes that possess similar navigation alternatives (options), and
that are meaningful for a certain step in a task pursued by a user. For example, we can model the set of courses in a semester, the paintings of a
painter, the products in a shopping cart, etc.
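The following minimal sketch (assumed names; OOHDM itself prescribes notation, not an implementation) illustrates both ideas: a node as a profile-dependent view built from a conceptual object, and a navigational context as a set of related nodes:

from dataclasses import dataclass, field

@dataclass
class CD:                          # conceptual object
    title: str
    price: float
    artist: str

def cd_node(cd, profile):
    # A node is a logical "window" (view) over one or more conceptual objects;
    # its contents depend on the user profile.
    node = {"title": cd.title, "artist": cd.artist}
    if profile == "client":
        node["price"] = cd.price   # e.g. clients see prices; administrators might not
    return node

@dataclass
class NavigationalContext:         # set of related nodes with similar navigation options
    name: str
    nodes: list = field(default_factory=list)

ctx = NavigationalContext("CDs by performer",
                          nodes=[cd_node(CD("Kind of Blue", 9.99, "Miles Davis"), "client")])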
10.1.4 Abstract Interface Design
The abstract interface design defines perceptible objects (e.g. a picture, a
city map) in terms of interface classes. Interface classes are aggregations
of primitive classes (e.g. text fields, buttons) and, recursively, of other
interface classes. Interface objects are mapped to navigational objects in
order to have a perceptible appearance. An interface behaviour is defined
by specifying how to handle external and user-generated events, and how
the communication between interface and navigational objects is to take
place.
10.1.5 Implementation
Implementation maps interface and navigation objects to implementation
objects, and may involve elaborated architectures (e.g. client–server), in
which applications are clients to a shared database server containing conceptual objects. A number of CD-ROM-based applications, as well as Web
applications, have been developed using OOHDM, and employing numerous technologies, such as Java (J2EE), .NET (aspx), Windows (asp), Lua
(CGILua), ColdFusion and Ruby (RubyOnRails).
An open source environment for OOHDM, based on a variation of Ruby
on Rails, is available at:
http://server2.tecweb.inf.puc-rio.br:8000/projects/hyperde/trac.cgi/wiki.
10.2 Building an Online CD Store with OOHDM
We next illustrate our method using as a case study the design of a simple
CD store. To keep it simple, we focus mainly on the process of finding
products in the store catalogue, with less emphasis on the check-out process (see [5]). This example is somewhat archetypical, as different Web
applications can be modelled using similar ideas to those we show next.
We emphasise the process of mapping requirements into conceptual and
navigational structures, and ignore user interface and implementation issues (see [1,2,8] for discussions about interfaces and implementation).
In OOHDM we build a different navigational model for each user profile. In this application we have at least two orthogonal profiles: the client
(who is looking for CDs to buy) and the administrator (who maintains the
CD store); we will illustrate the application focusing on the client profile.
10.2.1 Requirements Gathering
The first step is to identify the actors in the application; in the example, our
only actor is the client who buys CDs in the online store. Next, for each
actor, we have to identify the tasks that will evolve into potential use scenarios, and later into use cases. The most important tasks identified are the
following:
• To buy a CD given its title
• To buy a CD given the name of a song
• To buy a CD given the name of the performer
• To find information about a performer
• To find CDs given a musical genre
• To find best-selling CDs
• To find CDs on offer
Scenario Construction
The next activity consists of describing usage scenarios. A scenario describes the sequence of steps a user performs to complete a task. Scenarios in
OOHDM are specified textually, from the point of view of the end users.
In this instance, the role of an end user (client) can also be performed either by different members of the design team or by CD store employees. For the sake of conciseness, we describe two of the eighteen scenarios elicited from three different users.
Scenario 1: To buy a specific CD.
“I enter the CD title. For each CD matching that title I obtain the CD’s
cover, availability and price. It is possible to obtain detailed information,
such as track names, duration, details of performing artists, and to listen to
CD tracks. It is also possible to obtain additional data on artists. After
reading the information I decide to buy the CD or to quit”
Scenario 2: To buy a CD given its title.
“I enter the CD title and I obtain the list of matching titles. I choose one
and add it to the shopping cart. Whenever the CD information is shown, I
should see information on its availability”
Use Case Specification
Next, we define use cases, based on the set of scenarios and tasks previously defined; we use the following heuristics:
1. Identify those scenarios related to the task at hand. We will use the two
previous scenarios.
2. For each scenario, identify information items that are exchanged between the user and the application during their interaction.
3. For each scenario, identify data items that are inter-related. In general,
they appear together in a use case text.
4. For each scenario, identify data items organised as sets. In general,
they appear as sets in a use case text.
5. The sequences of actions presented in scenarios should also be present
in a use case.
6. For each scenario, the operations on data items should be included in a
use case.
Once the data involved in the interaction, the sequence of actions and
the operations have been defined, we next specify a use case. A use case is
constructed from the sequence of actions, enriched with data items and
operations. Use cases can also be complemented with information from
other use cases, or from the designer.
The resulting use case for the previous scenario is the following:
Use Case: To buy a CD from its title
1. A user enters the CD title (or part of it).
2. The application returns a list of matching CDs. If only one CD
matches, see step 4. For each CD, its title, artist, price, cover and availability are shown.
3. If the user wants to buy one or more CDs from the list, (s)he adds them
to the shopping cart. The sale is dealt with using another use case –
Use Case: Buy. Further CD information is available by selecting it.
4. If a single CD is selected, the application provides further information:
title, cover, availability, price, track names and durations, performers,
description, year, genre and country of origin. If the user wants to buy
this CD, (s)he can either add it to the shopping cart, or leave and buy it
later (Use Case: Buy). The user can listen to a track segment if willing
to.
5. Further information about any artists who participated in the CD can
be obtained by selecting the artist’s name. Once selected, the application returns the artist’s name, date of birth, a photograph and a short
biography.
The specification of the remaining use cases follows a similar process.
Thus, only those use cases that are clearly different from the one described
above will be described next.
Use Case: Verify Shopping Cart
1. The shopping cart displays information on all the CDs selected by a
user. For each CD the following information is provided: title, quantity, artist’s name and price. Total price and the estimated delivery date
are also shown.
2. The quantity relative to each CD can be edited, if necessary, by selecting the CD.
Use Case: Buy CD
1. To buy CD(s) a user must provide a name and, optionally, a password.
2. If a user does not have a password, the following information must
then be provided: name, address, telephone, e-mail address and birth
date.
3. Once the necessary information is given, a user is able to further supply the necessary payment data: payment options (cash or deferred),
payment type (cheque or credit card), delivery options (surface or air)
and optionally delivery address.1 The operation is completed only after
being confirmed by the user.
4. After the operation is confirmed, the user receives an order number.
1 The delivery address only needs to be provided if it differs from the user’s contact address.
Specifying User Interaction Diagrams
For each previously defined use case, a User Interaction Diagram (UID)
must be specified. The specification of UIDs from use cases can be done
following the guidelines described below. As an example, we detail below
the process of building the UID for the use case: To buy a CD given its
title.
1. Initially the use case is analysed to identify the information exchange
between the user and the application. Information provided by the user
and information returned by the application are tagged. Next, the same
information is identified and made evident in the use case.
2. Items that are exchanged during the interaction are shown as the UID’s
states. Information provided by the user and by the system are always
in separate states. Information produced from computations and information used as input to the computations should be in separate states.
The ordering of states depends on the dependencies between the data
provided by the user, and those returned by the application. In
Fig. 10.1, we show the first draft of a UID where parts of the use case
are transcribed.
[Figure: a four-state UID draft. <1> The user enters all or part of the CD name. <2> The system returns a list of CDs matching the input string, showing for each CD its name, price, cover, availability and the names of the artist(s) that participate in the CD. <3> The system returns detailed information about the CD: name, cover, availability, price, name and duration of each track, names of artists, description, year of release, genre and country of origin. <4> The system returns the name, the date of birth, a picture and a bio of the artist.]
Fig. 10.1. Defining a UID
The exchanged data items, once identified, must be clearly indicated in
the UID. Data entered by the user (e.g. a CD title) is specified using a rectangle: if it is mandatory, the border is a full line; if it is optional, the border is a dashed line (see Fig. 10.2). An ellipsis (…) in front of a label indicates a list (e.g. …CD indicates a list of CDs). The notation Artist(name,
date of birth, bio, photo) is called a structure.
[Figure: the interaction states annotated with data items. CD title (user input); …CD (name, price, cover, availability, …Artist (name)); CD (name, description, year of release, price, availability, cover, genre, country of origin, …Song (name, duration, excerpt), …Artist (name)); Artist (name, date of birth, bio, photo).]
Fig. 10.2. Refining interaction states
Transitions between interaction states must be indicated using arrows.
Multiple paths, as indicated in the use cases, might arise (see Fig. 10.3).
Labels between brackets indicate conditions (e.g. [2..N] indicates more
than one result); a label indicating cardinality represents a choice (in the
example, “1” indicates that only one may be chosen).
[Figure: the states of Fig. 10.2 connected by arrows. From the CD title input, the transition labelled [2..N] leads to the …CD list and the transition labelled [1] leads directly to the single CD; choosing 1 element of the list leads to the CD, and choosing 1 artist leads to Artist (name, date of birth, bio, photo).]
Fig. 10.3. Transitions between interaction states
Finally, operations executed by the user are represented using a line
with a bullet connected to the specific information item to which it is applied, as shown in Fig. 10.4. The name of the operation appears in parentheses.
[Figure: the UID of Fig. 10.3 completed with operations: 1..N (include in shopping cart) attached to both the CD list and the single CD, and 1 (listen) attached to a track excerpt.]
Fig. 10.4. Complete specification of the UID for use case To buy CD given its title
Figure 10.5 and Fig. 10.6 present UIDs corresponding to the use cases
To verify Shopping Cart and To buy CD, respectively. Once we finish the
specification of UIDs for all use cases, we can then design the application’s conceptual model.
[Figure: the UID starts with the user's name and an optional password. A [valid password] proceeds directly; the [New Client] branch captures name, address, telephone, e-mail and date of birth. Then Payment_type [cash, installments], Payment_form [credit card, bank transfer], shipping [air, surface] and an optional shipping address are captured; after (Confirm), the order number is shown.]
Fig. 10.5. UID for use case To buy CD
[Figure: a single state showing …CD (title, quantity, …Artist (name), price), total price and delivery deadline, with the operation 1 (change quantity) attached to a CD.]
Fig. 10.6. UID for use case To verify Shopping Cart
10.2.2 Conceptual Modelling
Defining classes, their attributes, operations and relationships is not an
easy task. However, the information gathered from use cases and UIDs can
help identify core information classes that can be later refined. Next, we
describe a set of guidelines to derive classes from UIDs, exemplified using
the UID in Fig. 10.4 (To buy CD).
1. Class definition. For each data structure in the UID we define a class.
In the example, classes are: CD, Artist, Song.
2. Attribute definition. For each information item appearing in the UID,
either provided by the user or returned by the application, an attribute
is defined according to the following validations:
a. If, given an instance of class X, it is possible to obtain the value for
attribute A, then A can be an attribute of X (provided X is the only
class fulfilling this condition).
b. If, given classes X and Y, it is possible to obtain the value of attribute A, then A will be an attribute of an association between X and
Y.
c. If the attribute corresponding to a data item does not depend on any
existing class, or combination of classes, we need to create a new
one.
The following attributes were identified from the information returned
by the application, as shown in the UID in Fig. 10.4:
• CD: title, description, year, price, cover, availability, genre, country of origin
• Artist: name, birth date, description, photograph
• Song: name
• CD-Song: track, duration
3. Definition of associations. For each UID, when a structure contains attributes whose class differs from the class the structure represents, include an association between the attributes' class and the class representing the structure.
4. Definition of associations. For each UID, for each structure s1, containing another structure s2, create an association between the classes
corresponding to structures s1 and s2.
5. Definition of associations. For each transition of interaction states in
each UID, if there are different classes representing the source interaction state and the target interaction state, define an association between
corresponding classes.
The following associations were identified by applying guidelines 3, 4 and 5 to the
UID in Fig. 10.4:
• CD-Artist
• CD-Song
6. Operations definition. For each option attached to a state transition in
each UID, verify if there is an operation that must be created for any of
the classes that correspond to the interaction states.
The following operations were identified from this last guideline:
• CD: includeInShoppingCart
• CD-Song: listenTrack
In Fig. 10.7 we show an initial conceptual model derived from the UID:
To buy CD from title.
[Figure: classes Artist (name, date of birth, bio, photo), CD (title, description, release_year, price, availability, cover, country of origin, genre; includeShoppingCart()) and Song (name, duration, excerpt; listenExcerpt()), connected by 1..* associations.]
Fig. 10.7. Initial conceptual model
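For readers who prefer code to diagrams, the model of Fig. 10.7 could be rendered as the following Java sketch. It is an illustration of ours rather than part of OOHDM; the attribute types (strings, doubles, byte arrays for media content) are assumptions.

import java.util.ArrayList;
import java.util.List;

// Sketch of the initial conceptual model of Fig. 10.7. Media content
// (photo, cover, excerpt) is simplified to byte arrays.
class Artist {
    String name;
    String dateOfBirth;
    String bio;
    byte[] photo;
}

class Song {
    String name;
    int duration;    // in seconds
    byte[] excerpt;  // audio sample

    void listenExcerpt() { /* stream the excerpt to the client */ }
}

class CD {
    String title;
    String description;
    String releaseYear;
    double price;
    String availability;
    byte[] cover;
    String countryOfOrigin;
    String genre;
    List<Artist> artists = new ArrayList<Artist>();  // CD-Artist association
    List<Song> songs = new ArrayList<Song>();        // CD-Song association

    void includeShoppingCart() { /* add this CD to the session cart */ }
}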
After analysing the complete set of UIDs and performing the required
adjustments we obtain the conceptual model shown in Fig. 10.8.
[Figure: Person (name, e-mail, date of birth, nationality, born_in) specialises into Client (password, telephone, address) and Artist (bio: [Text+, photo: Image], deceased?: Date, givenName). An Artist composes and interprets Songs (name, lyrics) and participates in CDs. A CD (title, description, year of release, price, availability, cover, origin: [domestic, international], label, isCompilation, isHighlight, onPromotion, discount, /qtySold) has Genres (name) and is made up of Tracks (number, duration, excerpt; listenExcerpt()). A Client makes Orders (number, orderDate, payment form: [credit card, bank transfer], payment type: [cash, installments], shipping: [air, maritime, surface], shippingAddress, /shippingCharges, /totalAmount, expectedDeliveryDate, deliveryDate; newOrder(), updateOrder(), calculateTotalAmount()), composed of Order Items (itemNumber, quantity, /itemValue; includeItemOrder(CD, Order), changeQty(CD, Order, qty), calculateItemValue(qty, price)).]
Fig. 10.8. Conceptual model for the CD store
Note that this conceptual model might need further improvements as the application evolves, since these classes are simply the ones derived from the requirements gathering activity. However, this evolution belongs to the general field of object-oriented design and is outside the scope of this chapter.
10.2.3 Navigation Design
During the navigation design activity we generate two schemas: the navigational contexts and the navigational class schemas. The former indicate
possible navigation sequences to help users complete their tasks; the latter
specify the navigation objects being processed. Designers create both
schemas from different sources. UIDs and scenarios are important to obtain a sound navigational model. The conceptual model that has also been
obtained from requirements is also an important source of information.
Finally, designers use previous experience, e.g. using navigation patterns,
as described in [4,7]. Next we detail the creation of navigational contexts.
Derivation of Navigational Contexts
For each task we define a partial navigational context representing a possible
navigational structure to support the task. We detail the creation of the navigational contexts corresponding to the use case: To buy CD given its title.
First, each structure that has been represented in the UID (and the corresponding class in the conceptual model) is analysed to determine the type
of primitive that it will give rise to (e.g. an access structure, a navigational
context or a list). The following guidelines can be used to obtain a navigational context:
1. When the task associated with the UID requires that the user examines
a set of elements to select one, we map the set of structures into an access structure. An access structure is a set of elements, each of which
contains a link. In Fig. 10.9, we show the partial diagram for access
structures CDs and Artists.
Fig. 10.9. Access structures
2. When the task does not require such examination, but requires the elements to be accessed simultaneously, map the set into a list, e.g. the list
of songs in a CD (see Fig. 10.10).
[Figure: element definition for CD: title, description, year of release, price, cover, availability, genre, country of origin, songs: list of <s: Song, t: Track, s.name, t.duration, t.excerpt where Track(t, c, s)>.]
Fig. 10.10. List for CD
3. After mapping the different sets of structures we analyse singular
structures in the UID using the following guideline. When the task requires that an element’s information be accessed by the user, we map
the structure into a navigational context. In Fig. 10.11 we show the
partial context diagram from this example.
[Figure: contexts CD Alphabetical Order and Artist by CD. CD Alphabetical Order: title, description, year of release, price, cover, availability, genre, country of origin, songs: list of <s: Song, t: Track, s.name, t.duration, t.excerpt where Track(t, c, s)>, artists: Idx Artists by CD (self); includeShoppingCart(); listenExcerpt(). Artist by CD: name, date of birth, bio, photo.]
Fig. 10.11. Partial context for UID: Buy CD given its title
In the example, both “CD Alphabetical Order” and “Artist by CD” are
contexts, which correspond to sets of elements. The elements that constitute each set are described in the grey boxes.
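A navigational context can thus be thought of as a parameterised set of elements. The following Java sketch, which is ours and simply reuses the conceptual classes sketched in Sect. 10.2.2, shows one way to express the “Artist by CD” context, whose elements are the artists related to the CD received as parameter:

import java.util.List;

// A navigational context seen as a parameterised set of elements: given
// the parameter, it yields the nodes the user can browse through.
interface NavigationalContext<P, E> {
    List<E> elements(P parameter);
}

// "Artist by CD": the elements are the artists related to the CD passed
// as parameter, mirroring the context's selection rule.
class ArtistByCD implements NavigationalContext<CD, Artist> {
    public List<Artist> elements(CD cd) {
        return cd.artists;  // in practice, a query over the conceptual model
    }
}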
In Fig. 10.12 and Fig. 10.13 we show other partial contexts obtained
from previously mentioned UIDs. Other contexts, such as “CD by Genre” and “CD on Promotion”, would have similar definitions.
[Figure: access structure Artists leading to context CD by Artist: title, description, year of release, price, cover, availability, genre, country of origin, songs: list of <s: Song, t: Track, s.name, t.duration, t.excerpt where Track(t, c, s)>, artists: list of <a: Artist, a.name where a interprets t: Track and Track(t, c: CD, s: Song)>; includeShoppingCart(); listenExcerpt().]
Fig. 10.12. Partial context for UID: To buy CD given an artist’s name
[Figure: access structure Songs leading to context CD by Song, whose element definition is the same as that of CD by Artist.]
Fig. 10.13. Partial context for UID: To buy CD given a song’s name
Fig. 10.14 and Fig. 10.15 show other kinds of contexts and their element
definitions.
After obtaining the context diagram for each individual task, we integrate the partial context schemas to obtain the application’s complete
navigational context schema, shown in Fig. 10.16. In the integration process, contexts that are the same are unified, and navigation choices between
contexts in different tasks are also examined.
[Figure: context element definition: total_amount, expected delivery date, cds: list of <c: CD, i: Item, o: Order, c.title, c.price, i.quantity, list of <a: Artist, a.name where a interprets t: Track and Track(t, c: CD, s: Song)> where Item(i, c, o)>; changeQty().]
Fig. 10.14. Verify shopping cart
[Figure: context Order Update: client name, e-mail, date of birth, telephone, address, form of payment [cash, installments], type of payment [credit card, bank transfer], transport [air, surface], shipping address.]
Fig. 10.15. To buy CD
[Figure: from the Main Menu, the access structures CDs, Genres, Songs, Bestsellers, on Promotion, CDs by Query <name, description and/or label> and Artists lead into the CD contexts (Alphabetical, by Genre, Compilation by Genre, by Song, by Query, by Order, by Artist), the Artist contexts (Alphabetical, by CD) and the Order contexts (Creation, Update).]
Fig. 10.16. Navigation context schema
We can see that from the main menu, the user can access different access structures (for CDs, Musical Genres, Songs, CDs by Query, and Artists). Each one of them provides access to sets of nodes that support the
achievement of the different tasks identified at the outset.
Specification of the Navigational Class Schema
During the specification of the navigational class schema the designer derives the navigational schema using both the conceptual model and the
navigational contexts schema. Navigational classes, such as nodes, represent views over conceptual classes: a navigational class can present information from one or more conceptual classes. All classes from the navigational contexts schema are node classes, while links are derived from navigation relationships between classes in the navigational contexts schema. Note that not every navigation in this schema represents a link: the selection rule of the target context must be analysed (especially when it involves
navigation between contexts of the same class). If the elements of the target context are related to an object of the same original class, and if this
object is the parameter, then the navigation represents a link.
For example, in the navigational context schema of Fig. 10.16, we have
navigational classes CD, Order and Artist. We have the following navigations among classes: from CD to Artist by CD, from Order to CD by Order
and from CD to Order in Creation/Update. The selection rule for Artist by CD (Parameter: c: CD; Elements: a: Artist where a participates in c) indicates that the context is composed of the artists related to a particular CD, which is its parameter; therefore there is a link from CD to Artist. Similarly, the selection rules for the other contexts indicate which navigations correspond to links. In Fig. 10.17 we present the resulting navigational class
schema.
[Figure: navigational classes. Order {from o: Order}: name, e-mail, telephone and address derived from the Client cl where cl makes o; pmt_form: [ccard, bank transfer]; pmt_type: [cash, installments]; shipping: [air, surface]; number; shipping_address; /total_price; expectedDeliveryDate; cds: Idx CDs by Order (self). Item: cd_name and order_number derived from the CD and Order related by the item; quantity; includeItemOrder(c: CD, p: Order); changeQty(c: CD, p: Order, quantity: Integer). Artist: name, description, photo, deceased, cds: Idx CDs by Artist (self). CD {from c: CD}: title, description, year, price, availability, cover, origin: [national, international], label, onPromotion, /qtySold, artists: list of <a: Artist, a.name where a interprets t: Track and Track(t, c, s: Song)>, genres: list of <g: Genre, g.name where c has g>, ind_artists: Idx Artists by CD (self); listenExcerpt(c, t: Track). Simple CD and Compilation CD specialise CD with their songs lists; the compilation's list also includes, for each track, the artists who interpret it.]
Fig. 10.17. Navigational schema
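Since navigational classes are views over conceptual classes, a node can be implemented as a wrapper that exposes only the attributes prescribed by the navigational schema, including derived ones. A minimal Java sketch of the CD node follows; it again reuses the conceptual classes sketched earlier, and the names are ours:

import java.util.ArrayList;
import java.util.List;

// Navigational node CD: a view over the conceptual class CD exposing
// only what the navigational schema prescribes, plus derived attributes.
class CDNode {
    private final CD cd;

    CDNode(CD cd) { this.cd = cd; }

    String getTitle()        { return cd.title; }
    String getAvailability() { return cd.availability; }

    // Derived attribute: the names of the artists participating in the CD.
    List<String> getArtistNames() {
        List<String> names = new ArrayList<String>();
        for (Artist a : cd.artists) {
            names.add(a.name);
        }
        return names;
    }
}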
10.2.4 Abstract Interface Design
The abstract interface design focuses on making navigation objects and
application functionality perceptible to the user, which must be done at the
application interface level. At the most abstract level, the interface functionality can be regarded as supporting information exchange between the
application and the user, including activation of functionalities. In fact,
from this standpoint, navigation is just another (albeit distinguished) application functionality.
Since the tasks being supported drive this information exchange, it is
reasonable to expect that this exchange in itself will be less sensitive to
runtime environment aspects, such as particular standards and devices
being used. The design of this interface aspect can be carried out by interaction designers or software engineers.
For the actual running application, it is necessary to define the concrete
look and feel of the application, including layout, font, colour and graphical appearance, which is typically carried out by graphics designers. This
part of the design is almost totally dependent on the particular hardware
and software runtime environment.
Such separation allows shielding a significant part of the interaction design from inevitable technological platform evolution, as well as from the
need to support users in a multitude of hardware and software runtime
environments.
The entire interface is specified by several ontologies, currently described using RDFS (RDFS W3C) and OWL (OWL W3C) as a formalism.
Abstract Widget Ontology
The type of functionality offered by interface elements is called the abstract interface. It is specified using the Abstract Widget Ontology, which
establishes the interface vocabulary, as shown in Fig. 10.18. This ontology
can be thought of as a set of classes whose instances will comprise a given
interface.
[Figure: AbstractInterfaceElement specialises into SimpleActivator, ElementExhibitor, VariableCapturer and CompositeInterfaceElement; VariableCapturer specialises into PredefinedVariable and IndefiniteVariable; PredefinedVariable into ContinuousGroup, DiscreteGroup, MultipleChoices and SingleChoices.]
Fig. 10.18. Abstract Widget Ontology
An abstract interface widget can be any of the following:
• SimpleActivator, which is capable of reacting to external events, such as mouse clicks.
• ElementExhibitor, which is able to exhibit a type of content, such as text or images.
• VariableCapturer, which is able to receive (capture) the value of one or more variables. This includes input text fields, selection widgets such as pull-down menus and checkboxes, etc. It generalises two distinct (sub)concepts.
• IndefiniteVariable, which allows entering previously unknown values, such as text strings typed by the user.
• PredefinedVariable, which abstracts widgets that allow the selection of a subset from a set of pre-defined values; often this selection must be a singleton. Specialisations of this concept are ContinuousGroup, DiscreteGroup, MultipleChoices and SingleChoices. The first allows selecting a single value from an infinite range of values; the second is analogous, but for a finite set; the remainder are self-evident.
• CompositeInterfaceElement, which is a composition of any of the above.
It can be seen that this ontology captures the essential roles that interface elements play with respect to the interaction: they exhibit information, react to external events, or accept information. As is customary, composite elements allow building more complex interfaces out of simpler building blocks.
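The vocabulary of Fig. 10.18 can be mirrored directly in code. The following Java sketch is our rendering of the hierarchy; the ontology itself is defined in RDFS/OWL, not in Java:

import java.util.ArrayList;
import java.util.List;

// The abstract widget vocabulary of Fig. 10.18 as a Java class hierarchy.
abstract class AbstractInterfaceElement { String id; }

class SimpleActivator extends AbstractInterfaceElement { }   // reacts to events
class ElementExhibitor extends AbstractInterfaceElement { }  // exhibits content

abstract class VariableCapturer extends AbstractInterfaceElement { }
class IndefiniteVariable extends VariableCapturer { }        // free user input

abstract class PredefinedVariable extends VariableCapturer { }
class ContinuousGroup extends PredefinedVariable { }  // one value, infinite range
class DiscreteGroup extends PredefinedVariable { }    // one value, finite set
class MultipleChoices extends PredefinedVariable { }
class SingleChoices extends PredefinedVariable { }

// A composition of any of the above, used to build complex interfaces
// out of simpler building blocks.
class CompositeInterfaceElement extends AbstractInterfaceElement {
    List<AbstractInterfaceElement> children =
        new ArrayList<AbstractInterfaceElement>();
}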
The software designer, who understands the application logic and the
types of information exchange that must be supported, should carry out the
abstract interface design. The software designer does not need to take usability issues or the “look and feel” into account, as they will be dealt with
during the concrete interface design, normally carried out by a graphics (or
“experience”) designer.
Once the abstract interface has been defined, each element must be
mapped onto both a navigation element, which will provide its contents,
and a concrete interface widget, which will actually implement the element
in a given runtime environment. Fig. 10.19 provides an example of an
interface for a page describing an artist, and Fig. 10.20 shows an abstract
representation of this interface.
Concrete widgets correspond to widgets usually available in most runtime environments, such as labels, text boxes, combo boxes, pulldown
menus, radio buttons, etc.
[Figure: a concrete page for the artist Beatles. A navigation bar offers Home, Main Menu, CDs, Artists, Songs and Search; the page shows the artist's name, country (Great Britain) and period (1960-1970), a short biography (“The Beatles were one of the most influential music groups of the rock era…”), links to CDs, Descriptions and Songs, a CD list (Sergeant Pepper’s, Abbey Road, Revolver, …) and Previous/Next anchors.]
Fig. 10.19. An example of a concrete interface
Fig. 10.20. Abstract Widget Ontology instance for the example in Fig. 10.19
Mappings
The Abstract Interface Ontology contains, for each abstract interface widget, the mapping both to navigation elements, which are application specific, and to a concrete interface element.
There is additional information in the ontology that restricts each abstract interface widget to compatible concrete interface widgets. For example, a “SimpleActivator” abstract interface widget can only be mapped
into the “Link” or “Button” concrete interface widgets.
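Such a restriction can be enforced mechanically. The following Java sketch of the compatibility check is ours; only the SimpleActivator row is taken from the text above, the other entry being an illustrative guess:

import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Validates that an abstract widget is mapped only onto one of its
// compatible concrete widgets, as restricted by the ontology.
class MappingValidator {
    private static final Map<String, Set<String>> COMPATIBLE =
        new HashMap<String, Set<String>>();
    static {
        COMPATIBLE.put("SimpleActivator",
            new HashSet<String>(Arrays.asList("Link", "Button")));
        COMPATIBLE.put("IndefiniteVariable",          // illustrative entry
            new HashSet<String>(Arrays.asList("TextBox")));
    }

    static boolean isValid(String abstractWidget, String concreteWidget) {
        Set<String> allowed = COMPATIBLE.get(abstractWidget);
        return allowed != null && allowed.contains(concreteWidget);
    }
}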
Actual abstract interface widget instances are mapped onto specific
navigation elements (in the navigation ontology) and onto concrete interface widgets (in the Concrete Interface Widget Ontology). Fig. 10.21
shows the specification of the “Previous Artist” abstract interface widget
(class “SimpleActivator”), shown in Fig. 10.20, which is mapped onto a
“Link” concrete interface element.
...
<awo:SimpleActivator rdf:ID="ArtistAlphaPrevious">
  <awo:mapsTo rdf:resource="http://www.inf.puc-rio.br/~sabrina/ontology/CW/cwo#Link"/>
  <awo:fromElement>ctxArtistAlpha</awo:fromElement>
  <awo:fromAttribute>_Prev</awo:fromAttribute>
  <awo:abstractInterface>ArtistAlpha</awo:abstractInterface>
</awo:SimpleActivator>
...
Fig. 10.21. Mapping between abstract interface widget and navigation element
Fig. 10.22 shows an example illustrating how an application’s functionality is integrated, by providing the OWL specification of the “Search” abstract interface element. It is composed of two abstract widgets, “ElementExhibitor” (lines 9–12) and “CompositeInterfaceElement” (lines 14–45). The first shows the “Search” string, using a “Label” concrete widget. The second aggregates the four elements used to specify the fields in which the search may be performed, namely three “MultipleChoices” – SearchCDs (lines 25–29), SearchDescriptions (lines 31–35) and SearchSongs (lines 37–41) – and one “IndefiniteVariable” – “SearchField” (lines 43–45).
The CompositeInterfaceElement element, in this case, has the properties: fromIndex, isRepeated, mapsTo, abstractInterface and hasInterfaceElement. The fromIndex property in line 2 indicates which navigational index this element belongs to. This property is mandatory if no
previous element of type compositeInterfaceElement has been declared.
The association with the “idxSearch” navigation element in line 2 enables
the generation of the link to the actual code that will run the search. Even
though this example shows an association with a navigation element, it
could just as well be associated with a call to application functionality such
as “buy”.
...
 1 <awo:CompositeInterfaceElement rdf:ID="Search">
 2   <awo:fromIndex>idxSearch</awo:fromIndex>
 3   <awo:mapsTo rdf:resource="&cwo;Composition"/>
 4   <awo:isRepeated>false</awo:isRepeated>
 5   <awo:hasInterfaceElement rdf:resource="#TitleSearch"/>
 6   <awo:hasInterfaceElement rdf:resource="#SearchElements"/>
 7 </awo:CompositeInterfaceElement>
 8
 9 <awo:ElementExhibitor rdf:ID="TitleSearch">
10   <awo:visualizationText>Search</awo:visualizationText>
11   <awo:mapsTo rdf:resource="&cwo;Label"/>
12 </awo:ElementExhibitor>
13
14 <awo:CompositeInterfaceElement rdf:ID="SearchElements">
15   <awo:fromIndex>idxSearch</awo:fromIndex>
16   <awo:abstractInterface>SearchResult</awo:abstractInterface>
17   <awo:mapsTo rdf:resource="&cwo;Form"/>
18   <awo:isRepeated>false</awo:isRepeated>
19   <awo:hasInterfaceElement rdf:resource="#SearchCDs"/>
20   <awo:hasInterfaceElement rdf:resource="#SearchDescriptions"/>
21   <awo:hasInterfaceElement rdf:resource="#SearchSongs"/>
22   <awo:hasInterfaceElement rdf:resource="#SearchField"/>
23 </awo:CompositeInterfaceElement>
24
25 <awo:MultipleChoices rdf:ID="SearchCDs">
26   <awo:fromElement>SearchCDs</awo:fromElement>
27   <awo:fromAttribute>name</awo:fromAttribute>
28   <awo:mapsTo rdf:resource="&cwo;CheckBox"/>
29 </awo:MultipleChoices>
30
31 <awo:MultipleChoices rdf:ID="SearchDescriptions">
32   <awo:fromElement>SearchCDs</awo:fromElement>
33   <awo:fromAttribute>description</awo:fromAttribute>
34   <awo:mapsTo rdf:resource="&cwo;CheckBox"/>
35 </awo:MultipleChoices>
36
37 <awo:MultipleChoices rdf:ID="SearchSongs">
38   <awo:fromElement>SearchSongs</awo:fromElement>
39   <awo:fromAttribute>name</awo:fromAttribute>
40   <awo:mapsTo rdf:resource="&cwo;CheckBox"/>
41 </awo:MultipleChoices>
42
43 <awo:IndefiniteVariable rdf:ID="SearchField">
44   <awo:mapsTo rdf:resource="&cwo;TextBox"/>
45 </awo:IndefiniteVariable>
...
Fig. 10.22. Example of the OWL specification of the “Search” part of Fig. 10.19
The isRepeated property indicates if the components of this element are
repetitions of a single type (false in this case). The mapsTo property indicates which concrete element corresponds to this abstract interface element. The abstractInterface property specifies the abstract interface that
will be activated when this element is triggered. The hasInterfaceElement
indicates which elements belong to this element.
The ElementExhibitor element has the visualizationText and mapsTo
properties. The former represents the concrete object to be exhibited, in
this case the string “Search”.
The MultipleChoices element has the fromElement, fromAttribute and
mapsTo properties. The fromElement and fromAttribute properties indicate
the corresponding element and navigational attribute in the navigational
ontology, respectively. The IndefiniteVariable element has the mapsTo
property.
10.3 From Design to Implementation
Mapping design documents into implementation artefacts is usually time-consuming and, although the importance of software engineering approaches is generally accepted, implementers tend to overlook the advantages of good modelling practices. The relationship between design models and implementation components is lost, making the traceability of design decisions, which is a fundamental aspect for supporting evolution, a nightmare. We claim that this problem is not only caused by the relative youth of Web implementation tools but is mainly due to:
• Lack of understanding that navigation (hypertext) design is a defining characteristic of Web applications.
• The fact that languages and tools are targeted more to support fine-grained programming than architectural design.
• The inability of methodologists to provide non-proprietary solutions to the aforementioned “mapping” dilemma.
For example, we can use the Model View Controller (MVC) architecture to map design constructs onto implementation components. The MVC
architecture has been extensively used for decoupling the user interface
from application data, and from its functionality. Different programming
environments provide large class libraries that allow the programmer to
reuse standard widgets and interaction styles by plugging corresponding
classes into her/his “model”.
The model contains application data and behaviours, and also provides
an interface for the view and the controller. For each user interface, a view
object is defined, containing information about presentation formats, and is
kept synchronised with the model’s state. Finally, the controller processes
the user input and translates it into requests for specific application functionality. This separation reflects well the fact that Web applications may have different views, in the sense that they can be accessed through different clients (e.g. browsers, WAP clients, Web service clients), with application
data separated from its presentation. The existence of a separate module
(the controller) to handle user interaction, or, more generally, interaction
with other systems or users, provides better decoupling between application behaviour and the way in which this behaviour is triggered.
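In servlet terms, the controller role just described might look like the following sketch. All class and method names are ours, chosen to echo the components of Table 10.2; OOHDM-Java2's actual interfaces are described in [1].

import java.io.IOException;
import java.util.Map;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// A business event encapsulates all data needed to execute a request.
class BusinessEvent {
    final String action;
    final Map parameters;
    BusinessEvent(String action, Map parameters) {
        this.action = action;
        this.parameters = parameters;
    }
}

class Executor {                 // invokes model behaviour for the event
    static void execute(BusinessEvent e) { /* call business objects */ }
}

class ViewSelector {             // picks the response view from the model state
    static String select(BusinessEvent e) { return "/node.jsp"; }
}

// Front controller: translates each HTTP request into a business event,
// has it executed against the model and forwards to the selected view.
public class HttpRequestTranslator extends HttpServlet {
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        BusinessEvent event =
            new BusinessEvent(req.getParameter("action"), req.getParameterMap());
        Executor.execute(event);
        String view = ViewSelector.select(event);
        req.getRequestDispatcher(view).forward(req, resp);
    }
}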
However, while the MVC provides a set of structuring principles for
building modular interactive applications, it does not completely fulfil the
requirements of Web applications to provide rich hypermedia structures, as
it is based on a purely transactional view of software. In addition, it does
not take into account the navigation aspects that, as we have previously
argued, should be appropriately supported.
The view component includes structure and presentation of data, while
contents are kept in the model. Specifically, a simple use of the MVC is
for nodes and their interfaces to be handled by the same software component (typically a JSP object).
In addition, the MVC does not take into account that navigation should
always occur within a context and that context-related information should
be provided to the user. For example, if we want the same node to have a
slightly different structure, depending on the context in which it is accessed (e.g. CD in a thematic set or in the shopping cart), we have to use
the context as a parameter for the JSP page, and write conditional statements to insert context-sensitive information as appropriate. The JSP becomes overloaded, difficult to manage and evolution becomes practically
unmanageable. The same problem occurs if we use different JSPs for different contexts, thus duplicating code.
An alternative approach is to use a single JSP that generates the information common to all contexts (basic node), and one JSP for each node in
context, which dynamically inserts that common JSP, adding the contextsensitive information. This is still unsatisfactory, since in this case, the
basic node layout becomes fixed and we have lost flexibility.
To overcome these limitations we have developed a software architecture, OOHDM-Java2, which extends the idea of the MVC by clearly separating nodes from their interfaces, thus introducing navigation objects; it
also recognises the fact that navigation may be context-dependent. Details
on the architecture are presented in [1].
Fig. 10.23 presents the higher-level components of the OOHDM-Java2 architecture, together with the most important interactions between components while handling a request.
The main components of OOHDM-Java2 are summarised in Table 10.2.
[Figure: the Client sends (1) an HTTP request to the HTTP Request Translator (Controller), which raises (2) a business event; the Executor invokes (3) application functionality on the Business Objects (Model); the View Selector performs (4) queries on the model state and picks (5) the selected view; the Extended View (a JSP layout plus the Navigational Node holding contents and a model view) returns (6) the HTTP response.]
Fig. 10.23. Main components of OOHDM-Java2
Fig. 10.24 outlines the implementation architecture for the interface [2].
Starting with the navigation and abstract interface designs, the corresponding ontology instances are used as input into a JSP generator, which instantiates the interface as a JSP file using TagLibs. The interpreter uses the
Jena library to manipulate the ontology information.
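For illustration, reading such an ontology instance with the Jena API might look like the following sketch; the awo namespace URI and the file name are assumptions, since the chapter does not list them.

import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.rdf.model.Property;
import com.hp.hpl.jena.rdf.model.RDFNode;
import com.hp.hpl.jena.rdf.model.Statement;
import com.hp.hpl.jena.rdf.model.StmtIterator;

// Sketch: load an abstract interface ontology instance and print, for
// each widget, the concrete widget it maps onto.
public class InterfaceOntologyReader {
    static final String AWO = "http://www.example.org/ontology/AW/awo#";  // assumed

    public static void main(String[] args) {
        Model model = ModelFactory.createDefaultModel();
        model.read("file:abstractInterface.rdf");  // assumed file name

        Property mapsTo = model.createProperty(AWO, "mapsTo");
        StmtIterator it = model.listStatements(null, mapsTo, (RDFNode) null);
        while (it.hasNext()) {
            Statement s = it.nextStatement();
            System.out.println(s.getSubject() + " mapsTo " + s.getObject());
        }
    }
}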
The actual TagLib code used is determined by the concrete widget definition that has been mapped onto the corresponding abstract widget. The
abstract interface determines the nesting structure of elements in the resulting page. It is expected that the designer will group together functionally related elements.
It is possible to use different instances of the TagLib implementation by
changing its declaration. Thus, for each possible concrete widget, a different implementation of the TagLib code will generate the desired HTML
(or any other language) version for that widget.
Table 10.2. Main components of OOHDM-Java2

HTTP Request Translator (Controller): Every HTTP request is redirected to this component. It translates the user request into an action to be executed by the model. This component extracts the information (parameters) of the request and instantiates a business event, which is an object that encapsulates all data needed to execute the event.

Executor (Controller): This component has the responsibility of executing a business event, invoking model behaviours following some predefined logic.

Business Object (Model): This component encapsulates data and functionality specific to the application. All business rules are defined in these objects and triggered from the executor to execute a business event.

View Selector (Controller): After the execution of a business event, this component gets the state of certain business objects and selects the response view (interface).

Navigational Node (Extended View): This component represents the product of the navigational logic of the application; it encapsulates attributes that have been obtained from some business objects and other navigational sub-components such as indexes, anchors, etc. This component has the contents to be shown by the response interface (JSP).

JSP (Extended View): This component generates the look-and-feel that the client component receives as a response to its request. To achieve this, it instantiates the corresponding navigational node component and adds the layout to the node's contents. Notice that the JSP component does not interact directly with model objects. In this way we can have different layouts for the same navigational node.
The actual values of navigation elements manipulated in the page are
stored in Java Beans, which correspond to the navigation nodes described
earlier. The element property, generated in the JSP file, contains calls to
the bean that the Tag Library uses to generate the HTML code seen by the user.
Our current implementation of the TagLib code simply wraps each element in a “DIV” tag with its own ID, whose CSS class is defined according to the element's abstract widget type. In this way, we can attach CSS style sheets to the generated HTML to produce the final page rendering.
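A TagLib along the lines just described could be as simple as the following sketch (JSP 2.0 SimpleTagSupport; the attribute names are ours):

import java.io.IOException;
import javax.servlet.jsp.JspException;
import javax.servlet.jsp.JspWriter;
import javax.servlet.jsp.tagext.SimpleTagSupport;

// Sketch of a concrete-widget tag: wraps its body in a DIV whose id is
// the element's own and whose CSS class is its abstract widget type, so
// that style sheets can drive the final rendering.
public class WidgetTag extends SimpleTagSupport {
    private String elementId;
    private String widgetType;   // e.g. "ElementExhibitor"

    public void setElementId(String elementId) { this.elementId = elementId; }
    public void setWidgetType(String widgetType) { this.widgetType = widgetType; }

    public void doTag() throws JspException, IOException {
        JspWriter out = getJspContext().getOut();
        out.print("<div id=\"" + elementId + "\" class=\"" + widgetType + "\">");
        getJspBody().invoke(null);   // renders the element's content in place
        out.print("</div>");
    }
}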
[Figure: the designer generates the abstract interface ontology instance from the OOHDM model according to SHDM; using the abstract widget ontology instance, JSP code is generated with specially defined TagLibs, one for each abstract interface widget; the TagLib code, together with the navigation objects and the mapping rule interpreter, generates the actual HTML code corresponding to each concrete widget, yielding the concrete interface instance.]
Fig. 10.24. Outline of the implementation architecture
Given the expressive power of CSS, the concrete page definition format
allows a large degree of flexibility for the graphic designer, both in terms
of layout itself and in terms of formatting aspects. Nevertheless, if a more
elaborate page layout is desired, it is possible to edit the generated JSP
manually, altering the relative order of generated elements. For a more
automated approach, it might be necessary to apply XSLT transformations
to the JSP.
10.4 Discussion and Lessons Learned
One of the main advantages of using a model-based approach for Web
applications’ design is the construction of a set of technology-independent
models that can evolve together with application requirements, and that are
largely neutral with respect to other types of changes in the application
(e.g. runtime settings change).
While working with the OOHDM approach we have found that stakeholders feel comfortable with our notation for requirements acquisition
(UID diagrams). In addition, we have used this notation several times to
discuss requirements and requirements evolution.
The transition from requirements to design can be managed in a seamless way (perhaps simpler than the transition to implementation). Regarding the implementation, we have found that the instability of frameworks
for Web application deployment usually hinders the use of model-based
approaches, as developers tend to devote much time to implementation and
to neglect design aspects. In this sense, we have tried to keep our notation
simple to make it easy to use.
10.5 Concluding Remarks
This chapter presented the OOHDM approach for building Web applications. We have shown with a simple but archetypical example how to deal
with the different activities in the OOHDM life cycle. We have also presented several guidelines that allow a designer to systematically map requirements to conceptual and navigational structures. Finally, implementation alternatives have also been discussed.
Web engineering is no longer in its infancy; many mature methods already exist and developers can base their endeavours on solid model-based
approaches like OOHDM and others in this book. The underlying principles behind OOHDM, essentially the clear separation of concerns (e.g.
conceptual from navigational and navigational from interfaces), allow not
only “just in time” development but also seamless evolution and maintenance of complex Web applications.
Acknowledgements
The authors wish to thank the invaluable help of Adriana Pereira de
Medeiros in preparing the example used in this chapter. Gustavo Rossi has
been partially funded by Secyt's project PICT No 13623, and Daniel
Schwabe has been partially supported by a grant from CNPq - Brazil.
References
1. Jacyntho MD, Schwabe D, Rossi G (2002) A Software Architecture for Structuring Complex Web Applications. Web Engineering, 1(1)
2. Moura SS, Schwabe D (2004) Interface Development for Hypermedia Applications in the Semantic Web. In: Proceedings of LA Web 2004, Ribeirão Preto, Brazil, IEEE CS Press, Los Alamitos, CA, pp 106–113
3. Rossi G, Schwabe D (1999) Web Application Models are more than Conceptual Models. In: Proceedings of the World Wide Web and Conceptual Modeling'99 Workshop, LNCS 1727, Springer, Paris, pp 239–252
4. Rossi G, Schwabe D, Lyardet F (1999) Integrating Patterns into the Hypermedia Development Process. New Review of Hypermedia and Multimedia, December
5. Schmid H, Rossi G (2004) Modeling and Designing Processes in E-commerce Applications. IEEE Internet Computing, January/February: 19–27
6. Schwabe D, Rossi G (1998) An Object Oriented Approach to Web-Based Application Design. Theory and Practice of Object Systems, 4(4):207–225
7. Schwabe D, Rossi G, Lyardet F (1999) Improving Web Information Systems with Navigational Patterns. Computer Networks and Applications, May
8. Schwabe D, Szundy G, de Moura SS, Lima F (2004) Design and Implementation of Semantic Web Applications. In: Proceedings of the Workshop on Application Design, Development and Implementation Issues in the Semantic Web (WWW 2004), CEUR Workshop Proceedings, http://ceur-ws.org/Vol-105/, May
9. Vilain P, Schwabe D, Souza CS (2000) A Diagrammatic Tool for Representing User Interaction in UML. In: Proceedings UML'2000, LNCS 1939, Springer, Berlin, pp 133–147
Authors’ Biography
Gustavo Rossi is a full Professor at Facultad de Informática of La Plata National
University, Argentina, and heads LIFIA, a computer science research lab. His
research interests include Web design patterns and frameworks. He coauthored the
Object-Oriented Hypermedia Design Method (OOHDM) and is currently working
on separation of design concerns in context-aware Web applications. He has a
PhD in Computer Science from Catholic University of Rio de Janeiro (PUC-Rio),
Brazil. He is an ACM member and IEEE member.
Daniel Schwabe is an Associate Professor in the Department of Informatics at
Catholic University in Rio de Janeiro (PUC), Brazil. He has been working on
hypermedia design methods for the last 15 years. He is one of the authors of
HDM, the first authoring method for hypermedia, and of OOHDM, one of the
mature methods in use by academia and industry for Web applications design. He
earned a PhD in Computer Science in 1981 at the University of California, Los
Angeles.
11 W2000: A Modelling Notation for Complex
Web Applications
Luciano Baresi, Sebastiano Colazzo, Luca Mainetti, Sandro Morasca
Abstract: This chapter presents W2000, a complete notation for modelling
complex Web applications. All W2000 concepts are based on a precise
meta-model that characterises the different notation elements and identifies
the relationships between them. After introducing the modelling concepts
and the hierarchical organisation of W2000 models, the chapter exemplifies the main modelling features through a case study and clarifies some
design alternatives. The chapter also describes the tool support offered by
W2000.
Keywords: W2000, Web development, Complex Web applications,
Application modelling.
11.1 Introduction
Web applications are complex software systems with Web-based user interfaces. They can be more data- or process-oriented, but in either case
they integrate the user experience provided by the Web with the capability
of executing distributed processes; the Internet glues together the two aspects [15].
The Web is an easy and simple way to allow users to access remote services without forcing them to install special-purpose software on their
computers. The browser renders the interface and lets the user interact with
the business logic. Such complexity must be suitably addressed from the
very beginning of the development process. Even though we can distinguish between navigation capabilities and business logic, the user must
perceive an integrated solution, where the two components are carefully
blended in a homogeneous product.
Pages must be functional to the services offered by the application, but,
at the same time, services must be structured such that they can be accessed through the pages. Even though the Web still privileges a user-centred approach to the design of Web applications, the page-first approach is not always the right choice.
The design of a complex Web application is, in our view, a software engineering problem. Many traditional methodologies and notations can be
used. However, the user interface plays a key role in the overall quality of
the application, the architecture is heavily constrained by technology, and
the lifetime –at least of the Web interface– is limited.
W2000 [2] is a complete notation for modelling complex Web applications. It borrows concepts from different domains, which are integrated as
a homogeneous solution. W2000 originates from HDM (Hypertext Design
Model [7]), i.e. from hypermedia and data-centric Web applications, but
also borrows principles from UML (Unified Modeling Language [6]) to
support the conception of business processes. W2000 allows the designer
to model all the aspects of Web applications, from Web pages to business
transactions, in a coherent and integrated way. It also adopts a model-driven approach [8] to allow designers to refine their models incrementally
and move smoothly from specification to design.
This chapter introduces the main concepts of W2000 through its meta-model. According to the Object Management Group (OMG)1 definition,
the meta-model defines the modelling elements of a notation without concentrating on their concrete syntax. Thus, the meta-model covers both the
hierarchical organisation of user specifications and the actual elements that
describe Web applications. The explicit availability of the meta-model is
important to help designers assess the consistency of their models and define automatic transformations between them. All defined models must
comply with the constraints set by the meta-model; transformations among
models are specified by means of special-purpose rules that work on the
meta-objects to create, modify, and delete them. Their purpose is the
automated creation of models as well as the derivation of new models from
existing ones. These rules, along with the meta-model that enforces consistency, are of key importance in the context of a family of applications,
where the same core functionality is embedded in a set of similar applications. For example, we can imagine that the adoption of a new device –say,
a PDA instead of a traditional PC– requires that the application be reorganised to cope with the specialties of the new channel (i.e. the small screen,
in this case).
All these concepts are available through the tool support offered by
W2000. Our prototype framework is implemented as a set of add-ons to
the Eclipse integrated development environment [4].
The chapter also describes a simple Web-based conference manager to
exemplify the main modelling features and discuss the rationale behind
them.
The chapter is organised as follows. Section 11.2 introduces the main
concepts behind W2000 through its meta-model, discusses the idea of consistent models, and introduces transformation rules as a means to support
evolution and adaptability. Section 11.3 clarifies the rationale behind
1 http://www.omg.org/.
W2000, sketches a high-level modelling process, and describes the supporting tools. Section 11.4 exemplifies the modelling features on the models of a simple Web-based conference manager. Section 11.5 concludes the
chapter.
11.2 Modelling Elements
The OMG organises models and meta-models around a four-level hierarchy [13]: objects (level 0) are instances of elements specified in a model
(level 1). Meta-models (level 2) define the languages used to render the
models and the meta-meta-model (level 3) defines the unique language that
must be used to define meta-models. OMG proposes MOF (Meta Object
Facility [13]) as the unique meta-meta-model and UML classes and objects
to render the modelling elements. MOF concepts can be seen as suitable
UML classes, objects as UML objects, but the elements that belong to
models and meta-models can be seen as both objects and classes. They are
objects when we consider them as instances of their higher level concepts,
but they become classes when we consider the modelling features they
offer. For example, a level 2 element is an object (instance) of a level 3
element, but it is also the class –something that can be instantiated– of
level 1 elements; that is, of the models that originate from the meta-model.
In this section, we describe W2000 as a meta-model, and later we demonstrate W2000 level 1 models in Sect. 11.4. Fig. 11.1 shows the hierarchical organisation of W2000 models2. All concepts are fully specified
with attributes and methods: interested readers can refer to [10] for a detailed description of all W2000 elements; here we only introduce concepts
informally.
A W2000 Model comprises some Models. Each Model has a predefined
Package, which acts as a root for the hierarchy of other Packages and
W2000 Elements that belong to the Model. This is implemented through
the abstract class Element with Package and all the W2000 Elements as
sub-classes. Elements belong to the Package in which they are defined, but
are rendered in Diagrams, which could also belong to different Packages.
2 For the sake of simplicity, the meta-models we present slightly simplify some relations and only assume multiplicities 1..n.
Fig. 11.1. W2000 hierarchy
Figure 11.2 shows the meta-model of package W2000 Elements and all
the concepts that come from it. Conceptually, the starting point is the
package Information, whose goals are the identification and organisation
of all the data that the application should deal with. The former goal belongs to the package Hyperbase, while the latter belongs to the package
Access Structures.
The package Hyperbase identifies the Entities that characterise the application. They define conceptual “data” that are of interest for the user.
Components are then used to structure the Entities into meaningful fragments. They can be further decomposed into sub-components, but the actual contents can be associated with leaf nodes only. Since a Component is
also a Generalisable Element, from the package Common Elements, it is
further decomposed into Slots and Segments.
Slots identify primitive information elements and are the typed attributes
that specify the contents of leaf components. Segments define “macros”,
i.e. sets of slots that can be reused in different elements. Both Slots and
Segments belong to package Common Elements.
Semantic Associations identify navigational paths between related concepts. Their sources and targets are Connectible Elements, i.e. Entities,
other Semantic Associations, or Collections, which are explained later in
this section.
An Association Centre –subclass of the abstract class Centre of the
package Common Elements– describes the set of “target” elements identified by a Semantic Association. In a 1 to n association, it defines how to
identify either the entire set of targets as a whole or each individual element in the set.
[Figure: package W2000 Elements. Package Information contains Hyperbase (Entity, with parent/child Components, Semantic Association and Association Centre, connecting source and target Connectible Elements) and Access Structures (Collection, with Collection Centre). Package Common Elements contains Connectible Element, Generalisable Element, Centre, Slot and Segment. Package Navigation contains Cluster, NLink (with source and target) and Node. Package Presentation contains Page (with a home Section), Section, Unit and Link. Package Services contains Process, Transition and Operation.]
Fig. 11.2. W2000 elements
The package Access Structures organises the information defined so far.
It specifies the main access points to the application and only comprises
Collections, which define groups of elements that should be perceived as
related by the user. Collections organise data in a way that complies with
the mental processes of the application domain. Also Collections can have
special-purpose centres called Collection Centres.
When we move to the package Navigation, we define how the user can
browse through the application. It reshapes the elements in the previous
packages to specify the actual information elements that can be controlled.
Nodes are the main modelling elements and define atomic consumption
units. Usually, they do not define new contents, but render information
already defined by Generalisable Elements. Clusters link sets of Nodes
and define how the user can move around these elements. Nodes and
NLinks identify the navigational patterns and the sequences of data traversed while executing Processes. This leads to organising Clusters in3:
structural clusters if all of their elements come from the same Entity; association clusters if they render Semantic Associations; collection clusters
if they describe the topology of a Collection; and transactional clusters if
they relate to the set of nodes that the user traverses to complete a Process
(i.e. a business transaction). Clusters identify only the main paths through
nodes; other relationships can be identified directly on the actual pages.
The package Services describes the Processes that can be performed by
the user on the application data. Each Operation can be part of a business
process, which is identified by a Process. Transitions identify the execution flow. Processes must be rendered in the navigation model through
suitable Clusters.
Finally, the package Presentation offers Units that are the smallest information elements visualised on pages. They usually render Nodes, but
can also be used to define forms, navigable elements, and labels. Sections
group related Units to better structure a page and improve the degree of
reuse of page fragments. They can be divided into contents sections, which
contain the actual contents of the application, and auxiliary sections, which
add further contents (e.g. landmark elements). Pages conceptually identify
the screens as perceived by the user. Links connect Pages and identify the
actual navigation capabilities offered to users. Links can also “hide” the
enabling of computations (i.e. Operations).
The pure class diagrams of Figs. 11.1 and 11.2 are not sufficient to fully
specify the notion of consistent model. Many constraints are already in the
diagrams in the form of multiplicities associated with each association, but
others must be stated externally by means of OCL (Object Constraint Language [6]) assertions. Among these, we can identify topological constraints, which must always hold true and complement those already embedded in the class diagram, and special-purpose constraints, which
impose specific restrictions on the notation and identify a new dialect.
If we consider the first set, one of the obvious constraints is that each element must be unique in its scope. For example, the following OCL invariant:
context Entity
inv: Entity.allInstances()->forAll(e1, e2 |
     e1.name = e2.name implies e1 = e2)
requires that Entity names be unique in a model. If two Entities have the
same name, they are the same Entity. An invariant is defined using inv, a
property that must always be satisfied for all the objects of the class (Entity, in this case). Similarly, we have defined invariants for all other
W2000 elements.
Special-purpose constraints characterise particular instantiations of
W2000. For example, in a simplified version of W2000 for small devices,
we could impose that each Page renders exactly one Section. This condition can be easily stated as an invariant associated with class Page:
context Page
inv: sections->size() = 1
In this case, we use the name sections to refer to the aggregation between Page and Section of Fig. 11.2.
The distinction between topological and special-purpose constraints allows
us to better tune the consistency of models. Several dialects can share the
same meta-model in terms of the topology of the class diagram, and also
some other OCL constraints, but they are characterised by special-purpose
restrictions.
The meta-model, along with its constraints, supplies the means to assess
the consistency of designed models. We can cross-check every model
against its definition (the meta-model) and see if the first is a proper instance of the second. The meta-model is necessary to pave the way to coherence and adaptability.
11.3 Models
W2000 fosters separation of concerns and adopts a model-view-controller approach. A complete W2000 model is organised in four models: information, navigation, services, and presentation. Information defines the data
used by the application and perceived by the user. Navigation and Services
specify the control; that is, how the user can navigate through information
chunks and modify them through suitable business processes. Presentation
states how data and services are presented to the user; that is, it specifies
pages and activation points for business services.
The same language can be used to specify each model at two different
abstraction levels. We use the term in-the-large when we refer to general
aspects, which are only sketched and in many cases are placeholders to
structure the whole application. We use the term in-the-small when we
fully specify all designed concepts.
Conceptually, a number of models can be designed in parallel, and, as often happens, designers are required to rework the same artefacts several times to accommodate the design decisions made while developing the application and to enforce consistency between the different parts. We propose
an iterative approach organised around the following steps:
• Requirements analysis, which is not addressed in this chapter, extends "conventional" requirements analysis to Web-based applications. It must cover the analysis of both navigational and functional requirements, which are complementary and intertwined. The analysis of navigational requirements has the goal of highlighting the main information and navigation structures needed by the different users of the application. The analysis of the functional requirements concentrates on the identification of the business processes, as perceived by the different classes of users.
• Hypermedia design starts with drafting the information, navigation, and presentation models. These in-the-large models embed a preliminary design of the Web application that is very close to the requirements and is mainly intended to focus on the essential properties of the Web application. Following a conventional model-driven approach [8], hypermedia models are refined to introduce all the details that must be set before implementing the application. This step produces the in-the-small version of the addressed models and requires more precision and completeness.
• Service design runs in parallel with hypermedia design and specifies the main business transactions supported by the application. It extends the standard specification of the procedural capabilities of a given application by adopting a user-oriented view and by blending the business logic with the user experience offered by the hypermedia parts of the application.
• Customisation activities, if needed, define those features that need to be specialised, their special-purpose contexts, and also the strategies to move from the initial models to their customised versions.
Not all of the steps must necessarily be executed for all applications. For instance, if we think of simple Web applications, we can easily concentrate on the presentation model and skip all the others. The set of design activities only defines a homogeneous framework that must be suitably adapted to the different situations.
Customisation activities, which are orthogonal to the main modelling activities, let the designer define which application features (content, navigation, presentation, and services) need to be specialised with respect to the context. Context here comprises all the aspects that concern the situation of use: device characteristics (e.g. traditional browser, PDA, or mobile phone), user preferences, etc. This activity results in special-purpose models, which can be generated automatically by means of transformation rules or defined manually according to particular design
decisions.
The problem of customising the design to the requirements of a particular context can be addressed in different ways:
• If customisation starts while designing the information model, the designer produces an information model for each context, that is, one specifying the content structures that are specific to each context. Thus special-purpose navigation and presentation models can be derived incrementally.
• If customisation is postponed to navigation, the designer specifies a single information model, which defines all possible content structures. It is while working on navigation aspects that this information is filtered and restructured according to the needs of every specific context. The result is a set of context-specific navigation models coupled with the corresponding presentation models.
• If customisation only addresses presentation, the designer produces a single information model and a single navigation model, but multiple presentation models. The specification of context-specific contents and links is only constrained in the presentation structures.
Customisation also affects the design of services. Different contexts (and thus different navigation and presentation models) may impose particular services and specific ways to interact with the user.
11.3.1 Adaptability
Given the organisation of W2000 models, where Navigation and Services
are built on top of Information, and Presentation exploits the two previous
models, we can identify two kinds of relationships between models:
• Horizontal relationships support customisation and relate different versions of the same model. For example, the Presentation for a PC-based application and that for a PDA-based system define a horizontal relationship.
• Vertical relationships relate two models in the hierarchy. For example, the Information and Navigation for a PC-based application define a vertical relationship.
Both relationships can be implemented by means of transformation
rules that work on instances of the meta-model. They add model elements
automatically. More generally, all modelling activities that are intrinsically
automatic can be rendered through rules. They help the designer save time
and produce correct models. For example, rules could add a component to
each entity, a node for each component, and a cluster for each entity, association, and collection in the model.
Transformation rules help customise and adapt (parts of) models. Adaptation can be required by new requirements or by the need to deliver a new member of the family by modifying some model elements with well-known patterns. For example, we can support a new device by reshaping
navigation and presentation models. Transformation rules also enforce the
use of modelling patterns. Instead of relying on the ability of designers to
embed significant patterns in their models, rules offer a ready-to-use
means to exploit them.
Finally, transformation rules can convert models, specified using a
given W2000 dialect, into models that comply with another dialect. This is
important because we want to stress the interchangeability among W2000
dialects and the fact that special-purpose applications (e.g. for particular
devices or with specific restrictions on accessibility) could motivate ad-hoc
dialects.
Even if users exploit these rules, they are free to modify their artefacts
by hand to change and complete them. As already stated, we want to make
the modelling phase easier and not completely substitute design intuitions
with machine-based rules. This is also supported by the idea that the approach can be adopted in different ways. At one end, it can be used to define a first framework for the application and leave plenty of room for the designer to complete it. At the other end, it could offer a complete library
of rules to produce the application almost automatically. In either case, the
meta-model oversees the correctness of produced models.
Given the meta-model presented in Fig. 11.2, we can design several different rules. A rule is a standard graph transformation production, rendered here by using a pair of UML object diagrams: the left-hand side describes the configuration (sub-graph) that must exist to apply the rule; the right-hand side describes how the sub-graph is modified by applying the rule. In other words, the former is the pre-condition and the latter is the post-condition associated with the rule. The meta-model supplies the type graph on which rules operate.
Here, we introduce rules informally and with the aid of an example. For
the sake of understandability, they do not deal with the structure (Models
and Packages), but we assume a simple flat organisation. If we added hierarchy, concepts would be the same, but the rule would become more complex and less readable.
The rule of Fig. 11.3 is a typical example of a vertical relationship; as a general solution, the compositions (black diamonds) of Fig. 11.2 are rendered with has labels in the rules. It
helps define the Navigation model by “refining” the components of the
Information model. It adds a new Node element that corresponds to a leaf
Component and the new Node inherits all the Slots that define the Component. Notice that since the cardinality of the set of Slots that belong to the
Component can vary, we use the UML multiobject to identify a variable
collection of objects.
1.leaf == true
3.name = 1.name + "Node"
3.minCard = 1
3.expCard = ?
3.maxCard = ?
3.comment = "automatically generated";
Fig. 11.3. Rule to create a new Node in the Navigation model given a leaf Component in the Information model (vertical refinement)
The rule comprises two object diagrams and two text blocks. The expression before the diagrams defines constraints on the attribute values of
the left-hand side elements to enable the rule. The block after the diagrams
defines how to set the attributes of the right-hand side elements. In this
case, the rule imposes that the Component be a leaf one and shows that the
name of the new Node is the name of the Component augmented with suffix Node. minCard is equal to one, expCard and maxCard are left unspecified, and comment says that the Node is generated automatically.
Designers can apply this rule to as many leaf Components as they want. A slightly more complex rule could enforce the iteration over all
leaf Components that belong to the model. This modification implies the
capability of programming the application of a rule a number of times that
cannot be fixed statically. This is beyond the scope of this section, but
interested readers can refer to [1] for a more detailed presentation of transformation rules and their applicability.
11.3.2 Tool Support
W2000 is supported by an innovative modelling toolset. According to the
architecture in Fig. 11.4, the user interacts with the toolset using the Editor, which is a W2000-specific graphical editor, implemented as an add-in
to Eclipse [4]. A first prototype of this component is available.
Designed models are stored in an MOF repository, implemented with
MDR/netbeans [12]. Topological constraints are embedded directly in the
meta-model, while special-purpose constraints are checked by an external
Constraints validator based on xlinkit [11]. The MOF repository is released and works in conjunction with the Editor, while the Constraints
validator is still under development.
Fig. 11.4. High-level architecture of the tool support
Both the Editor and the MOF repository support XMI (XML Metadata Interchange [14]) as an XML-based neutral format for exchanging artefacts and fostering the integration with other components (for example, automatic generators of code and documentation).
The Rule Engine is based on AGG (Attributed Graph Grammar System
[5]). It applies transformation rules on the instances of the meta-model.
This is not yet fully integrated with the Editor, but we have already conducted experiments with some rules.
11.4 Example Application
This section explains and exemplifies the main modelling features of W2000 through a simple Web conference manager, an application that helps chairs manage the organisation and bureaucracy associated with running a conference.
Briefly, a Web-based conference management system guides the different classes of users involved in a conference to accomplish their tasks.
This means that it must support authors while submitting papers, guide
programme committee members while reviewing papers, and help the general (programme) chair select papers and set up a programme. Involved
roles impose constraints on the way they can use the application. Generic
users should only be allowed to browse through the public pages that advertise the conference and contain the “usual” information associated with
a conference. These users should not be allowed to browse submitted papers and reviews. Authors should be able to access the information about
their papers, but not that of other papers nor the information about the reviewing process. Programme Committee members (PC members) should
see all papers and optionally reviews, except those for which they have
declared conflicts of interest. The chair must have full access to the application. After papers are accepted, the application should notify all
the authors, asking authors of accepted papers for the camera-ready version of their submissions.
Fig. 11.5. Hierarchical organisation of the application models (generic user)
Another important requirement is that users can access the application
through different devices: for example, they can use conventional PCs or
more advanced PDAs. Devices and user profiles add interesting modelling
dimensions: this means special-purpose models in W2000. For example,
Fig. 11.5 shows the organisation of the models of the Web conference
system for the generic user. In this case, we assume a single information
model, and we distinguish between navigation and presentation models for
PCs and those for PDAs. We do not detail the service model to keep the
figure simple; we will discuss it in Sect. 11.4.4.
11.4.1 Information Model
The information model comprises the identification of the contents of the
application and its high-level structures. It describes the macro-categories
of information objects (entities according to the W2000 jargon) needed by
the different users, the relationships between them (semantic associations),
and the main ways they can be grouped (collections). Entities and semantic
associations are described using hyperbase diagrams, while collections are
described in access diagrams. The different roles, and thus the need for
adapting the design artefacts, impose different hyperbase and access diagrams for each context (i.e. role in this case). In this example, we start by
concentrating on the hyperbase diagram of the conference chair (shown in
Fig. 11.6). This diagram is also the global diagram from which we can
derive the diagrams specific to the other roles.
Fig. 11.6. Global hyperbase diagram (in-the-large view)
Paper, Author, Tutorial, PCMember, and Review are the entities for
which we assume the existence of several instances. This is why each
name is followed by the minimum, maximum, and average number of
instances. These figures can only be estimated at this stage, but it is important to start thinking of the expected number of elements both to design the
database and to work on the navigation and presentation.
In contrast, ConferenceIntroduction, ConferenceLocation, HowToSubmit, HowToReview, and ConferenceChair are entities for which we imagine a single instance; that is, they are singletons in the information space
and thus do not need any cardinality. The diagram is completed by the
semantic associations that link the different entities. The absence of connections among these entities means that we do not foresee any semantic
relation between them. They will be related to the other elements by means
of suitable links while modelling navigation and presentation.
The common hyperbase diagram is the starting point to define the customised versions of the particular roles. The generic user, for example, can
only browse the public information about the conference, i.e. entities ConferenceLocation, ConferenceIntroduction, ConferenceChair, and HowToSubmit, and also the instances of the entities Paper, Tutorial, and Author
as soon as they become available (i.e. the review process is over and final
decisions are taken). Information about PCMembers and Reviews will
never be available to this class of users.
Figure 11.7 presents the hyperbase diagram from the viewpoint of the
generic user: the entities and semantic associations that cannot be “seen”
by these users are deleted from the diagram.
Fig. 11.7. Hyperbase diagram (in-the-large) for generic users
This is a case of simple derivation, where the hyperbase diagram of Fig.
11.6 acts as a starting point for all the other specialisations. In other cases,
the starting point can be a particular view of the system, and thus the other views not only filter the contents of the starting diagram, but also add special-purpose elements.
Each entity, represented in the hyperbase diagrams, needs to be structured in terms of components to organise its content into meaningful parts. For example, Fig. 11.8 presents the entity Paper and its three components: Abstract, Submission, and CameraReady. Cardinalities prescribe the minimum, maximum, and average number of components of the same type associated with a single instance of the root entity. This does not mean that the three components must be defined simultaneously and are available to all users; Fig. 11.8 only specifies the structure of all Paper entities. Cardinalities say that all papers must have an Abstract and a first Submission, but only accepted papers also have a CameraReady (this is why its minimum cardinality is equal to zero).
The definition of collections leads to access diagrams. A collection
represents a container of entities or other collections. These elements,
called collection members, can be selected and organised according to
different criteria. Given the roles of our conference manager, we need special-purpose collections for each role. For example, generic users might be
interested in skimming through all accepted papers, or all the papers by a
given author, while PC members may be interested in navigating through
the papers they are supposed to review.
Fig. 11.8. Component tree for entity Paper
Fig. 11.9. Collections AllPapers (a) and PapersByAuthor (b)
Collections can be shared among roles: for example, all users perceive
the same collection AllPapers (Fig. 11.9(a)). Given the user perspective
adopted by W2000, we can foresee collections like AllPapers, which are
instantiated once for the whole application, collections like PapersToReview, which are instantiated once for each PC member (or, more generally, for each user), and collections like PapersByAuthor (Fig. 11.9(b)),
which can be instantiated several times for the same user since he/she can
ask for the papers of different authors, and thus create new instances of
the same collection.
Hyperbase and access diagrams can be refined to specify the slots associated with the various information structures. This in-the-small activity
completes the definition of the elements identified so far. Slots can be either atomic or structured, but W2000 does not provide any built-in library
of slot types. Designers are free to define their own sets of types. As an
example, Fig. 11.10 shows the slots of Abstract components. Types are
defined only informally, but we need to distinguish among: slots like number or title whose type is primitive; slots like mainTopic or submissionCategory, which are strings of a given length; and slot author, which is
compound and whose structure is represented by the sub-tree drawn below
the slot name. Cardinalities have the usual meaning of setting the minimum, maximum, and expected number of elements. In this case, each Paper must have at least one author, no more than five authors, and we estimate an average number of three authors.
Abstract
  Number: integer
  Title: text
  Author [1:5,3]
    Name: string[50]
    Affiliation: string[50]
    Address: text
    Email: string[100]
  Abstract: text
  MainTopic: string[20]
  SecondaryTopic: string[20]
  SubmissionCategory: string[20]
Fig. 11.10. Slots for component Abstract
Slots and operations define the actual structure of components and entities, but also specify the centres of semantic associations and collections.
Centres are information structures that users exploit for navigating the
associations or collections of interest. They contain the slots that describe
the members of the association (or collection). These slots are partially
borrowed from the members and can also be introduced explicitly to
characterise the container (either an association or a collection). The centres of semantic associations also identify the directions through which
the associations can be navigated: bi-directional associations imply two
different centres. If we consider association Authorship of Fig. 11.6,
which is fully detailed in Fig. 11.11, we can use centres HasWritten and
WrittenBy to allow the information flow from Author to Paper and the
other way around.
The last elements that the designer must specify are the segments. They
are introduced to make the design more efficient and maintainable, by
improving the reuse of definitions. A segment groups a set of related slots
and makes them become a single element that can be reused as such in
different parts of the model. Using a segment corresponds to using the
entire group of slots associated with it. For example, we can define the
segment PaperShortIdentification as the union of the slots Title, Author.Name, and MainTopic of entity Paper. This segment can then be used
to characterise centre HasWritten, but also collections AllPapers, PapersByAuthor, or PapersToReview: they all need a short description of the
papers they contain.
Fig. 11.11. Semantic association Authorship
11.4.2 Navigation Model
After the information model, we need to define how the contents are organised for consumption. By means of a navigation model, designers specify the
navigation structures that describe how the user navigates the contents by
exploiting the relevant relationships defined in the information model.
Semantic associations and collections are the starting points to implement
the hypertext that specifies how the user can navigate the application.
W2000 introduces the concepts of node and navigation cluster. A node
corresponds to the elementary granule of information from/to which the
user can navigate. It is an aggregate of contents meaningful to the user in a
given situation and that cannot be broken into smaller pieces. Instead, a
cluster represents a cognitively well-defined interaction context; it is a
container that groups a set of closely related nodes. A cluster aggregates
the nodes that come from an entity (structural cluster), a semantic association (association cluster), a collection (collection cluster), or a process
(process cluster). The overall navigation across the application is determined by shared nodes that belong to different clusters.
Navigation in-the-large means identifying which clusters are derived
from the information structures and which nodes belong to each cluster.
Structural clusters require that the designers identify the first node, i.e. the
node to which users should be moved if they are interested in the entity
rendered by the cluster. Designers must also define the connections among
nodes, to state how users can move from one node to the others, and the
content of each node as a list of slots or segments. The navigation among
nodes can be expressed by using navigation patterns (e.g. index or guided
tour). This in-the-small activity completes the navigation model.
Clusters and nodes are derived from the information model through a
set of rules and design heuristics. For example, in the simplest case, we
could imagine that each entity motivates a structural cluster, whose nodes
correspond to the leaf components of the entity (Fig. 11.12(a)). We also
assume that each node is reachable from all the other nodes in the cluster.
However, this hypothesis does not hold if we need to reorganise the contents with a finer granularity. Given the information model of Sect. 11.4.1,
we can say that the rule that associates a cluster with each entity works
well if the user uses a PC-based Web browser. If the user moves to a PDA,
the designer might prefer to split the information about papers into two
nodes (Fig. 11.12(b)): node Introduction contains information about the
author and the main topics of the paper and node Abstract contains the
abstract of the paper.
To derive association clusters from semantic associations, we can say
that the user can navigate from each node of the source structural cluster
(i.e. the source entity of the association) to a centre node, derived from the
association centre, and then, after selecting an item, to the default node of
the target structural cluster. Figure 11.13 exemplifies this solution on the
semantic association WrittenBy: the user can navigate from each node of
cluster Paper to node ListOfAuthors and, after selecting an author, to node
ShortDescription of cluster Author. The dashed rectangles correspond to
already defined clusters with which cluster WrittenBy is linked.
Fig. 11.12. Structural cluster Paper: (a) PC version, with nodes Abstract, Submission, and CameraReady; (b) PDA version, with nodes Introduction and Abstract. The different cluster types use special-purpose symbols in the upper left corner of the rectangle
We also need to specify how the user can navigate the nodes of collection clusters. For example, Fig. 11.14 shows the case of collection AllPapers. Figure 11.14(a) shows the collection cluster designed under the assumption that the user accesses the application through a PC-based Web browser, while Fig. 11.14(b) shows the same collection cluster modelled for PDAs. In the first case, users can navigate from node ListOfPapers,
derived from the collection centre, to node Abstract of the selected paper
and back. In the second case, the collection centre is rendered with three
nodes (i.e. the list of papers is split into three nodes), and users can navigate from each node to the next/previous node or they can navigate to node
Introduction of the selected paper.
Fig. 11.13. Association cluster WrittenBy
Fig. 11.14. Collection cluster AllPapers: (a) PC version; (b) PDA version
Finally, to define the navigation steps implied by business processes, we
need to introduce process clusters, which are the bridge between the hypertext and operations and organise the nodes involved in the execution of a
process. Nodes can come from the information model, and be already used
in other clusters (since they are part of the hypertext behind the application), or they can be added explicitly to define information units that are
needed to complete the process (i.e. forms that collect the actual parameters of a computation before sending them to the system).
Process clusters describe the information units touched by the user during the execution of a process, along with the navigation steps. In this case,
navigation links must also consider anomalous or error conditions: for
example, what happens if the user wants to log into the application, but the
username is wrong, or what happens if one of the actual parameters supplied to the operation is incorrect?
To complete the in-the-small view of the navigation model, the designer
must specify the slots associated with each node. We can imagine a
straightforward derivation from the information model, but also something
more articulated. In the former case, for example, the slots of node Abstract of cluster Paper can be derived automatically from the component
of entity Paper with the same name. In the latter case, the same slots can
be accommodated on nodes Introduction (slots Title, Author, and MainTopic) and Abstract (slots Title and Abstract).
Notice that in some cases the information in the navigation model (for example, for small devices) can be a subset of that in the information model, or of that in a different navigation model. Moreover, the navigation model
may introduce “extra” contents that were not envisioned while conceiving
the information model.
11.4.3 Presentation Model
The presentation model defines how the information and navigation models are rendered in pages. It does not describe the final layout of Web
pages, but it only concentrates on the pages’ content. The notion of page in
W2000 corresponds to the intuitive notion of a Web page that contains
several pieces of information, links, and hooks for operations. Pages are
organised hierarchically as aggregations of sections, which in turn aggregate other sections or units. A section is a homogeneous portion of a page
with a specific goal: it contains a set of related data, coordinates links to
other pages, or activates operations. The different sections of a page are
often completely unrelated.
The units are the smallest elements that we identify in a page. Content
units deliver the application contents and are basically nodes of the navigation model. The content units of a section correspond to nodes of the same
cluster. Decorator units deliver new contents: they embed layout content
defined for pure aesthetic/communication reasons. Interaction units are
pure interaction placeholders: they are graphical elements that embed links
to other pages or the capabilities of triggering operations.
The presentation model contains the set of pages that constitute the user
experience supplied by the application. Pages usually come from the structures defined in the navigation model. The in-the-large design ends with
the identification of the sections that belong to each page. The in-the-small
refinement adds the publishing units to the sections previously identified.
The designer must also include those pages that support the execution flow
of operations and processes. Usually, the designer is not able to identify all
these pages without concentrating on the single business processes. This is
why the actual process can be summarised as follows:
• The designer specifies the interaction units to interact with operations and processes while conceiving the presentation model.
• The designer defines the details of operations and processes in the service model, where he/she specifies the execution flow of each process and intertwines operations and pages. These pages can be defined in the presentation model, but can also be new since they were not foreseen. If these pages only contain the interaction units that govern operations, the designer must rework just the presentation model.
• The designer can also rework the navigation model, to add new nodes or process clusters, if the design of the different processes highlights holes in the blending between navigation and operations. These changes must then be transferred to the presentation model.
Moving to the example application, the structural cluster Paper could
motivate three pages if we consider a conventional browser (AbstractPage,
CameraReadyPage, and SubmissionPage) and two pages for a PDA version
(IntroductionPage and AbstractPage). The designer should also decide to
structure each page in two sections: section Landmark, which allows the
user to trigger operations and contains some links to other relevant sections
of the application; and section Content, which conveys the actual information about the paper. Figure 11.15 shows the in-the-small definition of page AbstractPage: section Landmark contains an interaction unit, which allows users to log into the application, and a decorator unit, which contains the conference logo and allows users to navigate to the HomePage. Section Content, on the other hand, embeds the content unit Paper.Abstract (derived from node Abstract of cluster Paper) and the content unit WrittenBy.ListOfAuthors (derived from node ListOfAuthors of association centre WrittenBy).
Fig. 11.15. Page Abstract (PC version)
In contrast, Fig. 11.16 represents the in-the-small specification of the IntroductionPage of the presentation model for the PDA version of the application. Although it contains the two sections Landmark and Content, the
first section only embeds a decorator unit with the conference logo, while
the content section only has the content unit Paper.Introduction (derived
from node Introduction of cluster Paper). This is the starting point to let
the user navigate to page AbstractPage or to page WrittenByPage with the
list of the authors of the paper. The restricted display dimensions require
that the list of authors be displayed in another page: this is different from
the PC-based version.
Fig. 11.16. Page Introduction (PDA version)
11.4.4 Service Model
The service model complements and extends the hypermedia part of the
Web application. It comprises the definition of the business processes supplied by the application, along with the operations
needed to implement them.
W2000 identifies a preliminary taxonomy for Web operations [2] and
the requirements that they induce. Web operations can allow users to:
• Select the actual parameters to complete an operation. In many cases, while users browse the information repository, they also collect the data that will become the actual parameters of their operations. For example, when they select a book they want to buy, they identify it as the parameter of the buy operation. Even if users perceive they are navigating, they change the way the available information is organised (according to the W2000 jargon, they change the collections defined for the application). This may also lead to changing the state of the pointed element (e.g. the application immediately decrements the number of available copies of the selected book) and the way users can navigate through the application (e.g. the application forbids users from navigating through the pages of unavailable books).
• Change the way users can navigate through pages. Even if operations do not change the application contents, they can guide the user while navigating through the pages. Specific choices could allow some links, but disallow others. For example, the operation could change the order used to present the elements of a collection: books could be presented by title or by author. In this case, we would not change the elements in the collection, but simply the links among them.
• Enter new data into the system. For example, all pages that embed forms implicitly provide these operations. However, this means that if we have forms, the application data are augmented and changed. It could also be the case that not all the inserted data become "navigable", i.e. they are not rendered in Web pages. In many cases, when we supply a system with our personal data to register ourselves, we cannot browse them.
• Perform complex state-aware computations. For example, consider applications that log their users and adjust what they can do with respect to what they have already done. Otherwise, we can mention those applications that embed a high degree of computation, such as billing or special-purpose applications. These operations must store the state of the computation for the particular user, but they can also compute complex algorithmic tasks that could hardly be carried out on a DBMS.
Besides these services, we can also think of other operations that we do
not want to consider now. The first group can be seen as fake navigation:
for example, when users graphically reshape the elements they are browsing. The second group comprises all those operations that we could term as
advanced; that is, those that deal with customising the application with
respect to specific contexts [9]: devices, user profiles, quality of service,
etc. Finally, we should model the fact that sets of operations should be
considered logical transactions.
W2000 supports Web operations with two different types of services:
simple operations, which are atomic (with respect to their execution) computational steps that can be invoked by the user; and processes, which are
not atomic, and can be seen as business transactions. Simple operations are
triggered by the user through the interactive units added to pages. The designer can only control a page’s output and use it to decide upon the next
steps in terms of presentation and navigation flows. Processes require a
finer-grained interaction: they are usually composed of simple operations,
but their workflow must be suitably supported by navigation and pages.
We use pre- and post-conditions to specify simple operations, and collaboration and activity diagrams to model processes.
The complete presentation of the extensions to UML diagrams and of
the language we use for pre- and post-conditions is beyond the scope of
this chapter. Here we concentrate on the example application to describe
the main concepts. The conference manager supplies, among others, the
following services:
• registration is an operation that allows generic users to register for the conference. After registering, users become authors and can submit papers.
• login is a single operation that allows generic users to log into the application. The actual pages and services that they can access depend on their roles and the devices they use.
• submitPaper is an operation that allows authors to submit their papers.
• reviewPaper is a process that allows PC members to submit their paper reviews.
• assignPaper is an operation that allows the conference chair to assign papers to PC members.
• defineFinalProgram is a process that allows the conference chair to define the final programme of the conference.
Operations are specified using an OCL-like assertion language, extended with special-purpose keywords. Figure 11.17 presents a possible
definition for operation submitPaper, as if it were a single operation.
context: submitPaper(a: Author, p: Paper)
pre:
  registeredAuthors->exists(ra | ra = a) AND
  currentPage = "PaperSubmissionPage";
post:
  submittedPapers += p AND
  p.links->exists(l | l.target = a) AND
  a.links->exists(l | l.target = p)
Fig. 11.17. Operation submitPaper
This contract states that the operation can be invoked if the author is
registered and is browsing page PaperSubmissionPage. The post-condition
states that the paper p is added to the collection of submittedPapers and
suitable links are added to connect the newly added paper with the author
that submitted it. Notice that adding the paper to the proper collection also implies creating the new components that characterise the entity.
Processes are suitable aggregations of operations and must be described
either through activity diagrams or through collaboration diagrams. In the
former case, the designer wants to concentrate on the execution flow of the
process and does not consider the right sequence of clusters and pages.
There must be a consistent process cluster that specifies how the execution
flow described by the activity diagram is supported in terms of nodes (and
thus pages). In the latter case, the designer uses a collaboration diagram to
show the execution flow in terms of the single operations that are triggered
on the particular interaction units, the way the user can navigate the different pages to reach the units of interest, and the elements that are created
while executing the operations.
For example, if we consider process reviewPaper (see Fig. 11.18), we
can describe it as a sequence of three operations: selectPaper, downloadPaper, and submitReview, but we allow the PC member to leave the process after downloading the paper, or to submit the review directly after
identifying the paper. This description must be suitably paired with a process cluster that describes the navigation behind the process.
Fig. 11.18. Process reviewPaper
The activity diagram in Fig. 11.18 shows how single operations can be
organised in more complex computations. The same operations can be
used as building blocks in different processes: for example, operation selectPaper can be the starting point of all the processes that manage papers.
The composition, i.e. the actual process, can change because of the different roles or, in general, because of the context in which the process is executed. For example, we can imagine that processes executed on PDAs must be simpler than those on PCs. The simplification may concern only the process, or it may require that smaller operations be conceived. This is the case, for example, of all those operations that let the user provide inputs to the application: the smaller the screen is, the simpler the operation must be. We cannot expect to collect the same amount of data with a single PC-based page and with a PDA-based page.
11.5 Conclusions and Future Work
The chapter presents W2000, along with its main modelling features. Although space limitations do not allow us to explain some key features in depth, like the support for application families and transformation rules, the
chapter presents a wide introduction to W2000. Its precise meta-model and
significant excerpts of the models and diagrams of an example application
help the reader understand the key elements.
The actual use of W2000 in some industrial projects is giving encouraging results and is paving the way to our future work. In some cases, industrial partners have highlighted the complexity of the modelling notation as a
possible negative aspect, and these comments are motivating the development of a lighter version of W2000 that, coupled with a heavy use of transformation rules, should better assist the designer while conceiving new applications. We are also exploring the idea of swapping the viewpoint
and thinking of a complex Web application as a workflow supported by
Web pages, but this is still in its infancy and needs further studies.
References

1. Baresi L, Colazzo S, Mainetti L (2005) Beyond Modeling Notations: Consistency and Adaptability of W2000 Models. In: Proceedings of the 20th Annual ACM Symposium on Applied Computing, Track on Web Technologies and Applications, ACM Press, New York (to appear)
2. Baresi L, Denaro G, Mainetti L, Paolini P (2002) Assertions to Better Specify the Amazon Bug. In: Proceedings of the 14th International Conference on Software Engineering and Knowledge Engineering, ACM Press, New York, pp 585–592
3. Baresi L, Garzotto F, Maritati M (2002) W2000 as a MOF Metamodel. In: Proceedings of the 2002 World Multiconference on Systemics, Cybernetics and Informatics, vol 1
4. Eclipse consortium (2005) Eclipse home page. www.eclipse.org/
5. Ermel C, Rudolf M, Taentzer G (1999) The AGG Approach: Language and Tool Environment. In: Ehrig H, Engels G, Kreowski H-J, Rozenberg G (eds) Handbook of Graph Grammars and Computing by Graph Transformation, vol 2: Applications, Languages, and Tools, World Scientific, Singapore, pp 551–601
6. Fowler M (2004) UML Distilled. Addison-Wesley, Reading, MA
7. Garzotto F, Paolini P, Schwabe D (1993) HDM: A Model-Based Approach to Hypertext Application Design. ACM Transactions on Information Systems, 11(1):1–26
8. Gerber A, Lawley MJ, Raymond K, Steel J, Wood A (2002) Transformation: The Missing Link of MDA. In: Proceedings of the 1st International Conference on Graph Transformation (ICGT 2002), LNCS 2505, Springer-Verlag, Berlin, pp 90–105
9. Kappel G, Pröll B, Retschitzegger W, Schwinger W, Hofer T (2001) Modeling Ubiquitous Web Applications: A Comparison of Approaches. In: Proceedings of the Third International Conference on Information Integration and Web-based Applications and Services, pp 163–174
10. Maritati M (2001) Il Modello Logico per W2000. MSc thesis, Università degli Studi di Lecce - Politecnico di Milano
11. Nentwich C, Capra L, Emmerich W, Finkelstein A (2002) xlinkit: A Consistency Checking and Smart Link Generation Service. ACM Transactions on Internet Technology, 2(2):151–185
12. netBeans.org (2005) Metadata Repository (MDR) home. mdr.netbeans.org/
13. Object Management Group (2002) Meta Object Facility (MOF) Specification v1.4. Technical report, OMG, March
14. Object Management Group (2002) XML Metadata Interchange (XMI) Specification. Technical report, OMG
15. Powell TA (1998) Web Site Engineering: Beyond Web Page Design. Prentice Hall, Upper Saddle River, NJ
Authors’ Biographies
Luciano Baresi is an Associate Professor at Dipartimento di Elettronica e Informazione at Politecnico di Milano, where he earned both his Laurea degree and
PhD in Computer Science. He was also junior researcher at Cefriel (a research
consortium among technical universities and industry in the Milan area) and Visiting Professor at University of Oregon at Eugene (USA), University of Paderborn (Germany), and University of Buenos Aires (Argentina). He has published
and presented some 50 papers in the most important national and international
journals and conferences. He served as programme co-chair of ICECCS 2002
(International Conference on Engineering Complex Computer Systems), GT-VMT 2001 (International Workshop on Graph Transformation and Visual Modeling Techniques, co-located with ICALP), UMICS 2003 and 2004 (the CAiSE Workshop on Ubiquitous Information and Communication Systems), the WISE Workshop on Mobile and Multichannel Information Systems and the ICWE Workshop
on Web Quality. He has been a PC member for several conferences: among them,
WWW, ICWE, SAC, and GT-VMT. His research interests are on Web engineering, with special emphasis on modeling complex applications, validation, and
quality estimation.
Sebastiano Colazzo is a PhD student in Computer Science at the Polytechnic of
Milan. He graduated in Computer Science at the University of Lecce (Italy), with
a thesis on databases. He works for HOC (Hypermedia Open Center) Multimedia
Lab at the Polytechnic of Milan as a consultant and researcher on various projects
(both research and industrial) in the Web application fields. His interests span
Web technology and ubiquitous Web application design, tools for the design and prototyping of Web applications, application usability, and conceptual and logical
design.
Luca Mainetti is an Associate Professor in the Department of Innovation Engineering at the University of Lecce (Italy). His research interests include Web design methodologies, notations and tools, Web and service-oriented architectures
and applications, and collaborative computer graphics. He received a PhD in computer science from Politecnico di Milano (Italy). He is a member of the IEEE and
ACM. Contact him at luca.mainetti@unile.it
Sandro Morasca is a Professor of Computer Science at the Dipartimento di
Scienze della Cultura, Politiche e dell'Informazione of Università degli Studi dell'Insubria in Como, Italy. In the past, he was with the Dipartimento di Elettronica e
Informazione of Politecnico di Milano in Milan, Italy. He was a Faculty Research
Assistant and later a Visiting Scientist at the Department of Computer Science of
the University of Maryland at College Park, and a Visiting Professor at the University of Buenos Aires, Argentina. He has published around 20 papers in international journals (eight of which are in IEEE or ACM Transactions) and around 40
papers in international conferences. In his research and professional activities, he
has investigated the theoretical and practical aspects of measurement in several
software engineering areas and in Web engineering, and has been involved in
several projects with software companies and the public administration. He has
served on the Programme Committee of a number of software engineering conferences, including ICWE and METRICS, the International Workshop on Software
Metrics. He was the General Chair for METRICS 2005, which was held in Como,
Italy, in mid-September 2005. Sandro Morasca serves on the Editorial Board of
“Empirical Software Engineering: An International Journal,” published by Kluwer. He organised, with Luciano Baresi, a workshop on Web quality at ICWE
2004, held in Munich, Germany.
12 What You Need To Know About Statistics
Katrina D. Maxwell
(Adapted from Maxwell, Katrina D., Applied Statistics for Software Managers, 1st Edition, ©2002, by permission of Pearson Education, Inc., Upper Saddle River, NJ. The original chapter has been adapted to this book by Emilia Mendes.)
Abstract: How do you measure the value of data? Not by the amount you
have, but by what you can learn from it. Statistics provides a way to extract valuable information from your data. It is a science concerned with
the collection, classification, and interpretation of data according to well-defined procedures. For a manager, however, statistics is simply one of
many diverse techniques that may improve decision-making.
The purpose of this chapter is to develop a deeper understanding of the
statistical methods used to analyse software project data. The methods used
to analyse software project data come from the branch of statistics known
as multivariate statistical analysis. These methods investigate relationships
between two or more variables. However, before we delve into detailed
explanations of chi-square tests, correlation analysis, regression analysis,
and analysis of variance, you need to understand some basic concepts.
Keywords: Statistical concepts, Regression, Correlation, Distribution,
Sampling.
12.1 Describing Individual Variables
In this section, you will learn how to categorise and meaningfully summarise data concerning individual variables.
12.1.1 Types of Variables
All data is not created equal. Information can be collected using different
scales. This has an impact on what method you can use to analyse the data.
There are four main types of scales: nominal, ordinal, interval, and ratio.
Nominal scales – Variables such as business sector, application type, and
application language are nominal-scale variables. These variables differ in
kind only. They have no numerical sense. There is no meaningful order.
For example, let’s say that a business sector has four categories: bank,
insurance, retail, and manufacturing. Even if we label these with numbers
instead of names in our database (say 101, 102, 103, and 104), the values
of the numbers are meaningless. Manufacturing will never be “higher”
than bank, just different.
Ordinal scales – The values of an ordinal-scale variable can be ranked in
order. The 10 risk factors discussed in Chapter 5 of my book “Applied
Statistics for Software Managers” are ordinal-scale variables. It is correct
to say that Level 5 is riskier than Level 4, and Level 4 is riskier than Level
3, and so on; however, equal differences between ordinal values do not
necessarily have equal quantitative meaning. For example, even though
there is an equal one-level difference between 3 and 4, and 4 and 5, Level
4 may be 50% more risky than Level 3, and Level 5 may be 100% more
risky than Level 4.
Interval scales – The values of an interval-scale variable can be ranked in
order. In addition, equal distances between scale values have equal meaning. However, the ratios of interval-scale values have no meaning. This is
because an interval scale has an arbitrary zero point. A start date variable
is an example of an interval-scale variable. The year 1993 compared to the
year 1992 only has meaning with respect to the arbitrary origin of 0 based
on the supposed year of the birth of Christ. We know that 1993 is one year
more than 1992, and that 1991 is one year less than 1992. Dividing 1993
by 1992 makes no sense. For example, we could decide to make 1900 year
zero and count from there. In this case, 1993 would simply become 93 and
1992 would become 92 in our new scale. Although in both cases there is a
one-year difference, the ratio 1993/1992 does not equal the ratio 93/92.
Another example of an interval scale is a Likert-type scale. Factors are
rated on a scale of equal-appearing intervals, such as very low, low, average, high, and very high, and are assigned numerical values of 1, 2, 3, 4,
and 5, respectively. However, in real life, it is virtually impossible to construct verbal scales of exactly equal intervals. It is more realistic to recognise that these scales are approximately of equal intervals. Thus, a Likert
scale is really somewhere between an ordinal scale and a true interval scale.
Ratio scales – Variables such as effort, application size, and duration are
measured using a ratio scale. Ratio-scale variables can be ranked in order,
equal distances between scale values have equal meaning, and the ratios of
ratio-scale values make sense. For example, it is correct to say that an application that required 10 months to develop took twice as long as an application that took 5 months. Another ratio scale is a percentage scale. For
example, the percentage of COBOL used in an application is also a ratio-type variable.
What You Need To Know About Statistics
367
A summary of variable type definitions is presented in Table 12.1.

Table 12.1. Summary of variable type definitions

                 Is there a      Do equal distances       Does the
                 meaningful      between scale values     calculation of
Variable type    order?          have equal meaning?      ratios make sense?
Nominal          No              No                       No
Ordinal          Yes             No                       No
Quasi-interval   Yes             Approximately            No
Interval         Yes             Yes                      No
Ratio            Yes             Yes                      Yes
I often refer to variables as being either numerical or categorical. What
do I mean by a numerical variable? I mean a variable that has numerical
sense. It can be ordered in a meaningful way. Variables measured using
the ordinal, interval, or ratio scales are numerical-type variables. What do I
mean by a categorical variable? A categorical variable cannot be interpreted in a quantitative sense. We know there are different levels, but we
cannot answer the question “How much of a difference exists between two
levels?” Variables measured using the nominal or ordinal scales are categorical variables. Categorical variables are also referred to as qualitative or
non-metric variables. Non-categorical variables are often described as
quantitative or metric variables.
12.1.2 Descriptive Statistics
The purpose of descriptive statistics is to meaningfully summarise large
quantities of data with a few relatively simple terms. It is important to fully
understand these terms because they are used in many statistical methods.
In addition, descriptive statistics can be used to present easily understandable summary results to decision-makers. They provide answers to questions such as: What was the percentage of projects developed using XYZ?
This corresponds to how many projects? What is a typical project? What
was the smallest or largest project we ever developed? Are our projects
fairly similar in size or do they vary a lot? You can learn an enormous
amount about your data just from descriptive statistics.
Describing the Average
Three measures, the mode, the median, and the mean, can be used to describe a typical project. These measures are often referred to as measures
of central tendency.
Mean – Here, we are referring to the arithmetic mean, which is the most
common measure. It is what we usually consider to be the “average” in our
daily lives. It is computed by adding together the observed values and dividing by the number of observations. For example, consider the ages of
five software developers: 20, 25, 25, 30, and 45. The mean is calculated by
adding all the ages together and dividing by 5 (see Eq. 12.1):
\frac{20 + 25 + 25 + 30 + 45}{5} = 29 \qquad (12.1)
The mean is 29 years.² The mean is expressed mathematically by the following formula:

\bar{x} = \frac{\sum x_i}{n} \qquad (12.2)

The mean is represented by x̄. The age of each software developer is considered to be an observation value (xᵢ): 20 is x₁, 25 is x₂, and so on. The summation sign, Σ, means that we should add (sum) the observation values. Finally, we divide by the number of observations (n). There were five software developers in this example, so we have five observations.
Median – This is the middle value when the data is ordered from smallest to largest value; it is also referred to as the 50th percentile. In the previous example, the median value is 25. If we have an even number of observations, we determine the median by averaging the two middle
observation values. For example, the median of 20, 25, 30, and 45 is (25 +
30) / 2 = 27.5 years.
Mode – This is the most frequent value. In our example of five software
developers, the most frequent age is 25, so 25 years is the mode. Sometimes there is no mode, and sometimes there is more than one mode. For
example, if the ages were 20, 24, 25, 29, and 45, there would be no mode.
If the ages were 20, 20, 25, 35 and 35, there would be two modes: 20 years
and 35 years.
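To make these three measures concrete, here is a minimal Python sketch using only the standard library (the developer ages are the ones from the examples above):

import statistics

ages = [20, 25, 25, 30, 45]

print(statistics.mean(ages))    # 29
print(statistics.median(ages))  # 25
print(statistics.mode(ages))    # 25, the most frequent value

# With an even number of observations, the median averages the two middle values:
print(statistics.median([20, 25, 30, 45]))         # 27.5

# statistics.multimode (Python 3.8+) reports every mode, e.g. the two-mode case:
print(statistics.multimode([20, 20, 25, 35, 35]))  # [20, 35]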
Describing the Variation
The variation of individual variables in our sample can be described by
three commonly used variability measures: the range, the sample variance,
and the sample standard deviation. “Sample” refers to the set of projects
for which we have data.
² It is important when dealing with numbers to identify the measurement units. Age is measured in years.
Range – Technically, the range is the difference between the largest and
smallest values. However, as the range is most useful for providing information about the values beyond which no observations fall, I describe a
data set’s range as “the smallest value to the largest value”. If the ages of
three software developers are 20, 25, and 30, the range is from 20 to 30
years.
Sample variance (s²) – This measures the average distance between each value and the mean. It is the sum of the squared differences between each observation (xᵢ) and the mean value (x̄) divided by the number of observations (n) minus one.³ This is expressed mathematically by the following formula:

s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \qquad (12.3)
For example, let’s consider the ages of three software developers: 23,
25, and 27. The mean of their ages is 25 years. The sample variance is
calculated as follows:
s^2 = \frac{(23 - 25)^2 + (25 - 25)^2 + (27 - 25)^2}{2} = 4 \qquad (12.4)
Thus, the sample variance is 4 years squared. Unfortunately, most people find it hard to relate to the variance because the measurement units are
squared. What does “years squared” really mean?
Sample standard deviation (s) – The standard deviation is an easier-to-understand measure of the average distance between each value and the mean. It is the square root of the sample variance:

s = \sqrt{s^2} \qquad (12.5)
Thus, the sample standard deviation of our previous example is 2 years.
The larger the variance and standard deviation, the more the projects differ
from one another.
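A quick sketch of both measures (Python standard library; the three ages come from the example above):

import statistics

ages = [23, 25, 27]

# statistics.variance and statistics.stdev divide by n - 1, i.e. they compute
# the *sample* variance and sample standard deviation of Eqs. 12.3 and 12.5.
print(statistics.variance(ages))  # 4 (years squared)
print(statistics.stdev(ages))     # 2.0 (years)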
Perhaps you are wondering why we go through the bother of squaring
the differences and then taking their square root. Why not just use the actual differences in the calculation? Well, the reason why the differences
are squared is so that the positive and negative differences do not cancel
each other out. However, it is true that this could also be achieved by taking the absolute value of the differences. So why don’t we do that? The
³ One is subtracted as a corrective measure. Statisticians have found that the variance is underestimated for small samples if we just divide by n.
reason is that certain mathematical operations necessary for the development of advanced statistical techniques cannot be carried out on absolute values. What is an absolute value? It is the positive value of a number.
For example, 3 is the absolute value of both positive 3 and negative 3. This
is expressed mathematically as |3| = 3 and |−3| = 3.
Unfortunately, we cannot calculate these six measures for all variable
types. Table 12.2 shows which measures are authorised for each variable
type.
Table 12.2. Authorised operations by variable type

Variable type   Mean   Median   Mode   Range   Variance   Standard deviation
Nominal                         X
Ordinal                X        X      X
Interval        X      X        X      X       X          X
Ratio           X      X        X      X       X          X
Now, let’s look at an example of how to determine these measures for a
hypothetical sample of seven projects in which all variables are already
ordered from lowest to highest (Table 12.3). In this example, application
type is a nominal-scale variable. There are two values for application type:
customer service and MIS. Risk level is an ordinal-scale variable measured
using a scale of 1–5. We know that some applications are riskier than others, but that is all. Quality requirements, a quasi-interval scale variable, are
carefully measured using a Likert scale with 1, 2, 3, 4, and 5 representing
very low, low, average, high, and very high. Effort is a ratio-scale variable;
it is measured in hours.
First, let’s describe a typical project using the mean, median, and mode.
In Table 12.3, we can see that the most frequent application type is customer service. The mode is the only central tendency measure authorised
for nominal variables. For ordinal variables, we can calculate the median
and mode. Project 4 is the middle observation. There are three observations above it and three observations below it. Thus, the median risk level
is 2. There are two modes: 2 and 4. Therefore, there is no single typical
risk level.
Table 12.3. Examples of central tendency and variability measures for each variable type

                            Nominal:           Ordinal:     Interval:              Ratio:
                            application type   risk level   quality requirements   effort
Project 1                   Customer service   1            1                      300
Project 2                   Customer service   2            2                      400
Project 3                   Customer service   2            3                      500
Project 4                   Customer service   2            3                      600
Project 5                   MIS                4            3                      1000
Project 6                   MIS                4            4                      5000
Project 7                   MIS                4            5                      30,000
Mean                                                        3                      5400 hours
Median                                         2            3                      600 hours
Mode                        Customer service   2 and 4      3                      None
Range                                          1 to 4       1 to 5                 300 to 30,000 hours
Sample variance                                             1.67                   120,456,667 hours²
Sample standard deviation                                   1.29                   10,975.3 hours
For interval and ratio variables, we can also calculate the mean in addition to the median and mode. The mean value of quality requirements is 3.
The median value is 3, and 3 is the most frequent value. It looks like we
can safely say that a typical project has average quality requirements.
For effort, the mean is 5,400 hours, the median is 600 hours, and there is
no mode as no number appears more than once. In this case, we have two
different numbers describing a typical project’s effort. The advantages and
disadvantages of each of these measures are summarized in Table 12.4.
For example, one of the disadvantages of the mean is that it is very sensitive to extreme values. As you can see, the one effort of 30,000 hours has a
very big impact on the mean value. Most projects actually have efforts
below the mean value. The median is insensitive to extreme values. Even
if the effort of the last project was 90,000 hours, the median would remain
unchanged.
Table 12.4. Relative merits of mean, median, and mode

Mean
  Advantages: Located by simple process of addition and division; affected by every item in group.
  Disadvantages: Affected by the exceptional and the unusual; calculated value may not actually exist.

Median
  Advantages: Not affected by items having extreme deviation from the normal; unlike the mode, not overly affected by small number of items.
  Disadvantages: Not as easy to calculate (by hand) as the mean; not useful when extreme variations should be given weight; insensitive to changes in minimum and maximum values; calculated value may not actually exist (when there is an even number of observations).

Mode
  Advantages: Not affected by extreme values; only way to represent a nominal variable.
  Disadvantages: No single, well-defined type may exist; difficult to determine accurately; ignores extreme variations; may be determined by small number of items.
Now, let’s consider the three variability measures: range, sample variance, and sample standard deviation. The range can be described for all
variable types except nominal. The sample variance and sample standard
deviation can only be calculated for interval and ratio variables as they
depend on the mean. Like the mean, they are also sensitive to extreme values. The one project with a 30,000 hour effort has a big impact on all three
variability measures.
Frequency Distributions
Data can also be described with frequency distributions. A frequency distribution refers to the number or percentage of observations in a group. The
group can be either a category or a numerical interval. For example, Table
12.5 shows the frequency distribution of a categorical variable application
type (app). We can see that we have 20 transaction processing (TransPro)
applications. This is the number under Frequency. This corresponds to
approximately 59% of all applications in our sample (Percent). The cumulative frequency (Cumulative) is more applicable to numerical intervals, for
example, if you want to know the total number of projects less than a certain size. Here it just means that 85% of the applications were customer
service (CustServ), management information system (MIS), or transaction
processing (TransPro) applications. While this table provides valuable
information to data analysts, it is a bit boring to show upper management.
Table 12.5. Application type frequency distribution

Application Type   Frequency   Percent   Cumulative
CustServ           6           17.65     17.65
MIS                3           8.82      26.47
TransPro           20          58.82     85.29
InfServ            5           14.71     100.00
Total              34          100.00
Frequency distribution tables can be used to make attractive graphs for
your presentations (see Fig. 12.1). You have probably been making pie
charts like this most of your professional life without realising you were
calculating frequency distributions.
Fig. 12.1. Application-type breakdown
Now let’s look at the frequency distribution of a numerical variable. If I
wanted to make a frequency distribution for a size variable (size) where
size is measured, for example, in function points, I would first separate the
data into meaningful groups of increasing size, say 0–999 function points,
1000–1999 function points, and so on. Then I would count how many applications fell into each interval (see Table 12.6).
With numerical data, we are usually interested in knowing the shape of
the distribution. We can see the shape by making a histogram. A histogram
is a chart with a bar for each class. Figure 12.2 shows the histogram of size
using the percentage of projects in each class. We can easily see from this
graph that most projects have a size of less than 1000 function points. Often we make histograms to determine if the data is normally distributed.
Table 12.6. Size frequency distribution

Size in function points   Frequency   Percent   Cumulative
0–999                     29          85.30     85.30
1000–1999                 3           8.82      94.12
2000–2999                 1           2.94      97.06
3000–3999                 1           2.94      100.00
Total                     34          100.00
Fig. 12.2. Distribution of size (axes: frequency vs. function points)
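If you want to build such a frequency table programmatically, a small sketch along these lines will do (Python; the size values here are hypothetical, not the chapter's data set):

from collections import Counter

sizes = [120, 450, 800, 950, 1300, 2400, 3100, 640, 75, 988]  # hypothetical FP counts

# Group into classes 1000 function points wide: 0-999, 1000-1999, ...
bins = Counter((s // 1000) * 1000 for s in sizes)

total = len(sizes)
cumulative = 0.0
for low in sorted(bins):
    percent = 100 * bins[low] / total
    cumulative += percent
    print(f"{low}-{low + 999}: frequency {bins[low]}, "
          f"{percent:.2f}%, cumulative {cumulative:.2f}%")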
12.2 The Normal Distribution
One very important frequency distribution is the normal distribution. Figure 12.3 shows the fundamental features of a normal curve. It is bell-shaped, with tails extending indefinitely above and below the centre. A
normal distribution is symmetrical about the average. In a normal distribution, the mean, median, and mode all have the same value and thus all describe a typical project.
A normal curve can be described mathematically in terms of just two
parameters, the mean and standard deviation. The width of a normal distribution is a function of the standard deviation of the data. The larger the
standard deviation, the wider the distribution. If our numerical data follows
a normal distribution, we know that about 68% of all observations fall
within plus or minus one standard deviation of the mean. About 95.5% of
the observations lie within plus or minus two standard deviations of the
mean, and 99.7% fall within plus or minus three standard deviations.
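These coverage figures are easy to verify numerically. A sketch using scipy's normal CDF (assuming scipy is available):

from scipy.stats import norm

for k in (1, 2, 3):
    coverage = norm.cdf(k) - norm.cdf(-k)
    print(f"within ±{k} standard deviations: {coverage:.3%}")
# within ±1: 68.269%, ±2: 95.450%, ±3: 99.730%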
Fig. 12.3. Example of normal distribution
12.3 Overview of Sampling Theory
Now that you know how to describe and summarise individual variables,
there is another key concept to understand before we can proceed to identifying relationships in data: the difference between samples and populations.
This is important because you probably haven’t been able to collect valid
data for every software project your company has ever undertaken. So, if we
consider that the population of software project data is data for all projects
in your company, what you have is a sample of that data. How, then, can
you be sure that what you find in your sample is true for all projects in your
company?
Fig. 12.4. Sampling from a population
Imagine that you are able to select different samples (groups of projects) at random⁴ from the population of all software projects in your company (see Fig. 12.4). As an example, let's consider the variable effort.⁵ For each sample, we can compute the mean value of effort. The mean value of effort in each sample will not always be the same. In one sample, it might be 600 hours (x̄₁), in a second sample, 620 hours (x̄₂), in a third sample, 617 hours (x̄₃), and so on. We can make a frequency distribution of the mean efforts of each sample. This distribution is called the sampling distribution of the sample means. The mean value of an infinite number of sample means is equal to the population mean (see Fig. 12.5).
⁴ As all inferential (i.e. predictive) techniques assume that you have a random sample, you should not violate that assumption by removing projects just because they do not fit your model!
⁵ To simplify this complicated discussion, the effort in my example is normally distributed. In practice, this is not the case.
Fig. 12.5. Distributions of one sample, means of all samples, and the population
If the sample size is large (≥30), the sampling distribution of the sample
means is approximately a normal distribution. The larger the sample, the
better the approximation. This is true even if the effort variable in the
population is not normally distributed. This tendency to normality of sampling distributions is known as the Central Limit Theorem.⁶ This theorem
has great practical importance. It means that it doesn’t matter that we don’t
know the distribution of effort in the population. This is handy because, in
practice, all we have is one sample. If we have a sample of at least 30 projects, we can use a normal distribution to determine the probability that the
mean effort of the population is within a certain distance of the mean effort
of our one sample.
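A small simulation makes the theorem tangible. This sketch (Python with numpy assumed; the skewed "population" of efforts is invented for illustration) draws many samples of 30 projects and shows that the sample means cluster around the population mean:

import numpy as np

rng = np.random.default_rng(0)

# A skewed, decidedly non-normal population of project efforts (hours).
population = rng.lognormal(mean=6.5, sigma=1.0, size=100_000)

# Draw many samples of 30 projects and record each sample's mean effort.
sample_means = [rng.choice(population, size=30).mean() for _ in range(2000)]

print(f"population mean:       {population.mean():.0f} hours")
print(f"mean of sample means:  {np.mean(sample_means):.0f} hours")
print(f"spread of the means:   {np.std(sample_means):.0f} hours (the standard error)")

A histogram of sample_means would look approximately normal even though the population itself is heavily skewed.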
As you can see in Fig. 12.5, the sampling distribution of all sample
mean efforts is not as wide as the distribution of all software projects’ efforts. In fact, one of the most important properties of the sample mean is
that it is a very stable measure of central tendency. We can estimate the
hypothetical standard deviation of the sampling distribution of sample
means from the variation of effort in our one sample. This is known as the
standard error of the mean. Note that “error” does not mean “mistake” in
this context. It really means deviation. The term “error” is used to distinguish the standard deviation of the sampling distribution (the standard
error) from the standard deviation of our sample. Otherwise, it is not clear
⁶ In addition, it has also been shown that if a variable is normally distributed in the population, the sampling distribution of the sample mean is exactly normal no matter what the size of the sample.
just what standard deviation we are talking about. The standard error is expressed mathematically as:

s_{\bar{x}} = \frac{s}{\sqrt{n}} \qquad (12.6)

where s_{\bar{x}} is the estimated standard error of the mean, s is the standard deviation of our one sample, and n is the size of our one sample. You can see
that if s is very small and n is very large, the standard error of the mean
will be small. That is, the less variation there is in the variable effort in our
sample, and the more projects we have in our sample, the smaller the standard error of the mean (and the narrower the sampling distribution of the
sample means). The narrower the sampling distribution of the sample
means, the more certain we are that the population’s mean effort is near
our one-sample mean effort. This is because in a very narrow distribution,
the population mean is near every sample mean.
The standard error of the mean is important because we can use it to calculate the limits around our one-sample mean which probably contain the population mean – probably, because we specify these limits with a certain degree of confidence. Typically, we are interested in 95% confidence intervals. The 95% confidence interval estimate of the mean states that the population mean is equal to the sample mean plus or minus 1.96 multiplied by the standard error of the mean. That is:

\text{population mean} = \bar{x} \pm 1.96\, s_{\bar{x}} \qquad (12.7)
The value 1.96 is related to the normal curve. Recall that approximately
95.5% of the observations lie within plus or minus two standard deviations
of the mean (see Fig. 12.3). If 95% confidence intervals were constructed
for many samples, about 95% of the intervals would contain the true population mean. Thus, there is still a 5% probability that the true population
mean effort lies outside the 95% confidence interval of our one sample.
The accuracy of this probability increases with larger sample sizes.
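Putting Eqs. 12.6 and 12.7 together in code (a sketch; the efforts are the seven values from Table 12.3):

import math
import statistics

efforts = [300, 400, 500, 600, 1000, 5000, 30000]  # hours

n = len(efforts)
mean = statistics.mean(efforts)
std_err = statistics.stdev(efforts) / math.sqrt(n)        # Eq. 12.6

low, high = mean - 1.96 * std_err, mean + 1.96 * std_err  # Eq. 12.7
print(f"95% CI for the population mean effort: {low:.0f} to {high:.0f} hours")

# Caveat from Sect. 12.4: with only 7 projects (< 30), the t-distribution
# should really be used here instead of the 1.96 normal multiplier.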
12.4 Other Probability Distributions
Three additional common probability distributions are described in this
chapter. You don’t need to worry about which distribution to use in which
circumstance, what they actually look like, or how to read probabilities
from the complicated tables that you find in the appendices of many statistics books. Your statistical analysis package automatically applies the
correct distribution. All you need to know is how to read the probability
from the statistical output.
• Student t-distribution – If the sample size is less than 30 projects, then the t-distribution must be used instead of the normal distribution. The Student t-distribution assumes that the population from which we are drawing our sample is normally distributed (i.e. the Central Limit Theorem does not apply). The Student t-distribution tends to coincide with the normal distribution for large sample sizes. Because it is appropriate for either large or small samples, the t-distribution is used in place of the normal distribution when inferences must be made from accessible samples to immeasurable populations. Think of it as a modified normal distribution. You will see the t-distribution referred to in correlation and regression analysis output.
• Chi-square distribution – If a population is normally distributed, the sampling distribution of the sample variance is approximated by the chi-square distribution. This test is explained in detail later in this chapter.
• Fisher F-distribution – If samples taken from two different normally distributed populations are independent, the F-distribution can be used to compare two variances. The calculation of the F-ratio is explained in detail in Sect. 12.5.4.
Each of these probability distributions assumes that the underlying data
is normally distributed. You can now appreciate why the normal distribution is so important in statistics, and why we must check if the numerical
variables in our software project database are normally distributed—it is
even more important when we don’t have very many projects in our sample.
12.5 Identifying Relationships in the Data
Now that you have learned the basics, you are ready to identify relationships between variables. Table 12.7 shows which statistical methods can
be used in which circumstances. It is important to know what types of
variables you have to apply the correct method. Choosing the correct statistical method is extremely important. Your statistical analysis package
does not automatically decide what method to use – you do.
The concept of dependent and independent variables does not apply to
the chi-square test for independence, nor does it apply to Spearman’s and
Pearson’s correlation coefficients. However, to use the analysis of variance (ANOVA) method, you need to pay attention to which variable is
the dependent variable. The dependent variable is the variable you want
to predict. For example, if you have a ratio-type variable (effort) and an
ordinal-type variable (risk level) as in Table 12.3, you can calculate
Spearman’s correlation coefficient between these two variables. You can
also run an ANOVA procedure to determine how much of the variation in
effort (dependent variable) is explained by the variation in the risk level
(independent variable). However, you cannot run an ANOVA procedure
with risk level as the dependent variable.
Table 12.7. Mappings of statistical methods in this chapter to variable types

Dependent variable: Nominal
  Independent Nominal: chi-square test for independence
  Independent Ordinal: chi-square test for independence
  Independent Interval or Ratio: –

Dependent variable: Ordinal
  Independent Nominal: chi-square test for independence
  Independent Ordinal: Spearman's correlation, chi-square test for independence
  Independent Interval: Spearman's correlation
  Independent Ratio: Spearman's correlation

Dependent variable: Interval
  Independent Nominal: ANOVA
  Independent Ordinal: Spearman's correlation, ANOVA
  Independent Interval: Spearman's correlation, Pearson's correlation, regression
  Independent Ratio: Spearman's correlation, Pearson's correlation, regression

Dependent variable: Ratio
  Independent Nominal: ANOVA
  Independent Ordinal: Spearman's correlation, ANOVA
  Independent Interval: Spearman's correlation, Pearson's correlation, regression
  Independent Ratio: Spearman's correlation, Pearson's correlation, regression
12.5.1 Chi-Square Test for Independence
Two events are independent whenever the probability of one happening is
unaffected by the occurrence of the other. This concept can be extended to
categorical variables. The chi-square test for independence compares the
actual and expected frequencies of occurrence to determine whether or not
two categorical variables are independent. For example, let's consider two nominal variables, Telon use (telonuse)⁷ and application type (subapp). Telon use can be "yes" or "no", and application type can be customer service,
management information system, transaction processing, or information/online service. We want to know if Telon use is independent of application
type. We will base our conclusion on data from 62 software projects. This is
our sample size. Table 12.8 summarises the actual frequencies found in our
sample. This table is called a contingency table because it shows the frequency for every combination of attributes (i.e. every possible contingency).
⁷ Telon is a tool that generates code.
If two variables are independent, the proportion of observations in any
category should be the same regardless of what attribute applies to the
other variable. So, if Telon use and application type are independent, we
would expect the percentage of Telon use to be the same for all four application types. It is easy to see in Table 12.9 that the percentages are not
exactly the same.
Table 12.8. Contingency table – actual frequencies

Application type   Telon use: No   Telon use: Yes   Total
CustServ           12              6                18
MIS                4               0                4
TransPro           24              5                29
InfServ            8               3                11
Total              48              14               62
The frequencies we would expect if the percentages were the same are
computed in the following way: the overall proportion of projects in our
sample that did not use Telon is approximately 0.77 (= 48/62); the proportion that used Telon is approximately 0.23 (= 14/62). This proportion can
be used to compute the expected number of Telon projects for each application type. There were 18 customer service (CustServ) applications. If
approximately 23% used Telon, this makes 4.1 expected Telon/customer service projects.⁸ Out of a total of four MIS applications, we would expect
4 × (14/62) = 0.9 to use Telon. For transaction processing (TransPro) applications, 29 × (14/62) = 6.5 is the expected number of Telon projects. For information service (InfServ) applications, 11 × (14/62) = 2.5 is the expected
number of Telon projects. Then for each application type, the expected
number of projects that did not use Telon is simply the total number for
each application type minus the number that did use Telon. The expected
frequencies are presented in Table 12.10.
Table 12.9. Percentage of applications that did/did not use Telon

Application type   Telon use: No   Telon use: Yes   Total
CustServ           66.67           33.33            100.00
MIS                100.00          0.00             100.00
TransPro           82.76           17.24            100.00
InfServ            72.73           27.27            100.00
Total              77.42           22.58            100.00

⁸ Obviously, a fraction of a project does not exist; however, it is necessary to keep the decimal places for the calculations.
Table 12.10. Contingency table – expected frequencies

Application type   Telon use: No   Telon use: Yes   Total
CustServ           13.9            4.1              18
MIS                3.1             0.9              4
TransPro           22.5            6.5              29
InfServ            8.5             2.5              11
Total              48              14               62
Our null hypothesis is that there is no relationship between Telon use and application type. If we demonstrate that:

• the actual frequencies differ from the frequencies expected if there was no relationship, and
• the difference is larger than we would be likely to get through sampling error,

then we can reject the null hypothesis and conclude that there is a relationship between Telon use and application type. So far in our example, we have seen that the actual and expected frequencies are not exactly the same (Condition 1). Now we need to see if the difference is significant (Condition 2). We compare the difference between the actual and expected frequencies with the chi-square statistic.
The chi-square statistic is calculated with the following expression:

\chi^2 = \sum_{i,j} \frac{(\text{actual}_{ij} - \text{expected}_{ij})^2}{\text{expected}_{ij}} \qquad (12.8)

where actual_{ij} is the actual frequency for the combination at the i-th row and j-th column, and expected_{ij} is the expected frequency for the combination at the i-th row and j-th column (Table 12.10). For example, actual₁₁ refers to the actual frequency of customer service (CustServ) applications that did not use Telon; expected₄₂ refers to the expected frequency of information service (InfServ) applications that used Telon.
Table 12.11 shows the calculation of the chi-square statistic for our example. First we subtract the expected value (exp) from the actual value
(act) for each attribute combination. The farther the expected value is from
the actual value, the bigger the difference. Then we square this value. This
allows negative differences to increase rather than reduce the total. Next,
we divide by the expected value. Finally, we sum (Σ) the values in the
last column to arrive at the chi-square statistic.
Table 12.11. Example of chi-square statistic calculation

(i,j)   act   exp    act−exp   (act−exp)²   (act−exp)²/exp
(1,1)   12    13.9   −1.9      3.61         0.260
(1,2)   6     4.1    1.9       3.61         0.880
(2,1)   4     3.1    0.9       0.81         0.261
(2,2)   0     0.9    −0.9      0.81         0.900
(3,1)   24    22.5   1.5       2.25         0.100
(3,2)   5     6.5    −1.5      2.25         0.346
(4,1)   8     8.5    −0.5      0.25         0.029
(4,2)   3     2.5    0.5       0.25         0.100
Sum     62    62     0                      Chi-square = 2.877
The chi-square distribution provides probabilities for different values of χ². There is a separate distribution for each number of degrees of freedom.
The number of degrees of freedom refers to the number of independent
comparisons. In our example, the number of degrees of freedom is 3 because once we have calculated frequencies for Telon use for three application types in Table 12.10, the remaining five cells can be filled in without
any further calculation of frequencies. For example, the expected frequency for information service (InfServ) applications that used Telon is the
total number of applications that used Telon minus the expected frequencies of the three other application types that used Telon (14 – 4.1 – 0.9 –
6.5 = 2.5). We don’t need to calculate its frequency because we can derive
it from the information we already have. The number of degrees of freedom for the chi-square test is always the number of rows minus one multiplied by the number of columns minus one. Here, (4 − 1)(2 − 1) = 3.
Once we have our chi-square value and the number of degrees of freedom, we can see if the difference between actual and expected frequencies
is significant using a chi-square distribution table (see example below).
However, in practice, you will not be undertaking these calculations yourself and you do not need to learn how to use the chi-square distribution
tables. A computer will calculate everything for you.
Example
My statistical analysis package informs me that the chi-square statistic
(Pearson chi2) associated with the table above has 3 degrees of freedom
and a value of 2.9686. There is a small difference between the computer’s
value and my value because of rounding errors. The computer’s value is
more precise. The significance level is 0.396 (approximately 40%). The
significance level states the probability (Pr) that we are making an error
when we reject the null hypothesis. Only if the Pr is less than or equal to
0.05 can we reject the hypothesis that application type and Telon use are
independent at the 5% significance level. Thus, our null hypothesis that
there is no relationship between Telon use and application type cannot be
rejected.
. tabulate app telonuse, chi2

Application |      Telon Use
       Type |        No        Yes |     Total
------------+----------------------+----------
   CustServ |        12          6 |        18
        MIS |         4          0 |         4
   TransPro |        24          5 |        29
    InfServ |         8          3 |        11
------------+----------------------+----------
      Total |        48         14 |        62

         Pearson chi2(3) =   2.9686   Pr = 0.396
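The same test is a one-liner in most environments. For instance, a Python sketch with scipy (assumed installed) reproduces the statistic and significance level from the frequencies in Table 12.8:

from scipy.stats import chi2_contingency

# Actual frequencies (rows: CustServ, MIS, TransPro, InfServ; cols: No, Yes).
observed = [
    [12, 6],
    [4, 0],
    [24, 5],
    [8, 3],
]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2({dof}) = {chi2:.4f}, Pr = {p:.3f}")  # chi2(3) = 2.9686, Pr = 0.396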
12.5.2 Correlation Analysis
A correlation coefficient measures the strength and direction of the relationship between two numerical variables. The correlation coefficient can
have any value between –1 and +1 (see Fig. 12.6).
If the correlation coefficient is –1, this means that the two variables are
perfectly negatively correlated. High values of one are associated with low
values of the other, and vice versa.
If the correlation coefficient is +1, this means that the two variables are
perfectly positively correlated. High values of one are associated with high
values of the other, and vice versa. If the correlation coefficient is 0, this
means that the two variables are not correlated at all. In practice, we rarely
see perfect correlation or complete non-correlation. Figure 12.7 shows a
more typical relationship.
Fig. 12.6. Interpreting the correlation coefficient
We can see that development effort and software size are positively correlated because the relationship looks linear and the slope of the line is
increasing. But how strong is the relationship? How can we measure the
correlation?
Two measures of correlation are commonly used when analysing software project data. Spearman's rank correlation coefficient must be used when the data is ordinal, or when the data is far from normally distributed.⁹
Pearson’s correlation coefficient can be used when the data is of an interval or ratio type. Pearson’s correlation coefficient is based on two key assumptions: (1) the data is normally distributed, and (2) the relationship is
linear.
Fig. 12.7. Typical relationship between ln(effort) and ln(size) (axes: leffort vs. lsize)
Spearman’s Rank Correlation
Spearman’s rank correlation coefficient compares the differences in two
variables’ rank for the same observation. A variable’s rank refers to its
placement in an ordered list. For example, consider the following five
software development projects, which are shown in Table 12.12.
⁹ I also prefer Spearman's rank correlation coefficient for quasi-interval variables.
Table 12.12. Data for five software development projects

Id   size   sizerank   effort   effrank
2    647    4          7871     4
3    130    1          845      1
5    1056   5          21,272   5
6    383    3          4224     3
15   249    2          2565     2
We are interested in the relationship between size and effort. First we
have to rank the projects’ size. There are five projects, so the rank of each
project will be a number between 1 and 5. The smallest size project is
given rank 1, the second smallest 2, and so on. We do the same thing for
project effort. We now have two new variables, sizerank and effrank,
which are the respective ranks of the variables size and effort.
We can easily calculate Spearman's rank correlation coefficient, ρ, using the following equation:

\rho = 1 - \frac{6 \sum D^2}{n(n^2 - 1)} \qquad (12.9)
where D is the difference between the two variables’ rank for the same
project, and n is the number of projects.
How strong is the relationship between effort and size? Some calculation steps are shown in Table 12.13.
The sum of the squared differences is 0. This results in a Spearman’s
rank correlation coefficient of 1. This is an example of perfect positive
correlation.
\rho = 1 - \frac{6(0)}{5(5^2 - 1)} = 1 \qquad (12.10)
Table 12.13. Calculation of sum of squared differences

Project id   Rank of size   Rank of effort   Difference between ranks, D   Square of difference, D²
2            4              4                4 − 4 = 0                     0
3            1              1                1 − 1 = 0                     0
5            5              5                5 − 5 = 0                     0
6            3              3                3 − 3 = 0                     0
15           2              2                2 − 2 = 0                     0
n = 5                                                                      ΣD² = 0
The second example compares the quality requirements and development time constraints of five hypothetical projects. Quality requirements
and development time constraints are quasi-interval variables measured
using a Likert scale from 1 (very low) to 5 (very high). We see that very
low quality requirements are associated with very high development time
constraints, low quality requirements are associated with high development
time constraints, and so on. Table 12.14 shows how the sum of the squared
differences was calculated for this example.
The sum of the squared differences is 40. Plugging this into Spearman’s
equation results in a correlation coefficient of –1. This is an example of
perfect negative correlation.
Table 12.14. Calculation of sum of squared differences

Project id   Rank of quality   Rank of development   Difference between   Square of
             requirements      time constraints      ranks, D             difference, D²
P01          1 (very low)      5 (very high)         1 − 5 = −4           16
P22          2 (low)           4 (high)              2 − 4 = −2           4
P33          3 (average)       3 (average)           3 − 3 = 0            0
P54          4 (high)          2 (low)               4 − 2 = 2            4
P65          5 (very high)     1 (very low)          5 − 1 = 4            16
n = 5                                                                     ΣD² = 40

\rho = 1 - \frac{6(40)}{5(5^2 - 1)} = -1 \qquad (12.11)
These calculations are slightly more complicated when there are ties in the
ranks. However, as your statistical analysis package will automatically calculate the correlation coefficient, you do not need to be concerned about this.
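As a sketch of what the package computes (Python; scipy.stats.spearmanr assumed available, and it handles ties for you):

from scipy.stats import spearmanr

size = [647, 130, 1056, 383, 249]       # Table 12.12
effort = [7871, 845, 21272, 4224, 2565]

rho, p_value = spearmanr(size, effort)
print(rho)  # 1.0 - perfect positive correlation, matching Eq. 12.10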
Pearson’s Correlation
Pearson’s correlation coefficient uses the actual values of the variables
instead of the ranks. So, it takes into account not only the fact that one
value is higher than another, but also the size of the quantitative difference
between two values. It can be calculated with the following formula:
r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, s_x s_y} \qquad (12.12)

where xᵢ − x̄ is the difference between a project's value on the x variable from the mean of that variable, yᵢ − ȳ is the difference between a project's value on the y variable from the mean of that variable, s_x and s_y are the
sample standard deviations of the x and y variables, respectively, and n is
the number of observation pairs.
There is no better way to understand an equation than to try out the calculation with some real data. So let’s return to our software project data in
Table 12.12. In the example below, we have the mean and standard deviation of effort and size for the five projects in our sample.
Example

. summarize effort size

Variable |   Obs      Mean   Std. Dev.    Min     Max
---------+--------------------------------------------
  effort |     5    7355.4   8201.776     845   21272
    size |     5       493   368.8123     130    1056
For our sample of five projects, the mean of the effort is 7355.4 hours and
its standard deviation is 8201.776 hours. The mean of the size is 493 function points and its standard deviation is 368.8123 function points. Table
12.15 demonstrates some steps for the calculation of Pearson’s correlation
coefficient between effort and size for these projects.
Plugging these numbers into our formula gives us the following result:

r = \frac{11{,}791{,}035}{(5 - 1) \times 368.8123 \times 8201.776} = 0.9745 \qquad (12.13)
Table 12.15. Calculation of Pearson's correlation coefficient numerator

Project id   x, size   y, effort   (xᵢ − x̄)   (yᵢ − ȳ)   (xᵢ − x̄)(yᵢ − ȳ)
2            647       7871        154         515.6       79,402.4
3            130       845         −363        −6510.4     2,363,275.2
5            1056      21,272      563         13,916.6    7,835,045.8
6            383       4224        −110        −3131.4     344,454.0
15           249       2565        −244        −4790.4     1,168,857.6
                                                           Σ = 11,791,035
Pearson’s correlation coefficient is 0.9745. Recall that we calculated a
Spearman’s rank correlation coefficient of 1 for this data in the previous
section. Pearson’s correlation coefficient is a more accurate measurement
of the association between interval- or ratio-scale variables than Spearman’s coefficient, as long as its underlying assumptions have been met.
This is because some information is lost when we convert interval- or ratio-scale variables into rank orders. One of the assumptions underlying
Pearson’s correlation coefficient is that the relationship between the two
variables is linear. Let’s look at the data and see if this is the case. We can
Fig. 12.8. effort vs. size for correlation example
Fig. 12.9. ln(effort) vs. ln(size) for correlation example
see in Fig. 12.8 that although it is possible to fit a straight line close to the
five data points, the relationship is really a bit curved.
Taking the natural log of the variables effort and size results in a more
linear relationship (Fig. 12.9).
Pearson’s correlation coefficient between ln(size) and ln(effort) is shown
in the example below.
Example

. corr lsize leffort
(obs=5)

        |   lsize  leffort
--------+-----------------
  lsize |  1.0000
leffort |  0.9953   1.0000
Thus, the linear association is stronger between the natural log of size and
the natural log of effort (0.9953) than it is between size and effort
(0.9745). As you can see in the next example, the natural log transformation has no effect on Spearman’s rank correlation coefficient because although the actual values of the variables change, their relative positions do
not. Thus, the ranking of the variables stays the same.
Example

. spearman lsize leffort

Number of obs =       5
Spearman's rho =  1.0000
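The same comparison in a Python sketch (numpy and scipy assumed):

import numpy as np
from scipy.stats import pearsonr, spearmanr

size = np.array([647, 130, 1056, 383, 249])
effort = np.array([7871, 845, 21272, 4224, 2565])

print(pearsonr(size, effort)[0])                   # 0.9745
print(pearsonr(np.log(size), np.log(effort))[0])   # 0.9953 - stronger after the log
print(spearmanr(np.log(size), np.log(effort))[0])  # 1.0 - ranks are unchanged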
12.5.3 Regression Analysis
Whereas a correlation coefficient measures only the strength and direction
of the relationship between two variables, regression analysis provides us
with an equation describing the nature of their relationship. Furthermore,
regression analysis allows us to assess the accuracy of our model.
In simple regression analysis, we are interested in predicting the dependent variable’s value based on the value of only one independent variable. For example, we would like to predict the effort needed to complete a
software project based only on knowledge of its size. In this case, effort is
the dependent variable and size is the independent variable. In multiple
regression analysis, we are interested in predicting the value of the dependent variable based on several independent variables. For example, we
would like to predict the effort needed to complete a software project
based on knowledge about its size, required reliability, duration, team size,
and other factors. Because it is easier to grasp multiple regression analysis
if you understand simple regression analysis, we’ll start with that.
Simple Regression
The least-squares method fits a straight line through the data that minimises the sum of the squared errors. The errors are the differences between
the actual values and the predicted (i.e. estimated) values. These errors are
also often referred to as the residuals.
In Fig. 12.10, the three points, (x₁, y₁), (x₂, y₂), and (x₃, y₃), represent the actual values. The predicted values, (x₁, ŷ₁), (x₂, ŷ₂), and (x₃, ŷ₃), are on the line. The errors are the differences between y and ŷ for each observation. We want to find the straight line that minimizes error₁² + error₂² + error₃².
You may recall from algebra that the equation for a straight line is of the
form:
\hat{y} = a + bx \qquad (12.14)
where ŷ is the predicted value of the dependent variable, y, given the
value of the independent variable, x. The constant a represents the value of
ŷ when x is zero. This is also known as the y-intercept. The constant b
represents the slope of the line. It will be positive when there is a positive
relationship and negative when there is a negative relationship.
Fig. 12.10. Illustration of regression errors
To find the a and b values of a straight line fitted by the least-squares
method, the following two equations must be solved simultaneously:
\sum y = na + b \sum x
\sum xy = a \sum x + b \sum x^2 \qquad (12.15)
where n is the number of observations. By plugging in the known values of
x and y, a and b can be calculated. Table 12.16 demonstrates some steps in
the calculation of the regression line for the five projects from our correlation example.
Table 12.16. Calculation of sums needed to solve regression equations

Project id   x, size     y, effort     x²                xy
2            647         7871          418,609           5,092,537
3            130         845           16,900            109,850
5            1056        21,272        1,115,136         22,463,232
6            383         4224          146,689           1,617,792
15           249         2565          62,001            638,685
n = 5        Σx = 2465   Σy = 36,777   Σx² = 1,759,335   Σxy = 29,922,096
We can now solve these two equations for a and b:
36,777 = 5a + 2465b
29,922,096 = 2465a + 1,759,335b \qquad (12.16)
This results in the following regression line:
predicted effort = −3328.46 + 21.67 × size \qquad (12.17)
This is what your statistical analysis package is doing when you ask it to
regress two variables.
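Here is a sketch of that computation (Python with numpy assumed), solving the two normal equations directly and then checking with numpy's own least-squares fit:

import numpy as np

size = np.array([647, 130, 1056, 383, 249])
effort = np.array([7871, 845, 21272, 4224, 2565])

# Solve Eq. 12.15: [[n, Σx], [Σx, Σx²]] @ [a, b] = [Σy, Σxy]
n = len(size)
A = np.array([[n, size.sum()], [size.sum(), (size ** 2).sum()]])
rhs = np.array([effort.sum(), (size * effort).sum()])
a, b = np.linalg.solve(A, rhs)
print(f"predicted effort = {a:.2f} + {b:.2f} * size")  # -3328.46 + 21.67 * size

# The same fit via numpy's least-squares polynomial routine:
print(np.polyfit(size, effort, deg=1))  # [slope, intercept]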
Regression Accuracy
A regression line is only a measure of the average relationship between the
dependent and independent variable. Unless there is perfect correlation, in
which all the observations lie on a straight line, there will be errors in the
estimates. The farther the actual values are from the regression line, the
greater the estimation error. How can we translate this into a measure that
will tell us if the fit of the regression line is any good?
Imagine that you join a company and you need to estimate a project’s effort. The only data available is the effort of past projects. You don’t even
know if there were any similar projects in the past or what the projects’
sizes were. How can you use this data? Well, the simplest thing to do would
be to use the average effort of past projects as an estimate for the new project. You are not happy with the result and convince your company that you
could improve future effort estimation if you also knew the sizes of past
projects. Obviously, if you then collected and used this size data to develop
a regression model to estimate effort, you would expect your model to perform better than just taking the average of past efforts. Otherwise, you
would have wasted a great deal of your company’s time and money counting function points. Similarly, comparing the results obtained by the regression equation with the results of using averages is how the accuracy of
the regression model is determined.
Figure 12.11 shows an example using three projects. Let’s pretend that y
is the project effort and x is the size. We can see that for Project 1, the
mean value of effort, y , overestimates the actual value of effort, y1. The
predicted value of effort, ŷ1 , underestimates the actual effort. For Project
2, both the mean value of effort and the predicted value of effort, ŷ 2 , overestimate the actual effort, y2. For Project 3, both the mean value of effort
and the predicted value of effort, y 3 , underestimate the actual effort, y3.
We need to compare the differences between the actual values, the predicted values, and the mean for each project to calculate the overall accuracy of our model.
Fig. 12.11. Illustration of regression accuracy
The total squared error between the actual value of effort and the mean
value of effort for each project is:
\text{Total SS} = \sum (y_i - \bar{y})^2 \qquad (12.18)
This is the total variation of the data.¹⁰ If effort really does depend on
size, then the errors (residuals) should be small compared to the total
variation of the data. The error (Residual SS) is the sum of the squared
differences between the actual value of effort and the predicted value of
effort for each project.
\text{Residual SS} = \sum (y_i - \hat{y}_i)^2 \qquad (12.19)
This can also be thought of as the total variation in the data not explained by our model. The total variation of the data equals the variation
explained by the model plus the variation not explained by the model, that
¹⁰ The statistical term "variance" is defined as the sum of squared deviations divided by the number of observations minus one. It is a measure of the average variation of the data. Here I am referring to the total variation of the data (i.e. we don't divide by the number of observations).
is, Total SS = Model SS + Residual SS. The variation in the data explained
by our model is:
\text{Model SS} = \sum (\hat{y}_i - \bar{y})^2 \qquad (12.20)
Thus, if effort really does depend on size, the Residual SS will be small
and the differences between the predicted values of effort and the mean
value of effort (Model SS) will be close to the Total SS.
This is the logic that underlies the accuracy measure of the regression model, r²:

r^2 = \frac{\text{Model SS}}{\text{Total SS}} \qquad (12.21)
This is the R-squared (r²) value. It is the fraction of the variation in the data that can be explained by the model. It can vary between 0 and 1 and measures the fit of the regression equation. If the model is no better than just taking averages, the Model SS will be small compared to the Total SS and r² will approach 0. This means that the linear model is bad. If the Model SS is almost the same as the Total SS, then r² will be very close to 1. An r² value close to 1 indicates that the regression line fits the data well. Our effort example has an r² value of 0.95. This means that 95% of the variation in effort is explained by variations in size. In simple regression, r² is also the square of Pearson's correlation coefficient, r, between two variables.
You may wonder how high an r² value is needed for a regression model to be useful. The answer is that it depends. If I didn't know anything about the relationship between quality requirements and productivity, I would find any r² to be useful. Before I knew nothing, but now I know something. If the r² is very small, then I know there is no linear relationship. If the r² of productivity as a function of quality requirements is 0.25, I would find it useful to know that quality requirements explain 25% of the variation in productivity. This is quite a high percentage of productivity for one variable to explain. However, 0.25 is too small for a good predictive model. In this case, an r² over 0.90 would be great. But I would also need to check for influential observations and consider the 95% confidence intervals before I got too excited. A very high r² is sometimes due to an extreme value.
Multiple Regression
Multiple regression is basically the same as simple regression except that
instead of the model being a simple straight line, it is an equation with
more than one independent variable. As a result, the calculations are more
complex. In addition, once we get beyond three dimensions (two independent variables), we can no longer visualize the relationship. For example, at the most, we can draw a three-dimensional graph of effort as a function of application size and team size. The three-dimensional model is the plane that minimizes the sum of the squared deviations between each project and the plane. However, it is impossible to draw a four-dimensional diagram of effort as a function of application size, team size, and reliability requirements. In multiple regression, the r² is capitalized, R², and is called the coefficient of multiple determination.
Significance of Results
In both simple and multiple regression, the final step is to determine if our
result is significant. Is our model significant? Are the coefficients of each
variable and the constant significant? What does “significant” mean? Significance is best explained as follows. The lower the probability that our
results are due to chance, the higher their significance. The probability is
related to our sample size (i.e. the number of projects) and the number of
variables we used to model the dependent variable. Different distributions,
namely the F-distribution and the t-distribution, are used to determine
these probabilities. Our statistical analysis package knows which distributions to use and will calculate the probability of our results being due to
chance. We usually consider significant a probability value lower than or
equal to 0.05. In research papers, it is common to read that results are significant at the 5% level (for a probability value lower than or equal to 0.05)
or the 1% level (for a probability value lower than or equal to 0.01).
How To Interpret Regression Output
Now that you know some basics of regression analysis, you will be able to
better understand the regression output in the example below. This is an
example of the regression model ln(effort) as a function of ln(size) using
software project data. Figure 12.7 shows the regression line fit to the data.
In the upper left corner of the output, we have a table. This is known as
the analysis of variance (ANOVA) table. The column headings are defined
as follows: SS = sum of squares, df = degrees of freedom, and MS = mean
square. In this example, the total sum of squares (Total SS) is 34.86. The
sum of squares accounted for by the model is 22.69 (Model SS), and 12.17
is left unexplained (Residual SS). There are 33 total degrees of freedom
(34 observations – 1 for mean removal), of which 1 is used by the model
(one variable, lsize), leaving 32 for the residual. The mean square error
(Residual MS) is defined as the sum of squares (Residual SS) divided by
the corresponding degrees of freedom (Residual df). Here, 12.17/32 = 0.38.
Example

. regress leffort lsize

  Source |       SS       df       MS         Number of obs =      34
---------+------------------------------      F(  1,    32) =   59.67
   Model | 22.6919055     1  22.6919055       Prob > F      =  0.0000
Residual | 12.1687291    32  .380272786       R-squared     =  0.6509
---------+------------------------------      Adj R-squared =  0.6400
   Total | 34.8606346    33  1.05638287       Root MSE      =  .61666

---------------------------------------------------------------------
 leffort |     Coef.   Std. Err.      t    P>|t|  [95% Conf. Interval]
---------+-----------------------------------------------------------
   lsize |  .9297666   .1203611   7.725   0.000   .6845991   1.174934
   _cons |  3.007431   .7201766   4.176   0.000    1.54048   4.474383
---------------------------------------------------------------------
In the upper right corner of the output, we have other summary statistics.
The number of observations is 34. The F statistic associated with the
ANOVA table (1 and 32 refer to the degrees of freedom of the model and
the residual, respectively) is 59.67. The F statistic is calculated with the
following equation:
F = \frac{\text{Model SS}/\text{Model df}}{\text{Residual SS}/\text{Residual df}} = \frac{22.6919 / 1}{12.1687 / 32} = 59.67 \qquad (12.22)
The F statistic tests the null hypothesis that all coefficients excluding the constant are zero. Prob > F = 0.0000 means that the probability of observing an F statistic of 59.67 or greater is 0.0000, which is my statistical analysis package's way of indicating a number smaller than 0.00005. Thus, we can reject the null hypothesis as there is only a 0.005% probability that all the coefficients are zero. This means that there is a 99.995% probability that at least one of them is not zero. In this case, we only have one independent variable, so its coefficient is definitely not zero. The R² (R-squared) for the regression is 0.6509, and the R² adjusted for the degrees of freedom (Adj R-squared) is 0.6400. The root mean square error (Root MSE) is 0.61666. This is the same as the square root of MS Residual in the ANOVA table.
When you interpret the R² in the statistical output, you should use the Adjusted R-squared. This is because it is always possible to increase the value of R² just by adding more independent variables to the model. This is true even when they are not related to the dependent variable. The number of observations must be significantly greater than the number of variables for the results to be reliable. The Adjusted R-squared is calculated by the following equation:
\text{Adjusted } R^2 = 1 - \frac{(1 - R^2)(\text{Total df})}{\text{Residual df}} \qquad (12.23)
The total and residual degrees of freedom (df) can be read directly from
the statistical output. In regression analysis, the total degrees of freedom
are n−1 and the residual degrees of freedom are n−k, where n is the number of observations and k is the number of independent variables plus one (for
the constant term).
At the bottom of the output, we have a table of the estimated coefficients (Coef.). The first line of the table tells us that the dependent variable
is leffort. The estimated model is:
leffort = 3.0074 + 0.9298 × lsize \qquad (12.24)
At the right of the coefficients in the output are their standard errors
(Std. Err.), t statistics (t), significance of the t statistics (P>|t|), and 95%
confidence intervals (95% Conf. Interval). In this example, the standard
error for the coefficient of lsize is 0.1203611. The corresponding t statistic
is 7.725 (t = Coef./Std.Err.), which has a significance level of 0.000. This
is my statistical analysis package’s way of indicating a number less than
0.0005. Thus, we can be 99.95% sure that the coefficient of lsize is not
really 0. That is, we can be confident that there really is a relationship between leffort and lsize. The 95% confidence interval for the coefficient is
[0.6846, 1.1749]. This means that we are 95% confident that the true coefficient of lsize in the population lies between 0.6846 and 1.1749.
Confidence intervals are explained in Sect. 12.3.
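To reproduce this kind of output yourself, a sketch with Python's statsmodels (an assumption on tooling; the data below is hypothetical, since the chapter's 34-project data set is not listed):

import numpy as np
import statsmodels.api as sm

lsize = np.log([647, 130, 1056, 383, 249, 800, 210])
leffort = np.log([7871, 845, 21272, 4224, 2565, 9000, 1900])

X = sm.add_constant(lsize)   # adds the _cons column
model = sm.OLS(leffort, X).fit()
print(model.summary())       # coefficients, Std. Err., t, P>|t|, CIs, R-squared, F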
Analysis of Residual Errors
If the assumptions of the regression model are met, then the plot of the
residuals vs. fitted values (predicted values) should look like a random
array of dots. If there is a pattern, this indicates that we have a problem.
Figure 12.12 shows this plot for our regression output.
The assumptions of regression are:
1. A linear relationship exists.
2. The residuals have a constant variance. (This is called homoscedasticity.)
3. The residuals are independent.
4. The residuals are normally distributed.
Fig. 12.12. Plot of residuals vs. fitted values (residuals on the y-axis against fitted values on the x-axis)
We can check Assumptions 1–3 by looking out for the following patterns in the residuals (Figs. 12.13 to 12.15):
Fig. 12.13. Violation of Assumption 1
Fig. 12.14. Violation of Assumption 2
Fig. 12.15. Possible violation of Assumption 3
The residuals in Fig. 12.13 indicate that the relationship is not linear
(violation of Assumption 1). Figure 12.14 shows an example where the
errors increase with increasing values of x. This is a violation of Assumption 2. A residual pattern like Fig. 12.15 could mean that Assumption 3 has
been violated. Assumption 4 is the easiest to check. We simply plot the
distribution of the residuals. Figure 12.16 shows the residual distribution
for our example.
Fig. 12.16. Checking Assumption 4: distribution of the residuals (histogram of the residuals; x-axis: residual, y-axis: fraction of observations)
This distribution of residuals is not too far from a normal distribution. If
the assumption of a normal distribution is not met, the tests of significance
and the confidence intervals developed from them may be incorrect.
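Both residual checks are easy to automate. The sketch below is illustrative only: the size and effort arrays are hypothetical project data, and statsmodels and matplotlib are assumed to be available:

import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Hypothetical project data; we fit leffort = b0 + b1 * lsize as in the text.
size = np.array([120, 45, 300, 80, 150, 60, 500, 95, 210, 40])
effort = np.array([900, 250, 2800, 500, 1200, 380, 5200, 640, 1900, 210])
lsize, leffort = np.log(size), np.log(effort)

model = sm.OLS(leffort, sm.add_constant(lsize)).fit()

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
ax1.scatter(model.fittedvalues, model.resid)    # should look like random dots
ax1.axhline(0, linestyle="--")
ax1.set(xlabel="Fitted values", ylabel="Residual", title="Assumptions 1-3")
ax2.hist(model.resid, bins=6)                   # should look roughly normal
ax2.set(xlabel="Residual", ylabel="Count", title="Assumption 4")
plt.tight_layout()
plt.show()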
12.5.4 Analysis of Variance (ANOVA)
When many of the independent variables are qualitative, we cannot use
regression analysis. We need a different method. ANOVA techniques can
be used to identify and measure the impact of qualitative variables (business sector, application language, hardware platform, etc.) on a dependent
variable (effort, productivity, duration, etc.). Like regression analysis,
these techniques break down the total variation of the data into its various
parts using a set of well-defined procedures. As with regression analysis,
entire books have been written about ANOVA methods. I’ve summarised
the fundamental concepts in this short section.
Simple ANOVA
Let’s say that we want to know if the percentage of JCL (Job Control Language used for programming and controlling batch processing in an IBM
mainframe environment) used is related to application type. We will study
this relationship using some of the maintenance data from industrial software projects. I have chosen to illustrate the ANOVA method with a subset of the data that contains an equal number of observations in each category. In practice, the number of observations in each category will not be
the same because software project data is inherently unbalanced. However,
although the calculations are more complicated, the principle remains the
same. If you understand this example, you will understand the basics of
ANOVA.
One important assumption of ANOVA is that the sample is selected
from a normally distributed parent population, which is the case for our
example data.
Is there a difference in the percentage of JCL use among the three application types? Let’s look at Table 12.17. At first glance, I would say no. It
looks like the percentage of JCL use varies quite a lot within each application type. It is not as if we see values around 30% for all back office database applications, around 50% for all customer interconnection service
applications, and around 70% for all core banking business system applications. However, it is impossible to make any conclusion by just looking at
the data.
Table 12.17. Percentage of JCL use data and some ANOVA calculations

Core banking          Customer               Back office
business system       interconnection        database
                      service
     38                    54                     12
     24                     0                     52
      2                    90                     90
     43                     0                     74
     60                    30                     64
    100                    33                     55
     63                    21                     13
     55                    68                     49
      9                    58                     12
     62                    56                     31
     55                    89                     39
     37                    84                     49
     37                    96                     31
     35                    79                     35
     95                    31                     53
GM = 47.67            GM = 52.60             GM = 43.93       Sample Mean = 48.07
GV = 737.38           GV = 1033.11           GV = 513.21      Mean of GV = 761.23

GM = Group Means; GV = Group Variances
Our ANOVA example will test the following null hypothesis:
The mean percentage of JCL use is the same (in the population) for
each of the three application types.
Of course, we do not know what the means in the population are. But,
we can use our sample to estimate the means. If the means of percentage of
JCL use for each application type in our sample are very close together, we
will be more willing to accept that the null hypothesis is true. Our group
means are 47.67%, 52.60%, and 43.93%. These group means do not seem
that close to me. But are they significantly different given the size of our sample?11
We can get a better idea of any relationship that exists by calculating
two variances, the variance between the groups and the variance within the
groups, and comparing them. Both of these are estimates of the population
variance. We can use their ratio to accept or reject the null hypothesis. The
larger the variance in JCL use between application types and the smaller
11
In practice, a software manager would probably not consider a 9% difference in
percentage of JCL use to be that important, even if it was significant.
402
Katrina D. Maxwell
the variance within the application types, the more likely it is that percentage of JCL use really does differ among application types.
Between-Groups Variance
The between-groups variance calculates the variation between the mean
percentage of JCL use of the three application types (47.67, 52.60, 43.93)
measured about the mean JCL use of all 45 applications in our sample
(48.07). The sample variance of the group means is the sum of the squared differences between each group mean and the overall sample mean, divided by the number of groups (application types) minus one:
s_{\bar{x}}^2 = \frac{(47.67 - 48.07)^2 + (52.60 - 48.07)^2 + (43.93 - 48.07)^2}{2} = 18.91 \qquad (12.25)
As this is the sample variance of the group means, we must multiply it by the number of observations in each group (15) to calculate the between-groups variance:

s_{bg}^2 = n s_{\bar{x}}^2 = 15(18.91) = 283.65 \qquad (12.26)
Within-Groups Variance
The group variance for each application type tells us how close the actual
values of JCL use are to the mean values for that application type; that is,
how much the data varies within each group. This is known as the within-groups variance. For example, the variance of JCL use for core banking business system applications is:

s_{\text{corebanking}}^2 = \frac{(38 - 47.67)^2 + (24 - 47.67)^2 + (2 - 47.67)^2 + \cdots + (95 - 47.67)^2}{14} = 737.38 \qquad (12.27)
We have three application types and thus three estimates of population
variance (737.38, 1033.11, and 513.21). Since none of these estimates is
any better than the others, we combine them to create a single estimate of
the population variance based on the average “within”-groups variation:
s_{wg}^2 = \frac{737.38 + 1033.11 + 513.21}{3} = 761.23 \qquad (12.28)
F Ratio
Now we can calculate the ratio of the between-groups variance and the
within-groups variance. This is known as the F ratio:
F = \frac{s_{bg}^2}{s_{wg}^2} = \frac{283.65}{761.23} = 0.37 \qquad (12.29)
The within-groups variance can be thought of as the variance due to
random differences among applications. The between-groups variance can
be thought of as the variance due to random differences among applications plus differences in application type. Thus, the extent to which F exceeds 1 is indicative of a possible real effect of application type on JCL
use. However, even if F exceeds 1, it is possible that it could be by chance alone. The probability of this occurring by chance is given by the F-distribution and is calculated automatically by your statistical analysis package. In our example, F is less than 1, and we can conclude that there is no relationship between application type and the percentage of JCL use.
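The whole calculation can be reproduced from Table 12.17. The following Python sketch (assuming NumPy) mirrors Eqs. 12.25 to 12.29; the three arrays are copied from the table:

import numpy as np

core = np.array([38, 24, 2, 43, 60, 100, 63, 55, 9, 62, 55, 37, 37, 35, 95])
customer = np.array([54, 0, 90, 0, 30, 33, 21, 68, 58, 56, 89, 84, 96, 79, 31])
backoffice = np.array([12, 52, 90, 74, 64, 55, 13, 49, 12, 31, 39, 49, 31, 35, 53])
groups = [core, customer, backoffice]

group_means = [g.mean() for g in groups]          # 47.67, 52.60, 43.93
grand_mean = np.concatenate(groups).mean()        # 48.07

# Between-groups variance (Eqs. 12.25-12.26): sample variance of the group
# means times the number of observations per group.
s2_means = sum((m - grand_mean) ** 2 for m in group_means) / (len(groups) - 1)
s2_bg = len(core) * s2_means                      # 283.5 (283.65 in the text, by rounding)

# Within-groups variance (Eqs. 12.27-12.28): average of the group variances.
s2_wg = np.mean([g.var(ddof=1) for g in groups])  # ~761.23

print(f"F = {s2_bg / s2_wg:.2f}")                 # ~0.37 (Eq. 12.29)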
How To Interpret ANOVA Output
Now that you know some basics of ANOVA, you will be able to better
understand the ANOVA output in the example below. This is the output
from my statistical analysis package for our ANOVA example: percentage
of JCL use (rperjcl) as a function of application type (apptype).
Example

. anova rperjcl apptype

                      Number of obs =      45     R-squared     =  0.0174
                      Root MSE      = 27.5905     Adj R-squared = -0.0294

      Source |  Partial SS    df         MS          F     Prob > F
  -----------+------------------------------------------------------
       Model |  566.933333     2    283.466667     0.37     0.6913
     apptype |  566.933333     2    283.466667     0.37     0.6913
    Residual |  31971.8667    42    761.234921
  -----------+------------------------------------------------------
       Total |    32538.80    44    739.518182
At the top of the ANOVA output is a summary of the underlying regression. The model was estimated using 45 observations, and the root mean
square error (Root MSE) is 27.59. The R-squared for the model is 0.0174, and the R-squared adjusted for the number of degrees of freedom (Adj R-squared) is –0.0294. (See the regression output in the previous section for a discussion of Adjusted R-squared.) Obviously, this model is pretty bad.
The first line of the table summarises the model. The sum of squares
(Model SS) for the model is 566.9 with 2 degrees of freedom (Model df).
This results in a mean square (Model MS) of 566.9/2 ≅ 283.5. This is our between-groups variance, s²bg. (Once again, there is a small difference between the computer's between-groups variance and my calculation due to rounding errors.)
F = \frac{\text{Model SS}/\text{Model df}}{\text{Residual SS}/\text{Residual df}} = \frac{s_{bg}^2}{s_{wg}^2} \qquad (12.30)
The corresponding F ratio has a value of 0.37 and a significance level of 0.6913. Thus, the model is not significant. We cannot reject the null hypothesis, and we conclude that there is no significant difference in the mean percentage of JCL use across application types.12
The next line summarises the first (and only) term in the model,
apptype. Since there is only one variable, this line is the same as the previous line.
The third line summarises the residual. The residual sum of squares (Residual SS) is 31,971.87, with 42 degrees of freedom (Residual df), resulting
in a mean square error of 761.23 (Residual MS). This is our within-groups variance, s²wg. The Root MSE is the square root of this number.
The Model SS plus the Residual SS equals the Total SS. The Model df
plus the Residual df equals the Total df, 44. As there are 45 observations,
and we must subtract 1 degree of freedom for the mean, we are left with 44
total degrees of freedom.
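A one-way ANOVA routine produces the same F ratio and significance level in a single call. This sketch assumes SciPy and reuses the three group arrays defined in the earlier sketch:

from scipy import stats

f_stat, p_value = stats.f_oneway(core, customer, backoffice)
print(f"F = {f_stat:.2f}, Prob > F = {p_value:.4f}")   # F = 0.37, p = 0.6913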
Multi-variable ANOVA
ANOVA can also be used to produce regression estimates for models with
numerous quantitative and qualitative variables. ANOVA uses the method
of least squares to fit linear models to the quantitative data. Thus, you can
think of it as a combination of multiple regression analysis and the simple
ANOVA I just explained. This is not so strange as you can see that the
underlying principle in both methods is that we compare values to a mean
value. In both methods, we also compare the variation explained by the
model to the total variation of the data to measure its accuracy.
12 Had we been able to reject the null hypothesis in this example, it might not
have been because of the differences in the population means, but because of
the differences in their variances. When the sample variances for the different
groups are very different, as they are in this example, then reject with caution.
The ANOVA approach assumes that the population variances are similar.
12.5.5 Comparing Two Estimation Models
I recommend that non-statistical experts use the Wilcoxon signed-rank test
with matched pairs to determine if there is a statistically significant difference between two estimation models. This is a non-parametric statistic. As
such, it is free from the often unrealistic assumptions underlying parametric statistics.13 For example, one of the assumptions of the parametric paired t-test is that the paired data has equal variances. This may not be the case with your data and you do not want to have to worry about it. Non-parametric tests can always be used instead of parametric tests; however,
the opposite is not true.
The Wilcoxon Signed-Rank Test Applied to Matched Pairs
The Wilcoxon signed-rank test is based on the sign and rank of the absolute values of pair differences and is done automatically by most statistical
analysis packages. What does this actually mean and how can we apply it
to effort estimation models? Table 12.18 shows the estimation error (i.e.
actual – estimate) and the absolute estimation error (i.e. |actual – estimate|) of two hypothetical effort estimation models used on three projects.
We use the absolute estimation error in our calculations because we are
interested only in the magnitude of the estimation error and not if it is over
or under the estimate. The pair difference, then, is the difference in the
absolute values of the estimation errors of the two models, C and D, for
each project. The sign is negative if Model D’s error is greater than Model
C’s for that project. The sign is positive if Model C’s error is greater than
Model D’s for that project. The rank is based on the comparison of absolute values of the pair differences for each project. The smallest absolute
pair difference of all three projects gets a rank of 1, the second smallest
gets a rank of 2, and so on. The computer uses the information in the last
two columns to compute the Wilcoxon signed-rank test statistic. From this
test, we can determine if either Model C or Model D has consistently
smaller errors.
Let’s look at the statistical output for this test in the next example to try
to understand what is going on. The example compares two models –
Model A and Model B.
In the statistical output, aModel_A refers to Model A’s absolute errors
and aModel_B refers to Model B’s absolute errors. The null hypothesis is
that the distribution of the paired differences has a median of 0 and is
symmetric. This implies that for approximately half the projects, Model A
13 Parametric statistics are only suitable for data measured on interval and ratio scales, where parameters such as the mean of the distribution can be defined.
has a smaller error, and for half the projects, Model B has a smaller error.
Thus neither model is better. If this were the case, then we would expect
the sum of the ranks to be the same for positive and negative differences.
These are the expected values, 976.5, in the statistical output.
Table 12.18. How to Rank Differences for Wilcoxon Signed-Rank Tests on Matched Pairs

      Estimation error    Absolute estimation     Pair              Rank of absolute
Id    (hours)             error (hours)           difference  Sign  differences
      Model C   Model D   Model C   Model D
OB1     −200      300       200       300           −100       −          2
OB2       50      100        50       100            −50       −          1
OB3      150      −20       150        20            130       +          3
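The signs and ranks in Table 12.18 can be computed mechanically; the following minimal Python sketch (assuming NumPy and SciPy) reproduces the table's three rows:

import numpy as np
from scipy.stats import rankdata

# Estimation errors of Models C and D on the three projects in Table 12.18.
err_c = np.array([-200, 50, 150])
err_d = np.array([300, 100, -20])

diff = np.abs(err_c) - np.abs(err_d)   # pair differences: -100, -50, 130
signs = np.sign(diff)                  # -, -, +
ranks = rankdata(np.abs(diff))         # ranks of |differences|: 2, 1, 3

for pid, d, s, r in zip(["OB1", "OB2", "OB3"], diff, signs, ranks):
    print(pid, d, "+" if s > 0 else "-", int(r))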
Example

. signrank aModel_A=aModel_B

Wilcoxon signed-rank test

        sign |      obs   sum ranks    expected
    ---------+---------------------------------
    positive |       26         798       976.5
    negative |       36        1155       976.5
        zero |        0           0           0
    ---------+---------------------------------
         all |       62        1953        1953

    unadjusted variance      20343.75
    adjustment for ties          0.00
    adjustment for zeros         0.00
                            ----------
    adjusted variance        20343.75

    Ho: aModel_A = aModel_B
             z = -1.251
    Prob > |z| =   0.2108
What we find, however, is that the rank sum of the positive differences
is 798 and the rank sum of the negative differences is 1155. This means
that Model B’s absolute error is ranked higher than Model A’s absolute
error for more projects (remember that the difference = Model A – Model
B). However, we only have a sample and this may have happened by
chance. So, we need to check the probability that this happened by chance.
The statistic computed by the Wilcoxon test, the z value, is –1.251. If –1.96 < z < 1.96, there is no significant difference between the models. If z is less than –1.96, then Model A has a significantly lower absolute error. If z is greater than 1.96, then Model A has a significantly higher absolute error. As –1.251 is between –1.96 and 1.96, there is no statistically significant difference between the models. Our significance level is 21% (Prob > |z| = 0.2108). This means that if we reject the null hypothesis, there
is a 21% probability of being wrong (i.e. rejecting the null hypothesis
when it is in fact true). It is typical in statistical studies to accept only a 5%
chance of being wrong. Thus, there is no statistically significant difference
between the two models.
Does this z value, 1.96, seem familiar? In fact, it comes from the 95%
confidence interval of the normal curve. There is a 5% chance of getting a
value higher than |1.96|. This means there is a 2.5% chance of getting a
value lower than –1.96 and a 2.5% chance of getting a value higher than
1.96. This is what is meant by a two-sided (or two-tailed) test. A one-sided
test checks only the chance of a value being lower or higher.
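Most statistical packages run this test in one call. The following Python sketch uses SciPy's wilcoxon function on two arrays of absolute errors; the arrays are randomly generated stand-ins for two models' absolute errors, not the data behind the output above:

import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(1)
abs_err_a = rng.gamma(2.0, 150.0, size=62)               # hypothetical Model A errors
abs_err_b = abs_err_a * rng.uniform(0.7, 1.6, size=62)   # Model B, slightly worse

# Two-sided test of the null hypothesis that the median pair difference is 0.
stat, p_value = wilcoxon(abs_err_a, abs_err_b)
print(f"Wilcoxon statistic = {stat:.1f}, p = {p_value:.4f}")
# Reject the null hypothesis only if p_value < 0.05 (the usual 5% level).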
12.5.6 Final Comments
In this chapter, you learned some of the basic concepts of statistics and
developed a deeper understanding of multivariate statistical analysis. I’ll
end with one final word of advice: Remember to be reasonable with your
inferences. If you find some interesting results based on 30 projects in your
company, you can say something about what is going on in your company.
This does not mean that this is true for all software projects in the world.
Only if people in different companies keep finding the same results can
you start to believe that you have found a fundamental truth. For example,
enough studies have now been published that we can be certain that there
is a real relationship between software effort and size. However, the exact
equation describing the relationship varies by study. This is why it is often
necessary to calibrate software cost estimation tools using your company’s
data.
Author’s Biography
Katrina Maxwell is an expert in the area of software development productivity
and cost estimation. Her research has been published in IEEE Transactions on
Software Engineering, IEEE Software, Management Science and Academic Press’s
prestigious “Advances in Computers” series. She is the author of Applied Statistics
for Software Managers published by Prentice Hall PTR. She has taught at the University of Illinois, INSEAD, and the Ecole Supérieure de Commerce de Paris, and
was Programme Chair of the ESCOM-SCOPE 2000 and 2001 conferences. Between 1988 and 1997 she was a Research Fellow at INSEAD where she undertook
research in the areas of economics, business policy, marketing, operations research
and technology management. In particular, she worked for four years on a research
project, funded by the cost analysis division of the European Space Agency, to
develop a better understanding of software development costs in order to improve
the evaluation of subcontractor bids. As the manager of the ESA software metrics
database, she improved the data collection methodology, collected and validated
data from subcontractors, analysed the data, and communicated the results via
research papers, conference presentations and workshops. In 1997, she created
Datamax, which specializes in consulting, research, and training in software metrics and data analysis. She is also a Senior Research Fellow at INSEAD.
13 Empirical Research Methods in Web and Software Engineering1
Claes Wohlin, Martin Höst, Kennet Henningsson
Abstract: Web and software engineering are not only about technical solutions. They are to a large extent also concerned with organisational issues, project management and human behaviour. For disciplines like Web
and software engineering, empirical methods are crucial, since they allow
for incorporating human behaviour into the research approach taken. Empirical methods are common practice in many other disciplines. This chapter provides a motivation for the use of empirical methods in Web and
software engineering research. The main motivation is that it is needed
from an engineering perspective to allow for informed and well-grounded
decisions. The chapter continues with a brief introduction to four research
methods: controlled experiments, case studies, surveys and post-mortem
analyses. These methods are then put into an improvement context. The
four methods are presented with the objective to introduce the reader to the
methods to a level where it is possible to select the most suitable method at
a specific instance. The methods have in common that they all are concerned with quantitative data. However, several of them are also suitable
for qualitative data. Finally, it is concluded that the methods are not competing. On the contrary, the different research methods can preferably be
used together to obtain more sources of information that hopefully lead to
more informed engineering decisions in Web and software engineering.
Keywords: Case study, Controlled experiment, Survey, Post-mortem analysis, Empirical investigation, Engineering discipline.
13.1 Introduction
To become a true engineering discipline Web and software engineering
have to adopt and adapt research methods from other disciplines. Engineering means, among other things, that we should be able to understand, plan,
monitor, control, estimate, predict and improve the way we engineer our
products. One enabler for doing this is measurement. Web and software
1 A previous version of this chapter has been published in Empirical Methods and Studies
in Software Engineering: Experiences from ESERNET, pp 7–23, editors Reidar Conradi
and Alf Inge Wang, Lecture Notes in Computer Science Springer-Verlag, Germany,
2765, 2003. This chapter has been adapted by Emilia Mendes.
measurement form the basis, but they are not sufficient. Empirical methods
such as controlled experiments, case studies, surveys and post-mortem
analyses are needed to help us evaluate and validate the research results.
These methods are needed so that it is possible to scientifically state
whether something is better than something else. Thus, empirical methods
provide one important scientific basis for both Web and software engineering. For some types of problems other methods, e.g. the use of mathematical models for predicting software reliability, are better suited, but in most
cases the best method is to apply empiricism. The main reason is that Web
and software development are human intensive, and hence they do not lend
themselves to analytical approaches. This means that empirical methods are
essential to the researcher.
The empirical methods are, however, also crucial from an industrial
point of view. Companies aspiring to become learning organisations have
to consider the following definition of a learning organisation:
“A learning organisation is an organisation skilled at creating, acquiring, and transferring knowledge, and at modifying its behavior to
reflect new knowledge and insights.” [1]
Garvin continues by stating that learning organisations are good at five
activities: systematic problem solving, experimentation, learning from past
experiences, learning from others, and transferring knowledge. This includes relying on scientific methods rather than guesswork. From the perspective of this chapter, the key issue is the application of a scientific
method and the use of empirical methods as a vehicle for systematic improvement when engineering Web applications and software. The quote
from Garvin is in line with the concepts of the Quality Improvement Paradigm and the Experience Factory [2] that are often used in a software engineering context.
In summary, the above means that Web and software engineering researchers and learning organisations both have a need to embrace empirical methods. The main objective of this chapter is to provide an introduction to four empirical research methods and to put them into an
engineering context.
The remainder of this chapter is outlined as follows. Four empirical
methods are briefly introduced in Sect. 13.2 to provide the reader with a
reference framework to better understand the differences and similarities
between the methods presented later. In Sect. 13.3, the four empirical
methods are put into an improvement context before presenting the methods in some more detail in Sects. 13.4 to 13.7. The chapter is concluded
with a short summary in Sect. 13.8.
13.2 Overview of Empirical Methods
There are two main types of research paradigms having different approaches to empirical studies. Qualitative research is concerned with
studying objects in their natural setting. A qualitative researcher attempts
to interpret a phenomenon based on explanations that people bring to them
[3]. Qualitative research begins with accepting that there is a range of different ways of interpretation. It is concerned with discovering causes noticed by the subjects in the study, and understanding their view of the
problem at hand. The subject is the person who is taking part in a study in
order to evaluate an object.
Quantitative research is mainly concerned with quantifying a relationship or comparing two or more groups [4]. The aim is to identify a cause-effect relationship. The quantitative research is often conducted through
setting up controlled experiments or collecting data through case studies.
Quantitative investigations are appropriate when testing the effect of some
manipulation or activity. An advantage is that quantitative data promotes
comparisons and statistical analysis. The use of quantitative research
methods is dependent on the application of measurement, which is further
discussed in [5].
It is possible for qualitative and quantitative research to investigate the
same topics but each of them will address a different type of question. For
example, a quantitative investigation could be launched to investigate how
much a new inspection method decreases the number of faults found in a
test. To answer questions about the sources of variations between different
inspection groups, we need a qualitative investigation.
As mentioned earlier, quantitative strategies, such as controlled experiments, are appropriate when testing the effects of a treatment, while a
qualitative study of beliefs and understandings is appropriate to find out
why the results from a quantitative investigation are as they are. The two
approaches should be regarded as complementary rather than competitive.
In general, any empirical study can be mapped to the following main research steps: Definition, Planning, Operation, Analysis & interpretation,
Conclusions and Presentation & packaging. The work within the steps
differs considerably depending on the type of empirical study. However,
instead of trying to present four different research methods according to
this general process, we have chosen to highlight the main aspects of interest for the different types of studies.
Depending on the purpose of the evaluation, whether it is techniques,
methods or tools, and depending on the conditions for the empirical investigation, there are four major different types of investigations (strategies)
that are addressed here:
• Experiment. Experiments are sometimes referred to as research-in-the-small [6], since they are concerned with a limited scope and most often
are run in a laboratory setting. They are often highly controlled and
hence also occasionally referred to as controlled experiments, which is
used hereafter. When experimenting, subjects are assigned to different
treatments at random. The objective is to manipulate one or more variables and control all other variables at fixed levels. The effect of the
manipulation is measured, and based on this a statistical analysis can be
performed. In some cases it may be impossible to use true experimentation; we may have to use quasi-experiments. The latter term is often
used when it is impossible to perform random assignment of the subjects to the different treatments. An example of a controlled experiment
in Web engineering is to compare two different methods for developing
web applications (e.g. OOHDM vs. W2000). For this type of study,
methods for statistical inference are applied with the purpose of showing with statistical significance that one method is better than the other
[7, 8, 9].
• Case study. Case study research is sometimes referred to as research-in-the-typical [6]. It is described in this way because a case study normally studies a real project and hence the situation is “typical”. Case
studies are used for monitoring projects, activities or assignments. Data
is collected for a specific purpose throughout the study. Based on the
data collection, statistical analyses can be carried out. The case study is
normally aimed at tracking a specific attribute or establishing relationships between different attributes. The level of control is lower in a
case study than in an experiment. A case study is an observational
study while the experiment is a controlled study [10]. A case study
may, for example, be aimed at building a model to predict the number
of faults in testing. Multivariate statistical analysis is often applied in
this type of study. The analysis methods include linear regression and
principal component analysis [11]. Case study research is further discussed in [9, 12, 13, 14].
The following two methods are both concerned with research-in-the-past, although they have different approaches to studying the past:
• Survey. The survey is referred to by [6] as research-in-the-large (and
past), since it is possible to send a questionnaire to or interview a
large number of people covering whatever target population we have.
Thus, a survey is often an investigation performed in retrospect, when
a tool or technique, say, has been in use for a while [13]. The primary
means of gathering qualitative or quantitative data are interviews or
questionnaires. These are done by taking a sample that is representative of the population to be studied. The results from the survey are
then analysed to derive descriptive and explanatory conclusions. They are then generalised to the population from which the sample was taken. Surveys are discussed further in [9, 15].
• Post-mortem analysis. This type of analysis is also conducted on the
past as indicated by the name. However, it should be interpreted a little
broader than literally as a post-mortem. For example, a project does
not have to be finished to launch a post-mortem analysis. It should be
possible to study any part of a project retrospectively using this type of
analysis. Thus, this type of analysis may, in the descriptive way used
by [6], be described as being research-in-the-past-and-typical. It can
hence be viewed as related to both the survey and the case study. The
post-mortem may be conducted by looking at project documentation
(e.g. archival analysis [9]) or by interviewing people, individually or as
a group, who have participated in the object that is being analysed in
the post-mortem analysis.
An experiment is a formal, rigorous and controlled investigation. In an
experiment the key factors are identified and manipulated. The separation
between case studies and experiments can be represented by the notion of a
state variable [13]. In an experiment, the state variable can assume different
values and the objective is normally to distinguish between two situations:
for example, a control situation and the situation under investigation. Examples of a state variable could be, for example, the inspection method or
experience of the Web developers. In a case study, the state variable only
assumes one value, governed by the actual project under study.
Case study research is a technique where key factors that may have any
effect on the outcome are identified and then the activity is documented
[12, 14]. Case study research is an observational method, i.e. it is done by
observation of an on-going project or activity.
Surveys are very common within social sciences where, for example, attitudes are polled to determine how a population will vote in the next election. A survey provides no control of the execution or the measurement,
though it is possible to compare it with similar ones, but it is not possible
to manipulate variables as in the other investigation methods [15].
Finally, a post-mortem analysis may be viewed as inheriting properties
from both surveys and case studies. A post-mortem may contain survey
elements, but it is normally concerned with a case. The latter could be either a full Web project or a specific targeted activity.
For all four methods, it is important to consider the population of interest. It is from the population that a sample should be found. The sample
should preferably be chosen randomly from the population. The sample
should consist of a number of subjects: for example, in many cases individuals participating in a study. The actual population may vary from an
ambition to have a general population, as is normally the objective in experiments where we would like to generalise the results, to a more narrow
view, which may be the case in post-mortem analyses and case studies.
Some of the research strategies could be classified as both qualitative
and quantitative, depending on the design of the investigation, as shown in
Table 13.1. The classification of a survey depends on the design of the
questionnaires, i.e. which data is collected and if it is possible to apply any
statistical methods. Also, this is true for case studies, but the difference is
that a survey is done in retrospect while a case study is done when a project is executed. A survey could also be launched before the execution of a
project. In the latter case, the survey is based on previous experiences and
hence conducted in retrospect to these experiences, although the objective
is to get some ideas of the outcome of the forthcoming project. A post-mortem is normally conducted close to the end of an activity or project. It is important to conduct it close in time to the actual finish so that people are still available and the experiences are fresh.
Experiments are purely quantitative since they focus on measuring different variables, changing them and measuring them again. During these
investigations quantitative data is collected and then statistical methods are
applied. Sections 13.4 to 13.7 give introductions to each empirical strategy, but before this the empirical methods are put into an improvement
context in the following section. The introduction to controlled experiments is longer than for the other empirical methods. The main reason is
that the procedure for running controlled experiments is more formal, i.e. it
is sometimes referred to as a fixed design [9]. The other methods are more
flexible and it is hence not possible to describe the actual research process
in the same depth. Table 13.1 indicates this, where the qualitative and
quantitative nature of the methods are indicated. Methods with a less fixed
design are sometimes referred to as flexible design [9], which also indicates that the design may change during the execution of the study due to
events happening during the study.
Table 13.1. Qualitative vs. quantitative in empirical strategies

Strategy        Qualitative/quantitative
Experiment      Quantitative
Case study      Both
Survey          Both
Post-mortem     Both
13.3 Empirical Methods in an Improvement Context
Systematic improvement includes using a generic improvement cycle such
as the Quality Improvement Paradigm (QIP) [2]. This improvement cycle is
generic in the sense that it can both be viewed as a recommended way to
work with improvement of Web and software development, and also be
used as a framework for conducting empirical studies. For simplicity, it is
primarily viewed here as a way of improving Web development, and complemented with a simple three-step approach on how the empirical methods
can be used as a vehicle for systematic engineering-based improvement.
The QIP consists of six steps that are repeated iteratively:
1. Characterise. The objective is to understand the current situation and
establish a baseline.
2. Set goals. Quantifiable goals are set and given in terms of improvement.
3. Choose process/method/technique. Based on the characterisation and
the goals, the part to improve is identified and a suitable improvement
candidate is identified.
4. Execute. The study or project is performed and the results are collected
for evaluation purposes.
5. Analyse. The outcome is studied and future possible improvements are
identified.
6. Package. The experiences are packaged so that they can form the basis
for further improvements.
It is in most cases impossible to start improving directly. The first step is
normally to understand the current situation and then improvement opportunities are identified and they need to be evaluated before being introduced into an industrial process as an improvement. Thus, systematic improvement is based on the following steps:
• Understand,
• Evaluate, and
• Improve.
As a scenario, it is possible to imagine that one or both of the two
methods looking at the past are used for understanding and baselining, i.e.
a survey or a post-mortem analysis may be conducted to get a picture of
the current situation. The objectives of a survey and a post-mortem analysis are slightly different as discussed in Sect. 13.2. The evaluation step
may be executed using either a controlled experiment or a case study. It
will most likely be a controlled experiment if the identified improvement
candidate is evaluated in a laboratory setting and compared with another
method, preferably the existing method or a method that may be used for
benchmarking. It may be a case study if it is judged that the improvement
candidate can be introduced in a pilot project directly. This pilot Web
project ought to be studied and a suitable method is to use a case study. In
the actual improvement in an industrial setting (normally initially in one
project), it is probably better to use a case study approach, which then
may be compared with the situation found when creating the understanding. Finally, if the evaluation comes out positive, the improvement is incorporated in the standard Web or software development process.
The above means that the four methods presented here should be viewed
as complementary and not competing. They all have their benefits and
drawbacks. The scenario above should be viewed as one possible way of
using the methods as complementary in improving the way Web applications and software are engineered.
Next, the four methods are presented in more detail to provide an introduction and understanding of them. The objective is to provide sufficient
information so that a researcher intending to conduct an empirical study in
Web or software engineering can select an appropriate method given the
situation at hand.
13.4 Controlled Experiments
13.4.1 Introduction
In an experiment the researcher has control over the study and how the
participants carry out the tasks that they are assigned to. This can be compared to a typical case study, see below, where the researcher is more of an
observer. The advantage of the experiment is, of course, that the study can
be planned and designed to ensure high validity, although the drawback is
that the scope of the study often gets smaller. For example, it should be
possible to view a complete Web development project as a case study, but
a typical experiment does not include all the activities of such a project.
Experiments are often conducted to compare a number of different techniques, methods, working procedures, etc. For example, an experiment
could be carried out with the objective of comparing two different reading
techniques for inspections. In this example two groups of people could
independently perform a task with one reading technique each. That is, if
there are two reading techniques, R1 and R2, and two groups, G1 and G2,
then people in group G1 could use technique R1 and people in group G2
could use technique R2. This small example is used in the following subsections to illustrate some of the concepts for controlled experiments.
13.4.2 Design
Before the experiment can be carried out it must be planned in detail. This
plan is often referred to as the experiment’s design.
In an experiment we wish to draw conclusions that are valid for a large
population. For example, we wish to investigate whether reading technique
R1 is more effective than reading technique R2 in general for any developer, project, organization, etc. However, it is, of course, impossible to
involve every developer in the study. Therefore, a sample of the entire
population is used in the experiment. Ideally, it should be possible to randomly choose a sample from the population to include in the study, but
this is for obvious reasons almost impossible. Often, we end up trying to
determine to which population we can generalise the results from a certain
set of participants.
The main reason for the above is that the relation between sample and
population is intricate and difficult to handle. In the Web and software
engineering domains, it is mostly desirable to sample from all Web or
software developers, or a subset of them, e.g. all Web designers using a
specific programming language. For practical reasons this is impossible.
Thus, in the best case it is possible to choose from Web developers in the
vicinity of the researcher. This means that the sample is not a true sample
from the population, although it may be fairly good. In many cases, it is
impossible to have professional developers, so students are used; in particular, we may have to settle for students on a specific course. The latter is
referred to as convenience sampling [9]. This situation means that in most
cases we must go from subjects to population when the preferred situation
is to go from population to subjects through random sampling. This should
not necessarily be seen as a failure. It may be a complementary approach.
However, it is important to be aware of the difference and also to consider
how this affects the statistical analysis, since most statistical methods have
developed based on the assumption of a random sample from the population of interest. The challenge of representative samples is also discussed
in Chap. 12.
Another important principle of experiments is randomisation. With this
we mean that when it is decided which treatment every participant should
be subject to, this is done by random. For example, if 20 people participate
in the study where the two reading techniques R1 and R2 are compared, it
is decided at random which 10 people should use R1 and which 10 people
should use R2.
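Random assignment is straightforward to script. As a minimal illustration (the participant labels are invented), the following Python sketch splits 20 participants between R1 and R2 at random:

import random

participants = [f"P{i:02d}" for i in range(1, 21)]
random.seed(42)            # fixed seed only to make the example reproducible
random.shuffle(participants)
group_r1, group_r2 = participants[:10], participants[10:]
print("R1:", group_r1)
print("R2:", group_r2)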
In experiments a number of variables are often defined. Two important
types of variables are:
• Independent variables: These variables describe the treatments in the
experiment. In the above example, the choice of reading technique is
an independent variable that can take one of the two values R1 or R2.
• Dependent variables: These variables are studied to investigate
whether they are influenced by the independent variables. For example, the number of defects can be a dependent variable that we believe
is dependent on whether R1 or R2 is used. The objective of the experiment is to determine if and how much the dependent variables are
affected by the independent variables.
The independent and dependent variables are formulated to cover one or
several hypotheses that we have with respect to the experiment. For example, we may hypothesise that the number of defects is dependent on the
two reading techniques in the example. Hypothesis testing is discussed
further in relation to the analysis.
The independent and dependent variables are illustrated in Fig. 13.1 together with the confounding factors. Confounding factors are variables that
may affect the dependent variables without the knowledge of the researcher. It is hence crucial to try to identify the factors that otherwise may
affect the outcome in an undesirable way. These factors are closely related
to the threats about the validity of the empirical study. Thus, it is important
to consider confounding factors and the threats to the study throughout the
performance of any empirical study. The threats to empirical studies are
discussed in Sect. 13.4.4. One objective of the design is to minimise the effect of these factors.
Fig. 13.1. Variables in an experiment (the independent variables and the confounding factors act on the experiment, which produces the dependent variables)
Often one of several available standard designs is used. Some examples
of standard designs are:
• Standard design 1: One independent variable with two values. For
example, two techniques should be compared and each participant uses
one of the techniques.
• Standard design 2: One independent variable with two values, paired
design. The difference between this design and standard design 1 is
that each person in this design is subject to both treatments. The order
in which each participant should apply the treatments is decided at random. For example, if the two reading techniques are to be evaluated,
half of the participants first use R1 and then R2, and the other half first
use R2 and then R1. The reason for using the treatments in different
order is that effects of the order should be ruled out.
• Standard design 3: One independent variable with more than two values. The difference between this design and standard design 1 is that
more than two treatments are compared. For example, three reading
techniques may be compared.
• Standard design 4: More than one independent variable. With this
design more than one aspect can be evaluated in an experiment. For
example, the choice of both reading technique and requirements notation may be compared in one experiment.
The designs that are presented here are a summary of some of the most
commonly used designs. There are alternatives and more complicated designs. For example, sometimes experiments are carried out as a combination of a pre-study and a main experiment.
13.4.3 Operation
In the operation of an experiment a number of parts can be included. These
include both parts that have to be done when starting the experiment and
when actually running the experiment. Three key parts are:
• Commit participants: It is important that every participant is committed
to the tasks. There are a number of factors to consider: for example, if
the experiment concerns sensitive material, it will be difficult to get
committed people.
• Prepare instrumentation: All the material that should be used during
the experiment must be prepared. This may include written instructions
to the participants, forms that should be used by the participants during
the tests, etc. The instrumentation should be developed according to
the design of the experiment. In most cases different participants
should be given different sets of instructions and forms. In many cases
paper-based forms are used during an experiment. It is, however, possible to collect data in a number of other ways, e.g. Web-based forms,
interviews, etc.
• Execution: The actual execution denotes the part of the experiment
where the participants, subject to their treatment, carry out the task that
they are assigned to. For example, it may mean that some participants
solve a development assignment with one development tool and the
other participants solve the same assignment with another tool. During
this task the participants use the prepared instrumentation to receive instructions and to record data that can be used later in the analysis.
13.4.4 Analysis and Interpretation
Before actually doing any analysis, it is important to validate that the data
is correct, and that the instruments used (e.g. forms) have been filled out
correctly. This activity may also be sorted under execution of the experiment, and hence be carried out before the actual analysis.
The first part in the actual analysis is normally to apply descriptive statistics. This includes plotting the data in some way to obtain an overview
of the data. Part of this analysis is done to identify and handle outliers. An
outlier denotes a value that is atypical and unexpected in the data set. Outliers may, for example, be identified through boxplots [16] or scatterplots.
Every outlier must be investigated and handled separately. It may be that
the value is simply wrong. Then it may be corrected or discarded. It may
also, of course, be the case that the value is correct. In that case it can be
included in the analysis or, if the reason for the atypical value can be identified, it may be handled separately.
When we have made sure that the data is correct and obtained a good
understanding of the data from the descriptive statistics then the analysis
related to testing one or several hypotheses can start. In most cases the
objective here is to decide whether there is an effect of the value of the
independent variable(s) on the value of the dependent variable(s). This is
in most cases analysed through hypothesis testing. To understand hypothesis testing some important definitions must be understood:
• The null hypothesis H0 denotes that there is no effect of the independent variable on the dependent variable. The objective of the hypothesis test is to reject this hypothesis with a known significance.
• P(type-I error) = P(reject H0 | H0 is true). This probability may also be called the significance of a hypothesis test.
• P(type-II error) = P(not reject H0 | H0 is false).
• Power = 1 − P(type-II error) = P(reject H0 | H0 is false).
When the test is carried out, a maximum P(type-I error) is first decided.
Then a test is used in order to decide whether it is possible to reject the
null hypothesis or not. When choosing a test, it must be decided whether to
use parametric or non-parametric tests. Generally, there are harder requirements on the data for parametric tests. They are, for example, based
on the assumption that the data is normally distributed. However, parametric tests generally have higher power than non-parametric tests, i.e. less
data is needed to obtain significant results when using parametric tests.
The difference is not large. It is, of course, impossible to provide any exact
figure, but it is in most cases of the order of 10%. For every design there
are a number of tests that may be used. Some examples of tests are given
in Table 13.2.
The tests in Table 13.2 are all described in a number of basic statistical
references. More information on parametric tests can be found in [7], and
information on the non-parametric tests can be found in [8] and [17].
Table 13.2. Examples of tests

Standard design (see above)   Parametric tests   Non-parametric tests
Standard design 1             t-test             Mann–Whitney
Standard design 2             Paired t-test      Wilcoxon, Sign test
Standard design 3             ANOVA              Kruskal–Wallis
Standard design 4             ANOVA
Before the results are presented it is important to assess how valid the
results are. Basically there are four categories of validity concerns, which
are discussed in a software engineering context in [18]:
• Internal: The internal validity is concerned with factors that may affect
the dependent variables without the researcher's knowledge. An example of an issue is whether the history of the participants affects the result of an experiment. For example, the result may not be the same if
the experiment is carried out directly after a complicated fault in the
code has caused the participant a lot of problem compared to a more
normal situation. A good example of how confounding factors may
threaten the internal validity in a study is presented in [19].
• External: The external validity is related to the ability to generalise the
results of the experiments. Examples of issues are whether the problem
that the participants have been working on is representative and
whether the participants are representative of the target population.
• Conclusion: The conclusion validity is concerned with the possibility
to draw correct conclusions regarding the relationship between treatments and the outcome of an experiment. Examples of issues to consider are whether the statistical power of the tests is too low, or if the
reliability of the measurements is high enough.
• Construct: The construct validity is related to the relationship between
the concepts and theories behind the experiment and what is measured
and affected. Examples of issues are whether the concepts are defined
clearly enough before measurements are defined, and interaction of
different treatments when persons are involved in more than one study.
Obviously, it is important to have these validity concerns already in
mind when designing the experiment and in particular when using a specific design type. In the analysis phase it is too late to change the experiment in order to obtain better validity. The different validity threats should
also be considered for the other types of empirical studies discussed in the
following sections.
When the analysis is completed the next step is to draw conclusions and
take actions based on the conclusions.
More in-depth descriptions of controlled experiments can be found in
[18] and [20].
13.5 Case Study
13.5.1 Introduction
A case study is conducted to investigate a single entity or phenomenon
within a specific time space. The researcher collects detailed information
on, for example, one single project for a sustained period of time. During
the performance of a case study, a variety of different data collection procedures may be applied [4].
If we want to compare two methods, it may be necessary to organise the
study as a case study or an experiment. The choice depends on the scale of
the evaluation. An example can be to use a pilot project to evaluate the
effects of a change compared to some baseline [6].
Case studies are very suitable for the industrial evaluation of Web and
software engineering methods and tools because they can avoid scale-up
problems. The difference between case studies and experiments is that
experiments sample over the variables that are being manipulated, while
case studies sample from the variables representing the typical situation.
An advantage of case studies is that they are easier to plan but the disadvantages are that the results are difficult to generalise and harder to interpret, i.e. it is possible to show the effects in a typical situation, but they
cannot be generalised to every situation [14].
If the effect of a process change is very widespread, a case study is more
suitable. The effect of the change can only be assessed at a high level of
abstraction because the process change includes smaller and more detailed
changes throughout the development process [6]. Also, the effects of the
change cannot be identified immediately. For example, if we want to know
if a new design tool increases the reliability, it may be necessary to wait
until after delivery of the developed product to assess the effects on operational failures.
Case study research is a standard method used for empirical studies in
various sciences such as sociology, medicine and psychology. Within Web
and software engineering, case studies should be used not only to evaluate
how or why certain phenomena occur, but also to evaluate the differences
between, for example, two design methods. This means, in other words, to
determine “which is best” of the two methods [14]. An example of a case
study might be to assess whether the use of perspective-based reading increases the quality of requirements specifications. A study like this cannot
verify that perspective-based reading reduces the number of faults that
reach test, since this requires a reference group that does not use perspective-based techniques.
13.5.2 Case Study Arrangements
A case study can be applied as a comparative research strategy, comparing
the results of using one method or some form of manipulation to the results
of using another approach. To avoid bias and to ensure internal validity, it
is necessary to create a solid base for assessing the results of the case
study. There are three ways to arrange the study to facilitate this [6].
A comparison of the results of using the new method against a company
baseline is one solution. The company should gather data from standard
projects and calculate characteristics like average productivity and defect
rate. Then it is possible to compare the results from the case study with the
figures from the baseline.
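For illustration, here is a minimal sketch (in Python, with purely hypothetical figures) of such a baseline comparison: the case-study project's productivity and defect rate are expressed as deviations from the baseline averages.

```python
# Hypothetical baseline figures gathered from past standard projects,
# and the single project studied in the case study.
baseline_productivity = [9.8, 11.2, 10.5, 12.0, 10.1]  # e.g. size units per person-hour
baseline_defect_rate = [0.42, 0.38, 0.51, 0.45, 0.40]  # e.g. defects per size unit

case_productivity = 12.6
case_defect_rate = 0.33

def mean_and_sd(values):
    """Sample mean and standard deviation of the baseline projects."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return mean, variance ** 0.5

for name, baseline, observed in [
    ("productivity", baseline_productivity, case_productivity),
    ("defect rate", baseline_defect_rate, case_defect_rate),
]:
    mean, sd = mean_and_sd(baseline)
    # Express the case-study result as a deviation from the baseline mean,
    # in units of the baseline standard deviation.
    z = (observed - mean) / sd
    print(f"{name}: baseline {mean:.2f} (sd {sd:.2f}), case study {observed:.2f}, z = {z:+.2f}")
```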
A sister project can be chosen as a baseline. The project under study
uses the new method and the sister project the current one. Both projects
should have the same characteristics, i.e. the projects must be comparable.
If the method applies to individual product components, it could be applied at random to some components and not to others. This is very similar
to an experiment, but since the projects are not drawn at random from the
population of all projects, it is not an experiment.
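A minimal sketch of such a random component-level assignment, assuming a hypothetical list of component names, might look as follows.

```python
import random

# Hypothetical product components; the new method is applied to a randomly
# chosen half, the current method to the other half.
components = ["login", "catalogue", "cart", "checkout", "search", "admin"]

random.seed(1)  # fixed seed so the assignment is reproducible
shuffled = random.sample(components, k=len(components))
half = len(shuffled) // 2
new_method, current_method = shuffled[:half], shuffled[half:]

print("New method:    ", sorted(new_method))
print("Current method:", sorted(current_method))
```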
13.5.3 Confounding Factors and Other Aspects
When performing case studies it is necessary to minimise the effects of
confounding factors. A confounding factor is, as described in Sect. 13.4, a
factor that makes it impossible to distinguish the effects of two factors
from each other. This is important since we do not have the same control
over a case study as in an experiment. For example, it may be difficult to
tell if a better result depends on the tool or the experience of the user of the
tool. Confounding effects could involve problems with learning how to use
a tool or method when trying to assess its benefits, or using very enthusiastic or sceptical staff.
There are both pros and cons to case studies. Case studies are valuable because they incorporate qualities that an experiment cannot capture,
e.g. scale, complexity, unpredictability and dynamism. Some potential
problems with case studies are as follows.
A small or simplified case study is seldom a good instrument for discovering Web and software engineering principles and techniques. Increases in scale lead to changes in the type of problems that become most
indicative. In other words, the problem may be different in a small case
study and in a large case study, although the objective is to study the same
issues. For example, in a small case study the main problem may be the
actual technique being studied, and in a large case study the major problem
may be the number of people involved and hence also the communication
between people.
Researchers are not completely in control of a case study situation. This
is good, from one perspective, because unpredictable changes frequently
tell them much about the problems being studied. The drawback is that we
cannot be certain about the effects, due to confounding factors.
More information on case study research can be found in [12] and [14].
13.6 Survey
Surveys are conducted when the use of a technique or tool has already
taken place [13] or before it is introduced. A survey can be seen as a snapshot of
the situation, capturing the current status. Surveys could, for example, be
used for opinion polls and market research.
When performing survey research the interest may be, for example, in
studying how a new Web development process has improved the developers’ attitudes towards quality assurance. Then a sample of developers is
selected from all the developers at the company. A questionnaire is constructed to obtain information needed for the research. The questionnaires
are answered by the sample of developers. The information collected is
then arranged into a form that can be handled in a quantitative or qualitative manner.
13.6.1 Survey Characteristics
Sample surveys are almost never conducted to create an understanding of
the particular sample. Instead, the purpose is to understand the population,
from which the sample was drawn [15]. For example, by interviewing 25
developers on what they think about a new process, the opinion of the larger population of 100 developers in the company can be predicted. Surveys thus aim at the development of generalised statements.
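As a worked illustration of this sample-to-population step, the following sketch (with a hypothetical number of favourable answers) estimates the share of favourable opinions among all 100 developers from a sample of 25, applying the finite population correction, which matters when the sample is a substantial fraction of the population.

```python
import math

N = 100          # developers in the company (the population)
n = 25           # developers interviewed (the sample)
favourable = 18  # hypothetical number of favourable answers in the sample

p = favourable / n  # sample proportion, used as the estimate for the population

# Standard error of the proportion with the finite population correction;
# the correction matters here because the sample is a quarter of the population.
se = math.sqrt(p * (1 - p) / n) * math.sqrt((N - n) / (N - 1))

# Approximate 95% confidence interval (normal approximation).
low, high = p - 1.96 * se, p + 1.96 * se
print(f"Estimated favourable share: {p:.0%} (95% CI: {low:.0%} to {high:.0%})")
```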
Surveys have the ability to provide a large number of variables to evaluate, but it is necessary to aim at obtaining the largest amount of understanding from the smallest number of variables since this reduction also
eases the analysis work.
It is not necessary to guess which are the most relevant variables in the
initial design of the study. The survey format allows the collection of many
variables, which in many cases may be quantified and processed by computers. This makes it possible to construct a variety of explanatory models and then select the one that best fits the purposes of the investigation.
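The following sketch illustrates this model-building step on synthetic, hypothetical survey data: three candidate explanatory variables are each fitted against the outcome by least squares and compared via the coefficient of determination (R²).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40  # hypothetical number of survey respondents

# Hypothetical quantified survey variables.
experience = rng.uniform(1, 15, n)             # years of experience
team_size = rng.integers(2, 12, n).astype(float)
training_hours = rng.uniform(0, 40, n)
# Outcome: attitude towards quality assurance on a 1-10 scale
# (synthetic: depends on experience and training, not on team size).
attitude = 3 + 0.3 * experience + 0.05 * training_hours + rng.normal(0, 1, n)

def r_squared(x, y):
    """R-squared of a least-squares fit of y on x with an intercept."""
    A = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    return 1 - (residuals @ residuals) / ((y - y.mean()) @ (y - y.mean()))

candidates = {"experience": experience,
              "team size": team_size,
              "training hours": training_hours}
for name, x in candidates.items():
    print(f"{name:15s} R^2 = {r_squared(x, attitude):.2f}")
```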
13.6.2 Survey Purposes
The general objective for conducting a survey is one of the following [15]:
• Descriptive.
• Explanatory.
• Explorative.
Descriptive surveys can be conducted to enable assertions about some
population. This could be determining the distribution of certain characteristics or attributes. The concern is not about why the observed distribution
exists, but rather what it is.
Explanatory surveys aim at making explanatory claims about the population. For example, when studying how developers use a certain inspection technique, we might want to explain why some developers prefer one
technique while others prefer another. By examining the relationships between different candidate techniques and several explanatory variables, we
may try to explain why developers choose one of the techniques.
Finally, explorative surveys are used as a pre-study to a more thorough
investigation, to ensure that important issues are not overlooked. This can
be done by creating a loosely structured questionnaire and letting a sample
from the population answer it. The information is gathered and analysed, and the
results are used to improve the full investigation. In other words, the explorative survey does not answer the basic research question, but it may
provide new possibilities that could be analysed and should therefore be
followed up in the more focused or thorough survey.
13.6.3 Data Collection
The two most common means for data collection are questionnaires and
interviews [15]. Questionnaires could be provided either in paper form or in
some electronic form, e.g. e-mail or Web pages. The basic method for data
collection through questionnaires is to send out the questionnaire together
with instructions on how to fill it in. The responding person answers the
questionnaire and then returns it to the researcher.
Letting interviewers handle the questionnaires (by telephone or face-to-face), instead of the respondents themselves, offers a number of advantages:
• Interview surveys typically achieve higher response rates than, for example, mail surveys.
• An interviewer generally decreases the number of “do not know” and “no answer” responses, because (s)he can answer questions about the questionnaire.
• It is possible for the interviewer to observe and ask questions.
The disadvantages are the cost and time, which depend on the size of the
sample and are also related to the intentions of the investigation.
13.7 Post-mortem Analysis
Post-mortem analysis is a research method studying the past, but also focusing on the typical situation that has occurred. Thus, a post-mortem analysis
is similar to the case study in terms of scope and to the survey in that it looks
at the past. The basic idea behind post-mortem analysis is to capture the
knowledge and experience from a specific case or activity after it has been
finished. In [21] two types of post-mortem analysis are identified: a general
post-mortem analysis capturing all available information from an activity or
a focused post-mortem analysis for a specific activity, e.g. cost estimation.
According to [21], post-mortem analysis has mainly been targeted at large
Web and software projects to learn from their success or recovery from a
failure. An example of such a process is proposed by [22]. The steps are:
1. Project survey.
The objective is to use a survey to collect information about the project
from the participants. The use of a survey ensures that confidentiality
can be guaranteed.
2. Collect objective information.
In the second step, objective information that reveals the health of the
project is collected. This includes defect data, person hours spent and
so forth.
3. Debriefing meeting.
A meeting is held to capture issues that were not covered by the survey. In addition, it provides the project participants with an opportunity
to express their views.
4. Project history day.
The history day is conducted with a selected subset of the people involved to review project events and project data.
5. Publish the results.
Finally, a report is published. The report is focused on the lessons learned and is used to guide organisational improvement.
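Purely as an illustration, the five steps can be captured as a simple checklist structure, e.g. for tooling that tracks the progress of a post-mortem analysis (step names and outputs are paraphrased from the list above):

```python
from dataclasses import dataclass

@dataclass
class PostMortemStep:
    name: str
    output: str

# The five steps of the post-mortem process described above (after [22]).
PROCESS = [
    PostMortemStep("Project survey", "confidential answers from participants"),
    PostMortemStep("Collect objective information", "defect data, person hours, ..."),
    PostMortemStep("Debriefing meeting", "issues not covered by the survey"),
    PostMortemStep("Project history day", "review of project events and data"),
    PostMortemStep("Publish the results", "lessons-learned report"),
]

for number, step in enumerate(PROCESS, start=1):
    print(f"{number}. {step.name} -> {step.output}")
```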
To support small- and medium-sized companies, [21] discusses a lightweight approach to post-mortem analysis, which focuses on a few vital
activities and highlights that:
• Post-mortem analyses should be open to participation by all team members and other stakeholders.
• Goals may be used to focus the discussions, but this is not necessary.
• The post-mortem process consists of three main phases: preparation, data collection and analysis. These phases are further discussed in [21].
Post-mortem analysis is a flexible method. The actual
object to be studied (a whole project or specific activity) and the type of
questions posed are very much dependent on the actual situation and the
objectives of the analysis.
The referenced articles or the book by Whitten [23] provide more information on post-mortem analysis/review.
Finally, it should be noted that empirical methods also provide positive
side effects such as knowledge sharing, which is an added value of conducting an empirical study. This is true for all types of empirical studies. In
an experiment, the subjects learn from comparing competing methods or
techniques. This is particularly true if the subjects are debriefed afterwards, i.e. informed about the objective and the outcome of the experiment. In case studies and post-mortem analyses the participants gain a new perspective on their work, and they often
reflect on their way of working through the participation in the empirical
study. Finally, in the survey the learning comes from comparing the answers given with the general outcome of the survey. This allows individuals to put their own answers into a more general context.
13.8 Summary
This chapter has provided a brief overview of four empirical research
methods with a primary focus on methods that contain some quantitative
part. The four methods are: controlled experiments, case studies, surveys
and post-mortem analyses. The main objective has been to introduce them
so that people intending to conduct empirical studies can make an appropriate selection of an empirical research method in a Web or software engineering context.
Moreover, the presented methods must be seen as complementary in
that they can be applied at different stages in the research process. This
means that they can, together in a suitable combination, support each other
and hence provide a good basis for sustainable improvement in Web and
software development.
References

1. Garvin DA (1998) Building a Learning Organization. Harvard Business Review on Knowledge Management, 47–80, Harvard Business School Press, Boston, USA
2. Basili VR, Caldiera G, Rombach HD (2002) Experience Factory. In: Marciniak JJ (ed.) Encyclopaedia of Software Engineering, John Wiley & Sons, Hoboken, NJ, USA
3. Creswell JW (1994) Research Design: Qualitative and Quantitative Approaches, Sage Publications, London, UK
4. Denzin NK, Lincoln YS (1994) Handbook of Qualitative Research, Sage Publications, London, UK
5. Fenton N, Pfleeger SL (1996) Software Metrics: A Rigorous & Practical Approach, 2nd edition, International Thomson Computer Press, London, UK
6. Kitchenham B, Pickard L, Pfleeger SL (1995) Case Studies for Method and Tool Evaluation. IEEE Software, July, 52–62
7. Montgomery DC (1997) Design and Analysis of Experiments, 4th edition, John Wiley & Sons, New York, USA
8. Siegel S, Castellan J (1998) Nonparametric Statistics for the Behavioral Sciences, 2nd edition, McGraw-Hill International, New York, USA
9. Robson C (2002) Real World Research, 2nd edition, Blackwell, Oxford, UK
10. Zelkowitz MV, Wallace DR (1998) Experimental Models for Validating Technology. IEEE Computer, 31(5):23–31
11. Manly BFJ (1994) Multivariate Statistical Methods: A Primer, 2nd edition, Chapman & Hall, London, UK
12. Stake RE (1995) The Art of Case Study Research, Sage Publications, London, UK
13. Pfleeger S (1994–1995) Experimental Design and Analysis in Software Engineering, Parts 1–5. ACM SIGSOFT Software Engineering Notes, 19(4):16–20; 20(1):22–26; 20(2):14–16; 20(3):13–15; 20(4):14–17
14. Yin RK (1994) Case Study Research: Design and Methods, Sage Publications, Beverly Hills, CA, USA
15. Babbie E (1990) Survey Research Methods, Wadsworth, Monterey, CA, USA
16. Tukey JW (1977) Exploratory Data Analysis, Addison-Wesley, Reading, MA, USA
17. Robson C (1994) Design and Statistics in Psychology, 3rd edition, Penguin Books, London, UK
18. Wohlin C, Runeson P, Höst M, Ohlsson MC, Regnell B, Wesslén A (1999) Experimentation in Software Engineering: An Introduction, Kluwer Academic Publishers, Boston, MA, USA
19. Judd CM, Smith ER, Kidder LH (1991) Research Methods in Social Relations, 6th edition, Harcourt Brace Jovanovich College Publishers, Fort Worth, TX, USA
20. Juristo N, Moreno A (2001) Basics of Software Engineering Experimentation, Kluwer Academic Publishers, Boston, MA, USA
21. Birk A, Dingsøyr T, Stålhane T (2002) Postmortem: Never Leave a Project without It. IEEE Software, May/June, 43–45
22. Collier B, DeMarco T, Fearey P (1996) A Defined Process for Project Postmortem Review. IEEE Software, July, 65–72
23. Whitten N (1995) Managing Software Development Projects: Formula for Success, John Wiley & Sons, NY, USA
Author Biographies
Dr. Claes Wohlin is a Professor of Software Engineering at Blekinge Institute of
Technology in Sweden and also Pro Vice Chancellor of the Institute. Prior to this,
he held chairs at Lund University and Linköping University. He has a PhD in
Communication Systems from Lund University and five years of industrial experience. His research interests include empirical methods in software engineering, software metrics, software quality and systematic improvement in software
engineering. Claes Wohlin is the principal author of the book “Experimentation in
Software Engineering – An Introduction” published by Kluwer Academic Publishers in 1999. He is co-editor-in-chief of the Journal of Information and Software Technology published by Elsevier. Dr. Wohlin is on the editorial boards of
Empirical Software Engineering: An International Journal, Software Quality Journal and Requirements Engineering Journal. He was the recipient of Telenor's
Nordic Research Prize in 2004 for his achievements in software engineering and
improvement of reliability in telecommunication systems. He is a Visiting Professor at Chalmers University of Technology, working at the IT-University in
Göteborg.
Dr. Martin Höst is an Associate Professor in Software Engineering in the Software Engineering Research Group at the Department of Communication Systems,
Lund University, Sweden. He received an MSc from Lund University in 1992 and a
PhD in Software Engineering from the same university in 1999. His main research
interests include software process improvement, empirical software engineering,
software performance engineering, and computer simulation of software development processes. The research is conducted through empirical methods such as
controlled experiments, surveys and case studies. Martin Höst has published more
than 40 papers in international journals, conference proceedings and workshop
proceedings.
Kennet Henningsson is a Ph.D. student in Software Engineering at Blekinge
Institute of Technology in Sweden. He received his MSc in Software Engineering,
with a focus on Management, in 2001 from Blekinge Institute of Technology and
a Licentiate degree in Software Engineering in 2005 from the same university. His
research interests are Fault-based software process improvement, Project Management, and Monitoring of Effort and Software Quality.
Index
A
AdjustedSize measure 76
AIU See Abstract Information Units
Algorithmic model
Generic 33
Regression-based 33
Multiple Regression 33
Stepwise regression 33
Algorithmic techniques 32
ANOVA See Analysis of Variance
Artificial Intelligence Techniques 34
Attributes
External 16
Internal 16
B
Bar charts 48, 82
Boxplots 48, 82
Central value 82
Extreme outliers 82
Inter-quartile range 82
Outliers 82
Whiskers 82
C
CART 38, See Classification and
Regression Trees
Case study 412, 422
Amazon and Cuspide 130
analysis 136
calculated elementary
indicators 134
conceptual schema 131
design phase 131
external quality requirements
130
LSP aggregation schema
design 135
partial and global evaluation
134
DEI application 166
Design inspections 167
heuristic evaluation 172
navigation sequences 171
traffic statistics 170
Web usage analysis 170
on-line shop Web application 205
black-box testing 208
deployment 207
middle layer design 207
performance requirements 206
response time assessment 209
system’s throughput 209
Web page response time 209
white-box testing 210
workload models 206
Process improvement 263
Valencia CF Web Application
287
advanced navigational features
295
AIUs 292, 298
class diagram 290
context relationships 295
exploration contexts 298
filters 295
functional model 290
graphical interface 297
information area 298
navigational area 298
navigational context 298
navigational links 299
navigational map 291, 298
navigational menu 298
presentation model 291, 297
state transition diagram 290
structural model 288
user diagram 291, 292
Web conference manager 346
hyperbase 349
information model 348
navigation model 353
presentation model 356
service model 358
Web effort estimation model
Data validation 44
equation 67
Preliminary analysis 47
Web effort estimation model 42
Web effort estimation model
model validation 67
Web productivity measurement
lower productivity bounds 97
productivity equation 95
upper productivity bounds 97
Web productivity measurement
Data validation 79
Preliminary Analyses 81
Web productivity measurement
productivity analysis 96
Web testing 246
Case-based reasoning 34
Inverse rank weighted mean 37
Maximum distance 36
Mean 37
Median 37
Parameters 35
adaptation rules 38
analogy adaptation 37
Feature subset selection 35
number of analogies 37
scaling 37
similarity measure 35
Steps 35
Unweighted Euclidean distance
35
Weighted Euclidean distance 36
CBR See Case-based reasoning
Classification and Regression Trees
38
Company baseline 423
Component-based technologies
191
Conceptual Schema 277
Confounding effects 423
Confounding factor 423
Confounding factors 423
Cook’s D statistic 62
Cross-validation 41
D
Dependent variables 418
Descriptive statistics 367
Descriptive surveys 425
differences from software
development
Application Characteristics 5, 24
Approach to Quality Delivered 5,
24
Architecture and Network 5, 24
Availability of the Application 5,
24
Customers (Stakeholders) 5, 24
Development Process Drivers 5,
24
Disciplines Involved 5, 24
Information Structuring and
Design 5, 24
Legal, social, and ethical issues
5, 24
People involved in development
5, 24
Primary Technologies Used 5,
24
Update Rate (Maintenance
cycles) 5, 24
E
Effort estimation
Purpose 30
empirical investigation
steps 20
Empirical investigation 17
Alternative hypothesis 20
Analysing the data 23
Case study 18
Control object 22
Dependent variable 20
Experimental object 22
Experimental subject 22
Experimental unit 22
Formal experiment 19
Goals 20
Independent variable 20
Null hypothesis 20
Pilot study 23
preparation 21
Reporting the results 23
Research hypothesis 20
Survey 18
Treatment 22
Engineering 13
Evaluation 144
Experiment 412
Experimentation 17
Expert opinion 31
Explanatory surveys 425
Explorative surveys 425
H
Histograms 48, 82
Horizontal replication 193, 196
Hypothesis
Attributes 15
Variables 15
I
Independent variables 418
Interviewer 426
Interviews 426
ISO 9241 146
iterative design 144
L
ln See Natural log
M
Magnitude of Relative Error 40
Manual stepwise regression 60, 79,
91
Steps 60, 91
MdMRE See Median Magnitude of
Relative Error
Mean Magnitude of Relative Error
39
Measurement
Entity 15
process 15
product 15
resource 16
Measurement theory 16
Median Magnitude of Relative Error
39
MMRE See Mean Magnitude of
Relative Error
MRE See Magnitude of Relative
Error
Multivariate regression
Assumptions 81
Model stability 92
N
Natural log 52
Normalising data 52
O
Object Management Group See
OMG
Object Oriented Web Solutions See
OOWS
OMG
Meta-meta-model 337
Meta-models 337
Models 337
OO-Method 278, 279
Dynamic Model 280
Functional Model 280
Structural Model 279
System Specification 279
OOWS 280
Abstract information units 282
Advanced Navigational Features
284
Application tier 286
Authoring-in-the-large 281
Authoring-in-the-small 282
Class views 282
Navigation specifications 281
Navigational classes 282
mandatory 283
optional 283
Navigational contexts 281
exploration navigational
contexts 282
Navigational links 282
contextual links 282
exploration links 282
non contextual links 282
sequence links 282
Navigational Map 281
Navigational paths 282
Navigational relationships 283
context attribute 284
context dependency
relationship 283
context relationship 284
link attribute 284
Navigational subsystem 282
OlivaNova Model 286
Persistence tier 286
Population condition filter 283
Presentation Model 285
Presentation patterns 285
information paging 285
ordering criteria 285
Presentation requirements 285
Presentation tier 286
Search mechanisms 284
filter 284
index 284
static population condition 285
Service links 283
User Diagram 280
Users 281
anonymous 281
generic 281
registered 281
Web page 286
information area 286
navigation area 287
P
Performance improvement
System scale-out 204
System scale-up 204
System tuning 203
Plots
Normal P-P Plot 62
Residual plot 62
Post-mortem analysis 413
Pred(n) See Prediction at level n
Prediction at level n 39
Prediction process 29
Predictive accuracy 39, 40
Predictive power 39
Process improvement cycle
IDEAL model 265
Productivity 75
Productivity measurement method
76
Benefits 76
Q
Qualitative research 411
Quality 109
Attribute 109
Calculable concept 110
Entity 109
External quality 115
ISO/IEC 9126 113
ISO/IEC 9126-1 114
external quality 114
internal Quality 114
quality in use 114
Metric 111
Quality in use 117
Quality model 119
Quality requirements 112
Web application
information quality 122
Quantitative research 411
Quasi-experiments 412
Questionnaires 426
R
r² See R-squared
Regression analysis
assumptions 47
Reliability 7
Representational theory of
measurement 16
S
Scale type
Interval 366
Likert-type 366
Nominal 365
Ordinal 366
Ratio 366
Scatter plot 88
Scatter plots 54
Negative association 57
Positive association 57
scientific knowledge 13
Scientific principles 14
Scientific process
Hypothesis 14
Prediction 14
Validation 14
Scientific process 14
Observation 14
Scientific process 15
Scripting technologies 191
Security 7
Size-based effort estimation model
76
Small projects 261
software development 4
Software process improvement
Motivations 262
Process assessment 267
Spurious relationships 88
State variables 413
Statistics
Absolute estimation error 405
Analysis of variance 395
between groups variance 402
F ratio 403
mean square error 395
within groups variance 402
Central Limit Theorem 377
Chi-square statistic 382
Chi-square test 380
Contingency table 380
Correlation
correlation coefficient 384,
390
Pearson’s correlation 387
Pearson’s correlation
coefficient 387
Spearman’s rank correlation
385
Spearman’s rank correlation
coefficient 385
Dependent variable 379
Distributions
Chi-square distribution 379
Fisher F-distribution 379
Normal distribution 374
Student t-distribution 379
Estimation error 405
Frequency distributions 372
homoscedasticity 397
Mean standard error 377
Median 368
Mode 368
Population 376
Range 369
Regression analysis 390
error 393
estimation error 392
least-squares method 390
Multiple regression 394
multiple regression analysis
390
regression line 392
residuals 390
R-squared 394
significance 395
simple regression analysis 390
total squared error 393
Sample 376
Significance level 383
Standard deviation 369
Variable’s rank 385
variables independence 381
Variance 369
Wilcoxon signed-rank test 405
z value 406
Survey 412, 424
T
Tukutuku database 42
Tukutuku Project 42
U
Usability 7, 146
Accessibility 147
Client statistics 166
co-discovery 158
Diagnostic statistics 166
effectiveness 146
efficiency 146
Efficiency 146
Evaluation
Automatic tools 165
goals 156
Few errors 146
formative evaluation 157
Learnability 146
Memorability 146
Nielsen's ten golden rules 149
Referrer statistics 166
satisfaction 146
Site traffic reports 166
summative evaluation 157
think aloud protocol 158
Tools for accessibility analysis
165
Tools for usability analysis 166
Tools for Web usage analysis
166
User statistics 166
User testing 157
design 157
Users’ satisfaction 146
Web Accessibility Initiative 148
Usability engineering 145
Usability Engineering 149
usability evaluation 144
Usability inspection 159
Cognitive walkthrough 161
Heuristic evaluation 160
Usability testing 157
usable applications 144
V
Variable
Control a variable 22
Vertical replication 193, 194
W
W2000 336, 341
Clusters 340
association clusters 340
collection clusters 340
structural clusters 340
transactional clusters 340
Context 342
Customisation activities 342
Diagrams 337
Elements 337
Hypermedia design 342
Links 340
Model 337, 341
horizontal relationships 343
information 341
in-the-large 341
in-the-small 341
navigation 341
presentation 341
rule 344
services 341
transformation rules 344
vertical relationships 343
Navigation in-the-large 353
NLinks 340
Nodes 340
Package 337
access structures 338, 339
association centre 338
collections 339
components 338
connectible elements 338
entities 338
hyperbase 338
information 338
navigation 339
presentation 340
segments 338
semantic associations 338
services 340
slots 338
Pages 340, 356
Process clusters 355
Processes 361
Requirements analysis 342
Section 356
Sections 340
auxiliary sections 340
contents sections 340
Service design 342
Special-purpose constraints 340
Toolset 346
Topological constraints 340
Units 356
Web operations 359
processes 360
simple operations 360
Web 1, 3, 29
differences from software
development 5
Web and software development
Application Characteristics 5
Approach to Quality Delivered 6
Architecture and Network 9
Availability of the Application 7
Customers 7
Development Process Drivers 7
Disciplines Involved 10
Information Structuring and
Design 11
Legal, social, and ethical issues
10
Maintenance cycles 8
People involved in development
8
Primary Technologies Used 6
Stakeholders 7
Update Rate 8
Web and software development,
differences 5
Web application 3, 4, 277, 335, See
Web applications
application logic
middle layer 190
development 149
Dynamic Web resources 184
Family of applications 336
Functional testing 197
analytical distributions 198
file lists 198
simple workload patterns 198
traffic generator 199
Multimedia resources 185
Performance testing
black-box testing 200
data collection 199
hot spots 203
performance index 200
performance trends 201
resource performance indexes
202
sample granularity 200
white-box testing 202
predominantly dynamic 186
predominantly multimedia 187
predominantly secure 186
predominantly static 186
Reliability measurement
steps 181
Secure Web resources 185
Static Web resources 184
Usability
efficiency 147
Few errors 147
learnability 147
Memorability 147
Users’ satisfaction 147
usability evaluation 156
Volatile Web resources 185
Web hypermedia application 3
Web software application 3
Web usage analysis 162
Web application testing 222
Accessibility testing 224
Black-box testing strategies 237
Black-box testing 226
Compatibility testing 224
Failures 222
Grey-box testing 227
Grey-box testing strategies 241
Integration testing 232
Load testing 223
Performance testing 222
Security testing 225
Stress testing 223
Testing functional requirements
225
Testing tools 243
Unit testing 230
Usability testing 224
User session-based testing 242
White-box testing 226
White-box testing strategies 234
Web applications 1, 2, 4, 120, See Web application
Web designer 9
Web development 4
Web engineering 2
Web hypermedia application 3
Web Quality Evaluation Method
See WebQEM
Web reliability measurement
workload models 188
workload intensity 188
workload mixes 188
Web service
application logic 189
dynamic request 189
HTTP interface 189
information source 190
Web services 183
Web site See Web application
Web software application 3
Web usability 147
access concepts 153
composite data structure 155
content visibility 151
core concept
description 155
pages 155
hypertext modularity 151
areas 151
global landmarks 151
hierarchical landmarks 152
local landmarks 151
navigational access mechanisms
153
Web Usage Mining 163
WebQEM 123
Elementary Measurement and
Evaluation 125
Global evaluation 128
Process steps 124
Quality requirements
specification 125
WebQEM_Tool 129
World Wide Web See Web