Language Technologies for Lifelong Learning
LTfLL - 2008-212578

Project Deliverable Report

Deliverable 2.2 – Existing Services – Integrated

Work Package: 2
Task: 2.2, 2.4
Date of delivery: Contractual: 28-02-2009 – Actual: 13-04-2009
Code name: D2.2 – Version: 1.1 – Draft / Final
Type of deliverable: <report>
Security (distribution level): <PU>
Contributors: Reinhard Dietl, Fridolin Wild, Bernhard Hoisl, Robert Koblischke (WUW), Berit Richter, Markus Essl, Gerhard Doppler (BIT MEDIA), Traian Rebedea, Stefan Trausan-Matu (PUB-NCIT), Philippe Dessus (UPMF)
Authors (Partner): Reinhard Dietl, Fridolin Wild, Bernhard Hoisl, Robert Koblischke (WUW), Berit Richter (BIT MEDIA), Paola Monachesi (UU), Kiril Simov (IPP-BAS), Traian Rebedea (PUB-NCIT), Sonia Mandin, Virginie Zampa (UPMF)
Contact Person: Fridolin Wild (WUW)
WP/Task responsible: WUW
EC Project Officer: Mr. M. Májek

Abstract (for dissemination):
This deliverable gives a snapshot of the iteratively developing prototypes. It focuses on the integration of the showcases, i.e. on turning the existing technology into web-based, service-oriented applications, and on structuring their software architecture along the concept elaborated in the previous infrastructure deliverable D2.1. This mainly involves the re-use of existing technology (hence the name 'showcases'), although several of these task showcases already involve new developments. The presentation of the integration of existing services is wrapped up by an overview of the interoperability characteristics. Additionally, the integration strategy pursued is elaborated, providing more details about the software development process and the integration workflow.

Keywords List: Service Concept, NLP, Educational Mash-Ups

LTfLL Project Coordination at: Open University of the Netherlands
Valkenburgerweg 177, 6419 AT Heerlen, The Netherlands
Tel: +31 45 5762624 – Fax: +31 45 5762800
Existing Services – Integrated
Table of Contents

Executive Summary
1. Introduction
2. Positioning Showcases (WP4)
2.1 Positioning from Portfolios (T4.1)
2.2 Conceptual Development (T4.2)
3. Support and Feedback Showcases (WP5)
3.1 Interaction Analysis (T5.1)
3.2 Assessing Textual Products (T5.2)
4. Social and Informal Learning Showcases (WP6)
4.1 Task 6.1 – Knowledge sharing network
4.2 Task 6.2 – Social component
5. Overview on Showcase Characteristics
6. Integration Strategy
6.1 Position of Software Integration within the Project
6.2 Integration Workflow
7. Versioning Policy
References
Executive Summary
This deliverable gives a current snapshot of the iteratively developing prototypes. The integration of these showcases focuses mainly on turning the existing technology into web-based, service-oriented applications and on structuring their software architecture along the concept elaborated in the previous infrastructure deliverable D2.1. This mainly involves the re-use of existing technology (hence the name 'showcases'), although several of these task showcases already involve new developments.
The task showcases documented here encompass two prototypes per work package, one
for each task outlined in the description of work. This includes:
T4.1: Positioning from Portfolios
T4.2: Conceptual Development
T5.1: Interaction Analysis
T5.2: Assessing Textual Products
T6.1: Knowledge Sharing Network
T6.2: Social Component
The presentation of the integration of existing services is wrapped up by an overview of the interoperability characteristics of these showcases.
Additionally, section 6 elaborates the integration strategy pursued, thereby providing more details about the software development process and the integration workflow. Finally, section 7 briefly updates the versioning policy.
1. Introduction
In D2.1, the first deliverable of this work package on integration, a concept was elaborated for a service-oriented development framework that is strong enough to integrate the technologies considered essential for the tasks of work packages 4, 5, and 6 while at the same time respecting their heterogeneity. This concept segments the involved systems and components into three layers: one each for widgets, services, and data storage. The service layer itself can be further divided into service logic and application logic, encapsulating the service invocation, execution, and delivery routines on the one hand and the application-specific processing routines on the other.
Following this encapsulated approach with well-defined interfaces has many advantages in terms of interoperability, transferability, and reusability of the deployed software components. A widget-based architecture seems to be the right choice for handling the very heterogeneous software developments of the different partners. Defining clear interfaces makes it possible to have loosely coupled and spatially distributed applications working together.
Therefore, it is essential to 'widgetise' the deployed software, which means providing a GUI output that can be viewed in any web browser at the client layer (web widget) and that can be addressed using web services. A first step in this direction is the widgetisation and servicification of existing showcases: partners need to plan their further developments for version 1 of the integrated services in the form of widgets and services. This does not necessarily mean adapting all existing showcases, as these are only prototypes; it does mean, however, testing the adaptation where a showcase already forms a basis for future developments.
This deliverable gives a current snapshot of the iteratively developing prototypes, following the conceptual approach described above. The integration of these showcases focuses mainly on turning the existing technology into web-based applications and on structuring their software architecture along the concept touched upon above and elaborated in more detail in deliverable D2.1. This mainly involves the use of existing technology, although several of the task prototypes already involve new developments.
The task prototypes documented here encompass two prototypes per work package, one
for each task outlined in the description of work. This includes:
T4.1: Positioning from Portfolios
T4.2: Conceptual Development
T5.1: Interaction Analysis
T5.2: Assessing Textual Products
T6.1: Knowledge Sharing Network
T6.2: Social Component
2. Positioning Showcases (WP4)
The work within work package 4 on positioning is organized into two complementary tasks. Task T4.1 focuses on the analysis of portfolios in order to determine a learner's current standing (position) in a given domain. Task T4.2, in turn, concentrates on diagnosing a learner's conceptual development. The following subsections outline the prototypes developed for showcasing both tasks.
2.1 Positioning from Portfolios (T4.1)
As outlined in D2.1, the web service architecture is composed of four layers. The following sections outline the R-based LSA framework for web services applied to this four-tier architecture.
On the client layer, a web browser can use any HTTP-based communication mechanism to access web services via RESTful requests. The server hosting the service framework handles each request by invoking the applications on the service layer and dispatching any parameters that have been passed to the server.
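From the client side, such a RESTful invocation amounts to assembling a URL whose query string carries the parameters to be dispatched to the R script. The following minimal Python sketch illustrates this; the host, path, service name, and parameter names are hypothetical, not the project's actual endpoints.

```python
from urllib.parse import urlencode

# Hypothetical base URL of an R web service exposed through Rapache;
# host, path, and parameter names below are illustrative only.
BASE_URL = "http://example.org/services/lsa"

def build_request_url(service, **params):
    """Assemble a RESTful GET request URL for the given service.

    All parameters travel as query-string arguments, which the Apache
    server dispatches to the invoked R web-service script.
    """
    query = urlencode(dict(params))
    return f"{BASE_URL}/{service}?{query}"

url = build_request_url("termsims", text="dog cat table", threshold=0.6)
print(url)
```

Any HTTP-capable client (a browser, a widget, or a command-line tool) can then issue the request without knowing anything about the R implementation behind it.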
Within the service layer, the service framework relies on code written in R which is used
to transform, validate, and then communicate the parameters to the application logic
layer. Invocation of the underlying R-scripts is handled by an Apache server, which is
equipped with a module called ‘Rapache’ (Horner, 2009). On arrival of a request for an R
web service at the Apache server, Rapache invokes an instance of the R shared library,
executes the required R web-service script and enables the R framework to access all
information that has been passed to the Apache server by the client (via the HTTP
protocol).
After computation of the application logic – successful or not – the service layer returns a custom XML structure representing the result object. The actual structure can be freely chosen by the web service implementer. This choice is left to developers because it enables quick development of XML interactions for simple tasks without precluding the implementation of a full-fledged communication architecture based on a standard XML protocol (e.g. SOAP), which would have been the case had, for example, XML-RPC been fixed as the standard protocol.
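Since the XML structure is implementer-chosen, the following sketch shows only one possible shape such a result object might take and how a client could parse it; the element and attribute names are purely illustrative, not a prescribed format.

```python
import xml.etree.ElementTree as ET

# One possible result structure a service implementer might choose;
# element and attribute names here are purely illustrative.
response = """<result status="ok">
  <vector name="similarity">
    <value>0.71</value>
    <value>0.30</value>
  </vector>
</result>"""

root = ET.fromstring(response)
status = root.get("status")
values = [float(v.text) for v in root.iter("value")]
print(status, values)  # ok [0.71, 0.3]
```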
The application logic layer consists of a service-specific R function which is able to
perform LSA processes with only the parameter list passed by the service layer.
The backend layer ensures the presence of appropriate access functions to the current
space warehouse implementation in the environment (‘scope’) of the dynamic LSA
processes.
Decoupling the LSA process logic from the service layer framework has the following
advantages:
1. A dedicated LSA process logic developer can implement logical functions without knowing or caring about the actual implementation of the space warehouse on the one hand and the service layer on the other.
2. The architecture enforces the usage of the storage access methods provided by the storage layer, which makes the handling of large spaces on the backend layer independent of the application logic.
3. Having the whole application logic present as a single R language object enables the manipulation of the logic itself using R, enabling e.g. the optimization of the data handling by reducing copy procedures of values (Oehlschlägel et al., 2008).
The LSA process logic then returns a result object, which is – depending on the task – any R object, including lists, arrays, or even binary image data generated by a graphics implementation. This data is then passed to the service layer for transformation and communication to the client layer.
The backend layer provides help for implementers of LSA process logic, who may face several problems concerning the retrieval of the space objects:

Storage requirements for spaces: Depending on the application scenario of the LSA process logic, spaces can be very large and might not fit into the main memory of either the client or the server.

Transfer bottlenecks: Depending on the storage device, retrieval of a space from that device may be slow.

Retrieval overheads: Storage mechanisms like compression or serialization create a computational overhead when retrieving a space.

Simultaneous access: Some application scenarios (like web services) may require instant access to space objects for multiple clients at the same time.
The considerations above led to the development of three types of space storage mechanisms, which can be accessed by the application logic layer using a common interface:
The monolithic warehouse approach keeps a central R instance permanently
open, holding all space objects previously calculated in main memory. LSA
processing logic is passed into this R instance and executed there, locally. This
approach eliminates all overhead created by copying large space objects between
storage media as they are accessed directly from memory. It has to be kept in
mind that the central R instance holding the spaces is unavailable to other requests
until the LSA processing logic has finished.
The inter-instance copy approach keeps – like the monolithic approach – all
spaces in main memory. Application logic is – in contrast – not executed in the
storage R instance, but rather, a copy of the original object is passed to the
Rapache instance, freeing the central R instance's access interface again as soon
as the copy has been generated.
The serialisation approach keeps all space objects in a binary file on the server's hard disk. This has the advantage that on most servers, hard disk space will by far exceed main memory, so storage should be less of a problem. On the downside, (de-)serialization of space objects may be – depending on the hardware used – a time-consuming task that may slow down the LSA process.
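The trade-offs between the three mechanisms can be sketched behind a common interface. The following Python sketch is an illustration only (the project's actual storage layer is implemented in R); the class and method names are hypothetical: 'monolithic' hands out direct references to in-memory spaces, 'copy' mimics the inter-instance copy approach by returning deep copies, and 'serialised' round-trips each space through a binary file.

```python
import copy
import os
import pickle
import tempfile

class SpaceStore:
    """Common interface over three space storage mechanisms (illustrative)."""

    def __init__(self, mode="monolithic"):
        self.mode = mode
        self._spaces = {}   # in-memory store (monolithic / copy modes)
        self._files = {}    # file paths for the serialisation mode

    def put(self, name, space):
        if self.mode == "serialised":
            fd, path = tempfile.mkstemp(suffix=".space")
            with os.fdopen(fd, "wb") as f:
                pickle.dump(space, f)   # serialisation overhead on write
            self._files[name] = path
        else:
            self._spaces[name] = space

    def get(self, name):
        if self.mode == "monolithic":
            return self._spaces[name]               # direct reference
        if self.mode == "copy":
            return copy.deepcopy(self._spaces[name])  # frees the original
        with open(self._files[name], "rb") as f:
            return pickle.load(f)                   # deserialisation cost

store = SpaceStore("copy")
store.put("medline", {"dims": 300, "terms": ["dog", "cat"]})
space = store.get("medline")
print(space["dims"])  # 300
```

The design point mirrors the text: the application logic only ever calls put/get, so the cost profile (blocking access, copy overhead, or (de-)serialisation time) can be swapped without touching the LSA process logic.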
2.2 Conceptual Development (T4.2)
Following the proposal of D2.1, the architecture of the showcase prototype for
monitoring the conceptual development of a learner can be segmented into four layers. A
client layer communicates through a service layer with the application logic that again is
supported by particular functionalities (most notably data storage and retrieval
functionalities) coming from a back-end layer (see Fig. 1).
Fig. 1: Architectural layers of the system.
The client layer (see Fig. 2) can be seen as running the visible user interface of the prototype, typically via a web browser. The evaluation client consists of two components, the first being a form with a text area into which learners copy and paste their textual input. The second widget is a visualization of the concept graph, which is called upon submitting the textual input. This graph renders the relationships between the key concepts triggered by the textual input through a force-directed layout: terms semantically close to each other are placed a short distance apart, while distant terms are positioned further from each other. The colors of the concept map indicate to which cluster the terms belong. More details on the calculation are outlined below in the description of the other layers.
Fig. 2: Screenshot of the user interface.
The service layer holds the counterpart to the client layer: it provides the HTML pages, receives the textual input via HTTP POST, dispatches it to the force-directed layout, and returns an HTML-wrapped Flash application in order to deliver the requested visualization.
The application logic layer consists of two components; both of them can also be accessed directly as services. The force-directed layout is a Flash application written in Flex that uses the Prefuse Flare visualization toolkit (see http://flare.prefuse.org/) in order to render the underlying graph of conceptual relations among the resulting concepts with the help of a physics simulation of interacting forces. By dispatching the received input text to the second component – termsims – the resulting graph is reconstructed from the
graphML transport format (see http://graphml.graphdrawing.org/) and is dynamically visualized on the Flash stage.
This second service calculates term-to-term similarities by first folding the textual input into a pre-existing latent-semantic space and then extracting the most prominent terms from the resulting lower-order text vector in this space, filtering for the 30 most frequent terms that load in the latent-semantic space with a frequency higher than a given threshold of .05. These terms are considered to be concepts describing the textual input in this lower-order latent-semantic space. The size of a node is calculated as ten times the value of the term's frequency in the latent-semantic space, plus 1.
By subsequently calculating the term-to-term cosine distances of these terms, a graph
wrapped into graphML can be returned that contains the concepts as nodes (labeled with
the corresponding term) and all term-to-term cosine distances above a given threshold as
edges.
For example, the resulting concept list might contain the terms ‘dog’, ‘cat’, and ‘table’;
dog and cat might have a cosine distance of .7 in this space (being higher than the
threshold of .6), whereas both dog and cat have a cosine distance of .3 to table; thus, the
termsims service would return only an edge between the nodes ‘dog’ and ‘cat’, but none
involving the node ‘table’.
Additionally, the termsims service returns a color for each node depending on the cluster it has been assigned to, using k-means with (number-of-nodes / 4) clusters.
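The core of the termsims computation described above – node sizes from term frequencies and edges from above-threshold cosine similarities – can be sketched as follows. This is an illustrative Python reconstruction, not the project's R implementation; the toy vectors and frequencies are invented, and in the real service the vectors come from folding the input into the pre-computed LSA space.

```python
import math

def cosine(u, v):
    """Cosine similarity between two term vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy latent-space term vectors and loadings (invented for illustration).
terms = {
    "dog":   [0.9, 0.1],
    "cat":   [0.8, 0.3],
    "table": [0.1, 0.9],
}
frequencies = {"dog": 0.4, "cat": 0.3, "table": 0.2}
EDGE_THRESHOLD = 0.6

# Node size: ten times the term's frequency in the space, plus 1.
nodes = {t: 10 * frequencies[t] + 1 for t in terms}

# Keep only term pairs whose cosine similarity exceeds the threshold.
edges = []
names = sorted(terms)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        sim = cosine(terms[a], terms[b])
        if sim > EDGE_THRESHOLD:
            edges.append((a, b, round(sim, 2)))
print(nodes, edges)
```

With these toy vectors only the cat-dog pair passes the threshold, reproducing the dog/cat/table example from the text: one edge between 'dog' and 'cat', none involving 'table'.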
The storage layer contains one latent-semantic space constructed from a flexible subset of the freely available PubMed/Medline corpus of 18 million documents, nine million of which include an abstract.
To generate a proper LSA space for the medical field from these documents, we first
extracted these documents from XML to a format more suitable for the subsequent
computations. As we eventually needed the document (bag-of-words) data in R, we chose
to use the R text mining (tm) package (see Feinerer et al., 2008) for data extraction as it
provides facilities to convert a corpus to a term-document-matrix.
However, it soon became obvious that we would run into memory problems when trying
to hold the complete text corpus in memory with R, so we went from storing full text
representations of the documents to representing documents by indices relative to the
corpus vocabulary only. This not only heavily cut down memory usage, but also sharply
sped up the following bag-of-words matrix calculation, which would have been another
serious bottleneck in terms of computation speed. To further reduce the corpus' memory
usage we also applied some basic preprocessing (stemming, stopword filtering, number
removal, lower case, etc.) while reading in the text content.
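The switch from full token strings to vocabulary-relative indices can be sketched as follows; this is an illustrative Python version of the idea (the project did this inside the R tm package), with hypothetical function names and a toy two-document corpus.

```python
from collections import Counter

def build_index(docs):
    """Map each document from token strings to indices into a shared
    vocabulary, so each token is stored once and documents shrink to
    lists of small integers (the memory optimisation described above)."""
    vocab = {}
    indexed = []
    for tokens in docs:
        ids = [vocab.setdefault(tok, len(vocab)) for tok in tokens]
        indexed.append(ids)
    return vocab, indexed

def bag_of_words(ids, vocab_size):
    """Turn one index-list document into a bag-of-words count vector."""
    counts = Counter(ids)
    return [counts.get(i, 0) for i in range(vocab_size)]

docs = [["skin", "cell", "skin"], ["cell", "drug"]]
vocab, indexed = build_index(docs)
print(vocab)                                  # {'skin': 0, 'cell': 1, 'drug': 2}
print(bag_of_words(indexed[0], len(vocab)))   # [2, 1, 0]
```

Counting over integer ids rather than strings is also what makes the subsequent term-document matrix construction cheap, matching the speed-up reported in the text.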
As we wanted to create custom spaces based on topics or concepts, we first analysed the PubMed topical information (the MeSH headings) and extracted this data into a MySQL database in order to subsequently retrieve the corresponding document ids for a given MeSH heading (and its children). To extract the MeSH headings to the database, we modified the PubMed data extraction demo from the LingPipe Java libraries for our needs.
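The heading-to-documents lookup can be sketched with an in-memory SQLite stand-in for the MySQL database; the schema, function name, and the sample doc ids and headings below are all illustrative.

```python
import sqlite3

# In-memory stand-in for the MySQL database of MeSH annotations;
# the schema and the sample rows are illustrative only.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mesh (doc_id INTEGER, heading TEXT)")
con.executemany("INSERT INTO mesh VALUES (?, ?)", [
    (1, "Skin"), (2, "Skin"), (2, "Pharmacology"), (3, "Pharmacology"),
])

def docs_for_headings(headings):
    """Return the ids of documents annotated with any of the headings."""
    marks = ",".join("?" * len(headings))
    rows = con.execute(
        f"SELECT DISTINCT doc_id FROM mesh WHERE heading IN ({marks})",
        headings).fetchall()
    return sorted(r[0] for r in rows)

print(docs_for_headings(["Skin"]))          # [1, 2]
print(docs_for_headings(["Pharmacology"]))  # [2, 3]
```

A query over one or more descriptors yields exactly the document subset from which a topical sub-corpus (and hence a smaller space) can be built, as described for the 'skin' and 'pharmacology' spaces below.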
We also chose the R text mining package because its term-document matrix (TDM) facility uses a sparse matrix as its data format, which was a must if we wanted to incorporate a larger corpus such as PubMed. As the R LSA package (Wild, 2009) could not yet handle sparse matrices for its singular value decomposition (SVD), we interfaced the svdlib.c from Berry (1992) in the modified version from Rhode (2009) that can handle sparse matrices as input data. A new release of the package is currently being prepared.
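The rank-k truncation that turns a term-document matrix into a latent-semantic space can be illustrated on a tiny dense example. This Python/numpy sketch only demonstrates the mathematics; at PubMed scale the matrix is sparse and a sparse SVD implementation (the svdlib.c route described above) is required, since a dense decomposition would not fit in memory.

```python
import numpy as np

# Tiny dense term-document matrix (3 terms x 3 documents), invented
# for illustration; real input is the sparse TDM from the tm package.
tdm = np.array([[2.0, 0.0, 1.0],
                [1.0, 1.0, 0.0],
                [0.0, 3.0, 1.0]])

u, s, vt = np.linalg.svd(tdm, full_matrices=False)
k = 2                                   # number of latent dimensions
space = (u[:, :k], s[:k], vt[:k, :])    # rank-k latent-semantic space

# The rank-k product approximates the original matrix; folding new
# text into the space uses these same truncated factors.
approx = u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]
print(np.round(approx, 1))
```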
Using the aforementioned techniques, we could successfully calculate a test space that includes all 9 million abstract-holding PubMed documents and a vocabulary of more than 330,000 (frequency-filtered) terms.
However, if there is more detailed information available about the topical requirements, a
smaller and more efficient space can be calculated around one or more topics. For this
purpose we adapted the R text mining package again: Now the user only needs to provide
the topics (given as MeSH descriptors) to the text mining package, which in turn
generates a new corpus based on the documents of interest extracted from the document
base. From this corpus the user may then easily generate a (preprocessed) textmatrix that
is finally used as input for the singular value decomposition.
This way we created latent-semantic spaces for the MeSH descriptor 'skin' (containing information from 49,414 documents) as well as a space containing the MeSH descriptors 'pharmacology', 'drug interactions', and 'accident prevention' (built from 24,346 documents).
3. Support and Feedback Showcases (WP5)
Work in work package 5 is organized along two tasks. Task T5.1 concentrates on recommendations derived from the interaction analysis of learners. Task T5.2 focuses on recommendations based on assessing student writings.
3.1 Interaction Analysis (T5.1)
Two applications have been developed for the analysis of collaborative chat conversations in order to provide automatic feedback and grading to the tutors: Polyphony Analyzer and ChAMP (Chat Assessment and Modeling Program). As the implementation of these systems had started before D2.1 was available, and because they were intended to be used mainly by a limited number of tutors for testing and validation purposes, they were not designed as services but as stand-alone desktop systems; therefore the client-server architecture was not used. Nevertheless, they partially implement a four-tier architecture composed of a three-tier modified MVC (Model-View-Controller) and a data processing layer. The MVC has the restriction that the Model (Data) layer and the View (Presentation) layer do not communicate directly, but only through the Controller, as Fig. 3 of the architecture displays below.
Fig. 3: Architectural layers of the system.
Different technologies were used for the implementation of these applications: Polyphony Analyzer was developed in C#.NET (using the additional library WordNet.NET), while ChAMP was developed in Java. The latter uses the following additional libraries: Jazzy for spell checking, Prefuse for social network modeling, JFreeChart for generating charts, JWI (Java Wordnet Interface) for interacting with Wordnet, and MTJ (Matrix Toolkit for Java) for EVD (eigenvalue decomposition) and SVD (singular value decomposition). The applications are configured using external configuration files and special internal classes.
View (Presentation) Layer
The view contains the user interface and controls; it is responsible for the interaction with the user and displays the results of the processing. Polyphony Analyzer is a multi-windowed application, while ChAMP uses multiple tabs. The communication with the controller is done inside the event (action) handling mechanism used for each control, and all the parameters needed to handle the event processing are transmitted to the controller. After the controller finishes processing the event, it passes back the results, which are interpreted by the view in order to be properly displayed to the user by modifying the controls. A simple way of communicating between the view and the controller is by using messages that have a type, related to the actions that should be taken by each layer, and a container that encapsulates all the parameters needed to process the message.
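This typed-message pattern can be sketched in a few lines; the Python below is illustrative only (the real applications are C#.NET and Java), and the message kinds, parameter names, and dispatch logic are invented.

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    """A typed message exchanged between view and controller.

    `kind` names the action to be taken; `params` is the container
    encapsulating everything needed to process the message.
    """
    kind: str
    params: dict = field(default_factory=dict)

def controller_handle(msg):
    """Toy controller dispatch; the real applications react to UI events
    and consult the model / data processing layer."""
    if msg.kind == "load_chat":
        return Message("chat_loaded", {"utterances": 42})
    return Message("error", {"reason": f"unknown kind {msg.kind!r}"})

reply = controller_handle(Message("load_chat", {"path": "chat.xml"}))
print(reply.kind, reply.params)  # chat_loaded {'utterances': 42}
```

The view only ever inspects `kind` and `params` of the reply to decide which controls to update, which is what keeps the two layers decoupled.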
Controller Layer
The controller mediates the communication between the view and the model: it is responsible for modifying the data layer according to the user's actions in the view and for transmitting back the information that should be changed in the view to reflect the processing of the data. It receives messages from the view and performs the operations corresponding to the message type and parameters. These messages modify the data model or the state of the controller. The modification of the data stored in the model is done either by passing the message further to the model layer, if the operations are simple, or through the data processing layer, if complicated operations are necessary. After the processing of the data is done, the controller receives the results and constructs a new message for the view containing the information that the view should modify. As the applications are not very complex, the communication between the three other layers performed by the controller is synchronous, using a single processing thread.
Model (Data) Layer
The model stores the data processed by the application. Initially, the data contains only
the chat conversation that is parsed from an external XML (or HTML) transcript file.
When proceeding with processing the chat, the model is enriched with new information about the results of the processing. Due to the low memory requirements for storing and processing the data, the data is kept in main memory for as long as the application uses it. The model modifies the underlying data either directly, when requested by the controller, or indirectly, when the processing layer modifies the data in the model to store its results. When the model finishes processing the data, it notifies the controller and transmits to it the information needed to change the view. Part of the processed data can be exported in different formats depending on the application: Polyphony Analyzer exports XLS spreadsheets and text files, while ChAMP exports XML files.
Data Processing Layer
The data processing layer is responsible for performing the more difficult natural language and social network processing tasks. It is instructed by the controller to start the analysis and, after finishing its task, updates the model, thus enriching it, and notifies the controller. Because the data processing layer is separated from all the other tiers, the functions that perform the processing tasks can be modified without altering any of the other layers, or with only minor modifications.
Service-Oriented Approach
For the integration into the LTfLL architecture described in D2.1, the two systems will be transformed into web services. The view requires the most significant adaptation, as the desktop-based view must be turned into one usable from a web browser, by using HTML controls and forms and by modifying the interaction pattern with the server. This way, the View layer shall become the Client layer, while the rest of the application shall reside on the server. The remaining tiers are easily mapped onto the D2.1 architecture: the Controller is the Service layer and must be adapted to respond to HTTP requests, while the Data Processing layer is equivalent to the Application Logic layer and the Model layer to the Backend layer, with minor modifications.
The services to be developed for task 5.1 are intended to provide support and feedback on collaborative chat conversations and discussion forums for students, tutors, and teachers. The service implementation is only loosely based on the current systems, as the services shall employ a different approach, integrating some of the features used in the two systems and improving them by implementing new features. There shall be two distinct services that use similar approaches, especially for the data processing techniques: a chat service and a forum service. These services shall be decomposed into a number of distinct sub-services – for example, the chat analysis service shall have at least three sub-services: one for feedback generation, one for grading each participant, and one for an enhanced visualization of the conversations. The Client layer
shall use the widget platform based on PHP, JavaScript, AJAX, and Flash/Flex or Java applets/JavaFX. The Backend layer shall be used to store the data in memory and to retrieve it from a persistent storage, such as a relational or XML database. We opt for a relational database used together with an ORM (Object-Relational Mapping) framework such as Hibernate for Java. The Service layer shall provide the end-points of the services and the communication between the outside world (requests/responses) and the Data and Application Logic layers.
The most important layer of the service shall be the Application Logic layer, because this is where all the processing is going to be performed. Considering its complexity, the processing layer can be decomposed into seven different sub-layers:
1. Basic processing and NLP (Natural Language Processing) pipe: spelling
correction, stemmer, tokeniser, POS tagger;
2. Linguistic ontology (e.g. Wordnet) interfacing sub-layer;
3. Domain ontology and semantic sub-layer – either built by experts or
automatically extracted from various web sources (e.g. using Wikipedia and
Wiktionary);
4. Social network analysis sub-layer;
5. Advanced NLP and discourse analysis sub-layer: identification of cue phrases,
speech acts, rhetorical schemas, lexical chains, co-references;
6. Advanced discourse analysis: adjacency pairs, implicit links, discussion threads,
argumentation, transactivity;
7. Polyphony sub-layer: includes modules for examining inter-animation,
convergence and divergence.
The first four sub-layers perform distinct operations that are not inter-related, but they are all used by sub-layers 5-7, which provide more advanced functions. Moreover, each sub-layer in the range 5-7 uses the outputs provided by all sub-layers with a lower identifier (e.g. sub-layer 6 uses the outputs from all sub-layers 1-5), thus composing a processing layer stack that has sub-layers 1-4 at the base and sub-layer 7 at the top. All the results produced by the Application Logic layer are saved to the Backend layer through the Service layer.
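The dependency structure of this stack – layers 1-4 independent, each layer n >= 5 consuming the outputs of all lower layers – can be sketched as follows. The layer functions and their string outputs are placeholders; only the wiring mirrors the description above.

```python
def run_stack(text):
    """Run the seven-sub-layer stack on one input text (illustrative)."""
    results = {}
    # Sub-layers 1-4: independent analyses (placeholder outputs).
    independent = {
        1: lambda t, r: f"nlp({t})",
        2: lambda t, r: "wordnet-links",
        3: lambda t, r: "domain-concepts",
        4: lambda t, r: "social-graph",
    }
    # Sub-layers 5-7: each consumes the outputs of all lower layers.
    dependent = {
        5: lambda t, r: f"discourse<-{sorted(r)}",
        6: lambda t, r: f"threads<-{sorted(r)}",
        7: lambda t, r: f"polyphony<-{sorted(r)}",
    }
    for layer, fn in independent.items():
        results[layer] = fn(text, results)
    for layer, fn in dependent.items():
        # Pass a snapshot of everything computed so far to this layer.
        results[layer] = fn(text, dict(results))
    return results

out = run_stack("hello")
print(out[7])  # polyphony<-[1, 2, 3, 4, 5, 6]
```

Running the layers in ascending order is all that is needed to satisfy the dependency rule, since by the time layer n executes, every layer with a lower identifier has already stored its result.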
3.2 Assessing Textual Products (T5.2)
Apex (Lemaire et al., 2001) is a system that delivers automatic feedback to students who read course texts and then summarise them. It invokes LSA both for proposing texts to read, as a search engine, and for measuring how well a summary matches the given course text. The student is involved in two main interaction loops. First, as many course texts as the student wants are proposed upon request (see Fig. 4). The student assesses each of the texts read as understood (summarisable) or not (reading loop, see Fig. 5). Then the student can freely choose among the following tasks: write a summary of each of the read texts he understood (writing loop, see Fig. 7), read a new text, or create a new request, depending on his response and the availability of further texts (see Fig. 5 and 6). During the writing loop, the student can ask Apex to assess his summary (i.e., whether he captured the gist of the course text, see Fig. 8 and 9), and then revise it accordingly or go on to the next summary (see Fig. 9). The following use cases describe the human-system interactions.
(Use-case diagram: the student writes a request; the system retrieves the first text.)
Fig. 4: The student indicates key words about the topic to work on and types a request.
(Activity diagram: the student reads a source text and marks it as summarisable or not; depending on whether it was the last text and whether at least one read text is summarisable, he reads a new text, selects to summarise, or writes a new request.)
Fig. 5: The student reads a text and determines if he can summarise it.
(Activity diagram: after reading a text, the student either reads a new text, which is then displayed, or chooses to summarise, which displays the list of the understood texts.)
Fig. 6: The student chooses if he wants to write a summary or read a new text.
(Activity diagram: from the list of the understood texts, the student either summarises a text, which opens the summarising area, or reads a text again, which displays it anew.)
Fig. 7: The student determines if he wants to summarise or re-read a text.
(Activity diagram: the student writes a summary and validates it; the system assesses it.)
Fig. 8: The student writes a summary and asks for an assessment.
(Activity diagram: after the assessment is displayed, the student can have the summary assessed another time, return to the list of the understood texts, read a new text, write a new request, or exit.)
Fig. 9: The student peruses his evaluation and decides what he wants to do next.
The student's actions trigger functions that send requests to a server, on the one hand to store data and on the other hand to invoke LSA (via the Bellcore application, currently being replaced by the R infrastructure).
Client layer
Clients use a web interface that lets users read texts, summarise them, and peruse assessments of the completed work. The interface is written in HTML and PHP.
Service layer
This layer links the interface with LSA. Depending on the user's actions, parameters are sent to the server via POST/GET. The parameters are passed to C scripts that either invoke the LSA program or recover user data from text files (one per session).
Application logic layer
C scripts are invoked. They either perform LSA on the parameters passed or recover data directly from files. The LSA application (currently still the Bellcore application) returns a result file containing the required semantic proximities. The service layer transforms this file to communicate the data to the client layer.
Storage layer
LSA needs a semantic space to function. We compute it from a text corpus, which depends on the user's knowledge level and the domain taught. Since computing a semantic space is computationally expensive, we use semantic spaces computed in advance. When LSA has to compare a new document (one not contained in the corpus), we do not compute a new semantic space; instead we use dedicated LSA functions (tplus and syn, i.e., the "fold-in" technique).
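The fold-in step can be illustrated with a toy example (hypothetical numbers and function names, not the Bellcore or R code): a new document's term vector is projected into the precomputed space and then compared by cosine similarity.

```python
# Sketch of the LSA "fold-in" technique: a new document is projected
# into a precomputed space (term matrix U, singular values S) instead
# of recomputing the SVD. Toy numbers; real spaces come from large
# course corpora.
import math

def fold_in(term_counts, U, S):
    # d_hat = d^T * U * S^-1  (standard fold-in projection)
    return [sum(term_counts[t] * U[t][k] for t in range(len(U))) / S[k]
            for k in range(len(S))]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Tiny 4-term, 2-dimension toy space.
U = [[0.5, 0.5], [0.5, -0.5], [0.5, 0.5], [0.5, -0.5]]
S = [2.0, 1.0]
course_text = fold_in([1, 0, 1, 0], U, S)
summary     = fold_in([2, 0, 1, 0], U, S)
similarity  = cosine(course_text, summary)
```

The resulting similarity score is the kind of value Apex uses to judge whether a summary captures the gist of a course text.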
Descriptions of functions
Below, the PHP functions used are described. Some of them invoke C functions:

Affiche_requete: prompts the user to write his request.

Fait_choix: proposes either to go on reading or to write a summary.

Fait_selection: sends the first text and recovers data about the understanding of this text. The first text is selected depending on the LSA results. It invokes a C function (selectFirstText.c), which in turn invokes the tplus and syn LSA functions.

Recup_compris1: invokes a C function (understoodUser.c) to update the session file.

Affiche_newText: selects the next text and refreshes the session file. The text is selected depending on the LSA results. LSA is invoked from a C function (newText.c), which in turn invokes the syn LSA function.

Recup_choix: recovers the user's choices at each step.

Afficher_text_compris: provides the user with the list of the read and summarisable texts (displayed as a whole or only the first sentence) as well as the Apex assessment. The user can then either summarise a given text or re-read it.

Ecrire_text: displays the form used to summarise a text and stores the user's summary.

Fait_choix2: if there is no other text to display, Apex proposes to write the summaries of the understood texts or to write a new request.

Eval_text: has a summary assessed by LSA, stored and displayed. LSA is invoked from a C function (understoodLSA.c), which invokes the tplus and syn LSA functions.
4. Social and Informal Learning Showcases (WP6)
The work within work package 6 on supporting social and informal learning is divided
into two tasks. Task T6.1 deals with the creation of a knowledge sharing network,
whereas task T6.2 focuses on adding a social component to the public knowledge.
4.1 Task 6.1 – Knowledge sharing network
This document describes the architecture and the services which will be part of the
Common Semantic Framework (CSF).
CSF Resources
The data stored in the CSF will be in XML format. The central notion here is the resource. Typical resources are ontologies, lexicons, learning materials, communication notes, comments, web links, etc. Each resource is connected with the following additional information:
DTD or XML Schema – definition of the structure of the XML documents that
are instances of the resource
Internal elements and external elements. Internal elements contain information that will be processed by the CSF (editing, storing, searching, visualising, etc.). External elements are referenced from the resource and are generally processed by external tools; for example, a PDF viewer is used to open PDF documents.
Tools for processing the elements of the resource.
Visualization rules for the resource – how the elements of the resource are
presented to the user, how the user can manipulate them, etc.
Search Schema(s) for the resource.
Resource creation. The following modes are envisaged: automatic creation by an external service, manual creation by the user, or a mixture of the two (some elements of the resource are generated automatically, others are entered manually).
This information will be called Resource Information. The XML documents for a given resource will also be called Resource Documents.
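A minimal sketch of what a resource document and its resource information might look like, assuming illustrative element and field names rather than a fixed CSF schema:

```python
# Sketch of a CSF resource document paired with its resource
# information. All element and field names are illustrative.
import xml.etree.ElementTree as ET

resource_info = {
    "schema": "lexicon.dtd",            # DTD/XML Schema reference
    "tools": ["pdf-viewer"],            # external processing tools
    "search_schemas": ["term-search"],  # at least one to be searchable
    "creation": "manual",               # automatic | manual | mixed
}

doc = ET.Element("resource", type="lexicon")
entry = ET.SubElement(doc, "entry")           # internal element: processed by the CSF
ET.SubElement(entry, "term").text = "ontology"
ET.SubElement(doc, "external", href="notes.pdf")  # external element: opened by a tool

xml_text = ET.tostring(doc, encoding="unicode")
```

The internal `entry` element would be edited, indexed and visualised by the CSF itself, while the `external` element merely points at content handled by an outside tool.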
Basic Level Services of CSF
The basic level of the CSF provides services for all basic operations on XML documents that represent resources. These services are: DTD/XML Schema Management; Resource
Information Editing; Tools Declaration; Search Schema Editing; Store Resource
Documents; Retrieve Resource Documents; Visualize Resource Documents.
DTD/XML Schema Management
This service will support: the declaration of a DTD/XML Schema; editing of an existing DTD/XML Schema; validation checks of the resource documents; export of a DTD/XML Schema.
We will reuse the corresponding modules from the CLaRK system, implemented by us (http://www.bultreebank.org/clark/index.html). CLaRK is a full-featured editor for XML documents and DTDs. It integrates an XPath processor, a DTD validator, and tools that support users in transforming XML documents. The full version still needs to be implemented.
Resource Information Editing
This service will support: editing of resource information; import and export of resource
information.
For each kind of resource information an editor will be implemented. The actual information will be represented as XML documents, which will facilitate its exchange.
To be implemented.
Tools Declaration
This service will support the declaration of external tools (services). The declaration will contain the following information: where the tool is located; the types of the tool's arguments; the type of the result. The external tools will typically be used for navigation over the web, for browsing different types of documents (PDF, RTF, etc.), and for the (complete or partial) creation of resources. The tools will be called upon certain events within the CSF.
To be implemented.
Search Schema Editing
This service will support the creation and modification of search schemas. One of the basic components of the CSF is an XML-oriented search engine. For the resource documents to be searchable, at least one search schema needs to be connected to the resource. The search schema contains definitions of the search context, the search terms, and the retrievable elements. When a search schema is defined, an index for it is created, and resource documents can be indexed against it.
It is implemented as an extension of Lucene engine (http://lucene.apache.org/).
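How a search schema (search context, search terms, retrievable elements) could drive indexing can be sketched with a toy inverted index; this is not the actual Lucene-based implementation:

```python
# Toy illustration of a search schema driving an XML-oriented index:
# the schema names the search context element, the searchable element
# and the retrievable element. The real engine extends Apache Lucene.
import xml.etree.ElementTree as ET
from collections import defaultdict

schema = {"context": "entry", "term": "term", "retrieve": "definition"}

def build_index(documents, schema):
    idx = defaultdict(list)
    for doc in documents:
        root = ET.fromstring(doc)
        for ctx in root.iter(schema["context"]):
            term = ctx.findtext(schema["term"], "")
            retrievable = ctx.findtext(schema["retrieve"], "")
            idx[term.lower()].append(retrievable)
    return idx

docs = ["<lexicon><entry><term>LSA</term>"
        "<definition>Latent Semantic Analysis</definition></entry></lexicon>"]
idx = build_index(docs, schema)
```

A query for a term is then evaluated against this index, returning the retrievable elements declared in the schema.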
Store Resource Documents
This service will support the storing of resource documents (local or remote). During storing, the corresponding index will be modified appropriately.
It is currently implemented on top of the file system, but it could be extended to a web repository.
Retrieve Resource Documents
This service will support the retrieval of resource documents (or their elements) from the document repository. The retrieval can be done from a list of documents or via the XML search engine. The first option is similar to navigating a file system. The second option uses the Lucene-based XML search engine, which provides a query language tuned to search schemas. The query is evaluated over an appropriate index, and the service returns a list of documents and/or elements that match the query.
It is implemented as an extension of Lucene engine (http://lucene.apache.org/).
Visualize Resource Documents
This service will read the visualisation rules and generate a concept map, which is displayed to the user. The user will be able to navigate over it, edit it, store it, and run external tools on request. There will be two versions of the service: a standalone one and a web-based one. The first will require a local installation of the VUE system; the web-based one will be a simplified version of the standalone one.
This service is under implementation within the Visual Understanding Environment
(VUE) – http://vue.tufts.edu/.
Extended Layer of CSF
At this level of the CSF, the actual resources and services for the tasks within the LTfLL project will be provided, including the actual definitions of the resources with their resource information. The following resources are envisaged at this phase of the project: Ontology Management Service; Lexicon Management Service; Document Annotation Service; Social Media Services (Task 6.2).
Ontology Management Service
This service will comprise three services: Ontology Storing and Reasoning; Ontology Translation; Ontology Filtering.
The ontology storing and reasoning service provides the basic functionality for accessing information (explicit or implicit) from an ontology. This includes:
registration and deregistration of ontology models;
listing of direct and indirect sub-concepts;
listing of direct and indirect super-concepts;
listing of individuals for a concept;
listing of concepts to which an individual belongs;
listing of properties defined for a concept;
structural and logical consistency check of the ontology model registered;
extraction of a registered ontology model;
generation of ontology fragments (with or without sub-concepts, super-concepts, sibling concepts, property relations and range concepts, and property-restriction super-concepts, for a concept supplied as a parameter);
generation of a sub-hierarchy containing super-concepts and/or sub-concepts up to a specified number of steps for a concept supplied as a parameter.
The Pellet OWL reasoner (http://www.mindswap.org/2003/pellet) is used to implement these functionalities. The necessary web services are already implemented and will be reused within the LTfLL project.
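The distinction between direct and indirect sub-concepts can be illustrated on a toy hierarchy (the actual service delegates such queries to the Pellet reasoner):

```python
# Sketch of direct vs. indirect sub-concept listing over a toy
# hierarchy. Concept names are illustrative; real queries run over a
# registered OWL ontology model via Pellet.
subs = {  # concept -> direct sub-concepts
    "Resource": ["Lexicon", "LearningMaterial"],
    "LearningMaterial": ["Course", "Exercise"],
}

def direct_subconcepts(concept):
    return sorted(subs.get(concept, []))

def indirect_subconcepts(concept):
    # Transitive closure over the direct sub-concept relation.
    result, stack = set(), list(subs.get(concept, []))
    while stack:
        c = stack.pop()
        result.add(c)
        stack.extend(subs.get(c, []))
    return sorted(result)
```

Super-concept listing works symmetrically over the inverse relation.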
The ontology translation service converts an ontology into a simplified graph representation and also translates the concept and relation names into a natural language using the lexicon service. The result of this service will be the main way in which an ontology is presented within the CSF as a resource. For this resource we will define a search schema and a set of visualisation rules. The actual service is already implemented as our own software module, and an appropriate web service is defined for it. The incorporation of the resource into the CSF will be implemented within LTfLL.
The ontology filtering service will apply rules for simplifying the ontology so that it is easier for the user to understand. This service will be implemented as our own module within the project.
Lexicon Management Service
This service supports the alignment of lexicons with the ontology. It provides functions for accessing the terms for a concept, the concepts expressed by a term, the addition of new terms, and access to definitions. It is already implemented as our own software module, and an appropriate web service is defined for it.
Document Annotation Service
This service comprises two services: text annotation and image annotation. The text annotation is implemented as a language pipe performing tokenization, POS tagging, lemmatization, and semantic annotation. We already have implementations of language pipes for several languages (from the LT4eL project) and will reuse them where possible. Additionally, we will augment these language pipes to achieve better semantic annotation. The implementation is done using third-party software for
languages other than Bulgarian. The integration and augmentation will be done within the CLaRK system (http://www.bultreebank.org/clark/index.html). To support web services, we extended the CLaRK system so that it can be used in pipes and as a server. This service will be used to annotate each textual element of the resources within the CSF, such as learning objects, comments, definitions, etc. The image annotation works on images in multimedia documents. It provides an image editor for selecting regions in an image and mechanisms for annotating these regions with concepts from the ontology. The resulting annotation is stored in the document and can be used for searching and other processing. The image annotator is already implemented.
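The chaining of the text annotation stages can be sketched as follows; the individual components are trivial placeholders for the real LT4eL pipe modules:

```python
# Illustrative language pipe: tokenization, POS tagging, lemmatization
# and semantic annotation chained in order. Each stage here is a toy
# placeholder; real pipes use full NLP components.
def tokenize(text):
    return text.split()

def pos_tag(tokens):
    return [(t, "NN") for t in tokens]          # placeholder tagger

def lemmatize(tagged):
    return [(t.lower().rstrip("s"), p) for t, p in tagged]  # crude stemmer

def annotate(lemmas, lexicon):
    # Semantic annotation: attach an ontology concept where the
    # lexicon knows the lemma, None otherwise.
    return [(l, p, lexicon.get(l)) for l, p in lemmas]

lexicon = {"ontologie": "Ontology"}  # lemma -> concept (toy entry)
pipe_out = annotate(lemmatize(pos_tag(tokenize("Ontologies"))), lexicon)
```

Each stage consumes the previous stage's output, which is what makes the pipe easy to run behind a server interface.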
Social Media Services
These services will be implemented within the LTfLL project; they will elicit new knowledge and recommendations from social media.
4.2 Task 6.2 – Social component
Web Services
All functionalities provide a web service interface. Once the tools are configured, they can be used through this interface with standard technologies such as WSDL and SOAP. After successfully installing the tools, the web service description is available at:
http://<your_tomcat_url>/lt4elservice/services/Lt4elService?wsdl
The current specification lists eight implemented methods.
sendNewLo
Sends a new learning object to the language technology server. This is the first method that should be invoked right after a new learning object is created in, or uploaded to, a learning management system. This service passes the URL of a learning object in its original format (HTML, PDF, …) to the server. For this to work, the linguistic processing chain for the given language and format must be configured and running.
Input Parameters
loid (xsd:string): Learning Object ID. This ID is used to identify the learning object. It is assumed that this ID is generated in the learning management system when new learning objects are created. This ID is used as an input/output parameter in most of the other functions.
language (xsd:string): The (main) language of the learning object. The language must be represented by a two-letter code as defined in ISO 639-1. See http://www.oasisopen.org/cover/iso639a.html for details.
url (xsd:string): URL of the learning object file in its original format.
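A hedged sketch of the corresponding SOAP request body, built with standard tooling; the operation element is shown without its target namespace, since the authoritative definition is the WSDL itself:

```python
# Sketch of the SOAP request body for sendNewLo, built with the
# standard library. The element layout inside the Body is simplified;
# the real contract comes from .../lt4elservice/services/Lt4elService?wsdl.
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

def send_new_lo_envelope(loid, language, url):
    ET.register_namespace("soap", SOAP_NS)
    env = ET.Element("{%s}Envelope" % SOAP_NS)
    body = ET.SubElement(env, "{%s}Body" % SOAP_NS)
    op = ET.SubElement(body, "sendNewLo")
    for name, value in (("loid", loid), ("language", language), ("url", url)):
        ET.SubElement(op, name).text = value
    return ET.tostring(env, encoding="unicode")

envelope = send_new_lo_envelope("lo-42", "en", "http://example.org/lo.html")
```

In practice a WSDL-aware SOAP client would generate this envelope automatically from the service description.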
sendNewLoAnnotated
Sends a pre-annotated learning object to the language technology server. This service does basically the same as sendNewLo, except that instead of a raw learning object, file paths to local pre-annotated versions of the learning object are passed as parameters. The files must respect the LT4eLAna DTD. The service should be used when a language processing chain is not available for a given language.
Input Parameters
loid (xsd:string): Learning Object ID. This ID is used to identify the learning object. It is assumed that this ID is generated in the learning management system when new learning objects are created. This ID is used as an input/output parameter in most of the other functions.
language (xsd:string): The (main) language of the learning object. The language must be represented by a two-letter code as defined in ISO 639-1. See http://www.oasisopen.org/cover/iso639a.html for details.
filename (xsd:string): Local path to the pre-annotated learning object file (LT4eLAna DTD).
attach (xsd:boolean): not used.
filename2 (xsd:string): Local path to the ontologically annotated file. If no path is given, semantic search will not be available for this learning object.
Output Parameters
accepted (xsd:boolean): True, if the language technology server successfully
received the learning object, false otherwise.
getStatus
Gets the processing status of a learning object that has been sent to the language server. This function can be used after sendNewLo or sendNewLoAnnotated has been invoked for a learning object. Since the processing and conversion of a new learning object may take several minutes, this function tells the learning management system the status of the processing. It can be displayed to the user and used to deactivate certain functions that cannot be used until the processing status is FINISHED.
Input Parameters
loid (xsd:string): Learning Object ID. (see sendNewLo).
Output Parameters
status (DocumentStatus): Two parameters are returned. Status contains the current processing status of the LO as a string with the following possible values:
o UNKNOWN
o PROCESSING
o FAILED
o FINISHED
The second parameter, StatusStr, contains a longer status message with additional information, e.g. about a processing failure.
Types
DocumentStatus: sequence of
o Status (xsd:string)
o StatusStr (xsd:string)
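A typical polling loop over getStatus might look as follows; `get_status` is a stand-in for the real SOAP call:

```python
# Sketch of how a learning management system could poll getStatus
# until processing terminates. get_status is a placeholder for the
# real SOAP call; the status values follow the specification above.
TERMINAL = {"FINISHED", "FAILED"}

def poll(get_status, loid, max_tries=10):
    history = []
    for _ in range(max_tries):
        status, _msg = get_status(loid)
        history.append(status)
        if status in TERMINAL:
            break
    return history

# Simulated server: UNKNOWN -> PROCESSING -> FINISHED.
responses = iter([("UNKNOWN", ""), ("PROCESSING", ""), ("FINISHED", "done")])
history = poll(lambda loid: next(responses), "lo-42")
```

A real client would additionally sleep between polls, since processing may take several minutes.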
deleteLO
Deletes a learning object's representation on the language technology server. This function should be called when a learning object is deleted in the learning management system. After successfully invoking deleteLO, subsequent calls to getStatus will return UNKNOWN again.
Input Parameters
loid (xsd:string): Learning Object ID. (see sendNewLo).
Output Parameters
success (xsd:boolean): True, if the language technology server successfully
removed the learning object, false otherwise.
findKeywordCandidates
Finds candidate terms for the keyword annotation of a learning object. This method should be used by a learning management system when a learning object is annotated with keywords. Many learning management systems come with support for LOM or Dublin Core metadata, and both of these standards allow annotation with keywords. Simple tagging systems work the same way and could also use this function to propose keywords to an annotator.
Current Restrictions
In general, this function works better as the internal language model grows. If only a small number of learning objects have been sent to the language technology server via sendNewLo, the quality of the results is suboptimal. Good quality can be expected after 30-50 mid-size learning objects have been sent to the language technology server.
Input Parameters
loid (xsd:string): Learning Object ID. (see sendNewLo).
maxnum (xsd:int): Maximum number of keywords that should be returned by the
function.
method (xsd:string): Method of keyword detection.
o tfidf TF-IDF
o ridf R-IDF
o adridf ADR-IDF (currently best performing)
Output Parameters
keywords (ArrayOfString): Ranked keywords.
Types
ArrayOfString
o minOccurs 0
o maxOccurs unbounded
o type xsd:string
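The plain tfidf method can be sketched as follows (the ridf and adridf variants are not shown); this also illustrates why quality improves as more learning objects are indexed:

```python
# Sketch of TF-IDF keyword ranking, the simplest of the three methods
# above (ridf and adridf are variants not shown here). IDF needs a
# sizeable corpus, which is why quality grows with the number of
# learning objects sent to the server.
import math
from collections import Counter

def tfidf_keywords(doc_tokens, corpus, maxnum):
    n = len(corpus)
    tf = Counter(doc_tokens)
    scores = {}
    for term, count in tf.items():
        df = sum(1 for d in corpus if term in d)  # document frequency
        scores[term] = (count / len(doc_tokens)) * math.log((n + 1) / (df + 1))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:maxnum]

corpus = [{"learning", "object"}, {"object", "metadata"}, {"ontology"}]
doc = ["ontology", "ontology", "object"]
keywords = tfidf_keywords(doc, corpus, maxnum=2)
```

Here "ontology" ranks first because it is frequent in the document but rare in the corpus.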
sendApprovedKeywords
Sends all keywords related to a learning object that have been approved by a human annotator back to the language technology server. The language technology server can later use this information, e.g. during search.
Input Parameters
loid (xsd:string): Learning Object ID
keywords (tns:ArrayOfString): Keywords approved by an author.
Output Parameters
success (xsd:boolean): Success true/false
getDefinitionCandidates
Gets a set of terms and candidate definitions for a learning object. This method can be used by learning management systems to support the semi-automatic generation of glossaries with terms and definitions found in a learning object. The returned value context includes the surrounding text of the definition (usually around three sentences).
Input Parameters
loid (xsd:string): Learning Object ID. (see sendNewLo).
Output Parameters
definitions (ArrayOfDefinition): Array of terms and related defining texts.
Types
ArrayOfDefinition
o minOccurs 0
o maxOccurs unbounded
o type Definition
Definition
type sequence of
context (xsd:string)
definedTerm (xsd:string)
definingText (xsd:string)
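A toy, pattern-based sketch of definition-candidate detection, returning pairs analogous to definedTerm and definingText; the real service works on the linguistically annotated corpus, not on a regular expression:

```python
# Sketch of pattern-based definition-candidate detection. The single
# "X is a/an/the Y" pattern is purely illustrative; the real service
# detects definitions in the annotated learning objects.
import re

PATTERN = re.compile(r"(?P<term>[A-Z][\w ]*?) is (?P<art>a|an|the) (?P<rest>[^.]+)\.")

def definition_candidates(text):
    out = []
    for m in PATTERN.finditer(text):
        defining_text = "%s is %s %s." % (m.group("term"), m.group("art"), m.group("rest"))
        out.append((m.group("term"), defining_text))  # (definedTerm, definingText)
    return out

cands = definition_candidates("An ontology is a formal specification of a conceptualisation.")
```

A glossary tool would present such pairs to an author for confirmation rather than accept them automatically.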
search
Searches for learning objects. This function supports extended search capabilities based on fulltext search, keyword-based search, and semantic search. Semantic search supports multilingual retrieval of learning objects by using lexicons and an ontology.
Input Parameters
searchTerms (tns:ArrayOfString): The search terms.
semantic (xsd:boolean): Semantic search.
keywords (xsd:boolean): Keyword search.
fulltext (xsd:boolean): Fulltext search.
conjunctive (xsd:boolean): Conjunctive combination of search terms (otherwise disjunctive).
searchLangs (tns:ArrayOfStrings): Search term languages.
retrievalLangs (tns:ArrayOfStrings): Languages of target learning objects.
method (xsd:string): Search method ("SEMANTIC", "KEYWORD", "FULLTEXT").
searchConcepts (tns:ArrayOfString): Concepts, if learning objects related to concepts are searched.
systemLang (xsd:string): System language.
maxSnippets (xsd:int): Maximum number of search context snippets that should be returned. Use lower values to increase performance.
Output Parameters
searchresult (WSSearchResult): Two arrays. The first one includes a result list
with all found LOs, including information on score, text context, and related
concepts. The second array holds a list of ontology concepts that are related to the
search terms.
Types
WSSearchResult
o resultList (ArrayOfSearchResult)
o termConcepts (ArrayOfString)
SearchResult
o docid (xsd:string): Learning Object ID
o fulltext (xsd:boolean): found by fulltext search
o semantic (xsd:boolean): found by semantic search
o keyword (xsd:boolean): found by keyword search
o matchingConcepts (ArrayOfString): concepts related to search terms and LO
o rankedConcepts (ArrayOfString): concepts annotated to the LO
o score (xsd:double): relevance score
o snippet (xsd:string): contextual text snippet
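The effect of the conjunctive parameter can be sketched as a set operation over per-term hit lists:

```python
# Sketch of the 'conjunctive' flag: each search term yields a set of
# matching learning-object IDs; the sets are intersected (AND) or
# united (OR). The real engine additionally merges fulltext, keyword
# and semantic matches and attaches scores and snippets.
def combine_terms(hits_per_term, conjunctive):
    sets = [set(h) for h in hits_per_term]
    if not sets:
        return set()
    return set.intersection(*sets) if conjunctive else set.union(*sets)

hits = [{"lo1", "lo2"}, {"lo2", "lo3"}]   # one hit set per search term
and_result = combine_terms(hits, conjunctive=True)
or_result = combine_terms(hits, conjunctive=False)
```

With conjunctive search only learning objects matching every term survive; disjunctive search returns anything matching at least one term.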
getConceptNeighbourhood
Gets the relations and related concepts of an ontology concept. This function can be used to support browsing through the ontology in the learning management system interface.
Input Parameters
concepts (tns:ArrayOfString): Concepts
languages (tns:ArrayOfString): Languages (the entries in the lexicons for these
languages and the concepts included in the fragments will be returned)
Output Parameters
fragments (tns:ArrayOfString): Ontology fragments for the concepts.
getRankedConceptsForDocs
This service returns all concepts related to one or more learning objects. The concepts are ranked according to their number of occurrences in the learning module.
Input Parameters
loids (tns:ArrayOfString): Learning Object IDs.
Output Parameters
concepts (tns:ArrayOfConceptItem): List of ranked concepts.
Types
ConceptItem
o concept (xsd:string)
o docId (xsd:string)
5. Overview on Showcase Characteristics
The integration-relevant characteristics of the six task showcases (T4.1, T4.2, T5.1, T5.2, T6.1, T6.2) are summarised per layer:

WP2 Integration
Encapsulation: well-defined interfaces for T4.1, T4.2, T5.1 and T6.2; partly encapsulated for T5.2 and T6.1.
Documentation: ranges from full (T4.1, T4.2) over nearly full to partly documented.
License: ranges from fully open source to partly open source.

Client layer
Widgetisation: ranges from easily possible over possible to adaptations needed.
Visualisation: HTML and CSS throughout, combined with JavaScript, Flash, or a Java applet.

Service layer
Servicification: full for T4.1; easily possible or possible for the remaining showcases.
Technology: XML, REST and AJAX; XML and REST over HTTP; plain REST over HTTP; VUE with XML and REST (T6.1); SOAP, WSDL and XML (T6.2).

Application logic layer
Programming languages: R and PHP; R and Flare; C#.NET, Java, Python and PHP; C and PHP (T5.2); Ruby and Java.

Storage layer
Technology: MySQL; file system and MySQL; XSL, text files and XML; text files (T5.2); XML (T6.1).
6. Integration Strategy
6.1 Position of Software Integration within the Project
Integration Lifecycle
As this project’s software outcomes will be very heterogeneous in their implementation characteristics, it is important to define a suitable integration strategy. In the first deliverable, D2.1, the basic software development and release process for the whole project period was defined. Taking this first approach one step further, Fig. 10 displays a simplified project lifecycle that emphasises the software development process relevant to the proposed integration strategy (bold items represent integration-specific tasks).
Fig. 10: Simplified project lifecycle emphasising software integration
process; adapted spiral model from (Boehm, 1988)
As specified in deliverable D2.1, software releases are to be prepared in line with the submission of the deliverables of the different work packages. With respect to WP2,
this means that the project’s lifecycle is an iterative process with altogether three main loops: (1) ‘Existing services (showcases) – integrated’, (2) ‘Services v1 – integrated’ and (3) ‘Services v2 – integrated’.
The (1) initial iteration starts from the Description of Work (DoW) and defines the first software requirements, which are largely met in the architecture, design, and implementation of the prototypes reflecting the different partners’ showcases. After the technical validation, the outcome is a revised development plan with adapted requirements, which influence the scenario-based design process. In the (2) second iteration, a new software design is elaborated, leading to newly developed software products that have to be tested and integrated into the general WP2 infrastructure. Once a stable first version of integrated services exists, a product release is planned, freezing the development at that stage. The technical validation is done as in the first loop, resulting in a new development plan and hence in adapted requirements. Since it can be assumed that by the (3) last iteration the software development process has become as stable as the software outcomes, a final detailed design phase initiates the final development, integration, and testing phase, resulting in a second version of integrated software products. At the end of the project, a final release delivers well-tested and stable software products that can be integrated into existing applications as defined in the requirements.
Software Development Process
It can be assumed that the coding of applications mostly follows processes well known in software development. Fig. 11 shows such a software development process, focusing on test and integration tasks (bold). Fig. 11 can be seen as a sub-process of the entire integration process throughout the whole project: each stage of the development process has to be run through, more or less, in each iteration of the project lifecycle.
Looking more closely at Fig. 11, all development starts with a requirements analysis, followed by the definition of the functional specification, the software architecture, and the system design. Then the actual coding of the software takes place. Unit testing, the first stage of the dynamic test process, verifies the smallest testable parts of an application. Up to this point, all development-specific issues are handled more or less by the different WPs themselves.
After it has been ensured that the individual software components are fit for use, the application is integrated within the WP2 infrastructure. Along with the integration, testing takes place to ensure that the program can run on WP2’s infrastructure and to collect requirements for, and guarantee, transferability and interoperability.
It has not yet been finally decided whether the project’s software outcomes should consist of a single coherent system or of individual, loosely coupled software components. By sticking to a widget-based design approach, this decision can be postponed to a later point in time when the different software parts are more evolved (e.g. after validation of the stage ‘Services v1 – integrated’). This architectural decision strongly affects the kind of system tests performed.
The last stage of the software development process is acceptance testing, which is done using a technical validation on the one hand and a stakeholder-driven qualitative validation approach on the other.
Fig. 11: Adapted and simplified
software development process
6.2 Integration Workflow
For the actual integration, a process is defined that ensures an easy way to develop, share, integrate, and test the different software components. The process has to ensure that every partner can build software according to their preferences but adheres to a minimal set of general rules that guarantee good communication and cooperation. This is important because every partner will have to collaborate with others at some stage in the development lifecycle.
Therefore, two separate processes are defined to meet the needs of software developers, distinguishing between (1) partners who use the WP2 infrastructure for development and (2) partners who use external infrastructure.
Developments on WP2 Infrastructure
Every developing project partner is granted access to the WP2 development and test infrastructure. WP2 ensures that partners have appropriate access to suitable programming environments meeting their requirements. Project partners can thus develop their software components on a ‘common and typical’ server infrastructure, guaranteeing at least a minimum level of transferability and interoperability. For developments following this approach, a workflow has been defined, which is displayed in Fig. 12¹. It shows a typical development process, starting with an initial integration of already existing software on the project partner’s side and ending with a new stable package release consisting of an up-and-running service on the WP2 live system. Light blue activities are intended to be in the scope of the partner WPs, whereas light red activities are duties of WP2.
The initial integration of existing software has to follow the workflow for external
infrastructure developments, to ensure that an already integrated system is available for
further developments (partner WPs and WP2 involved). The software can then be extended by
the project partner, resulting in a new revision (e.g. at the end of a day), whose units
have to be tested to verify internal correctness. After successful unit tests, a commit to the
global SourceForge repository is done. This development cycle is iterated until a major
revision is deployed. As software developments are already done on the WP2
infrastructure, integration tests are performed automatically by the partner WPs (otherwise
the components would not operate correctly). Major revisions of the software are additionally
tested for validity by WP2. Once a major revision is confirmed to be correct, it is declared
an integrated revision. If any error occurs that cannot be solved by WP2, the partner
WPs are informed and have to adapt the software.
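The daily development cycle described above (extend the software, run unit tests, commit on success) can be sketched as a small shell script. The actual commit command is only indicated as a comment, since the SourceForge repository URL and the partner’s test suite are project-specific; the function names here are illustrative, not part of the project tooling.

```shell
#!/bin/sh
# Illustrative sketch of the daily cycle: extend, unit-test, commit on success.
# run_unit_tests stands in for the partner's real unit-test suite.

run_unit_tests() {
  "$@"   # execute the command standing in for the unit-test suite
}

commit_if_green() {
  if run_unit_tests "$@"; then
    echo "unit tests passed - committing revision"
    # svn commit -m "new revision"   # commit to the global SourceForge repository
  else
    echo "unit tests failed - fix before committing" >&2
    return 1
  fi
}

commit_if_green true   # 'true' stands in for a passing test suite
```

The point of the gate is simply that a commit to the shared repository never happens before the unit tests of the new revision have passed.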
As stated in earlier sections, software releases are scheduled in line with the
submission of the deliverables of the different WPs. This minimum requirement must be
met, but nothing prevents a software release between two deliverables if a
new stable version is deployed and satisfactorily tested. This means that if a major revision is
declared stable by the involved WPs, a new package release can be initiated. For this
purpose, WP2 will deploy the software already successfully integrated on the test system
to the live server as well. If the installation or configuration fails for any reason, the partner
WPs are informed and can adapt the software with the help of WP2. As this adaptation
results in a new revision, it has to pass through all stages of testing once again. At
¹ For all activity diagrams displayed in this report, it is assumed that the project partner has already been
granted access to the WP2 infrastructure and that the required development environment is set up.
Therefore, this process is not illustrated in the corresponding figures.
the end of the integration process, a correctly working service should be up and running on
the live server, accessible to everybody from the outside.
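The release gate described above can be sketched as follows; the two yes/no flags are illustrative stand-ins for the actual confirmation steps (successful integration on the test system, stability statement by the involved WPs) and do not correspond to real project commands.

```shell
#!/bin/sh
# Sketch of the release gate: deploy to the live server only when the major
# revision is both successfully integrated and declared stable by the WPs.
# The yes/no flags are illustrative stand-ins for the actual confirmations.

release_gate() {
  integrated=$1   # "yes" if integration tests on the test system succeeded
  stable=$2       # "yes" if the involved WPs declared the revision stable
  if [ "$integrated" = "yes" ] && [ "$stable" = "yes" ]; then
    echo "deploying package release to live server"
  else
    echo "release postponed" >&2
    return 1
  fi
}

release_gate yes yes
```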
At every step in the development process, the software has to be documented. In addition,
to ensure a traceable integration process, protocols of this work have to be
maintained. More information can be found in section 7.
Fig. 12: Activity diagram of workflow for developments on WP2 infrastructure
Developments on External Infrastructure
Some partner WPs prefer to develop software on their own infrastructure; for them, the
WP2 servers are not the best choice as a programming environment. There is no problem
with partner WPs developing programs on their own, as long as it can be guaranteed that
the developed software systems run on WP2’s infrastructure, so that their transferability
can be tested.
The workflow for developing software on external infrastructure can be seen in Fig. 13.
Again, light blue activities belong to the partner WPs, while light red activities are in the
scope of WP2. The development of new software parts and the unit tests are all done on
the partner WPs’ infrastructure. This means that versioning and change tracking are not
obligatory there, and every partner WP has to take care of backing up their programming
code on their own.
Once a major revision has been developed, the partner WP commits it to the main
SourceForge repository. WP2 checks out or updates the version on its infrastructure and
performs integration tests to ensure that the software works in that environment. As these
integration tests are likely not performed continuously, errors will occur, and WP2
will inform the partners about any error it cannot fix by itself. The partner WP then has to
adapt the software, and the testing procedures start again.
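The WP2-side check of an externally developed major revision (check-out/update, then integration tests, then notification on failure) can be sketched as follows. The two command parameters stand in for the actual svn update and the project-specific integration test suite, neither of which is prescribed here.

```shell
#!/bin/sh
# Sketch of WP2's integration check for an externally developed revision.
# $1 stands in for the check-out/update step, $2 for the integration test
# suite; both are illustrative, not actual project commands.

integration_check() {
  if ! "$1"; then
    echo "check-out/update failed" >&2
    return 2
  fi
  if "$2"; then
    echo "integration tests passed"
  else
    echo "integration tests failed - notifying partner WP" >&2
    return 1
  fi
}

integration_check true true   # both steps succeed in this illustration
```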
If the major revision is declared a stable package – and, of course, after successful
integration tests – a package release on the live server is performed, as in the workflow
for developments on the WP2 infrastructure described above. Again, at the end of the
integration process a correctly working system should be up and running on the WP2 live
server.
Fig. 13: Activity diagram of workflow for developments on external infrastructure
7. Versioning Policy
1) Check-in code to SourceForge by project partners:
The code checked in to the SourceForge repository should include a folder named docs,
which contains an installation manual (install.txt) and a test manual (test.txt).
For more information about the installation and test manuals follow the instructions in
steps 2 and 3.
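The layout requirement of step 1 can be checked mechanically before check-in. The following sketch assumes nothing beyond the two files named above; the function name is illustrative.

```shell
#!/bin/sh
# Pre-check-in validation of the required repository layout: the checked-in
# code must include docs/install.txt and docs/test.txt (step 1 above).
# The function name is illustrative.

check_layout() {
  dir=$1
  for f in docs/install.txt docs/test.txt; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing required file: $f" >&2
      return 1
    fi
  done
  echo "layout ok"
}
```

Running such a check locally before committing avoids the round trip of step 4, where the infrastructure partners would otherwise report the missing files.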
2) Installation manual (install.txt):
The installation manual (install.txt) describes how to install the service in a
stepwise manner. It may contain instruction parts such as ‘Requirements’, ‘How to
install’, ‘Troubleshooting’, etc. The steps described in the installation manual should be
clear enough that technical staff who are not familiar with the corresponding service can
perform a working installation successfully. If the code to be installed requires libraries
or packages not present on the infrastructure, these should be listed in the
‘Requirements’ part of the installation manual. The required packages should be specified
in detail to simplify the installation process.
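A minimal skeleton of such an installation manual, with the section headings suggested above, could be generated as follows; the body text of each section is a placeholder, not prescribed content.

```shell
#!/bin/sh
# Writes a skeleton install.txt with the section headings suggested above.
# The section bodies are placeholders, not prescribed content.

write_install_skeleton() {
  cat > "$1" <<'EOF'
Requirements
------------
- list all required libraries and packages, with exact versions

How to install
--------------
1. give the installation steps one by one

Troubleshooting
---------------
- note known problems and their workarounds
EOF
}
```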
3) Test manual (test.txt):
The test manual (test.txt) describes a stepwise procedure to test the service once it is
installed. The goal is to verify the service manually in order to ensure that it provides
the intended functionality and works properly on the production system. It should
contain detailed instructions about the inputs, commands and operations to be executed
during the test, and the expected output of the service. In order to ensure successful
integration into the production system, the infrastructure partners (BIT MEDIA & WUW)
offer detailed information about the development environment and the server hard- and
software.
4) Check out the code by infrastructure partners:
The infrastructure partners (WP2) are informed automatically through the email
notification from SourceForge when the project partners check in their code. If the code
does not include the required installation and test manuals described in step 1, the project
partner is informed by the infrastructure partners about the missing files. The corresponding
partner is then asked to complete the code according to the instructions in step 1 and check it
in to SourceForge again.
5) Service installation by infrastructure partners:
WP2 will try to install the software following the instructions in the installation manual
as described in step 2. In the case of an installation failure, an error description is sent back
to the partner who offers the service. The partner is asked to update the installation
manual or code, check it in again, and offer cooperative support to the infrastructure
partners in order to achieve a working installation on the infrastructure.
6) Service test by infrastructure partners:
If the service could be installed successfully, WP2 will try to test the installed service
according to the instructions in the test manual described in step 3. If the test fails or
cannot be executed, an error description is sent back to the partner who offers the
service. The partner is asked to update the test manual or fix the errors, check in the code
again, and offer cooperative support to the infrastructure partners in order to achieve a
working service on the infrastructure.
7) Final success report to project partners:
If the service is installed and tested successfully, WP2 informs the partners about the
successful installation and testing of the service.