
LTfLL Deliverable 2.2 – Existing Services – Integrated

2009

This deliverable gives a snapshot of the iteratively developing prototypes. It focuses on the integration of the showcases, i.e., on turning the existing technology into web-based, service-oriented applications and structuring their software architecture along the concept elaborated in the previous infrastructure deliverable D2.1. This involves mainly the re-use of existing technology (hence the name 'showcases'), although several of these task showcases already involve new developments. The presentation of the integration of existing services is wrapped up by an overview of the interoperability characteristics. Additionally, the integration strategy pursued is elaborated, providing more details about the software development process and the integration workflow.

Language Technologies for Lifelong Learning (LTfLL) – 2008-212578

Project Deliverable Report

Deliverable 2.2 – Existing Services – Integrated

Work Package: 2
Tasks: 2.2, 2.4
Date of delivery: Contractual: 28-02-2009; Actual: 13-04-2009
Code name: D2.2
Version: 1.1, final
Type of deliverable: report
Security (distribution level): PU
Contributors: Reinhard Dietl, Fridolin Wild, Bernhard Hoisl, Robert Koblischke (WUW), Berit Richter, Markus Essl, Gerhard Doppler (BIT MEDIA), Traian Rebedea, Stefan Trausan-Matu (PUB-NCIT), Philippe Dessus (UPMF)
Authors (partner): Reinhard Dietl, Fridolin Wild, Bernhard Hoisl, Robert Koblischke (WUW), Berit Richter (BIT MEDIA), Paola Monachesi (UU), Kiril Simov (IPP-BAS), Traian Rebedea (PUB-NCIT), Sonia Mandin, Virginie Zampa (UPMF)
Contact person: Fridolin Wild (WUW)
WP/Task responsible: WUW
EC Project Officer: Mr. M. Májek
Keywords list: Service Concept, NLP, Educational Mash-Ups

LTfLL Project Coordination at: Open University of the Netherlands, Valkenburgerweg 177, 6419 AT Heerlen, The Netherlands. Tel: +31 45 5762624 – Fax: +31 45 5762800

Table of Contents

Executive Summary
1. Introduction
2. Positioning Showcases (WP4)
2.1 Positioning from Portfolios (T4.1)
2.2 Conceptual Development (T4.2)
3. Support and Feedback Showcases (WP5)
3.1 Interaction Analysis (T5.1)
3.2 Assessing Textual Products (T5.2)
4. Social and Informal Learning Showcases (WP6)
4.1 Task 6.1 – Knowledge Sharing Network
4.2 Task 6.2 – Social Component
5. Overview on Showcase Characteristics
6. Integration Strategy
6.1 Position of Software Integration within the Project
6.2 Integration Workflow
7. Versioning Policy
References

Executive Summary

Within this deliverable, a current snapshot of the iteratively developing prototypes is given. The integration of these showcases focuses mainly on turning the existing technology into web-based, service-oriented applications and structuring their software architecture along the concept elaborated in the previous infrastructure deliverable D2.1. This involves mainly the re-use of existing technology (hence the name 'showcases'), although several of these task showcases already involve new developments.

The task showcases documented here encompass two prototypes per work package, one for each task outlined in the description of work. This includes:

- T4.1: Positioning from Portfolios
- T4.2: Conceptual Development
- T5.1: Interaction Analysis
- T5.2: Assessing Textual Products
- T6.1: Knowledge Sharing Network
- T6.2: Social Component

The presentation of the integration of existing services is wrapped up by an overview of the interoperability characteristics of these showcases. Additionally, the integration strategy pursued is elaborated in section 6, thereby providing more details about the software development process and the integration workflow. Finally, section 7 updates the versioning policy in brief.

1. Introduction

Within the first deliverable D2.1 of this work package on integration, a concept had been elaborated for a service-oriented development framework that is strong enough to integrate the technologies considered essential for the tasks of work packages 4, 5, and 6 while at the same time respecting their heterogeneity. This concept incorporates the segmentation of the involved systems and components into three layers: a layer each for widgets, services, and data storage. The service layer itself can further be divided into service logic and application logic, thus encapsulating the service invocation, execution, and delivery routines and the application-specific processing routines, respectively. Following this encapsulated approach with well-defined interfaces has many advantages in terms of interoperability, transferability, and reusability of the deployed software components.

A widget-based architecture seems to be the right choice to handle the very heterogeneous software developments of the different partners. Defining clear interfaces makes it possible to have loosely coupled and spatially distributed applications working together. Therefore, it is essential to 'widgetise' the deployed software, which means having a GUI output that can be viewed in any web browser at the client layer (web widget) and that can be addressed using web services.
A first step in this direction is the widgetisation and servicification of the existing showcases: partners need to plan their further developments for version 1 of the integrated services in the form of widgets and services. This does not necessarily mean adapting all existing showcases – as these are only prototypes; it does mean, however, testing the adaptation where a showcase already forms a basis for future developments.

Within this deliverable, a current snapshot of the iteratively developing prototypes is given, thereby following the conceptual approach described above. The integration of these showcases focuses mainly on turning the existing technology into web-based applications and structuring their software architecture along the concept touched upon above and elaborated in more detail in deliverable D2.1. This involves mainly the use of existing technology, although several of the task prototypes already involve new developments.

The task prototypes documented here encompass two prototypes per work package, one for each task outlined in the description of work. This includes:

- T4.1: Positioning from Portfolios
- T4.2: Conceptual Development
- T5.1: Interaction Analysis
- T5.2: Assessing Textual Products
- T6.1: Knowledge Sharing Network
- T6.2: Social Component

2. Positioning Showcases (WP4)

The work within work package 4 on positioning is organized into two complementary tasks. Task T4.1 focuses on the analysis of portfolios in order to determine a learner's current standing (position) in a given domain. Task T4.2, by contrast, concentrates on diagnosing a learner's conceptual development. The following subsections outline the prototypes developed for showcasing both tasks.

2.1 Positioning from Portfolios (T4.1)

As outlined in D2.1, the web service architecture is composed of four layers. The following sections outline the R-based LSA framework for web services applied to this four-tier architecture.

On the client layer, a web browser can use any HTTP-based communication mechanism to access web services via RESTful requests. The server hosting the service framework handles the request by invoking the applications on the service layer and dispatches any parameters that have been passed to the server.

Within the service layer, the service framework relies on code written in R, which is used to transform, validate, and then communicate the parameters to the application logic layer. Invocation of the underlying R scripts is handled by an Apache server equipped with a module called 'Rapache' (Horner, 2009). On arrival of a request for an R web service at the Apache server, Rapache invokes an instance of the R shared library, executes the required R web-service script, and enables the R framework to access all information that has been passed to the Apache server by the client (via the HTTP protocol). After computation of the application logic – successful or not – the service layer returns a custom XML structure representing the result object. The actual structure to be used can be freely chosen by the web service implementer. This individual choice is given to developers as it enables quick development of XML interactions for simple tasks without limiting the implementation of a full-fledged communication architecture based on a standard XML protocol (e.g. SOAP), which would have been the case if, for example, XML-RPC had been chosen as the standard protocol.
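To make the request flow concrete, the following is a minimal sketch of a service-layer handler script as it could run under Rapache. The GET list and setContentType() are provided by Rapache itself; the application-logic function lsa_cosine(), its parameters, and the space location are illustrative assumptions, not the actual LTfLL interface.

```r
library(lsa)

# hypothetical application-logic function: cosine similarity of two term
# vectors in a pre-computed latent semantic space loaded from disk
lsa_cosine <- function(term1, term2, space_name) {
  space <- readRDS(file.path("/var/lsa/spaces", paste0(space_name, ".rds")))
  tm <- as.textmatrix(space)            # terms x documents approximation
  cosine(tm[term1, ], tm[term2, ])
}

# service layer: validate parameters, dispatch to the application logic,
# and wrap the result object in a freely chosen XML structure
setContentType("text/xml")
if (is.null(GET$t1) || is.null(GET$t2) || is.null(GET$space)) {
  cat("<result><error>missing parameter</error></result>")
} else {
  value <- lsa_cosine(GET$t1, GET$t2, GET$space)
  cat("<result><cosine>", value, "</cosine></result>", sep = "")
}
```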
The application logic layer consists of a service-specific R function which is able to perform LSA processes with only the parameter list passed by the service layer.

The backend layer ensures the presence of appropriate access functions to the current space warehouse implementation in the environment ('scope') of the dynamic LSA processes. Decoupling the LSA process logic from the service layer framework has the following advantages:

1. A dedicated LSA process logic developer can implement logical functions without knowing, or caring about, the actual implementation of the space warehouse on the one hand and the service layer on the other.
2. The architecture relies on the storage access methods provided by the storage layer, which makes the handling of large spaces on the backend layer independent of the application logic.
3. Having the whole application logic present as a single R language object enables the manipulation of the logic itself using R, enabling optimization, e.g., of the data handling by reducing copy procedures of values (Oehlschlägel et al., 2008).

The LSA process logic then returns a result object, which is – depending on the task – any R object, including lists, arrays, or even binary image data generated by a graphics implementation. This data is then passed to the service layer for transformation and communication to the client layer.

The backend layer provides help for implementers of LSA process logic, who may face several problems concerning the retrieval of space objects:

- Storage requirements for spaces: depending on the application scenario of the LSA process logic, spaces can be very large and might not fit into the main memory of either the client or the server.
- Transfer bottlenecks: depending on the storage device, retrieval of a space from that device may be slow.
- Retrieval overheads: storage mechanisms like compression or serialization create a computational overhead when retrieving a space.
- Simultaneous access: some application scenarios (like web services) may require instant access to space objects for multiple clients at the same time.

These considerations led to the development of three types of space storage mechanisms, which can be accessed by the application layer using a common interface (sketched after the list below):

- The monolithic warehouse approach keeps a central R instance permanently open, holding all previously calculated space objects in main memory. LSA processing logic is passed into this R instance and executed there, locally. This approach eliminates all overhead created by copying large space objects between storage media, as they are accessed directly from memory. It has to be kept in mind that the central R instance holding the spaces is unavailable to other requests until the LSA processing logic has finished.
- The inter-instance copy approach keeps – like the monolithic approach – all spaces in main memory. Application logic is, in contrast, not executed in the storage R instance; rather, a copy of the original object is passed to the Rapache instance, freeing the central R instance's access interface again as soon as the copy has been generated.
- The serialisation approach keeps all space objects in binary files on the server's hard disk. This has the advantage that on most servers hard disk space will by far exceed main memory, so storage should be less of a problem. On the downside, (de-)serialization of space objects may be – depending on the hardware used – a time-consuming task which may slow down the LSA process.
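The following sketch illustrates the common interface for the serialisation approach; the function names and the storage location are assumptions for illustration. The monolithic and inter-instance variants would implement get_space() against a permanently running central R instance instead of the file system.

```r
space_dir <- "/var/lsa/spaces"          # assumed location of serialised spaces

# serialisation approach: (de-)serialise space objects to and from disk
store_space <- function(space, name) {
  saveRDS(space, file.path(space_dir, paste0(name, ".rds")))
}

get_space <- function(name) {
  readRDS(file.path(space_dir, paste0(name, ".rds")))
}

# LSA process logic addresses spaces only through get_space(), so the
# storage mechanism can be exchanged without touching the application logic
space <- get_space("pubmed_skin")
```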
2.2 Conceptual Development (T4.2)

Following the proposal of D2.1, the architecture of the showcase prototype for monitoring the conceptual development of a learner can be segmented into four layers. A client layer communicates through a service layer with the application logic, which in turn is supported by particular functionalities (most notably data storage and retrieval) coming from a back-end layer (see Fig. 1).

Fig. 1: Architectural layers of the system.

The client layer (see Fig. 2) can be seen as running the visible user interface of the prototype, typically via a web browser. The evaluation client consists of two components, the first being a form with a text area into which learners copy and paste their textual input. The second widget is a visualization of the concept graph, which is called upon submitting the textual input. This graph renders the relationships between the key concepts triggered by the textual input through a force-directed layout: terms semantically close to each other are placed at a short distance from each other, while semantically distant terms are positioned further apart. The colors of the concept map indicate to which cluster the terms belong. More details on the calculation are outlined below in the description of the other layers.

Fig. 2: Screenshot of the user interface.

The service layer holds the counterpart to the client layer: it provides the HTML pages, receives the textual input via HTTP using POST, dispatches it to the force-directed layout, and returns an HTML-wrapped Flash application in order to deliver the requested visualization.

The application logic layer consists of two components; both of them can also be accessed directly as services. The force-directed layout is a Flash application written in Flex that uses the Prefuse Flare visualization toolkit (see http://flare.prefuse.org/) to render the underlying graph of conceptual relations among the resulting concepts with the help of a physics simulation of interacting forces. By dispatching the received input text to the second component – termsims –, the resulting graph is reconstructed from the graphML transport format (see http://graphml.graphdrawing.org/) and dynamically visualized on the Flash stage.

This second service calculates term-to-term similarities by first folding the textual input into a pre-existing latent-semantic space and then extracting the most prominent terms from the resulting lower-order text vector, filtering for the 30 most frequent terms that load with a frequency in the latent-semantic space higher than a given threshold of .05. These terms are considered to be concepts describing the textual input in this lower-order latent-semantic space. The size of a node is calculated to be ten times the value of the term's frequency in the latent-semantic space (plus 1). By subsequently calculating the term-to-term cosine distances of these terms, a graph wrapped into graphML can be returned that contains the concepts as nodes (labeled with the corresponding term) and all term-to-term cosine distances above a given threshold as edges. For example, the resulting concept list might contain the terms 'dog', 'cat', and 'table'; dog and cat might have a cosine distance of .7 in this space (higher than the threshold of .6), whereas both dog and cat have a cosine distance of .3 to table; thus, the termsims service would return only an edge between the nodes 'dog' and 'cat', but none involving the node 'table'. Additionally, the termsims service returns a color for each node depending on which cluster it has been assigned to using k-means with (number-of-nodes / 4) clusters.
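A condensed sketch of this computation using the R lsa package is given below. The function name termsims() and the tokenisation are illustrative; the numeric parameters (30 terms, the .05 loading threshold, the .6 edge threshold, and number-of-nodes / 4 clusters) are the ones described above.

```r
library(lsa)

termsims <- function(text, space, n.terms = 30,
                     min.load = 0.05, min.cos = 0.6) {
  # fold the input text into the pre-existing latent semantic space
  vocab  <- rownames(space$tk)
  words  <- unlist(strsplit(tolower(text), "[^[:alpha:]]+"))
  dtv    <- as.matrix(table(factor(words, levels = vocab)))
  folded <- fold_in(dtv, space)         # lower-order pseudo-frequencies

  # keep the (up to) 30 most prominent terms loading above the threshold
  loads <- sort(folded[, 1], decreasing = TRUE)
  loads <- head(loads[loads > min.load], n.terms)
  terms <- names(loads)

  # term-to-term cosines in the reduced space; keep edges above threshold
  tvecs <- space$tk[terms, ] %*% diag(space$sk)
  sims  <- cosine(t(tvecs))
  edges <- which(sims > min.cos & upper.tri(sims), arr.ind = TRUE)

  # cluster the terms for colouring, node sizes as described above
  cl <- kmeans(tvecs, centers = max(1, length(terms) %/% 4))$cluster

  list(nodes = terms, sizes = 10 * loads + 1, clusters = cl, edges = edges)
}
```

The resulting node, size, cluster, and edge lists would then be serialised to graphML for the Flare-based client.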
The storage layer contains one latent-semantic space constructed from a flexible subset of the freely available PubMed/Medline corpus of 18 million documents, nine million of which include an abstract. To generate a proper LSA space for the medical field from these documents, we first extracted them from XML into a format more suitable for the subsequent computations. As we eventually needed the document (bag-of-words) data in R, we chose the R text mining (tm) package (see Feinerer et al., 2008) for data extraction, as it provides facilities to convert a corpus into a term-document matrix. However, it soon became obvious that we would run into memory problems when trying to hold the complete text corpus in memory with R, so we went from storing full-text representations of the documents to representing documents by indices relative to the corpus vocabulary only. This not only heavily cut down memory usage but also sharply sped up the subsequent bag-of-words matrix calculation, which would otherwise have been another serious bottleneck in terms of computation speed. To further reduce the corpus' memory usage, we also applied some basic preprocessing (stemming, stopword filtering, number removal, lower-casing, etc.) while reading in the text content.

As we wanted to create custom spaces based on topics or concepts, we first analysed the PubMed topical information (the MeSH headings) and extracted this data into a MySQL database in order to subsequently retrieve the corresponding document ids for a given MeSH heading (and its children). To extract the MeSH headings to the database, we modified the PubMed data extraction demo from the Lingpipe Java libraries for our needs.

We also chose the R text mining package because its term-document matrix (TDM) facility uses a sparse matrix as its data format, which was a must if we wanted to incorporate a larger corpus such as PubMed. As the R LSA package (Wild, 2009) could not yet handle sparse matrices for its singular value decomposition (SVD), we interfaced svdlib.c from Berry (1992) in the modified version from Rhode (2009), which can handle sparse input data. A new release of the package is currently being prepared.

Using the aforementioned techniques, we could successfully calculate a test space that includes all nine million abstract-holding PubMed documents and a vocabulary of more than 330,000 (frequency-filtered) terms. However, if more detailed information about the topical requirements is available, a smaller and more efficient space can be calculated around one or more topics. For this purpose we adapted the R text mining package again: now the user only needs to provide the topics (given as MeSH descriptors) to the text mining package, which in turn generates a new corpus based on the documents of interest extracted from the document base. From this corpus the user can then easily generate a (preprocessed) textmatrix that is finally used as input for the singular value decomposition. This way we created latent-semantic spaces for the MeSH descriptor 'skin' (containing information from 49,414 documents) as well as a space containing the MeSH descriptors 'pharmacology', 'drug interactions', and 'accident prevention' (built from 24,346 documents).
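For a corpus that fits into memory, the pipeline from documents to space can be sketched with the tm and lsa packages as follows; the directory path is an assumption, and the dense SVD behind lsa() stands in for the interfaced sparse SVDLIBC routine that PubMed-scale spaces require.

```r
library(tm)
library(lsa)

# read a topic-specific document collection, applying the preprocessing
# steps described above while building the term-document matrix
corpus <- VCorpus(DirSource("/data/pubmed/skin"))
tdm <- TermDocumentMatrix(corpus, control = list(
  stemming      = TRUE,
  stopwords     = TRUE,
  removeNumbers = TRUE,
  tolower       = TRUE
))

# decompose into a latent semantic space; dimcalc_share() chooses the
# number of dimensions from the share of singular values retained
space <- lsa(as.matrix(tdm), dims = dimcalc_share())
store_space(space, "pubmed_skin")       # see the storage sketch in 2.1
```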
3. Support and Feedback Showcases (WP5)

Work in work package 5 is organized along two tasks. Task T5.1 concentrates on recommendations derived from the interaction analysis of learners. Task T5.2 focuses on recommendations based on assessing student writings.

3.1 Interaction Analysis (T5.1)

Two applications have been developed for the analysis of collaborative chat conversations in order to provide automatic feedback and grading to tutors: Polyphony Analyzer and ChAMP (Chat Assessment and Modeling Program). As the implementation of these systems had started before D2.1 was available, and because they were intended to be used mainly by a limited number of tutors for testing and validation purposes, they were not designed as services but as stand-alone desktop systems; the client-server architecture was therefore not used. Nevertheless, they partially implement a four-tier architecture composed of a three-tier modified MVC (Model-View-Controller) and a data processing layer. The MVC has the restriction that the Model (Data) layer and the View (Presentation) layer do not communicate directly, but only through the Controller, as Fig. 3 displays.

Fig. 3: Architectural layers of the system.

Different technologies were used for the implementation of these applications: Polyphony Analyzer was developed in C#.NET (using the additional library WordNet.NET), while ChAMP was implemented in Java. The latter uses the following additional libraries: Jazzy for spellchecking, Prefuse for social network modeling, JFreeChart for generating charts, JWI (Java Wordnet Interface) for interacting with Wordnet, and MTJ (Matrix Toolkit for Java) for eigenvalue decomposition (EVD) and singular value decomposition (SVD). The configuration of the applications is done using external configuration files and special internal classes.

View (Presentation) Layer

The view contains the user interface and controls; it is responsible for the interaction with the user and displays the results of the processing. Polyphony Analyzer is a multi-windowed application, while ChAMP uses multiple tabs. The communication with the controller is done inside the event (action) handling mechanism used for each control, and all the parameters needed to handle the event processing are transmitted to the controller. After the controller finishes processing the event, it passes back the results, which are interpreted by the view in order to be properly displayed to the user by modifying the controls. A simple way of communicating between the view and the controller is by using messages that have a type, related to the actions that should be taken by each layer, and a container that encapsulates all the parameters needed to process the message.
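A language-neutral illustration of this message passing (written in R for consistency with the other sketches in this deliverable; the message types are invented for the example) could look as follows:

```r
# a message couples a type with a container of parameters
make_message <- function(type, ...) list(type = type, params = list(...))

# the receiving layer dispatches on the message type and answers with a
# new message carrying the results
controller_handle <- function(msg) {
  switch(msg$type,
    LOAD_CHAT    = make_message("CHAT_LOADED", file = msg$params$file),
    RUN_ANALYSIS = make_message("ANALYSIS_DONE", scores = list()),
    stop("unknown message type: ", msg$type))
}

reply <- controller_handle(make_message("LOAD_CHAT", file = "chat01.xml"))
```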
Controller Layer

The controller is used for communication between the view and the model: it is responsible for modifying the data layer according to the user's actions in the view and for transmitting back the information that should be modified in the view to reflect the processing of the data. It receives messages from the view and performs the operations corresponding to the message type and parameters. These messages modify the data model or the state of the controller. The modification of the data stored in the model is done either by passing the message further to the model, if the operations are simple, or through the data processing layer, if complicated operations are necessary. After the processing of the data is done, the controller receives the results and constructs a new message for the view containing the information that the view should modify. As the applications are not very complex, the communication between the three other layers performed by the controller is synchronous, using a single processing thread.

Model (Data) Layer

The model stores the data processed by the application. Initially, the data contains only the chat conversation, which is parsed from an external XML (or HTML) transcript file. When proceeding with processing the chat, the model is enriched with new information about the results of the processing. Due to the low memory requirements for storing and processing the data, it is kept in main memory for as long as the application uses it. The model modifies the underlying data either directly, when requested by the controller, or indirectly, when the processing layer modifies the data in the model to store its results. When the model finishes processing the data, it notifies the controller and transmits the information that is needed to change the view. Part of the processed data can be exported in different formats depending on the application: Polyphony Analyzer exports XLS spreadsheets and text files, while ChAMP exports XML files.

Data Processing Layer

The data processing layer is responsible for performing the more difficult natural language and social network processing tasks. It is instructed by the controller to start the analysis and, after finishing its task, updates the model, thus enriching it, and notifies the controller. Because the data processing layer is separated from all the other tiers, the functions that perform the processing tasks can be modified without altering any of the other layers, or with only minor modifications.
Service-Oriented Approach

For the integration into the LTfLL architecture described in D2.1, the two systems will be transformed into web services. The view requires the most significant adaptation, as the desktop-based view must be transformed into one usable in a web browser, by using HTML controls and forms and by modifying the interaction pattern with the server. This way, the View layer becomes the Client layer, while the rest of the application resides on the server. The remaining tiers map easily onto the D2.1 architecture: the Controller is the Service layer and must be adapted to respond to HTTP requests, while the Data Processing layer is equivalent to the Application Logic layer and the Model layer to the Backend layer, with minor modifications.

The services to be developed for task 5.1 are intended to provide support and feedback for collaborative chat conversations and discussion forums for students, tutors, and teachers. The service implementation is only loosely based on the current systems, as the services shall employ a different approach, integrating some of the features of the two systems and improving them by implementing new features. There shall be two distinct services that use similar approaches, especially for the data processing techniques: a chat service and a forum service. These services shall be decomposed into a number of distinct sub-services – for example, the chat analysis service shall have at least three sub-services: one for feedback generation, one for providing grading for each participant, and one for an enhanced visualization of the conversations. The Client layer shall use the widget platform based on PHP, JavaScript, AJAX, and Flash/Flex or Java applets/JavaFX. The Backend layer shall be used to store the data in memory and retrieve it from persistent storage, such as a relational or XML database. We opt for a relational database used together with an ORM (object-relational mapping) like the Hibernate framework for Java. The Service layer shall provide the end-points of the services and the communication between the outer world (requests/responses) and the Data and Application Logic layers.

The most important layer of the service shall be the Application Logic layer, because this is where all the processing is going to be performed. Considering its complexity, the processing layer can be decomposed into seven different sub-layers:

1. Basic processing and NLP (Natural Language Processing) pipe: spelling correction, stemmer, tokeniser, POS tagger;
2. Linguistic ontology (e.g. Wordnet) interfacing sub-layer;
3. Domain ontology and semantic sub-layer – either built by experts or automatically extracted from various web sources (e.g. using Wikipedia and Wiktionary);
4. Social network analysis sub-layer;
5. Advanced NLP and discourse analysis sub-layer: identification of cue phrases, speech acts, rhetorical schemas, lexical chains, co-references;
6. Advanced discourse analysis: adjacency pairs, implicit links, discussion threads, argumentation, transactivity;
7. Polyphony sub-layer: includes modules for examining inter-animation, convergence, and divergence.

The first four sub-layers perform distinct operations that are not interrelated, but they are all used by sub-layers 5-7, which provide more advanced functions. Moreover, each sub-layer in the interval 5-7 uses the outputs provided by all sub-layers with a lower identifier (e.g. sub-layer 6 uses the outputs from all sub-layers 1-5), thus composing a processing stack that has sub-layers 1-4 at the base and sub-layer 7 at the top, as the sketch below illustrates. All the results provided by the Application Logic layer are saved to the Backend layer through the Service layer.
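The stacked composition can be illustrated schematically (in R, consistent with the other sketches; the sub-layer functions are placeholders that merely accumulate their outputs into a shared analysis state):

```r
# each sub-layer takes the accumulated analysis state and enriches it
sublayers <- list(
  function(state) c(state, list(tokens      = "NLP pipe output")),
  function(state) c(state, list(senses      = "Wordnet links")),
  function(state) c(state, list(concepts    = "domain ontology matches")),
  function(state) c(state, list(graph       = "social network")),
  function(state) c(state, list(speech_acts = "discourse analysis")),
  function(state) c(state, list(threads     = "advanced discourse")),
  function(state) c(state, list(polyphony   = "inter-animation measures"))
)

# fold the conversation through the stack, bottom (1) to top (7): every
# layer sees the outputs of all layers below it
analysis <- Reduce(function(state, layer) layer(state),
                   sublayers, init = list(chat = "transcript.xml"))
```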
3.2 Assessing Textual Products (T5.2)

Apex (Lemaire et al., 2001) is a system which delivers automatic feedback to students who read course texts and then summarise them. It invokes LSA both for proposing texts to read, as a search engine, and for measuring how well the summary matches the given course text. The student is involved in two main interaction loops. First, as many course texts as the student wants are proposed upon request (see Fig. 4). The student assesses each of the texts read as understood (summarisable) or not (reading loop, see Fig. 5). Then the student can freely choose among the following tasks: write a summary of each of the read texts he understood (writing loop, see Fig. 7), read a new text, or create a new request, depending on his response and the availability of further texts (see Figs. 5 and 6). During the writing loop, the student can ask Apex to assess his summary (i.e., whether he captured the gist of the course text, see Figs. 8 and 9), and then can revise it accordingly or go on to the next summary (see Fig. 9). The following use cases describe the human-system interactions.

Fig. 4: The student indicates key words about the topic to work on and types a request.

Fig. 5: The student reads a text and determines whether he can summarise it.

Fig. 6: The student chooses whether he wants to write a summary or read a new text.

Fig. 7: The student determines whether he wants to summarise or re-read a text.

Fig. 8: The student writes a summary and asks for an assessment.

Fig. 9: The student peruses his evaluation and determines what he wants to do next.

The student's actions lead to the execution of different functions which send requests to a server, on the one hand to store data and on the other hand to use LSA (via the Bellcore application, currently being replaced by the R infrastructure).

Client layer

Clients use a web interface. It lets users read texts, summarise them, and peruse assessments of the work completed. The interface code is written in HTML/PHP.

Service layer

This layer links the interface and LSA. In accordance with the users' actions, every parameter is sent to the server with the POST/GET method. The parameters are used in C scripts, which make it possible to invoke the LSA program or to recover user data from text files (one per session).

Application logic layer

C scripts are invoked. They are able to perform LSA from the parameters passed or to recover data directly from files. The LSA application (currently still the Bellcore application) returns a result file containing the required semantic proximities. The service layer transforms this file to communicate the data to the client layer.
Storage layer

LSA needs a semantic space to function. We compute it from a text corpus, which depends on the user's knowledge level and the domain taught. Since computing a semantic space is computationally demanding, we use semantic spaces computed in advance. If we want LSA to make comparisons involving a new document (in addition to the corpus), we do not compute a new semantic space; instead, we use specific LSA functions (tplus and syn, i.e., the 'fold-in' technique; see the sketch at the end of this section).

Descriptions of functions

Below, the PHP functions which are used are described. Some of them invoke C functions:

- Affiche_requete: asks the user to write his request.
- Fait_choix: proposes to go on reading or to write a summary.
- Fait_selection: sends the first text and recovers data about the understanding of this text. The first text is selected depending on the LSA results. It invokes a C function (selectFirstText.c), which invokes the tplus and syn LSA functions in turn.
- Recup_compris1: invokes a C function (understoodUser.c) to update the session file.
- Affiche_newText: selects the next text and refreshes the session file. The text is selected depending on the LSA results. LSA is invoked from a C function (newText.c), which invokes the syn LSA function in turn.
- Recup_choix: recovers the user's choices from each step.
- Afficher_text_compris: provides the user with the list of the read and summarisable texts (displayed as a whole or only the first sentence) as well as the Apex assessment. The user can then either summarise a given text or re-read it.
- Ecrire_text: displays the form used to summarise a text and stores the user's summary.
- Fait_choix2: if there is no other text to display, Apex proposes to write the summaries of the understood texts or to write a new request.
- Eval_text: allows the assessment of a summary by LSA, its storage, and its display. LSA is invoked from a C function (understoodLSA.c), which invokes the tplus and syn LSA functions.
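Within the R infrastructure that is replacing the Bellcore application, the fold-in based assessment could be sketched as follows; assess_summary() is an illustrative name, not the actual Apex interface.

```r
library(lsa)

# fold the course text and the summary into a pre-computed semantic space
# (without recomputing the space) and compare them by cosine similarity
assess_summary <- function(course_text, summary, space) {
  vocab  <- rownames(space$tk)
  to_vec <- function(txt) {
    words <- unlist(strsplit(tolower(txt), "[^[:alpha:]]+"))
    as.matrix(table(factor(words, levels = vocab)))
  }
  folded <- fold_in(cbind(to_vec(course_text), to_vec(summary)), space)
  cosine(folded[, 1], folded[, 2])      # close to 1: summary captures the gist
}
```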
4. Social and Informal Learning Showcases (WP6)

The work within work package 6 on supporting social and informal learning is divided into two tasks. Task T6.1 deals with the creation of a knowledge sharing network, whereas task T6.2 focuses on adding a social component to the public knowledge.

4.1 Task 6.1 – Knowledge sharing network

This section describes the architecture and the services which will be part of the Common Semantic Framework (CSF).

CSF Resources

The data stored in the CSF will be in XML format. The central notion here is the resource. Typical resources are ontologies, lexicons, learning materials, communication notes, comments, web links, etc. Each resource is connected with the following additional information:

- DTD or XML Schema – a definition of the structure of the XML documents that are instances of the resource.
- Internal and external elements. Internal elements contain information which will be processed by the CSF – editing, storing, searching, visualizing, etc. External elements are referenced from the resource and are generally processed by external tools – for example, a PDF viewer is used to open PDF documents.
- Tools for processing the elements of the resource.
- Visualization rules for the resource – how the elements of the resource are presented to the user, how the user can manipulate them, etc.
- Search schema(s) for the resource.
- Resource creation. The following modes are envisaged: automatic creation by external services, manual creation by the user, or a mixture of the two (some elements of the resource are generated automatically, others are entered manually).

This information will be called Resource Information. The XML documents for a given resource will also be called Resource Documents.

Basic Level Services of the CSF

The basic level of the CSF provides services for all the basic operations on XML documents that represent resources. These services are: DTD/XML Schema Management; Resource Information Editing; Tools Declaration; Search Schema Editing; Store Resource Documents; Retrieve Resource Documents; Visualize Resource Documents.

DTD/XML Schema Management

This service will support: the declaration of a DTD/XML Schema; editing of an existing DTD/XML Schema; validation checks of the resource documents; export of a DTD/XML Schema. We will reuse the corresponding modules from the CLaRK system, implemented by us (http://www.bultreebank.org/clark/index.html). CLaRK is a mature editor for XML documents and DTDs. It integrates an XPath processor, a DTD validator, and tools that support users in transforming XML documents. The full version still needs to be implemented.

Resource Information Editing

This service will support the editing of resource information as well as its import and export. For each kind of resource information an editor will be implemented. The actual information will be represented as XML documents, which will facilitate the exchange of such information. To be implemented.

Tools Declaration

This service will support the declaration of external tools (services). The declaration will contain the following information: where the tool is located; the type of the tool's arguments; the type of the result. Usually the external tools will be used for navigation over the web, for browsing different types of documents (PDF, RTF, etc.), and for the creation of resources (complete or partial). The tools will be called upon certain events within the CSF. To be implemented.

Search Schema Editing

This service will support the creation and modification of search schemas. One of the basic components of the CSF is an XML-oriented search engine. For the resource documents to be searchable, at least one search schema needs to be connected to the resource. The search schema contains definitions of the context of search, the terms of search, and the retrievable elements. When a search schema is defined, an index for it is created, and resource documents can be indexed into it. It is implemented as an extension of the Lucene engine (http://lucene.apache.org/).

Store Resource Documents

This service will support the storing of resource documents (local or remote). During storing, the corresponding index will be modified appropriately. It is implemented using the file system at the moment, but it could be extended to a web repository.

Retrieve Resource Documents

This service will support the retrieval of resource documents (or their elements) from the document repository. The retrieval can be done from a list of documents or via the XML search engine. The first option is similar to navigating over a file system.
The second option uses the Lucene-based XML search engine. This search engine provides a query language tuned to search schemas. The query is evaluated over an appropriate index, and the service returns a list of documents and/or their elements that match the query. It is implemented as an extension of the Lucene engine (http://lucene.apache.org/).

Visualize Resource Documents

This service will read the rules for visualization and generate a concept map which is displayed to the user. The user will be able to navigate it, edit it, store it, and run external tools on request. There will be two versions of the service – a standalone and a web-based one. The first will require a local installation of the VUE system. The web-based one will be a simplified version of the standalone one. This service is under implementation within the Visual Understanding Environment (VUE) – http://vue.tufts.edu/.

Extended Layer of the CSF

At this level of the CSF, the actual resources and services for the tasks within the LTfLL project will be provided. They will include the actual definition of resources with their resource information. The following services are envisaged at this phase of the project: Ontology Management Service; Lexicon Management Service; Document Annotation Service; Social Media Services (task 6.2).

Ontology Management Service

This service will comprise three services: Ontology Storing and Reasoning; Ontology Translation; Ontology Filtering.

The ontology storing and reasoning service provides the basic functionalities for accessing information (explicit or implicit) from an ontology. This includes:

- registration and deregistration of ontology models;
- listing of direct and indirect sub-concepts;
- listing of direct and indirect super-concepts;
- listing of individuals for a concept;
- listing of concepts to which an individual belongs;
- listing of properties defined for a concept;
- structural and logical consistency checks of the registered ontology model;
- extraction of a registered ontology model;
- generation of ontology fragments (with or without sub-concepts, super-concepts, sibling concepts, property relations and range concepts, and property restriction super-concepts for a concept supplied as a parameter);
- generation of a sub-hierarchy containing super-concepts and/or sub-concepts up to a given number of steps for a concept supplied as a parameter.

For the implementation of these functionalities, the Pellet OWL reasoner is used (http://www.mindswap.org/2003/pellet). The necessary web services are already implemented and will be reused within the LTfLL project.

The ontology translation service converts an ontology into a simplified graph representation and also translates the concept and relation names into a natural language using the lexicon service. The result of this service will be the main way in which an ontology is presented within the CSF as a resource. For this resource we will define a search schema and a set of visualization rules. The actual service is already implemented as our own software module, and an appropriate web service is defined for it. The incorporation of the resource into the CSF will be implemented within LTfLL.

The ontology filtering service will apply rules for simplifying the ontology so that it can be better understood by the user. This service will be implemented as our own module within the project.

Lexicon Management Service

This service supports the alignment of lexicons with the ontology.
It provides functions for accessing the terms for a concept, the concepts expressed by a term, the addition of new terms, and access to definitions. It is already implemented as our own software module, and an appropriate web service is defined for it.

Document Annotation Service

This service comprises two services: text annotation and image annotation. The text annotation is implemented as a language pipe performing tokenization, POS tagging, lemmatization, and semantic annotation. We already have implementations of language pipes for several languages (from the LT4eL project); where possible, we will reuse them. Additionally, we will augment these language pipes in order to achieve better semantic annotation. For languages other than Bulgarian, the implementation is done using third-party software. The integration and augmentation will be done within the CLaRK system (http://www.bultreebank.org/clark/index.html). In order to support web services, we extended the CLaRK system to be usable in pipes and as a server. This service will be used for the annotation of each textual element of the resources within the CSF, such as learning objects, comments, definitions, etc.

The image annotation works with images in multimedia documents. It provides an image editor for selecting regions in an image and mechanisms for annotating these regions with concepts from the ontology. The resulting annotation is stored in the document and can be used for searching and other processing. The image annotator is already implemented.

Social Media Services

These services will be implemented within the LTfLL project and will elicit new knowledge and recommendations from social media.

4.2 Task 6.2 – Social component

Web Services

All functionalities provide a web service interface. Once the tools are configured, they can be used with standard technologies like WSDL and SOAP through this interface (a minimal invocation sketch follows the method reference below). After successfully installing the tools, the web service description is accessible at:

http://<your_tomcat_url>/lt4elservice/services/Lt4elService?wsdl

The current specification lists eight implemented methods.

sendNewLo

Sends a new learning object to the language technology server. This is the first method that should be invoked right after a new learning object is created in or uploaded to a learning management system. This service passes the URL of a learning object in its original format (HTML, PDF …) to the server. To make this work, the linguistic processing chain for the given language and format must be configured and running.

Input Parameters

- loid (xsd:string): Learning Object ID. This ID is used to identify the learning object. It is assumed that this ID is generated in the learning management system when new learning objects are created. This ID is used as an input/output parameter in most of the other functions.
- language (xsd:string): The (main) language of the learning object. The language must be represented by a two-letter code as defined in ISO 639-1. See http://www.oasis-open.org/cover/iso639a.html for details.
- url (xsd:string): URL of the learning object file in its original format.

sendNewLoAnnotated

Sends a pre-annotated learning object to the language technology server. This service does basically the same as sendNewLo. The difference is that instead of a raw learning object, file paths to local pre-annotated versions of the learning object are passed as parameters.
The files must respect the LT4eLAna DTD. The service should be used when a language processing chain is not available for a given language.

Input Parameters

- loid (xsd:string): Learning Object ID (see sendNewLo).
- language (xsd:string): The (main) language of the learning object (see sendNewLo).
- filename (xsd:string): Local path to the pre-annotated learning object file (LT4eLAna DTD).
- attach (xsd:boolean): not used.
- filename2 (xsd:string): Local path to the ontologically annotated file. If no path is given, semantic search will not be available for this learning object.

Output Parameters

- accepted (xsd:boolean): True if the language technology server successfully received the learning object, false otherwise.

getStatus

Gets the processing status of a learning object that has been sent to the language server. This function can be used after sendNewLo or sendNewLoAnnotated has been invoked for a learning object. Since the processing and conversion of a new learning object may take several minutes, this function tells the learning management system the status of the processing. It can be displayed to the user and used to deactivate certain functions that cannot be used until the processing status is FINISHED.

Input Parameters

- loid (xsd:string): Learning Object ID (see sendNewLo).

Output Parameters

- status (DocumentStatus): Two parameters are returned. Status contains the current processing status of the LO as a string with the following possible values: UNKNOWN, PROCESSING, FAILED, FINISHED. The second parameter, StatusStr, contains a longer status message with additional information, e.g. about a processing failure.

Types

- DocumentStatus: sequence of
  - Status (xsd:string)
  - StatusStr (xsd:string)

deleteLO

Deletes a learning object's representation on the language technology server. This function should be called when a learning object is deleted in the learning management system. After successfully invoking deleteLO, subsequent calls to getStatus will return UNKNOWN again.

Input Parameters

- loid (xsd:string): Learning Object ID (see sendNewLo).

Output Parameters

- success (xsd:boolean): True if the language technology server successfully removed the learning object, false otherwise.

findKeywordCandidates

Finds candidate terms for the keyword annotation of a learning object. This method should be used by a learning management system when a learning object is annotated with keywords. A lot of learning management systems come with support for LOM or Dublin Core metadata, and both of these standards allow annotation with keywords. Simple tagging systems work the same way and could use this function to propose keywords to an annotator.

Current Restrictions

- In general, this function works better as the internal language model gets larger. This means that if only a small number of learning objects have been sent to the language technology server using the function sendNewLo, the quality of the results is suboptimal. Good quality can be expected after 30-50 mid-size learning objects have been sent to the language technology server.
Input Parameters

- loid (xsd:string): Learning Object ID (see sendNewLo).
- maxnum (xsd:int): Maximum number of keywords that should be returned by the function.
- method (xsd:string): Method of keyword detection:
  - tfidf: TF-IDF
  - ridf: R-IDF
  - adridf: ADR-IDF (currently best performing)

Output Parameters

- keywords (ArrayOfString): Ranked keywords.

Types

- ArrayOfString: type xsd:string, minOccurs 0, maxOccurs unbounded

sendApprovedKeywords

Sends all keywords related to a learning object that have been approved by a human annotator back to the language technology server. The language technology server can later use this information, e.g. during search.

Input Parameters

- loid (xsd:string): Learning Object ID.
- keywords (tns:ArrayOfString): Keywords approved by an author.

Output Parameters

- success (xsd:boolean): Success true/false.

getDefinitionCandidates

Gets a set of terms and candidate definitions for a learning object. This method can be used by learning management systems to support the semi-automatic generation of glossaries with terms and definitions found in a learning object. The returned value context includes the surrounding text of the definition (usually around three sentences).

Input Parameters

- loid (xsd:string): Learning Object ID (see sendNewLo).

Output Parameters

- definitions (ArrayOfDefinition): Array of terms and related defining texts.

Types

- ArrayOfDefinition: type Definition, minOccurs 0, maxOccurs unbounded
- Definition: sequence of
  - context (xsd:string)
  - definedTerm (xsd:string)
  - definingText (xsd:string)

search

Searches for learning objects. This function supports extended search capabilities based on fulltext search, keyword-based search, and semantic search. Semantic search supports the multilingual retrieval of learning objects by using lexicons and an ontology.

Input Parameters

- searchTerms (tns:ArrayOfString): The search terms.
- semantic (xsd:boolean): Semantic search.
- keywords (xsd:boolean): Keyword search.
- fulltext (xsd:boolean): Fulltext search.
- conjunctive (xsd:boolean): Conjunctive combination of search terms (otherwise disjunctive).
- searchLangs (tns:ArrayOfString): Search term languages.
- retrievalLangs (tns:ArrayOfString): Languages of target learning objects.
- method (xsd:string): Search method ("SEMANTIC", "KEYWORD", "FULLTEXT").
- searchConcepts (tns:ArrayOfString): Concepts, if learning objects related to concepts are searched.
- systemLang (xsd:string): System language.
- maxSnippets (xsd:int): Maximum number of search context snippets that should be returned. Use lower values to increase performance.

Output Parameters

- searchresult (WSSearchResult): Two arrays. The first one includes a result list with all found LOs, including information on score, text context, and related concepts. The second array holds a list of ontology concepts that are related to the search terms.

Types

- WSSearchResult: sequence of
  - resultList (ArrayOfSearchResult)
  - termConcepts (ArrayOfString)
- SearchResult: sequence of
  - docid (xsd:string): Learning Object ID
  - fulltext (xsd:boolean): Found by fulltext search
  - semantic (xsd:boolean): Found by semantic search
  - keyword (xsd:boolean): Found by keyword search
  - matchingConcepts (ArrayOfString): Concepts related to search terms and LO
  - rankedConcepts (ArrayOfString): Concepts annotated to the LO
  - score (xsd:double): Relevance score
  - snippet (xsd:string): Contextual text snippet
getConceptNeighbourhood

Gets the relations and related concepts of an ontology concept. This function can be used to support browsing through the ontology in the learning management system interface.

Input Parameters

- concepts (tns:ArrayOfString): Concepts.
- languages (tns:ArrayOfString): Languages (the entries in the lexicons for these languages and the concepts included in the fragments will be returned).

Output Parameters

- fragments (tns:ArrayOfString): Ontology fragments for the concepts.

getRankedConceptsForDocs

This service returns all concepts that are related to one (or multiple) learning object(s). The concepts are ranked according to their number of occurrences in the learning module.

Input Parameters

- loids (tns:ArrayOfString): Learning Object IDs.

Output Parameters

- concepts (tns:ArrayOfConceptItem): List of ranked concepts.

Types

- ConceptItem: sequence of
  - concept (xsd:string)
  - docId (xsd:string)
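As a rough illustration of how one of these methods can be invoked, the following sketch posts a hand-built SOAP envelope for getStatus with RCurl (kept in R for consistency with the other sketches). In practice a WSDL-aware client generated from the service description above would be used instead; the host, the lt4el namespace URI, and the SOAPAction value are assumptions, not taken from the actual WSDL.

```r
library(RCurl)

endpoint <- "http://localhost:8080/lt4elservice/services/Lt4elService"

# minimal SOAP 1.1 envelope for getStatus; only loid is passed
envelope <- paste0(
  '<soapenv:Envelope ',
  'xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" ',
  'xmlns:lt4el="http://lt4el.example/">',
  '<soapenv:Body><lt4el:getStatus>',
  '<loid>LO-0001</loid>',
  '</lt4el:getStatus></soapenv:Body></soapenv:Envelope>')

reader <- basicTextGatherer()
curlPerform(url = endpoint,
            httpheader = c("Content-Type" = "text/xml; charset=utf-8",
                           "SOAPAction"   = "getStatus"),
            postfields = envelope,
            writefunction = reader$update)
cat(reader$value())                     # raw XML response with the status
```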
5. Overview on Showcase Characteristics

The characteristics of the six task showcases (T4.1, T4.2, T5.1, T5.2, T6.1, T6.2) can be summarised layer by layer as follows:

- Encapsulation: T4.1, T4.2, T5.1, and T6.2 expose well-defined interfaces; T5.2 and T6.1 are partly encapsulated.
- Documentation: T4.1 and T4.2 are fully documented, one showcase is documented nearly in full, the others partly.
- License: the showcases are either fully open source or partly open source.
- WP2 integration: easily possible for most showcases, possible for the others.
- Client layer, widgetisation: easily possible for several showcases; for the others, adaptations are needed.
- Client layer, visualisation technology: HTML and CSS throughout, combined with JavaScript, Flash, or a Java applet.
- Service layer, servicification: ranging from partly to full.
- Service layer, technology: XML, REST, and AJAX; XML and REST over HTTP; XML and REST; REST over HTTP; VUE, XML, and REST; SOAP, WSDL, and XML.
- Application logic layer, programming languages: R and PHP; R and Flare; C#.NET, Java, Python, and PHP; C and PHP; Ruby and Java.
- Storage layer, technology: MySQL; file system and MySQL; XSL, text files, and XML; text files; XML.

6. Integration Strategy

6.1 Position of Software Integration within the Project

Integration Lifecycle

As this project's software outcomes are very heterogeneous in their implementation characteristics, it is important to define a suitable integration strategy for this task. The first deliverable, D2.1, defined the basic software development and release process for the whole project period. Taking this approach one step further, Fig. 10 displays a simplified project lifecycle that emphasises the software development process underpinning the proposed integration strategy (bold items represent integration-specific tasks).

Fig. 10: Simplified project lifecycle emphasising the software integration process; adapted spiral model from (Boehm, 1988)

As specified in deliverable D2.1, software releases are prepared in line with the submission of the deliverables of the different work packages. With respect to WP2, this means that the project's lifecycle is an iterative process with altogether three main loops: (1) 'Existing services (showcases) integrated', (2) 'Services v1 – integrated', and (3) 'Services v2 – integrated'. The (1) initial iteration takes the Description of Work (DoW) as its starting point and defines the first software requirements, which are met at large in the architecture, design, and implementation of the prototypes reflecting the different partners' showcases. After the technical validation, the outcome is a revised development plan with adapted requirements, which influence the scenario-based design process. In the (2) second iteration, a new software design is elaborated, leading to newly developed software products which have to be tested and integrated into the general WP2 infrastructure. Once a stable first version of the integrated services is available, a product release is planned, freezing development at that stage. The technical validation is carried out as in the first loop, resulting in a new development plan and, consequently, adapted requirements. Since it can be assumed that in the (3) last iteration the software development process has become as stable as the software outcomes, a final detailed design phase initiates the concluding development, integration, and testing phase, which results in a second version of the integrated software products. At the end of the project, a final release is made, yielding well-tested and stable software products that can be integrated into existing applications as defined in the requirements.

Software Development Process

Coding of applications is mostly done along processes well known in software development. Fig. 11 shows such a software development process, focusing on the test and integration tasks (bold). It can be seen as a sub-system of the entire integration process throughout the whole project: each stage of the development process has to be run through, more or less completely, in each iteration of the project lifecycle. As Fig. 11 shows, all development starts with a requirements analysis, followed by the functional specification, the software architecture, and the system design. Then the actual coding of the software parts takes place. Unit testing forms the first stage of the dynamic test process and targets the smallest testable parts of an application. Up to this point, all development-specific tasks are carried out more or less by the different WPs themselves.

After the individual software components have been assured to be fit for use, the application is integrated within the WP2 infrastructure. Along with the integration, testing takes place to ensure that the program can run on WP2's infrastructure and to collect requirements for, and guarantee, transferability and interoperability. It has not yet been finally decided whether the project's software outcomes should consist of a single coherent system or of individual software components loosely coupled together.
By sticking to a widget-based design approach, this decision can be postponed to a later point in time, when the different software parts are more evolved (e.g. after validation of the stage 'Services v1 – integrated'). This architectural decision strongly affects the kind of system tests to be performed. The last stage of the software development process is acceptance testing, which relies on a technical validation on the one side and on a stakeholder-driven qualitative validation approach on the other.

Fig. 11: Adapted and simplified software development process

6.2 Integration Workflow

For the actual integration, a process is defined which ensures an easy way to develop, share, integrate, and test the different software components. The process has to ensure that every partner can build software according to his or her preferences, while sticking to a minimal set of general rules which guarantee good communication and cooperation. This is important because every partner will have to collaborate with others at a certain stage in the development lifecycle. Therefore, two separate processes are defined to meet the needs of the software developers: a distinction is made between (1) partners who use the WP2 infrastructure for their developments and (2) partners who use external infrastructure.

Developments on WP2 Infrastructure

Every developing project partner is granted access to the WP2 development and test infrastructure. WP2 ensures that partners obtain appropriate access to suitable programming environments meeting their requirements. Project partners can thus develop their software components on a 'common and typical' server infrastructure, which guarantees at least a minimum level of transferability and interoperability. For developments following this approach, a workflow has been defined which is displayed in Fig. 12 [1]. It shows a typical development process, starting with an initial integration of already existing software on the project partner's side and ending with a new stable package release, i.e. an up-and-running service on the WP2 live system. Light blue activities are intended to be in the scope of the project partner WPs, whereas light red activities are duties of WP2. The initial integration of existing software has to follow the workflow for external infrastructure developments, so that an already integrated system exists as the basis for further developments (partner WPs and WP2 involved).

The software can then be extended by the project partner, resulting in a new revision (e.g. at the end of a day), whose units have to be tested to verify internal correctness. After successful unit tests, the revision is committed to the global SourceForge repository. This development cycle is iterated until a major revision is deployed. As these developments already take place on the WP2 infrastructure, integration tests are effectively carried out by the partner WPs themselves (otherwise the components would not operate correctly). Major revisions of the software outcomes are additionally tested for their validity by WP2. Once a major revision has been confirmed as correct, it is declared an integrated revision. If an error occurs which cannot be solved by WP2, the partner WPs are informed and have to adapt the software.
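To illustrate the unit-testing stage that precedes each commit, the following minimal sketch uses Python's built-in unittest module. The rank() helper is an invented stand-in; in practice, each partner tests the units of their own code base.

import unittest

def rank(keywords):
    """Invented stand-in for a unit under test: order (term, score)
    pairs by descending score and return the terms."""
    return [term for term, score in sorted(keywords, key=lambda kw: -kw[1])]

class RankTest(unittest.TestCase):
    def test_orders_by_descending_score(self):
        ranked = rank([("lexicon", 0.2), ("ontology", 0.9)])
        self.assertEqual(ranked, ["ontology", "lexicon"])

    def test_empty_input_yields_empty_ranking(self):
        self.assertEqual(rank([]), [])

if __name__ == "__main__":
    unittest.main()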
As stated in the former sections, software releases are defined to be done in line with the submission of the deliverables of the different WPs. This minimum requirement must be met, but nothing speaks against a software release between two deliverables if a new stable version has been deployed and satisfactorily tested. In other words, as soon as a major revision is declared stable by the involved WPs, a new package release can be instantiated. For this purpose, WP2 deploys the software that has already been integrated successfully on the test system on the live server as well. If the installation and configuration fail for any reason, the partner WPs are informed and can adapt the software with the help of WP2. As this adaptation results in a new revision, it has to pass through all the stages of testing once again. At the end of the integration process, a correctly working service should be up and running on the live server, accessible to everybody from the outside.

At every step in the development process, the software has to be documented. To ensure a traceable integration process, protocols of this work also have to be maintained. More information can be found in section 7.

[1] For all activity diagrams displayed in this paper it is assumed that the project partner has already been granted access to the WP2 infrastructure and that the required development environment is set up. Therefore, this process is not illustrated in the corresponding figures.

Fig. 12: Activity diagram of the workflow for developments on WP2 infrastructure

Developments on External Infrastructure

Some partner WPs need to develop software on their own infrastructure, so the WP2 servers are not the best choice as a programming environment for them. There is nothing wrong with partner WPs deploying programs on their own, as long as it can be guaranteed that the developed software systems can run on WP2's infrastructure, so that their transferability can be tested. The workflow for developing software on external infrastructure is shown in Fig. 13. Again, light blue activities belong to the partner WPs, while light red activities are in the scope of WP2.

The development of new software parts and the unit tests are all done on the partner WP's infrastructure. This means that versioning and change tracking are not obligatory, and every partner WP has to take care of backing up their program code on their own. When a major revision has been developed, the partner WP commits it to the main SourceForge repository. WP2 checks out or updates this version on their infrastructure and performs integration tests to ensure that the software works in this environment. As these integration tests are likely not performed periodically, errors will occur, and WP2 will inform the partners about any errors it cannot fix by itself. The partner WP then has to adapt the software, and the testing procedures start again. If the major revision is declared a stable package (after successful integration tests, of course), a package release on the live server is carried out, as in the workflow for developments on the WP2 infrastructure described above. Again, at the end of the integration process, a correctly working system should be up and running on the WP2 live server.

Fig. 13: Activity diagram of the workflow for developments on external infrastructure
7. Versioning Policy

1) Check-in of code to SourceForge by project partners: The code checked in to the SourceForge repository should include a folder named docs, which contains an installation manual (install.txt) and a test manual (test.txt). For more information about the installation and test manuals, follow the instructions in steps 2 and 3.

2) Installation manual (install.txt): The installation manual describes how to install the service in a stepwise manner. It may contain parts such as 'Requirements', 'How to install', 'Troubleshooting', etc. The steps described in the installation manual should be clear enough that technical staff who are not familiar with the corresponding service are capable of performing a working installation. If the code to be installed requires libraries or packages not existing on the infrastructure, these should be listed in the 'Requirements' part of the installation manual. The required packages should be specified in detail to simplify the installation process.

3) Test manual (test.txt): The test manual describes a stepwise procedure for testing the already installed service. The goal is to verify the service manually in order to ensure that it meets the specified functionality and works properly on the production system. It should contain detailed instructions about the inputs, commands, and operations to be executed during the test, and about the expected output of the service. To facilitate successful integration into the production system, the infrastructure partners (BIT MEDIA & WUW) offer detailed information about the development environment and the server hardware and software.

4) Check-out of the code by infrastructure partners: The infrastructure partners (WP2) are informed automatically through SourceForge's email notification when the project partners check in their code. If the code does not include the required installation and test manuals described in step 1, the project partner is informed by the infrastructure partners about the missing files. The corresponding partner is then asked to complete the code according to the instructions in step 1 and check it in to SourceForge again.

5) Service installation by infrastructure partners: WP2 will try to install the software following the instructions in the installation manual, as described in step 2. In the case of installation failures, an error description is sent back to the partner who offers the service. The partner is asked to update the installation manual or the code, check it in again, and provide cooperative support to the infrastructure partners in order to achieve a working installation on the infrastructure.

6) Service test by infrastructure partners: If the service could be installed successfully, WP2 will test the installed service according to the instructions in the test manual, as described in step 3. If the test fails or cannot be executed, an error description is sent back to the partner who offers the service. The partner is asked to update the test manual or fix the errors, check in the code again, and provide cooperative support to the infrastructure partners in order to achieve a working service on the infrastructure.

7) Final success report to project partners: If the service is installed and tested successfully, WP2 informs the partners about the successful installation and testing of the service.
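By way of illustration, the skeletons below show what the two manuals could look like for a hypothetical PHP/MySQL-based service. All paths, version numbers, and test steps are invented and have to be replaced by service-specific instructions.

docs/install.txt
  Requirements:
  - Apache 2.2 with PHP 5 (mod_php)
  - MySQL 5.0 with an empty database and a dedicated user
  How to install:
  1. Unpack the package into the web server's document root.
  2. Import schema.sql into the database.
  3. Copy config.sample.php to config.php and enter the database credentials.
  Troubleshooting:
  - Blank page: verify that the required PHP extensions are enabled.

docs/test.txt
  1. Open http://<server>/<service>/status in a browser; the expected
     output is the string "OK" followed by the service version.
  2. Invoke the search operation with the term "ontology"; at least one
     result with a non-zero relevance score is expected.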
References

Berry, M. 1992. Large Scale Singular Value Computations. International Journal of Supercomputer Applications, 6(1), 13-49.

Boehm, B. W. 1988. A Spiral Model of Software Development and Enhancement. Computer, 21(5).

Feinerer, I., Hornik, K. and Meyer, D. 2008. Text Mining Infrastructure in R. Journal of Statistical Software, 25(5), 1-54.

Horner, J. 2009. rapache: Web Application Development with R and Apache. http://biostat.mc.vanderbilt.edu/rapache/ [cited 26 February 2009].

Lemaire, B. and Dessus, P. 2001. A System to Assess the Semantic Content of Student Essays. Journal of Educational Computing Research, 24, 305-320.

Oehlschlägel, J., Adler, D., Nenadic, O. and Zucchini, W. 2008. A First Glimpse into 'R.ff'. http://www.statistik.uni-dortmund.de/useR-2008/slides/Oehlschlaegel+Adler+Nenadic+Zucchini.pdf

Rohde, D. 2009. SVDLIBC. http://tedlab.mit.edu/~dr/svdlibc/

Wild, F. 2009. lsa: Latent Semantic Analysis. R package version 0.61.