
D7.4 Validation 4

2011

This deliverable describes the objectives, approach, planning and results of the third pilot round, in which both individual and threaded services underwent validation. The two goals of this round were to provide input to the LTfLL exploitation plan and to the roadmap (deliverable 2.5). 531 participants (316 learners) took part in the pilots, which used LTfLL services based on five different languages. The pilots lasted three weeks on average and involved learners, tutors, teaching managers, the LTfLL team and Technology Enhanced Learning …
Appendix B: Validation Reporting Templates

Table of Contents
Appendix B.1 Validation Reporting Template Overview .......... 2
Appendix B.2 Validation Reporting Template for WP4.1 (bitmedia & IPP-BAS) .......... 4
Appendix B.3 Validation Reporting Template for WP4.2 (UNIMAN and OUNL) .......... 64
Appendix B.4 Validation Reporting Template for WP5.1 (PUB-NCIT and UNIMAN) .......... 120
Appendix B.5 Validation Reporting Template for WP5.2 (UPMF / CNED) .......... 182
Appendix B.6 Validation Reporting Template for WP6.1 (IPP-BAS & Sofia University) .......... 230
Appendix B.7 Validation Reporting Template for WP6.2 (PUB-NCIT & UU) .......... 270
Appendix B.8 Validation Reporting Template for Long Thread (OUNL, AURUS & PUB-NCIT) .......... 320

This appendix provides the full validation reporting templates (VRTs), with the exception of the pilot sites, courses and participants (part of Section 2), which are provided in Appendix A.3. The appendix begins with an overview describing the layout of the VRTs.
Appendix B.1 Validation Reporting Template Overview

The finalized validation template for Round 3 comprises the following sections:

Section 1: Functionality implemented in Version 1.5 or the Long Thread, alpha and beta testing
Section 2: Validation pilot overview
Section 3: Results – validation/verification of Validation Topics
Section 4: Results – validation activities informing future changes / enhancements to the system
Section 5: Results – validation activities informing transferability, exploitation and barriers to adoption
Section 6: Conclusions
Section 7: Roadmap (to pass to D2.5)

Section 1 describes the changes to the software in Version 1.5, compared with Version 1. Section 2 describes the pilot environment: participants, language, the pilot task to be completed, and summary details of verification and other experiments. Sections 3–5 provide the results. Partners were asked to provide the results scientifically, without discussion ("the results should speak for themselves"), to allow the WP7 team and others to take as unbiased a view as possible. Section 6 provides the conclusions under four headings:

- Conclusions on whether the validation topics have been validated, within the limitations of the methodologies used
- A SWOT analysis (Strengths, Weaknesses, Opportunities and Threats) considering the objective "<LTfLL service> (v1.5) will be adopted in pedagogic contexts beyond the end of the project"
- A short overall conclusion regarding the likelihood of adoption of v1.5 of the service (as informed by the SWOT)
- The most important future actions to promote adoption of the service (as informed by the SWOT); these should be carried into the Roadmap (Section 7)

Section 7 provides the roadmap to be passed to D2.5.
The roadmap is in five sections, addressing:

- Future enhancements to the system
- Changes to scenarios of use
- Possible additional educational contexts for deployment
- The most important issues for future technical research to enable deployment of language technologies in educational contexts
- Further validation planned for beyond the end of the project

This deliverable seeks to answer specific questions concerning exploitation and the roadmap. This led to a decision to limit partners' discussions of items in the VRTs, though the WP7 team recognizes the wealth of data that could be discussed in more depth in future papers. Accordingly, partners were asked to be as brief as possible and to draw very specific conclusions concerning exploitation and the roadmap (only) from their data. Because Section 3 (Results for validation topics) provides data to be categorised as validated / validated with qualifications / not validated, it was particularly important that the categorisation teams should not be influenced by discussion in this section.

Appendix B.2 Validation Reporting Template for WP4.1 (bitmedia & IPP-BAS)

Verification data was provided by WUW.

Section 1: Functionality implemented in Version 1.5 and alpha / beta-testing record

Brief description of functionality
Unit: Short thread (4.1 - 6.1), v1.5. Changes from Version 1.0: The IT domain ontology, present in 6.1, was made available also to stakeholders of 4.1. It gives the concept coverage within learners' answers in the live feedback (matched, missing and additional concepts) through concept annotation and comparison. The tutor can also mark the information he/she agrees with to improve the results of the service. Semantic search can be used during the creation of the questionnaire to select appropriate learning objects. Manual addition of concept annotations can be used to enrich the lexicalization of the ontology.
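The concept-coverage comparison described above (matched, missing and additional concepts) amounts to set operations over the ontology concepts annotated in the tutor's reference material and in a learner's answer. The following is a minimal sketch of that comparison; all names and concept labels are illustrative, not the service's actual API:

```python
# Minimal sketch of the matched / missing / additional concept feedback.
# In the real service, concept sets come from ontology-based annotation
# of texts; here they are given directly as toy data.

def concept_feedback(reference_concepts, answer_concepts):
    """Compare the concepts expected by the tutor with those found in a
    learner's answer, using plain set operations."""
    reference = set(reference_concepts)
    answer = set(answer_concepts)
    return {
        "matched": sorted(reference & answer),     # present in both
        "missing": sorted(reference - answer),     # expected but absent
        "additional": sorted(answer - reference),  # extra concepts used
    }

feedback = concept_feedback(
    reference_concepts={"CPU", "RAM", "hard disk"},
    answer_concepts={"CPU", "RAM", "monitor"},
)
print(feedback)
# {'matched': ['CPU', 'RAM'], 'missing': ['hard disk'], 'additional': ['monitor']}
```

The "missing" column is what drives the recommendations shown to learners, while "additional" concepts can point the tutor to material worth marking as relevant.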
Lexicons and annotation grammars for German and Bulgarian have been added and related to the ontology in the appropriate way.

Unit: Annotation service, v1.5. Changes from Version 1.0: Provides concept annotation for learning materials in Bulgarian, in addition to English, via the enriched Information Technologies ontology and related lexicons for the two languages. As a consequence of the improved concept availability, tutors are given a better choice when selecting representative conceptual information on a given topic, and learners are provided with a more exhaustive model of the particular knowledge domain (vocabulary and notions).

Unit: Live feedback component, v1.5. Changes from Version 1.0: Integration of the knowledge-rich (KR) and knowledge-poor (KP) approaches for different languages. The integration of the KR and KP approaches has been enhanced for use in the German pilot. Combining the two results adds one more layer of knowledge representation for both the learning materials and the learners' answers: the former ensures the concept information, while the latter captures the language expressions in texts. Tutors can prepare for each question a fixed list of concepts that they consider obligatory for an answer to be satisfactory. Learners can use this information, which is part of the Live Feedback, to learn new concepts, to find information about them and to improve their answers.

Unit: Lexicalisations update, v1.5. Changes from Version 1.0: Added functionality that provides means for the tutor to add new language expressions for the domain-specific concepts and thus to affect text annotation in both learning materials and learners' answers. According to their understanding, tutors can supplement the list of lexical items corresponding to a given concept with new terms, or with terms that for some reason were not included in the original lexicon.

Alpha-testing
Pilot site and language: bitmedia (German)
Date of completion of alpha testing: 28 October 2010
Who performed the alpha testing?
bitmedia (Christoph Mauerhofer, Wolfgang Maierl)

Pilot site and language: IPP-BAS (Bulgarian)
Date of completion of alpha testing: 12 October 2010
Who performed the alpha testing? Alexander Simov, IPP-BAS

Beta-testing
Pilot site and language: bitmedia (Wien)
Has stakeholder validation taken place using the service embedded in Elgg (Yes/No/Partially): No
If 'No' or 'Partially', give reasons: The primary aim of the beta-testing was to ensure the quality of the service results, which are independent of whether the embedded or standalone version is used. During the validation we used the standalone version to avoid different user interface layouts arising from the customizations available in the widget version. The widgetised version was used for the presentations and dissemination activities and will be the only version used for further activities.
Beta-testing performed by: Barbara Busch, Hans Kudy (bitmedia Wien)
Beta-testing environment (stand-alone service / integrated into Elgg): stand-alone service
HANDOVER DATE: 17.11.2010 (date of handover of software v1.5 for validation)

Pilot site and language: IPP-BAS (Bulgarian)
Has stakeholder validation taken place using the service embedded in Elgg (Yes/No/Partially): Yes
If 'No' or 'Partially', give reasons:
Beta-testing performed by: Laska Laskova, Stanislava Kancheva (IPP-BAS)
Beta-testing environment (stand-alone service / integrated into Elgg): Elgg
HANDOVER DATE: 17.11.2010 (date of handover of software v1.5 for validation)

Section 2: Validation Pilot Overview
NB: Information about pilot sites, courses and participants has been transferred to Appendix A.3.

Pilot task
Pilot site: bitmedia Austria
Pilot language: German
What is the pilot task for learners and how do they interact with the system? Learners answer a set of questions on topics from the course domain (Introduction to IT).
They receive instant live feedback from the system, so they do not have to wait for the tutors' reaction to get recommendations for improving their knowledge, if needed. The tutors are advised to use the short thread scenario to add annotations to the concepts.

What do the learners produce as outputs? Are the outputs marked? The output is a short text: an answer to an open question. The output is graded from 0 to 100 (done by the system, based on the existing tutor grades for different answers).

How long does the pilot task last, from the learners starting the task to their final involvement with the software? During the first part of the learners' educational path, the LTfLL project team explained to the tutors and learners involved the goals of the LeaPos Service, the basic usage of the user interface, and the difference between phrases and concepts. Further usage of the service was handed over to the individual tutors and learners. The learners were able to use the LeaPos Service at an individualized time during the following days, or immediately after the introduction to the LeaPos Service, based on their individual time management. The LeaPos Service was used once or twice per learner to get feedback from the system. About one and a half or two weeks later, a final session for collecting the feedback and the individual interviews took place. During the two-week period the learners were engaged in their defined learning path and were allowed to use different tools for reaching their learning goals (e-learning content, lab materials, LeaPos Service, printed learning materials, tutors' help, ECDL pretesting system, …). The tutors were able to use the short thread scenario to add annotations to the concepts.

How do tutors/student facilitators interact with the learners and the system? Tutors select questions, relevant concepts, phrases and learning materials for these questions.
In addition, the short thread functionality for the annotation approach is used. Tutors assess and comment on learners' answers after considering the live feedback information.

Describe any manual intervention of the LTfLL team in the pilot: No manual intervention.

Pilot site: Sofia University
Pilot language: Bulgarian
What is the pilot task for learners and how do they interact with the system? Learners answer a set of questions on random topics from the course domain (Introduction to IT). They receive instant live feedback from the system, so they do not have to wait for the tutors' reaction to get recommendations for improving their knowledge, if needed.

What do the learners produce as outputs? Are the outputs marked? The output is a short text: an answer to an open question. The output is graded from 0 to 100 (done by the system, based on the existing tutor grades for different answers).

How long does the pilot task last, from the learners starting the task to their final involvement with the software? A two-week time span, from the learners' logging into the system (obligatory), through the tutor's final grading of the answers (obligatory), to the students' improved answers (optional). Since the students shared the opinion that they would need more flexible times for performing the task, the time and the number of corrections for the answers were not fixed. For that reason it was not surprising that feedback was also received after the two-week period.

How do tutors/student facilitators interact with the learners and the system? Tutors select questions, relevant concepts, phrases and learning materials for these questions. They assess and comment on learners' answers after considering the live feedback information.

Describe any manual intervention of the LTfLL team in the pilot: No manual intervention.
Section 3: Results – validation/verification of Validation Topics

OVT: 1.1
Pilot site: bit Austria. Pilot language: German.
Operational Validation Topic: Absolute value of score. The tutors/experts find that, when the score given by the system is compared with the score given by the tutor, the difference between the two values is small.

Summative results with respect to validation indicator (experimental results, with stakeholders involved and brief methodology):
Measure: Pearson correlation of system scores against human/tutor scores. Based on 303 graded answers and 10 questions.
Scoring algorithm: Weighted scoring – take the 3 closest answers and calculate the weighted (by cosine distance) average grade. For testing purposes, use N-1 answers when searching for the closest answers.
Results: Correlation*: 0.57
Training process: Automated best-dimension identification: given the ranking formula, calculate the ideal number of space dimensions that achieves the highest possible correlations on a per-question basis.
* Correlations range from 0.25 to 0.68 depending on the question.

Formative results with respect to validation indicator, including quotations:
Tutors / Interview: The absolute value of the score provides useful information on the learner's knowledge. Deviations in the scoring provided by the LeaPos Service are lower than the deviations generated by traditional scoring based on human estimation.

OVT: 1.1
Pilot site: IPP-BAS. Pilot language: Bulgarian.
Operational Validation Topic: Absolute value of score. The tutors/experts find that, when the score given by the system is compared with the score given by the tutor, the difference between the two values is small.
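The scoring algorithm and correlation measure reported for this topic can be sketched as follows. This is a simplified illustration only: it assumes answers are represented as vectors in a semantic space (e.g. an LSA-style space), uses toy vectors and grades, and weights by cosine similarity; it is not the service's actual implementation.

```python
# Sketch: grade an answer as the similarity-weighted average grade of its
# 3 closest graded answers, then correlate system scores with tutor scores.
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def system_score(answer_vec, graded, k=3):
    """graded: list of (vector, tutor_grade). Weighted k-NN average grade."""
    ranked = sorted(graded, key=lambda g: cosine(answer_vec, g[0]), reverse=True)[:k]
    weights = [cosine(answer_vec, vec) for vec, _ in ranked]
    total = sum(weights) or 1.0  # guard against all-zero similarity
    return sum(w * grade for w, (_, grade) in zip(weights, ranked)) / total

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Leave-one-out evaluation ("use N-1 when searching for closest answers"):
graded = [([1, 0, 1], 80), ([1, 1, 0], 60), ([0, 1, 1], 40), ([1, 1, 1], 90)]
system = [system_score(vec, graded[:i] + graded[i + 1:])
          for i, (vec, _) in enumerate(graded)]
tutor = [grade for _, grade in graded]
r = pearson(system, tutor)
```

With real data, `r` is what the tables above report per pilot (0.57 for bit Austria, 0.63 for IPP-BAS), computed over all graded answers of each question.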
Summative results with respect to validation indicator (experimental results, with stakeholders involved and brief methodology):
Measure: Pearson correlation of system scores against human/tutor scores. Based on 274 graded answers and 10 questions.
Scoring algorithm: Weighted scoring – take the 3 closest answers and calculate the weighted (by cosine distance) average grade. For testing purposes, use N-1 answers when searching for the closest answers.
Results: Correlation*: 0.63
Training process: Automated best-dimension identification: given the ranking formula, calculate the ideal number of space dimensions that achieves the highest possible correlations on a per-question basis.
* Correlations range from 0.26 to 0.85 depending on the question.

OVT: 1.2
Pilot site: bit Austria. Pilot language: German.
Operational Validation Topic: Relative value of score. The tutors/experts find that, when a learner has improved his/her answer, as judged by the tutor, an increase in the live feedback score is observed consistently.

Formative results with respect to validation indicator, including quotations:
Tutors / Interview: "The improvement of the answers given by the learners is continuously reflected by the feedback score." "The increments of the feedback score for improved answers are differing for similar improvements and not absolutely consistent."

OVT: 1.3
Pilot site: bit Austria. Pilot language: German.
Operational Validation Topic: Knowledge Poor feedback. The tutors/experts find that a high proportion of the phrases in the two columns (positive, missing) are judged as being correct feedback.

Summative results with respect to validation indicator (experimental results, with stakeholders involved and brief methodology):
Measure: Pearson correlation of cumulated phrase scores against human/tutor scores.
The qualitative output of the phrase list was quantified by summing the underlying phrase scores of the phrases identified in the answer, which are used to determine which phrases are displayed to the user. Based on 303 graded answers and 10 questions.
Individual phrase scoring formula: log(grade_sum + 1) * ridf
Results: Correlation*: 0.43
* Correlations range from 0.16 to 0.59 depending on the question. Note that this is an artificial value used to quantify a qualitative list of detected phrases and has little if any direct implication on whether the list is useful to the learner/tutor or not.

OVT: 1.3
Pilot site: IPP-BAS. Pilot language: Bulgarian.
Operational Validation Topic: Knowledge Poor feedback. The tutors/experts find that a high proportion of the phrases in the two columns (positive, missing) are judged as being correct feedback.

Summative results with respect to validation indicator (experimental results, with stakeholders involved and brief methodology):
Measure: Pearson correlation of cumulated phrase scores against human/tutor scores. The qualitative output of the phrase list was quantified by summing the underlying phrase scores of the phrases identified in the answer, which are used to determine which phrases are displayed to the user. Based on 274 graded answers and 10 questions.
Individual phrase scoring formula: log(grade_sum + 1) * ridf
Results: Correlation*: 0.5
* Correlations range from 0.28 to 0.74 depending on the question. Note that this is an artificial value used to quantify a qualitative list of detected phrases and has little if any direct implication on whether the list is useful to the learner/tutor or not.

OVT: 1.4
Pilot site: bit Austria. Pilot language: German.
Operational Validation Topic: Knowledge Rich feedback. The tutors/experts find that a high proportion of the concepts in the three columns (common, missing, additional) are judged as being correct feedback.
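The individual phrase scoring formula, log(grade_sum + 1) * ridf, can be illustrated as below. Two interpretations here are assumptions for the sake of the sketch, not confirmed by the report: grade_sum is taken as the sum of tutor grades of the answers containing the phrase, and ridf as residual IDF, i.e. the observed IDF minus the IDF expected under a Poisson model of term occurrence.

```python
# Illustrative sketch of the phrase scoring formula log(grade_sum+1) * ridf.
# ASSUMPTIONS: grade_sum = sum of tutor grades of answers containing the
# phrase; ridf = residual IDF (observed IDF minus Poisson-expected IDF).
import math

def ridf(n_docs, doc_freq, coll_freq):
    """Residual IDF: how much rarer a phrase is across documents than a
    Poisson model of its total occurrences would predict."""
    observed_idf = math.log(n_docs / doc_freq)
    expected_df = 1.0 - math.exp(-coll_freq / n_docs)  # Poisson expectation
    expected_idf = -math.log(expected_df)
    return observed_idf - expected_idf

def phrase_score(grade_sum, n_docs, doc_freq, coll_freq):
    """Score of one phrase: log(grade_sum + 1) * ridf."""
    return math.log(grade_sum + 1) * ridf(n_docs, doc_freq, coll_freq)

# Hypothetical phrase appearing in 12 of 303 answers, 15 times in total,
# where the containing answers' grades sum to 240:
score = phrase_score(grade_sum=240, n_docs=303, doc_freq=12, coll_freq=15)
```

The "cumulated phrase score" that the tables correlate with tutor grades would then be the sum of `phrase_score` over all phrases detected in an answer.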
Summative results with respect to validation indicator (experimental results, with stakeholders involved and brief methodology):
Measure: Pearson correlation of concept scores against human/tutor scores. The qualitative output of the concept list was quantified by summing the number of concepts identified in the answer. Based on 303 graded answers and 10 questions.
Results: Correlation*: 0.33
* Correlations range from 0.1 to 0.6 depending on the question. Note that this is an artificial value used to quantify a qualitative list of detected concepts and has little if any direct implication on whether the list is useful to the learner/tutor or not.

OVT: 1.4
Pilot site: IPP-BAS. Pilot language: Bulgarian.
Operational Validation Topic: Knowledge Rich feedback. The tutors/experts find that a high proportion of the concepts in the three columns (common, missing, additional) are judged as being correct feedback.

Summative results with respect to validation indicator (experimental results, with stakeholders involved and brief methodology):
Measure: Pearson correlation of concept scores against human/tutor scores. The qualitative output of the concept list was quantified by summing the number of concepts identified in the answer. Based on 274 graded answers and 10 questions.
Results: Correlation*: 0.35
* Correlations range from 0.1 to 0.75 depending on the question. Note that this is an artificial value used to quantify a qualitative list of detected concepts and has little if any direct implication on whether the list is useful to the learner/tutor or not.

OVT: 2.1
Pilot site: bit Austria. Pilot language: German.
Operational Validation Topic: Tutors spend less time preparing final feedback for learners and grading, compared with traditional means.
Summative results with respect to validation indicator (experimental results, with stakeholders involved and brief methodology):
Stakeholders / methodology: Tutors / Interview
Results: Average time for the tutor-learner feedback session (calculated over 10 feedback sessions each for "traditional" and "using the LeaPos Service"):
Traditional: about 24 min per learner
Using the LeaPos Service: about 18 min per learner

Tutor questionnaire (experimental group, n = 5; mean / standard deviation / % agree or strongly agree):
7. It takes less time to complete my teaching tasks using LeaPos than without the system. Mean 3.4, SD 1.18, 45% agree.
8. Using LeaPos enables me to work more quickly than without the system. Mean 3.3, SD 1.14, 48% agree.
9. I do not wait too long before receiving the requested information. Mean 2.4, SD 1.22, 18% agree.
10. LeaPos provides me with the requested information when I require it (i.e. at the right time in my work activities). Mean 3.3, SD 0.99, 45% agree.

Formative results with respect to validation indicator:
Tutors: "The service is an additional tool to enable the learners themselves to find out their missing knowledge."
Tutors: "The response time of the tool is good."
Tutors: "The overall quality of the feedback is good enough to help the learners and to support new tutors."
Tutors: "Using the existing feedback of the LeaPos Service, the tutor is able to jump into the positioning process at a higher level. So it's possible to ask more specific questions for each learner right from the beginning of the session and save time."
Comment: Each tutor was responsible for a group of learners, some of whom used the positioning service while the others worked with traditional means. The results of the comparison were used for the time analysis.
OVT: 2.1
Pilot site: IPP-BAS. Pilot language: Bulgarian.
Operational Validation Topic: Tutors spend less time preparing final feedback for learners and grading, compared with traditional means.

Tutor questionnaire (experimental group, n = 3; mean / standard deviation / % agree or strongly agree):
7. It takes less time to complete my teaching tasks using LeaPos than without the system. Mean 4.0, SD 1.00, 67% agree.
8. Using LeaPos enables me to work more quickly than without the system. Mean 4.7, SD 0.58, 100% agree.
9. I do not wait too long before receiving the requested information. Mean 4.7, SD 0.58, 100% agree.
10. LeaPos provides me with the requested information when I require it (i.e. at the right time in my work activities). Mean 4.7, SD 0.58, 100% agree.

Formative results with respect to validation indicator:
Tutors: "The main advantage of the system is that it saves time. For some of the questions it was enough for me only to flip through the lists of phrases and concepts to decide on the grade."
Tutors: "For me it is nice to have a repository with all the necessary information in one place."
Tutors: "Feedback is immediate; that is very important."
Tutors: "Sometimes it takes me some time to get oriented within the list of phrases and concepts, and then to connect it to the Live Feedback measure."

OVT: 2.2
Pilot site: bit Austria. Pilot language: German.
Operational Validation Topic: It is easy (there is less cognitive load) for tutors to provide feedback and grading using LeaPos.

Tutor questionnaire (experimental group, n = 5; mean / standard deviation / % agree or strongly agree):
70. Using LeaPos, grading learners takes a lot of mental effort. Mean 2.0, SD 0.82, 0% agree.
71. Using LeaPos, it takes a lot of mental effort to provide feedback for learners. Mean 2.0, SD 0.82, 0% agree.
72.
Using the output provided by LeaPos, it is easy for new tutors to provide feedback and grading in the BIT training environment. Mean 3.7, SD 0.58, 60% agree.

Formative results with respect to validation indicator:
Tutors: "It's easy to use the LeaPos Service."
Tutors: "I expect that the service is useful for new tutors."
Tutors: "The user interface could be improved – the relevant information is mostly at the end of the page."

OVT: 2.2
Pilot site: IPP-BAS. Pilot language: Bulgarian.
Operational Validation Topic: It is easy (there is less cognitive load) for tutors to provide feedback and grading using LeaPos.

Tutor questionnaire (experimental group, n = 3; mean / standard deviation / % agree or strongly agree; for items 70-71, 1 = strongly agree, 5 = strongly disagree):
70. Using LeaPos, grading learners takes a lot of mental effort. Mean 2.0, SD 1.00, 66.7% agree.
71. Using LeaPos, it takes a lot of mental effort to provide feedback for learners. Mean 2.0, SD 1.00, 66.7% agree.
72. Using the output provided by LeaPos, it is easy for new tutors to provide feedback and grading in our environment. Mean 3.3, SD 0.58, 33% agree.

Formative results with respect to validation indicator:
Tutors: "It is much easier to find the gaps in a learner's knowledge exactly because of the structured output from the system – it is fast to read and easy to use as a starting point for my feedback."

OVT: 3.1
Pilot site: bit Austria. Pilot language: German.
Operational Validation Topic: Tutors perceive that the feedback received from the system helps them prepare feedback for learners (relevant, useful, accurate, trustworthy).

Tutor questionnaire (experimental group; mean / standard deviation / % agree or strongly agree):
Relevant:
73. LeaPos provides feedback that is relevant to my preparation of learner feedback.
Mean 4.0, SD 1.0, 60% agree (n = 5).
74. LeaPos provides feedback that is relevant to learners. Mean 4.7, SD 0.58, 60% agree (n = 5).
Useful:
75. LeaPos provides feedback that is useful to my preparation of learner feedback. Mean 4.3, SD 0.58, 60% agree (n = 5).
76. The "List of Phrases" (used and missing) provided by the system is helpful to me in preparing learner feedback. Mean 4.0, SD 0.00, 60% agree (n = 3).
77. I perceive that the "List of Phrases" (used and missing) would help learners in their studies. Mean 3.7, SD 0.58, 40% agree (n = 5).
78. The "List of Concepts" (used and missing) provided by the system is helpful to me in providing learner feedback. Mean 4.0, SD 0.00, 60% agree (n = 3).
79. I perceive that the "List of Concepts" (used and missing) would help learners in their studies. Mean 4.7, SD 0.58, 60% agree (n = 5).
Accurate:
80. LeaPos feedback is sufficiently accurate to inform my feedback. Mean 4.3, SD 0.58, 60% agree (n = 5).
81. The "Grading (percentage value)" in the live feedback represents an overview of the current position of the learner. Mean 3.4, SD 1.67, 40% agree (n = 5).
82. The "List of Phrases" (used and missing) provided by the system is mostly correct. Mean 3.6, SD 0.55, 60% agree (n = 5).
83. The "List of Concepts" (used and missing) provided by the system is mostly correct. Mean 4.3, SD 0.50, 80% agree (n = 5).
Trustworthy:
84. I trust LeaPos to provide helpful feedback. Mean 3.7, SD 0.58, 60% agree (n = 5).

Formative results with respect to validation indicator:
Tutors: "The grading is a useful hint for the tutor – more important is the list of concepts, to decide if relevant knowledge is missing."
Tutors: "It's helpful to have a look at the missing phrases to recognize missing knowledge of the learner."
Tutors: "There are some missing feedback elements for some questions – e.g. 'Name at least three data storage devices and describe their properties.'
For this question the learners got information about the missing storage devices, but no hints about the properties of these devices."
"If we found missing feedback in the list of phrases, it was possible to identify the reason: not all tutors expect the same phrases in the answers, so these phrases were not available in the existing 'gold standard' answers."

OVT: 3.1 Pilot site IPP-BAS Pilot language Bulgarian
Operational Validation Topic: Tutors perceive that the feedback received from the system helps them prepare feedback for learners (relevant, useful, accurate, trustworthy).
Questionnaire type / Questionnaire no. & statement / Experimental / control group / Mean / Standard deviation / %Agree / Strongly agree / n=
Relevant
Tutors 73. LeaPos provides feedback that is relevant to my preparation of learner feedback. Experimental 4,3 0,58 100% 3
Tutors 74. LeaPos provides feedback that is relevant to learners. Experimental 4,3 0,58 100% 3
Useful
Tutors 75. LeaPos provides feedback that is useful to my preparation of learner feedback. Experimental 4,3 0,58 100% 3
Tutors 76. The "List of Phrases" (used and missing) provided by the system is helpful to me in preparing learner feedback. Experimental 4,0 0,00 100% 3
Tutors 77. I perceive that the "List of Phrases" (used and missing) would help learners in their studies. Experimental 3,3 0,58 33% 3
Tutors 78. The "List of Concepts" (used and missing) provided by the system is helpful to me in providing learner feedback. Experimental 4,0 0,00 100% 3
Tutors 79. I perceive that the "List of Concepts" (used and missing) would help learners in their studies. Experimental 4,0 0,00 100% 3
Accurate
Tutors 80. LeaPos feedback is sufficiently accurate to inform my feedback. Experimental 4,0 0,00 100% 3
Tutors 81. The "Grading (percentage value)" in the live feedback represents an overview of the current position of the learner.
Experimental 3,7 0,58 67% 3
Tutors 82. The "List of Phrases" (used and missing) provided by the system is mostly correct. Experimental 3,7 0,58 67% 3
Tutors 83. The "List of Concepts" (used and missing) provided by the system is mostly correct. Experimental 4,0 0,00 100% 3
Trustworthy
Tutors 84. I trust LeaPos to provide helpful feedback. Experimental 4,0 0,00 100% 3
Formative results with respect to validation indicator
Stakeholder type Results
Tutors "The links to learning materials are very useful. I like that the system provides links to relevant texts at any moment of my – and the learners' – interaction with it."
Tutors "The list of phrases, both missing and positive, could be used to teach learners some of the terms and expressions typical of the professional language."
Tutors "I could evaluate the relevance of my own learning materials in the following way: inserting them as answers to a given question and then receiving the live feedback from the system."

OVT: 3.2 Pilot site bit Austria Pilot language German
Operational Validation Topic: Learners perceive that the live feedback received from the system contributes to informing their study activities.
Questionnaire type / Questionnaire no. & statement / Experimental / control group / Mean / Standard deviation / %Agree / Strongly agree / n=
Learners 6. The information the system provides to me is accurate enough for helping me to perform my learning tasks. Experimental 3,4 1,18 52% 25
Learners 50. LeaPos provides feedback that is relevant to my study activities. Experimental 3,8 0,95 56% 25
Learners 51. LeaPos provides feedback that is useful to my study activities. Experimental 4,0 0,92 52% 25
Learners 52. The "List of Phrases" (used and missing) provided by the system is helpful. Experimental 3,9 0,83 60% 25
Learners 53. The "List of Concepts" (used and missing) provided by the system is helpful. Experimental 4,1 1,01 56% 25
Learners 54.
LeaPos feedback is sufficiently accurate to inform my study activities. Experimental 4,1 0,87 68% 25
Learners 55. The "List of Phrases" (used and missing) provided by the system is mostly correct. Experimental 3,9 0,97 64% 25
Learners 56. The "List of Concepts" (used and missing) provided by the system is mostly correct. Experimental 3,8 1,08 52% 25
Learners 57. I trust LeaPos to provide helpful feedback. Experimental 3,9 1,04 68% 25
Formative results with respect to validation indicator
Stakeholder type Results
Learners "There are some wrong results in the list of phrases and concepts – an improvement would be helpful for the learner; it's not an issue for the tutor"
Learners "Providing two different lists of text is not useful for the learner – the learner is not able to distinguish between phrases and concepts"

OVT: 3.2 Pilot site Sofia University Pilot language Bulgarian
Operational Validation Topic: Learners perceive that the live feedback received from the system contributes to informing their study activities (relevant, useful, accurate, trustworthy).
Questionnaire type / Questionnaire no. & statement / Experimental / control group / Mean / Standard deviation / %Agree / Strongly agree / n=
Relevant
Learners 6. The information the system provides to me is accurate enough for helping me to perform my learning tasks. Experimental 3,8 0,62 72% 25
Learners 50. LeaPos provides feedback that is relevant to my study activities. Experimental 3,9 0,67 80% 25
Useful
Learners 51. LeaPos provides feedback that is useful to my study activities. Experimental 4,0 0,54 88% 25
Learners 52. The "List of Phrases" (used and missing) provided by the system is helpful. Experimental 3,8 0,62 72% 25
Learners 53. The "List of Concepts" (used and missing) provided by the system is helpful. Experimental 3,8 0,55 76% 25
Accurate
Learners 54. LeaPos feedback is sufficiently accurate to inform my study activities. Experimental 3,8 0,66 72% 25
Learners 55.
The "List of Phrases" (used and missing) provided by the system is mostly correct. Experimental 3,8 0,65 68% 25
Learners 56. The "List of Concepts" (used and missing) provided by the system is mostly correct. Experimental 3,9 0,53 80% 25
Trustworthy
Learners 57. I trust LeaPos to provide helpful feedback. Experimental 3,8 0,62 72% 25
Formative results with respect to validation indicator
Stakeholder type Results
Learners "I'd like to see some ranking of the concepts – which are more important and which not that important."
Learners "Some of the phrases were actually just variations of one and the same phrase"
Learners "It is really nice that you can not only see the missing concepts, but you can also learn what they mean and see where to get more information about them."

OVT: 3.3 Pilot site bit Austria Pilot language German
Operational Validation Topic: Learners perceive that they receive useful additional feedback, compared with traditional means
Questionnaire type / Questionnaire no. & statement / Experimental / control group / Mean / Standard deviation / %Agree / Strongly agree / n=
Learners 58. It is useful to get extra feedback from LeaPos, in addition to the tutor's feedback.
Learners 59. Receiving feedback from LeaPos in addition to the tutor feedback provides me with more detailed feedback (compared with the tutor feedback I got in the last course without using the Positioning Service).
Experimental (Mean, Standard deviation, %Agree/Strongly agree, n=): statement 58 – 3,5 1,17 36% 25; statement 59 – 3,3 1,20 52% 25
Formative results with respect to validation indicator
Stakeholder type Results
Learners "It's interesting to use the live feedback to find information for learning"
Learners "I don't believe that I got additional feedback, but I got feedback immediately"

OVT: 3.3 Pilot site Sofia University Pilot language Bulgarian
Operational Validation Topic: Learners perceive that they receive useful additional feedback, compared with traditional means
Questionnaire type / Questionnaire no. & statement / Experimental / control group / Mean / Standard deviation / %Agree / Strongly agree / n=
Learners 58. It is useful to get extra feedback from LeaPos, in addition to the tutor's feedback. Experimental 4,3 0,79 80% 25
Learners 59. Receiving feedback from LeaPos in addition to the tutor feedback provides me with more detailed feedback (compared with the tutor feedback I got in the last course without using the Positioning Service). Experimental 4,1 0,71 76% 25
Formative results with respect to validation indicator
Stakeholder type Results
Learners "I want to know beforehand if I can give a one-word answer, or whether I have to justify my choice with some explanation to get the highest grade."

OVT: 3.4 Pilot site bit Austria Pilot language German
Operational Validation Topic: Learners perceive that the system can target learning materials depending on their needs
Questionnaire type / Questionnaire no. & statement / Experimental / control group / Mean / Standard deviation / %Agree / Strongly agree / n=
Relevant
Learners 60. LeaPos provides learning materials that are relevant to my study activities. Experimental 3,7 0,93 60% 25
Useful
Learners 61. LeaPos provides learning materials that are useful to my study activities. Experimental 3,8 0,96 60% 25
Learners 62.
LeaPos provides a diversity of hints (phrases, concepts and learning materials), which are useful for finding appropriate learning materials. Experimental 3,7 0,93 64% 25
Formative results with respect to validation indicator
Stakeholder type Results
Learners "I would expect detailed information about the relevant topics in the learning materials, to save time"
Learners "It's important to have an open mind for the provided hints"

OVT: 3.4 Pilot site Sofia University Pilot language Bulgarian
Operational Validation Topic: Learners perceive that the system can target learning materials depending on their needs (relevant, useful, trustworthy)
Questionnaire type / Questionnaire no. & statement / Experimental / control group / Mean / Standard deviation / %Agree / Strongly agree / n=
Relevant
Learners 60. LeaPos provides learning materials that are relevant to my study activities. Experimental 3,8 0,66 64% 25
Useful
Learners 61. LeaPos provides learning materials that are useful to my study activities. Experimental 4,0 0,71 64% 25
Learners 62. LeaPos provides a diversity of hints (phrases, concepts and learning materials), which are useful for finding appropriate learning materials. Experimental 4,1 0,76 84% 25
Trustworthy
Learners 63. I trust LeaPos to provide helpful learning materials. Experimental 3,9 0,70 80% 25
Formative results with respect to validation indicator
Stakeholder type Results
Learners "One of the best features of the system is that you can access relevant materials immediately; you don't have to search for them on the net"
Learners "Not all of the documents could be used as a main source; some of them were good only as supplementary materials"

OVT: 4.1 Pilot site bit Austria Pilot language German
Operational Validation Topic: Tutors perceive that positioning is more effective compared with traditional means because the quality and quantity of the input to positioning is improved.
Questionnaire type / Questionnaire no. & statement / Experimental / control group / Mean / Standard deviation / %Agree / Strongly agree / n=
Quality
Tutors 84. Using LeaPos, the learner's input to positioning is a good reflection of his/her knowledge. Experimental 3,7 0,58 60% 5
Quantity
Tutors 85. Using LeaPos, I have enough information about the learner on which to base my positioning decision. Experimental 4,7 0,58 60% 5
Formative results with respect to validation indicator
Stakeholder type Results
Tutors "The LeaPos Service doesn't outline the whole position of the learner, but is very helpful to accelerate the positioning process."
Tutors "The list of concepts was very helpful for the positioning"
Comment: The usage of the LeaPos Service makes the positioning process more efficient, with a reduced workload for the tutors compared to the traditional positioning task.

OVT: 4.1 Pilot site IPP-BAS Pilot language Bulgarian
Operational Validation Topic: Tutors perceive that positioning is more effective compared with traditional means because the quality and quantity of the input to positioning is improved.
Questionnaire type / Questionnaire no. & statement / Experimental / control group / Mean / Standard deviation / %Agree / Strongly agree / n=
(Results: statement 84 – Experimental 4,0 0,00 100% 3; statement 85 – Experimental 4,3 0,58 100% 3)
Quality
Tutors 84.
Using LeaPos, the learner's input to positioning is a good reflection of his/her knowledge.
Quantity
Tutors 85. Using LeaPos, I have enough information about the learner on which to base my positioning decision.
Formative results with respect to validation indicator
Stakeholder type Results
Tutors "Live feedback gives a right idea of the learner's knowledge."
Tutors "The LeaPos Service immediately provides positioning results for the learner, which improves the efficiency of learning"
Tutors "I would like to see the information also in context (other answers to questions; other tasks)."

OVT: 4.2 Pilot site bit Austria Pilot language German
Operational Validation Topic: Tutors perceive that, using LeaPos, learners receive homogeneous feedback
Questionnaire type / Questionnaire no. & statement / Experimental / control group / Mean / Standard deviation / %Agree / Strongly agree / n=
Tutors 86. Using LeaPos, different tutors would be likely to provide very similar feedback to the same learner. Experimental 4,0 0,00 60% 3
Tutors 87. Using LeaPos, where two learners have the same missing concepts, they would receive the same hints for finding learning materials. Experimental 3,3 0,58 20% 5
Formative results with respect to validation indicator
Stakeholder type Results
Tutors "LeaPos will be helpful to consolidate the feedback provided by different tutors"
Tutors "The missing concepts do not fully represent the missing knowledge"

OVT: 4.2 Pilot site IPP-BAS Pilot language Bulgarian
Operational Validation Topic: Tutors perceive that, using LeaPos, learners receive homogeneous feedback
Questionnaire type / Questionnaire no. & statement / Experimental / control group / Mean / Standard deviation / %Agree / Strongly agree / n=
Tutors 86. Using LeaPos, different tutors would be likely to provide very similar feedback to the same learner. Experimental 4,0 1,00 67% 3
Tutors 87.
Using LeaPos, where two learners have the same missing concepts, they would receive the same hints for finding learning materials. Experimental 4,0 1,00 67% 3
Formative results with respect to validation indicator
Stakeholder type Results
Tutors "I started to grade the answers very quickly after the first 3 or 4 answers. Usually it takes me much more time to adjust my preliminary grading methodology with regard to the test results."

OVT: 4.3 Pilot site bit Austria Pilot language German
Operational Validation Topic: Learners can receive feedback when they need it
Questionnaire type / Questionnaire no. & statement / Experimental / control group / Mean / Standard deviation / %Agree / Strongly agree / n=
Learners 63A. It is helpful to receive immediate feedback from LeaPos when I need it (no latency time for waiting). Experimental 4,0 1,07 64% 25
Formative results with respect to validation indicator
Learners "It's exciting to use the live feedback"
Learners "I would like to use the live feedback functionality for additional modules in the training"

OVT: 4.3 Pilot site Sofia University Pilot language Bulgarian
Operational Validation Topic: Learners can receive feedback when they need it
Questionnaire type / Questionnaire no. & statement / Experimental / control group / Mean / Standard deviation / %Agree / Strongly agree / n=
Learners 63A. It is helpful to receive immediate feedback from LeaPos when I need it (no latency time for waiting). Experimental 4,5 0,59 96% 25

OVT: 5.1 Pilot site bit Austria Pilot language German
Operational Validation Topic: The live feedback helps learners improve their answers, so they can demonstrate their knowledge more effectively
Questionnaire type / Questionnaire no. & statement / Experimental / control group / Mean / Standard deviation / %Agree / Strongly agree / n=
Learners 64. The live feedback helps me improve my answers. Experimental 4,1 0,93 76% 25
Learners 65.
The live feedback reminds me to include extra information in my answer that I had forgotten to include originally. Experimental 4,1 0,83 76% 25
Learners 66. The live feedback helps me demonstrate my knowledge more effectively. Experimental 4,3 0,70 80% 25
Formative results with respect to validation indicator
Stakeholder type Results
Learners "The live feedback provides interesting information for me"

OVT: 5.1 Pilot site Sofia University Pilot language Bulgarian
Operational Validation Topic: The live feedback helps learners improve their answers, so they can demonstrate their knowledge more effectively
Questionnaire type / Questionnaire no. & statement / Experimental / control group / Mean / Standard deviation / %Agree / Strongly agree / n=
Learners 64. The live feedback helps me improve my answers. Experimental 4,4 0,71 88% 25
Learners 65. The live feedback reminds me to include extra information in my answer that I had forgotten to include originally. Experimental 4,5 0,59 96% 25
Learners 66. The live feedback helps me demonstrate my knowledge more effectively. Experimental 4,1 0,73 80% 25
Formative results with respect to validation indicator
Stakeholder type Results
Learners "It's a plus that you can get a score for your performance right away." "It is stimulating to be able to see how to improve your answer and then to check immediately whether this works." "It's nice to see that you are 'in the green sector'."
Learners "You can feel the system 'guides' you to improve your answer"

OVT: 6.1 Pilot site bit Austria Pilot language German
Operational Validation Topic: The direct feedback provided by the system encourages learners to undertake further study to address gaps in their coverage.
Questionnaire type / Questionnaire no. & statement / Experimental / control group / Mean / Standard deviation / %Agree / Strongly agree / n=
Learners 18. Using LeaPos increases my curiosity about the learning topic. Experimental 3,7 1,12 64% 25
Learners 20.
Using the system motivates me to explore the learning topic more fully. Experimental 3,8 0,94 68% 25
Learners 22. I am eager to explore different things with LeaPos. Experimental 3,8 0,94 68% 25
Formative results with respect to validation indicator
Stakeholder type Results
Learners "I used the service during the times when my concentration was not good, to improve my learning capacity"

OVT: 6.1 Pilot site Sofia University Pilot language Bulgarian
Operational Validation Topic: The direct feedback provided by the system encourages learners to undertake further study to address gaps in their coverage.
Questionnaire type / Questionnaire no. & statement / Experimental / control group / Mean / Standard deviation / %Agree / Strongly agree / n=
Learners 18. Using LeaPos increases my curiosity about the learning topic. Experimental 4,0 0,76 72% 25
Learners 20. Using the system motivates me to explore the learning topic more fully. Experimental 4,1 1,04 68% 25
Learners 22. I am eager to explore different things with LeaPos. Experimental 4,1 0,78 76% 25
Formative results with respect to validation indicator
Stakeholder type Results
Learners "It took me more time to correct some of my answers, because I didn't only search for the section that would answer the question, I continued reading further" "The question about the XML data types was surprising for me. I tried to find more information, but there wasn't any.
It would have been great if there were more documents on this topic."

OVT: 7.1 Pilot site bit Austria Pilot language German
Operational Validation Topic: There is a saving in institutional resources overall
Formative results with respect to validation indicator: Experimental results, with stakeholders involved and brief methodology
Stakeholders / methodology: Analysis of time consumed; final interview with the branch office manager, bit Vienna
Results:
The following analysis is based on the existing IT-Basics and ECDL education project we are delivering for unemployed learners at bit Vienna. The following tasks and resources are needed to implement the LeaPos Service in our training environment:
Management overhead: 5 days (once, to implement a new tool)
Based on our experience of integrating new tools (e-learning tools, testing tools), a management overhead for planning and for informing the employees involved is required (e.g. explaining the idea and goals to the responsible employees in course management, planning the technical implementation and internal testing, scheduling events for training the tutors, …).
Technical implementation of the LeaPos Service: 2 days (once overall)
This task has to be done once (server setup, installation of the LeaPos Service, initial user management). The implementation can be done by our existing technical staff during "non-peak times" and doesn't cause additional costs in the bit environment.
Training for the tutors: 2-3 hours (once for each tutor, in groups of 6 to 8 tutors)
To enable and motivate the tutors to use the LeaPos Service, face-to-face training is required. The primary goal of this training is to explain the ideas of the language technology approach and how to interpret and use the LeaPos Service results. Guidelines for using the service in combination with other existing tools in the bit training environments are also delivered during this training.
The explanation of the user interface and the handling of the software will take only 10 to 15 minutes, because the user interface is easy to understand. We are able to integrate this training into the existing regularly scheduled events for our tutors (a combination of information transfer and social event).
Establishing the initial questionnaire: 2-3 days (once per course)
Improving the existing questionnaires used in the bit group is already defined as a regular process. To integrate the LeaPos Service for the bit courses, the existing questionnaires can be adjusted step by step during this regular process, so no additional resources are required for this task. As an additional workload, the answers for the questionnaire have to be collected and graded, which will take up to one day.
Uploading the initial questionnaire and the learning materials (continuous process)
The LeaPos Service accepts uploads of all the important file types used for training materials, so our tutors are able to upload the materials themselves during their normal working time. To improve the results of the LeaPos Service, the tutors should annotate the learning materials. These annotations are used by the service to provide appropriate training materials. Because of that, the annotations made by one tutor are also useful for the other tutors in getting familiar with new training materials. This benefit results in an overall reduction in the time needed to integrate new training materials into our environment.
Guiding the learners: 1-2 hours (once per learner, for all courses)
To ensure that the learners will use the LeaPos Service, a guided introduction to the idea and goals of the LeaPos Service and the use of language technologies for learning takes place during the first week of the whole education path.
For learners with no IT experience, an additional hour is budgeted for getting familiar with the usage of the interface.
Tutor resources required during the course
Tutor resources currently used in the bit environment
One tutor is responsible for supporting a group of 8 to 12 learners over a period of 6 weeks (the whole education period defined in this project per learner). The tutors are responsible for delivering a few hours of face-to-face training per week and for providing individualised support for each learner during the learning process (e.g. defining next steps in learning, providing training materials, answering questions, …).
Opportunity provided by the LeaPos Service and LTfLL tools
We are establishing educational projects in this area which are based on many self-learning components. This approach is important for reaching two main goals:
- Efficiency of learning (the learner has to pass the exams at the end of the courses)
- Cost-efficient learning tracks
The tutors are important for the positioning process and for supporting the learners during these self-learning time slots. The LeaPos Service enables the learners to perform a self-positioning task in order to continue their learning process. Using this benefit, we will be able to schedule a limited time in the learning process without any tutor support (or with centralised support via remote tools such as video conferencing), during which the tutor can work on different job activities. For example, a solution could be a combination of 6 hours of guided learning with direct tutor support and 2 hours of full self-learning without tutor support. Currently we are using about 8 tutors at the same time, 5 days a week. Reducing the tutor support time from 8 to 6 hours a day will result in a saving of 2 working days per week.
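The resource figures above imply a simple break-even estimate. The sketch below is purely illustrative and assumes the figures reported for the bit Vienna project: 5 days of management overhead, 2 days of technical implementation, 1 day of tutor training, 2 days of questionnaire preparation per course across 5 courses, and a saving of 2 tutor working days per week.

```python
# Illustrative break-even estimate for introducing the LeaPos Service,
# using the figures from the bit Vienna analysis above (an assumption-laden
# sketch, not part of the validated results).

preparation_days = {
    "management overhead": 5,
    "technical implementation": 2,
    "tutor training": 1,
    "questionnaire preparation": 2 * 5,  # 2 days per course, 5 courses
}

total_investment = sum(preparation_days.values())  # tutor working days invested
saving_per_week = 2                                # tutor working days saved per week

break_even_weeks = total_investment / saving_per_week

print(f"Total preparation: {total_investment} days")
print(f"Break-even after about {break_even_weeks:.0f} weeks")
```

With these inputs, the investment totals 18 days and is recovered after 9 weeks, matching the conclusion drawn in the report.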
Conclusion: To implement the LeaPos Service for 5 different courses of the training path, we will have to invest about 18 days of preparation time (5 days for the management overhead, 2 days for the technical implementation, 1 day for the training of the tutors and 10 days for preparing the questionnaires). By reducing the working time of tutor resources, we will save 2 days per week. In this case, after a period of about 9 weeks the original investment will be recovered. To add an additional course to the training path in LeaPos, we will only have to invest about 2 days for establishing the questionnaire with the graded answers and the training materials. So once the LeaPos Service has been established as a standard tool in the training environment, the additional investment for adding a course is modest.

OVT: 7.1 Pilot site IPP-BAS Pilot language Bulgarian
Operational Validation Topic: There is a saving in institutional resources overall
Formative results with respect to validation indicator: Experimental results, with stakeholders involved and brief methodology
Stakeholders / methodology: considering average set-up times within flexible time constraints (2 weeks)
Results:
The students and the tutors had flexible times to perform their tasks, since the tasks were external to their present courses. In general, each student performed the task in about one hour, including consultations with relevant materials and live feedback. It took the tutor between 10 and 15 minutes on average to assess and grade the answers of a student using the LeaPos Service.

OVT: 8.1 Pilot site bit Graz Pilot language German
Operational Validation Topic: The service meets one or more institutional objectives.
Formative results with respect to validation indicator: Experimental results, with stakeholders involved and brief methodology
Stakeholders / methodology: Interview with the branch office manager, bit Vienna, and the CEO, bit Schulungscenter
Results:
An important institutional objective is to develop improved educational solutions that achieve high-quality and cost-efficient learning concepts. Therefore, the management of the bit group is interested in using tools like the LeaPos Service, or other tools from the LTfLL project, in the future. In particular, the LeaPos Service provides an additional approach to improving the learning task in our projects in which unemployed people are educated. The service could also improve the quality of learning in e-learning solutions.

OVT: 8.1 Pilot site IPP-BAS Pilot language Bulgarian
Operational Validation Topic: The service meets one or more institutional objectives
Formative results with respect to validation indicator: Experimental results, with stakeholders involved and brief methodology
Stakeholders / methodology: Interview with a teaching manager from IPP-BAS (n=1).
Results:
One institutional objective is to re-use as much of the learning resources as possible. The main problem (probably with any new system) is the effort that needs to be invested initially to prepare the resources, as well as the need for a clear mechanism to maintain them. Another institutional objective is coverage of the whole curriculum. Our courses in IT are of two types: mathematically oriented and software oriented. The latter (with the exception of writing programs) are suitable for using LeaPos. Thus, at the moment we cover part of our curriculum, but this is still useful and worth it. An important institutional objective is cooperation among our tutors.
The main usage will be in the individual work of the tutors with the students, but if the service is widely accepted by our tutors, we could build a repository of questions to be shared by all of them. This would improve the quality of the questionnaires used, and there would be better coverage of the learning topics. Attracting more students is a major institutional objective: "I envisage the main role of the service as a way of attracting more students, thus increasing their number in our IT courses (especially in MA programs)".

OVT: 9.1 Pilot site bit Austria Pilot language German
Operational Validation Topic: Users were motivated to continue to use the system after the end of the formal validation activities
Summative results with respect to validation indicator
Questionnaire type / Questionnaire no. & statement / Experimental / control group / Mean / Standard deviation / %Agree / Strongly agree / n=
Tutors 21. I would recommend this system to other teachers to help them in their teaching. Experimental 3,8 0,96 50% 4
Tutors 22. I am eager to explore different things with LeaPos. Experimental 4,0 0,00 100% 4
Tutors 29. I would like to use the service in my teaching after the pilot. Experimental 4,3 0,58 75% 4
Tutors 30. If the service is available after the pilot, I will definitely use it in my teaching. Experimental 4,3 0,58 75% 4
Learners 21. I would recommend this system to others. Experimental 4,0 0,91 72% 25
Learners 22. I am eager to explore different things with LeaPos. Experimental 3,8 1,03 56% 25
Learners 29. I would like to use the service after the pilot. Experimental 3,8 1,22 56% 25
Learners 30.
If the service is available after the pilot, I will definitely use it Experimental 3,9 0,95 64% 25 OVT: 9.1 Pilot site IPP-BAS Pilot language Bulgarian Operational Validation Topic Users were motivated to continue to use the system after the end of the formal validation activities Summative results with respect to validation indicator Learners 21. I would recommend this system to others Experimental 3,8 0,94 67% 12 Learners 22. I am eager to explore different things with LeaPos Experimental 3,8 0,72 67% 12 Learners 29. I would like to use the service after the pilot. Experimental 3,7 0,89 58% 12 Learners 30. If the service is available after the pilot, I will definitely use it Experimental 3,6 1,08 50% 12 Page 43 of 349 D7.4 - Validation 4 OVT: 9.2 Pilot site bit Austria Pilot language German Operational Validation Topic A high score was obtained in the generic questionnaires (based on UTAUT: likelihood of adoption). Experimental results, with stakeholders involved and brief methodology Stakeholders / methodology: Generic questionnaire - learners Results: Descriptive Statistics - Learners N Mean Std. Deviation Effectiveness 24 3,51 ,842 Efficiency 24 3,03 ,858 Cognitive Load 21 3,10 ,995 Usability 25 3,67 ,712 Satisfaction 25 3,69 ,805 Facilitating conditions 24 3,61 ,996 Self-Efficacy 23 3,86 ,771 Behavioural intention 24 3,81 1,020 BIT-MEDIA 25 3,58 ,610 Valid N (listwise) 19 Page 44 of 349 D7.4 - Validation 4 OVT: 9.2 Pilot site IPP-BAS Pilot language Bulgarian Operational Validation Topic A high score was obtained in the generic questionnaires (based on UTAUT: likelihood of adoption). Experimental results, with stakeholders involved and brief methodology Stakeholders / methodology: Generic questionnaire - learners Results: Descriptive Statistics - Learners N Mean Std. 
Deviation Effectiveness 25 4,02 ,536 Efficiency 25 4,27 ,473 Cognitive load 25 3,76 1,091 Usability 25 4,24 ,548 Satisfaction 25 4,11 ,705 Facilitating conditions 25 4,32 ,402 Self-Efficacy 25 4,12 ,738 Behavioural intention 25 3,98 ,884 IPP-BAS 25 4,14 ,417 Valid N (listwise) 25 Page 45 of 349 D7.4 - Validation 4 OVT: 9.3 Pilot site bit Austria Pilot language German Operational Validation Topic Tutors attending a dissemination workshop give high scores to the question 'how likely are you to consider adopting the service in your own educational practice? Questionnaire type Questionnaire no. & statement Experimental / control group Mean Standard deviation %Agree / Strongly agree n= Tutor workshop 1. How likely are you to consider adopting LeaPos in your own educational practice? Experimental 4,3 0,55 60% 5 Formative results with respect to validation indicator Stakeholder type Results Tutors The LeaPos Service is helpful for preparing the feedback for the learner during the positioning phases of the course. Tutors If the management of a learning institution is going to implement the LeaPos service, the initial workload has to be divided to all involved tutors. This approach will allow providing the LeaPos Service with qualitative questionnaires. Page 46 of 349 D7.4 - Validation 4 Section 4: Results – validation activities informing future changes / enhancements to the system VALIDATION ACTIVITY Pilot partner: bitmedia Service language: German Additional formative results (not associated with validation topics) Alpha testing The Knowledge Rich approach was only available for the English course (This issue was solved before the pilot). Beta testing The user interface is working well. 
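The summative tables above report each questionnaire item as a mean, a standard deviation, and the percentage of respondents answering Agree or Strongly agree on a 5-point scale. As a minimal sketch of how such figures can be derived (the response data below is illustrative only, not taken from the pilots, and the report does not state whether sample or population standard deviation was used):

```python
from statistics import mean, pstdev

def likert_summary(responses):
    """Summarise 5-point Likert responses as in the VRT tables:
    mean, standard deviation, and % answering Agree (4) or Strongly agree (5)."""
    m = mean(responses)
    sd = pstdev(responses)  # population SD assumed here
    pct_agree = 100 * sum(1 for r in responses if r >= 4) / len(responses)
    return round(m, 1), round(sd, 2), round(pct_agree)

# Illustrative data, not actual pilot responses
example = [5, 4, 4, 3, 5, 2, 4, 4]
print(likert_summary(example))  # -> (3.9, 0.93, 75)
```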
Learner focus group 1: The discussion with the learner focus group produced the following results:
"The most important feature of the LeaPos Service is the Live Feedback, which immediately provides really useful information for the learners."
"Some improvements to the navigation in the LeaPos Service could be made – overall, the tool is easy to use."
"Explanations of both result lists should be implemented in the service (e.g. animated demo results with explanations of how to interpret the results) – this would enable the learners to use the service without the intro session."
Tutor interviews / Manager interview:
"The testing of the LeaPos Service demonstrated the capacity of the tool to support tutors and learners."
"The percentage value is not very important for the positioning task."
"The two different results (phrases and concepts) should be integrated into one list, because the difference between the two is only interesting for language experts."
"If we use the LeaPos Service in Internet Explorer, the colours do not provide useful contrast (e.g. the question text)."
"We are interested in the LeaPos Service as an additional tool for supporting the learners and tutors."
"The user interface should also be available in different languages (e.g. German, Polish, Czech, ...) based on customer requirements."
"There are possibilities to implement the LeaPos Service in different learning scenarios (traditional, e-learning and combinations)."

VALIDATION ACTIVITY
Pilot partner: IPP-BAS. Service language: Bulgarian.
Additional formative results (not associated with validation topics)
Alpha testing: Adding Bulgarian lexicalisations of concepts is not easy, given the lemmatization required for a morphologically rich language. This problem was solved before the pilot.
Beta testing: The visualisation of simple text documents makes them difficult to read, because there is no text wrapping, and one has to scroll from left to right and back to finish a paragraph.
Learner focus group 1: First, learners discussed whether or not they would use the system again. Most of the learners would use the system again and were even surprised that it is free. One of them offered to translate the UI into Bulgarian for free. The most important reasons why the focus group participants would use the system again were:
o The learning materials are uploaded to the system and ready to use ("You don't have to go to the library or search for info on the net").
o The learners appreciate that the system pinpoints the gaps in their knowledge very precisely. Live Feedback is quick, and the process of improvement is not delayed by the need to wait for the tutor's reaction.
o LeaPos suggests learning materials related to the specific topic, which makes learning easier.
o The system stimulates the learning process, because it presents learning like a game that the learners want to win ("I wanted to answer as many questions as possible with the highest score possible").
Some of the learners would not use the system again. The reasons for that were:
o The system does not look like the universally recognized model for learning management systems (Moodle). This confused some of those who are used to another type of scenario, where knowledge evaluation follows the learning process rather than preceding it.
o The interface is not intuitive, and it takes time to get used to the "onion layer" model (the answer field is wrapped in the question field, which in turn is wrapped in the course field). When one student was asked whether he had used the built-in help, he answered negatively.
o The type of answer the tutor/system expects has to be specified explicitly in the question or by length restrictions in the answer field.
o Currently there is no way to prevent cheating (plagiarism).
The discussion about changes and enhancements to the software focused primarily on improving the visualisation:
o There is no information about what type of answer is expected from the learners ("Why did I get fewer points when my answer was correct? It is a one-word answer, but it is correct."). It would be better for learners if they knew in advance whether they have to answer the question with one word or with a whole text.
o The system does not show a list of already answered questions as a model to follow.
o The link "Show relevant learning material" should be bigger or should look like a radio button; there is a chance of missing it.
The learners also discussed the effect on their learning when they use the system. All of them said that the system stimulates them to learn, and that it helps them improve their answers. Some of the students said that they did not waste time when using the system, because there was no need to look for learning materials. According to some of the learners, the system made them more curious, so they explored all the uploaded materials, even the ones that were not recommended for the given topic.
Learner focus group 2 (prioritisation of enhancements): Learners judged that the six most important areas for enhancement of the system (i.e. clusters) are:
1. The visualisation (rendering in the web browser) of simple text format files needs to be improved (NB: learning materials for Bulgarian were in .txt format). The readability of the documents is important and motivates learners to use the materials offered by the system.
2. Currently, there is no way to keep a history of the individual learning curve (for example, how many positive vs. how many negative answers have been received): the points received for each variant of the answer should be stored. Making improvement visible is stimulating.
3. The concepts are not ranked (important – not that important – details). Ranking would help students understand which type of knowledge is basic and which is extra.
4. There is no way to control the time it takes for the learner to answer. Optionally, the time to answer could be limited, or points could be deducted if the time limit is exceeded.
5. The learner cannot monitor his own progress. A list of answered/unanswered questions, and the ratio between them, might be provided.
6. The interface could be more attractive and easier to navigate.
Learners judged that the most important single improvements that should be made to the system are:
1. One has to check whether there is feedback from the tutor, or know beforehand when to expect it. It would be good to get a notification when the tutor has provided his feedback.
2. Give some motivation for the content of the selected "List of Phrases". The phrases are extracted from answers that were graded with the highest grade, but sometimes they duplicate each other in content, or are not ready to be used without some modification: "Sometimes it remains unclear why these phrases are recommended".
3. Sometimes it is difficult to find the piece of information one needs in order to answer. Highlight the segment in the learning material that is relevant to the question: if only part of a document is relevant to the question's topic, it would be better if it were somehow marked.
4. Adding an element of play or even competition might prove stimulating for some users: "Instead of just answering a bunch of questions, make people compete for time or give them some bonuses - you know what they say - even if it is just a ribbon, still it is a reward.
A funny animation or a simple "Congratulations!" pop-up message will do too."
Tutor interviews: Major changes identified by tutors are:
"I think the system should provide an explanation for some of its decisions, especially with regard to the list of missing phrases and additional concepts."
"At the moment the service cannot generalize over the various phrases, and I am sometimes lost among more or less synonymous expressions."
"The software should be equally easy to use in various browsers (I could not use it with Internet Explorer)."
Manager interview:
"The system could only benefit from spellchecker integration. In my courses spelling and grammar form an integral part of the assessment."
"The system has the potential to attract more students in the IT area, especially in MA programs."
"I also know about the other service of the project (6.1) and I would like to use the ontology in both of them. This would increase the quality of the knowledge structure in our programs."
"In Bulgaria the number of students is one important element of the evaluation of the universities. Using such a service in the appropriate way will increase the number of students."
"The students might be attracted to using such services if they have initial access to a large number of questions, which requires a lot of initial work by the LeaPos team and the tutors respectively."

Section 5: Results – validation activities informing transferability, exploitation and barriers to adoption

VALIDATION ACTIVITY
Partner(s) involved: bitmedia. Service language: German.
Additional formative results (not associated with validation topics)
Alpha testing: None.
Learner focus group 1: "Explanations of both result lists should be implemented in the service (e.g. animated demo results with explanations of how to interpret the results) – this would enable the learners to use the service without the intro session."
Tutor interviews: "The two different results (phrases and concepts) should be integrated into one list, because the difference between the two is only interesting for language experts." "The user interface should also be available in different languages (e.g. German, Polish, Czech, ...) based on customer requirements."
Manager interview:

VALIDATION ACTIVITY
Partner(s) involved: IPP-BAS. Service language: Bulgarian.
Additional formative results (not associated with validation topics)
Alpha testing: Major issues encountered in transferring LeaPos to Bulgarian:
o Adding lexicalisations (which express the ontology concepts in a natural language – in this case, Bulgarian) has to be mediated by lemmatization for languages rich in inflection. This issue was solved for the pilot by the FLSS team.
o The available learning objects in Bulgarian did not cover the sub-domain of the BITMEDIA questionnaire. Thus, a new questionnaire was created on one of the available sub-domains. It took roughly one hour to introduce a distinct question; this time includes checks within the repository and the ontology.
Learner focus group 1:
Live feedback cannot be used when the test does not presuppose a textual representation of knowledge (for example, solutions to mathematical problems or writing software programs).
It was noted that the interface is in English, while the pilot was in Bulgarian. Before wider usage in Bulgaria can be considered, the interface must be synchronized with the pilot language.
The system works at the level of individual questions (one question at a time). It would be useful to explore the connections among several related questions.
Tutor interviews / Manager interview:
All tutors agreed that the immediate feedback will attract learners, and thus it is very likely that they will be inclined to work with LeaPos.
It is necessary to start with a hands-on activity and train the learners in how to get the best out of the live feedback, even when the output is not very precise.
Another problem, at least at the beginning, will be the need to create a number of tests with many different types of questions in order for the learners to be able to use the system by themselves.
The system cannot provide feedback for parts of some courses (mathematical tasks, software code, etc.).
The ontology service of the system is very useful when combined with the 6.1 semantic search for suggesting relevant learning materials.
A repository of questions is lacking that could serve as a core resource for various tasks and sub-domains within IT, and be shared by the tutors.

Transferability questionnaire: Relevance of the service in other pedagogic settings
Pedagogic settings for which the service would be suitable:
Setting 1: Using LeaPos when revising for exams in universities. Reason: The ranking of the answers helps the students get oriented about their knowledge with respect to the curriculum.
Setting 2: Using LeaPos for getting supplementary information on a topic in universities. Reason: The suggestion of learning materials on the topic when the answer is not satisfactory helps the students to improve and widen their basic knowledge.
Setting 3: Using LeaPos for assessing and grading students' answers on a topic. Reason: Tutors save time in LeaPos and have important information at their disposal, such as matched and missing concepts/phrases, as well as the automatic live feedback.
Pedagogic settings for which the service would be less suitable:
Setting 1: A setting where the input includes formulas and software code. Reason: At the moment LeaPos can assess only textual input.

Section 6: Conclusions

Validation Topics
(Each Operational Validation Topic is recorded as: validated unconditionally / validated with qualifications* / not validated, per pilot, with any qualifications noted.)

PVT1: Verification of accuracy of NLP tools
OVT1.1 Absolute value of score: The tutors/experts find that, when the score given by the system is compared with the score given by the tutor, the difference between the two values is small. – BIT-MEDIA (German), IPP-BAS (Bulgarian)
OVT1.2 Relative value of score: The tutors/experts find that, when a learner has improved his/her answer, as judged by the tutor, an increase in the live feedback score is observed consistently. – BIT-MEDIA (German)
OVT1.3 KP feedback: The tutors/experts find that a high proportion of the phrases in the two columns (positive, missing) are judged as being correct feedback. – BIT-MEDIA (German), IPP-BAS (Bulgarian)
OVT1.4 KR feedback: The tutors/experts find that a high proportion of the concepts in the three columns (common, missing, additional) are judged as being correct feedback. – BIT-MEDIA (German), IPP-BAS (Bulgarian)

PVT2: Tutor efficiency
OVT2.1 Tutors spend less time preparing final feedback for learners and grading compared with traditional means. – IPP-BAS (Bulgarian), BIT-MEDIA (German)
OVT2.2 It is easy (there is less cognitive load) for tutors to provide feedback and grading using LeaPos. – BIT-MEDIA (German), IPP-BAS (Bulgarian)

PVT3: Quality and consistency of (semi-)automatic feedback or information returned by the system
OVT3.1 Tutors perceive that the feedback received from the system helps them prepare feedback for learners. – BIT-MEDIA (German), IPP-BAS (Bulgarian)
OVT3.2 Learners perceive that the live feedback received from the system contributes to informing their study activities. – BIT-MEDIA (German), SU (Bulgarian)
OVT3.3 Learners perceive that they receive useful additional feedback, compared with traditional means. – SU (Bulgarian), BIT-MEDIA (German)
OVT3.4 Learners perceive that the system can target learning materials depending on their needs. – SU (Bulgarian), BIT-MEDIA (German). Qualification: Sometimes it is difficult to differentiate between the basic and additional information that is received via the system.

PVT4: Making the educational process transparent
OVT4.1 Tutors perceive that positioning is more effective compared with traditional means because the quality and quantity of the input to positioning is improved. – BIT-MEDIA (German), IPP-BAS (Bulgarian)
OVT4.2 Tutors perceive that, using LeaPos, learners receive homogeneous feedback. – IPP-BAS (Bulgarian), BIT-MEDIA (German). Qualification: Tutors need some time to adjust their own grading system to the one provided by the system.
OVT4.3 Learners can receive feedback when they need it. – BIT-MEDIA (German), SU (Bulgarian). Qualification: The idea behind the usage of phrases and concepts needs to be made clearer to the students.

PVT5: Quality of educational output
OVT5.1 The live feedback helps learners improve their answers, so they can demonstrate their knowledge more effectively. – BIT-MEDIA (German), SU (Bulgarian)

PVT6: Motivation for learning
OVT6.1 The direct feedback provided by the system encourages learners to undertake further study to address gaps in their coverage. – SU (Bulgarian)

PVT7: Organisational efficiency
OVT7.1 There is a saving in institutional resources overall. – BIT-MEDIA (German); IPP-BAS (Bulgarian): insufficient evidence

PVT8: Relevance
OVT8.1 The service meets one or more institutional objectives. – BIT-MEDIA (German), IPP-BAS (Bulgarian)

PVT9: Likelihood of adoption
OVT9.1 Users were motivated to continue to use the system after the end of the formal validation activities. – BIT-MEDIA (German), IPP-BAS (Bulgarian)
OVT9.2 A high score was obtained in the generic questionnaires (based on UTAUT: likelihood of adoption by users). – IPP-BAS (Bulgarian), BIT-MEDIA (German)
OVT9.3 Tutors attending a dissemination workshop give high scores to the question 'How likely are you to consider adopting the service in your own educational practice?' – BIT-MEDIA (German)

Exploitation (SWOT Analysis)
The objective you are asked to consider is: "<service> (v1.5) will be adopted in pedagogic contexts beyond the end of the project".
Strengths
The strengths of the system (v1.5) that would be positive indicators for adoption are:
"Saving tutor time and costs of formative feedback and positioning" – Based on this benefit, a reduced number of tutors is able to support a group of learners. This enables the learning provider to save costs.
"Targeting learning materials according to learner need (short thread functionality)" – The LeaPos Service offers available learning materials to the learner. The learner is therefore able to proceed with his learning tasks without tutors.
"Feedback is immediately provided" – The learners are able to get feedback from the LeaPos Service immediately, without tutor interaction, and can continue their learning activities.
"Exciting to use, useful and motivating" – The learners enjoy the functional user interface and are motivated by the lists of additional phrases and concepts to follow up on their learning activities.
"Support for tutors building a repository of targeted learning materials (short thread functionality)" – The LeaPos Service offers additional benefits for a group of tutors supporting the same learners: there is a central place for the tutors to add and annotate learning materials. This functionality improves the motivation of each tutor to add materials to the LeaPos Service, because a direct benefit is available to all of the tutors in the group.

Weaknesses
The weaknesses of the system (v1.5) that would be negative indicators for adoption are:
"It takes time to get oriented in the result lists" – The result lists of the LeaPos Service provide missing phrases and concepts that were not included in the learner's answer, which can seem like a negative result to the learner. The learners therefore have to be guided to treat this information as important input for their next learning activities.
"Two different lists are confusing for some learners" – Because the use of two lists is confusing for some learners, the widget version enables customization so that only one of the lists is used.
"There are some incorrect results included in the feedback" – The main improvement strategy for avoiding incorrect results is to add graded answers for the Knowledge Poor approach and to update the concept data for the Knowledge Rich approach. Once the learners recognized how to interpret the result lists, they were able to ignore incorrect results in the live feedback.

Opportunities
The system has potential as follows:
LeaPos has the potential to be appropriate in many non-self-directed learning situations which rely on limited content.
LeaPos can be used in any short-answer situation where formative feedback and/or grading is required.
Possible uses of LeaPos can be found from primary schools through to the assessment of lifelong learning (as at bitmedia) or short courses in Continuous Professional Development.
There is interest at bitmedia in extending LeaPos to other languages (e.g. Czech) and countries where bitmedia has a presence.
With some enhancement, LeaPos could be used for progress tracking.

Threats
"Implementing a new technology" – Educational companies and institutes may be concerned about the activities required to implement a language-technology-based tool in their environment, because they are not familiar with this technology.
"Plagiarism when answering the questions" – The tutors may be concerned about the possibility that the learners provide the same or nearly the same answers to the questions.
"Lack of interoperability with corporate Learning Management Systems"
"An 'open mind' for new technologies is required" – Users may be concerned about the language-technology-based concept of the LeaPos Service and not be prepared to try it.
Overall conclusion regarding the likelihood of adoption of LeaPos Version 1.5:
LeaPos performed strongly in all aspects of the validation, and both pilot institutions expressed their interest in continuing to use it. Although LeaPos would benefit from improvements to its usability, it is already useful in real learning situations. LeaPos is very versatile in its potential educational contexts of use, being appropriate for any short-answer situation where formative feedback and/or grading is required, from primary education through to lifelong learning. We therefore conclude that, with effective marketing, LeaPos could become widely adopted in real educational settings.
LeaPos offers additional functionality for the education company bitmedia to support their learners while reducing the costs for tutors. Based on these opportunities, the management of the bit group will continue involving additional tutors and learners in using the LeaPos Service.
The system is very likely to be adopted by both stakeholder groups – tutors and learners. The interview with a manager at IPP-BAS confirms that the management there would like to continue working with LeaPos for various projects and in their teaching courses. In spite of the users' requirements for a larger ontology and more learning materials, tutors like the system, because it ensures immediate feedback to the learners, which (1) gives the tutors some time for reaction, (2) helps the tutors take the right grading decision, and (3) makes the learners eager to see themselves scoring high in "the green sector" and to explore the suggested learning materials.
LeaPos is more appropriate for non-self-directed learning scenarios which rely on limited content. However, there is a possibility for the repository to be further enriched with respect to the tasks and the domain.
Most important actions to promote adoption of LeaPos:

Technical
Improve the usability to make the system more intuitive, including the on-line help.
Continue to improve the accuracy of the language technologies, and the Knowledge Poor feedback in particular.
"Support for different languages": The adaptation of the user interface to localized languages should be simplified (e.g. via an XML configuration file) to reduce the time required to implement the LeaPos Service in different countries.
Investigate possible enhancements to incorporate progress tracking.

Domain-specific data
"Build standard sets of questionnaires and answers and associated learning materials": The availability of existing questionnaires and answers will speed up the implementation of the LeaPos Service for different learning providers and learning institutes. Commercial companies could pick up this approach as a business model.
"Provide updated concepts": The availability of improved concepts for different courses (knowledge areas) enables LeaPos customers to implement the Knowledge Rich approach without additional effort. The required activities could be carried out by research staff in different universities, because the work is time-consuming and typical commercial companies don't have access to language technology experts.

Exploitation
Continue to disseminate LeaPos at research and educational conferences. Consider targeting commercial companies at these conferences who could launch LeaPos in particular domains.
Set up a user group to share experiences and assist in dissemination.

Section 7 – Road map

Based on the results and conclusions from validation, the LTfLL team has agreed that the following are the most important future enhancements to the system in order to meet stakeholder requirements:
Most important:
1. Adding explanations to the result lists to help the learners recognize the benefits of the feedback.
2. Highlighting the important phrases and concepts in the result lists of the feedback.
3. Improvements to the user interface (including the help system).
4. Explaining the benefits of the phrase and concept lists in the live feedback, to enable stakeholders to choose only one or both result lists based on their requirements.

Based on the results and conclusions from validation, the LTfLL team has agreed that the following are the most important changes to the current scenario(s) of use in order to meet stakeholder requirements:
Most important:
1. The learners should be guided to use the LeaPos Service more than once during the course to get feedback on their current learning topics.
2. The importance of using the motivation benefit during the course should be added to the scenario.
3. The functionality of direct links to relevant parts of learning materials should be added to the scenario.

Based on the results and conclusions from validation, the LTfLL team has agreed that the following are possible additional educational contexts for future deployment:
Most important:
1. After the LeaPos Service has been implemented and used in the classroom environment, its usage can be carried over to distance learning scenarios.
2. The LeaPos Service could be used for pre-assessment of face-to-face classroom trainings, to provide the trainer with an overview of the learners' knowledge some days before the first day of training.
3. The LeaPos Service concept has the potential to act as a marketing tool for learning materials (books, e-learning or traditional classroom learning) as a side effect of the live feedback lists of phrases and concepts – a commercial adaptation with included promotion of learning offers is possible.

Based on the results and conclusions from validation, the LTfLL team has agreed that the following are the most important issues for future technical research to enable deployment of language technologies in educational contexts:
Most important:
1. Adding support for different languages in the user interface – the user interface should automatically switch to the user's preferred language based on his personal settings in the web browser.
2. Minimizing the effort for adding questionnaires and concepts to the LeaPos Service.
3. Adding export and import functionality for the system data (questionnaires, answers and learning materials for a course) to allow easy exchange of existing data (building the data for LeaPos for specific courses could be offered as a business model, similar to e-learning content).

Roadmap – validation activities
Further validation planned for beyond the end of the project:
Claim (OVT): The LeaPos Service provides benefits for ECDL (European Computer Driving Licence) education in the school environment in Austria.
Methodology: Transferring the LeaPos Service to the secondary school environment in Austria as a pilot.
Objective (OVT): Adapting the LeaPos Service for a new course without language-technology expert support.
Methodology: Adding a new course to the LeaPos Service, using this course as a pilot at bitmedia.

Appendix B.3 Validation Reporting Template for WP4.2 (UNIMAN and OUNL)

Section 1: Functionality implemented in Version 1.5 and alpha / beta-testing record

Brief description of functionality | Version number of unit | Changes from Version 1.0
1. A list of concepts, which is an alternative view of the data shown in a conceptogram; different colours show which concepts came from which posting in the RSS feed | v1.5 | did not exist in v1.0
2. A clearer combined conceptogram | v1.5 | font, size, and colour changes
3.
a multiple-merge conceptogram which combines three or more single conceptograms v1.5 did not exist in V1.0 4. a link from concepts to source, i.e., a particular blog posting where the concept was written about v1.5 did not exist in V1.0 5. extended context specific Help pages v1.5 improved help Many changes were made in v1.5 and documented in the LTfLL Deliverable 4.3, Appendix B. The major changes are listed below. Alpha-testing. Pilot site and language UNIMAN Date of completion of alpha testing: 7 October 2010 Who performed the alpha testing? Alisdair Smithies, Isobel Braidman Page 64 of 349 D7.4 - Validation 4 Page 65 of 349 D7.4 - Validation 4 Pilot site and language OUNL (Dutch) Date of completion of alpha testing: 7 October 2010; 14 October 2010 Who performed the alpha testing? Adriana Berlanga, Jan Hensgens Beta-testing Pilot site and language: UNIMAN, English Has stakeholder validation taken place using the service embedded in Elgg (Yes/No/Partially): If ‘No’ or ‘Partially’, give reasons: Yes beta-testing performed by: Michelle Keown (Tutor), Tristan Pocock (Tutor), Peter Yeates (PhD student) beta testing environment (stand-alone service / integrated into Elgg): HANDOVER DATE: 22 October 2010 (Date of handover of software v.1.5 for validation) Pilot site and language: OUNL, Dutch Has stakeholder validation taken place using the service embedded in Elgg (Yes/No/Partially): If ‘No’ or ‘Partially’, give reasons: The widget did not work at that time in Elgg beta-testing performed by: Jannes Eshuis (Tutor), Theo Verheggen (tutor) beta testing environment (stand-alone service / integrated into Elgg): Stand-alone service Page 66 of 349 No D7.4 - Validation 4 HANDOVER DATE: 8 November 2010 (Date of handover of software v.1.5 for validation) Page 67 of 349 D7.4 - Validation 4 Section 2: Validation Pilot Overview NB Information about pilot sites, courses and participants has been transferred to Appendix A.3 Pilot task Pilot site: UNIMAN Pilot language: English What 
is the pilot task for tutors and how do they interact with the system?
Tutors were provided with access to the service.
• Tutors collaborated to produce a blog (representative of the Intended Learning Outcomes for a specific Problem Based Learning case).
• Tutors accessed CONSPECT and produced a reference model from their blog.
• Tutors then viewed the outputs of CONSPECT's analysis:
  - the tutor reference model conceptogram and list of concepts;
  - the outputs of analysis of an individual student's blog;
  - the student group reference model;
  - the output of the student group reference model compared with the tutor reference model;
  - the output of an individual compared with the tutor reference model.
• Tutors checked the list of concepts and drew conclusions about the progress of the student.
• Tutors provided comments or feedback on the outputs to the pilot facilitator, who forwarded any necessary actions to the students.

What do the tutors produce as outputs? Are the outputs marked?
Conceptograms and concept lists. They were not marked.

How long does the pilot task last, from the tutors starting the task to their final involvement with the software?
3 weeks.

How do tutors/student facilitators interact with the learners and the system?
They review the conceptograms and concept lists.

Describe any manual intervention of the LTfLL team in the pilot:
UNIMAN provided the tutors and students with OpenID accounts and instructions for how to access and use the system.

Pilot site: UNIMAN
Pilot language: English
This box describes the pilot task for students.

What is the pilot task for students and how do they interact with the system?
Students studied a PBL case, "Helen", from the Mind and Movement module, and were asked to keep a blog of their study on Elgg. They used CONSPECT to produce reference models, create comparisons between them and draw conclusions.
They were provided with the following instructions:

Use CONSPECT to:
• Create and view your reference model
• Check the concept list and the graph
• Make annotations about your thoughts
• Compare your reference model with those of other individual students
• Combine your reference model with the tutor model
• Combine your reference model with the group model
• Make your reference models public, and compare them with other public reference models

What do the students produce as outputs? Are the outputs marked?
Conceptograms. They were not marked, but were viewed by tutors, who were asked for their feedback.

How long does the pilot task last, from the students starting the task to their final involvement with the software?
3 weeks.

How do tutors interact with the learners and the system?
Module tutors were presented with the outputs from the system and invited to provide feedback to students via email.

Describe any manual intervention of the LTfLL team in the pilot:
OpenID accounts were provided, along with a user guide for the tool.

Pilot site: OUNL
Pilot language: Dutch
There were pilot workshops for tutors and students. This box describes the pilot task for tutors.

What is the pilot task for tutors and how do they interact with the system?
Tutors were presented with a case, and were asked to use CONSPECT to produce reference models, create comparisons between them and draw conclusions. They were provided with the following case:

Imagine that you have asked your students to keep a blog, in which they write about their study tasks. You want to see how they are doing, checking whether the concepts they mention in their texts are those they are required to learn in the course, individually and as a group … You decide to use CONSPECT and:
• Create at least 2 reference models:
  - Tutor or "book" ref. model
  - Student or group ref.
model
• Compare the reference models
• Combine the reference models you've created
• Check the list of concepts and draw conclusions about the progress of the student
• Make your reference models public, and compare them with other public reference models

To perform these tasks, tutors were provided with examples of tutor blogs, student blogs and group blogs. The tutors had provided these materials earlier, taken from students' answers to specific assignments, the tutor's materials, and the workbook.

What do the tutors produce as outputs? Are the outputs marked?
Conceptograms. They were not marked.

How long does the pilot task last, from the tutors starting the task to their final involvement with the software?
2 hours, not counting the preparation time some of them invested in producing or gathering the materials for the blogs.

How do tutors interact with the learners and the system?
Tutors had to use the system to perform the tasks. They created conceptograms and reviewed them. They also made comparisons between the different outputs of the system, and discussed among themselves how they judged the information the system provides, as well as the interaction with the system.

Describe any manual intervention of the LTfLL team in the pilot:
OUNL created the blogs (using the material provided by the tutors). OpenID accounts were provided, together with a user guide for the tool and a list of LSA parameters that would work with the provided examples.

Pilot site: OUNL
Pilot language: Dutch
There were pilot workshops for tutors and students. This box describes the pilot task for students.

What is the pilot task for students and how do they interact with the system?
Students were presented with a case, and were asked to use CONSPECT to produce reference models, create comparisons between them and draw conclusions.
They were provided with the following case:

You've been asked to write an essay which answers four questions related to the first part of the course about "De vele vormen van selectie" ("The many forms of selection"). You want to check whether the concepts you mention in your text are relevant for the assignment … You decide to use CONSPECT and:
• Create your reference model
• Check the concepts and the graph
• Make annotations about your thoughts
• Compare your reference model
• Combine your reference model with the tutor model
• Combine your reference model with the group model
• Make your reference models public, and compare them with other public reference models

Before the pilot session, students were asked to write an essay answering 4 specific questions related to the first part of their course. This was the input they would use during the activity. Also available in the system were a concept map created from a tutor's blog (which had been created using the tutor's answer to the assignment), and a group concept map, created by aggregating the input received from the students (their essays) into a blog entry.

What do the students produce as outputs? Are the outputs marked?
Conceptograms. They were not marked, but during the session these conceptograms were used by the tutor (who was present in the session) to discuss the input with the student.

How long does the pilot task last, from the students starting the task to their final involvement with the software?
2 hours, not counting the preparation time they invested in producing their essays.

How do tutors interact with the learners and the system?
A tutor was present in the sessions and gave feedback to students based on the output from the system.

Describe any manual intervention of the LTfLL team in the pilot:
OUNL created the blogs (using the material provided by the students and the tutor).
OpenID accounts were provided, together with a user guide for the tool and a list of LSA parameters that would work with the provided examples.

Experiments

Name of experiment: Validation of the outcomes of the tool

Objective(s): The objective of this experiment was to ask tutors to check whether the output CONSPECT provides includes the concepts on which they would provide feedback, and whether the tool correctly categorised a high proportion of concepts as important (as requested in OVT1.1 and OVT1.2).

Details: Tutors were asked to provide materials for creating reference models. They provided course material, students' texts (n=5) and a digitised book in two subjects: Evolutionary Psychology and Cultural Psychology. From this information, conceptograms were created: a reference model, a group model and students' models. Comparisons were made between them, and the results were presented to and discussed with the tutors. Afterwards, the tutors had to indicate whether the conceptograms covered the subject matter sufficiently well.

Tutors mentioned that the conceptograms include relevant concepts, and that the relations seem quite good, but also that important concepts were missing from the maps. There were also concerns about the level of detail the conceptograms show. Tutors found it difficult to interpret the conceptograms. They also mentioned that it is difficult to understand why some concepts that do not appear in the text do appear in the conceptogram. On the one hand this can be useful for identifying new concepts and relations; on the other, it could be confusing and misleading for learners. One of the tutors said he would like to provide a list of "most relevant concepts" in advance, from which the tool would build the conceptogram.
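The conceptograms and comparisons in this experiment are produced with Latent Semantic Analysis (the "LSA parameters" mentioned above). CONSPECT's actual pipeline is not specified in this deliverable, so the following is only an illustrative sketch of the generic LSA technique (term-document matrix, truncated SVD, cosine similarity in the latent space); all example texts and the dimensionality k are invented for the illustration.

```python
# Illustrative sketch only: CONSPECT's real implementation is not described
# in this report. This shows the generic LSA steps used to compare a student
# text with a reference model. All example texts and k are invented.
import numpy as np

def term_document_matrix(docs):
    """Raw term counts; rows = vocabulary terms, columns = documents."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    M = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            M[index[w], j] += 1
    return M, vocab

def lsa_doc_vectors(M, k=2):
    """Project each document into a k-dimensional latent semantic space."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (np.diag(s[:k]) @ Vt[:k, :]).T  # one row per document

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

docs = [
    "memory formation depends on synaptic plasticity",       # tutor reference model
    "alcohol impairs memory formation and synaptic change",  # student A
    "the liver metabolises alcohol into acetaldehyde",       # student B
]
M, vocab = term_document_matrix(docs)
vecs = lsa_doc_vectors(M, k=2)
sim_a = cosine(vecs[0], vecs[1])  # high: shares concepts with the reference
sim_b = cosine(vecs[0], vecs[2])  # low: shares no terms with the reference
print(f"student A vs reference: {sim_a:.2f}; student B vs reference: {sim_b:.2f}")
```

Because the comparison happens in the reduced latent space rather than on raw terms, a document can appear related to concepts it never mentions explicitly. This is consistent with the tutors' observation above that concepts absent from a text can nonetheless appear in its conceptogram.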
Section 3: Results - validation/verification of Validation Topics

OVT: OVT1.1
Pilot site: UNIMAN
Pilot language: English
Operational Validation Topic: Tutors have assessed that a high percentage of the concepts identified by CONSPECT are relevant to the task the learners have undertaken.

Summative results with respect to validation indicator
Stakeholders / methodology: Tutors (n=2) assessed the relevance of the concepts reported by CONSPECT to the learning task, from 5 sets of results.
Results: An average of 32% of the terms reported by CONSPECT was deemed relevant to the learning task; 68% of terms were considered irrelevant.

OVT: OVT1.2
Pilot site: UNIMAN
Pilot language: English
Operational Validation Topic: Tutors have assessed that CONSPECT had identified most of the concepts on which they would provide feedback.

Summative results with respect to validation indicator
Stakeholders / methodology: Tutors (n=2) assessed the number of concepts reported by CONSPECT that were relevant to the materials the students had produced, in 5 sets of results.
Results: Tutors assessed that CONSPECT identified on average (mean = 8.4 topics) 23% of the topics from the student materials that were suitable to provide feedback on.

Questionnaire (Tutors), Q35 "I am able to assess that CONSPECT had identified most of the concepts on which they would provide feedback": Experimental group; mean 1.8; SD 0.84; 0% Agree/Strongly agree; n=5.

Formative results with respect to validation indicator
Tutors: Tutors felt the relevance of concepts provided in the present version of CONSPECT was not of sufficient "depth or accuracy" to allow them to provide detailed feedback.
Stemming of words was seen as a key weakness of the service: "in the medical domain the specificities of technical words can have vastly different implications, this is not accurate enough, I'd have to guess really". "I perceive that the concept of a package like CONSPECT could be extremely useful in teaching - in essence in allowing very rapid comparison of student responses with a model answer. The key to such a process, though, critically depends on the trustworthiness of the output and the degree to which the output is meaningful." "Most important: the software cannot be used for students at this level without providing some means of helping tutors and students to analyse the depth of their understanding, rather than the superficialities presently included."

OVT: OVT1.2
Pilot site: OUNL
Pilot language: Dutch
Operational Validation Topic: Tutors have calculated that CONSPECT had identified most of the concepts on which they would provide feedback.

Formative results with respect to validation indicator
Tutors: Tutor responses were negative about the relevance of concepts identified by CONSPECT but positive about the method it uses. Identified concepts were considered irrelevant or too general to be of sufficient value in the assessment of a learner's understanding in a task: "I don't think that the information represented in the bubbles in the diagram is all that representative of the meaning in the text either in terms of the balance of topical coverage, or the organisation of the concepts relative to each other." "I do miss some important terms while others are superfluous."

OVT: OVT1.3
Pilot site: UNIMAN
Pilot language: English
Operational Validation Topic: Tutors have assessed that CONSPECT has provided appropriate linkages between concepts for most of the relevant conceptual relations.

Questionnaire (Tutors), Q36 "I am able to assess that CONSPECT has provided appropriate linkages between concepts for most of the relevant conceptual relations": Experimental group; mean 1.4; SD 0.89; 0% Agree/Strongly agree; n=5.

Formative results with respect to validation indicator
Tutors: The tutors felt that CONSPECT's ability to provide appropriate linkages was not suitably effective. "It needs to expose interesting/important relationships between topics." "The tutor blog does contain expected details of the cellular basis for memory and emotional responses, together with the way in which alcohol consumption might affect this. The best that the conceptogram could do was to relate alcohol with consumption and alcoholism."

OVT: OVT1.4
Pilot site: UNIMAN
Pilot language: English
Operational Validation Topic: Tutors have assessed that most of the concepts have been correctly categorised as important.

Questionnaire (Tutors), Q37 "I am able to assess that most of the key concepts have been correctly categorised as important": Experimental group; mean 2.0; SD 1.22; 20% Agree/Strongly agree; n=5.

OVT: OVT1.4
Pilot site: OUNL
Pilot language: Dutch
Operational Validation Topic: Tutors have calculated that a high proportion of concepts have been correctly categorised as important.

Formative results with respect to validation indicator
Tutors (interview, n=2): "Some terminology, that I really would have expected to turn up in the concept maps, it did not turn up, even though these are terms that are repeated quite often through the design maps.
So the level of depth required in a University context is not well covered by the maps."

OVT: OVT2.1
Pilot site: UNIMAN
Pilot language: English
Operational Validation Topic: Using CONSPECT, tutors spend less time preparing feedback than without the system.

Questionnaire (Tutors), Q7 "It takes less time to complete my teaching tasks using CONSPECT than without the system": Experimental group; mean 2.4; SD 1.14; 20% Agree/Strongly agree; n=5.
Questionnaire (Tutors), Q8 "Using CONSPECT enables me to work more quickly than without the system": Experimental group; mean 2.2; SD 1.30; 20% Agree/Strongly agree; n=5.
Questionnaire (Tutors), Q9 "I do not wait too long before receiving the requested information": Experimental group; mean 2.4; SD 1.10; 0% Agree/Strongly agree; n=5.
Questionnaire (Tutors), Q10 "CONSPECT provides me with the requested information when I require it (i.e. at the right time in my work activities)": Experimental group; mean 2.2; SD 1.10; 0% Agree/Strongly agree; n=5.

Formative results with respect to validation indicator
Tutors: Tutors (n=2) commented that CONSPECT took a long time to process the blogs, and they often thought the service had "hung". There were concerns that students would not engage with the service if this was the norm. "…it just stopped and nothing happened! I didn't know what to do. It could have shown a pop up box showing that it was processing at the very least."
Tutors: One tutor (n=1) was positive about the service's potential to assess large numbers of student blogs in a short time, and to provide a prompt indication of who was not engaging in the learning task.
"This could be really useful to show, on a short turnaround, how many of the students are actually understanding the things we've asked them, I guess, though I'm not sure how easy it would be to gauge that from the interfaces you've shown me… is there a summary view of all students anywhere?"

OVT: OVT2.2
Pilot site: UNIMAN
Pilot language: English
Operational Validation Topic: It is easier (there is less cognitive load) for tutors to provide feedback using CONSPECT compared with not using the system.

Questionnaire (Tutors), Q11a "Please rank on a 5-point scale the mental effort (1 = very low mental effort; 5 = very high mental effort) you invested to accomplish teaching tasks using CONSPECT": Experimental group; mean 3.6; SD 1.67; 60% Agree/Strongly agree; n=5.
Questionnaire (Tutors), Q11b "Overall, using the system requires significantly less mental effort to complete my teaching tasks than when manually assessing learners' conceptual coverage": Experimental group; mean 2.4; SD 0.89; 0% Agree/Strongly agree; n=5.

OVT: OVT3.1
Pilot site: UNIMAN
Pilot language: English
Operational Validation Topic: Tutors judge that CONSPECT shows correctly the conceptual coverage of a given topic.

Questionnaire (Tutors), Q41 "CONSPECT shows correctly the conceptual coverage of a given topic": Experimental group; mean 2.2; SD 1.30; 20% Agree/Strongly agree; n=5.

Formative results with respect to validation indicator
Tutors: Tutor responses were negative about the relevance of concepts identified by CONSPECT but positive about the method it uses.
Identified concepts were considered irrelevant or too general to be of sufficient value in the assessment of a learner's understanding in a task: "I don't think that the information represented in the bubbles in the diagram are all that representative of the meaning in the text either in terms of the balance of topical coverage, or the organisation of the concepts relative to each other." "The output is extremely reductionist and gives very little idea of the level of understanding or complexity behind each of the concepts shown." "A major problem with this version of CONSPECT, however, is that it does not provide sufficient depth to be of use, as the concepts that it identifies are far too general and superficial." "Results don't expose any 'meaning' from the inputs." "A major concern is that the general nature of the concepts identified gives the students the wrong message that this level is adequate preparation for module assessments. They are not adequate for this purpose and this makes tutor's jobs more difficult than had they been giving feedback without CONSPECT." "Given the general nature of the concepts identified I suggest it could be more useful to students either at a more basic level in their education, for example the 'access to medicine' course, or to other areas and disciplines than medicine, that require use of more general concepts."

OVT: OVT3.1
Pilot site: OUNL
Pilot language: Dutch
Operational Validation Topic: Tutors judge that CONSPECT shows correctly the conceptual coverage of a given topic.

Questionnaire (Tutors), Q6 "The information CONSPECT provides me is accurate enough for helping me perform my teaching tasks":
Experimental group; mean 2.2; SD 0.84; 0% Agree/Strongly agree; n=5.

Formative results with respect to validation indicator
Tutors: Tutors reported that the graphs identify important terms, but that other important concepts were missing: "The concepts mentioned in these plots are pretty good. I do miss some important terms while others are superfluous, but I'd say approx. 75-80% of what should be there is actually plotted." It was also not clear to them why some concepts that are not present in the text do appear in the graph, while others that are important and present in the text do not show up in the graph.

OVT: OVT3.2
Pilot site: UNIMAN
Pilot language: English
Operational Validation Topic: Tutors agree with the learner progress shown by CONSPECT.

Questionnaire (Tutors), Q42 "Tutors agree with the learner progress shown by CONSPECT": Experimental group; mean 2.6; SD 1.14; 20% Agree/Strongly agree; n=5.

Formative results with respect to validation indicator
Tutors: Tutors found CONSPECT's outputs did not provide a clear picture of a learner's progress. The visualisation of all student outputs combined is difficult to see; there is too much information on the screen to be able to identify different learners easily. "It needs to be more user friendly and the concept maps need to be clearer. They move about too much and it's difficult to separate the different concepts. It's difficult to understand the joint concept maps."

OVT: OVT3.2
Pilot site: OUNL
Pilot language: Dutch
Operational Validation Topic: Tutors agree with the learner progress shown by CONSPECT.

Questionnaire (Tutors): "The way CONSPECT provides information (list of concepts, graphical representation) is useful to identify learners' progress":
Experimental group; mean 3.0; SD 0.71; 20% Agree/Strongly agree; n=5.
Questionnaire (Tutors): "The way CONSPECT provides information (list of concepts, graphical representation) is useful to identify the progress of a group of learners": Experimental group; mean 3.4; SD 0.55; 40% Agree/Strongly agree; n=5.

Formative results with respect to validation indicator
Tutors: Tutors find it difficult to interpret the conceptograms. "I find it hard to see what the graphs tell me. I do see the concepts, their overlap and omissions"; "I find it hard to make sense of the graphs; clusters of concepts are usually okay, but I would expect some terms to be much more central or to mediate between clusters. At other instances, I find unconnected dots that represents rather central terms". They prefer the idea of having a graph to a list of concepts, especially because the graph contains the relationships between concepts.

OVT: OVT3.3
Pilot site: UNIMAN
Pilot language: English
Operational Validation Topic: Students feel the feedback provided supports them in adapting their learning plans.

Questionnaire (Students), Q31 "The feedback provided supports me to adapt my learning plans": Experimental group; mean 3.3; SD 0.89; 43% Agree/Strongly agree; n=16.

Formative results with respect to validation indicator
Students: "The results published were confusing. I found it very difficult to understand what the results signified. It would be simpler to understand feedback in a score or percentage format."

OVT: OVT3.3
Pilot site: OUNL
Pilot language: Dutch
Operational Validation Topic: Students feel the feedback provided supports them in adapting their learning plans.

Questionnaire (Students): "I think that the information CONSPECT provides helps me to be better informed about my current learning progress":
Experimental group; mean 3.5; SD 1.29; 50% Agree/Strongly agree; n=4.
Questionnaire (Students): "I think that comparing my graphical representation against a group representation is useful to identify my progress": Experimental group; mean 3.5; SD 0.58; 50% Agree/Strongly agree; n=4.
Questionnaire (Students): "I think that comparing my graphical representation against a predefined representation (e.g. tutor representation) is useful to identify my progress": Experimental group; mean 4.0; SD 0.82; 50% Agree/Strongly agree; n=4.
Questionnaire (Students): "I think that the information CONSPECT provides helps me to identify knowledge gaps in my current learning progress": Experimental group; mean 2.8; SD 1.50; 50% Agree/Strongly agree; n=4.

Formative results with respect to validation indicator
Students: Students were positive about the concept: "I like the concept, to see how concepts are related and what are the missing ones"; "I like the idea a lot. I can see the concepts I've missed in my summaries"; "I like the graph visualization to have a nice overview of what I have missed"; "I think it can become a handy tool in the future." Some of the students did not consider the group model relevant for them: "I don't trust my peer's texts, I prefer to have the tutor text." Students mentioned that they do not understand why only the stem of a word is shown, and find it difficult to understand why some words that are not in the text appear in the conceptogram: "I don't understand why only parts of the word are displayed, the only first part could mean different concepts." Students also mentioned that they would like to have more information about the relations (semantic meaning), as well as more information to help interpret the map.

OVT: OVT3.4
Pilot site: UNIMAN
Pilot language: English
Operational Validation Topic: Students agree with the feedback provided.

Questionnaire (Learners), Q32 "I agree with the feedback provided":
Experimental group; mean 3.1; SD 0.72; 31% Agree/Strongly agree; n=16.

Formative results with respect to validation indicator
Students: "It was helpful to confirm that I had covered some of the main concepts but the level of detail wasn't always so helpful." "I like comparing my own model with the intended learning outcomes model. This was helpful to identify weaknesses in my blog and helped by suggesting topics to cover." "The results published were confusing. I found it very difficult to understand what the results signified. It would be simpler to understand feedback in a score or percentage format."

OVT: OVT4.1
Pilot site: UNIMAN
Pilot language: English
Operational Validation Topic: Students are able to position themselves whenever they want using CONSPECT.

Questionnaire (Students), Q33 "I am able to position myself whenever I want": Experimental group; mean 3.3; SD 0.79; 50% Agree/Strongly agree; n=16.

Formative results with respect to validation indicator
Students: "Easier access than open ID. I found the login and access to the blogs very hit and miss." Students found the service did not always recall their results on the first attempt, and there were some instances where the software was not accessible, which resulted in students recording a low score for this question.

OVT: OVT4.2
Pilot site: UNIMAN
Pilot language: English
Operational Validation Topic: Using CONSPECT, tutors are able to assess the conceptual progress of their students based on their reflective documents.

Questionnaire (Tutors), Q2 "Overall, CONSPECT helps me to complete my teaching tasks successfully": Experimental group; mean 2.4; SD 1.14; 20% Agree/Strongly agree; n=5.
Questionnaire (Tutors), Q3 "Overall, I believe that CONSPECT provides adequate support for my teaching":
Experimental group; mean 2.4; SD 1.14; 20% Agree/Strongly agree; n=5.
Questionnaire (Tutors), Q4 "Overall, I find CONSPECT useful in my teaching": Experimental group; mean 2.4; SD 1.14; 20% Agree/Strongly agree; n=5.
Questionnaire (Tutors), Q38 "I am able to assess the conceptual progress of my students based on their reflective documents": Experimental group; mean 2.4; SD 1.52; 40% Agree/Strongly agree; n=5.

Formative results with respect to validation indicator
Tutors: Tutors (n=3) commented that the direct feedback provided by the service was not trustworthy, and that each analysis would require further investigation prior to release of the feedback to ensure it was not misleading: "A major concern is that the general nature of the concepts identified gives the students the wrong message that this level is adequate preparation for module assessments. They are not adequate for this purpose and this makes tutor's jobs more difficult than had they been giving feedback without CONSPECT."
Tutors: One tutor (n=1) was positive about the service's utility for this purpose but found the interface was not sufficiently intuitive.

OVT: OVT4.2
Pilot site: OUNL
Pilot language: Dutch
Operational Validation Topic: Using CONSPECT, tutors are able to assess the conceptual progress of their students based on their reflective documents.

Questionnaire (Tutors), Q2 "Overall, CONSPECT helps me to complete my teaching tasks successfully": Experimental group; mean 2.2; SD 0.8; 0% Agree/Strongly agree; n=5.
Questionnaire (Tutors), Q3 "Overall, I believe that CONSPECT provides adequate support for my teaching": Experimental group; mean 2.4; SD 1.14; 20% Agree/Strongly agree; n=5.
Questionnaire (Tutors), Q4 "Overall, I find CONSPECT useful in my teaching":
Experimental group; mean 2.6; SD 0.89; 20% Agree/Strongly agree; n=5.

Formative results with respect to validation indicator
Tutors: Tutors reported that they could see potential in the method implemented within the service to identify areas for additional support, but that the information presented to them by the service in its present form did not provide a sufficient level of detail on which to base feedback. Tutors were able to cross-check the outputs against student blogs and found this a useful way to "drill down" to examine whether a student had covered a topic area effectively, but were critical that CONSPECT had not picked up some of the important concepts and had instead presented a large quantity of superficial data.

OVT: OVT4.3
Pilot site: UNIMAN
Pilot language: English
Operational Validation Topic: Tutors are able to locate the outliers within their groups.

Questionnaire (Tutors), Q39 "I am able to locate outliers within my group": Experimental group; mean 2.0; SD 1.22; 20% Agree/Strongly agree; n=5.

Formative results with respect to validation indicator
Tutors: Tutors reported that the conceptogram output was unclear and difficult to interpret with the number of students usually present in a PBL case. In interviews, tutors (n=5) reported that this was one of the most important features they would like the service to provide, but they were dissatisfied with the way the present visualisation provides this information.

OVT: OVT4.3
Pilot site: OUNL
Pilot language: Dutch
Operational Validation Topic: Tutors are able to locate the outliers within their groups.

Questionnaire (Tutors), Q34 "The way CONSPECT provides information (list of concepts, graphical representation) is useful to identify learners' progress":
Experimental group; mean 3.0; SD 0.71; 20% Agree/Strongly agree; n=5.
Questionnaire (Tutors), Q35 "The way CONSPECT provides information (list of concepts, graphical representation) is useful to identify the progress of a group of learners": Experimental group; mean 3.4; SD 0.55; 40% Agree/Strongly agree; n=5.

Formative results with respect to validation indicator
Tutors: Tutors reported that the conceptograms were difficult to interpret; combining more than two conceptograms showed information that was impossible to interpret. "I find it hard to see what the graphs tell me. I do see the concepts, their overlap and omissions"; "I find it hard to make sense of the graphs; clusters of concepts are usually okay, but I would expect some terms to be much more central or to mediate between clusters. At other instances, I find unconnected dots that represents rather central terms". They prefer the idea of having a graph to a list of concepts, especially because the graph contains the relationships between concepts. In interviews, tutors (n=5) also raised concerns about the creation of the group model, a feature that the system does not generate automatically: the task of creating an aggregated text could be very time-consuming for big groups of students, and the text could change constantly, making it difficult to create and maintain.

OVT: OVT4.4
Pilot site: UNIMAN
Pilot language: English
Operational Validation Topic: Tutors are able to provide extra support for the problematic outliers during the learning process.

Questionnaire (Tutors), Q5 "The CONSPECT service helps me to improve the quality of my support to learners":
Experimental 2.6 0.89 20% 5 Tutors Q40 I am able to provide extra support for the problematic outliers during the learning process Experimental 2.0 1.22 20% 5 Formative results with respect to validation indicator Page 88 of 349 D7.4 - Validation 4 Stakeholder type Results Tutors Tutors reported that they could see potential in the method implemented within the service to identify areas for additional support but that the information presented to them by the service in its present form did not provide a sufficient level of detail on which to base feedback. Tutors were able to cross-check the outputs against student blogs and found this a useful way to “drill down” to examine whether a student had covered a topic area effectively, but were critical that CONSPECT had not picked up some of the important concepts and instead presented a large quantity of superficial data. OVT: OVT4.4 Pilot site OUNL Pilot language Dutch Operational Validation Topic Tutors are able to provide extra support for the problematic outliers during the learning process Questionnaire type Questionnaire no. & statement Experimental / control group Mean Standard deviation %Agree / Strongly agree n= Tutors Q5 The CONSPECT service helps me to improve the quality of my support to learners. Experimental 2.8 1.30 40% 5 Formative results with respect to validation indicator Stakeholder type Results Tutors Tutors reported that they could see potential in the method implemented within the service to identify areas for additional support but that the information presented to them by the service in its present form did not provide a sufficient level of detail on which to base feedback. Tutors were able to cross-check the outputs against student blogs and found this a useful way to “drill down” to examine whether a student had covered a topic area effectively, but were critical that CONSPECT had not picked up some of the important concepts and instead presented a large quantity of superficial data. 
OVT: OVT5.1
Pilot site: UNIMAN; pilot language: English
Operational Validation Topic: Students are able to use the feedback given during the writing process to help them to improve the final texts
Students, Q34 "The feedback given during the writing process helps me to improve the final texts" (Experimental): mean 3.4, SD 0.96, 56% agree/strongly agree, n = 16
Formative results with respect to validation indicator
Students: "I have developed working on cues, from the concept list, and making the information concise"

OVT: OVT6.1
Pilot site: UNIMAN; pilot language: English
Operational Validation Topic: Students find the feedback provided encourages them to undertake further study
Students, Q35 "The feedback given encourages me to undertake further study" (Experimental): mean 3.5, SD 0.82, 56% agree/strongly agree, n = 16
Formative results with respect to validation indicator
Students: "It has made me reflect on what work I actually did and writing the blog worked as a good way of looking back on what work I had done and how much I had actually remembered. This way it worked as a revision-like process. It would probably also help in writing a better portfolio, as in each case the links between different topics would be clearer. It would also serve as a good summarising tool for those such as myself who don't usually draw all of my notes together after studying each ILO in each case."

OVT: OVT6.1
Pilot site: OUNL; pilot language: Dutch
Operational Validation Topic: Students find the feedback provided encourages them to undertake further study
Students, Q18 "Using CONSPECT increases my curiosity about the learning topic" (Experimental): mean 3.0, SD 0.82, 25% agree/strongly agree, n = 4
Students, Q19 "CONSPECT makes learning more interesting" (Experimental): mean 2.3, SD 0.50, 0% agree/strongly agree, n = 4
Students, Q20 "Using CONSPECT motivates me to explore the learning topic more fully" (Experimental): mean 2.5, SD 1.0, 25% agree/strongly agree, n = 4
Formative results with respect to validation indicator
Students: "I like the graph visualization to have a nice overview of what I have missed". One student liked the idea that some concepts not present in the text are displayed in the conceptogram, as a way of "discovering new associations between concepts".

OVT: OVT6.2
Pilot site: UNIMAN; pilot language: English
Operational Validation Topic: Students are confident that they are aware of those aspects of their conceptual coverage that are strong
Students, Q36 "I am confident that I am aware of those aspects of my conceptual coverage that are strong" (Experimental): mean 3.6, SD 0.96, 56% agree/strongly agree, n = 16
Formative results with respect to validation indicator
Students: "within each case there were many different ideas we had to go away and study, and using a system like Conspect it helped draw all those different ideas back into the case story. This made the links and reasons and dynamics between each topic more stated."

OVT: OVT6.3
Pilot site: UNIMAN; pilot language: English
Operational Validation Topic: Students who use CONSPECT feel better informed about their own learning.
Students, Q36 "When I use CONSPECT I feel better informed about my own learning" (Experimental): mean 3.2, SD 0.98, 43% agree/strongly agree, n = 16
Formative results with respect to validation indicator
Students: "CONSPECT allows easy comparison of the ideas that me and my peers or a tutor have about the case that we summarised."

OVT: OVT7.1
Pilot site: UNIMAN; pilot language: English
Operational Validation Topic: There is a saving in institutional resources overall
Formative results with respect to validation indicator: No evidence – the pilot presented an additional activity for the institution.

OVT: OVT7.1
Pilot site: OUNL; pilot language: Dutch
Operational Validation Topic: There is a saving in institutional resources overall
Formative results with respect to validation indicator: No evidence – the pilot presented an additional activity for the institution.

OVT: OVT8.1
Pilot site: UNIMAN; pilot language: English
Operational Validation Topic: The service addresses one or more institutional objectives
Formative results with respect to validation indicator
Tutors: "Student feedback is an issue for the School, specifically formative provision, so this service provides a solution to an institutional problem. The challenge to adoption is that we don't ask students to provide the kinds of information this system needs to work and there are the privacy issues around their data to consider so I don't think we would be able to make use of it as it is now."

OVT: OVT8.1
Pilot site: OUNL; pilot language: Dutch
Operational Validation Topic: The service addresses one or more institutional objectives
Tutors, question C: "I think CONSPECT addresses one of the burning problems of the institution."
Experimental: mean 3.0, SD 1.58, 40% agree/strongly agree, n = 5
Formative results with respect to validation indicator
Tutors: "Overall, I would say that CONSPECT has potential but is not yet developed far enough to make a significant contribution"; "Anything that helps time wise or money wise, we will grab by both hands and we will use it."
Managers: A manager suggested that the tool would address a key institutional issue: the success rate of students completing their studies. Studies show that lack of feedback is a key factor; with this tool, feedback can be provided repeatedly. "It is much less frustrated for students to get a lot of feedback and when they have to handing their final essay to have some confidence that it works. And that would keep students working, and don't get discouraged"; "It becomes very expensive if you want to provide feedback on writing essays. Certainly, you cannot do it for a reasonable cost more than once, really you would like to do this repeatedly, but we are unable to do it, because it just takes too much time... but it would help on the early stages to have automatic feedback on the things you are writing"

OVT: OVT9.1
Pilot site: UNIMAN; pilot language: English
Operational Validation Topic: Users were motivated to continue to use the system after the end of the formal validation activities
Tutors, Q21 "I would recommend this system to others" (Experimental): mean 1.8, SD 1.30, 20% agree/strongly agree, n = 5
Tutors, Q22 "I am eager to explore different things with CONSPECT" (Experimental): mean 2.4, SD 1.52, 40% agree/strongly agree, n = 5
Tutors, Q29 "I would like to use the service after the pilot" (Experimental): mean 2.0, SD 1.22, 20% agree/strongly agree, n = 5
Tutors, Q30 "If the service is available after the pilot, I will definitely use it" (Experimental): mean 1.8, SD 1.30, 20% agree/strongly agree, n = 5
Students, Q21 "I would recommend this system to other learners to help them in their learning" (Experimental): mean 2.8, SD 0.91, 19% agree/strongly agree, n = 16
Students, Q22 "I am eager to explore different things with CONSPECT" (Experimental): mean 3.3, SD 0.95, 50% agree/strongly agree, n = 16
Students, Q29 "I would like to use the service in my learning activities after the pilot" (Experimental): mean 3.5, SD 0.89, 50% agree/strongly agree, n = 16
Students, Q30 "If the service is available after the pilot, I will definitely use it in my learning activities" (Experimental): mean 3.1, SD 0.77, 31% agree/strongly agree, n = 16

OVT: OVT9.2
Pilot site: UNIMAN; pilot language: English
Operational Validation Topic: A high score was obtained in the generic questionnaires (based on UTAUT: likelihood of adoption by users)
Summative results with respect to validation indicator
Stakeholders / methodology: generic questionnaire – learners.
Descriptive statistics (learners):
- Effectiveness: n = 22, mean 3.13, SD 0.805
- Efficiency: n = 16, mean 2.89, SD 0.516
- Cognitive load: n = 16, mean 3.00, SD 1.033
- Usability: n = 22, mean 2.93, SD 0.867
- Satisfaction: n = 22, mean 3.08, SD 0.752
- Facilitating conditions: n = 22, mean 3.83, SD 0.795
- Self-efficacy: n = 16, mean 3.40, SD 0.772
- Behavioural intention: n = 22, mean 3.16, SD 0.878
- UNIMAN overall: n = 22, mean 3.13, SD 0.625
- Valid N (listwise): 16

OVT: OVT9.2
Pilot site: OUNL; pilot language: Dutch
Operational Validation Topic: A high score was obtained in the generic questionnaires (based on UTAUT: likelihood of adoption by users)
Summative results with respect to validation indicator
Descriptive statistics (learners; n = 4 throughout, standard deviations not reported):
- Effectiveness: mean 2.83
- Efficiency: mean 3.06
- Cognitive load: mean 2.25
- Usability: mean 3.50
- Satisfaction: mean 2.71
- Facilitating conditions: mean 2.92
- Self-efficacy: mean 3.08
- Behavioural intention: mean 3.00
- OUNL overall: mean 2.98
- Valid N (listwise): 4

OVT: OVT9.3
Pilot site: OUNL; pilot language: Dutch
Operational Validation Topic: Tutors attending a dissemination workshop give high scores to the question "How likely are you to consider adopting the service in your own educational practice?"
Summative results with respect to validation indicator
Tutors (dissemination), "How likely are you to adopt CONSPECT in your own educational practice?": mean 2.8, SD 1.36, 38% agree/strongly agree

Section 4: Results – validation activities informing future changes / enhancements to the system

VALIDATION ACTIVITY – Pilot partner: UNIMAN; service language: English
Additional formative results (not associated with validation topics)

Alpha testing – major changes identified during alpha testing but not yet implemented:
- Better handling of the creation of a group model is highly desirable.
- It is time-consuming to compare individual learner progress with the group as a whole – more use of summary data is required.
- The process of comparing conceptograms is not clear.
- The visualisation of conceptogram comparison is not clear – there is no key / legend identifying the meaning of the symbols.
- There are two manually configured thresholds but no indication of the number range, whether they should be integer or decimal, or the implications of these for the processing and outputs.
- When comparing two conceptograms, it is not immediately clear which is represented by which colour.
- There are too many stem words in the results.
- The terminology on the interface is not clear. The first screen the user sees has the options "feeds" and "concepts" at the bottom; they are not clear.
Something like "creation of conceptograms" (instead of "feeds") and "list of conceptograms" (instead of "concepts") might give a better idea. It is also not clear what the lists show; there is no description of what the interface is displaying.

Beta testing – major changes identified during beta testing but not yet implemented:
- The ability to upload any kind of file, not just RSS feeds.
- There are privacy issues; for example, student materials should not be freely available in a blog.
- It is not clear why and how the tool uses "tags".
- There is no simple way of comparing the progress of a learner over time. A time slider was discussed at previous development meetings as a means of providing this feature, but this has not yet been implemented.

Tutor interviews – major changes identified by tutors:
- It would be better to be able to upload any kind of file, not only feeds.
- Clicking directly on a concept should point to the place where the concept was found (perhaps highlighting all occurrences of the concept). At present it goes directly to the whole text, without any guidance.
- Conceptograms need to be easier to read, and should render in a way that stops them from moving about on the screen.
- The terminology used must be made more intuitive.
- Results don't provide guidance or direction for action.
- There are options in the interface that are useless for tutors or students, such as "GraphML Data".

Tutor workshop(s) – the major changes identified by tutors attending the workshop are that the results are at too high a level and not sufficiently detailed to provide a useful basis on which to give feedback. The threshold management and visualisation need to be improved to address these requirements, perhaps with a slider control as used in similar systems (Leximancer, for example).
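The conceptogram-comparison step that testers found unclear is, at its core, a report of shared and missing concepts between two models. The sketch below illustrates that principle only; the function name and structure are hypothetical, and CONSPECT's actual comparison also involves LSA-derived relations between concepts, not just concept labels.

```python
def compare_conceptograms(concepts_a: set, concepts_b: set) -> dict:
    """Report overlapping and missing concepts between two conceptograms.

    Hypothetical helper for illustration; not CONSPECT's implementation.
    """
    return {
        "shared": sorted(concepts_a & concepts_b),       # covered by both
        "only_in_a": sorted(concepts_a - concepts_b),    # e.g. learner-only
        "only_in_b": sorted(concepts_b - concepts_a),    # e.g. missing topics
    }

learner = {"selection", "variation", "heredity"}
reference = {"selection", "variation", "adaptation"}
result = compare_conceptograms(learner, reference)
# result["only_in_b"] lists the reference concepts the learner has not covered.
```

Labelling the three groups explicitly (rather than "graph 1" / "graph 2") is the kind of informative heading the testers asked for.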
Learner focus group – major changes identified during the focus group:
- Terms are unclear (stemmed terms) – the service needs to present complete words and make use of phrases, instead of presenting two words of the same concept as individual bubbles.
- The relationships between terms are not detailed enough to be useful – too high a level.
- Having access to the tutor blog is desirable for students.
- The group conceptogram is difficult to use – the terms are difficult to read and it is impossible to see an individual's connections. A simple way to address this might be to use colour to highlight an individual's connections when their name (square?) is clicked on.

Teaching manager interview: privacy and quality assurance issues need to be managed within the service to make it easily implementable in an undergraduate medical school context.

VALIDATION ACTIVITY – Pilot partner: OUNL; service language: Dutch
Additional formative results (not associated with validation topics)

Alpha testing – major changes identified during alpha testing but not yet implemented:
- Better handling of the LSA parameters is required; setting them is too complex and too time-consuming. Furthermore, once the parameters have been set and the graph generated, there is no way to go back directly and modify the parameters.
- When building a conceptogram that includes more than one blog post, the concepts in the concept list are ordered by post, not by their relevance across all the selected posts.
- Better handling of the creation of a group model would be highly desirable; it is very time-consuming to compare individual learner progress with the group as a whole – aggregated data is required.
- The comparison of more than two conceptograms is difficult to interpret: it shows rectangles, but it is not clear what they mean.
- It is possible to exclude some concepts from the graph, but it would also be desirable to be able to group concepts that mean the same thing for the user.
- When comparing two conceptograms, the table that shows the missing and overlapping concepts should have informative headings; the current headings ("graph 1") are not.
- The conceptograms present too many stem words, which can stand for different concepts in Dutch.

Beta testing – major changes identified during beta testing but not yet implemented:
- The level of detail the conceptogram shows is insufficient when the corpus is not specific to the topic. It can provide a general view, but tutors requested a deeper level of detail: "some terminology, that I really would have expected to turn up in the concept maps, did not turn up, even though these are terms that are repeated quite often through text inputs".
- The ability to upload any kind of file, not only feeds. There are privacy issues; for example, student materials should not be freely available in a blog.
- The dates and times of the creation of conceptograms are incorrect, and this creates confusion.
- It is not clear why and how the tool uses "tags".

Tutor interviews – major changes identified by tutors:
- A user interface designed for end users rather than from a technical perspective: "we use terms the teachers use, you use terms from your technology and our teachers don't, they know what a concept is, but it is quite different from yours. Tag, they don't know what a tag is"
- Better handling of the parameters: "that you have to set the parameters, but you do not comprehend the meaning of those parameters. So that should be turned around and presented in a tutor perspective, not in a technological perspective"; "now it is too complicated and the things you said about just, well the user interface problems and setting parameters that you don't understand.
It happens to take a lot of steps"
- It would be better to be able to upload any kind of file, not only feeds.
- Clicking directly on a concept should point to the place where the concept was found (perhaps highlighting all occurrences of the concept in the original text). At the moment it goes directly to the whole text, without any guidance.
- Two different modes, tutor and student; in tutor mode, tutors should be able to indicate the most important concepts in advance.
- Displaying the conceptogram and the list of concepts side by side on one screen.
- Percentages of completion (the percentage of concepts covered by one text versus the other text).
- Easy integration with the current University's VLE (Blackboard or Moodle).
- There are options in the interface that are useless for tutors or students, such as "GraphML Data".

Tutor workshop(s) – major changes identified by tutors attending the workshop:
- Indicate the relation between concepts (the semantic meaning of the links).
- Highlight the concepts in the original text.
- Zoom into the conceptogram: view the underlying concepts, links and additional texts.
- Show the importance of each concept in the whole graph (%).
- Comparison of links between conceptograms.

Learner focus group 1 – major changes identified during the learner focus group:
- Zoom into concepts to get more information about the concept, its relations with other concepts, etc.
- Highlight the concepts in the original text.
- Show the semantics of the links.
- Comparison of links (my relations vs. the tutor's relations in a conceptogram).
- Display the conceptogram side by side with the list of words.
- Be able to manipulate the graph: make groups of words, annotate the map.
- Compare only parts of the text, or only certain concepts from the reference model.
- Identify why a concept is displayed in the graph (whether it comes from the text or from the corpus only).
- In the concept list: show the major concepts, their value (%), and a tree structure showing the relation between each concept and the others.

Teaching manager interview:
- Integration with the institutional VLE (Moodle).
- The system must have a very easy, intuitive and friendly interface.
- The output the system provides must be fully proven, so that its quality is demonstrated.

Section 5: Results – validation activities informing transferability, exploitation and barriers to adoption

VALIDATION ACTIVITY – Partner(s) involved: UNIMAN; service language: English
Additional formative results (not associated with validation topics)

Alpha testing:
- Stemmed words – this issue must be addressed to enable tutors and students to see value in the service.
- Naming and time conventions must be improved.
- The service must permit different formats of document to be included.

Beta testing:
- Tutors are concerned about the use of blogs as input; there are fundamental concerns about how privacy would be managed in the system.
- Tutors required significant support to use the system and found it overly complex to access the information they required.
- Tutors found the service most useful for identifying outliers, but considered that the conceptogram output was not a sufficient basis on which to provide feedback; the terms were not relevant.
- Tutors found the OpenID login process unwieldy.

Tutor interviews / tutor workshop(s) – all tutors mentioned that:
- The tool is too complex for most people; the navigation and results presentation need to be simple to use.
- The setting of parameters is too complex, and with different blogs produced using different parameters, the comparability of the outputs is brought into question.
Tutors suggested alternative uses of the service, e.g.:
- Assessing Masters-level assignments on the MSc teaching certification course.
- Checking whether the learning materials tutors are creating contain the most important concepts.
- Assessing different resources for their relevance to the learning objectives.
Tutors were positive about using blogs with PBL: "One point I would like to make was that while I was working on the blog I found it an extremely useful way of looking that bit deeper into the case (even though i had recently covered this one it is amazing how quickly we move on to the next case), probably not a bad thing for a PBL tutor. If these blogs were to be produced for all cases in the future I think it would be extremely beneficial for them to be made available to tutors as well as students being able to compare their own blog. A group of tutors producing blogs may also be a good idea?"

Learner focus group 1: students would be more prepared to use this if it were simpler to log in – a single login, using their institutional details, for both the blog and the service was identified as desirable.

Teaching manager interview: teaching managers see the value of CONSPECT as a catalyst to encourage tutors to provide formative feedback, if the necessary processes and technologies were put in place. As learners responded positively to producing a blog for this activity, the results will be used to inform future decisions about the inclusion of such activities and specifications for the use of e-learning technologies in the new curriculum.
Workshop with Faculty e-learning experts: in answer to the question "How likely are you to adopt CONSPECT in your own educational practice?": mean 4.3 out of 5, SD 0.75, 83% agree/strongly agree (n = 18).

VALIDATION ACTIVITY – Partner(s) involved: OUNL; service language: Dutch
Additional formative results (not associated with validation topics)

Alpha testing – major issues encountered in transferring CONSPECT to Dutch:
- Stemming: too many stem words, which can stand for different concepts.
- The stop list should contain an adequate level of detail; otherwise the tool displays too many irrelevant words.
Major issues encountered in transferring CONSPECT to the Evolutionary Psychology domain:
- Specific corpus: a specific corpus on Evolutionary Psychology is not available in Dutch; most books and references are in English, but students learn and write in Dutch.

Beta testing:
- There was considerable resistance from tutors to the use of blogs as input, as they do not have time to maintain blogs as well as do their coursework. They are very concerned about privacy.
- Tutors were sceptical about trying the tool; it seemed too complex.
- Tutors had problems understanding what the graph was telling them.
- An alternative way of signing in besides OpenID is needed.

Tutor interviews – all tutors mentioned that:
- The tool is too complex; the interface should be simpler and easier to use.
- The setting of parameters is too complex.
- They would need to have the tool integrated into the other technologies they use (Moodle, Blackboard), and it has to be completely bullet-proof.
- Upload of any kind of file (Word, PDF); blogs are not widespread and have privacy issues.
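The Dutch stemming problem reported above – one stem standing for several different concepts – can be illustrated with a deliberately naive suffix-stripping stemmer. This is an illustration only, not the stemmer CONSPECT uses; the suffix list and word pair below are chosen purely for the example.

```python
# Illustrative only: a naive Dutch suffix-stripping stemmer, showing how
# distinct concepts can collapse onto one ambiguous stem.
SUFFIXES = ("ingen", "ing", "en", "e")  # hypothetical, highly incomplete list

def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least a 3-letter stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# "werken" (to work) and "werking" (mechanism, effect) are different
# concepts, yet both reduce to the same stem "werk" – exactly the
# ambiguity tutors complained about in the conceptograms.
assert naive_stem("werken") == naive_stem("werking") == "werk"
```

Presenting complete words or phrases instead of stems, as the focus groups requested, avoids this collapse at the display level.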
Tutors pointed out different new uses for the service, such as:
- Checking bachelor theses (comparing the traditional way of assessing them with the results from CONSPECT).
- Checking whether the learning materials tutors are creating contain the most important concepts.
- Generating a first outline (based on a set of input resources) from which to create study materials.
- Checking for plagiarism.
- Checking and comparing different study materials (books, references, etc.) to decide which of them best fits the learning objectives.
- Quality-checking learners' text materials: learners might be obliged to produce a conceptogram, to show they are producing learning evidence, before being able to access the model answers (currently, students get model answers automatically: if they type a possible answer they get the model, but most of them type only letters, without really writing any answer).
- Checking the internal coherence of texts, discourse and argumentation.
- Use in forums, to get a picture of the discussion and generate a conceptogram of a group model.

Tutor workshop(s): tutors identified the value of CONSPECT, but felt the tool was still too basic in terms of the semantic meaning of the links, and noted the need for interactive feedback on the links between concepts and relationships.

Learner focus group 1:
- Most students mentioned that the interface was not intuitive. One student mentioned that he liked the idea that the interface did not have too many options.
- Learners mentioned that they would not use the system if it were too time-consuming.
- Learners expressed concerns about the ethical use of the tool: that it could be used by tutors to derive marks for students' work.

Teaching manager interview: the manager indicated that the tool should be fully validated across different topics and courses.
The interface should be intuitive and integrated into the current VLE (Moodle).

Transferability questionnaire: institutional policies and practices
- UNIMAN is using the Blackboard platform; this is a hosted service and there is no opportunity to integrate CONSPECT. OUNL uses Blackboard and Moodle, and there is a strong expectation that any new service should be integrated into the University's VLE.
- UNIMAN has strict privacy and ethics policies, which govern the way in which student data is anonymised and shared with both other students and staff. The way the service handles student data needs to be amended to work with the existing privacy rules governing data held on the VLE. Permissions could be inherited from existing data, or would need to be determined manually, and the data needs to be anonymised automatically for the service to be made available across the institution.
- At OUNL, tutors and students want to keep their work private; they are not willing to share their texts with others on the web. The tool uses open feeds, and this raises privacy issues for students and tutors. Students do not want their texts published, and tutors do not want their materials freely available through a blog.
- Up to now, OUNL students have not received automatic feedback from a computer system. Tutors are afraid that students will not be willing to interact with a feedback system rather than with a tutor.
- UNIMAN staff and students do not use blogs; they undertook the writing of blogs as an additional activity to their normal workload. Participants found the process of writing blogs about their practice extremely useful, and the pilot provides positive evidence of the usefulness of this activity, which could be used to inform institutional practice in the future.
- OUNL tutors do not use blogs in their teaching practice. Students are not asked to use blogs during their studies.
- Tutors are not willing to incorporate any means outside the University's VLE. They want to be able to upload any kind of file, not only feeds.
- OUNL students are very pragmatic: they focus on getting the information and learning materials they need to study, and do not have time for any extra effort. The tool is still too complex by itself, particularly the parameter section. For a student, a single click to get the feedback seems the most desirable way to gain adoption.
- OUNL students are mostly adults and not digital natives (sometimes they have problems logging into the normal VLE site). The tool requires substantial digital knowledge, for example understanding what an OpenID, a blog or a feed is.
- OUNL tutors are willing to incorporate new tools only if they are designed using their own vocabulary and terms. The tool is still quite technically oriented, and contains a number of terms, parameters and items of information that tutors do not understand. For instance, the parameter section is a particular problem, and the "GraphML Data" option is not useful for tutors.

Transferability questionnaire: relevance of the service in other pedagogic settings
Pedagogic settings for which the service would be suitable (the service was initially developed for Problem Based Learning):
- Any setting in which learners produce text materials based on their knowledge of a given domain. Reasons: students can use the service to compare their written materials to materials representing a desired level of domain knowledge, in any area for which the tool has been primed with an appropriate LSA space.
Pedagogic settings for which the service would be less suitable:
- Any setting in which learners work principally with non-text-based materials, and where they are not required to write about their learning of such. Reasons: text-based learning materials are the input for the service.
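The core operation an "LSA space" supports – measuring how close a learner's text is to reference material – can be sketched in highly simplified form. Real LSA first projects word counts into a lower-dimensional concept space via singular value decomposition over a domain corpus; the sketch below substitutes a plain bag-of-words cosine similarity, so it illustrates the principle of text closeness only, not CONSPECT's actual pipeline.

```python
import math
from collections import Counter

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Bag-of-words cosine similarity between two texts.

    Simplified stand-in: LSA would compare the texts in a reduced
    concept space, so near-synonyms would also count as close.
    """
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

reference = "natural selection acts on heritable variation in a population"
learner = "selection acts on variation that is heritable"
score = cosine_similarity(reference, learner)  # 0 = no overlap, 1 = identical
```

A score like this is what would let the service report, for instance, the percentage of a reference model a learner's text covers – one of the features tutors requested.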
Transferability questionnaire: Relevance of the service in other domains Types of domain Reason(s) Types of domain for which the service would be suitable: setting 1: Any domains where the primary discourse for assessment is text-based. (no images, no formulas, no procedural knowledge): literature, psychology, education, social and human sciences. reasons: LSA analyses words and relations in language, to establish the closeness of concepts. Types of domain for which the service would be less suitable: setting 1: Domains where knowledge is practical, procedural. E.g. engineering, mechanics setting 2: Domains where knowledge is subjective and locally contextualised, where no corpus in the desired language is reasons: CONSPECT could identify that learners knew what terms meant, e.g. from a glossary, but would not be best placed to assess their knowledge of assembly / execution of tasks. Page 108 of 349 D7.4 - Validation 4 available. setting 3: Domains such as mathematics or chemistry where much of the information is symbolic. Page 109 of 349 D7.4 - Validation 4 Section 6: Conclusions Validation Topics OVT Operational Validation Topic Validated unconditionally Validated with qualifications* Not validated Qualifications to validation PVT1: Verification of accuracy of NLP tools OVT1.1 Tutors have assessed that a high percentage of the concepts identified by CONSPECT are relevant to the task the learners have undertaken. UNIMAN Technical terms are stemmed and need to be represented as complete words to improve accuracy. This would improve the clarity of the results. The detail showed in the conceptogram is not deep enough to represent the text. However, the graph does include relevant concepts. OVT1.2 Tutors have calculated that a high proportion of concepts have been correctly categorised as important. 
[UNIMAN, OUNL] Level and detail of reporting needs further refinement (UNIMAN). The stemming algorithm is not of sufficient quality for the Dutch language: the same stem could refer to different concepts, and the same concept could have several stems. This makes it difficult to understand the graph and the list of concepts (OUNL).
OVT1.3 Tutors have assessed that CONSPECT has provided appropriate linkages between concepts for most of the relevant conceptual relations. [UNIMAN] Level and detail of reporting needs further refinement.
OVT1.4 Tutors have assessed that most of the concepts have been correctly categorised as important. [UNIMAN, OUNL] Level and detail of reporting needs further refinement.

PVT2: Tutor efficiency
OVT2.1 Using CONSPECT, tutors spend less time preparing feedback than without the system. [UNIMAN, OUNL] It takes more time to provide feedback.
OVT2.2 It is easier (there is less cognitive load) for tutors to provide feedback using CONSPECT compared with not using the system. [UNIMAN, OUNL] There is a higher cognitive load.

PVT3: Quality and consistency of (semi-)automatic feedback or information returned by the system
OVT3.1 Tutors judge that CONSPECT shows correctly the conceptual coverage of a given topic. [OUNL, UNIMAN] The complexity of the topic used in the validation was too high for CONSPECT to draw useful results from. Tutors indicated that conceptograms show important concepts, but others that are also relevant are not shown in the graph.
OVT3.2 Tutors agree with the learner progress shown by CONSPECT. [OUNL, UNIMAN] Tutors perceive that the information is potentially useful, but the ambiguity of stemmed
words was problematic.
OVT3.3 Students feel the feedback provided supports them in adapting their learning plans. [OUNL, UNIMAN]
OVT3.4 Students agree with the feedback provided. [UNIMAN] Students found the process useful, but the presentation of results was unclear (UNIMAN). Students indicated that they found the outputs of CONSPECT relevant, but that they would like more information about the concept itself and the meaning of its relations (OUNL).

PVT4: Making the educational process transparent
OVT4.1 Students are able to position themselves whenever they want using CONSPECT.
OVT4.2 Using CONSPECT, tutors are able to assess the conceptual progress of their students based on their reflective documents. [UNIMAN, OUNL]
OVT4.3 Tutors are able to locate the outliers within their groups. [UNIMAN] Tutors were able in some cases to identify outliers; no relevant evidence from OUNL.
OVT4.4 Tutors are able to provide extra support for the problematic outliers during the learning process. [UNIMAN, OUNL]

PVT5: Quality of educational output
OVT5.1 Students are able to use the feedback given during the writing process to help them improve their final texts. [UNIMAN] Students felt they were able to identify some concepts on which they were then able to improve their texts (UNIMAN).

PVT6: Motivation for learning
OVT6.1 Students find the feedback provided encourages them to undertake further study. [UNIMAN]
OVT6.2 Students are confident that they are aware of those aspects of their conceptual coverage that are strong. [UNIMAN] Seeing that others had covered similar topics helped to positively reinforce learners' confidence.
OVT6.3 Students who use CONSPECT feel better informed about their own learning.
[UNIMAN] Seeing that others had covered similar topics helped to positively reinforce learners' confidence. [OUNL] Students find the feedback useful, but there is no clear indication that the feedback provided encourages them to undertake further study; this has to be validated further.

PVT7: Organisational efficiency
OVT7.1 There is a saving in institutional resources overall. N/A: there is no evidence to support a saving in institutional resources. Students are not presently provided with formative feedback on written materials of the format used within the pilot.

PVT8: Relevance
OVT8.1 The service meets one or more institutional objectives. [UNIMAN, OUNL] Tutors could identify new potential uses of the tool, identify the benefits of automatically processing information, and see how the tool could be used in the future plans of the university (UNIMAN). A manager indicated that the tool helps to provide continuous formative feedback, which might help to motivate students and therefore reduce the drop-out rate, one of the institutional objectives (OUNL).

PVT9: Likelihood of adoption
OVT9.1 Users were motivated to continue to use the system after the end of the formal validation activities.
OVT9.2 A high score was obtained in the generic questionnaires (based on UTAUT: likelihood of adoption by users).
OVT9.3 Tutors attending a dissemination workshop give high scores to the question "How likely are you to consider adopting the service in your own educational practice?" [UNIMAN] Tutors were enthusiastic about the potential of CONSPECT but not its current implementation.
Scores: OVT9.2 – UNIMAN mean 3.13, OUNL mean 2.98; OVT9.3 – OUNL mean 2.8.

Exploitation (SWOT Analysis)

The objective you are asked to consider is: "CONSPECT (v1.5) will be adopted in pedagogic contexts beyond the end of the project".

Strengths
The strengths of the system (v1.5) that would be positive indicators for adoption are:
- The system provides formative feedback on demand to learners. This is a major institutional objective in both pilot institutions.
- The system provides a novel means by which to analyse a text document and establish the presence and coverage of some of the key concepts that relate to a background corpus. Providing this information manually is time-consuming and resource-intensive, specifically in learning environments with large numbers of learners who are required to produce text and interactions (blogs, forums).
- The conceptual basis of the system was positively received by all stakeholders.

Weaknesses
The weaknesses of the system (v1.5) that would be negative indicators for adoption are:
- Care should be taken that the tool produces output with an adequate level of detail (i.e. concepts specific to the topic, not general concepts).
- At present, the system only takes RSS feeds as input. Interoperability with other components of the VLE is needed to market it to potential vendors.
- The interface is too complex and not user-oriented. Tutors expect that students will not have enough knowledge and skills to use the tool.
- The process and interface support for accessing and comparing user outputs need further improvement.

Opportunities
The system has potential as follows:
- The system has the ability to assist tutors in identifying outliers quickly. Tutors were keen on this aspect of the service.
- The system can be used to extract the main points from reading materials quickly, so that tutors/learners can see whether a long text is relevant to their practice or current study plans.
- The system can be used to check whether or not students' inputs are sufficient to get a "model answer" from the institutional VLE.
- The system could be used to extract the main points from a discussion forum, so that tutors/learners can see the topics of discussion and gain a view of what the group is discussing.

Threats
- Both institutions reported that the software would be unlikely to be adopted because it is not part of the institutional VLE. The institutions are standardising on major corporate software in order to reduce maintenance costs.
- Trust: tutors and learners do not have confidence in the results of CONSPECT.

Overall conclusion regarding the likelihood of adoption of CONSPECT Version 1.5:
CONSPECT shows potential for adoption in other institutions to meet the major institutional objectives of feedback on demand and personalised support for learners. The ability to check whether students are able to produce a piece of text from the course, and to identify the percentage of students who are not considering key terms, is attractive. However, the service requires further enhancement to realise its potential.

Most important actions to promote adoption of CONSPECT:
- Solve interface issues: the interface should be user-centered. Minimise learner/tutor input (particularly the LSA parameters).
- Provide guidance for interpreting the conceptograms, to assist users' understanding of them.
- Recognise the importance and impact of the corpus (detail, language) and of the thresholds required to produce useful outputs.
- Test integration with VLEs (e.g. Moodle, Blackboard, Elgg).
- Provide aggregated statistical reports on learner progress.

Section 7 – Road map

Based on the results and conclusions from validation, the LTfLL team has agreed that the following are the most important future enhancements to the system in order to meet stakeholder requirements:
Most important:
1. Present complete words and meaningful phrases to be of use to students and tutors.
2.
Handle multiple input formats.
3. Provide a simple time-based view of an individual's conceptual evolution.
4. Improve the visualisation of comparisons of conceptogram models to make it simpler to identify specific individuals.

Based on the results and conclusions from validation, the LTfLL team has agreed that the following are the most important changes to the current scenario(s) of use in order to meet stakeholder requirements:
Most important:
1. CONSPECT will be of greater value in lower-level courses where the knowledge requirements are based on fewer topics with greater focus.
2. Users should maintain a blog over the duration of a module, not simply over the duration of service use, to provide a richer dataset for analysis by the service.

Based on the results and conclusions from validation, the LTfLL team has agreed that the following are possible additional educational contexts for future deployment:
Most important:
1. The system could be used to extract the main points from a discussion forum, so that tutors/learners can see the topics of discussion and have an overview of what the group has been discussing.
2. The system could be used to support collaborative writing: students first write their own texts, then create conceptograms to compare with those of others, as well as a group model; this serves as input for discussion and for going further with a new version of the text.
3. The system could be used to analyse data collected from research (focus groups, interview transcripts) and extract the key concepts mentioned in these data.

Based on the results and conclusions from validation, the LTfLL team has agreed that the following are the most important issues for future technical research to enable deployment of language technologies in educational contexts:
Most important:
1. Understanding how language relating to process and procedural knowledge can be analysed meaningfully.
2.
Understanding how language analysis of discourse about entities and their attributes differs from analysis of language about processes and procedural knowledge.
3. Identifying how student discourse about their skills and competence can best be captured in formats that lend themselves to language pattern analysis.
4. Understanding how these technologies are applied to the learner in his/her knowledge domain, and how the outputs can best be formulated to allow him/her to plan his/her learning trajectory.

Roadmap – validation activities
Further validation planned for beyond the end of the project:

Claim (OVT): Tutors have identified that a conceptogram displays mainly the most relevant concepts in the input text; stem words and irrelevant words are not predominant in the graph.
Methodology: 1. Comparison of conceptograms against the input text. 2. Tutor interview.

Objective (OVT): Students find that the feedback provided helps them to understand the topic better and encourages them to undertake further study.
Methodology: 1. Thematic analysis of students' behaviour within the VLE or system (experimental and control groups). 2. Test examination (marks of experimental and control groups). 3. Student questionnaire.

Objective (OVT): Tutors feel that the system reduces their workload and that they can focus on problematic outliers during the learning process.
Methodology: 1. Tutor questionnaire. 2. Student questionnaire.

Appendix B.4 Validation Reporting Template for WP5.1 (PUB-NCIT and UNIMAN)

Section 1: Functionality implemented in Version 1.5 and alpha / beta-testing record

Brief description of functionality / Version number of unit / Changes from Version 1.0:

Tutor view: Assignment Maintenance (add / define, edit, delete) – V1.5
- Added forum assignments functionality – did not previously exist.
- Added the ability to import an LSA space from the R-LSA framework – did not previously exist.
Thus new spaces were created for the Medicine pilot and the Long Thread validation on Web 2.0; other new spaces are easy to import and use.

Tutor view: Conversation Maintenance (add / upload, edit / define, delete, process / analyze) – V1.5
- Added the possibility to process CSV files that contain transcripts of discussion forums.

Student and tutor view: Conversation Feedback – V1.5
- Improved functionality by adding suggested concepts from the semantic space that are missing from the conversation, in order to give the learners a suggestion on what to study next.
- Implemented clustering of concepts to provide more accurate concept groups, in order to display to the users only the most relevant concepts from each cluster (especially for missing concepts).
- Improved the appearance of the feedback (changed labels, added headings) by taking into consideration the results from the v1.0 validation round.

Student and tutor view: Conversation Visualization – V1.5
- Adapted the visualization to be able to display discussion forum threads (a lower number of posts than for chats and a greater number of participants).

Student and tutor view: Utterance Feedback – V1.5
- Added new functionality for labelling the posts in a discussion forum using Garrison and Anderson's "Model of Inquiry", to separate posts that exhibit teaching, social and cognitive presence.
- The analysis of v1.0 showed that the grading of the utterances was problematic because the content score was not high enough, and therefore less important utterances received greater scores. The proposed solution was to modify the grading algorithm by taking into account the scores assigned by tutors to utterances in the v1.0 validation and adjusting the weight of each scoring component (content score, social score, utterance structure score, etc.). The new results show an improvement.
- Changed the linguistic patterns for several speech acts in order to increase the accuracy of speech act identification when evaluated against the gold-standard corpora.

Student and tutor view: Participant Feedback – V1.5
- The analysis of v1.0 showed that the grading of participants on the on-topic (content) score was problematic, because some participants may have used important concepts drawn from only one part of the semantic space. The proposed solution was to add a bonus factor for using important concepts that are not in the same cluster of the semantic space.
- Improved the social score to also take into account implicit links, but with a lower importance than explicit links.

Student and tutor view: Search Conversation – V1.5
- Improved the ranking of search results by making use of the concept clustering ability. This way, each keyword in the query is expanded only with the terms in its cluster, reducing the search time by a factor of more than 10 (especially important for large semantic spaces).

Alpha-testing
Pilot site and language: PUB-NCIT, English
Date of completion of alpha testing: October 15th, 2010
Who performed the alpha testing? Traian Rebedea, Mihai Dascalu

Pilot site and language: UNIMAN
Date of completion of alpha testing: 7 October 2010
Who performed the alpha testing? Alisdair Smithies, Isobel Braidman

Beta-testing
Pilot site and language: PUB-NCIT, English
Has stakeholder validation taken place using the service embedded in Elgg (Yes/No/Partially): No
If 'No' or 'Partially', give reasons: We preferred to run the pilot using the stand-alone version of PolyCAFe, as this seemed more suitable for the users (e.g. larger screen space, no information other than the PolyCAFe widgets is presented). The Long Thread uses the Elgg version of PolyCAFe's widgets, which have also been alpha- and beta-tested by the same people.
Beta-testing performed by: Iulia Pasov (student), Claudiu Mihail (student), Costin Chiru (tutor), Alexandru Gartner (tutor)
Beta-testing environment (stand-alone service / integrated into Elgg): stand-alone service
HANDOVER DATE: October 26th, 2010 (date of handover of software v1.5 for validation)

Pilot site and language: UNIMAN, English
Has stakeholder validation taken place using the service embedded in Elgg (Yes/No/Partially): No
If 'No' or 'Partially', give reasons: We preferred to run the pilot using the stand-alone version of PolyCAFe, hosted by PUB, as this is more appropriate for the users. The Long Thread uses the Elgg version of PolyCAFe.
Beta-testing performed by: Zara Sandiford (student), Santosh Tadi (student), Maria Regan (tutor)
Beta-testing environment (stand-alone service / integrated into Elgg): stand-alone service, hosted by PUB
HANDOVER DATE: November 5th, 2010 (date of handover of software v1.5 for validation)

Section 2: Validation Pilot Overview
NB Information about pilot sites, courses and participants has been transferred to Appendix A.3.

Pilot task
Pilot site: PUB-NCIT
Pilot language: English

What is the pilot task for learners and how do they interact with the system?
The learners are divided into groups of 5 students (5 experimental and 2 control groups) and are given two successive chat assignments related to Human-Computer Interaction, to debate using ConcertChat. The experimental groups were asked to use PolyCAFe to get feedback for each assignment. The control groups did not use PolyCAFe for the first assignment; using PolyCAFe for the second assignment is not mandatory, so these learners may use PolyCAFe only if they think it would be useful for them. The two topics for the assignments are:
- A debate about the best collaboration tool for the web: chat, blog, wiki, forums and Google Wave.
Each student shall choose one of the 5 tools and present its advantages and the disadvantages of the other tools. Thus, you will act as a "salesperson" for your tool and try to convince the others that you have the best offer (act as a marketer – http://www.thefreedictionary.com/marketer). You must also defend your product whenever possible and criticise the other products if needed.
- You are on the decision board of a company that plans to use collaborative technologies for its activities. Each of you has studied the advantages and disadvantages of the following technologies that are being considered by the company: chat, blog, wiki, forums and Google Wave. Engage in a collaborative discussion in order to decide for which activities each technology should be used. You should give the best advice for the technology that you support and convince the others to use it. The result of this discussion should be a plan for using these technologies so as to obtain the best outcomes for your company. You can also think of other useful technologies besides these, but do not insist on them.

What do the learners produce as outputs? Are the outputs marked?
The learners produce as outputs two chat conversation logs for each group. These outputs do not count towards the mark for the HCI course, as not all the students attending the course participated in the validation experiment. However, the chat conversations are marked by the tutors for the verification experiment.

How long does the pilot task last, from the learners starting the task to their final involvement with the software?
The pilot task runs for about a month: November 11th – December 10th, 2010.

How do tutors/student facilitators interact with the learners and the system?
The tutors define the assignments in PolyCAFe and the list of relevant topics for each assignment.
The tutors provide manual feedback to each of the students involved in a chat conversation for the first assignment. Each tutor assesses one conversation without using PolyCAFe and one conversation using PolyCAFe. No manual feedback is provided for the second assignment, only the feedback provided by PolyCAFe. Each tutor uses PolyCAFe to help them assess and provide final (manual) feedback on 2-3 chat conversations.

Describe any manual intervention of the LTfLL team in the pilot: There are no manual interventions in the pilot.

Pilot site: UNIMAN
Pilot language: English

What is the pilot task for learners and how do they interact with the system?
The learners are divided into groups of 7 or 8 students (4 experimental and 2 control groups) and are given a forum assignment on debating professional practice in Medicine. They are then asked to use PolyCAFe and get feedback (the two control groups did not use PolyCAFe initially; they received the results of the PolyCAFe analysis at the end of the discussion).

What do the learners produce as outputs? Are the outputs marked?
The learners produce as outputs a discussion forum for each group. These outputs do not count towards the mark for the course, as not all the students attending the course participated in the validation experiment. All students participate in the discussions, but only a small sample of groups has been used for the pilot. Formal feedback is not normally provided on the forums and they are not marked by a teacher/tutor; they are moderated by a student facilitator, who is trained in online facilitation techniques. The activity of assessing the forums using PolyCAFe and viewing the feedback it generates is therefore an additional task within the learning environment.

How long does the pilot task last, from the learners starting the task to their final involvement with the software?
The pilot task runs for a month: November 15th – December 15th, 2010.
How do tutors/student facilitators interact with the learners and the system?
The facilitators guide a discussion about professionalism. All students participate in this activity, with student facilitators leading each group. The facilitators participating in the pilot use the feedback provided by PolyCAFe to direct their guidance, and can choose to share the outputs with the students involved in a discussion via email.

Describe any manual intervention of the LTfLL team in the pilot: Compilation of the forums into spreadsheets, adapting outputs into a format that can be processed by PolyCAFe, and distributing the results of analysis via email to student facilitators.

Experiments
Name of experiment: Experiment A – PUB-NCIT
Objective(s): Determine the relative quality of the manual feedback provided by tutors with and without using PolyCAFe.
Details: Each chat conversation for the first assignment is provided with manual feedback from 4 tutors: 2 of them using PolyCAFe and 2 without. Afterwards, the tutors decide which manual feedback is better (i.e. the feedback informed by PolyCAFe or the feedback written without it) using a set of common indicators: quality of the feedback related to participation and collaboration, quality of the feedback related to the content of the conversation, and coverage of the feedback.

Name of experiment: Experiment B – PUB-NCIT
Objective(s): Determine the quality of the participant grading provided by PolyCAFe.
Details: Each tutor who does not use PolyCAFe for giving manual feedback for a particular chat conversation (see Experiment A: for each conversation, 2 tutors provided manual feedback without PolyCAFe), as well as each student, provides a ranking in order of merit of the participants in the chat conversation they attend, considering (1) content, (2) collaboration and participation, and (3) overall merit. The ranking orders produced by tutors and learners were compared with the ones provided by PolyCAFe.
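The ranking comparison in Experiment B can be sketched as follows. This is a minimal illustration with invented rankings, not the project's code or data; the report does not define its "precision" and "average distance" metrics exactly, so plausible stand-ins are used here (exact-position agreement and mean absolute rank difference), together with Spearman rank correlation.

```python
from statistics import mean

# Hypothetical rankings of 5 chat participants (1 = best), standing in for
# the manual rankings and the system ranking; not the pilot's real data.
manual_rankings = [
    [1, 2, 3, 4, 5],   # student rater A
    [2, 1, 3, 5, 4],   # student rater B
    [1, 3, 2, 4, 5],   # tutor
]
system_ranking = [1, 2, 3, 5, 4]

# Aggregate the manual rankings by averaging each participant's rank,
# then re-rank participants by that average (as described for Experiment B).
avg = [mean(r[i] for r in manual_rankings) for i in range(5)]
order = sorted(range(5), key=lambda i: avg[i])          # participant ids, best first
agg_rank = [order.index(i) + 1 for i in range(5)]       # aggregated rank per participant

def spearman(r1, r2):
    """Spearman rank correlation for rankings without ties."""
    n = len(r1)
    d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Stand-in metrics (assumed definitions, not taken from the report):
precision = sum(a == b for a, b in zip(agg_rank, system_ranking)) / 5
avg_distance = mean(abs(a - b) for a, b in zip(agg_rank, system_ranking))
correlation = spearman(agg_rank, system_ranking)
```

With the toy data above, the aggregated manual ranking is [1, 2, 3, 4, 5], giving an exact-position agreement of 0.6, a mean absolute rank distance of 0.4, and a Spearman correlation of 0.9 against the system ranking.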
Thus, tutors (6) and students (35) manually ranked the participants in each of the 7 chat conversations for the first assignment. For each chat conversation there were 5 rankings from the students, plus two from the tutors who did not use PolyCAFe for providing manual feedback. The average ranking for each participant in a conversation was then computed and compared with the one provided by PolyCAFe for content and social impact.

Section 3: Results – validation/verification of Validation Topics

OVT: OVT1.1
Pilot site: PUB-NCIT
Pilot language: English
Operational Validation Topic: The tutors/experts find that the speech acts discovered in the conversation (chat or forum) are correct.
Summative results with respect to validation indicator
Stakeholders / methodology: Tutors (2) manually annotated two chat conversations with speech acts. Precision and recall were computed for each speech act.
Results:
Average speech act class precision: 85%
Average speech act class recall: 70%
The following table contains the precision and recall for each speech act class.

Speech act (label)   Precision   Recall
Continuation         93%         92%
Statement            94%         93%
Greeting             100%        80%
Accept               92%         80%
Partial accept       71%         55%
Agreement            90%         51%
Understanding        96%         58%
Negative             97%         78%
Reject               73%         82%
Partial reject       35%         27%
Action directive     75%         90%
Info request         100%        71%
Thanks               100%        100%
Maybe                100%        69%
Conventional         66%         50%
Personal opinion     100%        36%
Sorry                66%         75%

OVT: OVT1.1
Pilot site: UNIMAN
Pilot language: English
Operational Validation Topic: The tutors/experts find that the speech acts discovered in the conversation (chat or forum) are correct.
Summative results with respect to validation indicator
Stakeholders / methodology: Tutors (2) manually annotated three sets of results from PolyCAFe.
Relevance of the words reported for the topics covered in the forum posts was assessed on a scale of 1 to 10, 10 being most relevant to the conversation.
Results:
37% (frequency of concepts – of the words reported, 37% were appropriate)
33% (topics detected that were relevant to the conversation)
17% (relevant noun topics)

OVT: OVT1.2
Pilot site: UNIMAN
Pilot language: English
Operational Validation Topic: The tutors/experts find that the labels corresponding to Garrison's community of inquiry model in a forum are correct. FORUMS ONLY
Summative results with respect to validation indicator
Stakeholders / methodology: Tutors (2) manually annotated three sets of results from PolyCAFe. Accuracy of the feedback reported for the Garrison Community of Inquiry model was assessed on a scale of 1 to 10, 10 being most relevant to the post assessed.
Results:
63% of the individual CoI feedback was considered an accurate categorisation.
Information about collaboration was 50% correct; there were some instances of a "BAD" result reported where progress was good.

OVT: OVT1.3
Pilot site: PUB-NCIT
Pilot language: English
Operational Validation Topic: The tutors/experts find that the scores assigned to the utterances are correct.
Summative results with respect to validation indicator
Stakeholders / methodology: The tutors (3) manually annotated two chat conversations from the first assignment with scores (from 1-4) for each utterance. In order to be able to compute the inter-rater agreement, one chat conversation was annotated by two tutors.
Results:
Chat 1 (331 utterances):
Tutor 1 – Tutor 2 (inter-rater) correlation: 61%
Tutor 1 – PolyCAFe correlation: 60%
Tutor 2 – PolyCAFe correlation: 41%
Tutor average – PolyCAFe correlation: 57%
Chat 2 (277 utterances):
Tutor – PolyCAFe correlation: 55% (no inter-rater data)

OVT: OVT1.4
Pilot site: PUB-NCIT
Pilot language: English
Operational Validation Topic: The tutors/experts find that the scores assigned to the participants for a given concept and globally are correct.
Summative results with respect to validation indicator
Stakeholders / methodology: Experiment B – PUB-NCIT
Results: The average precision of the ranking provided by PolyCAFe is 67% against the students (computed for all 35 students, in 7 different conversations) and 77% against the tutors (computed on all 7 conversations). The average distance between the manual ranking and PolyCAFe's ranking is 0.43 when compared to the students' rankings and 0.23 when compared to the tutors' ranking. The following table shows the correlation, precision and average distance between the rankings provided by the students, the tutors and PolyCAFe.

Rankings compared    Correlation   Precision   Average distance
Tutors – System      94%           77%         0.23
Students – System    84%           66%         0.43
Tutors – Students    84%           71%         0.40

OVT: OVT1.5
Pilot site: PUB-NCIT
Pilot language: English
Operational Validation Topic: The tutors/experts find that PolyCAFe correctly identifies the important (relevant) concepts in the conversation.
Summative results with respect to validation indicator
Stakeholders / methodology: The tutors (3) manually annotated three chat conversations from the first assignment with the 10 most important concepts and the next 10 important concepts (if any).
Results: Comparing the concepts provided by the tutors for the three chats with the 30 most important topics determined by PolyCAFe gives a precision of 18/28 = 64%. There are no inter-rater agreement data, as not all the tutors completed this task.

OVT: OVT2.1
Pilot site: PUB-NCIT
Pilot language: English
Operational Validation Topic: Tutors/facilitators spend less time preparing feedback for learners compared with traditional means.
Summative results with respect to validation indicator
Stakeholders / methodology: Time measurements for preparing the feedback: each chat conversation was analysed and given feedback by 4 tutors, 2 using PolyCAFe and 2 not using the system. Tutor questionnaire.
Results:
  Average time needed to prepare feedback without PolyCAFe: 84 minutes (standard deviation: 15 minutes)
  Average time needed to prepare feedback with PolyCAFe: 55 minutes (standard deviation: 20 minutes)
  Average time saved = (84 – 55) / 84 = 35%

Questionnaire results (Tutors, experimental group, n = 6; percentages are agree/strongly agree):
  7. It takes less time to complete my teaching tasks using PolyCAFe than without the system. – mean 4.5, SD 0.84, 83%
  8. Using PolyCAFe enables me to work more quickly than without the system. – mean 4.5, SD 0.84, 83%
  9. I do not wait too long before receiving the requested information. – mean 4.8, SD 0.41, 100%
  10. PolyCAFe provides me with the requested information when I require it (i.e. at the right time in my work activities). – mean 4.7, SD 0.52, 100%
  34. I spend less time preparing feedback to learners than without the system. – mean 4.5, SD 0.84, 83%
  35. I find that using PolyCAFe is a very time-efficient way of providing feedback. – mean 4.3, SD 0.52, 100%
  36. I find the information needed to write the feedback for the learners more quickly using PolyCAFe than without it. – mean 4.7, SD 0.52, 100%

Formative results with respect to validation indicator
  Tutors: One tutor called PolyCAFe a "very useful tool in reducing time", and all the others agreed. "It is easier to assess collaboration, involvement and determine the most important concepts and parts of the conversation."

OVT: OVT2.1
Pilot site: UNIMAN
Pilot language: English
Operational Validation Topic: Tutors/facilitators spend less time preparing feedback for learners compared with traditional means.
Summative results with respect to validation indicator
Questionnaire results (Tutors, experimental group, n = 5; percentages are agree/strongly agree):
  7. It takes less time to complete my teaching tasks using PolyCAFe than without the system. – mean 2.0, SD 0.71, 0%
  8. Using PolyCAFe enables me to work more quickly than without the system. – mean 2.0, SD 0.71, 0%
  9. I do not wait too long before receiving the requested information. – mean 3.0, SD 0.00, 0%
  10. PolyCAFe provides me with the requested information when I require it (i.e. at the right time in my work activities). – mean 2.8, SD 0.45, 0%
  34. I spend less time preparing feedback to learners than without the system. – mean 2.4, SD 1.14, 20%
  35. I find that using PolyCAFe is a very time-efficient way of providing feedback. – mean 2.6, SD 1.14, 20%
  36. I find the information needed to write the feedback for the learners more quickly using PolyCAFe than without it. – mean 2.6, SD 1.14, 20%

Formative results with respect to validation indicator
  Student Facilitators: The facilitators felt that PolyCAFe presented them with too much information and required a lot of interpretation to make sense of the results. They felt that the context of the discussion threads had been lost in the way the results were presented, due to the anonymization process. "There's a lot of information but it's not clear which… what it means, how do I use it? What changes do I need to make to the guidance I'm giving to help people improve their contributions? I can't find that out without reading through all the information… I seriously think it will probably take longer than if I just look at the forums themselves"

OVT: OVT2.2
Pilot site: PUB-NCIT
Pilot language: English
Operational Validation Topic: It is easier (there is less cognitive load) for tutors/facilitators to provide feedback using PolyCAFe compared with just reading the learners' online conversations.
Summative results with respect to validation indicator
Questionnaire results (Tutors, experimental group, n = 6; percentages are agree/strongly agree):
  11a. Please rank on a 5-point scale the mental effort (1 = very low mental effort; 5 = very high mental effort) you invested to accomplish teaching tasks using PolyCAFe. – mean 3.0, SD 0.63, 16%
  11b. Overall, using the system requires significantly less mental effort to complete my teaching tasks than when using normal chat transcripts. – mean 4.8, SD 0.41, 100%
  37. I find it easier to analyze a chat conversation using PolyCAFe than without it. – mean 4.8, SD 0.41, 100%
  38. There is a lot of information in the conversation that I cannot process without PolyCAFe. – mean 4.0, SD 0.63, 83%
  39. I find that the discussion threads and their inter-animation are difficult to follow without PolyCAFe. – mean 4.8, SD 0.41, 100%

Formative results with respect to validation indicator
  Tutors: "The cognitive load is much higher when not using PolyCAFe because the task is very difficult" "It would be much more difficult not to use PolyCAFe"

OVT: OVT2.2
Pilot site: UNIMAN
Pilot language: English
Operational Validation Topic: It is easier (there is less cognitive load) for tutors/facilitators to provide feedback using PolyCAFe compared with just reading the learners' online conversations.
Summative results with respect to validation indicator
Questionnaire results (Student Facilitators, experimental group, n = 5; percentages are agree/strongly agree):
  11a. Please rank on a 5-point scale the mental effort (1 = very low mental effort; 5 = very high mental effort) you invested to accomplish teaching tasks using PolyCAFe. – mean 3.8, SD 0.84, 60%
  11b. Overall, using the system requires significantly less mental effort to complete my teaching tasks than when using normal chat transcripts. – mean 2.8, SD 1.30, 40%
  37. I find it easier to analyze a chat conversation using PolyCAFe than without it. – mean 2.6, SD 1.34, 40%
  38. There is a lot of information in the conversation that I cannot process without PolyCAFe. – mean 2.4, SD 0.89, 0%
  39. I find that the discussion threads and their inter-animation are difficult to follow without PolyCAFe. – mean 2.2, SD 0.84, 0%

Formative results with respect to validation indicator
  Student facilitators: The facilitators found that the results of the analysis took a lot of effort to interpret compared with following the threads.
It was difficult to see which feedback related to which thread, and to make sense of the figures returned. "It's not clear how to address the feedback, how do I act on it?" The numbers allocated had no reference attached to them against which to gauge what they meant. The groups comprised up to 8 students; the facilitators felt that the PolyCAFe service would be more appropriate in a larger group setting. "the visualisation quickly gives a clear idea of who's participating but that's clear anyway because the group is quite small"

OVT: OVT3.1
Pilot site: PUB-NCIT
Pilot language: English
Operational Validation Topic: Tutors/facilitators perceive that the feedback received from the system helps them prepare feedback for learners.
Summative results with respect to validation indicator
Questionnaire results (Tutors, experimental group, n = 6; percentages are agree/strongly agree):
  6. The information the system provides me is accurate enough for helping me perform my teaching tasks. – mean 3.8, SD 0.41, 83%
  40. PolyCAFe provides feedback that is relevant to my preparation of learner feedback. – mean 4.2, SD 0.75, 83%
  41. PolyCAFe provides feedback that is useful to my preparation of learner feedback. – mean 4.5, SD 0.84, 83%
  42. PolyCAFe's feedback is sufficiently accurate to inform my feedback. – mean 3.8, SD 0.41, 83%
  43. I trust PolyCAFe to provide helpful feedback. – mean 4.3, SD 0.52, 100%

Formative results with respect to validation indicator
  Tutors: "The feedback provided by PolyCAFe helps you to easily identify the important parts of a conversation and to get a quick overview of the discussion" "The visualization of the conversation is extremely useful" "Not always the feedback is exact, but it does not influence the evaluation of the tutors"

OVT: OVT3.1
Pilot site: UNIMAN
Pilot language: English
Operational Validation Topic: Tutors/facilitators perceive that the feedback received from the system helps them prepare feedback for learners.
Summative results with respect to validation indicator
Questionnaire results (Student Facilitators, experimental group, n = 5; percentages are agree/strongly agree):
  6. The information the system provides me is accurate enough for helping me perform my teaching tasks. – mean 2.2, SD 0.45, 0%
  40. PolyCAFe provides feedback that is relevant to my preparation of learner feedback. – mean 2.6, SD 1.14, 20%
  41. PolyCAFe provides feedback that is useful to my preparation of learner feedback. – mean 2.6, SD 1.14, 20%
  42. PolyCAFe's feedback is sufficiently accurate to inform my feedback. – mean 2.4, SD 1.14, 20%
  43. I trust PolyCAFe to provide helpful feedback. – mean 2.6, SD 1.14, 20%

Formative results with respect to validation indicator
  Student Facilitators: One facilitator felt that the feedback was a helpful addition to their own observations of the discussion forums.

OVT: OVT3.2
Pilot site: PUB-NCIT
Pilot language: English
Operational Validation Topic: Learners perceive that the feedback received from the system contributes to informing their study activities.
Summative results with respect to validation indicator
Stakeholders / methodology: System logging
Results: 285 visits to PolyCAFe and 1447 page-views between November 1st and December 1st (more than 40 page-views per student).
Questionnaire results (Learners, experimental group, n = 25; percentages are agree/strongly agree):
  6. The information the system provides me is accurate enough for helping me perform my learning tasks. – mean 3.7, SD 0.52, 60%
  31. PolyCAFe provides feedback that is relevant to my study activities. – mean 3.9, SD 0.91, 72%
  32. PolyCAFe provides feedback that is useful to my study activities. – mean 3.8, SD 0.85, 72%
  33. PolyCAFe's feedback is sufficiently accurate to inform my study activities. – mean 3.8, SD 0.88, 64%
  34. I trust PolyCAFe to provide helpful feedback. – mean 4.0, SD 0.87, 80%

Formative results with respect to validation indicator
  Learners: "PolyCAFe is very useful to assess inter-connectivity and inter-animation, plus concept analysis." "Important tool for analyzing the conversations of my other colleagues in order to see whether I am position within the class." "The relevant concepts are very useful" "useful for assessing the collaboration" "very relevant for the progressive analysis of a group that has more than 2 discussion"

OVT: OVT3.2
Pilot site: UNIMAN
Pilot language: English
Operational Validation Topic: Learners perceive that the feedback received from the system contributes to informing their study activities.
Summative results with respect to validation indicator
Questionnaire results (Learners, experimental group, n = 21; percentages are agree/strongly agree):
  6. The information the system provides me is accurate enough for helping me perform my learning tasks. – mean 3.4, SD 0.98, 48%
  31. PolyCAFe provides feedback that is relevant to my study activities. – mean 3.6, SD 0.98, 62%
  32. PolyCAFe provides feedback that is useful to my study activities. – mean 3.6, SD 1.03, 71%
  33. PolyCAFe's feedback is sufficiently accurate to inform my study activities. – mean 3.4, SD 1.03, 57%
  34. I trust PolyCAFe to provide helpful feedback. – mean 3.5, SD 1.08, 62%

Formative results with respect to validation indicator
  Learners: Individual students found the feedback on their utterances interesting and useful in that it allowed them to see the ways in which their utterances had been classified. Some students (n = 5 agreed in a focus group) felt that some of the feedback was irrelevant at a group level but useful to individuals to see their own performance in the group, and it motivated them to develop their responses. "The assessment of my own statements was helpful... it let me see how I'd responded to others and I could work on my answers and see the changes in the way the system classified my response... it was interesting"

OVT: OVT3.3
Pilot site: PUB-NCIT
Pilot language: English
Operational Validation Topic: The feedback given by different tutors/facilitators to the same student is more consistent using PolyCAFe than without it (there is more homogeneity among the responses provided to learners).
Summative results with respect to validation indicator
Stakeholders / methodology: Experiment B – PUB-NCIT. Two tutors manually analyzed all the feedback provided by the tutors for the first assignment (with and without PolyCAFe) in order to find differences in consistency between feedback provided by different tutors to the same chat conversation.
Results: Each chat conversation received feedback from two tutors using PolyCAFe and two tutors not using it. All 14 pairs of feedback were evaluated for the consistency of the feedback on collaboration + inter-animation, on content, and overall, with marks from 1 (very inconsistent) to 5 (very similar/consistent feedback). There was a slight increase in consistency for the collaboration and inter-animation assessment when using PolyCAFe (the average consistency grade increased by 20%), whereas the average grades for content and overall consistency rose only marginally (by 11% and 13% respectively).

Formative results with respect to validation indicator
  Tutors: "The feedback writing style is very difficult to change and it seems to have had a great influence in the feedback written by each tutor regardless of using it (the system) or not…"

OVT: OVT3.4
Pilot site: PUB-NCIT
Pilot language: English
Operational Validation Topic: The feedback provided by tutors/facilitators after using PolyCAFe is more extensive (higher quality) than without using the system.
Summative results with respect to validation indicator
Stakeholders / methodology: Experiment B – PUB-NCIT. Two tutors manually analyzed all the feedback provided by the tutors for the first assignment (with and without PolyCAFe) in order to find differences in quality between feedback provided by different tutors to the same chat conversation.
Results: Each chat conversation received feedback from two tutors using PolyCAFe and two tutors not using it. The quality of each piece of feedback was graded from 1 to 5 by the two tutors on collaboration + inter-animation quality, content quality and overall quality.
The overall quality increased by only 10% (17% for collaboration and just 5% for content quality).

Formative results with respect to validation indicator
  Tutors: "The quality of the feedback using PolyCAFe could have been better, but I did not want to copy the information already delivered by the system"

OVT: OVT4.1
Pilot site: PUB-NCIT
Pilot language: English
Operational Validation Topic: Using PolyCAFe, tutors/facilitators monitor the learners' participation in online discussions better: detect conversations with bad/good collaboration, discover the concept coverage of each participant and other differences between learners.
Summative results with respect to validation indicator
Questionnaire results (Tutors, experimental group, n = 6; percentages are agree/strongly agree):
  44. PolyCAFe is effective for monitoring the quality of collaboration. – mean 4.0, SD 0.63, 83%
  45. PolyCAFe is effective for determining the extent of concept coverage of each participant. – mean 4.3, SD 0.52, 100%
  46. PolyCAFe is effective for determining the relevant concepts in the conversation. – mean 4.7, SD 0.52, 100%

Formative results with respect to validation indicator
  Tutors: The tutors all agreed that PolyCAFe is useful for monitoring the learners' participation in online discussions better than any alternative. However, there was a debate: it is difficult to find a perfect measure of collaboration, and several measures should be investigated and compared to the current scores computed by PolyCAFe.
"PolyCAFe is very helpful to assess the degree of collaboration, which cannot be computed correctly without assistance"

OVT: OVT4.1
Pilot site: UNIMAN
Pilot language: English
Operational Validation Topic: Using PolyCAFe, tutors/facilitators monitor the learners' participation in online discussions better: detect conversations with bad/good collaboration, discover the concept coverage of each participant and other differences between learners.
Summative results with respect to validation indicator
Questionnaire results (Student facilitators, experimental group, n = 5; percentages are agree/strongly agree):
  43. PolyCAFe is effective for monitoring the quality of collaboration. – mean 2.6, SD 1.14, 20%
  44. PolyCAFe is effective for determining the extent of concept coverage of each participant. – mean 2.8, SD 1.30, 40%
  45. PolyCAFe is effective for determining the relevant concepts in the conversation. – mean 2.8, SD 1.10, 20%

Formative results with respect to validation indicator
  Student facilitators: Student facilitators felt that the way in which the results are presented makes the quality aspects of collaboration unclear. Note: individual feedback was removed for the pilot due to concerns about the ethical implications of presenting individuals with feedback rated "BAD". "I didn't feel comfortable with the numbers, what do they mean? When it says 'bad' it doesn't say in what way it's bad".
  Tutors: "there are many aspects of the discussions that were not picked up by PolyCafe" "Group X showed empathy towards the children and the pain they may have gone through in the example used but I didn't feel this was acknowledged in the feedback given. On the whole it seemed more like a frequency list of key concepts."

OVT: OVT4.2
Pilot site: PUB-NCIT
Pilot language: English
Operational Validation Topic: The system provides learners with information that helps them reflect better on their performance as individuals and as group members compared with traditional means.
Summative results with respect to validation indicator
Stakeholders / methodology: Student questionnaire (comparison to control group)
Results: PolyCAFe is considered a big improvement over the current practice (reading the chat log/transcript in any format), especially with regard to allowing students to reflect on their performance as individuals: the average score increased from 2.7 to 4.2, while the agreement rate almost tripled, from 30% to 80%. Moreover, PolyCAFe offers a better alternative for reflecting on each student's contribution to the conversation as a member of a group, with the average score increasing from 3.1 (control group) to 4.4 (experimental group) and the agreement rate more than doubling, from 40% to 88%.
Questionnaire results (Learners; percentages are agree/strongly agree):
  35. PolyCAFe helps me reflect on my performance as an individual. (experimental) – mean 4.2, SD 0.88, 80%, n = 25
  35. The current system (reading the chat transcript/log) helps me reflect on my performance as an individual. (control) – mean 2.7, SD 1.12, 30%, n = 10
  36. PolyCAFe helps me reflect on my contribution as a member of a group. (experimental) – mean 4.4, SD 0.81, 88%, n = 25
  36. The current system (reading the chat transcript/log) helps me reflect on my contribution as a member of a group. (control) – mean 3.1, SD 1.17, 40%, n = 10

Formative results with respect to validation indicator
  Learners: PolyCAFe was considered very useful for helping the students reflect on their performance, either individually or as part of a group. The learners considered that it would have been even more useful if the topics of the conversation had been less familiar to them or interdisciplinary. "PolyCAFe is really helpful to compare you with people from other groups, outside my chat group" (by looking at their feedback provided by PolyCAFe and comparing it with yours)

OVT: OVT4.2
Pilot site: UNIMAN
Pilot language: English
Operational Validation Topic: The system provides learners with information that helps them reflect better on their performance as individuals and as group members compared with traditional means.
Summative results with respect to validation indicator
Questionnaire results (Learners; percentages are agree/strongly agree):
  35. PolyCAFe helps me reflect on my performance as an individual. (experimental) – mean 3.5, SD 1.08, 57%, n = 21
  35. The current system (reading the chat transcript/log) helps me reflect on my performance as an individual. (control) – mean 3.6, SD 0.79, 30%, n = 7
  36. PolyCAFe helps me reflect on my contribution as a member of a group. (experimental) – mean 3.6, SD 1.08, 62%, n = 21
  36. The current system (reading the chat transcript/log) helps me reflect on my contribution as a member of a group. (control) – mean 3.1, SD 1.07, 57%, n = 7

Formative results with respect to validation indicator
  Learners: "I like the list of words, it prompted me to consider the topics that we had covered" (n = 6 agree in focus group 1)

OVT: OVT4.3
Pilot site: PUB-NCIT
Pilot language: English
Operational Validation Topic: The visualization offers the users a better understanding of chat conversations and discussion forums.
Summative results with respect to validation indicator
Questionnaire results (percentages are agree/strongly agree):
  47. PolyCAFe's visualization is effective for monitoring the quality of collaboration. (Tutors, experimental) – mean 4.5, SD 0.55, 100%, n = 6
  48. PolyCAFe's visualization is useful for determining the inter-animation of concepts and ideas (social learning). (Tutors, experimental) – mean 4.0, SD 0.63, 83%, n = 6
  37. PolyCAFe's visualization is effective for monitoring the quality of collaboration. (Learners, experimental) – mean 4.4, SD 0.64, 92%, n = 25
  37. The current visualization (reading the chat transcript/log in html format) is effective for monitoring the quality of collaboration. (Learners, control) – mean 2.7, SD 1.41, 30%, n = 10
  38. PolyCAFe's visualization is useful for determining the inter-animation of concepts and ideas (social learning). (Learners, experimental) – mean 4.2, SD 0.72, 84%, n = 25
  38. The current visualization (reading the chat transcript/log in html format) is useful for determining the inter-animation of concepts and ideas (social learning). (Learners, control) – mean 2.7, SD 1.41, 30%, n = 10

Formative results with respect to validation indicator
  Tutors: "The visualization of the conversation is extremely useful" "can get a easy overview of the collaboration, implication, number of links and discussion threads"
  Learners: "the inter-animation of concepts is difficult to determine without the visualization" "good representation of the conversation"... "permits you to find out where you had a good collaboration, how many colleagues linked to you"

OVT: OVT4.3
Pilot site: UNIMAN
Pilot language: English
Operational Validation Topic: The visualization offers the users a better understanding of chat conversations and discussion forums.
Summative results with respect to validation indicator
Questionnaire results (percentages are agree/strongly agree):
  47. PolyCAFe's visualization is effective for monitoring the quality of collaboration. (Student Facilitators, experimental) – mean 2.8, SD 1.10, 20%, n = 5
  48. PolyCAFe's visualization is useful for determining the inter-animation of concepts and ideas (social learning). (Student Facilitators, experimental) – mean 3.0, SD 0.71, 20%, n = 5
  37. PolyCAFe's visualization is effective for monitoring the quality of collaboration. (Learners, experimental) – mean 3.8, SD 1.18, 71%, n = 21
  37. The current visualization (reading the chat transcript/log in html format) is effective for monitoring the quality of collaboration. (Learners, control) – mean 3.3, SD 0.95, 57%, n = 7
  38. PolyCAFe's visualization is useful for determining the inter-animation of concepts and ideas (social learning). (Learners, experimental) – mean 3.9, SD 1.2, 67%, n = 21
  38. The current visualization (reading the chat transcript/log in html format) is useful for determining the inter-animation of concepts and ideas (social learning). (Learners, control) – mean 3.4, SD 1.13, 57%, n = 7

Formative results with respect to validation indicator
  Student Facilitators: Could not see quality indicators in the visualisation; suggested that it could only be used to quantify the extent of participation.
  Learners: Students commented that they could see the interactions and described these as "helpful", but the visualisation did not expose the concepts and ideas being discussed. They felt that this would work better if shown for an individual discussion thread rather than as an overall analysis of the group's whole forum.

OVT: OVT5.1
Pilot site: PUB-NCIT
Pilot language: English
Operational Validation Topic: Learner performance in online discussions is improved in the areas of content coverage and collaboration when using PolyCAFe.
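The "improvement over control group" percentages reported for OVT5.1 all use the same relative-change formula, for example (6.80 - 6.37) / 6.37 = 6.8%. A minimal sketch of that computation (the helper name is ours, not from the deliverable):

```python
# Relative change of an experimental-group value over the control value,
# as used for the "improvement over control group" figures. The helper
# name is illustrative only, not part of the deliverable.

def relative_improvement(experimental, control):
    return (experimental - control) / control

# Figures from the OVT5.1 results table:
score_gain = relative_improvement(6.80, 6.37)  # average chat score, ~6.8%
links_gain = relative_improvement(1.12, 0.87)  # links per utterance, ~29%
```

Note that the 35% time saving quoted under OVT2.1 uses the time *without* the tool as its baseline instead: (84 - 55) / 84.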
Summative results with respect to validation indicator
Stakeholders / methodology: Measurements
Results: Based on scores computed automatically by PolyCAFe for the second assignment:

  Indicator                                                                  Experimental  Control  Improvement over control group
  Average score for a chat conversation (collaboration + content)            6.80          6.37     (6.80 – 6.37) / 6.37 = 6.8%
  Average importance of the 20 most important concepts                       0.194         0.192    1.2%
  Average number of utterances                                               351           338      (351 – 338) / 338 = 3.8%
  Average distribution of (implicit and explicit) links between utterances   1.12          0.87     (1.12 – 0.87) / 0.87 = 29%

For all the indicators above, the chat conversations of the experimental groups were better than those of the control group. However, a substantial increase between the two groups was found only for collaboration (the average number of links per utterance) and for the total average score of an utterance.

Formative results with respect to validation indicator
  Learners: The learners felt that the feedback for the first set of assignments gave them indicators showing that they needed to collaborate better and be more involved in the discourse. The sets of concepts that were not covered also offered them some insight. They also said that the task was a bit simple and that they would have found the feedback even more useful when discussing topics they knew less about.

OVT: OVT6.1
Pilot site: PUB-NCIT
Pilot language: English
Operational Validation Topic: The direct feedback provided by the system encourages learners to undertake further study to address gaps in their coverage.
Summative results with respect to validation indicator
Questionnaire results (Learners, experimental group, n = 25; percentages are agree/strongly agree):
  18. Using PolyCAFe increases my curiosity about the learning topic. – mean 3.7, SD 0.98, 56%
  20. Using the system motivates me to explore the learning topic more fully. – mean 3.5, SD 1.05, 40%
  22. I am eager to explore different things with PolyCAFe. – mean 3.6, SD 1.12, 48%

Formative results with respect to validation indicator
  Learners: The learners said that the topic of the chat conversations was not very difficult for them, which was one reason why the feedback was not as useful as it would have been for a more difficult topic. They also said that the feedback would be more interesting for open questions and for discussions related to the humanities, economics or law, for example.

OVT: OVT6.1
Pilot site: UNIMAN
Pilot language: English
Operational Validation Topic: The direct feedback provided by the system encourages learners to undertake further study to address gaps in their coverage.
Summative results with respect to validation indicator
Questionnaire results (Learners, experimental group, n = 21; percentages are agree/strongly agree):
  18. Using PolyCAFe increases my curiosity about the learning topic. – mean 3.3, SD 0.75, 43%
  20. Using the system motivates me to explore the learning topic more fully. – mean 3.0, SD 0.92, 33%
  22. I am eager to explore different things with PolyCAFe. – mean 3.3, SD 1.23, 43%

Formative results with respect to validation indicator
  Learners: The types of feedback provided were not perceived as giving constructive direction for further study. "Conversation feedback and utterance feedback needs to be clearer to give me directions on what I need to do" Some students (n = 5 agreed in a focus group) felt that some of the feedback was irrelevant at a group level but useful to individuals to see their own performance in the group, and it motivated them to develop their responses. "The assessment of my own statements was helpful... it let me see how I'd responded to others and I could work on my answers and see the changes in the way the system classified my response... it was interesting"

OVT: OVT7.1
Pilot site: PUB-NCIT
Pilot language: English
Operational Validation Topic: There is a saving in institutional resources overall*
Summative results with respect to validation indicator
Stakeholders / methodology: Measurements; focus group with tutors
Results:
  Average time needed to prepare feedback without PolyCAFe: 84 minutes (standard deviation: 15 minutes)
  Average time needed to prepare feedback with PolyCAFe: 55 minutes (standard deviation: 20 minutes)
  Average time saved = (84 – 55) / 84 = 35%
For a class with 100 students and 2 chat assignments, the total time gained by the tutors using PolyCAFe to provide manual feedback for the 50 chat groups would be 29 × 50 = 1450 minutes ≈ 24 hours that could be used more efficiently.

Formative results with respect to validation indicator
  Tutors: The tutors said that PolyCAFe might be very useful as an assessment standard for various assignments and for determining whether (novice) tutors are giving correct feedback.

OVT: OVT8.1
Pilot site: PUB-NCIT
Pilot language: English
Operational Validation Topic: The service addresses one or more institutional objectives
Formative results with respect to validation indicator
  Teaching manager: Economy of the university: the most important aspect is the economy for the University from many points of view (time and availability of tutors, for example), but first of all it should be an improvement in the educational process and in the knowledge that the students acquire. The head of the computer science department also said that PolyCAFe would be very useful for other departments in the University.
"I am also convinced that this can be an economy for the University from many points of view, but mainly it should be first an improvement in the educational process and in the knowledge that the students acquire…"
  Student intellectual and social development: This is important for students who work in teams, especially as more and more projects require team-work and collaboration. Chat is an application much used by students, so it is also important to be able to support its use by the "professional student". "The tool helps for clarifying different aspects until the students become acquainted with a certain subject"
  Supporting tutors in assessment: "It could be useful even for the teaching staff in order to coordinate activities between the same class, but which has different professors and tutors". "I think that every course could benefit from using the system, but the question is to which extent", "as the tutors can easily have a view of the level of the class, of the degree of collaboration" "I think that the product has a great potential and it would be very useful for other departments as well… for example, the department of economics, pedagogy, …"

OVT: OVT8.1
Pilot site: UNIMAN
Pilot language: English
Operational Validation Topic: The service addresses one or more institutional objectives
Formative results with respect to validation indicator
  Teaching manager: Provision of formative feedback: "We need to provide high quality feedback to students, formative feedback is an area that we have been underperforming in so systems such as the one you've shown could go some way towards addressing the need for us to improve."

OVT: OVT9.1
Pilot site: PUB-NCIT
Pilot language: English
Operational Validation Topic: Users were motivated to continue to use the system after the end of the formal validation activities
Summative results with respect to validation indicator
Experimental results, with stakeholders involved and brief methodology
Stakeholders / methodology: Measurements
Results:
- Number of visits between December 26 and February 6: 334 (approximately 100 visits could be from the long-thread validation experiment)
- Number of pageviews between December 26 and February 6: 1164

Questionnaire results (experimental group; mean / standard deviation / % agree or strongly agree / n):
- Tutors, 21. "I would recommend this system to other teachers to help them in their teaching.": 4.8 / 0.41 / 100% / 6
- Tutors, 22. "I am eager to explore different things with PolyCAFe.": 4.3 / 0.53 / 100% / 6
- Tutors, 29. "I would like to use the service in my teaching after the pilot.": 4.8 / 0.41 / 100% / 6
- Tutors, 30. "If the service is available after the pilot, I will definitely use it in my teaching.": 4.8 / 0.41 / 100% / 6
- Learners, 21. "I would recommend this system to other learners to help them in their teaching.": 3.8 / 1.08 / 60% / 25
- Learners, 22. "I am eager to explore different things with PolyCAFe.": 3.6 / 1.12 / 48% / 25
- Learners, 29. "I would like to use the service in my learning activities after the pilot.": 3.4 / 1.04 / 48% / 25
- Learners, 30. "If the service is available after the pilot, I will definitely use it in learning activities.": 3.3 / 1.06 / 44% / 25

OVT: OVT9.1
Pilot site: UNIMAN
Pilot language: English
Operational Validation Topic: Users were motivated to continue to use the system after the end of the formal validation activities

Summative results with respect to validation indicator
Questionnaire results (experimental group; mean / standard deviation / % agree or strongly agree / n):
- Tutors, 21. "I would recommend this system to other teachers to help them in their teaching.": 3 / 1.58 / 40% / 5
- Tutors, 22. "I am eager to explore different things with PolyCAFe.": 3 / 1.22 / 40% / 5
- Tutors, 29. "I would like to use the service in my teaching after the pilot.": 3.2 / 1.48 / 40% / 5
- Tutors, 30. "If the service is available after the pilot, I will definitely use it in my teaching.": 3.2 / 1.48 / 40% / 5
- Learners, 21. "I would recommend this system to other learners to help them in their teaching.": 3.5 / 1.08 / 57% / 21
- Learners, 22. "I am eager to explore different things with PolyCAFe.": 3.3 / 1.23 / 43% / 21
- Learners, 29. "I would like to use the service after the pilot.": 3.3 / 1.19 / 48% / 21
- Learners, 30. "If the service is available after the pilot, I will definitely use it.": 3.2 / 1.08 / 38% / 21

OVT: OVT9.2
Pilot site: PUB-NCIT
Pilot language: English
Operational Validation Topic: A high score was obtained in the generic questionnaires (based on UTAUT: likelihood of adoption by users)

Summative results with respect to validation indicator
Experimental results, with stakeholders involved and brief methodology
Stakeholders / methodology: Generic questionnaire - learners
Results: Descriptive statistics, learners (N = 31, listwise; mean / standard deviation):
- Effectiveness: 3.53 / 0.630
- Efficiency: 3.60 / 0.735
- Cognitive Load: 4.29 / 0.902
- Usability: 4.05 / 0.786
- Satisfaction: 3.76 / 0.753
- Facilitating conditions: 3.78 / 0.723
- Self-Efficacy: 3.86 / 0.783
- Behavioural intention: 3.50 / 1.000
- PolyCAFe PUB-NCIT (overall): 3.75 / 0.576

OVT: OVT9.2
Pilot site: UNIMAN
Pilot language: English
Operational Validation Topic: A high score was obtained in the generic questionnaires (based on UTAUT: likelihood of adoption by users)

Summative results with respect to validation indicator
Experimental results, with stakeholders involved and brief methodology
Stakeholders / methodology: Generic questionnaire - learners
Results: Descriptive statistics, learners (N = 28, listwise; mean / standard deviation):
- Effectiveness: 3.32 / 0.700
- Efficiency: 3.38 / 0.658
- Cognitive load: 3.00 / 1.186
- Usability: 3.71 / 0.717
- Satisfaction: 3.37 / 0.782
- Facilitating conditions: 3.73 / 0.685
- Self-Efficacy: 3.61 / 0.732
- Behavioural intention: 3.29 / 1.040
- PolyCAFe UNIMAN (overall): 3.46 / 0.610

OVT: OVT9.3
Pilot site: PUB-NCIT
Pilot language: English
Operational Validation Topic: Tutors attending a dissemination workshop give high scores to the question 'how likely are you to adopt the service?'

Summative results with respect to validation indicator
Questionnaire results (mean / standard deviation / % agree or strongly agree / n):
- Tutors, "How likely are you to adopt PolyCAFe in your own educational practice?" (dissemination): 4.05 / 0.62 / 84% / 19

Formative results with respect to validation indicator
Stakeholder type: Tutors
Results:
"It can be a useful tool for collaborative learning, especially to stimulate the interest and the aptitudes of the students in order to participate in an authentic debate on a given subject."
"the applicability is practically unlimited when organizing free chats"
"considering to apply PolyCAFe on forums and taking decisions based on the options/discussions of the students"

OVT: OVT10.1
Pilot site: PUB-NCIT
Pilot language: English
Operational Validation Topic: Learners are more involved and motivated with the course by using PolyCAFe.

Formative results with respect to validation indicator
Stakeholder type: Learners
Results: The learners said that using PolyCAFe offered them useful information for the second assignment, especially related to increasing collaboration and concepts that were not covered. The learners considered it very important to see the results of their colleagues in order to be able to compare themselves with other peers. Moreover, they said that using PolyCAFe for more than two assignments would be more motivating and useful for them.

Section 4: Results – validation activities informing future changes / enhancements to the system

VALIDATION ACTIVITY
Pilot partner: PUB-NCIT
Service language: English
Additional formative results (not associated with validation topics):

Alpha testing:
- Correction of the importance of topics, concepts and overall importance so that they are shown with the same number of decimals (two decimals)

Beta testing:
- Slight modification of the grading algorithm by correcting some bugs

Tutor interviews:
- Improve the help section.
- Improve the navigation from one widget to another (maybe have a non-widget version)
- Make the transition smoother from local feedback (participant or utterance) to global feedback (conversation feedback or visualization)
- Implement detailed scoring for the utterances
- Improve the conversation thread detection algorithm
- Utterance feedback: highlight important concepts in an utterance, colour utterances according to their importance
- Implement support for maths formulas and, maybe, for simple scripted tasks

Tutor workshop(s):
- Implement automatic LSA training from online resources for various topics and domains, in order to configure the system more easily
- Develop interfaces to popular discussion forum systems, like phpBB, Moodle, etc.
- Extend to scripted activities or activities that require the students to reach certain learning outcomes: develop a way to measure the coverage of each learning outcome.
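The tutor workshop's request for automatic LSA training touches the core of how such services score concept coverage: utterances are compared against a domain corpus in a vector space. Full LSA requires a singular value decomposition of a term-document matrix; the sketch below is a deliberately simplified, hypothetical illustration of the underlying corpus-based similarity idea only (plain term-frequency cosine similarity, not PolyCAFe's actual pipeline; the corpus and utterances are invented examples).

```python
import math
from collections import Counter

def tf_vector(text):
    # Bag-of-words term-frequency vector over lower-cased whitespace tokens.
    return Counter(text.lower().split())

def cosine(v1, v2):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(v1[t] * v2[t] for t in v1 if t in v2)
    n1 = math.sqrt(sum(c * c for c in v1.values()))
    n2 = math.sqrt(sum(c * c for c in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

# Hypothetical domain corpus fragment and two learner utterances.
corpus = "inheritance and polymorphism are core concepts of object oriented design"
on_topic = "polymorphism lets subclasses override inherited behaviour"
off_topic = "i had pizza for lunch"

score_on = cosine(tf_vector(corpus), tf_vector(on_topic))
score_off = cosine(tf_vector(corpus), tf_vector(off_topic))
assert score_on > score_off  # the on-topic utterance scores higher
```

In a real LSA setting the raw term space would first be reduced via SVD so that semantically related words (not just identical ones) contribute to the similarity; this is why, as noted elsewhere in this template, a large relevant corpus is a precondition for training.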
Learner focus group 1:
- Improve the interface by moving all the widgets into a single window, use of tabs, use of icons
- Improve the help; provide tool-tips, visual hints, "what is this?" buttons
- Make it easier to upload any kind of chat log
- Better privacy and user management
- Allow the possibility to "track" a participant with different usernames in different chat conversations
- Provide real-time feedback during a chat conversation
- Extend the analysis on a team-level basis in order to compare different teams/chats with each other
- The current feedback should be rephrased, maybe in a more directed manner, like giving advice to the participants instead of a score

Learner focus group 2 (prioritisation of enhancements):
Learners judged that the five most important areas for enhancement of the system (i.e. clusters) are:
1. Improve the interface, especially by not using the widgets.
2. Improve the help and examples section.
3. Real-time feedback for chat conversations.
4. Extend the analysis to provide additional feedback that determines the position of a team within the group of all the teams that solved an assignment (also rank teams, not only participants inside a team).
5. The feedback should be expressed in a more direct manner (like advice), instead of just providing scores and indicators.
Learners judged that the five most important single improvements that should be made to the system are:
1. Implement an interface that has a single widget or a non-widget version.
2. Extend the help section to contain examples for each functionality.
3. Reformulate the feedback to sound more like advice.
4. Also rank teams, not only participants inside a team.
5. Integrate PolyCAFe's feedback into a real chat environment.

Teaching manager interview:
"I think that every course could benefit from using the system, but the question is to which extent"

Other (please specify):
Development team: Extend PolyCAFe to the Romanian language.
Determine in what pedagogical scenarios PolyCAFe is most useful.

VALIDATION ACTIVITY
Pilot partner: UNIMAN
Service language: English
Additional formative results (not associated with validation topics):

Alpha testing:
- Removal of the individual feedback service – "Good / Bad" results do not align with institutional policy

Beta testing:
- Refinement of the topics used in the LSA space on which grading was based – an activity was undertaken by the team at UNIMAN to produce a set of MESH headings to underpin PolyCAFe's use with the professional development forums. These were used to generate a new semantic space for the third validation round.

Facilitator interviews:
- Results must be accompanied by tooltip help – providing a user handbook is too unwieldy
- The quantity of data shown needs to be reduced and reporting needs to be more concise
- The data needs to be shown at the individual thread level of the discussion forum to be contextually grounded for the viewer
- Tutors felt the feedback wasn't sufficiently objective and of a high enough standard to make further use of it on a stand-alone basis, but felt that if the accuracy of the reporting and the summary data were improved it could be used for reporting, to provide summary data on a dashboard for all groups, rather than as a direct feedback mechanism.
- Tutors felt, however, that with improved usability and reliable summary reporting, the service could provide a useful aid to their practice.

Tutor workshop(s):
- Tutors saw value in the system as an automated means to provide real-time feedback to students
- Tutors liked the chat visualisation as it gave them an immediate sense of whether learners were engaging
- Tutors disliked word ratings presented without a clear rationale.
Learner focus group:
- Improve the usability – the widgets are not intuitive
- Improve the help; provide tool-tips, visual hints, "what is this?" buttons
- The numbers are not meaningful and the text feedback is not constructive
- Provide constructive suggestions for ways in which the recipient of the feedback might improve

Learner focus group 2 (prioritisation of enhancements):
Learners judged that the most important areas for enhancement of the system (i.e. clusters) are:
1. Usability – navigation is not intuitive
2. Performance indicators – numerical data represents a high cognitive load for students to interpret
3. Unclear how to address feedback – suggestions for this must be displayed where scores are low
Learners judged that the five most important single improvements that should be made to the system are:
1. Provide "less of more" – better summary data
2. Improve the way in which text feedback is displayed; use colour to identify good and bad, instead of words/numbers
3. Feedback on individual utterances would be more helpful if the user could navigate the utterances in forum format; the list view is difficult to follow.
4. Provide the service as a dashboard to allow tutors to view and compare performance across multiple groups.
5. Improve the reliability of the service and adjust the workflow of how feedback is provided to address institutional requirements for feedback.

Teaching manager interview:

Section 5: Results – validation activities informing transferability, exploitation and barriers to adoption

VALIDATION ACTIVITY
Partner(s) involved: PUB-NCIT
Service language: English
Additional formative results (not associated with validation topics):

Tutor interviews:
- PolyCAFe would not be useful for classes requiring special skills, classes that need to solve practical problems, or artistic classes. It is also not good for maths or other classes with formulas, nor for scripted activities.
- Suggestion: use PolyCAFe for discussions on European projects between partners to highlight differences in ideas, concept coverage, collaboration, etc.
- PolyCAFe might be very useful if used as an assessment standard for various assignments and to determine which (novice) tutors are not giving correct feedback.

Tutor workshop(s):
- Problems with getting the corpora required for the latent semantic spaces for many subjects: there should be a mechanism to use available online text or books
- Very useful for free chatting
- The tutors highlighted the difficulty of finding relevant discussion subjects for some domains/courses that are suitable for being analyzed by PolyCAFe
- The tutors wanted feedback with regard to completing or reaching learning objectives during a discussion

Learner focus group 1:
- Concerns with data privacy.
- Some students do not feel the need for feedback after being engaged in a chat conversation.
- Usability problems for users who are not very experienced with software tools.
- It would be more useful for open-topic discussions.
- Useful for domains like law, marketing, social sciences, etc.
- Very useful to have PolyCAFe's feedback for subjects in which the learners are novices (or not very advanced).

Teaching manager interview:
"I think that the product has a great potential and it would be very useful for other departments as well… for example, the department of economics, pedagogy, …"
"There might be a problem with the acceptance of other teaching managers of this combined form of face to face and online (hybrid) learning"
"I am also convinced that this can be an economy for the University from many points of view, but mainly it should be first an improvement in the educational process and in the knowledge that the students acquire…"

Other (please specify):
Major issues encountered in transferring PolyCAFe to Romanian:
- No open-source POS tagging software is available.
  Solution: build one from freely available annotated corpora.
- No Romanian WordNet is available for open usage. Solution: use dictionary entries instead.
- The stemmers for Romanian are not as good as those for English. Solution: use a lemmatizer.

VALIDATION ACTIVITY
Partner(s) involved: UNIMAN
Service language: English
Additional formative results (not associated with validation topics):

Alpha testing:
Major issues encountered in transferring PolyCAFe to the medicine domain:
- The detailed issues discussed in the forum are not always detected by the system, though they may be relevant to the high-level topics prescribed for discussion within the activity.
- "Good / Bad" results for individual performance do not meet institutional feedback requirements.

Tutor (facilitator) interviews:
- Facilitators will only adopt the service if it provides constructive guidance. They need aggregated data about participation levels, suggestions for topics and clearer reporting of key issues to help plan interventions.
- The service would need improved accuracy and usability to provide concise, objective feedback. The service outputs take too much time to interpret and are not meaningful in their present format.

Tutor workshop(s):
- Tutors see the value of a service that summarises discussion forum activity but have different information requirements from those currently output by the service. They would like to see information about how well different groups are performing, participation levels and critical issues, in a tutor dashboard.
- More summary data reported in a clearer interface across multiple groups would help tutors to identify intervention points more easily.
- There is a perception that there is "too much" information delivered by the PolyCAFe service in its current format.
- The topics on which PolyCAFe provided feedback to students were not the same as those identified by tutors who viewed the student outputs, and tutors felt that unless there was a workflow to manage the release of feedback, they would not trust the system.

Learner focus group 1:
- Students suggested that tab-based navigation might make the service simpler to use than having to select individual service components from a list of widgets. They were keen that the service be integrated into the institutional VLE, with reporting on individual forums at the thread level.
- Students were frustrated by the feedback because it didn't suggest how low scores could be improved.
- The use of numerical scores was disliked as there was no legend explaining how the numbers were derived. Trust in the numbers was low.
- Words like "Bad" in feedback were not considered useful, and students felt that if they received a "Bad" grading it would not motivate them to contribute further.
- Colour coding was suggested as one means of addressing and consolidating both numeric and text-based feedback, using some kind of traffic-light system to identify areas in need of further development. This was considered by all present to be a good approach which would motivate people to engage in order to achieve the "green light" status.

Learner focus group 2:
- Students were keen to use the system with the institutional VLE. They suggested that if processing were nearer to real time, the results would motivate them to develop their responses more thoroughly.
- The scoring mechanisms need to be simple; the cognitive load of using different scales of numeric and text data was considered higher than reading through the forums.
- By changing the presentation format of the results to a simplified single scale, the students thought it would probably help them to judge the results more easily and would have greater potential to save time in identifying areas for further development.
Teaching manager interview:
- The system does not meet institutional policy requirements about feedback. Further work is needed to improve the format in which the students are presented with the feedback from PolyCAFe.
- "Blackboard is the institutional VLE at the University of Manchester. There are clear guidelines about student feedback and this needs to address the requirements outlined in these guidelines. Statements such as "good" and "bad" are not sufficiently justified in the software to provide a basis on which we could currently consider implementing this on an institutional basis"

Other: workshop with the e-learning team:
A workshop was held with members of the Faculty e-learning team (n = 22). Results for the Likert question "How likely are you to adopt PolyCAFe in your own educational practice?" were: mean 3.7, SD 0.95, 68% Agree/Strongly agree.

Other:
- PolyCAFe works on keywords, but concepts in professionalism often contain more than one word (Swiss Cheese Model, personal hygiene).
- The keywords we selected as relevant for the learning task didn't undergo any tuning. Related to this is the important issue that we weren't 100% convinced that we were mapping our course to keywords well enough. It was really difficult to know what to do in this case.

Transferability questionnaire: relevance of the service in other pedagogic settings

Pedagogic settings for which the service would be suitable:
- Setting 1: Use of PolyCAFe together with chat or forums for revising for exams. Reasons: Students get feedback on their understanding of the topic under discussion and suggestions for improvement.
- Setting 2: Use of PolyCAFe together with chat or forums for finding collaborative solutions to problems that can be described without the importance of a sequence of steps (PBL). Reasons: Students get feedback on their understanding of the problem and of the elements proposed as a solution.
- Setting 3: Use of PolyCAFe together with chat or forums to further investigate a given topic of interest to the learner (self-regulated learning). Reasons: Students can assess their understanding of the topic relative to their peers in the chat, learn from their peers, and see which peers are more knowledgeable.

Pedagogic settings for which the service would be less suitable:
- Setting 1: Use of PolyCAFe together with chat or forums in a setting that involves scripted collaboration. Reasons: The scripts are not taken into account by PolyCAFe in the analysis.
- Setting 2: Use of PolyCAFe together with chat or forums in a setting that does not involve or require collaboration (i.e. that is designed to be solved individually). Reasons: PolyCAFe is designed to be used in collaboration-focused settings.
- Setting 3: Use of PolyCAFe together with chat or forums in a setting where students do not collaborate efficiently and focus on providing long answers to questions, without debating. Reasons: PolyCAFe is designed to be used for collaborative discussions where the posts/utterances are relatively short and the participants engage in active discussions that also involve short messages and arguments.

Transferability questionnaire: institutional policies and practices
- Interoperability with the institutional LMS is needed (PUB-NCIT: Moodle; UNIMAN: Blackboard).
- Privacy concerns about the services' use of the students' chat conversations and discussion forums could be overcome by anonymizing, not showing information about students to peers without their consent, and asking for students' consent before using the services.
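The anonymization measure suggested above can be sketched concretely. The following hypothetical Python fragment (not part of PolyCAFe; names and the key are invented) shows keyed, deterministic pseudonymization: the same student always maps to the same alias, so per-participant indicators still work across conversations, but real names cannot be recovered without the institution-held key.

```python
import hmac
import hashlib

def alias(username, secret=b"institution-held-key"):
    # Deterministic pseudonym: identical input always yields the same alias,
    # but reversing it requires the secret key (HMAC-SHA256, truncated).
    digest = hmac.new(secret, username.encode("utf-8"), hashlib.sha256)
    return "user-" + digest.hexdigest()[:8]

# Hypothetical chat log: (participant, utterance) pairs.
chat_log = [("alice", "I think the first concept covers this topic"),
            ("bob", "Agreed, but we need examples")]

anonymised = [(alias(name), utterance) for name, utterance in chat_log]

# Consistent aliases let the service still attribute utterances per participant.
assert alias("alice") == alias("alice") and alias("alice") != alias("bob")
```

A design note: using HMAC rather than a plain hash prevents a peer from confirming a guess ("is user-3f2a... Alice?") by hashing candidate names themselves, which addresses the consent concern raised by the learners.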
Transferability questionnaire: relevance of the service in other domains

Types of domain for which the service would be suitable:
- Setting 1: All domains where textual descriptions of descriptive knowledge are suited and sufficient (little or no images, formulas, specific numeric data or procedural knowledge are necessary): several areas of computer science, literature, psychology, education, social and human sciences. Reasons: Linguistic technologies (NLP pipeline, LSA, ontologies) do not account for pictorial descriptions. Moreover, it is difficult to address procedural knowledge, since the order of the steps of a procedure is crucial and difficult to analyse automatically (as stated before, the sequence of steps is unknown, similar to scripts).
Types of domain for which the service would be less suitable:
- Setting 1: See above: several areas of geography, medicine, mathematics, physics, engineering, etc. Reasons: See above.
- Setting 2: All domains for which it is difficult to obtain a large corpus of relevant text material. Reasons: A large corpus is needed for LSA training.

Section 6: Conclusions

Validation topics (for each Operational Validation Topic: validation status per pilot site, with qualifications where applicable)

PVT1: Verification of accuracy of NLP tools
- OVT1.1 The tutors/experts find that the speech acts discovered in the conversation (chat or forum) are correct.
  Validated with qualifications: PUB (English). Not validated: UNIMAN (English).
  Qualifications: PUB (English): Some speech acts have low precision and recall, which should be improved (or at least further investigated). UNIMAN (English): Speech acts not sufficiently correct.
- OVT1.2 The tutors/experts find that the labels corresponding to Garrison's community of inquiry model in a forum are correct.
  Validated with qualifications: UNIMAN (English).
  Qualifications: UNIMAN (English): Accuracy should be better to validate unconditionally. PUB (English): No forums at PUB.
- OVT1.3 The tutors/experts find that the scores assigned to the utterances are correct.
  Validated: PUB (English).
- OVT1.4 The tutors/experts find that the scores assigned to the participants for a given concept and globally are correct.
  Validated: PUB (English).
- OVT1.5 The tutors/experts find that PolyCAFe correctly identifies the important (relevant) concepts in the conversation.
  Validated with qualifications: PUB (English).
  Qualification: PUB (English): Precision should be a little better.

PVT2: Tutor efficiency
- OVT2.1 Tutors/facilitators spend less time preparing feedback for learners compared with traditional means.
  Validated: PUB (English), UNIMAN (English).
- OVT2.2 It is easier (there is less cognitive load) for tutors/facilitators to provide feedback using PolyCAFe compared with just reading the learners' online conversations.
  Validated: PUB (English). Validated with qualifications: UNIMAN (English).
  Qualification: UNIMAN (English): The visualisation component is a valuable aid, but the text / numerical feedback needs further consideration in how it is presented to make it meaningful.

PVT3: Quality and consistency of (semi-)automatic feedback or information returned by the system
- OVT3.1 Tutors/facilitators perceive that the feedback received from the system helps them prepare feedback for learners.
  Validated: PUB (English).
- OVT3.2 Learners perceive that the feedback received from the system contributes to informing their study activities.
  Validated: PUB (English).
- OVT3.3 The feedback given by different tutors/facilitators to the same student is more consistent using PolyCAFe than without it (there is more homogeneity among the responses provided to learners).
  Validated with qualifications: PUB (English).
  Qualification: PUB (English): It seems that the feedback style depends very much on the tutor, and the consistency of the feedback changed only slightly.
- OVT3.4 The feedback provided by tutors/facilitators after using PolyCAFe is more extensive (higher quality) than without using the system.
  Validated with qualifications: PUB (English), UNIMAN (English).
  Qualifications: PUB (English): It seems that the feedback style depends very much on the tutor, and the quality of the feedback changed only slightly. UNIMAN (English): The feedback encourages individual reflection, but needs to provide clearer identification of areas for improvement.

PVT4: Making the educational process transparent
- OVT4.1 Using PolyCAFe, tutors/facilitators monitor the learners' participation in online discussions better: detect conversations with bad/good collaboration, discover each participant's coverage of concepts, and other differences between learners.
  Validated: PUB (English), UNIMAN (English).
- OVT4.2 The system provides learners with information that helps them reflect better on their performance as individuals and as group members compared with traditional means.
  Validated: PUB (English). Validated with qualifications: UNIMAN (English).
  Qualification: UNIMAN (English): The service encouraged learners to reflect on their performance, but the quality of the feedback is inconsistent.
- OVT4.3 The visualization offers the users a better understanding of chat conversations and discussion forums.
  Validated: PUB (English). Validated with qualifications: UNIMAN (English).
  Qualification: UNIMAN (English): Learners were more positive about the visualisation than the facilitators.

PVT5: Quality of educational output
- OVT5.1 Learner performance in online discussions is improved in the areas of content coverage and collaboration when using PolyCAFe.
  Validated with qualifications: PUB (English).
  Qualification: PUB (English): Fully validated for collaboration but not validated for content.

PVT6: Motivation for learning
- OVT6.1 The direct feedback provided by the system encourages learners to undertake further study to address gaps in their coverage.
  Validated with qualifications: PUB (English), UNIMAN (English).
  Qualifications: PUB (English): The results are not satisfying enough to validate unconditionally, as the average scores were between 3.5-3.8/5.0. UNIMAN (English): Students felt that the feedback did not provide clear actions to remedy areas identified as poorly performing.

PVT7: Organisational efficiency
- OVT7.1 There is a saving in institutional resources overall.
  Validated: PUB (English).

PVT8: Relevance
- OVT8.1 The service meets one or more institutional objectives.
  Validated: PUB (English), UNIMAN (English).

PVT9: Likelihood of adoption
- OVT9.1 Users were motivated to continue to use the system after the end of the formal validation activities.
  Validated: PUB (English). Validated with qualifications: UNIMAN (English).
  Qualification: UNIMAN (English): Average scores between 3.0-3.5/5.0.
- OVT9.2 A high score was obtained in the generic questionnaires (based on UTAUT: likelihood of adoption by users).
  Validated with qualifications: PUB (English), UNIMAN (English).
  Qualifications: PUB (English): Average score 3.75. UNIMAN (English): Average score 3.46.
- OVT9.3 Tutors attending a dissemination workshop give high scores to the question 'how likely are you to adopt the service?'
  Validated: PUB (English).

PVT10: Additional WP-specific VTs, related to Unique Selling Points
- OVT10.1 Learners are more involved and motivated with the course by using PolyCAFe.
  Validated with qualifications: PUB (English).
  Qualification: PUB (English): The focus group results are encouraging, but this validation topic should be tested more thoroughly/formally.

Exploitation (SWOT analysis)
The objective you are asked to consider is: "PolyCAFe (v1.5) will be adopted in pedagogic contexts beyond the end of the project".

Strengths
The strengths of the system (v1.5) that would be positive indicators for adoption are:
- PolyCAFe promotes learner reflection on their performance as individuals and as members of a group.
- Feedback from PolyCAFe has been shown to improve the collaborative skills of learners in online discussions.
- The institution requires less tutor time for feedback, support and grading.
- PolyCAFe monitors the participation of the students and makes the learning processes more transparent, e.g. locating the outliers and positioning individual learners in the peer group.
- PolyCAFe contributes to improving the consistency of feedback between tutors, especially if PolyCAFe is used as a starting point when providing manual feedback to the learners.
- The feedback appears to have motivational aspects for engaging students in their activity.
Weaknesses
The weaknesses of the system (v1.5) that would be negative indicators for adoption are:
- The usability and the poor guidance within the system cause discomfort to users.
- It is difficult to interpret the results, especially due to the large amount of information.
- General trust in the system and its reliability should be improved by improving accuracy.
- It is unclear whether students will accept that their grading goes beyond the submitted products, as PolyCAFe enables grading of the discussion processes during production.
- PolyCAFe is best used for topics with small numbers of key words per concept.
- New sites may find it difficult to analyse their situations in order to know which key words to enter into the system.

Opportunities
The system has potential as follows:
- It is the only software on the market that provides complex feedback for online conversations with a focus on stimulating collaboration.
- While the use of CSCL (conversations) is becoming more and more popular to relieve the tutor burden, PolyCAFe enables the tutors to monitor the individual contributions.
- The system might be used as a feedback standard for training tutors to assess collaborative activities.
- There are many situations in which learners who participate in online discussions do not receive any feedback on their productions; the automatic feedback provided by PolyCAFe would therefore be very valuable.
- PolyCAFe could also be used as a starting point for a chat agent that offers live feedback to students involved in chats and forums in order to motivate them or engage them in a better collaborative discourse.
- The use of Web 2.0 in education often implies many textual outputs that are very small portions of text, similar to a chat conversation. PolyCAFe could be easily adapted for this task.

Threats
- PolyCAFe may not be suitable for all chat/discussion forum situations.
It is more suited to chats/forums where (1) grading and/or detailed feedback is required, and (2) one of the aims of the chat/forum is for social learning to take place.
- There may be change management issues in introducing automatic feedback/assessment systems into new environments. Introduction of the system has to overcome concerns about changes in working practices and about whether PolyCAFe's output can be trusted.
- Some learners may feel uncomfortable being monitored during their collaboration.
- It may be difficult to integrate PolyCAFe with the IT architecture of the institution.
- Institutions may be deterred from adopting PolyCAFe owing to the extent of the initial training required to interpret the data.
- Privacy issues might arise in some institutions when using the system to analyze the contributions of students.

Overall conclusion regarding the likelihood of adoption of PolyCAFe Version 1.5:
PolyCAFe is an innovative service, the first on the market to provide complex feedback for online chats and forums used in collaborative learning situations. It saves tutors time in assessing conversations, but it is best suited to specific pedagogic settings in which tutors analyze the discussions and the students expect to receive feedback. However, it is important to enhance the functionality of the system by improving its usability and providing better help and instructions for using and interpreting the feedback. To be more convincing for users, PolyCAFe should be tested in more domains and in various contexts in order to prove its reliability and usefulness.
Most important actions to promote adoption of PolyCAFe:
- Improve the interface and the usability of the system.
- Provide better help and interpretation of the feedback.
- Transform the indicators provided as feedback into advice that guides the learners instead of assessing them.
- Extend PolyCAFe to various domains and other languages where there is an interest in adopting the service.
- Use PolyCAFe's feedback live in a chat conversation in order to guide the learners' conversation.
- Improve the privacy of the system, by anonymization, user-alias management or explicit sharing of the locus of control.

Section 7 – Road map
Based on the results and conclusions from validation, the LTfLL team has agreed that the following are the most important future enhancements to the system in order to meet stakeholder requirements:
Most important:
1. Improving the interface by supporting users with an easy overview/interpretation of the vast amount of data generated.
2. Improving the help section and providing illustrative examples of relevant collaboration patterns.
3. Expressing the feedback in a more direct manner (as advice), instead of just providing scores and indicators.
4. Integrating PolyCAFe into a chat environment in order to provide "live" feedback, or to join the discussion process as an agent.
5. Extending PolyCAFe to Romanian so that it can be used for more domains at PUB-NCIT (for courses conducted wholly in Romanian).
Other:
- Improving the visualization of the discussion threads (and enhancing the thread detection algorithms).
- Implementing an anti-spam mechanism.
- Improving privacy by using a better mapping between users and the aliases used for anonymization.
- Improving the semantic similarity measure by considering alternatives such as Latent Dirichlet Allocation and ontologies.
Based on the results and conclusions from validation, the LTfLL team has agreed that the following are the most important changes to the current scenario(s) of use in order to meet stakeholder requirements:
Most important:
1. Focus on the importance of the collaboration aspect of the task to be solved.
2. Ask the learners to be as involved as possible in the collaborative task.
3. Try to develop a method for the automatic evaluation of the degree to which the learning outcomes, specified by the teacher when setting up the assignment, are reached. This extends the usage scenario to cases where teachers need to assess the degree of fulfilment of a learning outcome for a chat or forum assignment (or for each participant in such a conversation).
4. Investigate using PolyCAFe for discussions that arise naturally in the forums, chats or tweets of Communities of Practice, or of learners who use Web 2.0 tools naturally (outside a formal education context).
5. As in certain situations it is difficult to determine the key concepts for the discussion beforehand, develop a scenario that uses PolyCAFe without specifying them.
Other:
- Automatic training of semantic spaces from online resources or books.

Based on the results and conclusions from validation, the LTfLL team has agreed that the following are possible additional educational contexts for future deployment:
Most important:
1. Formal education where it is important to have open collaborative online debates between several participants.
2. Collaborative discussions of online Communities of Practice that use forums or tweets for communication (informal learning).
3. Language learning (foreign or native) for inexperienced learners who need to learn specific concepts through debate and examples.
4. Argumentation-based learning situations that use online discussions.
Based on the results and conclusions from validation, the LTfLL team has agreed that the following are the most important issues for future technical research to enable deployment of language technologies in educational contexts:
Most important:
1. Because the pilot participants (especially the tutors) expressed concerns and alternative viewpoints on the meaning of good collaboration, develop several alternative algorithms for assessing the degree of collaboration in a discussion and compare their results against a gold standard in order to determine the best one.
2. Because the current implementation of PolyCAFe discovers a great number of implicit links, several elongated discussion threads have been observed. There is therefore a need to improve the discussion thread partitioning algorithms in order to split these threads.
3. As various alternatives exist for measuring semantic similarity, integrate alternative methods for computing the semantic similarity between utterances, considering supplements to LSA: pLSA, LDA and ontologies.
4. Further extend the implicit link detection mechanism by studying newer algorithms for coreference resolution and RST parsing. These algorithms should be designed specifically for conversations with at least two participants, because discourse models and algorithms that work well on written texts (and are tested on the standard MUC and DUC corpora) do not perform well on conversations.
5. Develop an algorithm to measure the coverage of a learning outcome within a discussion, comparing different machine learning and evaluation techniques, in order to be able to use PolyCAFe in this new learning scenario.

Roadmap – validation activities
Further validation planned for beyond the end of the project:
OVT10.1. Learners are more involved and motivated with the course by using PolyCAFe.
Claim (OVT): OVT10.1. Learners are more involved and motivated with the course by using PolyCAFe.
Methodology: Questionnaires, pre- and post-tests, comparison with a control group, extended activity.

Objective (OVT): Learners are knowledgeable in the domain of the course by using PolyCAFe.
Methodology: Questionnaires, pre- and post-tests, comparison with a control group, extended activity.

Objective (OVT): Test the accuracy of the feedback for more courses and topics in order to assess and improve the reliability of the service.
Methodology: Measurements.

Appendix B.5 Validation Reporting Template for WP5.2 (UPMF / CNED)

Section 1: Functionality implemented in Version 1.5 and alpha/beta-testing record

Brief description of functionality and changes from Version 1.0:

Code structure (v1.5): In order to make maintenance easier, the structure of the code has been cleaned up and reorganized. These changes were also needed so that widgetization could be handled more smoothly.

R-LSA integration (v1.5): The previous LSA implementation required a deprecated OS, which is a strong constraint for system managers who will not dedicate a server to the system. Reintegrating LSA through R coincidentally allowed us to improve the feedback algorithm and the response time (measurements have shown improvements of between 60% and 80%, see D5.3).

Learner judgment on feedback (v1.5): In order to make Pensum's role clearer to the learners (to let them reflect on their writing so as to understand the documents better, not to fix the shortcomings of their synthesis for them), and to make clear that semantic analysis cannot be 100% accurate, learners can question the feedback provided and even justify their position. When they use this functionality, it gives tutors insight into the student's intent and can trigger valuable exchanges.

Severity management (v1.5): This functionality allows the learner to control the quantity of feedback provided by the system, which can display only the most important issues or many of them.
Depending on what they are doing (first draft / final proof-reading), the functionality can be an asset to the learner's organization of their work.

Version handling (v1.5): All successive versions of the synthesis are stored, along with the feedback and the learner's actions on the feedback. The learner benefits in that any action taken upon feedback is propagated to later versions. Teachers and researchers alike use it as a data source on the learner's writing process and response to the system's feedback.

Alpha-testing
Pilot site and language: UPMF – French
Date of completion of alpha testing: end of September 2010
Who performed the alpha testing? Philippe Dessus & Mathieu Loiseau

Beta-testing
Pilot site and language: UPMF – French
Has stakeholder validation taken place using the service embedded in Elgg (Yes/No/Partially): No
If 'No' or 'Partially', give reasons: The Pensum widget (v1.6 and up) was being developed while the experimentation took place. Widgetization required extending the in-depth restructuring of the code. It also required the development of various new features (multiple vector space handling, text addition, etc.) which were not relevant to the testing at hand. We were also in the process of reworking some existing services (such as sentence detection). We thought it best to have the students work on the latest stable version of the system.
Beta-testing performed by: 17 participants (learners, n = 9; tutors, n = 8).
Beta-testing environment (stand-alone service / integrated into Elgg): Stand-alone service

HANDOVER DATE: 12 October 2010 (date of handover of software v1.5 for validation)

Section 2: Validation Pilot Overview
Note: The underlying goal of this pilot study is to test Pensum in real-world settings, with learners attending an existing ICT course at a distance. The ethical requirements of French universities for experiments involving humans imply that participants are informed of the goals of the study and that their participation is fully voluntary, so they can leave the experiment at any moment.
This section mentions only one pilot site, but we were in contact with another (CNED-University of Rouen, France), involving 51 bachelor students in Educational Sciences attending the same e-learning course as CNED-Lyon. The CNED-Rouen students were contacted to participate as volunteers in this pilot study, and only three of them performed the whole study. We thus decided to integrate them into the analysis of the first pilot site, as they were given the same task and attended exactly the same courses under the same conditions. We performed a first analysis of the reasons why so few CNED-Rouen participants were involved in this study: one of us went to Lyon during the first students' meeting at the beginning of November in order to present the experiment and the task, whereas the CNED-Rouen students were contacted by e-mail only, yielding weaker involvement in the task. This point is taken into account in the "Most important actions" rubric (Section 6 of this document).
NB: Information about pilot sites, courses and participants has been transferred to Appendix A.3.

Pilot task
Pilot site: CNED-Lyon-Rouen
Pilot language: French
What is the pilot task for learners and how do they interact with the system?
51 participants preparing their bachelor degree in Educational Sciences (CNED) were recruited to write two syntheses, but only those who answered the questionnaire were considered. Four tutors (Master's degree in Educational Sciences) were also recruited to manage the learners. Learners had to use Pensum at a distance in order to write two syntheses of the first two parts of their ICT course. No length constraints were given, and the students had 10 days to write both syntheses. They were asked to use Pensum to write a first synthesis and were randomly divided into two groups (first synthesis: experimental group, n = 17; control group, n = 22). The experimental group was guided to Pensum, whereas the control group was guided to a fake interface of Pensum, without feedback and with very low-level text formatting possibilities, in order to prevent them from using the advanced functionalities of a word processor. After writing the first synthesis, the control group was asked to use Pensum and the experimental group the fake Pensum, giving a counterbalanced design (second synthesis: experimental group, n = 16; control group, n = 11). Experimental results were analyzed by merging the first and second experimental groups. Control group results were analyzed only for the first control group, with participants who had not previously used Pensum, because order effects were detected. After writing each synthesis, learners were asked to fill out the LTfLL Likert questionnaire. Finally, 12 learners agreed to participate in a telephone interview.

What do the learners produce as outputs? Are the outputs marked?
All participants produced two syntheses corresponding to the first two exercises in their ICT lessons, in a counterbalanced design (first synthesis with Pensum, the second without it, or vice versa). The first exercise was to summarize one document, whereas the second was to summarize two documents.
The answers of the learners to the questionnaire were crossed with the traces of their activity in Pensum, in order to see whether the extensiveness of their use of the system influenced their judgement. Finally, the outputs were marked by expert teachers.

How long does the pilot task last, from the learners starting the task to their final involvement with the software? 10 days

How do tutors/student facilitators interact with the learners and the system? Learners can send questions by e-mail to tutors. For the second synthesis, tutors can also use the notepad to share their comments with the students.

Describe any manual intervention of the LTfLL team in the pilot: None.

Experiments

Name of experiment: Expert evaluation
Objectives:
- Assess the quality of Pensum's feedback compared to the experts' feedback
- Assess the output quality
- Measure the time saved in correcting Pensum-supported syntheses vs. word-processor-based syntheses (written without feedback)
Details:
Participants: Seven teachers specialized in synthesis writing (associate professors and postdoctoral students in Linguistics; 6 female, 1 male) were recruited as experts to mark a random set of the written syntheses (with and without Pensum). One expert was removed because he did not perform his task, and another evaluated only half of his syntheses.
Task: Four different syntheses were assigned to each expert: two written with feedback (Pensum) and two written without feedback, as the control group (the fake Pensum interface). One synthesis was assigned to all experts in order to compute an inter-rater agreement score. They were asked to mark each synthesis at three levels: concepts present in the learners' syntheses that are not present in the source text, gaps in coherence, and concepts missing from the learners' syntheses that are present in the source text.
Analysis: We then calculated Pensum's Recall and Precision percentages.
Precision corresponds to the relevance of the information given by the system; Recall corresponds to the feedback's coverage of the issues identified by the experts. The overall Recall and Precision rates take all experts and syntheses into account, but Recall and Precision can also be analyzed synthesis by synthesis to evaluate the match between Pensum's feedback and the experts' feedback. The experts were also asked to grade the syntheses, in order to evaluate the potential gain in quality of student outputs provided by Pensum.

Name of experiment: Dissemination workshop
Objectives:
- Promote Pensum to a larger number of likely tutors
- Evaluate Pensum's acceptance within this group
- Collect tutors' viewpoints and suggestions about Pensum
Details:
Participants: 27 participants (including 6 teachers) took part in our workshop.
Workshop: The workshop took place during a French congress dedicated to Internet innovations, and notably to Web 2.0-based education. We were invited to present the LTfLL project and, more precisely, Pensum. First, we introduced the LTfLL project and Pensum to the participants (10 min), followed by a demonstration of Pensum's widget being used to write a synthesis (10 min). Second, we conducted a debate based on the LTfLL focus group template (10 min). Finally, the dissemination workshop questionnaire was distributed (filled out and collected after the workshop). Nevertheless, only 6 tutors (out of the 27 who attended the workshop) agreed to answer the dissemination questionnaire.
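The Recall and Precision computation described above can be sketched in a few lines. Note that the exact formulas are not spelled out in this template; the sketch below assumes Precision = common feedback / Pensum feedback and Recall = common feedback / expert feedback, a reading that reproduces the overall rates reported under OVT 1.1–1.3 up to rounding.

```python
# Hedged sketch of the Precision/Recall computation used in the expert evaluation.
# Assumption (not stated explicitly in the report): Precision = common / system,
# Recall = common / expert, where "common" is feedback given by both Pensum and
# the experts.

def precision_recall(system: float, expert: float, common: float) -> tuple[float, float]:
    """Return (precision, recall) from mean feedback counts per synthesis."""
    precision = common / system if system else 0.0
    recall = common / expert if expert else 0.0
    return precision, recall

# Mean missing-concept feedback counts per synthesis reported for OVT 1.1
p, r = precision_recall(system=5.6, expert=15.4, common=1.9)
print(f"Precision = {p:.0%}, Recall = {r:.0%}")  # Precision = 34%, Recall = 12%
```

Applied to the OVT 1.2 counts (1.6, 1.1, 0.4), the same function gives 25% and 36%, close to the reported 26% and 39%, which suggests the published figures were computed on the underlying totals rather than on the rounded mean counts.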
Section 3: Results – Validation/verification of Validation Topics

OVT 1.1
Pilot site: CNED-Lyon-Rouen. Pilot language: French.
Operational Validation Topic: According to tutors, in a high proportion of cases, the feedback presented by the system correctly identifies the concepts missing from the learners' syntheses that are present in the source texts.
Summative results with respect to validation indicator
Stakeholders / methodology: Calculation of the Precision and Recall of the feedback (experts' experiment).
Results: 22 syntheses were evaluated by the experts (4 syntheses for each of 5 experts and 2 syntheses for the last expert). The feedback items given by Pensum were counted, as were those given by the experts; common feedback is feedback given both by Pensum and by the experts.

Missing-concept feedback (mean per synthesis): Pensum 5.6; Expert 15.4; Common 1.9

On average, Pensum identifies fewer missing concepts than the experts do, and those it detects are sometimes different from those the experts detect. The overall Recall is thus 12% and the overall Precision 34%. On average, the tutors' (n = 4) opinion on missing-concept identification (Q. 34: "the feedback presented by the system correctly identifies concepts missing") is negative (M = 1.5; SD = 0.58).

OVT 1.2
Pilot site: CNED-Lyon-Rouen. Pilot language: French.
Operational Validation Topic: According to tutors, in a high proportion of cases, the feedback correctly identifies concepts present in the learners' syntheses which are not present in the source texts.
Summative results with respect to validation indicator
Stakeholders / methodology: Tutors' (n = 4) questionnaire and calculation of the Precision and Recall of the feedback.
Results:

Off-topic feedback (mean per synthesis): Pensum 1.6; Expert 1.1; Common 0.4

On average, Pensum identifies more off-topic concepts than the experts do. The overall Recall is 39% and the overall Precision 26%. On average, the tutors' (n = 4) opinion on off-topic identification (Q. 35: "the feedback correctly identifies concept present in the syntheses but not in the source texts") is negative (M = 2.3; SD = 1.26).

Formative results with respect to validation indicator
- Tutors: "Off-topic detection yields relative performance, it depends on the instructions and the expected length of the text."

OVT 1.3
Pilot site: CNED-Lyon-Rouen. Pilot language: French.
Operational Validation Topic: According to tutors, in a high proportion of cases, the feedback correctly identifies gaps in the coherence of the learners' syntheses.
Summative results with respect to validation indicator
Stakeholders / methodology: Tutors' (n = 4) questionnaire and calculation of the Precision and Recall of the feedback.
Results: See OVT 3.12.

Coherence-gap feedback (mean per synthesis): Pensum 20.9; Expert 1.8; Common 1.5

On average, Pensum identifies more gaps in coherence than the experts do. The overall Recall is 85% and the overall Precision 7%. On average, the tutors' (n = 4) opinion on coherence-gap identification (Q. 36: "the feedback correctly identifies gaps in the coherence") is negative (M = 2.3; SD = 1.26).
Formative results with respect to validation indicator
- Tutors: "A lot of gaps in coherence"
- Tutors: "There are many gaps in coherence identified by Pensum but often unjustified"
- Tutors: "The low reliability of coherence gaps feedback calls into question its inclusion in the software"

OVT 2.1
Pilot site: CNED-Lyon-Rouen. Pilot language: French.
Operational Validation Topic: The tutor spends less time preparing feedback compared to traditional means.
Summative results with respect to validation indicator
Stakeholders / methodology: Tutors' (n = 4) questionnaire, communication between learners and tutors, and time spent by the experts to mark the syntheses.
Results: E-mails received and sent by tutors for the first synthesis task:

Group         Learners  Received by tutors  Received per learner  Sent by tutors  Sent per learner
Experimental  33        41                  1.2                   39              1.2
Control       22        47                  2.1                   46              2.1

Since the tutors of the experimental group received and sent fewer e-mails per learner than those of the control group, we can infer that the experimental group's tutors spent less time preparing feedback than with traditional means. Only one tutor interacted with their learners through the notepad.

Group         Experts  Syntheses  Mean correction time in min (SD)
Experimental  6        11         20.1 (6.1)
Control       6        11         22.9 (11.2)

On average, an expert spent 20.1 min assessing a synthesis pre-assessed by Pensum, versus 22.9 min for a control synthesis. (Note: one expert was removed because the work was not done, and another corrected only one synthesis of each type.) Moreover, there is a moderate correlation (r = –.63; p = .05) between the time spent marking syntheses and the number of feedback requests.
This suggests that the more feedback learners request, the less time is needed to mark their syntheses.

Tutor questionnaire results:
- Q. 7 "It takes less time to complete my teaching tasks using Pensum than without the system." (Experimental): M = 2.8/5; SD = 1.26; 25% Agree/Strongly agree; n = 4.
- Q. 8 "Using Pensum enables me to work more quickly than without the system." (Experimental): M = 2.5/5; SD = 1.29; 25% Agree/Strongly agree; n = 4.

Formative results with respect to validation indicator
- Tutors: "The assessment with Pensum's interface is longer because there are not enough tools like arrows, colours or other visual cues to insert in the synthesis."
- Tutors: "For long lessons or with various documents, Pensum can help us save time."
- Tutors: "But it facilitates education especially for students who have not synthesized texts long ago"

OVT 2.2
Pilot site: CNED-Lyon-Rouen. Pilot language: French.
Operational Validation Topic: It is easier (there is less cognitive load) for tutors to provide feedback using Pensum compared with just reading learner texts.
Summative results with respect to validation indicator
Stakeholders / methodology: Tutors' (n = 4) questionnaire.
Results (questionnaire-related results only):
- Q. 11a "Rate mental effort" (reversed scale) (Experimental): M = 3.5/5; SD = 1.0; 75% Agree/Strongly agree; n = 4.
- Q. 11b "System requires less mental effort" (Experimental): M = 3.0/5; SD = 1.83; 50% Agree/Strongly agree; n = 4.

Formative results with respect to validation indicator
- Tutors: "Improvements must be made in the communication between the tutor and the learner"
- Tutors: The interface is "not satisfactory"
- Tutors: It needs "clues about the learners' certainty" on the relevance of their synthesis
- Tutors: It needs "a help button"
- Tutors: It needs "a presentation of tutor's remarks in parallel to synthesis displays"

OVT 3.1
Pilot site: CNED-Lyon-Rouen. Pilot language: French.
Operational Validation Topic: The teacher's activity shifts towards providing more advanced feedback.
Summative results with respect to validation indicator
Stakeholders / methodology: Tutors' (n = 4) questionnaire.
Results (questionnaire-related results only):
- Q. 44 "To what extent do you think your job as a teacher/tutor has been transformed by the fact that when students used Pensum, it gave them pieces of feedback on their work." (Experimental): M = 3.0/5; SD = 0.82; 25% Agree/Strongly agree; n = 4.

Formative results with respect to validation indicator
- Tutors: "If the learner is not motivated in his task, Pensum is as useful as paper-pencil."
- Tutors (workshop): "Automatic systems are needed to unburden teachers of repetitive tasks but it should not replace the teacher"

OVT 3.2
Pilot site: CNED-Lyon-Rouen. Pilot language: French.
Operational Validation Topic: The feedback given is more consistent than that of different tutors (there is more homogeneity among the responses provided to learners).
Summative results with respect to validation indicator
Stakeholders / methodology: Learners' (n = 33) questionnaire.
Results (questionnaire-related results only):
- Q. 31 "I perceive the feedback as more consistent than that of tutors" (Experimental*): M = 2.1/5; SD = 0.8; 0% Agree/Strongly agree; n = 33.

OVT 3.3
Pilot site: CNED-Lyon-Rouen. Pilot language: French.
Operational Validation Topic: Learners find the feedback given by the system is mostly correct.
Summative results with respect to validation indicator
Stakeholders / methodology: Learners' (n = 33) questionnaire.
Results (questionnaire-related results only):
- Q. 32 "I find the feedback given by the system is correct in more than 75% of the cases" (Experimental*): M = 2.8/5; SD = 1.06; 27% Agree/Strongly agree; n = 33.
*Question for the experimental group only.

Formative results with respect to validation indicator
- Learners: "We don't know why what we do is wrong, there is no comment."
- Learners: The feedback is "sometimes right, sometimes wrong".
- Learners: The feedback "creates misunderstandings".
- Learners: Feedback is "not clear enough and not quite readable".

OVT 3.4
Pilot site: CNED-Lyon-Rouen. Pilot language: French.
Operational Validation Topic: Learners find the feedback given by the system is relevant to (i.e. useful to them in) the task in hand.
Summative results with respect to validation indicator
Stakeholders / methodology: Learners' (n = 33) questionnaire and trace analysis.
Results (questionnaire and correlations between answers and trace-related data):
- Q. 33 "I find the feedback given by the system is relevant to (i.e. useful to them in) the task in hand." (Experimental*): M = 2.9/5; SD = 1.01; 27% Agree/Strongly agree; n = 33.
*Question for the experimental group only.
The correlation between Q. 33 and the number of feedback requests is r = 0.42, p < .05, showing a moderate but significant relation between perceived relevance and the actual use of the feedback.

Formative results with respect to validation indicator
- Learners: Pensum helps to "read more precisely the course" and to "memorize more things and (...) retain the key-passages"
- Learners: "The feedback is not fine enough", "not clear enough", and "not explicit enough".

OVT 3.5
Pilot site: CNED-Lyon-Rouen. Pilot language: French.
Operational Validation Topic: Learners trust the feedback provided by Pensum.
Summative results with respect to validation indicator
Stakeholders / methodology: Learner interviews.
Results: Formative results only.
Formative results with respect to validation indicator
- Learners: "To me, feedback can identify the mistakes and they are often justified, it was a help and it is obvious that feedback can't be perfect"
- Learners: "I'm often disagree with the feedback, my synthesis seemed OK for me but not for Pensum."
- Learners: "I had no confidence in the feedback"

OVT 3.6
Pilot site: CNED-Lyon-Rouen. Pilot language: French.
Operational Validation Topic: The tutors find the feedback given by the system at the right level considering the task in hand in a high proportion of cases.
Summative results with respect to validation indicator
Stakeholders / methodology: Tutors' (n = 4) questionnaire.
Results (questionnaire-based results only):
- Q. 41 "The gap in coherence feedback helps learners write readable synthesis" (Experimental): M = 3.3/5; SD = 0.96; 50% Agree/Strongly agree; n = 4.
- Q. 42 "The off-topic feedback helps learners write syntheses with no or few off-topic" (Experimental): M = 3.8/5; SD = 0.5; 75% Agree/Strongly agree; n = 4.
- Q. 43 "The concept missing feedback helps learners write syntheses with only important concepts present in the source text" (Experimental): M = 3.0/5; SD = 1.15; 50% Agree/Strongly agree; n = 4.

OVT 3.7
Pilot site: CNED-Lyon-Rouen. Pilot language: French.
Operational Validation Topic: Tutors perceive that the feedback from Pensum provides a reliable source of information about learners' conceptual coverage.
Summative results with respect to validation indicator
Stakeholders / methodology: Tutors' (n = 4) questionnaire.
Results (questionnaire-related results only):
The gap in coherence feedback gave me a reliable source of information about learners' conceptual coverage. Experimental 2.8 1.71 25% 4 Tutors 46. The off-topic feedback gave me a reliable source of information about learners' conceptual coverage. Experimental 2.8 0.96 25% 4 Tutors 47. The concept missing feedback gave me a reliable source of information about learners' conceptual coverage. Experimental 3.3 0.96 50% 4 Formative results with respect to validation indicator Stakeholder type Results Tutors “The feedback is not explicit enough on what Pensum expects from learners”. Tutors “Repetitions induced by Pensum‟s feedback and by the necessary changes to correct them provide a better learning and highlight learner‟s essential ideas.” OVT 4.1 Pilot site CNEDLyonRouen Pilot language French Operational Validation Topic Learners can receive feedback whenever they want Summative results with respect to validation indicator Page 200 of 349 D7.4 - Validation 4 Experimental results, with stakeholders involved and brief methodology Stakeholders / methodology: Learners (33) questionnaire Results: Questionnaire-related results only. Questionnaire type Questionnaire no. & statement Experimental / control group Mean (/5) Standard deviation %Agree / Strongly agree n= Learners 10. Pensum provides me with the requested information when I require it (i.e. at the right time in my work activities). Experimental* 3.2 0.88 30% 33 *question for experimental group only Formative results with respect to validation indicator Stakeholder type Results Learners “The feedback should be automatic.” OVT 5.1 Pilot site CNEDLyonRouen Pilot language French Operational Validation Topic The textual output that is handed over to the teacher is of better quality. 
Summative results with respect to validation indicator
Experimental results, with stakeholders involved and brief methodology
Stakeholders / methodology: comparison of experts' marks and trace analysis.
Results:

Group | N experts | N syntheses | Mean score (/20) (SD)
Experimental | 7 | 13 | 12.33 (3.6)
Control | 7 | 13 | 12.25 (3.4)

On average, the syntheses of the Pensum group were marked 12.33/20, against 12.25/20 for the control group. The difference is not statistically significant, F(1, 26) = 0.0002, ns. Additionally, the syntheses of learners who requested feedback were not significantly better than those of learners who did not (r = 0.25, ns).

Questionnaire type | Questionnaire no. & statement | Experimental / control group | Mean (/5) | SD | % Agree / Strongly agree | n
Tutors | 40. Better output with the system. | Experimental | 2.5 | 1.73 | 25% | 4

Formative results with respect to validation indicator
Stakeholder type | Results
Tutors | "It needs a human behind it; otherwise Pensum is similar to a word processor."

OVT 6.1
Pilot site: CNED Lyon/Rouen. Pilot language: French.
Operational Validation Topic: The direct feedback provided by the system encourages learners to undertake further study to address gaps in their coverage.
Summative results with respect to validation indicator
Experimental results, with stakeholders involved and brief methodology
Stakeholders / methodology: learners (experimental group, n = 33; control group, n = 22); questionnaire and trace analysis.
Results: questionnaire scores and correlations with trace-related data.

Questionnaire type | Questionnaire no. & statement | Experimental / control group | Mean (/5) | SD | % Agree / Strongly agree | n
Learners | 18. Using Pensum increases my curiosity about the learning topic. | Experimental | 3.1 | 1.18 | 45% | 33
Learners | 18. | Control | 3.3 | 1.21 | 36% | 22
Learners | 19. Pensum makes learning more interesting. | Experimental | 2.9 | 0.95 | 27% | 33
Learners | 19. | Control | 3.0 | 1.25 | 31% | 22
Learners | 20. Using Pensum motivates me to explore the learning topic more fully. | Experimental | 3.1 | 1.17 | 36% | 33
Learners | 20. | Control | 3.2 | 1.41 | 41% | 22

Correlations between the number of distinct feedback items provided by Pensum over the course of the learner's activity and Q. 18 (Pensum increases my curiosity), Q. 19 (Pensum makes learning more interesting) and Q. 20 (Pensum motivates me to explore the learning topic more fully) are, respectively, r = 0.44, p < .01; r = 0.40, p < .05; and r = 0.38, p < .05. These correlations show a moderate but significant relation between the motivation to use Pensum and the actual use of the feedback.

Formative results with respect to validation indicator
Stakeholder type | Results
Learners | "I did not want to explore the course more because I think I've had a look at this method."

OVT 7.1
Pilot site: CNED Lyon/Rouen. Pilot language: French.
Operational Validation Topic: There is a saving in institutional resources overall.
Summative results with respect to validation indicator
Experimental results, with stakeholders involved and brief methodology
Stakeholders / methodology: managers (n = 3), interview; communication between learners and tutors; time spent by experts marking the syntheses.
Results: interview, communication and expert results.

As highlighted above, Pensum helps clarify the communication between learners and tutors. In addition, using Pensum saves time spent marking the syntheses (see OVT 2.1). Manager #1 indicated that the main quality of Pensum is not so much to save time or money as to make the institution more aware of the learners' competences and of their own understanding of the course material. Manager #2 indicated that the three kinds of feedback offered to learners help them learn to write a summary or synthesis by identifying highly relevant elements, writing being a core competence. He also pointed out the time saved in correcting syntheses and in planning a course. Finally, Pensum would enable learners to become more involved in their learning. However, in its current state, improvements are needed, especially in the quality of the textual output: the three kinds of feedback are not sufficient, and important information is left out that linguistic feedback on spelling, grammar, syntax or structure could capture. Manager #3 indicated that Pensum can save resources if it is used with tutors or teachers involved in guiding learners within the learning environment.

OVT 8.1
Pilot site: CNED Lyon/Rouen. Pilot language: French.
Operational Validation Topic: The service meets one or more institutional objectives.
Summative results with respect to validation indicator
Experimental results, with stakeholders involved and brief methodology
Stakeholders / methodology: managers (n = 3); interview.
Results: interview results only.

Manager #1 indicated that Pensum would enable the institution to address two main problems regarding the assessment of learners' competences. First, with Pensum learners would be assessed both on their understanding of a course and against a competence reference framework. Second, as the institution holds many minutes of meetings, Pensum could enable people to check their understanding of them, whether or not they attended the meeting. Manager #2 indicated that one of his institution's goals is the integration of ICT in education; the simplicity of the software and of its use is a decisive advantage for its integration in his institution.
Manager #3 indicated that Pensum can serve three pedagogy-focused goals: improving the quality of teachers' productions (as authors), the quality of learners' understanding of course contents, and the pass rate of exams (notably transmissive, literature-based ones).

OVT 9.1
Pilot site: CNED Lyon/Rouen. Pilot language: French.
Operational Validation Topic: Users were motivated to continue to use the system after the end of the formal validation activities.
Summative results with respect to validation indicator
Experimental results, with stakeholders involved and brief methodology
Stakeholders / methodology: learners (experimental group, n = 33; control group, n = 22) and tutors (n = 4), questionnaires; correlation with trace analysis.
Results: questionnaire-based results and trace data analysis.

Questionnaire type | Questionnaire no. & statement | Experimental / control group | Mean (/5) | SD | % Agree / Strongly agree | n
Learners | 29. I would like to use the service after the pilot. | Experimental | 3.3 | 1.23 | 45% | 33
Learners | 29. | Control | 3.9 | 1.31 | 68% | 22
Tutors | 30. I would like to use the service after the pilot. | — | 3.0 | 1.41 | 50% | 4
Learners | 30. If the service is available after the pilot, I will definitely use it. | Experimental | 3.2 | 1.29 | 42% | 33
Learners | 30. | Control | 3.6 | 1.26 | 55% | 22
Tutors | 31. If the service is available after the pilot, I will definitely use it. | Experimental | 1.8 | 0.96 | 0% | 4

Moreover, the correlation between Q. 30 (learner experimental group) and the number of feedback requests is r = 0.412, p < .05, showing a moderate but significant relation between learners' stated likelihood of using Pensum in the future and their actual use of the feedback. Finally, after the pilot, of the 51 participants with a login and password for Pensum, only two CNED students used it to synthesise another part of their ICTE course not included in our experiment.
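The correlation figures reported in this section (e.g. r = 0.412 between Q. 30 and the number of feedback requests) are standard Pearson product-moment correlations between a questionnaire score and a usage count extracted from the traces. As a minimal illustration only — the data values below are invented, not the pilot data — the coefficient can be computed as:

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical example: Likert answers to a questionnaire item
# vs. each learner's number of feedback requests (invented data).
likert = [2, 4, 3, 5, 1, 4]
requests = [1, 6, 4, 9, 0, 5]
r = pearson_r(likert, requests)
```

The significance levels quoted in the report (p < .05, p < .01) would have been obtained with standard statistical software; they are not reproduced in this sketch.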
Formative results with respect to validation indicator
Stakeholder type | Results
Learners | "I will be motivated to use Pensum again, if improvements are made."

OVT 9.2
Pilot site: CNED Lyon/Rouen. Pilot language: French.
Operational Validation Topic: A high score was obtained in the generic questionnaires (based on UTAUT: likelihood of adoption).
Summative results with respect to validation indicator
Experimental results, with stakeholders involved and brief methodology
Stakeholders / methodology: learners (experimental group, n = 33); questionnaire.
Results: questionnaire-based results.

Descriptive statistics — learners (N = 33)
Dimension | Mean | SD
Effectiveness | 2.91 | 0.833
Efficiency | 2.85 | 0.643
Cognitive load | 2.52 | 1.149
Usability | 3.76 | 0.812
Satisfaction | 3.03 | 0.909
Facilitating conditions | 3.73 | 0.911
Self-efficacy | 3.43 | 0.860
Behavioural intention | 3.18 | 1.249
CNED | 3.21 | 0.656

OVT 9.3
Pilot site: CNED Lyon/Rouen. Pilot language: French.
Operational Validation Topic: Tutors attending a dissemination workshop give high scores to the question "How likely are you to consider adopting the service in your own educational practice?"
Summative results with respect to validation indicator
Experimental results, with stakeholders involved and brief methodology
Stakeholders / methodology: dissemination-workshop tutors (n = 6); questionnaire.
Results: questionnaire-related results only.

Questionnaire type | Questionnaire no. & statement | Experimental / control group | Mean (/5) | SD | % Agree / Strongly agree | n
Tutors (workshop) | How likely are you to consider adopting the service in your own educational practice? | Experimental | 2.3 | 1.36 | 33% | 6

Section 4: Results – validation activities informing future changes / enhancements to the system

VALIDATION ACTIVITY
Pilot partner: CNED Lyon & Rouen. Service language: French.
Additional formative results (not associated with validation topics)

Alpha testing / Beta testing:
- Minor interface changes (icons, size of fields, etc.).

Learners' and tutors' suggestions from the verification study (most frequent suggestions from the poll):
Interface changes:
- Possibility to change font, highlighting, text size, etc.
- Possibility to view the feedback from within the writing mode.
- Propose more extensive explanations for each piece of feedback, and solutions to improve the synthesis.

From the focus group:
- Make the feedback prompts viewable from within the synthesis writing field.
- Propose more extensive explanations for each piece of feedback and efficient solutions to improve the synthesis.
- Enhance the interface for writing and reading syntheses (font, highlighting, text size).

Tutor interviews:
General suggestions:
- Enhance the layout (line breaks, paragraphs, ...) and do not let Pensum remove the existing one.
- Highlight the feedback.
Additional functionalities:
- Highlighting and bookmarking functions; spelling, synonym and syntax links.
- More feedback, keywords, tooltips.
- A warning button that alerts tutors that a learner needs help.
- Clues about learners' certainty regarding the relevance of their synthesis.

Tutor workshop(s):
- A dictionary of synonyms and a thesaurus.
- Pensum could take into account the meaning of logical connectors.
Learners interview (prioritisation of enhancements):
Learners judged that the most important areas for enhancement of the system are:
- Improve the feedback (33%)
- Enhance the layout of Pensum (25%)
- Improve the handling of the syntax of syntheses (8.3%)
- Allow highlighting of the course/synthesis (8.3%)
- Allow linking several sentences to each other (8.3%)
- Improve compatibility with web browsers (8.3%)
- Improve interactivity with the tutor (8.3%)

Teaching manager interview:
- Comprehensive help has to be provided to users.
- Integrate linguistic feedback: spelling, grammar, syntax or structure.

Section 5: Results – validation activities informing transferability, exploitation and barriers to adoption

VALIDATION ACTIVITY
Partner(s) involved: CNED Lyon & Rouen. Service language: French.
Additional formative results (not associated with validation topics)

Alpha testing
Beta testing

Tutor interviews:
Major issues encountered in transferring Pensum to the ICT domain:
- All the source texts have to be transferred manually: there is no automatic import procedure to add course texts to the database.
- Internet Explorer is not usable. Warn the user (or better, test the browser): Firefox or Safari must be used. Moreover, learners were advised to use Firefox during the pilot study.
Reasons for adoption:
- Even though the reliability of the feedback is not optimal, Pensum leads learners to re-read the course and the synthesis carefully and thereby to learn better.
Barriers to adoption:
- Not enough tools such as arrows, colours or other visual cues.

Tutor workshop(s):
Reasons for adoption:
- Teachers can focus on higher-level tasks while Pensum handles lower-level ones.
- Support for carrying out competence-improving exercises.
Barriers to adoption:
- Learners who rephrase the course are more disadvantaged than those who merely paraphrase it.
- Teachers already perform the tasks Pensum promotes, so they are reluctant to transfer their work to a machine.
Learners interview:
Reasons for adoption:
- Helps learners deepen their synthesis when they have missed relevant elements in the source text.
- Helps limit off-topic content when they tend to write too much.
- A better analysis of the source text.
Barriers to adoption:
- Pensum takes time to learn how to use.
- Pensum takes time to check all the prompts and to revise the synthesis accordingly.
- The layout is not saved.
- No highlighting.
- Abbreviations are not taken into account.
- Not enough space to write the synthesis.

Teaching manager interview:
Reasons for adoption:
- The three types of feedback help correct mistakes and so work towards improving the synthesis.
Barriers to adoption:
- Unless comprehensive help is offered to the user, some functionalities (e.g. feedback rejection, feedback tolerance tuning) are not easy to understand in the current version of Pensum (Manager #1).
- Unexpected technical issues.
- The involvement of tutors and teachers (who actually help learners use the tool and answer their questions) could be improved with a fully integrated, dedicated interface making it easier to manage learners and courses.

Transferability questionnaire: relevance of the service in other pedagogic settings
CNED uses a Blackboard/WebCT platform (see http://www.sciencedu.org/), so there are difficulties in integrating our service into this platform.

Pedagogic setting | Reason(s)
Pedagogic settings for which the service would be suitable (see also D5.3 § 5.3 for more information):
- Use of Pensum for revising for exams. Reasons: students can get a good overall view of the course texts and can also get feedback on their understanding.
- Use of Pensum for students to get ideas from source texts before a debate (using chat) on a given topic. Reasons: students can work on a topic without being influenced by others.
Ideas are more structured than after a mere reading.
- Use of Pensum to work on a case study. Reasons: in-depth study of cases is fostered by their reformulation (synthesis writing).
- Use of Pensum to trigger a debate, with the participants writing out their own opinion on a topic. Reasons: as with a case study, a debate runs smoothly when all the participants clearly understand the different questions and issues on a given topic.

Pedagogic settings for which the service would be less suitable:
- Problem-based learning, where students work to solve a given problem in order to learn a domain. Reasons: PBL entails a very structured procedure (steps towards a solution), which is not taken into account in our service.

Transferability questionnaire: relevance of the service in other domains
Types of domain | Reason(s)
Types of domain for which the service would be suitable:
- All domains whose knowledge is descriptive and textual (no images, no formulas, no procedural knowledge): literature, psychology, education, social and human sciences. Reasons: LSA does not account for pictorial descriptions; procedural knowledge is also difficult to address, since the order of the steps in a procedure is crucial and hard to analyse with LSA (a bag-of-words approach).
- Use of Pensum to work on the minutes of a meeting: the minutes form the course part, and the synthesis part is what each participant actually remembers about the meeting. The feedback highlights discrepancies between the minutes and what each participant has in mind after the meeting, which would improve the quality of the decisions made at the meeting. Reasons: agreeing with the minutes of a meeting is a task very close to comprehending a source text; poor comprehension of some items of the minutes may lead to their rephrasing.
Types of domain for which the service would be less suitable:
- See above: cartography-based geography, some medicine-oriented domains, etc.
- All domains for which no large general-language corpus exists. Reasons: see above.

Section 6: Conclusions

Validation Topics
Table columns in the original template: OVT; Operational Validation Topic; validation status (validated unconditionally / validated with qualifications* / not validated / N/A); qualifications to validation. All rows below concern the CNED Lyon/Rouen pilot (UPMF).

PVT1: Verification of accuracy of NLP tools

OVT1.1: According to tutors, in a high proportion of cases, the feedback presented by the system correctly identifies the concepts missing in the learners' syntheses that are present in the source texts.
Qualifications: Though the results are low on this OVT, they should be moderated by the fact that the task imposes a high cognitive load and depends heavily on the length of the synthesis. Moreover, the task forces the tutors to analyse the text sentence by sentence, whereas they usually work with larger text units.

OVT1.2: According to tutors, in a high proportion of cases, the feedback correctly identifies concepts present in the learners' syntheses which are not present in the source texts.
Qualifications: The off-topic detection shows better results than the previous function. A difficulty inherent to self-regulated learning is that the learner decides on the length constraints of his/her summary, which makes the tutor's task regarding off-topic content more difficult (cf. expert quote).

OVT1.3: According to tutors, in a high proportion of cases, the feedback correctly identifies gaps in the coherence of the learners' syntheses.
Qualifications: The coherence feedback seems to over-generate, which is partly due to the sentence extraction algorithm.
PVT2: Tutor efficiency

OVT2.1: The tutor spends less time preparing feedback.

OVT2.2: It is easier (there is less cognitive load) for tutors to provide feedback using Pensum compared with just reading learner texts.
Qualifications: Because the display must be improved ergonomically, tutors report some cognitive load when using Pensum (the Likert scale reported in question 11a is in reverse order).

PVT3: Quality and consistency of (semi-)automatic feedback or information returned by the system

OVT3.1: The teacher's activity shifts towards providing more advanced feedback.

OVT3.2: The feedback given is more consistent than that of different tutors (there is more homogeneity among the responses provided to learners).
Qualifications: Insufficient evidence to categorise. The overall low precision and recall rates prevented users from assessing feedback consistency; what was assessed here was feedback relevance (the results are consistent with those for OVT 1.1 to 1.3).

OVT3.3: Learners find the feedback given by the system is mostly correct.
Qualifications: As indicated above, the overall precision rates are not satisfactory (34%, 26% and 7%); consequently learners do not find the feedback correct. These results also explain why learners do not trust the feedback.

OVT3.4: Learners find the feedback given by the system is relevant to (i.e. useful to them in) the task in hand.
Qualifications: Even if learners indicate that Pensum helps them learn their lessons, they find that the feedback could be clearer and more precise. Nevertheless, the correlation between the opinion on feedback relevance and the number of feedback requests indicates that the more learners requested feedback, the more relevant they found it to the task in hand.
This result allows us to hypothesise that taking advantage of the feedback is tightly linked with the learners' understanding of the system, which requires extensive use.

OVT3.5: Learners trust the feedback provided by Pensum.
Qualifications: Even if learners can identify erroneous feedback, they doubt its validity because too many errors are prompted.

OVT3.6: The tutors find the feedback given by the system at the right level considering the task in hand in a high proportion of cases.
Qualifications: The Likert results are not convincing enough for full validation. These results concern the adequacy of the feedback level, not its reliability (see OVT 1.1 through 1.3).

OVT3.7: Tutors perceive that the feedback from Pensum provides a reliable source of information about learners' conceptual coverage.
Qualifications: Tutors seem to perceive a gap between the feedback given by Pensum and what one can infer from it about learners' conceptual coverage. It is worth mentioning that the missing-concept feedback scores perceptibly higher than the two other types of feedback on this point, which is understandable, as the other two types of feedback are less directly linked with conceptual coverage, even though we believe they contribute to the overall quality of the work. Even though the other two types of feedback do not seem to provide sufficient information on conceptual coverage, question Q47 shows respectable acceptance by the tutors.
PVT4: Making the educational process transparent

OVT4.1: Learners can receive feedback whenever they want.
Qualifications: Learners can get just-in-time feedback in order to revise their syntheses, even though some suggest they should not have to ask for it.

PVT5: Quality of educational output

OVT5.1: The textual output that is handed over to the teacher is of better quality.
Qualifications: These results do not allow us to validate that the textual output handed over to the teacher is of better quality. On average, the Pensum group's score is only marginally superior to the control group's (first synthesis with a fake interface) and the difference is not significant.

PVT6: Motivation for learning

OVT6.1: The direct feedback provided by the system encourages learners to undertake further study to address gaps in their coverage.
Qualifications: Compared to the control group, the feedback does not influence learning motivation. 1. The experimental group showed rather high agreement. 2. The control group's task and fake interface were not designed to motivate the participants. 3. Filling in the questionnaires twice turned out to be tedious. Nevertheless, correlations between learners' opinions about how they experience Pensum and the number of feedback requests show that Pensum becomes a tool that facilitates investment in learning in the long run.

PVT7: Organisational efficiency

OVT7.1: There is a saving in institutional resources overall.
Qualifications: There is insufficient evidence to prove that there is a saving in resources.
PVT8: Relevance

OVT8.1: The service meets one or more institutional objectives.

PVT9: Likelihood of adoption

OVT9.1: Users were motivated to continue to use the system after the end of the formal validation activities.
Qualifications: Whereas both learners and tutors would like to use Pensum after the pilot study, tutors show more reluctance, due to the absence of a proper learner-teacher communication device within the system. Unless such functionality is implemented, Pensum will be an extra resource to integrate into their everyday practice.

OVT9.2: A high score was obtained in the generic questionnaires (based on UTAUT: likelihood of adoption by users).
Qualifications: The UTAUT result is an indicator of the overall evaluation of Pensum. Pensum seems to be usable (positive opinions on the Usability and Facilitating conditions dimensions), but its weaknesses, notably in feedback precision and recall, drive learners' negative opinions on the Effectiveness, Efficiency and Cognitive load dimensions. Consequently, the overall evaluation of Pensum is not satisfactory.

OVT9.3: Tutors attending a dissemination workshop give high scores to the question "How likely are you to consider adopting the service in your own educational practice?"
Qualifications: Participants in the dissemination workshop were not only tutors or teachers but also people from other backgrounds, such as industrialists, scholars and politicians. It was therefore difficult for many of them to consider adopting Pensum in their practice, because they have no actual educational practice. However, the two teachers in the group indicated their willingness to adopt it.

Exploitation (SWOT analysis)

The objective you are asked to consider is: "Pensum (v. 1.5) will be adopted in pedagogic contexts beyond the end of the project."

Strengths
The strengths of the system (v. 1.5) that would be positive indicators for adoption are:
- Pensum provides on-demand support for learners engaged in writing syntheses.
- Learners who engaged with Pensum were statistically likely to want to use it again.
- Pensum provides learners with valuable hints concerning gaps in coherence and off-topic elements in their syntheses.
- Pensum saves tutor time on marking and on supporting learners during the writing task (see OVT 2.1 and 7.1).
- There is an indication that the quality of the final syntheses is better where learners engage in requesting feedback, though further work is required to confirm this.
- Pensum has no open-source equivalent.

Weaknesses
The weaknesses of the system (v. 1.5) that would be negative indicators for adoption are:
- The lack of validity of the feedback, despite the self-regulated-learning-based functionalities (e.g. for the missing-concept feedback, Pensum takes into account all the lesson's sentences whereas teachers consider only the main ideas; consequently the precision and recall rates are low).
- The change in focus for writing syntheses, from a word processor as the primary tool to Pensum.
- The lack of textual enhancement functionalities (a synonym dictionary, boldface, highlighting, lists, etc.).
- The interface management has to be improved: a feedback zone separate from the writing zone; a tutor interface separate from the writing interface and the source text.
- Steep learning curve.

Opportunities
The system has potential as follows:
- Since many educational systems use synthesis or summary writing at secondary level, Pensum can be used at these levels, provided that adapted corpora are processed beforehand.
- Pensum can be used as a way to train tutors to be aware of the main features of students' syntheses.
- Promoting self-regulated learning: this novel approach to learning is the object of a large number of works and publications, which increases the likelihood of Pensum being used and adopted by external researchers as a way to study or promote students' self-regulated learning.

Threats
- Tutor resistance regarding the transfer of their work to a machine.
- Too-high expectations about the validity and goals of the feedback may lead to inappropriate uses of Pensum. A detrimental use of Pensum is to assume it can replace teachers, rather than provide hints and guidance to learners in their writing.
- If Pensum's compatibility with common e-learning platforms (such as Moodle or Dokeos) is low, companies or universities will be reluctant to adopt it.

Overall conclusion regarding the likelihood of adoption of Pensum version 1.5:
Pensum v1.5 is not at the moment ready for wider adoption in educational settings, though there are indications that, with improvements, Pensum could provide useful "any time, any place" support to learners. Pensum has also demonstrated that it can save tutor time spent on in-exercise feedback and final marking. Self-regulated learning is the object of a large number of research works and publications, and Pensum is likely to be more sustainable in the research community than in educational settings immediately following the end of LTfLL. The pilot revealed concerns about Pensum replacing tutors, so careful change management would be required for further implementation. This suggests that Pensum must be sufficiently mature to attract the interest of higher management, in order to provide a suitable environment for change management. Learners also need encouragement to use Pensum: this pilot showed that learners who engaged in requesting feedback had statistically higher scores on a number of markers. This validation study highlights several problems that have to be resolved.
Improvements must be made in four main directions. The main problem concerns the precision of the feedback: there are too many errors compared to the experts' feedback, so learners cannot trust Pensum's feedback. This is likely to have had a very negative impact on a range of measures in the validation. Improvements to the user interface and to on-line interaction with tutors would also be desirable. Up to now, Pensum has no open-source rival, though some research-based or commercial rivals do exist (see D5.3). The writing-to-learn approach behind Pensum, based on self-regulated learning, is promising and may lead to its adoption by several e-learning companies or universities following further enhancement.

Most important actions to promote adoption of Pensum:

Technical
- Solve the problem of feedback Precision and Recall.
- Improve the ergonomics of the interface and add some indispensable functionalities.

Research
- Consider further the relationship between Pensum and a word processor, from the viewpoint of the user being primarily located within the word processor
- Continue dissemination to the research community

Exploitation in educational settings
- The precision/recall issue must be improved before further pilots take place
- Successful pilots are a prerequisite for further exploitation in educational settings
- Careful change management with regard to tutors is an absolute requirement; learners also need to be managed to encourage them to request feedback regularly
- Improve the training of learners.

Section 7 – Road map

Based on the results and conclusions from validation, the LTfLL team has agreed that the following are the most important future enhancements to the system in order to meet stakeholder requirements:
Most important:
1. Improve feedback validity.
Since Recall and Precision rates are sometimes low (respectively 39% and 26% for the off-topic feedback, 12% and 34% for the concept-missing feedback, and 85% and 7% for the gap-in-coherence feedback; see OVT 1.1 to 1.3 and OVT 3.5 from D7.4), most learners do not fully trust the feedback prompted by Pensum. Consequently, an improvement might shift the focus to recall over precision once a sufficient level of precision is achieved, depending on users' response to scenario enhancement number 1 (below).
2. Annotation functionalities and ergonomic improvements of the interface. Learners and tutors proposed some important enhancements of the interface: better display of the text and the synthesis (independent, larger), text highlighting and commenting, synthesis formatting and, eventually, taking full advantage of AJAX to compute feedback on the fly and display it to the user as they type (see barriers to adoption from the tutor and learner interviews, Section 5 in D7.4).
3. Administrator interface. The managers agreed that Pensum would be worth using in their educational contexts, but for proper use, Pensum tutors or teachers need a fully integrated interface for managing students, courses, LSA spaces and language.
Based on the results and conclusions from validation, the LTfLL team has agreed that the following are the most important changes to the current scenario(s) of use in order to meet stakeholder requirements:
Most important:
1. Switch in focus of the feedback. The learner can view Pensum not only as a feedback tool (which provides fully valid feedback every time) but also as an annotation tool that guides him/her in the writing process. Pensum can be viewed as a checklist of questions to pose in the process of understanding a course. To fully realise this scenario enhancement, system enhancement number 2 (above) should be completed as well.
2. Allow more appropriate learner scenarios of use and behaviours.
The Round 3 pilot study showed that only learners who persevere in the use of Pensum benefit in terms of effort and learning motivation (e.g., correlations in OVT 3.4 and 6.1 in D7.4). So new scenarios would be worth devising. For instance:
- Ask learners to work not with individual sentences but with groups of sentences, allowing them to group sentences into sense units before relaunching the analysis – this might also be a lead towards better feedback (system enhancement number 1, above).
- Since the more the learners used Pensum's feedback, the more positive their opinion of it was, a new scenario of use would require learners to ask for feedback at predetermined moments of their production (e.g., once a paragraph is written).
- Enhance the tutors' role within the scenario of use of Pensum and work on their training, as well as that of the teachers.
- Devise scenarios that allow the learners to use Elgg as a social website for e-learning as a whole, within which Pensum can be used.
3. Information for managers. The managers' interviews showed that Pensum should be considered not as a way to save resources per se, but rather as a way to smooth the educational process. Consequently, the roll-out of Pensum needs continuous and careful educational support for teachers and learners.
Based on the results and conclusions from validation, the LTfLL team has agreed that the following are possible additional educational contexts for future deployment:
Most important:
- Since many educational systems use synthesis or summary writing at secondary level, Pensum is likely to be used at these levels, provided that adapted corpora are processed.
Based on the results and conclusions from validation, the LTfLL team has agreed that the following are the most important issues for future technical research to enable deployment of language technologies in educational contexts:
Most important:
1. Consider alternatives to LSA for assessing free text.
Recent research literature challenges the overwhelming dominance of LSA (e.g., Pensum's engine) for assessing educational material. New methods have been proposed that aim at improving the basic similarity measures, such as Probabilistic LSA, Latent Dirichlet Allocation and Random Indexing.
2. System adaptability. Traces of users' behaviour with our services can prove to be a reliable source of information for adaptability. The vector spaces chosen and the parameters used could be made to adapt better to variations of domain, source text and type of writing, or even according to learner/tutor interaction with the system. Indeed, we now have the infrastructure to allow the user to change the different threshold values and to explicitly question the feedback. Facilities for letting the user also experiment with the vector space used have already been implemented.
3. Usability. A problem for Pensum (and for Conspect as well) was the import of documents from a variety of sources (Word, PDF, web pages). A specific web-based tool would be very useful to provide this functionality. This tool would function as Firebug (http://getfirebug.com/) does for selecting elements: the learner would select the relevant elements for importation (content, as opposed to ads or navigation menus) and these would be automatically uploaded and processed by the language-technology-based application.
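The vector-space similarity at the heart of LSA-style assessment can be illustrated with a minimal sketch. This is a deliberate simplification: real LSA first applies singular value decomposition to a large term-document matrix to obtain a reduced semantic space, whereas here raw term-frequency vectors stand in for that space; the texts and function names are illustrative, not taken from Pensum.

```python
from collections import Counter
from math import sqrt

def term_vector(text):
    """Bag-of-words term-frequency vector over lower-cased tokens."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(count * v[term] for term, count in u.items() if term in v)
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

# Illustrative texts: a source passage, an on-topic synthesis, an off-topic one.
source = term_vector("the table tag defines an html table with rows and cells")
synthesis = term_vector("an html table is built from rows and cells using the table tag")
off_topic = term_vector("fonts can be changed with the basefont tag")

# The on-topic synthesis scores markedly higher than the off-topic one.
print(round(cosine(source, synthesis), 2), round(cosine(source, off_topic), 2))
```

The alternatives named above (Probabilistic LSA, LDA, Random Indexing) replace the vector construction step, but the final relevance judgement is still a similarity comparison of this kind.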
Roadmap - validation activities
Further validation planned for beyond the end of the project:
Claim (OVT): The ergonomic interface is satisfactory. Methodology: Learner and tutor questionnaires
Claim (OVT): Pensum's interface allows the source text to be read satisfactorily. Methodology: Learner and tutor questionnaires
Claim (OVT): Pensum's interface allows the synthesis to be written satisfactorily. Methodology: Learner questionnaire
Claim (OVT): Interaction with the learner/tutor is facilitated by Pensum. Methodology: Learner and tutor questionnaires
Claim (OVT): Pensum's feedback is viewed as guidelines for working on a course rather than prescriptions to follow mandatorily. Methodology: Learner questionnaire.
Claim (OVT): Pensum's guidance doesn't force the learner/tutor to adopt undesired work habits. Methodology: Learner and tutor questionnaires.
Claim (OVT): The more learners ask for feedback (and tutors guide learners about it), the more positive their opinion of it is. Methodology: Learner and tutor questionnaires, trace data.

Appendix B.7 Validation Reporting Template for WP6.1 (IPP-BAS & Sofia University)

Section 1: Functionality implemented in Version 1.5 and alpha / beta-testing record
Brief description of functionality | Version number of unit | Changes from Version 1.0
Annotation service | v1.5 | In addition to the linguistic pipe that processes learning objects in English (so that they become searchable within FLSS), another one was compiled for processing learning objects in Bulgarian. In this way, the stakeholders have the possibility to explore the multilingual facility – i.e. to retrieve relevant texts in two languages instead of only one. Also, the efficiency of the annotation process itself was optimized, so that the addition of newly processed materials to the repository is easier and faster.
Lexicalisation service | v1.5 | The new ontological concepts have been tuned to the language-specific lexicons - the newly added words and phrases from the English and, especially, the Bulgarian materials. In this way, the semantic search within the repository will be more precise and have better coverage.
Statistics element | v1.5 | The service displays the number of concept occurrences per document on the stakeholder's request. In this way, considering the frequency of the concepts present, the stakeholder can get a better impression of whether the document is relevant to the topic or not.

Alpha-testing
Pilot site and language: IPP-BAS (Bulgarian)
Date of completion of alpha testing: 20 Sept 2010
Who performed the alpha testing? Kiril Simov, Petya Osenova, Laska Laskova

Beta-testing
Pilot site and language: IPP-BAS (Bulgarian)
Has stakeholder validation taken place using the service embedded in Elgg (Yes/No/Partially): Partially
If 'No' or 'Partially', give reasons: The service has been embedded in Elgg, but the components require full screen
Beta-testing performed by: Stanislava Kancheva (tutor), Alexander Savkov (tutor)
Beta-testing environment (stand-alone service / integrated into Elgg): integrated into Elgg
HANDOVER DATE: 15 Oct 2010

Section 2: Validation Pilot Overview
NB Information about pilot sites, courses and participants has been transferred to Appendix A.3

Pilot task
Pilot site: IPP-BAS, Sofia University
Pilot language: Bulgarian
What is the pilot task for tutors and how do they interact with the system? The tutors have to structure a course unit within an introductory course in the IT domain, called "Introduction to HTML". To select the topics, they use the information provided by the domain ontology, and for selecting the relevant learning materials, they rely on the semantic search facility.
What do the tutors produce as outputs? Selected materials and a draft course unit.
The former are stored in the FLSS repository, while the latter is created outside the system.
How long does the pilot task last, from the tutors starting the task to their final involvement with the software? Two weeks
How do tutors/student facilitators interact with the learners and the system? There is no interaction with the learners - the goal is to develop course units on a selected topic in the IT domain. Tutors interact directly with the system at a time of their preference - at work or at home, or both. There are no limitations on the number of interactions.
Describe any manual intervention of the LTfLL team in the pilot: The tutor has to write down the structure of the course unit in a file stored outside the system. He/she also has to add the titles of the relevant learning objects found within the system.

Experiments
Experiment 1 (verification experiment only):
Name of experiment: Evaluating the output of the three versions of the language pipe
Objective(s): The FLSS team needs to know how efficient the addition of new learning material to the repository will be. On the other hand, the user needs to know which annotation suffices for her/his needs. Pipe 0.3 was the only version used as the language pipe in version 1 of the semantic annotation. Some users reported during the validation that the annotation process was too slow. Thus, the semantic annotation service has been split into different language pipes, each of which adds a specific annotation.
Details: The verification was performed by comparing the output of the three pipes with each other and with the gold standard. We selected three learning objects in the sub-domain of HTML and annotated each of them using all three pipes. The results are as follows: Pipe 0.1 annotated 316 concepts; Pipe 0.2 annotated 282 concepts; and Pipe 0.3 annotated 299 concepts. The set of annotations produced by Pipe 0.3 completely contains the set produced by Pipe 0.2.
Pipe 0.3 distributed, via coreferential chains, 17 concepts to pronouns, and 11 terms were annotated with more specific concepts. For these 11 more specific concepts we considered the old annotations when comparing Pipe 0.2 and Pipe 0.3. This comparison shows that the use of Pipe 0.2 reduces the number of annotated phrases in the text, but is comparable with Pipe 0.3 in terms of concept coverage. Pipe 0.1 wrongly annotated 34 concepts compared to the output of Pipe 0.2. Of these 34 concept annotations, 6 concepts are unique. The error rate is 10.7%.
Experiment 2 (also validated in OVT 1.1):
Name of experiment: Semantic search verification
Objective(s): Aims at providing evidence that the service returns relevant learning objects.
Details: The verification was organized as a workshop with five tutors from IPP-BAS. The work was done in the period 27.09.2010 to 1.10.2010 with version 1.5 of the FLSS services. The tutors were divided into two groups. The first (two tutors) was shown a list of learning materials related to HTML which are available within FLSS. They were asked to choose three topics and to augment them with all the relevant learning objects in the system. The result of this activity was a gold standard with respect to the specified topics and their underlying learning objects. The other group (three tutors) was asked to formulate queries with respect to one of the topics and to perform a semantic search for relevant material. The retrieved materials were then automatically compared to the gold standard for each topic. Two metrics were calculated: Precision and Recall. For the semantic search verification, Precision is considered more important, since the retrieved material must be relevant to a topic. However, Recall was also considered, because it leads to the conclusion that specific searches should be related to specific topics; otherwise, recall drops.
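The Precision and Recall used in Experiment 2 are the standard set-based measures, computed by comparing the set of retrieved items against the gold standard. The sketch below shows the computation with illustrative data; the object identifiers are invented and are not the pilot's actual learning objects.

```python
def precision_recall(retrieved, gold):
    """Precision = |retrieved ∩ gold| / |retrieved|;
    Recall    = |retrieved ∩ gold| / |gold|."""
    retrieved, gold = set(retrieved), set(gold)
    hits = len(retrieved & gold)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall

# Illustrative (not pilot) data: learning objects a tutor judged relevant
# to a topic (gold standard) vs. those returned by a semantic-search query.
gold = {"lo1", "lo2", "lo3", "lo4"}
retrieved = {"lo2", "lo3", "lo5"}

p, r = precision_recall(retrieved, gold)
print(f"Precision: {p:.0%}, Recall: {r:.0%}")  # 2 of 3 retrieved are relevant; 2 of 4 relevant items were found
```

A very specific query typically raises precision at the cost of recall, which matches the observation above that several queries are needed to cover a broadly defined topic.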
NB The results of this experiment are presented in more detail in D6.3.

Section 3: Results - validation/verification of Validation Topics

OVT: 1.1
Pilot site: IPP-BAS
Pilot language: Bulgarian
Operational Validation Topic: A high proportion of the learning objects (LOs) offered by the system is relevant to the topic chosen by the teacher.
Summative results with respect to validation indicator:
Experimental results, with stakeholders involved and brief methodology
Stakeholders / methodology: LTfLL staff from IPP-BAS (n=5); 3 topics were chosen; 2 people manually selected the relevant LOs from the database; 3 people performed an automatic semantic search and reported the results. The two data sets were compared with respect to Recall and Precision.
Results: In the following table the Precision and Recall are presented. It turned out that 8 out of 12 queries resulted in a precision higher than 50%. This verification result shows that people learn very effectively how to specify queries that match their expectations with a high level of precision. This generally means retrieving the appropriate materials fast. As expected, Recall drops when the search becomes too specific with respect to a broadly defined topic. In order to explore the whole topic, the user needs to ask several queries. This observation leads us to plan a new extension of the query mechanism with a set of queries.
Concept | Precision | Recall
lt4el:Table | 16.5% | 93.8%
lt4el:TableTag | 81.8% | 56.3%
lt4el:TableCell | 82.4% | 87.5%
lt4el:CaptionTag | 100% | 6.3%
lt4el:TRTag | 72.7% | 50%
lt4el:HTMLFontRelatedTag | 30.3% | 73.6%
lt4el:BoldTag & lt4el:HTML | 67.4% | 53.7%
lt4el:ItalicTag & lt4el:HTML | 63.5% | 55.3%
lt4el:BasefontTag | 100% | 7.7%
lt4el:Image | 13.7% | 86.4%
lt4el:Image & lt4el:HTML | 38.1% | 72.4%
lt4el:Image & lt4el:HTMLTag | 67.3% | 40.4%

OVT: 2.1
Pilot site: IPP-BAS
Pilot language: Bulgarian
Operational Validation Topic: The teacher saves time when developing a course unit in FLSS compared to traditional means.
Questionnaire type | Questionnaire no. & statement | Experimental / control group | Mean | Standard deviation | %Agree / Strongly agree | n=
Tutors | 7. It takes less time to complete my teaching tasks using FLSS than without the system. | Experimental | 4.7 | 0.58 | 100% | 3
Tutors | 8. Using FLSS enables me to work more quickly than without the system. | Experimental | 4.7 | 0.58 | 100% | 3
Tutors | 9. I do not wait too long before receiving the requested information. | Experimental | 4.0 | 1.00 | 67% | 3
Tutors | 34. It takes me less time to develop a course unit using FLSS, than without the system. | Experimental | 4.3 | 1.15 | 67% | 3
Tutors | 35. I find using FLSS to develop a course unit is a very time-efficient way of developing a course. | Experimental | 4.7 | 0.58 | 100% | 3
Tutors | 36. I find the process of developing a course unit is quicker using FLSS, compared with not using the system.
| Experimental | 4.7 | 0.58 | 100% | 3

Formative results with respect to validation indicator
Stakeholder type | Results
Tutors | Focus Group 1: "The system helps me to prepare a course faster than usual, because I do not waste time on searching for materials on the net"
Tutors | Focus Group 1: "Given the uploaded materials are approved by a competent tutoring authority, I would rather use the system than search, examine and select materials myself."
Tutors | Focus Group 1: "I save time, because the repository provides already selected materials per topic."

OVT: 2.1
Pilot site: Sofia University
Pilot language: Bulgarian
Operational Validation Topic: The teacher saves time when developing a course unit in FLSS compared to traditional means.
Questionnaire type | Questionnaire no. & statement | Experimental / control group | Mean | Standard deviation | %Agree / Strongly agree | n=
Tutors | 7. It takes less time to complete my teaching tasks using FLSS than without the system. | Experimental | 4.5 | 1.00 | 75% | 4
Tutors | 8. Using FLSS enables me to work more quickly than without the system. | Experimental | 4.3 | 0.96 | 75% | 4
Tutors | 9. I do not wait too long before receiving the requested information. | Experimental | 3.8 | 1.89 | 75% | 4
Tutors | 34. It takes me less time to develop a course unit using FLSS, than without the system. | Experimental | 4.3 | 0.50 | 100% | 4
Tutors | 35. I find using FLSS to develop a course unit is a very time-efficient way of developing a course. | Experimental | 4.5 | 0.58 | 100% | 4
Tutors | 36. I find the process of developing a course unit is quicker using FLSS, compared with not using the system.
| Experimental | 4.8 | 0.50 | 100% | 4

Formative results with respect to validation indicator
Stakeholder type | Results
Tutors | Focus Group 1: "Simplify the user interface and that will speed up the process of getting used to the system even more"

OVT: 2.2
Pilot site: IPP-BAS
Pilot language: Bulgarian
Operational Validation Topic: The teacher invests less effort (cognitive load) when developing a course unit in FLSS compared to traditional means.
Questionnaire type | Questionnaire no. & statement | Experimental / control group | Mean | Standard deviation | %Agree / Strongly agree | n=
Tutors | 11a. Please rank on a 5-point scale the mental effort (1 = very low mental effort; 5 = very high mental effort) you invested to accomplish teaching tasks using FLSS. | Experimental | 2.3 | 0.58 | 33% | 3
Tutors | 11b. Overall, using the system requires significantly less mental effort to complete my teaching tasks than when using an Internet browser. | Experimental | 4.7 | 0.58 | 100% | 3

Formative results with respect to validation indicator
Stakeholder type | Results
Tutors | Focus Group 1: "Working with FLSS requires less effort than using a browser to search for materials - the information is structured, the materials are retrieved semantically."

OVT: 2.2
Pilot site: Sofia University
Pilot language: Bulgarian
Operational Validation Topic: The teacher invests less effort (cognitive load) when developing a course unit in FLSS compared to traditional means.
Questionnaire type | Questionnaire no. & statement | Experimental / control group | Mean | Standard deviation | %Agree / Strongly agree | n=
Tutors | 11a. Please rank on a 5-point scale the mental effort (1 = very low mental effort; 5 = very high mental effort) you invested to accomplish teaching tasks using FLSS. | Experimental | 3.0 | 0.82 | 75% | 4
Tutors | 11b. Overall, using the system requires significantly less mental effort to complete my teaching tasks than when using an Internet browser.
| Experimental | 2.8 | 0.96 | 25% | 4

Formative results with respect to validation indicator
Stakeholder type | Results
Tutors | Focus Group 1: "It takes time to understand how the system works, to understand the difference between the word-based approach and the FLSS concept-based approach, but after that it's really easy".

OVT: 3.1
Pilot site: IPP-BAS
Pilot language: Bulgarian
Operational Validation Topic: Teachers perceive that the learning materials offered by FLSS are useful to them in developing a course unit.
Questionnaire type | Questionnaire no. & statement | Experimental / control group | Mean | Standard deviation | %Agree / Strongly agree | n=
Tutors | 6. The information the system provides me is accurate enough for helping me perform my teaching tasks. | Experimental | 4.0 | 1.00 | 67% | 3
Tutors | 37. FLSS provides learning materials that are relevant to my topic | Experimental | 4.0 | 1.00 | 67% | 3
Tutors | 38. The learning materials retrieved are useful for the course design. | Experimental | 4.0 | 0.00 | 100% | 3
Tutors | 39. The majority of the retrieved learning objects fit my course topic. | Experimental | 3.7 | 0.58 | 67% | 3
Tutors | 40. I trust the system to offer me learning materials useful for the course I am designing. | Experimental | 4.3 | 0.58 | 100% | 3

Formative results with respect to validation indicator
Stakeholder type | Results
Tutors | Focus Group 1: "Probably for the regular teacher in a non-academic environment it would be a bit tricky to add new learning objects so it would be better for FLSS staff to provide more."
Tutors | Focus Group 1: "I think the materials available in the system were suitable for an introductory course, but not as a source for the advanced students."

OVT: 3.1
Pilot site: Sofia University
Pilot language: Bulgarian
Operational Validation Topic: Teachers perceive that the learning materials offered by FLSS are useful to them in developing a course unit.
Questionnaire type | Questionnaire no.
& statement | Experimental / control group | Mean | Standard deviation | %Agree / Strongly agree | n=
Tutors | 6. The information the system provides me is accurate enough for helping me perform my teaching tasks. | Experimental | 4.5 | 1.00 | 75% | 4
Tutors | 37. FLSS provides learning materials that are relevant to my topic | Experimental | 4.3 | 1.50 | 75% | 4
Tutors | 38. The learning materials retrieved are useful for the course design. | Experimental | 4.3 | 0.96 | 75% | 4
Tutors | 39. The majority of the retrieved learning objects fit my course topic. | Experimental | 4.0 | 0.82 | 75% | 4
Tutors | 40. I trust the system to offer me learning materials useful for the course I am designing. | Experimental | 4.5 | 1.00 | 75% | 4

Formative results with respect to validation indicator
Stakeholder type | Results
Tutors | Focus Group 1: "The materials fit the curriculum and it's easy to decide which documents to recommend."
Tutors | Focus Group 1: "Probably we are going to use FLSS for the redesign of some parts of the courses 'Administering SQL Server' and/or 'Querying MS SQL Server'."

OVT: 4.1
Pilot site: IPP-BAS
Pilot language: Bulgarian
Operational Validation Topic: Using the ontology assists the teacher in establishing the hierarchy of main concepts within the course unit.
Questionnaire type | Questionnaire no. & statement | Experimental / control group | Mean | Standard deviation | %Agree / Strongly agree | n=
Tutors | 41. Browsing the ontology helps me decide which topics to include in my course. | Experimental | 4.0 | 1.00 | 67% | 3
Tutors | 42. Browsing the ontology helps me to structure my course in a comprehensive way (so that all important aspects are covered). | Experimental | 4.3 | 0.58 | 100% | 3
Tutors | 43. Browsing the ontology helps me see the relationships between different concepts in my course. | Experimental | 4.3 | 1.15 | 67% | 3
Tutors | 44. Browsing the ontology helps me to introduce concepts in a logical order in my course.
| Experimental | 4.3 | 1.15 | 67% | 3

Formative results with respect to validation indicator
Stakeholder type | Results
Tutors | Focus Group 1: "Usually when I prepare a course, I already have an idea about the course structure grounded in my own understanding about the educational process, but it is always useful to get another point of view."
Tutors | Focus Group 1: "I would prefer the thematic classification to the ontological one. For example, if I search for concepts related to 'font', I would like to see also size, font type etc. in the same place."

OVT: 4.1
Pilot site: Sofia University
Pilot language: Bulgarian
Operational Validation Topic: Using the ontology assists the teacher in establishing the hierarchy of main concepts within the course unit.
Questionnaire type | Questionnaire no. & statement | Experimental / control group | Mean | Standard deviation | %Agree / Strongly agree | n=
Tutors | 41. Browsing the ontology helps me decide which topics to include in my course. | Experimental | 4.5 | 0.96 | 75% | 4
Tutors | 42. Browsing the ontology helps me to structure my course in a comprehensive way (so that all important aspects are covered). | Experimental | 4.0 | 0.82 | 75% | 4
Tutors | 43. Browsing the ontology helps me see the relationships between different concepts in my course. | Experimental | 4.5 | 1.00 | 75% | 4
Tutors | 44. Browsing the ontology helps me to introduce concepts in a logical order in my course. | Experimental | 4.3 | 1.00 | 75% | 4

Formative results with respect to validation indicator
Stakeholder type | Results
Tutors | Focus Group 1: "The ontology helps a lot in constructing a hierarchy of domain-specific concepts, which forms the basis of the course."
Tutors | Focus Group 1: "The present ontology helps me if I prepare a basic course.
However, I think that for master and PhD courses I would need a more complex resource as a support service, which includes more relations besides 'is-a'."

OVT: 5.1
Pilot site: IPP-BAS
Pilot language: Bulgarian
Operational Validation Topic: The teacher thinks that the quality of the derived main structure of a course, together with its relevant support material, is good.
Questionnaire type | Questionnaire no. & statement | Experimental / control group | Mean | Standard deviation | %Agree / Strongly agree | n=
Tutors | 5. The FLSS helps me to improve the quality of my support to learners. | Experimental | 4.3 | 0.58 | 100% | 3
Tutors | 45. I believe that the main structure of the course I have designed is of good quality. | Experimental | 4.0 | 1.00 | 67% | 3
Tutors | 46. I believe that the content of the course I have designed is of good quality. | Experimental | 4.0 | 1.00 | 67% | 3
Tutors | 47. Overall, I am satisfied with the course I have designed. | Experimental | 4.0 | 0.00 | 100% | 3
Tutors | 48. I believe that FLSS has helped me design a better course than when using traditional means. | Experimental | 4.3 | 0.58 | 100% | 3

Formative results with respect to validation indicator
Stakeholder type | Results
Tutors | Focus Group 1: "Once I get an idea about how the information on the course topic is structured, it is much easier to explore different curriculum models consistent with the level of knowledge of the learners group."
Tutors | Focus Group 1: "I'm quite satisfied with my course structure."

OVT: 5.1
Pilot site: Sofia University
Pilot language: Bulgarian
Operational Validation Topic: The teacher thinks that the quality of the derived main structure of a course, together with its relevant support material, is good.
Questionnaire type | Questionnaire no. & statement | Experimental / control group | Mean | Standard deviation | %Agree / Strongly agree | n=
Tutors | 5. The FLSS helps me to improve the quality of my support to learners. | Experimental | 4.8 | 0.50 | 100% | 4
Tutors | 45.
I believe that the main structure of the course I have designed is of good quality. | Experimental | 4.0 | 0.82 | 75% | 4
Tutors | 46. I believe that the content of the course I have designed is of good quality. | Experimental | 4.5 | 1.00 | 75% | 4
Tutors | 47. Overall, I am satisfied with the course I have designed. | Experimental | 4.3 | 0.50 | 100% | 4
Tutors | 48. I believe that FLSS has helped me design a better course than when using traditional means. | Experimental | 4.5 | 0.58 | 100% | 4

Formative results with respect to validation indicator
Stakeholder type | Results
Tutors | Focus Group 1: "The good point is that students can easily find suitable materials on any of the topics in my course."
Tutors | Focus Group 1: "I wish the depth of the suggested structure to be more balanced. I discovered that sometimes the system helps me with a more elaborated hierarchical structure, but sometimes it seems to be flat. Then I have to use other ways in order to make it sufficiently detailed."

OVT: 5.2
Pilot site: IPP-BAS
Pilot language: Bulgarian
Operational Validation Topic: An advantage of FLSS is that the search can return learning materials in other languages, providing teachers with a wider range of materials for multi-lingual learners
Questionnaire type | Questionnaire no. & statement | Experimental / control group | Mean | Standard deviation | %Agree / Strongly agree | n=
Tutors | 49. It is useful to be able to include learning materials in more than one language (e.g. English, Bulgarian) in my course unit. | Experimental | 5.0 | 0.00 | 100% | 3
Tutors | 50. FLSS helps me find useful learning materials in more than one language | Experimental | 4.0 | 0.00 | 100% | 3
Tutors | 51. FLSS provides me with a better choice of learning materials because it offers me materials in more than one language. | Experimental | 4.0 | 1.00 | 67% | 3
Formative results with respect to validation indicator
Stakeholder type | Results
Tutors | Focus Group 1: "Since most of my students read in English, and the lexicon in this particular domain is strongly influenced by English terminology, it's very useful to have documents on one topic in the two languages - Bulgarian and English."

OVT: 5.2
Pilot site: Sofia University
Pilot language: Bulgarian
Operational Validation Topic: An advantage of FLSS is that the search can return learning materials in other languages, providing teachers with a wider range of materials for multi-lingual learners
Questionnaire type | Questionnaire no. & statement | Experimental / control group | Mean | Standard deviation | %Agree / Strongly agree | n=
Tutors | 49. It is useful to be able to include learning materials in more than one language (e.g. English, Bulgarian) in my course unit. | Experimental | 4.8 | 0.50 | 100% | 4
Tutors | 50. FLSS helps me find useful learning materials in more than one language | Experimental | 4.8 | 0.50 | 100% | 4
Tutors | 51. FLSS provides me with a better choice of learning materials because it offers me materials in more than one language. | Experimental | 3.8 | 0.96 | 50% | 4

Formative results with respect to validation indicator
Stakeholder type | Results
Tutors | Focus Group 1: "As expected, there are a lot more materials in Bulgarian than in English within the available LOs. But the quality of the latter is better."

OVT: 7.1
Pilot site: IPP-BAS
Pilot language: Bulgarian
Operational Validation Topic: There is a saving in institutional resources overall
Formative results with respect to validation indicator:
Experimental results, with stakeholders involved and brief methodology
Stakeholders / methodology: Interview with tutors from IPP-BAS (n=4).
Results: The tutors pointed out several conditions that have to be fulfilled for the system to save institutional resources:
- a repository with a sufficient number of learning objects must be built; once established, it can be easily updated with new documents while keeping the quality of the selections per topic;
- it must be possible to modify the ontologies, for example to add new concepts or new lexicalisations.
When these conditions are met, FLSS could prove very efficient in reducing most of the time-consuming activities related to teaching preparation.

OVT: 8.1 | Pilot site: IPP-BAS | Pilot language: Bulgarian
Operational Validation Topic: The service meets one or more institutional objectives.
Formative results with respect to validation indicator: experimental results, with stakeholders involved and brief methodology.
Stakeholders / methodology: Interview with teaching manager from IPP-BAS (n=1).
Results: Our teachers and our students are, respectively, specialized and specializing in the IT area. Thus, they are very demanding when considering the adoption of new software. It takes time for them to trust it, and our policy is usually to adopt an already widely popular system (such as Moodle or ILIAS). On the other hand, we are open to experiments with new systems. If FLSS remains open for public use and our tutors (and students) start to use it, we can consider institutional adoption in more serious terms. To share my own opinion on why I think FLSS is likely to be used: in terms of efficiency, one of the main advantages of the system is the reusability of the learning objects. The benefits are twofold: tutors can select from a number of materials, and students have immediate access to them. The number of regular students who also work is increasing, and the system may provide a way to cope with this problem.
Another "burning issue" right now is the ratio between the tutors' teaching and research time, so any tool that can decrease the former in favour of the latter, while keeping and even improving the quality of the educational process, is welcome. The service can also be used for internal training, for group role detection, and as a tool to inspect the students' learning outcomes through authoring activity. The usage of FLSS depends on various integration issues, licensing and other requirements that might arise.

OVT: 8.1 | Pilot site: Sofia University | Pilot language: Bulgarian
Operational Validation Topic: The service meets one or more institutional objectives.
Formative results with respect to validation indicator: experimental results, with stakeholders involved and brief methodology.
Stakeholders / methodology: Interview with teaching manager from Sofia University (n=1).
Results: I am the Vice-Dean responsible for the BA programs at the Faculty of Slavic Languages. In our Faculty there are already some optional IT classes at BA level, in order to prepare the students for the MA program in Computational Linguistics and for working with more advanced language tools. We are at the beginning of experimenting with eLearning courses in various areas; thus we trust Moodle, for example. FLSS is very suitable for our purposes for the following reasons: 1. the tutors involved in the validation share the opinion that, after the initial effort of getting acquainted with the system, they started to use it easily and became eager to master it; 2. FLSS is free and web-based, the support team is very close to us, and we can rely on them; and 3. the services suit our requirements for basic (not advanced) courses in IT. In our educational system, the tutor has the freedom to choose her/his own approach to creating a course or working with the students.
I can only support the popularization of FLSS within our Faculty. I would be more confident, however, if the Faculty of Mathematics and Informatics shared their opinion on this system.

OVT: 9.1 | Pilot site: IPP-BAS | Pilot language: Bulgarian
Operational Validation Topic: Users were motivated to continue to use the system after the end of the formal validation activities.

Questionnaire results (Tutors, experimental group):
21. "I would recommend this system to other teachers to help them in their teaching." (Mean 4.7, SD 0.58, 100% agree/strongly agree, n=3)
22. "I am eager to explore different things with FLSS." (Mean 4.7, SD 0.58, 100% agree/strongly agree, n=3)
29. "I would like to use the service in my teaching after the pilot." (Mean 4.7, SD 0.58, 100% agree/strongly agree, n=3)
30. "If the service is available after the pilot, I will definitely use it in my teaching." (Mean 4.7, SD 0.58, 100% agree/strongly agree, n=3)

Formative results with respect to validation indicator: The system remained open after the validation. The stakeholders were assured that the FLSS team could help them with additional materials for the repository and with processing and handling a new ontology with a lexicon (if needed). The participants expressed interest in using it. However, at this time of the academic year nobody had an IT course to prepare. For that reason, we suggested that this OVT be validated additionally in the questionnaire.
- Tutors, Focus Group 1: "The work with the system stimulates me to experiment and try different ways to organize my course, so I intend to use it in the future."
- Tutors in dissemination workshop, Focus Group 1: "I would consider adopting the software because it provides a useful framework to optimize the learning processes."

OVT: 9.1 | Pilot site: Sofia University | Pilot language: Bulgarian
Operational Validation Topic: Users were motivated to continue to use the system after the end of the formal validation activities.
Questionnaire results (Tutors, experimental group):
21. "I would recommend this system to other teachers to help them in their teaching." (Mean 5.0, SD 0.00, 100% agree/strongly agree, n=4)
22. "I am eager to explore different things with FLSS." (Mean 4.8, SD 0.50, 100% agree/strongly agree, n=4)
29. "I would like to use the service in my teaching after the pilot." (Mean 5.0, SD 0.00, 100% agree/strongly agree, n=4)
30. "If the service is available after the pilot, I will definitely use it in my teaching." (Mean 5.0, SD 0.00, 100% agree/strongly agree, n=4)

Formative results with respect to validation indicator: The system remained open after the validation. The stakeholders were assured that the FLSS team could help them with additional materials for the repository and with processing and handling a new ontology with a lexicon (if needed). The participants expressed interest in using the system initially as it is, since the courses at a Humanities Faculty are basic. One of the tutors kept using it, since this semester he has an IT course at MA level. In order to get other stakeholders' opinions, we suggested that this OVT be validated in the questionnaire.
- Tutors, Focus Group 1: "While I was testing the system, I kept thinking 'What if it was not an IT domain ontology, but a Linguistic ontology - that would be really nice!' - so I would very much like to use the FLSS again to my benefit."
- Tutors, Focus Group 1: "It was fun exploring the system, so I would recommend it to my colleagues."

OVT: 9.2 | Pilot sites: IPP-BAS, Sofia University | Pilot language: Bulgarian
Operational Validation Topic: A high score was obtained in the generic questionnaires (based on UTAUT: likelihood of adoption).
Summative results with respect to validation indicator: experimental results, with stakeholders involved and brief methodology.
Stakeholders / methodology: Generic questionnaire (tutors).
Because of low numbers, results were aggregated for the two pilot sites.
Results: Descriptive statistics, Tutors (N=7, valid N listwise = 7):
- Effectiveness: Mean 4.55, SD 0.356
- Efficiency: Mean 4.21, SD 0.918
- Cognitive Load: Mean 3.14, SD 0.378
- Usability: Mean 4.20, SD 0.566
- Satisfaction: Mean 4.71, SD 0.300
- Facilitating conditions: Mean 4.29, SD 0.591
- Self-Efficacy: Mean 4.10, SD 0.738
- Behavioural intention: Mean 4.86, SD 0.378
- Transferability: Mean 3.86, SD 0.476
- Overall (IPP-BAS & SU): Mean 4.38, SD 0.414

OVT: 9.3 | Pilot site: IPP-BAS | Pilot language: Bulgarian
Operational Validation Topic: Tutors attending a dissemination workshop give high scores to the question 'How likely are you to consider adopting the service in your own educational practice?'
Formative results with respect to validation indicator: experimental results, with stakeholders involved and brief methodology.
Stakeholders / methodology: Tutors from IPP-BAS (n=8).
Results: Six tutors stated that they would like to adopt the service for their own purposes, while two were not sure whether the service is useful to them at this stage of its development. The positive tutors are involved in European or national projects in which they see a prospect of relying on FLSS for their work.
- Tutors: "We are developing a national project for creating new application-oriented methods and end-user-oriented tools for Semantic Web Service descriptions oriented to Technology Enhanced Learning (http://sinus.iinf.bas.bg/index.php). The project uses a 'learning by authoring' approach in the Bulgarian Iconography domain. The learners are creating a multimedia document that contains primary multimedia resources and some texts. We might use some parts of FLSS to support the instructor when she reviews the learners' progress, as a means of supporting them in their work."

OVT: 9.3 | Pilot site: Sofia University | Pilot language: Bulgarian
Operational Validation Topic: Tutors attending a dissemination workshop give high scores to the question 'How likely are you to consider adopting the service in your own educational practice?'
Summative results with respect to validation indicator: experimental results, with stakeholders involved and brief methodology.
Stakeholders / methodology: Tutors from Sofia University (n=4).
Results: All the tutors stated that they would like to use the service for their own tasks.
- Tutors: "I can prepare my course on mark-up languages for the MA in Computational Linguistics."

OVT: 9.4 | Pilot site: IPP-BAS | Pilot language: Bulgarian
Operational Validation Topic: Teachers and managers are motivated to adopt the system because it offers multilingual search.
Formative results with respect to validation indicator: experimental results, with stakeholders involved and brief methodology.
Stakeholders / methodology: Tutors from IPP-BAS (focus group, n=3).
Results: All tutors included materials in both Bulgarian and English. They stated that one of the most obvious advantages of FLSS is its multilinguality. Even though there is a shared observation that students are more willing to read materials in one language, this feature still allows more options for students with different needs and preferences.

OVT: 9.4 | Pilot site: Sofia University | Pilot language: Bulgarian
Operational Validation Topic: Teachers and managers are motivated to adopt the system because it offers multilingual search.
Summative results with respect to validation indicator: experimental results, with stakeholders involved and brief methodology.
Stakeholders / methodology: Tutors from Sofia University (focus group, n=4).
Results: Again, materials in both Bulgarian and English were selected as a result of testing the system. The main argument was that in the IT domain English tends to be the "working language", and even if there are sufficient LOs in Bulgarian, it is almost necessary to provide materials in English: there are many terms in the Bulgarian IT domain lexicon that are not unanimously accepted.
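The questionnaire tables above summarise each statement by a mean, a standard deviation, and the percentage of Agree/Strongly agree responses on a five-point Likert scale. As a minimal sketch of how such summaries can be computed (the response data below are invented for illustration and are not the pilot data; the only assumptions are the usual formulas: arithmetic mean, sample standard deviation, and "agree" counted as a rating of 4 or 5):

```python
from statistics import mean, stdev

def likert_summary(responses):
    """Summarise 5-point Likert responses (1 = strongly disagree ... 5 = strongly agree).

    Returns (mean, sample standard deviation, % agree/strongly agree, n).
    """
    n = len(responses)
    # A response of 4 or 5 counts towards "% Agree / Strongly agree".
    pct_agree = 100.0 * sum(1 for r in responses if r >= 4) / n
    return round(mean(responses), 2), round(stdev(responses), 2), pct_agree, n

# Invented example: four tutors answering one statement
print(likert_summary([5, 4, 4, 5]))  # -> (4.5, 0.58, 100.0, 4)
```

Note that with these conventions the invented four-tutor example reproduces the same shape of figures as the tables above (e.g. Mean 4.5, SD 0.58, 100%, n=4).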
Section 4: Results – validation activities informing future changes / enhancements to the system

VALIDATION ACTIVITY | Pilot partner: IPP-BAS | Service language: English
Additional formative results (not associated with validation topics):
Alpha testing:
- Annotation visualisation is confusing when there is more than one annotation/comment.
- Ontology load time is too long when two lexicons for two languages are attached to the ontology.
Beta testing:
- Multiple upload of documents does not work.
- Managing the retrieved documents is problematic (if you focus on one item, the others disappear).
Tutor interviews:
- Text extraction from different types of documents is not done within the system.
- The upper part of the ontology is not understandable to the users; it is too abstract with respect to the specific topic of interest.
- It is initially difficult to remember the sequence of all the steps in FLSS when handling learning materials.
- The definitions of the concepts are not in Bulgarian, and they disappear quickly, which hampers the work process.
- Within the set of retrieved documents it is difficult to mark and manage the current selection.
- At first glance it seems that one and the same concept appears in several places in the ontology. Stakeholders do not understand why; they prefer a discriminative kind of information representation.
Tutor workshop(s):
- When a concept is missing, it is difficult to handle this with the ontology enrichment service; there is no clear procedure for doing so.
- No aggregated statistical information is provided for user-defined groups of LOs.
- It is confusing to be offered the possibility of opening either the repository or the ontology. These steps should be ordered: for example, first the ontology, then the repository, or vice versa.
- It is not clear how to find the most appropriate concept for a topic when browsing the ontology.
- There is no clear mechanism for adapting the domain ontology to new information in the area.
- I still need time to decide how I might use FLSS as a supplementary tool alongside the LMS and my traditional methods.
- I need to invest some effort in adapting the semantic search module in FLSS for the purposes of our national project on Semantic technologies for Web Services and technologically supported learning (Д-002-189/16.12.08).
Teaching manager interview:
- "I am not satisfied with the fact that the course creation process is not entirely controlled by FLSS. Thus, the ready structure and related LOs cannot be validated within the system."
- "I am not satisfied with the fact that the interface is only in English. The native language of the learners should be equally supported by the system."

VALIDATION ACTIVITY | Pilot partner: Sofia University | Service language: English
Additional formative results (not associated with validation topics):
Alpha testing: N/A (alpha testing was done only at IPP-BAS)
Beta testing: N/A (beta testing was done only at IPP-BAS, since the participants from SU wanted to test the ready version of the software)
Tutor interviews:
- The documentation lacks some basic but important steps in the course creation process, which slows down the work.
- The tutor is confused about what to do next when all the supporting information windows are open.
- The tutor is initially confused by the procedure for saving documents at different stages of his/her work; it is not always clear what exactly you save, and where.
- One aspect of usage is pre-testing the learners, but this feature is very sensitive to various licensing, integration and organisational issues.
Tutor workshop(s):
- The main problem remains how to add new concepts and new materials to FLSS.
- Parts of the interface have a smaller resolution, which wastes time when trying to adjust it.
- The large number of error messages that pop up at different stages of work becomes quite annoying.
- The fact that you have to write the course outside FLSS is annoying.
Teaching manager interview:
- "There are only initial steps in the system towards ensuring interactivity among tutors (for example, exchanging comments on their opinions and on deletions, combinations and insertions of learning material)."
- "I am not quite sure how FLSS might fit the Moodle architecture."

Section 5: Results – validation activities informing transferability, exploitation and barriers to adoption

VALIDATION ACTIVITY | Partner(s) involved: IPP-BAS | Service language: Bulgarian
Additional formative results (not associated with validation topics):
Alpha testing:
Major issues encountered in transferring FLSS to Bulgarian:
- concept annotation accuracy for a morphologically rich language;
- the high frequency of English terms in Bulgarian texts called for the application of NLP components for both languages.
Major issues encountered in transferring FLSS to the IT domain:
- sparseness of up-to-date advanced Bulgarian LOs in the IT domain.
Beta testing:
- The variability of the Bulgarian IT domain lexicon and the quality of the documents in this language influence the performance of the system.
- Localisation of the interface is desirable.
Tutor interviews:
- The users would like to see the Bulgarian language in more active use; for example, the QuickFind facility operates only on URIs.
- There are no sub-domains in IT elaborated enough for teaching purposes.
- The tutors would not like to do the pre-selection and processing of the learning data themselves. They feel uncertain about when, how and by whom these adjustments would be made.
- I would be surprised if the system could be used in the Mathematics domain, since FLSS relies on coherent text data, not on formulas, tables and numbers.
- It could be hard for regular teachers to add new learning objects, so it is better if these are developed for them.
- Further user evaluations are needed to advertise a system like FLSS successfully.
- I need to work longer with FLSS before I can give any ideas for its further development.
Tutor workshop(s):
- The FLSS approach should be offered together with a relevant pedagogical strategy on what to do next.
- The corpus of learning objects is too small, especially in Bulgarian.
- Data-driven approaches to teaching are limited, because the ontology imposes some constraints on the domain modelling.
Teaching manager interview:
- "Adopting the system institutionally means more workload for our administration."
- "There are members of the teaching staff who will initially refuse to use the system because they are unaware of its advantages."
- "Our courses are specialized in the IT domain; thus they already presuppose some basic knowledge of the learners in the area. In this respect the ontology coverage and the number and variety of learning objects are not enough."
- "I want to test FLSS in a real setting and get more feedback from both tutors and students. Then I can consider the system for adoption."

VALIDATION ACTIVITY | Partner(s) involved: Sofia University | Service language: Bulgarian
Additional formative results (not associated with validation topics):
Alpha testing: N/A (the system was alpha tested only at IPP-BAS)
Beta testing: N/A (FLSS was beta tested only at IPP-BAS, since SU participants wanted to test the ready version of the system)
Tutor interviews:
- It takes some time until the tutors understand how best to use the system: as a complement to their traditional means or as a substitute for them.
- There is not enough metadata about the available learning objects for the tutor to get oriented quickly.
- The presentational aspect of displaying the learning objects is very limited: there is no partitioning into Introduction, Illustrations, Exercises, etc.
Tutor workshop(s):
- Building a repository of LOs will require more effort from the staff at the beginning. This might have a negative effect on their motivation to use the services.
- In order for the tutors to add their own materials to FLSS and process them successfully, they need more background information about the idea and architecture behind the system.
- For some topics, tutors get a partial or schematic structure for the intended course.
Teaching manager interview:
- "At the moment there is no well-established program for getting basic knowledge in the IT area at the Faculty (only one optional course for the BA and an MA in Computational Linguistics). This limits the interested tutors to only a few."
- The Faculty cooperates closely with the Faculty of Mathematics and Informatics, who use Moodle for eLearning. The manager would therefore prefer the FLSS team to try to establish FLSS there in the first place; he thinks they are more experienced and would appreciate their opinion.
- FLSS is a web-based facility, but the manager is uncertain whether it would work properly if more people start to use it.

Transferability questionnaire: Institutional policies and practices
Sofia University uses Moodle as its only learning platform. Staff at IPP-BAS do not use systems for course preparation, and while some tutors could adopt the services in their work, we do not expect IPP-BAS to impose this.

Transferability questionnaire: Relevance of the service in other pedagogic settings
Pedagogic settings for which the service would be suitable: self-directed learning, directed learning, course creation, revising for exams.
Reasons:
- Semantic search allows better location of the necessary information.
- The pre-selection of the learning materials in a specific (sub)domain saves the stakeholders time, since it provides relevant information quickly and easily.
- The links to an ontology and to other similar documents provide means for better understanding the topics in a structured way.
Pedagogic settings for which the service would be less suitable: social learning, essay writing.
Reasons:
- The ontology reflects common-sense expert knowledge in a certain domain.
- There is no support for grading essays.

Transferability questionnaire: Relevance of the service in other domains
Types of domain for which the service would be suitable: knowledge-oriented domains. Reason: each domain requires knowledge-supporting resources (ontology, lexicons).
Types of domain for which the service would be less suitable: domains that are more skill-oriented than knowledge-oriented. Reason: the service requires an ontology to support the semantic search and the similarity measure between documents; if the domain does not lend itself to such conceptualisation, the service is not appropriate.

Section 6: Conclusions

Validation Topics (WP6.1). For each Operational Validation Topic (OVT), the validation outcome per pilot site is given below, with any qualifications:

PVT1 Verification of accuracy (numeric)
- 1.1 A high proportion of the learning objects (LOs) offered by the system is relevant to the topic chosen by the teacher. Validated with qualifications (IPP-BAS); not validated (SU). Qualifications: IPP-BAS: the result depends very much on the tutor's query approach, i.e. on whether the topic is broad (more reliable) or narrow (recall drops). SU: no data; the verification was performed at IPP-BAS only.
PVT2 Tutor efficiency
- 2.1 The teacher saves time when developing a course unit in FLSS compared to traditional means. Validated unconditionally (IPP-BAS, SU).
- 2.2 The teacher invests less effort (cognitive load) when developing a course unit in FLSS compared to traditional means. Validated unconditionally (IPP-BAS, SU).
PVT3 Quality and consistency of information returned by the system
- 3.1 Teachers perceive that the learning materials offered by FLSS are useful to them in developing a course unit. Validated unconditionally (IPP-BAS, SU).
PVT4 Making the educational process transparent
- 4.1 Using the ontology assists the teacher in establishing the hierarchy of main concepts within the course unit.
PVT5 Quality of educational output
- 5.1 The teacher thinks that the quality of the derived main structure of a course, together with its relevant support material, is good. Validated unconditionally (IPP-BAS, SU).
- 5.2 An advantage of FLSS is that the search can return learning materials in other languages, providing teachers with a wider range of materials for multilingual learners. Validated unconditionally (IPP-BAS, SU).
PVT6 Motivation for learning
- 6.1 NOT APPLICABLE
PVT7 Organisational efficiency
- 7.1 There is a saving in institutional resources overall. IPP-BAS, SU: there is insufficient evidence to prove that there is a saving in resources.
PVT8 Relevance
- 8.1 The service meets one or more institutional objectives. Validated unconditionally (IPP-BAS, SU).
PVT9 Likelihood of adoption
- 9.1 Users were motivated to continue to use the system after the end of the formal validation activities. Validated unconditionally (IPP-BAS, SU).
- 9.2 A high score was obtained in the generic questionnaires (based on UTAUT: likelihood of adoption). Validated unconditionally (IPP-BAS, SU).
- 9.3 Tutors attending a dissemination workshop give high scores to the question 'How likely are you to consider adopting the service in your own educational practice?' Validated unconditionally (SU); validated with qualifications (IPP-BAS): unclear to what extent the individual tutors attending would use FLSS educationally (rather than in research), though future use in SINUS was noted.
- 9.4 Teachers and managers are motivated to adopt the system because it offers multilingual search. Validated unconditionally (IPP-BAS, SU).

Exploitation (SWOT Analysis)
The objective you are asked to consider is: "FLSS (v1.5) will be adopted in pedagogic contexts beyond the end of the project".

Strengths. The strengths of the system (v1.5) that would be positive indicators for adoption are:
- FLSS saves time in course construction;
- FLSS supports reuse of learning objects in different tasks and courses;
- FLSS supports reuse of learning objects in multilingual settings, as multilingual retrieval of learning materials is provided;
- the ontology helps tutors to structure the course;
- the teaching process is optimised, because FLSS provides easy search for relevant materials and facilitates course creation.
Weaknesses. The weaknesses of the system (v1.5) that would be negative indicators for adoption are:
- complex interface;
- although tutors can use the ontology to assist in structuring a course, they cannot build the whole course in FLSS;
- adding new learning objects is not trivial;
- a lot of initial effort is needed to become familiar with the system;
- no thematic classification alongside the ontological is-a model;
- adaptation to another domain requires effort;
- accuracy of retrieval could be further improved;
- response time could be improved.
Opportunities. The system has potential as follows:
- FLSS can be used for evaluating already compiled curricula.
- FLSS can effectively support the teaching process.
- FLSS substantially reduces the tutors' time for course preparation, and thus frees time for research and professional development of the teaching staff.
- FLSS can support semi-automatic generation of metadata, which can be used as additional features for retrieving relevant learning material.
- FLSS can support other domains and languages (for example, the SQL-related courses mentioned by tutors). It is best suited to domains in which conceptual knowledge (rather than skills) is the main target of the learning process and which have, or might have, a formalised domain ontology equipped with lexicons, and to languages which have basic NLP tools for initial processing.
Threats:
- The tutors are not used to working with a complex system; they need some time and good will to adjust and discover its advantages.
- Specific LMSs are sometimes used in the teaching process in which this system is not incorporated. Additional tuning by FLSS staff and willingness on the tutors' part are thus required for a smart combination and better results.
- The usefulness of the FLSS repository depends very much on the topic, the level of teaching (BA, MA, other) and the pedagogic task, since it is of limited size and needs updating with respect to these factors.
- In many domains FLSS will not be accepted, due to a lack of formally expressed conceptual information.
- The adoption of FLSS can be delayed by a lack of a sufficient quantity of learning materials.

Overall conclusion regarding the likelihood of adoption of FLSS Version 1.5:
FLSS was well received at the pilot sites, and tutors found two aspects of FLSS helpful in developing a course: the retrieval of multilingual learning materials and the ontology for structuring the course. These aspects led to savings in tutor time, a major institutional driver. Teaching managers at both pilot sites would consider further use of the software, for (1) teaching and (2) evaluation of curricula. The pilot sites noted that they would like to test FLSS in various contexts prior to institutional adoption.
It takes time for people to see the full potential of FLSS, since they usually rely on already popular, well-known platforms (such as Moodle and ILIAS). Adoption beyond these sites is dependent on further dissemination and exploitation activities, as well as on the availability of a suitable repository of learning objects and a suitable ontology. Experience at SU showed that FLSS is not intuitive to new users, so work on improving the user interface will be important for extending adoption. Further extension of FLSS to internalise the entire course creation process within FLSS would also be helpful. Overall, our conclusion is that the software (v1.5) meets a real need in an effective way, though with a very small repository and for a restricted area of the IT domain. The cost of small-scale pilots (as proposed at IPP-BAS and SU for further exploration) is likely to be prohibitive for most institutions and courses. The effort involved in setting up and maintaining the repository of learning materials, and possibly the ontology, suggests that FLSS can only be adopted widely for course construction where there are national or international initiatives to fund these activities.
Most important actions to promote adoption of FLSS:
Technical:
- make the user interface more intuitive;
- internalise the process of course development within FLSS;
- further improve the accuracy of retrieval of learning materials;
- improve the response time;
- optimise the NLP processing module;
- make FLSS compatible with LMS architectures so it can interoperate with institutional VLEs.
Dissemination and exploitation:
- establish user groups at the two validation sites, with regular workshops to discuss problems related to concrete usage of the system and the extension of ontologies, lexicons and the repository (see the next point);
- provide more use cases in order to make the system's full potential explicit;
- provide a scenario and use case in another (sub)domain (SQL, for example);
- provide exhaustive guidelines on exploring FLSS;
- organise dissemination activities and attract interested parties.
Organisational:
- create a mechanism for enriching the repository, ontology, lexicons and grammars in collaboration with other educational institutions or at national/international level;
- provide help to users in creating resources (ontologies, learning objects, etc.) for new domains.
Usability:
- provide more interactivity between the system and the stakeholder;
- enrich the explicit information about the learning material (relations among concepts and terms; statistics of concept occurrences over a group of documents, etc.).

Section 7 – Road map
Based on the results and conclusions from validation, the LTfLL team has agreed that the following are the most important future enhancements to the system in order to meet stakeholder requirements:
Most important:
1. to make the process of structuring a course with augmented LOs internal to FLSS (at present, only browsing the ontology and retrieving the relevant material can be done inside the system);
2.
to show aggregated statistics not only for one document but also for the learning objects selected by the stakeholder;
3. to enrich the visual representation (for example, bigger windows for manipulation, more explicit connections among the data);
4. to allow more intervention and interactivity on the user's side (for example, adding concepts to the ontology or exchanging comments on the data);
5. to localise the interface in Bulgarian (with the possibility of other languages besides English).
Other:
- to reduce the ontology upload time when working with bigger ontologies;
- to implement data architectures for more use cases (for example, uploading another ontology with a related lexicon);
- to make FLSS compatible with LMS architectures.

Based on the results and conclusions from validation, the LTfLL team has agreed that the following are the most important changes to the current scenario(s) of use in order to meet stakeholder requirements:
Most important:
1. to organise a more gradual acquaintance with FLSS for the stakeholders before they start exploring it;
2. to provide several possible templates for using FLSS, apart from the main one (first consulting the ontology and then searching for the relevant LOs); possible cooperation with psycholinguists and designers of user interfaces is envisaged here;
3. to provide evidence that the scenario can be parameterised for different tasks (basic or advanced course, etc.) and in more subdomains;
4. to build the scenario into a real-life architecture (for example, to test it on a whole course, not just a course unit);
5. to set up a full multilingual architecture for at least two languages (interface, rich lexicon, search).
Other:
- a clear procedure for automatic and manual support of the users should be designed and tested by the FLSS team;
- presentational partitioning into Introduction, Illustrations, Exercises, etc. should be implemented.
Based on the results and conclusions from validation, the LTfLL team has agreed that the following are possible additional educational contexts for future deployment:

Most important:
1. Educational institutions and other interested parties in Europe should be attracted to contribute to a larger repository, well structured into sub-domains and based on ASP, and also to provide resources for other languages. Initially, the European CLARIN infrastructure is considered for such cooperation.
2. Courses for advanced students in IT at universities can be developed.
3. FLSS can be tested in various sub-domains (web design, editing, presenting, etc.).
4. FLSS can be introduced in high schools (especially grades 10 and 11, where a curriculum in IT is followed).
5. FLSS can be used for evaluating already designed curricula in other educational projects.

Based on the results and conclusions from validation, the LTfLL team has agreed that the following are the most important issues for future technical research to enable deployment of language technologies in educational contexts:

Most important:
1. More stable synchronization among the various language resources (ontologies, lexicons, LOs) is envisaged in the interface. For example, there is no visible connection between a concept in the ontology and the occurrences of that concept within LOs. Users are also confused by the chaotic and cumulative suggestions for using the various facilities.
2. More effort has to be invested in optimizing the NLP processing pipes with respect to a domain and a language. There should be a mechanism for adjusting the existing tools to work together in a pipe, and the addition of a new (sub)domain or language should be simplified.
3. The integration or compatibility of stand-alone systems with bigger architectures (such as LMSes) needs further investigation. When the possible interaction between such systems is clearly defined in advance for the stakeholders, adoption in various educational contexts becomes more likely.
4. The overall behaviour of the system should be considered: for example, what waiting time is acceptable for the user when uploading and processing material or awaiting the result of a query; how many error messages and unexpected bugs the user will tolerate; and which set-up is more intuitive for handling the task.
5. FLSS still contains useful information that cannot be explored because it is hidden (i.e. not visualized). Stakeholders would like to see more explicit relations among the data and the resources. This will be handled partly by more statistical parameters and partly by graphical means.

Roadmap - validation activities

Further validation planned beyond the end of the project:

Claim (OVT): Teachers perceive that the learning materials offered by FLSS are useful to them in developing a whole course.
Methodology:
1. questionnaire
2. comparison of the developed courses to state-of-the-art ones
3. estimation by a manager of the quality of the course
4. test of the developed courses in a real setting (analysis of students' opinions)

Objective (OVT): OVT 2.1 The teacher saves time when developing a course unit in FLSS compared to traditional means.
Methodology:
1. estimating the time two tutors invest: one using the materials and facilities of FLSS, and the other using the net or other sources
Appendix B.7 Validation Reporting Template for WP6.2 (PUB-NCIT & UU)

Section 1: Functionality implemented in Version 1.5 and alpha / beta-testing record

Brief description of functionality (unit, version number, changes from Version 1.0):
- Knowledge Discovery – LOs, V1.5: Provide scientific documents from Bibsonomy in addition to materials from social network sites
- Knowledge Discovery – LOs, V1.5: Indicate where documents come from (e.g. Delicious, SlideShare, YouTube)
- Knowledge Discovery – LOs, V1.5: Dynamic (instead of static) social data retrieval (Delicious, YouTube, SlideShare and Bibsonomy)
- Knowledge Discovery – LOs, V1.5: Disambiguated search results on the basis of the ontology
- Ontology visualisation, V1.5: Show shortest path from concept1 to concept2
- Ontology visualisation, V1.5: Make the ontology fragment dynamic
- Ontology Enrichment, V1.5: Disambiguation integrated in ontology enrichment
- Knowledge Discovery – LOs / Ontology visualisation / Social learning – LOs, V1.5: Combined Social and Semantic Search service
- Definition finder, V1.5: Reduced length of definitions
- People finder, V1.5: Show how people are related
- Help Functionality, V1.5: Written a Quick Start Guide
- Scalability of the software, V1.5: Improved scalability to enable working with large groups of people
- Crawler, V1.5: Distributed crawler instead of serial crawler
- Facebook and Twitter crawlers, V1.5: Implemented two additional crawlers for crawling social network data
- Caching, V1.5: Implemented caching for ontology requests, search and recommendation

Alpha-testing

Pilot site and language: PUB-NCIT (English)
Date of completion of alpha testing: 30/09/10
Who performed the alpha testing? Vlad Posea

Pilot site and language: UU (English)
Date of completion of alpha testing: 14/10/10
Who performed the alpha testing? Thomas Markus, Eline Westerhout, Paola Monachesi

Beta-testing

Pilot site and language: PUB-NCIT (English)
Has stakeholder validation taken place using the service embedded in Elgg (Yes/No/Partially): Yes
If 'No' or 'Partially', give reasons: The widgets were embedded in Moodle, the learning environment used at PUB-NCIT.
Beta-testing performed by: Costin Chiru (tutor), Radu Vasiliu (learner)
Beta-testing environment (stand-alone service / integrated into Elgg): widgets embedded into Moodle
HANDOVER DATE: 5th October 2010 (date of handover of software v1.5 for validation)

Pilot site and language: UU (English)
Has stakeholder validation taken place using the service embedded in Elgg (Yes/No/Partially): Partially
If 'No' or 'Partially', give reasons: Knowledge Discovery service: stand-alone, because the software had to be integrated into the learning environment used within the UU course (WebCT), which was easier with the stand-alone version. Social Learning service: the Elgg widgets have been used.
Beta-testing performed by: Erna Kotkamp (tutor)
Beta-testing environment (stand-alone service / integrated into Elgg): stand-alone Knowledge Discovery service and Elgg Social Learning services, both integrated in WebCT
HANDOVER DATE: 21 October 2010 (date of handover of software v1.5 for validation)

Section 2: Validation Pilot Overview

NB Information about pilot sites, courses and participants has been transferred to Appendix A.3

Pilot task

Pilot site: PUB-NCIT
Pilot language: English

What is the pilot task for learners and how do they interact with the system?
The pilot was embedded in the Human-Computer Interaction course at the Politehnica University of Bucharest. Participation in the experiment was obligatory for all students. The course consists of 3 hours of presentations and 2 hours of labs weekly. The lab tasks aim to produce outputs such as scripts, HTML/JavaScript interfaces or XML Schemas.
The software was embedded in Moodle, the learning management system used for this course, and was made available to the students for more than one month. It was first presented to the students and then opened up for them to use. The students were encouraged to use the software to look for additional materials or to visualize the concepts in the domain. They mostly used it in the lab or at home while solving their assignments, using the iFLSS to find learning materials that supplemented those already provided by the teaching team inside the course management system. The link to the software was placed near the link to the official course documentation to encourage learners to use it. The students followed the link to find additional documentation, searched for documents in the tutors' social networks, and visualized the concepts using the knowledge discovery service. After reading the documentation and listening to a short presentation in class, the students had to carry out small tasks. If the students had problems with the tasks or with the iFLSS, they could ask a tutor for help.

What do the learners produce as outputs? Are the outputs marked?
Learners produced outputs from the lab tasks, such as scripts or XML Schemas. These outputs are marked manually: the lab activity accounts for 15% of the total mark, so the labs in which the experiment was performed were worth 5% of the total course mark. The labs are not very difficult, but they require a large amount of work in a short time. Note that because the learners used the software within a real task contributing to their course mark, all learners had to be treated in the same way, so we could not use a control group.

How long does the pilot task last, from the learners starting the task to their final involvement with the software?
1 month

How do tutors/student facilitators interact with the learners and the system?
During the labs, the tutors present a technology, such as XML, XML Schema or JavaScript, to the students, and offer support while the students solve the tasks. To profit maximally from the software (Social Learning service), the tutors also have to add content to their accounts on social networking sites. This has to be done a few days before the labs, so that the iFLSS can index these documents.

Describe any manual intervention of the LTfLL team in the pilot:
The previous validation revealed that learners do not use social networking sites for bookmarking purposes very often. Therefore, for the Social Learning service, we used the tutors' social networking accounts for searching instead of those of the learners.

Pilot site: UU
Pilot language: English

What is the pilot task for learners and how do they interact with the system?
The validation was integrated in an existing ICT course at the Humanities faculty. Around 200 students followed this course, divided into eight groups of 20-25 students; the validation was carried out with two of these groups. Participating in the experiment was obligatory for all students in these groups. Within the course, the students spent three weeks learning HTML and CSS. They used the iFLSS for different tasks, such as finding documents, finding people, identifying relationships between topics, and learning how a domain is structured. The software was embedded in the WebCT environment used in the course.

What do the learners produce as outputs? Are the outputs marked?
In the course, the students have to build their own web page. The course schedule and assignments were already fixed and we were not allowed to change them. We therefore designed some additional assignments for the experiment groups in which the students were explicitly asked to use the iFLSS.
The outputs are not marked.

How long does the pilot task last, from the learners starting the task to their final involvement with the software?
3 weeks

How do tutors/student facilitators interact with the learners and the system?
Each week there was a hands-on session, in which the tutors introduced some new concepts and assisted the students when necessary. The students finished their assignments during these sessions. The tutors used the system to introduce new topics using the ontology and the learning materials. They also pointed the students to the system when they had questions that could be answered using it.

Describe any manual intervention of the LTfLL team in the pilot:
The previous validation showed that students generally do not use the social networking sites drawn on by the iFLSS system very often. We therefore decided to employ the network of an LTfLL member instead of letting the students create their own accounts.

Experiments

Pilot site: PUB-NCIT
Name of experiment: Ranking
Objective(s): Verification of social search
Details: We asked 25 students to rate the results returned by the social search for 5 terms:
- 1 term that the tutor did not explicitly use in his social network,
- 1 term used as a tag by the tutor, but with alternative spellings for its tags that produced errors,
- 1 term that the tutor explicitly used as a tag and that had no alternative spellings, and
- 2 terms of the students' own choice.
The first 5 results for each search were re-ranked by the students. If a student considered that a result should not be in the first five results, he marked it with an X. The results of the experiment are presented in OVT 1.1.
Learners were also asked to consider whether the search returns people relevant to the search topic. They were asked to perform queries on the same five terms as above (3 given terms and 2 terms of their own choice) and to decide, for the first 5 results, whether each is relevant. The results of this experiment are presented in OVT 1.2.

Pilot site: UU
Name of experiment: Relevance of learning materials recommended by the knowledge discovery service
Objective(s): Investigate whether the learning materials retrieved by the system are relevant.
Details: Since the semantic search based on the ontology is especially useful for disambiguating terms, the experiment focuses on ambiguous queries (e.g. python, java). From the complete set of ambiguous terms in the ontology, we manually selected 32 queries indicative of the searches a student will typically perform. For each of these queries, the relevance of the first 20 results was judged by two domain experts. For more details on this experiment, we refer to Markus et al. (submitted).

Pilot site: UU
Name of experiment: Logging usage statistics
Objective(s): Measuring the motivation to keep using the system after the pilot
Details: The usage of the system during the validation sessions and outside the sessions was logged. The system was used by 49 students during the validation; 26 of these 49 students (53.1%) also opened the system outside the sessions. In addition, 13 other students who did not participate in the experiment accessed the software.
Section 3: Results - validation/verification of Validation Topics

OVT: 1.1
Pilot site: PUB-NCIT
Pilot language: English
Operational Validation Topic: The social learning service provides a high proportion of learning materials that match the search topic
Summative results with respect to validation indicator
Stakeholders / methodology: A group of 25 students judged the results for 5 queries.
Results: The percentages of results learners considered useful were as follows:
- Term 1 (xml; in course materials but not tagged by tutor): mean 58% (SD=27.38%)
- Term 2 (xmlschema/XML Schema; tagged by tutor under one of two alternative spellings): mean 35% (SD=25.35%)
- Term 3 (xpath; tagged by tutor): mean 85% (SD=21.81%)
- Terms 4 & 5 (chosen by learners): mean 62% (SD=33.40%)

OVT: 1.1
Pilot site: UU
Pilot language: English
Operational Validation Topic: The knowledge discovery service provides a high proportion of learning materials that match the search topic and are suitable as learning materials
Summative results with respect to validation indicator
Stakeholders / methodology: The top-20 results for 32 ambiguous queries were judged by two domain experts.
Results: Contrary to regular search engines, the knowledge discovery service is able to deal with ambiguous queries because it is based on an ontology. The majority of learning materials retrieved for ambiguous queries match the search topic and are suitable as learning materials (recall: 0.77, precision: 0.75). It should be noted that the search methods provided within the social networks themselves (i.e. the APIs) performed considerably worse on the same queries (precision: 0.22, which means that only about 1 out of 5 results is relevant). Recall cannot be measured in this case, since we do not know the number of relevant resources on the Internet.
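The precision and recall figures reported above can be made concrete with a short sketch. This is illustrative code only, not the project's evaluation tooling; the judged-results list and relevant-document count in the example are invented for demonstration.

```python
# Illustrative sketch: precision and recall from binary relevance
# judgments on the top-k results of a query (as in the UU experiment,
# where two experts judged the top-20 results of each query).

def precision_recall(judged_results, total_relevant):
    """judged_results: list of booleans (True = judged relevant).
    total_relevant: number of relevant documents known to exist,
    or 0 when unknown (recall is then not measurable)."""
    retrieved_relevant = sum(judged_results)
    precision = retrieved_relevant / len(judged_results)
    recall = retrieved_relevant / total_relevant if total_relevant else None
    return precision, recall

# Hypothetical example: 15 of the top 20 results judged relevant,
# with 20 relevant documents known to exist in the collection.
p, r = precision_recall([True] * 15 + [False] * 5, total_relevant=20)
print(p, r)  # 0.75 0.75
```

As noted in the results, recall requires knowing the total number of relevant resources, which is why it could not be computed for the raw social-network APIs.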
OVT: 1.2
Pilot site: PUB-NCIT
Pilot language: English
Operational Validation Topic: The social network service suggests a high proportion of people relevant to the search topic
Summative results with respect to validation indicator
Stakeholders / methodology: Students searched for relevant people related to five queries (similar to OVT 1.1, PUB-NCIT).
Results: Relevance is defined in this context as relevance to the search query, i.e. useful people to ask questions about the topic. The percentages of results learners considered useful were as follows:
- Term 1 (xml; in course materials but not tagged by tutor): mean 92% (SD=23.52%)
- Term 2 (xmlschema/XML Schema; tagged by tutor under one of two alternative spellings): mean 63% (SD=21.38%)
- Term 3 (xpath; tagged by tutor): mean 84% (SD=25.32%)
- Terms 4 & 5 (chosen by learners): mean 85% (SD=30.12%)
Learners had been asked to examine the first five results. However, it was noted that some queries did not return five results when the tutor's network contained fewer than five people relevant to the search topic.

OVT: 1.3
Pilot site: PUB-NCIT
Pilot language: English
Operational Validation Topic: The average learner's social network has enough people in it who can help him
Summative results with respect to validation indicator
Stakeholders / methodology: Analysis of networks to find the number of relevant connections
Results: An experiment in which the networks of 11 tutors were analysed revealed that tutors have access to 41 relevant persons on average. From this experiment it is not clear which of these connections would actually be available for support, such as chat or e-mail, but the documents they publish or bookmark can be used as learning materials for the students. The experiment investigated the accounts of 11 tutors on 5 platforms: Delicious.com, YouTube.com, Flickr.com, Twitter.com and SlideShare.net. We examined both the initial structure of the tutors' networks and the networks' evolution in time:
- Tutors examined: 11
- People in the tutors' networks: 456
- Average number of resources posted in the network: 47/day
- Average number of relations created in the network: 63/day
More details on this experiment can be found in Stoica et al. (2010)¹.

OVT: 2.1
Pilot site: PUB-NCIT
Pilot language: English
Operational Validation Topic: Tutors have to spend less time finding relevant learning materials and helping the learner to identify related concepts
Summative results with respect to validation indicator (questionnaire; all Experimental group):
- Tutors, KD34 "I have to spend less time finding relevant learning materials when I use the Knowledge Discovery Service.": mean 2.5, SD 0.58, 0% Agree/Strongly Agree, n=4
- Tutors, KD35 "I have to spend less time to identify concepts related to the course topics when I use the Knowledge Discovery Service.": mean 2.5, SD 0.58, 0% Agree/Strongly Agree, n=4
- Tutors, SL34 "I have to spend less time finding relevant learning materials when I use the Social Learning Service.": mean 3.33, SD 0.58, 33% Agree/Strongly Agree, n=3
Formative results with respect to validation indicator
- Tutors: "It tells me stuff I already know." (about the KD)
- Tutors: "I gain time only after investing some time in it." (SL)

¹ Anamaria Stoica, Vlad Posea, Cristina Scheau and Mihai Teleru. An Analysis of the Usage of Social Networking Web Sites for Learning Purposes.
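The questionnaire rows in these tables report a mean, a standard deviation, the percentage of Agree/Strongly Agree responses, and n. A minimal sketch of that aggregation from raw 5-point Likert responses follows. The response lists below are hypothetical, and the use of the sample standard deviation is an assumption; they are chosen only because they reproduce numbers of the same shape as the KD34 rows.

```python
# Illustrative sketch (assumptions: 5-point Likert scale, 4 and 5 count
# as Agree/Strongly Agree, sample standard deviation).
from statistics import mean, stdev

def summarise(responses):
    """responses: list of integers 1..5 (1 = strongly disagree, 5 = strongly agree)."""
    pct_agree = 100 * sum(1 for r in responses if r >= 4) / len(responses)
    return {
        "mean": round(mean(responses), 2),
        "sd": round(stdev(responses), 2),   # sample standard deviation
        "pct_agree": round(pct_agree, 1),
        "n": len(responses),
    }

# A hypothetical response pattern consistent with a row like
# "mean 2.5, SD 0.58, 0% Agree/Strongly Agree, n=4":
print(summarise([2, 2, 3, 3]))
# {'mean': 2.5, 'sd': 0.58, 'pct_agree': 0.0, 'n': 4}
```

With only 3-4 respondents per tutor questionnaire, a single response shifts the mean substantially, which is worth keeping in mind when reading the tutor tables.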
OVT: 2.1
Pilot site: UU
Pilot language: English
Operational Validation Topic: Tutors have to spend less time finding relevant learning materials and helping the learner to identify related concepts
Summative results with respect to validation indicator (questionnaire; all Experimental group):
- Tutors, KD34 "I have to spend less time finding relevant learning materials when I use the Knowledge Discovery Service.": mean 3.67, SD 0.58, 66.7% Agree/Strongly Agree, n=3
- Tutors, KD35 "I have to spend less time to identify concepts related to the course topics when I use the Knowledge Discovery Service.": mean 3.67, SD 0.58, 66.7% Agree/Strongly Agree, n=3
- Tutors, SL34 "I have to spend less time finding relevant learning materials when I use the Social Learning Service.": mean 3.33, SD 1.52, 33.3% Agree/Strongly Agree, n=3
Formative results with respect to validation indicator
- Tutors: "The knowledge discovery service provides better results than Google, because the results are filtered", i.e. it searches on the basis of the bookmarks of other people who considered the information relevant.
- Tutors: "The usefulness of the social learning services depends on your network."

OVT: 2.2
Pilot site: PUB-NCIT
Pilot language: English
Operational Validation Topic: There is less cognitive load for the tutor to help the learners to find relevant learning materials and to help the learner to identify related concepts
Summative results with respect to validation indicator (questionnaire; all Experimental group):
- Tutor, KD11a "Please rank on a 5-point scale the mental effort you invested to accomplish teaching tasks using the Knowledge Discovery service (1 = very high, 5 = very low).": mean 4, SD 0.82, %Agree n.a., n=4
- Tutor, KD11b "Overall, using the knowledge discovery system requires significantly less mental effort to complete my teaching tasks than when using Google.": mean 2.25, SD 0.96, 0% Agree/Strongly Agree, n=4
- Tutor, KD36 "The cognitive load for finding relevant learning materials to be used in the course is lower when I use the system.": mean 2.75, SD 0.96, 0% Agree/Strongly Agree, n=4
- Tutor, KD37 "The cognitive load for identifying concepts related to the course topics is lower when I use the system.": mean 2.75, SD 0.96, 0% Agree/Strongly Agree, n=4
- Tutor, SL11a "Please rank on a 5-point scale the mental effort you invested to accomplish teaching tasks using the Social Learning service (1 = very high, 5 = very low).": mean 3.67, SD 0.58, %Agree n.a., n=3
- Tutor, SL11b "Overall, using the social learning system requires significantly less mental effort to complete my teaching tasks than when using Google.": mean 2, SD 1, 0% Agree/Strongly Agree, n=3
- Tutor, SL35 "The cognitive load for finding relevant learning materials to be used in the course is lower when I use the system.": mean 2.33, SD 0.58, 0% Agree/Strongly Agree, n=3
Formative results with respect to validation indicator
- Tutor: "It's hard to compare the system with Google given that Google is very familiar to us, as it is to students, and it also has specialised tools like Google Scholar."

OVT: 2.2
Pilot site: UU
Pilot language: English
Operational Validation Topic: There is less cognitive load for the tutor to help the learners to find relevant learning materials and to help the learner to identify related concepts
Summative results with respect to validation indicator (questionnaire; all Experimental group):
- Tutor, KD11a "Please rank on a 5-point scale the mental effort (1 = very high mental effort; 5 = very low mental effort) you invested to accomplish teaching tasks using the Knowledge Discovery service.": mean 3.33, SD 0.58, %Agree n.a., n=3
- Tutor, KD11b "Overall, using the knowledge discovery system requires significantly less mental effort to complete my teaching tasks than when using Google.": mean 3.67, SD 0.58, 66.7% Agree/Strongly Agree, n=3
- Tutor, KD36 "The cognitive load for finding relevant learning materials to be used in the course is lower when I use the system.": mean 3.67, SD 0.58, 66.7% Agree/Strongly Agree, n=3
- Tutor, KD37 "The cognitive load for identifying concepts related to the course topics is lower when I use the system.": mean 3.67, SD 0.58, 66.7% Agree/Strongly Agree, n=3
- Tutor, SL11a "Please rank on a 5-point scale the mental effort (1 = very high mental effort; 5 = very low mental effort) you invested to accomplish teaching tasks using the Social Learning service.": mean 3.33, SD 1.15, %Agree n.a., n=3
- Tutor, SL11b "Overall, using the social learning system requires significantly less mental effort to complete my teaching tasks than when using Google.": mean 3.33, SD 1.15, 66.7% Agree/Strongly Agree, n=3
- Tutor, KD36 "The cognitive load for finding relevant learning materials to be used in the course is lower when I use the system.": mean 3.67, SD 1.15, 33.3% Agree/Strongly Agree, n=3
Formative results with respect to validation indicator
- Tutors: "It would be useful if the tutor could have total control over the knowledge discovery service (i.e. the possibility to change and manipulate content, ontology, etc.). Its use will be affected by how difficult it is to manipulate it."
- Tutors: "Both services could be improved by giving tutors the option to modify the content." The tutors specified what they would like to be able to change: "adapt the ontology for a specific course by marking the course concepts in a different color" and "allowing tutors to add more information about learning materials".
- Tutors: "The social learning service should allow tutors to categorize the resources, to provide comments about the learning materials."

OVT: 3.1
Pilot site: PUB-NCIT
Pilot language: English
Operational Validation Topic: The learners judge the learning materials provided by the system as being relevant for their learning task
Summative results with respect to validation indicator (questionnaire; all Experimental group):
- Learners, KD7 "The learning materials provided by the knowledge discovery service are relevant for my learning task.": mean 3.3, SD 0.91, 35% Agree/Strongly Agree, n=20
- Learners, SL7 "The learning materials provided by the social learning service are relevant for my learning task.": mean 2.9, SD 0.83, 28% Agree/Strongly Agree, n=25
Formative results with respect to validation indicator
- Learner: The results are not oriented towards problem solving: "If I want to find the solution for a very specific bug I can't find anything useful here."
- Learner: "When I have to do something quick I often don't have time to watch training videos - I go directly to an example."
- Learner: "I would rather click the link returned by the (social) search if I'd see more information about it in the window (like Google search results)."
- Learner: "If the teaching assistant bookmarked this it must be good."

OVT: 3.1
Pilot site: UU
Pilot language: English
Operational Validation Topic: The learners judge the learning materials provided by the system as being relevant for their learning task
Summative results with respect to validation indicator (questionnaire):
- Learners, KD7 "The learning materials provided by the knowledge discovery service are relevant for my learning task." (Experimental): mean 3.2, SD 0.93, 42.9% Agree/Strongly Agree, n=35
- Learners, SL7 "The learning materials provided by the social learning service are relevant for my learning task." (Experimental): mean 3.32, SD 0.91, 47.1% Agree/Strongly Agree, n=35
- Learners, G7 "The learning materials provided by Google are relevant for my learning task." (Control): mean 3.4, SD 0.71, 29.4% Agree/Strongly Agree, n=18
Formative results with respect to validation indicator
- Learner: The learning materials provided by the social learning service are appreciated because "they are more trusted than the Google results".
Students like the idea that “you can see which documents their tutor trusts” (More about this in OVT3.8) Learner It is useful to be able to find learning materials from various sources, especially the YouTube videos are appreciated by the learners. The learners remarked that it depends on the learning context whether videos are wanted: “W hen you are solving a task, you do not want to spend time watching a video, it is better to have a document in such a situation. But a video might be useful when you study.” Learner “More information would help to decide whether documents are relevant, for example something like Google has, a short text which contains the term [comment UU: student meant the result snippets].” Learner Google generally suggests “easier documents, which is often enough to find an answer to questions.” Page 284 of 349 D7.4 - Validation 4 OVT: 3.2 Pilot site PUB-NCIT Pilot language English Operational Validation Topic The learners judge the people proposed by the social network service as being relevant Summative results with respect to validation indicator Questionnaire type Questionnaire no. & statement Experimental / control group Mean Standard deviation %Agree / Strongly Agree n= Learners SL8. The people proposed by the social learning service are relevant. Experimental 3.6 1.12 56 25 Formative results with respect to validation indicator Stakeholder type Results Learners “I don‟t know who these people are, so I don‟t know if the persons are relevant”. Learners “Certainly a feature I miss from search engines: finding persons that are dedicated on posting documents on what I‟m interested in” OVT: 3.2 Pilot site UU Pilot language English Operational Validation Topic The learners judge the people proposed by the social network service as being relevant Summative results with respect to validation indicator Questionnaire type Questionnaire no. & statement Experimental / control group Mean Standard deviation %Agree / Strongly Agree n= Learners SL8. 
The people proposed by the social learning service are relevant. Experimental 3.29 0.68 38.2 35 Formative results with respect to validation indicator Stakeholder type Results Learners In the validation context, the network of an LTfLL member was used, which influenced the results: “I didn't know the people, so how do I know whether they are relevant?” Learners Positive experience with documents from a certain person was used for future search requests: “I found a good document from this person for my previous request, so other bookmarks from the same person are probably also interesting.” Page 285 of 349 D7.4 - Validation 4 Learners OVT: 3.3 [Learners appreciated the idea of using their tutor's network, which in a university context is a realistic alternative to using the students' own networks.] Pilot site PUB-NCIT Pilot language English Operational Validation Topic Learners trust the retrieved learning materials more than those found by traditional means Summative results with respect to validation indicator Questionnaire type Questionnaire no. & statement Experimental / control group Mean Standard deviation %Agree / Strongly Agree n= Learners KD9. I trust the retrieved learning materials from the knowledge discovery service more than those found by Google. Experimental 2.5 1.39 35 20 Learners SL9. I trust the retrieved learning materials from the social learning service more than those found by Google. Experimental 2.5 1.23 20 25 Formative results with respect to validation indicator Stakeholder type Results Learner “I‟m very used to searching with Google and I trust them very much. I find it amusing that you are trying to compare with them” Learner “I trust resources recommended by my teacher more than I trust resources recommended by Google. 
Sometimes they are the same” Page 286 of 349 D7.4 - Validation 4 OVT: 3.3 Pilot site UU Pilot language English Operational Validation Topic Learners trust the retrieved learning materials more than those found by traditional means Summative results with respect to validation indicator Questionnaire type Questionnaire no. & statement Experimental / control group Mean Standard deviation %Agree / Strongly Agree n= Learners KD9. I trust the retrieved learning materials from the knowledge discovery service more than those found by Google. Experimental 3.17 1.25 54.3 35 Learners SL9. I trust the retrieved learning materials from the social learning service more than those found by Google. Experimental 3.21 1.02 39.4 35 Formative results with respect to validation indicator Stakeholder type Results Learner “My tutor will probably only follow good researchers.” The table for OVT 4.1 is missing for PUB-NCIT, because they did not include this question in their questionnaire. OVT: 4.1 Pilot site UU Pilot language English Operational Validation Topic Learners can independently identify gaps in their knowledge in a given domain and learn how concepts are related to each other Summative results with respect to validation indicator Questionnaire type Questionnaire no. & statement Experimental / control group Mean Standard deviation %Agree / Strongly Agree n= Learners KD40. The knowledge discovery software helps me to identify gaps in my knowledge and to learn how topics are related to each other. Experimental 3.12 0.81 35.3 35 Page 287 of 349 D7.4 - Validation 4 Formative results with respect to validation indicator Stakeholder type Results Learners “When you are a beginner in the domain you don't know anything. 
This makes it difficult to know where to start.”
Tutors The tutors clearly distinguish between different groups of students: “The knowledge discovery service is appropriate for doers, whereas it might be less appropriate for students that are more insecure, they might feel better with a text that offers a clearer path.”

OVT: 4.2
Pilot site PUB-NCIT
Pilot language English
Operational Validation Topic The visual representation of the domain helps learners to understand the domain better compared to Google
Summative results with respect to validation indicator
Questionnaire type Questionnaire no. & statement Experimental / control group Mean Standard deviation %Agree / Strongly Agree n=
Learners KD8. Because of the visual representation of the domain I now understand the domain better than when I would have used Google. Experimental 3.2 1.47 45 20

OVT: 4.2
Pilot site UU
Pilot language English
Operational Validation Topic The visual representation of the domain helps learners to understand the domain better compared to Google.
Summative results with respect to validation indicator
Questionnaire type Questionnaire no. & statement Experimental / control group Mean Standard deviation %Agree / Strongly Agree n=
Learners KD8. Because of the visual representation of the domain I now understand the domain better than when I would have used Google. Experimental 2.71 1.2 25.7 35

OVT: 4.3
Pilot site PUB-NCIT
Pilot language English
Operational Validation Topic The visual representation of the domain helps learners to understand the domain better than they would have without this visualization.
Summative results with respect to validation indicator
Questionnaire type Questionnaire no. & statement Experimental / control group Mean Standard deviation %Agree / Strongly Agree n=
Learners KD22. The visual representation of the domain helped me to learn more about the topics covered in the course than I would have without this visualisation. Experimental 4.1 0.94 70 20
Formative results with respect to validation indicator
Stakeholder type Results
Learners “I can very rapidly discover the keywords and concepts in a domain.”
Learners “I feel the relations between concepts are not very clearly expressed”
Learners “It is very good as an introduction to the topic, but I feel that after I know the domain a bit it doesn't offer anything special”
Learners “The visual representation of the domain allows me to learn by myself”

OVT: 4.3
Pilot site UU
Pilot language English
Operational Validation Topic The visual representation of the domain helps learners to understand the domain better than they would have without this visualization.
Summative results with respect to validation indicator
Questionnaire type Questionnaire no. & statement Experimental / control group Mean Standard deviation %Agree / Strongly Agree n=
Learners KD22. The visual representation of the domain helped me to learn more about the topics covered in the course than I would have without this visualization. Experimental 3.12 0.88 38.2 35
Formative results with respect to validation indicator
Stakeholder type Results
Learners “The visualisation does not contain enough information about the relations between the terms, which makes it difficult to understand the domain on the basis of the graph.”
Learners “How can I see what I should know when I don't know anything yet?”
Learners “Not useful to search, it is not clear how you search, all these terms are not known and you don't know what to click.”
Learners “There is too much information, we don't need it, we don't need/want to learn more, we simply want to learn what we need to pass the test and we want to figure it out quickly. No distinction is made in the graph.”
Learners “I have a lot of experience with using mind maps, which are quite similar to the graphs.
I like the visualisation.”
Tutors The information overload in the visualisation was mentioned several times in the interviews: “It makes it difficult to see what is most relevant”. A suggestion provided was: “give the topics that are obligatory for the course a different colour to help students”.

The course was still running when we wrote the VRT. The quality of the educational output is only measured at the end of the course, when the students have to hand in their assignments (PUB-NCIT) or to sit an exam (UU). We therefore have no results yet regarding the influence of the iFLSS on the quality of the educational output.

OVT: 6.1
Pilot site PUB-NCIT
Pilot language English
Operational Validation Topic The learners perceive that the iFLSS supports more self-directed learning compared to traditional means
Summative results with respect to validation indicator
Questionnaire type Questionnaire no. & statement Experimental / control group Mean Standard deviation %Agree / Strongly Agree n=
Learners KD26. The knowledge discovery service supports more self-directed learning compared to other tools I use. Experimental 3.5 0.89 40 20
Learners SL25. The social learning service supports more self-directed learning compared to other tools I use. Experimental 3.0 1.02 28 25
Formative results with respect to validation indicator
Stakeholder type Results
Learners “The visual representation of the domain allows me to learn by myself”

OVT: 6.1
Pilot site UU
Pilot language English
Operational Validation Topic The learners perceive that the iFLSS supports more self-directed learning compared to traditional means. (2)
Summative results with respect to validation indicator
Questionnaire type Questionnaire no. & statement Experimental / control group Mean Standard deviation %Agree / Strongly Agree n=
Learners G25. Google supports more self-directed learning compared to other tools I use. Control 3.3 0.85 17.7 17
Learners KD26. The knowledge discovery service supports more self-directed learning compared to other tools I use. Experimental 3.09 0.79 29.4 35
Learners SL25. The social learning service supports more self-directed learning compared to other tools I use. Experimental 3.03 0.87 32.4 35
Formative results with respect to validation indicator
Stakeholder type Results
Learners “The software provides an overview of the domain, this can be helpful for reflection purposes.”
Learners “I do not have experience with learning in settings outside of the university.”
(2) The interviews revealed that this question was not clear to all students and that they do not have much experience with self-directed learning.

OVT: 7.1
Pilot site PUB-NCIT
Pilot language English
Operational Validation Topic There is a saving in institutional resources overall
Formative results with respect to validation indicator
Results that inform a change to the scenario or developed software, or inform the implementation/exploitation plan
Stakeholder type Results
Teaching manager “The software might bring gains in the time the professors spend creating teaching materials”

OVT: 7.1
Pilot site UU
Pilot language English
Operational Validation Topic There is a saving in institutional resources overall
Formative results with respect to validation indicator
Results that inform a change to the scenario or developed software, or inform the implementation/exploitation plan
Stakeholder type Results
Teaching manager “The set-up costs may be relatively high in the beginning, but in the long term there will be a saving in resources.” The TM specifically mentioned the time needed to find and tune a new ontology and to build your network.
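The summary columns reported in the questionnaire tables throughout this appendix (Mean, Standard deviation, %Agree / Strongly Agree, n=) can be reproduced from raw 5-point Likert responses. The following is an illustrative sketch only, using made-up response data rather than the pilot data:

```python
# Illustrative sketch only: summarises 5-point Likert responses
# (1 = strongly disagree ... 5 = strongly agree) into the columns used
# in the VRT tables. The response data below are made up, not pilot data.
from statistics import mean, stdev

def summarise(responses):
    """Return (mean, sample standard deviation, %Agree/Strongly Agree, n)."""
    n = len(responses)
    pct_agree = 100 * sum(1 for r in responses if r >= 4) / n  # scores 4 and 5
    return round(mean(responses), 2), round(stdev(responses), 2), round(pct_agree, 1), n

# Hypothetical answers of 10 respondents to one questionnaire item
scores = [4, 3, 5, 2, 4, 4, 3, 5, 2, 3]
m, sd, agree, n = summarise(scores)
```

A table row such as “Experimental 3.5 1.08 50.0 10” would then follow directly from these four values.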
OVT: 8.1
Pilot site PUB-NCIT
Pilot language English
Operational Validation Topic The service meets one or more institutional objectives
Formative results with respect to validation indicator
Stakeholder type Results
Teaching manager The teaching manager agreed with this statement. “The system helps our students to access diverse learning materials while doing research”

OVT: 8.1
Pilot site UU
Pilot language English
Operational Validation Topic The service meets one or more institutional objectives
Formative results with respect to validation indicator
Stakeholder type Results
Teaching manager The teaching manager agreed with this statement. The system meets several institutional objectives: it assists learners in understanding how a domain is structured; it allows learners to learn from professionals; and it enables learners to easily identify high-quality learning materials.

OVT: 9.1
Pilot site PUB-NCIT
Pilot language English
Operational Validation Topic Users are motivated to keep using the system after the end of the validation activities (3)
Summative results with respect to validation indicator
Questionnaire type Questionnaire no. & statement Experimental / control group Mean Standard deviation %Agree / Strongly Agree n=
Learners KD35. I would like to use the knowledge discovery service after the pilot. Experimental 3.6 0.94 55 35
Learners KD36. If the knowledge discovery service is available after the pilot, I will definitely use it. Experimental 3.3 1.21 45 35
Learners SL34. I would like to use the social learning service after the pilot. Experimental 3.4 1.19 52 25
Learners SL35. If the social learning service is available after the pilot, I will definitely use it. Experimental 3.0 1.14 36 25
Tutors KD29. I would like to use the knowledge discovery service in my teaching after the pilot. Experimental 3.75 0.5 66 3
Tutors KD30. If the knowledge discovery service is available after the pilot, I will definitely use it in my teaching. Experimental 3.25 0.5 33 3
Tutors SL29. I would like to use the social learning service in my teaching after the pilot. Experimental 3.0 1.0 33 3
Tutors SL30. If the social learning service is available after the pilot, I will definitely use it in my teaching. Experimental 3.33 0.58 33 3
(3) At PUB-NCIT, no logging has been used to measure the use after the pilot.
Formative results with respect to validation indicator
Stakeholder type Results
Learners [Some students showed curiosity towards the application. The others prefer to use systems with which they are already familiar if the advantage of using the new one isn't overwhelming.]
Learner “I would like to use this tool (KD service) at the beginning of a course to see quickly which concepts are covered in this course.”
Learner “I would like to play with this (SL service) from time to time as an alternative to classical search but I can't give up the traditional search methods”
Tutors “The KD service is much easier to maintain (requires much less effort for me) and that is why it is much more likely for me to adopt it”

OVT: 9.1
Pilot site UU
Pilot language English
Operational Validation Topic Users are motivated to keep using the system after the end of the validation activities
Summative results with respect to validation indicator
Experimental results, with stakeholders involved and brief methodology
Stakeholders / methodology: (1) learners & tutors / questionnaire (2) students / logging usage statistics
Results: The usage of the system was logged both during and outside the validation sessions. The system was used by 49 students during the validation. 26 of these 49 students (53.1%) also opened the system outside the sessions, after the pilot had ended. In addition, 13 other students who did not participate in the experiment accessed the software.
These data seem to contradict the students' opinions in the questionnaires, in which only 20% indicated that they would be interested in using the system again.
Questionnaire type Questionnaire no. & statement Experimental / control group Mean Standard deviation %Agree / Strongly Agree n=
Learners KD35. I would like to use the knowledge discovery service after the pilot. Experimental 2.56 0.99 20 35
Learners KD36. If the knowledge discovery service is available after the pilot, I will definitely use it. Experimental 2.49 1.05 22 35
Learners KD39. I am motivated to keep using the knowledge discovery system as long as it is provided in WebCT. Experimental 2.76 0.96 26.7 35
Learners SL34. I would like to use the social learning service after the pilot. Experimental 2.68 0.88 19.4 35
Learners SL35. If the social learning service is available after the pilot, I will definitely use it. Experimental 2.62 0.85 16.1 35
Learners SL38. I am motivated to keep using the social learning system as long as it is provided in WebCT. Experimental 3.12 0.98 29.0 35
Tutors KD29. I would like to use the knowledge discovery service in my teaching after the pilot. Experimental 3.00 1.00 33.3 3
Tutors KD30. If the knowledge discovery service is available after the pilot, I will definitely use it in my teaching. Experimental 3.00 1.00 33.3 3
Tutors SL29. I would like to use the social learning service in my teaching after the pilot. Experimental 3.33 0.58 33.3 3
Tutors SL30. If the social learning service is available after the pilot, I will definitely use it in my teaching.
Experimental 3.0 0 0 3
Formative results with respect to validation indicator
Stakeholder type Results
Learners The interviews made learners think more about the usefulness of the service and one of the students – who filled in a negative answer in the questionnaire – remarked: “I think I'm going to miss this software in my other courses now I see the potential of it!”
Learner “The knowledge discovery service is useful for reflection. It enables me to check very quickly whether I understand all terms for my exam.”
Learner “The knowledge discovery service can help you to find an original topic for a paper. It now often is the case that everyone chooses the same topic, but I prefer to investigate a topic that is not very common.”
Learner “I would follow student X when I had the social learning service. He works hard and always provides good summaries to other students. He probably reads good articles.”
Tutor “It can constitute a useful support for students that want to know more, which are often neglected during class”
Tutor “A lot of time and energy should be invested in setting it up and this might be an obstacle in deciding to use it. It would take more time than preparing a standard course, but once you have set it up, it seems easier to maintain”

OVT: 9.2
Pilot site PUB-NCIT
Pilot language English
Operational Validation Topic A high score was obtained in the generic questionnaires (based on UTAUT: likelihood of adoption)
Summative results with respect to validation indicator
Experimental results, with stakeholders involved and brief methodology
Stakeholders / methodology: Generic questionnaire - learners
Results: Descriptive Statistics - Learners
N Mean Std. Deviation
Effectiveness 45 3.23 0.573
Efficiency 45 3.02 0.807
Cognitive Load 45 2.76 0.981
Usability 45 3.90 0.646
Satisfaction 45 3.39 0.870
Facilitating conditions 45 3.62 0.698
Self-Efficacy 45 3.39 0.810
Behavioural intention 45 3.28 1.069
PUB-NCIT 45 3.39 0.514
Valid N (listwise) 45

OVT: 9.2
Pilot site UU
Pilot language English
Operational Validation Topic A high score was obtained in the generic questionnaires (based on UTAUT: likelihood of adoption)
Summative results with respect to validation indicator
Experimental results, with stakeholders involved and brief methodology
Stakeholders / methodology: Generic questionnaire - learners
Results: Descriptive Statistics - Learners
N Mean Std. Deviation
Effectiveness 34 3.19 0.805
Efficiency 34 3.08 0.596
Cognitive load 34 2.65 1.098
Usability 34 3.45 0.629
Satisfaction 34 2.79 0.637
Facilitating conditions 34 3.35 0.568
Self-Efficacy 34 3.13 0.715
Behavioural intention 34 2.65 0.793
UU 34 3.09 0.545
Valid N (listwise) 34

OVT: 9.4
Pilot site PUB-NCIT
Pilot language English
Operational Validation Topic Learners find the information provided by the system in addition to the learning materials (e.g. titles, users, definitions) useful for the task being undertaken.
Summative results with respect to validation indicator
Questionnaire type Questionnaire no. & statement Experimental / control group Mean Standard deviation %Agree / Strongly Agree n=
Learners KD The information provided by the system in addition to the learning materials (e.g. titles, users, definitions) is useful. Experimental 3.8 0.91 65 20
Learners SL The information provided by the system in addition to the learning materials (e.g. titles, users, definitions) is useful.
Experimental 3.2 0.87 48 25
Formative results with respect to validation indicator
Stakeholder type Results
Learners [Several of the learners appreciated the definitions provided by the knowledge discovery service]
Learners [Several of the learners didn't like the tags provided by the learning service. They said they would have preferred some snippets of what the links refer to]

OVT: 9.4
Pilot site UU
Pilot language English
Operational Validation Topic Learners find the information provided by the system in addition to the learning materials (e.g. titles, users, definitions) useful for the task being undertaken.
Summative results with respect to validation indicator
Questionnaire type Questionnaire no. & statement Experimental / control group Mean Standard deviation %Agree / Strongly Agree n=
Learners KD37a. The definitions provided by the knowledge discovery service in addition to the learning materials are useful. Experimental 3.21 0.81 35.3 35
Learners KD37b. The tags provided by the knowledge discovery service in addition to the learning materials are useful. Experimental 3.29 0.84 38.2 35
Learners KD37c. The document titles provided by the knowledge discovery service in addition to the learning materials are useful. Experimental 3.26 0.67 35.3 35
Learners SL36a. The users provided by the system in addition to the learning materials are useful. Experimental 3.06 0.7 27.3 35
Learners SL36b. The tags provided by the system in addition to the learning materials are useful. Experimental 3.06 0.81 29.4 35
Formative results with respect to validation indicator
Stakeholder type Results
Learners Both tags and titles are relevant: “tags give information about the topics covered in the document and eventually about the type of document, while a title directly shows what the page is about.”

OVT: 9.5
Pilot site PUB-NCIT
Pilot language English
Operational Validation Topic Learners perceive that they can find learning materials more quickly compared to traditional means.
Summative results with respect to validation indicator
Questionnaire type Questionnaire no. & statement Experimental / control group Mean Standard deviation %Agree / Strongly Agree n=
Learners KD38. With the Knowledge Discovery service, I can find learning materials more quickly than with Google. Experimental 2.4 1.19 20 20
Learners SL37. With the Social Learning service, I can find learning materials more quickly than with Google. Experimental 2.3 0.85 8 25
Formative results with respect to validation indicator
Stakeholder type Results
Learners Sometimes the software (Social search) is slow and answers the queries slowly
Learners [It's easier to get results with a software that you're already accustomed to]

OVT: 9.5
Pilot site UU
Pilot language English
Operational Validation Topic Learners perceive that they can find learning materials more quickly compared to traditional means.
Summative results with respect to validation indicator
Questionnaire type Questionnaire no. & statement Experimental / control group Mean Standard deviation %Agree / Strongly Agree n=
Learners KD38. With the Knowledge Discovery service, I can find learning materials more quickly than with Google. Experimental 2.59 0.93 14.7 35
Learners SL37. With the Social Learning service, I can find learning materials more quickly than with Google. Experimental 2.65 0.81 8.8 35
Formative results with respect to validation indicator
Stakeholder type Results
Learners [For the social learning service, the selection process may be shorter when the tutor uploads good documents. In such cases it can be faster than Google.]
Learners “The time it takes to find the software in WebCT is already more than the time it takes to complete a search request with Google”

Section 4: Results – validation activities informing future changes / enhancements to the system

VALIDATION ACTIVITY
Partner(s) involved: PUB-NCIT
Service language: English
Additional formative results (not associated with validation topics)
Alpha testing
Major changes identified during alpha testing but not yet implemented are:
The path finding service, which shows how a person is related to the user, was dropped as it generated too many requests and made the server too slow.
Tutor interviews
Tutors would like to be able to edit the ontology to add new concepts that they believe are interesting for the students
Tutors would like to be able to add content directly in the visualisation window (like links with resources for a given concept)
Tutors argued that sometimes the search in the SL returns items completely unrelated to the domain (4). Maybe those results should be omitted
(4) Explanation: if there are few documents related and all belong to one person, it also returns other documents belonging to that person.
Learner focus group 1
KD service
Further explanations regarding the relations between concepts would be nice
A richer variety of relations is needed (e.g. JQuery is-library-for Javascript)
One person suggested introducing some time limit between two clicks, before the user can go to the next concept.
The visualisation should allow filtering by difficulty – for example a beginner doesn't need to see terms more suited to advanced learners from the beginning
Eliminate the “grey” results completely as they cause confusion. (Grey results are considered to be ambiguous by the Knowledge Discovery disambiguation service)
SL service
Facebook integration was strongly suggested, as it is the social network on which users spend most of their time
The system (especially the SL service) was criticised as too slow when everybody in the class uses it.
The integration with Moodle should be improved
The links provided do not offer sufficient information to understand whether a link is worth visiting or not. Some students suggested eliminating the tag list (very similar for all items returned by a query) and replacing it with some text from the page
A problem detected and repaired was that the search was case-sensitive. This wasn't discovered in the beta-testing but with real users.
General
Learners would prefer to see the context in which the relevant concept is used in the document.
Learner focus group 2 (prioritisation of enhancements) (5)
Learners judged that the five most important areas for enhancement of the iFLSS (i.e. clusters) are:
Performance under heavy use (SL)
User interface (SL)
Integration with the PUB-NCIT learning environment (both SL and KD)
Support for new web sites (Facebook) (SL)
Ranking of the learning materials (both SL and KD)
Learners judged that the five most important single improvements that should be made to the system are:
1. More information displayed on the visualisation (see results from focus group 1) (KD)
2. Replacing tags in the social search with snippets showing what the link is about (SL)
3. Integrate the widgets in Moodle and not link to external sites (SL)
4. Search faster (SL)
5. Add Facebook as a source of information (SL)
(5) The iFLSS uses tags instead of the document text to search for relevant documents. This makes the inclusion of snippets problematic.
Teaching manager interview
The teaching manager considered the applications interesting and useful.
The results returned by the search were considered to be good
The visualisation was considered to be useful
User interface on KD and SL could be improved (access keys, more content offered for search)
Add some help information
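The case-sensitivity problem reported above is typically repaired by normalising both the stored tags and the incoming query before comparison. The sketch below assumes a simplified, hypothetical bookmark structure and example data; it is not the actual iFLSS code:

```python
# Simplified sketch of a case-insensitive tag search. The bookmark
# structure and example data are hypothetical; the real service searches
# tags on the social platforms, not an in-memory list.
def search_by_tag(bookmarks, query):
    """Return bookmarks whose tags contain the query, ignoring case."""
    q = query.strip().casefold()
    return [b for b in bookmarks if any(t.casefold() == q for t in b["tags"])]

bookmarks = [
    {"url": "http://example.org/js-intro", "tags": ["JavaScript", "tutorial"]},
    {"url": "http://example.org/css-layout", "tags": ["CSS", "layout"]},
]
hits = search_by_tag(bookmarks, "JAVASCRIPT")  # matches the "JavaScript" tag
```

With this normalisation, “JavaScript”, “javascript” and “JAVASCRIPT” all return the same results, which is the behaviour real users expected during the pilot.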
VALIDATION ACTIVITY
Pilot partner: UU
Service language: English
Additional formative results (not associated with validation topics)
Alpha testing
Major changes identified during alpha testing but not yet implemented are:
Showing the relation path (i.e. how is a person related to me?) in the social learning service was too slow, therefore we left this feature out of the validation
Beta testing
Tutor interviews
Labels in the ontology sometimes overlap
Improved support for tutors would be appreciated:
option to mark course concepts in a different colour to prevent information overload
possibility for tutors to change the ontology
option to add comments to social network resources (e.g. tutor's opinion about a resource)
Learner focus group 1
Knowledge discovery service
The problem of information overload needs to be addressed. There is too much information in the knowledge discovery service for uncertain beginners. The knowledge discovery service should make clear what the students have to know and what is considered optional. Note: if this feature were adopted, a distinction between using the software in formal and informal learning contexts would be necessary.
Learners would like to have clear information about how the terms are related.
It depends on the learning style of a learner whether or not the knowledge discovery service is useful.
Visually overlapping terms in the knowledge discovery service should be avoided.
Social learning service
The interface of the social learning service needs to be improved. For example, the students would like to have snippets like Google, and more information about relevance within the course.
In addition to showing who bookmarked a learning material, it would also be relevant to know who created it.
In the validation context, the network of an LTfLL member was used, which influenced the results.
It is relevant for learners to indicate clearly the status of a person in the network and how this person fits in their study-related social network.
Improve ranking of results.
General
The task determines which types of learning materials are relevant. For example, you don't want to get a video when you are solving a task, since you do not have the time to watch it. It would be better to have a document in this case, but a video might be more useful when you study.
Add snippets to the search results.
Offer the system via a general link instead of the course website to allow quicker access (e.g. http://mycourse.iFLSS.com)
Learner focus group 2 (prioritisation of enhancements)
Learners judged that the five most important areas for enhancement of the system (i.e. clusters) are:
1. Interface (especially SL)
2. Link system better to course (i.e. better integration of the system in the course, instead of providing it as a general system) (both)
3. Accessibility of the system (both)
4. Ranking (both)
5. Determine language of documents (both)
Learners judged that the five most important single improvements that should be made to the system are:
1. Concepts in different colours on the basis of topics covered in the course (KD)
2. Add more relations in the ontology (KD)
3. Make the system easier to access, not via WebCT (i.e. a general link like http://mycourse.iFLSS.com) (both)
4. Add snippets to search results (both)
Teaching manager interview
Both services can improve the learning process and the New Media institute (note: we interviewed the TM from this institute) would be willing to try them if they can be easily adapted to other domains
The knowledge discovery service is ready to be used
The social learning service should be improved, especially with respect to the interface
Since both services offer different advantages and opportunities, it is not considered necessary to offer an integrated system, although it would be nice to offer institutions the possibility to combine the two components into one system.

Section 5: Results – validation activities informing transferability, exploitation and barriers to adoption

VALIDATION ACTIVITY
Partner(s) involved: PUB-NCIT
Service language: English
Additional formative results (not associated with validation topics)
Beta testing
Moved the services to an Amazon server because of performance problems. The transfer of the services took about 4 hours because there were a lot of additional software packages that had to be installed (mostly Python packages). This should be solved by generating an executable containing all the packages. It might be interesting to assemble all the LTfLL services on an Amazon machine so that anyone can start a fully installed server in seconds.
Content has been added to the social network sites
Tutor interviews
Tutors didn't believe in the widgetized approach. One of the tutors stated that it could be more difficult for the student to find information in a lot of different widgets than in one integrated application (window).
The tutorial should also contain some advice on how to use the social networking sites more efficiently – e.g. browser extensions, tips and tricks on how to tag based on the learner's searching habits.
Some tutors believe that their colleagues who teach subjects that are not related to web 2.0 wouldn't use social networking sites for learning purposes.
Learner focus group 1
The system ran too slowly when everybody was using it
The links are good only when the teaching assistants invest time – that probably won't happen in all the courses
The service is not so useful for problem-solving tasks
It is easier to obtain results from software with which the learners are familiar
Teaching manager interview
The teaching manager argued that for the SL service, it would be difficult to convince the teaching assistants to work on social networking sites, and that for the KD service the resources returned are not validated by a tutor.
The Teaching Manager was open to considering further integration with Moodle. However, to enable a deeper integration of the iFLSS into Moodle, this should be thoroughly tested by the person responsible for the Moodle platform.
The Teaching Manager expressed his concerns regarding the effort required from tutors in setting up their networks for the Social Learning service, and concluded that the Knowledge Discovery service was more likely to be adopted.
Other (please specify)
During verification, it was found that tutor networks are sometimes too small to return five people relevant to a topic.

VALIDATION ACTIVITY
Partner(s) involved: UU
Service language: English
Additional formative results (not associated with validation topics)
Beta testing
The social network content and contacts had to be adapted to the course
Some missing concepts covered in the course had to be included in the ontology
Tutor interviews
Some tutors are negative about disclosing personal information to students, but they suggested as an alternative that it would be acceptable if one could choose which resources and messages are (not) shared with students.
It is not problematic to disclose your network (i.e. your friends) to students; it might on the other hand have advantages, since they might come across interesting people. However, there might be differences in this respect depending on which generation you belong to.
Learner focus group 1
The usefulness of the social learning service strongly depends on the teachers.
It is interesting to follow clever students, while less interesting to follow friends.
Only useful in courses in which one has to search for new material, while often the information is already provided by the tutors (note: they were all 1st-year students).
Students in general are not willing to try new systems, since they are quite satisfied with Google's results. The impact of using new software is considered a barrier for learners, who are all experienced Google users. Google works fine and the learners are satisfied with it, so they do not see the need to use other software.
Another barrier is the fact that the software is integrated in WebCT, which involves additional steps to find the software, whereas Google is immediately accessible at any time.
Teaching manager interview
The teaching manager of the New Media institute said that they are eager to try new systems and would be ready to adopt them as long as tutors can control the software themselves. However, teachers from other institutes believe that this is not the case at the faculty level, since the system is too innovative and there is no funding that can be invested to improve the performance.

Transferability questionnaire: Institutional policies and practices
The usage of social platforms by students is necessary for the Social Learning component and should be allowed and recommended by the institution. Alternatively, the social search service can be deployed with the option to allow students to search within the social network of their tutor without having to have social networking accounts themselves. This may be an effective compromise, as they still gain access to high-quality learning material found in the social network of their tutor.
The negative side-effect, however, is that additional personalisation of the content is not possible. This is the setup that was pioneered during the validation in order to reduce the requirements for adoption of the software by students. The use of social networking tools should be encouraged for resource sharing between colleagues in the workplace. Tutors indicated that they would see great value in social networking with colleagues, provided that separate work-only accounts are facilitated and critical mass is obtained.

Transferability questionnaire: Relevance of the service in other pedagogic settings

Pedagogic settings for which the service would be suitable:
- Social Search: directed learning, social learning, student projects, work-based learning.
- Semantic Search: directed learning, self-directed learning, problem-based learning, essay writing.

Reason(s): The service can be used in any environment that fulfils the following characteristics: the student wants to understand a new domain; there is some kind of supervision or direction from a tutor; the tutor doesn't offer full-time tutoring and doesn't have time to recommend study materials; both tutor and learner use social platforms.
- In self-directed learning (SDL), the service can be used to learn the most important concepts and relations within a domain on your own.
- In directed learning (DL), the service can be used as an additional learning resource to identify how a concept is related to other concepts.
- In problem-based learning (PBL), the service can assist students while collaboratively solving problems by providing the expert view on a domain.
- For essay writing, the service can be used to get oriented and to find important or novel concepts and documents.
- The service can be used for reflection and to prepare for exams by providing a clear overview of course subjects in which the students can check their proficiency.
The tutor can employ the tool to give the students the proper context and interrelations of the course subjects as they are taught each week.

Pedagogic settings for which the service would be less suitable:
- Social Search, PBL: problem-based learning is not likely to be helped by the social component, as it is not very likely for a peer to run into the same very specific problem that you run into.
- Social and Semantic Search, revising for exams: the tools help you find new materials, but do not offer summaries or revision features.
- The service is less suited for students who are very insecure and get confused when confronted with non-linear approaches to learning.
- Increased time pressure, for example with a large number of mandatory assignments to be completed during the course, will force students to stick to conventional means of acquiring information. Just opening a new system and getting to know it is too much effort compared to conventional means and not worth it, even if there is a long-term pay-off in using it. This problem even extends to solutions properly embedded within the institute's LMS, because the login procedure itself is already an additional barrier to adoption.

Transferability questionnaire: Relevance of the service in other domains

Types of domain for which the service would be suitable: learning domains. The services can be used in any learning domain (e.g. mathematics, biology, linguistics), as long as there is an ontology covering the knowledge of that specific domain and the learner has a network in which resources on this domain are contained. So, any restrictions are not related to the service itself.

Types of domain for which the service would be less suitable: practical assignments.
When using the knowledge discovery tool, the learning process itself should, however, not have a strong emphasis on practical assignments with little attention to the theoretical background behind the task.

Section 6: Conclusions

Validation Topics
(Each operational validation topic (OVT) was judged as validated unconditionally, validated with qualifications, or not validated; the partners and any qualifications to validation are listed per topic.)

PVT1: Verification of accuracy of NLP tools
- OVT1.1 The knowledge discovery system provides a high proportion of relevant learning materials that match the search topic (knowledge discovery) — UU.
- OVT1.1 The social learning search provides a high proportion of relevant learning materials that match the search topic — PUB-NCIT. Qualification: assumes the tutor has added content to the social networking sites; the search is also sensitive to the use of different spelling variants.
- OVT1.2 The social network service suggests a high proportion of people relevant to the search topic — PUB-NCIT. Qualification: assumes the tutor is connected to people who post content in the domains searched by the students.
- OVT1.3 The average learner's social network has enough people in it who can help him — PUB-NCIT. Qualification: more likely if the learner is connected to the tutor; depends on the tutor being connected to people who post content in the domains searched by the students.

PVT2: Tutor efficiency
- OVT2.1 Tutors have to spend less time finding relevant learning materials and helping the learner to identify related concepts — UU; PUB-NCIT. Qualifications: tutors often know 'too much' already. The coverage of the social resource search in particular is considered too low, but this depends on the efforts of the tutor. High set-up costs influence the opinion of tutors. Tutors want more influence on the ontology.
- OVT2.2 There is less cognitive load for the tutor to help the learners to find relevant learning materials and to identify related concepts — UU; PUB-NCIT. Qualifications: the system itself is considered not cognitively demanding; however, it is hard to beat Google, which everyone is familiar with. Again, the cognitive load for identifying concepts is not high, but the system does not outperform Google. This may also depend on the course concepts.

PVT3: Quality and consistency of (semi-)automatic feedback or information returned by the system
- OVT3.1 The learners judge the learning materials provided by the system as being relevant for their learning task — PUB-NCIT; UU. Qualifications: results are considered equally relevant as Google results, but the system does not outperform Google. The system is considered less useful for problem-solving tasks.
- OVT3.2 The learners judge the people proposed by the social network service as being relevant — PUB-NCIT; UU. Qualifications: tested with tutors' networks in validation, still to be tested with students' own networks. Feedback to the user about the relevance of people should be improved.
- OVT3.3 Learners trust the retrieved learning materials more than those found by traditional means — UU; PUB-NCIT. Qualifications: different results at UU and PUB-NCIT. The social learning system did not expose the trust dimension enough. A small group of early adopters appreciated and understood the idea; probably these are the ones who are more familiar with the use of social networks.

PVT4: Making the educational process transparent
- OVT4.1 Learners can independently identify gaps in their knowledge in a given domain and learn how concepts are related to each other.
- OVT4.2 The visual representation of the domain helps learners to understand the domain better compared to Google.
UU; PUB-NCIT. Qualifications: information overload is a problem: there is one big gap, which makes it difficult for beginners; the service should be tested with learners who already have more knowledge about the domain. UU: the visual representation helped (OVT4.1 and 4.3), but the system does not outperform Google when looking at the means. However, at both institutions, especially at PUB-NCIT, there is still a considerable group of positive respondents saying that the system does outperform Google. The opinions of the learners also seem to be related to some extent to their learning styles; testing with groups of people having different learning styles could point out whether this is indeed the case.
- OVT4.3 The visual representation of the domain helps learners to understand the domain better than they would have without this visualisation — PUB-NCIT; UU. Qualifications: students would have preferred a longer usage of the service in different domains to give them more insight into the usefulness of the visualisation. After discussing the visualisation in the focus groups (UU), students could more clearly see the benefits of the visualisation while learning domains they are interested in. PUB-NCIT: the UU learners do not have experience with self-directed learning; the service should be tested with other learners or informal learning professionals. The PUB-NCIT learners are weakly positive, especially about the use of the knowledge discovery service for self-directed learning.
PVT5: Quality of educational output

PVT6: Motivation for learning
- OVT6.1 The learners perceive that the iFLSS supports more self-directed learning compared to traditional means.

PVT7: Organisational efficiency
- OVT7.1 There is a saving in institutional resources overall. The set-up costs may be relatively high in the beginning, but in the long term there will be a saving in resources (UU). The PUB-NCIT teaching manager agreed that the software might bring gains in the time the professors spend creating teaching materials. However, there is not sufficient evidence that there will be a saving in institutional resources; more research is needed to draw conclusions in this respect.

PVT8: Relevance
- OVT8.1 The service meets one or more institutional objectives — PUB-NCIT; UU. The system meets several institutional objectives; the most important one according to UU and PUB-NCIT is that it assists students to easily access diverse learning materials when doing research.

PVT9: Likelihood of adoption
- OVT9.1 Users were motivated to continue to use the system after the end of the formal validation activities (the system remains available until the end of the course) — PUB-NCIT; UU. Qualifications: logging results for students at UU are quite positive, while they were less positive on the questionnaire. Learners had problems generalising to other domains/courses and only thought of the setting they experienced in the course. Tutors were generally more positive.
- OVT9.2 A high score was obtained in the generic questionnaires (based on UTAUT: likelihood of adoption by users).
- OVT9.3 Tutors attending a dissemination workshop give high scores to the question 'How likely are you to consider adopting the service in your own educational practice?'
- OVT9.2 — PUB-NCIT; UU. Qualification: the many neutral opinions can have several reasons; a solution could be to test this point using a scale with an even number of response options to force users to choose.
- OVT9.3 — result pending (the tutor workshop takes place at the end of February).
- OVT9.4 Learners find the information provided by the system in addition to the learning materials (e.g. titles, users, definitions) useful for the task being undertaken — PUB-NCIT; UU.
- OVT9.5 Learners perceive that they can find learning materials more quickly compared to traditional means — PUB-NCIT; UU. Qualification: using the system takes more time, because it is new and embedded in the course environment.

Exploitation (SWOT Analysis)
The objective you are asked to consider is: "The iFLSS (v1.5) will be adopted in pedagogic contexts beyond the end of the project."

Strengths
The strengths of the iFLSS (v1.5) that would be positive indicators for adoption are:
- Innovative functionalities: graph visualisation of a domain, the user search and the social resource search.
- The system enhances the learning experience: it offers the learner trusted materials and provides an overview of a domain.
- The time needed to maintain courses after the set-up phase is low.
- Willingness to use the system: the iFLSS was accessed after the pilot by more than 50% of the Dutch students that used it in the pilot (we only have UU figures on this).
- Usability.

Weaknesses
The weaknesses of the iFLSS (v1.5) that would be negative indicators for adoption are:
- The added value of the iFLSS compared to Google is not clear to all users.
- Set-up costs: effort is required to set up the ontology for a new domain and to find good-quality resources and contacts.
- System feedback: assessing trust and quality is difficult for documents from 'friends of a friend'.
- System performance: part of the system is at present too slow to deal with large groups of learners at the same time.
Opportunities
The iFLSS (v1.5) has potential as follows:
- Social media are actively employed by students. The iFLSS allows the learner to profit from the knowledge in their network and adds an educational dimension to the use of social media.
- The system supports several aspects of the learning process (reflection, knowledge discovery, identifying relevant learning materials, finding people).
- Additional support for good students: the iFLSS provides easy access to more quality-approved documents than the standard course materials.
- It allows tutors to search for learning materials for their students in the networks of colleagues and fellow researchers.
- Learners have access to each other's bookmarks and can easily see which articles 'good' students use.
- The chances of adoption of the iFLSS would improve if the system were part of the institutional LMS.

Threats
The iFLSS (v1.5) faces the following threats:
- Google is the standard: students are not willing to use other systems with overlapping functionalities.
- Conservative attitude: tutors may not be ready to integrate their social networking activities in their teaching, which is necessary for the iFLSS to succeed.
- Control of information: tutors and the teaching manager want to control the information provided to their learners, which is contrary to the philosophy of social learning as adopted in the iFLSS.
- Privacy issues: tutors' contacts may not be willing to publicise their assets.
- New developments: since the beginning of the project, Facebook has become much more dominant in social networking, whereas the iFLSS does not currently interoperate with Facebook.
- Problem-solving support: a common learning context in which students search for non-course materials is the problem-solving context. The iFLSS does not pay attention to this particular learning situation.
Overall conclusion regarding the likelihood of adoption of the iFLSS Version 1.5:
The iFLSS supports the learning process by offering two innovative services that do not yet exist in current learning management systems. More specifically, the system has made important contributions towards (1) integrating social search, social media content and social networks in a learning environment and (2) offering learners a visual overview of a domain through which they have access to socially relevant documents. These innovative aspects of the software are highly regarded by a small group of users, but cause difficulties of comprehension for others, who were not able to see the added value of the iFLSS over existing software. This lack of awareness negatively influences the likelihood of adoption at this moment. However, we believe that as time goes by the use of social networks in learning will become more familiar to users and, as a consequence, the likelihood of adoption of the iFLSS will increase. In addition, users are normally conservative, and it takes a long period of time before they will switch from a known system to an unknown one.
Most important actions to promote adoption of the iFLSS:
- Functional: include interoperability with Facebook.
- Functional: investigate possibilities for support in problem solving.
- System set-up: provide an explanation of how to include new ontologies.
- System set-up: provide an executable file for transferring the service to new servers.
- Usability: work on improved feedback to make it easier to assess the trust and quality of results from the social resource search.
- Usability: improve the scalability of certain parts of the iFLSS.
- User group: (1) investigate whether the service is better directed towards self-directed learners (researchers, tutors, mature adults) rather than young undergraduates, and (2) test the system with a group of early adopters.
- Support: describe on the project website the advantages that the iFLSS has for learning compared to existing systems.

Section 7 – Road map
Based on the results and conclusions from validation, the LTfLL team has agreed that the following are the most important future enhancements to the system in order to meet stakeholder requirements:

Most important:
1. Include interoperability with Facebook and investigate the best use of Facebook with respect to learning. Facebook has evolved rapidly in the last years and is still growing: there are 500 million active users, each of them having 130 friends on average. Making the system available through Facebook would lower the threshold for learners.
2. Improve the user interface, especially for Social Learning. The learners had difficulty assessing the relevance and difficulty of learning materials on the basis of the current user interface. System feedback should be improved to tackle this issue.
3. Improve performance under heavy use, especially for Social Learning. The iFLSS is a prototype and some of its components had performance problems when dealing with a large number of users at the same time.
To make the system useful, this problem needs to be solved, since learners take Google as the standard and are not willing to wait for their results much longer than they have to when using Google.
4. Implement improved support for tutors. The ontology is currently not tailored to a specific course or subdomain, but shows the complete domain, without making a distinction between course and non-course concepts. Even though the coverage of the ontology is high, there might be concepts missing. Tutors would like to be able to adapt the ontology to their own needs; in the validation they asked for a way to distinguish between course and non-course concepts and wanted to know how to find and include their own ontology.
5. Develop a personalisation mechanism to improve the ranking of results. The ranking of the learning materials was often considered inappropriate. One way to improve the ranking would be to take the profile of the learner into account, allowing learners to search for documents at their own level.

Other:
- Link the system better to the course (i.e. better integration of the system in the course, instead of providing it as a general system).
- Enable learners to interact with the system (e.g. adding/adapting content, offering feedback).

Based on the results and conclusions from validation, the LTfLL team has agreed that the following are the most important changes to the current scenario(s) of use in order to meet stakeholder requirements:

Most important:
1. The system is currently less appropriate for courses where self-directed learning is not required, e.g. courses where tutors want to control the material presented to the learner.

Based on the results and conclusions from validation, the LTfLL team has agreed that the following are possible additional educational contexts for future deployment:

Most important:
1. Courses where the development of independent / self-directed learning is an intended learning outcome.
The iFLSS is useful in this setting, since it offers learners access to information and documents that go beyond the standard course materials.
2. Situations in which tutors wish to find learning materials for their courses from trusted sources. The tutors can easily access learning materials from fellow tutors and researchers who teach comparable courses at other institutions.
3. Courses where learners are encouraged to find materials from people more expert than themselves. For example, learners could follow 'good' students to see which learning materials these students use.
4. The ontology fragment provides learners with a concise overview of a domain. It is considered useful for reflection purposes: do I know what I should know, and which topics should I look at?
5. Exploit the power of successful social media sites such as Facebook to promote use.

Based on the results and conclusions from validation, the LTfLL team has agreed that the following are the most important issues for future technical research to enable deployment of language technologies in educational contexts:

Most important:
1. Language technologies need to take the learner context (i.e. the learner's level of conceptual development on a topic) into account in order to determine the appropriateness of new resources. Research in this direction should look at the possibility of developing (1) a personalised difficulty estimator for new resources, triggered by specific concepts and/or tags, and (2) an improved method for ranking results on the basis of the learner's knowledge.
2. Integration of conventional, ontology-based and social searches.
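To illustrate the kind of profile-aware ranking proposed above, the following sketch blends a document's query relevance with a rough estimate of how well it fits the learner's current knowledge. This is a minimal illustration only: the LTfLL services do not expose such an API, and all names here (`Document`, `LearnerProfile`, `rank_for_learner`) and the specific scoring heuristic are hypothetical.

```python
# Hypothetical sketch of profile-aware ranking; not the iFLSS implementation.
from dataclasses import dataclass, field

@dataclass
class Document:
    title: str
    base_relevance: float                  # relevance to the query, 0..1
    concepts: set[str] = field(default_factory=set)

@dataclass
class LearnerProfile:
    known_concepts: set[str]               # concepts the learner already masters

def rank_for_learner(docs, profile, alpha=0.7):
    """Blend query relevance with profile fit.

    Assumption: a document whose concepts partly overlap with what the
    learner already knows is at roughly the right level; full overlap
    adds nothing new, zero overlap is probably too hard.
    """
    def profile_fit(doc):
        if not doc.concepts:
            return 0.0
        overlap = len(doc.concepts & profile.known_concepts) / len(doc.concepts)
        # peak at ~50% overlap: partly familiar, partly new
        return 1.0 - abs(overlap - 0.5) * 2.0

    scored = [(alpha * d.base_relevance + (1 - alpha) * profile_fit(d), d)
              for d in docs]
    return [d for _, d in sorted(scored, key=lambda p: p[0], reverse=True)]
```

With this weighting, a moderately relevant document pitched at the learner's level can outrank a highly relevant one that is either entirely familiar or entirely new, which is the behaviour the roadmap item asks for; a real implementation would of course need a learner model derived from actual interaction data rather than a declared concept set.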
Roadmap – validation activities

Further validation planned for beyond the end of the project: Testing in an informal learning environment
Objective (OVT): Assessing whether knowledge discovery supports independent knowledge acquisition in informal learning contexts.
Methodology: Validating the software in an informal learning context.

Further validation planned for beyond the end of the project: Using students' own networks
Claim (OVT): Results are considered more trustworthy when the users know the people who have bookmarked / uploaded them.
Methodology: Identify a group of people who are actively using social networks and validate the system with them.

Further validation planned for beyond the end of the project: Testing with groups of people having different learning styles
Claim (OVT): Learners with visual learning styles appreciate the visualisation of the knowledge discovery service more than non-visually oriented learners.
Methodology: Use an existing questionnaire to determine the learning style of learners, such as http://www.engr.ncsu.edu/learningstyles/ilsweb.html, and investigate whether there is a relation between the learning style and the learners' opinions about the software.

Appendix B.8 Validation Reporting Template for Long Thread (OUNL, AURUS & PUB-NCIT)

Section 1: Functionality implemented in Version 1.5 and alpha / beta-testing record

Brief description of functionality and changes from Version 1.0:
- LongThread v1.0: based on the existing LTfLL services, the data transfer has been implemented.
- PenSum v1.5: changed to an English version.
- Conspect v1.5: changed the domain to IT.

Alpha-testing
Pilot site and language: OU, English
Date of completion of alpha testing: 2 February 2011
Who performed the alpha testing?
Katja Bülow, Debra Harris

Beta-testing
Pilot site and language: PUB-NCIT, English
Has stakeholder validation taken place using the service embedded in Elgg (Yes/No/Partially): Yes (the services have been embedded in Elgg)
Beta-testing performed by: Traian Rebedea
Beta-testing environment (stand-alone service / integrated into Elgg): integrated into Elgg
HANDOVER DATE: 3 February 2011

Pilot site and language: OUNL, English
Has stakeholder validation taken place using the service embedded in Elgg (Yes/No/Partially): Yes (the services have been embedded in Elgg)
Beta-testing performed by: Slavi Stoyanov, Adriana Berlanga
Beta-testing environment (stand-alone service / integrated into Elgg): integrated into Elgg
HANDOVER DATE: 3 February 2011

Section 2: Validation Pilot Overview
NB Information about pilot sites, courses and participants has been transferred to Appendix A.3.

Pilot task – learner pilot
Pilot site: PUB-NCIT
Pilot language: English, Romanian

What is the pilot task for learners and how do they interact with the system?
The long thread combines four different LTfLL services, and the pilots were run as workshops because it was not practical to run full pilots, for the following reasons: (1) a real task covering the full workflow would take more time than the learners could provide; (2) it would require advance training in the individual services and their combinations; and (3) it would require substantial input from the learners at each step. The learners were asked to follow exactly the process flow designed and to use predefined data to investigate the threading mechanism and to explore the different services in more depth. The task has been specified in the following scenario: "…learners search for and receive relevant learning materials through the iFLSS service.
Then learners can study them and write a synthesis, to indicate the extent to which they have understood the content (PenSum). The synthesis is an input for CONSPECT, which automatically detects the relationships between concepts and presents them in a concept map format. The workflow of the long thread includes additional support from the iFLSS, which provides resources collected from social media. Finally, the teacher can determine topics for a group chat in PolyCAFe, as the discussion can be analysed to identify important issues and the level of participation of the learners."

The main objective for this validation was: measuring the pedagogic effectiveness and efficiency of the long thread.

What do the learners produce as outputs?
Their opinions about the threading approach, a quality check (formative validation) on the Long Thread using our LT questionnaire, and ideas for changing the software as well as proposals for additional threads. The focus was mainly on formative (qualitative) data, but descriptive quantitative data from the questionnaire was also collected.

How long does the pilot task last, from the tutors starting the task to their final involvement with the software?
Only part of one day: to be introduced, to do the hands-on exercises and to provide feedback.

How do tutors/student facilitators interact with the learners and the system?
The learners followed the different steps in the thread to give them a concrete feeling for the Long Thread. Specific groups of four learners were instructed to go into more detail for the service attributed to their group. There they used the service (iFLSS to explore the resources, PenSum to write/analyse a synthesis, CONSPECT to analyse the concepts covered in the synthesis and PolyCAFe to discuss the different services) to do their subtasks.
Describe any manual intervention of the LTfLL team in the pilot:
The data input (including the output from a preceding service in the workflow) had been predefined to enable the groups to explore steps located later in the flow.

Pilot task – teachers/TEL experts
Pilot site: OUNL
Pilot language: English

What is the pilot task for teachers/experts and how do they interact with the system?
Two events were conducted at the OUNL.

Event 1: A focus group with tutors/teachers was organised to discuss the concept of threading and the long thread. They did not interact with the LTfLL services or the long thread; the information required for a proof of concept was provided by the LTfLL team in a presentation. The presentation gave a very concise description of the functionality of the LTfLL services and the concept of threading, explained the Long Thread as a possible educational use for such threading, and finally set out the aims of the focus group and the method to be used. After the introduction, the participants were asked to give comments and to generate ideas about possible benefits, weaknesses and obstacles to the adoption of long threads, and secondly to fill in a questionnaire. The main objectives for this validation event were: (a) identifying benefits, weaknesses and obstacles to the adoption of the long thread concept, and (b) measuring the pedagogic effectiveness and efficiency of the long thread.

Event 2: A walkthrough and focus group with technology-enhanced learning experts was organised to get them acquainted with the Long Thread. They used the thread description and instructions on the LTfLL server to gain hands-on experience, working with preloaded data. The task was specified in the same scenario as for the learners at PUB-NCIT: "…learners search for and receive relevant learning materials through the iFLSS service.
Then learners can study them and write a synthesis, to indicate the extent to which they have understood the content (PenSum). The synthesis is an input for CONSPECT, which automatically detects the relationships between concepts and presents them in a concept map format. The workflow of the long thread includes additional support from the iFLSS, which provides resources collected from social media. Finally, the teacher can determine topics for a group chat in PolyCAFe, as the discussion can be analysed to identify important issues and the level of participation of the learners."

The main objectives for this validation event were: (b) measuring the pedagogic effectiveness and efficiency of the long thread, and (c) investigating possible improvements to the current version of the long thread and informing the LTfLL roadmap.

What do the participants produce as outputs?
Their opinions about the threading approach, a quality check (formative validation) on the Long Thread using our LT questionnaire, and ideas for changing the software as well as proposals for additional threads. A further clustering of the comments from event 1 was conducted by the Long Thread validation team. The focus was mainly on formative (qualitative) data, but a quantitative analysis was also conducted (descriptive statistics and cluster analysis).

How long does the pilot task last, from the tutors starting the task to their final involvement with the software?
2½ hours: to be introduced, to do the hands-on exercises and to provide feedback.

How do tutors/TEL experts interact with the learners and the system?
For event 1, the tutors did not interact with the Long Thread and the services. For event 2, the TEL experts followed the flow of the Long Thread.

Describe any manual intervention of the LTfLL team in the pilot:
The data input (including the output from preceding services) had been preloaded to enable the demonstration (event 1) and the walkthrough (event 2).
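The essence of the Long Thread described in the scenario above is step-wise data transfer: each service consumes the output of the preceding one, and in the pilots the input to later steps could be predefined so that groups did not have to run the earlier steps themselves. The sketch below illustrates that mechanism only; the real LTfLL services (iFLSS, PenSum, CONSPECT, PolyCAFe) are web services, and the step functions here are hypothetical stand-ins, not their actual interfaces.

```python
# Illustrative sketch of step-wise data transfer in a service thread.
# The service functions below are hypothetical stand-ins for the
# LTfLL services, not their real APIs.

def run_thread(steps, initial_input, preloaded=None):
    """Run a thread of services, each consuming the previous output.

    `preloaded` maps a step name to predefined output, mirroring the
    pilot setup where groups explored later steps in the flow using
    data that had been prepared in advance.
    """
    preloaded = preloaded or {}
    data = initial_input
    trace = {}
    for name, service in steps:
        if name in preloaded:
            data = preloaded[name]     # skip the service, use pilot data
        else:
            data = service(data)       # normal step-wise transfer
        trace[name] = data
    return trace

# Hypothetical stand-ins for the services in the scenario.
steps = [
    ("iFLSS", lambda query: [f"http://example.org/{w}" for w in query.split()]),
    ("PenSum", lambda urls: f"synthesis of {len(urls)} materials"),
    ("CONSPECT", lambda synthesis: [("synthesis", "concepts")]),
]
```

Passing `preloaded={"iFLSS": [...]}` reproduces the pilot situation in which a group attributed to PenSum or CONSPECT starts from predefined search results instead of running the full flow from the beginning.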
Section 3: Results - validation/verification of Validation Topics

OVT 1.1 (Pilot site: PUB-NCIT; Pilot language: English)
Operational Validation Topic: The integration of step-wise access and data transfer functions in the Long Thread.
Summative results with respect to validation indicator:
In general, the data transfer between the different services works, as demonstrated in the two hands-on validation events.
Formative results with respect to validation indicator:
- TEL-experts: Two challenges remain for the future: the automatic processing of the search results from iFLSS by PenSum requires that the provided URL point to a textual document. Whenever a search result refers to a more complex site (including navigation and menus), PenSum has problems finding the main textual body.

OVT 2.1 (Pilot site: PUB-NCIT; Pilot language: English)
Operational Validation Topic: The teacher saves time and resources by using the long thread.
Summative results:
- Learners, experimental, n = 25: "The combination of language technology services saves time" (M = 3.40, SD = 0.96)

OVT 2.1 (Pilot site: OUNL; Pilot language: English)
Operational Validation Topic: The teacher saves time and resources by using the long thread.
Summative results:
- Teachers, experimental, n = 9: "The combination of language technology services saves time" (M = 2.13, SD = 0.99)
- TEL-experts, experimental, n = 8: "The combination of language technology services saves time" (M = 2.38, SD = 0.52)

OVT 2.2 (Pilot site: PUB-NCIT; Pilot language: English)
Operational Validation Topic: There is less cognitive load required to use the Long Thread.
Summative results:
- Learners, experimental, n = 25: "Using the combined language technology services seems to require little effort." (M = 3.04, SD = 1.24)

OVT 2.2 (Pilot site: OUNL; Pilot language: English)
Operational Validation Topic: There is less cognitive load required to use the Long Thread.
Summative results:
- Teachers, experimental, n = 9: "Using the combined language technology services seems to require little effort." (M = 2.00, SD = 0.76)
- TEL-experts, experimental, n = 8: "Using the combined language technology services seems to require little effort." (M = 2.00, SD = 0.76)
Formative results with respect to validation indicator:
- TEL-experts: "I do not believe in making things easier by providing a scenario."

OVT 3.1 (Pilot site: PUB-NCIT; Pilot language: English)
Operational Validation Topic: The combination of language technology services would be useful for my learning.
Summative results:
- Learners, experimental, n = 25: "The combination of language technology services would be useful for my learning." (M = 3.96, SD = 0.79)

OVT 3.1 (Pilot site: OUNL; Pilot language: English)
Operational Validation Topic: The combination of language technology services would be useful for my teaching.
Summative results:
- Teachers, experimental, n = 9: "The combination of language technology services would be useful for my teaching." (M = 2.00, SD = 1.07)
- TEL-experts, experimental, n = 8: "The combination of language technology services would be useful for my teaching." (M = 4.13, SD = 0.64)
Formative results with respect to validation indicator:
- TEL-experts: "I can see the value of the individual services quite well, but think that the added value of combining individual services will emerge after having used them and having become thoroughly familiar with them."
- TEL-experts: "I'd love to test these applications in my practice."
- Teachers: "Is there any proof that language technology can help in teaching?"

OVT 6.1 (Pilot site: PUB-NCIT; Pilot language: English)
Operational Validation Topic: The use of threading and the Long Thread encourages the motivation for learning.
Summative results:
- Learners, experimental, n = 25: "Using combined language technology makes learning more interesting" (M = 3.96, SD = 0.94)

OVT 6.1 (Pilot site: OUNL; Pilot language: English)
Operational Validation Topic: The use of threading and the Long Thread encourages the motivation for learning.
Summative results:
- Teachers, experimental, n = 9: "Using combined language technology makes learning more interesting" (M = 3.63, SD = 0.52)
- TEL-experts, experimental, n = 8: "Using combined language technology makes learning more interesting" (M = 2.25, SD = 0.89)
Formative results with respect to validation indicator:
- Teachers: "As a teacher I need to know my students and have direct contact; if a lot is in between, I get lost."

OVT 8.1 (Pilot site: PUB-NCIT; Pilot language: English)
Operational Validation Topic: The flexible combination of services in a thread has the potential to solve specific educational problems.
Summative results:
- Learners, experimental, n = 25: "Combinations of different language technology services, when compared with individual services, would enable new solutions for specific educational problems." (M = 3.92, SD = 0.76)
- Learners, experimental, n = 25: "The language technology services could be combined in many different ways." (M = 3.84, SD = 1.07)
- Learners, experimental, n = 25: "The combination of language technology services could be used across different subject matter domains." (M = 3.76, SD = 1.05)
Formative results with respect to validation indicator:
- Learners: Documenting for an extensive project that needs to be solved in a team (iFLSS, Conspect, PenSum, PolyCAFe)
- Learners: Teaming up students by using their compatibilities (abilities) (iFLSS, PolyCAFe)
- Learners: Improving your knowledge iteratively (PenSum, Conspect, iFLSS)
- Learners: Suggesting resources that were missed from a conversation (iFLSS, Conspect)
- Learners: Documenting the start-up of a project (chat, PolyCAFe, Conspect, iFLSS)
- Learners: Documenting for a bachelor thesis (Conspect, iFLSS, PolyCAFe)

OVT 8.1 (Pilot site: OUNL; Pilot language: English)
Operational Validation Topic: The flexible combination of services in a thread has the potential to solve specific educational problems.
Summative results:
- Teachers, experimental, n = 9: "Combinations of different language technology services, when compared with individual services, would enable new solutions for specific educational problems." (M = 3.50, SD = 0.76)
- TEL-experts, experimental, n = 8: "Combinations of different language technology services, when compared with individual services, would enable new solutions for specific educational problems." (M = 3.87, SD = 0.35)
- Teachers, experimental, n = 9: "The language technology services could be combined in many different ways." (M = 3.75, SD = 0.71)
- TEL-experts, experimental, n = 8: "The language technology services could be combined in many different ways." (M = 3.88, SD = 0.64)
- Teachers, experimental, n = 9: "The combination of language technology services could be used across different subject matter domains." (M = 2.25, SD = 0.71)
- TEL-experts, experimental, n = 8: "The combination of language technology services could be used across different subject matter domains." (M = 3.25, SD = 1.49)
Formative results with respect to validation indicator:
- Teachers: "These combinations of tools seem strongest for high school use or specific higher education fields."
- Teachers: "Conspect + Pensum + iFLSS = good combination."

OVT 8.2 (Pilot site: PUB-NCIT; Pilot language: English)
Operational Validation Topic: The combination of language technology services would be useful for my teaching.
Formative results with respect to validation indicator:
- Learners: Useful to have all the widgets interacting together if you have a complex task
- Tutor: The learners were able to list six educational scenarios for threads from their own needs

OVT 8.2 (Pilot site: OUNL; Pilot language: English)
Operational Validation Topic: The combination of language technology services would be useful for my teaching.
Summative results:
- Teachers, experimental, n = 9: "From an educational point of view, I see potential in combining different individual language technology services in one application." (M = 4.13, SD = 0.64)
- TEL-experts, experimental, n = 8: "From an educational point of view, I see potential in combining different individual language technology services in one application." (M = 3.88, SD = 0.64)

OVT 9.1 (Pilot site: PUB-NCIT; Pilot language: English)
Operational Validation Topic: Users were motivated to continue to use the system after the end of the formal validation activities.
Summative results:
- Learners, experimental, n = 25: "I would be interested in using the combined language technology services after this pilot." (M = 3.44, SD = 0.87)

OVT 9.1 (Pilot site: OUNL; Pilot language: English)
Operational Validation Topic: Users were motivated to continue to use the system after the end of the formal validation activities.
Summative results:
- Teachers, experimental, n = 9: "I would be interested in using the combined language technology services after this pilot." (M = 2.63, SD = 1.41)
- TEL-experts, experimental, n = 8: "I would be interested in using the combined language technology services after this pilot." (M = 4.25, SD = 0.71)
Formative results with respect to validation indicator:
- TEL-experts: "In general the concepts are nice but there are many usability and technical issues to solve."

OVT 9.2 (Pilot site: PUB-NCIT; Pilot language: English)
Operational Validation Topic: The combined language technology services could work well alongside other software.
Summative results:
- Learners, experimental, n = 25: "The combined language technology services could work well alongside other software I usually use." (M = 3.28, SD = 0.89)
- Learners, experimental, n = 25: "The combined language technology services could work well alongside individual services." (M = 3.64, SD = 0.86)
- Learners, experimental, n = 25: "I feel comfortable in using language technology services in combination." (M = 3.48, SD = 1.05)

OVT 9.2 (Pilot site: OUNL; Pilot language: English)
Operational Validation Topic: The combined language technology services could work well alongside other software.
Summative results:
- Teachers, experimental, n = 9: "The combined language technology services could work well alongside other software I usually use." (M = 3.13, SD = 1.25)
- TEL-experts, experimental, n = 8: "The combined language technology services could work well alongside other software I usually use." (M = 3.87, SD = 0.35)
- Teachers, experimental, n = 9: "The combined language technology services could work well alongside individual services." (M = 3.38, SD = 0.74)
- TEL-experts, experimental, n = 8: "The combined language technology services could work well alongside individual services." (M = 3.75, SD = 0.89)
- Teachers, experimental, n = 9: "I feel comfortable in using language technology services in combination." (M = 3.96, SD = 0.94)
- TEL-experts, experimental, n = 8: "I feel comfortable in using language technology services in combination." (M = 2.63, SD = 0.52)

OVT 9.3 (Pilot site: PUB-NCIT; Pilot language: English)
Operational Validation Topic: The use of threading has the potential to enlarge the possible user groups.
Summative results:
- Learners, experimental, n = 25: "Combined language technology services, when compared with individual services, have added value for education." (M = 3.64, SD = 0.86)

OVT 9.3 (Pilot site: OUNL; Pilot language: English)
Operational Validation Topic: The use of threading has the potential to enlarge the possible user groups.
Summative results:
- Teachers, experimental, n = 9: "Combined language technology services, when compared with individual services, have added value for education." (M = 3.38, SD = 0.74)
- TEL-experts, experimental, n = 8: "Combined language technology services, when compared with individual services, have added value for education." (M = 3.38, SD = 0.74)

Section 4: Results - validation activities informing the user requirements for the Threading approach

Case I (Focus group with tutors)

Measurement Instruments
Card sorting and a questionnaire were the two measurement instruments used for data collection. Card sorting requires the participants to generate, in the form of statements, ideas about the benefits, weaknesses and obstacles for adoption of long threads. The list of statements was then uploaded into a web-based environment and the members of the LT team individually sorted the statements across the three initial groups (benefits, weaknesses and obstacles) to identify additional patterns.

Analysis and results
The data collected is qualitative by nature, but apart from content analysis we also performed some quantitative analysis on it. The participants generated 55 statements: 14 Benefits, 22 Weaknesses, 13 Obstacles and 6 Interesting/Suggestions. The participants liked the idea that large amounts of text can be handled, that feedback is objective, and that long threads would have a positive effect on learners in developing effective learning strategies. The tutors had some concerns regarding resistance of stakeholders, too much reliance on technology, applicability to different educational contexts, the quality of outcomes, and the possibility that the combination may not add value to the individual services. Some possible obstacles, as indicated by the participants, are the combined workload, the completeness and correctness of results, and the acceptance of "automated" support by stakeholders. During this first level of analysis, we noticed some recurring issues across the four basic headings, e.g. feedback, workload, stakeholders' resistance, relevance to other domains and developing learning strategies. To reveal possible hidden structures in the data, the LT team applied card sorting.
The list of all statements was uploaded into websort, a web-based tool supporting card sorting. The LT team used closed categories; websort reports pairwise agreement levels (up to 25% low agreement; 50-75% medium agreement; more than 75% high agreement). By using the three graphs visualising the results from cluster analysis, together with a cluster analysis algorithm that takes as input the item-by-item percentage matrix and aggregates the contributions of all sorters in an objective way, we ended up with eight groups of statements: Learning strategies (5 items), Adoption of stakeholders (5), Method (11), Implementation (5), Quality of outcomes (6), Other domains uptake (4), Feedback (9), and Workload (10).

1. Learning strategies is about the effect of long threads on students' learning. Some of the statements included in this group are: "Students learn to critically look for feedback"; "Students will learn to become more independent"; and "Feedback loop for students".
2. Adoption of stakeholders includes statements about barriers to adopting long threads. Examples are: "Willingness of the teachers to use it? Resistance", "Willingness of schools/students to use the tools", "Convince students and teachers of value of programs that provide feedback".
3. Method consists of some generic statements that refer to threads as a pedagogic approach. Some representative statements are: "Complete, supplementary method", "Combining (doubtful) services might be detrimental to good aspects", "Different angles of the approach for educational problems", "Is there any proof that language technology can help in teaching?", "The whole might be more than the sum of the parts".
4. Implementation contains items such as "Implementation [is] difficult", "Isn't it quite a great risk to buy this?", "A cost-benefit analysis is missing".
5. Quality of outcomes indicates concerns about the capacity of language technologies to provide the same quality of outputs as human experts. Examples of statements are: "Quality is not the same as quantity", "System analyses, so not tutor-dependent", "Can the programs detect misconceptions and lack of understanding/quality?".
6. Other domains uptake reflects the participants' concerns about the applicability of long threads to different contexts, domains and educational levels. Some statements included in this group are: "These combinations of tools seem strongest for high school use or specific higher education fields", "Not applicable in all types of education", "Are the tools applicable to any contexts?"
7. Feedback, as the name suggests, is about the potential of long threads to give objective and reliable feedback to students and the ability of stakeholders to read it in the right way. Some examples are: "Feedback is not dependent on an individual tutor", "Feedback is objective", "The services lead to unwanted/unproductive bias in all feedbacks", "Students and teachers are unable to use it. Do not have skills", "Not always reliable feedback", "Not all students are able to critically judge feedback", "Risk for students to adapt to wrong feedback".
8. Workload is about the efficiency of the tool in terms of time and effort spent. Some of the statements included in the cluster are: "Large amounts of texts can be handled", "Reduction of workload for teachers", "Overall tutor load should go down", "Tools might lead to higher cognitive load exceeding the gain", "Increased workload for students", "Tutor still needs to put much time in it".
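The aggregation step described above (from an item-by-item percentage matrix to groups of statements) can be approximated by a simple single-linkage clustering pass. The items, agreement percentages and the 75% "high agreement" threshold below are invented for illustration; the actual analysis used the websort output and the cluster graphs, not this code:

```python
# Sketch (not the project's algorithm): single-linkage grouping of statements
# from a pairwise "placed in the same group" agreement matrix, stdlib only.

def cluster(items, agree, threshold):
    """Merge clusters whenever any cross-pair reaches `threshold` % agreement."""
    clusters = [{i} for i in items]
    merged = True
    while merged:
        merged = False
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                if any(agree[min(x, y), max(x, y)] >= threshold
                       for x in clusters[a] for y in clusters[b]):
                    clusters[a] |= clusters.pop(b)
                    merged = True
                    break
            if merged:
                break
    return clusters

# Hypothetical data: % of sorters placing each pair of statements together
items = ["S1", "S2", "S3", "S4"]
agree = {("S1", "S2"): 80, ("S1", "S3"): 20, ("S1", "S4"): 10,
         ("S2", "S3"): 30, ("S2", "S4"): 15, ("S3", "S4"): 90}
groups = cluster(items, agree, threshold=75)
```

Lowering the threshold toward the 50-75% "medium agreement" band merges more statements per group, which is one way such an analysis can move from many small clusters to a handful of broad themes.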
A further interpretation of the results suggests some possible combinations of the eight categories, namely: Method (ideas related to the thread as a pedagogic approach), Implementation and Adoption (a combination of the clusters Implementation, Stakeholders adoption and Other domains uptake), Feedback and Quality of outcomes (Learning strategies, Feedback and Quality of outcomes), and Workload. Provisional results for these categories are:

Method. On the one hand, the participants principally assume that a long thread as a whole might be more than the sum of its parts. A long thread is a method complementary to the individual services that provides different perspectives on educational problems. On the other hand, the tutors suspect that combinations of services could be detrimental to the individual services.

Feedback and Quality of outcomes. On the one hand, the locus of control is in the system, not in the individual tutor, which makes the feedback objective. Feedback provided by the tool supports students in developing self-regulated learning. At the same time, concerns were raised about whether such feedback is reliable, whether it can match the quality of human experts, and whether stakeholders have the skills needed to deal with the process and outcomes of such "automated" feedback.

Workload. In general, the participants believe that long threads should reduce the workload of both teachers and students, but the current state of the application does not provide enough evidence that long threads can save time and effort. On the contrary, the participants see many problems in this respect.

Implementation and adoption. This composite group contains statements expressing mainly concerns regarding the resistance of stakeholders to adopting long threads and the applicability to other contexts and domains, but also some suggestions for how to increase the likelihood of adoption.
Overall, the results indicate that the participants like some aspects of the idea of long threads (benefits), dislike others (weaknesses), and have some concerns regarding implementation and adoption (obstacles). More importantly, it turned out that for one and the same issue (e.g. method, feedback, workload) the tutors had both positive and negative reactions. This is an indication of the complexity of their thoughts, reflecting the complexity of the problems they address, and it draws a more realistic picture of the participants' perception of the idea of long threads at a particular moment. The results from the long thread questionnaire (LTQ) point in the same direction. The highest-scoring items were: "From educational point of view, I see potential in combining different individual 'language technology services' in one application" (M = 4.13; SD = 0.6); "The 'language technology services' could be combined in many different ways" (M = 3.75; SD = 0.7); "Using combined 'language technology services' would make learning more interesting" (M = 3.63; SD = 0.5); and "Combinations of different 'language technology services', when compared with individual services, would enable new solutions for specific educational problems" (M = 3.5; SD = 0.8). The lowest-scoring items were: "The combination of 'language technology services' would be useful for my teaching" (M = 2; SD = 1); "Using the combined 'language technology services' seems to require little effort" (M = 2; SD = 0.8); "The combination of 'language technology services' seems to save time" (M = 2.13; SD = 1); "The combination of 'language technology services' could be used across different subject matter domains" (M = 2.25; SD = 0.7); and "I feel comfortable in using 'language technology services' in combination" (M = 2.25; SD = 0.9).
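The descriptive statistics quoted throughout this report (M and SD per statement, plus the % agree/strongly agree column of the reporting template) can be computed from raw responses as in this sketch. It assumes a five-point Likert scale with 4 and 5 counting as agree/strongly agree, and the response values are made up for illustration:

```python
# Sketch of the per-item descriptive statistics used in the VRT tables.
from statistics import mean, stdev

def summarise(responses):
    """Mean, sample SD and % agree/strongly agree for 1-5 Likert responses."""
    pct_agree = sum(r >= 4 for r in responses) / len(responses) * 100
    return round(mean(responses), 2), round(stdev(responses), 2), round(pct_agree)

# Hypothetical responses for one questionnaire statement
m, sd, pct = summarise([4, 5, 3, 4, 2, 4, 5, 3, 4, 2])
```

Note that `stdev` is the sample standard deviation (n - 1 denominator), the usual choice when, as here, a small group of respondents stands in for a wider population.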
In general, the tutors rate highly the potential of long threads to provide, in the long term, new solutions for educational problems. However, at the moment the tutors do not see how long threads can help their teaching, either in terms of usefulness or in terms of efficiency (saving time and effort).

Based on the learners' formative feedback on the individual services in the Long Thread, the main results are that:
- they appreciate the flexibility of the presented analyses (dynamic conceptogram, zooming possibilities, different ways of analyzing conversations);
- they like the highlighting functions for important feedback/results (concepts covered in the course, concepts related to keywords, the conversation thread highlight feature);
- they recognised additional potential for the individual tools, not yet covered by their user scenarios (e.g. Conspect to identify concepts from a course and search for them on Wikipedia or Google, Pensum to summarize a discourse or presentation, PolyCAFe to improve their abilities for debate, dialogue and collaboration, iFLSS as a dictionary/thesaurus for new concepts);
- they consider the user interface of variable quality and not consistent across the services: while Pensum's user interface was praised for its potential, it was also criticised for not being user-friendly, and the other services were criticized as well (the conceptograms are too compact and difficult to rearrange, indicators do not have very good/useful labels, output is difficult to read due to font size and underlining);
- they see quality issues in the feedback (inaccessible concepts due to stemming, small irrelevant phrases in the original texts are not neglected, starting a new unrelated paragraph always leads to a coherence error, several unimportant words are part of the conversation feedback, the resources from the social network contain a lot of junk besides the relevant ones);
- they discovered some stability issues in some of the services (when combining two resources there were several errors, new RSS feeds could not be loaded, problems using the conversation thread tabs);
- they found the on-line guidance insufficient for some of the services (there are few indications on how to use them and very few hints; the process is, in the beginning, not intuitive);
- they encountered several usability issues due to the use of many different widgets in the long thread;
- they witnessed problems with the loading time of the widgets in the long thread (too many of them, too many resources needed).

Section 5: Results - validation activities informing transferability, exploitation and barriers to adoption
No information provided.

Section 6: Conclusions

Validation Topics

PVT1: Verification of the Long Thread
- OVT1.1 The integration of step-wise access and data transfer functions in the Long Thread. Sites: PUB-NCIT, OUNL.
PVT2: Tutor efficiency
- OVT2.1 The teacher saves time and resources by using the long thread. Sites: PUB-NCIT, OUNL.
- OVT2.2 There is less cognitive load required to use the Long Thread. Sites: PUB-NCIT, OUNL.
PVT3: Quality and consistency of (semi-)automatic feedback or information returned by the system
- OVT3.1 The combination of language technology services would be useful for my teaching. Sites: PUB-NCIT, OUNL (TEL experts), OUNL (teachers). Qualification to validation: teachers did not have hands-on experience with the Long Thread.
PVT4: Making the educational process transparent
- OVT4.1 N/A
PVT5: Quality of educational output
- OVT5.1 N/A
PVT6: Motivation for learning
- OVT6.1 The use of threading and the Long Thread are encouraging the motivation for learning. Sites: PUB-NCIT, OUNL.
PVT7: Organisational efficiency
- OVT7.1 N/A
PVT8: Relevance
- OVT8.1 The flexible combination of services in a thread has the potential to solve specific educational problems. Site: PUB-NCIT.
- OVT8.2 The use of threading has the potential to enlarge the possible user groups. Sites: PUB-NCIT, OUNL. Qualification to validation: PUB-NCIT generated a set of possible new threads.
PVT9: Likelihood of adoption
- OVT9.1 I would be interested in using the combined language technology services after this pilot. Sites: PUB-NCIT, OUNL.
- OVT9.2 The combined language technology services could work well alongside other software. Sites: PUB-NCIT, OUNL.
- OVT9.3 The use of threading has the potential to enlarge the possible user groups. Sites: PUB-NCIT, OUNL. Qualification to validation: strong disagreement between TEL experts and teachers.

Exploitation (Strengths, Weaknesses and Threats Analysis)
The objective you are asked to consider is: "The Long Thread will be adopted in pedagogic contexts beyond the end of the project".
Strengths:
- The LT provides timely feedback of consistent quality (objective, not tutor-dependent)
- The LT improves the independence of the learners (including skills to judge the feedback received)
- The LTfLL services and threading introduce new angles for approaching educational problems
- The LT enables self-directed learning for complex tasks, as a complete, supplementary approach to more traditional learning
- Innovative technology and educational approaches

Weaknesses:
- Fear of bypassing teachers and too much reliance on computers and software
- An imbalance between the educational gain and the increased (work)load
- Quality issues in the feedback are risky for the learning process
- The orientation on textual utterances makes transfer to Maths and "exact" sciences problematic or even impossible
- The value of the programs depends on strictly described learning tasks, which is not suitable for writing an essay on a topic of choice

Threats:
- Setting up requires very large resources: installing the corpus, tutor involvement and creating the network
- Threads specify a standardised task flow which may conflict with the learning style of the learners
- Not all learners are able to critically judge feedback
- A steep learning curve: the LT requires knowledge and use of a lot of different tools
- Too many widgets may lead to cognitive overload

Overall conclusion regarding the likelihood of adoption of the threading approach:
The concept of threading appears to be useful for the stakeholders in our Long Thread validation. However, the current version is far from ready to be sold as a stand-alone product. We consider that the "proof of concept" of threading has been accepted. The practical use of threading now has to be proven in a wider range of educational contexts, but the problems (stability, quality and accuracy issues, etc.) of some of the individual services are a serious risk to successful validation in real contexts.
As one of the tutors participating in the Long Thread validation said, "Combining (doubtful) services might be detrimental to good aspects". Clearly, a prerequisite for more extensive roll-out of the Long Thread is additional work at the level of the individual services, for which Version 1.5 is still an intermediate version.

Most important actions to promote adoption of FLSS:

Technical:
- Improve the individual services
- Improve the data integration and the possible workflows, with their access and connections
- Improve the consistency across the services used (interface, use of concepts)
- Improve the loading time, in particular whenever the LT needs many widgets

Roll out:
- Ensure that potential users gain experience in the use of the different individual LTfLL services to build up their self-confidence
- Provide convincing examples of educational threads (including other domains, languages and learning strategies)
- Make setting up threads easier, e.g. outsource substantial parts of the preparation (corpora and delivering processed data to be used in the services)
- Deliver guidelines and instructions to manage the expectations of the stakeholders

Section 7 - Road map

Based on the results and conclusions from validation, the LTfLL team has agreed that the following are the most important future enhancements to the system in order to meet stakeholder requirements:
Most important:
- Improve the on-screen guidance
- Make the interfaces of the different services more user-friendly and more consistent across the services
- Improve the quality of the results and the feedback (correctness, relevance) generated within the services
- Build an editing environment to enable the creation of threads by end users

Based on the results and conclusions from validation, the LTfLL team has agreed that the following are the most important changes to the current scenario(s) of use in order to meet stakeholder requirements:
Most important:
- Design a scenario that enables stakeholders who are not Language Technology oriented to create threads

Based on the results and conclusions from validation, the LTfLL team has agreed that the following are possible additional educational contexts for future deployment:
Most important:
- Investigate whether the LTfLL approaches match the learning needs and pedagogic approaches at high schools

Based on the results and conclusions from validation, the LTfLL team has agreed that the following are the most important issues for future technical research to enable deployment of language technologies in educational contexts:
Most important:
- Design the complete set of rules about input and output requirements to be met by the linked services
- Design how the data will be shared

Annex 1: Qualitative data from the learner workshop on threading (PUB-NCIT).
PART 1.
Feedback for individual services:

CONSPECT
Strengths:
- Highlighting the concepts that are present in the course documentation
- Highlighting the concepts that are related to a keyword the student is looking for, and the semantic relationships between the keyword and other concepts
- The conceptogram is dynamic, with a pleasant design that helps in the visualization of the concepts. It is also interactive
- The zoom functionality for the conceptogram
Weaknesses:
- The widgets (using iframes) are a very bad idea: messy layout, a lot of nested scrollbars, poor usability
- The conceptograms: the elements are too compact; most of the time several concepts are displayed on top of one another and are thus difficult to follow
- Difficult to rearrange concepts in a conceptogram: they always snap back to their initial location (due to the force-based layout, maybe?)
- There are few indications on how to use it, and very few hints and texts that help the user
- There are no labels for the colors in the conceptogram.
- The language is technical and inaccessible to people in other domains (especially due to stemming: for many words it is not easy to determine the original meaning)
- Very unintuitive user interface (go-back buttons, zoom in/out)
- When combining two resources, there were several errors
- Could not load new RSS feeds
Potential uses of the tool:
- Identification of concepts from a course and searching for them on Wikipedia or Google
- Useful because it extracts very easily the most important concepts from different texts and shows the semantic links between them

PENSUM
Strengths:
- Detects relatively well the phrases that were not part of the summary
- Analyzes pretty well different types of summaries (even automatically generated ones)
- The feedback does not take very long
- After getting used to it (although this process is not very intuitive), it is easy to use overall
- The user interface has good potential; it is simple and coherent, with good placement of the buttons
Weaknesses:
- The steps required to get new feedback need to be explained better (not very intuitive the first few times)
- For small phrases in the original text (including rhetorical phrases), the feedback always says they were not covered (although they are irrelevant and these kinds of phrases should be excluded)
- If you start a new paragraph that is not related to the previous one, you will always get an error for the last phrase of the previous paragraph not being "coherent" with the first phrase of the new paragraph
- Each phrase needs to contain at least one keyword, otherwise it is labeled as incoherent
- Not a very friendly user interface
- Sometimes the feedback is not relevant or correct
- Seems to use only keywords to generate the feedback
Potential uses of the tool:
- For pupils and students who want to write summaries of certain courses
- For teachers who need to summarize course materials or papers.
- For anyone who would like to summarize a discourse or a presentation.
- For teachers who want to "verify"/assess a summary written by a student.
- Interactive lessons for high school.
- Checking summaries, and perhaps even anti-fraud detection.

POLYCAFE

Strengths:
- Many different ways to analyze a conversation.
- The thread highlight feature is useful.
- Useful for seeing the topics that were relevant to the discussion and those missing from it.
- Useful statistics about the conversation.
- A way to find useful/fruitful conversations.

Weaknesses:
- The indicators in the participant feedback should perhaps be renamed, as their labels are not very good/useful.
- Occasional problems using the conversation thread tab in the conversation visualization.
- Cannot map a key concept to a thread (i.e., highlight the most important threads for a given concept).
- Several unimportant words (told, yes) appear in the conversation feedback.

Potential uses of the tool:
The tool is useful for students because:
- it helps them improve their ability to take part in a debate, a dialogue, or a collaboration;
- it is very good for self-evaluation, self-assessment and reflection;
- it provides semi-automated feedback on conversations.

IFLSS

Strengths:
- Useful links to YouTube.
- The suggested persons are really relevant.
- Some of the scientific papers are relevant.
- Some of the SlideShare presentations are relevant.

Weaknesses:
- A concept graph (or similar) would be needed.
- The resources from the social network contain a lot of junk alongside the relevant resources.
- Duplicates in the list of scientific papers.
- Difficult to read and use, because the font is very small and every resource is underlined.

Potential uses of the tool:
- As a "dictionary" (Traian – maybe thesaurus) for new concepts.
- Personal learning (especially the YouTube videos).
- Finding relevant people to offer support and information in a given domain.
- Searching for scientific papers in a given domain.

B. Weaknesses of the current long thread:
- Too many widgets, which makes the thread difficult to use (6-7 widgets should be the maximum).
- The widgets are very diverse (in look and feel, and in how they work and respond).
- It is difficult to devise a task that requires the current threading scenario.
- There are usability issues due to the very high number of widgets, each with lots of different information (students seem to get lost).
- Students who are not very computer-literate would find it very difficult to adapt to the long thread.

C. Strengths of the current long thread:
- It is useful to have all the widgets interacting together when you have a complex task.
- Innovative technology and approach.
- Parts of the long thread are very useful.
- It is useful to have communication between the widgets.

D. Conclusions for the long thread:
- The students opted for shorter threads (at most 3 tools working together).
- There are problems with the loading time of the widgets in the long thread (too many of them; too many resources are needed).
- The most useful links between widgets are from CONSPECT to IFLSS and from PolyCAFe to IFLSS.

PART 2. Threading ideas:

Idea 1 – Documenting for an extensive project that needs to be solved by a team
1. Use IFLSS to search for articles relevant to the subject being studied.
2. Use CONSPECT to extract the concepts common to all the relevant resources returned at step 1, plus the links between them.
3. Using the concepts detected in step 2 and the files from step 1, write a summary that is verified with PENSUM. Each student writes a summary, starting from these concepts.
4. The members of each team then hold a brainstorming chat to see what each of them adds to the summaries of the studied subject. Using PolyCAFe to analyze this chat, the students choose a project leader.
Idea 2 – Teaming up students based on their compatibilities (abilities)
1. Start from a given subject.
2. Use IFLSS to find people relevant to the subject.
3. Group them automatically into teams/groups for chat conversations.
4. Analyze each chat conversation with PolyCAFe.
5. Determine which people have the same level of ability (are compatible) on the subject, in order to team them up for solving a problem.

Idea 3 – Iteratively improving your knowledge
1. Start from a course and the materials to read.
2. Write a summary and analyze it with PENSUM.
3. For the topics that were not covered, either (3a) locate them with CONSPECT in an already stored conceptogram, or (3b) use IFLSS to read resources (or watch videos) about them.
4. Return to step 2 and see whether the summary has improved.

Idea 4 – Suggesting resources that were missed in a conversation
1. Have a chat conversation (or a discussion thread in a forum).
2. Search for the relevant concepts missing from the conversation, either (2a) with IFLSS, or (2b) by locating them with CONSPECT in an already stored conceptogram.

Idea 5 – Documenting for a project
1. Use a chat for brainstorming before the project.
2. Analyze the chat results with PolyCAFe to detect the most important utterances.
3. Feed these utterances as input to CONSPECT to discover the most important concepts and their connections.
4. Use IFLSS to find additional resources.

Idea 6 – Documenting for the bachelor thesis
1. Feed the documentation for the thesis topic into CONSPECT to generate a conceptogram for each book and article.
2. Combine the conceptograms to obtain the most common concepts.
3. Use IFLSS to find additional resources for the most common and most important concepts.
4. Document these results together with the original documents.
5. Then have a chat with the tutor (or a master's student), and
6. analyze it with PolyCAFe to see whether the student performs well (at a level similar to that of the tutor or master's student).