1. Introduction
Open learning today faces new problems as higher education has evolved into an increasingly complex ecosystem comprising various stakeholders, such as universities, faculties, students, higher educational institutions (HEIs), governmental sectors, employer entities, and EdTech software companies [1,2]. In open learning, most stakeholder organizations have established their own proprietary information systems [2,3], such as the very popular Massive Open Online Course (MOOC) platforms Udacity [1,3,4], Coursera [5], and edX [6], as well as various online or remote learning platforms in large universities (e.g., MIT, Stanford) or open universities [1,2]. As a result, it is now very common for professors to give lectures with diverse remote teaching tools and for students to learn on different HEI software platforms. These proprietary HEI information systems are usually heterogeneous, decentralized, and difficult to interact with. Furthermore, numerous open learning resources (and open educational resources in a broader sense) are dispersed across these isolated, globally distributed HEI information systems. Effective sharing of open learning resources is therefore difficult to achieve, and a great waste of educational resources can occur [2,7,8,9,10].
An effective interaction mechanism plays a paramount role in the open learning sector as well as the entire education ecosystem [11]. However, interoperability, or, more concretely, authentic, non-repudiable, and quickly available data sharing among related open learning information systems and stakeholders, remains a key unresolved issue.
The importance of the interaction mechanism in an open learning ecosystem is evident in many other scenarios too. For instance, in joint-cultivation projects, universities need to exchange and record students’ credits and related exam details. When hiring personnel, employers need to carefully check and confirm students’ credentials to avoid fraud and falsification. It would also be valuable to examine an applicant’s detailed information, such as behavior records and learning process achievements, provided that these data can be supplied and guaranteed to be authentic, in order to determine whether the candidate has the claimed competencies and character [1,12,13].
In these scenarios, students’ important data such as credentials (transcripts and certificates, e.g., diploma, degree, training, internship), learning process behavior records, and study achievement records need to be carefully produced, issued, preserved, and shared among different stakeholders [14,15]. Moreover, the sharing process needs to be easy, fast, and available on demand, while security, authenticity, and confidentiality are guaranteed. Such a data sharing mechanism remains a key issue that has not yet been properly resolved [16].
Numerous technologies have been devised to mitigate this data integration problem, e.g., data interfaces, web services, and role-based access control. However, these traditional technologies usually address only one or a few aspects of the complex data exchange issue. Which technology or technologies should be adopted or combined during system implementation depends heavily on the system designer’s expertise and proficiency. This can lead to customized, special-purpose, and cumbersome solutions rather than comprehensive and general-purpose ones, and the situation worsens when a team of developers is involved: the design process becomes complicated, time-consuming, and error-prone. Consequently, data in such software systems can easily be falsified through the back-end databases, and such falsification is hard to detect and prevent.
In the past decade, blockchain has become a promising approach to solving this problem [17]. It features a novel computing paradigm that combines four categories of technology: distributed storage, peer-to-peer network communication, distributed consensus mechanisms, and cryptography [18,19]. Originally, it was the underlying implementation of the cryptocurrency Bitcoin. It has quickly evolved into a technical platform that offers a systematic and comprehensive data exchange solution for heterogeneous systems. Many blockchain implementations have been developed, such as Ethereum, R3 Corda, Hyperledger Fabric, and Libra [19]. Blockchain has attracted the attention of numerous researchers and has been introduced to many fields, such as integrity verification, anti-piracy, supply chain management, and medicine and health, not to mention education [20]. Blockchain can be implemented in education and can enhance higher education, and developing an “educational infrastructure” to support online learning is part of this [1].
Figure 1 concisely summarizes the key stakeholders and challenges of data sharing in our open learning context, key technologies of blockchain, and typical open learning applications that could be enhanced as “trusted” with blockchain technology. Obviously, there are difficulties and challenges in adopting blockchain to solve the open learning trusted data sharing issue.
This study is conducted to respond to the following research question: What would a unified, trusted and blockchain-based data-sharing infrastructure need to be to solve the interoperability issue in an open learning ecosystem?
We further decompose the research question into three sub-questions:
What would the open learning scenario be for the introduction of such a blockchain infrastructure? (scenario schema);
What would the open learning application be for the introduction of such a blockchain infrastructure? (application model);
What would the integration means be to provide an infrastructure for HEI information system integration into blockchain? (integration framework).
Based on a detailed literature review and decades of software development expertise in EdTech, our methodology is to propose a consortium blockchain-based architecture which puts a business process schema, a conceptual model, and a pragmatic developing framework in a synergic usage as a potential solution to the research question. We try to respond to the research question from a view combining both software architecture (supporting system integration) and open learning business features. As presented in this work, we provide an implementation that leverages consortium blockchain technology and makes further extensions for proprietary HEI systems integration, meanwhile achieving trusted open learning data-sharing management and integrity verification.
The main contributions of this paper are as follows:
- (1)
Propose a consortium blockchain-based architecture serving as a trusted and unified data sharing solution among decentralized open learning ecosystems;
- (2)
(a) Design and develop a pragmatic extended blockchain network, using Hyperledger Fabric 1.4.4 LTS and a cache database to optimize the data processing performance of the blockchain system; (b) design and develop a “trusted open learning behavior and achievement management” application as a proof-of-concept of the proposed architecture;
- (3)
Run the proof-of-concept application for 6 months with data from production systems and conduct a series of tests on its performance and scalability to analyze and verify the extended blockchain network.
It is worth mentioning that our proposed blockchain infrastructure is suitable not only for the open learning sector but also the higher education domain in general, provided that the application scenario is based on the mature adoption of information systems. An important consideration of selecting the open learning sector as our research object is that this sector bears a more vivid informational feature compared with other ones.
The remainder of this paper is organized as follows: Section 2 elaborates on blockchain technology and related work in the education field; Section 3 explains the research methodology with the proposed architecture; Section 4 presents the results obtained in the architecture implementation and proof-of-concept; Section 5 discusses experiments on performance and scalability and a comparison with previous work; finally, Section 6 concludes this paper with future work.
3. Methodology: Proposed Architecture
A business schema of a blockchain-integrated open learning scenario, an application model of blockchain-integrated open learning, and a framework of blockchain integration are used as methodological approaches in this research. These three function respectively as the scenario architecture, application architecture, and technology architecture for open learning data sharing, and jointly constitute an overall architecture that serves as a trusted and unified data sharing solution for the open learning ecosystem, which we name the “Trusted open learning process and achievement management framework on blockchain” (TolFob). The overall architecture comprises two abstraction levels: Level 1 involves conceptual depictions for theoretical research, which include the scenario schema of open learning and its business process abstraction with blockchain integration (Section 3.1), as well as the conceptual application model of open learning with blockchain integration; Level 2 is the guideline and means for pragmatic software development, i.e., the framework for trusted consortium blockchain-based open learning system integration and implementation (Section 3.3).
3.1. Business Schema
To provide a trusted, unified, blockchain-based data-sharing infrastructure for open learning, our first step is to depict the business schema of an open learning scenario with blockchain integrated. That is, we need to discern what the open learning scenario will be once such a blockchain infrastructure is introduced. This section presents the proposed business schema for a consortium blockchain-enhanced open learning scenario and the abstraction of its business process.
Figure 2 illustrates a conceptual data-sharing snapshot of the open learning ecosystem studied in our work. The initial solution in our proposal is based on a single consortium blockchain network. However, considering the increasing adoption of blockchain technology, different groups of HEIs and government sectors are likely to construct different blockchain networks with various implementations. Therefore, our enhanced proposal adds a cross-chain mechanism as an extension component to handle the forthcoming blockchain heterogeneity, which will be discussed in the next section.
In general, the proposed open learning scenario business schema could be formalized as a 9-tuple: <Is, Sr, Br, Oe|Tn, Bc, Sy, Oc, An>, in which Is denotes proprietary HEI open learning Information system, Sr denotes open learning Stakeholder, Br denotes Behavior, Oe denotes Outcome, Tn denotes Transaction, Bc denotes open learning Blockchain system, Sy denotes Strategy, Oc denotes Off-chain storage, and An denotes blockchain-enabled integration application of open learning.
More concretely, we make further definitions and denotations as follows:
Sr: <Srs, Srp, Sre, Sro, Srg>, denotes student, professor, employer, HEI open learning institution (open university), and government sector respectively. To propose the architecture, we first discern and define these five fundamental categories of stakeholders;
Is: <Isa, Isp, Isn, Isd, Isc>, denotes educational administrative system, diploma-oriented online teaching system, non-diploma-oriented online open course system, government’s diploma administrative system, and certificate administrative system respectively. In an open learning ecosystem, data sharing is heavily dependent on information system interaction. Therefore, we further discern and define these five categories of HEI proprietary information systems;
Br: <Brs, Brp, Bre>, denotes Behaviors performed by student, professor, and employer respectively. It is somewhat difficult to accurately define and categorize all the practical open learning behaviors; herein, we name some typical examples: for students, separate and statistical data of <timestamp, time length, content> of online learning activities, such as course-taking, video watching, discussion and quizzes; for professors, issuing data of <timestamp, content> of scores, etc.;
Oe: <Oen, Oep>, denotes Outcome non-available, and Outcome presented by open learning, e.g., student’s essay, picture, model (in the digital version), etc.;
Tn: <Tnt, Tnb>, denotes Transaction triggered by HEI proprietary learning system and blockchain respectively; e.g., Tnt: log-in, log-out, upload, download, post, submit, etc.; Tnb: hash, public key encryption, etc.;
Bc: <Bcb, Bcc>, denotes specific Blockchain implementation and cross-chain middleware respectively;
Sy: <Syc, Syo, Syp, Sys, Syh> denotes strategy with or without cross chain, with or without off-chain storage, with or without privacy, with or without security, with or without hybrid tradeoff (e.g., privacy-aware, and security enhancement) respectively;
Oc: denotes Off-chain storage;
An: <Ani, Anc, Ano>, denotes blockchain-enabled new applications, e.g., integration verification (data provenance and counterfeit), credential data sharing (diploma, certificate, and behavior record), outcome intellectual property (IP) management, etc.;
With these defined notation sets, the proposed open learning scenario schema could be described as a suite of open learning business process sets, as illustrated in Figure 3.
Without loss of generality, a normal business process of the proposed schema can be described as follows: an open learning stakeholder (Sr) performs an open learning behavior (Br) on a proprietary HEI open learning information system (Is) and produces an open learning outcome (Oe); this triggers a transaction (Tn), which is automatically submitted to the blockchain (Bc) with a selected strategy (Sy) and is reflected in a blockchain-enabled new open learning application (An).
To clarify, here we select two typical open learning business processes and specify them in detail.
Open learning Business process 1 (line ➀ in Figure 3): A trusted, secured open learning behavior and outcome sharing process without privacy protection.
A student stakeholder (Srs) logs in to a non-diploma-oriented online open course system (Isn), studies an online construction modeling course (e.g., SketchUp) (Brs), and submits (Tnt) a series of Sku models (Oep). Then, with the blockchain hash operation (Tnb), the learning record and Sku models are entered into a blockchain system (Bcb) under the hybrid strategy (Syh) of “without cross-chain, with off-chain storage, without privacy, with security” (Oca). The process ends with the integration verification application (Ani) and the outcome IP management application (Ano).
Open learning Business process 2 (line ② in Figure 3): A trusted, secured certificate sharing process with privacy protection.
An employer stakeholder (Sre) logs in to a credential data sharing application (Anc) to check a candidate’s diploma or certificate. The diploma was stored in off-chain storage (Oca) under a hybrid strategy (Syh) of “with cross-chain, with off-chain storage, with privacy, with security” (Oca). The hash of the diploma (Tnb) was stored in a blockchain system with cross-chain functionality (Bcc) by an open university staff member (Bru) working on an educational administrative system (Isa) and the government’s diploma administrative system (Isd).
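To make the notation concrete, the schema can be written down as a small data model. The following is a minimal Python sketch, not part of the TolFob implementation: the class and field names are illustrative assumptions, only two of the notation sets are spelled out as enumerations, and Business process 1 is encoded as an example instance.

```python
from dataclasses import dataclass
from enum import Enum


class Stakeholder(Enum):          # Sr
    STUDENT = "Srs"
    PROFESSOR = "Srp"
    EMPLOYER = "Sre"
    INSTITUTION = "Sro"
    GOVERNMENT = "Srg"


class InfoSystem(Enum):           # Is
    ADMINISTRATIVE = "Isa"
    DIPLOMA_TEACHING = "Isp"
    NON_DIPLOMA_COURSE = "Isn"
    GOV_DIPLOMA_ADMIN = "Isd"
    CERTIFICATE_ADMIN = "Isc"


@dataclass(frozen=True)
class Strategy:                   # Sy; the hybrid strategy Syh combines the switches
    cross_chain: bool             # Syc
    off_chain_storage: bool       # Syo
    privacy: bool                 # Syp
    security: bool                # Sys

    def describe(self) -> str:
        parts = [
            ("cross-chain", self.cross_chain),
            ("off-chain storage", self.off_chain_storage),
            ("privacy", self.privacy),
            ("security", self.security),
        ]
        return ", ".join(
            f"{'with' if enabled else 'without'} {name}" for name, enabled in parts
        )


@dataclass(frozen=True)
class BusinessProcess:
    stakeholder: Stakeholder      # Sr
    system: InfoSystem            # Is
    behavior: str                 # Br symbol, e.g. "Brs"
    outcome: str                  # Oe symbol, e.g. "Oep"
    transactions: tuple           # Tn symbols
    blockchain: str               # Bc symbol
    strategy: Strategy            # Sy
    applications: tuple           # An symbols


# Business process 1: student behavior and outcome sharing without privacy protection.
process_1 = BusinessProcess(
    stakeholder=Stakeholder.STUDENT,
    system=InfoSystem.NON_DIPLOMA_COURSE,
    behavior="Brs",
    outcome="Oep",
    transactions=("Tnt", "Tnb"),
    blockchain="Bcb",
    strategy=Strategy(cross_chain=False, off_chain_storage=True,
                      privacy=False, security=True),
    applications=("Ani", "Ano"),
)

print(process_1.strategy.describe())
```

Keeping the four strategy switches as separate booleans means a hybrid strategy is simply one combination of them, which is how the two example processes differ.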
3.2. Application Model
Once the business schema of a blockchain-integrated open learning scenario has been abstracted, our next step is to depict the application model. That is, we need to discern what the open learning application will be once a blockchain infrastructure is introduced. This section presents our proposed conceptual model for a consortium blockchain-integrated open learning application. The conceptual model for integrated and verified open learning applications is shown in Figure 4. The proposed model comprises eight tiers that can be further divided into three layers: a consortium blockchain layer; a cross-chain layer; and a trusted open learning application layer. The eight tiers are enumerated as follows from a top-down point of view: trusted open learning application tier; HEI learning system business abstraction tier; cross-chain tier; adaptation tier; contract tier; security tier; storage tier; and network tier.
The core components of the proposed model are as follows:
- (1)
Trusted open learning Application layer: This layer consists of the entire contents of the HEI learning system business abstraction tier and trusted open learning application tier.
Trusted open learning application tier: refers to a suite of open learning applications based on authentic data exchange that collects, stores, and manages designated data from existing HEI learning systems, such as credential sharing applications, learning achievements sharing applications, etc. These applications are defined as trusted and permissioned open to stakeholders.
HEI learning system business abstraction tier: this tier comprises four components: legacy HEI learning system data interface; open learning data composer; transaction manager; and identity controller. The legacy HEI learning system data interface runs on the legacy HEI system side; it encapsulates designated data and sends them to the data composer component in agreed formats. The open learning data composer constructs the metadata model and meta mapping, receives open learning data from legacy HEI learning systems, and sets the policies for correlation, combination, standardization, etc. The transaction manager works on both the application and legacy HEI learning system sides; it detects and analyzes invoked transactions and decomposes them into queries to the legacy HEI learning system interface. The identity controller provides a unified ID for data correlation among legacy HEI learning systems, applications, and blockchain storage;
- (2)
Consortium blockchain layer: This layer consists of the entire contents of the network tier as well as part of the contents of the storage tier, security tier, and contract tier.
Network tier: refers to normal IT infrastructures including the common service of IaaS that provides P2P overlay networks for a consortium blockchain.
Storage tier: includes a blockchain structure and an account model that jointly construct a distributed ledger for the network.
Security tier: comprises an encryption component, a consensus component, a privacy component, and a CA component to make data operations tamper-proof, immutable, and non-repudiable.
Contract tier: provides the capacity to customize business contracts that support the blockchain-entry function of selected business data.
This generic layer depicts the fundamental components of formulating a consortium blockchain network;
- (3)
Cross-chain layer: This layer consists of the entire contents of cross-chain tier and adaptation tier, as well as part of the storage tier, security tier, and contract tier.
Cross-chain tier: This tier comprises a service stack and a trans-backbone chain. The service stack provides the services necessary for heterogeneous consortium blockchain interaction, mainly including: multi-chain governance service, task management service, security management service, privacy management service, permission management service, and chain resources service. The trans-backbone chain consists of three core components: registration chain; relay chain; and trans-gateway. The service stack and trans-backbone chain together construct a unified facility for chain-chain interoperability.
Adaptation tier: This tier defines two core components of the task engine and standard API, which works on the consortium blockchains side to provide necessary adaptation to the cross-chain tier.
Contract tier: This tier provides three core components: standard contract; testing contract; governance contract.
Security tier: This tier comprises three core components: Testing interface; Trans-chain interface; unified CA.
Storage tier: the core component of this tier is extension storage model, which means an extra data storage mechanism acting as either off-chain or in-line chain as cache, to improve performance efficiency of the blockchain.
This conceptual eight-tier model outlines the generic components and logical relations for the complex blockchain-based open learning sharing application design. To date, the model carefully takes blockchain interoperability into consideration, and proposes a set of protocols to construct a unified and trusted infrastructure.
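The grouping of the eight tiers into three layers can be summarized as a lookup table. The sketch below is only a restatement of the text in Python; the dictionary layout and helper names are illustrative assumptions. It marks whether a layer takes the entire contents of a tier or only part of it, and derives which tiers are shared between layers.

```python
# Which tiers each layer draws on, and whether it uses the entire tier or part of it.
LAYER_TIERS = {
    "trusted open learning application layer": {
        "trusted open learning application tier": "entire",
        "HEI learning system business abstraction tier": "entire",
    },
    "cross-chain layer": {
        "cross-chain tier": "entire",
        "adaptation tier": "entire",
        "contract tier": "part",
        "security tier": "part",
        "storage tier": "part",
    },
    "consortium blockchain layer": {
        "network tier": "entire",
        "contract tier": "part",
        "security tier": "part",
        "storage tier": "part",
    },
}


def all_tiers():
    """Return the distinct tier names across all layers, sorted."""
    return sorted({tier for tiers in LAYER_TIERS.values() for tier in tiers})


def shared_tiers():
    """Return the tiers that contribute to more than one layer, sorted."""
    counts = {}
    for tiers in LAYER_TIERS.values():
        for tier in tiers:
            counts[tier] = counts.get(tier, 0) + 1
    return sorted(tier for tier, n in counts.items() if n > 1)


print(len(all_tiers()), shared_tiers())
```

The three shared tiers (contract, security, storage) are exactly those whose components are split between the generic consortium blockchain layer and the cross-chain layer.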
3.3. Integration Framework
After depicting the scenario schema and application model, our final step is to design a pragmatic software development framework as a guideline and unified means for implementation. This section presents our proposed blockchain integration framework for trusted open learning application development. The aim of the framework is to give open learning system designers a unified and comprehensive means to collaborate, without too much concern over issues such as openness, trustworthiness, or scalability, because these features are ensured by the nature of the blockchain network.
Figure 5 shows a concise view of the proposed framework that consists of three divisions: the trusted Open Learning Integration Application (OLA); the Pragmatic Blockchain System for open learning (PBS); and the Legacy HEI open learning System (LS). PBS stands at the center of these three divisions. It consists of six core components: Open learning Data Collecting, Chain-Entry Data Standard Preprocess, Unified Chain-Entry Interface, Chain-Entry/Enquiry/Cross Chain Middleware, Off-Chain Storage, and the back-end Consortium Blockchains.
A general process works in the following three steps. Step 1: the OLA designer analyzes the overall scheme of the data to be shared in the trusted open learning application and forms the open learning data specification; Step 2: the LS designer implements Data Sharing Interfaces according to the trusted open learning data specification; Step 3: the PBS designer implements the open learning Data Collecting for the legacy HEI open learning systems and a series of Chain-Entry transactions. Step 2 is comparatively trivial, so we discuss Steps 1 and 3 in the following.
In Step 1, the core work is to define an open learning data scheme and open learning Meta-data for a new trusted open learning application. The fundamental work before the Chain-Entry Data Standard Preprocess is to define a unified open learning data scheme that governs all the data from participating legacy HEI open learning systems, thus achieving unified open learning data processing and integration.
The open learning data scheme comprises three levels of considerations:
- (1)
Open learning Data format level: open learning business data format is defined using unified JSON specification;
- (2)
Open learning Data semantic level: define the same naming for different fields with the same meaning; for different business types, define fundamental data structure and extension data structure; adopt unified open learning Meta-data management to facilitate semantic recognition by code;
- (3)
Open learning Data security level: for sensitive open learning data, define data masking and encryption algorithm standard, including specifications related to data security and authorization, as well as digital signature and verification specification for Chain-Entry data.
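The three levels can be illustrated with one record. The following Python sketch is an assumption-laden illustration rather than the TolFob data scheme: the field names, the masking rule, and the use of HMAC-SHA256 as a stand-in for the CA-based digital signature are all hypothetical. It shows a JSON record (format level) split into a fundamental and an extension structure (semantic level), with a masked sensitive field and a signature that can be verified before Chain-Entry (security level).

```python
import hashlib
import hmac
import json


def mask(value: str, keep: int = 2) -> str:
    """Keep the first `keep` characters and replace the rest with '*'."""
    return value[:keep] + "*" * max(len(value) - keep, 0)


def build_record(business_type, base, extension, sensitive=()):
    """Assemble a record with a fundamental ('base') and an extension ('ext') part."""
    masked_base = {
        key: mask(str(value)) if key in sensitive else value
        for key, value in base.items()
    }
    return {"type": business_type, "base": masked_base, "ext": extension}


def canonical(record) -> bytes:
    """Serialize deterministically so that signer and verifier hash the same bytes."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(record, key: bytes) -> str:
    # HMAC is a stand-in here; a real deployment would use CA-issued asymmetric keys.
    return hmac.new(key, canonical(record), hashlib.sha256).hexdigest()


def verify(record, key: bytes, signature: str) -> bool:
    return hmac.compare_digest(sign(record, key), signature)


record = build_record(
    "learning_behavior",
    base={"student_id": "S20210001", "timestamp": "2021-05-30T08:00:00Z"},
    extension={"course": "construction modeling", "minutes": 45},
    sensitive=("student_id",),
)
signature = sign(record, b"demo-key")
print(json.dumps(record, indent=2))
print(verify(record, b"demo-key", signature))
```

Canonical serialization matters here: without sorted keys and fixed separators, two systems could serialize the same record differently and fail verification.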
Open learning Meta-data defining for new trusted open learning application is illustrated as follows:
A suite of open learning Meta-data is defined in this proposed pragmatic framework. Some core ones include legacy HEI open learning platform record, teacher record, student record, learning behavior record, learning outcome record, learning activities related to records, etc. A legacy HEI open learning platform record consists of university ID, platform ID, student records, etc. Student records and teacher records mainly consist of profiles. A behavior record consists of system log attendance, start learning or close learning a course, learning time span, etc. A legacy HEI open learning outcome record consists of uploaded learning achievements, grades and evaluations, etc. A legacy HEI open learning activity related to a student record can include: registration of student in learning platform; double-checking a student’s identity with biometric data such as face image; enrollment in a class, etc. Similarly, activities related to teacher record, behavior, and outcome include teacher registration, selection of a course, upload or update submission, etc.
In Step 3, the core work is to define cross-chain Interface and Chain-Entry Interface.
Cross-chain Interface specification is illustrated as follows:
Open learning process Smart contract, block, and open learning user account are abstracted as blockchain resources in a unified method. Access to blockchain resources can be reached at any site of a cross chain system through the combination of network, chain ID, and resource name, which forms the access address of the unified resources. The addressing path can be defined as: [Network]/[specific chain ID]/[Resource Name].
An atomic cross-chain access could be implemented by visiting the cross-chain path through an HTTP RESTful interface, so that resources can be accessed across systems with an HTTP URL.
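As an illustration of the addressing path, the sketch below builds and parses [Network]/[specific chain ID]/[Resource Name] addresses in Python. The helper names, the base URL `https://tolfob.example`, and the choice to append the resource method as a fourth URL segment are assumptions made for this example; the example values (shouNetwork, actChain, study, insertRecord) are those given for the Unified Blockchain API in Section 4.1.

```python
from typing import NamedTuple
from urllib.parse import quote, unquote


class ResourceAddress(NamedTuple):
    network: str
    chain_id: str
    resource: str

    def path(self) -> str:
        """Render the address as [Network]/[chain ID]/[Resource Name]."""
        return "/".join(quote(part, safe="") for part in self)


def parse_address(path: str) -> ResourceAddress:
    """Parse an addressing path back into its three components."""
    parts = path.strip("/").split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected network/chain/resource, got {path!r}")
    return ResourceAddress(*(unquote(part) for part in parts))


def resource_url(base_url: str, address: ResourceAddress, method: str) -> str:
    """Build an HTTP URL for invoking `method` on a resource (assumed layout)."""
    return f"{base_url.rstrip('/')}/{address.path()}/{quote(method, safe='')}"


address = ResourceAddress("shouNetwork", "actChain", "study")
print(address.path())
print(resource_url("https://tolfob.example", address, "insertRecord"))
```

Because the address is independent of any particular blockchain implementation, the same caller code can reach resources on different consortium chains by changing only the network and chain ID.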
Chain-Entry interface is illustrated as follows:
Chain-Entry interface adopts a RESTful paradigm.
A simplified open learning Chain-Entry process specification would be:
- (1)
Define unified open learning Chain-Entry data schemes, including time-stamp, CA signature, business key-value, and business data value;
- (2)
In accordance with the defined open learning Chain-Entry data scheme, define a suite of standard open learning smart contracts for data Chain-Entry and Chain-Enquiry, which support different implementations for specific consortium blockchains;
- (3)
Based on the suite of standard open learning smart contracts, define unified open learning data Chain-Entry and open learning Chain-Enquiry APIs that shield the differences among specific blockchain implementations, thus achieving cross-chain interoperability and enabling integration and scaling for future consortium blockchains.
During the open learning Chain-Entry process, the off-chain storage issue should be handled to store the bulk of open learning business data.
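The Chain-Entry process above can be sketched as a unified API that delegates to one adapter per blockchain. The Python code below is a simplified illustration under stated assumptions: the class names are hypothetical, an in-memory dictionary stands in for a real consortium blockchain, and the on-chain business data value is assumed to be the SHA-256 digest of the bulk data kept in off-chain storage.

```python
import hashlib
import time
from abc import ABC, abstractmethod


class ChainAdapter(ABC):
    """Standard contract surface that each specific blockchain must implement."""

    @abstractmethod
    def put(self, key: str, entry: dict) -> None: ...

    @abstractmethod
    def get(self, key: str) -> dict: ...


class InMemoryChain(ChainAdapter):
    """Stand-in for a real adapter (e.g., one wrapping a Fabric client SDK)."""

    def __init__(self):
        self._ledger = {}

    def put(self, key, entry):
        self._ledger[key] = dict(entry)

    def get(self, key):
        return dict(self._ledger[key])


class UnifiedChainAPI:
    """Hides adapter differences behind Chain-Entry and Chain-Enquiry calls."""

    def __init__(self, adapters, off_chain):
        self.adapters = adapters      # chain ID -> ChainAdapter
        self.off_chain = off_chain    # digest -> bulk business data

    def chain_entry(self, chain_id, key, payload: bytes, ca_signature: str,
                    timestamp=None):
        digest = hashlib.sha256(payload).hexdigest()
        self.off_chain[digest] = payload          # bulk data stays off-chain
        entry = {
            "timestamp": time.time() if timestamp is None else timestamp,
            "ca_signature": ca_signature,
            "business_key": key,
            "business_value": digest,             # only the digest goes on-chain
        }
        self.adapters[chain_id].put(key, entry)
        return entry

    def chain_enquiry(self, chain_id, key):
        """Return the bulk data and whether it still matches the on-chain digest."""
        entry = self.adapters[chain_id].get(key)
        payload = self.off_chain[entry["business_value"]]
        intact = hashlib.sha256(payload).hexdigest() == entry["business_value"]
        return payload, intact


api = UnifiedChainAPI({"actChain": InMemoryChain()}, off_chain={})
api.chain_entry("actChain", "diploma:42", b"diploma scan bytes", "sig-placeholder")
print(api.chain_enquiry("actChain", "diploma:42"))
```

Adding support for another consortium blockchain then amounts to writing one more `ChainAdapter` subclass; callers of `chain_entry` and `chain_enquiry` do not change.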
4. Results
This section presents our implementation and proof-of-concept application to serve as results of the proposed architecture.
As stated before, the TolFob architecture is specified to bring trusted data sharing to collaborative open learning systems. We consider the following requirements when implementing the architecture: (i) Trustworthiness: open learning provenance data must be collected and stored immutably and must be trustworthy; (ii) Transparency of provenance data sharing: blockchains are fundamentally transparent, with data and interactions visible to all open learning participants in the blockchain network; (iii) Privacy: open learning provenance data should be shared only between authorized personnel; (iv) Interoperability: open learning provenance data collected from legacy HEI open learning systems should be easily integrated.
The implementation of the TolFob integration framework is shown in Figure 6. It is divided into three modules: an on-chain module; an off-chain module; and a new trusted open learning application module. The on-chain module was implemented on the Hyperledger Fabric 1.4.4 LTS platform. Open learning information is stored in and retrieved from the blockchain by chaincodes implemented in the Go programming language. The RESTful API Service, the Unified Blockchain API, the Fabric client SDK, and the off-chain data storage compose the off-chain module, which is described in Section 4.1. The new trusted open learning application module is described in Section 4.2.
4.1. Implementation of Off-Chain Module
The RESTful API Service in the off-chain module allows TolFob to be integrated with any other HEI open learning platform or application through REST web services over HTTP. Its main objective is to let open learning platforms and applications easily store and query open learning provenance data on the blockchain. In the off-chain module, storing and querying provenance data on the blockchain is conducted by the Unified Blockchain API, and a specific operation on the blockchain is described by a resource URL and parameters. Table 1 shows an example usage of the Unified Blockchain API. The resource URL in Table 1 contains the blockchain network (shouNetwork), the blockchain ID (actChain), the resource name of the smart contract (study), and the resource method (insertRecord). The open learning resource specification involved is detailed in Section 3. The fields of the response JSON are shown in Table 2.
The Unified Blockchain API service can be configured with multiple blockchains, each of which has an adapter implementation. The Fabric adapter is implemented using the Fabric Client SDK in Java. Both the Unified Blockchain API service and the RESTful API Service are implemented on the Spring Boot framework. To speed up data access, an off-chain data storage is employed. When an HEI learning application submits a record to the blockchain, the RESTful API Service receives the request and stores the record in the off-chain data storage for indexing purposes before sending it to the blockchain. Each record in the data storage has a transaction hash field for tracking the corresponding transaction record in the blockchain. With the help of the off-chain data storage, a new trusted open learning application can easily aggregate blockchain records for specific topics, for example, aggregating users’ learning activities from the last month. The off-chain data storage is currently implemented on a MySQL database. Elasticsearch and MongoDB will be supported in the near future.
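The indexing flow described above can be sketched in a few lines. This Python example is illustrative only: it uses an in-memory SQLite database in place of MySQL, a hash function in place of the real blockchain submission, and a hypothetical table layout. It stores each record off-chain first, attaches the transaction hash once the chain call returns, and then aggregates activities per user for a date range without touching the chain.

```python
import hashlib
import json
import sqlite3


def open_index():
    """Create the off-chain index (SQLite here as a stand-in for MySQL)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE records ("
        "id INTEGER PRIMARY KEY, user_id TEXT, activity TEXT, day TEXT, tx_hash TEXT)"
    )
    return db


def fake_submit_to_chain(record) -> str:
    """Stand-in for the blockchain call; returns a pseudo transaction hash."""
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()


def submit(db, record, submit_to_chain=fake_submit_to_chain) -> str:
    # 1. Index the record off-chain before it is sent to the blockchain.
    cursor = db.execute(
        "INSERT INTO records (user_id, activity, day) VALUES (?, ?, ?)",
        (record["user_id"], record["activity"], record["day"]),
    )
    # 2. Send it to the chain and keep the transaction hash for later tracking.
    tx_hash = submit_to_chain(record)
    db.execute("UPDATE records SET tx_hash = ? WHERE id = ?",
               (tx_hash, cursor.lastrowid))
    db.commit()
    return tx_hash


def activities_per_user(db, since: str, until: str) -> dict:
    """Count indexed activities per user for since <= day < until (ISO dates)."""
    rows = db.execute(
        "SELECT user_id, COUNT(*) FROM records "
        "WHERE day >= ? AND day < ? GROUP BY user_id ORDER BY user_id",
        (since, until),
    )
    return dict(rows.fetchall())


db = open_index()
submit(db, {"user_id": "u1", "activity": "video", "day": "2021-05-03"})
submit(db, {"user_id": "u1", "activity": "quiz", "day": "2021-05-20"})
submit(db, {"user_id": "u2", "activity": "video", "day": "2021-04-28"})
print(activities_per_user(db, "2021-05-01", "2021-06-01"))
```

The stored transaction hash is what lets an application go from an aggregated off-chain row back to the corresponding on-chain transaction when the data needs to be verified.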
4.2. Implementation of Application Module
Before collecting data from legacy HEI open learning systems, a suite of open learning metadata should be defined. Details about the open learning metadata are described in Section 3. The open learning metadata is stored in a MySQL database, and metadata management and maintenance are implemented in an open learning application. Once the open learning metadata is defined, the data collecting service reads the data source definitions from the metadata, where the data access interface of each legacy HEI learning system is defined. The data collecting service supports both pull and push mechanisms, and the communication data is in JSON format. The pull mechanism is implemented by a periodically scheduled task that pulls data from a legacy HEI learning system through an HTTP RESTful URL; that is, the legacy HEI learning system must provide a RESTful API for pulling data, and the OAuth 2.0 client credentials protocol is used to protect data resources. The push mechanism is implemented by a RESTful API that runs in the open learning data collecting service and accepts data records from legacy HEI open learning systems.
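For the pull mechanism, the two HTTP requests involved in an OAuth 2.0 client credentials exchange can be sketched as follows. This Python example only constructs the requests and does not send them; the URLs, client ID, and secret are hypothetical placeholders, and the actual endpoints of a legacy HEI learning system would come from the data source definitions in the metadata.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def token_request(token_url, client_id, client_secret, scope=None) -> Request:
    """Build the client credentials token request (grant_type=client_credentials)."""
    form = {"grant_type": "client_credentials"}
    if scope:
        form["scope"] = scope
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return Request(
        token_url,
        data=urlencode(form).encode(),
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def pull_request(pull_url, access_token) -> Request:
    """Build the data pull request that presents the bearer token."""
    return Request(
        pull_url,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json",
        },
    )


# Hypothetical endpoints of a legacy HEI learning system.
token_req = token_request("https://lms.example/oauth/token", "collector", "s3cret")
data_req = pull_request("https://lms.example/api/records?since=2021-05-01", "tok")
print(token_req.get_method(), token_req.full_url)
print(data_req.get_method(), data_req.full_url)
```

A scheduled task would send the first request (for instance with `urllib.request.urlopen`), read `access_token` from the JSON response, and then send the second request on each polling cycle until the token expires.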
After a data record is collected, the HEI open learning data preprocessing service parses the record, extracts the data source and data type fields, and reads the metadata schema definition from the metadata store. The metadata schema defines rules that contain field name mappings, indexing fields, filtering fields, and encryption fields and algorithms, and these rules are applied in order to convert the record. The conversion yields the result data record and the indexing fieldsets, which are stored in off-chain storage to speed up data access. According to the data source and data type, the data preprocessing service reads the destination blockchain resource mapping and submits the result record together with the indexing fieldsets to the RESTful API service, which finishes the on-chain processing. Once the data from diverse legacy open learning systems are put on chain, open learning data sharing and interaction can be easily achieved among all participating open learning applications in the blockchain network.
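The rule-driven conversion can be sketched as a single function that applies the four rule types in the order listed. The Python code below is an assumption-based illustration: the schema keys (`rename`, `index`, `drop`, `protect`) and field names are hypothetical, and a one-way SHA-256 digest stands in for whatever encryption algorithm the metadata schema actually specifies.

```python
import hashlib


def preprocess(record: dict, schema: dict):
    """Apply schema rules in order; return (result record, indexing fieldset)."""
    # 1. Field name mappings: unify names coming from different legacy systems.
    result = {schema["rename"].get(name, name): value
              for name, value in record.items()}
    # 2. Indexing fields: copied out for the off-chain index. They are taken
    #    before protection, so protected fields should not also be index fields.
    index = {name: result[name] for name in schema["index"] if name in result}
    # 3. Filtering fields: dropped before the record goes on chain.
    for name in schema["drop"]:
        result.pop(name, None)
    # 4. Encryption fields: protected here with a SHA-256 digest as a stand-in.
    for name in schema["protect"]:
        if name in result:
            result[name] = hashlib.sha256(str(result[name]).encode()).hexdigest()
    return result, index


schema = {
    "rename": {"stu_no": "student_id"},
    "index": ["student_id", "course"],
    "drop": ["debug"],
    "protect": ["id_card"],
}
raw = {"stu_no": "S1", "course": "C9", "debug": "x", "id_card": "123"}
converted, index_fields = preprocess(raw, schema)
print(converted)
print(index_fields)
```

Keeping the rules in a per-source schema rather than in code means a new legacy system can be onboarded by adding metadata, without changing the preprocessing service.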
4.3. Implementation of a Proof-of-Concept Application
We designed and implemented a proof-of-concept application named “trusted open learning behavior and achievement management” to validate the proposed architecture and framework. The proof-of-concept application involved three organizations in China: an open university, a remote-teaching college, and a company that recruits students from the university and the college. We selected four legacy HEI information systems: a University Diplomat online learning platform (UDP for short), a University Non-Diplomat open course platform (UNDP for short), a university credential management system (UCM for short), and a college credential management system (CCM for short). The first three systems were owned by the university, and the last one was owned by the college. The four systems had been designed, implemented, and maintained by four different EdTech software vendors, and all of them were still running in production. Some of their historical operating data are listed below:
Data volume of UDP (as of 30 May 2021): 3154 courses opened, 98,181 teachers and students, 44,000 course resources, a data capacity beyond 7 Terabytes, 376 million formative assessment assignments and self-tests, 32,600 units of student online graduation guidance, 23,968 uploaded papers, 180,000 units of teacher online guidance, 9653 iterations of teaching activity, 750,000 student posts, 1,700,000 teacher posts.
Data volume of UNDP: 19,333 users, 38 courses, 1818 students this year (2021), 4743 h of study time, 40,708 learning activities in total, and an average of 12 learning activities per (activated) student.
Adopting the proposed architecture and framework, we collaborated with the four EdTech software vendors and jointly designed and developed the new proof-of-concept application.
Figure 7 shows a screenshot of the newly developed trusted application.
We selected this scenario because it stems from a typical open learning data sharing requirement: some (not all) registered students of the college also take part in the university’s online open courses, and some even hope to obtain the university’s diploma as an additional qualification. They therefore study at, and receive credentials from, both the college and the university through the HEI information systems. The company wants to recruit students from both the university and the college. A trusted learning behavior and achievement management application will therefore greatly benefit all the stakeholders in this specific open learning sharing scenario.
Based on the proposed blockchain implementation, we first defined the learning metadata for the four legacy HEI learning systems. The main data scope includes teacher profile data, student profile data, student learning behavior data, student learning achievement data, student course scores, and student credentials. Some key fields of the open learning metadata are exemplified in Table 3.
According to the defined open learning metadata, we implemented the smart contracts for learning data entry into the blockchain. Software engineers from the four EdTech vendors implemented data interfaces that extract data from the legacy open learning systems and encapsulated these interfaces as RESTful API services. Our blockchain network invokes these services, parses the data, and invokes the related smart contracts to write the open learning data to the blockchain.
In the front end of the proof-of-concept application, we developed correlation functions that retrieve the integrated data and verify their trustworthiness through the data-trace function provided by Fabric’s smart contracts. Based on these correlated data, profiles of students learning in the four legacy HEI systems are created. Recruitment staff of the company can log in to the newly developed trusted application and request the learning data of students they are interested in. Students can log in to the same application, browse their learning data from the four legacy open learning systems, and decide whether to grant the requesting company access to these data.
4.4. Result of the Proof-of-Concept Application
4.4.1. Functionality
The newly developed “trusted open learning behavior and achievement management” application enables different open learning data originally dispersed in the four separated systems to be collected and stored into the Hyperledger Fabric network, immutably and in a trustworthy manner.
To date, the proof-of-concept system can process not only newly produced learning data but also historical learning data, as stated in Section 4.4, i.e., the historical data stored in UDP and UNDP. This illustrates that our proposed blockchain-integrated system has a strong capacity for historical data traceability in cross-platform open learning and educational resource sharing, which could greatly benefit data interaction among decentralized HEIs by avoiding pre-definition between and among different HEIs, which is hard to achieve.
4.4.2. Performance
We examined the workload of the proof-of-concept system in a production environment over 5 months and compiled statistics on the hourly on-chain workload, shown in Figure 8. The peak workload occurs between 19:00 and 20:00, when the number of transactions is 27,286, that is, 7.58 per second on average. In our proof-of-concept application, the submission and feedback timestamps of each on-chain record are written to the off-chain record, so the response time of on-chain processing is simply the feedback time minus the submission time. The average and 95th-percentile on-chain response times are 1.6 s and 5.6 s, respectively. These performance data illustrate that our proposal is not merely a research prototype; it is capable of use in industrial production systems.
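The response-time calculation above can be sketched in Python as follows. The record layout (`submitted` and `feedback` ISO timestamps) and the three sample records are made up for illustration. The nearest-rank percentile method is likewise an assumption, since the paper does not state which percentile definition was used. Only the peak-hour figure of 27,286 transactions comes from the text.

```python
import math
from datetime import datetime
from statistics import mean


def response_times(records):
    """Return on-chain response times in seconds (feedback minus submission)."""
    return [
        (
            datetime.fromisoformat(r["feedback"])
            - datetime.fromisoformat(r["submitted"])
        ).total_seconds()
        for r in records
    ]


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Peak-hour throughput from the text: 27,286 transactions in one hour.
peak_tps = 27286 / 3600

# Made-up off-chain records, for illustration only.
records = [
    {"submitted": "2021-05-30T19:00:00", "feedback": "2021-05-30T19:00:01"},
    {"submitted": "2021-05-30T19:00:05", "feedback": "2021-05-30T19:00:07"},
    {"submitted": "2021-05-30T19:00:10", "feedback": "2021-05-30T19:00:16"},
]
times = response_times(records)
average = mean(times)
p95 = percentile(times, 95)
```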
5. Discussion
In this section, we adopt both qualitative and quantitative methods to discuss our proposed architecture, implementation, and proof-of-concept application in detail. Section 5.1 discusses the performance of the proof-of-concept system with extended experiments to demonstrate its scalability. Section 5.2 further discusses our work in comparison with previous works.
5.1. Scalability Experiment
In Section 4.4, we presented the performance of the proof-of-concept application, which illustrates that our proposal is suitable for production systems. However, the proof-of-concept system is constructed as a 4-node consortium blockchain network and integrates only four legacy open learning information systems. To demonstrate the generality of our proposal, more validation work on scalability is needed. This validation can proceed in two directions: (1) increasing the number of consortium blockchain nodes; and (2) increasing the number of joining legacy open learning systems. Direction 1 falls into the blockchain technology category and involves sophisticated topics such as consensus algorithms, which are beyond the scope of this study. In our research, we assume that, for a fixed consortium blockchain implementation version, performance varies approximately linearly with the number of blockchain nodes. This assumption is reasonable because, in an open learning ecosystem, the number of blockchain nodes (which could correspond to the related legacy open learning systems) is usually not large; we define a threshold of 20 (i.e., not exceeding 20) in our research. Direction 2 is the focus of our discussion and needs to be answered with quantitative test data. Technically, more joining legacy open learning systems means a heavier workload of blockchain-entry data. Therefore, we designed a suite of workload testing experiments to analyze whether our framework’s scalability could meet the pre-defined threshold of 20. In addition, as an illustration for readers interested in direction 1, our experiments also include a comparison between running on 1 blockchain node and on 4 blockchain nodes.
These experiments evaluated the performance of smart contract invocation (blockchain transaction writes) and transaction record queries (blockchain transaction reads). The testbed system, in line with the production proof-of-concept system, was built upon Hyperledger Fabric 1.4.4 with four organizations. Each organization was served by a virtual machine (node) running orderer, CA, and peer services, and the Raft algorithm was adopted among the four orderer services. The hardware configuration of each node in the experiment testbed, compared with that of the production system, is shown in Table 4.
All four virtual machines in the experiment testbed were hosted on one physical machine running the CentOS 7.5 operating system, with one Intel i7-8700K CPU (3.7 GHz, 6 cores, 12 threads), 32 GB of RAM, and a 2 TB SATA3 hard disk.
On each node in the experiment testbed, we deployed a unified blockchain RESTful server to access the underlying Fabric blockchain system. To simulate transaction workloads, JMeter was used to generate HTTP requests to Nginx, which ran on the host machine and served as a proxy to the unified blockchain RESTful servers.
To test the performance under different workloads, JMeter generated the HTTP requests concurrently with thread counts of 100, 200, 300, 400, 500, and 600. The ramp-up period was set to 20 s, and the thread loop count was set to 100. We tested both smart contract invocation (blockchain transaction writes) and transaction record queries (blockchain transaction reads), and measured system performance by two indicators: throughput and latency.
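To make the roles of the three load parameters concrete, the following Python sketch mimics the thread-group semantics described above: worker threads are started evenly across the ramp-up period, and each one issues a fixed number of requests. It is not JMeter and not the authors' test harness. The `run_load` function, the stand-in `fake_request` target, and the scaled-down parameter values are hypothetical and chosen only so the sketch finishes quickly.

```python
import threading
import time


def run_load(send_request, threads, ramp_up_s, loops):
    """Start `threads` workers evenly over `ramp_up_s`; each calls `send_request` `loops` times."""
    latencies = []
    lock = threading.Lock()

    def worker(start_delay):
        time.sleep(start_delay)  # stagger thread starts across the ramp-up period
        for _ in range(loops):
            started = time.perf_counter()
            send_request()
            elapsed = time.perf_counter() - started
            with lock:
                latencies.append(elapsed)

    begin = time.perf_counter()
    pool = [
        threading.Thread(target=worker, args=(i * ramp_up_s / threads,))
        for i in range(threads)
    ]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    wall = time.perf_counter() - begin
    return {
        "requests": len(latencies),                    # threads * loops
        "throughput": len(latencies) / wall,           # requests per second
        "avg_latency": sum(latencies) / len(latencies),
    }


def fake_request():
    # Stand-in for an HTTP call to the RESTful server; sleeps for about 1 ms.
    time.sleep(0.001)


# Scaled-down run (the experiments used up to 600 threads, 20 s ramp-up, 100 loops).
stats = run_load(fake_request, threads=10, ramp_up_s=0.1, loops=5)
```

With the parameters reported in the text, a run with 600 threads and a loop count of 100 would issue 60,000 requests in total.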
Figure 9 shows the performance of throughput and latency for the smart contract invocation.
In Figure 9, two sets of tests were performed by changing the upstream section of the Nginx configuration file: requests were dispatched to a single unified blockchain RESTful server (n = 1) or to four unified blockchain RESTful servers (n = 4) under a round-robin policy. As the number of threads increased, the throughput grew until it reached the threshold (500, as shown in Figure 9), and the average latency grew because, when more simultaneous requests arrive at the unified blockchain RESTful server, each request spends more time waiting rather than being processed.
As with the performance test of smart contract invocation, Figure 10 shows the results of the performance test of transaction record queries. For the same reason, the average latency grows as the number of threads increases. Because a transaction query on a node can be processed against the local ledger blocks without relying on the global ordering service, the throughput grew markedly when the number of threads increased from 100 to 200 with n = 1, and from 100 to 400 with n = 4. However, as the number of threads continued to increase, the throughput grew slowly or changed little. While the test was running, we inspected disk I/O utilization with the iostat command on the host machine and found that it stayed at 100% for long periods when the number of threads was greater than 400. The results show that once the disk I/O limit is reached, more simultaneous requests only increase latency without improving query throughput.
Balancing throughput against latency across the experiments, 200 threads was a good choice, at which the throughput and average latency of smart contract invocation were 126 tps and 724 ms, respectively, with n = 1.
Table 5 compares the performance of the experiment testbed and the production system. How the workload and average response time were obtained from the production system is described in Section 4.4. The hardware configurations of the experiment testbed and the production system are shown in Table 4. The results in Table 5 show that the throughput benchmark in the experiment was 17 times the workload of the production system, even though the hardware in the experiment had a much lower configuration than that of the production system. Therefore, the architecture and implementation proposed in this paper can fully satisfy the performance requirements.
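The factor of 17 can be checked against figures quoted elsewhere in the text. The short Python calculation below assumes that the throughput benchmark in Table 5 is the 126 tps write throughput measured at 200 threads with n = 1, and that the production workload is the peak-hour figure of 27,286 transactions. Table 5 itself is not reproduced here, so this pairing is an assumption.

```python
# Figures quoted in the text.
peak_hour_transactions = 27286  # production peak hour (19:00 to 20:00)
testbed_tps = 126               # testbed writes at 200 threads, n = 1

# Average production throughput during the peak hour, in transactions per second.
production_tps = peak_hour_transactions / 3600

# How many times the production peak the testbed benchmark represents.
ratio = testbed_tps / production_tps
```

Under these assumptions the ratio comes out between 16 and 17, which rounds to the factor of 17 stated above.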
When the system is scaled up, that is, when more legacy open learning systems join our blockchain network, the testbed with its lower hardware configuration could still support 17 joined systems while achieving the same acceptable performance as the proof-of-concept production system. Furthermore, TolFob was implemented upon CrossChain middleware, which can dispatch workloads across multiple blockchain networks; thus, higher throughput can be achieved by adding more groups of computers and storage. Considering these two factors, it is reasonable to conclude that our framework can handle the assumed threshold of 20 joining open learning systems. Therefore, the architecture and implementation proposed in this paper show good extensibility for data sharing scenarios in open learning environments.
5.2. Comparison with Previous Work
Section 2 has briefly introduced this study’s differentiation from previous works. This section elaborates on a more detailed comparison both qualitatively and quantitatively.
We and previous researchers share a consensus that blockchain technology is a disruptive and promising solution for the open learning data sharing issues. However, due to the complexity of blockchain technology, there are different preferences and views in published papers. Table 6 summarizes these differences.
The first comparison dimension we selected is blockchain type and implementation, i.e., which type of blockchain network each study adopted. The authors of [4,9,27,28] adopted a consortium blockchain with Hyperledger Fabric implementations (different minor versions, depending on when the research was conducted). The authors of [15,16] adopted the public blockchains Ark and Ethereum. The authors of [2] adopted a proprietary hybrid blockchain. We adopted a consortium blockchain and presented a Hyperledger Fabric-based cross-chain extension.
The second comparison dimension we selected is the integration capacity of the proposed blockchain. All the examined studies except [2] provided smart contracts to construct trusted applications. This demonstrates that the smart contract has become a fundamental inner schema in blockchain-enhanced application integration. However, only we and the authors of [2,27] provided a further out-smart-contract integration function. Moreover, both papers [2,27] only discussed the outer integration function qualitatively, without presenting a concrete development framework. We provide a pragmatic software development framework and implementation that can be used directly both as a guideline for HEI information system integration and as a unified means of achieving it.
The paper [2] is a theoretical study on software architecture in an open learning context. This software view is similar to ours. However, the authors merely proposed an architecture (combining hybrid blockchain and microservices), discussed it qualitatively, and provided no implementation or performance data. The paper [4] focuses on a blockchain-based secure storage and sharing scheme for MOOC learning, a context narrower than but very close to that of our study. Like us, the authors adopted a consortium blockchain with a Hyperledger Fabric 1.4 implementation. However, they primarily focused on smart contract design and deployment, and did not make any cross-chain extensions, nor did they provide an outer integration function or development framework, as we did. The case is similar in study [9], where the authors focused on the specific theme of Digital Education Resources Authentication. The paper [15] focuses on credit exchange in higher education, which, similar to cryptocurrency, is a very specific theme with a much lower data volume than ours. This explains the difference in strategy between us and those authors: they chose a public chain with the open-source Ark implementation and did not consider an outer integration function. The paper [16] focuses on student credential sharing. Its research scope and data volume lie somewhere between those of this paper and [15]. The authors therefore adopted a public chain with an Ethereum implementation, and did not consider an outer integration function either. The paper [27] focuses on campus information system integration, which is similar to our integration view. Moreover, its authors adopted a consortium blockchain with a Hyperledger Fabric implementation. However, they did not provide a concrete integration framework. The paper [28] is a study of trusted data management, not in the education field but in the context of edge computing. Compared with our open learning context, edge computing involves an IoT network and produces a huge data volume. The authors designed and implemented an extended blockchain based on Hyperledger Fabric 1.3, but did not consider the integration issue either.
The third comparison dimension we selected is system or implementation performance. As shown in Table 6, only three studies [4,16,28] presented performance data. We carefully examined the data published in these three papers. In study [4], the performance was given in the form of algorithm computational cost, which cannot be compared with our system response time. Meanwhile, studies [16,28] reported the response times of their implementations. Comparisons of our study with these papers are listed in Table 7.
Table 7 shows the response time comparison of our work with the papers [16,28]. Our work is based on Hyperledger Fabric 1.4, with extensions including cross-chain and integration functions. The study [28] was based on Hyperledger Fabric 1.3, with an extension on security. The study [16] used Ethereum. The data size and experimental configuration were not exactly the same in the three studies. However, as the publication dates are quite close to each other ([16] in 2021 and [28] in 2020), and considering that the configuration we adopted is a common one as of 2020, it is reasonable to say that the configuration differences would not greatly affect our comparison in this case. To be more persuasive, we selected a higher workload of 20,000 simultaneous requests in our system, compared with 2000 in [28] (10 times) and 1000 in [16] (20 times), to offset the possible configuration bias. The average response time of “transaction query” in [16] was 16 s, which is much larger than 0.79 s (in [28]) and 0.52 s (in our work). The average response time of “transaction write” in [16] was also 16 s, which is again much larger than 0.72 s (in our work; not available in [28]). The authors of [16] also suggested that the response time could be reduced by using other platforms such as Hyperledger Fabric. These results demonstrate that consortium blockchains such as Hyperledger Fabric (which has high computational capabilities and supports time-efficient consensus algorithms) are more suitable for data sharing in an open learning environment than public chains such as Ethereum. The comparison also indicates that our implementation outperforms the other similar works that we have investigated.
To date, previous works have mainly focused on a specific application theme in open learning data sharing, with emphasis on features such as security, efficiency, scalability, and trust. Therefore, they leverage blockchain’s intrinsic features and conduct research on smart contracts. However, it is also very important to make the blockchain network open and able to incorporate various data from different HEI systems more easily. To achieve this goal, we adopted both a software architecture perspective and a business feature perspective in this study, and implemented Fabric extensions such as CrossChain and out-smart-contract integration functions to make the blockchain a more unified and transparent integration infrastructure for legacy HEI systems. Other works had no such feature and would find it difficult to handle this complicated open learning system interoperation scenario.
6. Conclusions and Future Work
To resolve the cumbersome interoperability issue of authentic data sharing across the open learning education ecosystem, a consortium blockchain is leveraged and extended in our study. The core of our research is an overall architecture consisting of an open learning business schema, a conceptual application model, and a pragmatic blockchain integration framework, which serves as a guideline and infrastructure for developing blockchain-enhanced trusted open learning applications. The results of our implementation and proof-of-concept indicate that our extended consortium blockchain framework is capable of integrating multiple HEI open learning systems, with an average response time of 1.6 s when no more than 20 systems are integrated. To the best of our knowledge, this result outperforms the other research findings we have investigated. On this basis, and under the assumption that the number of related information systems in a specific open learning ecosystem usually does not exceed 20, the proposed architecture and framework have the potential to be widely adopted in open learning data sharing scenarios as a trusted and unified interoperation infrastructure, helping to create better open learning platforms and bridge the gap in the present open learning ecosystem. This further implies that our research findings may play an important role in the creation of disruptive open learning business models and flexible open learning ecosystems enabled by blockchain technology.
This research work still has some limitations. Blockchain in education is essentially for stakeholders of the ecosystem to establish standardization and validation of HEI educational systems to mitigate fraud. In this regard, the limitation of our work mainly lies in two aspects: a lack of data governance specification and cross-chain standards. In order to provide a unified open learning data-sharing infrastructure solution, there should be a specification of data governance for all stakeholders to abide by when integrating legacy open learning systems, especially for the procedure or workflow consensus. Furthermore, a cross-chain standard would be more efficient and elegant for the interoperation of blockchain. Our work focuses on application architecture and software framework, while simplifying these two aspects by proposing weak substitute versions.
In light of these limitations, our future research will include proposing a more complete cross-chain protocol and core component implementation, adapting more open-source consortium blockchain implementations (besides Hyperledger Fabric), inviting more open learning HEI stakeholders to connect to our blockchain network, and promoting a team standard for open learning system data governance.