
STUDY UNIT 1

Data and the computerised information system process

Jelonek (2017) posits that data, information and knowledge have always played a critical
role in business. Companies need new solutions for data processing and analysis because
the amount of data that can be collected and stored is ever increasing. Following
the 4 Vs (volume, variety, veracity and velocity) of big data, which we learnt about in
AIN1501, organisations use data and analytics to gain valuable insights that inform better
business decisions. In all this, cloud computing technology simplifies the time-consuming
processes of hardware provisioning, hardware purchasing and software deployment, and
revolutionises the way computational resources and services are commercialised and
delivered to customers. Cloud computing technology shifts the location of this
infrastructure to the network so as to reduce the costs associated with the management
of hardware and software resources. This means that the cloud represents the long-held
dream of computing as a utility, a dream in which the economy of scale principle helps
to effectively drive down the cost of computing infrastructure (Sakr & Gaber 2014).

1.1 Introduction

In AIN1501 we learnt, amongst other things, that the world's most valuable resource is
no longer oil, but data, and that companies have a rapidly growing volume of data
sourced from different areas of the business. In the relevant study units 1, 2, 3, 6 and
11 (UNISA 2022), the basic principles that will help you understand and contextualise
applications were provided as follows:
a. Study unit 1 provided you with the foundation of how organisations use information
systems to manage their information, reduce uncertainties and costs, and increase
revenues and service delivery.
b. Study unit 2 dealt with information systems (IS) as a tool that handles the flow and
maintenance of information supporting business operations, and identified its four
components as people, equipment, procedures and data. It also covered the
principles relating to Accounting Information System (AIS) implementation in an
ERP environment and decision-making.
c. In study unit 3, you learnt about the characteristics of information.
d. In study unit 6, you learnt that Big Data is a term referring to the collection of data
which is so large that it becomes difficult to store and process using traditional
databases and data processing applications. It described data sets so large and varied
that they are beyond the capability of traditional processing.
e. Lastly, key to study unit 11, was to create an understanding of how society is
becoming more dependent upon computer and communications technology, where
many would argue that we have left the industrial age behind, and the information
age has taken over. Enterprise systems were described as commercial software
packages that enable the integration of transaction-oriented data and business
processes throughout an organisation. This included Enterprise Resource Planning
(ERP) software and related packages, such as advanced planning and scheduling,
sales force automation and product configuration, among others. The intention was
ultimately to create an understanding that information is used by different
organisations for different purposes.

In this study unit, we will first provide an understanding of the concept of digitalisation of
the economy, the use of Big Data technology, and the role of data in an organisation. We
will then look at the types of data input, processing and output and typical processing
systems and also learn about some of the methods used to process data into
information in an organisation.

The learning outcomes of this study unit are as follows:


• Distinguish between the different methods of data input, data processing and
information output.
• Apply database terminology, applications and structures.
The following icons are included in this study unit:

• This icon indicates that you need to self-reflect.
• This icon indicates that you need to do some critical thinking.
• This icon illustrates an activity you must complete.
• This icon indicates that you need to refer to AIN1501.

1.2 Digitalisation of the economy


According to Sadyrin, Syrovatskay and Leonova (2021), the digitalisation of the
economy, as the main direction of technological and economic development,
determines a qualitatively new level of work with information and data. However, the
avalanche-like growth of generated data creates the need to develop new analytical
tools that allow organisations to structure and systematise large arrays of
heterogeneous data and use them in forming management decisions. Working with Big
Data technologies requires powerful computing resources that can operate on huge
amounts of information in real time. An estimated 84 per cent of enterprises believe
that those without an analytics strategy run the risk of losing a competitive edge in the
market. Currently, Big Data is widely used in the financial sector, but mainly in tasks
related to the need to accumulate and process large volumes of statistical information,
in simulation modelling, in predictive analytics, in marketing and in the visualisation of
analytical data. At the same time, the leaders in the implementation and application of
digital technologies are large companies at national and international level, which have
the necessary resources and funds for this. Industries that have adopted the use of big
data include financial services, technology, marketing and health care, to name a few.
For economic entities in the medium and, even more so, small business segment, the
use of big data is difficult and expensive in terms of independently organising the
storage, processing and protection of information, as well as developing the necessary
analytical tools and data management systems. However, in the future, with the
development of digitalisation and the creation of national databases and digital
platforms at state level, built using cloud technologies and blockchain, many economic
entities can gain access to previously inaccessible resources and use the advantages
of working with Big Data in their activities.

1.3 Use of Big Data technologies


Furthermore, Sadyrin, Syrovatskay and Leonova (2021) state that the implementation
of digital technologies requires technological solutions to a number of complex
problems, for example: (i) how to organise the storage of a large amount of data, and
(ii) what physical media should be used for this in order to ensure verification, backup
and protection of information. Although solutions that enable the potential of digital
technologies have been applied for a long time, the problem is that they are not used in
the field of financial analysis. For example, when working with the personal data of
individuals and legal entities, the need to monitor compliance with legislative and legal
norms for working with data can be added. The main task in improving the methodology
of financial analysis is to modernise the approaches to determining the information
base so that it takes into account the availability of off-accounting data. This means that
the transformation of the information base of financial analysis should provide for the
possibility of using structured and unstructured data in analytical procedures. One
possible solution to such problems may be the creation of the necessary databases at
state level and the provision of access to them to interested users for a fee. In this
regard, public authorities in general, and central banks and financial supervisory bodies
in particular, have started to collect more data to support their decisions. In addition,
private commercial data providers have also expanded their activities. Different
organisations have different solutions.

1.4 Data in an organisation


With reference to figure 1.1, data is generated during different activities (a) in the
organisation. This data is continuously collected and entered (input) (b) into the
Computerised Information System (CIS), which processes the data (c) and stores data
and information (d) about customers, suppliers, inventory, competitors, employees,
finances and business operations, to name only a few, so that the organisation can
ultimately make informed decisions. Therefore, according to Merzbach (2017), a
challenge facing companies today is processing large, often unstructured, amounts of
data to gain valuable information that can then feed into strategic, tactical and
operational business decisions. Such data is generated not only by the actual
day-to-day business, but also by external data sources, such as social networks.
According to Sakr and Gaber (2014), information from multiple sources is growing at a
staggering rate, with Twitter generating more than 12 TB of tweets, Facebook more than
25 TB of log data and the New York Stock Exchange (NYSE) capturing 1 TB of trade
information. About 30 billion radio-frequency identification (RFID) tags are created
every day, and adding to this mix is the data generated by the hundreds of millions of
GPS devices sold every year and the more than 30 million networked sensors currently
in use (growing at a rate faster than 30% per year). These data volumes are expected
to double every two years over the next decade. On the other hand, many companies
can generate up to petabytes of information in the course of a year. According to many
estimates, as much as 80% of this data is semi-structured or unstructured. This data
comes from web pages, blogs, clickstreams, search indices, social media, forums,
instant messages, e-mail, documents, consumer demographics, sensor data from
active and passive systems, and more. Thus, according to Kuan-Ching Li, Hai Jiang
and Albert Y. Zomaya (2017), the many advances we witness today are brought about
by developments in algorithms, high-performance computing, databases, data mining,
machine learning and so on. Big Data management and processing should therefore
prove very beneficial for researchers and graduate students focusing on Big Data in an
academic institution, and serve as a very useful reference for practitioners and
application developers.

Lizhe Wang, Yan Ma, Jining Yan, Victor Chang and Albert Y. Zomaya (2018), in their
study on massive, large-region coverage, multi-temporal, multi-spectral remote sensing
(RS) datasets, postulated that these are employed widely due to the increasing
requirements for accurate and up-to-date information about resources and the
environment for regional and global monitoring. In general, RS data processing involves
a complex multi-stage processing sequence, which comprises several independent
processing steps according to the type of RS application.

1.4.1 Data generation in an institution of learning

To create a comprehensive understanding of data in an organisation, we use an
institution of learning with which we are familiar. Institutions of learning like Unisa have
large amounts of students' information available on different IT platforms that can be
collected, entered into a CIS, processed, stored and retrieved for use in decisions
across many areas of teaching, learning and administrative processes. Contemporary
learning experiences generate a surplus of data that instructors, instructional designers,
program developers, information technology professionals and administrators can
utilise in the design, management and implementation of instruction. In this regard,
learning management systems (LMS) generate a variety of structured and unstructured
data. Such data is produced by learning and administrative systems, task-specific
learning tools and student work created with information and communication
technologies. Through this, a number of data sources in existing eLearning
technologies, including LMS, blogs and task-specific online tools (e.g., Quizlet), can
present educators with valuable information that can be used for decision making. The
challenge for educators in these roles is to translate these data into meaningful
information that can support learning. However, while the analysis, interpretation and
representation of raw data from Information Technology (IT) systems requires high-level
programming, statistical and graphics/information design skills, there are a variety of
ways that learning data can be mined without specialised knowledge beyond
professional training in an educator's area of design, teaching, administrative or
instructional expertise (Voithofer & Golan, 2018).

Maxwell (2021) states that while data on learning outcomes is essential for monitoring
improvement in those learning outcomes, other data are important for interpreting the
learning outcomes data. These include, for example, student background and interests,
student and parent expectations, perceptions and satisfaction, school climate, teaching
practices and interactions with students, and student destinations and
accomplishments. These data could also be considered process data, since they
reference teaching styles and teaching qualities. However, since they involve reflection
on the teaching after it has occurred, perhaps they should be considered outcome data.
He referenced Bernhardt's (1998) claim that what separates successful from
unsuccessful (schools) institutions of learning is their use of data, saying that those who
analyse and utilise information about their institutional communities make better
decisions not only about what to change, but about how to institutionalise systemic
change.

Voithofer and Golan (2018) state that currently, the dominant form of learning with
technology, through Learning Management Systems (LMS), can be found at all levels of
education. Accordingly, learning analytics, which is the process of collecting, organising
and analysing large amounts of e-learning data, can benefit administrative decision
making and resource allocation, highlight an institution's successes and challenges, and
increase organisational productivity. According to Maxwell (2021), attention should be
directed to some future challenges that may change not only the way we think about
using data to improve student learning, but also the way schools (institutions of learning)
may operate.

1.4.2 Other examples of data generation

More examples: data is generated each time a customer completes an order form, each
time the organisation produces an inventory item, and each time a potential employee
completes a job application form.

Example 1.1

Review the computerised information system (CIS) process as explained to you in your
AIN1501 (study unit 3) studies.

FIGURE 1.1: COMPUTERISED INFORMATION SYSTEM PROCESS

Source: AIN1501 2022

Refer to figure 1.1 throughout the following discussion:

1.5 Categories of data
According to Voithofer and Golan (2018), data is organised into two categories,
structured and unstructured data, as set out below:

1.5.1 Structured data


According to Sadyrin, Syrovatskay and Leonova (2021), structured data can be
generated by a person when interacting with various devices, or it can be generated
automatically, for example data coming from the sales system via various reading
devices, or data from various sensors and tags, amongst others. Accordingly, as
Voithofer and Golan (2018) mention, the most abundant form of data produced by
IT-based learning systems and tools in an institution of learning is structured data. It has
a uniform and standardised format that can be stored in and retrieved from a computer
database. This data includes information on various types of technology-based learning
experiences, including login times, attendance, test scores, assignment submission
data and click behaviours. Therefore, it can easily be visualised, for example in the grid
of a spreadsheet, where the column for a particular data point, such as LMS login times
or assignments completed, can be seen, analysed and compared. Examples of
structured data that can be found in raw form in LMS log files include resource access
counts (e.g., PDF reading), date and time of access, discussion board posts, and grades
for tests and assignments, along with e-mails sent to instructors. Sadyrin, Syrovatskay
and Leonova (2021) mention that the source of structured data is usually existing
accounting or management systems (CRM, ERP, etc). This data is usually stored in
various relational databases, which are accessed using programming and query
languages (SQL, Java, Python, etc).
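
To make this concrete, the sketch below (which is not part of the prescribed material) shows how structured LMS-style data could be stored in and summarised from a relational database using Python's built-in sqlite3 module; the table, column names and figures are invented for illustration only.

import sqlite3

# Hypothetical structured LMS data held in a relational table.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE lms_activity (
                    student_id TEXT, login_time TEXT,
                    assignment TEXT, grade REAL)""")
conn.executemany("INSERT INTO lms_activity VALUES (?, ?, ?, ?)", [
    ("S001", "2024-03-01 08:15", "A01", 72.0),
    ("S001", "2024-03-05 19:40", "A02", 65.0),
    ("S002", "2024-03-02 10:05", "A01", 80.0),
])

# Because the format is uniform and standardised, a simple SQL query can
# summarise it, for example the number of logins and average grade per student.
rows = conn.execute("""SELECT student_id,
                              COUNT(*)   AS logins,
                              AVG(grade) AS avg_grade
                       FROM lms_activity
                       GROUP BY student_id""").fetchall()
for student_id, logins, avg_grade in rows:
    print(student_id, logins, round(avg_grade, 1))
conn.close()

The same kind of query could equally be written directly in SQL against any of the relational systems mentioned above.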

1.5.2 Unstructured data


Sadyrin, Syrovatskay and Leonova (2021) state that unstructured or weakly structured
data has no fixed format or content; it can be workflow data, data from social networks
or data from mobile devices. On the other hand, Voithofer and Golan (2018) maintain
that unstructured data do not have a standardised format or data model. It can be found
in blog posts, discussion boards (forums), chat transcripts, e-mail messages and
student-produced media. Thus, it can include open-ended textual information, such as
e-mails, synchronous chat transcripts and discussion board (forum) posts, along with
non-textual information such as media (e.g., photos, podcasts, videos). In an institution
of learning, while it can be time consuming to analyse unstructured data without
specialised or custom software, the qualitative insights that this category of data
provides offer helpful information about student, class and program learning processes
and outcomes. According to Sadyrin, Syrovatskay and Leonova (2021), working with
unstructured data directly requires the use of Big Data technologies, since the volume
of such data can be very large and of different types. This data can also be combined
into databases, which are worked with using content management systems (CMS).
Well-known frameworks in this space include Hadoop, MapReduce and streaming
technologies, which allow working with big data in real time.
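
As an illustration of the map/reduce idea mentioned above, the following plain-Python sketch counts word frequencies across a handful of invented forum posts; a real Big Data deployment would run this kind of logic in a framework such as Hadoop rather than in a single script.

from collections import Counter

# Invented, unstructured forum posts (free text with no fixed data model).
posts = [
    "I found the assignment difficult",
    "The assignment deadline is too close to the exam",
    "Great module so far",
]

# "Map" step: break every post into lowercase words.
words = [word.lower() for post in posts for word in post.split()]

# "Reduce" step: count how often each word occurs across all posts.
word_counts = Counter(words)
print(word_counts.most_common(3))   # e.g. the most frequently discussed terms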

Input of data (also called data entry) (b), processing of data (c), output (e) and storage
of data and information (d) will now be discussed in more detail.

1.6 Input of data

Data captured on manual documents (hard copy) as well as data entered in the CIS
and not yet processed is called raw data (UNISA 2022).

Raw data has little or no value for the organisation in the decision-making process. It
only becomes valuable when processed (c) into information (e), which the organisation
then uses for decision making (f), as also explained in the example above about
students' LMS data.

As in the earlier example of a customer order, having a list of all the sales in a specific
branch (raw data) is not useful in itself, but when this raw data is processed into
information (for example, total sales for a month compared with other months and
branches in the organisation), this information becomes useful and valuable. As a
foundation, you must fully understand the forms of structured and unstructured data
discussed above in sub-paragraphs 1.5.1 and 1.5.2 respectively.
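
A minimal sketch of the branch sales example above is given below; the figures and field layout are invented, and the point is simply that raw records become information once they are summarised.

# Raw data: individual sales records for two branches (invented figures).
raw_sales = [
    ("branch1", "2024-01", 1200.00),
    ("branch1", "2024-02",  950.00),
    ("branch2", "2024-01",  400.00),
]

# Processing: total sales per branch per month.
monthly_totals = {}
for branch, month, amount in raw_sales:
    key = (branch, month)
    monthly_totals[key] = monthly_totals.get(key, 0) + amount

# Information: totals that can be compared across branches and months.
for (branch, month), total in sorted(monthly_totals.items()):
    print(month, branch, total)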
Because information is obtained by processing raw data, it is essential that
the raw data captured in the CIS should be accurate, complete, reliable and
verifiable (UNISA 2022).

The quality of information is directly linked to the quality of raw data entered or captured
– in other words, inaccurate data captured or used will lead to inaccurate information,
and incomplete data will lead to incomplete information, which in turn will result in
ineffective decisions. The principle of “garbage-in-garbage-out” is especially true in a
CIS. Many organisations have embarked on expensive data cleaning projects (correcting
and completing data) because they understand the impact of inadequate data on
information and decision making.
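
The sketch below (an illustration only, with hypothetical field names) shows the kind of simple check that keeps "garbage" out at the point of capture by rejecting incomplete or invalid records before they enter the CIS.

from datetime import date

def validate_invoice(record: dict) -> list:
    """Return a list of problems found in a captured invoice record."""
    errors = []
    if not record.get("invoice_number"):
        errors.append("invoice number is missing (incomplete data)")
    if record.get("amount", 0) <= 0:
        errors.append("amount must be a positive value (inaccurate data)")
    try:
        date.fromisoformat(record.get("invoice_date", ""))
    except ValueError:
        errors.append("invoice date is not a valid date (unverifiable data)")
    return errors

# A deliberately flawed record: capturing it as-is would produce garbage output.
captured = {"invoice_number": "", "amount": -50, "invoice_date": "2024-13-40"}
print(validate_invoice(captured))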

NOTE

Note the difference between the terms “information” and “data”.


Make sure you use the correct term.

Information = processed data

Many medium and small organisations still use paper source documents (e.g., invoices,
application forms, etc) to collect data. Most large organisations, however, have moved
to electronic source documents by capturing data directly through computer data entry
screens or by using barcode scanners. In topic 7 (Pastel) you will practise using
computer data entry screens to capture invoices and other source documents.

What is a source document?

A source document is the first documented record of an activity that took place
(UNISA 2022).

Every time a business is involved in a financial transaction, a paper trail is generated.
This paper trail is referred to in accounting as source documents.
1.6.1 Source documents
Whether cheques are written to be paid out, sales are made to generate receipts,
billing invoices are sent by suppliers, or work hours are recorded on an employee's
timesheet, all the respective documents are source documents. The Canada Revenue
Agency (CRA), for example, accepts scanned documents as long as the records are
produced and retained in paper format or stored in an electronically accessible and
readable format. Although organising and filing these documents can be tedious,
putting in the extra time to properly maintain a paper trail and create an easy way to
access these documents can result in huge time savings in the future, and also
ensures greater transparency. Source documents are used for different purposes by
different people.

1.6.1.1 Source documents for accounting and bookkeeping process

These documents are, first and foremost, important because they serve as physical
evidence that a financial transaction has actually occurred. Nowadays, these documents
do not necessarily need to be a physical hard copy; they may be in a traceable electronic
form.

1.6.1.2 Source documents for auditing process

These documents are also essential because when companies undergo an audit, the
auditor’s access to a clear and accessible paper trail of all transactions enhances the
overall legitimacy and independence of the audit. In order to reaffirm the accuracy of the
company’s balances in individual accounts, auditors need full access to all the
documents. The optical scanner reads handwritten numbers and letters and transfers
this information to an online computer, where it is automatically placed on tape, ready
for computer processing. This data entry method is extremely accurate and fast. A
comprehensive computer edit program is used to check the presence of required data
as well as its validity, compatibility and arithmetic accuracy. The time and cost savings,
together with information that can be related to the auditing process, have secured the
support and participation of management (UNISA 2022).

1.6.1.3 Source documents for running a business more smoothly and enhancing
transparency

All of an organisation's source documents should be kept and stored for future reference.

1.6.1.4 Source documents for social studies teachers and educators


There is a tacit understanding among social studies teachers and educators that
incorporating primary source documents in planning and teaching is desirable for many
reasons, most prominent among them the ways in which it challenges students to think
at higher levels (UNISA 2022).

1.6.2 Characteristics of a good source document
(i) It serves as a good internal control and provides evidence that a transaction
occurred.
(ii) It should capture the key information about a transaction, such as the names of
the parties involved, amounts paid (if any), the date, and the substance of the
transaction.
(iii) It is good when the document describes the basic facts of the transaction such
as the date, the amount, the purpose, and all parties involved in the transaction.
(iv) It is frequently identified with a unique number, so that it can be differentiated in
the accounting system.
(v) Nowadays, these documents do not necessarily need to be a physical hard copy
– they may be in a traceable electronic form.
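
As a simple illustration (not prescribed material), the characteristics listed above could be represented as a structured record, with each key fact of the transaction captured as a field; the class and sample values below are hypothetical.

from dataclasses import dataclass
from datetime import date

@dataclass
class SourceDocument:
    document_number: str      # unique number for differentiation in the accounting system
    transaction_date: date    # date of the transaction
    amount: float             # amount paid, if any
    purpose: str              # substance of the transaction
    parties: tuple            # the parties involved, e.g. (buyer, seller)

doc = SourceDocument("INV-2024-001", date(2024, 3, 1), 1500.00,
                     "Sale of inventory", ("Customer A", "Organisation B"))
print(doc)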

1.6.3 Input types


In AIN1501 study unit 11 you learnt that society is becoming more dependent upon
computer and communications technology. Many would argue that we have left the
industrial age behind, and that the information age has taken over (UNISA 2022). The
users of information and Big Data topics are covered comprehensively in AIN1501
study units 1 and 6, respectively. Data that may come from the source documents
explained above can be entered (input) into a CIS either through batch input or through
online input, as discussed below. For both these input types, the data captured will be
stored in the CIS transaction file. Transaction files are explained in study unit 2 of this
module.
Before discussing how data can be entered into a CIS through batch or online input,
it is important to understand the concept of data migration as the foundation for
capturing data, whether structured and/or unstructured. This is because data is
generated from different activities and sources and is used for different purposes in an
organisation. In some instances, data is already available on a different platform and/or
in a different format for one use, and it can be transferred or moved so that it can be
used by others for another purpose.

1.6.4 Migration
Migration is the process of moving data from one platform/format to another
platform/format. It involves migrating data from a legacy system to the new system
without impacting active applications, and finally redirecting all input/output activity to
the new device. In many cases data is extracted from several different databases, then
managed and finally stored in yet another database. Thus, a high amount of data is
being managed by databases and applications in companies today (UNISA 2022).
Sarmah (2018) mentions that best practices must be followed to prevent migration from
becoming very expensive and incurring hidden costs not identified at an early stage.
According to Sarmah (2018), data needs to be transportable between physical and
virtual environments for concepts such as virtualisation. Thus, to make clean and
accurate data available for consumption, a data migration strategy should be designed
in such an effective way that it enables an organisation to ensure that tomorrow's
purchasing decisions fully meet both present and future business needs and render
maximum return on investment. Migrating data can be a complex process during which
testing must be conducted to ensure the quality of the data, taking into account testing
scenarios and the accompanying risks. In simple terms, it is a process of bringing data
from various source systems into a single target system (UNISA 2022).

1.6.5 Reasons for migrating data


In today's world, migrations of data for business reasons are becoming common. While
the replacement of an old legacy system is the most prevalent reason, other factors
also play a significant role in deciding to migrate data into a new environment
(Sarmah 2018). Latt (2019) states that these days, data migrations are often started as
firms move from on-premises infrastructure and applications to cloud-based storage and
applications to optimise or transform their companies. It is therefore important for
organisations to transform the data sources of old systems to the new systems, and
today most companies perform data migration to update their systems. The business
driver is usually an application migration or consolidation in which legacy systems are
replaced or augmented by new applications that will share the same dataset. The core
reason for this migration is upgrading the existing system to a more developed system
in line with industry requirements. Generally, this is the result of introducing a new
system or location for the data. Sarmah (2018) explains that data migration is a
multi-step process that begins with an analysis of legacy data and culminates in the
loading and reconciliation of data into the new applications. This process involves
scrubbing the legacy data, mapping data from the legacy system to the new system,
designing the conversion programs, building and testing the conversion programs,
conducting the conversion, and reconciling the converted data.

1.6.6 Considerations for data migration


Sarmah (2018) states that databases continue to grow exponentially and require
additional storage capacity. As a result, companies are switching to high-end servers
to minimise cost and reduce complexity by migrating to consumable and steady
systems. In addition, to prevent major issues of various types, organisations need
a reliable and consistent methodology that allows them to plan, design, migrate and
validate the migration. Furthermore, they need to evaluate the need for any migration
software/tool that will support their specific migration requirements, including operating
systems, storage platforms and performance. Thus, they need to keep all these points
in check through robust planning, designing, assessment and proper execution of the
project and its variables.

1.6.7 Types of data migration


According to Latt (2019), there are four different types of migration. These, together
with cloud data migration, are set out below.

a. Schema migration:
It may be necessary to move from one database vendor to another, or to upgrade the
version of database software being used. With the upgrade of software, it is less likely
to require a physical data migration, but in major upgrades, a physical data migration
may be necessary. Thus, a physical transformation process may be required due to the
possible significant underlying data format change. If so, behaviour in the applications
layer may not be affected, unless the data manipulation language or protocol has
changed – but modern applications are written to be agnostic to the database technology
so that a change from Sybase, MySQL, DB2 or SQL Server to Oracle should only require
a testing cycle to be confident that both functional and non-functional performance have
not been adversely affected.

b. Application migration:
When changing application vendors, for example moving to a new CRM or ERP
platform, a substantial transformation is inevitably entailed, as almost every application
or suite operates on its own specific data model and also interacts with other
applications and systems within the enterprise application integration environment.
Furthermore, to allow
the application to be sold to the widest possible market, commercial off-the-shelf
packages are generally configured for each customer using metadata. Application
programming interfaces (APIs) may be supplied by vendors to protect the integrity of the
data they have to handle.

c. Business process migration:


Business processes operate through a combination of human and application systems actions, often
orchestrated by business process management tools. When such changes occur, they
can require the movement of data from one store, database or application to another to
reflect the changes to the organisation and information about customers, products and
operations. Examples of such migration drivers are mergers and acquisitions, business
optimisation and reorganisation to attack new markets or respond to competitive threats.

d. Legacy migration:
Legacy migration can be classified into well-defined interfaces, applications and
database services. Its strategies are easy to apply, fast to implement, and can be widely
applied to industry software projects. However, it is very difficult to integrate legacy
systems with newer systems, such as open-source operating systems, because of the
non-extensibility, incompatibility and limited openness of the underlying hardware and
software of the legacy systems. Its life cycle includes the following procedures:
(i) Before migration: plan, assess and prepare; assess hardware, software and
network readiness and plan for the future; clean up by eliminating useless data,
consolidating resources and monitoring everything.
(ii) During migration: prototype, pilot and deploy the migration; use powerful database
modelling to simulate the migration, resolving issues before committing; track the
migration.
(iii) After migration: maintain and manage the new environment.

e. Cloud data migration:


According to Iqbal and Colomo-Palacios (2019), as echoed by Amin, Vadlamudi and
Rahaman (2021), one of the most common operations is moving locally stored data
into a public cloud computing environment. Cloud data migration is the procedure of
moving information, localhost applications, services and data to the distributed cloud
computing infrastructure. The success of this data migration process depends on
several aspects, such as planning and impact analysis of existing enterprise systems.
Five different cloud migration strategies, models and procedures are prescribed: (i)
evaluating the performance, (ii) identifying security requirements, (iii) choosing a cloud
provider, (iv) calculating the cost, and (v) making any necessary organisational
changes. Cloud migration has both challenges and advantages, so there is a large
body of academic research and technical application work on data migration to the
cloud.

1.6.8 Data migration process


Sarmah (2018) defines the data migration process as the activity of moving data
between different storage types, environments, formats or computer applications,
which is needed when an organisation changes its computer systems or upgrades to a
newer version of those systems. This is usually performed through programs in order
to attain an automated migration. Lastly, as defined by Latt (2019), it is a process of
moving data from one location to another, one format to another, or one application to
another, and includes data files, different types of operating systems and platforms,
personal files and numerous sources. In addition, in some cases it is conducted
manually, which can be very time consuming. It is therefore a process that can give
rise to several new problems, especially considering the high amounts of data being
processed (UNISA 2022).

According to Sarmah (2018), in many organisations data migration is a regular activity
of IT departments. As Latt (2019) contends, this means that numerous operational
systems and external data sources, including relational databases (RDBs) requiring
schema translation and data transformation, as well as personal files from various
sources, have to be transformed from the old system to the new system. Thus,
automated and manual data cleaning is commonly performed to improve data quality,
eliminate redundant or obsolete information, and match the requirements of the new
system. However, an impending problem is redefining the existing database and
storage system in terms of a complex language. Based on industry experience, it is
common that during the migration the source and target databases are structurally
different, or data is inconsistent across multiple data sources. Because of this problem,
several research studies and migration tools are continuously being developed.

As an example, by correctly migrating a company's current and legacy data to
Salesforce and building low-maintenance, high-performing data integrations with the
organisation's mission-critical systems, the design can get the most out of Salesforce
and make it a "go-to" place for all the organisation's customer information. Masri and
McDermott (2019) mention that when companies choose to roll out Salesforce, users
expect it to be the place to find any and all information related to a customer – the
coveted Client 360° view. On the day the company goes live, users expect to see all
their accounts, contacts and historical data in the system. They also expect that data
entered in other systems will be exposed in Salesforce automatically and in a timely
manner. As the Salesforce platform grows more powerful, it also grows in complexity.
Whether the company is migrating data to Salesforce or integrating with Salesforce, it
is important to understand how these complexities need to be reflected in the design.

1.6.9 Data migration phases


Proper planning is an important requirement of a data migration project, as a lack
thereof poses risks such as the project going over budget, loss of data, missed
deadlines and even failure to complete the project. Team members performing data
migration need to be flexible and highly skilled; the process should require minimal
technical knowledge and be intuitive, so that business and technical staff can work
collaboratively. Users should be able to understand and implement the complexity of
business rules for data migration or quality assurance of data. Three phases can be
identified, as follows:
a. Pre-strategy phase: in this phase, the project manager should identify the
number of legacy systems and count their data structures. Interfaces should also
be identified at this point, if possible.
b. Design phase: this phase should include the mapping of key constraints and
performing data mappings from the logical to the physical model. The design
relates old data formats to the new system's formats and requirements.
c. Testing phase: this is a subsequent phase that should deal with both logical and
physical (syntactical) errors. Once test data has been migrated, the following
basic questions should be addressed:
(i) How many records were supposed to be created?
(ii) How many were actually created?
(iii) Did the data migrate to the correct fields?
(iv) Was the data formatted correctly? (UNISA 2022)

1.6.10 Data migration approach


To achieve an effective data migration procedure, data on the old system is mapped to
the new system using a design for data extraction and data loading. Programmatic
data migration may involve many phases, but at the very least it includes data
extraction, where data is read from the old system, and data loading, where data is
written to the new system. After loading data into the new system, the results are
subjected to data verification to determine whether the data was accurately translated,
is complete, and supports processes in the new system. During verification, there may
be a need for a parallel run of both systems to identify areas of disparity and forestall
erroneous data loss. The data migration phases (design, extraction, cleansing, load,
verification) for applications of moderate to high complexity are commonly repeated
several times before the new system is deployed.

a. In a highly adaptive approach, concurrent synchronisation, a business-oriented
audit capability and clear visibility of the migration for stakeholders are likely to be
key requirements in such migrations.

b. In a business process migration, business processes operate through a
combination of human and application systems actions, often orchestrated by
business process management tools. When these changes occur, they can require
the movement of data from one store, database or application to another to reflect
the changes to the organisation as well as information about customers, products
and operations.

c. In a legacy migration, organisations can choose from several strategies, which
depend on the project requirements and available resources. Examples of such
migration drivers are mergers and acquisitions, business optimisation and
reorganisation to attack new markets or respond to competitive threats (UNISA 2022).
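
The sketch below illustrates the extraction, loading and verification steps described above, using two in-memory SQLite databases as stand-ins for a legacy and a new system; all table names and the field mapping are invented, and a real migration would rely on dedicated tooling.

import sqlite3

legacy = sqlite3.connect(":memory:")   # stand-in for the legacy system
target = sqlite3.connect(":memory:")   # stand-in for the new system
legacy.execute("CREATE TABLE old_customers (cust_no TEXT, cust_name TEXT)")
legacy.executemany("INSERT INTO old_customers VALUES (?, ?)",
                   [("C1", "Acme Ltd"), ("C2", "Umbrella (Pty) Ltd")])
target.execute("CREATE TABLE customers (customer_id TEXT, name TEXT)")

# Extraction: read the data from the legacy system.
rows = legacy.execute("SELECT cust_no, cust_name FROM old_customers").fetchall()

# Loading: write the data to the new system using the mapped field names.
target.executemany("INSERT INTO customers (customer_id, name) VALUES (?, ?)", rows)

# Verification: reconcile record counts between source and target.
expected = legacy.execute("SELECT COUNT(*) FROM old_customers").fetchone()[0]
created = target.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print("records expected:", expected, "records created:", created,
      "match:", expected == created)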

1.6.11 Validation and implementation of data


In the initial stages, the team needs to identify and address all potential problems before
starting the data migration project. The team needs to develop the rules for migration
plans, data mapping methods and testing methods that are based on proven facts
rather than assumptions. The following key validations/attributes should be checked:
a. Relevance: Is the data relevant to all sources of data?
b. Accuracy: Is the data accurate?
c. Integrity: Does it have a logical structure that can be scrutinised?
d. Consistency: Is it understood and consistent?
e. Completeness: Is it complete?
f. Validity: Is it valid for business processes?
g. Timeliness: Is it up to date?
h. Accessibility: Can it be accessed reliably?
i. Compliance: Does it comply with standards?
j. Testing and verifying data: There are many complexities involved in carrying
out data migration testing and verification. This stage helps to prevent issues from
being left until too late in the development stage, when they may prove expensive
to rectify. Before the conversion of data, teams need to ensure that unit, system,
volume and online application tests are carried out before sign-off, as reflected in
the business agreement. Each unit of work, the full volume upload and the online
application test stage should be conducted as early as possible. Testing must
identify the units of work that may need to be completed and the online testing that
can be done in the initial stages.
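
As a hedged illustration of measuring two of the attributes above, completeness and validity, the sketch below computes simple percentages over a batch of invented records; the business rule that an amount must be positive is purely an assumption for the example.

# Invented migrated records; one is incomplete and one breaks a business rule.
records = [
    {"customer": "C1", "amount": 100.0},
    {"customer": "",   "amount": 250.0},   # incomplete: customer missing
    {"customer": "C3", "amount": -10.0},   # invalid: negative amount
]

complete = sum(1 for r in records if r["customer"])
valid = sum(1 for r in records if r["amount"] > 0)

print(f"completeness: {complete / len(records):.0%}")   # records with all key fields
print(f"validity:     {valid / len(records):.0%}")      # records passing the business rule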

1.6.12 Reasons for data migration failure


According to Azeroual and Jha (2020), regardless of which migration strategy a
company chooses (i.e., storage migration, database migration, application migration or
business process migration), there should always be a strong focus on data cleansing.
Complete, correct and clean data not only reduce the cost, complexity and risk of the
changeover; they also provide a good basis for quick and strategic company decisions
and are therefore an essential basis for today's dynamic business processes. Data
quality is an important issue for companies looking at data migration these days. In
many companies, data migration projects fail because their importance and complexity
are not taken seriously enough. In order to determine the relationship between data
quality and data migration, an empirical study with 25 large German and Swiss
companies was carried out to determine the importance of data quality for data
migration. It was found that without acceptable data quality, data migration is
impossible. To make accurate decisions, the data present in the decision databases
should be of good quality. If the migration effort does not formally specify the level of
end-state data quality and the set of quality control tests that will be used to verify that
data quality (garbage in, garbage out), the target domain may end up with poor data
quality, which may result in (i) costs associated with error detection, (ii) costs
associated with error rework, (iii) costs associated with error prevention, (iv) time delays
in operations, (v) costs associated with delays in processing, (vi) difficult and/or faulty
decision making, and (vii) enterprise-wide data inconsistency. Data quality is the
measure of the accuracy of data that meets the business requirements and supports
decision making. Data quality can be assessed in terms of various dimensions, such
as completeness, accuracy, precision, consistency and derivation integrity
(UNISA 2022).

1.7 System integrating management information across the entire enterprise
Companies are always seeking to become nimbler in their operations and more
innovative with their data analysis and decision-making processes. They are realising
that time lost in these processes can lead to missed business opportunities. In principle,
the core Big Data challenge is for companies to gain the ability to analyse and
understand internet-scale information just as easily as they can now analyse and
understand smaller volumes of structured information. A lack of access and/or timely
access therefore reduces the organisation’s ability to make timely and effective
decisions. Many organisations have realised that timely access to appropriate, accurate
data and information is the key to success or failure. Therefore, for an organisation to
be competitive, data must be collected, processed into useful information, stored and
used in decision making (Li, Jiang & Zomaya 2017). Accordingly, organisations (and
individuals) usually make better informed decisions (f) if they have access to more data
and valuable information.

Klaus, Rosemann and Gable (2000) pointed out that many organisations prefer to use
one computer system throughout the organisation by integrating all the functions. The
integration of transaction-oriented data and business processes throughout an
organisation is enabled by enterprise systems. These enterprise systems are
commercial software packages and include Enterprise Resource Planning (ERP) and
related packages, such as software for advanced planning and scheduling, sales force
automation and product configuration, among others. Khoualdi and Basahel (2014)
explain that an ERP is a system integrating management information through the
management of the flow of data across the entire enterprise. According to Sadyrin,
Syrovatskay and Leonova (2021), the use of integral systems for interaction with
customers (CRM systems) and resource management and planning systems (ERP
systems) is common business practice for many companies, but at the same time,
modern management faces new tasks that are associated not only with the storage,
analysis, assessment, verification and protection of very large amounts of data, but also
with the need to use new data types in the development of control solutions. This
digitalisation of the economy and the gradual transition to a new technological order
open up new opportunities and prospects for the use of digital and information
technologies in analytical work. In many cases, it is necessary to use both structured
and unstructured data, the sources of which can be a variety of technical and electronic
devices. In this case, the data can have completely different formats.

Also refer to AIN1501 study unit 2 (UNISA 2022) for the AIS
implementation in an ERP environment and decision-making, including
advantages and disadvantages of an ERP system and the relationship
between SAP and ERP software.

1.7.1 Capturing
Data is first captured on paper (hard copy) or electronic (soft copy) source documents.
The electronic source documents referred to are documents created outside of the
organisation’s CISs. For example, the organisation receives an electronic file from one
of its trading partners, containing a batch of electronic source documents. The
organisation’s processes will determine the frequency of capturing the data – for
example, employee data may only be captured once a month before the payroll run,
supplier invoices may be captured twice a week and goods received notes may be
captured daily. Thus, data can be captured through batch input or online input.

1.7.1.1 Batch input


Batch input refers to both a standard ERP interface and a procedure for data migration.
Batch input can be used for data migration in two ways: through standard batch input
programs and through enterprise systems, which are Accounting Information Systems
(AISs). The SAP ERP system contains various batch input programs that transform
prepared legacy data into a format that dialog transactions can process. These
programs are called standard batch input programs.

Batch input is used where a huge number of similar source documents must be
captured, and up-to-date data and information are only required at the same frequency
as the data capturing. An example of batch input is Unisa main
campus receiving completed MCQ mark sheets from Unisa regional offices and through
the post. These completed mark sheets are batched together and captured daily into
Unisa’s CIS. It involves similar source documents being grouped together (batch) and
then entered in the CIS periodically, say, daily, weekly or monthly. It will require
additional batch controls and procedures to be implemented in the organisation’s
control environment (UNISA 2022).
You will learn more about these controls and procedures in auditing (AUE2601).
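
The generic sketch below (it is not SAP-specific, and the document structure and control total are assumptions for illustration) shows the essence of batch input: similar records are grouped during the day and only posted to the CIS in one periodic run.

daily_batch = []   # completed mark sheets collected during the day

def add_to_batch(mark_sheet: dict) -> None:
    """Collect a captured mark sheet without posting it to the CIS yet."""
    daily_batch.append(mark_sheet)

def post_batch(batch: list) -> int:
    """Post the whole batch in one run and return a simple batch control total."""
    control_total = sum(sheet["marks"] for sheet in batch)
    # ... here the records would be written to the CIS transaction file ...
    return control_total

add_to_batch({"student": "S001", "marks": 78})
add_to_batch({"student": "S002", "marks": 65})
print("batch control total:", post_batch(daily_batch))   # run once per day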

1.7.1.2 ERP and batch input


The standard ERP interfaces used in batch input refer to both a standard ERP interface
and a procedure for data migration. These are based exclusively on the interfaces
provided in the SAP ERP system, which are called standard ERP interfaces. This
mature, proven technology "feeds" dialog transactions with the provided data (usually
in the background), ensuring that all input checks are run. The clear benefit of a batch
input recording is that one deals only with the input fields of a dialog transaction that
are relevant for one's specific case. Ultimately, this ensures that all input checks are
run and that all data imported with batch input is correct and consistent in the SAP ERP
system (UNISA 2022).

1.7.1.3 Advantages of batch input


The advantage of using batch input is that economies of scale (increased productivity
and lower hardware costs) can be achieved because the data is captured at one point
only and data capturing is not dispersed throughout the organisation.

1.7.1.4 Disadvantages of batch input


The main disadvantage of using batch input is that the CIS is not always up to date with
the latest data and information.

1.7.2 Online input


Online input involves data being immediately captured into the CIS at the point where
the activity occurs. An example of online input is students capturing their assignment
MCQ answers themselves into the myUnisa CIS. Sadyrin, Syrovatskay and Leonova
(2021) point out that economic activity is increasingly conducted in an online format
and is concentrated on various social networks and digital platforms that act as
platforms for doing business and as a consumer resource.

According to Souza, Silva, Coutinho, Valduriez and Mattoso (2016), when this is
accomplished online (i.e., without requiring users to stop execution to reduce the data
and resume execution), it can save much time, and user interactions can be integrated
within workflow execution. Thus, reducing input data is a powerful way to reduce overall
execution time in such workflows. However, because data is captured directly and
immediately, any corrections to the data must also be made immediately in order for
the data capturing process to be completed.

Take the example of a supermarket pay point, where barcode scanners and terminals
for online input are used when customers buy inventory items: the inventory item is
scanned (captured) at the point of sale (pay point). There could be cases where the
barcode of an item is not recognised by the CIS (error); as a result, the person at the
till has to enter the correct barcode manually (correction) in order to complete the
transaction.
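
A minimal sketch of that pay-point scenario is given below; the catalogue, barcodes and prices are invented, and the point is simply that the scan is validated, and corrected if necessary, immediately at the point of capture.

# Hypothetical item catalogue keyed by barcode.
catalogue = {"6001234567890": ("Milk 1L", 18.99)}

def capture_scan(barcode: str, manual_entry: str = "") -> tuple:
    """Validate the scanned barcode immediately; fall back to a manual correction."""
    if barcode in catalogue:
        return catalogue[barcode]
    if manual_entry in catalogue:
        return catalogue[manual_entry]   # till operator keys in the correct code
    raise ValueError("item cannot be captured until a valid barcode is entered")

print(capture_scan("6001234567890"))                     # recognised scan
print(capture_scan("0000000000000", "6001234567890"))    # corrected at the till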

Large-scale scientific computing often relies on intensive tasks chained through a


workflow. In this regard, scientists need to check the status of the execution at particular
points, to discover if anything odd has happened and then take action. To achieve that,
they need to track partial result files, which is usually a complex and laborious process.
When using a scientific workflow system, provenance data keeps track of every step of
the execution. If traversing provenance data is allowed at runtime, it is easier to monitor
and analyse partial results. Furthermore, scientists can monitor time-consuming
workflow execution at specific simulation exploration points, analyse data at runtime
and decide to stop or re-execute some activities. However, most of the systems execute
workflows “offline” and do not allow for runtime analysis and workflow steering (UNISA
2022). Carvalho, Essawy, Garijo, Medeiros and Gil (2017) assert that workflow systems
support scientists in capturing computational experiments and managing their
execution. Thus, an initial workflow will be changed to create many new variants thereof
that differ from each other in one or more steps.

1.7.2.1 Advantages
The main advantage of using online input is that the data in the CIS is always up to
date. (Please note that although the raw data may be up to date, the information in the
CIS may not be up to date as the data may not be processed yet. Refer to section
1.4.2.)
1.7.2.2 Disadvantages
A disadvantage is that online inputting is more costly, because hardware is required
to capture the data at each point where the activity to be captured takes place. Another
major problem, according to Souza, Silva, Coutinho, Valduriez and Mattoso (2016), is
to determine which subset of the input data in a scientific workflow system should be
removed. This includes other problems relating to guaranteeing that the workflow
system will keep execution and data consistent after reduction, and keeping track of
how users interacted with the execution. According to Carvalho, Essawy, Garijo,
Medeiros and Gil (2017), scientific workflow systems are not designed to help
scientists create and track the many related workflows that they build as variants, trying
different software implementations and distinct ways to process data and deciding what
to do next by looking at previous workflow results. During a scientific workflow
execution, a major challenge is how to manage the large volume of data to be
processed, which is even more complex in cloud computing, where all resources are
configurable in a pay-per-use model (UNISA 2022).

1.8 Processing data

For raw data to become information, it must be processed (c) (i.e., data
processing) (UNISA 2022).

Consider the following as an example: according to Johnson and Bull (2015), in
designing for the visualisation of formative information on learning, schools are
diverse, information-rich environments that involve different stakeholders in the
learning process. These include teachers, students and students' peers, who are
arguably the key stakeholders, along with parents, school leaders and policy makers.
Inside the classroom, aspects of formative assessment are ubiquitous, with
formative feedback to students about their competencies, knowledge, skills, etc,
appearing in many forms. These may include (but are most certainly not limited to):
written or verbal feedback on work/observations from a teacher, students appraising
their own or others’ work or ideas, discourse with other students instigated from
observing a learning-based artefact, or interaction with other technology. In this
regard, formative assessment can be described as a continuous and systematic
process to gather evidence and provide feedback during learning and can often be
immediately used. The ultimate aim for formative assessment is to help identify
where the student is within his or her learning, and then identify gaps and what he
or she needs to do next to further his or her knowledge or skills, amongst others. It
is further argued that formative assessment itself needs to be planned and built into
the design of educational processes and environments. With regard to what is
covered by formative assessment, this may be more focused towards traditional
domain content or may include other qualities which could be better considered as
21st century skills, such as critical thinking.

1.8.1 The main directions in the analysis and processing of big data

Jelonek (2017) states that the term "big data" refers to datasets whose size exceeds
the capabilities of typical databases for entering, storing, managing and analysing
information. Similarly, in a study evaluating the use of Big Data technologies in Russian
financial institutions (Bataev 2018), it was noted that data growth occurs throughout the
world, and the Russian Federation is no exception: in 2017 the data volume reached
580 exabytes, and this figure reached about 980 exabytes in 2020. According to
Jelonek (2017), big data suggests something more than just an analysis of huge
amounts of information. The issue is not that organisations create huge amounts of
data, but that most of it is presented in a format that does not conform to the traditional
structured database format – weblogs, video recordings, text documents, machine
code or, for example, geospatial data. All this is stored in a variety of different
repositories, sometimes even outside the organisation. As a result, corporations have
access to a huge volume of their data but do not have the tools necessary to establish
relationships between these data and draw meaningful conclusions from them. If we
add that the data is updated more and more often, it turns out that traditional methods
of analysing information cannot keep up with the huge amount of constantly updated
data, which ultimately opens the way for big data technologies.

1.8.2 Electronic Document Management System (EDMS)


According to Artamonov, Ionkina, Tretyakov and Timofeev (2018), over the last
decade active implementation of workflow automation has taken place in large
commercial organisations to improve the quality of work and reduce labour costs and
time input. EDMS in commercial organisations are designed to automate and combine
in one system all the processes going on inside the company, starting from the work of
the personnel department, ending with a specialised analysis of field-oriented data.
Following commercial organisations, scientific organisations with a large flow of
information began to turn to the automation of internal processes, for example, CERN,
the Joint Institute for Nuclear Research, the Scientific Technical Institute for
Interindustry Information and others. The peculiarity of document circulation in
scientific organisations is the need for detailed processing of incoming unstructured
materials and data in multiple formats on various computer-readable media, and its
subsequent analysis. The European Organization for Nuclear Research (CERN) is a
good example of successful EDMS implementation among major scientific
organisations. Its Electronic Document Handling system (EDH) was developed in
cooperation with the Joint Institute for Nuclear Research and performs almost all of the
interaction between the CERN employee and the administrative units of the Institute:
signing and support of documents, personnel issues (signing contracts, re-attestation,
holidays, etc.), registering for various advanced training courses and passing
safeguarding tests, renting transport and much more. EDH allows the organisation to
process 2 000 documents daily. According to Fisher and Frey (2015), schools are
awash with data, and teachers are being asked to gather data in a myriad of high-tech
and low-tech ways. But gathering is not analysing, and without analysis there is little
reason to gather the data in the first place. Teachers need data-collection systems that
lend themselves to rapid analysis and action.

1.8.3 Ways of data processing


Data processing systems face the task of efficiently storing and processing data at
petabyte scale, with the amount set to increase in the future. To meet such a
requirement, highly scalable, shared-nothing systems, e.g., Google's BigTable or
Facebook's Cassandra, are built to partition data and process it in parallel on distributed
nodes in a cluster. This allows the handling of data at scale but introduces new
challenges due to the distribution of data (UNISA 2022).

In-network data processing

Running queries involves a high network overhead because data has to be exchanged
between cluster nodes and hence, the network becomes a critical part of the system.
To avoid the network bottleneck, it is essential for distributed data processing systems
(DDPS) to be aware of the network rather than treating it as a black box. Thus, query
throughput in a DDPS can significantly be improved by performing partial data reduction
within the network. Therefore, an in-network, processing was proposed as a way of
achieving network awareness to decrease bandwidth usage by custom routing,
redundancy elimination, and on-path data reduction, thus increasing the query
throughput of a DDPS. The challenges of an in-network processing system range from
design issues, such as performance and transparency, to the integration with query
optimisation and deployment in data centres. These challenges are formulated as
possible research directions, accompanied by a prototype implementation (UNISA 2022).

Global data processing

As far back as 1998, in a study conducted by Pimazzoni (1998), during a period of
company mergers, downsizing, greater external competitive forces, and pressure to
use internal resources more efficiently, many pharmaceutical companies were
approaching their business in global terms. The meaning of globalisation among
companies differs, however, and success is dependent on many factors. One of these
factors was the efficient management of clinical data, which became increasingly
important. It was one of the keys to improving data quality, and thus, reducing the time
to market. The globalisation of the clinical data management operation required the
development of an integrated database system, the implementation of common
working procedures, and the involvement of people from different countries with
various cultural backgrounds. The impact of these new global data management
concepts on local operating companies working on international mega-trials, along with
the problems and benefits stemming from them, needed to be considered.

Near-Data processing
According to Vinçon, Koch and Petrov (2019), Near-Data Processing refers to an
architectural hardware and software paradigm, based on the co-location of storage and
compute units. Ideally, it will allow for the execution of application-defined data- or
compute-intensive operations in-situ, i.e., within (or close to) the physical data storage.
Thus, Near-Data Processing seeks to minimise expensive data movement, improving
performance, scalability, and resource-efficiency. Processing-in-Memory is a sub-class
of Near-Data Processing that targets data processing directly within memory (DRAM)
chips. The effective use of Near-Data Processing mandates new architectures,
algorithms, interfaces, and development toolchains.
Scientific simulations
The overall goal of Wang (2015), in his study, was to provide high-performance data
management and data processing support on array-based scientific data, targeting data-
intensive applications and various scientific array storages. In this regard, he believed
that such high-performance support can significantly reduce the prohibitively expensive
costs of data translation, data transfer, data ingestion, data integration, data processing,
and data storage involved in many scientific applications, leading to better performance,
ease of use, and responsiveness. According to him, scientific simulations were being
performed at finer temporal and spatial scales, leading to an explosion of the output data
(mostly in array-based formats), and challenges in effectively storing, managing,
querying, disseminating, analysing, and visualising these datasets. Many paradigms
and tools used for large-scale scientific data management and data processing were
often too heavy-weight and had inherent limitations, making it extremely hard to cope
with the ‘Big Data’ challenges in a variety of scientific domains. In contrast to offline
processing, implementation of high-performance data management and data
processing support on array-based scientific data, could avoid, either completely or to a
very large extent, both data transfer and data storage costs.
Big Data in financial analysis

According to Ziora (2015), Big Data is a term connected with an analysis of all the
aspects of huge volumes of data, which can be also conducted in real time.
Technologies of data gathering, data processing, presentation and making available of
information are used in a novel way by e-entrepreneurship in order to create new
business ventures, distribute information and to cooperate with customers and
partners. The basic reasons why organisations implement big data solutions are to gain
competitive advantage and to optimise business processes. Sadyrin,
Syrovatskay and Leonova (2021) in their article discuss the promising directions of
using big data in financial analysis procedures, which, being integrated into the system
of forming various management decisions, can significantly increase their efficiency.
They posit that the ongoing digital transformation of the national economy inevitably
sets the task of introducing digital technologies and tools into the practice of economic
activities of organisations and enterprises.

One of the areas of digitalisation is Big Data technology, which in many ways is already
being used in the field of finance. However, for the effective use of big data in the
practice of financial analysis of economic activity, it is necessary to solve a variety of
significant problems. This requires consideration of which elements of the financial
analysis system should, first of all, be focused on the use of big data, as well as which
aspects can provide the maximum effect when digital technologies are used. An
analysis of the current methodological and procedural framework for financial
analysis of economic activity shows that it is far from fully consistent with modern
economic, informational, and technological realities, since it was developed long before
the start of digital transitions. In previous years, the main direction of improving financial
analysis was the automation of accounting and analytical procedures based on the use
of computers and various software products. To improve financial analysis, there is
currently a fairly large number of software and hardware tools, the scale of which can
range from a separate accounting automation program to complex management
systems for large international companies. In Russia, financial institutions are among
the leaders in the use of big data, having accumulated a huge amount of information, in
particular on their client base, which needs further processing. According to
Sadyrin, Syrovatskay and Leonova (2021), information and analytical systems for
financial analysis should build on the opportunities that these technologies open up. One of the most promising
areas of improving financial analysis and increasing its efficiency is the use of various
Big Data technologies to develop new methodological foundations of financial analysis
in order to form effective technological solutions for specific tasks.

1.8.4 Applying Big Data technologies and models

The field of applying Big Data technologies is relatively new and involves a large number
of issues and problems.

Business intelligence

As cited by Ziora (2015), Ohlhorst notes that Big Data solutions can include analytics
concepts and technologies such as traditional business intelligence (BI), which consists
of a broad category of applications and technologies for gathering, storing, analysing,
and providing access to data. BI delivers actionable information which helps enterprise
users make better business decisions, using fact-based support systems. It allows for
in-depth analysis of detailed business data provided by databases, application data and
other tangible data sources. Business intelligence solutions can improve decision-making
by accelerating the decision-making process at all levels of management and increasing
its efficiency and efficacy.

Data mining

Data mining is a process in which data is analysed from different perspectives and then
turned into summary data that are useful. Data mining is normally used with data at rest
or with archival data. Data mining techniques focus on modelling and knowledge
discovery for predictive, rather than purely descriptive, purposes such as uncovering
new patterns from large data sets. Big Data can be applied in hybrid systems as well,
and accelerated decision-making and efficient enterprise management support can be
achieved by deploying the right methods and techniques of data mining
(UNISA 2022).
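
As a simple, hypothetical illustration (the shopping baskets below are invented and are not drawn from the sources above), the short Python sketch counts which pairs of products are most often bought together, a toy version of the kind of pattern a data mining tool would uncover at a far larger scale:

    # Toy pattern discovery: count which product pairs occur together in baskets.
    from collections import Counter
    from itertools import combinations

    baskets = [
        {"bread", "milk", "eggs"},
        {"bread", "milk"},
        {"milk", "eggs"},
        {"bread", "milk", "butter"},
    ]

    pair_counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1

    # The most frequent pairs hint at purchasing patterns worth investigating.
    print(pair_counts.most_common(3))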

Statistical applications

Statistical applications look at data using algorithms based on statistical principles and
normally concentrate on data sets related to polls, censuses and other static data sets.
They deliver sample observations that can be used to study population data sets for the
purpose of estimating, testing, and predictive analysis. Empirical data, such as surveys
and experimental reporting, are the primary sources of analysable information.
Predictive analysis is a subset of statistical applications in which data sets are examined
to come up with predictions, based on trends and information gleaned from databases
(UNISA 2022).

Data modelling

This is a conceptual application of analytics in which multiple “what-if” scenarios can be
applied via algorithms to multiple data sets (UNISA 2022).

Data marts and data warehousing

Another form of big data technology also exists, namely data marts and data
warehouses, which are frequently key components of business intelligence systems.
Other applications of big data include the following: recommendation engines, which
allow online retailers to match and recommend users to one another or to products
and services based on analysis of user profile and behavioural data; and sentiment
analysis, in which advanced text analytics tools, used in conjunction with Hadoop,
analyse the unstructured text of social media and social networking posts (UNISA 2022).

Risk modelling

Risk modelling allows for analysis of large volumes of transactional data to determine
the risk and exposure of financial assets, to prepare for potential “what-if” scenarios
based on simulated market behaviour, and to score potential clients for risk. Related
applications include fraud detection, where big data techniques are used to combine
customer behaviour, historical and transactional data to detect fraudulent activity;
customer churn analysis, where enterprises use Hadoop and big data technologies to
analyse customer behaviour data to identify patterns that indicate which customers are
most likely to leave for a competing vendor or service; social graph analysis, which
helps enterprises determine their “most important” customers; and customer experience
analytics, allowing for the integration of data from previous customer interaction
channels, such as call centres and online chat, to gain a complete view of the customer
experience (UNISA 2022).

Network monitoring

Network monitoring allows administrators to monitor network activity and diagnose
bottlenecks. Big data tools are also used in research and development, where
enterprises such as pharmaceutical manufacturers use Hadoop to comb through
enormous volumes of text-based research and other historical data to assist in the
development of new products.

1.8.5 Methods of processing data


The methods that can be used to process data into information include the following:

(a) Classifying data


Data is arranged into different groups (categories) using some of the data’s specific
characteristics, for example, classifying the data according to cash or credit sales or
according to the source document (i.e., orders, tax invoices, etc).
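
A minimal Python sketch of this idea follows; the transactions and document numbers are invented purely for illustration:

    # Hypothetical sales transactions classified by payment type.
    sales = [
        {"doc": "INV001", "type": "credit", "amount": 500.00},
        {"doc": "CSH001", "type": "cash", "amount": 120.00},
        {"doc": "INV002", "type": "credit", "amount": 250.00},
    ]

    classified = {"cash": [], "credit": []}
    for sale in sales:
        classified[sale["type"]].append(sale)   # group by the payment-type characteristic

    print(len(classified["cash"]), "cash sale(s) and", len(classified["credit"]), "credit sale(s)")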

(b) Performing calculations


Arithmetical or logical calculations can be performed on data. Arithmetical calculations
include addition (+), subtraction (–), multiplication (x) or division (/), for example,
calculating the cost per unit (R cost divided by number of units); the VAT amount on a tax
invoice (VAT% multiplied by the excluding VAT amount); net profit/loss (expenses
subtracted from income); the total amount on an invoice (adding the amounts of the
individual items).

Logical calculations include comparing the data to other data or calculations – for
example, is the data the same (=), not the same (<>), greater than (>), smaller than (<),
greater than or equal to (>=) or smaller than or equal to (<=)? The logical calculation will
have a true or false answer, and based on this answer, further processing will take place.
The IF function is one of various Microsoft Excel functions that can be used for logical
calculations. See study unit 5 in which the IF function is explained in detail.
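
The brief Python sketch below illustrates both kinds of calculation; the amounts, the 15% VAT rate and the budget figure are assumptions chosen for the example, and the comparison mirrors the spirit of Excel's IF function rather than reproducing it:

    # Arithmetical calculation: VAT and invoice total (assuming a 15% VAT rate).
    vat_rate = 0.15
    amount_excl_vat = 1000.00
    vat_amount = amount_excl_vat * vat_rate          # VAT% multiplied by the excluding-VAT amount
    amount_incl_vat = amount_excl_vat + vat_amount   # total amount on the invoice

    # Logical calculation: a comparison with a true/false answer drives further processing.
    budget = 1100.00
    if amount_incl_vat <= budget:
        result = "Within budget"
    else:
        result = "Over budget"

    print(vat_amount, amount_incl_vat, result)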

(c) Sorting data


Data is organised (sorted) in an orderly sequence based on specific criteria, for
example, purchase orders in numerical sequence or the names of customers in
alphabetical order. In part 2 we will learn how to sort data using Microsoft Excel.
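
As a short illustration (the purchase order numbers and customer names are invented), the Python lines below sort the same records numerically and alphabetically, much as a spreadsheet sort would:

    orders = [
        {"po": 1042, "customer": "Naidoo"},
        {"po": 1007, "customer": "Botha"},
        {"po": 1019, "customer": "Dlamini"},
    ]

    by_number = sorted(orders, key=lambda o: o["po"])           # numerical sequence
    by_customer = sorted(orders, key=lambda o: o["customer"])   # alphabetical order

    print([o["po"] for o in by_number])
    print([o["customer"] for o in by_customer])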

(d) Summarising data


This process condenses the data by extracting only specific data based on criteria
provided by the user – for example, adding all the transactions for a specific supplier for
a particular month. A pivot table is an example of a Microsoft Excel function that can be
used to summarise data. In part 2 we will learn how to create a pivot table using Microsoft
Excel.
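
A small Python sketch of the same idea follows (supplier names and amounts are invented); it totals the month's transactions per supplier, which is the kind of summary a pivot table produces:

    from collections import defaultdict

    # Hypothetical transactions for one month: (supplier, amount).
    transactions = [
        ("ABC Traders", 150.00),
        ("XYZ Supplies", 90.00),
        ("ABC Traders", 310.00),
    ]

    totals = defaultdict(float)
    for supplier, amount in transactions:
        totals[supplier] += amount   # add all transactions for each supplier

    for supplier, total in totals.items():
        print(supplier, total)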

(e) Transforming data


Data is processed by transforming the format or medium of the original data into another
format or medium – for example, accounting data is transformed into graphical data
(graph), or audio files can be transformed into written text (used by individuals with
hearing disabilities to communicate telephonically) or vice versa (UNISA 2022).
In Microsoft Excel (Topic 2) we will learn to transform data using graphs.
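
As a simple sketch (the monthly sales figures are invented), the Python lines below transform numeric data into a rough text-based bar chart, changing the format of the data without changing its content:

    monthly_sales = {"Jan": 12, "Feb": 18, "Mar": 9}   # hypothetical units sold

    # Transform the numbers into a simple text 'graph'.
    for month, units in monthly_sales.items():
        print(f"{month}: {'*' * units}")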

Input-output in analysing student outcomes

As an example, as early as 1979, Elfner (1979), in a study investigating alternative
techniques for analysing student outcomes, compared two traditional statistical
methods (one-way analysis of variance and two-way analysis of variance) with an
innovative input-output analysis. The input-output analysis is a two-
step analysis, consisting of a regression analysis to determine the relationships among
input and output variables and an analysis of variance of the residuals grouped by
treatment to determine treatment effect. Student outcome data used was generated in
a typical classroom experiment, comparing three different methodologies of presenting
material to students. Pre- and post-measures of student achievement on a final exam
were taken. Some measures of student input characteristics, including grade point
average, age, sex, year in school, and residential status, were also taken. While the
traditional analysis techniques failed to show any treatment effects, the input-output
technique showed one of the treatments to be superior to the others. The treatment
effects on student outcomes may well be a function of input characteristics and an
interaction over time for individual students, which are not always discernible with
traditional statistical analysis techniques. However, the input-output analysis allows for
these problems by including other input characteristics in the analysis and allowing for
the separation of data by individuals. It was concluded that student outcome research
could benefit from the application of the input-output technique.

1.8.6 Processing types


Data can be processed into information by means of two processing types, either
batch processing or real-time processing.

(a) Batch processing

Batch processing occurs when the transaction files (containing the
captured data) are updated to the master files periodically, that is, daily,
weekly or monthly (UNISA 2022).

The main drawback of this method of processing is that the master files are only up to
date once the updating of the transaction files has occurred. When using this method
of processing, users who utilise information and data must be aware of which
transaction files were updated and which files were not, and therefore how up to date
the information they are looking at is. An example of batch processing is the marking
(processing) of all the captured AIN2601 Assignment 1 QUIZ answers on a specific
date. Students would therefore need to be aware that year marks will only be updated
after the assignment marks have been processed. Transaction files and master files
are explained in study unit 2.
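
The Python sketch below is a deliberately simplified model of batch processing; the customer codes and file structures are assumptions for illustration and do not represent any particular accounting package. Captured transactions accumulate in a transaction file and the master file is only brought up to date when the periodic batch run is performed.

    # Simplified master file: customer balances.
    master_file = {"C001": 0.00, "C002": 0.00}

    # Transaction file: captured transactions waiting for the next batch run.
    transaction_file = [("C001", 200.00), ("C002", 50.00), ("C001", -75.00)]

    def run_batch_update(master, transactions):
        """Periodic batch run: post all captured transactions to the master file."""
        for customer, amount in transactions:
            master[customer] += amount
        transactions.clear()   # the whole batch has now been processed

    # Until the batch run takes place, the master file is out of date.
    run_batch_update(master_file, transaction_file)
    print(master_file)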

(b) Real-time processing

The immediate update of the transaction files to the master files as the
transaction occurs is called real-time processing (UNISA 2022).

An example of real-time processing is buying a movie ticket at a movie theatre. The movie
you want to see and number of tickets (data) entered (either by yourself at the ticket
machine or by the salesperson) is immediately updated to the master files (seating plan)
so that the same seat cannot be sold twice to two different persons.

Real-time processing ensures that the master files are always up to date, and this is
also the greatest advantage of this method of processing. Transaction files and master files are
explained in study unit 2.
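
By contrast, the sketch below (a toy model of the movie ticket example, not an actual booking system) updates the master file immediately as each transaction is entered, so the same seat cannot be sold twice:

    # Master file: seating plan for one show (True = seat still available).
    seating_plan = {"A1": True, "A2": True, "A3": True}

    def sell_ticket(seat):
        """Real-time processing: the master file is updated as the sale occurs."""
        if seating_plan.get(seat):
            seating_plan[seat] = False   # the seat is immediately marked as sold
            return f"Seat {seat} sold"
        return f"Seat {seat} is no longer available"

    print(sell_ticket("A2"))
    print(sell_ticket("A2"))   # a second attempt is rejected straight away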

1.9 Output/information
Processed data becomes information, and this information can be retrieved by users
through batch output or interactive output.

1.9.1 Batch output

Batch output occurs when all requests for information (i.e., reports, queries,
etc) are batched together and periodically extracted from the CIS (UNISA
2022).

Since requests are batched before being extracted, users have to wait to receive their
required outputs. Batch output is often used for routine reports that must be extracted
at the same time each day, week or month (e.g., sales reports for the day, week or
month). These batched reports are pre-specified and include the same parameters
each time.

A bank generating monthly bank statements for clients is an example of batch output, as
the bank will extract the required information once a month on a specific date from their
CIS; i.e., all clients' bank statements are extracted in one batch.

The benefit of using batch output is that reports are consistent between periods.
For example, the sales reports will include the same branches in the different
geographical areas each time the specific report is extracted. Another benefit of batch
output is that the extraction of reports can be scheduled over down times (weekends,
evenings, etc.), thereby optimising computer resources.

1.9.2 Interactive output

Interactive output occurs when users are directly connected to the CIS and
can request certain information and receive it immediately (UNISA 2022).

Using internet banking and viewing your transactions for a month or a period specified
by you is an example of interactive output.

The main benefit of using this method of output is that users can immediately receive
information for decision making. One of the drawbacks of interactive output is that
the computer resources are not optimally used and as a result, the performance of the
CIS may be negatively affected. For example, at month-end, numerous users extract
reports from the CIS while day-to-day transaction processing still continues. This
increase in use of the CIS can make the users experience a “very slow” CIS response
time (i.e., the computer is “slow”).

1.10 Typical processing systems


In sections 1.6 to 1.9 we learnt about batch and online input, batch and real-time
processing, and batch and interactive output. Different combinations of these inputs,
processing and output types are used in an organisation’s CIS. Some of these typical
combinations are as follows:

Batch input, batch processing and batch output

The processing of claims at a medical aid is an example of such a system. Members
submit claims, which are batched together and then entered in the CIS. All the entered
batches are updated at the end of each day. Routine claim reports, indicating the claims
of the previous week, are extracted and distributed every Monday morning. See figure
1.2 for a visual representation.

FIGURE 1.2: Batch input, batch processing and batch output (UNISA 2022)

Batch input, batch processing and interactive output

For example, twice a week, the gym partner of a medical aid provides the medical aid
with a batch of electronic source documents in one file. This file contains the number of
times each member of the medical aid visited the gym. The batch source documents
contained in the electronic file are imported when received. The transaction files
containing the gym data are updated every Saturday. The members can view their
information on the medical aid’s secure website as soon as the transaction file has
been updated.

Online input, batch processing and interactive output

For example, each branch of an organisation enters its request for inventory online as
needed. The transaction file containing the different branches’ requests is updated to
the master file every two days. The branch manager can extract order information
directly from the operational system.
Online input, real-time processing and interactive output

An example of such a system is the processing of a transaction at a bank’s automatic
teller machine (ATM). The customer enters the transaction at the ATM. The transaction
is immediately updated to the master file and the customer receives a receipt with the
transaction details and his or her new updated balance. See figure 1.3 for a visual
representation.

FIGURE 1.3: Online input, real-time processing and interactive output (UNISA 2022)

Please note that an organisation’s CIS is not limited to one type of processing system.
Many organisations will use all the types of input, processing and output. The type used
will be determined by the activity performed, i.e., an organisation can use both batch
input and online input. Take for example the capturing of Unisa QUIZ assignment
answers. The answers submitted by students through the myUnisa interface are online
input, while the physical answer sheets are captured using batch input at the Unisa
main campus.

Can you think of an example where a CIS uses both real-time processing and batch
processing? Pastel Partner (per the standard set up) will use batch processing for
invoices but will use real-time processing for goods received notes (GRNs). You will learn
more about Pastel Partner in topic 7. Refer back to this study unit after you have
completed topic 7.

Banks use both batch output (printing of monthly bank statements) and interactive
output (request of a mini statement at an ATM).

Activity 1.1

Think about what processing systems you have encountered in everyday
life. For example, what type of processing system (input, processing and
output) is your favourite clothing store, supermarket or restaurant using?

Go to Discussion Forum 1.1 and discuss this with your fellow students.

Guidelines for participating in forums:


• Compile your post offline and keep record of it.
• Use an academic writing style for referencing and citing the sources you used.
• Post your answer on the forum.
• Reply to contributions of at least two of your fellow students.

1.11 Storage of data and information


Data is saved (stored) for use in processing, and information is stored (saved) to be
used by users. Data and information must be stored in the CIS in such a way that it
can be easily accessed when needed.

1.11.1 Flat file environment

Historically, computer systems’ data and information were stored in a flat file
environment, where files are not related to one another and the users of data and
information each keep their own data and information (UNISA 2022).

This is similar to an environment in which users each have their own Microsoft Excel
spreadsheet and do not share the data and information on their individual
spreadsheets. As computer systems evolved and the need arose for users to share data
and information, databases became the preferred method of storing data and
information. The database environment will be discussed in detail in study unit 2.
Simplistically, the flat file and database environment can be visualised as in figure 1.4
(flat file) and figure 1.5 (database).
FIGURE 1.4: Flat file environment (UNISA 2022)

FIGURE 1.5: Sharing data and information in a database environment (UNISA
2022)
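
To make the contrast in figures 1.4 and 1.5 concrete, the short Python sketch below (file, table and customer names are invented for illustration) stores the same customer data first as one user's own flat file and then in a shared SQLite database table that several users could query:

    import csv
    import sqlite3

    customers = [("C001", "Botha"), ("C002", "Dlamini")]

    # Flat file environment: each user keeps his or her own copy of the data.
    with open("my_customers.csv", "w", newline="") as f:
        csv.writer(f).writerows(customers)

    # Database environment: the data is stored once and shared by all users.
    conn = sqlite3.connect("shared.db")
    conn.execute("CREATE TABLE IF NOT EXISTS customers (code TEXT, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", customers)
    conn.commit()

    print(conn.execute("SELECT * FROM customers").fetchall())
    conn.close()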

1.12 Summary

In this study unit, we looked at typical processing systems based on the types of data input,
processing and output. We also gained an understanding of how data is processed into
information by sorting, classifying, calculating, summarising and transforming it. In the next
study unit, we will examine in detail the database environment used to store data and
information.

REFERENCES:

Amin, R., Vadlamudi, S. & Rahaman, M.M. (2021). Opportunities and challenges of data
migration in cloud. Engineering International, 9(1), pp. 41-50.

Elfner, E.S. (1979). A Comparative Study of Alternative Techniques for Analyzing
Student Outcomes.

Artamonov, A., Ionkina, K., Tretyakov, E. & Timofeev, A. (2018). Electronic document
processing operating map development for the implementation of the data
management system in a scientific organization. Procedia computer science, 145, pp.
248-253.

Azeroual, O. & Jha, M. (2021). Without data quality, there is no data migration. Big Data
and Cognitive Computing, 5(2), p.24

Bataev, A.V. (2018, September). Evaluation of Using Big Data Technologies in Russian
Financial Institutions. In 2018 IEEE International Conference "Quality Management,
Transport and Information Security, Information Technologies" (IT&QM&IS) (pp. 573-577).
IEEE.
Bernhardt, V.L. (1998). Data analysis for comprehensive schoolwide improvement. Eye on
Education, p. 1.

Carvalho, L.A.M.C., Essawy, B.T., Garijo, D., Medeiros, C.B. & Gil, Y. (2017).
Requirements for supporting the iterative exploration of scientific workflow variants.
In Proceedings of the Workshop on Capturing Scientific Knowledge (SciKnow),
Austin, Texas (Vol. 2017).

Fisher, D. & Frey, N. (2015). Show & Tell: A Video Column/Don't Just Gather Data--Use
It. Educational Leadership, 73(3), pp.80-81.

Future Generation Computer Systems, Volume 78, Part 1. (2018). pp. 353-368. ISSN
0167-739X. https://doi.org/10.1016/j.future.2016.06.009
(https://www.sciencedirect.com/science/article/pii/S0167739X16301923)

Jelonek, D. (2017). Big Data Analytics in the Management of Business. In MATEC Web of
Conferences (Vol. 125, p. 04021). EDP Sciences.

Johnson, M.D. & Bull, S. (2015). Designing for visualisation of formative information on
learning. In Measuring and Visualizing Learning in the Information-Rich Classroom (pp. 237-
250).

Latt, W.Z. (2019). Data migration process strategies (Doctoral dissertation, MERAL Portal).

Li, K.C., Jiang, H. & Zomaya, A.Y. (eds). (2017). Big data management and processing.
CRC Press.

www.accountingtools.com/articles/what-is-a-source-docu… (accessed 19/10/2022).
Maxwell, G.S. (2021). Different Approaches to Data Use. In Using Data to Improve Student
Learning (pp. 11-71). Springer, Cham.

Maxwell, G.S. (2021). Collecting Data and Creating Databases. In Using Data to Improve
Student Learning (pp. 113-141). Springer, Cham.

Pimazzoni, M. (1998). On Global data management: a winning approach to clinical data
processing. Drug Information Journal: DIJ/Drug Information Association, 32(2), pp.
569-571.

Sadyrin, I., Syrovatskay, O. & Leonova, O. (2021). Prospects for using big data in financial
analysis. In SHS Web of Conferences (Vol. 110). EDP Sciences.
Sakr, S. & Gaber, M. (eds). (2014). Large scale and big data: Processing and
management. Crc Press.

Sarmah, S.S. (2018). Data migration. Science and Technology, 8(1), pp. 1-10.

Souza, R., Silva, V., Coutinho, A.L., Valduriez, P. & Mattoso, M. (2016, November). Online
input data reduction in scientific workflows. In WORKS: Workflows in Support of Large-
scale Science.
Souza, R., Silva, V., Coutinho, A.L., Valduriez, P. & Mattoso, M., (2020). Data reduction in
scientific workflows using provenance monitoring and user steering. Future
Generation Computer Systems, 110, pp. 481-501.
UNISA. (2022). Study material for Accounting Information System in a Computer
Environment AIN1501. Unisa, Pretoria.
UNISA. (2022). Study material for Practical Accounting Data Processing AIN2601. Unisa,
Pretoria.
Vinçon, T., Koch, A. & Petrov, I. (2019). Moving processing to data: on the influence of
processing in memory on data management. arXiv preprint arXiv:1905.04767
Voithofer, R. & Golan, A.M. (2018). Data Sources for Educators. Responsible Analytics
and Data Mining in Education: Global Perspectives on Quality, Support, and Decision
Making.
Wang, Y., (2015). Data management and data processing support on array-based
scientific data (Doctoral dissertation, The Ohio State University).
Warnecke, B. (2018). From Customizing back to SAP standard options and limits when
migrating to SAP S/4HANA. HMD Praxis der Wirtschaftsinformatik, 55(1), pp. 151-
162.

Ziora, A.C.L. (2015). The role of big data solutions in the management of organizations.
Review of selected practical examples. Procedia Computer Science, 65, pp. 1006-1012.
https://doi.org/10.1016/j.procs.2015.09.059
