Data Analytics
ANALYTICS: THE FUTURE OF INTERNAL AUDITS
How are Data Analytics and Intelligent Automation making their way into the world of Internal Audit?
JUNE 12, 2019
INDEPENDENCE
The use of data analytics must preserve the internal auditor’s independence and objectivity. The nature of the outputs delivered by the analytics cycle can give rise to specific complications in meeting these principles, and careful consideration must be given to a number of issues that can affect the independence of the audit work.
Deal with outliers. Auditors can't assume that a 99.3 percent positive return means things are good, because that 0.7 percent might hide a significant issue that won't be known until you dig in. Outliers should not be ignored; they should be understood. They may be telling you something important. Seize the opportunity to discover why things didn't come out the way you thought they would.
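To make the outlier point concrete, here is a minimal sketch (the transaction amounts are invented for illustration) using a median-based screen, which is robust enough that a single large value cannot mask itself the way it can inflate a mean-based test:

```python
# Hypothetical transaction amounts; one value clearly deserves a closer look.
from statistics import median

amounts = [102.0, 98.5, 101.2, 99.8, 100.4, 97.9, 100.9, 5400.0]

# Median absolute deviation (MAD): a robust spread measure that a
# single extreme value cannot distort the way it distorts the stdev.
med = median(amounts)
mad = median(abs(x - med) for x in amounts)

# Flag values more than ~3 "robust standard deviations" from the median
# (1.4826 scales MAD to be comparable to a standard deviation).
outliers = [x for x in amounts if abs(x - med) > 3 * 1.4826 * mad]
print(outliers)  # the flagged items are the ones to understand, not discard
```

The flagged transactions are a starting point for inquiry, not a conclusion in themselves.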
Clearly visualize the data. It is incumbent on auditors to communicate audit results succinctly and clearly, and that includes graphs and charts carefully constructed to convey maximum value. Data visualization should also be leveraged early in the analysis process, as it enables pattern identification.
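As a sketch of the kind of chart that can surface patterns early, assuming matplotlib is available (the business units and exception counts below are hypothetical):

```python
# Render off-screen so this runs without a display.
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Hypothetical control-exception counts by business unit.
units = ["AP", "AR", "Payroll", "T&E"]
exceptions = [3, 11, 1, 7]

fig, ax = plt.subplots()
ax.bar(units, exceptions)
ax.set_title("Control exceptions by business unit")
ax.set_xlabel("Business unit")
ax.set_ylabel("Exception count")
fig.savefig("exceptions.png")
```

A simple bar chart like this makes the concentration of exceptions in one area visible at a glance, before any formal testing begins.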
Recognize when you should not use data. More data isn't necessarily better, as not all data will help achieve audit objectives. Realize that you can't use data for everything; sometimes instinct and experience are the best tools for auditing a certain area. And, as with correlation versus causation, data can appear to carry more meaning than it really does. Beware the danger of jumping to conclusions that ultimately may not be supported by the data.
Understand correlation versus causation. Correlation describes the relationship between two variables, while causation means that one event is the result of the occurrence of the other. It is easy, and all too common, to assume causation when there is simply correlation in the data, and individuals viewing the data will be influenced by past experience and their own personal biases.
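The distinction can be seen in a short sketch (all figures invented): two monthly series that rise and fall together not because one drives the other, but because a third factor, quarter-end transaction volume, drives both.

```python
# Hypothetical monthly series: overtime hours and posting errors both
# spike at quarter ends, so they correlate without either causing the other.
from statistics import mean

overtime = [10, 12, 30, 11, 13, 32, 12, 11, 31, 10, 12, 33]  # hours
errors   = [ 2,  3,  9,  2,  4, 10,  3,  2,  8,  2,  3,  9]  # postings

def pearson(xs, ys):
    """Pearson correlation coefficient, computed by hand."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

r = pearson(overtime, errors)
print(round(r, 2))  # strongly correlated, yet neither causes the other
```

A high coefficient here says nothing about whether overtime causes errors; the shared driver is the quarter-end workload.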
OPEN SOURCE FRAMEWORKS AS SOLUTIONS TO BIG DATA TOOL DEVELOPMENT
Apache Spark
Apache Spark is the alternative to, and in many respects the successor of, Apache Hadoop. Spark was built to address the shortcomings of Hadoop, and it does this incredibly well. For example, it can process both batch data and real-time data, and for in-memory workloads it can run up to 100 times faster than MapReduce. Spark provides in-memory data processing capabilities, which are far faster than the disk-based processing MapReduce relies on. In addition, Spark works with HDFS, OpenStack, and Apache Cassandra, both in the cloud and on premises, adding another layer of versatility to big data operations for your business.
R Programming Environment
R is mostly used along with the Jupyter stack (the name derives from Julia, Python, and R) for enabling wide-scale statistical analysis and data visualization. The Jupyter Notebook is one of the most popular big data visualization tools, as it allows composing nearly any analytical model from the more than 9,000 packages on CRAN (the Comprehensive R Archive Network), running it in a convenient environment, adjusting it on the go, and inspecting the analysis results at once.