
Volume 9, Issue 1, January – 2024 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

Multidimensional Data Analysis, Data Mining and Knowledge Discovery
Ekwe Prince O. (1, 2); Okoronkwo Mathew (1); Ukwome Tochi P (2); Anozie Valentine U (2)
(1) Department of Computer Science, University of Nigeria, Nsukka (UNN)
(2) Department of Computer Science, Federal College of Agriculture, Ishiagu (FCAI)

Abstract:- In recent times, the growing use and consumption of data has created the need for data to be organized, analyzed and used for prediction and decision making in order to improve human lives in different fields of endeavor. Multidimensional Data Analysis, Data Mining and Knowledge Discovery are all associated with the organization, analysis and extraction of a data set for an organization's decision making and future prediction. In this research work, our focus was on the techniques and applications of data mining as well as the phases involved in data mining. Our work further highlighted the current and future trends in data mining and the numerous benefits of multidimensional data analysis/data mining and knowledge discovery to individuals, organizations, government, societies and the world at large. The outcome of this research provides a detailed view of the positive impact of data mining on individuals, organizations, government, societies and the world at large for decision making and future prediction.

Keywords:- Data, Mining, Data Mining, Multidimensional Data Analysis, Knowledge Data Discovery.

I. INTRODUCTION

The advent and wide spread of information technology has led to the continuous generation of data in different databases. These generated data sets need to be organized and extracted with the sole aim of refining useful intelligence to help organizations/businesses resolve imminent challenges, predict trends, mitigate risk and find emerging opportunities to improve human lives.

We live in an information-rich, data-driven world. It is comforting to wake up to a surplus of readily accessible data, information and knowledge. However, this large amount of available data gives rise to multiple challenges, including the need to organize it and to extract useful information for decision making. The depth of the information determines the useful insights that can be drawn from it.

Data mining takes advantage of big data's endless opportunities and affordable processing power to analyze large data with ease. Computer processing power and speed have improved substantially in recent times, which has allowed the whole world to experience rapid, easy, and automated data analysis.

Data mining involves the extraction of meaningful information and patterns from complex data for the sake of decision making and future prediction. It is also referred to as the knowledge discovery process, knowledge mining from data, knowledge extraction, multidimensional data analysis or data/pattern analysis.

Data mining is also used to establish connections and to find patterns, anomalies and correlations that help in tackling multiple issues while producing usable information in the process. Data discovery is a vast, varied process that involves multiple elements and gives birth to refined decisions.

The remaining parts of this work are organized as follows. Section two highlights the background of data mining, section three lays emphasis on the phases of data mining, and section four sheds light on the application areas of data mining. Section five presents the benefits of data mining to organizations and section six points out the different techniques of data mining. Section seven breaks down the current and future trends of data mining, section eight highlights the contributions to knowledge and section nine concludes the research.

II. BACKGROUND TO THE STUDY

Angeli et al. (2017) asserted that for millennia humans have dug in different locations in search of missing knowledge. "Knowledge discovery in databases" is the process of sifting through data to find hidden information and forecast future trends. The phrase "data mining" came into being in the 1990s. Data mining emerged from the convergence of three scientific disciplines: artificial intelligence, machine learning, and statistics.

The 21st century presents us with high-dimensional, large-scale, distributed digital mining in which enormous data are mined in a short period of time. This leads to bright prospects, and the potential positive value is also boundless. Among these technologies, classification and prediction will aid future smart economic activities as well as provide vital reference for decisions.

Li and Long (2020) researched imaging examination and the quantitative detection and analysis of gastrointestinal diseases using data mining. Zuo (2018) did a detailed analysis of the attributes of network viruses and developed data mining software. He further blended the data mining

IJISRT24JAN1077 www.ijisrt.com 2243


technology and dynamic behavior interception technology to mine encrypted data and ascertain whether there is a malicious program. This approach was applied to network Trojan virus discovery.

Although different analytic methods are preferred for data sources with particular characteristics, some prominent analytic methods can be applied based on the common characteristics of log files.

Hao et al. (2016) summarized a number of common actions when introducing the Python package glassPy. These include summary data from the log file, the number of sessions, the time duration of each session and the frequency of each event. Moreover, event n-grams, or event sequences of different lengths, can be created for further implementation of similarity measures to classify and rate individuals' performances. To take the temporal data into account, hierarchical vectorization of the rank-ordered time intervals and the time interval distribution of event pairs were also implemented.

In addition to these common data analytic techniques, other existing data analytic approaches for processing data are Social Network Analysis (SNA; Zhu et al., 2016), Bayesian Networks/Bayes nets (BNs; Levy, 2014) and Markov Item Response Theory (Shu et al., 2017). Furthermore, later data mining approaches, including cluster analysis, decision trees, and artificial neural networks, have been useful in unveiling vital information about students' problem-solving strategies in different technology-enhanced assessments.

Buczak and Guven (2017) produced a tutorial-style survey of machine learning (ML) approaches and data mining (DM) processes for network analysis. Xu et al. (2013) explored the intermediate problems associated with data mining from a wider horizon and detailed several approaches that aid in preserving sensitive information. They reviewed recent and trending approaches to data mining and came up with some preliminary suggestions for future research. Yan and Zheng (2017) discovered, after a detailed data mining exercise, that many fundamental signals are vital predictors of cross-sectional stock returns. Their approach is general and was also applied to past return-based anomalies. Emoto et al. (2017) used terminal restriction fragment length polymorphism (T-RFLP) data mining technology to show the gut microbiota profile of patients who have coronary artery disease. Hong et al. (2018) presented a modern approach to constructing a flood susceptibility map of Poyang County, Jiangxi Province, China, by implementing fuzzy weight of evidence and data mining procedures. The outputs of these studies are not broad in scope and lack firm footing; thus, they cannot be fully accepted by neutral observers.

This study aims to highlight data mining trends, techniques and applications. This section provides a brief review of related techniques that have been frequently used and the impact of this research in relation to analyzing process data in technology-driven systems. There are two (2) main classes of data mining techniques: supervised and unsupervised learning methods (Fu et al., 2014; Sinharay, 2016). Supervised approaches are used when subjects' memberships are known and the aim is to train a classifier that can accurately classify the subjects into their own category (e.g., score) and then generalize efficiently to new datasets. Unsupervised approaches are used when subjects' memberships are unknown and the goal is to categorize the subjects into clearly different clusters based on characteristics that can differentiate them. The decision tree is a supervised data classification approach that has been used very often in analyzing process data in varying systems.

DiCerbo and Kidwai (2013) worked with Classification and Regression Tree (CART) methods to produce classifiers that detect a player's goal in a gaming environment. The authors showed the creation of the classifier, including feature generation and the pruning process, and evaluated the results using concise data. This research showed that CART could be a dependable automated detector and demonstrated how to create such a detector with a relatively small sample size (n = 527).

On the other hand, cluster analysis and Self-Organizing Maps (SOMs) are two prominent unsupervised methods that organize students' problem-solving strategies. Studies show that cluster analysis can consistently identify key characteristics in 155 students' performances in log files extracted from an educational game and simulation environment called Save Patch, which measures mathematical competence. The authors showed how they prepared the data for the application of clustering algorithms and found evidence that fuzzy cluster analysis is more accurate than hard cluster analysis in analyzing log file process data from a game/simulation environment. Most importantly, the authors showed that cluster analysis can identify both effective strategies and misconceptions students have with respect to the related construct. Fossey (2017) reviewed three unsupervised approaches, including k-means, SOM and Robust Clustering using Links (ROCK), for analyzing process data in log files from a game-based assessment case.

III. STEPS INVOLVED IN DATA MINING

Data mining involves a number of steps for analyzing varying data in order to organize and analyze data for decision making and future prediction.

Data mining requires in-depth knowledge of arithmetic/statistics, programming and business principles, as well as communication. In gathering knowledge about data analysis, every data scientist must take note of: Linear Algebra, Machine Learning, Data Retrieval and Databases, Artificial Intelligence, Problem-solving Ability, Data Structures and Algorithms, and Statistical Analysis.
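To make the phases of a data mining project discussed in this section more concrete, the following is a minimal Python sketch rather than a definitive implementation. It assumes a small hypothetical set of customer records (the fields, values and labels are invented for illustration). It arranges the data by dropping duplicates and rows with missing values, builds a simple model, and evaluates that model on held-out rows. The nearest-centroid classifier is chosen here only for brevity; the paper does not prescribe a particular algorithm.

```python
from collections import Counter

# Hypothetical raw records: (hours_of_use, purchases, label).
# None marks a missing value; the third row duplicates the first.
RAW = [
    (1.0, 0, "low"), (1.2, 1, "low"), (1.0, 0, "low"),
    (5.0, 4, "high"), (5.5, None, "high"), (6.0, 5, "high"),
    (0.8, 0, "low"), (4.8, 4, "high"),
]


def arrange(records):
    """Arrange the data: drop exact duplicates and rows with missing values."""
    seen, clean = set(), []
    for row in records:
        if None in row or row in seen:
            continue
        seen.add(row)
        clean.append(row)
    return clean


def train(records):
    """Model the data: compute one centroid (mean point) per label."""
    sums, counts = {}, Counter()
    for x, y, label in records:
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + x, sy + y)
        counts[label] += 1
    return {lab: (sx / counts[lab], sy / counts[lab]) for lab, (sx, sy) in sums.items()}


def predict(model, x, y):
    """Assign the label whose centroid is closest to the point (x, y)."""
    return min(model, key=lambda lab: (model[lab][0] - x) ** 2 + (model[lab][1] - y) ** 2)


def evaluate(model, records):
    """Evaluate the result: fraction of records whose label is predicted correctly."""
    hits = sum(predict(model, x, y) == label for x, y, label in records)
    return hits / len(records)


clean = arrange(RAW)
train_set, test_set = clean[:4], clean[4:]  # simple hold-out split
model = train(train_set)
accuracy = evaluate(model, test_set)
print(len(RAW), "raw rows ->", len(clean), "clean rows; held-out accuracy:", accuracy)
```

If the evaluation step gives a poor result, the analyst would return to the modeling step with a different algorithm, which mirrors the repeated search for a suitable algorithm that the evaluation phase calls for.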

Outlined below are the processes data analysts/scientists follow in order to handle a data mining project:

• Broad Understanding of the Business/Organization: At this stage of data mining, the following information must be studied and understood in detail:
  • Basic knowledge of the company
  • The organization's present footing
  • The project's goals/objectives
  • The benchmark for success

• Comprehend the Type of Data: This stage involves a thorough understanding of the data to be analyzed and the various sources of the data. In simpler terms, there are two steps involved in this stage:
  • Decipher the kind of data that is needed to resolve the problem
  • Source the data from the right source

• Arrange the Data: This stage of data mining involves sorting, arranging and preparing data for analysis. The steps in this phase are outlined below:
  • Rectify data quality issues such as duplicate data, missing data or corrupted data
  • Arrange the data in a structure suitable for solving the organization's challenges

• Remodel the Data to a Particular Form: This stage of data mining entails employing algorithms to ascertain data patterns, after which a working model is created. The steps in this stage are outlined below:
  • Deploy algorithms to predict data patterns
  • Data analysts create, test, and ascertain a suitable model from the data patterns generated

• Evaluate the Data to Ascertain the Result: This stage decides whether, and how effectively, the outcome of the model will positively impact the business goal or solve the issue. Moreover, the search for an appropriate algorithm is repeated in a situation where the data analyst fails to attain success in the first instance.

• Deploy the System to Management: The outcome of the data mined is given to the management for decision making and future prediction.

IV. APPLICATION AREAS OF DATA MINING

The benefits of data mining in a competitive business environment are enormous; as a result, decision making and future prediction are achieved in a short period of time. Outlined below are some data mining examples that show a broad range of application areas.

A. The Following are Trends and Application Areas of Data Mining:

• Shopping Data Analysis
The shopping market presents us with large data, and the management may need to sift through this large chunk of data looking for different patterns. In order to achieve this feat, market basket analysis is an appropriate analytic method. Market basket analysis is an analytic approach built on the idea that customers who buy one set of goods are likely to buy another set of goods. This approach helps small scale businesses predict a customer's purchasing habits. With differential analysis, data obtained from various customers and clients in various regional clusters can be analyzed.

• Weather Prediction Analysis
Data mining is also used for weather prediction. Weather forecasting systems make decisions from large amounts of historical data for a particular period of time. Since large chunks of data are accessed and processed, the right data mining approach has to be deployed.

• Analysis in the Stock Market
The stock market deals with enormous chunks of data, and these data need to be analyzed. As a result, data mining approaches are used to model such data in order to perform the analysis.

• Intrusion Detection System
An intrusion detection system analyzes varying data in order to monitor network activity. Data mining enhances intrusion detection systems by supporting anomaly detection. It helps in differentiating between unusual network activity and normal network activity.

• Fraud Detection System
Data mining aids fraud detection systems. Old-fashioned approaches to fraud detection are time-wasting and stressful as a result of the massive chunk of data involved. Data mining helps in predicting relevant patterns and in processing data into information.

• Video Surveillance
Video surveillance is used practically everywhere in our everyday life for security reasons. Enormous data are captured every single day from the cameras, and data mining is needed to analyze these enormous chunks of data.

• Analysis in Financial Banking
The importance of data mining in financial banking should not be underestimated, because every fresh business deal in automated banking produces an enormous chunk of data. By resolving hidden patterns, causalities, and correlations in business data, data mining provides solutions to banking bottlenecks and data access in banking and finance.

Data mining helps financial institutions build anti-fraud systems and credit ratings, and analyze client financial data, transaction records and card purchases. Moreover, financial institutions get a better understanding of their

clients' online habits and preferences through data mining, which serves as a bedrock when creating a new marketing campaign.

• Data Mining Analysis in Healthcare
Medical practitioners create precise diagnoses by combining physical examination results, the patient's medical history, medications, and treatment patterns using data mining. It also reduces fraud and waste, as well as enabling a more cost-effective automated health resource management system.

• Data Analysis in Marketing
If any application benefits most from data mining, it is marketing. After all, the sole aim of marketing is to target clients effectively for utmost sales, and the most appropriate strategy to get customers is to know them. Data mining comes in handy in bringing together data on gender, income level, age, location, tastes and spending habits to produce satisfying individualized loyalty campaigns. Data analysis can also ascertain which clients will most likely unsubscribe from a mailing list or other service offered by the organization. With such data from data mining techniques, businesses can adopt appropriate strategies to ensure such clients do not unsubscribe and leave for competitors.

• Data Analysis in Retail
Data mining in retail and in marketing work in parallel, but the former still deserves its own highlight. Retail shops and supermarkets use client choice patterns to pin down commodity associations and predict the items that need to be stocked in the supermarket/business premises and where they can be found. Data mining also highlights which of the choices attract the most response.

V. BENEFITS DERIVED FROM DATA MINING

We live in a data-driven world, which gives us a lot of advantages through ease of data access. Data mining gives us a paradigm for resolving challenges in this complex information age. The benefits of data mining include:

• It aids in gathering reliable information for organizations/businesses
• It is efficient and costs less in comparison to most data systems around
• It helps various companies ascertain all-round profit and model shifts
• Data mining can work with both new and legacy computer systems
• Informed decisions are made through data mining for businesses and organizations
• It also aims at detecting various credit risks and fraud
• Scientists analyze large amounts of data with ease using data mining
• Data analysts use the information to detect fraud, create risk models, and increase product safety
• It helps data analysts quickly produce computerized predictions of behaviors and trends and discover hidden patterns

VI. DATA MINING TECHNIQUES/TOOLS

As seasoned workmen are known to say, "To achieve success, use the right tool for the right job." It is vital to note the different tools used for data mining in order to make accurate and prompt organizational decisions. Outlined below are the strategies/techniques that equip data scientists with multiple data mining abilities.

• Artificial Intelligence Tool
Artificially intelligent processes perform analytical functions that reproduce human intelligence, such as problem-solving, reasoning, planning, and learning.

• Association Rule Learning Technique
The association rule toolset, also known as market basket analysis, searches for relationships amongst dataset variables. For example, association rule learning can determine which commodities are frequently bought together (e.g., a smartphone and a protective case).

• Clustering Technique
The clustering technique organizes datasets into different useful sets, known as clusters. This technique helps individuals perceive the natural groupings or strata in the data.

• Classification Technique
The classification technique assigns particular data in a set to various predefined categories or segments. The main objective is to make accurate predictions of the category for all the members of the set.

• Data Analytics Technique
The data analytics technique allows users to examine digital information and create useful business intelligence from it.

• Data Cleansing and Preparation Technique
This technique processes the raw data into an optimal form suitable for later processing and usage. Preparation involves processes such as finding and correcting errors and checking for missing or duplicate data.

• Data Warehousing Technique
This technique includes a comprehensive collation of a company's data that management uses to aid them in setting appropriate goals. Warehousing is a basic and important piece of the jigsaw in most large data mining systems.

• Machine Learning Technique
The machine learning technique is related to the artificial intelligence technique. Machine learning is a computerized programming technique that uses statistical probabilities to give computers the ability to learn without human intervention or manual programming.
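As an illustration of the association rule learning technique described in this section, the following is a minimal Python sketch, not a definitive implementation. It assumes a handful of hypothetical shopping baskets (the items and thresholds are invented for illustration) and computes the two standard measures, support and confidence, for rules between pairs of items. A production system would typically use an algorithm such as Apriori to handle larger itemsets; this sketch only enumerates pairs.

```python
from itertools import combinations

# Hypothetical shopping baskets, one set of items per transaction.
BASKETS = [
    {"smartphone", "protective case", "charger"},
    {"smartphone", "protective case"},
    {"smartphone", "charger"},
    {"protective case", "screen guard"},
    {"smartphone", "protective case", "screen guard"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def pair_rules(baskets, min_support=0.3, min_confidence=0.7):
    """Return (antecedent, consequent, support, confidence) for item pairs."""
    items = sorted(set().union(*baskets))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, baskets)
        if pair_support < min_support:
            continue  # the pair is bought together too rarely
        for lhs, rhs in ((a, b), (b, a)):
            # confidence = share of baskets with lhs that also contain rhs
            confidence = pair_support / support({lhs}, baskets)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


rules = pair_rules(BASKETS)
for lhs, rhs, s, c in rules:
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

In these sample baskets, three of the four baskets that contain a smartphone also contain a protective case, so the rule "smartphone -> protective case" passes both thresholds, while "smartphone -> charger" does not because only half of the smartphone baskets include a charger.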

• Regression Technique
The regression technique forecasts numeric values such as sales, stock prices, or even temperature. The forecasts are based on the information found in a particular data set.

• There are Two Specific Tools: R and ODM
R is an open source language that is used for graphics and statistical computing. It equips data scientists with a wide range of statistical tests, classification and graphical techniques, and time-series analysis.

Oracle Data Mining (ODM) is a component of the Oracle Advanced Analytics option for Oracle Database. It aids data scientists in forecasting and generating detailed insights. Data scientists use ODM to forecast client behavior, develop client profiles, and identify cross-selling opportunities.

VII. CURRENT AND FUTURE TRENDS IN DATA MINING

The present and future of data mining are positive, as long as data volumes continue to increase. Data mining techniques have improved due to technological improvement, as have the systems that extract useful information from data. Before now, only large organizations with high budgets used multiple supercomputers to organize, analyze and mine data for decision making and future prediction, because storing and analyzing data was very costly.

Today, organizations are introducing artificial intelligence, machine learning, and deep learning on cloud-based data lakes to extract vital information from chunks of data for future prediction.

The Internet of Things and wearable computing have turned both individuals and electronic devices into data-generating machines capable of creating massive data on individuals and organizations. Through this, organizations can accumulate, store, and analyze massive amounts of data for decision making.

Cloud-based analytics solutions provide an easier and more cost-effective paradigm for organizations to access large amounts of data and processing power. Cloud computing allows organizations to easily access and act on data from manufacturing, sales, and inventory systems, the Internet and marketing, among other sources, in order to enhance their bottom line.

VIII. CONTRIBUTION TO KNOWLEDGE

This research paper contributes to knowledge by highlighting the place of organizing and extracting information from a data set for decision making and future prediction. Moreover, the research paper brings to light the techniques involved in data mining and the current and future trends for individuals, organizations, government, societies and the world at large.

IX. CONCLUSION

Data Mining, Multidimensional Data Analysis and Knowledge Discovery are integral parts of analyzing, organizing and extracting cumbersome data and presenting the needed information for decision making and future prediction. Needless to say, the role and importance of data mining to individuals, organizations, government, societies and the world at large cannot be overemphasized.

The article provided comprehensive knowledge on where data mining started, and went one step further to explore the trends and techniques of data mining. Finally, the future and applications of data mining were also examined.

REFERENCES

[1]. Angeli, C., Howard, S. K., Ma, J., Yang, J., and Kirschner, P. A. (2017). Data mining in educational technology classroom research: can it make a contribution? Computers & Education, vol. 113, pp. 226–242.
[2]. Buczak, A. and Guven. (2017). A survey of data mining and machine learning methods for cyber security intrusion detection, IEEE Communications Surveys & Tutorials, vol. 18, no. 2, pp. 1153–1176.
[3]. DiCerbo, K. E., and Kidwai, K. (2013). Detecting player goals from game log files, in Poster presented at the Sixth International Conference on Educational Data Mining (Memphis, TN).
[4]. Emoto, T., Yamashita, T., Kobayashi, T. (2017). Characterization of gut microbiota profiles in coronary artery disease patients using data mining analysis of terminal restriction fragment length polymorphism: gut microbiota could be a diagnostic marker of coronary artery disease, Heart and Vessels, vol. 32, no. 1, pp. 39–46.
[5]. Fossey, W. A. (2017). An Evaluation of Clustering Algorithms for Modeling Game-Based Assessment Work Processes. Unpublished doctoral dissertation, University of Maryland, College Park.
[6]. Fu, J., Zapata-Rivera, D., and Mavronikolas, E. (2014). Statistical Methods for Assessments in Simulations and Serious Games (ETS Research Report Series No. RR-14-12). Princeton, NJ: Educational Testing Service.
[7]. Hao, J., Smith, L., Mislevy, R. J., von Davier, A. A., and Bauer, M. (2016). Taming Log Files From Game/Simulation-Based Assessments: Data Models and Data Analysis Tools (ETS Research Report Series No. RR-16-10). Princeton, NJ: Educational Testing Service.
[8]. Hong, H., Tsangaratos, P., Ilia, I., Liu, J., Zhu, A.-X. and Chen, W. (2018). Application of fuzzy weight of evidence and data mining techniques in construction of flood susceptibility map of Poyang County, China, The Science of the Total Environment, vol. 625, no. 1, pp. 575–588.

[9]. Levy, R. (2014). Dynamic Bayesian Network Modeling of Game Based Diagnostic Assessments (CRESST Report No. 837). Los Angeles, CA: University of California, National Center for Research on Evaluation, Standards, and Student Testing (CRESST), Center for Studies in Education, UCLA.
[10]. Li, T. and Long, L. (2020). Imaging examination and quantitative detection and analysis of gastrointestinal diseases based on data mining technology, Journal of Medical Systems, vol. 44, no. 1, pp. 1–15.
[11]. Sinharay, S. (2016). An NCME instructional module on data mining methods for classification and regression. Educ. Meas. Issues Pract. 35, 38–54. doi: 10.1111/emip.12115.
[12]. Shu, Z., Bergner, Y., Zhu, M., Hao, J., and von Davier, A. A. (2017). An item response theory analysis of problem-solving processes in scenario-based tasks. Psychol. Test Assess. Model. 59, 109–131.
[13]. Xu, B., Recker, M., Qi, X., Flann, N., and Ye, L. (2013). Clustering educational digital library usage data: a comparison of latent class analysis and k-means algorithms. J. Educ. Data Mining 5, 38–68.
[14]. Yan, X. and Zheng, L. (2017). Fundamental analysis and the cross-section of stock returns: a data-mining approach, Review of Financial Studies, vol. 30, no. 4, pp. 1382–1423.
[15]. Zhu, M., Shu, Z., and von Davier, A. A. (2016). Using networks to visualize and analyze process data for educational assessment. J. Educ. Meas. 53, 190–211. doi: 10.1111/jedm.12107.
[16]. Zuo, C. (2018). Defense of computer network viruses based on data mining technology, International Journal on Network Security, vol. 20, no. 4, pp. 805–810.
