This document discusses 14 trends in data mining, including the integration of data mining with database and Web systems, standardization of data mining languages, visual and distributed data mining methods, mining of complex data types such as graphs and biological data, and real-time data mining. It also summarizes applications of data mining in finance, retail, telecommunications, biology, science, and intrusion detection. Finally, it outlines the theoretical foundations of data mining, statistical techniques, visual data mining, collaborative filtering, and security aspects of data mining.
1. Application exploration: Early data mining applications focused mainly on helping businesses gain a competitive edge.
2. Scalable and interactive data mining methods: In contrast with traditional data analysis methods, data mining must be able to handle huge amounts of data efficiently and, if possible, interactively.
3. Integration of data mining with database systems, data warehouse systems, and Web database systems: Database systems, data warehouse systems, and the Web have become mainstream information processing systems, so data mining needs to integrate smoothly into these environments as an essential analysis component.
4. Standardization of data mining language: A standard data mining language or other standardization efforts will facilitate the systematic development of data mining solutions, improve interoperability among multiple data mining systems and functions, and promote the education and use of data mining systems in industry and society.
5. Visual data mining: Visual data mining is an effective way to discover knowledge from huge amounts of data.
6. Biological data mining: Although biological data mining can be considered under application exploration or mining complex types of data, the unique combination of complexity, richness, size, and importance of biological data warrants special attention in data mining.
7. Data mining and software engineering: As software programs grow larger and more complex, and increasingly originate from the integration of multiple components developed by different software teams, ensuring software robustness and reliability becomes an increasingly challenging task. Mining the data produced by program executions can help meet this challenge.
8. Web mining: Web mining is the application of data mining techniques to discover patterns from the Web. Depending on the analysis target, Web mining can be divided into three types: Web usage mining, Web content mining, and Web structure mining.
9. Distributed data mining: Traditional data mining methods, designed to work at a centralized location, do not work well in many of the distributed computing environments present today (e.g., the Internet, intranets, local area networks, high-speed wireless networks, and sensor networks). Advances in distributed data mining methods are expected.
10. Real-time or time-critical data mining: Many applications involving stream data (such as e-commerce, Web mining, stock analysis, intrusion detection, mobile data mining, and data mining for counterterrorism) require dynamic data mining models to be built in real time. Additional development is needed in this area.
11. Graph mining, link analysis, and social network analysis: Graph mining, link analysis, and social network analysis are useful for capturing sequential, topological, geometric, and other relational characteristics of many scientific data sets (such as chemical compounds and biological networks) and social data sets (such as those used to analyze hidden criminal networks).
12. Multirelational and multidatabase data mining: Most data mining approaches search for patterns in a single relational table or in a single database. However, most real-world data and information are spread across multiple tables and databases, so methods that can mine across them are needed.
13. New methods for mining complex types of data: Mining complex types of data is an important research frontier in data mining. Although progress has been made in mining stream, time-series, sequence, graph, spatiotemporal, multimedia, and text data, there is still a huge gap between the needs of these applications and the available technology.
14. Privacy protection and information security in data mining: An abundance of recorded personal information available in electronic forms and on the Web, coupled with increasingly powerful data mining tools, poses a threat to our privacy and data security.

2.5 APPLICATIONS OF DATA MINING

Data Mining for Financial Data Analysis
A few typical cases:
1. Design and construction of data warehouses for multidimensional data analysis and data mining
2. Loan payment prediction and customer credit policy analysis
3. Classification and clustering of customers for targeted marketing
4. Detection of money laundering and other financial crimes

Data Mining for the Retail Industry
A few examples of data mining in the retail industry:
1. Design and construction of data warehouses based on the benefits of data mining
2. Multidimensional analysis of sales, customers, products, time, and region
3. Analysis of the effectiveness of sales campaigns
4. Customer retention: analysis of customer loyalty
5. Product recommendation and cross-referencing of items

Data Mining for the Telecommunication Industry
1. Multidimensional analysis of telecommunication data
2. Fraudulent pattern analysis and the identification of unusual patterns
3. Multidimensional association and sequential pattern analysis
4. Mobile telecommunication services
5. Use of visualization tools in telecommunication data analysis

Data Mining for Biological Data Analysis
1. Semantic integration of heterogeneous, distributed genomic and proteomic databases
2. Alignment, indexing, similarity search, and comparative analysis of multiple nucleotide/protein sequences
3. Discovery of structural patterns and analysis of genetic networks and protein pathways
4. Association and path analysis: identifying co-occurring gene sequences and linking genes to different stages of disease development
5. Visualization tools in genetic data analysis

Data Mining in Other Scientific Applications
Data collection and storage technologies have recently improved, so that today scientific data can be amassed at much higher speeds and lower costs. This has resulted in the accumulation of huge volumes of high-dimensional data, stream data, and heterogeneous data containing rich spatial and temporal information.
Consequently, scientific applications are shifting from the hypothesize-and-test paradigm toward a "collect and store data, mine for new hypotheses, confirm with data or experimentation" process. This shift brings new challenges for data mining, including:
1. Data warehouses and data preprocessing
2. Mining complex data types
3. Graph-based mining
4. Visualization tools and domain-specific knowledge

Data Mining for Intrusion Detection
The security of our computer systems and data is at continual risk. The extensive growth of the Internet and the increasing availability of tools and tricks for intruding into and attacking networks have made intrusion detection a critical component of network administration. The following are areas in which data mining technology may be applied or further developed for intrusion detection:
1. Development of data mining algorithms for intrusion detection
2. Association and correlation analysis, and aggregation to help select and build discriminating attributes
3. Analysis of stream data
4. Distributed data mining
5. Visualization and querying tools

Data Mining System Products and Research Prototypes
Data mining systems should be assessed on the following features:
1. Data types
2. System issues
3. Data sources
4. Data mining functions and methodologies
5. Coupling data mining with database and/or data warehouse systems
6. Scalability
7. Visualization tools
8. Data mining query language and graphical user interface

Additional Themes on Data Mining

Theoretical Foundations of Data Mining
1. Data reduction: In this theory, the basis of data mining is to reduce the data representation.
2. Data compression: According to this theory, the basis of data mining is to compress the given data by encoding it in terms of bits, association rules, decision trees, clusters, and so on.
3. Pattern discovery: In this theory, the basis of data mining is to discover patterns occurring in the database, such as associations, classification models, sequential patterns, and so on.
4. Probability theory: This view is based on statistical theory. In this theory, the basis of data mining is to discover joint probability distributions of random variables, for example, Bayesian belief networks or hierarchical Bayesian models.
5. Microeconomic view: The microeconomic view considers data mining as the task of finding patterns that are interesting only to the extent that they can be used in the decision-making process of some enterprise (e.g., regarding marketing strategies and production plans).
6. Inductive databases: According to this theory, a database schema consists of data and patterns that are stored in the database, and data mining is the task of querying both the data and the patterns.

Statistical Data Mining Techniques
1. Regression
2. Generalized linear models
3. Analysis of variance
4. Mixed-effect models
5. Factor analysis
6. Discriminant analysis
7. Time series analysis
8. Survival analysis
9. Quality control

Visual and Audio Data Mining
Visual data mining discovers implicit and useful knowledge from large data sets using data and/or knowledge visualization techniques. In general, data visualization and data mining can be integrated in the following ways:
1. Data visualization
2. Data mining result visualization
3. Data mining process visualization
4. Interactive visual data mining

Data Mining and Collaborative Filtering
Recommender systems commonly use a collaborative filtering approach, in which products are recommended to a customer based on the opinions of other customers. Collaborative recommender systems may employ data mining or statistical techniques to search for similarities among customer preferences.

Security of Data Mining
Data security-enhancing techniques have been developed to help protect data.
Databases can employ a multilevel security model to classify and restrict data according to various security levels, with users permitted access only to their authorized level. Privacy-sensitive data mining deals with obtaining valid data mining results without learning the underlying data values.
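Privacy-sensitive data mining can be illustrated with randomized response (Warner's scheme), one classic technique used here purely as an example. The minimal sketch below assumes a single yes/no sensitive attribute: each individual reports the true value only with probability p_truth and the opposite value otherwise, so no single report reveals that individual's actual value, yet the population share can still be estimated from the noisy reports by inverting E[observed share] = p*pi + (1 - p)*(1 - pi). The function names, the 30% population share, and p_truth = 0.75 are hypothetical choices for illustration.

```python
import random
from typing import Sequence


def randomize(true_value: bool, p_truth: float, rng: random.Random) -> bool:
    """Warner's randomized response: report the true bit with probability
    p_truth, otherwise report its negation. A single report therefore does
    not reveal the respondent's true value."""
    return true_value if rng.random() < p_truth else not true_value


def estimate_proportion(reports: Sequence[bool], p_truth: float) -> float:
    """Estimate the true share pi of True values from randomized reports.

    The expected share of True reports is
        lam = p * pi + (1 - p) * (1 - pi) = (2p - 1) * pi + (1 - p),
    so pi = (lam - (1 - p)) / (2p - 1). Undefined when p = 0.5.
    """
    if p_truth == 0.5:
        raise ValueError("p_truth = 0.5 makes reports independent of true values")
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth)) / (2 * p_truth - 1)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical population in which about 30% hold a sensitive attribute.
    true_values = [rng.random() < 0.30 for _ in range(100_000)]
    reports = [randomize(v, p_truth=0.75, rng=rng) for v in true_values]
    actual = sum(true_values) / len(true_values)
    estimate = estimate_proportion(reports, p_truth=0.75)
    print(f"actual share: {actual:.3f}, estimated from reports: {estimate:.3f}")
```

Moving p_truth closer to 0.5 gives each individual more deniability, but it also inflates the error of the estimate, because the sampling error of the observed share is divided by 2p - 1.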