Encyclopedia of Data Warehousing and Mining, Second Edition
High-dimensional data is common in real-world data mining applications. Text data is a typical example. In text mining, a text document is viewed as a vector of terms whose dimension equals the total number of unique terms in the data set, which is usually in the thousands. High-dimensional data occurs in business as well. In retail, for example, suppliers are often categorized according to their business behaviors in order to manage supplier relationships effectively (Zhang, Huang, Qian, Xu, & Jing, 2006). The supplier behavior data is high dimensional: it contains thousands of attributes describing the suppliers' behaviors, including product items, ordered amounts, order frequencies, product quality and so forth. One more example is DNA microarray data. Clustering high-dimensional data requires special treatment (Swanson, 1990; Jain, Murty, & Flynn, 1999; Cai, He, & Han, 2005; Kontaki, Papadopoulos, & Manolopoulos, 2007), although various methods for clustering are available (J...
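The term-vector view described above can be illustrated with a minimal sketch. The helper names `build_vocabulary` and `to_term_vector` and the two toy documents are assumptions made for illustration; they do not come from the encyclopedia entry, and real text mining pipelines typically add tokenization, stop-word removal and term weighting.

```python
from collections import Counter


def build_vocabulary(documents):
    """Return the sorted list of unique terms across all documents."""
    return sorted({term for doc in documents for term in doc.lower().split()})


def to_term_vector(document, vocabulary):
    """Map one document to a vector of term counts, one entry per vocabulary term."""
    counts = Counter(document.lower().split())
    return [counts[term] for term in vocabulary]


# Toy corpus (hypothetical): the vector dimension equals the number of unique terms.
docs = [
    "clustering high dimensional data",
    "text data is high dimensional",
]
vocab = build_vocabulary(docs)
vectors = [to_term_vector(d, vocab) for d in docs]
```

Even this two-document corpus yields six dimensions, which suggests how a realistic collection reaches thousands of dimensions with mostly zero entries.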
2019 •
Published by: Blue Eyes Intelligence Engineering & Sciences Publication. Retrieval Number: A5498108118/19©BEIESP. Abstract: Clustering high-dimensional data is a promising research area. Clustering a multi-dimensional dataset has become a crucial task because data objects are widely dispersed in multi-dimensional space. Most conventional clustering algorithms use all dimensions of the feature space to compute clusters, even though only a few attributes are relevant, so their results are not very precise. This research paper proposes a modified subspace clustering method that does not use all attributes of the high-dimensional feature space simultaneously; instead, it determines a subspace of attributes that are important for each individual cluster. This subspace of attributes may be the same or different for different clusters. A comparison between conventional K-Means and modified subspace K-means clustering algorithms was done based on variou...
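The idea of a per-cluster subspace can be sketched as follows. This is not the paper's algorithm, which the abstract does not specify; it assumes, purely for illustration, that an attribute is relevant to a cluster when its within-cluster variance is below a threshold. The function name `relevant_dimensions`, the threshold and the sample points are all hypothetical.

```python
from statistics import pvariance


def relevant_dimensions(points, threshold):
    """Return indices of dimensions whose within-cluster variance is at most threshold.

    Assumption: low variance inside a cluster marks an attribute as relevant to it.
    """
    n_dims = len(points[0])
    return [
        d for d in range(n_dims)
        if pvariance([p[d] for p in points]) <= threshold
    ]


# Hypothetical clusters: each one is tight in a different subset of attributes.
cluster_a = [(1.0, 10.0, 0.0), (1.1, 50.0, 9.0), (0.9, 90.0, 4.0)]
cluster_b = [(5.0, 2.0, 7.0), (60.0, 2.1, 7.1), (30.0, 1.9, 6.9)]

subspace_a = relevant_dimensions(cluster_a, threshold=0.5)
subspace_b = relevant_dimensions(cluster_b, threshold=0.5)
```

Here the first cluster is compact only in attribute 0 and the second only in attributes 1 and 2, so the two clusters end up with different subspaces.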
ACM SIGKDD Explorations Newsletter
Subspace clustering for high dimensional data: a review 2004 •
Subspace clustering is an extension of traditional clustering that seeks to find clusters in different subspaces within a dataset. Often in high dimensional data, many dimensions are irrelevant and can mask existing clusters in noisy data. Feature selection removes irrelevant and redundant dimensions by analyzing the entire dataset. Subspace clustering algorithms localize the search for relevant dimensions allowing them to find clusters that exist in multiple, possibly overlapping subspaces. There are two major ...
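The masking effect of irrelevant dimensions can be seen in a small sketch. The points, the choice of relevant dimensions and the helper `project` are assumptions made for illustration; they are not taken from the review.

```python
import math


def project(point, dims):
    """Keep only the coordinates of point listed in dims."""
    return tuple(point[d] for d in dims)


# Hypothetical points: a and b agree on dimensions 0 and 1, but dimension 2 is noise.
a = (1.0, 1.0, 0.0)
b = (1.1, 0.9, 100.0)
c = (5.0, 5.0, 1.0)

# Full space: the noisy dimension makes a look closer to c than to b.
full_ab = math.dist(a, b)
full_ac = math.dist(a, c)

# Subspace (0, 1): a and b are near neighbours again.
dims = (0, 1)
sub_ab = math.dist(project(a, dims), project(b, dims))
sub_ac = math.dist(project(a, dims), project(c, dims))
```

In the full space a single noisy coordinate dominates the Euclidean distance, while in the two-dimensional subspace the neighbourhood structure of `a` and `b` is recovered.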
Data mining has emerged as a powerful tool for extracting knowledge from huge databases. Researchers have introduced several machine learning algorithms that explore databases to discover information, hidden patterns, and rules that were not known when the data was recorded. Owing to remarkable developments in storage capacity, processing power and algorithmic tools, practitioners are developing new and improved algorithms and techniques in several areas of data mining to discover rules and relationships among attributes in simple and complex higher-dimensional databases. Furthermore, data mining is applied in a large variety of areas, ranging from banking to marketing, engineering to bioinformatics, and investment to risk analysis and fraud detection. Practitioners are analyzing and implementing artificial neural network techniques for classification and regression problems because of their accuracy and efficiency. The aim of this short ...
ACM SIGKDD Explorations Newsletter
Subspace clustering for high dimensional categorical data 2004 •
Data clustering has been discussed extensively, but almost all known conventional clustering algorithms tend to break down in high dimensional spaces because of the inherent sparsity of the data points. Existing subspace clustering algorithms for handling high-dimensional data focus on numerical dimensions. In this paper, we designed an iterative algorithm called SUBCAD for clustering high dimensional categorical data sets, based on the minimization of an objective function for clustering. We deduced rules for changing cluster memberships from the objective function. We also designed an objective function to determine the subspace associated with each cluster. We proved various properties of this objective function that are essential for designing a fast algorithm to find the subspace associated with each cluster. Finally, we carried out some experiments to show the effectiveness of the proposed method and the algorithm.
Advanced Data Mining and Applications
A Fuzzy Subspace Algorithm for Clustering High Dimensional Data 2006 •
2011 •
Abstract: In high dimensional databases, traditional full space clustering methods are known to fail due to the curse of dimensionality. Thus, in recent years, subspace clustering and projected clustering approaches were proposed for clustering in high dimensional spaces. As the area is rather young, few comparative studies on the advantages and disadvantages of the different algorithms exist.
International Journal of Advanced Computer Science and Applications
Pattern Based Subspace Clustering: A Review 2010 •
2010 •
Clustering algorithms break down when the data points lie in very high-dimensional spaces. To tackle this problem, many subspace clustering methods have been proposed to build a subspace in which the data points cluster efficiently. The bottom-up approach is widely used: it selects a set of candidate features and then uses a portion of this set to build up the hidden subspace step by step. The complexity depends exponentially or cubically on the number of selected features. In this paper, we present SEGCLU, a SEGregation-based subspace CLUstering method which significantly reduces the size of the candidate feature set and has cubic complexity. The algorithm was applied to noise-free DNA copy number data from two groups of children, autistic and typically developing, to extract a potential biomarker for autism. 85% of the individuals were classified correctly in a 13-dimensional subspace.
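The step-by-step growth of candidate subspaces in bottom-up methods can be sketched with a generic Apriori-style join, as used by grid-based approaches such as CLIQUE. This is not SEGCLU, whose details the abstract does not give; the function name `join_subspaces` and the input subspaces are hypothetical, and all inputs are assumed to have the same dimensionality k.

```python
from itertools import combinations


def join_subspaces(subspaces):
    """Build (k+1)-dimensional candidates from k-dimensional subspaces.

    A candidate is kept only if every one of its k-dimensional subsets is
    already in the input (the Apriori pruning rule).
    Assumption: all input subspaces have the same size k.
    """
    current = {frozenset(s) for s in subspaces}
    candidates = set()
    for a, b in combinations(current, 2):
        union = a | b
        if len(union) != len(a) + 1:
            continue  # the pair does not share k-1 dimensions
        if all(frozenset(sub) in current for sub in combinations(union, len(a))):
            candidates.add(union)
    return sorted(sorted(c) for c in candidates)


# Hypothetical 2-dimensional subspaces that passed some density or quality test.
level_2 = [{0, 1}, {0, 2}, {1, 2}, {2, 3}]
level_3 = join_subspaces(level_2)
```

Because every level can combine the survivors of the previous one, the number of candidates can grow exponentially with the number of selected features, which is the cost that smaller candidate sets aim to reduce.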
Пушкинский временник. Специальный выпуск в честь Татьяны Ивановны Краснобородько
Новое о Сен При2023 •
2021 •
International Journal of Instruction
English Teachers' Competency in Flipped Learning: Question Level and Questioning Strategy in Reading Comprehension2022 •
2019 •
Intelligent Virtual Agents
Giving Emotional Contagion Ability to Virtual Agents in Crowds2017 •
Biology of Sport
A genetic-based algorithm for personalized resistance-training2016 •
Jurnal PkM Pengabdian kepada Masyarakat
Penggunaan Poster Sebagai Alternatif Sosialisasi Padanan Istilah Bahasa Indonesia DI RW 03 Kelurahan Meruyung, Kecamatan Limo, Kota Depok
Argumentos Voces Jurídicas & Literarias
La dignidad como principio fundante de los derechos2024 •
Renseignement et espionnage pendant la Seconde Guerre mondiale
Renseignement et espionnage pendant la Seconde Guerre mondiale 2024 •