- research-article, October 2024
Automated anomaly detection for categorical data by repurposing a form filling recommender system
Journal of Data and Information Quality (JDIQ), Volume 16, Issue 3, Article No.: 16, Pages 1–28, https://doi.org/10.1145/3696110
Data quality is crucial in modern software systems, like data-driven decision support systems. However, data quality is affected by data anomalies, which represent instances that deviate from most of the data. These anomalies affect the reliability and ...
- Article, July 2024
PCA Dimensionality Reduction for Categorical Data
Abstract: The purpose of the article is to develop a new dimensionality reduction algorithm for categorical data. We give a new geometric formulation of the PCA dimensionality reduction method for numerical data that can be effectively transferred to the ...
- research-article, September 2024
Latent class analysis: A review and recommendations for future applications in health sciences
Procedia Computer Science (PROCS), Volume 238, Issue C, Pages 1062–1067, https://doi.org/10.1016/j.procs.2024.06.135
Abstract: This article provides an overview of latent class analysis (LCA) and its applications in the field of health sciences. LCA is a statistical method used to identify unobserved subpopulations, or latent classes, within a population based on a set of ...
- research-article, February 2023
About Representation and Evaluation of the Scientific Data, Numerical and Non-Numerical Nature in the Properties of Materials Research
Automatic Documentation and Mathematical Linguistics (SPADML), Volume 57, Issue 1, Pages 46–54, https://doi.org/10.3103/S0005105523010077
Abstract: The paper considers issues in presenting and assessing the quality of scientific data in the materials science domain. Problems with numerical and non-numerical data of various types are presented using examples of data on the properties of ...
- research-article, March 2024
The similarity of the similarity measures in the context of clustering algorithms for categorical data
Procedia Computer Science (PROCS), Volume 225, Issue C, Pages 4511–4520, https://doi.org/10.1016/j.procs.2023.10.449
Abstract: This research aims to assess the similarity between determining the value of the similarity of objects by using various measures. Particularly for categorical data, the used similarity measures may behave differently depending on whether the ...
- research-article, January 2022
A class of conjugate priors for multinomial probit models which includes the multivariate normal one
The Journal of Machine Learning Research (JMLR), Volume 23, Issue 1, Article No.: 30, Pages 1358–1383
Multinomial probit models are routinely-implemented representations for learning how the class probabilities of categorical response data change with p observed predictors. Although several frequentist methods have been developed for estimation, inference ...
- research-article, January 2021
Analyzing mixed-type data by using word embedding for handling categorical features
Intelligent Data Analysis (INDA), Volume 25, Issue 6, Pages 1349–1368, https://doi.org/10.3233/IDA-205453
Most of real-world datasets are of mixed type including both numeric and categorical attributes. Unlike numbers, operations on categorical values are limited, and the degree of similarity between distinct values cannot be measured directly. In ...
- research-article, October 2020
Emerging Topic Detection on the Meta-data of Images from Fashion Social Media
MM '20: Proceedings of the 28th ACM International Conference on Multimedia, Pages 3995–4003, https://doi.org/10.1145/3394171.3413914
In the fashion industry where social media has a growing presence, it is increasingly important to find the emergence of people's new tastes in the early stage based on the photos posted there. However, the amount of photos posted on fashion social ...
- research-article, September 2020
Fuzzy case‐based‐reasoning‐based imputation for incomplete data in software engineering repositories
Journal of Software: Evolution and Process (WSMR), Volume 32, Issue 9, https://doi.org/10.1002/smr.2260
Abstract: Missing data is a serious issue in software engineering because it can lead to information loss and bias in data analysis. Several imputation techniques have been proposed to deal with both numerical and categorical missing data. However, most of ...
- review-article, August 2019
Multiple and multiway correspondence analysis
One of the most popular, and versatile, ways of visually analyzing the associating between categorical data is to perform a correspondence analysis on the contingency table that is formed from their cross‐classification. Traditionally the analysis of ...
We briefly explore the various ways in which the association between the variables of a multiway contingency table can be visualized using correspondence analysis.
- research-article, October 2018
Exploring a High-quality Outlying Feature Value Set for Noise-Resilient Outlier Detection in Categorical Data
CIKM '18: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, Pages 17–26, https://doi.org/10.1145/3269206.3271721
Unavoidable noise in real-world categorical data presents significant challenges to existing outlier detection methods because they normally fail to separate noisy values from outlying values. Feature subspace-based methods inevitably mix noisy values ...
- abstract, April 2018
On improving ROCK-based clustering for categorical data: student research abstract
SAC '18: Proceedings of the 33rd Annual ACM Symposium on Applied Computing, Pages 499–500, https://doi.org/10.1145/3167132.3167445
In the field of data mining, the analysis of categorical data (i.e., data that can assume a limited number of values, such as names) is particularly challenging due to the lack of implicit geometrical properties. Clustering of categorical data is ...
- research-article, November 2017
Selective Value Coupling Learning for Detecting Outliers in High-Dimensional Categorical Data
CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, Pages 807–816, https://doi.org/10.1145/3132847.3132994
This paper introduces a novel framework, namely SelectVC and its instance POP, for learning selective value couplings (i.e., interactions between the full value set and a set of outlying values) to identify outliers in high-dimensional categorical data. ...
- research-article, February 2017
Enhanced (k,e)-anonymous for categorical data
ICSCA '17: Proceedings of the 6th International Conference on Software and Computer Applications, Pages 62–67, https://doi.org/10.1145/3056662.3056668
Currently, the information is available in the organizations to gain more utilities together with some reason such as improving the strategy of organization, research, or etc.,. For this reason, the data holder cannot avoid to share some existing data-...
- research-article, January 2017
Holo-Entropy Based Categorical Data Hierarchical Clustering
Informatica (INFMA), Volume 28, Issue 2, Pages 303–328
Clustering high-dimensional data is a challenging task in data mining, and clustering high-dimensional categorical data is even more challenging because it is more difficult to measure the similarity between categorical objects. Most algorithms assume ...
- research-article, January 2017
A new classifier for categorical data based on a possibilistic estimation and a novel generalized minimum-based algorithm
Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology (JIFS), Volume 33, Issue 3, Pages 1723–1731, https://doi.org/10.3233/JIFS-15372
In this paper, we propose NPCc, a new Naïve Possibilistic Classifier for categorical data. The proposed classifier relies on the Bayesian structure of the Naïve Bayes Classifier for categorical data (NBCc) which stands for an interesting pattern when ...
- article, January 2017
Dimensions reordering for visual mining of association rules using parallel set
International Journal of Data Analysis Techniques and Strategies (IJDATS), Volume 8, Issue 4, Pages 296–315, https://doi.org/10.1504/IJDATS.2016.081362
Mining the interesting association rules from data and clusters presents high demands in many application fields, such as telephonic operator, social networks, marketing especially in decision-making process. Exploring the set of extracted rules from ...
- extended-abstract, October 2016
Parallel bubbles: categorical data visualization in parallel coordinates
IHM '16: Actes de la 28ième conference francophone sur l'Interaction Homme-Machine, Pages 299–306, https://doi.org/10.1145/3004107.3004142
In this article we discuss the techniques available to represent categorical data in Parallel Coordinates, a widely used visualisation method for multivariate datasets analysis tasks. We propose Parallel Bubbles, a frequency-based method improving the ...
- Article, September 2015
ConDist: a context-driven categorical distance measure
ECMLPKDD'15: Proceedings of the 2015th European Conference on Machine Learning and Knowledge Discovery in Databases - Volume Part I, Pages 251–266, https://doi.org/10.1007/978-3-319-23528-8_16
A distance measure between objects is a key requirement for many data mining tasks like clustering, classification or outlier detection. However, for objects characterized by categorical attributes, defining meaningful distance measures is a challenging ...
- Article, September 2015
Multivariate and Categorical Analysis of Gaming Statistics
NBIS '15: Proceedings of the 2015 18th International Conference on Network-Based Information Systems, Pages 286–293, https://doi.org/10.1109/NBiS.2015.45
This paper provides exploratory analysis on gaming statistics via various multivariate and categorical data analysis approaches. The clustering results show that the principal components associated with the gaming data are related to player expertise ...