Keyword: categorical data : Search

research-article

Open Access

Automated anomaly detection for categorical data by repurposing a form filling recommender system

Journal of Data and Information Quality (JDIQ), Volume 16, Issue 3, Article No.: 16, Pages 1–28, https://doi.org/10.1145/3696110

Data quality is crucial in modern software systems, like data-driven decision support systems. However, data quality is affected by data anomalies, which represent instances that deviate from most of the data. These anomalies affect the reliability and ...

Article

PCA Dimensionality Reduction for Categorical Data

Aleksander Denisiuk

Computational Science – ICCS 2024, Pages 179–186, https://doi.org/10.1007/978-3-031-63759-9_22

Abstract

The purpose of the article is to develop a new dimensionality reduction algorithm for categorical data. We give a new geometric formulation of the PCA dimensionality reduction method for numerical data that can be effectively transferred to the ...

research-article

Latent class analysis: A review and recommendations for future applications in health sciences

Procedia Computer Science (PROCS), Volume 238, Issue C, Pages 1062–1067, https://doi.org/10.1016/j.procs.2024.06.135

Abstract

This article provides an overview of latent class analysis (LCA) and its applications in the field of health sciences. LCA is a statistical method used to identify unobserved subpopulations, or latent classes, within a population based on a set of ...

research-article

About Representation and Evaluation of the Scientific Data, Numerical and Non-Numerical Nature in the Properties of Materials Research

Automatic Documentation and Mathematical Linguistics (SPADML), Volume 57, Issue 1, Pages 46–54, https://doi.org/10.3103/S0005105523010077

Abstract

The paper considers issues in presenting and assessing the quality of scientific data in the materials science domain. Problems with numerical and non-numerical data of various types are presented using examples of data on the properties of ...

research-article

The similarity of the similarity measures in the context of clustering algorithms for categorical data

Procedia Computer Science (PROCS), Volume 225, Issue C, Pages 4511–4520, https://doi.org/10.1016/j.procs.2023.10.449

Abstract

This research aims to assess how similar the object-similarity values produced by various measures are. Particularly for categorical data, the similarity measures used may behave differently depending on whether the ...

research-article

Free

A class of conjugate priors for multinomial probit models which includes the multivariate normal one

The Journal of Machine Learning Research (JMLR), Volume 23, Issue 1, Article No.: 30, Pages 1358–1383

Multinomial probit models are routinely-implemented representations for learning how the class probabilities of categorical response data change with p observed predictors. Although several frequentist methods have been developed for estimation, inference ...

research-article

Analyzing mixed-type data by using word embedding for handling categorical features

Intelligent Data Analysis (INDA), Volume 25, Issue 6, Pages 1349–1368, https://doi.org/10.3233/IDA-205453

Most real-world datasets are of mixed type, including both numeric and categorical attributes. Unlike numbers, operations on categorical values are limited, and the degree of similarity between distinct values cannot be measured directly. In ...

research-article

Emerging Topic Detection on the Meta-data of Images from Fashion Social Media

MM '20: Proceedings of the 28th ACM International Conference on Multimedia, Pages 3995–4003, https://doi.org/10.1145/3394171.3413914

In the fashion industry where social media has a growing presence, it is increasingly important to find the emergence of people's new tastes in the early stage based on the photos posted there. However, the amount of photos posted on fashion social ...

research-article

Fuzzy case‐based‐reasoning‐based imputation for incomplete data in software engineering repositories

Journal of Software: Evolution and Process (WSMR), Volume 32, Issue 9, https://doi.org/10.1002/smr.2260

Abstract

Missing data is a serious issue in software engineering because it can lead to information loss and bias in data analysis. Several imputation techniques have been proposed to deal with both numerical and categorical missing data. However, most of ...

review-article

Multiple and multiway correspondence analysis

WIREs Computational Statistics (WICS), Volume 11, Issue 5, https://doi.org/10.1002/wics.1464

One of the most popular, and versatile, ways of visually analyzing the association between categorical data is to perform a correspondence analysis on the contingency table that is formed from their cross‐classification. Traditionally the analysis of ...

We briefly explore the various ways in which the association between the variables of a multiway contingency table can be visualized using correspondence analysis.

research-article

Exploring a High-quality Outlying Feature Value Set for Noise-Resilient Outlier Detection in Categorical Data

CIKM '18: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, Pages 17–26, https://doi.org/10.1145/3269206.3271721

Unavoidable noise in real-world categorical data presents significant challenges to existing outlier detection methods because they normally fail to separate noisy values from outlying values. Feature subspace-based methods inevitably mix noisy values ...

abstract

On improving ROCK-based clustering for categorical data: student research abstract

Riccardo Cappuzzo

SAC '18: Proceedings of the 33rd Annual ACM Symposium on Applied Computing, Pages 499–500, https://doi.org/10.1145/3167132.3167445

In the field of data mining, the analysis of categorical data (i.e., data that can assume a limited number of values, such as names) is particularly challenging due to the lack of implicit geometrical properties. Clustering of categorical data is ...

research-article

Selective Value Coupling Learning for Detecting Outliers in High-Dimensional Categorical Data

CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, Pages 807–816, https://doi.org/10.1145/3132847.3132994

This paper introduces a novel framework, namely SelectVC and its instance POP, for learning selective value couplings (i.e., interactions between the full value set and a set of outlying values) to identify outliers in high-dimensional categorical data. ...

research-article

Enhanced (k,e)-anonymous for categorical data

ICSCA '17: Proceedings of the 6th International Conference on Software and Computer Applications, Pages 62–67, https://doi.org/10.1145/3056662.3056668

Currently, organizations make information available to gain more utility, for purposes such as improving organizational strategy, research, etc. For this reason, the data holder cannot avoid sharing some existing data-...

research-article

Holo-Entropy Based Categorical Data Hierarchical Clustering

Informatica (INFMA), Volume 28, Issue 2, Pages 303–328

Clustering high-dimensional data is a challenging task in data mining, and clustering high-dimensional categorical data is even more challenging because it is more difficult to measure the similarity between categorical objects. Most algorithms assume ...

research-article

A new classifier for categorical data based on a possibilistic estimation and a novel generalized minimum-based algorithm

Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology (JIFS), Volume 33, Issue 3, Pages 1723–1731, https://doi.org/10.3233/JIFS-15372

In this paper, we propose NPC_c, a new Naïve Possibilistic Classifier for categorical data. The proposed classifier relies on the Bayesian structure of the Naïve Bayes Classifier for categorical data (NBC_c) which stands for an interesting pattern when ...

article

Dimensions reordering for visual mining of association rules using parallel set

International Journal of Data Analysis Techniques and Strategies (IJDATS), Volume 8, Issue 4, Pages 296–315, https://doi.org/10.1504/IJDATS.2016.081362

Mining interesting association rules from data and clusters is in high demand in many application fields, such as telephone operators, social networks, and marketing, especially in the decision-making process. Exploring the set of extracted rules from ...

extended-abstract

Parallel bubbles: categorical data visualization in parallel coordinates

IHM '16: Actes de la 28ième conference francophone sur l'Interaction Homme-Machine, Pages 299–306, https://doi.org/10.1145/3004107.3004142

In this article we discuss the techniques available to represent categorical data in Parallel Coordinates, a widely used visualisation method for multivariate datasets analysis tasks. We propose Parallel Bubbles, a frequency-based method improving the ...

Article

ConDist: a context-driven categorical distance measure

ECMLPKDD'15: Proceedings of the 2015th European Conference on Machine Learning and Knowledge Discovery in Databases - Volume Part I, Pages 251–266, https://doi.org/10.1007/978-3-319-23528-8_16

A distance measure between objects is a key requirement for many data mining tasks like clustering, classification or outlier detection. However, for objects characterized by categorical attributes, defining meaningful distance measures is a challenging ...

Article

Multivariate and Categorical Analysis of Gaming Statistics

NBIS '15: Proceedings of the 2015 18th International Conference on Network-Based Information Systems, Pages 286–293, https://doi.org/10.1109/NBiS.2015.45

This paper provides exploratory analysis on gaming statistics via various multivariate and categorical data analysis approaches. The clustering results show that the principal components associated with the gaming data are related to player expertise ...
