research-article

Towards efficient image-based representation of tabular data

Published: 04 October 2023

Abstract

Convolutional neural networks (CNNs) have been widely used in image classification tasks and have achieved remarkable results compared with traditional methods. Their main advantage is the ability to extract hidden features automatically by exploiting local connectivity and spatial locality. However, CNNs cannot be applied directly to tabular data, mainly because the tabular data structure does not match the grid-structured input a CNN expects. In this paper, we propose a new generic method for representing multidimensional tabular data as color-encoded images that can be used both for data visualization and for classification with CNNs. Our approach, named FC-Viz (Feature Clustering-Visualization), builds on user-oriented data visualization ideas such as pixel-oriented techniques, feature clustering, and feature interactions. It transforms each instance of the tabular data into a 2D pixel-based representation in which pixels representing features with strong correlation and interaction are adjacent to each other. We applied FC-Viz to ten multidimensional tabular datasets with dozens to thousands of features and compared its classification and visualization performance with a state-of-the-art tabular data transformation method. The evaluation experiments show that our approach is as accurate as the state of the art while producing much smaller images, which result in much more compact and faster CNN models.
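The abstract describes the core idea only at a high level: arrange features so that strongly related ones land on neighbouring pixels, then render each instance as a small image with one pixel per feature. The sketch below illustrates that general idea under simplifying assumptions; it is not the FC-Viz algorithm. Assumptions: absolute Pearson correlation is the only similarity measure (FC-Viz also uses feature clustering and feature interactions), a greedy nearest-neighbour path replaces whatever ordering procedure the authors actually use, features are laid out in snake order on a square grid, and values are single-channel 0 to 255 intensities rather than the paper's color encoding. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Row = List[float]
Image = List[List[int]]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length columns; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def order_features(rows: List[Row]) -> List[int]:
    """Hypothetical greedy nearest-neighbour path over features: each next
    feature is the unvisited one with the highest absolute correlation to the
    previously placed feature (a cheap stand-in for a real ordering optimizer)."""
    cols = list(zip(*rows))
    n = len(cols)
    sim = [[abs(pearson(cols[i], cols[j])) for j in range(n)] for i in range(n)]
    order = [0]
    remaining = set(range(1, n))
    while remaining:
        last = order[-1]
        # sorted() makes ties resolve to the lowest feature index
        nxt = max(sorted(remaining), key=lambda j: sim[last][j])
        order.append(nxt)
        remaining.remove(nxt)
    return order


def grid_positions(n: int) -> Tuple[List[Tuple[int, int]], int]:
    """Snake (boustrophedon) cell order on a ceil(sqrt(n)) square grid, so that
    consecutive features in the ordering always occupy adjacent pixels."""
    side = math.ceil(math.sqrt(n))
    cells: List[Tuple[int, int]] = []
    for r in range(side):
        cols = range(side) if r % 2 == 0 else range(side - 1, -1, -1)
        cells.extend((r, c) for c in cols)
    return cells[:n], side


def to_images(rows: List[Row], order: List[int]) -> List[Image]:
    """Render each row as a side x side grid of 0-255 intensities using
    per-feature min-max scaling; cells beyond the last feature stay 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    cells, side = grid_positions(len(order))
    images = []
    for row in rows:
        img = [[0] * side for _ in range(side)]
        for (r, c), f in zip(cells, order):
            span = hi[f] - lo[f]
            img[r][c] = round(255 * (row[f] - lo[f]) / span) if span else 0
        images.append(img)
    return images


# Toy table: 4 instances x 5 features (feature 3 is constant).
rows = [
    [1.0, 10.0, 2.0, 5.0, 3.0],
    [2.0, 8.0, 4.1, 5.0, 1.0],
    [3.0, 6.0, 5.9, 5.0, 4.0],
    [4.0, 4.0, 8.0, 5.0, 2.0],
]
order = order_features(rows)     # [0, 1, 2, 4, 3]
images = to_images(rows, order)  # four 3 x 3 images
```

With d features, this layout yields an image only ceil(sqrt(d)) pixels on a side (3 x 3 for the five toy features), which is in the spirit of the compact images the abstract reports. A practical version would likely use NumPy/SciPy for the correlation matrix and a stronger ordering or clustering step.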



Published In

Neural Computing and Applications, Volume 36, Issue 2
Jan 2024
510 pages

Publisher

Springer-Verlag
Berlin, Heidelberg

Publication History

Published: 04 October 2023
Accepted: 14 September 2023
Received: 10 March 2023

Author Tags

1. Tabular data representation
2. Convolutional neural networks
3. Feature interaction
4. Feature clustering
5. Data visualization
6. Data transformation
