Abstract
Outlying Aspect Mining (OAM) is the task of identifying a subset of features that distinguishes an outlier from normal data, which is important for downstream (human) decision-making. Existing methods are based on beam search in the space of feature subsets. They need to compute outlier scores for all examined subsets, and thus rely on simple outlier scoring algorithms.
In this paper, we propose SOAM, a novel OAM algorithm based on Sum-Product Networks (SPNs), a class of probabilistic circuits that can accurately model high-dimensional distributions. Our approach needs to fit an SPN only once, and leverages the tractability of marginal inference in SPNs to compute outlier scores in feature subsets. This way, computing outlier scores in subsets is fast, while being based on a flexible and accurate density estimator. We empirically show that SOAM clearly outperforms the state-of-the-art method in search-based OAM, and even outperforms recent deep learning-based methods in the majority of the investigated cases. (Available at github.com/stefanluedtke/Sum-Product-Network-OAM.)
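The idea of fitting one model and reusing it for every feature subset can be illustrated with a minimal sketch. It is not the authors' implementation: `TinySPN` is a hypothetical, hand-parameterised stand-in for a learned SPN (a single sum node over product nodes with univariate Gaussian leaves), the z-score in `outlier_score` is an assumed way of making scores comparable across subsets of different size, and the exhaustive enumeration in `best_subset` stands in for whatever search strategy SOAM actually uses. The point is only that the marginal density of a subset is obtained by skipping the leaves of all other features, so no model has to be refitted per subset.

```python
import itertools
import numpy as np


class TinySPN:
    """Hypothetical stand-in for a learned SPN: one sum node over
    product nodes whose children are univariate Gaussian leaves."""

    def __init__(self, weights, means, stds):
        self.weights = np.asarray(weights, dtype=float)  # shape (K,)
        self.means = np.asarray(means, dtype=float)      # shape (K, D)
        self.stds = np.asarray(stds, dtype=float)        # shape (K, D)

    def sample(self, n, rng):
        """Draw n points: pick a product node, then sample each leaf."""
        comp = rng.choice(len(self.weights), size=n, p=self.weights)
        noise = rng.standard_normal((n, self.means.shape[1]))
        return self.means[comp] + self.stds[comp] * noise

    def log_marginal(self, X, subset):
        """log p(x_S). Leaves of features outside `subset` integrate
        to 1, so they are simply left out of each product."""
        X = np.atleast_2d(X)
        idx = list(subset)
        mu = self.means[None, :, idx]
        sd = self.stds[None, :, idx]
        z = (X[:, None, idx] - mu) / sd
        leaf_ll = -0.5 * z**2 - np.log(sd) - 0.5 * np.log(2 * np.pi)
        node_ll = np.log(self.weights)[None, :] + leaf_ll.sum(axis=2)
        # Numerically stable log-sum-exp over the product nodes.
        m = node_ll.max(axis=1, keepdims=True)
        return (m + np.log(np.exp(node_ll - m).sum(axis=1, keepdims=True)))[:, 0]


def outlier_score(spn, query, reference, subset):
    """Assumed score: how many standard deviations the query's log-density
    lies below the mean log-density of reference data in the same subset."""
    ref_ll = spn.log_marginal(reference, subset)
    query_ll = spn.log_marginal(query, subset)[0]
    return (ref_ll.mean() - query_ll) / ref_ll.std()


def best_subset(spn, query, reference, max_size):
    """Exhaustively score all subsets up to max_size with the same model."""
    d = reference.shape[1]
    scores = {
        s: outlier_score(spn, query, reference, s)
        for r in range(1, max_size + 1)
        for s in itertools.combinations(range(d), r)
    }
    return max(scores, key=scores.get), scores


# Hand-set parameters: features 0 and 1 move together, feature 2 is noise.
spn = TinySPN(
    weights=[0.5, 0.5],
    means=[[-2.0, -2.0, 0.0], [2.0, 2.0, 0.0]],
    stds=[[0.5, 0.5, 1.0], [0.5, 0.5, 1.0]],
)
reference = spn.sample(2000, np.random.default_rng(0))
# Each coordinate is typical on its own, but the pair (x0, x1) is not.
query = np.array([2.0, -2.0, 0.0])
best, scores = best_subset(spn, query, reference, max_size=3)
print(best, round(scores[best], 2))
```

In this toy setting the query looks ordinary in every single feature, so only the joint subspace of features 0 and 1 should receive a high score. All scores come from the same `spn` object, which mirrors the claim that the density model is fitted once and then queried per subset.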
Notes
1. Available at www.ipd.kit.edu/mitarbeiter/muellere/HiCS.
2. Available at github.com/xuhongzuo/outlier-interpretation.
3. github.com/xuhongzuo/outlier-interpretation.
Acknowledgements
Stefan Lüdtke acknowledges the financial support by the Federal Ministry of Education and Research of Germany and by the Sächsisches Staatsministerium für Wissenschaft, Kultur und Tourismus in the program Center of Excellence for AI-research “Center for Scalable Data Analytics and Artificial Intelligence Dresden/Leipzig”, project identification number: ScaDS.AI.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Lüdtke, S., Bartelt, C., Stuckenschmidt, H. (2023). Outlying Aspect Mining via Sum-Product Networks. In: Kashima, H., Ide, T., Peng, WC. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2023. Lecture Notes in Computer Science, vol 13935. Springer, Cham. https://doi.org/10.1007/978-3-031-33374-3_3
DOI: https://doi.org/10.1007/978-3-031-33374-3_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-33373-6
Online ISBN: 978-3-031-33374-3
eBook Packages: Computer Science (R0)