survey

A Systematic Literature Review of Novelty Detection in Data Streams: Challenges and Opportunities

Published: 14 May 2024

Abstract

Novelty detection in data streams is the task of detecting previously unknown concepts in streams of data. Many machine learning algorithms have been proposed to detect these novelties and to integrate them. This study provides a systematic literature review of the state of novelty detection in data streams, covering its advancement in recent years, its main challenges and solutions, an updated taxonomy for classifying the proposed frameworks, and a comparative analysis of key algorithms in the field. Additionally, we highlight ongoing challenges and future research directions.

References

[1]
Zahraa S. Abdallah, Mohamed Medhat Gaber, Bala Srinivasan, and Shonali Krishnaswamy. 2016. AnyNovel: Detection of novel concepts in evolving data streams. Evolv. Syst. 7, 2, SI (2016), 73–93. DOI:
[2]
Amal Abid and Salma Jamoussi. 2016. Novelty detection in data stream clustering using the artificial immune system. In European Mediterranean & Middle Eastern Conference on Information Systems.
[3]
A. Abid, S. Jamoussi, and A. B. Hamadou. 2019. AIS-Clus: A bio-inspired method for textual data stream clustering. Vietnam. J. Comput. Sci. 6, 2 (2019), 223–256. DOI:
[4]
Charu C. Aggarwal, Philip S. Yu, Jiawei Han, and Jianyong Wang. 2003. A framework for clustering evolving data streams. In VLDB Conference, Johann-Christoph Freytag, Peter Lockemann, Serge Abiteboul, Michael Carey, Patricia Selinger, and Andreas Heuer (Eds.). Morgan Kaufmann, San Francisco, 81–92. DOI:
[5]
Supriya Agrahari and Anil Kumar Singh. 2022. Concept drift detection in data stream mining: A literature review. J. King Saud Univ. - Comput. Inf. Sci. 34, 10, Part B (2022), 9523–9540. DOI:
[6]
Husam Al-Behadili, Arne Grumpe, Christian Dopp, and Christian Wöhler. 2015. Extreme learning machine based novelty detection for incremental semi-supervised learning. In 3rd International Conference on Image Information Processing (ICIIP’15). 230–235. DOI:
[7]
H. Al-Behadili, A. Grumpe, L. Migdadi, and C. Wöhler. 2016. Incremental Parzen window classifier for a multi-class system. Int. J. Simul. Syst. Sci. Technol. 17, 34 (2016), 6.1–6.11. DOI:
[8]
Husam Al-Behadili, Arne Grumpe, Lubaba Migdadi, and Christian Wöhler. 2016. Semi-supervised learning using incremental support vector machine and extreme value theory in gesture data. In UKSim-AMSS 18th International Conference on Computer Modelling and Simulation (UKSim’16). 184–189. DOI:
[9]
Husam Al-Behadili, Arne Grumpe, and Christian Wöhler. 2015. Semi-supervised learning using incremental polynomial classifier and extreme value theory. In 3rd International Conference on Artificial Intelligence, Modelling and Simulation (AIMS’15). 332–337. DOI:
[10]
Tahseen Al-Khateeb, Mohammad M. Masud, Khaled M. Al-Naami, Sadi Evren Seker, Ahmad M. Mustafa, Latifur Khan, Zouheir Trabelsi, Charu Aggarwal, and Jiawei Han. 2016. Recurring and novel class detection using class-based ensemble for evolving data stream. IEEE Trans. Knowl. Data Eng. 28, 10 (2016), 2752–2764. DOI:
[11]
Omar Alghushairy, Raed Alsini, Terence Soule, and Xiaogang Ma. 2021. A review of local outlier factor algorithms for outlier detection in big data streams. Big Data Cog. Comput. 5, 1 (2021). DOI:
[12]
Marta Amorim, Frederico D. Bortoloti, Patrick M. Ciarelli, Evandro O. T. Salles, and Daniel C. Cavalieri. 2019. Novelty detection in social media by fusing text and image into a single structure. IEEE Access 7 (2019), 132786–132802. DOI:
[13]
Annalisa Appice, Michelangelo Ceci, Corrado Loglisci, Costantina Caruso, Fabio Fumarola, Michele Todaro, and Donato Malerba. 2009. A relational approach to novelty detection in data streams. In 17th Italian Symposium on Advanced Database Systems (SEBD’09). 89–100.
[14]
Ira Assent, Philipp Kranen, Corinna Baldauf, and Thomas Seidl. 2010. Detecting outliers on arbitrary data streams using anytime approaches. In 1st International Workshop on Novel Data Stream Pattern Mining Techniques (StreamKDD’10). Association for Computing Machinery, New York, NY, 10–15. DOI:
[15]
Brian Babcock, Mayur Datar, and Rajeev Motwani. 2002. Sampling from a moving window over streaming data. In 13th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’02). Society for Industrial and Applied Mathematics, 633–634.
[16]
Nathalie A. Barbosa, Louise Travé-Massuyès, and Victor H. Grisales. 2016. A novel algorithm for dynamic clustering: Properties and performance. In 15th IEEE International Conference on Machine Learning and Applications (ICMLA’16). 565–570. DOI:
[17]
Jean Paul Barddal, Heitor Murilo Gomes, and Fabrício Enembreck. 2015. SNCStream: A social network-based data stream clustering algorithm. In 30th Annual ACM Symposium on Applied Computing (SAC’15). Association for Computing Machinery, New York, NY, 935–940. DOI:
[18]
Clauber Gomes Bezerra, Bruno Sielly Jales Costa, Luiz Affonso Guedes, and Plamen Parvanov Angelov. 2020. An evolving approach to data streams clustering based on typicality and eccentricity data analytics. Inf. Sci. 518 (2020), 13–28. DOI:
[19]
Xin Bi, Chao Zhang, Xiangguo Zhao, Donghang Li, Yongjiao Sun, and Yuliang Ma. 2020. CODES: Efficient incremental semi-supervised classification over drifting and evolving social streams. IEEE Access 8 (2020), 14024–14035. DOI:
[20]
Albert Bifet, Geoff Holmes, Richard Kirkby, and Bernhard Pfahringer. 2010. MOA: Massive online analysis. J. Mach. Learn. Res. 11 (2010), 1601–1604. DOI:
[21]
Jock Blackard. 1998. Covertype. UCI Machine Learning Repository. DOI:
[22]
Mohamed-Rafik Bouguelia, Yolande Belaid, and Abdel Belaid. 2014. Efficient active novel class detection for data stream classification. In 22nd International Conference on Pattern Recognition. 2826–2831. DOI:
[23]
Mohamed-Rafik Bouguelia, Slawomir Nowaczyk, and Amir H. Payberah. 2018. An adaptive algorithm for anomaly and novelty detection in evolving data streams. Data Min. Knowl. Discov. 32, 6 (2018), 1597–1633. DOI:
[24]
Paula Branco, Luís Torgo, and Rita P. Ribeiro. 2016. A survey of predictive modeling on imbalanced domains. ACM Comput. Surv. 49, 2, Article 31 (Aug. 2016), 50 pages. DOI:
[25]
C. Fahy and S. Yang. 2019. Finding and tracking multi-density clusters in online dynamic data streams. IEEE Trans. Big Data 8, 1 (2019). DOI:
[26]
C. Fahy, S. Yang, and M. Gongora. 2021. Classification in dynamic data streams with a scarcity of labels. IEEE Trans. Knowl. Data Eng. 35, 4 (2021). DOI:
[27]
C. Puerto-Santana, C. Bielza, J. Diaz-Rozo, G. Ramirez-Gargallo, F. Mantovani, G. Virumbrales, J. Labarta, and P. Larrañaga. 2022. Asymmetric HMMs for online ball-bearing health assessments. IEEE Internet Things J. 9, 20 (2022). DOI:
[28]
Jesus A. Carino, Miguel Delgado-Prieto, Jose Antonio Iglesias, Araceli Sanchis, Daniel Zurita, Marta Millan, Juan Antonio Ortega Redondo, and Rene Romero-Troncoso. 2018. Fault detection and identification methodology under an incremental learning framework applied to industrial machinery. IEEE Access 6 (2018), 49755–49766. DOI:
[29]
Ander Carreño, Iñaki Inza, and Jose A. Lozano. 2023. SNDProb: A probabilistic approach for streaming novelty detection. IEEE Trans. Knowl. Data Eng. 35, 6 (2023), 6335–6348. DOI:
[30]
Robert Cattral and Franz Oppacher. 2007. Poker Hand. UCI Machine Learning Repository. DOI:
[31]
Michelangelo Ceci, Annalisa Appice, Corrado Loglisci, Costantina Caruso, Fabio Fumarola, and Donato Malerba. 2009. Novelty detection from evolving complex data streams with time windows. In Foundations of Intelligent Systems, Jan Rauch, Zbigniew W. Raś, Petr Berka, and Tapio Elomaa (Eds.). Springer Berlin, 563–572.
[32]
Michelangelo Ceci, Annalisa Appice, Corrado Loglisci, Costantina Caruso, Fabio Fumarola, Carmine Valente, and Donato Malerba. 2009. Relational frequent patterns mining for novelty detection from data streams. In Machine Learning and Data Mining in Pattern Recognition, Petra Perner (Ed.). Springer Berlin, 427–439.
[33]
M. B. Chandak. 2016. Role of big-data in classification and novel class detection in data streams. J. Big Data 3, 1 (2016). DOI:
[34]
Varun Chandola, Arindam Banerjee, and Vipin Kumar. 2009. Anomaly detection: A survey. ACM Comput. Surv. 41, 3, Article 15 (July 2009), 58 pages. DOI:
[35]
Clément Christophe, Julien Velcin, Jairo Cugliari, Philippe Suignard, and Manel Boumghar. 2019. How to detect novelty in textual data streams? A comparative study of existing methods. CoRR abs/1909.05099 (2019).
[36]
Joel D. Costa Júnior, Elaine R. Faria, Jonathan A. Silva, João Gama, and Ricardo Cerri. 2019. Novelty detection for multi-label stream classification. In 8th Brazilian Conference on Intelligent Systems (BRACIS’19). 144–149. DOI:
[37]
Joel D. Costa Júnior, Elaine R. Faria, Jonathan A. Silva, João Gama, and Ricardo Cerri. 2019. Pruned sets for multi-label stream classification without true labels. In International Joint Conference on Neural Networks (IJCNN’19). 1–8. DOI:
[38]
A. L. Cristiani, T. P. da Silva, and H. de Arruda Camargo. 2020. A Fuzzy Approach for Classification and Novelty Detection in Data Streams under Intermediate Latency. Vol. 12320 LNAI. Springer Science and Business Media Deutschland GmbH. Retrieved from https://www.scopus.com/inward/record.uri?eid=2-s2.0-85094105335&doi=10.1007%2f978-3-030-61380-8_12&partnerID=40&md5=0eae673986a6bbf4fa8fed3560ced91f
[39]
André Luis Cristiani and Heloisa de Arruda Camargo. 2021. A fuzzy multi-class novelty detector for data streams under intermediate latency. In IEEE International Conference on Fuzzy Systems (FUZZ-IEEE’21). 1–6. DOI:
[40]
D. -W. Zhou, Y. Yang, and D. -C. Zhan. 2022. Learning to classify with incremental new class. IEEE Trans. Neural Netw. Learn. Syst. 33, 6 (2022), 2429–2443. DOI:
[41]
Tiago Pinho da Silva and Heloisa de Arruda Camargo. 2020. Possibilistic approach for novelty detection in data streams. In IEEE International Conference on Fuzzy Systems (FUZZ-IEEE’20). 1–8. DOI:
[42]
Tiago Pinho da Silva, Leonardo Schick, Priscilla de Abreu Lopes, and Heloisa de Arruda Camargo. 2018. A fuzzy multiclass novelty detector for data streams. In IEEE International Conference on Fuzzy Systems (FUZZ-IEEE’18). 1–8. DOI:
[43]
Elaine Ribeiro de Faria, Andre Carlos Ponce de Leon Ferreira Carvalho, and Joao Gama. 2016. MINAS: Multiclass learning algorithm for novelty detection in data streams. Data Min. Knowl. Discov. 30, 3 (2016), 640–680. DOI:
[44]
Elaine Ribeiro de Faria, Isabel Ribeiro Goncalves, Joao Gama, and Andre Carlos Ponce de Leon Ferreira Carvalho. 2015. Evaluation of multiclass novelty detection algorithms for data streams. IEEE Trans. Knowl. Data Eng. 27, 11 (2015), 2961–2973. DOI:
[45]
Li Deng. 2012. The MNIST database of handwritten digit images for machine learning research. IEEE Sig. Process. Mag. 29, 6 (2012), 141–142.
[46]
S. U. Din, J. Shao, J. Kumar, C. B. Mawuli, S. M. H. Mahmud, W. Zhang, and Q. Yang. 2021. Data stream classification with novel class detection: A review, comparison and challenges. Knowl. Inf. Syst. 63, 9 (2021), 2231–2276. DOI:
[47]
Salah Ud Din and Junming Shao. 2020. Exploiting evolving micro-clusters for data stream classification with emerging class detection. Inf. Sci. 507 (2020), 404–420. DOI:
[48]
Dheeru Dua and Casey Graff. 2017. UCI Machine Learning Repository. Retrieved from http://archive.ics.uci.edu/ml
[49]
F. Dufrenois. 2022. Incremental and compressible kernel null discriminant analysis. Pattern Recog. 127 (2022). DOI:
[50]
Sarah M. Erfani, Sutharshan Rajasegarar, and Christopher Leckie. 2011. An efficient approach to detecting concept-evolution in network data streams. In Australasian Telecommunication Networks and Applications Conference (ATNAC’11). 1–7. DOI:
[51]
Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu. 1996. A density-based algorithm for discovering clusters in large spatial databases with noise. In 2nd International Conference on Knowledge Discovery and Data Mining (KDD’96). AAAI Press, 226–231.
[52]
C. Fahy and S. Yang. 2022. Finding and tracking multi-density clusters in online dynamic data streams. IEEE Trans. Big Data 8, 1 (2022), 178–192. DOI:
[53]
Conor Fahy, Shengxiang Yang, and Mario Gongora. 2022. Scarcity of labels in non-stationary data streams: A survey. ACM Comput. Surv. 55, 2 (2022). DOI:
[54]
Elaine R. Faria, João Gama, and André C. P. L. F. Carvalho. 2013. Novelty detection algorithm for data streams multi-class problems. In 28th Annual ACM Symposium on Applied Computing (SAC’13). Association for Computing Machinery, New York, NY, 795–800. DOI:
[55]
Elaine R. Faria, Isabel J. C. R. Gonçalves, André C. P. L. F. de Carvalho, and João Gama. 2016. Novelty detection in data streams. Artif. Intell. Rev. 45, 2 (01 Feb. 2016), 235–269. DOI:
[56]
Dewan Md. Farid and Chowdhury Mofizur Rahman. 2012. Novel class detection in concept-drifting data stream mining employing decision tree. In 7th International Conference on Electrical and Computer Engineering. 630–633. DOI:
[57]
Dewan Md. Farid, Li Zhang, Alamgir Hossain, Chowdhury Mofizur Rahman, Rebecca Strachan, Graham Sexton, and Keshav Dahal. 2013. An adaptive ensemble classifier for mining concept drifting data streams. Expert Syst. Applic. 40, 15 (2013), 5895–5906. DOI:
[58]
Geli Fei, Shuai Wang, and Bing Liu. 2016. Learning cumulatively to become more knowledgeable. In 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’16). Association for Computing Machinery, New York, NY, 1565–1574. DOI:
[59]
R. A. Fisher. 1988. Iris. UCI Machine Learning Repository. DOI:
[60]
Floris Gaisser, Maja Rudinac, Pieter P. Jonker, and David Tax. 2013. Online face recognition and learning for cognitive robots. In 16th International Conference on Advanced Robotics (ICAR’13). 1–9. DOI:
[61]
Joao Gama. 2010. Knowledge Discovery from Data Streams (1st ed.). Chapman & Hall/CRC.
[62]
João Gama. 2012. A survey on learning from data streams: Current and future trends. Progr. Artif. Intell. 1, 1 (01 Apr. 2012), 45–55. DOI:
[63]
Joao Gama, Jesus Aguilar-Ruiz, and Ralf Klinkenberg. 2008. Knowledge discovery from data streams. Intell. Data Anal. 12, 3 (2008), 251–252.
[64]
J. Gama and P. P. Rodrigues. 2009. An Overview on Mining Data Streams. Vol. 206. Springer Verlag.
[65]
Yang Gao, Yi-Fan Li, Bo Dong, Yu Lin, and Latifur Khan. 2019. SIM: Open-world multi-task stream classifier with integral similarity metrics. In IEEE International Conference on Big Data (Big Data’19). 751–760. DOI:
[66]
K. D. Garcia, E. R. de Faria, C. R. de Sá, J. Mendes-Moreira, C. C. Aggarwal, A. C. P. L. F. de Carvalho, and J. N. Kok. 2019. Ensemble Clustering for Novelty Detection in Data Streams. Vol. 11828 LNAI. Springer. Retrieved from https://www.scopus.com/inward/record.uri?eid=2-s2.0-85075818430&doi=10.1007%2f978-3-030-33778-0_34&partnerID=40&md5=f7e0ef1c2bd475220bfc08ad313f7892
[67]
K. D. Garcia, M. Poel, J. N. Kok, and A. C. P. L. F. de Carvalho. 2019. Online Clustering for Novelty Detection and Concept Drift in Data Streams. Vol. 11805 LNAI. Springer Verlag. Retrieved from https://www.scopus.com/inward/record.uri?eid=2-s2.0-85072863596&doi=10.1007%2f978-3-030-30244-3_37&partnerID=40&md5=999cff0b90e22b120a04beb28879dee8
[68]
Jean-Gabriel Gaudreault and Paula Branco. 2023. Toward streamlining the evaluation of novelty detection in data streams. In Discovery Science, Albert Bifet, Ana Carolina Lorena, Rita P. Ribeiro, João Gama, and Pedro H. Abreu (Eds.). Springer Nature Switzerland, Cham, 703–717.
[69]
Jean-Gabriel Gaudreault, Paula Branco, and João Gama. 2021. An analysis of performance metrics for imbalanced classification. In Discovery Science, Carlos Soares and Luis Torgo (Eds.). Springer International Publishing, Cham, 67–77.
[70]
Chuanxing Geng, Sheng-Jun Huang, and Songcan Chen. 2018. Recent advances in open set recognition: A survey. CoRR abs/1811.08581 (2018).
[71]
Heitor Murilo Gomes, Jean Paul Barddal, Fabricio Enembreck, and Albert Bifet. 2017. A survey on ensemble learning for data stream classification. ACM Comput. Surv. 50, 2 (2017). DOI:
[72]
Heitor Murilo Gomes, Maciej Grzenda, Rodrigo Mello, Jesse Read, Minh Huong Le Nguyen, and Albert Bifet. 2022. A survey on semi-supervised learning for delayed partially labelled data streams. ACM Comput. Surv. 55, 4 (2022). DOI:
[73]
Heitor Murilo Gomes, Jesse Read, Albert Bifet, Jean Paul Barddal, and João Gama. 2019. Machine learning for streaming data: State of the art, challenges, and opportunities. SIGKDD Explor. Newslett. 21, 2 (2019), 6–22. DOI:
[74]
Christian Gruhl, Abdul Hannan, Zhixin Huang, Chandana Nivarthi, and Stephan Vogt. 2021. The problem with real-world novelty detection—Issues in multivariate probabilistic models. In IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C’21). 204–209. DOI:
[75]
Xiaojie Guo, Amir Alipour-Fanid, Lingfei Wu, Hemant Purohit, Xiang Chen, Kai Zeng, and Liang Zhao. 2019. Multi-stage deep classifier cascades for open world recognition. In 28th ACM International Conference on Information and Knowledge Management (CIKM’19). Association for Computing Machinery, New York, NY, 179–188. DOI:
[76]
Manish Gupta, Jing Gao, Charu C. Aggarwal, and Jiawei Han. 2014. Outlier detection for temporal data: A survey. IEEE Trans. Knowl. Data Eng. 26, 9 (2014), 2250–2267. DOI:
[77]
Fatma Hamdi and Younès Bennani. 2011. Learning random subspace novelty detection filters. In International Joint Conference on Neural Networks. 2273–2280. DOI:
[78]
Ahsanul Haque, Latifur Khan, and Michael Baron. 2015. Semi supervised adaptive framework for classifying evolving data stream. In Advances in Knowledge Discovery and Data Mining, Tru Cao, Ee-Peng Lim, Zhi-Hua Zhou, Tu-Bao Ho, David Cheung, and Hiroshi Motoda (Eds.). Springer International Publishing, Cham, 383–394.
[79]
Ahsanul Haque, Latifur Khan, and Michael Baron. 2016. SAND: Semi-supervised adaptive novel class detection and classification over data stream. In 30th AAAI Conference on Artificial Intelligence (AAAI’16). AAAI Press, 1652–1658.
[80]
Ahsanul Haque, Latifur Khan, Michael Baron, Bhavani Thuraisingham, and Charu Aggarwal. 2016. Efficient handling of concept drift and concept evolution over stream data. In IEEE 32nd International Conference on Data Engineering (ICDE’16). 481–492. DOI:
[81]
Michael Harries and University of New South Wales. 1999. Splice-2 Comparative Evaluation: Electricity Pricing. School of Computer Science and Engineering. https://webarchive.nla.gov.au/awa/20040915173921/ http://pandora.nla.gov.au/pan/32869/20040907-0000/ftp.cse.unsw.edu.au/pub/doc/papers/UNSW/9905.pdf
[82]
Douglas M. Hawkins. 1980. Identification of Outliers. Vol. 11. Springer.
[83]
Morteza Zi Hayat and Mahmoud Reza Hashemi. 2010. A DCT based approach for detecting novelty and concept drift in data streams. In International Conference of Soft Computing and Pattern Recognition. 373–378. DOI:
[84]
Elena Ikonomovska, João Gama, and Sašo Džeroski. 2011. Learning model trees from evolving data streams. Data Min. Knowl. Discov. 23, 1 (01 July 2011), 128–168. DOI:
[85]
M. R. Islam. 2014. Recurring and Novel Class Detection in Concept-drifting Data Streams using Class-based Ensemble. Vol. 8444 LNAI. Springer Verlag, Tainan. Retrieved from https://www.scopus.com/inward/record.uri?eid=2-s2.0-84901284721&doi=10.1007%2f978-3-319-06605-9_35&partnerID=40&md5=815b3814260fc7a9d82a6e5a7ef48835
[86]
J. W. Sangma, Y. Rani, V. Pal, N. Kumar, and R. Kushwaha. 2022. FHC-NDS: Fuzzy hierarchical clustering of multiple nominal data streams. IEEE Trans. Fuzzy Syst. 31, 3 (2022), 1–12. DOI:
[87]
Syed Muslim Jameel, Manzoor Ahmed Hashmani, Mobashar Rehman, and Arif Budiman. 2020. An adaptive deep learning framework for dynamic image classification in the internet of things environment. Sensors 20, 20 (2020). DOI:
[88]
Olga Jodelka, Christos Anagnostopoulos, and Kostas Kolomvatsos. 2021. Adaptive novelty detection over contextual data streams at the edge using one-class classification. In 12th International Conference on Information and Communication Systems (ICICS’12). 213–219. DOI:
[89]
J. M. Kang, M. A. Ahmad, A. Teredesai, and R. Gaborski. 2007. Cognitively motivated novelty detection in video data streams. In Multimedia Data ing and Knowledge Discovery.Springer London, 209–233. Retrieved from https://www.scopus.com/inward/record.uri?eid=2-s2.0-84861620807&doi=10.1007%2f978-1-84628-799-2_11&partnerID=40&md5=991b9f9a639c084c49f4fa3f204fe7b9
[90]
Kevin S. Killourhy and Roy A. Maxion. 2009. Comparing anomaly-detection algorithms for keystroke dynamics. In IEEE/IFIP International Conference on Dependable Systems & Networks. 125–134. DOI:
[91]
Barbara Kitchenham, O. Pearl Brereton, David Budgen, Mark Turner, John Bailey, and Stephen Linkman. 2009. Systematic literature reviews in software engineering—A systematic literature review. Inf. Softw. Technol. 51, 1 (2009), 7–15. DOI:
[92]
Yanika Kongsorot, Punyaphol Horata, and Pakarat Musikawan. 2020. An incremental kernel extreme learning machine for multi-label learning with emerging new labels. IEEE Access 8 (2020), 46055–46070. DOI:
[93]
Bartosz Krawczyk, Leandro L. Minku, Joao Gama, Jerzy Stefanowski, and Michal Wozniak. 2017. Ensemble learning for data stream analysis: A survey. Inf. Fusion 37 (2017), 132–156. DOI:
[94]
Bartosz Krawczyk and Michal Wozniak. 2015. One-class classifiers with incremental learning and forgetting for data streams with concept drift. Soft Comput. 19, 12, SI (2015), 3387–3400. DOI:
[95]
Bartosz Krawczyk and Michał Woźniak. 2015. Reacting to different types of concept drift with adaptive and incremental one-class classifiers. In IEEE 2nd International Conference on Cybernetics (CYBCONF’15). 30–35. DOI:
[96]
Andre Lemos, Walmir Caminhas, and Fernando Gomide. 2013. Adaptive fault detection and diagnosis using an evolving fuzzy classifier. Inf. Sci. 220 (2013), 64–85. DOI:
[97]
Xiangjun Li, Yong Zhou, Ziyan Jin, Peng Yu, and Shun Zhou. 2020. A classification and novel class detection algorithm for concept drift data stream based on the cohesiveness and separation index of Mahalanobis distance. J. Electric. Comput. Eng. 2020 (2020).
[98]
Jing Liu, Guo-Sheng Xu, Da Xiao, Li-Ze Gu, and Xin-Xin Niu. 2013. A semi-supervised ensemble approach for mining data streams. J. Comput. 8, 11, SI (2013), 2873–2879. DOI:
[99]
K.-H. Liu, W.-P. Zhan, Y.-F. Liang, Y.-N. Zhang, H.-Z. Guo, J.-F. Yao, Q.-Q. Wu, and Q.-Q. Hong. 2022. The design of error-correcting output codes algorithm for the open-set recognition. Appl. Intell. 52, 7 (2022), 7843–7869. DOI:
[100]
Sin Kit Lo, Qinghua Lu, Chen Wang, Hye-Young Paik, and Liming Zhu. 2021. A systematic literature review on federated machine learning: From a software engineering perspective. ACM Comput. Surv. 54, 5, Article 95 (May 2021), 39 pages. DOI:
[101]
Atefeh Mahdavi and Marco Carvalho. 2021. A survey on open set recognition. In IEEE 4th International Conference on Artificial Intelligence and Knowledge Engineering (AIKE’21). IEEE. DOI:
[102]
Sepehr Maleki and Chris Bingham. 2019. Robust hierarchical clustering for novelty identification in sensor networks: With applications to industrial systems. Appl. Soft Comput. 85 (2019). DOI:
[103]
Mohammad M. Masud, Tahseen M. Al-Khateeb, Latifur Khan, Charu Aggarwal, Jing Gao, Jiawei Han, and Bhavani Thuraisingham. 2011. Detecting recurring and novel classes in concept-drifting data streams. In IEEE 11th International Conference on Data Mining. 1176–1181. DOI:
[104]
Mohammad M. Masud, Qing Chen, Jing Gao, Latifur Khan, Jiawei Han, and Bhavani Thuraisingham. 2010. Classification and novel class detection of data streams in a dynamic feature space. In Machine Learning and Knowledge Discovery in Databases, José Luis Balcázar, Francesco Bonchi, Aristides Gionis, and Michèle Sebag (Eds.). Springer Berlin, 337–352.
[105]
Mohammad M. Masud, Qing Chen, Latifur Khan, Charu Aggarwal, Jing Gao, Jiawei Han, and Bhavani Thuraisingham. 2010. Addressing concept-evolution in concept-drifting data streams. In IEEE International Conference on Data Mining. 929–934. DOI:
[106]
Mohammad M. Masud, Qing Chen, Latifur Khan, Charu C. Aggarwal, Jing Gao, Jiawei Han, Ashok Srivastava, and Nikunj C. Oza. 2013. Classification and adaptive novel class detection of feature-evolving data streams. IEEE Trans. Knowl. Data Eng. 25, 7 (2013), 1484–1497. DOI:
[107]
Mohammad M. Masud, Jing Gao, Latifur Khan, Jiawei Han, and Bhavani Thuraisingham. 2009. Integrating novel class detection with classification for concept-drifting data streams. In Machine Learning and Knowledge Discovery in Databases, Wray Buntine, Marko Grobelnik, Dunja Mladenić, and John Shawe-Taylor (Eds.). Springer Berlin, 79–94.
[108]
Mohammad M. Masud, Jing Gao, Latifur Khan, Jiawei Han, and Bhavani Thuraisingham. 2010. Classification and novel class detection in data streams with active mining. In Advances in Knowledge Discovery and Data Mining, Mohammed J. Zaki, Jeffrey Xu Yu, B. Ravindran, and Vikram Pudi (Eds.). Springer Berlin, 311–324.
[109]
Mohammad M. Masud, Jing Gao, Latifur Khan, Jiawei Han, and Bhavani Thuraisingham. 2011. Classification and novel class detection in concept-drifting data streams under time constraints. IEEE Trans. Knowl. Data Eng. 23, 6 (2011), 859–874. DOI:
[110]
Shon Mendelson and Boaz Lerner. 2020. Online cluster drift detection for novelty detection in data streams. In 19th IEEE International Conference on Machine Learning and Applications (ICMLA’20). 171–178. DOI:
[111]
Xuedan Miao, Ying Liu, Haiquan Zhao, and Chunguang Li. 2019. Distributed online one-class support vector machine for anomaly detection over networks. IEEE Trans. Cybern. 49, 4 (2019), 1475–1488. DOI:
[112]
Y. Miao, L. Qiu, H. Chen, J. Zhang, and Y. Wen. 2013. Novel Class Detection within Classification for Data Streams. Vol. 7952 LNCS. Springer Verlag, Dalian. Retrieved from https://www.scopus.com/inward/record.uri?eid=2-s2.0-84880754617&doi=10.1007%2f978-3-642-39068-5_50&partnerID=40&md5=c39851e79cce6a28d2eed8c34ca8b9aa
[113]
Saad Mohamad, Moamar Sayed-Mouchaweh, and Abdelhamid Bouchachia. 2018. Active learning for classifying data streams with unknown number of classes. Neural Netw. 98 (2018), 1–15. DOI:
[114]
Xin Mu, Kai Ming Ting, and Zhi-Hua Zhou. 2017. Classification under streaming emerging new classes: A solution using completely-random trees. IEEE Trans. Knowl. Data Eng. 29, 8 (2017), 1605–1618. DOI:
[115]
Ahmad M. Mustafa, Gbadebo Ayoade, Khaled Al-Naami, Latifur Khan, Kevin W. Hamlen, Bhavani Thuraisingham, and Frederico Araujo. 2017. Unsupervised deep embedding for novel class detection over data stream. In IEEE International Conference on Big Data (Big Data’17). 1830–1839. DOI:
[116]
Gyoung S. Na, Donghyun Kim, and Hwanjo Yu. 2018. DILOF: Effective and memory efficient local outlier detection in data streams. In 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD’18). Association for Computing Machinery, New York, NY, 1993–2002. DOI:
[117]
Emre Ozbilge. 2019. Experiments in online expectation-based novelty-detection using 3D shape and colour perceptions for mobile robot inspection. Robot. Auton. Syst. 117 (2019), 68–79. DOI:
[118]
Jinxing Pan, Xiaoshan Yang, Yi Huang, and Changsheng Xu. 2022. Few-shot egocentric multimodal activity recognition. In ACM Multimedia Asia (MMAsia’21). Association for Computing Machinery, New York, NY, Article 23, 7 pages. DOI:
[119]
Cheong Hee Park. 2019. Outlier and anomaly pattern detection on data streams. J. Supercomput. 75, 9 (2019), 6118–6128. DOI:
[120]
Brandon Parker, Ahmad M. Mustafa, and Latifur Khan. 2012. Novel class detection and feature via a tiered ensemble approach for stream mining. In IEEE 24th International Conference on Tools with Artificial Intelligence, Vol. 1. 1171–1178. DOI:
[121]
Brandon S. Parker and Latifur Khan. 2015. Detecting and tracking concept class drift and emergence in non-stationary fast data streams. In 29th AAAI Conference on Artificial Intelligence (AAAI’15). AAAI Press, 2908–2913.
[122]
Lenka Pitonakova and Seth Bullock. 2020. The robustness-fidelity trade-off in Grow When Required neural networks performing continuous novelty detection. Neural Netw. 122 (2020), 183–195. DOI:
[123]
Roozbeh Razavi-Far, Ehsan Hallaji, Mehrdad Saif, and Gregory Ditzler. 2019. A novelty detector and extreme verification latency model for nonstationary environments. IEEE Trans. Industr. Electron. 66, 1 (2019), 561–570. DOI:
[124]
Attila Reiss. 2012. PAMAP2 Physical Activity Monitoring. UCI Machine Learning Repository. DOI:
[125]
Huorong Ren, Zhixing Ye, and Zhiwu Li. 2017. Anomaly detection based on a dynamic Markov model. Inf. Sci. 411 (2017), 52–65. DOI:
[126]
Nathalie Barbosa Roa, Louise Trave-Massuyes, and Victor H. Grisales-Palacio. 2019. DyClee: Dynamic clustering for tracking evolving environments. Pattern Recog. 94 (2019), 162–186. DOI:
[127]
Stefano Rovetta, Zied Mnasri, and Francesco Masulli. 2020. Detection of hazardous road events from audio streams: An ensemble outlier detection approach. In IEEE Conference on Evolving and Adaptive Intelligent Systems (EAIS’20). 1–6. DOI:
[128]
Andrzej Rusiecki. 2012. Robust neural network for novelty detection on data streams. In Artificial Intelligence and Soft Computing, Leszek Rutkowski, Marcin Korytkowski, Rafał Scherer, Ryszard Tadeusiewicz, Lotfi A. Zadeh, and Jacek M. Zurada (Eds.). Springer Berlin, 178–186.
[129]
Omer Sagi and Lior Rokach. 2018. Ensemble learning: A survey. Wiley Interdisc. Rev.: Data Min. Knowl. Discov. 8, 4 (2018), e1249.
[130]
Deepita Saha, Mozzammel Haque, Akash Sarkar, Famina Alam, Dewan Md Farid, Chowdhury Mofizur Rahman, and Swakkhar Shatabda. 2018. IEEE WIECON-ECE 2018 novel class detection in concept drifting data streams using decision tree leaves. In IEEE International WIE Conference on Electrical and Computer Engineering (WIECON-ECE’18). 87–90. DOI:
[131]
Mohammadreza Salehi, Hossein Mirzaei, Dan Hendrycks, Yixuan Li, Mohammad Hossein Rohban, and Mohammad Sabokrou. 2021. A unified survey on anomaly, novelty, open-set, and out-of-distribution detection: Solutions and future challenges. CoRR abs/2110.14051 (2021).
[132]
Walter J. Scheirer, Anderson de Rezende Rocha, Archana Sapkota, and Terrance E. Boult. 2013. Toward open set recognition. IEEE Trans. Pattern Anal. Mach. Intell. 35, 7 (2013), 1757–1772. DOI:
[133]
Sajjad Kamali Siahroudi, Poorya Zare Moodi, and Hamid Beigy. 2018. Detection of evolving concepts in non-stationary data streams: A multiple kernel learning approach. Expert Syst. Applic. 91 (2018), 187–197. DOI:
[134]
Bruno Sielly Jales Costa, Clauber Gomes Bezerra, Luiz Affonso Guedes, and Plamen Parvanov Angelov. 2016. Unsupervised classification of data streams based on Typicality and Eccentricity Data Analytics. In IEEE International Conference on Fuzzy Systems (FUZZ-IEEE’16). 58–63. DOI:
[135]
Jonathan A. Silva, Elaine R. Faria, Rodrigo C. Barros, Eduardo R. Hruschka, André C. P. L. F. de Carvalho, and João Gama. 2013. Data stream clustering: A survey. ACM Comput. Surv. 46, 1, Article 13 (July 2013), 31 pages. DOI:
[136]
Dimitrios Sisiaridis, Fabrizio Carcillo, and Olivier Markowitch. 2016. A framework for threat detection in communication systems. In 20th Pan-Hellenic Conference on Informatics (PCI’16). Association for Computing Machinery, New York, NY, Article 68, 6 pages. DOI:
[137]
James Seale Smith, Seth Baer, Zsolt Kira, and Constantine Dovrolis. 2019. Unsupervised continual learning and self-taught associative memory hierarchies. CoRR abs/1904.02021 (2019).
[138]
Mark Smith, Steven Reece, Stephen Roberts, Ioannis Psorakis, and Iead Rezek. 2014. Maritime abnormality detection using Gaussian processes. Knowl. Inf. Syst. 38, 3 (2014), 717–741. DOI:
[139]
M. H. Soleimani-Babakamali, R. Sepasdar, K. Nasrollahzadeh, and R. Sarlo. 2022. A system reliability approach to real-time unsupervised structural health monitoring without prior information. Mech. Syst. Sig. Process. 171 (2022). DOI:
[140]
Vinicius M. A. Souza, Denis M. dos Reis, André G. Maletzke, and Gustavo E. A. P. A. Batista. 2020. Challenges in benchmarking stream learning algorithms with real-world data. Data Min. Knowl. Discov. 34, 6 (01 Nov. 2020), 1805–1858. DOI:
[141]
Eduardo J. Spinosa, André Ponce de Leon F. de Carvalho, and João Gama. 2009. Novelty detection with application to data streams. Intell. Data Anal. 13, 3 (2009), 405–422. DOI:
[142]
Eduardo J. Spinosa, André Ponce de Leon F. de Carvalho, and João Gama. 2007. OLINDDA: A cluster-based approach for detecting novelty and concept drift in data streams. In ACM Symposium on Applied Computing (SAC’07). Association for Computing Machinery, New York, NY, 448–452. DOI:
[143]
Eduardo J. Spinosa, André Ponce de Leon F. de Carvalho, and João Gama. 2008. Cluster-based novel concept detection in data streams applied to intrusion detection in computer networks. In ACM Symposium on Applied Computing (SAC’08). Association for Computing Machinery, New York, NY, 976–980. DOI:
[144]
Noppayut Sriwatanasakdi, Masayuki Numao, and Ken-ichi Fukui. 2017. Concept drift detection for graph-structured classifiers under scarcity of true labels. In IEEE 29th International Conference on Tools with Artificial Intelligence (ICTAI’17). 461–468. DOI:
[145]
Salvatore Stolfo, Wei Fan, Wenke Lee, Andreas Prodromidis, and Philip Chan. 1999. KDD Cup 1999 Data. UCI Machine Learning Repository. DOI:
[146]
Yogita Thakran and Durga Toshniwal. 2012. Unsupervised outlier detection in streaming data using weighted clustering. In 12th International Conference on Intelligent Systems Design and Applications (ISDA’12). 947–952. DOI:
[147]
Rosane M. M. Vallim, José A. Andrade Filho, André C. P. L. F. de Carvalho, and João Gama. 2012. A density-based clustering approach for behavior change detection in data streams. In Brazilian Symposium on Neural Networks. 37–42. DOI:
[148]
Rosane M. M. Vallim, José A. Andrade Filho, Rodrigo F. de Mello, and André C. P. L. F. de Carvalho. 2013. Online behavior change detection in computer games. Expert Syst. Applic. 40, 16 (2013), 6258–6265. DOI:
[149]
Rosane M. M. Vallim, José A. Andrade Filho, Rodrigo F. de Mello, André C. P. L. F. de Carvalho, and João Gama. 2014. Unsupervised density-based behavior change detection in data streams. Intell. Data Anal. 18, 2 (2014), 181–201. DOI:
[150]
Nees Jan van Eck and Ludo Waltman. 2011. Text mining and visualization using VOSviewer. DOI:
[151]
Anna Vergeles, Alexander Khaya, Dmytro Prokopenko, and Nataliia Manakova. 2018. Unsupervised real-time stream-based novelty detection technique an approach in a corporate cloud. In IEEE 2nd International Conference on Data Stream Mining & Processing (DSMP’18). 166–170. DOI:
[152]
Zhiyuan Wan, Xin Xia, David Lo, and Gail C. Murphy. 2021. How does machine learning change software development practices? IEEE Trans. Softw. Eng. 47, 9 (2021), 1857–1871. DOI:
[153]
X. Wang and Y. Xing. 2019. An online support vector machine for the open-ended environment. Expert Syst. Applic. 120 (2019), 72–86. DOI:
[154]
Yigong Wang, Zhuoyi Wang, Yu Lin, Latifur Khan, and Dingcheng Li. 2021. CIFDM: Continual and interactive feature distillation for multi-label stream learning. In 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’21). Association for Computing Machinery, New York, NY, 2121–2125. DOI:
[155]
Yi Wang, Nan Xue, Xin Fan, Jiebo Luo, Risheng Liu, Bin Chen, Haojie Li, and Zhongxuan Luo. 2018. Fast factorization-free kernel learning for unlabeled chunk data streams. In 27th International Joint Conference on Artificial Intelligence (IJCAI’18). AAAI Press, 2833–2839.
[156]
Zhuoyi Wang, Zelun Kong, Swarup Chandra, Hemeng Tao, and Latifur Khan. 2019. Robust high dimensional stream classification with novel class detection. In IEEE 35th International Conference on Data Engineering (ICDE’19). 1418–1429. DOI:
[157]
Zhuoyi Wang, Hemeng Tao, Zelun Kong, Swarup Chandra, and Latifur Khan. 2019. Metric learning based framework for streaming classification with concept evolution. In International Joint Conference on Neural Networks (IJCNN’19). 1–8. DOI:
[158]
Zhuoyi Wang, Yigong Wang, Yu Lin, Evan Delord, and Latifur Khan. 2020. Few-sample and adversarial representation learning for continual stream mining. In The Web Conference (WWW’20). Association for Computing Machinery, New York, NY, 718–728. DOI:
[159]
Christian Weiss and Andreas Zell. 2008. Novelty detection and online learning for vibration-based terrain classification. Intell. Auton. Syst. 10, IAS 2008 (01 2008). DOI:
[160]
Dominik Wurzer and Yumeng Qin. 2018. Parameterizing kterm hashing. In 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR’18). Association for Computing Machinery, New York, NY, 945–948. DOI:
[161]
Yiyang Xu, Yan Li, and YunJie Li. 2019. Fuzzy ARTMAP network and clustering for streaming classification under emerging new classes. In IEEE International Conference on Signal, Information and Data Processing (ICSIDP’19). 1–5. DOI:
[162]
Y. Gao, S. Chandra, Y. Li, L. Khan, and B. M. Thuraisingham. 2020. SACCOS: A semi-supervised framework for emerging class detection and concept drift adaption over data streams. IEEE Trans. Knowl. Data Eng. 34, 3 (2020). DOI:
[163]
Y. Wang, Y. Ding, X. He, X. Fan, C. Lin, F. Li, T. Wang, Z. Luo, and J. Luo. 2021. Novelty detection and online learning for chunk data streams. IEEE Trans. Pattern Anal. Mach. Intell. 43, 7 (2021), 2400–2412. DOI:
[164]
Shipeng Yan, Jiale Zhou, Jiangwei Xie, Songyang Zhang, and Xuming He. 2021. An EM framework for online incremental learning of semantic segmentation. In 29th ACM International Conference on Multimedia (MM’21). Association for Computing Machinery, New York, NY, 3052–3060. DOI:
[165]
Jia Ming Yeoh, Fabio Caraffini, Elmina Homapour, Valentino Santucci, and Alfredo Milani. 2019. A clustering system for dynamic data streams based on metaheuristic optimisation. Mathematics 7, 12 (2019). DOI:
[166]
Yogita and Durga Toshniwal. 2012. A framework for outlier detection in evolving data streams by weighting attributes in clustering. Procedia Technol. 6 (2012), 214–222. DOI:
[167]
Yogita and Durga Toshniwal. 2013. Clustering techniques for streaming data-a survey. In 3rd IEEE International Advance Computing Conference (IACC’13). 951–956. DOI:
[168]
Poorya ZareMoodi, Hamid Beigy, and Sajjad Kamali Siahroudi. 2015. Novel class detection in data streams using local patterns and neighborhood graph. Neurocomputing 158 (2015), 234–245. DOI:
[169]
Poorya ZareMoodi, Sajjad Kamali Siahroudi, and Hamid Beigy. 2016. A support vector based approach for classification beyond the learned label space in data streams. In 31st Annual ACM Symposium on Applied Computing (SAC’16). Association for Computing Machinery, New York, NY, 910–915. DOI:
[170]
Poorya ZareMoodi, Sajjad Kamali Siahroudi, and Hamid Beigy. 2019. Concept-evolution detection in non-stationary data streams: A fuzzy clustering approach. Knowl. Inf. Syst. 60, 3 (2019), 1329–1352. DOI:
[171]
Dan Zhang, Yan Liu, and Luo Si. 2011. Serendipitous learning: Learning beyond the predefined label space. In 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’11). Association for Computing Machinery, New York, NY, 1343–1351. DOI:
[172]
Tian Zhang, Raghu Ramakrishnan, and Miron Livny. 1997. BIRCH: A new data clustering algorithm and its applications. Data Min. Knowl. Discov. 1, 2 (1997), 141–182.
[173]
Z. Zhang, Y. Li, C. Jin, and M. Gao. 2018. Adaptive matrix sketching and clustering for semisupervised incremental learning. IEEE Sig. Process. Lett. 25, 7 (2018), 1069–1073. DOI:
[174]
Xiulin Zheng, Peipei Li, Xuegang Hu, and Kui Yu. 2021. Semi-supervised classification on data streams with recurring concept drift and concept evolution. Knowl.-based Syst. 215 (2021). DOI:
[175]
Da-Wei Zhou, Yang Yang, and De-Chuan Zhan. 2021. Detecting sequentially novel classes with stable generalization ability. In Advances in Knowledge Discovery and Data Mining, Kamal Karlapalem, Hong Cheng, Naren Ramakrishnan, R. K. Agrawal, P. Krishna Reddy, Jaideep Srivastava, and Tanmoy Chakraborty (Eds.). Springer International Publishing, Cham, 371–382.
[176]
Yuxun Zhou, Reza Arghandeh, and Costas J. Spanos. 2016. Online learning of contextual hidden Markov models for temporal-spatial data analysis. In IEEE 55th Conference on Decision and Control (CDC’16). 6335–6341. DOI:
[177]
Yong-Nan Zhu and Yu-Feng Li. 2020. Semi-supervised streaming learning with emerging new labels. Proc. AAAI Conf. Artif. Intell. 34, 04 (Apr. 2020), 7015–7022. DOI:
[178]
E. Özbilge. 2016. On-line expectation-based novelty detection for mobile robots. Robot. Auton. Syst. 81 (2016), 33–47. DOI:
[179]
R. Šajina, N. Tanković, and D. Etinger. 2020. Novel class detection in non-stationary streaming environment with a discriminative classifier. In 43rd International Convention on Information, Communication and Electronic Technology (MIPRO’20). 1109–1113. DOI:

Published In

ACM Computing Surveys  Volume 56, Issue 10
October 2024
954 pages
EISSN:1557-7341
DOI:10.1145/3613652

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 May 2024
Online AM: 12 April 2024
Accepted: 06 April 2024
Revised: 19 February 2024
Received: 09 April 2023
Published in CSUR Volume 56, Issue 10

Author Tags

  1. Novelty detection
  2. data streams
  3. data mining
  4. concept evolution
  5. online learning
  6. concept drift

Qualifiers

  • Survey
