DOI: 10.1145/3638209.3638224
research-article

Occupancy Estimation in Smart Buildings: Impact of Data Quality on Feature Selection

Published: 28 February 2024

Abstract

Feature selection is widely applied in machine learning to reduce computational time, improve learning accuracy, and give better insight into the data modeling process. A key challenge is correctly selecting the relevant features from those available in the training dataset. A poor-quality training dataset may compromise the feature selection step, which in turn reduces the generalization capability of the resulting model. In smart building applications, identifying the essential features among the sensor data can significantly reduce solution complexity and total cost. In this paper, an indicator named Qscore is proposed to assess the quality of the training data as an essential step before deploying feature selection methods. Moreover, a high Qscore value would confirm the reliability of the selected features. To validate the proposed concept, the occupancy estimation problem in an office context is investigated. The training data consist of measurements collected from standard sensors (for instance, motion detection, power consumption, and CO2 concentration) and the label, i.e., the number of occupants. Several feature selection methods are deployed together with the proposed data quality indicator and a cross-validation approach to ensure the best choice of features for the occupancy estimation problem. Extensive simulations and experiments show the merits of our framework.
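The abstract describes a two-stage workflow: check the quality of the training data first, then choose features with the help of cross-validation. Because the abstract does not define Qscore, the minimal sketch below does not reproduce it. It assumes a hypothetical stand-in quality proxy (a leave-one-out k-nearest-neighbour label-consistency score) and uses greedy forward selection scored by k-fold cross-validated mean absolute error of kNN regression as one example of a feature selection method. The synthetic sensor data, every function name, and the choice of kNN are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch, NOT the paper's method: the abstract does not define
# Qscore, so label_consistency_score below is a hypothetical stand-in.
import math
import random
from statistics import mean


def knn_predict(train_X, train_y, x, k):
    """Average the labels of the k training rows nearest to x (Euclidean)."""
    nearest = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )[:k]
    return mean(label for _, label in nearest)


def label_consistency_score(X, y, k=3):
    """Hypothetical data-quality proxy in [0, 1]: the share of samples whose
    label equals the rounded mean label of their k nearest neighbours,
    leaving the sample itself out. Low values flag noisy or ambiguous data."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if round(knn_predict(rest_X, rest_y, X[i], k)) == y[i]:
            hits += 1
    return hits / len(X)


def cv_mae(X, y, features, k_folds=5, k=3):
    """k-fold cross-validated mean absolute error of kNN regression that
    only sees the columns listed in `features`."""
    errors = []
    for fold in range(k_folds):
        test = set(range(fold, len(X), k_folds))  # every k_folds-th row
        train = [i for i in range(len(X)) if i not in test]
        train_X = [[X[i][j] for j in features] for i in train]
        train_y = [y[i] for i in train]
        for i in test:
            x = [X[i][j] for j in features]
            errors.append(abs(knn_predict(train_X, train_y, x, k) - y[i]))
    return mean(errors)


def forward_selection(X, y, n_features, k_folds=5, k=3):
    """Greedy forward selection: add the feature that most lowers the
    cross-validated MAE, and stop as soon as no remaining feature helps."""
    selected, best = [], float("inf")
    remaining = list(range(n_features))
    while remaining:
        score, cand = min(
            (cv_mae(X, y, selected + [c], k_folds, k), c) for c in remaining
        )
        if score >= best:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = score
    return selected, best


def make_synthetic_office_data(n=120, seed=0):
    """Hypothetical data: occupancy 0-4 drives motion, power and CO2 columns
    (all on a comparable scale); column 3 is pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        occ = rng.randint(0, 4)
        motion = occ + rng.gauss(0, 0.3)
        power = 0.5 * occ + rng.gauss(0, 0.3)
        co2 = occ + rng.gauss(0, 0.5)  # CO2 in assumed scaled units, not ppm
        X.append([motion, power, co2, rng.uniform(0, 4)])
        y.append(occ)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic_office_data()
    print("quality proxy:", round(label_consistency_score(X, y), 2))
    selected, mae = forward_selection(X, y, n_features=4)
    print("selected columns:", selected, "CV MAE:", round(mae, 3))
```

In this sketch a low quality score (for example, one close to the value obtained after shuffling the labels) would be a signal to inspect the data before trusting any feature ranking. Forward selection is only one wrapper strategy; filter or embedded methods could be dropped into the same cross-validation loop.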

Published In

ACM Other conferences
CIIS '23: Proceedings of the 2023 6th International Conference on Computational Intelligence and Intelligent Systems
November 2023
193 pages
ISBN: 9798400709067
DOI: 10.1145/3638209

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

1. Smart buildings
2. Data quality
3. Feature selection
4. Occupancy estimation

Qualifiers

• Research
• Refereed limited

Conference

CIIS 2023

Article Metrics

• 0 total citations
• 21 total downloads
• Downloads (last 12 months): 21
• Downloads (last 6 weeks): 0

Reflects downloads up to 26 Jan 2025.
