DOI: 10.1145/3442442.3453543
research-article

EMR: Scalable Clustering of Big HR Data using Evolutionary MapReduce

Published: 03 June 2021

    Abstract

    Nowadays, the volume and variety of generated data, and the question of how to process it and create value from it through scalable analytics, are major challenges for industry and for real-world practices such as talent analytics. For instance, large enterprises and job centres have to perform data-intensive matching of many job seekers to various job positions at the same time. In other words, the process should result in the large-scale assignment of best-fit talents (Person) with the right expertise (Profession) to the right job (Position) at the right time (Period). We call this the 4P rule in this paper. All enterprises should consider the 4P rule in their daily recruitment processes in order to achieve efficient workforce development strategies. This requires integrating large volumes of disparate data from various sources and calls for scalable algorithms and analytics. The diversity of the data in human resource management also makes it necessary to speed up analytical processes. The main challenge is not only how and where to store the data, but also how to analyse it in order to create value (knowledge discovery). In this paper, we propose a generic Career Knowledge Representation (CKR) model that can represent most competences found in a wide variety of careers. Regenerated job qualification data for 15 million employees with 84 dimensions (competences), derived from real HRM data, were used to test and evaluate the proposed Evolutionary MapReduce K-Means (EMR) method. In experiments on real large-scale datasets, the proposed EMR method produced faster and more accurate results than similar approaches; the achieved results are discussed in the paper.
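    The abstract names the technique (K-Means driven by an evolutionary search and expressed as MapReduce jobs) but does not give its details. The sketch below is only a generic illustration of that combination, not the authors' EMR algorithm: the map and reduce functions run in a single process rather than on a cluster, and the function names, the population size, the truncation selection and the Gaussian mutation are all assumptions made for this example.

```python
import random


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )


def map_phase(chunk, centroids):
    """Map: emit (cluster index, (point, 1)) for every point of one data chunk."""
    return [(nearest(p, centroids), (p, 1)) for p in chunk]


def reduce_phase(pairs, centroids):
    """Reduce: average the points emitted for each cluster index."""
    sums = {}
    for j, (p, n) in pairs:
        acc, cnt = sums.get(j, ([0.0] * len(p), 0))
        sums[j] = ([a + x for a, x in zip(acc, p)], cnt + n)
    # A cluster that received no points keeps its previous centroid.
    return [
        [a / sums[j][1] for a in sums[j][0]] if j in sums else list(c)
        for j, c in enumerate(centroids)
    ]


def kmeans_step(chunks, centroids):
    """One K-Means iteration written as a map over chunks plus one reduce."""
    pairs = [kv for chunk in chunks for kv in map_phase(chunk, centroids)]
    return reduce_phase(pairs, centroids)


def sse(chunks, centroids):
    """Fitness: sum of squared distances of all points to their nearest centroid."""
    return sum(
        sum((x - c) ** 2 for x, c in zip(p, centroids[nearest(p, centroids)]))
        for chunk in chunks
        for p in chunk
    )


def evolve(chunks, k, pop_size=6, generations=5, seed=0):
    """Evolve a population of candidate centroid sets (assumed scheme, see lead-in)."""
    rng = random.Random(seed)
    points = [p for chunk in chunks for p in chunk]
    # Each individual is one set of k centroids sampled from the data.
    population = [[list(p) for p in rng.sample(points, k)] for _ in range(pop_size)]
    for _ in range(generations):
        # Local refinement: one K-Means step per individual.
        population = [kmeans_step(chunks, ind) for ind in population]
        # Truncation selection: keep the better half by SSE.
        population.sort(key=lambda ind: sse(chunks, ind))
        survivors = population[: pop_size // 2]
        # Mutation: children are Gaussian-perturbed copies of the survivors.
        children = [
            [[x + rng.gauss(0.0, 0.1) for x in c] for c in ind] for ind in survivors
        ]
        population = survivors + children
    return min(population, key=lambda ind: sse(chunks, ind))


if __name__ == "__main__":
    # Two chunks stand in for two input splits of a much larger dataset.
    demo_chunks = [
        [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]],
        [[10.0, 10.0], [10.0, 11.0], [11.0, 10.0]],
    ]
    best = evolve(demo_chunks, k=2)
    print(best, sse(demo_chunks, best))
```

    In a real deployment the map calls would run in parallel on separate data splits and only the per-cluster sums and counts would be shuffled to the reducers; the evolutionary loop would then compare several candidate solutions instead of relying on a single random initialisation.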

    Published In

    WWW '21: Companion Proceedings of the Web Conference 2021
    April 2021
    726 pages
    ISBN: 9781450383134
    DOI: 10.1145/3442442
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Big Data
    2. Evolutionary Algorithms
    3. K-Means Clustering
    4. Large Scale Human Resource Data
    5. Scalable Clustering

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    WWW '21
    WWW '21: The Web Conference 2021
    April 19 - 23, 2021
    Ljubljana, Slovenia

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
