DOI: 10.1145/3636555.3636868
Research Article · Open Access

Synthetic Dataset Generation for Fairer Unfairness Research

Published: 18 March 2024
Abstract

Recent research has made strides toward fair machine learning. Relatively few datasets, however, are commonly examined to evaluate these fairness-aware algorithms, and even fewer in education domains, which can lead to a narrow focus on particular types of fairness issues. In this paper, we describe a novel dataset modification method that utilizes a genetic algorithm to induce many types of unfairness into datasets. Additionally, our method can generate an unfairness benchmark dataset from scratch (thus avoiding data collection in situations that might exploit marginalized populations) or modify an existing dataset used as a reference point. Our method increases unfairness by 156.3% on average across datasets and unfairness definitions while preserving AUC scores for models trained on the original dataset (just a 0.3% change, on average). We investigate the generalization of our method across educational datasets with different characteristics and evaluate three common unfairness mitigation algorithms. The results show that our method can generate datasets with different types of unfairness, of both large and small size, and with different types of features, and that the induced unfairness affects models trained with different classifiers. Datasets generated with this method can be used for benchmarking and testing in future research on the measurement and mitigation of algorithmic unfairness.
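To make the approach concrete, below is a minimal, illustrative Python sketch of the general idea: a genetic algorithm searches for label assignments that widen a group disparity (here, a demographic parity gap) while penalizing drift in the AUC of a model trained on the original labels. This is not the authors' implementation; the reference classifier (logistic regression), the choice of unfairness definition, the population size, the crossover and mutation rates, and the AUC-penalty weight are all hypothetical choices for illustration.

```python
# Toy sketch (hypothetical, not the paper's code): evolve label vectors that
# maximize a demographic parity gap while keeping the AUC of a fixed model,
# trained on the original labels, close to its original value.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Toy data: 500 rows, 3 features, a binary group attribute, binary labels.
n = 500
X = rng.normal(size=(n, 3))
group = rng.integers(0, 2, size=n)
y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)

# Reference model trained on the original labels; its AUC is the anchor.
clf = LogisticRegression().fit(X, y)
base_auc = roc_auc_score(y, clf.predict_proba(X)[:, 1])

def model_auc(labels):
    """AUC of the fixed reference model scored against candidate labels."""
    if 0 < labels.mean() < 1:  # both classes must be present
        return roc_auc_score(labels, clf.predict_proba(X)[:, 1])
    return 0.0

def fitness(labels):
    """Reward a wide demographic parity gap; penalize AUC drift."""
    gap = abs(labels[group == 0].mean() - labels[group == 1].mean())
    return gap - 5.0 * abs(model_auc(labels) - base_auc)

# Standard GA loop: tournament selection, uniform crossover, bit-flip mutation.
pop = [y.copy() for _ in range(40)]
for _ in range(40):
    scores = [fitness(p) for p in pop]
    parents = [pop[max(rng.choice(len(pop), 3), key=lambda i: scores[i])]
               for _ in range(len(pop))]
    children = []
    for a, b in zip(parents[::2], parents[1::2]):
        mask = rng.random(n) < 0.5        # uniform crossover
        for child in (np.where(mask, a, b), np.where(mask, b, a)):
            flips = rng.random(n) < 0.01  # bit-flip mutation
            children.append(np.where(flips, 1 - child, child))
    pop = children

best = max(pop, key=fitness)
print("parity gap:", abs(best[group == 0].mean() - best[group == 1].mean()))
print("AUC drift:", abs(model_auc(best) - base_auc))
```

Other unfairness definitions considered in the paper's evaluation could, in principle, be swapped into the gap term of the fitness function without changing the loop structure.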

      Published In

      LAK '24: Proceedings of the 14th Learning Analytics and Knowledge Conference
      March 2024, 962 pages
      ISBN: 9798400716188
      DOI: 10.1145/3636555

      Publisher

      Association for Computing Machinery, New York, NY, United States

      Author Tags

      1. data generation
      2. datasets
      3. fair machine learning
      4. student data

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      LAK '24

      Acceptance Rates

      Overall Acceptance Rate 236 of 782 submissions, 30%
