DOI: 10.1145/3636555.3636868
Research Article · Open Access

Synthetic Dataset Generation for Fairer Unfairness Research

Published: 18 March 2024
Abstract

Recent research has made strides toward fair machine learning. Relatively few datasets, however, are commonly examined to evaluate these fairness-aware algorithms, and even fewer in education domains, which can lead to a narrow focus on particular types of fairness issues. In this paper, we describe a novel dataset modification method that utilizes a genetic algorithm to induce many types of unfairness into datasets. Additionally, our method can generate an unfairness benchmark dataset from scratch (thus avoiding data collection in situations that might exploit marginalized populations) or modify an existing dataset used as a reference point. Our method increases unfairness by 156.3% on average across datasets and unfairness definitions while preserving AUC scores for models trained on the original dataset (just a 0.3% change, on average). We investigate the generalization of our method across educational datasets with different characteristics and evaluate three common unfairness mitigation algorithms. The results show that our method can generate datasets with different types of unfairness, of both large and small size, and with different types of features, and that the induced unfairness affects models trained with different classifiers. Datasets generated with this method can be used for benchmarking and testing in future research on the measurement and mitigation of algorithmic unfairness.
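To make the approach concrete, below is a minimal, illustrative Python sketch of the general idea: a genetic algorithm searches for label assignments that widen a group disparity (here, a demographic parity gap) while penalizing drift in the AUC of a model trained on the original labels. This is not the authors' implementation; the reference classifier (logistic regression), the choice of unfairness definition, the population size, the crossover and mutation rates, and the AUC-penalty weight are all hypothetical choices for illustration.

```python
# Toy sketch (hypothetical, not the paper's code): evolve label vectors that
# maximize a demographic parity gap while keeping the AUC of a fixed model,
# trained on the original labels, close to its original value.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Toy data: 500 rows, 3 features, a binary group attribute, binary labels.
n = 500
X = rng.normal(size=(n, 3))
group = rng.integers(0, 2, size=n)
y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)

# Reference model trained on the original labels; its AUC is the anchor.
clf = LogisticRegression().fit(X, y)
base_auc = roc_auc_score(y, clf.predict_proba(X)[:, 1])

def model_auc(labels):
    """AUC of the fixed reference model scored against candidate labels."""
    if 0 < labels.mean() < 1:  # both classes must be present
        return roc_auc_score(labels, clf.predict_proba(X)[:, 1])
    return 0.0

def fitness(labels):
    """Reward a wide demographic parity gap; penalize AUC drift."""
    gap = abs(labels[group == 0].mean() - labels[group == 1].mean())
    return gap - 5.0 * abs(model_auc(labels) - base_auc)

# Standard GA loop: tournament selection, uniform crossover, bit-flip mutation.
pop = [y.copy() for _ in range(40)]
for _ in range(40):
    scores = [fitness(p) for p in pop]
    parents = [pop[max(rng.choice(len(pop), 3), key=lambda i: scores[i])]
               for _ in range(len(pop))]
    children = []
    for a, b in zip(parents[::2], parents[1::2]):
        mask = rng.random(n) < 0.5        # uniform crossover
        for child in (np.where(mask, a, b), np.where(mask, b, a)):
            flips = rng.random(n) < 0.01  # bit-flip mutation
            children.append(np.where(flips, 1 - child, child))
    pop = children

best = max(pop, key=fitness)
print("parity gap:", abs(best[group == 0].mean() - best[group == 1].mean()))
print("AUC drift:", abs(model_auc(best) - base_auc))
```

Other unfairness definitions considered in the paper's evaluation could, in principle, be swapped into the gap term of the fitness function without changing the loop structure.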

      Published In

      LAK '24: Proceedings of the 14th Learning Analytics and Knowledge Conference
      March 2024, 962 pages
      ISBN: 9798400716188
      DOI: 10.1145/3636555

      Publisher

      Association for Computing Machinery, New York, NY, United States

      Author Tags

      1. data generation
      2. datasets
      3. fair machine learning
      4. student data

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      LAK '24

      Acceptance Rates

      Overall Acceptance Rate 236 of 782 submissions, 30%
