
Showing 1–45 of 45 results for author: Rangwala, H

  1. arXiv:2406.09534  [pdf, other]

    cs.DB cs.LG

    FeatNavigator: Automatic Feature Augmentation on Tabular Data

    Authors: Jiaming Liang, Chuan Lei, Xiao Qin, Jiani Zhang, Asterios Katsifodimos, Christos Faloutsos, Huzefa Rangwala

    Abstract: Data-centric AI focuses on understanding and utilizing high-quality, relevant data in training machine learning (ML) models, thereby increasing the likelihood of producing accurate and useful results. Automatic feature augmentation, aiming to augment the initial base table with useful features from other tables, is critical in data preparation as it improves model performance, robustness, and gene…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 15 pages, 41 figures

  2. arXiv:2406.06022  [pdf, other]

    cs.LG cs.DC

    GraphStorm: all-in-one graph machine learning framework for industry applications

    Authors: Da Zheng, Xiang Song, Qi Zhu, Jian Zhang, Theodore Vasiloudis, Runjie Ma, Houyu Zhang, Zichen Wang, Soji Adeshina, Israt Nisa, Alejandro Mottini, Qingjun Cui, Huzefa Rangwala, Belinda Zeng, Christos Faloutsos, George Karypis

    Abstract: Graph machine learning (GML) is effective in many business applications. However, making GML easy to use and applicable to industry applications with massive datasets remains challenging. We developed GraphStorm, which provides an end-to-end solution for scalable graph construction, graph model training and inference. GraphStorm has the following desirable properties: (a) Easy to use: it can perfor…

    Submitted 10 June, 2024; originally announced June 2024.

    Journal ref: KDD 2024

  3. arXiv:2405.12372  [pdf, other]

    cs.LG

    DispaRisk: Assessing and Interpreting Disparity Risks in Datasets

    Authors: Jonathan Vasquez, Carlotta Domeniconi, Huzefa Rangwala

    Abstract: Machine learning (ML) algorithms impact virtually every aspect of human life and have found use across diverse sectors, including healthcare, finance, and education. Often, ML algorithms have been found to exacerbate societal biases present in datasets, leading to adverse impacts on subsets/groups of individuals, in many cases minority groups. To effectively mitigate these untoward effects,…

    Submitted 20 May, 2024; originally announced May 2024.

  4. arXiv:2402.14361  [pdf, other]

    cs.LG

    OpenTab: Advancing Large Language Models as Open-domain Table Reasoners

    Authors: Kezhi Kong, Jiani Zhang, Zhengyuan Shen, Balasubramaniam Srinivasan, Chuan Lei, Christos Faloutsos, Huzefa Rangwala, George Karypis

    Abstract: Large Language Models (LLMs) trained on large volumes of data excel at various natural language tasks, but they cannot handle tasks requiring knowledge that they have not been trained on. One solution is to use a retriever that fetches relevant information to expand the LLM's knowledge scope. However, existing textual-oriented retrieval-based LLMs are not ideal on structured table data due to div…

    Submitted 12 April, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: Accepted by ICLR 2024

  5. arXiv:2310.20046  [pdf, other]

    cs.CL

    Which Examples to Annotate for In-Context Learning? Towards Effective and Efficient Selection

    Authors: Costas Mavromatis, Balasubramaniam Srinivasan, Zhengyuan Shen, Jiani Zhang, Huzefa Rangwala, Christos Faloutsos, George Karypis

    Abstract: Large Language Models (LLMs) can adapt to new tasks via in-context learning (ICL). ICL is efficient as it does not require any parameter updates to the trained LLM, but only a few annotated examples as input for the LLM. In this work, we investigate an active learning approach for ICL, where there is a limited budget for annotating examples. We propose a model-adaptive optimization-free algorithm, t…

    Submitted 30 October, 2023; originally announced October 2023.

  6. arXiv:2310.13196  [pdf, other]

    cs.CL cs.DB cs.LG

    NameGuess: Column Name Expansion for Tabular Data

    Authors: Jiani Zhang, Zhengyuan Shen, Balasubramaniam Srinivasan, Shen Wang, Huzefa Rangwala, George Karypis

    Abstract: Recent advances in large language models have revolutionized many sectors, including the database industry. One common challenge when dealing with large volumes of tabular data is the pervasive use of abbreviated column names, which can negatively impact performance on various data search, access, and understanding tasks. To address this issue, we introduce a new task, called NameGuess, to expand…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: This work has been accepted to EMNLP'23

  7. arXiv:2310.09656  [pdf, other]

    cs.LG

    Mixed-Type Tabular Data Synthesis with Score-based Diffusion in Latent Space

    Authors: Hengrui Zhang, Jiani Zhang, Balasubramaniam Srinivasan, Zhengyuan Shen, Xiao Qin, Christos Faloutsos, Huzefa Rangwala, George Karypis

    Abstract: Recent advances in tabular data generation have greatly enhanced synthetic data quality. However, extending diffusion models to tabular data is challenging due to the intricately varied distributions and a blend of data types of tabular data. This paper introduces Tabsyn, a methodology that synthesizes tabular data by leveraging a diffusion model within a variational autoencoder (VAE) crafted late…

    Submitted 11 May, 2024; v1 submitted 14 October, 2023; originally announced October 2023.

    Comments: Accepted by ICLR 2024 (Oral Presentation). Code is available at: https://github.com/amazon-science/tabsyn

  8. arXiv:2310.03320  [pdf, other]

    cs.LG

    BioBridge: Bridging Biomedical Foundation Models via Knowledge Graphs

    Authors: Zifeng Wang, Zichen Wang, Balasubramaniam Srinivasan, Vassilis N. Ioannidis, Huzefa Rangwala, Rishita Anubhai

    Abstract: Foundation models (FMs) are able to leverage large volumes of unlabeled data to demonstrate superior performance across a wide range of tasks. However, FMs developed for biomedical domains have largely remained unimodal, i.e., independently trained and used for tasks on protein sequences alone, small molecule structures alone, or clinical data alone. To overcome this limitation of biomedical FMs,…

    Submitted 18 January, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

    Comments: ICLR 2024

  9. arXiv:2212.05975  [pdf, other]

    cs.LG cs.CE

    GenSyn: A Multi-stage Framework for Generating Synthetic Microdata using Macro Data Sources

    Authors: Angeela Acharya, Siddhartha Sikdar, Sanmay Das, Huzefa Rangwala

    Abstract: Individual-level data (microdata) that characterizes a population is essential for studying many real-world problems. However, acquiring such data is not straightforward due to cost and privacy constraints, and access is often limited to aggregated data (macro data) sources. In this study, we examine synthetic data generation as a tool to extrapolate difficult-to-obtain high-resolution data by co…

    Submitted 7 December, 2022; originally announced December 2022.

    Comments: 10 pages, 6 figures, Accepted for the 2022 IEEE International Conference on Big Data

  10. arXiv:2211.11749  [pdf]

    eess.IV cs.CV physics.med-ph

    Towards Automatic Prediction of Outcome in Treatment of Cerebral Aneurysms

    Authors: Ashutosh Jadhav, Satyananda Kashyap, Hakan Bulu, Ronak Dholakia, Amon Y. Liu, Tanveer Syeda-Mahmood, William R. Patterson, Hussain Rangwala, Mehdi Moradi

    Abstract: Intrasaccular flow disruptors treat cerebral aneurysms by diverting the blood flow from the aneurysm sac. Residual flow into the sac after the intervention is a failure that could be due to the use of an undersized device, or to vascular anatomy and clinical condition of the patient. We report a machine learning model based on over 100 clinical and imaging features that predict the outcome of wide…

    Submitted 18 November, 2022; originally announced November 2022.

    Comments: 10 pages

    Report number: https://s4.goeshow.com/amia/annual/2022/schedule_at_a_glance.cfm?session_key=1965BCBD-A832-92DD-9D05-FB2CB132FADB&session_date=

    Journal ref: AMIA 2022 Annual Symposium

  11. arXiv:2211.06428  [pdf, other]

    q-bio.QM cs.LG q-bio.BM

    Training self-supervised peptide sequence models on artificially chopped proteins

    Authors: Gil Sadeh, Zichen Wang, Jasleen Grewal, Huzefa Rangwala, Layne Price

    Abstract: Representation learning for proteins has primarily focused on the global understanding of protein sequences regardless of their length. However, shorter proteins (known as peptides) take on distinct structures and functions compared to their longer counterparts. Unfortunately, there are not as many naturally occurring peptides available to be sequenced and therefore less peptide-specific data to t…

    Submitted 9 November, 2022; originally announced November 2022.

  12. arXiv:2208.07918  [pdf, other]

    cs.LG cs.CY

    Ex-Ante Assessment of Discrimination in Dataset

    Authors: Jonathan Vasquez, Xavier Gitiaux, Huzefa Rangwala

    Abstract: Data owners face increasing liability for how the use of their data could harm underprivileged communities. Stakeholders would like to identify the characteristics of data that lead to algorithms being biased against any particular demographic group, for example, defined by race, gender, age, and/or religion. Specifically, we are interested in identifying subsets of the feature space where…

    Submitted 18 August, 2022; v1 submitted 16 August, 2022; originally announced August 2022.

  13. arXiv:2204.12556  [pdf, other]

    cs.LG cs.CY

    SoFaiR: Single Shot Fair Representation Learning

    Authors: Xavier Gitiaux, Huzefa Rangwala

    Abstract: To avoid discriminatory uses of their data, organizations can learn to map them into a representation that filters out information related to sensitive attributes. However, all existing methods in fair representation learning generate a fairness-information trade-off. To achieve different points on the fairness-information plane, one must train different models. In this paper, we first demonstrate…

    Submitted 26 April, 2022; originally announced April 2022.

  14. arXiv:2204.02531  [pdf, other]

    cs.CL cs.AI

    Improving Zero-Shot Event Extraction via Sentence Simplification

    Authors: Sneha Mehta, Huzefa Rangwala, Naren Ramakrishnan

    Abstract: The success of sites such as ACLED and Our World in Data has demonstrated the massive utility of extracting events in structured formats from large volumes of textual data in the form of news, social media, blogs and discussion forums. Event extraction can provide a window into ongoing geopolitical crises and yield actionable intelligence. With the proliferation of large pretrained language model…

    Submitted 5 April, 2022; originally announced April 2022.

  15. arXiv:2202.12334  [pdf, other]

    cs.CY cs.GT

    Trade-offs between Group Fairness Metrics in Societal Resource Allocation

    Authors: Tasfia Mashiat, Xavier Gitiaux, Huzefa Rangwala, Patrick J. Fowler, Sanmay Das

    Abstract: We consider social resource allocations that deliver an array of scarce supports to a diverse population. Such allocations pervade social service delivery, such as provision of homeless services, assignment of refugees to cities, among others. At issue is whether allocations are fair across sociodemographic groups and intersectional identities. Our paper shows that necessary trade-offs exist for f…

    Submitted 24 February, 2022; originally announced February 2022.

  16. arXiv:2112.05695  [pdf, other]

    cs.LG cs.AI

    Causal Knowledge Guided Societal Event Forecasting

    Authors: Songgaojun Deng, Huzefa Rangwala, Yue Ning

    Abstract: Data-driven societal event forecasting methods exploit relevant historical information to predict future events. These methods rely on historical labeled data and cannot accurately predict events when data are limited or of poor quality. Studying causal effects between events goes beyond correlation analysis and can contribute to a more robust prediction of events. However, incorporating causality…

    Submitted 10 December, 2021; originally announced December 2021.

    Comments: 17 pages, 19 figures

    MSC Class: 68T07

  17. arXiv:2109.00151  [pdf, other]

    cs.LG cs.DC

    Asynchronous Federated Learning for Sensor Data with Concept Drift

    Authors: Yujing Chen, Zheng Chai, Yue Cheng, Huzefa Rangwala

    Abstract: Federated learning (FL) involves multiple distributed devices jointly training a shared model without any of the participants having to reveal their local data to a centralized server. Most previous FL approaches assume that data on devices are fixed and stationary during the training process. However, this assumption is unrealistic because these devices usually have varying sampling rates and…

    Submitted 31 August, 2021; originally announced September 2021.

  18. arXiv:2108.13620  [pdf, other]

    cs.CL cs.LG

    Cross-Lingual Text Classification of Transliterated Hindi and Malayalam

    Authors: Jitin Krishnan, Antonios Anastasopoulos, Hemant Purohit, Huzefa Rangwala

    Abstract: Transliteration is very common on social media, but transliterated text is not adequately handled by modern neural models for various NLP tasks. In this work, we combine data augmentation approaches with a Teacher-Student training scheme to address this issue in a cross-lingual transfer setting for fine-tuning state-of-the-art pre-trained multilingual language models such as mBERT and XLM-R. We ev…

    Submitted 31 August, 2021; originally announced August 2021.

    Comments: 12 pages, 5 tables, 7 Figures

  19. arXiv:2105.14044  [pdf, other]

    cs.LG cs.CY

    Fair Representations by Compression

    Authors: Xavier Gitiaux, Huzefa Rangwala

    Abstract: Organizations that collect and sell data face increasing scrutiny for the discriminatory use of data. We propose a novel unsupervised approach to transform data into a compressed binary representation independent of sensitive attributes. We show that in an information bottleneck framework, a parsimonious representation should filter out information related to sensitive attributes if they are provi…

    Submitted 28 May, 2021; originally announced May 2021.

  20. arXiv:2103.07792  [pdf, other]

    cs.CL cs.LG

    Multilingual Code-Switching for Zero-Shot Cross-Lingual Intent Prediction and Slot Filling

    Authors: Jitin Krishnan, Antonios Anastasopoulos, Hemant Purohit, Huzefa Rangwala

    Abstract: Predicting user intent and detecting the corresponding slots from text are two key problems in Natural Language Understanding (NLU). In the context of zero-shot learning, this task is typically approached by either using representations from pre-trained multilingual transformers such as mBERT, or by machine translating the source data into the known target language and then fine-tuning. Our work f…

    Submitted 16 March, 2021; v1 submitted 13 March, 2021; originally announced March 2021.

  21. arXiv:2011.06738  [pdf, other]

    cs.LG

    Metric-Free Individual Fairness with Cooperative Contextual Bandits

    Authors: Qian Hu, Huzefa Rangwala

    Abstract: Data mining algorithms are increasingly used in automated decision making across all walks of daily life. Unfortunately, as reported in several studies, these algorithms inject bias from data and environment, leading to inequitable and unfair solutions. To mitigate bias in machine learning, different formalizations of fairness have been proposed that can be categorized into group fairness and indivi…

    Submitted 12 November, 2020; originally announced November 2020.

  22. arXiv:2010.05958  [pdf, other]

    cs.DC cs.LG cs.NI

    FedAT: A High-Performance and Communication-Efficient Federated Learning System with Asynchronous Tiers

    Authors: Zheng Chai, Yujing Chen, Ali Anwar, Liang Zhao, Yue Cheng, Huzefa Rangwala

    Abstract: Federated learning (FL) involves training a model over massive distributed devices, while keeping the training data localized. This form of collaborative learning exposes new tradeoffs among model convergence speed, model accuracy, balance across clients, and communication cost, with new challenges including: (1) straggler problem, where the clients lag due to data or (computing and network) resou…

    Submitted 28 August, 2021; v1 submitted 12 October, 2020; originally announced October 2020.

    Comments: SC21

  23. arXiv:2006.08788  [pdf, other]

    cs.LG stat.ML

    Learning Smooth and Fair Representations

    Authors: Xavier Gitiaux, Huzefa Rangwala

    Abstract: Organizations that own data face increasing legal liability for its discriminatory use against protected demographic groups, extending to contractual transactions involving third parties' access to and use of the data. This is problematic, since the original data owner cannot ex-ante anticipate all its future uses by downstream users. This paper explores the upstream ability to preemptively remove the…

    Submitted 15 June, 2020; originally announced June 2020.

  24. arXiv:2003.11687  [pdf, other]

    cs.CL cs.LG stat.ML

    Common-Knowledge Concept Recognition for SEVA

    Authors: Jitin Krishnan, Patrick Coronado, Hemant Purohit, Huzefa Rangwala

    Abstract: We build a common-knowledge concept recognition system for a Systems Engineer's Virtual Assistant (SEVA) which can be used for downstream tasks such as relation extraction, knowledge graph construction, and question-answering. The problem is formulated as a token classification task similar to named entity extraction. With the help of a domain expert and text processing methods, we construct a dat…

    Submitted 25 March, 2020; originally announced March 2020.

    Comments: Source code available

  25. arXiv:2003.08753  [pdf, other]

    cs.CV cs.HC cs.LG stat.ML

    FineHand: Learning Hand Shapes for American Sign Language Recognition

    Authors: Al Amin Hosain, Panneer Selvam Santhalingam, Parth Pathak, Huzefa Rangwala, Jana Kosecka

    Abstract: American Sign Language recognition is a difficult gesture recognition problem, characterized by fast, highly articulate gestures. These comprise arm movements with different hand shapes, facial expressions, and head movements. Among these components, hand shape is a vital, and often the most discriminative, part of a gesture. In this work, we present an approach for effective learning of hand s…

    Submitted 4 March, 2020; originally announced March 2020.

  26. arXiv:2003.08743  [pdf, other]

    cs.CV

    Generative Multi-Stream Architecture For American Sign Language Recognition

    Authors: Dom Huh, Sai Gurrapu, Frederick Olson, Huzefa Rangwala, Parth Pathak, Jana Kosecka

    Abstract: With advancements in deep model architectures, tasks in computer vision can reach optimal convergence given proper data preprocessing and model parameter initialization. However, training on datasets with low feature-richness for complex applications limits and degrades convergence to below human performance. In past works, researchers have provided external sources of complementary data a…

    Submitted 9 March, 2020; originally announced March 2020.

  27. arXiv:2003.04991  [pdf, other]

    cs.CL cs.LG cs.SI stat.ML

    Unsupervised and Interpretable Domain Adaptation to Rapidly Filter Tweets for Emergency Services

    Authors: Jitin Krishnan, Hemant Purohit, Huzefa Rangwala

    Abstract: During the onset of a disaster event, filtering relevant information from the social web data is challenging due to its sparse availability and practical limitations in labeling datasets of an ongoing crisis. In this paper, we hypothesize that unsupervised domain adaptation through multi-task learning can be a useful framework to leverage data from past crisis events for training efficient informa…

    Submitted 20 October, 2020; v1 submitted 4 March, 2020; originally announced March 2020.

    Comments: 8 pages, 4 Figures, 6 Tables, Source Code Available

  28. arXiv:2002.10937  [pdf, other]

    cs.LG cs.CL stat.ML

    Diversity-Based Generalization for Unsupervised Text Classification under Domain Shift

    Authors: Jitin Krishnan, Hemant Purohit, Huzefa Rangwala

    Abstract: Domain adaptation approaches seek to learn from a source domain and generalize it to an unseen target domain. At present, the state-of-the-art unsupervised domain adaptation approaches for subjective text classification problems leverage unlabeled target data along with labeled source data. In this paper, we propose a novel method for domain adaptation of single-task text classification problems b…

    Submitted 20 October, 2020; v1 submitted 25 February, 2020; originally announced February 2020.

    Comments: 16 pages, 3 figures, 5 Tables, Source Code Available

  29. arXiv:2001.00632  [pdf, other]

    cs.LG cs.AI

    Academic Performance Estimation with Attention-based Graph Convolutional Networks

    Authors: Qian Hu, Huzefa Rangwala

    Abstract: Student academic performance prediction empowers educational technologies including academic trajectory and degree planning, course recommender systems, and early warning and advising systems. Given a student's past data (such as grades in prior courses), the task of student performance prediction is to predict the student's grades in future courses. Academic programs are structured in a way that pr…

    Submitted 26 December, 2019; originally announced January 2020.

  30. arXiv:1912.10202  [pdf, other]

    cs.LG cs.SI stat.ML

    Graph Message Passing with Cross-location Attentions for Long-term ILI Prediction

    Authors: Songgaojun Deng, Shusen Wang, Huzefa Rangwala, Lijing Wang, Yue Ning

    Abstract: Forecasting influenza-like illness (ILI) is of prime importance to epidemiologists and health-care providers. Early prediction of epidemic outbreaks plays a pivotal role in disease intervention and control. Most existing work has either limited long-term prediction performance or lacks a comprehensive ability to capture spatio-temporal dependencies in data. Accurate and early disease forecasting m…

    Submitted 28 December, 2019; v1 submitted 21 December, 2019; originally announced December 2019.

    Comments: 17 pages, 22 figures, 5 tables

  31. arXiv:1912.00835  [pdf, other]

    cs.CL cs.LG

    Low Rank Factorization for Compact Multi-Head Self-Attention

    Authors: Sneha Mehta, Huzefa Rangwala, Naren Ramakrishnan

    Abstract: Effective representation learning from text has been an active area of research in the fields of NLP and text mining. Attention mechanisms have been at the forefront in order to learn contextual sentence representations. Current state-of-the-art approaches for many NLP tasks use large pre-trained language models such as BERT, XLNet and so on for learning representations. These models are based on…

    Submitted 9 August, 2020; v1 submitted 26 November, 2019; originally announced December 2019.

    Comments: 9 pages, 5 figures

  32. arXiv:1911.02134  [pdf, other]

    cs.DC cs.LG

    Asynchronous Online Federated Learning for Edge Devices with Non-IID Data

    Authors: Yujing Chen, Yue Ning, Martin Slawski, Huzefa Rangwala

    Abstract: Federated learning (FL) is a machine learning paradigm where a shared central model is learned across distributed edge devices while the training data remains on these devices. Federated Averaging (FedAvg) is the leading optimization method for training non-convex models in this setting with a synchronized protocol. However, the assumptions made by FedAvg are not realistic given the heterogeneity…

    Submitted 20 October, 2020; v1 submitted 5 November, 2019; originally announced November 2019.

    Comments: 11 pages

  33. arXiv:1909.11232  [pdf, other]

    cs.LG cs.CL cs.CV stat.ML

    Sign Language Recognition Analysis using Multimodal Data

    Authors: Al Amin Hosain, Panneer Selvam Santhalingam, Parth Pathak, Jana Kosecka, Huzefa Rangwala

    Abstract: Voice-controlled personal and home assistants (such as the Amazon Echo and Apple Siri) are becoming increasingly popular for a variety of applications. However, the benefits of these technologies are not readily accessible to Deaf or Hard-of-Hearing (DHH) users. The objective of this study is to develop and evaluate a sign recognition system using multiple modalities that can be used by DHH signers…

    Submitted 24 September, 2019; originally announced September 2019.

    Comments: Conference: IEEE DSAA 2019, Washington DC

  34. arXiv:1905.05142  [pdf, other]

    cs.LG stat.ML

    Federated Multi-task Hierarchical Attention Model for Sensor Analytics

    Authors: Yujing Chen, Yue Ning, Zheng Chai, Huzefa Rangwala

    Abstract: Sensors are an integral part of modern Internet of Things (IoT) applications. There is a critical need for the analysis of heterogeneous multivariate temporal data obtained from the individual sensors of these systems. In this paper we particularly focus on the problem of the scarce amount of training data available per sensor. We propose a novel federated multi-task hierarchical attention model (…

    Submitted 13 May, 2019; originally announced May 2019.

  35. arXiv:1903.07609  [pdf, ps, other]

    cs.LG stat.ML

    Multi-Differential Fairness Auditor for Black Box Classifiers

    Authors: Xavier Gitiaux, Huzefa Rangwala

    Abstract: Machine learning algorithms are increasingly involved in sensitive decision-making processes with adverse implications for individuals. This paper presents mdfa, an approach that identifies the characteristics of the victims of a classifier's discrimination. We measure discrimination as a violation of multi-differential fairness. Multi-differential fairness is a guarantee that a black box classif…

    Submitted 18 March, 2019; originally announced March 2019.

  36. arXiv:1902.10213  [pdf, other]

    cs.AI cs.CY

    Reliable Deep Grade Prediction with Uncertainty Estimation

    Authors: Qian Hu, Huzefa Rangwala

    Abstract: Currently, college-going students are taking longer to graduate than their parental generations. Further, in the United States, the six-year graduation rate has been 59% for decades. Improving the educational quality by training better-prepared students who can successfully graduate in a timely manner is critical. Accurately predicting students' grades in future courses has attracted much attentio…

    Submitted 26 February, 2019; originally announced February 2019.

  37. arXiv:1801.05535  [pdf, other]

    cs.LG

    ALE: Additive Latent Effect Models for Grade Prediction

    Authors: Zhiyun Ren, Xia Ning, Huzefa Rangwala

    Abstract: The past decade has seen a growth in the development and deployment of educational technologies for assisting college-going students in choosing majors, selecting courses and acquiring feedback based on past academic performance. Grade prediction methods seek to estimate a grade that a student may achieve in a course that she may take in the future (e.g., next term). Accurate and timely prediction…

    Submitted 16 January, 2018; originally announced January 2018.

    Comments: 9 pages, 3 figures

  38. arXiv:1709.05433  [pdf, other]

    cs.LG

    Grade Prediction with Temporal Course-wise Influence

    Authors: Zhiyun Ren, Xia Ning, Huzefa Rangwala

    Abstract: There is a critical need to develop new educational technology applications that analyze the data collected by universities to ensure that students graduate in a timely fashion (4 to 6 years) and that they are well prepared for jobs in their respective fields of study. In this paper, we present a novel approach for analyzing historical educational records from a large, public university to perform nex…

    Submitted 15 September, 2017; originally announced September 2017.

    Comments: 8 pages, 5 figures

  39. arXiv:1706.01583  [pdf, other]

    cs.LG stat.ML

    Classifying Documents within Multiple Hierarchical Datasets using Multi-Task Learning

    Authors: Azad Naik, Anveshi Charuvaka, Huzefa Rangwala

    Abstract: Multi-task learning (MTL) is a supervised learning paradigm in which the prediction models for several related tasks are learned jointly to achieve better generalization performance. When there are only a few training examples per task, MTL considerably outperforms traditional single-task learning (STL) in terms of prediction accuracy. In this work we develop an MTL-based approach for classify…

    Submitted 5 June, 2017; originally announced June 2017.

    Comments: IEEE International Conference on Tools with Artificial Intelligence (ICTAI), 2013

  40. arXiv:1706.01581  [pdf, other]

    cs.LG stat.ML

    Embedding Feature Selection for Large-scale Hierarchical Classification

    Authors: Azad Naik, Huzefa Rangwala

    Abstract: Large-scale Hierarchical Classification (HC) involves datasets consisting of thousands of classes and millions of training instances with high-dimensional features, posing several big data challenges. Feature selection, which aims to select the subset of discriminant features, is an effective strategy to deal with the large-scale HC problem. It speeds up the training process, reduces the prediction time a…

    Submitted 5 June, 2017; originally announced June 2017.

    Comments: IEEE International Conference on Big Data (IEEE BigData 2016)

  41. arXiv:1706.01214  [pdf, other]

    cs.LG stat.ML

    Inconsistent Node Flattening for Improving Top-down Hierarchical Classification

    Authors: Azad Naik, Huzefa Rangwala

    Abstract: Large-scale classification of data where classes are structurally organized in a hierarchy is an important area of research. Top-down approaches that exploit the hierarchy during the learning and prediction phase are efficient for large-scale hierarchical classification. However, the accuracy of top-down approaches is poor due to error propagation, i.e., prediction errors made at higher levels in the h…

    Submitted 5 June, 2017; originally announced June 2017.

    Comments: IEEE International Conference on Data Science and Advanced Analytics (DSAA), 2016

  42. arXiv:1605.02269  [pdf, other]

    cs.CY cs.LG

    Predicting Performance on MOOC Assessments using Multi-Regression Models

    Authors: Zhiyun Ren, Huzefa Rangwala, Aditya Johri

    Abstract: The past few years have seen the rapid growth of data mining approaches for the analysis of data obtained from Massive Open Online Courses (MOOCs). The objectives of this study are to develop approaches to predict the scores a student may achieve on a given grade-related assessment based on information, considered as prior performance or prior activity in the course. We develop a personaliz…

    Submitted 8 May, 2016; originally announced May 2016.

    Comments: 8 pages, 7 figures

  43. Next-Term Student Performance Prediction: A Recommender Systems Approach

    Authors: Mack Sweeney, Huzefa Rangwala, Jaime Lester, Aditya Johri

    Abstract: An enduring issue in higher education is student retention to successful graduation. National statistics indicate that most higher education institutions have four-year degree completion rates around 50 percent, or just half of their student populations. While there are prediction models which illuminate what factors assist with college student success, interventions that support course selections…

    Submitted 6 April, 2016; originally announced April 2016.

    Comments: 27 pages, 5 figures, submitted to Journal of Educational Data Mining (JEDM)

    Journal ref: https://jedm.educationaldatamining.org/index.php/JEDM/article/view/JEDM2016-8-1-3

  44. arXiv:1603.00772  [pdf, other]

    cs.AI

    Filter based Taxonomy Modification for Improving Hierarchical Classification

    Authors: Azad Naik, Huzefa Rangwala

    Abstract: Hierarchical Classification (HC) is a supervised learning problem where unlabeled instances are classified into a taxonomy of classes. Several methods that utilize the hierarchical structure have been developed to improve HC performance. However, in most cases the hierarchical structure defined a priori by domain experts is inconsistent; as a consequence, performance improvement is not noticeable in…

    Submitted 15 October, 2016; v1 submitted 2 March, 2016; originally announced March 2016.

    Comments: The conference version of the paper is submitted for publication

  45. arXiv:1602.08033  [pdf, other]

    cs.SI

    Modeling Precursors for Event Forecasting via Nested Multi-Instance Learning

    Authors: Yue Ning, Sathappan Muthiah, Huzefa Rangwala, Naren Ramakrishnan

    Abstract: Forecasting events like civil unrest movements, disease outbreaks, financial market movements and government elections from open source indicators such as news feeds and social media streams is an important and challenging problem. From the perspective of human analysts and policy makers, forecasting algorithms need to provide supporting evidence and identify the causes related to the event of int…

    Submitted 24 August, 2016; v1 submitted 25 February, 2016; originally announced February 2016.

    Comments: The conference version of the paper is submitted for publication