Unveiling Scoring Processes: Dissecting the Differences between LLMs and Human Graders in Automatic Scoring

Authors: Xuansheng Wu, Padmaja Pravin Saraf, Gyeong-Geon Lee, Ehsan Latif, Ninghao Liu, Xiaoming Zhai

Abstract: Large language models (LLMs) have demonstrated strong potential in performing automatic scoring for constructed response assessments. While constructed responses graded by humans are usually based on given grading rubrics, the methods by which LLMs assign scores remain largely unclear. It is also uncertain how closely AI's scoring process mirrors that of humans, or if it adheres to the same grading criteria. To address this gap, this paper uncovers the grading rubrics that LLMs used to score students' written responses to science tasks and their alignment with human scores. We also examine whether enhancing the alignments can improve scoring accuracy. Specifically, we prompt LLMs to generate analytic rubrics that they use to assign scores and study the alignment gap with human grading rubrics. Based on a series of experiments with various configurations of LLM settings, we reveal a notable alignment gap between human and LLM graders. While LLMs can adapt quickly to scoring tasks, they often resort to shortcuts, bypassing deeper logical reasoning expected in human grading. We found that incorporating high-quality analytical rubrics designed to reflect human grading logic can mitigate this gap and enhance LLMs' scoring accuracy. These results caution against the simplistic application of LLMs in science education and highlight the importance of aligning LLM outputs with human expectations to ensure efficient and accurate automatic scoring.

Submitted 4 July, 2024; originally announced July 2024.

Comments: Non-archival Presenting at EDM 2024 Workshop on Large Language Models

arXiv:2403.20329 [pdf, other]

ReALM: Reference Resolution As Language Modeling

Authors: Joel Ruben Antony Moniz, Soundarya Krishnan, Melis Ozyildirim, Prathamesh Saraf, Halim Cagri Ates, Yuan Zhang, Hong Yu

Abstract: Reference resolution is an important problem, one that is essential to understand and successfully handle context of different kinds. This context includes both previous turns and context that pertains to non-conversational entities, such as entities on the user's screen or those running in the background. While LLMs have been shown to be extremely powerful for a variety of tasks, their use in reference resolution, particularly for non-conversational entities, remains underutilized. This paper demonstrates how LLMs can be used to create an extremely effective system to resolve references of various types, by showing how reference resolution can be converted into a language modeling problem, despite involving forms of entities like those on screen that are not traditionally conducive to being reduced to a text-only modality. We demonstrate large improvements over an existing system with similar functionality across different types of references, with our smallest model obtaining absolute gains of over 5% for on-screen references. We also benchmark against GPT-3.5 and GPT-4, with our smallest model achieving performance comparable to that of GPT-4, and our larger models substantially outperforming it.

Submitted 18 August, 2024; v1 submitted 29 March, 2024; originally announced March 2024.

Comments: Accepted at SIGDIAL 2024 (Oral presentation)

arXiv:2307.00156 [pdf, other]

Convex Optimization in Legged Robots

Authors: Prathamesh Saraf, Mustafa Shaikh, Myron Phan

Abstract: Convex optimization is crucial in controlling legged robots, where stability and optimal control are vital. Many control problems can be formulated as convex optimization problems, with a convex cost function and constraints capturing system dynamics. Our review focuses on active balancing problems and presents a general framework for formulating them as second-order cone programming (SOCP) for robustness and efficiency with existing interior point algorithms. We then discuss some prior work around the Zero Moment Point stability criterion, Linear Quadratic Regulator Control, and then the feedback model predictive control (MPC) approach to improve prediction accuracy and reduce computational costs. Finally, these techniques are applied to stabilize the robot for jumping and landing tasks. Further research in convex optimization of legged robots can have a significant societal impact. It can lead to improved gait planning and active balancing which enhances their ability to navigate complex environments, assist in search and rescue operations and perform tasks in hazardous environments. These advancements have the potential to revolutionize industries and help humans in daily life.

Submitted 30 June, 2023; originally announced July 2023.

Comments: 12 pages

arXiv:2106.03307 [pdf, other]

Terrain Adaptive Gait Transitioning for a Quadruped Robot using Model Predictive Control

Authors: Prathamesh Saraf, Abhishek Sarkar, Arshad Javed

Abstract: Legged robots can traverse challenging terrain, use perception to plan their safe foothold positions, and navigate the environment. Such unique mobility capabilities make these platforms a perfect candidate for scenarios such as search and rescue, inspection, and exploration tasks. While traversing through such terrains, the robot's instability is a significant concern. Many times the robot needs to switch gaits depending on its environment. Due to the complex dynamics of quadruped robots, classical PID control fails to provide high stability. Thus, there is a need for advanced control methods like the Model Predictive Control (MPC) which uses the system model and the nature of the terrain in order to predict the stable body pose of the robot. The controller also provides correction to any external disturbances that result in a change in the desired behavior of the robot. The MPC controller is designed in MATLAB, for full body torque control. The controller performance was verified on Boston Dynamics Spot in Webots simulator. The robot is able to provide correction for external perturbations up to 150 N and also resist falls till 80 cm.

Submitted 6 July, 2021; v1 submitted 6 June, 2021; originally announced June 2021.

Comments: To be published in the proceedings of the 26th IEEE International Conference on Automation and Computing (ICAC'21)

arXiv:2010.13574 [pdf]

Modeling and Simulation of a Point to Point Spherical Articulated Manipulator using Optimal Control

Authors: Prathamesh Saraf, R. N. Ponnalagu

Abstract: This paper aims to design an optimal stability controller for a point to point trajectory tracking 3 degree of freedom articulated manipulator. The DH convention is used to obtain the forward and inverse kinematics of the manipulator. The manipulator dynamics are formulated using the Lagrange Euler method to obtain a nonlinear system. The complicated nonlinear equations obtained are then linearized in order to implement the optimal LQR. The simulations are performed in MATLAB and Simulink and the optimal controllers performance is tested for various conditions and the results are presented. The results obtained prove the superiority of LQR over conventional PID control.

Submitted 26 October, 2020; originally announced October 2020.

Comments: 5 pages

arXiv:1909.03854 [pdf]

A Convolutional Neural Network Approach Towards Self-Driving Cars

Authors: Akhil Agnihotri, Prathamesh Saraf, Kriti Rajesh Bapnad

Abstract: A convolutional neural network (CNN) approach is used to implement a level 2 autonomous vehicle by mapping pixels from the camera input to the steering commands. The network automatically learns the maximum variable features from the camera input, hence requires minimal human intervention. Given realistic frames as input, the driving policy trained on the dataset by NVIDIA and Udacity can adapt to real-world driving in a controlled environment. The CNN is tested on the CARLA open-source driving simulator. Details of a beta-testing platform are also presented, which consists of an ultrasonic sensor for obstacle detection and an RGBD camera for real-time position monitoring at 10Hz. Arduino Mega and Raspberry Pi are used for motor control and processing respectively to output the steering angle, which is converted to angular velocity for steering.

Submitted 9 September, 2019; originally announced September 2019.

Comments: 4 pages, 7 figures

arXiv:1709.02343 [pdf, other]

TIPS: Mining Top-K Locations to Minimize User-Inconvenience for Trajectory-Aware Services

Authors: Shubhadip Mitra, Priya Saraf, Arnab Bhattacharya

Abstract: Facility location problems aim to identify the best locations to set up new services. Majority of the existing works typically assume that the users are static. However, there exists a wide array of services such as fuel stations, ATMs, food joints, etc., that are widely accessed by mobile users besides the static ones. Such trajectory-aware services should, therefore, factor in the trajectories of its users rather than simply their static locations. In this work, we introduce the problem of optimal placement of facility locations for such trajectory-aware services that minimize the user inconvenience. The inconvenience of a user is the extra distance traveled by her from her regular path to avail a service. We call this the TIPS problem (Trajectory-aware Inconvenience-minimizing Placement of Services) and consider two variants of it. The goal of the first variant, MAXTIPS, is to minimize the maximum inconvenience faced by any user, while that of the second, AVGTIPS, is to minimize the average inconvenience over all the users. We show that both these problems are NP-hard, and propose multiple efficient heuristics to solve them. Empirical evaluation on real urban-scale road networks validate the efficiency and effectiveness of the proposed heuristics.

Submitted 2 August, 2019; v1 submitted 7 September, 2017; originally announced September 2017.

Journal ref: TKDE, 2019

arXiv:1702.02809 [pdf, other]

NetClus: A Scalable Framework for Locating Top-K Sites for Placement of Trajectory-Aware Services

Authors: Shubhadip Mitra, Priya Saraf, Richa Sharma, Arnab Bhattacharya, Harsh Bhandari, Sayan Ranu

Abstract: Facility location queries identify the best locations to set up new facilities for providing service to its users. Majority of the existing works in this space assume that the user locations are static. Such limitations are too restrictive for planning many modern real-life services such as fuel stations, ATMs, convenience stores, cellphone base-stations, etc. that are widely accessed by mobile users. The placement of such services should, therefore, factor in the mobility patterns or trajectories of the users rather than simply their static locations. In this work, we introduce the TOPS (Trajectory-Aware Optimal Placement of Services) query that locates the best k sites on a road network. The aim is to optimize a wide class of objective functions defined over the user trajectories. We show that the problem is NP-hard and even the greedy heuristic with an approximation bound of (1-1/e) fails to scale on urban-scale datasets. To overcome this challenge, we develop a multi-resolution clustering based indexing framework called NetClus. Empirical studies on real road network trajectory datasets show that NetClus offers solutions that are comparable in terms of quality with those of the greedy heuristic, while having practical response times and low memory footprints. Additionally, the NetClus framework can absorb dynamic updates in mobility patterns, handle constraints such as site-costs and capacity, and existing services, thereby providing an effective solution for modern urban-scale scenarios.

Submitted 12 April, 2017; v1 submitted 9 February, 2017; originally announced February 2017.

Comments: ICDE 2017 poster

ACM Class: H.2.8; H.2.4

arXiv:1604.00033 [pdf, other]

EMBERS at 4 years: Experiences operating an Open Source Indicators Forecasting System

Authors: Sathappan Muthiah, Patrick Butler, Rupinder Paul Khandpur, Parang Saraf, Nathan Self, Alla Rozovskaya, Liang Zhao, Jose Cadena, Chang-Tien Lu, Anil Vullikanti, Achla Marathe, Kristen Summers, Graham Katz, Andy Doyle, Jaime Arredondo, Dipak K. Gupta, David Mares, Naren Ramakrishnan

Abstract: EMBERS is an anticipatory intelligence system forecasting population-level events in multiple countries of Latin America. A deployed system from 2012, EMBERS has been generating alerts 24x7 by ingesting a broad range of data sources including news, blogs, tweets, machine coded events, currency rates, and food prices. In this paper, we describe our experiences operating EMBERS continuously for nearly 4 years, with specific attention to the discoveries it has enabled, correct as well as missed forecasts, and lessons learnt from participating in a forecasting tournament including our perspectives on the limits of forecasting and ethical considerations.

Submitted 31 March, 2016; originally announced April 2016.

Comments: Submitted to a conference

arXiv:1402.7035 [pdf, ps, other]

'Beating the news' with EMBERS: Forecasting Civil Unrest using Open Source Indicators

Authors: Naren Ramakrishnan, Patrick Butler, Sathappan Muthiah, Nathan Self, Rupinder Khandpur, Parang Saraf, Wei Wang, Jose Cadena, Anil Vullikanti, Gizem Korkmaz, Chris Kuhlman, Achla Marathe, Liang Zhao, Ting Hua, Feng Chen, Chang-Tien Lu, Bert Huang, Aravind Srinivasan, Khoa Trinh, Lise Getoor, Graham Katz, Andy Doyle, Chris Ackermann, Ilya Zavorin, Jim Ford, et al. (5 additional authors not shown)

Abstract: We describe the design, implementation, and evaluation of EMBERS, an automated, 24x7 continuous system for forecasting civil unrest across 10 countries of Latin America using open source indicators such as tweets, news sources, blogs, economic indicators, and other data sources. Unlike retrospective studies, EMBERS has been making forecasts into the future since Nov 2012 which have been (and continue to be) evaluated by an independent T&E team (MITRE). Of note, EMBERS has successfully forecast the uptick and downtick of incidents during the June 2013 protests in Brazil. We outline the system architecture of EMBERS, individual models that leverage specific data sources, and a fusion and suppression engine that supports trading off specific evaluation criteria. EMBERS also provides an audit trail interface that enables the investigation of why specific predictions were made along with the data utilized for forecasting. Through numerous evaluations, we demonstrate the superiority of EMBERS over baserate methods and its capability to forecast significant societal happenings.

Submitted 27 February, 2014; v1 submitted 27 February, 2014; originally announced February 2014.

ACM Class: K.4.1; J.4; I.2.7

Showing 1–10 of 10 results for author: Saraf, P