-
Context-Enhanced Language Models for Generating Multi-Paper Citations
Authors:
Avinash Anand,
Kritarth Prasad,
Ujjwal Goel,
Mohit Gupta,
Naman Lal,
Astha Verma,
Rajiv Ratn Shah
Abstract:
Citation text plays a pivotal role in elucidating the connection between scientific documents, demanding an in-depth comprehension of the cited paper. Constructing citations is often time-consuming, requiring researchers to delve into extensive literature and grapple with articulating relevant content. To address this challenge, the field of citation text generation (CTG) has emerged. However, whi…
▽ More
Citation text plays a pivotal role in elucidating the connection between scientific documents, demanding an in-depth comprehension of the cited paper. Constructing citations is often time-consuming, requiring researchers to delve into extensive literature and grapple with articulating relevant content. To address this challenge, the field of citation text generation (CTG) has emerged. However, while earlier methods have primarily centered on creating single-sentence citations, practical scenarios frequently necessitate citing multiple papers within a single paragraph. To bridge this gap, we propose a method that leverages Large Language Models (LLMs) to generate multi-citation sentences. Our approach involves a single source paper and a collection of target papers, culminating in a coherent paragraph containing multi-sentence citation text. Furthermore, we introduce a curated dataset named MCG-S2ORC, composed of English-language academic research papers in Computer Science, showcasing multiple citation instances. In our experiments, we evaluate three LLMs LLaMA, Alpaca, and Vicuna to ascertain the most effective model for this endeavor. Additionally, we exhibit enhanced performance by integrating knowledge graphs from target papers into the prompts for generating citation text. This research underscores the potential of harnessing LLMs for citation generation, opening a compelling avenue for exploring the intricate connections between scientific documents.
△ Less
Submitted 22 April, 2024;
originally announced April 2024.
-
KG-CTG: Citation Generation through Knowledge Graph-guided Large Language Models
Authors:
Avinash Anand,
Mohit Gupta,
Kritarth Prasad,
Ujjwal Goel,
Naman Lal,
Astha Verma,
Rajiv Ratn Shah
Abstract:
Citation Text Generation (CTG) is a task in natural language processing (NLP) that aims to produce text that accurately cites or references a cited document within a source document. In CTG, the generated text draws upon contextual cues from both the source document and the cited paper, ensuring accurate and relevant citation information is provided. Previous work in the field of citation generati…
▽ More
Citation Text Generation (CTG) is a task in natural language processing (NLP) that aims to produce text that accurately cites or references a cited document within a source document. In CTG, the generated text draws upon contextual cues from both the source document and the cited paper, ensuring accurate and relevant citation information is provided. Previous work in the field of citation generation is mainly based on the text summarization of documents. Following this, this paper presents a framework, and a comparative study to demonstrate the use of Large Language Models (LLMs) for the task of citation generation. Also, we have shown the improvement in the results of citation generation by incorporating the knowledge graph relations of the papers in the prompt for the LLM to better learn the relationship between the papers. To assess how well our model is performing, we have used a subset of standard S2ORC dataset, which only consists of computer science academic research papers in the English Language. Vicuna performs best for this task with 14.15 Meteor, 12.88 Rouge-1, 1.52 Rouge-2, and 10.94 Rouge-L. Also, Alpaca performs best, and improves the performance by 36.98% in Rouge-1, and 33.14% in Meteor by including knowledge graphs.
△ Less
Submitted 15 April, 2024;
originally announced April 2024.
-
System to Identify and Elide Superfluous JavaScript Code for Faster Webpage Loads
Authors:
Utkarsh Goel,
Moritz Steiner
Abstract:
Many websites import large JavaScript (JS) libraries to customize and enhance user experiences. Our data shows that many JS libraries are only partially utilized during a page load, and therefore, contain superfluous code that is never executed. Many top-ranked websites contain up to hundreds of kilobytes of compressed superfluous code and a JS resource on a median page contains 31% superfluous co…
▽ More
Many websites import large JavaScript (JS) libraries to customize and enhance user experiences. Our data shows that many JS libraries are only partially utilized during a page load, and therefore, contain superfluous code that is never executed. Many top-ranked websites contain up to hundreds of kilobytes of compressed superfluous code and a JS resource on a median page contains 31% superfluous code. Superfluous JS code inflates the page weight, and thereby, the time to download, parse, and compile a JS resource. It is therefore important to monitor the usage and optimize the payload of JS resources to improve Web performance. However, given that the webpage design and functionality could depend on a user's preferences or device, among many other factors, actively loading webpages in controlled environments cannot cover all possible conditions in which webpage content and functionality changes. In this paper, we show that passive measurement techniques, such as real user monitoring systems (RUM), that monitor the performance of real user page loads under different conditions can be leveraged to identify superfluous code. Using a custom man-in-the-middle proxy (similar to a content delivery network's proxy server), we designed a systematic approach for website developers that relies on pages loaded by real users to passively identify superfluous code on JS resources. We then elide any superfluous code from JS resources before subsequent page load requests. Our data shows that eliding superfluous JS code improves the median page load time by 5% and by at least 10% for pages in the long tail. Through results presented in this paper, we motivate for the need for rigorous monitoring of the usage of JS resources under different real world conditions, with the goal to improve Web performance.
△ Less
Submitted 16 March, 2020;
originally announced March 2020.
-
Web Performance with Android's Battery-Saver Mode
Authors:
Utkarsh Goel,
Stephen Ludin,
Moritz Steiner
Abstract:
A Web browser utilizes a device's CPU to parse HTML, build a Document Object Model, a Cascading Style Sheets Object Model, and render trees, and parse, compile, and execute computationally-heavy JavaScript. A powerful CPU is required to perform these tasks as quickly as possible and provide the user with a great experience. However, increased CPU performance comes with increased power consumption…
▽ More
A Web browser utilizes a device's CPU to parse HTML, build a Document Object Model, a Cascading Style Sheets Object Model, and render trees, and parse, compile, and execute computationally-heavy JavaScript. A powerful CPU is required to perform these tasks as quickly as possible and provide the user with a great experience. However, increased CPU performance comes with increased power consumption and reduced battery life on mobile devices. As an option to extend battery life, Android offers a battery-saver mode that when activated, turns off the power-hungry and faster processor cores and turns on the battery-conserving and slower processor cores on the device. The transition from using faster processor cores to using slower processor cores throttles the CPU clock speed on the device, and therefore impacts the webpage load process. We utilize a large-scale data-set collected by a real user monitoring system of a major content delivery network to investigate the impact of Android's battery-saver mode on various mobile Web performance metrics. Our analysis suggests that users of select smartphones of Huawei and Sony experience a sudden or gradual degradation in Web performance when battery-saver mode is active. Battery-saver mode on newer flagship smartphones, however, does not impact the mobile Web performance. Finally, we encourage for new website design goals that treat slow (and throttled-CPU) devices kindly in favor of improving end-user experience and suggest that Web performance measurements should be aware of user device battery charge levels to correctly associate Web performance.
△ Less
Submitted 13 March, 2020;
originally announced March 2020.
-
Domain-Sharding for Faster HTTP/2 in Lossy Cellular Networks
Authors:
Utkarsh Goel,
Moritz Steiner,
Mike P. Wittie,
Stephen Ludin,
Martin Flack
Abstract:
HTTP/2 (h2) is a new standard for Web communications that already delivers a large share of Web traffic. Unlike HTTP/1, h2 uses only one underlying TCP connection. In a cellular network with high loss and sudden spikes in latency, which the TCP stack might interpret as loss, using a single TCP connection can negatively impact Web performance. In this paper, we perform an extensive analysis of real…
▽ More
HTTP/2 (h2) is a new standard for Web communications that already delivers a large share of Web traffic. Unlike HTTP/1, h2 uses only one underlying TCP connection. In a cellular network with high loss and sudden spikes in latency, which the TCP stack might interpret as loss, using a single TCP connection can negatively impact Web performance. In this paper, we perform an extensive analysis of real world cellular network traffic and design a testbed to emulate loss characteristics in cellular networks. We use the emulated cellular network to measure h2 performance in comparison to HTTP/1.1, for webpages synthesized from HTTP Archive repository data.
Our results show that, in lossy conditions, h2 achieves faster page load times (PLTs) for webpages with small objects. For webpages with large objects, h2 degrades the PLT. We devise a new domain-sharding technique that isolates large and small object downloads on separate connections. Using sharding, we show that under lossy cellular conditions, h2 over multiple connections improves the PLT compared to h2 with one connection and HTTP/1.1 with six connections. Finally, we recommend content providers and content delivery networks to apply h2-aware domain-sharding on webpages currently served over h2 for improved mobile Web performance.
△ Less
Submitted 18 July, 2017;
originally announced July 2017.
-
Survey of End-to-End Mobile Network Measurement Testbeds, Tools, and Services
Authors:
Utkarsh Goel,
Mike P. Wittie,
Kimberly C. Claffy,
Andrew Le
Abstract:
Mobile (cellular) networks enable innovation, but can also stifle it and lead to user frustration when network performance falls below expectations. As mobile networks become the predominant method of Internet access, developer, research, network operator, and regulatory communities have taken an increased interest in measuring end-to-end mobile network performance to, among other goals, minimize…
▽ More
Mobile (cellular) networks enable innovation, but can also stifle it and lead to user frustration when network performance falls below expectations. As mobile networks become the predominant method of Internet access, developer, research, network operator, and regulatory communities have taken an increased interest in measuring end-to-end mobile network performance to, among other goals, minimize negative impact on application responsiveness. In this survey we examine current approaches to end-to-end mobile network performance measurement, diagnosis, and application prototyping. We compare available tools and their shortcomings with respect to the needs of researchers, developers, regulators, and the public. We intend for this survey to provide a comprehensive view of currently active efforts and some auspicious directions for future work in mobile network measurement and mobile application performance evaluation.
△ Less
Submitted 18 May, 2015; v1 submitted 18 November, 2014;
originally announced November 2014.