-
Horizontal norm compatibility of cohomology classes for $\mathrm{GSp}_{6}$
Authors:
Syed Waqar Ali Shah
Abstract:
We establish abstract horizontal norm relations involving the unramified Hecke-Frobenius polynomials that correspond under the Satake isomorphism to the degree eight spinor $L$-factors of $ \mathrm{GSp}_{6} $. These relations apply to classes in the degree seven motivic cohomology of the Siegel modular sixfold obtained via Gysin pushforwards of Beilinson's Eisenstein symbol pulled back on one copy in a triple product of modular curves. The proof is based on a novel approach that circumvents the failure of the so-called multiplicity one hypothesis in our setting, which precludes the applicability of an existing technique. In a sequel, we combine our result with the previously established vertical norm relations for these classes to obtain new Euler systems for the eight dimensional Galois representations associated with certain non-endoscopic cohomological cuspidal automorphic representations of $ \mathrm{GSp}_{6} $.
Submitted 5 September, 2024;
originally announced September 2024.
-
On constructing zeta elements for Shimura varieties
Authors:
Syed Waqar Ali Shah
Abstract:
We present a novel axiomatic framework for establishing horizontal norm relations in Euler systems that are built from pushforwards of classes in the motivic cohomology of Shimura varieties. This framework is uniformly applicable to the Euler systems of both algebraic cycles and Eisenstein classes. It also applies to non-spherical pairs of groups that fail to satisfy a local multiplicity one hypothesis, and thus lie beyond the reach of existing methods. A key application of this work is the construction of an Euler system for the spinor Galois representations arising in the cohomology of Siegel modular varieties of genus three, which is undertaken in two companion articles.
Submitted 5 September, 2024;
originally announced September 2024.
-
Explicit Differentiable Slicing and Global Deformation for Cardiac Mesh Reconstruction
Authors:
Yihao Luo,
Dario Sesia,
Fanwen Wang,
Yinzhe Wu,
Wenhao Ding,
Jiahao Huang,
Fadong Shi,
Anoop Shah,
Amit Kaural,
Jamil Mayet,
Guang Yang,
ChoonHwai Yap
Abstract:
Mesh reconstruction of the cardiac anatomy from medical images is useful for shape and motion measurements and biophysics simulations to facilitate the assessment of cardiac function and health. However, 3D medical images are often acquired as 2D slices that are sparsely sampled and noisy, and mesh reconstruction on such data is a challenging task. Traditional voxel-based approaches rely on pre- and post-processing that compromises image fidelity, while mesh-level deep learning approaches require mesh annotations that are difficult to get. Therefore, direct cross-domain supervision from 2D images to meshes is a key technique for advancing 3D learning in medical imaging, but it has not been well-developed. While there have been attempts to approximate the optimized meshes' slicing, few existing methods directly use 2D slices to supervise mesh reconstruction in a differentiable manner. Here, we propose a novel explicit differentiable voxelization and slicing (DVS) algorithm that allows gradient backpropagation to a mesh from its slices, facilitating refined mesh optimization directly supervised by the losses defined on 2D images. Further, we propose an innovative framework for extracting patient-specific left ventricle (LV) meshes from medical images by coupling DVS with a graph harmonic deformation (GHD) mesh morphing descriptor of cardiac shape that naturally preserves mesh quality and smoothness during optimization. Experimental results demonstrate that our method achieves state-of-the-art performance in cardiac mesh reconstruction tasks from CT and MRI, with an overall Dice score of 90% on multi-datasets, outperforming existing approaches. The proposed method can further quantify clinically useful parameters such as ejection fraction and global myocardial strains, closely matching the ground truth and surpassing the traditional voxel-based approach in sparse images.
Submitted 3 September, 2024;
originally announced September 2024.
-
Casting Light on Degeneracies: A Comprehensive Study of Lightcurve Variations in Microlensing Events OGLE-2017-BLG-0103 and OGLE-2017-BLG-0192
Authors:
Sarang Shah
Abstract:
This study investigates orbital parallax in gravitational microlensing events, focusing on OGLE-2017-BLG-0103 and OGLE-2017-BLG-0192. For events with timescales $\leq$ 60 days, a Jerk Parallax degeneracy arises due to high Jerk velocity ($\tilde{v_{j}}$), causing a four-fold continuous parallax degeneracy. OGLE-2017-BLG-0103, after incorporating orbital parallax, reveals four discrete degenerate parallax solutions, while OGLE-2017-BLG-0192 exhibits four discrete solutions without degeneracy. The asymmetric light curve of OGLE-2017-BLG-0103 suggests a more probable model where xallarap is added to the parallax model, introducing tension. The galactic model analysis predicts a very low mass stellar lens for OGLE-2017-BLG-0192. For OGLE-2017-BLG-0103, degenerate solutions suggest a low-mass star or a darker lens in the disc, while the Xallarap+Parallax model also predicts a stellar lens in the bulge, with the source being a solar-type star orbited by a dwarf star. This study presents five degenerate solutions for OGLE-2017-BLG-0103, emphasizing the potential for confirmation through high-resolution Adaptive Optics (AO) observations with Extremely Large Telescopes in the future. The complexities of degenerate scenarios in these microlensing events underscore the need to analyze special single-lens events in the Roman Telescope Era.
Submitted 29 August, 2024; v1 submitted 28 August, 2024;
originally announced August 2024.
-
CR-Enabled NOMA Integrated Non-Terrestrial IoT Networks with Transmissive RIS
Authors:
Wali Ullah Khan,
Zain Ali,
Asad Mahmood,
Eva Lagunas,
Syed Tariq Shah,
Symeon Chatzinotas
Abstract:
This work proposes a T-RIS-equipped LEO satellite communication in cognitive radio-enabled integrated NTNs. In the proposed system, a GEO satellite operates as a primary network, and a T-RIS-equipped LEO satellite operates as a secondary IoT network. The objective is to maximize the sum rate of T-RIS-equipped LEO satellite communication using downlink NOMA while ensuring the service quality of GEO cellular users. Our framework simultaneously optimizes the total transmit power of LEO, NOMA power allocation for LEO IoT (LIoT) and T-RIS phase shift design subject to the service quality of LIoT and interference temperature to the primary GEO network. To solve the non-convex sum rate maximization problem, we first adopt successive convex approximations to reduce the complexity of the formulated optimization. Then, we divide the problem into two parts, i.e., power allocation of LEO and phase shift design of T-RIS. The power allocation problem is solved using KKT conditions, while the phase shift problem is handled by Taylor approximation and semidefinite programming. Numerical results are provided to validate the proposed optimization framework.
Submitted 27 August, 2024;
originally announced August 2024.
-
On the origins of reverse Janssen effect
Authors:
Srujal Shah,
Ana Maria Mosquera Gomez,
Payman Jalali,
Lou Kondic
Abstract:
We consider experimentally and computationally the phenomenon of the reverse Janssen effect, involving the counterintuitive finding that the force on the base of a column containing granular particles may be larger than the weight of the granular material itself. This finding is in contrast to the common Janssen effect, for which the force on the base is smaller than the particle weight, illustrating one of the best-known differences between granular and liquid systems. We find that the reverse Janssen effect is strongly influenced by the pouring protocol: under Earth's gravitational field, and for pouring heights that are measured in tens of particle diameters, a dynamic reverse Janssen effect is found. The dynamic reverse Janssen effect is an order of magnitude stronger than the static one, which is found for small pouring heights. This differentiation between static and dynamic effects allows for the development of a better understanding of the general features of the reverse Janssen effect, and of the comparison between experiments and simulations reported in this and previous works.
Submitted 19 August, 2024;
originally announced August 2024.
-
A semi-centralized multi-agent RL framework for efficient irrigation scheduling
Authors:
Bernard T. Agyeman,
Benjamin Decard-Nelson,
Jinfeng Liu,
Sirish L. Shah
Abstract:
This paper proposes a Semi-Centralized Multi-Agent Reinforcement Learning (SCMARL) approach for irrigation scheduling in spatially variable agricultural fields, where management zones address spatial variability. The SCMARL framework is hierarchical in nature, with a centralized coordinator agent at the top level and decentralized local agents at the second level. The coordinator agent makes daily binary irrigation decisions based on field-wide conditions, which are communicated to the local agents. Local agents determine appropriate irrigation amounts for specific management zones using local conditions. The framework employs a state augmentation approach to handle non-stationarity in the local agents' environments. An extensive evaluation on a large-scale field in Lethbridge, Canada, compares the SCMARL approach with a learning-based multi-agent model predictive control scheduling approach, highlighting its enhanced performance, resulting in water conservation and improved Irrigation Water Use Efficiency (IWUE). Notably, the proposed approach achieved 4.0% savings in irrigation water while enhancing the IWUE by 6.3%.
Submitted 15 August, 2024;
originally announced August 2024.
-
Quantum Buffer Design Using Petri Nets
Authors:
Syed Asad Shah,
A. Yavuz Oruç
Abstract:
This paper introduces a simplified quantum Petri net (QPN) model and uses this model to generalize classical SISO, SIMO, MISO, MIMO and priority buffers to their quantum counterparts. It provides a primitive storage element, namely a quantum S-R flip-flop design using quantum CNOT and SWAP gates that can be replicated to obtain a quantum register for any given number of qubits. The aforementioned quantum buffers are then obtained using the simplified QPN model and quantum registers. The quantum S-R flip-flop and quantum buffer designs have been tested using OpenQASM and Qiskit on IBM quantum computers and simulators, and the results validate the presented quantum S-R flip-flop and buffer designs.
Submitted 15 August, 2024;
originally announced August 2024.
-
A Partial Near-infrared Guide Star Catalog for Thirty Meter Telescope Operations
Authors:
Sarang Shah,
Smitha Subramanian,
Avinash C. K.,
David R. Andersen,
Warren Skidmore,
G. C. Anupama,
Francisco Delgado,
Kim Gillies,
Maheshwar Gopinathan,
A. N. Ramaprakash,
B. E. Reddy,
T. Sivarani,
Annapurni Subramaniam
Abstract:
At first light, the Thirty Meter Telescope (TMT) near-infrared (NIR) instruments will be fed by a multiconjugate adaptive optics instrument known as the Narrow Field Infrared Adaptive Optics System (NFIRAOS). NFIRAOS will use six laser guide stars to sense atmospheric turbulence in a volume corresponding to a field of view of 2', but natural guide stars (NGSs) will be required to sense tip/tilt and focus. To achieve high sky coverage (50% at the north Galactic pole), the NFIRAOS client instruments use NIR on-instrument wavefront sensors that take advantage of the sharpening of the stars by NFIRAOS. A catalog of guide stars with NIR magnitudes as faint as 22 mag in the J band (Vega system), covering the TMT-observable sky, will be a critical resource for the efficient operation of NFIRAOS, and no such catalog currently exists. Hence, it is essential to develop such a catalog by computing the expected NIR magnitudes of stellar sources identified in deep optical sky surveys using their optical magnitudes. This paper discusses the generation of a partial NIR Guide Star Catalog (IRGSC), similar to the final IRGSC for TMT operations. The partial catalog is generated by applying stellar atmospheric models to the optical data of stellar sources from the Panoramic Survey Telescope and Rapid Response System (Pan-STARRS) and then computing their expected NIR magnitudes. We validated the computed NIR magnitudes of the sources in some fields by using the available NIR data for those fields. We identified the remaining challenges of this approach. We outlined the path for producing the final IRGSC using the Pan-STARRS data. The Python code that generates the IRGSC is named irgsctool; it produces a list of NGSs for a field using optical data from the Pan-STARRS 3pi survey, and also a list of NGSs with observed NIR data from the UKIRT Infrared Deep Sky Survey where available.
Submitted 15 August, 2024;
originally announced August 2024.
-
The R-Process Alliance: Fifth Data Release from the Search for R-Process-Enhanced Metal-poor Stars in the Galactic Halo with the GTC
Authors:
Avrajit Bandyopadhyay,
Rana Ezzeddine,
Carlos Allende Prieto,
Nima Aria,
Shivani P. Shah,
Timothy C. Beers,
Anna Frebel,
Terese T. Hansen,
Erika M. Holmbeck,
Vinicius M. Placco,
Ian U. Roederer,
Charli M. Sakari
Abstract:
Understanding the abundance pattern of metal-poor stars and the production of heavy elements through various nucleosynthesis processes offers crucial insights into the chemical evolution of the Milky Way, revealing primary sites and major sources of rapid neutron-capture process ($r$-process) material in the Universe. In this fifth data release from the $R$-Process Alliance, we present the detailed chemical abundances of 41 faint (down to V = 15.8) and extremely metal-poor (down to [Fe/H] = -3.3) halo stars selected from the R-Process Alliance (RPA). We obtained high-resolution spectra for these objects with the HORuS spectrograph on the Gran Telescopio Canarias. We measure the abundances of light, alpha, Fe-peak, and neutron-capture elements. We report the discovery of five CEMP, one limited-$r$, three $r$-I, and four $r$-II stars, and six Mg-poor stars. We also identify one star of a possible globular cluster origin at an extremely low metallicity at [Fe/H] = -3.0. This adds to the growing evidence of a lower limit metallicity floor for globular cluster abundances. We use the abundances of Fe-peak elements and the alpha-elements to investigate the contributions from different nucleosynthesis channels in the progenitor supernovae. We find the distribution of [Mg/Eu] as a function of [Fe/H] to have different enrichment levels, indicating different possible pathways and sites of their production. We also reveal differences in the trends of the neutron-capture element abundances of Sr, Ba, and Eu of various $r$-I and $r$-II stars from the RPA data releases, which provide constraints on their nucleosynthesis sites and subsequent evolution.
Submitted 22 August, 2024; v1 submitted 7 August, 2024;
originally announced August 2024.
-
Algebraicity of Spin $L$-functions for $\mathrm{GSp}_6$
Authors:
Ellen Eischen,
Giovanni Rosso,
Shrenik Shah
Abstract:
We prove algebraicity of critical values of certain Spin $L$-functions. More precisely, our results concern $L(s, π\otimes χ, \mathrm{Spin})$ for cuspidal automorphic representations $π$ associated to a holomorphic Siegel eigenform on $\mathrm{GSp}_6$, Dirichlet characters $χ$, and critical points $s$ to the right of the center of symmetry. We use the strategy of relating the $L$-values to properties of Eisenstein series, and a significant portion of the paper concerns the Fourier coefficients of these Eisenstein series. Unlike in prior algebraicity results following this strategy, our Eisenstein series are on a group $G$ that has no known moduli problem, and the $L$-functions are related to the Eisenstein series through a non-unique model.
Submitted 6 August, 2024;
originally announced August 2024.
-
Apple Intelligence Foundation Language Models
Authors:
Tom Gunter,
Zirui Wang,
Chong Wang,
Ruoming Pang,
Andy Narayanan,
Aonan Zhang,
Bowen Zhang,
Chen Chen,
Chung-Cheng Chiu,
David Qiu,
Deepak Gopinath,
Dian Ang Yap,
Dong Yin,
Feng Nan,
Floris Weers,
Guoli Yin,
Haoshuo Huang,
Jianyu Wang,
Jiarui Lu,
John Peebles,
Ke Ye,
Mark Lee,
Nan Du,
Qibin Chen,
Quentin Keunebroek
, et al. (130 additional authors not shown)
Abstract:
We present foundation language models developed to power Apple Intelligence features, including a ~3 billion parameter model designed to run efficiently on devices and a large server-based language model designed for Private Cloud Compute. These models are designed to perform a wide range of tasks efficiently, accurately, and responsibly. This report describes the model architecture, the data used to train the model, the training process, how the models are optimized for inference, and the evaluation results. We highlight our focus on Responsible AI and how the principles are applied throughout the model development.
Submitted 29 July, 2024;
originally announced July 2024.
-
A Fully Open-Source End-to-End Private 5G Network over Unlicensed Frequency Bands
Authors:
Faycal Bouhafs,
Sayed Amir Hoseini,
Syed Danial Ali Shah,
Frank den Hartog
Abstract:
The fifth generation of mobile networks (5G) represents the latest development in mobile communications. It has been designed to support several types of data traffic and to meet more performance requirements than ever before. These characteristics make 5G very attractive for current but also novel public and private industries and services. However, because of coverage, regulatory, business, and security reasons, many of these novel applications can only be deployed as part of a private network. The cost of licensed frequencies makes such an approach prohibitive for many stakeholders, and therefore unlicensed frequency bands represent a more affordable option. Even so, private 5G networks for use in globally unlicensed frequency bands do not yet exist. In this paper we present the first end-to-end private 5G network operating in a globally unlicensed frequency band, using general purpose computers, open-source software and software-defined radio. We demonstrate that it works and show that the choice of the hardware can significantly affect the performance of the network.
Submitted 30 July, 2024;
originally announced July 2024.
-
CodedVO: Coded Visual Odometry
Authors:
Sachin Shah,
Naitri Rajyaguru,
Chahat Deep Singh,
Christopher Metzler,
Yiannis Aloimonos
Abstract:
Autonomous robots often rely on monocular cameras for odometry estimation and navigation. However, the scale ambiguity problem presents a critical barrier to effective monocular visual odometry. In this paper, we present CodedVO, a novel monocular visual odometry method that overcomes the scale ambiguity problem by employing custom optics to physically encode metric depth information into imagery. By incorporating this information into our odometry pipeline, we achieve state-of-the-art performance in monocular visual odometry with a known scale. We evaluate our method in diverse indoor environments and demonstrate its robustness and adaptability. We achieve a 0.08m average trajectory error in odometry evaluation on the ICL-NUIM indoor odometry dataset.
Submitted 25 July, 2024;
originally announced July 2024.
-
Harnessing DRL for URLLC in Open RAN: A Trade-off Exploration
Authors:
Rana Muhammad Sohaib,
Syed Tariq Shah,
Oluwakayode Onireti,
Muhammad Ali Imran
Abstract:
The advent of Ultra-Reliable Low Latency Communication (URLLC) alongside the emergence of Open RAN (ORAN) architectures presents unprecedented challenges and opportunities in Radio Resource Management (RRM) for next-generation communication systems. This paper presents a comprehensive trade-off analysis of Deep Reinforcement Learning (DRL) approaches designed to enhance URLLC performance within ORAN's flexible and dynamic framework. By investigating various DRL strategies for optimising RRM parameters, we explore the intricate balance between reliability, latency, and the newfound adaptability afforded by ORAN principles. Through extensive simulation results, our study compares the efficacy of different DRL models in achieving URLLC objectives in an ORAN context, highlighting the potential of DRL to navigate the complexities introduced by ORAN. The proposed study provides valuable insights into the practical implementation of DRL-based RRM solutions in ORAN-enabled wireless networks. It sheds light on the benefits and challenges of integrating DRL and ORAN for URLLC enhancements. Our findings contribute to the ongoing discourse on advancements in URLLC and ORAN, offering a roadmap for future research to pursue efficient, reliable, and flexible communication systems.
Submitted 24 July, 2024;
originally announced July 2024.
-
Swift-BAT GUANO follow-up of gravitational-wave triggers in the third LIGO-Virgo-KAGRA observing run
Authors:
Gayathri Raman,
Samuele Ronchini,
James Delaunay,
Aaron Tohuvavohu,
Jamie A. Kennea,
Tyler Parsotan,
Elena Ambrosi,
Maria Grazia Bernardini,
Sergio Campana,
Giancarlo Cusumano,
Antonino D'Ai,
Paolo D'Avanzo,
Valerio D'Elia,
Massimiliano De Pasquale,
Simone Dichiara,
Phil Evans,
Dieter Hartmann,
Paul Kuin,
Andrea Melandri,
Paul O'Brien,
Julian P. Osborne,
Kim Page,
David M. Palmer,
Boris Sbarufatti,
Gianpiero Tagliaferri
, et al. (1797 additional authors not shown)
Abstract:
We present results from a search for X-ray/gamma-ray counterparts of gravitational-wave (GW) candidates from the third observing run (O3) of the LIGO-Virgo-KAGRA (LVK) network using the Swift Burst Alert Telescope (Swift-BAT). The search includes 636 GW candidates received in low latency, 86 of which have been confirmed by the offline analysis and included in the third cumulative Gravitational-Wave Transient Catalog (GWTC-3). Targeted searches were carried out on the entire GW sample using the maximum-likelihood NITRATES pipeline on the BAT data made available via the GUANO infrastructure. We do not detect any significant electromagnetic emission that is temporally and spatially coincident with any of the GW candidates. We report flux upper limits in the 15-350 keV band as a function of sky position for all the catalog candidates. For GW candidates where the Swift-BAT false alarm rate is less than 10$^{-3}$ Hz, we compute the GW-BAT joint false alarm rate. Finally, the derived Swift-BAT upper limits are used to infer constraints on the putative electromagnetic emission associated with binary black hole mergers.
Submitted 13 July, 2024;
originally announced July 2024.
-
Green Resource Allocation in Cloud-Native O-RAN Enabled Small Cell Networks
Authors:
Rana M. Sohaib,
Syed Tariq Shah,
Oluwakayode Onireti,
Yusuf Sambo,
M. A. Imran
Abstract:
In the rapidly evolving landscape of 5G and beyond, cloud-native Open Radio Access Networks (O-RAN) present a paradigm shift towards intelligent, flexible, and sustainable network operations. This study addresses the intricate challenge of energy-efficient (EE) resource allocation that services both enhanced Mobile Broadband (eMBB) and ultra-reliable low-latency communications (URLLC) users. We propose a novel distributed learning framework leveraging on-policy and off-policy transfer learning strategies within a deep reinforcement learning (DRL)-based model to facilitate online resource allocation decisions under different channel conditions. The simulation results demonstrate the efficacy of the proposed method, which rapidly adapts to dynamic network states, thereby achieving a green resource allocation.
Submitted 16 July, 2024;
originally announced July 2024.
-
DRL-based Joint Resource Scheduling of eMBB and URLLC in O-RAN
Authors:
Rana M. Sohaib,
Syed Tariq Shah,
Oluwakayode Onireti,
Yusuf Sambo,
Qammer H. Abbasi,
M. A. Imran
Abstract:
This work addresses resource allocation challenges in multi-cell wireless systems catering to enhanced Mobile Broadband (eMBB) and Ultra-Reliable Low Latency Communications (URLLC) users. We present a distributed learning framework tailored to O-RAN network architectures. Leveraging a Thompson sampling-based Deep Reinforcement Learning (DRL) algorithm, our approach provides real-time resource allocation decisions, aligning with evolving network structures. The proposed approach facilitates online decision-making for resource allocation by deploying trained execution agents at Near-Real Time Radio Access Network Intelligent Controllers (Near-RT RICs) located at network edges. Simulation results demonstrate the algorithm's effectiveness in meeting Quality of Service (QoS) requirements for both eMBB and URLLC users, offering insights into optimising resource utilisation in dynamic wireless environments.
Submitted 16 July, 2024;
originally announced July 2024.
-
LokiLM: Technical Report
Authors:
Justin Kiefel,
Shrey Shah
Abstract:
In this work, we introduce LokiLM, a 1.4B parameter large language model trained on 500B tokens. Our model performs strongly in natural language reasoning tasks and achieves state-of-the-art performance among models with 1.5B parameters or less. LokiLM is trained using multi-teacher knowledge distillation and high-quality training data to achieve benchmark results competitive with larger models trained on significantly more tokens. We support these findings by introducing steps to avoid benchmark contamination and overfitting throughout our development process. Despite its promising performance, LokiLM exhibits a concerning amount of hallucinations and scores poorly on the TruthfulQA benchmark, so we do not release the model publicly.
Submitted 10 July, 2024;
originally announced July 2024.
-
DailyDVS-200: A Comprehensive Benchmark Dataset for Event-Based Action Recognition
Authors:
Qi Wang,
Zhou Xu,
Yuming Lin,
Jingtao Ye,
Hongsheng Li,
Guangming Zhu,
Syed Afaq Ali Shah,
Mohammed Bennamoun,
Liang Zhang
Abstract:
Neuromorphic sensors, specifically event cameras, revolutionize visual data acquisition by capturing pixel intensity changes with exceptional dynamic range, minimal latency, and energy efficiency, setting them apart from conventional frame-based cameras. The distinctive capabilities of event cameras have ignited significant interest in the domain of event-based action recognition, recognizing their vast potential for advancement. However, the development in this field is currently slowed by the lack of comprehensive, large-scale datasets, which are critical for developing robust recognition frameworks. To bridge this gap, we introduce DailyDVS-200, a meticulously curated benchmark dataset tailored for the event-based action recognition community. DailyDVS-200 is extensive, covering 200 action categories across real-world scenarios, recorded by 47 participants, and comprises more than 22,000 event sequences. This dataset is designed to reflect a broad spectrum of action types, scene complexities, and data acquisition diversity. Each sequence in the dataset is annotated with 14 attributes, ensuring a detailed characterization of the recorded actions. Moreover, DailyDVS-200 is structured to facilitate a wide range of research paths, offering a solid foundation for both validating existing approaches and inspiring novel methodologies. By setting a new benchmark in the field, we challenge the current limitations of neuromorphic data processing and invite a surge of new approaches in event-based action recognition techniques, which paves the way for future explorations in neuromorphic computing and beyond. The dataset and source code are available at https://github.com/QiWang233/DailyDVS-200.
Submitted 13 July, 2024; v1 submitted 6 July, 2024;
originally announced July 2024.
-
Quantum interrogation using weak value measurement
Authors:
Muhammad Abdullah Ijaz,
Syed Bilal Hyder Shah,
Muhammad Sabieh Anwar
Abstract:
We propose a scheme for quantum interrogation measurements using constructive interference and post-selection to achieve single-pass high-efficiency detection for imperfect or semi-transparent absorbers. We illustrate that our method works for heralded single-photon as well as weak attenuated sources. We also study the influence of error from our equipment and show that post-selection renders robustness to our scheme against noise. We further demonstrate that with a small extension, we can quantify the transmittance of the imperfect absorber by using the process of weak value amplification (WVA).
Submitted 1 July, 2024;
originally announced July 2024.
-
Development of Cognitive Intelligence in Pre-trained Language Models
Authors:
Raj Sanjay Shah,
Khushi Bhardwaj,
Sashank Varma
Abstract:
Recent studies show evidence for emergent cognitive abilities in Large Pre-trained Language Models (PLMs). The increasing cognitive alignment of these models has made them candidates for cognitive science theories. Prior research into the emergent cognitive abilities of PLMs has largely been path-independent to model training, i.e., has focused on the final model weights and not the intermediate steps. However, building plausible models of human cognition using PLMs would benefit from considering the developmental alignment of their performance during training to the trajectories of children's thinking. Guided by psychometric tests of human intelligence, we choose four sets of tasks to investigate the alignment of ten popular families of PLMs and evaluate their available intermediate and final training steps. These tasks are Numerical ability, Linguistic abilities, Conceptual understanding, and Fluid reasoning. We find a striking regularity: regardless of model size, the developmental trajectories of PLMs consistently exhibit a window of maximal alignment to human cognitive development. Before that window, training appears to endow "blank slate" models with the requisite structure to be poised to rapidly learn from experience. After that window, training appears to serve the engineering goal of reducing loss but not the scientific goal of increasing alignment with human cognition.
Submitted 12 July, 2024; v1 submitted 1 July, 2024;
originally announced July 2024.
-
Transformer-based segmentation of adnexal lesions and ovarian implants in CT images
Authors:
Aneesh Rangnekar,
Kevin M. Boehm,
Emily A. Aherne,
Ines Nikolovski,
Natalie Gangai,
Ying Liu,
Dimitry Zamarin,
Kara L. Roche,
Sohrab P. Shah,
Yulia Lakhman,
Harini Veeraraghavan
Abstract:
Two self-supervised pretrained transformer-based segmentation models (SMIT and Swin UNETR) fine-tuned on a dataset of ovarian cancer CT images provided reasonably accurate delineations of the tumors in an independent test dataset. Tumors in the adnexa were segmented more accurately by both transformers (SMIT and Swin UNETR) than the omental implants. AI-assisted labeling performed on 72 out of 245 omental implants resulted in smaller manual editing effort of 39.55 mm compared to full manual correction of partial labels of 106.49 mm and resulted in overall improved accuracy performance. Both SMIT and Swin UNETR did not generate any false detection of omental metastases in the urinary bladder and relatively few false detections in the small bowel, with 2.16 cc on average for SMIT and 7.37 cc for Swin UNETR respectively.
Submitted 25 June, 2024;
originally announced June 2024.
-
LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing
Authors:
Jiangshu Du,
Yibo Wang,
Wenting Zhao,
Zhongfen Deng,
Shuaiqi Liu,
Renze Lou,
Henry Peng Zou,
Pranav Narayanan Venkit,
Nan Zhang,
Mukund Srinath,
Haoran Ranran Zhang,
Vipul Gupta,
Yinghui Li,
Tao Li,
Fei Wang,
Qin Liu,
Tianlin Liu,
Pengzhi Gao,
Congying Xia,
Chen Xing,
Jiayang Cheng,
Zhaowei Wang,
Ying Su,
Raj Sanjay Shah,
Ruohao Guo
, et al. (15 additional authors not shown)
Abstract:
This work is motivated by two key trends. On one hand, large language models (LLMs) have shown remarkable versatility in various generative tasks such as writing, drawing, and question answering, significantly reducing the time required for many routine tasks. On the other hand, researchers, whose work is not only time-consuming but also highly expertise-demanding, face increasing challenges as they have to spend more time reading, writing, and reviewing papers. This raises the question: how can LLMs potentially assist researchers in alleviating their heavy workload?
This study focuses on the topic of LLMs assisting NLP researchers, particularly examining the effectiveness of LLMs in assisting paper (meta-)reviewing and its recognizability. To address this, we constructed the ReviewCritique dataset, which includes two types of information: (i) NLP papers (initial submissions rather than camera-ready) with both human-written and LLM-generated reviews, and (ii) each review comes with "deficiency" labels and corresponding explanations for individual segments, annotated by experts. Using ReviewCritique, this study explores two threads of research questions: (i) "LLMs as Reviewers", how do reviews generated by LLMs compare with those written by humans in terms of quality and distinguishability? (ii) "LLMs as Metareviewers", how effectively can LLMs identify potential issues, such as deficient or unprofessional review segments, within individual paper reviews? To our knowledge, this is the first work to provide such a comprehensive analysis.
Submitted 25 June, 2024; v1 submitted 23 June, 2024;
originally announced June 2024.
-
Finite Alphabet Fast List Decoders for Polar Codes
Authors:
Syed Aizaz Ali Shah,
Gerhard Bauch
Abstract:
The so-called fast polar decoding schedules are meant to improve the decoding speed of the sequential-natured successive cancellation list decoders. The decoding speedup is achieved by replacing various parts of the serial decoding process with efficient special-purpose decoder nodes. This work incorporates the fast decoding schedules for polar codes into their quantized finite alphabet decoding. In a finite alphabet successive cancellation list decoder, the log-likelihood ratio computations are replaced with lookup operations on low-resolution integer messages. The lookup tables are designed using the information bottleneck method. It is shown that the finite alphabet decoders can also leverage the special decoder nodes found in the literature. Besides their inherent decoding speed improvement, the use of these special decoder nodes drastically reduces the number of lookup tables required to perform the finite alphabet decoding. In order to perform quantized decoding using lookup operations, the proposed decoders require up to 93% fewer unique lookup tables than the ones that use the conventional successive cancellation schedule. Moreover, the proposed decoders exhibit negligible loss in error correction performance without necessitating alterations to the lookup table design process.
Submitted 20 June, 2024;
originally announced June 2024.
-
From Intentions to Techniques: A Comprehensive Taxonomy and Challenges in Text Watermarking for Large Language Models
Authors:
Harsh Nishant Lalai,
Aashish Anantha Ramakrishnan,
Raj Sanjay Shah,
Dongwon Lee
Abstract:
With the rapid growth of Large Language Models (LLMs), safeguarding textual content against unauthorized use is crucial. Text watermarking offers a vital solution, protecting both LLM-generated and plain text sources. This paper presents a unified overview of different perspectives behind designing watermarking techniques, through a comprehensive survey of the research literature. Our work has two key advantages: (1) we analyze research based on the specific intentions behind different watermarking techniques, evaluation datasets used, and watermarking addition and removal methods to construct a cohesive taxonomy; (2) we highlight the gaps and open challenges in text watermarking to promote research in protecting text authorship. This extensive coverage and detailed analysis set our work apart, offering valuable insights into the evolving landscape of text watermarking in language models.
Submitted 16 June, 2024;
originally announced June 2024.
-
Eye in the Sky: Detection and Compliance Monitoring of Brick Kilns using Satellite Imagery
Authors:
Rishabh Mondal,
Shataxi Dubey,
Vannsh Jani,
Shrimay Shah,
Suraj Jaiswal,
Zeel B Patel,
Nipun Batra
Abstract:
Air pollution kills 7 million people annually. The brick manufacturing industry accounts for 8%-14% of air pollution in the densely populated Indo-Gangetic plain. Due to the unorganized nature of brick kilns, policy violation detection, such as proximity to human habitats, remains challenging. While previous studies have utilized computer vision-based machine learning methods for brick kiln detection from satellite imagery, they utilize proprietary satellite data and rarely focus on compliance with government policies. In this research, we introduce a scalable framework for brick kiln detection and automatic compliance monitoring. We use the Google Maps Static API to download the satellite imagery, followed by the YOLOv8x model for detection. We identified and hand-verified 19579 new brick kilns across 9 states within the Indo-Gangetic plain. Furthermore, we automate and test compliance with the policies affecting human habitats, rivers and hospitals. Our results show that a substantial number of brick kilns do not meet the compliance requirements. Our framework offers a valuable tool for governments worldwide to automate and enforce policy regulations for brick kilns, addressing critical environmental and public health concerns.
Submitted 23 June, 2024; v1 submitted 15 June, 2024;
originally announced June 2024.
-
Enhancing Question Answering on Charts Through Effective Pre-training Tasks
Authors:
Ashim Gupta,
Vivek Gupta,
Shuo Zhang,
Yujie He,
Ning Zhang,
Shalin Shah
Abstract:
To completely understand a document, the use of textual information is not enough. Understanding visual cues, such as layouts and charts, is also required. While the current state-of-the-art approaches for document understanding (both OCR-based and OCR-free) work well, a thorough analysis of their capabilities and limitations has not yet been performed. Therefore, in this work, we address the limitation of current VisualQA models when applied to charts and plots. To investigate shortcomings of the state-of-the-art models, we conduct a comprehensive behavioral analysis, using ChartQA as a case study. Our findings indicate that existing models particularly underperform in answering questions related to the chart's structural and visual context, as well as numerical information. To address these issues, we propose three simple pre-training tasks that enforce the existing model in terms of both structural-visual knowledge, as well as its understanding of numerical questions. We evaluate our pre-trained model (called MatCha-v2) on three chart datasets - both extractive and abstractive question datasets - and observe that it achieves an average improvement of 1.7% over the baseline model.
Submitted 14 June, 2024;
originally announced June 2024.
-
CodedEvents: Optimal Point-Spread-Function Engineering for 3D-Tracking with Event Cameras
Authors:
Sachin Shah,
Matthew Albert Chan,
Haoming Cai,
Jingxi Chen,
Sakshum Kulshrestha,
Chahat Deep Singh,
Yiannis Aloimonos,
Christopher Metzler
Abstract:
Point-spread-function (PSF) engineering is a well-established computational imaging technique that uses phase masks and other optical elements to embed extra information (e.g., depth) into the images captured by conventional CMOS image sensors. To date, however, PSF-engineering has not been applied to neuromorphic event cameras, a powerful new image sensing technology that responds to changes in the log-intensity of light.
This paper establishes theoretical limits (Cramér-Rao bounds) on 3D point localization and tracking with PSF-engineered event cameras. Using these bounds, we first demonstrate that existing Fisher phase masks are already near-optimal for localizing static flashing point sources (e.g., blinking fluorescent molecules). We then demonstrate that existing designs are sub-optimal for tracking moving point sources and proceed to use our theory to design optimal phase masks and binary amplitude masks for this task. To overcome the non-convexity of the design problem, we leverage novel implicit neural representation based parameterizations of the phase and amplitude masks. We demonstrate the efficacy of our designs through extensive simulations. We also validate our method with a simple prototype.
Submitted 13 June, 2024;
originally announced June 2024.
-
How Well Do Deep Learning Models Capture Human Concepts? The Case of the Typicality Effect
Authors:
Siddhartha K. Vemuri,
Raj Sanjay Shah,
Sashank Varma
Abstract:
How well do representations learned by ML models align with those of humans? Here, we consider concept representations learned by deep learning models and evaluate whether they show a fundamental behavioral signature of human concepts, the typicality effect. This is the finding that people judge some instances (e.g., robin) of a category (e.g., Bird) to be more typical than others (e.g., penguin). Recent research looking for human-like typicality effects in language and vision models has focused on models of a single modality, tested only a small number of concepts, and found only modest correlations with human typicality ratings. The current study expands this behavioral evaluation of models by considering a broader range of language (N = 8) and vision (N = 10) model architectures. It also evaluates whether the combined typicality predictions of vision + language model pairs, as well as a multimodal CLIP-based model, are better aligned with human typicality judgments than those of models of either modality alone. Finally, it evaluates the models across a broader range of concepts (N = 27) than prior studies. There were three important findings. First, language models better align with human typicality judgments than vision models. Second, combined language and vision models (e.g., AlexNet + MiniLM) better predict the human typicality data than the best-performing language model (i.e., MiniLM) or vision model (i.e., ViT-Huge) alone. Third, multimodal models (i.e., CLIP ViT) show promise for explaining human typicality judgments. These results advance the state-of-the-art in aligning the conceptual representations of ML models and humans. A methodological contribution is the creation of a new image set for testing the conceptual alignment of vision models.
Submitted 25 May, 2024;
originally announced May 2024.
-
Incremental Comprehension of Garden-Path Sentences by Large Language Models: Semantic Interpretation, Syntactic Re-Analysis, and Attention
Authors:
Andrew Li,
Xianle Feng,
Siddhant Narang,
Austin Peng,
Tianle Cai,
Raj Sanjay Shah,
Sashank Varma
Abstract:
When reading temporarily ambiguous garden-path sentences, misinterpretations sometimes linger past the point of disambiguation. This phenomenon has traditionally been studied in psycholinguistic experiments using online measures such as reading times and offline measures such as comprehension questions. Here, we investigate the processing of garden-path sentences and the fate of lingering misinterpretations using four large language models (LLMs): GPT-2, LLaMA-2, Flan-T5, and RoBERTa. The overall goal is to evaluate whether humans and LLMs are aligned in their processing of garden-path sentences and in the lingering misinterpretations past the point of disambiguation, especially when extra-syntactic information (e.g., a comma delimiting a clause boundary) is present to guide processing. We address this goal using 24 garden-path sentences that have optional transitive and reflexive verbs leading to temporary ambiguities. For each sentence, there is a pair of comprehension questions corresponding to the misinterpretation and the correct interpretation. In three experiments, we (1) measure the dynamic semantic interpretations of LLMs using the question-answering task; (2) track whether these models shift their implicit parse tree at the point of disambiguation (or by the end of the sentence); and (3) visualize the model components that attend to disambiguating information when processing the question probes. These experiments show promising alignment between humans and LLMs in the processing of garden-path sentences, especially when extra-syntactic information is available to guide processing.
Submitted 24 May, 2024;
originally announced May 2024.
-
Harnessing Complexity: Nonlinear Optical Phenomena in L-Shapes, Nanocrescents, and Split-Ring Resonators
Authors:
Michael R. Clark,
Syed A. Shah,
Andrei Piryatinski,
Maxim Sukharev
Abstract:
We conduct systematic studies of the optical characteristics of plasmonic nanoparticles that exhibit C2v symmetry. We analyze three distinct geometric configurations: an L-type shape, a crescent, and a split-ring resonator. Optical properties are examined using the FDTD method. It is demonstrated that all three shapes exhibit two prominent plasmon bands associated with the two axes of symmetry. This is in addition to a wide range of resonances observed at high frequencies corresponding to quadrupole modes and peaks due to sharp corners. Next, to facilitate nonlinear analysis, we employ a semiclassical hydrodynamic model where the electron pressure term is explicitly accounted for. Employing this model enables us to rigorously examine the second-order angular resolved nonlinear optical response of these nanoparticles in each of the three configurations. For CW pumping, we explore properties of the SHG. Polarization and angle-resolved SHG spectra are obtained, revealing strong dependence on the nanoparticle geometry and incident wave polarization. For pulsed excitations, we discuss the phenomenon of broadband THz generation induced by the DFG. It is shown that the THz emission spectra exhibit unique features attributed to the plasmonic resonances and symmetry of the nanoparticles. The polarization of the generated THz waves is also examined, revealing interesting patterns tied to the nanoparticle geometry. To gain deeper insight, we propose a simple analytical theory that agrees very well with the numerical experiments. An expression for the far-field THz intensity is derived in terms of the incident pulse parameters and the nonlinear response tensor of the nanoparticle. The results presented in this work offer new insights into the linear and nonlinear optical properties of nanoparticles with C2v symmetry.
Submitted 22 May, 2024;
originally announced May 2024.
-
Imaging Local Effects of Voltage and Boron Doping on Spin Reversal in Antiferromagnetic Magnetoelectric Cr2O3 Thin Films and Devices
Authors:
Adam Erickson,
Syed Qamar Abbas Shah,
Ather Mahmood,
Pratyush Buragohain,
Ilja Fescenko,
Alexei Gruverman,
Christian Binek,
Abdelghani Laraoui
Abstract:
Chromia (Cr2O3) is a magnetoelectric oxide which permits voltage-control of the antiferromagnetic (AFM) order, but it suffers technological constraints due to its low Néel temperature (TN ~307 K) and the need of a symmetry-breaking applied magnetic field to achieve reversal of the Néel vector. Recently, boron (B) doping of Cr2O3 films led to an increased TN > 400 K and allowed the realization of voltage-controlled, magnetic-field-free Néel vector rotation. Here, we directly image the impact of B doping on the formation of AFM domains in Cr2O3 thin films and elucidate the mechanism of voltage-controlled manipulation of the spin structure using nitrogen vacancy (NV) scanning probe magnetometry. We find a stark reduction and thickness dependence of domain size in B-doped Cr2O3 (B:Cr2O3) films, explained by the increased germ density, likely associated with the B doping. By reconstructing the surface magnetization from the NV stray-field maps, we find a qualitative distinction between the undoped and B-doped Cr2O3 films, manifested by the histogram distribution of the AFM ordering, i.e., 180 degree domains for pure films, and 90 degree domains for B:Cr2O3 films. Additionally, NV imaging of voltage-controlled B-doped Cr2O3 devices corroborates the 90 degree rotation of the AFM domains observed in magnetotransport measurement.
Submitted 17 May, 2024;
originally announced May 2024.
-
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Authors:
Marah Abdin,
Jyoti Aneja,
Hany Awadalla,
Ahmed Awadallah,
Ammar Ahmad Awan,
Nguyen Bach,
Amit Bahree,
Arash Bakhtiari,
Jianmin Bao,
Harkirat Behl,
Alon Benhaim,
Misha Bilenko,
Johan Bjorck,
Sébastien Bubeck,
Martin Cai,
Qin Cai,
Vishrav Chaudhary,
Dong Chen,
Dongdong Chen,
Weizhu Chen,
Yen-Chun Chen,
Yi-Ling Chen,
Hao Cheng,
Parul Chopra,
Xiyang Dai
, et al. (104 additional authors not shown)
Abstract:
We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. Our training dataset is a scaled-up version of the one used for phi-2, composed of heavily filtered publicly available web data and synthetic data. The model is also further aligned for robustness, safety, and chat format. We also provide parameter-scaling results with 7B and 14B models trained for 4.8T tokens, called phi-3-small and phi-3-medium, both significantly more capable than phi-3-mini (e.g., respectively 75%, 78% on MMLU, and 8.7, 8.9 on MT-bench). To enhance multilingual, multimodal, and long-context capabilities, we introduce three models in the phi-3.5 series: phi-3.5-mini, phi-3.5-MoE, and phi-3.5-Vision. The phi-3.5-MoE, a 16 x 3.8B MoE model with 6.6 billion active parameters, achieves superior performance in language reasoning, math, and code tasks compared to other open-source models of similar scale, such as Llama 3.1 and the Mixtral series, and is on par with Gemini-1.5-Flash and GPT-4o-mini. Meanwhile, phi-3.5-Vision, a 4.2 billion parameter model derived from phi-3.5-mini, excels in reasoning tasks and is adept at handling both single-image and text prompts, as well as multi-image and text prompts.
Submitted 30 August, 2024; v1 submitted 22 April, 2024;
originally announced April 2024.
-
Enhancing Generalization in Audio Deepfake Detection: A Neural Collapse based Sampling and Training Approach
Authors:
Mohammed Yousif,
Jonat John Mathew,
Huzaifa Pallan,
Agamjeet Singh Padda,
Syed Daniyal Shah,
Sara Adamski,
Madhu Reddiboina,
Arjun Pankajakshan
Abstract:
Generalization in audio deepfake detection presents a significant challenge, with models trained on specific datasets often struggling to detect deepfakes generated under varying conditions and unknown algorithms. While collectively training a model using diverse datasets can enhance its generalization ability, it comes with high computational costs. To address this, we propose a neural collapse-based sampling approach applied to pre-trained models trained on distinct datasets to create a new training database. Using the ASVspoof 2019 dataset as a proof-of-concept, we implement pre-trained models with Resnet and ConvNext architectures. Our approach demonstrates comparable generalization on unseen data while being computationally efficient, requiring less training data. Evaluation is conducted using the In-the-wild dataset.
Submitted 19 April, 2024;
originally announced April 2024.
-
A Systematic Overview of Single-Cell Transcriptomics Databases, their Use cases, and Limitations
Authors:
Mahnoor N. Gondal,
Saad Ur Rehman Shah,
Arul M. Chinnaiyan,
Marcin Cieslik
Abstract:
Rapid advancements in high-throughput single-cell RNA-seq (scRNA-seq) technologies and experimental protocols have led to the generation of vast amounts of genomic data that populates several online databases and repositories. Here, we systematically examined large-scale scRNA-seq databases, categorizing them based on their scope and purpose such as general, tissue-specific databases, disease-specific databases, cancer-focused databases, and cell type-focused databases. Next, we discuss the technical and methodological challenges associated with curating large-scale scRNA-seq databases, along with current computational solutions. We argue that understanding scRNA-seq databases, including their limitations and assumptions, is crucial for effectively utilizing this data to make robust discoveries and identify novel biological insights. Furthermore, we propose that bridging the gap between computational and wet lab scientists through user-friendly web-based platforms is needed for democratizing access to single-cell data. These platforms would facilitate interdisciplinary research, enabling researchers from various disciplines to collaborate effectively. This review underscores the importance of leveraging computational approaches to unravel the complexities of single-cell data and offers a promising direction for future research in the field.
Submitted 15 April, 2024;
originally announced April 2024.
-
Stability conditions on crepant resolutions of quotients of product varieties
Authors:
Alexander Perry,
Saket Shah
Abstract:
We construct stability conditions on crepant resolutions of certain quotients of product varieties, giving as a special case the first examples of stability conditions on strict Calabi-Yau varieties of arbitrary dimension. Along the way, we prove the crepant resolutions are derived equivalent to the corresponding quotient stacks, verifying an instance of a conjecture of Bondal and Orlov.
Submitted 4 July, 2024; v1 submitted 13 April, 2024;
originally announced April 2024.
-
Observation of Gravitational Waves from the Coalescence of a $2.5\text{-}4.5~M_\odot$ Compact Object and a Neutron Star
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
A. G. Abac,
R. Abbott,
I. Abouelfettouh,
F. Acernese,
K. Ackley,
S. Adhicary,
N. Adhikari,
R. X. Adhikari,
V. K. Adkins,
D. Agarwal,
M. Agathos,
M. Aghaei Abchouyeh,
O. D. Aguiar,
I. Aguilar,
L. Aiello,
A. Ain,
P. Ajith,
S. Akçay,
T. Akutsu,
S. Albanesi,
R. A. Alfaidi,
A. Al-Jodah,
et al. (1771 additional authors not shown)
Abstract:
We report the observation of a coalescing compact binary with component masses $2.5\text{-}4.5~M_\odot$ and $1.2\text{-}2.0~M_\odot$ (all measurements quoted at the 90% credible level). The gravitational-wave signal GW230529_181500 was observed during the fourth observing run of the LIGO-Virgo-KAGRA detector network on 2023 May 29 by the LIGO Livingston Observatory. The primary component of the source has a mass less than $5~M_\odot$ at 99% credibility. We cannot definitively determine from gravitational-wave data alone whether either component of the source is a neutron star or a black hole. However, given existing estimates of the maximum neutron star mass, we find the most probable interpretation of the source to be the coalescence of a neutron star with a black hole that has a mass between the most massive neutron stars and the least massive black holes observed in the Galaxy. We provisionally estimate a merger rate density of $55^{+127}_{-47}~\text{Gpc}^{-3}\,\text{yr}^{-1}$ for compact binary coalescences with properties similar to the source of GW230529_181500; assuming that the source is a neutron star-black hole merger, GW230529_181500-like sources constitute about 60% of the total merger rate inferred for neutron star-black hole coalescences. The discovery of this system implies an increase in the expected rate of neutron star-black hole mergers with electromagnetic counterparts and provides further evidence for compact objects existing within the purported lower mass gap.
Submitted 26 July, 2024; v1 submitted 5 April, 2024;
originally announced April 2024.
-
BuDDIE: A Business Document Dataset for Multi-task Information Extraction
Authors:
Ran Zmigrod,
Dongsheng Wang,
Mathieu Sibue,
Yulong Pei,
Petr Babkin,
Ivan Brugere,
Xiaomo Liu,
Nacho Navarro,
Antony Papadimitriou,
William Watson,
Zhiqiang Ma,
Armineh Nourbakhsh,
Sameena Shah
Abstract:
The field of visually rich document understanding (VRDU) aims to solve a multitude of well-researched NLP tasks in a multi-modal domain. Several datasets exist for research on specific tasks of VRDU such as document classification (DC), key entity extraction (KEE), entity linking, visual question answering (VQA), inter alia. These datasets cover documents like invoices and receipts with sparse annotations such that they support one or two co-related tasks (e.g., entity extraction and entity linking). Unfortunately, focusing on only a single specific type of document or task is not representative of how documents often need to be processed in the wild, where variety in style and requirements is expected. In this paper, we introduce BuDDIE (Business Document Dataset for Information Extraction), the first multi-task dataset of 1,665 real-world business documents that contains rich and dense annotations for DC, KEE, and VQA. Our dataset consists of publicly available business entity documents from US state government websites. The documents are structured and vary in their style and layout across states and types (e.g., forms, certificates, reports, etc.). We provide data variety and quality metrics for BuDDIE as well as a series of baselines for each task. Our baselines cover traditional textual, multi-modal, and large language model approaches to VRDU.
Submitted 5 April, 2024;
originally announced April 2024.
-
Semisimple Algebras of Vector Fields on $\mathbb{C}^{3}$
Authors:
Sajid Ali,
Hassan Azad,
Indranil Biswas,
Fazal M. Mahomed,
Said Waqas Shah
Abstract:
A local classification of semisimple algebras of vector fields on $\mathbb{C}^{3}$ is given, using the canonical forms of the Heisenberg algebra and of $sl(2,\mathbb{C})\times sl(2,\mathbb{C})$.
Submitted 3 April, 2024;
originally announced April 2024.
-
Language Model Guided Interpretable Video Action Reasoning
Authors:
Ning Wang,
Guangming Zhu,
HS Li,
Liang Zhang,
Syed Afaq Ali Shah,
Mohammed Bennamoun
Abstract:
While neural networks have excelled in video action recognition tasks, their black-box nature often obscures the understanding of their decision-making processes. Recent approaches used inherently interpretable models to analyze video actions in a manner akin to human reasoning. These models, however, usually fall short in performance compared to their black-box counterparts. In this work, we present a new framework named Language-guided Interpretable Action Recognition framework (LaIAR). LaIAR leverages knowledge from language models to enhance both the recognition capabilities and the interpretability of video models. In essence, we redefine the problem of understanding video model decisions as a task of aligning video and language models. Using the logical reasoning captured by the language model, we steer the training of the video model. This integrated approach not only improves the video model's adaptability to different domains but also boosts its overall performance. Extensive experiments on two complex video action datasets, Charades & CAD-120, validate the improved performance and interpretability of our LaIAR framework. The code of LaIAR is available at https://github.com/NingWang2049/LaIAR.
Submitted 1 April, 2024;
originally announced April 2024.
-
Large Language Models as Financial Data Annotators: A Study on Effectiveness and Efficiency
Authors:
Toyin Aguda,
Suchetha Siddagangappa,
Elena Kochkina,
Simerjot Kaur,
Dongsheng Wang,
Charese Smiley,
Sameena Shah
Abstract:
Collecting labeled datasets in finance is challenging due to scarcity of domain experts and higher cost of employing them. While Large Language Models (LLMs) have demonstrated remarkable performance in data annotation tasks on general domain datasets, their effectiveness on domain specific datasets remains underexplored. To address this gap, we investigate the potential of LLMs as efficient data annotators for extracting relations in financial documents. We compare the annotations produced by three LLMs (GPT-4, PaLM 2, and MPT Instruct) against expert annotators and crowdworkers. We demonstrate that the current state-of-the-art LLMs can be sufficient alternatives to non-expert crowdworkers. We analyze models using various prompts and parameter settings and find that customizing the prompts for each relation group by providing specific examples belonging to those groups is paramount. Furthermore, we introduce a reliability index (LLM-RelIndex) used to identify outputs that may require expert attention. Finally, we perform an extensive time, cost and error analysis and provide recommendations for the collection and usage of automated annotations in domain-specific settings.
Submitted 26 March, 2024;
originally announced March 2024.
-
Multi-Level Feedback Generation with Large Language Models for Empowering Novice Peer Counselors
Authors:
Alicja Chaszczewicz,
Raj Sanjay Shah,
Ryan Louie,
Bruce A Arnow,
Robert Kraut,
Diyi Yang
Abstract:
Realistic practice and tailored feedback are key processes for training peer counselors with clinical skills. However, existing mechanisms of providing feedback largely rely on human supervision. Peer counselors often lack mechanisms to receive detailed feedback from experienced mentors, making it difficult for them to support the large number of people with mental health issues who use peer counseling. Our work aims to leverage large language models to provide contextualized and multi-level feedback to empower peer counselors, especially novices, at scale. To achieve this, we co-design with a group of senior psychotherapy supervisors to develop a multi-level feedback taxonomy, and then construct a publicly available dataset with comprehensive feedback annotations of 400 emotional support conversations. We further design a self-improvement method on top of large language models to enhance the automatic generation of feedback. Via qualitative and quantitative evaluation with domain experts, we demonstrate that our method minimizes the risk of potentially harmful and low-quality feedback generation which is desirable in such high-stakes scenarios.
Submitted 21 March, 2024;
originally announced March 2024.
-
Designing Multi-Step Action Models for Enterprise AI Adoption
Authors:
Shreyash Mishra,
Shrey Shah,
Rex Pereira
Abstract:
This paper introduces the Multi-Step Action Model (MSAM), a closed-source AI model designed by Empsing to address challenges hindering AI adoption in enterprises. Through a holistic examination, this paper explores MSAM's foundational principles, design architecture, and future trajectory. It evaluates MSAM's performance via rigorous testing methodologies and envisions its potential impact on advancing AI adoption within organizations.
Submitted 21 February, 2024;
originally announced March 2024.
-
On decompositions for Fano schemes of intersections of two quadrics
Authors:
Pieter Belmans,
Jishnu Bose,
Sarah Frei,
Benjamin Gould,
James Hotchkiss,
Alicia Lamarche,
Jack Petok,
Cristian Rodriguez Avila,
Saket Shah
Abstract:
We propose conjectural semiorthogonal decompositions for Fano schemes of linear subspaces on intersections of two quadrics, in terms of symmetric powers of the associated hyperelliptic (resp. stacky) curve. When the intersection is odd-dimensional, we moreover conjecture an identity in the Grothendieck ring of varieties and other motivic contexts. The evidence for these conjectures is given by upgrading recent results of Chen-Vilonen-Xue, to obtain formulae for the Hodge numbers of these Fano schemes. This allows us to numerically verify the conjecture in the hyperelliptic case, and establish a combinatorial identity as evidence for the stacky case.
Submitted 8 April, 2024; v1 submitted 19 March, 2024;
originally announced March 2024.
-
Quasinormal Modes of Near-Extremal Electric and Magnetic Black Branes
Authors:
Swapnil Nitin Shah
Abstract:
Gauge-gravity duality provides a robust mathematical framework for studying the behavior of strongly coupled non-abelian plasmas both near and far away from thermodynamic equilibrium. In particular, their near-equilibrium transport coefficients such as viscosity, conductivity, diffusion constants, etc. can be determined from poles of the retarded Green's function, which are the dissipative eigenmodes, i.e., the quasinormal modes (QNMs), of the dual gravitational field equations. The AdS5/CFT4 correspondence admits the description of a strongly coupled $\mathcal{N}=4$ Supersymmetric Yang-Mills (SYM) plasma at non-zero temperature as a dual AdS5 black brane geometry. We demonstrate the application of pseudospectral methods to solving the dual Einstein field equations using the example of homogeneous isotropization in $\mathcal{N}=4$ SYM plasma far from equilibrium. Using this framework, we also compute the quasinormal modes of electrically (Reissner-Nordstrom) and magnetically charged AdS5 black branes for the case of vanishing spatial momenta. The near-extremal behavior of these QNMs is analyzed for both types of black branes.
Submitted 18 March, 2024;
originally announced March 2024.
-
Towards Neuro-Symbolic Video Understanding
Authors:
Minkyu Choi,
Harsh Goel,
Mohammad Omama,
Yunhao Yang,
Sahil Shah,
Sandeep Chinchali
Abstract:
The unprecedented surge in video data production in recent years necessitates efficient tools to extract meaningful frames from videos for downstream tasks. Long-term temporal reasoning is a key desideratum for frame retrieval systems. While state-of-the-art foundation models, like VideoLLaMA and ViCLIP, are proficient in short-term semantic understanding, they surprisingly fail at long-term reasoning across frames. A key reason for this failure is that they intertwine per-frame perception and temporal reasoning into a single deep network. Hence, decoupling but co-designing semantic understanding and temporal reasoning is essential for efficient scene identification. We propose a system that leverages vision-language models for semantic understanding of individual frames but effectively reasons about the long-term evolution of events using state machines and temporal logic (TL) formulae that inherently capture memory. Our TL-based reasoning improves the F1 score of complex event identification by 9-15% compared to benchmarks that use GPT4 for reasoning on state-of-the-art self-driving datasets such as Waymo and NuScenes.
Submitted 15 July, 2024; v1 submitted 16 March, 2024;
originally announced March 2024.
-
Code Revert Prediction with Graph Neural Networks: A Case Study at J.P. Morgan Chase
Authors:
Yulong Pei,
Salwa Alamir,
Rares Dolga,
Sameena Shah
Abstract:
Code revert prediction, a specialized form of software defect detection, aims to forecast or predict the likelihood of code changes being reverted or rolled back in software development. This task is very important in practice because by identifying code changes that are more prone to being reverted, developers and project managers can proactively take measures to prevent issues, improve code quality, and optimize development processes. However, compared to code defect detection, code revert prediction has been rarely studied in previous research. Additionally, many previous methods for code defect detection relied on independent features but ignored relationships between code scripts. Moreover, new challenges are introduced due to constraints in an industry setting such as company regulation, limited features and large-scale codebase. To overcome these limitations, this paper presents a systematic empirical study for code revert prediction that integrates the code import graph with code features. Different strategies to address anomalies and data imbalance have been implemented including graph neural networks with imbalance classification and anomaly detection. We conduct the experiments on real-world code commit data within J.P. Morgan Chase which is extremely imbalanced in order to make a comprehensive comparison of these different approaches for the code revert prediction problem.
Submitted 14 March, 2024;
originally announced March 2024.
-
Belief and Persuasion in Scientific Discourse on Social Media: A Study of the COVID-19 Pandemic
Authors:
Salwa Alamir,
Armineh Nourbakhsh,
Cecilia Tilli,
Sameena Shah,
Manuela Veloso
Abstract:
Research into COVID-19 has been rapidly evolving since the onset of the pandemic. This occasionally results in contradictory recommendations by credible sources of scientific opinion, public health authorities, and medical professionals. In this study, we examine whether this has resulted in a lack of trust in scientific opinion, by examining the belief patterns of social media users and their reactions to statements related to scientific facts. We devise models to mine belief and persuasion in Twitter discourse using semi-supervised approaches, and show the relationship between lack of belief and insurgence of paranoia and conspiracy theories. By investigating these belief patterns, we explore the best persuasion tactics for communicating information related to COVID-19.
Submitted 14 March, 2024;
originally announced March 2024.
-
Log Summarisation for Defect Evolution Analysis
Authors:
Rares Dolga,
Ran Zmigrod,
Rui Silva,
Salwa Alamir,
Sameena Shah
Abstract:
Log analysis and monitoring are essential aspects in software maintenance and identifying defects. In particular, the temporal nature and vast size of log data lead to an interesting and important research question: How can logs be summarised and monitored over time? While this has been a fundamental topic of research in the software engineering community, work has typically focused on heuristic-, syntax-, or static-based methods. In this work, we suggest an online semantic-based clustering approach to error logs that dynamically updates the log clusters to enable monitoring code error life-cycles. We also introduce a novel metric to evaluate the performance of temporal log clusters. We test our system and evaluation metric with an industrial dataset and find that our solution outperforms similar systems. We hope that our work encourages further temporal exploration in defect datasets.
Submitted 13 March, 2024;
originally announced March 2024.