Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Showing 1–8 of 8 results for author: Nakhost, H

Searching in archive cs. Search in all archives.
.
  1. arXiv:2406.15708  [pdf, other

    cs.CL cs.AI cs.LG

    Teach Better or Show Smarter? On Instructions and Exemplars in Automatic Prompt Optimization

    Authors: Xingchen Wan, Ruoxi Sun, Hootan Nakhost, Sercan O. Arik

    Abstract: Large language models have demonstrated remarkable capabilities, but their performance is heavily reliant on effective prompt engineering. Automatic prompt optimization (APO) methods are designed to automate this and can be broadly categorized into those targeting instructions (instruction optimization, IO) vs. those targeting exemplars (exemplar selection, ES). Despite their shared objective, the… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  2. arXiv:2312.01279  [pdf, other

    cs.CL cs.AI cs.LG

    TextGenSHAP: Scalable Post-hoc Explanations in Text Generation with Long Documents

    Authors: James Enouen, Hootan Nakhost, Sayna Ebrahimi, Sercan O Arik, Yan Liu, Tomas Pfister

    Abstract: Large language models (LLMs) have attracted huge interest in practical applications given their increasingly accurate responses and coherent reasoning abilities. Given their nature as black-boxes using complex reasoning processes on their inputs, it is inevitable that the demand for scalable and faithful explanations for LLMs' generated content will continue to grow. There have been major developm… ▽ More

    Submitted 2 December, 2023; originally announced December 2023.

  3. arXiv:2311.02883  [pdf, other

    cs.CL

    SQLPrompt: In-Context Text-to-SQL with Minimal Labeled Data

    Authors: Ruoxi Sun, Sercan Ö. Arik, Rajarishi Sinha, Hootan Nakhost, Hanjun Dai, Pengcheng Yin, Tomas Pfister

    Abstract: Text-to-SQL aims to automate the process of generating SQL queries on a database from natural language text. In this work, we propose "SQLPrompt", tailored to improve the few-shot prompting capabilities of Text-to-SQL for Large Language Models (LLMs). Our methods include innovative prompt design, execution-based consistency decoding strategy which selects the SQL with the most consistent execution… ▽ More

    Submitted 6 November, 2023; originally announced November 2023.

  4. arXiv:2306.00739  [pdf, other

    cs.CL cs.AI cs.DB

    SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended)

    Authors: Ruoxi Sun, Sercan Ö. Arik, Alex Muzio, Lesly Miculicich, Satya Gundabathula, Pengcheng Yin, Hanjun Dai, Hootan Nakhost, Rajarishi Sinha, Zifeng Wang, Tomas Pfister

    Abstract: Text-to-SQL, the process of translating natural language into Structured Query Language (SQL), represents a transformative application of large language models (LLMs), potentially revolutionizing how humans interact with data. This paper introduces the SQL-PaLM framework, a comprehensive solution for understanding and enhancing Text-to-SQL using LLMs, using in the learning regimes of few-shot prom… ▽ More

    Submitted 30 March, 2024; v1 submitted 26 May, 2023; originally announced June 2023.

  5. arXiv:2305.14926  [pdf, other

    cs.CL cs.AI cs.LG

    Universal Self-Adaptive Prompting

    Authors: Xingchen Wan, Ruoxi Sun, Hootan Nakhost, Hanjun Dai, Julian Martin Eisenschlos, Sercan O. Arik, Tomas Pfister

    Abstract: A hallmark of modern large language models (LLMs) is their impressive general zero-shot and few-shot abilities, often elicited through in-context learning (ICL) via prompting. However, while highly coveted and being the most general, zero-shot performances in LLMs are still typically weaker due to the lack of guidance and the difficulty of applying existing automatic prompt design methods in gener… ▽ More

    Submitted 20 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: EMNLP 2023 (Main). 10 pages, 5 figures, 4 tables (26 pages, 9 figures and 13 tables including references and appendices)

  6. arXiv:2305.02301  [pdf, other

    cs.CL cs.AI cs.LG

    Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes

    Authors: Cheng-Yu Hsieh, Chun-Liang Li, Chih-Kuan Yeh, Hootan Nakhost, Yasuhisa Fujii, Alexander Ratner, Ranjay Krishna, Chen-Yu Lee, Tomas Pfister

    Abstract: Deploying large language models (LLMs) is challenging because they are memory inefficient and compute-intensive for practical applications. In reaction, researchers train smaller task-specific models by either finetuning with human labels or distilling using LLM-generated labels. However, finetuning and distillation require large amounts of training data to achieve comparable performance to LLMs.… ▽ More

    Submitted 5 July, 2023; v1 submitted 3 May, 2023; originally announced May 2023.

    Comments: Accepted to Findings of ACL 2023

  7. arXiv:2211.07357  [pdf, other

    cs.LG cs.AI eess.SY

    Controlling Commercial Cooling Systems Using Reinforcement Learning

    Authors: Jerry Luo, Cosmin Paduraru, Octavian Voicu, Yuri Chervonyi, Scott Munns, Jerry Li, Crystal Qian, Praneet Dutta, Jared Quincy Davis, Ningjia Wu, Xingwei Yang, Chu-Ming Chang, Ted Li, Rob Rose, Mingyan Fan, Hootan Nakhost, Tinglin Liu, Brian Kirkman, Frank Altamura, Lee Cline, Patrick Tonker, Joel Gouker, Dave Uden, Warren Buddy Bryan, Jason Law , et al. (11 additional authors not shown)

    Abstract: This paper is a technical overview of DeepMind and Google's recent work on reinforcement learning for controlling commercial cooling systems. Building on expertise that began with cooling Google's data centers more efficiently, we recently conducted live experiments on two real-world facilities in partnership with Trane Technologies, a building management system provider. These live experiments ha… ▽ More

    Submitted 14 December, 2022; v1 submitted 11 November, 2022; originally announced November 2022.

    Comments: 27 pages, 11 figures

  8. arXiv:2008.00646  [pdf, other

    cs.LG stat.ML

    Interpretable Sequence Learning for COVID-19 Forecasting

    Authors: Sercan O. Arik, Chun-Liang Li, Jinsung Yoon, Rajarishi Sinha, Arkady Epshteyn, Long T. Le, Vikas Menon, Shashank Singh, Leyou Zhang, Nate Yoder, Martin Nikoltchev, Yash Sonthalia, Hootan Nakhost, Elli Kanal, Tomas Pfister

    Abstract: We propose a novel approach that integrates machine learning into compartmental disease modeling to predict the progression of COVID-19. Our model is explainable by design as it explicitly shows how different compartments evolve and it uses interpretable encoders to incorporate covariates and improve performance. Explainability is valuable to ensure that the model's forecasts are credible to epide… ▽ More

    Submitted 13 January, 2021; v1 submitted 3 August, 2020; originally announced August 2020.