
Showing 1–5 of 5 results for author: Neo, C

Searching in archive cs.
  1. arXiv:2407.01082  [pdf, other]

    cs.CL

    Min P Sampling: Balancing Creativity and Coherence at High Temperature

    Authors: Minh Nguyen, Andrew Baker, Andreas Kirsch, Clement Neo

    Abstract: Large Language Models (LLMs) generate longform text by successively sampling the next token based on the probability distribution of the token vocabulary at each decoding step. Current popular truncation sampling methods such as top-$p$ sampling, also known as nucleus sampling, often struggle to balance coherence and creativity in generating text, particularly when using higher temperatures. To ad…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 8 pages

  2. arXiv:2402.15055  [pdf, other]

    cs.CL cs.AI cs.LG

    Interpreting Context Look-ups in Transformers: Investigating Attention-MLP Interactions

    Authors: Clement Neo, Shay B. Cohen, Fazl Barez

    Abstract: In this paper, we investigate the interplay between attention heads and specialized "next-token" neurons in the Multilayer Perceptron that predict specific tokens. By prompting an LLM like GPT-4 to explain these model internals, we can elucidate attention mechanisms that activate certain next-token neurons. Our analysis identifies attention heads that recognize contexts relevant to predicting a pa…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: 15 pages, 11 figures

  3. arXiv:2402.02619  [pdf, other]

    cs.LG cs.CL

    Increasing Trust in Language Models through the Reuse of Verified Circuits

    Authors: Philip Quirke, Clement Neo, Fazl Barez

    Abstract: Language Models (LMs) are increasingly used for a wide range of prediction tasks, but their training can often neglect rare edge cases, reducing their reliability. Here, we define a stringent standard of trustworthiness whereby the task algorithm and circuit implementation must be verified, accounting for edge cases, with no known failure modes. We show that a model can be trained to meet this sta…

    Submitted 11 July, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

    Comments: 8 pages, 4 figures, 5 tables

  4. arXiv:2310.08164  [pdf, other]

    cs.LG

    Beyond Training Objectives: Interpreting Reward Model Divergence in Large Language Models

    Authors: Luke Marks, Amir Abdullah, Clement Neo, Rauno Arike, Philip Torr, Fazl Barez

    Abstract: Large language models (LLMs) fine-tuned by reinforcement learning from human feedback (RLHF) are becoming more widely deployed. We coin the term $\textit{Implicit Reward Model}$ (IRM) to refer to the changes that occur to an LLM during RLHF that result in high-reward generations. We interpret IRMs, and measure their divergence from the RLHF reward model used in the fine-tuning process that induced…

    Submitted 7 February, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: 19 pages, 5 figures

  5. arXiv:2209.15585  [pdf, other]

    physics.ao-ph cs.LG

    Cloud Classification with Unsupervised Deep Learning

    Authors: Takuya Kurihana, Ian Foster, Rebecca Willett, Sydney Jenkins, Kathryn Koenig, Ruby Werman, Ricardo Barros Lourenco, Casper Neo, Elisabeth Moyer

    Abstract: We present a framework for cloud characterization that leverages modern unsupervised deep learning technologies. While previous neural network-based cloud classification models have used supervised learning methods, unsupervised learning allows us to avoid restricting the model to artificial categories based on historical cloud classification schemes and enables the discovery of novel, more detail…

    Submitted 30 September, 2022; originally announced September 2022.

    Comments: 5 pages, 6 figures, Proceedings for Climate Informatics Workshop 2019 Paris