By Z.H. Fu
https://fuzihaofzh.github.io/blog/

The importance of watermarking for large language models (LLMs) cannot be overstated. Aaronson et al. [1] propose a statistics-based method to embed a watermark into LLM-generated text. They initially provided only a brief overview of their method in a set of slides, without proofs, while Fernandez et al. [2] offer a more detailed theoretical proof using some probability tricks.

This blog post aims to explain Aaronson’s watermarking method in a straightforward, beginner-friendly manner, avoiding reliance on advanced probability techniques. Notably, we highlight that this method is essentially the same as using the Gumbel-Max Trick.

Method

Aaronson's watermarking method [1] modifies the token selection process in large language models to embed an invisible trace into the generated text. During generation, a secret key $k$ is used to produce a random vector $\mathbf{r}$, where each element $r_v$ of this vector corresponds to a token $v$ in the vocabulary $V$, and $r_v$ is uniformly distributed in $[0, 1]$.

Formally, given a large language model (LLM) that at each step generates a probability distribution $\mathbf{p} = (p_1, p_2, \ldots, p_V)$ over the vocabulary $V$, where $p_v$ is the probability of token $v$, Aaronson's method adjusts the selection process. The next token $x$ is selected as:

$$x = \arg\max_{v \in V} \left( r_v^{1/p_v} \right)$$

This ensures that tokens with both high original probabilities and favorable random values $r_v$ are chosen more frequently.
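To make the selection rule concrete, here is a minimal sketch in Python. The hash-based seeding is a toy stand-in for whatever keyed pseudorandom function a real implementation would use; it is our assumption for illustration, not the construction from the slides:

```python
import numpy as np

def watermarked_next_token(p, key, context):
    """Pick the next token given the model's distribution p (a 1-D
    array over the vocabulary), a secret key, and the recent context."""
    # Derive r from the key and context. A real implementation would
    # use a cryptographic PRF here; a seeded RNG is a toy stand-in.
    seed = abs(hash((key, tuple(context)))) % (2**32)
    r = np.random.default_rng(seed).random(len(p))  # r_v ~ Uniform[0, 1]
    # Select argmax_v r_v^(1/p_v); the epsilon guards against p_v = 0.
    return int(np.argmax(r ** (1.0 / np.maximum(p, 1e-12))))
```

Because $r_v^{1/p_v}$ is monotone in $(\ln r_v)/p_v$, this is exactly the Gumbel-Max-style trick mentioned above: averaged over keys, token $v$ is still chosen with probability $p_v$, so the output distribution of the model is unchanged.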

For detection, the watermark's presence is verified by computing a score $S_T$ for a sequence of $T$ tokens. This score aggregates the log-transformed random values of the selected tokens:

$$S_T = -\sum_{t=1}^{T} \ln(1 - r_{x(t)})$$
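The intuition: without a watermark, each $r_{x(t)}$ is an independent Uniform$[0,1]$ draw, so each term $-\ln(1 - r_{x(t)})$ is Exp(1)-distributed and $S_T$ follows a Gamma$(T, 1)$ distribution with mean $T$; watermarked text systematically pushes $S_T$ above $T$. A minimal sketch of the detector, assuming the verifier recomputes the same $\mathbf{r}$ vectors from the secret key:

```python
import numpy as np
from scipy.stats import gamma

def detect(tokens, r_vectors):
    """tokens[t] is the t-th token id; r_vectors[t] is the random
    vector r recomputed for step t from the secret key."""
    s = -sum(np.log(1.0 - r_vectors[t][tok]) for t, tok in enumerate(tokens))
    # Without a watermark, s ~ Gamma(T, 1); a tiny upper-tail
    # p-value is evidence that the watermark is present.
    p_value = gamma.sf(s, a=len(tokens))
    return s, p_value
```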

Read more »

By Z.H. Fu
https://fuzihaofzh.github.io/blog/

Statistics is a field that empowers us to transform raw data into insightful information. Among the most commonly used techniques in statistics is the calculation of the mean, or the average. However, when we aim to minimize the impact of outliers or extreme values, we can employ the Median of Means method. This statistical approach provides a more robust estimate of a dataset’s central tendency. Let’s delve deeper into understanding this method.
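As a quick illustration of the idea, here is a minimal sketch: shuffle the data, split it into $k$ blocks, average each block, and take the median of the block means (the block count $k$ is a free parameter):

```python
import numpy as np

def median_of_means(x, k, seed=0):
    """Median-of-Means estimate of the mean of x using k blocks."""
    rng = np.random.default_rng(seed)
    blocks = np.array_split(rng.permutation(np.asarray(x, float)), k)
    return float(np.median([b.mean() for b in blocks]))

data = np.append(np.random.default_rng(1).normal(0.0, 1.0, 999), 1e6)
print(np.mean(data))              # ruined by the single outlier (~1000)
print(median_of_means(data, 20))  # stays near the true mean of 0
```

Only the block unlucky enough to contain the outlier is corrupted, and the median simply ignores it.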

Read more »

By Z.H. Fu
https://fuzihaofzh.github.io/blog/

Numerical integration plays a pivotal role in numerous scientific computations, with applications spanning from physics to finance. Conventional numerical integration techniques, such as Simpson's rule or the trapezoidal rule, perform well for straightforward, low-dimensional integrals. However, these classical methods falter in high-dimensional spaces, a challenge addressed by Monte Carlo integration, which offers a robust and efficient solution to the dimensionality problem.
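To see why the method scales gracefully with dimension, here is a minimal sketch: the estimator is just the average of $f$ at uniform random points times the volume of the domain, and its error decays like $1/\sqrt{n}$ regardless of dimension:

```python
import numpy as np

def mc_integrate(f, lo, hi, n=100_000, seed=0):
    """Monte Carlo estimate of the integral of f over the box [lo, hi]."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    pts = rng.uniform(lo, hi, size=(n, lo.size))  # n uniform samples in the box
    return np.prod(hi - lo) * f(pts).mean()       # volume * average of f

# Example: integrate sum_i x_i^2 over the 10-dimensional unit cube
# (the exact answer is 10/3).
est = mc_integrate(lambda x: (x ** 2).sum(axis=1), np.zeros(10), np.ones(10))
print(est)
```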

Read more »

By Z.H. Fu
https://fuzihaofzh.github.io/blog/

Introduction

Python, with its simplicity and readability, has become the language of choice for many developers. However, it has its limitations, especially when it comes to memory management in multiprocessing applications. Its copy-on-write (CoW) behavior in forked worker processes often produces leak-like memory growth, causing the program to consume all available memory and eventually crash.
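A minimal sketch of the failure mode on Linux, where multiprocessing forks by default: even purely read-only access from the children updates each object's reference count, dirtying the shared pages so that the OS copies them into every worker.

```python
import multiprocessing as mp

big = list(range(10_000_000))  # large structure built in the parent

def worker(_):
    # Iterating only *reads* the list, but CPython still bumps each
    # element's refcount, so the forked CoW pages get copied per child.
    return sum(big[:1_000_000])

if __name__ == "__main__":
    with mp.Pool(4) as pool:
        print(pool.map(worker, range(4)))  # watch RSS grow in each worker
```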

CSTL is a powerful tool that brings the efficiency of the C++ Standard Template Library (STL) to Python. It lets Python developers replace the built-in list, dict, and set with C++ STL containers such as std::vector, std::unordered_map, and std::unordered_set.

Read more »

By Z.H. Fu
https://fuzihaofzh.github.io/blog/

Fine-tuning a pre-trained model is currently the dominant approach in contemporary natural language processing research. However, the choice of which parameters to adjust can significantly influence model performance, and the challenge lies in pinpointing the parameters that matter most. Numerous parameter-efficient methods, such as Adapter, LoRA, DiffPruning, and ChildPruning, employ various strategies to tweak model parameters. This blog post delves into our recent paper, “On the effectiveness of parameter-efficient fine-tuning” by Zihao Fu et al. [1], where we introduce a novel method, the Second-order Approximation Method (SAM), to effectively identify the most critical parameters for fine-tuning.

Read more »

By Z.H. Fu
https://fuzihaofzh.github.io/blog/

In the realm of machine learning, fine-tuning pre-trained models is a common practice. However, it often requires a significant number of parameters and considerable computational resources. To address this, researchers have proposed various parameter-efficient models, each with its own unique approach to fine-tuning. In this blog post, we will discuss a unified view of these models, focusing on how they align with the definition of a sparse fine-tuned model. This view, a brief explanation of the study “On the effectiveness of parameter-efficient fine-tuning” by Zihao Fu et al. [1], provides a unified understanding of parameter-efficient models and can be instrumental in further analysis and research.

Read more »

By Z.H. Fu
https://fuzihaofzh.github.io/blog/

Deep learning, a subfield of artificial intelligence (AI), has been a topic of great interest for many years. Among its various intriguing aspects, the role of gradient descent, a fundamental algorithm employed for training deep learning models, has garnered considerable attention. A recent paper titled “Implicit Gradient Regularization” by researchers David G.T. Barrett and Benoit Dherin from DeepMind and Google Dublin, respectively, provides an enlightening exploration of how gradient descent implicitly regularizes models. This phenomenon is referred to as Implicit Gradient Regularization (IGR).
In this blog post, we will unpack the concept of IGR, discuss its core principles, and explore its implications on deep learning models.
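To state the paper's headline result up front (as derived via backward error analysis): gradient descent with step size $h$ on a loss $E(\theta)$ follows, up to higher-order terms in $h$, the gradient flow of a modified loss

$$\tilde{E}(\theta) = E(\theta) + \frac{h}{4}\left\lVert \nabla E(\theta) \right\rVert^2,$$

so plain gradient descent implicitly penalizes large gradient norms, with a regularization strength proportional to the learning rate.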

Read more »

By Z.H. Fu
https://fuzihaofzh.github.io/blog/

Copying bibliography information from Google Scholar entries typically involves multiple steps and clicks. It’s a process that can be time-consuming and often disrupts the flow of your research. Recognizing this challenge, we present the Google Scholar Oneclick Copy Bib, a Tampermonkey plugin that simplifies this process down to a single click.

Read more »

By Z.H. Fu
https://fuzihaofzh.github.io/blog/

VSCode comes equipped with a built-in PDF viewer based on pdf.js, facilitating seamless previews of PDF files as you work. However, a notable downside is the inability to alter the background color, which can be quite strenuous on your eyes over extended periods. Thankfully, we have identified a strategy to overcome this hindrance, enhancing not only the visual appeal but also offering comfort to your eyes during those long coding sessions. Here we will delve into how you can change the background color for a more pleasant viewing experience in VSCode PDF viewers such as the one in LaTeX Workshop.

[Figure: pdfbackground]

Read more »

By Z.H. Fu
https://fuzihaofzh.github.io/blog/

In the world of classical mechanics, the Newtonian framework, while foundational, encounters limitations when dealing with complex systems due to its intricate force analysis. To address these challenges, the more advanced formalisms of Lagrangian and Hamiltonian mechanics were developed. These methodologies shift the focus from forces to energy and phase space, respectively, providing a more holistic and simplified approach to understanding physical systems. In this article, we will delve into the intricacies of Lagrangian and Hamiltonian mechanics, exploring how they resolve the complexities of Newtonian mechanics and the unique insights they bring to our comprehension of the dynamics of the physical world.
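As a concrete anchor for that discussion: each formalism compresses a system's dynamics into a single scalar function, the Lagrangian $L(q, \dot{q}, t)$ governed by the Euler-Lagrange equations, and the Hamiltonian $H(q, p, t)$ governed by Hamilton's equations:

$$\frac{d}{dt}\frac{\partial L}{\partial \dot{q}_i} - \frac{\partial L}{\partial q_i} = 0, \qquad \dot{q}_i = \frac{\partial H}{\partial p_i}, \quad \dot{p}_i = -\frac{\partial H}{\partial q_i}.$$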

Read more »