$$ \newcommand{\bone}{\mathbf{1}} \newcommand{\bbeta}{\mathbf{\beta}} \newcommand{\bdelta}{\mathbf{\delta}} \newcommand{\bepsilon}{\mathbf{\epsilon}} \newcommand{\blambda}{\mathbf{\lambda}} \newcommand{\bomega}{\mathbf{\omega}} \newcommand{\bpi}{\mathbf{\pi}} \newcommand{\bphi}{\mathbf{\phi}} \newcommand{\bvphi}{\mathbf{\varphi}} \newcommand{\bpsi}{\mathbf{\psi}} \newcommand{\bsigma}{\mathbf{\sigma}} \newcommand{\btheta}{\mathbf{\theta}} \newcommand{\btau}{\mathbf{\tau}} \newcommand{\ba}{\mathbf{a}} \newcommand{\bb}{\mathbf{b}} \newcommand{\bc}{\mathbf{c}} \newcommand{\bd}{\mathbf{d}} \newcommand{\be}{\mathbf{e}} \newcommand{\boldf}{\mathbf{f}} \newcommand{\bg}{\mathbf{g}} \newcommand{\bh}{\mathbf{h}} \newcommand{\bi}{\mathbf{i}} \newcommand{\bj}{\mathbf{j}} \newcommand{\bk}{\mathbf{k}} \newcommand{\bell}{\mathbf{\ell}} \newcommand{\bm}{\mathbf{m}} \newcommand{\bn}{\mathbf{n}} \newcommand{\bo}{\mathbf{o}} \newcommand{\bp}{\mathbf{p}} \newcommand{\bq}{\mathbf{q}} \newcommand{\br}{\mathbf{r}} \newcommand{\bs}{\mathbf{s}} \newcommand{\bt}{\mathbf{t}} \newcommand{\bu}{\mathbf{u}} \newcommand{\bv}{\mathbf{v}} \newcommand{\bw}{\mathbf{w}} \newcommand{\bx}{\mathbf{x}} \newcommand{\by}{\mathbf{y}} \newcommand{\bz}{\mathbf{z}} \newcommand{\bA}{\mathbf{A}} \newcommand{\bB}{\mathbf{B}} \newcommand{\bC}{\mathbf{C}} \newcommand{\bD}{\mathbf{D}} \newcommand{\bE}{\mathbf{E}} \newcommand{\bF}{\mathbf{F}} \newcommand{\bG}{\mathbf{G}} \newcommand{\bH}{\mathbf{H}} \newcommand{\bI}{\mathbf{I}} \newcommand{\bJ}{\mathbf{J}} \newcommand{\bK}{\mathbf{K}} \newcommand{\bL}{\mathbf{L}} \newcommand{\bM}{\mathbf{M}} \newcommand{\bN}{\mathbf{N}} \newcommand{\bP}{\mathbf{P}} \newcommand{\bQ}{\mathbf{Q}} \newcommand{\bR}{\mathbf{R}} \newcommand{\bS}{\mathbf{S}} \newcommand{\bT}{\mathbf{T}} \newcommand{\bU}{\mathbf{U}} \newcommand{\bV}{\mathbf{V}} \newcommand{\bW}{\mathbf{W}} \newcommand{\bX}{\mathbf{X}} \newcommand{\bY}{\mathbf{Y}} \newcommand{\bZ}{\mathbf{Z}} \newcommand{\bsa}{\boldsymbol{a}} \newcommand{\bsb}{\boldsymbol{b}} \newcommand{\bsc}{\boldsymbol{c}} \newcommand{\bsd}{\boldsymbol{d}} \newcommand{\bse}{\boldsymbol{e}} \newcommand{\bsoldf}{\boldsymbol{f}} \newcommand{\bsg}{\boldsymbol{g}} \newcommand{\bsh}{\boldsymbol{h}} \newcommand{\bsi}{\boldsymbol{i}} \newcommand{\bsj}{\boldsymbol{j}} \newcommand{\bsk}{\boldsymbol{k}} \newcommand{\bsell}{\boldsymbol{\ell}} \newcommand{\bsm}{\boldsymbol{m}} \newcommand{\bsn}{\boldsymbol{n}} \newcommand{\bso}{\boldsymbol{o}} \newcommand{\bsp}{\boldsymbol{p}} \newcommand{\bsq}{\boldsymbol{q}} \newcommand{\bsr}{\boldsymbol{r}} \newcommand{\bss}{\boldsymbol{s}} \newcommand{\bst}{\boldsymbol{t}} \newcommand{\bsu}{\boldsymbol{u}} \newcommand{\bsv}{\boldsymbol{v}} \newcommand{\bsw}{\boldsymbol{w}} \newcommand{\bsx}{\boldsymbol{x}} \newcommand{\bsy}{\boldsymbol{y}} \newcommand{\bsz}{\boldsymbol{z}} \newcommand{\bsA}{\boldsymbol{A}} \newcommand{\bsB}{\boldsymbol{B}} \newcommand{\bsC}{\boldsymbol{C}} \newcommand{\bsD}{\boldsymbol{D}} \newcommand{\bsE}{\boldsymbol{E}} \newcommand{\bsF}{\boldsymbol{F}} \newcommand{\bsG}{\boldsymbol{G}} \newcommand{\bsH}{\boldsymbol{H}} \newcommand{\bsI}{\boldsymbol{I}} \newcommand{\bsJ}{\boldsymbol{J}} \newcommand{\bsK}{\boldsymbol{K}} \newcommand{\bsL}{\boldsymbol{L}} \newcommand{\bsM}{\boldsymbol{M}} \newcommand{\bsN}{\boldsymbol{N}} \newcommand{\bsP}{\boldsymbol{P}} \newcommand{\bsQ}{\boldsymbol{Q}} \newcommand{\bsR}{\boldsymbol{R}} \newcommand{\bsS}{\boldsymbol{S}} \newcommand{\bsT}{\boldsymbol{T}} \newcommand{\bsU}{\boldsymbol{U}} 
\newcommand{\bsV}{\boldsymbol{V}} \newcommand{\bsW}{\boldsymbol{W}} \newcommand{\bsX}{\boldsymbol{X}} \newcommand{\bsY}{\boldsymbol{Y}} \newcommand{\bsZ}{\boldsymbol{Z}} \newcommand{\calA}{\mathcal{A}} \newcommand{\calB}{\mathcal{B}} \newcommand{\calC}{\mathcal{C}} \newcommand{\calD}{\mathcal{D}} \newcommand{\calE}{\mathcal{E}} \newcommand{\calF}{\mathcal{F}} \newcommand{\calG}{\mathcal{G}} \newcommand{\calH}{\mathcal{H}} \newcommand{\calI}{\mathcal{I}} \newcommand{\calJ}{\mathcal{J}} \newcommand{\calK}{\mathcal{K}} \newcommand{\calL}{\mathcal{L}} \newcommand{\calM}{\mathcal{M}} \newcommand{\calN}{\mathcal{N}} \newcommand{\calO}{\mathcal{O}} \newcommand{\calP}{\mathcal{P}} \newcommand{\calQ}{\mathcal{Q}} \newcommand{\calR}{\mathcal{R}} \newcommand{\calS}{\mathcal{S}} \newcommand{\calT}{\mathcal{T}} \newcommand{\calU}{\mathcal{U}} \newcommand{\calV}{\mathcal{V}} \newcommand{\calW}{\mathcal{W}} \newcommand{\calX}{\mathcal{X}} \newcommand{\calY}{\mathcal{Y}} \newcommand{\calZ}{\mathcal{Z}} \newcommand{\R}{\mathbb{R}} \newcommand{\C}{\mathbb{C}} \newcommand{\N}{\mathbb{N}} \newcommand{\Z}{\mathbb{Z}} \newcommand{\F}{\mathbb{F}} \newcommand{\Q}{\mathbb{Q}} \DeclareMathOperator*{\argmax}{arg\,max} \DeclareMathOperator*{\argmin}{arg\,min} \newcommand{\nnz}[1]{\mbox{nnz}(#1)} \newcommand{\dotprod}[2]{\langle #1, #2 \rangle} \newcommand{\ignore}[1]{} \let\Pr\relax \DeclareMathOperator*{\Pr}{\mathbf{Pr}} \newcommand{\E}{\mathbb{E}} \DeclareMathOperator*{\Ex}{\mathbf{E}} \DeclareMathOperator*{\Var}{\mathbf{Var}} \DeclareMathOperator*{\Cov}{\mathbf{Cov}} \DeclareMathOperator*{\stddev}{\mathbf{stddev}} \DeclareMathOperator*{\avg}{avg} \DeclareMathOperator{\poly}{poly} \DeclareMathOperator{\polylog}{polylog} \DeclareMathOperator{\size}{size} \DeclareMathOperator{\sgn}{sgn} \DeclareMathOperator{\dist}{dist} \DeclareMathOperator{\vol}{vol} \DeclareMathOperator{\spn}{span} \DeclareMathOperator{\supp}{supp} \DeclareMathOperator{\tr}{tr} \DeclareMathOperator{\Tr}{Tr} \DeclareMathOperator{\codim}{codim} \DeclareMathOperator{\diag}{diag} \newcommand{\PTIME}{\mathsf{P}} \newcommand{\LOGSPACE}{\mathsf{L}} \newcommand{\ZPP}{\mathsf{ZPP}} \newcommand{\RP}{\mathsf{RP}} \newcommand{\BPP}{\mathsf{BPP}} \newcommand{\P}{\mathsf{P}} \newcommand{\NP}{\mathsf{NP}} \newcommand{\TC}{\mathsf{TC}} \newcommand{\AC}{\mathsf{AC}} \newcommand{\SC}{\mathsf{SC}} \newcommand{\SZK}{\mathsf{SZK}} \newcommand{\AM}{\mathsf{AM}} \newcommand{\IP}{\mathsf{IP}} \newcommand{\PSPACE}{\mathsf{PSPACE}} \newcommand{\EXP}{\mathsf{EXP}} \newcommand{\MIP}{\mathsf{MIP}} \newcommand{\NEXP}{\mathsf{NEXP}} \newcommand{\BQP}{\mathsf{BQP}} \newcommand{\distP}{\mathsf{dist\textbf{P}}} \newcommand{\distNP}{\mathsf{dist\textbf{NP}}} \newcommand{\eps}{\epsilon} \newcommand{\lam}{\lambda} \newcommand{\dleta}{\delta} \newcommand{\simga}{\sigma} \newcommand{\vphi}{\varphi} \newcommand{\la}{\langle} \newcommand{\ra}{\rangle} \newcommand{\wt}[1]{\widetilde{#1}} \newcommand{\wh}[1]{\widehat{#1}} \newcommand{\ol}[1]{\overline{#1}} \newcommand{\ul}[1]{\underline{#1}} \newcommand{\ot}{\otimes} \newcommand{\zo}{\{0,1\}} \newcommand{\co}{:} %\newcommand{\co}{\colon} \newcommand{\bdry}{\partial} \newcommand{\grad}{\nabla} \newcommand{\transp}{^\intercal} \newcommand{\inv}{^{-1}} \newcommand{\symmdiff}{\triangle} \newcommand{\symdiff}{\symmdiff} \newcommand{\half}{\tfrac{1}{2}} \newcommand{\bbone}{\mathbbm 1} \newcommand{\Id}{\bbone} \newcommand{\SAT}{\mathsf{SAT}} \newcommand{\bcalG}{\boldsymbol{\calG}} \newcommand{\calbG}{\bcalG} \newcommand{\bcalX}{\boldsymbol{\calX}} 
\newcommand{\calbX}{\bcalX} \newcommand{\bcalY}{\boldsymbol{\calY}} \newcommand{\calbY}{\bcalY} \newcommand{\bcalZ}{\boldsymbol{\calZ}} \newcommand{\calbZ}{\bcalZ} $$

Theoretical Foundations

Machine Learning, Computer Science, Mathematics

  • Notes on 'The Llama 3 Herd of Models'

    Notes on the new Llama 3.1 technical report. It's a long paper, but a well-written one, with lots of interesting technical details and design choices.

  • Playing Sound Voltex at Home: Setting Up Unnamed SDVX Clone with the Yuancon SDVX Controller

    Rhythm is just a $200 controller and some hopefully-not-too-complicated open-source software setup away! This beginner's guide demystifies the process of setting up Sound Voltex at home, using a custom SDVX controller with Unnamed SDVX Clone.

  • Creating Trackback Requests for Static Sites

    A simple guide on creating manual Trackback requests for static sites to increase visibility and discoverability. (A minimal example of the ping itself is sketched below.)
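
    For a taste of what the post covers: per the Trackback specification, a ping is just a form-encoded HTTP POST to the target post's Trackback URL. A minimal sketch in Python (the URLs and field values here are placeholders):

    ```python
    import requests

    # Hypothetical Trackback ping URL of the post being linked to.
    TRACKBACK_URL = "https://example.com/blog/some-post/trackback"

    # Per the Trackback specification, a ping is a form-encoded POST;
    # every field except `url` is optional.
    payload = {
        "url": "https://my-static-site.example/posts/my-reply",  # your post
        "title": "My Reply",
        "excerpt": "A short summary of the post doing the linking.",
        "blog_name": "My Static Site",
    }

    resp = requests.post(TRACKBACK_URL, data=payload, timeout=10)

    # A successful ping returns an XML body containing <error>0</error>.
    print(resp.status_code, resp.text)
    ```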

  • A Unified Framework for High-Dimensional Analysis of M-Estimators with Decomposable Regularizers: A Guided Walkthrough

    Imagine doing high-dimensional statistical inference where, instead of analyzing each setting with its own specific low-dimensional structure (such as linear regression with sparsity constraints, or estimation of structured covariance matrices), you could perform a single unified analysis using the appropriate notions.

    Well, you're in luck! 'A Unified Framework for High-Dimensional Analysis of \( M \)-Estimators with Decomposable Regularizers' by Negahban, Ravikumar, Wainwright, and Yu shows that the \( \ell_2 \) difference between any regularized \(M\)-estimator and its true parameter can be bounded if the regularization function is decomposable, and the loss function satisfies restricted strong convexity.

    The goal of this post is to provide intuition for the result and develop sufficient background for understanding its proof, followed by a walkthrough of the proof itself. (The key objects are restated below.)
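
    For orientation, the two hypotheses can be stated compactly (notation paraphrased from the paper; see the paper for the precise conditions). A regularized \(M\)-estimator solves

    $$ \hat{\theta}_{\lambda} \in \argmin_{\theta} \left\{ \calL(\theta; Z_1^n) + \lambda \, \calR(\theta) \right\}, $$

    and the regularizer \(\calR\) is decomposable with respect to a subspace pair \((\calM, \ol{\calM}^{\perp})\) if \(\calR(u + v) = \calR(u) + \calR(v)\) for all \(u \in \calM\) and \(v \in \ol{\calM}^{\perp}\).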

  • The CMU Steam Tunnels and Wean 9

    If you're curious about the infamous steam tunnels at CMU, or what the view from the roof of Wean Hall looks like, this post is for you!

  • CMU 15712 Advanced Operating Systems and Distributed Systems Course Review

    15-712 Advanced OS was an excellent seminar-based graduate course that took us on a whirlwind tour through many of the most seminal papers in the SIGOPS Hall of Fame, across several systems domains. It will prepare you to be a great systems designer and researcher. In this post, I will share my experience in the class, the course structure and content, what I thought were the biggest takeaways, and who this class might be suitable for.

  • Score-Based Diffusion Models

    Score-based diffusion models are a promising direction for generative modeling, as they improve on both likelihood-based approaches such as variational autoencoders and adversarial methods such as Generative Adversarial Networks (GANs). In this blog post, we survey recent developments in the field centered around the line of results developed in (Song & Ermon, 2019), analyze the current strengths and limitations of score-based diffusion models, and discuss possible future directions that can address their drawbacks. (The score function at the heart of these models is recalled below.) Joint work with Owen Wang.
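
    For readers new to the area, the central object is the score function: the model learns the gradient of the log-density rather than the density itself, and sampling follows it with injected noise, e.g. via Langevin dynamics:

    $$ s_{\theta}(\bx) \approx \grad_{\bx} \log p(\bx), \qquad \bx_{t+1} = \bx_t + \frac{\epsilon}{2} s_{\theta}(\bx_t) + \sqrt{\epsilon}\, \bz_t, \quad \bz_t \sim \calN(0, \Id). $$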

  • The Art of LaTeX: Common Mistakes, and Advice for Typesetting Beautiful, Delightful Proofs

    When was the first time you had to use LaTeX? If you are like most people, it was probably suddenly forced upon you in your first math or CS class that required writing proofs, with minimal guidance on how to get started. Unfortunately, this means that while many people have good operational knowledge of LaTeX, small mistakes persist and best practices go unfollowed; they go uncorrected by TAs, either because they are not severe enough to warrant a note or because the TAs themselves are unaware of them.

    In this post, we cover some common mistakes made by LaTeX practitioners (even in heavily cited papers), and how to address them; one representative example is shown below.
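
    As a taste of the kind of fix involved (this particular pair is illustrative, not necessarily drawn from the post's own list):

    ```latex
    % Common mistake: "log" set in italics as if it were a product of
    % variables, and parentheses that do not stretch to fit their contents.
    $log(\frac{a}{b})$

    % Fix: \log is a proper operator (upright, correctly spaced), and
    % \left/\right size the delimiters to match the fraction.
    $\log\left(\frac{a}{b}\right)$
    ```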

  • A Concise Proof of the Central Limit Theorem, and Its Actually Useful Version, the Berry-Esseen Theorem

    The Central Limit Theorem is widely used in statistics and machine learning, as it allows us to assume that, given enough samples, the (suitably normalized) sample mean approximately follows a normal distribution. This holds even if the samples come from a distribution that is not normally distributed. In this post, we prove the Central Limit Theorem, and then take a look at the Berry-Esseen Theorem, which provides a quantitative bound on the rate of convergence and can therefore actually be used to derive theoretical bounds. (Both statements are given below.)
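
    Concretely, in standard form: let \(X_1, \ldots, X_n\) be i.i.d. with mean \(\mu\), variance \(\sigma^2\), and third absolute moment \(\rho = \E|X_1 - \mu|^3 < \infty\), and let \(S_n = \frac{1}{\sigma \sqrt{n}} \sum_{i=1}^n (X_i - \mu)\). Then

    $$ S_n \xrightarrow{d} \calN(0, 1) \quad \text{(CLT)}, \qquad \sup_{x \in \R} \left| \Pr[S_n \le x] - \Phi(x) \right| \le \frac{C \rho}{\sigma^3 \sqrt{n}} \quad \text{(Berry-Esseen)}, $$

    where \(\Phi\) is the standard normal CDF and \(C\) is an absolute constant (known to be less than \(1/2\)).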

  • Reinforcement Learning Policy Optimization: Deriving the Policy Gradient Update

    Reinforcement learning algorithms that learn an explicit policy (as opposed to implicit policy methods like \(\epsilon\)-greedy) optimize it by updating the policy parameters in the direction of the gradient of the expected return. However, the precise environment dynamics are usually not known to us, and the state space is usually too large to enumerate, which means that we cannot compute this gradient analytically. In this post, we derive the policy gradient update from scratch, and show how it can be approximated by sampling sufficiently many trajectories. (The resulting estimator is stated below.)
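
    For reference, the punchline of the derivation in its standard (REINFORCE-style) form, together with its Monte Carlo approximation over \(N\) sampled trajectories:

    $$ \grad_{\theta} J(\theta) = \Ex_{\tau \sim \pi_{\theta}} \left[ \left( \sum_{t=0}^{T} \grad_{\theta} \log \pi_{\theta}(a_t \mid s_t) \right) R(\tau) \right] \approx \frac{1}{N} \sum_{i=1}^{N} \left( \sum_{t=0}^{T} \grad_{\theta} \log \pi_{\theta}\left(a_t^{(i)} \mid s_t^{(i)}\right) \right) R\left(\tau^{(i)}\right). $$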

  • Pseudo-determinism for Graph Streaming Problems

    Given a fixed input for a search problem, pseudo-deterministic algorithms produce the same answer over multiple independent runs, with high probability. For example, we can efficiently find a certificate for inequality of multivariate polynomials pseudo-deterministically, but it is not known how to do so deterministically. The same notion can be extended to the streaming model: the problem of finding a nonzero element from a turnstile stream was previously shown to require linear space for both deterministic and pseudo-deterministic algorithms. Another model of streaming problems is that of graphs, where edge insertions and deletions occur along a stream; natural problems include connectivity, bipartiteness, and colorability. While randomized and deterministic graph streaming algorithms are mostly well-studied, we investigate pseudo-deterministic space lower and upper bounds for graph-theoretic streaming problems.

  • Graphical Bayesian Networks with Topic Modeling Priors for Predicting Asset Covariances

    Covariance matrix prediction is a long-standing challenge in modern portfolio theory and quantitative finance. In this project, we investigate the effectiveness of Bayesian networks in predicting the covariance matrix of financial assets (specifically a subset of the S&P 500), evaluated against Heterogeneous Autoregressive (HAR) models. In particular, we consider both HAR-DRD, based on the DRD decomposition of the covariance matrix, and Graphical HAR (GHAR)-DRD, which additionally makes use of graphical relationships between the assets. To build the graph representing these relationships, we apply Latent Dirichlet Allocation (LDA) to the 10-K filings of each of the companies, and infer edges based on topic overlap; a sketch of this graph-construction step follows below.
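
    A minimal sketch of the graph-construction step, assuming the 10-K filings are already loaded as plain-text strings (the topic count and overlap threshold here are illustrative placeholders, not the project's actual settings):

    ```python
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    def topic_overlap_graph(filings, n_topics=20, threshold=0.5):
        """Connect assets whose 10-K filings have similar topic mixtures.

        `filings` is a list of plain-text 10-K documents, one per asset.
        Returns a list of edges (i, j) between asset indices.
        """
        counts = CountVectorizer(stop_words="english").fit_transform(filings)
        lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
        theta = lda.fit_transform(counts)  # per-document topic mixtures
        sim = theta @ theta.T              # dot-product overlap between mixtures
        return [(i, j)
                for i in range(len(filings))
                for j in range(i + 1, len(filings))
                if sim[i, j] > threshold]
    ```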

  • Analysis of Symmetry and Conventions in Off-Belief Learning (OBL) in Hanabi

    Hanabi has been proposed as the new frontier for developing strategies in cooperative AI, a nascent area of AI research. A recent algorithm developed for multi-agent reinforcement learning in a cooperative context is Off-Belief Learning (OBL), which is based on iterated reasoning starting from a base policy. We investigate whether policies learnt by OBL agents in Hanabi in the zero-shot coordination (ZSC) context are invariant across symmetries of the game, and whether any conventions formed during training are arbitrary or natural, both of which are desirable properties.

  • Improving Domain Adaptation of Transformer Models For Generating Reddit Comments

    We build on the recent success of transformer-based large language models by investigating several methods that empirically improve their performance in domain adaptation. We use a pre-trained GPT-2 model, fine-tune it on 5 different subreddits, and order the training data in different ways based on our priors about the input, to see how this affects the prediction quality of the trained model. We propose a new metric for evaluating causal language modeling tasks, APES (Average Perplexity Evaluation for Sentences), to address the limitations of existing metrics, and apply it to our results. Our results are evaluated against both LSTM and GPT-2 baselines.

  • Efficient Low Rank Approximation via Affine Embeddings

    Suppose you have an \(n \times d\) matrix \(A\), where both dimensions are large. This could represent something like a customer-product matrix used in online recommender systems, where each cell \(A_{i,j}\) denotes how many times customer \(i\) purchased item \(j\). It is then typically the case that \(A\) can be well-approximated by a low-rank matrix: continuing the example, there might be only a few dominant patterns that describe purchasing behavior in \(A\), with the rest being noise. If we can find such a low-rank approximation, we can achieve significant space savings and make the data more interpretable. In this post, we explore how affine embeddings via the CountSketch matrix allow us to perform low-rank approximation in time \(O\left(\nnz{A} + (n+d) \poly\left(\frac{k}{\epsilon}\right)\right)\); a minimal CountSketch example is included below.
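
    To make the key primitive concrete, here is a minimal sketch of applying a CountSketch matrix \(S\) to \(A\) in \(O(\nnz{A})\) time; the full low-rank approximation algorithm in the post composes such sketches:

    ```python
    import numpy as np
    import scipy.sparse as sp

    def countsketch(A, m, seed=0):
        """Compute S @ A for a CountSketch S with m rows, in O(nnz(A)) time.

        Each row i of A is hashed to a uniformly random bucket h(i) in [m]
        and given a random sign s(i); row i then contributes s(i) * A[i]
        to bucket h(i).
        """
        rng = np.random.default_rng(seed)
        n = A.shape[0]
        h = rng.integers(0, m, size=n)       # bucket for each row of A
        s = rng.choice([-1.0, 1.0], size=n)  # random sign for each row of A
        # S is the m x n matrix with S[h(i), i] = s(i); build it sparsely.
        S = sp.csr_matrix((s, (h, np.arange(n))), shape=(m, n))
        return S @ A

    # Usage: sketch a tall sparse matrix down to m = 200 rows.
    A = sp.random(10_000, 50, density=0.01, format="csr", random_state=0)
    print(countsketch(A, m=200).shape)  # (200, 50)
    ```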

  • CMU 15-441/641 Computer Networks Course Review

    Computer Networks is one of the lesser-known systems classes at Carnegie Mellon that turned out to be surprisingly fun and informative. In this post I'll talk about the projects and content covered, followed by my own thoughts on the usefulness of the class and who should take it.

  • Solving Genshin Impact's Ancient Azure Stars quest in Linear Time

    It is summer yet again, and miHoYo has blessed us with the Summertime Odyssey event that explores the (often dark and painful) backstories of the cast comprising Kazuha, Xinyan, Fischl, and Mona, set once again on the Golden Apple Archipelago. One puzzle that I found interesting from a computational perspective forms a major part of Mona's questline, Ancient Azure Stars, and is the main topic of this post. In this puzzle, you are given a constellation-like pattern that you need to imitate. The puzzle is interesting because even though its mechanics allow for an exponential search space (and multiple possible solutions), clever algorithmic techniques can speed up finding a valid solution to almost linear time. This post is meant to be accessible to people with only some exposure to algorithms, and takes things step by step.

  • Impagliazzo's Five Worlds, or The Computational (Im)Possibilities of The World That We Live In

    Most people have probably heard of the P = NP? problem in some shape or form: it asks whether the class of languages decidable in deterministic polynomial time is the same as the class of languages decidable in non-deterministic polynomial time. However, there are also several interesting intermediate possibilities that can arise if it were the case that P != NP, as this post explores.

  • My Sharing at the Hwa Chong Undergrad Alumni Forum

    Last week, I had the wonderful opportunity to participate in Hwa Chong's Undergrad Alumni Forum to share my experiences studying at Carnegie Mellon's School of Computer Science, and to answer questions that the students had about studying in the US and pursuing Computer Science as a degree. I was a student at Hwa Chong Institution from 2010 to 2015, during which I made many happy memories and learnt a lot about myself and the world. I personally found these sharing sessions really helpful back when I was still a student, and am very grateful for the chance to pass it on and hopefully inspire and encourage some of the participants to pursue their education overseas. It has brought about incredible personal and intellectual growth for me, and exposed me to people and ideas that I would otherwise never have had the chance to meet.

  • The Delightful Consequences of the Graph Minor Theorem

    The graph minor theorem, also known as the Robertson–Seymour theorem, is generally regarded as the most important result in graph theory. In this post we introduce the graph minor theorem (stated below), provide the necessary background, and discover its delightfully deep algorithmic and philosophical implications.
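
    For reference, the theorem itself is remarkably short to state: finite graphs are well-quasi-ordered by the minor relation. Equivalently:

    $$ \text{In every infinite sequence } G_1, G_2, G_3, \ldots \text{ of finite graphs, there exist } i < j \text{ such that } G_i \text{ is a minor of } G_j. $$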