Research Engineer
Meta - Fundamental AI Research (FAIR) Labs
I am originally from Florianópolis (Brazil), and I've lived in New Jersey, Orlando, Toronto (now), São Paulo, and several smaller cities in the south of Brazil. I spent 2022 at Google AI as a Student Researcher, working with Lucas Theis and Johannes Ballé.
I'm interested in information theory, machine learning, and AI.
Lossless compression algorithms typically preserve the ordering in which data points are compressed. However, there are data types where order is not meaningful, such as collections of files, rows in a database, nodes in a graph, and, notably, datasets in machine learning applications.
Compressing with traditional algorithms is possible if we pick an order for the elements and communicate the corresponding ordered sequence. However, unless the order information is somehow removed during the encoding process, this procedure will be sub-optimal, because the order contains information and therefore more bits are used to represent the source than are truly necessary.
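As a rough illustration of how many bits the order can carry, the sketch below counts the distinct orderings of a collection and takes the base-2 logarithm. It assumes every ordering is equally likely, and the function name `order_bits` is hypothetical, chosen only for this example; it is not the coding algorithm itself, just the size of the potential saving.

```python
import math
from collections import Counter


def order_bits(items):
    """Bits needed to pick one ordering of `items` among all distinct orderings.

    Assumes all distinct orderings are equally likely (an illustrative
    assumption, not a property of any particular dataset).
    """
    counts = Counter(items)
    n = sum(counts.values())
    # Number of distinct orderings is the multinomial coefficient
    # n! / (c_1! * c_2! * ...), computed in log-space to avoid overflow.
    log_orderings = math.lgamma(n + 1) - sum(
        math.lgamma(c + 1) for c in counts.values()
    )
    return log_orderings / math.log(2)


if __name__ == "__main__":
    # Four distinct elements have 4! = 24 orderings, about 4.58 bits.
    print(order_bits(["a", "b", "c", "d"]))
    # Repeated elements reduce the count: "aab" has only 3 distinct orderings.
    print(order_bits("aab"))
    # A dataset of one million distinct points.
    print(order_bits(range(1_000_000)))
```

For a collection of n distinct elements this reduces to log2(n!), which grows roughly as n log2(n), so the overhead of communicating an arbitrary order becomes substantial for large datasets.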
In previous work, we gave a formal definition of non-sequential objects as random sets of equivalent sequences, which we call Combinatorial Random Variables (CRVs), along with a general class of computationally efficient algorithms that achieve the optimal compression rate of CRVs: Random Permutation Codes (RPCs). Specialized RPCs are given for multisets (Random Order Coding), graphs (Random Edge Coding), and partitions/clusterings (under review), providing new algorithms for compressing databases, social networks, and web data in the JSON file format.
Currently, I'm interested in the application of RPCs to reduce the memory footprint of vector databases.
April 2024 - I've moved to Montréal to start as a Research Engineer at FAIR Labs!
March 2024 - LASI and Shuffle Coding were accepted to ICLR 2024.
August 2023 - I started a second internship at FAIR (Meta AI) in information theory and generative modelling with Matthew Muckley.
April 2023 - Random Edge Coding and Action Matching were accepted to ICML 2023.
- ICML 2023 Workshop on Neural Compression and Information Theory, 2023
- Asymmetric Numeral Systems (ANS) codec in pure Python, 2021
- A tutorial on bits-back with Huffman coding, 2021
- Vectorized Run-Length Encoding, 2021
- Persisting lru_cache to disk while using hashable pandas objects for parallel experiments, 2020
- Darts, Dice, and Coins: Sampling from a Discrete Distribution by Keith Schwarz, 2011
For a complete list, please see my Google Scholar profile.