Use DuckDB in ensembldb to query Ensembl's genome annotations

I have been using ensembldb to query genome annotations locally. It stores the Ensembl annotations in an offline SQLite database. Replacing the database engine with DuckDB makes genome-wide queries faster, with only a small impact on gene-specific queries (depending on the usage). The DuckDB database file is also smaller, and it can be made even smaller by offloading the tables to external Parquet files.
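To make the distinction between genome-wide and gene-specific queries concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table, its column names, and its rows are made-up stand-ins for an annotation table, not real Ensembl data, and the sketch uses neither ensembldb nor DuckDB; it only shows the two query shapes being compared.

```python
import sqlite3

# Hypothetical, simplified stand-in for an annotation "gene" table.
# Column names and rows are illustrative assumptions, not real Ensembl data.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE gene ("
    "gene_id TEXT PRIMARY KEY, gene_name TEXT, gene_biotype TEXT, seq_name TEXT)"
)
con.executemany(
    "INSERT INTO gene VALUES (?, ?, ?, ?)",
    [
        ("G0001", "geneA", "protein_coding", "1"),
        ("G0002", "geneB", "protein_coding", "1"),
        ("G0003", "geneC", "lncRNA", "2"),
        ("G0004", "geneD", "protein_coding", "2"),
    ],
)

# Genome-wide query: scans and aggregates the whole table.
genome_wide = con.execute(
    "SELECT seq_name, gene_biotype, COUNT(*) AS n "
    "FROM gene GROUP BY seq_name, gene_biotype "
    "ORDER BY seq_name, gene_biotype"
).fetchall()

# Gene-specific query: looks up a single row by its key.
gene_specific = con.execute(
    "SELECT gene_name, seq_name FROM gene WHERE gene_id = ?", ("G0003",)
).fetchone()

print(genome_wide)
print(gene_specific)
```

The first query touches every row, which is the kind of workload where a different engine can plausibly make a difference. The second is a single keyed lookup.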

Thesis in LaTeX

A few months ago, I finished my PhD thesis in LaTeX. The WUSTL LaTeX template I could find has a long history; its implementation is relatively lengthy and includes unnecessary code. I didn’t feel comfortable building on top of it.

I ended up rewriting the template …

Fix Fira Code font ligatures and features

Fira Code has been my programming font of choice for a while. It’s also the default monospace font of my blog. I like its ligatures such as >= and connected lines ====== ------. It even renders progress bars nicely. It makes my plain text documents look neat.

That said, I …

Change the blog commenting system to utterances

My blog is statically generated, so it needs an external service for commenting. I chose Disqus when I started my blog because it was a popular choice, and it is free and easy to set up. However, there’s been increasing concern about its extensive user tracking, ads, and therefore a …

Identify the Ensembl release from versioned IDs

I often received data that was annotated with an unknown Ensembl release/version.

It could be the Ensembl IDs in a gene expression matrix, a VEP-annotated MAF file, or even a customized GTF. The documentation of those files wasn’t always clear about the annotation in use. However, it …

Generate Venn diagrams easily

I find myself generating Venn diagrams quite often. While there are many Venn diagram plotting libraries available, they don’t always fit my needs. My inputs are the set sizes rather than lists of observations. And after drawing a Venn diagram, I often edit it to …

Store GDC genome as a Seqinfo object

The Genomic Data Commons (GDC), hosted by NCI, is the place to harmonize past and future genomic data, such as the TCGA, TARGET, and CPTAC projects. GDC has its own genome reference, GRCh38.d1.vd1, which has 2,779 “chromosomes” including decoys and viral sequences. That said, the canonical chromosomes of GRCh38 …

Make Firefox fullscreen borderless on macOS

EDIT 2021-06-01: In Firefox 89+, there’s a default option “Hide Toolbar” in the fullscreen mode that automatically hides the toolbar. So the customization is no longer needed.

Firefox fullscreen on macOS by default contains the address bar and the tab bar. I usually don’t really need the full …