- Unit Testing dbt Macros: A workaround for dbt's unit testing limitations Ever wished you could catch that broken SQL logic before it wrecks your dashboards? With dbt 1.8's new unit testing capabilities, you can finally sleep at night! However, support for testing macros is still limited. Let's explore how to test both models and macros with a workaround. Continue reading...
- Data Observability is not a tool: understanding data quality at the source, in transformations and in governance Have you ever wasted time or money because you made a decision based on incorrect data? Then you'll appreciate good data quality. Buying an observability tool, however, might not be the solution to your data quality issues. Let's explore how data quality issues arise at the source, in transformations and in data governance, and find the appropriate solutions to those problems. Continue reading...
- Data Ingestion Pipelines Without Headaches: 8 simple steps Data, like wine and cheese, becomes more valuable when combined. However, to combine data, you must first retrieve it in a reliable and scalable manner. This post covers the 8 steps of a data ingestion pipeline and 3 overarching topics to ensure reliability and quality over time. Continue reading...
- Adding Geo and ISP data to your analytics hits with Snowplow and Cloudflare Workers In this post we'll look at how to add geo and ISP data to your analytics hits with Snowplow and Cloudflare Workers, an approach that you can also re-use for GA4. Continue reading...
- Own your web analytics pipeline for €0.02 per day: Snowplow, Terraform, dbt, BigQuery and Docker Running Snowplow for your (web) analytics pipeline too expensive? Here's a €0.02/day minimal, serverless version of Snowplow open source that you can deploy for your blog or website with Terraform (on GCP/BigQuery) in 5 minutes, giving you full ownership of a web and app analytics pipeline from data collection to custom data models (goodbye Google Analytics). Continue reading...
- Fetching IPv4 CIDR ranges from AWS, GCP, Azure and Cloudflare for bot detection with Python Bots usually run on one of the major cloud providers. Identifying them can be a big factor in determining the quality of your traffic. Whether that's for web analytics or threat mitigation, it's useful to have an overview of IP ranges to identify in bot scoring. Continue reading...
- Automatically Lint and Publish your Snowplow Schemas with GitHub Actions Snowplow schemas are a great way to codify expected data in JSON format. Using GitHub Actions you can make them even more powerful by automatically checking for typos, validity, and other errors, as well as publishing them directly to your production environment with no manual action. Continue reading...
- Using Search Console BigQuery data with NLP to extract and group by the most important topics With the recent update to Google Search Console (GSC) allowing exports to BigQuery, we can now leverage some powerful features of BigQuery to do text processing and extract topics from our search queries with a simple JavaScript UDF. Continue reading...
- Why web analytics is still a mess in 2023 Web analytics still feels 'messy' in 2023. Why is it so hard to solve the problem of web analytics? Let's dive into some of the misconceptions that fuel the mess, like the ideas that websites are easy, are visited by people, that web analytics is about tracking people, that we have all the tools we need, and that web analytics is actually important. Continue reading...
- Mastering Time in dbt: Incremental Merging of Estimates and Actuals for large datasets Managing incrementality (change over time) in a large database is hard. dbt can alleviate some of the pain by making it easier to choose among the available incremental strategies. Let's look at updating an example sales table with actuals and estimates over time. Continue reading...