Step-by-Step Guide to Generating Synthetic Data by Sampling From Univariate Distributions
Learn how to create synthetic data when your project runs low on data or when you need it for simulations
Data is the fuel of data science projects. But what if observations are scarce, expensive, or difficult to measure? Synthetic data can be the solution. Synthetic data is artificially generated data that mimics the statistical properties of real-world events. I will demonstrate how to create continuous synthetic data by sampling from univariate distributions. First, I will show how to evaluate systems and processes by simulation, where we need to choose a probability distribution and specify its parameters. Second, I will demonstrate how to generate samples that mimic the properties of an existing data set, that is, random variables distributed according to a probabilistic model fitted to that data. All examples are created using scipy and the distfit library.
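To make the two use cases concrete before diving in, here is a minimal sketch using only scipy. The distribution choices (normal and gamma), the parameter values, and the stand-in `observed` array are assumptions made purely for illustration; they are not taken from a real data set.

```python
import numpy as np
from scipy import stats

# Seeded generator so the sketch is reproducible.
rng = np.random.default_rng(42)

# Use case 1: simulation. We pick a distribution and its parameters ourselves
# (a normal with mean 10 and standard deviation 2 is an arbitrary assumption).
simulated = stats.norm.rvs(loc=10.0, scale=2.0, size=1000, random_state=rng)

# Use case 2: mimic an existing data set. 'observed' is a hypothetical
# stand-in for real measurements; replace it with your own data.
observed = stats.gamma.rvs(a=2.0, scale=3.0, size=500, random_state=rng)

# Estimate the gamma parameters from the data (location fixed at 0),
# then draw new synthetic samples from the fitted model.
shape, loc, scale = stats.gamma.fit(observed, floc=0)
synthetic = stats.gamma.rvs(shape, loc=loc, scale=scale, size=1000, random_state=rng)

print(f"simulated mean: {simulated.mean():.2f}")
print(f"observed mean: {observed.mean():.2f}, synthetic mean: {synthetic.mean():.2f}")
```

In this sketch the gamma family is assumed to be known in advance. In practice the best-fitting distribution is usually unknown, which is the kind of problem a library such as distfit is meant to help with.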