feat: update plot sample to 1000 rows #458
While making a sample line plot with Salem, I noticed that sampling only 100 rows loses some important shape information. Most screens are more than 1000 pixels wide, so 1000 rows seems a reasonable default.
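A rough back-of-envelope sketch (not part of this PR) of why 100 rows can lose shape: if a narrow feature, such as a spike, covers a small fraction of the rows, a uniform random sample has to include at least one of those rows for the feature to appear in the plot at all. The function name and the 0.1% spike are hypothetical, and the formula treats draws as independent, which closely approximates sampling without replacement when the table is much larger than the sample.

```python
def prob_feature_sampled(feature_fraction: float, sampling_n: int) -> float:
    """Approximate chance that a uniform random sample of sampling_n rows
    includes at least one row from a feature covering feature_fraction of
    the table (a necessary condition for the feature to show up in the plot).

    Treats the draws as independent; with a table much larger than the
    sample this closely approximates sampling without replacement.
    """
    return 1.0 - (1.0 - feature_fraction) ** sampling_n


if __name__ == "__main__":
    # Hypothetical feature: a spike spanning 0.1% of the rows.
    for n in (100, 500, 1000):
        p = prob_feature_sampled(0.001, n)
        print(f"sampling_n={n:>4}: P(spike sampled) ~ {p:.3f}")
```

Under these assumptions the spike is sampled about 9.5% of the time with 100 rows and about 63% of the time with 1000 rows.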
 def _compute_plot_data(self, data):
     # TODO: Cache the sampling data in the PlotAccessor.
-    sampling_n = self.kwargs.pop("sampling_n", 100)
+    sampling_n = self.kwargs.pop("sampling_n", 1000)
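The change itself only bumps the default of the `sampling_n` keyword from 100 to 1000. The sketch below is a minimal, hypothetical illustration of how such a keyword could drive the sampling step; `PlotAccessorSketch`, the `seed` parameter, and the sort-after-sampling step are assumptions for illustration and are not the library's actual implementation.

```python
import random
from typing import Any, List, Optional, Sequence

DEFAULT_SAMPLING_N = 1000  # default proposed in this PR (previously 100)


class PlotAccessorSketch:
    """Hypothetical stand-in for a plot accessor; not the library's real class."""

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = dict(kwargs)

    def _compute_plot_data(
        self, data: Sequence[Any], seed: Optional[int] = None
    ) -> List[Any]:
        # pop() consumes the option so it is not passed on as a plotting keyword.
        sampling_n = self.kwargs.pop("sampling_n", DEFAULT_SAMPLING_N)
        if len(data) <= sampling_n:
            return list(data)  # small inputs are plotted in full
        rng = random.Random(seed)  # hypothetical `seed` for reproducibility
        # Draw row positions without replacement, then sort them so the sampled
        # rows keep their original order and a line plot is drawn left to right.
        positions = sorted(rng.sample(range(len(data)), sampling_n))
        return [data[i] for i in positions]


if __name__ == "__main__":
    rows = list(range(10_000))
    print(len(PlotAccessorSketch()._compute_plot_data(rows, seed=0)))  # 1000
    print(len(PlotAccessorSketch(sampling_n=100)._compute_plot_data(rows, seed=0)))  # 100
```

Because the option is popped, a second call on the same accessor falls back to the default, which is one reason caching the sample (per the TODO) may be worthwhile.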
Maybe 500 or so would be better? 640x480 was a very common screen resolution in the 1990s.
I investigated how sample size affects how well a sample captures the shape of a dataset. My findings (see document: https://docs.google.com/document/d/1KaIF7zX-7seXsb-rohl56jYjhjFdLc-HlNOYfTM1Zfw/edit?tab=t.0) indicate that:
- Samples of size 500 (sampling_n=500) broadly reflect the same shape as a sample of 1000.
- A sample size of 1000 yields a closer approximation to the true underlying distribution.
Could we proceed with a sample size of 1000? Plotting the denser sample gives a more informative representation of the data.
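A small, self-contained way to check the same trend locally is sketched below. It is not the methodology from the linked document: the log-normal "column", the decile-based error metric, and all function names are assumptions chosen for illustration. For each sample size it draws uniform random samples without replacement and measures how far the sample's deciles land from the population's deciles; this error is expected to shrink roughly in proportion to 1/sqrt(n).

```python
import random
import statistics
from typing import Dict, List, Sequence


def decile_error(sample: Sequence[float], pop_deciles: List[float]) -> float:
    """Mean absolute gap between a sample's deciles and the population's deciles."""
    sample_deciles = statistics.quantiles(sample, n=10)
    return statistics.fmean(abs(s - p) for s, p in zip(sample_deciles, pop_deciles))


def compare_sample_sizes(
    sizes: Sequence[int] = (100, 500, 1000),
    population_size: int = 50_000,
    trials: int = 50,
    seed: int = 0,
) -> Dict[int, float]:
    """Average decile error of uniform random samples, per sample size."""
    rng = random.Random(seed)
    # Hypothetical skewed column: log-normal values stand in for real table data.
    population = [rng.lognormvariate(0.0, 1.0) for _ in range(population_size)]
    pop_deciles = statistics.quantiles(population, n=10)
    results: Dict[int, float] = {}
    for n in sizes:
        # Sample rows without replacement, as a plot sample of n rows would.
        errors = [
            decile_error(rng.sample(population, n), pop_deciles)
            for _ in range(trials)
        ]
        results[n] = statistics.fmean(errors)
    return results


if __name__ == "__main__":
    for n, err in compare_sample_sizes().items():
        print(f"sampling_n={n:>4}: mean decile error {err:.4f}")
```

Averaging over many trials keeps the comparison from hinging on one lucky or unlucky draw; on real data, the columns being plotted would replace the synthetic population.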
Merge-on-green attempted to merge your PR for 6 hours, but it was not mergeable because either one of your required status checks failed, one of your required reviews was not approved, or there is a do not merge label. Learn more about your required status checks here: https://help.github.com/en/github/administering-a-repository/enabling-required-status-checks. You can remove and reapply the label to re-run the bot.