yoda is hosted by Hepforge, IPPP Durham

YODA — Yet more Objects for Data Analysis

small, mean and full of Jedi magic

YODA is a small set of data analysis (specifically histogramming) classes being developed by MCnet members as a lightweight common system for MC event generator validation analyses, particularly as the core histogramming system in Rivet.

YODA is a refreshingly clean, natural and powerful way to do histogramming... and there are plenty of improvements still to come. Our mission is to create the most powerful, expressive, and focused approach to binned computational data handling, with the nicest possible balance of power and simplicity in the user interface. We hope you'll agree it's a good thing, but if not (or even if so) please get in touch and let us know your thoughts, problems, and feature requests.

2024-10-28: YODA release 2.0.2

This release includes extensions to plotting customisations along with various patches and bug fixes.

2024-07-07: YODA release 2.0.1

In addition to various patches and bug fixes, this release adds a whole new suite of features to the matplotlib-based plotting machinery, such as support for inhomogeneous binning, a heuristic for automatic legend positioning, automatic scaling of uncertainty bands to arbitrary sigma levels, and the option to plot only a subset of the error breakdown where available.

2023-12-22: YODA release 2.0.0

A decade after its initial release, the YODA statistical library has been redesigned from the ground up, aiming to generalise the ideas of the first version while respecting real-world requirements: the YODA 2 architecture is heavily based on modern C++17 template methods, guaranteeing conceptual consistency and type correctness, as well as avoiding the code-duplication maintenance woes of the original YODA release series. For more details, see the write-up on the arXiv.

See the ChangeLog for more details.

We recommend this version of YODA for immediate use. Please let us know of any issues or improvement suggestions, and we will try to get them into a new version as soon as possible.

Get YODA now!

Features

The key features of YODA are as follows:

  • Storage of all the information needed for statistically correct run combination and reweighting, up to second-order moments (e.g. variances, standard deviations, etc.), not just of the number of entries in a bin but also of its correlations with the x and y fill values.
  • Separation of statistics and data handling from presentation. YODA is primarily a library for doing the data part correctly: while we love really high quality data presentation, that's a separate goal.
  • A sensible class hierarchy for histogramming, recognising that a histogram contains details of fill history beyond the pure visual height of a bin, and that simply counting weights or binning arbitrary types on an axis are valuable operations in their own right.
  • Flexible data format support, including a new text-based, compact, and human-readable YODA format.
  • Proper and convenient treatment of "details" like irregular bin widths, gaps in contiguous binning, and overflows/underflows/etc. (including how they impact normalisation and the calculation of histogram-wide statistical quantities).
  • Carefully designed programming interfaces in C++ and Python. We are very welcoming of feedback and design evolution, too!
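
The first feature above, storing second-order moments so that runs can be combined and reweighted correctly, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not YODA's own code: the class name BinMoments and its fields and methods are hypothetical names chosen for readability. The point it shows is that combining two independent runs reduces to adding raw running sums, and reweighting by a constant scales them, whereas derived quantities such as means could not simply be added.

```python
import math
from dataclasses import dataclass


@dataclass
class BinMoments:
    """Hypothetical per-bin accumulator (illustration only, not YODA's class).

    Keeps raw weighted sums rather than derived values, so that results
    from independent runs can be combined exactly by addition.
    """

    num_entries: int = 0
    sumW: float = 0.0    # sum of weights
    sumW2: float = 0.0   # sum of squared weights (for the weight uncertainty)
    sumWX: float = 0.0   # first weighted moment of the fill value x
    sumWX2: float = 0.0  # second weighted moment of x

    def fill(self, x: float, w: float = 1.0) -> None:
        self.num_entries += 1
        self.sumW += w
        self.sumW2 += w * w
        self.sumWX += w * x
        self.sumWX2 += w * x * x

    def __add__(self, other: "BinMoments") -> "BinMoments":
        # Run combination: every running sum simply adds.
        return BinMoments(
            self.num_entries + other.num_entries,
            self.sumW + other.sumW,
            self.sumW2 + other.sumW2,
            self.sumWX + other.sumWX,
            self.sumWX2 + other.sumWX2,
        )

    def scale(self, factor: float) -> None:
        # Reweighting every fill by a constant factor: w -> factor * w.
        # sumW2 picks up factor**2; num_entries is a raw count and is unchanged.
        self.sumW *= factor
        self.sumW2 *= factor * factor
        self.sumWX *= factor
        self.sumWX2 *= factor

    def mean(self) -> float:
        # Undefined for zero total weight: return NaN instead of raising.
        return self.sumWX / self.sumW if self.sumW != 0 else math.nan

    def variance(self) -> float:
        # Uncorrected weighted variance <x^2> - <x>^2 (a simplification; a
        # production library may apply an effective-entries correction).
        if self.sumW == 0:
            return math.nan
        m = self.sumWX / self.sumW
        return self.sumWX2 / self.sumW - m * m

    def weight_error(self) -> float:
        # Statistical uncertainty on the summed weight: sqrt(sum of w^2).
        return math.sqrt(self.sumW2)
```

Filling one accumulator with the events of two runs gives exactly the same sums as filling one accumulator per run and adding them with `+`; the mean and variance are then recomputed from the combined sums rather than averaged.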

Development areas

Several feature areas have been under extension and redevelopment for YODA version 2. Please get in touch if you are interested in contributing code or design ideas. A fairly detailed description of the project status as of Dec 2023 is available in these slides by Chris Gutschow.

  • Plotting: plotting is currently included as a rudimentary matplotlib interface in yoda.plot, but is mostly performed via the Rivet make-plots scripts. These will be incorporated into Rivet, and the YODA type interfaces optimised for plotting via matplotlib, e.g. by adding methods which return exactly the plot-point arrays required for various mpl structures. For now, this script may be a helpful starting point for matplotlib plotting of YODA data types.
  • Generalised binned data types: histograms are currently the only binned data type, and they can only be assembled through weighted fills. This means that data types such as reference data (and error) storage need to use Scatters and then manually match bin edges to scatter errors: yuck. YODA will be redeveloped so that histograms are built on a generic "binned storage" type, which can contain any sort of object -- and perform higher-level operations such as bin-merging if the stored type supports a merge operation.
  • First-class overflows: overflow bins are not currently treated as bins, since they have no meaningful width and therefore no meaningful "height" measure. Overflows are also second-class citizens in 2D (and higher-dimensional) histograms, since combining them with "real" bins when projecting or profiling in either variable requires a full indexing scheme. YODA will be extended with a binning scheme which always covers the whole real-number space in each variable, making overflows fully fledged bins and enabling more complex operations in N-dim space.
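
The two redevelopment ideas above, a generic "binned storage" type and a binning that always covers the whole real line, fit together naturally. The Python sketch below is a hypothetical illustration, not YODA 2's actual C++ design: the names FullLineBinning, Binned, update, merge_bins and heights are all assumptions. Underflow and overflow become ordinary bins at the two ends of the bin list, bin contents can be of any type, and bin merging works for any stored type given a merge function. Flow bins have infinite width, so their height comes out as NaN, mirroring the point that overflows have no meaningful "height".

```python
import bisect
import math
from typing import Callable, Generic, List, Sequence, TypeVar

T = TypeVar("T")


class FullLineBinning:
    """Hypothetical 1D binning whose bins always cover the whole real line.

    Finite edges e_0 < ... < e_n give n "visible" bins; the extra bins
    (-inf, e_0) and [e_n, +inf) turn underflow and overflow into ordinary bins.
    """

    def __init__(self, edges: Sequence[float]) -> None:
        if len(edges) < 2 or list(edges) != sorted(set(edges)):
            raise ValueError("need at least two strictly increasing edges")
        self.edges: List[float] = [-math.inf, *edges, math.inf]

    def num_bins(self) -> int:
        return len(self.edges) - 1  # visible bins plus the two flow bins

    def index(self, x: float) -> int:
        # Half-open bins [lo, hi): 0 is the underflow, num_bins() - 1 the overflow.
        if math.isnan(x):
            raise ValueError("cannot bin NaN")
        return min(bisect.bisect_right(self.edges, x) - 1, self.num_bins() - 1)

    def width(self, i: int) -> float:
        return self.edges[i + 1] - self.edges[i]


class Binned(Generic[T]):
    """Hypothetical generic binned storage: one object of any type T per bin."""

    def __init__(self, binning: FullLineBinning, factory: Callable[[], T]) -> None:
        self.binning = binning
        self.bins: List[T] = [factory() for _ in range(binning.num_bins())]

    def update(self, x: float, fn: Callable[[T], T]) -> None:
        # Replace the content of the bin containing x by fn(old content).
        i = self.binning.index(x)
        self.bins[i] = fn(self.bins[i])

    def merge_bins(self, i: int, merge: Callable[[T, T], T]) -> None:
        """Merge visible bins i and i + 1 by removing the edge between them."""
        n_visible = self.binning.num_bins() - 2
        if not 1 <= i < n_visible:
            raise IndexError("can only merge two adjacent visible bins")
        self.bins[i] = merge(self.bins[i], self.bins[i + 1])
        del self.bins[i + 1]
        del self.binning.edges[i + 1]


def heights(h: "Binned[float]") -> List[float]:
    # Height = content / width; flow bins have infinite width, hence NaN.
    out: List[float] = []
    for i, content in enumerate(h.bins):
        width = h.binning.width(i)
        out.append(content / width if math.isfinite(width) else math.nan)
    return out
```

For numeric contents, `Binned(FullLineBinning([0.0, 1.0, 2.0]), float)` plus `update(x, lambda v: v + w)` behaves like a weighted 1D histogram, and `merge_bins(i, lambda a, b: a + b)` produces the irregular bin widths that `heights` then accounts for.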

Previous releases

2024-02-10: YODA release 1.9.10

2023-12-02: YODA release 1.9.9

2023-04-27: YODA release 1.9.8

2022-09-28: YODA release 1.9.7

2022-07-15: YODA release 1.9.6

2022-05-13: YODA release 1.9.5

2021-12-02: YODA release 1.9.4

2021-11-24: YODA release 1.9.3

2021-11-05: YODA release 1.9.2

2021-08-13: YODA release 1.9.1

2021-03-31: YODA release 1.9.0

The 1.9 YODA release series is an increment of 1.8 with breaking changes to the yodamerge script, which is now much faster, particularly for many large files, but as a consequence no longer has a 'simple stacking' mode; the new yodastack script has been introduced for this purpose. This release also removes support for ROOT version 5, introduces the generic Fillable, Binned, and Scatter base classes and the fillDim() and rmPoint[s]() methods on fillables and scatters respectively, and fixes numerous small bugs and annoyances. Version 1.9.8 introduces new plotting facilities based on Python matplotlib.

2020-11-26: YODA release 1.8.5

2020-11-05: YODA release 1.8.4

2020-07-02: YODA release 1.8.3

2020-05-08: YODA release 1.8.2

2020-04-03: YODA release 1.8.1

2019-12-20: YODA release 1.8.0

The 1.8 YODA release series is mainly notable, among several feature additions, for breaking compatibility with the 1.7x and earlier Python API. All @property declarations have now been converted to normal object methods, to be called with parentheses (). This may break user-code which is directly using the YODA API, but conversion is trivial: just add the missing call-parentheses! Typically the error will manifest as a message about the return type being a function rather than a float, or similar. This change, while a bit annoying, is essential for future extensions that will pass optional arguments to these properties. Widely used downstream codes like Rivet and Contur have already been migrated to use the new API.
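
The 1.8 migration can be illustrated with a pair of hypothetical Python classes; OldHisto, NewHisto, their sumW accessor and its include_overflows option are illustrative names, not the actual YODA classes or signatures. An @property is read without parentheses; once it becomes a method, old-style code gets back a bound method object instead of a float, and arithmetic on it fails with a TypeError. The optional argument shows the kind of extension that a method allows but a property cannot.

```python
class _Base:
    """Shared storage for the two hypothetical histogram classes below."""

    def __init__(self, in_range: float, overflow: float) -> None:
        self._in_range = in_range
        self._overflow = overflow


class OldHisto(_Base):
    # Pre-1.8 style: a @property, read as h.sumW (no parentheses).
    @property
    def sumW(self) -> float:
        return self._in_range + self._overflow


class NewHisto(_Base):
    # 1.8 style: a normal method, called as h.sumW(); it can now take options.
    def sumW(self, include_overflows: bool = True) -> float:
        return self._in_range + (self._overflow if include_overflows else 0.0)


if __name__ == "__main__":
    old, new = OldHisto(10.0, 2.0), NewHisto(10.0, 2.0)
    print(2.0 * old.sumW)    # old-style attribute access
    print(2.0 * new.sumW())  # the migration: just add the parentheses
    try:
        print(2.0 * new.sumW)  # missing (): multiplies a bound method, not a float
    except TypeError as err:
        print("TypeError:", err)
```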

2019-06-18: YODA release 1.7.7

2019-06-07: YODA release 1.7.6
2019-05-09: YODA release 1.7.5
2018-12-10: YODA release 1.7.4
2018-09-24: YODA release 1.7.3
2018-08-23: YODA release 1.7.2
2018-08-14: YODA release 1.7.1
2017-12-21: YODA release 1.7.0

The 1.7 YODA release series sets in place several forward compatibility features with the eventual YODA 2.0 series, such as an explicit yoda1 Python module, use of YAML for persistent annotations, and a persistency versioning system (now on v2). YODA now also supports zipped reading and writing of data files/streams.

In addition there have been many small bugfixes and script/API improvements. YODA 1.7.1 supports multiple error bars on Scatter points; version 1.7.2 improves yodadiff and enables use of the binAt() method on const histograms; 1.7.3 fixes various minor bugs and adds more Pythonic accessors to aid plotting via matplotlib; 1.7.4 adds features to yodadiff and yodamerge, and functionality for assessing binning compatibility between histograms. Several small steps toward a major improvement of the Python API may follow in version 1.8.x. See the ChangeLog for details.

2017-06-18: YODA release 1.6.7

2016-12-13: YODA release 1.6.6
2016-09-28: YODA release 1.6.5
2016-09-25: YODA release 1.6.4
2016-08-09: YODA release 1.6.3
2016-07-06: YODA release 1.6.2
2016-04-20: YODA release 1.6.1
2016-04-20: YODA release 1.6.0

The 1.6 YODA release series moves the codebase to C++11 and eliminates the dependence on the Boost library. We also now return NaNs from invalid statistical computations, to allow the user to choose how to handle the result -- matplotlib will by default mask plot elements with NaN values, for example. The C++ I/O interface has been generalised in neat ways, and several minor bug fixes have also made their way in. See the ChangeLog for details.

2016-03-09: YODA release 1.5.9

2015-12-21: YODA release 1.5.8
2015-12-13: YODA release 1.5.7
2015-11-22: YODA release 1.5.6
2015-10-07: YODA release 1.5.5
2015-10-06: YODA release 1.5.4
2015-09-23: YODA release 1.5.3
2015-09-11: YODA release 1.5.2
2015-09-03: YODA release 1.5.1
2015-08-28: YODA release 1.5.0

The 1.5 release adds several new convenient ways to read and write generic collections of analysis objects, simplifies and improves the YODA and FLAT format parsers (you can now read Scatter3Ds... at last!), and fixes a few rare issues in histogram division and the Python interface. The new I/O methods require Boost 1.48 or later.

The latest patch release also includes major speed improvements in the new parser, improved 1D axis rebinning tools, and better conversion routines between YODA and ROOT objects. See the ChangeLog for details.

We recommend this version of YODA for immediate use. Please let us know of any issues or improvement suggestions, and we will try to get them into a new version as soon as possible.

2015-07-01: YODA release 1.4.0

YODA 1.4.0 is now available!

This release cleans up some – but not all! – design errors that we made early on in YODA development, such as arithmetic operations on Scatters, which assumed special meanings of the X and Y axes. We've also improved many mappings of functions from C++ to the Python interface and increased the function coverage: much thanks to Adrian Buzatu for providing a comprehensive list of unmapped functions. The yodamerge script has also been improved following a lot of discussion, and the Python read() functions now allow "patterns" and "antipatterns" optional arguments to only load a subset of the analysis objects in a file, via path regexes.
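
The "patterns" and "antipatterns" filtering described above can be sketched as a small path-selection helper. This is a hypothetical stand-alone function, not YODA's read() implementation; in particular, the use of re.search (match anywhere in the path) and the rule that an object is kept if it matches any pattern and no antipattern are assumptions. The example paths in the usage note are made up.

```python
import re
from typing import Iterable, List, Optional, Sequence, Union

PatternArg = Optional[Union[str, Sequence[str]]]


def _compile(pats: PatternArg) -> List["re.Pattern[str]"]:
    # Accept None, a single regex string, or a list of regex strings.
    if pats is None:
        return []
    if isinstance(pats, str):
        pats = [pats]
    return [re.compile(p) for p in pats]


def select_paths(paths: Iterable[str], patterns: PatternArg = None,
                 antipatterns: PatternArg = None) -> List[str]:
    """Keep paths matching any pattern (or all, if none given) and no antipattern."""
    keep, drop = _compile(patterns), _compile(antipatterns)
    selected = []
    for path in paths:
        if keep and not any(p.search(path) for p in keep):
            continue
        if any(p.search(path) for p in drop):
            continue
        selected.append(path)
    return selected
```

For example, `select_paths(["/ANA_A/d01-x01-y01", "/ANA_A/d02-x01-y01", "/ANA_B/d01-x01-y01"], patterns="ANA_A", antipatterns="d02")` keeps only the first path, so only that object would need to be loaded.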

We recommend this version of YODA for immediate use. Please let us know of any issues or improvement suggestions, and we will try to get them into a version 1.4.1 in time for the Rivet 2.3.0 release by the end of July 2015.

2015-03-19: YODA release 1.3.1

YODA 1.3.1 is now available!

This is mainly a bugfix and minor improvement release, affecting internals such as how overflow bin filling is triggered, bin edge treatment, Python interface improvements, and better script functionality. A major change is a new yoda.plotting Python sub-module which adds preliminary plotting functionality via the matplotlib library -- we expect to extend and improve this substantially in future releases. New yodacmp and yodaplot scripts are provided, which make use of this module. The yodascale script is now also much more powerful, allowing scaling and normalisation specific to histogram path patterns and optional bin ranges.

Compared to the 1.2.x releases, 1.3.x also provides an efficiency method for 2D histograms, fixes statistical sanity-checking logic in efficiency calculating routines, adds two-arg setting methods for 3D points, and a few other changes.
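
An efficiency calculation divides a "passed" histogram by a "total" histogram bin by bin, and sanity checks guard against inputs that cannot form a valid efficiency. The sketch below works on plain per-bin counts, so the same logic applies to 1D bins or to the flattened bins of a 2D histogram. The function name, the NaN convention for empty bins, and the simple binomial uncertainty sqrt(eff(1 - eff)/N), which is only appropriate for unweighted counts, are assumptions rather than YODA's actual method.

```python
import math
from typing import List, Sequence, Tuple


def efficiency(passed: Sequence[float],
               total: Sequence[float]) -> List[Tuple[float, float]]:
    """Per-bin (efficiency, uncertainty) pairs from unweighted counts."""
    if len(passed) != len(total):
        raise ValueError("passed and total must have the same binning")
    result: List[Tuple[float, float]] = []
    for k, n in zip(passed, total):
        # Sanity checks: counts cannot be negative and passed cannot exceed total.
        if k < 0 or k > n:
            raise ValueError(f"invalid bin: {k} passed out of {n}")
        if n == 0:
            result.append((math.nan, math.nan))  # empty bin: undefined
            continue
        eff = k / n
        # Simple binomial uncertainty, valid for unweighted counts only.
        result.append((eff, math.sqrt(eff * (1.0 - eff) / n)))
    return result
```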

2014-09-30: YODA release 1.3.0

YODA 1.3.0 is now available!

This release provides an efficiency method for 2D histograms, fixes statistical sanity-checking logic in efficiency calculating routines, adds two-arg setting methods for 3D points, and includes a few other changes. The minor-version increment reflects the still-growing API. Have fun!

2014-09-01: YODA release 1.2.1

This release includes a significant bug fix for a problem introduced in 1.2.0 for binnings starting below zero. It should only have turned up as a performance hit, but did appear when running code in some special instrumented modes. This version also restricts direct access to bins to be read-only, to avoid direct calls to fill() leaving the histogram in an inconsistent state. The 1.2 series also contains more improvements to the API, a Scatter1D type, and read/write support for Scatter1D and Counter. I/O support for Scatter3D and for 2D outflows will come as soon as possible.
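
The reason for read-only bin access can be shown with a small hypothetical Python histogram (SafeHisto, BinView and their methods are illustrative names, not YODA's API). The histogram keeps a histogram-wide total alongside its bins; if callers could fill a bin directly, the total would silently fall out of step. Exposing bins only as immutable snapshots leaves fill() on the histogram as the single path that updates both together.

```python
import bisect
from typing import List, NamedTuple, Sequence, Tuple


class BinView(NamedTuple):
    """Immutable snapshot of one bin: readable, but cannot be filled."""
    low: float
    high: float
    sumW: float


class SafeHisto:
    """Hypothetical 1D histogram that keeps bins and totals consistent."""

    def __init__(self, edges: Sequence[float]) -> None:
        self._edges: List[float] = list(edges)
        self._sumw: List[float] = [0.0] * (len(self._edges) - 1)
        self._flow = 0.0   # weight that fell outside the binned range
        self._total = 0.0  # histogram-wide total: sum of bins plus flow

    def fill(self, x: float, w: float = 1.0) -> None:
        # The only mutation path: bin (or flow) and total are updated together.
        i = bisect.bisect_right(self._edges, x) - 1
        if 0 <= i < len(self._sumw):
            self._sumw[i] += w
        else:
            self._flow += w
        self._total += w

    def bins(self) -> Tuple[BinView, ...]:
        # Read-only view: a tuple of NamedTuples, so no in-place edits.
        return tuple(BinView(lo, hi, s) for lo, hi, s
                     in zip(self._edges, self._edges[1:], self._sumw))

    def total(self) -> float:
        return self._total
```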

Currently everything is being done on the bug tracker and wiki, and is documented via Doxygen (class documentation, sure, but also design discussion and motivation).