RooFit Tutorial: Fitting an Exponential Decay Model¶

Background and Key Concepts¶

RooFit is a powerful statistical modeling toolkit integrated within ROOT, designed for building complex probability density functions (PDFs), handling datasets, and performing advanced fits like maximum likelihood (ML) estimation. It's particularly useful for particle physics analyses involving exponential decays, such as radioactive processes where events follow $f(t|\tau) = \frac{1}{|\tau|} e^{-t/|\tau|} $, with $\tau$ as the (negative) lifetime parameter. RooFit supports both unbinned and binned data, allowing for precise modeling of shapes, yields, and backgrounds.

Basic Building Blocks of RooFit¶

RooFit uses an object-oriented approach where everything is a class instance, enabling modular construction. Core components include:

  • Variables (RooRealVar): Represent observables (e.g., time $ t $) or parameters (e.g., mean, sigma). They have names, titles, values, ranges, and can be fixed or floating. Example: RooRealVar x("x", "x", -10, 10); for a variable in [-10,10].
  • Datasets:
    • RooDataSet: For unbinned data, storing individual event values (efficient for high precision).
    • RooDataHist: For binned data, essentially a histogram wrapper (faster for large samples).
  • PDFs: Probability density functions, the heart of modeling.
    • Built-in: RooGaussian, RooExponential, RooPolynomial, etc.
    • Composite: Combine via RooAddPdf (sum PDFs, e.g., signal + background) or RooProdPdf (product for multidimensional or conditional models).
  • Fitting: The fitTo method performs minimization (e.g., ML or chi-squared) using Minuit. Options like Extended(kTRUE) add a Poisson term for yields.
  • Data Generation: PDFs have generate to create toy datasets, respecting normalization.
  • Plotting: Use RooPlot frames with plotOn for data/PDF overlays; supports custom binning and error types.

These blocks allow scalable models—from simple fits to full likelihoods with constraints.

PDF Normalization in RooFit¶

RooFit automatically normalizes PDFs to integrate to unity over the observable's defined range. This ensures probabilistic correctness without manual intervention—e.g., for RooExponential, the normalization factor $ 1 / \int f(t|\tau) dt $ is computed internally during fits, plots, or data generation. This prevents biases and maintains meaningful likelihoods. Note: Always define realistic ranges for observables to avoid truncation issues; RooFit warns if integrals fail due to parameter extremes.

Differences Between RooFit and ROOT's Built-in Fit Functions¶

  • ROOT's core fitting (e.g., TH1::Fit or TF1) uses simple functions like expo for exponentials, relying on chi-squared minimization for binned histograms or basic ML for graphs. It's quick for basic tasks but limited to predefined functions, lacks built-in extension for yields, and doesn't handle complex composite models or unbinned data natively.
  • RooFit extends this by treating PDFs as objects, enabling modular builds (e.g., adding backgrounds via RooAddPdf), automatic normalization, extended ML fits (incorporating Poisson yield uncertainties), and advanced error handling (e.g., asymmetric Poisson errors). It's more flexible for multidimensional or conditional models but requires object-oriented setup, making it slower for trivial fits. Use ROOT's fits for quick histograms; switch to RooFit for statistical rigor in analyses.

Key advantages of RooFit: Robust error propagation, visualization with pull plots for fit quality assessment, and seamless integration with ROOT's I/O for real data from trees or files.

Usage Considerations¶

  • Environment: Run in ROOT (e.g., root -l or Jupyter with ROOT kernel). Include necessary headers at the start.
  • Data Choices: Unbinned preserves event precision; binned speeds computation for large sets.
  • Fit Types: Extended ML estimates both shape and yield; non-extended fixes the yield.
  • Errors and Plots: Prefer Poisson errors for count data to handle low-statistics bins properly—avoids division-by-zero in pulls.
  • Pitfalls: Set negative $\tau$ for decays; seed randoms for reproducibility; check fit convergence with verbose output if needed.

Practical Example: Step-by-Step with Code¶

Below, we walk through defining, fitting, and plotting an exponential decay model using unbinned data and extended ML fit. Each step highlights basic building blocks in action (combine snippets into a single ROOT macro or Jupyter cell for execution). We'll use toy data for illustration—replace with real data as needed.

Setup and Definitions¶

Start by including headers and defining variables (RooRealVar for observable and parameters), then build the PDF. RooFit normalizes the exponential automatically over t's range [0,5].

#include "RooRealVar.h"     // Variables
#include "RooDataSet.h"     // Unbinned datasets
#include "RooExponential.h" // Exponential PDF
#include "RooExtendPdf.h"   // Extended PDF
#include "RooPlot.h"        // Plotting
#include "TCanvas.h"        // Canvas
#include "TPad.h"           // Pads
#include "TRandom3.h"       // Random generation

using namespace RooFit;     // Convenience namespace

// For reproducibility
gRandom->SetSeed(42);

// Define observable: time t in [0,5] (RooRealVar building block)
RooRealVar t("t", "time", 0, 5);

// Define parameters: tau negative for decay, nsig for event yield (RooRealVar)
RooRealVar tau("tau", "lifetime", -1, -5, -0.1);  // Initial -1
RooRealVar nsig("nsig", "number of events", 1000, 0, 3000);  // Initial 1000

// Build base PDF (auto-normalized) and extend it (PDF building blocks)
RooExponential expo("expo", "exponential PDF", t, tau);
RooExtendPdf eexpo("eexpo", "extended exponential PDF", expo, nsig);

Here, RooExponential is a basic PDF block, normalized internally, and RooExtendPdf scales it by nsig for the extended term, allowing the fit to float the total events.

Generate or Load Data¶

Create toy unbinned data (RooDataSet) from the base PDF (using generate, which respects normalization). In practice, load real data via RooDataSet::read or from a ROOT tree.

// Generate 1000 toy events (respects normalization, dataset building block)
RooDataSet* dataset = expo.generate(t, 1000);

This produces events distributed according to the normalized PDF—e.g., more events at low t for decay.

Perform the Fit¶

Use extended ML on unbinned data (fitTo method). RooFit optimizes parameters via Minuit, respecting normalization.

// Extended unbinned ML fit (quiet mode, fitting building block)
eexpo.fitTo(*dataset, PrintLevel(-1));

For non-extended, use expo.fitTo(*dataset)—but extended is better for uncertain yields, as it includes Poisson term in the likelihood. For composites, replace with RooAddPdf if adding backgrounds.

Print Results¶

Inspect fitted values. Errors come from the Hessian matrix, accounting for normalization.

// Print outputs
tau.Print();   // e.g., tau  = -1.001 +/- 0.0316
nsig.Print();  // e.g., nsig = 1000  +/- 31.62 (Poisson-like)

This shows the estimated lifetime and yield with uncertainties—more precise than ROOT's basic fits.

Plot the Results¶

Visualize data, fit, and pulls (RooPlot for frames, plotOn for overlays). Use binning for display; Poisson errors handle low counts.

// Create canvas with pads
TCanvas* canvas = new TCanvas("canvas", "Exponential Fit with Pull Plot", 600, 600);
TPad* pad1 = new TPad("pad1", "Main plot", 0, 0.3, 1, 1.0);
TPad* pad2 = new TPad("pad2", "Pull plot", 0, 0.0, 1, 0.3);
pad1->Draw(); pad2->Draw();

// Main plot pad (plotting building block)
pad1->cd();
RooPlot* frame = t.frame(Title("Exponential Decay Fit"));
dataset->plotOn(frame, Binning(50), DataError(RooAbsData::Poisson));  // Binned view with Poisson errors
eexpo.plotOn(frame, LineColor(kRed));  // Normalized fitted curve
frame->Draw();

// Pull plot pad
pad2->cd();
RooPlot* frame_pull = t.frame(Title("Pull Distribution"));
RooHist* hpull = frame->pullHist();  // (data - fit)/sigma, using Poisson sigma
frame_pull->addPlotable(hpull, "P");
frame_pull->SetYTitle("Pull");
frame_pull->Draw();

canvas->Draw();

The plot scales the PDF to the data via normalization and yield. Pulls should cluster around 0 with RMS ~1 for a good fit—Poisson option prevents artifacts in empty bins, unlike ROOT's default Gaussian assumptions.

Tips for Adaptation and Expansion¶

  • Binned Version: Add t.setBins(50); and use RooDataHist datahist("datahist", "binned data", t, *dataset); (binned dataset block), then fit to datahist.
  • Composite Models: For signal + background, use RooAddPdf total("total", "sig+bkg", RooArgList(expo, bkg), RooArgList(nsig, nbkg));—demonstrates additive building.
  • Multidimensional: Add another RooRealVar y("y","y",-5,5); and use RooProdPdf for independent PDFs.
  • Real Data: Use RooDataSet data("data", "data", RooArgSet(t), ImportFromFile("file.root", "tree")).
  • Advanced: Add constraints via RooGaussian constraint("const", "const", tau, RooConst(-1), RooConst(0.1)); and fit with ExternalConstraints.

This mini tutorial covers RooFit's essentials through a practical example—build simple models first, then expand using the blocks for complex analyses!

Additional Resources¶

For more comprehensive online tutorials and documentation on RooFit, refer to:

  • Official ROOT RooFit Tutorials: https://root.cern/doc/master/group__tutorial__roofit.html
  • ROOT Manual - RooFit Section: https://root.cern/manual/roofit/
  • RooFit User's Guide (PDF): https://root.cern.ch/download/doc/roofit.pdf
In [ ]: