Binned Maximum Likelihood (binned ML, non-extended)¶

Assume the observable range is divided into $J$ bins with observed counts $N_j^{\mathrm{obs}}$ and total observed events $N_{\mathrm{obs}} = \sum_j N_j^{\mathrm{obs}}$ (fixed).

The model PDF is: $$ f(m|\theta) = r\, f_s(m|\mu,\sigma) + (1-r)\, f_b(m|\tau), \quad \theta = \{r, \mu, \sigma, \tau\}. $$

For each bin $j$, define the probability: $$ p_j(\theta) = \int_{b_j}^{b_{j+1}} f(m|\theta)\,dm, \qquad \sum_{j=1}^{J} p_j(\theta) = 1. $$

Then the (non-extended) binned likelihood is: $$ \mathcal{L}_{\mathrm{bML}}(\theta) = \prod_{j=1}^{J} [p_j(\theta)]^{N_j^{\mathrm{obs}}}. $$

Ignoring constants, the log-likelihood becomes: $$ \ln\mathcal{L}_{\mathrm{bML}}(\theta) = \sum_{j=1}^{J} N_j^{\mathrm{obs}} \ln p_j(\theta). $$

Binned Extended Maximum Likelihood (binned EML)¶

Now include the total expected number of events $N_{\mathrm{exp}} = N_s + N_b$, and fit $N_s$ and $N_b$ directly.

The expected number in bin (j) is: $$ N_j^{\mathrm{exp}}(\theta) = N_s \int_{b_j}^{b_{j+1}} f_s(m|\mu,\sigma)\,dm + N_b \int_{b_j}^{b_{j+1}} f_b(m|\tau)\,dm. $$

Each bin follows a Poisson distribution, giving the binned-EML likelihood: $$ \mathcal{L}_{\mathrm{bEML}}(\theta) = \prod_{j=1}^{J} \frac{[N_j^{\mathrm{exp}}(\theta)]^{N_j^{\mathrm{obs}}} e^{-N_j^{\mathrm{exp}}(\theta)}}{N_j^{\mathrm{obs}}!}. $$

Ignoring factorial constants, the log-likelihood is: $$ \ln\mathcal{L}_{\mathrm{bEML}}(\theta) = \sum_{j=1}^{J} [\, N_j^{\mathrm{obs}} \ln N_j^{\mathrm{exp}}(\theta) - N_j^{\mathrm{exp}}(\theta) \,]. $$

In [2]:
// === EML: Extended ML fit (Jupyter can run directly, no main/function wrapper) ===
#include "TCanvas.h"
#include "TSystem.h"
#include "RooRealVar.h"
#include "RooGaussian.h"
#include "RooExponential.h"
#include "RooAddPdf.h"
#include "RooDataSet.h"
#include "RooDataHist.h"
#include "RooPlot.h"
#include "RooFitResult.h"
#include "RooRandom.h"
#include <iostream>
#include <cmath>

using namespace RooFit;

// ----- Unified parameters & variables (consistent with UML naming) -----
double xmin=2.6, xmax=3.8;
int    nbins=80, seed=12345;
int    Nsig_true=100, Nbkg_true=500;

RooRealVar x("x","m (GeV)", xmin, xmax, "GeV");
x.setBins(nbins);

RooRandom::randomGenerator()->SetSeed(seed);

// Shape parameters (same as UML)
RooRealVar mu("mu","mean",   3.10, 3.0, 3.2);
RooRealVar sigma("sigma","sigma", 0.03, 0.005, 0.08);
RooGaussian   sig("sig","signal pdf", x, mu, sigma);

RooRealVar tau("tau","slope", -1.5, -10., -0.1);
RooExponential bkg("bkg","background pdf", x, tau);

// ===================== Data generation (consistent with UML: extended generation) =====================
RooRealVar Nsig_gen("Nsig_gen","signal yield(gen)", Nsig_true);
RooRealVar Nbkg_gen("Nbkg_gen","bkg yield(gen)",    Nbkg_true);
RooAddPdf  model_gen("model_gen","sig+bkg(gen)", RooArgList(sig,bkg), RooArgList(Nsig_gen,Nbkg_gen));

RooDataSet* dataset = model_gen.generate(x, Extended(true));
double Nobs = dataset->numEntries();

// Convert to binned data
RooDataHist* datahist = new RooDataHist("datahist", "binned data", x, *dataset);

// ===================== EML fit (extended: directly fit yields Nsig, Nbkg) =====================
RooRealVar Nsig("Nsig","signal yield", Nsig_true*0.8, 0.0, 1.0e5);
RooRealVar Nbkg("Nbkg","bkg yield",    Nbkg_true*1.2, 0.0, 1.0e6);
RooAddPdf  model("model","sig+bkg (EML)", RooArgList(sig,bkg), RooArgList(Nsig,Nbkg));

// Extended fit: include Extended(true)
// To switch to non-extended fit, remove Extended(true); yields will be treated as relative fractions (normalized to sum=1)
// For binned data, use SumW2Error(false) to perform binned likelihood fit with Poisson errors
RooFitResult* fitres = model.fitTo(*datahist, Save(true), Extended(true), SumW2Error(false));

std::cout << "\n===== EML (extended) Fit =====\n";
std::cout << "Nobs   = " << Nobs << " +/- " << sqrt(Nobs) << "\n";
std::cout << "Nsig   = " << Nsig.getVal() << " ± " << Nsig.getError() << "\n";
std::cout << "Nbkg   = " << Nbkg.getVal() << " ± " << Nbkg.getError() << "\n";
std::cout << "mu     = " << mu.getVal()    << " ± " << mu.getError()    << "\n";
std::cout << "sigma  = " << sigma.getVal() << " ± " << sigma.getError() << "\n";
std::cout << "tau    = " << tau.getVal()   << " ± " << tau.getError()   << "\n";
std::cout << "==============================\n";

// ===================== Plotting (top: fit + bottom: pull) =====================
if (auto *old = dynamic_cast<TCanvas*>(gROOT->FindObject("c_eml"))) delete old;
TCanvas* c = new TCanvas("c_eml","EML fit + Pull", 900, 800);
c->Divide(1,2);

// Top pad
c->cd(1);
gPad->SetPad(0,0.30,1,1.00);
gPad->SetBottomMargin(0.02);

RooPlot* frame = x.frame(Title("EML: Gaussian + exponential background"));
datahist->plotOn(frame, DataError(RooAbsData::Poisson), Name("data"));

// Plot total model curve with expected counts scale (consistent with data dimension)
model.plotOn(frame, Name("model"), LineWidth(2),
             Normalization(datahist->sumEntries(), RooAbsReal::NumEvent));
model.plotOn(frame, Components(sig), LineStyle(kDashed), LineWidth(2));
model.plotOn(frame, Components(bkg), LineStyle(kDotted), LineWidth(2));
frame->Draw();

// Bottom pad: pull (data vs model)
c->cd(2);
gPad->SetPad(0,0.00,1,0.30);
gPad->SetTopMargin(0.02);
gPad->SetBottomMargin(0.30);

RooHist* hpull = frame->pullHist("data","model");
RooPlot* fpull = x.frame(Title("Pull = (data - fit)/#sigma"));
fpull->addPlotable(hpull,"P");
fpull->SetYTitle("Pull");
fpull->SetMinimum(-5); fpull->SetMaximum(5);
fpull->GetYaxis()->SetTitleSize(0.10);
fpull->GetYaxis()->SetLabelSize(0.09);
fpull->GetXaxis()->SetTitleSize(0.10);
fpull->GetXaxis()->SetLabelSize(0.09);
fpull->Draw();

c->Draw();
gSystem->ProcessEvents();
[#1] INFO:Fitting -- RooAbsPdf::fitTo(model) fixing normalization set for coefficient determination to observables in data
[#1] INFO:Fitting -- using generic CPU library compiled with no vectorizations
[#1] INFO:Fitting -- RooAddition::defaultErrorLevel(nll_model_datahist) Summation contains a RooNLLVar, using its error level
[#1] INFO:Minimization -- RooAbsMinimizerFcn::setOptimizeConst: activating const optimization
Minuit2Minimizer: Minimize with max-calls 2500 convergence for edm < 1 strategy 1
Minuit2Minimizer : Valid minimum - status = 0
FVAL  = -3030.15266727453536
Edm   = 1.1620120248010768e-05
Nfcn  = 98
Nbkg	  = 471.552	 +/-  23.7259	(limited)
Nsig	  = 95.3606	 +/-  13.6741	(limited)
mu	  = 3.10861	 +/-  0.00549171	(limited)
sigma	  = 0.0348709	 +/-  0.00506008	(limited)
tau	  = -1.44779	 +/-  0.143796	(limited)
[#1] INFO:Minimization -- RooAbsMinimizerFcn::setOptimizeConst: deactivating const optimization

===== EML (extended) Fit =====
Nobs   = 567 +/- 23.8118
Nsig   = 95.3606 ± 13.6841
Nbkg   = 471.552 ± 23.7348
mu     = 3.10861 ± 0.00549119
sigma  = 0.0348709 ± 0.0050634
tau    = -1.44779 ± 0.143803
==============================
[#1] INFO:Plotting -- RooAbsPdf::plotOn(model) directly selected PDF components: (sig)
[#1] INFO:Plotting -- RooAbsPdf::plotOn(model) indirectly selected PDF components: ()
[#1] INFO:Plotting -- RooAbsPdf::plotOn(model) directly selected PDF components: (bkg)
[#1] INFO:Plotting -- RooAbsPdf::plotOn(model) indirectly selected PDF components: ()
Info in <Minuit2>: MnSeedGenerator Computing seed using NumericalGradient calculator
Info in <Minuit2>: MnSeedGenerator Initial state: FCN =      -3016.103378 Edm =       13.60677532 NCalls =     21
Info in <Minuit2>: MnSeedGenerator Initial state  
  Minimum value : -3016.103378
  Edm           : 13.60677532
  Internal parameters:	[     -1.521801632     -1.514220239                0    -0.3398369095        0.7997354]	
  Internal gradient  :	[      4522.616221      77.24441176     -28.28499497     -2.548811358     -15.01418662]	
  Internal covariance matrix:
[[  2.3524256e-06              0              0              0              0]
 [              0  3.5771111e-05              0              0              0]
 [              0              0   0.0064693003              0              0]
 [              0              0              0    0.028253395              0]
 [              0              0              0              0   0.0032727656]]]
Info in <Minuit2>: VariableMetricBuilder Start iterating until Edm is < 0.001 with call limit = 2500
Info in <Minuit2>: VariableMetricBuilder    0 - FCN =      -3016.103378 Edm =       13.60677532 NCalls =     21
Info in <Minuit2>: VariableMetricBuilder    1 - FCN =      -3029.013861 Edm =      0.9835260046 NCalls =     32
Info in <Minuit2>: VariableMetricBuilder    2 - FCN =      -3029.996931 Edm =       0.145866222 NCalls =     44
Info in <Minuit2>: VariableMetricBuilder    3 - FCN =      -3030.147871 Edm =     0.00466153727 NCalls =     56
Info in <Minuit2>: VariableMetricBuilder    4 - FCN =      -3030.152667 Edm =    1.24556138e-05 NCalls =     67
Info in <Minuit2>: VariableMetricBuilder After Hessian
Info in <Minuit2>: VariableMetricBuilder    5 - FCN =      -3030.152667 Edm =   1.162012025e-05 NCalls =     98
Info in <Minuit2>: Minuit2Minimizer::Hesse Using max-calls 2500
Info in <Minuit2>: Minuit2Minimizer::Hesse Hesse is valid - matrix is accurate
In [ ]: