Binned Maximum Likelihood (binned ML, non-extended)¶
Assume the observable range is divided into $J$ bins with observed counts $N_j^{\mathrm{obs}}$ and total observed events $N_{\mathrm{obs}} = \sum_j N_j^{\mathrm{obs}}$ (fixed).
The model PDF is: $$ f(m|\theta) = r\, f_s(m|\mu,\sigma) + (1-r)\, f_b(m|\tau), \quad \theta = \{r, \mu, \sigma, \tau\}. $$
For each bin $j$, define the probability: $$ p_j(\theta) = \int_{b_j}^{b_{j+1}} f(m|\theta)\,dm, \qquad \sum_{j=1}^{J} p_j(\theta) = 1. $$
Then the (non-extended) binned likelihood is: $$ \mathcal{L}_{\mathrm{bML}}(\theta) = \prod_{j=1}^{J} [p_j(\theta)]^{N_j^{\mathrm{obs}}}. $$
Ignoring constants, the log-likelihood becomes: $$ \ln\mathcal{L}_{\mathrm{bML}}(\theta) = \sum_{j=1}^{J} N_j^{\mathrm{obs}} \ln p_j(\theta). $$
Binned Extended Maximum Likelihood (binned EML)¶
Now include the total expected number of events $N_{\mathrm{exp}} = N_s + N_b$, and fit $N_s$ and $N_b$ directly.
The expected number in bin (j) is: $$ N_j^{\mathrm{exp}}(\theta) = N_s \int_{b_j}^{b_{j+1}} f_s(m|\mu,\sigma)\,dm + N_b \int_{b_j}^{b_{j+1}} f_b(m|\tau)\,dm. $$
Each bin follows a Poisson distribution, giving the binned-EML likelihood: $$ \mathcal{L}_{\mathrm{bEML}}(\theta) = \prod_{j=1}^{J} \frac{[N_j^{\mathrm{exp}}(\theta)]^{N_j^{\mathrm{obs}}} e^{-N_j^{\mathrm{exp}}(\theta)}}{N_j^{\mathrm{obs}}!}. $$
Ignoring factorial constants, the log-likelihood is: $$ \ln\mathcal{L}_{\mathrm{bEML}}(\theta) = \sum_{j=1}^{J} [\, N_j^{\mathrm{obs}} \ln N_j^{\mathrm{exp}}(\theta) - N_j^{\mathrm{exp}}(\theta) \,]. $$
// === EML: Extended ML fit (Jupyter can run directly, no main/function wrapper) ===
#include "TCanvas.h"
#include "TSystem.h"
#include "RooRealVar.h"
#include "RooGaussian.h"
#include "RooExponential.h"
#include "RooAddPdf.h"
#include "RooDataSet.h"
#include "RooDataHist.h"
#include "RooPlot.h"
#include "RooFitResult.h"
#include "RooRandom.h"
#include <iostream>
#include <cmath>
using namespace RooFit;
// ----- Unified parameters & variables (consistent with UML naming) -----
double xmin=2.6, xmax=3.8;
int nbins=80, seed=12345;
int Nsig_true=100, Nbkg_true=500;
RooRealVar x("x","m (GeV)", xmin, xmax, "GeV");
x.setBins(nbins);
RooRandom::randomGenerator()->SetSeed(seed);
// Shape parameters (same as UML)
RooRealVar mu("mu","mean", 3.10, 3.0, 3.2);
RooRealVar sigma("sigma","sigma", 0.03, 0.005, 0.08);
RooGaussian sig("sig","signal pdf", x, mu, sigma);
RooRealVar tau("tau","slope", -1.5, -10., -0.1);
RooExponential bkg("bkg","background pdf", x, tau);
// ===================== Data generation (consistent with UML: extended generation) =====================
RooRealVar Nsig_gen("Nsig_gen","signal yield(gen)", Nsig_true);
RooRealVar Nbkg_gen("Nbkg_gen","bkg yield(gen)", Nbkg_true);
RooAddPdf model_gen("model_gen","sig+bkg(gen)", RooArgList(sig,bkg), RooArgList(Nsig_gen,Nbkg_gen));
RooDataSet* dataset = model_gen.generate(x, Extended(true));
double Nobs = dataset->numEntries();
// Convert to binned data
RooDataHist* datahist = new RooDataHist("datahist", "binned data", x, *dataset);
// ===================== EML fit (extended: directly fit yields Nsig, Nbkg) =====================
RooRealVar Nsig("Nsig","signal yield", Nsig_true*0.8, 0.0, 1.0e5);
RooRealVar Nbkg("Nbkg","bkg yield", Nbkg_true*1.2, 0.0, 1.0e6);
RooAddPdf model("model","sig+bkg (EML)", RooArgList(sig,bkg), RooArgList(Nsig,Nbkg));
// Extended fit: include Extended(true)
// To switch to non-extended fit, remove Extended(true); yields will be treated as relative fractions (normalized to sum=1)
// For binned data, use SumW2Error(false) to perform binned likelihood fit with Poisson errors
RooFitResult* fitres = model.fitTo(*datahist, Save(true), Extended(true), SumW2Error(false));
std::cout << "\n===== EML (extended) Fit =====\n";
std::cout << "Nobs = " << Nobs << " +/- " << sqrt(Nobs) << "\n";
std::cout << "Nsig = " << Nsig.getVal() << " ± " << Nsig.getError() << "\n";
std::cout << "Nbkg = " << Nbkg.getVal() << " ± " << Nbkg.getError() << "\n";
std::cout << "mu = " << mu.getVal() << " ± " << mu.getError() << "\n";
std::cout << "sigma = " << sigma.getVal() << " ± " << sigma.getError() << "\n";
std::cout << "tau = " << tau.getVal() << " ± " << tau.getError() << "\n";
std::cout << "==============================\n";
// ===================== Plotting (top: fit + bottom: pull) =====================
if (auto *old = dynamic_cast<TCanvas*>(gROOT->FindObject("c_eml"))) delete old;
TCanvas* c = new TCanvas("c_eml","EML fit + Pull", 900, 800);
c->Divide(1,2);
// Top pad
c->cd(1);
gPad->SetPad(0,0.30,1,1.00);
gPad->SetBottomMargin(0.02);
RooPlot* frame = x.frame(Title("EML: Gaussian + exponential background"));
datahist->plotOn(frame, DataError(RooAbsData::Poisson), Name("data"));
// Plot total model curve with expected counts scale (consistent with data dimension)
model.plotOn(frame, Name("model"), LineWidth(2),
Normalization(datahist->sumEntries(), RooAbsReal::NumEvent));
model.plotOn(frame, Components(sig), LineStyle(kDashed), LineWidth(2));
model.plotOn(frame, Components(bkg), LineStyle(kDotted), LineWidth(2));
frame->Draw();
// Bottom pad: pull (data vs model)
c->cd(2);
gPad->SetPad(0,0.00,1,0.30);
gPad->SetTopMargin(0.02);
gPad->SetBottomMargin(0.30);
RooHist* hpull = frame->pullHist("data","model");
RooPlot* fpull = x.frame(Title("Pull = (data - fit)/#sigma"));
fpull->addPlotable(hpull,"P");
fpull->SetYTitle("Pull");
fpull->SetMinimum(-5); fpull->SetMaximum(5);
fpull->GetYaxis()->SetTitleSize(0.10);
fpull->GetYaxis()->SetLabelSize(0.09);
fpull->GetXaxis()->SetTitleSize(0.10);
fpull->GetXaxis()->SetLabelSize(0.09);
fpull->Draw();
c->Draw();
gSystem->ProcessEvents();
[#1] INFO:Fitting -- RooAbsPdf::fitTo(model) fixing normalization set for coefficient determination to observables in data [#1] INFO:Fitting -- using generic CPU library compiled with no vectorizations [#1] INFO:Fitting -- RooAddition::defaultErrorLevel(nll_model_datahist) Summation contains a RooNLLVar, using its error level [#1] INFO:Minimization -- RooAbsMinimizerFcn::setOptimizeConst: activating const optimization Minuit2Minimizer: Minimize with max-calls 2500 convergence for edm < 1 strategy 1 Minuit2Minimizer : Valid minimum - status = 0 FVAL = -3030.15266727453536 Edm = 1.1620120248010768e-05 Nfcn = 98 Nbkg = 471.552 +/- 23.7259 (limited) Nsig = 95.3606 +/- 13.6741 (limited) mu = 3.10861 +/- 0.00549171 (limited) sigma = 0.0348709 +/- 0.00506008 (limited) tau = -1.44779 +/- 0.143796 (limited) [#1] INFO:Minimization -- RooAbsMinimizerFcn::setOptimizeConst: deactivating const optimization ===== EML (extended) Fit ===== Nobs = 567 +/- 23.8118 Nsig = 95.3606 ± 13.6841 Nbkg = 471.552 ± 23.7348 mu = 3.10861 ± 0.00549119 sigma = 0.0348709 ± 0.0050634 tau = -1.44779 ± 0.143803 ============================== [#1] INFO:Plotting -- RooAbsPdf::plotOn(model) directly selected PDF components: (sig) [#1] INFO:Plotting -- RooAbsPdf::plotOn(model) indirectly selected PDF components: () [#1] INFO:Plotting -- RooAbsPdf::plotOn(model) directly selected PDF components: (bkg) [#1] INFO:Plotting -- RooAbsPdf::plotOn(model) indirectly selected PDF components: ()
Info in <Minuit2>: MnSeedGenerator Computing seed using NumericalGradient calculator Info in <Minuit2>: MnSeedGenerator Initial state: FCN = -3016.103378 Edm = 13.60677532 NCalls = 21 Info in <Minuit2>: MnSeedGenerator Initial state Minimum value : -3016.103378 Edm : 13.60677532 Internal parameters: [ -1.521801632 -1.514220239 0 -0.3398369095 0.7997354] Internal gradient : [ 4522.616221 77.24441176 -28.28499497 -2.548811358 -15.01418662] Internal covariance matrix: [[ 2.3524256e-06 0 0 0 0] [ 0 3.5771111e-05 0 0 0] [ 0 0 0.0064693003 0 0] [ 0 0 0 0.028253395 0] [ 0 0 0 0 0.0032727656]]] Info in <Minuit2>: VariableMetricBuilder Start iterating until Edm is < 0.001 with call limit = 2500 Info in <Minuit2>: VariableMetricBuilder 0 - FCN = -3016.103378 Edm = 13.60677532 NCalls = 21 Info in <Minuit2>: VariableMetricBuilder 1 - FCN = -3029.013861 Edm = 0.9835260046 NCalls = 32 Info in <Minuit2>: VariableMetricBuilder 2 - FCN = -3029.996931 Edm = 0.145866222 NCalls = 44 Info in <Minuit2>: VariableMetricBuilder 3 - FCN = -3030.147871 Edm = 0.00466153727 NCalls = 56 Info in <Minuit2>: VariableMetricBuilder 4 - FCN = -3030.152667 Edm = 1.24556138e-05 NCalls = 67 Info in <Minuit2>: VariableMetricBuilder After Hessian Info in <Minuit2>: VariableMetricBuilder 5 - FCN = -3030.152667 Edm = 1.162012025e-05 NCalls = 98 Info in <Minuit2>: Minuit2Minimizer::Hesse Using max-calls 2500 Info in <Minuit2>: Minuit2Minimizer::Hesse Hesse is valid - matrix is accurate