BEST Using the Standalone Function
This guide has been adapted from the instructions by justinrpilot and demarley.
Introduction
BEST: Boosted Event Shape Tagger is a neural network that was trained to identify AK8 jets that come from heavy Standard Model particles. This neural network was developed by CMS analysts and is available on GitHub from justinrpilot. Information about how BEST was trained is available on arXiv. This guide describes how to use BEST as a function in a CMS EDAnalyzer.
Instructions
Dependencies
BEST requires CMSSW and was configured on CERN’s lxplus and Fermilab’s lpc. The BEST standalone function can be used with CMSSW_9_4_X
.
In order to install CMSSW on lxplus, execute the following
cmsrel CMSSW_9_4_8
cd CMSSW_9_4_8/src/
scram b
cmsenv
Installation
BEST was programmed in python and trained using scikit-learn. However, the standard EDAnalyzer in CMS uses C++, so BEST requires an additional package in order to become a C++ function. This package can be found here. The first step in setting up BEST is installing this package [2]:
cd CMSSW_9_4_8/src/
mkdir lwtnn
cd lwtnn
git clone https://github.com/demarley/lwtnn.git
cd lwtnn
git checkout CMSSW_8_0_X-compatible
scram b -j8
Now, BEST can be installed and compiled [1].
cd CMSSW_9_4_8/src/
git clone https://github.com/cms-ttbarAC/BESTAnalysis.git
cd BESTAnalysis
git checkout master
cd BoostedEventShapeTagger
scram b -j8
This configuration allows for BEST to be use as a function in CMS’s EDAnalyzer [1].
Using the BEST function
EDAnalyzer modules allow analysts to interact with the CMS framework in order to study physics data. I will not go into great detail about how to use EDAnalyzer, because there is already a great tutorial in the CMS work book. I did include a couple of useful instructions from this tutorial to get the BEST function up and running.
Getting an EDAnalyzer Template
A template EDAnalyzer can be automatically created by CMSSW.
cd CMSSW_9_4_8/src/
mkdir Demo
cd Demo
mkedanalyzer DemoAnalyzer
cd DemoAnalyzer
scram b
Edit the EDAnalyzer
The directory CMSSSW_9_4_8/src/Demo/DemoAnalyzer/plugins
contains several files that need to be altered. First,
add the necessary functions to the BuildFile.xml
[1].
<use name="lwtnn/lwtnn"/>
<use name="BESTAnalysis/BoostedEventShapeTagger"/>
In the same directory, the DemoAnalyzer.cc file must also be altered [1].
// In the include files
#include "FWCore/ServiceRegistry/interface/Service.h"
#include "CommonTools/UtilAlgos/interface/TFileService.h"
#include "TTree.h"
#include "TFile.h"
#include "BESTAnalysis/BoostedEventShapeTagger/interface/BoostedEventShapeTagger.h"
#include "DataFormats/PatCandidates/interface/Jet.h"
// In the class, under private:
BoostedEventShapeTagger *m_BEST;
TTree *bestTree;
edm::EDGetTokenT<std::vector<pat::Jet> > ak8JetsToken_;
std::map<std::string, float> treeVars;
std::vector<std::string> listOfVars;
// In the constructor pass the name of the configuration file
m_BEST = new BoostedEventShapeTagger( "/full_path/BESTAnalysis/BoostedEventShapeTagger/data/config.txt" );
// Use TFile service to create a tree to store histogram variables
edm::Service<TFileService> fs;
bestTree = fs->make<TTree>("bestTree","bestTree");
// Create tree variables and branches
listOfVars.push_back("jet1_particleType");
listOfVars.push_back("jet1_dnn_w");
listOfVars.push_back("jet1_dnn_z");
listOfVars.push_back("jet1_dnn_h");
listOfVars.push_back("jet1_dnn_top");
listOfVars.push_back("jet1_dnn_qcd");
for (unsigned i = 0; i < listOfVars.size(); i++){
treeVars[ listOfVars[i] ] = -999.99;
bestTree->Branch( (listOfVars[i]).c_str() , &(treeVars[ listOfVars[i] ]), (listOfVars[i]+"/F").c_str() );
}
// Define input tags
edm::InputTag ak8JetsTag_;
ak8JetsTag_ = edm::InputTag("slimmedJetsAK8", "", "PAT");
ak8JetsToken_ = consumes<std::vector<pat::Jet> >(ak8JetsTag_);
// In the destructor
delete m_BEST;
// In the event loop
using namespace edm;
using namespace std;
// Find objects corresponding to the token and link to the handle
Handle< std::vector<pat::Jet> > ak8Jets;
iEvent.getByToken(ak8JetsToken_, ak8Jets);
// loop over the jets
int nJets = 0;
for (std::vector<pat::Jet>::const_iterator jetBegin = ak8Jets->begin(), jetEnd = ak8Jets->end(), ijet = jetBegin; ijet != jetEnd; ++ijet){
// These are the requirements for a Jet to be used with BEST
if(ijet->numberOfDaughters() >= 2 && ijet->pt() >= 500 && ijet->userFloat("ak8PFJetsCHSSoftDropMass") > 40 ){
nJets++;
std::map<std::string,double> NNresults = m_BEST->execute(*ijet); // ijet is a pat::Jet
int particleType = m_BEST->getParticleID(); // automatically calculate the particle classification
if(nJets == 1){
treeVars["jet1_particleType"] = particleType;
treeVars["jet1_dnn_w"] = NNresults["dnn_w"];
treeVars["jet1_dnn_z"] = NNresults["dnn_z"];
treeVars["jet1_dnn_h"] = NNresults["dnn_higgs"];
treeVars["jet1_dnn_top"] = NNresults["dnn_top"];
treeVars["jet1_dnn_qcd"] = NNresults["dnn_qcd"];
}
}
}
// Fill the tree that stores the NNresults
bestTree->Fill();
Now, in order to properly compile everything, the BESTAnalyzer and BESTProducer directories must be removed as they are not compatible with CMSSW_9_X.
cd CMSSW_9_4_8/src/
rm -r -f BESTAnalysis/BESTAnalyzer/
rm -r -f BESTAnalysis/BESTProducer/
Finally, in the file BESTAnalysis/BoostedEventShapeTagger/data/config.txt
the dnnFile
must be updated
to include the full path. After this, everything can be compiled.
cd CMSSW_9_4_8/src/
scram b -j8
Now a run.py
file must be created in order to use the EDAnalyzer. The run file should be made in
CMSSW_9_4_8/src/Demo/DemoAnalyzer/test
and should include the following:
import FWCore.ParameterSet.Config as cms
process = cms.Process("run")
process.load("FWCore.MessageService.MessageLogger_cfi")
process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(-1) )
process.source = cms.Source("PoolSource",
# replace 'myfile.root' with the source file you want to use
fileNames = cms.untracked.vstring(
"file:myfile.root"
)
)
process.MessageLogger.cerr.FwkReport.reportEvery = 1000
process.run = cms.EDAnalyzer('DemoAnalyzer')
process.TFileService = cms.Service("TFileService", fileName = cms.string("best_results.root") )
process.out = cms.OutputModule("PoolOutputModule",
fileName = cms.untracked.string("ana_out.root"),
SelectEvents = cms.untracked.PSet( SelectEvents = cms.vstring('p') ),
outputCommands = cms.untracked.vstring('drop *',
'keep *_*run*_*_*'
#, 'keep *_goodPatJetsCATopTagPF_*_*'
#, 'keep recoPFJets_*_*_*'
)
)
process.outpath = cms.EndPath(process.out)
process.p = cms.Path(process.run)
Now the EDAnalyzer can be run!
cd CMSSW_9_4_8/src/Demo/DemoAnalyzer/test/
cmsenv
cmsRun run.py
The output will be best_results.root
which contains the BEST probabilities for the first
AK8 jet identified in an event. This process can be expanded to include the BEST probability
results for all of the AK8 jets in an event. Below is a table explaining what each of the
NNresults strings mean [1].
String | Definition |
---|---|
dnn_w |
BEST’s calculated probability that the AK8 jet is from a W boson |
dnn_z |
BEST’s calculated probability that the AK8 jet is from a Z boson |
dnn_higgs |
BEST’s calculated probability that the AK8 jet is from a Higgs boson |
dnn_top |
BEST’s calculated probability that the AK8 jet is from a top quark |
dnn_qcd |
BEST’s calculated probability that the AK8 jet is from a QCD background |
References
- justinrpilot, BESTAnalysis, https://github.com/justinrpilot/BESTAnalysis/tree/master
- demarley, lwtnn, https://github.com/demarley/lwtnn/tree/CMSSW_8_0_X-compatible#cmssw-compatibility
- Previous
- Next