Issue | EPJ Nuclear Sci. Technol., Volume 4, 2018. Special Issue on 4th International Workshop on Nuclear Data Covariances, October 2–6, 2017, Aix en Provence, France – CW2017
---|---
Article Number | 30
Number of page(s) | 6
Section | Covariance Evaluation Methodology
DOI | https://doi.org/10.1051/epjn/2018038
Published online | 14 November 2018
Regular Article
Bayesian optimization of generalized data
1 Nuclear Data and Criticality Safety Group, Reactor and Nuclear Systems Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831-6171, USA
2 Department of Mechanical, Aerospace, and Nuclear Engineering, Rensselaer Polytechnic Institute, Troy, NY 12180-3590, USA
3 Department of Physics and Astronomy, University of Alabama, Huntsville, AL 35899, USA
4 Nuclear & Radiological Engineering & Medical Physics, Georgia Institute of Technology, Atlanta, GA 30332-0745, USA
* e-mail: arbanasg@ornl.gov
Received: 31 October 2017
Received in final form: 11 April 2018
Accepted: 28 May 2018
Published online: 14 November 2018
Direct application of Bayes' theorem to generalized data yields a posterior probability distribution function (PDF) that is a product of a prior PDF of generalized data and a likelihood function, where generalized data consists of model parameters, measured data, and model defect data. The prior PDF of generalized data is defined by prior expectation values and a prior covariance matrix of generalized data that naturally includes covariance between any two components of generalized data. A set of constraints imposed on the posterior expectation values and covariances of generalized data via a given model is formally solved by the method of Lagrange multipliers. Posterior expectation values of the constraints and their covariance matrix are conventionally set to zero, leading to a likelihood function that is a Dirac delta function of the constraining equation. It is shown that setting constraints to values other than zero is analogous to introducing a model defect. Since posterior expectation values of any function of generalized data are integrals of that function over all generalized data weighted by the posterior PDF, all elements of generalized data may be viewed as nuisance parameters marginalized by this integration. One simple form of posterior PDF is obtained when the prior PDF and the likelihood function are normal PDFs. For linear models without a defect, this PDF becomes equivalent to the constrained least squares (CLS) method, that is, the χ² minimization method.
© G. Arbanas et al., published by EDP Sciences, 2018
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1 Introduction
Advancement of scientific understanding of natural phenomena is partly due to a complementary activity of conceiving better models while performing experiments that test those models or explore their limits of validity. There are many historical examples of constructive interplay between conceptual models and experiments that have resulted in improved understanding of some phenomena. Although experiments are often viewed as objective and independent from the model they were designed to test, their conception and design may be based upon some model in which measured data are to be interpreted and compared to the prediction of the model being tested.
As our understanding of nature increases, the harmony between models and experiments will likely increase to reveal a complementary relationship between the conceptual models and experimental data. One practical way of formalizing the complementary nature of model parameters and experimental data is by defining their union that is conventionally called generalized data, although generalized parameters may be a more appropriate term because the aspect of fitting, which is conventionally restricted to parameters, is extended to experimental data by virtue of this generalization. This union is referred to as generalized data herein to maintain consistency with nomenclature used in literature. For defective models, this concept is extended to include the model defect data as the third component of generalized data.
In the process of applying Bayes' theorem [1] to generalized data, the farsightedness of Edwin T. Jaynes is apparent in this remark: “But every Bayesian problem is open ended; no matter how much analysis you have completed, this only suggests still other kinds of prior information that you might have had and therefore still more interesting calculations that need to be done to get still deeper insight into the problem” [2]. From this perspective, prior expectation values of the generalized data (model parameters, measured experimental data, and the model defect data) and their generalized covariance matrix, which by definition contains all pair-wise covariances among the three components of generalized data, are formally introduced as priors in the context of Bayes' theorem.
Direct application of Bayes' theorem to this generalized data yields a posterior PDF of generalized data from which posterior expectation values of generalized data and their posterior generalized covariance matrix can be computed. Furthermore, the constraint relating posterior expectation values of the three components of the generalized data (model parameters, measured experimental data, and the model defect data) and their covariance is imposed. The formal framework used in this work has been conducive to recognizing a connection between model defect and constraints presented here.
Key expressions related to Bayesian treatment of model defect have already been derived using standard notation in [3], and much has already been learned from implementations of various model defects and their effects on neutron cross sections [3–7].
Formal expressions in generalized data notation are derived in Section 2. In Section 2.1, Bayes' theorem [1] is applied directly to generalized data, yielding general expressions for generalized data optimization, that is, for simultaneous optimization of model parameters, measured experimental data, and model defect data. From the perspective of Bayes' theorem, generalized data have prior and posterior PDFs, meaning that each of the three components also has posterior values, just as model parameters do.
In Section 2.2, normal PDFs are used to derive a special case of the posterior PDF, and in Section 2.3, an alternative derivation focusing on posterior PDF of model parameters is presented.
A simple, analytically solvable example is presented in Section 3. In Section 4, the connection between our Bayesian generalized data optimization method and the constrained generalized least squares (CGLS) method [8] is discussed. A linear form of CGLS has been implemented in the module TSURFER [9] of the SCALE system [10]. The CGLS method defines an objective function, often called χ², along with a constraint that is enforced by Lagrange multipliers. The relationship to the conventional χ² minimization is also discussed.
2 Derivation
This application of Bayes' theorem is relatively simple, illustrating the words of the late Bayesian advocate Edwin T. Jaynes: “The difficulties are never mathematical,” but they could be described more accurately as “conceptual difficulties” [2].
The posterior PDF of generalized data is derived by applying Bayes' theorem in Section 2.1. Compact generalized data notation is used whenever possible, except when particular aspects of model parameters and experimental data must be distinguished. A distinction is maintained in this derivation between the probability p(z|⟨z⟩, …) that a variable attains a particular value z and the corresponding expectation value, denoted by ⟨z⟩.
Application of Bayes' theorem to generalized data implies that both prior parameters and the new experimental data together constitute prior generalized data. That is, since a given set of experimental data has not yet been used in an evaluation, it should be viewed as a prior from the perspective of Bayes' theorem. Since experimental data are to be treated as a prior, it may be expected that Bayes' theorem would naturally yield an optimal posterior PDF of experimental data, just as it does conventionally for model parameter values. Prior model parameters, measured data, and model defect data simultaneously inform each other in a way that yields their optimal posterior joint PDF.
An expression for posterior PDF of generalized data is derived and then used to compute posterior expectation values and covariances. In Section 2.3, conventional methods are used to derive posterior probability distribution of model parameters, where the total probability theorem is used to derive an expression equivalent to that derived in Section 2.1. The derivation based on generalized data notation may be more compact, but the conventional derivation may be more intuitive to those accustomed to PDFs of model parameters. Mathematical guidelines and nomenclature used in derivations are listed here for convenience:
- Generalized data, z ≡ (P, D, δ), is a union of model parameters (P), measured experimental data (D), and model defect data (δ).
- The covariance matrix of generalized data, C, by definition contains covariance among all three components of generalized data.
- Bayes' theorem is applied directly to generalized data.
- A set of constraints, f, on posterior expectation values and covariances of generalized data is imposed on the posterior PDF, and formally solved by the Lagrange multiplier method.
- A model, i.e., theory T(⋅), relates model parameters, measured data, and model defect data in the definition of constraints.
- Distinction between expectation values, i.e., ⟨z⟩, and their instance value, i.e., z, is maintained in PDFs.
- Posterior expectation values are indicated by a prime, ⟨z⟩′ and C′, while unprimed ones represent prior expectation values, ⟨z⟩ and C.
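The following expressions are referenced within the paper by assigning a context-dependent meaning to the generic variables α, β, and γ. A generic Bayes' theorem could be stated as

$$ p(\alpha|\beta,\gamma) \propto p(\beta|\alpha,\gamma)\, p(\alpha|\gamma), \qquad (1) $$

while a generic product rule of probability theory is

$$ p(\alpha,\beta|\gamma) = p(\alpha|\beta,\gamma)\, p(\beta|\gamma). \qquad (2) $$

Integrating equation (2) over β yields the law of total probability,

$$ p(\alpha|\gamma) = \int d\beta\; p(\alpha|\beta,\gamma)\, p(\beta|\gamma), \qquad (3) $$

that is equivalent to marginalization of the nuisance parameter β by integrating over all of its possible values.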
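The generic Bayes' theorem, product rule, and law of total probability can be checked numerically on a small discrete example. The sketch below is illustrative only: the table `joint` and all variable names are hypothetical, and the background condition γ is held fixed and left implicit.

```python
import numpy as np

# Hypothetical joint probabilities p(alpha, beta | gamma) on a 2 x 3 grid;
# rows index alpha, columns index beta. The numbers are illustrative only.
joint = np.array([[0.10, 0.25, 0.05],
                  [0.20, 0.10, 0.30]])

p_alpha = joint.sum(axis=1)   # p(alpha | gamma), marginalized over beta
p_beta = joint.sum(axis=0)    # p(beta | gamma), marginalized over alpha

# Conditionals obtained from the product rule.
p_alpha_given_beta = joint / p_beta              # p(alpha | beta, gamma)
p_beta_given_alpha = joint / p_alpha[:, None]    # p(beta | alpha, gamma)

# Law of total probability: sum over beta of p(alpha | beta) p(beta).
total_prob = (p_alpha_given_beta * p_beta).sum(axis=1)

# Bayes' theorem: p(alpha | beta) is proportional to p(beta | alpha) p(alpha).
unnormalized = p_beta_given_alpha * p_alpha[:, None]
bayes_posterior = unnormalized / unnormalized.sum(axis=0)
```

Summing over β recovers the marginal `p_alpha`, and normalizing the product of likelihood and prior recovers the conditional `p_alpha_given_beta`, which is the discrete counterpart of marginalizing a nuisance parameter.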
2.1 Derivation in generalized data notation
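The definition of the generalized data vector z is extended to include model defect data δ in addition to parameters P and measured data D, namely

$$ z \equiv (P, D, \delta), \qquad (4) $$

where prior expectation values of generalized data are

$$ \langle z \rangle \equiv (\langle P \rangle, \langle D \rangle, \langle \delta \rangle), \qquad (5) $$

and where the prior covariance matrix of generalized data is represented by a 3 × 3 block matrix C,

$$ C \equiv \langle (z - \langle z \rangle)(z - \langle z \rangle)^{\top} \rangle \qquad (6) $$

$$ = \begin{pmatrix} M & W & X \\ W^{\top} & V & Y \\ X^{\top} & Y^{\top} & \Delta \end{pmatrix}, \qquad (7) $$

where the square matrices M, V, and Δ along the diagonal represent the covariance matrices of parameters, measured data, and the model defect, respectively, while W, X, and Y are their respective pair-wise covariances. The prior expectation value of the model defect, ⟨δ⟩, is a vector of the same size as the measured data ⟨D⟩, and it is the expectation value of deviations between model predictions T(P) and the measured data caused by the model defect alone. Bayes' theorem is used to write a posterior PDF for z ≡ (P, D, δ) by making the following substitution in equation (1),

$$ \alpha \to z, \quad \beta \to f, \quad \gamma \to (\langle z \rangle, C), \qquad (8) $$

to obtain

$$ p(z|f, \langle z \rangle, C) \propto p(z|\langle z \rangle, C)\, p(f|z, \langle z \rangle, C), \qquad (9) $$

where p(z|⟨z⟩, C) is the prior PDF, f is a set of constraints imposed on the posterior expectation values, and p(f|z, ⟨z⟩, C) is the likelihood function. Constraints in f are defined by an auxiliary quantity that relates components of generalized data as

$$ \omega \equiv T(P) - D - \delta, \qquad (10) $$

as constraints on their posterior expectation values and their posterior covariance matrix elements,

$$ f: \quad \langle \omega \rangle', \quad \Omega' \equiv \langle \omega\, \omega^{\top} \rangle', \qquad (11) $$

where ⟨ω⟩′ and Ω′ are given, and where posterior expectation values are indicated by primes.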
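The block structure of the prior covariance matrix of generalized data can be assembled directly from its blocks. The following is a minimal sketch with made-up numbers; the name `Delta` for the model defect block and the placement of W, X, and Y as the (P, D), (P, δ), and (D, δ) covariances are assumptions made for illustration.

```python
import numpy as np

n_p, n_d = 2, 3  # hypothetical numbers of parameters and data points

# Hypothetical diagonal blocks: parameters (M), data (V), model defect (Delta).
M = np.diag([0.04, 0.09])
V = 0.01 * np.eye(n_d)
Delta = 0.02 * np.eye(n_d)

# Hypothetical pair-wise covariances; the block placement is an assumption.
W = np.full((n_p, n_d), 0.001)   # cov(P, D)
X = np.zeros((n_p, n_d))         # cov(P, delta)
Y = 0.002 * np.eye(n_d)          # cov(D, delta)

# Generalized covariance matrix for z = (P, D, delta).
C = np.block([[M,   W,   X],
              [W.T, V,   Y],
              [X.T, Y.T, Delta]])

# A valid covariance matrix must be symmetric and positive definite.
is_symmetric = np.allclose(C, C.T)
min_eigenvalue = np.linalg.eigvalsh(C).min()
```

Checking symmetry and the smallest eigenvalue is a cheap safeguard, because arbitrarily chosen cross-covariances can make the assembled matrix indefinite.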
Constraints on posterior expectation values and covariances yield a likelihood function formally expressed via Lagrange multipliers, so that the posterior PDF becomes

$$ p(z|f, \langle z \rangle, C) \propto p(z|\langle z \rangle, C)\, \exp\!\left(-\lambda^{\top}\omega - \omega^{\top}\Lambda\,\omega\right), \qquad (12) $$

where the vector λ and the matrix Λ constitute a set of Lagrange multipliers to be determined from the constraint set f. This posterior PDF of generalized data implicitly contains a combined posterior PDF of parameters, measured data, and model defect data that has been informed by all prior information available, namely, by ⟨z⟩, C, and the constraint f enforced on the posterior expectation values and covariances.
Upon normalizing the posterior PDF of generalized data to unity, the posterior expectation value of any function g(z) of generalized data z can be computed as an integral over generalized data,

$$ \langle g(z) \rangle' = \int dz\; g(z)\, p(z|f, \langle z \rangle, C), \qquad (13) $$

which is also used to compute the posterior expectation values, ⟨z⟩′, and their posterior covariance matrix C′. This posterior PDF yields the given ⟨ω⟩′ and Ω′.
One consequence of the derivation of the posterior PDF of generalized data is that any expectation value computed with this PDF entails integration over all generalized data. Therefore, generalized data (model parameters, measured data, and model defect data) could be viewed as nuisance parameters marginalized by integration.
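Expectation values of the constraint parameters are generally set to

$$ \langle \omega \rangle' = 0, \qquad (14) $$

$$ \Omega' = 0, \qquad (15) $$

where this choice defines a particular set of constraints labeled f0. Since the diagonal elements of Ω′ are the posterior expectation values of ω_i², and since PDFs are positive functions, this constraint could be satisfied by enforcing ω = 0 for all values of z inside the integral. This suggests an effective likelihood function

$$ p(f_0|z, \langle z \rangle, C) = \delta_{\rm Dirac}(\omega). \qquad (16) $$

With this likelihood function, the expectation values computed by the posterior PDF become, up to normalization of the posterior PDF,

$$ \langle g(z) \rangle' \propto \int dz\; g(z)\, p(z|\langle z \rangle, C)\, \delta_{\rm Dirac}(\omega), \qquad (17) $$

where p(z|⟨z⟩, C) is the prior PDF of generalized data.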
A δDirac(ω) likelihood function of a defective model effectively reduces integration over z = (P, D, δ) to integration over (P, D), and the model defect variable δ is replaced by T(P) − D in the prior PDF. This component of the prior PDF has features similar to those of the likelihood function obtained by setting non-zero constraints ⟨ω⟩′ and Ω′ for a perfect model. Conversely, non-zero values of ⟨ω⟩′ and Ω′ for a perfect model with ⟨δ⟩ = ⟨δ⟩′ = 0 and Δ = Δ′ = 0 yield a PDF analogous to that obtained by setting constraints to zero and introducing a model defect ⟨δ⟩ and Δ. This could be phrased as

$$ \{\langle \omega \rangle',\ \Omega';\ \langle \delta \rangle = 0,\ \Delta = 0\} \sim \{\langle \omega \rangle' = 0,\ \Omega' = 0;\ \langle \delta \rangle,\ \Delta\}, \qquad (18) $$

or in shorthand

$$ \{f;\ \text{perfect model}\} \sim \{f_0;\ \text{model defect}\}. \qquad (19) $$

This point will be elaborated upon in Sections 2.2 and 3. The exact connection between the two approaches may be intricate because constraints are defined on posterior expectation values and covariances, while the model defect is defined by prior model defect data expectation values and covariances.
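The posterior generalized data PDF enables computation of the covariance matrix of posterior model values ⟨T(P)⟩′, corresponding to experimental data D, to all orders,

$$ \langle (T(P) - \langle T(P) \rangle')(T(P) - \langle T(P) \rangle')^{\top} \rangle' = \int dz\; (T(P) - \langle T(P) \rangle')(T(P) - \langle T(P) \rangle')^{\top}\, p(z|f, \langle z \rangle, C), \qquad (20) $$

in contrast to the first-order approximation expression,

$$ \left.\frac{\partial T}{\partial P}\right|_{\langle P \rangle'} M' \left.\frac{\partial T}{\partial P}\right|_{\langle P \rangle'}^{\top}, \qquad (21) $$

where M′ is the posterior parameter covariance matrix, reported in evaluated nuclear data files like the ENDF [11].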
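The difference between an all-orders covariance of model values and its first-order approximation can be illustrated with a scalar toy case. The sketch below assumes a hypothetical model T(P) = P² and a normal posterior PDF for P with made-up mean and variance; it is not taken from any evaluation.

```python
import numpy as np

rng = np.random.default_rng(0)

mean_p, var_p = 1.0, 0.25        # hypothetical posterior mean and variance of P

def model(p):
    """Hypothetical nonlinear model T(P) = P**2."""
    return p ** 2

# All-orders variance of T(P): expectation over samples of the posterior PDF.
samples = rng.normal(mean_p, np.sqrt(var_p), size=200_000)
var_all_orders = model(samples).var()

# First-order estimate: sensitivity * var(P) * sensitivity, evaluated at the mean.
sensitivity = 2.0 * mean_p       # dT/dP at P = mean_p
var_first_order = sensitivity * var_p * sensitivity

# Closed form for a normal P: Var[P^2] = 4 mu^2 sigma^2 + 2 sigma^4.
var_exact = 4.0 * mean_p**2 * var_p + 2.0 * var_p**2
```

For these illustrative numbers the first-order estimate is 1.0, while the closed form gives 1.125; the sampled value approaches the latter, and the gap grows with the parameter variance.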
2.2 Posterior PDF for normal PDFs
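Although this formalism applies to arbitrary PDFs, a particularly simple form is attained when a normal form is assumed for all PDFs. In that case, the prior PDF becomes

$$ p(z|\langle z \rangle, C) = N(z; \langle z \rangle, C) \qquad (22) $$

$$ \propto \exp\!\left[-\tfrac{1}{2}(z - \langle z \rangle)^{\top} C^{-1} (z - \langle z \rangle)\right], \qquad (23) $$

where N stands for a normal PDF, and the likelihood function could be stated as

$$ p(f|z, \langle z \rangle, C) = N(\omega; \omega_f, \Omega_f) \qquad (24) $$

$$ \propto \exp\!\left[-\tfrac{1}{2}(\omega - \omega_f)^{\top} \Omega_f^{-1} (\omega - \omega_f)\right], \qquad (25) $$

where ω is defined in equation (10), ωf is an effective parameter vector, and Ωf is an effective covariance matrix such that this posterior PDF obeys the constraint f on the posterior expectation values ⟨ω⟩′ and Ω′. The unknown parameters ωf and Ωf play a role equivalent to the Lagrange multipliers λ and Λ in equation (12).
Combining the normal prior PDF and the normal likelihood function yields a posterior PDF

$$ p(z|f, \langle z \rangle, C) \propto \exp\!\left[-\tfrac{1}{2}(z - \langle z \rangle)^{\top} C^{-1} (z - \langle z \rangle) - \tfrac{1}{2}(\omega - \omega_f)^{\top} \Omega_f^{-1} (\omega - \omega_f)\right], \qquad (26) $$

subject to the aforementioned constraints in f.
The constraint set f0, namely ⟨ω⟩′ = 0 and Ω′ = 0, is satisfied for ωf = 0 and Ωf → 0, for which the normal likelihood function in equation (24) becomes a Dirac delta function, so that the posterior PDF becomes

$$ p(z|f_0, \langle z \rangle, C) \propto \exp\!\left[-\tfrac{1}{2}(z - \langle z \rangle)^{\top} C^{-1} (z - \langle z \rangle)\right] \delta_{\rm Dirac}(\omega). \qquad (27) $$

Furthermore, for models without a defect, that is, ⟨δ⟩ = 0 and Δ → 0, this posterior PDF becomes

$$ p(\hat{z}|f_0, \langle \hat{z} \rangle, \hat{C}) \propto \exp\!\left[-\tfrac{1}{2}(\hat{z} - \langle \hat{z} \rangle)^{\top} \hat{C}^{-1} (\hat{z} - \langle \hat{z} \rangle)\right] \delta_{\rm Dirac}(T(P) - D), \qquad (28) $$

where

$$ \hat{z} \equiv (P, D), \qquad (29) $$

and Ĉ is the covariance matrix corresponding to ẑ. In Section 4, it will be shown that the expression for the posterior PDF in this limit is equivalent to the CGLS method implemented in the APLCON code, or to its linear approximation implemented in the TSURFER module of the SCALE code system.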
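For a linear model and a normal prior PDF, a Dirac delta likelihood amounts to conditioning the prior on a linear constraint, which has a closed form. The sketch below assumes a hypothetical linear model T(P) = S P, the sign convention ω = T(P) − D − δ, and made-up prior values; the matrix `A` and all numbers are illustrative assumptions rather than anything prescribed by the paper.

```python
import numpy as np

S = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])          # hypothetical sensitivity matrix, T(P) = S @ P
n_d, n_p = S.shape

# Hypothetical prior expectation values of z = (P, D, delta).
z_prior = np.concatenate([[0.5, 1.0],          # <P>
                          [0.4, 1.7, 2.4],     # <D>
                          np.zeros(n_d)])      # <delta>

# Hypothetical block-diagonal prior covariance (no cross-covariances).
C = np.diag(np.concatenate([[1.0, 1.0],
                            0.01 * np.ones(n_d),
                            0.04 * np.ones(n_d)]))

# Linear constraint omega = S P - D - delta = A z.
A = np.hstack([S, -np.eye(n_d), -np.eye(n_d)])

# Conditioning a normal PDF on A z = 0; the same result follows from
# minimizing the quadratic form with Lagrange multipliers.
gain = C @ A.T @ np.linalg.inv(A @ C @ A.T)
z_post = z_prior - gain @ (A @ z_prior)
C_post = C - gain @ A @ C
```

The posterior expectation values in `z_post` satisfy the constraint exactly, and `C_post` has no variance left along ω, so parameters, data, and model defect are all updated together.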
2.3 Conventional derivation of posterior parameter PDF
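Making the following substitutions into the generic Bayes' theorem in equation (1),

$$ \alpha \to P, \quad \beta \to f, \quad \gamma \to (\langle z \rangle, C), \qquad (30) $$

one obtains

$$ p(P|f, \langle z \rangle, C) \propto p(P|\langle z \rangle, C)\, p(f|P, \langle z \rangle, C), \qquad (31) $$

where the second factor on the right-hand side can be expressed as a nested integral over all possible values of measured data D and model defect data δ, given their expectation values ⟨D⟩ and ⟨δ⟩, respectively, and their covariance matrix C, by using the total probability theorem in equation (3):

$$ p(f|P, \langle z \rangle, C) = \int dD \int d\delta\; p(f|P, D, \delta, \langle z \rangle, C)\, p(D, \delta|P, \langle z \rangle, C). \qquad (32) $$

The first factor in equation (31) and the second factor in the integrand of equation (32) could be combined by making the substitutions

$$ \alpha \to (D, \delta), \quad \beta \to P, \quad \gamma \to (\langle z \rangle, C) \qquad (33) $$

into the product rule in equation (2) to obtain

$$ p(D, \delta|P, \langle z \rangle, C)\, p(P|\langle z \rangle, C) = p(P, D, \delta|\langle z \rangle, C) = p(z|\langle z \rangle, C). \qquad (34) $$

Combining all terms yields

$$ p(P|f, \langle z \rangle, C) \propto \int dD \int d\delta\; p(f|z, \langle z \rangle, C)\, p(z|\langle z \rangle, C) $$

$$ \propto \int dD \int d\delta\; p(z|f, \langle z \rangle, C), \qquad (35) $$

where Bayes' theorem stated by equation (9) was used to introduce p(z|f, ⟨z⟩, C) in the integrand on the last line above. This shows that a partial posterior PDF of parameters, P, is simply an integral of the posterior PDF over all measured data, D, and model defect data, δ.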
3 Simple example
The correspondence between a constraint set f and a model defect suggested above is illustrated with a simple, analytically solvable example. A simple model without a defect is defined as T(P) = P with a single scalar parameter P and a single data point D, with constraints f: ⟨ω⟩′ = 0 and Ω′ ≡ ⟨ω²⟩′ = 2/3. A corresponding model with a defect, ⟨δ⟩ = 0 and Δ = 1, has its constraints set to zero, namely f0: ⟨ω⟩′ = 0 and Ω′ = 0, while its covariance matrix has a component corresponding to the model defect, namely C3,3 = Δ = 1. For simplicity, the prior expectation values of generalized data are set to zero, ⟨z⟩ = 0, and the prior covariance matrix C for both models is set to the identity matrix, so that posterior expectation values remain unchanged for both models, that is, ⟨z⟩ = ⟨z⟩′ = 0.
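First we consider a model without a defect, z = (P, D), whose prior covariance matrix is defined as

$$ C = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad (36) $$

and whose posterior PDF is expressed as a product of its prior PDF and the likelihood function expressed in terms of Lagrange multipliers,

$$ p(z|f, \langle z \rangle, C) \propto \exp\!\left[-\tfrac{1}{2}(P^2 + D^2)\right] \exp\!\left[-\lambda (P - D) - \Lambda (P - D)^2\right], \qquad (37) $$

where λ = 0 and Λ = 1/2 yield a posterior PDF that satisfies the imposed constraints, that is, ⟨ω⟩′ = 0 and Ω′ = 2/3.
For a model with a defect, z = (P, D, δ), the prior covariance matrix

$$ C = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \qquad (38) $$

yields a posterior PDF that satisfies f0: ⟨ω⟩′ = 0 and Ω′ = 0, as

$$ p(z|f_0, \langle z \rangle, C) \propto \exp\!\left[-\tfrac{1}{2}(P^2 + D^2 + \delta^2)\right] \delta_{\rm Dirac}(P - D - \delta). \qquad (39) $$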
Upon integration over δ, the remaining PDF becomes equivalent to the posterior PDF in equation (37). In this simple illustration, numerical constants were chosen to make the correspondence between the two approaches exact.
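The correspondence in this example can be verified numerically with a few lines of linear algebra. The sketch assumes that the Lagrange-multiplier likelihood factor has the form exp(−λω − Λω²) with ω = P − D for the perfect model, and that the defective model uses ω = P − D − δ with a Dirac delta likelihood; variable names are hypothetical.

```python
import numpy as np

# Perfect model, z = (P, D): identity prior covariance times the likelihood
# factor exp(-Lambda * omega**2) with omega = P - D and Lambda = 1/2.
Lambda = 0.5
a2 = np.array([[1.0, -1.0]])                 # omega = a2 @ (P, D)
precision = np.eye(2) + 2.0 * Lambda * a2.T @ a2
C_post_constraint = np.linalg.inv(precision)
Omega_post = (a2 @ C_post_constraint @ a2.T).item()

# Defective model, z = (P, D, delta): identity prior covariance conditioned on
# omega = P - D - delta = 0 (Dirac delta likelihood).
a3 = np.array([[1.0, -1.0, -1.0]])
C3 = np.eye(3)
gain = C3 @ a3.T @ np.linalg.inv(a3 @ C3 @ a3.T)
C_post_defect = C3 - gain @ a3 @ C3

# Marginalizing a normal PDF over delta keeps the (P, D) covariance block.
C_post_defect_PD = C_post_defect[:2, :2]
```

Both routes give the same posterior covariance of (P, D), and `Omega_post` evaluates to 2/3, which is the constraint value quoted for the perfect model.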
4 Connections to other methods
To establish connections to other methods, all PDFs are assumed to be normal; consequently, the exponents in equations (22) and (24) could be combined to define a generalized cost function,

$$ Q(z) \equiv (z - \langle z \rangle)^{\top} C^{-1} (z - \langle z \rangle) + (\omega - \omega_f)^{\top} \Omega_f^{-1} (\omega - \omega_f), \qquad (40) $$

where the constraint T(P) − D = δ is enforced by setting ωf = 0 and Ωf → 0. This cost function can be minimized by using the Laplace approximation and the Newton-Raphson method to yield approximate posterior expectation values of generalized data, ⟨z⟩′ ≈ zmin, and of its covariance matrix, C′ ≈ Cmin [12]. For a perfect model, one may set ⟨δ⟩ → 0 and Δ → 0 to obtain

$$ \chi^2 \equiv (\hat{z} - \langle \hat{z} \rangle)^{\top} \hat{C}^{-1} (\hat{z} - \langle \hat{z} \rangle), \qquad (41) $$

where ẑ ≡ (P, D) and Ĉ is the corresponding covariance matrix, with the constraint T(P) = D. A constrained minimization of this cost function is performed by the TSURFER code [9] and the APLCON code [13], where the constraint is enforced by the Lagrange multiplier method. The values of zmin that minimize χ² are then approximate expectation values of posterior generalized data, ⟨z⟩′ ≈ zmin. Since TSURFER makes a linear approximation of the model, its method is referred to as generalized linear least squares (GLLS). In conventional GLS, which is also known as the χ² minimization method, the constraint is applied to the generalized data, ẑ ≡ (P, T(P)), and the difference in equation (41) is replaced by

$$ \hat{z} - \langle \hat{z} \rangle \to (P - \langle P \rangle,\ T(P) - \langle D \rangle), \qquad (42) $$

with no constraint enforced.
A common approximation to the χ² function is obtained for a block-diagonal generalized data covariance matrix C, with parameter covariance matrix M and experimental data covariance matrix V along the diagonal:

$$ \chi^2 = (P - \langle P \rangle)^{\top} M^{-1} (P - \langle P \rangle) + (T(P) - \langle D \rangle)^{\top} V^{-1} (T(P) - \langle D \rangle). \qquad (43) $$

Minimization of χ² with respect to P yields a solution vector Pmin that is approximately equal to the posterior expectation value of parameters ⟨P⟩′ [14]. This definition of χ² has been used in nuclear data evaluations and is also the quantity that is minimized in generic optimization codes like MINUIT [15]. That these methods could lead to incorrect posterior values and covariances has been recognized by Capote [16].
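A conventional χ² minimization of this kind, with a parameter prior term weighted by the inverse of M and a data term weighted by the inverse of V, can be sketched with a Gauss-Newton variant of the Newton-Raphson iteration. Everything specific below is a hypothetical assumption for illustration: the exponential model, the measurement grid, the synthetic noise-free data, and the covariance values.

```python
import numpy as np

x = np.linspace(0.0, 2.0, 5)                  # hypothetical measurement grid

def model(p):
    """Hypothetical nonlinear model T(P) = P0 * exp(-P1 * x)."""
    return p[0] * np.exp(-p[1] * x)

def jacobian(p):
    """Analytic sensitivities dT/dP, shape (len(x), 2)."""
    e = np.exp(-p[1] * x)
    return np.column_stack([e, -p[0] * x * e])

p_prior = np.array([1.0, 1.0])                # <P>
M_inv = np.linalg.inv(np.diag([0.25, 0.25]))  # inverse parameter covariance
d_meas = model(np.array([1.2, 0.8]))          # synthetic <D> without noise
V_inv = np.linalg.inv(0.01 * np.eye(x.size))  # inverse data covariance

def chi2_gradient(p):
    """Half the gradient of chi2 with respect to P."""
    return M_inv @ (p - p_prior) + jacobian(p).T @ V_inv @ (model(p) - d_meas)

p = p_prior.copy()
for _ in range(50):
    J = jacobian(p)
    hessian = M_inv + J.T @ V_inv @ J         # Gauss-Newton approximation
    step = np.linalg.solve(hessian, chi2_gradient(p))
    p = p - step
    if np.linalg.norm(step) < 1e-12:
        break

p_min = p
# Approximate posterior parameter covariance from the last Gauss-Newton Hessian.
M_min = np.linalg.inv(hessian)
```

In this approach only the parameters are adjusted; the data enter through their prior values and covariance, which is the simplification that distinguishes it from the constrained treatment of the full generalized data vector.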
5 Conclusions and outlook
A new, general expression for the posterior PDF of generalized data has been derived, where generalized data refers to a union of model parameters, measured data, and model defect data, starting from the Bayes' theorem. An analogy between the constraints on posterior expectation values and model defect data has been suggested in this work, and further study may be needed to better understand their connections and potential applications. Key ingredients used in this derivation are:
- use of generalized data that is a union of parameters, measured data, and model defect;
- formal recognition of constraints on posterior expectation values and covariances when applying Bayes' theorem;
- formal and consistent separation between expectation and instance values of generalized data.
A direct consequence of application of Bayes' theorem to generalized data, of which experimental data are a subset, is that the posterior PDF of generalized data yields posterior expectation values and covariances for experimental data and model defect data. This is in contrast to the prevalent nuclear data evaluation practice where posterior PDF of experimental data and its covariances are identical to those of model predictions.
A normal form of posterior PDF of generalized data, obtained by setting all constituent PDFs to be normal and assuming a perfect model, was found useful to establish a connection to extant optimization methods, namely CGLS implemented in APLCON and its linear approximation implemented in the TSURFER module of SCALE.
The appealing features of the posterior PDF of generalized data listed above make it a candidate for simultaneous and consistent optimization of model parameters and experimental data of differential cross section and integral benchmarks that would yield more accurate and complete evaluations and covariances. The presented method could simultaneously sample R-matrix resonance parameters, optical model potential parameters, and integral benchmark parameters such as spatial dimensions and material composition, to compute presently unknown covariances among integral benchmark experiments and cross section data. Although sensitivities of integral benchmark responses with respect to cross sections are not needed for calculations using the Bayesian Monte Carlo method, they could nevertheless be computed [17]. It is hoped that the derived method could complement conventional nuclear data adjustment methods [18].
Author contribution statement
All the authors were involved in the preparation of the manuscript. All the authors have read and approved the final manuscript.
Acknowledgments
Useful discussions with Ivan Kodeli, Mark Williams, Helmut Leeb, Georg Schnabel, Roberto Capote, and Christopher Perfetti are acknowledged. This work has been funded by the Nuclear Criticality Safety Program in the National Nuclear Security Administration of the United States Department of Energy.
References
- T. Bayes, Phil. Trans. Roy. Soc. 53, 370 (1763) [reprinted in E.S. Pearson and M.G. Kendall, Studies in the History of Statistics and Probability (Hafner, Darien, Conn., 1970)]
- E.T. Jaynes, Straight Line Fitting − a Bayesian Solution, http://bayes.wustl.edu/etj/articles/leapz.pdf (1991)
- G. Schnabel, Ph.D. Thesis, Technische Universität Wien, 2015
- M.T. Pigni, H. Leeb, in Proceedings of the International Workshop on Nuclear Data for the Transmutation of Nuclear Waste, GSI-Darmstadt, Germany, 2003
- H. Leeb, D. Neudecker, T. Srdinko, Consistent procedure for nuclear data evaluation based on modeling, Nucl. Data Sheets 109, 2762 (2008)
- D. Neudecker, R. Capote, H. Leeb, Impact of model defect and experimental uncertainties on evaluated output, Nucl. Instrum. Meth. Phys. Res. A 723, 163 (2013)
- G. Schnabel, H. Leeb, Differential cross sections and the impact of model defects in nuclear data evaluation, EPJ Web Conf. 111, 9001 (2016)
- V. Blobel, Constrained Least Squares Methods with Correlated Data and Systematic Uncertainties (2010), http://www.desy.de/blobel/apltalk.pdf
- M.L. Williams, B.L. Broadhead, M.A. Jessee, J.J. Wagschal, TSURFER: An Adjustment Code To Determine Biases and Uncertainties in Nuclear System Responses by Consolidating Differential Data and Benchmark Integral Experiments, Version 6.2.1, Vol. III, Sect. M21, ORNL/TM-2005/39 (2016)
- B.T. Rearden, M.A. Jessee, Eds., SCALE Code System, ORNL/TM-2005/39, Version 6.2.1 (Oak Ridge National Laboratory, Oak Ridge, Tennessee, 2016). Available from Radiation Safety Information Computational Center as CCC-834
- National Nuclear Data Center, Brookhaven National Laboratory, http://nndc.bnl.gov
- F. Fröhner, Evaluation and Analysis of Nuclear Resonance Data, JEFF Report 18, 2000
- V. Blobel (DESY), APLCON, downloadable from http://www.desy.de/blobel/wwwcondl.html
- N.M. Larson, Updated Users' Guide for SAMMY: Multilevel R-matrix Fits to Neutron Data Using Bayes' Equations, ORNL/TM-9179/R8 (2008)
- F. James, M. Roos, Comput. Phys. Commun. 10, 343 (1975)
- R. Capote, D.L. Smith, An investigation of the performance of the unified Monte Carlo method of neutron cross section data evaluation, Nucl. Data Sheets 109, 2768 (2008)
- L. Fiorito et al., Nuclear data uncertainty propagation to integral responses using SANDY, Ann. Nucl. Energy 101, 359 (2017)
- V. Sobes, L. Leal, G. Arbanas, B. Forget, Resonance parameter adjustment based on integral experiments, Nucl. Sci. Eng. 183, 347 (2016)
Cite this article as: Goran Arbanas, Jinghua Feng, Zia J. Clifton, Andrew M. Holcomb, Marco T. Pigni, Dorothea Wiarda, Christopher W. Chapman, Vladimir Sobes, Li Emily Liu, Yaron Danon, Bayesian optimization of generalized data, EPJ Nuclear Sci. Technol. 4, 30 (2018)