Bayesian optimization of generalized data

Direct application of Bayes’ theorem to generalized data yields a posterior probability distribution function (PDF) that is a product of a prior PDF of generalized data and a likelihood function, where generalized data consists of model parameters, measured data, and model defect data. The prior PDF of generalized data is defined by prior expectation values and a prior covariance matrix of generalized data that naturally includes covariance between any two components of generalized data. A set of constraints imposed on the posterior expectation values and covariances of generalized data via a given model is formally solved by the method of Lagrange multipliers. Posterior expectation values of the constraints and their covariance matrix are conventionally set to zero, leading to a likelihood function that is a Dirac delta function of the constraining equation. It is shown that setting constraints to values other than zero is analogous to introducing a model defect. Since posterior expectation values of any function of generalized data are integrals of that function over all generalized data weighted by the posterior PDF, all elements of generalized data may be viewed as nuisance parameters marginalized by this integration. One simple form of posterior PDF is obtained when the prior PDF and the likelihood function are normal PDFs. For linear models without a defect, this PDF becomes equivalent to the constrained least squares (CLS) method, that is, the χ² minimization method.


Introduction
Advancement of scientific understanding of natural phenomena is driven in part by the complementary activities of conceiving better models and performing experiments that test those models or explore their limits of validity. There are many historical examples of constructive interplay between conceptual models and experiments that have resulted in improved understanding of some phenomena. Although experiments are often viewed as objective and independent from the model they were designed to test, their conception and design may be based upon some model in which measured data are to be interpreted and compared to the prediction of the model being tested.
As our understanding of nature increases, the harmony between models and experiments will likely increase to reveal a complementary relationship between conceptual models and experimental data. One practical way of formalizing the complementary nature of model parameters and experimental data is by defining their union, conventionally called generalized data, although generalized parameters may be a more appropriate term because the aspect of fitting, which is conventionally restricted to parameters, is extended to experimental data by virtue of this generalization. This union is referred to as generalized data herein to maintain consistency with the nomenclature used in the literature. For defective models, this concept is extended to include the model defect data as the third component of generalized data.
In the process of applying Bayes' theorem [1] to generalized data, the farsightedness of Edwin T. Jaynes's insight is apparent in this remark: "But every Bayesian problem is open ended; no matter how much analysis you have completed, this only suggests still other kinds of prior information that you might have had and therefore still more interesting calculations that need to be done to get still deeper insight into the problem" [2]. From this perspective, prior expectation values of the generalized data (model parameters, measured experimental data, and the model defect data) and their generalized covariance matrix, which by definition contains all pair-wise covariances among the three components of generalized data, are formally introduced as priors in the context of Bayes' theorem.
Direct application of Bayes' theorem to this generalized data yields a posterior PDF of generalized data from which posterior expectation values of generalized data and their posterior generalized covariance matrix can be computed. Furthermore, the constraint relating posterior expectation values of the three components of the generalized data (model parameters, measured experimental data, and the model defect data) and their covariance is imposed. The formal framework used in this work has been conducive to recognizing a connection between model defect and constraints presented here.
Key expressions related to the Bayesian treatment of model defects have already been derived using standard notation in [3], and much has already been learned from implementations of various model defects and their effects on neutron cross sections in [3-7].
Formal expressions in generalized data notation are derived in Section 2. In Section 2.1, Bayes' theorem [1] is applied directly to generalized data, yielding general expressions for generalized data optimization, that is, for simultaneous optimization of model parameters, measured experimental data, and model defect data. From the perspective of Bayes' theorem, generalized data have prior and posterior PDFs, meaning that each of the three components will also have posterior values, just as model parameters do.
In Section 2.2, normal PDFs are used to derive a special case of the posterior PDF, and in Section 2.3, an alternative derivation focusing on posterior PDF of model parameters is presented.
In Section 4, the connection between the present Bayesian generalized data optimization method and the constrained generalized least squares (CGLS) method [8] is discussed. A linear form of CGLS has been implemented in the module TSURFER [9] of the SCALE system [10]. The CGLS method defines an objective function, often called χ², along with a constraint that is enforced by Lagrange multipliers in those codes. The relationship to conventional χ² minimization is also discussed.

Derivation
This application of Bayes' theorem is relatively simple, illustrating the words of the late Bayesian advocate Edwin T. Jaynes: "The difficulties are never mathematical"; they could be described more accurately as "conceptual difficulties" [2].
The posterior PDF of generalized data is derived by application of Bayes' theorem in Section 2.1. Compact generalized data notation is used whenever possible, except when particular aspects of model parameters and experimental data must be distinguished. A distinction is maintained in this derivation between the probability p(z | ⟨z⟩, …) that a variable attains a particular value z and the corresponding expectation values, denoted by ⟨z⟩.
Application of Bayes' theorem to generalized data implies that both prior parameters and the new experimental data together constitute prior generalized data. That is, since given experimental data have not yet been used in an evaluation, they should be viewed as a prior from the perspective of Bayes' theorem. Since experimental data are treated as a prior, it may be expected that Bayes' theorem would naturally yield an optimal posterior PDF of experimental data, just as it does conventionally for model parameter values. Prior model parameters, measured data, and model defect data simultaneously inform each other in a way that yields their optimal posterior joint PDF.
An expression for the posterior PDF of generalized data is derived and then used to compute posterior expectation values and covariances. In Section 2.3, conventional methods are used to derive the posterior probability distribution of model parameters, where the theorem of total probability is used to derive an expression equivalent to that derived in Section 2.1. The derivation based on generalized data notation may be more compact, but the conventional derivation may be more intuitive to those accustomed to PDFs of model parameters. Mathematical guidelines and nomenclature used in the derivations are listed here for convenience. The following expressions are referenced within the paper by assigning a context-dependent meaning to the generic variables a, b, and g used below. A generic Bayes' theorem could be stated as

p(a | b, g) = p(b | a, g) p(a | g) / p(b | g),  (1)

while a generic product rule of probability theory is

p(a, b | g) = p(a | b, g) p(b | g).  (2)

Integrating equation (2) over b yields the law of total probability,

p(a | g) = ∫ p(a, b | g) db,  (3)

which is equivalent to marginalization of the nuisance parameter b by integrating over all its possible values.
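These three generic identities can be checked numerically on a small discrete example (a sketch with made-up probabilities; the conditioning variable g is left implicit):

```python
import numpy as np

# Hypothetical joint PDF p(a, b) for binary a (rows) and b (columns).
p_ab = np.array([[0.10, 0.30],
                 [0.40, 0.20]])

# Law of total probability: marginalize the nuisance variable.
p_a = p_ab.sum(axis=1)                 # p(a) = sum_b p(a, b)
p_b = p_ab.sum(axis=0)                 # p(b) = sum_a p(a, b)

# Product rule: p(a | b) = p(a, b) / p(b).
p_a_given_b = p_ab / p_b

# Bayes' theorem: p(a | b) = p(b | a) p(a) / p(b).
p_b_given_a = p_ab / p_a[:, None]
bayes = p_b_given_a * p_a[:, None] / p_b

assert np.allclose(bayes, p_a_given_b)  # both routes agree
assert np.isclose(p_ab.sum(), 1.0)      # joint PDF is normalized
```

The same identities carry over to continuous variables with sums replaced by integrals, which is how the marginalization of generalized data is performed below.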
The generalized data z ≡ (P, D, d) has prior expectation values ⟨z⟩, and the prior covariance matrix of generalized data is represented by a 3 × 3 block matrix

C = ( M  W  X ;  W⊺  V  Y ;  X⊺  Y⊺  D ),

where the square matrices M, V, and D along the diagonal represent the covariance matrices of parameters, measured data, and the model defect, respectively, while W, X, and Y are their respective pair-wise covariances. The prior expectation value of the model defect, ⟨d⟩, is a vector of the same size as the measured data ⟨D⟩; it is the expectation value of deviations between model predictions T(P) and the measured data caused by the model defect alone. Bayes' theorem is used to write a posterior PDF for z ≡ (P, D, d) by making the substitutions a ← z and b ← f in equation (1), where p(z | ⟨z⟩, C) is the prior PDF, f is a set of constraints imposed on the posterior expectation values, and p(f | z, ⟨z⟩, C) is the likelihood function. Constraints in f are defined by an auxiliary quantity v ≡ T(P) − D − d that relates the components of generalized data, as constraints ⟨v⟩′ = v′_f and V′ = V′_f on their posterior expectation values and their posterior covariance matrix elements, where v′_f and V′_f are given, and where posterior expectation values are indicated by primes.
Constraints on posterior expectation values and covariances yield a likelihood function formally expressed via Lagrange multipliers, so that the likelihood function becomes

p(f | z, ⟨z⟩, C) ∝ exp(−∑ᵢ λᵢ vᵢ − ∑ᵢⱼ Λᵢⱼ vᵢ vⱼ),  (12)

where {λᵢ}_f and {Λᵢⱼ}_f constitute a set of Lagrange multipliers to be determined from the constraint set f. The resulting posterior PDF of generalized data implicitly contains a combined posterior PDF of parameters, measured data, and model defect data that has been informed by all prior information available, namely, by ⟨z⟩, C, and the constraint f enforced on the posterior expectation values and covariances.
Upon normalizing the posterior PDF of generalized data to unity, the posterior expectation value of any function g(z) of posterior generalized data z could be computed as an integral over generalized data; such integrals are also used to compute the posterior expectation values, ⟨z⟩′, and their posterior covariance matrix, C′. This posterior PDF yields ⟨v⟩′ = v′_f and V′ = V′_f. One consequence of the derivation of the posterior PDF of generalized data is that any computation of expectation values with this PDF entails integration over all generalized data. Therefore, generalized data (model parameters, measured data, and model defect data) could be viewed as nuisance parameters marginalized by the integration.
Expectation values of the constraint parameters are conventionally set to

v′_f = 0 and V′_f = 0,

where this choice defines a particular set of constraints labeled f₀. Since the diagonal elements of V′ are the posterior expectation values of vᵢ², and since PDFs are positive functions, this constraint could be satisfied by enforcing v = 0 for all values of z inside the integral. This suggests an effective likelihood function

p(f₀ | z, ⟨z⟩, C) = δ_Dirac(v).

With this likelihood function, the expectation values computed by the posterior PDF become

⟨g(z)⟩′ = ∫ g(z) δ_Dirac(v) p(z | ⟨z⟩, C) dz / ∫ δ_Dirac(v) p(z | ⟨z⟩, C) dz,

where p(z | ⟨z⟩, C) is the prior PDF of generalized data. A δ_Dirac(v) likelihood function of a defective model effectively reduces integration over z = (P, D, d) to (P, D), and the model defect variable d is replaced by T(P) − D in the prior PDF. This component of the prior PDF has features similar to the likelihood function obtained by setting constraints v′_f ← ⟨d⟩′ and V′_f ← D′ for a perfect model. Conversely, non-zero values of v′_f and V′_f for a perfect model with ⟨d⟩ = ⟨d⟩′ = 0 and D = D′ = 0 yield a PDF analogous to that obtained by setting constraints to zero and introducing a model defect ⟨d⟩′ ← v′_f and D′ ← V′_f. In shorthand, the constraint values (v′_f, V′_f) of a perfect model correspond to the defect data (⟨d⟩′, D′) of a defective model whose constraints are set to zero. This point will be elaborated upon in Sections 2.2 and 3. The exact connection between the two approaches may be intricate because constraints are defined on posterior expectation values and covariances, while the model defect is defined by prior model defect data expectation values and covariances. The posterior generalized data PDF enables computation of the covariance matrix Q of posterior model values ⟨T(P)⟩′, corresponding to experimental data D, to all orders, in contrast to the first-order approximation expressions reported in evaluated nuclear data files like ENDF [11].

Posterior PDF for normal PDFs
Although this formalism applies to arbitrary PDFs, a particularly simple form is attained when a normal form is assumed for all PDFs. In that case, the prior PDF becomes

p(z | ⟨z⟩, C) ∝ exp(−½ (z − ⟨z⟩)⊺ C⁻¹ (z − ⟨z⟩)) ≡ N(z | ⟨z⟩, C),

where N stands for a normal PDF, and the likelihood function could be stated as

p(f | z, ⟨z⟩, C) = N(v | v_f, V_f),  (24)

where v is defined in equation (10), v_f is an effective parameter vector, and V_f is an effective covariance matrix such that this posterior PDF obeys the constraint f on the posterior expectation value, ⟨v⟩′ = v′_f, and covariance, V′ = V′_f. The unknown parameters v_f and V_f play a role equivalent to the Lagrange multipliers {λᵢ}_f and {Λᵢⱼ}_f in equation (12).
Combining the normal prior PDF and the normal likelihood function yields a posterior PDF subject to the aforementioned constraints in f. The constraint set f₀, namely ⟨v⟩′ = 0 and V′ ≡ ⟨vv⊺⟩′ = 0, is satisfied for v_f = 0 and V_f = 0, for which the normal likelihood function in equation (24) becomes a Dirac delta function, so that the posterior PDF becomes

p(z | ⟨z⟩, C, f₀) ∝ N(z | ⟨z⟩, C) δ_Dirac(v).

Furthermore, for models without a defect, that is, ⟨d⟩ = 0 and D = 0, this posterior PDF becomes

p(ẑ | ⟨ẑ⟩, Ĉ, f₀) ∝ N(ẑ | ⟨ẑ⟩, Ĉ) δ_Dirac(T(P) − D),

and Ĉ is the covariance matrix corresponding to ẑ. In Section 4, it will be shown that the expression for the posterior PDF in this limit is equivalent to the CGLS method implemented in the APLCON code, or to its linear approximation implemented in the TSURFER module of the SCALE code system.
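The delta-function limit can be sketched numerically for a hypothetical two-dimensional toy (not taken from the paper): a perfect model T(P) = P with one scalar parameter and one datum, prior N(ẑ | 0, I) over ẑ = (P, D), and a normal likelihood N(v | 0, V_f) on v = P − D. Because precisions of normal PDFs add, the likelihood contributes (1/V_f) a a⊺ with a = (1, −1), and letting V_f → 0 enforces the constraint exactly:

```python
import numpy as np

# Toy perfect model T(P) = P: z_hat = (P, D), prior N(0, I2),
# normal likelihood N(v | 0, V_f) on the constraint variable v = P - D.
a = np.array([1.0, -1.0])              # v = a . z_hat

def posterior_cov(V_f):
    """Normal prior x normal likelihood: posterior precision is the sum."""
    prec = np.eye(2) + np.outer(a, a) / V_f
    return np.linalg.inv(prec)

cov = posterior_cov(1e-8)              # V_f -> 0 approaches the delta limit
var_v = a @ cov @ a                    # posterior variance of v: -> 0
var_P = cov[0, 0]                      # -> 1/2, prior restricted to P = D
```

As V_f shrinks, var_v → 0 (the constraint holds exactly) while Var(P) → 1/2, the variance of the unit-covariance prior restricted to the constraint surface P = D.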

Conventional derivation of posterior parameter PDF
The posterior PDF of model parameters is obtained by making appropriate substitutions into the generic Bayes' theorem in equation (1) and applying the law of total probability in equation (3) to the measured data and model defect data. The first term in equation (31) and the second term in equation (32) could be combined, where Bayes' theorem as stated by equation (9) was used to introduce p(z | ⟨z⟩, C, f) in the integrand on the last line above. This shows that a partial posterior PDF of parameters, P, is simply an integral of the posterior PDF over all measured data, D, and model defect data, d:

p(P | ⟨z⟩, C, f) = ∫ dD ∫ dd p(z | ⟨z⟩, C, f).

Simple example
The correspondence between a constraint set f and a model defect suggested above is illustrated with a simple, analytically solvable example. A simple model without a defect is defined as T(P) = P, with a single scalar parameter P and a single data point D, and with constraints f: ⟨v⟩′ = 0 and V′ = 2/3. A corresponding model with a defect, ⟨d⟩ = 0 and D = 1, has its constraints set to zero, namely f₀: ⟨v⟩′ = 0 and V′ = 0, while its covariance matrix has a component corresponding to the model defect, namely C₃,₃ = D = 1. For simplicity, the prior expectation values of generalized data are set to zero, ⟨z⟩ = 0, and the prior covariance matrix C for both models is set to the identity matrix, so that posterior expectation values remain unchanged for both models, that is, ⟨z⟩ = ⟨z⟩′ = 0. First, consider the model without a defect, ẑ = (P, D), whose covariance matrix Ĉ is the 2 × 2 identity matrix, and whose posterior PDF is expressed as a product of its prior PDF and the likelihood function expressed in terms of Lagrange multipliers,

p(ẑ | ⟨ẑ⟩, Ĉ, f) = N(ẑ | ⟨ẑ⟩, Ĉ) exp(−λ(P − D) − Λ(P − D)²),  (37)

where λ = 0 and Λ = 1/2 yield a posterior PDF that satisfies the imposed constraints, that is, ⟨v⟩′ = 0 and V′ = 2/3. The model with a defect, z = (P, D, d), with a prior covariance matrix equal to the 3 × 3 identity matrix, yields a posterior PDF that satisfies f₀: ⟨v⟩′ = 0 and V′ = 0,

p(z | ⟨z⟩, C, f₀) ∝ N(z | ⟨z⟩, C) δ_Dirac(P − D − d).

Upon integration over d, the remaining PDF becomes equivalent to the posterior PDF in equation (37). In this simple illustration, numerical constants were chosen to make the correspondence between the two approaches exact.
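The correspondence in this example can be verified with elementary Gaussian algebra (a sketch under the example's assumptions, using the fact that a Gaussian factor exp(−Λ(P−D)²) adds 2Λ a a⊺ to the prior precision, with a = (1, −1)):

```python
import numpy as np

a = np.array([1.0, -1.0])                     # v = P - D for the 2-d model

# Model without a defect: prior N(0, I2) times exp(-Lambda*(P-D)^2),
# with lambda = 0 and Lambda = 1/2 as in the example.
Lambda = 0.5
prec_A = np.eye(2) + 2.0 * Lambda * np.outer(a, a)
cov_A = np.linalg.inv(prec_A)
V_prime = a @ cov_A @ a                       # posterior <v v>' = 2/3

# Model with a defect: prior N(0, I3) over (P, D, d) with C_{3,3} = 1,
# delta likelihood enforcing d = P - D.  Substituting d into the prior
# leaves the (P, D) factor exp(-(P^2 + D^2 + (P-D)^2)/2):
prec_B = np.eye(2) + np.outer(a, a)
cov_B = np.linalg.inv(prec_B)
var_d = a @ cov_B @ a                         # posterior <d^2>' = Var(P - D)

assert np.allclose(cov_A, cov_B)              # identical (P, D) posteriors
assert np.isclose(V_prime, 2.0 / 3.0)
assert np.isclose(var_d, 2.0 / 3.0)
```

The two posterior covariances coincide, and the posterior spread of the defect variable, ⟨d²⟩′ = 2/3, matches the non-zero constraint value V′ = 2/3 of the defect-free model, which is the claimed correspondence.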

Connections to other methods
To establish connections to other methods, all PDFs are assumed to be normal; consequently, the exponents in equations (22) and (24) could be combined to define a generalized cost function

χ²_gen ≡ (z − ⟨z⟩)⊺ C⁻¹ (z − ⟨z⟩),

where the constraint T(P) − D = d is enforced by defining z ≡ (P, D, T(P) − D). This cost function can be minimized by using the Laplace transform and the Newton-Raphson method to yield approximate posterior expectation values of generalized data, ⟨z⟩′ ≈ z_min, and of its covariance matrix, C′ ≈ C_min [12]. For a perfect model, one may set ⟨d⟩ → 0 and D → 0 to obtain

χ² ≡ (ẑ − ⟨ẑ⟩)⊺ Ĉ⁻¹ (ẑ − ⟨ẑ⟩),  (41)

where ẑ ≡ (P, D) and Ĉ is the corresponding covariance matrix, with the constraint T(P) = D. A constrained minimization of this cost function is performed by the TSURFER code [9] and the APLCON code [13], where the constraint is enforced by the Lagrange multiplier method. The values z_min that minimize χ² are then approximate expectation values of posterior generalized data, ⟨z⟩′ ≈ z_min. Since TSURFER makes a linear approximation of the model, its method is referred to as generalized linear least squares (GLLS). In conventional GLS, which is also known as the χ² minimization method, the constraint is applied to the generalized data as ẑ ≡ (P, D) → (P, T(P)), and the difference (ẑ − ⟨ẑ⟩) in equation (41) is replaced by (P − ⟨P⟩, T(P) − D), with no constraint enforced. A common approximation to the χ² function is obtained for a block-diagonal generalized data covariance matrix C, with parameter covariance matrix M and experimental data covariance matrix V along the diagonal:

χ² = (P − ⟨P⟩)⊺ M⁻¹ (P − ⟨P⟩) + (T(P) − D)⊺ V⁻¹ (T(P) − D).

Minimization of χ² with respect to P yields a solution vector P_min that is approximately equal to the posterior expectation value of parameters, ⟨P⟩′ [14]. This definition of χ² has been used in nuclear data evaluations and is also the quantity minimized in generic optimization codes like MINUIT [15]. That these methods could lead to incorrect posterior values and covariances has been recognized by Capote [16].
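For a linear model T(P) = S P, the minimizer of this block-diagonal χ² has a closed form obtained from the normal equations. The sketch below (all matrices and data are made-up illustrations, not from the paper) solves for P_min and checks that the χ² gradient vanishes there:

```python
import numpy as np

# chi^2(P) = (P - P0)^T M^{-1} (P - P0) + (S P - D)^T V^{-1} (S P - D)
P0 = np.array([1.0, 0.5])              # prior parameter expectation <P>
M = np.diag([0.1, 0.2])                # prior parameter covariance
S = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [1.0, 1.0]])             # linear model: T(P) = S @ P
D = np.array([2.1, 0.4, 1.6])          # measured data
V = 0.05 * np.eye(3)                   # experimental data covariance

# Setting grad chi^2 = 0 yields the normal equations:
#   (M^{-1} + S^T V^{-1} S) P_min = M^{-1} P0 + S^T V^{-1} D
Minv, Vinv = np.linalg.inv(M), np.linalg.inv(V)
post_cov = np.linalg.inv(Minv + S.T @ Vinv @ S)
P_min = post_cov @ (Minv @ P0 + S.T @ Vinv @ D)

# P_min approximates <P>' for this linear case, and post_cov is the
# corresponding posterior parameter covariance.
grad = 2 * Minv @ (P_min - P0) + 2 * S.T @ Vinv @ (S @ P_min - D)
assert np.allclose(grad, 0.0, atol=1e-8)
```

For nonlinear T(P) the same normal equations are applied iteratively to a local linearization, which is the approximation step that distinguishes GLLS from the full Bayesian treatment.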

Conclusions and outlook
A new, general expression for the posterior PDF of generalized data has been derived starting from Bayes' theorem, where generalized data refers to a union of model parameters, measured data, and model defect data. An analogy between constraints on posterior expectation values and model defect data has been suggested in this work, and further study may be needed to better understand their connections and potential applications. Key ingredients used in this derivation are: the use of generalized data as a union of parameters, measured data, and model defect data; formal recognition of constraints on posterior expectation values and covariances when applying Bayes' theorem; and a formal and consistent separation between expectation values and instance values of generalized data.
A direct consequence of applying Bayes' theorem to generalized data, of which experimental data are a subset, is that the posterior PDF of generalized data yields posterior expectation values and covariances for experimental data and model defect data. This is in contrast to the prevalent nuclear data evaluation practice, where the posterior PDF of experimental data and its covariances are identical to those of model predictions.
A normal form of posterior PDF of generalized data, obtained by setting all constituent PDFs to be normal and assuming a perfect model, was found useful to establish a connection to extant optimization methods, namely CGLS implemented in APLCON and its linear approximation implemented in the TSURFER module of SCALE.
The appealing features of the posterior PDF of generalized data listed above make it a candidate for simultaneous and consistent optimization of model parameters and experimental data of differential cross sections and integral benchmarks that would yield more accurate and complete evaluations and covariances. The presented method could simultaneously sample R-matrix resonance parameters, optical model potential parameters, and integral benchmark parameters such as spatial dimensions and material composition, to compute presently unknown covariances among integral benchmark experiments and cross section data. Although sensitivities of integral benchmark responses with respect to cross sections are not needed for calculations using the Bayesian Monte Carlo method, they could nevertheless be computed [17]. It is hoped that the derived method could complement conventional nuclear data adjustment methods [18].
Useful discussions with Ivan Kodeli, Mark Williams, Helmut Leeb, Georg Schnabel, Roberto Capote, and Christopher Perfetti are acknowledged. This work has been funded by the Nuclear Criticality Safety Program in the National Nuclear Security Administration of the United States Department of Energy.
- Generalized data, z, is a union of model parameters (P), measured experimental data (D), and model defect data (d).
- The covariance matrix of generalized data, C, by definition contains covariances among all three components of generalized data.
- Bayes' theorem is applied directly to generalized data.
- A set of constraints, f, on posterior expectation values and covariances of generalized data is imposed on the posterior PDF and formally solved by the Lagrange multiplier method.
- A model, i.e., theory T(⋅), relates model parameters, measured data, and model defect data in the definition of constraints.
- A distinction between expectation values, i.e., ⟨z⟩, and their instance values, i.e., z, is maintained in PDFs.
- Posterior expectation values are indicated by a prime, ⟨z⟩′ and C′, while unprimed symbols represent prior expectation values, ⟨z⟩ and C.