Choice of positive distribution law for nuclear data

Nuclear data evaluation files in the ENDF6 format provide mean values and associated uncertainties for physical quantities relevant in nuclear physics. Uncertainties are denoted as Δ in the format description and are commonly understood as standard deviations. They can be complemented by covariance matrices. The evaluations do not provide any indication of the probability density function to be used when sampling. Three constraints must be observed: the mean value, the standard deviation and the positivity of the physical quantity. The MENDEL code generally uses positively truncated Gaussian distribution laws for small relative standard deviations and a lognormal law for larger uncertainty levels (>50%), because truncating a Gaussian law modifies its mean value and standard deviation. In this paper, we make explicit the error on the mean value and the standard deviation introduced by different types of distribution laws. We also employ the principle of maximum entropy as a criterion to choose among the truncated Gaussian, the fitted truncated Gaussian and the lognormal distribution. Remarkably, the difference in entropy between the candidate distribution laws is a function of the relative standard deviation only. The obtained results therefore provide general guidance for the choice among these distributions.


Introduction
Nuclear data evaluation files in the ENDF6 format [1] provide mean values and associated uncertainties for physical quantities relevant in nuclear physics. These uncertainties are denoted as Δ in the format description for most of the nuclear data parameter types, and are understood as standard deviations. Uncertainties can be complemented (for microscopic cross sections, for example) by covariance matrices.
For uncertainty propagation based on random sampling, one needs to know the probability density function associated with each random variable. Current nuclear data evaluation files [2][3][4] do not provide any indication of which probability density function to use, and users have to choose a distribution law. This is the case for all uncertain data propagated in fuel cycle code systems, such as independent fission yields, radioactive decay constants, radioactive decay energies, radioactive decay branching ratios and multigroup microscopic cross sections. For some data, in particular independent neutron fission yields and some microscopic cross sections, the relative standard deviation can be high (more than 50%), and Gaussian laws naturally lead to negative occurrences, which is not acceptable for those physical quantities.
Three constraints must therefore be respected when sampling: positivity of the physical quantity; its mean value; and its standard deviation.
Often the positively truncated Gaussian law is used, which takes into account positivity but introduces a bias in the mean value and the standard deviation. Furthermore, it is not symmetric around the mean value.
MENDEL [5] is the new generation of the CEA code system for nuclear fuel cycle studies. Its depletion solver is provided to the transport code systems APOLLO3 ® [6] and TRIPOLI-4 ® [7]. MENDEL is the successor of DARWIN/PEPIN2 [8].
Uncertainty quantification in MENDEL is based on a propagation method using a Monte Carlo approach by correlated sampling [9], and sampling is done by the CEA uncertainty platform URANIE [10,11]. Nuclear data uncertainties are propagated to physical quantities of interest (such as decay heat or concentrations) [12].
Until now, the choice of the distribution law in MENDEL has been made in the following way: a positively truncated Gaussian law if the relative standard deviation is less than 50%, and a lognormal law otherwise. This choice is based on physical reasons, as truncated Gaussian distributions modify the mean value and the standard deviation for large relative uncertainties. The switching point between the Gaussian and the lognormal distributions is a pragmatic choice. This paper aims to give a formal justification for this choice.
The structure of this paper is as follows. First, we will investigate the introduced bias in both the mean value and the standard deviation by a truncation of the Gaussian distribution. We will then describe how to modify the Gaussian law parameters in order to obtain after truncation the mean value and the standard deviation as specified in evaluation files.
In the second part, we employ the principle of maximum entropy [13,14] to choose between different distribution laws. We will show that the choice of the distribution law depends on the relative standard deviation.
We limit our study in this paper to the distribution laws themselves, without propagation in numerical code systems.

Maximum entropy method
For a continuous probability density function p(x) defined on an interval I, we introduce the differential entropy defined as:

S(p) = −∫_I p(x) ln p(x) dx.

We define p(x) ln p(x) = 0 when p(x) = 0 (due to lim_{t→0⁺} t ln t = 0). This entropy function appears in statistical physics and thermodynamics, where higher entropy is associated with states closer to equilibrium. The maximum entropy principle [14] states that, for a given set of constraints (for example a known mean value and standard deviation), the probability distribution with the largest entropy should be chosen.
For given constraints, the law with the largest entropy is the one that contains the least amount of information about the physical quantity. For example, the maximum entropy principle will lead to the following choices: a uniform law if the constraints are minimal and maximal values; and a Gaussian law if the constraints are a given mean value and a standard deviation.

Candidate distribution laws
Candidate distribution laws must respect the three following criteria: positivity of the realizations, P(X < 0) = 0 (i.e. I = ℝ⁺), as nuclear data are positive; they must match the mean value m (within a given tolerance) as specified in the evaluation file; and they must match the standard deviation s (within a given tolerance) as specified in the evaluation file.

Gaussian distribution
Let G(m,s) be the non-truncated Gaussian distribution with mean value m and standard deviation s. Its probability density function reads:

p(x) = 1/(s√(2π)) exp(−(x − m)²/(2s²)).

Entropy
The Gaussian distribution maximizes the differential entropy among all distribution laws with a given mean value and standard deviation. The Gaussian distribution entropy is given by:

S = ln(s√(2πe)) = ½ ln(2πes²).

Positivity
A Gaussian distribution yields negative values with the probability:

P(X < 0) = Φ(−m/s) = ½ erfc(1/(d√2)),

where d = s/m is the relative standard deviation and Φ denotes the standard normal cumulative distribution function. This negative occurrence probability is given as a function of the relative standard deviation in Figure 1.
Due to this negative occurrence probability, which is non-negligible when s/m is large, it is necessary to use other distribution laws to enforce the positivity constraint.
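As a numerical illustration (a minimal sketch; the function name is ours), this probability can be evaluated directly from the relative standard deviation d = s/m, using P(X < 0) = Φ(−m/s) = ½ erfc(1/(d√2)):

```python
import math

def negative_probability(d):
    """P(X < 0) for a Gaussian law with relative standard deviation d = s/m."""
    # P(X < 0) = Phi(-m/s) = 0.5 * erfc((m/s)/sqrt(2)), with m/s = 1/d
    return 0.5 * math.erfc(1.0 / (d * math.sqrt(2.0)))

for d in (0.1, 0.25, 0.5, 1.0):
    print(f"d = {d:4.2f}  ->  P(X < 0) = {negative_probability(d):.3e}")
```

For d = 0.5 the negative occurrence probability is already about 2.3%, and for d = 1 about 15.9%, which illustrates why the plain Gaussian law cannot be kept at large relative uncertainties.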

Positively truncated Gaussian distribution

Distribution law
We define a Gaussian distribution with mean value m and standard deviation s, and then set the probability density to zero for negative values. The resulting distribution, after normalization, is called a positively truncated Gaussian distribution. Draws can be realized by sampling from the original Gaussian distribution and rejecting all negative values. Its probability density function reads:

p(x) = b exp(−(x − m)²/(2s²)) for x ≥ 0, and p(x) = 0 for x < 0.

The constant b is defined so that ∫ ℝ p(x)dx = 1, which means:

b = 1/(s√(2π) Φ(m/s)),

Φ being the standard normal cumulative distribution function. The positively truncated Gaussian law will be denoted by PG(m,s).
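The rejection scheme just described can be sketched as follows (a minimal illustration; function names are ours and do not reflect the MENDEL/URANIE implementation):

```python
import random

def sample_truncated_gaussian(m, s, n, rng=random):
    """Draw n realizations of PG(m, s): sample G(m, s), reject negative values."""
    draws = []
    while len(draws) < n:
        x = rng.gauss(m, s)
        if x >= 0.0:  # keep only non-negative realizations
            draws.append(x)
    return draws

sample = sample_truncated_gaussian(1.0, 0.5, 10000)
print(min(sample) >= 0.0)  # True: positivity is enforced by construction
```

The acceptance rate equals Φ(m/s), so the rejection loop stays efficient as long as the relative standard deviation is moderate; for s/m = 1 about 84% of the draws are accepted.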

Entropy
The differential entropy can be computed as:

S = ln(s√(2π) Φ(m/s)) + (1 − (m/s) λ(m/s))/2, with λ(t) = φ(t)/Φ(t),

where φ and Φ denote the standard normal probability density and cumulative distribution functions. This entropy is the sum of ln m and a function of the relative standard deviation d = s/m only.

Errors on moments
With this distribution, we modify the distribution moments, particularly for large relative uncertainties. The truncated Gaussian distribution mean value is equal to:

E[X] = m + s λ(m/s), with λ(t) = φ(t)/Φ(t),

where φ and Φ denote the standard normal probability density and cumulative distribution functions. And the truncated Gaussian distribution variance is equal to:

Var[X] = s² [1 − (m/s) λ(m/s) − λ(m/s)²].

We obtain the following relative error on the expected value, which is a function of the relative standard deviation d = s/m of the original input data (i.e. of the non-truncated Gaussian parameters m and s):

(E[X] − m)/m = d λ(1/d).

This bias is represented in Figure 2 as a function of the parameter d.
The relative error on the standard deviation is also a function of d only:

(σ[X] − s)/s = √(1 − λ(1/d)/d − λ(1/d)²) − 1, with λ(t) = φ(t)/Φ(t).

This bias is represented in Figure 3 as a function of the parameter d.
The squared relative discrepancy between the relative standard deviations of the Gaussian distribution and the truncated Gaussian distribution is given in equation (12).
The relative standard deviation discrepancy is represented in Figure 4. When choosing this truncated law, we obtain the numerical values for the errors shown in Table 1.
We can conclude that, while the truncation is entirely acceptable up to 25% uncertainty, it begins to be problematic for a 50% uncertainty, and is totally unacceptable for 100% uncertainty. These formal results confirm the need to switch to another law, as already introduced heuristically in the URANIE/MENDEL sampling scheme.
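These biases follow from the standard one-sided truncated-normal moment formulas and can be evaluated for any d; a sketch (our notation, with λ(t) = φ(t)/Φ(t)):

```python
import math

def phi(t):
    """Standard normal probability density function."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def Phi(t):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def truncation_biases(d):
    """Relative errors on the mean and standard deviation of PG(m, s), d = s/m."""
    t = 1.0 / d                               # t = m/s
    lam = phi(t) / Phi(t)                     # lambda(t)
    mean_rel_err = d * lam                    # (E[X] - m) / m
    var_ratio = 1.0 - t * lam - lam * lam     # Var[X] / s^2
    std_rel_err = math.sqrt(var_ratio) - 1.0  # (sigma[X] - s) / s
    return mean_rel_err, std_rel_err

for d in (0.25, 0.5, 1.0):
    me, se = truncation_biases(d)
    print(f"d = {d:4.2f}: mean bias = {me:+.2%}, std-dev bias = {se:+.2%}")
```

The run reproduces the qualitative conclusion: the biases are negligible at 25% uncertainty (about +0.003% on the mean), a few percent at 50%, and clearly unacceptable at 100% (about +29% on the mean and −21% on the standard deviation).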

Fitted truncated Gaussian distribution

Distribution law
We consider a random variable in an evaluated nuclear data file characterized by a mean value m and a standard deviation s. We define a truncated Gaussian law q = PG(m̃,s̃) so that its mean value equals m and its standard deviation equals s.
Please note the changed notation compared to equation (5): the variables m̃, s̃ here correspond to m, s there.
The following specification of b̃ ensures the proper normalization of the distribution:

b̃ = 1/(s̃√(2π) Φ(m̃/s̃)).

Coefficient determination
We obtain from equation (8):

m = m̃ + s̃ λ(m̃/s̃), with λ(t) = φ(t)/Φ(t),

and from equation (9):

s² = s̃² [1 − (m̃/s̃) λ(m̃/s̃) − λ(m̃/s̃)²].

Solving the first relation leads to an expression (16) for m̃, which leads to (17) by substituting m̃ as given in (16) into (15). Equation (17) contains one unknown variable, s̃, but its complexity does not enable a formal analytical solution.
Hence, we compute the solutions for several values of (m,s) tuples numerically.
To do so, we introduce a pseudo relative standard deviation s̃/m̃ in equation (17).
With X = d = s/m and X̃ = s̃/m̃ we obtain equation (18). The distribution parameters are summarized in Table 2 for different values of X = s/m. Problematic values, i.e. negative and unreasonably large values, appear in bold letters.
The reader should note that m̃ and s̃ are not the mean value and standard deviation of the truncated Gaussian distribution, which are respectively equal to m and s. A negative value of m̃ means that more than half of the distribution is truncated and the mode is positioned at zero, which may not be desirable.
For this reason, it is reasonable not to use the truncated Gaussian distribution for relative standard deviations larger than 75%.
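Since equation (17) admits no closed-form solution, the fitted parameters must be obtained numerically. A stdlib-only sketch (our approach: bisection on t = m̃/s̃, exploiting the fact that the ratio of standard deviation to mean of the truncated distribution decreases with t; function names are ours):

```python
import math

def phi(t):
    """Standard normal probability density function."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def Phi(t):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def fit_truncated_gaussian(m, s):
    """Find (m_tilde, s_tilde) such that PG(m_tilde, s_tilde) has mean m, std s."""
    d = s / m

    def ratio(t):
        # std/mean of a unit-width Gaussian N(t, 1) truncated to x >= 0
        lam = phi(t) / Phi(t)
        return math.sqrt(1.0 - t * lam - lam * lam) / (t + lam)

    lo, hi = -10.0, 1.0e3  # ratio(t) decreases monotonically on this range
    for _ in range(200):   # bisection on t = m_tilde / s_tilde
        mid = 0.5 * (lo + hi)
        if ratio(mid) > d:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    lam = phi(t) / Phi(t)
    s_tilde = s / math.sqrt(1.0 - t * lam - lam * lam)
    return t * s_tilde, s_tilde

m_tilde, s_tilde = fit_truncated_gaussian(1.0, 0.5)
print(f"m_tilde = {m_tilde:.4f}, s_tilde = {s_tilde:.4f}")
```

For m = 1 and s = 0.5 this yields m̃ slightly below 1 and s̃ slightly above 0.5, consistent with the behaviour reported in Table 2; as the relative standard deviation approaches 75%, t crosses zero and m̃ turns negative, matching the limitation discussed above.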

Entropy
The differential entropy of the fitted Gaussian distribution is computed in the same way as in equation (7).

Lognormal distribution

Distribution law
The probability density function of a lognormal distribution characterized by parameters μ and σ reads:

p(x) = 1/(xσ√(2π)) exp(−(ln x − μ)²/(2σ²)) for x > 0.

Coefficient determination
The mean value of a lognormal distribution is equal to:

E[X] = exp(μ + σ²/2).

The variance of a lognormal distribution is equal to:

Var[X] = (exp(σ²) − 1) exp(2μ + σ²),

which leads to:

σ² = ln(1 + s²/m²) = ln(1 + d²) and μ = ln m − σ²/2.
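Inverting these moment relations yields the lognormal parameters directly from the evaluated mean m and standard deviation s; a sketch (μ and σ denote the lognormal parameters; function names are ours):

```python
import math

def lognormal_parameters(m, s):
    """Lognormal (mu, sigma) reproducing the mean m and standard deviation s."""
    sigma2 = math.log(1.0 + (s / m) ** 2)  # sigma^2 = ln(1 + d^2)
    mu = math.log(m) - 0.5 * sigma2        # ensures exp(mu + sigma^2/2) = m
    return mu, math.sqrt(sigma2)

mu, sigma = lognormal_parameters(1.0, 0.5)
# Consistency check against the moment formulas above:
mean = math.exp(mu + 0.5 * sigma ** 2)
var = (math.exp(sigma ** 2) - 1.0) * math.exp(2.0 * mu + sigma ** 2)
print(f"mean = {mean:.6f}, std = {math.sqrt(var):.6f}")  # mean = 1.000000, std = 0.500000
```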

Entropy
The differential entropy of the lognormal distribution reads:

S = μ + ½ ln(2πeσ²).


Choice of the law

Entropy principle
The different distribution laws will now be compared based on their differential entropies. We show in Table 3 the entropy values for the truncated Gaussian law PG(m,s), the fitted truncated Gaussian law PG(m̃,s̃), consistent with the mean value and the standard deviation provided in the evaluated nuclear data file, and the lognormal law LN(m,s). The values were obtained for m = 1. The differences between the entropies are independent of the choice of m.
Inspecting equations (7) and (24) shows that the difference between the truncated Gaussian law entropy and the lognormal law entropy is independent of m.
Even though the mode value m̃ of the fitted Gaussian distribution, which has to be used in equation (7), differs from m, the entropy difference with respect to the lognormal distribution or the truncated Gaussian distribution is still a function of the relative standard deviation d only. Equation (7) for the fitted Gaussian law is the sum of ln m̃ and a function of d̃. In fact, d̃ is a function of d only, as d̃ = s̃/m̃ = (s̃/m)(m/m̃), where: s̃/m depends on d only, as it is the solution of equation (18); and m/m̃ is also a function of d only.
In conclusion, the difference between the fitted truncated Gaussian law entropy and another candidate law entropy will be a function of d plus ln m̃ − ln m = ln(m̃/m), which is also a function of d only. Consequently, the difference in differential entropies is a function of the relative standard deviation only.
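This scale invariance can be verified numerically, using the closed-form differential entropies of the truncated Gaussian, S = ln(s√(2π) Φ(m/s)) + (1 − (m/s)λ(m/s))/2, and of the lognormal, S = μ + ½ ln(2πeσ²) (a sketch; the truncated-Gaussian entropy expression is our derivation from its density):

```python
import math

def phi(t):
    """Standard normal probability density function."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def Phi(t):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def entropy_truncated_gaussian(m, s):
    """Differential entropy of PG(m, s) (closed form derived from its density)."""
    t = m / s
    lam = phi(t) / Phi(t)
    return math.log(s * math.sqrt(2.0 * math.pi) * Phi(t)) + 0.5 * (1.0 - t * lam)

def entropy_lognormal(m, s):
    """Differential entropy of the lognormal law with mean m and std s."""
    sigma2 = math.log(1.0 + (s / m) ** 2)
    mu = math.log(m) - 0.5 * sigma2
    return mu + 0.5 * math.log(2.0 * math.pi * math.e * sigma2)

# The entropy difference depends only on d = s/m, not on the scale m:
for m in (1.0, 3.7):
    diff = entropy_truncated_gaussian(m, 0.5 * m) - entropy_lognormal(m, 0.5 * m)
    print(f"m = {m}: S_PG - S_LN = {diff:.6f}")
```

Both lines print the same value, illustrating that the comparison of Table 3, carried out at m = 1, is in fact general.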
Despite entropy considerations, large discrepancies between the mode and the mean value are not desirable, which is an argument against the fitted truncated Gaussian in case of large relative uncertainties.
Using the entropy principle, we can see in Table 3 that, among these three laws, the fitted truncated Gaussian law is optimal when the relative standard deviation is less than 80%.
When the relative standard deviation is larger than 80%, the truncated Gaussian distribution entropy is optimal. Nevertheless, for the truncated Gaussian with high levels of uncertainty, users need to take into account the large discrepancy between the target and effective moments. The truncated Gaussian law should therefore not be used in this context.
The comparison of the candidate laws for several values of s/m in Figures 5 and 6 shows the similarity of the truncated Gaussian distribution and the Gaussian distribution in the case of small relative uncertainties. The fitted truncated Gaussian distribution is the first to diverge from the Gaussian distribution, and resembles the lognormal distribution for large relative uncertainties (100% uncertainty, right part of Fig. 6).

Mode and mean
For a non-truncated Gaussian, mean value and mode are equal.
For the truncated Gaussian law (resp. the fitted truncated Gaussian law), the parameter m (resp. m̃) indicates the distribution mode. In general, it is preferable to use distributions whose mode does not differ too much from the mean value.

Conclusion
The Gaussian distribution is not adequate for positive physical quantities, especially for large relative uncertainties, as it leads to an excessive number of negative occurrences. For this reason, we investigated several positive distributions: the truncated Gaussian distribution, the fitted truncated Gaussian distribution and the lognormal distribution. All three laws exhibit zero probability density for negative values.
First, we studied the impact on the mean value and the standard deviation of the use of the truncated Gaussian distribution. We then compared the three distributions in terms of differential entropy. Despite the fact that the differential entropy is not scale invariant, the difference between two differential entropies is a function of the relative standard deviation only.
Both the maximum entropy principle and physical considerations have been considered in this work.
In summary, we suggest the following recipe for choosing among the three distributions, depending on the relative standard deviation: for small relative uncertainties, s/m < 1/4, the mean and standard deviation of the three laws are nearly identical, and the differential entropy is slightly better for Gaussian laws; hence, users can choose indifferently between the truncated Gaussian law PG(m,s) and the fitted truncated Gaussian law PG(m̃,s̃). For intermediate relative uncertainties, i.e. 1/4 ≤ s/m < 1/2, the principle of maximum entropy favors the fitted truncated Gaussian law PG(m̃,s̃). For large relative uncertainties, s/m ≥ 1/2, positivity of the mode and accuracy of the moments impose the choice of a lognormal law.
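The recipe above condenses into a simple selection rule (a sketch; the thresholds follow this section's recommendation, the function name is ours):

```python
def choose_distribution(m, s):
    """Recommended distribution law for a positive quantity with mean m, std s."""
    d = s / m
    if d < 0.25:
        return "truncated Gaussian PG(m, s) or fitted truncated Gaussian"
    elif d < 0.5:
        return "fitted truncated Gaussian"
    return "lognormal LN(m, s)"

print(choose_distribution(1.0, 0.1))  # small relative uncertainty
print(choose_distribution(1.0, 0.4))  # intermediate
print(choose_distribution(1.0, 0.8))  # large relative uncertainty
```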
In conclusion, we propose the following two laws: the fitted truncated Gaussian law PG(m̃,s̃) for relative standard deviations below 50%, and the lognormal law LN(m,s) above. Future work could include the study of other distribution laws, such as asymmetric Gaussian and mixed Gaussian laws [15]. Prospects include the use of the proposed distribution laws in uncertainty quantification problems and uncertainty propagation in nuclear reactor fuel cycle studies.