Open Access
Issue
EPJ Nuclear Sci. Technol.
Volume 2, 2016
Article Number 36
Number of page(s) 10
DOI https://doi.org/10.1051/epjn/2016026
Published online 16 September 2016

© T. Burr et al., published by EDP Sciences, 2016

Licence: Creative Commons. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1 Introduction, background, and implications

Nuclear material accounting (NMA) is a component of nuclear safeguards, which are designed to deter and detect illicit diversion of nuclear material (NM) from the peaceful fuel cycle for weapons purposes. NMA consists of periodically comparing measured NM inputs to measured NM outputs, and adjusting for measured changes in inventory. Avenhaus and Canty [1] describe quantitative diversion detection options for NMA data, which can be regarded as time series of residuals. For example, NMA at large throughput facilities closes the material balance (MB) approximately every 10 to 30 days around an entire material balance area, which typically consists of multiple process stages [2,3].

The MB is defined as MB = Ibegin + Tin − Tout − Iend, where Tin is transfers in, Tout is transfers out, Ibegin is beginning inventory, and Iend is ending inventory. The measurement error standard deviation of the MB is denoted σMB. Because many measurements enter the MB calculation, the central limit theorem and facility experience imply that MB sequences should be approximately Gaussian.

To monitor for possible data falsification by the operator that could mask diversion, paired (operator, inspector) verification measurements are assessed by using one-item-at-a-time testing to detect significant differences, and also by using an overall difference of the operator-inspector values (the “D (difference) statistic”) to detect overall trends. These paired data are declarations usually based on measurements by the operator, often using destructive analysis (DA), and measurements by the inspector, often using non-destructive assay (NDA). The D statistic is commonly defined as D = (N/n) Σ_{j=1}^{n} (Oj − Ij), applied to the paired (Oj, Ij), where j indexes the sample items, Oj is the operator declaration, Ij is the inspector measurement, n is the verification sample size, and N is the total number of items in the stratum. Both the D statistic and the one-item-at-a-time tests rely on estimates of operator and inspector measurement uncertainties that are based on empirical uncertainty quantification (UQ). The empirical UQ uses paired (Oj, Ij) data from previous inspection periods in metrology studies to characterize measurement error variance components, as we explain below. Our focus is a sensitivity analysis of the impact of the uncertainty in the measurement error variance components (which are estimated using the prior verification (Oj, Ij) data) on sample size calculations in IAEA verifications. Such an assessment depends on the assumed measurement error model and associated uncertainty components, so it is important to perform effective UQ.
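
For concreteness, a minimal R sketch of the D statistic follows; the parameter values and the simulated data are hypothetical illustrations, while the scaling is the (N/n) definition given above.

# Minimal sketch: the D statistic for one stratum of paired (operator, inspector) data.
# o and i are vectors of n paired values; N is the total number of items in the stratum.
d_statistic <- function(o, i, N) {
  n <- length(o)
  (N / n) * sum(o - i)  # scale the summed sample differences up to the stratum
}

# Hypothetical example: n = 20 items verified out of N = 200
set.seed(1)
mu <- rnorm(20, mean = 100, sd = 1)          # true values (unknown in practice)
o  <- mu * (1 + rnorm(20, 0, 0.005))         # operator: small relative random error
i  <- mu * (1 + 0.002 + rnorm(20, 0, 0.01))  # inspector: systematic + random error
d_statistic(o, i, N = 200)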

This paper is organized as follows. Section 2 describes measurement error models and error variance estimation using Grubbs' estimation [4–6]. Section 3 describes statistical tests based on the D statistic and on one-verification-item-at-a-time testing. Section 4 gives simulation results that describe inference quality as a function of two sample sizes. The first sample size n1 is the metrology study sample size (from previous inspection periods) used to estimate measurement error variances using Grubbs' (or similar) estimation methods. The second sample size n2 is the number of verification items selected from a population of size N. Section 5 is a discussion, summary, and implications.

2 Measurement error models

The measurement error model must account for variation within and between groups, where a group is, for example, a calibration or inspection period. The measurement error model used for safeguards sets the stage for applying an analysis of variance (ANOVA) with random effects [4,6–9]. If the errors tend to scale with the true value, then a typical model for multiplicative errors is

\[ I_{ij} = \mu_{ij}(1 + S_{Ii} + R_{Iij}), \tag{1} \]

where Iij is the inspector's measured value of item j (from 1 to n) in group i (from 1 to g), μij is the true but unknown value of item j from group i, σ²μ is the “item variance”, defined here as σ²μ = var(μij), RIij ~ N(0, δ²RI) is a random error of item j from group i, and SIi ~ N(0, δ²SI) is a short-term systematic error in group i. Note that the variance of Iij is given by var(Iij) = μ²ij(δ²RI + δ²SI). The term σ²μ is called “product variability” by Grubbs [6]. Neither RIij nor SIi is observable from data. However, for various types of observed data, we can estimate the variances δ²RI and δ²SI. The same error model is typically also used for the operator, but with δ²RO and δ²SO. We use capital letters such as I and O to denote random variables and corresponding lower case letters i and o to denote the corresponding observed values.
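
As an illustration of model (1), the following R sketch generates g groups of n paired measurements; the function name and defaults are ours, with the delta values chosen to match the Figure 1 example discussed next.

# Minimal sketch: simulate paired (operator, inspector) data from the
# multiplicative model (1); g groups (inspection periods) of n items each.
simulate_pairs <- function(g = 5, n = 10, mu_bar = 100, sigma_mu = 1,
                           dRO = 0.005, dSO = 0.001, dRI = 0.01, dSI = 0.03) {
  group <- rep(1:g, each = n)
  mu <- rnorm(g * n, mu_bar, sigma_mu)       # true item values; var(mu) is the "item variance"
  SO <- rnorm(g, 0, dSO)[group]              # operator short-term systematic errors
  SI <- rnorm(g, 0, dSI)[group]              # inspector short-term systematic errors
  o <- mu * (1 + SO + rnorm(g * n, 0, dRO))  # operator measurements
  i <- mu * (1 + SI + rnorm(g * n, 0, dRI))  # inspector measurements
  data.frame(group, mu, o, i)
}

dat <- simulate_pairs()
d_rel <- (dat$o - dat$i) / dat$o             # relative differences, as in Figure 1
tapply(d_rel, dat$group, mean)               # group means (the horizontal lines in Fig. 1)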

Figure 1 plots simulated example verification measurement data. The relative difference d = (o − i)/o is plotted for each of 10 paired (o,i) measurements in each of 5 groups (inspection periods), for a total of 50 relative differences. As shown in Figure 1, the between-group variation is typically noticeable compared to the within-group variation, although for better illustration the between-group variation in Figure 1 is amplified to a quite large value; we used δRO = 0.005, δSO = 0.001, δRI = 0.01, and δSI = 0.03, where the value δSI = 0.03 is quite large. Figure 2a is the same type of plot as Figure 1, but for real data (four operator and inspector measurements on drums of UO2 powder from each of three inspection periods). Figure 2b plots inspector versus operator data for each of the three inspection periods; a linear fit is also plotted.

Fig. 2

Example real verification measurement data. (a) Four paired (O,I) measurements in three inspection periods; (b) inspector vs. operator measurement by group, with linear fits in each group.

Fig. 1

Example simulated verification measurement data. The relative difference d = (o − i)/o is plotted for each of 10 paired (o,i) measurements in each of 5 groups, for a total of 50 relative differences. The mean relative difference within each group (inspection period) is indicated by a horizontal line through the respective group mean.

2.1 Grubbs' estimator for paired (operator, inspector) data

Grubbs introduced a variance estimator for paired data under the assumption that the measurement error model is additive. We have developed new versions of Grubbs' estimator to accommodate multiplicative error models and/or prior information regarding the relative sizes of the true variances [4,5]. Grubbs' estimator was developed for the situation in which more than one measurement method is applied to multiple test items, but no method measures any item more than once. This is the typical situation for paired (O,I) data.

Grubbs' estimator for an additive error model can be extended to apply to the multiplicative model, equation (1), as follows. First, equation (1) for the inspector data (the operator data is analysed in the same way) implies that the within-group mean squared error (MSE), σ̂²W = Σ_{i=1}^{g} Σ_{j=1}^{n} (Iij − Īi)²/(g(n − 1)), has expectation δ²RI(μ̄² + σ²μ) + σ²μ(1 + δ²SI), where μ̄ is the average value of μij (assuming that each group has the same number of paired observations n). Second, the between-group MSE, σ̂²B = n Σ_{i=1}^{g} (Īi − Ī)²/(g − 1), has expectation δ²SI(nμ̄² + σ²μ) + σ²μ + δ²RI(μ̄² + σ²μ). Therefore, both δ²RI and δ²SI are involved in both the within- and between-group MSEs, which implies that one must solve a system of two equations in two unknowns to estimate δ²RI and δ²SI [4,5]. By contrast, if the error model is additive, only the random error variance is involved in the within-group MSE, while both the random and systematic error variances are involved in the between-group MSE. The term σ²μ in both equations is estimated as in the additive error model, by using the fact that the covariance between operator and inspector measurements equals σ²μ [4,5]. However, σ²μ will be estimated with non-negligible estimation error in many cases. For example, see Figure 2b, where the fitted lines in periods 1 and 3 have negative slope, which implies that the estimate of σ²μ is negative in periods 1 and 3 (but the true value of σ²μ cannot be negative in this situation). We note that in the limit as σ²μ approaches zero, the expression for the within-group MSE reduces to that in the additive model case (and similarly for the between-group MSE).
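
A minimal R sketch of this scheme follows, assuming the moment equations reconstructed above (the function name is ours, and simulate_pairs() is the simulator sketched in Sect. 2); as noted, the estimates can be negative in small samples.

# Minimal sketch: Grubbs-type estimation for the multiplicative model, solving
# two moment equations in the two unknowns delta^2_RI and delta^2_SI.
# dat must have columns group, o, i (e.g., from simulate_pairs() above).
grubbs_mult <- function(dat) {
  g <- length(unique(dat$group)); n <- nrow(dat) / g
  mu_bar <- mean(dat$o)  # operator values are the more accurate proxy for the true mean
  # pooled within-group covariance of (o, i) estimates the item variance sigma^2_mu
  s2_mu <- mean(sapply(split(dat, dat$group), function(d) cov(d$o, d$i)))
  Ibar <- tapply(dat$i, dat$group, mean)
  W <- sum((dat$i - Ibar[dat$group])^2) / (g * (n - 1))  # within-group MSE
  B <- n * sum((Ibar - mean(Ibar))^2) / (g - 1)          # between-group MSE
  # two equations, two unknowns, per the expectations reconstructed in the text
  A <- rbind(c(mu_bar^2 + s2_mu, s2_mu),
             c(mu_bar^2 + s2_mu, n * mu_bar^2 + s2_mu))
  b <- c(W - s2_mu, B - s2_mu)
  est <- solve(A, b)
  list(d2RI = est[1], d2SI = est[2], s2_mu = s2_mu)
}

grubbs_mult(simulate_pairs(g = 5, n = 10))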

3 Applying uncertainty estimates: the D statistic and one-at-a-time verification measurements

This paper considers two possible IAEA verification tests. First, the overall D test for a pattern is based on the average difference, D̄ = (1/n) Σ_{j=1}^{n} (Oj − Ij). Second, the one-at-a-time test compares the operator measurement to the corresponding inspector measurement for each item, and a relative difference is computed, defined as dj = (oj − ij)/oj. If dj > 3δ, where δ = sqrt(δ²R + δ²S) with δ²R = δ²RO + δ²RI and δ²S = δ²SO + δ²SI (or some other alarm threshold close to the value of 3 that corresponds to a small false alarm probability), then the jth item selected for verification leads to an alarm. Note that the correct normalization used to define the relative difference is actually dj = (oj − ij)/μj, which has standard deviation exactly δ. But μj is not known in practice, so a reasonable approximation is to use dj = (oj − ij)/oj, because the operator measurement oj is typically more accurate and precise than the inspector's NDA measurement ij. Provided δ ≤ 0.20 (approximately), one can assume that dj = (oj − ij)/oj is an adequate approximation to dj = (oj − ij)/μj [10]. Although IAEA experience suggests that δ sometimes exceeds 0.20, usually δ ≤ 0.20 [8].
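
A minimal R sketch of the one-at-a-time test follows, assuming the combined relative standard deviations are known (in practice they are estimated, as in Sect. 2.1); dat is from the simulator sketched in Section 2, and the delta values are hypothetical.

# Minimal sketch: one-item-at-a-time verification test with threshold k * delta.
one_at_a_time <- function(o, i, dR, dS, k = 3) {
  delta <- sqrt(dR^2 + dS^2)   # combined relative standard deviation of d_j
  dj <- (o - i) / o            # operator value stands in for the unknown mu_j
  which(dj > k * delta)        # indices of items that alarm
}

# Hypothetical combined operator + inspector values (cf. the Figure 1 example):
dR <- sqrt(0.005^2 + 0.01^2); dS <- sqrt(0.001^2 + 0.03^2)
one_at_a_time(dat$o, dat$i, dR, dS)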

3.1 The D statistic to test for a trend in the individual differences dj = oj − ij 

For an additive error model, Iij = μij + SIi + RIij, it is known [11] that the variance of the D statistic is given by σ²D = N²(σ²R/n + σ²S), where σ²R and σ²S are the absolute (not relative) variances. If one were sampling from a finite population without measurement error to estimate a population mean, then σ²D = N² f σ²/n, where f = (N − n)/N is the finite population correction factor, and σ² is a quasi-variance term (the “item variance” as defined previously, in a slightly different context), defined here as σ² = Σ_{j=1}^{N} (μj − μ̄)²/(N − 1). Notice that without any measurement error, if n = N then f = 0, so σ²D = 0, which is quite different from σ²D = N²(σ²R/n + σ²S). Figure 1 can be used to explain why σ²D > 0 when there are both random and systematic measurement errors: even if all N items are verified, the measurement errors, particularly the shared systematic errors, do not average away. And the fact that σ²D = 0 when n = N and there are no measurement errors is also easily explainable: the D statistic is then computed from the entire population of true values, so there is no sampling variability.

For a multiplicative error model (our focus), it can be shown [11] that

\[ \sigma_D^2 = \frac{N}{n}\,\delta_R^2 \sum_{j=1}^{N} \mu_j^2 + \mathrm{Total}^2\,\delta_S^2 + \frac{N(N-n)}{n}\,\sigma_\mu^2\,\delta_S^2, \tag{2} \]

where δ²R = δ²RO + δ²RI, δ²S = δ²SO + δ²SI, and Total = Σ_{j=1}^{N} μj; so to calculate σ²D in equation (2), one needs to know or assume values for σ²μ (the item variance) and the average of the true values, μ̄. In equation (2), the first two terms are analogous to σ²D = N²(σ²R/n + σ²S) in the additive error model case. The third term involves σ²μ and decreases to 0 when n = N. Again, in the limit as σ²μ approaches zero, equation (2) reduces to that for the additive model case; and regardless of whether σ²μ is large or near zero, the effect of δ²S cannot be reduced by taking more measurements (increasing n in Eq. (2)).
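
Equation (2) is simple to evaluate directly; the R sketch below codes it and then finds the smallest n2 achieving a target σD (the target 8/3.3 anticipates Sect. 4.1, and the true values μj are our assumptions).

# Minimal sketch: sigma_D from equation (2) for the multiplicative model.
sigmaD <- function(n, N, mu, dR, dS) {
  total <- sum(mu)
  sqrt((N / n) * dR^2 * sum(mu^2) + total^2 * dS^2 +
       (N * (N - n) / n) * var(mu) * dS^2)
}

# Hypothetical inputs mirroring Section 4.1; smallest n2 with sigma_D <= 8/3.3:
set.seed(3)
mu <- rnorm(200, 1, 0.01)
dR <- sqrt(0.01^2 + 0.05^2); dS <- sqrt(0.001^2 + 0.005^2)
which(sapply(1:200, function(n) sigmaD(n, 200, mu, dR, dS)) <= 8 / 3.3)[1]  # returns 22

With these inputs the search returns 22, consistent with the n2,true = 22 reported in Section 4.1.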

In general, the multiplicative error model gives different results than an additive error model because variation in the true values, σ²μ, contributes to σ²D in a multiplicative model, but not in an additive model. For example, let σ²R = δ²R(μ̄² + σ²μ) and σ²S = δ²S(μ̄² + σ²μ), so that the average variance in the multiplicative model is the same as the variance in the additive model for both random and systematic errors. Assume δR = 0.10, δS = 0.02, μ̄ = 100 (arbitrary units), and σμ = 50 (50% relative standard deviation in the true values). Then the additive model has σD = 270.8 and the corresponding multiplicative model with the same average absolute variance has σD = 310.2, a 15% increase. The fact that var(μ) contributes to σ²D in a multiplicative model has an implication for sample size calculations such as those we describe in Section 4. Provided the magnitude of SIi + RIij is approximately 0.2 or less (equivalently, the relative standard deviation of SIi + RIij is approximately 8% or less), one can convert equation (1) to an additive model by taking logarithms, using the approximation log(1 + x) ≈ x for |x| ≤ 0.20. However, there are many situations for which the log transform is not sufficiently accurate, so this paper describes a recently developed option to accommodate multiplicative models rather than relying on approximations based on the logarithm transform [4,5].

The overall D test for a pattern is based on the average difference, D̄ = (1/n) Σ_{j=1}^{n} (Oj − Ij). The D-statistic test is based on equation (2), where δ²R is the random error variance and δ²S is the systematic error variance of d = (o − i)/μ ≈ (o − i)/o, and σ²μ is the absolute variance of the true (unknown) values. If the observed D value exceeds 3σD (or some similar multiple of σD chosen to achieve a low false alarm probability), then the D test alarms.

The test that alarms if D ≥ 3σD is actually testing whether D ≥ 3σ̂D, where σ̂D denotes an estimate of σD; this leads to two sample size evaluations. The first sample size n1 involves the metrology data collected in previous inspection samples used to estimate δ²R, δ²S, and σ²μ needed in equation (2). The second sample size n2 is the number of the operator's declared measurements randomly selected for verification by the inspector. The sample size n1 itself comprises two sample sizes: the number of groups g (inspection periods) used to estimate δ²S, and the total number of items over all groups, n1 = gn in the case (the only case we consider in the examples in Sect. 4) that each group has n paired measurements.

3.2 One-at-a-time sample verification tests

The IAEA has historically used zero-defect sampling, which means that the only acceptable (passing) sample is one in which no defects are found. Therefore, the non-detection probability is the probability that no defects are found in a sample of size n when one or more truly defective items are in the population of size N. For one-item-at-a-time testing, the non-detection probability is given by

\[ \Pr(\text{discover 0 defects in a sample of size } n) = \sum_{i=\max(0,\,n+r-N)}^{\min(n,\,r)} A_i \times B_i, \tag{3} \]

where the term Ai is the probability that the selected sample contains i truly defective items, which is given by the hypergeometric distribution with parameters i, n, N, and r, where i is the number of defects in the sample, n is the sample size, N is the population size, and r is the number of defective items in the population. More specifically,

\[ A_i = \binom{r}{i}\binom{N-r}{n-i}\bigg/\binom{N}{n}, \]

which is the probability of choosing i defective items from the r defective items in a population of size N in a sample of size n (the well-known hypergeometric distribution). The term Bi is the probability that none of the i truly defective items is inferred to be defective based on the individual d tests. The value of Bi depends on the metrology and the alarm threshold. Assuming a multiplicative error model for the inspector measurement (and similarly for the operator) implies that, for an alarm threshold of k = 3, for i ≥ 1 we have to calculate Bi = Pr(|d1| ≤ 3δ, …, |di| ≤ 3δ), where δ² = δ²R + δ²S, which is given by the multivariate normal integral

\[ B_i = \frac{1}{(2\pi)^{i/2}\,\lvert\Sigma_i\rvert^{1/2}} \int_{-3\delta}^{3\delta} \cdots \int_{-3\delta}^{3\delta} \exp\!\left\{ -\frac{(z-\lambda)^{T}\,\Sigma_i^{-1}\,(z-\lambda)}{2} \right\} dz_1\, dz_2 \cdots dz_i, \]

where each component of λ is equal to 1 SQ/r (SQ is a significant quantity; for example, 1 SQ = 8 kg for Pu, and r was defined above as the number of defective items in the population). The term Σi in the Bi calculation is a square matrix with i rows and columns, with δ²R + δ²S on the diagonal and δ²S on the off-diagonals (the off-diagonal entries reflect the systematic error shared by the i tested items).
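
A minimal R sketch of equation (3) follows; it uses pmvnorm() from the mvtnorm package for the Bi integral. Note that the per-item shift λ must be expressed on the same relative scale as δ, so the item amount used below is a hypothetical assumption, as are the other inputs.

# Minimal sketch of equation (3): hypergeometric A_i times multivariate normal B_i.
library(mvtnorm)

nondetect_prob <- function(n, N, r, dR, dS, SQ = 8, item_amt = 1, k = 3) {
  delta <- sqrt(dR^2 + dS^2)
  shift <- (SQ / r) / item_amt   # per-item falsification on the relative scale
  p <- 0
  for (i in max(0, n + r - N):min(n, r)) {
    Ai <- dhyper(i, r, N - r, n)        # P(sample contains i defective items)
    if (i == 0) { p <- p + Ai; next }   # B_0 = 1: nothing falsified to catch
    Sig <- matrix(dS^2, i, i); diag(Sig) <- dR^2 + dS^2  # shared systematic error
    Bi <- pmvnorm(lower = rep(-k * delta, i), upper = rep(k * delta, i),
                  mean = rep(shift, i), sigma = Sig)[1]
    p <- p + Ai * Bi
  }
  p
}

# Detection probability = 1 - non-detection probability (hypothetical inputs):
1 - nondetect_prob(n = 30, N = 200, r = 10,
                   dR = sqrt(0.1^2 + 0.1^2), dS = sqrt(0.05^2 + 0.05^2))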

4 Simulation study

The left-hand side of equations (2) and (3) can be considered a “measurand” in the language of the Guide to the Expression of Uncertainty in Measurement (GUM) [12]. Although the error propagation in the GUM is typically applied in a “bottom-up” uncertainty evaluation of a measurement method, it can also be applied to any other output quantity y (such as y = σD or y = DP, the detection probability) expressed as a known function y = f(x1, x2, …, xp) of inputs x1, x2, …, xp (inputs such as δ²R and δ²S). The GUM recommends linear approximations (the “delta method”) or Monte Carlo simulations to propagate uncertainties in the inputs to predict uncertainties in the output. Here we use Monte Carlo simulations to evaluate the uncertainties in the inputs δ̂²R and δ̂²S and also to evaluate the uncertainty in y = σD or y = DP as a function of the uncertainties in the inputs. Notice that equation (2) is linear in δ²R and δ²S, so the delta method to approximate the uncertainty in y = σD would be exact; however, there is a non-zero (negative) covariance between δ̂²R and δ̂²S that would need to be taken into account in the delta method.

We used the statistical programming language R [13] to perform simulations for example true values of δ²RO, δ²SO, δ²RI, δ²SI, σ²μ, and the amount of diverted nuclear material. For each of 10^5 or more simulation runs, normal errors were generated assuming the multiplicative error model (1) for both random and systematic errors (see Sect. 4.2 for examples with non-normal errors). The new version of Grubbs' estimator for multiplicative errors was applied to produce the estimates δ̂²RO, δ̂²SO, δ̂²RI, δ̂²SI, and σ̂²μ, which were then used to estimate y = σD in equation (2) and y = DP in equation (3). Because there is large uncertainty in the individual estimates unless σ²μ is nearly 0, we also present results for a modified Grubbs' estimator applied to the relative differences dj = (oj − ij)/oj that estimates the aggregated variances δ²R = δ²RO + δ²RI and δ²S = δ²SO + δ²SI, and also estimates σ²μ. Results are described in Sections 4.1 and 4.2.
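
The following R sketch shows the shape of this Monte Carlo loop, reusing simulate_pairs(), grubbs_mult(), and sigmaD() from the earlier sketches; for brevity the operator variances are treated as known and negative variance estimates are truncated at zero, which are our simplifications, not necessarily those of the study.

# Minimal sketch of the Monte Carlo uncertainty evaluation for y = sigma_D.
set.seed(2)
nsim <- 1000  # the study used 1e5 or more; reduced here for speed
mu <- rnorm(200, 1, 0.01)
sD <- replicate(nsim, {
  est <- grubbs_mult(simulate_pairs(g = 5, n = 10, mu_bar = 1, sigma_mu = 0.01,
                                    dRO = 0.01, dSO = 0.001, dRI = 0.05, dSI = 0.005))
  dR <- sqrt(max(est$d2RI, 0) + 0.01^2)   # add the (assumed known) operator component
  dS <- sqrt(max(est$d2SI, 0) + 0.001^2)
  sigmaD(n = 22, N = 200, mu, dR, dS)
})
quantile(sD, c(0.025, 0.975))  # cf. the 95% CIs of Figure 3 at n2 = 22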

4.1 The D statistic to test for a trend in the individual differences dj = (oj − ij)/oj

Figure 3 plots 95% CIs for σD versus sample size n2 using the modified Grubbs' estimator applied to the relative differences, for the parameter values δRO = 0.01, δSO = 0.001, δRI = 0.05, δSI = 0.005, μ̄ = 1, σμ = 0.01, N = 200, for case A (defined here and throughout as n1 = 4 with g = 2, n = 2) and for case B (defined here and throughout as n1 = 50 with g = 5, n = 10). We used 10^5 simulations of the measurement process to estimate the quantiles of the distribution of y = σD. We confirmed by repeating the sets of 10^5 simulations that simulation error due to using a finite number of simulations is negligible. Clearly, and not surprisingly, the sample size in case A leads to CIs that are too wide for effectively quantifying the uncertainty in σD. The traditional Grubbs' estimator performs poorly unless σμ is very small, such as σμ = 0.0001. We use the traditional Grubbs' estimator in Section 4.2. The modified estimator that estimates the aggregated variances performs well for any value of σμ.

Figure 4 is similar to Figure 3, except that Figure 4 plots the length of the 95% CI for six possible values of n1 (see the figure legend). Again, the case A sample size is probably too small for effective estimation of σD. In this example, the shortest CI is for g = 5 and n = 100, but n = 100 is unrealistically large, while g = 3 and n = 10 or g = 5 and n = 10 are typically possible with reasonable resources. The length of these 95% CIs is one criterion for choosing an effective sample size n1.

Another criterion for choosing an effective sample size n1 is the root mean squared error (RMSE, defined below) in estimating the sample size n2 needed to achieve σD = 8/3.3 (the 3.3 is an example value that corresponds to a 95% DP to detect an 8 kg shift (1 SQ for Pu) while maintaining a 0.05 false alarm probability (FAP) when testing for material loss). In this example, the RMSE in estimating the sample size n2 needed to achieve σD = 8/3.3 is approximately 12.9 for case A and 8.0, 7.3, 6.8, 6.7, and 6.3, respectively, for the other values of n1 considered in Figure 4. These RMSEs are repeatable to within ±0.1 across sets of 10^5 simulations, so the RMSE values rank in the same order as the CI lengths in Figure 4. The RMSE is defined as

\[ \mathrm{RMSE} = \sqrt{ \frac{ \sum_{i=1}^{10^5} (\hat{n}_{2,i} - n_{2,\mathrm{true}})^2 }{10^5} }, \]

where n̂2,i is the estimated sample size n2 in simulation i that is needed in order to achieve σD = 8/3.3, and n2,true is the true sample size n2 (n2,true = 22 in this example; see Fig. 3, where the true value of σD versus n2 is also shown) needed to achieve σD = 8/3.3.
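
The RMSE criterion can be sketched by extending the Monte Carlo loop above to re-solve for n2 in each run (again with our simplifications, and with mu as defined in the loop above):

# Minimal sketch: RMSE of the estimated n2 needed to achieve sigma_D = 8/3.3.
n2_hat <- replicate(1000, {
  est <- grubbs_mult(simulate_pairs(g = 5, n = 10, mu_bar = 1, sigma_mu = 0.01,
                                    dRO = 0.01, dSO = 0.001, dRI = 0.05, dSI = 0.005))
  dR <- sqrt(max(est$d2RI, 0) + 0.01^2)
  dS <- sqrt(max(est$d2SI, 0) + 0.001^2)
  which(sapply(1:200, function(n) sigmaD(n, 200, mu, dR, dS)) <= 8 / 3.3)[1]
})
sqrt(mean((n2_hat - 22)^2, na.rm = TRUE))  # n2_true = 22 in this example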

Another criterion for choosing an effective sample size n1 is the detection probability for specified loss scenarios. We consider this criterion in Section 4.3.

Fig. 3

The estimate of σD versus sample size n2 for two values of n1 (case A: g = 2, n = 2 so n1 = 4, or case B: g = 5, n = 10 so n1 = 50).

Fig. 4

Estimated lengths of 95% confidence intervals for σD versus sample size n2 for six values of n1 (g = 2, n = 2 so n1 = 4, g = 3, n = 5 so n1 = 15, etc.).

4.2 Uncertainty on the uncertainty on the uncertainty

The term “uncertainty” typically refers to a measurement error standard deviation, such as σD. Therefore, Figures 3 and 4 involve the “uncertainty of the uncertainty” as a function of n1 (defined as n1 = ng, so more correctly, as a function of g and n) and n2. Figures 5–7 illustrate the “uncertainty of the uncertainty of the uncertainty” (we commit to stopping at this level-three usage of “uncertainty”). The “uncertainty of the uncertainty” depends on the underlying measurement error probability density, which is sometimes itself uncertain. Figure 5 plots the familiar normal density and three non-normal densities (uniform, gamma, and generalized lambda [14]). Figure 6 plots the estimated probability density (using the 10^5 realizations) of the estimated value of δRI using the traditional Grubbs' estimator for each of the four distributions (the true value of δRI is 0.05); the true parameter values used to generate the random variables are the same as in Section 4.1 (δRO = 0.01, δSO = 0.001, δRI = 0.05, δSI = 0.005, μ̄ = 1, σμ = 0.01, N = 200). Figure 7 is similar to Figure 3 (for g = 5, n = 10), except that it compares CIs assuming the normal distribution to CIs assuming the generalized lambda distribution. That is, Figure 7 plots the estimated CI for σD, again for the model parameters as above, for the normal and for the generalized lambda distributions. In this case, the CIs are wider for the generalized lambda distribution than for the normal distribution. Recall (Fig. 6) that the standard deviations of the four estimated probability densities are 0.14, 0.25, 0.10, and 0.36 for the normal, gamma, uniform, and generalized lambda, respectively. Therefore, one might expect the CI for σD to be shorter for the normal than for a generalized lambda distribution that has the same relative standard deviation as the corresponding normal distribution.
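
To illustrate how the error distribution affects the spread of a variance estimate, the sketch below draws standardized (mean 0, variance 1) errors from three of the four densities of Figure 5 (the generalized lambda is omitted to stay in base R) and compares the spread of a simple within-group estimate of δRI; the estimator and settings are ours, for illustration only.

# Minimal sketch: spread of an estimate of delta_RI under different error densities.
rerr <- list(
  normal  = function(m) rnorm(m),
  uniform = function(m) runif(m, -sqrt(3), sqrt(3)),     # variance 1
  gamma   = function(m) (rgamma(m, shape = 4) - 4) / 2   # mean 4, sd 2, standardized
)

est_dRI <- function(rfun, g = 5, n = 10, dRI = 0.05) {
  mu <- rnorm(g * n, 1, 0.01)
  i <- mu * (1 + dRI * rfun(g * n))         # random error only, for brevity
  grp <- rep(1:g, each = n)
  sqrt(mean(tapply(i / mu - 1, grp, var)))  # true mu used here only for simplicity
}

set.seed(4)
sapply(rerr, function(f) sd(replicate(2000, est_dRI(f))))  # spread varies by density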

Fig. 7

95% confidence intervals for the estimate of σD versus sample size n2 for case B, assuming the measurement error distribution is either the normal or the generalized lambda distribution.

Fig. 6

The estimated probability density for δ̂RI in the four example measurement error probability densities (normal, gamma, uniform, and generalized lambda, each with mean 0 and variance 1) from Figure 5.

Fig. 5

Four example measurement error probability densities: normal, gamma, uniform, and generalized lambda, each with mean 0 and variance 1.

4.3 One-at-a-time testing

For one-at-a-time testing, Figure 8 plots 95% confidence intervals for the estimated DP versus sample size n2 for cases A and B (see Sect. 4.1). The true parameter values used in equation (3) were δRO = 0.1, δSO = 0.05, δRI = 0.1, δSI = 0.05, σμ = 0.01. A true shift of 1 SQ (8 kg) in total, spread equally over 10 falsified items, was used (representing data falsification by the operator to mask diversion of material), matching the per-item shift of 1 SQ/r in equation (3). The CIs for the DP were estimated by using the observed 2.5% and 97.5% quantiles of the DP values in 10^5 simulations. As in Section 4.1, we confirmed by repeating the sets of 10^5 simulations that simulation error due to using a finite number of simulations is negligible. The very small case A sample leads to approximately the same lower 2.5% quantile as does case B; however, the upper 97.5% quantile is considerably lower for case A than for case B. Other values of the parameters (δRO, δSO, δRI, δSI, σμ, the number of falsified items, and the amount falsified per item) lead to different conclusions about the uncertainty in how the DP varies as a function of n2. For example, if the systematic error standard deviations are reduced in this example, then the confidence interval lengths are very short for both case A and case B.

For this same example, we can also compute the DP when using the D statistic to detect the loss (which the operator attempts to mask by falsifying the data). For the example just described (for which simulation results are shown in Fig. 8), the true DP when using the D statistic (using an alarm threshold of 3σD and n2 = 30, with Eq. (2)) is 0.65. The corresponding true DP for one-at-a-time testing is 0.27. Therefore, in this example, with 10 of 200 items falsified by a combined total of 8 units, the D statistic has a higher DP than the n2 = 30 one-at-a-time tests. The advantage of the D statistic is particularly pronounced when there are many falsified items in the population. For example, if we increase the number of defectives in this example from 10 of 200 to 20, 30, or 40 of 200, then the DPs are (0.17, 0.17), (0.08, 0.15), and (0.06, 0.14) for one-at-a-time testing and for the D statistic, respectively. These are low DPs, largely because the measurement error variances are large in this example. One can also assess the sensitivity of the estimated DP using the D statistic to the uncertainty in the estimated variances; for brevity, we do not show that here.

Fig. 8

Estimated detection probability and 95% confidence interval versus sample size n2 for cases A and B. The true detection probability is plotted as the solid (black) line.

5 Discussion and summary

This study was motivated by three considerations. First, there is an ongoing need to improve UQ for error variance estimation. For example, some applications involve characterizing items for long-term storage and the measurement error behaviour for the items is not well known, so an initial metrology study with to-be-determined sample sizes is required. Second, we recently provided the capability to allow for multiplicative error models in evaluating the D statistic (Eq. (2)) [4,5]. Third, we recently provided the capability to allow for both random and systematic errors in one-at-a-time item testing (Eq. (3)).

We presented a simulation study that assumed error variances are estimated using an initial metrology study characterized by g measurement groups and n paired (operator, inspector) measurements per group. Not surprisingly, for both one-item-at-a-time testing and pattern testing using the D statistic, it appears that g = 2 and n = 2 is too small for effective variance estimation.

Therefore, the sample sizes in the previous and current inspections will impact the estimated DP and FAP, as illustrated by numerical examples. The numerical examples include the application, in a simulation study, of the new expression for the variance of the D statistic under a multiplicative measurement error model (Eq. (2)), and a new application of both random and systematic error variances in one-item-at-a-time testing (Eq. (3)).

Future work will evaluate the impact of larger values of the product variability, σ²μ, on the standard Grubbs' estimator; this study used a very small value of σ²μ, which is adequate in some contexts, such as product streams. The value of σ²μ could be considerably larger in some NM streams, particularly waste streams. Therefore, this study also evaluated using the relative differences dj = (oj − ij)/oj to estimate the aggregated quantities needed in equations (2) and (3), δ²R = δ²RO + δ²RI and δ²S = δ²SO + δ²SI, using a modified Grubbs' estimator, to mitigate the impact of noise in the estimation of σμ. Because σ²μ is a source of noise in estimating the individual measurement error variances [15], a Bayesian alternative is under investigation to reduce its impact [16]. Also, one could base a statistical test for data falsification on the relative differences between operator and inspector measurements, d = (o − i)/o, in which case an alternative expression to equation (2) for σD that does not involve the product variability would be used.

5.1 Implications and influences

Each of the three considerations that motivated this study has implications for future work. First, there is an ongoing need to improve UQ for error variance estimation; for example, some applications involve characterizing items for long-term storage whose measurement error behaviour might not be well known, so an initial metrology study with to-be-determined sample sizes is required. Second, we recently provided the capability to allow for multiplicative error models in evaluating the D statistic (Eq. (2) in Sect. 3) [4,5]. Third, we recently provided the capability to allow for both random and systematic errors in one-at-a-time item testing (Eq. (3) in Sect. 3). Prior to this work, the variance of the D statistic was estimated by assuming that measurement error models are additive rather than multiplicative, and one-at-a-time item testing assumed that all measurement errors were purely random.

Acknowledgments

The authors acknowledge CETAMA for hosting the November 17–19, 2015 conference on sampling and characterization, where this paper was first presented.

References

  1. R. Avenhaus, M. Canty, Compliance Quantified (Cambridge University Press, 1996) [CrossRef] [Google Scholar]
  2. T. Burr, M.S. Hamada, Revisiting statistical aspects of nuclear material accounting, Sci. Technol. Nucl. Install. 2013, 961360 (2013) [Google Scholar]
  3. T. Burr, M.S. Hamada, Bayesian updating of material balances covariance matrices using training data, Int. J. Prognost. Health Monitor. 5, 6 (2014) [Google Scholar]
  4. E. Bonner, T. Burr, T. Guzzardo, T. Krieger, C. Norman, K. Zhao, D.H. Beddingfield, W. Geist, M. Laughter, T. Lee, Ensuring the effectiveness of safeguards through comprehensive uncertainty quantification, J. Nucl. Mater. Manage. 44, 53 (2016) [Google Scholar]
  5. T. Burr, T. Krieger, K. Zhao, Grubbs' estimators in multiplicative error models, IAEA report, 2015 [Google Scholar]
  6. F. Grubbs, On estimating precision of measuring instruments and product variability, J. Am. Stat. Assoc. 43, 243 (1948) [CrossRef] [Google Scholar]
  7. K. Martin, A. Böckenhoff, Analysis of short-term systematic measurement error variance for the difference of paired data without repetition of measurement, Adv. Stat. Anal. 91, 291 (2007) [CrossRef] [Google Scholar]
  8. R. Miller, Beyond ANOVA: Basics of Applied Statistics (Chapman & Hall, 1998) [Google Scholar]
  9. C. Norman, Measurement errors and their propagation, Internal IAEA Document, 2014 [Google Scholar]
  10. G. Marsaglia, Ratios of normal variables, J. Stat. Softw. 16, 2 (2006) [Google Scholar]
  11. T. Burr, T. Krieger, K. Zhao, Variations of the D statistics for additive and multiplicative error models, IAEA report, 2015 [Google Scholar]
  12. Guide to the Expression of Uncertainty in Measurement, JCGM 100: www.bipm.org (2008) [Google Scholar]
  13. R Core Team, R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, Vienna, Austria, 2012): www.R-project.org [Google Scholar]
  14. M. Freimer, G. Mudholkar, G. Kollia, C. Lin, A study of the generalized Tukey Lambda family, Commun. Stat. Theor. Methods 17, 3547 (1988) [CrossRef] [Google Scholar]
  15. F. Lombard, C. Potgieter, Another look at Grubbs' estimators, Chemom. Intell. Lab. Syst. 110, 74 (2012) [CrossRef] [Google Scholar]
  16. C. Elster, Bayesian uncertainty analysis compared to the application of the gum and its supplements, Metrologia 51, S159 (2014) [CrossRef] [Google Scholar]

Cite this article as: Tom Burr, Thomas Krieger, Claude Norman, Ke Zhao, The impact of metrology study sample size on uncertainty in IAEA safeguards calculations, EPJ Nuclear Sci. Technol. 2, 36 (2016)

