EPJ Nuclear Sci. Technol., Volume 11, 2025
Special Issue on ‘Overview of recent advances in HPC simulation methods for nuclear applications’, edited by Andrea Zoia, Elie Saikali, Cheikh Diop and Cyrille de Saint Jean

Article Number: 55
Number of pages: 17
DOI: https://doi.org/10.1051/epjn/2025054
Published online: 16 September 2025
Regular Article
Data-driven reduced order modelling with malfunctioning sensors recovery applied to the Molten Salt Reactor case
1 Politecnico di Milano, Energy Department - Nuclear Engineering Division, 20156 Milano, Italy
2 MINES Paris, PSL University, CRC, Sophia Antipolis, France
3 Emirates Nuclear Technology Center (ENTC), Department of Mechanical and Nuclear Engineering, Khalifa University, Abu Dhabi, 127788, United Arab Emirates
Received: 27 May 2025
Received in final form: 6 August 2025
Accepted: 7 August 2025
Published online: 16 September 2025
This work presents the use of two Data-Driven Reduced Order Modelling techniques for predicting the transient response of a Molten Salt Fast Reactor when one or more sensors fail and thus provide wrong information; Supervised Machine Learning techniques are used to compensate for the failed sensors. Data-Driven Reduced Order Modelling integrates the physical knowledge contained in high-fidelity mathematical models with that coming from data measured on the actual system. This enables refining and updating the mathematical model, and addresses the challenges related to local-only observations, allowing for global state estimation. These methods are of interest when both sources of information are present, albeit incomplete, as is the case for the Molten Salt Fast Reactor. In these designs, typically operating in the fast neutron spectrum, the fuel is liquid and no solid structures are foreseen in the core, making sensing and monitoring of safety-critical parameters and quantities quite challenging. Additionally, most literature studies on Data-Driven Reduced Order Modelling take the experimental observations as (noisy) ground-truth: very few works consider the case in which sensors fail or malfunction, and how this affects the state estimation.
© S. Riva et al., Published by EDP Sciences, 2025
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. Introduction
When analysing the behaviour of complex engineering systems, and even more so for safety-critical applications such as those of nuclear reactor engineering, the state-of-the-art approach is to use physics-based high-fidelity numerical models, as accurate as possible. Despite the advancements in computational hardware, the drawback of these models is their computational cost, both in terms of storage (and data-sharing) and simulation time; a cost that becomes even more unsustainable for online control and dynamic monitoring. To address this drawback, several algorithms for Reduced Order Modelling (ROM) have been developed [1, 2]. The literature on this topic is quite abundant, and the authors of this work have focused mainly on Reduced Basis approaches [3]. Briefly, given a dataset of solutions of the Full Order Model (FOM), typically called snapshots1, this approach extracts the fundamental modes, which describe the dominant spatial physics within the dataset [4]. These modes then define a surrogate space, whose dimension is much smaller than the number of degrees of freedom of the starting FOM, upon which a reduced model is built. This surrogate model is much more efficient, in terms of computational burden, than the FOM, at the cost of some accuracy: however, it is possible to tune the dimension of the surrogate space to achieve the desired accuracy level [2]. Its uses span from inverse problems such as parameter estimation, that is, retrieving the physical parameter values characterising transients not included in the starting dataset without the need to solve the FOM again, to forecasting the future state of the system.
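The reduction step described above (extracting fundamental modes from a dataset of snapshots and projecting onto them) can be illustrated in a few lines of NumPy; here Proper Orthogonal Decomposition via truncated SVD stands in for the generic compression technique, and the snapshot matrix is a random placeholder rather than actual FOM data:

```python
import numpy as np

# Hypothetical snapshot matrix: each column is one FOM solution u(x; mu_n)
# (random data stands in for high-fidelity snapshots here).
rng = np.random.default_rng(0)
n_dofs, n_snaps = 1000, 50
snapshots = rng.standard_normal((n_dofs, n_snaps))

# Extract the fundamental modes via truncated SVD (POD basis).
U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
N = 10                  # dimension of the surrogate space, N << n_dofs
basis = U[:, :N]        # spatial modes psi_n(x)

# Reduced (modal) coefficients of one snapshot: projection onto the basis.
alpha = basis @ snapshots[:, 0] if False else basis.T @ snapshots[:, 0]

# Reconstruction from the reduced representation.
u_approx = basis @ alpha
```

The modes are orthonormal by construction, so projecting and reconstructing only costs two small matrix-vector products instead of a full FOM solve.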
Additionally, optimisation problems such as those within Data Assimilation (DA) algorithms [4] can now be solved on the reduced space, with a significant reduction in complexity and computational cost: as, traditionally, the optimisation problem was the bottleneck for most state-of-the-art variational DA methods [5], their integration with ROM techniques looks very promising.
For this purpose, the authors of the present work have defined a novel framework, called Data-Driven Reduced Order Modelling (DDROM) [6], a branch of Scientific Machine Learning [7] fusing physically consistent data and governing equations with data-driven machine learning methodologies. This framework aims to update and correct the background mathematical model using observations (either from experimental sensors or from some higher-fidelity model) and to select the optimal positioning of sensors to maximise the amount of information collected [8, 9]. Most ROM techniques can be adapted within the DDROM framework: the present work restricts the focus to two of them, the Generalised Empirical Interpolation Method (GEIM) [9] and the Parameterised-Background Data-Weak (PBDW) formulation [8], already tested on nuclear reactor engineering problems [10–14]. Although these studies consider noisy measurements, also proposing regularisation techniques where needed [15, 16], all of them assume that the data coming from the sensors are the ground-truth, implicitly assuming that a sensor can neither malfunction nor fail; thus, no consideration is given to the robustness of the DDROM algorithms and their performance in the presence of malfunctioning sensors.
However, faulty sensors imply a faulty ground-truth, meaning that the DDROM algorithm is fed wrong reference values; in an engineering system, sensors may fail or malfunction for multiple reasons, and any methodology that relies on measurements should implement corrective actions to retain its reliability. Therefore, to use the DDROM framework outside laboratory test cases, it becomes mandatory to assess how its algorithms perform in the presence of malfunctioning or faulty sensors, to discriminate between working and faulty sensors, and to determine strategies to retrieve the missing information [17]. The available literature on this topic is, to the best of the authors’ knowledge, quite scarce: it is worth mentioning the work of [18], which developed a framework for ‘robust data assimilation’ by changing the traditional norm formulation from L2 to L1 to dynamically adjust the importance weight of the data according to their deviation from the mean forecast; the work of [19], in which the missing values of incomplete sensors are approximated with the Gappy-POD technique; and the work of [20], which developed a robust Dynamic Mode Decomposition by suppressing outliers in the dataset. There has been no in-depth investigation of the performance of the GEIM and PBDW methods in the presence of faulty sensors [21]. Regarding the implementation, this work follows a companion approach to that of [19], adding another building block to the DDROM framework by implementing Machine Learning (ML) techniques to retrieve the missing information (or correct the wrong information) of the faulty sensor: more in detail, Gaussian Process Regression (GPR) is used [22].
Additionally, this work includes a preliminary analysis on faulty sensor recognition, both by adopting a Random Forest algorithm [23] for the a priori classification and labelling of sensor faults within the training dataset, and by leveraging the trained GPR to identify unexpected signals coming from the system in quasi-real-time, thus automatically substituting the faulty measurement with the GPR prediction without the need to know a priori which sensor has failed.
As a test case, this work builds on the results obtained in [14], which applied DDROM algorithms to the study of an accidental transient scenario of the European Molten Salt Fast Reactor [24] for online monitoring in the presence of noisy measurements, introducing the possibility of having one or more malfunctioning sensors, under the assumption that a single (unknown) sensor can fail at the beginning of the transient. Regarding sensor positioning, the premise of this work is to assess whether auxiliary sensors, located in ‘safer’ regions of the domain (from the point of view of the sensor failure probability), can be used to recover the missing information from a failed sensor located in a safety-critical region, characterised by a much higher sensor failure probability due to the harsher environmental conditions: for the MSFR, the two regions are, respectively, the outer solid reflector and the liquid core2.
Section 2 includes a brief presentation of the DDROM framework and the two algorithms (GEIM and PBDW) used in this work; Section 3 then describes the strategy for sensor recovery. Section 4 reports the main results, first assessing the performance of the algorithms in the presence of a malfunctioning sensor, exploring two different recovery strategies and discussing autonomous faulty sensor classification and recognition. Finally, Section 5 summarises the key findings of the study and reports some future perspectives regarding this research topic.
2. Data-driven reduced order modelling
Reduced Order Modelling techniques aim to reduce the computational complexity of high-fidelity numerical models by efficiently compressing the information while maintaining the desired level of accuracy. Being based on ROM techniques, the DDROM framework shares the offline-online decomposition typical of ROM [1] (the training/testing decomposition of ML approaches), with the additional step of optimal sensor placement algorithms to include experimental measurements. Figure 1 shows the scheme of the DDROM framework [6]. The DDROM framework thus combines the background knowledge in the mathematical model with the information retrieved from the experimental sensors. Compared to stand-alone ROM methods, the accuracy of the FOM no longer bounds the accuracy of the state estimate/reconstruction: by integrating measurements of the fields of interest, the latter can be improved by accounting for unforeseen uncertainties and non-modelled physics [6, 16, 25].
Fig. 1. Schematic of the DDROM framework, highlighting the offline-online decomposition: from a series of snapshots, the ROM algorithm extracts the fundamental modes, performing dimensionality reduction and building the DDROM model; in the online phase, measurements are given as input to the reduced model to obtain an updated state estimation. Taken from [21].
- In the offline phase, the FOM is solved several times for different values of the model parameters μ ∈ 𝒟 ⊂ ℝp (including time and/or thermo-physical properties, input parameters, boundary conditions): each full-order solution or snapshot uFOM(x; μn) ∈ ℝ𝒩h, which depends on the spatial coordinate and the vector of parameters μ within the training dataset Ξtrain, is then collected and stored. From this dataset, through a dimensionality compression technique of choice, it is possible to extract a set of basis functions {ψn(x) ∈ ℝ𝒩h}n = 1N, whose physical interpretation depends on the reduction algorithm used; these basis functions span a linear subspace of the solution space, and they embed the fundamental spatial dependence of the training dataset. The FOM is then projected onto this surrogate space, whose dimension N is much smaller than the number of degrees of freedom 𝒩h of the FOM, to build a Reduced Order Model, from which the modal coefficients {α(μn)}n = 1N, which embed the parametric dependence (including time) of the training dataset, can be retrieved. Finally, the problem of finding the optimal positioning of sensors within the physical system, so as to maximise the amount of information that can be extracted, can also be solved in the reduced space: this implies searching for the optimal configuration within a large space of possible combinations, but the costs associated with this task can be greatly reduced through the aforementioned reduction process.
- In the online phase, measurements y(utrue(x, μ*)) ∈ ℝM are collected from aptly-positioned sensors to retrieve an augmented state estimate that accounts for both sources of information. The ROM is solved quickly and accurately thanks to its much lower dimension. Finally, the output of the reduced model can be decoded back to the full state of the system, approximating the true solution.
Regarding the selected reduction techniques, this work adopts the Generalised Empirical Interpolation Method and the Parameterised-Background Data-Weak formulation, both of which have been implemented within the pyforce package, leveraging the Python programming language [6, 14]. The pyforce package is part of the ROSE framework (Reduced Order multi-phySics data-drivEN), developed and maintained by the authors and available, under the MIT license, at https://github.com/ERMETE-Lab/ROSE-pyforce.
2.1. Generalised empirical interpolation method
The Generalised Empirical Interpolation Method is a DDROM technique, first proposed in [9]. The basis functions ψn(x) and the optimal locations of sensors are selected following a greedy procedure. More in detail, the GEIM approximates a given function u with a suitable interpolant:

$$\mathcal{I}_M[u](x;\,\mu) = \sum_{m=1}^{M} \beta_m(\mu)\, q_m(x), \tag{1}$$

where the magic/basis functions {qm(x)}m = 1M embed the spatial behaviour, whereas the coefficients {βm(μ)}m = 1M contain the parametric dependence. Each magic function is associated with a magic sensor υm(⋅; xm, s), which can be represented mathematically by a linear functional centred in xm ∈ Ω with a point-spread s ∈ ℝ+ representing the area over which the sensor collects data, to simulate the fact that, since sensors have a physical dimension, they do not collect point-wise data; the value of the point-spread depends on the sensor itself and on the spatial discretisation of the FOM [16].
- In the offline phase, the greedy algorithm returns a set of magic functions and sensors by minimising the interpolation error between the interpolant ℐM and the training dataset. The magic sensors are selected from the library Υ, representing the set of available locations in the domain.
- In the online phase, measurements are collected from the magic sensors:

$$y_m = \upsilon_m(u) + \epsilon_m, \qquad m = 1, \dots, M,$$

with ϵm ∼ 𝒩(0, σ2) being uncorrelated Gaussian random noise and σ2 its variance, also called the noise level. The reduced coefficients β ∈ ℝM can then be determined by solving the linear system (of dimension M ≪ 𝒩h) resulting from the interpolation condition of equation (1). To avoid an unbounded error due to noisy data [26], this work adopts the Tikhonov-regularised version of the GEIM, henceforth called TR-GEIM, developed by the authors in [16]; this version weakens the interpolation condition by adding a penalty term λ, and the equivalent M × M linear system to be solved becomes:

$$\left(\mathbb{B}^T \mathbb{B} + \lambda\,\mathbb{T}\right)\boldsymbol{\beta} = \mathbb{B}^T\mathbf{y} + \lambda\,\mathbb{T}\,\overline{\boldsymbol{\beta}}, \tag{3}$$

with β̄ the mean of the training coefficients,
given 𝔹ij = υi(qj), 𝕋 the regularisation matrix, which depends on the standard deviation of the training coefficients β, and λ the regularisation parameter, whose optimal value is σ2 in the case of unconstrained sensors [14].
For more details on the mathematical implementation of the TR-GEIM algorithm, interested readers can refer to [16].
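As an illustration of the online TR-GEIM step, the regularised linear system can be assembled and solved with plain NumPy; the matrices below are random placeholders, and the diagonal form of the regularisation matrix 𝕋 (built from the training-coefficient variances) is an assumption for this sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
M = 15                                       # number of magic sensors/functions
B = rng.standard_normal((M, M))              # B_ij = v_i(q_j)
y = rng.standard_normal(M)                   # (noisy) measurement vector
beta_train = rng.standard_normal((M, 200))   # training reduced coefficients

# Regularisation matrix from the training-coefficient statistics
# (a diagonal form is assumed here for illustration).
T = np.diag(1.0 / np.var(beta_train, axis=1))
beta_mean = beta_train.mean(axis=1)

lam = 1e-3   # penalty term, tuned to the measurement noise level sigma^2

# Weakened (Tikhonov-regularised) interpolation condition:
# (B^T B + lam T) beta = B^T y + lam T beta_mean
lhs = B.T @ B + lam * T
rhs = B.T @ y + lam * T @ beta_mean
beta = np.linalg.solve(lhs, rhs)
```

Since λ𝕋 is positive definite, the left-hand side stays well conditioned even when the plain interpolation matrix is nearly singular, which is what tames the noise amplification mentioned above.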
2.2. Parameterised-background data-weak formulation
Among all the reduction techniques built upon the Reduced Basis framework, the PBDW (Parameterised-Background Data-Weak) formulation [8] offers a general way to couple the model with additional data. Derived from the general DA problem statement [5, 27], the PBDW algorithm aims at approximating the state u(x, μ) through the linear combination of the available sources of information, namely the physical knowledge from the mathematical model zN and the information from the data ηM:

$$u(x;\,\mu) \simeq z_N(x;\,\mu) + \eta_M(x;\,\mu) = \sum_{n=1}^{N} \alpha_n(\mu)\,\xi_n(x) + \sum_{m=1}^{M} \theta_m(\mu)\, g_m(x), \tag{4}$$

where {ξn}n = 1N is the basis of the N-dimensional reduced space spanned by the mathematical model and {αn}n = 1N are the associated weight coefficients, whereas {gm}m = 1M is the basis of the M-dimensional update space obtained from the data, with {θm}m = 1M its weight coefficients.
- In the offline phase, to build the reduced space of the mathematical model, any Reduced Basis technique can be used: in particular, the present work uses the Proper Orthogonal Decomposition (POD) [1]. For what concerns the update space, which typically refers to the experimental data, a sGREEDY procedure [25, 27, 28] is used, with the overall goal of minimising the reconstruction error by selecting the optimal positioning of the available sensors {υm}m = 1M. In particular, the basis functions gm of the update space are the Riesz representations of the linear functionals υm [6, 8, 25].
- In the online phase, the weight coefficients α ∈ ℝN and θ ∈ ℝM are computed by solving the following linear system of dimension (N + M):

$$\begin{bmatrix} \chi M\,\mathbb{I} + \mathbb{A} & \mathbb{K} \\ \mathbb{K}^T & \mathbb{0} \end{bmatrix} \begin{bmatrix} \boldsymbol{\theta} \\ \boldsymbol{\alpha} \end{bmatrix} = \begin{bmatrix} \mathbf{y} \\ \mathbf{0} \end{bmatrix}, \tag{5}$$

given 𝔸m, m′ = (gm, gm′)L2(Ω), 𝕂m, n = (gm, ξn)L2(Ω), 𝕀 the identity matrix, and χ a hyperparameter that should be tuned using cross-validation to improve the performance of the algorithm [14].
The PBDW algorithm is generally stable in the presence of noise; therefore, it does not require additional regularisation, and the hyperparameter χ becomes a weight of the relative importance of models and measurements [28]. For more information on the algorithm and its implementation, interested readers can refer to [6, 8].
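A minimal numerical sketch of the online PBDW solve follows, assuming one common saddle-point form of the (N + M)-dimensional system (the exact block structure may differ from the implementation in the paper), with random placeholder matrices:

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 5, 15                     # background / update space dimensions
K = rng.standard_normal((M, N))  # K_mn = (g_m, xi_n)_L2
y = rng.standard_normal(M)       # measurement vector
chi = 1e-2                       # model/data weighting hyperparameter

# Gram matrix A_mm' = (g_m, g_m')_L2 of the Riesz representers:
# made symmetric positive definite here for illustration.
G = rng.standard_normal((M, M))
A = G @ G.T / M + np.eye(M)

# Saddle-point system coupling the update coefficients theta with the
# background coefficients alpha.
lhs = np.block([[chi * M * np.eye(M) + A, K],
                [K.T, np.zeros((N, N))]])
rhs = np.concatenate([y, np.zeros(N)])
sol = np.linalg.solve(lhs, rhs)
theta, alpha = sol[:M], sol[M:]
```

Note that for χ → 0 the data dominate, while large χ pushes the estimate towards the background model, consistent with the interpretation of χ as a relative weight between models and measurements.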
3. Strategy for sensor recovery
With noisy measurements as input data, the TR-GEIM and the PBDW formulation are reliable and robust in performing state reconstruction and estimation, given a correct tuning of the hyperparameters λ and χ. However, to the best of the authors’ knowledge, research regarding the performance of these algorithms (and, more in general, of DDROM and DA-based ROM) in the presence of one or more malfunctioning sensors is still scarce [17], despite its significant importance for the deployment of such techniques in industrial settings.
As a first analysis, this work focuses on two classes of possible malfunctions, both represented by the general formulation:

$$\tilde{y}_k(t) = y_k(t) + \kappa\,\langle y_k \rangle + \rho\,\varepsilon_k(t), \tag{6}$$

where ⟨yk⟩ is the average value of the signal and εk(t) a random disturbance. High values of κ correspond to drifting, that is, a time-uniform shift compared to the average value; high values of ρ, conversely, indicate unexpected spikes (that is, not related to random noise). Modelling the lifetime of the sensor during the accidental transient is outside the scope of this work; instead, this work includes a preliminary analysis on faulty sensor recognition and supervised classification, discussed in more detail in Section 3.1. The recovery strategy proposed in this work relies on the use of Supervised Learning (SL) methods and on the premise of redundancy: the missing information about the measurement of the malfunctioning sensor is retrieved through auxiliary sensors, located in ‘safer’ regions, which trade measurement accuracy for reliability and robustness. For the case of nuclear reactors, where the in-core region is characterised by high neutron fluences and the sensors are therefore subjected to significant radiation, this implies positioning sensors outside the core, where the radiation field, and thus the probability of failure, is lower. The problem in principle becomes two-fold: first, recovering the expected signal of the failed in-core sensor from the auxiliary ones; second, inferring the state of the system in one region of the domain from sensors located in another part of it. Both TR-GEIM and PBDW are quite suited for the latter problem [13, 14].
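The two malfunction classes (drift governed by κ, spikes governed by ρ) can be injected synthetically into a signal; the healthy signal below and the exact additive form of the corruption are illustrative assumptions, not the paper's fault model:

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.arange(0.0, 55.0, 0.2)              # 275 time instants, as in Section 4
y_true = 900.0 + 10.0 * np.sin(0.2 * t)    # hypothetical healthy sensor signal

kappa, rho = 0.1, 0.5                       # drift and spike magnitudes
drift = kappa * y_true.mean()               # time-uniform shift vs. the average value
spikes = rho * rng.standard_normal(t.size)  # unexpected, noise-like spikes

y_faulty = y_true + drift + spikes
```

Setting κ = 0 and ρ > 0 reproduces a spikes-only fault, while κ > 0 and ρ = 0 gives a pure drift; the combined case corresponds to the ‘Both’ label used later for classification.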
To solve the former, Supervised Learning methods can be used to learn the input-output relationship between primary sensors and auxiliary ones, that is, the map yk(t) = ℱ(yext); once this map is learned and a surrogate model is built, it is possible to map the signals from the auxiliary sensors back to the in-core ones, thus retrieving the expected measurement despite the presence of a faulty signal. This strategy is shown in Figure 2 for the case study of the Molten Salt Fast Reactor (MSFR), assuming that the observable field is the temperature T (although the strategy hereby discussed can be adapted to any case study and any observable field of interest). In-core measurements yT are labelled from 1 to 15 and are represented by black dots; the positioning of these sensors is determined by a greedy procedure, meaning that they are ordered according to their importance. Auxiliary out-core measurements yextT are labelled from A to D and represented by blue dots. The Supervised Learning technique selected for this work is the Gaussian Process Regression (GPR) method [22], whose output replaces the missing information from the malfunctioning sensor.
Fig. 2. Recovery of the missing information of failed sensor ykT (in black on the left) from external sensors yextT (in blue), for the temperature field T. Taken from [21].
Generally speaking, SL algorithms find a model f that can infer the existing relationship between input and output; in particular, the GPR infers a probabilistic distribution, used for predicting yi in the presence of unseen input values x. The prior distribution of f is a Gaussian Process (GP), f(x) ∼ 𝒢𝒫(μ(x), 𝒦(x, x′)), with mean value μ(x) (typically taken as a constant function based on the training measurements) and covariance function 𝒦(x, x′); this latter term is also called the kernel, whose selection is a hyperparameter of the GPR model. This work uses the default Radial Basis Function (RBF) kernel. The kernel function depends on the hyperparameters ϑ, whose values must be tuned by maximising a log-marginal likelihood loss function [21].
GPR models include the knowledge within the training dataset 𝒳 in the prior distribution to obtain a posterior distribution. Given a set of training input measurements 𝕏 = [x1 | x2 | … | xNs] ∈ ℝd×Ns, a set of corresponding outputs y = [y1, y2, …, yNs]T ∈ ℝNs, and a set of unseen test input data 𝕏* ∈ ℝd×M, predictions y* ∈ ℝM can be made by using the conditional Gaussian theorem, in which the conditional distribution of the predicted GP is again Gaussian, with closed-form expressions for its mean and covariance [29].
The GPR implementation used in this paper is from the Python package GPy (https://github.com/SheffieldML/GPy), which includes the hyperparameter optimisation.
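Although the paper relies on GPy, the same workflow (an RBF kernel, hyperparameters tuned by maximising the log-marginal likelihood, posterior mean and standard deviation as output) can be sketched with scikit-learn on hypothetical auxiliary-to-in-core data; the linear map used to generate the synthetic outputs is an assumption for illustration only:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(4)

# Hypothetical data: 4 auxiliary (out-core) readings as input, one in-core
# sensor reading as output, over a set of training time instants.
X_train = rng.standard_normal((200, 4))
y_train = X_train @ np.array([0.5, -0.2, 0.1, 0.3]) \
          + 0.01 * rng.standard_normal(200)

# GPR with an RBF kernel; the kernel hyperparameters are tuned internally
# by maximising the log-marginal likelihood during fit().
gpr = GaussianProcessRegressor(kernel=RBF(), alpha=1e-4).fit(X_train, y_train)

# Posterior mean and standard deviation at unseen inputs.
X_new = rng.standard_normal((10, 4))
y_pred, y_std = gpr.predict(X_new, return_std=True)
```

The predictive standard deviation returned alongside the mean is precisely what the z-score of Section 3.1 needs to discriminate a genuine fault from ordinary measurement noise.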
3.1. Faulty sensor recognition
Compared to the preliminary work done by the authors in [21], in which the state of each sensor was known a priori and thus was an input to the GPR algorithm, the present work removes this hypothesis by implementing faulty sensor recognition. In particular, two different strategies have been used.
The first strategy assumes that the map yk(t) = ℱ(yext) has been learnt for each sensor k. Then, given a new set of measurements (retaining the assumption that only one sensor fails at time t = 0), the malfunctioning sensor can be recognised by comparing each trajectory yk(t) with the one predicted by the GPR. For the comparison, the z-score has been defined as:

$$z_k(t) = \frac{\left| y_k(t) - \mu_k^{\mathrm{GPR}}(t) \right|}{\sigma_k^{\mathrm{GPR}}(t)}, \tag{7}$$

where the trajectory predicted by the GPR is described by its mean μkGPR(t) and standard deviation σkGPR(t). This quantity is then compared to a threshold, whose value must be tuned to discriminate between measurement noise and an actual faulty signal. It is worth mentioning that this strategy does not require labelled data, as it works by simply comparing the incoming signal to its expected value.
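A minimal sketch of this a-posteriori check follows; the aggregation of the z-score over time and the threshold value are illustrative assumptions, as are the synthetic GPR prediction and test signals:

```python
import numpy as np

def detect_faulty(y_meas, mu_gpr, std_gpr, z_threshold=3.0):
    """Flag a trajectory as faulty when its average deviation from the
    GPR prediction, in units of the predictive standard deviation,
    exceeds the threshold (the time-averaging choice is an assumption)."""
    z = np.abs(y_meas - mu_gpr) / std_gpr
    return bool(z.mean() > z_threshold)

# Hypothetical GPR prediction (mean and std) and two incoming signals.
t = np.linspace(0.0, 50.0, 250)
mu = np.sin(t)
std = 0.1 * np.ones_like(t)
healthy = mu + 0.05 * np.random.default_rng(5).standard_normal(t.size)
drifting = mu + 1.0      # time-uniform shift well outside the predictive band

print(detect_faulty(healthy, mu, std))   # -> False
print(detect_faulty(drifting, mu, std))  # -> True
```

Lowering the threshold catches smaller faults at the price of false alarms on ordinary noise, which is exactly the trade-off analysed with the recall metric in Section 4.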
In the second strategy, instead, a supervised learning classification algorithm is used to classify sensors a priori: this allows a more thorough training of the GPR algorithm, as the model can now discriminate between healthy and faulty sensors. The selected classifier is the Random Forest (RF) [23], whose output is the class of sensor failure (for the present test case, four different labels are possible: ‘Healthy’, ‘Drift’, ‘Spikes’, ‘Both’). The implementation used here is from the sklearn Python package [30]. It is worth mentioning that, in actual applications with real-world sensors, labelled data may be unavailable; the RF classifier may then be substituted, for example, with a simple binary classifier only discriminating between ‘Healthy’ and ‘Faulty’ sensors, which would be sufficient for the recovery algorithm.
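A toy version of this classification step can be put together with scikit-learn; the trajectory features (mean offset and sample standard deviation) and the synthetic fault signatures are hypothetical choices made only for this sketch:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(6)
labels = ["Healthy", "Drift", "Spikes", "Both"]

def make_sample(label):
    """Summary features [mean, std] of a synthetic sensor trajectory
    exhibiting the given fault signature (illustrative magnitudes)."""
    offset = 1.0 if label in ("Drift", "Both") else 0.0
    scale = 0.5 if label in ("Spikes", "Both") else 0.05
    traj = offset + scale * rng.standard_normal(100)
    return [traj.mean(), traj.std()]

# Build a labelled training set: 50 trajectories per fault class.
X, y = [], []
for label in labels:
    for _ in range(50):
        X.append(make_sample(label))
        y.append(label)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict([make_sample("Drift")]))
```

With real sensors, richer features (spectral content, rate of change) would likely be needed, but the interface of the classifier stays the same.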
4. Numerical results
This work considers the Unprotected Loss of Fuel Flow (ULOFF) accidental transient scenario for the MSFR, using the 2D axisymmetric wedge of the EVOL geometry [24] with an external Hastelloy reflector layer [31] as in Figure 3.
Fig. 3. EVOL geometry. The dark blue external layer represents the Hastelloy solid reflector, whereas the lighter blue domain is the primary loop containing the liquid fuel. The locations of the pump and heat exchanger (green and red, respectively) are also reported.
For sensor placement, the main sensors (used as input to the DDROM) can be located freely within the domain, whereas the auxiliary sensors (aimed at recovering the information from failed sensors) can be located only in the solid reflector region, where the neutron fluence is lower. The simulation also includes a momentum source, representing the primary-loop pump, and a heat sink, representing the primary-to-intermediate-loop heat exchanger. The model is solved within the OpenFOAM environment using the custom-made msfrPimpleFoam coupled solver, developed by Politecnico di Milano [32] and adapted by the authors to include the solid reflector layer [31]: in particular, the modified solver imposes the continuity of the variables at the interface between regions, whereas at the external reflector wall, vacuum and adiabatic conditions are imposed on the neutron fluxes and the temperature field, respectively.
The FOM dataset then includes 275 snapshots, with time domain [0 : 0.2 : 55] seconds. This dataset Ξ is split into three subsets:
- the train set Ξtrain, including 75% of the starting dataset up to snapshot index i = 250 (t = 50 s);
- the test set Ξtest, including the remaining 25% of the starting dataset up to snapshot index i = 250, used for cross-validation;
- the predict set Ξpredict, which includes the last 25 snapshots, from index i = 250 to iM = 275 (that is, from t = 50 s to t = 55 s), to analyse the forecast capabilities of the reconstruction algorithm.
The train-test split was done randomly using the Python package scikit-learn. To ensure training stability and to improve the performance of the algorithms, given that the different fields have quite different orders of magnitude, all FOM snapshots were normalised to the maximum value of the initial condition: given a generic field u,

$$\tilde{u}(x;\,\mu) = \frac{u(x;\,\mu)}{\max_{x \in \Omega} \left| u(x;\,\mu_0) \right|}.$$
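The dataset preparation described above can be sketched as follows; the snapshot matrix is a random placeholder, while the normalisation by the initial condition, the 25-snapshot hold-out for prediction, and the random 75/25 train/test split mirror the subsets listed above:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n_dofs, n_snaps = 500, 275
snapshots = 900.0 + 50.0 * rng.random((n_dofs, n_snaps))  # placeholder fields

# Normalise every snapshot by the maximum of the initial condition.
u_max0 = np.abs(snapshots[:, 0]).max()
snapshots_norm = snapshots / u_max0

# Hold out the last 25 snapshots for forecasting (predict set), then
# split the remaining 250 at random into 75% train / 25% test.
predict = snapshots_norm[:, 250:]
idx = np.arange(250)
idx_train, idx_test = train_test_split(idx, test_size=0.25, random_state=0)
train = snapshots_norm[:, idx_train]
test = snapshots_norm[:, idx_test]
```

Splitting on snapshot indices rather than on the arrays themselves keeps the full-order fields contiguous in memory and makes it easy to trace each reduced coefficient back to its time instant.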
Sensors are described by linear functionals with a Gaussian kernel centred in the sensor position and with point-spread s = 0.025 [14]. Normalised measurements were synthetically generated during the online phase of TR-GEIM and PBDW using the sensors selected by (2), and corrupted by Gaussian white noise with variance σ2 = 10−3.
4.1. Performance of TR-GEIM and PBDW with malfunctioning sensors
The performance of the TR-GEIM and PBDW algorithms in the presence of a malfunctioning sensor, and without implementing any recovery strategy, is analysed first. Assuming that a single sensor fails at time t = 0, its output yk is corrupted according to (6), using κ = 0.1 and ρ = 0.5 to simulate a generic malfunctioning scenario with a fixed drift from the true value and oscillations caused by an increase in noise due, for example, to electrical contacts (Fig. 4).
Fig. 4. Example of a malfunctioning sensor with both a drift from the true value (κ = 0.1) and spikes due to unwanted noise (ρ = 0.5).
To measure the performance of the two DDROM algorithms, the absolute reconstruction error is used. Given the residual field r[u(⋅, t)](x) = |u(⋅, t) − ℛM[u(⋅, t)]|, the error reads:

$$E_k = \frac{1}{|\Xi^*|} \sum_{\mu \in \Xi^*} \big\| r[u(\cdot;\,\mu)] \big\|_{L^2(\Omega)}, \tag{9}$$

where M = 15 is the number of sensors for each field u, chosen during the offline phase by measuring the reducibility of the problem [1], ℛM[u(⋅, t)] is the reconstruction operator (here, of the TR-GEIM), Ek is the average absolute error measured in the L2-norm assuming that the kth sensor has failed, and Ξ* ⊂ 𝒟 is a subset of the parameter space with unseen data (either the test or the predict set).
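A sketch of this error metric for a batch of unseen snapshots; the discrete L2-norm weighted by cell volumes is an assumed implementation detail, and the reconstructed fields below are synthetic:

```python
import numpy as np

def avg_l2_error(u_true, u_rec, cell_volumes):
    """Average absolute reconstruction error in the (discrete) L2-norm
    over a set of unseen snapshots stored as columns."""
    residual = np.abs(u_true - u_rec)   # residual field per snapshot
    l2_per_snapshot = np.sqrt((residual**2 * cell_volumes[:, None]).sum(axis=0))
    return l2_per_snapshot.mean()

rng = np.random.default_rng(8)
u_true = rng.random((300, 20))                         # 20 unseen snapshots
u_rec = u_true + 1e-3 * rng.standard_normal((300, 20)) # near-perfect recon
vol = np.full(300, 1.0 / 300)                          # uniform cell volumes
print(avg_l2_error(u_true, u_rec, vol))
```

With the synthetic perturbation above the metric lands near the imposed 10⁻³ error level, which is the kind of baseline against which the fault-induced degradation is compared.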
Figure 5 shows the reconstruction error for the two algorithms when a single sensor k fails, compared against the reference case with no malfunctioning sensors (which serves as a best-case scenario for the error). As expected, the error increases for both algorithms, meaning that the reconstruction becomes less and less accurate: in particular, PBDW shows a larger deterioration in performance, with the error increasing by three orders of magnitude. Generally, how much the error increases compared to the reference case when sensor k fails depends on the variability of the corresponding measurement: sensors that record less variability influence the performance of the algorithms less, whereas sensors that record more variability influence the reconstruction more. Still, regardless of their impact, a single failed sensor is enough to significantly worsen the reconstruction in the whole domain, as can be seen in Figure 6 for the temperature and, for the sake of brevity, for the TR-GEIM algorithm only, with PBDW showing a similar behaviour (for the scenario κ = 0.1 and ρ = 0.5, and failed sensor #4): the reconstruction is visibly worse in the entire domain, not only in the region near the failed sensor.
Fig. 5. Reconstruction error (computed using Eq. 9) for the most general case of both κ and ρ different from zero. The red line reports the error for the reference case without malfunctioning sensors, serving as a lower bound. The shaded area indicates the uncertainty band of the error (each scenario is repeated 20 times, randomly sampling the measurement noise, to obtain statistical relevance).
Fig. 6. Absolute error in the domain for the reconstruction of the temperature field, assuming malfunctioning sensor #4.
In principle, once the failed sensor has been correctly identified, the algorithm could remove the corrupted measurements from the associated linear system. The effectiveness of this strategy depends strongly on the field under consideration and on the selected DDROM algorithm. Indeed, for TR-GEIM, deleting the kth observation also means deleting the associated kth basis function: since TR-GEIM adopts a greedy approach to select sensors in a hierarchical manner, the loss of information is inversely proportional to the index of the sensor [26]. In the PBDW algorithm, instead, sensors are used to update the background space, and therefore their contribution acts more at a local level, whereas the global spatial behaviour is given by the numerical model: sensors are still selected hierarchically, but in the update space only. For PBDW, issues may arise if more than one sensor fails, such that M < N: when the update space is smaller than the background space, the algorithm may become unstable, losing its good convergence properties [8]. Such behaviour can be seen in Figure 7.
Fig. 7. Absolute error in the domain for the reconstruction of the temperature field when the (single) malfunctioning kth sensor is completely removed from the algorithms’ input.
The hierarchy of sensor importance is clear for TR-GEIM and the temperature, with the error improving as sensors with higher indices are removed; conversely, PBDW is quite robust against a single sensor failure, with only a slight worsening of the error. Interestingly, the temperature and neutron flux errors show a markedly different behaviour with the TR-GEIM algorithm, the temperature being affected by the removal of a much wider range of sensors. Indeed, whereas for the temperature good error behaviour is recovered only after k = 9, for the neutron flux it is recovered starting from k = 2.
To explain this behaviour, it is possible to look at the complexity of the two fields, which is linked to their reducibility: a more complex field is less reducible, meaning that a larger number of basis functions is needed to properly capture its fundamental spatial behaviour. A way to measure how reducible a given field is consists in decomposing its snapshot matrix through the Singular Value Decomposition and plotting the obtained singular values (see Appendix A). Figure 8 reports the normalised first forty singular values λi for the neutron flux and the temperature: the singular values of the former decay much faster than those of the latter, indicating that fewer basis functions are needed to retain all the relevant information on the neutron flux (in particular, most of the information is retained by the first basis function); therefore, for this field, the only critical sensor from the point of view of removal is the very first one.
Fig. 8. SVD singular values for the neutron flux and the temperature: the faster they decay, the more reducible the field, and the fewer basis functions are needed to adequately describe it.
4.2. Faulty sensor recognition
As seen in the previous section, simply removing the malfunctioning sensor may not be enough to recover the good performance of the DDROM algorithms for all fields of interest, especially for TR-GEIM, in which sensors are hierarchical. Thus, in this section the recovery strategy described in Section 3 is implemented and applied to the discussed test case. Given the kth sensor failing at time t = 0, GPR learns the input-output relationship between the external auxiliary measurements yext and the output of the malfunctioning sensor yk: from the mathematical standpoint, the prediction obtained with the GPR fills the missing term in the measurement vector y appearing in the RHS of the TR-GEIM and PBDW linear systems, respectively equations 3 and 5.
Whereas in [21] the index k of the failed sensor was known a priori and was given as a fixed input to the trained GPR to recover the missing information, in this work two different strategies have been implemented to recognise the malfunctioning data. The most straightforward method can be seen as an a-posteriori identification: assuming that a new set of measurements arrives from the system at t = 0 and that the GPR has been thoroughly and correctly trained, it leverages equation 7 to compare each new trajectory with the expected output obtained from the GPR: if the z-score for the kth sensor exceeds a certain threshold, the kth sensor is considered failed, and its trajectory is substituted with the one predicted by the GPR in the measurement vector that serves as input to the DDROM.
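A minimal sketch of this a-posteriori identification, assuming the GPR predictive mean and standard deviation are available for each sensor (the variable names and the averaging of the per-sample z-scores over the trajectory are illustrative choices, not the paper's exact formulation):

```python
import numpy as np

def flag_faulty(y_new, mu_gpr, sigma_gpr, z_th=3.0):
    """A-posteriori identification sketch: compare each incoming sensor
    trajectory with the GPR prediction (mean mu, std sigma) and flag the
    sensors whose average absolute z-score exceeds the threshold z_th.

    y_new     : (M, Nt) new trajectories, one row per sensor
    mu_gpr    : (M, Nt) GPR predictive means
    sigma_gpr : (M, Nt) GPR predictive standard deviations
    """
    z = np.abs(y_new - mu_gpr) / sigma_gpr
    scores = z.mean(axis=1)                # one score per sensor
    return np.where(scores > z_th)[0]      # indices of suspected faulty sensors
```

The flagged trajectories would then be replaced by the corresponding GPR predictions before entering the DDROM measurement vector.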
The performance of this method depends strongly on the selection of the threshold value zth and on the disturbance level. Figure 9 reports (not to scale) the True Positive Rate (recall) for different values of zth and of the disturbance magnitude (computed as κ + ϵ for visualisation purposes). The recall metric is the ratio between the number of true positives (the faulty sensor is correctly identified) and the sum of true positives and false negatives (the faulty sensor is missed); it is the key metric for safety-critical applications, in which missed faults must be minimised. As seen in the figure, the only region in which the recall is not 1 is the bottom-right one, corresponding to low fault magnitudes and a high threshold: for low fault magnitudes, the recognition algorithm tends to confuse the fault with measurement noise, and is thus unable to correctly detect the presence of a malfunction.
Fig. 9. Recall heatmap, reporting the ratio between true positives and the sum of true positives and false negatives. High recall values mean that faults are rarely missed.
To understand why, instead, the bottom-left region (low fault magnitudes and low thresholds) shows high recall values, it is useful to look at the precision metric, defined as the ratio between true positives and the sum of true positives and false positives (healthy sensors flagged as faulty): high precision values mean few false alarms, and this metric should be preferred when false positives are costly from the maintenance point of view. Figure 10 shows the precision heatmap. For low thresholds the precision decreases, meaning that more healthy sensors are wrongly flagged as ‘faulty’: the algorithm then tends to attribute even the measurement noise to a malfunction, so that, regardless of the fault magnitude, all noisy sensors are labelled as faulty, giving rise to many false positives but also correctly flagging the truly faulty sensor.
Fig. 10. Precision heatmap, reporting the ratio between true positives and the sum of true positives and false positives. High precision values mean that there are few false alarms.
Thus, the z-score-based algorithm is able to correctly identify the malfunctioning sensor, provided that the fault magnitude is larger than the measurement noise magnitude: the overall recall (i.e., computed over all threshold values and fault magnitudes) is equal to 0.99, whereas the overall precision is equal to 0.9 (both metrics range between 0 and 1).
4.2.1. Random forest classifier
Despite its good performance, the above algorithm has two drawbacks: it requires an accurately trained GPR model for each sensor, and it cannot recognise the type of malfunction. Therefore, a companion approach based on a Random Forest classifier is implemented. The goal of this algorithm is twofold: 1) classify the malfunction and 2) feed the classified sensor data to the GPR to improve training and testing performance, by providing the GPR with a-priori information on the shape of possible sensor failures. Regarding the latter: if the state of each sensor is not known a priori, one GPR model per sensor must be trained and optimised; if instead the state of each sensor (either healthy or faulty) is known a priori, it is possible to select only the GPR model to be optimised and later used to recover the missing information. Moreover, having information about the type of malfunction can help in the maintenance process.
For training the classifier, a labelled dataset Ξrf ∈ ℝ𝒩t × 2 has been created, with 𝒩t being the size of the dataset (equal to 5000); labels can be 0 (“Healthy”), 1 (“Drift”), 2 (“Spikes”) and 3 (“Both”), and the dataset has been divided into train (80%) and test (20%) sets. Following training, the classification report on the test set is given in Table 1: for each class, this report gives the precision, recall and F1-score metrics, computed as

$$ \text{precision} = \frac{TP}{TP + FP}, \qquad \text{recall} = \frac{TP}{TP + FN}, \qquad F_1 = 2\,\frac{\text{precision}\cdot\text{recall}}{\text{precision} + \text{recall}}, $$
as well as the weighted average metrics. The classifier offers excellent performance on the ‘Healthy’ and ‘Spikes’ classes; for the ‘Drift’ class, almost all drift malfunctions are caught (high recall), but, given the lower precision, some ‘Both’ cases are misclassified as drift. Indeed, the ‘Both’ class is the weakest one, with a recall of 0.83 (meaning that 17% of ‘Both’ malfunctions are either missed or mislabelled): this is likely due to an overlap of the feature space with the ‘Drift’ or ‘Spikes’ classes, given that the dataset has been split in a balanced way, with almost the same number of samples in all four classes.
Table 1. Random Forest classification report on the test set for the different classes, and weighted (over the number of samples in each class) average metrics.
To better break down the performance of the classifier, the confusion matrix is reported in Figure 11.
Fig. 11. Confusion matrix for the test set.
Focusing on the “Both” class, only 195 out of 245 malfunctions were correctly identified, with 30 misclassifications as “Drift” and 20 as “Spikes”, explaining the lower recall. This suggests that the classifier struggles somewhat to distinguish compound malfunctions from individual ones, likely due to feature overlap: even oversampling “Both” malfunctions yielded no significant improvement in the classifier performance. Regardless, since the classifier can correctly distinguish malfunctioning sensors from healthy ones, its performance is considered satisfactory.
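The classification step above can be sketched with scikit-learn. Everything below is a synthetic stand-in for the paper's pipeline: the trajectories, fault parameters, and the two hand-crafted features (a trend indicator for drift, a jump indicator for spikes) are invented for illustration, not taken from the MSFR dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
Nt, L = 2000, 100                 # dataset size reduced here for speed
t = np.linspace(0.0, 1.0, L)

def make_trajectory(label):
    """Synthetic sensor trajectory: healthy baseline plus, depending on the
    label, a drift (1, 3) and/or random spikes (2, 3). Parameters invented."""
    y = np.sin(2 * np.pi * t) + 0.02 * rng.standard_normal(L)
    if label in (1, 3):                                        # 'Drift' or 'Both'
        y += 0.3 * t
    if label in (2, 3):                                        # 'Spikes' or 'Both'
        idx = rng.choice(L, size=5, replace=False)
        y[idx] += rng.choice([-1.0, 1.0], size=5) * 0.8
    return y

def features(y):
    """Two hand-crafted features: a trend indicator (sensitive to drift) and
    the largest jump between consecutive samples (sensitive to spikes)."""
    trend = y[-L // 4:].mean() - y[:L // 4].mean()
    jump = np.max(np.abs(np.diff(y)))
    return [trend, jump]

labels = rng.integers(0, 4, size=Nt)   # 0 Healthy, 1 Drift, 2 Spikes, 3 Both
X = np.array([features(make_trajectory(lab)) for lab in labels])

X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
print(classification_report(y_te, clf.predict(X_te)))
```

With these two features the residual misclassifications concentrate on the compound class, since its trend feature partially overlaps with the spike-induced trend shifts, loosely mirroring the feature-overlap effect observed for the “Both” class.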
4.3. GPR-based sensor recovery
In the following, two different recovery strategies have been considered:
-
The failed sensor is removed from the system and the corresponding reduced coefficient is reconstructed using the others: the ML model recovers the failed coefficient from the estimate of the non-failed ones.
-
The GPR algorithm directly reconstructs the missing measurements by using the information from the auxiliary sensors: the linear system for both TR-GEIM and PBDW retains the starting reduced dimension.
It is worth mentioning that, for the purpose of faulty sensor recovery, the information on the type of failure is not used: rather, the output of both faulty sensor recognition techniques is the index of the faulty sensor, to be fed to the recovery procedure.
4.3.1. Failed sensor removal and coefficient recovery
In the first proposed recovery approach, the kth failed sensor is removed from the measurement input vector, and GPR is used to recover its reduced coefficient βk starting from the training dataset. This strategy is tested only for the TR-GEIM algorithm, since it is the one most affected by the removal of the failed sensor4, as seen in Section 4.1: the input vector will therefore have dimension M − 1, and the missing coefficient is computed a posteriori using GPR. Other possibilities could be investigated, such as using the magic functions and magic sensors as a basis for the PBDW and then removing the failed one, since this algorithm is less sensitive to the hierarchy of sensors; moreover, regularisation strategies and variants of EIM-like methods [33] can be adopted. These will be the matter of future analysis.
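A minimal sketch of this coefficient-recovery step with scikit-learn's Gaussian process regressor follows. The coefficient matrix is synthetic (a smooth parametric family standing in for the TR-GEIM coefficients), and the kernel choice and variable names are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Synthetic reduced coefficients: one row per training snapshot, one column
# per sensor/basis function (stand-in for the TR-GEIM betas).
rng = np.random.default_rng(1)
n_snap, M, k = 200, 5, 2
mu = rng.uniform(0.0, 1.0, n_snap)                     # scalar parameter stand-in
beta = np.column_stack([np.cos((j + 1) * mu) for j in range(M)])

X_train = np.delete(beta, k, axis=1)                   # coefficients of the M-1 healthy sensors
y_train = beta[:, k]                                   # coefficient of the failed sensor
gpr = GaussianProcessRegressor(kernel=RBF(), alpha=1e-6,
                               normalize_y=True).fit(X_train, y_train)

# Recover beta_k for a few snapshots from the remaining coefficients
beta_k_hat = gpr.predict(np.delete(beta[:5], k, axis=1))
```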
The absolute reconstruction error under this recovery approach is shown in Figure 12; the reconstructed coefficients are shown in Figure 13; finally, the residual field for temperature and failed sensor k = 2 in the three cases (perfect sensors, removed sensor, recovered coefficients) is shown in Figure 14. Globally, the reconstruction is quite good, albeit somewhat noisy, except for coefficients with a behaviour similar to β9Φ, which show small oscillations around a constant value. Still, given the hierarchical nature of TR-GEIM, these high-index coefficients have significantly less impact on the performance. Regarding the absolute error, there is an improvement in performance for k > 1, especially noticeable for temperature: interestingly, this improvement is not monotonic, especially for k = 3 and k = 4, and further studies on this peculiar behaviour are currently in progress. Regardless, this strategy fails when k = 1, hence the need for a more robust recovery strategy.
Fig. 12. Absolute reconstruction error for TR-GEIM when the failed measurement is removed from the input vector and the associated missing coefficients are estimated through GPR.
Fig. 13. Reconstructed coefficients β for TR-GEIM for four different failed sensors, both for temperature and for the neutron flux. Reconstruction is relatively good albeit somewhat noisy, except for constant or almost-constant coefficients; however, since sensors in TR-GEIM are hierarchical, high-order oscillating coefficients have less effect on the performance of the algorithm.
Fig. 14. Residual field for temperature in the three scenarios (perfect sensors, removed sensor and recovered coefficients), for sensor k = 2 at the end of the transient.
4.3.2. Failed sensor recovery
The strategy proposed in the previous section, in which the information from the malfunctioning sensor is completely removed from the measurement input vector and its reduced coefficient is recovered using GPR, shows good but not great performance, especially for TR-GEIM when high-index sensors fail. Indeed, by removing an input measurement, the associated basis function is removed as well, thus impoverishing the reduced space. This unwanted behaviour is especially significant when k = 1, that is, when the first sensor fails: for TR-GEIM, this is the sensor carrying most of the information of the starting dataset, and, as seen in Figure 12, its loss cannot be compensated by recovering the associated reduced coefficient through GPR. PBDW, on the other hand, can weigh the trust in either the model or the measurements through the hyperparameter χ; hence, when sensors fail, the model can be considered more reliable and produce a state estimation based on the background information [25, 27].
Therefore, a second recovery strategy is proposed: the kth failed sensor is not removed from the measurement input vector but is instead substituted with the output of the GPR, obtained using the auxiliary sensors located in ‘safer’ regions of the domain (in the case of the MSFR, the solid reflector layer). The reduced linear system then keeps its starting dimension M, and the basis function associated with the failed sensor is retained. The online phase then proceeds as in the case of perfect sensors.
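This substitution step can be sketched as follows. The relation between the auxiliary and in-core measurements is a synthetic linear map standing in for the MSFR data, and all names are illustrative, not the pyforce API.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def recover_measurement(y_meas, k, y_ext, gpr_k):
    """Second recovery strategy (sketch): the linear system keeps its size M;
    the k-th entry of the measurement vector is replaced by the GPR output
    driven by the auxiliary (reflector) sensors y_ext."""
    y_rec = y_meas.copy()
    y_rec[k] = gpr_k.predict(y_ext.reshape(1, -1))[0]
    return y_rec

# Offline: train a GPR mapping auxiliary measurements to the k-th in-core one
# (a synthetic linear relation stands in for the actual MSFR response).
rng = np.random.default_rng(2)
Y_ext = rng.uniform(0.0, 1.0, (300, 2))
y_k = Y_ext @ np.array([1.0, 2.0])                     # hypothetical in-core response
gpr_k = GaussianProcessRegressor(kernel=RBF(), alpha=1e-6,
                                 normalize_y=True).fit(Y_ext, y_k)

# Online: sensor k = 1 has failed; substitute its corrupted reading (99.0)
y_meas = np.array([0.3, 99.0, 0.7])
y_rec = recover_measurement(y_meas, 1, np.array([0.5, 0.25]), gpr_k)
```

The healthy entries are untouched, so the DDROM online phase can proceed exactly as with perfect sensors.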
Figure 15 shows the reconstructed measurement for sensor k = 1, for both temperature and neutron flux: clearly, an adequately trained GPR can correctly recover the missing measurements, even in the presence of drifts and unphysical spikes, meaning that it is possible to substitute the GPR output into the measurement input vector and then proceed with the ‘standard’ online phase of the DDROM. The good performance of this approach compared to the previous one can be seen in Figure 16, which shows the average absolute error, computed as:
given Ek defined in equation 9. Only temperature is considered for brevity, as it is the field most influenced by malfunctioning sensors. Four different scenarios, considering a time instant outside the training dataset, have been considered: 1) the perfect case, in which all sensors operate correctly; 2) malfunctioning sensors with κ = 0.1 and ρ = 3; 3) the removal scenario, in which the measurement ykT associated with the failed sensor is removed from the measurement input vector; 4) the GPR-aided case, in which the input-output map learned by the GPR during the training phase through the auxiliary external sensors is used to recover the output of the failed sensor, again for κ = 0.1 and ρ = 3. The difference in the removal strategy between TR-GEIM and PBDW is evident: generally speaking, algorithms that adopt greedy procedures to select the optimal sensor locations are sensitive to malfunctioning sensors, especially high-index ones; thus, the improvement in performance when the offending measurement is removed is minimal.
Fig. 15. Reconstructed measurements for sensor k = 1 (temperature and neutron flux) using GPR.
Fig. 16. Comparison of the average absolute error using different algorithms, for the following scenarios: perfect sensors, failed sensor, removed failed sensor, recovered failed sensor.
Conversely, by recovering the missing information through GPR, TR-GEIM shows a significant improvement in performance, with the error becoming again comparable to the perfect case; the fact that the GPR-aided TR-GEIM performs even better than the truth case is due to the GPR outputting smoother measurements than the original ones, thus lowering the noise introduced into the algorithm. Since the auxiliary sensors are constrained to stay on the boundary, the value of λ could be further tuned to achieve more accurate results; however, for the sake of simplicity and brevity, it has been set equal to the variance of the noise5.
Even locally, the improvement in performance through the recovery of the failed sensor is noticeable, as seen in Figure 17 for the worst scenario of k = 1 and for a time instant outside the training dataset. By recovering the missing measurements, both algorithms can correctly reconstruct the field of interest, even in forecast mode, with the performance returning comparable to that of the perfect case.
Fig. 17. Contour plots of the temperature field T for the FOM, TR-GEIM and PBDW (the latter two both in perfect and GPR-aided conditions), for failed sensor k = 1 at time t* = 55 s ∈Ξpredict.
5. Conclusions
This paper has presented two Data-Driven Reduced Order Modelling methods, the Generalised Empirical Interpolation Method and the Parameterised-Background Data-Weak formulation, for combining high-fidelity numerical simulations with measurements collected by physical sensors in the realistic scenario of sensor malfunction. Two different classes of malfunctions have been considered, namely a drift from the true value and non-physical spikes, under the assumptions that only one sensor can fail at a time and that the state of each sensor is known a priori (these assumptions will be relaxed in future works). As a test case, this paper uses the Molten Salt Fast Reactor and, in particular, an Unprotected Loss of Fuel Flow accidental scenario. The test case is interesting because it features a liquid fuel and a fast neutron spectrum: even in the optimal case in which in-core sensing of the main quantities of interest (in this work, the neutron flux and the temperature) were possible, in-core sensors would be subjected to high neutron fluences and very hot molten salt, hence their failure probability would be higher; conversely, sensors located in the solid reflector layer are more robust, and can thus act as ‘redundant’ sensors.
Indeed, even a single drifted measurement coming from a malfunctioning sensor significantly worsens the capabilities of the two DDROM techniques, for all sensors, resulting in unbounded errors and unphysical results. The straightforward removal strategy, that is, removing the failed sensor from the measurement input vector, gives mixed results: whereas it seems viable for PBDW, which selects sensors in a non-greedy manner, for TR-GEIM removing an input also means removing the associated basis function and thus impoverishing the reduced space. Especially for temperature and high-index sensors, since TR-GEIM greedily selects the optimal sensor positioning, the good performance of the algorithm cannot be fully recovered this way. A study of the performance of DDROM-like techniques accounting for the realistic possibility of sensor failure is therefore recommended, regardless of the selected algorithm.
To overcome this limitation of the DDROM framework, a Machine Learning technique, Gaussian Process Regression, has been used to try to recover good performance. The auxiliary sensors located in the ‘safer’ region of the solid reflector have been used alongside the in-core sensor to learn the existing input-output relationship between them. This map is used to 1) either recover the reduced coefficients of the removed sensor or 2) directly reconstruct the missing measurement. Between the two, the second strategy has been proved to be more successful even for the worst-case scenario of failed sensor k = 1 and temperature reconstruction with TR-GEIM, allowing the algorithm to correctly forecast the field of interest even for a time step outside the training dataset.
To correctly identify the failed sensor, whose index serves as input to the trained GPR for the recovery step, two different strategies have been adopted. The first one, based on the evaluation of the z-score of the incoming sensor trajectory and its comparison with the respective GPR prediction, showed good performance as long as the fault magnitude could be distinguished from the measurement noise: a low fault magnitude combined with a high threshold for discriminating between healthy and malfunctioning sensors gave rise to false negatives, in which faulty sensors were not labelled as such; conversely, for low thresholds the number of false positives is high, meaning that the algorithm tends to treat all kinds of deviations, including measurement noise, as malfunctions. Regardless, the overall recall score was 0.99, with a precision of 0.9.
To further study the classification problem, a Random Forest classifier was also implemented, to discriminate between the types of fault and to give a-priori information to the GPR algorithm during the training phase. Performance was quite good also for the RF classifier, with an overall recall of 0.95. In terms of correct classification, the class with the worst performance was the “Both” class (recall 0.83), due to misclassification of “Both” malfunctions into either the “Drift” or the “Spikes” class: given the balance of the test set, this indicates a feature overlap, which may call for feature engineering on the dataset to further improve the performance of the classifier.
This work and its promising results are the first step of a methodological pathway towards engineering applications in which the algorithm itself can recognise the state of each sensor over time and retrieve the information from each failed sensor when needed, with the goal of developing autonomous systems from the point of view of monitoring, control and diagnosis. Future studies in this direction are currently underway, including the use of unsupervised classifiers, accounting for the actual sensor failure probability, simulating the lifetime of the sensor, and accounting for the possibility of concurrent malfunctions.
Funding
This research received no external funding.
Conflicts of interest
The authors have nothing to disclose.
Data availability statement
This work adopts the pyforce package; the extended code for failed sensors will be made available upon completion of the review process at https://github.com/ERMETE-Lab/ROSE-pyforce.git
Author contribution statement
Conceptualization, S.R., C.I. and A.C.; Methodology, S.R.; Software, S.R. and C.I.; Formal Analysis, S.R.; Investigation, S.R. and C.I.; Data Curation, S.R.; Writing – Original Draft Preparation, S.R. and C.I.; Writing – Review & Editing, A.C. and E.Z.; Visualization, S.R.; Supervision, A.C.
Glossary
Acronyms
DDROM: Data-Driven Reduced Order Modelling
GEIM: Generalised Empirical Interpolation Method
GPR: Gaussian Process Regression
MSFR: Molten Salt Fast Reactor
PBDW: Parameterised-Background Data-Weak
POD: Proper Orthogonal Decomposition
ULOFF: Unprotected Loss of Fuel Flow
Latin Symbols
𝒟: Subset of the parameter space
E: Average absolute error in L2-norm
IM[u]: TR-GEIM interpolant for the field u
p: Dimension of the parameter space
𝒫: log-marginal likelihood loss function
ℛM: Reconstruction operator for TR-GEIM
𝕋: TR-GEIM regularisation matrix
uFOM(x,μ): Full order solution (snapshot)
ûDDROM(x,μ*): Output of the ROM
y(utrue(x,μ*)): Measurement vector
Greek Symbols
α(μn): PBDW model coefficients
β(μ): Modal coefficients for TR-GEIM
∊ ~ 𝒩(0, σ2): Uncorrelated Gaussian random noise
λ: TR-GEIM regularisation parameter
ηM(x,μ): PBDW measurement knowledge
θ: PBDW update space basis function
ξ(x): PBDW model basis functions
Ξ*: Split dataset (train, test, predict)
ψn(x): Generic basis functions
References
- T. Lassila, A. Manzoni, A. Quarteroni, G. Rozza, Model Order Reduction in Fluid Dynamics: Challenges and Perspectives (Springer International Publishing, 2014) [Google Scholar]
- G. Rozza et al., Model Order Reduction: Volume 2: Snapshot-Based Methods and Algorithms (De Gruyter, 2020). https://doi.org/10.1515/9783110671490 [Google Scholar]
- A. Quarteroni, A. Manzoni, F. Negri, Reduced Basis Methods for Partial Differential Equations: An Introduction, 1st edn., (UNITEXT, Springer Cham, 2015) [Google Scholar]
- S.L. Brunton, J.N. Kutz, Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control (Cambridge University Press, 2022) [Google Scholar]
- A. Carrassi, M. Bocquet, L. Bertino, G. Evensen, Data assimilation in the geosciences: An overview of methods, issues, and perspectives, WIREs Climate Change 9, e535 (2018). https://doi.org/10.1002/wcc.535 [CrossRef] [Google Scholar]
- S. Riva, C. Introini, A. Cammi, Applied Mathematical Modelling Multi-physics model bias correction with data-driven reduced order techniques: Application to nuclear case studies, Appl. Math. Modell. 135, 243 (2024). https://doi.org/10.1016/j.apm.2024.06.040 [Google Scholar]
- N. Baker et al., Workshop Report on Basic Research Needs for Scientific Machine Learning: Core Technologies for Artificial Intelligence, Tech. rep., USDOE Office of Science (SC), Washington, D.C. (United States), 2019. https://doi.org/10.2172/1478744 [Google Scholar]
- Y. Maday, A.T. Patera, J.D. Penn, M. Yano, A parameterized-background data-weak approach to variational data assimilation: formulation, analysis, and application to acoustics, Int. J. Numer. Methods Eng. 102, 933 (2015). https://doi.org/10.1002/nme.4747 [Google Scholar]
- Y. Maday, O. Mula, in A Generalized Empirical Interpolation Method: Application of Reduced Basis Techniques to Data Assimilation (Springer, 2013), pp. 221–235 [Google Scholar]
- H. Gong, J.P. Argaud, B. Bouriquet, Y. Maday, The empirical interpolation method applied to the neutron diffusion equations with parameter dependence, in Physics of Reactors 2016, PHYSOR 2016: Unifying Theory and Experiments in the 21st Century, 1 (May) (2016), pp. 54–63 [Google Scholar]
- J.-P. Argaud, B. Bouriquet, F. de Caso, H. Gong, Y. Maday, O. Mula, Sensor placement in nuclear reactors based on the generalized empirical interpolation method, J. Comput. Phys. 363, 354 (2018). https://doi.org/10.1016/j.jcp.2018.02.050 [Google Scholar]
- H. Gong, Data assimilation with reduced basis and noisy measurement: Applications to nuclear reactor cores, Ph.D. thesis, Sorbonne Université, 2018 [Google Scholar]
- C. Introini, S. Riva, S. Lorenzi, S. Cavalleri, A. Cammi, Non-intrusive system state reconstruction from indirect measurements: A novel approach based on hybrid data assimilation methods, Ann. Nucl. Energy 182, 109538 (2023). https://doi.org/10.1016/j.anucene.2022.109538 [Google Scholar]
- A. Cammi, S. Riva, C. Introini, L. Loi, E. Padovani, Data-driven model order reduction for sensor positioning and indirect reconstruction with noisy data: Application to a circulating fuel reactor, Nucl. Eng. Des. 421, 113105 (2024). https://doi.org/10.1016/j.nucengdes.2024.113105 [Google Scholar]
- H. Gong, Z. Chen, Q. Li, Generalized empirical interpolation method with H1 regularization: application to nuclear reactor physics, Front. Energy Res. 9, 804018 (2022). https://doi.org/10.3389/fenrg.2021.804018 [Google Scholar]
- C. Introini, S. Cavalleri, S. Lorenzi, S. Riva, A. Cammi, Stabilization of Generalized Empirical Interpolation Method (GEIM) in presence of noise: A novel approach based on Tikhonov regularization, Comput. Methods Appl. Mech. Eng. 404, 115773 (2023). https://doi.org/10.1016/j.cma.2022.115773 [Google Scholar]
- F. Cannarile, P. Baraldi, P. Colombo, E. Zio, A novel method for sensor data validation based on the analysis of wavelet transform scalograms, Int. J. Progn. Health Manage. 9 (2018). https://doi.org/10.36001/ijphm.2018.v9i1.2670 [Google Scholar]
- V. Rao, A. Sandu, M. Ng, E.D. Nino-Ruiz, Robust data assimilation using l1 and huber norms, SIAM J. Sci. Comput. 39, B548 (2017). https://doi.org/10.1137/15M1045910 [Google Scholar]
- B. Peherstorfer, K. Willcox, Dynamic data-driven model reduction: adapting reduced models from incomplete data, Adv. Model. Simul. Eng. Sci. 3, 11 (2016). https://doi.org/10.1186/s40323-016-0064-x [Google Scholar]
- A. Hossein Abolmasoumi, M. Netto, L. Mili, Robust dynamic mode decomposition, IEEE Access 10, 65473 (2022). https://doi.org/10.1109/ACCESS.2022.3183760 [Google Scholar]
- S. Riva, C. Introini, E. Zio, A. Cammi, Impact of malfunctioning sensors on data-driven reduced order modelling: Application to molten salt reactors, EPJ Web Conf. 302, 17003 (2024). https://doi.org/10.1051/epjconf/202430217003 [Google Scholar]
- C.E. Rasmussen, C.K.I. Williams, Gaussian Processes for Machine Learning (MIT Press, 2006) [Google Scholar]
- T.K. Ho, The random subspace method for constructing decision forests, IEEE Trans. Pattern Anal. Mach. Intell. 20, 832 (1998). https://doi.org/10.1109/34.709601 [Google Scholar]
- M. Brovchenko et al., Design-related studies for the preliminary safety assessment of the molten salt fast reactor, Nucl. Sci. Eng. 175, 329 (2013). https://doi.org/10.13182/NSE12-70 [Google Scholar]
- W. Haik, Y. Maday, L. Chamoin, A real-time variational data assimilation method with data-driven model enrichment for time-dependent problems, Comput. Methods Appl. Mech. Eng. 405, 115868 (2023). https://doi.org/10.1016/j.cma.2022.115868 [Google Scholar]
- Y. Maday, O. Mula, G. Turinici, Convergence analysis of the generalized empirical interpolation method, SIAM J. Numer. Anal. 54, 1713 (2016). https://doi.org/10.1137/140978843 [Google Scholar]
- T. Taddei, Model order reduction methods for data assimilation; state estimation and structural health monitoring, Ph.D. thesis, MIT, 2016. https://doi.org/10.13140/RG.2.2.16001.45928 [Google Scholar]
- Y. Maday, T. Taddei, Adaptive PBDW approach to state estimation: Noisy observations; user-defined update spaces, SIAM J. Sci. Comput. 41, B669 (2019). https://doi.org/10.1137/18M116544X [Google Scholar]
- O.A. Martin, R. Kumar, J. Lao, Bayesian Modeling and Computation in Python (CRC Press, Boca Raton, 2021) [Google Scholar]
- F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay, Scikit-learn: Machine learning in Python, J. Mach. Learn. Res. 12, 2825 (2011) [MathSciNet] [Google Scholar]
- S. Riva, S. Deanesi, C. Introini, S. Lorenzi, A. Cammi, Neutron flux reconstruction from out-core sparse measurements using data-driven reduced order modelling, in Proceedings of the International Conference on Physics of Reactors (PHYSOR24), 2024, pp. 1632–1641. [Google Scholar]
- M. Aufiero, Development of Advanced Simulation Tools for Circulating Fuel Nuclear Reactors, Ph.D. thesis, Politecnico di Milano, 2014. https://doi.org/10.13140/2.1.4455.1044 [Google Scholar]
- F. Casenave, A. Ern, T. Lelièvre, Variants of the empirical interpolation method: Symmetric formulation, choice of norms and rectangular extension, Appl. Math. Lett. 56, 23 (2016). https://doi.org/10.1016/j.aml.2015.11.010 [Google Scholar]
- S.L. Brunton, J.N. Kutz, Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control, 2nd edn. (Cambridge University Press, USA, 2022) [Google Scholar]
Appendix A Problem reducibility and singular value decomposition
When considering a reduction problem, it is important to assess how well the high-fidelity dataset can be approximated by a reduced subspace: an estimate of the optimal size of this reduced subspace can be found through the Singular Value Decomposition (SVD) [3]. Given any complex-valued matrix 𝕏 ∈ ℂ𝒩h × 𝒩s, the SVD provides a unique decomposition:

$$ \mathbb{X} = \mathbb{U}\,\Sigma\,\mathbb{V}^\star, $$

in which 𝕌 = {φi} ∈ ℂ𝒩h × 𝒩h contains the left singular vectors of 𝕏, Σ = diag(σ1, …, σ𝒩s, 0, …, 0) ∈ ℝ𝒩h × 𝒩s contains the singular values, and 𝕍⋆ = {υi⋆} ∈ ℂ𝒩s × 𝒩s contains the right singular vectors of 𝕏. If 𝒩h > 𝒩s, the diagonal matrix Σ has at most 𝒩s non-zero elements, so that the economy SVD can be retrieved [34]:

$$ \mathbb{X} = \hat{\mathbb{U}}\,\hat{\Sigma}\,\mathbb{V}^\star, \qquad \hat{\mathbb{U}} \in \mathbb{C}^{\mathcal{N}_h \times \mathcal{N}_s}, \quad \hat{\Sigma} \in \mathbb{R}^{\mathcal{N}_s \times \mathcal{N}_s}. $$

In practice, the SVD provides a hierarchy of low-rank approximations, since the singular values are ranked from the most to the least important: a simple and interpretable way of approximating 𝕏, and hence the associated snapshots, is therefore

$$ \mathbb{X} = \sum_{i=1}^{\mathcal{N}_s} \sigma_i\, \varphi_i\, \upsilon_i^\star. $$

Since each subsequent term is less important in capturing the dominant features of 𝕏, a good approximation can be obtained by truncating at some rank r ≪ 𝒩h:

$$ \mathbb{X} \approx \mathbb{X}_r = \sum_{i=1}^{r} \sigma_i\, \varphi_i\, \upsilon_i^\star, $$

allowing for the discovery of dominant low-dimensional patterns in the data matrix 𝕏. The truncated SVD basis {φi}i = 1, …, r then provides the basis functions that span the reduced space. Moreover, the SVD also comes with an important property related to the energy captured by the truncation: the partial sum up to k ≤ r captures the largest possible fraction of the energy of the matrix 𝕏, and the truncated SVD basis yields the optimal rank-r approximation (Eckart-Young theorem). Finally, some insight can be given on what the columns of 𝕌 (i.e., the modes or basis functions) and the rows of 𝕍⋆ represent: the former describe the most dominant/energetic spatial features of the data, whereas the latter embed the parametric and temporal dependences (sometimes referred to as latent dynamics).
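The truncation and energy criterion discussed above can be sketched in a few lines of NumPy (function names are illustrative):

```python
import numpy as np

def truncated_svd(X, r):
    """Economy SVD followed by rank-r truncation:
    X_r = sum_{i=1}^{r} sigma_i * phi_i * v_i^*  (optimal by Eckart-Young)."""
    U, s, Vh = np.linalg.svd(X, full_matrices=False)   # economy SVD
    X_r = U[:, :r] @ np.diag(s[:r]) @ Vh[:r, :]
    return X_r, s

def retained_energy(s, r):
    """Fraction of the squared singular-value 'energy' kept at rank r."""
    return np.sum(s[:r] ** 2) / np.sum(s ** 2)
```

A useful sanity check is that the Frobenius norm of the truncation residual equals the root sum of squares of the discarded singular values, which is exactly the optimality property stated above.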
Cite this article as: Stefano Riva, Carolina Introini, Enrico Zio, Antonio Cammi. Data-driven reduced order modelling with malfunctioning sensors recovery applied to the molten salt reactor case, EPJ Nuclear Sci. Technol. 11, 55 (2025). https://doi.org/10.1051/epjn/2025054
All Tables

Random Forest classification report on the test set for the different classes, and weighted (over the number of samples in each class) average metrics.

All Figures

Fig. 1. Schematic of the DDROM framework, highlighting the offline-online decomposition: from a series of snapshots, the ROM algorithm extracts the fundamental modes, performing dimensionality reduction and building the DDROM model; in the online phase, measurements are given as input to the reduced model to obtain an updated state estimation. Taken from [21].

Fig. 2. Recovery of the missing information of a failed sensor y_k (in black on the left) from external sensors y_ext (in blue), for the temperature field T. Taken from [21].

Fig. 3. EVOL geometry. The dark blue external layer represents the Hastelloy solid reflector, whereas the lighter blue domain is the primary loop containing the liquid fuel. The locations of the pump and heat exchanger (green and red, respectively) are also reported.

Fig. 4. Example of malfunctioning sensors with both a drift from the true value (κ = 0.1) and spikes due to unwanted noise (ρ = 0.5).

Fig. 5. Reconstruction error (computed using Eq. 9) for the most general case of both κ and ρ different from zero. The red line reports the error for the reference case without malfunctioning sensors, serving as a lower bound. The shaded area indicates the uncertainty band of the error (each scenario is repeated 20 times, randomly sampling the measurement noise, to obtain statistical relevance).

Fig. 6. Absolute error in the domain for the reconstruction of the temperature field, assuming malfunctioning sensor #4.

Fig. 7. Absolute error in the domain for the reconstruction of the temperature field when the (single) kth malfunctioning sensor is completely removed from the algorithms' input.

Fig. 8. SVD singular values for the neutron flux and the temperature: the faster they decay, the more reducible the field is, and the fewer basis functions are needed to adequately describe it.

Fig. 9. Recall heatmap, reporting the ratio between the true positives and the sum of true positives and false negatives. High recall values mean that faults are rarely missed.

Fig. 10. Precision heatmap, reporting the ratio between the true positives and the sum of true positives and false positives. High precision values mean that there are few false alarms.

Fig. 11. Confusion matrix for the test set.

Fig. 12. Absolute reconstruction error for TR-GEIM when the failed measurement is removed from the input vector and the associated missing coefficients are estimated through GPR.

Fig. 13. Reconstructed coefficients β for TR-GEIM for four different failed sensors, both for the temperature and for the neutron flux. The reconstruction is relatively good albeit somewhat noisy, except for constant or almost-constant coefficients; however, since sensors in TR-GEIM are hierarchical, high-order oscillating coefficients have less effect on the performance of the algorithm.

Fig. 14. Residual field for the temperature in the three scenarios (perfect sensors, removed sensor and recovered coefficients), for sensor k = 2 at the end of the transient.

Fig. 15. Reconstructed measurements for sensor k = 1 (temperature and neutron flux) using GPR.

Fig. 16. Comparison of the average absolute error using different algorithms, for the following scenarios: perfect sensors, failed sensor, removed failed sensor, recovered failed sensor.

Fig. 17. Contour plots of the temperature field T for the FOM, TR-GEIM and PBDW (the latter two both in perfect and GPR-aided conditions), for failed sensor k = 1 at time t* = 55 s ∈ Ξ_predict.
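The recall and precision metrics reported in the heatmaps of Figures 9 and 10 follow the standard definitions; a minimal sketch, where the true/predicted fault labels are illustrative assumptions rather than the paper's Random Forest outputs:

```python
# Recall = TP / (TP + FN): high recall means faults are rarely missed.
# Precision = TP / (TP + FP): high precision means few false alarms.
# The label vectors below are made up for illustration.
import numpy as np

y_true = np.array([1, 1, 1, 0, 0, 1, 0, 1])  # 1 = sensor actually faulty
y_pred = np.array([1, 0, 1, 0, 1, 1, 0, 1])  # classifier prediction

tp = np.sum((y_pred == 1) & (y_true == 1))  # faults correctly flagged
fp = np.sum((y_pred == 1) & (y_true == 0))  # false alarms
fn = np.sum((y_pred == 0) & (y_true == 1))  # missed faults

recall = tp / (tp + fn)
precision = tp / (tp + fp)
print(recall, precision)  # → 0.8 0.8
```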


 =&\sum _{m=1}^M \beta _m(\boldsymbol{\mu })\cdot q_m(\mathbf x ) \quad \\ \nonumber&\text{ s.t.} \quad \{\upsilon _m(u)=\upsilon _m(\mathcal{I} _M)\}_{m=1}^M, \end{aligned} $$](/articles/epjn/full_html/2025/01/epjn20250035/epjn20250035-eq2.gif)








![$$ \begin{aligned} u(\mathbf x ,t) \leftarrow u(\mathbf x ,t)\cdot \left[\max _\mathbf{x \in \Omega } u(\mathbf x ,0)\right]^{-1}. \end{aligned} $$](/articles/epjn/full_html/2025/01/epjn20250035/epjn20250035-eq13.gif)

![$$ \begin{aligned}&E^k[u] = \frac{1}{\dim (\Xi ^*)}\sum _{t\in \Xi ^*}\vert \vert r(\mathbf x ,t)\vert \vert _{L^{2}}(\Omega ), \end{aligned} $$](/articles/epjn/full_html/2025/01/epjn20250035/epjn20250035-eq14.gif)











![$$ \begin{aligned} E = \frac{1}{M}\sum _{k=1}^M E^k[T], \end{aligned} $$](/articles/epjn/full_html/2025/01/epjn20250035/epjn20250035-eq16.gif)






