Facebook Prophet + SHAP - Explainable Predictive Maintenance for Regular Multivariate Time Series
This regression model uses historical data measured by machine sensors to forecast future usage and detect possible future faults in the machine itself. Explainability metrics target sensor groups and are powered by the SHAP library.
Sensors inside a production machine record usage and activity data. Each sensor produces a time series of measurements, and a regression model is trained on each series. The model learns the patterns and periodicities in the historical data and simulates future behavior by extrapolating the observed trends and seasonalities. This makes it possible to plan maintenance shortly before a machine component breaks down or causes a machine fault, avoiding plant stoppages.
Prophet estimates its parameters during fitting, when it computes the regression coefficients.
A linear model has been chosen for growth.
Seasonality effects have been set to be additive.
For certain time series, the changepoint prior scale has been raised above the default of 0.05.
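The settings above map directly onto Prophet constructor arguments. The following is a minimal sketch, assuming the `prophet` package is installed; the series name `spindle_temp` and the raised value 0.5 are hypothetical placeholders, not values taken from the original setup.

```python
# Shared settings described above: linear growth, additive seasonality.
BASE_KWARGS = {
    "growth": "linear",
    "seasonality_mode": "additive",
    "changepoint_prior_scale": 0.05,  # Prophet's default
}

# Hypothetical per-series overrides for series that need a more flexible trend.
OVERRIDES = {
    "spindle_temp": {"changepoint_prior_scale": 0.5},  # illustrative value only
}


def prophet_kwargs(series_name):
    """Return the constructor arguments for one sensor series."""
    return {**BASE_KWARGS, **OVERRIDES.get(series_name, {})}


def fit_series(series_name, df):
    """Fit one Prophet model; df needs a 'ds' (timestamp) and a 'y' (value) column."""
    # Imported lazily so the configuration helpers work without prophet installed.
    from prophet import Prophet

    model = Prophet(**prophet_kwargs(series_name))
    return model.fit(df)
```

Keeping the overrides in a separate mapping makes it easy to see which series deviate from the shared configuration.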
Prophet's built-in diagnostics compute MSE, RMSE, MAE, MAPE and MDAPE.
To compare Prophet results with those of GRU models, the cumulative error proved to be an empirically valid metric. It was used where visual inspection was not sufficient to determine which model was clearly the best.
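For reference, these five metrics have standard definitions. The sketch below computes them in plain Python from hypothetical observed and predicted values; it is not Prophet's own implementation, which evaluates them on cross-validation output.

```python
import math
import statistics


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, MAPE and MDAPE for two equal-length sequences."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    # Percentage errors are undefined where the observed value is zero, so skip those points.
    abs_pct = [abs(e / t) for e, t in zip(errors, y_true) if t != 0]
    mse = statistics.fmean([e * e for e in errors])
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": statistics.fmean(abs_errors),
        "mape": statistics.fmean(abs_pct),
        "mdape": statistics.median(abs_pct),
    }


# Hypothetical sensor readings and forecasts.
metrics = regression_metrics([100, 200, 400], [110, 180, 400])
```

MAPE and MDAPE are returned as fractions here (0.1 means 10%).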
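The text does not spell out how the cumulative error is computed. One plausible reading, sketched below, is the running sum of absolute errors over the forecast horizon, so that the model whose curve stays lower accumulates less error. The function name and the sample values are hypothetical.

```python
from itertools import accumulate


def cumulative_abs_error(y_true, y_pred):
    """Running sum of absolute errors (an assumed definition of cumulative error)."""
    return list(accumulate(abs(t - p) for t, p in zip(y_true, y_pred)))


# Hypothetical observations and forecasts from the two models being compared.
observed = [10, 12, 11, 13]
prophet_forecast = [10, 11, 12, 13]
gru_forecast = [9, 12, 13, 12]

prophet_curve = cumulative_abs_error(observed, prophet_forecast)
gru_curve = cumulative_abs_error(observed, gru_forecast)

# The last element of each curve is the total error over the whole window.
best = "Prophet" if prophet_curve[-1] < gru_curve[-1] else "GRU"
```

Comparing the full curves, and not only the final totals, also shows where in the window one model starts to drift from the other.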
To explain the models' results, a series of explainability metrics has been produced with the SHAP library. In both the univariate and the multivariate approach, post-hoc explanations were built that represent the minimum score registered by the models for each sensor (or group of sensors) within a selected temporal window. In the multivariate approach, SHAP was also used to investigate the influence of each variable within each sensor group, which gave insight into the relative importance of the variables and the factors contributing to anomalous behavior within specific sensor groups. Finally, for the multivariate approach, a series of aggregated metrics has been produced to explain the connection between sensor groups and individual anomalies. Each metric is the minimum silhouette score across the selected time range, weighted by the number of times the individual anomaly occurred overall. The resulting values, used to fill in the matrix, quantify the overall contribution of each group of sensors to each anomaly.
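The aggregation into a group-by-anomaly matrix can be sketched as follows. The exact weighting scheme is not given in the text, so this example assumes each entry is the minimum score in the selected time range multiplied by the anomaly's share of all occurrences. The group names, anomaly labels and scores are hypothetical.

```python
def contribution_matrix(scores, anomaly_counts):
    """Build a nested dict: matrix[group][anomaly] = weighted minimum score.

    scores[group][anomaly] holds the scores observed in the selected time range.
    anomaly_counts[anomaly] is how many times that anomaly occurred overall.
    The weighting (occurrence share) is an assumption, not the documented formula.
    """
    total = sum(anomaly_counts.values())
    return {
        group: {
            anomaly: min(values) * anomaly_counts[anomaly] / total
            for anomaly, values in per_anomaly.items()
        }
        for group, per_anomaly in scores.items()
    }


# Hypothetical scores per sensor group and anomaly within the selected window.
scores = {
    "hydraulics": {"A1": [0.5, 0.2, 0.4], "A2": [0.8, 0.6]},
    "motor": {"A1": [0.9, 0.7], "A2": [0.4, 0.1]},
}
anomaly_counts = {"A1": 3, "A2": 1}

matrix = contribution_matrix(scores, anomaly_counts)
```

A nested dictionary keeps the sketch dependency-free; in practice the same structure converts directly into a table with groups as rows and anomalies as columns.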