# The correct way to normalize time series data

Zero mean unit variance normalization of multivariate time series

I’m asking a new question because that one didn’t have any replies.

I’m analyzing dataset number 6 here

https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/

in particular, data set 001.

Data: The data set consists of multiple multivariate time series. Each time series is from a different engine – i.e., the data can be considered to be from a fleet of engines of the same type. For each engine, we have the engine ID, the time of operation (in cycles), and 24 time series: three operating conditions and 21 noisy sensor measurements. Example:

> glimpse(train_set)
Observations: 20,631
Variables: 26
$ engine       <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
$ cycles       <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,...
$ op_setting_1 <dbl> -0.0007, 0.0019, -0.0043, 0.0007, -0.0019, -0....
$ op_setting_2 <dbl> -4e-04, -3e-04, 3e-04, 0e+00, -2e-04, -1e-04, ...
$ op_setting_3 <dbl> 100, 100, 100, 100, 100, 100, 100, 100, 100, 1...
$ sensor_1     <dbl> 518.67, 518.67, 518.67, 518.67, 518.67, 518.67...
$ sensor_2     <dbl> 641.82, 642.15, 642.35, 642.35, 642.37, 642.10...
$ sensor_3     <dbl> 1589.70, 1591.82, 1587.99, 1582.79, 1582.85, 1...
$ sensor_4     <dbl> 1400.60, 1403.14, 1404.20, 1401.87, 1406.22, 1...
$ sensor_5     <dbl> 14.62, 14.62, 14.62, 14.62, 14.62, 14.62, 14.6...
$ sensor_6     <dbl> 21.61, 21.61, 21.61, 21.61, 21.61, 21.61, 21.6...
$ sensor_7     <dbl> 554.36, 553.75, 554.26, 554.45, 554.00, 554.67...
$ sensor_8     <dbl> 2388.06, 2388.04, 2388.08, 2388.11, 2388.06, 2...
$ sensor_9     <dbl> 9046.19, 9044.07, 9052.94, 9049.48, 9055.15, 9...
$ sensor_10    <dbl> 1.3, 1.3, 1.3, 1.3, 1.3, 1.3, 1.3, 1.3, 1.3, 1...
$ sensor_11    <dbl> 47.47, 47.49, 47.27, 47.13, 47.28, 47.16, 47.3...
$ sensor_12    <dbl> 521.66, 522.28, 522.42, 522.86, 522.19, 521.68...
$ sensor_13    <dbl> 2388.02, 2388.07, 2388.03, 2388.08, 2388.04, 2...
$ sensor_14    <dbl> 8138.62, 8131.49, 8133.23, 8133.83, 8133.80, 8...
$ sensor_15    <dbl> 8.4195, 8.4318, 8.4178, 8.3682, 8.4294, 8.4108...
$ sensor_16    <dbl> 0.03, 0.03, 0.03, 0.03, 0.03, 0.03, 0.03, 0.03...
$ sensor_17    <int> 392, 392, 390, 392, 393, 391, 392, 391, 392, 3...
$ sensor_18    <int> 2388, 2388, 2388, 2388, 2388, 2388, 2388, 2388...
$ sensor_19    <dbl> 100, 100, 100, 100, 100, 100, 100, 100, 100, 1...
$ sensor_20    <dbl> 39.06, 39.00, 38.95, 38.88, 38.90, 38.98, 39.1...
$ sensor_21    <dbl> 23.4190, 23.4236, 23.3442, 23.3739, 23.4044, 2...


There are $N=100$ engines (multivariate time series) in the training set, and each time series has about $M=200$ time samples, so for each sensor we have a total of about $N\times M = 20000$ time samples across all engines in the training set. I can add a plot of the data if necessary, but I don’t think it’s needed for my question.

The engine is operating normally at the start of each time series, and develops a fault at some point during the series. In the training set, the fault grows in magnitude until system failure. In the test set, the time series ends some time prior to system failure.
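Because every training series runs until failure, a RUL label can be attached to each training row. The Python sketch below assumes that the last recorded cycle of a training engine is its failure cycle, so that the RUL at cycle $t$ is the last cycle minus $t$. The function name `add_rul` and the toy rows are illustrative and do not come from the data set.

```python
from collections import defaultdict

def add_rul(rows):
    """rows: (engine, cycle) pairs from run-to-failure training data.

    Returns a dict mapping (engine, cycle) to the remaining useful life in
    cycles, ASSUMING the last recorded cycle of each engine is its failure.
    """
    last_cycle = defaultdict(int)
    for engine, cycle in rows:
        last_cycle[engine] = max(last_cycle[engine], cycle)
    return {(engine, cycle): last_cycle[engine] - cycle for engine, cycle in rows}

# Toy data: engine 1 fails at cycle 4, engine 2 fails at cycle 3.
rows = [(1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 2), (2, 3)]
rul = add_rul(rows)
print(rul[(1, 1)], rul[(2, 3)])  # 3 0
```

This only works for the training set. Test series are cut off before failure, so their RUL comes from the separate vector of true values.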

Analysis goal: Given a test time series as input, I want to predict the corresponding Remaining Useful Life (RUL) value, with a model which I’ll train on the training set. I also have a vector of RULs for all time series in the test set, so I can compute the test accuracy of my model. I’ve chosen to use two models:

• Cox proportional hazard models, corrected to allow time-dependent input features (any suggestions/references on how to do this?)
• an RNN

and compare results.

Question: I would like to normalize the value of the sensor outputs, because I don’t think the units of measurement carry any information about the RUL. What’s the most correct approach?

• for sensor $X_i(t)$, compute the sample mean $\overline{X}_i$ and the sample standard deviation $S$ across all the $\approx 20000$ data points in all time series in the training set. For example, for sensor_11 we have

summary(train_set$sensor_11)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  46.85   47.35   47.51   47.54   47.70   48.53

Then we compute $Z_i(t)=\frac{X_i(t)-\overline{X}_i}{S}$

• or instead we compute the sample mean $\overline{X_i}(t)$ and sample standard deviation $S(t)$ of $X_i(t)$ at each time $t$, and we normalize the time series with a time-varying sample mean and sample standard deviation, $Z_i(t)=\frac{X_i(t)-\overline{X_i}(t)}{S(t)}$. Since I have $N=100$ engines in the training set, at each time $t$ I have a random sample of size $N_t \leq N$ to estimate $\overline{X_i}(t)$ and $S(t)$. I don’t overly like this approach because it introduces new issues: if $t_{min}$ is the minimum failure time across all engines in the training set, how do I compute $\overline{X_i}(t)$ and $S(t)$ for times $t>t_{min}$?

Finally, in both cases I believe I should compute $\overline{X}_i$ and $S$ (or $\overline{X_i}(t)$ and $S(t)$) based only on training set data, and use the values so computed to normalize the test set time series. Correct?
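The two options can be compared on a toy example. The Python sketch below uses made-up readings for one sensor (the engine ids and values are hypothetical, not taken from the data set). It computes pooled training statistics, reuses them for the test series, and then counts how many engines are still available at each cycle for the time-varying variant.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical toy data: engine id -> readings of one sensor, one per cycle.
train = {
    1: [47.2, 47.3, 47.5, 47.9],
    2: [47.1, 47.4, 47.8],
    3: [47.3, 47.6],
}
test = {101: [47.2, 47.4, 47.5]}

# Option 1: one mean and one standard deviation pooled over all training points.
pooled = [x for series in train.values() for x in series]
mu, sd = mean(pooled), stdev(pooled)

def zscore(series, mu, sd):
    """Normalize a series with fixed statistics."""
    return [(x - mu) / sd for x in series]

train_z = {e: zscore(s, mu, sd) for e, s in train.items()}
test_z = {e: zscore(s, mu, sd) for e, s in test.items()}  # training statistics reused

# Option 2: statistics per cycle t, estimated from the N_t engines still running.
by_cycle = defaultdict(list)
for series in train.values():
    for t, x in enumerate(series, start=1):
        by_cycle[t].append(x)
n_t = {t: len(xs) for t, xs in by_cycle.items()}
print(n_t)  # {1: 3, 2: 3, 3: 2, 4: 1}
```

In this toy data only one engine is left at cycle 4, so $S(t)$ cannot be estimated there at all. This is the same difficulty as for $t>t_{min}$ in the real data.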

Note that one of the models I’ll use is an RNN. I’m not sure whether this matters when choosing how to normalize the data.

1. You should stick to the first approach, or to its variation in which you calculate a separate scale for each series. That’s the short answer; there are nuances, though.
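The per-series variation can be sketched in Python as follows. The helper name `zscore_per_series` and the toy readings are illustrative assumptions. Each series is scaled with its own constant mean and standard deviation.

```python
from statistics import mean, stdev

def zscore_per_series(series_by_engine):
    """Scale each engine's series with that series' own mean and sample sd."""
    out = {}
    for engine, xs in series_by_engine.items():
        mu, sd = mean(xs), stdev(xs)  # constant within one series
        out[engine] = [(x - mu) / sd for x in xs]
    return out

# Hypothetical toy readings for two engines with different baselines.
data = {1: [47.2, 47.3, 47.5, 47.9], 2: [521.0, 522.0, 523.0]}
scaled = zscore_per_series(data)
print(scaled[2])  # [-1.0, 0.0, 1.0]
```

With this variation a test series is scaled by its own statistics. Those come from a series that stops before failure, so they are not directly comparable to the statistics of a full run-to-failure training series.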

Your scale must be constant if your model doesn’t have a scale in it. For instance, take a look at the Heston model in finance. Just look at the equations and ignore the context. You can see how the volatility changes with time: the first equation models the returns, and the second one models the volatility (standard deviation). The model explicitly models volatility.
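For reference, the Heston model in its usual textbook notation (the symbols below are the standard ones, not taken from this thread) is

$$
\begin{aligned}
dS_t &= \mu S_t\,dt + \sqrt{v_t}\,S_t\,dW_t^{S},\\
dv_t &= \kappa(\theta - v_t)\,dt + \xi\sqrt{v_t}\,dW_t^{v},
\end{aligned}
$$

where $S_t$ is the asset price, $v_t$ is the instantaneous variance (so $\sqrt{v_t}$ is the volatility), $\mu$ is the drift, $\kappa$ is the speed of mean reversion, $\theta$ is the long-run variance, $\xi$ is the volatility of the variance, and the two Wiener processes $W_t^{S}$ and $W_t^{v}$ have correlation $\rho$. The scale $\sqrt{v_t}$ of the first equation has its own dynamics in the second one.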

In this case I could see dynamic scaling working, because then you’d have other parts of your model that are modeling the scale itself. Your model doesn’t have to be stochastic volatility like Heston’s but it must explicitly account for the fact that volatility changes over time somehow. Otherwise, if your scale changes and you don’t deal with it, I doubt you’ll get a sensible result.

2. On the second question, whether to use only the training set to compute the scale: ideally it shouldn’t even matter, because your scale is constant, right? If it’s constant, it shouldn’t change when you calculate it on subsamples. In reality it will move a little, and if that change causes issues it means that your training set is different from the test set. Macro characteristics such as the mean and standard deviation should not change much between subsamples. If they do, it’s a sampling or data size problem. Also, if the change in scale is small but the model breaks down, it means that the model is not robust to small disturbances, which is a problem in and of itself, in my opinion.
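A rough way to check how much the scale moves between subsamples is sketched below in Python. The data are synthetic (Gaussian draws with a made-up mean and standard deviation standing in for one pooled sensor column), and `subsample_scales` is an illustrative helper, not an existing library function.

```python
import random
from statistics import mean, stdev

def subsample_scales(values, n_splits=5, frac=0.5, seed=0):
    """Return (mean, sd) for several random subsamples drawn without replacement."""
    rng = random.Random(seed)
    k = max(2, int(len(values) * frac))
    samples = (rng.sample(values, k) for _ in range(n_splits))
    return [(mean(s), stdev(s)) for s in samples]

# Synthetic stand-in for a pooled sensor column (parameters are made up).
data_rng = random.Random(1)
values = [data_rng.gauss(47.5, 0.3) for _ in range(20000)]

scales = subsample_scales(values)
mean_spread = max(m for m, _ in scales) - min(m for m, _ in scales)
sd_spread = max(s for _, s in scales) - min(s for _, s in scales)
print(mean_spread, sd_spread)  # both should be small relative to the sd
```

If the spreads are large relative to the standard deviation itself, that points to the sampling or data size problem described above. On real data, drawing whole engines rather than individual points may be the more honest subsample, since points within one series are not independent.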