I would like to set up an algorithm for detecting anomalies in a time series, and I plan to use clustering for that.
Why should I use a distance matrix for clustering and not the raw time series data?
To detect the anomaly, I will use a density-based clustering algorithm such as DBSCAN. Would that work in this case? Is there an online version for streaming data?
I would like to detect the anomaly before it happens, so would using a trend detection algorithm (ARIMA) be a good choice?
Regarding your first question, I would recommend that you read this famous article (Clustering of Time Series Subsequences is Meaningless) before clustering a time series. It is clearly written and illustrates many pitfalls that you will want to avoid.
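On the distance-matrix part of the question: a density-based method only needs to know how far apart the items are, so a precomputed distance matrix lets you plug in whatever distance suits your series. Below is a minimal sketch of that idea, not a full DBSCAN. It only counts neighbours within a radius, which is a simplified version of DBSCAN's density test. The toy values, the function names, and the `eps` and `min_neighbors` settings are all assumptions for illustration, and the segments are non-overlapping rather than sliding-window subsequences, since the article above is about the latter.

```python
import math


def distance_matrix(segments):
    """Pairwise Euclidean distances between equal-length segments."""
    n = len(segments)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(segments[i], segments[j])
            dist[i][j] = dist[j][i] = d  # the matrix is symmetric
    return dist


def low_density_indices(dist, eps, min_neighbors):
    """Indices with fewer than min_neighbors other items within distance eps."""
    flagged = []
    for i, row in enumerate(dist):
        neighbors = sum(1 for j, d in enumerate(row) if j != i and d <= eps)
        if neighbors < min_neighbors:
            flagged.append(i)
    return flagged


# Toy data (hypothetical): four similar segments and one that deviates.
segments = [
    [1.0, 2.0, 1.0, 2.0],
    [1.1, 2.0, 0.9, 2.1],
    [0.9, 1.9, 1.0, 2.0],
    [1.0, 2.1, 1.1, 1.9],
    [5.0, 0.0, 5.0, 0.0],
]

dist = distance_matrix(segments)
anomalies = low_density_indices(dist, eps=0.5, min_neighbors=2)
print(anomalies)
```

Because only `dist` is passed to the density check, you could swap `math.dist` for another distance (for example an elastic one) without touching the rest. Whether the flagged items are meaningful anomalies still depends on how the segments are extracted, which is exactly the pitfall the article discusses.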