I have a multivariate dataset that changes over time. I have extracted (and normalised) some features and used k-means to generate clusters over the entire span of the dataset. Now I want to see whether the clusters change significantly over time. So, working backwards, and thus reducing the dataset by x-months, can I see a significant reduction on certain clusters?

This, I think, could fall within the realm of time series clustering. I was hoping to avoid complicating the approach, since the clusters are currently meaningful and the approach is relatively simple.

Could anyone please advise me on how to go about this?

My intuition is to reduce the dataset by x-months and then cluster (using k-means) the data for comparison. However, I may be breaking the rules here, and oversimplifying a complicated problem.

**Answer**

Time-series clustering requires that the sample size stays the same while the features change over time; otherwise it makes little sense. In the question, though, the description suggests that the sample size increases over time. In that case, to see a `significant reduction on certain clusters`, one should use a fixed sample size: choose a fixed sample from the initial time period, and see how its cluster sizes and memberships change over time.

Symbolically, let’s say you have 3 datasets (feature matrices) over time:

X_{t_{0}} \supset X_{t_{1}} \supset X_{t_{2}}

and corresponding clusterings C_{0}, C_{1}, C_{2}, where each C is essentially a table of instances and their cluster memberships. To judge how the clustering changes, take a fixed sample X_{0} at t_{0}, such that X_{0} \subset X_{t_{0}}, made up of instances that appear in every dataset being compared. Then track how the membership and cluster sizes of X_{0} change across the clusterings C_{0}, C_{1}, C_{2}. This gives a good idea of whether there are “reductions” (significant changes) between the clusterings, provided that X_{0} is representative over time.
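
As a rough illustration of the tracking step, here is a minimal Python sketch. It assumes you already have cluster labels for the same fixed sample X_{0} under each clustering (for example, by assigning each instance to its nearest k-means centroid). The label lists and the helper names `cluster_sizes` and `transition_table` are hypothetical, made up for this example. Since k-means label numbers are arbitrary from one run to the next, a cross-tabulation of memberships is likely more informative than comparing label values directly.

```python
from collections import Counter


def cluster_sizes(labels):
    """Count how many tracked instances fall in each cluster."""
    return dict(Counter(labels))


def transition_table(labels_a, labels_b):
    """Cross-tabulate the same instances under two clusterings.

    Entry (a, b) counts instances that are in cluster a of the first
    clustering and in cluster b of the second.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both label lists must cover the same fixed sample")
    return dict(Counter(zip(labels_a, labels_b)))


# Hypothetical labels for a fixed sample X_0 of 8 instances,
# listed in the same instance order under clusterings C_0 and C_1.
labels_c0 = [0, 0, 0, 1, 1, 2, 2, 2]
labels_c1 = [1, 1, 1, 0, 0, 2, 2, 0]

sizes_c0 = cluster_sizes(labels_c0)
sizes_c1 = cluster_sizes(labels_c1)
table = transition_table(labels_c0, labels_c1)

print("sizes under C_0:", sizes_c0)
print("sizes under C_1:", sizes_c1)
print("membership transitions:", table)
```

In this made-up data, the first two clusters merely swap label numbers, while one instance leaves the third cluster. That single move is the kind of change you would want to test for significance.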

**Attribution**

*Source: Link, Question Author: slotishtype, Answer Author: msuzen*