Methods in R or Python to perform feature selection in unsupervised learning [closed]

What methods or implementations are available in R or Python for selecting important features (or discarding unimportant ones)? My data has no labels, so the setting is unsupervised.

The data has ~100 features with mixed types. Some are numeric while others are binary (0/1).


The question is a year old, but I still feel it is relevant, so I wanted to share my Python implementation of Principal Feature Analysis (PFA) as proposed in the paper that Charles linked to in his answer.

from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from collections import defaultdict
from sklearn.metrics.pairwise import euclidean_distances
from sklearn.preprocessing import StandardScaler

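class PFA(object):
    def __init__(self, n_features, q=None):
        self.q = q                    # number of principal components to keep
        self.n_features = n_features  # number of features to select

    def fit(self, X):
        # Default to as many components as PCA allows for this data.
        if not self.q:
            self.q = min(X.shape)

        # Standardize so that no feature dominates the PCA because of its scale.
        sc = StandardScaler()
        X = sc.fit_transform(X)

        # Row i of A_q holds the loadings of feature i on the q components.
        pca = PCA(n_components=self.q).fit(X)
        A_q = pca.components_.T

        # Cluster the features by their loadings.
        kmeans = KMeans(n_clusters=self.n_features).fit(A_q)
        clusters = kmeans.labels_
        cluster_centers = kmeans.cluster_centers_

        # Record each feature's distance to the center of its cluster.
        dists = defaultdict(list)
        for i, c in enumerate(clusters):
            dist = euclidean_distances([A_q[i, :]], [cluster_centers[c, :]])[0][0]
            dists[c].append((i, dist))

        # Keep the feature closest to each cluster center.
        self.indices_ = [sorted(f, key=lambda x: x[1])[0][0] for f in dists.values()]
        # Note: these are the standardized columns, not the raw input values.
        self.features_ = X[:, self.indices_]
        return self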

You can use it like this:

import numpy as np
X = np.random.random((1000,1000))

pfa = PFA(n_features=10)
pfa.fit(X)

# To get the transformed matrix
X = pfa.features_

# To get the column indices of the kept features
column_indices = pfa.indices_

This strictly follows the algorithm described in the article. I think the method has promise, but honestly I don’t think it’s the most robust approach to unsupervised feature selection. I’ll post an update if I come up with something better.
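
As a quick sanity check of the idea, here is a minimal, self-contained sketch of the same procedure written as a function. It is a variation on the class above, not a replacement for it: the function name `pfa_indices` and the `variance` parameter are my own assumptions for illustration. Instead of keeping every component, it asks PCA to keep only enough components to explain a chosen fraction of the variance. The demo data is synthetic: three independent signals, each duplicated with a little noise, so a sensible selector should keep one column from each pair.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def pfa_indices(X, n_features, variance=0.9, random_state=0):
    """Return column indices picked by a PFA-style procedure (illustrative sketch)."""
    X_scaled = StandardScaler().fit_transform(X)

    # Assumption: a float n_components with svd_solver="full" keeps the fewest
    # components whose cumulative explained variance exceeds `variance`.
    pca = PCA(n_components=variance, svd_solver="full").fit(X_scaled)
    A_q = pca.components_.T  # one row of loadings per original feature

    kmeans = KMeans(n_clusters=n_features, n_init=10,
                    random_state=random_state).fit(A_q)

    # Distance from each feature's loading row to its own cluster center.
    dist = np.linalg.norm(A_q - kmeans.cluster_centers_[kmeans.labels_], axis=1)

    # Keep the feature closest to the center of each cluster.
    chosen = []
    for c in np.unique(kmeans.labels_):
        members = np.flatnonzero(kmeans.labels_ == c)
        chosen.append(int(members[np.argmin(dist[members])]))
    return sorted(chosen)


# Synthetic demo: columns (0, 1), (2, 3) and (4, 5) are near-duplicates.
rng = np.random.default_rng(0)
base = rng.normal(size=(500, 3))
X_demo = np.repeat(base, 2, axis=1) + 0.01 * rng.normal(size=(500, 6))

selected = pfa_indices(X_demo, n_features=3)
print(selected)  # indices into the columns of X_demo
```

Because the function returns indices rather than standardized columns, you can slice the original, unscaled data with them, e.g. `X_demo[:, selected]`.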

