What methods/implementations are available in R/Python to select important (or discard unimportant) features in data? My data has no labels (unsupervised).

The data has ~100 features with mixed types. Some are numeric while others are binary (0/1).

**Answer**

This question is a year old, but I still feel it is relevant, so I just wanted to share my **Python implementation** of Principal Feature Analysis (PFA), as proposed in the paper that Charles linked to in his answer.

```
from collections import defaultdict

from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics.pairwise import euclidean_distances
from sklearn.preprocessing import StandardScaler


class PFA(object):
    def __init__(self, n_features, q=None):
        self.q = q
        self.n_features = n_features

    def fit(self, X):
        # If q is not given, keep all principal components.
        if not self.q:
            self.q = X.shape[1]

        # Standardize the features before PCA.
        sc = StandardScaler()
        X = sc.fit_transform(X)

        # A_q holds, for each original feature, its loadings on the
        # first q principal components (one row per feature).
        pca = PCA(n_components=self.q).fit(X)
        A_q = pca.components_.T

        # Cluster the rows of A_q; each cluster groups features whose
        # projections onto the principal subspace are similar.
        kmeans = KMeans(n_clusters=self.n_features).fit(A_q)
        clusters = kmeans.predict(A_q)
        cluster_centers = kmeans.cluster_centers_

        # For each cluster, keep the single feature closest to its center.
        dists = defaultdict(list)
        for i, c in enumerate(clusters):
            dist = euclidean_distances([A_q[i, :]], [cluster_centers[c, :]])[0][0]
            dists[c].append((i, dist))

        self.indices_ = [sorted(f, key=lambda x: x[1])[0][0] for f in dists.values()]
        self.features_ = X[:, self.indices_]
```

You can use it like this:

```
import numpy as np
X = np.random.random((1000, 1000))
pfa = PFA(n_features=10)
pfa.fit(X)
# To get the selected columns of the (standardized) data
X = pfa.features_
# To get the column indices of the kept features
column_indices = pfa.indices_
```
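The `q` parameter controls how many principal components are retained before clustering. If I recall the paper correctly, `q` should be chosen so that the first `q` components retain most of the variability; here is a minimal sketch of how you might pick `q` from the cumulative explained-variance ratio (the 0.95 threshold and the random data are my own choices, not from the paper):

```
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = np.random.random((1000, 100))

# Fit a full PCA on standardized data and keep enough components to
# explain 95% of the variance (threshold chosen arbitrarily here).
X_std = StandardScaler().fit_transform(X)
cum_var = np.cumsum(PCA().fit(X_std).explained_variance_ratio_)
q = int(np.searchsorted(cum_var, 0.95)) + 1

pfa = PFA(n_features=10, q=q)
pfa.fit(X)
```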

This strictly follows the algorithm described in the article. I think the method has promise, but honestly I don't think it's the most robust approach to unsupervised feature selection; I'll post an update if I come up with something better.
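Since the question mentions a mix of numeric and binary columns, one practical detail is mapping the selected indices back to column names when you start from a pandas DataFrame. A minimal sketch with made-up data (the frame and column names are purely illustrative):

```
import numpy as np
import pandas as pd

# Purely illustrative mixed-type data: numeric plus binary 0/1 columns.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.normal(40, 10, 500),
    "income": rng.normal(50000, 15000, 500),
    "is_member": rng.integers(0, 2, 500),
    "owns_car": rng.integers(0, 2, 500),
})

pfa = PFA(n_features=2)
pfa.fit(df.to_numpy())

# Map the kept column indices back to the original column names.
print(list(df.columns[pfa.indices_]))
```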

*Attribution: Source: Link, Question Author: learner, Answer Author: Nick Cox*