Efficient K-means clustering with faiss
https://www.aiuai.cn/aifarm1662.html
For faiss installation errors, see: https://github.com/facebookresearch/faiss/issues/821
1. K-means clustering
2. PCA computation
For example, to reduce 40D vectors to 10D:
import numpy as np
import faiss

# randomly generate training data
mt = np.random.rand(1000, 40).astype('float32')
mat = faiss.PCAMatrix(40, 10)
mat.train(mt)
assert mat.is_trained
tr = mat.apply_py(mt)
# print the squared column sums to show that the magnitude of tr's columns is decreasing
print((tr ** 2).sum(0))
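Why the column magnitudes decrease can be checked without faiss. The sketch below assumes that `PCAMatrix` performs plain PCA (center the data, then project onto the leading principal directions, no whitening). It reproduces that computation with a NumPy SVD; the names `centered`, `tr_np` and `energy` are introduced here for illustration only:

```python
import numpy as np

np.random.seed(0)
mt = np.random.rand(1000, 40).astype('float32')

# center the data; float64 keeps the comparison below numerically clean
centered = (mt - mt.mean(axis=0)).astype('float64')

# rows of vt are the principal directions, ordered by decreasing singular value
_, s, vt = np.linalg.svd(centered, full_matrices=False)

# project onto the 10 leading directions (40D -> 10D)
tr_np = centered @ vt[:10].T

# the squared column sums equal the squared singular values,
# which is why they come out in decreasing order
energy = (tr_np ** 2).sum(0)
print(energy)
```

Each output column carries the variance captured by one principal direction, so the first column always has the largest squared sum.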
3. Product quantization (PQ)
For example:
import numpy as np
import faiss

d = 32  # data dimension
cs = 4 # code size (bytes)
# randomly generate the training set
nt = 10000
xt = np.random.rand(nt, d).astype('float32')
# dataset to encode (could be same as train)
n = 20000
x = np.random.rand(n, d).astype('float32')
# cs sub-quantizers of 8 bits each, so every code takes cs bytes
pq = faiss.ProductQuantizer(d, cs, 8)
pq.train(xt)
# encode
codes = pq.compute_codes(x)
# decode
x2 = pq.decode(codes)
# compute reconstruction error
avg_relative_error = ((x - x2)**2).sum() / (x ** 2).sum()
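The sizes implied by `ProductQuantizer(d, cs, 8)` can be worked out by hand. The helper below is a hypothetical illustration (`pq_layout` is not part of faiss) and assumes float32 input vectors at 4 bytes per component:

```python
def pq_layout(d, m, nbits):
    """Size figures for a product quantizer (hypothetical helper, not a faiss API).

    d: vector dimension, m: number of sub-quantizers, nbits: bits per sub-quantizer.
    """
    if d % m != 0:
        raise ValueError("d must be divisible by the number of sub-quantizers")
    code_bytes = (m * nbits + 7) // 8   # bits rounded up to whole bytes
    raw_bytes = d * 4                   # float32 storage of one vector
    return {
        "dsub": d // m,                 # dimensions handled by each sub-quantizer
        "ksub": 2 ** nbits,             # centroids per sub-quantizer
        "code_bytes": code_bytes,
        "raw_bytes": raw_bytes,
        "compression": raw_bytes / code_bytes,
    }

layout = pq_layout(32, 4, 8)
print(layout)
```

With the values from the example, each 128-byte vector is split into 4 sub-vectors of 8 dimensions, each encoded as one byte, for a 32x reduction in storage.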
Scalar quantization (scalar quantizer):
import numpy as np
import faiss

d = 32  # data dimension
# train set
nt = 10000
xt = np.random.rand(nt, d).astype('float32')
# dataset to encode (could be same as train)
n = 20000
x = np.random.rand(n, d).astype('float32')
# QT_8bit allocates 8 bits per dimension (QT_4bit also works)
sq = faiss.ScalarQuantizer(d, faiss.ScalarQuantizer.QT_8bit)
sq.train(xt)
# encode
codes = sq.compute_codes(x)
# decode
x2 = sq.decode(codes)
# compute reconstruction error
avg_relative_error = ((x - x2)**2).sum() / (x ** 2).sum()
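To make the idea behind 8-bit scalar quantization concrete, here is a NumPy-only sketch. It assumes a simple uniform quantizer that maps each dimension's training range onto 256 levels; the way faiss estimates the range and places reconstruction values may differ, so this illustrates the principle rather than the library's implementation. The names `vmin`, `vmax`, `span`, `codes_np` and `x2_np` are introduced for illustration:

```python
import numpy as np

np.random.seed(0)
d = 32
xt = np.random.rand(10000, d).astype('float32')  # training set
x = np.random.rand(20000, d).astype('float32')   # vectors to encode

# "training": record the per-dimension range seen in the training set
vmin = xt.min(axis=0)
vmax = xt.max(axis=0)
span = np.maximum(vmax - vmin, 1e-12)  # guard against constant dimensions

# encode: scale each component to [0, 255], round, and store as one byte
scaled = (x - vmin) / span
codes_np = np.clip(np.rint(scaled * 255), 0, 255).astype('uint8')

# decode: map the byte back into the original range
x2_np = vmin + codes_np.astype('float32') / 255 * span

# relative reconstruction error, same formula as above
rel_err = ((x - x2_np) ** 2).sum() / (x ** 2).sum()
print(codes_np.shape, codes_np.dtype, rel_err)
```

Each vector is stored in `d` bytes instead of `4 * d`, a 4x reduction, and values outside the training range are clipped to the nearest end of that range.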