Clustering datasets

 

[bridge.pgm]
4096 vectors, 16-d
4x4 pixel blocks  ts  txt
4x4 binarized pixel blocks  ts  txt
4x4 pixel blocks: 25% randomly sampled (for training)  ts txt
4x4 pixel blocks: 75% randomly sampled (for testing)  ts  txt
[house.ppm]
34112 vectors, 3-d
RGB-values, quantized to 5 bits per color ts  txt
RGB-values, 8 bits per color  ts  txt
[missa001.pgm]
6480 vectors, 16-d
4x4 pixel blocks from the difference image of frame 1 and 2  ts  txt
4x4 pixel blocks from the difference image of frame 2 and 3  ts  txt
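The vector counts above are consistent with non-overlapping 4x4 tiling: for example, a 256x256 grayscale image would yield 64 x 64 = 4096 blocks of 16 values each. The following Python sketch illustrates that kind of block extraction, together with a simple 5-bit color quantization. The function names, the row-by-row pixel order inside a block, and quantization by dropping the three low bits are assumptions made for illustration; they are not necessarily how these files were produced.

```python
def image_to_blocks(pixels, block=4):
    """Split a 2-d list of gray values into non-overlapping block x block
    tiles and flatten each tile row by row into one vector."""
    height, width = len(pixels), len(pixels[0])
    vectors = []
    for top in range(0, height - block + 1, block):
        for left in range(0, width - block + 1, block):
            vec = [pixels[top + dy][left + dx]
                   for dy in range(block) for dx in range(block)]
            vectors.append(vec)
    return vectors


def quantize_rgb(rgb, bits=5):
    """Reduce each 8-bit channel to `bits` bits by dropping the low bits
    (an assumed quantization rule)."""
    shift = 8 - bits
    return tuple(channel >> shift for channel in rgb)


# Synthetic 256x256 test image (not bridge.pgm itself).
image = [[(x + y) % 256 for x in range(256)] for y in range(256)]
blocks = image_to_blocks(image)
print(len(blocks), len(blocks[0]))   # 4096 16
print(quantize_rgb((255, 128, 7)))   # (31, 16, 0)
```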
 

 




BIRCH-sets

Synthetic 2-d data with 100 000 vectors and 100 clusters.

Zhang et al., "BIRCH: A new data clustering algorithm and its applications", Data Mining and Knowledge Discovery, 1 (2), 141-182, 1997.


 
Birch1: Clusters in regular grid structure  ts  txt
Birch2: Clusters at a sine curve  ts  txt
Birch3: Random sized clusters in random locations  ts  txt
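As a rough illustration of the Birch1 layout, the sketch below draws 100 Gaussian clusters whose centers lie on a 10 x 10 grid, with 1000 points per cluster, giving 100 000 vectors in total. The grid spacing, the standard deviation, the equal cluster sizes and the function name `make_grid_clusters` are assumptions chosen for illustration; they are not the parameters used by Zhang et al.

```python
import random


def make_grid_clusters(side=10, per_cluster=1000, spacing=100.0,
                       sigma=10.0, seed=0):
    """Return (points, labels) for side*side Gaussian clusters on a grid.

    spacing and sigma are illustrative values, not the original ones.
    """
    rng = random.Random(seed)
    points, labels = [], []
    for row in range(side):
        for col in range(side):
            cx, cy = col * spacing, row * spacing   # cluster center
            label = row * side + col
            for _ in range(per_cluster):
                points.append((rng.gauss(cx, sigma), rng.gauss(cy, sigma)))
                labels.append(label)
    return points, labels


points, labels = make_grid_clusters()
print(len(points), len(set(labels)))   # 100000 100
```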
 

 



 

S-sets

Synthetic 2-d data with 5000 vectors and 15 Gaussian clusters with different degrees of cluster overlap.

P. Fränti and O. Virmajoki, "Iterative shrinking method for clustering problems", Pattern Recognition, 39 (5), 761-765, May 2006.

S1: ts  txt
S2:  ts  txt
S3:  ts  txt
S4:  ts  txt

Source and labels:  zip
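A minimal sketch of reading one of the txt files into memory, assuming each line holds one vector as whitespace-separated numbers. That layout is an assumption and should be checked against the downloaded file; `parse_vectors` and the sample values are hypothetical.

```python
def parse_vectors(text):
    """Parse whitespace-separated numbers, one vector per non-empty line."""
    vectors = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        vectors.append([float(f) for f in fields])
    return vectors


# Made-up sample in the assumed layout; not actual S1 data.
sample = """
  100  200
  150  250

  300  400
"""
vectors = parse_vectors(sample)
print(len(vectors), vectors[0])   # 3 [100.0, 200.0]
```

For a real file, the same function could be applied to the file contents, for example `parse_vectors(open("s1.txt").read())`, where the file name is only a placeholder.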

 

 



A-sets

Synthetic 2-d data with a varying number of clusters and vectors.

A1:  ts  txt
A2:  ts  txt
A3:  ts  txt
   
 

 



Dim-sets

Synthetic data with Gaussian clusters in multi-dimensional space.
1351-10126 vectors, 2-d - 15-d

Dim2:  ts  txt
Dim32:  ts  txt
Dim64:  ts  txt
Dim128:  ts  txt
Dim256:  ts  txt
Dim512:  ts  txt
Dim1024:  ts  txt


 

 

 



Other data sets

KDDCUP04Bio biology data set:  ts  txt

Thyroid data set:  ts  txt

Wine data set:  ts  txt

Yeast:  txt
Yeast_times100:  ts  txt

Breast (Breast-cancer-Wisconsin):  ts  txt

g2:  ts files in a zip file (53 MB)

 



 

 

Related links