|
Do I need an account to use this site?
|
No account is needed to view results or to download datasets and software from the site.
Only registered users can submit results, datasets, papers, or software, and
every account must be approved by the site administrator before submission is enabled.
Please allow some time for administrator approval after registering with the site.
|
|
What is the format of the datasets?
|
Dataset file (.dat)
This file stores a similarity matrix in a sparse format with zero-based indices.
A header gives the matrix size and a flag indicating whether the matrix is symmetric. Note that the number of rows and
the number of columns must always be the same.
Format
Number_of_Rows Number_of_Columns
{symmetric | asymmetric}
row_index column_index similarity
...
row_index column_index similarity
Example.dat
3 3
symmetric
0 0 1.0
0 1 0.5
0 2 0.25
1 1 1.0
1 2 0.75
2 2 1.0
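As a rough illustration of the format above, the following Python sketch reads .dat content into a dense matrix. The function name parse_dat is hypothetical. Two behaviours are assumptions that the format description does not state explicitly: for a symmetric file, each listed entry is mirrored across the diagonal (as the upper-triangular example suggests), and entries that are not listed are treated as 0.0.

```python
def parse_dat(text):
    """Parse the text of a .dat file into (matrix, symmetric).

    matrix is a dense list of lists of floats.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]

    # Header line 1: matrix size. Rows and columns must match.
    n_rows, n_cols = (int(tok) for tok in lines[0].split())
    if n_rows != n_cols:
        raise ValueError("the matrix must be square")

    # Header line 2: symmetry flag.
    flag = lines[1].lower()
    if flag not in ("symmetric", "asymmetric"):
        raise ValueError("unknown symmetry flag: " + flag)
    symmetric = flag == "symmetric"

    # Assumption: entries missing from the file are 0.0.
    matrix = [[0.0] * n_cols for _ in range(n_rows)]
    for ln in lines[2:]:
        row, col, sim = ln.split()
        i, j, value = int(row), int(col), float(sim)
        matrix[i][j] = value
        if symmetric:
            # Assumption: a symmetric file lists each pair only once.
            matrix[j][i] = value
    return matrix, symmetric


EXAMPLE_DAT = """3 3
symmetric
0 0 1.0
0 1 0.5
0 2 0.25
1 1 1.0
1 2 0.75
2 2 1.0
"""

matrix, symmetric = parse_dat(EXAMPLE_DAT)
```

A dense matrix is used here only for readability. For large datasets a sparse container would be the more natural choice.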
Label file (.lbl)
This file has no header. The number of lines equals the number of
rows/columns in the .dat file. Each line is an integer representing the class membership of the corresponding object.
Format
object0_class
object1_class
...
objectn_class
Example.lbl
1
1
0
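A label file can be read with a few lines of Python. This is a minimal sketch: the name parse_lbl and its optional expected_count argument are hypothetical, and the argument is only there to check the line count against the matrix size from the matching .dat file.

```python
def parse_lbl(text, expected_count=None):
    """Parse the text of a .lbl file into a list of integer class labels."""
    labels = [int(ln) for ln in text.splitlines() if ln.strip()]
    # Optionally check that there is one label per row/column of the .dat file.
    if expected_count is not None and len(labels) != expected_count:
        raise ValueError(
            "expected %d labels, found %d" % (expected_count, len(labels))
        )
    return labels


# Labels from Example.lbl, checked against the 3 x 3 matrix of Example.dat.
labels = parse_lbl("1\n1\n0\n", expected_count=3)
```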
|
|
What is the Standard Partition of datasets?
|
Every dataset is packaged with an associated standard partition,
stored in a file with the .idx extension. This partition is intended to allow direct comparison
of algorithms by evaluating them on exactly the same randomized test and training sets.
A standard partition specifies n randomly chosen test sets (each is 20% of the total
size of the data). For each of these test sets, the rest of the data is partitioned
evenly into k disjoint training sets to be used for cross-validation. Using the standard
partition, an algorithm is trained and tested n*k times. The k errors
computed on each of the n distinct test sets then yield n means and n standard
deviations. Currently, these n results are averaged to produce a single average
error and a single average standard deviation of error, but in the future we hope to incorporate
all of the data for statistical significance testing.
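The averaging described above can be sketched in Python as follows. The function name summarize_errors is hypothetical, and the error values in the usage lines are made up purely for illustration. The text does not say whether the sample or the population standard deviation is used; this sketch assumes the sample standard deviation, which requires k to be at least 2.

```python
from statistics import mean, stdev


def summarize_errors(errors):
    """Reduce an n x k table of errors to (average error, average std).

    errors[t][f] is the error measured on test set t for training fold f.
    """
    # One mean and one standard deviation per test set (n of each).
    per_test_means = [mean(row) for row in errors]
    # Assumption: sample standard deviation (divides by k - 1).
    per_test_stds = [stdev(row) for row in errors]
    # Average the n results into a single pair of numbers.
    return mean(per_test_means), mean(per_test_stds)


# Made-up errors for n = 2 test sets and k = 5 folds.
example_errors = [
    [0.10, 0.12, 0.08, 0.11, 0.09],
    [0.14, 0.10, 0.12, 0.13, 0.11],
]
average_error, average_std = summarize_errors(example_errors)
```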
Format (n = number of test sets, k = number of training folds per test set; both are numbered from zero)
n k
indices_of_test_0
indices_of_train_00
indices_of_train_01
...
indices_of_train_0(k-1)
...
indices_of_test_(n-1)
indices_of_train_(n-1)0
indices_of_train_(n-1)1
...
indices_of_train_(n-1)(k-1)
Example.idx
2 5
0 5 7 13 17
3 6 8 24
1 10 18 21
9 12 16 19
2 20 22 23
4 11 14 15
0 5 8 13 18
1 2 6 23
3 11 14 22
4 16 17 19
9 12 20 21
7 10 15 24
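The layout above can be read with a short Python sketch like the one below. The name parse_idx is hypothetical. The sketch assumes that each test set is followed by exactly k lines of training indices, which is what Example.idx shows.

```python
def parse_idx(text):
    """Parse the text of a .idx file into a list of (test, folds) pairs.

    test is a list of indices; folds is a list of k lists of indices.
    """
    rows = [ln.split() for ln in text.splitlines() if ln.strip()]
    n, k = int(rows[0][0]), int(rows[0][1])
    body = rows[1:]

    # Assumption: each test set occupies one line followed by k fold lines.
    if len(body) != n * (k + 1):
        raise ValueError("expected %d index lines, found %d"
                         % (n * (k + 1), len(body)))

    partitions = []
    for t in range(n):
        block = body[t * (k + 1):(t + 1) * (k + 1)]
        test = [int(tok) for tok in block[0]]
        folds = [[int(tok) for tok in row] for row in block[1:]]
        partitions.append((test, folds))
    return partitions


EXAMPLE_IDX = """2 5
0 5 7 13 17
3 6 8 24
1 10 18 21
9 12 16 19
2 20 22 23
4 11 14 15
0 5 8 13 18
1 2 6 23
3 11 14 22
4 16 17 19
9 12 20 21
7 10 15 24
"""

partitions = parse_idx(EXAMPLE_IDX)

# In the example, each test set plus its folds covers all 25 objects once.
for test, folds in partitions:
    covered = sorted(test + [i for fold in folds for i in fold])
    assert covered == list(range(25))
```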
|
|
What is the Results section?
|
This section allows you to compare results that have been contributed to the site.
Simply select the desired algorithms and datasets from the list and click the
"Generate Dynamic Matrix" link to generate a table of results. All results can be compared
by choosing "select all" on both algorithms and datasets.
Results can be submitted by approved users and should always be obtained on the
standard partition of the dataset to allow for direct comparison.
|
|
How can I load the data sets into Matlab?
|
The datasets are stored in an open, human-readable format for portability. For those who
wish to import the datasets into, or export them from, the MATLAB environment, we
have provided the following bundle of scripts:
Matlab Scripts
|