Friday, 5 May 2017

Can classification performance be predicted by complexity measures? A study using microarray data

Abstract

Data complexity analysis helps determine whether classification performance is limited by intrinsic characteristics of the data rather than by the algorithm. Microarray datasets, which combine large numbers of gene expression features with small sample sizes, represent a particular challenge for machine learning researchers. This type of data also has other particularities that may negatively affect the generalization capacity of classifiers, such as overlap between classes and class imbalance. Using several complexity measures, we analyzed the intrinsic complexity of several microarray datasets with and without feature selection, and then explored the connection with the empirical results obtained by four widely used classifiers. Experimental results for 21 binary and multiclass datasets demonstrate that a correlation exists between microarray data complexity and classification error rates.



http://ift.tt/2pKidpt
