Conclusions: Appropriate feature selection is required

    2019-09-28


    5. Conclusions
    Appropriate feature selection is required when building a colorectal cancer risk prediction model. It helps to avoid overfitting and helps to identify the features with the greatest predictive power, so that proper interventions can be taken to address the risk. Assessing the stability of the feature selection methods becomes necessary; otherwise, conclusions derived from the analysis may be quite unreliable. The graphical approach presented here enables us to analyze the stability of feature selection algorithms as well as the similarity among different feature ranking techniques.
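As a rough illustration of what "stability" means for a feature selection method, the sketch below scores how much the selected feature subsets agree across repeated runs. It is not the graphical approach described in this paper; it swaps in a simpler and widely used measure, the mean pairwise Jaccard similarity. The feature names and the three runs are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two feature subsets (1.0 = identical)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty subsets are treated as identical
    return len(a & b) / len(a | b)


def subset_stability(subsets):
    """Mean Jaccard similarity over all pairs of selected subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets to measure stability")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical top-3 features selected on three resamples of the same data.
runs = [
    ["age", "bmi", "red_meat"],
    ["age", "bmi", "alcohol"],
    ["age", "bmi", "red_meat"],
]
stability = subset_stability(runs)
print(f"stability = {stability:.3f}")
```

A value close to 1 means the method keeps choosing the same features when the training sample changes; a value close to 0 means the selected subset depends heavily on the particular sample, so conclusions drawn from it are fragile.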
    Comparisons have been conducted with several feature ranking algorithms and different risk prediction models. The experimental results on the Spanish multicase-control study (MCC-Spain) indicate that the SVM-wrapper approach shows moderate stability and leads to the best classification model performance. In addition, the simple Pearson correlation coefficient offers a good trade-off between performance and stability.
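For readers unfamiliar with correlation-based ranking, the following minimal sketch ranks features by the absolute value of their Pearson correlation with a binary case/control label. It assumes numeric features and a 0/1 label; the column names and values are made up for illustration and are not taken from the MCC-Spain data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / math.sqrt(var_x * var_y)


def rank_features(columns, labels):
    """Return feature names sorted by |correlation with the label|, best first."""
    scores = {name: abs(pearson(values, labels)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: three features, six subjects, 0 = control, 1 = case.
columns = {
    "feature_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "feature_b": [2.0, 1.0, 2.0, 1.0, 2.0, 1.0],
    "feature_c": [5.0, 5.0, 5.0, 5.0, 5.0, 5.0],
}
labels = [0, 0, 0, 1, 1, 1]
ranking = rank_features(columns, labels)
print(ranking)
```

Because each feature is scored independently with a closed-form statistic, this kind of filter is cheap to compute. The trade-off is that it ignores interactions between features, which a wrapper built around a classifier can capture.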
    Screening and preventive interventions can certainly benefit from an improved estimation of the risk of developing CRC. However, some barriers remain, and more research is needed before such risk estimation can be incorporated into daily clinical practice.
    Disclosure statements
    MCC-Spain Study Group: G. Castaño-Vinyals, B. Pérez-Gómez, J. Llorca, J. M. Altzibar, E. Ardanaz, S. de Sanjosé, J.J. Jiménez-Moleón, A. Tardón, J. Alguacil, R. Peiró, R. Marcos-Gragera, C. Navarro, M. Pollán and M. Kogevinas.
    Declaration of competing interest
    None.
    Funding
    Genotyping: SNP genotyping services were provided by the Spanish Centro Nacional de Genotipado (CEGEN-ISCIII) and by the Basque Biobank.
    Acknowledgments
    We thank all the subjects who participated in the study and all MCC-Spain collaborators.
    Supplementary material