Published January 1, 2022 | Version v1
Journal article Open

Fast and interpretable genomic data analysis using multiple approximate kernel learning

  • 1. Koc Univ, Grad Sch Sci & Engn, TR-34450 Istanbul, Turkey
  • 2. Oregon Hlth & Sci Univ, Knight Canc Inst, Portland, OR 97239 USA

Description

Motivation: Dataset sizes in computational biology have increased drastically, driven by improved data collection tools and growing patient cohorts. Earlier kernel-based machine learning algorithms, proposed to improve interpretability, began to fail at large sample sizes because they do not scale. To overcome this problem, we proposed a fast and efficient multiple kernel learning (MKL) algorithm, intended particularly for large-scale data, that integrates kernel approximation and group Lasso formulations into a conjoint model. Our method extracts significant and meaningful information from genomic data while conjointly learning a model for out-of-sample prediction. It scales with increasing sample size by approximating the distinct kernel matrices instead of computing them exactly.
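To make the combination of kernel approximation and group Lasso concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes random Fourier features as the kernel approximation, a Gaussian kernel per feature group, a squared loss, and a plain proximal gradient solver. The function names, the synthetic data, and all parameter values are hypothetical choices for illustration only.

```python
import numpy as np


def random_fourier_features(X, n_components, gamma, rng):
    """Explicit feature map Z such that Z @ Z.T approximates the Gaussian
    kernel exp(-gamma * ||x - y||^2), so no n-by-n kernel matrix is formed."""
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_components))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_components)
    return np.sqrt(2.0 / n_components) * np.cos(X @ W + b)


def group_soft_threshold(w, threshold):
    """Proximal operator of threshold * ||w||_2: shrinks the whole block,
    and sets it exactly to zero when its norm is below the threshold."""
    norm = np.linalg.norm(w)
    if norm <= threshold:
        return np.zeros_like(w)
    return (1.0 - threshold / norm) * w


def fit_group_lasso(Z_groups, y, lam, n_iter=500):
    """Minimize (1 / 2n) * ||Z w - y||^2 + lam * sum_g ||w_g||_2 by
    proximal gradient descent, with one weight block per approximated kernel."""
    Z = np.hstack(Z_groups)
    n = Z.shape[0]
    edges = np.cumsum([0] + [Zg.shape[1] for Zg in Z_groups])
    # Step size 1 / L, where L = ||Z||_2^2 / n bounds the gradient's Lipschitz constant.
    step = n / (np.linalg.norm(Z, 2) ** 2)
    w = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        w = w - step * (Z.T @ (Z @ w - y) / n)
        for start, stop in zip(edges[:-1], edges[1:]):
            w[start:stop] = group_soft_threshold(w[start:stop], step * lam)
    return [w[start:stop] for start, stop in zip(edges[:-1], edges[1:])]


# Hypothetical usage: three feature groups (standing in for gene sets),
# where only the first group carries signal about the response.
rng = np.random.default_rng(0)
groups = [rng.normal(size=(200, 5)) for _ in range(3)]
y = np.sin(groups[0].sum(axis=1))
Z_groups = [random_fourier_features(Xg, 100, 0.1, rng) for Xg in groups]
weights = fit_group_lasso(Z_groups, y, lam=0.01)
group_norms = [float(np.linalg.norm(wg)) for wg in weights]
print(group_norms)  # a norm of exactly 0.0 means that group's kernel was dropped
```

In this sketch the cost per iteration grows linearly with the number of samples, because only the explicit feature matrices are stored. The per-group weight norms play the role of kernel weights: groups whose block is shrunk to zero contribute nothing to the prediction, which is what makes the fitted model readable at the level of feature groups.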
