The Application of Chaos Game Representations and Deep Learning for Grapevine Genetic Testing

Authors

Vu, Andrew

Abstract

The identification of grapevine species, cultivars, and clones associated with desired traits such as disease resistance, crop yield, and crop quality is an important component of viticulture. True-to-type identification is critical yet challenging for grapevine because of the large number of cultivars and clones and the long-standing problem of synonyms and homonyms. DNA-based identification, which is more accurate than morphology-based methods, has been used as the standard genetic testing method, but it is not without shortcomings. To overcome some of the limitations of traditional microsatellite-marker-based genetic testing, we explored a whole genome sequencing (WGS)-based approach that takes advantage of the latest next-generation sequencing (NGS) technologies, aiming for the best accuracy and better availability at an affordable cost. To address the challenges posed by the extremely high-dimensional nature of WGS data, we examined the effectiveness of using Chaos Game Representation (CGR) to represent genome sequence data and of applying deep learning to visual analysis for species and cultivar identification. We found that CGR images provide a meaningful way of capturing patterns and motifs for visual analysis; the best prediction results achieved a mean balanced accuracy of 0.990 in classifying a subset of five species. Our preliminary research highlights the potential of CGR and deep learning as complementary tools for WGS-based species-level and cultivar-level classification.
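To make the CGR idea concrete, here is a minimal sketch of a frequency CGR (FCGR) matrix, one common way to turn a DNA sequence into an image. It is not the author's pipeline. The corner assignment (A, C, G, T at the four corners of the unit square), the resolution `k`, the policy of restarting the walk at ambiguous bases such as `N`, and the function name `fcgr` are all illustrative assumptions. In CGR, each nucleotide moves the current point halfway toward its corner; on a 2^k by 2^k grid that halving rule can be done exactly with integer bit shifts, so each cell ends up counting one k-mer.

```python
import numpy as np

# Assumed corner convention as (x_bit, y_bit): A bottom-left, C top-left,
# G top-right, T bottom-right. Other conventions exist.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(sequence: str, k: int = 6) -> np.ndarray:
    """Hypothetical helper: return a 2**k x 2**k matrix of CGR point counts.

    The CGR step "move halfway toward the corner" is applied in integer
    arithmetic on a 2**k grid: x_new = (x_old >> 1) | (corner_x << (k - 1)),
    and likewise for y. After k valid bases the cell depends only on the
    last k bases, so each cell counts occurrences of one k-mer.
    """
    size = 2 ** k
    counts = np.zeros((size, size), dtype=np.int64)
    x = y = 0
    run = 0  # consecutive valid nucleotides since the last restart
    for base in sequence.upper():
        corner = CORNERS.get(base)
        if corner is None:
            # Ambiguous symbol (e.g. N): restart the walk (assumed policy).
            x = y = 0
            run = 0
            continue
        cx, cy = corner
        x = (x >> 1) | (cx << (k - 1))
        y = (y >> 1) | (cy << (k - 1))
        run += 1
        if run >= k:
            # Row 0 is the top of the image; flip y so A stays bottom-left.
            counts[size - 1 - y, x] += 1
    return counts


if __name__ == "__main__":
    image = fcgr("ACGTNACGTTGCA", k=2)
    print(image)
```

In a deep learning setting such a matrix would typically be normalized or log-scaled and then passed to an image classifier; the choice of `k` trades image resolution against sparsity, and the values used in the study are not stated here.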