Paper Title
SHINYP: AN AI-ASSISTED PLATFORM FOR SNP ANALYSIS AND CORE GERMPLASM OPTIMIZATION IN SWEET POTATO
Abstract
Genome-wide single nucleotide polymorphism (SNP) analysis enables high-resolution characterization of genetic variation, providing essential insights for population genomics and molecular breeding. To address the analytical demands of large-scale SNP datasets, we developed ShiNyP, an interactive, AI-assisted platform built on the R Shiny framework that offers an end-to-end workflow for SNP data processing, visualization, and reporting. The platform comprises six functional modules, ranging from data quality control and population structure analysis to selective sweep detection and core collection optimization, allowing researchers to perform comprehensive analyses without programming expertise. We applied ShiNyP to a sweet potato (Ipomoea batatas) dataset consisting of 183 accessions and over 260,000 SNPs. After rigorous quality control, 140,444 high-confidence biallelic SNPs were retained for downstream analyses. Discriminant Analysis of Principal Components (DAPC) revealed four distinct genetic clusters with clear population differentiation. Genetic diversity analysis identified fixed and unique alleles within subgroups, while selective sweep detection highlighted candidate genomic regions potentially linked to adaptive and pigmentation-related traits. Using diversity and representativeness metrics, ShiNyP’s core collection module identified 39 core germplasm accessions that capture the majority of genetic variation present across the population. The platform outputs more than 70 customizable, publication-ready figures and tables, and employs AI to generate human-readable summary reports that accelerate interpretation. ShiNyP is compatible with haploid to polyploid species and is freely available on GitHub, supporting diverse applications in genetic research, conservation, and breeding program design.
Keywords - ShiNyP, multi-module, AI-assisted bioinformatics, sweet potato, genome-wide SNP analysis, population genetics, core collection optimization