University of Arkansas for Medical Sciences
College of Medicine: Department of Biomedical Informatics

Viral Phylogenomics Using an Alignment-Free Method: A Three-Step Approach to Determine Optimal Length of k-mer

Sci Rep. 2017 Jan 19;7:40712. doi: 10.1038/srep40712. PubMed PMID: 28102365; PubMed Central PMCID: PMC5244389

Qian Zhang, Se-Ran Jun, Michael Leuze, David Ussery & Intawat Nookaew

Abstract

The development of rapid, economical genome sequencing has shed new light on the classification of viruses. As of October 2016, the National Center for Biotechnology Information (NCBI) database contained >2 million viral genome sequences and a reference set of ~4000 viral genome sequences that cover a wide range of known viral families. Whole-genome sequences can be used to improve viral classification and provide insight into the viral “tree of life”. However, due to the lack of evolutionary conservation amongst diverse viruses, it is not feasible to build a viral tree of life using traditional phylogenetic methods based on conserved proteins. In this study, we used an alignment-free method that uses k-mers as genomic features for a large-scale comparison of complete viral genomes available in RefSeq. To determine the optimal feature length, k (an essential step in constructing a meaningful dendrogram), we designed a comprehensive strategy that combines three approaches: (1) cumulative relative entropy, (2) average number of common features among genomes, and (3) the Shannon diversity index. This strategy was used to determine k for all 3,905 complete viral genomes in RefSeq. The resulting dendrogram shows consistency with the viral taxonomy of the ICTV and the Baltimore classification of viruses.
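As a rough illustration of two of the three quantities named in the abstract, the sketch below counts overlapping k-mers in toy sequences and computes, for a given k, the mean number of k-mers shared between pairs of sequences and a Shannon diversity index over the pooled k-mer counts. This is not the authors' implementation: the function names, the toy sequences, and the choice to compute the index over pooled relative frequencies are assumptions made for illustration, and cumulative relative entropy is omitted because the abstract does not define it.

```python
from collections import Counter
from itertools import combinations
from math import log


def kmer_counts(seq, k):
    """Count overlapping k-mers in one sequence (hypothetical helper)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over relative k-mer frequencies."""
    total = sum(counts.values())
    return -sum((c / total) * log(c / total) for c in counts.values())


def mean_shared_kmers(seqs, k):
    """Average number of distinct k-mers shared by each pair of sequences."""
    kmer_sets = [set(kmer_counts(s, k)) for s in seqs]
    pairs = list(combinations(kmer_sets, 2))
    return sum(len(a & b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy sequences standing in for genomes; real input would be RefSeq records.
    genomes = ["ACGTACGTTGCA", "ACGTTGCAACGT", "TTGCAACGGACT"]
    for k in range(1, 6):
        pooled = Counter()
        for g in genomes:
            pooled.update(kmer_counts(g, k))
        print(k, round(shannon_index(pooled), 3), round(mean_shared_kmers(genomes, k), 2))
```

Scanning such values across a range of k is one way to see the trade-off the abstract alludes to: short k-mers are shared by nearly every genome and carry little signal, while long k-mers are rarely shared at all.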

Read more: https://www.nature.com/articles/srep40712

Posted by Chris Lesher on January 19, 2017

Filed Under: Publications Tagged With: David Ussery, Intawat Nookaew, Michael Leuze, Qian Zhang, Se-Ran Jun, Viral Phylogenomics
