CGRclust is a novel unsupervised method for clustering unlabelled DNA sequences. It combines Chaos Game Representation (CGR), which encodes a DNA sequence as a two-dimensional image, with twin contrastive learning. It has been evaluated on diverse genomic datasets, including mitochondrial genomes of fish, fungi, and protists, as well as viral whole genome assemblies and synthetic DNA sequences.
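To make the CGR step concrete, the sketch below computes a frequency CGR (FCGR): a 2^k × 2^k grid in which each cell counts one k-mer. It follows the classic chaos game update p ← (p + corner)/2, where the most recent base fixes the coarsest quadrant. The corner layout (A bottom-left, C top-left, G top-right, T bottom-right), the function name `fcgr`, and the choice to skip k-mers containing non-ACGT characters are assumptions for illustration. This is not CGRclust's own preprocessing code, and its exact image size and normalization are not given here.

```python
# Classic CGR corner layout on the unit square (an assumed convention;
# other tools order the corners differently).
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(sequence: str, k: int) -> list[list[int]]:
    """Return a 2^k x 2^k grid of k-mer counts (frequency CGR).

    grid[y][x] uses a bottom-left origin: row 0 is the bottom edge of the
    CGR square. k-mers containing characters other than A, C, G, T are skipped.
    """
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    seq = sequence.upper()
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if any(base not in CORNERS for base in kmer):
            continue  # ambiguous base (e.g. N): no defined corner
        x = y = 0
        # Base j contributes bit j, so the last base of the k-mer sets the
        # most significant bit, matching the CGR update where recent bases
        # determine the largest-scale quadrant.
        for j, base in enumerate(kmer):
            cx, cy = CORNERS[base]
            x |= cx << j
            y |= cy << j
        grid[y][x] += 1
    return grid


if __name__ == "__main__":
    for row in reversed(fcgr("ACGTACGGTA", 2)):  # print with the top row first
        print(row)
```

For example, `fcgr(seq, 6)` would produce a 64 × 64 count grid. A CNN-based clustering pipeline would typically normalize such a grid and treat it as a single-channel image. The resolution k is a tunable choice, which is one place where the hyperparameter sensitivity discussed below can enter.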
CGRclust also has limitations. Its performance depends on hyperparameter choices, and its computational cost may become a concern for very large datasets. Its clustering results also depend on the quality of the input sequences.
Overall, CGRclust represents a significant advance in metagenomic data analysis. It provides a robust tool for clustering diverse DNA sequences without requiring sequence alignment or taxonomic labels.
For further details, see the study "CGRclust: Chaos Game Representation for twin contrastive clustering of unlabelled DNA sequences" (2024).