The Project #CODES4STRAINS
Start: October 2019
Duration: 3 Years
Domain: Foodborne Zoonoses, Antimicrobial Resistance
Members: IP, INRA, ANSES (France); PHE (UK)
Contact: Dr Sylvain Brisse (IP)
Codes4strains: Tracking bacterial pathogens through sources, geography and time using stable phylogenetically informative genome codes
The implementation of genome sequencing in public health microbiology has allowed the natural variation exhibited by pathogenic bacteria to be leveraged for infectious disease surveillance and outbreak detection. Genotype information derived from whole-genome sequencing (WGS) allows the monitoring of pathogenic potential and the tracking of epidemic behaviour, to inform infection control, diagnostic and treatment practice.
To track strains globally, and as they spread between the environment, food, animals and humans, universal strain nomenclatures are necessary. Two main strain nomenclature approaches currently exist.
First, core genome Multilocus Sequence Typing (cgMLST) is widely applied for bacterial pathogen surveillance. It relies on predefined gene loci, the sequence variants of which are given unique identifiers (allelic numbers). The resulting allelic profiles are either given unique identifiers (cgST) or grouped based on their similarity, generally using the single-linkage clustering method.
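The grouping step described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by any cgMLST platform: the six-locus profiles, the isolate names, the threshold of one allelic mismatch, and the choice to ignore missing allele calls are all assumptions made for the example.

```python
from itertools import combinations

def allelic_distance(p, q):
    """Count loci where two profiles carry different allele numbers.
    Loci with a missing call (None) are ignored (an assumption of this sketch)."""
    return sum(1 for a, b in zip(p, q)
               if a is not None and b is not None and a != b)

def single_linkage(profiles, threshold):
    """Single-linkage grouping: any pair of isolates whose distance is
    <= threshold is placed, transitively, in the same cluster."""
    names = list(profiles)
    parent = {n: n for n in names}

    def find(n):
        # Union-find lookup with path compression.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if allelic_distance(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(c) for c in clusters.values())

# Hypothetical allelic profiles: one allele number per predefined locus.
profiles = {
    "iso1": [1, 1, 1, 1, 1, 1],
    "iso2": [1, 1, 1, 1, 1, 2],
    "iso3": [1, 1, 1, 1, 2, 2],
    "iso4": [5, 6, 7, 8, 9, 3],
}
clusters = single_linkage(profiles, threshold=1)
print(clusters)
```

Note that iso1 and iso3 differ at two loci, above the threshold, yet they share a cluster because iso2 links them. This chaining is characteristic of single-linkage.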
An alternative approach, known as the SNP address, was developed at Public Health England. Unlike cgMLST, it is based on single nucleotide polymorphisms (SNPs) identified by comparison with a reference genome. Single-linkage clustering is performed on the resulting SNP distances between isolates. An original concept of the SNP address is to apply several thresholds to the allelic or SNP differences. The ‘address’ is a multi-position code in which each position corresponds to cluster membership at descending thresholds of genetic (SNP) distance among strains. The result is a multi-level nomenclature that provides a good approximation of the phylogenetic relatedness among isolates. Likewise, several cgMLST thresholds can be used to provide phylogenetic information in addition to classification, as was done for Listeria monocytogenes by the group of the main applicant.
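To make the multi-position code concrete, the sketch below derives an address from a toy SNP distance matrix. It is an illustration under stated assumptions rather than the Public Health England implementation: the thresholds (10, 5, 0), the isolate names, the distances, and the rule of numbering clusters by order of first appearance are all hypothetical.

```python
def clusters_at(names, dist, threshold):
    """Single-linkage cluster label for each isolate at one threshold.
    Clusters are numbered by order of first appearance (illustrative choice)."""
    label = {}
    next_id = 1
    for start in names:
        if start in label:
            continue
        label[start] = next_id
        stack = [start]
        while stack:
            cur = stack.pop()
            for other in names:
                # Unlabelled isolates within the threshold join the cluster.
                if other not in label and dist[frozenset((cur, other))] <= threshold:
                    label[other] = next_id
                    stack.append(other)
        next_id += 1
    return label

def snp_address(names, dist, thresholds=(10, 5, 0)):
    """Join the cluster labels obtained at descending thresholds into one code."""
    per_level = [clusters_at(names, dist, t) for t in thresholds]
    return {n: ".".join(str(level[n]) for level in per_level) for n in names}

# Hypothetical pairwise SNP distances between four isolates.
names = ["A", "B", "C", "D"]
dist = {
    frozenset(("A", "B")): 0,
    frozenset(("A", "C")): 4,
    frozenset(("B", "C")): 4,
    frozenset(("A", "D")): 30,
    frozenset(("B", "D")): 30,
    frozenset(("C", "D")): 28,
}
addresses = snp_address(names, dist)
for name in names:
    print(name, addresses[name])
```

Isolates that share a longer prefix of the address are more closely related: here A and B receive the same full code, C shares only the first two positions with them, and D shares none.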
Providing multi-level information on phylogenetic relatedness has proved helpful for epidemiological investigations and for prospective surveillance. It has facilitated outbreak detection and provided a framework for case/control studies at different diversity levels, depending on the length or complexity of an outbreak. Further, utilising a flexible level of divergence to define an ‘outbreak type’ aids hypothesis generation and may in some cases allow the specific source of the outbreak to be identified, by maximizing the power of case-control source attribution studies.
SNP and cgMLST approaches have complementary characteristics. One strength of the cgMLST approach is its standardization (predefined sets of loci; SNPs, by contrast, have proven difficult to standardize), which maximizes the applicability of the method for international or cross-sector strain comparisons where analysis is performed independently. In turn, whole-genome SNPs are more discriminatory than cgMLST, which relies on a predefined set of ‘core’ loci. Therefore, SNP and cgMLST should be regarded as two useful approaches to be integrated jointly in future genomic epidemiology strategies.
However, one major limitation of current SNP address or multi-level cgMLST classifications is that they utilise single-linkage clustering to define groups. This approach is unstable, as the fusion of predefined groups upon discovery of ‘intermediate’ genotypes is an inherent mathematical property of single-linkage. This issue is pertinent within epidemiological timescales, where intermediate genotypes have a high probability of being sampled. It is the experience of both applicants' groups that the fusion of predefined groups is challenging to handle in practice and introduces nomenclatural confusion.
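The fusion effect can be reproduced with a toy example. In this sketch, which is purely illustrative, each isolate is reduced to a single hypothetical position on a line so that the distance between two isolates is simply the absolute difference of their positions; the names, positions and threshold of 5 are assumptions.

```python
from itertools import combinations

def single_linkage(dist, names, threshold):
    """Group isolates so that any pair within the threshold, directly or
    through a chain of intermediates, ends up in the same group."""
    group = {n: {n} for n in names}
    for a, b in combinations(names, 2):
        if dist(a, b) <= threshold and group[a] is not group[b]:
            merged = group[a] | group[b]
            for n in merged:
                group[n] = merged
    unique = {frozenset(g) for g in group.values()}
    return sorted(sorted(g) for g in unique)

# Toy data: hypothetical positions standing in for genetic distances.
positions = {"A": 0, "B": 2, "C": 10, "D": 12}

def dist(a, b):
    return abs(positions[a] - positions[b])

# Two well-separated groups exist before the intermediate is sampled.
before = single_linkage(dist, sorted(positions), threshold=5)

# A newly sampled intermediate genotype E lies within 5 of both B and C.
positions["E"] = 6
after = single_linkage(dist, sorted(positions), threshold=5)

print(before)
print(after)
```

Before E is sampled, {A, B} and {C, D} are distinct groups with their own identifiers. Once E is added, it bridges the gap and the two groups fuse into one, so at least one previously assigned identifier can no longer be kept unchanged.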
Currently, no genomic nomenclature system for bacterial pathogens combines complete stability of identifiers, high standardization and reproducibility, and high resolution. This gap is an important barrier in the field of genomic epidemiology and slows down communication and action against the transmission of pathogens across sectors, world regions and long periods of time. This critical gap was addressed in the Codes4strains PhD project.
Congratulations to Mélanie on being awarded her doctorate in Autumn 2022!
Project Assets
Poster presentation at 13th International Meeting on Microbial Epidemiological Markers (IMMEM XIII), Bath, UK. 14-17th September 2022.
Poster presentation & participation in 3-minute thesis competition at One Health EJP Annual Scientific Meeting, Orvieto, Italy. 11-13th April 2022.
Melanie-Hennart