Enhancement and validation of the antibiotic resistance prediction performance of a cloud-based genetics processing platform for Mycobacteria

Jeremy Westhead1, Catriona S Baker1, Marc Brouard1, Matthew Colpus1, Bede Constantinides1, Alexandra Hall1, Jeff Knaggs1, Marcela Lopes Alves1, Ruan Spies1, Hieu Thai1, Sarah Surrall5, Kumeren Govender5, Timothy EA Peto123, Derrick W Crook123, Shaheed V Omar4, Robert Turner1 and Philip W Fowler*123

    1Nuffield Department of Medicine, University of Oxford, Oxford, U.K

    2National Institute of Health Research Oxford Biomedical Research Centre, University of Oxford, Oxford, UK

    3Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance, John Radcliffe Hospital, Oxford, U.K

    4Centre for Tuberculosis, National & WHO Supranational TB Reference Laboratory, National Institute for Communicable Diseases, a division of the National Health Laboratory Service, Johannesburg, South Africa

    5Ellison Institute of Technology, Oxford Ltd, U.K

    *To whom correspondence should be addressed: philip.fowler{at}ndm.ox.ac.uk, @philipwfowler

      bioRxiv preprint DOI: https://doi.org/10.1101/2024.11.08.622466

      Posted: April 23, 2025, Version 2

      Copyright: This pre-print is available under a Creative Commons License (Attribution 4.0 International), CC BY 4.0, as described at http://creativecommons.org/licenses/by/4.0/

      Abstract

      Tuberculosis remains a global health problem. Making it easier and quicker to identify which antibiotics an infection is likely to be susceptible to will be a key part of the solution. Whilst whole-genome sequencing offers many advantages, the processing of the genetic reads to produce the relevant public health and clinical information is, surprisingly, often the responsibility of the end user which inhibits uptake. Here we describe our Mycobacterial genetics processing pipeline and its deployment in a cloud-based platform. For antibiotic resistance prediction we have implemented the second edition of the WHO catalogue of resistance-associated variants. We validate the resistance prediction performance by constructing and processing a diverse dataset of 2,663 publicly-available M. tuberculosis samples with published drug susceptibility testing (DST) data and find that identifying a sample as resistant if it contains a minor allele known to be associated with resistance increases sensitivity. By only considering high confidence DST results we are able to show that our Mycobacterial pipeline achieves sensitivities and specificities in excess of 95% for both isoniazid and rifampicin.

      Introduction

      In 2023 just under 11 million people became ill with tuberculosis (TB) worldwide and 1.25 million died 1. The aetiological agent, M. tuberculosis, is difficult to kill with a single drug and therefore requires treating with multiple antibiotics. Susceptible M. tuberculosis is routinely treated with four antibiotics: rifampicin, isoniazid, pyrazinamide and ethambutol. As with other bacterial pathogens, resistance to all antibiotics has now been observed and multi-drug resistant (MDR) TB is defined as infections resistant to both rifampicin and isoniazid – these require alternative treatment, such as the BPaLM (bedaquiline, pretomanid, linezolid and moxifloxacin) regimen that was recommended by the WHO in 2022 2. MDR TB is increasingly recognised as a global health concern and consequently rifampicin-resistant M. tuberculosis was added to the WHO Bacterial Priority Pathogens List in 2024 3. Positive identification of M. tuberculosis complex in a clinical sample and subsequent drug susceptibility testing (DST) are important steps in treating this disease.

      Due to the slow growth rate of M. tuberculosis, culture-based DST takes weeks; whole genome sequencing (WGS) is an attractive alternative since it is faster, potentially more accurate and also yields epidemiological information. Over the last seven years 4, many high income countries have adopted WGS for M. tuberculosis (and in some cases the genus Mycobacteria). Our ability to accurately predict the antibiogram of a sample from its genetics has been boosted in recent years by the systematic sequencing and drug susceptibility testing of large numbers of clinical samples by projects such as the Comprehensive Resistance Prediction for Tuberculosis: an International Consortium (CRyPTIC). CRyPTIC collected over 15,000 clinical samples via 14 laboratories based in 11 countries 5. This and other publicly available datasets enabled the World Health Organization to release in 2021 the first catalogue of mutations in M. tuberculosis complex associated with drug resistance 6. A second edition, which we will call WHOv2, was released in late 2023 7. Both editions take the form of a traditional text-based report (which contains some expert rules) and an accompanying detailed Excel worksheet (with more rules), with the second edition also including a variant call file. Each edition used a single dataset to both infer the association of genetic variants with antibiotic resistance and estimate the performance of the resulting catalogue.

      A key obstacle hindering the uptake of WGS is the requirement to write, host and maintain a computational workflow that processes the raw genetic reads from a clinical sample, yielding the relevant public health and clinical information; this may include removing any human reads to ensure patient privacy, identifying contaminants and what species and/or mixtures are present before producing a consensus genome from which mutations in resistance genes can be derived and then looked up in a resistance catalogue. Using cloud computing is an especially appealing solution since the entire pipeline and computing infrastructure required to run it can be hosted ‘elsewhere’ – all the user has to do is upload the samples which, admittedly, can pose additional challenges if the network is neither resilient nor fast enough. Other solutions include ‘on device’ computing or some variation of edge computing, but these all suffer from their own challenges and will not be discussed further.

      In this paper we shall validate the antibiotic resistance prediction capabilities of a Mycobacterial pipeline we have developed that has been implemented in a cloud-based platform that is free for academics and LMICs. The ability of the pipeline to detect different Mycobacterial subspecies and identify members of the M. tuberculosis complex that are putatively related and therefore could be part of the same outbreak lies outside the scope of this work. At present, the pipeline consumes short genetic reads (e.g. from Illumina sequencers) and work is underway to validate long reads (e.g. from Oxford Nanopore Technology sequencers). In particular we shall test our implementation of WHOv2, investigate the effect of some enhancements we have made and evaluate performance on our Diverse Testset of 2,663 M. tuberculosis samples. Our hope is that cloud-deployed pipelines, such as the one showcased here, will encourage the uptake of WGS by public health bodies for M. tuberculosis, especially in LMICs which have the opportunity to ‘leap-frog’ several technologies in one go.

      Materials and Methods

      Sample selection

      A total of 11,887 samples were identified from the publicly available CRyPTIC dataset 5 that (i) had been whole genome sequenced using short-read (Illumina) technologies and (ii) had minimum inhibitory concentrations (MICs) to 13 different antibiotics measured using a bespoke 96-well broth microdilution (BMD) plate. Each sample was sequenced and incubated on a BMD plate as described previously 5,8. In addition to the MICs measured visually by the laboratory scientist, photographs of each plate were image processed 9 and classified by at least 11 volunteers as part of a citizen science project 10. All MICs where two or three of these independent measurements agreed were annotated as high confidence MICs – these are assumed to have reduced measurement error. Two plate designs were used (UKMYC5 & UKMYC6), each of which included 13 antibiotics: amikacin, bedaquiline, clofazimine, delamanid, ethambutol, ethionamide, isoniazid, kanamycin, levofloxacin, linezolid, moxifloxacin, rifabutin and rifampicin. All MICs were converted to a binary Resistant/Susceptible classification using a set of research ECOFFs 8. All 13 antibiotics bar rifabutin are included in the WHOv2 catalogue. The WHOv2 catalogue also includes three drugs not present on the plates: pyrazinamide, capreomycin and streptomycin. We therefore also identified a further 10,606 publicly available samples 5 which had at least one binary phenotype measured via the MGIT960 system.

      Construction of the Diverse Testset

      The Diverse Testset has two competing aims: (i) have as close to 50% resistance / 50% susceptibility for all drugs to maximise resolution whilst also (ii) being as small as possible to enable rapid, repeated testing. The difficulty is, of course, that each sample chosen brings phenotypic information for more than one drug, which makes it hard to achieve the first criterion. Finally there is a risk of introducing bias if all samples have some degree of resistance. We therefore arbitrarily decided to create an initial dataset of 1,000 samples with phenotypes for the 13 drugs on the UKMYC plate designs, ensuring that 200 of these were pan-susceptible.

      Since the CRyPTIC project collected very few samples resistant to the new and repurposed drugs, we first selected all 284 samples which were assessed as resistant to one or more of bedaquiline (n=65), linezolid (117) and delamanid (140). Additional samples were then randomly selected if they (i) were resistant to the next drug with fewest resistant samples, (ii) were not resistant to any drugs that had already reached 50% penetration in the dataset and (iii) were drawn from the remaining 100 samples with the greatest number of high confidence MICs. This process was repeated until 800 samples had been chosen whereupon a further 200 pan-susceptible samples were chosen from the 1,000 pan-susceptible samples with the greatest number of high confidence MICs.
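
      The selection procedure above can be sketched as a greedy loop. This is an illustrative sketch only and is not the code used to build the Diverse Testset: the function name build_testset, the data layout and the parameter cap (a fixed count standing in for "50% penetration") are all assumptions, as is the reading that "the next drug with fewest resistant samples" refers to the samples chosen so far.

```python
import random


def build_testset(samples, drugs, seed_drugs, n_resistant, n_susceptible,
                  cap, pool_size=100, susceptible_pool=1000, seed=0):
    """Greedy sketch of the selection described in the text.

    samples: dict mapping sample id -> (set of drugs it is resistant to,
             number of high-confidence MICs). Hypothetical layout.
    cap:     number of resistant samples at which a drug counts as saturated
             (an assumed stand-in for "50% penetration").
    """
    rng = random.Random(seed)
    seed_set = set(seed_drugs)
    # Step 1: keep every sample resistant to a new or repurposed drug.
    chosen = sorted(s for s, (res, _) in samples.items() if res & seed_set)
    chosen_ids = set(chosen)

    while len(chosen) < n_resistant:
        n_res = {d: sum(d in samples[s][0] for s in chosen) for d in drugs}
        saturated = {d for d in drugs if n_res[d] >= cap}
        picked = None
        # Step 2: try drugs in order of fewest resistant samples so far.
        for drug in sorted(set(drugs) - saturated, key=lambda d: (n_res[d], d)):
            eligible = [s for s, (res, _) in samples.items()
                        if s not in chosen_ids and drug in res
                        and not (res & saturated)]
            # Restrict to the samples with the most high-confidence MICs.
            eligible.sort(key=lambda s: (-samples[s][1], s))
            pool = eligible[:pool_size]
            if pool:
                picked = rng.choice(pool)
                break
        if picked is None:  # nothing left that satisfies the criteria
            break
        chosen.append(picked)
        chosen_ids.add(picked)

    # Step 3: add pan-susceptible samples drawn from the best-measured ones.
    pan_s = sorted((s for s, (res, _) in samples.items() if not res),
                   key=lambda s: (-samples[s][1], s))[:susceptible_pool]
    susceptible = rng.sample(pan_s, min(n_susceptible, len(pan_s)))
    return chosen, susceptible
```

      Under these assumptions, the numbers in the text would correspond to n_resistant=800, n_susceptible=200, pool_size=100 and susceptible_pool=1000.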

      Despite our efforts, there are fewer than 250 resistant samples for bedaquiline, linezolid, and delamanid in the UKMYC dataset (Table 1). We therefore repeated the process of iteratively selecting individual samples, this time from the dataset of 10,606 samples with MGIT DST data, the main differences being (i) we only considered pyrazinamide, capreomycin, streptomycin, linezolid and bedaquiline and (ii) each sample had phenotypic drug susceptibility results for a variable number of these drugs and therefore we tried to maximise the number of antibiotics per sample. After de-duplication with the UKMYC dataset and verification that all samples had raw genetic files (FASTQ) available in the European Nucleotide Archive (ENA), this led to a dataset of 1,663 samples. The aggregated dataset of 2,663 samples therefore contains at least 250 resistant samples for all drugs (Table 1), except linezolid (n=117) and delamanid (140).

      Table 1:

      The proportion of resistance by drug in the 1,000 UKMYC and 1,663 MGIT samples. Due to having the highest prevalence of resistance in the CRyPTIC dataset, only isoniazid reached 50% resistance. A key difference between the two datasets is that each UKMYC sample has an MIC and thence a binary phenotype for 13 drugs whereas the MGIT samples have between one and four binary phenotypes, with two being the most common (702 samples).

      Processing of the Diverse Testset by EIT Pathogena

      In brief, all samples were uploaded to EIT Pathogena (https://www.eit-pathogena.com) using its command line interface (CLI). This ensured reads matching the human genome were removed using the hostile 11 algorithm prior to upload. Upon arrival in the cloud all samples undergo a second round of decontamination with hostile 11, then poor quality/short reads are discarded before the number of reads belonging to the Mycobacterium genus is assessed using kraken2 in conjunction with the Standard index from June 2023 12. If a sample has over 10,000 Mycobacterial (and unclassified) reads it is progressed. These reads are then competitively mapped to a curated list of 186 Mycobacterial genomes using minimap2 (v2.24-r1122) 13 to provide fine-grained speciation. Lastly, in the case of complexes the reads are then examined by mykrobe (v0.13.0) 14 which classifies reads down to the level of lineage (in the case of Mycobacterium tuberculosis complex) or subspecies (for Mycobacterium avium complex and Mycobacterium abscessus complex). Reads are then mapped to version 3 of the M. tuberculosis H37Rv reference genome 15-17 using clockwork (v0.12.3) 18 which in turn uses minimap2 13 to build a pile-up and then both samtools (v1.15.1) 19 and cortex 20 to call variants, the former being better at identifying SNPs and the latter insertions and deletions; minos 21 adjudicates when there is overlap. Clockwork was used by the CRyPTIC project 5 and also for all variant calling in the first edition of the WHO catalogue of resistance-associated variants 6. The variant call file is used by the resistance prediction process, which is described below, whilst the genome is passed to a novel algorithm, FindNeighbour5 (v2.0.2), which rapidly returns a list of samples within 20 SNPs that therefore could be epidemiologically related.

      The entire pipeline is deployed within the EIT Pathogena cloud platform that, through the use of industry standard technologies like Kubernetes and object storage, is able to scale with demand, and all data are resiliently stored and encrypted at rest.

      Translation of the second edition of the WHO catalogue of resistance-associated variants in M. tuberculosis and resistance prediction

      As mentioned, the second edition of the WHO catalogue of resistance-associated variants comprises three artefacts: no single artefact contains all the rules in the catalogue and only the Excel and VCF files are parsable by computer code. The Excel file adopts the HGVS nomenclature for describing genetic variants but this is unable to encode some of the broader rules found in the catalogue such as “any frameshift in gene X” which is a key component of the new Loss of Function rules. The CRyPTIC project 8 developed a grammar, GARC, specifically for this purpose and we therefore translated the Excel file into the GARC format 22 that can be read and understood by piezo 23. A formal definition of the GARC grammar is provided via its Backus-Naur Form in the Supplemental Methods. Code changes were required to incorporate the new epistatic rules introduced in WHOv2. Our catalogue maps Groups 1 & 2 onto Resistant (R) and Groups 4 & 5 onto Susceptible (S) as suggested by the WHO, with Group 3 being labelled Unknown (U).
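
      The mapping from WHO confidence groups to predictions described above amounts to a small lookup table. The sketch below is illustrative only: the names GROUP_TO_PREDICTION and translate_rows are assumptions, the variant labels in the usage example are placeholders rather than real catalogue entries, and no attempt is made to reproduce GARC syntax.

```python
# WHO confidence group -> prediction, as described in the text:
# Groups 1 & 2 -> Resistant, Group 3 -> Unknown, Groups 4 & 5 -> Susceptible.
GROUP_TO_PREDICTION = {1: "R", 2: "R", 3: "U", 4: "S", 5: "S"}


def translate_rows(rows):
    """Turn (variant, WHO group) pairs into a variant -> prediction table.

    `rows` is a hypothetical, already-parsed view of the catalogue worksheet.
    """
    return {variant: GROUP_TO_PREDICTION[group] for variant, group in rows}


if __name__ == "__main__":
    example = [("variant_1", 1), ("variant_2", 3), ("variant_3", 5)]
    print(translate_rows(example))
```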

      There are several important differences in our implementation of WHOv2. First, our catalogue will report as Unknown any mutation that is in a gene known to be associated with resistance but is not listed in the catalogue. Second, if there are two or fewer reads at a genetic locus known to be associated with resistance our catalogue will return a result of Fail, since there is not enough information to know if the sample is resistant or not and it probably needs resequencing. Neither scenario is covered by WHOv2 and therefore both would be predicted susceptible without these rules. Finally, we call any resistance-associated variant listed in the catalogue if it is supported by three or more reads, regardless of how many other (usually wild-type) reads are also found at that locus. Minor alleles with resistance-associated variants are therefore detected; by contrast 75% of reads were required to support a genetic variant for it to be considered for the WHOv2 catalogue and there is no guidance on what threshold should be applied when detecting variants listed in WHOv2.
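
      The three rules above can be illustrated with a per-locus sketch. This is not the piezo or gnomonicus implementation: the function call_locus, its arguments and the representation of a locus as a dictionary of read depths per allele are assumptions made for illustration. The 0.9 fraction for ordinary (non-minor) calls is also an assumption, chosen to mirror the default clockwork filter mentioned in the Results.

```python
def call_locus(allele_depths, catalogue, ref_allele,
               min_reads=3, major_fraction=0.9):
    """Return 'R', 'S', 'U' or 'F' for one resistance-associated locus.

    allele_depths: dict allele -> number of supporting reads (hypothetical).
    catalogue:     dict allele -> 'R', 'S' or 'U' for catalogued variants.
    """
    total = sum(allele_depths.values())
    # Fail rule: two or fewer reads (including zero) at the locus.
    if total < min_reads:
        return "F"
    # Minor-allele rule: a catalogued resistant variant with three or more
    # reads is called regardless of how many other reads are present.
    for allele, depth in allele_depths.items():
        if catalogue.get(allele) == "R" and depth >= min_reads:
            return "R"
    # Otherwise only a confidently called major allele is considered.
    allele, depth = max(allele_depths.items(), key=lambda item: item[1])
    if allele == ref_allele or depth / total < major_fraction:
        return "S"
    # Unknown rule: a mutation that is not listed in the catalogue.
    return catalogue.get(allele, "U")
```

      For example, with a catalogue containing only {"S450L": "R"}, a locus with 40 reference reads and 3 reads supporting S450L would be called Resistant, whereas the same locus with only 2 supporting reads would be called Susceptible.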

      Our translation of the WHOv2 catalogue 24, the variant call file produced by clockwork and version 3 of the reference H37Rv genome (as a GenBank file) are then ingested by gnomonicus (v2.6.8) 25, which produces lists of the genetic variants detected in the sample (translated into amino acid mutations where appropriate), their predicted effects on antibiotics as specified by the catalogue and finally the resulting predicted antibiogram.

      Data availability and reproducibility

      Once processing was complete, two output files for each sample were downloaded using the same CLI that handled uploading; one of these contains general information about the sample whilst the second contains more detailed information about resistance prediction. Both are structured using the JavaScript Object Notation (JSON) standard and all 5,326 files are stored in the attendant GitHub repository*. This repository contains a series of Jupyter notebooks containing Python3 code that allows the user to discover, parse and save as data tables the relevant information in the JSON files. Other notebooks allow the user to reproduce all the analysis underlying this work, including reproducing the figures and many of the tables. The repository also lists the 2,663 samples that make up our Diverse Testset and includes bash scripts allowing all the FASTQ files to be downloaded from the ENA with the intention that people can either reproduce our results, or use the same dataset for other analyses.
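
      As an illustration of the kind of parsing the notebooks perform, the sketch below walks a folder of JSON files and flattens them into rows suitable for a data table. The key names sample_id and antibiogram, and the function load_antibiograms, are hypothetical; the actual structure of the output files should be taken from the repository itself.

```python
import json
from pathlib import Path


def load_antibiograms(folder):
    """Flatten per-sample JSON files into one row per (sample, drug).

    Assumes each file looks like
    {"sample_id": "...", "antibiogram": {"<drug>": "R" | "S" | "U" | "F"}},
    which is a made-up layout for illustration.
    """
    rows = []
    for path in sorted(Path(folder).glob("*.json")):
        with open(path) as handle:
            record = json.load(handle)
        for drug, prediction in sorted(record["antibiogram"].items()):
            rows.append({"sample": record["sample_id"],
                         "drug": drug,
                         "prediction": prediction})
    return rows
```

      The resulting list of dictionaries can be written out with csv.DictWriter or passed to a data-frame library.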

      Results

      Some antibiotics perform slightly better, some slightly worse if we apply a Simple implementation of the WHOv2 catalogue

      Since the WHOv2 catalogue does not provide any guidance on whether to detect minor alleles or what to do if there are insufficient reads at a locus associated with resistance, we shall start by ignoring both of these effects and simply calculate the performance on our Diverse Testset – we shall call this the Simple implementation (Fig. 1 and Table 2). The only basis for comparison we have available is the published performance of WHOv2 on its own Training Set 7, which is an imperfect comparison since the datasets are different. Comparing to the Simple implementation applied to our Diverse Testset (Fig. 1) we find six antibiotics (rifampicin, isoniazid, streptomycin, amikacin, kanamycin, capreomycin) have sensitivities within ±1.5% of that reported by the WHO, two antibiotics (pyrazinamide, ethambutol) have sensitivities that are higher by more than 1.5% and the remaining seven drugs (bedaquiline, linezolid, moxifloxacin, levofloxacin, clofazimine, delamanid, ethionamide) all record sensitivities that are lower by more than 1.5% than seen on the WHOv2 Training Dataset. The variation in specificity is lower: two antibiotics (ethionamide, kanamycin) record specificities more than 1.5% higher than reported for WHOv2, with one drug (ethambutol) lower. We cannot conclude what is responsible for the differences and hence all options – from implementation errors to differences between the datasets – remain valid.

      Table 2:

      The performance of the WHOv2 catalogue on the Diverse Testset of 2,663 M. tuberculosis samples. Nulls+Minors includes rules that will Fail a drug if there are two or fewer reads at a locus associated with resistance and will call Resistant if a resistance-associated variant is supported by three or more reads, even as a minor allele. Simple excludes these rules. Nulls+Minors (High confidence) applies the same catalogue, but to the subset of the Diverse Testset with MICs on which two or more independent reading methods concur (high confidence). This is only possible for antibiotics with UKMYC DST data – antibiotics marked with an asterisk used simple binary MGIT DST and therefore have no value for this last category. Confidence limits were estimated by bootstrapping.
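
      For readers unfamiliar with the metrics in Table 2, the sketch below shows one way sensitivity, specificity and bootstrapped confidence limits could be computed for a single drug. It is a generic percentile bootstrap written for illustration and is not the analysis code from the repository; the function names, the number of resamples and the choice to count any prediction other than R as susceptible (so that U is grouped with S, with Fails assumed to have been removed beforehand) are assumptions.

```python
import math
import random


def sensitivity_specificity(pairs):
    """pairs: list of (prediction, phenotype); phenotype is 'R' or 'S'."""
    tp = sum(p == "R" and t == "R" for p, t in pairs)
    fn = sum(p != "R" and t == "R" for p, t in pairs)
    tn = sum(p != "R" and t == "S" for p, t in pairs)
    fp = sum(p == "R" and t == "S" for p, t in pairs)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


def bootstrap_interval(pairs, metric_index, n_resamples=1000,
                       alpha=0.05, seed=0):
    """Percentile bootstrap interval for sensitivity (0) or specificity (1)."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_resamples):
        resample = rng.choices(pairs, k=len(pairs))  # sample with replacement
        value = sensitivity_specificity(resample)[metric_index]
        if not math.isnan(value):  # skip resamples lacking the relevant class
            values.append(value)
    values.sort()
    lower = values[int((alpha / 2) * len(values))]
    upper = values[int((1 - alpha / 2) * len(values)) - 1]
    return lower, upper
```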

      Figure 1:

      When tested on the 2,663 validation samples, the Mycobacterial Genetics Pipeline produces AMR prediction performance comparable to that reported for the WHOv2 catalogue. The (A) sensitivity and (B) specificity values for the 15 antibiotics included in the WHOv2 catalogue. The grey bars are the performance reported on the WHO Training Set and, since they are calculated on a different, larger dataset, are not directly comparable. Four drugs (marked with an asterisk) have phenotypes from MGIT testing. Three values are reported for the Mycobacterial Genetics Pipeline: (1) a simple translation of the WHOv2 catalogue, (2) then allowing resistant minor alleles with three or more reads to be called and, lastly, (3) only comparing to high confidence MICs in addition to calling minor alleles. Since the latter can only be calculated for the antibiotics with UKMYC MICs, no values can be reported for the four drugs measured by MGIT. Mantel-Haenszel statistics were calculated for all comparisons. Comparisons with p-values less than 0.05 and 0.01 are annotated with one or two asterisks, respectively.

      Failing samples with insufficient reads at loci associated with resistance improves performance

      If there are insufficient reads at a genetic locus one does not know if the nucleotide is the same as the reference, a SNP or even a deletion, perhaps leading to a frame-shift. If the genetic locus is also one associated with resistance then the likelihood is such that even in the absence of genetic information the sample has a reasonable probability of being resistant to the relevant antibiotic. This is a qualitatively different outcome to R, S or U since re-sequencing is likely to result in more reads at that position, allowing a definite result to be returned. In our implementation of the WHOv2 catalogue, any genetic locus in a sample which has two or fewer (including zero) reads but could be a resistance-associated variant (RAV) is described as a Fail (F).

      Adding these rules yields the Nulls implementation. In 2,663 samples 142 Fails are generated from 125 genetic loci which affect the antibiogram of 33 (1.2%) samples. The majority of Fail calls occur in the rrs gene (82) followed by rpoB (19), pncA (8) and embB (4). The resulting changes to the performance of streptomycin, rifampicin, pyrazinamide and ethambutol are small and lie within error and are therefore not shown. Note that the majority of loci in rrs with Fails do not contribute to amikacin and kanamycin resistance according to WHOv2 and therefore these antibiotics are, perhaps surprisingly, not affected.

      Identifying minor alleles containing resistance-associated variants improves performance

      Thus far all genetic variants have been identified using the default filter in the clockwork pipeline which requires 90% of reads to support a variant for it to be called. Note that this threshold is the same as used in WHOv1 6 but higher than the 75% used in WHOv2 7. Let us now add rules to our implementation of the WHOv2 catalogue such that any RAV supported by three or more short reads, regardless of how many reads are in support of different (usually wild-type) alleles, is called Resistant: this is the Nulls+Minors implementation. The sensitivity of every antibiotic increases (Fig. 1 and Table 2) with values ranging from +0.8% (streptomycin) to +9.9% (capreomycin). Eight antibiotics experienced an increase of sensitivity of 2% or more (pyrazinamide +3.4%, bedaquiline +4.9%, linezolid +5.1%, moxifloxacin +4.6%, levofloxacin +4.7%, amikacin +2.6%, kanamycin +2.5% and capreomycin +9.9%) although only the increase seen for capreomycin is statistically significant. Only isoniazid experienced a reduction in specificity of more than 1% (-1.1%); this is not statistically significant. Including minor alleles therefore appears to bring the performance of our implementation of the WHOv2 catalogue on our 2,663 sample Diverse Testset closer to that reported by the WHO 7, but that is a spurious comparison because neither the dataset nor the implementation is the same.

      Discrepancy analysis on UKMYC dataset

      There are many reasons why we cannot perfectly predict the antibiogram for each sample; these include sample mislabelling, not all genes or genetic variants having been classified, DST measurement errors and genetics not being a perfect predictor of phenotype. We are fortunate in that the UKMYC dataset of 1,000 samples had images of all the 96-well plates from when they were read by the laboratory scientist after two weeks incubation. As mentioned in the Methods, this allowed the CRyPTIC project to produce high-confidence MICs where two or more independent methods agreed on the value 5,8, thereby reducing measurement error. We will therefore first examine the effect of only using these high-confidence MICs.
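
      The notion of a high-confidence MIC can be made concrete with a short sketch. It assumes up to three independent readings per drug (laboratory scientist, image processing and citizen-science consensus) and marks an MIC as high confidence when at least two agree. The function name consensus_mic and the fallback to the first reading when there is no agreement are assumptions for illustration, not the CRyPTIC implementation.

```python
from collections import Counter


def consensus_mic(readings):
    """Return (mic, is_high_confidence) from independent MIC readings.

    readings: list of MIC values, with None where a method gave no result.
              The first entry is assumed to be the laboratory scientist's.
    """
    values = [r for r in readings if r is not None]
    if not values:
        return None, False
    mic, count = Counter(values).most_common(1)[0]
    if count >= 2:  # two or three methods agree
        return mic, True
    # No agreement: fall back to the first reading (an assumed convention).
    return readings[0], False
```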

      Only using high confidence UKMYC phenotypes improves performance

      The sensitivities of ten of the 11 antibiotics for which we have UKMYC data increase (clofazimine being the exception) compared to the Nulls+Minors dataset, with increases ranging from +1.4% for rifampicin to +14.4% for linezolid (Fig. 1 and Table 2). Specificity is largely unchanged with only ethambutol (-2.7%) changing by more than an absolute percentage point. Only the increases seen for amikacin and kanamycin are statistically significant; if we compare back to the Simple dataset then the increases for isoniazid, linezolid, moxifloxacin, levofloxacin, amikacin and kanamycin are statistically significant. No changes in specificity are statistically significant. This suggests that measurement error is partly constraining the measured performance of the WHOv2 catalogue. Notably both rifampicin and isoniazid now achieve sensitivities and specificities above 95% which is a requirement to pass ISO 20776-2:2021 26. The number of MICs discarded varies by antibiotic, with isoniazid having 906 high-confidence measurements and clofazimine only 676. This tells us something about the relative difficulties of reading MICs for the different drugs which could, in part, be due to the differing levels of growth observed on the UKMYC 96-well plates.

      Discrepants tend to have lower growth and/or MICs near the ECOFF/ECV

      Even after subsetting down to only consider high-confidence MICs, our hypothesis is that discrepant samples are more likely to have poor bacterial growth on the UKMYC plates after two weeks incubation as this would affect all measurement methods. Examining the distributions of bacterial growth (averaged from the two positive control wells) shows that this is not true (Fig. 2) as in all cases, whilst there is large variation between samples, there is no significant difference in the distributions for e.g. the True Positives (RR) and False Positives (RS).

      Figure 2:

      Examining the discrepancies for (A) rifampicin, (B) ethambutol, (C) moxifloxacin and (D) delamanid shows that they do not arise because these samples grew less well on the UKMYC plates and were therefore more difficult to measure. For some drugs (e.g. ethambutol), however, a majority of the discrepant samples have MICs close to the ECOFF/ECV, whilst for other drugs (e.g. delamanid) it appears that the genetic basis for resistance is not yet fully understood. All genetic predictions used the Nulls+Minors implementation of the WHOv2 catalogue and only high confidence MICs were used. MIC distributions are shown for just one of the two 96-well plate designs used.

      Lastly, whilst the original measurement was an MIC, this was binarised using an ECOFF/ECV 8, and discrepants could arise if their MIC was close to the ECOFF/ECV such that it was within measurement error. Results vary by drug (Fig. 2), but this appears at least partly true for ethambutol, moxifloxacin and delamanid. It is clear that the less bimodal the MIC distribution, the more likely this effect will lead to misclassification and thereby discrepants.
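
      The effect of thresholding can be illustrated with a small sketch that binarises an MIC against an ECOFF/ECV and counts how many doubling dilutions of measurement error would be needed to flip the result. The function names are assumptions, as is the convention that an MIC at or below the ECOFF is Susceptible; any numbers passed in are purely illustrative and are not the research ECOFFs used in this work.

```python
import math


def binarise(mic, ecoff):
    """Resistant if the MIC lies above the ECOFF/ECV, otherwise Susceptible."""
    return "R" if mic > ecoff else "S"


def dilutions_from_flip(mic, ecoff):
    """Doubling dilutions of error needed to change the binary phenotype.

    An MIC equal to the ECOFF (S) or one dilution above it (R) both return 1,
    i.e. a single-dilution misreading would flip the classification.
    """
    steps = math.log2(mic / ecoff)
    return steps if steps > 0 else 1 - steps


def is_borderline(mic, ecoff, tolerance=1):
    """True if the MIC is within `tolerance` doubling dilutions of flipping."""
    return dilutions_from_flip(mic, ecoff) <= tolerance
```

      With a tolerance of one doubling dilution, which is an assumed level of measurement error, samples flagged by is_borderline are those whose discrepancy could plausibly be explained by thresholding alone.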

      Discussion

      We have validated the antimicrobial resistance prediction functionality of our Mycobacterial genetics processing pipeline that has been implemented in a cloud-based platform. Crucially, the two main antitubercular drugs, rifampicin and isoniazid, achieve sensitivities and specificities above 95% which is required by ISO 20776-2:2021 26. Whilst the resistance prediction applies an accepted and well-known catalogue – the second edition of the WHO catalogue of M. tuberculosis RAVs 7 – we have had to, in effect, translate the catalogue since the published catalogue is not a single artefact that is computer-parsable and therefore we are testing two distinct statements: (1) that our translation is correct and (2) that the reported performance for WHOv2 7 is representative of what would be expected in the clinic.

      Unfortunately the samples that make up the WHOv2 Training Set are not (yet) publicly available. There are likely very good reasons for this, for example data owners may be willing to share samples with the WHO but not the public, but it does prevent researchers, such as ourselves, from reproducing the analysis that led to WHOv2, which is important to find errors and gain trust. We cannot therefore formally disentangle whether the observed differences are due to differences between the WHO Training dataset and our publicly-available Diverse Testset, problems with parsing the catalogue, or assembly and variant calling differences. Nevertheless, using our diverse dataset of 2,663 M. tuberculosis samples we can achieve similar performance for many of the 15 antibiotics covered by WHOv2. The specificities of bedaquiline, linezolid, clofazimine and delamanid, however, remain low and variable and clearly more resistant samples are needed for the next iteration of the WHO catalogue.

      Since there is no guidance on how to apply the WHOv2 to a WGS sample we have chosen to make some enhancements; the first is to explicitly flag genetic loci which are associated with resistance when there are insufficient reads to identify the allele. These we call Fails and, whilst small in number (1.2% of samples had at least one F), they correlate with poor sequencing quality and usually indicate a sample needs to be re-sequenced. The second improvement is to allow any resistance-associated variant to be identified if it is supported by at least three (short) reads, thereby allowing minor alleles to contribute to resistance prediction. This has been shown to boost the sensitivity for fluoroquinolones 27 and rifampicin 28 and the WHOv2 report also showed how lowering the fraction of reads supporting a call often increased the sensitivity 7. Eight antibiotics saw an increase in sensitivity of more than two absolute percentage points but this was only statistically significant for capreomycin; a larger dataset would be needed to draw definite conclusions. Only one drug saw a drop in specificity of more than one absolute percentage point and this was not statistically significant. Detecting minor alleles is not yet commonplace in WGS, but happens as a matter of course in targeted NGS approaches and nucleic acid amplification tests. In a further enhancement to the WHOv2, we also classify mutations in genes known to be associated with resistance as Unknown even if they are not listed in the catalogue. This is possible because our GARC grammar (Supplemental Information) allows wildcards such that our catalogues can contain a single rule encoding logic like “any missense mutation in the coding region of gene X” which, in addition to a way of prioritising rules, is necessary to enable this functionality.

      Reducing measurement error by only using MICs in which two or more methods concur further improves sensitivity, suggesting that our Diverse Testset (like all others) contains some measurement errors. Statistically significant increases were observed for six antibiotics compared to the results obtained using the Simple Dataset. Finally we note that some of the remaining discrepancies are likely, in part, a natural consequence of thresholding MICs which have a level of natural variability and this is particularly pronounced for antibiotics where the MIC distribution is not bimodal, such as ethambutol and the fluoroquinolones.

      To permit comparison with the stated performance of WHOv2, we have grouped samples predicted to be Susceptible to an antibiotic along with those which contain a genetic variant with no definite classification: we call these U, for Unknown. Aggregating Susceptible and Unknown results like this is reasonable for antibiotics where the majority of the genetic determinants have been discovered, but for drugs like bedaquiline or even pyrazinamide it breaks down because a novel mutation has a reasonable probability of being associated with resistance. For these drugs at least, it seems sensible to report an Unknown result for the drug, rather than assume it is Susceptible. Other studies have shown how, in some cases, one can take advantage of the correlations between different drugs to infer that they are likely Susceptible 29.
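
      The effect of grouping Unknown with Susceptible on the headline metrics can be made concrete with a short Python sketch. The function name, the input format and the decision to exclude Fail predictions from the calculation are assumptions made for illustration, not a description of the analysis code in the repository.

```python
from typing import Iterable, Tuple


def sensitivity_specificity(
    pairs: Iterable[Tuple[str, str]],
    unknown_as_susceptible: bool = True,
) -> Tuple[float, float]:
    """Compute (sensitivity, specificity) for one drug.

    pairs: (predicted, phenotype) with predicted in {"R", "S", "U", "F"}
    and phenotype in {"R", "S"}.
    """
    tp = fn = tn = fp = 0
    for predicted, phenotype in pairs:
        if predicted == "F":
            continue  # assumption: Fails are excluded from the metrics
        if predicted == "U":
            if not unknown_as_susceptible:
                continue  # report U separately instead of scoring it
            predicted = "S"  # group Unknown with Susceptible
        if phenotype == "R":
            if predicted == "R":
                tp += 1
            else:
                fn += 1  # very major error: resistant sample called S
        else:
            if predicted == "R":
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity
```

      With `unknown_as_susceptible=True`, every phenotypically resistant sample carrying only unclassified variants counts as a missed resistance call, which is why the aggregation penalises drugs whose resistance determinants are still poorly characterised.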

      Thus far we have employed a four-valued logic: Resistant, Susceptible, Unknown and Fail. The Unknown value in this logic is, in fact, composed of two distinct cases. These are delineated by whether the genetic variant has been seen in sufficient clinical samples to have adequate statistical support or not. In the latter case the label is transitory; collecting more samples will improve the statistics and the variant will become associated with a definite label. We suggest this case retains the Unknown label. The former corresponds to genetic variants, such as M306V/I in embB, where the minimum inhibitory concentration distribution straddles the ECOFF/ECV. Collecting more samples changes nothing and one can argue that another definite value, distinct from Resistant or Susceptible, is therefore needed. Naming such a value is controversial (e.g. Intermediate) but its existence is unavoidable. One can also argue that it would be confusing or inappropriate to adopt a five-valued logic, so we have kept here to a four-valued logic (Resistant, Susceptible, Unknown and Fail). Fortunately, the requirement to minimise the very major error rate gives us a natural priority order in our GARC grammar: Resistant > Fail > Unknown > Susceptible.
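
      The priority order Resistant > Fail > Unknown > Susceptible can be expressed as a simple reduction over the per-variant calls for a drug. The following Python sketch is illustrative only: the `Call` enumeration, the `combine_calls` function and the default of Susceptible when no relevant variants are observed are assumptions, not the pipeline's code.

```python
from enum import IntEnum
from typing import Iterable


class Call(IntEnum):
    """Per-variant calls; a larger value means a higher priority."""
    S = 0  # Susceptible
    U = 1  # Unknown
    F = 2  # Fail
    R = 3  # Resistant


def combine_calls(calls: Iterable[Call]) -> Call:
    """Reduce all per-variant calls for one drug to a single prediction.

    Assumption: a sample with no relevant variants is reported as S.
    """
    return max(calls, default=Call.S)
```

      For example, a sample with one Susceptible variant, one Unknown variant and one failed locus would be reported as Fail, and adding a single Resistant variant would change the result to Resistant.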

      Clearly it would have been preferable to isolate the performance of our implementation of the WHOv2 catalogue in our Mycobacterial pipeline from the performance of the WHOv2 catalogue itself, but this was not possible due to the lack of publicly available datasets. In addition, whilst we have described our 2,663 samples as a Diverse Testset, it is likely not truly independent since at least the 1,000 UKMYC samples form part of the WHO Training Set. Finally, despite our best efforts, there are insufficient samples resistant to bedaquiline, linezolid and delamanid in the Diverse Testset; additional, targeted sample collection with better sharing is the only answer here.

      Despite these shortcomings, we hope that this dataset of 2,663 M. tuberculosis samples will be of use to other researchers and could even form the kernel for a standard testset upon which new tools or implementations of catalogues could be tested and preliminary performance reported. All samples and their results can be downloaded using the attendant GitHub repository* which also contains code allowing all analysis to be repeated and all figures to be redrawn.

      Supporting information

      Supplemental Information [supplements/622466_file02.pdf]

      Funding

      The authors would like to acknowledge funding from the National Institute for Health Research (NIHR) Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance (NIHR200915), a partnership between the UK Health Security Agency (UKHSA) and the University of Oxford, the National Institute for Health Research (NIHR) Oxford Biomedical Research Centre (BRC) and the Ellison Institute of Technology, Oxford Ltd. For the purpose of open access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission. The findings and conclusions in this report are solely the responsibility of the authors and do not necessarily represent the official views of the NHS, the NIHR, UKHSA, the Department of Health and Social Care or the Ellison Institute of Technology, Oxford Ltd.

      Ethics

      All samples (both genetics and drug susceptibility data) were downloaded from public repositories. Ethics approval was previously obtained by the CRyPTIC project 5.

      Author contributions

      TEAP, DWC and PWF conceived of the design. JW, CB, MB, MC, BC, AH, JK, MLA, RS, HT, TEAP, DWC, RT and PWF built and tested the Mycobacterial pipeline. SS and KG contributed to the co-ordination and development of EIT Pathogena software.

      Conflict of Interest

      SS and KG were employed by the Ellison Institute of Technology, Oxford Ltd. DWC and PWF receive consultancy fees from the Ellison Institute of Technology, Oxford Ltd.

      Acknowledgements

      We are grateful to EIT Pathogena for helpful discussions and deploying our pipeline in their cloud platform and to ORACLE Corporation for access to their cloud.

      References

      [1].World Health Organization (2024) Global tuberculosis report. ISBN: 978-92-4-010153-1.

      [2].World Health Organization (2022) Rapid communication: key changes to the treatment of drug-resistant tuberculosis. Technical report, World Health Organization.

      [3].World Health Organization (2024) WHO Bacterial Priority Pathogens List: Bacterial pathogens of public health importance to guide research, development and strategies to prevent and control antimicrobial resistance. ISBN: 978-92-4-009346-1.

      [4].Walker TM, Cruz ALG, Peto TE, Smith EG, Esmail H, Crook DW (2017) Lancet Infec Disease 17:359–361.

      [5].The CRyPTIC Consortium (2022) PLOS Biology 20:e3001721.

      [6].World Health Organization (2021) Catalogue of mutations in Mycobacterium tuberculosis complex and their association with drug resistance. Technical report. ISBN: 978-92-4-002817-3.

      [7].World Health Organization (2023) Catalogue of mutations in Mycobacterium tuberculosis complex and their association with drug resistance, 2nd ed. Technical report. ISBN: 978-92-4-008241-0.

      [8].The CRyPTIC Consortium (2022) Eur Respir J 60:2200239.

      [9].Fowler PW, Gibertoni Cruz AL, Hoosdally SJ, Jarrett L, Borroni E, Chiacchiaretta M, Rathod P, Lehmann S, Molodtsov N, Grazian C, Walker TM, Robinson E, Hoffmann H, Peto TEA, Cirillo DM, Smith GE, Crook DW (2018) Microbiology 164:1522–1530.

      [10].Fowler PW, Wright C, Spiers-bowers H, Zhu T, Baeten EML, Hoosdally W, Lu A, Cruz G, Roohi A, Kouchaki S, Walker TM, Peto TEA, Miller G, Lintott C, Clifton D, Crook DW, Walker AS (2022) eLife 11:e75046.

      [11].Constantinides B, Hunt M, Crook DW (2023) Bioinformatics btad728.

      [12].Wood DE, Lu J, Langmead B (2019) Genome Biology 20:257.

      [13].Li H (2021) Bioinformatics 37:4572–4574.

      [14].Hunt M, Bradley P, Lapierre SG, Heys S, Thomsit M, Hall MB, Malone KM, Wintringer P, Walker TM, Cirillo DM, Comas I, Farhat MR, Fowler P, Gardy J, Ismail N, Kohl TA, Mathys V, Merker M, Niemann S, Omar SV, Sintchenko V, Smith G, Soolingen Dv, Supply P, Tahseen S, Wilcox M, Arandjelovic I, Peto TEA, Crook DW, Iqbal Z (2019) Wellcome Open Research 4:191.

      [15].Cole ST, Brosch R, Parkhill J, Garnier T, Churcher C, Harris D, Gordon SV, Eiglmeier K, Gas S, Barry CE, Tekaia F, Badcock K, Basham D, Brown D, Chillingworth T, Connor R, Davies R, Devlin K, Feltwell T, Gentles S, Hamlin N, Holroyd S, Hornsby T, Jagels K, Krogh A, McLean J, Moule S, Murphy L, Oliver K, Osborne J, Quail MA, Rajandream MA, Rogers J, Rutter S, Seeger K, Skelton J, Squares R, Squares S, Sulston JE, Taylor K, Whitehead S, Barrell BG (1998) Nature 393:537–544.

      [16].Camus JC, Pryor MJ, Médigue C, Cole ST (2002) Microbiology 148:2967–2973.

      [17].Lew JM, Kapopoulou A, Jones LM, Cole ST (2011) Tuberculosis 91:1–7.

      [18].Hunt M (2021). Clockwork: Pipelines for processing bacterial sequence data (Illumina only) and variant calling.

      [19].Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM, Li H (2021) GigaScience 10:giab008.

      [20].Iqbal Z, Caccamo M, Turner I, Flicek P, McVean G (2012) Nature Genetics 44:226–232.

      [21].Hunt M, Letcher B, Malone KM, Nguyen G, Hall MB, Colquhoun RM, Lima L, Schatz MC, Ramakrishnan S, Iqbal Z, CRyPTIC consortium (2022) Genome Biology 23:147.

      [22].Westhead J, Fowler PW (2024). Conversion of the WHO TB catalogue to GARC. https://github.com/fowler-lab/who_catalogue_conversion.

      [23].Westhead J, Fowler PW (2021). piezo: predicting the effect of a genetic mutation on an antibiotic. https://github.com/oxfordmmm/piezo.

      [24].Westhead J, Fowler PW (2021). Tuberculosis AMR catalogues in a standard grammar. https://github.com/oxfordmmm/tuberculosis_amr_catalogues.

      [25].Westhead J, Fowler PW (2023). gnomonicus. https://github.com/oxfordmmm/gnomonicus.

      [26].International Organization for Standardization (2021) Clinical laboratory testing and in vitro diagnostic test systems – Susceptibility testing of infectious agents and evaluation of performance of antimicrobial susceptibility test devices. Technical report.

      [27].Brankin AE, Fowler PW (2023) JAC-Antimicrobial Resistance 5:dlad039.

      [28].Brunner VM, Fowler PW (2025) BioRxiv preprint.

      [29].The CRyPTIC Consortium, 100000 Genomes Project (2018) New Eng J Med 379:1403–1415.