Functional Analysis Using Whole-Genome Sequencing of a Drug-Sensitive Mycobacterium tuberculosis Strain from Peru

We report the whole-genome sequence of INS-SEN, a drug-sensitive Mycobacterium tuberculosis strain of the Latin American-Mediterranean (LAM) lineage from Peru. Functional analysis revealed more mutations in secondary metabolite biosynthesis, transport, and catabolism (clusters of orthologous groups [COG] category Q) than in other drug-sensitive LAM strains. This study contributes to the understanding of the genomic diversity of drug-sensitive M. tuberculosis.

culosis (TB) worldwide. In Peru, the TB incidence rate was 95 cases per 100,000 people, and 96% of cases were drug-sensitive TB (1). A high diversity of Mycobacterium tuberculosis lineages has been reported in Peru, including Latin American-Mediterranean (LAM) (23.8%), Haarlem (23.8%), T (22.3%), and Beijing (9.3%) (3). We performed whole-genome sequencing and analysis to investigate the genetic diversity and phylogenetic relationships of INS-SEN, a drug-sensitive M. tuberculosis strain.
INS-SEN was isolated in Lima, Peru. Its lineage was determined by typing 24 mycobacterial interspersed repetitive unit-variable-number tandem-repeat (MIRU-VNTR) loci (4) and by single-nucleotide polymorphism (SNP)-based phylogeny (5). Genomic DNA of INS-SEN was sequenced on an Illumina HiSeq 2000, yielding 61,422,158 paired-end reads (1,406× coverage). The reads were assembled against the H37Rv reference genome (AL123456.3) with BWA v0.5.9-r16 (6), producing 18 contigs. The genome was annotated with the Rapid Annotation using Subsystem Technology (RAST) server (7) and the Prokaryotic Genome Automatic Annotation Pipeline (PGAAP). Polymorphisms in the INS-SEN genome were identified by comparison with the genome of the drug-sensitive LAM strain KZN 4207 (8) using SNPsFinder (9); SNPs were classified as intergenic or coding, and coding SNPs were then assigned to clusters of orthologous groups (COG) categories (10).
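The reported depth and read count are linked by the standard estimate C = N × L / G (reads × read length / genome size). The sketch below is illustrative only and rests on assumptions not stated in the text: that 61,422,158 counts individual reads rather than read pairs, that all reads have the same length and map to the reference, and that G is the length of H37Rv (AL123456.3, 4,411,532 bp). The function names are hypothetical.

```python
H37RV_GENOME_SIZE = 4_411_532  # bp, length of the H37Rv reference AL123456.3
N_READS = 61_422_158           # assumption: individual reads, not read pairs


def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Expected mean depth C = N * L / G (hypothetical helper)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return n_reads * read_length / genome_size


def implied_read_length(n_reads: int, coverage: float, genome_size: int) -> float:
    """Solve C = N * L / G for L, the read length implied by a reported depth."""
    if n_reads <= 0:
        raise ValueError("n_reads must be positive")
    return coverage * genome_size / n_reads


if __name__ == "__main__":
    length = implied_read_length(N_READS, 1406, H37RV_GENOME_SIZE)
    print(f"implied read length: {length:.1f} bp")
    print(f"coverage at 101 bp: {mean_coverage(N_READS, 101, H37RV_GENOME_SIZE):.0f}x")
```

Under these assumptions the reported figures are consistent with reads of roughly 101 bp; if the count referred to read pairs, the implied length per read would be about half that.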
Compared with strains KZN 4207 and H37Rv, INS-SEN had more SNPs in PPE genes, which are associated with antigenic variation (11) (COG category N), and in PE-PGRS genes, which are associated with antigenic variation and immune evasion (12) (COG category M). INS-SEN also showed more mutations in category Q than strain KZN 4207. The distribution of SNPs in INS-SEN may play a role in its adaptation to its environment.
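Per-category comparisons like these rest on two steps: deciding whether each SNP falls inside an annotated gene, and tallying coding SNPs by the gene's COG category. The following is a minimal sketch of that bookkeeping, not SNPsFinder's actual implementation. The `Gene` record, the function names, and all coordinates and category assignments are hypothetical toy data, and the sketch assumes 1-based inclusive gene coordinates with one COG letter per gene.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical annotation record: 1-based inclusive coordinates."""
    locus_tag: str
    start: int
    end: int
    cog: str  # single-letter COG functional category


def classify_snps(snp_positions, genes):
    """Split SNPs into coding (tallied per COG category) and intergenic.

    A SNP inside overlapping genes is counted once for each gene it hits.
    Linear scan: fine for a sketch, O(SNPs x genes).
    """
    cog_counts = Counter()
    intergenic = 0
    for pos in snp_positions:
        hits = [g for g in genes if g.start <= pos <= g.end]
        if hits:
            for g in hits:
                cog_counts[g.cog] += 1
        else:
            intergenic += 1
    return cog_counts, intergenic


def compare_categories(counts_a, counts_b):
    """Per-category difference (strain A minus strain B) over all categories seen."""
    categories = sorted(set(counts_a) | set(counts_b))
    return {c: counts_a.get(c, 0) - counts_b.get(c, 0) for c in categories}


if __name__ == "__main__":
    # Toy annotation and SNP positions (hypothetical, not real M. tuberculosis data).
    genes = [Gene("gene_1", 100, 400, "N"),
             Gene("gene_2", 500, 900, "M"),
             Gene("gene_3", 1000, 1300, "Q")]
    strain_x, _ = classify_snps([150, 200, 450, 600, 1100, 1200], genes)
    strain_y, _ = classify_snps([300, 700, 950], genes)
    print(compare_categories(strain_x, strain_y))
```

A positive value means strain A carries more coding SNPs in that category than strain B; raw counts like these do not by themselves indicate functional consequences.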
Nucleotide sequence accession numbers. This whole-genome shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession number JAQH00000000. The version described in this paper is JAQH01000000.

ACKNOWLEDGMENT
This study was supported by the Peruvian National Institute of Health.