What is GOTTCHA?
GOTTCHA (Genomic Origin Through Taxonomic CHAllenge) implements a novel, gene-independent, signature-based method for metagenomic taxonomic profiling that yields significantly lower false discovery rates (FDR) and can be deployed on a laptop. The algorithm was tested and validated on twenty synthetic and mock datasets of varying community composition and complexity, was applied successfully to data generated from spiked environmental and clinical samples, and robustly demonstrates superior performance compared with other available tools.
What's new?
GOTTCHA v1.0c released (2017/11/09):
GOTTCHA_database_v20150825 released (2015/09/21):
Updates (2015/07/20):
GOTTCHA v1.0b released (2015/05/22):
Some major changes have been made in the v1.0 release:
Can my system run GOTTCHA?
Either Linux (2.6 kernel or later) or Mac OS X (10.6 Snow Leopard or later) with at least 8 GB of RAM is recommended. Perl v5.8 or later is required. A C/C++ build environment may be needed to install dependencies, and requirements can vary between systems. Please make sure the essential build packages (e.g. build-essential on Ubuntu, Xcode on Mac) are properly installed before running the installation script.
GOTTCHA has been tested successfully on our Linux servers (Ubuntu 12.10 with Perl v5.14.2; Ubuntu 10.04 with Perl v5.10.1) and on MacBook Pro laptops (Mac OS X 10.8 with Xcode v5.1).
How to install GOTTCHA?
The installation guide and a quick tutorial can be found on the GitHub page. A more detailed description can be found in this section.
Discussions / Bugs Reporting
We have created a mailing list for GOTTCHA users. If you would like to receive notifications about updates and join the discussion, please join the mailing list by becoming a member of the GOTTCHA-users group.
Despite our testing, GOTTCHA may still contain bugs and other issues. Please help us improve it by reporting them on the GitHub issue tracker.
Any other questions? You are welcome to contact Po-E (Paul) Li via po-e[at]lanl.gov.
How to Run GOTTCHA? (The "I Can't Wait!" instructions)
This is a quick example of profiling a "test.fastq" file with GOTTCHA using a pre-computed species-level bacterial database. The test FASTQ file is included in the "test" directory of the GOTTCHA package. More details are given in the INSTRUCTION section.
- Obtaining the GOTTCHA package:
$ git clone https://github.com/LANL-Bioinformatics/GOTTCHA.git gottcha
- Installing GOTTCHA:
$ cd gottcha
$ ./INSTALL.sh
- Downloading the lookup table and the species-level database from our web server:
$ wget ftp://ftp.lanl.gov/public/genome/gottcha/latest/GOTTCHA_lookup.tar.gz
$ wget ftp://ftp.lanl.gov/public/genome/gottcha/latest/GOTTCHA_BACTERIA_c4937_k24_u30_xHUMAN3x.species.tar.gz
If you have any difficulty obtaining the databases, please contact us.
- Unpacking and decompressing the downloaded archives:
$ tar -zxvf GOTTCHA_lookup.tar.gz
$ tar -zxvf GOTTCHA_BACTERIA_c4937_k24_u30_xHUMAN3x.species.tar.gz
- Running gottcha.pl:
$ bin/gottcha.pl \
    --threads 8 \
    --outdir ./ \
    --input test/test.fastq \
    --database database/GOTTCHA_BACTERIA_c4937_k24_u30_xHUMAN3x.species
The results are written to './test.gottcha.tsv'.
What's the output?
By default, GOTTCHA reports profiling results in a summary table (*.gottcha.tsv). The table lists the organism(s) at every taxonomic level from STRAIN to PHYLUM, along with their linear length, total bases mapped, linear depth of coverage, and normalized linear depth of coverage. The linear depth of coverage (LINEAR_DOC) is used to calculate the relative abundance of each organism or taxon in the sample.
Summary table:
Column | Description |
---|---|
LEVEL | taxonomic rank |
NAME | taxonomic name |
REL_ABUNDANCE | relative abundance (equivalent to NORM_COV by default) |
LINEAR_LENGTH | number of non-overlapping bases covering the signatures |
TOTAL_BP_MAPPED | sum total of all hit lengths recruited to signatures |
HIT_COUNT | number of hits recruited to signatures |
HIT_COUNT_PLASMID | number of hits recruited to plasmid signatures |
READ_COUNT | number of reads recruited to signatures |
LINEAR_DOC | linear depth-of-coverage (TOTAL_BP_MAPPED / LINEAR_LENGTH) |
NORM_COV | normalized linear depth-of-coverage (LINEAR_DOC / sum of LINEAR_DOC over all entries at the same LEVEL) |
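
As a minimal sketch of how LINEAR_DOC and NORM_COV relate, the Python snippet below recomputes both from LINEAR_LENGTH and TOTAL_BP_MAPPED using the formulas in the table. It is an illustration, not GOTTCHA's own code: the rows are invented, and it assumes a tab-separated file whose header row uses the column names above. The real *.gottcha.tsv may contain more columns or differ in layout.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows in the summary-table layout (values invented for illustration;
# assumes a tab-separated file with a header row named after the columns).
SAMPLE_TSV = """LEVEL\tNAME\tLINEAR_LENGTH\tTOTAL_BP_MAPPED
species\tSpecies A\t1000\t20000
species\tSpecies B\t500\t5000
genus\tGenus X\t1500\t25000
"""


def normalized_coverage(rows):
    """Return {(level, name): (linear_doc, norm_cov)} following the table's formulas."""
    # LINEAR_DOC = TOTAL_BP_MAPPED / LINEAR_LENGTH
    docs = {}
    for row in rows:
        key = (row["LEVEL"], row["NAME"])
        docs[key] = int(row["TOTAL_BP_MAPPED"]) / int(row["LINEAR_LENGTH"])
    # Sum LINEAR_DOC separately for each taxonomic level.
    level_totals = defaultdict(float)
    for (level, _name), doc in docs.items():
        level_totals[level] += doc
    # NORM_COV = LINEAR_DOC / sum of LINEAR_DOC at the same level
    return {key: (doc, doc / level_totals[key[0]]) for key, doc in docs.items()}


rows = list(csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t"))
for (level, name), (doc, norm) in normalized_coverage(rows).items():
    print(f"{level}\t{name}\tLINEAR_DOC={doc:.1f}\tNORM_COV={norm:.3f}")
```

Because the sum is taken per LEVEL, the NORM_COV values at any single level add up to 1, which is why they can be read directly as relative abundances at that rank.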
How to visualize the result?
Krona is an interactive browser that lets you explore hierarchical data with pie charts. Assuming Krona is installed properly, you can create a Krona chart from a text file listing abundances and lineages. You must run GOTTCHA with the "--mode all" option. Then use 'ktImportText' to build the chart and save it to "test.krona.html":
$ ktImportText test_temp/test.lineage.tsv -o test.krona.html
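
Krona's ktImportText reads tab-delimited text in which each line holds a magnitude followed by the lineage, from the highest rank down. The Python sketch below writes a small profile in that layout to show what such a file looks like. The helper `to_krona_text`, the taxa, and the abundances are all hypothetical, and it is an assumption (not verified here) that GOTTCHA's test.lineage.tsv follows exactly this generic layout.

```python
import csv
import io


def to_krona_text(records, out):
    """Hypothetical helper: write (abundance, lineage) pairs in Krona's plain-text
    input layout, one line per entry: magnitude first, then one tab-separated
    column per rank (highest rank first)."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for abundance, lineage in records:
        writer.writerow([abundance, *lineage])


# Invented profile for illustration only (not GOTTCHA output).
records = [
    (0.75, ["Proteobacteria", "Gammaproteobacteria", "Escherichia coli"]),
    (0.25, ["Firmicutes", "Bacilli", "Bacillus subtilis"]),
]

buf = io.StringIO()
to_krona_text(records, buf)
print(buf.getvalue(), end="")
```

Writing the same lines to a file (for example a hypothetical my.lineage.tsv) and passing that file to ktImportText in place of test_temp/test.lineage.tsv should produce a comparable chart.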
Citation
Please cite GOTTCHA in your publications:
Tracey Allen K. Freitas, Po-E Li, Matthew B. Scholz and Patrick S. G. Chain (2015) Accurate read-based metagenome characterization using a hierarchical suite of unique signatures, Nucleic Acids Research (DOI: 10.1093/nar/gkv180)