BLAST Service
Overview
The BLAST service integrates the BLAST (Basic Local Aligment Search Tool) algorithms to perform searches against against public or private genomes in MAAGE or other reference databases using a DNA or protein sequence and find matching genomes, genes, RNAs, or proteins.
See also
Using the BLAST Service
The BLAST submenu option under the Services main menu (Genomics category) opens the BLAST input form (shown below). Note: You must be logged into MAAGE to use this service.

Options

Search Program
There are four BLAST programs provided by MAAGE, and each has a specific query sequence and database. Clicking on the button in front of the program name will select it and will also select the appropriate databases.
BLASTN – The query sequence is DNA (nucleotide), and when enabled the program will search against DNA databases of contig or gene sequences.
BLASTX – The query sequence is DNA (nucleotide), and when enabled the program will search against the protein sequence database.
BLASTP – The query sequence is protein (amino acid), and when enabled the program will search against the protein sequence database.
tBLASTn – The query sequence is protein (amino acid), and when enabled the program will search against DNA databases of contig or gene sequences.
Query Source
There are three types of Query sources that are provided by MAAGE:
Enter sequence - Paste the query sequence into the box.
Select FASTA file - Choose FASTA file that has been uploaded to the Workspace.
Select feature group - Choose a feature (gene/protein) that has been saved in the Workspace.
Database Source
MAAGE has different databases to choose from for the source to search wihin:
Reference and representative genomes (bacteria, archaea) - Those designated by the NCBI. This is the default.
Reference and representative genomes (virus) - Those designated by the NCBI.
Selected genome list - Clicking on “Search within genome list” in the drop-down box will open a new source box where desired genomes can be added.
Selected genome group - Genome group saved in the Workspace.
Selected feature group - Feature (gene/protein) group saved in the workspace.
Taxon - Selected taxonomic level from the database.
Selected fasta file - FASTA file that has been uploaded to the Workspace.
Database Type
There are three database types:
Genome Sequences (NT) - Genomic sequences from bacterial and viral genomes in MAAGE, i.e. chromosomes, contigs, plasmids, segments, and partial genomic sequences
Genes (NT) - Gene sequences from bacterial and viral genomes in MAAGE.
Proteins (AA) - Protein sequences from bacterial and viral genomes in MAAGE.
Output Folder
Folder in the Workspace where you want the BLAST results stored.
Output Name
Name you provide to identify the results in the Workspace.
Advanced Options
BLAST Parameters include the following:
Max hits - Maximum number of BLAST hits to return.
E-Value threshold - the number of expected hits of similar quality (score) that could be found just by chance.
Output Results

The BLAST Service Job Results page (above) contains information about the job and all the files that are produced when the service completes. Information about the job submission can be seen in the table at the top of the results page. Clicking on “Parameters” below the job information table will display all the parameters that were selected when the job was submitted.
![]()
Clicking on the “View” icon near the top right of the page (left of the green action bar) will display a results table of the hits (below), from which you can perform more actions via the green Action Bar such as downloading, copying, accessing DNA and protein FASTA format data, constructing an MSA, creating a group of the results, and displaying the associated features and genomes in the database. See definitions of these actions at the bottom of this documentation.

The table at the bottom of the page lists all of the files that were generated by the BLAST run:

blast_headers.txt - column headers in BLAST results table, which include
qseqid: query or source (e.g., gene) sequence id
sseqid: subject or target (e.g., reference genome) sequence id
pident: percentage of identical matches
length: alignment length (sequence overlap)
mismatch: number of mismatches
gapopen: number of gap openings
qstart: start of alignment in query
qend: end of alignment in query
sstart: start of alignment in subject
send: end of alignment in subject
evalue: expect value
bitscore: bit score
blast_out.txt - Entire file generated by BLAST job. Used by MAAGE to create the blast_out.json file.
blast_out.json - JavaScript Object Notation (JSON) formatted file, which is a standard data interchange format. It is primarily used for transmitting data between the MAAGE web application and backend servers.
blast_out.metadata.json - JSON-formatted file containing the metadata associated with the BLAST job.
blast_out.raw.json - JSON-formatted file containing list of BLAST identifiers (BLAST uses a different set of identifiers than MAAGE).
blast_out.txt - Text file containing the BLAST results, including query and target sequences, and the strength of the BLAST hits.
References
Altschul, S. F. J. e. BLAST algorithm. (2001).
Boratyn, G.M., Camacho, C., Cooper, P.S., Coulouris, G., Fong, A., Ma, N., Madden, T.L., Matten, W.T., McGinnis, S.D., Merezhuk, Y. et al. (2013) BLAST: a more efficient report with usability improvements. Nucleic acids research, 41, W29-33.
O’Leary, N.A., Wright, M.W., Brister, J.R., Ciufo, S., Haddad, D., McVeigh, R., Rajput, B., Robbertse, B., Smith-White, B., Ako-Adjei, D. et al. (2016) Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic acids research, 44, D733-745.