glam2scan Manual

glam2scan searches a sequence database for matches to a motif discovered by glam2. Each match receives a score indicating how well it fits the motif.

Basic usage

Running glam2scan without any arguments gives a usage message:

Usage: glam2scan [options] alphabet my_motif.glam2 my_seqs.fa
Main alphabets: p = proteins, n = nucleotides
Main options (default settings):
-h: show all options and their default settings
-o: output file (stdout)
-n: number of alignments to report (25)
-2: examine both strands - forward and reverse complement

glam2scan needs three pieces of information: the alphabet, a file containing a motif found by glam2, and a file of sequences in FASTA format. For example:

glam2scan p prot_motif.glam2 lotsa_prots.fa
glam2scan n nuc_motif.glam2 lotsa_nucs.fa

An alphabet other than p or n is interpreted as the name of an alphabet file. Motif files from glam2 often contain multiple motifs; glam2scan considers only the top one.
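When glam2scan is driven from a script, it can help to assemble the command line in one place. The following is a minimal sketch in Python, assuming only the argument order and the main options shown in the usage message; the helper name build_glam2scan_command and its parameter names are hypothetical and not part of glam2scan.

```python
def build_glam2scan_command(alphabet, motif_file, seq_file, *,
                            output=None, num_alignments=None,
                            both_strands=False):
    """Return an argument list for glam2scan (hypothetical helper).

    Options come first, then the alphabet, motif file and sequence file,
    following: glam2scan [options] alphabet my_motif.glam2 my_seqs.fa
    """
    cmd = ["glam2scan"]
    if output is not None:
        cmd += ["-o", output]               # -o: output file
    if num_alignments is not None:
        cmd += ["-n", str(num_alignments)]  # -n: number of alignments to report
    if both_strands:
        cmd.append("-2")                    # -2: examine both strands
    cmd += [alphabet, motif_file, seq_file]
    return cmd


cmd = build_glam2scan_command("n", "nuc_motif.glam2", "lotsa_nucs.fa",
                              num_alignments=50, both_strands=True)
print(" ".join(cmd))
```

The list can be passed to subprocess.run(cmd, check=True) on a system where glam2scan is installed and on the PATH.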

Output format

The output begins with some general information:

GLAM2scan
Version 9999

glam2scan p prot_motif.glam2 lotsa_prots.fa

This is followed by motif matches, sorted in order of score. A motif match looks like this:

                 **.****
SOS1_HUMAN   780 HPIE.IA 785 + 8.70

The name of the sequence with the match appears on the left; the start and end coordinates of the match appear on either side of the matching sequence; the match score appears on the right. The plus sign indicates the strand of the match (only meaningful when considering both strands of nucleotide sequences with the -2 option). The stars indicate the key positions of the motif: the alignment of the match to the key positions is shown.
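For downstream processing, a match line can be split into the fields just described. The sketch below is illustrative only: it assumes the six whitespace-separated fields seen in the example above (name, start, aligned sequence, end, strand, score) and a sequence name without embedded spaces. The names Match and parse_match_line are hypothetical.

```python
from typing import NamedTuple


class Match(NamedTuple):
    name: str      # sequence name
    start: int     # start coordinate of the match
    aligned: str   # matching sequence, with '.' where it has gaps
    end: int       # end coordinate of the match
    strand: str    # '+' or '-'
    score: float   # match score


def parse_match_line(line: str) -> Match:
    """Parse one match line; assumes exactly six whitespace-separated fields."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    name, start, aligned, end, strand, score = fields
    return Match(name, int(start), aligned, int(end), strand, float(score))


m = parse_match_line("SOS1_HUMAN   780 HPIE.IA 785 + 8.70")
residues = m.aligned.replace(".", "")   # drop gap characters
print(m.name, m.start, m.end, m.strand, m.score, len(residues))
```

In this example the six residues left after removing the dot agree with the coordinates 780 to 785.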

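Basic options

The basic options are those shown in the usage message above: -h lists all options with their default settings, -o names an output file (default: stdout), -n sets the number of alignments to report (default: 25), and -2 examines both strands, forward and reverse complement.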

Advanced options

The remaining options, which glam2scan -h lists along with their default settings, are somewhat specialized. For typical usage, it is reasonable to set them to exactly the same values as were used with glam2 to discover the motif.

Motif format

Some users may wish to make 'fake' glam2 motifs for input to glam2scan, for instance based on motifs found by other tools. Most of the glam2 output is ignored by glam2scan, and a minimal motif file looks like this:

                **..****
seq1         10 HP..D.IG
seq2          5 HPGADLIG
seq3          7 HP..ELIG
seq4          5 HP..ELLA

The sequence names and coordinates are ignored, but some placeholder characters should be present. The stars indicating key positions are necessary, and the first and last columns must be starred.
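A minimal motif file of this kind can be generated from an existing alignment. The following sketch assumes the layout shown in the example above: a line of stars and dots marking the columns, followed by one row per sequence with a placeholder name and coordinate. It pads the star line so that it sits directly above the aligned columns, mirroring the example; whether glam2scan strictly requires that padding is not stated here. The function name format_minimal_motif is hypothetical.

```python
def format_minimal_motif(rows, key_mask):
    """Build the text of a minimal glam2-style motif (hypothetical helper).

    rows     -- aligned sequences, all the same length, '.' for gaps
    key_mask -- '*' for key positions, '.' for other columns
    """
    if not rows:
        raise ValueError("at least one aligned sequence is required")
    if not key_mask or set(key_mask) - {"*", "."}:
        raise ValueError("key_mask must consist of '*' and '.' only")
    if key_mask[0] != "*" or key_mask[-1] != "*":
        raise ValueError("the first and last columns must be starred")
    if any(len(row) != len(key_mask) for row in rows):
        raise ValueError("every row must be as long as key_mask")

    # Names and coordinates are ignored by glam2scan, so use placeholders.
    name_width = len(f"seq{len(rows)}")
    prefixes = [f"seq{i}".ljust(name_width) + " 1 "
                for i in range(1, len(rows) + 1)]

    lines = [" " * len(prefixes[0]) + key_mask]   # stars above the columns
    lines += [prefix + row for prefix, row in zip(prefixes, rows)]
    return "\n".join(lines) + "\n"


text = format_minimal_motif(
    ["HP..D.IG", "HPGADLIG", "HP..ELIG", "HP..ELLA"], "**..****")
print(text, end="")
```

The returned text can be written to a file and passed to glam2scan as the motif file.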