Download it from ftp://ftp.lsd.ornl.gov/xgrail
It can locate:
promoter regions (red boxes): TATA/ATA elements.
exons (bars above the sequence).
poly-A sites (cyan bars): AATAAA signals within 5 kb of the stop codon. These are highly suggestive of the end of a gene.
repetitive elements (yellow and orange boxes): satellite DNA consists of short sequences (e.g. ACAAACT) that are repeated millions of times around the centromeres and telomeres (the center and ends of chromosomes). There are also smaller repeats, up to a couple thousand bp long, scattered throughout the genome. The presence of repetitive DNA suggests non-coding sequence, such as an intron.
CpG islands (purple boxes): about 56% of human genes are associated with CpG islands. CpG islands often overlap the promoter and extend about 1000 base pairs downstream into the transcription unit, so identifying potential CpG islands during sequence analysis helps to define the extreme 5' ends of genes. CpG islands are commonly defined as regions of DNA at least 200 bp long with a G+C content above 50%. The Cs in most CpG dinucleotides are methylated, and methylated Cs tend to mutate to T; this is why CpG dinucleotides are rare across most of the genome, while the largely unmethylated CpG islands retain them.
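Two of the features above are simple enough to sketch directly. The Python below is a minimal illustration, not XGRAIL's actual detection method, and both function names are hypothetical. `cpg_islands` slides a 200 bp window and keeps windows whose G+C content is above 50%; it also requires an observed/expected CpG ratio of at least 0.6, which is an added assumption taken from the widely used Gardiner-Garden and Frommer criteria rather than from these notes. `polya_signals` lists AATAAA motifs starting within 5 kb downstream of a stop codon, assuming you already know where that codon ends.

```python
import re


def cpg_islands(seq, window=200, min_gc=0.5, min_obs_exp=0.6):
    """Return merged (start, end) intervals, 0-based and half-open, covering
    the windows whose G+C fraction is above `min_gc` and whose observed/expected
    CpG ratio is at least `min_obs_exp` (the second test is an assumption)."""
    seq = seq.upper()
    islands = []
    for start in range(len(seq) - window + 1):
        w = seq[start:start + window]
        c, g = w.count("C"), w.count("G")
        gc_fraction = (c + g) / window
        # Observed CpG count divided by the count expected from C and G alone.
        obs_exp = w.count("CG") * window / (c * g) if c and g else 0.0
        if gc_fraction > min_gc and obs_exp >= min_obs_exp:
            if islands and start <= islands[-1][1]:
                islands[-1][1] = start + window  # extend the current island
            else:
                islands.append([start, start + window])
    return [tuple(iv) for iv in islands]


def polya_signals(seq, stop_end, max_dist=5000):
    """Start positions of AATAAA motifs lying within `max_dist` bases after
    `stop_end` (the index just past the stop codon)."""
    region = seq.upper()[stop_end:stop_end + max_dist]
    # The lookahead lets overlapping motifs such as AATAAATAAA both match.
    return [stop_end + m.start() for m in re.finditer(r"(?=AATAAA)", region)]
```

For example, `cpg_islands("A" * 300 + "CG" * 150 + "A" * 300)` returns `[(201, 699)]`: only windows containing more than 100 bases of the CG stretch exceed 50% G+C. Recounting every window is simple but slow on long sequences; a running count would be the obvious optimization.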
When tested against a set of sequence data with known exons, GRAIL recognized 91% of the exons in the set, with a false positive rate of 8.6%. Now it's time to try XGRAIL on a 135 kb sequence of human chromosome 22 (Z83838.2).
It contains a gene, Rho GTPase activating protein 8, which spans 48,048 bp starting at position 123.
Download the sequence data in FASTA format and save it.
Open the file in a text editor and replace the first line (> [info]) with >[filename].
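If you would rather not edit the header by hand, a short script can make the same replacement. This is a minimal Python sketch; the function names are hypothetical, and it assumes a single-record FASTA file and that ">[filename]" means the file's full name, extension included.

```python
from pathlib import Path


def rename_fasta_header(text: str, name: str) -> str:
    """Replace the first ('>' description) line of a FASTA record with '>name'."""
    lines = text.splitlines(keepends=True)
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not FASTA: the first line must start with '>'")
    lines[0] = f">{name}\n"
    return "".join(lines)


def rename_fasta_file(path: str) -> None:
    """Rewrite the file in place, using its own file name as the new header."""
    p = Path(path)
    p.write_text(rename_fasta_header(p.read_text(), p.name))
```

Calling `rename_fasta_file("Z83838.fasta")` (a hypothetical file name) would turn the first line into `>Z83838.fasta` and leave the sequence lines untouched.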
Run xgrail and open your sequence. Select the correct organism in the Open dialog so that the codon biases are correct.
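Codon usage differs between organisms, which is why the organism setting matters. As a rough illustration only (GRAIL combines many signals, and its internal scoring is not reproduced here), the sketch below builds a codon frequency table from known coding sequences and scores a candidate reading frame by its log-odds against uniform codon usage. The function names, the pseudocount of 1, and the uniform 1/64 background are all assumptions.

```python
import math
from collections import Counter
from itertools import product

CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def codon_usage(coding_seqs, pseudocount=1.0):
    """Frequency of each of the 64 codons in in-frame coding sequences,
    with additive smoothing so that no codon has probability zero."""
    counts = Counter()
    for cds in coding_seqs:
        cds = cds.upper()
        for i in range(0, len(cds) - 2, 3):
            codon = cds[i:i + 3]
            if set(codon) <= set("ACGT"):  # ignore codons with N, etc.
                counts[codon] += 1
    total = sum(counts.values()) + pseudocount * len(CODONS)
    return {c: (counts[c] + pseudocount) / total for c in CODONS}


def codon_log_odds(seq, usage):
    """Log-likelihood ratio of `seq` (read in frame 0) under `usage` versus
    uniform codon usage; positive values look more coding-like."""
    seq = seq.upper()
    background = 1 / len(CODONS)
    score = 0.0
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in usage:  # skip codons containing ambiguity codes
            score += math.log(usage[codon] / background)
    return score
```

A table built from one organism's genes will score that organism's exons higher than a table built from another's, which is the intuition behind choosing the right organism.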
In the Features menu you can choose which features to display.
The repetitive DNA search is slow, so be prepared to wait.
After running all the analyses, you can save the results. The annotated sequence file is about 4 times the size of the FASTA file.
Once the probable exons have been located, the protein must be assembled.
GRAIL uses BLAST to search a
database of known proteins and returns high-scoring alignments.
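Assembling the protein from predicted exons comes down to joining the exon sequences in order and translating the result with the genetic code. The sketch below assumes forward-strand exons given as 0-based, half-open (start, end) pairs; genes on the reverse strand would need the sequence reverse-complemented first. The function names are hypothetical, and this is the generic splice-and-translate step, not GRAIL's own code.

```python
from itertools import product

# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ... (TCAG order).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): AMINO[i] for i, c in enumerate(product(BASES, repeat=3))}


def splice(seq, exons):
    """Concatenate forward-strand exons given as 0-based, half-open (start, end)."""
    return "".join(seq[start:end] for start, end in sorted(exons))


def translate(cds):
    """Translate codon by codon until the first stop; unknown codons become 'X'."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = GENETIC_CODE.get(cds[i:i + 3].upper(), "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

For instance, with `seq = "CCATGGCTTTTCTGGTAAGG"` and exons `[(2, 7), (11, 18)]`, `splice` yields `ATGGCCTGGTAA` and `translate` returns `MAW`. The resulting protein is what would then be compared against a protein database.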
There is also a web interface to GRAIL at http://grail.lsd.ornl.gov/grailexp/
Procrustes – a homology-based gene prediction tool
http://www-hto.usc.edu/software/procrustes/
Procrustes was a legendary Greek bandit who would lay his victims on an iron bed and either stretch them until they fit, if they were too short, or cut off their legs, if they were too long.
It is based on the premise that genes are well conserved across species.
It works best for complex exon assemblies and short exon prediction.
Procrustes runs through each possible exon assembly, compares it to the database of known proteins, and saves the best matches.
This is an important difference from GRAIL, which chooses its exons based on known codon preferences and other factors, assembles the protein, and then uses genquest to compare its guess against known proteins.
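Procrustes' approach of trying every exon assembly can be sketched as a brute-force search: enumerate every ordered, non-overlapping chain of candidate exons, translate it, and keep the chain whose protein best matches a known (target) protein. This is only an illustration under stated assumptions: Python's `difflib.SequenceMatcher.ratio()` stands in for a real protein alignment score, candidate exons are assumed to be on the forward strand and read in frame 0, and the names are hypothetical. The number of chains grows exponentially with the number of candidates, which is why Procrustes itself uses a dynamic-programming spliced-alignment algorithm rather than literal enumeration.

```python
from difflib import SequenceMatcher
from itertools import combinations, product

# Standard genetic code; codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): AMINO[i] for i, c in enumerate(product(BASES, repeat=3))}


def translate(cds):
    """Translate every complete codon; stop codons appear as '*'."""
    return "".join(GENETIC_CODE.get(cds[i:i + 3].upper(), "X")
                   for i in range(0, len(cds) - 2, 3))


def best_assembly(seq, candidate_exons, target_protein):
    """Try every ordered, non-overlapping chain of candidate exons and return
    (score, chain) for the chain whose translation best matches the target."""
    exons = sorted(candidate_exons)
    best = (-1.0, ())
    for r in range(1, len(exons) + 1):
        for chain in combinations(exons, r):  # combinations keep sorted order
            if any(a[1] > b[0] for a, b in zip(chain, chain[1:])):
                continue  # overlapping exons cannot share one transcript
            protein = translate("".join(seq[s:e] for s, e in chain))
            score = SequenceMatcher(None, protein, target_protein).ratio()
            if score > best[0]:
                best = (score, chain)
    return best
```

With `seq = "CCATGGCTTTTCTGGTAAGG"`, candidates `[(2, 7), (7, 11), (11, 18)]` and target `"MAW*"`, the search skips the middle candidate and returns the chain `((2, 7), (11, 18))` with a score of 1.0.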