From Shiu Lab
Environment
Cygwin
- Cygwin is a Linux-like environment for Windows.
- Useful packages in addition to the base:
- openssh
- vim
- wget
- perl and python
SSH access
SSR in UTRs, CDS
- MISA: SSR identification script.
perl misa.pl fasta_file_name
- Determine whether the repeats are in CDS or UTR
- Similarity search software: BLAST
- Build a database of protein sequences from 5 plants
- Run ESTs against the protein database
- Compare the MISA output against the output of BLAST
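The comparison step can be sketched as below. This is only an illustration, not one of the lab scripts: the function name `classify_ssr` and the coordinate conventions are assumptions. It takes the SSR start and end from MISA and the query start and end of the best protein hit for the same EST, and it assumes that a reversed query range (start greater than end) marks a hit on the reverse strand.

```python
def classify_ssr(ssr_start, ssr_end, hit_start, hit_end):
    """Classify an SSR relative to the protein-aligned region of an EST.

    All coordinates are 1-based, inclusive positions on the EST.
    Assumption: hit_start > hit_end means the protein hit is on the
    reverse strand, so upstream and downstream are swapped.
    """
    lo, hi = min(hit_start, hit_end), max(hit_start, hit_end)
    reverse = hit_start > hit_end
    if ssr_end < lo:
        # SSR lies entirely before the aligned region on the EST
        return "3'UTR" if reverse else "5'UTR"
    if ssr_start > hi:
        # SSR lies entirely after the aligned region on the EST
        return "5'UTR" if reverse else "3'UTR"
    # Any overlap with the aligned region is called CDS
    return "CDS"


print(classify_ssr(10, 20, 50, 200))
print(classify_ssr(60, 70, 50, 200))
```

Because a BLAST alignment may not cover the whole coding region, the UTR calls from this kind of rule are best treated as putative.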
SSR mapping script
- Emma is responsible for this part.
- Example files:
SSR in three pairs of closely related plant species
Species selection
- Look up the species relationships in Phytome and select species pairs based on the following criteria:
- Two species in the same genus or the same family.
- More than 10,000 EST contigs (see PlantGDB)
- You will need the following scripts from /home/shiu/codes:
- BlastUtility.py
- ParseBlast.py
- FastaManager.py
- FileUtility.py
- SingleLinkage.py
- Translation.py
- Download all scripts in a single zip file
- Determine the number of EST contigs by running:
- python FastaManager.py -f count -fasta fasta_file_name
- The species should also be of some interest to you.
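If FastaManager.py is not at hand, the contig count for the 10,000-contig criterion can be checked with a few lines of Python. This is a stand-in sketch, not the code inside FastaManager.py; the function name `count_fasta_records` is made up here, and it assumes a standard FASTA file in which every record starts with a ">" header line.

```python
def count_fasta_records(path):
    """Count sequences in a FASTA file by counting '>' header lines."""
    count = 0
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                count += 1
    return count
```

Usage would be something like `count_fasta_records("fasta_file_name")`, keeping species with a result above 10,000.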
Major questions
- What are the SSR types, frequencies, and distributions?
- What is the level of SSR conservation across species?
- Do SSRs in coding sequences tend to be located in regions of high or low conservation?
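For the first question, SSR types and frequencies can be tallied once perfect repeats are located. The sketch below is not MISA and is not meant to reproduce its output. The minimum repeat counts in `MIN_REPEATS` are illustrative assumptions, and the names `is_primitive` and `find_ssrs` are made up for this example. Motifs that are themselves repeats of a shorter unit (for example ATAT) are skipped so that each repeat is reported once, under its shortest unit.

```python
from collections import Counter

# Assumed thresholds for illustration: motif length -> minimum repeat count
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not a repetition of a shorter unit."""
    k = len(motif)
    return not any(
        k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k)
    )


def find_ssrs(seq):
    """Return sorted (start, end, motif, repeats) tuples, 1-based inclusive."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            # Skip ambiguous bases and motifs built from a shorter unit
            if set(motif) - set("ACGT") or not is_primitive(motif):
                i += 1
                continue
            reps = 1
            while seq[i + reps * k:i + (reps + 1) * k] == motif:
                reps += 1
            if reps >= min_rep:
                hits.append((i + 1, i + reps * k, motif, reps))
                i += reps * k  # continue after the repeat
            else:
                i += 1
    return sorted(hits)


example = "GG" + "AT" * 6 + "CC" + "A" * 10 + "T" + "CAG" * 5
ssrs = find_ssrs(example)
# Frequency of SSRs by motif length (mono-, di-, trinucleotide, ...)
by_type = Counter(len(motif) for _, _, motif, _ in ssrs)
print(ssrs)
print(by_type)
```

Running the same tally per species, and separately for CDS and UTR repeats, gives the frequency and distribution tables the questions ask for.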
What the students did
- Shiu 16:52, 18 July 2007 (EDT)
- Identify SSRs in the EST contigs of 8 plant species.
- Run BLAST on a two-species sequence file to identify reciprocal best hits.
- Run BLAST using the two-species EST sequences as queries and all plant protein sequences as subjects to distinguish UTR from CDS regions.
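Picking reciprocal best hits out of the two-species BLAST run can be sketched as follows. This is not ParseBlast.py or BlastUtility.py. It assumes the 12-column tabular BLAST output (`-m 8` in legacy blastall, `-outfmt 6` in BLAST+), with the bit score in the last column, and a hypothetical `species_of` dictionary that maps each sequence ID to its species so that within-species hits can be ignored.

```python
def parse_tabular(lines):
    """Yield (query, subject, bitscore) from 12-column tabular BLAST lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        yield fields[0], fields[1], float(fields[11])


def best_hits(rows, species_of):
    """Best cross-species hit per query, ranked by bit score."""
    best = {}
    for query, subject, bitscore in rows:
        if species_of[query] == species_of[subject]:
            continue  # skip self hits and within-species hits
        # On ties the first hit seen is kept
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(rows, species_of):
    """Pairs (a, b) where a's best hit is b and b's best hit is a."""
    best = best_hits(rows, species_of)
    return sorted(
        (q, s) for q, s in best.items() if best.get(s) == q and q < s
    )


# Toy input: IDs and scores are invented for illustration
species_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
blast_lines = [
    "a1\tb1\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-50\t200",
    "a1\tb2\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-30\t150",
    "a1\ta2\t99.0\t100\t1\t0\t1\t100\t1\t100\t1e-80\t300",
    "b1\ta1\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-50\t195",
    "b2\ta1\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-30\t150",
    "a2\tb1\t70.0\t100\t30\t0\t1\t100\t1\t100\t1e-10\t90",
]
pairs = reciprocal_best_hits(parse_tabular(blast_lines), species_of)
print(pairs)
```

Ranking by bit score rather than E-value is a design choice for this sketch; either works as long as the same criterion is used in both directions.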
SNP, CAPS, dCAPS
- SNP calling will be done at JCVI.
- CAPS finding
Intron spanning marker
- Use a couple of examples, e.g. Arabidopsis full-length cDNAs.
References