VIRTUAL NORTHERN BLOT (VNB)

GENERAL INFORMATION:

VNB is a program that queries the large public collection of EST sequences in NCBI's dbEST (http://www.ncbi.nlm.nih.gov/dbEST/) to produce a tissue-specific gene expression profile for any mouse or human mRNA (cDNA) sequence of interest. Provide the cDNA sequence (your query) as a plain text file. The program aligns your query with other genes in the gene family and, from that alignment, generates a set of gene-specific probes. VNB uses these probes to find exact matches in dbEST using the BLAST server at NCBI. The alignment is either generated from RefSeq (http://www.ncbi.nlm.nih.gov/RefSeq/) or provided by the user from the ClustalW site (http://www.ebi.ac.uk/clustalw/).

The ESTs retrieved as "hits" are tallied by tissue, and two profiles are provided. The non-quantitative profile categorizes all hits by tissue of origin regardless of how the library was generated (listed in numerical order). The quantitative profile tallies only the hits from libraries that were not manipulated in any way, so that the ESTs from these libraries reflect the mRNA population of the tissue of origin. For the quantitative profile, the expression level in each tissue is calculated by dividing the number of hits by the total number of such ESTs in dbEST from that tissue, and the standard deviation of the expression level is calculated using a Poisson distribution.

Results are generally available within 10-90 minutes, and an email is sent with links to the result files.
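The tallying and expression-level calculation described above can be sketched in a few lines of Python. This is only an illustration, not the VNB source code: the function name, the input layout (one tuple per EST hit), and the use of sqrt(hits) as the Poisson standard deviation of the hit count are assumptions made for the example.

```python
import math
from collections import Counter

def tissue_profiles(hits, quantitative_totals):
    """Build the two profiles from a list of EST hits (hypothetical layout).

    hits: list of (tissue, from_quantitative_library) tuples, one per hit.
    quantitative_totals: dict mapping tissue -> total number of ESTs in
        dbEST from unmanipulated (quantitative) libraries of that tissue.
    """
    # Non-quantitative profile: every hit counts, whatever the library type.
    non_quantitative = Counter(tissue for tissue, _ in hits)

    # Quantitative profile: only hits from unmanipulated libraries.
    counts = Counter(tissue for tissue, is_quant in hits if is_quant)
    quantitative = {}
    for tissue, k in counts.items():
        n = quantitative_totals[tissue]
        level = k / n               # hits divided by total ESTs for the tissue
        sd = math.sqrt(k) / n       # assumed: Poisson SD of a count k is sqrt(k)
        quantitative[tissue] = (level, sd)
    return non_quantitative, quantitative

example_hits = [("liver", True), ("liver", True), ("liver", False), ("brain", True)]
example_totals = {"liver": 10000, "brain": 5000}
nonq, quant = tissue_profiles(example_hits, example_totals)
for tissue, n_hits in nonq.most_common():   # listed in numerical order
    print(tissue, n_hits, quant.get(tissue))
```

Under these assumptions, a tissue with few hits gets a standard deviation that is large relative to its expression level, which is why low-count tissues should be interpreted cautiously.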

 

SET YOUR INPUT:

Your nucleotide sequence must be in a plain text file that contains only the characters {A,G,T,C,a,g,t,c}, spaces, and returns. You can NOT cut and paste the sequence into the window. Microsoft Word (.doc) files will not work. Use the "Browse" button to upload a TEXT (.txt) file. The name of your file will appear in the adjacent window (not visible with some browsers).
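If you want to check a file before uploading it, the character rule above is easy to test locally. The following Python sketch is not part of VNB; the function name is hypothetical, and it simply applies the stated rule (only A, G, T, C in either case, plus spaces and returns).

```python
ALLOWED_CHARACTERS = set("AGTCagtc \r\n")

def clean_query(text):
    """Return the sequence in upper case with spaces and returns removed.

    Raises ValueError if the text contains any character outside the
    allowed set, for example digits, tabs, or a FASTA header line.
    """
    bad = sorted(set(text) - ALLOWED_CHARACTERS)
    if bad:
        raise ValueError("characters not allowed in query: %r" % bad)
    return "".join(text.split()).upper()

print(clean_query("acgt ACGT\nttga"))
```

To check a file, read it as text and pass its contents to the function, for example `clean_query(open("myquery.txt").read())`, where `myquery.txt` is a placeholder file name.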

You can also enter your query in the form of an alignment generated using ClustalW. Your query sequence should be at the top, with all the paralogs from which you want it distinguished in the tissue-specific gene expression profile listed below it. When you use ClustalW, do not change any of the default settings for the standard output. The plain text file should have the sequences aligned, with the name of each on the left and the base numbers on the right. Positions in the alignment that are invariant are marked by an asterisk at the bottom of the alignment. This option is provided because RefSeq is incomplete, so the alignment generated from the sequences currently available in RefSeq may not contain all the paralogs.
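To confirm that an alignment file has the layout described above (names on the left, base numbers on the right, asterisk line underneath, query first), a minimal reader such as the following can help. It is a sketch that assumes the default ClustalW text output with a "CLUSTAL" header on the first line; it is not the parser used by VNB, and the function name is hypothetical.

```python
def parse_clustalw(text):
    """Return {name: aligned_sequence}, in the order the names first appear."""
    sequences = {}
    lines = text.splitlines()
    if not lines or not lines[0].startswith("CLUSTAL"):
        raise ValueError("not a ClustalW alignment: missing CLUSTAL header")
    for line in lines[1:]:
        # Skip blank lines and the indented line that carries the asterisks.
        if not line.strip() or line[0].isspace():
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        name, chunk = parts[0], parts[1]   # a trailing base number is ignored
        sequences[name] = sequences.get(name, "") + chunk
    return sequences

example = (
    "CLUSTAL W (1.83) multiple sequence alignment\n"
    "\n"
    "\n"
    "query      ACGTACGT-A 9\n"
    "paralog1   ACGTTCGTGA 10\n"
    "           **** ***  *\n"
    "\n"
    "query      GGCC 13\n"
    "paralog1   GGCA 14\n"
    "           *** \n"
)
alignment = parse_clustalw(example)
print(list(alignment)[0])   # the first name should be your query
```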

IMPORTANT: In either case, be certain to remove from your query any poly(A) sequences or other well-conserved repeat sequences that occur in many mRNAs, because they will cause large numbers of meaningless hits.
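As an illustration of the clean-up step, a trailing poly(A) tail can be stripped with a short Python function. This sketch handles only a run of A's at the 3' end; the minimum run length of 10 is an arbitrary assumption, and other conserved repeats still have to be identified and removed by other means.

```python
import re

def trim_poly_a(sequence, min_run=10):
    """Remove a trailing run of at least min_run A's (either case)."""
    return re.sub(r"[Aa]{%d,}$" % min_run, "", sequence)

print(trim_poly_a("GGCTTACG" + "A" * 25))
```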

 

SELECT YOUR PARAMETERS:

1. Organism; you must first select an organism. Mouse and human are currently the only organisms that have all their libraries indexed at the cGAP site (http://cgap.nci.nih.gov/Tissues/LibraryFinder). That index classifies each library as derived from normal or cancerous tissue and as quantitative or non-quantitative.

2. Exclude ESTs from cancerous tissue in profile; if this is checked, the results will tally only EST hits from libraries derived from normal, non-diseased tissues. If it is not checked, hits from cancerous tissues will be included in your profile along with those from normal tissues.

3. Check probes for exact matches to paralogs; when this box is checked, an additional routine called ProbeChecker is run. First, the BLAST alignment generated for your query is modified to remove any RefSeq entries that are not genuine paralogs (they may instead arise from alternative splicing, alternative polyadenylation, mistakes, etc.). To do this, ProbeChecker tests whether each putative paralog is >95% identical to your query; if so, that entry is removed from the alignment. Second, the probes generated from your query are matched against the remaining paralogs, and any probe that is a 100% match to any of these paralogs is removed. This decreases the number of false positives in the results, but it also decreases the sensitivity. Use caution with ClustalW alignments that include alternative transcripts from the same gene: the program may treat one of them as a true paralog, and if "Check probes" is selected, most of the probes will be rejected, resulting in very few hits.

4. Window size; the default size is 8 nucleotide bases, which means that a "best-fit" probe is chosen from your query every 8 nucleotides. By decreasing this number you generate more probes, which will take longer to run, but will increase your sensitivity. Window sizes under 8 have shown an increase in false positives. By increasing this number, you generate fewer probes, which runs faster, but will decrease your sensitivity.

5. Probe length; the default size is 20 nucleotide bases, which means that the probes generated from your query for matching to entries in dbEST will be 20 nucleotides in length. Probe sizes longer than 20 usually give less sensitivity, but more specificity (fewer false positives). Probe sizes shorter than 20 usually give more sensitivity, but less specificity (more false positives). The default window size and probe length were judged optimal for queries from gene families in which the closest paralog is 45-90% identical to the query.
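The interplay of window size, probe length, and the ProbeChecker option can be sketched in Python as follows. This is an illustration under stated assumptions, not the VNB implementation: the criterion VNB uses to pick the "best-fit" probe within each window is not described here, so the sketch simply takes the probe that starts at each window position; identity is computed over gap-free alignment columns; and exact matches are checked on one strand only. All function names are hypothetical.

```python
def generate_probes(query, window=8, probe_length=20):
    """One probe per window (assumed: the probe starting at each window)."""
    last_start = len(query) - probe_length
    return [query[i:i + probe_length] for i in range(0, last_start + 1, window)]

def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns where neither has a gap."""
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)

def probe_checker(probes, query_aligned, paralogs_aligned, cutoff=95.0):
    """Drop probes that exactly match a genuine paralog.

    Entries more than `cutoff` percent identical to the query are treated
    as variants of the same gene and are ignored, as in step 3 above.
    """
    genuine = [p.replace("-", "") for p in paralogs_aligned
               if percent_identity(query_aligned, p) <= cutoff]
    return [probe for probe in probes
            if not any(probe in paralog for paralog in genuine)]

query = "ACGT" * 10                      # 40-base toy query
print(len(generate_probes(query)))       # defaults: window 8, length 20
print(len(generate_probes(query, window=4)))
```

Halving the window roughly doubles the number of probes, which is the run-time versus sensitivity trade-off described in item 4.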

 

GETTING YOUR RESULTS:

1. Name your query; this will be used to identify your result file. The name you give to your query may only contain digits and letters because it will be part of a file name.

2. Email address; your results will be sent to you via email. The email will provide links to the files generated from your query. These files remain on the VNB server for a limited time. In addition to the non-quantitative and quantitative lists of the number of EST hits for each tissue, graphic displays of each are included, as well as files for the intermediate results: the alignment used for generation of the probes, the sequences of the probes used to query dbEST, and a linked list of accession numbers for each EST hit.
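The naming rule in item 1 can be checked before you submit. The Python sketch below assumes that "digits and letters" means the ASCII characters 0-9, a-z, and A-Z; the function name is hypothetical and the check is not part of VNB.

```python
import re

def is_valid_query_name(name):
    """True if the name is non-empty and contains only ASCII letters and digits."""
    return re.fullmatch(r"[A-Za-z0-9]+", name) is not None

for candidate in ("myGene01", "my gene", "gene_1"):
    print(candidate, is_valid_query_name(candidate))
```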

A significant portion of the VNB program is a "web robot", which means that it uses other web sites to compile the information. It is therefore dependent on the "traffic" on those servers. Generally, VNB should return results within 10-90 minutes, unless the servers it uses are down or slower than normal. If your email contains an error message, try again later, because the cause is most likely a problem with the BLAST server. This program works best during BLAST off-peak hours, which are typically 10 PM to 6 AM US Eastern time.