Parse UniGene flat-file format files such as the Hs.data file.
Here is an overview of the flat file format that this parser deals with:
Line types/qualifiers:

    ID            UniGene cluster ID
    TITLE         Title for the cluster
    GENE          Gene symbol
    CYTOBAND      Cytological band
    EXPRESS       Tissues of origin for ESTs in cluster
    RESTR_EXPR    Single tissue or development stage contributes
                  more than half the total EST frequency for this gene.
    GNM_TERMINUS  Genomic confirmation of presence of a 3' terminus;
                  T if a non-templated polyA tail is found among
                  a cluster's sequences; else
                  I if templated As are found in genomic sequence or
                  S if a canonical polyA signal is found on
                  the genomic sequence
    GENE_ID       Entrez gene identifier associated with at least one
                  sequence in this cluster;
                  to be used instead of LocusLink.
    LOCUSLINK     LocusLink identifier associated with at least one
                  sequence in this cluster;
                  deprecated in favor of GENE_ID
    HOMOL         Homology
    CHROMOSOME    Chromosome. For plants, CHROMOSOME refers to mapping
                  on the Arabidopsis genome.
    STS           STS
        ACC=          GenBank/EMBL/DDBJ accession number of STS
                      [optional field]
        UNISTS=       Identifier in NCBI's UNISTS database
    TXMAP         Transcript map interval
        MARKER=       Marker found on at least one sequence in this
                      cluster
        RHPANEL=      Radiation Hybrid panel used to place marker
    PROTSIM       Protein Similarity data for the sequence with
                  highest-scoring protein similarity in this cluster
        ORG=          Organism
        PROTGI=       Sequence GI of protein
        PROTID=       Sequence ID of protein
        PCT=          Percent alignment
        ALN=          Length of aligned region (aa)
    SCOUNT        Number of sequences in the cluster
    SEQUENCE      Sequence
        ACC=          GenBank/EMBL/DDBJ accession number of sequence
        NID=          Unique nucleotide sequence identifier (gi)
        PID=          Unique protein sequence identifier (used for
                      non-ESTs)
        CLONE=        Clone identifier (used for ESTs only)
        END=          End (5'/3') of clone insert read (used for
                      ESTs only)
        LID=          Library ID; see Hs.lib.info for library name
                      and tissue
        MGC=          5' CDS-completeness indicator; if present, the
                      clone associated with this sequence is believed
                      CDS-complete. A value greater than 511 is the gi
                      of the CDS-complete mRNA matched by the EST,
                      otherwise the value is an indicator of the
                      reliability of the test indicating CDS
                      completeness; higher values indicate more
                      reliable CDS-completeness predictions.
        SEQTYPE=      Description of the nucleotide sequence.
                      Possible values are mRNA, EST and HTC.
        TRACE=        The Trace ID of the EST sequence, as provided by
                      NCBI Trace Archive
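
As a rough illustration of the layout described above, the following sketch splits one line into its tag and value, breaks a SEQUENCE value into its KEY=value sub-fields, and applies the MGC rule (a value greater than 511 is a gi, otherwise a reliability indicator). It is not this module's implementation: the function names (split_line, parse_subfields, describe_mgc), the semicolon separator between sub-fields, and the sample line are assumptions made for the example. The 12-column offset mirrors the UG_INDENT constant listed for this module.

```python
UG_INDENT = 12  # column at which the value starts (same value as the module constant)


def split_line(line):
    """Return (tag, value) for one flat-file line."""
    tag = line[:UG_INDENT].strip()
    value = line[UG_INDENT:].strip()
    return tag, value


def parse_subfields(value, sep=";"):
    """Turn 'ACC=X; NID=Y' into {'ACC': 'X', 'NID': 'Y'}.

    The ';' separator is an assumption about the file layout.
    """
    fields = {}
    for item in value.split(sep):
        item = item.strip()
        if not item:
            continue
        key, _, val = item.partition("=")
        fields[key] = val
    return fields


def describe_mgc(value):
    """Interpret an MGC value following the rule in the format overview."""
    number = int(value)
    if number > 511:
        return "gi of the CDS-complete mRNA: %d" % number
    return "reliability indicator: %d" % number


# Hypothetical sample line; the identifiers are made up for illustration.
sample = "SEQUENCE    ACC=XX000001.1; NID=g0000001; CLONE=EXAMPLE:1; END=5'; LID=1; MGC=600; SEQTYPE=EST"

tag, value = split_line(sample)
fields = parse_subfields(value)
print(tag, fields["ACC"], fields["SEQTYPE"])
print(describe_mgc(fields["MGC"]))
```

Keeping the sub-fields in a plain dictionary makes optional qualifiers (for example PID on non-ESTs, or CLONE on ESTs) easy to test for with `in` before use.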
| Class | Description |
|---|---|
| SequenceLine | Store the information for one SEQUENCE line from a UniGene file |
| ProtsimLine | Store the information for one PROTSIM line from a UniGene file |
| STSLine | Store the information for one STS line from a UniGene file |
| Record | Store a UniGene record |
| UnigeneSequenceRecord | Store the information for one SEQUENCE line from a UniGene file (DEPRECATED). |
| UnigeneProtsimRecord | Store the information for one PROTSIM line from a UniGene file (DEPRECATED). |
| UnigeneSTSRecord | Store the information for one STS line from a UniGene file (DEPRECATED). |
| UnigeneRecord | Store a UniGene record (DEPRECATED). |
| _RecordConsumer | This class is DEPRECATED; please use the read() function in this module instead. |
| _Scanner | Scans a UniGene flat-file format file (DEPRECATED). |
| RecordParser | This class is DEPRECATED; please use the read() function in this module instead. |
| Iterator | This class is DEPRECATED; please use the parse() function in this module instead. |

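
The deprecation notes point to two entry points: parse() as the replacement for Iterator and read() as the replacement for RecordParser. The sketch below illustrates that general split between iterating over many records and reading exactly one. It is a standalone approximation, not the module's code: iter_records and read_single are hypothetical names, the "//" record terminator is an assumption about the file layout, and the sample text is invented.

```python
import io


def iter_records(handle):
    """Yield each record as a list of (tag, value) pairs."""
    record = []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("//"):  # assumed record terminator
            if record:
                yield record
            record = []
        elif line.strip():
            tag, _, value = line.partition(" ")
            record.append((tag, value.strip()))
    if record:  # tolerate a missing final terminator
        yield record


def read_single(handle):
    """Return the only record in the handle, or raise ValueError."""
    records = list(iter_records(handle))
    if len(records) != 1:
        raise ValueError("expected exactly one record, found %d" % len(records))
    return records[0]


# Invented two-record sample, for illustration only.
text = (
    "ID          Xx.1\n"
    "TITLE       Hypothetical cluster one\n"
    "SCOUNT      2\n"
    "//\n"
    "ID          Xx.2\n"
    "TITLE       Hypothetical cluster two\n"
    "SCOUNT      1\n"
    "//\n"
)

records = list(iter_records(io.StringIO(text)))
ids = [dict(record)["ID"] for record in records]
print(ids)
```

Writing the iterator as a generator means a large file is processed one record at a time, while the single-record helper fails loudly when the input holds zero or several records.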

UG_INDENT = 12

StringTypes =

__package__ =

xml_support = 1

Generated by Epydoc 3.0.1 on Fri Nov 26 16:18:42 2010 | http://epydoc.sourceforge.net