APPENDIX 3 Codon Usage in C. elegans
Abstract
All nuclear protein-coding sequences annotated as deriving from C. elegans were extracted from the GenBank/EMBL/DDBJ DNA sequence data library (GenBank release 92) using the ACNUC retrieval system (Gouy et al. 1985). Duplicate sequences, partial sequences, and sequences containing ambiguous codons or multiple stop codons were excluded, yielding a total dataset of 4027 open reading frames (ORFs). Although some of these sequences were determined by the “traditional” approach (i.e., the genes were identified and sequenced because of some known function or phenotype), many others were found within cosmids sequenced as part of the genome project, and many of the genes thus identified remain putative. Therefore, gene sequences were first designated as (1) “genes” if the sequence was determined by...
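The exclusion criteria stated in the abstract (duplicates, partial sequences, ambiguous codons, multiple stop codons) can be illustrated with a minimal Python sketch. This is not the authors' ACNUC-based procedure. The function names `is_complete_orf`, `filter_orfs`, and `codon_counts` are hypothetical, and the definition of "partial" used here (no initial ATG, no terminal stop codon, or a length that is not a multiple of three) is an assumption, since the abstract does not define it.

```python
from collections import Counter

# Stop codons of the standard nuclear genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}
VALID_BASES = set("ACGT")


def is_complete_orf(seq):
    """Return True if seq looks like a complete, unambiguous ORF.

    Assumption (not from the source): a sequence counts as "partial"
    if it lacks an initial ATG, lacks a terminal stop codon, or has a
    length that is not a multiple of three.
    """
    seq = seq.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    if set(seq) - VALID_BASES:  # any ambiguous base, e.g. N
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] != "ATG":
        return False
    stop_positions = [i for i, c in enumerate(codons) if c in STOP_CODONS]
    # Exactly one stop codon, and it must be the last codon.
    return stop_positions == [len(codons) - 1]


def filter_orfs(records):
    """Keep the first copy of each distinct, complete ORF.

    records is an iterable of (name, sequence) pairs.
    """
    seen = set()
    kept = []
    for name, seq in records:
        seq = seq.upper()
        if seq in seen or not is_complete_orf(seq):
            continue
        seen.add(seq)
        kept.append((name, seq))
    return kept


def codon_counts(orfs):
    """Tally codon occurrences over all retained ORFs."""
    counts = Counter()
    for _, seq in orfs:
        counts.update(seq[i:i + 3] for i in range(0, len(seq), 3))
    return counts
```

Treating duplicates as exact sequence matches is likewise a simplification; a real curation step might also need to merge database entries that differ only in annotation or flanking sequence.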
DOI: http://dx.doi.org/10.1101/0.1053-1057