Mining gene expression information using a controlled hierarchical vocabulary

Genome scientists, geneticists and clinicians work in disciplines that have each added to scientific knowledge, namely through the release of the draft genome sequence, transcriptome data, and mapping information for disease loci and disease phenotypes. Successful integration of these fields, by linking genome information to disease phenotypes, allows for accelerated disease gene discovery.

Electric Genetics and SANBI researchers have developed a tool that integrates transcript information, genomic sequence, genetic mapping information and a standardised controlled vocabulary for the identification of disease candidate genes.

We have constructed a controlled vocabulary for dbEST libraries that classifies each cDNA library along four orthogonal categories: anatomical_site, cell_type, developmental stage and pathology. A total of 5756 dbEST libraries were mapped onto this controlled vocabulary and imported into a relational database. The results demonstrate that detailed definitions in four separate vocabulary tracks allow fine-grained mining of gene expression state information.

The controlled vocabulary was linked to the STACK_PACK clustering tool, providing a means of querying all reconstructed transcripts. Genetic markers were mapped onto the "golden path" genome assemblies to serve as a reference guide for positioning all identified candidate genes. Reconstructed transcripts are also mapped to the "golden path" genome assemblies.

The EG/SANBI toolset and controlled vocabulary will be released to the scientific community under an Open Source license.

---END --
_______________/\/eGenetics.com\/\_____________________________________
Peter van Heusden        1024D/0517502B
pvh@egenetics.com
Electric Genetics        DE5B 6EAA 28AC 57F7 58EF 9295 6A26 6A92 0517 502B
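
Illustrative sketch (not part of the EG/SANBI toolset): the abstract describes a hierarchical controlled vocabulary with four tracks, stored in a relational database and used to query cDNA libraries. The Python example below shows one way such a design could look, using an in-memory SQLite database. The table layout, the function names, and the term and library names ("brain", "lib_A", and so on) are all assumptions invented for illustration; the abstract does not specify the actual schema.

```python
import sqlite3

# The four vocabulary tracks named in the abstract.
TRACKS = ("anatomical_site", "cell_type", "developmental stage", "pathology")


def build_db():
    """Create a hypothetical schema: terms form a tree within each track."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE term (
            id        INTEGER PRIMARY KEY,
            track     TEXT NOT NULL,
            name      TEXT NOT NULL,
            parent_id INTEGER REFERENCES term(id)
        );
        CREATE TABLE library (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE library_term (
            library_id INTEGER NOT NULL REFERENCES library(id),
            term_id    INTEGER NOT NULL REFERENCES term(id)
        );
    """)
    return conn


def add_term(conn, track, name, parent_id=None):
    """Insert a vocabulary term; parent_id links it to a broader term."""
    if track not in TRACKS:
        raise ValueError(f"unknown track: {track}")
    cur = conn.execute(
        "INSERT INTO term (track, name, parent_id) VALUES (?, ?, ?)",
        (track, name, parent_id),
    )
    return cur.lastrowid


def add_library(conn, name, term_ids):
    """Insert a library and annotate it with the given term ids."""
    cur = conn.execute("INSERT INTO library (name) VALUES (?)", (name,))
    lib_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO library_term (library_id, term_id) VALUES (?, ?)",
        [(lib_id, t) for t in term_ids],
    )
    return lib_id


def libraries_for_term(conn, term_id):
    """Return libraries annotated with the term or any narrower term."""
    rows = conn.execute(
        """
        WITH RECURSIVE subtree(id) AS (
            SELECT ?
            UNION ALL
            SELECT term.id FROM term JOIN subtree ON term.parent_id = subtree.id
        )
        SELECT DISTINCT library.name
        FROM library
        JOIN library_term ON library_term.library_id = library.id
        WHERE library_term.term_id IN (SELECT id FROM subtree)
        ORDER BY library.name
        """,
        (term_id,),
    )
    return [row[0] for row in rows]


def demo():
    """Build a tiny made-up data set (names are invented, not from dbEST)."""
    conn = build_db()
    brain = add_term(conn, "anatomical_site", "brain")
    cortex = add_term(conn, "anatomical_site", "cerebral cortex", parent_id=brain)
    tumour = add_term(conn, "pathology", "tumour")
    add_library(conn, "lib_A", [brain])
    add_library(conn, "lib_B", [cortex, tumour])
    add_library(conn, "lib_C", [tumour])
    return conn, brain, cortex, tumour
```

Because the terms form a tree, a query on a broad term such as the hypothetical "brain" also returns libraries annotated only with a narrower term beneath it, which is the practical benefit of a hierarchical vocabulary over a flat keyword list.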