DNA regulatiivsete elementide parendatud otsing kasutades geneetilist algoritmi (Improved search for DNA regulatory elements using a genetic algorithm)
Date
2007
Publisher
Tartu Ülikool
Abstract
Detection of transcription factor binding sites is an important area of contemporary bioinformatics research. Most algorithms currently available for this task (e.g. SPEXS or MEME) perform pattern mining on strings: they search for overrepresented or conserved short DNA sequences and report the position weight matrices (PWMs) corresponding to the sites found. These PWMs can then be used to search for binding sites in other genes or to perform functional classification. However, the PWMs reported by SPEXS or MEME are not explicitly optimized for discriminative tasks and can therefore be suboptimal. In this thesis we examine a way to optimize these initial PWMs with genetic algorithms so that they perform better in gene classification. We used two measures of discriminative performance, the hypergeometric p-value and ROC AUC, and ran genetic algorithms to optimize them on two datasets, one artificial and one realistic. In two of the four experiments, the p-value and the ROC AUC score could be significantly improved, a result we find very interesting.
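To make the two discriminative measures named in the abstract concrete, the following is a minimal Python sketch of how a PWM could be scored against labelled sequences. It is an illustration only and not the thesis implementation: the function names, the toy "TATA" matrix, the example sequences, and the choice to score a sequence by its best-matching window are all assumptions made here. A genetic algorithm would use either measure as the fitness of a candidate PWM.

```python
import math

BASES = "ACGT"  # column order assumed for each PWM row


def best_pwm_score(pwm, seq):
    """Highest summed score over all windows of len(pwm) in seq (assumed scoring rule)."""
    width = len(pwm)
    return max(
        sum(pwm[i][BASES.index(seq[start + i])] for i in range(width))
        for start in range(len(seq) - width + 1)
    )


def roc_auc(pos_scores, neg_scores):
    """Chance that a random positive outscores a random negative; ties count as 0.5."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))


def hypergeom_pvalue(total, positives, selected, hits):
    """P(X >= hits) when `selected` of `total` genes are drawn and `positives` are in the class."""
    upper = min(positives, selected)
    tail = sum(
        math.comb(positives, k) * math.comb(total - positives, selected - k)
        for k in range(hits, upper + 1)
    )
    return tail / math.comb(total, selected)


# Hypothetical toy PWM favouring "TATA": +1 for the preferred base, -1 otherwise.
toy_pwm = [[1.0 if b == want else -1.0 for b in BASES] for want in "TATA"]

# Hypothetical labelled sequences (positives contain the motif).
positives = ["GGTATACC", "TATAGGGG"]
negatives = ["GGGGCCCC", "ACGTACGT"]

pos_scores = [best_pwm_score(toy_pwm, s) for s in positives]
neg_scores = [best_pwm_score(toy_pwm, s) for s in negatives]

auc = roc_auc(pos_scores, neg_scores)

# Call a sequence a "match" if its score reaches an assumed threshold of 2.0.
threshold = 2.0
selected = sum(s >= threshold for s in pos_scores + neg_scores)
hits = sum(s >= threshold for s in pos_scores)
p_value = hypergeom_pvalue(len(positives) + len(negatives), len(positives), selected, hits)

print(f"ROC AUC = {auc:.3f}, hypergeometric p-value = {p_value:.4f}")
```

In this sketch the ROC AUC is threshold-free, while the hypergeometric p-value depends on the chosen score cutoff, so an optimizer targeting the p-value would also need some rule for picking that cutoff.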