Authors:
Seyyeddokht, Atefeh: Agricultural Research, Education and Extension Organization - Animal Science Research Department - Khorasan Razavi Agricultural and Natural Resources Research and Education Center, Mashhad, Iran; Rahmaninia, Javad: Animal Science Research Institute of Iran - Agricultural Research, Education and Extension Organization, Karaj, Iran
Keywords:
computational methods, regulatory factors, silkworm, microRNA
Persian abstract:
MicroRNAs are a large family of short, non-protein-coding RNA molecules (ncRNAs) with important roles in regulating developmental processes in plants and animals. Few studies have addressed the microRNAs of the economically important silkworm, and those that exist focus on identification, expression analysis, and function prediction. In general, microRNA sequences are highly conserved across species and are produced in the nucleus from a primary stem-loop structure, which is one of the most important features of microRNAs. MicroRNAs are among the most important regulatory factors acting at the post-transcriptional level of gene expression, and they participate in regulating a large number of physiological processes such as growth and development, metabolism, and disease occurrence. Although thousands of microRNAs have been identified in various species, a very large number remain unknown. Discovering new microRNA genes is therefore an important step toward understanding the microRNAs that mediate post-transcriptional regulatory mechanisms. Biological methods for identifying microRNA genes may be limited in detecting rare microRNAs and are largely restricted to the specific tissues and developmental stages of the organism under study. These limitations have led to the development of advanced computational methods for identifying putative novel microRNAs. Computational methods are expected to improve the accuracy of identifying silkworm microRNAs. In this study, various computational models were used to identify microRNA sequences, and their performance was evaluated using appropriate data and effective extracted biological features. Compared with the other models used in this study, the multilayer perceptron model achieved the highest accuracy, F-measure, and Matthews correlation coefficient and was therefore proposed as a suitable method for predicting microRNA sequences in the silkworm.
English abstract:
Introduction MicroRNAs (miRNAs) constitute a large family of small non-protein-coding RNA (ncRNA) molecules with important roles in regulating developmental processes in both plants and animals. In general, miRNA sequences are highly conserved across animals and are produced in the nucleus from a primary stem-loop structure, which is an important feature of miRNAs. MiRNAs are among the most important regulators of gene expression at the post-transcriptional level and contribute to the modulation of a large number of physiological processes such as development, metabolism, and disease occurrence. To date, only a few studies on the miRNAs of the economically important silkworm, Bombyx mori, have been carried out, focusing on detection, expression analysis, and function prediction. Machine learning approaches are central to successful prediction, because identifying miRNAs can be framed as a classification problem, which these methods are designed to solve.
Materials and Methods Although hundreds of miRNAs have been detected in different animals, many remain unknown. Finding novel miRNA genes is therefore an essential step toward understanding miRNA-mediated post-transcriptional regulation. Biological methods for recognizing miRNA genes may be inadequate for identifying rare miRNAs, and they are further limited to the tissues surveyed and the developmental stage of the animal under study. These restrictions have led to the development of new computational methods for detecting potential miRNAs. Experimentally verified miRNA sequences from miRBase release 22.0 were extracted to form the positive dataset. Because the secondary structures reported in miRBase were predicted by a collection of different RNA folding packages, all miRNA secondary structures in this study were analyzed with RNAfold for uniformity. A key step in machine learning approaches is the selection of a suitable negative dataset, which is essential for a well-trained classifier. If the negative sequences are too artificial, e.g. completely random sequences, the classifier may not learn to distinguish between different categories of real biological sequences. Conversely, if the negative dataset is too similar to the positive dataset, the classifier will be unable to separate the two. We investigated several types of negative sequences and selected those that gave the best separation from the positive dataset. The positive training dataset consisted of known silkworm pre-miRNAs, and the negative training dataset consisted of other ncRNA sequences.
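RNAfold writes each predicted structure in dot-bracket notation, where matching parentheses mark paired bases and dots mark unpaired ones. The sketch below is a minimal illustration, not the authors' pipeline, of how pair-type counts (Watson-Crick A-U and G-C, wobble G-U) and unpaired-base counts could be read off such a string. The function name `pairing_features`, the `paired_fraction` feature, and the toy hairpin are hypothetical; the structure is assumed to contain only `(`, `)` and `.` (no pseudoknot brackets), and the minimum free energy, which RNAfold reports next to the structure, is taken as given rather than recomputed.

```python
from collections import Counter

# Canonical names for the pair types; lookup is order-independent.
PAIR_NAMES = {
    frozenset("AU"): "AU",  # Watson-Crick
    frozenset("GC"): "GC",  # Watson-Crick
    frozenset("GU"): "GU",  # wobble
}


def pairing_features(sequence: str, structure: str) -> dict:
    """Count base-pair types and unpaired bases in one hairpin.

    Assumes `structure` is RNAfold-style dot-bracket notation using only
    '(', ')' and '.', aligned position by position with `sequence`.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    seq = sequence.upper().replace("T", "U")  # accept DNA-alphabet input
    open_positions = []  # stack of indices of unmatched '('
    pairs = Counter()
    unpaired = Counter()
    for i, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(i)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {i}")
            j = open_positions.pop()
            # A-U and U-A both map to "AU"; non-canonical pairs go to "other".
            pairs[PAIR_NAMES.get(frozenset(seq[j] + seq[i]), "other")] += 1
        elif symbol == ".":
            unpaired[seq[i]] += 1
        else:
            raise ValueError(f"unexpected symbol {symbol!r} at position {i}")
    if open_positions:
        raise ValueError("unmatched '(' in structure")

    features = {name: pairs[name] for name in ("AU", "GC", "GU", "other")}
    for base in "AGCU":
        features[f"unpaired_{base}"] = unpaired[base]
    # Fraction of nucleotides that take part in a base pair.
    features["paired_fraction"] = 2 * sum(pairs.values()) / len(seq) if seq else 0.0
    return features


if __name__ == "__main__":
    # Hypothetical toy hairpin, not taken from miRBase.
    print(pairing_features("GAGCAAAGUUC", "((((...))))"))
```

A stack is used because dot-bracket nesting is exactly balanced-parenthesis matching: each `)` pairs with the most recent unmatched `(`.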
Our feature set comprised various features, and selecting the most discriminative subset can increase the performance, efficiency, and interpretability of a classifier by reducing its complexity.
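The abstract does not say how the discriminative subset was chosen. As one hedged illustration of a filter-style approach, the sketch below ranks each feature by a two-class Fisher score (squared difference of the class means divided by the sum of the within-class variances) and keeps the top k. The Fisher score is a swapped-in example technique, not necessarily the method used in this study; the function names, feature names, and numbers are hypothetical.

```python
from statistics import mean, pvariance


def fisher_scores(positive, negative):
    """Two-class Fisher score per feature column.

    `positive` and `negative` are lists of equal-length numeric feature
    vectors (hypothetical layout). Higher scores mean better class separation.
    """
    scores = []
    for j in range(len(positive[0])):
        pos = [row[j] for row in positive]
        neg = [row[j] for row in negative]
        gap = (mean(pos) - mean(neg)) ** 2         # between-class separation
        spread = pvariance(pos) + pvariance(neg)  # within-class scatter
        if spread == 0:
            # Constant within each class: perfectly separating if the means differ.
            scores.append(float("inf") if gap > 0 else 0.0)
        else:
            scores.append(gap / spread)
    return scores


def top_k_features(positive, negative, names, k):
    """Return the names of the k highest-scoring features."""
    scores = fisher_scores(positive, negative)
    ranked = sorted(zip(names, scores), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


if __name__ == "__main__":
    # Toy values for illustration only; columns are MFE, GC pairs, GU pairs.
    names = ["MFE", "GC", "GU"]
    pre_mirnas = [[-30.0, 10, 2], [-28.0, 11, 3], [-32.0, 9, 2]]
    other_ncrnas = [[-10.0, 5, 2], [-12.0, 6, 3], [-8.0, 4, 2]]
    print(top_k_features(pre_mirnas, other_ncrnas, names, k=2))
```

Filter methods like this score each feature independently of the classifier, which keeps them cheap; wrapper methods that retrain the classifier on candidate subsets can capture feature interactions but cost far more.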
Results and Discussion The secondary structural patterns of the pre-miRNAs used in this study, such as intramolecular base pairing, are important features for miRNA classification. The discriminative power of the secondary-structure conformations (in dot-bracket notation) for separating the two classes was analyzed. Secondary structural features such as minimum free energy, Watson-Crick base pairs (A-U, G-C), wobble base pairs (G-U), and unpaired bases (A, G, C, U) were analyzed with different algorithms. We addressed the classification problem by developing an effective classification system based on machine learning techniques. Our approach includes introducing more representative datasets, extracting new effective biological features, and comprehensively evaluating classification performance through cross-validation. The performance of the different algorithms was measured by the numbers of true negatives (TN), true positives (TP), false positives (FP), and false negatives (FN), and by accuracy (Q). To evaluate the efficiency of the methods developed in this study, F-measure, Matthews correlation coefficient (MCC), accuracy (Q), and ROC area were calculated. The models were tested on miRBase release 22 data using ten-fold cross-validation. The Multilayer Perceptron model could distinguish pre-miRNAs from other non-coding sequences, which can be important for detecting true pre-miRNAs in genomic sequences. Consequently, a new miRNA prediction model could help in understanding the characteristics of miRNAs associated with miRNA biogenesis.
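The evaluation measures named in the Results follow standard definitions from the confusion-matrix counts: Q = (TP + TN) / (TP + TN + FP + FN); the F-measure is the harmonic mean of precision TP / (TP + FP) and recall TP / (TP + FN); and MCC = (TP*TN - FP*FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)). The sketch below computes them; the function name and the counts are hypothetical, not results from this study. ROC area is omitted because it needs per-sequence classifier scores rather than counts alone.

```python
import math


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy (Q), precision, recall, F-measure and MCC from
    confusion-matrix counts. Zero denominators are reported as 0.0."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # also called sensitivity
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Q": accuracy, "precision": precision, "recall": recall,
            "F": f_measure, "MCC": mcc}


if __name__ == "__main__":
    # Hypothetical counts, e.g. summed over the ten folds of a cross-validation.
    print(classification_metrics(tp=90, tn=85, fp=15, fn=10))
```

The abstract does not state whether counts were pooled across the ten folds before computing these measures or whether the measures were averaged per fold; the two conventions can give slightly different values. MCC is often preferred when the positive and negative sets differ in size, because it uses all four counts.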
Conclusion Research on miRNAs represents important progress in the study of ncRNAs and may provide further insight into RNA regulatory networks. Experimental research on silkworm microRNAs has shown that microRNAs can significantly affect the mechanisms underlying silkworm growth. Together with the research done so far, such work provides a basis for improving our understanding of RNA regulatory networks and of the molecular mechanisms behind gene expression patterns across the stages of the silkworm life cycle. Because computational research on silkworm microRNAs is still limited, further study of this species' microRNAs would be an important advance in the study of non-coding RNAs and could provide more information on their activity. Machine learning algorithms can help researchers uncover miRNAs that have so far escaped detection.