Regional Information Center for Science and Technology - Summarization of Big Data Documents Using Semantic Features of Non-negative Matrix Factorization Based on Hadoop Distributed Parallel Processing

Conference record number:

3704

Paper title:

Summarization of Big Data Documents Using Semantic Features of Non-negative Matrix Factorization Based on Hadoop Distributed Parallel Processing

Title in another language:

Big data Summarization using non-negative Matrix Factorization (NMF) by Hadoop and Map-Reduce

Authors:

Omid Yousefian Hashemabad (yousefian.itm@gmail.com), Azad University, Science and Research Branch, Tehran; Ataollah Abtahi (aoa.sepehr4@gmail.com), Azad University, Science and Research Branch, Tehran; Mahmood Alborzi (mahmood_alborzi@yahoo.com), Azad University, Science and Research Branch, Tehran; Kaveh Yousefian Hashemabad (kaveh_y2002@yahoo.com), Azad University, Electronic Branch

Number of pages:

Keywords:

Summarization, big data, Hadoop, semantic features, non-negative matrix factorization, MapReduce

Publication year:

1396 (Solar Hijri; 2017-2018 CE)

Conference title:

5th International Conference on Electrical and Computer Engineering with Emphasis on Indigenous Knowledge

Document language:

Persian

Persian abstract:

In the era of the data big bang and the content big bang, text summarization has become an important tool for evaluating, interpreting, and understanding text. For this reason, it has also become a key tool for small- and large-scale individual and social decision-making, for producing applicable information and knowledge, and even for producing science. Manually summarizing very large texts is difficult for humans. Traditional document summarization methods are limited by document size and cannot summarize big data documents in the cloud. This paper proposes a big data summarization method based on semantic features extracted by non-negative matrix factorization (NMF), computed with distributed parallel processing in Hadoop. The experimental results of this study show that the method can effectively summarize big data documents using Hadoop's distributed parallel processing and achieves better precision and recall than single-node summarization methods.
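The NMF-based summarization idea in the abstract can be illustrated with a small, single-machine sketch. It assumes a term-by-sentence frequency matrix, plain Lee-Seung multiplicative updates, and a generic-relevance heuristic that scores each sentence by its loadings on the semantic features, weighted by each feature's overall importance. The record does not give the paper's exact scoring rule, so this is one common heuristic rather than the authors' method; the names `nmf` and `summarize` and the parameters `r`, `k`, and `iters` are hypothetical, and the Hadoop distribution is not modeled here.

```python
import random
import re


def matmul(X, Y):
    """Multiply dense matrices stored as lists of rows."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def nmf(A, r, iters=200, seed=0, eps=1e-9):
    """Approximate A (m x n, non-negative) as W (m x r) times H (r x n) with
    Lee-Seung multiplicative updates for the squared Frobenius error."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() + 0.1 for _ in range(r)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(r)]
    for _ in range(iters):
        # H <- H * (W^T A) / (W^T W H); eps guards against division by zero
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(r)]
        # W <- W * (A H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][c] * num[i][c] / (den[i][c] + eps) for c in range(r)] for i in range(m)]
    return W, H


def summarize(sentences, r=2, k=1):
    """Hypothetical NMF-based extractive summarizer: returns k sentences in
    their original order, ranked by a generic relevance score."""
    tokens = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    vocab = sorted({t for toks in tokens for t in toks})
    # Term-by-sentence frequency matrix: rows are terms, columns are sentences.
    A = [[toks.count(term) for toks in tokens] for term in vocab]
    _, H = nmf(A, r)
    total = sum(map(sum, H))
    # Weight each semantic feature (row of H) by its share of the total mass.
    weights = [sum(row) / total for row in H]
    scores = [sum(weights[i] * H[i][j] for i in range(r)) for j in range(len(sentences))]
    top = sorted(range(len(sentences)), key=lambda j: scores[j], reverse=True)[:k]
    return [sentences[j] for j in sorted(top)]


if __name__ == "__main__":
    docs = [
        "Hadoop splits big data into blocks.",
        "MapReduce runs map and reduce tasks on a cluster.",
        "Matrix factorization extracts semantic features.",
        "Semantic features from matrix factorization rank sentences.",
    ]
    print(summarize(docs, r=2, k=2))
```

Each column of `H` serves as a sentence's vector of semantic variables, so sentences that load heavily on dominant features rank highest. In a big data setting, the pure-Python matrix products would be replaced by distributed computations.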

English abstract:

The growth and expansion of Internet data sources such as web pages, social networks, smartphones, apps, and sensors, together with the rapid growth of Internet access by users (e.g., through laptops, mobile devices, and IoT devices), have turned ordinary data into big data. Big data refers to data sets whose volume is so large that they require specialized solutions to manage. Such data are so large and bulky that typical software and data management tools cannot perform operations such as collection, storage, summarization, search, filtering, and processing on them. Therefore, these massive amounts of data need to be summarized. Document summarization is the process of reducing the size of documents while preserving their essential content; that is, it distills the most important information from a document. Summarization makes it possible to use more resources at higher speed, and the result is richer in information. Its main advantage is reduced reading time. Traditional document summarization methods are limited and lack the performance needed to summarize big data documents, so this paper proposes a method that uses various statistical and natural language processing techniques based on distributed parallel processing with the Hadoop framework. The proposed method represents the inherent structure of big data sets well through semantic features obtained by a scalable NMF on Hadoop MapReduce, and it summarizes big data documents using distributed parallel processing.
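The abstract relies on NMF being computable with Hadoop MapReduce. One common way to make the multiplicative update scale is to observe that W^T A and W^T W are sums of per-row outer products, so each mapper can process a block of term rows and a reducer can add the partial results. The pure-Python sketch below simulates that map-and-reduce pattern for the H update only. It is an assumption about how such a scalable NMF could be organized, not the authors' implementation, and `mapper`, `reducer`, and `distributed_h_update` are hypothetical names rather than Hadoop APIs.

```python
from functools import reduce


def mapper(block):
    """block: list of (w_row, a_row) pairs for the same term indices.
    Emits this block's partial W^T A (r x n) and W^T W (r x r)."""
    r, n = len(block[0][0]), len(block[0][1])
    wta = [[0.0] * n for _ in range(r)]
    wtw = [[0.0] * r for _ in range(r)]
    for w_row, a_row in block:
        for p in range(r):
            for j in range(n):
                wta[p][j] += w_row[p] * a_row[j]
            for q in range(r):
                wtw[p][q] += w_row[p] * w_row[q]
    return wta, wtw


def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def reducer(partials):
    """Sum the (W^T A, W^T W) partial products emitted by all mappers."""
    return reduce(lambda acc, p: (add(acc[0], p[0]), add(acc[1], p[1])), partials)


def distributed_h_update(W, H, A, num_blocks=2, eps=1e-9):
    """One multiplicative update H <- H * (W^T A) / (W^T W H), with the two
    large products computed block-wise over the rows of W and A."""
    rows = list(zip(W, A))
    size = -(-len(rows) // num_blocks)  # ceiling division
    blocks = [rows[i:i + size] for i in range(0, len(rows), size)]
    wta, wtw = reducer(map(mapper, blocks))
    # The small r x r and r x n products are finished on the driver.
    r, n = len(H), len(H[0])
    wtwh = [[sum(wtw[p][q] * H[q][j] for q in range(r)) for j in range(n)] for p in range(r)]
    return [[H[p][j] * wta[p][j] / (wtwh[p][j] + eps) for j in range(n)] for p in range(r)]


if __name__ == "__main__":
    W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    H = [[1.0, 1.0], [1.0, 1.0]]
    print(distributed_h_update(W, H, A, num_blocks=2))
```

Because addition is associative, the number of blocks changes the result only by floating-point rounding. The W update can be distributed analogously: once H is broadcast, each mapper updates its own rows of W using the corresponding rows of A.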

Country:

Iran

Link to this document:

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=36&DC=291334