Title

Improving Near Real Time Data Warehouse Refreshment

Original title (Persian)

بهبود به‌ روز رساني پايگاه داده تحليلي نيمه‌ آني

Authors

Hazrati, Isa (Islamic Azad University, Miandoab - Department of Computer Engineering); Daneshpour, Negin (Shahid Rajaee Teacher Training University, Tehran - Faculty of Computer Engineering)

Keywords

Near-real-time data warehouse, join, data stream, decision making

Abstract (translated from Persian)

Today, fast decision making is of great importance in the business environment. Managers therefore try to use the data in the data warehouse to make forecasts and sound decisions. To have suitable data, changes made at the sources must be applied to the data warehouse with the least possible delay. Numerous algorithms have been proposed to this end, among them X-HYBRIDJOIN. In that algorithm, the method used to select the disk partition that is loaded into main memory is not well suited to the task. This paper presents a new algorithm that changes how that partition is selected. For each partition of R residing on disk, the number of records in main memory that belong to that partition is counted and recorded in an array. With this array, the partition that contains the most records for the join can be selected each time. To count the records of each partition, every incoming stream item is checked on arrival to determine which partition it belongs to. Results from running the new algorithm show that join time and memory consumption are reduced.
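The partition-selection idea described in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the class name `PartitionCounter`, its methods, and the range-partitioned layout of R (each partition identified by its smallest join key) are all hypothetical.

```python
from bisect import bisect_right


class PartitionCounter:
    """Counts, per disk partition of R, the buffered stream tuples awaiting a join.

    Assumption: R is range-partitioned on the join attribute, and
    partition_bounds[i] is the smallest join key stored in partition i.
    """

    def __init__(self, partition_bounds):
        self.bounds = sorted(partition_bounds)
        # One counter per partition: the one-dimensional array from the abstract.
        self.counts = [0] * len(self.bounds)

    def partition_of(self, key):
        # Index of the last partition whose lower bound is <= key.
        return max(bisect_right(self.bounds, key) - 1, 0)

    def on_arrival(self, key):
        # Called for every incoming stream tuple: find its partition, bump the count.
        p = self.partition_of(key)
        self.counts[p] += 1
        return p

    def best_partition(self):
        # Partition with the most waiting tuples (first one wins on ties).
        return max(range(len(self.counts)), key=self.counts.__getitem__)

    def on_joined(self, partition, n):
        # After a partition is loaded and n waiting tuples are joined, release them.
        self.counts[partition] -= n


counter = PartitionCounter([0, 100, 200])
for key in (5, 150, 160, 250):
    counter.on_arrival(key)
chosen = counter.best_partition()
```

In this example the keys 150 and 160 both fall into the second partition, so that partition is the one loaded next. Keeping the counts up to date on arrival makes the selection a single scan over the array rather than a scan over all buffered tuples.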

English abstract

A near-real-time data warehouse gives end users the information they need to make appropriate decisions. The fresher the data in the warehouse, the better the resulting decisions. To keep the data fresh and up to date, changes made at the sources must be added to the data warehouse with little delay, which requires transforming them into the data warehouse format. One well-known algorithm in this area is X-HYBRIDJOIN. It exploits the characteristics of real-world data to speed up the join operation and keeps the most frequently used partitions in main memory. The algorithm proposed in this paper joins a disk-based relation with an input data stream in order to enrich the stream. It uses a clustered index on the join attribute of the disk-based relation and assumes that the join attribute is unique throughout the relation. The proposed algorithm improves on X-HYBRIDJOIN in two stages. In the first stage, frequently accessed records of the source table are detected while the algorithm runs: a counter tracks the accesses to each record, and once the count exceeds a predetermined threshold, the record is treated as frequently used and placed in a hash table, which keeps such records in main memory. Each incoming stream tuple is looked up in this table when it enters the join area. In the second stage, the method for choosing the partition to load into main memory is changed. A one-dimensional array is used to select, among all partitions of the source table, the one with the highest number of records to join. Using this array in every iteration always selects the best partition to load into memory.
Several experiments were carried out to evaluate the proposed algorithm. The experimental results show that it achieves a higher service rate than the existing algorithms, where the service rate is the number of records joined per unit of time. A higher service rate means a more effective algorithm.
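The first stage described in this abstract, caching frequently accessed records in a hash table once an access counter passes a threshold, might look roughly like the sketch below. It is a simplified illustration, not the published algorithm: `FrequentRecordCache`, `join_stream`, the in-memory dictionary standing in for the disk-based relation, and the immediate disk lookup on a cache miss are all assumptions made to keep the example short.

```python
class FrequentRecordCache:
    """Hash table of frequently used records, filled by per-record access counters."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.access_counts = {}  # join key -> number of accesses seen so far
        self.cache = {}          # join key -> record kept in main memory

    def lookup(self, key):
        # Returns the cached record, or None when the key is not cached.
        return self.cache.get(key)

    def record_access(self, key, record):
        # Count the access; promote the record once the count exceeds the threshold.
        n = self.access_counts.get(key, 0) + 1
        self.access_counts[key] = n
        if n > self.threshold:
            self.cache[key] = record
        return n


def join_stream(stream_keys, disk_relation, cache):
    """Joins each stream key with its record, checking the cache before the 'disk'."""
    output = []
    disk_reads = 0
    for key in stream_keys:
        record = cache.lookup(key)
        if record is None:
            # Cache miss: fall back to the (simulated) disk-based relation.
            disk_reads += 1
            record = disk_relation.get(key)
            if record is None:
                continue  # no matching record in the relation
            cache.record_access(key, record)
        output.append((key, record))
    return output, disk_reads


relation = {1: "a", 2: "b"}  # stand-in for the disk-based relation (unique join keys)
cache = FrequentRecordCache(threshold=2)
joined, reads = join_stream([1, 1, 1, 1, 2], relation, cache)
```

Here the record for key 1 is promoted on its third access, so the fourth occurrence is served from memory. In this sketch the service rate would be the length of `joined` divided by the elapsed time, which rises as more lookups avoid the disk.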

Publication year

1397 (Solar Hijri calendar; 2018/2019 CE)

Journal title

Signal and Data Processing (Persian: پردازش علائم و داده ها)

PDF file

7329477

Link to this record

https://search.isc.ac/dl/search/defaultta.aspx?DTC=8&DC=997316