Article title (translated from Persian):
Presenting a new method for data cleaning to improve data warehouse quality
Title in another language (English, as published):
A new approach for data cleaning to improve quality of data warehouse
Authors:
Shahnavaz, Ali (Islamic Azad University, Zanjan Branch, Department of Mathematics and Statistics); Afzali, Mehdi (Islamic Azad University, Zanjan Branch, Department of Information Technology Engineering); Rahimzadeh, Shima (Zanjan University of Medical Sciences and Health Services)
Keywords:
data management, preparation, data warehouse, data mining, dirty data, cleaning
Abstract (translated from Persian):
The most important issue in data management is data quality. Data quality can be ensured by cleaning the data before it is loaded into the data warehouse. Data cleaning is the process of detecting and correcting errors and inconsistencies in the data warehouse. Because of the large volume of information held in databases, many problems and inconsistencies have arisen in them. Our main goal is to present a method for resolving the inconsistencies in databases in order to clean dirty data. A new method is proposed with the aim of improving data warehouse quality to support correct decision-making. To test the proposed method, we used the student health certificate database of Zanjan University of Medical Sciences for students who entered in the years 92 and 93, comprising 845 people, all of whom have since graduated. The proposed program was implemented and run in C#; it is a Windows application written in four layers. By running the proposed method, we were able to examine the students' national codes, detect the dirty data in this attribute, and then apply the data correction process to it. Based on the results, the share of dirty data in the resulting data warehouse decreased from 25.79 percent to 4.97 percent.
Abstract (English):
Data management provides the means by which an organization's information needs can be properly met. The most important issue in business intelligence is data quality. Data quality can be ensured by cleaning the data before it is loaded into the data warehouse. Data cleaning is the process of detecting and correcting errors and inconsistencies in the data warehouse. Because of the huge volume of data in databases, many problems and inconsistencies have emerged. The main goal of this study is to resolve inconsistencies in databases in order to clean up dirty data. A new approach is proposed with the aim of improving the quality of the data warehouse to support correct decisions. To test the proposed approach, a data collection of student health certificates was used. By implementing this approach, we were able to detect dirty data using the students' national codes and then apply the correction process to it. Based on the results, the amount of dirty data decreased from 25.79% to 4.97%.
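Illustrative note: the abstract states that dirty values were detected in the national code attribute, but it does not say which checks the authors applied, and their program was written in C#. As a sketch only, the Python code below applies the widely used check-digit rule for 10-digit Iranian national codes and reports the percentage of records that fail it. The function names `is_valid_national_code` and `dirty_share` are hypothetical and are not taken from the paper.

```python
def is_valid_national_code(code):
    """Check-digit test for a 10-digit Iranian national code (assumed rule)."""
    # Accept only strings of exactly ten ASCII digits.
    if len(code) != 10 or not (code.isascii() and code.isdigit()):
        return False
    digits = [int(ch) for ch in code]
    # Weighted sum of the first nine digits, with weights 10 down to 2.
    total = sum(d * w for d, w in zip(digits[:9], range(10, 1, -1)))
    remainder = total % 11
    # The tenth digit must equal the remainder if it is 0 or 1,
    # otherwise it must equal 11 minus the remainder.
    expected = remainder if remainder < 2 else 11 - remainder
    return digits[9] == expected


def dirty_share(codes):
    """Percentage of records whose national code fails the check."""
    codes = list(codes)
    if not codes:
        return 0.0
    dirty = sum(1 for c in codes if not is_valid_national_code(c))
    return 100.0 * dirty / len(codes)
```

A record that fails this test would only be flagged as dirty; how the correction step was carried out is not described in the abstract.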
Journal title:
Intelligent Multimedia Processing and Communication Systems