Keywords:
Quality control, surface ozone data, gross error, AutoQA4Env tool
Persian abstract:
Ignoring the various errors present in data, including gross errors, constant values, and so on, can lead to incorrect results in data analysis; quality control is therefore a necessary step for ensuring that the data are correct. Because the truth is not available, detecting errors and carrying out quality control is a complicated task. Various statistical methods and tests exist for data quality control, but none of them guarantees that every error in the data will be found. Running more tests increases the relative confidence in data quality. In this study, given the importance and necessity of studying surface ozone as a pollutant, the quality of surface ozone data across the city of Tehran was examined. Quality control of the data was performed with the AutoQA4Env tool. This tool consists of a set of statistical tests grouped into two modes, basic and advanced. Its distinctive features are its user settings, reproducibility, and extensibility. The results of running the tool in basic mode indicated gross errors in some of the data, which shows that quality control is needed before the data are used. On the other hand, in some cases running the tool in basic mode was shown to be insufficient, and applying it in advanced mode was more appropriate.
English abstract:
Errors are an inseparable part of environmental data and arise for several reasons, either natural or artificial. The former are produced by natural phenomena such as animal activity, storms, and floods. The latter are generated by human activities during data collection, entry, and processing, and can be intentional or unintentional. Since errors can affect the results of any analysis, identifying them through quality control is a prerequisite of any data usage. Because the truth is unknown, this seemingly simple task becomes challenging. Although much effort has been devoted to developing tests and tools for identifying errors in data, none of them can guarantee that all errors will be found. It is therefore important to apply as many orthogonal tests as possible in order to find more errors. Here we used a tool named AutoQA4Env, which has been developed for automated quality control of environmental data. This tool consists of a series of statistical tests that have been used in various communities and organizations, such as the World Meteorological Organization and the Environmental Protection Agency. The tests are classified into several groups based on their strictness. The tool has a settings menu through which users can add tests and modify the thresholds. Two versions of the tool, with basic and advanced flagging systems, are open source and accessible via b2share. The tool was tested on the quality control of a set of surface ozone data series measured at the pollution monitoring stations in the city of Tehran. These data are an important source of information about pollution levels and trends in Tehran; thus, knowing their quality can improve the results and reduce their uncertainties. The results indicate that gross errors exist in the data of most of the stations, even though these data are published and publicly available. Applying the tool in the basic state finds most of the errors. About 0.02% of the data were erroneous for three years of data at 15 stations.
The binary flagging system of the tool labels these failed data as unacceptable, although they were in fact acceptable. The advanced state of the tool was more moderate than the basic one and corrected these labels. In this state, 57.7% of the data labeled unacceptable in the basic state were identified as suspect values, and only 5.6% of them remained unacceptable. Therefore, we can conclude that AutoQA4Env, even at this stage, could find and flag most of the data errors, at least the gross errors. In addition, the advanced flagging system of the tool reduces errors in labeling.
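
The difference between binary and graded flagging described above can be illustrated with a minimal Python sketch. It is not the AutoQA4Env implementation: the two tests (a range test and a constant-value test), the thresholds, the three flag levels, the function names, and the sample values are all assumptions chosen only for illustration.

```python
from enum import IntEnum


class Flag(IntEnum):
    """Hypothetical flag levels; the real tool may define others."""
    OK = 0
    SUSPECT = 1
    UNACCEPTABLE = 2


# Assumed thresholds for hourly surface ozone in ppb (not AutoQA4Env defaults).
HARD_RANGE = (0.0, 500.0)   # outside this range: treated as a gross error
SOFT_RANGE = (0.0, 200.0)   # outside this range: unusual but possible
MAX_CONSTANT_RUN = 3        # longer runs of identical values fail the test


def constant_run_lengths(values):
    """For each value, return the length of the run of identical values it belongs to."""
    lengths = [0] * len(values)
    start = 0
    for i in range(1, len(values) + 1):
        if i == len(values) or values[i] != values[start]:
            for j in range(start, i):
                lengths[j] = i - start
            start = i
    return lengths


def in_range(value, bounds):
    return bounds[0] <= value <= bounds[1]


def basic_flags(values):
    """Binary flagging: any failed test marks the value as unacceptable."""
    flags = []
    for value, run in zip(values, constant_run_lengths(values)):
        failed = not in_range(value, SOFT_RANGE) or run > MAX_CONSTANT_RUN
        flags.append(Flag.UNACCEPTABLE if failed else Flag.OK)
    return flags


def advanced_flags(values):
    """Graded flagging: only hard-range violations are unacceptable, other failures are suspect."""
    flags = []
    for value, run in zip(values, constant_run_lengths(values)):
        if not in_range(value, HARD_RANGE):
            flags.append(Flag.UNACCEPTABLE)
        elif not in_range(value, SOFT_RANGE) or run > MAX_CONSTANT_RUN:
            flags.append(Flag.SUSPECT)
        else:
            flags.append(Flag.OK)
    return flags


# Made-up sample series: one high value, one missing-value code, one constant run.
ozone = [31.0, 35.5, 240.0, 38.0, -999.0, 40.0, 40.0, 40.0, 40.0, 42.0]
basic = basic_flags(ozone)
advanced = advanced_flags(ozone)
```

With this made-up series, the binary scheme rejects the high reading, the missing-value code, and the constant run alike, while the graded scheme rejects only the missing-value code and marks the rest as suspect for later inspection.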