• DocumentCode
    2262765
  • Title
    Improving grid monitoring with data quality assessment
  • Author
    Liu, Wei ; Luo, Tiejian ; Song, Jinliang ; Chen, Su
  • Author_Institution
    Graduate University of Chinese Academy of Sciences, Beijing
  • fYear
    2007
  • fDate
    17-19 Oct. 2007
  • Firstpage
    1534
  • Lastpage
    1539
  • Abstract
    As the Grid emerges as a cyber-infrastructure for next-generation e-Science applications, monitoring the Grid becomes a significant task. A typical Grid application is composed of a large number of resources that can fail, including network, hardware and software components. Even when monitoring information from all these components is accessible, it is hard to determine whether anomalies and failures during execution are related to a particular job. However, in practice, receiving intermediate results and interacting with running applications are essential for users. Given the complexity of implementation and the broad scope a monitoring system covers, incomplete and duplicate data are inevitable in many applications. Overcoming data heterogeneity is a long-standing problem in the Grid research community, and handling large amounts of inaccurate, poor-quality information can be disastrous. Fortunately, a wide spectrum of applications exhibits strong dependencies among data samples: the readings of nearby sensors are generally correlated, and components are linked through their interactions. Such relations can be used to improve the quality of the recorded data. This paper proposes a data-cleaning-oriented Grid monitoring model that captures data dependencies with an entity relation graph. We bring an effective data quality preprocessing approach into the Grid application monitoring model. This is critical because many real-world Grid datasets are imperfect: they contain missing, erroneous and duplicate data, as well as other data quality problems.
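The abstract does not spell out the cleaning algorithm. As a rough, hypothetical illustration of how dependencies recorded in an entity relation graph could support cleaning, the Python sketch below assumes a made-up graph `GRAPH` in which linked entities produce correlated readings. It drops duplicate records per (entity, timestamp), fills a missing reading with the median of its neighbours' readings, and flags a reading as suspect when it differs from that median by more than an assumed `tolerance`. The names `GRAPH`, `deduplicate`, `clean` and `tolerance` are illustrative assumptions, not the authors' model.

```python
from collections import defaultdict
from statistics import median

# Hypothetical entity relation graph (assumption, not from the paper):
# four monitored nodes, each assumed to be correlated with every other node.
NODES = ["node-a", "node-b", "node-c", "node-d"]
GRAPH = {n: [m for m in NODES if m != n] for n in NODES}


def deduplicate(records):
    """Keep only the first record seen for each (entity, timestamp) pair."""
    seen = set()
    unique = []
    for entity, ts, value in records:
        if (entity, ts) not in seen:
            seen.add((entity, ts))
            unique.append((entity, ts, value))
    return unique


def clean(records, graph, tolerance=10.0):
    """Impute missing values (None) and flag outliers from graph neighbours.

    Returns {(entity, ts): (value, status)} where status is 'ok', 'imputed',
    'suspect', or 'missing' (no neighbour reading available to impute from).
    """
    records = deduplicate(records)
    # Index readings by timestamp so neighbours can be looked up quickly.
    by_ts = defaultdict(dict)
    for entity, ts, value in records:
        by_ts[ts][entity] = value

    result = {}
    for entity, ts, value in records:
        neighbour_values = [
            by_ts[ts][n]
            for n in graph.get(entity, [])
            if by_ts[ts].get(n) is not None
        ]
        # The median is less sensitive than the mean to one faulty neighbour.
        estimate = median(neighbour_values) if neighbour_values else None
        if value is None:
            status = "missing" if estimate is None else "imputed"
            result[(entity, ts)] = (estimate, status)
        elif estimate is not None and abs(value - estimate) > tolerance:
            result[(entity, ts)] = (value, "suspect")
        else:
            result[(entity, ts)] = (value, "ok")
    return result


# Usage with made-up readings: (entity, timestamp, value).
readings = [
    ("node-a", 0, 40.0),
    ("node-a", 0, 40.0),  # duplicate record
    ("node-b", 0, None),  # missing reading
    ("node-c", 0, 44.0),
    ("node-d", 0, 42.0),
    ("node-a", 1, 41.0),
    ("node-b", 1, 90.0),  # far from its neighbours
    ("node-c", 1, 43.0),
    ("node-d", 1, 42.0),
]
cleaned = clean(readings, GRAPH)
print(cleaned[("node-b", 0)], cleaned[("node-b", 1)])
```

Using the median rather than the mean is a design choice made here so that a single faulty neighbour does not push a healthy node's estimate past the tolerance; a real deployment would need the correlation structure and thresholds learned from actual monitoring data.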
  • Keywords
    data analysis; data flow graphs; data mining; data models; entity-relationship modelling; grid computing; monitoring; cyber-infrastructure; data cleaning approach; data dependency modelling; data heterogeneity; data quality assessment; data quality preprocessing approach; e-science application; entity relation graph; grid monitoring model; Information technology; Monitoring; Quality assessment
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Communications and Information Technologies, 2007. ISCIT '07. International Symposium on
  • Conference_Location
    Sydney, NSW
  • Print_ISBN
    978-1-4244-0976-1
  • Electronic_ISBN
    978-1-4244-0977-8
  • Type
    conf
  • DOI
    10.1109/ISCIT.2007.4392260
  • Filename
    4392260