• DocumentCode
    316960
  • Title
    Similarity detection among data files - a machine learning approach
  • Author
    Dash, M.; Liu, H.
  • Author_Institution
    Dept. of Inf. Syst. & Comput. Sci., Nat. Univ. of Singapore, Singapore
  • fYear
    1997
  • fDate
    1997-11-04
  • Firstpage
    172
  • Lastpage
    179
  • Abstract
    In any database, description files are essential to understand the data files in it. However, it is not uncommon that one is left with data files without any description file. An example is the aftermath of a system crash; other examples are related to security problems. Manual determination of the subject of a data file can be a difficult and tedious task, particularly if files look alike. An example is a big survey database where data files that look alike are actually related to different subjects. Two data files on the same subject will probably have similar semantic structures of attributes. We detect the similarity between two attributes. Then we create clusters of attributes to compare the similarity of the subjects of two data files. Finally, a machine learning technique is used to predict the subject of unseen data files.
  • Keywords
    file organisation; learning (artificial intelligence); pattern matching; attribute clusters; data file similarity detection; data file subject; database; description files; file attribute similarity; machine learning; security problems; semantic structures; surveys; system crash; unseen data files; Computer crashes; Computer science; Data security; Dictionaries; Engineering profession; Image databases; Information security; Information systems; Machine learning; Testing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Knowledge and Data Engineering Exchange Workshop, 1997. Proceedings
  • Conference_Location
    Newport Beach, CA
  • Print_ISBN
    0-8186-8230-2
  • Type
    conf
  • DOI
    10.1109/KDEX.1997.629863
  • Filename
    629863