• DocumentCode
    2922037
  • Title

    Clustering patent document in the field of ICT (Information & Communication Technology)

  • Author

    Widodo, Agus ; Budi, Indra

  • Author_Institution
    Fac. of Comput. Sci., Univ. of Indonesia, Jakarta, Indonesia
  • fYear
    2011
  • fDate
    28-29 June 2011
  • Firstpage
    203
  • Lastpage
    208
  • Abstract
    The current classification of patent data, which follows the IPC (International Patent Classification) of the WIPO (World Intellectual Property Organization), is deemed not to reflect the field of ICT (Information & Communication Technology). ICT applications are usually included in sections G (Physics) and H (Electricity). This paper evaluates eight groupings of patents based on the IPC classes (G01, G06, G09, G11, H01, H03, H04, and H06) for patents registered at the Directorate General of Intellectual Property Rights in Indonesia from 1991 to 2000. The algorithms used for grouping are KMeans, KMeans++, Hierarchical Clustering, and combinations of these three algorithms with SVD (Singular Value Decomposition). Purity and F-Measure are used for external validation, whereas Silhouette is used for internal validation. The experimental results indicate that SVD improves the clustering results. In addition, using the abstract does not necessarily improve clustering performance, and using phrases as index terms does not always yield better clusters than using single words. Moreover, no cluster has a purity greater than 50%, which suggests that the existing IPC classification does not yet accommodate the field of ICT appropriately.
  • Keywords
    document handling; information technology; patents; pattern classification; pattern clustering; singular value decomposition; F-Measure validation; ICT field; KMeans++ algorithm; SVD; Silhouette validation; WIPO; hierarchical clustering; information and communication technology; international patent classification; patent data classification; patent document clustering; patent registration; singular value decomposition; world intellectual property organization; Abstracts; Clustering algorithms; Indexing; Matrix decomposition; Patents; Singular value decomposition; Clustering; Information & Communication Technology; Kmeans; Patent; Singular Value Decomposition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Semantic Technology and Information Retrieval (STAIR), 2011 International Conference on
  • Conference_Location
    Putrajaya
  • Print_ISBN
    978-1-61284-354-4
  • Electronic_ISBN
    978-1-61284-353-7
  • Type

    conf

  • DOI
    10.1109/STAIR.2011.5995789
  • Filename
    5995789
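
The abstract reports per-cluster purity against the IPC classes. As a rough illustration of how such a measure is commonly computed, the sketch below counts, for each cluster, the share of documents that belong to its most frequent class. It is not the authors' code: the function names and the toy labels are hypothetical, and the paper's exact definition of purity may differ in detail.

```python
from collections import Counter


def cluster_purities(cluster_ids, class_labels):
    """Return {cluster_id: purity}, where purity is the fraction of the
    cluster's documents that carry its most frequent class label."""
    if len(cluster_ids) != len(class_labels):
        raise ValueError("cluster_ids and class_labels must have equal length")
    counts = {}
    for cluster, label in zip(cluster_ids, class_labels):
        counts.setdefault(cluster, Counter())[label] += 1
    return {
        cluster: max(c.values()) / sum(c.values())
        for cluster, c in counts.items()
    }


def overall_purity(cluster_ids, class_labels):
    """Return the size-weighted purity over all clusters."""
    if not cluster_ids:
        raise ValueError("at least one document is required")
    counts = {}
    for cluster, label in zip(cluster_ids, class_labels):
        counts.setdefault(cluster, Counter())[label] += 1
    majority_total = sum(max(c.values()) for c in counts.values())
    return majority_total / len(cluster_ids)


# Hypothetical toy data: six documents, two clusters, IPC class as the label.
clusters = [0, 0, 0, 1, 1, 1]
ipc_classes = ["G06", "G06", "H04", "H04", "H04", "G01"]

per_cluster = cluster_purities(clusters, ipc_classes)
total = overall_purity(clusters, ipc_classes)
print(per_cluster, total)
```

Under this definition, a cluster with purity at or below 0.5 has no single IPC class accounting for more than half of its documents, which is the situation the abstract describes.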