• DocumentCode
    3248677
  • Title
    Extracting problematic API features from forum discussions
  • Author
    Yingying Zhang; Daqing Hou
  • Author_Institution
    Dept. of Comput. Sci., Clarkson Univ., Potsdam, NY, USA
  • fYear
    2013
  • fDate
    20-21 May 2013
  • Firstpage
    142
  • Lastpage
    151
  • Abstract
    Software engineering activities often produce large amounts of unstructured data. Useful information can be extracted from such data to facilitate software development activities, such as bug report management and documentation provision. Online forums, in particular, contain extensive valuable information that can aid in software development. However, no work has been done to extract problematic API features from online forums. In this paper, we investigate ways to extract problematic API features that are discussed as a source of difficulty in each thread, using natural language processing and sentiment analysis techniques. Based on a preliminary manual analysis of the content of a discussion thread and a categorization of the role of each sentence therein, we decide to focus on a negative sentiment sentence and its close neighbors as a unit for extracting API features. We evaluate a set of candidate solutions by comparing tool-extracted problematic API design features with manually produced golden test data. Our best solution yields a precision of 89%. We have also investigated three potential applications for our feature extraction solution: (i) highlighting the negative sentence and its neighbors to help illustrate the main API feature; (ii) searching for helpful online information using the extracted API feature as a query; (iii) summarizing the problematic features to reveal the “hot topics” in a forum.
  • Keywords
    application program interfaces; feature extraction; natural language processing; query processing; software engineering; discussion thread; forum discussions; manually produced golden test data; natural language processing; negative sentiment sentence; online information; problematic API feature extraction; problematic feature summarization; sentiment analysis techniques; Data mining; Dictionaries; Feature extraction; Message systems; Pattern matching; Software; Tutorials; APIs; AWT/Swing; Design Feedback; Information Extraction; Online Forums;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Program Comprehension (ICPC), 2013 IEEE 21st International Conference on
  • Conference_Location
    San Francisco, CA
  • ISSN
    1063-6897
  • Type
    conf
  • DOI
    10.1109/ICPC.2013.6613842
  • Filename
    6613842