  • DocumentCode
    731530
  • Title
    StORMeD: Stack Overflow Ready Made Data
  • Author
    Ponzanelli, Luca ; Mocci, Andrea ; Lanza, Michele
  • Author_Institution
    REVEAL @ Fac. of Inf., Univ. of Lugano, Lugano, Switzerland
  • fYear
    2015
  • fDate
    16-17 May 2015
  • Firstpage
    474
  • Lastpage
    477
  • Abstract
    Stack Overflow is the de facto Question and Answer (Q&A) website for developers, and it has been used in many approaches by software engineering researchers to mine useful data. However, the contents of a Stack Overflow discussion are inherently heterogeneous, mixing natural language, source code, stack traces and configuration files in XML or JSON format. We constructed a full island grammar capable of modeling the set of 700,000 Stack Overflow discussions about Java, building a heterogeneous abstract syntax tree (H-AST) of each post (question, answer or comment) in a discussion. The resulting dataset models every Stack Overflow discussion, providing a full H-AST for each type of structured fragment (i.e., JSON, XML, Java, stack traces), and complementing this information with a set of basic meta-information, such as term frequency, to enable natural language analyses. Our dataset allows the end-user to perform combined analyses of Stack Overflow by visiting the H-AST of a discussion.
  • Keywords
    Java; Web sites; XML; question answering (information retrieval); software engineering; JSON format; StORMeD; configuration files; heterogeneous abstract syntax tree; natural language; question and answer Website; software engineering researchers; source code; stack overflow ready made data; stack traces; term frequency; Data mining; Data models; Grammar; Natural languages; Software; h-ast; island parsing; unstructured data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Mining Software Repositories (MSR), 2015 IEEE/ACM 12th Working Conference on
  • Conference_Location
    Florence
  • Type
    conf
  • DOI
    10.1109/MSR.2015.67
  • Filename
    7180121