• DocumentCode
    3314834
  • Title

    Modified weighting method in TF∗IDF algorithm for extracting user topic based on email and social media in Integrated Digital Assistant

  • Author

    Pramono, Luthfan Hadi ; Rohman, Arief Syaichu ; Hindersah, Hilwadi

  • Author_Institution
    Electr. Eng. Dept., Bandung Inst. of Technol., Bandung, Indonesia
  • fYear
    2013
  • fDate
    26-28 Nov. 2013
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Integrated Digital Assistant (IDA) is a system designed to act as a "personal secretary" that works full time for the user. IDA stays active while the user is relaxing at home, working at the office, traveling, or engaged in outdoor activities. IDA aims to minimize the interaction required between the user and the system. The system finds the outside information that users need by searching for the users' topics in their email and social media data. To search for and extract user interests or topics from social media and email data, IDA uses a modified TF*IDF weighting algorithm named TF*IDF*DF, which is an extension of the TF*IDF method. With the modified weighting, the extracted topics are expected to be more representative and better matched to the information the user needs. When extraction uses TF*IDF*DF, the number of terms (words) with a document frequency (df) greater than one increases. On the other hand, the computational load also increases because of the df multiplier factor. The news retrieved on the basis of topics extracted with TF*IDF*DF increases in quantity and becomes more diverse. The terms produced by topic extraction still contain noisy text that does not follow proper written grammar and needs to be corrected so that the extracted terms are better formed.
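
The weighting described in the abstract can be illustrated with a minimal sketch. It assumes the common idf = log(N / df) form and assumes the modification simply multiplies the TF*IDF score by df; the abstract does not give the exact formula, so the function name `tf_idf_df`, the tokenized sample documents, and the idf variant are all illustrative assumptions rather than the authors' implementation.

```python
import math
from collections import Counter

def tf_idf_df(docs):
    """Return one {term: weight} dict per document, weight = tf * idf * df.

    Assumption: idf = log(N / df). The paper's exact formula is not
    stated in the abstract; this is only an illustrative variant.
    """
    n_docs = len(docs)
    # df: number of documents that contain each term at least once
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        weights.append({
            term: count * math.log(n_docs / df[term]) * df[term]
            for term, count in tf.items()
        })
    return weights

# Hypothetical tokenized messages (e.g. emails or posts)
docs = [
    ["meeting", "budget", "travel"],
    ["travel", "ticket", "hotel"],
    ["budget", "report", "meeting", "meeting"],
]
weights = tf_idf_df(docs)
```

Under these assumptions, a term that appears in two of the three documents has its plain TF*IDF score doubled, while a term confined to one document is unchanged, which is consistent with the abstract's remark that terms with df greater than one become more prominent. A term that appears in every document still scores zero because log(N / df) is zero.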
  • Keywords
    Internet; electronic mail; information retrieval; social networking (online); text analysis; IDA; TF*IDF*DF algorithm; document frequency; email; grammar writing; integrated digital assistant; modified weighting method; personal secretary; social media; user topic extraction; Algorithm design and analysis; Conferences; Data mining; Electronic mail; Feature extraction; Media; Twitter; TF∗IDF; feature selection; topic extraction; topic model; user topic;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Rural Information & Communication Technology and Electric-Vehicle Technology (rICT & ICeV-T), 2013 Joint International Conference on
  • Conference_Location
    Bandung
  • Print_ISBN
    978-1-4799-3363-1
  • Type

    conf

  • DOI
    10.1109/rICT-ICeVT.2013.6741547
  • Filename
    6741547