• DocumentCode
    3540100
  • Title
    Stably extracting text contents from email messages with Python
  • Author
    Sun, Stephen Yong
  • Author_Institution
    Firstwave Technol., VIC, Australia
  • fYear
    2009
  • fDate
    4-6 Aug. 2009
  • Firstpage
    199
  • Lastpage
    203
  • Abstract
    Extracting text content from email messages is a fundamental task in email processing, for example in spam identification and email filtering. Although Python is a rapid application development language, there is no Python library that can accomplish this task efficiently and stably across the varied email formats met in real applications. This paper proposes an approach that fulfills the task with three software layers, and documents how to evaluate it automatically in a busy server environment. The approach has been deployed in our email processing platform, where it extracts the text content of email messages 24 hours a day, 7 days a week. Its stable and effective performance improves the email filtering service we provide to our customers. The principles of the approach can also be adopted to stabilize the performance of other software.
  • Keywords
    electronic mail; electronic messaging; hypermedia markup languages; information retrieval; software architecture; text analysis; HTML file; Python-rapid application development language; email message; email processing; text content extraction; Application software; Business; Data mining; Filtering; HTML; Intellectual property; Postal services; Protection; Sun; Unsolicited electronic mail
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Applications of Digital Information and Web Technologies, 2009. ICADIWT '09. Second International Conference on the
  • Conference_Location
    London
  • Print_ISBN
    978-1-4244-4456-4
  • Electronic_ISBN
    978-1-4244-4457-1
  • Type
    conf
  • DOI
    10.1109/ICADIWT.2009.5273961
  • Filename
    5273961