DocumentCode
3540100
Title
Stably extracting text contents from email messages with Python
Author
Sun, Stephen Yong
Author_Institution
Firstwave Technol., VIC, Australia
fYear
2009
fDate
4-6 Aug. 2009
Firstpage
199
Lastpage
203
Abstract
Extracting text content from email messages is a fundamental task in email processing applications such as spam identification and email filtering. Although Python is a rapid application development language, there is no library in Python that can accomplish this task efficiently and stably across the wide variety of email formats encountered in real applications. This paper proposes an approach that fulfills the task with three software layers, and documents how to evaluate it automatically in a busy server environment. The approach has been deployed in our email processing platform, where it extracts the text content of email messages 24 hours a day, 7 days a week. Its stable and effective performance has improved the email filtering service we provide to our customers. The principles of the approach can also be adopted to stabilize the performance of other software.
Keywords
electronic mail; electronic messaging; hypermedia markup languages; information retrieval; software architecture; text analysis; HTML file; Python-rapid application development language; email message; email processing; text content extraction; Application software; Business; Data mining; Filtering; HTML; Intellectual property; Postal services; Protection; Sun; Unsolicited electronic mail
fLanguage
English
Publisher
IEEE
Conference_Titel
Applications of Digital Information and Web Technologies, 2009. ICADIWT '09. Second International Conference on the
Conference_Location
London
Print_ISBN
978-1-4244-4456-4
Electronic_ISBN
978-1-4244-4457-1
Type
conf
DOI
10.1109/ICADIWT.2009.5273961
Filename
5273961