DocumentCode :
3540100
Title :
Stably extracting text contents from email messages with Python
Author :
Sun, Stephen Yong
Author_Institution :
Firstwave Technol., VIC, Australia
fYear :
2009
fDate :
4-6 Aug. 2009
Firstpage :
199
Lastpage :
203
Abstract :
Extracting text content from email messages is a fundamental task in email processing applications such as spam identification and email filtering. Although Python is a rapid application development language, it has no library that can accomplish this task efficiently and stably across the wide variety of email formats encountered in real applications. This paper proposes an approach that fulfills the task with three software layers. It also documents how to evaluate the approach automatically in a busy server environment. The approach has been deployed in our email processing platform, where it extracts the text content of email messages 24 hours a day, 7 days a week. Its stable and effective performance improves the email filtering service we provide to our customers. The principles of the approach can also be adopted to stabilize the performance of other software.
Keywords :
electronic mail; electronic messaging; hypermedia markup languages; information retrieval; software architecture; text analysis; HTML file; Python-rapid application development language; email message; email processing; text content extraction; Application software; Business; Data mining; Filtering; HTML; Intellectual property; Postal services; Protection; Sun; Unsolicited electronic mail
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Applications of Digital Information and Web Technologies, 2009. ICADIWT '09. Second International Conference on the
Conference_Location :
London
Print_ISBN :
978-1-4244-4456-4
Electronic_ISBN :
978-1-4244-4457-1
Type :
conf
DOI :
10.1109/ICADIWT.2009.5273961
Filename :
5273961