Article title (translated from the Persian original):
Presenting a Voting-Based Method for Combining the Outputs of Deep Networks for Layout Analysis of Printed Documents
Title as published in English:
Providing a Voting-Based Method for Combining Deep Neural Network Outputs to Layout Analysis of Printed Documents
Authors:
Fateh, Amirreza (Shahrood University of Technology, Faculty of Computer Engineering); Rezvani, Mohsen (Shahrood University of Technology, Faculty of Computer Engineering); Tajary, Alireza (Shahrood University of Technology, Faculty of Computer Engineering); Fateh, Mansoor (Shahrood University of Technology, Faculty of Computer Engineering)
Keywords:
image segmentation, document layout analysis, text detection, image detection, voting
Persian abstract (translated):
Over the past few decades, a great deal of research has been carried out on OCR, or optical character recognition. Optical character recognition is one way of converting text images into editable text and of recognizing letters and words automatically. Identifying the textual and non-textual regions within a document is known as document layout analysis and is one of the key steps in converting a document image into editable text. Separating the textual and non-textual regions within an image is one of the most influential preprocessing steps in optical character recognition systems. The absence of a uniform layout across all pages, complex backgrounds, various kinds of noise, low quality, image rotation, and multi-column images all prevent regions containing text from being identified correctly. Failing to identify text regions correctly, and consequently failing to determine line coordinates correctly, disrupts every subsequent stage of an optical character recognition system. This research presents a new method for detecting the textual regions within an image. The proposed method applies several different methods and uses a voting system among them to extract the textual regions of the image, an approach that has not been used in previous work. The proposed method was trained and tested on an image dataset of more than 950 pages, and the test results show an accuracy of 97.94%. The dataset presented in this article is freely available.
English abstract:
Over the past few decades, extensive research has been conducted on optical character recognition (OCR). OCR is one way to convert images of text into editable text by recognizing letters and words automatically. Identifying the textual and non-textual regions of a document is known as document layout analysis, and it is a key step in converting a document image into editable text. Separating textual from non-textual regions is one of the most influential preprocessing steps in an OCR system. Inconsistent layouts across pages, complex backgrounds, various kinds of noise, low image quality, image rotation, and multi-column pages all hinder the correct identification of text regions. Failing to identify text regions correctly, and consequently failing to locate line coordinates correctly, disrupts every subsequent stage of an OCR system. This research proposes a new method for detecting the textual regions of an image. The proposed method applies several different methods and extracts the textual regions by voting among their outputs. It was trained and tested on a dataset of more than 950 images and reached 97.94% accuracy. The dataset presented in this article is openly available.
Journal title:
Machine Vision and Image Processing (ماشين بينايي و پردازش تصوير)
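
Illustrative sketch (not part of the original record):
The abstract states that text regions are extracted by voting among the outputs of several methods, but it does not specify the voting rule. The sketch below assumes the simplest reading: each method produces a binary mask (1 for a text pixel, 0 for a non-text pixel) and a pixel is kept as text when a strict majority of masks agree. The function name `majority_vote`, the mask representation, and the tie-breaking rule are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List

# Hypothetical representation: a mask is a grid of 0/1 values,
# where 1 marks a pixel that a method labeled as text.
Mask = List[List[int]]


def majority_vote(masks: List[Mask]) -> Mask:
    """Combine binary masks pixel by pixel using a strict majority rule.

    A pixel is marked as text (1) only if more than half of the masks
    mark it as text. With an even number of masks, a tie yields 0.
    """
    if not masks:
        raise ValueError("at least one mask is required")

    height = len(masks[0])
    width = len(masks[0][0]) if height else 0

    # All masks must describe the same image, so shapes must match.
    for mask in masks:
        if len(mask) != height or any(len(row) != width for row in mask):
            raise ValueError("all masks must have the same shape")

    threshold = len(masks) / 2
    return [
        [1 if sum(mask[y][x] for mask in masks) > threshold else 0
         for x in range(width)]
        for y in range(height)
    ]


# Three made-up 2x3 masks standing in for the outputs of three methods.
mask_a = [[1, 1, 0], [0, 1, 0]]
mask_b = [[1, 0, 0], [0, 1, 1]]
mask_c = [[1, 1, 0], [1, 0, 0]]

combined = majority_vote([mask_a, mask_b, mask_c])
print(combined)
```

A strict majority is only one possible rule; weighted votes or a different threshold would fit the same structure by changing how `threshold` and the per-pixel sum are computed.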