ABSTRACT:

We present a new technique for recognizing Bangla characters based on multi-point feature extraction. The main focus is reading characters from number plate images. The idea is at an early stage and further research is being conducted. We also use the gravitational algorithm, introduced as an improvement to the Canny edge detection algorithm, to find the differing intensity levels between the characters and the white space. This is a qualitative proposal based on previous work, although a few test results are included as well.


1.INTRODUCTION:

Optical character recognition (OCR) has been a fascinating field of research because of its immense possibilities. Improvements in this area can change the way humans and machines interact. To some extent, every language has been studied to determine how well it can be recognized automatically. For Bangla, the research has not been sufficient, although some exciting papers are being published. Many techniques have been introduced for this purpose, but the task is difficult because of the large number of characters. The most difficult part is recognizing compound characters, which take many different shapes. In our project we use the gravitational algorithm and multiple extraction points to decompose the characters and recognize them automatically.

 

2.PREVIOUS WORKS:

A number of studies have been conducted on this subject. In [1] the authors use shape decomposition to find the characters. The method is best suited to compound characters. A working algorithm is also provided.

In [2] the authors decompose different kinds of documents to locate the text and then recognize the characters. The decomposition process is based on detecting the edges of the free space in documents such as newspapers and textbooks.

In [3] a yellow algorithm is used to find the region of interest (ROI) in an image of a number plate in order to recognize the characters. It builds on research that has been conducted since 1976.

In [4] research on recognition of the complex Urdu Nastaleeq script is conducted. The technique is based on a stacked denoising autoencoder.

In [5] the goal is automated recognition of printed Bangla characters. A geometric approach is used: the curvature of the characters is taken into account to segment the different characters. Finally, a perceptron-based algorithm is proposed for automation and learning.

In [6] a new algorithm based on Newton’s law of gravitation is introduced for edge detection. The equation determines the different intensity levels of an image and finds the edges based on the histogram. It is an improvement on the traditional Canny algorithm.

 

3.CHARACTERISTICS OF BANGLA LANGUAGE:

The importance of the Bangla language is evident from the fact that it is the seventh most spoken language in the world. Bangla is also the national language of Bangladesh. Over 200 million people speak Bangla. Bangla therefore has considerable significance both as a language and as a script. The Bangla script has the following structural and syntactic properties:

• The script is written from left to right, and it has no concept of upper and lower case.
• Most of the characters in this script have a horizontal line along the upper part of the character, called the ‘matra’.
• The character set of Bangla is divided into two categories: basic and compound characters.
• The basic characters comprise vowels and consonants. There are 11 vowels and 39 consonants in this script. In this research we mainly focus on consonants.

 

 

4.PROPOSED METHODOLOGY:
Here we propose a new way to recognize Bangla characters. It is divided into three major parts: preprocessing, segmentation and pattern matching.

1.      PREPROCESSING:

Preprocessing comprises the steps performed before segmentation. It consists of binarization of the image, removal of noise and scaling.

 

1.1  Binarization
A natural way to binarize an image [9] is by thresholding. Thresholding creates a binary image from a grey-level one by setting all pixels below some threshold to zero and all pixels above that threshold to one. The simplest binarization technique is to use a global fixed threshold. Here, Otsu’s thresholding method is used to choose the binarization level automatically.
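The thresholding step can be illustrated with a minimal sketch in plain Python. It assumes an 8-bit grey-level image stored as a list of rows; the function names `otsu_threshold` and `binarize` are illustrative and do not come from the paper. Otsu's method picks the threshold that maximizes the between-class variance of the two pixel groups.

```python
def otsu_threshold(gray):
    """Return an Otsu threshold for a 2-D list of grey levels in 0..255."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with grey level <= t
    sum0 = 0  # sum of grey levels of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray, threshold=None):
    """Map pixels above the threshold to 1 and all others to 0."""
    if threshold is None:
        threshold = otsu_threshold(gray)
    return [[1 if v > threshold else 0 for v in row] for row in gray]
```

With dark ink on a light background this convention maps ink to 0. The later steps count 1-valued pixels as ink, so the result may need to be inverted first; that inversion is an assumption, since the text does not state the polarity.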

1.2  Noise Removal
Noise reduction is a typical preprocessing step that improves the results of later processing. It is performed using a window consisting of an odd number of input samples. It is also a kind of smoothing technique. All smoothing techniques are effective at removing noise in smooth patches or regions of a signal, but they adversely affect edges.
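The text does not name the filter. A window with an odd number of samples is typical of a median filter, so the sketch below assumes one; the 3×3 window size and the name `median_filter` are illustrative choices.

```python
def median_filter(img, size=3):
    """Median-filter a 2-D list with a size x size window (size must be odd)."""
    if size % 2 == 0:
        raise ValueError("window size must be odd")
    h, w = len(img), len(img[0])
    r = size // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = []
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    # Clamp coordinates so border pixels reuse their nearest neighbour.
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    window.append(img[yy][xx])
            window.sort()
            out[y][x] = window[len(window) // 2]
    return out
```

An odd window guarantees a single middle element, which is why the median is well defined without averaging two samples.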

1.3  Scaling
Each line of text may appear in a different size and shape. To extract features from it, it must be scaled to a standard size. The scanned image is scaled according to the height and width of the stored image. In this paper each line of characters is scaled to a height of 70.
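A minimal sketch of the scaling step, assuming nearest-neighbour interpolation (the text does not specify the interpolation method) and an image stored as a list of rows. The helper names are illustrative.

```python
def resize_nearest(img, target_w, target_h):
    """Resize a 2-D list to target_w x target_h by nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    return [
        [img[y * h // target_h][x * w // target_w] for x in range(target_w)]
        for y in range(target_h)
    ]


def scale_to_height(img, target_h=70):
    """Scale a text line to a fixed height, keeping the aspect ratio."""
    h, w = len(img), len(img[0])
    target_w = max(1, round(w * target_h / h))
    return resize_nearest(img, target_w, target_h)
```

The same `resize_nearest` helper could later bring each segmented character to the fixed 60×70 size, for example `resize_nearest(char, 60, 70)`.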

 

2.      SEGMENTATION:

Segmentation is an operation that seeks to decompose an image of a sequence of characters into sub-images of individual symbols. Character segmentation is a key requirement for an efficient character recognition system. It includes word and character segmentation. In word segmentation the words are separated from one another. To do that, the image is scanned vertically and the number of black pixels in each row is counted in order to construct the row histogram.
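The histogram-based splitting can be sketched as follows, assuming a binary image in which ink pixels are 1. The row histogram is the one described in the text. The column histogram is an added assumption, shown because splitting side-by-side words or characters usually needs counts per column. All names are illustrative.

```python
def row_histogram(binary):
    """Number of ink pixels in each row."""
    return [sum(row) for row in binary]


def column_histogram(binary):
    """Number of ink pixels in each column."""
    return [sum(col) for col in zip(*binary)]


def find_runs(profile):
    """Return (start, end) pairs, end exclusive, of consecutive non-zero entries."""
    runs, start = [], None
    for i, count in enumerate(profile):
        if count > 0 and start is None:
            start = i
        elif count == 0 and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs
```

Each run in the row histogram corresponds to one band of text, and each run in the column histogram of that band corresponds to one connected block of ink.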

The next major step is the removal of the matra. The characters of a Bangla word are connected by a line called the matra. To separate the characters, the matra is removed. Detecting the matra is simple. The image is scanned horizontally and the number of 1 (one) pixels in each row is counted. If the count is approximately equal to the number of pixels in a row, the row is detected as a matra and removed. After this step each character is segmented and scaled to a 60×70 matrix.
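The matra test can be sketched as below, assuming a binary word image with ink pixels equal to 1. The text says only that the count should be about equal to the row width, so the 0.9 ratio is an assumed tolerance and `remove_matra` is an illustrative name.

```python
def remove_matra(binary, ratio=0.9):
    """Clear every row whose ink count is at least `ratio` of the row width."""
    width = len(binary[0])
    cleaned = []
    for row in binary:
        if sum(row) >= ratio * width:
            cleaned.append([0] * width)  # row treated as matra and blanked
        else:
            cleaned.append(list(row))
    return cleaned
```

Once the connecting line is cleared, the characters no longer touch, so gaps between them show up as empty columns.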

 

 

3.      PATTERN MATCHING

We propose 21 zones in a 60×70 matrix. The zones are given below:

FIG: First 16 zones

FIG: Zones 17-20

FIG: Zone 21

Each character segment will be matched against the predefined zones and a histogram will be built. From the histogram the character will be recognized. For the matching we will use the gravitational algorithm proposed in [6]. The first step is Gaussian filtering:

$$G(x, y) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right) \qquad (1)$$
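As an illustration of the Gaussian filtering step, the following sketch samples the two-dimensional Gaussian on a small grid and applies it to an image. The 5×5 kernel size, the value of sigma, the normalization to unit sum and the border clamping are all assumptions, not values taken from the paper.

```python
import math


def gaussian_kernel(size=5, sigma=1.0):
    """Sample the 2-D Gaussian on a size x size grid and normalize it to sum to 1."""
    if size % 2 == 0:
        raise ValueError("kernel size must be odd")
    r = size // 2
    kernel = [
        [
            math.exp(-(x * x + y * y) / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)
            for x in range(-r, r + 1)
        ]
        for y in range(-r, r + 1)
    ]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]


def smooth(img, kernel):
    """Apply the kernel to a 2-D list, clamping coordinates at the borders."""
    h, w = len(img), len(img[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(-r, r + 1):
                for kx in range(-r, r + 1):
                    yy = min(max(y + ky, 0), h - 1)
                    xx = min(max(x + kx, 0), w - 1)
                    acc += kernel[ky + r][kx + r] * img[yy][xx]
            out[y][x] = acc
    return out
```

Normalizing the sampled kernel keeps the overall brightness of the image unchanged after smoothing.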

The gravitational equation is then used to determine the pixel-wise intensity.

Here G is a constant and m is the intensity of the pixel on the gray scale.
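The gravitational equation itself is not reproduced in the text. The sketch below therefore shows only one plausible formulation, assuming Newton's law f = G·m1·m2·r/|r|³ applied between a pixel and each of its eight neighbours, with the grey levels playing the role of the masses. The 3×3 neighbourhood, the default G = 1 and the function name are assumptions and may differ from the formulation in [6].

```python
import math


def gravitational_magnitude(gray, G=1.0):
    """Magnitude of the summed 'gravitational' force on each interior pixel.

    Each neighbour at offset (dx, dy) pulls the centre pixel with a force
    G * m_centre * m_neighbour / r**2 directed along (dx, dy).
    Border pixels are left at 0.0 because they lack a full neighbourhood.
    """
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            m0 = gray[y][x]
            fx = fy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dx == 0 and dy == 0:
                        continue
                    r_cubed = (dx * dx + dy * dy) ** 1.5
                    f = G * m0 * gray[y + dy][x + dx] / r_cubed
                    fx += f * dx
                    fy += f * dy
            out[y][x] = math.hypot(fx, fy)
    return out
```

In a uniform region the pulls from opposite neighbours cancel, so the magnitude is close to zero; near an intensity change they no longer cancel, which is what makes the value usable as an edge response. A pixel with grey level 0 always yields zero under this formulation, so an offset may be needed in practice.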

These pixel-wise intensity values will be put into a histogram to find the percentage that lies in each zone. From this we can determine the character from a predefined data set.
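The zone histogram can be sketched as follows. The zone layouts are given only in the figures, so this example assumes that the first 16 zones form a 4×4 grid over the character matrix and leaves out zones 17 to 21. The nearest-template comparison is likewise an assumed matching rule, and all names are hypothetical.

```python
def split_bounds(length, parts):
    """Boundaries that split `length` into `parts` nearly equal spans."""
    return [i * length // parts for i in range(parts + 1)]


def zone_features(char, rows=4, cols=4):
    """Fraction of the total intensity that falls in each grid zone (row-major)."""
    h, w = len(char), len(char[0])
    ys, xs = split_bounds(h, rows), split_bounds(w, cols)
    total = sum(sum(row) for row in char) or 1  # avoid division by zero
    feats = []
    for i in range(rows):
        for j in range(cols):
            mass = sum(
                char[y][x]
                for y in range(ys[i], ys[i + 1])
                for x in range(xs[j], xs[j + 1])
            )
            feats.append(mass / total)
    return feats


def nearest_template(feats, templates):
    """Label of the stored feature vector closest in squared Euclidean distance."""
    return min(
        templates,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(feats, templates[label])),
    )
```

Here `templates` stands for the predefined data set: a dictionary mapping each character label to the zone fractions measured on a reference image of that character.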

4.RESULT:

Due to insufficient time we could not set up and test the proposal properly. As this is qualitative research, the basis of the proposal is what matters here. In [7] a similar method is used for English OCR, and its table of results can be used to estimate the results of this paper’s proposal.

FIG: Accuracy of English OCR using proposed technique

A predefined data set was used for this purpose. We are currently designing and building the data set. We will use MATLAB as the working tool.

 

5.CONCLUSION:

Although Bangla OCR has emerged recently, there is still a long way to go to reach near-perfect results. This paper has proposed a method that categorizes each character into 21 zones and matches the input image after it has been segmented into characters. In future this proposal can be improved by using a large, tested data set. Further research has to be conducted in this area. This proposal will not be very effective at recognizing compound letters and characters with additional dots. The main focus of this paper is recognizing the characters on license plates.

 

6.REFERENCES:

1)      Rahul Pramanik, Soumen Bag, ‘Shape decomposition-based handwritten compound character recognition for Bangla OCR’, 2017

2)      MD. Farhad Hossain, Tasmin Afroz, Sabir Ismail, ‘Document Decomposition of Bangla Printed Text’, 2017

3)      Mohammad Tahir Qadri, Mohammad Asif, ‘Automatic Number Plate Recognition System for Vehicle Identification Using Optical Character Recognition’, 2009

4)      Ibrar Ahmed, Ruifan Li, Xiaojie Wang, ‘Offline Urdu Nastaleeq Optical Character Recognition Based on Stacked Denoising Autoencoder’, 2007

5)      Shyla Afroz, Boshir Ahmed, Ali Hossain, ‘Bangla Optical Character Recognition through Segmentation using Curvature Distance and Multilayer Perceptron Algorithm’, 2017

6)      Weibin Rong, Zhanjing Li, Wei Zhang and Lining Sun, ‘An Improved Canny Edge Detection Algorithm’, 2014

7)      Sandip Kundu, Hrishi Singh Chhabra, Sahi Summa Ara, Rishi Prakash Mishra, ‘Optical Character Recognition Using 26-Point Feature Extraction and ANN’, 2017