Abstract: A human eye can read the
characters of a language written legibly either by a person, or when it is
printed. Making the machine do the same, is broadly the problem of 'Character
Recognition'. This implies endowing the machine with the capability to
approximate and thus identify characters correctly, in some sense, a similar
capability in human beings. The Online/Dynamic Character Recognition is a
process of identifying a hand-written character. This paper gives an insight
into the problem, using the Hindi language as a model. The paper addresses and
enumerates the various ways and methodologies to tackle the problem. This
problem 'Online/Dynamic Character Recognition' comes under the purview of
Pattern Recognition and Image Processing.
The
problem of Character Recognition is to make the machine recognize the
characters written, as stated above. The goal of the problem is basically to
achieve character recognition accuracy, which is very close to the superb
capability exhibited by human beings for performing such tasks. Thus, a
character recognition system should be able to exhibit a certain degree of
intelligence, in identifying characters, which are either in the form of typed
documents or the ones that are hand-written by person/user. This problem
diversifies into two different aspects:
The
Optical/Static Character Recognition deals with recognizing printed characters.
An example is printed, properly formatted documents. In the whole of the
document, a particular character wherever it occurs has the same form. The
problem is to read these documents. The systems that address this problem are
highly specialized and thus have very little or extendibility. This paper,
however, does not address this problem.
This
problem is about recognizing hand-written characters by the machine. The user
gives the input by writing the character/alphabet with the help of stylus. So,
the input is a list of points of positions in the stroke/character written by
the user. This problem is termed 'Dynamic', because the input is fed to the
system as the user writes the character and also due to the variation in the
character every time the user writes. It is very well known that the
handwriting of a person is never constant, i.e. varying. Thus, the system that
addresses this problem should be able to approximate, neglect the minor changes
in the character written and then identify the correct character. The language,
in which the character is written, is a parameter, which is to be considered.
For instance, the alphabets of the English language are quite simpler and less
complex than those of the Hindi language. The details of this are given subsequently.
The state-of-the-art in Character
Recognition, for the most part (either static or dynamic character recognition)
is based on heuristic formulations designed for specific problems. The problem
of Online/Dynamic Character Recognition (OCR) can be divided into two sub
problems:
1.
Online
Stroke Detection
2.
Strokes
to Character Identification
In order to solve this problem, the
notion of 'stroke' and 'character' is to be known well.
A stroke is defined as a continuous and single movement of the
electronic pen over the stylus pad. A stroke ends when the user lifts the pen
from the stylus or the pad. An End of Stroke signal denotes the end of the
stroke.
A character is defined as a set of strokes that complete an alphabet.
A character can have one or more strokes required to make it complete and comprehendible.
An End of Character signal denotes the end of the character.
The Online Stroke Detection (OSD)
involves identifying the stroke, from the user input. The input is a list of
continuous points of position on the stylus, while the stroke is being written.
Once the pen is lifted, the end of stroke signal is sent to the system and the
stroke is processed for identification.
This problem OCR comes under
pattern recognition. So, the concept of a 'pattern', feature and a 'pattern
class', ought to be known.
Once the End of the Character
signal is detected, the strokes obtained are checked to identify the total
character.
As
mentioned earlier, the problem can be broken down to that of recognizing
strokes and then identifying characters. This is because most of the Hindi
characters can be broken down into basic
components or strokes and these strokes are geometric figures like
lines, curves and loops, which are written down in one single and continuous
movement of the pen. Some examples of
the characters are given below. The first example is that of the first alphabet
in the Hindi language, the vowel a'.
3 1
Example 1:
![]()
![]()
![]()
4 2![]()
![]()
Fig1:
The first alphabet in Hindi.
It can be seen that the above
alphabet can be broken down into the following components, two curves that move
from the left to right and one horizontal line and one vertical line. In all,
the alphabet consists of four strokes. The numbers in the figure denote the
strokes.
2
Example 2:
1
![]()

![]()
![]()
![]()
![]()
![]()
Fig2: The third alphabet in Hindi
It can be seen that the above alphabet 'e' can be broken down
into the following components, two curves that move from the left to right and
one horizontal line and one vertical line. This alphabet can be written in two
strokes.
Therefore
it can be concluded from the above examples that breaking the whole problem of
OCR down into two parts is advantageous.
·
To recognize the character components or strokes.
·
To recognize the character from a set of components or a set of
strokes.
Since
the character is broken down into components, given below are some strokes that
a character can be divided into. The components that have been commonly found
are
i)
Line
Components
ii)
Curve
Components
![]()
a) A
vertical line:
UDL DUL
b) A
horizontal line:
![]()
LRL RLL
c) A
slating line towards the right:


LRSL
RLSL

d) A slanting line towards the left:
LRSL RLSL
The convention of naming the
above lines is given in Appendix.
The different curves are:


![]()

![]()

![]()
LRC
RLC UDC DUC LOOP
Since
the characters are going to be handwritten the components will not be geometric
and will be noisy. The curves dealt above are very naïve and simple. In the
Hindi language, there are many other curves, which a little complex. Some
instances of these curves are given in Appendix.
The convention of naming the
above curves is given in Appendix.
Therefore
the following approaches have been taken to solve this problem of identifying
the correct stroke.
1. X-Y
displacement approach.
2. Slope Characteristics.
3. Neural
networks as Pattern classifiers.
In this
approach the component/stroke is recognized by analyzing how the directions
vary as the component/stroke is written. The assumptions that are made here are
as follows:
1. People
generally write Hindi characters in a certain way, i.e. most of the characters
are written from top to bottom, left to right.
2. For
some characters “gha” and “dha” which are similar, some simplifying modes of
writing will be suggested and only those will be recognized. There is a need to
do this, otherwise a high cost of processing has to be paid.
To
distinguish between a line and a curve, a threshold is defined. A threshold is that range that
distinguishes a horizontal line from a slant line or a curve. This threshold is
fixed, based on the origin of the component. This is shown in the figure below.
When a curve crosses both the X-threshold area and the Y-threshold area, it is
considered to be a likely curve component or a likely slant-line component.
In this
method, for every two adjacent sampled points, the x and y displacements are
calculated. If both of them cross the threshold, then the direction of x and y
displacement is considered to determine which kind of curve that line segment joining
the two points in the stroke is likely to be.
If only one of the displacements crosses the threshold, the line segment
has more of line characteristics.
Below
are the diagrams that illustrate the concept of X-threshold and Y-threshold.


(Ox,Oy) (Ex,Ey) Threshold
range for y.
straight
line
![]()
slant
line
(Ex1,Ey1)
curve
![]()
Threshold
range for x
Fig3: Diagram to depict the threshold
values

![]()
(ox,oy) · (a1,b1) Threshold -y
·(a2,b2)
Threshold –x
Fig4: Segment characterstics
In the
above diagram the segment between
(a1,b1) and (a2,b2) has curve
characterstics because both the y and x
displacements have crossed the
threshold levels.
The implementation of this
algorithm is done as follows.
1. A data
structure is maintained which contains all the components as variables.
2. To
check if the current point, in consideration, is outside the threshold or not,
and its difference with the origin is taken.
3. For
every two adjacent sampled points on the component/stroke the characteristic of
the component is noted by checking whether it lies in the threshold or not.
a) If the segment crosses the x-threshold only, then the segment
has RLL or LRL characterstics based on
the direction. The corresponding count in the data structure is incremented.
b) If the segment crosses only the y-threshold then the segment
has UDL or DUL characterstics based on the direction. The corresponding count
in the data structure is incremented.
c) If the segment crosses both the y and the x-threshold then both
the vertical and horizontal direction is checked.
To
distinguish between a curve and a slant line, the previous change in the
direction is to be noted. A direction change in either the horizontal or
vertical direction indicates a curve characteristic. And based on the x and y
displacements the corresponding curve features are incremented.
Lines:
![]()
![]()
O(x,y)
![]()
![]()
![]()
B(x,y)
A(x,y)
Fig5: Lines
Here component OA crosses the y-threshold whereas OB crosses
x-threshold. Therefore each segment in OA would either be within the 2
thresholds or cross the y-threshold, and so it can be characterized as an up to
down line or a down to up line. Similarly for the component OB each segment
crosses the X-threshold or remains within both the thresholds.
Curves:


O(x,y) · A(x,y)
· B(x,y)
E(x,y)
Fig6: Curves
The
segment AB of the component has crossed the threshold but till that point
considering all the previous segment there has not been any considerable change
in the horizontal or vertical direction therefore it can be concluded that AB
has slant line characteristics. A slant line going right and downward (RDSL).
In the end the characteristic
having maximum count is declared as the curve characterstics of that stroke.
Once the overall curve characteristic is found, the stroke can be identified.
In this method
the component is taken as a geometric figure and then the threshold is set in
degrees and the amount of slope variation that can be allowed between a slant
line and a line. The method is very similar to the above except that the
threshold values and component differences are calculated using the slopes and
slope variations.
In this
method, as the user draws the character, at every time period, the points are
taken (the time period depends upon the graphics library). These points are
stored for calculation of mole fraction, which will be discussed later. So, for
each time interval, the slope of the latest point (i.e. the current point or
position) and the previous point is calculated. This slope is compared with the
previous slope to decide on the curve or line characteristics of the component
being drawn by the user.
Mathematically,
a straight line consists of points in which the difference between two
consecutive slopes of three consecutive points is zero. But here, the
difference is checked whether it is equal or approximately equal to zero.
Similarly,
a curve consists of points in which the difference between two consecutive
slopes of three consecutive points is almost constant. Thus, when the slope
difference is almost constant, the component is said to be a curve.
But for
curves like the one shown below, the above implementation tends to give the
output as a line but not as a curve. In order to make it more accurate, mole
fraction has to be calculated.
Fig7: A curve identified as a line.
Here,
the slope of not just the consecutive points is taken into consideration, but
also the slopes of the current point and the second previous point, the current
point and the third previous point and so on are calculated for the current
point. Root mean square of all these slopes is calculated to determine the
current slope. This slope is compared with the previous slope (calculated in
the same manner). Better results have been obtained by employing this method.
To
classify the curves, the slope difference is characteristic of each basic curve
and hence with the help of slope difference the curve is identified. And character recognition is similar to the
above approach.
Neural
networks have been often used to solve problems in which the mathematical
solution is unknown or when there is some kind of uncertainty. In the current
problem neural networks can be used to recognize the different kinds of curves.
This can be done in the following three ways:
a) Feeding
the slope values of the curve as an input to the network.
b) The
input of the network can be direction or displacement information, which we
gather in the above ways.
c) The
input to the network can be coordinate values.
The
above stated approaches are naïve and simple. Better and more structured
approaches are as follows.
All the
above approaches deal with only one feature from the stroke given. It may be
the slope, direction or the coordinate values. Better results can be obtained
if more than one feature are employed, and then extract some weight or value
from each feature. Then, identify the stroke, by applying the k-Nearest Neighboring
algorithm. Thus, more than one pattern classifiers are used, each using a
different descriptor/feature. The various features extracted, after a detailed
study of the characters are :
1. Slope
between two consecutive points: Slopes of consecutive points are calculated and
fed to the neural network.
2.
Density of the stroke: Density of the stroke is defined as the number of points
in the whole stroke, divided by the whole number of points on the writing pad.
The value is directly taken in computing the final distance.
3.
Height-Width ratio: This value is obtained by dividing the height of the stroke
by its width. This is another feature that is directly taken while computing
the distance (feature distance from a character).
4.
Distance from the centroid: This feature is extracted by calculating the
distance of each point in the stroke from the centroid of the stroke and
summing it over all the points. The centroid of the stroke is obtained by
taking the average of all the points.
5.
Division of region into subregions: The region is divided into, say n,
subregions. When the stroke is written, what are the regions into which a part
of the stroke is present is checked. This information is fed to the neural net
classifier.
The network can be trained on some standard
components/strokes. The neural network used here is a multi-layered feed forward neural network with only one hidden
layer. The above stated features form the pattern vector, which can be
extracted after scaling, interpolating/extrapolating and normalizing of the
stroke. These tasks of scaling, interpolating/extrapolating and normalizing of
the stroke is required in order to reduce the possibility of noise and
distortion. These tasks will be classified as low-level processing tasks.
Having obtained the refined input, the feature or pattern vector is computed,
which is fed to the respective pattern classifiers. The output of each pattern
classifier is taken and the final distance is calculated which will help in
classifying and identifying the stroke.
The problem is: given a set of classified strokes as input, the characters written, are to be recognized. But this has two different aspects, one in which the user indicates the end of character, after he/she has written down the character, and the other aspect, in which the user does not indicate the end of character, but keeps writing the characters continuously.
A
database is maintained for each character. The database contains a list of what
components/strokes a character consists of.
A lookup is done to check if the component/stroke list given matches any
one of the available. Every character can have more than one component/stroke
list. To quicken the search a finite automata is used. This can be done in the
following manner. Consider the Hindi character ‘a’. We can represent it as the
following set.
LRLC, LRLC, LRL, UDL.
This
means that the character consists of two left moving curves and one left moving
line and one down moving line. This information can be made more specific by
also including a point of contact. For
the inclusion of point of contact of various components we will have to define
some other terms which can specify either the exact point of contact or
relative to the beginning of the character.

Fig8: Character 'a'
Note: The horizontal line on
top is ignored, as it is present in all the characters.
Character: LRLC, LRLC, LRL, and UDL.
This represents the components
of the above character.
I=0;
While(
! Character_end )
{
While ( !component_end )
{
component
= recognise_component(x,y); // recognize the stroke
}
Characterstics[I++] = component ; // taking
all the strokes till end of // character
is detected
}
lookup(
Characteristics ) // checking the list of strokes to identify character
The algorithm is explained in terms of an example.
Eg: If the input is a pattern of strokes. Let A, B, C be sample inputs to the pattern recognizer. Let AA, AB, BC, CC, A, B, C and BB represent valid inputs. Then a given input pattern AABC can be classified as
1. <A> <AB><C>
2. <AA><BC>
3. <AA><B><C> and so on.
The algorithm also does the same. It tries to generate alternate list of patterns.
The algorithm:
1. Get the input stroke.
2. If the input stroke, appended to any existing list forms a character ending, append the stroke to that list and append a $(end of char) to that list. If there is any character that has additional strokes after the ones in the existing list, then copy the above list without a dollar into the next one.
Eg:
<A>$
<A
if B is input next then the lists would be <A>$<B
<AB>$
3. If the input stroke is not the beginning of a new character and the existing list has a dollar in the end, then that list can be eliminated.
Data Structures Used:
a) m_Character
{
int m_StrokeIds;
m_Character * next;
}
This structure represents one character and the set of strokes represented by it. It points to the next character in the list, i.e. it contains <A> and points to <B>
b) m_CharacterList is a pointer to the above structure.
m_Character * m_CharacterList;
c) m_CharacterLists is a pointer to m_CharacterList
m_CharacterList * m_CharacterLists
d) The input language i.e. the different sequences in which a character is written, is read from a file and stored into the following data structure
m_Char
{
int ** m_Types ; // This is a 2D array which contains different // ways in which a character can be written.
int m_NoOfTypes; // This integer gives the number of such
// sequences.
}
The language is stored in m_Char m_CharLang [75]. This is because the language has 75 alphabets, including half-alphabets and matras.
Problems:
No end of input is to be indicated:
Since the user does not indicate an end of input, it is difficult to find out which one of the classification lists indicates the correct input. Spacing and direction information of the strokes is required.
Spacing Information can be used as follows. In the input language, instead of maintaining strokes as an integer array the strokes can be maintained as a structure containing the stroke and its relative position on the screen. The problem here would be how we ascertain the position of the first stroke. If the spacing information is maintained as relative to the next or the previous stroke, then all the possible orders in the database are to be stored. Even then the problem of ascertaining the position of the first stroke would not be resolved. The first stroke can be any one stroke and it might even be a matra in which case the position relative to the character becomes all the more very important.
To resolve this we can store the stroke’s position as relative to the screen, this can be determined only after the user writes the whole character. This can be achieved by placing a restriction on where the user writes the first part of the character.
Problem with spacing information:
a) The first stroke cannot be given the correct displacement information
b) The database will have to store the relation between every two strokes in the character. This data would be quite large.
c) Since end of character information is missing, therefore if relative positioning is given, the stroke is relative to which character - this information becomes very difficult. For this problem the solution lies in giving fixed positions relative to the screen. Or the only solution to this seems to be, to maintain a very large database of inter relations between different strokes of a character.
d) If the position were stored relative to the screen the user would loose the flexibility of writing.
e)
Stroke List given by the Classifier may have been
misclassified. This leads to the problem of propagating errors in the case of
lists. Many of the lists will yield wrong results.
Multiple Correct Results:
The same list can be interpreted in different ways, as given in the above example. Therefore when the user continues to write, in case there are multiple ways to write the character, the interpretation would differ. This would cause the algorithm to give multiple correct results, which would be different interpretations of the same. This should be avoided and only one output should be outputted. This can be done using the spacing information, which to an extent would avoid ambiguities.
This problem holds lots of
importance, since with the upcoming of many palmtops in the market, one mode of
keying in information would be direct writing. For this, the system needs to
interpret and identify what is being written. Many approaches to approximate
are formulated which are heuristically designed, since exhaustive approach to
this problem does not arise due the fact that hand-written characters vary every
time they are written. Much more advanced approaches like Support Vector
machines and Fuzzy set theory can be used to better the performance of the
'character recognition' systems.
An instance of a curve that is quite complex in Hindi language:


The two curves vary very slightly, hence a restriction on the
style of writing has to be put on the user.
Hindi Language
Information
No of characters: 75 (including the half-alphabets and maatras).
No of possible strokes: 81 (including the maatras).
Maximum number of strokes in any character: 6 (in 'O' alphabet).
References
1. Anil K. Jain,
"Fundamentals of Image Processing", Prentice-Hall India, pp. 342-421,
2001.
2.
James
A. Freeman, David M. Skapura, "Neural Networks", Addison-Wesley, pp. 1-127,
1991.
3.
Mahmud Hassoun, "Fundamentals of Neural
Networks", PHI, Chap. 1 - 2.
4. Rafael C. Gonzalez, Richard E. Woods, "Digital Image Processing", Addison-Wesley, pp. 571-619, 1999.
5. Robert Schalkoff, "Artificial Neural Networks", AWL, Chap. 1 - 2.