Abstract: A human eye can read the characters of a language written legibly either by a person, or when it is printed. Making the machine do the same, is broadly the problem of 'Character Recognition'. This implies endowing the machine with the capability to approximate and thus identify characters correctly, in some sense, a similar capability in human beings. The Online/Dynamic Character Recognition is a process of identifying a hand-written character. This paper gives an insight into the problem, using the Hindi language as a model. The paper addresses and enumerates the various ways and methodologies to tackle the problem. This problem 'Online/Dynamic Character Recognition' comes under the purview of Pattern Recognition and Image Processing.
The problem of Character Recognition is to make the machine recognize the characters written, as stated above. The goal of the problem is basically to achieve character recognition accuracy, which is very close to the superb capability exhibited by human beings for performing such tasks. Thus, a character recognition system should be able to exhibit a certain degree of intelligence, in identifying characters, which are either in the form of typed documents or the ones that are hand-written by person/user. This problem diversifies into two different aspects:
The Optical/Static Character Recognition deals with recognizing printed characters. An example is printed, properly formatted documents. In the whole of the document, a particular character wherever it occurs has the same form. The problem is to read these documents. The systems that address this problem are highly specialized and thus have very little or extendibility. This paper, however, does not address this problem.
This problem is about recognizing hand-written characters by the machine. The user gives the input by writing the character/alphabet with the help of stylus. So, the input is a list of points of positions in the stroke/character written by the user. This problem is termed 'Dynamic', because the input is fed to the system as the user writes the character and also due to the variation in the character every time the user writes. It is very well known that the handwriting of a person is never constant, i.e. varying. Thus, the system that addresses this problem should be able to approximate, neglect the minor changes in the character written and then identify the correct character. The language, in which the character is written, is a parameter, which is to be considered. For instance, the alphabets of the English language are quite simpler and less complex than those of the Hindi language. The details of this are given subsequently.
The state-of-the-art in Character Recognition, for the most part (either static or dynamic character recognition) is based on heuristic formulations designed for specific problems. The problem of Online/Dynamic Character Recognition (OCR) can be divided into two sub problems:
1. Online Stroke Detection
2. Strokes to Character Identification
In order to solve this problem, the notion of 'stroke' and 'character' is to be known well.
A stroke is defined as a continuous and single movement of the electronic pen over the stylus pad. A stroke ends when the user lifts the pen from the stylus or the pad. An End of Stroke signal denotes the end of the stroke.
A character is defined as a set of strokes that complete an alphabet. A character can have one or more strokes required to make it complete and comprehendible. An End of Character signal denotes the end of the character.
The Online Stroke Detection (OSD) involves identifying the stroke, from the user input. The input is a list of continuous points of position on the stylus, while the stroke is being written. Once the pen is lifted, the end of stroke signal is sent to the system and the stroke is processed for identification.
This problem OCR comes under pattern recognition. So, the concept of a 'pattern', feature and a 'pattern class', ought to be known.
Once the End of the Character signal is detected, the strokes obtained are checked to identify the total character.
As mentioned earlier, the problem can be broken down to that of recognizing strokes and then identifying characters. This is because most of the Hindi characters can be broken down into basic components or strokes and these strokes are geometric figures like lines, curves and loops, which are written down in one single and continuous movement of the pen. Some examples of the characters are given below. The first example is that of the first alphabet in the Hindi language, the vowel a'.
Fig1: The first alphabet in Hindi.
It can be seen that the above alphabet can be broken down into the following components, two curves that move from the left to right and one horizontal line and one vertical line. In all, the alphabet consists of four strokes. The numbers in the figure denote the strokes.
Fig2: The third alphabet in Hindi
It can be seen that the above alphabet 'e' can be broken down into the following components, two curves that move from the left to right and one horizontal line and one vertical line. This alphabet can be written in two strokes.
Therefore it can be concluded from the above examples that breaking the whole problem of OCR down into two parts is advantageous.
· To recognize the character components or strokes.
· To recognize the character from a set of components or a set of strokes.
Since the character is broken down into components, given below are some strokes that a character can be divided into. The components that have been commonly found are
i) Line Components
ii) Curve Components
a) A vertical line:
b) A horizontal line:
c) A slating line towards the right:
d) A slanting line towards the left:
The convention of naming the above lines is given in Appendix.
The different curves are:
LRC RLC UDC DUC LOOP
Since the characters are going to be handwritten the components will not be geometric and will be noisy. The curves dealt above are very naïve and simple. In the Hindi language, there are many other curves, which a little complex. Some instances of these curves are given in Appendix.
The convention of naming the above curves is given in Appendix.
Therefore the following approaches have been taken to solve this problem of identifying the correct stroke.
1. X-Y displacement approach.
2. Slope Characteristics.
3. Neural networks as Pattern classifiers.
In this approach the component/stroke is recognized by analyzing how the directions vary as the component/stroke is written. The assumptions that are made here are as follows:
1. People generally write Hindi characters in a certain way, i.e. most of the characters are written from top to bottom, left to right.
2. For some characters “gha” and “dha” which are similar, some simplifying modes of writing will be suggested and only those will be recognized. There is a need to do this, otherwise a high cost of processing has to be paid.
To distinguish between a line and a curve, a threshold is defined. A threshold is that range that distinguishes a horizontal line from a slant line or a curve. This threshold is fixed, based on the origin of the component. This is shown in the figure below. When a curve crosses both the X-threshold area and the Y-threshold area, it is considered to be a likely curve component or a likely slant-line component.
In this method, for every two adjacent sampled points, the x and y displacements are calculated. If both of them cross the threshold, then the direction of x and y displacement is considered to determine which kind of curve that line segment joining the two points in the stroke is likely to be. If only one of the displacements crosses the threshold, the line segment has more of line characteristics.
Below are the diagrams that illustrate the concept of X-threshold and Y-threshold.
(Ox,Oy) (Ex,Ey) Threshold range for y.
Threshold range for x
Fig3: Diagram to depict the threshold values
(ox,oy) · (a1,b1) Threshold -y
Fig4: Segment characterstics
In the above diagram the segment between (a1,b1) and (a2,b2) has curve characterstics because both the y and x displacements have crossed the threshold levels.
The implementation of this algorithm is done as follows.
1. A data structure is maintained which contains all the components as variables.
2. To check if the current point, in consideration, is outside the threshold or not, and its difference with the origin is taken.
3. For every two adjacent sampled points on the component/stroke the characteristic of the component is noted by checking whether it lies in the threshold or not.
a) If the segment crosses the x-threshold only, then the segment has RLL or LRL characterstics based on the direction. The corresponding count in the data structure is incremented.
b) If the segment crosses only the y-threshold then the segment has UDL or DUL characterstics based on the direction. The corresponding count in the data structure is incremented.
c) If the segment crosses both the y and the x-threshold then both the vertical and horizontal direction is checked.
To distinguish between a curve and a slant line, the previous change in the direction is to be noted. A direction change in either the horizontal or vertical direction indicates a curve characteristic. And based on the x and y displacements the corresponding curve features are incremented.
Here component OA crosses the y-threshold whereas OB crosses x-threshold. Therefore each segment in OA would either be within the 2 thresholds or cross the y-threshold, and so it can be characterized as an up to down line or a down to up line. Similarly for the component OB each segment crosses the X-threshold or remains within both the thresholds.
O(x,y) · A(x,y)
The segment AB of the component has crossed the threshold but till that point considering all the previous segment there has not been any considerable change in the horizontal or vertical direction therefore it can be concluded that AB has slant line characteristics. A slant line going right and downward (RDSL).
In the end the characteristic having maximum count is declared as the curve characterstics of that stroke. Once the overall curve characteristic is found, the stroke can be identified.
In this method the component is taken as a geometric figure and then the threshold is set in degrees and the amount of slope variation that can be allowed between a slant line and a line. The method is very similar to the above except that the threshold values and component differences are calculated using the slopes and slope variations.
In this method, as the user draws the character, at every time period, the points are taken (the time period depends upon the graphics library). These points are stored for calculation of mole fraction, which will be discussed later. So, for each time interval, the slope of the latest point (i.e. the current point or position) and the previous point is calculated. This slope is compared with the previous slope to decide on the curve or line characteristics of the component being drawn by the user.
Mathematically, a straight line consists of points in which the difference between two consecutive slopes of three consecutive points is zero. But here, the difference is checked whether it is equal or approximately equal to zero.
Similarly, a curve consists of points in which the difference between two consecutive slopes of three consecutive points is almost constant. Thus, when the slope difference is almost constant, the component is said to be a curve.
But for curves like the one shown below, the above implementation tends to give the output as a line but not as a curve. In order to make it more accurate, mole fraction has to be calculated.
Fig7: A curve identified as a line.
Here, the slope of not just the consecutive points is taken into consideration, but also the slopes of the current point and the second previous point, the current point and the third previous point and so on are calculated for the current point. Root mean square of all these slopes is calculated to determine the current slope. This slope is compared with the previous slope (calculated in the same manner). Better results have been obtained by employing this method.
To classify the curves, the slope difference is characteristic of each basic curve and hence with the help of slope difference the curve is identified. And character recognition is similar to the above approach.
Neural networks have been often used to solve problems in which the mathematical solution is unknown or when there is some kind of uncertainty. In the current problem neural networks can be used to recognize the different kinds of curves. This can be done in the following three ways:
a) Feeding the slope values of the curve as an input to the network.
b) The input of the network can be direction or displacement information, which we gather in the above ways.
c) The input to the network can be coordinate values.
The above stated approaches are naïve and simple. Better and more structured approaches are as follows.
All the above approaches deal with only one feature from the stroke given. It may be the slope, direction or the coordinate values. Better results can be obtained if more than one feature are employed, and then extract some weight or value from each feature. Then, identify the stroke, by applying the k-Nearest Neighboring algorithm. Thus, more than one pattern classifiers are used, each using a different descriptor/feature. The various features extracted, after a detailed study of the characters are :
1. Slope between two consecutive points: Slopes of consecutive points are calculated and fed to the neural network.
2. Density of the stroke: Density of the stroke is defined as the number of points in the whole stroke, divided by the whole number of points on the writing pad. The value is directly taken in computing the final distance.
3. Height-Width ratio: This value is obtained by dividing the height of the stroke by its width. This is another feature that is directly taken while computing the distance (feature distance from a character).
4. Distance from the centroid: This feature is extracted by calculating the distance of each point in the stroke from the centroid of the stroke and summing it over all the points. The centroid of the stroke is obtained by taking the average of all the points.
5. Division of region into subregions: The region is divided into, say n, subregions. When the stroke is written, what are the regions into which a part of the stroke is present is checked. This information is fed to the neural net classifier.
The network can be trained on some standard components/strokes. The neural network used here is a multi-layered feed forward neural network with only one hidden layer. The above stated features form the pattern vector, which can be extracted after scaling, interpolating/extrapolating and normalizing of the stroke. These tasks of scaling, interpolating/extrapolating and normalizing of the stroke is required in order to reduce the possibility of noise and distortion. These tasks will be classified as low-level processing tasks. Having obtained the refined input, the feature or pattern vector is computed, which is fed to the respective pattern classifiers. The output of each pattern classifier is taken and the final distance is calculated which will help in classifying and identifying the stroke.
The problem is: given a set of classified strokes as input, the characters written, are to be recognized. But this has two different aspects, one in which the user indicates the end of character, after he/she has written down the character, and the other aspect, in which the user does not indicate the end of character, but keeps writing the characters continuously.
A database is maintained for each character. The database contains a list of what components/strokes a character consists of. A lookup is done to check if the component/stroke list given matches any one of the available. Every character can have more than one component/stroke list. To quicken the search a finite automata is used. This can be done in the following manner. Consider the Hindi character ‘a’. We can represent it as the following set.
LRLC, LRLC, LRL, UDL.
This means that the character consists of two left moving curves and one left moving line and one down moving line. This information can be made more specific by also including a point of contact. For the inclusion of point of contact of various components we will have to define some other terms which can specify either the exact point of contact or relative to the beginning of the character.
Fig8: Character 'a'
Note: The horizontal line on top is ignored, as it is present in all the characters.
Character: LRLC, LRLC, LRL, and UDL.
This represents the components of the above character.
While( ! Character_end )
While ( !component_end )
component = recognise_component(x,y); // recognize the stroke
Characterstics[I++] = component ; // taking all the strokes till end of // character is detected
lookup( Characteristics ) // checking the list of strokes to identify character
The algorithm is explained in terms of an example.
Eg: If the input is a pattern of strokes. Let A, B, C be sample inputs to the pattern recognizer. Let AA, AB, BC, CC, A, B, C and BB represent valid inputs. Then a given input pattern AABC can be classified as
1. <A> <AB><C>
3. <AA><B><C> and so on.
The algorithm also does the same. It tries to generate alternate list of patterns.
1. Get the input stroke.
2. If the input stroke, appended to any existing list forms a character ending, append the stroke to that list and append a $(end of char) to that list. If there is any character that has additional strokes after the ones in the existing list, then copy the above list without a dollar into the next one.
if B is input next then the lists would be <A>$<B
3. If the input stroke is not the beginning of a new character and the existing list has a dollar in the end, then that list can be eliminated.
Data Structures Used:
m_Character * next;
This structure represents one character and the set of strokes represented by it. It points to the next character in the list, i.e. it contains <A> and points to <B>
b) m_CharacterList is a pointer to the above structure.
m_Character * m_CharacterList;
c) m_CharacterLists is a pointer to m_CharacterList
m_CharacterList * m_CharacterLists
d) The input language i.e. the different sequences in which a character is written, is read from a file and stored into the following data structure
int ** m_Types ; // This is a 2D array which contains different // ways in which a character can be written.
int m_NoOfTypes; // This integer gives the number of such
The language is stored in m_Char m_CharLang . This is because the language has 75 alphabets, including half-alphabets and matras.
No end of input is to be indicated:
Since the user does not indicate an end of input, it is difficult to find out which one of the classification lists indicates the correct input. Spacing and direction information of the strokes is required.
Spacing Information can be used as follows. In the input language, instead of maintaining strokes as an integer array the strokes can be maintained as a structure containing the stroke and its relative position on the screen. The problem here would be how we ascertain the position of the first stroke. If the spacing information is maintained as relative to the next or the previous stroke, then all the possible orders in the database are to be stored. Even then the problem of ascertaining the position of the first stroke would not be resolved. The first stroke can be any one stroke and it might even be a matra in which case the position relative to the character becomes all the more very important.
To resolve this we can store the stroke’s position as relative to the screen, this can be determined only after the user writes the whole character. This can be achieved by placing a restriction on where the user writes the first part of the character.
Problem with spacing information:
a) The first stroke cannot be given the correct displacement information
b) The database will have to store the relation between every two strokes in the character. This data would be quite large.
c) Since end of character information is missing, therefore if relative positioning is given, the stroke is relative to which character - this information becomes very difficult. For this problem the solution lies in giving fixed positions relative to the screen. Or the only solution to this seems to be, to maintain a very large database of inter relations between different strokes of a character.
d) If the position were stored relative to the screen the user would loose the flexibility of writing.
e) Stroke List given by the Classifier may have been misclassified. This leads to the problem of propagating errors in the case of lists. Many of the lists will yield wrong results.
Multiple Correct Results:
The same list can be interpreted in different ways, as given in the above example. Therefore when the user continues to write, in case there are multiple ways to write the character, the interpretation would differ. This would cause the algorithm to give multiple correct results, which would be different interpretations of the same. This should be avoided and only one output should be outputted. This can be done using the spacing information, which to an extent would avoid ambiguities.
This problem holds lots of importance, since with the upcoming of many palmtops in the market, one mode of keying in information would be direct writing. For this, the system needs to interpret and identify what is being written. Many approaches to approximate are formulated which are heuristically designed, since exhaustive approach to this problem does not arise due the fact that hand-written characters vary every time they are written. Much more advanced approaches like Support Vector machines and Fuzzy set theory can be used to better the performance of the 'character recognition' systems.
An instance of a curve that is quite complex in Hindi language:
The two curves vary very slightly, hence a restriction on the style of writing has to be put on the user.
Hindi Language Information
No of characters: 75 (including the half-alphabets and maatras).
No of possible strokes: 81 (including the maatras).
Maximum number of strokes in any character: 6 (in 'O' alphabet).
1. Anil K. Jain, "Fundamentals of Image Processing", Prentice-Hall India, pp. 342-421, 2001.
2. James A. Freeman, David M. Skapura, "Neural Networks", Addison-Wesley, pp. 1-127, 1991.
3. Mahmud Hassoun, "Fundamentals of Neural Networks", PHI, Chap. 1 - 2.
4. Rafael C. Gonzalez, Richard E. Woods, "Digital Image Processing", Addison-Wesley, pp. 571-619, 1999.
5. Robert Schalkoff, "Artificial Neural Networks", AWL, Chap. 1 - 2.