A BRIEF HISTORY OF LANGUAGE TECHNOLOGY
RESEARCH AT I.I.T.KANPUR
Work on computer processing of Indian languages and scripts started at IIT Kanpur in the
early seventies. Researchers at IITK clearly foresaw the dominant role that the computer
could play in solving the complex problems posed by the plurality of languages and scripts
in the country, and the pioneering work done in this area laid the foundation for the
present R & D effort.
The work began with Devanagari Optical Character Recognition (1970-73) (R.M.K. Sinha and
H.N. Mahabala) and with the development of keyboarding and coding schemes (P.V.H.L.
Narasimham, V. Rajaraman and B. Prasada; R.M.K. Sinha and H.N. Mahabala). However, because
computers were exorbitantly expensive in those days and only a few could afford them, these
efforts remained more or less an academic exercise. With the advent of microprocessors in
the mid seventies, it became economically viable to translate some of these ideas into a
stand-alone Indian language terminal design (R.M.K. Sinha and Arjun Raman). A number of
B.Tech and M.Tech projects were devoted to this.
In 1978, at the initiative of IIT Kanpur (R.M.K. Sinha), a National Symposium on Linguistic
Implications in Computer based Information Systems was organized by the Department of
Electronics (DOE) (Om Vikas), Govt. of India. This triggered widespread activity in the
area across the country. The work at IIT Kanpur got a real fillip in 1983, when DOE, Govt.
of India, sponsored a project on the design and development of an 'Integrated Devanagari
Computer (IDC)' terminal (R.M.K. Sinha, S.K. Mullick). The IDC terminal was designed in a
record time of about 8 months and was demonstrated at the Third World Hindi Convention at
Delhi. It was developed using the Intel 8086 processor with multitasking firmware. The IDC
project was further extended to implement the same technology using the 32-bit 68000
microprocessor, and the outcome was named GIST (Graphics and Indian Script Terminal)
technology. A number of companies bought this technology for manufacturing multilingual
computer terminals. The GIST technology was adapted by the Centre for Development of
Advanced Computing (C-DAC) when the research engineer working on the project at IIT Kanpur
(Mohan Tambe) joined C-DAC and took the technology with him without a formal transfer of
technology.
In 1984, the Journal of the Institution of Electronics and Telecommunication Engineering
published a special issue on Computer Processing of Indian languages and scripts (Guest
Editor: R.M.K. Sinha). This special issue carried several articles on the results of
research and development work at IIT Kanpur (R.M.K. Sinha). The lead article presented
Prof. Sinha's research on a comparison of possible coding and keyboarding schemes, the pros
and cons of phonetic keyboarding and internal representation, its inherent transliteration
capability, a schema for Machine Translation using an Interlingua, and related topics. Some
of the other articles presented, for the first time, a strategy for English to Hindi and
Hindi to English Transliteration (R.M.K. Sinha, B. Srinivasan), a Spell-checker (R.M.K.
Sinha and K.S. Singh), and segment display (R.M.K. Sinha). This special issue became
reference material for researchers in this area.
The GIST technology represented a major breakthrough in solving the complex problem of the
man-machine linguistic interface for Indian languages. Its key features, which made it user
friendly, included a natural, phonetically oriented keyboarding scheme that converted
directly to internal codes called ISSCII-8 (8-bit Indian Standard Script Code for
Information Interchange), a human-engineered keyboard layout, a display that changed
dynamically as the input progressed, built-in intelligence to disallow illegal compositions
such as attaching two vowel modifiers to the same character, and automatic transliteration
from one Indian script to another. ISSCII-8, an extension of ASCII designed during the
early 1980s with active inputs from IIT Kanpur (R.M.K. Sinha), caters to the entire set of
Indian scripts in a uniform way. ISSCII-8 underwent further modifications, and a modified
version was accepted by the Bureau of Indian Standards as the ISCII (8-bit Indian Script
Code for Information Interchange) code in 1991. ISCII forms the basis for the Unicode code
assignments for Indian scripts.
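Two of the features described above, rejecting illegal compositions and transliterating
between Indian scripts through a uniform code layout, can be illustrated with a minimal
sketch. This is not the GIST firmware and it does not use ISSCII-8 or ISCII codes. It
assumes modern Unicode, whose Indic blocks keep the parallel layout inherited from ISCII,
and it assumes that the vowel signs of interest are the Devanagari code points U+093E to
U+094C. The function names are hypothetical, and the fixed-offset mapping is only a rough
approximation of real transliteration.

```python
import unicodedata

# Assumption: Devanagari dependent vowel signs are U+093E..U+094C.
VOWEL_SIGNS = range(0x093E, 0x094D)
DEVANAGARI_BLOCK = range(0x0900, 0x0980)
# Unicode places the Bengali block 0x80 code points after Devanagari.
BENGALI_OFFSET = 0x0980 - 0x0900


def has_illegal_composition(text: str) -> bool:
    """Return True if two vowel signs appear back to back."""
    return any(
        ord(a) in VOWEL_SIGNS and ord(b) in VOWEL_SIGNS
        for a, b in zip(text, text[1:])
    )


def devanagari_to_bengali(text: str) -> str:
    """Shift Devanagari code points to the same slot in the Bengali block."""
    out = []
    for ch in text:
        if ord(ch) in DEVANAGARI_BLOCK:
            candidate = chr(ord(ch) + BENGALI_OFFSET)
            # Keep the original character when the target slot is unassigned.
            out.append(candidate if unicodedata.name(candidate, "") else ch)
        else:
            out.append(ch)
    return "".join(out)
```

For example, `has_illegal_composition("\u0915\u0940\u093f")` flags a consonant followed by
two vowel signs, and `devanagari_to_bengali("\u0915\u092e\u0932")` maps each letter to the
corresponding Bengali letter. The same offset idea extends to other Indic blocks, with
script-specific exceptions that this sketch does not handle.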
In 1992, UNESCO and UNDP sponsored the Second Regional Workshop on Computer Processing
of Asian Languages (CPAL-2) at IIT Kanpur under the Chairmanship of Prof. R.M.K. Sinha.
(The first CPAL was held at the Asian Institute of Technology, Bangkok, in 1989.) This
workshop was attended by several international experts, and a set of recommendations on
current issues, drawn from the panel discussions, was submitted to UNESCO.
In 1990-92, Professor R.M.K. Sinha conceptualized the design of a Machine Aided Translation
system for translation from English to Indian Languages. This system was named ANGLABHARTI,
and the underlying methodology was named the ANGLABHARTI Technology or ANGLABHARTI Approach.
In parallel, a Language accessor system for Indian Languages (ANUSARAK) was also
designed and developed (Rajiv Sangal, Vineet Chaitanya); this work later continued at the
University of Hyderabad.
In 1992-94, IITK (R.M.K. Sinha) implemented the Anglabharti system in the SunOS environment
for translation from English to Hindi. All the modules of the system were implemented,
tested and demonstrated.
During 1995-97, the Department of Electronics, Govt. of India, sanctioned a grant-in-aid
for the project titled "Machine Aided Translation from English to Hindi for standard
documents (domain of Public Health Campaign) based on ANGLABHARTI approach". ERDC (with its
office at Lucknow, since moved to NOIDA) was associated with the project for the
implementation and commercialization of this software on a PC platform in the domain of
public health campaigns. The ANGLABHARTI software already developed by IITK on the SUN
system was used in this project and was re-engineered for the PC under Linux jointly by
IITK and ERDC under the supervision of Prof. R.M.K. Sinha.
In 1995-96, IITK also designed and developed an Example-based approach to Machine Aided Translation between similar languages (Indian languages) and between dissimilar languages (English and Indian languages) under the leadership of Professor R.M.K. Sinha. This approach has been named the ANUBHARTI approach.
AnglaHindi, the English to Hindi MAT system based on the Anglabharti methodology, accepts unconstrained text; it has already been made available to users and has been very well received. AnglaUrdu, which is based on AnglaHindi, has also been demonstrated. HindiAngla, the Hindi to English MAT system based on the Anubharti methodology, has been demonstrated for simple sentences, and further work is under way to handle compound and complex sentences.
IIT Kanpur, in association with the Technology Development for Indian Languages (TDIL) Programme of the Govt. of India, has recently taken an initiative to make the AnglaBharti Technology available to all thirteen Resource Centres in the country. These resource centres have been established across the country to develop Indian language technology solutions in their regional languages. The centres will develop MAT systems from English to their assigned languages using the AnglaBharti technology.