A BRIEF HISTORY OF LANGUAGE TECHNOLOGY
RESEARCH AT I.I.T. KANPUR

Work on the Computer Processing of Indian Languages and scripts started at IIT Kanpur in the early seventies. Researchers at IITK clearly foresaw the dominant role the computer could play in solving the complex problems posed by the plurality of languages and scripts in the country, and some pioneering work done in this area laid the foundation for the present R&D effort.

The work began with Devanagari Optical Character Recognition (1970-73) (R.M.K. Sinha and H.N. Mahabala) and with the development of keyboarding and coding schemes (P.V.H.L. Narasimham, V. Rajaraman and B. Prasada; R.M.K. Sinha and H.N. Mahabala). However, computers were then so expensive that only a few institutions could afford them, so these efforts remained largely academic exercises. With the advent of microprocessors in the mid-seventies, it became economically viable to translate some of these ideas into a stand-alone Indian language terminal design (R.M.K. Sinha and Arjun Raman). A number of B.Tech. and M.Tech. projects were devoted to this work.

In 1978, at the initiative of IIT Kanpur (R.M.K. Sinha), the Department of Electronics (DOE), Govt. of India (Om Vikas), organized a National Symposium on Linguistic Implications in Computer Based Information Systems. This triggered widespread activity in the area across the country. The work at IIT Kanpur received a real fillip in 1983, when DOE sponsored a project on the design and development of the 'Integrated Devanagari Computer (IDC)' terminal (R.M.K. Sinha, S.K. Mullick). The IDC terminal was designed in a record time of about eight months and was demonstrated at the Third World Hindi Convention in Delhi. It was built around the Intel 8086 processor and used multitasking firmware. The IDC project was later extended to implement the same technology on the 32-bit Motorola 68000 microprocessor, and the outcome was named GIST (Graphics and Indian Script Terminal) technology. A number of companies bought this technology to manufacture multilingual computer terminals. GIST technology was adopted by the Centre for Development of Advanced Computing (C-DAC) when the research engineer working on the project at IIT Kanpur (Mohan Tambe) joined C-DAC and took the technology with him without a formal transfer of technology.

In 1984, the Journal of the Institution of Electronics and Telecommunication Engineers published a special issue on Computer Processing of Indian Languages and Scripts (Guest Editor: R.M.K. Sinha). This special issue carried several articles reporting the results of research and development work at IIT Kanpur (R.M.K. Sinha). The lead article presented Prof. Sinha's research comparing possible coding and keyboarding schemes, the pros and cons of phonetic keyboarding and internal representation and its inherent transliteration capability, and a schema for Machine Translation using an Interlingua, among other topics. Other articles presented, for the first time, strategies for English-to-Hindi and Hindi-to-English transliteration (R.M.K. Sinha, B. Srinivasan), a spell-checker (R.M.K. Sinha and K.S. Singh) and a segment display (R.M.K. Sinha). This special issue became a standard reference for researchers in the area.

The GIST technology represented a major breakthrough in solving the complex problem of the man-machine linguistic interface for Indian languages. Its key features, which made it user friendly, included: a natural, phonetically oriented keyboarding scheme that converted keystrokes directly into internal codes called ISSCII-8 (8-bit Indian Standard Script Code for Information Interchange); a human-engineered keyboard layout; a display that changed dynamically as input progressed; built-in intelligence to disallow illegal compositions, such as attaching two vowel modifiers to the same character; and automatic transliteration from one Indian script to another. ISSCII-8, an extension of ASCII designed in the early 1980s with active input from IIT Kanpur (R.M.K. Sinha), caters to the entire set of Indian scripts in a uniform way. ISSCII-8 later underwent further modifications, and a modified version was adopted by the Bureau of Indian Standards in 1991 as the ISCII (8-bit Indian Script Code for Information Interchange) code. ISCII forms the basis of the Unicode code assignments for Indian scripts.
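Two of the features listed above, script-to-script transliteration and rejecting a second vowel modifier on the same character, can be illustrated with a minimal Python sketch. It is not GIST's implementation and does not use ISSCII-8 or ISCII byte codes; it assumes Unicode text instead, relying on the fact that Unicode's main Indic blocks (Devanagari through Malayalam) follow a common ISCII-derived layout, so the same letter sits at the same offset within each 128-code-point block. The names `SCRIPT_BASE`, `transliterate`, `is_matra` and `has_double_matra` are hypothetical, and the vowel-sign range U+093E to U+094C is a simplified subset of the Devanagari dependent vowel signs (matras).

```python
# Minimal sketch (assumption): works on Unicode text, not ISSCII-8/ISCII bytes.
# Unicode's main Indic blocks share an ISCII-derived layout, so equivalent
# letters differ only by the distance between the blocks' start code points.

BLOCK_SIZE = 0x80

# Start of each script's Unicode block (hypothetical table name).
SCRIPT_BASE = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "gurmukhi": 0x0A00,
    "gujarati": 0x0A80,
    "oriya": 0x0B00,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}

# Danda and double danda are encoded once, in the Devanagari block,
# and shared by the other scripts, so they are never shifted.
SHARED = {"\u0964", "\u0965"}


def transliterate(text: str, src: str, dst: str) -> str:
    """Move each source-block character to the same offset in the target block."""
    src_base, dst_base = SCRIPT_BASE[src], SCRIPT_BASE[dst]
    out = []
    for ch in text:
        offset = ord(ch) - src_base
        if ch not in SHARED and 0 <= offset < BLOCK_SIZE:
            # Caveat: letters absent from the target script land on unassigned code points.
            out.append(chr(dst_base + offset))
        else:
            out.append(ch)  # characters outside the source block pass through unchanged
    return "".join(out)


def is_matra(ch: str) -> bool:
    """Simplified test for a Devanagari dependent vowel sign (U+093E..U+094C)."""
    return 0x093E <= ord(ch) <= 0x094C


def has_double_matra(text: str) -> bool:
    """Return True if two vowel signs follow each other (an illegal composition)."""
    return any(is_matra(a) and is_matra(b) for a, b in zip(text, text[1:]))


if __name__ == "__main__":
    print(transliterate("नमस्ते", "devanagari", "gujarati"))  # prints: નમસ્તે
    print(has_double_matra("कि"), has_double_matra("किी"))  # prints: False True
```

The fixed-offset mapping is only an approximation of what a production transliterator does: because not every letter exists in every script, a practical system would add per-script exception tables on top of this common layout.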

In 1992, UNESCO and UNDP sponsored the Second Regional Workshop on Computer Processing of Asian Languages (CPAL-2) at IIT Kanpur, chaired by Prof. R.M.K. Sinha. (The first CPAL was held at the Asian Institute of Technology, Bangkok, in 1989.) The workshop was attended by several international experts, and its panel discussions produced a set of recommendations on current issues, which were submitted to UNESCO.

During 1990-92, Professor R.M.K. Sinha conceptualized the design of a Machine Aided Translation (MAT) system for translation from English to Indian languages. The system was named ANGLABHARTI, and the underlying methodology was named the ANGLABHARTI Technology or ANGLABHARTI Approach.

In parallel, a language accessor system for Indian languages (ANUSARAK) was designed and developed (Rajiv Sangal, Vineet Chaitanya); this work was later continued at the University of Hyderabad.

During 1992-94, IITK (R.M.K. Sinha) implemented the Anglabharti system in the SunOS environment for translation from English to Hindi. All modules of the system were implemented, tested and demonstrated.

During 1995-97, the Department of Electronics, Govt. of India, sanctioned a grant-in-aid for the project titled "Machine Aided Translation from English to Hindi for standard documents (domain of Public Health Campaign) based on ANGLABHARTI approach". ERDC (with its office at Lucknow, since moved to NOIDA) was associated with the project to implement and commercialize the software on a PC platform for the public health campaign domain. The ANGLABHARTI software already developed by IITK on the Sun system was used in this project and was re-engineered for PCs running Linux, jointly by IITK and ERDC under the supervision of Prof. R.M.K. Sinha.

During 1995-96, IITK also designed and developed an example-based approach to Machine Aided Translation between similar languages (Indian languages) and dissimilar ones (English and Indian languages), under the leadership of Professor R.M.K. Sinha. This approach has been named the ANUBHARTI approach.

AnglaHindi, the English-to-Hindi MAT system based on the Anglabharti methodology, accepts unconstrained text; it has already been made available to users and has been very well received. AnglaUrdu, which is based on AnglaHindi, has also been demonstrated. HindiAngla, the Hindi-to-English MAT system based on the Anubharti methodology, has been demonstrated for simple sentences, and work is under way to extend it to compound and complex sentences.

IIT Kanpur, in association with the Technology Development for Indian Languages (TDIL) Programme of the Govt. of India, has recently taken an initiative to make the AnglaBharti Technology available to all thirteen Resource Centres in the country. These resource centres have been established across the country to develop Indian language technology solutions in their regional languages, and each will use AnglaBharti technology to develop MAT systems from English to its assigned languages.