User Manual for iconverter

Author : Saumen Mandal (saumen@cse.iitk.ac.in)
History : First release :

Code Transformations using iconverter
-------------------------------------

Conversion between the ISCII code space and Unicode is supported in both
directions for each of the ten Indian languages listed later in this manual.

A file containing Indian language text in ISCII codes can be converted to
its Unicode equivalent with the tool iconverter. The syntax is :

    iconverter -e encoding_file iscii_coded_file_name unicode_file_name

For the conversion from a Unicode coded file to ISCII the syntax is :

    iconverter -e encoding_file unicode_coded_file_name iscii_file_name

The encoding_file contains the mapping from one code space to the other.
It is different for each Indian language, and for the same language it is
different for ISCII to Unicode than for Unicode to ISCII. The encoding
files come with the isciilib package and are installed when isciilib is
installed. They are kept in the directory

    /usr/share/fonts/encodings/

An example command for conversion would be :

    iconverter -e iscii_unicode_dev dev.isc dev.uni

This converts an ISCII coded Devanagari file to its equivalent Unicode
codes. The encoding file used for this conversion is iscii_unicode_dev.

An example for conversion back to ISCII would be :

    iconverter -e unicode_iscii_bng bng.uni bng.isc

This converts a Unicode coded Bengali file to its equivalent ISCII codes.
unicode_iscii_bng is the encoding file used here.

Note : Only the name of the encoding file needs to be specified, not the
entire path, viz. /usr/share/fonts/encodings/encoding_file_name. One can
also specify one's own encoding_file, in which case its entire path, with
respect to the directory mentioned above, must be given.

Support is also provided for converting Unicode coded files to their
PostScript format.
The utility iscii2ps converts both ISCII and Unicode coded Indian language
files to their PostScript format, using the fonts that get installed with
isciilib. Example conversion commands are :

    iscii2ps dev.isc dev.isc.ps
    iscii2ps -unicode 2 bng.uni bng.uni.ps

The first command converts an ISCII file to PostScript, while the second
converts the input Unicode file to PostScript. The Unicode code for a
letter of a language's alphabet consists of two bytes. Hence the option
"-unicode 2" has to be given to iscii2ps when converting Unicode coded
files to their PostScript form.

Here too, an encoding file for the conversion from ISCII or Unicode to
PostScript exists for each language. These files are in the same directory
as above and are installed along with the isciilib package.

The fonts provided by C-DAC (Centre for Development of Advanced Computing,
India) are used for displaying Indian language scripts in PostScript
format. These fonts also get installed along with isciilib and are used by
the X font server to generate the PS format. The fonts used for the
different languages are listed below.
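The two iscii2ps invocations above differ only in the "-unicode 2" option. As a minimal sketch, a small shell function can print the appropriate command line for a given input file. The function name ps_command and the convention of appending ".ps" to the input name are assumptions made for illustration (the latter mirrors the examples above); neither is part of isciilib. The function only prints the command and does not run iscii2ps itself.

```shell
#!/bin/sh
# Hypothetical helper (not part of isciilib): print the iscii2ps command
# line for an input file, given the code space the file is written in.
# Usage: ps_command <iscii|unicode> <input_file>
ps_command() {
    kind=$1
    input=$2
    case "$kind" in
        iscii)
            # ISCII input needs no extra option.
            printf 'iscii2ps %s %s.ps\n' "$input" "$input" ;;
        unicode)
            # Unicode input needs "-unicode 2" (two bytes per code).
            printf 'iscii2ps -unicode 2 %s %s.ps\n' "$input" "$input" ;;
        *)
            printf 'unknown code space: %s\n' "$kind" >&2
            return 1 ;;
    esac
}

ps_command iscii dev.isc      # prints: iscii2ps dev.isc dev.isc.ps
ps_command unicode bng.uni    # prints: iscii2ps -unicode 2 bng.uni bng.uni.ps
```

Printing the command rather than executing it makes it easy to inspect what would be run before handing the line to a shell.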
The following 10 Indian languages are supported :

    Assamese
    Bengali
    Devanagari
    Gujarati
    Kannada
    Malayalam
    Oriya
    Punjabi
    Tamil
    Telugu

The code conversion files for the above languages are :

    For iscii-to-unicode :      For unicode-to-iscii :

    iscii_unicode_asm           unicode_iscii_asm
    iscii_unicode_bng           unicode_iscii_bng
    iscii_unicode_dev           unicode_iscii_dev
    iscii_unicode_gjr           unicode_iscii_gjr
    iscii_unicode_knd           unicode_iscii_knd
    iscii_unicode_mlm           unicode_iscii_mlm
    iscii_unicode_ori           unicode_iscii_ori
    iscii_unicode_pnj           unicode_iscii_pnj
    iscii_unicode_tml           unicode_iscii_tml
    iscii_unicode_tlg           unicode_iscii_tlg

    For iscii-to-isfoc :        For unicode-to-isfoc :

    iscii_isfoc_asm             unicode_isfoc_asm
    iscii_isfoc_bng             unicode_isfoc_bng
    iscii_isfoc_dev             unicode_isfoc_dev
    iscii_isfoc_gjr             unicode_isfoc_gjr
    iscii_isfoc_knd             unicode_isfoc_knd
    iscii_isfoc_mlm             unicode_isfoc_mlm
    iscii_isfoc_ori             unicode_isfoc_ori
    iscii_isfoc_pnj             unicode_isfoc_pnj
    iscii_isfoc_tml             unicode_isfoc_tml
    iscii_isfoc_tlg             unicode_isfoc_tlg

The fonts used for generating the PostScript format for the different
languages are :

    For Assamese   : -altsys-as_ttdurga-medium-r-normal--0-0-0-0-p-0-iso8859-1
    For Bengali    : -altsys-bn_ttdurga-medium-r-normal--0-0-0-0-p-0-iso8859-1
    For Devanagari : -altsys-dv_ttyogesh-medium-r-normal--0-0-0-0-p-0-iso8859-1
    For Gujarati   : -altsys-gj_ttanita-medium-r-normal--0-0-0-0-p-0-iso8859-1
    For Kannada    : -altsys-kn_ttpadmini-medium-r-normal--0-0-0-0-p-0-iso8859-1
    For Malayalam  : -altsys-ml_ttkarthika-medium-r-normal--0-0-0-0-p-0-ascii-0
    For Oriya      : -altsys-or_ttsarala-medium-r-normal--0-0-0-0-p-0-iso8859-1
    For Punjabi    : -altsys-pn_ttamar-medium-r-normal--0-0-0-0-p-0-ascii-0
    For Tamil      : -altsys-tm_ttvalluvar-medium-r-normal--0-0-0-0-p-0-iso8859-1
    For Telugu     : -altsys-tl_ttharshapriya-medium-r-normal--0-0-0-0-p-0-iso8859-1

See Also : The man page for iconverter.
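The encoding file names listed above follow a regular pattern: source code space, target code space and a three-letter language suffix, joined by underscores. As a rough sketch, the name to pass to "iconverter -e" can be assembled in the shell as shown here. The function name encoding_name is hypothetical and not part of isciilib, and the pairing of each suffix with a language (asm for Assamese, bng for Bengali, and so on) is inferred from the order of the lists above.

```shell
#!/bin/sh
# Hypothetical helper (not part of isciilib): build an encoding-file name
# from a source code space, a target code space and a language suffix.
encoding_name() {
    src=$1
    dst=$2
    suffix=$3

    # Accept only the four conversion directions listed in this manual.
    case "${src}_${dst}" in
        iscii_unicode|unicode_iscii|iscii_isfoc|unicode_isfoc) ;;
        *) printf 'unsupported conversion: %s to %s\n' "$src" "$dst" >&2
           return 1 ;;
    esac

    # Accept only the ten language suffixes listed in this manual.
    case "$suffix" in
        asm|bng|dev|gjr|knd|mlm|ori|pnj|tml|tlg) ;;
        *) printf 'unsupported language suffix: %s\n' "$suffix" >&2
           return 1 ;;
    esac

    printf '%s_%s_%s\n' "$src" "$dst" "$suffix"
}

encoding_name iscii unicode dev   # prints: iscii_unicode_dev
encoding_name unicode iscii bng   # prints: unicode_iscii_bng
```

The result could then be used as, for example, iconverter -e "$(encoding_name iscii unicode dev)" dev.isc dev.uni, which matches the Devanagari example given earlier.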