ARNAB BHATTACHARYA


Professor, Dept. of Computer Science and Engineering, Indian Institute of Technology, Kanpur.

Email: arnabb@iitk.ac.in, arnabb@cse.iitk.ac.in, arnabbhattacharya@gmail.com

Web Page: http://www.cse.iitk.ac.in/users/arnabb/

Address: Dept. of Computer Science and Engineering, Indian Institute of Technology, Kanpur, UP - 208016, India.

Phone: +91-512-679-7650, +91-512-392-7650, +91-512-259-7650.

Area of Research: Databases, Data Mining, Information Retrieval, Natural Language Processing, Artificial Intelligence.

Experience:

  • Professor, Dept. of Computer Science and Engineering, Indian Institute of Technology (IIT), Kanpur, India. December 2020 - present.

  • Associate Professor, Dept. of Computer Science and Engineering, Indian Institute of Technology (IIT), Kanpur, India. June 2014 - December 2020.

  • Assistant Professor, Dept. of Computer Science and Engineering, Indian Institute of Technology (IIT), Kanpur, India. December 2007 - June 2014.

  • Project Scientist, Dept. of Computer Science, University of California, Santa Barbara, CA, USA. September 2007 - November 2007.

  • Graduate Student Research Assistant, Dept. of Computer Science, University of California, Santa Barbara, CA, USA. July 2003 - August 2007.

  • Teaching Assistant, Dept. of Computer Science, University of California, Santa Barbara, CA, USA. September 2002 - June 2003.

  • Software Design Engineer, Texas Instruments (India) Ltd., Bangalore, India. July 2001 - July 2002.


Books:

  1. Fundamentals of Database Indexing and Searching. Arnab Bhattacharya. CRC Press, 2014.


Selected Publications:

  1. Framework for Question-Answering in Sanskrit through Automated Construction of Knowledge Graphs. Hrishikesh Terdalkar, Arnab Bhattacharya. 6th International Sanskrit Computational Linguistics Symposium (ISCLS), 2019, to appear, Kharagpur, India.

  2. TIPS: Mining Top-K Locations to Minimize User-Inconvenience for Trajectory-Aware Services. Shubhadip Mitra, Priya Saraf, Arnab Bhattacharya. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2019, to appear.

  3. RAQ: Relationship-Aware Graph Querying in Large Networks. Jithin Vachery, Akhil Arora, Sayan Ranu, Arnab Bhattacharya. International World Wide Web Conference (WWW), 2019, pages 1886-1896, San Francisco, USA.

  4. HD-Index: Pushing the Scalability-Accuracy Boundary for Approximate kNN Search in High-Dimensional Spaces. Akhil Arora, Sakshi Sinha, Piyush Kumar, Arnab Bhattacharya. Proceedings of the VLDB Endowment (PVLDB), 2018, 11(8), pages 906-919.

  5. Finding Largest Rectangle inside a Digital Object and Rectangularization. Apurba Sarkar, Arindam Biswas, Mousumi Dutt, Arnab Bhattacharya. Journal of Computer and System Sciences, 2018, 95, pages 204-217.

  6. Image Management for Biological Data. Arnab Bhattacharya, Vebjorn Ljosa. Book chapter in Encyclopedia of Database Systems (2nd Edition) edited by L. Liu and M. T. Ozsu. Springer, 2018.

  7. MineAr: Using Crowd Knowledge for Mining Association Rules in the Health Domain. Milan Someswar, Arnab Bhattacharya. ACM Joint International Conference on Data Science & Management of Data (CoDS-COMAD), 2018, pages 108-117, Goa, India.

  8. Finding Shell Company Accounts using Anomaly Detection. Devendra K. Luna, Girish K. Palshikar, Manoj Apte, Arnab Bhattacharya. ACM Joint International Conference on Data Science & Management of Data (CoDS-COMAD), 2018, pages 167-174, Goa, India.

  9. Tracking the Impact of Fact Deletions on Knowledge Graph Queries using Provenance Polynomials. Garima Gaur, Srikanta J. Bedathur, Arnab Bhattacharya. International Conference on Information and Knowledge Management (CIKM), 2017, pages 2079-2082, Singapore.

  10. SkyGraph: Retrieving Regions of Interest using Skyline Subgraph Queries. Shiladitya Pande, Sayan Ranu, Arnab Bhattacharya. Proceedings of the VLDB Endowment (PVLDB), 2017, 10(11), pages 1382-1393.

  11. NetClus: A Scalable Framework for Locating Top-K Sites for Placement of Trajectory-Aware Services. Shubhadip Mitra, Priya Saraf, Richa Sharma, Arnab Bhattacharya, Sayan Ranu, Harsh Bhandari. International Conference on Data Engineering (ICDE), 2017, pages 87-90, San Diego, USA.

  12. K-Dominant Skyline Join Queries: Extending the Join Paradigm to K-Dominant Skylines. Anuradha Awasthi, Arnab Bhattacharya, Sanchit Gupta, Ujjwal K. Singh. International Conference on Data Engineering (ICDE), 2017, pages 99-102, San Diego, USA.

  13. Neighbor-Aware Search for Approximate Labeled Graph Matching using the Chi-Square Statistics. Sourav Dutta, Pratik Nayek, Arnab Bhattacharya. International World Wide Web Conference (WWW), 2017, pages 1281-1290, Perth, Australia.

  14. Automatic Grading and Feedback using Program Repair for Introductory Programming Courses. Sagar Parihar, Ziyaan Dadachanji, Praveen Kumar Singh, Rajdeep Das, Amey Karkare, Arnab Bhattacharya. ACM Conference on Innovation and Technology in Computer Science Education (ITiCSE), 2017, pages 92-97, Bologna, Italy.

  15. GARUDA: A System for Large-Scale Mining of Statistically Significant Connected Subgraphs. Satyajit Bhadange, Akhil Arora, Arnab Bhattacharya. Demo at International Conference on Very Large Data Bases (VLDB), 2016, to appear, New Delhi, India.

  16. SMS: Stable Matching Algorithm using Skylines. Rohit Anurag, Arnab Bhattacharya. International Conference on Scientific and Statistical Database Management (SSDBM), 2016, pages 24:1-24:4, Budapest, Hungary.

  17. SkyCover: Finding Range-Constrained Approximate Skylines with Bounded Quality Guarantees. Shubhendu Aggarwal, Shubhadip Mitra, Arnab Bhattacharya. International Conference on Management of Data (COMAD), 2016, pages 1-12, Pune, India.

  18. Probabilistic Aggregate Skyline Join Queries: Skylines with Aggregate Operations over Existential Uncertain Relations. Arnab Bhattacharya, Shrikant Awate. International Conference on Scientific and Statistical Database Management (SSDBM), 2015, pages 5:1-5:12, San Diego, USA.

  19. Trajectory Aware Macro-cell Planning for Mobile Users. Shubhadip Mitra, Sayan Ranu, Vinay Kolar, Arnab Bhattacharya, Ravi Kokku, Aditya Telang, Sriram Raghavan. IEEE International Conference on Computer Communications (INFOCOM), 2015, pages 792-800, Hong Kong, China.

  20. Generation of Random Triangular Digital Curves using Combinatorial Techniques. Apurba Sarkar, Arindam Biswas, Mousumi Dutt, Arnab Bhattacharya. International Conference on Pattern Recognition and Machine Intelligence (PReMI), 2015, pages 136-145, Warsaw, Poland.

  21. Using Social Connections to Improve Collaborative Filtering. Kanish Manuja, Arnab Bhattacharya. IKDD Conference on Data Sciences (CoDS), 2015, pages 140-141, Bengaluru, India.

  22. Generation of Random Digital Curves using Combinatorial Techniques. Apurba Sarkar, Arindam Biswas, Mousumi Dutt, Arnab Bhattacharya. Conference on Algorithms and Discrete Applied Mathematics (CALDAM), 2015, pages 286-297, Kanpur, India.

  23. Mining Statistically Significant Connected Subgraphs in Vertex Labeled Graphs. Akhil Arora, Mayank Sachan, Arnab Bhattacharya. SIGMOD International Conference on Management of Data (SIGMOD), 2014, pages 1003-1014, Snowbird, USA.

  24. Efficient and Effective Route Planning in Road Networks with Probabilistic Data using Skyline Paths. Arzoo Katiyar, Arnab Bhattacharya, Shubhadip Mitra. IKDD Conference on Data Sciences (CoDS), 2014, New Delhi, India.

  25. Emotion Recognition from Audio and Visual Data using F-score based Fusion. Abhishek Gera, Arnab Bhattacharya. IKDD Conference on Data Sciences (CoDS), 2014, New Delhi, India.

  26. RCached-tree: An Index Structure for Efficiently Answering Popular Queries. Manash Pal, Arnab Bhattacharya, Debjyoti Paul. International Conference on Information and Knowledge Management (CIKM), 2013, pages 1173-1176, San Francisco, USA.

  27. Efficient Edit Distance based String Similarity Search using Deletion Neighborhoods. Shashwat Mishra, Tejas Gandhi, Akhil Arora, Arnab Bhattacharya. EDBT/ICDT Workshops, 2013, pages 375-383, Genoa, Italy.

  28. Hybrid HBase: Leveraging Flash SSDs to Improve Cost per Throughput of HBase. Anurag Awasthi, Avani Nandini, Arnab Bhattacharya, Priya Sehgal. International Conference on Management of Data (COMAD), 2012, pages 68-79, Pune, India.

  29. A Plant Identification System using Shape and Morphological Features on Segmented Leaflets: Team IITK, CLEF 2012. Akhil Arora, Ankit Gupta, Nitesh Bagmar, Shashwat Mishra, Arnab Bhattacharya. CLEF (Online Notes/Labs/Workshop), 2012, Rome, Italy.

  30. Mining Statistically Significant Substrings using the Chi-Square Statistic. Mayank Sachan, Arnab Bhattacharya. International Conference on Very Large Data Bases (VLDB), 2012, pages 1052-1063, Istanbul, Turkey.

  31. Mining Statistically Significant Substrings using the Chi-Square Statistic. Mayank Sachan, Arnab Bhattacharya. Proceedings of the VLDB Endowment (PVLDB), 2012, 5(10), pages 1052-1063.

  32. Mining Statistically Significant Substrings Based on the Chi-Square Measure. Sourav Dutta, Arnab Bhattacharya. Book chapter in Pattern Discovery Using Sequence Data Mining: Applications and Studies edited by P. Kumar, P. R. Krishna and S. B. Raju. IGI Global, 2012.

  33. Minimally Infrequent Itemset Mining using Pattern-Growth Paradigm and Residual Trees. Ashish Gupta, Akshay Mittal, Arnab Bhattacharya. International Conference on Management of Data (COMAD), 2011, pages 57-68, Bengaluru, India. (Best paper)

  34. Caching Stars in the Sky: A Semantic Caching Approach to Accelerate Skyline Queries. Arnab Bhattacharya, B. Palvali Teja, Sourav Dutta. International Conference on Database and Expert Systems Applications (DEXA), 2011, pages 493-501, Toulouse, France.

  35. A Continuous Query System for Dynamic Route Planning. Nirmesh Malviya, Samuel Madden, Arnab Bhattacharya. International Conference on Data Engineering (ICDE), 2011, pages 792-803, Hannover, Germany.

  36. Finding the Bias and Prestige of Nodes in Networks based on Trust Scores. Abhinav Mishra, Arnab Bhattacharya. International World Wide Web Conference (WWW), 2011, pages 567-576, Hyderabad, India.

  37. Aggregate Skyline Join Queries: Skylines with Aggregate Operations over Multiple Relations. Arnab Bhattacharya, B. Palvali Teja. International Conference on Management of Data (COMAD), 2010, pages 15-26, Nagpur, India. (Best student paper)

  38. INSTRUCT: Space-Efficient Structure for Indexing and Complete Query Management of String Databases. Sourav Dutta, Arnab Bhattacharya. International Conference on Management of Data (COMAD), 2010, pages 27-38, Nagpur, India.

  39. Simulated Evolution and Learning, Proceedings of the 8th International Conference on Simulated Evolution and Learning (SEAL). Co-edited by K. Deb, A. Bhattacharya, N. Chakraborti, P. Chakroborty, S. Das, J. Dutta, S. K. Gupta, A. Jain, V. Aggarwal, J. Branke, S. J. Louis, K. C. Tan, Springer, 2010.

  40. Minimum Spanning Tree on Spatio-Temporal Networks. Viswanath Gunturi, Shashi Shekhar, Arnab Bhattacharya. International Conference on Database and Expert Systems Applications (DEXA), 2010, pages 149-158, Bilbao, Spain.

  41. Finding Top-k Similar Pairs of Objects Annotated with Terms from an Ontology. Arnab Bhattacharya, Abhishek Bhowmick, Ambuj K. Singh. International Conference on Scientific and Statistical Database Management (SSDBM), 2010, pages 214-232, Heidelberg, Germany.

  42. Most Significant Substring Mining based on Chi-square Measure. Sourav Dutta, Arnab Bhattacharya. Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2010, pages 319-327, Hyderabad, India.

  43. Querying Spatial Patterns. Vishwakarma Singh, Arnab Bhattacharya, Ambuj K. Singh. International Conference on Extending Database Technology (EDBT), 2010, pages 418-429, Lausanne, Switzerland.

  44. Image Management for Biological Data. Arnab Bhattacharya, Vebjorn Ljosa. Book chapter in Encyclopedia of Database Systems edited by M. T. Ozsu and L. Liu. Springer, 2009.

  45. On Low Distortion Embeddings of Statistical Distance Measures into Low Dimensional Spaces. Arnab Bhattacharya, Purushottam Kar, Manjish Pal. International Conference on Database and Expert Systems Applications (DEXA), 2009, pages 164-172, Linz, Austria.

  46. FTDP-17 Mutations in Tau Alter the Regulation of Microtubule Dynamics: An “Alternative Core” Model for Normal and Pathological Tau Action. Adria LeBoeuf, Sasha F. Levy, Michelle Gaylord, Arnab Bhattacharya, Ambuj K. Singh, Mary Ann Jordan, Leslie Wilson, Stuart C. Feinstein. Journal of Biological Chemistry, 2008, 283(52), pages 36406-36415.

  47. A General Modeling and Visualization Tool for Comparing Different Members of a Group: Application to Studying Tau-Mediated Regulation of Microtubule Dynamics. Arnab Bhattacharya, Sasha Levy, Adria LeBoeuf, Michelle Gaylord, Leslie Wilson, Ambuj K. Singh, Stuart C. Feinstein. BMC Bioinformatics, 2008, 9, page 339.

  48. Efficient Computation of Statistical Significance of Query Results in Databases. Vishwakarma Singh, Arnab Bhattacharya, Ambuj K. Singh. International Conference on Scientific and Statistical Database Management (SSDBM), 2008, pages 509-516, Hong Kong, China.

  49. MIST: Distributed Indexing and Querying in Sensor Networks using Statistical Models. Arnab Bhattacharya, Anand Meka, Ambuj K. Singh. International Conference on Very Large Data Bases (VLDB), 2007, pages 854-865, Vienna, Austria.

  50. Indexing Spatially Sensitive Distance Measures Using Multi-Resolution Lower Bounds. Vebjorn Ljosa, Arnab Bhattacharya, Ambuj K. Singh. International Conference on Extending Database Technology (EDBT), 2006, pages 865-883, Munich, Germany.

  51. LB-Index: A Multi-Resolution Index Structure for Images. Vebjorn Ljosa, Arnab Bhattacharya, Ambuj K. Singh. International Conference on Data Engineering (ICDE), 2006, pages 144-145, Atlanta, USA.

  52. ViVo: Visual Vocabulary Construction for Mining Biomedical Images. Arnab Bhattacharya, Vebjorn Ljosa, Jia-Yu Pan, Mark R. Verardo, Hyung-Jeong Yang, Christos Faloutsos, Ambuj K. Singh. International Conference on Data Mining (ICDM), 2005, pages 50-57, Houston, USA. (One of the top five student papers)

  53. ProGreSS: Simultaneous Searching of Protein Databases by Sequence and Structure. Arnab Bhattacharya, Tolga Can, Tamer Kahveci, Ambuj K. Singh, Yuan-Fang Wang. Pacific Symposium on Biocomputing (PSB), 2004, pages 264-275, Hawaii, USA.


Education:

  • Ph.D. in Computer Science, Dept. of Computer Science, University of California, Santa Barbara, CA 93106, USA. 2007.

  • M.S. in Computer Science, Dept. of Computer Science, University of California, Santa Barbara, CA 93106, USA. 2007.

  • Bachelor of Computer Science and Engineering (B.C.S.E.), Jadavpur University, Kolkata - 700032, India. 2001.


Invited Talks:

  1. “Querying Statistically Significant Subgraphs” at the NetApp Corporation, Bengaluru, India, 2019.

  2. “Graph Querying using Statistical Significance” at the Indian Institute of Engineering Science and Technology, Shibpur, 2019.

  3. “Data Mining” at the Andhra Pradesh Human Resource Development Institute (APHRDI), 2018.

  4. “Trajectory Aware Service Location Problems” at the NetApp Corporation, Bengaluru, India, 2016.

  5. “Mining Statistically Significant Connected Subgraphs” at the NetApp Corporation, Bengaluru, India, 2015.

  6. “Mining Statistically Significant Substructures based on the Chi-square Statistic” at IBM, New Delhi, India, 2015.

  7. “Mining Statistically Significant Substrings based on the Chi-square Measure” at the NetApp Corporation, Bengaluru, India, 2014.

  8. “Mining Statistically Significant Substructures based on the Chi-square Statistic” at the Indian Statistical Institute, Kolkata, India, 2014.

  9. “Skylines: Databases' Answer to Multiple Preferences” at the NetApp Corporation, Bengaluru, India, 2013.

  10. “Skylines: Databases' Answer to Multiple Preferences” at the Dept. of Computer Science and Engineering, Indian Institute of Technology, Kanpur, India, 2012.

  11. “Finding the Bias and Prestige of Nodes in Networks based on Trust Scores” at Yahoo! Labs, Bengaluru, India, 2011.

  12. “Earth Mover's Distance: An Adaptable and Universally Applicable Distance Measure” at the Dept. of Computer Science, Andhra University, Vishakhapatnam, India, 2010.

  13. “Earth Mover's Distance: An Adaptable and Universally Applicable Distance Measure” at Tata Consultancy Services (TCS), Gurgaon, India, 2010.

  14. “On Earth Mover's Distance: A Spatially Sensitive Distance Measure” at the Dept. of Computer Science, Free University of Bozen-Bolzano, Italy, 2009.

  15. Popular lecture on “Game Theory” at the Business Club meeting of the Indian Institute of Technology, Kanpur, India, 2009.

  16. “Distributed Indexing and Querying in Sensor Networks using Statistical Models” at the Dept. of Computer Science, Université Libre de Bruxelles, Belgium, 2008.


Patents:

  1. Multiple Criteria Decision Analysis
    • US patent number US8504581B2
    • India patent number INDEL20123027A

  2. Multiple Criteria Decision Analysis in Distributed Databases
    • Global patent number WO2015104591A1


Important Courses Taught:

  1. Data Mining

  2. Indexing and Searching Techniques in Databases

  3. Information Retrieval

  4. Skyline Queries in Databases

  5. Data-Driven Program Analysis

  6. Topics in Biocomputing

  7. Principles of Database Systems

  8. Fundamentals of Computing

  9. Computing Laboratory


Awards, Scholarships and Certificates:

  1. IBM Faculty Research Award, 2014.

  2. Recipient of award from Yahoo! Faculty Research and Engagement Program, 2011.

  3. Best paper award at the International Conference on Management of Data (COMAD), 2011 for the paper “Minimally Infrequent Itemset Mining using Pattern-Growth Paradigm and Residual Trees”.

  4. Best student paper award at the International Conference on Management of Data (COMAD), 2010 for the paper “Aggregate Skyline Join Queries: Skylines with Aggregate Operations over Multiple Relations”.

  5. One of the top-five student paper awards at the International Conference on Data Mining (ICDM), 2005 for the paper “ViVo: Visual Vocabulary Construction for Mining Biomedical Images”.

  6. ICDM Student Travel Award, sponsored by IBM and awarded to the top five student papers, at the International Conference on Data Mining (ICDM), 2005.


Major Sponsored Projects:

  1. “Scalable Spatio-Temporal Measurement and Analysis of Air Pollution Data for Delhi-NCR using Vehicle Mounted Sensors” under IMPRINT-II scheme from SERB, 2019-2022.

  2. “NYAYA: A Legal Assistance System for Legal Experts and the Common Man in India” from SERB, India, 2019-2022.

  3. “Continuous Monitoring of Sampreeti Setu (New Jubilee Bridge): Instrumentation, Design and Health Assessment” from Eastern Railway Zone, Indian Railways, 2018-2023.

  4. “Development of Optimal Eco Driving System in HEV/PHEV based on Vehicle Environment” from KEIIT (Korea Evaluation Institute of Industrial Technology), 2018-2021.

  5. “Development of Novel Materials and Methods for Removal of Recalcitrant Organics from Water” from Indo-Taiwan Programme in Science and Technology, 2017-2020.

  6. “Provenance in Graph Databases” from IBM, India, 2016-2018.

  7. “A Smart Phone Based Dark Field Microscope for Point of Care (PoC) Diagnosis of Blood Cell Disorder in Lethal Diseases” under IMPRINT-I scheme from SERB, 2017-2020.

  8. “Identifying Fake Product Listings and Sellers” from Flipkart, India, 2016-2018.

  9. “Mining Statistically Significant Substructures using the Chi-Square Measure and Setup of Big Data Lab” from IBM, India, 2014-2017.

  10. “Extending Skyline Queries to Distributed and Uncertain Databases” from SERB, India, 2014-2017.

  11. “Development of Air Quality Index (AQI) for Indian Cities” from Central Pollution Control Board (CPCB), India, 2014-2015.

  12. “Deciphering the BMP Signaling Network in Developing Bone: An Inter-disciplinary Approach Combining Bioinformatic Data Mining Tools along with Molecular, Genetic and Developmental Biology” from DBT, Govt. of India, 2013-2016.

  13. “Flash-Aware Optimizations for Columnar Databases” from NetApp Corporation, 2011-2018.

  14. “Reputation Framework for Ad Networks” from Yahoo! Research, 2011.

  15. “Data Storage and Backup Solutions” from BITCOE (BSNL IITK Telecom Centre of Excellence), 2010-2011.


Professional Activities:

  • Program Chair for the ACM India Joint International Conference on Data Science & Management of Data (CoDS-COMAD), 7th ACM IKDD CODS and 25th COMAD, 2020.

  • Organizer for FIRE 2019 AILA Track: Artificial Intelligence for Legal Assistance at Forum for Information Retrieval (FIRE), 2019.

  • Workshop Organizer of 2nd Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA) co-located with SIGMOD 2019.

  • Workshop Organizer of 1st Workshop on Legal Data Analytics and Mining (LeDAM) co-located with CIKM 2018.

  • Workshop Organizer of 1st Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA) co-located with SIGMOD 2018.

  • Workshop Organizer of 2nd International Workshop on Network Data Analytics (NDA@SIGMOD) co-located with SIGMOD 2017.

  • Organizer for FIRE 2017 IRLeD Track: Information Retrieval from Legal Documents at Forum for Information Retrieval (FIRE), 2017.

  • Program Chair for the 19th International Conference on Management of Data (COMAD), 2013.

  • Executive Member of the Computer Society of India's (CSI) Special Interest Group in Data (SIGDATA) since 2012.

  • Publication Chair for the 18th International Conference on Management of Data (COMAD), 2012.

  • Program Chair for the 8th International Conference on Simulated Evolution and Learning (SEAL), 2010.

  • Publicity Chair for the 14th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2010.

  • Member of the CODATA National Committee since 2016.

  • Member of the Association for Computing Machinery (ACM) since 2010.

  • Member of the Institute of Electrical and Electronics Engineers (IEEE) since 2005.

  • Reviewer and Program Committee member for many international journals and conferences, including VLDB, ICDE, and TKDD.

  • Panelist for discussions on AI and Its Impact on Jobs at IIT Kanpur.

