CSE - IIT Kanpur

CS 658A: Malware Analysis and Intrusion Detection

Credits: 3-0-0-0 [9]

Prerequisite: CS 771A or equivalent exposure to a machine learning course, and hands-on project development. Familiarity with machine learning libraries is required. Good exposure to linear algebra and cyber security (equivalent to CS 628A) will also be helpful.

Who can take the course: PhD, Masters, 3rd and 4th year UG Students

Instructor: Prof. Sandeep Kumar Shukla

Course Rationale:

A recent IDC report on global smartphone operating system market share shows that Android held 86.8% of the market in the third quarter of 2018. In May 2019, Google revealed that more than two and a half billion Android devices are in active use each month. As Android has grown in popularity, both the number of active users and each user's day-to-day activity on Android devices have increased considerably, which makes Android an increasingly attractive target for malware authors. Gadgets360 reports that 8,400 new instances of Android malware are found every day, or roughly one new sample every 10 seconds. Malware is a serious cyber threat that evolves daily and can disrupt sectors such as online banking and social networking. According to reports published by the AV-Test Institute, the number of malicious samples has grown tremendously across platforms (Android, Windows, Linux, etc.), with over 250,000 new malicious samples registered every day. Analyzing these samples manually using reverse engineering and disassembly is tedious and cumbersome, which makes it impractical for security analysts. There is therefore a dire need for automated malware analysis systems that produce effective results with minimal human intervention.

Antivirus systems use the most common and primitive approach: generating signatures of known malware beforehand and then comparing newly downloaded executables against these signatures to predict their nature. This technique fails badly against zero-day malware, that is, newly created malware for which no signature is available yet.

Other common techniques are static analysis and dynamic analysis. Static analysis examines an executable without running it. It is widely used because it is relatively fast, but it fails if the malware is packed, encrypted, or obfuscated. Dynamic analysis overcomes these limitations by executing the sample in a sandboxed environment, collecting behavioural data, and using that data for detection and classification. Dynamic analysis has limitations of its own, such as malware that detects the virtual environment and incomplete code coverage. As a result, researchers have started combining the two approaches into what is known as a hybrid approach.
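The weakness of signature matching described above can be illustrated with a minimal sketch. It assumes the simplest possible signature, a SHA-256 hash of the whole file; real antivirus signatures are more elaborate. The byte strings, `SIGNATURES`, and `is_known_malware` are hypothetical names introduced only for this illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a byte string as hex."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical stand-in for a known malicious executable (not real malware).
known_malware = b"MZ" + b"malicious payload v1"

# Signature database built beforehand from known samples (assumption:
# a signature is simply the hash of the entire file).
SIGNATURES = {sha256_hex(known_malware)}


def is_known_malware(sample: bytes) -> bool:
    """Flag a sample only if its hash matches a stored signature."""
    return sha256_hex(sample) in SIGNATURES


# A variant that differs by a single appended byte has a different hash,
# so it is missed, much like a zero-day sample with no signature yet.
variant = known_malware + b"\x00"

print(is_known_malware(known_malware))  # True
print(is_known_malware(variant))        # False
```

The second result is the point of the sketch: any sample not seen beforehand passes unnoticed, which is one motivation for the learning-based techniques covered in this course.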

Intrusion into the IT networks of organizations, and into the OT networks of utilities, factories, and power generation stations, is another growing cyber threat that one has to cope with. Intrusion detection is the technique of analyzing various signals (network traffic, variations in CPU activity, variations in sensor data, variations in access attempts, etc.) in order to ascertain whether an intrusion attempt is ongoing or an intrusion is already taking place. Intrusion detection can be done using rule-based techniques, signature-based techniques, or anomaly detection.
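As a rough illustration of the anomaly detection idea mentioned above, the sketch below flags observations that deviate strongly from a baseline of normal behaviour. It assumes a single numeric signal (for example, connections per minute) and a simple z-score rule; the data, the threshold, and the function name `find_anomalies` are hypothetical, and practical systems use far richer features and models.

```python
from statistics import mean, stdev


def find_anomalies(baseline, observed, threshold=3.0):
    """Return indices of observations more than `threshold` standard
    deviations away from the mean of the baseline."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A constant baseline gives no scale for measuring deviation.
        return []
    return [i for i, x in enumerate(observed)
            if abs(x - mu) / sigma > threshold]


# Hypothetical connections-per-minute counts during normal operation.
baseline = [100, 104, 98, 101, 97, 103, 99, 102]

# New measurements; the third one is a sudden burst of traffic.
observed = [101, 96, 250, 103]

print(find_anomalies(baseline, observed))  # [2]
```

Unlike a rule or a signature, this check needs no prior description of the attack, only a model of what normal looks like.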

For those wanting to pursue a career in cyber security, knowing the techniques for malware analysis and intrusion detection, and being able to develop tools to carry out such analyses, is very important. That is why this course is being offered as an advanced topics course.

On completion of this course, a student should be able to:
(i) Explain the vast scope of malware-borne cyber-attacks, the various malware types, and platform-specific variations of malware;
(ii) Explain the threat models associated with network and host intrusion by cyber-attackers;
(iii) Explain the basic signs of malware infection and signs of intrusion from a security analyst's point of view;
(iv) Explain various machine learning techniques and tools used for malware analysis and anomaly detection, and techniques such as memory forensics;
(v) Implement tools for malware analysis employing machine learning tools and libraries, and measure the efficacy of these tools on labelled and unlabelled data;
(vi) Implement intrusion detection tools with machine learning libraries and measure the efficacy of these tools;
(vii) Read and explain the most recent publications in top conferences in the field of cyber security pertaining to machine learning and intrusion detection;
(viii) Prepare for further research in malware analysis and intrusion detection.
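Outcomes (v) and (vi) above call for measuring the efficacy of detection tools. One common way to do this on labelled data, shown here as a minimal sketch and not as the course's prescribed method, is to compare predicted labels with ground truth and compute precision, recall, and false positive rate. The labels and the function name `detection_metrics` are hypothetical.

```python
def detection_metrics(y_true, y_pred):
    """Compute basic detection metrics for binary labels
    (1 = malicious, 0 = benign)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        # Fraction of flagged samples that are truly malicious.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Fraction of malicious samples that were flagged.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # Fraction of benign samples that were wrongly flagged.
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


# Hypothetical ground truth and detector output for eight samples.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

print(detection_metrics(y_true, y_pred))
# {'precision': 0.75, 'recall': 0.75, 'false_positive_rate': 0.25}
```

Reporting the false positive rate alongside recall matters in this setting because benign samples usually far outnumber malicious ones.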

The two topics, (a) malware analysis and classification and (b) intrusion detection, are put together in the same advanced topics course because they share many machine learning based techniques.

Module	Topic	No. of 1-hour lectures
Introduction	Malware classification, types, and platform-specific issues with malware; intrusions into IT and operational technology (OT) networks and their signs	3
Basic Malware Analysis	Manual malware infection analysis; signature-based malware detection and classification, with pros and cons; the need for machine learning based techniques	5
Advanced Malware Analysis Techniques	Static analysis, dynamic analysis, and hybrid analysis of Windows, Linux, and Android malware	8
Case Studies (Malware Analysis)	Papers in malware analysis from the most recent conferences; presentations, discussions, and implementations	6
Basic Intrusion Detection	Intrusion into networks: firewalls, rule-based techniques, signature-based techniques, simple machine learning models on network data	4
Advanced Intrusion Detection	Advanced machine learning models for intrusion detection in IT networks; machine learning in OT networks, especially with cyber-physical systems	6
Case Studies (Intrusion Detection)	Latest papers in intrusion detection, their theory and implementations, and data analysis techniques	8
Total Lecture hours		40 hours

Text:

There is no textbook for such a course yet. Research papers will be the main source of study material.

There will be other resources put on the web by the instructor.

Lecture notes, assignments, supplemental readings, and other resources will be provided via the course website.

The course will consist of 3 hours of lectures per week, projects and homework, and possibly a course project.

Grading

Semester grades will be based on the following weights:

Component	Weight	Notes
Attendance & In-Class Exercises	10%	including pop quizzes
Projects & Assignments	50%	10% each for 7 assignments and projects
Midterm Exam		
Final Exam	20%	

Semester grades will be determined after all work is completed and graded. Point ranges for letter grades will be based on several factors, including absolute and relative performance. Letter grades will not be based on a curve or point range.

Unless otherwise stated in class, all graded assignments must be submitted by 11:55 pm on the specified due date via the course site on Canvas. There will be a 10% penalty for each 24-hour delay in submitting an assignment.

If you feel that an error is made in grading an assignment or an exam, you must present a written appeal within one week after the assignment or exam is returned to you. Verbal appeals are not allowed, and grades will not be changed after the one-week period. Your appeal should be specific. Submit all appeals to the instructor.
