Prof. Dr. Konrad Rieck
Institute of Computer Science
University of Göttingen
37077 Göttingen, Germany
Phone: +49 551 39 172000
Fax: +49 551 39 14403
Email: firstname.lastname@example.org (PGP key)
I am a junior professor at the University of Göttingen, where I head the Computer Security Group. Before taking this position, I worked at Technische Universität Berlin and Fraunhofer Institute FIRST.
My research interests revolve around computer security and machine learning. This includes the detection of attacks, the analysis of malicious software, and the discovery of vulnerabilities, as well as learning with structured data, such as sequences and trees.
Generalized Vulnerability Extrapolation using Abstract Syntax Trees.
The discovery of vulnerabilities in source code is key to securing computer systems. While specific types of security flaws can be identified automatically, in the general case the process of finding vulnerabilities cannot be automated and vulnerabilities are mainly discovered by manual analysis. In this paper, we propose a method for assisting a security analyst during the auditing of source code. Our method proceeds by extracting abstract syntax trees from the code and determining structural patterns in these trees, such that each function in the code can be described as a mixture of these patterns. This representation enables us to decompose a known vulnerability and extrapolate it to a code base, such that functions potentially suffering from the same flaw can be suggested to the analyst. We evaluate our method on the source code of four popular open-source projects: LibTIFF, FFmpeg, Pidgin and Asterisk. For three of these projects, we are able to identify zero-day vulnerabilities by inspecting only a small fraction of the code bases.
Automatic Analysis of Malware Behavior using Machine Learning.
Malicious software, so-called malware, poses a major threat to the security of computer systems. The amount and diversity of its variants render classic security defenses ineffective, such that millions of hosts on the Internet are infected with malware in the form of computer viruses, Internet worms and Trojan horses. While obfuscation and polymorphism employed by malware largely impede detection at file level, the dynamic analysis of malware binaries during run-time provides an instrument for characterizing and defending against the threat of malicious software. In this article, we propose a framework for the automatic analysis of malware behavior using machine learning. The framework allows for automatically identifying novel classes of malware with similar behavior (clustering) and assigning unknown malware to these discovered classes (classification). Based on both clustering and classification, we propose an incremental approach for behavior-based analysis, capable of processing the behavior of thousands of malware binaries on a daily basis. The incremental analysis significantly reduces the run-time overhead of current analysis methods, while providing accurate discovery and discrimination of novel malware variants.
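The interplay of classification and clustering in an incremental analysis can be sketched roughly as follows. This is only an illustration under strong assumptions and not the framework from the article: behavior reports are assumed to be already embedded as numeric vectors, each known class is represented by hypothetical prototype vectors, `radius` is a made-up rejection threshold, and the clustering step is a simple greedy scheme chosen for brevity.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nearest(vec, prototypes):
    """Return (distance, label) of the closest prototype, or None if there are none."""
    return min(((dist(vec, p), label) for label, p in prototypes), default=None)


def process_batch(batch, prototypes, radius):
    """Process one batch of behavior vectors.

    batch:      list of feature vectors (tuples of floats)
    prototypes: list of (label, vector) pairs for known classes (hypothetical format)
    radius:     hypothetical rejection threshold
    Returns the label of each vector and the extended prototype list.
    """
    labels = [None] * len(batch)

    # Classification: assign a vector to a known class if a prototype is close enough.
    for i, vec in enumerate(batch):
        hit = nearest(vec, prototypes)
        if hit is not None and hit[0] <= radius:
            labels[i] = hit[1]

    # Clustering: group the rejected vectors greedily into new classes.
    fresh = []
    for i, vec in enumerate(batch):
        if labels[i] is not None:
            continue
        hit = nearest(vec, fresh)
        if hit is not None and hit[0] <= radius:
            labels[i] = hit[1]
        else:
            label = f"class-{len(prototypes) + len(fresh)}"  # illustrative naming only
            fresh.append((label, vec))
            labels[i] = label

    return labels, prototypes + fresh


known = [("known", (0.0, 0.0))]
labels, known = process_batch([(0.1, 0.0), (5.0, 5.0), (5.1, 5.0)], known, radius=1.0)
print(labels)
```

Only the vectors rejected by the classification step reach the clustering step, and the prototypes found there are carried over to the next batch, which is where the run-time savings of an incremental scheme would come from.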
Cujo: Efficient Detection and Prevention of Drive-by-Download Attacks.
Approximate Tree Kernels.
Convolution kernels for trees provide simple means for learning with tree-structured data. The computation time of tree kernels is quadratic in the size of the trees, since all pairs of nodes need to be compared. Thus, large parse trees, obtained from HTML documents or structured network data, render convolution kernels inapplicable. In this article, we propose an effective approximation technique for parse tree kernels. The approximate tree kernels (ATKs) limit kernel computation to a sparse subset of relevant subtrees and discard redundant structures, such that training and testing of kernel-based learning methods are significantly accelerated. We devise linear programming approaches for identifying such subsets for supervised and unsupervised learning tasks, respectively. Empirically, the approximate tree kernels attain run-time improvements up to three orders of magnitude while preserving the predictive accuracy of regular tree kernels. For unsupervised tasks, the approximate tree kernels even lead to more accurate predictions by identifying relevant dimensions in feature space.
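The quadratic cost mentioned above is easy to see in a small sketch of a parse tree kernel that counts shared subtrees over all pairs of nodes. The code is an illustration under assumptions, not the implementation from the article: trees are plain nested tuples `(label, child, child, ...)`, `lam` is a decay parameter, and the optional `keep` argument is a hypothetical hand-picked set of labels that merely mimics the idea of restricting the computation to a subset of nodes. The selection of such subsets by linear programming is not shown.

```python
def nodes(tree):
    """Yield every node of a tree given as (label, child, child, ...)."""
    yield tree
    for child in tree[1:]:
        yield from nodes(child)


def production(node):
    """Label of a node together with the labels of its children."""
    return (node[0], tuple(child[0] for child in node[1:]))


def count_shared(a, b, lam=1.0):
    """Weighted count of the subtrees shared by the nodes a and b."""
    if production(a) != production(b):
        return 0.0
    if len(a) == 1:  # two leaves with the same label
        return lam
    result = lam
    for child_a, child_b in zip(a[1:], b[1:]):
        result *= 1.0 + count_shared(child_a, child_b, lam)
    return result


def tree_kernel(x, z, lam=1.0, keep=None):
    """Sum the counting function over all node pairs (quadratic in the tree sizes).

    keep: optional set of labels; if given, only node pairs whose label is in
          the set enter the sum (assumption made for illustration).
    """
    total = 0.0
    for a in nodes(x):
        if keep is not None and a[0] not in keep:
            continue
        for b in nodes(z):
            total += count_shared(a, b, lam)
    return total


x = ("S", ("A",), ("B",))
z = ("S", ("A",), ("C",))
print(tree_kernel(x, x), tree_kernel(x, z), tree_kernel(x, x, keep={"A"}))
```

The double loop in `tree_kernel` is the source of the quadratic run-time. Skipping nodes whose label is not in `keep` shrinks the outer loop, which is the kind of saving an approximation aims for.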
Linear-Time Computation of Similarity Measures for Sequential Data.
Efficient and expressive comparison of sequences is an essential procedure for learning with sequential data. In this article we propose a generic framework for computation of similarity measures for sequences, covering various kernel, distance and non-metric similarity functions. The basis for comparison is embedding of sequences using a formal language, such as a set of natural words, k-grams or all contiguous subsequences. As realizations of the framework we provide linear-time algorithms of different complexity and capabilities using sorted arrays, tries and suffix trees as underlying data structures. Experiments on data sets from bioinformatics, text processing and computer security illustrate the efficiency of the proposed algorithms, enabling peak performances of up to 10^6 pairwise comparisons per second. The utility of distances and non-metric similarity measures for sequences as alternatives to string kernels is demonstrated in applications of text categorization, network intrusion detection and transcription site recognition in DNA.
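The sorted-array variant can be illustrated with a short sketch. It embeds two sequences as sorted arrays of k-gram counts and walks both arrays in parallel, so that a kernel and a distance share the same loop. All function names are chosen for illustration and are not taken from the article. Note also that Python's comparison sort is used for brevity, so only the comparison step is linear in the number of k-grams here.

```python
from collections import Counter


def kgram_counts(seq, k):
    """Embed a sequence as a sorted array of (k-gram, count) pairs."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return sorted(counts.items())


def compare(xs, ys, match, mismatch):
    """Walk two sorted arrays in parallel, as in the merge step of merge sort.

    match(cx, cy) is applied to k-grams present in both sequences,
    mismatch(c) to k-grams present in only one of them.
    """
    total = 0.0
    i = j = 0
    while i < len(xs) and j < len(ys):
        (wx, cx), (wy, cy) = xs[i], ys[j]
        if wx == wy:
            total += match(cx, cy)
            i += 1
            j += 1
        elif wx < wy:
            total += mismatch(cx)
            i += 1
        else:
            total += mismatch(cy)
            j += 1
    # Whatever is left in either array has no counterpart in the other.
    total += sum(mismatch(c) for _, c in xs[i:])
    total += sum(mismatch(c) for _, c in ys[j:])
    return total


def linear_kernel(x, y, k=2):
    """Inner product of the k-gram count vectors."""
    return compare(kgram_counts(x, k), kgram_counts(y, k),
                   lambda a, b: a * b, lambda a: 0)


def manhattan_distance(x, y, k=2):
    """Sum of absolute differences of the k-gram counts."""
    return compare(kgram_counts(x, k), kgram_counts(y, k),
                   lambda a, b: abs(a - b), lambda a: a)


print(linear_kernel("abab", "abba"), manhattan_distance("abab", "abba"))
```

Swapping the `match` and `mismatch` functions is all it takes to move from a kernel to a distance, which reflects the generic nature of the framework described above.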
See all publications.
Conference and Workshop Organization
Program chair of the 10th Conference on Detection of Intrusions and Malware (DIMVA 2013)
General chair of the 6th European Conference on Computer Network Defense (EC2ND 2010)
Local organization of GI Graduate Workshop on Reactive Security (SPRING 2006)
Program Committee Memberships
Conference on Detection of Intrusions and Malware (DIMVA) 2009 – 2013
ACM Workshop on Artificial Intelligence and Security (AISEC) 2011 – 2013
International Conference on Availability, Reliability and Security (ARES) 2012, 2013
International Conference on Privacy, Security and Trust (PST) 2013
Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS) 2012
European Conference on Computer Network Defense (EC2ND) 2010, 2011
Demo Track of European Conference on Machine Learning (ECML DEMO) 2010, 2011
International Joint Conference on Artificial Intelligence (IJCAI) 2011
GI Conference "Sicherheit, Schutz und Zuverlässigkeit" (SICHERHEIT) 2010
Workshop on Machine Learning Open Source Software (MLOSS) 2010
Reviewing for Journals
Journal of Machine Learning Research (JMLR)
ACM Transactions on Information and System Security (TISSEC)
IEEE Transactions on Dependable and Secure Computing (TDSC)
Data Mining and Knowledge Discovery (DMKD)
Information Fusion (INFFUS)
International Journal of Information Security (IJIS)
Security and Communication Networks (SCN)
I am a member of "Verband der krawattenlosen Wissensträger" (VDKW)