Contact

Prof. Dr. Konrad Rieck
Institute of Computer Science
University of Göttingen
Goldschmidtstraße 7
37077 Göttingen, Germany

Phone: +49 551 39 172000
Fax: +49 551 39 14403
Email: konrad.rieck@uni-goettingen.de (PGP key)
Web: www.sec.cs.uni-goettingen.de

About me

I am a junior professor at the University of Göttingen, where I head the Computer Security Group. Before taking this position, I worked at Technische Universität Berlin and the Fraunhofer Institute FIRST.

My research interests revolve around computer security and machine learning. This includes the detection of attacks, the analysis of malicious software, and the discovery of vulnerabilities, as well as learning with structured data, such as sequences and trees.

I received the joint dissertation award of the Competence Center for Applied Security Technology (CAST e.V.) and the German Informatics Society (GI) in 2010.

Selected Publications

  • Generalized Vulnerability Extrapolation using Abstract Syntax Trees.

    Fabian Yamaguchi, Markus Lottmann, and Konrad Rieck.

    Proc. of 28th Annual Computer Security Applications Conference (ACSAC), 359–368, December 2012. Outstanding Paper Award

    The discovery of vulnerabilities in source code is a key for securing computer systems. While specific types of security flaws can be identified automatically, in the general case the process of finding vulnerabilities cannot be automated and vulnerabilities are mainly discovered by manual analysis. In this paper, we propose a method for assisting a security analyst during auditing of source code. Our method proceeds by extracting abstract syntax trees from the code and determining structural patterns in these trees, such that each function in the code can be described as a mixture of these patterns. This representation enables us to decompose a known vulnerability and extrapolate it to a code base, such that functions potentially suffering from the same flaw can be suggested to the analyst. We evaluate our method on the source code of four popular open-source projects: LibTIFF, FFmpeg, Pidgin and Asterisk. For three of these projects, we are able to identify zero-day vulnerabilities by inspecting only a small fraction of the code bases.

  • Automatic Analysis of Malware Behavior using Machine Learning.

    Konrad Rieck, Philipp Trinius, Carsten Willems, and Thorsten Holz.

    Journal of Computer Security (JCS), 19 (4) 639–668, IOSPress, June 2011.

    Malicious software — so called malware — poses a major threat to the security of computer systems. The amount and diversity of its variants render classic security defenses ineffective, such that millions of hosts in the Internet are infected with malware in the form of computer viruses, Internet worms and Trojan horses. While obfuscation and polymorphism employed by malware largely impede detection at file level, the dynamic analysis of malware binaries during run-time provides an instrument for characterizing and defending against the threat of malicious software. In this article, we propose a framework for the automatic analysis of malware behavior using machine learning. The framework allows for automatically identifying novel classes of malware with similar behavior (clustering) and assigning unknown malware to these discovered classes (classification). Based on both, clustering and classification, we propose an incremental approach for behavior-based analysis, capable of processing the behavior of thousands of malware binaries on a daily basis. The incremental analysis significantly reduces the run-time overhead of current analysis methods, while providing accurate discovery and discrimination of novel malware variants.

  • Cujo: Efficient Detection and Prevention of Drive-by-Download Attacks.

    Konrad Rieck, Tammo Krueger, and Andreas Dewald.

    Proc. of 26th Annual Computer Security Applications Conference (ACSAC), 31–39, December 2010.

    The JavaScript language is a core component of active and dynamic web content in the Internet today. Besides its great success in enhancing web applications, however, JavaScript provides the basis for so-called drive-by downloads — attacks exploiting vulnerabilities in web browsers and their extensions for unnoticeably downloading malicious software. Due to the diversity and frequent use of obfuscation in these attacks, static code analysis is largely ineffective in practice. While dynamic analysis and honeypots provide means to identify drive-by-download attacks, current approaches induce a significant overhead which renders immediate prevention of attacks intractable. In this paper, we present Cujo, a system for automatic detection and prevention of drive-by-download attacks. Embedded in a web proxy, Cujo transparently inspects web pages and blocks delivery of malicious JavaScript code. Static and dynamic code features are extracted on-the-fly and analysed for malicious patterns using efficient techniques of machine learning. We demonstrate the efficacy of Cujo in different experiments, where it detects 94% of the drive-by downloads with few false alarms and a median run-time of 500 ms per web page — a quality that, to the best of our knowledge, has not been attained in previous work on detection of drive-by-download attacks.

  • Approximate Tree Kernels.

    Konrad Rieck, Tammo Krueger, Ulf Brefeld, and Klaus-Robert Müller.

    Journal of Machine Learning Research (JMLR), 11 (Feb) 555–580, February 2010.

    Convolution kernels for trees provide simple means for learning with tree-structured data. The computation time of tree kernels is quadratic in the size of the trees, since all pairs of nodes need to be compared. Thus, large parse trees, obtained from HTML documents or structured network data, render convolution kernels inapplicable. In this article, we propose an effective approximation technique for parse tree kernels. The approximate tree kernels (ATKs) limit kernel computation to a sparse subset of relevant subtrees and discard redundant structures, such that training and testing of kernel-based learning methods are significantly accelerated. We devise linear programming approaches for identifying such subsets for supervised and unsupervised learning tasks, respectively. Empirically, the approximate tree kernels attain run-time improvements up to three orders of magnitude while preserving the predictive accuracy of regular tree kernels. For unsupervised tasks, the approximate tree kernels even lead to more accurate predictions by identifying relevant dimensions in feature space.

  • Linear-Time Computation of Similarity Measures for Sequential Data.

    Konrad Rieck and Pavel Laskov.

    Journal of Machine Learning Research (JMLR), 9 (Jan) 23–48, January 2008.

    Efficient and expressive comparison of sequences is an essential procedure for learning with sequential data. In this article we propose a generic framework for computation of similarity measures for sequences, covering various kernel, distance and non-metric similarity functions. The basis for comparison is embedding of sequences using a formal language, such as a set of natural words, k-grams or all contiguous subsequences. As realizations of the framework we provide linear-time algorithms of different complexity and capabilities using sorted arrays, tries and suffix trees as underlying data structures. Experiments on data sets from bioinformatics, text processing and computer security illustrate the efficiency of the proposed algorithms, enabling peak performances of up to 10^6 pairwise comparisons per second. The utility of distances and non-metric similarity measures for sequences as alternatives to string kernels is demonstrated in applications of text categorization, network intrusion detection and transcription site recognition in DNA.

See all publications.

Professional Activities

Memberships
Editorial board of the Journal of Machine Learning Research (JMLR)
Steering committee of the GI SIG Intrusion Detection and Response (SIDAR)
German Informatics Society (GI)

Conference and Workshop Organization
Program chair of the 10th Conference on Detection of Intrusions and Malware (DIMVA 2013)
General chair of the 6th European Conference on Computer Network Defense (EC2ND 2010)
Local organization of GI Graduate Workshop on Reactive Security (SPRING 2006)

Program Committee Memberships
Conference on Detection of Intrusions and Malware (DIMVA) 2009 – 2013
ACM Workshop on Artificial Intelligence and Security (AISEC) 2011 – 2013
International Conference on Availability, Reliability and Security (ARES) 2012, 2013
International Conference on Privacy, Security and Trust (PST) 2013
Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS) 2012
European Conference on Computer Network Defense (EC2ND) 2010, 2011
Demo Track of European Conference on Machine Learning (ECML DEMO) 2010, 2011
International Joint Conference on Artificial Intelligence (IJCAI) 2011
GI Conference "Sicherheit, Schutz und Zuverlässigkeit" (SICHERHEIT) 2010
Workshop on Machine Learning Open Source Software (MLOSS) 2010

Reviewing for Journals
Journal of Machine Learning Research (JMLR)
ACM Transactions on Information and System Security (TISSEC)
IEEE Transactions on Dependable and Secure Computing (TDSC)
Data Mining and Knowledge Discovery (DMKD)
Information Fusion (INFFUS)
International Journal of Information Security (IJIS)
Security and Communication Networks (SCN)

Whenever Possible
I am a member of "Verband der krawattenlosen Wissensträger" (VDKW)