Prof. Dr. Konrad Rieck
Institute of Computer Science
University of Göttingen
Goldschmidtstraße 7
37077 Göttingen, Germany

Phone: +49 551 39 172000
Fax: +49 551 39 14403

Email: konrad.rieck@uni-goettingen.de (PGP key)
Web: www.sec.cs.uni-goettingen.de

I am a junior professor at the University of Göttingen, where I head the Computer Security Group. Before taking this position, I worked at Technische Universität Berlin and the Fraunhofer Institute FIRST.
My research interests revolve around computer security and machine learning. This includes the detection of attacks, the analysis of malicious software, and the discovery of vulnerabilities, as well as learning with structured data, such as sequences and trees.
I received the joint dissertation award of the Competence Center for Applied Security Technology (CAST e.V.) and the German Informatics Society (GI) in 2010.
Generalized Vulnerability Extrapolation using Abstract Syntax Trees.
Proc. of 28th Annual Computer Security Applications Conference (ACSAC), 359–368, December 2012. Outstanding Paper Award
The discovery of vulnerabilities in source code is key to securing computer systems. While specific types of security flaws can be identified automatically, in the general case the process of finding vulnerabilities cannot be automated, and vulnerabilities are mainly discovered by manual analysis. In this paper, we propose a method for assisting a security analyst during the auditing of source code. Our method proceeds by extracting abstract syntax trees from the code and determining structural patterns in these trees, such that each function in the code can be described as a mixture of these patterns. This representation enables us to decompose a known vulnerability and extrapolate it to a code base, such that functions potentially suffering from the same flaw can be suggested to the analyst. We evaluate our method on the source code of four popular open-source projects: LibTIFF, FFmpeg, Pidgin and Asterisk. For three of these projects, we are able to identify zero-day vulnerabilities by inspecting only a small fraction of the code bases.
Automatic Analysis of Malware Behavior using Machine Learning.
Journal of Computer Security (JCS), 19 (4) 639–668, IOS Press, June 2011.
Malicious software, so-called malware, poses a major threat to the security of computer systems. The amount and diversity of its variants render classic security defenses ineffective, such that millions of hosts on the Internet are infected with malware in the form of computer viruses, Internet worms and Trojan horses. While obfuscation and polymorphism employed by malware largely impede detection at file level, the dynamic analysis of malware binaries during run-time provides an instrument for characterizing and defending against the threat of malicious software. In this article, we propose a framework for the automatic analysis of malware behavior using machine learning. The framework allows for automatically identifying novel classes of malware with similar behavior (clustering) and assigning unknown malware to these discovered classes (classification). Based on both clustering and classification, we propose an incremental approach for behavior-based analysis, capable of processing the behavior of thousands of malware binaries on a daily basis. The incremental analysis significantly reduces the run-time overhead of current analysis methods, while providing accurate discovery and discrimination of novel malware variants.
Cujo: Efficient Detection and Prevention of Drive-by-Download Attacks.
Proc. of 26th Annual Computer Security Applications Conference (ACSAC), 31–39, December 2010.
The JavaScript language is a core component of active and dynamic web content in the Internet today. Besides its great success in enhancing web applications, however, JavaScript provides the basis for so-called drive-by downloads — attacks exploiting vulnerabilities in web browsers and their extensions for unnoticeably downloading malicious software. Due to the diversity and frequent use of obfuscation in these attacks, static code analysis is largely ineffective in practice. While dynamic analysis and honeypots provide means to identify drive-by-download attacks, current approaches induce a significant overhead which renders immediate prevention of attacks intractable. In this paper, we present Cujo, a system for automatic detection and prevention of drive-by-download attacks. Embedded in a web proxy, Cujo transparently inspects web pages and blocks delivery of malicious JavaScript code. Static and dynamic code features are extracted on-the-fly and analysed for malicious patterns using efficient techniques of machine learning. We demonstrate the efficacy of Cujo in different experiments, where it detects 94% of the drive-by downloads with few false alarms and a median run-time of 500 ms per web page — a quality that, to the best of our knowledge, has not been attained in previous work on detection of drive-by-download attacks.
Approximate Tree Kernels.
Journal of Machine Learning Research (JMLR), 11 (Feb) 555–580, February 2010.
Convolution kernels for trees provide simple means for learning with tree-structured data. The computation time of tree kernels is quadratic in the size of the trees, since all pairs of nodes need to be compared. Thus, large parse trees, obtained from HTML documents or structured network data, render convolution kernels inapplicable. In this article, we propose an effective approximation technique for parse tree kernels. The approximate tree kernels (ATKs) limit kernel computation to a sparse subset of relevant subtrees and discard redundant structures, such that training and testing of kernel-based learning methods are significantly accelerated. We devise linear programming approaches for identifying such subsets for supervised and unsupervised learning tasks, respectively. Empirically, the approximate tree kernels attain run-time improvements up to three orders of magnitude while preserving the predictive accuracy of regular tree kernels. For unsupervised tasks, the approximate tree kernels even lead to more accurate predictions by identifying relevant dimensions in feature space.
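The quadratic cost mentioned in the abstract can be seen in a minimal sketch of a plain parse tree kernel. This is not the approximate tree kernel proposed in the article; it is the standard all-pairs formulation it sets out to speed up. The tuple representation `(label, children)`, the function names, and the decay parameter `lam` are assumptions made for illustration only.

```python
from itertools import product


def nodes(tree):
    """Yield every node of a tree given as (label, children)."""
    yield tree
    for child in tree[1]:
        yield from nodes(child)


def production(tree):
    """A node's production: its label plus the labels of its children."""
    label, children = tree
    return (label, tuple(child[0] for child in children))


def delta(a, b, lam=1.0):
    """Weighted count of shared subtree fragments rooted at nodes a and b."""
    if production(a) != production(b):
        return 0.0
    if not a[1]:  # two leaves with the same label
        return lam
    result = lam
    for child_a, child_b in zip(a[1], b[1]):
        result *= 1.0 + delta(child_a, child_b, lam)
    return result


def tree_kernel(s, t, lam=1.0):
    """Sum delta over ALL pairs of nodes: the quadratic step."""
    return sum(delta(a, b, lam) for a, b in product(nodes(s), nodes(t)))


s = ("A", (("B", ()), ("C", ())))
t = ("A", (("B", ()), ("D", ())))
print(tree_kernel(s, s), tree_kernel(s, t))
```

The outer sum visits every pair of nodes from the two trees, which is what makes large parse trees expensive and motivates restricting the computation to a sparse subset of subtrees.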
Linear-Time Computation of Similarity Measures for Sequential Data.
Journal of Machine Learning Research (JMLR), 9 (Jan) 23–48, January 2008.
Efficient and expressive comparison of sequences is an essential procedure for learning with sequential data. In this article we propose a generic framework for computation of similarity measures for sequences, covering various kernel, distance and non-metric similarity functions. The basis for comparison is embedding of sequences using a formal language, such as a set of natural words, k-grams or all contiguous subsequences. As realizations of the framework we provide linear-time algorithms of different complexity and capabilities using sorted arrays, tries and suffix trees as underlying data structures. Experiments on data sets from bioinformatics, text processing and computer security illustrate the efficiency of the proposed algorithms, enabling peak performances of up to 10^6 pairwise comparisons per second. The utility of distances and non-metric similarity measures for sequences as alternatives to string kernels is demonstrated in applications of text categorization, network intrusion detection and transcription site recognition in DNA.
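As a rough illustration of the sorted-array variant mentioned in the abstract, the sketch below embeds sequences as k-gram counts and compares two embeddings with a single merge pass. It is a simplified reading under stated assumptions, not the article's implementation: the function names are hypothetical, and the embedding step uses Python's comparison sort, so only the comparison itself is linear in the number of distinct k-grams.

```python
from collections import Counter


def kgram_embedding(seq, k):
    """Embed a sequence as a sorted array of (k-gram, count) pairs."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return sorted(counts.items())


def linear_kernel(x, y):
    """Inner product of two embeddings via one merge pass."""
    i = j = 0
    total = 0
    while i < len(x) and j < len(y):
        if x[i][0] == y[j][0]:
            total += x[i][1] * y[j][1]
            i += 1
            j += 1
        elif x[i][0] < y[j][0]:
            i += 1
        else:
            j += 1
    return total


def manhattan_distance(x, y):
    """L1 distance of two embeddings via the same merge pass."""
    i = j = 0
    total = 0
    while i < len(x) and j < len(y):
        if x[i][0] == y[j][0]:
            total += abs(x[i][1] - y[j][1])
            i += 1
            j += 1
        elif x[i][0] < y[j][0]:
            total += x[i][1]  # k-gram only present in x
            i += 1
        else:
            total += y[j][1]  # k-gram only present in y
            j += 1
    # Whatever remains in either array has no counterpart in the other.
    total += sum(c for _, c in x[i:]) + sum(c for _, c in y[j:])
    return total


x = kgram_embedding("abab", 2)
y = kgram_embedding("abba", 2)
print(linear_kernel(x, y), manhattan_distance(x, y))
```

The same merge loop serves both a kernel and a distance; only the per-k-gram update differs, which is the sense in which one framework can cover several similarity measures.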
See all publications.
Memberships
Editorial board of the Journal of Machine Learning Research (JMLR)
Steering committee of the GI SIG Intrusion Detection and Response (SIDAR)
German Informatics Society (GI)
Conference and Workshop Organization
Program chair of the 10th Conference on Detection of Intrusions and Malware (DIMVA 2013)
General chair of the 6th European Conference on Computer Network Defense (EC2ND 2010)
Local organization of the GI Graduate Workshop on Reactive Security (SPRING 2006)
Program Committee Memberships
Conference on Detection of Intrusions and Malware (DIMVA) 2009 – 2013
ACM Workshop on Artificial Intelligence and Security (AISEC) 2011 – 2013
International Conference on Availability, Reliability and Security (ARES) 2012, 2013
International Conference on Privacy, Security and Trust (PST) 2013
Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS) 2012
European Conference on Computer Network Defense (EC2ND) 2010, 2011
Demo Track of European Conference on Machine Learning (ECML DEMO) 2010, 2011
International Joint Conference on Artificial Intelligence (IJCAI) 2011
GI Conference "Sicherheit, Schutz und Zuverlässigkeit" (SICHERHEIT) 2010
Workshop on Machine Learning Open Source Software (MLOSS) 2010
Reviewing for Journals
Journal of Machine Learning Research (JMLR)
ACM Transactions on Information and System Security (TISSEC)
IEEE Transactions on Dependable and Secure Computing (TDSC)
Data Mining and Knowledge Discovery (DMKD)
Information Fusion (INFFUS)
International Journal of Information Security (IJIS)
Security and Communication Networks (SCN)
Whenever Possible
I am a member of "Verband der krawattenlosen Wissensträger" (VDKW)