Welcome to the Holzinger Group!

Due to the trend towards P4 medicine there is an increasing amount of complex, high-dimensional, weakly-structured and big data sets, where traditional approaches of data mining and knowledge discovery often yield insufficient results.

We work consistently on bringing together aspects of the best of two worlds by creating a synergetic combination of approaches from Human–Computer Interaction (HCI) and Knowledge Discovery/Data Mining (KDD) to contribute towards the development of methods, algorithms and tools to deal with complex data sets.

Our Motto: Science is to test crazy ideas – Engineering is to bring these ideas into Business!

Work Areas and Research topics of the Holzinger Group

In biomedicine and the life sciences, the volume of complex data is virtually exploding, and traditional approaches to knowledge discovery often yield insufficient results.

We combine approaches from mathematics and computer science to develop interactive, user-friendly methods, algorithms and tools for dealing with this “data tsunami”. To reach this goal, we work on a synergetic combination of two fields in order to support human intelligence with machine learning: Human–Computer Interaction (HCI) and Knowledge Discovery/Data Mining (KDD).

Our motto: Science is the testing of ideas (crazy, far-fetched ideas ;-)); engineering is bringing these ideas into business!

Technical Area: Interactive Knowledge Discovery and Data Mining, Machine Learning
Application Area: Biomedical Informatics

Research topics horizontal: Data Fusion, Preprocessing, Data Mapping, Mining Algorithms, Machine Learning, Data Visualization; Privacy, Data Protection, Safety and Security
Research topics vertical: Graph-based Data Mining, Entropy-based Data Mining and Topological Data Mining

Selected Publications:

1) Holzinger, A., Dehmer, M. & Jurisica, I. (2014) Knowledge Discovery and Interactive Data Mining in Bioinformatics – State-of-the-Art, Future challenges and Research Directions. BMC Bioinformatics, 15(Suppl 6), I1 <link to paper>

This paper outlines that we are just at the beginning of a turning point towards data-intensive life sciences, which entails many challenges and future research directions. Within this overview we have highlighted only a few issues. In summary, the grand challenge is in building frameworks that enable domain experts to interactively deal with their data sets in order to “ask questions” about the data, for example: “Show me similarities/differences/anomalies of data set X and data set Y”, and hence to discover novel, previously unknown patterns in complex data. Which mathematical framework should we use? One challenge is that such a framework must be usable for domain experts without prior training in mathematics or computational sciences. We need machine intelligence to deal with the flood of data, but at the same time we must acknowledge that humans possess certain problem-solving and cognition abilities, which are far beyond computation. A possible solution is in the cross-disciplinary combination of aspects of the best of two worlds: Human–Computer Interaction (HCI) and Knowledge Discovery from Data (KDD). A proverb attributed, perhaps incorrectly, to Albert Einstein illustrates this perfectly: “Computers are incredibly fast, accurate, but stupid. Humans are incredibly slow, inaccurate, but brilliant. Together they may be powerful beyond imagination”.

@article{HolzingerDehmerJurisica2014KDDBMCBioinfo,
author = {Holzinger, Andreas and Dehmer, Matthias and Jurisica, Igor},
title = {Knowledge Discovery and Interactive Data Mining in Bioinformatics – State-of-the-Art, Future challenges and Research Directions},
journal = {BMC Bioinformatics},
volume = {15(Suppl 6)},
number = {I1},
year = {2014}
}

2) Holzinger, A. & Jurisica, I. (eds.) 2014. Interactive Knowledge Discovery and Data Mining in Biomedical Informatics: State-of-the-Art and Future Challenges. Lecture Notes in Computer Science LNCS 8401, Heidelberg, Berlin: Springer. <link to Springer-Link>

Volume 8401 of the Lecture Notes in Computer Science is a state-of-the-art volume focusing on hot topics from interactive knowledge discovery and data mining in biomedical informatics. Each paper describes the state of the art and focuses on open problems and future challenges in order to provide a research agenda to stimulate further research and progress within the scientific community.

@book{LNCS8401,
editor = {Holzinger, Andreas and Jurisica, Igor},
title = {Interactive Knowledge Discovery and Data Mining in Biomedical Informatics: State-of-the-Art and Future Challenges. Lecture Notes in Computer Science LNCS 8401},
publisher = {Springer},
address = {Heidelberg, Berlin},
year = {2014}
}

3) Holzinger, A. 2014. Extravaganza Tutorial on Hot Ideas for Interactive Knowledge Discovery and Data Mining in Biomedical Informatics. In: Slezak, D., Peters, J. F., Tan, A.-H. & Schwabe, L. (eds.) Brain Informatics and Health, BIH 2014, Lecture Notes in Artificial Intelligence, LNAI 8609. Heidelberg, Berlin: Springer, pp. 502-515. <Link for download of preprint>

Mapping higher dimensional data into lower dimensions is a major task in HCI, and a concerted effort including recent advances from graph-theory and algebraic topology may contribute to finding solutions. Moreover, much biomedical data is sparse, noisy and time-dependent, hence entropy is also amongst promising topics. This tutorial provides an overview of the HCI-KDD approach and focuses on three challenging hot topics: graphs, topology and entropy. The goal of this intro tutorial is to motivate and stimulate further research.

@inbook{Holzinger2014Extravaganza,
author = {Holzinger, Andreas},
title = {Extravaganza Tutorial on Hot Ideas for Interactive Knowledge Discovery and Data Mining in Biomedical Informatics},
booktitle = {Brain Informatics and Health, BIH 2014, Lecture Notes in Artificial Intelligence, LNAI 8609},
editor = {Slezak, Dominik and Peters, James F. and Tan, Ah-Hwee and Schwabe, Lars},
publisher = {Springer},
address = {Heidelberg, Berlin},
pages = {502-515},
year = {2014}
}

4) Holzinger, A. (2013) Human–Computer Interaction & Knowledge Discovery (HCI-KDD): What is the benefit of bringing those two fields to work together? In: Alfredo Cuzzocrea, Christian Kittl, Dimitris E. Simos, Edgar Weippl, Lida Xu – Multidisciplinary Research and Practice for Information Systems, Springer Lecture Notes in Computer Science LNCS 8127 <link for download>

This paper outlines our strategic aim to find solutions for data-intensive problems in the combination of two areas, which bring ideal pre-conditions to solve data-centric problems: Human–Computer Interaction (HCI) and Knowledge Discovery and Data Mining (KDD). HCI deals with questions of human perception, cognition, intelligence, decision-making and interactive techniques of visualization, so it centers mainly on supervised methods. KDD deals mainly with questions of machine intelligence and data mining, in particular with the development of scalable algorithms for finding previously unknown relationships in data, and thus centers on automatic computational methods. A proverb attributed, perhaps incorrectly, to Albert Einstein illustrates this perfectly: “Computers are incredibly fast, accurate, but stupid. Humans are incredibly slow, inaccurate, but brilliant. Together they may be powerful beyond imagination”. Consequently, a novel approach is to combine HCI & KDD in order to enhance human intelligence by computational intelligence.

@inbook{Holzinger2013HCI-KDD,
author = {Holzinger, Andreas},
title = {Human–Computer Interaction & Knowledge Discovery (HCI-KDD): What is the benefit of bringing those two fields to work together?},
booktitle = {Multidisciplinary Research and Practice for Information Systems, Springer Lecture Notes in Computer Science LNCS 8127},
editor = {Cuzzocrea, Alfredo and Kittl, Christian and Simos, Dimitris E. and Weippl, Edgar and Xu, Lida},
publisher = {Springer},
address = {Heidelberg, Berlin, New York},
pages = {319–328},
year = {2013}
}

5) Holzinger, A. & Zupan, M. (2013). KNODWAT: A scientific framework application for testing knowledge discovery methods for the biomedical domain. BMC Bioinformatics, 14, (1), 191. <link for download>

This article presents the output of a project where we developed a web application using Java on Spring framework 3.1 for testing knowledge discovery/data mining methods, requiring a web server such as Apache Tomcat and a database server such as the MySQL Server. For frontend functionality and styling, Twitter Bootstrap was used as well as jQuery for interactive user interface operations. The software enables biomedical researchers who are new to the field of knowledge discovery and data mining to test methods on existing data and to assess their suitability and performance. In the evaluation phase we tested two algorithms, CART and C4.5, implemented using the WEKA data mining framework.

@article{HolzingerZupan2013KNODWAT,
author = {Holzinger, Andreas and Zupan, Mario},
title = {KNODWAT: A scientific framework application for testing knowledge discovery methods for the biomedical domain},
journal = {BMC Bioinformatics},
volume = {14},
number = {1},
pages = {191},
year = {2013}
}

6) Holzinger, A., Yildirim, P., Geier, M., & Simonic, K.-M. (2013). Quality-based knowledge discovery from medical text on the Web: Example of computational methods in Web intelligence. In: Gabriella Pasi, Lakhmi Jain (Ed.), Advanced Techniques in Web Intelligence, Lecture Notes in Artificial Intelligence, LNAI (pp. 145-158). Heidelberg, New York: Springer. <link for download>

This paper is an output of a joint project with the Okan University in Istanbul: The MEDLINE database contains a rapidly growing volume of biomedical articles; consequently, there is a need for data mining techniques which enable the quality-based discovery, extraction, integration and use of hidden knowledge in those articles. Text mining helps to cope with the interpretation of these large volumes of data, and we applied co-occurrence analysis and statistical models to evaluate the significance of the relationship between entities such as disease names, drug names, and keywords in titles, abstracts or even entire publications. In this paper we present a selection of quality-oriented Web-based tools for analyzing biomedical literature, and specifically discuss PolySearch, FACTA and Kleio. Finally we discuss Pointwise Mutual Information (PMI), which is a measure of the strength of a relationship between two entities. Quality-based approaches are very relevant for data mining purposes.
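To make the PMI measure concrete, here is a minimal sketch in Python. It uses the standard definition PMI(x, y) = log2( p(x, y) / (p(x) p(y)) ) and estimates the probabilities from document counts. The function name `pmi`, its parameters and the counts in the example are assumptions for illustration only; they are not taken from the paper or from MEDLINE.

```python
import math


def pmi(n_xy: int, n_x: int, n_y: int, n_docs: int) -> float:
    """Pointwise mutual information (in bits) estimated from document counts.

    n_xy   -- documents mentioning both terms (hypothetical input)
    n_x    -- documents mentioning term x
    n_y    -- documents mentioning term y
    n_docs -- total number of documents in the collection
    """
    if min(n_xy, n_x, n_y, n_docs) <= 0:
        raise ValueError("all counts must be positive")
    p_xy = n_xy / n_docs
    p_x = n_x / n_docs
    p_y = n_y / n_docs
    # PMI > 0: the terms co-occur more often than expected under independence.
    # PMI = 0: the terms co-occur exactly as often as independence predicts.
    return math.log2(p_xy / (p_x * p_y))


# Made-up counts for a disease name and a drug name.
score = pmi(n_xy=50, n_x=200, n_y=100, n_docs=10_000)
```

With these made-up counts the pair co-occurs 25 times more often than independence would predict, so the score is log2(25), roughly 4.64 bits.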

@inbook{Holzinger_etal2013TextDataMining,
author = {Holzinger, Andreas and Yildirim, Pinar and Geier, Michael and Simonic, Klaus-Martin},
title = {Quality-Based Knowledge Discovery from Medical Text on the Web},
booktitle = {Quality Issues in the Management of Web Information, Intelligent Systems Reference Library, ISRL 50},
editor = {Pasi, Gabriella and Bordogna, Gloria and Jain, Lakhmi C.},
publisher = {Springer},
address = {Berlin Heidelberg},
pages = {145-158},
year = {2013}
}

7) Holzinger, A., Stocker, C., Peischl, B. & Simonic, K.-M. (2012). On Using Entropy for Enhancing Handwriting Preprocessing. Entropy, 14, (11), 2324-2350. <link for download>

This paper was an “academic by-product” of an industrial project (handwriting input on mobile computers in moving ambulance cars), where we experimented with point cloud data sets in R²: We developed a model of handwriting, evaluated the performance of entropy-based slant and skew correction, and compared the results to other methods. For this purpose we used the Unipen-ICROW-03 benchmark data set, which we manually annotated with the associated error angles. Our results showed that the entropy-based slant correction method outperforms a window-based approach. This work is the basis for further entropy-based approaches, which are very relevant for data mining purposes.
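The general idea behind entropy-based slant correction can be sketched as follows: shear the 2D point cloud by a range of candidate angles and keep the angle for which the histogram of x-coordinates has the lowest Shannon entropy, since upright strokes concentrate in few histogram bins. This is one common formulation and only a sketch; the handwriting model and exact procedure of the paper are not reproduced here, and the function names, the bin width and the angle grid are assumptions.

```python
import math
from collections import Counter


def projection_entropy(xs, bin_width=1.0):
    """Shannon entropy (bits) of the histogram of x-coordinates."""
    counts = Counter(math.floor(x / bin_width) for x in xs)
    n = len(xs)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def estimate_slant(points, max_deg=45, step_deg=1, bin_width=1.0):
    """Return the slant angle (degrees) whose removal minimizes entropy.

    points -- iterable of (x, y) pairs, y pointing upwards (assumption);
              a positive angle means strokes lean to the right.
    """
    points = list(points)
    if not points:
        raise ValueError("points must not be empty")
    best_angle, best_entropy = 0, float("inf")
    for deg in range(-max_deg, max_deg + 1, step_deg):
        t = math.tan(math.radians(deg))
        # Undo a candidate slant by shearing every point horizontally.
        sheared = [x - y * t for x, y in points]
        h = projection_entropy(sheared, bin_width)
        if h < best_entropy:
            best_angle, best_entropy = deg, h
    return best_angle


# Synthetic example: three straight strokes slanted by 20 degrees.
_t = math.tan(math.radians(20))
strokes = [(x0 + y * _t, float(y)) for x0 in (0.5, 10.5, 20.5) for y in range(101)]
estimated = estimate_slant(strokes)
```

For the synthetic strokes the search recovers the 20 degree slant, because only at that angle does every stroke collapse into a single histogram bin. Real handwriting is far noisier, so bin width and angle resolution would need tuning.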

@article{Holzinger_etal2012entropydatamining,
author = {Holzinger, Andreas and Stocker, Christof and Peischl, Bernhard and Simonic, Klaus-Martin},
title = {On Using Entropy for Enhancing Handwriting Preprocessing},
journal = {Entropy},
volume = {14},
number = {11},
pages = {2324-2350},
year = {2012}
}

8) Holzinger, A., Ofner, B., Stocker, C., Valdez, A. C., Schaar, A. K., Ziefle, M. & Dehmer, M. 2013. On Graph Entropy Measures for Knowledge Discovery from Publication Network Data. In: Cuzzocrea, A., Kittl, C., Simos, D. E., Weippl, E. & Xu, L. (eds.) Multidisciplinary Research and Practice for Information Systems, Springer Lecture Notes in Computer Science LNCS 8127. Heidelberg, Berlin: Springer, pp. 354-362. <link for download>

During Andreas Holzinger's visiting professorship at the RWTH Aachen, he took part in a project of the local network of excellence of the RWTH Aachen. One output, produced in the context of supervision tasks for PhD students there, was this paper, in which information-theoretic network measures were evaluated on large publication network data sets. The measures can be understood as graph complexity measures, which evaluate the structural complexity based on the corresponding concept. During this project we saw that it is very challenging to generalize such results across different measures, as every measure captures structural information differently and hence leads to different entropy values. This calls for exploring the structural interpretation of a graph measure, which has been a challenging problem. Graph-entropy based approaches are very relevant for data mining purposes. Remember: Graph + Data = Network :-)
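As a small illustration of what a graph entropy measure computes, the sketch below uses one of the simplest members of this family: the Shannon entropy of the distribution obtained by dividing each vertex degree by the sum of all degrees. This particular measure and the function name `degree_entropy` are chosen for illustration only; the paper evaluates other information-theoretic measures.

```python
import math
from collections import defaultdict


def degree_entropy(edges):
    """Shannon entropy (bits) of the degree-based vertex distribution.

    edges -- iterable of (u, v) pairs describing an undirected graph.
    Each vertex i gets probability deg(i) / sum of all degrees.
    """
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    total = sum(degree.values())
    return -sum(d / total * math.log2(d / total) for d in degree.values())


# Two structurally different toy graphs (hypothetical data).
cycle = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
star = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d")]
h_cycle = degree_entropy(cycle)
h_star = degree_entropy(star)
```

Both toy graphs yield exactly 2 bits, although a 4-cycle and a star are structurally very different. This shows in miniature why a single measure captures only part of the structural information and why its structural interpretation needs care.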

9) Holzinger, A., Stocker, C., Bruschi, M., Auinger, A., Silva, H., Gamboa, H. & Fred, A. 2012. On Applying Approximate Entropy to ECG Signals for Knowledge Discovery on the Example of Big Sensor Data. In: Huang, R., Ghorbani, A., Pasi, G., Yamaguchi, T., Yen, N. & Jin, B. (eds.) Active Media Technology, Lecture Notes in Computer Science, LNCS 7669. Berlin Heidelberg: Springer, pp. 646‐657. <link for download>

This paper was presented in the context of the World Intelligence Congress in Macau, China. The paper reports on the output of a joint project with the Technical University of Lisbon. The basic idea was that approximate entropy (ApEn) can classify complex data in diverse settings, and we applied this concept to ECG data. The challenge is to gain knowledge with only small ApEn windows while avoiding modeling artifacts. Our central hypothesis was that for intra-subject information (e.g. tendencies, fluctuations etc.) the ApEn window size can be significantly smaller than for inter-subject classification. For that purpose we introduced the novel term truthfulness to complement the statistical validity of a distribution, and showed how this truthfulness is able to establish trust in their local properties. Such concepts are very relevant for data mining purposes.
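For readers unfamiliar with approximate entropy, the following sketch implements the standard ApEn(m, r) definition for a single window of samples: count how often templates of length m that match within tolerance r still match when extended to length m + 1. It is a plain, quadratic-time illustration and not the code used in the paper; the function names, the defaults m = 2 and r = 0.2 (an absolute tolerance here) and the test signals are assumptions.

```python
import math


def _phi(series, m, r):
    """Average log-frequency of templates of length m matching within r."""
    n = len(series) - m + 1
    templates = [series[i:i + m] for i in range(n)]
    total = 0.0
    for a in templates:
        # Chebyshev distance; self-matches are counted, so matches >= 1.
        matches = sum(
            1 for b in templates
            if max(abs(x - y) for x, y in zip(a, b)) <= r
        )
        total += math.log(matches / n)
    return total / n


def approximate_entropy(series, m=2, r=0.2):
    """ApEn(m, r) of one window of samples; larger means less regular."""
    series = list(series)
    if len(series) < m + 2:
        raise ValueError("window too short for the chosen m")
    return _phi(series, m, r) - _phi(series, m + 1, r)


# Toy signals standing in for ECG windows (hypothetical data).
periodic = [0.0, 1.0] * 25
chaotic = [0.3]
for _ in range(199):
    chaotic.append(4.0 * chaotic[-1] * (1.0 - chaotic[-1]))  # logistic map
```

Calling `approximate_entropy(periodic)` gives a value close to zero, while the chaotic logistic-map signal gives a clearly larger one. In practice r is often chosen relative to the standard deviation of the signal, and the window length passed in corresponds to the ApEn window size discussed above.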

Latest News

2014-04-13 (open call) WIC 2014 > AMT 2014 > Special Session HCI-KDD, Warszawa, August 11-14, 2014

World Intelligence Congress WIC 2014, International Conference on Active Media Technology (AMT 2014), 11-14 August 2014, Warsaw, Poland
http://wic2014.mimuw.edu.pl/amt/homepage
Special Session on Advanced Methods of Interactive Data Mining for Personalized Medicine
Paper submission via the conference online system in LNCS/LNAI style, due April 13, 2014 (camera-ready due May 11, 2014):
https://wi-lab.com/cyberchair/2014/amtbih14/scripts/submit.php?subarea=SA4&undisplay_detail=1&wh=/cyberchair/2014/amtbih14/scripts/ws_submit.php
Special Session Organizers: Andreas HOLZINGER, Frank EMMERT-STREIB, Matthias DEHMER, Szymon WILK
One of the grand challenges in the life sciences are [...]

2014-04-24 (open call) TIR’ 14 > DEXA 2014 > Munich > September, 1-5, 2014

TIR’14 – 11th International Workshop on Text-Based Information Retrieval
In conjunction with DEXA 2014, the 25th International Conference on Database and Expert Systems Applications, Munich, Germany, September 1 – September 5, 2014
http://tir.webis.de
Intelligent algorithms for data mining and information retrieval are the key technology to cope with the information need challenges in our media-centered society. Methods for text-based information retrieval receive special attention, which results from the important role of written text, from the [...]