Welcome to the Holzinger Group – the first Austrian IBM Watson Think Group
In Biomedicine and the Life Sciences, the trend towards personalized medicine is producing an exploding amount of complex data (“big data”), for which traditional approaches to data mining and knowledge discovery often yield insufficient results.
We work on bringing together the best of two worlds: a synergetic combination of methods and approaches from Human–Computer Interaction (HCI) and Knowledge Discovery and Data Mining (KDD), contributing to user-centred, interactive solutions for dealing with this flood of data. We follow a quality-driven approach, interlocking the hypothetico-deductive method with the PDCA (Deming) cycle.
Our Motto: Science is about testing ideas (weird ideas ;-)); Engineering is about bringing these ideas into Business!
A warm welcome to the Holzinger Group!
In biomedicine, the amount of complex data being generated is virtually exploding, and traditional knowledge discovery approaches often yield insufficient results.
We combine mathematical and computational approaches to develop interactive, user-friendly solutions for the user-centred handling of this “data tsunami”. To support this goal, we work on a synergetic combination of two fields in order to augment human intelligence with machine intelligence: Human–Computer Interaction and Knowledge Discovery in Data. In doing so, we follow a quality-driven approach: we interlock the hypothetico-deductive method with the PDCA (Deming) cycle.
Our motto: Science is about testing ideas (crazy, off-the-wall ideas ;-)); Engineering is about bringing these ideas into Business!
Selected Publications:
1) Holzinger, A. (2013). Human–Computer Interaction & Knowledge Discovery (HCI-KDD): What is the benefit of bringing those two fields to work together? In: Cuzzocrea, A., Kittl, C., Simos, D. E., Weippl, E. & Xu, L. (eds.) Multidisciplinary Research and Practice for Information Systems, Springer Lecture Notes in Computer Science LNCS 8127. <link for download>
This paper outlines our strategic aim to find solutions for data-intensive problems by combining two areas that offer ideal preconditions for solving data-centric problems: Human–Computer Interaction (HCI) and Knowledge Discovery and Data Mining (KDD). HCI deals with questions of human perception, cognition, intelligence, decision-making and interactive techniques of visualization, and thus centers mainly on supervised methods. KDD deals mainly with questions of machine intelligence and data mining, in particular with the development of scalable algorithms for finding previously unknown relationships in data, and thus centers on automatic computational methods. A saying, perhaps incorrectly attributed to Albert Einstein, illustrates this perfectly: “Computers are incredibly fast, accurate, but stupid. Humans are incredibly slow, inaccurate, but brilliant. Together they may be powerful beyond imagination.” Consequently, a novel approach is to combine HCI & KDD in order to enhance human intelligence with computational intelligence.
2) Holzinger, A. & Zupan, M. (2013). KNODWAT: A scientific framework application for testing knowledge discovery methods for the biomedical domain. BMC Bioinformatics, 14, (1), 191. <link for download>
This article presents the output of a project in which we developed a Java web application on the Spring Framework 3.1 for testing knowledge discovery/data mining methods; it requires a web server such as Apache Tomcat and a database server such as MySQL. For frontend functionality and styling, Twitter Bootstrap was used, along with jQuery for interactive user interface operations. The software enables biomedical researchers who are new to the field of knowledge discovery and data mining to test methods on existing data and to assess their suitability and performance. In the evaluation phase we tested two decision tree algorithms, CART and C4.5, implemented using the WEKA data mining framework.
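Decision tree learners such as C4.5 grow trees by choosing, at each node, the attribute split that maximally reduces class impurity, measured as information gain (entropy reduction). As a minimal illustrative sketch (plain Python with toy data, not the KNODWAT/WEKA code itself), the core computation looks like:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Gain of splitting on a discrete attribute, as in C4.5-style trees:
    entropy before the split minus the weighted entropy of the partitions."""
    n = len(labels)
    split = {}
    for v, y in zip(values, labels):
        split.setdefault(v, []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in split.values())
    return entropy(labels) - remainder

# Toy data: does the attribute "outlook" separate the two classes?
outlook = ["sunny", "sunny", "rain", "rain"]
label = ["no", "no", "yes", "yes"]
print(information_gain(outlook, label))  # 1.0 bit: a perfect split
```

A real learner would evaluate this gain for every candidate attribute at every node and recurse on the best one; WEKA's J48 (its C4.5 implementation) additionally normalizes by the split's own entropy (gain ratio) and prunes the grown tree.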
3) Holzinger, A., Yildirim, P., Geier, M. & Simonic, K.-M. (2013). Quality-based knowledge discovery from medical text on the Web: Example of computational methods in Web intelligence. In: Pasi, G. & Jain, L. (eds.) Advanced Techniques in Web Intelligence, Lecture Notes in Artificial Intelligence LNAI, pp. 145-158. Heidelberg, New York: Springer. <link for download>
This paper is an output of a joint project with Okan University in Istanbul. The MEDLINE database contains an enormously increasing volume of biomedical articles; consequently, there is a need for data mining techniques that enable the quality-based discovery, extraction, integration and use of the knowledge hidden in those articles. Text mining helps to cope with the interpretation of these large volumes of data: we applied co-occurrence analysis and statistical models to evaluate the significance of relationships between entities such as disease names, drug names, and keywords in titles, abstracts or even entire publications. In this paper we present a selection of quality-oriented Web-based tools for analyzing biomedical literature, and specifically discuss PolySearch, FACTA and Kleio. Finally, we discuss Pointwise Mutual Information (PMI), a measure of the strength of a relationship between two terms. Quality-based approaches are very relevant for data mining purposes.
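PMI compares the observed co-occurrence probability of two terms with what independence would predict: PMI(x, y) = log2 [ p(x, y) / (p(x) p(y)) ]. A positive value means the terms co-occur more often than chance. A minimal sketch in plain Python, with a hypothetical toy document collection (each document reduced to its set of terms):

```python
import math

def pmi(docs, x, y):
    """Pointwise Mutual Information (in bits) of terms x and y over a
    collection of documents, each given as a set of terms."""
    n = len(docs)
    px = sum(1 for d in docs if x in d) / n
    py = sum(1 for d in docs if y in d) / n
    pxy = sum(1 for d in docs if x in d and y in d) / n
    if pxy == 0:
        return float("-inf")  # never co-occur
    return math.log2(pxy / (px * py))

# Hypothetical mini-corpus of term sets (illustrative only)
docs = [{"aspirin", "headache"}, {"aspirin", "headache"},
        {"aspirin", "fever"}, {"insulin", "diabetes"}]
print(pmi(docs, "aspirin", "headache"))  # ≈ 0.415: above-chance association
```

In practice the probabilities would be estimated from MEDLINE title/abstract co-occurrence counts rather than a toy corpus, and smoothing would be applied to rare terms.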
4) Holzinger, A., Stocker, C., Peischl, B. & Simonic, K.-M. (2012). On Using Entropy for Enhancing Handwriting Preprocessing. Entropy, 14, (11), 2324-2350. <link for download>
This paper was an “academic by-product” of an industrial project (handwriting input on mobile computers in moving ambulances), in which we experimented with point cloud data sets in R²: we developed a model of handwriting and evaluated the performance of entropy-based slant and skew correction, comparing the results to other methods. For this purpose we used the Unipen-ICROW-03 benchmark data set, which we annotated manually with the associated error angles. Our results showed that the entropy-based slant correction method outperforms a window-based approach. This work is the basis for further entropy-based approaches, which are very relevant for data mining purposes.
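The intuition behind entropy-based slant correction is that at the correct shear angle the writing's horizontal projection is maximally “peaked”, i.e. its histogram has minimal Shannon entropy. The following is a minimal sketch of that idea in plain Python, not the method from the paper; the bin count, angle grid and synthetic point cloud are illustrative assumptions:

```python
import math
from collections import Counter

def projection_entropy(points, shear, bins=20):
    """Shannon entropy (bits) of the x-projection histogram after
    shearing each point (x, y) to (x - y * tan(shear), y)."""
    xs = [x - y * math.tan(shear) for x, y in points]
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / bins or 1.0  # avoid zero width if all xs collapse
    hist = Counter(min(int((x - lo) / width), bins - 1) for x in xs)
    n = len(xs)
    return -sum(c / n * math.log2(c / n) for c in hist.values())

def estimate_slant(points, angles):
    """Pick the shear angle whose x-projection is most peaked,
    i.e. has minimal entropy."""
    return min(angles, key=lambda a: projection_entropy(points, a))

# Synthetic "strokes": three vertical lines slanted by 30 degrees
pts = [(y * math.tan(math.radians(30)) + x0, y)
       for x0 in (0.0, 5.0, 10.0)
       for y in [i / 10 for i in range(11)]]
angles = [math.radians(d) for d in range(-45, 46, 5)]
print(math.degrees(estimate_slant(pts, angles)))  # ≈ 30
```

At the true slant the sheared strokes collapse onto few histogram bins, so the entropy criterion recovers the angle without any supervised training.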
5) Holzinger, A., Ofner, B., Stocker, C., Valdez, A. C., Schaar, A. K., Ziefle, M. & Dehmer, M. (2013). On Graph Entropy Measures for Knowledge Discovery from Publication Network Data. In: Cuzzocrea, A., Kittl, C., Simos, D. E., Weippl, E. & Xu, L. (eds.) Multidisciplinary Research and Practice for Information Systems, Springer Lecture Notes in Computer Science LNCS 8127. Heidelberg, Berlin: Springer, pp. 354-362. <link for download>
This paper is an output of Andreas Holzinger's visiting professorship at RWTH Aachen, where he was part of a project within the local network of excellence and supervised PhD students. In this work, information-theoretic network measures were evaluated on large publication network data sets. These measures can be understood as graph complexity measures, which evaluate structural complexity according to the underlying concept. During this project we saw that it is very challenging to generalize such results across different measures, as every measure captures structural information differently and hence leads to different entropy values. This calls for exploring the structural interpretation of a graph measure, which has been a challenging problem. Graph-entropy-based approaches are very relevant for data mining purposes. Remember: Graph + Data = Network
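One simple member of this family of measures is the entropy of a graph's degree distribution. The sketch below (plain Python; one of many graph entropy variants, not necessarily the measures evaluated in the paper) shows how two graphs of the same size can yield different entropy values, illustrating why each measure captures structure differently:

```python
import math
from collections import Counter

def degree_entropy(edges):
    """Degree-distribution entropy of an undirected graph:
    H = -sum_k p_k * log2(p_k), where p_k is the fraction of
    vertices having degree k. Higher H = more heterogeneous degrees."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    n = len(deg)
    dist = Counter(deg.values())  # degree value -> number of vertices
    return -sum(c / n * math.log2(c / n) for c in dist.values())

# A star: one hub (degree 4) and four leaves (degree 1) -> two degree classes
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
# A triangle: every vertex has degree 2 -> a single degree class
triangle = [(0, 1), (1, 2), (0, 2)]
print(degree_entropy(star))      # ≈ 0.72 bits
print(degree_entropy(triangle))  # 0.0: perfectly regular graph
```

A regular graph has zero degree entropy, while hub-dominated structures (typical of publication and co-authorship networks) score higher; other graph entropies, e.g. based on automorphism orbits or information functionals, would rank the same graphs differently.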
6) Holzinger, A., Stocker, C., Bruschi, M., Auinger, A., Silva, H., Gamboa, H. & Fred, A. (2012). On Applying Approximate Entropy to ECG Signals for Knowledge Discovery on the Example of Big Sensor Data. In: Huang, R., Ghorbani, A., Pasi, G., Yamaguchi, T., Yen, N. & Jin, B. (eds.) Active Media Technology, Lecture Notes in Computer Science LNCS 7669. Berlin, Heidelberg: Springer, pp. 646-657. <link for download>
This paper was presented at the World Intelligence Congress in Macau, China, and reports on the output of a joint project with the Technical University of Lisbon. The basic idea was that approximate entropy (ApEn) can classify complex data in diverse settings, and we applied this concept to ECG data. The challenge is to gain knowledge with only small ApEn windows while avoiding modeling artifacts. Our central hypothesis was that for intra-subject information (e.g. tendencies, fluctuations, etc.) the ApEn window size can be significantly smaller than for inter-subject classification. For this purpose we introduced the novel term truthfulness to complement the statistical validity of a distribution, and showed how this truthfulness can establish trust in its local properties. Such concepts are very relevant for data mining purposes.
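Approximate entropy (in the sense of Pincus) quantifies the regularity of a time series: it compares how often length-m patterns repeat with how often length-(m+1) patterns do, so a predictable signal scores near zero and an irregular one scores high. A minimal sketch in plain Python follows; this is not the paper's implementation, and the window length m and tolerance r are illustrative (r is often set to 0.2 times the standard deviation of the series):

```python
import math
import random

def apen(series, m=2, r=0.2):
    """Approximate Entropy: phi(m) - phi(m+1), where phi(m) is the average
    log-frequency with which length-m templates recur within tolerance r
    (maximum absolute coordinate difference, self-matches included)."""
    n = len(series)

    def phi(m):
        templates = [series[i:i + m] for i in range(n - m + 1)]
        total = 0.0
        for t in templates:
            matches = sum(
                1 for u in templates
                if max(abs(a - b) for a, b in zip(t, u)) <= r
            )
            total += math.log(matches / len(templates))
        return total / len(templates)

    return phi(m) - phi(m + 1)

regular = [0, 1] * 50                       # perfectly periodic toy signal
rng = random.Random(42)                     # fixed seed for reproducibility
irregular = [rng.random() for _ in range(100)]  # pseudo-random toy signal
print(apen(regular), apen(irregular))       # near 0 vs. clearly positive
```

In the same spirit as the paper's hypothesis, one could then compare how stable such estimates remain as the analysis window over an ECG recording is shrunk, separately for intra-subject and inter-subject comparisons.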