Tutorial “Mining Complex Dynamic Data” at ECML/PKDD 2011

By: Hans-Peter Kriegel, Irene Ntoutsi, Myra Spiliopoulou, Grigorios Tsoumakas, Arthur Zimek

 

 

Motivation:

In recent years, many applications have come to require mining richer data types than conventional database records. The analysis of social networks requires combining activity recordings with content (e.g. resource descriptions and user records). Recommendation engines must consider user ratings, customer transactions, item descriptions and user profiles. Medical applications require combining different kinds of recordings on patients, including historical data on ailments and medication. At the same time, the mining tasks become more elaborate: the data are multi-faceted and adhere to many orthogonal or overlapping concepts; they accumulate or arrive as streams; and they are dynamic, calling for adaptation of the mining models. In this tutorial, we discuss the mining of complex data, with emphasis on learning and adaptation over streaming, dynamic data.

We consider three categories of complex data: data that adhere to multiple overlapping labels, high-dimensional data that contain interesting subspaces, and data that span multiple tables. For each category, we first provide a comprehensive overview of static mining methods and then focus on methods and example applications for dynamic data. For multi-label stream data, we focus on the example application of document (news) categorization; the core methods are stream classification with decision trees, prediction, and ranking. For high-dimensional stream data, we focus on the example applications of bioinformatics and network intrusion detection; the core methods are stream subspace clustering and outlier detection. For multi-relational stream data, we consider two example applications, the analysis of dynamic social networks and the analysis of evolving customer data; the core methods are tensor-based clustering as well as multi-relational clustering and classification.

 

Draft outline:

  • Introduction and overview of complex data types

  • Mining multi-label data – underpinnings, methods and applications (G.T.)

  • Mining high-dimensional data – underpinnings, methods and applications (H.-P.K., I.N., A.Z.)

  • Mining multi-relational data – underpinnings, methods and applications (M.S.)

  • Outlook – summary and future challenges


Target audience:

The target groups are: postgraduate students with a solid background in data mining; researchers who work on conventional stream mining and are confronted with applications involving complex data; and practitioners who maintain applications on complex, dynamic data.


CVs of the Presenters:


Hans-Peter Kriegel is a full professor of database systems and data mining at the Institute for Informatics of the Ludwig Maximilians University Munich, Germany, and has served as department chair or vice chair in recent years. His research interests are in spatial and multimedia database systems, particularly query processing, performance issues, similarity search and high-dimensional indexing, as well as in knowledge discovery and data mining. Hans-Peter Kriegel has served as chair and program committee member of many international database and data mining conferences. He has published over 200 refereed conference and journal papers, including several contributions within the scope of the tutorial. Together with members of his research team, he received the “SIGMOD Best Paper Award” in 1997 and the “DASFAA Best Paper Award” in 2006.


Myra Spiliopoulou is Professor of Business Information Systems at the Faculty of Computer Science, Otto-von-Guericke-University Magdeburg, Germany. Her main research interest is knowledge discovery and adaptation. She has published in international journals and conferences on web mining, text mining, and model monitoring and adaptation over evolving data. She served as PC Co-Chair of ECML/PKDD 2006 and NLDB 2008 and as Tutorials Chair at ICDM 2010, and she is Workshops Chair at ICDM 2011. In addition to several tutorials at ECML/PKDD, she has given tutorials at User Modeling 2007 and KDD 2009.


Irene Ntoutsi is a postdoctoral researcher in the Database Systems and Data Mining group of Prof. Hans-Peter Kriegel at the Ludwig Maximilians University Munich, Germany, and a scholar of the Alexander von Humboldt Foundation. She received a B.Sc. in Computer Engineering (2001) and an M.Sc. in Computer Science (2003), both from the Polytechnic School of the University of Patras, Greece, and a Ph.D. in Data Mining (2008) from the University of Piraeus, Greece. Her main research interests include data mining over dynamic data, with a current focus on high-dimensional data. She has published on pattern comparison and monitoring for different pattern types (clustering, decision trees, frequent itemsets), on spatiotemporal mining, and on high-dimensional mining over dynamic data.


Grigorios Tsoumakas is a Lecturer at the Department of Informatics of the Aristotle University of Thessaloniki (AUTH). He received a B.Sc. in Informatics from AUTH in 1999, an M.Sc. in Artificial Intelligence from the University of Edinburgh in 2000 and a Ph.D. in Informatics from AUTH in 2005. His research interests include machine learning, knowledge discovery and data mining. He is serving or has served as a Programme Committee member for several international conferences, such as PAKDD’09, ICML’10, KDD’10, IJCAI-11 and KDD’11. Over the last four years, he has authored several refereed publications on multi-label learning in international conferences and journals. He also manages Mulan, an open-source software library for multi-label learning. He successfully co-organized a workshop and a tutorial at ECML PKDD 2009 and a second workshop at ICML 2010, and he is co-editing a special issue of MLJ, all in the area of learning from multi-label data.


Arthur Zimek is a scientific assistant in the Database Systems and Data Mining group of Hans-Peter Kriegel at the Ludwig Maximilians University Munich, Germany. He completed his PhD thesis in computer science on “Correlation Clustering” in summer 2008. Together with his co-authors, he received the “Best Paper Honorable Mention Award” at the SIAM International Conference on Data Mining (SDM) in 2008, and he was selected as runner-up for the “SIGKDD Doctoral Dissertation Award” in 2009. He serves as a reviewer for several top database and data mining journals and as a program committee member for data mining and pattern recognition conferences. He has given tutorials at several top database and data mining conferences (e.g. KDD, ICDM, SDM, PAKDD, VLDB).


 

Survey work in the domain of the proposed tutorial: