Geometric Data Analysis (M104)

Ioannis Emiris



A. Randomized and approximate algorithms and heuristics to deal with high-dimensional data. Sampling, random walks, and volume computation of convex regions. Convex hull and Voronoi diagrams. Applications in fintech.

B. Data structures for approximate nearest neighbors in general dimension to address the curse of dimensionality: tree based, Locality-Sensitive Hashing, randomized projections for dimensionality reduction. Clustering: Non-hierarchical methods including k-means, k-medoids, iq-means, variants and improvements.

C. Machine and Deep Learning for Graphs: Graph kernels and community detection; Representation learning for Graphs and node classification; Representation learning for Graphs and Graph classification; Sets embeddings and point clouds

Software development in Python and applications.

Fall 2020: I Emiris and M. Vazirgiannis (AUEB & Ecole Polytechnique, Paris). VIRTUAL meetings: Thursday 2.15-5pm; video stored at -- u

Assessment Methods

The course is structured around lectures by the instructors, and presentations by the students of book chapters, papers, and of their projects. The latter include critical paper readings and/or software projects, and happen at the end of each of the 3 course sections.

Grading is based on these presentations and the corresponding reports handed-in, including implementations and experimental results, when appropriate. Each student undertakes 3 presentations (possibly with related topics), solo or in small groups.


Slides and notes by the instructors, book chapters and papers.

One relevant book (offering slides, videos, etc) is: Mining of Massive Datasets, by J. Leskovec, A. Rajaraman, and J. Ullman.


Schedule 2020

Meetings occur virtually every Thursday at 2.15 - 5 pm. Used to be in room Z (Dept Informatics & Telecoms).
They are broadcast live, and the video is stored, at

01/10. Intro to course [Emiris].

Sampling and random walks [Emiris]
08/10. Convex polytopes, Voronoi diagrams, volumes [Emiris].
15/10. Sampling and Random walks [Emiris].
22/10. Optimization, Financial modeling [Chalkis]
29/10. Student presentations.

Search and Clustering [Emiris]
05/11. Nearest neighbors, JL Lemma [Psarros]
12/11. Dimensionality reduction, LSH, Randomized Projections, Clustering [Emiris]
19/11. Centroid-based Clustering: k-means, and variations [Emiris]
26/11. Student presentations.

Machine and Deep Learning for Graphs [Vazirgiannis]
03/12: Graph kernels, community detection, Grakel Python library
10/12. Representation learning -- node classification: node embeddings, Supervised node embeddings
14/01. Representation learning -- graph classification: graph CNNs, message passing architectures, Graph Autoencoders
21/01. Sets embeddings and point clouds (Deep Sets. ReptheSet)

Course topics

I. Polytopes, Sampling, Random walks, Volume, optimization. Applications to finance, Software tutorial

II. Search and Mining: Tree-based, LSH, Randomized Projections, JL lemma. Clustering. Applications.

III. Graph Neural networks, deep geometric learning, applications in Python.