Thibault Sellam

Pronunciation: "Tee-Bo" - it's French!
email GitHub Bitbucket

I am a postdoc at the WuLab, within the Data Science Institute of Columbia University, in New York. Previously, I was a PhD student at CWI and the University of Amsterdam (the Netherlands). I am interested in data exploration, data mining, human-in-the-loop machine learning and more generally anything at the intersection of data management and AI.


Ongoing projects

Neural Debugging: we are building tools to help people understand and debug deep neural networks. The challenge is to infer what the neurons do from statistics over their activations. So far we have focused on recurrent models (e.g., RNNs, LSTMs), but there is more to come. This is a collaboration with Kevin Lin and Eugene Wu (Columbia).

Precision Interfaces: we are building a system to generate user interfaces automatically by mining SQL query logs and navigational data. This is a collaboration with Haoci Zhang (Tsinghua) and Eugene Wu (Columbia). See our HILDA paper for an overview.

PhD project: Automatic Advisors for Data Exploration

The aim of my PhD was to develop automatic advisors that help users explore and understand their databases. These advisors detect statistical patterns and exploit them to recommend queries and visualizations. For instance, Claude [CIKM 2015] uses feature selection and information theory to recommend views. Charles [CIDR 2015] and its successor Blaeu [TKDE 2015] exploit cluster analysis and subspace search. Also, instead of well-structured databases, users may have to deal with text files or, even worse, tweets. Raimond [ICWE 2015] extracts and organizes quantitative data from social data.

You may access the full book here.

Advisor: Martin Kersten
Committee: Gerard Weikum, Bart Goethals, Maarten de Rijke, Peter Adriaans and Marcel Worring

Some of the ideas in the thesis were implemented in an R package called findviews, available on CRAN. Check it out!

Past Projects

Here are a few other projects I have been involved in:

Social data analysis: we mine query logs and social data. This project has been ongoing since my internship at Microsoft Research in Mountain View (CA), during the summer of 2014. At the time, I was supervised by Omar Alonso.

TimeTrails, Spatiotemporal data warehouses for trajectory exploitation: we develop database technology to mine large volumes of GPS data. This project is a collaboration with TomTom, funded by the Dutch organization COMMIT.

MonetDB: MonetDB is a very fast Open Source column-store.

Other Past Projects

Two US patents on methods to mine social data, with Microsoft: