Current Projects
We are developing BLEURT, a flexible metric for evaluating Natural Language Generation systems. See this blog post for more information, or our GitHub repository if you would like to try it out.
The aim of this project was to build tools to help engineers understand and debug neural networks with statistical methods. See our SysML paper for an overview and our SIGMOD paper for the details. This was a collaboration with Kevin Lin, Ian Huang, Eugene Wu (Columbia) and Carl Vondrick (Google Research).
We built a system to generate user interfaces automatically by mining SQL query logs and navigational data. This was a collaboration with Qianrui Zhang (Columbia), Haoci Zhang (Tsinghua) and Eugene Wu (Columbia). See our SIGMOD paper for an overview.
My PhD project: Automatic Advisors for Data Exploration
The aim of my PhD was to develop automatic advisors to help users explore and understand their databases. These advisors detect statistical patterns and exploit them to recommend queries and visualizations. For instance, Blaeu [TKDE 2015] uses cluster analysis and subspace search to create navigable maps of the data.
Some of the ideas in the thesis were implemented in an R package called findviews, available on CRAN. Check it out!
Social Data Analytics
The aim of this project was to model search query logs and Twitter data to detect experts on social media. The project has been ongoing since my internship at Microsoft Research in Mountain View, CA, in the summer of 2014, where I was supervised by Omar Alonso at Bing.
Timetrails - Traffic Data Analysis
The aim of this project was to develop database and forecasting technology to analyze large repositories of GPS data. This project was a collaboration with TomTom.
MonetDB is a very fast open-source column store.