Automatic Clustering At Snowflake

For partitioned tables, maintaining good clustering on frequently filtered dimensions is critical for partition pruning and query performance. Naive methods of maintaining good clustering are usually expensive, especially when the clustering dimensions differ from the natural dimension along which the data is loaded. The tradeoff between the cost of reorganizing the data and the benefit to query time usually tapers off after a certain point. Approximate clustering is cheaper to maintain while still delivering good pruning performance. In this talk, I will present Snowflake’s clustering capabilities, including our algorithm for incremental maintenance of approximate clustering of partitioned tables, as well as our infrastructure for performing such maintenance automatically. I will also cover some real-world problems we ran into and our solutions.
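
To make the pruning argument concrete, here is a minimal sketch of min/max partition pruning on a single integer column. It is an illustration under stated assumptions, not Snowflake's algorithm: the helper names (`build_partitions`, `partitions_scanned`), the partition size of 100 rows, and the coarse bucketing used as a stand-in for approximate clustering are all hypothetical choices made for this example.

```python
import random

def build_partitions(values, size):
    """Split values into fixed-size partitions and keep only (min, max) metadata."""
    chunks = (values[i:i + size] for i in range(0, len(values), size))
    return [(min(chunk), max(chunk)) for chunk in chunks]

def partitions_scanned(partitions, lo, hi):
    """Count partitions whose [min, max] range overlaps the filter range [lo, hi]."""
    return sum(1 for pmin, pmax in partitions if pmax >= lo and pmin <= hi)

rng = random.Random(0)
values = list(range(10_000))
rng.shuffle(values)  # load order is unrelated to the filtered column

# Hypothetical layouts of the same data, 100 rows per partition.
unclustered = build_partitions(values, 100)
# Coarse bucketing only (no full sort): an assumed stand-in for approximate clustering.
approximate = build_partitions(sorted(values, key=lambda v: v // 1000), 100)
# Fully sorted on the filtered column.
clustered = build_partitions(sorted(values), 100)

# Filter: 2000 <= value <= 2099
scanned_unclustered = partitions_scanned(unclustered, 2_000, 2_099)
scanned_approximate = partitions_scanned(approximate, 2_000, 2_099)
scanned_clustered = partitions_scanned(clustered, 2_000, 2_099)  # exactly 1 partition

print(scanned_unclustered, scanned_approximate, scanned_clustered)
```

In this toy setup the fully sorted layout scans a single partition, the coarsely bucketed layout scans at most the ten partitions of one bucket, and the shuffled layout has to scan far more, which mirrors the abstract's point that approximate clustering can recover much of the pruning benefit without a full reorganization.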


Prasanna Rajaperumal

Developer @SnowflakeDB

Prasanna Rajaperumal is a senior engineer at Snowflake, working on the query engine of the Snowflake database. Before Snowflake, he worked on building next-generation data infrastructure at Uber. Over the last decade, he has been building data systems that scale at Cloudera, Cisco and a few other...

Windsor, 5th flr.


Modern CS in the Real World


Database Architecture, Silicon Valley


From the same track

SESSION + Live Q&A Protocols

Using Randomized Communication for Robust, Scalable Systems

Three key needs that any distributed system must address are discovery, fault detection, and load balancing among its components. Satisfying these needs in a robust and scalable manner is challenging, but it turns out randomized communication can help with each of them. In this talk, we will...

Jon Currey

Director of Research @HashiCorp


Automated Test Design and Bug Fixing @Facebook

The talk describes the deployment of Sapienz, a system for automated test case design based on Search Based Software Engineering (SBSE). Sapienz has been deployed at Facebook since October 2017 to design test cases, localise and triage crashes to developers, and monitor their fixes. It also describes...

Nadia Alshahwan

Software Engineer @Facebook

SESSION + Live Q&A Clojure

Functional Composition

Marc Andreessen famously observed that "software is eating the world". As an increasing proportion of our culture becomes codified (literally), we need to consider how to authentically express theory and insights from diverse fields in our software. This must account for domains besides business...

Chris Ford

Technical Principal @ThoughtWorksESP

SESSION + Live Q&A Quantum Computing

Using Quantum Computers to Simulate Chemistry

Quantum computing is unmistakably becoming a thing. With IBM’s announcement of their quantum computing cloud service at CES in January and Google’s announcement last year of their 72-qubit Bristlecone processor, suddenly quantum computing seems to be entering into the Enterprise. In this...

Peter Morgan

AI Community Leader & Founder and CEO Deep Learning Partnership
