Big Data Analysis with SQL


This talk explains how you can build your own scalable data processing system with just a few open source tools: DBT, Trino, Iceberg and MinIO. It also explains why SQL is still the best language for data analysis!

Have you ever used PostgreSQL to store *massive* amounts of data? Did your queries take *minutes* or even *hours* to compute?

The field of data analysis is complex and many solutions are available, so I will show how to compare systems with each other. You will learn why databases like PostgreSQL or MongoDB are not suited to computing analytics queries on huge amounts of data. Then we will look at data analysis architectures that can scale to terabytes of data, and I will explain why they are better in those situations.

At the end of the talk you will know which solution is best suited for your next large-scale data project!