Playlist "Swiss Python Summit 2024"

More Than Pixels – Unlock your image data with Vision-Language Models

Johannes Kolbe

Join us on two vision-language adventures! With Vision-Language Models (VLMs) showing us the way, we'll uncover the information hidden inside big image collections. Who knows which forgotten gems await us?

In the first part, we'll use CLIP and FAISS to go on a treasure hunt in your photo collection. You'll learn how to search through millions of images in no time, using natural language. Bye-bye endless scrolling, hour-long tagging, and frustrating folder searches.

In the second part, we'll harness the power of VLMs to caption images, translating pixels into words. Then we'll use the BERTopic library to reveal even deeper insights into your photo collections.

By the end of this talk, you'll have the knowledge and tools to unlock new insights, identify patterns, and make your image data work harder for you. This talk is aimed at an intermediate audience: some background in computer vision, NLP, or deep learning in general will help.
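To give a rough feel for the first part, here is a minimal sketch of the retrieval idea behind a CLIP and FAISS search. It is not the talk's code. It assumes that images and the text query have already been embedded into the same vector space (CLIP's job) and replaces the FAISS index with a plain Python loop that ranks images by cosine similarity, which is the same ranking an exact inner-product index such as FAISS's IndexFlatIP produces on normalized vectors. The file names, the 3-dimensional vectors, and the helper names `normalize` and `search` are all made up for illustration.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so the inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def search(query_vec, image_vecs, k=3):
    """Return the k (image_id, score) pairs with the highest cosine similarity."""
    q = normalize(query_vec)
    scored = []
    for image_id, vec in image_vecs.items():
        v = normalize(vec)
        score = sum(a * b for a, b in zip(q, v))
        scored.append((image_id, score))
    # Highest similarity first; a real index avoids scoring every image like this.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings (assumption): real CLIP vectors have hundreds of dimensions.
IMAGE_EMBEDDINGS = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "mountain.jpg": [0.1, 0.9, 0.1],
    "city.jpg": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of a text query (assumption): in practice this would
# come from the CLIP text encoder.
QUERY_EMBEDDING = [0.8, 0.2, 0.1]

results = search(QUERY_EMBEDDING, IMAGE_EMBEDDINGS, k=2)
for image_id, score in results:
    print(f"{image_id}: {score:.3f}")
```

With millions of images, the loop over every vector becomes the bottleneck, which is where a dedicated similarity-search library such as FAISS comes in.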