Python-centric Feature Stores | PyData London 2022

Jim Dowling Presents:

Python-centric Feature Stores

Most enterprise data used by Data Scientists to train machine learning models is tabular data that comes from data warehouses and data lakes. The recent growth in popularity of the modern data stack, based on lakehouses like Snowflake, Delta Lake, BigQuery, and Redshift, has driven wider use of SQL-centric tools for data engineers, such as dbt. However, Data Scientists' language of choice is Python. How do we square this circle?

In this talk, Jim Dowling investigates the role of the Feature Store for machine learning in enabling Python-native access to enterprise data, both for training models and for serving features to them. In particular, Dowling describes the problem of creating point-in-time consistent training data, from Python, out of features spread over many tables in a SQL backend.
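To make the point-in-time problem concrete, here is a minimal sketch that is not taken from the talk and does not use any feature store's API. It assumes three hypothetical tables (`labels`, `spend`, `logins`) in an in-memory SQLite database standing in for the SQL backend, and a hypothetical helper `latest_value`. For each label row, it selects the most recent feature value recorded at or before the label's event time, so that no future information leaks into the training data.

```python
import sqlite3

# In-memory SQLite stands in for the SQL backend (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE labels (customer_id INTEGER, event_time TEXT, label INTEGER);
CREATE TABLE spend  (customer_id INTEGER, feature_time TEXT, avg_spend REAL);
CREATE TABLE logins (customer_id INTEGER, feature_time TEXT, login_count INTEGER);

INSERT INTO labels VALUES
  (1, '2022-01-10', 0), (1, '2022-02-10', 1), (2, '2022-01-20', 0);
INSERT INTO spend VALUES
  (1, '2022-01-01', 10.0), (1, '2022-02-01', 20.0),
  (2, '2022-01-15', 5.0),  (2, '2022-02-01', 7.0);
INSERT INTO logins VALUES
  (1, '2022-01-05', 3), (1, '2022-01-15', 4), (2, '2022-01-25', 9);
""")


def latest_value(table, column):
    """Build a correlated subquery returning the newest value of `column`
    in `table` that was recorded at or before the label's event time.

    Hypothetical helper: table and column names are hard-coded by the
    caller here; never interpolate untrusted input into SQL like this.
    """
    return f"""(SELECT f.{column} FROM {table} AS f
                WHERE f.customer_id = l.customer_id
                  AND f.feature_time <= l.event_time
                ORDER BY f.feature_time DESC
                LIMIT 1) AS {column}"""


# ISO-8601 date strings compare correctly as text, so <= works on them.
query = f"""
SELECT l.customer_id, l.event_time, l.label,
       {latest_value('spend', 'avg_spend')},
       {latest_value('logins', 'login_count')}
FROM labels AS l
ORDER BY l.customer_id, l.event_time
"""

training_rows = conn.execute(query).fetchall()
for row in training_rows:
    print(row)
```

Customer 2's label on 2022-01-20 picks up the spend value from 2022-01-15 rather than the later one from 2022-02-01, and gets no login count at all because the only login record postdates the label. A plain join on `customer_id` would have mixed in those future values.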

www.pydata.org

PyData is an educational program of NumFOCUS, a 501(c)(3) non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.
PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.

00:00 Welcome!

Want to help add timestamps to our YouTube videos to help with discoverability? Find out more here: https://github.com/numfocus/YouTubeVi...
