🚀 Lead Data Engineer | Spark | Scala | Python | Azure | Databricks
We all know the importance of optimizing the Spark execution plan. And we all know how diffucult it is to achieve the best execution workflow. Caching...
Photo by Ant Rozetsky on Unsplash Fault tolerance is an important requirement in distributed systems. Apache Spark provides robust fault tolerance...
Photo by Jim Wilson on Unsplash Have you ever wanted to pivot a Spark dataframe to change rows into columns or vice versa? Let me tell you that Spark...
https://unsplash.com/fr/photos/ItLQ6scEmz0 In the realm of distributed data processing, Apache Spark stands as a prominent figure, transforming the...
Unsplash image of Maarten van den Heuvel In today’s data-driven landscape, managing and optimizing data storage is a critical factor in achieving...
Delta Lake is an open-source storage layer that brings reliability to data lakes. Delta tables are a core abstraction in Delta Lake, which are similar...