Delta Lake is an open-source storage layer that brings reliability to data lakes. Delta tables are its core concept: they provide data versioning, transactional reads and writes, schema enforcement, and metadata management. In this article, we will focus on Delta table constraints: what they are, how they work, and how to implement them, with code examples.
What are Delta Table Constraints?
Delta table constraints are rules that restrict the values that can be inserted or updated in a Delta table. Enforced constraints protect data integrity by rejecting any write that breaks a data quality rule, while informational constraints only document the intended rule. In this article they are grouped into two forms: column constraints and table constraints.
Column Constraints
Column constraints are rules that apply to the column values of each individual row. Delta tables handle the usual column constraints as follows:
- NOT NULL: This constraint ensures that a column has a value in every row. A write that contains a null in that column fails.
- UNIQUE: Not supported. Delta Lake has no UNIQUE constraint at the time of writing, so uniqueness has to be verified by the pipeline itself.
- CHECK: This constraint allows you to specify a Boolean expression that must evaluate to true for each row in the table. If it does not, an error is raised and the whole write is rejected.
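To make the enforcement semantics concrete, here is a minimal plain-Python model of an enforced constraint: each constraint is a predicate over one row, and a batch is committed only if every row passes every constraint. This is an illustration of the behavior, not Delta Lake code; `Constraint`, `ConstraintViolation`, `not_null`, `check`, and `write_batch` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class ConstraintViolation(Exception):
    """Raised when a row fails a constraint; nothing from the batch is kept."""


@dataclass(frozen=True)
class Constraint:
    name: str
    predicate: Callable[[Row], bool]


def not_null(column: str) -> Constraint:
    # Passes only when the column is present and not None.
    return Constraint(f"{column}_not_null", lambda row: row.get(column) is not None)


def check(name: str, predicate: Callable[[Row], bool]) -> Constraint:
    # A named Boolean expression evaluated against a single row.
    return Constraint(name, predicate)


def write_batch(table: List[Row], batch: List[Row], constraints: List[Constraint]) -> None:
    # Validate every row first, so that a failure leaves the table untouched.
    for row in batch:
        for constraint in constraints:
            if not constraint.predicate(row):
                raise ConstraintViolation(f"{constraint.name} violated by row {row}")
    table.extend(batch)


table: List[Row] = []
rules = [not_null("id"), check("age_positive", lambda r: r["age"] > 0)]

write_batch(table, [{"id": 1, "age": 30}], rules)
try:
    # The second row is invalid, so the first row of this batch is not kept either.
    write_batch(table, [{"id": 2, "age": 41}, {"id": 3, "age": -20}], rules)
except ConstraintViolation as err:
    print(err)
print(len(table))  # 1
```

The point of the model is the all-or-nothing behavior: one bad row rejects the whole write, which matches the transactional writes described earlier.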
Table Constraints
Table constraints apply to a column or a combination of columns and describe how rows relate to each other. Open-source Delta Lake does not provide them at the time of writing. On Databricks, tables managed by Unity Catalog accept the following constraints as informational metadata, which means they are recorded but not checked on write:
- PRIMARY KEY: This constraint identifies a column or a set of columns that is meant to uniquely identify each row in the table. The key columns must be declared NOT NULL, but uniqueness is not verified.
- FOREIGN KEY: This constraint identifies a column or a set of columns in a table that refers to the primary key of another table. The engine does not verify that the referenced values exist.
Implementing Delta Table Constraints
Let’s look at some code examples that demonstrate how to implement Delta table constraints.
Column Constraints Example
First, we will create a Delta table with a NOT NULL column constraint:
CREATE TABLE delta_table (
id INT NOT NULL,
name STRING
)
USING delta;
If we try to insert a row with a null value in the id column, we will get an error:
INSERT INTO delta_table VALUES (NULL, 'John');
Output (the exact wording varies by Delta Lake version):
NOT NULL constraint violated for column: id.
Next, uniqueness. Delta has no UNIQUE constraint, so the closest equivalent is to create the table without one and query for duplicates:
CREATE OR REPLACE TABLE delta_table (
id INT,
name STRING
)
USING delta;
Inserting a duplicate value in the id column succeeds, because nothing enforces uniqueness:
INSERT INTO delta_table VALUES (1, 'John'), (1, 'Jane');
A query such as the following reports the duplicated keys and can be run as a validation step after the write:
SELECT id, COUNT(*) AS occurrences FROM delta_table GROUP BY id HAVING COUNT(*) > 1;
For the two rows above, it returns id 1 with 2 occurrences.
Finally, we will create a Delta table with a CHECK constraint. In Delta Lake, a CHECK constraint is added to an existing table with ALTER TABLE and must be given a name:
CREATE OR REPLACE TABLE delta_table (
id INT,
age INT
)
USING delta;
ALTER TABLE delta_table ADD CONSTRAINT age_positive CHECK (age > 0);
If we try to insert a row with a negative value in the age column, we will get an error:
INSERT INTO delta_table VALUES (1, -20);
Output (the exact wording varies by Delta Lake version):
CHECK constraint age_positive (age > 0) violated by row with values:
- age : -20
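On Delta Lake, a CHECK constraint is typically attached to an existing table with `ALTER TABLE ... ADD CONSTRAINT ... CHECK (...)`. When constraints are managed from a Python job, the statement can be built by a small helper and then passed to `spark.sql`. The sketch below only builds the string; the helper name `add_check_constraint_sql` and the constraint name `age_positive` are my own, and the `spark` session is assumed to exist elsewhere.

```python
import re

# Plain identifiers only, to avoid building a malformed or unsafe statement.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def add_check_constraint_sql(table: str, name: str, expression: str) -> str:
    """Build an ALTER TABLE ... ADD CONSTRAINT ... CHECK statement.

    The expression is inserted as-is, so it must come from trusted code.
    """
    for identifier in (table, name):
        if not _IDENTIFIER.fullmatch(identifier):
            raise ValueError(f"unsafe identifier: {identifier!r}")
    return f"ALTER TABLE {table} ADD CONSTRAINT {name} CHECK ({expression})"


statement = add_check_constraint_sql("delta_table", "age_positive", "age > 0")
print(statement)
# ALTER TABLE delta_table ADD CONSTRAINT age_positive CHECK (age > 0)

# With a Delta-enabled SparkSession named `spark` (assumed, not created here):
# spark.sql(statement)
```

Keep in mind that Delta validates the existing rows before it accepts a new CHECK constraint, so the statement fails if the table already contains rows that violate the expression.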
Table Constraints Example
First, we will create two Delta tables: customers and orders. The customers table has a primary key constraint on the id column, and the orders table has a foreign key constraint on the customer_id column that refers to the id column in the customers table. This syntax requires Databricks with Unity Catalog; open-source Delta Lake does not accept it at the time of writing:
CREATE TABLE customers (
id INT NOT NULL,
name STRING,
CONSTRAINT customers_pk PRIMARY KEY (id)
)
USING delta;
CREATE TABLE orders (
order_id INT,
order_date DATE,
customer_id INT,
CONSTRAINT orders_customers_fk FOREIGN KEY (customer_id) REFERENCES customers(id)
)
USING delta;
Primary key and foreign key constraints on Delta tables are informational, so if we insert a row in the orders table with a non-existent customer_id value, the write succeeds and no error is raised:
INSERT INTO orders VALUES (1, DATE'2021-01-01', 100);
The constraints still document the intended relationship for anyone reading the catalog, but referential integrity has to be checked by the pipeline.
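Where key constraints are informational rather than enforced, which to my knowledge is the case for Delta tables on Databricks at the time of writing, the pipeline can run the two checks itself before writing: a duplicate-key check for the primary key and an anti-join for the foreign key. The sketch below does this on plain Python rows; `duplicate_keys` and `orphan_rows` are hypothetical helper names, and a real job would express the same logic on DataFrames.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Set

Row = Dict[str, Any]


def duplicate_keys(rows: Iterable[Row], key: str) -> List[Any]:
    """Return the key values that appear more than once (primary key check)."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


def orphan_rows(children: Iterable[Row], fk: str, parent_keys: Set[Any]) -> List[Row]:
    """Return child rows whose foreign key has no match in the parent (anti-join).

    A missing (None) foreign key is allowed, as in standard SQL.
    """
    return [
        row for row in children
        if row[fk] is not None and row[fk] not in parent_keys
    ]


customers = [{"id": 1, "name": "John"}, {"id": 2, "name": "Jane"}]
orders = [
    {"order_id": 1, "customer_id": 100},
    {"order_id": 2, "customer_id": 2},
]

print(duplicate_keys(customers, "id"))
# []
print(orphan_rows(orders, "customer_id", {c["id"] for c in customers}))
# [{'order_id': 1, 'customer_id': 100}]
```

A non-empty result from either function is the signal to stop the write or to route the offending rows elsewhere.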
Enhanced Business Logic Constraints
❓ What if my business needs to validate different constraints? For instance emptiness, length constraints on specific columns, positive/negative values… How can I express my constraint to the Delta engine?
Rules that can be written as a Boolean expression over a single row, such as non-emptiness, a maximum length, or a sign, fit in a CHECK constraint. Unfortunately, on the day I’m writing this story, rules that span several rows or several tables (uniqueness, referential integrity, distribution checks) cannot be enforced by the Delta framework.
It remains possible to incorporate these custom constraints into the Writer component of the data processing stage, based on your business logic. I have written a story related to this.
[Unleashing the Power of Deequ for Efficient Spark Data Analysis](https://medium.com/@omarlaraqui/unleashing-the-power-of-deequ-for-efficient-spark-data-analysis-be0f490cce54)
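The writer-side validation mentioned above could be sketched as follows in plain Python, without any library: each business rule is a named predicate, and the writer splits a batch into rows that pass every rule and rows that fail at least one. The rule names, the `name` and `amount` columns, and the `split_by_rules` helper are all hypothetical and only serve the illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
Rule = Callable[[Row], bool]

# Hypothetical business rules: emptiness, length, and sign.
RULES: Dict[str, Rule] = {
    "name_not_empty": lambda r: bool((r.get("name") or "").strip()),
    "name_max_50_chars": lambda r: len(r.get("name") or "") <= 50,
    "amount_positive": lambda r: r.get("amount") is not None and r["amount"] > 0,
}


def split_by_rules(
    rows: List[Row], rules: Dict[str, Rule]
) -> Tuple[List[Row], List[Tuple[Row, List[str]]]]:
    """Separate rows that pass every rule from rows that fail at least one.

    Rejected rows are returned together with the names of the rules they broke.
    """
    valid: List[Row] = []
    rejected: List[Tuple[Row, List[str]]] = []
    for row in rows:
        failed = [name for name, rule in rules.items() if not rule(row)]
        if failed:
            rejected.append((row, failed))
        else:
            valid.append(row)
    return valid, rejected


batch = [
    {"name": "John", "amount": 12.5},
    {"name": "   ", "amount": 3.0},
    {"name": "Jane", "amount": -1.0},
]
valid, rejected = split_by_rules(batch, RULES)
print(valid)
# [{'name': 'John', 'amount': 12.5}]
for row, failed in rejected:
    print(row, failed)
# {'name': '   ', 'amount': 3.0} ['name_not_empty']
# {'name': 'Jane', 'amount': -1.0} ['amount_positive']
```

Unlike an engine-level constraint, which rejects the whole write, this design lets the writer keep the valid rows and send the rejected ones to a quarantine location along with the reason. Which behavior is preferable depends on the business logic.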
Conclusion
Delta table constraints are an essential feature of Delta Lake for data quality and consistency. Column constraints control the values in a single column, while table constraints describe relationships across a combination of columns. Delta Lake enforces NOT NULL and CHECK; PRIMARY KEY and FOREIGN KEY are available on Databricks as informational constraints, and UNIQUE is not supported. Rules that the engine cannot enforce can be implemented in the pipeline, for example with third-party libraries like Deequ.
With this knowledge, you can create Delta tables with robust data quality rules that help you to maintain data integrity in your data lake.