Manage Validation Definitions
A Validation Definition is a fixed reference that links a Batch of data to an Expectation Suite. It can be run by itself to validate the referenced data against the associated Expectations for testing or data exploration. Multiple Validation Definitions can also be provided to a Checkpoint which, when run, executes Actions based on the Validation Results for each provided Validation Definition.
Prerequisites
- An installation of Python, version 3.8 to 3.11.
- An installation of GX 1.0.
- A preconfigured Data Context.
- A preconfigured Data Source and Data Asset connected to your data.
- A preconfigured Expectation Suite populated with Expectations.
Create a Validation Definition
Procedure
- Import the ValidationDefinition class from the GX library.
from great_expectations.core import ValidationDefinition
In this example the variable context is your Data Context.
- Get an Expectation Suite with Expectations. This can be an existing Expectation Suite retrieved from your Data Context or a new Expectation Suite in your current code.
In this example the variable suite is your Expectation Suite.
- Get an existing Batch Definition, or create a new one, that describes the data to associate with the Expectation Suite.
In this example the variable batch_definition is your Batch Definition.
- Create a ValidationDefinition instance using the Batch Definition, Expectation Suite, and a unique name.
definition_name = "My Validation Definition"
validation_definition = ValidationDefinition(data=batch_definition, suite=suite, name=definition_name)
- Optional. Save the Validation Definition to your Data Context.
validation_definition = context.validation_definitions.add(validation_definition)
You can add a Validation Definition to your Data Context at the same time as you create it with the following code:
definition_name = "My second Validation Definition"
validation_definition = context.validation_definitions.add(ValidationDefinition(data=batch_definition, suite=suite, name=definition_name))
Sample code
import great_expectations as gx
from great_expectations.core import ValidationDefinition
context = gx.get_context()
# Retrieve an existing Expectation Suite.
existing_suite_name = "my_expectation_suite"
suite = context.suites.get(name=existing_suite_name)
# Retrieve an existing Batch Definition through its Data Source and Data Asset.
existing_data_source_name = "my_datasource"
existing_data_asset_name = "my_data_asset"
existing_batch_definition_name = "my_batch_definition"
batch_definition = (
    context.data_sources.get(existing_data_source_name)
    .get_asset(existing_data_asset_name)
    .get_batch_definition(existing_batch_definition_name)
)
# Create the Validation Definition, then save it to the Data Context.
definition_name = "My Validation Definition"
validation_definition = ValidationDefinition(data=batch_definition, suite=suite, name=definition_name)
validation_definition = context.validation_definitions.add(validation_definition)
# Alternatively, create and save a Validation Definition in one step.
new_definition_name = "My second Validation Definition"
validation_definition = context.validation_definitions.add(ValidationDefinition(data=batch_definition, suite=suite, name=new_definition_name))
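Because each Validation Definition needs a unique name, a setup script that is run more than once may want to reuse a saved definition rather than add it a second time. The following is a minimal sketch of that pattern and is not part of the GX API: get_or_add is a hypothetical helper that relies only on the context.validation_definitions calls used on this page (iteration, get(name=...), and add(...)). The _FakeStore class and the SimpleNamespace objects are stand-ins so that the snippet runs without a GX project.

```python
from types import SimpleNamespace


def get_or_add(context, name, build):
    """Return the saved definition called `name`, or add the one built by `build()`.

    Hypothetical helper: it only iterates context.validation_definitions and
    calls get(name=...) and add(...) on it.
    """
    store = context.validation_definitions
    existing_names = {definition.name for definition in store}
    if name in existing_names:
        return store.get(name=name)
    return store.add(build())


class _FakeStore:
    """Stand-in for context.validation_definitions (demo assumption only)."""

    def __init__(self):
        self._items = {}

    def __iter__(self):
        return iter(list(self._items.values()))

    def get(self, name):
        return self._items[name]

    def add(self, definition):
        self._items[definition.name] = definition
        return definition


def build_definition():
    # With GX this would return ValidationDefinition(data=..., suite=..., name=...).
    return SimpleNamespace(name="My Validation Definition")


context = SimpleNamespace(validation_definitions=_FakeStore())
first = get_or_add(context, "My Validation Definition", build_definition)
second = get_or_add(context, "My Validation Definition", build_definition)
print(first is second)  # True: the second call reuses the saved definition
```

With a real Data Context, pass your own context and a function that builds the ValidationDefinition from your Batch Definition and Expectation Suite.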
List available Validation Definitions
Procedure
In this example the variable context is your Data Context.
- Use the Data Context to retrieve and print the names of the available Validation Definitions:
validation_definition_names = [definition.name for definition in context.validation_definitions]
print(validation_definition_names)
Sample code
import great_expectations as gx
context = gx.get_context()
# Print the name of each Validation Definition saved in the Data Context.
for definition in context.validation_definitions:
    print(definition.name)
Get a Validation Definition by name
Procedure
In this example the variable context is your Data Context.
- Use the Data Context to request the Validation Definition.
definition_name = "My Validation Definition"
validation_definition = context.validation_definitions.get(name=definition_name)
Sample code
import great_expectations as gx
context = gx.get_context()
definition_name = "My Validation Definition"
validation_definition = context.validation_definitions.get(name=definition_name)
Get Validation Definitions by attributes
Procedure
In this example the variable context is your Data Context.
- Determine the attributes to filter on.
Validation Definitions associate an Expectation Suite with a Batch Definition. This means that valid attributes to filter on include the attributes for the Expectation Suite, as well as the attributes for the Batch Definition, the Batch Definition's Data Asset, and the Data Asset's Data Source.
- Use a list comprehension to return all Validation Definitions that match the filtered attributes.
For example, you can retrieve all Validation Definitions that include a specific Expectation Suite by filtering on the Expectation Suite name:
existing_expectation_suite_name = "my_expectation_suite"
validation_definitions_for_suite = [
    definition for definition in context.validation_definitions
    if definition.suite.name == existing_expectation_suite_name
]
Or you could return all Validation Definitions involving a specific Data Asset by filtering on the Data Source and Data Asset names:
existing_data_source_name = "my_data_source"
existing_data_asset_name = "my_data_asset"
validation_definitions_for_asset = [
    definition for definition in context.validation_definitions
    if definition.data_source.name == existing_data_source_name
    and definition.asset.name == existing_data_asset_name
]
Sample code
import great_expectations as gx
context = gx.get_context()
# Filter on the name of the Expectation Suite.
existing_expectation_suite_name = "my_expectation_suite"
validation_definitions_for_suite = [
    definition for definition in context.validation_definitions
    if definition.suite.name == existing_expectation_suite_name
]
# Filter on the names of the Data Source and Data Asset.
existing_data_source_name = "my_data_source"
existing_data_asset_name = "my_data_asset"
validation_definitions_for_asset = [
    definition for definition in context.validation_definitions
    if definition.data_source.name == existing_data_source_name
    and definition.asset.name == existing_data_asset_name
]
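If you filter on several attributes in different places, the list comprehensions can be folded into one reusable function. The sketch below is one way to do that under stated assumptions: filter_definitions is a hypothetical helper, not a GX function, and it reads only the suite.name, data_source.name, and asset.name attributes used in the filters in this section. The SimpleNamespace objects are stand-ins for Validation Definitions so that the snippet runs on its own.

```python
from types import SimpleNamespace


def filter_definitions(definitions, suite_name=None, data_source_name=None, asset_name=None):
    """Return the definitions that match every criterion that is not None."""
    matches = []
    for definition in definitions:
        if suite_name is not None and definition.suite.name != suite_name:
            continue
        if data_source_name is not None and definition.data_source.name != data_source_name:
            continue
        if asset_name is not None and definition.asset.name != asset_name:
            continue
        matches.append(definition)
    return matches


def _stand_in(name, suite, data_source, asset):
    """Build a stand-in object exposing the attributes the filters read."""
    return SimpleNamespace(
        name=name,
        suite=SimpleNamespace(name=suite),
        data_source=SimpleNamespace(name=data_source),
        asset=SimpleNamespace(name=asset),
    )


definitions = [
    _stand_in("a", "my_expectation_suite", "my_data_source", "my_data_asset"),
    _stand_in("b", "other_suite", "my_data_source", "my_data_asset"),
    _stand_in("c", "my_expectation_suite", "other_source", "other_asset"),
]

by_suite = filter_definitions(definitions, suite_name="my_expectation_suite")
by_asset = filter_definitions(
    definitions, data_source_name="my_data_source", asset_name="my_data_asset"
)
print([d.name for d in by_suite])  # ['a', 'c']
print([d.name for d in by_asset])  # ['a', 'b']
```

With a real Data Context you would pass context.validation_definitions as the first argument.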
Delete a Validation Definition
Procedure
In this example the variable context is your Data Context.
- Get the Validation Definition to delete.
In this example the variable validation_definition is the Validation Definition to delete.
- Use the Data Context to delete the Validation Definition:
context.validation_definitions.delete(name=validation_definition.name)
You can provide the Validation Definition's name directly as a string. However, retrieving the Validation Definition from your Data Context and passing its name attribute guards against typos and differences in capitalization, and confirms that the Validation Definition exists before you try to delete it.
Sample code
import great_expectations as gx
context = gx.get_context()
definition_name = "My Validation Definition"
validation_definition = context.validation_definitions.get(name=definition_name)
context.validation_definitions.delete(name=validation_definition.name)
Duplicate a Validation Definition
Procedure
Validation Definitions are intended to be fixed references that link a set of data to an Expectation Suite. As such, they do not include an update method. However, multiple Validation Definitions with the same Batch Definition and Expectation Suite can exist as long as each has a unique name.
Although an existing Validation Definition cannot be renamed, you can create a duplicate of it under a new name.
- Import the GX library and the ValidationDefinition class:
import great_expectations as gx
from great_expectations.core import ValidationDefinition
In this example the variable context is your Data Context.
- Get the original Validation Definition.
In this example the variable original_validation_definition is the original Validation Definition.
- Get the Batch Definition and Expectation Suite from the original Validation Definition:
original_suite = original_validation_definition.suite
original_batch = original_validation_definition.batch_definition
- Add a new Validation Definition to the Data Context using the same Batch Definition and Expectation Suite as the original:
new_definition_name = "my_validation_definition"
new_validation_definition = ValidationDefinition(
    data=original_batch,
    suite=original_suite,
    name=new_definition_name
)
context.validation_definitions.add(new_validation_definition)
- Optional. Delete the original Validation Definition.
Sample code
import great_expectations as gx
from great_expectations.core import ValidationDefinition
context = gx.get_context()
# Get the original Validation Definition and its components.
original_definition_name = "my_vldtn_dfntn"
original_validation_definition = context.validation_definitions.get(original_definition_name)
original_suite = original_validation_definition.suite
original_batch = original_validation_definition.batch_definition
# Create and save a duplicate under the new name.
new_definition_name = "my_validation_definition"
new_validation_definition = ValidationDefinition(
    data=original_batch,
    suite=original_suite,
    name=new_definition_name
)
context.validation_definitions.add(new_validation_definition)
# Optional. Delete the original Validation Definition by name.
context.validation_definitions.delete(original_validation_definition.name)
Run a Validation Definition
Procedure
- Create a new or retrieve an existing Validation Definition.
- Execute the Validation Definition's run() method:
validation_result = validation_definition.run()
Validation Results are automatically saved in your Data Context when a Validation Definition's run() method is called. For convenience, the run() method also returns the Validation Results as an object you can review.
- Review the Validation Results:
print(validation_result)
GX Cloud users can view the Validation Results in the GX Cloud UI by following the URL provided with:
print(validation_result.result_url)
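Sample code
import great_expectations as gx
context = gx.get_context()
existing_validation_definition_name = "my_validation_definition"
validation_definition = context.validation_definitions.get(existing_validation_definition_name)
# Run the Validation Definition and review the returned Validation Results.
validation_result = validation_definition.run()
print(validation_result)
# GX Cloud users: print the URL of the Validation Results in the GX Cloud UI.
print(validation_result.result_url)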
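Printing the full Validation Results can produce a lot of output. If you only need a pass/fail overview, a short summary function may be enough. The sketch below assumes that the returned object exposes a success flag and a results list whose items each have their own success flag; summarize is a hypothetical helper, and the SimpleNamespace object stands in for a real Validation Result so that the snippet runs without GX.

```python
from types import SimpleNamespace


def summarize(validation_result):
    """Return overall success plus passed and failed counts."""
    outcomes = [result.success for result in validation_result.results]
    passed = sum(1 for ok in outcomes if ok)
    return {
        "success": validation_result.success,
        "evaluated": len(outcomes),
        "passed": passed,
        "failed": len(outcomes) - passed,
    }


# Stand-in for the object returned by validation_definition.run().
validation_result = SimpleNamespace(
    success=False,
    results=[
        SimpleNamespace(success=True),
        SimpleNamespace(success=True),
        SimpleNamespace(success=False),
    ],
)

summary = summarize(validation_result)
print(summary)  # {'success': False, 'evaluated': 3, 'passed': 2, 'failed': 1}
```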