Version: 1.0 prerelease

Manage Data Contexts

A Data Context defines the storage location for metadata, such as your configurations for Data Sources, Expectation Suites, Checkpoints, and Data Docs. It also contains your Validation Results and the metrics associated with them, and it provides access to those objects in Python, along with other helper functions for the GX Python API.

The following are the available Data Context types:

  • File Data Context - A persistent Data Context that stores metadata and configuration information as YAML files.

  • Ephemeral Data Context - A temporary Data Context that stores metadata and configuration information in memory. This Data Context will not persist beyond the current Python session.

  • GX Cloud Data Context - A Data Context that connects to a GX Cloud Account to retrieve and store metadata and configuration information from the cloud.

Prerequisites

Request a Data Context

  1. Run the following code to request a Data Context:
Python
import great_expectations as gx

context = gx.get_context()

If you don't pass any parameters to get_context(), GX checks your project environment and returns the first Data Context type that matches, evaluating the following criteria in order:

  • get_context() instantiates and returns a GX Cloud Data Context if it finds the necessary credentials in your environment variables.
  • If a GX Cloud Data Context cannot be instantiated, get_context() will instantiate and return the first File Data Context it finds in the folder hierarchy of your current working directory.
  • If neither of the above options is viable, get_context() instantiates and returns an Ephemeral Data Context.
  2. Optional. Run the following code to verify the type of Data Context you received:
Python
from great_expectations.data_context import EphemeralDataContext, CloudDataContext, FileDataContext

print("Cloud:", isinstance(context, CloudDataContext))
print("File:", isinstance(context, FileDataContext))
print("Ephemeral:", isinstance(context, EphemeralDataContext))
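The fallback order described above can be sketched in plain Python. This is only an illustration of the documented decision order, not GX's implementation: the names ContextKind, has_file_context, and resolve_context_kind are hypothetical, the gx/great_expectations.yml layout is taken from the project layout shown later on this page, and the upward walk through parent folders is an assumption about what "folder hierarchy" means.

```python
from enum import Enum
from pathlib import Path


class ContextKind(Enum):
    CLOUD = "cloud"
    FILE = "file"
    EPHEMERAL = "ephemeral"


def has_file_context(folder: Path) -> bool:
    """Return True if `folder` holds a config file in the assumed gx/ layout."""
    return (folder / "gx" / "great_expectations.yml").is_file()


def resolve_context_kind(cloud_credentials_found: bool, start_dir: Path) -> ContextKind:
    """Mirror the documented order: Cloud first, then File, then Ephemeral."""
    if cloud_credentials_found:
        return ContextKind.CLOUD
    start_dir = start_dir.resolve()
    # Assumption: check start_dir itself, then each of its parent folders.
    for folder in (start_dir, *start_dir.parents):
        if has_file_context(folder):
            return ContextKind.FILE
    return ContextKind.EPHEMERAL


# Example: what would be picked for the current working directory
# when no GX Cloud credentials are present.
print(resolve_context_kind(False, Path.cwd()))
```

The point of the sketch is that each criterion is only evaluated when the previous one fails, so an Ephemeral Data Context is the last resort.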

Initialize a new Data Context

A Data Context is required in almost all Python scripts using GX 1.0. Use Python code to initialize, instantiate, and verify the contents of a Filesystem Data Context.

Import GX

Run the following code to import the GX module:

Python
import great_expectations as gx

Determine the folder to initialize the Data Context in

Run the following code to specify the empty folder in which to initialize your Filesystem Data Context:

Python
path_to_empty_folder = "/my_gx_project/"

Create a Data Context

You provide the path for your empty folder to the GX library's FileDataContext.create(...) method as the project_root_dir parameter. Because you are providing a path to an empty folder, FileDataContext.create(...) initializes a Filesystem Data Context in that location.

For convenience, the FileDataContext.create(...) method instantiates and returns the newly initialized Data Context, which you can keep in a Python variable.

Python
from great_expectations.data_context import FileDataContext

context = FileDataContext.create(project_root_dir=path_to_empty_folder)
What if the folder is not empty?

If the project_root_dir provided to the FileDataContext.create(...) method points to a folder that does not already have a Data Context present, the FileDataContext.create(...) method initializes a Filesystem Data Context in that location even if other files and folders are present. This allows you to initialize a Filesystem Data Context in a folder that contains your Data Assets or other project related contents.

If a Data Context already exists in project_root_dir, the FileDataContext.create(...) method will not re-initialize it. Instead, FileDataContext.create(...) instantiates and returns the existing Data Context.

Verify the Data Context content

Run the following code to confirm the Data Context was instantiated correctly:

Python
print(context)

The Data Context configuration formatted as a Python dictionary appears.

Connect to an existing Data Context

If you're using GX for multiple projects, you might want to use a different Data Context for each project. Instantiate a specific Filesystem Data Context so that you can switch between sets of previously defined GX configurations.

Prerequisites

Import GX

Python
import great_expectations as gx

Specify the root folder of an existing Filesystem Data Context

Each Filesystem Data Context has a root folder in which it was initialized. This root folder identifies the specific Filesystem Data Context to instantiate.

Python
path_to_project_root = "./my_project/"

Run the get_context(...) method

You provide the path to the root folder of your previously initialized Data Context to the GX library's get_context(...) method as the project_root_dir parameter. Because that folder already contains a Data Context, the get_context(...) method instantiates and returns the Data Context at that location.

Python
context = gx.get_context(project_root_dir=path_to_project_root)
Project root vs context root

There is a subtle distinction between the project_root_dir and context_root_dir arguments accepted by get_context(...).

Your context root is the directory that contains your GX config while your project root refers to your actual working directory (and therefore contains the context root).

# The overall directory is your project root
data/
gx/  # The GX folder with your config is your context root
    great_expectations.yml
    ...
...

Both are functionally equivalent for purposes of working with a file-backed project.
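As a rough illustration of how the two roots relate, the following sketch converts between them with pathlib. It assumes the GX folder is named gx, as in the layout above. The helper names are hypothetical and are not part of the GX API.

```python
from pathlib import Path

GX_DIR_NAME = "gx"  # assumption: folder name taken from the layout above
CONFIG_FILE_NAME = "great_expectations.yml"


def context_root_for(project_root: Path) -> Path:
    """The context root is the GX folder inside the project root."""
    return project_root / GX_DIR_NAME


def project_root_for(context_root: Path) -> Path:
    """The project root is the folder that contains the context root."""
    return context_root.parent


def config_path_for(project_root: Path) -> Path:
    """Location of the config file under the assumed layout."""
    return context_root_for(project_root) / CONFIG_FILE_NAME


print(context_root_for(Path("my_project")))
print(config_path_for(Path("my_project")))
```

Under this assumption, passing my_project as project_root_dir or my_project/gx as context_root_dir would be expected to point get_context(...) at the same configuration.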

What if the folder does not contain a Data Context?

If the root directory provided to the get_context(...) method points to a folder that does not already have a Data Context, the get_context(...) method initializes a new Filesystem Data Context in that location.

The get_context(...) method instantiates and returns the newly initialized Data Context.

Verify the Data Context content

Run the following code to confirm the Data Context was instantiated correctly:

Python
print(context)

The Data Context configuration formatted as a Python dictionary appears.

Export an Ephemeral Data Context to a new File Data Context

An Ephemeral Data Context is a temporary, in-memory Data Context that doesn't persist beyond the current Python session. To save the contents of an Ephemeral Data Context for future use, you can convert it to a Filesystem Data Context.

Prerequisites

  • An Ephemeral Data Context

Confirm your Data Context is Ephemeral

To confirm that you're working with an Ephemeral Data Context, run the following code:

Python
from great_expectations.data_context import EphemeralDataContext

# ...

if isinstance(context, EphemeralDataContext):
    print("It's Ephemeral!")

The example code assumes that your Data Context is stored in the variable context.

Verify that a Filesystem Data Context doesn't exist

The method for converting an Ephemeral Data Context to a Filesystem Data Context initializes the new Filesystem Data Context in the current working directory of the Python process that is being executed. If a Filesystem Data Context already exists at that location, the process will fail.

You can determine if your current working directory already has a Filesystem Data Context by looking for a great_expectations.yml file. The presence of that file indicates that a Filesystem Data Context has already been initialized in the corresponding directory.
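The check described above can be scripted with the standard library. This is a minimal sketch: the function name find_existing_config is hypothetical, and checking both the directory itself and a gx/ subfolder is an assumption based on the project layout shown earlier on this page.

```python
from pathlib import Path
from typing import Optional

CONFIG_FILE_NAME = "great_expectations.yml"


def find_existing_config(directory: Path) -> Optional[Path]:
    """Return the path of a config file in `directory`, or None if there is none."""
    candidates = (
        directory / CONFIG_FILE_NAME,         # config directly in the directory
        directory / "gx" / CONFIG_FILE_NAME,  # assumed gx/ subfolder layout
    )
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


existing = find_existing_config(Path.cwd())
if existing is None:
    print("No great_expectations.yml found in the current working directory.")
else:
    print(f"Found {existing}; a Filesystem Data Context already exists here.")
```

Running a check like this before the conversion step makes the failure case visible up front instead of at conversion time.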

Convert the Ephemeral Data Context into a Filesystem Data Context

Converting an Ephemeral Data Context into a Filesystem Data Context can be done with one line of code:

Python
context = context.convert_to_file_context()
Replacing the Ephemeral Data Context

The convert_to_file_context() method does not change the Ephemeral Data Context itself. Rather, it initializes a new Filesystem Data Context with the contents of the Ephemeral Data Context and then returns an instance of the new Filesystem Data Context. If you do not replace the Ephemeral Data Context instance with the Filesystem Data Context instance, you can continue using the Ephemeral Data Context.

If you do this, it is important to note that changes to the Ephemeral Data Context will not be reflected in the Filesystem Data Context. Moreover, convert_to_file_context() does not support merge operations. This means you will not be able to save any additional changes you have made to the content of the Ephemeral Data Context. Neither will you be able to use convert_to_file_context() to replace the Filesystem Data Context you had previously created: convert_to_file_context() will fail if a Filesystem Data Context already exists in the current working directory.

GX recommends that you stop using the Ephemeral Data Context instance after you convert your Ephemeral Data Context to a Filesystem Data Context.
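The snapshot behavior described above can be illustrated with a toy model. The classes below are hypothetical stand-ins written for this example. They are not GX classes, they store a JSON file rather than a GX configuration, and unlike the real method the toy conversion takes an explicit target folder.

```python
import json
import tempfile
from pathlib import Path


class ToyFileContext:
    """Hypothetical stand-in for a file-backed context; not a GX class."""

    def __init__(self, config_path: Path) -> None:
        self.config_path = config_path

    def load(self) -> dict:
        return json.loads(self.config_path.read_text())


class ToyEphemeralContext:
    """Hypothetical stand-in for an in-memory context; not a GX class."""

    def __init__(self) -> None:
        self.config: dict = {}

    def convert_to_file_context(self, target_dir: Path) -> ToyFileContext:
        config_path = target_dir / "toy_context.json"
        if config_path.exists():
            # Like the behavior described above: no overwrite and no merge.
            raise FileExistsError(f"{config_path} already exists")
        config_path.write_text(json.dumps(self.config))
        return ToyFileContext(config_path)


def demo(target_dir: Path) -> tuple:
    ephemeral = ToyEphemeralContext()
    ephemeral.config["suite"] = "v1"
    file_context = ephemeral.convert_to_file_context(target_dir)

    # A later change to the in-memory object does not reach the saved copy.
    ephemeral.config["suite"] = "v2"
    saved = file_context.load()["suite"]

    # A second conversion into the same folder is rejected.
    try:
        ephemeral.convert_to_file_context(target_dir)
        second_attempt_failed = False
    except FileExistsError:
        second_attempt_failed = True
    return saved, second_attempt_failed


with tempfile.TemporaryDirectory() as tmp:
    print(demo(Path(tmp)))
```

In the toy model, the saved copy keeps the value it had at conversion time and the second conversion fails, which is why continuing to edit the in-memory instance after converting is a dead end.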

View a Data Context configuration

Run the following code to view Data Context configuration information:

Python
print(context)

Next steps