Use Great Expectations with Amazon Web Services using Redshift
Great Expectations can work within many frameworks. This guide shows a workflow for using Great Expectations with AWS and cloud storage. You will configure a local Great Expectations project to store Expectations, Validation Results, and Data Docs in Amazon S3 buckets, and then configure Great Expectations to access data from a Redshift database.
This guide will demonstrate each of the steps necessary to go from installing a new instance of Great Expectations to Validating your data for the first time and viewing your Validation Results as Data Docs.
Prerequisites
- An installation of Python, version 3.8 to 3.11. To download and install Python, see Python downloads.
- The AWS CLI. To download and install the AWS CLI, see Installing or updating the latest version of the AWS CLI.
- AWS credentials. See Configuring the AWS CLI.
- Permissions to install the Python packages (boto3 and great_expectations) with pip.
- An S3 bucket and prefix to store Expectations and Validation Results.
Steps
Part 1: Setup
1.1 Ensure that the AWS CLI is ready for use
1.1.1 Verify that the AWS CLI is installed
Run the following code to verify that the AWS CLI is installed:
aws --version
If this command does not return the AWS CLI version information, you may need to install the AWS CLI or troubleshoot your current installation. See Install or update the latest version of the AWS CLI.
1.1.2 Verify that your AWS credentials are properly configured
Run the following command in the AWS CLI to verify that your AWS credentials are properly configured:
aws sts get-caller-identity
When your credentials are properly configured, your UserId, Account, and Arn are returned. If your credentials are not configured correctly, an error message appears. If you received an error message, or you couldn't verify your credentials, see Configuring the AWS CLI.
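The same check can be scripted. The following is a minimal sketch, not part of the Great Expectations workflow itself: it runs the command shown above through Python's standard library and pulls out the three fields. The function names parse_caller_identity and get_caller_identity are illustrative, and the sketch assumes the aws executable is on your PATH.

```python
import json
import subprocess


def parse_caller_identity(cli_output: str) -> dict:
    """Extract UserId, Account, and Arn from the CLI's JSON output."""
    identity = json.loads(cli_output)
    return {key: identity[key] for key in ("UserId", "Account", "Arn")}


def get_caller_identity() -> dict:
    """Run `aws sts get-caller-identity` and return the parsed fields.

    Raises subprocess.CalledProcessError if the CLI exits with an error
    (for example, when credentials are missing or invalid) and
    FileNotFoundError if the aws executable cannot be found.
    """
    result = subprocess.run(
        ["aws", "sts", "get-caller-identity", "--output", "json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_caller_identity(result.stdout)
```

Calling get_caller_identity() from a setup script lets you stop early, with a clear error, before any later step tries to reach S3 or Redshift.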
1.2 Prepare a local installation of Great Expectations
1.2.1 Verify that your Python version meets requirements
Run the following code to check what version of Python is currently installed:
python --version
Great Expectations supports Python versions 3.8 to 3.11. If a Python 3 version number is not returned, run the following code:
python3 --version
If you do not have Python 3 installed, go to python.org for the current downloads and installation guidance.
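If you prefer to check the version from inside Python, a short script along these lines may help. It is only a sketch: the bounds are copied from the supported range stated in this guide, and is_supported_python is a hypothetical helper name.

```python
import sys

# Supported range taken from this guide: Python 3.8 through 3.11.
MIN_SUPPORTED = (3, 8)
MAX_SUPPORTED = (3, 11)


def is_supported_python(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) version is within the supported range."""
    major_minor = tuple(version_info[:2])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


print(
    f"Python {sys.version_info.major}.{sys.version_info.minor} "
    f"supported: {is_supported_python()}"
)
```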
1.2.2 Create a virtual environment for your Great Expectations project
After you have confirmed that Python 3 is installed locally, you can create a virtual environment with venv before installing your packages with pip. The following examples use venv for virtual environments because it is included with Python 3. You can use alternate tools such as virtualenv and pyenv to install GX in virtual environments.
Run one of the following code blocks to create your virtual environment:
python -m venv my_venv
or
python3 -m venv my_venv
A new directory named my_venv is created in your current working directory. It contains your virtual environment.
Run the following code to activate the virtual environment:
source my_venv/bin/activate
To change the name of your virtual environment, replace my_venv in the example code.
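To confirm that activation worked, you can ask the interpreter itself. The sketch below relies on documented behavior of venv: inside an environment, sys.prefix points at the environment directory while sys.base_prefix still points at the base Python installation. The function name in_virtual_environment is illustrative.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter is running inside a virtual environment."""
    return sys.prefix != sys.base_prefix


print("Virtual environment active:", in_virtual_environment())
print("Environment location:", sys.prefix)
```

If the first line prints False, the packages installed in the following steps would go into your base Python installation rather than into my_venv.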
1.2.3 Ensure you have the latest version of pip
After you've activated your virtual environment, make sure that pip, the tool used to install Python packages, is installed and up to date.
Run one of the following commands to install pip if it is missing, or to upgrade it to at least the version bundled with your Python installation:
python -m ensurepip --upgrade
or
python3 -m ensurepip --upgrade
1.2.4 Install boto3
Python interacts with AWS through the boto3 library. Great Expectations makes use of this library in the background when working with AWS. Although you won't use boto3 directly, you'll need to install it in your virtual environment.
Run one of the following pip commands to install boto3 in your virtual environment:
python -m pip install boto3
or
python3 -m pip install boto3
To set up boto3 with AWS, and use boto3 within Python, see the Boto3 documentation.
1.2.5 Install Great Expectations
Run one of the following code blocks to use pip to install Great Expectations:
python -m pip install great_expectations
or
python3 -m pip install great_expectations
1.2.6 Verify that Great Expectations installed successfully
Run the following code to confirm the GX installation is working:
great_expectations --version
Version information similar to the following is returned:
great_expectations, version 0.18.9
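As an alternative check that does not depend on the command-line entry point, you can query the installed package metadata from Python. This is a minimal sketch using the standard library's importlib.metadata. The helper name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(package_name: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Report on the two packages installed in the preceding steps.
for name in ("boto3", "great_expectations"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```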
1.2.7 Install additional dependencies for Redshift
To connect to your Redshift database, Great Expectations requires additional dependencies. You can install them with pip by running the following from your terminal:
pip install sqlalchemy sqlalchemy-redshift psycopg2
# or if on macOS:
pip install sqlalchemy sqlalchemy-redshift psycopg2-binary
As of this writing, Great Expectations is not compatible with SQLAlchemy version 2 or greater. We recommend using the latest 1.x release.
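Because pip may resolve to a newer SQLAlchemy than this guide supports, it can be worth checking what was actually installed. The sketch below assumes only the constraint stated above (major version below 2). The helper name is_compatible_sqlalchemy is illustrative.

```python
from importlib import metadata


def is_compatible_sqlalchemy(version: str) -> bool:
    """Return True for SQLAlchemy releases below major version 2."""
    major = int(version.split(".")[0])
    return major < 2


try:
    found = metadata.version("sqlalchemy")
    print(f"SQLAlchemy {found} compatible: {is_compatible_sqlalchemy(found)}")
except metadata.PackageNotFoundError:
    print("SQLAlchemy is not installed in this environment")
```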