Version: 1.0 prerelease

Manage Data Docs

Data Docs translate Expectations, Validation Results, and other metadata into human-readable documentation. Because Data Docs are compiled automatically from your data tests, your documentation stays current. Use the information provided here to host and share Data Docs stored on a filesystem or a Data Source.

Host and share Data Docs on AWS S3.

Prerequisites

Create an S3 bucket

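Using the AWS CLI, run the following command to create an S3 bucket in a specific region. Replace the bucket name and region with values for your environment. S3 bucket names can contain only lowercase letters, numbers, periods (.), and hyphens (-). For any region other than us-east-1, also pass --create-bucket-configuration LocationConstraint=<region>.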

Terminal input
> aws s3api create-bucket --bucket data-docs.my_org --region us-east-1
{
"Location": "/data-docs.my_org"
}
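Before running the command, it can help to check the bucket name and assemble the arguments in a script. The following Python sketch is illustrative only: the helper names check_bucket_name and create_bucket_argv are hypothetical, and the checks cover only a subset of the S3 bucket naming rules (3 to 63 characters; lowercase letters, numbers, periods, and hyphens only; starting and ending with a letter or number; no adjacent periods; not formatted as an IP address). It also adds --create-bucket-configuration LocationConstraint=<region>, which create-bucket requires outside us-east-1. Under these rules, the example name data-docs.my_org would be rejected for a new bucket because it contains an underscore.

```python
import ipaddress
import string

# Hypothetical helpers: a partial check of the S3 bucket naming rules,
# not an official validator.
ALLOWED_CHARS = set(string.ascii_lowercase + string.digits + ".-")
EDGE_CHARS = set(string.ascii_lowercase + string.digits)


def check_bucket_name(name: str) -> list[str]:
    """Return a list of rule violations; an empty list means the checked rules pass."""
    problems = []
    if not 3 <= len(name) <= 63:
        problems.append("must be 3 to 63 characters long")
    invalid = sorted(set(name) - ALLOWED_CHARS)
    if invalid:
        problems.append(f"invalid characters: {' '.join(invalid)}")
    if name and (name[0] not in EDGE_CHARS or name[-1] not in EDGE_CHARS):
        problems.append("must begin and end with a lowercase letter or number")
    if ".." in name:
        problems.append("must not contain two adjacent periods")
    try:
        ipaddress.IPv4Address(name)
    except ValueError:
        pass  # not an IPv4 address, which is what we want
    else:
        problems.append("must not be formatted as an IP address")
    return problems


def create_bucket_argv(bucket: str, region: str) -> list[str]:
    """Assemble the `aws s3api create-bucket` command as an argument list."""
    problems = check_bucket_name(bucket)
    if problems:
        raise ValueError(f"invalid bucket name {bucket!r}: {'; '.join(problems)}")
    argv = ["aws", "s3api", "create-bucket", "--bucket", bucket, "--region", region]
    # us-east-1 is the default location; other regions need an explicit LocationConstraint.
    if region != "us-east-1":
        argv += ["--create-bucket-configuration", f"LocationConstraint={region}"]
    return argv
```

To create the bucket, you could pass the returned list to subprocess.run(argv, check=True), assuming the AWS CLI is installed and configured. A name such as data-docs.example-org is a placeholder.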

Configure your bucket policy

The example policy below enforces IP-based access. Modify the bucket name and IP addresses for your environment, then save the customized policy as ip-policy.json in your current working directory, which is where the put-bucket-policy command reads it from (file://ip-policy.json).

caution

Your policy should limit access to authorized users. Data Docs sites can include sensitive information and should not be publicly accessible.

File content: ip-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Allow only based on source IP",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": [
        "arn:aws:s3:::data-docs.my_org",
        "arn:aws:s3:::data-docs.my_org/*"
      ],
      "Condition": {
        "IpAddress": {
          "aws:SourceIp": [
            "192.168.0.1/32",
            "2001:db8:1234:1234::/64"
          ]
        }
      }
    }
  ]
}
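The IpAddress condition allows a request only when the caller's address (aws:SourceIp) falls inside one of the listed CIDR ranges. The following Python sketch approximates that single check with the standard ipaddress module, so you can test candidate addresses against your ranges before applying the policy. The helper name source_ip_allowed is hypothetical, and this is not AWS's policy evaluator: it ignores other statements and account-level settings such as S3 Block Public Access.

```python
import ipaddress

# Hypothetical helper; the default ranges are the ones from the example ip-policy.json.
EXAMPLE_RANGES = ("192.168.0.1/32", "2001:db8:1234:1234::/64")


def source_ip_allowed(client_ip: str, allowed_ranges: tuple[str, ...] = EXAMPLE_RANGES) -> bool:
    """Approximate the IpAddress / aws:SourceIp condition for a single address."""
    ip = ipaddress.ip_address(client_ip)
    for cidr in allowed_ranges:
        # An IPv4 address is never "in" an IPv6 network (and vice versa); the test returns False.
        if ip in ipaddress.ip_network(cidr):
            return True
    return False
```

With the example ranges, 192.168.0.1 matches the /32 entry exactly, and any address whose first 64 bits are 2001:db8:1234:1234 matches the IPv6 entry; everything else is outside the condition.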
tip

Because Data Docs consist of many generated pages, each stored as a separate object, include the arn:aws:s3:::{your_data_docs_site}/* path in the Resource list along with the bucket path arn:aws:s3:::{your_data_docs_site}. The s3:GetObject action applies to objects, so it is the /* entry that grants access to every page, including your Data Docs' front page.
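If you manage several buckets or IP lists, you might generate ip-policy.json instead of editing it by hand. This Python sketch (build_ip_policy and write_ip_policy are hypothetical helper names) produces a policy with the same shape as the example above, puts both the bucket ARN and the /* object ARN in Resource, and uses ipaddress.ip_network to reject malformed CIDR ranges, including ranges with host bits set such as 192.168.0.1/24, before they reach AWS.

```python
import ipaddress
import json
from pathlib import Path


def build_ip_policy(bucket: str, cidrs: list[str]) -> dict:
    """Hypothetical helper: a policy with the same shape as the example ip-policy.json."""
    # ip_network raises ValueError for malformed ranges or ranges with host bits set,
    # and normalizes a bare address such as "10.0.0.5" to "10.0.0.5/32".
    source_ips = [str(ipaddress.ip_network(cidr)) for cidr in cidrs]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Allow only based on source IP",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",  # every generated page is an object under /*
                ],
                "Condition": {"IpAddress": {"aws:SourceIp": source_ips}},
            }
        ],
    }


def write_ip_policy(bucket: str, cidrs: list[str], path: str = "ip-policy.json") -> Path:
    """Hypothetical helper: write the policy as indented JSON and return the file path."""
    out = Path(path)
    out.write_text(json.dumps(build_ip_policy(bucket, cidrs), indent=2) + "\n")
    return out
```

For example, write_ip_policy("data-docs.example-org", ["203.0.113.0/24"]) would write ip-policy.json to the current directory, ready for the put-bucket-policy command. The bucket name and range here are placeholders.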

REMINDER

Amazon Web Services S3 is a third-party service. For more information about configuring S3 bucket policies, see Using bucket policies.

Apply the policy

Run the following AWS CLI command to apply the policy:

Terminal input
> aws s3api put-bucket-policy --bucket data-docs.my_org --policy file://ip-policy.json
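The bucket name appears both in the --bucket argument and in the policy's Resource ARNs, so a mismatch is an easy mistake when adapting the example. The following Python sketch (policy_matches_bucket and put_policy_argv are hypothetical helpers) loads ip-policy.json, checks that every Resource ARN refers to the target bucket or an object in it, and only then assembles the put-bucket-policy command. It does not validate the rest of the policy.

```python
import json
from pathlib import Path


def _arn_in_bucket(arn: str, bucket: str) -> bool:
    """True if arn is the bucket ARN itself or an object path inside the bucket."""
    base = f"arn:aws:s3:::{bucket}"
    return arn == base or arn.startswith(base + "/")


def policy_matches_bucket(policy: dict, bucket: str) -> bool:
    """Hypothetical helper: True if every Resource ARN refers to `bucket` or its objects."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # Statement may be a single object or a list
        statements = [statements]
    for statement in statements:
        resources = statement.get("Resource", [])
        if isinstance(resources, str):  # Resource may be a single string or a list
            resources = [resources]
        if not all(_arn_in_bucket(arn, bucket) for arn in resources):
            return False
    return True


def put_policy_argv(bucket: str, policy_path: str = "ip-policy.json") -> list[str]:
    """Hypothetical helper: check the policy file, then build the put-bucket-policy command."""
    policy = json.loads(Path(policy_path).read_text())
    if not policy_matches_bucket(policy, bucket):
        raise ValueError(f"{policy_path} has Resource ARNs for a bucket other than {bucket!r}")
    return ["aws", "s3api", "put-bucket-policy", "--bucket", bucket, "--policy", f"file://{policy_path}"]
```

The exact-match-or-slash test matters: a plain prefix check would wrongly accept arn:aws:s3:::data-docs.my_org-backup for a bucket named data-docs.my_org.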

Add a new S3 site to great_expectations.yml

The following example shows the default local_site configuration that you will find in your great_expectations.yml file, followed by the s3_site configuration that you will need to add. To maintain a single S3 Data Docs site, remove the default local_site configuration and replace it with the new s3_site configuration.

YAML
data_docs_sites:
  local_site:
    class_name: SiteBuilder
    show_how_to_buttons: true
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/data_docs/local_site/
    site_index_builder:
      class_name: DefaultSiteIndexBuilder
  s3_site: # this is a user-selected name - you may select your own
    class_name: SiteBuilder
    store_backend:
      class_name: TupleS3StoreBackend
      bucket: '<your>' # replace with the name of your S3 bucket
    site_index_builder:
      class_name: DefaultSiteIndexBuilder
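The edit described above amounts to removing local_site from the data_docs_sites mapping and adding an S3 site. The following Python sketch expresses that change on plain dictionaries shaped like the parsed YAML. The helper names s3_site_config and replace_local_site are hypothetical, and reading and writing great_expectations.yml itself (for example with a YAML library) is left out. The optional prefix argument corresponds to the prefix property described under Additional notes.

```python
import copy
from typing import Optional


def s3_site_config(bucket: str, prefix: Optional[str] = None) -> dict:
    """Hypothetical helper: the s3_site block from the YAML above as a Python mapping."""
    store_backend = {"class_name": "TupleS3StoreBackend", "bucket": bucket}
    if prefix:
        store_backend["prefix"] = prefix  # optional subfolder within the bucket
    return {
        "class_name": "SiteBuilder",
        "store_backend": store_backend,
        "site_index_builder": {"class_name": "DefaultSiteIndexBuilder"},
    }


def replace_local_site(data_docs_sites: dict, bucket: str, site_name: str = "s3_site",
                       prefix: Optional[str] = None) -> dict:
    """Return a copy with local_site removed and a single S3 site added under site_name."""
    sites = copy.deepcopy(data_docs_sites)  # leave the caller's mapping untouched
    sites.pop("local_site", None)
    sites[site_name] = s3_site_config(bucket, prefix)
    return sites
```

Returning a modified copy rather than mutating the input makes it easy to compare the old and new configuration before writing anything back to disk.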

Test your configuration

Run the following code to build your newly configured S3 Data Docs site. Here, context is your Data Context, for example the object returned by great_expectations.get_context():

Python
context.build_data_docs()

Additional notes

  • Run the following command to enable static website hosting for your bucket so that AWS automatically serves your index.html file. To also serve a custom error page, add --error-document followed by the file name:

    Terminal input
    > aws s3 website s3://data-docs.my_org/ --index-document index.html
  • To host a Data Docs site in a subfolder of an S3 bucket, add the prefix property to the configuration snippet immediately after the bucket property.

  • To host a Data Docs site through a private DNS, configure a base_public_path for the Data Docs Store. The following example configures an S3 site with base_public_path set to http://www.mydns.com. Data Docs are still written to the configured location on S3 (for example, https://s3.amazonaws.com/data-docs.my_org/docs/index.html), but you can access the pages from your DNS (http://www.mydns.com/index.html in this example).

    YAML
    data_docs_sites:
      s3_site: # this is a user-selected name - you may select your own
        class_name: SiteBuilder
        store_backend:
          class_name: TupleS3StoreBackend
          bucket: data-docs.my_org # UPDATE the bucket name here to match the bucket you configured above.
          base_public_path: http://www.mydns.com
        site_index_builder:
          class_name: DefaultSiteIndexBuilder
          show_cta_footer: true
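The following Python sketch illustrates the relationship described in the last two notes: pages are written to an S3 key under the optional prefix, while readers reach the same page relative to base_public_path. It assumes the path-style URL format https://s3.amazonaws.com/<bucket>/<key> and that a page's key is the prefix followed by the page's relative path. It models the example URLs above and is not Great Expectations' internal URL logic; the helper names s3_object_url and public_url are hypothetical.

```python
from typing import Optional


def s3_object_url(bucket: str, page: str, prefix: Optional[str] = None) -> str:
    """Path-style URL of the object a page is written to (assumes key = prefix + page)."""
    # Join the non-empty parts with single slashes, ignoring stray leading/trailing ones.
    key = "/".join(part.strip("/") for part in (prefix, page) if part)
    return f"https://s3.amazonaws.com/{bucket}/{key}"


def public_url(base_public_path: str, page: str) -> str:
    """URL a reader uses when the site is served from base_public_path."""
    return f"{base_public_path.rstrip('/')}/{page.lstrip('/')}"
```

With bucket data-docs.my_org, prefix docs, and base_public_path http://www.mydns.com, the page index.html maps to the two URLs given in the note above.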