Building a Blueprint with Great Expectations
Overview
In this tutorial, you'll walk through the steps required to set up Great Expectations to run in the cloud, on Shipyard. We will be creating a Blueprint that can be re-used by multiple team members and updated in the background. This tutorial is only in Python. By the end of the tutorial, you'll be able to:
Set up a Blueprint using Python
Successfully run Great Expectations on Shipyard
Share expectations with your organization
Run multiple instances of Great Expectations simultaneously
Integrate an Expectation Suite into your Fleets
For more information, read our blog post that covers Getting Started with Great Expectations. You can also visit www.greatexpectations.io for additional information.
Setup
For the sake of this tutorial, we suggest starting off by building a Vessel inside of a Project called "Playground". You can follow this tutorial to set that up.
Download the following file to your computer without changing the file name. It's a .zip containing a single Python file and a Great Expectations directory structure with JSON expectation suites and a YML setup file. We'll use this throughout the tutorial.
Feel free to peruse this script beforehand so you understand everything that it's doing. The main script is accomplishing the following things:
Downloading a file from a public URL.
Decompressing the file if it is a .gz file and converting the file into a CSV if it is not one already.
Running Great Expectations against the downloaded file, using the included sample expectation suites.
Uploading the validation output to S3, using a file name structure that references Shipyard's Platform Environment Variables.
Printing the validation results to the standard output.
Returning the appropriate exit code based on expectation results.
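To make the file-preparation and exit-code steps above more concrete, here is a minimal sketch using only the Python standard library. It is not the contents of the downloaded script. The function names are hypothetical, the tab delimiter is an assumption based on the sample .tsv file used later in this tutorial, and the "success" key is assumed to be present in the validation result.

```python
import csv
import gzip
import shutil
from pathlib import Path


def decompress_if_gzipped(path: Path) -> Path:
    """Return a decompressed copy of `path` if it ends in .gz, else `path` unchanged."""
    if path.suffix != ".gz":
        return path
    target = path.with_suffix("")  # e.g. "reviews.tsv.gz" -> "reviews.tsv"
    with gzip.open(path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


def convert_to_csv(path: Path, delimiter: str = "\t") -> Path:
    """Rewrite a delimited file as CSV; a file already named .csv is returned as-is."""
    if path.suffix == ".csv":
        return path
    target = path.with_suffix(".csv")
    with open(path, newline="", encoding="utf-8") as src:
        with open(target, "w", newline="", encoding="utf-8") as dst:
            # QUOTE_NONE treats quote characters in the source as ordinary text
            # (assumption: the input does not use quoting to protect delimiters).
            reader = csv.reader(src, delimiter=delimiter, quoting=csv.QUOTE_NONE)
            csv.writer(dst).writerows(reader)
    return target


def exit_code_for(validation_result: dict) -> int:
    """Map a validation result to an exit code: 0 on success, 1 otherwise."""
    return 0 if validation_result.get("success") else 1
```

A script built this way would finish with something like sys.exit(exit_code_for(result)), so that a failed validation marks the run as failed.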
Steps
Click "Blueprints" on the side navigation bar.
Click the "Add Blueprint" button in the top right.
Step 1 - Select A Code Blueprint

Click on the Python Blueprint. You'll be immediately redirected to the next step.
Step 2 - Create Blueprint Variables
Click the plus icon to create a new Blueprint variable. You should see a screen that looks like this:

Our code for Great Expectations has 3 variables that we expect to receive. For a detailed overview of each of these fields, read more about Blueprint Variables.
File URL
Set the Display Name to File URL
Set the Reference Name to input_url
Leave the Variable Type alone.
Leave the Default Value empty.
Check the box for "Required?"
Set the Placeholder to "https://s3.region.amazonaws.com/bucket-name/key-name.csv"
Set the Tooltip to "URL to download the file from. Must be publicly accessible."
Click Add Variable.
Bucket Name
Set the Display Name to Bucket Name
Set the Reference Name to output_bucket_name
Leave the Variable Type alone.
Set the Default Value to the bucket name you set up during the Setup phase.
Leave the "Required?" field alone.
Leave the "Placeholder" empty.
Set the Tooltip to "Bucket Name to store the validation JSON files."
Click Add Variable.
Expectation Suite
Set the Display Name to Expectation Suite
Set the Reference Name to expectation_suite
Change the Variable Type to Select
Under the new section of "Selection Options" click the plus button twice.
Set the first Display Name box to "Amazon Reviews" and set the Internal Value to "amazon-product-reviews".
Set the second Display Name box to "Sample" and set the Internal Value to "sample-suite"
Set the Default Value to Amazon Reviews
Leave the "Required?" field alone.
Leave the "Placeholder" empty.
Set the Tooltip to "Select which of our Expectation Suites to use against the provided file."
Click Add Variable.
At this point, your screen should look something like this. Once you've verified your Blueprint Variables, go ahead and click Next Step.

Step 3 - Provide Your Code
Click the upload section of the page and select the great_expectations_demo.zip file from your computer.
On the right-hand side of the screen, enter run_great_expectations.py into the File to run field.
Click the "plus" icon next to arguments 3 times.
We'll be creating an argument for each of the Blueprint Variables that we created in the last step, passing through the user input as ${reference_name}.
In the first set of fields, type --input_url for the flag and ${input_url} for the value.
In the second set of fields, type --output_bucket_name for the flag and ${output_bucket_name} for the value.
In the final set of fields, type --expectation_suite for the flag and ${expectation_suite} for the value.
Once these steps are complete, your screen should look exactly like this.

Once you've verified that everything has been set up correctly, click "Next Step" in the bottom right.
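On the script side, the three flags configured above arrive as ordinary command-line arguments. The following is a minimal sketch of how a script could read them with argparse. It is not necessarily how run_great_expectations.py does it, and the defaults and help text here are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the three flags that the Blueprint passes to the script."""
    parser = argparse.ArgumentParser(
        description="Run a file against an existing Expectation Suite."
    )
    parser.add_argument(
        "--input_url",
        required=True,
        help="URL to download the file from. Must be publicly accessible.",
    )
    parser.add_argument(
        "--output_bucket_name",
        default="",  # assumption: an empty value means no bucket was provided
        help="Bucket Name to store the validation JSON files.",
    )
    parser.add_argument(
        "--expectation_suite",
        default="amazon-product-reviews",
        choices=["amazon-product-reviews", "sample-suite"],
        help="Which Expectation Suite to use against the provided file.",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(args.input_url, args.output_bucket_name, args.expectation_suite)
```

Restricting --expectation_suite to the same internal values as the Select options means a typo in the Blueprint configuration fails fast with a clear argparse error.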
Step 4 - Requirements
Environment Variables
Click the "plus" icon next to Environment Variables twice to add two new variables.
Set the first variable's KEY to GREAT_EXPECTATIONS_AWS_ACCESS_KEY_ID and Value to the AWS Access Key ID with access to the bucket you chose during your Setup.
Set the second variable's KEY to GREAT_EXPECTATIONS_AWS_SECRET_ACCESS_KEY and Value to the matching AWS Secret Access Key.
Note: The value field will always show ••••••• as you type. This is because Environment Variables are commonly used for passwords and secrets. You can always reveal what you've written by clicking the eye icon.
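Inside the script, these two Environment Variables can be read from the process environment. The sketch below shows one way to do that and to fail early if either is missing. The helper name is hypothetical, and the boto3 usage in the trailing comment is an illustration rather than the tutorial script's actual code.

```python
import os

ACCESS_KEY_VAR = "GREAT_EXPECTATIONS_AWS_ACCESS_KEY_ID"
SECRET_KEY_VAR = "GREAT_EXPECTATIONS_AWS_SECRET_ACCESS_KEY"


def aws_credentials_from_env(env=None) -> dict:
    """Return keyword arguments for an S3 client, read from environment variables."""
    env = os.environ if env is None else env
    missing = [name for name in (ACCESS_KEY_VAR, SECRET_KEY_VAR) if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "aws_access_key_id": env[ACCESS_KEY_VAR],
        "aws_secret_access_key": env[SECRET_KEY_VAR],
    }


# Example usage (requires boto3 to be installed):
#   import boto3
#   s3 = boto3.client("s3", **aws_credentials_from_env())
```

Using custom variable names, rather than the standard AWS ones, keeps these credentials scoped to this script.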
Packages
Click the "plus" icon next to Packages 4 times to add four new packages.
Set the first Package Name to boto3 and the version to ==1.12.16
Set the second Package Name to great-expectations and the version to ==0.9.5
Set the third Package Name to pandas and the version to ==1.0.1
Set the fourth Package Name to wget and the version to ==3.2
Your screen should look similar to this:

Once you're done, go ahead and click the Next Step button at the bottom of the screen.
Step 5 - Settings
Under the State section, select Active.
Under the Information section:
Give your Blueprint the name of Great Expectations - Demo.
Give your Blueprint the Synopsis of Run a file against an existing Expectation Suite.
Give your Blueprint a Description of: Provide a Link to a publicly available file in the File URL field. This file will be run against the Expectation Suite selected, with the final validation file sent directly to the S3 Bucket listed under "Bucket Name", nested under a folder called great-expectations/{expectation-suite}/
Leave the Guardrails section defaults of 1x and ASAP.
Click the Save & Finish button at the bottom of the screen.
You've successfully set up Great Expectations as a Blueprint!
Now anyone in your organization can use the Blueprint to test data against your Expectation Suites. We're going to test our Blueprint to validate that everything runs correctly.
Step 6 - Setting Up a Vessel
Navigate to any project. We recommend the Playground project set up in previous tutorials.
Click the "Build Vessel" button in the top right corner.
Select to Build a Vessel using a Custom Blueprint.
Select the Blueprint called "Great Expectations - Demo"

At this point, you should be on a screen that looks like this:

Enter https://s3.amazonaws.com/amazon-reviews-pds/tsv/sample_us.tsv into the File URL field.
Leave the Bucket Name as is.
Leave the Expectation Suite as is.
Click Next Step.
On the Triggers step, immediately click Next Step. We don't need to have any schedules for this tutorial.
On the Settings step:
Change the State to Active.
Name your Vessel "GE - Sample Data - Amazon Reviews"
Click Save & Finish
Immediately click "Run Your Vessel"
Step 7 - Review the Results
Click the first Log ID that you generated. If everything was set up correctly, the run should be a Success!
Within the Log you'll be able to see all of the expectations and their output for the sample data.
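The per-expectation output in the Log comes from the script printing the validation results. As a rough sketch of how such a summary could be produced, the function below walks a validation result dictionary. The keys used here ("results", "expectation_config", "expectation_type", "kwargs", "statistics") are assumptions about the shape of the validation JSON, and the sample dictionary is made up for illustration; it is not real output.

```python
def summarize_validation(result: dict) -> list:
    """Build one human-readable line per expectation, plus a totals line."""
    lines = []
    for item in result.get("results", []):
        config = item.get("expectation_config", {})
        status = "PASS" if item.get("success") else "FAIL"
        name = config.get("expectation_type", "unknown")
        lines.append(f"{status}  {name}  {config.get('kwargs', {})}")
    stats = result.get("statistics", {})
    passed = stats.get("successful_expectations", 0)
    total = stats.get("evaluated_expectations", 0)
    lines.append(f"{passed} of {total} expectations passed")
    return lines


if __name__ == "__main__":
    # Hand-written sample input, assumed to mirror the validation JSON layout.
    sample = {
        "success": False,
        "results": [
            {
                "success": True,
                "expectation_config": {
                    "expectation_type": "expect_column_to_exist",
                    "kwargs": {"column": "star_rating"},
                },
            },
            {
                "success": False,
                "expectation_config": {
                    "expectation_type": "expect_column_values_to_not_be_null",
                    "kwargs": {"column": "review_id"},
                },
            },
        ],
        "statistics": {"evaluated_expectations": 2, "successful_expectations": 1},
    }
    print("\n".join(summarize_validation(sample)))
```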

You should also be able to see the validation file in your S3 bucket of choice.

Congratulations on setting up a Great Expectations Blueprint! You now have a repeatable solution that can be used again and again for all of your Expectation Suites.
What Comes Next
Now that you've successfully worked your way through this tutorial, there are many additional things that you can try out on your own with this knowledge.
Test Additional Variables
Set up additional Vessels using the Great Expectations - Demo Blueprint and change just a few of the variables.
Try using different Amazon Review Files found here. Some of them will cause failures because they don't meet all of the expectations within the Expectation Suite.
Try leaving the Bucket Name blank.
Try sending your data to a different bucket.
Use the Sample expectation suite.
Create New Variables
Our tutorial may not have had enough flexibility to meet the general data demands of your organization. You can easily tweak the script to accomplish some of the following goals:
Set a custom file name for the validation output.
Pull files from other non-public sources.
Allow options for different exit code conditions.
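As a starting point for the first and third ideas above, here is a hedged sketch of two helpers. Everything in it is an assumption for illustration: the function names, the SHIPYARD_LOG_ID variable name, the default file name pattern, and the "statistics" / "success_percent" keys in the validation result. Only the great-expectations/{expectation-suite}/ prefix comes from this tutorial.

```python
import os


def build_output_key(expectation_suite: str, file_name: str = None, env=None) -> str:
    """Build the S3 key for the validation output, with an optional custom file name."""
    env = os.environ if env is None else env
    if not file_name:
        # Assumed platform variable; falls back to "local" outside of Shipyard.
        log_id = env.get("SHIPYARD_LOG_ID", "local")
        file_name = f"validation_{log_id}.json"
    return f"great-expectations/{expectation_suite}/{file_name}"


def exit_code_for_threshold(validation_result: dict, min_success_percent: float = 100.0) -> int:
    """Return 0 if enough expectations passed, 1 otherwise."""
    percent = validation_result.get("statistics", {}).get("success_percent")
    if percent is None:
        # No statistics available: fall back to the overall success flag.
        return 0 if validation_result.get("success") else 1
    return 0 if percent >= min_success_percent else 1
```

To expose these options to users, you would add matching Blueprint Variables (for example, a file name and a minimum success percentage) and pass them through as extra arguments, following the same pattern as Step 3.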
Expectation Suite Updates
Add your own expectation suite into the great_expectations/expectations folder, add the suite as a new Select Option in the Blueprint, and set up a new Vessel to use that expectation suite.
Update the existing amazon-product-reviews suite to include additional rules based on your own findings of the Amazon review data.