Architecting Solutions: Building a Proof of Concept for Data Analytics
The exercises in this course will incur charges on your AWS account. In this exercise, you will create or use the following resources:
AWS Identity and Access Management (IAM) policy and user (These are AWS account features, offered at no additional charge)
Amazon Simple Storage Service (Amazon S3) bucket
AWS Lambda function
Amazon Kinesis Data Firehose delivery stream
Amazon API Gateway
Amazon Athena table
Amazon QuickSight dashboards
The final task in this exercise includes instructions for deleting all the resources you create. Familiarize yourself with IAM, Amazon S3 pricing, Lambda pricing, Kinesis Data Firehose pricing, API Gateway pricing, Amazon Athena pricing, Amazon QuickSight pricing, and the AWS Free Tier.
Exercise 2: Architecting Solutions: Building a Proof of Concept for Data Analytics
This exercise provides instructions for building a data analytics solution. This week, you will design an architecture for a customer who needs an analytics solution to ingest, store, and visualize clickstream data. The customer is a restaurant owner who wants to derive insights into all menu items ordered in their restaurant. Since the customer has limited staff for running and maintaining this solution, you will build a proof of concept using managed services on AWS.
Architecture Diagram
In this architecture, you will use API Gateway to ingest clickstream data, Lambda to transform the data, and Kinesis Data Firehose to deliver the data to an S3 bucket. You will then use Amazon Athena to query the data and Amazon QuickSight to create visualizations.
Learning Objectives
Create IAM policies and roles to follow AWS Cloud best practices.
Create an S3 bucket to store clickstream data.
Create a Lambda function for transforming data in Kinesis Data Firehose.
Create a Kinesis Data Firehose delivery stream to ingest real-time streaming data to S3.
Create a REST API for data insertion.
Create an Amazon Athena table to view the ingested data.
Create Amazon QuickSight dashboards to visualize the data.
Prerequisites
Use the US East (N. Virginia) us-east-1 Region in the AWS Management Console.
Your account ID is a 12-digit account number that appears under your account alias in the top-right corner of the AWS Management Console. Make sure to remove hyphens (-) when entering your account number.
Task 1: Setup - Creating the IAM Policy and Role
Step 1.1: Creating Custom IAM Policies
Sign in to the AWS Management Console.
Enter IAM in the search box and select IAM.
In the navigation pane, choose Policies and then Create policy.
In the JSON tab, replace the placeholder code with the following policy:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "firehose:PutRecord",
            "Resource": "*"
        }
    ]
}
Choose Next.
Name the policy API-Firehose.
Choose Create policy.
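If you prefer to script this step, the following is a minimal boto3 sketch that creates the same API-Firehose policy. It assumes your AWS credentials are already configured and is an illustration, not a required part of the exercise.

import json
import boto3

iam = boto3.client("iam")

# The same policy document as above: allow PutRecord against any Firehose stream.
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "firehose:PutRecord",
            "Resource": "*"
        }
    ]
}

response = iam.create_policy(
    PolicyName="API-Firehose",
    PolicyDocument=json.dumps(policy_document)
)
print(response["Policy"]["Arn"])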
Step 1.2: Creating an IAM Role and Attaching a Policy to It
In the IAM dashboard navigation pane, choose Roles and then Create role.
For Trusted entity type, select AWS service.
In the Use case section, choose API Gateway.
Choose Next twice.
Name the role APIGateway-Firehose.
Choose Create role.
From the roles list, select the APIGateway-Firehose role.
In the Permissions policies section, choose Attach policies from the Add permissions menu.
Select API-Firehose and choose Add permissions.
Copy the APIGateway-Firehose role ARN and save it for your records. The ARN might look like this:
arn:aws:iam::<account ID>:role/APIGateway-Firehose
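As an optional alternative to the console steps above, here is a hedged boto3 sketch of Step 1.2: it creates the role with a trust policy that lets API Gateway assume it, then attaches the API-Firehose policy from Step 1.1. The <account ID> placeholder stands for your 12-digit account number.

import json
import boto3

iam = boto3.client("iam")

# Trust policy: allow the API Gateway service to assume this role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "apigateway.amazonaws.com"},
            "Action": "sts:AssumeRole"
        }
    ]
}

role = iam.create_role(
    RoleName="APIGateway-Firehose",
    AssumeRolePolicyDocument=json.dumps(trust_policy)
)

# Attach the customer managed policy created in Step 1.1.
iam.attach_role_policy(
    RoleName="APIGateway-Firehose",
    PolicyArn="arn:aws:iam::<account ID>:policy/API-Firehose"  # replace the placeholder
)
print(role["Role"]["Arn"])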
Task 2: Creating an S3 Bucket
In the AWS Management Console search box, enter S3 and open the service.
Choose Create bucket.
Enter a unique name for the bucket (for example, architecting-week2-<your initials>).
Ensure the AWS Region is set to US East (N. Virginia) us-east-1.
Choose Create bucket.
Open the bucket details by choosing its name.
Choose the Properties tab.
Copy the bucket’s Amazon Resource Name (ARN) and save it for your records.
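If you want to create the bucket from code instead, a minimal boto3 sketch follows. In us-east-1, create_bucket takes no LocationConstraint; replace the bucket name placeholder with a globally unique name.

import boto3

s3 = boto3.client("s3", region_name="us-east-1")

bucket_name = "architecting-week2-<your initials>"  # replace with a globally unique name
s3.create_bucket(Bucket=bucket_name)

# S3 bucket ARNs always follow this fixed format.
print(f"arn:aws:s3:::{bucket_name}")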
Task 3: Creating a Lambda Function
In the AWS Management Console, search for Lambda and open the service.
Choose Create a function.
Select Use a blueprint.
Filter by entering Kinesis and select the Python 3.8 blueprint called Process records sent to a Kinesis Firehose stream. Choose Configure.
Name the function transform-data.
Keep all default settings and choose Create function.
Replace the default code in the Code tab with the following:
import base64

def lambda_handler(event, context):
    # Build the output list inside the handler so records do not accumulate
    # across invocations when Lambda reuses a warm execution environment.
    output = []

    for record in event['records']:
        # Firehose delivers each record base64 encoded.
        payload = base64.b64decode(record['data']).decode('utf-8')

        # Append a newline so each JSON object lands on its own line in S3,
        # then re-encode the result as base64 text for the Firehose response.
        row_w_newline = payload + "\n"
        encoded = base64.b64encode(row_w_newline.encode('utf-8')).decode('utf-8')

        output.append({
            'recordId': record['recordId'],
            'result': 'Ok',
            'data': encoded
        })

    return {'records': output}
Choose Deploy.
In the Configuration tab, edit the Timeout setting to 10 seconds and save.
Copy the function ARN and save it for your records.
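Before wiring the function to Firehose, you can sanity-check the transformation logic locally. The sketch below assumes the lambda_handler function above is in scope; the event shape mirrors what Kinesis Data Firehose sends to a transformation Lambda, and the recordId is a made-up sample value.

import base64
import json

sample_payload = json.dumps({
    "element_clicked": "entree_1",
    "time_spent": 67,
    "source_menu": "restaurant_name",
    "created_at": "2022-09-11 23:00:00"
})

# A minimal Firehose transformation event with a single record.
test_event = {
    "records": [
        {
            "recordId": "sample-record-id-1",  # made-up ID for local testing
            "data": base64.b64encode(sample_payload.encode("utf-8")).decode("utf-8")
        }
    ]
}

result = lambda_handler(test_event, None)
decoded = base64.b64decode(result["records"][0]["data"]).decode("utf-8")
print(repr(decoded))  # the original JSON payload followed by a newline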
Task 4: Creating a Kinesis Data Firehose Delivery Stream
Step 4.1: Creating the Delivery Stream
In the AWS Management Console, search for Kinesis and open the service.
Select Kinesis Data Firehose and choose Create delivery stream.
Set Source to Direct PUT and Destination to Amazon S3.
Enable data transformation and select the ARN of your Lambda function.
Choose your S3 bucket and select Choose.
Create the delivery stream and wait for it to be ready.
Step 4.2: Copying the IAM Role ARN
Open the delivery stream details page if needed.
In the Configuration tab, select the IAM role.
Copy the IAM role ARN and save it.
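To confirm that the stream accepts records before you build the API, you can push one test record directly with boto3, as in this hedged sketch. Delivery to S3 also depends on the permissions you configure in the next task, and records appear in the bucket only after the stream's buffer interval elapses.

import json
import boto3

firehose = boto3.client("firehose", region_name="us-east-1")

record = {
    "element_clicked": "entree_1",
    "time_spent": 67,
    "source_menu": "restaurant_name",
    "created_at": "2022-09-11 23:00:00"
}

# PutRecord is the same action the API Gateway integration will call later.
firehose.put_record(
    DeliveryStreamName="<Enter the name of your delivery stream>",
    Record={"Data": json.dumps(record).encode("utf-8")}
)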
Task 5: Adding the Firehose Delivery Stream ARN to the S3 Bucket
Open the Amazon S3 console and open the bucket details.
In the Permissions tab, edit the Bucket policy and paste the following script:
{
    "Version": "2012-10-17",
    "Id": "PolicyID",
    "Statement": [
        {
            "Sid": "StmtID",
            "Effect": "Allow",
            "Principal": {
                "AWS": "<Enter the ARN for the Kinesis Firehose IAM role>"
            },
            "Action": [
                "s3:AbortMultipartUpload",
                "s3:GetBucketLocation",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:PutObject",
                "s3:PutObjectAcl"
            ],
            "Resource": [
                "<Enter the ARN of the S3 bucket>",
                "<Enter the ARN of the S3 bucket>/*"
            ]
        }
    ]
}
Replace the Principal placeholder with the ARN of the Kinesis Data Firehose IAM role and the Resource placeholders with your bucket ARN, and then save the policy.
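If you would rather apply the policy from code, a short boto3 sketch follows. It assumes you saved the JSON document above, with the placeholders filled in, as bucket-policy.json next to the script.

import boto3

s3 = boto3.client("s3")

# bucket-policy.json is the document above with your ARNs substituted.
with open("bucket-policy.json") as f:
    s3.put_bucket_policy(
        Bucket="architecting-week2-<your initials>",  # replace with your bucket name
        Policy=f.read()
    )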
Task 6: Creating an API in API Gateway
Open the API Gateway console.
Select Build on the REST API card and configure as follows:
Protocol: REST
Create new API: New API
API name: clickstream-ingest-poc
Endpoint Type: Regional
Choose Create API.
Create a new resource named poc.
Create a POST method with the following settings:
Integration type: AWS Service
AWS Region: us-east-1
AWS Service: Firehose
HTTP method: POST
Action Type: Use action name
Action: PutRecord
Execution role: Paste the ARN of the APIGateway-Firehose role
Add a mapping template for application/json and use the following script:
{
    "DeliveryStreamName": "<Enter the name of your delivery stream>",
    "Record": {
        "Data": "$util.base64Encode($util.escapeJavaScript($input.json('$')).replaceAll("\\'","'"))"
    }
}
In the template, replace the placeholder value for DeliveryStreamName with the name of the Kinesis Data Firehose delivery stream that you created. You can find the name in the Amazon Kinesis console, under Delivery streams.
Choose Save.
Navigate back to the /poc - POST - Method Execution page and select Test.
For the test, input various discrete JSON payloads into the API to simulate how an application frontend sends small bits of data each time a person interacts with the menu.
Paste the following JSON into the "Request Body" box and choose "Test":
{
    "element_clicked": "entree_1",
    "time_spent": 67,
    "source_menu": "restaurant_name",
    "created_at": "2022-09-11 23:00:00"
}
Review the request logs on the right side of the window. Confirm that you see the following messages: “Successfully completed execution” and “Method completed with status: 200”. These messages indicate that API Gateway processed the data successfully.
Replace the previous JSON payload with the following JSON and choose "Test" to confirm successful processing:
{
    "element_clicked": "drink_1",
    "time_spent": 15,
    "source_menu": "restaurant_name",
    "created_at": "2022-09-11 23:00:00"
}
Repeat the test for each of the following JSON payloads, adjusting the element_clicked value accordingly:
entree_4
drink_1
drink_3
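The console test tool is all this exercise requires, but for context, once the API is deployed to a stage (deployment is not covered here), a frontend could send the same events over HTTPS. This sketch uses the requests library with a placeholder invoke URL; both the URL and the payload values are illustrative.

import requests

# Placeholder invoke URL; a real one comes from deploying the API to a stage.
INVOKE_URL = "https://<api-id>.execute-api.us-east-1.amazonaws.com/<stage>/poc"

payload = {
    "element_clicked": "drink_3",
    "time_spent": 14,  # sample value
    "source_menu": "restaurant_name",
    "created_at": "2022-09-11 23:00:00"
}

response = requests.post(INVOKE_URL, json=payload)
print(response.status_code)  # expect 200 when the integration succeeds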
Task 7: Creating an Athena Table
In this task, you will create a table in Athena and run a SQL query to view the payloads that you inserted through the REST API.
Open the Athena service console.
Choose "Query editor".
Navigate to the "Settings" tab and then "Manage".
Browse and select the S3 bucket created in this exercise. Click "Save".
Choose the "Editor" tab.
In the "Tables and views" section, under the "Create" menu, select "CREATE TABLE AS SELECT".
Replace the placeholder code in the Query editor with the provided script:
CREATE EXTERNAL TABLE my_ingested_data (
    element_clicked STRING,
    time_spent INT,
    source_menu STRING,
    created_at STRING
)
PARTITIONED BY (datehour STRING)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
    'paths' = 'element_clicked, time_spent, source_menu, created_at'
)
LOCATION "s3://<Enter your Amazon S3 bucket name>/"
TBLPROPERTIES (
    "projection.enabled" = "true",
    "projection.datehour.type" = "date",
    "projection.datehour.format" = "yyyy/MM/dd/HH",
    "projection.datehour.range" = "2021/01/01/00,NOW",
    "projection.datehour.interval" = "1",
    "projection.datehour.interval.unit" = "HOURS",
    "storage.location.template" = "s3://<Enter your Amazon S3 bucket name>/${datehour}/"
)
Replace the following placeholder values in the script:
LOCATION: Replace <Enter your Amazon S3 bucket name> with the name of your bucket. Ensure the format is "s3://<bucket_name>/".
storage.location.template: Replace <Enter your Amazon S3 bucket name> with the name of your bucket. For example: "s3://<bucket_name>/${datehour}/".
Choose "Run".
Create a new query by choosing the plus sign (+) at the top right of the query editor.
In the query editor, paste SELECT * FROM my_ingested_data; and choose Run. The query should return the entries that you submitted through API Gateway.
Task 8: Visualizing Data with Amazon QuickSight
After the clickstream data is processed successfully, you can use QuickSight to visualize it. With QuickSight, you can gain better insights into your streaming data by analyzing it and publishing data dashboards. The instructions for visualizing data in QuickSight differ depending on whether you are a new user or an existing user.
Note: Amazon QuickSight is a subscription service. If you need to delete your QuickSight account after you complete this exercise, follow the instructions in the final task.
For new Amazon QuickSight users
Open the QuickSight service console.
Choose Sign up for QuickSight. Choose Enterprise and Continue.
Set up an account and choose Finish.
In the upper-right corner, open the user menu by choosing the user icon and then choose Manage QuickSight.
In the navigation pane, choose Security & permissions and in QuickSight access to AWS services, choose Manage.
Under Amazon S3, choose Select S3 buckets.
Select the bucket that you created in this exercise, and also select Write permission for Athena Workgroup.
Choose Finish and save your changes.
Return to the QuickSight console.
In the Analyses tab, choose New analysis.
Choose New dataset.
Choose Athena and configure the following settings:
Name data source: poc-clickstream
Select workgroup: [primary]
Choose Create data source.
In the Choose your table dialog box, select the my_ingested_data table, and choose Select.
In the Finish dataset creation dialog box, make sure that Import to SPICE for quicker analytics is selected, and choose Visualize.
View your visualization results by selecting field items and visual types for the diagram. For more information about how to visualize data in Amazon QuickSight, see Tutorial: Create an Amazon QuickSight analysis.