Using Lambda Functions as UDF’s in Redshift

Let’s assume I have a list of client_codes saved in a redshift table and I need to find the details from an API.

# select client_code from some_table limit 10;
  client_code   |
————–+
 1001 |  
 2002 |  
 9009 |  
 1009 |  
 1898 |  
 5465 |  
 3244 |  
 5576 |  
 4389 |  
 8756 |  
(10 rows)

I need to get the client addresses from a website. For e.g. the first client code is 1001 and address should come from
http://some_site.com/Details?dest=1001

This can not be done at SQL query level. You need to loop through an array using Python, PHP, Java etc. You can also write your scripts in AWS Lambda and use them as UDF (User Defined Functions) in Redshift. For e.g.

# select client_code, client_details(client_code) as c_address from some_table limit 10;
  client_code   |                  c_address
————–+———————————————
 1001 | 21,Tamilnadu,
 2002 | 14,Madhya Pradesh & Chattisgarh,
 9009 | 7,Gujarat,
 1009 | 23,Uttar Pradesh (W) & Uttarakhand
 1898 | 11,Karnataka
 5465 | 3,Bihar & Jharkhand
 3244 | 11,Karnataka
 5576 | 6,Delhi
 4389 | 13,Kolkata
 8756 | 11,Karnataka
(10 rows)

The code of “client_details” Lambda function will look something like this…

import json
import requests
myurl = ‘http://some_site.com/Details?dest=’

def lambda_handler(event, context):
  ret = dict()
  res = list()
  for argument in event[‘arguments’]:
      try:
        number = str(argument[0])
        page = requests.get(myurl+number[-10:])
        res.append((page.content).decode(‘utf-8’))
        ret[‘success’] = True
      except Exception as e:
        res.append(None)
        ret[‘success’] = False
        ret[‘error_msg’] = str(e)
      ret[‘results’] = res
  return json.dumps(ret)

Notes:
1) We are using “requests” module in this code. Since it is not available in AWS Lambda environment, I have added it using this layer…
# Layer: arn:aws:lambda:us-east-1:770693421928:layer:Klayers-python38-requests:9

2) You will also need to increase the timeout of Lambda upto 15 minutes. The API may take more than 3 seconds (default) to respond.

3) You will also have to update the IAM role associated with your Redshift cluster. (Actions – Manage Role) You can add the policy called “AWSLambdaFullAccess” or grant access to a single function as explained in the documentation.

The lambda function needs to be “linked” to Redshift using the “create function” statement like this…

CREATE OR REPLACE EXTERNAL FUNCTION client_details (number varchar )
RETURNS varchar STABLE
LAMBDA ‘client_details’
IAM_ROLE ‘arn:aws:iam::123456789012:role/RedshiftCopyUnload’;

You need to change the IAM role name and the 12 digit account ID mentioned above in the IAM Role. 

You can now use your lambda function in your redshift query for e.g.
# select client_code, client_details(client_code) as c_address from some_table limit 10;
You can read more…

# https://aws.amazon.com/blogs/big-data/accessing-external-components-using-amazon-redshift-lambda-udfs/

Powered by WPeMatico