To reduce data scan cost, AWS Athena provides an option to bucket your data. This optimization can substantially cut the amount of data a query reads, and with it the cost.
Bucketing groups the rows within a single partition according to the values of specific columns, known as bucket keys. Because related rows land in the same bucket (a file within a partition), Athena scans far less data for queries that filter on those keys, which improves query performance and reduces cost.
For example, imagine collecting and storing sensor data. If you frequently filter or aggregate by sensor ID, then within a single partition it’s better to store all rows for the same sensor together. A CREATE TABLE AS SELECT (CTAS) statement can set this up:
CREATE TABLE TargetTable
WITH (
  format = 'PARQUET',
  external_location = 's3://
  partitioned_by = ARRAY['dt'],
  bucketed_by = ARRAY['sensorID'],
  bucket_count = 3)
AS SELECT *
You can then run a SELECT query that filters on both the partition key and the bucket key:
SELECT * FROM TargetTable WHERE dt = '2020-08-04-21' AND sensorID = '1096'
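
To make the saving concrete, here is a minimal Python sketch of the idea behind bucketing. It is not Athena's implementation: the CRC32 hash is a stand-in chosen for illustration (Athena uses its own hash function), and the sample rows and the helper names `bucket_for`, `write_buckets`, and `query_by_sensor` are hypothetical. It only shows why an equality filter on the bucket key needs to read a single bucket file rather than the whole partition.

```python
import zlib
from collections import defaultdict

BUCKET_COUNT = 3  # mirrors bucket_count in the CTAS statement


def bucket_for(key: str, bucket_count: int = BUCKET_COUNT) -> int:
    """Map a bucket key to a bucket index.

    Assumption: CRC32 is only a stand-in for whatever hash Athena applies.
    """
    return zlib.crc32(key.encode("utf-8")) % bucket_count


def write_buckets(rows, bucket_count: int = BUCKET_COUNT):
    """Group the rows of one partition into bucket 'files' keyed by sensorID."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_for(row["sensorID"], bucket_count)].append(row)
    return buckets


def query_by_sensor(buckets, sensor_id: str, bucket_count: int = BUCKET_COUNT):
    """Read only the bucket that can contain sensor_id, then filter it."""
    target = bucket_for(sensor_id, bucket_count)
    scanned = buckets.get(target, [])
    matches = [row for row in scanned if row["sensorID"] == sensor_id]
    return matches, len(scanned)


# Hypothetical rows for a single partition (dt = '2020-08-04-21').
rows = [{"sensorID": str(1090 + i % 10), "reading": i} for i in range(30)]
buckets = write_buckets(rows)
matches, rows_scanned = query_by_sensor(buckets, "1096")
print(f"matched {len(matches)} rows after scanning {rows_scanned} of {len(rows)}")
```

Every row for a given sensor hashes to the same bucket, so the lookup never has to open the other files. With more buckets each file is smaller and less data is read per lookup, but each partition then holds more, smaller files.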