Oops! Something went wrong while submitting the form.
We use cookies to improve your browsing experience on our website, to show you personalised content and to analize our website traffic. By browsing our website, you consent to our use of cookies. Read privacy policy.
Serverless has become a new trend today and is here to stay for sure! Now when you think of wireless internet, you know that it still has some wires but you don’t need to worry about them as you don’t have to maintain them. Similarly, serverless has servers but you don’t have to keep worrying about handling or maintaining them. All you need to do is focus on your code and you’re good to go.
It has some more benefits, such as:
Zero administration: You can deploy code without provisioning anything beforehand, or managing anything later. There is no concept of a fleet, an instance, or even an operating system.
Auto-scaling: It lets your service providers manage the scaling challenges. You don’t need to fire alerts or write scripts to scale up and down. It handles quick bursts of traffic and weekend lulls the same way.
Pay-per-use: The function-as-a-service compute and managed services are charged based on usage rather than pre-provisioned capacity. You can have complete resource utilization without paying a cent for idle time. The results? 90% cost-savings over a cloud VM, and the satisfaction of knowing that you never pay for resources you don’t use.
What is AWS Athena?
AWS Athena is a similar serverless service. It is more of an interactive query service than a code deployment service.
Using Athena one can directly query the data stored in S3 buckets and using standard ANSI SQL.
As mentioned earlier, it works on the principle of serverless, that is, there is no infrastructure to manage, and you pay only for the queries that you run.
Athena is easy to use. You can simply point to your data in Amazon S3, define the schema, and start querying using standard SQL. Most results are delivered within seconds. With Athena, there’s no need for complex ETL jobs to prepare your data for analysis. This makes it easy for anyone with SQL skills to quickly analyze large-scale datasets.
It is based on Facebook’s PrestoDB and can be used to query structured and semi-structured data.
Some Exciting Features of Athena are:
Serverless. No ETL - Not having to set up and manage any servers or data warehouses.
Only pay for the data that is scanned.
You can ensure better performance by compressing, partitioning, and converting your data into columnar formats.
Can also handle complex analysis, including large joins, window functions, and arrays.
Athena automatically executes queries in parallel.
Need to provide a path to the S3 folder and when new files added automatically reflects in the table.
Supports -
Support CSV, Json, Parquet, ORC, Avro data formats
Complex Joins and datatypes
View creation
Does not Support -
User-defined functions and stored procedures
Hive or Presto transactions
LZO (Snappy is supported)
Pricing of Athena
AWS Athena is priced $5 for each TB of data scanned.
Queries are rounded up to the nearest MB, with a 10 MB minimum.
Users pay for stored data at regular S3 rates.
Amazon advises users to use compressed data files, have data in columnar formats, and routinely delete old results sets to keep charges low. Partitioning data in tables can speed up queries and reduce query bills.
Athena vs. Redshift Spectrum
AWS also has Redshift as data warehouse service, and we can use redshift spectrum to query S3 data, so then why should you use Athena?
Advantages of Redshift Spectrum:
Allows creation of Redshift tables. You’re able to join Redshift tables with Redshift spectrum tables efficiently.
If you do not need those things then you should consider Athena as well Athena differences from Redshift spectrum:
Billing. This is a major difference and depending on your use case you may find one much cheaper than the other Performance.
Athena slightly faster. SQL syntax and features.
Athena is derived from presto and is a bit different to Redshift which has its roots in Postgres.
It's easy enough to connect to Athena using API, JDBC or ODBC but many more products offer "standard out of the box" connection to Redshift.
Athena has GIS functions and lambdas.
So in nutshell, if you have existing instances of redshift you would probably go for Redshift Spectrum, if not then you can opt for Athena for querying the data. In some cases, you can use both in tandem.
Example
Here is a sample query to create a sample database having 3 tables basic_details, contact_details and bill_details, Uploaded csv file to s3: Basic_details:
Hence, we can easily say that AWS Athena gives us an efficient way to query our raw data present in different formats in S3 object storage, without spawning a dedicated infrastructure and at minimal cost.
Serverless has become a new trend today and is here to stay for sure! Now when you think of wireless internet, you know that it still has some wires but you don’t need to worry about them as you don’t have to maintain them. Similarly, serverless has servers but you don’t have to keep worrying about handling or maintaining them. All you need to do is focus on your code and you’re good to go.
It has some more benefits, such as:
Zero administration: You can deploy code without provisioning anything beforehand, or managing anything later. There is no concept of a fleet, an instance, or even an operating system.
Auto-scaling: It lets your service providers manage the scaling challenges. You don’t need to fire alerts or write scripts to scale up and down. It handles quick bursts of traffic and weekend lulls the same way.
Pay-per-use: The function-as-a-service compute and managed services are charged based on usage rather than pre-provisioned capacity. You can have complete resource utilization without paying a cent for idle time. The results? 90% cost-savings over a cloud VM, and the satisfaction of knowing that you never pay for resources you don’t use.
What is AWS Athena?
AWS Athena is a similar serverless service. It is more of an interactive query service than a code deployment service.
Using Athena one can directly query the data stored in S3 buckets and using standard ANSI SQL.
As mentioned earlier, it works on the principle of serverless, that is, there is no infrastructure to manage, and you pay only for the queries that you run.
Athena is easy to use. You can simply point to your data in Amazon S3, define the schema, and start querying using standard SQL. Most results are delivered within seconds. With Athena, there’s no need for complex ETL jobs to prepare your data for analysis. This makes it easy for anyone with SQL skills to quickly analyze large-scale datasets.
It is based on Facebook’s PrestoDB and can be used to query structured and semi-structured data.
Some Exciting Features of Athena are:
Serverless. No ETL - Not having to set up and manage any servers or data warehouses.
Only pay for the data that is scanned.
You can ensure better performance by compressing, partitioning, and converting your data into columnar formats.
Can also handle complex analysis, including large joins, window functions, and arrays.
Athena automatically executes queries in parallel.
Need to provide a path to the S3 folder and when new files added automatically reflects in the table.
Supports -
Support CSV, Json, Parquet, ORC, Avro data formats
Complex Joins and datatypes
View creation
Does not Support -
User-defined functions and stored procedures
Hive or Presto transactions
LZO (Snappy is supported)
Pricing of Athena
AWS Athena is priced $5 for each TB of data scanned.
Queries are rounded up to the nearest MB, with a 10 MB minimum.
Users pay for stored data at regular S3 rates.
Amazon advises users to use compressed data files, have data in columnar formats, and routinely delete old results sets to keep charges low. Partitioning data in tables can speed up queries and reduce query bills.
Athena vs. Redshift Spectrum
AWS also has Redshift as data warehouse service, and we can use redshift spectrum to query S3 data, so then why should you use Athena?
Advantages of Redshift Spectrum:
Allows creation of Redshift tables. You’re able to join Redshift tables with Redshift spectrum tables efficiently.
If you do not need those things then you should consider Athena as well Athena differences from Redshift spectrum:
Billing. This is a major difference and depending on your use case you may find one much cheaper than the other Performance.
Athena slightly faster. SQL syntax and features.
Athena is derived from presto and is a bit different to Redshift which has its roots in Postgres.
It's easy enough to connect to Athena using API, JDBC or ODBC but many more products offer "standard out of the box" connection to Redshift.
Athena has GIS functions and lambdas.
So in nutshell, if you have existing instances of redshift you would probably go for Redshift Spectrum, if not then you can opt for Athena for querying the data. In some cases, you can use both in tandem.
Example
Here is a sample query to create a sample database having 3 tables basic_details, contact_details and bill_details, Uploaded csv file to s3: Basic_details:
Hence, we can easily say that AWS Athena gives us an efficient way to query our raw data present in different formats in S3 object storage, without spawning a dedicated infrastructure and at minimal cost.
Velotio Technologies is an outsourced software product development partner for top technology startups and enterprises. We partner with companies to design, develop, and scale their products. Our work has been featured on TechCrunch, Product Hunt and more.
We have partnered with our customers to built 90+ transformational products in areas of edge computing, customer data platforms, exascale storage, cloud-native platforms, chatbots, clinical trials, healthcare and investment banking.
Since our founding in 2016, our team has completed more than 90 projects with 220+ employees across the following areas:
Building web/mobile applications
Architecting Cloud infrastructure and Data analytics platforms