top

Search

AWS Tutorials

Athena is a service offered by Amazon which is an interactive query service. Athena makes it easy for the user to directly analyse data which is present in Amazon S3 (Simple Storage Service) by using standard SQL. In the Amazon Management Console, with just a few clicks of the button, Athena can be made to point to the location where data is stored in Amazon S3. After this, SQL can be used to run ad-hoc queries, and this will fetch result to the user in a matter of seconds.  Athena is a serverless service, which means the user who uses this service doesn’t have to worry about managing the infrastructure or setting it up. Athena has the ability to scale automatically, hence it can execute multiple queries in parallel, which yields results quickly, even when datasets are huge and there are complex queries to be executed.  Use case of AthenaAthena helps in analysing unstructured, semi-structured as well as structured data that is stored in Amazon S3. Data can be stored in the form of CSV, JSON or columnar data formats like Apache Parquet and Apache ORC. Athena can be used to execute queries using ANSI SQL as well, and this doesn't require the user to aggregate or load data into Athena service.  For the purpose of data visualization, Athena can be integrated with Amazon QuickSight which helps in generating reports, data exploration with business-intelligence tools such as SQL clients that are connected with JDBC or an ODBC driver.  Athena can also be integrated with AWS Glue Data Catalog. AWS Data Catalog provides persistent metadata store for user data in Amazon S3. This way, tables can be created, and data can be queried in Athena, all based on the centralized metadata store which is available throughout the user’s account. This can also be integrated with ETL (Extract, Transform, and Load) and data discovery features which are present in AWS Glue Catalog.. Athena can be used to perform the following operations:Creation or selection of a database. Creating, viewing and deleting tables. Filtering tables by just typing their names.  Previewing tables, generating the ‘CREATE TABLE DDL’ for them.  Showing table properties.  Running queries on tables, saving and formatting these queries, viewing query history.  Displaying query results, saving and exporting this data.  Accessing AWS Glue Data Catalog.  Viewing and changing the setting, ex- viewing query result, and location, configuring auto-complete, and encrypting the results of the query.  High-level overview of the actions that can be performed on every table: Preview tables: The query syntax can be viewed in the Query Editor which is present on the right pane.  Show properties: A table’s properties, such as table name, its location in Amazon S3, input and output formats, serialization library that is used, and whether the table contains encrypted data.  Delete table: A table can be deleted when it is not required.  Generate CREATE TABLE DLL: Query can be generated behind a table, and it can be viewed in the query editor.  Accessing AthenaAthena can be accessed in the following ways: Amazon Management Console Through JDBC or an ODBC connection Athena API Athena CLI Pre-requisites An AWS account IAM User- IAM (Identity and Access Management) user is an account which is created so as to access the Amazon services. This is a different user which is not the same as AWS account.  Using Athena serviceBefore using the Athena service, a database has to be created in Athena. This can be done by following the below mentioned steps:1. Open Athena console. 2. A ‘getting started’ page shows up if the user is visiting the Athena console for the first time.3. Click on the ‘Get Started’ tab, and open the Athena Query Editor. 4. In the Query Editor, a query pane with a sample query is present. In that space, the user’s query can be entered.5. A database can be created, by entering the below command in the Query Editor, and then clicking on the ‘Run Query’ button:CREATE DATABASE database_name6. Make sure that the Catalog display is refreshed and the name of the database appears in the list of DATABASE in the Catalog dashboard which is on the left side.Once a database has been created, a table needs to be created.  A table has to be created in the database that will have the same data. Columns can be defined that will map to the data, and specifications regarding how the data can be delimited can be specified. In addition to this, the location of Amazon S3 has to be provided for the file.  Make sure the selected DATABASE from the list is your_database_name.  Click on the ‘New Query’. In the query pane, enter the below mentioned statement, and then click on ‘Run Query’.  Data can be queried in regions other than region where Athena is run.  Once the table has been created, it appears in the Catalog dashboard of the user’s database.Once the database has been created, and a table has been created in the Athena service based on the data present in Amazon S3, it is time to query the data present in the table.Click on the ‘New Query’, enter the below query, and click on the ‘Run Query’ button.SELECT os, COUNT(*) count  FROM data_table_name  WHERE date BETWEEN date '2014-07-05' AND date '2014-08-05'  GROUP BY os;Once this query is executed, it gives some results based on the data present in Amazon S3.  This data can be stored to a CSV file, by clicking on the file icon present in the ‘Results’ pane. The results of previous queries can also be viewed, and this might take some time.  The ‘History’ button can be clicked on to view the previous queries.Click on the ‘View’ or ‘Download’ button to view or download the previous query results.  This will also display status of queries which are currently running.Note: The history of these queries is retained for 45 days.The query results can also be stored in Amazon S3 bucket, known as aws-athena-query-results-ACCOUNTID-REGION. The default location of S3 can be changed in the console. The encryption options can also be changed by clicking on the ‘Setting’ tab in the upper right pane.ConclusionIn this post, we understood how Amazon Athena can be used to run SQL queries and get results.
logo

AWS Tutorials

What are the different use case of Athena in AWS?

Athena is a service offered by Amazon which is an interactive query service. Athena makes it easy for the user to directly analyse data which is present in Amazon S3 (Simple Storage Service) by using standard SQL. In the Amazon Management Console, with just a few clicks of the button, Athena can be made to point to the location where data is stored in Amazon S3. After this, SQL can be used to run ad-hoc queries, and this will fetch result to the user in a matter of seconds.  

Athena is a serverless service, which means the user who uses this service doesn’t have to worry about managing the infrastructure or setting it up. Athena has the ability to scale automatically, hence it can execute multiple queries in parallel, which yields results quickly, even when datasets are huge and there are complex queries to be executed.  

Use case of Athena

Athena helps in analysing unstructured, semi-structured as well as structured data that is stored in Amazon S3. Data can be stored in the form of CSV, JSON or columnar data formats like Apache Parquet and Apache ORC. Athena can be used to execute queries using ANSI SQL as well, and this doesn't require the user to aggregate or load data into Athena service.  

For the purpose of data visualization, Athena can be integrated with Amazon QuickSight which helps in generating reports, data exploration with business-intelligence tools such as SQL clients that are connected with JDBC or an ODBC driver.  

Athena can also be integrated with AWS Glue Data Catalog. AWS Data Catalog provides persistent metadata store for user data in Amazon S3. This way, tables can be created, and data can be queried in Athena, all based on the centralized metadata store which is available throughout the user’s account. This can also be integrated with ETL (Extract, Transform, and Load) and data discovery features which are present in AWS Glue Catalog.. 

Athena can be used to perform the following operations:

  • Creation or selection of a database. 
  • Creating, viewing and deleting tables. 
  • Filtering tables by just typing their names.  
  • Previewing tables, generating the ‘CREATE TABLE DDL’ for them.  
  • Showing table properties.  
  • Running queries on tables, saving and formatting these queries, viewing query history.  
  • Displaying query results, saving and exporting this data.  
  • Accessing AWS Glue Data Catalog.  
  • Viewing and changing the setting, ex- viewing query result, and location, configuring auto-complete, and encrypting the results of the query.  

High-level overview of the actions that can be performed on every table: 

  • Preview tables: The query syntax can be viewed in the Query Editor which is present on the right pane.  
  • Show properties: A table’s properties, such as table name, its location in Amazon S3, input and output formats, serialization library that is used, and whether the table contains encrypted data.  
  • Delete table: A table can be deleted when it is not required.  
  • Generate CREATE TABLE DLL: Query can be generated behind a table, and it can be viewed in the query editor.  

Accessing Athena

Athena can be accessed in the following ways: 

  • Amazon Management Console 
  • Through JDBC or an ODBC connection 
  • Athena API 
  • Athena CLI 

Pre-requisites 

  • An AWS account 
  • IAM User- IAM (Identity and Access Management) user is an account which is created so as to access the Amazon services. This is a different user which is not the same as AWS account.  

Using Athena service

Before using the Athena service, a database has to be created in Athena. This can be done by following the below mentioned steps:

1. Open Athena console. 
2. A ‘getting started’ page shows up if the user is visiting the Athena console for the first time.
3. Click on the ‘Get Started’ tab, and open the Athena Query Editor. 
4. In the Query Editor, a query pane with a sample query is present. In that space, the user’s query can be entered.

Amazon Athena

5. A database can be created, by entering the below command in the Query Editor, and then clicking on the ‘Run Query’ button:

CREATE DATABASE database_name

6. Make sure that the Catalog display is refreshed and the name of the database appears in the list of DATABASE in the Catalog dashboard which is on the left side.

Amazon Athena

Once a database has been created, a table needs to be created.  A table has to be created in the database that will have the same data. Columns can be defined that will map to the data, and specifications regarding how the data can be delimited can be specified. In addition to this, the location of Amazon S3 has to be provided for the file.  

  • Make sure the selected DATABASE from the list is your_database_name.  
  • Click on the ‘New Query’
  • In the query pane, enter the below mentioned statement, and then click on ‘Run Query’.  
  • Data can be queried in regions other than region where Athena is run.  
  • Once the table has been created, it appears in the Catalog dashboard of the user’s database.

Amazon Athena

Once the database has been created, and a table has been created in the Athena service based on the data present in Amazon S3, it is time to query the data present in the table.

Click on the ‘New Query’, enter the below query, and click on the ‘Run Query’ button.

SELECT os, COUNT(*) count 
FROM data_table_name 
WHERE date BETWEEN date '2014-07-05' AND date '2014-08-05' 
GROUP BY os;

Once this query is executed, it gives some results based on the data present in Amazon S3.  

This data can be stored to a CSV file, by clicking on the file icon present in the ‘Results’ pane. 

Amazon Athena

The results of previous queries can also be viewed, and this might take some time.  The ‘History’ button can be clicked on to view the previous queries.

Click on the ‘View’ or ‘Download’ button to view or download the previous query results.  This will also display status of queries which are currently running.

Note: The history of these queries is retained for 45 days.

Amazon Athena

The query results can also be stored in Amazon S3 bucket, known as aws-athena-query-results-ACCOUNTID-REGION. The default location of S3 can be changed in the console. The encryption options can also be changed by clicking on the ‘Setting’ tab in the upper right pane.

Conclusion

In this post, we understood how Amazon Athena can be used to run SQL queries and get results.

Leave a Reply

Your email address will not be published. Required fields are marked *