Enhance your career prospects with our Data Science TrainingKNOW MORE
Athena is a service offered by Amazon which is an interactive query service. Athena makes it easy for the user to directly analyse data which is present in Amazon S3 (Simple Storage Service) by using standard SQL. In the Amazon Management Console, with just a few clicks of the button, Athena can be made to point to the location where data is stored in Amazon S3. After this, SQL can be used to run ad-hoc queries, and this will fetch result to the user in a matter of seconds.
Athena is a serverless service, which means the user who uses this service doesn’t have to worry about managing the infrastructure or setting it up. Athena has the ability to scale automatically, hence it can execute multiple queries in parallel, which yields results quickly, even when datasets are huge and there are complex queries to be executed.
Athena helps in analysing unstructured, semi-structured as well as structured data that is stored in Amazon S3. Data can be stored in the form of CSV, JSON or columnar data formats like Apache Parquet and Apache ORC. Athena can be used to execute queries using ANSI SQL as well, and this doesn't require the user to aggregate or load data into Athena service.
For the purpose of data visualization, Athena can be integrated with Amazon QuickSight which helps in generating reports, data exploration with business-intelligence tools such as SQL clients that are connected with JDBC or an ODBC driver.
Athena can also be integrated with AWS Glue Data Catalog. AWS Data Catalog provides persistent metadata store for user data in Amazon S3. This way, tables can be created, and data can be queried in Athena, all based on the centralized metadata store which is available throughout the user’s account. This can also be integrated with ETL (Extract, Transform, and Load) and data discovery features which are present in AWS Glue Catalog..
Athena can be used to perform the following operations:
High-level overview of the actions that can be performed on every table:
Athena can be accessed in the following ways:
Before using the Athena service, a database has to be created in Athena. This can be done by following the below mentioned steps:
1. Open Athena console.
2. A ‘getting started’ page shows up if the user is visiting the Athena console for the first time.
3. Click on the ‘Get Started’ tab, and open the Athena Query Editor.
4. In the Query Editor, a query pane with a sample query is present. In that space, the user’s query can be entered.
5. A database can be created, by entering the below command in the Query Editor, and then clicking on the ‘Run Query’ button:
CREATE DATABASE database_name
6. Make sure that the Catalog display is refreshed and the name of the database appears in the list of DATABASE in the Catalog dashboard which is on the left side.
Once a database has been created, a table needs to be created. A table has to be created in the database that will have the same data. Columns can be defined that will map to the data, and specifications regarding how the data can be delimited can be specified. In addition to this, the location of Amazon S3 has to be provided for the file.
Once the database has been created, and a table has been created in the Athena service based on the data present in Amazon S3, it is time to query the data present in the table.
Click on the ‘New Query’, enter the below query, and click on the ‘Run Query’ button.
SELECT os, COUNT(*) count FROM data_table_name WHERE date BETWEEN date '2014-07-05' AND date '2014-08-05' GROUP BY os;
Once this query is executed, it gives some results based on the data present in Amazon S3.
This data can be stored to a CSV file, by clicking on the file icon present in the ‘Results’ pane.
The results of previous queries can also be viewed, and this might take some time. The ‘History’ button can be clicked on to view the previous queries.
Click on the ‘View’ or ‘Download’ button to view or download the previous query results. This will also display status of queries which are currently running.
Note: The history of these queries is retained for 45 days.
The query results can also be stored in Amazon S3 bucket, known as aws-athena-query-results-ACCOUNTID-REGION. The default location of S3 can be changed in the console. The encryption options can also be changed by clicking on the ‘Setting’ tab in the upper right pane.
In this post, we understood how Amazon Athena can be used to run SQL queries and get results.