Search

Python Programming Filter

How to Work with Excel Spreadsheets using Python

Excel is considered as one of the most popular and widely used spreadsheet applications developed by Microsoft. You can organize, analyze and store your data into tabular sheets with the help of Excel. From analysts and sales managers, to CEOs, professionals from every field use Excel for creating quick statistics and for data crunching.Spreadsheets are commonly used in the present world because of their intuitive nature and the ability to handle large datasets. Most importantly, they can work without any prior technical background.Finding different ways to work with Excel using code is essential since working with data and in Python has some serious advantages in comparison with Excel’s UI. Developers of Python have implemented ways to read, write and manipulate Excel documents.You can check the quality of your spreadsheet application by going over the checklist below:Is the spreadsheet able to represent static data?Is the spreadsheet able to mix data, calculations, and reports?Is the data in your spreadsheet complete and consistent in nature?Does the spreadsheet have an organized worksheet structure?This checklist will help you in verifying the qualitative nature of the spreadsheet application you’re going to work on.Practical Applications  In this article, we would be using openpyxl to work on data. With the help of this module, you can extract data from a database into an Excel spreadsheet or you can also convert an Excel spreadsheet into a programmatic format. There can be a lot of possible situations where you might feel the need to use a package like openpyxl. Let us discuss a few of them to get a comprehensive overview of it.Importing New Products Into a Database Consider yourself working in an online store company. When they want to add new products to the online store, they make an Excel spreadsheet with a few hundred rows along with the name of the product, description, price and a few more basic information and then they give it to you. Now, if you want to import this particular data, you need to iterate over each row of the spreadsheet and then add each of the products into the database of the online store.[Text Wrapping Break] Exporting Database Data Into a SpreadsheetConsider you have a Database table. In this particular table, you have collected information of all your users which includes their name, contact number, email address, and so forth. Now, the Marketing Team is willing to collectively contact all the users and promote a new product of the company. However, neither do they have access to the Database nor they have any idea about using SQL to extract the information. In this situation, openpyxl comes to play. You can use it effectively to iterate over each User record and transform the required information into an Excel spreadsheet.    Appending Information to an Existing SpreadsheetConsider the same online store example we discussed above. You have an Excel spreadsheet with a list of users and your job is to append to each row the total amount they have spent in your store.In order to perform this, you have to read the spreadsheet first and then iterate through each row and fetch the total amount spent from the Database. Finally, you need to write it back to the spreadsheet.Starting openpyxlYou can install the openpyxl package using pip. Open your terminal and write the following command: $ pip install openpyxlAfter you have installed the spreadsheet, you can make up your own simple spreadsheet: from openpyxl import Workbook workbook = Workbook() spreadsheet = workbook.active spreadsheet["A1"] = "Hello" spreadsheet["B1"] = "World!" workbook.save(filename="HelloWorld.xlsx")How to Read Excel Spreadsheets with openpyxl Let us start with the most important thing that you can do with a spreadsheet,i.e. read it. We will be using a Watch Sample Dataset which contains a list of 100 watches with information like product name, product ID, review and so forth.  A Simple Way to Read an Excel Spreadsheet Let us start with opening our sample spreadsheet:>>> from openpyxl import load_workbook >>> workbook = load_workbook(filename="sample.xlsx") >>> workbook.sheetnames ['Sheet 1'] >>> spreadsheet = workbook.active >>> spreadsheet <Worksheet "Sheet 1"> >>> spreadsheet.titleIn the example code above, we open the spreadsheet using load_workbook and then we check all the sheets that are available to work with using workbook.sheetnames. Then Sheet 1 is automatically selected using workbook.active since it is the first sheet available. This is the most common way of opening a spreadsheet.  Now, let us see the code to retrieve data from the spreadsheet: >>> spreadsheet["A1"] <Cell 'Sheet 1'.A1> >>> spreadsheet["A1"].value 'marketplace' >>> spreadsheet["F10"].value "G-Shock Men's Grey Sport Watch"You can retrieve the actual value and the cell value  both. To get the actual value, use .value and to get the cell, you can use .cell():>>> spreadsheet.cell(row=10, column=6) <Cell 'Sheet 1'.F10> >>> spreadsheet.cell(row=10, column=6).value "G-Shock Men's Grey Sport Watch"Importing Data from a Spreadsheet In this section, we will discuss how to iterate through the data, and about conversion into a more useful format using Python.Let us first start with iterating through the data. There are a number of iterating methods that depend solely on the user.You can slice the data with a combination of rows and columns:>>> spreadsheet["A1:C2"] ((<Cell 'Sheet 1'.A1>, <Cell 'Sheet 1'.B1>, <Cell 'Sheet 1'.C1>),  (<Cell 'Sheet 1'.A2>, <Cell 'Sheet 1'.B2>, <Cell 'Sheet 1'.C2>)) You can also iterate through the dataset by ranging between rows and columns: >>> # Get all cells from column A  >>> spreadsheet["A"] (<Cell 'Sheet 1'.A1>,  <Cell 'Sheet 1'.A2>,   ...   <Cell 'Sheet 1'.A99>,   <Cell 'Sheet 1'.A100>) >>> # Get all cells for a range of columns >>> spreadsheet["A:B"]  ((<Cell 'Sheet 1'.A1>,    <Cell 'Sheet 1'.A2>,    ...    <Cell 'Sheet 1'.A99>,    <Cell 'Sheet 1'.A100>),   (<Cell 'Sheet 1'.B1>,    <Cell 'Sheet 1'.B2>,    ...    <Cell 'Sheet 1'.B99>,    <Cell 'Sheet 1'.B100>))  >>> # Get all cells from row 5 >>> spreadsheet[5] (<Cell 'Sheet 1'.A5>,  <Cell 'Sheet 1'.B5>,  ...   <Cell 'Sheet 1'.N5>,  <Cell 'Sheet 1'.O5>) >>> # Get all cells for a range of rows >>> spreadsheet[5:6] ((<Cell 'Sheet 1'.A5>,   <Cell 'Sheet 1'.B5>,    ...    <Cell 'Sheet 1'.N5>,    <Cell 'Sheet 1'.O5>),   (<Cell 'Sheet 1'.A6>,    <Cell 'Sheet 1'.B6>,    ...    <Cell 'Sheet 1'.N6>,    <Cell 'Sheet 1'.O6>)) Python offers arguments by which you can set limits to the iteration with the help of Python generators like .iter_rows() and .iter_cols(): >>> for row in spreadsheet.iter_rows(min_row=1, ... max_row=2, ... min_col=1, ... max_col=3): ... print(row) (<Cell 'Sheet 1'.A1>, <Cell 'Sheet 1'.B1>, <Cell 'Sheet 1'.C1>) (<Cell 'Sheet 1'.A2>, <Cell 'Sheet 1'.B2>, <Cell 'Sheet 1'.C2>) >>> for column in spreadsheet.iter_cols(min_row=1,  ... max_row=2, ... min_col=1, ... max_col=3): ... print(column) (<Cell 'Sheet 1'.A1>, <Cell 'Sheet 1'.A2>) (<Cell 'Sheet 1'.B1>, <Cell 'Sheet 1'.B2>)  (<Cell 'Sheet 1'.C1>, <Cell 'Sheet 1'.C2>) You can also add Boolean values_only in the above example and set it to True to get the values of cell: >>> for value in spreadsheet.iter_rows(min_row=1,  ... max_row=2,  ... min_col=1,  ... max_col=3,  ... values_only=True): ... print(value) ('marketplace', 'customer_id', 'review_id') ('US', 3653882, 'R3O9SGZBVQBV76')Since we are now done with iterating the data, let us now manipulate data using Python’s primitive data structures. Consider a situation where you want to extract information of a product from the sample spreadsheet and then store it into the dictionary. The key to the dictionary would be the product ID.   Convert Data into Python classesTo convert data into Python data classes, let us first decide what we want to store and how to store it.  The two essential elements that can be extracted from the data are as follows:                                                     1. Products                                             2. Review                                                          • ID                                                         • ID                                                          • Title                                                     • Customers ID                                                          • Parent                                                 • Headline                                                          • Category                                            • Body                                                                                                                         • DateLet us implement the two elements: import datetime from dataclasses import dataclass @dataclass class Product: id: str parent: str title: str category: str @dataclass class Review: id: str customer_id: str stars: int headline: str body: str  date: datetime.datetime The next step is to create a mapping between columns and the required fields: >>> for value in spreadsheet.iter_rows(min_row=1, ... max_row=1, ... values_only=True): ... print(value) ('marketplace', 'customer_id', 'review_id', 'product_id', ...) >>> # Or an alternative >>> for cell in sheet[1]: ... print(cell.value) marketplace Customer_ID Review_ID Product_ID Product_Parent ...Finally, let us convert the data into new structures which will parse the data in spreadsheet into a list of products and review objects: from datetime import datetime  from openpyxl import load_workbook  from classes import Product,Review  from mapping import PRODUCT_ID,PRODUCT_PARENT,PRODUCT_TITLE, \ PRODUCT_CATEGORY,REVIEW_DATE,REVIEW_ID,REVIEW_CUSTOMER, \ REVIEW_STARS,REVIEW_HEADLINE,REVIEW_BODY # Using the read_only method since you're not gonna be editing the spreadsheet workbook = load_workbook(filename="watch_sample.xlsx",read_only=True)  spreadsheet = workbook.active products = [] reviews = [] # Using the values_only because you just want to return the cell value for row in spreadsheet .iter_rows(min_row=2, values_only=True):  product = Product(id=row[PRODUCT_ID],  parent=row[PRODUCT_PARENT],  title=row[PRODUCT_TITLE],  category=row[PRODUCT_CATEGORY])  products.append(product) # You need to parse the date from the spreadsheet into a datetime format spread_date = row[REVIEW_DATE]  parsed_date = datetime.strptime(spread_date,"%Y-%m-%d") review = Review(id=row[REVIEW_ID], Customer_ID=row[REVIEW_CUSTOMER], stars=row[REVIEW_STARS], headline=row[REVIEW_HEADLINE], body=row[REVIEW_BODY], date=parsed_date) reviews.append(review) print(products[0]) print(reviews[0])After you execute the code, you will get an output that looks like this:Product(id='A90FALZ1ZC',parent=937111370,...) Review(id='D3O9OGZVVQBV76',customer_id=3903882,...)Appending Data To understanding how to append data, let us hover back to the first sample spreadsheet. We will open the document and append some data to it: from openpyxl import load_workbook # Start by opening the spreadsheet and selecting the main sheet workbook = load_workbook(filename="hello_world.xlsx") spreadsheet = workbook.active # Write what you want into a specific cell spreadsheet["C1"]="Manipulating_Data ;)" # Save the spreadsheet workbook.save(filename="hello_world_append.xlsx"If you open your Excel file, you will notice the additional Manipulating_Data being added to an adjacent cell. Writing Excel Spreadsheets With openpyxl A spreadsheet is a file that helps to store data in specific rows and columns. We can calculate and store numerical data and also perform computation using formulas. So, let’s begin with some simple Spreadsheets and understand what each line means. Creating our first simple Spreadsheet 1 from openpyxl import Workbook  2    3 filename = "first_program.xlsx"  4    5 workbook = Workbook()  6 spreadsheet = workbook.active  7    8 sheet["A1"] = "first"  9 sheet["B1"] = "program!" 10   11 workbook.save(filename=filename)Line 5: In order to make a Spreadsheet, at first,  we have to create an Empty workbook to perform further operations. Lines 8 and 9 : We can add data to a specific cell as per our requirement. In this example, we can see that two values “first” and “program” have been added to specific cells in the sheet. Line 11: The line shows how to save data after all the operations we have done. Basic Spreadsheet Operations Before going to the difficult coding part, at first we have to build our building blocks like how to add and update values, how to manage rows and columns, adding filters, styles or formulas in a Spreadsheet. We have already explained the following code by which we can add values to a Spreadsheet: >>> spreadsheet["A1"] = "the_value_we_want_to_add"There is another way that we can add values to Spreadsheet: >>> cell = sheet["A1"] >>> cell <Cell 'Sheet'.A1> >>> cell.value 'hello' >>> cell.value = "hey" >>> cell.value 'hey'Line 1: In the first line at first we have declared the cell and updated its value. Line 5: We have printed the value of the cell as “first”  because  in the first program we have already assigned sheet["A1"]with “first” Line 8 : We have updated the value of the cell as "second"by simply assigning it to cell.value. Lines 9 : In this line, we have just printed the updated value of cell. Finally, you have to save all the operations you have performed into the spreadsheet once you call workbook.save().If  the cell didn’t exist while adding a value ,then openpyxl creates a cell:>>> # Before, our spreadsheet has only 1 row >>> print_rows() ('first', 'program!') >>> # Try adding a value to row 10 >>> spreadsheet["B10"] = "test" >>> print_rows() ('first', 'program!') (None, None)  (None, None)  (None, None)  (None, None)  (None, None)  (None, None)  (None, None)  (None, None)  (None, 'test') Managing Rows and Columns in Spreadsheet Insertion or deletion of rows (adding or removing elements of rows /columns) is one of the most basic operations in Spreadsheet. In openpyxl.We can perform these operations by simply calling these methods and passing its arguments. .insert_rows().delete_rows().insert_cols().delete_cols()We can pass 2 types of arguments to the methods :  idx amount Idx stands for index position and amount refers to the number of values we can store in the Spreadsheet. Using our basic knowledge based on the first  simple program, let’s see how we can use these methods inside the program: >>> print_rows() ('first', 'program!') >>> # Insert a column at the first position before column 1 ("A") >>> spreadsheet.insert_cols(idx=1) >>> print_rows() (None, 'first', 'program!') >>> # Insert 5 columns in  between column 2 ("B") and 3 ("C") >>> spreadsheet.insert_cols(idx=3,amount=5) >>> print_rows() (None, 'first', None, None, None, None, None, 'program!') >>> # Delete the created columns >>> spreadsheet.delete_cols(idx=3,amount=5) >>> v.delete_cols(idx=1) >>> print_rows() ('first', 'program!') >>> # Insert a new row in the beginning >>> spreadsheet.insert_rows(idx=1) >>> print_rows() (None, None) ('first', 'program!') >>> # Insert 3 new rows in the beginning  >>> spreadsheet.insert_rows(idx=1,amount=3) >>> print_rows() (None, None) (None, None)  (None, None)  (None, None)  ('first', 'program!') >>> # Delete the first 4 rows  >>> spreadsheet.delete_rows(idx=1,amount=4)  >>> print_rows()  ('first', 'program!') Managing SheetsWe have seen the following recurring piece of code in our previous examples .This is one of the ways of selecting the default sheet from the Spreadsheet: spreadsheet = workbook.activeHowever, if you want to open a spreadsheet with multiple sheets, you can write the following command: >>> # Let's say you have two sheets: "Products" and "Company Sales" >>> workbook.sheetnames ['Products', 'Company Sales'] >>> # You can select a sheet using its title >>> Products_Sheet = workbook["Products"] >>> Sales_sheet = workbook["Company Sales"]If we want to change the title of the Sheet, execute the following code: >>> workbook.sheetnames ['Products', 'Company Sales'] >>> Products_Sheet = workbook["Products"] >>> Products_Sheet.title = "New Products" >>> workbook.sheetnames ['New Products', 'Company Sales']We can CREATE / DELETE Sheets also with the help of two methods - .create_sheet() and  .remove(): >>> #To print the available sheet names >>> workbook.sheetnames  ['Products', 'Company Sales'] >>> #To create a new Sheet named "Operations" >>> Operations_Sheet = workbook.create_sheet("Operations") >>> #To print the updated available sheet names >>> workbook.sheetnames ['Products', 'Company Sales', 'Operations'] >>> # To define the position Where we want to create the Sheet(here “HR” sheet is created at the first position .Here index 0 represents the first position) >>> HR_Sheet = workbook.create_sheet("HR",0) >>> #To again  print the updated available sheet names >>> workbook.sheetnames ['HR', 'Products', 'Company Sales', 'Operations'] >>> # To remove them,we just have to send the sheet names as an argument which we want to delete to the method  .remove()  >>> workbook.remove(Operations_Sheet) >>> workbook.sheetnames ['HR', 'Products', 'Company Sales'] >>> #To delete hr_sheet >>> workbook.remove(hr_sheet) >>> workbook.sheetnames ['Products', 'Company Sales']Adding Filters to the Spreadsheet We can use openpyxl to add filters in our Spreadsheet but when we open our Spreadsheet, the data won’t be rearranged according to these sorts and filters. When you’re programmatically creating a spreadsheet and it is going to be sent and used by someone else, it is a good practice to add different filters and allow people to use it afterward. In the code below there is a simple example which shows how to add a simple filter to your spreadsheet: >>> # Check the used spreadsheet space using the attribute "dimensions" >>> spreadsheet.dimensions 'A1:O100' >>> spreadsheet.auto_filter.ref="A1:O100" >>> workbook.save(filename="watch_sample_with_filters.xlsx")Adding Formulas to the Spreadsheet Formulas are one of the most commonly used and powerful features of spreadsheets. By using formulas, you can solve various mathematical equations with the additional support of openpyxl which makes those calculations as simple as editing a specific cell’s value.The list of formulas supported by openpyxl are:>>> from openpyxl.utils import FORMULAE >>> FORMULAE frozenset({'ABS',            'AMORLINC',            'ACCRINT',             'ACOS',             'ACCRINTM',             'ACOSH',              ...,                   'AND',            'YEARFRAC',             'YIELDDISC',             'AMORDEGRC',             'YIELDMAT',             'YIELD',             'ZTEST'}) Let’s add some formulas to our spreadsheet. Let’s check the average star rating of  the 99 reviews within the spreadsheet: >>> # Star rating is in column "H"  >>> spreadsheet["P2"] = "=AVERAGE(H2:H100)" >>> workbook.save(filename = "first_example.xlsx")Now, if we open your spreadsheet and go to cell P2, you can see the value to be 4.18181818181818.  Similarly, we can use this methodology to include any formulas for our requirements in our spreadsheet. For example, if we want to count the number of helpful reviews: >>> # The helpful votes  counted in column "I"  >>> spreadsheet["P3"] = '=COUNTIF(I2:I100, ">0")' >>> workbook.save(filename = "first_example.xlsx") Adding Styles to the SpreadsheetIt is not so important and usually, we don’t use this in everyday code but for the sake of completeness, we will also understand this with the following example.Using openpyxl, we get multiple styling options such as including fonts, colors,  borders,and so on.Let’s have a look at an example:>>> # Import necessary style classes >>> from openpyxl.styles import Font,Color,Alignment,Border,Side,colors >>> # Create a few styles >>> Bold_Font = Font(bold=True) >>> Big_Red_Text = Font(color=colors.RED,size=20)  >>> Center_Aligned_Text = Alignment(horizontal="center")  >>> Double_Border_Side = Side(border_style="double")  >>> Square_Border = Border(top=double_border_side,  ... right=double_border_side,  ... bottom=double_border_side,  ... left=double_border_side)  >>> # Style some cells! >>> spreadsheet["A2"].font = Bold_Font >>> spreadsheet["A3"].font = Big_Red_Text >>> spreadsheet["A4"].alignment = Center_Aligned_Text >>> spreadsheet["A5"].border = Square_Border >>> workbook.save(filename="sample_styles.xlsx") If you want to apply multiple styles to one or several cells in our spreadsheets,you can use  NamedStyle class: >>> from openpyxl.styles import NamedStyle >>> # Let's create a style template for the header row >>> header = NamedStyle(name="header") >>> header.font = Font(bold=True) >>> header.border = Border(bottom=Side(border_style="thin")) >>> header.alignment = Alignment(horizontal="center",vertical="center") >>> # Now let's apply this to all first row (header) cells >>> header_row = sheet[1] >>> for cell in header_row: ... cell.style = header >>> workbook.save(filename="sample_styles.xlsx") Adding Charts to our SpreadsheetCharts are a good way to compute and understand large amounts of data quickly and easily. We have a lot of charts such as bar chart, pie chart, line chart, and so on. Let us start by creating a new workbook with some data:  1 from openpyxl import Workbook   2 from openpyxl.chart import BarChart,Reference   3    4 workbook = Workbook()   5 spreadsheet = workbook.active   6    7 # Let's create some sample sales data  8 rows = [   9    ["Product","Online","Store"],  10    [1,30,45],  11    [2,40,30],  12    [3,40,25],  13    [4,50,30],  14    [5,30,25],  15    [6,25,35],  16    [7,20,40],  17 ]  18   19 for row in rows: 20    spreadsheet .append(row)Now let us create a bar chart that will show the total number of sales per product: 22 chart = BarChart() 23 data = Reference(worksheet=sheet, 24                 min_row=1,  25                 max_row=8,  26                 min_col=2,  27                 max_col=3)  28   29 chart.add_data(data,titles_from_data=True) 30 spreadsheet .add_chart(chart, "E2") 31 32 workbook.save("chart.xlsx")You can also create a line chart by simply making some changes to the data:  1 import random   2 from openpyxl import Workbook   3 from openpyxl.chart import LineChart,Reference   4    5 workbook = Workbook()  6 sheet = workbook.active  7    8 # Let's create some sample sales data   9 rows= [ 10    ["", "January", "February", "March", "April",  11    "May", "June", "July", "August", "September",  12     "October", "November", "December"],  13    [1, ], 14    [2, ], 15    [3, ], 16 ]  17   18 for row in rows:  19    sheet.append(row) 20   21 for row in sheet.iter_rows(min_row=2, 22                           max_row=4, 23                           min_col=2, 24                           max_col=13): 25    for cell in row: 26        cell.value = random.randrange(5,100)There are numerous types of charts and various types of customizations you can apply to your spreadsheet to make it more attractive.Convert Python Classes to Excel SpreadsheetLet us now learn how to convert the Excel Spreadsheet data to Python classes.  Assume we have a database and we use some Object Relational mapping to map the database into Python classes and then export the objects into spreadsheets: from dataclasses import dataclass from typing import List @dataclass class Sale: id: str  quantity: int @dataclass  class Product:  id: str  name: str  sales:List[Sale] Now, to generate some random data, let’s assume that the above classes are stored in  db_classes.py file then:  1 import random   2    3 # Ignore these for now. You'll use them in a sec ;)   4 from openpyxl import Workbook   5 from openpyxl.chart import LineChart,Reference   6    7 from db_classes import Product,Sale   8    9 products_range = []  10   11 # Let's create 5 products 12 for idx in range(1,6): 13    sales = []  14   15    # Create 5 months of sales  16    for_in range(5): 17        sale_range = Sale(quantity=random.randrange(5,100)) 18        sales.append(sale) 19   20    product = Product(id=str(idx), 21                      name="Product %s" % idx, 22                      sales=sales) 23    products_range.append(product)By running this code, we will get 5 products in 5 months of sale with a random quantity of sales for each month. Now, we have  to convert this into a spreadsheet in which we need to iterate over the data: 25 workbook = Workbook()  26 spreadsheet = workbook.active  27 28 # Append column names first  29 spreadsheet.append(["Product ID","Product Name","Month 1",  30              "Month 2","Month 3","Month 4","Month 5"])  31   32 # Append the data  33 for product in products_range: 34    data = [product.id,product.name] 35    for sale in product.sales: 36        data.append(sale.quantity)  37    spreadsheet.append(data) This will create a spreadsheet with some data coming from your database. How to work with pandas to handle Spreadsheets?We have learned to work with Excel in Python because Excel is one of the most popular tools and finding a way to work with Excel is critical. Pandas is a great tool to work with Excel in Python. It has unique methods to read all kinds of data in an Excel file and we can export items back to Excel using it. To use it, at first we need to install pandas package: $ pip install pandas Then, let’s create a simple DataFrame:  1 import pandas as pd   2    3 data = {   4    "Product Name":["Product 1","Product 2"],   5    "Sales Month 1":[10, 20],   6    "Sales Month 2":[5, 35],   7 }   8 dataframe = pd.DataFrame(data)Now we have some data, and to convert it from a DataFrame into a worksheet we generally use .dataframe_to_rows(): 10 from openpyxl import Workbook 11 from openpyxl.utils.dataframe import  dataframe_to_rows  12   13 workbook = Workbook()  14 spreadsheet = workbook.active  15   16 for row in dataframe_to_rows(df, index=False,header=True):  17    spreadsheet .append(row) 18   19 workbook.save("pandas_spreadsheet.xlsx")We need to use  read_excel method to read data from pandas DataFrame object. excel_file =’movies.xls’  movies=pd.read_excel(excel_file) We can also use Excel file class to use multiple sheets from the same excel file: movies_sheets = [] for sheet in xlsx.sheet_names:     movies_sheets.append(xlsx.parse(sheet))     movies = pd.concat(movies_sheets))Indexes and columns allows you to access data from your DataFrame easily: >>> df.columns  Index(['marketplace', 'customer_id', 'review_id', 'product_id',        'product_parent', 'product_title', 'product_category', 'star_rating',         'helpful_votes', 'total_votes', 'vine', 'verified_purchase',         'review_headline', 'review_body', 'review_date'],        dtype='object') >>> # Get first 10 reviews' star rating  >>> df["star_rating"][:10] R3O9SGZBVQBV76    5 RKH8BNC3L5DLF     5  R2HLE8WKZSU3NL    2  R31U3UH5AZ42LL    5  R2SV659OUJ945Y    4  RA51CP8TR5A2L     5  RB2Q7DLDN6TH6     5  R2RHFJV0UYBK3Y    1  R2Z6JOQ94LFHEP    5  RX27XIIWY5JPB     4  Name: star_rating, dtype: int64 >>> # Grab review with id "R2EQL1V1L6E0C9", using the index >>> df.loc["R2EQL1V1L6E0C9"] marketplace               US customer_id         15305006  review_id     R2EQL1V1L6E0C9  product_id        B004LURNO6  product_parent     892860326  review_headline   Five Stars  review_body          Love it  review_date       2015-08-31  Name: R2EQL1V1L6E0C9, dtype: object Summary In this article we have covered: How to extract information from spreadsheets  How to create Spreadsheets in different ways How to customize a spreadsheet by adding filters, styles, or charts and so on How to use pandas to work with spreadsheets Now you are well aware of the different types of implementations you can perform with spreadsheets using Python. However, if you are willing to gather more information on this topic, you can always rely on the official documentation of openpyxl. To gain more knowledge about Python tips and tricks, check out our Python tutorial. To gain mastery over Python coding,join ourPython certification course.  
Rated 4.5/5 based on 22 customer reviews

How to Work with Excel Spreadsheets using Python

14205
How to Work with Excel Spreadsheets using Python

Excel is considered as one of the most popular and widely used spreadsheet applications developed by Microsoft. You can organize, analyze and store your data into tabular sheets with the help of Excel. From analysts and sales managers, to CEOs, professionals from every field use Excel for creating quick statistics and for data crunching.

Spreadsheets are commonly used in the present world because of their intuitive nature and the ability to handle large datasets. Most importantly, they can work without any prior technical background.

Finding different ways to work with Excel using code is essential since working with data and in Python has some serious advantages in comparison with Excel’s UI. Developers of Python have implemented ways to read, write and manipulate Excel documents.

You can check the quality of your spreadsheet application by going over the checklist below:

  • Is the spreadsheet able to represent static data?
  • Is the spreadsheet able to mix data, calculations, and reports?
  • Is the data in your spreadsheet complete and consistent in nature?
  • Does the spreadsheet have an organized worksheet structure?

This checklist will help you in verifying the qualitative nature of the spreadsheet application you’re going to work on.

Practical Applications  

In this article, we would be using openpyxl to work on data. With the help of this module, you can extract data from a database into an Excel spreadsheet or you can also convert an Excel spreadsheet into a programmatic format. There can be a lot of possible situations where you might feel the need to use a package like openpyxl. Let us discuss a few of them to get a comprehensive overview of it.

Importing New Products Into a Database 

Consider yourself working in an online store company. When they want to add new products to the online store, they make an Excel spreadsheet with a few hundred rows along with the name of the product, description, price and a few more basic information and then they give it to you. 

Now, if you want to import this particular data, you need to iterate over each row of the spreadsheet and then add each of the products into the database of the online store.[Text Wrapping Break] 

Exporting Database Data Into a Spreadsheet

Consider you have a Database table. In this particular table, you have collected information of all your users which includes their name, contact number, email address, and so forth. Now, the Marketing Team is willing to collectively contact all the users and promote a new product of the company. However, neither do they have access to the Database nor they have any idea about using SQL to extract the information. 

In this situation, openpyxl comes to play. You can use it effectively to iterate over each User record and transform the required information into an Excel spreadsheet.    

Appending Information to an Existing Spreadsheet

Consider the same online store example we discussed above. You have an Excel spreadsheet with a list of users and your job is to append to each row the total amount they have spent in your store.

In order to perform this, you have to read the spreadsheet first and then iterate through each row and fetch the total amount spent from the Database. Finally, you need to write it back to the spreadsheet.

Starting openpyxl

You can install the openpyxl package using pip. Open your terminal and write the following command: 

$ pip install openpyxl

After you have installed the spreadsheet, you can make up your own simple spreadsheet: 

from openpyxl import Workbook

workbook = Workbook()
spreadsheet = workbook.active

spreadsheet["A1"] = "Hello"
spreadsheet["B1"] = "World!"

workbook.save(filename="HelloWorld.xlsx")

How to Read Excel Spreadsheets with openpyxl 

Let us start with the most important thing that you can do with a spreadsheet,i.e. read it. We will be using a Watch Sample Dataset which contains a list of 100 watches with information like product name, product ID, review and so forth.  

A Simple Way to Read an Excel Spreadsheet 

Let us start with opening our sample spreadsheet:

>>> from openpyxl import load_workbook
>>> workbook = load_workbook(filename="sample.xlsx")
>>> workbook.sheetnames
['Sheet 1']

>>> spreadsheet = workbook.active
>>> spreadsheet
<Worksheet "Sheet 1">

>>> spreadsheet.title

In the example code above, we open the spreadsheet using load_workbook and then we check all the sheets that are available to work with using workbook.sheetnames. Then Sheet 1 is automatically selected using workbook.active since it is the first sheet available. This is the most common way of opening a spreadsheet.  

Now, let us see the code to retrieve data from the spreadsheet: 

>>> spreadsheet["A1"]
<Cell 'Sheet 1'.A1>

>>> spreadsheet["A1"].value
'marketplace'

>>> spreadsheet["F10"].value
"G-Shock Men's Grey Sport Watch"

You can retrieve the actual value and the cell value  both. To get the actual value, use .value and to get the cell, you can use .cell():

>>> spreadsheet.cell(row=10, column=6)
<Cell 'Sheet 1'.F10>

>>> spreadsheet.cell(row=10, column=6).value
"G-Shock Men's Grey Sport Watch"

Importing Data from a Spreadsheet 

In this section, we will discuss how to iterate through the data, and about conversion into a more useful format using Python.

Let us first start with iterating through the data. There are a number of iterating methods that depend solely on the user.

You can slice the data with a combination of rows and columns:

>>> spreadsheet["A1:C2"]
((<Cell 'Sheet 1'.A1>, <Cell 'Sheet 1'.B1>, <Cell 'Sheet 1'.C1>), 
 (<Cell 'Sheet 1'.A2>, <Cell 'Sheet 1'.B2>, <Cell 'Sheet 1'.C2>)) 

You can also iterate through the dataset by ranging between rows and columns: 

>>> # Get all cells from column A 
>>> spreadsheet["A"]
(<Cell 'Sheet 1'.A1>, 
 <Cell 'Sheet 1'.A2>, 
 ... 
 <Cell 'Sheet 1'.A99>, 
 <Cell 'Sheet 1'.A100>) 

>>> # Get all cells for a range of columns 
>>> spreadsheet["A:B"] 
((<Cell 'Sheet 1'.A1>, 
  <Cell 'Sheet 1'.A2>, 
  ... 
  <Cell 'Sheet 1'.A99>, 
  <Cell 'Sheet 1'.A100>), 
 (<Cell 'Sheet 1'.B1>, 
  <Cell 'Sheet 1'.B2>, 
  ... 
  <Cell 'Sheet 1'.B99>, 
  <Cell 'Sheet 1'.B100>)) 

>>> # Get all cells from row 5 
>>> spreadsheet[5]
(<Cell 'Sheet 1'.A5>,
 <Cell 'Sheet 1'.B5>,
 ... 
 <Cell 'Sheet 1'.N5>,
 <Cell 'Sheet 1'.O5>)

>>> # Get all cells for a range of rows
>>> spreadsheet[5:6]
((<Cell 'Sheet 1'.A5>, 
  <Cell 'Sheet 1'.B5>, 
  ... 
  <Cell 'Sheet 1'.N5>, 
  <Cell 'Sheet 1'.O5>), 
 (<Cell 'Sheet 1'.A6>, 
  <Cell 'Sheet 1'.B6>, 
  ... 
  <Cell 'Sheet 1'.N6>, 
  <Cell 'Sheet 1'.O6>)) 

Python offers arguments by which you can set limits to the iteration with the help of Python generators like .iter_rows() and .iter_cols()

>>> for row in spreadsheet.iter_rows(min_row=1,
...                           max_row=2,
...                           min_col=1,
...                           max_col=3):
...    print(row)
(<Cell 'Sheet 1'.A1>, <Cell 'Sheet 1'.B1>, <Cell 'Sheet 1'.C1>) 
(<Cell 'Sheet 1'.A2>, <Cell 'Sheet 1'.B2>, <Cell 'Sheet 1'.C2>)

>>> for column in spreadsheet.iter_cols(min_row=1...                              max_row=2,
...                              min_col=1,
...                              max_col=3):
... print(column)
(<Cell 'Sheet 1'.A1>, <Cell 'Sheet 1'.A2>) 
(<Cell 'Sheet 1'.B1>, <Cell 'Sheet 1'.B2>) 
(<Cell 'Sheet 1'.C1>, <Cell 'Sheet 1'.C2>) 

You can also add Boolean values_only in the above example and set it to True to get the values of cell: 

>>> for value in spreadsheet.iter_rows(min_row=1...                          max_row=2...                          min_col=1...                          max_col=3...                          values_only=True):
...    print(value)
('marketplace', 'customer_id', 'review_id') 
('US', 3653882, 'R3O9SGZBVQBV76')

Since we are now done with iterating the data, let us now manipulate data using Python’s primitive data structures. 

Consider a situation where you want to extract information of a product from the sample spreadsheet and then store it into the dictionary. The key to the dictionary would be the product ID.   

Convert Data into Python classes

To convert data into Python data classes, let us first decide what we want to store and how to store it.  

The two essential elements that can be extracted from the data are as follows:

Convert Data into Python classes

                                                     1. Products                                             2. Review

                                                          • ID                                                         • ID
                                                          • Title                                                     • Customers ID
                                                          • Parent                                                 • Headline
                                                          • Category                                            • Body
                                                                                                                         • Date

Let us implement the two elements: 

import datetime
from dataclasses import dataclass

@dataclass
class Product:
    id: str
    parent: str
    title: str
    category: str
@dataclass
class Review:
    id: str
    customer_id: str
    stars: int
    headline: str
    body: str 
    date: datetime.datetime 

The next step is to create a mapping between columns and the required fields: 

>>> for value in spreadsheet.iter_rows(min_row=1,
...                             max_row=1,
...                             values_only=True):
...    print(value)
('marketplace', 'customer_id', 'review_id', 'product_id', ...)

>>> # Or an alternative
>>> for cell in sheet[1]:
...  print(cell.value)
marketplace
Customer_ID
Review_ID
Product_ID
Product_Parent
...

Finally, let us convert the data into new structures which will parse the data in spreadsheet into a list of products and review objects: 

from datetime import datetime 
from openpyxl import load_workbook 
from classes import Product,Review 
from mapping import PRODUCT_ID,PRODUCT_PARENT,PRODUCT_TITLE, \
    PRODUCT_CATEGORY,REVIEW_DATE,REVIEW_ID,REVIEW_CUSTOMER, \
    REVIEW_STARS,REVIEW_HEADLINE,REVIEW_BODY

# Using the read_only method since you're not gonna be editing the spreadsheet 
workbook = load_workbook(filename="watch_sample.xlsx",read_only=True) 
spreadsheet = workbook.active

products = []
reviews = []

# Using the values_only because you just want to return the cell value
for row in spreadsheet .iter_rows(min_row=2, values_only=True): 
    product = Product(id=row[PRODUCT_ID], 
                      parent=row[PRODUCT_PARENT], 
                      title=row[PRODUCT_TITLE], 
                      category=row[PRODUCT_CATEGORY]) 
    products.append(product)
# You need to parse the date from the spreadsheet into a datetime format
spread_date = row[REVIEW_DATE] 
parsed_date = datetime.strptime(spread_date,"%Y-%m-%d")

review = Review(id=row[REVIEW_ID],
              Customer_ID=row[REVIEW_CUSTOMER],
              stars=row[REVIEW_STARS],
              headline=row[REVIEW_HEADLINE],
              body=row[REVIEW_BODY],
              date=parsed_date)
reviews.append(review)

print(products[0])
print(reviews[0])

After you execute the code, you will get an output that looks like this:

Product(id='A90FALZ1ZC',parent=937111370,...)
Review(id='D3O9OGZVVQBV76',customer_id=3903882,...)

Appending Data 

To understanding how to append data, let us hover back to the first sample spreadsheet. We will open the document and append some data to it: 

from openpyxl import load_workbook

# Start by opening the spreadsheet and selecting the main sheet 
workbook = load_workbook(filename="hello_world.xlsx")
spreadsheet = workbook.active

# Write what you want into a specific cell
spreadsheet["C1"]="Manipulating_Data ;)"

# Save the spreadsheet
workbook.save(filename="hello_world_append.xlsx"

If you open your Excel file, you will notice the additional Manipulating_Data being added to an adjacent cell. 

Writing Excel Spreadsheets With openpyxl 

A spreadsheet is a file that helps to store data in specific rows and columns. We can calculate and store numerical data and also perform computation using formulas. 

So, let’s begin with some simple Spreadsheets and understand what each line means. 

Creating our first simple Spreadsheet

 1 from openpyxl import Workbook
 2  
 3 filename = "first_program.xlsx"
 4  
 5 workbook = Workbook()
 6 spreadsheet = workbook.active
 7  
 8 sheet["A1"] = "first"
 9 sheet["B1"] = "program!"
10  
11 workbook.save(filename=filename)

Line 5: In order to make a Spreadsheet, at first,  we have to create an Empty workbook to perform further operations. 

Lines 8 and 9 : We can add data to a specific cell as per our requirement. In this example, we can see that two values “first” and “program” have been added to specific cells in the sheet. 

Line 11: The line shows how to save data after all the operations we have done. 

Basic Spreadsheet Operations 

Before going to the difficult coding part, at first we have to build our building blocks like how to add and update values, how to manage rows and columns, adding filters, styles or formulas in a Spreadsheet. 

We have already explained the following code by which we can add values to a Spreadsheet: 

>>> spreadsheet["A1"] = "the_value_we_want_to_add"

There is another way that we can add values to Spreadsheet: 

>>> cell = sheet["A1"]
>>> cell
<Cell 'Sheet'.A1>

>>> cell.value
'hello'

>>> cell.value = "hey"
>>> cell.value
'hey'

Line 1: In the first line at first we have declared the cell and updated its value. 

Line 5: We have printed the value of the cell as “first”  because  in the first program we have already assigned sheet["A1"]with “first” 

Line 8 : We have updated the value of the cell as "second"by simply assigning it to cell.value

Lines 9 : In this line, we have just printed the updated value of cell. 

Finally, you have to save all the operations you have performed into the spreadsheet once you call workbook.save().

If  the cell didn’t exist while adding a value ,then openpyxl creates a cell:

>>> # Before, our spreadsheet has only 1 row
>>> print_rows()
('first', 'program!')

>>> # Try adding a value to row 10
>>> spreadsheet["B10"] = "test"
>>> print_rows()
('first', 'program!') 
(None, None) 
(None, None) 
(None, None) 
(None, None) 
(None, None) 
(None, None) 
(None, None) 
(None, None) 
(None, 'test') 

Managing Rows and Columns in Spreadsheet 

Insertion or deletion of rows (adding or removing elements of rows /columns) is one of the most basic operations in Spreadsheet. In openpyxl.We can perform these operations by simply calling these methods and passing its arguments. 

  • .insert_rows()
  • .delete_rows()
  • .insert_cols()
  • .delete_cols()

We can pass 2 types of arguments to the methods :  

  1. idx 
  2. amount 

Idx stands for index position and amount refers to the number of values we can store in the Spreadsheet. 

Using our basic knowledge based on the first  simple program, let’s see how we can use these methods inside the program: 

>>> print_rows()
('first', 'program!')

>>> # Insert a column at the first position before column 1 ("A")
>>> spreadsheet.insert_cols(idx=1)
>>> print_rows()
(None, 'first', 'program!')

>>> # Insert 5 columns in  between column 2 ("B") and 3 ("C")
>>> spreadsheet.insert_cols(idx=3,amount=5)
>>> print_rows()
(None, 'first', None, None, None, None, None, 'program!')

>>> # Delete the created columns
>>> spreadsheet.delete_cols(idx=3,amount=5)
>>> v.delete_cols(idx=1)
>>> print_rows()
('first', 'program!')

>>> # Insert a new row in the beginning
>>> spreadsheet.insert_rows(idx=1)
>>> print_rows()
(None, None)
('first', 'program!')

>>> # Insert 3 new rows in the beginning 
>>> spreadsheet.insert_rows(idx=1,amount=3)
>>> print_rows()
(None, None) 
(None, None) 
(None, None) 
(None, None) 
('first', 'program!')

>>> # Delete the first 4 rows 
>>> spreadsheet.delete_rows(idx=1,amount=4>>> print_rows() 
('first', 'program!') 

Managing Sheets

We have seen the following recurring piece of code in our previous examples .This is one of the ways of selecting the default sheet from the Spreadsheet: 

spreadsheet = workbook.active

However, if you want to open a spreadsheet with multiple sheets, you can write the following command: 

>>> # Let's say you have two sheets: "Products" and "Company Sales"
>>> workbook.sheetnames
['Products', 'Company Sales']

>>> # You can select a sheet using its title
>>> Products_Sheet = workbook["Products"]
>>> Sales_sheet = workbook["Company Sales"]

If we want to change the title of the Sheet, execute the following code: 

>>> workbook.sheetnames
['Products', 'Company Sales']

>>> Products_Sheet = workbook["Products"]
>>> Products_Sheet.title = "New Products"

>>> workbook.sheetnames
['New Products', 'Company Sales']

We can CREATE / DELETE Sheets also with the help of two methods - .create_sheet() and  .remove()

>>> #To print the available sheet names
>>> workbook.sheetnames 
['Products', 'Company Sales']

>>> #To create a new Sheet named "Operations"
>>> Operations_Sheet = workbook.create_sheet("Operations")

>>> #To print the updated available sheet names
>>> workbook.sheetnames
['Products', 'Company Sales', 'Operations']

>>> # To define the position Where we want to create the Sheet(here “HR” sheet is created at the first position .Here index 0 represents the first position)
>>> HR_Sheet = workbook.create_sheet("HR",0)

>>> #To again  print the updated available sheet names
>>> workbook.sheetnames
['HR', 'Products', 'Company Sales', 'Operations']

>>> # To remove them,we just have to send the sheet names as an argument which we want to delete to the method  .remove() 
>>> workbook.remove(Operations_Sheet)
>>> workbook.sheetnames
['HR', 'Products', 'Company Sales']

>>> #To delete hr_sheet
>>> workbook.remove(hr_sheet)
>>> workbook.sheetnames
['Products', 'Company Sales']

Adding Filters to the Spreadsheet 

We can use openpyxl to add filters in our Spreadsheet but when we open our Spreadsheet, the data won’t be rearranged according to these sorts and filters. 

When you’re programmatically creating a spreadsheet and it is going to be sent and used by someone else, it is a good practice to add different filters and allow people to use it afterward. 

In the code below there is a simple example which shows how to add a simple filter to your spreadsheet: 

>>> # Check the used spreadsheet space using the attribute "dimensions"
>>> spreadsheet.dimensions
'A1:O100'

>>> spreadsheet.auto_filter.ref="A1:O100"
>>> workbook.save(filename="watch_sample_with_filters.xlsx")

Adding Formulas to the Spreadsheet 

Formulas are one of the most commonly used and powerful features of spreadsheets. By using formulas, you can solve various mathematical equations with the additional support of openpyxl which makes those calculations as simple as editing a specific cell’s value.

The list of formulas supported by openpyxl are:

>>> from openpyxl.utils import FORMULAE
>>> FORMULAE
frozenset({'ABS',
           'AMORLINC',
           'ACCRINT', 
           'ACOS', 
           'ACCRINTM', 
           'ACOSH', 
            ...,       
           'AND', 
           'YEARFRAC', 
           'YIELDDISC', 
           'AMORDEGRC', 
           'YIELDMAT', 
           'YIELD', 
           'ZTEST'}) 

Let’s add some formulas to our spreadsheet. 

Let’s check the average star rating of  the 99 reviews within the spreadsheet: 

>>> # Star rating is in column "H" 
>>> spreadsheet["P2"] = "=AVERAGE(H2:H100)"
>>> workbook.save(filename = "first_example.xlsx")

Now, if we open your spreadsheet and go to cell P2, you can see the value to be 4.18181818181818.  

Similarly, we can use this methodology to include any formulas for our requirements in our spreadsheet. For example, if we want to count the number of helpful reviews: 

>>> # The helpful votes  counted in column "I" 
>>> spreadsheet["P3"] = '=COUNTIF(I2:I100, ">0")' 
>>> workbook.save(filename = "first_example.xlsx") 

Adding Styles to the Spreadsheet

It is not so important and usually, we don’t use this in everyday code but for the sake of completeness, we will also understand this with the following example.

Using openpyxl, we get multiple styling options such as including fonts, colors,  borders,and so on.

Let’s have a look at an example:

>>> # Import necessary style classes
>>> from openpyxl.styles import Font,Color,Alignment,Border,Side,colors

>>> # Create a few styles
>>> Bold_Font = Font(bold=True)
>>> Big_Red_Text = Font(color=colors.RED,size=20>>> Center_Aligned_Text = Alignment(horizontal="center">>> Double_Border_Side = Side(border_style="double">>> Square_Border = Border(top=double_border_side, 
...                       right=double_border_side, 
...                       bottom=double_border_side, 
...                       left=double_border_side) 

>>> # Style some cells!
>>> spreadsheet["A2"].font = Bold_Font
>>> spreadsheet["A3"].font = Big_Red_Text
>>> spreadsheet["A4"].alignment = Center_Aligned_Text
>>> spreadsheet["A5"].border = Square_Border
>>> workbook.save(filename="sample_styles.xlsx") 

If you want to apply multiple styles to one or several cells in our spreadsheets,you can use  NamedStyle class: 

>>> from openpyxl.styles import NamedStyle

>>> # Let's create a style template for the header row
>>> header = NamedStyle(name="header")
>>> header.font = Font(bold=True)
>>> header.border = Border(bottom=Side(border_style="thin"))
>>> header.alignment = Alignment(horizontal="center",vertical="center")

>>> # Now let's apply this to all first row (header) cells
>>> header_row = sheet[1]
>>> for cell in header_row:
... cell.style = header

>>> workbook.save(filename="sample_styles.xlsx") 

Adding Charts to our Spreadsheet

Charts are a good way to compute and understand large amounts of data quickly and easily. We have a lot of charts such as bar chart, pie chart, line chart, and so on. 

Let us start by creating a new workbook with some data: 

 1 from openpyxl import Workbook 
 2 from openpyxl.chart import BarChart,Reference 
 3  
 4 workbook = Workbook() 
 5 spreadsheet = workbook.active 
 6  
 7 # Let's create some sample sales data 
 8 rows = [ 
 9    ["Product","Online","Store"], 
10    [1,30,45], 
11    [2,40,30], 
12    [3,40,25], 
13    [4,50,30], 
14    [5,30,25], 
15    [6,25,35], 
16    [7,20,40], 
17 ] 
18  
19 for row in rows:
20    spreadsheet .append(row)

Now let us create a bar chart that will show the total number of sales per product: 

22 chart = BarChart()
23 data = Reference(worksheet=sheet,
24                 min_row=1, 
25                 max_row=8, 
26                 min_col=2, 
27                 max_col=3) 
28  
29 chart.add_data(data,titles_from_data=True)
30 spreadsheet .add_chart(chart, "E2")
31
32 workbook.save("chart.xlsx")

You can also create a line chart by simply making some changes to the data: 

 1 import random 
 2 from openpyxl import Workbook 
 3 from openpyxl.chart import LineChart,Reference 
 4  
 5 workbook = Workbook()
 6 sheet = workbook.active
 7  
 8 # Let's create some sample sales data 
 9 rows= [
10    ["", "January", "February", "March", "April", 
11    "May", "June", "July", "August", "September", 
12     "October", "November", "December"], 
13    [1, ],
14    [2, ],
15    [3, ],
16 ] 
17  
18 for row in rows: 
19    sheet.append(row)
20  
21 for row in sheet.iter_rows(min_row=2,
22                           max_row=4,
23                           min_col=2,
24                           max_col=13):
25    for cell in row:
26        cell.value = random.randrange(5,100)

There are numerous types of charts and various types of customizations you can apply to your spreadsheet to make it more attractive.

Convert Python Classes to Excel Spreadsheet

Let us now learn how to convert the Excel Spreadsheet data to Python classes.  

Assume we have a database and we use some Object Relational mapping to map the database into Python classes and then export the objects into spreadsheets: 

from dataclasses import dataclass
from typing import List

@dataclass
class Sale:
    id: str 
    quantity: int

@dataclass 
class Product: 
    id: str 
    name: str 
    sales:List[Sale] 

Now, to generate some random data, let’s assume that the above classes are stored in  db_classes.py file then: 

 1 import random 
 2  
 3 # Ignore these for now. You'll use them in a sec ;) 
 4 from openpyxl import Workbook 
 5 from openpyxl.chart import LineChart,Reference 
 6  
 7 from db_classes import Product,Sale 
 8  
 9 products_range = [] 
10  
11 # Let's create 5 products
12 for idx in range(1,6):
13    sales = [] 
14  
15    # Create 5 months of sales 
16    for_in range(5):
17        sale_range = Sale(quantity=random.randrange(5,100))
18        sales.append(sale)
19  
20    product = Product(id=str(idx),
21                      name="Product %s" % idx,
22                      sales=sales)
23    products_range.append(product)

By running this code, we will get 5 products in 5 months of sale with a random quantity of sales for each month. 

Now, we have  to convert this into a spreadsheet in which we need to iterate over the data: 

25 workbook = Workbook() 
26 spreadsheet = workbook.active 
27
28 # Append column names first 
29 spreadsheet.append(["Product ID","Product Name","Month 1", 
30              "Month 2","Month 3","Month 4","Month 5"]) 
31  
32 # Append the data 
33 for product in products_range:
34    data = [product.id,product.name]
35    for sale in product.sales:
36        data.append(sale.quantity) 
37    spreadsheet.append(data) 

This will create a spreadsheet with some data coming from your database. 

How to work with pandas to handle Spreadsheets?

How to work with pandas to handle Spreadsheets

We have learned to work with Excel in Python because Excel is one of the most popular tools and finding a way to work with Excel is critical. Pandas is a great tool to work with Excel in Python. It has unique methods to read all kinds of data in an Excel file and we can export items back to Excel using it. 

To use it, at first we need to install pandas package: 

$ pip install pandas 

Then, let’s create a simple DataFrame: 

 1 import pandas as pd 
 2  
 3 data = { 
 4    "Product Name":["Product 1","Product 2"], 
 5    "Sales Month 1":[10, 20], 
 6    "Sales Month 2":[5, 35], 
 7 } 
 8 dataframe = pd.DataFrame(data)

Now we have some data, and to convert it from a DataFrame into a worksheet we generally use .dataframe_to_rows()

10 from openpyxl import Workbook
11 from openpyxl.utils.dataframe import  dataframe_to_rows 
12  
13 workbook = Workbook() 
14 spreadsheet = workbook.active 
15  
16 for row in dataframe_to_rows(df, index=False,header=True): 
17    spreadsheet .append(row)
18  
19 workbook.save("pandas_spreadsheet.xlsx")

We need to use  read_excel method to read data from pandas DataFrame object. 

excel_file =’movies.xls’ 
movies=pd.read_excel(excel_file) 

We can also use Excel file class to use multiple sheets from the same excel file: 

movies_sheets = []
for sheet in xlsx.sheet_names:
    movies_sheets.append(xlsx.parse(sheet))
    movies = pd.concat(movies_sheets))

Indexes and columns allows you to access data from your DataFrame easily: 

>>> df.columns 
Index(['marketplace', 'customer_id', 'review_id', 'product_id', 
       'product_parent', 'product_title', 'product_category', 'star_rating', 
       'helpful_votes', 'total_votes', 'vine', 'verified_purchase', 
       'review_headline', 'review_body', 'review_date'], 
      dtype='object') 
>>> # Get first 10 reviews' star rating 
>>> df["star_rating"][:10]
R3O9SGZBVQBV76    5 
RKH8BNC3L5DLF     5 
R2HLE8WKZSU3NL    2 
R31U3UH5AZ42LL    5 
R2SV659OUJ945Y    4 
RA51CP8TR5A2L     5 
RB2Q7DLDN6TH6     5 
R2RHFJV0UYBK3Y    1 
R2Z6JOQ94LFHEP    5 
RX27XIIWY5JPB     4 
Name: star_rating, dtype: int64

>>> # Grab review with id "R2EQL1V1L6E0C9", using the index
>>> df.loc["R2EQL1V1L6E0C9"]
marketplace               US 
customer_id         15305006 
review_id     R2EQL1V1L6E0C9 
product_id        B004LURNO6 
product_parent     892860326 
review_headline   Five Stars 
review_body          Love it 
review_date       2015-08-31 
Name: R2EQL1V1L6E0C9, dtype: object 

Summary 

In this article we have covered: 

  • How to extract information from spreadsheets  
  • How to create Spreadsheets in different ways 
  • How to customize a spreadsheet by adding filters, styles, or charts and so on 
  • How to use pandas to work with spreadsheets 

Now you are well aware of the different types of implementations you can perform with spreadsheets using Python. However, if you are willing to gather more information on this topic, you can always rely on the official documentation of openpyxl. To gain more knowledge about Python tips and tricks, check out our Python tutorial. To gain mastery over Python coding,join ourPython certification course

 

Priyankur

Priyankur Sarkar

Data Science Enthusiast

Priyankur Sarkar loves to play with data and get insightful results out of it, then turn those data insights and results in business growth. He is an electronics engineer with a versatile experience as an individual contributor and leading teams, and has actively worked towards building Machine Learning capabilities for organizations.

Join the Discussion

Your email address will not be published. Required fields are marked *

Suggested Blogs

Scala Vs Python Vs R Vs Java - Which language is better for Spark & Why?

One of the most important decisions for the Big data learners or beginners is choosing the best programming language for big data manipulation and analysis. Just understanding business problems and choosing the right model is not enough but implementing them perfectly is equally important and choosing the right language (or languages) for solving the problem goes a long way. If you search top and highly effective programming languages for Big Data on Google, you will find the following top 4 programming languages: JavaScalaPythonRJavaJava is one of the oldest languages of all 4 programming languages listed here. Traditional Frameworks of Big data like Apache Hadoop and all the tools within its ecosystem are Java-based and hence using java opens up the possibility of utilizing large ecosystem of tools in the big data world.  ScalaA beautiful crossover between object-oriented and functional programming language is Scala. Scala is a highly Scalable Language. Scala was invented by the German Computer Scientist, Martin Odersky and the first version was launched in the year 2003.PythonPython was originally conceptualized by Guido van Rossum in the late 1980s. Initially, it was designed as a response to the ABC programming language and later gained its popularity as a functional language in a big data world. Python has been declared as one of the fastest-growing programming languages in 2018 as per the recently held Stack Overflow Developer Survey. Many data analysis, manipulation, machine learning, deep learning libraries are written in Python and hence it has gained its popularity in the big data ecosystem. It’s a very user-friendly language and it is its biggest advantage.  Fun factPython is not named after the snake. It’s named after the British TV show Monty Python.RR is the language of statistics. R is a language and environment for statistical computing and graphics. R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team. R is named partly after the first names of the first two R authors and partly as a play on the name of S*. The project was conceived in 1992, with an initial version released in 1995 and a stable beta version in 2000.*SS is a statistical programming language developed primarily by John Chambers and R is an implementation of the S programming language combined with lexical scoping semantics, inspired by Scheme.Every framework is implemented in the underlying programming language for its implementation. Ex Zend uses PHP, Panda Framework uses python similarly Hadoop framework uses Java and Spark uses Scala.However, Spark officially supports Java, Scala, Python and R, all 4 languages. If one browses through Apache Spark’s official website documentation, he/she would find many other languages utilized by the open-source community for Spark implementation.    When any developer wants to start learning Spark, the first question he stumbles upon is, out of these pools of languages, which one to use and which one to master? Solution Architects would have a tough time choosing the right language for spark framework and Organizations will always be wondering, which skill sets are relevant for my problem if one doesn’t have the right knowledge about these languages in the context of Spark.    This article will try to answer all these queries.so let’s start-JavaOldest of all and popular, widely adopted programming language of all. There is a number offeatures/advantages due to which Java is favorite for Big data developers and tool creators:Java is platform-agnostic language and hence it can run on almost any system. Java is portable due to something called Java Virtual Machine – JVM. JVM is a foundation of Hadoop ecosystem tools like Map Reduce, Storm, Spark, etc. These tools are written in Java and run on JVM.Java provides various communities support like GitHub and stack overflow etc.Java is scalable, backward compatible, stable and production-ready language. Also, supports a large variety of tried and tested libraries.It is statically typed language (We would see details of this functionality in later sections, in comparison with others)Java is mostly the choice for most of the big data projects but for the Spark framework, one has to ponder upon, whether Java would be the best fit.One major drawback of Java is its verbosity. One has to write long code (number of lines of code) to achieve simple functionality in Java.Java does not support Read-Evaluate-Print-Loop (REPL) which is a major deal-breaker when choosing a programming language for big data processing.ScalaScala is comparatively new to the programming scene but has become popular very quickly. Above are a few quotes from bigger names in the industry for Scala. From the Spark context, many experts prefer Scala over other programming languages as Spark is written in Scala. Scala is the native language of Spark. It means any new API always first be available in Scala.Scala is a hybrid functional programming language because It has both the features of object-oriented programming and functional programming. As an OO Programming Language, it considers every value as an object and all OOPS concepts apply. As a functional programming language, it defines and supports functions. All operations are done as functions. No variable stands by itself. Scala is a machine-compiled language.Scala and Java are popular programming languages that run over JVM. JVM makes these languages framework friendly. One can say, Scala is an advanced level of Java.Features/Advantages of Scala:It’s general-purpose object-oriented language with functional language properties too. It’s less verbose than Java.It can work with JVM and hence is portable.It can support Java APIs comfortably.It's fast and robust in Spark context as its Spark native.It is a statically typed language.Scala supports Read-Evaluate-Print-Loop (REPL)Drawbacks / Downsides of Scala:Scala is complex to learn due to the functional nature of language.Steep learning curve.Lack of matured machine learning languages.PythonPython is one of the de-facto languages of Data Science. It is a simple, open-source, general-purpose language and is very easy to learn. It has a rich set of libraries, utilities, ready-to-use features and support to a number of mature machine learning, big data processing, visualization libraries.Advantages of Python:It is interpreted language (i.e. support to REPL, Read, Evaluate, Print, Loop.) If you type a command into a command-line interpreter and it responds immediately. Java lacks this feature.Easy to learn, easy debugging, fewer lines of code.It is dynamically typed. i.e. can dynamically defined variable types. i.e. Python as a language is type-safe.Python is platform agnostic and scalable.Drawbacks/Disadvantages:Python is slow. Big data professionals find projects built in Java / Scala are faster and robust than the once with python.Whilst using user-defined functions or third party libraries in Python with Spark, processing would be slower as increased processing is involved as Python does not have equivalent Java/Scala native language API for these functionalities.Python does not support heavy weight processing fork() using uWSGI but it does not support true multithreading.R LanguageR is the favourite language of statisticians. R is fondly called a language of statisticians.  It’s popular for research, plotting, and data analysis. Together with RStudio, it makes a killer statistic, plotting, and data analytics application.R is majorly used for building data models to be used for data analysis.Advantages/Features of R:Strong statistical modeling and visualization capabilities.Support for ‘data science’ related work.It can be integrated with Apache Hadoop and Spark easily.Drawbacks/Disadvantages of R:R is not a general-purpose language.The code written in R cannot be directly deployed into production. It needs conversion into Java or Python.Not as fast as Java / Scala.Comparison of four languages for Apache SparkWith the introduction of these 4 languages, let’s now compare these languages for the Spark framework:These languages can be categorized into 2 buckets basis high-level spark architecture support, broadly:JVM Languages: Java and ScalaNon-JVM Languages: Python and RDue to these categorizations, performance may vary. Let’s understand architecture in little depth to understand the performance implications of using these languages. This would also help us to understand the question of when to use which language.Spark Framework High-level architecture An application written in any one of the languages is submitted on the driver node and further driver node distributes the workload by dividing the execution on multiple worker nodes.JVM compatible Application Execution Flow Consider the applications written are JVM compatible (Java/Scala). Now, Spark is also written in native JVM compatible Scala language, hence there is no explicit conversion required at any point of time to execute JVM compatible applications on Spark. Also, this makes the native language applications faster to perform on the Spark framework.There are multiple scenarios for Python/R written applications:Python/R driver talk to JVM driver by socket-based API. On the driver node, both the driver processes are invoked when the application language is non-JVM language.Scenario 1: Applications for which Equivalent Java/Scala Driver API exists - This scenario executes the same way as JVM compatible applications by invoking Java API on the driver node itself. The cost for inter-process communication through sockets is negligible and hence performance is comparable. This is with the assumption that processed data over worker nodes are not to be sent back to the Driver again.Scenario 1(b): If the assumption taken is void in scenario 1 i.e. processed data on worker nodes is to be sent back to driver then there is significant overhead and serialization required. This adds to processing time and hence performance in this scenario deteriorates.Scenario 2: Applications for which Equivalent Java/Scala Driver API do not exist – Ex. UDF (User-defined functions) / Third party python libraries. In such cases equivalent Java API doesn’t exist and hence, additional executor sessions are initiated on worker node and python API is serialized on worker node and executed. This python worker processes in addition to JVM and coordination between them is overhead. Processes also compete for resources which adds to memory contention.In addition, if the data is to send back to the driver node then processing takes a lot of time and problem scales up as volume increases and hence performance is bigger problem.As we have seen a performance, Let’s see the tabular comparison between these languages.Comparison PointsJavaScalaPythonRPerformanceFasterFaster (about 10x faster than Python)SlowerSlowerLearning CurveEasier than JavaTougher than PythonSteep learning curve than Java & PythonEasiestModerateUser GroupsWeb/Hadoop programmersBig Data ProgrammersBeginners & Data EngineersData Scientists/ StatisticiansUsageWeb development and Hadoop NativeSpark NativeData Engineering/ Machine Learning/ Data VisualizationVisualization/ Data Analysis/ Statistics use casesType of LanguageObject-Oriented, General PurposeObject-Oriented & Functional General PurposeGeneral PurposeSpecifically for Data Scientists.Needs conversion into Scala/Python before productizingConcurrencySupport ConcurrencySupport ConcurrencyDoes not Support ConcurrencyNAEase of UseVerboseLesser Verbose than ScalaLeast VerboseNAType SafetyStatically typedStatically typed (except for Spark 2.0 Data frames)Dynamically TypedDynamically TypedInterpreted Language (REPL)NoNoYesYesMaturated machine learning libraries availability/ SupportLimitedLimitedExcellentExcellentVisualization LibrariesLimitedLimitedExcellentExcellentWeb Notebooks SupportIjava Kernel in Jupyter NotebookApache Zeppelin Notebook SupportJupyter Notebook SupportR NotebookWhich language is better for Spark and Why?With the info we gathered for the languages, let's move to the main question i.e. which language to choose for Spark? My answer is not a straightforward single language for this question. I will state my point of view for choosing the proper language: If you are a beginner and want to choose a language from learning Spark perspective. If you are organization/ self employed or looking to answer a question for solutioning a project perspective. I. If you are beginner:If you are a beginner and have no prior education of programming language then Python is the language for you, as it’s easy to pick up. Simple to understand and very user-friendly. It would prove a good starting point for building Spark knowledge further. Also, If you are looking for getting into roles like ‘data engineering’, knowledge of Python along with supported libraries will go a long way. If you are a beginner but have education in programming languages, then you may find Java very familiar and easy to build upon prior knowledge. After all, it grapevine of all the languages.  If you are a hardcore bigdata programmer and love exploring complexities, Scala is the choice for you. It’s complex but experts say if once you love Scala, you will prefer it over other languages anytime.If you are a data scientist, statistician and looking to work with Spark, R is the language for you. R is more science oriented than Python. II. If you are organization/looking for choice of language for implementations:You need to answer the following important questions before choosing the language:Skills and Proficiency: Which skill-sets and proficiency over language, you already have with you/in your team?Design goals and availability of features/ Capability of language: Which libraries give you better support for the type of problem(s) you are trying to solve.Performance implications Details of these explained below: 1. Skillset: This is very straightforward. Whichever is available skill set within a team, go with that to solve your problem, after evaluating answers of other two questions. If you are self-employed, the one you have proficiency is the most likely suitable choice of language.  2. Library Support:  Following gives high-level capabilities of languages:R: Good for research, plotting, and data analysis.Python: Good for small- or medium-scale projects to build models and analyse data, especially for fast start-ups or small teams.Scala/Java: Good for robust programming with many developers and teams; it has fewer machine learning utilities than Python and R, but it makes up for it with increased code maintenance.In my opinion, Scala/Java can be used for larger robust projects to ease maintenance. Also, If one wants the app to scale quickly and needs it to be robust, Scala is the choice.Python and R: Python is more universal language than R, but R is more science oriented. Broadly, one can say Python can be implemented for Data engineering use cases and R for Data science-oriented use cases. On the other hand, if you discover these two languages have about the same library support you need, then pick the one whose syntax you prefer. You may find that you need both depending on the situation. 3. Performance: As seen earlier in the article, Scala/ Java is about 10x faster than Python/R as they are JVM supported languages. However, if you are writing Python/R applications wisely (like without using UDFs/ Not sending data back to the Driver etc) they can perform equally well.ConclusionFor learning, depending upon your prior knowledge, Python is the easiest of all to pick up. For implementations, Choice is in your hands which language to choose for implementations but let me tell you one secret or a tip, you don’t have to stick to one language until you finish your project. You can divide your problem in small buckets and utilize the best language to solve the problem. This way, you can achieve balance between optimum performance, availability, proficiency in a skill, and sub-problem at hand.  Do let us know how your experience was in learning the language comparisons and the language you think is better for Spark. Moreover, which one you think is “the one for you”, through comments below.
Rated 4.5/5 based on 1 customer reviews
8240
Scala Vs Python Vs R Vs Java - Which language is b...

One of the most important decisions for the Big da... Read More

What is Context in React? How to use Context in React?

What the hack is Context?Have you ever wondered about passing data or using states in between different components without using Props? Or passing a state from Parent to Child component without manually passing a prop at every level?  Let’s understand with an example below:Here we have a parent component app.js where we have defined our states. We want to access the data of state in the last child which is “Child 1.2” in the below chart.app.js Parent ComponentThe ideal or older approach in React is to pass the data from the root component to the last component is via Props. We have to pass props in each intermediary level so as to send in the last level. While this approach also works, the real problems begin if data is needed on a different branch i.e Child 2.1 or Child 2.2 in above chart…In order to solve this problem, we need to pass data from the root/top level of the application through all the intermediate components to the one where we want to pass the data, even though some intermediate components don't even need it.  This mind-numbing process is known as prop drilling,  Prop Drillingwhere you’re passing the state from your root component to the bottom and you end up passing the data via props through components that do not even necessarily need themOne really good solution to solve the above problem is using Context According to the React documentation:  “Context provides a way to pass data through the component tree without having to pass props down manually at every level”Ordinarily, we’d have used any state management library like Redux or have used HOC’s to pass the data in a tedious manner. But what if we don’t want to use it all? Here comes the role of new Context API!In layman words, it gives an approach to make specific data available to all components throughout the React component tree regardless of how deeply nested those components are.Context is just like a global object to the subtree of the React component.When to use the Context APIThe Context API is convenient for sharing data that is either global, such as setting the header and footer theme of a website or logic of user authentication and many more. In cases like these, we can use the Context API without using any extra library or external modules. It can also be used in a multilingual application where we want to implement multiple languages that can be translated into the required text with the help of ContextAPI. It will save prop-drilling   In fact, in any situation where we have to pass a prop through a component so it reaches another component, inside down the tree is where we can use the Context API.Introducing The Context APIThe context API is a way to pass data from top component to bottom ones, without manually passing it to via props. Context is fundamentally utilized when some data needs to be accessible by numerous components at different nesting levels. To create a new Context, we can use the React createContext function like below: const MyContext = React.createContext(defaultValue);In React, data is often passed from a parent to its child component as a property. Here, we can also omit the default value which we have passed to the context, if needed.React data passing from parent to its child Let’s Get Started With ContextThree things are needed to tap into the power of context: 1. The context itselfTo create a context we can use React.createContext method which creates a context object. This is used to ensure that the components at different level can use the same context to fetch the data.   In React.createContext, we can pass an input parameter as an argument which could be anything or it can be null as well.import React from `react';  const ThemeContext = React.createContext('dark');  // Create our context        export default ThemeContext;In this example, a string is passed for the current Context which is “dark”. So we can say, the current theme required for a specific component is Dark.   Also, we have exported the object so that we can use it in other places. In one app, React also allows you to create multiple contexts. We should always try to separate context for different purposes, so as to maintain the code structure and better readability. We will see that later in our reading.   What next?? Now, to utilize the power of Context in our example, we want to provide this type of theme to all the components.  Context exposes a pair of elements which is a Provider Component and a Consumer Component.2. A context providerOkay, so now we have our Context object. And to make the context available to all our components we have to use a Provider.   But, what is Provider? According to the React documentation:"every context object comes with a Provider React component that allows consuming components to subscribe to context changes"In other words, Provider accepts a prop (value) and the data in this prop can be used in all the other child components. This value could be anything from the component state.// myProvider.js import React from 'react'; import Theme from './theme'; const myProvider = () => ( ...   ); export default myProvider;We can say that a provider acts just like a delivery service.prop finding context and deliverling it to consumerWhen a consumer asks for something, it finds it in the context and delivers it to where it's needed.But wait, who or what is the consumer???3.  A context consumer What is Consumer? A consumer is a place to keep the stored information. It can request for the data using the provider and can even manipulate the global store if the provider allows it. In our previous example, let’s grab the theme value and use it in our Header component. // Header.js   import React from 'react'; import Theme from './theme';   const Header = () => (                        {theme => Selected theme is {theme}}             );   export default Header;Dynamic Context:   We can also change the value of the provider by simply providing a dynamic context. One way of achieving it is by placing the Provider inside the component itself and grabbing the value from component state as below:// Footer.js   import React from 'react';   class Footer extends React.Component {    state = {        theme: 'dark'    };      render() {        return (                                                );    } }Simple, no? We can easily change the value of  the Provider to any Consumer.Consuming Context With Class-based ComponentsWe all pretty know that there are two methods to write components in React, which is Class based components and Function based components. We have already seen a demo of how we can use the power of Context in class based components.  One is to use the context from Consumer like “ThemeContext.Consumer” and the other method is by assigning context object from current Context to contextType property of our class.import React, { Component } from "react"; import MyThemeContext from "../Context/MyThemeContext"; import GlobalTheme from "../theme";   class Main extends Component {    constructor() {        super();    }    static contextType = MyThemeContext;  //assign context to component    render() {        const currentTheme = GlobalTheme[this.context];        return (            ...        );    }   }There is always a difference in how we want to use the Context. We can either provide it outside the render() method or use the Context Consumer as a component itself.  Here in the above example, we have used a static property named as contextType which is used to access the context data. It can be utilized by using this.context. This method however, limits you consuming, only one context at a time.Consuming Context With Functional ComponentsContext with Functional based components is quite easy and less tedious. In this we can access the context value through props with the help of useContext method in React. This hook (useContext) can be passed in as the argument along with our Context to consume the data in the functional component.const value = useContext(MyContext);It accepts a context object and returns the current context value. To read more about hooks, read here.  Our previous example looks like:import React, { useContext } from 'react' import MyThemeContext from './theme-context'   const User = props => {    const context = useContext(MyThemeContext)    return ...Now, instead of wrapping our content in a Consumer component we have access to the theme context state through the ‘context’ variable.But we should avoid using context for keeping the states locally. Instead of  conext, we can use local state there.Use of Multiple ContextsIt may be possible that we want to add multiple contexts in our application. Like holding a theme for the entire app, changing the language based on the location, performing some A/B testing, using global parameters for login or user Profile… For instance, let’s say there is a requirement to keep both Theme context and userInfo Context, the code will look like as:       ...   It’s quite possible in React to hold multiple Contexts, but this definitely hampers rendering, serving ‘n’ number of contexts in ‘m’ component and holding the updated value in each rendered component.To avoid this and to make re-rendering faster, it is suggested to make each context consumer in the tree as a separate node or into different contexts.                 And we can perform the nesting in context as:    {theme => (                    {colour => (                Theme: {theme} and colour: {colour}            )}            )} It’s worth noting that when a value of a context changes in the parent component, the child components or the components’ holding that value should be rerendered or changed. Hence, whenever there is a change in the value of provider, it will cause its consumers to re-render.ConclusionDon’t you think this concept is just amazing?? Writing a global context like theme or language or userProfile and using the data of them directly in the child or other components? Implementing these stateful logic by global preferences was never so easy, but Context made this transportation job a lot simple and achievable! Hope you find this article useful. Happy Coding!Having challenge learning to code? Let our experts help you with customized courses!
Rated 4.0/5 based on 1 customer reviews
7928
What is Context in React? How to use Context in Re...

What the hack is Context?Have you ever wondered ab... Read More

How to use sys.argv in Python

The sys module is one of the common and frequently used modules in Python. In this article, we will walk you through how to use the sys module. We will learn about what argv[0] and sys.argv[1] are and how they work. We will then go into how to parse Command Line options and arguments, the various ways to use argv and how to pass command line arguments in Python 3.x In simple terms,Command Line arguments are a way of managing the script or program externally by providing the script name and the input parameters from command line options while executing the script. Command line arguments are not specific just to Python. These can be found in other programming languages like C, C# , C++, PHP, Java, Perl, Ruby and Shell scripting. Understanding sys.argv with examples  sys.argv is a list in Python that contains all the command-line arguments passed to the script. It is essential in Python while working with Command Line arguments. Let us take a closer look with a few examples. With the len(sys.argv) function, you can count the number of arguments. import sys print ("Number of arguments:", len(sys.argv), "arguments") print ("Argument List:", str(sys.argv)) $ python test.py arg1 arg2 arg3 Number of arguments: 4 arguments. Argument List: ['test.py', 'arg1', 'arg2', 'arg3']Module name to be used while using sys.argv To use sys.argv, you will first need to the sys module. What is argv[0]? Remember that sys.argv[0] is the name of the script. Here – Script name is sysargv.py import sys print ("This is the name of the script: ", sys.argv[0]) print ("Number of arguments: ", len(sys.argv)) print ("The arguments are: " , str(sys.argv))Output:This is the name of the script:  sysargv.py                                                                               Number of arguments:  1                                                                                                 The arguments are:  ['sysargv.py']What is "sys. argv [1]"? How does it work? When a python script is executed with arguments, it is captured by Python and stored in a list called sys.argv. So, if the below script is executed: python sample.py Hello Python Then inside sample.py, arguments are stored as: sys.argv[0] == ‘sample.py’ sys.argv[1] == ‘Hello’ sys.argv[2] == ‘Python’Here,sys.argv[0] is always the filename/script executed and sys.argv[1] is the first command line argument passed to the script . Parsing Command Line options and arguments  Python provides a module named as getopt which helps to parse command line options and arguments. Itprovides a function – getopt, whichis used for parsing the argument sequence:sys.argv. Below is the syntax: getopt.getopt(argv, shortopts, longopts=[]) argv: argument list to be passed.shortopts: String of short options as list . Options in the arguments should be followed by a colon (:).longopts: String of long options as list. Options in the arguments should be followed by an equal sign (=). import getopt import sys   first ="" last ="" argv = sys.argv[1:] try:     options, args = getopt.getopt(argv, "f:l:",                                ["first =",                                 "last ="]) except:     print("Error Message ")   for name, value in options:     if name in ['-f', '--first']:         first = value     elif name in ['-l', '--last']:         last = value   print(first + " " + last)Output:(venv) C:\Users\Nandank\PycharmProjects\DSA\venv>python getopt_ex.py -f Knowledge -l Hut Knowledge Hut (venv) C:\Users\Nandank\PycharmProjects\DSA\venv>python getopt_ex.py --first Knowledge –last Hut Knowledge HutWhat are command line arguments? Why do we use them? Command line arguments are parameters passed to a program/script at runtime. They provide additional information to the program so that it can execute. It allows us to provide different inputs at the runtime without changing the code. Here is a script named as argparse_ex.py: import argparse parser = argparse.ArgumentParser() parser.add_argument("-n", "--name", required=True) args = parser.parse_args() print(f'Hi {args.name} , Welcome ')Here we need to import argparse package Then we need to instantiate the ArgumentParser object as parser. Then in the next line , we add the only argument, --name . We must specify either shorthand (-n) or longhand versions (--name)  where either flag could be used in the command line as shown above . This is a required argument as mentioned by required=True Output:  (venv) C:\Users\Nandank\PycharmProjects\DSA\venv>python argparse_ex.py --name Nandan  Hi Nandan , Welcome  (venv) C:\Users\Nandank\PycharmProjects\DSA\venv>python argparse_ex.py -n Nandan  Hi Nandan , Welcome The example above must have the --name or –n option, or else it will fail.(venv) C:\Users\Nandank\PycharmProjects\DSA\venv>python argparse_ex.py --name   usage: argparse_ex.py [-h] --name NAME argparse_ex.py: error: the following arguments are required: --namePassing command line arguments in Python 3.x argv represents an array having the command line arguments of thescript . Remember that here, counting starts fromzero [0], not one (1). To use it, we first need to import sys module (import sys). The first argument, sys.argv[0], is always the name of the script and sys.argv[1] is the first argument passed to the script. Here, we need to slice the list to access all the actual command line arguments. import sys if __name__ == '__main__':     for idx, arg in enumerate(sys.argv):        print("Argument #{} is {}".format(idx, arg))     print ("No. of arguments passed is ", len(sys.argv))Output:(venv) C:\Users\Nandank\PycharmProjects\DSA\venv\Scripts>python argv_count.py Knowledge Hut 21 Argument #0 is argv_count.py Argument #1 is Knowledge Argument #2 is Hut Argument #3 is 21 No. of arguments passed is  4Below script - password_gen.py is used to generate a secret password by taking password length as command line argument.import secrets , sys, os , string ''' This script generates a secret password using possible key combinations''' ''' Length of the password is passed as Command line argument as sys.argv[1]''' char = string.ascii_letters+string.punctuation+string.digits length_pwd = int(sys.argv[1])   result = "" for i in range(length_pwd):     next= secrets.SystemRandom().randrange(len(char))     result = result + char[next] print("Secret Password ==" ,result,"\n")Output:(venv) C:\Users\Nandank\PycharmProjects\DSA\venv\Scripts>python password_gen.py 12 Secret Password == E!MV|,M][i*[Key takeaways Let us summarize what we've learnt so far. We have seen how to use the sys module in Python, we have looked at what areargv[0] and sys.argv[1] are and how they work, what Command Line arguments are and why we use them and how to parse Command Line options and arguments. We also dived into multiple ways to use argv and how to pass command line arguments in Python 3.xHope this mini tutorial has been helpful in explaining the usage of sys.argv and how it works in Python. Be sure to check out the rest of the tutorials on KnowledgeHut’s website and don't forget to practice with your code! 
Rated 4.0/5 based on 14 customer reviews
5987
How to use sys.argv in Python

The sys module is one of the common and frequently... Read More

20% Discount