Search

Python Programming Filter

How to Work with Excel Spreadsheets using Python

Excel is considered as one of the most popular and widely used spreadsheet applications developed by Microsoft. You can organize, analyze and store your data into tabular sheets with the help of Excel. From analysts and sales managers, to CEOs, professionals from every field use Excel for creating quick statistics and for data crunching.Spreadsheets are commonly used in the present world because of their intuitive nature and the ability to handle large datasets. Most importantly, they can work without any prior technical background.Finding different ways to work with Excel using code is essential since working with data and in Python has some serious advantages in comparison with Excel’s UI. Developers of Python have implemented ways to read, write and manipulate Excel documents.You can check the quality of your spreadsheet application by going over the checklist below:Is the spreadsheet able to represent static data?Is the spreadsheet able to mix data, calculations, and reports?Is the data in your spreadsheet complete and consistent in nature?Does the spreadsheet have an organized worksheet structure?This checklist will help you in verifying the qualitative nature of the spreadsheet application you’re going to work on.Practical Applications  In this article, we would be using openpyxl to work on data. With the help of this module, you can extract data from a database into an Excel spreadsheet or you can also convert an Excel spreadsheet into a programmatic format. There can be a lot of possible situations where you might feel the need to use a package like openpyxl. Let us discuss a few of them to get a comprehensive overview of it.Importing New Products Into a Database Consider yourself working in an online store company. When they want to add new products to the online store, they make an Excel spreadsheet with a few hundred rows along with the name of the product, description, price and a few more basic information and then they give it to you. Now, if you want to import this particular data, you need to iterate over each row of the spreadsheet and then add each of the products into the database of the online store.[Text Wrapping Break] Exporting Database Data Into a SpreadsheetConsider you have a Database table. In this particular table, you have collected information of all your users which includes their name, contact number, email address, and so forth. Now, the Marketing Team is willing to collectively contact all the users and promote a new product of the company. However, neither do they have access to the Database nor they have any idea about using SQL to extract the information. In this situation, openpyxl comes to play. You can use it effectively to iterate over each User record and transform the required information into an Excel spreadsheet.    Appending Information to an Existing SpreadsheetConsider the same online store example we discussed above. You have an Excel spreadsheet with a list of users and your job is to append to each row the total amount they have spent in your store.In order to perform this, you have to read the spreadsheet first and then iterate through each row and fetch the total amount spent from the Database. Finally, you need to write it back to the spreadsheet.Starting openpyxlYou can install the openpyxl package using pip. Open your terminal and write the following command: $ pip install openpyxlAfter you have installed the spreadsheet, you can make up your own simple spreadsheet: from openpyxl import Workbook workbook = Workbook() spreadsheet = workbook.active spreadsheet["A1"] = "Hello" spreadsheet["B1"] = "World!" workbook.save(filename="HelloWorld.xlsx")How to Read Excel Spreadsheets with openpyxl Let us start with the most important thing that you can do with a spreadsheet,i.e. read it. We will be using a Watch Sample Dataset which contains a list of 100 watches with information like product name, product ID, review and so forth.  A Simple Way to Read an Excel Spreadsheet Let us start with opening our sample spreadsheet:>>> from openpyxl import load_workbook >>> workbook = load_workbook(filename="sample.xlsx") >>> workbook.sheetnames ['Sheet 1'] >>> spreadsheet = workbook.active >>> spreadsheet <Worksheet "Sheet 1"> >>> spreadsheet.titleIn the example code above, we open the spreadsheet using load_workbook and then we check all the sheets that are available to work with using workbook.sheetnames. Then Sheet 1 is automatically selected using workbook.active since it is the first sheet available. This is the most common way of opening a spreadsheet.  Now, let us see the code to retrieve data from the spreadsheet: >>> spreadsheet["A1"] <Cell 'Sheet 1'.A1> >>> spreadsheet["A1"].value 'marketplace' >>> spreadsheet["F10"].value "G-Shock Men's Grey Sport Watch"You can retrieve the actual value and the cell value  both. To get the actual value, use .value and to get the cell, you can use .cell():>>> spreadsheet.cell(row=10, column=6) <Cell 'Sheet 1'.F10> >>> spreadsheet.cell(row=10, column=6).value "G-Shock Men's Grey Sport Watch"Importing Data from a Spreadsheet In this section, we will discuss how to iterate through the data, and about conversion into a more useful format using Python.Let us first start with iterating through the data. There are a number of iterating methods that depend solely on the user.You can slice the data with a combination of rows and columns:>>> spreadsheet["A1:C2"] ((<Cell 'Sheet 1'.A1>, <Cell 'Sheet 1'.B1>, <Cell 'Sheet 1'.C1>),  (<Cell 'Sheet 1'.A2>, <Cell 'Sheet 1'.B2>, <Cell 'Sheet 1'.C2>)) You can also iterate through the dataset by ranging between rows and columns: >>> # Get all cells from column A  >>> spreadsheet["A"] (<Cell 'Sheet 1'.A1>,  <Cell 'Sheet 1'.A2>,   ...   <Cell 'Sheet 1'.A99>,   <Cell 'Sheet 1'.A100>) >>> # Get all cells for a range of columns >>> spreadsheet["A:B"]  ((<Cell 'Sheet 1'.A1>,    <Cell 'Sheet 1'.A2>,    ...    <Cell 'Sheet 1'.A99>,    <Cell 'Sheet 1'.A100>),   (<Cell 'Sheet 1'.B1>,    <Cell 'Sheet 1'.B2>,    ...    <Cell 'Sheet 1'.B99>,    <Cell 'Sheet 1'.B100>))  >>> # Get all cells from row 5 >>> spreadsheet[5] (<Cell 'Sheet 1'.A5>,  <Cell 'Sheet 1'.B5>,  ...   <Cell 'Sheet 1'.N5>,  <Cell 'Sheet 1'.O5>) >>> # Get all cells for a range of rows >>> spreadsheet[5:6] ((<Cell 'Sheet 1'.A5>,   <Cell 'Sheet 1'.B5>,    ...    <Cell 'Sheet 1'.N5>,    <Cell 'Sheet 1'.O5>),   (<Cell 'Sheet 1'.A6>,    <Cell 'Sheet 1'.B6>,    ...    <Cell 'Sheet 1'.N6>,    <Cell 'Sheet 1'.O6>)) Python offers arguments by which you can set limits to the iteration with the help of Python generators like .iter_rows() and .iter_cols(): >>> for row in spreadsheet.iter_rows(min_row=1, ... max_row=2, ... min_col=1, ... max_col=3): ... print(row) (<Cell 'Sheet 1'.A1>, <Cell 'Sheet 1'.B1>, <Cell 'Sheet 1'.C1>) (<Cell 'Sheet 1'.A2>, <Cell 'Sheet 1'.B2>, <Cell 'Sheet 1'.C2>) >>> for column in spreadsheet.iter_cols(min_row=1,  ... max_row=2, ... min_col=1, ... max_col=3): ... print(column) (<Cell 'Sheet 1'.A1>, <Cell 'Sheet 1'.A2>) (<Cell 'Sheet 1'.B1>, <Cell 'Sheet 1'.B2>)  (<Cell 'Sheet 1'.C1>, <Cell 'Sheet 1'.C2>) You can also add Boolean values_only in the above example and set it to True to get the values of cell: >>> for value in spreadsheet.iter_rows(min_row=1,  ... max_row=2,  ... min_col=1,  ... max_col=3,  ... values_only=True): ... print(value) ('marketplace', 'customer_id', 'review_id') ('US', 3653882, 'R3O9SGZBVQBV76')Since we are now done with iterating the data, let us now manipulate data using Python’s primitive data structures. Consider a situation where you want to extract information of a product from the sample spreadsheet and then store it into the dictionary. The key to the dictionary would be the product ID.   Convert Data into Python classesTo convert data into Python data classes, let us first decide what we want to store and how to store it.  The two essential elements that can be extracted from the data are as follows:                                                     1. Products                                             2. Review                                                          • ID                                                         • ID                                                          • Title                                                     • Customers ID                                                          • Parent                                                 • Headline                                                          • Category                                            • Body                                                                                                                         • DateLet us implement the two elements: import datetime from dataclasses import dataclass @dataclass class Product: id: str parent: str title: str category: str @dataclass class Review: id: str customer_id: str stars: int headline: str body: str  date: datetime.datetime The next step is to create a mapping between columns and the required fields: >>> for value in spreadsheet.iter_rows(min_row=1, ... max_row=1, ... values_only=True): ... print(value) ('marketplace', 'customer_id', 'review_id', 'product_id', ...) >>> # Or an alternative >>> for cell in sheet[1]: ... print(cell.value) marketplace Customer_ID Review_ID Product_ID Product_Parent ...Finally, let us convert the data into new structures which will parse the data in spreadsheet into a list of products and review objects: from datetime import datetime  from openpyxl import load_workbook  from classes import Product,Review  from mapping import PRODUCT_ID,PRODUCT_PARENT,PRODUCT_TITLE, \ PRODUCT_CATEGORY,REVIEW_DATE,REVIEW_ID,REVIEW_CUSTOMER, \ REVIEW_STARS,REVIEW_HEADLINE,REVIEW_BODY # Using the read_only method since you're not gonna be editing the spreadsheet workbook = load_workbook(filename="watch_sample.xlsx",read_only=True)  spreadsheet = workbook.active products = [] reviews = [] # Using the values_only because you just want to return the cell value for row in spreadsheet .iter_rows(min_row=2, values_only=True):  product = Product(id=row[PRODUCT_ID],  parent=row[PRODUCT_PARENT],  title=row[PRODUCT_TITLE],  category=row[PRODUCT_CATEGORY])  products.append(product) # You need to parse the date from the spreadsheet into a datetime format spread_date = row[REVIEW_DATE]  parsed_date = datetime.strptime(spread_date,"%Y-%m-%d") review = Review(id=row[REVIEW_ID], Customer_ID=row[REVIEW_CUSTOMER], stars=row[REVIEW_STARS], headline=row[REVIEW_HEADLINE], body=row[REVIEW_BODY], date=parsed_date) reviews.append(review) print(products[0]) print(reviews[0])After you execute the code, you will get an output that looks like this:Product(id='A90FALZ1ZC',parent=937111370,...) Review(id='D3O9OGZVVQBV76',customer_id=3903882,...)Appending Data To understanding how to append data, let us hover back to the first sample spreadsheet. We will open the document and append some data to it: from openpyxl import load_workbook # Start by opening the spreadsheet and selecting the main sheet workbook = load_workbook(filename="hello_world.xlsx") spreadsheet = workbook.active # Write what you want into a specific cell spreadsheet["C1"]="Manipulating_Data ;)" # Save the spreadsheet workbook.save(filename="hello_world_append.xlsx"If you open your Excel file, you will notice the additional Manipulating_Data being added to an adjacent cell. Writing Excel Spreadsheets With openpyxl A spreadsheet is a file that helps to store data in specific rows and columns. We can calculate and store numerical data and also perform computation using formulas. So, let’s begin with some simple Spreadsheets and understand what each line means. Creating our first simple Spreadsheet 1 from openpyxl import Workbook  2    3 filename = "first_program.xlsx"  4    5 workbook = Workbook()  6 spreadsheet = workbook.active  7    8 sheet["A1"] = "first"  9 sheet["B1"] = "program!" 10   11 workbook.save(filename=filename)Line 5: In order to make a Spreadsheet, at first,  we have to create an Empty workbook to perform further operations. Lines 8 and 9 : We can add data to a specific cell as per our requirement. In this example, we can see that two values “first” and “program” have been added to specific cells in the sheet. Line 11: The line shows how to save data after all the operations we have done. Basic Spreadsheet Operations Before going to the difficult coding part, at first we have to build our building blocks like how to add and update values, how to manage rows and columns, adding filters, styles or formulas in a Spreadsheet. We have already explained the following code by which we can add values to a Spreadsheet: >>> spreadsheet["A1"] = "the_value_we_want_to_add"There is another way that we can add values to Spreadsheet: >>> cell = sheet["A1"] >>> cell <Cell 'Sheet'.A1> >>> cell.value 'hello' >>> cell.value = "hey" >>> cell.value 'hey'Line 1: In the first line at first we have declared the cell and updated its value. Line 5: We have printed the value of the cell as “first”  because  in the first program we have already assigned sheet["A1"]with “first” Line 8 : We have updated the value of the cell as "second"by simply assigning it to cell.value. Lines 9 : In this line, we have just printed the updated value of cell. Finally, you have to save all the operations you have performed into the spreadsheet once you call workbook.save().If  the cell didn’t exist while adding a value ,then openpyxl creates a cell:>>> # Before, our spreadsheet has only 1 row >>> print_rows() ('first', 'program!') >>> # Try adding a value to row 10 >>> spreadsheet["B10"] = "test" >>> print_rows() ('first', 'program!') (None, None)  (None, None)  (None, None)  (None, None)  (None, None)  (None, None)  (None, None)  (None, None)  (None, 'test') Managing Rows and Columns in Spreadsheet Insertion or deletion of rows (adding or removing elements of rows /columns) is one of the most basic operations in Spreadsheet. In openpyxl.We can perform these operations by simply calling these methods and passing its arguments. .insert_rows().delete_rows().insert_cols().delete_cols()We can pass 2 types of arguments to the methods :  idx amount Idx stands for index position and amount refers to the number of values we can store in the Spreadsheet. Using our basic knowledge based on the first  simple program, let’s see how we can use these methods inside the program: >>> print_rows() ('first', 'program!') >>> # Insert a column at the first position before column 1 ("A") >>> spreadsheet.insert_cols(idx=1) >>> print_rows() (None, 'first', 'program!') >>> # Insert 5 columns in  between column 2 ("B") and 3 ("C") >>> spreadsheet.insert_cols(idx=3,amount=5) >>> print_rows() (None, 'first', None, None, None, None, None, 'program!') >>> # Delete the created columns >>> spreadsheet.delete_cols(idx=3,amount=5) >>> v.delete_cols(idx=1) >>> print_rows() ('first', 'program!') >>> # Insert a new row in the beginning >>> spreadsheet.insert_rows(idx=1) >>> print_rows() (None, None) ('first', 'program!') >>> # Insert 3 new rows in the beginning  >>> spreadsheet.insert_rows(idx=1,amount=3) >>> print_rows() (None, None) (None, None)  (None, None)  (None, None)  ('first', 'program!') >>> # Delete the first 4 rows  >>> spreadsheet.delete_rows(idx=1,amount=4)  >>> print_rows()  ('first', 'program!') Managing SheetsWe have seen the following recurring piece of code in our previous examples .This is one of the ways of selecting the default sheet from the Spreadsheet: spreadsheet = workbook.activeHowever, if you want to open a spreadsheet with multiple sheets, you can write the following command: >>> # Let's say you have two sheets: "Products" and "Company Sales" >>> workbook.sheetnames ['Products', 'Company Sales'] >>> # You can select a sheet using its title >>> Products_Sheet = workbook["Products"] >>> Sales_sheet = workbook["Company Sales"]If we want to change the title of the Sheet, execute the following code: >>> workbook.sheetnames ['Products', 'Company Sales'] >>> Products_Sheet = workbook["Products"] >>> Products_Sheet.title = "New Products" >>> workbook.sheetnames ['New Products', 'Company Sales']We can CREATE / DELETE Sheets also with the help of two methods - .create_sheet() and  .remove(): >>> #To print the available sheet names >>> workbook.sheetnames  ['Products', 'Company Sales'] >>> #To create a new Sheet named "Operations" >>> Operations_Sheet = workbook.create_sheet("Operations") >>> #To print the updated available sheet names >>> workbook.sheetnames ['Products', 'Company Sales', 'Operations'] >>> # To define the position Where we want to create the Sheet(here “HR” sheet is created at the first position .Here index 0 represents the first position) >>> HR_Sheet = workbook.create_sheet("HR",0) >>> #To again  print the updated available sheet names >>> workbook.sheetnames ['HR', 'Products', 'Company Sales', 'Operations'] >>> # To remove them,we just have to send the sheet names as an argument which we want to delete to the method  .remove()  >>> workbook.remove(Operations_Sheet) >>> workbook.sheetnames ['HR', 'Products', 'Company Sales'] >>> #To delete hr_sheet >>> workbook.remove(hr_sheet) >>> workbook.sheetnames ['Products', 'Company Sales']Adding Filters to the Spreadsheet We can use openpyxl to add filters in our Spreadsheet but when we open our Spreadsheet, the data won’t be rearranged according to these sorts and filters. When you’re programmatically creating a spreadsheet and it is going to be sent and used by someone else, it is a good practice to add different filters and allow people to use it afterward. In the code below there is a simple example which shows how to add a simple filter to your spreadsheet: >>> # Check the used spreadsheet space using the attribute "dimensions" >>> spreadsheet.dimensions 'A1:O100' >>> spreadsheet.auto_filter.ref="A1:O100" >>> workbook.save(filename="watch_sample_with_filters.xlsx")Adding Formulas to the Spreadsheet Formulas are one of the most commonly used and powerful features of spreadsheets. By using formulas, you can solve various mathematical equations with the additional support of openpyxl which makes those calculations as simple as editing a specific cell’s value.The list of formulas supported by openpyxl are:>>> from openpyxl.utils import FORMULAE >>> FORMULAE frozenset({'ABS',            'AMORLINC',            'ACCRINT',             'ACOS',             'ACCRINTM',             'ACOSH',              ...,                   'AND',            'YEARFRAC',             'YIELDDISC',             'AMORDEGRC',             'YIELDMAT',             'YIELD',             'ZTEST'}) Let’s add some formulas to our spreadsheet. Let’s check the average star rating of  the 99 reviews within the spreadsheet: >>> # Star rating is in column "H"  >>> spreadsheet["P2"] = "=AVERAGE(H2:H100)" >>> workbook.save(filename = "first_example.xlsx")Now, if we open your spreadsheet and go to cell P2, you can see the value to be 4.18181818181818.  Similarly, we can use this methodology to include any formulas for our requirements in our spreadsheet. For example, if we want to count the number of helpful reviews: >>> # The helpful votes  counted in column "I"  >>> spreadsheet["P3"] = '=COUNTIF(I2:I100, ">0")' >>> workbook.save(filename = "first_example.xlsx") Adding Styles to the SpreadsheetIt is not so important and usually, we don’t use this in everyday code but for the sake of completeness, we will also understand this with the following example.Using openpyxl, we get multiple styling options such as including fonts, colors,  borders,and so on.Let’s have a look at an example:>>> # Import necessary style classes >>> from openpyxl.styles import Font,Color,Alignment,Border,Side,colors >>> # Create a few styles >>> Bold_Font = Font(bold=True) >>> Big_Red_Text = Font(color=colors.RED,size=20)  >>> Center_Aligned_Text = Alignment(horizontal="center")  >>> Double_Border_Side = Side(border_style="double")  >>> Square_Border = Border(top=double_border_side,  ... right=double_border_side,  ... bottom=double_border_side,  ... left=double_border_side)  >>> # Style some cells! >>> spreadsheet["A2"].font = Bold_Font >>> spreadsheet["A3"].font = Big_Red_Text >>> spreadsheet["A4"].alignment = Center_Aligned_Text >>> spreadsheet["A5"].border = Square_Border >>> workbook.save(filename="sample_styles.xlsx") If you want to apply multiple styles to one or several cells in our spreadsheets,you can use  NamedStyle class: >>> from openpyxl.styles import NamedStyle >>> # Let's create a style template for the header row >>> header = NamedStyle(name="header") >>> header.font = Font(bold=True) >>> header.border = Border(bottom=Side(border_style="thin")) >>> header.alignment = Alignment(horizontal="center",vertical="center") >>> # Now let's apply this to all first row (header) cells >>> header_row = sheet[1] >>> for cell in header_row: ... cell.style = header >>> workbook.save(filename="sample_styles.xlsx") Adding Charts to our SpreadsheetCharts are a good way to compute and understand large amounts of data quickly and easily. We have a lot of charts such as bar chart, pie chart, line chart, and so on. Let us start by creating a new workbook with some data:  1 from openpyxl import Workbook   2 from openpyxl.chart import BarChart,Reference   3    4 workbook = Workbook()   5 spreadsheet = workbook.active   6    7 # Let's create some sample sales data  8 rows = [   9    ["Product","Online","Store"],  10    [1,30,45],  11    [2,40,30],  12    [3,40,25],  13    [4,50,30],  14    [5,30,25],  15    [6,25,35],  16    [7,20,40],  17 ]  18   19 for row in rows: 20    spreadsheet .append(row)Now let us create a bar chart that will show the total number of sales per product: 22 chart = BarChart() 23 data = Reference(worksheet=sheet, 24                 min_row=1,  25                 max_row=8,  26                 min_col=2,  27                 max_col=3)  28   29 chart.add_data(data,titles_from_data=True) 30 spreadsheet .add_chart(chart, "E2") 31 32 workbook.save("chart.xlsx")You can also create a line chart by simply making some changes to the data:  1 import random   2 from openpyxl import Workbook   3 from openpyxl.chart import LineChart,Reference   4    5 workbook = Workbook()  6 sheet = workbook.active  7    8 # Let's create some sample sales data   9 rows= [ 10    ["", "January", "February", "March", "April",  11    "May", "June", "July", "August", "September",  12     "October", "November", "December"],  13    [1, ], 14    [2, ], 15    [3, ], 16 ]  17   18 for row in rows:  19    sheet.append(row) 20   21 for row in sheet.iter_rows(min_row=2, 22                           max_row=4, 23                           min_col=2, 24                           max_col=13): 25    for cell in row: 26        cell.value = random.randrange(5,100)There are numerous types of charts and various types of customizations you can apply to your spreadsheet to make it more attractive.Convert Python Classes to Excel SpreadsheetLet us now learn how to convert the Excel Spreadsheet data to Python classes.  Assume we have a database and we use some Object Relational mapping to map the database into Python classes and then export the objects into spreadsheets: from dataclasses import dataclass from typing import List @dataclass class Sale: id: str  quantity: int @dataclass  class Product:  id: str  name: str  sales:List[Sale] Now, to generate some random data, let’s assume that the above classes are stored in  db_classes.py file then:  1 import random   2    3 # Ignore these for now. You'll use them in a sec ;)   4 from openpyxl import Workbook   5 from openpyxl.chart import LineChart,Reference   6    7 from db_classes import Product,Sale   8    9 products_range = []  10   11 # Let's create 5 products 12 for idx in range(1,6): 13    sales = []  14   15    # Create 5 months of sales  16    for_in range(5): 17        sale_range = Sale(quantity=random.randrange(5,100)) 18        sales.append(sale) 19   20    product = Product(id=str(idx), 21                      name="Product %s" % idx, 22                      sales=sales) 23    products_range.append(product)By running this code, we will get 5 products in 5 months of sale with a random quantity of sales for each month. Now, we have  to convert this into a spreadsheet in which we need to iterate over the data: 25 workbook = Workbook()  26 spreadsheet = workbook.active  27 28 # Append column names first  29 spreadsheet.append(["Product ID","Product Name","Month 1",  30              "Month 2","Month 3","Month 4","Month 5"])  31   32 # Append the data  33 for product in products_range: 34    data = [product.id,product.name] 35    for sale in product.sales: 36        data.append(sale.quantity)  37    spreadsheet.append(data) This will create a spreadsheet with some data coming from your database. How to work with pandas to handle Spreadsheets?We have learned to work with Excel in Python because Excel is one of the most popular tools and finding a way to work with Excel is critical. Pandas is a great tool to work with Excel in Python. It has unique methods to read all kinds of data in an Excel file and we can export items back to Excel using it. To use it, at first we need to install pandas package: $ pip install pandas Then, let’s create a simple DataFrame:  1 import pandas as pd   2    3 data = {   4    "Product Name":["Product 1","Product 2"],   5    "Sales Month 1":[10, 20],   6    "Sales Month 2":[5, 35],   7 }   8 dataframe = pd.DataFrame(data)Now we have some data, and to convert it from a DataFrame into a worksheet we generally use .dataframe_to_rows(): 10 from openpyxl import Workbook 11 from openpyxl.utils.dataframe import  dataframe_to_rows  12   13 workbook = Workbook()  14 spreadsheet = workbook.active  15   16 for row in dataframe_to_rows(df, index=False,header=True):  17    spreadsheet .append(row) 18   19 workbook.save("pandas_spreadsheet.xlsx")We need to use  read_excel method to read data from pandas DataFrame object. excel_file =’movies.xls’  movies=pd.read_excel(excel_file) We can also use Excel file class to use multiple sheets from the same excel file: movies_sheets = [] for sheet in xlsx.sheet_names:     movies_sheets.append(xlsx.parse(sheet))     movies = pd.concat(movies_sheets))Indexes and columns allows you to access data from your DataFrame easily: >>> df.columns  Index(['marketplace', 'customer_id', 'review_id', 'product_id',        'product_parent', 'product_title', 'product_category', 'star_rating',         'helpful_votes', 'total_votes', 'vine', 'verified_purchase',         'review_headline', 'review_body', 'review_date'],        dtype='object') >>> # Get first 10 reviews' star rating  >>> df["star_rating"][:10] R3O9SGZBVQBV76    5 RKH8BNC3L5DLF     5  R2HLE8WKZSU3NL    2  R31U3UH5AZ42LL    5  R2SV659OUJ945Y    4  RA51CP8TR5A2L     5  RB2Q7DLDN6TH6     5  R2RHFJV0UYBK3Y    1  R2Z6JOQ94LFHEP    5  RX27XIIWY5JPB     4  Name: star_rating, dtype: int64 >>> # Grab review with id "R2EQL1V1L6E0C9", using the index >>> df.loc["R2EQL1V1L6E0C9"] marketplace               US customer_id         15305006  review_id     R2EQL1V1L6E0C9  product_id        B004LURNO6  product_parent     892860326  review_headline   Five Stars  review_body          Love it  review_date       2015-08-31  Name: R2EQL1V1L6E0C9, dtype: object Summary In this article we have covered: How to extract information from spreadsheets  How to create Spreadsheets in different ways How to customize a spreadsheet by adding filters, styles, or charts and so on How to use pandas to work with spreadsheets Now you are well aware of the different types of implementations you can perform with spreadsheets using Python. However, if you are willing to gather more information on this topic, you can always rely on the official documentation of openpyxl. To gain more knowledge about Python tips and tricks, check out our Python tutorial. To gain mastery over Python coding,join ourPython certification course.  
Rated 4.5/5 based on 22 customer reviews

How to Work with Excel Spreadsheets using Python

14074
How to Work with Excel Spreadsheets using Python

Excel is considered as one of the most popular and widely used spreadsheet applications developed by Microsoft. You can organize, analyze and store your data into tabular sheets with the help of Excel. From analysts and sales managers, to CEOs, professionals from every field use Excel for creating quick statistics and for data crunching.

Spreadsheets are commonly used in the present world because of their intuitive nature and the ability to handle large datasets. Most importantly, they can work without any prior technical background.

Finding different ways to work with Excel using code is essential since working with data and in Python has some serious advantages in comparison with Excel’s UI. Developers of Python have implemented ways to read, write and manipulate Excel documents.

You can check the quality of your spreadsheet application by going over the checklist below:

  • Is the spreadsheet able to represent static data?
  • Is the spreadsheet able to mix data, calculations, and reports?
  • Is the data in your spreadsheet complete and consistent in nature?
  • Does the spreadsheet have an organized worksheet structure?

This checklist will help you in verifying the qualitative nature of the spreadsheet application you’re going to work on.

Practical Applications  

In this article, we would be using openpyxl to work on data. With the help of this module, you can extract data from a database into an Excel spreadsheet or you can also convert an Excel spreadsheet into a programmatic format. There can be a lot of possible situations where you might feel the need to use a package like openpyxl. Let us discuss a few of them to get a comprehensive overview of it.

Importing New Products Into a Database 

Consider yourself working in an online store company. When they want to add new products to the online store, they make an Excel spreadsheet with a few hundred rows along with the name of the product, description, price and a few more basic information and then they give it to you. 

Now, if you want to import this particular data, you need to iterate over each row of the spreadsheet and then add each of the products into the database of the online store.[Text Wrapping Break] 

Exporting Database Data Into a Spreadsheet

Consider you have a Database table. In this particular table, you have collected information of all your users which includes their name, contact number, email address, and so forth. Now, the Marketing Team is willing to collectively contact all the users and promote a new product of the company. However, neither do they have access to the Database nor they have any idea about using SQL to extract the information. 

In this situation, openpyxl comes to play. You can use it effectively to iterate over each User record and transform the required information into an Excel spreadsheet.    

Appending Information to an Existing Spreadsheet

Consider the same online store example we discussed above. You have an Excel spreadsheet with a list of users and your job is to append to each row the total amount they have spent in your store.

In order to perform this, you have to read the spreadsheet first and then iterate through each row and fetch the total amount spent from the Database. Finally, you need to write it back to the spreadsheet.

Starting openpyxl

You can install the openpyxl package using pip. Open your terminal and write the following command: 

$ pip install openpyxl

After you have installed the spreadsheet, you can make up your own simple spreadsheet: 

from openpyxl import Workbook

workbook = Workbook()
spreadsheet = workbook.active

spreadsheet["A1"] = "Hello"
spreadsheet["B1"] = "World!"

workbook.save(filename="HelloWorld.xlsx")

How to Read Excel Spreadsheets with openpyxl 

Let us start with the most important thing that you can do with a spreadsheet,i.e. read it. We will be using a Watch Sample Dataset which contains a list of 100 watches with information like product name, product ID, review and so forth.  

A Simple Way to Read an Excel Spreadsheet 

Let us start with opening our sample spreadsheet:

>>> from openpyxl import load_workbook
>>> workbook = load_workbook(filename="sample.xlsx")
>>> workbook.sheetnames
['Sheet 1']

>>> spreadsheet = workbook.active
>>> spreadsheet
<Worksheet "Sheet 1">

>>> spreadsheet.title

In the example code above, we open the spreadsheet using load_workbook and then we check all the sheets that are available to work with using workbook.sheetnames. Then Sheet 1 is automatically selected using workbook.active since it is the first sheet available. This is the most common way of opening a spreadsheet.  

Now, let us see the code to retrieve data from the spreadsheet: 

>>> spreadsheet["A1"]
<Cell 'Sheet 1'.A1>

>>> spreadsheet["A1"].value
'marketplace'

>>> spreadsheet["F10"].value
"G-Shock Men's Grey Sport Watch"

You can retrieve the actual value and the cell value  both. To get the actual value, use .value and to get the cell, you can use .cell():

>>> spreadsheet.cell(row=10, column=6)
<Cell 'Sheet 1'.F10>

>>> spreadsheet.cell(row=10, column=6).value
"G-Shock Men's Grey Sport Watch"

Importing Data from a Spreadsheet 

In this section, we will discuss how to iterate through the data, and about conversion into a more useful format using Python.

Let us first start with iterating through the data. There are a number of iterating methods that depend solely on the user.

You can slice the data with a combination of rows and columns:

>>> spreadsheet["A1:C2"]
((<Cell 'Sheet 1'.A1>, <Cell 'Sheet 1'.B1>, <Cell 'Sheet 1'.C1>), 
 (<Cell 'Sheet 1'.A2>, <Cell 'Sheet 1'.B2>, <Cell 'Sheet 1'.C2>)) 

You can also iterate through the dataset by ranging between rows and columns: 

>>> # Get all cells from column A 
>>> spreadsheet["A"]
(<Cell 'Sheet 1'.A1>, 
 <Cell 'Sheet 1'.A2>, 
 ... 
 <Cell 'Sheet 1'.A99>, 
 <Cell 'Sheet 1'.A100>) 

>>> # Get all cells for a range of columns 
>>> spreadsheet["A:B"] 
((<Cell 'Sheet 1'.A1>, 
  <Cell 'Sheet 1'.A2>, 
  ... 
  <Cell 'Sheet 1'.A99>, 
  <Cell 'Sheet 1'.A100>), 
 (<Cell 'Sheet 1'.B1>, 
  <Cell 'Sheet 1'.B2>, 
  ... 
  <Cell 'Sheet 1'.B99>, 
  <Cell 'Sheet 1'.B100>)) 

>>> # Get all cells from row 5 
>>> spreadsheet[5]
(<Cell 'Sheet 1'.A5>,
 <Cell 'Sheet 1'.B5>,
 ... 
 <Cell 'Sheet 1'.N5>,
 <Cell 'Sheet 1'.O5>)

>>> # Get all cells for a range of rows
>>> spreadsheet[5:6]
((<Cell 'Sheet 1'.A5>, 
  <Cell 'Sheet 1'.B5>, 
  ... 
  <Cell 'Sheet 1'.N5>, 
  <Cell 'Sheet 1'.O5>), 
 (<Cell 'Sheet 1'.A6>, 
  <Cell 'Sheet 1'.B6>, 
  ... 
  <Cell 'Sheet 1'.N6>, 
  <Cell 'Sheet 1'.O6>)) 

Python offers arguments by which you can set limits to the iteration with the help of Python generators like .iter_rows() and .iter_cols()

>>> for row in spreadsheet.iter_rows(min_row=1,
...                           max_row=2,
...                           min_col=1,
...                           max_col=3):
...    print(row)
(<Cell 'Sheet 1'.A1>, <Cell 'Sheet 1'.B1>, <Cell 'Sheet 1'.C1>) 
(<Cell 'Sheet 1'.A2>, <Cell 'Sheet 1'.B2>, <Cell 'Sheet 1'.C2>)

>>> for column in spreadsheet.iter_cols(min_row=1...                              max_row=2,
...                              min_col=1,
...                              max_col=3):
... print(column)
(<Cell 'Sheet 1'.A1>, <Cell 'Sheet 1'.A2>) 
(<Cell 'Sheet 1'.B1>, <Cell 'Sheet 1'.B2>) 
(<Cell 'Sheet 1'.C1>, <Cell 'Sheet 1'.C2>) 

You can also add Boolean values_only in the above example and set it to True to get the values of cell: 

>>> for value in spreadsheet.iter_rows(min_row=1...                          max_row=2...                          min_col=1...                          max_col=3...                          values_only=True):
...    print(value)
('marketplace', 'customer_id', 'review_id') 
('US', 3653882, 'R3O9SGZBVQBV76')

Since we are now done with iterating the data, let us now manipulate data using Python’s primitive data structures. 

Consider a situation where you want to extract information of a product from the sample spreadsheet and then store it into the dictionary. The key to the dictionary would be the product ID.   

Convert Data into Python classes

To convert data into Python data classes, let us first decide what we want to store and how to store it.  

The two essential elements that can be extracted from the data are as follows:

Convert Data into Python classes

                                                     1. Products                                             2. Review

                                                          • ID                                                         • ID
                                                          • Title                                                     • Customers ID
                                                          • Parent                                                 • Headline
                                                          • Category                                            • Body
                                                                                                                         • Date

Let us implement the two elements: 

import datetime
from dataclasses import dataclass

@dataclass
class Product:
    id: str
    parent: str
    title: str
    category: str
@dataclass
class Review:
    id: str
    customer_id: str
    stars: int
    headline: str
    body: str 
    date: datetime.datetime 

The next step is to create a mapping between columns and the required fields: 

>>> for value in spreadsheet.iter_rows(min_row=1,
...                             max_row=1,
...                             values_only=True):
...    print(value)
('marketplace', 'customer_id', 'review_id', 'product_id', ...)

>>> # Or an alternative
>>> for cell in sheet[1]:
...  print(cell.value)
marketplace
Customer_ID
Review_ID
Product_ID
Product_Parent
...

Finally, let us convert the data into new structures which will parse the data in spreadsheet into a list of products and review objects: 

from datetime import datetime 
from openpyxl import load_workbook 
from classes import Product,Review 
from mapping import PRODUCT_ID,PRODUCT_PARENT,PRODUCT_TITLE, \
    PRODUCT_CATEGORY,REVIEW_DATE,REVIEW_ID,REVIEW_CUSTOMER, \
    REVIEW_STARS,REVIEW_HEADLINE,REVIEW_BODY

# Using the read_only method since you're not gonna be editing the spreadsheet 
workbook = load_workbook(filename="watch_sample.xlsx",read_only=True) 
spreadsheet = workbook.active

products = []
reviews = []

# Using the values_only because you just want to return the cell value
for row in spreadsheet .iter_rows(min_row=2, values_only=True): 
    product = Product(id=row[PRODUCT_ID], 
                      parent=row[PRODUCT_PARENT], 
                      title=row[PRODUCT_TITLE], 
                      category=row[PRODUCT_CATEGORY]) 
    products.append(product)
# You need to parse the date from the spreadsheet into a datetime format
spread_date = row[REVIEW_DATE] 
parsed_date = datetime.strptime(spread_date,"%Y-%m-%d")

review = Review(id=row[REVIEW_ID],
              Customer_ID=row[REVIEW_CUSTOMER],
              stars=row[REVIEW_STARS],
              headline=row[REVIEW_HEADLINE],
              body=row[REVIEW_BODY],
              date=parsed_date)
reviews.append(review)

print(products[0])
print(reviews[0])

After you execute the code, you will get an output that looks like this:

Product(id='A90FALZ1ZC',parent=937111370,...)
Review(id='D3O9OGZVVQBV76',customer_id=3903882,...)

Appending Data 

To understanding how to append data, let us hover back to the first sample spreadsheet. We will open the document and append some data to it: 

from openpyxl import load_workbook

# Start by opening the spreadsheet and selecting the main sheet 
workbook = load_workbook(filename="hello_world.xlsx")
spreadsheet = workbook.active

# Write what you want into a specific cell
spreadsheet["C1"]="Manipulating_Data ;)"

# Save the spreadsheet
workbook.save(filename="hello_world_append.xlsx"

If you open your Excel file, you will notice the additional Manipulating_Data being added to an adjacent cell. 

Writing Excel Spreadsheets With openpyxl 

A spreadsheet is a file that helps to store data in specific rows and columns. We can calculate and store numerical data and also perform computation using formulas. 

So, let’s begin with some simple Spreadsheets and understand what each line means. 

Creating our first simple Spreadsheet

 1 from openpyxl import Workbook
 2  
 3 filename = "first_program.xlsx"
 4  
 5 workbook = Workbook()
 6 spreadsheet = workbook.active
 7  
 8 sheet["A1"] = "first"
 9 sheet["B1"] = "program!"
10  
11 workbook.save(filename=filename)

Line 5: In order to make a Spreadsheet, at first,  we have to create an Empty workbook to perform further operations. 

Lines 8 and 9 : We can add data to a specific cell as per our requirement. In this example, we can see that two values “first” and “program” have been added to specific cells in the sheet. 

Line 11: The line shows how to save data after all the operations we have done. 

Basic Spreadsheet Operations 

Before going to the difficult coding part, at first we have to build our building blocks like how to add and update values, how to manage rows and columns, adding filters, styles or formulas in a Spreadsheet. 

We have already explained the following code by which we can add values to a Spreadsheet: 

>>> spreadsheet["A1"] = "the_value_we_want_to_add"

There is another way that we can add values to Spreadsheet: 

>>> cell = sheet["A1"]
>>> cell
<Cell 'Sheet'.A1>

>>> cell.value
'hello'

>>> cell.value = "hey"
>>> cell.value
'hey'

Line 1: In the first line at first we have declared the cell and updated its value. 

Line 5: We have printed the value of the cell as “first”  because  in the first program we have already assigned sheet["A1"]with “first” 

Line 8 : We have updated the value of the cell as "second"by simply assigning it to cell.value

Lines 9 : In this line, we have just printed the updated value of cell. 

Finally, you have to save all the operations you have performed into the spreadsheet once you call workbook.save().

If  the cell didn’t exist while adding a value ,then openpyxl creates a cell:

>>> # Before, our spreadsheet has only 1 row
>>> print_rows()
('first', 'program!')

>>> # Try adding a value to row 10
>>> spreadsheet["B10"] = "test"
>>> print_rows()
('first', 'program!') 
(None, None) 
(None, None) 
(None, None) 
(None, None) 
(None, None) 
(None, None) 
(None, None) 
(None, None) 
(None, 'test') 

Managing Rows and Columns in Spreadsheet 

Insertion or deletion of rows (adding or removing elements of rows /columns) is one of the most basic operations in Spreadsheet. In openpyxl.We can perform these operations by simply calling these methods and passing its arguments. 

  • .insert_rows()
  • .delete_rows()
  • .insert_cols()
  • .delete_cols()

We can pass 2 types of arguments to the methods :  

  1. idx 
  2. amount 

Idx stands for index position and amount refers to the number of values we can store in the Spreadsheet. 

Using our basic knowledge based on the first  simple program, let’s see how we can use these methods inside the program: 

>>> print_rows()
('first', 'program!')

>>> # Insert a column at the first position before column 1 ("A")
>>> spreadsheet.insert_cols(idx=1)
>>> print_rows()
(None, 'first', 'program!')

>>> # Insert 5 columns in  between column 2 ("B") and 3 ("C")
>>> spreadsheet.insert_cols(idx=3,amount=5)
>>> print_rows()
(None, 'first', None, None, None, None, None, 'program!')

>>> # Delete the created columns
>>> spreadsheet.delete_cols(idx=3,amount=5)
>>> v.delete_cols(idx=1)
>>> print_rows()
('first', 'program!')

>>> # Insert a new row in the beginning
>>> spreadsheet.insert_rows(idx=1)
>>> print_rows()
(None, None)
('first', 'program!')

>>> # Insert 3 new rows in the beginning 
>>> spreadsheet.insert_rows(idx=1,amount=3)
>>> print_rows()
(None, None) 
(None, None) 
(None, None) 
(None, None) 
('first', 'program!')

>>> # Delete the first 4 rows 
>>> spreadsheet.delete_rows(idx=1,amount=4>>> print_rows() 
('first', 'program!') 

Managing Sheets

We have seen the following recurring piece of code in our previous examples .This is one of the ways of selecting the default sheet from the Spreadsheet: 

spreadsheet = workbook.active

However, if you want to open a spreadsheet with multiple sheets, you can write the following command: 

>>> # Let's say you have two sheets: "Products" and "Company Sales"
>>> workbook.sheetnames
['Products', 'Company Sales']

>>> # You can select a sheet using its title
>>> Products_Sheet = workbook["Products"]
>>> Sales_sheet = workbook["Company Sales"]

If we want to change the title of the Sheet, execute the following code: 

>>> workbook.sheetnames
['Products', 'Company Sales']

>>> Products_Sheet = workbook["Products"]
>>> Products_Sheet.title = "New Products"

>>> workbook.sheetnames
['New Products', 'Company Sales']

We can CREATE / DELETE Sheets also with the help of two methods - .create_sheet() and  .remove()

>>> #To print the available sheet names
>>> workbook.sheetnames 
['Products', 'Company Sales']

>>> #To create a new Sheet named "Operations"
>>> Operations_Sheet = workbook.create_sheet("Operations")

>>> #To print the updated available sheet names
>>> workbook.sheetnames
['Products', 'Company Sales', 'Operations']

>>> # To define the position Where we want to create the Sheet(here “HR” sheet is created at the first position .Here index 0 represents the first position)
>>> HR_Sheet = workbook.create_sheet("HR",0)

>>> #To again  print the updated available sheet names
>>> workbook.sheetnames
['HR', 'Products', 'Company Sales', 'Operations']

>>> # To remove them,we just have to send the sheet names as an argument which we want to delete to the method  .remove() 
>>> workbook.remove(Operations_Sheet)
>>> workbook.sheetnames
['HR', 'Products', 'Company Sales']

>>> #To delete hr_sheet
>>> workbook.remove(hr_sheet)
>>> workbook.sheetnames
['Products', 'Company Sales']

Adding Filters to the Spreadsheet 

We can use openpyxl to add filters in our Spreadsheet but when we open our Spreadsheet, the data won’t be rearranged according to these sorts and filters. 

When you’re programmatically creating a spreadsheet and it is going to be sent and used by someone else, it is a good practice to add different filters and allow people to use it afterward. 

In the code below there is a simple example which shows how to add a simple filter to your spreadsheet: 

>>> # Check the used spreadsheet space using the attribute "dimensions"
>>> spreadsheet.dimensions
'A1:O100'

>>> spreadsheet.auto_filter.ref="A1:O100"
>>> workbook.save(filename="watch_sample_with_filters.xlsx")

Adding Formulas to the Spreadsheet 

Formulas are one of the most commonly used and powerful features of spreadsheets. By using formulas, you can solve various mathematical equations with the additional support of openpyxl which makes those calculations as simple as editing a specific cell’s value.

The list of formulas supported by openpyxl are:

>>> from openpyxl.utils import FORMULAE
>>> FORMULAE
frozenset({'ABS',
           'AMORLINC',
           'ACCRINT', 
           'ACOS', 
           'ACCRINTM', 
           'ACOSH', 
            ...,       
           'AND', 
           'YEARFRAC', 
           'YIELDDISC', 
           'AMORDEGRC', 
           'YIELDMAT', 
           'YIELD', 
           'ZTEST'}) 

Let’s add some formulas to our spreadsheet. 

Let’s check the average star rating of  the 99 reviews within the spreadsheet: 

>>> # Star rating is in column "H" 
>>> spreadsheet["P2"] = "=AVERAGE(H2:H100)"
>>> workbook.save(filename = "first_example.xlsx")

Now, if we open your spreadsheet and go to cell P2, you can see the value to be 4.18181818181818.  

Similarly, we can use this methodology to include any formulas for our requirements in our spreadsheet. For example, if we want to count the number of helpful reviews: 

>>> # The helpful votes  counted in column "I" 
>>> spreadsheet["P3"] = '=COUNTIF(I2:I100, ">0")' 
>>> workbook.save(filename = "first_example.xlsx") 

Adding Styles to the Spreadsheet

It is not so important and usually, we don’t use this in everyday code but for the sake of completeness, we will also understand this with the following example.

Using openpyxl, we get multiple styling options such as including fonts, colors,  borders,and so on.

Let’s have a look at an example:

>>> # Import necessary style classes
>>> from openpyxl.styles import Font,Color,Alignment,Border,Side,colors

>>> # Create a few styles
>>> Bold_Font = Font(bold=True)
>>> Big_Red_Text = Font(color=colors.RED,size=20>>> Center_Aligned_Text = Alignment(horizontal="center">>> Double_Border_Side = Side(border_style="double">>> Square_Border = Border(top=double_border_side, 
...                       right=double_border_side, 
...                       bottom=double_border_side, 
...                       left=double_border_side) 

>>> # Style some cells!
>>> spreadsheet["A2"].font = Bold_Font
>>> spreadsheet["A3"].font = Big_Red_Text
>>> spreadsheet["A4"].alignment = Center_Aligned_Text
>>> spreadsheet["A5"].border = Square_Border
>>> workbook.save(filename="sample_styles.xlsx") 

If you want to apply multiple styles to one or several cells in our spreadsheets,you can use  NamedStyle class: 

>>> from openpyxl.styles import NamedStyle

>>> # Let's create a style template for the header row
>>> header = NamedStyle(name="header")
>>> header.font = Font(bold=True)
>>> header.border = Border(bottom=Side(border_style="thin"))
>>> header.alignment = Alignment(horizontal="center",vertical="center")

>>> # Now let's apply this to all first row (header) cells
>>> header_row = sheet[1]
>>> for cell in header_row:
... cell.style = header

>>> workbook.save(filename="sample_styles.xlsx") 

Adding Charts to our Spreadsheet

Charts are a good way to compute and understand large amounts of data quickly and easily. We have a lot of charts such as bar chart, pie chart, line chart, and so on. 

Let us start by creating a new workbook with some data: 

 1 from openpyxl import Workbook 
 2 from openpyxl.chart import BarChart,Reference 
 3  
 4 workbook = Workbook() 
 5 spreadsheet = workbook.active 
 6  
 7 # Let's create some sample sales data 
 8 rows = [ 
 9    ["Product","Online","Store"], 
10    [1,30,45], 
11    [2,40,30], 
12    [3,40,25], 
13    [4,50,30], 
14    [5,30,25], 
15    [6,25,35], 
16    [7,20,40], 
17 ] 
18  
19 for row in rows:
20    spreadsheet .append(row)

Now let us create a bar chart that will show the total number of sales per product: 

22 chart = BarChart()
23 data = Reference(worksheet=sheet,
24                 min_row=1, 
25                 max_row=8, 
26                 min_col=2, 
27                 max_col=3) 
28  
29 chart.add_data(data,titles_from_data=True)
30 spreadsheet .add_chart(chart, "E2")
31
32 workbook.save("chart.xlsx")

You can also create a line chart by simply making some changes to the data: 

 1 import random 
 2 from openpyxl import Workbook 
 3 from openpyxl.chart import LineChart,Reference 
 4  
 5 workbook = Workbook()
 6 sheet = workbook.active
 7  
 8 # Let's create some sample sales data 
 9 rows= [
10    ["", "January", "February", "March", "April", 
11    "May", "June", "July", "August", "September", 
12     "October", "November", "December"], 
13    [1, ],
14    [2, ],
15    [3, ],
16 ] 
17  
18 for row in rows: 
19    sheet.append(row)
20  
21 for row in sheet.iter_rows(min_row=2,
22                           max_row=4,
23                           min_col=2,
24                           max_col=13):
25    for cell in row:
26        cell.value = random.randrange(5,100)

There are numerous types of charts and various types of customizations you can apply to your spreadsheet to make it more attractive.

Convert Python Classes to Excel Spreadsheet

Let us now learn how to convert the Excel Spreadsheet data to Python classes.  

Assume we have a database and we use some Object Relational mapping to map the database into Python classes and then export the objects into spreadsheets: 

from dataclasses import dataclass
from typing import List

@dataclass
class Sale:
    id: str 
    quantity: int

@dataclass 
class Product: 
    id: str 
    name: str 
    sales:List[Sale] 

Now, to generate some random data, let’s assume that the above classes are stored in  db_classes.py file then: 

 1 import random 
 2  
 3 # Ignore these for now. You'll use them in a sec ;) 
 4 from openpyxl import Workbook 
 5 from openpyxl.chart import LineChart,Reference 
 6  
 7 from db_classes import Product,Sale 
 8  
 9 products_range = [] 
10  
11 # Let's create 5 products
12 for idx in range(1,6):
13    sales = [] 
14  
15    # Create 5 months of sales 
16    for_in range(5):
17        sale_range = Sale(quantity=random.randrange(5,100))
18        sales.append(sale)
19  
20    product = Product(id=str(idx),
21                      name="Product %s" % idx,
22                      sales=sales)
23    products_range.append(product)

By running this code, we will get 5 products in 5 months of sale with a random quantity of sales for each month. 

Now, we have  to convert this into a spreadsheet in which we need to iterate over the data: 

25 workbook = Workbook() 
26 spreadsheet = workbook.active 
27
28 # Append column names first 
29 spreadsheet.append(["Product ID","Product Name","Month 1", 
30              "Month 2","Month 3","Month 4","Month 5"]) 
31  
32 # Append the data 
33 for product in products_range:
34    data = [product.id,product.name]
35    for sale in product.sales:
36        data.append(sale.quantity) 
37    spreadsheet.append(data) 

This will create a spreadsheet with some data coming from your database. 

How to work with pandas to handle Spreadsheets?

How to work with pandas to handle Spreadsheets

We have learned to work with Excel in Python because Excel is one of the most popular tools and finding a way to work with Excel is critical. Pandas is a great tool to work with Excel in Python. It has unique methods to read all kinds of data in an Excel file and we can export items back to Excel using it. 

To use it, at first we need to install pandas package: 

$ pip install pandas 

Then, let’s create a simple DataFrame: 

 1 import pandas as pd 
 2  
 3 data = { 
 4    "Product Name":["Product 1","Product 2"], 
 5    "Sales Month 1":[10, 20], 
 6    "Sales Month 2":[5, 35], 
 7 } 
 8 dataframe = pd.DataFrame(data)

Now we have some data, and to convert it from a DataFrame into a worksheet we generally use .dataframe_to_rows()

10 from openpyxl import Workbook
11 from openpyxl.utils.dataframe import  dataframe_to_rows 
12  
13 workbook = Workbook() 
14 spreadsheet = workbook.active 
15  
16 for row in dataframe_to_rows(df, index=False,header=True): 
17    spreadsheet .append(row)
18  
19 workbook.save("pandas_spreadsheet.xlsx")

We need to use  read_excel method to read data from pandas DataFrame object. 

excel_file =’movies.xls’ 
movies=pd.read_excel(excel_file) 

We can also use Excel file class to use multiple sheets from the same excel file: 

movies_sheets = []
for sheet in xlsx.sheet_names:
    movies_sheets.append(xlsx.parse(sheet))
    movies = pd.concat(movies_sheets))

Indexes and columns allows you to access data from your DataFrame easily: 

>>> df.columns 
Index(['marketplace', 'customer_id', 'review_id', 'product_id', 
       'product_parent', 'product_title', 'product_category', 'star_rating', 
       'helpful_votes', 'total_votes', 'vine', 'verified_purchase', 
       'review_headline', 'review_body', 'review_date'], 
      dtype='object') 
>>> # Get first 10 reviews' star rating 
>>> df["star_rating"][:10]
R3O9SGZBVQBV76    5 
RKH8BNC3L5DLF     5 
R2HLE8WKZSU3NL    2 
R31U3UH5AZ42LL    5 
R2SV659OUJ945Y    4 
RA51CP8TR5A2L     5 
RB2Q7DLDN6TH6     5 
R2RHFJV0UYBK3Y    1 
R2Z6JOQ94LFHEP    5 
RX27XIIWY5JPB     4 
Name: star_rating, dtype: int64

>>> # Grab review with id "R2EQL1V1L6E0C9", using the index
>>> df.loc["R2EQL1V1L6E0C9"]
marketplace               US 
customer_id         15305006 
review_id     R2EQL1V1L6E0C9 
product_id        B004LURNO6 
product_parent     892860326 
review_headline   Five Stars 
review_body          Love it 
review_date       2015-08-31 
Name: R2EQL1V1L6E0C9, dtype: object 

Summary 

In this article we have covered: 

  • How to extract information from spreadsheets  
  • How to create Spreadsheets in different ways 
  • How to customize a spreadsheet by adding filters, styles, or charts and so on 
  • How to use pandas to work with spreadsheets 

Now you are well aware of the different types of implementations you can perform with spreadsheets using Python. However, if you are willing to gather more information on this topic, you can always rely on the official documentation of openpyxl. To gain more knowledge about Python tips and tricks, check out our Python tutorial. To gain mastery over Python coding,join ourPython certification course

 

Priyankur

Priyankur Sarkar

Data Science Enthusiast

Priyankur Sarkar loves to play with data and get insightful results out of it, then turn those data insights and results in business growth. He is an electronics engineer with a versatile experience as an individual contributor and leading teams, and has actively worked towards building Machine Learning capabilities for organizations.

Join the Discussion

Your email address will not be published. Required fields are marked *

Suggested Blogs

How to Build a Python GUI Application With wxPython

A Graphical User Interface or GUI is a user interface that includes graphical elements that enable a person to communicate with electronic devices like computers, hand-held devices, and other appliances. It displays information using icons, menus, and graphics. They are handled with the help of a pointing device such as a mouse, trackball or a stylus.A GUI is basically an application that has windows, buttons and lots of other widgets that allows a user to interact with your application. A web browser is a common example of a GUI that has buttons, tabs, and the main window and where the content is displayed.It was developed by the Xerox Palo Alto research laboratory in the late 1970s. Today, every OS has its own GUI. Software applications make use of these and develop their own GUIs.Python contains many GUI toolkits, out which Tkinter, wxPython, and PyQt are the important ones. All these toolkits have the ability to work with Windows, macOS, and Linux with the additional quality of working on mobile phones.How to get started with wxPython?wxPython was first released in the year 1998. It is an open-source cross-platform Python GUI toolkit used as a wrapper class around a C++ library called wxWidgets. The main feature of wxPython which distinguishes itself from other toolkits like PyQt and Tkinter is that it uses actual widgets on the native platform where required. This allows the wxPython applications to look native to the operating system in which it is running.The wxPython toolkit contains a lot of core widgets along with many custom widgets which you can download from the Extra Files section of the wxPython official page.Here, there is a download of the wxPython demo package which is an application demonstrating the robustness of the widgets included with wxPython. The main advantage of this demo is that you can view it one tab and run it in another and it also allows to edit and re-run the code to observe the changes.Installing wxPythonYou will be using the latest wxPython release, wxPython 4, which is also called the wxPython’s Project Phoenix. It is a new implementation of wxPython that aims at improving the speed, maintainability, and extensibility of wxPython. The wxPython 3 and wxPython 2 were built only for Python 2. The maintainer of wxPython rejected a lot of aliases and cleaned up a lot of code to make wxPython more easy and Pythonic.If you are migrating from an older version of wxPython to wxPython 4, take a look at the following references:Classic version vs project PhoenixPhoenix Migration GuideThe Phoenix version is compatible with both Python 2.7 and Python 3. You can use pip to install wxPython 4:$ pip install wxpythonYou will get a prerequisites section on the Github page of wxPython which will provide information to install wxPython on Linux systems.You can also look into the Extras Linux section to learn about the Python wheels for both GTK2 and GTK3 versions. To install one of the wheels, use the command below:$ pip install -U -f https://extras.wxpython.org/wxPython4/extras/linux/gtk3/ubuntu-18.04/ wxPythonRemember to modify the command to match with the version of Linux.Components of GUIAs mentioned earlier, GUI is nothing but an interface that allows user interaction.Common components of the user interfaces:Main Window.Menu.Toolbar.Buttons.Text Entry.Labels.These items are generally known as widgets. wxPython supports many other common widgets and many custom widgets that are arranged in a logical manner by a developer to allow user interaction.Event LoopsA GUI works by waiting for the user to perform an action. This is known as an event. An event occurs when something is typed by the user or when the user uses their mouse to press a button or some widget while the application is in focus.The GUI toolkit runs an infinite loop called an event loop underneath the covers. The task of the event loop is to act on occurred events on the basis of what the developer has coded the application to do. The application ignores the event when it is not able to catch it.When you are programming a graphical user interface, make sure to attach the widgets to event handlers in order to make your application do something.You can also block an event loop to make the GUI unresponsive which will appear to freeze to the user. This is a special consideration for you to keep in mind while working with event loops. Launch a special thread or process whenever a GUI takes longer than a quarter of a second to launch a process.The frameworks of wxPython contain special thread-safe methods that you can use to communicate back to your application. This informs the thread is finished or given an update.How to create a Skeleton Application?An application skeleton is basically used for prototyping. It is a user interface comprising of widgets that do not contain event handlers. You just need to create the GUI and show it to the stakeholders for signing off and avoid spending time on the backend logic.An example of creating a Hello World application with Python:import wx   application = wx.App() framework = wx.Frame(parent=None, title='Hello World') framework.Show() app.MainLoop()In the example above, there are two parts of the program – wx.App and wx.Frame. The former one is wxPython’s application object which is basically required for running the GUI. It initiates the .MainLoop() which is the event loop you have learned earlier.The latter part creates a window for user interaction. It informs wxPython that the frame has no parent and its title is Hello World. If you run the code above, this is how it will look like:The application will look different if you execute it in Mac or Linux.Note: Mac users may get the following message: This program needs access to the screen. Please run with a Framework build of Python, and only when you are logged in on the main display of your Mac. If you see this message and you are not running in a virtualenv, then you need to run your application with pythonw instead of python. If you are running wxPython from within a virtualenv, then see the wxPython wiki for the solution.The minimize, maximize and exit will be included in the wx.Frame by default. However, most wxPython code will require you to make the wx.Frame as a subclass and other widgets in order to grab the full power of the toolkit.Let us rewrite the code using class:import wx   class MyFramework(wx.Frame):     def frame(self):         super().frame(parent=None, title='Hello World') self.Show()   if __name__ == '__main__':     application = wx.App()     framework = MyFramework() application.MainLoop()This code can be used as a template for your application.Widgets in wxPythonThe wxPython toolkit allows you to create rich applications from more than one hundred widgets. But it can be very daunting to choose the perfect widget from such a large number, so wxPython has included a wxPython Demo which contains a search filter which will help you to find the right widget from the list.Now, let us add a button and allow the user to enter some text by adding a text field:import wx   class MyFramework(wx.Frame):     def frame(self):         super().frame(parent=None, title='Hello World')         panel = wx.Panel(self)   self.text_ctrl = wx.TextCtrl(panel, pos=(5, 5)) my_button = wx.Button(panel, label='Press Me', pos=(5, 55))   self.Show()   if __name__ == '__main__':     application = wx.App()     framework = MyFramework() application.MainLoop()When you run the code, the application will look like this:The first widget that is recommended on Windows is wx.Panel. It makes the background color of the frame as the right shade of gray. Tab traversal is disabled without a Panel on Windows.If the panel is the sole child of the frame, it will be expanded automatically to fill the frame with itself. The next thing you need to do is to add a wx.TextCtrl to the panel. The first argument is always that which parent the widget should go to for almost all widgets. So if you are willing to keep the text control and the button on the top of the panel, it is the parent you need to specify.You also need to inform wxPython about the position of the widget. You can do it using the pos parameter. The default location is (0,0) which is actually at the upper left corner of the parent. So to change the text control, you can change the position of the frame, you can shift its left corner 5 pixels from the left(x) and 5 pixels from the top(y). Finally, you can add your button to the panel and label it. You can also set the y-coordinate to 55 to prevent the overlapping of widgets.Absolute PositioningAbsolute positioning is the technique found in most GUI toolkits by which you can provide the exact coordinates for your widget’s position.There might be situations when you need to keep track of all your widgets and relocate the widgets in case of a complex application. This can be a really difficult thing to do. However, most modern-day toolkits provide a solution for this, which we’ll study next.Sizers (Dynamic Sizing)Sizers are methods to define the control layout in dialogs in wxPython. They have the ability to create dialogs that are not dependent on the platform. They manage the positioning of the widgets and adjust them when the user resizes the application window.Some of the primary types of sizers that are commonly used are:wx.BoxSizerwx.GridSizerwx.FlexGridSizerAn example code to add wx.BoxSizer to the previous code:import wx   class MyFramework(wx.Frame):     def frame(self):         super().frame(parent=None, title='Hello World')         panel = wx.Panel(self)         my_sizer = wx.BoxSizer(wx.VERTICAL)         self.text_ctrl = wx.TextCtrl(panel) my_sizer.Add(self.text_ctrl, 0, wx.ALL | wx.EXPAND, 5)         my_button = wx.Button(panel, label='Press Me') my_sizer.Add(my_btn, 0, wx.ALL | wx.CENTER, 5)         panel.SetSizer(my_sizer)         self.Show()   if __name__ == '__main__':     application = wx.App()     framework = MyFramework() application.MainLoop() In the example above, an instance of wx.BoxSixer is created and passed to wx.VERTICAL which is actually the orientation that widgets are included in the sizer. The widgets will be added in a vertical manner from top to bottom. You can also set the BoxSizer’s orientation to wx.HORIZONTAL. In this case, the widgets are added from left to right.  You can use .Add() to a widget to a sizer which takes maximum five arguments as follows: window ( the widget )- This is the widget that is added to the sizer. proportion - It sets how much space corresponding to other widgets in the sizer will the widget should take. By default, the proportion is zero which leaves the wxPython to its original proportion. flag - It allows you to pass in multiple flags by separating them with a pipe character: |. The text control is added using wx.ALL and wx.EXPAND flags. The wx.ALL flag adds a border on all sides of the widget. On the other hand, wx.EXPAND expands the widgets as much as the sizer can be expanded. border - This parameter informs wxPython about the number of pixels of border needed around the widget.  userData - It is a rare argument that is used for resizing in case of complex applications. However, in this example, the wx.EXPAND flag is replaced with wx.CENTER to display the button in the center on-screen. When you run the code, your application will look something like this:Adding an event using wxPython Though your application looks cool, but it really does nothing. The button you have created does nothing on pressing it. Let us give the button a job:import wx   class MyFramework(wx.Frame):     def frame(self):         super().frame(parent=None, title='Hello World')         panel = wx.Panel(self)         my_sizer = wx.BoxSizer(wx.VERTICAL)         self.text_ctrl = wx.TextCtrl(panel) my_sizer.Add(self.text_ctrl, 0, wx.ALL | wx.EXPAND, 5)         my_button = wx.Button(panel, label='Press Me') my_button.Bind(wx.EVT_BUTTON, self.on_press) my_sizer.Add(my_btn, 0, wx.ALL | wx.CENTER, 5)         panel.SetSizer(my_sizer)         self.Show()   def button_press(self, event):         value = self.text_ctrl.GetValue()         if not value:             print("You didn't enter anything!")        else:             print(f'You typed: "{value}"')   if __name__ == '__main__':     application = wx.App()     framework = MyFramework() application.MainLoop() You can attach event bindings to the widgets in wxPython. This allows them to respond to certain types of events.If you want the button to do something, you can do it using the button’s .Bind() method. It takes the events you want to bind to, the handler to call an event, an optional source, and a number of optional ids. In the example above, the button object is binded to wx.EVT_BUTTON and told to call button_press when the event gets fired..button_press also accepts a second argument by convention that is called event. The event parameter suggests that the second argument should be an event object.You can get the text control’s contents with the help of GetValue() method within .button_press.How to create a Working Application?Consider a situation where you are asked to create an MP3 tag editor. The foremost thing you need to do is to look out for the required packages.Consider a situation where you are asked to create an MP3 tag editor. The foremost thing you need to do is to look out for the required packages.If you make a Google search for Python mp3 tag editor, you will find several options as below:mp3 -taggereyeD3mutagenOut of these, eyeD3 is a better choice than the other two since it has a pretty nice API that can be used without getting bogged down with MP3’s ID3 specification.You can install eyeD3 using pip from your terminal:pip install eyed3If you want to install eyeD3 in macOS, you have to install libmagic using brew. Linux and Windows users can easily install using the command mentioned above.Designing the User Interface using wxPythonThe very first thing you must do before designing an interface is to sketch out how you think the interface should look.The user interface should perform the following tasks:Open up one or more MP3 files.Display the current MP3 tags.Edit an MP3 tag.If you want to open a file or a folder, you need to have a menu or a button in your user interface. You can do that with a File menu. You will also need a widget to see the tags for multiple MP3 files. A tabular structure consisting of columns and rows would be perfect for this case since you can have labeled columns for the MP3 tags. wxPython toolkit consists of afew widgets to perform this task:wx.grid.Gridwx.ListCtrlwx.ListCtrl would be a better option of these two since the Grid widget is overkill and complex in nature. Finally, you can use a button to perform the editing tasks.Below is an illustration of what the application should look like:Creating the User Interface You can refer to a lot of approaches when you are creating a user interface. You can follow the Model-View-Controller design pattern that is used for developing user interfaces which divides the program logic into three interconnected elements. You should know how to split up classes and how many classes should be included in a single file and so on.However, in this case, you need only two classes which are as follows:wx.Panel classwx.Frame class Let’s start with imports and the panel class:import eyed3 import glob import wx   class Mp3Panel(wx.Panel):     def frame(self, parent):         super().__init__(parent) main_sizer = wx.BoxSizer(wx.VERTICAL) self.row_obj_dict = {}   self.list_ctrl = wx.ListCtrl(             self, size=(-1, 100),               style=wx.LC_REPORT | wx.BORDER_SUNKEN         ) self.list_ctrl.InsertColumn(0, 'Artist', width=140) self.list_ctrl.InsertColumn(1, 'Album', width=140) self.list_ctrl.InsertColumn(2, 'Title', width=200) main_sizer.Add(self.list_ctrl, 0, wx.ALL | wx.EXPAND, 0)         edit_button = wx.Button(self, label='Edit') edit_button.Bind(wx.EVT_BUTTON, self.on_edit) main_sizer.Add(edit_button, 0, wx.ALL | wx.CENTER, 5)         self.SetSizer(main_sizer)   def on_edit(self, event):         print('in on_edit')   def update_mp3_listing(self, folder_path):         print(folder_path)In this example above, the eyed3 package, glob package, and the wx package are imported. Then, the user interface is created by making wx.Panel a subclass. A dictionary row_obj_dict is created for storing data about the MP3s. The next thing you do is create a wx.ListCtrl and set it to report mode, i.e. wx.LC_REPORT. This report flag is the most popular among all but you can also choose your own depending upon the style flag that you pass in. Now you need to call .InsertColumn() to make the ListCtrl have the correct headers and then provide the index of the column, its label and the width of the column pixels. Finally, you need to add your Edit button, an event handler, and a method. The code for the frame is as follows:class Mp3Frame(wx.Frame):     def__init__(self):         super().__init__(parent=None,                          title='Mp3 Tag Editor') self.panel = Mp3Panel(self) self.Show()   if __name__ == '__main__':     app = wx.App(False)     frame = Mp3Frame() app.MainLoop()This class function is a better and simpler approach than the previous one because you just need to set the title of the frame and instantiate the panel class, MP3Panel. The user interface will look like this after all the implementations:The next thing we will do is add a File menu to add MP3s to the application and also edit their tags.Make a Functioning ApplicationThe very first thing you need to do to make your application work is to update the wx.Frame class to include the File menu which will allow you to add MP3 files.Code to add a menu bar to our application:class Mp3Frame(wx.Frame):   def__init__(self):         wx.Frame.__init__(self, parent=None,                             title='Mp3 Tag Editor') self.panel = Mp3Panel(self) self.create_menu() self.Show()   def create_menu(self): menu_bar = wx.MenuBar() file_menu = wx.Menu() open_folder_menu_item = file_menu.Append( wx.ID_ANY, 'Open Folder',   'Open a folder with MP3s'         ) menu_bar.Append(file_menu, '&File') self.Bind(             event=wx.EVT_MENU,               handler=self.on_open_folder,             source=open_folder_menu_item,         ) self.SetMenuBar(menu_bar)   def on_open_folder(self, event):         title = "Choose a directory:" dlg = wx.DirDialog(self, title,                              style=wx.DD_DEFAULT_STYLE) if dlg.ShowModal() == wx.ID_OK:             self.panel.update_mp3_listing(dlg.GetPath()) dlg.Destroy() In the example code above, .create_menu() is called within the class’s constructor and then two instances – wx.MenuBar and wx.Menu are created.Now, if you’re willing to add an item to the menu, you need to call the menu instance’s .Append() and pass the following things:A unique identifierLabelA help stringAfter that call the menubar’s .Append() to add the menu to the menubar. It will take the menu instance and the label for menu. The label is called as &File so that a keyboard shortcut is created to open the File menu using just the keyboard.Now self.Bind() is called to bind the frame to wx.EVT_MENU. This informs wxPython about which handler should be used and which source to bind the handler to. Lastly, call the frame’s .SetMenuBar and pass it the menubar instance. Your menu is now added to the frame.Now let’s come back to the menu item’s event handler:def on_open_folder(self, event):     title = "Choose a directory:" dlg = wx.DirDialog(self, title, style=wx.DD_DEFAULT_STYLE) if dlg.ShowModal() == wx.ID_OK:         self.panel.update_mp3_listing(dlg.GetPath()) dlg.Destroy()You can use wxPython’s wx.DirDialog to choose the directories of the correct MP3 folder. To display the dialog, use .ShowModal(). This will display the dialog modally but will disallow the user to interact with the main application.  You can get to the user’s choice of path using .GetPath() whenever the user presses the OK button. This path has to be added to the panel class and this can be done by the panel’s .update_mp3_listing().Finally, you will have to close the dialog and the best method is using .Destroy().  There are methods to close the dialog like .Close() which will just dialog but will not destroy it, so .Destroy() is the most effective option to prevent such situation.Now let’s update the MP3Panel class starting with .update_mp3_listing():def update_mp3_listing(self, folder_path): self.current_folder_path = folder_path self.list_ctrl.ClearAll()   self.list_ctrl.InsertColumn(0, 'Artist', width=140) self.list_ctrl.InsertColumn(1, 'Album', width=140) self.list_ctrl.InsertColumn(2, 'Title', width=200) self.list_ctrl.InsertColumn(3, 'Year', width=200)       mp3s = glob.glob(folder_path + '/*.mp3')     mp3_objects = []     index = 0 for mp3 in mp3s:         mp3_object = eyed3.load(mp3) self.list_ctrl.InsertItem(index,               mp3_object.tag.artist) self.list_ctrl.SetItem(index, 1,               mp3_object.tag.album) self.list_ctrl.SetItem(index, 2,               mp3_object.tag.title)         mp3_objects.append(mp3_object) self.row_obj_dict[index] = mp3_object         index += 1In the example above, the current directory is set to the specified folder and the list control is cleared. The list controls stay fresh and shows the MP3s you’re currently working with. Next, the folder is taken and Python’s globmoduleis used to search for the MP3 files. Then, the MP3s are looped over and converted into eyed3 objects. This is done by calling the .load() of eyed3. After that, you can add the artist, album, and the title of the Mp3 to the control list given that the MP3s have the appropriate tags..InsertItem() is used to add a new row to a list control for the first time and SetItem()  is used to add rows to the subsequent columns. The last step is to save your MP3 object to your Python dictionary row_obj_dict.Now to edit an MP3’s tags, you need to update the .on_edit() event handler:def on_edit(self, event):     selection = self.list_ctrl.GetFocusedItem() if selection >= 0:         mp3 = self.row_obj_dict[selection] dlg = EditDialog(mp3) dlg.ShowModal()         self.update_mp3_listing(self.current_folder_path) dlg.Destroy()The user’s selection is taken by calling the list control’s .GetFocusedItem(). It will return -1 if the user will not select anything in the list control. However, if you want to extract the MP3 obj3ct from the dictionary, the user have to select something. You can then open the MP3 tag editor dialog which will be a custom dialog. As before, the dialog is shown modally, then the last two lines in .on_edit() will execute what will eventually display the current MP3 tag information. SummaryLet us sum up what we have learned in this article so far – Installing wxPython and Working with wxPython’s widgets Working of events in wxPython Comparing absolute positioning with sizers Creating a skeleton application and a working application The main feature of the wxPython Graphical User Interface is its robustness and a large collection of widgets that you can use to build cross-platform applications. Since you have now learned how to create a working application, that is an MP3 tag editor, you can try your hand to enhance this application to a more beautiful one with lots of new features or you can perhaps create your own wonderful application. To gain more knowledge about Python tips and tricks, check our Python tutorial and get a good hold over coding in Python by joining the Python certification course.
Rated 4.5/5 based on 2 customer reviews
6205
How to Build a Python GUI Application With wxPytho...

A Graphical User Interface or GUI is a user interf... Read More

What is PyPI & How To Publish An Open-Source Python Package to PyPI

The Python Standard Library comprises of sophisticated and robust capabilities for working with larger packages. You will find modules for working with sockets and with files and file paths.Though there might be great packages that Python comes with, there are more exciting and fantastic projects outside the standard library which are mostly called the Python Packaging Index (PyPI). It is nothing but a repository of software for the Python programming language.The PyPI package is considered as an important property for Python being a powerful language. You can get access to thousands of libraries starting from Hello World to advanced deep learning libraries.What is PyPI"PyPI" should be pronounced like "pie pea eye", specifically with the "PI" pronounced as individual letters, but rather as a single sound. This minimizes confusion with the PyPy project, which is a popular alternative implementation of the Python language.The Python Package Index, abbreviated as PyPI is also known as the Cheese Shop. It is the official third-party software repository for Python, just like CPAN is the repository for  Perl.  Some package managers such as pip, use PyPI as the default source for packages and their dependencies. More than 113,000 Python packages can be accessed through PyPI.How to use PyPITo install the packages from PyPI you would need a package installer. The recommended package installer for PyPI is ‘pip’. Pip is installed along when you install Python on your system. To learn more about ‘pip’, you may go through our article on “What is pip”. The pip command is a tool for installing and managing Python packages, such as those found in the Python Package Index. It is a replacement for easy_install.To install a package from the Python Package Index, just open up your terminal and type in a search query using the PIP tool. The most common usage for pip is to install, upgrade or uninstall a package. Starting with a Small Python PackageWe will start with a small Python package that we will use as an example to publish to PyPI. You can get the full source code from the GitHub repository. The package is called reader and it is an application by which you can download and read articles. Below shows the directory structure of reader :reader/  │  ├── reader/  │   ├── config.txt  │   ├── feed.py  │   ├── __init__.py  │   ├── __main__.py  │   └── viewer.py  │  ├── tests/  │   ├── test_feed.py  │   └── test_viewer.py  │  ├── MANIFEST.in  ├── README.md  └── setup.py The source code of the package is in a reader subdirectory that is bound with a configuration file. The GitHub repository also contains few tests in a separate subdirectory. In the coming sections, we will discuss the working of the reader package and also take a look at the special files which include setup.py, README.md, MANIFEST.in, and others. Using the Article ReaderThe reader is a primitive data format used for providing users with the latest updated content. You can download the frequent articles from the article feed with the help of reader. You can get the list of articles using the reader:$ python -m reader The latest tutorials from Real Python (https://realpython.com/)   0 How to Publish an Open-Source Python Package to PyPI   1 Python "while" Loops (Indefinite Iteration)   2 Writing Comments in Python (Guide)   3 Setting Up Python for Machine Learning on Windows   4 Python Community Interview With Michael Kennedy   5 Practical Text Classification With Python and Keras   6 Getting Started With Testing in Python   7 Python, Boto3, and AWS S3: Demystified   8 Python's range() Function (Guide)   9 Python Community Interview With Mike Grouchy  10 How to Round Numbers in Python  11 Building and Documenting Python REST APIs With Flask and Connexion – Part 2  12 Splitting, Concatenating, and Joining Strings in Python  13 Image Segmentation Using Color Spaces in OpenCV + Python  14 Python Community Interview With Mahdi Yusuf  15 Absolute vs Relative Imports in Python  16 Top 10 Must-Watch PyCon Talks  17 Logging in Python  18 The Best Python Books  19 Conditional Statements in PythonThe articles in the list are numbered. So if you want to read a particular article, you can just write the same command along with the number of the article you desire to read.For reading the article on “How to Publish an Open-Source Python Package to PyPI”, just add the serial number of the article:$ python -m reader 0  # How to Publish an Open-Source Python Package to PyPI  Python is famous for coming with batteries included. Sophisticated  capabilities are available in the standard library. You can find modules  for working with sockets, parsing CSV, JSON, and XML files, and  working with files and file paths. However great the packages included with Python are, there are many  fantastic projects available outside the standard library. These are  most often hosted at the Python Packaging Index (PyPI), historically  known as the Cheese Shop. At PyPI, you can find everything from Hello  World to advanced deep learning libraries.  ...  ...  ...You can read any of the articles in the list just by changing the article number with the command. Quick LookThe package comprises of five files which are the working hands of the reader. Let us understand the implementations one by one: config.txt -  It is a text configuration file that specifies the URL of the feed of articles. The configparser standard library is able to read the text file. This type of file contains key-value pairs that are distributed into different sections.  # config.txt [feed] url=https://realpython.com/atom.xml__main__.py - It is the entry point of your program whose duty is to control the main flow of the program. The double underscores denote the specialty of this file. Python executes the contents of the __main__.py file. # __main__.py from configparser import ConfigParser  from importlib import resources  import sys from reader import feed  from reader import viewer def main(): # Read URL of the Real Python feed from config file  configure=ConfigParser() configure.read_string(resources.readtext("reader","config.txt"))  URL=configure.get("feed","url") # If an article ID is given, show the article  if len(sys.argv) > 1:  article = feed.getarticle(URL, sys.argv[1])  viewer.show(article) # If no ID is given, show a list of all articles else: site = feed.getsite(URL)  titles = feed.gettitles(URL)  viewer.showlist(site,titles)  if __name__ == "__main__": main() __init__.py - It is also considered a special file because of the double underscore. It denotes the root of your package in which you can keep your package constants, your documentations and so on. # __init__.py # Version of the realpython-reader package  __version__= "1.0.0"__version__ is a special variable in Python used for adding numbers to your package which was introduced in PEP 396. The variables which are defined in __init__.py are available as variables in the namespace also. >>> import reader >>> reader.__version__ '1.0.0'feed.py - In the __main__.py, you can see two modules feed and viewer are imported which perform the actual work. The file feed.py  is used to read from a web feed and parse the result.  # feed.py import feedparser import html2text Cached_Feeds = dict() def _feed(url):  """Only read a feed once, by caching its contents""" if url not in _CACHED_FEEDS: Cached_Feeds[url]=feedparser.parse(url) return Cached_Feeds[url]viewer.py -  This file module contains two functions show() and show_list(). # viewer.py def show(article):  """Show one article""" print(article) def show_list(site,titles):  """Show list of articles""" print(f"The latest tutorials from {site}") for article_id,title in enumerate(titles): print(f"{article_id:>3}{title}")The function of show() is to print one article to the console. On the other hand, show_list prints a list of titles.Calling a Package You need to understand which file you should call to run the reader in cases where your package consists of four different source code files. The Python interpreter consists of an -m option that helps in specifying a module name instead of a file name.An example to execute two commands with a script hello.py:$ python hello.py Hi there! $ python -m hello Hi there!The two commands above are equivalent. However, the latter one with -m has an advantage. You can also call Python built-in modules with the help of it: $ python -m antigravity Created new window in existing browser session.The -m option also allows you to work with packages and modules:$ python -m reader ...The reader only refers to the directory. Python looks out for the file named __main__.py, if the file is found, it is executed otherwise an error message is printed: $ python -m math python: No code object available for mathPreparing Your PackageSince now you have got your package, let us understand the necessary steps that are needed to be done before the uploading process. Naming the Package Finding a good and unique name for your package is the first and one of the most difficult tasks. PyPI has more than 150,000 packages already in their list, so chances are that your favorite name might be already taken. You need to perform some research work in order to find a perfect name. You can also use the PyPI search to verify whether it is already used or not.  We will be using a more descriptive name and call it realpython-reader so that the reader package can be easily found on PyPI and then use it to install the package using pip:$ pip install realpython-readerHowever, the name we have given is realpython-reader but when we import it, it is still called as reader:>>> import reader >>> help(reader) >>> from reader import feed >>> feed.get_titles() ['How to Publish an Open-Source Python Package to PyPI', ...]You can use a variety of names for your package while importing on PyPI but it is suggested to use the same name or similar ones for better understanding. Configuring your PackageYour package should be included with some basic information which will be in the form of a setup.py file. The setup.py is the only fully supported way of providing information, though Python consists of initiatives that are used to simplify this collection of information.The setup.py file should be placed in the top folder of your package. An example of a setup.py  for reader: import pathlib from setuptools import setup # The directory containing this file HERE = pathlib.Path(__file__).parent # The text of the README file README = (HERE/"README.md").read_text() # This call to setup() does all the work setup( name="realpython-reader",  version="1.0.1",  descp="The latest Python tutorials",  long_descp=README, long_descp_content="text/markdown",  URL="https://github.com/realpython/reader",  author="Real Python",  authoremail="office@realpython.com",  license="MIT",  classifiers=[  "License :: OSI Approved :: MIT License",  "Programming Language :: Python :: 3",  "Programming Language :: Python :: 3.7",  ],  packages=["reader"],  includepackagedata=True,  installrequires=["feedparser","html2text"],  entrypoints={  "console_scripts":[  "realpython=reader.__main__:main",  ]  },  ) The necessary parameters available in setuptools in the call to setup() are as follows: name - The name of your package as being appeared on PyPI version - the present version of your package packages - the packages and subpackages which contain your source code You will also have to specify any subpackages if included. setuptools contains find_packages() whose job is to discover all your subpackages. You can also use it in the reader project:from setuptools import find_packages,setup  setup(  ... packages=find_packages(exclude=("tests",)), ... ) You can also add more information along with name, version, and packages which will make it easier to find on PyPI.Two more important parameters of  setup() : install_requires - It lists the dependencies your package has to the third-party libraries. feedparser and html2text are listed since they are the dependencies of reader.entry_points - It creates scripts to call a function within your package. Our script realpython calls the main() within the reader/__main__.py file.Documenting Your PackageDocumenting your package before releasing it is an important step. It can be a simple README file or a complete tutorial webpage like galleries or an API reference.  At least a README file with your project should be included at a minimum which should give a quick description of your package and also inform about the installation process and how to use it. In other words, you need to include your README as the long_descp argument to setup() which will eventually be displayed on PyPI. PyPI uses Markdown for package documentation. You can use the setup() parameter long_description_content_type to get the PyPI format you are working with. When you are working with bigger projects and want to add more documentation to your package, you can take the help of websites like GitHub and Read the Docs. Versioning Your Package Similarly like documentation, you need to add a version to your package. PyPI promises reproducibility by allowing a user to do one upload of a particular version for a package. If there are two systems with the same version of a package, it will behave in an exact manner. PEP 440 of Python provides a number of schemes for software versioning. However, for a simple project, let us stick to a simple versioning scheme. A simple versioning technique is semantic versioning which has three components namely MAJOR, MINOR, and PATCH and some simple rules about the incrementation process of each component: Increment the MAJOR version when you make incompatible API changes. Increment the MINOR version when you add functionality in a backward-compatible manner. Increment the PATCH version when you make backward-compatible bug fixes. (Source) You need to specify the different files inside your project. Also, if you want to verify whether the version numbers are consistent or not, you can do it using a tool called Bumpversion: $ pip install bumpversionAdding Files To Your PackageYour package might include other files other than source code files like data files, binaries, documentation and configuration files. In order to add such files, we will use a manifest file. In most cases, setup() creates a manifest that includes all code files as well as README files.   However, if you want to change the manifest, you can create a manifest template of your own. The file should be called MANIFEST.in and it will specify rules for what needs to be included and what needs to be excluded: include reader/*.txtThis will add all the .txt files in the reader directory. Other than creating the manifest, the non-code files also need to be copied. This can be done by setting the include_package_data toTrue: setup(  ... include_package_data=True,  ... )Publishing to PyPI For publishing your package to the real world, you need to first start with registering yourself on PyPI and also on TestPyPI, which is useful because you can give a trial of the publishing process without any further consequences. You will have to use a tool called Twine to upload your package ton PyPI: $ pip install twineBuilding Your PackageThe packages on PyPI are wrapped into distribution packages, out of which the most common are source archives and Python wheels. A source archive comprises of your source code and other corresponding support files wrapped into one tar file. On the other hand, a Python wheel is a zip archive that also contains your code. However, the wheel can work with any extensions, unlike source archives. Run the following command in order to create a source archive and a wheel for your package: $ python setup.py sdist bdist_wheelThe command above will create two files in a newly created directory called dist, a source archive and a wheel: reader/ │  └── dist/      ├── realpython_reader-1.0.0-py3-none-any.whl      └── realpython-reader-1.0.0.tar.gz The command-line arguments like the sdist and bdist_wheel arguments are all implemented int the upstream distutils standard library. Using the --help-commands option, you list all the available arguments: $ python setup.py --help-commands  Standard commands:    build             build everything needed to install    build_py          "build" pure Python modules (copy to build directory)    < ... many more commands ...>Testing Your Package In order to test your package, you need to check whether the distribution packages you have newly created contain the expected files. You also need to list the contents of the tar source archive on Linux and macOS platforms: $ tar tzf realpython-reader-1.0.0.tar.gz  realpython-reader-1.0.0/  realpython-reader-1.0.0/setup.cfg  realpython-reader-1.0.0/README.md  realpython-reader-1.0.0/reader/  realpython-reader-1.0.0/reader/feed.py  realpython-reader-1.0.0/reader/__init__.py  realpython-reader-1.0.0/reader/viewer.py  realpython-reader-1.0.0/reader/__main__.py  realpython-reader-1.0.0/reader/config.txt  realpython-reader-1.0.0/PKG-INFO  realpython-reader-1.0.0/setup.py  realpython-reader-1.0.0/MANIFEST.in  realpython-reader-1.0.0/realpython_reader.egg-info/  realpython-reader-1.0.0/realpython_reader.egg-info/SOURCES.txt  realpython-reader-1.0.0/realpython_reader.egg-info/requires.txt  realpython-reader-1.0.0/realpython_reader.egg-info/dependency_links.txt  realpython-reader-1.0.0/realpython_reader.egg-info/PKG-INFO  realpython-reader-1.0.0/realpython_reader.egg-info/entry_points.txt  realpython-reader-1.0.0/realpython_reader.egg-info/top_level.txt On Windows, you can make use of the utility tool 7-zip to look inside the corresponding zip file. You should make sure that all the subpackages and supporting files are included in your package along with all the source code files as well as the newly built files. You can also run twine check on the files created in dist to check if your package description will render properly on PyPI: $ twine check dist/* Checking distribution dist/realpython_reader-1.0.0-py3-none-any.whl: Passed  Checking distribution dist/realpython-reader-1.0.0.tar.gz: Passed Uploading Your PackageNow you have reached the final step,i.e. Uploading your package to PyPI. Make sure you upload your package first to TestPyPI to check whether it is working according to your expectation and then use the Twine tool and instruct it to upload your newly created distribution: $ twine upload --repository-url https://test.pypi.org/legacy/ dist/* After the uploading process is over, you can again go to TestPyPI and look at your project being displayed among the new releases.  However, if you have your own package to publish, the command is short: $ twine upload dist/* Give your username and password and it’s done. Your package has been published on PyPI. To look up your package, you can either search it or look at the Your projects page or you can just directly go to the URL of your project: pypi.org/project/your-package-name/. After completing the publishing process, you can download it in your system using pip: $ pip install your-package-nameMiscellaneous Tools There are some useful tools that are good to know when creating and publishing Python packages. Some of these are mentioned below. Virtual Environments Each virtual environment has its own Python binary and can also have its own set of installed Python packages in its directories. These packages are independent in nature. Virtual environments are useful in situations where there are a variety of requirements and dependencies while working with different projects. You can grab more information about virtual environments in  the following references: Python Virtual Environments Pipenv It is recommended to check your package inside a basic virtual environment so that to make sure all necessary dependencies in your setup.py file are included. Cookiecutter Cookiecutter sets up your project by asking a few questions based on a template. Python contains many different templates. Install Cookiecutter using pip: $ pip install cookiecutterTo understand cookiecutter, we will use a template called pypackage-minimal. If you want to use a template, provide the link of the template to the cookiecutter: $ cookiecutter https://github.com/kragniz/cookiecutter-pypackage-minimal  author_name [Louis Taylor]: Real Python  author_email [louis@kragniz.eu]: office@realpython.com  package_name [cookiecutter_pypackage_minimal]: realpython-reader  package_version [0.1.0]:  package_description [...]: Read Real Python tutorials  package_url [...]: https://github.com/realpython/reader  readme_pypi_badge [True]:  readme_travis_badge [True]: False  readme_travis_url [...]: Cookiecutter sets up your project after you have set up answered a series of questions. The template above will create the following files and directories: realpython-reader/  │  ├── realpython-reader/  │   └── __init__.py  │  ├── tests/  │   ├── __init__.py  │   └── test_sample.py  │  ├── README.rst  ├── setup.py  └── tox.ini You can also take a look at the documentation of cookiecutter for all the available cookiecutters and how to create your own template. Summary Let us sum up the necessary steps we have learned in this article so far to publish your own package - Finding a good and unique name for your packageConfiguring your package using setup.py Building your package Publishing your package to PyPI Moreover, you have also learned to use a few new tools that help in simplifying the process of publishing packages.  You can reach out to Python’s Packaging Authority for more detailed and comprehensive information. To gain more knowledge about Python tips and tricks, check our Python tutorial and get a good hold over coding in Python by joining the Python certification course. 
Rated 4.5/5 based on 34 customer reviews
12091
What is PyPI & How To Publish An Open-Source P...

The Python Standard Library comprises of sophistic... Read More

How to Round Numbers in Python

While you are dealing with data, sometimes you may come across a biased dataset. In statistics, bias is whereby the expected value of the results differs from the true underlying quantitative parameter being estimated. Working with such data can be dangerous and can lead you to incorrect conclusions. To learn more about various other concepts of Python, go through our Python Tutorials or enroll to our Python Certification course online.There are many types of biases such as selection bias, reporting bias, sampling bias and so on. Similarly, rounding bias is related to numeric data. In this article we will see:Why is it important to know the ways to round numbersHow to use various strategies to round numbersHow data is affected by rounding itHow to use NumPy arrays and Pandas DataFrames to round numbersLet us first learn about Python’s built-in rounding process.About Python’s Built-in round() FunctionPython Programming offers a built-in round() function which rounds off a number to the given number of digits and makes rounding of numbers easier. The function round() accepts two numeric arguments, n and n digits and then returns the number n after rounding it to ndigits. If the number of digits are not provided for round off, the function rounds off the number n to the nearest integer.Suppose, you want to round off a number, say 4.5. It will be rounded to the nearest whole number which is 5. However, the number 4.74 will be rounded to one decimal place to give 4.7.It is important to quickly and readily round numbers while you are working with floats which have many decimal places. The inbuilt Python function round() makes it simple and easy.Syntaxround(number, number of digits)The parameters in the round() function are:number - number to be roundednumber of digits (Optional) - number of digits up to which the given number is to be rounded.The second parameter is optional. In case, if it is missing then round() function returns:For an integer, 12, it rounds off to 12For a decimal number, if the last digit after the decimal point is >=5 it will round off to the next whole number, and if =5 print(round(5.476, 2))     # when the (ndigit+1)th digit is  1 print(round("x", 2)) TypeError: type str doesn't define __round__ methodAnother example,print(round(1.5)) print(round(2)) print(round(2.5))The output will be:2 2 2The function round() rounds 1.5 up to 2, and 2.5 down to 2. This is not a bug, the round() function behaves this way. In this article you will learn a few other ways to round a number. Let us look at the variety of methods to round a number.Diverse Methods for RoundingThere are many ways to round a number with its own advantages and disadvantages. Here we will learn some of the techniques to rounding a number.TruncationTruncation, as the name means to shorten things. It is one of the simplest methods to round a number which involves truncating a number to a given number of digits. In this method, each digit after a given position is replaced with 0. Let us look into some examples.ValueTruncated ToResult19.345Tens place1019.345Ones place1919.345Tenths place19.319.345Hundredths place19.34The truncate() function can be used for positive as well as negative numbers:>>> truncate(19.5) 19.0 >>> truncate(-2.852, 1) -2.8 >>> truncate(2.825, 2) 2.82The truncate() function can also be used to truncate digits towards the left of the decimal point by passing a negative number.>>> truncate(235.7, -1) 230.0 >>> truncate(-1936.37, -3) -1000.0When a positive number is truncated, we are basically rounding it down. Similarly, when we truncate a negative number, the number is rounded up. Let us look at the various rounding methods.Rounding UpThere is another strategy called “rounding up” where a number is rounded up to a specified number of digits. For example:ValueRound Up ToResult12.345Tens place2018.345Ones place1918.345Tenths place18.418.345Hundredths place18.35The term ceiling is used in mathematics to explain the nearest integer which is greater than or equal to a particular given number. In Python, for “rounding up” we use two functions namely,ceil() function, andmath() functionA non-integer number lies between two consecutive integers. For example, considering a number 5.2, this will lie between 4 and 5. Here, ceiling is the higher endpoint of the interval, whereas floor is the lower one. Therefore, ceiling of 5.2 is 5, and floor of 5.2 is 4. However, the ceiling of 5 is 5.In Python, the function to implement the ceiling function is the math.ceil() function. It always returns the closest integer which is greater than or equal to its input.>>> import math >>> math.ceil(5.2) 6 >>> math.ceil(5) 5 >>> math.ceil(-0.5) 0If you notice you will see that the ceiling of -0.5 is 0, and not -1.Let us look into a short code to implement the “rounding up” strategy using round_up() function:def round_up(n, decimals=0):     multiplier = 10 ** decimals     return math.ceil(n * multiplier) / multiplierLet’s look at how round_up() function works with various inputs:>>> round_up(3.1) 4.0 >>> round_up(3.23, 1) 3.3 >>> round_up(3.543, 2) 3.55You can pass negative values  to decimals, just like we did in truncation.>>> round_up(32.45, -1) 40.0 >>> round_up(3352, -2) 3400You can follow the diagram below to understand round up and round down. Round up to the right and down to the left.Rounding up always rounds a number to the right on the number line, and rounding down always rounds a number to the left on the number line.Rounding DownSimilar to rounding up we have another strategy called rounding down whereValueRounded Down ToResult19.345Tens place1019.345Ones place1919.345Tenths place19.319.345Hundredths place19.34In Python, rounding down can be implemented using a similar algorithm as we truncate or round up. Firstly you will have to shift the decimal point and then round an integer. Lastly shift the decimal point back.math.ceil() is used to round up to the ceiling of the number once the decimal point is shifted. For “rounding down” we first need to round the floor of the number once the decimal point is shifted.>>> math.floor(1.2) 1 >>> math.floor(-0.5) -1Here’s the definition of round_down():def round_down(n, decimals=0):     multiplier = 10 ** decimals return math.floor(n * multiplier) / multiplierThis is quite similar to round_up() function. Here we are using math.floor() instead of math.ceil().>>> round_down(1.5) 1 >>> round_down(1.48, 1) 1.4 >>> round_down(-0.5) -1Rounding a number up or down has extreme effects in a large dataset. After rounding up or down, you can actually remove a lot of precision as well as alter computations.Rounding Half UpThe “rounding half up” strategy rounds every number to the nearest number with the specified precision, and breaks ties by rounding up. Here are some examples:ValueRound Half Up ToResult19.825Tens place1019.825Ones place2019.825Tenths place19.819.825Hundredths place19.83In Python, rounding half up strategy can be implemented by shifting the decimal point to the right by the desired number of places. In this case you will have to determine whether the digit after the shifted decimal point is less than or greater than equal to 5.You can add 0.5 to the value which is shifted and then round it down with the math.floor() function.def round_half_up(n, decimals=0):     multiplier = 10 ** decimals return math.floor(n*multiplier + 0.5) / multiplierIf you notice you might see that round_half_up() looks similar to round_down. The only difference is to add 0.5 after shifting the decimal point so that the result of rounding down matches with the expected value.>>> round_half_up(19.23, 1) 19.2 >>> round_half_up(19.28, 1) 19.3 >>> round_half_up(19.25, 1) 19.3Rounding Half DownIn this method of rounding, it rounds to the nearest number similarly like “rounding half up” method, the difference is that it breaks ties by rounding to the lesser of the two numbers. Here are some examples:ValueRound Half Down ToResult16.825Tens place1716.825Ones place1716.825Tenths place16.816.825Hundredths place16.82In Python, “rounding half down” strategy can be implemented by replacing math.floor() in the round_half_up() function with math.ceil() and then by subtracting 0.5 instead of adding:def round_half_down(n, decimals=0):     multiplier = 10 ** decimals return math.ceil(n*multiplier - 0.5) / multiplierLet us look into some test cases.>>> round_half_down(1.5) 1.0 >>> round_half_down(-1.5) -2.0 >>> round_half_down(2.25, 1) 2.2In general there are no bias for both round_half_up() and round_half_down(). However, rounding of data with more number of ties results in bias. Let us consider an example to understand better.>>> data = [-2.15, 1.45, 4.35, -12.75]Let us compute the mean of these numbers:>>> statistics.mean(data) -2.275Now let us compute the mean on the data after rounding to one decimal place with round_half_up() and round_half_down():>>> rhu_data = [round_half_up(n, 1) for n in data] >>> statistics.mean(rhu_data) -2.2249999999999996 >>> rhd_data = [round_half_down(n, 1) for n in data] >>> statistics.mean(rhd_data) -2.325The round_half_up() function results in a round towards positive infinity bias, and round_half_down() results in a round towards negative infinity bias.Rounding Half Away From ZeroIf you have noticed carefully while going through round_half_up() and round_half_down(), neither of the two is symmetric around zero:>>> round_half_up(1.5) 2.0 >>> round_half_up(-1.5) -1.0 >>> round_half_down(1.5) 1.0 >>> round_half_down(-1.5) -2.0In order to introduce symmetry, you can always round a tie away from zero. The table mentioned below illustrates it clearly:ValueRound Half Away From Zero ToResult16.25Tens place2016.25Ones place1616.25Tenths place16.3-16.25Tens place-20-16.25Ones place-16-16.25Tenths place-16.3The implementation of “rounding half away from zero” strategy on a number n is very simple. All you need to do is start as usual by shifting the decimal point to the right a given number of places and then notice the digit d immediately to the right of the decimal place in this new number. Here, there are four cases to consider:If n is positive and d >= 5, round upIf n is positive and d < 5, round downIf n is negative and d >= 5, round downIf n is negative and d < 5, round upAfter rounding as per the rules mentioned above, you can shift the decimal place back to the left.There is a question which might come to your mind - How do you handle situations where the number of positive and negative ties are drastically different? The answer to this question brings us full circle to the function that deceived us at the beginning of this article: Python’s built-in  round() function.Rounding Half To EvenThere is a way to mitigate rounding bias while you are rounding values in a dataset. You can simply round ties to the nearest even number at the desired precision. Let us look at some examples:ValueRound Half To Even ToResult16.255Tens place2016.255Ones place1616.255Tenths place16.216.255Hundredths place16.26To prove that round() really does round to even, let us try on a few different values:>>> round(4.5) 4 >>> round(3.5) 4 >>> round(1.75, 1) 1.8 >>> round(1.65, 1) 1.6The Decimal ClassThe  decimal module in Python is one of those features of the language which you might not be aware of if you have just started learning Python. Decimal “is based on a floating-point model which was designed with people in mind, and necessarily has a paramount guiding principle – computers must provide an arithmetic that works in the same way as the arithmetic that people learn at school.” – except from the decimal arithmetic specification. Some of the benefits of the decimal module are mentioned below -Exact decimal representation: 0.1 is actually 0.1, and 0.1 + 0.1 + 0.1 - 0.3 returns 0, as expected.Preservation of significant digits: When you add 1.50 and 2.30, the result is 3.80 with the trailing zero maintained to indicate significance.User-alterable precision: The default precision of the decimal module is twenty-eight digits, but this value can be altered by the user to match the problem at hand.Let us see how rounding works in the decimal module.>>> import decimal >>> decimal.getcontext() Context(     prec=28,     rounding=ROUND_HALF_EVEN,     Emin=-999999,     Emax=999999,     capitals=1,     clamp=0,     flags=[],     traps=[         InvalidOperation,         DivisionByZero,         Overflow     ] )The function decimal.getcontext() returns a context object which represents the default context of the decimal module. It also includes the default precision and the default rounding strategy.In the above example, you will see that the default rounding strategy for the decimal module is ROUND_HALF_EVEN. It allows to align with the built-in round() functionLet us create a new Decimal instance by passing a string containing the desired value and declare a number using the decimal module’s Decimal class.>>> from decimal import Decimal >>> Decimal("0.1") Decimal('0.1')You may create a Decimal instance from a floating-point number but in that case, a floating-point representation error will be introduced. For example, this is what happens when you create a Decimal instance from the floating-point number 0.1>>> Decimal(0.1) Decimal('0.1000000000000000055511151231257827021181583404541015625')You may create Decimal instances from strings containing the decimal numbers you need in order to maintain exact precision.Rounding a Decimal using the .quantize() method:>>> Decimal("1.85").quantize(Decimal("1.0")) Decimal('1.8')The Decimal("1.0") argument in .quantize() allows to determine the number of decimal places in order to round the number. As 1.0 has one decimal place, the number 1.85 rounds to a single decimal place. Rounding half to even is the default strategy, hence the result is 1.8.Decimal class:>>> Decimal("2.775").quantize(Decimal("1.00")) Decimal('2.78')Decimal module provides another benefit. After performing arithmetic the rounding is taken care of automatically and also the significant digits are preserved.>>> decimal.getcontext().prec = 2 >>> Decimal("2.23") + Decimal("1.12") Decimal('3.4')To change the default rounding strategy, you can set the decimal.getcontect().rounding property to any one of several  flags. The following table summarizes these flags and which rounding strategy they implement:FlagRounding Strategydecimal.ROUND_CEILINGRounding updecimal.ROUND_FLOORRounding downdecimal.ROUND_DOWNTruncationdecimal.ROUND_UPRounding away from zerodecimal.ROUND_HALF_UPRounding half away from zerodecimal.ROUND_HALF_DOWNRounding half towards zerodecimal.ROUND_HALF_EVENRounding half to evendecimal.ROUND_05UPRounding up and rounding towards zeroRounding NumPy ArraysIn Data Science and scientific computation, most of the times we store data as a  NumPy array. One of the most powerful features of NumPy is the use of  vectorization and broadcasting to apply operations to an entire array at once instead of one element at a time.Let’s generate some data by creating a 3×4 NumPy array of pseudo-random numbers:>>> import numpy as np >>> np.random.seed(444) >>> data = np.random.randn(3, 4) >>> data array([[ 0.35743992,  0.3775384 ,  1.38233789,  1.17554883],        [-0.9392757 , -1.14315015, -0.54243951, -0.54870808], [ 0.20851975, 0.21268956, 1.26802054, -0.80730293]])Here, first we seed the np.random module to reproduce the output easily. Then a 3×4 NumPy array of floating-point numbers is created with np.random.randn().Do not forget to install pip3 before executing the code mentioned above. If you are using  Anaconda you are good to go.To round all of the values in the data array, pass data as the argument to the  np.around() function. The desired number of decimal places is set with the decimals keyword argument. In this case, round half to even strategy is used similar to Python’s built-in round() function.To round the data in your array to integers, NumPy offers several options which are mentioned below:numpy.ceil()numpy.floor()numpy.trunc()numpy.rint()The np.ceil() function rounds every value in the array to the nearest integer greater than or equal to the original value:>>> np.ceil(data) array([[ 1.,  1.,  2.,  2.],        [-0., -1., -0., -0.], [ 1., 1., 2., -0.]])Look at the code carefully, we have a new number! Negative zero! Let us now take a look at Pandas library, widely used in Data Science with Python.Rounding Pandas Series and DataFramePandas has been a game-changer for data analytics and data science. The two main data structures in Pandas are Dataframe and Series. Dataframe works like an Excel spreadsheet whereas you can consider Series to be columns in a spreadsheet. Series.round() and DataFrame.round() methods. Let us look at an example.Do not forget to install pip3 before executing the code mentioned above. If you are using  Anaconda you are good to go.>>> import pandas as pd >>> # Re-seed np.random if you closed your REPL since the last example >>> np.random.seed(444) >>> series = pd.Series(np.random.randn(4)) >>> series 0    0.357440 1    0.377538 2    1.382338 3    1.175549 dtype: float64 >>> series.round(2) 0    0.36 1    0.38 2    1.38 3    1.18 dtype: float64 >>> df = pd.DataFrame(np.random.randn(3, 3), columns=["A", "B", "C"]) >>> df           A         B         C 0 -0.939276 -1.143150 -0.542440 1 -0.548708  0.208520  0.212690 2  1.268021 -0.807303 -3.303072 >>> df.round(3)        A      B      C 0 -0.939 -1.143 -0.542 1 -0.549  0.209  0.213 2  1.268 -0.807 -3.303 The DataFrame.round() method can also accept a dictionary or a Series, to specify a different precision for each column. For instance, the following examples show how to round the first column of df to one decimal place, the second to two, and the third to three decimal places: >>> # Specify column-by-column precision with a dictionary >>> df.round({"A": 1, "B": 2, "C": 3})      A     B      C 0 -0.9 -1.14 -0.542 1 -0.5  0.21  0.213 2  1.3 -0.81 -3.303 >>> # Specify column-by-column precision with a Series >>> decimals = pd.Series([1, 2, 3], index=["A", "B", "C"]) >>> df.round(decimals)      A     B      C 0 -0.9 -1.14 -0.542 1 -0.5  0.21  0.213 2  1.3 -0.81 -3.303 If you need more rounding flexibility, you can apply NumPy's floor(), ceil(), and print() functions to Pandas Series and DataFrame objects: >>> np.floor(df)      A    B    C 0 -1.0 -2.0 -1.0 1 -1.0  0.0  0.0 2  1.0 -1.0 -4.0 >>> np.ceil(df)      A    B    C 0 -0.0 -1.0 -0.0 1 -0.0  1.0  1.0 2  2.0 -0.0 -3.0 >>> np.rint(df)      A    B    C 0 -1.0 -1.0 -1.0 1 -1.0  0.0  0.0 2  1.0 -1.0 -3.0 The modified round_half_up() function from the previous section will also work here: >>> round_half_up(df, decimals=2)       A     B     C 0 -0.94 -1.14 -0.54 1 -0.55  0.21  0.21 2 1.27 -0.81 -3.30Best Practices and ApplicationsNow that you have come across most of the rounding techniques, let us learn some of the best practices to make sure we round numbers in the correct way.Generate More Data and Round LaterSuppose you are dealing with a large set of data, storage can be a problem at times. For example, in an industrial oven you would want to measure the temperature every ten seconds accurate to eight decimal places, using a temperature sensor. These readings will help to avoid large fluctuations which may lead to failure of any heating element or components. We can write a Python script to compare the readings and check for large fluctuations.There will be a large number of readings as they are being recorded each and everyday. You may consider to maintain three decimal places of precision. But again, removing too much precision may result in a change in the calculation. However, if you have enough space, you can easily store the entire data at full precision. With less storage, it is always better to store at least two or three decimal places of precision which are required for calculation.In the end, once you are done computing the daily average of the temperature, you may calculate it to the maximum precision available and finally round the result.Currency Exchange and RegulationsWhenever we purchase an item from a particular place, the tax amount paid against the amount of the item depends largely on geographical factors. An item which costs you $2 may cost you less (say $1.8)  if you buy the same item from a different state. It is due to regulations set forth by the local government.In another case, when the minimum unit of currency at the accounting level in a country is smaller than the lowest unit of physical currency, Swedish rounding is done. You can find a list of such rounding methods used by various countries if you look up on the internet.If you want to design any such software for calculating currencies, keep in mind to check the local laws and regulations applicable in your present location.Reduce errorAs you are rounding numbers in a large datasets used in complex computations, your primary concern should be to limit the growth of the error due to rounding.SummaryIn this article we have seen a few methods to round numbers, out of those “rounding half to even” strategy minimizes rounding bias the best. We are lucky to have Python, NumPy, and Pandas already have built-in rounding functions to use this strategy. Here, we have learned about -Several rounding strategies, and how to implement in pure Python.Every rounding strategy inherently introduces a rounding bias, and the “rounding half to even” strategy mitigates this bias well, most of the time.You can round NumPy arrays and Pandas Series and DataFrame objects.If you enjoyed reading this article and found it to be interesting, leave a comment. To learn more about rounding numbers and other features of Python, join our Python certification course.
Rated 5.0/5 based on 43 customer reviews
13146
How to Round Numbers in Python

While you are dealing with data, sometimes you may... Read More