HomeBlogProgrammingHow to Work with Excel Spreadsheets using Python

How to Work with Excel Spreadsheets using Python

Published
03rd May, 2024
Views
view count loader
Read it in
16 Mins
In this article
    How to Work with Excel Spreadsheets using Python

    Excel is considered one of the most popular and widely used spreadsheet applications developed by Microsoft. You can organize, analyze and store your data into tabular sheets with the help of Excel. From analysts and sales managers to CEOs, professionals from every field use Excel for creating quick statistics and for data crunching. These features make excel one of Python programs' most popular data sources.

    Check out the use of Self in Python with examples here.

    Spreadsheets are commonly used in the present world because of their intuitive nature and ability to handle large datasets. Most importantly, they can work without any prior technical background.

    Finding different ways to work with Excel using code is essential since working with data and in Python has some serious advantages in comparison with Excel’s UI. Developers of Python have implemented ways to read, write and manipulate Excel documents. An instructor-led Python Programming course is a good way to get hands-on experience in using excel within Python.

    You can check the quality of your spreadsheet application by going over the checklist below:

    • Is the spreadsheet able to represent static data?
    • Is the spreadsheet able to mix data, calculations, and reports?
    • Is the data in your spreadsheet complete and consistent in nature?
    • Does the spreadsheet have an organized worksheet structure?

    This checklist will help you in verifying the qualitative nature of the spreadsheet application you’re going to work on.

    Practical Applications  

    In this article, we will be using openpyxl to work on data. With the help of this module, you can extract data from a database into an Excel spreadsheet, or you can also convert an Excel spreadsheet into a programmatic format. There can be a lot of possible situations where you might feel the need to use a package like openpyxl. Let us discuss a few of them to get a comprehensive overview of them.

    Importing New Products Into a Database 

    Consider yourself working in an online store company. When they want to add new products to the online store, they make an Excel spreadsheet with a few hundred rows along with the product's name, description, price, and a few more basic information, then give it to you. 

    Now, if you want to import this particular data, you need to iterate over each row of the spreadsheet and then add each of the products to the database of the online store.

    Exporting Database Data Into a Spreadsheet

    Consider you have a Database table. In this particular table, you have collected information about all your users which includes their names, contact number, email address, and so forth. Now, the Marketing Team is willing to collectively contact all the users and promote a new product of the company. However, neither do they have access to the Database nor do they have any idea about using SQL to extract the information. 

    In this situation, openpyxl comes to play. You can use it effectively to iterate over each User record and transform the required information into an Excel spreadsheet.    

    Appending Information to an Existing Spreadsheet

    Consider the same online store example we discussed above. You have an Excel spreadsheet with a list of users, and your job is to append to each row the total amount they have spent in your store.

    In order to perform this, you have to read the spreadsheet first and then iterate through each row and fetch the total amount spent from the Database. Finally, you need to write it back to the spreadsheet.

    Starting openpyxl

    You can install the openpyxl package using pip. Open your terminal and write the following command: 

    $ pip install openpyxl

    After you have installed the spreadsheet, you can make up your own simple spreadsheet: 

    from openpyxl import Workbook
    
    workbook = Workbook()
    spreadsheet = workbook.active
    
    spreadsheet["A1"] = "Hello"
    spreadsheet["B1"] = "World!"
    
    workbook.save(filename="HelloWorld.xlsx")

    How to Read Excel Spreadsheets with openpyxl 

    Let us start with the most important thing that you can do with a spreadsheet,i.e. read it. We will be using a watch sample dataset which contains a list of 100 watches with information like product name, product ID, review, and so forth.  

    A Simple Way to Read an Excel Spreadsheet 

    Let us start by opening our sample spreadsheet:

    >>> from openpyxl import load_workbook
    >>> workbook = load_workbook(filename="sample.xlsx")
    >>> workbook.sheetnames
    ['Sheet 1']
    
    >>> spreadsheet = workbook.active
    >>> spreadsheet
    <Worksheet "Sheet 1">
    
    >>> spreadsheet.title

    In the example code above, we open the spreadsheet using load_workbook, and then we check all the sheets that are available to work with using workbook.sheetnames. Then Sheet 1 is automatically selected using workbook.active since it is the first sheet available. This is the most common way of opening a spreadsheet.  

    Now, let us see the code to retrieve data from the spreadsheet: 

    >>> spreadsheet["A1"]
    <Cell 'Sheet 1'.A1>
    
    >>> spreadsheet["A1"].value
    'marketplace'
    
    >>> spreadsheet["F10"].value
    "G-Shock Men's Grey Sport Watch"

    You can retrieve the actual value and the cell value both. To get the actual value, use .valueand to get the cell, you can use .cell():

    >>> spreadsheet.cell(row=10, column=6)
    <Cell 'Sheet 1'.F10>
    
    >>> spreadsheet.cell(row=10, column=6).value
    "G-Shock Men's Grey Sport Watch"

    Importing Data from a Spreadsheet 

    In this section, we will discuss how to iterate through the data and conversion into a more useful format using Python.

    Let us first start with iterating through the data. There are a number of iterating methods that depend solely on the user.

    You can slice the data with a combination of rows and columns:

    >>> spreadsheet["A1:C2"]
    ((<Cell 'Sheet 1'.A1>, <Cell 'Sheet 1'.B1>, <Cell 'Sheet 1'.C1>), 
     (<Cell 'Sheet 1'.A2>, <Cell 'Sheet 1'.B2>, <Cell 'Sheet 1'.C2>)) 

    You can also iterate through the dataset by ranging between rows and columns: 

    >>> # Get all cells from column A 
    >>> spreadsheet["A"]
    (<Cell 'Sheet 1'.A1>, 
     <Cell 'Sheet 1'.A2>, 
     ... 
     <Cell 'Sheet 1'.A99>, 
     <Cell 'Sheet 1'.A100>) 
    
    >>> # Get all cells for a range of columns 
    >>> spreadsheet["A:B"] 
    ((<Cell 'Sheet 1'.A1>, 
      <Cell 'Sheet 1'.A2>, 
      ... 
      <Cell 'Sheet 1'.A99>, 
      <Cell 'Sheet 1'.A100>), 
     (<Cell 'Sheet 1'.B1>, 
      <Cell 'Sheet 1'.B2>, 
      ... 
      <Cell 'Sheet 1'.B99>, 
      <Cell 'Sheet 1'.B100>)) 
    
    >>> # Get all cells from row 5 
    >>> spreadsheet[5]
    (<Cell 'Sheet 1'.A5>,
     <Cell 'Sheet 1'.B5>,
     ... 
     <Cell 'Sheet 1'.N5>,
     <Cell 'Sheet 1'.O5>)
    
    >>> # Get all cells for a range of rows
    >>> spreadsheet[5:6]
    ((<Cell 'Sheet 1'.A5>, 
      <Cell 'Sheet 1'.B5>, 
      ... 
      <Cell 'Sheet 1'.N5>, 
      <Cell 'Sheet 1'.O5>), 
     (<Cell 'Sheet 1'.A6>, 
      <Cell 'Sheet 1'.B6>, 
      ... 
      <Cell 'Sheet 1'.N6>, 
      <Cell 'Sheet 1'.O6>)) 

    Python offers arguments by which you can set limits to the iteration with the help of Python generators like .iter_rows() and .iter_cols()

    >>> for row in spreadsheet.iter_rows(min_row=1,
    ...                           max_row=2,
    ...                           min_col=1,
    ...                           max_col=3):
    ...    print(row)
    (<Cell 'Sheet 1'.A1>, <Cell 'Sheet 1'.B1>, <Cell 'Sheet 1'.C1>) 
    (<Cell 'Sheet 1'.A2>, <Cell 'Sheet 1'.B2>, <Cell 'Sheet 1'.C2>)
    
    >>> for column in spreadsheet.iter_cols(min_row=1...                              max_row=2,
    ...                              min_col=1,
    ...                              max_col=3):
    ... print(column)
    (<Cell 'Sheet 1'.A1>, <Cell 'Sheet 1'.A2>) 
    (<Cell 'Sheet 1'.B1>, <Cell 'Sheet 1'.B2>) 
    (<Cell 'Sheet 1'.C1>, <Cell 'Sheet 1'.C2>) 

    You can also add Boolean values_only in the above example and set it to True to get the values of cell: 

    >>> for value in spreadsheet.iter_rows(min_row=1...                          max_row=2...                          min_col=1...                          max_col=3...                          values_only=True):
    ...    print(value)
    ('marketplace', 'customer_id', 'review_id') 
    ('US', 3653882, 'R3O9SGZBVQBV76')

    Since we are now done with iterating the data, let us now manipulate data using Python’s primitive data structures. 

    Consider a situation where you want to extract information about a product from the sample spreadsheet and then store it in the dictionary. The key to the dictionary would be the product ID.   

    Convert Data into Python classes

    To convert data into Python data classes, let us first decide what we want to store and how to store it.  

    The two essential elements that can be extracted from the data are as follows:

    Convert Data into Python classes

                                                         1. Products                                             2. Review

                                                              • ID                                                         • ID
                                                              • Title                                                     • Customers ID
                                                              • Parent                                                 • Headline
                                                              • Category                                            • Body
                                                                                                                             • Date

    Let us implement the two elements: 

    import datetime
    from dataclasses import dataclass
    
    @dataclass
    class Product:
        id: str
        parent: str
        title: str
        category: str
    @dataclass
    class Review:
        id: str
        customer_id: str
        stars: int
        headline: str
        body: str 
        date: datetime.datetime 

    The next step is to create a mapping between columns and the required fields: 

    >>> for value in spreadsheet.iter_rows(min_row=1,
    ...                             max_row=1,
    ...                             values_only=True):
    ...    print(value)
    ('marketplace', 'customer_id', 'review_id', 'product_id', ...)
    
    >>> # Or an alternative
    >>> for cell in sheet[1]:
    ...  print(cell.value)
    marketplace
    Customer_ID
    Review_ID
    Product_ID
    Product_Parent
    ...

    Finally, let us convert the data into new structures which will parse the data in the spreadsheet into a list of products and review objects: 

    from datetime import datetime 
    from openpyxl import load_workbook 
    from classes import Product,Review 
    from mapping import PRODUCT_ID,PRODUCT_PARENT,PRODUCT_TITLE, \
        PRODUCT_CATEGORY,REVIEW_DATE,REVIEW_ID,REVIEW_CUSTOMER, \
        REVIEW_STARS,REVIEW_HEADLINE,REVIEW_BODY
    
    # Using the read_only method since you're not gonna be editing the spreadsheet 
    workbook = load_workbook(filename="watch_sample.xlsx",read_only=True) 
    spreadsheet = workbook.active
    
    products = []
    reviews = []
    
    # Using the values_only because you just want to return the cell value
    for row in spreadsheet .iter_rows(min_row=2, values_only=True): 
        product = Product(id=row[PRODUCT_ID], 
                          parent=row[PRODUCT_PARENT], 
                          title=row[PRODUCT_TITLE], 
                          category=row[PRODUCT_CATEGORY]) 
        products.append(product)
    # You need to parse the date from the spreadsheet into a datetime format
    spread_date = row[REVIEW_DATE] 
    parsed_date = datetime.strptime(spread_date,"%Y-%m-%d")
    
    review = Review(id=row[REVIEW_ID],
                  Customer_ID=row[REVIEW_CUSTOMER],
                  stars=row[REVIEW_STARS],
                  headline=row[REVIEW_HEADLINE],
                  body=row[REVIEW_BODY],
                  date=parsed_date)
    reviews.append(review)
    
    print(products[0])
    print(reviews[0])

    After you execute the code, you will get an output that looks like this:

    Product(id='A90FALZ1ZC',parent=937111370,...)
    Review(id='D3O9OGZVVQBV76',customer_id=3903882,...)

    Appending Data 

    To understand how to append data, let us hover back to the first sample spreadsheet. We will open the document and append some data to it: 

    from openpyxl import load_workbook
    
    # Start by opening the spreadsheet and selecting the main sheet 
    workbook = load_workbook(filename="hello_world.xlsx")
    spreadsheet = workbook.active
    
    # Write what you want into a specific cell
    spreadsheet["C1"]="Manipulating_Data ;)"
    
    # Save the spreadsheet
    workbook.save(filename="hello_world_append.xlsx"

    If you open your Excel file, you will notice the additional Manipulating_Data being added to an adjacent cell. 

    Writing Excel Spreadsheets With openpyxl 

    A spreadsheet is a file that helps to store data in specific rows and columns. We can calculate and store numerical data and also perform computation using formulas. 

    So, let’s begin with some simple Spreadsheets and understand what each line means. 

    1. Creating our first simple Spreadsheet

     1 from openpyxl import Workbook
     2  
     3 filename = "first_program.xlsx"
     4  
     5 workbook = Workbook()
     6 spreadsheet = workbook.active
     7  
     8 sheet["A1"] = "first"
     9 sheet["B1"] = "program!"
    10  
    11 workbook.save(filename=filename)

    Line 5: In order to make a Spreadsheet, at first,  we have to create an Empty workbook to perform further operations. 

    Lines 8 and 9: We can add data to a specific cell as per our requirement. In this example, we can see that two values “first” and “program” have been added to specific cells in the sheet. 

    Line 11: The line shows how to save data after all the operations we have done. 

    2. Basic Spreadsheet Operations 

    Before going to the difficult coding part, where a Python advanced certification might be required to land a job, you need to strengthen the building blocks like adding and updating values, managing rows and columns, adding filters, styles, or formulas in a Spreadsheet. 

    We have already explained the following code by which we can add values to a Spreadsheet: 

    >>> spreadsheet["A1"] = "the_value_we_want_to_add"

    There is another way that we can add values to the Spreadsheet: 

    >>> cell = sheet["A1"]
    >>> cell
    <Cell 'Sheet'.A1>
    
    >>> cell.value
    'hello'
    
    >>> cell.value = "hey"

    Conclusion

    You've finished that lengthy read, and now you know how to use spreadsheets in Python. Your dependable friend, openpyxl, will do the following for you: 

    • Spreadsheets may be used to extract useful data in a Pythonic way. 

    • Make your own spreadsheets, regardless of their complexity. 

    • Enhance your spreadsheets with fun features like conditional formatting or charts. 

    Even though this tutorial might not have covered all openpyxl can accomplish, you can always visit the package's official documentation page to find out more. You may even review the package's source code and make more improvements. 

    If you have any questions or would need additional information on any topic, please leave a comment below. 

    Frequently Asked Questions (FAQs)

    1Can you automate Excel with Python?

    Yes, by following specific methods, Excel may be automated using Python. Data analysis is a key component of getting insightful information from your clients. Many firms are experimenting with novel approaches to streamline the Data Analysis process. Some are attempting to fully automate this process, while others are setting it up with a variety of tools. Connecting a data source with a computer language and having it automate data analysis is another cutting-edge method utilized by businesses. Python Excel Automation is one example of this kind of automation. 

    2How to read data from Excel using Python?

    Method 1: Reading an excel file using Python using Pandas 

    This approach involves importing the Pandas module first, after which we'll utilize Pandas to read our excel file. 

    dataframe1 = pd.read_excel('book2.xlsx')

    Method 2: Reading an excel file using Python using openpyxl 

    The Books.xlsx file is read-only when the load workbook() method is used. This function receives this file as an input. To read the values of the max row and max column parameters, the script has generated the object of the dataframe.active. 

    dataframe = openpyxl.load_workbook("Book2.xlsx")

    Method 3: Reading an excel file using Python using Xlwings 

    Similar to how it reads from an Excel file, Xlwings can be used to insert data into an Excel file. To a single cell or a group of cells, data can be entered as a list, a single input, or both. 

    ws = xw.Book("Book2.xlsx").sheets['Sheet1']

    3Is openpyxl or XlsxWriter better?

    Similar to xlsxwriter, openpyxl requires line-by-line writing. The first line, for instance, must have the format [1, "A," 10], but it differs in that it makes use of openpyxl.load workbook (filepath) Data may be added to an existing.xlsx file without overwriting it, but pandas must first combine data by column using a format like [[1,2,3],['A',['B',['C'],[...]]]. Each approach has benefits of its own, thus while constructing the table, the best one should be chosen based on requirements. 

    Profile

    Ramulu Enugurthi

    Blog Author

    Ramulu Enugurthi, a distinguished computer science expert with an M.Tech from IIT Madras, brings over 15 years of software development excellence. Their versatile career spans gaming, fintech, e-commerce, fashion commerce, mobility, and edtech, showcasing adaptability in multifaceted domains. Proficient in building distributed and microservices architectures, Ramulu is renowned for tackling modern tech challenges innovatively. Beyond technical prowess, he is a mentor, sharing invaluable insights with the next generation of developers. Ramulu's journey of growth, innovation, and unwavering commitment to excellence continues to inspire aspiring technologists.

    Share This Article
    Ready to Master the Skills that Drive Your Career?

    Avail your free 1:1 mentorship session.

    Select
    Your Message (Optional)

    Upcoming Programming Batches & Dates

    NameDateFeeKnow more
    Course advisor icon
    Course Advisor
    Whatsapp/Chat icon