How to Stand Out in a Python Coding Interview - Functions, Data Structures & Libraries


A coding interview primarily tests your technical skills and your knowledge of algorithms. However, if you want to stand out among hundreds of interviewees, you should also know how to use Python’s common features fluently.

The interview you face might be a remote coding challenge, a whiteboard challenge or a full-day on-site interview. If you can prove your coding skills in that setting, a job offer is far more likely to follow. You may go through some of the top Python interview questions and answers provided by experts, which are divided into three levels: beginner, intermediate and advanced. Thorough practice with these questions will help you prepare for roles such as Python Developer, Full Stack Engineer and other top profiles.

A Python coding interview is essentially a technical interview. It is not just about solving problems; it is also about how technically sound you are and how well you can write clean, productive Python code. That shows the depth of your Python knowledge and how well you can use Python’s built-in functions and libraries. Go through our Python Tutorials to learn more about concepts related to Python. Let us look at some of Python’s built-in functions and how to select the right one, the effective use of data structures, and how to take advantage of the standard library.

How to Select the Correct Built-in Function?

Python’s set of built-in functions is small compared to the standard library. The built-in functions are always available and do not need to be imported. It is worth learning each of them before sitting for an interview. Until then, let us look at a few built-in functions, how to use them, and which alternatives exist.

Perform iteration with enumerate() instead of range() 

Consider a situation during a coding interview: you have a list of elements and you have to iterate over it with access to both the indices and the values.

To see the difference between iterating with enumerate() and iterating with range(), let us take a look at the classic coding interview question FizzBuzz. It can be solved by iterating over both indices and values. You will be given a list of integers and your task will be as follows:

  1. Replace all integers divisible by 3 with “fizz”.
  2. Replace all integers divisible by 5 with “buzz”.
  3. Replace all integers divisible by both 3 and 5 with “fizzbuzz”.

Developers often reach for range() in these situations and access the elements by index:

>>> list_num = [30, 29, 10, 65, 95, 99]
>>> for i in range(len(list_num)):
...     if list_num[i] % 3 == 0 and list_num[i] % 5 == 0:
...         list_num[i] = 'fizzbuzz'
...     elif list_num[i] % 3 == 0:
...         list_num[i] = 'fizz'
...     elif list_num[i] % 5 == 0:
...         list_num[i] = 'buzz'
...
>>> list_num
['fizzbuzz', 29, 'buzz', 'buzz', 'buzz', 'fizz']

Although range() works in many iteration scenarios, enumerate() is the better choice here because it gives you each element’s index and value at the same time:

>>> list_num = [30, 29, 10, 65, 95, 99]
>>> for i, num in enumerate(list_num):
...     if num % 3 == 0 and num % 5 == 0:
...         list_num[i] = 'fizzbuzz'
...     elif num % 3 == 0:
...         list_num[i] = 'fizz'
...     elif num % 5 == 0:
...         list_num[i] = 'buzz'
...
>>> list_num
['fizzbuzz', 29, 'buzz', 'buzz', 'buzz', 'fizz']

The enumerate() function returns a counter and the element value for each element. The counter starts at 0 by default, so it doubles as the element’s index.

However, if you do not want the counter to start at 0, you can set an offset using the start parameter:

>>> list_num = [30, 29, 10, 65, 95, 99]
>>> for i, num in enumerate(list_num, start=11):
...     print(i, num)
...
11 30
12 29
13 10
14 65
15 95
16 99

You can access all of the same elements using the start parameter. However, the count will start from the specified integer value.
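
As a small sketch of the start parameter in practice, the snippet below numbers the lines of a text from 1, the way an editor would. The function name number_lines and the sample text are made up for illustration.

```python
def number_lines(text):
    """Return each line of text prefixed with a 1-based line number."""
    # enumerate() yields (counter, value) pairs; start=1 only shifts the counter,
    # the lines themselves are visited in the usual order.
    return [f"{n}: {line}" for n, line in enumerate(text.splitlines(), start=1)]


numbered = number_lines("alpha\nbeta\ngamma")
print(numbered)  # ['1: alpha', '2: beta', '3: gamma']
```

Because the counter is produced by enumerate(), there is no separate index variable to initialise or increment by hand.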

Using List Comprehensions in place of map() and filter()

Python supports list comprehensions, which are easier to read than map() and filter() and provide the same functionality. This is one of the reasons why Guido van Rossum, the creator of Python, felt that dropping map() and filter() was quite uncontroversial, although both functions remain available in Python 3.

An example that shows map() along with its equivalent list comprehension:

>>> list_num = [1, 2, 3, 4, 5, 6]
>>> def square_num(z):
...     return z * z
...
>>> list(map(square_num, list_num))
[1, 4, 9, 16, 25, 36]

>>> [square_num(z) for z in list_num]
[1, 4, 9, 16, 25, 36]

Both map() and the list comprehension return the same values, but the list comprehension is easier to read and understand.

An example that shows filter() and its equivalent list comprehension:

>>> def odd_num_check(z):
...     return bool(z % 2)
...
>>> list(filter(odd_num_check, list_num))
[1, 3, 5]

>>> [z for z in list_num if odd_num_check(z)]
[1, 3, 5]

The same holds for filter() as for map(): the return values are identical, but the list comprehension is easier to follow.

List comprehensions are easier to read, and beginners tend to grasp them more intuitively.

Developers coming from other programming languages might disagree, but using list comprehensions during your coding interview is more likely to show the interviewer that you know Python’s common idioms.
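
A comprehension can also do the work of map() and filter() together in one expression. The following is a minimal sketch; the variable names are chosen only for this illustration.

```python
numbers = [1, 2, 3, 4, 5, 6]

# map() + filter(): keep the odd values, then square them.
squares_functional = list(map(lambda z: z * z, filter(lambda z: z % 2, numbers)))

# The same result as a single list comprehension:
# the transformation comes first, the condition last.
squares_comprehension = [z * z for z in numbers if z % 2]

print(squares_functional)     # [1, 9, 25]
print(squares_comprehension)  # [1, 9, 25]
```

The comprehension reads left to right as "z squared, for each z in numbers, if z is odd", which is typically easier to explain aloud in an interview than the nested calls.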

Debugging With breakpoint() instead of print() 

Debugging is an essential part of writing software, and doing it well shows a knowledge of Python tooling that will help you develop quickly on the job. Using print() to debug a small problem may work at first, but it quickly clutters your code. A debugger started with breakpoint(), on the other hand, lets you pause the program and inspect its state interactively, which is usually faster than adding and removing print() calls.

If you’re using Python 3.7 or later, you can simply call breakpoint() at the point in your code where you want to debug, without importing anything:

# Complicated Code With Bugs
...
...
...
breakpoint()

Whenever you call breakpoint(), you are dropped into pdb, the Python debugger, by default. If you’re using Python 3.6 or older, you can use an explicit import that has the same effect:

import pdb; pdb.set_trace()

In this example, pdb.set_trace() drops you into pdb. Since it’s a bit harder to remember, breakpoint() is recommended whenever a debugger is needed. There are also other debuggers that you can try. Getting used to one before your interview is a great advantage, but you can always fall back on pdb since it’s part of the Python Standard Library and is always available.
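
The sketch below shows where a breakpoint() call might sit in a small function. The function average and its data are hypothetical. So that the snippet runs from start to finish without pausing, it temporarily replaces sys.breakpointhook, the hook that breakpoint() calls, with a stand-in that only records the call; in real debugging you would leave the hook alone and land in pdb.

```python
import sys


def average(values):
    total = sum(values)
    breakpoint()  # by default this pauses in pdb, where `total` and `values` can be inspected
    return total / len(values)


# For this demonstration only: swap in a hook that records the call instead of pausing.
calls = []
sys.breakpointhook = lambda *args, **kwargs: calls.append("breakpoint reached")

result = average([2, 4, 6])
print(result)  # 4.0
print(calls)   # ['breakpoint reached']

# Restore the default hook, which starts pdb.
sys.breakpointhook = sys.__breakpointhook__
```

With the default hook in place, setting the environment variable PYTHONBREAKPOINT=0 turns every breakpoint() call into a no-op, which is handy if a stray call is left in the code.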

Formatting Strings with the help of f-Strings

It can be confusing to know which type of string formatting to use, since Python offers a number of different techniques. For Python 3.6 or later, f-strings are a good default during a coding interview.

Literal string interpolation, better known as f-strings, is a powerful string formatting technique that is more readable, more concise, faster and less error-prone than the other techniques. It supports the string formatting mini-language, which makes string interpolation simpler. You can also embed variables and Python expressions, which are evaluated at run time:

>>> def name_and_age(name, age):
...     return f"My name is {name} and I'm {age / 10:.5f} years old."
...
>>> name_and_age("Alex", 21)
"My name is Alex and I'm 2.10000 years old."

The f-string lets you insert the name Alex and a formatted value computed from his age into the string in a single operation.

Note that template strings (string.Template) are the suggested choice if the output contains user-generated values.
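
To make the formatting mini-language more concrete, here is a short sketch with a few common format specifiers, followed by a string.Template substitution. The names and values are invented for illustration.

```python
from string import Template

name, score, total = "Alex", 41, 48

# An expression plus a format specifier inside the braces: percentage with one decimal.
summary = f"{name} scored {score}/{total} ({score / total:.1%})"
print(summary)  # Alex scored 41/48 (85.4%)

# Thousands separator and two decimal places.
amount = f"{1234567.891:,.2f}"
print(amount)  # 1,234,567.89

# Right-align within a field that is 8 characters wide.
padded = f"[{name:>8}]"
print(padded)  # [    Alex]

# string.Template only substitutes named placeholders; it does not evaluate expressions.
greeting = Template("Hello, $user!").substitute(user=name)
print(greeting)  # Hello, Alex!
```

Because a Template does nothing beyond replacing $-placeholders, it is the more conservative choice when the format string itself comes from outside your program.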

Sorting Complex Lists with sorted()

Many interview questions are based on sorting, and it is one of the most important concepts to be clear about before a coding interview. Unless the interviewer asks you to implement your own sorting algorithm, sorted() is almost always the better option.

The following example illustrates simple uses such as sorting numbers or strings:

>>> sorted([6,5,3,7,2,4,1])
[1, 2, 3, 4, 5, 6, 7]

>>> sorted(['IronMan', 'Batman', 'Thor', 'CaptainAmerica', 'DoctorStrange'], reverse=False)
['Batman', 'CaptainAmerica', 'DoctorStrange', 'IronMan', 'Thor']

sorted() sorts in ascending order by default, which is also what you get when the reverse argument is set to False.

If you are sorting complex data types, you can pass a key function that defines custom sorting rules:

>>> animal_list = [
...    {'type': 'bear', 'name': 'Stephan', 'age': 9},
...    {'type': 'elephant', 'name': 'Devory', 'age': 5},
...    {'type': 'jaguar', 'name': 'Moana', 'age': 7},
... ]
>>> sorted(animal_list, key=lambda animal: animal['age'])
[
    {'type': 'elephant', 'name': 'Devory', 'age': 5},
    {'type': 'jaguar', 'name': 'Moana', 'age': 7},
    {'type': 'bear', 'name': 'Stephan', 'age': 9},
]

You can easily sort a list of dictionaries by passing a lambda as the key. In the example above, the lambda returns each element’s age, so the list is sorted in ascending order by age.
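
Two variations on the key argument often come up in interviews: sorting in descending order, and sorting by more than one field. The sketch below uses operator.itemgetter as an alternative to a lambda; the fourth animal is a hypothetical entry added so that two records share an age and a type.

```python
from operator import itemgetter

animals = [
    {"type": "bear", "name": "Stephan", "age": 9},
    {"type": "elephant", "name": "Devory", "age": 5},
    {"type": "jaguar", "name": "Moana", "age": 7},
    {"type": "bear", "name": "Bruno", "age": 5},  # hypothetical extra entry
]

# Oldest first. sorted() is stable, so animals with equal ages keep their original order.
oldest_first = sorted(animals, key=itemgetter("age"), reverse=True)
names_by_age = [a["name"] for a in oldest_first]
print(names_by_age)  # ['Stephan', 'Moana', 'Devory', 'Bruno']

# A tuple key sorts by type first and uses age to break ties.
by_type_then_age = sorted(animals, key=lambda a: (a["type"], a["age"]))
names_by_type = [a["name"] for a in by_type_then_age]
print(names_by_type)  # ['Bruno', 'Stephan', 'Devory', 'Moana']
```

sorted() always returns a new list and leaves the input untouched, which makes it safe to derive several orderings from the same data.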

Effective Use of Data Structures

Data structures are one of the most important topics to know before going into an interview, and choosing the right data structure for a problem will certainly improve your performance.

Python’s standard data structure implementations are incredibly powerful and give a lot of default functionalities which will surely be helpful in coding interviews.

Storing Values with Sets

Make use of sets instead of lists whenever you want to remove duplicate elements from an existing dataset.

Consider a function random_word() that always returns a random word from a list of words:

>>> import random
>>> words = "all the words in the world".split()
>>> def random_word():
...     return random.choice(words)
...

Suppose you need to call random_word() repeatedly to get 1000 random selections and then return a data structure that contains every unique word.

Let us look at three approaches to execute this – two suboptimal approaches and one good approach.

Bad Approach 

An example that stores the values in a list and then converts it into a set:

>>> def unique_words():
...     words = []
...     for _ in range(1000):
...         words.append(random_word())
...     return set(words)
...
>>> unique_words()
{'world', 'all', 'in', 'the', 'words'}

In this example, building a list only to convert it into a set is unnecessary. Interviewers generally notice this kind of design and ask questions about it.

Worse Approach

You can store values into a list to avoid the conversion from list to a set. You can then check for the uniqueness by comparing new values with all current elements in the list:

>>> def unique_words():
      words = []
      for _ in range(1000):
          word = random_word()
          if word not in words:
              words.append(word)
      return words
>>> unique_words()
['world', 'all', 'the', 'words', 'in']

This approach is much worse than the previous one, since you have to compare every new word with every word already present in the list. In other words, the time complexity is much higher than in the earlier example.

Good Approach

In this example, you skip the list entirely and use a set from the beginning:

>>> def unique_words():
      words = set()
      for _ in range(1000):
          words.add(random_word())
      return words
>>> unique_words()
{'all', 'the', 'words', 'in', 'world'}

This approach differs from the second one in how the elements are stored: a set allows near-constant-time checks for whether a value is already present, whereas a list requires a linear-time lookup. The time complexity of this approach is O(N), which is much better than the second approach, whose time complexity grows at a rate of O(N²) in the worst case, when most of the values are unique.
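The same idea can be written more compactly with a set comprehension. The following is a minimal sketch rather than the only correct form; the draws parameter is an assumption added here so the number of selections can be changed.

```python
import random

words = "all the words in the world".split()

def random_word():
    return random.choice(words)

def unique_words(draws=1000):
    # A set comprehension adds each draw straight into a set,
    # so duplicates are discarded as they arrive.
    return {random_word() for _ in range(draws)}

result = unique_words()
print(result <= set(words))  # True: every collected word comes from the source list
print(len(result) <= 5)      # True: "the" occurs twice in the sentence but only once in a set
```

Because sets are unordered, the order in which the words are displayed can differ from one run to the next.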

Saving Memory with Generators

Although list comprehensions are convenient tools, they can lead to excessive memory use.

Consider a situation where you need to find the sum of the first 1000 squares starting with 1 using list comprehensions:

>>> sum([z * z for z in range(1, 1001)])
333833500

Your solution returns the correct answer by building a list of every perfect square and then summing the values. However, the interviewer then asks you to increase the number of perfect squares.

Initially, your program might work well, but as the count grows it will slow down, because the whole list has to be held in memory before sum() can start. Eventually it may exhaust the available memory.

However, you can resolve this memory issue just by replacing the brackets with parentheses:

>>> sum((z * z for z in range(1, 1001)))
333833500

When you change the brackets to parentheses, the list comprehension becomes a generator expression, which returns a generator object. The object calculates the next value only when asked.

Generators are mainly used on massive sequences of data and in situations when you want to retrieve data from a sequence but don’t want to access all of it at the same time.
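The difference can be observed directly. The sketch below compares the size of the list object with the size of the generator object and shows that the generator produces values on demand. The variable names are illustrative, and sys.getsizeof measures only the container itself, not the integers it refers to, so treat the comparison as a rough indication.

```python
import sys

squares_list = [z * z for z in range(1, 1001)]   # builds all 1000 values up front
squares_gen = (z * z for z in range(1, 1001))    # builds nothing until asked

# The list's size grows with the number of elements; the generator's does not.
print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))  # True

# Values are produced one at a time, on demand.
print(next(squares_gen))  # 1
print(next(squares_gen))  # 4

# sum() consumes whatever is left: the total minus the two values already taken.
print(sum(squares_gen) == sum(squares_list) - 5)  # True
```

A generator can be consumed only once. After the final sum() call above, squares_gen is exhausted, so keep a list if you need to iterate over the values again.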

Defining Default Values in Dictionaries with .get() and .setdefault()

Adding, modifying, or retrieving an item from a dictionary is one of the most basic programming tasks, and Python makes it easy. However, developers often check explicitly for keys even when it is not necessary.

Consider a situation where a dictionary named shepherd exists and you want to get the shepherd's name by explicitly checking for the key with a conditional:

>>> shepherd = {'age': 20, 'sheep': 'yorkie', 'size_of_hat': 'large'}
>>> if 'name' in shepherd:
      name = shepherd['name']
    else:
      name = 'The Man with No Name'
 
>>> name
'The Man with No Name'

In this example, the key name is looked up in the dictionary. If it exists, its value is used; otherwise, a default value is assigned.

You can use .get() in a single line instead of checking keys explicitly:

>>> name = shepherd.get('name', 'The Man with No Name')

The .get() method performs the same operation as the first approach, but the check is now handled automatically.

However, .get() does not help when you need to update the dictionary with a default value while still accessing the same key. In such a case, you again need to use explicit checking:

>>> if 'name' not in shepherd:
      shepherd['name'] = 'The Man with No Name'
 
>>> name = shepherd['name']

However, Python offers a more elegant way of doing this with .setdefault():

>>> name = shepherd.setdefault('name', 'The Man with No Name')

The .setdefault() method performs the same operation as the previous snippet. If name exists in shepherd, it returns the existing value; otherwise, it sets shepherd['name'] to The Man with No Name and returns that new value.
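To make the difference between the two methods concrete, the following sketch runs both against the same dictionary and records whether the key was stored afterwards. The helper variable names are illustrative assumptions.

```python
shepherd = {'age': 20, 'sheep': 'yorkie', 'size_of_hat': 'large'}

# .get() returns the fallback but leaves the dictionary untouched.
name_from_get = shepherd.get('name', 'The Man with No Name')
stored_after_get = 'name' in shepherd
print(name_from_get, stored_after_get)  # The Man with No Name False

# .setdefault() returns the fallback and also stores it under the key.
name_from_setdefault = shepherd.setdefault('name', 'The Man with No Name')
stored_after_setdefault = 'name' in shepherd
print(name_from_setdefault, stored_after_setdefault)  # The Man with No Name True

# Once the key exists, .setdefault() returns the stored value and ignores the new default.
print(shepherd.setdefault('name', 'Someone Else'))  # The Man with No Name
```

As a rule of thumb, reach for .get() when you only need to read a value and for .setdefault() when the default should also be written back into the dictionary.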

Taking Advantage of the Python Standard Library

Python is powerful on its own, and the standard library puts much more within reach with a single import statement. If you know how to make good use of the standard library, it will boost your coding interview skills.

How to Handle Missing Dictionary Keys?

You can use .get() and .setdefault() when you want to set a default for a single key. However, there will be situations where you need a default value for all possible unset keys, which comes up often in coding interviews.

Suppose you have a group of students and your task is to keep track of their grades on assignments. Each input value is a tuple of student_name and grade. You want to look up all the grades for a single student without iterating over the whole list.

An example that stores grade data using a dictionary:

>>> student_grades = {}
>>> grades = [
      ('alex', 89),
      ('bob', 95),
      ('charles', 81),
      ('alex', 94),
      ]
>>> for name, grade in grades:
      if name not in student_grades:
          student_grades[name] = []
      student_grades[name].append(grade)

>>> student_grades
{'alex': [89, 94], 'bob': [95], 'charles': [81]}

In the example above, you iterate over the list and check whether each name is already present in the dictionary. If it isn't, you add it to the dictionary with an empty list, and then you append the grade to that student's list of grades.

The previous approach works, but there is a cleaner approach for such cases using defaultdict:

>>> from collections import defaultdict
>>> student_grades = defaultdict(list)
>>> for name, grade in grades:
      student_grades[name].append(grade)

In this approach, a defaultdict is created with list as its default factory. Called with no arguments, list() returns an empty list. The defaultdict calls list() whenever a name does not exist yet, and then the grade is appended.

Using defaultdict, you define the default once for the whole dictionary and need not worry about defaults at the key level. It also results in much cleaner application code.
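The default factory does not have to be list. As a small sketch, the snippet below uses int as the factory to count how many grades each student has, and then shows one caveat worth mentioning in an interview. The name 'dana' and the variable assignments_per_student are hypothetical and used only for illustration.

```python
from collections import defaultdict

grades = [('alex', 89), ('bob', 95), ('charles', 81), ('alex', 94)]

# defaultdict(int) calls int() for a missing key, which returns 0.
assignments_per_student = defaultdict(int)
for name, _ in grades:
    assignments_per_student[name] += 1

print(dict(assignments_per_student))  # {'alex': 2, 'bob': 1, 'charles': 1}

# Caveat: merely reading a missing key inserts it with the default value.
print(assignments_per_student['dana'])    # 0
print('dana' in assignments_per_student)  # True
```

If you need to read a key without creating it, .get() still works on a defaultdict and does not trigger the default factory.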

How to Count Hashable Objects?

Suppose you have a long string of words with no punctuation and you are asked to count the number of times each word appears. In this case, you can use collections.Counter, which uses 0 as the default value for any missing element and makes it easier and cleaner to count occurrences of different objects:

>>> from collections import Counter
>>> words = "if I am there but if \
... he was not there then I was not".split()
>>> counts = Counter(words)
>>> counts
Counter({'if': 2, 'I': 2, 'there': 2, 'was': 2, 'not': 2, 'am': 1, 'but': 1, 'he': 1, 'then': 1})

When the list of words is passed to Counter, it stores each word along with the number of times that word occurs in the list.
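The default of 0 for missing elements can be checked with a short sketch. The word 'banana' is an arbitrary example of a word that does not occur in the text.

```python
from collections import Counter

counts = Counter("if I am there but if he was not there then I was not".split())

# A missing word reports 0 instead of raising KeyError...
print(counts['banana'])    # 0
# ...and, unlike defaultdict, the lookup does not add the key.
print('banana' in counts)  # False

# Counts can be topped up later with more words.
counts.update(['if', 'if'])
print(counts['if'])        # 4
```

Any hashable object can be counted this way, so the same pattern works for characters, numbers, or tuples as well as words.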

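If you want to know the two most common words in a list of strings like the one above, you can use .most_common(), which returns the n most frequent elements along with their counts:

>>> counts.most_common(2)
[('if', 2), ('I', 2)]

Five words are tied with a count of 2 here. Elements with equal counts are returned in the order in which they were first encountered.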

How to Access Common String Groups?

If you want to check whether 'A' > 'a', you can compare the characters' ASCII values. The answer is False, since the ASCII value of A is 65 and that of a is 97, which is greater.
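The comparison can be checked directly in Python. The short sketch below uses the built-in ord() to look up the code of each character; the list of fruit names is made-up sample data.

```python
# ord() returns the code point of a character; for ASCII letters this is the ASCII value.
print(ord('A'))   # 65
print(ord('a'))   # 97

# Strings compare by code point, so uppercase letters sort before lowercase ones.
is_greater = 'A' > 'a'
print(is_greater)  # False

fruits_sorted = sorted(['banana', 'Apple', 'cherry', 'Banana'])
print(fruits_sorted)  # ['Apple', 'Banana', 'banana', 'cherry']
```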

However, remembering the ASCII codes for lowercase and uppercase characters is difficult, and comparing raw codes is clumsy. You can use the much more convenient constants that are part of the string module.

An example that checks whether all the characters in a string are uppercase:

>>> import string
>>> def check_if_upper(word):
      for letter in word:
          if letter not in string.ascii_uppercase:
              return False
      return True
 
>>> check_if_upper('Thanks Alex')
False
>>> check_if_upper('ROFL')
True

The function check_if_upper iterates over the letters in word and checks whether each one is part of string.ascii_uppercase, which is set to the literal 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'.

The string module defines a number of constants that are frequently used and easy to read. They are as follows:

  • string.ascii_letters
  • string.ascii_uppercase
  • string.ascii_lowercase
  • string.digits
  • string.hexdigits
  • string.octdigits
  • string.punctuation
  • string.printable
  • string.whitespace
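As a brief sketch of how these constants can be put to work, the two helper functions below use string.hexdigits and string.punctuation, as they are named in the standard library's string module. The function names is_hex and strip_punctuation are hypothetical and exist only for this illustration.

```python
import string

def is_hex(text):
    # True only for non-empty text made up entirely of hexadecimal digits.
    return bool(text) and all(ch in string.hexdigits for ch in text)

def strip_punctuation(text):
    # Keep every character that is not listed in string.punctuation.
    return ''.join(ch for ch in text if ch not in string.punctuation)

print(is_hex('DeadBeef'))                  # True
print(is_hex('xyz'))                       # False
print(strip_punctuation('Hello, world!'))  # Hello world
print(string.ascii_letters == string.ascii_lowercase + string.ascii_uppercase)  # True
```

Using the named constants keeps the intent readable and avoids typing out character ranges by hand.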

Conclusion

Clearing an interview with confidence and panache is a skill. Being a good programmer is only a small part of the picture. You might fail to clear a few interviews, but following a good process will certainly help you in the long run. Enthusiasm is an important factor that has a big impact on your interview results, and so is practice. Brush up on all the common interview concepts and then practice different interview questions. Interviewers also tend to help during interviews if you communicate and interact well. Ask questions, and always talk through both a brute-force and an optimized solution.

Let us now sum up what we have learned in this article so far:

  • To use enumerate() to iterate over both indices and values.
  • To debug problematic code with breakpoint().
  • To format strings effectively with f-strings.
  • To sort lists with custom arguments.
  • To use generators instead of list comprehensions to save memory.
  • To define default values when looking up dictionary keys.
  • To count hashable objects with collections.Counter class.

We hope you have learned about the most powerful of Python's built-in functions, data structures, and standard library packages, which will help you write better, faster, and cleaner code. There is a lot more to learn about the language; join our Python certification course to gain more skills and knowledge.

Priyankur Sarkar

Data Science Enthusiast

Priyankur Sarkar loves to play with data, draw insights from it, and turn those insights into business growth. He is an electronics engineer with versatile experience as an individual contributor and as a team leader, and has actively worked towards building Machine Learning capabilities for organizations.


Suggested Blogs

Six Tips To Improve As Android Developer

Mobile applications are increasing day by day and that is all because of increasing craze and trend of Android development course among individuals. Evolution of these mobile applications has given people an interesting and innovative ways to stay connected with each other. The statistics say it all, Android market leads the position with 2.2 million apps available for download (as of June 2016). With the numbers only we can tell the increased demand for Android developers. Digital era has already started and we are already witnessing the emerging market for these new age jobs. Android development surely has a bright future but with proper Android development training and certification will take you to new heights. As the market for Android development is booming, it is mandatory for an Android developers to have the proper skill set. Let’s explore some basic and essential tips to become a good Android developer. Design does matter One of the primary areas to concentrate on will be your app design. You might be having a great idea and proper skill sets to become an android developer, but if you fail in to attract people just because of bad execution of your ideas, it does not matter how creative your app is because people are not going to turn their heads towards poorly designed applications. Android platform should be your primary concern We know that this article is primarily for android developers, but this point is worth noting. iOS gets more premium quality apps than android (at least sooner than android) because iOS users do not worry about spending a few bucks on their apps, android users, on the other hand, are less targeted towards purchasing the apps. This should not stop you from developing applications for android, the reason being there are more than 1000 million android devices out there and you will be missing out millions of customers. Releasing applications for free “How do I make my money back then?” is the first question that pops into your mind. 
Well as said before, android users want their applications to be free and you can make your money back by using ads in your application. If this does not work for you, develop two different applications with slightly different features (premium and regular). The premium app should have slightly more features than the normal version and you can release it as paid version while as the normal app can be released for free with slightly fewer features or with ads. But remember this, DO NOT compromise on quality on either of the applications. Being passionate leads to better development Building something that can solve real life problems can help App developers to kick start the early traction. It requires passion and persistence which lacks in many developers. Developing better applications requires some persistence in the work. Treating yourself as the most important user of the application will work in your favour. If you are really passionate about creating something out of the box, persistence is the only way. Top App developers around the globe would still create apps even if they are not getting paid. Teaming Up We all know that having a team makes work better. Developing android applications is nothing but working in a small core team and creating it bit by bit. Technically sound people working together will faster the work. Applications have many aspects which can’t be dealt by a single person. Having a small technical team with well-coordinated people can make things really easy. Languages to Focus On: A successful android developer needs to be proficient in few programming languages which will help in developing better application. These languages consist of Java, SQL and XML. Developers should be well versed in Java as it is the most in-demand language. Creating an android application requires the database and he should be well-versed in SQL. XML works along with SQL and Java as it performs tasks like parsing data feeds, designing UI and more. 
The above-mentioned points should be given priority before developing any android apps, remember that there are several other ways for you to become a good android developer. Intensive research on developing applications is strongly recommended.
Six Tips To Improve As Android Developer

Mobile applications are increasing day by day and ... Read More

What are Python KeyError Exceptions and How to Handle Them

There are times when you have written your code but while you execute, it might not run. These types of situations occur when the input is inappropriate or you try to open a file with a wrong path or try to divide a number by zero. Due to some errors or incorrect command the output will not be displayed. This is because of errors and exceptions which are a part of the Python programming language. Learn about such concepts and gain further knowledge by joining Python Programming Course.What is Exception Handling?Python raises exceptions when it encounters errors during execution. A Python Exception is basically a construct that signals any important event, such as a run-time error.Exception Handling is the process of responding to executions during computations, which often interrupts the usual flow of executing a program. It can be performed both at the software level as part of the program and also at hardware level using built-in CPU mechanisms.Why is Exception Handling Important?Although exceptions might be irritating when they occur, they play an essential role in high level languages by acting as a friend to the user.An error at the time of execution might lead to two things— either your program will die or will display a blue screen of death. On the other hand, exceptions act as communication tools. It allows the program to answer the questions — what, why and how something goes wrong and then terminates the program in a delicate manner.In simple words, exception handling protects against uncontrollable program failures and increases the potency and efficiency of your code. 
If you want to master yourself in programming, the knowledge of exceptions and how to handle them is very crucial, especially in Python.What are the Errors and Exceptions in Python?Python doesn’t like errors and exceptions and displays its dissatisfaction by terminating the program abruptly.There are basically two types of errors in the Python language-Syntax Error.Errors occuring at run-time or Exceptions.Syntax ErrorsSyntax Errors, also known as parsing errors, occur when the parser identifies an incorrect statement. In simple words, syntax error occurs when the proper structure or syntax of the programming language is not followed.An example of a syntax error:>>> print( 1 / 0 )) File "", line 1 print( 1 / 0 ))   ^SyntaxError: invalid syntaxExceptionsExceptions occur during run-time. Python raises an exception when your code has a correct syntax but it encounters a run-time issue which it is not able to handle.There are a number of defined built-in exceptions in Python which are used in specific situations. 
Some of the built-in exceptions are:ExceptionCause Of ErrorArithmeticErrorRaised when numerical computation fails.FloatingPointErrorRaised when floating point calculation fails.AssertionErrorRaised in case of failure of the Assert statement.ZeroDivisionErrorRaised when division or modulo by zero takes place for all numerical values.OverflowErrorRaised when result of an arithmetic operation is very large to be represented.IndexErrorRaised when an index is not found in a sequence.ImportErrorRaised when the imported module is not found.IndentationErrorRaised when indentation is not specified properly.KeyboardInterruptRaised when the user hits interrupt key.RuntimeErrorRaised when a generated error does not fall into any category.SyntaxErrorRaised when there is an error in Python syntax.IOErrorRaised when Python cannot access a file correctly on disk.KeyErrorRaised when a key is not found in a dictionary.ValueErrorRaised when an argument to a function is the right type but not in the right domain.NameErrorRaised when an identifier is not found in the local or global namespace.TypeErrorRaised when an argument to a function is not in the right type.There are another type of built-in exceptions called warnings. They are usually issued in situations where the user is alerted of some conditions. The condition does not raise an exception; rather it  terminates the program.What is a Python KeyError?Before getting into KeyError, you must know the meaning of dictionary and mapping in Python. Dictionary (dict) is an unordered collection of objects which deals with data type key. They are Python’s implementation of data structures and are also known as associative arrays. 
They comprise key-value pairs, in which each pair maps the key to its associated value.Dictionary is basically a data structure that maps one set of values into another and is the most common mapping in Python.Exception hierarchy of KeyError:->BaseException              ->Exception                         ->LookupError                                       ->KeyErrorA Python KeyError is raised when you try to access an invalid key in a dictionary. In simple terms, when you see a KeyError, it denotes that the key you were looking for could not be found.An example of KeyError:>>> prices = { 'Pen' : 10, 'Pencil' : 5, 'Notebook' : 25} >>> prices['Eraser'] Traceback (most recent call last): File "", line 1, in prices['Eraser'] KeyError: 'Eraser'Here, dictionary prices is declared with the prices of three items. The KeyError is raised when the item ‘Eraser’ is being accessed which is not present in prices.Whenever an exception is raised in Python, it is done using traceback, as you can see in the example code above. It tells why an exception is raised and what caused it.Let’s execute the same Python code from a file. 
This time, you will be asked to give the name of the item whose price you want to know:# prices.py prices = { 'Pen' : 10, 'Pencil' : 5, 'Notebook' : 25} item = input('Get price of: ') print(f'The price of {item} is {prices[item]}')You will get a traceback again but you’ll also get the information about the line from which the KeyError is raised:Get price of: Eraser Traceback (most recent call last): File "prices.py", line 5, in print(f'The price of {item} is {prices[item]}') KeyError: 'Eraser'The traceback in the example above provides the following information:A KeyError was raised.The key ‘Eraser’ was not found.The line number which raised the exception along with that line.Where else will you find a Python KeyError?Although most of the time, a KeyError is raised because of an invalid key in a Python dictionary or a dictionary subclass, you may also find it in other places in the Python Standard Library, such as in a zipfile. However, it denotes the same semantic meaning of the Python KeyError, which is not finding the requested key.An example of such:>>> from zipfile import ZipFile >>> my_zip_file = ZipFile('Avengers.zip') >>> my_zip_file.getinfo('Batman')Traceback (most recent call last): File "", line 1, in File "myzip.py", line 1119, in getinfo 'There is no item named %r in the archive' % name) KeyError: "There is no item named 'Batman' in the archive"In this example, the zipfile.ZipFile class is used to derive information about a ZIP archive ‘Batman’ using the getinfo() function. Here, the traceback indicates that the problem is not in your code but in the zipfile code, by showing the line which caused the problem. The exception raised here is not because of a LookUpError but rather due to the zipfile.ZipFile.getinfo()function call.When do you need to raise a Python KeyError?In Python Programming, it might be sensible at times to forcefully raise exceptions in your own code. 
You can usually raise an exception using the raise keyword and by calling the KeyError exception:>>> raise KeyError('Batman')Here, ‘Batman’ acts as the missing key. However, in most cases, you should provide more information about the missing key so that your next developer has a clear understanding of the problem.Conditions to raise a Python KeyError in your code:It should match the generic meaning behind the exception.A message should be displayed about the missing key along with the missing key which needs to be accessed.How to Handle a Python KeyError?The main motive of handling a Python KeyError is to stop unexpected KeyError exceptions to be raised. There are a number of number of ways of handling a KeyError exception.Using get()The get()is useful in cases where the exception is raised due to a failed dictionary LookupError. It returns either the specified key value or a default value.# prices.py prices = { 'Pen' : 10, 'Pencil' : 5, 'Notebook' : 25} item = input('Get price of: ') price = prices.get(item) if price:   print(f'The price of {item} is {prices[item]}')   else:   print(f'The price of {item} is not known')This time, you’ll not get a KeyError because the get() uses a better and safer method to retrieve the price and if not found, the default value is displayed:Get price of: EraserThe price of Eraser is not knownIn this example, the variable price will either have the price of the item in the dictionary or the default value ( which is None by default ).In the example above, when the key ‘Eraser’ is not found in the dictionary, the get() returns  None by default rather than raising a KeyError. You can also give another default value as a second argument by calling get():price = prices.get(item,0)If the key is not found, it will return 0 instead of None.Checking for KeysIn some situations, the get() might not provide the correct information. 
If it returns None, it will mean that the key was not found or the value of the key in Python Dictionary is actually None, which might not be true in some cases. In such situations, you need to determine the existence of a key in the dictionary. You can use the if and in operator to handle such cases. It checks whether a key is present in the mapping or not by returning a boolean (True or False) value:dict = dictionary() for i in range(50):   key = i % 10     if key in dict: dict[key] += 1 else: dict[key] = 1In this case, we do not check what the value of the missing key is but rather we check whether the key is in the dictionary or not. This is a special way of handling an exception which is used rarely.This technique of handling exceptions is known as Look Before You Leap(LBYL).Using try-exceptThe try-except block is one of the best possible ways to handle the KeyError exceptions. It is also useful where the get() and the if and in operators are not supported.Let’s apply the try-except block on our earlier retrieval of prices code:# prices.py prices = { 'Pen' : 10, 'Pencil' : 5, 'Notebook' : 25} item = input('Get price of: ') try: print(f'The price of {item} is {prices[item]}') except KeyError: print(f'The price of {item} is not known')Here, in this example there are two cases— normal case and a backup case. try block corresponds to the normal case and except block to the backup case. If the normal case doesn’t print the name of the item and the price and raises a KeyError, the backup case prints a different statement or a message.Using try-except-elseThis is another way of handling exceptions. The try-except-else  has three blocks— try block, except block and else block.The else condition in a try-except statement is useful when the try condition doesn’t raise an exception. 
However, it must follow all the except conditions.Let us take our previous price retrieval code to illustrate try-except-else:# prices.py prices = { 'Pen' : 10, 'Pencil' : 5, 'Notebook' : 25} item = input('Get price of:') try: print(f'The price of {item} is {prices[item]}') except KeyError: print(f'The price of {item} is not known') else: print(f'There is no error in the statement')First, we access an existing key in the try-except block. If the Keyerror is not raised, there are no errors. Then the else condition is executed and the statement is displayed on the screen.Using finallyThe try statement in Python can have an optional finally condition. It is used to define clean-up actions and is always executed irrespective of anything. It is generally used to release external sources.An example to show finally:# prices.py prices = { 'Pen' : 10, 'Pencil' : 5, 'Notebook' : 25} item = input('Get price of: ') try: print(f'The price of {item} is {prices[item]}') except KeyError: print(f'The price of {item} is not known') finally: print(f'The finally statement is executed')Remember, the finally statement will always be executed whether an exception has occurred or not.How to raise Custom Exceptions in Python?Python comprises of a number of built-in exceptions which you can use in your program. However, when you’re developing your own packages, you might need to create your own custom exceptions to increase the flexibility of your program.You can create a custom Python exception using the pre-defined class Exception:def square(x): if x
8280
What are Python KeyError Exceptions and How to Han...

There are times when you have written your code bu... Read More

How to Work With a PDF in Python

Whether it is an ebook, digitally signed agreements, password protected documents, or scanned documents such as passports, the most preferred file format is PDF or Portable Document Format. It was originally developed by Adobe and is a file format used to present and transfer documents easily and reliably. It uses the file extension .pdf. In fact, PDF being the most widely used digital media, is now considered as an open standard which is maintained by the International Standards Organization (ISO). Python has relatively easy syntax which makes it even easier for the ones who are in their initial stage of learning the language. The popular Python libraries are well suited and integrated which allows to easily extract documents from a PDF, rotate pages if required, split pdf to make separate documents, or add watermarks in them.Now an important question rises, why do we need Python to process PDFs? Well, processing a PDF falls under the category of text analytics. There are several libraries and frameworks available which are designed in Python exclusively for text analytics. This makes it easier to play with a PDF in Python. You can also extract information from PDF and use into Natural Language Processing or any other Machine Learning models. Get certified and learn more about Python Programming and apply those skills and knowledge in the real world.History of  pyPDF, PyPDF2, pyPDF4The first PyPDF package was released in 2005 and the last official release in 2010. After a year or so, a  company named Phasit sponsored a branch of the PyPDF called PyPDF2 which was consistent with the original package and worked pretty well for several years.A series of packages were released later on with the name of PyPDF3 and later renamed as PyPDF4. The biggest difference between PyPDF and the other versions was that the later versions supported Python3. PyPDF2 has been discarded recently. 
But since PyPDF4 is not fully backward compatible with the PyPDf2, it is suggested to use PyPDF2. You can also use a substitute package - pdfrw. Pdfrw was created by Patrick Maupin and allows you to perform all functions which PyPDF2 is capable of except a few such as encryption, decryption, and types of decompression.Some common libraries in PythonLet us look into some of the libraries Python offers to handle PDFs:PdfMiner It is a tool used to extract information from PDF documents. PDFMiner allows the user to analyze text data and obtain the definite location of a text. It provides information such as fonts and lines. We can also use it as a PDF transformer and a PDF parser.PyPDF2PyPDF2 is purely a Python library which allows users to split, merge, crop, encrypt, and transform PDFs. You can also add customized data, view options, and passwords to the documents. Tabula-pyIt is a Python wrapper of tabula-java which can read tables from PDF files and convert into Pandas Dataframe or into CSV/TSV/JSON file formats.SlateIt is a Python package which facilitates the extraction of information and is dependent on the PdfMiner package.PDFQueryA light Python wrapper which uses minimum code to extract data from PDFs.xPDFIt is an open source viewer of PDF which also includes an extractor, converter and other utilities. Out of all the libraries mentioned above, PyPDF2 is the most used to perform operations like extraction, merging, splitting and so on.Installing PyPDF2If you're using Anaconda, you can install PyPDF2 using pip or conda. To install PyPDF2 using pip, run the following command in the command line:pip install PyPDF2The module is case-sensitive. So you need to make sure that proper syntax is followed. The installation is really quick since PyPDF2 is free of dependencies.Extracting Document Information from a PDF in PythonPyPDF2 can be used to extract metadata and all sorts of texts from PDF when you are performing operations on preexisting PDF files. 
The types of data you can extract are:AuthorCreatorProducerSubjectTitleNumber of PagesTo understand it better, let us use an existing PDF in your system or you can go to Leanpub and download a book sample.The code for extracting the document information from the PDF—# get_doc_info.py from PyPDF2 import PdfFileReader def getinfo(path):     with open(path, 'rb') as f:         PDF = PdfFileReader(f)         information = PDF.getDocumentInfo()         numberofpages = PDF.getNumPages()     print(information)     author = information.author     creator = information.creator     producer =information .producer     subject = information.subject     title = information.title if __name__ == '__main__':     path = 'reportlab-sample.pdf'     getinfo(path)The output of the program above will look like—Here, we have firstly imported PdfFileReader from the PyPDF2 package. The class PdfFileReader is used to interact with PDF files like reading and extracting information using accessor methods. Then, we have created our own function getinfo with a PDF file as an argument and then called the getdocumentinfo(). This returned an instance of DocumentInformation. And finally we got extract information like the author, creator, subject or title, etc.getNumPages() is used to count the number of pages in the document. PdfMiner can be used when you want to extract text from a PDF file. It is potent and particularly designed for extracting text from PDF.We have learned to extract information from PDF. Now let’s learn how to rotate a PDF. Rotating pages in PDFA lot of times we receive PDFs which contain pages in landscape orientation instead of portrait. You may also find certain documents to be upside down, which happens while scanning a document or mailing. 
However, we can rotate the pages clockwise or counterclockwise as needed using Python with PyPDF2.

The code for rotating the pages is as follows:

    # rotate_pages.py
    from PyPDF2 import PdfFileReader, PdfFileWriter

    def rotate(pdf_path):
        pdf_write = PdfFileWriter()
        pdf_read = PdfFileReader(pdf_path)

        # Rotate page 90 degrees to the right
        page1 = pdf_read.getPage(0).rotateClockwise(90)
        pdf_write.addPage(page1)

        # Rotate page 90 degrees to the left
        page2 = pdf_read.getPage(1).rotateCounterClockwise(90)
        pdf_write.addPage(page2)

        # Add a page in normal orientation
        pdf_write.addPage(pdf_read.getPage(2))

        with open('rotate_pages.pdf', 'wb') as fh:
            pdf_write.write(fh)

    if __name__ == '__main__':
        path = 'mldocument.pdf'
        rotate(path)

The script writes its result to rotate_pages.pdf.

Here we first imported PdfFileReader and PdfFileWriter so that we can write out a new PDF file. Then we declared a function rotate() which takes the path to the PDF that is to be modified. Within the function, we created a read object, pdf_read, and a write object, pdf_write.

Then we used getPage() to grab the pages. The first two pages, page1 and page2, are rotated 90 degrees clockwise and 90 degrees counterclockwise respectively, using rotateClockwise() and rotateCounterClockwise().

We called addPage() after each rotation. This adds the rotated page to the write object. The last page we add is the third page, without any rotation.

Lastly, we used write() with a file-like parameter to write out the new PDF. The final PDF contains three pages: the first two are rotated in opposite directions and appear in landscape, and the third is in its normal orientation.

Now we will learn to merge different PDFs into one.

Merging PDFs

In many cases, we need to merge two PDFs into a single one. For example, suppose you are working on a project report and you need to print it and bind it into a book.
It contains a cover page followed by the project report. So you have two different PDFs and you want to merge them into one PDF. You can simply use Python to do so. Let us see how we can merge PDFs into one.

The code for merging two PDF documents using PyPDF2 is shown below:

    # pdf_merging.py
    from PyPDF2 import PdfFileReader, PdfFileWriter

    def pdfmerger(paths, output):
        pdfwrite = PdfFileWriter()

        for path in paths:
            pdfread = PdfFileReader(path)
            for page in range(pdfread.getNumPages()):
                # Add each page to the writer object
                pdfwrite.addPage(pdfread.getPage(page))

        # Write out the merged PDF
        with open(output, 'wb') as out:
            pdfwrite.write(out)

    if __name__ == '__main__':
        paths = ['document-1.pdf', 'document-2.pdf']
        pdfmerger(paths, output='merged.pdf')

Here we have created a function pdfmerger() which takes a list of input paths and a single output path. Then we created a PdfFileReader object for each PDF path, looped over its pages, and added each page to the write object. Finally, using write(), the object's contents are written to disk.

PyPDF2 makes the process of merging simpler by providing the PdfFileMerger class.

Code for merging documents using PdfFileMerger:

    # pdf_merger2.py
    import glob
    from PyPDF2 import PdfFileMerger

    def merger(output_path, input_paths):
        pdfmerge = PdfFileMerger()

        for path in input_paths:
            # append() adds every page of the document
            pdfmerge.append(path)

        with open(output_path, 'wb') as fileobj:
            pdfmerge.write(fileobj)

    if __name__ == '__main__':
        # Example pattern; adjust it to match your own input files
        paths = glob.glob('document-*.pdf')
        paths.sort()
        merger('merged.pdf', paths)

PdfFileMerger makes this simpler because we don't need to loop over the pages of each document ourselves. Here, we created the object pdfmerge and looped through the PDF paths. PyPDF2 appends the whole document automatically.
Finally, we write it out.

Let's perform the opposite of merging now!

Splitting PDFs

The PyPDF2 package can split a single PDF into multiple PDFs. Suppose we have a set of scanned documents in a single PDF and we need to separate the pages into different PDFs. We can simply use Python to select the pages we want to split out and get the work done.

Code for splitting a single PDF into multiple PDFs:

    # pdf_splitter.py
    import os
    from PyPDF2 import PdfFileReader, PdfFileWriter

    def splitpdf(path):
        # Base name of the input file, without directory or extension
        fname = os.path.splitext(os.path.basename(path))[0]
        pdf = PdfFileReader(path)

        for page in range(pdf.getNumPages()):
            # A fresh writer per page, so each output holds one page
            pdfwrite = PdfFileWriter()
            pdfwrite.addPage(pdf.getPage(page))

            outputfilename = '{}_page_{}.pdf'.format(fname, page + 1)

            with open(outputfilename, 'wb') as out:
                pdfwrite.write(out)

            print('Created: {}'.format(outputfilename))

    if __name__ == '__main__':
        path = 'document-1.pdf'
        splitpdf(path)

Here we have imported PdfFileReader and PdfFileWriter from PyPDF2. Then we created a function called splitpdf() which accepts the path of the PDF we want to split. The first line of the function extracts the base name of the input file, without its extension. Then we open the PDF and create a read object. Using the read object's getNumPages(), we loop over all the pages.

Inside the for loop, we create a new PdfFileWriter instance for every page of the input PDF and add that single page to it. We also build a unique filename from the original filename, the word 'page', and the page index plus 1, since page indices start at 0.

Once the script has run, each page of the input PDF will have been written to its own PDF file. Now let us learn how to add a watermark to a PDF and keep it secured.

Adding Overlays/Watermarks

An image or superimposed text on selected pages in a PDF document is referred to as a watermark.
A watermark adds a layer of security and helps protect our intellectual property, such as images and PDFs. Watermarks are also called overlays.

PyPDF2 allows us to watermark documents. We just need a PDF which contains our watermark text, image, or signature.

Code for adding a watermark to a PDF:

    # watermarker.py
    from PyPDF2 import PdfFileWriter, PdfFileReader

    def watermark(inputpdf, outputpdf, watermarkpdf):
        # The first page of the watermark PDF is used as the overlay
        watermarkreader = PdfFileReader(watermarkpdf)
        watermarkpage = watermarkreader.getPage(0)

        pdf = PdfFileReader(inputpdf)
        pdfwrite = PdfFileWriter()

        for page in range(pdf.getNumPages()):
            pdfpage = pdf.getPage(page)
            # Merge the watermark onto the current page
            pdfpage.mergePage(watermarkpage)
            pdfwrite.addPage(pdfpage)

        with open(outputpdf, 'wb') as fh:
            pdfwrite.write(fh)

    if __name__ == '__main__':
        watermark(inputpdf='document-1.pdf',
                  outputpdf='watermarked_w9.pdf',
                  watermarkpdf='watermark.pdf')

The script writes the watermarked document to the path given as outputpdf.

The function watermark() takes three arguments:

inputpdf: The path of the PDF that is to be watermarked.
outputpdf: The path where the watermarked PDF will be saved.
watermarkpdf: The PDF which contains the watermark.

First, we extract the page which contains the watermark image or text from the watermark PDF, and then open the PDF we want to watermark.

Using inputpdf, we create a read object. We also create a write object, pdfwrite, to write out the watermarked PDF, and then iterate over the pages.

Next, we call the page object's mergePage() to apply the watermark and add the result to the write object pdfwrite.

When the loop terminates, the watermarked PDF is written out to disk and it's done!

Encrypting a PDF

PyPDF2 lets you set an owner password, which gives its holder administrator privileges over the PDF.
The package also supports a user password, which is the password required to open the document.

PyPDF2 does not let you set individual permissions on a PDF file, but it does let you set the owner password and the user password.

Code to add a password and encryption to a PDF:

    # pdf_encrypt.py
    from PyPDF2 import PdfFileWriter, PdfFileReader

    def encryption(inputpdf, outputpdf, password):
        pdfwrite = PdfFileWriter()
        pdfread = PdfFileReader(inputpdf)

        # Copy every page, since the whole document is encrypted
        for page in range(pdfread.getNumPages()):
            pdfwrite.addPage(pdfread.getPage(page))

        pdfwrite.encrypt(user_pwd=password, owner_pwd=None,
                         use_128bit=True)

        with open(outputpdf, 'wb') as fh:
            pdfwrite.write(fh)

    if __name__ == '__main__':
        encryption(inputpdf='document-1.pdf',
                   outputpdf='document-1-encrypted.pdf',
                   password='twofish')

We declare a function named encryption() with three arguments: the input PDF path, the output PDF path, and the password we want to set. Then we create one read object, pdfread, and one write object, pdfwrite. We loop over all the pages and add them to the write object, since we need to encrypt the entire document.

Finally, we call encrypt(), which accepts three parameters: the user password, the owner password, and whether or not to use 128-bit encryption. The PDF will be encrypted with 40-bit encryption if the argument use_128bit is set to False. Also, if the owner password is set to None, it will automatically be set to the user password.

Reading the Table data from PDF

If you want to work with table data in a PDF, you can use tabula-py to read the tables.
To install tabula-py, run:

    pip install tabula-py

Code to read a table from a PDF using tabula-py:

    import tabula

    # Read the PDF file that contains the table data.
    # read_pdf loads the table into a pandas DataFrame.
    # Note: newer tabula-py releases return a list of DataFrames here.
    df = tabula.read_pdf("document.pdf")

    # Print the first 5 rows of the table
    df.head()

If your PDF file contains multiple tables:

    df = tabula.read_pdf("document.pdf", multiple_tables=True)

If you want to extract information from a specific area of a specific page of the PDF:

    tabula.read_pdf("document.pdf", area=(126, 149, 212, 462), pages=1)

If you want the output in JSON format:

    tabula.read_pdf("offense.pdf", output_format="json")

Exporting PDF into Excel

Suppose you want to export the tables in a PDF so that they can be opened in Excel. tabula-py's convert_into() writes CSV, TSV, or JSON files, so you can convert the PDF data into CSV and open that file in Excel:

    tabula.convert_into("document.pdf", "document_testing.csv", output_format="csv")

Let us sum up what we have learned in the article:

Extraction of data from a PDF
Rotate pages in a PDF
Merge PDFs into one PDF
Split a PDF into many PDFs
Add watermarks or overlays in a PDF
Add password or encryption to a PDF
Reading table from PDF
Exporting PDF into Excel or CSV

As you have seen, PyPDF2 is one of the most useful tools available in Python. The features of PyPDF2 make life easier whether you are working on a large project or just want to make some quick changes to your PDF documents. Learn more about such libraries and frameworks as KnowledgeHut offers a Python Certification Course for Programmers, Developers, Jr./Sr. Software Engineers/Developers and anybody who wants to learn Python.
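Returning to the splitting step summarized above: the splitting script writes one file per page, but fixed-size chunks are sometimes more convenient. The following is a minimal sketch in plain Python that only works out which page indices belong to each chunk and what each output file could be called. The helper names page_chunks and chunk_filename, and the naming pattern, are assumptions made for illustration and are not part of PyPDF2.

```python
def page_chunks(num_pages, chunk_size):
    """Group 0-based page indices into consecutive ranges of up to chunk_size.

    Hypothetical helper for illustration; not part of PyPDF2.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        range(start, min(start + chunk_size, num_pages))
        for start in range(0, num_pages, chunk_size)
    ]


def chunk_filename(base_name, pages):
    """Build an output name from 1-based first and last page numbers.

    The '<name>_pages_<first>-<last>.pdf' pattern is an assumed convention.
    """
    return '{}_pages_{}-{}.pdf'.format(base_name, pages[0] + 1, pages[-1] + 1)


if __name__ == '__main__':
    # Plan how a 7-page document would be split into 3-page chunks
    for pages in page_chunks(7, 3):
        print(chunk_filename('document-1', pages), list(pages))
```

Each range could then be looped over with getPage() and addPage() in the same way as in the splitting script, using one writer object per chunk instead of one per page.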
How to Work With a PDF in Python
