What are List Methods in Python

Sequences are among the most basic data types in Python. Every element of a sequence is assigned a unique number called its position or index. The first index is zero, the second is one, and so forth. Python comes with several built-in sequence types; the most used ones are lists and tuples, and this article discusses lists and their methods.

Certain operations are common to sequence types; these include concatenation (adding), repetition (multiplying), indexing, slicing, and so on. For added convenience, Python provides built-in functions to find the length of a sequence and its largest and smallest elements. If you are interested in learning more about other functions and features of Python, you may go through our Python tutorial.

What is a List

A list is the most versatile data type available in Python. It is written as a collection of comma-separated values (items) between square brackets. The items in a list need not be homogeneous, i.e. of the same type, which makes the list one of the most powerful tools in Python, e.g. ['HELLO', 57, "SKY"]. A single list can contain different data types such as integers and strings, as well as other objects. Lists are mutable, so they can be changed even after their creation.

In Python, lists are ordered and have a definite count. Elements are indexed in sequence, with 0 as the starting index. Every element has its own place in the list, which is why duplicates are allowed: each duplicate occupies its own distinct position.
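The properties described above (mixed types, mutability, duplicates, and the operations shared by sequences) can be seen in a minimal sketch. The variable names and values here are illustrative assumptions, not part of the original article.

```python
# Illustrative values only; none of these names come from the article.
mixed = ['HELLO', 57, "SKY"]   # items of different types in one list

# Lists are mutable: an item can be replaced after creation.
mixed[1] = 58

# Duplicates are allowed because each item has its own position.
mixed.append('HELLO')

# Operations shared by sequences: concatenation and repetition.
combined = mixed + [1, 2]
repeated = [0] * 3

print(mixed)                            # ['HELLO', 58, 'SKY', 'HELLO']
print(len(combined))                    # 6
print(repeated)                         # [0, 0, 0]
print(max([4, 9, 2]), min([4, 9, 2]))   # 9 2
```

Note that max() and min() are called on a list of numbers only; comparing a string with an integer would raise a TypeError in Python 3.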
Lists are a useful tool for storing a sequence of data. Creating a list is as simple as putting comma-separated values, possibly of different types, between square brackets:

    list1 = ['jack', 'jill', 1998, 2019]
    list2 = [1, 2, 3, 4, 5]
    list3 = ["w", "x", "y", "z"]

Just like string indices, list indices start at 0, and lists can be sliced, concatenated and so on.

Creating a List

Python lists can be created just by placing integers, strings, or other values inside square brackets []. Unlike an empty set, which needs the built-in set() function, a list does not require a built-in function for its creation.

    # Python program to demonstrate the creation of a list

    # Creating a List
    List = []
    print("Initial blank List: ")
    print(List)

    # Creating a List with
    # the use of a String
    List = ['PythonListDemo']
    print("\nList with the use of String: ")
    print(List)

    # Creating a List with
    # the use of multiple values
    List = ["Python", "List", "Demo"]
    print("\nList containing multiple values: ")
    print(List[0])
    print(List[2])

    # Creating a Multi-Dimensional List
    # (By Nesting a list inside a List)
    List = [['Python', 'List'], ['Demo']]
    print("\nMulti-Dimensional List: ")
    print(List)

Output:

    Initial blank List:
    []

    List with the use of String:
    ['PythonListDemo']

    List containing multiple values:
    Python
    Demo

    Multi-Dimensional List:
    [['Python', 'List'], ['Demo']]

Creating a list with multiple distinct or duplicate elements

Multiple distinct or duplicate values can be stored as a sequence when the list is created:

    # Creating a List with
    # the use of Numbers
    # (Having duplicate values)
    List = [1, 2, 4, 4, 3, 3, 3, 6, 5]
    print("\nList with the use of Numbers: ")
    print(List)

    # Creating a List with
    # mixed type of values
    # (Having numbers and strings)
    List = [1, 2, 'Python', 4, 'List', 6, 'Demo']
    print("\nList with the use of Mixed Values: ")
    print(List)

Output:

    List with the use of Numbers:
    [1, 2, 4, 4, 3, 3, 3, 6, 5]

    List with the use of Mixed Values:
    [1, 2, 'Python', 4, 'List', 6, 'Demo']

Adding Elements to a List

Using append() method

Elements can be added to a list with the built-in append() method. append() adds only one element at a time; to add multiple elements with it, use a loop.
Unlike a set, which cannot hold a list as an element, a list can have another list appended to it with append(); the appended list becomes a single nested element.

    # Python program to demonstrate addition of elements in a List

    # Creating a List
    List = []
    print("Initial blank List: ")
    print(List)

    # Addition of Elements
    # in the List
    List.append(1)
    List.append(2)
    List.append(4)
    print("\nList after Addition of Three elements: ")
    print(List)

    # Adding elements to the List
    # using Iterator
    for i in range(1, 4):
        List.append(i)
    print("\nList after Addition of elements from 1-3: ")
    print(List)

    # Addition of List to a List
    List2 = ['Python', 'List']
    List.append(List2)
    print("\nList after Addition of a List: ")
    print(List)

Output:

    Initial blank List:
    []

    List after Addition of Three elements:
    [1, 2, 4]

    List after Addition of elements from 1-3:
    [1, 2, 4, 1, 2, 3]

    List after Addition of a List:
    [1, 2, 4, 1, 2, 3, ['Python', 'List']]

Using insert() method

The append() method adds elements only at the end of the list. With the insert() method, elements can be added at any desired position.
Unlike append(), which requires only one argument, the insert() method requires two arguments that define the position and the value of the element to be inserted: (position, value).

    # Python program to demonstrate addition of elements in a List

    # Creating a List
    List = [1, 2, 3, 4]
    print("Initial List: ")
    print(List)

    # Addition of Element at
    # specific Position
    # (using Insert Method)
    List.insert(3, 12)
    List.insert(0, 'Python')
    print("\nList after performing Insert Operation: ")
    print(List)

Output:

    Initial List:
    [1, 2, 3, 4]

    List after performing Insert Operation:
    ['Python', 1, 2, 3, 12, 4]

Using extend() method

Apart from append() and insert(), there is another method for adding elements: extend().
This method is used for adding multiple elements to the end of the list at once.

    # Python program to demonstrate
    # Addition of elements in a List

    # Creating a List
    List = [1, 2, 3, 4]
    print("Initial List: ")
    print(List)

    # Addition of multiple elements
    # to the List at the end
    # (using Extend Method)
    List.extend([8, 'Python', 'Program'])
    print("\nList after performing Extend Operation: ")
    print(List)

Output:

    Initial List:
    [1, 2, 3, 4]

    List after performing Extend Operation:
    [1, 2, 3, 4, 8, 'Python', 'Program']

Accessing elements from the List

To access the items in a list, the index number is used as a reference. The index operator [ ] is used to access an item of a list.
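As a small sketch of what the index operator accepts, the snippet below records which exceptions Python raises for an out-of-range index and for a non-integer index. The list `items` and the helper list `caught` are hypothetical names introduced only for this illustration.

```python
# Hypothetical list used only for this sketch.
items = ["Access", "List", "Elements"]
caught = []   # names of the exceptions raised below

print(items[1])   # a valid integer index

try:
    items[3]      # valid indices here are 0, 1 and 2
except IndexError:
    caught.append("IndexError")

try:
    items[1.0]    # a float is not accepted as a list index
except TypeError:
    caught.append("TypeError")

print(caught)     # ['IndexError', 'TypeError']
```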
The index should be an integer, and nested lists are accessed by using nested indexing.

    # Python program to demonstrate
    # accessing of element from list

    # Creating a List with
    # the use of multiple values
    List = ["Access", "List", "Elements"]

    # accessing an element from the
    # list using index number
    print("Accessing an element from the list")
    print(List[0])
    print(List[2])

    # Creating a Multi-Dimensional List
    # (By Nesting a list inside a List)
    List = [['Access', 'List'], ['Elements']]

    # accessing an element from the
    # Multi-Dimensional List using
    # index number
    print("\nAccessing an element from a Multi-Dimensional list")
    print(List[0][1])
    print(List[1][0])

Output:

    Accessing an element from the list
    Access
    Elements

    Accessing an element from a Multi-Dimensional list
    List
    Elements

Negative indexing

In Python, negative indexing refers to positions in the list counted from the end. Rather than calculating the offset as in List[len(List)-3], we can simply write List[-3]. Here, -1 refers to the last item, -2 refers to the second-last item, and so on, counting back from the end.

    List = [1, 2, 'Python', 4, 'Negative', 6, 'Index']

    # Accessing an element using negative indexing
    print("Accessing element using negative indexing")

    # print the last element of list
    print(List[-1])

    # print the third last element of list
    print(List[-3])

Output:

    Accessing element using negative indexing
    Index
    Negative

Removing Elements from the List

Using remove() method

Elements can be removed from a list with the built-in remove() method, but an error (ValueError) is raised if the element is not present in the list. remove() removes only one element at a time; to remove a range of elements, a loop is used. Note that this method removes only the first occurrence of the searched element: if the element occurs several times, the later occurrences stay in the list.

    # Python program to demonstrate removal of elements in a List

    # Creating a List
    List = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
    print("Initial List: ")
    print(List)

    # Removing elements from List
    # using Remove() method
    List.remove(5)
    List.remove(6)
    print("\nList after removal of two elements: ")
    print(List)

    # Removing elements from List
    # using iterator method
    for i in range(1, 5):
        List.remove(i)
    print("\nList after removing a range of elements: ")
    print(List)

Output:

    Initial List:
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

    List after removal of two elements:
    [1, 2, 3, 4, 7, 8, 9, 10, 11, 12]

    List after removing a range of elements:
    [7, 8, 9, 10, 11, 12]

Using pop() method

We can also remove and return an element from the list with the pop() method; by default it removes only the last element. To remove an element at a specific position, pass its index as an argument to pop().

    List = [1, 2, 3, 4, 5]

    # Removing element from the
    # List using the pop() method
    List.pop()
    print("\nList after popping an element: ")
    print(List)

    # Removing element at a
    # specific location from the
    # List using the pop() method
    List.pop(2)
    print("\nList after popping a specific element: ")
    print(List)

Output:

    List after popping an element:
    [1, 2, 3, 4]

    List after popping a specific element:
    [1, 2, 4]

Slicing of a List

Although there are several ways to print the whole list with all its elements, the usual way to print a specific range of elements is the slice operation, which is written with a colon (:). To print elements from the beginning up to an index, use [:Index]; to print elements from the beginning up to a position counted from the end, use [:-Index]; to print elements from a specific index to the end, use [Index:]; to print elements within a specific range, use [Start Index:End Index]; and to print the entire list with the slicing operation, use [:]. In every case the element at the end index itself is not included.
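The slice forms listed above can be sketched on a short list. The list `letters` is a hypothetical sample chosen so that each result is easy to check by eye; it does not come from the original article.

```python
# Hypothetical sample list; the name `letters` is not from the article.
letters = ['a', 'b', 'c', 'd', 'e', 'f']

print(letters[:2])    # ['a', 'b']: from the beginning, index 2 excluded
print(letters[:-2])   # ['a', 'b', 'c', 'd']: stops two items before the end
print(letters[2:])    # ['c', 'd', 'e', 'f']: from index 2 to the end
print(letters[1:4])   # ['b', 'c', 'd']: indices 1, 2 and 3
print(letters[:])     # the whole list, as a shallow copy

# A slice is a new list, so changing it leaves the original untouched.
head = letters[:2]
head[0] = 'z'
print(letters[0])     # a
```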
Moreover, to print the entire list in reverse order, use [::-1]. To select elements counted from the rear end of the list, use negative indexes.

    # Python program to demonstrate slicing of elements in a List

    # Creating a List
    List = ['P', 'Y', 'T', 'H', 'O', 'N', 'P', 'R', 'O', 'G', 'R', 'A', 'M']
    print("Initial List: ")
    print(List)

    # Print elements of a range
    # using Slice operation
    Sliced_List = List[3:10]
    print("\nSlicing elements in a range 3-10: ")
    print(Sliced_List)

    # Print elements from a
    # pre-defined point to end
    Sliced_List = List[6:]
    print("\nElements sliced from 6th element till the end: ")
    print(Sliced_List)

    # Printing elements from
    # beginning till end
    Sliced_List = List[:]
    print("\nPrinting all elements using slice operation: ")
    print(Sliced_List)

Output:

    Initial List:
    ['P', 'Y', 'T', 'H', 'O', 'N', 'P', 'R', 'O', 'G', 'R', 'A', 'M']

    Slicing elements in a range 3-10:
    ['H', 'O', 'N', 'P', 'R', 'O', 'G']

    Elements sliced from 6th element till the end:
    ['P', 'R', 'O', 'G', 'R', 'A', 'M']

    Printing all elements using slice operation:
    ['P', 'Y', 'T', 'H', 'O', 'N', 'P', 'R', 'O', 'G', 'R', 'A', 'M']

Negative index List Slicing

    # Creating a List
    List = ['P', 'Y', 'T', 'H', 'O', 'N', 'P', 'R', 'O', 'G', 'R', 'A', 'M']
    print("Initial List: ")
    print(List)

    # Print elements from beginning
    # to a pre-defined point using Slice
    Sliced_List = List[:-7]
    print("\nElements sliced till 7th element from last: ")
    print(Sliced_List)

    # Print elements of a range
    # using negative index List slicing
    Sliced_List = List[-6:-1]
    print("\nElements sliced from index -6 to -1")
    print(Sliced_List)

    # Printing elements in reverse
    # using Slice operation
    Sliced_List = List[::-1]
    print("\nPrinting List in reverse: ")
    print(Sliced_List)

Output:

    Initial List:
    ['P', 'Y', 'T', 'H', 'O', 'N', 'P', 'R', 'O', 'G', 'R', 'A', 'M']

    Elements sliced till 7th element from last:
    ['P', 'Y', 'T', 'H', 'O', 'N']

    Elements sliced from index -6 to -1
    ['R', 'O', 'G', 'R', 'A']

    Printing List in reverse:
    ['M', 'A', 'R', 'G', 'O', 'R', 'P', 'N', 'O', 'H', 'T', 'Y', 'P']

Updating Lists

You can update single or multiple elements of a list by reassigning values at individual indexes.

    # Python program to update elements of a list
    List = ['physics', 'chemistry', 1998, 2019]
    print("Value available at index 2 : ")
    print(List[2])
    List[2] = 2000
    print("New value available at index 2 : ")
    print(List[2])

Output:

    Value available at index 2 :
    1998
    New value available at index 2 :
    2000

Built-in functions

FUNCTION | DESCRIPTION
sum() | Adds all the numbers in the list.
ord() | Returns an integer that represents the Unicode code point of the given character.
cmp() | Python 2 only (removed in Python 3). If the first list is "greater" than the second list, the function returns 1.
max() | Returns the largest element in the list.
min() | Returns the smallest element in the list.
all() | Returns True if all elements are true (or if the list is empty) and False if any element is false.
any() | Returns True if at least one element of the list is true. If the list is empty, it returns False.
len() | Returns the length of the list.
enumerate() | Returns an enumerate object that pairs a counter with each element and can be used directly in for loops.
accumulate() | Provided by the itertools module. Makes an iterator that returns accumulated results (running sums by default). It can take a function as an argument.
filter() | Keeps only the elements for which a given function returns true.
map() | Applies a function to each item of an iterable and returns the results (as an iterator in Python 3).
lambda | A keyword rather than a function. It defines an anonymous function that behaves like a normal function with regard to arguments. While normal functions are defined with the def keyword, anonymous functions are defined using the lambda keyword.

List Methods

METHOD | DESCRIPTION
append() | Adds an element at the end of the list
extend() | Adds all elements of one list to the end of another list
insert() | Inserts an item at a desired index
remove() | Removes the first occurrence of an item from the list
pop() | Removes and returns the element at a desired index (the last element by default)
clear() | Removes all elements from the list
index() | Returns the index of the first matching item
count() | Returns the number of occurrences of the item passed as argument
sort() | Sorts the items of a list in ascending order
reverse() | Reverses the list
copy() | Returns a shallow copy of the list

Summary

In this article, we have covered the concept of lists in Python. You have learned the basics of creating a list, adding values to it, accessing its elements, removing elements, and various other operations.
We have also covered some basic built-in functions of Python and several list methods along with what they do. To gain more knowledge about Python tips and tricks, check our Python tutorial and get a good hold over coding in Python by joining the Python certification course.
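The List Methods table names several methods that the worked examples in this article do not exercise: index(), count(), copy(), sort(), reverse() and clear(). A minimal sketch of them follows; the list `scores` and its values are assumptions made up for illustration.

```python
# Hypothetical data; the variable names are illustrative only.
scores = [40, 10, 30, 10, 20]

print(scores.index(10))   # 1: position of the first matching item
print(scores.count(10))   # 2: number of occurrences of 10

backup = scores.copy()    # shallow copy, unaffected by the changes below

scores.sort()             # sorts in place, in ascending order
print(scores)             # [10, 10, 20, 30, 40]

scores.reverse()          # reverses in place
print(scores)             # [40, 30, 20, 10, 10]

scores.clear()            # removes every element
print(scores)             # []
print(backup)             # [40, 10, 30, 10, 20]
```

sort(), reverse() and clear() change the list in place and return None, which is why the sketch prints the list itself after each call instead of the method's return value.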

Priyankur Sarkar

Data Science Enthusiast

Priyankur Sarkar loves to play with data and get insightful results out of it, then turn those insights into business growth. He is an electronics engineer with versatile experience as an individual contributor and as a team lead, and has actively worked towards building Machine Learning capabilities for organizations.

2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12] [Text Wrapping Break][Text Wrapping Break]List after removal of two elements: [Text Wrapping Break][1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12][Text Wrapping Break] [Text Wrapping Break] List after removing a range of elements: [Text Wrapping Break] [7, 8, 9, 10, 11, 12]Using pop() methodIn Python,  we can also remove and return an element from the set using the Pop() function, but it removes  the last element of the set only by default. To remove a specific element from a position of the List, index of the element is passed as an argument to the pop() function.List = [1,2,3,4,5] [Text Wrapping Break] [Text Wrapping Break] # Removing element from the  [Text Wrapping Break] # Set using the pop() method [Text Wrapping Break] List.pop() [Text Wrapping Break] print("\nList after popping an element: ") [Text Wrapping Break] print(List) [Text Wrapping Break]  [Text Wrapping Break] # Removing element at a  [Text Wrapping Break] # specific location from the  [Text Wrapping Break] # Set using the pop() method [Text Wrapping Break] List.pop(2) [Text Wrapping Break] print("\nList after popping a specific element: ") [Text Wrapping Break] print(List)List after popping an element: [Text Wrapping Break] [1, 2, 3, 4]  [Text Wrapping Break][Text Wrapping Break]List after popping a specific  element: [Text Wrapping Break][1, 2, 4]Slicing of a ListAlthough there are several ways to print the whole List with all the elements in Python, there is only one way to print a specific range of elements from the list: by the use of Slice operation. Slice operation is performed on Lists by the use of colon(:). For printing elements from the beginning of the range use [:Index], for printing elements from end use [:-Index], to print elements from a specific index till the end use [Index:], for printing elements within a specific range, use [Start Index: End Index] and to print the entire List by the use of slicing operation, use [:]. 
Moreover, to print the entire List in reverse order, use [::-1]. Negative indexes are used to slice the List from the rear end.

    # Python program to demonstrate slicing of a List

    # Creating a List
    List = ['P', 'Y', 'T', 'H', 'O', 'N', 'P', 'R', 'O', 'G', 'R', 'A', 'M']
    print("Initial List: ")
    print(List)

    # Print elements of a range
    # using Slice operation
    Sliced_List = List[3:10]
    print("\nSlicing elements in a range 3-10: ")
    print(Sliced_List)

    # Print elements from a
    # pre-defined point to end
    Sliced_List = List[6:]
    print("\nElements sliced from 6th element till the end: ")
    print(Sliced_List)

    # Printing elements from
    # beginning till end
    Sliced_List = List[:]
    print("\nPrinting all elements using slice operation: ")
    print(Sliced_List)

Output:

    Initial List:
    ['P', 'Y', 'T', 'H', 'O', 'N', 'P', 'R', 'O', 'G', 'R', 'A', 'M']

    Slicing elements in a range 3-10:
    ['H', 'O', 'N', 'P', 'R', 'O', 'G']

    Elements sliced from 6th element till the end:
    ['P', 'R', 'O', 'G', 'R', 'A', 'M']

    Printing all elements using slice operation:
    ['P', 'Y', 'T', 'H', 'O', 'N', 'P', 'R', 'O', 'G', 'R', 'A', 'M']

Negative index List Slicing

    # Creating a List
    List = ['P', 'Y', 'T', 'H', 'O', 'N', 'P', 'R', 'O', 'G', 'R', 'A', 'M']
    print("Initial List: ")
    print(List)

    # Print elements from beginning
    # to a pre-defined point using Slice
    Sliced_List = List[:-7]
    print("\nElements sliced till 7th element from last: ")
    print(Sliced_List)

    # Print elements of a range
    # using negative index List slicing
    Sliced_List = List[-6:-1]
    print("\nElements sliced from index -6 to -1")
    print(Sliced_List)

    # Printing elements in reverse
    # using Slice operation
    Sliced_List = List[::-1]
    print("\nPrinting List in reverse: ")
    print(Sliced_List)

Output:

    Initial List:
    ['P', 'Y', 'T', 'H', 'O', 'N', 'P', 'R', 'O', 'G', 'R', 'A', 'M']

    Elements sliced till 7th element from last:
    ['P', 'Y', 'T', 'H', 'O', 'N']

    Elements sliced from index -6 to -1
    ['R', 'O', 'G', 'R', 'A']

    Printing List in reverse:
    ['M', 'A', 'R', 'G', 'O', 'R', 'P', 'N', 'O', 'H', 'T', 'Y', 'P']

Updating Lists

You can update single or multiple elements of a list by assigning new values to the individual indexes.

    # Python program to update elements of a list
    List = ['physics', 'chemistry', 1998, 2019]
    print("Value available at index 2 : ")
    print(List[2])
    List[2] = 2000
    print("New value available at index 2 : ")
    print(List[2])

Output:

    Value available at index 2 :
    1998
    New value available at index 2 :
    2000

Built-in functions

FUNCTION        DESCRIPTION
sum()           Adds all the numbers in the list.
ord()           Returns an integer which represents the Unicode code point of the given single character.
cmp()           Compares two lists; if the first list is "greater" than the second list, it returns 1. It exists only in Python 2 and was removed in Python 3.
max()           Returns the largest element in the list.
min()           Returns the smallest element in the list.
all()           Returns True if every element of the list is true (or if the list is empty), and False otherwise.
any()           Returns True if at least one element of the list is true. If the list is empty, it returns False.
len()           Returns the length of the list.
enumerate()     Returns an enumerate object that pairs each item with a counter and can be used directly in for loops.
accumulate()    Part of the itertools module. Makes an iterator that returns accumulated results (running sums by default). It can optionally take a two-argument function.
filter()        Keeps only the elements for which a given function returns true.
map()           Applies a particular function to each item of an iterable and returns the results (as an iterator in Python 3).
lambda          A keyword, not a function. It defines an anonymous function that behaves like a normal function with regard to arguments. While normal functions are defined with the def keyword, anonymous functions are defined using the lambda keyword.

List Methods

METHOD          DESCRIPTION
append()        Adds an element at the end of the list
extend()        Adds all elements of one list to another list
insert()        Inserts an item at a desired index
remove()        Removes the first occurrence of an item from the list
pop()           Removes and returns the element at a desired index (the last one by default)
clear()         Removes all elements from the list
index()         Returns the index of the first matching item
count()         Returns the number of occurrences of the item passed as argument
sort()          Sorts the items of a list in place, in ascending order by default
reverse()       Reverses the list in place
copy()          Returns a shallow copy of the list

Summary

In this article, we have covered the concept of Lists in Python. You have learned the basics of creating a List, adding values to it, accessing its elements, removing elements, and various other operations.
We have also covered some of Python's basic built-in functions and the list methods, along with what each of them does. To gain more knowledge about Python tips and tricks, check our Python tutorial and get a good hold over coding in Python by joining the Python certification course.
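As a quick recap of the list methods and built-in functions tabulated above, the following is a minimal sketch that exercises most of them on small lists. The variable names (numbers, scores and so on) are illustrative only and do not come from the examples earlier in the article.

```python
from itertools import accumulate

# --- List methods (names such as `numbers` are illustrative) ---
numbers = [3, 1, 2]
numbers.append(4)              # [3, 1, 2, 4]
numbers.extend([1, 5])         # [3, 1, 2, 4, 1, 5]
numbers.insert(0, 9)           # [9, 3, 1, 2, 4, 1, 5]

first_one = numbers.index(1)   # index of the first 1
ones = numbers.count(1)        # number of times 1 occurs

numbers.remove(1)              # removes only the first 1
last = numbers.pop()           # removes and returns the last item

snapshot = numbers.copy()      # shallow copy, unaffected by later changes
numbers.sort()                 # sorts in place, ascending
numbers.reverse()              # reverses in place
ordered = numbers.copy()

# remove() raises ValueError when the value is absent
try:
    numbers.remove(42)
    missing = False
except ValueError:
    missing = True

numbers.clear()                # the list is now empty

# --- Built-in functions ---
scores = [4, 0, 7]
total = sum(scores)
has_truthy = any(scores)       # at least one non-zero score
all_truthy = all(scores)       # False, because 0 is falsy
running = list(accumulate(scores))             # running totals
doubled = list(map(lambda x: x * 2, scores))   # map returns an iterator
non_zero = list(filter(None, scores))          # drops falsy items
indexed = list(enumerate(scores))              # (counter, item) pairs

print(snapshot, ordered, total, running, doubled, non_zero, indexed)
```

Note that sort(), reverse(), remove() and clear() all modify the list in place and return None, which is why the sketch takes a copy() before sorting when the earlier order is still needed.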

How to Work with Excel Spreadsheets using Python

Excel is one of the most popular and widely used spreadsheet applications, developed by Microsoft. You can organize, analyze and store your data in tabular sheets with the help of Excel. From analysts and sales managers to CEOs, professionals from every field use Excel for creating quick statistics and for data crunching.

Spreadsheets are commonly used in the present world because of their intuitive nature and their ability to handle large datasets. Most importantly, they can be used without any prior technical background.

Finding different ways to work with Excel using code is essential, since working with data in Python has some serious advantages in comparison with Excel's UI. Python developers have implemented ways to read, write and manipulate Excel documents.

You can check the quality of your spreadsheet application by going over the checklist below:

• Is the spreadsheet able to represent static data?
• Is the spreadsheet able to mix data, calculations, and reports?
• Is the data in your spreadsheet complete and consistent in nature?
• Does the spreadsheet have an organized worksheet structure?

This checklist will help you verify the quality of the spreadsheet application you're going to work on.

Practical Applications

In this article, we will be using openpyxl to work on data. With the help of this module, you can extract data from a database into an Excel spreadsheet, or you can convert an Excel spreadsheet into a programmatic format. There are many situations where you might feel the need to use a package like openpyxl. Let us discuss a few of them to get a comprehensive overview.

Importing New Products Into a Database

Imagine you work at an online store. When the company wants to add new products to the store, someone makes an Excel spreadsheet with a few hundred rows containing the name of each product, its description, price and some other basic information, and then gives it to you. Now, if you want to import this data, you need to iterate over each row of the spreadsheet and add each product to the database of the online store.

Exporting Database Data Into a Spreadsheet

Suppose you have a database table in which you have collected information on all your users, including their name, contact number, email address, and so forth. Now, the Marketing Team wants to contact all the users and promote a new product of the company. However, they neither have access to the database nor know how to use SQL to extract the information. In this situation, openpyxl comes into play. You can use it to iterate over each user record and write the required information into an Excel spreadsheet.

Appending Information to an Existing Spreadsheet

Consider the same online store example we discussed above. You have an Excel spreadsheet with a list of users, and your job is to append to each row the total amount they have spent in your store. To do this, you have to read the spreadsheet first, then iterate through each row and fetch the total amount spent from the database. Finally, you need to write it back to the spreadsheet.

Starting openpyxl

You can install the openpyxl package using pip. Open your terminal and write the following command:

    $ pip install openpyxl

After you have installed the package, you can create a simple spreadsheet of your own:

    from openpyxl import Workbook

    workbook = Workbook()
    spreadsheet = workbook.active

    spreadsheet["A1"] = "Hello"
    spreadsheet["B1"] = "World!"

    workbook.save(filename="HelloWorld.xlsx")

How to Read Excel Spreadsheets with openpyxl

Let us start with the most important thing that you can do with a spreadsheet, i.e. read it. We will be using a Watch Sample Dataset which contains a list of 100 watches with information like product name, product ID, review and so forth.
A Simple Way to Read an Excel Spreadsheet

Let us start with opening our sample spreadsheet:

    >>> from openpyxl import load_workbook
    >>> workbook = load_workbook(filename="sample.xlsx")
    >>> workbook.sheetnames
    ['Sheet 1']
    >>> spreadsheet = workbook.active
    >>> spreadsheet
    <Worksheet "Sheet 1">
    >>> spreadsheet.title
    'Sheet 1'

In the example code above, we open the spreadsheet using load_workbook and then check all the sheets that are available to work with using workbook.sheetnames. Then Sheet 1 is automatically selected using workbook.active since it is the first sheet available. This is the most common way of opening a spreadsheet.

Now, let us see the code to retrieve data from the spreadsheet:

    >>> spreadsheet["A1"]
    <Cell 'Sheet 1'.A1>
    >>> spreadsheet["A1"].value
    'marketplace'
    >>> spreadsheet["F10"].value
    "G-Shock Men's Grey Sport Watch"

You can retrieve either the cell object or the value stored in it. Indexing the sheet returns the cell, and its .value attribute returns the actual value. You can also get a cell by row and column number with .cell():

    >>> spreadsheet.cell(row=10, column=6)
    <Cell 'Sheet 1'.F10>
    >>> spreadsheet.cell(row=10, column=6).value
    "G-Shock Men's Grey Sport Watch"

Importing Data from a Spreadsheet

In this section, we will discuss how to iterate through the data and how to convert it into a more useful format using Python.

Let us first start with iterating through the data. There are a number of ways to iterate, and which one to use depends on what you need.

You can slice the data with a combination of rows and columns:

    >>> spreadsheet["A1:C2"]
    ((<Cell 'Sheet 1'.A1>, <Cell 'Sheet 1'.B1>, <Cell 'Sheet 1'.C1>),
     (<Cell 'Sheet 1'.A2>, <Cell 'Sheet 1'.B2>, <Cell 'Sheet 1'.C2>))

You can also select whole rows or columns, or ranges of them:

    >>> # Get all cells from column A
    >>> spreadsheet["A"]
    (<Cell 'Sheet 1'.A1>,
     <Cell 'Sheet 1'.A2>,
     ...
     <Cell 'Sheet 1'.A99>,
     <Cell 'Sheet 1'.A100>)

    >>> # Get all cells for a range of columns
    >>> spreadsheet["A:B"]
    ((<Cell 'Sheet 1'.A1>,
      <Cell 'Sheet 1'.A2>,
      ...
      <Cell 'Sheet 1'.A99>,
      <Cell 'Sheet 1'.A100>),
     (<Cell 'Sheet 1'.B1>,
      <Cell 'Sheet 1'.B2>,
      ...
      <Cell 'Sheet 1'.B99>,
      <Cell 'Sheet 1'.B100>))

    >>> # Get all cells from row 5
    >>> spreadsheet[5]
    (<Cell 'Sheet 1'.A5>,
     <Cell 'Sheet 1'.B5>,
     ...
     <Cell 'Sheet 1'.N5>,
     <Cell 'Sheet 1'.O5>)

    >>> # Get all cells for a range of rows
    >>> spreadsheet[5:6]
    ((<Cell 'Sheet 1'.A5>,
      <Cell 'Sheet 1'.B5>,
      ...
      <Cell 'Sheet 1'.N5>,
      <Cell 'Sheet 1'.O5>),
     (<Cell 'Sheet 1'.A6>,
      <Cell 'Sheet 1'.B6>,
      ...
      <Cell 'Sheet 1'.N6>,
      <Cell 'Sheet 1'.O6>))

openpyxl also provides the generators .iter_rows() and .iter_cols(), which take arguments that set limits on the iteration:

    >>> for row in spreadsheet.iter_rows(min_row=1,
    ...                                  max_row=2,
    ...                                  min_col=1,
    ...                                  max_col=3):
    ...     print(row)
    (<Cell 'Sheet 1'.A1>, <Cell 'Sheet 1'.B1>, <Cell 'Sheet 1'.C1>)
    (<Cell 'Sheet 1'.A2>, <Cell 'Sheet 1'.B2>, <Cell 'Sheet 1'.C2>)

    >>> for column in spreadsheet.iter_cols(min_row=1,
    ...                                     max_row=2,
    ...                                     min_col=1,
    ...                                     max_col=3):
    ...     print(column)
    (<Cell 'Sheet 1'.A1>, <Cell 'Sheet 1'.A2>)
    (<Cell 'Sheet 1'.B1>, <Cell 'Sheet 1'.B2>)
    (<Cell 'Sheet 1'.C1>, <Cell 'Sheet 1'.C2>)

You can also pass the Boolean argument values_only and set it to True to get the values of the cells instead of the cell objects:

    >>> for value in spreadsheet.iter_rows(min_row=1,
    ...                                    max_row=2,
    ...                                    min_col=1,
    ...                                    max_col=3,
    ...                                    values_only=True):
    ...     print(value)
    ('marketplace', 'customer_id', 'review_id')
    ('US', 3653882, 'R3O9SGZBVQBV76')

Since we are now done with iterating over the data, let us manipulate it using Python's primitive data structures. Consider a situation where you want to extract information about a product from the sample spreadsheet and then store it in a dictionary. The key to the dictionary would be the product ID.

Convert Data into Python classes

To convert data into Python data classes, let us first decide what we want to store and how to store it. The two essential elements that can be extracted from the data are as follows:

1. Products
   • ID
   • Title
   • Parent
   • Category

2. Review
   • ID
   • Customer ID
   • Stars
   • Headline
   • Body
   • Date

Let us implement the two elements:

    import datetime
    from dataclasses import dataclass

    @dataclass
    class Product:
        id: str
        parent: str
        title: str
        category: str

    @dataclass
    class Review:
        id: str
        customer_id: str
        stars: int
        headline: str
        body: str
        date: datetime.datetime

The next step is to create a mapping between columns and the required fields. Start by printing the header row:

    >>> for value in spreadsheet.iter_rows(min_row=1,
    ...                                    max_row=1,
    ...                                    values_only=True):
    ...     print(value)
    ('marketplace', 'customer_id', 'review_id', 'product_id', ...)

    >>> # Or an alternative
    >>> for cell in spreadsheet[1]:
    ...     print(cell.value)
    marketplace
    customer_id
    review_id
    product_id
    product_parent
    ...

Finally, let us convert the data into the new structures by parsing the spreadsheet into a list of product and review objects. The code below assumes that the two data classes are saved in classes.py and that a module named mapping.py defines the zero-based column index of each field (PRODUCT_ID, REVIEW_ID, and so on), based on the header row printed above:

    from datetime import datetime
    from openpyxl import load_workbook
    from classes import Product, Review
    from mapping import PRODUCT_ID, PRODUCT_PARENT, PRODUCT_TITLE, \
        PRODUCT_CATEGORY, REVIEW_DATE, REVIEW_ID, REVIEW_CUSTOMER, \
        REVIEW_STARS, REVIEW_HEADLINE, REVIEW_BODY

    # Using read_only mode since you're not going to be editing the spreadsheet
    workbook = load_workbook(filename="watch_sample.xlsx", read_only=True)
    spreadsheet = workbook.active

    products = []
    reviews = []

    # Using values_only because you just want to return the cell value
    for row in spreadsheet.iter_rows(min_row=2, values_only=True):
        product = Product(id=row[PRODUCT_ID],
                          parent=row[PRODUCT_PARENT],
                          title=row[PRODUCT_TITLE],
                          category=row[PRODUCT_CATEGORY])
        products.append(product)

        # You need to parse the date from the spreadsheet into a datetime format
        spread_date = row[REVIEW_DATE]
        parsed_date = datetime.strptime(spread_date, "%Y-%m-%d")

        review = Review(id=row[REVIEW_ID],
                        customer_id=row[REVIEW_CUSTOMER],
                        stars=row[REVIEW_STARS],
                        headline=row[REVIEW_HEADLINE],
                        body=row[REVIEW_BODY],
                        date=parsed_date)
        reviews.append(review)

    print(products[0])
    print(reviews[0])

After you execute the code, you will get an output that looks like this:

    Product(id='A90FALZ1ZC', parent=937111370, ...)
    Review(id='D3O9OGZVVQBV76', customer_id=3903882, ...)

Appending Data

To understand how to append data, let us go back to the first sample spreadsheet.
We will open the document and append some data to it:

    from openpyxl import load_workbook

    # Start by opening the spreadsheet and selecting the main sheet
    workbook = load_workbook(filename="HelloWorld.xlsx")
    spreadsheet = workbook.active

    # Write what you want into a specific cell
    spreadsheet["C1"] = "Manipulating_Data ;)"

    # Save the spreadsheet
    workbook.save(filename="hello_world_append.xlsx")

If you open the saved Excel file, you will notice the additional Manipulating_Data ;) text in cell C1, next to the existing cells.

Writing Excel Spreadsheets With openpyxl

A spreadsheet is a file that stores data in rows and columns. We can calculate and store numerical data and also perform computations using formulas. So, let's begin with a simple spreadsheet and understand what each line means.

Creating our first simple Spreadsheet

     1 from openpyxl import Workbook
     2
     3 filename = "first_program.xlsx"
     4
     5 workbook = Workbook()
     6 spreadsheet = workbook.active
     7
     8 spreadsheet["A1"] = "first"
     9 spreadsheet["B1"] = "program!"
    10
    11 workbook.save(filename=filename)

Line 5: In order to make a spreadsheet, we first have to create an empty workbook on which to perform further operations.

Lines 8 and 9: We can add data to a specific cell as required. In this example, the two values "first" and "program!" have been added to cells A1 and B1 of the sheet.

Line 11: This line saves the data to a file after all the operations we have done.

Basic Spreadsheet Operations

Before going to the more difficult coding part, we first have to put the building blocks in place: how to add and update values, how to manage rows and columns, and how to add filters, styles or formulas to a spreadsheet.
We have already seen the following code, by which we can add values to a spreadsheet:

    >>> spreadsheet["A1"] = "the_value_we_want_to_add"

There is another way to add values to a spreadsheet:

    >>> cell = spreadsheet["A1"]
    >>> cell
    <Cell 'Sheet'.A1>
    >>> cell.value
    'first'
    >>> cell.value = "second"
    >>> cell.value
    'second'

Here we first fetch the cell object for A1. Its value is "first" because the first program already assigned that to A1. We then update the cell by simply assigning "second" to cell.value, and read the value back to confirm the change. Keep in mind that none of these changes are written to the file until you call workbook.save().

If the cell doesn't exist when you add a value, openpyxl creates it. In the examples that follow, print_rows() is assumed to be a small helper that prints the values of every row, for example by looping over spreadsheet.iter_rows(values_only=True):

    >>> # Before, our spreadsheet has only 1 row
    >>> print_rows()
    ('first', 'program!')

    >>> # Try adding a value to row 10
    >>> spreadsheet["B10"] = "test"
    >>> print_rows()
    ('first', 'program!')
    (None, None)
    (None, None)
    (None, None)
    (None, None)
    (None, None)
    (None, None)
    (None, None)
    (None, None)
    (None, 'test')

Managing Rows and Columns in Spreadsheet

Inserting or deleting rows and columns is one of the most basic spreadsheet operations. In openpyxl, we can perform these operations by calling the following methods and passing their arguments:

• .insert_rows()
• .delete_rows()
• .insert_cols()
• .delete_cols()

We can pass 2 arguments to each of these methods:

• idx
• amount

idx is the index position at which to insert or delete, and amount is the number of rows or columns to insert or delete (it defaults to 1).
Using our basic knowledge based on the first simple program, let's see how we can use these methods:

    >>> print_rows()
    ('first', 'program!')

    >>> # Insert a column at the first position, before column 1 ("A")
    >>> spreadsheet.insert_cols(idx=1)
    >>> print_rows()
    (None, 'first', 'program!')

    >>> # Insert 5 columns in between column 2 ("B") and 3 ("C")
    >>> spreadsheet.insert_cols(idx=3, amount=5)
    >>> print_rows()
    (None, 'first', None, None, None, None, None, 'program!')

    >>> # Delete the created columns
    >>> spreadsheet.delete_cols(idx=3, amount=5)
    >>> spreadsheet.delete_cols(idx=1)
    >>> print_rows()
    ('first', 'program!')

    >>> # Insert a new row in the beginning
    >>> spreadsheet.insert_rows(idx=1)
    >>> print_rows()
    (None, None)
    ('first', 'program!')

    >>> # Insert 3 new rows in the beginning
    >>> spreadsheet.insert_rows(idx=1, amount=3)
    >>> print_rows()
    (None, None)
    (None, None)
    (None, None)
    (None, None)
    ('first', 'program!')

    >>> # Delete the first 4 rows
    >>> spreadsheet.delete_rows(idx=1, amount=4)
    >>> print_rows()
    ('first', 'program!')

Managing Sheets

We have seen the following recurring piece of code in our previous examples. This is one of the ways of selecting the default sheet from the workbook:

    spreadsheet = workbook.active

However, if you open a workbook with multiple sheets, you can select a specific sheet by its title:

    >>> # Let's say you have two sheets: "Products" and "Company Sales"
    >>> workbook.sheetnames
    ['Products', 'Company Sales']

    >>> # You can select a sheet using its title
    >>> Products_Sheet = workbook["Products"]
    >>> Sales_sheet = workbook["Company Sales"]

If we want to change the title of a sheet, we can execute the following code:

    >>> workbook.sheetnames
    ['Products', 'Company Sales']

    >>> Products_Sheet = workbook["Products"]
    >>> Products_Sheet.title = "New Products"

    >>> workbook.sheetnames
    ['New Products', 'Company Sales']

We can also create and delete sheets with the help of two methods, .create_sheet() and .remove():

    >>> # To print the available sheet names
    >>> workbook.sheetnames
    ['Products', 'Company Sales']

    >>> # To create a new sheet named "Operations"
    >>> Operations_Sheet = workbook.create_sheet("Operations")

    >>> # To print the updated available sheet names
    >>> workbook.sheetnames
    ['Products', 'Company Sales', 'Operations']

    >>> # To define the position where the sheet is created
    >>> # (here the "HR" sheet is created at index 0, the first position)
    >>> HR_Sheet = workbook.create_sheet("HR", 0)

    >>> # To print the updated available sheet names again
    >>> workbook.sheetnames
    ['HR', 'Products', 'Company Sales', 'Operations']

    >>> # To remove a sheet, pass the sheet object you want
    >>> # to delete to the method .remove()
    >>> workbook.remove(Operations_Sheet)
    >>> workbook.sheetnames
    ['HR', 'Products', 'Company Sales']

    >>> # To delete HR_Sheet
    >>> workbook.remove(HR_Sheet)
    >>> workbook.sheetnames
    ['Products', 'Company Sales']

Adding Filters to the Spreadsheet

We can use openpyxl to add filters and sorts to our spreadsheet, but when we open the spreadsheet, the data won't be rearranged according to these sorts and filters. When you're programmatically creating a spreadsheet that is going to be sent to and used by someone else, it is good practice to add filters so that people can use them afterward. The code below shows how to add a simple filter to your spreadsheet:

    >>> # Check the used spreadsheet space using the attribute "dimensions"
    >>> spreadsheet.dimensions
    'A1:O100'

    >>> spreadsheet.auto_filter.ref = "A1:O100"
    >>> workbook.save(filename="watch_sample_with_filters.xlsx")

Adding Formulas to the Spreadsheet

Formulas are one of the most commonly used and powerful features of spreadsheets.
By using formulas, you can perform various calculations, and openpyxl makes adding them as simple as editing a specific cell's value.

The list of formulas supported by openpyxl is:

    >>> from openpyxl.utils import FORMULAE
    >>> FORMULAE
    frozenset({'ABS',
               'AMORLINC',
               'ACCRINT',
               'ACOS',
               'ACCRINTM',
               'ACOSH',
               ...,
               'AND',
               'YEARFRAC',
               'YIELDDISC',
               'AMORDEGRC',
               'YIELDMAT',
               'YIELD',
               'ZTEST'})

Let's add some formulas to our spreadsheet. Let's check the average star rating of the 99 reviews within the spreadsheet:

    >>> # Star rating is in column "H"
    >>> spreadsheet["P2"] = "=AVERAGE(H2:H100)"
    >>> workbook.save(filename="first_example.xlsx")

Now, if you open the spreadsheet and go to cell P2, you can see the value to be 4.18181818181818.

Similarly, we can use this approach to include any formula we need in our spreadsheet. For example, if we want to count the number of helpful reviews:

    >>> # The helpful votes are counted in column "I"
    >>> spreadsheet["P3"] = '=COUNTIF(I2:I100, ">0")'
    >>> workbook.save(filename="first_example.xlsx")

Adding Styles to the Spreadsheet

Styling is not that important and usually isn't used in everyday code, but for the sake of completeness we will cover it with the following example. Using openpyxl, we get multiple styling options such as fonts, colors, borders, and so on. Let's have a look at an example:

    >>> # Import necessary style classes
    >>> from openpyxl.styles import Font, Alignment, Border, Side

    >>> # Create a few styles
    >>> Bold_Font = Font(bold=True)
    >>> # The color is given here as a hex RGB string for red
    >>> Big_Red_Text = Font(color="FF0000", size=20)
    >>> Center_Aligned_Text = Alignment(horizontal="center")
    >>> Double_Border_Side = Side(border_style="double")
    >>> Square_Border = Border(top=Double_Border_Side,
    ...                        right=Double_Border_Side,
    ...                        bottom=Double_Border_Side,
    ...                        left=Double_Border_Side)

    >>> # Style some cells!
    >>> spreadsheet["A2"].font = Bold_Font
    >>> spreadsheet["A3"].font = Big_Red_Text
    >>> spreadsheet["A4"].alignment = Center_Aligned_Text
    >>> spreadsheet["A5"].border = Square_Border
    >>> workbook.save(filename="sample_styles.xlsx")

If you want to apply multiple styles to one or several cells in your spreadsheet, you can use the NamedStyle class:

    >>> from openpyxl.styles import NamedStyle

    >>> # Let's create a style template for the header row
    >>> header = NamedStyle(name="header")
    >>> header.font = Font(bold=True)
    >>> header.border = Border(bottom=Side(border_style="thin"))
    >>> header.alignment = Alignment(horizontal="center", vertical="center")

    >>> # Now let's apply this to all first row (header) cells
    >>> header_row = spreadsheet[1]
    >>> for cell in header_row:
    ...     cell.style = header

    >>> workbook.save(filename="sample_styles.xlsx")

Adding Charts to our Spreadsheet

Charts are a good way to visualize and understand large amounts of data quickly and easily. There are many kinds of charts, such as bar charts, pie charts, line charts, and so on.
Let us start by creating a new workbook with some data:

    from openpyxl import Workbook
    from openpyxl.chart import BarChart, Reference

    workbook = Workbook()
    spreadsheet = workbook.active

    # Let's create some sample sales data
    rows = [
        ["Product", "Online", "Store"],
        [1, 30, 45],
        [2, 40, 30],
        [3, 40, 25],
        [4, 50, 30],
        [5, 30, 25],
        [6, 25, 35],
        [7, 20, 40],
    ]

    for row in rows:
        spreadsheet.append(row)

Now let us create a bar chart that will show the number of online and store sales per product:

    chart = BarChart()
    data = Reference(worksheet=spreadsheet,
                     min_row=1,
                     max_row=8,
                     min_col=2,
                     max_col=3)

    chart.add_data(data, titles_from_data=True)
    spreadsheet.add_chart(chart, "E2")

    workbook.save("chart.xlsx")

You can also create a line chart by simply making some changes to the data:

    import random
    from openpyxl import Workbook
    from openpyxl.chart import LineChart, Reference

    workbook = Workbook()
    sheet = workbook.active

    # Let's create some sample sales data
    rows = [
        ["", "January", "February", "March", "April",
         "May", "June", "July", "August", "September",
         "October", "November", "December"],
        [1, ],
        [2, ],
        [3, ],
    ]

    for row in rows:
        sheet.append(row)

    # Fill the twelve month columns of each product row with random values
    for row in sheet.iter_rows(min_row=2,
                               max_row=4,
                               min_col=2,
                               max_col=13):
        for cell in row:
            cell.value = random.randrange(5, 100)

There are numerous types of charts and various customizations you can apply to your spreadsheet to make it more attractive.

Convert Python Classes to Excel Spreadsheet

Let us now learn how to convert Python classes into Excel spreadsheet data.
Assume we have a database and we use some object-relational mapping to map the database into Python classes, and we then want to export the objects into spreadsheets:

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Sale:
        id: str
        quantity: int

    @dataclass
    class Product:
        id: str
        name: str
        sales: List[Sale]

Now, to generate some random data, let's assume that the above classes are stored in a db_classes.py file:

    import random

    # These two imports are used in the next step
    from openpyxl import Workbook
    from openpyxl.chart import LineChart, Reference

    from db_classes import Product, Sale

    products_range = []

    # Let's create 5 products
    for idx in range(1, 6):
        sales = []

        # Create 5 months of sales
        for month in range(1, 6):
            sale = Sale(id=str(month),
                        quantity=random.randrange(5, 100))
            sales.append(sale)

        product = Product(id=str(idx),
                          name="Product %s" % idx,
                          sales=sales)
        products_range.append(product)

By running this code, we get 5 products, each with 5 months of sales and a random quantity of sales for each month. Now we have to convert this into a spreadsheet, which means iterating over the data:

    workbook = Workbook()
    spreadsheet = workbook.active

    # Append column names first
    spreadsheet.append(["Product ID", "Product Name", "Month 1",
                        "Month 2", "Month 3", "Month 4", "Month 5"])

    # Append the data
    for product in products_range:
        data = [product.id, product.name]
        for sale in product.sales:
            data.append(sale.quantity)
        spreadsheet.append(data)

This will create a spreadsheet with some data coming from your database.

How to work with pandas to handle Spreadsheets?

We have learned to work with Excel in Python because Excel is one of the most popular tools and finding a way to work with Excel is critical.
Pandas is a great tool to work with Excel in Python. It has methods to read all kinds of data from an Excel file, and we can export items back to Excel using it. To use it, we first need to install the pandas package:

    $ pip install pandas

Then, let's create a simple DataFrame:

    import pandas as pd

    data = {
        "Product Name": ["Product 1", "Product 2"],
        "Sales Month 1": [10, 20],
        "Sales Month 2": [5, 35],
    }
    dataframe = pd.DataFrame(data)

Now we have some data, and to convert it from a DataFrame into a worksheet we generally use openpyxl's dataframe_to_rows():

    from openpyxl import Workbook
    from openpyxl.utils.dataframe import dataframe_to_rows

    workbook = Workbook()
    spreadsheet = workbook.active

    for row in dataframe_to_rows(dataframe, index=False, header=True):
        spreadsheet.append(row)

    workbook.save("pandas_spreadsheet.xlsx")

To read data from an Excel file into a pandas DataFrame object, we use the read_excel method:

    excel_file = 'movies.xls'
    movies = pd.read_excel(excel_file)

We can also use the ExcelFile class to work with multiple sheets from the same Excel file:

    xlsx = pd.ExcelFile(excel_file)
    movies_sheets = []
    for sheet in xlsx.sheet_names:
        movies_sheets.append(xlsx.parse(sheet))
    movies = pd.concat(movies_sheets)

Indexes and columns allow you to access data from your DataFrame easily. For example, with a DataFrame df of product reviews indexed by review ID:

    >>> df.columns
    Index(['marketplace', 'customer_id', 'review_id', 'product_id',
           'product_parent', 'product_title', 'product_category', 'star_rating',
           'helpful_votes', 'total_votes', 'vine', 'verified_purchase',
           'review_headline', 'review_body', 'review_date'],
          dtype='object')

    >>> # Get first 10 reviews' star rating
    >>> df["star_rating"][:10]
    R3O9SGZBVQBV76    5
    RKH8BNC3L5DLF     5
    R2HLE8WKZSU3NL    2
    R31U3UH5AZ42LL    5
    R2SV659OUJ945Y    4
    RA51CP8TR5A2L     5
    RB2Q7DLDN6TH6     5
    R2RHFJV0UYBK3Y    1
    R2Z6JOQ94LFHEP    5
    RX27XIIWY5JPB     4
    Name: star_rating, dtype: int64

    >>> # Grab review with id "R2EQL1V1L6E0C9", using the index
    >>> df.loc["R2EQL1V1L6E0C9"]
    marketplace                   US
    customer_id             15305006
    review_id         R2EQL1V1L6E0C9
    product_id            B004LURNO6
    product_parent         892860326
    review_headline       Five Stars
    review_body              Love it
    review_date           2015-08-31
    Name: R2EQL1V1L6E0C9, dtype: object

Summary

In this article we have covered:

How to extract information from spreadsheets
How to create spreadsheets in different ways
How to customize a spreadsheet by adding filters, styles, charts, and so on
How to use pandas to work with spreadsheets

Now you are well aware of the different types of implementations you can perform with spreadsheets using Python. However, if you want to gather more information on this topic, you can always rely on the official documentation of openpyxl. To gain more knowledge about Python tips and tricks, check out our Python tutorial. To gain mastery over Python coding, join our Python certification course.

What is pip, Getting Started with Python pip

Pip is a package manager for Python that allows you to install additional libraries and packages that are not part of the standard Python library such as the ones found in the Python Package Index. It is a replacement for easy install. If your version of Python is 2.7.9 (or greater) or Python 3.4 (or greater), then PIP comes pre-installed with Python, in other cases you will have to install it separately. PIP is a recursive acronym for “Preferred Installer Program” or “PIP Installs Packages”. It is a command-line utility that installs, reinstalls, or uninstalls PyPI packages with one simple command: pip. You may be familiar with the term package manager if you have used other languages like Ruby uses Gem, JavaScript uses npm for package management, and .NET uses NuGet. Pip has become the standard package manager for Python. The Python installer installs pip automatically, so it is ready for you to use, unless you have installed an older version of Python. You can also verify if pip is available on your Python version by running the command below:On running the command mentioned above, a similar output should be displayed which will show the pip version, along with the location and version of Python. If you are using an older version of Python, the pip version will not be displayed. Then you can install it separately. You can download pip from the following link: https://pypi.org/project/pip/ Installing pip in PythonFor WindowsFollow the instructions to install pip in Python on Windows 7, Windows 8.1, and Windows 10: Download get-pip.py installer script from https://bootstrap.pypa.io/get-pip.py. For Python 3.2, download from https://bootstrap.pypa.io/3.2/get-pip.py. After that, right-click on the link and select Save As and save it to any safe location on your computer. Open Command Prompt and navigate to the get-pip.py file where you saved it previously. 
Run the command: python get-pip.py For Mac Modern Mac systems have Python and pip pre-installed but the version of Python tends to be outdated and not the best choice for serious programming in Python. So, it’s highly recommended that you install a more updated version of Python and PIP. If you want to use the pre-installed Python application but don’t have PIP available, you can install PIP with the following commands in Terminal:sudo easy_install pipIf you want to install an updated version of Python, then you can use Homebrew. Installing Python with Homebrew requires a single command:brew install pythonInstalling Python with Homebrew will give you the latest version which should come packaged with PIP but if PIP is unavailable, you can re-link Python using the following commands in Terminal:brew unlink python && brew link pythonFor Linux If your Linux distribution came with Python pre-installed, using your system’s package manager you will be able to install PIP. This is preferable since pre-installed versions of Python do not work well with the get-pip.py script used on Windows and Mac. 
Given below are the commands you should run in order to install pip in your system depending on the version of Python you are using:Advanced Package Tool (Python 2.x):sudo apt-get install python-pip pacman Package Manager (Python 2.x):sudo pacman -S python2-pip Yum Package Manager (Python 2.x):sudo yum upgrade python-setuptools  sudo yum install python-pip python-wheel Dandified Yum (Python 2.x):sudo dnf upgrade python-setuptools  sudo dnf install python-pip python-wheel Zypper Package Manager (Python 2.x):sudo zypper install python-pip python-setuptools python-wheel Advanced Package Tool (Python 3.x): sudo apt-get install python3-pip pacman Package Manager (Python 3.x): sudo pacman -S python-pip Yum Package Manager (Python 3.x): sudo yum install python3 python3-wheel Dandified Yum (Python 3.x): sudo dnf install python3 python3-wheel Zypper Package Manager (Python 3.x): sudo zypper install python3-pip python3-setuptools python3-wheel For Raspberry Pi You are most likely running Raspbian if you are a Raspberry Pi user as it is the official operating system designated and provided by the Raspberry Pi Foundation. PIP comes pre-installed on with Raspbian Jessie. It is one of the biggest reasons to upgrade to Raspbian Jessie instead of using Raspbian Wheezy or Raspbian Jessie Lite. If you are using an older version of Raspbian, you can still manually install PIP. Given below are the commands you should run in order to install pip on your system depending on the version of Python you are using: On Python 2.x:sudo apt-get install python-pipOn Python 3.x:sudo apt-get install python3-pipRaspbian users, working with Python 2.x must use pip while Python 3.x users must use pip3 while running PIP commands.For Ubuntusudo apt-get install python-pipFor Fedorasudo yum install python-pipHow to use PIP and PyPI? PyPI - the Python Package Index After PIP is installed, we need to find a package to install. 
Packages are usually installed from the Python Package Index (PyPI), which is the repository of software for the Python programming language.

Set environment variable for PIP:

You won't have to reference the pip install directory again and again if you set an environment variable. Add the pip install directory (default on Windows: C:\Python27\Scripts) to your "PATH" environment variable.

Getting Started with PIP

Now that we know what PIP is and have successfully installed it on our computer, let's get started on how to use it.

Commands in PIP

Enter pip in the command terminal and it will show the following output on the screen.

Usage:
    pip <command> [options]

Commands:
    install       Install packages
    download      Download packages
    uninstall     Uninstall packages
    unzip         Unzip individual packages
    bundle        Create pybundles
    help          Show help for commands
    config        Manage local and global configuration
    freeze        Output installed packages in requirements format
    list          List installed packages
    wheel         Build wheels from your requirements
    hash          Compute hashes of package archives
    completion    A helper command used for command completion
    check         Verify installed packages have compatible dependencies
    show          Show information about installed packages
    search        Search PyPI for packages
    zip           Zip individual packages

The most common operations in pip are installing, upgrading, and uninstalling packages.

General Options:

    -h, --help: Show help.
    --isolated: Run pip in an isolated mode, ignoring environment variables and user configuration.
    -v, --verbose: Give more output. Option is additive, and can be used up to 3 times.
    -V, --version: Show version and exit.
    -q, --quiet: Give less output. Option is additive, and can be used up to 3 times (corresponding to WARNING, ERROR, and CRITICAL logging levels).
    --proxy: Specify a proxy in the form [user:passwd@]proxy.server:port.
    --trusted-host: Mark this host as trusted, even though it does not have valid or any HTTPS.
    --cert: Path to alternate CA bundle.
    --client-cert: Path to SSL client certificate, a single file containing the private key and the certificate in PEM format.
    --retries: Maximum number of retries each connection should attempt (5 times by default).
    --timeout: Set the socket timeout (15 seconds by default).
    --exists-action: Default action when a path already exists: (s)witch, (i)gnore, (w)ipe, (b)ackup, (a)bort.
    --cache-dir: Store the cache data in <dir>.
    --no-cache-dir: Disable the cache.
    --disable-pip-version-check: Don't periodically check PyPI to determine whether a new version of pip is available for download. Implied with --no-index.

Finding required packages:

To search for a package, for example Flask, use the command shown below:

    pip search Flask

The following output will be displayed, listing the matching packages and their descriptions:

    Flask-Cache - Adds cache support to your Flask application
    Flask-SeaSurf - An update CSRF extension for Flask
    Flask-Admin - Simple and extensible admin interface framework for Flask
    Flask-Security - Simple security for Flask apps
    Flask - A microframework based on Werkzeug, Jinja2 and good intentions

Installing a package:

To install the required package, in our case Flask, enter the following command:

    pip install Flask

Pip: Show information

To check information about the newly installed package, enter:

    pip show Flask
    ---
    Name: Flask
    Version: 0.10.1
    Location: /usr/local/lib/python2.7/dist-packages
    Requires: Werkzeug, Jinja2, itsdangerous

Uninstalling a package:

To uninstall any package installed by PIP, enter the command given below.

    pip uninstall Flask
    Uninstalling Flask:
    ...
    .....
    Proceed (y/n)?
    Successfully uninstalled Flask

That's all. The Flask package has been uninstalled.

How to Upgrade PIP for Python

Although the PIP application doesn't receive updates very often, it's still important to keep it up to date with the newer versions because there may be important fixes for bugs, compatibility issues, and security holes. Fortunately, upgrading to the latest version of PIP is very fast and simple.
On Windows:

    python -m pip install -U pip

On Mac, Linux, or Raspberry Pi:

    pip install -U pip

On certain versions of Linux and Raspberry Pi, pip3 needs to be entered instead of pip.

Using Requirement Files

By default, the pip install command installs the latest published version of a package, but sometimes you need to install the particular version that suits your code. You would want to create a specification of the dependencies and versions that you have used while developing and running your application, so that there are no surprises when you use the application in production.

Requirement files allow you to specify exactly the packages and versions that should be installed on your system. Executing pip help shows that there is a freeze command that displays the installed packages in requirements format. Its output can be redirected to a file to generate a requirements file:

    $ pip freeze > requirements.txt

The freeze command dumps all the packages and their versions to standard output, so you can redirect the output to a file that can be used to install the exact requirements on another system. The general convention is to name this file requirements.txt, but it is completely up to you to name it whatever you want.

If you want to replicate the environment on another system, run pip install, specifying the requirements file using the -r switch:

    $ pip install -r requirements.txt

The versions listed in requirements.txt will match those of the installed packages:

    $ pip list
    Package     Version
    ----------  ----------
    certifi     2018.11.29
    chardet     3.0.4
    idna        2.8
    pip         19.0.1
    requests    2.21.0
    setuptools  40.6.2
    urllib3     1.24.1

You may submit the requirements.txt file to source control and use it to create the exact environment on other machines.

Fine-Tuning Requirements

The problem with hardcoding the versions of your packages and their dependencies is that the packages receive frequent updates with bug and security fixes, and you probably want to update to them as soon as they are published.
The requirements file format gives you a bit of flexibility to keep packages up to date: it allows you to enter dependency versions using logical operators while still specifying the base version of each package. Make the following changes by opening the requirements.txt file in your editor:

    certifi>=2018.11.29
    chardet>=3.0.4
    idna>=2.8
    requests>=2.21.0
    urllib3>=1.24.1

Changing the logical operator to >= tells pip to install the exact version or any greater version that has been published. When you set up a new environment using the requirements.txt file, pip searches for the latest version that satisfies the requirement and installs it. The packages in your requirements file can be updated by running the install command with the --upgrade switch:

    $ pip install --upgrade -r requirements.txt

If the latest versions have already been installed, nothing is upgraded, but if a new version has been published for a listed package, that package is upgraded.

New versions can introduce changes that fix bugs, but they can also break your application. In order to fine-tune your requirements, the requirements file syntax supports additional version specifiers.

Let us assume that a new version 3.0 of requests is published but it breaks your application because it introduces an incompatible change. In such a case, the requirements file can be modified to prevent 3.0 or higher versions from being installed:

    certifi>=2018.11.29
    chardet>=3.0.4
    idna>=2.8
    requests>=2.21.0, <3.0
    urllib3>=1.24.1

Changing the version specifier for the requests package ensures that only versions lower than 3.0 get installed.

Production vs Development Dependencies

Not all packages that you install during the development of your application are going to be application dependencies. Some packages published to PyPI are development tools or libraries that are useful to you only during the development process. For example, you would require a unit test framework in order to unit test your application.
Pytest is a popular framework for unit testing. You would want to install the unit testing framework in your development environment, but not in your production environment because it is not an application dependency. To set up a development environment, you need to create a second requirements file (requirements_file.txt) to list additional tools: # In requirements_file.txt  pytest>=4.2.0 To do this, you need to install both requirement files using pip: requirements.txt and requirements_file.txt. Pip allows for specifying additional parameters within a single requirements file. The requirements_file.txt can also be modified to install the requirements from the production requirements.txt file: # In requirements_file.txt  -r requirements.txt  pytest>=4.2.0 Notice that the exact same -r switch is being used in order to install the production requirements.txt file. The file format of the requirements file allows you to specify additional arguments right on a requirements file. Alternatives to pipPip is an essential tool for all Pythonistas which is used in developing many applications and projects for package management. This article gives you the basics of Pip for Python but the Python community is very active in providing great tools and libraries for developers using other applications as well. These include alternatives to pip that try to improve and simplify package management. Here are some package management tools other than pip which are available for Python: CondaPoetryPipenvSummaryBy now you know that pip is a package manager for Python that is used in many projects to manage dependencies. It is included with the Python installer, hence it is essential for all Python programmers to know how to use it. Although Python provides a wide range of standard libraries which are suitable for developing all types of applications, the active Python community provides more sets of tools and libraries that speed up the development process of a Python application. 
In this article, we have covered: The process of installing pip in Python Setting an environment variable for pip Commonly used commands in pip and their functions Finding and installing new packages using pip with requirement files in the command line and getting information about the newly installed package How to uninstall a package in pip? In addition to the above topics we have also covered the importance of keeping dependencies updated and a few alternatives to pip that can help managing those dependencies. To gain more knowledge about Python tips and tricks, check our Python tutorial and get a good hold over coding in Python by joining the Python certification course. 

A Guide to Threading in Python

In computer science, a thread is defined as the smallest unit of execution with an independent set of instructions. In simple terms, it is a separate flow of execution. The advantage of threading is that it allows you to run different parts of a program concurrently, and it can make the design of the program simpler.

It is tempting to think of threading as different processors running a single program, each performing an independent task at the same time. In the CPython implementation of Python, however, threads interact with the Global Interpreter Lock (GIL), which allows only one Python thread to run at a time. If you want tasks to run truly in parallel, you need to write part of your code in a different language or use the multiprocessing module. Good candidates for threading are therefore tasks that spend much of their time waiting for external events. All of this holds when the code is written in Python. Threads that run code written in C, on the other hand, can release the GIL and run concurrently.

Even so, structuring your program to use threading can make the design clearer and easier to reason about. Let us see how to start a thread in Python.

How to Start a Thread?

The Python Standard Library contains a module named threading, which provides all the basics needed to work with threads. This module encapsulates threads and provides a clean interface to work with them.
If you want to start a thread, first you need to create a Thread instance and then call .start():

    import logging
    import threading
    import time

    def thread_function(name):
        logging.info("Thread %s: starting...", name)
        time.sleep(2)
        logging.info("Thread %s: finishing...", name)

    if __name__ == "__main__":
        format = "%(asctime)s: %(message)s"
        logging.basicConfig(format=format, level=logging.INFO,
                            datefmt="%H:%M:%S")
        logging.info("Main    : before creating thread...")
        t = threading.Thread(target=thread_function, args=(1,))
        logging.info("Main    : before running thread...")
        t.start()
        logging.info("Main    : wait for the thread to finish...")
        # t.join()
        logging.info("Main    : all done...")

The main section is responsible for creating and starting the thread:

    t = threading.Thread(target=thread_function, args=(1,))
    t.start()

When a Thread is created, a function and a list of arguments to that function are passed to it. In the example above, thread_function() is being run and 1 is passed as an argument. The function itself simply logs messages with a time.sleep() in between them.

The output of the code above will be displayed as:

    $ ./single_thread.py
    Main    : before creating thread...
    Main    : before running thread...
    Thread 1: starting...
    Main    : wait for the thread to finish...
    Main    : all done...
    Thread 1: finishing...

The Thread finishes only after the Main section of the code has completed.

Daemon Threads

In computer science, a daemon is a computer program that runs as a background process. In threading, a daemon thread is a thread that runs in the background without you having to worry about shutting it down. A daemon thread will shut down immediately when the program terminates. However, if a program is running non-daemon threads, then the program will wait for those threads to complete before it ends.
In the example code above, you might have noticed that there is a pause of about 2 seconds after the main function has printed the all done message and before the thread is finished. This is because Python waits for the non-daemonic thread to complete. threading._shutdown() goes through all of the running threads and calls .join() on every non-daemonic thread. You can understand it better if you look at the source of Python threading.

Let us repeat the example we did before with a daemon thread by adding the daemon=True flag:

    t = threading.Thread(target=thread_function, args=(1,), daemon=True)

Now if you run your program, the output will be as follows:

    $ ./daemon_thread.py
    Main    : before creating thread...
    Main    : before running thread...
    Thread 1: starting...
    Main    : wait for the thread to finish...
    Main    : all done...

The basic difference here is that the final line of output is missing. This is because when the main function reached the end of the code, the daemon thread was killed.

Multiple Threading

Running several threads concurrently within one program is called multithreading.
It can improve the performance of programs that spend much of their time waiting, and Python multithreading is quite easy to learn.

Let us start understanding multithreading using the example we used earlier:

    import logging
    import threading
    import time

    def thread_function(name):
        logging.info("Thread %s: starting...", name)
        time.sleep(2)
        logging.info("Thread %s: finishing...", name)

    if __name__ == "__main__":
        format = "%(asctime)s: %(message)s"
        logging.basicConfig(format=format, level=logging.INFO,
                            datefmt="%H:%M:%S")

        multiple_threads = list()
        for index in range(3):
            logging.info("Main    : create and start thread %d...", index)
            t = threading.Thread(target=thread_function, args=(index,))
            multiple_threads.append(t)
            t.start()

        for index, thread in enumerate(multiple_threads):
            logging.info("Main    : before joining thread %d...", index)
            thread.join()
            logging.info("Main    : thread %d done...", index)

This code works in the same way as the code we used to start a single thread. First, we create a Thread object and then call its .start() method. The program keeps a list of the Thread objects and then waits for each of them using .join(). If we run this code multiple times, the output will look something like the following, although the ordering can differ between runs:

    $ ./multiple_threads.py
    Main    : create and start thread 0...
    Thread 0: starting...
    Main    : create and start thread 1...
    Thread 1: starting...
    Main    : create and start thread 2...
    Thread 2: starting...
    Main    : before joining thread 0...
    Thread 2: finishing...
    Thread 1: finishing...
    Thread 0: finishing...
    Main    : thread 0 done...
    Main    : before joining thread 1...
    Main    : thread 1 done...
    Main    : before joining thread 2...
    Main    : thread 2 done...

In this run, the threads finish in the opposite order from the one in which they were started. Multithreading can produce a different ordering on every run. The Thread x: finishing message tells you when each thread is done. The order in which threads run is determined by the operating system and can be hard to predict, so keep this in mind when you design algorithms that use threading.
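The start-then-join pattern described above can be sketched in a form that returns a value you can inspect. This is a minimal illustration, not part of the original example: the names worker and run_workers are hypothetical, and it relies on list.append being safe to call from several threads in CPython.

```python
import threading
import time


def worker(index, finished):
    """Hypothetical worker: wait briefly, then record that it finished."""
    time.sleep(0.05)
    # Appending to a shared list is thread-safe in CPython
    finished.append(index)


def run_workers(count):
    """Start `count` threads, wait for all of them, return the finish order."""
    finished = []
    threads = []
    for index in range(count):
        t = threading.Thread(target=worker, args=(index, finished))
        threads.append(t)
        t.start()

    # join() blocks until the corresponding thread has ended
    for t in threads:
        t.join()
    return finished


if __name__ == "__main__":
    order = run_workers(3)
    print("finish order:", order)          # may differ between runs
    print("all finished:", sorted(order))
```

Because every thread is joined before run_workers() returns, the list always contains all three indexes, even though their order is up to the operating system's scheduler.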
A ThreadPoolExecutor

Using a ThreadPoolExecutor is an easier way to start up a group of threads. It is contained in the Python Standard Library in concurrent.futures. You can create it as a context manager with the help of the with statement, which manages the creation and destruction of the pool.

Example to illustrate a ThreadPoolExecutor (only the main section):

    import concurrent.futures

    if __name__ == "__main__":
        format = "%(asctime)s: %(message)s"
        logging.basicConfig(format=format, level=logging.INFO,
                            datefmt="%H:%M:%S")

        with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
            executor.map(thread_function, range(3))

The code above creates a ThreadPoolExecutor, tells it how many worker threads it needs in the pool, and then uses .map() to step through an iterable of things, passing each one to a thread in the pool. When the with block ends, .join() is called on each of the threads in the pool. It is recommended to use ThreadPoolExecutor as a context manager whenever possible so that you never forget to .join() the threads.

The output of the code will look as follows:

    $ ./executor.py
    Thread 0: starting...
    Thread 1: starting...
    Thread 2: starting...
    Thread 1: finishing...
    Thread 0: finishing...
    Thread 2: finishing...

Race Conditions

Race conditions occur when multiple threads try to access a shared piece of data or resource. They produce results that are confusing to understand, and because they can occur rarely, they are very difficult to debug.

Let us try to understand a race condition using a class with a false database:

    class FalseDatabase:
        def __init__(self):
            self.value = 0

        def update(self, name):
            logging.info("Thread %s: starting update...", name)
            local_copy = self.value
            local_copy += 1
            time.sleep(0.1)
            self.value = local_copy
            logging.info("Thread %s: finishing update...", name)

The class FalseDatabase holds the shared data, .value, on which the race condition will occur. The .__init__() method simply initializes .value to zero.
The work of .update() is to simulate reading a value from a database, performing some computation on it, and then writing a new value back to the database. Here, reading from the database just means copying .value to a local variable. The computation means adding one to the value and then calling .sleep() for a little bit. Finally, the value is written back by copying the local value back to .value.

The main section that uses FalseDatabase:

    if __name__ == "__main__":
        format = "%(asctime)s: %(message)s"
        logging.basicConfig(format=format, level=logging.INFO,
                            datefmt="%H:%M:%S")

        database = FalseDatabase()
        logging.info("Testing update. Starting value is %d...", database.value)
        with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
            for index in range(2):
                executor.submit(database.update, index)
        logging.info("Testing update. Ending value is %d...", database.value)

The program creates a ThreadPoolExecutor with two threads and calls .submit() once for each of them, telling them to run database.update().

.submit() has a signature that allows both positional and named arguments to be passed to the function running in the thread:

    .submit(function, *args, **kwargs)

The output will look as follows:

    $ ./racecond.py
    Testing update. Starting value is 0...
    Thread 0: starting update...
    Thread 1: starting update...
    Thread 0: finishing update...
    Thread 1: finishing update...
    Testing update. Ending value is 1...

Two threads each added one to the value, yet the ending value is 1 rather than 2.

One Thread

In this section, we discuss how threads work in a simplified manner.

When the ThreadPoolExecutor is told to run each thread, we are basically telling it which function to run and what parameters to pass to it: executor.submit(database.update, index). This makes each thread in the pool call database.update(index). Here, database is a reference to the FalseDatabase object that was created in the main section.

Each of the threads has a reference to the same database object and also a unique index value, which makes the log statements readable. Each thread also has its own version of all the data local to the function.
In the case of .update(), this is local_copy. This is an advantage, because it makes all the variables local to a function thread-safe.

Two Threads

If we consider the race condition again, the two threads run concurrently. Each of them points to the same database object and has its own version of local_copy. The shared database object is the reason for the problems.

The program starts with Thread 1 running .update(). Thread 1 copies database.value into its local_copy, increments it, and then calls time.sleep(), which allows another thread to take its place and start running. Thread 2 now performs the same operations as Thread 1. It also copies database.value into its local_copy, but the shared database.value has not been updated yet.

When Thread 2 goes to sleep in turn, the shared database.value still contains zero and both versions of local_copy have the value one. Thread 1 then wakes up, saves its local_copy into database.value, and terminates, which gives Thread 2 a chance to run. Thread 2 is unaware that Thread 1 ran and updated database.value while it was sleeping, so it also stores its version of local_copy into database.value, setting it to one again.

The race condition occurs here in the sense that Thread 1 and Thread 2 have interleaving access to a single shared object and they overwrite each other's results. A race condition can also occur when one thread releases memory or closes a file handle before another thread has finished working with it.

Basic Synchronization in Threading

You can solve race conditions with the help of a Lock. A Lock is an object that acts like a hall pass: it allows only one thread at a time to enter the read-modify-write section of the code. If any other thread wants to enter at the same time, it has to wait until the current owner of the Lock gives it up.

The basic functions are .acquire() and .release(). A thread calls my_lock.acquire() to get the Lock. If the Lock is already held by another thread, the calling thread has to wait until it is released.
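As a minimal sketch of explicit .acquire() and .release() calls, the following protects a read-modify-write section of the same shape as .update(). The names SafeCounter and run_increments are hypothetical and exist only for this illustration; the try/finally ensures that the Lock is given up even if the protected code raises an exception.

```python
import threading


class SafeCounter:
    """Hypothetical shared counter whose update is protected by a Lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        self._lock.acquire()        # waits here while another thread holds the Lock
        try:
            # Read-modify-write section: only one thread at a time gets here
            local_copy = self.value
            local_copy += 1
            self.value = local_copy
        finally:
            self._lock.release()    # always give the Lock up, even on error


def run_increments(num_threads=4, per_thread=1000):
    """Increment one shared counter from several threads and return it."""
    counter = SafeCounter()

    def work():
        for _ in range(per_thread):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(run_increments().value)   # 4000 with the default arguments
```

With the Lock in place, no increment is lost, so the final value always equals the number of threads multiplied by the number of increments per thread.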
The Lock in Python also works as a context manager, so it can be used in a with statement and is released automatically when the with block exits. Let us take the previous FalseDatabase class and add a Lock to it:

    class FalseDatabase:
        def __init__(self):
            self.value = 0
            self._lock = threading.Lock()

        def locked_update(self, name):
            logging.info("Thread %s: starting update...", name)
            logging.debug("Thread %s about to lock...", name)
            with self._lock:
                logging.debug("Thread %s has lock...", name)
                local_copy = self.value
                local_copy += 1
                time.sleep(0.1)
                self.value = local_copy
                logging.debug("Thread %s about to release lock...", name)
            logging.debug("Thread %s after release...", name)
            logging.info("Thread %s: finishing update...", name)

._lock is a threading.Lock() object. It is initialized in the unlocked state, and it is acquired and released by the with statement.

The output of the code above with logging set to INFO level will be as follows:

    $ ./fixingracecondition.py
    Testing locked update. Starting value is 0.
    Thread 0: starting update...
    Thread 1: starting update...
    Thread 0: finishing update...
    Thread 1: finishing update...
    Testing locked update. Ending value is 2.

The output of the code with full logging, by setting the level to DEBUG:

    $ ./fixingracecondition.py
    Testing locked update. Starting value is 0.
    Thread 0: starting update...
    Thread 0 about to lock...
    Thread 0 has lock...
    Thread 1: starting update...
    Thread 1 about to lock...
    Thread 0 about to release lock...
    Thread 0 after release...
    Thread 0: finishing update...
    Thread 1 has lock...
    Thread 1 about to release lock...
    Thread 1 after release...
    Thread 1: finishing update...
    Testing locked update. Ending value is 2.

The Lock provides mutual exclusion between the threads.

The Producer-Consumer Threading Problem
In computer science, the Producer-Consumer Threading Problem is a classic example of a multi-process synchronization problem.

Consider a program that has to read messages and write them to disk.
It listens for messages and accepts them as they come in, which happens in bursts and not at regular intervals. This part of the program is called the producer.

On the other side, once you have a message, you need to write it to the database. This database access is slow, which becomes a problem when a burst of messages comes in. This part of the program is called the consumer.

A pipeline has to be created between the producer and the consumer. This pipeline is the part that will change as you learn more about the various synchronization objects.

Using Lock
The basic design is a producer thread that reads from a fake network and puts the message into the pipeline:

    import random

    SENTINEL = object()

    def producer(pipeline):
        """Pretend we're getting a message from the network."""
        for index in range(10):
            msg = random.randint(1, 100)
            logging.info("Producer got message: %s", msg)
            pipeline.set_msg(msg, "Producer")

        # Send a sentinel message to tell consumer we're done
        pipeline.set_msg(SENTINEL, "Producer")

The producer gets a random number between 1 and 100 and calls .set_msg() on the pipeline to send it to the consumer:

    def consumer(pipeline):
        """Pretend we're saving a number in the database."""
        msg = 0
        while msg is not SENTINEL:
            msg = pipeline.get_msg("Consumer")
            if msg is not SENTINEL:
                logging.info("Consumer storing message: %s", msg)

The consumer reads a message from the pipeline and writes it to a fake database, which in this case just means logging it.

The main section of the program is as follows:

    if __name__ == "__main__":
        format = "%(asctime)s: %(message)s"
        logging.basicConfig(format=format, level=logging.INFO, datefmt="%H:%M:%S")
        # logging.getLogger().setLevel(logging.DEBUG)

        pipeline = Pipeline()
        with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
            executor.submit(producer, pipeline)
            executor.submit(consumer, pipeline)

Now let us see the code of Pipeline, which passes messages from the producer to the consumer:

    class Pipeline:
        """Class to allow a single element pipeline between producer and consumer."""

        def __init__(self):
            self.msg = 0
            self.producer_lock = threading.Lock()
            self.consumer_lock = threading.Lock()
            self.consumer_lock.acquire()

        def get_msg(self, name):
            logging.debug("%s:about to acquire getlock...", name)
            self.consumer_lock.acquire()
            logging.debug("%s:have getlock...", name)
            msg = self.msg
            logging.debug("%s:about to release setlock...", name)
            self.producer_lock.release()
            logging.debug("%s:setlock released...", name)
            return msg

        def set_msg(self, msg, name):
            logging.debug("%s:about to acquire setlock...", name)
            self.producer_lock.acquire()
            logging.debug("%s:have setlock...", name)
            self.msg = msg
            logging.debug("%s:about to release getlock...", name)
            self.consumer_lock.release()
            logging.debug("%s:getlock released...", name)

The members of Pipeline are:

- .msg - It stores the message to pass.
- .producer_lock - It is a threading.Lock object that restricts access to the message by the producer thread.
- .consumer_lock - It is a threading.Lock object that restricts access to the message by the consumer thread.

__init__() initializes the three members and then calls .acquire() on the .consumer_lock. In this starting state, the producer is allowed to add a message and the consumer has to wait until a message is present.

.get_msg() calls .acquire() on the .consumer_lock, which makes the consumer wait until a message is ready. The consumer then copies the value in .msg and calls .release() on the .producer_lock. Once that lock is released, the producer can insert the next message into the pipeline. On the other side, the producer calls .set_msg(), which acquires the .producer_lock, sets .msg, and then releases the .consumer_lock so that the consumer can read the value.
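To check the hand-off described above, here is a stripped-down, self-contained sketch of the same two-lock Pipeline without the logging calls. The fixed message list sent and the list received are illustrative assumptions so that the result can be compared at the end; the article's own producer generates random numbers instead.

```python
import concurrent.futures
import threading

SENTINEL = object()


class Pipeline:
    """Single-element pipeline guarded by two locks."""

    def __init__(self):
        self.msg = 0
        self.producer_lock = threading.Lock()
        self.consumer_lock = threading.Lock()
        self.consumer_lock.acquire()  # nothing to read yet

    def get_msg(self):
        self.consumer_lock.acquire()  # wait until a message is present
        msg = self.msg
        self.producer_lock.release()  # let the producer write the next one
        return msg

    def set_msg(self, msg):
        self.producer_lock.acquire()  # wait until the last message was read
        self.msg = msg
        self.consumer_lock.release()  # let the consumer read it


def producer(pipeline, messages):
    for msg in messages:
        pipeline.set_msg(msg)
    pipeline.set_msg(SENTINEL)  # tell the consumer we are done


def consumer(pipeline, received):
    while True:
        msg = pipeline.get_msg()
        if msg is SENTINEL:
            break
        received.append(msg)


sent = [43, 45, 86, 40, 62]  # assumed sample messages
received = []
pipeline = Pipeline()
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    executor.submit(producer, pipeline, sent)
    executor.submit(consumer, pipeline, received)

print(received)
```

Each message is read exactly once and in order, because the producer cannot overwrite .msg until the consumer has released the .producer_lock.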
The output of the code with the logging set to INFO:

    $ ./producerconsumer_lock.py
    Producer got data 43
    Producer got data 45
    Consumer storing data: 43
    Producer got data 86
    Consumer storing data: 45
    Producer got data 40
    Consumer storing data: 86
    Producer got data 62
    Consumer storing data: 40
    Producer got data 15
    Consumer storing data: 62
    Producer got data 16
    Consumer storing data: 15
    Producer got data 61
    Consumer storing data: 16
    Producer got data 73
    Consumer storing data: 61
    Producer got data 22
    Consumer storing data: 73
    Consumer storing data: 22

Objects in Threading
Python's threading module provides a few more objects that can be handy in different cases. Some of them are discussed below.

Semaphore
A semaphore is a counter with a few unique properties. The first property is that its counting is atomic, which means that the operating system will not swap out the thread in the middle of incrementing or decrementing the counter. The internal counter is incremented when .release() is called and decremented when .acquire() is called.

The other property is that if a thread calls .acquire() while the counter is zero, the thread is blocked until another thread calls .release().

Semaphores are mainly used to protect a resource that has a limited capacity, for example when you have a pool of connections and want to limit the size of the pool to a particular number.

Timer
A threading.Timer schedules a function to be called after a certain amount of time has passed. To create a Timer, pass the number of seconds to wait and the function to call: t = threading.Timer(20.0, my_timer_function)

The timer is started by calling .start() and you can stop it by calling .cancel(). A Timer is useful for prompting an action after a particular amount of time.

Summary
In this article we have covered most of the topics associated with threading in Python.
We have discussed:

- What is Threading
- Creating and starting a Thread
- Multiple threading
- Race Conditions and how to prevent them
- Threading Objects

We hope you are now well aware of Python threading, how to build threaded programs, and the problems they help to solve. You have also gained knowledge of the problems that arise when writing and debugging different types of threaded programs.

For more information about threading and its uses in real-world applications, you may refer to the official documentation of Python threading.

To gain more knowledge about Python tips and tricks, check our Python tutorial and get a good hold over coding in Python by joining the Python certification course.
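As a supplement to the Semaphore and Timer objects described in this article, the following is a small sketch of both. The pool size MAX_CONNECTIONS, the worker function use_connection and the short delays are assumptions chosen only to keep the example quick to run.

```python
import threading
import time

MAX_CONNECTIONS = 2  # assumed size of the connection pool
pool_slots = threading.Semaphore(MAX_CONNECTIONS)
state_lock = threading.Lock()
active = 0  # workers currently holding a slot
peak = 0    # highest number of simultaneous holders seen


def use_connection():
    """Pretend to use one connection from a limited pool."""
    global active, peak
    with pool_slots:  # .acquire() on entry, .release() on exit
        with state_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)  # simulated work
        with state_lock:
            active -= 1


workers = [threading.Thread(target=use_connection) for _ in range(6)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

# A Timer calls a function once after the given number of seconds.
fired = threading.Event()
timer = threading.Timer(0.1, fired.set)
timer.start()
fired.wait(timeout=2.0)

# A Timer that is cancelled before it expires never calls its function.
cancelled_calls = []
cancelled = threading.Timer(5.0, cancelled_calls.append, args=("ran",))
cancelled.start()
cancelled.cancel()
cancelled.join(timeout=1.0)

print(peak, fired.is_set(), cancelled_calls)
```

The semaphore never lets more than MAX_CONNECTIONS workers into the protected block at once, however many threads are started.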
What is PyPI & How To Publish An Open-Source Python Package to PyPI

The Python Standard Library comprises sophisticated and robust capabilities for working with larger packages. You will find modules for working with sockets and with files and file paths.

However great the packages that come with Python are, there are many more exciting and fantastic projects outside the standard library. These are mostly hosted at the Python Package Index (PyPI), which is a repository of software for the Python programming language.

PyPI is considered an important reason why Python is such a powerful language. It gives you access to thousands of libraries, ranging from Hello World to advanced deep learning libraries.

What is PyPI
"PyPI" should be pronounced like "pie pea eye", with the "PI" pronounced as individual letters rather than as a single sound. This minimizes confusion with the PyPy project, which is a popular alternative implementation of the Python language.

The Python Package Index, abbreviated as PyPI, is also known as the Cheese Shop. It is the official third-party software repository for Python, just as CPAN is the repository for Perl. Some package managers, such as pip, use PyPI as the default source for packages and their dependencies. More than 113,000 Python packages can be accessed through PyPI.

How to use PyPI
To install packages from PyPI you need a package installer. The recommended package installer for PyPI is pip, which is installed along with Python on your system. To learn more about pip, you may go through our article on “What is pip”.

The pip command is a tool for installing and managing Python packages, such as those found in the Python Package Index. It is a replacement for easy_install.

To install a package from the Python Package Index, open your terminal and use the pip tool. The most common uses of pip are installing, upgrading, and uninstalling a package.
Starting with a Small Python Package
We will start with a small Python package that we will use as an example to publish to PyPI. You can get the full source code from the GitHub repository. The package is called reader and it is an application with which you can download and read articles.

The directory structure of reader is shown below:

    reader/
    │
    ├── reader/
    │   ├── config.txt
    │   ├── feed.py
    │   ├── __init__.py
    │   ├── __main__.py
    │   └── viewer.py
    │
    ├── tests/
    │   ├── test_feed.py
    │   └── test_viewer.py
    │
    ├── MANIFEST.in
    ├── README.md
    └── setup.py

The source code of the package is in a reader subdirectory, together with a configuration file. The GitHub repository also contains a few tests in a separate subdirectory.

In the coming sections, we will discuss how the reader package works and also take a look at the special files, which include setup.py, README.md, MANIFEST.in, and others.

Using the Article Reader
The reader is a very basic web feed reader. A web feed is a data format used for providing users with the latest updated content. With the help of reader, you can download the latest articles from the article feed.
You can get the list of articles using the reader:

    $ python -m reader
    The latest tutorials from Real Python (https://realpython.com/)
      0 How to Publish an Open-Source Python Package to PyPI
      1 Python "while" Loops (Indefinite Iteration)
      2 Writing Comments in Python (Guide)
      3 Setting Up Python for Machine Learning on Windows
      4 Python Community Interview With Michael Kennedy
      5 Practical Text Classification With Python and Keras
      6 Getting Started With Testing in Python
      7 Python, Boto3, and AWS S3: Demystified
      8 Python's range() Function (Guide)
      9 Python Community Interview With Mike Grouchy
     10 How to Round Numbers in Python
     11 Building and Documenting Python REST APIs With Flask and Connexion – Part 2
     12 Splitting, Concatenating, and Joining Strings in Python
     13 Image Segmentation Using Color Spaces in OpenCV + Python
     14 Python Community Interview With Mahdi Yusuf
     15 Absolute vs Relative Imports in Python
     16 Top 10 Must-Watch PyCon Talks
     17 Logging in Python
     18 The Best Python Books
     19 Conditional Statements in Python

The articles in the list are numbered. To read a particular article, run the same command followed by the number of the article you want to read.

For example, to read the article “How to Publish an Open-Source Python Package to PyPI”, add its number to the command:

    $ python -m reader 0
    # How to Publish an Open-Source Python Package to PyPI

    Python is famous for coming with batteries included. Sophisticated
    capabilities are available in the standard library. You can find modules
    for working with sockets, parsing CSV, JSON, and XML files, and
    working with files and file paths.

    However great the packages included with Python are, there are many
    fantastic projects available outside the standard library. These are
    most often hosted at the Python Packaging Index (PyPI), historically
    known as the Cheese Shop.
    At PyPI, you can find everything from Hello
    World to advanced deep learning libraries.
    ...
    ...
    ...

You can read any of the articles in the list just by changing the article number in the command.

Quick Look
The package consists of five files that do the work of the reader. Let us go through the implementation one file at a time.

config.txt - It is a text configuration file that specifies the URL of the feed of articles. The configparser module from the standard library is able to read this text file. This type of file contains key-value pairs that are grouped into different sections.

    # config.txt
    [feed]
    url=https://realpython.com/atom.xml

__main__.py - It is the entry point of your program, whose duty is to control the main flow of the program. The double underscores indicate that this file has a special meaning: when the package is run, Python executes the contents of the __main__.py file.

    # __main__.py
    from configparser import ConfigParser
    from importlib import resources
    import sys

    from reader import feed
    from reader import viewer

    def main():
        # Read URL of the Real Python feed from config file
        configure = ConfigParser()
        configure.read_string(resources.read_text("reader", "config.txt"))
        URL = configure.get("feed", "url")

        # If an article ID is given, show the article
        if len(sys.argv) > 1:
            article = feed.get_article(URL, sys.argv[1])
            viewer.show(article)

        # If no ID is given, show a list of all articles
        else:
            site = feed.get_site(URL)
            titles = feed.get_titles(URL)
            viewer.show_list(site, titles)

    if __name__ == "__main__":
        main()

__init__.py - It is also considered a special file because of the double underscores. It denotes the root of your package, in which you can keep your package constants, your documentation and so on.

    # __init__.py
    # Version of the realpython-reader package
    __version__ = "1.0.0"

__version__ is a special variable in Python used for adding version numbers to your package, as described in PEP 396. The variables defined in __init__.py are also available as variables in the package namespace.
    >>> import reader
    >>> reader.__version__
    '1.0.0'

feed.py - In __main__.py, you can see that two modules, feed and viewer, are imported. These perform the actual work. The file feed.py is used to read from a web feed and parse the result.

    # feed.py
    import feedparser
    import html2text

    _CACHED_FEEDS = dict()

    def _feed(url):
        """Only read a feed once, by caching its contents"""
        if url not in _CACHED_FEEDS:
            _CACHED_FEEDS[url] = feedparser.parse(url)
        return _CACHED_FEEDS[url]

viewer.py - This module contains two functions, show() and show_list().

    # viewer.py
    def show(article):
        """Show one article"""
        print(article)

    def show_list(site, titles):
        """Show list of articles"""
        print(f"The latest tutorials from {site}")
        for article_id, title in enumerate(titles):
            print(f"{article_id:>3} {title}")

show() prints one article to the console, while show_list() prints a list of titles.

Calling a Package
Since your package consists of four different source code files, you need to understand which file to call in order to run the reader. The Python interpreter has an -m option that lets you specify a module name instead of a file name.

For example, a script hello.py can be executed with either of the following two commands:

    $ python hello.py
    Hi there!

    $ python -m hello
    Hi there!

The two commands above are equivalent. However, the latter one with -m has an advantage: you can also use it to call Python built-in modules:

    $ python -m antigravity
    Created new window in existing browser session.

The -m option also allows you to work with packages as well as modules:

    $ python -m reader
    ...

Here reader refers to a directory, so Python looks for a file named __main__.py inside it. If the file is found, it is executed; otherwise an error message is printed:

    $ python -m math
    python: No code object available for math

Preparing Your Package
Now that you have a package, let us go through the steps that need to be completed before it can be uploaded.
Naming the Package
Finding a good and unique name for your package is the first and one of the most difficult tasks. PyPI already has more than 150,000 packages in its list, so chances are that your favorite name is already taken.

You need to do some research in order to find a perfect name. You can also use the PyPI search to verify whether a name is already in use.

We will use a more descriptive name and call the package realpython-reader, so that the reader package can be found easily on PyPI. This is also the name used to install the package with pip:

    $ pip install realpython-reader

Although the name on PyPI is realpython-reader, the package is still called reader when we import it:

    >>> import reader
    >>> help(reader)

    >>> from reader import feed
    >>> feed.get_titles()
    ['How to Publish an Open-Source Python Package to PyPI', ...]

You can use different names for your package on PyPI and when importing it, but it is suggested to use the same name or similar ones for better understanding.

Configuring your Package
Your package should include some basic information about itself, which is provided in the form of a setup.py file. The setup.py file is the only fully supported way of providing this information, though there are initiatives that aim to simplify this collection of information.

The setup.py file should be placed in the top folder of your package.
An example of a setup.py for reader:

    import pathlib
    from setuptools import setup

    # The directory containing this file
    HERE = pathlib.Path(__file__).parent

    # The text of the README file
    README = (HERE / "README.md").read_text()

    # This call to setup() does all the work
    setup(
        name="realpython-reader",
        version="1.0.0",
        description="The latest Python tutorials",
        long_description=README,
        long_description_content_type="text/markdown",
        url="https://github.com/realpython/reader",
        author="Real Python",
        author_email="office@realpython.com",
        license="MIT",
        classifiers=[
            "License :: OSI Approved :: MIT License",
            "Programming Language :: Python :: 3",
            "Programming Language :: Python :: 3.7",
        ],
        packages=["reader"],
        include_package_data=True,
        install_requires=["feedparser", "html2text"],
        entry_points={
            "console_scripts": [
                "realpython=reader.__main__:main",
            ]
        },
    )

The parameters that are strictly necessary in the call to setup() are as follows:

- name - the name of your package as it will appear on PyPI
- version - the current version of your package
- packages - the packages and subpackages which contain your source code

You also have to specify any subpackages, if there are any. setuptools contains find_packages(), whose job is to discover all your subpackages. You can also use it in the reader project:

    from setuptools import find_packages, setup

    setup(
        ...
        packages=find_packages(exclude=("tests",)),
        ...
    )

You can also add more information along with name, version, and packages, which will make your package easier to find on PyPI.

Two more important parameters of setup():

- install_requires - It lists the dependencies your package has on third-party libraries. feedparser and html2text are listed since they are the dependencies of reader.
- entry_points - It creates scripts that call a function within your package. Our script realpython calls main() within the reader/__main__.py file.

Documenting Your Package
Documenting your package before releasing it is an important step.
The documentation can be a simple README file or a complete tutorial website with galleries and an API reference.

At a minimum, you should include a README file with your project. It should give a quick description of your package and explain how to install and use it. You should also pass your README as the long_description argument to setup(), so that it is displayed on PyPI.

PyPI supports Markdown for package documentation. You can use the setup() parameter long_description_content_type to tell PyPI which format you are using.

When you are working with bigger projects and want to add more documentation to your package, you can take the help of websites like GitHub and Read the Docs.

Versioning Your Package
Just like documentation, your package needs a version. PyPI only lets you upload a particular version of a package once. This supports reproducibility: two systems with the same version of a package should behave in exactly the same way.

PEP 440 provides a number of schemes for software versioning. However, for a simple project, let us stick to a simple versioning scheme.

A simple versioning technique is semantic versioning, which has three components, namely MAJOR, MINOR, and PATCH, and some simple rules about when to increment each component:

- Increment the MAJOR version when you make incompatible API changes.
- Increment the MINOR version when you add functionality in a backward-compatible manner.
- Increment the PATCH version when you make backward-compatible bug fixes. (Source)

You may need to specify the version in several different files inside your project. To keep the version numbers consistent, you can use a tool called Bumpversion:

    $ pip install bumpversion

Adding Files To Your Package
Your package might include files other than source code files, such as data files, binaries, documentation and configuration files.
In order to add such files, we will use a manifest file. In most cases, setup() creates a manifest that includes all code files as well as README files.

However, if you want to change the manifest, you can create a manifest template of your own. The file should be called MANIFEST.in and it specifies rules for what needs to be included and what needs to be excluded:

    include reader/*.txt

This will add all the .txt files in the reader directory. Other than creating the manifest, the non-code files also need to be copied. This can be done by setting include_package_data to True:

    setup(
        ...
        include_package_data=True,
        ...
    )

Publishing to PyPI
To publish your package to the real world, you first need to register yourself on PyPI and also on TestPyPI. TestPyPI is useful because you can try out the publishing process without any further consequences.

You will have to use a tool called Twine to upload your package to PyPI:

    $ pip install twine

Building Your Package
The packages on PyPI are wrapped into distribution packages, of which the most common are source archives and Python wheels. A source archive comprises your source code and other corresponding support files wrapped into one tar file. A Python wheel, on the other hand, is a zip archive that also contains your code. Unlike a source archive, however, a wheel can also include any extensions, ready to use.

Run the following command in order to create a source archive and a wheel for your package:

    $ python setup.py sdist bdist_wheel

The command above will create two files in a newly created directory called dist, a source archive and a wheel:

    reader/
    │
    └── dist/
        ├── realpython_reader-1.0.0-py3-none-any.whl
        └── realpython-reader-1.0.0.tar.gz

Command-line arguments like sdist and bdist_wheel are all implemented in the upstream distutils standard library.
Using the --help-commands option, you can list all the available arguments:

    $ python setup.py --help-commands
    Standard commands:
      build             build everything needed to install
      build_py          "build" pure Python modules (copy to build directory)
      < ... many more commands ...>

Testing Your Package
In order to test your package, you need to check whether the distribution packages you have newly created contain the expected files. On Linux and macOS, you can list the contents of the tar source archive as follows:

    $ tar tzf realpython-reader-1.0.0.tar.gz
    realpython-reader-1.0.0/
    realpython-reader-1.0.0/setup.cfg
    realpython-reader-1.0.0/README.md
    realpython-reader-1.0.0/reader/
    realpython-reader-1.0.0/reader/feed.py
    realpython-reader-1.0.0/reader/__init__.py
    realpython-reader-1.0.0/reader/viewer.py
    realpython-reader-1.0.0/reader/__main__.py
    realpython-reader-1.0.0/reader/config.txt
    realpython-reader-1.0.0/PKG-INFO
    realpython-reader-1.0.0/setup.py
    realpython-reader-1.0.0/MANIFEST.in
    realpython-reader-1.0.0/realpython_reader.egg-info/
    realpython-reader-1.0.0/realpython_reader.egg-info/SOURCES.txt
    realpython-reader-1.0.0/realpython_reader.egg-info/requires.txt
    realpython-reader-1.0.0/realpython_reader.egg-info/dependency_links.txt
    realpython-reader-1.0.0/realpython_reader.egg-info/PKG-INFO
    realpython-reader-1.0.0/realpython_reader.egg-info/entry_points.txt
    realpython-reader-1.0.0/realpython_reader.egg-info/top_level.txt

On Windows, you can make use of the utility tool 7-zip to look inside the corresponding zip file.

You should make sure that all the subpackages and supporting files are included in your package, along with all the source code files as well as the newly built files.
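If you prefer a check that works the same way on every platform, the listing can also be done from Python itself. The following is only a sketch using the standard library tarfile and zipfile modules; the helper name list_archive and the small demo archive it builds are assumptions made so that the example runs on its own. In practice you would point it at the files in your dist directory.

```python
import io
import tarfile
import tempfile
import zipfile
from pathlib import Path


def list_archive(path):
    """Return the sorted member names of a wheel (zip) or source archive (tar)."""
    path = Path(path)
    if path.suffix in (".whl", ".zip"):
        with zipfile.ZipFile(path) as archive:
            return sorted(archive.namelist())
    with tarfile.open(path) as archive:  # compression is detected automatically
        return sorted(archive.getnames())


# Build a tiny stand-in archive so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo-1.0.0.tar.gz"
    with tarfile.open(demo, "w:gz") as archive:
        for name in ("demo-1.0.0/setup.py", "demo-1.0.0/reader/config.txt"):
            data = b"placeholder"
            info = tarfile.TarInfo(name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    demo_names = list_archive(demo)

for name in demo_names:
    print(name)
```

Comparing the returned names against the files you expect (for example config.txt) is a quick way to catch a missing MANIFEST.in rule.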
You can also run twine check on the files created in dist to check whether your package description will render properly on PyPI:

    $ twine check dist/*
    Checking distribution dist/realpython_reader-1.0.0-py3-none-any.whl: Passed
    Checking distribution dist/realpython-reader-1.0.0.tar.gz: Passed

Uploading Your Package
Now you have reached the final step, i.e. uploading your package to PyPI. Make sure you upload your package to TestPyPI first to check whether it works according to your expectations. Use the Twine tool and instruct it to upload your newly created distribution:

    $ twine upload --repository-url https://test.pypi.org/legacy/ dist/*

After the uploading process is over, you can go to TestPyPI and look at your project being displayed among the new releases.

If you have your own package to publish to PyPI, the command is shorter:

    $ twine upload dist/*

Give your username and password and it's done. Your package has been published on PyPI. To look up your package, you can search for it, look at the Your projects page, or go directly to the URL of your project: pypi.org/project/your-package-name/.

After completing the publishing process, you can install it on your system using pip:

    $ pip install your-package-name

Miscellaneous Tools
There are some useful tools that are good to know when creating and publishing Python packages. Some of these are mentioned below.

Virtual Environments
Each virtual environment has its own Python binary and can also have its own set of installed Python packages in its directories. These packages are independent of the packages installed in other environments. Virtual environments are useful when different projects have different requirements and dependencies.
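As a rough illustration of the point above, the standard library venv module can create such an environment programmatically. The directory name demo-env is a hypothetical choice, and with_pip=False is used only to keep this sketch fast; an environment for testing a real package would normally include pip.

```python
import os
import tempfile
import venv
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "demo-env"  # hypothetical environment location

    # Create the environment; it gets its own configuration and its own
    # directory for the Python binary and installed scripts.
    venv.create(env_dir, with_pip=False)

    has_config = (env_dir / "pyvenv.cfg").is_file()
    bin_dir = env_dir / ("Scripts" if os.name == "nt" else "bin")
    has_bin_dir = bin_dir.is_dir()

print(has_config, has_bin_dir)
```

On the command line, the equivalent is usually python -m venv followed by the target directory.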
You can find more information about virtual environments in the following references:

- Python Virtual Environments
- Pipenv

It is recommended to check your package inside a basic virtual environment, to make sure that all necessary dependencies are included in your setup.py file.

Cookiecutter
Cookiecutter sets up your project by asking a few questions based on a template. Many different templates are available.

Install Cookiecutter using pip:

    $ pip install cookiecutter

To understand cookiecutter, we will use a template called pypackage-minimal. To use a template, provide the link of the template to cookiecutter:

    $ cookiecutter https://github.com/kragniz/cookiecutter-pypackage-minimal
    author_name [Louis Taylor]: Real Python
    author_email [louis@kragniz.eu]: office@realpython.com
    package_name [cookiecutter_pypackage_minimal]: realpython-reader
    package_version [0.1.0]:
    package_description [...]: Read Real Python tutorials
    package_url [...]: https://github.com/realpython/reader
    readme_pypi_badge [True]:
    readme_travis_badge [True]: False
    readme_travis_url [...]:

Cookiecutter sets up your project after you have answered a series of questions. The template above will create the following files and directories:

    realpython-reader/
    │
    ├── realpython-reader/
    │   └── __init__.py
    │
    ├── tests/
    │   ├── __init__.py
    │   └── test_sample.py
    │
    ├── README.rst
    ├── setup.py
    └── tox.ini

You can also take a look at the documentation of cookiecutter for all the available cookiecutters and how to create your own template.

Summary
Let us sum up the necessary steps we have learned in this article to publish your own package:

- Finding a good and unique name for your package
- Configuring your package using setup.py
- Building your package
- Publishing your package to PyPI

Moreover, you have also learned to use a few new tools that help in simplifying the process of publishing packages.
You can refer to the Python Packaging Authority for more detailed and comprehensive information.

To gain more knowledge about Python tips and tricks, check our Python tutorial and get a good hold over coding in Python by joining the Python certification course.
Boosting and AdaBoost in Machine Learning

Ensemble learning is a strategy in which a group of models is used to solve a challenging problem by combining diverse machine learning models into one single predictive model.

In general, ensemble methods are mainly used for improving the overall accuracy of a model. They combine several different models, also known as base learners, to predict the results, instead of using a single model.

In one of the articles related to ensemble learning, we have already discussed the popular ensemble method Bootstrap Aggregation (Bagging). Bagging fits similar learners on small sample populations and then takes a mean of all the predictions. It combines Bootstrapping and Aggregation to form one ensemble model. It basically reduces the variance error and helps to avoid overfitting. In this article we will look into the limitations of bagging and how a boosting algorithm can be used to overcome those limitations. We will also learn about various types of boosting algorithms and implement one of them in Python. Let's get started.

What are the limitations of Bagging?
Let us recall the concept of bagging and consider a binary classification problem. We are classifying an observation either as 0 or as 1.

In bagging, T bootstrap samples are selected, a classifier is fitted on each of these samples, and the models are trained in parallel. In a Random Forest, for example, decision trees are trained in parallel. Then the results of all classifiers are averaged into a bagging classifier:

[Figure: Formula for a Bagging Classifier]

Let us consider 3 classifiers, where the result of each classification can be either right or wrong. If we plot the results of the 3 classifiers, there are regions in which the classifiers will be wrong. These regions are represented in red in the figure below.

[Figure: Example case in which Bagging works well]

The above example works pretty well because when one classifier is wrong, the two others are correct.
With a voting classifier, you can achieve better accuracy. However, there are cases where Bagging does not work well: when all the classifiers are mistaken in the same region.

For this reason, the intuition behind Boosting was the following:

- instead of training models in parallel, one should train models sequentially
- each model should focus on where the previous classifier performed poorly

With this intuition, the Boosting algorithm was introduced. Let us understand what Boosting is all about.

What is Boosting?

Boosting is an ensemble modeling technique which attempts to build a strong classifier from a number of weak classifiers. It does so by building weak models in series. First, a model is built from the training data. Then a second model is built which tries to correct the errors of the first model. This procedure is continued, and models are added until either the complete training data set is predicted correctly or the maximum number of models is reached.

Because boosting is a sequential process, each subsequent model attempts to correct the errors of the previous model. Unlike bagging, it is focused on reducing the bias. This makes boosting algorithms prone to overfitting. To avoid overfitting, parameter tuning plays an important role in boosting algorithms, which will be discussed in the later part of this article. Some examples of boosting algorithms are XGBoost, GBM and AdaBoost.

How can boosting identify weak learners?

To find weak learners, we apply base learning (ML) algorithms with a different distribution of the training data each time. Each time the base learning algorithm is applied, it generates a new weak prediction rule. This is an iterative process.
After many iterations, the boosting algorithm combines these weak rules into a single strong prediction rule.

How do we choose a different distribution for each round?

Step 1: The base learner takes the whole distribution and assigns equal weight or attention to each observation.

Step 2: If the first base learning algorithm makes prediction errors, we pay more attention to the observations it predicted incorrectly. Then, we apply the next base learning algorithm.

Step 3: Iterate Step 2 until the limit on the number of base learners is reached or a higher accuracy is achieved.

Finally, boosting combines the outputs of the weak learners and creates a strong learner, which improves the prediction power of the model. Boosting gives more focus to examples which are misclassified or have higher errors under the preceding weak rules.

How would you classify an email as SPAM or not?

Our initial approach would be to identify ‘SPAM’ and ‘NOT SPAM’ emails using the following criteria. If:

- The email has only one image file (promotional image), it’s SPAM.
- The email has only link(s), it’s SPAM.
- The email body contains sentences like “You won a prize money of $ xxxxxx”, it’s SPAM.
- The email is from our official domain “www.knowledgehut.com”, it’s NOT SPAM.
- The email is from a known source, it’s NOT SPAM.

Individually, these rules are not powerful enough to classify an email as ‘SPAM’ or ‘NOT SPAM’. Therefore, these rules are called weak learners.

To convert weak learners into a strong learner, we combine the predictions of the weak learners using methods like:

- Using an average or weighted average
- Taking the prediction that has the higher vote

Example: Above, we have defined 5 weak learners. Suppose 3 of these 5 vote ‘SPAM’ and 2 vote ‘NOT SPAM’. In this case, by default, we consider the email to be SPAM because ‘SPAM’ has the higher vote (3).

Boosting trains a series of low-performing algorithms, called weak learners, simply by adjusting the error metric over time.
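The five spam rules and the 3-to-2 vote can be sketched in a few lines. This is only an illustration under stated assumptions: the dictionary keys are hypothetical names, and a rule whose condition does not hold is assumed to cast the opposite vote, which the article does not specify.

```python
from collections import Counter

# Hypothetical email representation; the field names are assumptions.
email = {
    "only_image": False,
    "only_links": False,
    "mentions_prize": True,
    "from_official_domain": False,
    "from_known_source": False,
}

# Each weak rule maps an email to "SPAM" or "NOT SPAM".
# Assumption: a rule that does not fire votes for the opposite label.
weak_rules = [
    lambda e: "SPAM" if e["only_image"] else "NOT SPAM",
    lambda e: "SPAM" if e["only_links"] else "NOT SPAM",
    lambda e: "SPAM" if e["mentions_prize"] else "NOT SPAM",
    lambda e: "NOT SPAM" if e["from_official_domain"] else "SPAM",
    lambda e: "NOT SPAM" if e["from_known_source"] else "SPAM",
]

# Collect one vote per rule and keep the label with the most votes.
votes = [rule(email) for rule in weak_rules]
tally = Counter(votes)
label = tally.most_common(1)[0][0]

print(tally, "->", label)
```

A weighted average would replace the plain count with a sum of per-rule weights, so that more reliable rules count for more. That is the direction boosting takes.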
Weak learners are algorithms whose error rate is only slightly under 50%, that is, only slightly better than random guessing on a binary problem.

Weighted errors

Let us consider data points on a 2D plot. Some of the data points will be well classified, others won’t. The weight attributed to each error when computing the error rate is 1/n, where n is the number of data points to classify.

Now suppose we apply some weight to the errors. We give more weight to the data points that are not well classified. An illustration of the weighting process is shown below:

[Figure: Example of weighting process]

In the end, we want to build a strong classifier that may look like the figure below:

[Figure: Strong Classifier]

Tree stumps

You might wonder how many classifiers one should implement to ensure that the ensemble works well, and how each classifier is chosen at each step.

A tree stump is a 1-level decision tree. At each step, we need to find the best stump, i.e. the best data split, which minimizes the overall error. You can see a stump as a test in which the assumption is that everything that lies on one side belongs to class 1, and everything that lies on the other side belongs to class 0.

Many such combinations are possible for a tree stump. Let us look into an example to understand how many combinations we face.

[Figure: 3 data points to split]

There are 12 possible combinations. Let us check how.

[Figure: 12 Stumps]

There are 12 possible “tests” we could make. The “2” on the side of each separating line simply represents the fact that all points on one side could belong to class 0 or to class 1. Therefore, there are 2 tests embedded in each line.

At each iteration t, we choose h_t, the weak classifier that best splits the data, i.e. the one that reduces the overall error rate the most.
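To make the stump search concrete, here is a minimal sketch that tries every axis-aligned split, tests both class assignments per split, and keeps the stump with the lowest weighted error. The function name `best_stump`, the toy data and the hand-picked weights are all made up for illustration, and only axis-aligned thresholds are considered, which is an assumption about how the stumps are drawn.

```python
import numpy as np


def best_stump(X, y, w):
    """Exhaustively search for the stump with the lowest weighted error.

    X: (n, d) features, y: labels in {0, 1}, w: weights summing to 1.
    Returns (weighted_error, feature_index, threshold, class_above).
    """
    best = None
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        # Candidate thresholds: midpoints between consecutive distinct values.
        thresholds = (values[:-1] + values[1:]) / 2
        for thr in thresholds:
            for above in (0, 1):  # the two "tests" per separating line
                pred = np.where(X[:, j] > thr, above, 1 - above)
                err = np.sum(w[pred != y])  # weighted error of this stump
                if best is None or err < best[0]:
                    best = (err, j, thr, above)
    return best


# Toy data, made up for illustration; no single stump separates it.
X = np.array([[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]])
y = np.array([0, 1, 1, 0])

# Equal weights 1/n: every mistake costs the same.
uniform = np.full(len(y), 1 / len(y))
err_u, feat_u, thr_u, above_u = best_stump(X, y, uniform)

# Hand-picked weights that put most of the mass on the last point.
weighted = np.array([0.1, 0.1, 0.1, 0.7])
err_w, feat_w, thr_w, above_w = best_stump(X, y, weighted)

print(err_u, feat_u, thr_u, above_u)
print(err_w, feat_w, thr_w, above_w)
```

With equal weights the search settles for a stump with weighted error 0.25 that misclassifies the last point. Once that point carries most of the weight, a different stump wins, with weighted error 0.1. This is how the weights steer each new weak classifier toward earlier mistakes.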
Recall that the error rate here is the modified (weighted) error rate introduced above.

Finding the best split

The best split is found by identifying, at each iteration t, the best weak classifier h_t, generally a decision tree with 1 node and 2 leaves (a stump). Let us consider the example of a credit defaulter, i.e. whether a person who borrowed money will repay it or not.

[Figure: Identifying the best split]

In this case, the best split at time t is to stump on the Payment history, since the weighted error resulting from this split is the minimum.

Note that decision tree classifiers like these can in practice be deeper than a simple stump. The depth is treated as a hyper-parameter.

Combining classifiers

In the next step we combine the classifiers into a sign classifier: depending on which side of the frontier a point stands, it is classified as 0 or 1. It can be achieved by:

[Figure: Combining classifiers]

You can improve the classifier by adding a weight to each classifier, to avoid giving the same importance to the different classifiers.

AdaBoost

Pseudo-code

[Figure: Pseudo-code]

The key elements to keep in mind are:

- Z is a constant whose role is to normalize the weights so that they add up to 1
- α_t is a weight that we apply to each classifier

This algorithm is called AdaBoost or Adaptive Boosting. It is one of the most important algorithms among all boosting methods.

Computation

Boosting algorithms are generally fast to train, although we consider every possible stump and compute exponentials recursively.

If we choose α_t and Z properly, the weights that are supposed to change at each step simplify to:

[Figure: Weights after choice of α and Z]

Types of Boosting Algorithms

The underlying engine used for boosting algorithms can be anything. It can be a decision stump, a margin-maximizing classification algorithm, etc.
There are many boosting algorithms which use other types of engines, such as:

- AdaBoost (Adaptive Boosting)
- Gradient Tree Boosting
- XGBoost

In this article, we will focus on AdaBoost and Gradient Boosting, followed by their respective Python code, and say a little about XGBoost.

Where are Boosted algorithms required?

Boosted algorithms are mainly used when there is plenty of data to make a prediction and high predictive power is expected. Boosting is used to reduce bias and variance in supervised learning. It combines multiple weak predictors to build a strong predictor.

The underlying engine used for boosting algorithms can be anything. For instance, AdaBoost is boosting done on decision stumps. There are many other boosting algorithms which use other types of engine, such as:

- GentleBoost
- Gradient Boosting
- LPBoost
- BrownBoost

Adaptive Boosting

Adaptive Boosting, most commonly known as AdaBoost, is a Boosting algorithm in which each predictor tries to correct its predecessor. It pays more attention to the training instances that the previous model underfitted. Thus, every new predictor focuses more on the difficult cases than on the others.

It fits a sequence of weak learners on differently weighted training data. It starts by predicting the original data set and gives equal weight to each observation. If the first learner predicts an observation incorrectly, that observation is given a higher weight. Being an iterative process, it continues to add learners until a limit is reached in the number of models or in accuracy.

Mostly, AdaBoost uses decision stumps. However, we can use any machine learning algorithm as the base learner if it accepts weights on the training data set. We can use AdaBoost algorithms for both classification and regression problems.

Let us consider the example of the image mentioned above.
In order to build an AdaBoost classifier, suppose that a first base classifier, a Decision Tree, is trained to make predictions on our training data. Following the AdaBoost methodology, the weight of the misclassified training instances is then increased. A second classifier is trained using the updated weights, and the procedure is repeated over and over again.

At the end of every model prediction we end up boosting the weights of the misclassified instances so that the next model does a better job on them, and so on.

This sequential learning technique might sound similar to Gradient Descent, except that instead of tweaking a single predictor’s parameters to minimize the cost function, AdaBoost adds predictors to the ensemble, gradually making it better.

One disadvantage of this algorithm is that training cannot be parallelized, since each predictor can only be trained after the previous one has been trained and evaluated.

Below are the steps for performing the AdaBoost algorithm:

1. Initially, all observations are given equal weights.
2. A model is built on a subset of data.
3. Using this model, predictions are made on the whole dataset.
4. Errors are calculated by comparing the predictions and actual values.
5. While creating the next model, higher weights are given to the data points which were predicted incorrectly.
6. Weights can be determined using the error value. For instance, the higher the error, the larger the weight assigned to the observation.
7. This process is repeated until the error function does not change, or the maximum limit of the number of estimators is reached.

Hyperparameters

- base_estimator: specifies the base estimator, i.e. the algorithm to be used as the base learner. Recent scikit-learn releases name this parameter estimator.
- n_estimators: the number of base estimators. The default in scikit-learn’s AdaBoostClassifier is 50; increasing it can improve performance.
- learning_rate: shrinks the contribution of each classifier, playing a role similar to the learning rate in gradient descent.
- max_depth: maximum depth of the individual estimator. It is set on the base decision tree, not on AdaBoost itself.
- n_jobs: indicates how many processors may be used; a value of ‘-1’ means all available processors. AdaBoostClassifier itself does not take this parameter, but helpers such as cross_val_score do.
- random_state: makes the model’s output replicable. The model will always produce the same results when you give it a fixed value as well as the same parameters and training data.

Now, let us take a quick look at how to use AdaBoost in Python using a simple example on handwritten digit recognition.

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import cross_val_score
    from sklearn.model_selection import cross_val_predict
    from sklearn.model_selection import train_test_split
    from sklearn.model_selection import learning_curve
    from sklearn.datasets import load_digits

Let us load the data:

    dataset = load_digits()
    X = dataset['data']
    y = dataset['target']

X contains arrays of length 64, which are simply flattened 8x8 images. The aim of this dataset is to recognize handwritten digits. Let’s take a look at a given handwritten digit:

    plt.imshow(X[4].reshape(8, 8))

If we stick to a Decision Tree Classifier of depth 1 (a stump), here’s how to implement an AdaBoost classifier:

    # The base learner is passed as the first positional argument
    reg_ada = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1))
    scores_ada = cross_val_score(reg_ada, X, y, cv=6)
    scores_ada.mean()

    0.2636257855582272

This should yield a result of around 26%, which can be improved considerably. One of the key parameters is the depth of the sequential decision tree classifiers.
How does accuracy improve with the depth of the decision trees?

    score = []
    for depth in [1, 2, 10]:
        reg_ada = AdaBoostClassifier(DecisionTreeClassifier(max_depth=depth))
        scores_ada = cross_val_score(reg_ada, X, y, cv=6)
        score.append(scores_ada.mean())
    score

    [0.2636257855582272, 0.5902852679072207, 0.9527524912410157]

The maximal score is reached for a depth of 10 in this simple example, with an accuracy of 95.3%.

Gradient Boosting

This is another very popular Boosting algorithm which works much like what we’ve seen for AdaBoost. Gradient Boosting works by sequentially adding predictors to the ensemble, each one correcting the errors made by its predecessors.

The difference lies in what it does with the underfitted values of its predecessor. Contrary to AdaBoost, which tweaks the instance weights at every iteration, this method tries to fit the new predictor to the residual errors made by the previous predictor.

To understand Gradient Boosting, it is important to understand Gradient Descent first.

Below are the steps for performing the Gradient Boosting algorithm:

1. A model is built on a subset of data.
2. Using this model, predictions are made on the whole dataset.
3. Errors are calculated by comparing the predictions and actual values.
4. A new model is created using the errors calculated as the target variable. Our objective is to find the best split to minimize the error.
5. The predictions made by this new model are combined with the predictions of the previous model.
6. New errors are calculated using this predicted value and the actual value.
7. This process is repeated until the error function does not change, or the maximum limit of the number of estimators is reached.

Hyperparameters

- n_estimators: controls the number of weak learners.
- learning_rate: controls the contribution of weak learners in the final combination.
  There is a trade-off between learning_rate and n_estimators.
- min_samples_split: minimum number of observations required in a node for it to be considered for splitting. It is used to control overfitting.
- min_samples_leaf: minimum number of samples required in a terminal or leaf node. Lower values should be chosen for imbalanced class problems, since the regions in which the minority class is in the majority will be very small.
- min_weight_fraction_leaf: similar to the previous one, but defined as a fraction of the total number of observations instead of an integer.
- max_depth: maximum depth of a tree. Used to control overfitting.
- max_leaf_nodes: maximum number of terminal leaves in a tree. If this is defined, max_depth may be ignored, depending on the scikit-learn version.
- max_features: number of features to consider while searching for the best split.

You can also tune the loss function for better performance.

Implementation in Python

You can find the Gradient Boosting estimators in the Scikit-Learn library.

    # X, y: feature matrix and target, e.g. the digits data loaded above

    # for regression
    from sklearn.ensemble import GradientBoostingRegressor
    model = GradientBoostingRegressor(n_estimators=3, learning_rate=1)
    model.fit(X, y)

    # for classification
    from sklearn.ensemble import GradientBoostingClassifier
    model = GradientBoostingClassifier()
    model.fit(X, y)

XGBoost

XGBoost, or Extreme Gradient Boosting, is an advanced implementation of Gradient Boosting. This algorithm has high predictive power and is reported to be about ten times faster than other gradient boosting implementations.
Moreover, it includes a variety of regularization techniques which reduce overfitting and improve overall performance.

Advantages

- It implements regularization, which helps in reducing overfitting (standard Gradient Boosting does not have it);
- It implements parallel processing, which makes it much faster than Gradient Boosting;
- It allows users to define custom optimization objectives and evaluation criteria, adding a whole new dimension to the model;
- XGBoost has an in-built routine to handle missing values;
- XGBoost makes splits up to the max_depth specified and then starts pruning the tree backwards, removing splits beyond which there is no positive gain;
- XGBoost allows a user to run a cross-validation at each iteration of the boosting process, and thus it is easy to get the optimum number of boosting iterations in a single run.

Boosting algorithms represent a different machine learning perspective: turning a weak model into a stronger one by fixing its weaknesses. I hope this article helped you understand how boosting works.

We have covered most of the topics related to algorithms in our series of machine learning blogs. If you are inspired by the opportunities provided by machine learning, enroll in our Data Science and Machine Learning Courses for more lucrative career options in this landscape.