Featured blog posts

Why Tableau is The Most Important Tool For Data Visualization?

There are various tools available for data visualization which helps in presenting the visualizations in a very simple manner. Tableau is one of the f... Read More

Guide To Install Power BI

The Power BI demand has raised significantly in recent times due to its amazing features and acceptance by the data visualization professionals/busine... Read More

Top 20 Data Visualization Tools

Why Data Visualization?Data visualization in layman’s language describes any determined attempt to help people understand the significance of da... Read More

SSRS Vs Power BI

SSRS and Power BI are both Business Intelligence tools specially designed to present data to the end user. Since both are part of the Microsoft BI sta... Read More

Power BI Salary-Earning Potential of a Power BI Individual

Power BI, a business analytics service by Microsoft, offers interactive visualizations and business intelligence capabilities using a simple interface... Read More

Differences Between Power BI and Tableau: Power BI  Vs Tableau Comparison

Power BI is a Data Visualization and Business Intelligence tool provided by Microsoft. It can collect data from different data sources like Excel spre... Read More

How to Build a Python GUI Application With wxPython

A Graphical User Interface or GUI is a user interface that includes graphical elements that enable a person to communicate with electronic devices lik... Read More

Apache Kafka Vs Apache Spark: Know the Differences

A new breed of ‘Fast Data’ architectures has evolved to be stream-oriented, where data is processed as it arrives, providing businesses wi... Read More

How To Run Your Python Scripts

If you are planning to enter the world of Python programming, the first and the most essential skill you should learn is knowing how to run Python scr... Read More

Support Vector Machines in Machine Learning

While many classifiers exist that can classify linearly separable data such as logistic regression, Support Vector Machines can handle highly non-line... Read More


Latest Posts

Failures in Agile Transformation

With the increase in Agile adoption across the organization, the stories are piling up too. Now the question is, are these success stories or the failure stances. Usually, in the conferences or meetups, you get to meet people from various establishments, this is just another way of getting to hear what worked or what not. Even in your own organization, you can feel the pulse of the transformation. There are many reasons why we end up with messy transformation, I have tried elaborating few in the discussion today but there’s more to it. As you read, you might connect with some of the points, let’s start the digging:1. AIMING ON PROCESSES AND NOT PEOPLEThe inclination of Agile is more towards people or individuals, we talk about empowering team, making them self-organized. We even talk about creating high performing teams which can deliver maximum value by the end of every increment. With the adoption of Scrum, where the core values involve Focus, Openness, Respect, Commitment and Courage at the root level, the ‘people’ factor becomes important. Ultimately, it the ‘people’ who work at the ground level for client satisfaction, to achieve this, they adopt processes for smoother workflow. That’s it! Organizations, during their transformation, tend to overlook this critical piece. Their approach becomes more inclined towards processes and they start considering Agile as a process but in fact, Agile is more of a mindset change. Process starts getting priority over people or individuals, always remember, process is just for supporting or helping the individuals with their work. People are not for the processes, even one of the four principles in the agile manifesto says: “Individuals and Interactions over Processes and tools”. I was surprised to hear in one of the trainings’ that people in the same team are using mail to communicate around the stories/deliverables. Can we not give preference to face to face interactions?2. MICROMANAGING THROUGH CEREMONIESSelf-organization is one of entities in Agile, the core says, trust your teams. Micromanagement not only hinders creativity, but it also impacts the morale of agile teams. Management or the scrum master should refrain from getting into the minute details during the scrum ceremonies. Most of the times, it is observed that the daily scrum gets converted into a status meeting either for the scrum master or for the technical lead. As a Scrum Master, you should help the team getting self-organized rather than being directive. One of the principles from Agile talks about giving the space and trust: “The best architectures, requirements, and designs emerge from self-organizing teams.” This enables the team to find their own solutions, helps them with innovation and most of all, teaches them the value of being a team. Every ceremony has its own schema and none of the ceremony is to track individual chores. You must trust your teams, remember, trust can do wonders, believe me, if you trust your people, they will never let you down. Here comes another principles’ focusing on the same. “Build projects around motivated individuals. Give them the environment and support they need and trust them to get the job done.”3. Hastening the TransformationAny change, whether it is for the system or for an individual, takes time. When was the last time you promised yourself to change one of your habits, was that easy? Almost every individual at the start of the year takes so many resolutions, how many do you think gets accomplished! Same goes with the transformation, it takes time and you will have to give because it is not just about the individual, it is about the organization. When the transformation journey starts, the delivery teams take the maximum heat. They are the ones who are expected to be Agile BUT the management usually lags, they are still in need of mindset to support the change. Just by doing the agile ceremonies is not being Agile, it comes with a wide variety of shift, be it - culture, technical, management or team. You have to accept that it is going to be a challenging journey which will have its own milestones and you cannot skip those. Also, it is long journey with hiccups, be ready to accept the challenges, learn from the mistakes and come up with the action items to improve.4. Scrum Roles Getting a Back SeatWhen teams in agile are formed, some of the roles are asked to play dual. In such scenarios, scrum master roles can be played either by the team lead or by someone who is ready for that extra piece. Though the role gets played, but the essence of this position goes for a toss. The focus just stays on delivery without instilling the purpose.  The organisation needs to acknowledge that scrum master'sis a fulltime job that helps teams in staying on track, motivated and helps them see their own progress through information radiators. Anyone else playing a dual role might not do justice and end up walking on two different tracks - not able to reach the goal in both the positions. The team needs someone who can teach, mentor, train and be with them. Just forming the team is not enough if the organization doesn't follow the best practices, such cases tend to get trapped.5. Lack of support from Middle managementIn my past experiences with the organizations in transformation mode, usually the middle layer is where the problem brews. Transformations get a go ahead from the top layer, but it gets difficult for the middle management to adopt. It is important to ensure alignment among the leaders of the organization on the aspiration and value of the transformation is done before moving ahead with the approach. In another example we can see a project manager being insecure as most of the juggling is being handled by the scrum master. In such cases, to prove their existence, the middle layer starts getting into the little details, this impacts the team as half the time they are functioning as per the project manager. Due to this, we hear a lot about the role of scrum master being questioned on their responsibilities. They are assumed to be just sitting and watching, in other words, doing nothing! According to the research led by in Turkey in 2018, the two major hindrances to agile transformation were found out to be the resistance to transformation and culture transformation. Existing managers have a lot to do in order to manage with these two major challenges, they should be part of the transformation beginning from themselves, their day-to-day actions, and should guide the whole company by being supportive of the process.6. Tools over mindsetWith the transformation comes the need to use fancy tools and abide by ‘laws of the tools. Some feel proud in using the costliest tool, some make it a point to introduce the tool being used by the competitor. But is it worth it? Transformation can be done using ‘MS Excel’, there is no set protocol for focusing on the tool. Though the tools play an important part, but teams should focus on Agility and Scrum framework first and then on tools. You certainly need to track metrics like velocity, burn-down, estimates. But with the right mindset, the goal can be achieved with no trouble. This doesn't mean tools are bad, but it means mindset should be given priority over the extensive use of tools.7. Transformation as a Destination (Thinking Your Transformation Is Done)Many a time, we hear ‘We are now agile, we transformed in so and so year’ and what not. But is agile a destination, NO, it a journey, a never-ending journey. Some teams think just by implementing a bunch of backlogs, doing the ceremonies and tracking metrics, they are Agile. No, they are not!Whatever way out we have today are based on our experience, knowledge and the situation we are in. If any of the factors change, our solutioning will be different. Both the values and principles of the Agile Manifesto point to the continuity of the process. Responding to change over following a plan relates appropriately as much to the processes as to our sprint outputs. We have to understand that process is never whole. You just have to continue reflecting what worked and how to fine tune the process whenever required.8. Misunderstood cross-functional teamEvery time I speak about the delivery teams, at least one of the participants from the audience ask this question “We say it's a cross functional team, but my tester is not ready to code!” Have we understood it correct? None of the processes are bad, it is about how we adopt them that makes a difference. “Cross Functional Doesn’t Mean Everyone Can Do Everything, a cross-functional team has members with a variety of skills, but that does not mean each member has all of the skills.”  – Mike Cohn. This interpretation of cross-functional imposes a pressure on the delivery team which breaks the team apart and is sometimes the cause of conflicts among the members. As per scrum guide – “Cross functional teams are groups consisting of people from different functional areas of the company. – it should be formed not only with technical specialists (Back-end, Front-end developers, QA engineers, etc.), but also consists of member like Business Analysts, Marketing and UX specialists or anyone else taking an active part in the project.”9. Scrum for AllJust because everyone is going that way doesn’t mean that way is for you too! It is necessary to understand the current and what exactly you want it to be once the journey starts.  Scrum is one of the frameworks to help with complex adaptive projects, but it is not for all the products or projects. If you are transforming your IT helpdesk department, scrum might come out as a failure, the support team cannot say that they will be delivering 10 high priority tickets after the end of sprint, where sprint duration ranges from two week to a month. Second scenario can be the team handling production defects. But this does not mean that Scrum is bad, no it is not. It is just that these scenarios are not meant for Scrum.Every story is different and so are the reasons, as said earlier as well, this is not just a complete list, there can numerous other details depending on the situation. I will be happy to hear your viewpoints on the misalignment and disorientation. Lastly, it is significant to focus more on the people, the mindset and the collaboration to get better results.
Rated 4.5/5 based on 5 customer reviews
Failures in Agile Transformation

With the increase in Agile adoption across the org... Read More

How to Build a Python GUI Application With wxPython

A Graphical User Interface or GUI is a user interface that includes graphical elements that enable a person to communicate with electronic devices like computers, hand-held devices, and other appliances. It displays information using icons, menus, and graphics. They are handled with the help of a pointing device such as a mouse, trackball or a stylus.A GUI is basically an application that has windows, buttons and lots of other widgets that allows a user to interact with your application. A web browser is a common example of a GUI that has buttons, tabs, and the main window and where the content is displayed.It was developed by the Xerox Palo Alto research laboratory in the late 1970s. Today, every OS has its own GUI. Software applications make use of these and develop their own GUIs.Python contains many GUI toolkits, out which Tkinter, wxPython, and PyQt are the important ones. All these toolkits have the ability to work with Windows, macOS, and Linux with the additional quality of working on mobile phones.How to get started with wxPython?wxPython was first released in the year 1998. It is an open-source cross-platform Python GUI toolkit used as a wrapper class around a C++ library called wxWidgets. The main feature of wxPython which distinguishes itself from other toolkits like PyQt and Tkinter is that it uses actual widgets on the native platform where required. This allows the wxPython applications to look native to the operating system in which it is running.The wxPython toolkit contains a lot of core widgets along with many custom widgets which you can download from the Extra Files section of the wxPython official page.Here, there is a download of the wxPython demo package which is an application demonstrating the robustness of the widgets included with wxPython. The main advantage of this demo is that you can view it one tab and run it in another and it also allows to edit and re-run the code to observe the changes.Installing wxPythonYou will be using the latest wxPython release, wxPython 4, which is also called the wxPython’s Project Phoenix. It is a new implementation of wxPython that aims at improving the speed, maintainability, and extensibility of wxPython. The wxPython 3 and wxPython 2 were built only for Python 2. The maintainer of wxPython rejected a lot of aliases and cleaned up a lot of code to make wxPython more easy and Pythonic.If you are migrating from an older version of wxPython to wxPython 4, take a look at the following references:Classic version vs project PhoenixPhoenix Migration GuideThe Phoenix version is compatible with both Python 2.7 and Python 3. You can use pip to install wxPython 4:$ pip install wxpythonYou will get a prerequisites section on the Github page of wxPython which will provide information to install wxPython on Linux systems.You can also look into the Extras Linux section to learn about the Python wheels for both GTK2 and GTK3 versions. To install one of the wheels, use the command below:$ pip install -U -f wxPythonRemember to modify the command to match with the version of Linux.Components of GUIAs mentioned earlier, GUI is nothing but an interface that allows user interaction.Common components of the user interfaces:Main Window.Menu.Toolbar.Buttons.Text Entry.Labels.These items are generally known as widgets. wxPython supports many other common widgets and many custom widgets that are arranged in a logical manner by a developer to allow user interaction.Event LoopsA GUI works by waiting for the user to perform an action. This is known as an event. An event occurs when something is typed by the user or when the user uses their mouse to press a button or some widget while the application is in focus.The GUI toolkit runs an infinite loop called an event loop underneath the covers. The task of the event loop is to act on occurred events on the basis of what the developer has coded the application to do. The application ignores the event when it is not able to catch it.When you are programming a graphical user interface, make sure to attach the widgets to event handlers in order to make your application do something.You can also block an event loop to make the GUI unresponsive which will appear to freeze to the user. This is a special consideration for you to keep in mind while working with event loops. Launch a special thread or process whenever a GUI takes longer than a quarter of a second to launch a process.The frameworks of wxPython contain special thread-safe methods that you can use to communicate back to your application. This informs the thread is finished or given an update.How to create a Skeleton Application?An application skeleton is basically used for prototyping. It is a user interface comprising of widgets that do not contain event handlers. You just need to create the GUI and show it to the stakeholders for signing off and avoid spending time on the backend logic.An example of creating a Hello World application with Python:import wx   application = wx.App() framework = wx.Frame(parent=None, title='Hello World') framework.Show() app.MainLoop()In the example above, there are two parts of the program – wx.App and wx.Frame. The former one is wxPython’s application object which is basically required for running the GUI. It initiates the .MainLoop() which is the event loop you have learned earlier.The latter part creates a window for user interaction. It informs wxPython that the frame has no parent and its title is Hello World. If you run the code above, this is how it will look like:The application will look different if you execute it in Mac or Linux.Note: Mac users may get the following message: This program needs access to the screen. Please run with a Framework build of Python, and only when you are logged in on the main display of your Mac. If you see this message and you are not running in a virtualenv, then you need to run your application with pythonw instead of python. If you are running wxPython from within a virtualenv, then see the wxPython wiki for the solution.The minimize, maximize and exit will be included in the wx.Frame by default. However, most wxPython code will require you to make the wx.Frame as a subclass and other widgets in order to grab the full power of the toolkit.Let us rewrite the code using class:import wx   class MyFramework(wx.Frame):     def frame(self):         super().frame(parent=None, title='Hello World') self.Show()   if __name__ == '__main__':     application = wx.App()     framework = MyFramework() application.MainLoop()This code can be used as a template for your application.Widgets in wxPythonThe wxPython toolkit allows you to create rich applications from more than one hundred widgets. But it can be very daunting to choose the perfect widget from such a large number, so wxPython has included a wxPython Demo which contains a search filter which will help you to find the right widget from the list.Now, let us add a button and allow the user to enter some text by adding a text field:import wx   class MyFramework(wx.Frame):     def frame(self):         super().frame(parent=None, title='Hello World')         panel = wx.Panel(self)   self.text_ctrl = wx.TextCtrl(panel, pos=(5, 5)) my_button = wx.Button(panel, label='Press Me', pos=(5, 55))   self.Show()   if __name__ == '__main__':     application = wx.App()     framework = MyFramework() application.MainLoop()When you run the code, the application will look like this:The first widget that is recommended on Windows is wx.Panel. It makes the background color of the frame as the right shade of gray. Tab traversal is disabled without a Panel on Windows.If the panel is the sole child of the frame, it will be expanded automatically to fill the frame with itself. The next thing you need to do is to add a wx.TextCtrl to the panel. The first argument is always that which parent the widget should go to for almost all widgets. So if you are willing to keep the text control and the button on the top of the panel, it is the parent you need to specify.You also need to inform wxPython about the position of the widget. You can do it using the pos parameter. The default location is (0,0) which is actually at the upper left corner of the parent. So to change the text control, you can change the position of the frame, you can shift its left corner 5 pixels from the left(x) and 5 pixels from the top(y). Finally, you can add your button to the panel and label it. You can also set the y-coordinate to 55 to prevent the overlapping of widgets.Absolute PositioningAbsolute positioning is the technique found in most GUI toolkits by which you can provide the exact coordinates for your widget’s position.There might be situations when you need to keep track of all your widgets and relocate the widgets in case of a complex application. This can be a really difficult thing to do. However, most modern-day toolkits provide a solution for this, which we’ll study next.Sizers (Dynamic Sizing)Sizers are methods to define the control layout in dialogs in wxPython. They have the ability to create dialogs that are not dependent on the platform. They manage the positioning of the widgets and adjust them when the user resizes the application window.Some of the primary types of sizers that are commonly used are:wx.BoxSizerwx.GridSizerwx.FlexGridSizerAn example code to add wx.BoxSizer to the previous code:import wx   class MyFramework(wx.Frame):     def frame(self):         super().frame(parent=None, title='Hello World')         panel = wx.Panel(self)         my_sizer = wx.BoxSizer(wx.VERTICAL)         self.text_ctrl = wx.TextCtrl(panel) my_sizer.Add(self.text_ctrl, 0, wx.ALL | wx.EXPAND, 5)         my_button = wx.Button(panel, label='Press Me') my_sizer.Add(my_btn, 0, wx.ALL | wx.CENTER, 5)         panel.SetSizer(my_sizer)         self.Show()   if __name__ == '__main__':     application = wx.App()     framework = MyFramework() application.MainLoop() In the example above, an instance of wx.BoxSixer is created and passed to wx.VERTICAL which is actually the orientation that widgets are included in the sizer. The widgets will be added in a vertical manner from top to bottom. You can also set the BoxSizer’s orientation to wx.HORIZONTAL. In this case, the widgets are added from left to right.  You can use .Add() to a widget to a sizer which takes maximum five arguments as follows: window ( the widget )- This is the widget that is added to the sizer. proportion - It sets how much space corresponding to other widgets in the sizer will the widget should take. By default, the proportion is zero which leaves the wxPython to its original proportion. flag - It allows you to pass in multiple flags by separating them with a pipe character: |. The text control is added using wx.ALL and wx.EXPAND flags. The wx.ALL flag adds a border on all sides of the widget. On the other hand, wx.EXPAND expands the widgets as much as the sizer can be expanded. border - This parameter informs wxPython about the number of pixels of border needed around the widget.  userData - It is a rare argument that is used for resizing in case of complex applications. However, in this example, the wx.EXPAND flag is replaced with wx.CENTER to display the button in the center on-screen. When you run the code, your application will look something like this:Adding an event using wxPython Though your application looks cool, but it really does nothing. The button you have created does nothing on pressing it. Let us give the button a job:import wx   class MyFramework(wx.Frame):     def frame(self):         super().frame(parent=None, title='Hello World')         panel = wx.Panel(self)         my_sizer = wx.BoxSizer(wx.VERTICAL)         self.text_ctrl = wx.TextCtrl(panel) my_sizer.Add(self.text_ctrl, 0, wx.ALL | wx.EXPAND, 5)         my_button = wx.Button(panel, label='Press Me') my_button.Bind(wx.EVT_BUTTON, self.on_press) my_sizer.Add(my_btn, 0, wx.ALL | wx.CENTER, 5)         panel.SetSizer(my_sizer)         self.Show()   def button_press(self, event):         value = self.text_ctrl.GetValue()         if not value:             print("You didn't enter anything!")        else:             print(f'You typed: "{value}"')   if __name__ == '__main__':     application = wx.App()     framework = MyFramework() application.MainLoop() You can attach event bindings to the widgets in wxPython. This allows them to respond to certain types of events.If you want the button to do something, you can do it using the button’s .Bind() method. It takes the events you want to bind to, the handler to call an event, an optional source, and a number of optional ids. In the example above, the button object is binded to wx.EVT_BUTTON and told to call button_press when the event gets fired..button_press also accepts a second argument by convention that is called event. The event parameter suggests that the second argument should be an event object.You can get the text control’s contents with the help of GetValue() method within .button_press.How to create a Working Application?Consider a situation where you are asked to create an MP3 tag editor. The foremost thing you need to do is to look out for the required packages.Consider a situation where you are asked to create an MP3 tag editor. The foremost thing you need to do is to look out for the required packages.If you make a Google search for Python mp3 tag editor, you will find several options as below:mp3 -taggereyeD3mutagenOut of these, eyeD3 is a better choice than the other two since it has a pretty nice API that can be used without getting bogged down with MP3’s ID3 specification.You can install eyeD3 using pip from your terminal:pip install eyed3If you want to install eyeD3 in macOS, you have to install libmagic using brew. Linux and Windows users can easily install using the command mentioned above.Designing the User Interface using wxPythonThe very first thing you must do before designing an interface is to sketch out how you think the interface should look.The user interface should perform the following tasks:Open up one or more MP3 files.Display the current MP3 tags.Edit an MP3 tag.If you want to open a file or a folder, you need to have a menu or a button in your user interface. You can do that with a File menu. You will also need a widget to see the tags for multiple MP3 files. A tabular structure consisting of columns and rows would be perfect for this case since you can have labeled columns for the MP3 tags. wxPython toolkit consists of afew widgets to perform this task:wx.grid.Gridwx.ListCtrlwx.ListCtrl would be a better option of these two since the Grid widget is overkill and complex in nature. Finally, you can use a button to perform the editing tasks.Below is an illustration of what the application should look like:Creating the User Interface You can refer to a lot of approaches when you are creating a user interface. You can follow the Model-View-Controller design pattern that is used for developing user interfaces which divides the program logic into three interconnected elements. You should know how to split up classes and how many classes should be included in a single file and so on.However, in this case, you need only two classes which are as follows:wx.Panel classwx.Frame class Let’s start with imports and the panel class:import eyed3 import glob import wx   class Mp3Panel(wx.Panel):     def frame(self, parent):         super().__init__(parent) main_sizer = wx.BoxSizer(wx.VERTICAL) self.row_obj_dict = {}   self.list_ctrl = wx.ListCtrl(             self, size=(-1, 100),               style=wx.LC_REPORT | wx.BORDER_SUNKEN         ) self.list_ctrl.InsertColumn(0, 'Artist', width=140) self.list_ctrl.InsertColumn(1, 'Album', width=140) self.list_ctrl.InsertColumn(2, 'Title', width=200) main_sizer.Add(self.list_ctrl, 0, wx.ALL | wx.EXPAND, 0)         edit_button = wx.Button(self, label='Edit') edit_button.Bind(wx.EVT_BUTTON, self.on_edit) main_sizer.Add(edit_button, 0, wx.ALL | wx.CENTER, 5)         self.SetSizer(main_sizer)   def on_edit(self, event):         print('in on_edit')   def update_mp3_listing(self, folder_path):         print(folder_path)In this example above, the eyed3 package, glob package, and the wx package are imported. Then, the user interface is created by making wx.Panel a subclass. A dictionary row_obj_dict is created for storing data about the MP3s. The next thing you do is create a wx.ListCtrl and set it to report mode, i.e. wx.LC_REPORT. This report flag is the most popular among all but you can also choose your own depending upon the style flag that you pass in. Now you need to call .InsertColumn() to make the ListCtrl have the correct headers and then provide the index of the column, its label and the width of the column pixels. Finally, you need to add your Edit button, an event handler, and a method. The code for the frame is as follows:class Mp3Frame(wx.Frame):     def__init__(self):         super().__init__(parent=None,                          title='Mp3 Tag Editor') self.panel = Mp3Panel(self) self.Show()   if __name__ == '__main__':     app = wx.App(False)     frame = Mp3Frame() app.MainLoop()This class function is a better and simpler approach than the previous one because you just need to set the title of the frame and instantiate the panel class, MP3Panel. The user interface will look like this after all the implementations:The next thing we will do is add a File menu to add MP3s to the application and also edit their tags.Make a Functioning ApplicationThe very first thing you need to do to make your application work is to update the wx.Frame class to include the File menu which will allow you to add MP3 files.Code to add a menu bar to our application:class Mp3Frame(wx.Frame):   def__init__(self):         wx.Frame.__init__(self, parent=None,                             title='Mp3 Tag Editor') self.panel = Mp3Panel(self) self.create_menu() self.Show()   def create_menu(self): menu_bar = wx.MenuBar() file_menu = wx.Menu() open_folder_menu_item = file_menu.Append( wx.ID_ANY, 'Open Folder',   'Open a folder with MP3s'         ) menu_bar.Append(file_menu, '&File') self.Bind(             event=wx.EVT_MENU,               handler=self.on_open_folder,             source=open_folder_menu_item,         ) self.SetMenuBar(menu_bar)   def on_open_folder(self, event):         title = "Choose a directory:" dlg = wx.DirDialog(self, title,                              style=wx.DD_DEFAULT_STYLE) if dlg.ShowModal() == wx.ID_OK:             self.panel.update_mp3_listing(dlg.GetPath()) dlg.Destroy() In the example code above, .create_menu() is called within the class’s constructor and then two instances – wx.MenuBar and wx.Menu are created.Now, if you’re willing to add an item to the menu, you need to call the menu instance’s .Append() and pass the following things:A unique identifierLabelA help stringAfter that call the menubar’s .Append() to add the menu to the menubar. It will take the menu instance and the label for menu. The label is called as &File so that a keyboard shortcut is created to open the File menu using just the keyboard.Now self.Bind() is called to bind the frame to wx.EVT_MENU. This informs wxPython about which handler should be used and which source to bind the handler to. Lastly, call the frame’s .SetMenuBar and pass it the menubar instance. Your menu is now added to the frame.Now let’s come back to the menu item’s event handler:def on_open_folder(self, event):     title = "Choose a directory:" dlg = wx.DirDialog(self, title, style=wx.DD_DEFAULT_STYLE) if dlg.ShowModal() == wx.ID_OK:         self.panel.update_mp3_listing(dlg.GetPath()) dlg.Destroy()You can use wxPython’s wx.DirDialog to choose the directories of the correct MP3 folder. To display the dialog, use .ShowModal(). This will display the dialog modally but will disallow the user to interact with the main application.  You can get to the user’s choice of path using .GetPath() whenever the user presses the OK button. This path has to be added to the panel class and this can be done by the panel’s .update_mp3_listing().Finally, you will have to close the dialog and the best method is using .Destroy().  There are methods to close the dialog like .Close() which will just dialog but will not destroy it, so .Destroy() is the most effective option to prevent such situation.Now let’s update the MP3Panel class starting with .update_mp3_listing():def update_mp3_listing(self, folder_path): self.current_folder_path = folder_path self.list_ctrl.ClearAll()   self.list_ctrl.InsertColumn(0, 'Artist', width=140) self.list_ctrl.InsertColumn(1, 'Album', width=140) self.list_ctrl.InsertColumn(2, 'Title', width=200) self.list_ctrl.InsertColumn(3, 'Year', width=200)       mp3s = glob.glob(folder_path + '/*.mp3')     mp3_objects = []     index = 0 for mp3 in mp3s:         mp3_object = eyed3.load(mp3) self.list_ctrl.InsertItem(index,               mp3_object.tag.artist) self.list_ctrl.SetItem(index, 1,               mp3_object.tag.album) self.list_ctrl.SetItem(index, 2,               mp3_object.tag.title)         mp3_objects.append(mp3_object) self.row_obj_dict[index] = mp3_object         index += 1In the example above, the current directory is set to the specified folder and the list control is cleared. The list controls stay fresh and shows the MP3s you’re currently working with. Next, the folder is taken and Python’s globmoduleis used to search for the MP3 files. Then, the MP3s are looped over and converted into eyed3 objects. This is done by calling the .load() of eyed3. After that, you can add the artist, album, and the title of the Mp3 to the control list given that the MP3s have the appropriate tags..InsertItem() is used to add a new row to a list control for the first time and SetItem()  is used to add rows to the subsequent columns. The last step is to save your MP3 object to your Python dictionary row_obj_dict.Now to edit an MP3’s tags, you need to update the .on_edit() event handler:def on_edit(self, event):     selection = self.list_ctrl.GetFocusedItem() if selection >= 0:         mp3 = self.row_obj_dict[selection] dlg = EditDialog(mp3) dlg.ShowModal()         self.update_mp3_listing(self.current_folder_path) dlg.Destroy()The user’s selection is taken by calling the list control’s .GetFocusedItem(). It will return -1 if the user will not select anything in the list control. However, if you want to extract the MP3 obj3ct from the dictionary, the user have to select something. You can then open the MP3 tag editor dialog which will be a custom dialog. As before, the dialog is shown modally, then the last two lines in .on_edit() will execute what will eventually display the current MP3 tag information. SummaryLet us sum up what we have learned in this article so far – Installing wxPython and Working with wxPython’s widgets Working of events in wxPython Comparing absolute positioning with sizers Creating a skeleton application and a working application The main feature of the wxPython Graphical User Interface is its robustness and a large collection of widgets that you can use to build cross-platform applications. Since you have now learned how to create a working application, that is an MP3 tag editor, you can try your hand to enhance this application to a more beautiful one with lots of new features or you can perhaps create your own wonderful application. To gain more knowledge about Python tips and tricks, check our Python tutorial and get a good hold over coding in Python by joining the Python certification course.
Rated 4.5/5 based on 2 customer reviews
How to Build a Python GUI Application With wxPytho...

A Graphical User Interface or GUI is a user interf... Read More

Bagging and Random Forest in Machine Learning

In today’s world, innovations happen on a daily basis, rendering all the previous versions of that product, service or skill-set outdated and obsolete. In such a dynamic and chaotic space, how can we make an informed decision without getting carried away by plain hype? To make the right decisions, we must follow a set of processes; investigate the current scenario, chart down your expectations, collect reviews from others, explore your options, select the best solution after weighing the pros and cons, make a decision and take the requisite action. For example, if you are looking to purchase a computer, will you simply walk up to the store and pick any laptop or notebook? It’s highly unlikely that you would do so. You would probably search on Amazon, browse a few web portals where people have posted their reviews and compare different models, checking for their features, specifications and prices. You will also probably ask your friends and colleagues for their opinion. In short, you would not directly jump to a conclusion, but will instead make a decision considering the opinions and reviews of other people as well. Ensemble models in machine learning also operate on a similar manner. They combine the decisions from multiple models to improve the overall performance. The objective of this article is to introduce the concept of ensemble learning and understand algorithms like bagging and random forest which use a similar technique. What is Ensemble Learning? Ensemble methods aim at improving the predictive performance of a given statistical learning or model fitting technique. The general principle of ensemble methods is to construct a linear combination of some model fitting method, instead of using a single fit of the method. An ensemble is itself a supervised learning algorithm, because it can be trained and then used to make predictions. Ensemble methods combine several decision trees classifiers to produce better predictive performance than a single decision tree classifier. The main principle behind the ensemble model is that a group of weak learners come together to form a strong learner, thus increasing the accuracy of the model.When we try to predict the target variable using any machine learning technique, the main causes of difference in actual and predicted values are noise, variance, and bias. Ensemble helps to reduce these factors (except noise, which is irreducible error). The noise-related error is mainly due to noise in the training data and can't be removed. However, the errors due to bias and variance can be reduced.The total error can be expressed as follows: Total Error = Bias + Variance + Irreducible Error A measure such as mean square error (MSE) captures all of these errors for a continuous target variable and can be represented as follows: Where, E stands for the expected mean, Y represents the actual target values and fˆ(x) is the predicted values for the target variable. It can be broken down into its components such as bias, variance and noise as shown in the following formula: Using techniques like Bagging and Boosting helps to decrease the variance and increase the robustness of the model. Combinations of multiple classifiers decrease variance, especially in the case of unstable classifiers, and may produce a more reliable classification than a single classifier. Ensemble Algorithm The goal of ensemble algorithms is to combine the predictions of several base estimators built with a given learning algorithm in order to improve generalizability / robustness over a single estimator. There are two families of ensemble methods which are usually distinguished: Averaging methods. The driving principle is to build several estimators independently and then to average their predictions. On average, the combined estimator is usually better than any of the single base estimator because its variance is reduced.|Examples: Bagging methods, Forests of randomized trees. Boosting methods. Base estimators are built sequentially and one tries to reduce the bias of the combined estimator. The motivation is to combine several weak models to produce a powerful ensemble.Examples: AdaBoost, Gradient Tree Boosting.Advantages of Ensemble Algorithm Ensemble is a proven method for improving the accuracy of the model and works in most of the cases. Ensemble makes the model more robust and stable thus ensuring decent performance on the test cases in most scenarios. You can use ensemble to capture linear and simple as well nonlinear complex relationships in the data. This can be done by using two different models and forming an ensemble of two. Disadvantages of Ensemble Algorithm Ensemble reduces the model interpret-ability and makes it very difficult to draw any crucial business insights at the end It is time-consuming and thus might not be the best idea for real-time applications The selection of models for creating an ensemble is an art which is really hard to master Basic Ensemble Techniques Max Voting: Max-voting is one of the simplest ways of combining predictions from multiple machine learning algorithms. Each base model makes a prediction and votes for each sample. The sample class with the highest votes is considered in the final predictive class. It is mainly used for classification problems.  Averaging: Averaging can be used while estimating the probabilities in classification tasks. But it is usually used for regression problems. Predictions are extracted from multiple models and an average of the predictions are used to make the final prediction. Weighted Average: Like averaging, weighted averaging is also used for regression tasks. Alternatively, it can be used while estimating probabilities in classification problems. Base learners are assigned different weights, which represent the importance of each model in the prediction. Ensemble Methods Ensemble methods became popular as a relatively simple device to improve the predictive performance of a base procedure. There are different reasons for this: the bagging procedure turns out to be a variance reduction scheme, at least for some base procedures. On the other hand, boosting methods are primarily reducing the (model) bias of the base procedure. This already indicates that bagging and boosting are very different ensemble methods. From the perspective of prediction, random forests is about as good as boosting, and often better than bagging.  Bootstrap Aggregation or Bagging tries to implement similar learners on small sample populations and then takes a mean of all the predictions. It combines Bootstrapping and Aggregation to form one ensemble model Reduces the variance error and helps to avoid overfitting Bagging algorithms include: Bagging meta-estimator Random forest Boosting refers to a family of algorithms which converts weak learner to strong learners. Boosting is a sequential process, where each subsequent model attempts to correct the errors of the previous model. Boosting is focused on reducing the bias. It makes the boosting algorithms prone to overfitting. To avoid overfitting, parameter tuning plays an important role in boosting algorithms. Some examples of boosting are mentioned below: AdaBoost GBM XGBM Light GBM CatBoost Why use ensemble models? Ensemble models help in improving algorithm accuracy as well as the robustness of a model. Both Bagging and Boosting should be known by data scientists and machine learning engineers and especially people who are planning to attend data science/machine learning interviews. Ensemble learning uses hundreds to thousands of models of the same algorithm and then work hand in hand to find the correct classification. You may also consider the fable of the blind men and the elephant to understand ensemble learning, where each blind man found a feature of the elephant and they all thought it was something different. However, if they would work together and discussed among themselves, they might have figured out what it is. Using techniques like bagging and boosting leads to increased robustness of statistical models and decreased variance. Now the question becomes, between these different “B” words. Which is better? Which is better, Bagging or Boosting? There is no perfectly correct answer to that. It depends on the data, the simulation and the circumstances. Bagging and Boosting decrease the variance of your single estimate as they combine several estimates from different models. So the result may be a model with higher stability. If the problem is that the single model gets a very low performance, Bagging will rarely get a better bias. However, Boosting could generate a combined model with lower errors as it optimizes the advantages and reduces pitfalls of the single model. By contrast, if the difficulty of the single model is overfitting, then Bagging is the best option. Boosting for its part doesn’t help to avoid over-fitting; in fact, this technique is faced with this problem itself. For this reason, Bagging is effective more often than Boosting. In this article we will discuss about Bagging, we will cover Boosting in the next post. But first, let us look into the very important concept of bootstrapping. Bootstrap Sampling Sampling is the process of selecting a subset of observations from the population with the purpose of estimating some parameters about the whole population. Resampling methods, on the other hand, are used to improve the estimates of the population parameters. In machine learning, the bootstrap method refers to random sampling with replacement. This sample is referred to as a resample. This allows the model or algorithm to get a better understanding of the various biases, variances and features that exist in the resample. Taking a sample of the data allows the resample to contain different characteristics then it might have contained as a whole. This is demonstrated in figure 1 where each sample population has different pieces, and none are identical. This would then affect the overall mean, standard deviation and other descriptive metrics of a data set. In turn, it can develop more robust models. Bootstrapping is also great for small size data sets that can have a tendency to overfit. In fact, we recommended this to one company who was concerned because their data sets were far from “Big Data”. Bootstrapping can be a solution in this case because algorithms that utilize bootstrapping can be more robust and handle new data sets depending on the methodology chosen(boosting or bagging). The reason behind using the bootstrap method is because it can test the stability of a solution. By using multiple sample data sets and then testing multiple models, it can increase robustness. Perhaps one sample data set has a larger mean than another, or a different standard deviation. This might break a model that was overfit, and not tested using data sets with different variations. One of the many reasons bootstrapping has become very common is because of the increase in computing power. This allows for many times more permutations to be done with different resamples than previously. Bootstrapping is used in both Bagging and Boosting Let us assume we have a sample of ‘n’ values (x) and we’d like to get an estimate of the mean of the sample. mean(x) = 1/n * sum(x) Consider a sample of 100 values (x) and we’d like to get an estimate of the mean of the sample. We can calculate the mean directly from the sample as: We know that our sample is small and that the mean has an error in it. We can improve the estimate of our mean using the bootstrap procedure: Create many (e.g. 1000) random sub-samples of the data set with replacement (meaning we can select the same value multiple times). Calculate the mean of each sub-sample Calculate the average of all of our collected means and use that as our estimated mean for the data Example: Suppose we used 3 re-samples and got the mean values 2.3, 4.5 and 3.3. Taking the average of these we could take the estimated mean of the data to be 3.367. This process can be used to estimate other quantities like the standard deviation and even quantities used in machine learning algorithms, like learned coefficients. While using Python, we do not have to implement the bootstrap method manually. The scikit-learn library provides an implementation that creates a single bootstrap sample of a dataset. The resample() scikit-learn function can be used for sampling. It takes as arguments the data array, whether or not to sample with replacement, the size of the sample, and the seed for the pseudorandom number generator used prior to the sampling. For example, let us create a bootstrap that creates a sample with replacement with 4 observations and uses a value of 1 for the pseudorandom number generator. boot = resample(data, replace=True, n_samples=4, random_state=1)As the bootstrap API does not allow to easily gather the out-of-bag observations that could be used as a test set to evaluate a fit model, in the univariate case we can gather the out-of-bag observations using a simple Python list comprehension. # out of bag observations  oob = [x for x in data if x not in boot]Let us look at a small example and execute it.# scikit-learn bootstrap  from sklearn.utils import resample  # data sample  data = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]  # prepare bootstrap sample  boot = resample(data, replace=True, n_samples=4, random_state=1)  print('Bootstrap Sample: %s' % boot)  # out of bag observations  oob = [x for x in data if x not in boot]  print('OOB Sample: %s' % oob) The output will include the observations in the bootstrap sample and those observations in the out-of-bag sample.Bootstrap Sample: [0.6, 0.4, 0.5, 0.1]  OOB Sample: [0.2, 0.3]Bagging Bootstrap Aggregation, also known as Bagging, is a powerful ensemble method that was proposed by Leo Breiman in 1994 to prevent overfitting. The concept behind bagging is to combine the predictions of several base learners to create a more accurate output. Bagging is the application of the Bootstrap procedure to a high-variance machine learning algorithm, typically decision trees. Suppose there are N observations and M features. A sample from observation is selected randomly with replacement (Bootstrapping). A subset of features are selected to create a model with sample of observations and subset of features. Feature from the subset is selected which gives the best split on the training data. This is repeated to create many models and every model is trained in parallel Prediction is given based on the aggregation of predictions from all the models. This approach can be used with machine learning algorithms that have a high variance, such as decision trees. A separate model is trained on each bootstrap sample of data and the average output of those models used to make predictions. This technique is called bootstrap aggregation or bagging for short. Variance means that an algorithm’s performance is sensitive to the training data, with high variance suggesting that the more the training data is changed, the more the performance of the algorithm will vary. The performance of high variance machine learning algorithms like unpruned decision trees can be improved by training many trees and taking the average of their predictions. Results are often better than a single decision tree. What Bagging does is help reduce variance from models that are might be very accurate, but only on the data they were trained on. This is also known as overfitting. Overfitting is when a function fits the data too well. Typically this is because the actual equation is much too complicated to take into account each data point and outlier. Bagging gets around this by creating its own variance amongst the data by sampling and replacing data while it tests multiple hypothesis(models). In turn, this reduces the noise by utilizing multiple samples that would most likely be made up of data with various attributes(median, average, etc). Once each model has developed a hypothesis. The models use voting for classification or averaging for regression. This is where the “Aggregating” in “Bootstrap Aggregating” comes into play. Each hypothesis has the same weight as all the others. When we later discuss boosting, this is one of the places the two methodologies differ. Essentially, all these models run at the same time, and vote on which hypothesis is the most accurate. This helps to decrease variance i.e. reduce the overfit. Advantages Bagging takes advantage of ensemble learning wherein multiple weak learners outperform a single strong learner.  It helps reduce variance and thus helps us avoid overfitting. Disadvantages There is loss of interpretability of the model. There can possibly be a problem of high bias if not modeled properly. While bagging gives us more accuracy, it is computationally expensive and may not be desirable depending on the use case. There are many bagging algorithms of which perhaps the most prominent would be Random Forest.  Decision Trees Decision trees are simple but intuitive models. Using a top-down approach, a root node creates binary splits unless a particular criteria is fulfilled. This binary splitting of nodes results in a predicted value on the basis of the interior nodes which lead to the terminal or the final nodes. For a classification problem, a decision tree will output a predicted target class for each terminal node produced. We have covered decision tree algorithm  in detail for both classification and regression in another article. Limitations to Decision Trees Decision trees tend to have high variance when they utilize different training and test sets of the same data, since they tend to overfit on training data. This leads to poor performance when new and unseen data is added. This limits the usage of decision trees in predictive modeling. However, using ensemble methods, models that utilize decision trees can be created as a foundation for producing powerful results. Bootstrap Aggregating Trees We have already discussed about bootstrap aggregating (or bagging), we can create an ensemble (forest) of trees where multiple training sets are generated with replacement, meaning data instances. Once the training sets are created, a CART model can be trained on each subsample. Features of Bagged Trees Reduces variance by averaging the ensemble's results. The resulting model uses the entire feature space when considering node splits. Bagging trees allow the trees to grow without pruning, reducing the tree-depth sizes and resulting in high variance but lower bias, which can help improve predictive power. Limitations to Bagging Trees The main limitation of bagging trees is that it uses the entire feature space when creating splits in the trees. Suppose some variables within the feature space are indicating certain predictions, there is a risk of having a forest of correlated trees, which actually  increases bias and reduces variance. Why a Forest is better than One Tree?The main objective of a machine learning model is to generalize properly to new and unseen data. When we have a flexible model, overfitting takes place. A flexible model is said to have high variance because the learned parameters (such as the structure of the decision tree) will vary with the training data. On the other hand, an inflexible model is said to have high bias as it makes assumptions about the training data. An inflexible model may not have the capacity to fit even the training data and in both cases — high variance and high bias — the model is not able to generalize new and unseen data properly. You can through the article on one of the foundational concepts in machine learning, bias-variance tradeoff which will help you understand that the balance between creating a model that is so flexible memorizes the training data and an inflexible model cannot learn the training data.  The main reason why decision tree is prone to overfitting when we do not limit the maximum depth is because it has unlimited flexibility, which means it keeps growing unless there is one leaf node for every single observation. Instead of limiting the depth of the tree which results in reduced variance and increase in bias, we can combine many decision trees into a single ensemble model known as the random forest. What is Random Forest algorithm? Random forest is like bootstrapping algorithm with Decision tree (CART) model. Suppose we have 1000 observations in the complete population with 10 variables. Random forest will try to build multiple CART along with different samples and different initial variables. It will take a random sample of 100 observations and then chose 5 initial variables randomly to build a CART model. It will go on repeating the process say about 10 times and then make a final prediction on each of the observations. Final prediction is a function of each prediction. This final prediction can simply be the mean of each prediction. The random forest is a model made up of many decision trees. Rather than just simply averaging the prediction of trees (which we could call a “forest”), this model uses two key concepts that gives it the name random: Random sampling of training data points when building trees Random subsets of features considered when splitting nodes How the Random Forest Algorithm Works The basic steps involved in performing the random forest algorithm are mentioned below: Pick N random records from the dataset. Build a decision tree based on these N records. Choose the number of trees you want in your algorithm and repeat steps 1 and 2. In case of a regression problem, for a new record, each tree in the forest predicts a value for Y (output). The final value can be calculated by taking the average of all the values predicted by all the trees in the forest. Or, in the case of a classification problem, each tree in the forest predicts the category to which the new record belongs. Finally, the new record is assigned to the category that wins the majority vote. Using Random Forest for Regression Here we have a problem where we have to predict the gas consumption (in millions of gallons) in 48 US states based on petrol tax (in cents), per capita income (dollars), paved highways (in miles) and the proportion of population with the driving license. We will use the random forest algorithm via the Scikit-Learn Python library to solve this regression problem. First we import the necessary libraries and our dataset. import pandas as pd  import numpy as np  dataset = pd.read_csv('/content/petrol_consumption.csv')  dataset.head() Petrol_taxAverage_incomepaved_HighwaysPopulation_Driver_licence(%)Petrol_Consumption09.0357119760.52554119.0409212500.57252429.0386515860.58056137.5487023510.52941448.043994310.544410You will notice that the values in our dataset are not very well scaled. Let us scale them down before training the algorithm. Preparing Data For Training We will perform two tasks in order to prepare the data. Firstly we will divide the data into ‘attributes’ and ‘label’ sets. The resultant will then be divided into training and test sets. X = dataset.iloc[:, 0:4].values  y = dataset.iloc[:, 4].valuesNow let us divide the data into training and testing sets:from sklearn.model_selection import train_test_split  X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)Feature Scaling The dataset is not yet a scaled value as you will see that the Average_Income field has values in the range of thousands while Petrol_tax has values in the range of tens. It will be better if we scale our data. We will use Scikit-Learn's StandardScaler class to do the same. # Feature Scaling  from sklearn.preprocessing import StandardScaler  sc = StandardScaler()  X_train = sc.fit_transform(X_train)  X_test = sc.transform(X_test)Training the Algorithm Now that we have scaled our dataset, let us train the random forest algorithm to solve this regression problem. from sklearn.ensemble import Random Forest Regressor  regressor = Random Forest Regressor(n_estimators=20,random_state=0), y_train)  y_pred = regressor.predict(X_test)The RandomForestRegressor is used to solve regression problems via random forest. The most important parameter of the RandomForestRegressor class is the n_estimators parameter. This parameter defines the number of trees in the random forest. Here we started with n_estimator=20 and check the performance of the algorithm. You can find details for all of the parameters of RandomForestRegressor here. Evaluating the Algorithm Let us evaluate the performance of the algorithm. For regression problems the metrics used to evaluate an algorithm are mean absolute error, mean squared error, and root mean squared error.  from sklearn import metrics  print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))  print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))  print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred))) Mean Absolute Error: 51.76500000000001 Mean Squared Error: 4216.166749999999 Root Mean Squared Error: 64.93201637097064 With 20 trees, the root mean squared error is 64.93 which is greater than 10 percent of the average petrol consumption i.e. 576.77. This may indicate, among other things, that we have not used enough estimators (trees). Let us now change the number of estimators to 200, the results are as follows: Mean Absolute Error: 48.33899999999999 Mean Squared Error: 3494.2330150000003  Root Mean Squared Error: 59.112037818028234 The graph below shows the decrease in the value of the root mean squared error (RMSE) with respect to number of estimators.  You will notice that the error values decrease with the increase in the number of estimators. You may consider 200 a good number for n_estimators as the rate of decrease in error diminishes. You may try playing around with other parameters to figure out a better result. Using Random Forest for ClassificationNow let us consider a classification problem to predict whether a bank currency note is authentic or not based on four attributes i.e. variance of the image wavelet transformed image, skewness, entropy, andkurtosis of the image. We will use Random Forest Classifier to solve this binary classification problem. Let’s get started. import pandas as pd  import numpy as np  dataset = pd.read_csv('/content/bill_authentication.csv')  dataset.head()VarianceSkewnessKurtosisEntropyClass03.621608.6661-2.8073-0.44699014.545908.1674-2.4586-1.46210023.86600-2.63831.92420.10645033.456609.5228-4.0112-3.59440040.32924-4.45524.5718-0.988800Similar to the data we used previously for the regression problem, this data is not scaled. Let us prepare the data for training. Preparing Data For Training The following code divides data into attributes and labels: X = dataset.iloc[:, 0:4].values  y = dataset.iloc[:, 4].values The following code divides data into training and testing sets:from sklearn.model_selection import train_test_split  X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0) Feature Scaling We will do the same thing as we did for the previous problem. # Feature Scaling  from sklearn.preprocessing import StandardScaler  sc = StandardScaler()  X_train = sc.fit_transform(X_train)  X_test = sc.transform(X_test)Training the Algorithm Now that we have scaled our dataset, let us train the random forest algorithm to solve this classification problem. from sklearn.ensemble import Random Forest Classifier  classifier = RandomForestClassifier(n_estimators=20, random_state=0), y_train)  y_pred = classifier.predict(X_test)For classification, we have used RandomForestClassifier class of the sklearn.ensemble library. It takes n_estimators as a parameter. This parameter defines the number of trees in out random forest. Similar to the regression problem, we have started with 20 trees here. You can find details for all of the parameters of Random Forest Classifier here. Evaluating the Algorithm For evaluating classification problems,  the metrics used are accuracy, confusion matrix, precision recall, and F1 valuesfrom sklearn.metrics import classification_report, confusion_matrix, accuracy_score  print(confusion_matrix(y_test,y_pred))  print(classification_report(y_test,y_pred))  print(accuracy_score(y_test, y_pred)) The output will look something like this: Output:[ [ 155   2] [     1  117] ]Precisionrecallf1-scoresupport00.990.990.9915710.980.990.99118accuracy0.99275macro avg0.990.990.992750.98909090909090910.990.990.99275The accuracy achieved by our random forest classifier with 20 trees is 98.90%. Let us change the number of trees to 200.from sklearn.ensemble import Random Forest Classifier  classifier = Random Forest Classifier(n_estimators=200, random_state=0), y_train)  y_pred = classifier.predict(X_test) Output:[ [ 155   2] [     1  117] ]Precisionrecallf1-scoresupport00.990.990.9915710.980.990.99118accuracy0.99275macro avg0.990.990.992750.98909090909090910.990.990.99275Unlike the regression problem, changing the number of estimators for this problem did not make any difference in the results.An accuracy of 98.9% is pretty good. In this case, we have seen that there is not much improvement if the number of trees are increased. You may try playing around with other parameters of the RandomForestClassifier class and see if you can improve on our results. Advantages and Disadvantages of using Random Forest As with any algorithm, there are advantages and disadvantages to using it. Let us look into the pros and cons of using Random Forest for classification and regression. Advantages Random forest algorithm is unbiased as there are multiple trees and each tree is trained on a subset of data.  Random Forest algorithm is very stable. Introducing a new data in the dataset does not affect much as the new data impacts one tree and is pretty hard to impact all the trees. The random forest algorithm works well when you have both categorical and numerical features. With missing values in the dataset, the random forest algorithm performs very well. Disadvantages A major disadvantage of random forests lies in their complexity. More computational resources are required and also results in the large number of decision trees joined together. Due to their complexity, training time is more compared to other algorithms. Summary In this article we have covered what is ensemble learning and discussed about basic ensemble techniques. We also looked into bootstrap sampling involves iteratively resampling of a dataset with replacement which allows the model or algorithm to get a better understanding various features. Then we moved on to bagging followed by random forest. We also implemented random forest in Python for both regression and classification and came to a conclusion that increasing number of trees or estimators does not always make a difference in a classification problem. However, in regression there is an impact.  We have covered most of the topics related to algorithms in our series of machine learning blogs,click here. If you are inspired by the opportunities provided by machine learning, enrol in our  Data Science and Machine Learning Courses for more lucrative career options in this landscape. 0.99
Rated 4.5/5 based on 12 customer reviews
Bagging and Random Forest in Machine Learning

In today’s world, innovations happen on a daily ... Read More