Support Vector Machines in Machine Learning

While many classifiers, such as logistic regression, can handle linearly separable data, Support Vector Machines can also tackle highly non-linear problems using the kernel trick, which implicitly maps the input vectors into higher-dimensional feature spaces. The transformation rearranges the dataset in such a way that it becomes linearly separable. In this article we look at how SVM works, learn about kernel functions and hyperparameters, weigh the pros and cons of SVM, and survey some of its real-life applications.

Support Vector Machines (SVMs), also known as support vector networks, are a family of powerful models that use method-based learning and can be applied to both classification and regression problems. They aim at finding decision boundaries that separate observations with differing class memberships. In other words, an SVM is a discriminative classifier formally defined by a separating hyperplane.

Method Based Learning

There are several learning models, namely:

  • Association rules based
  • Ensemble method based
  • Deep Learning based
  • Clustering method based
  • Regression Analysis based
  • Bayesian method based
  • Dimensionality reduction based
  • Instance based
  • Kernel method based

Let us understand what Kernel method based learning is all about.

In simple terms, a kernel is a similarity function which is fed into a machine learning algorithm. It accepts two inputs and suggests the similarity. For example, suppose we want to classify images, the input data is a key-value pair (image, label). The image data is taken into consideration, features are computed, and a vector of features are fed into the Machine learning algorithm. But in the case of similarity functions, a kernel function can be defined which internally computes the similarity between images, and then feeds into the learning algorithm along with the images and label data. The outcome of this is a classifier. 

Kernel-based frameworks such as the perceptron or Support Vector Machines work directly with vectors. Here, the machine learning algorithms are expressed in terms of dot products so that kernel functions can be substituted for them.

Kernels are generally preferred over explicit feature vectors. Ease of computation is one of the key reasons; in addition, feature vectors need more storage space than dot products. You can write machine learning algorithms to use dot products and later map them to use kernels. This completely avoids the use of explicit feature vectors and allows us to work with highly complex, efficient-to-compute, and high-performing kernels effortlessly, without ever constructing multi-dimensional feature vectors.
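As a minimal sketch of this idea (the toy vectors below are made up for the example), a polynomial kernel can be computed directly from a dot product of the original inputs, giving the same similarity that an explicit higher-dimensional feature mapping would give without ever constructing that mapping:

import numpy as np

def poly_kernel(x, z, degree=2):
    # Polynomial kernel: equivalent to mapping x and z into the space of all
    # monomials up to the given degree and taking a dot product there,
    # but computed directly from the original vectors.
    return (np.dot(x, z) + 1) ** degree

x = np.array([1.0, 2.0])
z = np.array([0.5, -1.0])
print(poly_kernel(x, z))   # similarity of x and z, no explicit feature vectors built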

Kernel functions

Let us understand what kernel functions are. The figure below uses a simple 1-dimensional example. Assume the given points are arranged as shown; no single vertical line will separate the dataset.

[Figure: kernel functions, 1-D example]

Now, if we consider a 2-dimensional representation, as shown in the figure below, there is a hyperplane (an arbitrary line in 2 dimensions) that separates the red and blue points; this separation can be found using Support Vector Machines.

[Figure: kernel functions, 2-D representation]

As we keep increasing the dimensionality of the space, the data eventually becomes easier to separate. This mapping, x → (x, x²), is the feature map that a kernel function corresponds to. As the dimensionality grows, however, the computations become more complex, and the kernel trick needs to be applied to carry out these computations cheaply.
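As a small illustrative sketch (the points below are made up for the example), the mapping x → (x, x²) can be applied explicitly; the two classes, which overlap on the original 1-D axis, become separable by a horizontal line in the new 2-D space:

import numpy as np

x = np.array([-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0])   # 1-D inputs (hypothetical)
labels = np.array([1, 1, 0, 0, 0, 1, 1])                # outer points vs. inner points

# feature map x -> (x, x^2)
mapped = np.column_stack((x, x ** 2))
print(mapped)
# In the (x, x^2) plane every class-1 point has x^2 >= 4 and every class-0 point
# has x^2 <= 0.25, so a horizontal line such as x^2 = 1 separates the classes.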

What is Support Vector Machine? 

Support Vector Machine (SVM) is a supervised machine learning algorithm which can be used for both classification and regression challenges. However, it is mostly used in classification problems. In this algorithm, each data item is plotted as a point in n-dimensional space (where n is the number of features you have), with the value of each feature being the value of a particular coordinate. We then perform classification by finding the hyperplane that differentiates the two classes.

Let us create a dataset to understand support vector classification:

# importing make_blobs (on older scikit-learn versions it lived in sklearn.datasets.samples_generator)
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
# creating dataset X containing n_samples points
# and Y containing two classes
X, Y = make_blobs(n_samples=500, centers=2,
                  random_state=0, cluster_std=0.40)
# plotting the points as a scatter plot
plt.scatter(X[:, 0], X[:, 1], c=Y, s=50, cmap='spring')
plt.show()

[Figure: scatter plot of the two generated blobs]

Support vector machine is based on the concept of decision planes that define decision boundaries. A decision plane is one that separates between a set of objects with different class memberships. For example, in the figure mentioned below, there are objects which belong to either class Green or Red. The separating line defines a boundary on the right side of which all objects are Green and to the left of which all objects are Red. Any new object (white circle) falling to the right is labeled, i.e., classified, as Green (or classified as Red should it fall to the left of the separating line).

[Figure: a decision boundary separating the Green and Red objects]

Support vector machines not only draw a line between two classes, but consider a region about the line of some given width. Here’s an example of what it can look like:

# creating line space between -1 to 3.5 
xfit = np.linspace(-1, 3.5) 
# plotting scatter 
plt.scatter(X[:, 0], X[:, 1], c=Y, s=50, cmap='spring')

# plot a line between the different sets of data 
for m, b, d in [(1, 0.65, 0.33), (0.5, 1.6, 0.55), (-0.2, 2.9, 0.2)]: 
    yfit = m * xfit + b 
    plt.plot(xfit, yfit, '-k') 
    plt.fill_between(xfit, yfit - d, yfit + d, edgecolor='none',
                     color='#AAAAAA', alpha=0.4)
plt.xlim(-1, 3.5);
plt.show()

[Figure: candidate separating lines, each with a margin region]

Consider another scenario, in which a full separation of the Green and Red objects would require a curve (which is more complex than a line). Classifiers that draw separating lines to distinguish between objects of different class memberships are known as hyperplane classifiers. Support Vector Machines are particularly suited to handle such tasks.

[Figure: data that requires a curved, non-linear boundary]

The figure below shows the basic idea behind Support Vector Machines. Here you will see that the original objects (left side of the schematic) are mapped, i.e. rearranged, using a set of mathematical functions called kernels. This process of rearranging the objects is known as mapping or transformation. Notice that the right side of the schematic is linearly separable; all that remains is to find an optimal line that separates the red and green objects.

[Figure: kernel mapping from the original space to a linearly separable space]

What is a hyperplane?

[Figure: several possible separating hyperplanes]

The goal of a Support Vector Machine is to find the hyperplane that separates the two objects or classes. Consider the figure below, which shows some of the possible hyperplanes that can divide the dataset. Choosing the best hyperplane is the goal: the best hyperplane is the one that leaves the maximum margin for both classes, where the margin is the distance between the hyperplane and the closest point in either class.

[Figure: the maximum-margin hyperplane]

Let us consider two hyperplanes among them and check the margins, represented by M1 and M2. You will notice that margin M1 > M2, so the hyperplane that separates the classes best is the new plane between the green and blue planes.

[Figure: margins M1 and M2 of two candidate hyperplanes]

How do we find the right hyperplane?

Now, let us represent the new plane by a linear equation as: 

f(x) = ax + b

Let us consider that this equation delivers values ≥ 1 for the green triangle class and ≤ −1 for the gold star class. The distance of this plane from the closest points in both classes is then at least one; the modulus at those closest points is one. 

f(x) ≥ 1 for triangles and f(x) ≤ −1 for stars, with |f(x)| = 1 at the closest points

The distance between the hyperplane and the point can be computed using the following equation. 

M1 = |f(x)| / ||a|| = 1 / ||a||

The total margin is 1 / ||a|| + 1 / ||a|| = 2 / ||a||.

In order to maximize the separability, we have to maximize the margin 2 / ||a||, which means minimizing ||a||. The vector a is known as the weight vector, and minimizing it is a constrained non-linear optimization task. One of the methods for solving it is to use the Karush-Kuhn-Tucker (KKT) conditions together with the Lagrange multipliers λi.
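In the standard formulation, the resulting optimization problem and its Lagrangian are:

minimize (1/2) ||a||²  subject to  yi (a · xi + b) ≥ 1 for every training point i

L(a, b, λ) = (1/2) ||a||² − Σi λi [ yi (a · xi + b) − 1 ],  with λi ≥ 0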

[Figure: the hyperplane equation and the corresponding graph]

What is a support vector in SVM?

[Figure: support vectors]

Let's take an example of two points plotted against the two attributes X and Y. We need to find a separating boundary between these two points that has the maximum distance from each of them. This requirement is represented in the graph shown next, where the optimal point is marked with a red circle.

[Figure: the optimal separating boundary, marked with a red circle]

The maximum-margin weight vector is parallel to the line from (1, 1) to (2, 3). The weight vector is (1, 2), and the decision boundary is perpendicular to it, halfway between the two points, passing through (1.5, 2).

So the boundary can be written as x1 + 2x2 − 5.5 = 0, and the geometric margin is computed as √5.

Following are the steps to compute SVMs: 

With w = (a, 2a), the constraints for the points (1,1) and (2,3) can be represented as shown here: 

a + 2a + ω0 = -1 for the point (1,1) 

2a + 6a + ω0 = 1 for the point (2,3) 

The weights can be computed as follows:

Subtracting the first equation from the second gives 5a = 2, so a = 2/5, and then ω0 = −1 − 3a = −11/5. The weight vector is therefore w = (a, 2a) = (2/5, 4/5).

These are the support vectors:

The support vectors are the two points themselves, (1, 1) and (2, 3), which lie exactly on the margins f(x) = −1 and f(x) = 1 respectively.

Lastly, the final equation is as follows:

f(x) = (2/5)·x1 + (4/5)·x2 − 11/5, which, rescaled, is the same boundary x1 + 2x2 − 5.5 = 0 found above.

Large Margin Intuition

In logistic regression, the output of the linear function is taken and the value is squashed into the range [0, 1] using the sigmoid function. If the squashed value is greater than a threshold value, say 0.5, label 1 is assigned; otherwise label 0 is assigned.

In the case of support vector machines, the output of the linear function is used directly: if the output is greater than 1, we identify the point with one class, and if the output is less than −1, we identify it with the other class. Since the threshold values are changed to 1 and −1 in SVM, we obtain a reinforcement range of values ([−1, 1]) which acts as the margin.

Cost Function and Gradient Updates

In the SVM algorithm, we maximize the margin between the data points and the hyperplane. The loss function that helps maximize the margin is called the hinge loss.

[Figure: the hinge loss function (the function on the left can be represented as the function on the right)]

If the predicted value and the actual value are of the same sign, the cost is 0. If not, we calculate the loss value. We also add a regularization parameter to the cost function. The objective of the regularization parameter is to balance margin maximization and loss. After adding the regularization parameter, the cost function looks as below.
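In the standard formulation, the hinge loss and the regularized cost function can be written as:

c(x, y, f(x)) = max(0, 1 − y · f(x))

minimize over w:  λ ||w||² + Σi max(0, 1 − yi ⟨xi, w⟩)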

[Figure: loss function for SVM]

Now that we have the loss function, we take partial derivatives with respect to the weights to find the gradients. Using the gradients, we can update our weights.

[Figure: gradients]

When there is no misclassification, i.e. our model correctly predicts the class of a data point, we only have to update the weights using the gradient of the regularization term.

[Figure: gradient update when there is no misclassification]

When there is a misclassification, i.e. our model makes a mistake on the prediction of the class of a data point, we include the hinge loss along with the regularization term when performing the gradient update.

[Figure: gradient update when there is a misclassification]
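A minimal from-scratch sketch of these update rules (simple sub-gradient descent, with a hypothetical learning rate and regularization strength, and the bias term omitted for brevity; this is separate from the scikit-learn code that follows):

import numpy as np

def svm_sgd(X, y, lr=0.001, lam=0.01, epochs=1000):
    # X: (n_samples, n_features) array, y: labels in {-1, +1}
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * np.dot(xi, w) >= 1:
                # correct side of the margin: only the regularization gradient applies
                w -= lr * (2 * lam * w)
            else:
                # misclassified or inside the margin: include the hinge-loss gradient
                w -= lr * (2 * lam * w - yi * xi)
    return w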

Let us start with the code and import the necessary libraries:

import pandas as pd 
import numpy as np 
from sklearn.model_selection import train_test_split 
from sklearn.model_selection import cross_val_score, GridSearchCV 
from sklearn import metrics 
from sklearn.preprocessing import MinMaxScaler 
pd.set_option('display.max_columns', None)

Read the Wisconsin Breast Cancer dataset from the current directory into an object 'data' using the pandas read_csv function:

data = pd.read_csv('wisconsin.csv')

After reading the data, we prepare it as per our requirements. Feature scaling is a method used to standardize the range of independent variables or features of the data. Min-max scaling (or min-max normalization) shrinks the range of each feature so that it lies between 0 and 1 (or −1 and 1 if there are negative values).
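The article does not show how 'predictor' and 'target' were separated from 'data', so here is a plausible preparation step; the column names 'diagnosis' and 'id' are assumptions about this particular CSV and may need adjusting:

# Hypothetical preparation step: split the dataframe into features and label.
# Column names are assumed and may differ in your copy of the dataset.
target = data['diagnosis']
predictor = data.drop(['diagnosis', 'id'], axis=1)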

sclr = MinMaxScaler()
predictor_sc = sclr.fit_transform(predictor)
predictor_sc.shape

Split the scaled data into training and test sets:

x_train_sc,x_test_sc, y_train, y_test = train_test_split(predictor_sc, target, test_size = 0.30, random_state=101)
print("Scaled train and test split")
print("x_train ",x_train_sc.shape)
print("x_test ",x_test_sc.shape)
print("y_train ",y_train.shape)
print("y_test ",y_test.shape)
Scaled train and test split
x_train  (398, 30)
x_test  (171, 30)
y_train  (398,)
y_test  (171,)

But what happens when there is no clear hyperplane? 

Support Vector Machines can help you find a separating hyperplane, but only if one exists. There are cases where it is not possible to define a hyperplane, typically because of noise in the data; another possible reason is a non-linear boundary. The first graph below depicts noise and the second one shows a non-linear boundary.

[Figure: noise in the data]

[Figure: a non-linear class boundary]

For such problems, which arise due to noise in the data, the best approach is to relax the margin itself by introducing slack.
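In the standard soft-margin formulation, slack variables ξi enter the optimization problem as:

minimize (1/2) ||w||² + C Σi ξi  subject to  yi (w · xi + b) ≥ 1 − ξi  and  ξi ≥ 0

The constant C controls how heavily margin violations are penalized; it is the same C parameter that is tuned later in this article.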

[Figure: a softened margin with slack]

The non-linear boundary problem can be solved if we introduce a kernel. Some of the kernel functions that can be introduced are mentioned below:

[Figure: examples of kernel functions]

A radial basis function is a real-valued function whose value is dependent on the distance between the input and some fixed point. In machine learning, the radial basis function kernel, or RBF kernel, is a popular kernel function used in various kernelized learning algorithms.

The RBF kernel on two samples x and x', represented as feature vectors in some input space, is defined as:

K(x, x') = exp(−||x − x'||² / (2σ²)), which is often written as exp(−γ ||x − x'||²) with γ = 1 / (2σ²).

Applying SVM with default hyperparameters

Let us get back to the example and apply SVM after data pre-processing, using the default hyperparameters.

Linear Kernel

from sklearn import svm 
svm2 = svm.SVC(kernel='linear') 
svm2

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, 
decision_function_shape='ovr', degree=3, gamma='auto', kernel='linear', 
max_iter=-1, probability=False, random_state=None, shrinking=True, 
tol=0.001, verbose=False)

model2 = svm2.fit(x_train_sc, y_train) 
y_pred2 = svm2.predict(x_test_sc) 
print('Accuracy Score:')
print(metrics.accuracy_score(y_test,y_pred2))
Accuracy Score:0.9707602339181286

Gaussian Kernel

svm3 = svm.SVC(kernel='rbf') 
svm3 
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf', 
max_iter=-1, probability=False, random_state=None, shrinking=True, 
tol=0.001, verbose=False) 
model3 = svm3.fit(x_train_sc, y_train) 
y_pred3 = svm3.predict(x_test_sc) 
print('Accuracy Score:')
print(metrics.accuracy_score(y_test, y_pred3))
Accuracy Score:0.935672514619883

Polynomial Kernel

svm4 = svm.SVC(kernel='poly') 
svm4
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, 
decision_function_shape='ovr', degree=3, gamma='auto', kernel='poly', 
max_iter=-1, probability=False, random_state=None, shrinking=True, 
tol=0.001, verbose=False)
model4 = svm4.fit(x_train_sc, y_train) 
y_pred4 = svm4.predict(x_test_sc) 
print('Accuracy Score:')
print(metrics.accuracy_score(y_test,y_pred4)) 
Accuracy Score:0.6198830409356725

How to tune Parameters of SVM? 

Kernel: The kernel in a support vector machine is responsible for transforming the input data into the required format. Some of the kernels used in support vector machines are linear, polynomial and radial basis function (RBF). In order to create a non-linear decision boundary, we use the RBF or polynomial kernel, and for complex applications you should use more advanced kernels to separate classes that are non-linear in nature. With this transformation, you can obtain more accurate classifiers. 

Regularization: Using scikit-learn's C parameter we can adjust the amount of regularization. C denotes a penalty parameter representing the cost of an error or misclassification; it tells the optimizer how much misclassification is bearable and thereby controls the trade-off between the misclassified points and the decision boundary. With a larger C value you obtain a hyperplane with a smaller margin (fewer training errors are tolerated), and with a smaller C value you obtain a hyperplane with a larger margin. 

Gamma: A lower value of gamma gives a looser fit to the training dataset, while a higher value of gamma lets the model fit the training data more closely, at the risk of overfitting. With a low value of gamma, even far-away points are taken into consideration when calculating the separating plane, whereas with a high value of gamma only the data points close to the plausible boundary carry weight in determining the final separation line. 

Do we always need to tune the parameters? 

You do not need to tune the parameters in every case; there are built-in helpers in the scikit-learn toolkit that can do it for you, as shown in the sketch below. 
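For example, GridSearchCV (already imported earlier) can search over C, gamma, and the kernel in one go; here is a minimal sketch using the scaled predictors from above (the grid values are arbitrary choices):

from sklearn.model_selection import GridSearchCV
from sklearn import svm

param_grid = {'C': [0.1, 1, 10, 100],
              'gamma': [0.001, 0.01, 0.1, 1],
              'kernel': ['linear', 'rbf']}

grid = GridSearchCV(svm.SVC(), param_grid, cv=10, scoring='accuracy')
grid.fit(predictor_sc, target)
print(grid.best_params_)
print(grid.best_score_)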

Tuning Hyperparameters

The 'C' and 'gamma' hyperparameters

C is the parameter of the soft-margin cost function, and it controls the influence of each individual support vector; setting it involves trading error penalty for stability. A small C tends to emphasize the margin while ignoring outliers in the training data (soft margin), while a large C may tend to overfit the training data (hard margin). Thus a very large value of C can cause overfitting of the model, and a very small value of C can cause underfitting. The value of C must therefore be chosen so that the model generalises well to unseen data. 

The gamma parameter is the inverse of the standard deviation of the RBF kernel (a Gaussian function), which is used as a similarity measure between two points. A small gamma value defines a Gaussian function with a large variance; in this case, two points can be considered similar even if they are far from each other. A large gamma value, on the other hand, defines a Gaussian function with a small variance, and in this case two points are considered similar only if they are close to each other. 

Taking the kernel as linear and tuning the C hyperparameter

C_range = list(range(1, 26))
acc_score = []
for c in C_range:
    svc = svm.SVC(kernel='linear', C=c)
    scores = cross_val_score(svc, predictor_sc, target, cv=10,
                             scoring='accuracy')
    acc_score.append(scores.mean())
print(acc_score)

[0.9772210699161695, 0.9772210699161695, 0.9806995938121164,
0.9824539797770286, 0.9789754558810818, 0.9789452078472042,
0.9806995938121164, 0.9789452078472041, 0.9789452078472041,
0.9789452078472041, 0.9806995938121164, 0.9789452078472041,
0.9789452078472041, 0.9772210699161695, 0.9772210699161695,
0.9772210699161695, 0.9772210699161695, 0.9754666839512574,
0.9754666839512574, 0.9754666839512574, 0.9754666839512574,
0.9754666839512574, 0.9754666839512574, 0.9754666839512574,
0.9754666839512574]

Let us visualize the above points:

import matplotlib.pyplot as plt
%matplotlib inline
C_Val_list = list(range(1,26))

plt.plot(C_Val_list,acc_score)
plt.xticks(np.arange(0,27,2))
plt.xlabel('Value of C for SVC')
plt.ylabel('Cross-Validated Accuracy')

From the plot we can see that the accuracy stays close to 98% for C between 4 and 5, and then it drops.

# Taking a closer look at the cross-validation accuracy in the range C = (4, 5)
C_range = list(np.arange(4, 5, 0.2))
acc_score = []
for c in C_range:
    svc = svm.SVC(kernel='linear', C=c)
    scores = cross_val_score(svc, predictor_sc, target, cv=10,
                             scoring='accuracy')
    acc_score.append(scores.mean())
print(acc_score)

[0.9824539797770286, 0.9806995938121164, 0.9789754558810818,
0.9789754558810818, 0.9789754558810818] 

The accuracy score is highest for C=4.

Taking the kernel as Gaussian (RBF) and tuning the gamma hyperparameter

gamma_range = [0.0001, 0.001, 0.01, 0.1, 1, 10, 100]
acc_score = []
for g in gamma_range:
    svc = svm.SVC(kernel='rbf', gamma=g)
    scores = cross_val_score(svc, predictor_sc, target, cv=10,
                             scoring='accuracy')
    acc_score.append(scores.mean())
print(acc_score)

[0.6274274047186933, 0.6274274047186933, 0.9195035001296346,
0.9561651974764496, 0.9806995938121164, 0.9420026359000951,
0.6274274047186933]

Let us visualize the above points: 
gamma_range=[0.0001,0.001,0.01,0.1,1,10,100]
# plotting the value of gamma for SVM versus the cross-validated accuracy 
plt.plot(gamma_range,acc_score) 
plt.xlabel('Value of gamma for SVC ') 
plt.xticks(np.arange(0.0001,100,5)) 
plt.ylabel('Cross-Validated Accuracy')
Text(0,0.5,'Cross-Validated Accuracy')

For gamma between 5 and 100 the kernel performs very poorly.
Let us take a closer look at the cross-validated accuracy for gamma value in between 0 and 5.

gamma_range = list(np.arange(0.1, 5, 0.1))
acc_score = []
for g in gamma_range:
    svc = svm.SVC(kernel='rbf', gamma=g)
    scores = cross_val_score(svc, predictor_sc, target, cv=10,
                             scoring='accuracy')
    acc_score.append(scores.mean())
print(acc_score)


[0.9561651974764496, 0.9718952553798289, 0.9754051075965776, 
0.9737122979863452, 0.9806995938121164, 0.9806995938121164, 
0.9806995938121164, 0.9806995938121164, 0.9806995938121164, 
0.9806995938121164, 0.9789754558810818, 0.9754969319851352, 
0.9754969319851352, 0.9754969319851352, 0.9754969319851352, 
0.9737727940541007, 0.9737727940541007, 0.9737727940541007, 
0.9737727940541007, 0.9720184080891883, 0.9720184080891883, 
0.9720184080891883, 0.9720184080891883, 0.9720184080891883, 
0.9720184080891883, 0.9702326938034741, 0.9702326938034741, 
0.9702326938034741, 0.9702326938034741, 0.9702326938034741, 
0.9702326938034741, 0.9702326938034741, 0.9702326938034741, 
0.9666925935528475, 0.9666925935528475, 0.9684167314838821, 
0.9684167314838821, 0.9684167314838821, 0.9701711174487941, 
0.9701711174487941, 0.96838540316308, 0.9649068792671333, 
0.9649068792671333, 0.9649068792671333, 0.9649068792671333, 
0.9649068792671333, 0.9649068792671333, 0.963152493302221, 
0.963152493302221]

gamma_range=list(np.arange(0.1,5,0.1))

plt.plot(gamma_range,acc_score)
plt.xlabel('Value of gamma for SVC ')
#plt.xticks(np.arange(0.0001,5,5))
plt.ylabel('Cross-Validated Accuracy')

Text(0,0.5,'Cross-Validated Accuracy')

The highest cross-validated accuracy for the RBF kernel stays constant for gamma between 0.5 and 1.

Taking the polynomial kernel and tuning the degree hyperparameter

degree = [2, 3, 4, 5, 6]
acc_score = []
for d in degree:
    svc = svm.SVC(kernel='poly', degree=d)
    scores = cross_val_score(svc, predictor_sc, target, cv=10,
                             scoring='accuracy')
    acc_score.append(scores.mean())
print(acc_score)
 
[0.8350974418805635, 0.6450652493302222, 0.6274274047186933,
0.6274274047186933, 0.6274274047186933]

plt.plot(degree,acc_score) 
plt.xlabel('degrees for SVC ') 
plt.ylabel('Cross-Validated Accuracy')

Text(0,0.5,'Cross-Validated Accuracy')

The score is highest for the second-degree polynomial, and the accuracy drops as the degree of the polynomial increases: increasing the polynomial degree results in a more complex model. 

Advantages and Disadvantages of Support Vector Machine

Advantages of SVM

  • SVM classifiers offer good accuracy and perform faster predictions compared to the Naïve Bayes algorithm. 
  • SVM guarantees optimality: owing to the nature of convex optimization, the solution will always be a global minimum, not a local minimum. 
  • SVM can be accessed conveniently, be it from Python or MATLAB. 
  • SVM can be used for both linearly separable and non-linearly separable data. Linearly separable data can be handled with a hard margin, whereas non-linearly separable data calls for a soft margin. 
  • SVM also lends itself to semi-supervised learning, so it can be applied to both labelled and unlabelled data; this only requires adding a condition to the minimization problem, and the resulting method is known as the Transductive SVM. 
  • Feature mapping used to be a heavy computational burden on the overall training of the model. With the help of the kernel trick, SVM can carry out the feature mapping implicitly using a simple dot product. 
  • SVM works well when there is a clear margin of separation and in high-dimensional spaces. 

Disadvantages of SVM 

  • SVM does not handle text structures well; applying it directly to text leads to a loss of sequential information and hence poorer performance. 
  • SVM is not suitable for large datasets because of its high training time; it also takes longer to train than Naïve Bayes. 
  • SVM works poorly with overlapping classes and is also sensitive to the type of kernel used. 
  • In cases where the number of features for each data point exceeds the number of training data samples, the SVM underperforms. 

Applications of SVM in Real World

Support vector machines are supervised learning algorithms. The main goal of using an SVM is to classify unseen data correctly. SVMs can be used to solve various real-world problems: 

  • Face detection – SVM can classify parts of an image as face and non-face and create a square boundary around the face. 
  • Text and hypertext categorization – SVM supports text and hypertext categorization for both inductive and transductive models. It uses training data to classify documents into different categories, scoring each document and comparing the score with a threshold value. 
  • Classification of images – SVMs enhance search accuracy for image classification, providing better accuracy than traditional query-based searching techniques. 
  • Bioinformatics – This includes classification of proteins and classification of cancer. SVM is used for classifying genes, classifying patients on the basis of their genes, and other biological problems. 
  • Protein fold and remote homology detection – SVM algorithms are applied for protein remote homology detection. 
  • Handwriting recognition – SVMs are widely used to recognize handwritten characters. 
  • Generalized predictive control (GPC) – SVM-based GPC can be used to control chaotic dynamics with useful parameters. 

Summary 

In this article, we looked at the Support Vector Machine algorithm in detail. We discussed the concept behind support vector machines, how they work, and the process of implementing them in Python. We also looked into how to tune their parameters to build efficient models. Lastly, we went over the advantages and disadvantages of SVM along with various real-world applications of support vector machines.

We have covered most of the topics related to algorithms in our series of machine learning blogs; click here to explore them. If you are inspired by the opportunities provided by machine learning, enrol in our Data Science and Machine Learning courses for more lucrative career options in this landscape.


Priyankur Sarkar

Data Science Enthusiast

Priyankur Sarkar loves to play with data and derive insightful results from it, and then turn those insights into business growth. He is an electronics engineer with versatile experience as an individual contributor and as a team lead, and has actively worked towards building machine learning capabilities for organizations.

Join the Discussion

Your email address will not be published. Required fields are marked *

Suggested Blogs

Trending Specialization Courses in Data Science

Data scientists, today are earning more than the average IT employees. A study estimates a need for 190,000 data scientists in the US alone by 2021. In India, this number is estimated to grow eightfold, reaching $16 billion by 2025 in the Big Data analytics sector. With such a growing demand for data scientists, the industry is developing a niche market of specialists within its fields.  Companies of all sizes, right from large corporations to start-ups are realizing the potential of data science and increasingly hiring data scientists. This means that most data scientists are coupled with a team, which is staffed with individuals with similar skills. While you cannot remain a domain expert in everything related to data, one can be the best at the specific skill or specialization that they were hired for. Not only thisspecialization within data science will also entail you with more skills in paper and practice, compared to other prospects during your next interview. Trending Specialization Courses in Data Science One of the biggest myths about data science is that one needs a degree or Ph.D. in Data Science to get a good job. This is not always necessary. In reality, employers value job experience more than education. Even if one is from a non-technical background, they can pursue a career in data science with basic knowledge about its tools such as SAS/R, Python coding, SQL database, Hadoop, and a passion towards data.  Let’s explore some of the trending specializations that companies are currently looking out for while hiring data scientists: Data Science with Python Python, originally a general-purpose language, isan open-source code and a common language for data science. This language has a dedicated library for data analysis and predictive modeling, making it a highly demandeddata science tool. On a personal level, learning data science with python can also help you produce web-based analytics products.  Data Science with R A powerful language commonly used for data analysis and statistical computing; R is one of the best picks for beginners as it does not require any prior coding experience. It consists of packages like SparkR, ggplot2, dplyr, tidyr, readr, etc., which have made data manipulation, visualization, and computation faster. Additionally, it also has provisions to implement machine learning algorithms. Big Data analytics Big data is the most trending of the listed specializations and requires a certain level of experience. It examines large amounts of data and extracts hidden patterns, correlations, and several other insights. Companies world-over are using it to get instant inputs and business results. According to IDC, Big Data and Business Analytics Solutions will reach a whopping $189.1 billion this year. Additionally, big data is a huge umbrella term that uses several types of technologies to get the most value out of the data collected. Some of them include machine learning, natural language processing, predictive analysis, text mining, SAS®, Hadoop, and many more.  Other specializations Some knowledge of other fields is also required for data scientists to showcase their expertise in the industry. Being in the know-how of tools and technologies related to machine learning, artificial intelligence, the Internet of Things (IoT), blockchain and several other unexplored fields is vital for data enthusiasts to emerge as leaders in their niche fields.  
Building a career in Data Science
Whether you are a data aspirant from a non-technical background, a fresher, or an experienced data scientist, staying industry-relevant is important to get ahead. The industry is growing at a massive rate and is expected to have 2.7 million open job roles by the end of 2020. Industry experts point out that one of the biggest reasons tech companies lay off employees is not automation, but the growing gap between evolving technologies and the niche manpower available to work on them. To meet these high standards, keeping up with your data game is crucial.

10 Mandatory Skills to Become a Data Scientist

The data science industry is growing at a remarkable pace, generating revenue of $3.03 billion in India alone. Even a 10% increase in data accessibility is said to result in over $65 million of additional net income for a typical Fortune 1000 company. Data scientist has been ranked the best job in the US for the fourth year in a row, with an average salary of $108,000, and the demand for data scientists only seems to be growing.

Who is a data scientist?
A data scientist is someone who collects the massive amounts of data available online, organizes unstructured formats into readable content, and analyses it to extract vital information about customer trends, thinking patterns, and behavior. This information is then used to create business goals and agendas aligned with the end user's needs. In other words, a data scientist is someone with sound technical knowledge, interpersonal skills, strong business acumen and, most importantly, a passion for data. Listed below are some mandatory skills that an aspiring data scientist must develop.

10 Mandatory Skills to Become a Data Scientist

Technical Skills

Programming, packages, and software
Since the first task of a data scientist is to gather raw data and transform it into actionable insights, they need advanced knowledge of coding and statistical data processing. Some of the common languages and tools used by data scientists are Python, R, SQL, NoSQL, Java, Scala, and Hadoop.

Machine learning and deep learning
Machine learning and deep learning are subsets of Artificial Intelligence (AI). Data science largely overlaps with the growing field of AI, as data scientists clean, prepare, and extract data to run AI applications. Machine learning covers supervised, unsupervised, and reinforcement learning, while deep learning lets models learn directly from existing data. Good examples are the facial recognition feature in photo apps and doodling games like Quick Draw.

NLP, cloud computing, and others
Natural Language Processing (NLP) is a branch of AI that processes human language and learns to respond to it; apps and voice-assisted devices like Alexa and Siri already use it. As data scientists work with large amounts of data stored in the cloud, familiarity with cloud platforms like AWS, Azure, and Google Cloud is beneficial, and frameworks like DevOps can help streamline their work alongside other emerging technologies.

Database knowledge, management, and visualization
A database is a collection of information organized for easy access, management, and updating. Data scientists must have strong database knowledge and use the different types to their advantage, for example relational databases like SQL, distributed databases, and cloud databases (a small illustrative sketch follows this section). Once this expertise is established, analyzing the data, database management, and data visualization are the next important skills.

Domain knowledge
Knowledge of the domain in which the data is to be analyzed and predictions are to be made is important. One can harness the true power and full potential of an algorithm and its data only with specific domain knowledge. Instead of waiting to analyze the data, the goals themselves can be shaped towards actionable results with the help of domain knowledge.
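As a small illustration of the database skills listed above (this sketch is not from the article), the following Python snippet creates a throwaway SQLite table in memory and pulls it into pandas for analysis; the table and column names are hypothetical.

# illustrative sketch: query a relational database from Python and analyze the result
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")                      # throwaway in-memory database
conn.execute("CREATE TABLE sales (region TEXT, revenue REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("north", 120.0), ("south", 95.5), ("north", 80.0)])
conn.commit()

# pull the query result into a DataFrame and aggregate it
df = pd.read_sql("SELECT region, SUM(revenue) AS total FROM sales GROUP BY region", conn)
print(df)
conn.close()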
Non-technical Skills

Communication skills
As explained above, once the raw data is processed, it needs to be presented understandably. The job is not limited to producing visually coherent information; it also requires the ability to communicate the insights behind these visual representations. A data scientist should be excellent at communicating results to the marketing team, the sales team, business leaders, and other stakeholders.

Team player
Related to the previous point, data scientists need to be good team players, accommodating feedback and other inputs from business teams. They should also be able to communicate their requirements efficiently to data engineers, data analysts, and other members of the team. Coordination with team members yields faster results and better outputs.

Business acumen
Since the job of a data scientist ultimately boils down to improving and growing the business, they need to think from a business perspective while outlining their data structures. They should have in-depth knowledge of their industry, the existing business problems of their company, and the ability to forecast potential business problems and their solutions.

Critical thinking
Apart from finding insights, data scientists need to align their results with the business. They need to frame appropriate questions and work out steps and solutions to business problems. This objective ability to analyze data and address a problem from multiple angles is crucial in a data scientist.

Intellectual curiosity
According to Harvard Business Review, data scientists spend 80% of their time discovering and preparing data. To do this well, they must stay a step ahead and keep up with the latest trends. Constant upskilling and a curiosity to learn quicker ways of solving existing problems can take data scientists a long way in their careers.

Taking data-driven decisions
Data science is indisputably one of the leading industries today. Whether you are from a technical or a non-technical background, there are several ways to build the skills to become a data scientist. From online courses to bootcamps, one should always stay a step ahead in this competitive field and keep building a portfolio of data work. Reading up on the latest technologies and regularly experimenting with new trends is the way forward for aspirants.

10 Mandatory Skills to Become an AI & ML Engineer

The world has been evolving rapidly with technological advancements, and two of the most prominent are AI (Artificial Intelligence) and ML (Machine Learning). From automated cars to Android systems in phones, apps, and other electronic devices, AI and ML have a wide-ranging impact on how much easier machines can make our lives. Given the pace of the industry's growth, the demand for AI and ML engineers is also increasing. So, what are the essential skills needed to become an AI or ML engineer? Let's begin by understanding the two roles.

AI Engineer vs ML Engineer: Are they the same?
Although both AI and ML engineers work under the common umbrella of artificial intelligence, their job roles and responsibilities vary slightly based on the tools and techniques in use and the end results. Machine learning chiefly focuses on the accuracy of data, while artificial intelligence prioritizes the chance of success over accuracy. AI engineers use data for decision-making, whereas ML engineers learn new things from the data. AI engineers use Java, C++, and other software development tools, while ML engineers are expected to know algorithms and data tools like H2O and TensorFlow. Essentially, the two roles arrive at the same output using different methods. Many top companies, however, hire professionals skilled in both AI and ML. Listed below are some of the top skills that companies look for.

Common Skills for Artificial Intelligence and Machine Learning

Technical Skills

Programming languages
A good understanding of programming languages, preferably Python, R, Java, or C++, is necessary. They are easy to learn, and their applications provide more scope than most other languages. Python is the undisputed lingua franca of machine learning.

Linear algebra, calculus, statistics
A good understanding of matrices, vectors, and matrix multiplication is recommended. Knowledge of derivatives and integrals and their applications is essential to understand even simple concepts like gradient descent (a small numeric sketch of gradient descent follows the technical skills below). Statistical concepts like means, standard deviations, and Gaussian distributions, along with probability theory for algorithms like Naive Bayes, Gaussian mixture models, and hidden Markov models, are necessary to thrive in the industry.

Signal processing techniques
Competence in signal processing and the ability to solve problems with signal processing techniques is crucial for feature extraction, an important aspect of machine learning. This includes time-frequency analysis and advanced signal processing algorithms like wavelets, shearlets, curvelets, and bandlets; a sound theoretical and practical knowledge of these will help you solve complex problems.

Applied math and algorithms
A solid foundation in algorithm theory is a must. This skill set enables understanding of subjects like gradient descent, convex optimization, Lagrange multipliers, quadratic programming, partial differential equations, and summations.

Neural network architectures
Machine learning is used for complex tasks that are beyond human capability to code directly. Neural networks have proven to be by far the most precise way of tackling many problems like translation, speech recognition, and image classification, and they play a pivotal role in AI.
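To ground the gradient descent idea mentioned under the mathematics skills above, here is a minimal numeric sketch (an illustration only, not anyone's production method): it minimizes f(w) = (w - 3)^2 by repeatedly stepping against the derivative f'(w) = 2(w - 3). The learning rate and iteration count are arbitrary assumptions.

# illustrative sketch of gradient descent on f(w) = (w - 3)^2
w = 0.0                 # initial guess
learning_rate = 0.1
for _ in range(50):
    gradient = 2 * (w - 3)          # derivative of (w - 3)^2 at the current w
    w -= learning_rate * gradient   # step downhill along the negative gradient
print(w)                            # converges towards the minimizer w = 3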
Non-Technical Skills

Communication
Communication is key in any line of work, and AI/ML engineering is no exception. Explaining AI and ML concepts even to a layman is only possible by communicating fluently and clearly. An AI and ML engineer does not work alone; projects involve working alongside a team of engineers and non-technical teams like the marketing or sales departments.

Domain knowledge
Machine learning projects that focus on a business's major troubling issues are the ones that finish without flaws. Irrespective of the industry an AI and ML engineer works in, profound knowledge of how the domain works and what benefits the business is crucial. For example, to apply AI or ML in genetic engineering, you need a good understanding of genetic biology. Proper domain knowledge also helps in anticipating potential challenges and keeping the business running continuously.

Rapid prototyping
It is critical to converge on the right idea in the minimum time. Especially in machine learning, choosing the right model, along with work like A/B testing, holds the key to a project's success. Rapid prototyping helps in trying out different techniques quickly and speeding up the development of a scale model.

Additional Skills for Machine Learning

Language, audio, and video processing
With natural language processing, AI and ML engineers get to work at the intersection of two major areas, linguistics and computer science, on text, audio, or video. An AI and ML engineer should be well versed with libraries like Gensim and NLTK, and techniques like word2vec, sentiment analysis, and summarization (a tiny illustrative sketch follows at the end of this article).

Physics, reinforcement learning, and computer vision
Physics: there will be real-world scenarios that require applying machine learning techniques to physical systems, and that is when knowledge of physics comes into play. Reinforcement learning: 2017 saw reinforcement learning drive major improvements in deep learning and artificial intelligence, and it paves the way into robotics, self-driving cars, and other lines of work in AI. Computer vision: computer vision (CV) and machine learning are two computer science branches that can work separately to control very complex systems, but they deliver more when they work in tandem.
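As a tiny, purely illustrative example of the language-processing skills mentioned above, the sketch below uses NLTK's VADER sentiment analyzer; it assumes NLTK is installed and downloads the VADER lexicon on first use.

# illustrative sketch: sentiment analysis with NLTK's VADER analyzer
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")       # one-time download of the sentiment lexicon
sia = SentimentIntensityAnalyzer()
print(sia.polarity_scores("Machine learning makes this product genuinely useful."))
# prints a dict of negative / neutral / positive / compound scores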