Automation and machine learning have changed our lives. From the most technologically savvy person working at leading digital platform companies like Google or Facebook to someone who is just a smartphone user, very few people have not been touched by artificial intelligence or machine learning in some form, whether through social media, smart banking, healthcare, or ride-hailing apps like Uber. With self-driving cars, robots, image recognition, diagnostic assessments, recommendation engines, photo tagging, fraud detection, and more, the future of machine learning and AI is bright and full of untapped possibilities.

With the promise of so much innovation and so many path-breaking ideas, anyone remotely interested in futuristic technology may aspire to a career in machine learning. You may have heard of job profiles like Data Scientist, Data Analyst, Data Engineer, and Machine Learning Engineer, which are not just monetarily rewarding but also allow you to grow as a developer and creator and to work at some of the most prominent technology companies of our times. But how do you, as a beginner, get started? What educational background should you pursue, and what skills do you need to learn? Machine learning is a field that draws on probability, statistics, computer science, and algorithms to create intelligent applications. These applications can glean information from data and turn it into business insights. Since machine learning is all about the study and use of algorithms, it is important that you have a base in mathematics.

Why do I need to Learn Math?

Math has become part of our day-to-day life. From the time we wake up to the time we go to bed, we use math in every aspect of our life. But you may wonder about the importance of math in Machine learning and whether and how it can be used to solve any real-world business problems.

Whatever your goal is, whether it’s to be a Data Scientist, Data Analyst, or Machine Learning Engineer, one of your primary areas of focus should be Mathematics. Math is the basic building block for solving business and data-driven problems in the real world. From analyzing company transactions to understanding how to grow in the day-to-day market, and from predicting a company's future stock price to forecasting future sales, math is used in almost every area of business. Industries like Retail, Manufacturing, and IT apply math to get an overview of the company in terms of sales, production, goods intake, wages paid, its position in the present market, and much more.

Pillars of Machine Learning

To get a head start and familiarize ourselves with the latest technologies like Machine learning, Data Science, and Artificial Intelligence, we have to understand the basic concepts of Math, write our own Algorithms and implement existing Algorithms to solve many real-world problems.

Machine Learning rests on four pillars of mathematics. Most real-world business problems are solved, and most Machine Learning algorithms are written, using them. They are

Statistics

Probability

Calculus

Linear Algebra

Machine learning is all about dealing with data. We collect data from organizations or from repositories like Kaggle and UCI, and perform various operations on the dataset: cleaning and processing the data, visualizing it, and predicting outputs from it. All of these operations share one common foundation that makes them possible through computation, and that is Math.

STATISTICS

Statistics is used to draw conclusions from data. It deals with methods of collecting, presenting, analyzing, and interpreting numerical data. Statistics plays an important role in Machine Learning because the field deals with large amounts of data, and it is a key factor behind the growth and development of an organization.

Collection of data is possible from Census, Samples, Primary or Secondary data sources and more. This stage helps us to identify our goals in order to work on further steps.

The data that is collected contains noise, improper values, null values, outliers etc. We need to clean the data and transform it into meaningful observations.

The data should be represented in a suitable and concise manner. It is one of the most crucial steps as it helps to understand the insights and is used as the foundation for further analysis of data.

Analysis of data includes condensation, summarization, conclusion etc., by means of measures of central tendency, dispersion, skewness, kurtosis, correlation, regression and other methods.

The Interpretation step includes drawing conclusions from the data collected as the figures don’t speak for themselves.

Statistics used in Machine Learning is broadly divided into two categories, based on the type of analyses they perform on the data. They are Descriptive Statistics and Inferential Statistics.

a) Descriptive Statistics

It is concerned with describing and summarizing the data collected from the target population.

It works directly on the data at hand, which is often a small dataset.

The end results are usually shown as pictorial representations such as charts and graphs.

The tools used in Descriptive Statistics are Mean, Median and Mode, which are measures of central tendency, and Range, Standard Deviation, Variance etc., which are measures of variability.
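As a minimal sketch, the descriptive measures listed above can be computed with Python's built-in statistics module. The sales figures below are made-up values used only for illustration, and the population versions of variance and standard deviation are an assumption of this example.

```python
import statistics

# Hypothetical sample: daily sales figures (illustrative values only).
sales = [12, 15, 15, 18, 20, 22, 15, 30]

# Measures of central tendency.
mean = statistics.mean(sales)
median = statistics.median(sales)
mode = statistics.mode(sales)

# Measures of variability (population variance and standard deviation).
value_range = max(sales) - min(sales)
variance = statistics.pvariance(sales)
std_dev = statistics.pstdev(sales)

print(f"mean={mean}, median={median}, mode={mode}")
print(f"range={value_range}, variance={variance:.2f}, std_dev={std_dev:.2f}")
```

If the data were a sample drawn from a larger population, statistics.variance and statistics.stdev would be the more usual choice.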

b) Inferential Statistics

Methods of making decisions or predictions about a population based on the sample information.

It is used when the population is too large to measure in full, so a sample stands in for it.

It compares, tests and predicts future outcomes.

The end results are expressed as probability scores.

Its specialty is that it draws conclusions about the population that go beyond the data available.

Hypothesis tests, Sampling Distributions, Analysis of Variance (ANOVA) etc., are the tools used in Inferential Statistics.
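One simple way to see inference from a sample to a population is a confidence interval for the mean. The sketch below is only an illustration: the function name mean_confidence_interval, the sample values, and the use of the normal approximation (z = 1.96 for roughly 95% confidence) are all assumptions of this example.

```python
import math
import statistics

def mean_confidence_interval(sample, z=1.96):
    """Approximate 95% confidence interval for the population mean.

    Uses the normal approximation (z = 1.96); for small samples a
    t-distribution would be more appropriate.
    """
    n = len(sample)
    sample_mean = statistics.mean(sample)
    # Standard error of the mean, based on the sample standard deviation.
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin

# Hypothetical sample of order values drawn from a larger population.
sample = [52, 48, 50, 47, 53, 49, 51, 50, 46, 54]
low, high = mean_confidence_interval(sample)
print(f"Estimated population mean lies in [{low:.2f}, {high:.2f}]")
```

The interval says something about the whole population even though only ten values were observed, which is the idea behind inferential statistics.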

Statistics plays a crucial role in Machine Learning algorithms. The role of a Data Analyst in the industry is to draw conclusions from data, and that work depends on Statistics.

PROBABILITY

Probability measures the likelihood that a certain event will occur, often estimated from past experience. In the field of Machine Learning, it is used in predicting the likelihood of future events. When all outcomes are equally likely, the probability of an event is calculated as

P(Event) = Favorable Outcomes / Total Number of Possible Outcomes

In the field of Probability, an event is a set of outcomes of an experiment. P(E) represents the probability of the event E occurring. The probability of any event lies between 0 and 1, inclusive. A single run of the experiment, in which the event E may or may not occur, is called a Trial.
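The formula can be tried on a fair die, where every outcome is equally likely. This is a small sketch; the helper name probability and the choice of event are assumptions made for the example.

```python
from fractions import Fraction

def probability(event, sample_space):
    """P(event) = favorable outcomes / total possible outcomes,
    assuming every outcome in sample_space is equally likely."""
    favorable = [outcome for outcome in sample_space if outcome in event]
    return Fraction(len(favorable), len(sample_space))

die = {1, 2, 3, 4, 5, 6}   # sample space: one roll of a fair die
even = {2, 4, 6}           # event: the roll is even

p_even = probability(even, die)
print(p_even)  # 1/2
```

Using Fraction keeps the result exact instead of a rounded decimal.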

Some of the basic concepts required in probability are as follows

Joint Probability: It is the probability that events A and B both occur, denoted by P(A ∩ B). The formula P(A ∩ B) = P(A) · P(B) holds only when the events A and B are independent of each other.

Conditional Probability: It is the probability of event A happening when it is known that another event B has already happened. It is denoted by P(A|B) and defined, for P(B) > 0, as P(A|B) = P(A ∩ B) / P(B)
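Joint and conditional probability can be checked by counting outcomes for two fair dice. The events chosen here (first die shows 6, second die is even) are hypothetical examples picked because they are independent.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(predicate):
    """Probability that predicate holds, found by counting outcomes."""
    return Fraction(sum(1 for o in outcomes if predicate(o)), len(outcomes))

def is_a(o):
    """Event A: the first die shows 6."""
    return o[0] == 6

def is_b(o):
    """Event B: the second die shows an even number."""
    return o[1] % 2 == 0

p_a = prob(is_a)
p_b = prob(is_b)
p_a_and_b = prob(lambda o: is_a(o) and is_b(o))

# A and B are independent, so the joint probability equals the product.
print(p_a_and_b == p_a * p_b)  # True

# Conditional probability: P(A|B) = P(A ∩ B) / P(B).
p_a_given_b = p_a_and_b / p_b
print(p_a_given_b)  # 1/6
```

Because the events are independent, knowing that B happened does not change the probability of A, so P(A|B) equals P(A).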

Bayes' theorem: It applies the results of probability theory to estimate unknown probabilities and make decisions on the basis of new sample information, which makes it useful for solving business problems when additional information becomes available. It is popular because it revises a set of old probabilities (Prior Probability) in light of additional information to derive a set of new probabilities (Posterior Probability). It is written as

P(A|B) = P(B|A) · P(A) / P(B)

From the above equation it is inferred that “Bayes' theorem explains the relationship between the Conditional Probabilities of events.” The theorem is mainly applied to uncertain data. It connects the ‘Sensitivity’ and ‘Specificity’ of a test or model to the probability that a positive result is actually correct, and these quantities are read from the CONFUSION MATRIX.
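To make the revision of a prior into a posterior concrete, here is a minimal sketch for a diagnostic test. The function name posterior and the numbers (1% prevalence, 90% sensitivity, 95% specificity) are hypothetical and chosen only for illustration.

```python
def posterior(prior, sensitivity, specificity):
    """P(condition | positive test) via Bayes' theorem.

    prior        = P(condition)
    sensitivity  = P(positive | condition)
    specificity  = P(negative | no condition)
    """
    p_pos_given_cond = sensitivity
    p_pos_given_no_cond = 1 - specificity
    # Total probability of a positive result, P(positive).
    p_pos = p_pos_given_cond * prior + p_pos_given_no_cond * (1 - prior)
    # Bayes' theorem: P(cond | pos) = P(pos | cond) * P(cond) / P(pos).
    return p_pos_given_cond * prior / p_pos

# Hypothetical numbers: 1% prevalence, 90% sensitivity, 95% specificity.
p = posterior(prior=0.01, sensitivity=0.90, specificity=0.95)
print(f"P(condition | positive) = {p:.3f}")
```

With these assumed numbers the posterior stays well below the sensitivity, because the condition is rare and false positives outnumber true positives.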

A confusion matrix is a table that measures the performance of the Machine Learning models or algorithms that we develop by counting True Positives, True Negatives, False Positives and False Negatives. From these counts we can compute the True Positive Rate, False Positive Rate, Precision, Recall, F1-score, Accuracy and Specificity, and draw the ROC curve for the given data.
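The sketch below counts the four cells of a binary confusion matrix and derives the usual metrics from them. The label lists are hypothetical, and the helper confusion_counts is an illustrative name rather than a function from any library.

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP, FN for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, tn, fp, fn

# Hypothetical true labels and model predictions for ten samples.
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

tp, tn, fp, fn = confusion_counts(actual, predicted)
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)        # also called sensitivity or true positive rate
specificity = tn / (tn + fp)   # true negative rate
f1 = 2 * precision * recall / (precision + recall)

print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
print(f"specificity={specificity:.2f} f1={f1:.2f}")
```

In practice a library such as scikit-learn provides these metrics ready-made; computing them by hand just shows where each number comes from.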

We also need to focus on probability distributions, which are classified as Discrete and Continuous, on likelihood estimation, and so on. In Machine Learning, the Naive Bayes algorithm works probabilistically, with the assumption that the input features are independent of each other given the class.

Probability is an important area in most business applications as it helps in predicting future outcomes from data and deciding on further steps. Data Scientists, Data Analysts, and Machine Learning Engineers use probability very often, as their job is to take inputs and predict the possible outcomes.

CALCULUS

Calculus is the branch of Mathematics that studies rates of change of quantities. In Machine Learning it is used to optimize the performance of models and algorithms. Without understanding calculus, it is difficult to compute probabilities for continuous data, which are defined as integrals, and to work out the possible outcomes from the data we take. Calculus is mainly focused on functions, limits, derivatives, and integrals. It is divided into two types called Differential Calculus and Integral Calculus. It is used in the backpropagation algorithm to train deep Neural Networks.

Differential Calculus splits a quantity into small pieces to find how it changes.

Integral Calculus combines (joins) the small pieces to find how much there is.
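Both ideas can be approximated numerically. The sketch below estimates a derivative with a central difference and an integral with the midpoint rule; the function names and the test function x squared are assumptions made for illustration.

```python
def derivative(f, x, h=1e-5):
    """Central-difference estimate of f'(x): the rate of change at x."""
    return (f(x + h) - f(x - h)) / (2 * h)

def integral(f, a, b, steps=10_000):
    """Midpoint-rule estimate of the integral of f over [a, b]:
    the areas of many thin slices added together."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

def square(x):
    return x * x

slope_at_3 = derivative(square, 3.0)       # exact derivative 2x gives 6
area_0_to_3 = integral(square, 0.0, 3.0)   # exact integral x^3 / 3 gives 9

print(f"slope at x=3: {slope_at_3:.4f}")
print(f"area from 0 to 3: {area_0_to_3:.4f}")
```

Both results are approximations, so they agree with the exact values only up to a small numerical error.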

Calculus is mainly used in optimizing Machine Learning and Deep Learning algorithms, where it helps to develop fast and efficient solutions. It is used in algorithms like Gradient Descent and Stochastic Gradient Descent (SGD) and in optimizers like Adam, RMSProp, Adadelta etc.
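A bare-bones version of Gradient Descent shows how the derivative drives the optimization. This is a sketch on a one-variable function, not a production optimizer; the function f(x) = (x - 3)^2, the learning rate, and the step count are assumptions chosen for the example.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to approach a minimum."""
    x = start
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x

def grad_f(x):
    """Derivative of f(x) = (x - 3)^2, which has its minimum at x = 3."""
    return 2 * (x - 3)

x_min = gradient_descent(grad_f, start=0.0)
print(f"estimated minimum at x = {x_min:.6f}")
```

Stochastic Gradient Descent and optimizers like Adam follow the same basic loop, but estimate the gradient from small batches of data and adapt the step size.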

Data Scientists mainly use calculus in building Deep Learning and Machine Learning models. They use it to optimize a model's parameters so that it produces better outputs and brings out the insights hidden in the data.

LINEAR ALGEBRA

Linear Algebra focuses more on computation. It plays a crucial role in understanding the background theory behind Machine learning and is also used for Deep Learning. It gives us better insights into how the algorithms really work in day-to-day life, and enables us to take better decisions. It mostly deals with Vectors and Matrices.

A scalar is a single number.

A vector is a 1D array of numbers, written as a row or a column, and a single index is enough to access any element.

A matrix is a 2D array of numbers, and an element is accessed with two indices (i.e., by row and by column).

A tensor is an array of numbers arranged on a regular grid with a variable number of axes.

NumPy, a Python library, is used to carry out all these numerical operations on data. It performs basic operations like addition, subtraction, multiplication, division etc., on vectors and matrices. Its core data structure is the N-dimensional array (ndarray).
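The following sketch, which assumes NumPy is installed, shows a scalar, a vector, a matrix, and a tensor as arrays with 0, 1, 2, and 3 axes, followed by a few basic operations. The array values are arbitrary.

```python
import numpy as np

scalar = np.array(5.0)                        # 0 axes: a single number
vector = np.array([1.0, 2.0, 3.0])            # 1 axis: one index
matrix = np.array([[1.0, 2.0], [3.0, 4.0]])   # 2 axes: row and column
tensor = np.zeros((2, 3, 4))                  # 3 axes

dims = (scalar.ndim, vector.ndim, matrix.ndim, tensor.ndim)
print(dims)  # (0, 1, 2, 3)

# Element-wise arithmetic on two vectors of the same shape.
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
print(a + b, a - b, a * b, a / b)

# Dot product of two vectors and a matrix-vector product.
dot = a @ b
mat_vec = matrix @ np.array([1.0, 1.0])
print(dot)      # 32.0
print(mat_vec)  # [3. 7.]
```

Note that the * operator multiplies element by element, while @ performs the linear-algebra product.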

Without Linear Algebra, it would be very hard to develop machine learning models, manipulate complex data structures, or perform operations on matrices. The data fed into a model and the results it produces are typically represented as vectors and matrices.

Machine Learning algorithms such as Linear regression, Logistic regression and SVM are built on Linear Algebra, and even Decision trees operate on data stored as matrices. With the help of Linear Algebra we can also build our own ML algorithms. Data Scientists and Machine Learning Engineers work with Linear Algebra when building their own algorithms to work with data.
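As one illustration of Linear Algebra inside an algorithm, Linear regression can be written as a least-squares problem on a matrix. The sketch assumes NumPy is installed and uses made-up, noise-free data generated from y = 2x + 1, so the expected slope and intercept are known in advance.

```python
import numpy as np

# Hypothetical data generated from y = 2x + 1 (no noise), for illustration.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Design matrix: one column of x values and one column of ones (intercept).
X = np.column_stack([x, np.ones_like(x)])

# Solve the least-squares problem X @ [slope, intercept] ≈ y.
coeffs, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
slope, intercept = coeffs

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

With real, noisy data the fitted values would only approximate the underlying relationship, but the matrix formulation stays the same.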

How do Python functions correlate to Mathematical Functions?

So far, we have seen the importance of Mathematics in Machine Learning. But how do Mathematical functions correlate to Python functions when building a machine learning algorithm? The answer is quite simple. In Python, we take the data from our dataset and apply many functions to it. The data in the dataset that we take to solve a particular machine learning problem can be of different forms, like characters, strings, integers, float values, double values, Boolean values, special characters, garbage values etc. But the computer understands only zeroes and ones. Whatever we take as input to our machine learning model from the dataset, the computer is going to represent it in binary.

Here Python libraries like NumPy, SciPy, and Pandas provide pre-defined functions. These help us apply Mathematical functions to get better insights from the dataset that we take, and let us work on different types of data to process it and extract information from it. They also help us clean out the garbage values, noise, and null values present in the data, so that the dataset is free from unwanted matter. Once the data is preprocessed with these Python functions, we can apply our algorithms to the dataset, find the accuracies of the different algorithms, and see which model works better for the data. The mathematical functions also help us visualize the content of the dataset, which gives us a better understanding of the data and of the problem we are addressing with a machine learning algorithm.

Every algorithm that we use to build a machine learning model has math functions hidden in it, in the form of Python code. The algorithm that we develop can be used to solve a variety of problems, from a Boolean (yes or no) problem to a matrix problem like identifying a face in an image of a crowd. The final stage is to find the algorithm that best suits the problem. This is where the mathematical functions in the Python language help us. They help us analyze which algorithm is best through evaluation measures like correlation, F1 score, Accuracy, Specificity, Sensitivity etc. Mathematical functions also help us find out whether the selected model is overfitting or underfitting the data that we take.

To conclude, we cannot apply mathematical functions directly when building machine learning models, so we need a programming language to implement the mathematical strategies in the algorithm. This is why we use Python to implement our math models and draw better insights from the data. Python is well suited to implementations of this type. It is widely considered one of the best languages for solving real-world problems and implementing new techniques and strategies in the field of ML & Data Science.

Conclusion

For machine learning enthusiasts and aspirants, mathematics is a crucial aspect to focus on, and it is important to build a strong foundation in Math. Every concept you learn in Machine Learning, and every small algorithm you write or implement to solve a problem, relates directly or indirectly to Mathematics. The concepts of math that are implemented in machine learning are built upon the basic math that we learn in 11th and 12th grades. At that stage we gain the theoretical knowledge, and in Machine Learning we experience the practical use cases of the math we studied earlier. The best way to get familiar with the concepts of Mathematics is to take a Machine Learning algorithm, find a use case, and solve and understand the math behind it. An understanding of math is paramount to coming up with machine learning solutions to real-world problems. A thorough knowledge of math concepts also helps us enhance our problem-solving skills.

Harsha Vardhan Garlapati is a Data Science Enthusiast and loves working with data to draw meaningful insights from it and further convert those results and implement them in business growth. He is a final year undergraduate student and passionate about Data Science. He is a smart worker, passionate learner, an Ice-Breaker and loves to participate in Hackathons to work on real time projects. He is a Toastmaster Member at S.R.K.R Toastmasters Club, a Public Speaker, a good Innovator and problem solver.