Artificial Intelligence is an advanced technological field that is evolving every day. From face tagging to defense robots and autonomous driving cars, we can define numerous use cases of artificial intelligence. Due to this extensive list of applications, it is said that the next disruption will be done by AI and its related sub-domains. Many professionals are either learning to enter or switching to this domain. This article aims to help professionals with artificial intelligence interview questions by covering basic questions to scenario-based interview questions. Whether you are a beginner or an intermediate or an experienced AI professional, this guide will help you to increase your confidence and knowledge in artificial intelligence. This article covers the frequently asked artificial intelligence interview questions and answers including machine learning to help you get prepared for your next interview.
Organizations from various industries, including manufacturing, customer service, information technology, etc., are employing artificial intelligence to develop systems and machines that can perform jobs that humans find difficult by minimizing delays and errors and maximizing productivity.
AI may be employed to create more effective and devastating weapons. AI is a tool that cybercriminals can employ to hack humans. Numerous end-of-the-world prophets have even envisioned a situation where artificial intelligence will rule, and humans will be reduced to slaves.
Machine learning is a subset of artificial intelligence that uses mathematical models to enable a system to keep picking up new skills and improve based on experience. On the other hand, artificial intelligence imitates human cognitive processes like learning and problem-solving using logic and math.
I believe that the employment of AI for a noble human cause will make it more significant than anything else. AI can aid in developing vaccinations and treatments for diseases that are now incurable. Additionally, it can aid in developing robotic arms that can aid in sensitive surgeries that are difficult for humans to do. It can assist in developing systems that enable especially abled persons to lead normal lives.
The human brain contains billions of interconnected neurons, and the web of connections among them is extremely complex. Each neuron has a cell body, a single axon, and many dendrites. The dendrites carry incoming information to the cell body, and the axon carries the neuron's output on to other neurons. These connections, or edges, allow two neurons to interact with one another. The weight associated with an edge determines how strongly the two neurons interact: a larger edge weight indicates stronger interaction, and a smaller edge weight indicates weaker interaction.
An artificial neural network is designed to mimic the human brain. The design comprises an input layer, one or more hidden layers, and an output layer, and each layer contains one or more nodes. Each node can receive many inputs and pass its output to many other nodes. The nodes are trainable because they have adjustable weights and thresholds: if the intended output is not produced, the weights or thresholds are adjusted. Data is fed into the network through the input layer and transmitted to the first hidden layer. Each hidden layer processes the data and, if another hidden layer follows, passes its output to it; otherwise, it sends its output to the output layer, which provides the final output.
An AI system has a wide range of skills or features. One of them is the capacity to make precise decisions by identifying patterns once it has been given enough data. Complex decisions often involve several decision-makers and several different criteria, and such systems can reason logically about them, much like the human brain. Like a person, the system gains experience with each new learning encounter, and that experience improves its later decisions. Google Photos is a good illustration because it can tag people in pictures. The user may occasionally be prompted to identify the person in an image, and with that information the system learns and further enhances its capabilities.
What are the activation functions in a neural network? Why do we prefer non-linear activation functions?
Expect to come across this popular question in AI interviews for freshers. Just as we may need coffee or breakfast to get going in the morning, a neuron needs to be activated in a certain way to produce a useful output. The output of a neuron in an artificial neural network is determined by applying an activation function to its net input, which is the weighted combination of the signals arriving at that neuron. Non-linear activation functions are also commonly used to keep a neuron's response bounded. When a signal is fed through a multilayer network with linear activation functions, the output is the same as could be obtained from a single-layer network, because a composition of linear functions is itself linear. For this reason, non-linear activation functions are preferred over linear ones in multilayer networks.
In contrast to rule-based systems, learning-based systems pursue general AI through their capacity to learn. In principle, a learning system has an open-ended capacity to simulate intelligence. It is said to possess adaptive intelligence: it can revise the knowledge it currently holds and acquire new knowledge. That is what distinguishes learning-based systems from rule-based systems. An artificial neural network is an example of a learning system.
The rule-based approach defines a set of rules to arrive at a particular outcome. These rules must be written and altered manually. The machine is not expected to learn from events; it works only with the established rules. Therefore, in situations for which no rule has been formulated, it gets stuck and cannot solve the problem. It may not even understand the problem. These systems are hard to maintain because of the manual effort required to define rules for unseen events, and they are not very useful for solving complex problems.
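The claim about linear activations can be checked numerically. The following is a minimal sketch in plain Python; the weight values and the choice of ReLU as the non-linear function are illustrative assumptions, not taken from the article.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def relu(vector):
    """A common non-linear activation, applied element-wise."""
    return [max(0.0, value) for value in vector]


# Hypothetical weights for a network with one hidden layer.
W1 = [[1.0, 2.0], [0.0, -1.0]]   # input -> hidden
W2 = [[3.0, 1.0]]                # hidden -> output
x = [2.0, 5.0]

# With linear (identity) activations, two layers equal one merged layer.
two_layer_linear = matvec(W2, matvec(W1, x))
single_layer = matvec(matmul(W2, W1), x)

# With a non-linear activation in between, the merged layer no longer matches.
two_layer_relu = matvec(W2, relu(matvec(W1, x)))

print(two_layer_linear, single_layer, two_layer_relu)
```

Here the two linear layers and the single merged layer give the same output, while inserting ReLU between the layers changes the result, which is why depth only adds expressive power when the activations are non-linear.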
The main distinction between rule-based systems and learning-based systems is that rule-based systems are manually programmed, whereas self-learning systems are automatically trained by computers. In other words, instead of receiving explicit instructions from humans, self-learning systems learn through their own experiences. Self-learning systems analyze a lot of historical data and decide depending on what they've learned from it.
Humans make decisions and see patterns in their daily lives. We can achieve the same with neural networks. Everywhere, AI is being used increasingly. For instance, Siri and Alexa are becoming increasingly popular in automating some manual duties like setting alarms, reading notifications, turning on or off lights, and playing music. We have made some progress in bridging the gap between languages thanks to language translation software and programs like Google Translate. AI-based suggestions and recommendations, which we frequently see in e-commerce platforms and other applications, have the potential to increase our spending by considering our preferences. With the development of AI, targeted advertising has become more popular and profitable for businesses.
This is a frequently asked question in artificial intelligence interviews. In an artificial neural network, each neuron is connected to other neurons by directed communication links, and each link carries a weight. The weights hold the information about the input signal that the network uses to solve a problem. If there are ‘m’ input nodes and ‘n’ output nodes, all m × n weights can be represented by a weight matrix with ‘m’ rows and ‘n’ columns.
One way to think of bias is as a constant. For instance, if a salary grows by a fixed percentage every year, the starting salary can be viewed as the bias. The constant in the straight-line equation acts like a bias. The bias included in the network affects the calculation of the net input. It is included by adding a component to the input vector, so it can be considered another weight that plays a major role in determining the network's output. The bias can be of two types: positive bias and negative bias. A positive bias increases the net input of the network, and a negative bias decreases it.
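To make the roles of the weight matrix and the bias concrete, here is a small sketch of the net input calculation. The function name `net_input` and all numeric values are hypothetical; the only thing assumed from the text is that weights form an m-by-n matrix and that the bias is added to the weighted sum.

```python
def net_input(inputs, weights, biases):
    """Return the net input of each of the n output nodes.

    weights has m rows (one per input node) and n columns (one per output node).
    """
    return [
        biases[j] + sum(inputs[i] * weights[i][j] for i in range(len(inputs)))
        for j in range(len(biases))
    ]


x = [1.0, 2.0, 3.0]          # m = 3 input nodes
W = [[0.5, -1.0],
     [0.25, 0.0],
     [1.0, 2.0]]             # 3 x 2 weight matrix, so n = 2 output nodes

no_bias = net_input(x, W, [0.0, 0.0])
positive_bias = net_input(x, W, [1.0, 1.0])    # raises the net input
negative_bias = net_input(x, W, [-1.0, -1.0])  # lowers the net input

print(no_bias, positive_bias, negative_bias)
```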
The threshold is used in activation functions based on which the neural network's final output is calculated. The threshold value defines a limit that is compared against the net input of the network based on which the final output is calculated. For example, in a ramp function, we have thresholds defined at 0 and 1. Based on the comparison of the net input value ‘x’ and the threshold limits, the output is generated. The threshold is usually represented as ‘θ.’
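As an illustration of thresholds at 0 and 1, the ramp function can be sketched as follows. This assumes the usual definition (0 below the lower threshold, 1 above the upper threshold, and the input itself in between), which the article does not spell out.

```python
def ramp(x):
    """Ramp activation with thresholds at 0 and 1."""
    if x < 0:
        return 0.0       # below the lower threshold
    if x > 1:
        return 1.0       # above the upper threshold
    return float(x)      # between the thresholds the output follows the input


print([ramp(value) for value in (-0.5, 0.3, 2)])
```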
Create an artificial neural network with multi-input and single output having a hidden layer and explain each component.
Consider a multi-input architecture with three input variables, or neurons, in the input layer. The output layer consists of a single neuron, giving a single-output network. The network has one hidden layer with three neurons. Each input neuron is connected to every neuron in the hidden layer. Let w11, w12, and w13 be the weights on the links from input neuron x1 to hidden neurons z1, z2, and z3, respectively. Similarly, we have w21, w22, and w23 from input neuron x2 to neurons z1, z2, and z3, respectively, followed by w31, w32, and w33 from input neuron x3 to neurons z1, z2, and z3, respectively. The weights of the links between the hidden layer and the output layer are v1, v2, and v3, connecting z1, z2, and z3 to the output neuron. The output neuron is also connected to a bias term ‘b.’ The neurons in the hidden layer can likewise be connected to bias terms.
In supervised learning, the model is trained on a labeled dataset, which means that both the raw input data and the corresponding outcomes are present. We separate the data into a training dataset and a test dataset. The network is trained on the training data, and the test data acts as new input for output prediction and for assessing the model's correctness. In this sort of learning, the model learns from the predetermined outcomes. Because the desired outputs are known in advance, training is guided directly by the error between the predictions and the labels. The algorithm learns the patterns of input that lead to the desired results, and once trained, the system can predict the right answer for a new input. Predicting future sales from historical data is an example of supervised learning.
In unsupervised learning, the training input is neither classified nor labeled. The objective is to understand the patterns hidden in the data. Once a model has learned these patterns, it can apply them to any new dataset. The system does not know the correct output; instead, it explores the data and draws inferences that describe the latent structure of the unlabeled data. If we feed images of cycles, cars, and helicopters as raw inputs, the model can group them into three clusters. However, it cannot tell whether a given cluster is a cluster of cycles, because the data is unlabeled. Finding customer segments based on common characteristics such as demographics or behaviors is an example of unsupervised learning.
In reinforcement learning, there are no labeled datasets or results tied to the data, so the exact target output values are unknown. The only way to respond well to a given input is therefore to gain experience. In this case, exact information is unavailable; only critic (evaluative) information is. The algorithm receives positive reinforcement for each good move or choice it makes.
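The architecture above can be sketched as a forward pass in a few lines. The article does not name an activation function, so this sketch assumes a sigmoid in the hidden layer and a plain weighted sum at the output; the weight values are made up for illustration.

```python
import math


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def forward(x, w, v, b):
    """Forward pass of the 3-3-1 network.

    w[i][j] is the weight from input x(i+1) to hidden neuron z(j+1),
    v[j] is the weight from z(j+1) to the output, and b is the output bias.
    """
    z = [sigmoid(sum(x[i] * w[i][j] for i in range(3))) for j in range(3)]
    y = b + sum(v[j] * z[j] for j in range(3))
    return z, y


# Hypothetical inputs and weights.
x = [1.0, 2.0, 3.0]
w = [[0.1, -0.2, 0.3],    # w11, w12, w13
     [0.0, 0.4, -0.1],    # w21, w22, w23
     [0.2, 0.1, 0.05]]    # w31, w32, w33
v = [0.5, -0.3, 0.8]      # v1, v2, v3
b = 0.1

hidden, output = forward(x, w, v, b)
print(hidden, output)
```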
On the other hand, negative reinforcement is given for every incorrect action. It gains knowledge of the types of actions that must be taken. Industrial automation can benefit from this kind of learning.
This is one of the most frequently asked AI questions and answers. Neural networks learn through an element of feedback. The feedback allows the network to evaluate the model output. It is like the feedback mechanism that we see in humans. If we touch something hot, we immediately take back our hands and, next time, stay cautious of hot substances. Our senses provide us with feedback during our first encounter with hot substances that might hurt us. Artificial neural networks have similar learning patterns through feedback. Based on the feedback, the threshold and the weights are modified. This process is repeated with every input, and the system continues to learn.
The McCulloch-Pitts neuron, or M-P neuron, is one of the earliest and most basic neural network models. It has a binary activation function and is suited to binary tasks. M-P neurons are connected by directed weighted links that may be positive (excitatory) or negative (inhibitory). Each neuron has a fixed threshold: if the net input is greater than or equal to the threshold, the neuron fires and its output is 1; if the net input is less than the threshold, the output is 0. These neurons are mostly used to model logic functions.
The learning rate of an artificial neural network determines how quickly the algorithm trains the network. Learning refers to the weight adjustment at each step of training, and the learning rate controls the size of that adjustment. It is represented by ‘α,’ and its value typically ranges from 0 to 1. A smaller learning rate might give more accurate results but requires more time and computing resources. A higher learning rate trains the network quickly but might not produce the best results. Therefore, it is important to choose a learning rate that strikes the right balance. This can be done by running a few iterations of training with an initial learning rate and then adjusting it accordingly.
An AI system processes the data we input into it, performs intricate mathematical calculations, and outputs a result. It is very difficult for humans to follow the algorithms and intricate mathematics involved, and carrying out the procedure manually would take a lot of time. A situation like this is referred to as "black-box learning." Black-box models come in many forms, including random forests, neural networks, and others. Models with simple logic and straightforward interpretation include linear regression, KNN, decision trees, etc. These kinds of models are known as white-box models since the underlying logic or algorithm is simple to comprehend.
Python is the most widely used language in Data Analytics, Machine Learning, and Artificial Intelligence. Its many frameworks and libraries make Python the most versatile language for these applications. Some of these libraries include:
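A minimal sketch of an M-P neuron used for logic functions follows. It assumes the common convention that the neuron outputs 1 when the net input reaches the threshold and 0 otherwise; the weights and thresholds for AND and AND-NOT are standard textbook choices, not values from the article.

```python
def mp_neuron(inputs, weights, threshold):
    """McCulloch-Pitts neuron: fires (1) if the net input reaches the threshold."""
    net = sum(x * w for x, w in zip(inputs, weights))
    return 1 if net >= threshold else 0


def logical_and(x1, x2):
    # Two excitatory links (+1 each); both inputs must be on to reach 2.
    return mp_neuron([x1, x2], [1, 1], threshold=2)


def logical_and_not(x1, x2):
    # x1 AND NOT x2: excitatory link from x1, inhibitory link from x2.
    return mp_neuron([x1, x2], [1, -1], threshold=1)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, logical_and(a, b), logical_and_not(a, b))
```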
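The trade-off can be seen on a toy problem. This sketch minimizes the hypothetical function f(w) = w² (whose gradient is 2w) with three learning rates chosen purely for illustration.

```python
def gradient_descent(alpha, steps=20, start=5.0):
    """Minimize f(w) = w**2 (gradient 2*w) using learning rate alpha."""
    w = start
    for _ in range(steps):
        w -= alpha * 2 * w   # weight adjustment scaled by the learning rate
    return w


slow = gradient_descent(alpha=0.01)   # small steps: still far from the minimum at 0
good = gradient_descent(alpha=0.4)    # moderate steps: very close to 0
stuck = gradient_descent(alpha=1.0)   # steps too large: bounces between +5 and -5

print(slow, good, stuck)
```

With the same number of steps, the smallest rate has not yet converged, the moderate rate has, and the largest rate overshoots the minimum on every step and makes no progress.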
The different applications of Artificial Intelligence are:
In this type of learning, the system automatically identifies patterns and tries to make estimates without requiring clear programming instructions.
Robotics entails the creation, maintenance, use, and operation of robots. Robotics aims to create devices that can aid and support people.
A self-driving car, often referred to as an autonomous car, is a vehicle that has been automated to the point where it can sense its surroundings and move safely with little to no human intervention. Tesla is well known for its work on self-driving vehicles.
Natural Language Processing
The ability of computer software to comprehend spoken and written human language is known as natural language processing (NLP). It is a part of artificial intelligence. For instance, the typing assistant Grammarly employs AI to check for spelling, grammatical, and punctuation errors.
Speech recognition is an AI technology that enables computers to recognize spoken language and translate it into text. Services such as Amazon Alexa and Google Assistant are examples of speech recognition.
Personalized experiences on websites and apps that are determined by a user's location, search history, etc., are an application of AI. Amazon, Flipkart, Google, Instagram, Facebook, etc., can provide AI-generated recommendations based on each user's experience.
Data is the core aspect of any Artificial Intelligence system. Data for AI systems can be categorized as follows:
This sort of data is typically kept in relational databases in tabular form and has a specified data schema. Train timetables, stock information, and sales data from a certain retailer are a few typical examples of this type of data.
There is no fixed structure for this kind of data. It can appear in any way. Most data in the world is available in this format. Audio data, video or image data, textual data, etc., can be categorized as unstructured data.
This type of data sits between structured and unstructured data. The information is not organized in a relational, tabular form. Nevertheless, it carries tags and markers that give it a recognizable structure. JSON files are an example of semi-structured data.
Consider a regression task where three input variables, or neurons, are present in the input layer. The output layer consists of a single neuron that outputs a real value for the regression. The network has one hidden layer with three neurons. Each input neuron is connected to every neuron in the hidden layer. Let w11, w12, and w13 be the weights on the links from input neuron x1 to hidden neurons z1, z2, and z3, respectively. Similarly, we have w21, w22, and w23 from input neuron x2 to neurons z1, z2, and z3, respectively, followed by w31, w32, and w33 from input neuron x3 to neurons z1, z2, and z3, respectively. The weights of the links between the hidden layer and the output layer are v1, v2, and v3, connecting z1, z2, and z3 to the output neuron. The output neuron is also connected to a bias term ‘b.’
Several iterations are performed before the results become satisfactory. During forward propagation from the input layer to the hidden layer, the inputs x1, x2, and x3 are multiplied by their respective weights. Each hidden neuron sums the weighted inputs it receives over these links (usually applying an activation function to the sum) and passes the result on to the next layer. Before the output neuron consumes the hidden-layer outputs, they are multiplied by the respective weights v1, v2, and v3. The output of neuron y is then compared with the desired output, and the loss is calculated from the difference between the two. The weights are adjusted with respect to this loss, and the same steps are repeated in the next iteration. The loss typically decreases at first; if it then starts to increase, we can consider reducing the learning rate so that training can settle at the minimum. This procedure is essentially gradient descent.
This is a common artificial intelligence interview question, so don't miss it. Back-propagation networks are trained with the backpropagation algorithm, which aims to generalize from the training set so that the network can generate the desired output.
Alan Turing is credited with deciphering Nazi codes and helping the Allies win World War II. He is also credited with creating the Turing Test and is widely regarded as a founder of modern computer science. The test was first created to determine whether a machine taking part in a text-only conversation could trick a human. A machine is said to have human intelligence if it can hold a conversation with a human without being recognized as a machine. However, the term has since become shorthand for any AI that can fool a person into believing they are witnessing or interacting with a real human.
Once an AI system has learned something, it can continue to build on its existing knowledge. Artificial neural networks require massive amounts of data for effective training, so training such a network from scratch is often not a good idea. Instead, we can usually find a pre-trained model that performs a similar task and reuse it in a new learning model. If the two models are developed to perform similar tasks, generalized knowledge can be shared between them. We reuse the lower layers of the pre-trained model, which not only requires less training data but also speeds up training significantly. This idea of re-training a pre-trained model instead of training from scratch is known as transfer learning. Transfer learning is becoming increasingly popular, with Google, Microsoft, Hugging Face, etc., training models for widespread use cases on the big data already available to them. For instance, to train a text similarity model or a sentiment analyzer, we can take a pre-trained model such as BERT or XLNet and apply transfer learning to adapt it to the specific use case.
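The training procedure described above can be sketched end to end for the 3-3-1 regression network. Everything beyond the architecture is an assumption made for illustration: sigmoid hidden units, a linear output, a squared-error loss, per-sample weight updates, and a tiny made-up dataset.

```python
import math
import random


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def train(samples, alpha=0.1, epochs=500, seed=0):
    """Train the 3-3-1 network with gradient descent; return the loss per epoch."""
    rng = random.Random(seed)
    w = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(3)]  # w[i][j]: x -> z
    v = [rng.uniform(-0.5, 0.5) for _ in range(3)]                      # v[j]: z -> y
    b = 0.0
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, target in samples:
            # Forward pass: input -> hidden -> output.
            z = [sigmoid(sum(x[i] * w[i][j] for i in range(3))) for j in range(3)]
            y = b + sum(v[j] * z[j] for j in range(3))
            error = y - target
            total += 0.5 * error ** 2
            # Backward pass: move each weight against its loss gradient.
            for j in range(3):
                grad_hidden = error * v[j] * z[j] * (1.0 - z[j])
                for i in range(3):
                    w[i][j] -= alpha * grad_hidden * x[i]
                v[j] -= alpha * error * z[j]
            b -= alpha * error
        losses.append(total / len(samples))
    return losses


# Hypothetical regression data: three inputs and one real-valued target each.
samples = [
    ([0.0, 0.0, 1.0], 0.1),
    ([1.0, 0.0, 0.0], 0.5),
    ([0.0, 1.0, 0.0], -0.2),
    ([1.0, 1.0, 1.0], 0.4),
]
losses = train(samples)
print(losses[0], losses[-1])
```

Comparing the first and last entries of `losses` shows whether training reduced the error; if the loss grew instead, lowering `alpha` would be the first thing to try.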
Natural language processing is a subfield of artificial intelligence concerned with the interactions between computers and human language, with a focus on how to design computers to handle and interpret massive volumes of natural language data.
Following are some of the applications of Natural Language Processing
By using software to translate text or speech from one language to another, machine translation aids in overcoming language barriers. To translate text, documents, and webpages from one language into another, Google developed the machine translation tool known as Google Translate.
Automatic summarization is useful for summarizing the contextual meaning of documents and information while maintaining the emotional meanings hidden inside the information. Automatic summarization is particularly useful when we wish to get an overview of a news item or blog post while avoiding redundancy from multiple sources.
Companies make use of NLP applications, like sentiment analysis, to identify opinions and sentiments online to understand what users feel about their products and services.
Text classification enables us to assign predefined categories to a document to organize, structure, and filter the information. For example, an application of text categorization is spam filtering in email.
Virtual assistants use natural language processing (NLP) to understand a user's text or voice input and then respond or perform certain actions. For example, Siri by Apple and Alexa by Amazon are among the most popular and widely used virtual assistants.
Explain gradient descent. Mention the notable differences between batch, stochastic and mini-batch gradient descent.
It's no surprise that this one pops up often in artificial intelligence interviews. The cost function measures the performance of an AI model by computing the error between predicted and expected values. Gradient descent helps minimize this cost function by tuning the model parameters. The local and global minima of a cost function do not always coincide: there can be multiple local minima, but only one global minimum value, which is the lowest value the cost function can take. It is not always possible to reach the global minimum, so the idea is to get as close to it as possible. How the search proceeds is controlled by the learning rate. Batch, stochastic, and mini-batch gradient descent are different variants of gradient descent for reaching a solution close to the optimum.
At each step, batch gradient descent uses the complete set of training data. It is therefore very slow on exceptionally large training sets, and it is not preferred because AI solutions usually deal with big data. Stochastic gradient descent instead picks a random instance from the training set at every step and computes the gradients based solely on that one instance. This speeds up each iteration because there is far less data to process. Since only one instance must be kept in memory during each iteration, it also makes training on very large datasets possible. Mini-batch gradient descent computes the gradients on small random groups of examples, termed mini-batches, rather than on the entire training set as in batch gradient descent or on a single instance as in stochastic gradient descent. Its main advantage over stochastic gradient descent is the performance boost that comes from hardware-optimized matrix operations, especially on GPUs.
There are a lot of hyperparameters in a neural network that one can tweak. We can change the type of activation function, the number of hidden layers, the number of neurons in a hidden layer, the weight initialization, etc. To start with, we can begin with one or two hidden layers; a deep neural network contains more. We can then increase the number of neurons in each hidden layer before trying to add another layer to the architecture. A common practice is to size the layers to form a funnel, with fewer and fewer neurons at each layer. Finding the ideal number of hidden layers and of neurons in each layer is difficult and does require tuning. A simpler approach is to pick a model with more layers and neurons than you need and then use early stopping to prevent it from overfitting. Alternatively, we can use grid search with cross-validation to find the right hyperparameters.
The decision tree is one of the most basic techniques used today for data classification. It uses a tree-like model of decisions that includes every potential outcome. A decision tree is made up of numerous nodes. The AI system applies a test at each node to move the query down to the next node, and this continues until the query reaches a terminal, or leaf, node. Each leaf node has a preassigned value, which is regarded as the tree's output. The tree can also be seen as a flowchart with the root node at the top and the leaf nodes at the bottom.
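The three variants differ only in how many examples feed each gradient step, which a single function with a `batch_size` parameter can show. This is a sketch under assumptions: a one-variable linear model y = w·x + b, a mean-squared-error cost, noiseless made-up data, and hand-picked values for the learning rate and epoch count.

```python
import random


def gradient_descent(data, batch_size, alpha=0.3, epochs=500, seed=0):
    """Fit y = w*x + b by minimizing mean squared error.

    batch_size == len(data) gives batch gradient descent, batch_size == 1 gives
    stochastic gradient descent, and anything in between gives mini-batch.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        shuffled = data[:]
        rng.shuffle(shuffled)
        for start in range(0, len(shuffled), batch_size):
            batch = shuffled[start:start + batch_size]
            # Average gradient of the squared error over the current batch.
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            w -= alpha * grad_w
            b -= alpha * grad_b
    return w, b


# Hypothetical noiseless data generated from y = 2x + 1.
data = [(i / 10, 2 * (i / 10) + 1) for i in range(10)]

batch_fit = gradient_descent(data, batch_size=len(data))
stochastic_fit = gradient_descent(data, batch_size=1)
mini_batch_fit = gradient_descent(data, batch_size=4)
print(batch_fit, stochastic_fit, mini_batch_fit)
```

On this tiny dataset all three settings recover roughly the same line; the practical differences described above (memory use, cost per step, hardware efficiency) only show up at scale.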
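The node-by-node traversal can be sketched with a small hand-built tree. The tree contents, feature names, and the `predict` helper are hypothetical; a real decision tree would be learned from data rather than written by hand.

```python
# An internal node is a dict with a yes/no test; a leaf is just its output value.
tree = {
    "feature": "outlook_sunny",
    "yes": {"feature": "humidity_high", "yes": "stay in", "no": "play"},
    "no": "play",
}


def predict(node, sample):
    """Apply the test at each node until a leaf is reached, then return its value."""
    while isinstance(node, dict):
        node = node["yes"] if sample[node["feature"]] else node["no"]
    return node


print(predict(tree, {"outlook_sunny": True, "humidity_high": True}))
```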
The following are some of the terms frequently associated with decision trees:
In the process of Text Normalizations, we undertake several steps to normalize the text to a lower level from multiple documents. The textual data from all the documents altogether is called a corpus. The steps involved in text normalization are –
Sentence tokenization is also known as sentence segmentation. In this step, a string of written language is divided into its component sentences. In many written languages, sentence boundaries can be detected at sentence-ending punctuation marks such as a period, exclamation mark, or question mark. For example, the text “This is amazing! I won the competition.” will be tokenized into two sentences: “This is amazing!” and “I won the competition.”
Word tokenization is also known as word segmentation. In this process, each sentence is further divided into its component words or tokens. In English and many other languages written in the Latin alphabet, a space is a good approximation of a word divider. For example, “interview questions artificial intelligence,” when tokenized, produces [‘interview’, ‘questions’, ‘artificial’, ‘intelligence’].
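Both tokenization steps can be approximated with regular expressions. This is a deliberately simple sketch: the patterns are assumptions that ignore abbreviations, decimal numbers, and other cases that production tokenizers handle.

```python
import re


def sentence_tokenize(text):
    """Split after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def word_tokenize(sentence):
    """Extract runs of letters, digits, and apostrophes as word tokens."""
    return re.findall(r"[A-Za-z0-9']+", sentence)


print(sentence_tokenize("This is amazing! I won the competition."))
print(word_tokenize("interview questions artificial intelligence"))
```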
Text Lemmatization and Stemming
For grammatical reasons, documents can contain various forms of a word, such as run, runs, and running. Moreover, there are often families of related words with similar meanings, such as nation, national, and nationality. Stemming and lemmatization are two specialized forms of normalization that reduce such variants to a common base form. However, they are different from each other.
Stemming refers to a crude heuristic process that chops off the ends of words in the hope of achieving this goal correctly most of the time. Stemming algorithms work by cutting off common prefixes or suffixes found in a word. This cutting does not always produce a successful result, i.e., a meaningful word is not always obtained after stemming. For example, a stemmer that simply strips the suffix -ed reduces collected to collect, but it reduces closed to clos, which is not a word.
Lemmatization refers to doing things properly, using a vocabulary and morphological analysis of words. It aims to remove inflectional endings only and to return the base or dictionary form of a word, known as the lemma. For example, the lemma of collected is collect, and the lemma of closed is close.
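The difference between the two approaches can be sketched with toy implementations. Both are illustrative assumptions: `crude_stem` is a naive suffix stripper, not a real stemming algorithm such as Porter's, and `LEMMA_TABLE` is a tiny hand-written stand-in for the vocabulary a real lemmatizer would consult.

```python
def crude_stem(word):
    """Naive stemmer: strip the first matching suffix, with no vocabulary check."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


# Hypothetical mini-vocabulary mapping inflected forms to dictionary forms.
LEMMA_TABLE = {
    "collected": "collect",
    "closed": "close",
    "running": "run",
    "runs": "run",
}


def lemmatize(word):
    """Look the word up in the vocabulary; fall back to the word itself."""
    return LEMMA_TABLE.get(word, word)


for word in ("collected", "closed", "running"):
    print(word, crude_stem(word), lemmatize(word))
```

The stemmer is fast and needs no dictionary but can return fragments such as "clos" or "runn", whereas the lemmatizer always returns a real word at the cost of needing vocabulary knowledge.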
Bag of Words:
Machine learning algorithms cannot work with raw text directly. The text is first converted into vectors of numbers, a process known as feature extraction. The bag of words is an extremely popular feature extraction model for working with text. It describes the occurrence of each word within a document. To use this model, we design a vocabulary of known words (also called tokens) and choose a measure of the presence of those known words. Any information about the order or structure of the words is discarded, which is why it is called a bag of words. The model records whether a known word occurs in a document, but not where in the document it occurs. The intuition is that similar documents have similar content, and that we can learn something about the meaning of a document from its content alone.
To implement a bag of words algorithm, we must follow the following steps:
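One plausible way to carry this out is sketched below, assuming lowercase whitespace tokenization and raw word counts as the measure of presence; the documents and helper names are hypothetical.

```python
from collections import Counter


def build_vocabulary(documents):
    """Collect the distinct known words across all documents, in sorted order."""
    return sorted({word for doc in documents for word in doc.lower().split()})


def vectorize(document, vocabulary):
    """Score a document by counting how often each vocabulary word occurs."""
    counts = Counter(document.lower().split())
    return [counts[word] for word in vocabulary]


docs = ["the cat sat", "the dog sat on the cat"]
vocab = build_vocabulary(docs)
vectors = [vectorize(doc, vocab) for doc in docs]

print(vocab)
print(vectors)
```

Each document becomes a fixed-length vector with one position per vocabulary word, and word order plays no part in the result.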
Term Frequency and Inverse Document Frequency (TFIDF)
Term Frequency and Inverse Document Frequency (TFIDF) is a statistical measure that evaluates the importance of a word to a document in a corpus. The TFIDF score increases in proportion to the number of times a word appears in the document, but it is offset by the number of documents in the corpus that contain the word.
The following formula is used to calculate a TF-IDF score for a given term x within document y.
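The formula itself is not reproduced here, so the sketch below assumes one widely used form: the score of term x in document y is tf(x, y) × log(N / df(x)), where tf is the frequency of x in y, df(x) is the number of documents containing x, and N is the number of documents. Variants with smoothing or different tf scaling are also common. The corpus is made up.

```python
import math


def tf_idf(term, document, corpus):
    """Score = tf(x, y) * log(N / df(x)), an unsmoothed TF-IDF variant."""
    words = document.lower().split()
    tf = words.count(term) / len(words)                      # frequency in this document
    df = sum(1 for doc in corpus if term in doc.lower().split())  # documents with the term
    if df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)


corpus = ["the cat sat", "the dog sat on the cat", "the bird flew"]

score_the = tf_idf("the", corpus[0], corpus)    # appears in every document
score_cat = tf_idf("cat", corpus[0], corpus)    # appears in two documents
score_bird = tf_idf("bird", corpus[2], corpus)  # appears in one document

print(score_the, score_cat, score_bird)
```

A word that occurs in every document scores zero, while a word confined to a single document scores highest, which matches the description of frequency being offset by how widespread the word is.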
A staple in AI Interview Questions, be prepared to answer this one. The most common and early activation functions include:
Identity function – It is a linear activation function where the output remains the same as the input.
Binary step function – It is most widely used in single-layer artificial networks to convert the net input to a binary output representing 1s and 0s. Here, the value θ represents the threshold. If the value of x is greater than the threshold, then the activation function outputs 1, else 0.
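The two functions described above can be written directly. The binary step follows the wording here (output 1 only when x is strictly greater than θ); note that some texts use "greater than or equal to" instead.

```python
def identity(x):
    """Linear activation: the output equals the input."""
    return x


def binary_step(x, theta):
    """Output 1 if the net input x exceeds the threshold theta, else 0."""
    return 1 if x > theta else 0


print(identity(0.7), binary_step(0.7, theta=0.5), binary_step(0.2, theta=0.5))
```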
Computer Vision is a domain of AI that gains information from digital images and videos. It helps to understand and automate tasks that the human visual system can do. The several applications of Computer Vision include:
Conditional probability is the likelihood of one event occurring given that another event has occurred. This idea can be extended to calculate the likelihood that a certain effect was caused by a particular cause and to adjust probabilities in the light of new knowledge. Bayes' theorem describes the process for updating this probability. The Bayesian network is based on Bayes' theorem and may be used to answer probabilistic queries in AI. Bayesian networks are probabilistic because they are constructed from a probability distribution and employ probability theory for anomaly detection and prediction. They are graphical representations of the probabilistic relationships among a set of variables. The graph is a directed acyclic graph, and each edge denotes a conditional dependency. Document classification, semantic search, spam filtering, image processing, etc., are some of the applications of a Bayesian network.
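The update step described by Bayes' theorem can be shown with a small worked example. The spam-filter scenario and all probabilities are hypothetical numbers chosen for illustration.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """P(cause | effect) = P(effect | cause) * P(cause) / P(effect)."""
    evidence = likelihood * prior + likelihood_given_not * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 20% of mail is spam; a given word appears in
# 60% of spam messages and in 5% of legitimate messages.
p_spam_given_word = posterior(prior=0.2, likelihood=0.6, likelihood_given_not=0.05)
print(p_spam_given_word)
```

Seeing the word raises the probability that a message is spam from the prior of 0.2 to a posterior of about 0.75.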
Overfitting is the term used to describe a model that performs well on training data but not well on test data. Overfitting occurs when the model is excessively complicated in comparison to the volume and granularity of the training data. The following options may be used to address the issue:
Underfitting normally occurs when your model is too basic to understand the fundamental structure of the data. The principal solutions to this issue are:
A data pipeline is a series of components used in data processing. The components usually operate asynchronously. Each component in the pipeline takes in a large quantity of data, processes it, and writes the result to another data store. Later, the next component in the pipeline reads this data and produces its own output, and so on. Each component is mostly self-contained; the data store is the only interface between them. This makes the system easy to understand, and several teams can concentrate on different components. Additionally, if a component malfunctions, the downstream components can often keep operating normally for a while by using its last output.
One of the most frequently posed artificial intelligence Interview Questions, be ready for it. A Convolutional Neural Network (CNN) is a popular deep neural network architecture used widely in image recognition applications. CNNs also work well with audio signals and text data. The three main types of layers in a CNN are the convolutional layer, the pooling layer, and the fully connected layer.
The central component of a CNN is the convolutional layer, where most of the computation takes place. It takes input data and a filter (kernel) and produces a feature map. Assume that the input is an RGB color image. The input then has three dimensions: height, width, and depth, where depth corresponds to the three RGB channels. Neurons in the first convolutional layer are connected only to the pixels in their receptive fields, not to every pixel in the input image. Likewise, each neuron in the second convolutional layer is connected only to the neurons located within its receptive field in the first layer. With this architecture, the network can focus on low-level features in the first hidden layer, put them together into higher-level features in the next hidden layer, and so on. The operation in which a filter slides across the input to compute these local responses is known as convolution.
The pooling layer performs dimensionality reduction, decreasing the number of values passed on to later layers. The pooling operation sweeps a filter across the entire input and populates the output array by applying an aggregation function (minimum, maximum, or average) to the values in the receptive field. Unlike a convolution filter, the pooling filter has no weights. The ReLU activation function is commonly applied to the output of the convolutional layers.
In the fully connected layer, each node in the output layer is directly connected to every node in the previous layer. This layer performs the classification using the features extracted by the preceding layers and their filters. Fully connected layers often use a SoftMax activation function to classify inputs, producing a probability between 0 and 1 for each class.
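As an illustration of how SoftMax turns the raw scores of the final layer into class probabilities, here is a minimal sketch. The input scores are hypothetical:

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)                          # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # hypothetical scores for three classes
```

Each output lies between 0 and 1, the outputs sum to 1, and the largest score receives the largest probability.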
Pooling is an operation that applies a filter to the feature map so that we can preserve the important features while tolerating distortions. It is one of the steps involved in convolutional neural networks. The idea behind pooling is to be able to recognize images that represent the same entity, say a cat, even when they are distorted. The cat may be sleeping or walking, only its face may be visible, or the image may be rotated. Pooling reduces the total information content, but it still preserves the most important features. By disregarding unimportant information, it also helps to prevent overfitting. Pooling is mainly categorized into three types: max pooling, min pooling, and average pooling.
The pooling operation is shown in the image. A 3x3 filter is applied to the 5x5 feature map. The result is a 3x3 feature map, which means that we have lost information compared to the original feature map. Because we have used max pooling, each time the filter is applied to a subset of the feature map, we pick the maximum value from that subset. The maximum number in a region represents the closest match to our feature. In the case of min pooling and average pooling, we take the minimum and average values from the filter window, respectively.
Max pooling selects the brighter pixels from the image. It is helpful when the image's background is dark and we are only interested in the lighter pixels. On the other hand, min pooling keeps the darker pixels, which results in a darker image than the original. Since average pooling smooths the image, sharp features may be difficult to see when this pooling method is used.
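The three pooling types can be sketched with a single helper that slides a window over the feature map with a stride of 1. This matches the 3x3 filter on a 5x5 map described above, which yields a 3x3 output. The feature map values and function names are hypothetical:

```python
def pool2d(feature_map, size, reduce_fn):
    """Slide a size x size window (stride 1) and aggregate each window with reduce_fn."""
    rows = len(feature_map) - size + 1
    cols = len(feature_map[0]) - size + 1
    output = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            out_row.append(reduce_fn(window))
        output.append(out_row)
    return output

def mean(values):
    return sum(values) / len(values)

# Hypothetical 5x5 feature map
feature_map = [
    [1, 3, 2, 0, 1],
    [4, 6, 5, 1, 0],
    [7, 2, 3, 3, 2],
    [0, 1, 4, 8, 5],
    [2, 3, 1, 6, 9],
]
max_pooled = pool2d(feature_map, 3, max)   # keeps the strongest response per window
min_pooled = pool2d(feature_map, 3, min)   # keeps the weakest response per window
avg_pooled = pool2d(feature_map, 3, mean)  # smooths each window
```

Real CNN libraries usually pool with a stride equal to the window size; a stride of 1 is used here only to reproduce the 5x5 to 3x3 reduction from the example.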
Fuzzy logic is a rule-based system that provides a nonlinear mapping of the input to the output. Fuzzy logic provides an inference structure that enables a mechanism for representing linguistic constructs such as ‘high,’ ‘low,’ ‘good,’ ‘bad,’ ‘average,’ etc. In fuzzy systems, values are indicated by a number called the truth value. It ranges from 0 to 1, where 0 represents absolute falseness, and 1 represents absolute truth. Fuzzy logic operates on the concept of a set function or membership function. Fuzzy systems are based on rules, and the number of rules increases exponentially with the dimension of the input space. For example, the statement “India is an ancient country” can be translated by a fuzzy system as India being a member of the set of ancient countries with a certain degree of membership.
The Markov decision process (MDP) is a mathematical framework used in reinforcement learning. An MDP resembles a Markov chain, except that at each step an agent can select one of several possible actions, and the transition probabilities depend on the action selected. For instance, in the Pacman video game, there are at most four actions available at any point. The objective is to get to the exit, but there are ghosts and rewards along the route. Every action the agent takes is associated with a set of probabilities. For example, the agent may have a 60% chance of receiving a reward if it moves up, compared to a 30% chance of being eaten by a ghost if it moves right. Sometimes only one action is available, and the agent has no choice but to take it. The Markov decision process offers a mathematical framework that aids in developing a plan of action that can provide the maximum rewards over time. At each point, the outcome is partly random and partly under the decision maker's control.
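A tiny MDP in the spirit of the example above can be solved with value iteration. This is only a sketch: the states, rewards, discount factor, and the remaining probabilities are hypothetical, and only the 60% and 30% figures come from the text:

```python
# transitions[state][action] = list of (probability, next_state, reward)
transitions = {
    "start": {
        "up":    [(0.6, "goal", 10.0), (0.4, "start", 0.0)],
        "right": [(0.3, "ghost", -10.0), (0.7, "goal", 10.0)],
    },
    "goal": {},   # terminal state
    "ghost": {},  # terminal state
}

def q_value(outcomes, values, gamma):
    """Expected discounted return of one action, given current state values."""
    return sum(p * (r + gamma * values[nxt]) for p, nxt, r in outcomes)

def value_iteration(transitions, gamma=0.9, sweeps=100):
    values = {state: 0.0 for state in transitions}
    for _ in range(sweeps):
        for state, actions in transitions.items():
            if actions:  # terminal states keep a value of 0
                values[state] = max(q_value(o, values, gamma)
                                    for o in actions.values())
    return values

def best_action(state, transitions, values, gamma=0.9):
    actions = transitions[state]
    return max(actions, key=lambda a: q_value(actions[a], values, gamma))

values = value_iteration(transitions)
choice = best_action("start", transitions, values)
```

With these assumed numbers, moving up is preferred: a failed attempt simply leaves the agent in place to try again, whereas moving right risks the ghost.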
Recurrent neural networks (RNNs) are distinct from feed-forward neural networks, in which activations only flow from the input layer to the output layer. A recurrent neural network has a similar architecture, but it also incorporates connections that point backward in the network, much like feedback loops. A recurrent neuron's output is a function of all the inputs from earlier time steps. The state carried over from these earlier time steps is kept in a memory cell. This overcomes the lack of memory in feed-forward networks such as CNNs. In use cases like analyzing consecutive video frames, the network must remember a sequence, which makes an RNN a natural choice. The most widely used recurrent network is the Long Short-Term Memory Network or LSTM. The applications of RNNs include predicting stock prices, autonomous driving systems, speech-to-text, sentiment analysis, etc.
An autoencoder is a feed-forward neural network that, in its simplest form, has three layers. Autoencoders are unsupervised machine learning models that learn a compressed representation of the input data and then recreate the input from it. Their defining characteristic is that the autoencoder's only job is to replicate the inputs, since the targets are the same values as the inputs. This implies that the input layer and output layer must each include the same number of neurons. These models are trained in the same way as supervised models, with the inputs serving as the targets, so no labels are required. Encoder and decoder are the two components of an autoencoder. The encoder performs the function of a compression unit by condensing the input data. The decoder reconstructs the input from the compressed representation.
Multiplying a number between 0 and 1 by another number in the same range gives a result smaller than either of the two. The backpropagation technique propagates the error gradient from the output layer to the input layer. Following the chain rule, backpropagation multiplies derivatives that often have values between 0 and 1. As layers are added, more of these factors are multiplied together, and the gradients usually become exceedingly small very rapidly. For the weights to be updated meaningfully, the gradient signal must be large enough. The farther a layer is from the output, the higher the likelihood that it will miss out on updates because the signal reaching it is too small. In artificial neural networks, this phenomenon is referred to as the vanishing gradient problem. Deep neural networks are affected by the vanishing gradient problem, which makes it difficult to train the lower layers. As a result, the network either stops learning altogether or learns extremely slowly.
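The effect of repeatedly multiplying small derivatives can be shown with the sigmoid function, whose derivative is never larger than 0.25. The sketch below is a simplification: it assumes every weight is 1 and every pre-activation is 0, so that only the activation derivatives are multiplied:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def gradient_scale(num_layers, z=0.0):
    """Product of one sigmoid derivative per layer (all weights assumed to be 1)."""
    return sigmoid_derivative(z) ** num_layers

scale_2 = gradient_scale(2)    # two layers
scale_10 = gradient_scale(10)  # ten layers
```

Even in this best case for the sigmoid, the gradient reaching a layer ten levels below the output is scaled by less than one millionth.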
In the vanishing gradient problem, the gradients grow extremely small, so the weights barely change and the network stops learning. The exploding gradient problem is the opposite: the gradients grow bigger and bigger. The weight updates become so large that the algorithm diverges. This phenomenon is most likely to happen in recurrent neural networks, which have memory and feedback loops.
A must-know for anyone heading into an AI interview, this question is frequently asked in AI Interview Questions. LSTM, or Long Short-Term Memory, is a complex, black-box model that belongs to the family of recurrent neural networks. LSTM has feedback connections, i.e., it can process entire sequences of data, not only single data points such as images. LSTM introduces memory units known as cell states. Gates in LSTMs control the addition and removal of information from the cell state. These gates may allow information to enter and exit the cell. Each gate consists of a sigmoid neural network layer and a pointwise multiplication operation. The network can interact with the memory cells only via the gates.
Random forest is a bagging ensemble model. In an ensemble model, the prediction is produced by combining two or more predictors. A group of decision tree classifiers, for instance, could be trained on various random subsets of the training set. We then gather the predictions from each individual tree and predict the class that receives the most votes. A Random Forest is a group of such decision trees. It is one of the most powerful classification algorithms. The random forest algorithm shares the hyperparameters of a decision tree. The algorithm can handle large datasets and does not require much data preparation or numeric data scaling.
The Hopfield network is a single-layer, fully interconnected recurrent network. It is auto-associative, which means it can create associations from its own content. The auto-associative memory allows the Hopfield network to recover an original stored vector. The inputs to the network can be binary or bipolar. The network has symmetric weights and no self-connections. After the updating process, the inputs reach the output layer. The output is then fed back to the input, acting as a feedback connection. The outputs are fed back to the inputs of other processing elements but not to themselves. This process continues until no further updates occur in the network responses. The connections are positive if there is no change in the output and the input processing elements. Hopfield networks are used in various optimization problems. Hopfield networks can further be classified as discrete and continuous Hopfield networks.
The Kohonen self-organizing feature map is an artificial neural network that follows an unsupervised learning approach. It maps a wide pattern space into a typical feature space. That means it converts patterns of arbitrary dimensionality into the responses of a one-dimensional or two-dimensional array of neurons. The map preserves the neighborhood relationships among the input patterns. In a one-dimensional array, the neurons are arranged in a single row, and each neuron is related to its neighbors in that row. In a two-dimensional array, the neurons are arranged on a grid.
In ensemble learning, the terms "hard voting classifier" and "soft voting classifier" describe two ways of combining a group of classifiers to produce a prediction. If we aggregate the predictions of each classifier and predict the class that receives the greatest number of votes, the result is referred to as a hard-voting classifier or majority-vote classifier. If all classifiers are able to estimate class probabilities, you can instead predict the class with the highest class probability, averaged over all the individual classifiers. This is termed the soft voting classifier. Because it gives more weight to highly confident votes, it frequently produces better results than hard voting.
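The difference between the two schemes shows up when one classifier is far more confident than the others. The sketch below uses hypothetical class probabilities from three classifiers:

```python
from collections import Counter

# Hypothetical [P(class 0), P(class 1)] estimates from three classifiers
probabilities = [
    [0.55, 0.45],
    [0.60, 0.40],
    [0.05, 0.95],
]

def hard_vote(probabilities):
    """Each classifier votes for its most likely class; the majority wins."""
    votes = [max(range(len(p)), key=p.__getitem__) for p in probabilities]
    return Counter(votes).most_common(1)[0][0]

def soft_vote(probabilities):
    """Average the class probabilities and pick the highest average."""
    n_classes = len(probabilities[0])
    averages = [sum(p[c] for p in probabilities) / len(probabilities)
                for c in range(n_classes)]
    return max(range(n_classes), key=averages.__getitem__)

hard_result = hard_vote(probabilities)
soft_result = soft_vote(probabilities)
```

Here two lukewarm votes for class 0 win under hard voting, while the single highly confident vote for class 1 tips the average under soft voting.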
A Reinforcement Learning agent's performance can be evaluated by adding up the rewards it receives. You can run numerous episodes and examine the average total rewards with acceptable standard deviation in a simulated setting.
Backpropagation is used to train artificial neural networks. The term describes the entire process of training an artificial neural network using multiple backpropagation steps, each of which computes the gradients and uses them to perform a gradient descent step. Reverse-mode autodiff, on the other hand, is only an efficient gradient computation method that is employed by backpropagation.
Consider you are working with a client who wants to implement sentiment analysis on the comments and review section of their portal. However, they have accumulated only a year’s data. How would you proceed in this case?
Whether or not a year’s data is sufficient can only be answered by looking at the volume of data, which is primarily driven by the number of users accessing the portal. This is an NLP problem. In such cases, it is usually better to start with pre-trained models and avoid modeling from scratch. There are a lot of open-source pre-trained models available such as BERT, MLNet, etc. We can test our samples using these pre-trained models. They also support incremental training or transfer learning, which lets us feed our own data to the model for training purposes. This is beneficial when we are working on a particular use case, because we can leverage existing models that have been trained on thousands or millions of data points.
Suppose you are training a random forest classifier and notice that your model is starting to overfit the training data. How can you prevent the model from overfitting?
Overfitting is possible with a random forest because tree-based models are free to keep growing until they fit the training data closely. The model can be controlled using hyperparameters. Certain random forest hyperparameters, such as the maximum tree depth and the minimum number of samples per leaf, need to be set so that the model is constrained from over-learning the patterns in the data.
Is there any possibility of combining five models where each reaches 75% precision after being trained on the exact same training data?
If you have trained five separate models and they have all achieved 75% precision, you can combine them into a voting ensemble, which frequently produces even better results. It works better if the models are quite different from one another, for example, if they use different learning algorithms. It is better still if they are trained on different training instances, which is the goal of bagging and pasting ensembles, but even if they are not, the ensemble still works as long as the models are dissimilar.
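A rough calculation shows why voting can help. The sketch below makes two strong assumptions that rarely hold in practice: each model is correct on 75% of its predictions, and the models make their errors independently of one another. Models trained on the same data tend to make correlated errors, so the real gain is usually smaller:

```python
from math import comb

def majority_vote_accuracy(n_models, p_correct):
    """Probability that a strict majority of independent models is correct."""
    needed = n_models // 2 + 1
    return sum(
        comb(n_models, k) * p_correct**k * (1 - p_correct) ** (n_models - k)
        for k in range(needed, n_models + 1)
    )

ensemble = majority_vote_accuracy(5, 0.75)
```

Under these idealized assumptions, the majority of five models is correct about 90% of the time, which is why dissimilar models are preferred.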
Should you increase or reduce the learning rate if your Gradient Boosting ensemble overfits the training set?
Reduce the learning rate if your Gradient Boosting ensemble overfits the training set. Additionally, early stopping could be used to determine the ideal number of predictors. Early stopping is a type of regularization used when training a learner using an iterative approach, such as gradient descent, to prevent overfitting.
If a Decision Tree is underfitting the training set, is it a good idea to try scaling the input features?
The decision tree is a tree-based algorithm that requires little to no data preparation. It is not a distance-based algorithm; therefore, scaling the input features has no impact on it. Underfitting might be caused by overly strict constraints on the hyperparameters of the model, too few data records, etc.
How long will it take to train a Decision Tree on a training set with 10 million cases if it takes 45 minutes to train on a training set with 1 million instances?
The time taken by a model to learn depends on the computational complexity of the training algorithm. The complexity of training a decision tree is O(n × m × log(m)), where n is the number of features and m is the number of instances. Therefore, if we multiply the number of records by ten, the training time is multiplied by K = 10 × log(10m) / log(m). With m = 1 million, K ≈ 11.7, so we can expect the new model to train in roughly 525 minutes (about 8.75 hours).
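The scaling factor implied by the O(n × m × log(m)) complexity can be checked with a few lines of Python. This sketch assumes the number of features stays the same and that the complexity formula holds exactly, so the result is only an estimate:

```python
import math

def training_time_factor(m_old, m_new):
    """Ratio of m * log(m) for the new and old instance counts (same features)."""
    return (m_new * math.log(m_new)) / (m_old * math.log(m_old))

factor = training_time_factor(1_000_000, 10_000_000)
estimated_minutes = 45 * factor
```

Going from 1 million to 10 million instances gives a factor of 10 × 7 / 6, or about 11.67, which turns 45 minutes into roughly 525 minutes.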
Why are encoder-decoder RNNs chosen for automatic translation over simple sequence-to-sequence RNNs?
If you translate a sentence word by word, the result will be awful. For example, the Spanish phrase "me quedo en la India" means "I stay in India," but translated word by word it reads something like "myself I stay in the India." It is much better to read the entire sentence before translating it. A plain sequence-to-sequence RNN begins translating a sentence as soon as the first word is read, while an encoder-decoder RNN first reads the entire sentence and then translates it.
Assume you wish to train a classifier but have just a little amount of labeled data and a lot of unlabeled training data. How can autoencoders be of assistance?
If you want to train a classifier but only have a small amount of labeled data, you could train a deep autoencoder on the entire dataset consisting of labeled and unlabeled records. Then reuse its lower half for the classifier and train the classifier using the labeled data. When training the classifier, you should freeze the reused layers if there is not much labeled data available.
One architecture for categorizing videos based on their visual content might be to take one frame per second, process each frame through a convolutional neural network, feed the CNN's output to a sequence-to-vector RNN, and then pass the RNN's output through a SoftMax layer to provide all the class probabilities. Cross entropy would be the cost function used for training. If you also want to use the audio for categorization, you could convert each second of audio to a spectrogram, feed the spectrogram to a CNN, and then feed the output of this CNN to the RNN.
Consider that the data about every purchase made by a customer at a grocery store is available. Tell me how you would create an algorithm to group the customers into clusters. How would you decide how many clusters are appropriate to include?
This is an unsupervised learning problem since we do not have any labels associated with the data, so we need a clustering model. We can group the customers based on their similarities, such as age group, shopping behavior, etc. A clustering model tries to identify patterns in the data. Data clustering methods look for clusters in the data, and there are many ways to adjust and modify these algorithms. The idea behind clustering is that the objects in a cluster must be related to one another, so related elements are placed in the same group. While there are many clustering algorithms, the K-means clustering algorithm is a good place to start. To decide the appropriate number of clusters, we can use silhouette analysis or the elbow method. As an initial rule of thumb, we can use the square root of half the number of data points as the number of clusters.
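The rule of thumb mentioned above, k ≈ √(n / 2), is easy to compute. It gives only a starting point that should then be refined with the elbow method or silhouette analysis. The function name is an illustrative choice:

```python
import math

def rule_of_thumb_k(n_points):
    """Starting guess for the number of clusters: sqrt(n / 2), at least 1."""
    return max(1, round(math.sqrt(n_points / 2)))

k_small = rule_of_thumb_k(200)    # e.g. 200 customers
k_large = rule_of_thumb_k(5000)   # e.g. 5000 customers
```

For 200 customers this suggests starting with 10 clusters, and for 5000 customers with 50.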
A CNN contains far fewer parameters than a fully connected DNN, which speeds up training, lowers the chance of overfitting, and requires much less training data. This is because successive layers are only partially connected and because the network heavily reuses its weights. For image processing tasks like classification, CNNs can generalize significantly better than fully connected DNNs while using less training data. This is because CNNs use kernels.
Let us say you train a DNN on a TensorFlow cluster for days, but you forget to use a Saver to store the model before your training program is finished. Is your trained model lost?
When using distributed TensorFlow, the variable values are stored in containers managed by the cluster, so the model parameters continue to exist on the cluster even after the session and the client program are closed. You only need to open a new session to the cluster and save the model.
Yes, dropout does slow down training, in general by roughly a factor of two. However, since it is only active during training, it has no effect on inference.
How many neurons are required in the output layer to separate spam from legitimate email? What activation method ought to be applied to the output layer?
We only need one neuron in the output layer of a neural network to categorize email as spam or legitimate, for example, by signaling the likelihood that the email is spam. When estimating a probability, you would normally use the logistic (sigmoid) activation function at the output layer. It gives the probability of an event occurring.
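A single output neuron with a logistic activation can be sketched as follows. The 0.5 decision threshold and the function names are illustrative assumptions:

```python
import math

def logistic(z):
    """Logistic (sigmoid) activation: maps any real score to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(z, threshold=0.5):
    """Label an email from the output neuron's raw score z."""
    return "spam" if logistic(z) > threshold else "legitimate"
```

A raw score of 0 corresponds to a spam probability of exactly 0.5; positive scores push the estimate toward spam and negative scores toward legitimate.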
If an autoencoder perfectly reconstructs the inputs, is it necessarily a good autoencoder? How can you evaluate the performance of an autoencoder?
An autoencoder's ability to perfectly reconstruct its inputs does not necessarily indicate that it is a good autoencoder; it could simply be an overcomplete autoencoder that has learned to copy its inputs to the coding layer and then to the outputs. Conversely, if an autoencoder produces very poor reconstructions, it is almost certainly a bad autoencoder. Measuring the reconstruction loss using a metric such as MSE is one method for evaluating an autoencoder's performance.
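The reconstruction loss mentioned above can be computed as the mean squared error between an input vector and its reconstruction. A minimal sketch with hypothetical values:

```python
def mse(inputs, reconstructions):
    """Mean squared error between an input vector and its reconstruction."""
    return sum((x - r) ** 2 for x, r in zip(inputs, reconstructions)) / len(inputs)

perfect = mse([1.0, 2.0], [1.0, 2.0])   # identical reconstruction
lossy = mse([1.0, 2.0], [1.0, 4.0])     # one value is off by 2
```

A low reconstruction loss on held-out data is a useful signal, but as noted above it should be read together with the size of the coding layer, since an overcomplete autoencoder can reach zero loss by copying.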