Machine Learning Terminologies

Updated on Aug 21, 2025
 
  • Batch size: The number of examples in a batch, that is, the group of examples processed together in one training iteration.
  • Categorical data: Features that take one value from a discrete set of possible values (categories), such as a colour or a country name.
  • Data analysis: The process of gaining a better understanding of the data by sampling, measuring, and visualizing it.
  • Discrete feature: A feature that can take only a finite set of possible values.
  • Epoch: One full training pass over the entire dataset, so that each example has been seen once. An epoch consists of (total number of examples in the dataset / batch size) training iterations.
  • Feature: An input variable, such as a column of the dataset, that is used to make predictions.
  • Feature engineering: The process of determining which features would be useful for training the model, and thereby for making better predictions. Once the relevant inputs are identified, the raw data is converted into those features.
  • Feature extraction: Often used as a synonym for feature engineering. It can also refer to extracting intermediate feature representations that are passed as inputs to other machine learning models.
  • Feature set: The set of features on which the machine learning model is trained.
  • Feature vector: The list of feature values that represents an example passed as input to the machine learning model.
  • Fine tuning: The process of further adjusting the parameters of an already trained model so that it performs better, often on a more specific task.
  • Ground truth: The correct value or answer for the task or problem at hand.
  • Heuristics: A quick, practical solution to the problem at hand. It might not be optimal or well planned, but it can serve as a quick fix.
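  • Hyperparameter: A setting, such as the batch size or the number of epochs, that is chosen before training and tuned between training runs, rather than learned by the model during training.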
  • Instance: A synonym for ‘example’. It refers to a single row of a dataset.
  • Label: Used in the context of ‘supervised learning’, since models that implement supervised algorithms learn from labelled examples. It refers to the answer or target value attached to an example, which the model learns to predict from the features.
  • Labelled example: An example in the dataset that contains both features and a label.
  • Loss: A measure of how good or bad the model’s predictions are. It is computed by a ‘loss’ function that tells how far the predicted values are from the actual ones.
  • Loss curve: A plot of the loss against training iterations. It helps in determining how well the model fits the data, and whether the model is overfitting, underfitting, or converging.
  • Model: The representation of what the machine learning system has learnt, that is, the patterns it has picked up from the input data supplied to it.
  • Model capacity: How complex a problem the machine learning system can learn. The more complicated the problems a model can learn, the higher its capacity. Capacity typically increases with the number of parameters of the model.
  • Model training: The process of determining the best model, that is, the best values of its parameters, from the training data so that it makes good predictions.
  • Model function: A function that implements the processes involved in machine learning, namely training, evaluation, and inference.
  • Noise: Anything in a dataset that obscures the underlying signal and hinders effective training. It can arise because a device mis-measured something, or because a human made a mistake while recording values or labelling the data.
  • Normalization: The process of converting an actual range of values into a standard range, typically -1 to +1 or 0 to 1. Data can be normalized with simple subtraction and division.
  • Numerical data: Features in a dataset that are represented as integers or real-valued numbers. They are also known as continuous features. When a feature is represented as a numerical value, it can be inferred that the values have a mathematical relationship with each other.
  • Objective: The goal that the machine learning algorithm is trying to achieve. It can also be defined as the metric that the algorithm tries to optimize in order to yield better predictions.
  • Outliers: Values that lie far from most of the other values in the same dataset.
  • Parameter: A variable of the model whose value the machine learning system learns during training.
  • Pipeline: The infrastructure surrounding the machine learning algorithm. This includes gathering the data, placing it into training data files, training one or more models, and exporting the trained models to a production environment.
  • Prediction: The output of a trained model when it is given an input example.
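
A few of the terms above involve simple arithmetic, which a short Python sketch can make concrete. The function names here are hypothetical and chosen only for illustration. Min-max scaling is just one way to normalize to the 0 to 1 range, and mean squared error is just one possible loss function; neither is prescribed by this glossary.

```python
import math


def iterations_per_epoch(num_examples: int, batch_size: int) -> int:
    """Number of batches needed to see every example once (one epoch)."""
    # Rounds up because the final batch may hold fewer examples.
    return math.ceil(num_examples / batch_size)


def min_max_normalize(values: list[float]) -> list[float]:
    """Rescale values to the range 0 to 1 using subtraction and division."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has no spread to rescale; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def mean_squared_error(predictions: list[float], labels: list[float]) -> float:
    """Average squared distance between predictions and the ground truth."""
    return sum((p - y) ** 2 for p, y in zip(predictions, labels)) / len(labels)


if __name__ == "__main__":
    print(iterations_per_epoch(1000, 32))              # 32
    print(min_max_normalize([10.0, 20.0, 30.0]))       # [0.0, 0.5, 1.0]
    print(mean_squared_error([2.0, 4.0], [1.0, 2.0]))  # 2.5
```

With 1000 examples and a batch size of 32, one epoch takes 32 iterations, the last of which processes a partial batch of 8 examples.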

Conclusion

In this post, we looked at the various machine learning terminologies used in the world of ML.
