Some ways to practice your data science skills are given below:
Beginner Level:
- Iris Data Set:
The Iris data set is widely regarded as one of the easiest data sets to work with, which makes it a good starting point for a beginner. It consists of just 4 feature columns and 150 rows.
Practice Problem: Predict the class of a flower on the basis of its measurements.
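One way to approach this practice problem is sketched below. It assumes scikit-learn is installed and uses the copy of Iris that ships with that library; the choice of a k-nearest-neighbours model and the 70/30 split are illustrative assumptions, not something the exercise prescribes.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the copy of Iris bundled with scikit-learn (no download needed).
iris = load_iris()
X, y = iris.data, iris.target  # 150 rows, 4 numeric feature columns

# Hold out 30% of the flowers, keeping class proportions equal in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# k-nearest neighbours is an arbitrary but reasonable first model (assumption).
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# Fraction of held-out flowers whose class was predicted correctly.
accuracy = model.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.2f}")
```

Swapping in another classifier only requires changing the `model` line, which makes this a convenient template for comparing algorithms.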
- Loan Prediction Data Set:
The Loan Prediction data set introduces the learner to concepts from the banking and insurance domain: the challenges faced, the variables that influence outcomes, and so on. It consists of 13 columns and 615 rows and is a classification problem.
Practice Problem: Predict whether a given loan will be approved by a bank based in Dallas.
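Loan data typically mixes numeric and categorical columns and contains missing values, so a preprocessing pipeline is a natural first step. The sketch below assumes pandas and scikit-learn are available. The tiny table and its column names (`income`, `loan_amount`, `education`, `approved`) are hypothetical stand-ins, not the real data set's schema.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy records; the real data set has its own columns and values.
loans = pd.DataFrame({
    "income": [5000, 3000, np.nan, 8000, 2500, 6000, 1500, 7000],
    "loan_amount": [120, 200, 150, 100, 220, 130, 250, 110],
    "education": ["grad", "non_grad", "grad", "grad",
                  "non_grad", np.nan, "non_grad", "grad"],
    "approved": [1, 0, 1, 1, 0, 1, 0, 1],
})
X = loans.drop(columns="approved")
y = loans["approved"]

# Numeric columns: fill gaps with the median, then standardise.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
# Categorical columns: fill gaps with the most common value, then one-hot encode.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])
preprocess = ColumnTransformer([
    ("num", numeric_steps, ["income", "loan_amount"]),
    ("cat", categorical_steps, ["education"]),
])

# Logistic regression is an assumed baseline, not a recommendation.
clf = Pipeline([("prep", preprocess), ("model", LogisticRegression())])
clf.fit(X, y)
predictions = clf.predict(X)  # 1 = approved, 0 = rejected
```

Keeping imputation and encoding inside the pipeline means the same transformations are applied consistently at training and prediction time.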
- Big Mart Sales Data Set:
Operations such as product bundling, offer customization, and inventory management are handled efficiently with the help of data science and business analytics. The Big Mart Sales data set is used for regression problems and consists of 12 variables and 8,523 rows.
Practice Problem: Predict the sales of a retail store in Dallas, Texas.
Intermediate Level:
- Black Friday Data Set:
The Black Friday data set contains sales transactions from a retail store. It is a good data set for exploring and expanding your feature engineering skills. It has 12 columns and 550,069 rows and is a regression problem.
Practice Problem: Predict the total purchase amount made on a given day in Dallas, Texas.
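A common way to build extra features from transaction data like this is to aggregate per customer and per product. The sketch below assumes pandas is installed; the table and its column names (`user_id`, `product_id`, `purchase`) are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical transactions; column names are illustrative only.
sales = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 2, 3],
    "product_id": ["A", "B", "A", "C", "C", "B"],
    "purchase": [100.0, 250.0, 120.0, 80.0, 90.0, 300.0],
})

# Per-user aggregates, broadcast back onto every transaction row.
by_user = sales.groupby("user_id")["purchase"]
sales["user_mean_purchase"] = by_user.transform("mean")
sales["user_txn_count"] = by_user.transform("count")

# How often each product appears across all transactions.
product_counts = sales["product_id"].value_counts()
sales["product_popularity"] = sales["product_id"].map(product_counts)

print(sales)
```

Because `user_mean_purchase` is derived from the target column, in a real experiment it should be computed on the training split only and then mapped onto validation rows, otherwise the model sees leaked information.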
- Human Activity Recognition Data Set:
The Human Activity Recognition data set was built from smartphone recordings of 30 human subjects. It consists of 561 columns and 10,299 rows.
Practice Problem: Predict the human activity category.
- Text Mining Data Set:
This data set consists of aviation safety reports that describe problems encountered on particular flights. It has 21,519 rows and 30,438 columns, and is a high-dimensional, multi-class classification problem.
Practice Problem: Classify the documents on the basis of their labels.
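Text classification usually starts by turning documents into a high-dimensional sparse matrix, which is why data sets of this kind have so many columns. The sketch below assumes scikit-learn is installed and that each document carries a single label. The six miniature reports and the labels `engine` and `runway` are made up for illustration and do not come from the data set.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up miniature reports and labels, for illustration only.
reports = [
    "engine failure shortly after takeoff",
    "engine oil pressure warning during climb",
    "runway incursion by ground vehicle",
    "aircraft crossed runway without clearance",
    "engine fire indication on approach",
    "vehicle entered active runway",
]
labels = ["engine", "engine", "runway", "runway", "engine", "runway"]

# TF-IDF turns each report into a sparse vector with one column per term;
# a linear model is then fitted on top of those vectors.
text_clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
text_clf.fit(reports, labels)

# Shape of the document-term matrix: (number of reports, vocabulary size).
vectorizer = text_clf.named_steps["tfidfvectorizer"]
n_docs, n_terms = vectorizer.transform(reports).shape
print(f"{n_docs} documents x {n_terms} term columns")

prediction = text_clf.predict(["engine vibration reported in cruise"])[0]
print(prediction)
```

On a real corpus the vocabulary quickly grows to tens of thousands of terms, so linear models that work directly on sparse input are a common baseline.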
Advanced Level:
- Urban Sound Classification:
The Urban Sound Classification data set is meant for applying machine learning concepts to real-world problems through audio processing. It consists of 8,732 clips of urban sounds that can be categorized into 10 classes.
Practice Problem: Classify the type of sound present in a given audio clip.
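Audio classifiers rarely work on raw waveforms; a typical first step is to extract frequency-domain features. The sketch below computes one such feature, the spectral centroid, using only NumPy. The two synthetic sine tones and the 8,000 Hz sample rate are assumptions standing in for real clips, and `spectral_centroid` is a helper written for this example, not part of any library.

```python
import numpy as np

SAMPLE_RATE = 8000  # samples per second, assumed for this sketch


def spectral_centroid(signal, sample_rate=SAMPLE_RATE):
    """Return the magnitude-weighted mean frequency of a 1-D signal, in Hz."""
    magnitudes = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return float(np.sum(freqs * magnitudes) / np.sum(magnitudes))


# One second of two synthetic tones, standing in for real recordings.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
low_tone = np.sin(2 * np.pi * 200 * t)    # 200 Hz hum
high_tone = np.sin(2 * np.pi * 2000 * t)  # 2000 Hz whistle

low_centroid = spectral_centroid(low_tone)
high_centroid = spectral_centroid(high_tone)
print(f"low: {low_centroid:.0f} Hz, high: {high_centroid:.0f} Hz")
```

In practice several such features (or a full spectrogram) would be computed per clip and fed to a classifier; a single centroid is only enough to separate very different sounds.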
- Identify the Digits Data Set:
This data set comprises 7,000 images of 28x28 pixels each, totaling 31 MB. It allows the developer to study, analyze, and recognize the elements present in an image.
Practice Problem: Identify the digits present in a given image.
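The general workflow for digit recognition can be sketched with the small digits set bundled with scikit-learn. Note the assumption: that bundled set contains 8x8 images, so it is only a stand-in for the 28x28 images described above, and the support vector classifier is one arbitrary choice of model.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in data: scikit-learn's bundled digits (1,797 images, 8x8 pixels).
digits = load_digits()

# Flatten each image into one row of 64 pixel intensities scaled to [0, 1].
X = digits.images.reshape(len(digits.images), -1) / 16.0
y = digits.target

# Hold out a quarter of the images for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# A support vector classifier with an RBF kernel (assumed baseline).
clf = SVC(gamma="scale")
clf.fit(X_train, y_train)

# Fraction of held-out images whose digit was identified correctly.
accuracy = clf.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.2f}")
```

The same flatten, split, fit, and score steps carry over to 28x28 images; only the number of pixel columns changes, from 64 to 784.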
- Vox Celebrity Data Set:
The Vox Celebrity data set is for large-scale speaker identification and speech recognition. It is a collection of speech by celebrities, extracted from YouTube videos. The data set consists of around 100,000 utterances spoken by 1,251 celebrities from around the world.
Practice Problem: Identify the celebrity that a given voice belongs to.