Data Science seems to be the buzzword of the decade. The internet is talking about it, companies want it, and even your friends and family have heard how data science is changing our world. For example, notice how Google's news feed adapts to your activity and how YouTube's recommendations seem to know exactly which videos you will enjoy.
Let us now turn to budding data science professionals, and to whether a complete data science bootcamp can help answer the question in the title. Irrespective of your position, industry, and level of technical knowledge, you probably want to understand where you fit into all of this. Can you leverage this “new oil” everyone talks about? The title asks, “Is Data Science for All?”, but I would like to focus on “Is Data Science for You?” Let us get to it.
Is Data Science for You?
- The short answer:
- The long answer:
Data Science has been around for a while, and many people and companies have called it the sexiest job around for over a decade. So if you are willing to put in the time and effort, Data Science can be for you, no matter what your nature, area of interest, expertise, or domain.
The Intuition of Data in Data Science
Let me give you some context. Suppose your grandparents learn how to use the internet. They will then enjoy a whole set of new privileges in life, like video calls with family, online content, group conversations, and digital photos.
On the other hand, if they do not know how to use the internet, they are digitally disabled, missing a big part of the new world.
If you do not know how to leverage Data in this Digital Age, you become digitally disabled
Whether you are a student, professional or entrepreneur, let me describe a situation you might have faced in your day-to-day work. If I were to give you a dataset or an Excel file with one million rows (ten lakh rows), you could open it in Excel and perform some sort of analysis.
The situation changes if I give you ten million rows: you would not be able to open the file in Microsoft Excel, because an Excel worksheet is limited to 1,048,576 rows (just over one million). Suddenly, you cannot do much. Learning how to work with data, perform analysis and visualize data can help you make better decisions, which is why even people who have completed their MBA (Master of Business Administration) are turning to Data Science.
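Once a file outgrows Excel's row limit, the usual move is to process it in code. Below is a minimal sketch, assuming a CSV with hypothetical `product` and `units` columns: it streams the file row by row with Python's built-in `csv` module, so only running totals are held in memory whether the file has one million rows or ten million.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def total_units_by_product(csv_file: TextIO) -> Counter:
    """Sum the (hypothetical) 'units' column per 'product', one row at a time.

    Only the running totals are kept in memory, so the file is not limited
    by a spreadsheet's row cap.
    """
    totals: Counter = Counter()
    for row in csv.DictReader(csv_file):
        totals[row["product"]] += int(row["units"])
    return totals


# Small in-memory stand-in for a much larger file on disk.
sample = io.StringIO(
    "product,units\n"
    "oil,900\n"
    "umbrella,40\n"
    "oil,950\n"
)
print(total_units_by_product(sample))
```

For a real file you would pass something like `open("sales.csv", newline="")` (a hypothetical path). If you use pandas, `pandas.read_csv(..., chunksize=...)` gives a similar chunk-by-chunk approach.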
The most distinctive thing about Data Science is that whether you are a fresher or an experienced professional in any field, you can become a Data Scientist.
There are two main reasons anyone can enter Data Science with a little effort:
- Data Science is an industry-agnostic field, which means that irrespective of the domain you are in, you can add a “data-edge” to it, and you will also be able to leverage the knowledge and experience you already have.
- Data Science is practical. Most of the problems being solved in companies across industries are understandable and doable.
Whatever your nature or experience is, there is a Data Science course in India available for you to enroll in. How can I be so sure? Let me show you precisely what I mean by walking through the roles under Data Science.
The Roles that Come Under Data Science
I could simply list the roles and tell you what each one does. Instead, let us use a concrete example to understand how Data Science works in the real world. Follow along closely; we will also briefly discuss the tools and technologies needed!
Let us say we are a retail giant, KnowledgeHut Retail, looking to optimise the products in our stores across India. The business stakeholders therefore need to understand our best-selling and worst-selling products. The steps we would take, which also show what Data Science is all about, are as follows:
- Orchestrate Data: Get data from our various stores. The data can be in Excel files, a local database, or the cloud.
- Clean & Standardise Data: Bring the data into the same format, then clean and organise it to create one extensive master dataset.
- Modelling & Analytics: On this master dataset, we analyse the top- and bottom-performing products so that we can optimise our range. This can be done in three ways:
1. Heuristic model
A heuristic model is a fancy way of saying that business rules are used to decide whether products are performing well or poorly. For instance, we can say that if a particular oil brand sells over 1,700 packets in a month, it is a good product. Writing these business rules as if-else statements in code gives you a heuristic model.
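The oil-brand rule above can be sketched as a plain if-else in Python. This is only an illustration: the 1,700-packet threshold comes from the example, while the product names and sales figures are hypothetical.

```python
from typing import Dict

# Business rule from the example: more than 1,700 packets a month = good product.
GOOD_PRODUCT_THRESHOLD = 1700


def classify_by_rule(monthly_units: int, threshold: int = GOOD_PRODUCT_THRESHOLD) -> str:
    """Heuristic model: a hand-written if-else business rule, with no learning involved."""
    if monthly_units > threshold:
        return "good"
    else:
        return "poor"


# Hypothetical monthly sales for two oil brands.
monthly_sales: Dict[str, int] = {"oil_brand_a": 1850, "oil_brand_b": 1200}
for product, units in monthly_sales.items():
    print(product, classify_by_rule(units))
```

The appeal of this approach is transparency: anyone in the business can read and change the rule. The drawback is that the rule knows nothing about context such as seasonality.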
2. Statistical model
Let us say we are selling umbrellas. Umbrella sales are low throughout the year, except during the monsoon season. Does this mean that umbrellas are poorly performing products? No. If we were to look at the sales distribution over five years, we would see that it peaks during the monsoon season. So seasonality can be inferred by looking at the statistical distribution of sales.
So, in this case, umbrellas would not be a poor product but rather a seasonal product, based on what we can see from the statistical distribution over the years.
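Here is a minimal sketch of that statistical view in Python. It assumes the monsoon months are June to September, uses five years of made-up umbrella sales, and applies an arbitrary 60% cut-off; it averages sales per calendar month and measures what share of all units falls in the monsoon months.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Assumption: monsoon months are June to September (months 6-9).
MONSOON_MONTHS = {6, 7, 8, 9}


def average_by_month(sales: List[Tuple[int, int, int]]) -> Dict[int, float]:
    """Average units per calendar month from (year, month, units) records."""
    by_month: Dict[int, List[int]] = {}
    for _year, month, units in sales:
        by_month.setdefault(month, []).append(units)
    return {month: mean(values) for month, values in sorted(by_month.items())}


def monsoon_share(sales: List[Tuple[int, int, int]]) -> float:
    """Fraction of all units sold during the monsoon months."""
    total = sum(units for _, _, units in sales)
    monsoon = sum(units for _, month, units in sales if month in MONSOON_MONTHS)
    return monsoon / total if total else 0.0


# Hypothetical five years of umbrella sales: low all year, spiking in the monsoon.
umbrella_sales = [
    (year, month, 300 if month in MONSOON_MONTHS else 20)
    for year in range(2017, 2022)
    for month in range(1, 13)
]

profile = average_by_month(umbrella_sales)
share = monsoon_share(umbrella_sales)
peak_month = max(profile, key=profile.get)
# The monsoon is 4 of 12 months (about 33% of the year); a much larger share of
# sales points to a seasonal product rather than a poor one. 0.6 is arbitrary.
label = "seasonal" if share > 0.6 else "not clearly seasonal"
print(f"peak month: {peak_month}, monsoon share: {share:.0%}, label: {label}")
```

In practice you would look at real multi-year data and possibly a formal seasonal decomposition, but even this simple profile is enough to stop the heuristic rule from wrongly flagging umbrellas as poor performers.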
3. AI/Machine Learning model
Think of a Machine Learning model as a black box. The input is the same as above: an extensive master dataset. The output is the list of good and bad performing products. However, instead of a manual heuristic approach or plain statistics, we let the model learn from historical data. It learns the patterns in that data and tells us which products are performing well and which are not.
Great job: you now understand how an end-to-end Data Science process flows. Let us summarise what you have understood in the form of an image and briefly discuss the prominent roles that exist in the industry.
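To make "let the model learn from historical data" concrete, here is a small dependency-free sketch. In real projects you would typically reach for a library such as scikit-learn; here a simple nearest-centroid classifier is swapped in as a stand-in for the black box. The two features (average monthly units and a repeat-purchase rate) and all the data are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple

Features = Tuple[float, float]  # (avg monthly units, repeat-purchase rate), hypothetical


class NearestCentroid:
    """Tiny supervised model: learns one average point (centroid) per label from
    historical examples, then labels a new product by its closest centroid."""

    def fit(self, X: Sequence[Features], y: Sequence[str]) -> "NearestCentroid":
        columns = list(zip(*X))
        # Standardise each feature so large numbers (units) do not dominate.
        self.means = [mean(col) for col in columns]
        self.stds = [pstdev(col) or 1.0 for col in columns]
        grouped: Dict[str, List[List[float]]] = {}
        for x, label in zip(X, y):
            grouped.setdefault(label, []).append(self._scale(x))
        self.centroids = {
            label: [mean(col) for col in zip(*rows)] for label, rows in grouped.items()
        }
        return self

    def _scale(self, x: Features) -> List[float]:
        return [(v - m) / s for v, m, s in zip(x, self.means, self.stds)]

    def predict(self, x: Features) -> str:
        scaled = self._scale(x)
        # Pick the label whose centroid has the smallest squared distance.
        return min(
            self.centroids,
            key=lambda label: sum((a - b) ** 2 for a, b in zip(scaled, self.centroids[label])),
        )


# Hypothetical historical products already labelled by the business.
X = [(2100, 0.45), (1900, 0.40), (2400, 0.50), (300, 0.10), (500, 0.15), (250, 0.05)]
y = ["good", "good", "good", "poor", "poor", "poor"]

model = NearestCentroid().fit(X, y)
print(model.predict((1800, 0.35)))
print(model.predict((400, 0.12)))
```

Unlike the heuristic rule, nobody wrote a threshold here: the boundary between "good" and "poor" comes from the labelled history, which is the essential difference the paragraph above describes.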
The roles that exist in the field of Data Science are traditionally distributed as follows:
1. Data Engineer
Just as water flows through a pipeline, data flows in from multiple sources to the master dataset. Data Engineers build the data pipelines that collect data from many sources and bring it to a common location. Along the way, the data needs to be cleaned, standardised, and brought to a standard format.
Platforms to be familiar with (any one): Google Cloud Platform, Microsoft Azure, Amazon Web Services
Languages & skillsets: SQL, understanding of databases, and Python (fundamentals)
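The orchestrate-and-standardise steps from the retail example can be sketched in a few lines of Python and SQL. This is a toy illustration, not a production pipeline: it assumes two hypothetical store extracts with different column names, maps them to one common schema, loads them into an in-memory SQLite master table (via the standard-library `sqlite3` module), and runs a SQL query over the result.

```python
import sqlite3
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, str, int]  # (store, product, units)

# Hypothetical raw extracts from two stores that use different column names.
store_a = [{"Product": "Sunflower Oil ", "Qty": "1850"}, {"Product": "umbrella", "Qty": "40"}]
store_b = [{"item_name": "SUNFLOWER OIL", "units_sold": 920}, {"item_name": "Umbrella", "units_sold": 35}]


def standardise(store: str, records: Iterable[Dict], product_key: str, units_key: str) -> List[Row]:
    """Map one source's columns to the common schema and clean the values."""
    return [
        (store, str(r[product_key]).strip().lower(), int(r[units_key]))
        for r in records
    ]


def build_master(rows: List[Row]) -> sqlite3.Connection:
    """Load the standardised rows into a single master table (in memory here)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (store TEXT, product TEXT, units INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return conn


master = build_master(
    standardise("store_a", store_a, "Product", "Qty")
    + standardise("store_b", store_b, "item_name", "units_sold")
)
query = "SELECT product, SUM(units) AS total FROM sales GROUP BY product ORDER BY total DESC"
for product, total in master.execute(query):
    print(product, total)
```

Without the cleaning step, "Sunflower Oil " and "SUNFLOWER OIL" would be counted as two different products; on a cloud platform the same idea is implemented with that platform's managed pipeline and warehouse services.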
2. Machine Learning Engineer
As discussed, there are three primary kinds of models one can use: heuristic, statistical, and machine learning models. A machine learning engineer needs to work with the business team to understand the requirements and with the data engineering team to get the relevant data. A knack for practical problem-solving helps an individual fit into this role.
Languages: Python, SQL, MLOps fundamentals
Other skillsets: Machine Learning, Big Data, Business & domain understanding
3. Business Analyst
Once the model is built and we have the output, we present it to the business so stakeholders can view the information visually and reach a decision that benefits the organisation. This is typically done with dashboards and presentations that quantify the business impact.
Visualization & Dashboarding tools (one or two): Tableau, Power BI, Qlik Sense
Tools & skillsets: Business understanding, stakeholder management, Microsoft Excel, project management, strong communication skills, presentation skills
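The visuals themselves would live in a tool like Tableau or Power BI, but the numbers behind them are often a simple ranked summary. Here is a minimal Python sketch of such a top-and-bottom table, assuming hypothetical product names and sales figures; it reports each product's share of total sales, one way of quantifying business impact.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float, float]]  # (product, sales, share of total)


def top_and_bottom(sales: Dict[str, float], n: int = 2) -> Tuple[Ranked, Ranked]:
    """Return the n best and n worst sellers with their share of total sales."""
    total = sum(sales.values())
    ranked = sorted(sales.items(), key=lambda item: item[1], reverse=True)
    with_share = [(name, value, value / total) for name, value in ranked]
    return with_share[:n], with_share[-n:]


# Hypothetical sales figures (units are illustrative only).
sales = {"sunflower oil": 520.0, "rice 5kg": 410.0, "umbrella": 95.0, "tea": 300.0, "biscuits": 60.0}
top, bottom = top_and_bottom(sales)

for title, rows in (("Top sellers", top), ("Bottom sellers", bottom)):
    print(title)
    for name, value, share in rows:
        print(f"  {name:<15}{value:>8.1f}{share:>8.1%}")
```

A good analyst adds context before presenting such a table: a product near the bottom may still be seasonal, as the umbrella example showed.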
4. Data Scientist
The most sought-after role in the market is that of a full-stack Data Scientist. To be one, you must understand everything mentioned above, end to end. In other words, a Data Scientist needs the skills of a Data Engineer, Machine Learning Engineer and Business Analyst combined.
Skills: All the above (fundamental understanding) + Team management
Now that you have a fair knowledge of the roles under Data Science, let us explore the golden question: “What role should you pick?”
What Role Should You Pick?
This is a subjective question. Based on my substantial experience in the industry, I would base this decision on the person you are today and the person you would like to become. We would also want to consider growth over the next few years. One of the best pieces of advice I can give you is:
Talk to people who are where you want to be.
I have done the research for you, and here is my opinion, organised by the Data Science roles we discussed above.
Should you be a Data Engineer?
If you are an introvert who is comfortable with highly process-oriented work, this role is for you. In terms of growth path, you will learn and eventually master a particular cloud platform and implement it across a few projects.
As you climb the ladder, the roles are Senior Data Engineer and Technical Architect. You will manage data pipelines for more projects, help team members and teams, and eventually design end-to-end pipelines for various groups.
Should you be a Machine Learning Engineer?
Having a knack for experimenting with data, along with being good with mathematics, numbers, and statistics, will put you in an excellent position to be a machine learning engineer. This is one of the roles in Data Science with the word “engineer” in its title, which indicates that it involves some amount of software engineering.
This is an extraordinarily respected and future-facing role in the market, and growth can be on the business or the technology side, depending on the individual’s inclination.
Should you be a Business Analyst?
With a heavy emphasis on efficient communication, presentation and visualisation, this role suits extroverts with presence of mind. And although the role involves relatively little coding, business analysts are highly sought after in the market.
Growth in this role is along the lines of Senior Business Analyst, Business Manager, VP, Director, etc. Strong domain knowledge can help business analysts become influential decision-makers.
Should you be a Data Scientist?
Given the current pace of technology, being a Data Scientist entails lifelong learning, along with the patience and know-how to manage technology, clients, and teams. A strong growth mindset coupled with solid technical experience can help you make a big difference in the organisations that leverage Data Scientists!
Growth can be along the technology or the management track. Data Science can help you be wherever you want to be!
Disclaimer: The Data Science market is in its nascent stages of growth. Although the terminology and work described above represent the ideal, different companies use different terminology, roles, and responsibilities for the same positions, and these have not yet been standardised across the industry. Before applying for any job, it is strongly recommended that you speak to someone already doing the role you are looking for.
Now that you understand the “what,” you might be interested in the timeline for becoming a Data Scientist!
How Long Does It Take to Become a Data Scientist?
- The short answer
It depends. Typically six months to one year of dedicated effort, depending on the individual.
- The long answer
It depends on your passion for data, mathematics, and business. Essentially, Data Science is a combination of Mathematics, Business and Technology. If you are willing to become good at one of these and intermediate at another, that is more than enough to succeed as a data scientist.
Beyond focusing on how to crack the interview, giving importance to the following can make you stand out in a crowd:
- Doing a Data Science course/certification
- Writing articles on Data Science
- Being active, engaging and posting on LinkedIn
- Working towards building a strong GitHub profile
The more work you put into it, the more you get out of it. I have practised all I preach; please feel free to Google my name. You can get into Data Science as fast as you would like to.
Mandatory Skills You Need to Become a Data Scientist
When I started out as a data scientist, my major fear was code. The one thing I have learned is that the most effective way to get good at something is to keep practising it daily. Secondly, rather than learning only the theory, put your knowledge into practice; this can greatly enhance your skillset. Here, I would like to break the mandatory skills into two domains: interpersonal and technical.
- Effective and Efficient Communication skills: Personally, I have enhanced this skill by observing, communicating, and practising with peers, team leads, higher-up management folks, multiple business stakeholders and IT managers across various industries.
- Grit and a learning attitude: You might wonder why I have included this here. Committing to this field means committing to lifelong learning, as the industry continuously evolves. Every day, something new comes up in different technologies and tools. A learning attitude takes you a long way.
- Organisational Skills: Good organisational skills are necessary as you start getting involved in different aspects of a project and playing various roles. Putting structure into your thought process can make everything you do, personally and professionally, much easier.
- Programming Languages and Software: Python, SQL, R, and Java are the go-to languages to start with for anything related to data science.
- Cloud Computing: Microsoft Azure, Google Cloud Platform and Amazon Web Services. The best way to pick up this skill is to master one and draw parallels to the others.
- Machine Learning Skills: Start by learning a few basic ML (Machine Learning) algorithms such as Linear Regression, Decision Tree, Random Forest, etc. You can then move up the ladder to Support Vector Machines, Neural Networks, Deep Learning, etc. But again, practice and application are key to learning ML. A wide range of resource material is available if you want to deep dive into the math of the algorithms, a few of which I have tagged below.
- Data Visualization Tools: Good tools to begin with are Microsoft Power BI and Tableau, as these are relatively straightforward to pick up. There are also many other tools you could investigate.
- Database Management: I can vouch that, as a data scientist, you will spend about 80% of your time preparing data for the analysis and modelling steps that follow. So database management skills are essential, across relational databases such as MySQL, SQL Server, and Oracle, and NoSQL databases such as MongoDB and HBase, among others.
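As an example of putting a basic algorithm into practice, here is a from-scratch sketch of simple linear regression (ordinary least squares with one feature) in Python. The promotion-spend and sales figures are hypothetical; on Python 3.10 or later, the standard library's `statistics.linear_regression(x, y)` computes the same slope and intercept.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


# Hypothetical data: monthly promotion spend (thousands) vs. packets sold.
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
packets = [1200.0, 1400.0, 1600.0, 1800.0, 2000.0]

slope, intercept = fit_line(spend, packets)
print(f"packets ~ {slope:.1f} * spend + {intercept:.1f}")
print("forecast at spend = 6:", slope * 6 + intercept)
```

Writing a simple algorithm like this once by hand makes it much easier to understand what library implementations are doing when you later use them on real data.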
To master all these tools, let us discuss some of the more popular Data Science resources available.
Data Science Resources
Many resources are available for learning Data Science:
- Kaggle: The top machine learning and data science community on the internet, where you can find like-minded people, code, data, courses, and everything data science.
- YouTube: Built by the same community of data scientists, YouTube can be a disorganised but extremely informative source for learning data science.
- KnowledgeHut’s Complete Data Science Bootcamp: An end-to-end bootcamp for beginners to understand Data Science.
- Books & Self-Learning: If you are a self-motivated individual, it is tedious but possible to learn data science on your own from scratch.
Congratulations! You are among the top 5% of readers who make it to the end of the article. This shows that you are here to understand how Data Science works in the real world. Whatever you decide about your journey in Data Science, remember that data is the future. Everyone is afraid of code initially, but eventually we must move in the direction the world is moving. The faster we adapt, the more we will benefit from the Data Science boom.
Thank you for taking the time to read my article. I am a Lead Data Scientist with the most prominent Data Science firm in the country – you can connect with me on LinkedIn. If you have any doubts, please feel free to reach out to the team or Google me and shoot me an email. I am always happy to help!