Big Data is driving a revolution in the IT field, and the use of analytics grows sharply every year. We create about 2.5 quintillion bytes of data every day, and the field is expanding accordingly, particularly in B2C applications. Big Data has entered almost every industry and is a major driving force behind the success of enterprises and organizations across the globe.
Let us first discuss the question: “What is Big Data?”
“Data” is defined as ‘the quantities, characters, or symbols on which operations are performed by a computer, which may be stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media’, as a quick Google search will show.
The concept of Big Data is not complex. As the name suggests, “Big Data” refers to amounts of data too large to be stored, managed, processed, and analyzed efficiently by traditional tools. Because the volume of data grows so quickly (more than 500 terabytes are uploaded to Facebook alone in a single day), analysis is a real challenge.
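To put the two figures quoted so far on the same scale, here is a small sketch that converts them using decimal (SI) units. The numbers are taken from this article as-is and are not verified here; the constant and function names are made up for illustration.

```python
# Figures quoted in this article (taken as given, not independently verified).
WORLDWIDE_BYTES_PER_DAY = 2.5e18   # "2.5 quintillion bytes" per day
FACEBOOK_TB_PER_DAY = 500          # "more than 500 terabytes" per day

# Decimal (SI) units: 1 TB = 10**12 bytes, 1 EB = 10**18 bytes.
BYTES_PER_TB = 10**12
BYTES_PER_EB = 10**18


def bytes_to_exabytes(n_bytes):
    """Convert a byte count to exabytes."""
    return n_bytes / BYTES_PER_EB


def terabytes_to_bytes(n_tb):
    """Convert terabytes to bytes."""
    return n_tb * BYTES_PER_TB


worldwide_eb = bytes_to_exabytes(WORLDWIDE_BYTES_PER_DAY)
facebook_bytes = terabytes_to_bytes(FACEBOOK_TB_PER_DAY)
facebook_share = facebook_bytes / WORLDWIDE_BYTES_PER_DAY

print(f"Worldwide: {worldwide_eb} EB per day")
print(f"Facebook: {facebook_bytes:.1e} bytes per day")
print(f"Facebook share of the daily total: {facebook_share:.2%}")
```

With these figures, 2.5 quintillion bytes is 2.5 exabytes, and 500 terabytes is about 0.02% of that daily total.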
Before we go into the details, here is a quick introduction to Big Data and its types: structured, semi-structured, and unstructured data.
Types of Big Data:
Classification is essential for the study of any subject. Big Data is commonly classified into three main types:
1. Structured data
Structured data refers to data that is already stored in databases in an ordered manner. It accounts for about 20% of all existing data and is the type most used in programming and computer-related activities.
There are two sources of structured data: machines and humans. Data received from sensors, web logs, and financial systems is classified as machine-generated. This includes data from medical devices, GPS data, usage statistics captured by servers and applications, and the huge volume of data that moves through trading platforms, to name a few.
Human-generated structured data mainly includes the data a person enters into a computer, such as a name and other personal details. When a person clicks a link on the internet, or even makes a move in a game, data is created. Companies can use this data to understand customer behavior and make appropriate decisions and modifications.
Let’s understand structured data with an example.
Run totals for top scorers in international T20 matches can be stored in a table like this:
| Player | Country | Runs | Matches played |
|---|---|---|---|
| Brendon McCullum | New Zealand | 2140 | 71 |
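As a minimal sketch of what "structured" means in practice, the row above can be loaded into a relational table with Python's built-in `sqlite3` module. The table name `t20_batting` and its column names are assumptions chosen for illustration; only the single row comes from the article.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE t20_batting (
        player  TEXT    NOT NULL,
        country TEXT    NOT NULL,
        runs    INTEGER NOT NULL,
        matches INTEGER NOT NULL
    )
    """
)

# The one row given in the article's table.
conn.execute(
    "INSERT INTO t20_batting VALUES (?, ?, ?, ?)",
    ("Brendon McCullum", "New Zealand", 2140, 71),
)

# Every row has the same typed columns, so a query can compute on them directly.
rows = conn.execute(
    "SELECT player, ROUND(runs * 1.0 / matches, 2) AS runs_per_match "
    "FROM t20_batting ORDER BY runs DESC"
).fetchall()

for player, runs_per_match in rows:
    print(player, runs_per_match)

conn.close()
```

Because the schema is fixed, the query needs no parsing step: 2140 runs over 71 matches works out to roughly 30.14 runs per match.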
2. Unstructured data
While structured data resides in traditional row-and-column databases, unstructured data is the opposite: it has no clear format in storage. The remaining data, about 80% of the total, is unstructured. Most of the data a person encounters belongs to this category, and until recently there was little to do with it except store it or analyze it manually.
Unstructured data is also classified by source as machine-generated or human-generated. Machine-generated data includes satellite images, scientific data from experiments, and radar data.
Human-generated unstructured data is found in abundance across the internet, since it includes social media data, mobile data, and website content. This means that the pictures we upload to Facebook or Instagram, the videos we watch on YouTube, and even the text messages we send all contribute to the gigantic heap that is unstructured data.
Examples of unstructured data include text, video, audio, mobile activity, social media activity, satellite imagery, and surveillance imagery; the list goes on and on.
The following image illustrates what unstructured data is.
Unstructured data is further divided into:
- Captured data
- User-generated data
a. Captured data:
It is data based on the user’s behavior. A good example is GPS on smartphones, which assists the user at every moment and provides real-time output.
b. User-generated data:
It is the kind of unstructured data that users themselves put on the internet at every moment: for example, tweets and retweets, likes, shares, and comments on YouTube, Facebook, etc.
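Because user-generated text has no schema, any structure has to be extracted from it before it can be analyzed. The sketch below shows one simple way to do that with regular expressions. The sample posts are made up for illustration, and the helper `extract_features` is a hypothetical name, not part of any library.

```python
import re
from collections import Counter

# Hypothetical posts, invented for this example.
posts = [
    "Loving the new phone camera! #photography #tech",
    "Traffic is terrible today... @citytransit please fix the signals",
    "Great match last night #cricket #T20",
    "Anyone else learning #tech skills this year?",
]

HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")


def extract_features(text):
    """Pull a few structured fields out of a piece of free text."""
    return {
        "hashtags": [tag.lower() for tag in HASHTAG.findall(text)],
        "mentions": [name.lower() for name in MENTION.findall(text)],
        "word_count": len(text.split()),
    }


features = [extract_features(post) for post in posts]

# Once fields are extracted, they can be counted like structured data.
hashtag_counts = Counter(tag for f in features for tag in f["hashtags"])
print(hashtag_counts.most_common(3))
```

The extraction step is the extra work that unstructured data demands: nothing in the raw text says which words are hashtags until a rule such as the pattern above is applied.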
3. Semi-structured data
The line between unstructured and semi-structured data has always been unclear, since most semi-structured data appears unstructured at a glance. Semi-structured data is information that is not in the traditional database format of structured data but has some organizational properties that make it easier to process. For example, NoSQL documents are considered semi-structured, since they contain keywords, such as field names, that can be used to process the document easily.
Big Data analysis has definite business value: it can help a company achieve cost reductions and dramatic growth. So it is worth not waiting too long to exploit the potential of this business opportunity.
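A short sketch of the idea, using JSON documents of the kind a document-oriented NoSQL store might hold. The two records and their field names are hypothetical. They do not share a fixed schema, yet the keys inside each document make it easy to process.

```python
import json

# Hypothetical documents: same kind of record, different sets of fields.
raw_documents = [
    '{"name": "Asha", "email": "asha@example.com", "tags": ["cricket", "travel"]}',
    '{"name": "Ben", "phone": "555-0100", "address": {"city": "Wellington"}}',
]


def describe(doc):
    """Return the sorted top-level keys of a parsed document."""
    return sorted(doc.keys())


documents = [json.loads(raw) for raw in raw_documents]

for doc in documents:
    # .get() copes with fields that exist in some documents but not in others.
    city = doc.get("address", {}).get("city", "unknown")
    print(doc["name"], describe(doc), city)
```

No table definition is needed up front, which is the flexibility semi-structured data offers. The trade-off is that the code must handle missing fields itself.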
Diagram showing Semi-structured data
Difference between Structured, Semi-structured and Unstructured data
| Factors | Structured data | Semi-structured data | Unstructured data |
|---|---|---|---|
| Flexibility | Schema-dependent and less flexible | More flexible than structured data but less flexible than unstructured data | Flexible in nature, with no schema |
| Transaction management | Mature transaction management and various concurrency techniques | Transaction management adapted from DBMS, not mature | No transaction management and no concurrency |
| Query performance | Structured queries allow complex joins | Queries over anonymous nodes are possible | Only textual queries are possible |
| Technology | Based on relational database tables | Based on RDF and XML | Based on character and binary data |
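The "Query performance" row can be illustrated with a rough sketch that contrasts a join over structured tables with a plain keyword search over unstructured text. All table names, rows, and notes here are invented for the example, and `text_search` is a hypothetical helper.

```python
import sqlite3

# Structured side: two related tables (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 35.5), (3, 2, 12.0);
    """
)

# A join combines both tables on a key and aggregates per customer.
totals = conn.execute(
    """
    SELECT c.name, SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
conn.close()

# Unstructured side: free-text notes (hypothetical) that can only be matched as text.
notes = [
    "Asha called about a late delivery and asked for a refund.",
    "Ben left a voicemail praising the support team.",
]


def text_search(documents, keyword):
    """Return the documents that contain the keyword, ignoring case."""
    return [doc for doc in documents if keyword.lower() in doc.lower()]


print(totals)
print(text_search(notes, "refund"))
```

The join works because both tables expose typed columns and keys. The notes offer nothing comparable, so the query is limited to matching text.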
Big Data is indeed a revolution in the field of IT, and the use of data analytics is increasing every year. Despite the demand, organizations are currently short of experts. To close this talent gap, many training institutes offer courses on Big Data analytics that help you build the skill set needed to manage and analyze big data. If you are keen to take up data analytics as a career, Big Data training will be an added advantage.