When I got introduced to the data-world with my first corporate induction training, about 10 years ago. I was then still processing the difference between Data and Information. The following helped me understand the same:
- Data: It is raw information (unprocessed facts and figures) without any context for e.g. Number 20
- Information: structured Data grouped together which can have interpretation. E.g $20 for a toy.
- Knowledge: combination of information, experience and insight that may benefit the individual for the organisation. E.g. $20 for a toy in Black Friday Sale in a mall.
- Wisdom: Knowledge becomes wisdom when one can assimilate and apply this knowledge to make the right decisions. E.g. One who wants to buy a toy will wait for the Black Friday Sale to get it at a cheaper price.
By the time I started understanding above differences, ‘Big data’ term was already making it big and then the obvious question in mind was,” When to call ‘data’ à ‘ Big data’? “
I then made an attempt to understand ‘how big is a data to be called big data?’ and here, I have a big revelation to make, for all of you reading this article, that ‘Big Data’ is actually misleading term and it is irrelevant with “Bigness of data” but it is to be used in relevance. In fact, it is a term which needs to be understood, only in perspective.
The simplest one I could find relevant is, Big data is the data that cannot be stored with traditional storages, cannot be processed with traditional methods/ways and within a short period of time (and these references would still be valid as time advances.) but this is not textbook or only definition of big data. Interestingly, One who finds one set of data as big data can be traditional data for others so truly it cannot be bounded in words but loosely can be described through numerous examples. I am sure by the end of the article you will be able to answer the question for yourself. Let’s start.
Do you know? - NASA researchers Michael Cox and David Ellsworth use the term “big data” for the first time to describe a familiar challenge in the 1990s supercomputers generating massive amounts of information - in Cox and Ellsworth’s case, simulations of airflow around aircraft - that cannot be processed and visualized.
If you go through a brief history of big data, you would know data which is not fitting into memory or disk was called ‘Big data problem’ back in 1997.
As the years passed by innovations were on rising and disruptions were made so the data universe is growing all the time. Let’s understand a few widely available and stated statistics for ‘big data’ (Collected around 2017 or before) >>
You might be interested to read through Domo’s Data Never Sleeps 5.0 report, for the numbers generated every minute of the day.
Understanding that the above stats are probably about 1.5-2 years older and data is ever-growing, it helps to establish the fact that ‘big data‘ is a moving target and….
“Today’s big data is tomorrow’s small data.”
Now that we have some knowledge about transactions/tweets/snaps in a day, Let’s also understand how much data, all these “One-minute Quickies” are generating. Let’s talk about some volumes too. Afterall volumes are one of the characteristics of big data but mind you, not only characteristic of big data.
It is believed that, In a single day, the world produces 2.5 quintillion bytes (2.3 trillion gigabytes) of data, in layman's terms, this is the equivalent of everyone in the world downloading 60 episodes of Breaking Bad, in HD, 20 times! [Source: VCloud 2012] and According to estimates, the volume of data worldwide doubles every 1.2 years.
IDC predicts that the collective sum of the world's data will grow from 33 zettabytes this year to a 175ZB by 2025, for a compounded annual growth rate of 61 per cent. The 175ZB figure represents a 9 per cent increase over last year's prediction of data growth by 2025 – As per the report published in Dec’2018.
But, do you know: how much would be 1 zettabyte of data? Let’s understand.
One zettabyte is equal to one sextillion bytes or 1021 (1,000,000,000,000,000,000,000) bytes or, one zettabyte is roughly equal to a trillion gigabytes.
Fun Fact: There is a legit term coined as The Zettabyte Era (Today’s Era).
The Zettabyte Era can also be understood as an age of growth of all forms of digital data that exist in the world which includes the public Internet, but also all other forms of digital data such as stored data from security cameras or voice data from cell-phone calls.
You must check out this infographic by economywatch (taken from SearchEngineJournal) to understand how much data zettabyte consists of, putting it into context with current data storage capabilities and usage.
Today’s ‘big data’ is generated from majority 3 sources i.e.
- People Generated: Social media uploads, Mails etc.
- Machine Generated: M2M (machine to machine) interactions, IOT devices etc.
- Business Generated: Data generated and stored into today’s OLTPs, OLAPs, Data warehouses, data marts, reports, operational data throughout the enterprise/organization.
Various analytics tools available in the market today, help in solving big data challenges by providing ways for storing this data, process this data and make valuable insights from this data.
As we discussed, big data is moving target as time advances, it is also interesting to know even today, data which is not of huge size but is difficult to process and of relatively smaller volume would still be categorized as Big Data. For example, unstructured data in emails, from social media platforms, data which is required to process with real-time/near real-time etc. all the examples we have seen so far, all of it is big data.
But, It would be a mistake to assume that, Big Data only as data that is analyzed using Hadoop, Spark or another complex analytics platform. As big data is moving the target and it’s ever-growing, also with various disruptive sources of data are being introduced every day, to process this data newer tools would be invented, and hence big data cannot just remain a function of tools being used to analyze it.
To conclude, as discussed at the starting of the article, it would still be appropriate and reasonable to say, this moving target of big data which would always be challenged for storage, processing methods and process it within a short period as well. So big data is a function of volume and/or time and/or storage and/or variety.
It was fun and exciting to know what different aspects are hidden in ‘BIG DATA’ word and I thoroughly enjoyed solving this mystery.
Did you enjoy solving it too?
Do let us know how was experience through comments below.