1-Big Data

What is Big Data? How do big companies manage Big Data?

How big MNCs like Google, Facebook, Instagram, etc. store, manage, and manipulate thousands of terabytes of data with high speed and high efficiency

3 min read · Sep 17, 2020

What is Big Data?🤔

We are going to discuss something interesting today, and that is “Big Data”.
Before moving on, let's understand what big data is.
We are constantly generating data. Nowadays even our kitchen appliances are connected to the internet, sharing and storing mountains of data.

The amount of information being collected around the world is far too big to process in the usual way. That’s where our topic comes into play, i.e. Big Data.

In simple words, big data is a huge amount of raw data that is too large and complex for traditional software to process. Businesses such as Facebook, Walmart, and many more need to study that data in order to grow.

Let's look at some examples to get a better idea of big data.🤯

People are generating 2.5 quintillion bytes of data each day.
Nearly 90% of all data has been created in the last two years.
Walmart handles more than 1 million customer transactions every hour.
Facebook generates 500 Terabytes of data each day.
Google currently processes over 20 petabytes of data per day.
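To put the figures above on one scale, here is a minimal Python sketch that converts raw byte counts into decimal (SI) units. The helper name `human_readable` is made up for this illustration, and the inputs are simply the numbers quoted in the list, not new measurements.

```python
# Decimal (SI) units: each step up is a factor of 1000.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]

def human_readable(num_bytes):
    """Express a byte count in the largest SI unit that keeps the value below 1000."""
    value = float(num_bytes)
    for unit in UNITS:
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000

# Figures quoted in the list above.
daily_worldwide = 2.5e18       # 2.5 quintillion bytes per day
facebook_daily = 500 * 10**12  # 500 terabytes per day
google_daily = 20 * 10**15     # 20 petabytes per day

print(human_readable(daily_worldwide))  # 2.5 EB
print(human_readable(facebook_daily))   # 500 TB
print(human_readable(google_daily))     # 20 PB
```

So "2.5 quintillion bytes" is 2.5 exabytes, and the quoted Google figure is 40 times the quoted Facebook figure.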

Daily smartphone and computer usage means that the volume of data is expanding rapidly. The average user shares dozens of media links daily, and all of that has to be stored somewhere.

Aircraft are another example: for a Boeing 737, the plane used by many carriers on this route, the total amount of data generated would be a massive 240 terabytes.
You can read more about it. (click here)

Over 2.5 quintillion bytes of data are created every single day, and that number is only going to grow. By 2020, it’s estimated that 1.7 MB of data will be created every second for every person on earth. (Read more)

How do we store this much data? 🧐
Software is available to address this problem, such as:

Hadoop
HBase
Hive

We will discuss Hadoop a little, without going into technical detail.
To solve the storage problem of big data, a concept known as the distributed storage system was introduced, and Hadoop is one well-known product built on this concept.
What is distributed storage? A distributed storage system is an infrastructure that can split data across multiple physical servers, and often across more than one data center. It typically takes the form of a cluster of storage units, with a mechanism for data synchronization and coordination between cluster nodes.
Let's take an example. Suppose we have one 50 GB file and we want to store it somewhere. If we write it to a single hard disk, that one disk has to handle all 50 GB, which takes time. If we instead split the file across 50 different machines, each machine only has to write about 1 GB, and because they all work in parallel the file is stored in much less time.
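The 50 GB example can be sketched in a few lines of Python. This is only a toy model, not how Hadoop is actually implemented: the function names are invented for the illustration, the 100 MB/s disk write speed is an assumed round number, and network and coordination overhead are ignored.

```python
def split_into_blocks(file_size_gb, block_size_gb=1):
    """Return a list of block sizes (in GB) that together cover the file."""
    full, rest = divmod(file_size_gb, block_size_gb)
    return [block_size_gb] * full + ([rest] if rest else [])

def assign_round_robin(blocks, num_machines):
    """Hand out blocks to machines in turn; returns the GB stored on each machine."""
    per_machine = [0] * num_machines
    for i, size in enumerate(blocks):
        per_machine[i % num_machines] += size
    return per_machine

def write_time_seconds(per_machine_gb, write_speed_mb_s=100):
    """Machines write in parallel, so the fullest machine decides the total time.

    write_speed_mb_s is an assumed disk speed, chosen only to make the numbers easy.
    """
    return max(per_machine_gb) * 1000 / write_speed_mb_s

blocks = split_into_blocks(50)  # fifty 1 GB blocks

one_disk = write_time_seconds(assign_round_robin(blocks, 1))
fifty_machines = write_time_seconds(assign_round_robin(blocks, 50))

print(one_disk)        # 500.0 seconds on a single disk
print(fifty_machines)  # 10.0 seconds when 50 machines write in parallel
```

Under these assumptions the write goes from 500 seconds to 10 seconds. Real clusters will not hit this ideal 50x speedup, but the direction of the effect is the same.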

Big Data demands a cost-effective, innovative solution to store and analyze it. Hadoop is one popular answer to these requirements. So, let’s explore why Hadoop is so important.
“Hadoop Market is expected to reach $99.31B by 2022 at a CAGR of 42.1%”.

Why Hadoop? 🙂

Hadoop provides a cost-effective storage solution for business.
It lets businesses easily access new data sources and tap into different types of data to produce value from that data.
It is a highly scalable storage platform.
Hadoop is fault tolerant. When data is sent to an individual node, that data is also replicated to other nodes in the cluster, which means that in the event of failure, there is another copy available for use.
Hadoop is more than just a faster, cheaper database and analytics tool. It is designed as a scale-out architecture that can affordably store all of a company’s data for later use.
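The fault-tolerance point in the list above can be illustrated with a small simulation. This is a simplified sketch, not Hadoop's real placement logic: the node names, the random placement, and the helper functions are all invented for the example. The replication factor of 3 matches the HDFS default, but treat it here as just a parameter.

```python
import random

def place_replicas(block_ids, nodes, replication=3, seed=0):
    """Store each block on `replication` distinct nodes, chosen at random.

    Real Hadoop uses a rack-aware placement policy; random choice is a
    simplifying assumption for this sketch.
    """
    rng = random.Random(seed)
    return {block: rng.sample(nodes, replication) for block in block_ids}

def readable_blocks(placement, failed_nodes):
    """A block stays readable if at least one of its replicas is on a live node."""
    failed = set(failed_nodes)
    return [block for block, holders in placement.items()
            if any(node not in failed for node in holders)]

nodes = ["node1", "node2", "node3", "node4", "node5"]  # hypothetical cluster
placement = place_replicas(range(10), nodes)

# With 3 replicas on distinct nodes, losing any 2 nodes still leaves
# at least one copy of every block.
survivors = readable_blocks(placement, failed_nodes=["node1", "node2"])
print(len(survivors), "of", len(placement), "blocks still readable")
```

With a replication factor of 1 the same failure could make some blocks unreadable, which is the trade-off replication buys: more disk space used in exchange for surviving node failures.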

I hope you found this post informative.
Share your thoughts on big data in the comments.
Share it with your friends, and let's connect on LinkedIn for more updates.
Please don't hesitate to give it some 👏👏👏👏👏.

Thank you and stay motivated…