Storage of data: How it works

Graphic: Amit Sharma
What is Big Data?
Big data is hard to define. It is usually associated with situations involving tremendous velocity, volume and variety of data. Velocity means data is streamed in continuously and has to be ingested and processed fast. When data arrives at that rate, you are likely to run into a storage problem (volume) sooner rather than later. And because this kind of data does not come from a single source, its sheer variety makes it hard to manage.
Cold storage:
Storage for data that is old and rarely retrieved or processed. It can be kept on slower hard disks.
Hot storage:
Storage for data that must be retrieved and processed quickly, preferably on fast solid-state drives.
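To make the hot/cold split concrete, here is a minimal sketch of a tiering rule. It assumes data is classified only by how recently it was last accessed. The 30-day `COLD_AFTER` threshold, the `DataObject` type and the function names are hypothetical. A real system might also weigh access frequency, cost and size.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical threshold: data untouched for longer than this counts as "cold".
COLD_AFTER = timedelta(days=30)


@dataclass
class DataObject:
    name: str
    last_accessed: datetime


def choose_tier(obj: DataObject, now: datetime, cold_after: timedelta = COLD_AFTER) -> str:
    """Return "hot" (fast SSD) for recently used data, "cold" (slower HDD) otherwise."""
    if now - obj.last_accessed > cold_after:
        return "cold"
    return "hot"


def plan_tiers(objects: list[DataObject], now: datetime) -> dict[str, list[str]]:
    """Group object names by the storage tier they should live on."""
    plan: dict[str, list[str]] = {"hot": [], "cold": []}
    for obj in objects:
        plan[choose_tier(obj, now)].append(obj.name)
    return plan


if __name__ == "__main__":
    now = datetime(2020, 1, 31)
    objects = [
        DataObject("todays_sensor_feed", now - timedelta(hours=2)),
        DataObject("last_quarter_logs", now - timedelta(days=90)),
    ]
    print(plan_tiers(objects, now))
    # {'hot': ['todays_sensor_feed'], 'cold': ['last_quarter_logs']}
```

The sketch only decides where each item belongs. Actually moving data between tiers would be the job of the storage system.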
DATA SCALE
Byte: 8 Bits
1 byte: A single character
Kilobyte (1000 Bytes)
1 Kilobyte: A very short story
Megabyte (1 000 000 Bytes)
1 Megabyte: A small novel OR a 3.5-inch floppy disk
Gigabyte (1 000 000 000 Bytes)
1 Gigabyte: A pickup truck filled with paper OR a symphony in high-fidelity sound OR a movie at TV quality
Terabyte (1 000 000 000 000 Bytes)
1 Terabyte: An automated tape robot OR all the X-ray films in a large technological hospital OR 50,000 trees made into paper and printed OR daily rate of EOS data (1998)
Petabyte (1 000 000 000 000 000 Bytes)
1 Petabyte: 5 years of EOS data (at 46 mbps)
Exabyte (1 000 000 000 000 000 000 Bytes)
5 Exabytes: All words ever spoken by human beings
Zettabyte (1 000 000 000 000 000 000 000 Bytes)
1 Zettabyte: 1000 Exabytes
40 Zettabytes: Information expected to be stored by 2020. If one byte were a human cell, 40 zettabytes would make up a population of 400 million people
Yottabyte (1 000 000 000 000 000 000 000 000 Bytes)
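Xenottabyte (1 000 000 000 000 000 000 000 000 000 Bytes; unofficial name, the SI unit is the ronnabyte)
Shilentnobyte (1 000 000 000 000 000 000 000 000 000 000 Bytes; unofficial name, the SI unit is the quettabyte)
Domegemegrottebyte (1 000 000 000 000 000 000 000 000 000 000 000 Bytes; unofficial name)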
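The units in the scale above step up by factors of 1,000. The sketch below formats a byte count with those decimal prefixes and reproduces the human-cell comparison. The "400 million people" figure only works out if you assume roughly 100 trillion (10^14) cells per person. That figure is our assumption; the graphic does not state it. The function names are hypothetical.

```python
# Decimal (SI) prefixes, as in the scale above: each step is a factor of 1,000.
# Binary prefixes (KiB, MiB, ..., ZiB) step by 1,024 instead.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

# Assumption (not from the article): about 100 trillion cells per person,
# which is what makes the "400 million people" comparison work out.
CELLS_PER_PERSON = 10**14


def human_readable(num_bytes: int) -> str:
    """Format a byte count using decimal prefixes (1 KB = 1000 bytes)."""
    value = float(num_bytes)
    unit_index = 0
    while value >= 1000 and unit_index < len(UNITS) - 1:
        value /= 1000
        unit_index += 1
    return f"{value:g} {UNITS[unit_index]}"


def people_equivalent(num_bytes: int, cells_per_person: int = CELLS_PER_PERSON) -> int:
    """Treat each byte as one human cell and count how many whole people that makes."""
    return num_bytes // cells_per_person


if __name__ == "__main__":
    stored_2020 = 40 * 10**21  # 40 zettabytes
    print(human_readable(stored_2020))            # 40 ZB
    print(f"{people_equivalent(stored_2020):,}")  # 400,000,000
```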
Uses of big data
Manufacturing: 25% reduction in testing time
Security: Predictive analytics can help prevent an incident before it happens by studying changes in data behaviour
Medicine: Studying genomics to find larger patterns across populations
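One simple way to study changes in data behaviour is to flag values that stray far from their recent history. The sketch below uses a rolling z-score. This is a swapped-in illustrative technique, not the method of any particular security product. The window size, the 3-standard-deviation threshold and the login counts are all hypothetical.

```python
from statistics import mean, stdev


def flag_anomalies(values: list[float], window: int = 5, threshold: float = 3.0) -> list[int]:
    """Return the indices of values that deviate from the mean of the preceding
    `window` values by more than `threshold` standard deviations."""
    if window < 2:
        raise ValueError("window must hold at least two values to estimate spread")
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: any change at all counts as a deviation.
            if values[i] != mu:
                flagged.append(i)
        elif abs(values[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Hypothetical hourly login counts with one sudden spike.
    logins = [20, 22, 19, 21, 20, 23, 21, 95, 22, 20]
    print(flag_anomalies(logins))  # [7]
```

A spike flagged this way is only a prompt for investigation. More elaborate predictive systems would typically learn from far more history than a five-value window.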
Apache Hadoop
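This open-source software framework, which has its origins in Google's MapReduce and Google File System (GFS), supports data-intensive distributed applications on large clusters. Simply put, Hadoop takes the compute to the storage. Rather than moving huge data sets across the network to be processed, it runs the processing on the machines that already hold the data.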
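Hadoop's MapReduce model splits a job into three steps: a map step, a shuffle that groups intermediate results by key, and a reduce step. The following word count sketches that model in a single Python process. It is not Hadoop's own (Java) API, and all function names are hypothetical.

```python
from collections import defaultdict
from collections.abc import Iterable
from itertools import chain


def map_phase(split: str) -> list[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group intermediate values by key before they reach the reducers."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values that share a key."""
    return key, sum(values)


def word_count(splits: list[str]) -> dict[str, int]:
    # On a real cluster each split would be mapped on the node that stores it;
    # here every phase runs in a single process for illustration.
    mapped = chain.from_iterable(map_phase(split) for split in splits)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    splits = ["big data is big", "data moves less when compute moves"]
    print(word_count(splits))
    # {'big': 2, 'data': 2, 'is': 1, 'moves': 2, 'less': 1, 'when': 1, 'compute': 1}
```

Because each map call touches only its own split, a framework can run those calls wherever the splits are stored. Only the much smaller intermediate pairs then need to cross the network.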
Intel for Hadoop
Intel calls itself a facilitator for Apache Hadoop, bringing its expertise in computing, networking and storage so that these components work in sync and optimally when processing data. According to Intel, its advantage lies in careful workload balancing, informed by its knowledge of the silicon.