Storage of data: How it works

Nandagopal Rajan

Print Edition: 01 Oct, 2013

Graphic : Amit Sharma

What is Big Data?
Big data is hard to define. It is usually associated with data that arrives with tremendous velocity, volume and variety. Velocity means the data is streamed to you continuously and has to be ingested and processed fast. At that rate of arrival, you are likely to run into a storage problem sooner rather than later. Moreover, such data does not come from a single source, and the sheer variety makes it hard to manage.
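
To see how quickly velocity turns into a storage problem, here is a minimal sketch that estimates how long a store lasts at a constant ingest rate. The function name and the example figures (a 1-petabyte store fed at 1 gigabyte per second) are assumptions chosen for illustration, not numbers from any real system.

```python
def days_until_full(capacity_bytes: float, ingest_bytes_per_second: float) -> float:
    """Days until a store of the given capacity fills at a constant ingest rate."""
    if ingest_bytes_per_second <= 0:
        raise ValueError("ingest rate must be positive")
    seconds = capacity_bytes / ingest_bytes_per_second
    return seconds / 86_400  # seconds in a day


# Hypothetical figures: decimal units, as in the data scale of this article.
PETABYTE = 10**15
GIGABYTE = 10**9

days = days_until_full(PETABYTE, GIGABYTE)
print(f"{days:.1f} days")  # prints "11.6 days"
```

Under these assumed figures, a full petabyte is gone in under two weeks, which is why fast streams force decisions about what to keep and where.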

Cold storage:
Storage for data that is old and seldom retrieved or processed. It can be kept on slower hard disks.

Hot storage:
Storage for data that needs to be retrieved and processed quickly, preferably kept on fast solid-state drives.
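
The split between hot and cold storage can be sketched as a simple rule based on how recently data was used. This is only an illustration: the 30-day window and the function name `choose_tier` are assumptions, and real systems typically weigh access frequency, cost and latency targets as well.

```python
from datetime import datetime, timedelta

# Assumed threshold: data touched within the last 30 days counts as hot.
HOT_WINDOW = timedelta(days=30)


def choose_tier(last_accessed: datetime, now: datetime,
                hot_window: timedelta = HOT_WINDOW) -> str:
    """Return "hot" for recently used data, "cold" for everything else."""
    return "hot" if now - last_accessed <= hot_window else "cold"


now = datetime(2013, 10, 1)
print(choose_tier(datetime(2013, 9, 25), now))  # accessed 6 days ago
print(choose_tier(datetime(2012, 10, 1), now))  # accessed a year ago
```

The first call lands on the hot tier (solid-state drives) and the second on the cold tier (slower hard disks).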

DATA SCALE

Byte: 8 Bits
1 byte: A single character

Kilobyte (1000 Bytes)
1 Kilobyte: A very short story

Megabyte (1 000 000 Bytes)
1 Megabyte: A small novel OR A 3.5 inch floppy disk

Gigabyte (1 000 000 000 Bytes)
1 Gigabyte: A pickup truck filled with paper OR a symphony in high-fidelity sound OR a movie at TV quality

Terabyte (1 000 000 000 000 Bytes)
1 Terabyte: An automated tape robot OR all the X-ray films in a large technological hospital OR 50,000 trees made into paper and printed OR daily rate of EOS data (1998)

Petabyte (1 000 000 000 000 000 Bytes)
1 Petabyte: 5 years of EOS data (at 46 Mbps)

Exabyte (1 000 000 000 000 000 000 Bytes)
5 Exabytes: All words ever spoken by human beings

Zettabyte (1 000 000 000 000 000 000 000 Bytes)
1 Zettabyte: 1000 Exabytes
40 Zettabytes: Stored information projected by 2020. If one byte were a human cell, 40 zettabytes would correspond to a population of about 400 million people

Yottabyte (1 000 000 000 000 000 000 000 000 Bytes)
Xenottabyte (1 000 000 000 000 000 000 000 000 000 Bytes)
Shilentnobyte (1 000 000 000 000 000 000 000 000 000 000 Bytes)
Domegemegrottebyte (1 000 000 000 000 000 000 000 000 000 000 000 Bytes)
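
The comparisons in the scale above can be checked with a little arithmetic. The sketch below uses the decimal unit sizes listed here and reads "46 Mbps" as 46 million bits per second. The figure of roughly 100 trillion cells per person is an assumption: estimates vary, and it is simply the value that makes the 400 million comparison come out. The names `UNITS` and `to_bytes` are hypothetical helpers.

```python
# Decimal unit sizes in bytes, as listed in the scale.
UNITS = {
    "kilobyte": 10**3,
    "megabyte": 10**6,
    "gigabyte": 10**9,
    "terabyte": 10**12,
    "petabyte": 10**15,
    "exabyte": 10**18,
    "zettabyte": 10**21,
    "yottabyte": 10**24,
}


def to_bytes(amount: int, unit: str) -> int:
    """Convert an amount in the named unit to bytes."""
    return amount * UNITS[unit]


# EOS data at 46 megabits per second, converted to bytes per second.
eos_bytes_per_second = 46 * 10**6 / 8
per_day = eos_bytes_per_second * 86_400
five_years = per_day * 365 * 5

print(f"EOS per day: {per_day / UNITS['terabyte']:.2f} TB")
print(f"EOS in 5 years: {five_years / UNITS['petabyte']:.2f} PB")

# Assumption: about 100 trillion (10**14) cells per person.
CELLS_PER_PERSON = 10**14
people = to_bytes(40, "zettabyte") // CELLS_PER_PERSON
print(f"40 ZB as cells: {people:,} people")
```

On these assumptions the daily EOS rate works out to about half a terabyte and five years to about 0.9 petabytes, so the terabyte and petabyte entries are order-of-magnitude comparisons rather than exact figures.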

Uses of big data
Manufacturing: 25% reduction in testing time
Security: Predictive analytics can help prevent an incident before it happens by studying changes in data behaviour
Medicine: Studying genomics to find larger patterns in a population

Apache Hadoop
This open-source software framework, which has its origins in Google's MapReduce and Google File System (GFS), supports data-intensive distributed applications on large clusters. Simply put, Hadoop takes the compute to the storage: processing runs on the machines that already hold the data, instead of the data being moved to a central processor.
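
The MapReduce idea behind Hadoop can be illustrated with a toy word count. This is a single-process sketch in plain Python, not Hadoop's API: the function names are hypothetical, and each string in `blocks` stands in for a block of a file stored on a different machine. In a real cluster, the map step would be scheduled on the node that holds each block.

```python
from collections import defaultdict


def map_phase(block):
    """Map: emit a (word, 1) pair for every word in one block of text."""
    for word in block.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single total."""
    return key, sum(values)


def word_count(blocks):
    mapped = [pair for block in blocks for pair in map_phase(block)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Each string stands in for a block held on a separate node (assumption).
blocks = ["big data is big", "data is everywhere"]
counts = word_count(blocks)
print(counts)
```

Because each map call needs only its own block, the work can be spread across many machines, and only the small (word, count) pairs have to travel over the network.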

Intel for Hadoop
Intel describes itself as a facilitator for Apache Hadoop, offering its expertise in computing, networking and storage so that all three work together optimally when processing data. According to Intel, its advantage lies in careful workload balancing informed by its knowledge of the silicon.