Storage and technology companies have been throwing numbers at us for the last couple of years, cautioning that we are being deluged by data that we need to make sense of and capitalise on.
Technology company IBM Corp. says individuals, companies and governments together generate 2.5 quintillion (a billion billion) bytes of data daily, and that 90% of the world’s data has been created in the last two years. The volume of data created in 2010 exceeded a zettabyte (1 trillion gigabytes), according to research firm IDC. The study was sponsored by storage vendor EMC, which has an interest in telling enterprises how to store and retrieve this data meaningfully. Nevertheless, IDC maintains that the digital universe is more than doubling every two years.
In 2011, the amount of information created and replicated will surpass 1.8 zettabytes (1.8 trillion gigabytes), growing by a factor of 9 in just five years, according to an IDC paper, ‘Extracting value from chaos’. Data bombards us from everywhere: from sensors used to gather climate information, posts to social media sites, digital pictures and videos posted online, transaction records of online purchases, and cellphone global positioning system (GPS) signals. For instance, Facebook alone is home to 40 billion photos, and 48 hours of video are uploaded to YouTube every minute. And the Large Hadron Collider at CERN, the world’s largest particle accelerator, generates over 25 petabytes of scientific data each year.
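The two IDC growth figures quoted above (a factor of 9 in five years, and "more than doubling every two years") can be checked against each other with a little compound-growth arithmetic. The sketch below assumes steady year-on-year growth, which IDC does not claim; the function names are ours, chosen for illustration.

```python
import math

def annual_growth_rate(factor, years):
    """Constant yearly growth rate that yields `factor` after `years` years."""
    return factor ** (1.0 / years) - 1.0

def doubling_time_years(rate):
    """Years needed to double at a constant yearly growth rate."""
    return math.log(2) / math.log(1.0 + rate)

# IDC figure quoted in the article: ninefold growth in five years.
rate = annual_growth_rate(9, 5)
print(f"Implied annual growth: {rate:.1%}")                            # about 55.2%
print(f"Implied doubling time: {doubling_time_years(rate):.2f} years")  # about 1.58
```

Under that assumption, ninefold growth in five years works out to roughly 55% a year, or a doubling every 19 months or so, which is consistent with the "more than doubling every two years" claim.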
Technology vendors and research firms call this trend ‘Big Data’. Cisco defines the phenomenon as data sets that are too large to manage and analyse with traditional data management tools and techniques in a reasonable timeframe. IDC researchers caution that while 75% of the information in the digital universe is generated by individuals, enterprises have some liability for 80% of that information at some point in its digital life. Also, the amount of information individuals create themselves (writing documents, taking pictures, downloading music, etc.) is far less than the amount of information being created about them in the digital universe. IDC christens this the ‘Digital Shadow’.
Since 2005, companies have increased their investments in storing data by over 50% to $4 trillion, says IDC. That’s money spent on hardware, software, services, and staff to create, manage, and store — and derive revenues from — the digital universe.
And companies like IBM, Oracle, Microsoft and SAP between them have spent over $15 billion on buying software firms specialising in data management and analytics. This industry is estimated to be worth more than $100 billion and growing at almost 10% a year.
The question, given the growth, complexity and diversity involved, is how to contain that chaos and extract value from the piles of data. According to research by the McKinsey Global Institute (MGI) and McKinsey’s Business Technology Office, ‘Big Data’ can generate value in many sectors. For example, a retailer using big data to the full could increase its operating margin by more than 60%. If US healthcare were to use big data creatively, the sector could create more than $300 billion in value every year. Two-thirds of that would come from reducing US healthcare expenditure by about 8%, say MGI researchers.
According to a McKinsey report, the use of big data will become a key basis of competition and growth for firms, with both established competitors and new entrants leveraging data-driven strategies to innovate, compete, and capture value from deep and up-to-real-time information.
Research firm Gartner, too, cautions that worldwide information volume is growing at a minimum rate of 59% a year. While volume is a significant challenge in managing big data, Gartner says business and IT leaders must focus not on volume alone but on information volume, variety and velocity. Variety includes tabular data (databases), hierarchical data, documents, e-mail, metering data, video, still images, audio, stock ticker data, financial transactions and more. Velocity means both how fast data is being produced and how fast the data must be processed to meet demand.
Over the next decade, the number of servers (virtual and physical) worldwide will grow by a factor of 10; the amount of information managed by enterprise data centres will grow by a factor of 50; and the number of files the data centre will have to deal with will grow by a factor of 75, according to the IDC paper. However, the number of IT professionals in the world will grow by less than a factor of 1.5, which means talent will be a problem. Security will be a concern as well.
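To see why talent becomes the bottleneck, divide each of IDC's growth factors by the growth in IT staff. The sketch below is a back-of-the-envelope illustration, not IDC's own calculation; it treats 1.5 as the staff growth factor, and since the paper says staff will grow by less than that, the results are lower bounds. The variable and function names are illustrative.

```python
# Ten-year growth factors quoted from the IDC paper.
growth = {"servers": 10, "information": 50, "files": 75}

# The article says IT staff grow by *less than* 1.5x, so dividing by 1.5
# gives a lower bound on the extra load each professional must handle.
staff_growth = 1.5

def load_per_professional(resource_factor, staff_factor):
    """How much more of a resource each IT professional must manage."""
    return resource_factor / staff_factor

for name, factor in growth.items():
    per_head = load_per_professional(factor, staff_growth)
    print(f"{name}: at least {per_head:.1f}x more per professional")
```

On these figures, each IT professional would be responsible for at least about 33 times as much information and 50 times as many files as today.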
Current solutions to tackle ‘Big Data’ include business intelligence tools, which increasingly deal with real-time data, and new storage management tools that tackle data-related problems such as de-duplication and virtualization. And just last month, Accel Partners launched a $100 million ‘Big Data’ fund that aims to assist transformative early-stage and growth companies throughout the Big Data ecosystem, from next-generation storage and data management platforms to a range of software applications and services, such as data analytics, business intelligence, collaboration, mobile and vertical applications. Efforts such as these will bring more innovative solutions to the data ecosystem.