Big data is certainly getting bigger.
McKinsey Global Institute forecasts that the volume of data on the Internet will grow at around 40% a year, which works out to roughly 44x growth over the 11 years between 2009 and 2020.
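As a quick sanity check on those figures, the compounding can be worked out in a few lines. This is only a back-of-the-envelope sketch: it assumes a constant annual rate over the whole period, and the variable names are illustrative rather than anything taken from the forecast itself.

```python
# Compound growth check for the quoted forecast.
# Assumption: the growth rate is constant every year from 2009 to 2020.
annual_rate = 0.40          # "around 40% a year"
years = 2020 - 2009         # 11 years

# Growth factor after compounding 40% for 11 years.
growth_factor = (1 + annual_rate) ** years

# Annual rate that would be needed to reach exactly 44x in 11 years.
implied_rate = 44 ** (1 / years) - 1

print(f"40% a year for {years} years gives about {growth_factor:.1f}x")
print(f"44x over {years} years needs about {implied_rate:.1%} a year")
```

Exactly 40% a year compounds to a little over 40x, and 44x corresponds to a rate just above 41%, so the two quoted numbers agree once you allow for the "around".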
With big data, though, volume is not the only thing you need to consider. As more and more machines become smarter, there is now an influx of machine-generated data that you have to store and analyze. On top of that, more people with Internet access are creating a whole lot of content that you should capture.
Aside from volume, you should also look at velocity, or the frequency at which data comes in. Case in point: social media allows your customers to share their opinions, and you need to capture these opinions as they arrive because they are important to your customer relationship management.
Then you also have to consider variety. In the old days, data came to you pretty much structured and followed a specified format. These formats were pretty much standard and changed slowly over time. Today, you get unstructured data that arrives in many different formats, and those formats are constantly changing.
Further, you need to consider value. Not all data sets are gold mines. You need to identify which sets of data are valuable and you need to know how to capture, extract, transform and analyze the data for it to be useful.
Oracle gives you an idea of just how fast data is growing. A single jet engine generates up to 10 terabytes of data in just half an hour. Considering that there are 25,000 flights each day, jet engines all around the world can generate petabytes and petabytes of data daily. Meanwhile, Twitter may limit each tweet to only 140 characters, but the speed with which tweets are sent out means that Twitter as a data source adds around 8 terabytes of data every day.
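To see where "petabytes and petabytes" comes from, here is a rough order-of-magnitude estimate built from the jet engine figures above. The engines-per-flight and flight-duration values are assumptions chosen for illustration (one engine, one half-hour per flight), and 10 terabytes is the article's "up to" figure, so treat the result as a ballpark rather than a measurement.

```python
# Rough daily data volume from jet engines, using the figures quoted above.
TB_PER_ENGINE_PER_HALF_HOUR = 10   # from the article ("up to 10 terabytes")
FLIGHTS_PER_DAY = 25_000           # from the article

# Assumptions for illustration only (not from the article):
ENGINES_PER_FLIGHT = 1             # most airliners have two or more
HALF_HOURS_PER_FLIGHT = 1          # most flights last longer than 30 minutes
TB_PER_PB = 1_000                  # decimal units

daily_tb = (
    TB_PER_ENGINE_PER_HALF_HOUR
    * ENGINES_PER_FLIGHT
    * HALF_HOURS_PER_FLIGHT
    * FLIGHTS_PER_DAY
)
daily_pb = daily_tb / TB_PER_PB

print(f"{daily_tb:,} TB per day, or about {daily_pb:,.0f} PB per day")
```

Even with only one engine and one half-hour counted per flight, the total comes to hundreds of petabytes a day.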
As you can imagine, older databases will not be enough to handle the volume, variety and velocity of big data. What database technologies do you need?
• Column-oriented databases. If you are running a database for online transaction processing, then you would do well with traditional databases because these are row-oriented and updates are very fast. However, row-oriented databases often falter on analytical queries when data volumes grow. A column-oriented database stores the values of each column together, so a query reads only the columns it needs and the data compresses well. This is why it is the better choice when you want the speediest query times and data compression.
• NoSQL. NoSQL databases work best with unstructured data because they do away with restrictions that are imposed by traditional databases, allowing you to scale easily.
• Hadoop. This is an open source platform that is geared towards managing big data. It helps you work with data coming from different sources.
• SkyTree. It helps you with data analytics and machine learning when dealing with big data.
• Hive. This lets you query Hadoop clusters with a SQL-like language, so you can use your traditional BI applications with them. Other similar technologies that you should know include Pig, WibiData, and Platfora.
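The difference between row-oriented and column-oriented storage mentioned in the first bullet can be sketched with plain Python data structures. This is a toy illustration, not how any particular database is implemented: the sample orders, the `to_columns` helper and the `run_length_encode` helper are all hypothetical, and sorting a column before encoding it is just one simple way to show why columns of similar values compress well.

```python
# Row-oriented layout: each record is stored together (good for updates).
# Hypothetical sample data for illustration.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 45.5},
    {"order_id": 4, "region": "EU", "amount": 10.0},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Collapse consecutive repeats into [value, count] pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded


# Column-oriented layout: each column's values are stored together.
columns = to_columns(rows)

# An analytical query such as "total amount" touches every record in the
# row layout, but only a single list in the column layout.
total_from_rows = sum(record["amount"] for record in rows)
total_from_columns = sum(columns["amount"])

# Values within one column tend to repeat, so they compress well.
region_runs = run_length_encode(sorted(columns["region"]))

print(total_from_rows, total_from_columns)
print(region_runs)
```

Both layouts give the same total, but the column layout answers the query from one list and stores the repeated region values as a couple of short runs, which is the intuition behind the faster queries and better compression claimed for column-oriented databases.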
If you are thinking about how to get your big data initiative right, then you should call Four Cornerstone. We can help you determine all the right technologies that would allow you to get your databases ready for the unrelenting growth of the Internet.
Photo by Christoph Scholz.