At The Data Team, we realize that “big data” and “data science” are hyped and overused terms, while in reality organizations find it challenging to move beyond the initial hype and see the value. The main reasons are a lack of clarity on what to expect from “big data” and “data science”, and the absence of a mature strategy to leverage data. In this post, we will demystify the term “big data” and then touch upon what constitutes “data as a strategy”. The two concepts are closely related: the latter is the framework that leverages the former at the right time. In a subsequent post, we will dissect the term “data science” and tie it back to data strategy as well.
Let us begin by seeing how the popular conceptions of “big data” fall short.
Big data is not about the Three V’s. After all, large volumes have been handled by massively parallel processing architectures for a while now (for instance, at my ex-employer). Nor is it about velocity, since rapid ingestion of data and action on it have been around since the days of transaction processing systems.
Big data is not about a use case. I have come across innumerable companies that claim to offer “big data products” or to “be” big data companies, whereas in fact most of them play in the social media/digital marketing space. Social media or digital marketing is probably not the first use case your company will solve with big data, since deriving value from social media requires reasonable-to-high penetration in various social media channels, the marketing maturity to take advantage of such engagements, and some legal clearances.
Big data should not be equated with a specific technology. It is not a farm where all animals are equal and the elephant is more equal than the rest.
Even so, the hype around big data is certainly justified. We postulate that this is because big data has emphasized a culture that uses data to further the business. This data culture demands that the organization allow anyone to analyze any data of any size, using any tool or combination of tools, to serve business objectives. It does not obey organizational boundaries such as those between business and IT, is motivated by feedback and sharing both internally and externally, does not shy away from large data sizes, and in fact thrives when challenged with frugality and complexity. Some of the tools its practitioners use have been around in the enterprise ecosystem for a while now, and some are relatively new. A select few are powered by research at the cutting edge of computer science (for example, deep learning).