Data is a thorny subject. For a start, we’re not even sure how we are supposed to refer to it, because data is the plural of datum. Strictly speaking, we should talk about data that ‘are’ (not ‘is’) available to support a theory and so on. The Guardian newspaper has discussed the debate and appeared to suggest that (split infinitives and nuances of idiomatic Latin notwithstanding) our day-to-day usage of the term is allowed to remain conveniently grammatically incorrect.
“For what it’s worth, I can confidently say that this will probably be the only time I ever write the word ‘datum’ in a [blog] post. Data as a plural term may be the proper usage, but language evolves and we want to write in terms that everyone understands – and that don’t seem ridiculous,” wrote Simon Rogers, in 2012, before moving to his position as data editor at Google.
So, of the many different instances of individual datum (sorry, data) that exist, can we group them into distinct types, categories, varieties and classifications? In this world of so-called digital transformation and cloud computing that drives our always-on, über-connected lifestyles, surely it would be useful to understand the what, when, where and why of data before we start to appreciate the how.
1 – Big data
A firm favorite, big data has come to be defined as something like: data in such volumes that it will not practically fit into a standard (relational) database for analysis and processing, a consequence of the huge amounts of information now created by human and machine-generated processes.
“While definitions of ‘big data’ may differ slightly, at the root of each are very large, diverse data sets that include structured, semi-structured and unstructured data, from different sources and in different volumes, from terabytes to zettabytes. It’s about data sets so large and diverse that it’s difficult, if not impossible, for traditional relational databases to capture, manage, and process them with low latency,” said Rob Thomas, general manager for IBM Analytics.
Thomas suggests that big data is a big deal because it’s the fuel that drives things like machine learning, which form the building blocks of artificial intelligence (AI). He says that by digging into (and analyzing) big data, people are able to discover patterns to better understand why things happened. They can also then use AI to predict how they may happen in the future and prescribe strategic directions based on these insights.
2 – Structured, unstructured, semi-structured data
All data has structure of some sort. Delineating between structured and unstructured data comes down to whether the data has a pre-defined data model and whether it’s organized in a pre-defined way.
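To make that distinction concrete, here is a minimal Python sketch. The record names, field names and the `fits_model` helper are hypothetical and chosen purely for illustration: a structured record is checked against a fixed, pre-defined model (much like a table's columns), a semi-structured JSON record carries its own keys but has no schema enforced up front, and unstructured text carries meaning with no data model to query against at all.

```python
import json
from dataclasses import dataclass, fields


# Structured: every record must match this pre-defined model,
# much as every row in a relational table must match its columns.
@dataclass
class CustomerRow:
    customer_id: int
    name: str
    country: str


# Hypothetical column list derived from the model above.
COLUMNS = [f.name for f in fields(CustomerRow)]

structured = [
    CustomerRow(1, "Ada", "UK"),
    CustomerRow(2, "Grace", "US"),
]

# Semi-structured: keys are self-describing, but nothing forces
# every record to carry the same ones.
semi_structured_text = """
[
  {"customer_id": 1, "name": "Ada", "country": "UK"},
  {"customer_id": 2, "name": "Grace", "tags": ["vip"]}
]
"""
semi_structured = json.loads(semi_structured_text)

# Unstructured: the information is in there, but there is no model to query.
unstructured = "Ada from the UK called on Tuesday to ask about her order."


def fits_model(record: dict, columns: list) -> bool:
    """Return True if the record has exactly the pre-defined columns."""
    return set(record) == set(columns)


print([fits_model(r, COLUMNS) for r in semi_structured])
```

Run as-is, the final line prints `[True, False]`: the first JSON record happens to match the pre-defined model, while the second (no `country`, an extra `tags` list) does not, even though both are perfectly valid semi-structured data.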
Mat Keep is senior director of products and solutions at MongoDB. Keep explains that, in the past, data structures were pretty simple and often known ahead of data model design, and so data was typically stored in the tabular row-and-column format of relational databases.
“However, the advance of modern web, mobile, social, AI, and IoT apps, coupled with modern object-oriented programming, breaks that paradigm. The data describing an entity (e.g. a customer, product or connected asset) is managed in code as complete objects, containing deeply nested elements. The structure of those objects can vary (polymorphism): for example, some customers have a social media profile that is tracked, and some don’t. And, with agile development methodologies, data structures also change rapidly as new application features are built,” said Keep.
As a result of all this polymorphism today, many software developers are looking towards more flexible alternatives to relational databases to accommodate data of any structure.
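As a rough illustration of the polymorphism Keep describes, the Python sketch below uses plain dictionaries to stand in for documents. The field names (`address`, `social_profile`, `handle`) and both helper functions are hypothetical; this is not MongoDB's API, just a sketch of the shape of the problem. One customer has a nested social media profile and the other has none, so application code has to treat that field as optional, and flattening the objects into a single fixed set of columns leaves NULL-style gaps.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical customer documents: nested elements, and a polymorphic shape
# (only some customers have a tracked social media profile).
customers: List[Dict[str, Any]] = [
    {
        "_id": 1,
        "name": "Ada",
        "address": {"city": "London", "country": "UK"},
        "social_profile": {"network": "example", "handle": "@ada"},
    },
    {
        "_id": 2,
        "name": "Grace",
        "address": {"city": "Arlington", "country": "US"},
        # No social_profile field at all for this customer.
    },
]


def social_handle(customer: Dict[str, Any]) -> Optional[str]:
    """Read an optional nested field without assuming every document has it."""
    profile = customer.get("social_profile")
    return profile.get("handle") if profile else None


def to_fixed_row(customer: Dict[str, Any]) -> Tuple[Any, ...]:
    """Flatten a document into one fixed set of columns, as a single
    relational table would require; missing parts become None (NULL)."""
    address = customer.get("address", {})
    return (
        customer["_id"],
        customer["name"],
        address.get("city"),
        address.get("country"),
        social_handle(customer),
    )


rows = [to_fixed_row(c) for c in customers]
for row in rows:
    print(row)
```

The design trade-off shows up as soon as a new application feature adds a field (say, a loyalty tier) to some documents. The documents can simply carry it, but `to_fixed_row` would silently drop it until the fixed column list, and in a relational setting the table schema, is changed too.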