Big small data

We really like the concept of small data. It seems well suited to the data analysis revolution we are currently in. A lot of data about ourselves, our behavior, interactions and environments are collected on a daily basis, and with the Internet of Everything there will only be more of them. As analysis techniques, from basic to highly specialized, become more widely available for applications, the new challenges are what to do with the data, what questions to ask and how to use the answers. Small data and related paradigms may indicate an interesting path towards turning available data into meaningful and actionable information that can be smoothly integrated into our daily lives.

Small data are relevant for their users

The concept of small data has been discussed for a few years, and there are some interesting efforts to develop its definition. Small data are definitely not about size. They are also not necessarily about always being understood by a user, as the user may be not only an individual but also a group or an organization. Small data are, however, about being relevant for a user, either immediately or potentially, i.e. after processing. Small data can be the output of an analysis (including big data solutions) as well as the input to one. They can be human-sourced, process-mediated or machine-generated. They can include unstructured elements and imperfections. They are often personal, unique and subjectively valuable, ranging from sleep and activity records, through streams of financial transactions, to family histories stored in digitized photographs. In every case, small data exist in a well-defined context of users’ needs, requirements and preferences, and are usually associated with some decision process.

Despite the emphasis on the small aspect of data, this concept is not in opposition to big data. The practical approaches related to these concepts can actually be complementary. Big data solutions can lead to amazing results. With practically unlimited storage, bandwidth and computing power, they create opportunities for conducting advanced analysis on entire populations rather than only on limited subsets. But this also means that the goals of such efforts are usually “big” and not always close to the “small” goals of individual users. In some cases, the two can even be in conflict, especially when data ownership, transparency or privacy are added to the equation. Small data can very often be smoothly integrated with big data solutions, but small data analysis scenarios should always be built around the user’s goals and priorities.

Small data analysis can solve big problems

These considerations may seem theoretical, but they may have very practical consequences, as focusing on small data can lead to new paradigms for designing data analysis solutions.

Small data analysis is about simplicity, a focus on practical scenarios and a close connection to problem domains, with well-defined goals and an expectation of inherently useful results. It is a bottom-up engineering approach that starts with available data, clear questions, and basic methods that can be quickly applied. In further steps, the process can be incrementally expanded as the context becomes better understood and more sophisticated methods can be selected for specific scenarios. Starting a data analysis project with advanced AI algorithms may be very tempting, but it will not necessarily lead to the expected results. At the beginning, there are usually a lot of small treasures hidden in the available data, treasures that can be extracted using relatively simple tools. Advanced techniques are more useful at later stages, when more complex questions have been identified and simple answers are no longer easy to find.
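To make the bottom-up idea a little more concrete, here is a minimal sketch in Python of what a first step might look like: a small personal data set, one clear question, and a basic method. Everything in it is an assumption made for illustration: the sleep log values are invented, and the names `sleep_log` and `average_sleep_by_day_type` are hypothetical. The question asked is simply whether we sleep longer on weekends than on weekdays.

```python
from datetime import date
from statistics import mean

# Hypothetical personal sleep log: (night, hours slept).
# The values are invented for illustration only.
sleep_log = [
    (date(2024, 3, 4), 6.5),   # Monday
    (date(2024, 3, 5), 6.0),
    (date(2024, 3, 6), 7.0),
    (date(2024, 3, 7), 6.5),
    (date(2024, 3, 8), 7.5),   # Friday
    (date(2024, 3, 9), 8.5),   # Saturday
    (date(2024, 3, 10), 8.0),  # Sunday
]


def average_sleep_by_day_type(log):
    """Return mean hours slept on weekdays and on weekends.

    A group with no records is reported as None rather than guessed.
    """
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    weekday_hours = [hours for day, hours in log if day.weekday() < 5]
    weekend_hours = [hours for day, hours in log if day.weekday() >= 5]
    return {
        "weekday": mean(weekday_hours) if weekday_hours else None,
        "weekend": mean(weekend_hours) if weekend_hours else None,
    }


if __name__ == "__main__":
    for day_type, hours in average_sleep_by_day_type(sleep_log).items():
        print(day_type, hours)
```

A simple comparison like this may already answer the user’s question. If it does not, the same log can later be handed to more sophisticated methods, which is the incremental expansion described above.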

Small data are about providing value for their users. We find the concept especially useful as one of the foundational elements for defining personal analysis spaces, i.e. contexts for executing data analysis processes focused on individual goals, with full control over data sharing (or accepting external streams) and tailored experiences. Emphasis on the personal aspects of analysis obviously doesn’t imply isolation or any other limitations. Similarly, a focus on the simplicity of a solution’s design doesn’t mean a restriction to simple applications. On the contrary, small data analysis and the paradigm of incremental expansion seem to be very well suited to applications in complex domains, like digital healthcare. In this case, a system designed with a focus on the user and small data might enable brand new scenarios, based on data that users would not feel comfortable sharing with or submitting to big data solutions.

Small data will need new rules and contracts

Small data analysis is, in a sense, an extremely user-centered approach to data processing. As such, it can create unique opportunities to better understand users’ needs, requirements and preferences. Understanding a user’s context can improve the value of the results of data analysis, and it can also lead to more accurate technical requirements for the design of data analysis solutions, with special emphasis on data flows, dependencies and trust boundaries. Small data have great potential value, especially when it comes to data generated, directly or indirectly, by users themselves. Privacy concerns are only the beginning of the story. The rules and contracts for handling small data (and ultimately benefiting from them) are yet to be determined.