Visual Artifacts in Data Analysis

It’s difficult to overestimate the value of visualization in data analysis. Visual representations of data should not be considered merely the results of an analysis process, but rather essential tools and methods that should be applied at every stage of working with data. When dealing with specific data and questions, we often find it useful to add non-standard visual elements that are adapted to the characteristics of the data, the goals of analysis tasks, or individual and organizational requirements. We refer to such new elements as analysis artifacts, which can be defined as visual products of analysis methods, either general or specific to a domain and scenario, providing additional context for the results or the analysis process. Various goals may be identified for specific analysis artifacts, but their general role is to make the analysis user experience more accessible, adaptable and available.

Analysis artifacts can take many forms, from text elements, through supplementary data series, to new custom shapes and visual constructs. The simplest example is a contextual text annotation, often added automatically, with details regarding the data or the process (see example). Some analysis artifacts are generic, as they address data properties like the cyclical or seasonal nature of a time series, patterns and trends (e.g. the rent case study), outliers and anomalies, or differences and similarities against population data. Others are specific to a domain and/or type of analysis task, and may be closely integrated with methods implemented in related analysis templates. In practice, we can think about different types of analysis artifacts in terms of the tasks used in analysis decision support (a minimal code sketch follows the list):

  • SUMMARIZE the structure and quality of data, results of analysis or execution of the process
  • EXPLAIN applied analysis methods, identified patterns, trends or other interesting discoveries
  • GUIDE through the next available and recommended steps at a given point of the analysis process
  • INTEGRATE the data from different sources and the results from different methods and agents
  • RELATE the results, data points, data series, big picture context, and input from different users
  • EXPLORE anomalies, changes, hypotheses and opportunities for experimentation or side questions
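
To make these task types a bit more tangible, here is a minimal sketch of how an analysis artifact could be represented as a data structure tagged with one of the tasks above. The names and fields are ours, invented for illustration, and are not the framework’s actual API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any

# Hypothetical task types mirroring the list above; not part of any published API.
class ArtifactTask(Enum):
    SUMMARIZE = "summarize"
    EXPLAIN = "explain"
    GUIDE = "guide"
    INTEGRATE = "integrate"
    RELATE = "relate"
    EXPLORE = "explore"

@dataclass
class AnalysisArtifact:
    """A visual product of an analysis method, attached to a chart or workflow."""
    task: ArtifactTask                            # the decision-support task it serves
    label: str                                    # e.g. an automatically generated text annotation
    anchor: Any = None                            # data point, time range or series it refers to
    payload: dict = field(default_factory=dict)   # extra data used to render the artifact

# Example: a contextual text annotation summarizing data quality.
note = AnalysisArtifact(
    task=ArtifactTask.SUMMARIZE,
    label="3 of 120 samples were missing and interpolated",
    payload={"missing": 3, "total": 120},
)
```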

Figure 1 includes an example of a domain- and scenario-specific analysis artifact (INTEGRATE type) for stock market transactions. This artifact illustrates a single purchase of stock and the subsequent sales, using historical quotes as the background. For each sale event, information about the number of shares sold, their value and the gain/loss is included.

Figure 1. Analysis artifact for stock transactions, with a single purchase and multiple sales events (SVG)

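The values displayed by this artifact are straightforward to derive from the transaction records. The sketch below only illustrates that arithmetic, using hypothetical Purchase and Sale structures and made-up numbers, not our framework’s actual data model.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical transaction structures, used only to illustrate the arithmetic.
@dataclass
class Purchase:
    day: date
    shares: int
    price: float     # price per share at purchase

@dataclass
class Sale:
    day: date
    shares: int
    price: float     # price per share at sale

def sale_annotations(purchase: Purchase, sales: list[Sale]) -> list[dict]:
    """For each sale, derive the values shown by the artifact: shares sold, value, gain/loss."""
    annotations = []
    for sale in sales:
        annotations.append({
            "day": sale.day,
            "shares": sale.shares,
            "value": round(sale.shares * sale.price, 2),
            "gain": round(sale.shares * (sale.price - purchase.price), 2),
        })
    return annotations

# Example with made-up numbers.
buy = Purchase(date(2015, 3, 2), shares=100, price=40.0)
sells = [Sale(date(2015, 6, 1), 40, 46.0), Sale(date(2015, 9, 1), 60, 38.5)]
print(sale_annotations(buy, sells))
```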

Analysis artifacts can be considered a means of adapting the analysis experience to the context of analysis tasks and the user’s needs, requirements and preferences. They can be highly customized and personalized, leading to adaptive user experiences that are more effective and efficient at controlling analysis processes and interpreting results. These capabilities make analysis artifacts very powerful tools for complex decision problems and situations. They can be very useful when dealing with imperfect data, potential information overload, strong external requirements (e.g. time constraints) or configurations with multiple participants and incompatible goals and priorities. We found that they can also be helpful beyond visualization, as the artifact-related data structures can become subjects of analysis or be applied in a completely different scope or scenario.

For example, Figure 2 presents a simple simulation based on the analysis artifact presented in Figure 1. The data structures related to that artifact are used here to illustrate a hypothetical scenario of investing the same amount of money in a different stock and following the exact same sales schedule (selling the same percentage of stock at each time point).

Figure 2. Simulation of the purchase and sell transactions from Figure 1 applied to a different stock (SVG, 2, 3)

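A minimal sketch of that simulation, reusing the hypothetical Purchase and Sale structures from the previous sketch, could look as follows. We read “the same percentage” as the fraction of the remaining position sold at each time point, which is one possible interpretation of the schedule.

```python
def simulate_on_other_stock(purchase, sales, other_prices):
    """Invest the same amount in a different stock and repeat the sales schedule.

    `purchase` and `sales` are the structures from the previous sketch;
    `other_prices` maps each relevant date to the other stock's price per share.
    """
    budget = purchase.shares * purchase.price
    position = budget / other_prices[purchase.day]    # shares of the other stock bought
    remaining_original = purchase.shares
    events = []
    for sale in sales:
        fraction = sale.shares / remaining_original   # share of the remaining position sold
        sold = position * fraction
        events.append({"day": sale.day,
                       "shares": round(sold, 2),
                       "value": round(sold * other_prices[sale.day], 2)})
        position -= sold
        remaining_original -= sale.shares
    return events
```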

We think about visualization of our data as merely a canvas upon which we can paint specialized and personalized artifacts from our analysis processes. These artifacts can be applied not only in the scope of individual charts, but also in interactive decision workflows, based on multiple data sources, that may require many integrated visualizations in order to provide sufficient context for a decision maker. This is especially important for data analysis in a social context, with advanced collaboration scenarios involving multiple human participants and AI agents. As the complexity of algorithms and models increases, we need to provide significantly more user-friendly analysis environments for an expanding number and variety of users. Multimodal interactions and technologies for virtual or mixed reality have great potential, but the best way to deal with complexity is to focus on simplicity. Analysis artifacts seem to be a natural approach to that challenge, and they should lead us to brand new types of data analysis experiences, which may soon be necessities, not just opportunities.

Visual Decision Support

In this blog post, we'll use examples from the first prototype implemented at Salient Works - a set of libraries and applications for decision support in air travel scenarios. The primary motivation for that project was to address challenges related to long-distance air travel, which can be very stressful, even when everything is going according to plan. A trip usually starts with selecting a connection, when a traveler may experience information overload as different options are difficult to compare. During the trip, transfers between flights can be especially overwhelming due to time zones, unfamiliar airports and overall travel fatigue. In that context, any unexpected event, like a missed or cancelled flight, can be traumatic, especially for older or less frequent flyers. Many of these factors cannot be controlled, but we can build technical solutions to keep users comfortably informed, assist them at critical points and facilitate their interactions with other entities (like airlines).

Such solutions require effective presentation of information, usually in a visual form (though other types of interfaces can also be applied). We plan to publish a dedicated post about visualization as an essential part of data analysis, but here we’d like to talk about a more specific scope – the role of visualization in decision support. We use the term visual decision support to describe situations when the user experience in a decision scenario is built around a visualization pattern specifically designed to address the requirements of that scenario, with all its context and constraints. In practice, it means that all information required to make a correct decision, or series of decisions, should be delivered to the user at the right time and in a form adapted to the user’s situation and most probable cognitive state. In our prototype applications, the relevant information is mostly related to time and space orientation, and it should be presented in a way that reduces the probability of errors (e.g. caused by lack of sleep) and the stress related to operating in an unknown environment.

Let’s move to specific examples from our prototype. The main idea was to design a simple and universal visualization pattern that could be used consistently throughout different stages of a travel experience, including planning, the actual trip and dealing with emergencies. An example visualization of a trip between Seattle and Poznan using this pattern is presented in Figure 1. The pattern is built around time as perceived by the traveler (horizontal axis), and we placed special emphasis on critical time points like the beginning of each trip segment, as well as on translating time zone differences upon arrival at a destination. The grayed area in the background indicates night time, so it should be easier for a traveler to plan working and resting during a trip. Creating such a visualization is the first step, as it can be customized and personalized (also for accessibility), used in a static itinerary (see example from our prototype) or in a dynamic companion application updated with current details, such as a departure gate.

Figure 1. An example visualization of a trip between Seattle and Poznan

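The time handling behind this pattern can be sketched in a few lines. The airports, times and time zones below are made up for illustration; in the prototype they come from the itinerary data.

```python
from datetime import datetime, timedelta, time
from zoneinfo import ZoneInfo

# Made-up segments for a Seattle-Poznan trip; purely illustrative values.
segments = [
    {"from": "SEA", "to": "FRA",
     "depart": datetime(2016, 5, 10, 13, 30, tzinfo=ZoneInfo("America/Los_Angeles")),
     "arrive": datetime(2016, 5, 11, 9, 5, tzinfo=ZoneInfo("Europe/Berlin"))},
    {"from": "FRA", "to": "POZ",
     "depart": datetime(2016, 5, 11, 11, 40, tzinfo=ZoneInfo("Europe/Berlin")),
     "arrive": datetime(2016, 5, 11, 13, 0, tzinfo=ZoneInfo("Europe/Warsaw"))},
]

def critical_points(segments):
    """Departure and arrival of each segment, as the traveler will see them on local clocks."""
    points = []
    for seg in segments:
        points.append((seg["from"], seg["depart"].strftime("%a %H:%M")))
        points.append((seg["to"], seg["arrive"].strftime("%a %H:%M")))
    return points

def night_window(day: datetime, tz: ZoneInfo):
    """One night interval (22:00-06:00 local time) used to shade the background."""
    begin = datetime.combine(day.date(), time(22, 0), tzinfo=tz)
    return begin, begin + timedelta(hours=8)

print(critical_points(segments))
```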

One key property of this pattern that may not be immediately obvious is the dimension of the vertical axis - in the configuration of our examples it is based on the latitude of visited airports. This property was introduced in order to create unique shapes for different trip options and to make a selected one look familiar and recognizable. After all, the same visual representation will be used during different stages of a trip, starting with its planning. This is actually the stage where the uniqueness of shapes turned out to be most useful, since it made comparison of available options much simpler and cleaner. Figure 2 contains examples of 5 different options for a trip from Seattle to Paris. As you can see, they are all presented using the same time range, so they are much easier to compare, including departure and arrival times, as well as layover durations. We conducted limited usability tests and found that this approach also works for comparing a significant number of options (see multiple results for the same query), especially when combined with multistage selection. Using our visual pattern, we were able to build a fully visual experience for searching, comparing and selecting trips.

Figure 2. Comparison of 5 different options for a trip from Seattle to Paris

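A rough sketch of how each option gets its distinctive shape: every segment contributes two points, whose x coordinate is the time relative to a reference shared by all compared options and whose y coordinate is the airport latitude. The latitudes and times below are approximate and purely illustrative.

```python
from datetime import datetime, timezone

# Approximate airport latitudes, for illustration only.
LATITUDE = {"SEA": 47.4, "KEF": 64.0, "CDG": 49.0}

def option_shape(option, t0):
    """Polyline of (hours from shared reference t0, airport latitude) for one option."""
    points = []
    for seg in option:
        for airport, moment in ((seg["from"], seg["depart"]), (seg["to"], seg["arrive"])):
            points.append(((moment - t0).total_seconds() / 3600.0, LATITUDE[airport]))
    return points

# Using the same t0 for every option keeps the shapes directly comparable.
t0 = datetime(2016, 5, 10, 12, 0, tzinfo=timezone.utc)
option = [
    {"from": "SEA", "depart": datetime(2016, 5, 10, 14, 0, tzinfo=timezone.utc),
     "to": "KEF", "arrive": datetime(2016, 5, 10, 21, 30, tzinfo=timezone.utc)},
    {"from": "KEF", "depart": datetime(2016, 5, 10, 23, 0, tzinfo=timezone.utc),
     "to": "CDG", "arrive": datetime(2016, 5, 11, 2, 20, tzinfo=timezone.utc)},
]
print(option_shape(option, t0))
```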

This was our first big project at Salient Works, and we spent way too much time on its design and prototyping. In addition to core and support libraries, we built a visual search portal (integrated with Google QPX), functionality for generating personalized itineraries and even a proof of concept for a contextual app with a demo for re-scheduling a missed or cancelled connection. Unfortunately, we were not able to establish working monetization scenarios or find partners to introduce our prototypes into production. But we gained a lot of experience, which we later used in the development of our framework, where we implement the concept of visual decision support in a more flexible way, through the application of analysis artifacts associated with different domain libraries and templates. And our prototypes may still find their way into production environments, as we recently came back to the project and adapted this pattern to the visualization of flight time limitations, with pilots and other flying personnel as the intended users.

Healthcare and data analysis

Healthcare is one of the three domains we selected for initial applications of our framework for analysis of time-oriented data (the others are security and finance). It is undoubtedly the most challenging domain, but also the one with the biggest opportunities for delivering meaningful changes and positive impact. Healthcare is currently going through a radical revolution, and due to the nature of this domain, the consequences of the changes will affect everybody. We believe that data analysis is one of the key elements of the next generations of healthcare. But this also works in reverse – requirements and scenarios from this domain are great driving forces for innovation in data analysis.

Healthcare on the eve of a revolution

Healthcare is going through a gale of creative destruction, which will fundamentally change the landscape of the life science industry. It is primarily caused by scientific and technical progress: the revolution in sensors, always-connected devices and the capability to process and store massive amounts of data. But it is not only about the technologies themselves; it is also about how they have already changed users’ behaviors and expectations. Healthcare cannot be disconnected from the digital world that patients are used to. It is interesting to see that while many research efforts are still driven by the life science industry (e.g. genomics becoming more available), there are also strong initiatives originating from companies traditionally involved in information processing (IBM Watson, Microsoft Health or Apple CareKit). The revolution in healthcare is happening, even if there is significant resistance to change among medical professionals.

In our work we obviously focus on data analysis, which is actually very natural in the case of healthcare, as this domain is all about data. It always has been, long before the first signs of the digital revolution. Even when all doctor-patient interactions were direct and one-on-one, they were based on insightful observations, and on the doctor’s knowledge and experience to process them, form a diagnosis and propose a therapy. Now we have much more data that can be available to patients and physicians. The data include genomics, anatomical imaging, physiological metrics, environmental records, and patients’ (and physicians’) behaviors, decisions and observations. There are many functional, technical and business challenges, but these data will eventually flow smoothly in networks of patients, doctors and AI systems. And then we’ll have to focus on different, but already familiar challenges: what to do with the new data that are within our reach, and how to use them to help solve old problems?

Data analysis will change healthcare

It should be clear by now that we’re not talking here about data analysis understood as a basic process with some data on the input and some results on the output. Data analysis in healthcare must be understood in a much broader scope, not limited to selected, even if very useful, tasks like the analysis of anatomical images. Technical elements like data processing, extraction of information, creating models, learning from historical data, or providing practical decision support will always remain essential. But since data are expected to flow from multiple sources, and in many directions, data analysis should become one of the foundations on which the required goal-oriented tasks can be effectively executed. And like any technology designed for digital healthcare, it must be extremely user-centered.

The goals of medicine should always be focused on the well-being of the patient. But in the digital world data are processed automatically, so the scope of data analysis can be much broader without weakening that focus. More importantly, such expansion will actually increase the effectiveness of individual therapies. There are (at least) three levels at which data analysis should be considered in healthcare:

  • Individual healthcare will be radically changed by personalized medicine based on individual characteristics and situation. The context of decision making will be vastly expanded by technology, beyond the knowledge and experience of the medical professionals directly involved. On the other side, patients will become more active participants, also benefiting from decision support mechanisms delivering information in a highly usable form.
  • Healthcare relationships will become more widely recognized as essential to the healthcare experience. There are concerns about the reduced frequency of face-to-face interactions, but technology actually has the potential to help with establishing data flows and building strong, trust-based connections. These should be the foundations for long-term relationships that are simple and convenient in good times, but remain very effective and natural in the case of a medical emergency.
  • Population research will be redefined by the availability of detailed data about individual differences and the ability to select subsets of a population with a high degree of similarity. Data about individual treatment, after proper processing, will be submitted for population analysis and help with the detection of trends and patterns on a local or global level. More importantly, however, results from population analysis will also be used in individual diagnosis and treatment.

Healthcare of the future will be very different, though the details still remain unknown. One thing, however, is certain: it will be based on various types of data automatically collected, shared, processed and analyzed in many ways we are not yet able to foresee.

A pattern case study

Data analysis is not only about big data; there is also big value hidden in small data and simple methods that can be easily adapted to solve big problems. Figure 1 presents a very basic example of a visualization generated using our framework. This simple scenario integrates data streams of different types: physiological measurements of body temperature, and two streams covering medicine intake, shown below the main chart - one with a predefined schedule (MED1), which could be supported by adaptive reminders on a mobile device, and a second with a basic intake record (MED2). The chart also includes analysis artifacts showing the target temperature range, notifications when the range was reached (for the first time, and continuously for 24 hours), and the expected path of temperature change within the selected timespan, assumed to be associated with the scheduled treatment.

Figure 1. Tracking temperature and medicine regime with associated analysis artifacts for target value range, expected change, and detected relevant events (SVG)

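The two notification artifacts from Figure 1 can be derived from the temperature stream with a simple scan. The sketch below assumes a time-ordered list of samples and uses example threshold values; it is an illustration of the idea, not our actual implementation.

```python
from datetime import timedelta

# Example target range in degrees Celsius; the values and the 24h hold period are assumptions.
TARGET = (36.3, 37.2)

def range_notifications(samples, target=TARGET, hold=timedelta(hours=24)):
    """samples: time-ordered list of (timestamp, temperature) pairs.
    Returns (first moment within range, first moment the range was held for 24h)."""
    first_in_range = None
    run_start = None
    held_since = None
    for ts, value in samples:
        if target[0] <= value <= target[1]:
            first_in_range = first_in_range or ts
            run_start = run_start or ts
            if held_since is None and ts - run_start >= hold:
                held_since = ts
        else:
            run_start = None      # the continuous run is broken by an out-of-range reading
    return first_in_range, held_since
```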

This example is based on test data, but it is a useful illustration of the analysis & visualization pattern that could be applied in many medical scenarios. The pattern is based on streams with metrics of the current state, streams with records of selected actions (including a plan), and artifacts generated from the analysis of these data in the context of related models (possibly constructed based on the analysis of similar cases). Healthcare is a very practical domain and it is fundamentally focused on change. It is about learning from the past, understanding current conditions, and planning for the future, with emphasis on available options and the probability of their outcomes. This applies to the context of an individual patient, relationships between patients and physicians, as well as populations of different scales. Healthcare is therefore not only about data in general, but about time-oriented data! This makes it a perfect application domain for our framework.

This is our first post about healthcare and it is the beginning of a longer story. We plan several more healthcare-related posts covering topics like practical security and privacy requirements, specific decision support scenarios and design patterns for interactions between patients and physicians. All posts in this series will be tagged as healthcare. In the meantime, we are always interested in healthcare-related research data we could use for template development or experiments with our framework. If you have time-oriented data and you are interested in their analysis or visualization, please don’t hesitate to contact us.

If you want to learn more about the challenges and opportunities for modern healthcare, you may want to start with a great book, The Creative Destruction of Medicine by Eric Topol.

Everything is a stream

There is a nice consequence of our fixation on time-oriented data - we can consider everything as a data stream. Any input stream processed by our framework is required to include a time variable, from a simple timestamp field to more complex time structures. We use this variable to organize the data, but also to establish a shared dimension for connecting data elements from different sources. Such an approach gives us the opportunity to operate on multiple streams and implement interesting scenarios with the help of various stream transformations (e.g. time aggregation) and analysis methods.

We define data streams as sequences of events (or state snapshots/changes) that are time ordered, i.e. an event with a timestamp earlier than the previous one is flagged as a potential error. The simplest example of such a data stream is a univariate time series consisting of a timestamp and a data point (a single value), often with some additional requirements, like equal spacing between consecutive data points. More complex objects in a data stream can include several variables describing an event, often closely related (e.g. the time and distance of a running workout, enabling calculation of average speed). In many practical application scenarios, these objects can be models (e.g. snapshots of a system’s attack surface) or collections of unstructured data (e.g. articles published on a subject).
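
A minimal sketch of that contract, with names of our own choosing rather than the framework’s actual API, could look like this:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any

# A sketch of the stream contract described above; the names are illustrative only.
@dataclass
class StreamEvent:
    timestamp: datetime                                    # the required time variable
    values: dict[str, Any] = field(default_factory=dict)   # a single value, several variables, or a model snapshot

def flag_out_of_order(events):
    """Yield (event, suspect) pairs; an event earlier than its predecessor is flagged."""
    latest = None
    for event in events:
        suspect = latest is not None and event.timestamp < latest
        yield event, suspect
        latest = event.timestamp if latest is None else max(latest, event.timestamp)
```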

The beauty of a time variable being present in all our data is that it can become a shared dimension, enabling the integration of streams from different sources and building a bigger (or just cleaner) picture from seemingly disconnected elements. Let’s illustrate it with a case study based on very simple historical rent data. The input stream includes monthly rent payments and additional variables, like the location of the rented place and tags indicating lease renewal events. The interim visualization of the rent series is presented in Figure 1, with location and renewal events added as annotations (Point Marker artifacts).

Figure 1: Monthly rent payments with locations and renewal events as point annotations (SVG)


The history of rent payments is an obvious example of time-oriented data, but in this case study the more interesting variables are the location and lease renewal events, all connected by time (at monthly granularity). Let’s start with the location variable - we can detect when its value changes and automatically create an annotation (Time Marker artifact) indicating the event of moving between apartments. The lease renewal variable, on the other hand, can be used to identify the critical data points where rent is most likely to change, and to produce a new series with the percentage difference between consecutive renewals. Figure 2 includes the results of these operations.

Figure 2: Monthly payments with markers for moving and differences between lease renewals (SVG)

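Both operations shown in Figure 2 are simple scans over the input stream. The sketch below uses a hypothetical record layout (month, rent, location, renewal flag) and made-up values, purely for illustration.

```python
def moving_events(records):
    """Time Marker events where the location variable changes between records.
    records: time-ordered list of dicts with 'month', 'rent', 'location', 'renewal'."""
    markers = []
    for prev, curr in zip(records, records[1:]):
        if curr["location"] != prev["location"]:
            markers.append({"month": curr["month"], "label": f"moved to {curr['location']}"})
    return markers

def renewal_changes(records):
    """New series: percentage rent difference between consecutive lease renewals."""
    renewals = [r for r in records if r["renewal"]]
    return [{"month": b["month"],
             "change_pct": round(100.0 * (b["rent"] - a["rent"]) / a["rent"], 1)}
            for a, b in zip(renewals, renewals[1:])]

# Example with made-up data.
history = [
    {"month": "2014-01", "rent": 1200, "location": "Capitol Hill", "renewal": True},
    {"month": "2014-02", "rent": 1200, "location": "Capitol Hill", "renewal": False},
    {"month": "2015-01", "rent": 1320, "location": "Capitol Hill", "renewal": True},
    {"month": "2015-06", "rent": 1450, "location": "Ballard", "renewal": True},
]
print(moving_events(history))
print(renewal_changes(history))
```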

The requirement of a time variable is one of the foundations of the framework we are creating. Time gives order to a stream of events, but also provides a frame of reference for data analysis. We can use time variables to effectively manage multiple different streams, create useful streams based on multiple input sources, or extract key information from streams of unstructured data. Obviously, there are challenges related to time granularities, irregular sampling or overlapping intervals and data points. All these problems, however, can be solved with the help of the right technology.
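
As an example of such a transformation, the sketch below aligns two streams of different granularities on a shared daily index using pandas; this is just one possible approach and made-up data, not a description of the framework’s internals.

```python
import pandas as pd

# Two streams with different, irregular sampling; values are illustrative.
temperature = pd.Series(
    [36.9, 37.4, 37.1, 36.8],
    index=pd.to_datetime(["2016-03-01 08:00", "2016-03-01 20:00",
                          "2016-03-02 08:30", "2016-03-02 21:00"]))
doses = pd.Series(
    [1, 1, 1],
    index=pd.to_datetime(["2016-03-01 09:00", "2016-03-01 21:00", "2016-03-02 09:00"]))

# Time aggregation to a shared daily granularity makes the streams directly comparable.
daily = pd.DataFrame({
    "temp_mean": temperature.resample("D").mean(),   # irregular samples -> daily mean
    "doses": doses.resample("D").sum(),              # intake events -> daily count
})
print(daily)
```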