“Ask not what your data can do for you, ask what you can do for your data.”

Introduction

Self-service analytics is the Holy Grail for end users, IT, and analytics staff. When it is realized, users can access and create reports and dashboards, develop visualizations to analyze and understand problems and opportunities, and combine data to answer new kinds of questions. Users do not have to wait for someone else to create the information they need now — they can do it themselves. IT and analytics professionals, on the other hand, can step back to more of a supporting role, experience a reduction in the backlog of work requests, and take on more challenging applications that create business value. It’s a win-win situation.

Unfortunately, self-service analytics has yet to reach its potential. Adoption rates are picking up but are still lower than desired, even though the business need exists. Users are more computer savvy and software (such as Tableau, Qlik, and Power BI) continues to get better. So, what’s the problem? There are several possible impediments, with various combinations applying in different situations. Below are some of the hurdles we commonly see:

  • Some users prefer that other people create the information that is needed (the “that’s not my job” attitude)
  • The software is perceived to be too difficult to use, especially by casual users
  • Needed data is not readily available
  • Integrating data from multiple sources is too difficult
  • Training is either inadequate or ineffective

While attending the TDWI Strategy Summit in Las Vegas in February 2020, Doug Laney, Marie Clark (GE Entrepreneur in Residence and founder of Ambient Intelligence), and Hugh Watson chatted about self-service analytics and discovered that we share similar thoughts about its problems, including one that doesn’t get enough attention: what we call the data literacy problem. We noted how quick many organizations are to give their business professionals the keys to analytics tools before those users have a sufficient appreciation of the data and how to work with it. Let’s consider the good, the bad, and the ugly of self-service analytics, with an in-depth look at the data literacy problem and potential solutions.

The Long Path

For the past 20 years, software vendors have touted self-service analytics with claims about how their products could help business users be more self-sufficient. The cold reality is that most of the early product offerings were perceived to be too difficult to use. It didn’t help that typical business users were often underrepresented on software selection committees, which usually resulted in software choices that were high in functionality but low in usability.

The good news is that current software is both powerful and easier to use. For example, the ability to implement different personas makes it easier to provide the right mix of functionality and ease-of-use for different kinds and groups of users. The integration of AI capabilities also helps. For example, contemporary analytics software can suggest the best visualization (i.e., auto-charting), automatically build a dashboard based on user-specified metrics, and generate text that explains significant findings in visualizations and charts. The further development of virtual (i.e., digital) assistants will make it easier to simply ask for and receive analytics-related information.

The need for data wrangling (also referred to as data munging) is another long-standing barrier. Users must access the appropriate source or sources of data, possibly make transformations, and, in the case of multiple data sources, integrate the data into a single table. Performing these tasks requires data management skills that most users do not possess and that can be challenging to learn.

Once again, software vendors responded with AI capabilities that make data wrangling easier (but still difficult for some users). For example, analytics software automatically identifies whether a source data field is categorical, numerical, text, or date. It may identify and recommend fixes for inconsistent data, such as out-of-range values. It can automatically identify and execute potential joins (the user may have to specify the type of join).
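To make the idea of automatic field typing and range checking concrete, here is a minimal sketch in Python. It is our own illustration, not how any particular analytics product works: the function names, the two date formats, and the "half or fewer distinct values means categorical" rule are all assumptions chosen for simplicity.

```python
from datetime import datetime


def infer_field_type(values):
    """Classify a column of raw strings as numerical, date, categorical, or text.

    Hypothetical heuristic for illustration only; real tools use richer rules.
    """
    cleaned = [v.strip() for v in values if v and v.strip()]
    if not cleaned:
        return "text"

    def is_number(s):
        try:
            float(s)
            return True
        except ValueError:
            return False

    def is_date(s):
        # Assumed formats; extend the tuple for other conventions.
        for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
            try:
                datetime.strptime(s, fmt)
                return True
            except ValueError:
                pass
        return False

    if all(is_number(v) for v in cleaned):
        return "numerical"
    if all(is_date(v) for v in cleaned):
        return "date"
    # Few distinct values relative to the row count suggests a category.
    if len(set(cleaned)) <= max(1, len(cleaned) // 2):
        return "categorical"
    return "text"


def out_of_range(values, low, high):
    """Return (row index, value) pairs whose numeric value falls outside [low, high]."""
    return [(i, float(v)) for i, v in enumerate(values)
            if not low <= float(v) <= high]
```

A user who understands rules like these is better placed to spot when the software has guessed wrong, for example when ZIP codes are typed as numbers.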

Despite these ease-of-use improvements, there still seems to be a missing piece to self-service analytics, and we think that it’s greater data literacy. Achieving it is the joint responsibility of IT, analytics, information, and business professionals working together.

Data Literacy

At the TDWI Strategy Summit, Doug Laney, in a moment of adapted political nostalgia, blurted “Ask not what your data can do for you, ask what you can do for your data.” We had a quick laugh, then got down to outlining what this means.

Our contention is that business professionals can do themselves and their organization’s data a great service by better curating, understanding, managing, and preparing their data before attempting any kind of analysis. It’s often been said that you can’t manage what you don’t measure. And it follows that you can’t create value from what you’re not managing well. Nowhere else is this more evident than in today’s data landscape.

Understanding Your Data

So business professionals, do your data a favor by getting to know it better. This is a key first component of data literacy. Understand where it originates (its provenance), its age, its lineage, its limits, its regulatory usage restrictions, its biases, how it may combine other data sources, and its real (not assumed) business meaning. Does your data represent a complete set of something (e.g., sales transactions) or only a subset (e.g., only sales of items that were not returned)? Definitions matter too: most organizations have a dozen or more definitions of what constitutes “a customer” (e.g., individual, household, current/former, shipping, billing), and at least as many data sets for each version. Learn how others have used similar data, both inside and outside your organization. Moreover, understand how any given data set relates to others. Reading a conceptual or logical data model isn’t really that difficult, presuming one exists. And data profiling tools, which are integrated into many analytics platforms, can help you understand the ranges, averages, outliers, and other measures of the data itself.
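For readers curious about what a data profile actually computes, the following is a minimal sketch using only Python's standard library. The function name `profile` and the choice of the common 1.5 × IQR rule for flagging outliers are our assumptions; commercial profiling tools report many more measures.

```python
import statistics


def profile(values):
    """Summarize a numeric column: count, range, mean, median, and IQR outliers."""
    data = sorted(values)
    # Quartiles split the sorted data into four equal-sized groups.
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "count": len(data),
        "min": data[0],
        "max": data[-1],
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        # Values beyond the fences deserve a second look before analysis.
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical order quantities with one suspicious entry.
summary = profile([12, 15, 14, 13, 16, 15, 14, 250])
```

Here the single value of 250 pulls the mean far above the median, which is exactly the kind of signal that should prompt a question about the data's meaning before any chart is drawn.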

Curating Data

Business leaders are desperate to generate more economic value from the information assets available to them. This means that they should focus not just on their own data but also on data available externally. So, add value to your data by identifying external or alternative data assets that may enhance, or improve upon, those you already have. These may take the form of enhanced customer data from any of a variety of data brokers, competitor pricing data harvested from competitors’ websites, social media insights to augment your own customer support and feedback data, or global economic indicators to enhance your own forecast data. Unfortunately, in a room of about 100 data and analytics leaders at the Strategy Summit, not one person acknowledged that their organization has anyone dedicated to identifying and arranging access to external data sources. Yet each of their organizations has an entire department dedicated to procuring office furniture or other material assets.

Managing Your Data

Be circumspect about moving or copying data. Every time data is extracted, copied, or moved from one place to another, your organization incurs certain risks and expenses. You incur the risk that the copy becomes out of sync with the source, rendering it inaccurate or incomplete. You increase your attack surface for data breaches. And you incur additional expenses to sync, store, and manage yet another data set. So do your data another favor and leave it where it lives: either in an accessible operational database or, more likely, in a data warehouse or data lake. Technologies abound for creating virtual databases that are easier to understand, manipulate, and analyze than an enterprise data warehouse is to navigate. Resist creating dangerous Excel extracts of valuable corporate data assets, regardless of how expedient it may be to do so.
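One simple form of the "leave it where it lives" idea is a database view, which stores a query rather than a copy of the rows. The sketch below uses Python's built-in `sqlite3` module with an in-memory database standing in for a warehouse; the table, view, and column names are hypothetical, and enterprise data virtualization products go well beyond this.

```python
import sqlite3

# An in-memory database stands in for the warehouse in this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (order_id INTEGER, region TEXT, amount REAL, returned INTEGER);
    INSERT INTO sales VALUES (1, 'East', 120.0, 0), (2, 'West', 80.0, 1),
                             (3, 'East', 200.0, 0), (4, 'West', 50.0, 0);
    -- The view stores a query, not a second copy of the rows.
    CREATE VIEW net_sales_by_region AS
        SELECT region, SUM(amount) AS net_amount
        FROM sales
        WHERE returned = 0
        GROUP BY region;
""")


def net_sales(connection):
    """Read the simplified view as a {region: net_amount} dictionary."""
    rows = connection.execute(
        "SELECT region, net_amount FROM net_sales_by_region ORDER BY region")
    return dict(rows)


before = net_sales(conn)
conn.execute("INSERT INTO sales VALUES (5, 'West', 70.0, 0)")
after = net_sales(conn)  # reflects the new sale with no re-extract
```

Because the view is evaluated against the source table each time it is read, it cannot drift out of sync the way a spreadsheet extract can.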

Preparing Your Data

Your data may not be ready to serve you in its current form. Don’t expect raw data, or even data that exists in the data warehouse, to fully meet your analytic needs. So do your data a favor by integrating it with other data, subsetting it, filtering it, tagging it, cleansing it, or otherwise transforming it for improved analysis. Ideally, you want to do this within the existing data warehouse environment. Data leaders such as Chief Data Officers (CDOs) should ensure that these kinds of basic data prep services or technologies are available to business professionals, and that sufficient training on them exists.
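The preparation steps named above can be sketched in a few lines of Python. This is an illustration under our own assumptions, not a recommended toolchain: the field names, the rule that negative or missing amounts are invalid, and the 100-unit threshold for the "large" tag are all hypothetical.

```python
def prepare(orders, customers):
    """Cleanse, filter, integrate, and tag order rows.

    Returns the prepared rows plus the IDs of orders with no matching customer.
    """
    # Cleanse: normalize the join key so ' c001 ' and 'C001' match.
    by_id = {c["customer_id"].strip().upper(): c for c in customers}
    prepared, unmatched = [], []
    for order in orders:
        amount = order["amount"]
        # Filter: drop rows with missing or negative amounts (assumed invalid).
        if amount is None or amount < 0:
            continue
        key = order["customer_id"].strip().upper()
        customer = by_id.get(key)
        if customer is None:
            # Keep track of what failed to integrate instead of hiding it.
            unmatched.append(order["order_id"])
            continue
        prepared.append({
            "order_id": order["order_id"],
            "customer_id": key,
            "segment": customer["segment"],                # integrate
            "amount": amount,
            "size": "large" if amount >= 100 else "small",  # tag
        })
    return prepared, unmatched


orders = [
    {"order_id": 1, "customer_id": " c001 ", "amount": 150.0},
    {"order_id": 2, "customer_id": "C002", "amount": -20.0},
    {"order_id": 3, "customer_id": "c003", "amount": 40.0},
    {"order_id": 4, "customer_id": "C001", "amount": None},
    {"order_id": 5, "customer_id": "C002", "amount": 60.0},
]
customers = [
    {"customer_id": "C001", "segment": "Enterprise"},
    {"customer_id": "C002", "segment": "Consumer"},
]
prepared, unmatched = prepare(orders, customers)
```

Returning the unmatched order IDs alongside the prepared rows is a deliberate choice: a data-literate user should know how much data fell out during integration, not just what survived.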

Analyzing Your Data

Additionally, do your data a favor by learning more than basic analytic functions and operations. Learn how to develop and test hypotheses, and which types of visual representations are appropriate for enabling different types of insights. If all you know is how to create pie charts and bar charts, or a basic linear regression, then you’re doing your data and your organization a disservice. The better prepared you are to perform diagnostic, predictive, or prescriptive analyses on the data, the more your data will do for you and your organization.
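As one example of testing a hypothesis rather than eyeballing a chart, the sketch below runs a two-sided permutation test on the difference between two group means, using only Python's standard library. The function name, the default of 10,000 shuffles, and the sample figures are our assumptions; it is one of several valid ways to test such a difference.

```python
import random
import statistics


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns (observed difference, estimated p-value).
    """
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = statistics.mean(group_a) - statistics.mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are interchangeable.
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Adding 1 to both counts keeps the estimate away from exactly zero.
    return observed, (extreme + 1) / (n_permutations + 1)


# Hypothetical weekly sales for stores with and without a promotion.
with_promo = [52, 55, 58, 60, 61, 63]
without_promo = [40, 41, 43, 44, 45, 47]
difference, p_value = permutation_test(with_promo, without_promo, n_permutations=2000)
```

A small p-value says that a gap this large would rarely arise from random labeling alone. It does not say why the gap exists, which is where knowledge of the data's provenance and limits comes back in.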

The Last Mile

Many business professionals will not recognize and follow our prescriptions for increased data literacy without help. Users need training and support from IT and analytics if they are to realize the full potential of self-service analytics. Let’s consider what IT, analytics, and users need to do.

Training

While company training programs for analytics have improved over the years, many still fall short. Let’s consider what some companies are doing and what they need to do. Training efforts can be thought of as existing on different levels.

Level 1: Learning the Software. Here the emphasis is on learning the software tool — what to click. There is typically an accompanying data set to use, but it is not your company’s data and is most likely from a different industry. The training may be in-house or taken online. For most bright people, learning the software is relatively easy.

Level 2: Learning about the Data. In addition to learning the software tool, users are shown what company data is available, how to access it, and how to use it. Some of our prescriptions for greater data literacy are applicable here. This training is normally done in-house because it employs actual company data. It is an improvement over Level 1, but unfortunately, users often return to their jobs and quickly forget much of what they have learned because they have not used it in their work.

Level 3: Using the Software and Data on Actual Work. Here the business professionals are required to complete a project related to their job using the software and whatever data is needed for their application. It is our experience that this is challenging for most users, but once they are successful, they feel a high level of satisfaction and are likely to use the tool in the future. All of the prescriptions for greater data literacy should be included in this training level. Much of the training involves coaching and helping the business professionals develop their applications. Users may also be asked to present their completed projects to others. Once users have finished Level 3 training, they are ready to be certified and given the “keys” (i.e., logon credentials) to start using the analytics tool in their jobs.

Support

Continuing support is needed for self-service analytics and can be provided in a variety of ways. There can be open houses where new and current analytics users drop in to talk with the analytics staff and discuss any topic of interest. User group meetings can be held in-house or online for the analytics staff to share information (e.g., new data sets) or for users to ask questions. The analytics staff can provide a “doctor service” and make “house calls” (i.e., visit users who are having a problem developing an application) or give users help over the phone. Training programs can be conducted to provide advanced skills, such as the use of analytics beyond reports, dashboards, and visualizations. You can establish an analytics center of excellence (ACE) and/or formal data literacy and certification program to prepare your business professionals to do more with their data.

Conclusion

For self-service analytics to be successful, more is required than training users on the software and providing a cursory understanding of what data is available. Users should have a data literacy that includes knowing what data is available (including its weaknesses), how data can be accessed and combined from a variety of sources, good data management practices, how to profile and transform data that is going to be analyzed, and more than the most basic visualizations and analytics. Go beyond Level 1 and 2 training and require users to build applications that they will use in their jobs. And then give them strong support so they can be successful in their efforts and develop greater skills over time.

 

Hugh J. Watson is a Professor and holds the C. Herman and Mary Virginia Terry Chair of Business Administration in the Terry College of Business at the University of Georgia. He is Senior Editor of the Business Intelligence Journal.

hwatson@uga.edu

Doug Laney is Principal, Data & Analytics Strategy, with Caserta. He is the author of the best-selling book, Infonomics.

doug.laney@caserta.com

A version of this piece originally appeared in TDWI’s Business Intelligence Journal.