The following is part of a series, “Drowning in Data (but no better informed),” exploring a critical evolution in data analytics. Based on research conducted by Professor Michael Stonebraker of MIT CSAIL, this series outlines how the current speed and scale of data overwhelm most existing analytics and visualization tools, and what to do about it.
This article makes five assertions about the future Internet of Things (IoT) market.
Assertion 1: We’re sensor-tagging everything
We are in the process of sensor-tagging everything of material significance so it can report its state or location in real time.
In a hospital setting, we expect patient wristbands to become active so there will never be another lost patient. Every piece of mobile machinery (e.g., EKG machines) will be sensor-tagged. Over time, we expect blood pressure cuffs to be tagged so they don’t “walk away.” Also, we expect controlled drugs to be sensor-tagged to better track doses. In summary, as sensor technology prices continue to decline, we expect more and more objects to be tagged.
In a home setting, your doorbell and thermostat are already electronic. Your EV reports its battery level electronically. Your cell phone is a major tag. “Medic alert” systems will become even more common among “at risk” populations. One can imagine many more applications of tags in the energy sector. Again, as the price of sensor tags decreases, more household objects will be sensor-tagged.
In a factory setting, we expect most production machines to be sensor-tagged so they can report their status in real time, report throughput, and identify bottlenecks in production. For example, excessive vibration, pressure, or energy usage may signify a device needing urgent maintenance. Over time, more individual units will be tagged. Expensive objects (e.g., John Deere tractors) are already tagged, and that will eventually expand to lesser-value things.
In summary, a revolution is underway as we move toward sensor-tagging almost everything.
Assertion 2: Only if someone cares
To state the obvious, objects will be sensor-tagged only if somebody cares about the tag output. The “observer,” whether a human or a computer, will be interested in two classes of sensor events: individual events that demand immediate attention (e.g., a reading that crosses a critical threshold) and collections of events that must be examined in the aggregate (e.g., the overall state of a factory). There are many examples of the utility of both kinds of events.
Assertion 3: Both analysis and visualization systems are required
Analysis of IoT events requires both an analytics system and a visualization system. If vibration analytics are sufficiently precise, one can simply have a “receiver” analyze the incoming time series traffic and “ring the red telephone” when the situation becomes critical. These alerts can be programmed using your favorite data science tool that is good at computing specialized aggregate data.
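Such a “receiver” can be sketched in a few lines. The following is a minimal, illustrative example, not a production design: the sensor name, vibration limit, and window size are all assumptions invented for the sketch. It alerts only when several consecutive readings exceed the limit, so a single noisy sample does not ring the red telephone.

```python
from collections import deque

# Illustrative assumptions (not from the article):
VIBRATION_LIMIT = 8.0   # assumed critical amplitude (e.g., mm/s)
WINDOW = 5              # consecutive exceedances required to alert

def make_receiver(limit=VIBRATION_LIMIT, window=WINDOW):
    recent = deque(maxlen=window)  # sliding window of the latest readings

    def on_reading(value):
        """Return True ("ring the red telephone") when the last
        `window` readings all exceed `limit`."""
        recent.append(value)
        return len(recent) == window and all(v > limit for v in recent)

    return on_reading

receiver = make_receiver()
readings = [3.1, 9.0, 9.2, 9.5, 9.1, 9.7]
alerts = [receiver(v) for v in readings]
# Only the final reading completes five consecutive exceedances,
# so only the last entry of `alerts` is True.
```

In practice the same logic would be expressed in whatever data science or stream-processing tool you favor; the point is simply that a precise, programmable condition can drive the alert.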
However, suppose there is an earthquake centered on a known fault line. There may be several factories at varying distances from the fault line, with sensor-tagged machinery at each one. It is hard to imagine a single analytic or a suite of analytics that could respond to “Are my factories OK?” A much better solution would be a powerful visualization system that shows an overview and detailed data in a single framework. That is why a complete solution requires an analysis and a visualization system.
Assertion 4: Operation at scale is often required
One can imagine many applications that must monitor thousands to millions of sensor tags. For example, sensor-tagging cars in an urban area to improve traffic flow, or running a citywide medic alert system, requires operating at scale.
Any analysis system that cannot run in parallel on multiple nodes will probably fail (i.e., take forever to run an analysis) at scale. Similarly, any visualization system that depends on a specialized main-memory data structure will fail at scale. Moreover, to be meaningful, an IoT stream should be cross-referenced with non-IoT information such as thresholds, machine models, and last-maintenance dates.
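The cross-referencing step amounts to a join between the event stream and reference data. Here is a small sketch of that enrichment, assuming invented machine names, models, and per-model limits; a real system would perform the equivalent join inside a parallel DBMS rather than in application code.

```python
from datetime import date

# Non-IoT reference data (all values invented for illustration):
machine_meta = {
    "press-07": {"model": "HP-200", "last_maintenance": date(2023, 11, 2)},
    "lathe-03": {"model": "TL-90",  "last_maintenance": date(2024, 3, 15)},
}
model_limits = {"HP-200": 6.5, "TL-90": 4.0}  # vibration limit per model

def enrich(reading):
    # Join one stream reading against machine metadata and model limits.
    meta = machine_meta[reading["machine"]]
    limit = model_limits[meta["model"]]
    return {
        **reading,
        "model": meta["model"],
        "last_maintenance": meta["last_maintenance"],
        "over_limit": reading["vibration"] > limit,
    }

stream = [
    {"machine": "press-07", "vibration": 7.1},
    {"machine": "lathe-03", "vibration": 3.2},
]
flagged = [r for r in map(enrich, stream) if r["over_limit"]]
# Only press-07 exceeds its model's limit in this sample.
```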
The obvious conclusion: Make sure you deal with vendors whose products scale.
Assertion 5: At scale, an IoT repository should be a DBMS
At scale, an IoT repository should be a DBMS, and time-series events are best stored and monitored in a time-series DBMS. In particular, assembling the time series of readings from a given sensor should be a very high-speed operation.
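The access pattern a time-series store must make fast can be shown with plain SQL. In this sketch, SQLite stands in for a purpose-built time-series DBMS (the table and sensor names are invented); the composite index on (sensor_id, ts) is what turns "assemble the series for one sensor" into a near-sequential scan.

```python
import sqlite3

# In-memory SQLite as a stand-in for a time-series DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor_id TEXT, ts INTEGER, value REAL)")
# The composite index clusters each sensor's readings in time order.
conn.execute("CREATE INDEX idx_sensor_ts ON readings (sensor_id, ts)")

rows = [("s1", 1, 0.5), ("s2", 1, 1.0), ("s1", 2, 0.7), ("s1", 3, 0.6)]
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)

# The key operation: the full time series for one sensor, in order.
series = conn.execute(
    "SELECT ts, value FROM readings WHERE sensor_id = ? ORDER BY ts",
    ("s1",),
).fetchall()
# series == [(1, 0.5), (2, 0.7), (3, 0.6)]
```

A dedicated time-series DBMS adds compression, retention policies, and parallel execution on top of this same pattern, but the indexing idea is the core of why the retrieval is fast.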
About the Author
Michael Stonebraker is a pioneer in database research and technology. He joined the University of California, Berkeley, as an assistant professor in 1971 and taught in the computer science and EECS departments for 29 years. While at Berkeley, Stonebraker developed prototypes for the INGRES relational database management system, POSTGRES (an object-relational DBMS), and the Mariposa federated system.
He is the founder of three successful Silicon Valley startups whose objective was commercializing these prototypes. Stonebraker has authored scores of research papers on database technology, operating systems, and the architecture of system software services. He was awarded the ACM System Software Award in 1992 (for INGRES) and the Turing Award in 2015.
Stonebraker was elected to the National Academy of Engineering and is presently an adjunct professor of computer science at MIT’s Computer Science and AI Laboratory (CSAIL).