Before we get started, I want to say that I’m absolutely impressed by the questions and comments that the first and second articles in this series generated. All your feedback seems to indicate that, in spite of all the hype around Big Data, understanding fundamental concepts like the data lifecycle, insights, the true value of data and so on is still a challenge. A few comments suggest that complexity is to blame. Steve Jobs once said, “Simple can be harder than complex: You have to work hard to get your thinking clean to make it simple.”
Getting our “thinking clean” is one of the reasons that led me to do this series.
Today we’ll dive into the first part of the Journey, the Data. Let’s start by agreeing that Data is everything. You and I are Data. Our own DNA is a treasure trove of data. Data is both Form and Function. Data has a Life of its own.
You probably remember that I ended the last article with a question: “What would the world look like to you if you were that ONE text message your cell phone just sent out?” My intent was to get you to think about what the world would look like … to a text message. Or … to a banking transaction record? Or … to a picture someone just posted on Instagram?
This quick experiment leads us to realize that approaching Data from the inside out can greatly reduce the dimensionality of the external universe, especially when it comes to gathering and analyzing Data. It is like peeking through a keyhole, except that you and I are *inside* the Data (room). This is all good, however … does this thought experiment have any merit in the real world? Fortunately, Einstein’s famous elevator thought experiment taught us that we can learn about reality just by thinking … and then use Data to vet our newly acquired knowledge!
In short, taking an “Insider” approach to understanding Data greatly simplifies the problem domain. The immediate benefit is that models become easier to train, variability is greatly reduced, and the noise level goes way down. I hope we can all agree that is a good starting point for our exploration. The ability to map out our journey in small (byte-sized) quantities and in a highly predictable fashion is exactly what we were looking for.
As a quick side note, it’s worth mentioning that this “Insider” approach to understanding Data is just one of the many perspectives the industry has pioneered over the years. Other perspectives (or approaches) include the Observer Approach, the Consumer Approach, or a combination of the two. These are all topics for future articles.
Now that we’ve established our first vantage point, let’s look at another key characteristic of Data: the Data lifecycle.
Essentially, Data goes through an evolution of its own, just like everything else in the Universe. In its initial state, when the Data is gathered, it is nothing more than an assembly of bits. Next, Data starts taking different forms or fulfilling different functions. In the prior article I gave an example of how a piece of Data generated by a meteorological station is morphed into information that can then be leveraged. Quite intriguing is the fact that Data reaches its peak value during those evolutionary transformations in the earlier stages of its lifecycle. In other words, the time when Data is at its highest value is within seconds or minutes of the moment it was gathered. If you don’t think that’s the case, consider that ONE text message you just received — it is important NOW! Twenty minutes and a hundred text messages later, you will hardly remember it. Another day or two go by and you will have forgotten it completely.
What’s even more intriguing is that different types of Data have different lifespans (and different value timelines). Some are very short lived, some have a more traditional lifespan and gradually lose their value, and some continue to be relevant for years to come. Categorizing Data based on its expected time value has implications for how we engineer Data processing systems. For example, if the data is highly “perishable” (its value drops in seconds), then processing that data has to happen really fast; otherwise the insights gained from it become useless. On the opposite end of that spectrum is Data that stays relevant for days, months or even years. Processing such data types requires much more advanced architectures, with an eye not only toward what we can learn from the Data today, but also toward what we will be demanding from it tomorrow. One such type of long-lived data is health data. A patient’s description of his or her health concerns becomes valuable almost immediately, as the doctor uses that Data to prescribe medication. The same Data becomes even more valuable over time for medical research, market demographics and so on.
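To make the “perishability” idea a bit more concrete, here is a minimal sketch (not any actual BitMED system) that models the time value of a piece of Data as simple exponential decay. The categories and their half-lives are made-up numbers chosen purely for illustration:

```python
from dataclasses import dataclass
import math

@dataclass
class DataCategory:
    """A hypothetical data category with an assumed value half-life (seconds)."""
    name: str
    half_life_s: float  # time for the data's value to drop by half

    def value_at(self, age_s: float, initial_value: float = 1.0) -> float:
        """Remaining value after age_s seconds, using exponential decay."""
        return initial_value * math.exp(-math.log(2) * age_s / self.half_life_s)

# Illustrative categories; the half-lives are assumptions, not measurements.
text_message  = DataCategory("text message", half_life_s=60)            # perishable
weather_obs   = DataCategory("weather observation", half_life_s=3600)   # hours
health_record = DataCategory("health record",
                             half_life_s=10 * 365 * 24 * 3600)          # ~years

ONE_DAY = 24 * 3600
for cat in (text_message, weather_obs, health_record):
    print(f"{cat.name}: {cat.value_at(ONE_DAY):.4f} of its initial value after one day")
```

Running this, the text message is worth essentially nothing after a day, while the health record has barely lost any value, which is exactly why the two call for very different processing architectures: stream processing for the former, durable long-term storage and analytics for the latter.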
I hope this quick introspective look at Data helped us realize why it is so important to understand Data and its many different facets before anything else. Where all this research transcends science and technology and becomes real is when we leverage it to improve someone’s wellbeing. That’s where the proverbial rubber meets the road!
In summary, having a predictable roadmap for navigating healthcare data records and extracting insightful information along the way is something that we, at BitMED, had to address early on. But we haven’t stopped there! We embarked on a quest to advance our know-how and continue to explore new perspectives as we are presented with larger amounts of Data, new geo-demographics, new behaviors, and new business demands. In spite of all that cool geekiness, the real value of all this work is to the Patient. The immediate benefit is that Patients have access to higher quality care without having to travel hundreds of miles or spend countless hours on the phone. That’s when We, the Engineering Team at BitMED, know that we’ve done our job!
I’d like to leave you today with a question that I’ve been pondering for a while: Is Data truly the currency of tomorrow?
Until next time, all the best!