By Lindsey Morris, Ph.D., Director of Data Science and Analytics
Sin City beckons tens of thousands of technology leaders from around the world to Amazon Web Services’ (AWS) re:Invent, a quintessential annual technology conference where AWS reveals its cutting-edge technology offerings for the coming year. The advancements are presented during keynote addresses in a minimalistic yet curiously flamboyant style, befitting a venue like the Las Vegas Strip.
Apart from the new technologies added to AWS’s arguably already first-in-class, highly integrated platforms, the conference boasts thousands of talks, workshops, and bootcamps over the course of the five-day event. Attendees navigate this labyrinth via a smartphone app, which is apropos for a technology conference such as re:Invent.
When I first created a username and password and logged into my re:Invent app, I was overwhelmed at the prospect of spending a full week in Las Vegas while trying to get real value from a conference with more than 3,700 session offerings. However, after a couple of days as what could only be described as a ‘re:Invent newb,’ I developed some strategies to navigate the event and locate the talks and workshops most relevant to my work at axialHealthcare. My goal was to bring back technology acumen that will enable faster, more advanced, and more valuable development of machine learning algorithms to solve problems for patients experiencing uncoordinated pain care or caught in cycles of opioid abuse.
Additionally, I, along with three colleagues, was proud to support Kevin Harvey, our VP of Engineering, who presented on the development of a HIPAA-compliant phone platform called ‘Consult Connect.’ This platform is a testament to how we are working with AWS and using its powerful tools to advance our capabilities, analytics, and machine learning.
My first day, and others after it, were devoted to a bootcamp centered on hands-on learning of AWS’s SageMaker, a tool that allows anyone, not just data scientists, to load data and then train, test, validate, and deploy machine learning algorithms. The instructor walked us through a few examples, one of which was training and testing an algorithm to predict the presence of breast cancer using both medical records and image data of the breast mass.
I found the hands-on aspect of this bootcamp extremely valuable because, while AWS’s services are not difficult to use, getting started can be intimidating. Hands-on training lifts the veil and provides the confidence to keep going. The remainder of my time at the conference was spent in hands-on labs (mostly on SageMaker and Redshift, a data warehouse) and in talks on PostgreSQL databases and on applying machine learning methodologies using AWS’s services.
Democratizing Data Science
Because this was a technology conference and not one centered on pain and opioids, I thought a lot about the adoption of analytics and machine learning outputs in the context of healthcare. AWS appears to have a strategy of democratizing data science, meaning putting data science tools in the hands of anyone, not just data scientists.
This is a powerful strategy. However, the barriers to adoption of both data-driven insights and technology platforms in the healthcare space can’t be ignored. The different data science and machine learning models out there, such as regression techniques, classifiers, boosting and other ensembling techniques, naive Bayes, neural networks, and many others, are complex and difficult to interpret. The objective of most machine learning algorithms is to create a function f(X) that relates input variables X to an output variable Y. Here, Y is the healthcare outcome of interest, and X represents any set of measurements, such as age, gender, and social determinants, among others. Determining X is where the art and fun of data science really begin.
For healthcare providers to really use these algorithms, the algorithms must 1) be valuable in helping patients, and 2) be interpretable and actionable. The variables that make up X are critical in helping users interpret the output Y, and unfortunately, more data-driven choices of X that may enhance discriminability don’t always make sense from an interpretability standpoint.
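To make the f(X) framing concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the data are synthetic, the input names (`age_z`, `noise`) are hypothetical, and the model is a plain logistic regression fit by gradient descent, not the models we use at axialHealthcare and not what SageMaker runs under the hood. It shows one reason simple models are easier to explain: each input in X gets a single weight that a reader can inspect.

```python
import math
import random


def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit f(X) = sigmoid(b + w . x) by batch gradient descent on log loss."""
    n, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this row drives the gradient.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def make_synthetic_data(n=200, seed=0):
    """Synthetic rows: the outcome depends on age_z only; noise is irrelevant."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        age_z = rng.gauss(0.0, 1.0)   # hypothetical standardized age
        noise = rng.gauss(0.0, 1.0)   # hypothetical input unrelated to Y
        p = sigmoid(2.0 * age_z)      # assumed "true" relationship
        X.append([age_z, noise])
        y.append(1 if rng.random() < p else 0)
    return X, y


X, y = make_synthetic_data()
weights, intercept = fit_logistic(X, y)
for name, w in zip(["age_z", "noise"], weights):
    # exp(weight) is the odds ratio for a one-unit increase in that input.
    print(f"{name}: weight={w:+.2f}, odds ratio per unit={math.exp(w):.2f}")
```

In this toy setup the weight on `age_z` should come out clearly larger than the weight on `noise`, which is the kind of statement a provider can act on. More flexible models may discriminate better but rarely offer such a direct reading of each input.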
While democratizing data science with more out-of-the-box tools like SageMaker and Google Cloud AutoML may expand the use of data science and significantly speed development times, it’s important to understand that it comes with a risk: the developer may not be able to explain the output, resulting in limited adoption or misinterpretation.
Recently, the data science field has focused on the role of the data translator, who helps not just users but also business leaders and executives understand and consume data science artifacts to drive value. In subsequent blog posts, our data scientists will discuss mathematical ways we can expose what’s under the hood of these models to better communicate their value and start the process of learning from data, even when the data support conclusions that deviate from our notions of what is correct.
Ultimately, I believe we can leverage tools like SageMaker for rapid prototyping of data science algorithms for products, but it’s necessary to spend more effort to understand and craft the inputs, interpret the outputs, and offer up results in an easy-to-consume and valuable way. Nothing beats truly knowing your data, and that takes time.