July Symposium 2023 – Rachel & Gary

July 26, 2023

AB: Ansley Bowen | RT: Rachel Tornheim | GG: Gary Gravesandy

AB: And really quickly, I will go ahead and hand that off to Rachel and Gary to introduce themselves. Rachel?

RT: Thanks so much. I’m going to share my screen here. So I am Rachel Tornheim, and I’m going to be presenting today with my colleague Gary Gravesandy. We are both part of the Consumer Digital Innovations team at Mount Sinai Health System, and the equity implications of clinical decision support tools are something that we’re both very interested in, so we are excited to be presenting to you today.

GG: Thank you for the intro, Rachel. It’s actually pretty common for us to think of our doctors as infallible and invincible; after all, we don’t want to doubt the person we’re entrusting our life and wellbeing to. So, to deliver on that trust, clinical decision support tools are frequently used by physicians to diagnose patients, develop treatment plans, and, in general, interpret results of certain tests. These tools consolidate the general findings of available medical research to date and allow physicians to apply them to the specific patients that they’re treating at this time. And frankly, given the unreasonable number of patients certain clinicians are being asked to see daily, we wouldn’t be able to provide optimal care without these tools.

RT: So, for today’s talk, we’re going to look at two categories of clinical decision support tools. On the one hand, we have logic-based systems. So, these are tools where the rules that they use to determine their recommendations have been explicitly coded by humans. So, as an example, you might think about a rule that says, if a patient’s blood pressure is higher than 140 over 90, and they’ve been documented as having a high risk of stroke or heart disease, then prescribe an ACE inhibitor. These logic-based systems can be very complex. They could have thousands of rules. The rules could have many conditions each, but the defining characteristic is that humans have explicitly coded in those rules. As a result of that, the output of these models is 100% predictable and explainable, but the downside is that they’re limited to the knowledge that humans have coded into them, so they really can’t handle novel situations. The other type of clinical decision support tool we’re going to talk about is machine-learning-based tools. So, these have gotten a lot of press recently with advances in AI. In this case, we don’t tell them the rules to use. Instead, we give them training data, and they use a machine-learning algorithm to deduce rules from that training data. The advantage here is that they have a greater ability to develop novel insights and to respond to new scenarios that we haven’t given them explicit rules for. The downside is that these models tend to be a black box, meaning we know what their recommendation is, but we don’t necessarily know why.
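The blood-pressure rule described above could be sketched roughly as follows. This is only an illustration of a hand-coded rule, not a clinical implementation: the `Patient` fields, the `recommend` function, and the returned strings are hypothetical names, and the sketch assumes that "higher than 140 over 90" means either the systolic or the diastolic reading exceeds its threshold.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical fields chosen for this illustration only.
    systolic: int            # mmHg
    diastolic: int           # mmHg
    high_cardio_risk: bool   # documented high risk of stroke or heart disease


def recommend(patient: Patient) -> str:
    """Apply one explicitly coded rule and return its recommendation."""
    # Assumption: a reading counts as elevated if either number is over its threshold.
    elevated = patient.systolic > 140 or patient.diastolic > 90
    if elevated and patient.high_cardio_risk:
        return "consider ACE inhibitor"
    # The rule has nothing to say about any case it was not written for.
    return "no recommendation"
```

Because every branch is written out by a person, the output for any given patient can be traced back to a specific condition, which is the predictability and explainability described above. It is also why such a system cannot respond to a situation nobody wrote a rule for.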

GG: Both the logic-based tools and the machine learning tools can have problematic recommendations from an equity perspective, and they come in a few different flavors. The first one is focused on the data set used to either determine the rules of logic-based models or train a machine learning model. And that data set just might be missing key elements, which leads to biased results. With machine learning tools, since they have to be trained on what is the correct recommendation in any given scenario, they have this unique issue of perpetuating existing biases in existing methods. And both of these can still fall victim to the correlation fallacy, where you take data and interpret it as an input that has a direct effect on the desired outcome when in reality that’s not the case.

RT: So let’s dive in a little more deeply into each of these three scenarios. The first one is where the data used to develop the tool is not sufficiently robust. So the first scenario here is where certain populations are excluded from the research, and the previous speaker actually alluded to this, specifically talking about clinical trials being primarily of white people and minorities being underrepresented there. We also see a lot of bias in clinical trials with regard to gender. So historically, for decades, women were often excluded from clinical trials. The reasons given were that there was a concern that fluctuating hormones in women might make them more difficult to study, or worries over potentially causing birth defects if the women were pregnant. And those are valid concerns, but at the same time, you can imagine that if we base research on, for instance, the efficacy of a medication, the rate of adverse effects from that medication, or the correct dosage, and that research is done on men, but then we try to apply those findings to women, you can see how that could create an equity concern because women’s bodies often react differently. We’ve also seen issues with clinical trials excluding, for example, people with disabilities, which creates those same concerns when trying to apply research findings to those populations.

GG: And these issues manifest in tools being built today. There’s a team of researchers looking to create a machine learning model to essentially diagnose a disease given only pictures of the rashes normally used to diagnose the disease. And it’s well-known with skin symptoms such as rashes that if you look at patients of different skin tones, you’ll have different presentations. Despite this knowledge, the researchers did not include skin tone, or a proxy such as race, which could have been used either to train the model to act differently based on skin tone or at least to test the model on different skin tones after it had been trained. And even though they included 14 different countries in their data set, of those 14, only 1 was a predominantly non-white country. So in this case, leaving skin tone and race out of the training and the model from the get-go was probably the wrong decision, because now we don’t even know if this tool is effective across different skin tones and therefore across different races. Now, the previous speaker actually did talk about high-risk care management programs, and health systems actually use a risk score algorithm to decide which patients in their large populations might benefit from these programs. One of the ones that some health systems have chosen to use, though, rather than focusing on just clinical outcomes, also includes the ways in which people access medical resources, and it can actually create a very good model of the risk of having a high cost of care. But a retrospective study on this specific algorithmic risk score showed that Black patients consistently received lower risk scores than white patients given the same clinical situation. The graph I have here shows that, essentially, to have the same risk score, you would need to have more chronic diseases as a Black patient than as a white patient. What makes this interesting, though, is that the model does not include race as an input at all.
But the focus on the cost of care, instead of just clinical outcomes, actually reflected the ways in which Black patients have lower access to certain healthcare, as well as a lower likelihood of using the same amount of healthcare given the same clinical situation. Then, when you apply this score to determine whether or not they should be part of a high-risk care management program, which means additional resources, you end up perpetuating this lower access to care, because you’re treating past cost of care as if it were the clinical need you’re trying to address.
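The kind of retrospective comparison described here, checking whether two groups receive different scores at the same level of illness, could be sketched as follows. This is a minimal illustration and not the method of the study mentioned in the talk: the function names, the record layout `(group, chronic_conditions, risk_score)`, and the use of chronic-condition count as the measure of clinical situation are all assumptions.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by_level_and_group(records):
    """Average risk score for each (chronic-condition count, group) pair.

    records: iterable of (group, chronic_conditions, risk_score) tuples.
    """
    buckets = defaultdict(list)
    for group, n_conditions, score in records:
        buckets[(n_conditions, group)].append(score)
    return {key: mean(scores) for key, scores in buckets.items()}


def score_gaps(records, group_a, group_b):
    """Mean score of group_a minus group_b at each shared condition count."""
    means = mean_score_by_level_and_group(records)
    gaps = {}
    for n_conditions in sorted({n for n, _ in means}):
        key_a, key_b = (n_conditions, group_a), (n_conditions, group_b)
        # Only compare levels where both groups have at least one patient.
        if key_a in means and key_b in means:
            gaps[n_conditions] = means[key_a] - means[key_b]
    return gaps
```

A gap that stays on the same side of zero across every condition count would be the pattern described above: one group needs more chronic diseases to reach the same score. Note that the check needs group labels recorded alongside the scores even though the scoring model itself never sees them.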

RT: And then the third bucket we’re going to go into is cases where the clinical decision support tools mistake correlation for causation. And this is usually because they treat a demographic of the population as a proxy for something else. This primarily comes up when race is used as an input to medical calculators that are meant to determine care decisions for conditions that have nothing to do with skin tone, and this is a practice that has increasingly come under fire in recent years. So I’m going to talk about two examples. The first one is VBAC, which stands for Vaginal Birth After Cesarean. There’s a VBAC calculator that was introduced in 2007, and it estimates the likelihood that a patient who has previously had a cesarean section will be able to successfully give birth vaginally. The calculator gives that percentage, and then that is used to determine whether the patient is sent directly to have a C-section or if they try to give birth vaginally first. When this calculator was first introduced, it included race as one of its inputs, and if the patient was indicated as being Black or Hispanic, the calculator would give them a lower likelihood of having a successful VBAC and therefore a higher likelihood of being sent directly to have a C-section. Now, there are well-documented clinical benefits of having a vaginal birth versus a C-section, and so this is problematic. So why was race included as an input here? Well, studies done in the early 2000s had shown a correlation between a patient being White and then having a successful VBAC. Now, there’s obviously no biological basis for that. Instead, Whiteness was a marker of privilege. The White patients were more likely to have higher income, higher education, more access to social supports, and better access to healthcare, and that in turn resulted in a higher success rate with VBAC.
In fact, the study that showed that Whiteness is correlated with higher success of VBAC also showed that if the patient is married or has private health insurance, that’s correlated with higher success in VBAC, but neither of those factors ended up as inputs into the equation. So why is this problematic? While, of course, we do want to research and understand how health outcomes vary among racial groups at a population level, it’s problematic to use that assumption to steer individual Black and Hispanic patients away from trying to give birth vaginally. It’s a false generalization, and it effectively codifies the impact of racial inequity into tools that are used to direct care. Because of those concerns, that original VBAC calculator has been taken out of use and replaced with a version that does not include race as an input. A second example is the GFR equation. So this is an equation that calculates kidney function, and it looks at how healthy someone’s kidneys are, and based on the output of the equation for a given patient, they may be recommended certain interventions, such as dialysis or a kidney transplant. When this equation came out, it also included an input for race, and specifically if the patient is indicated as being Black, then their kidney function is estimated as higher than if they were white. And as a result, they’re less likely to be recommended those interventions. Why is that? Well, the research that was done showed, again, a correlation between race and, in this case, muscle mass. This was based on research done in the 1990s, and the idea that there’s an actual relationship there has since been discredited. The correlation is likely due to third factors such as diet and profession. For example, if you are in a profession that’s primarily manual labor, you will tend to have higher muscle mass than if you’re an office worker. So using race as an input to this equation is problematic for the same reasons as VBAC.
Even if the correlation holds at a population level, it’s a false generalization to use that assumption to direct the care of an individual patient. And it’s especially concerning in this case because the Black population has a higher rate of kidney disease than the overall population. So many medical institutions and medical societies have recommended removing race as an input in this equation, although the version with race is still used often. So we’re going to close with some recommendations. If you’re part of an organization that’s building clinical decision support tools and you want to try to make sure that you don’t run afoul of equity concerns, there are a couple of things you can do. First is pay close attention to the dataset that you’re using to build the tool. Make sure that it’s representative of the population that the tool is going to be used to direct care for. Specifically, if you’re building a machine learning-based tool, you want to really question whether there could be any biases built into the training data that may be replicated. You want to think critically about what variables to use. So we’ve given examples where it makes sense to include race in a tool and other examples where it really doesn’t. So that’s just something to weigh. And then finally, make sure that equity is part of the discussion, that you’re doing an equity assessment on any tools, publicizing that, and just helping to raise awareness of the importance of considering equity in clinical decision support.
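One simple way to act on the first recommendation, checking whether a dataset is representative of the population a tool will serve, could be sketched like this. It is an illustrative sketch only: `representation_gaps`, the group labels, and the population shares are hypothetical, and a real assessment would need to consider far more than one demographic field.

```python
from collections import Counter


def representation_gaps(dataset_groups, population_shares):
    """Share of each group in the dataset minus its share in the target population.

    dataset_groups: iterable of group labels, one per record in the dataset.
    population_shares: dict mapping group label to its expected share (0 to 1).
    A negative value means the group is underrepresented in the dataset.
    """
    counts = Counter(dataset_groups)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("dataset_groups is empty")
    return {
        group: counts.get(group, 0) / total - share
        for group, share in population_shares.items()
    }
```

For example, a dataset in which 10% of records come from a group that makes up 40% of the patient population would show a gap of about -0.3 for that group, which is a prompt to look for more data before trusting the tool for those patients.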

GG: And many of these recommendations are relevant when you’re buying the tools as well. You can ask vendors for any data or assessments they have on the equity concerns that their tool might end up exacerbating. And when you’re running pilot programs, ensure that the patients affected by these tools have relevant demographics recorded that you can compare against. Ensure the tool isn’t inaccurately providing different recommendations to different patients. And finally, as long as you have equity as a key element in your organization’s vendor selection process, then you can be sure that everyone is being held to the same standard. And given the time, we’ll leave questions to the chat afterwards, so thank you for the time.
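The pilot-program check mentioned here, comparing a tool’s recommendations across recorded demographics, could start from something as simple as the sketch below. The function name and the `(group, was_recommended)` row layout are assumptions made for illustration.

```python
from collections import Counter


def recommendation_rates(pilot_rows):
    """Fraction of patients in each group for whom the tool made a recommendation.

    pilot_rows: iterable of (group, was_recommended) pairs from a pilot.
    """
    totals, recommended = Counter(), Counter()
    for group, was_recommended in pilot_rows:
        totals[group] += 1
        recommended[group] += int(bool(was_recommended))
    return {group: recommended[group] / totals[group] for group in totals}
```

A difference in rates between groups is not proof on its own that the tool is wrong, since the groups may differ clinically. It is a signal to review those cases with clinicians and see whether patients in the same clinical situation are being treated differently.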

AB: Thank you both so much for this wonderful and very insightful presentation. Thank you again for taking the time to come and speak, Rachel and Gary. It was a really, really great and insightful presentation.