How ABFM Combats Unconscious Bias in Exam Questions

ABFM is committed to ensuring that all questions, which cover a broad spectrum of primary care practice, are fair for all examinees. This requires an annual, independent assessment to test for any group score differences that might represent potential item-level bias in the examination. In 2013, ABFM’s Psychometrics Department began conducting a Differential Item Functioning (DIF) analysis to accomplish this review.

ABFM annually assembles an 11-person DIF Panel, composed of eight physicians of diverse backgrounds, as well as linguist Jennifer Cramer, Ph.D., ABFM’s Vice President of Psychometrics Tom O’Neill, Ph.D., and Senior Psychometrician Ting Wang, Ph.D., to review exam questions for potential bias.

DIF Panel physicians come from a variety of racial and ethnic backgrounds, including African American, Hispanic, East Asian, and Middle Eastern, and the group is balanced by gender. “We convene this panel of content experts, family medicine physicians with lived experience as part of a minority group,” said O’Neill. “Then we ask them, ‘Can you identify something in these questions that would cause them to function differently?’”

Over the course of one or two days each year, the panel reviews all questions that were flagged as potentially biased in the previous year’s examination. Panel members receive statistics for each flagged question that may indicate a difference in how groups of examinees responded to it. These statistics come from a model that estimates the relative functioning of each question by comparing groups of physicians who took the exam.

“Then we start to examine why one group would select the wrong answers, or the right ones, more often,” said O’Neill. In the eight years since the DIF Panel was formed, the parameters of the process have remained remarkably consistent. Of the 374 unique questions flagged since 2013, the DIF Panel found an identifiable source of bias in just four. Questions the panel identifies in this way are removed from the item bank.

Dr. Jesse Hsieh, Chair of Beacon Health System in South Bend, Indiana, has served on the DIF Panel since 2016. “Everyone on the panel gets to know each other on a personal basis,” he said.

“We come from very different backgrounds. Some are academic, some high-level administration, some solo practice, some from public health. Some have practiced for 10 years, and some, like me, have practiced 30. So, it’s not just a diversity of ethnicities that are represented, but a diversity of backgrounds as well. That’s just as important.”

Bias is not always negative. Some questions may actually be easier for a particular subgroup for a variety of reasons. Dr. O’Neill gave a general example. “Could you identify the source of bias if African American physicians are doing better on questions related to sickle cell anemia? In addition to caring for patients with this condition, these physicians are more likely to have family members affected by sickle cell anemia, or even be affected themselves, than their white colleagues. It’s not that the questions indicate a weakness of learning in other subgroups,” Dr. O’Neill continued.

“It’s a special strength that one group of physicians may have. So, a question related to that topic may show bias and get flagged. Then the question comes up: ‘Is this an important aspect of family medicine?’ In this example, I believe the answer would probably be yes, so the question would be retained.”

Beyond weighing positive versus negative bias, the DIF Panel also examines the structure of each question from a linguistic point of view. For that purpose, Jennifer Cramer, Ph.D., Chair of the University of Kentucky’s Department of Linguistics, was invited to lend her expertise to the DIF Panel’s annual discussions.

“For ABFM’s DIF Panel, the focus is on social variation. What linguistic structures exist in the system, and how could those issues appear for potential test takers?” said Cramer. “I’m not there for the medical content. I’m thinking about the test takers’ experience when they’re looking at a question and its potential answers. The best way to describe what I bring to the table would be, ‘Is there something in the way the sentence is worded, or the way the answers are worded, that makes the question problematic?’”

The combination of linguistic sensibilities and medical expertise has made the DIF Panel an efficient workgroup. During the 2021 panel meeting, members reviewed several hundred questions over the course of one long workday.

Dr. David Lowe is the Chief Medical Officer of Healthpoint Family Care in Covington, Kentucky. He has served on the DIF Panel since 2017 and found the experience eye-opening. “I found fascinating the whole idea of looking at test questions for implicit bias,” he said. “When you’re taking a test, it is not uncommon for there to be questions that are very difficult, but I’d never thought that the questions might be difficult due to implicit bias. When I learned about the DIF Panel, I was thankful this analysis was being done behind the scenes. It’s important because physicians should have confidence that ABFM looks at the exam critically each year.”

As the DIF Panel nears its 10-year anniversary, ABFM will continue to build on the process’s structure and success.


Aaron Burch serves as Editorial Content Manager for the American Board of Family Medicine. He has been writing professionally in the health care field since 2014.