How to ensure that human biases do not permeate the algorithms?

When studying data, data scientists must guard against survivorship bias, which consists in drawing conclusions from an incomplete population comprising only the elements that “survived” — elements that are in fact exceptions rather than representative cases.
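A small simulation makes the effect concrete. The scenario below is hypothetical (it is not from the article): we estimate an average annual return, but any fund whose return falls below -20% closes and drops out of the observed sample, so averaging only the “survivors” overestimates the true mean.

```python
import random

random.seed(0)

# Hypothetical fund returns: mean 2%, standard deviation 15%.
returns = [random.gauss(0.02, 0.15) for _ in range(10_000)]

# Funds below -20% close and vanish; only "survivors" are observed.
survivors = [r for r in returns if r > -0.20]

true_mean = sum(returns) / len(returns)
biased_mean = sum(survivors) / len(survivors)

print(f"True mean return:    {true_mean:+.3f}")
print(f"Survivors-only mean: {biased_mean:+.3f}")  # higher than the true mean
```

Because every discarded value lies below every surviving one, the survivors-only mean is mechanically higher than the true mean — the same mechanism as any analysis restricted to the cases that “made it”.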

In France, “antivax” groups have fallen victim to Simpson’s paradox. Relying on misinterpreted Drees data, they wrongly claimed on social networks that the unvaccinated were not saturating the country’s intensive care units. Their main mistake was looking at raw numbers instead of percentages: there are in fact nine times more vaccinated than unvaccinated people in France.

While the unvaccinated are a very small minority, they are overrepresented in hospitals, accounting for 63% of critical care admissions. Looking only at absolute figures may give the impression that the two populations are roughly balanced in hospitals, but that overlooks the proportion of each in the general population. Several recent surveys of hospital samples have thus concluded that the unvaccinated represented between 70% and 90% of intensive care patients.
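The base-rate arithmetic can be checked directly. In the sketch below, the population and admission totals are hypothetical round numbers; only the two shares (roughly 90% of the population vaccinated, and the unvaccinated making up 63% of critical care admissions) come from the figures above:

```python
# Hypothetical round totals; only the two shares come from the article.
pop = 10_000_000          # total population (assumed)
share_vaccinated = 0.90   # ~9 vaccinated for every 1 unvaccinated
icu_total = 1_000         # critical care admissions (assumed)
share_icu_unvax = 0.63    # unvaccinated share of admissions

vaccinated = pop * share_vaccinated
unvaccinated = pop - vaccinated

icu_unvax = icu_total * share_icu_unvax
icu_vax = icu_total - icu_unvax

rate_vax = icu_vax / vaccinated        # per-capita ICU rate, vaccinated
rate_unvax = icu_unvax / unvaccinated  # per-capita ICU rate, unvaccinated

print(f"ICU rate, vaccinated:   {rate_vax:.6f}")
print(f"ICU rate, unvaccinated: {rate_unvax:.6f}")
print(f"Relative risk: {rate_unvax / rate_vax:.1f}x")  # 15.3x
```

Even though the two raw admission counts (630 vs. 370 here) look comparable, dividing each by the size of its own population shows the unvaccinated ending up in critical care at roughly fifteen times the rate of the vaccinated — exactly the comparison the raw numbers hide.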

How do data scientists correct bias?

Having identified why and how biases pose a problem for data scientists, we now turn to what data scientists do, do not do, and should do to limit the risks associated with these biases.

1. Become aware of the problem and ask yourself the right questions

To become aware of the problem of cognitive biases, data scientists have access to different types of resources. They can, for example, start by consulting a charter, such as the ethical charter drawn up within datacraft. They can also analyze the content of trusted AI evaluation frameworks, such as that of Labelia Labs (formerly Substra Foundation) or that of the LNE. Professional oaths, such as the Tech Pledge and the Holberton-Turing Oath, also establish relevant criteria for responsible AI. Finally, there are practical tools such as the deon data science ethics checklist, usable from the command line.

It is important to keep a critical eye on your own work as a data scientist. If I had to choose only three commitments to make, here is what I propose:

  • Commit to pausing to consider all the consequences of your work, whether intended or not;
  • Monitor the consequences of your work over time;
  • Aim for self-regulation using evaluation frameworks and certification with audits, in addition to the “7 points of vigilance” highlighted by the European Commission.

2. Measure bias

After becoming aware of the potential existence of biases, the second step is to define appropriate metrics in order to measure them properly. The choice of metrics then depends essentially on what one seeks to control. Aequitas, for example, is an open-source bias-auditing toolkit created by the Center for Data Science and Public Policy at the University of Chicago.

It helps audit the predictions of machine-learning-based risk assessment tools, understand the different types of bias at play, and make informed decisions about the development and deployment of these systems. Its “fairness tree” helps practitioners choose the right metric. Here as elsewhere, it pays to be attentive to the choices made, since each choice introduces a new possible bias: by choosing one metric, we rule out all the others.
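To make the kind of metric such a toolkit reports concrete, here is a minimal sketch (plain Python, not the Aequitas API) computing one common fairness metric: the false positive rate per group and its disparity ratio. The records and group labels are invented for illustration:

```python
# Hypothetical risk-assessment outcomes: (group, y_true, y_pred).
records = [
    ("A", 0, 1), ("A", 0, 0), ("A", 1, 1), ("A", 0, 0),
    ("B", 0, 1), ("B", 0, 1), ("B", 1, 1), ("B", 0, 0),
]

def false_positive_rate(group):
    # Among true negatives (y_true == 0), how often did the model flag them?
    negatives = [(t, p) for g, t, p in records if g == group and t == 0]
    return sum(p for _, p in negatives) / len(negatives)

fpr = {g: false_positive_rate(g) for g in ("A", "B")}
disparity = fpr["B"] / fpr["A"]  # ratio relative to reference group A

print(fpr)
print(f"FPR disparity (B vs. A): {disparity:.1f}")  # 2.0
```

Here group B’s members who should not have been flagged are flagged twice as often as group A’s — the kind of imbalance that is invisible in overall accuracy, which is precisely why a dedicated metric (and a tool like the fairness tree to pick it) is needed.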
