Feminizing Wikipedia with auto-generated biographies

A researcher from Facebook’s artificial intelligence laboratory in Paris has developed a model capable of writing biographical articles about real people. Her idea is to use the tool to populate Wikipedia with articles on women. Angela Fan points out that women account for only a fifth of the biographies currently on Wikipedia. This is a problem, given that the participatory encyclopedia serves as a reference, appears among the first results of an online search, and is often used by schoolchildren as a source for their presentations. Wikipedia articles are also used to train algorithms, which are then likely to absorb this biased representation.

Automated generation of biographies

The researcher sees her approach as a complement to existing initiatives to write women’s biographies “manually”. “Researching, creating a bibliography and writing it are intensive activities, but there is a wealth of information available on the web that can be used to tell the stories of women whose achievements, voices and legacies have been forgotten or marginalized,” she explains on the Meta AI blog.

The model developed by Angela Fan proceeds in several stages. First, it learns to identify relevant biographical information on the web. It then uses that information to write the text itself, while a third module builds the bibliography from the sources it has used. The model thus produces, one after the other, the sections that make up the Wikipedia article: early life, education and career.
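The three stages described above can be pictured with a minimal Python sketch. This is an illustration only, not the researcher's implementation: every name here (Source, Section, select_evidence, write_section, build_bibliography, generate_biography) is hypothetical, and simple keyword matching and text concatenation stand in for the learned retrieval and generation components.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Section order taken from the article: early life, education, career.
SECTIONS = ["Early life", "Education", "Career"]


@dataclass
class Source:
    url: str
    text: str


@dataclass
class Section:
    heading: str
    body: str
    citations: List[str] = field(default_factory=list)


def _words(text: str) -> List[str]:
    """Lowercase alphabetic tokens, punctuation stripped."""
    return re.findall(r"[a-z]+", text.lower())


def select_evidence(heading: str, sources: List[Source], top_k: int = 2) -> List[Source]:
    """Stage 1 (stand-in): keep the sources sharing the most words with the heading."""
    keywords = set(_words(heading))
    scored = [(sum(w in keywords for w in _words(s.text)), s) for s in sources]
    matching = [pair for pair in scored if pair[0] > 0]
    matching.sort(key=lambda pair: pair[0], reverse=True)
    return [source for _, source in matching[:top_k]]


def write_section(evidence: List[Source]) -> str:
    """Stage 2 (stand-in): a real system would generate prose; here we only join the evidence."""
    return " ".join(s.text for s in evidence) or "(no supporting source found)"


def build_bibliography(evidence: List[Source]) -> List[str]:
    """Stage 3: cite exactly the sources used for the section."""
    return [s.url for s in evidence]


def generate_biography(sources: List[Source]) -> List[Section]:
    """Produce the sections one after the other, each with its own citations."""
    article = []
    for heading in SECTIONS:
        evidence = select_evidence(heading, sources)
        article.append(Section(heading, write_section(evidence), build_bibliography(evidence)))
    return article
```

The point of the sketch is the structure: each section is written only from the evidence selected for it, and its bibliography lists only those same sources, so a section with no evidence ends up with no citations.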

Auto-generated content issues

However, the system suffers from several known problems with automatic text generation. First, it tends to reproduce the biases found in its sources and training texts. For example, it risks using the phrase “woman scientist” rather than “scientist”, or devoting a disproportionate share of the article to private life, for the simple reason that the web provides more information on this aspect for female personalities. Second, content generation models sometimes produce fabricated information. In this research project, 68% of the text generated in the biographies did not appear in the sources.


Moreover, like the algorithmic censorship of unwanted content, the algorithmic generation of desired content undermines users’ understanding of the content presented to them. The practice is also used for less noble purposes, such as phishing and fake news. Google, for example, states in its guidelines that it removes auto-generated content from its results when it is “intended to manipulate search rankings rather than help users.”
