EPFL researchers offer an open platform for chemical data management

Chemistry laboratories generate large amounts of data. However, some of it is still on paper and is difficult to access in its entirety. Three EPFL scientists present a modular open science platform for managing the large amounts of data produced in chemical research. Their study, titled “Making the collective knowledge of chemistry open and machine-actionable”, was published in Nature Chemistry.

Data management in modern chemistry is difficult. Take the synthesis of a new compound: a great deal of trial and error goes into finding the right reaction conditions, and this generates large amounts of raw data. This data is very valuable because, like human beings, machine learning algorithms also learn from failed or partially successful experiments.

Currently, only the most successful experiments are published. Artificial intelligence, in particular machine learning, may allow data from failed experiments to be processed, provided it is stored in a machine-readable format that anyone can use.

Professor Berend Smit, who heads the Molecular Simulation Laboratory at EPFL Valais Wallis, explains:

“For a long time, we had to compress data because of the limited number of pages in printed journal articles. Today, many journals no longer even have print editions. Yet chemists still face reproducibility issues because journal articles omit important details. Researchers waste time and resources replicating the authors’ failed experiments. They find it difficult to rely on published results because raw data is rarely published.”

Berend Smit, Luc Patiny and Kevin Jablonka from EPFL have published a perspective article that presents an open platform for the entire chemistry workflow, from project initiation to publication.

Machine-readable FAIR data

Their main thesis is that if we want to advance chemistry with data-intensive research and also solve problems of reproducibility, we must change the way experimental data is collected and reported.

Three steps are essential: collecting, processing and publishing data, at minimal cost to researchers. The guiding principle is that data should be easily findable, accessible, interoperable and reusable (FAIR).

Berend Smit says:

“At the time of data collection, the data will be automatically converted into a standard FAIR format, which will allow all failed or partially successful experiments to be published automatically, alongside the most successful experiment.”

The authors propose that the data should also be usable by machines.

Kevin Jablonka says:

“We are seeing more and more data science studies in chemistry. Indeed, the latest machine learning results attempt to tackle some of the problems that chemists believe are unsolvable. For example, our group has made significant progress in predicting optimal reaction conditions using machine learning models. These models would be much more valuable if they could also learn from reaction conditions that fail, but they remain biased because only successful conditions are published.”

To establish a FAIR data management plan, the researchers present five measures:

  • The chemical community should adopt its own standards and solutions;
  • Journals should mandate the deposit of reusable raw data, where community standards exist;
  • We must accept the publication of “failed” experiments;
  • Electronic lab notebooks that do not allow all data to be exported in an open, machine-readable form should be avoided;
  • Data-driven research must be part of our curricula.

Luc Patiny says:

“We believe there is no need to invent new file formats or technologies. In principle, we have all the technologies. We need to adopt them and make them interoperable.”

The authors point out that storing data in an electronic lab notebook, which is the current trend, does not mean that humans and machines can reuse it. Structuring and publishing the data in a standardized format is the best alternative provided there is sufficient context.
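As a rough illustration of what "structured, with sufficient context" could mean in practice, the sketch below serializes a few experiment records, including a failed one, to JSON. It is not the authors' platform or any community standard: the `ExperimentRecord` class, its field names and the sample values are all hypothetical and chosen only for illustration.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ExperimentRecord:
    """One synthesis attempt. All field names are hypothetical, not a standard."""
    experiment_id: str
    target_compound: str
    temperature_celsius: float
    solvent: str
    outcome: str                      # "success", "partial" or "failed"
    yield_percent: Optional[float] = None
    notes: str = ""


def to_machine_readable(records):
    """Serialize every record, failed attempts included, to a JSON string."""
    return json.dumps([asdict(r) for r in records], indent=2, sort_keys=True)


# Made-up sample data: one failed, one partial, one successful attempt.
records = [
    ExperimentRecord("exp-001", "compound-A", 80.0, "ethanol", "failed",
                     None, "no product observed"),
    ExperimentRecord("exp-002", "compound-A", 100.0, "ethanol", "partial", 12.5),
    ExperimentRecord("exp-003", "compound-A", 120.0, "water", "success", 78.0),
]

exported = to_machine_readable(records)

# A downstream tool can reload the file and still see the failed conditions.
reloaded = json.loads(exported)
failed = [r for r in reloaded if r["outcome"] == "failed"]
```

Because the failed attempt is stored with the same fields as the successful one, a model trained on such records would see conditions that do not work as well as those that do.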

Berend Smit adds:

“Our perspective offers insight into what we believe to be the key elements in bridging the gap between data and machine learning for fundamental problems in chemistry. We also provide an open science solution where EPFL can lead by example.”

Source:

Kevin Maik Jablonka, Luc Patiny, Berend Smit. Making the collective knowledge of chemistry open and machine-actionable. Nature Chemistry, April 4, 2022. DOI: 10.1038/s41557-022-00910-7
