Summary: Augmenting the artificial intelligence GPT-4 with extra chemistry knowledge made it much better at planning chemistry experiments, but it refused to make heroin or sarin gas

Language models that power chatbots like ChatGPT can be used for automated chemistry, from synthesising chemicals and discovering drugs to designing, planning and carrying out scientific experiments.

Large language models like GPT-4 have been trained on data from much of the internet and appear to be competent at answering questions from a wide range of disciplines, but they can struggle with tasks requiring more expert knowledge, such as those in chemistry.

“They lack this chemical knowledge and they are not really good at representing molecules,” says Philippe Schwaller at the Swiss Federal Institute of Technology in Lausanne.

To make GPT-4 a better chemist, Schwaller and his team augmented it with the ability to search through libraries of molecules, chemical reactions and scientific research. “This basically makes it possible for the language models to automatically query those tools while solving a task and get much more specific information, and then be a lot more accurate on the chemistry tasks,” says Schwaller.
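The tool-querying loop Schwaller describes can be pictured with a minimal sketch. This is not ChemCrow's code: the model here is a hard-coded stand-in rather than GPT-4, and every name in it (`lookup_smiles`, `TOY_MOLECULES`, `stub_model`, `run_agent`, the `CALL`/`ANSWER` message format) is a hypothetical chosen for illustration. It assumes only that the model can ask for a named tool, read the result, and then answer.

```python
from typing import Callable, Dict

# Hypothetical two-entry table standing in for a real molecule library.
TOY_MOLECULES = {"water": "O", "ethanol": "CCO"}


def lookup_smiles(name: str) -> str:
    """Hypothetical tool: return the SMILES string for a known molecule name."""
    return TOY_MOLECULES.get(name.lower(), "unknown")


# Registry of tools the model is allowed to call, keyed by name.
TOOLS: Dict[str, Callable[[str], str]] = {"lookup_smiles": lookup_smiles}


def stub_model(prompt: str) -> str:
    """Stand-in for a language model (an assumption, not a real API).

    It requests a tool call until the prompt contains an observation,
    then answers using that observation.
    """
    if "Observation:" not in prompt:
        return "CALL lookup_smiles ethanol"
    observation = prompt.rsplit("Observation:", 1)[1].strip()
    return f"ANSWER The SMILES string for ethanol is {observation}"


def run_agent(task: str, model: Callable[[str], str] = stub_model,
              max_steps: int = 5) -> str:
    """Alternate between the model and its tools until the model answers."""
    prompt = f"Task: {task}"
    for _ in range(max_steps):
        reply = model(prompt)
        if reply.startswith("ANSWER "):
            return reply[len("ANSWER "):]
        # Expected form: "CALL <tool_name> <argument>"
        _, tool_name, argument = reply.split(" ", 2)
        result = TOOLS[tool_name](argument)
        # Feed the tool result back so the next model call can use it.
        prompt += f"\nObservation: {result}"
    return "No answer within the step limit"


print(run_agent("What is the SMILES string for ethanol?"))
```

The point of the design is that the specific fact comes from the lookup rather than from the model's memory, which is the kind of grounding Schwaller credits for the gain in accuracy.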

The researchers tested this augmented AI, dubbed ChemCrow, on 12 common chemistry tasks, such as synthesising the drug atorvastatin, a common medication for high cholesterol, and calculating how much the ingredients would cost. They also gave the same tasks to the regular version of GPT-4, then asked chemists to evaluate the feasibility of both AIs’ plans.

For the atorvastatin task, GPT-4 failed to come up with a viable synthesis, while ChemCrow produced a workable seven-step plan, including quantities, timings and lab conditions.

On average, ChemCrow scored more than 9 out of 10 for completing the 12 requests, according to human evaluators, but sometimes failed on tasks like judging whether a synthesis method was novel or toxic. By comparison, GPT-4 scored less than 7 out of 10.

The evaluators were also asked to judge whether the AIs provided factually accurate information, and again ChemCrow scored much higher – more than 9 out of 10 versus less than 5 for GPT-4.

In a separate study, Gabriel Gomes at Carnegie Mellon University in Pennsylvania and his colleagues augmented GPT-4 with chemistry tools, similar to ChemCrow, but also supplied it with the documentation and software interface of a remotely controlled chemistry lab that had various liquid compounds attached to robotic arms and plates.

They then asked it to perform specific reactions using the liquids and found that it could draft a workable plan and carry out actions to produce the required compounds.

Gomes and his team also asked the language model to come up with plans for making illegal or dangerous substances, such as heroin or sarin gas, but the model refused. Schwaller says tools like ChemCrow could help democratise access to chemistry and lower the barrier to entry for people without significant scientific experience, but there is also a risk that the AI’s suggestions could lead to accidents or to harmful compounds being created.

However, many recipes for synthesising dangerous compounds are already available via web searches, says Ross King at the University of Cambridge. “You can get public domain tools to help you do that sort of thing if you were really determined to try to synthesise something illegal or dangerous.”

There is also the problem of hallucination, which is when language models fabricate seemingly plausible information, but this might not matter for AI chemistry, says King. “If you actually have a new novel hypothesis, and if it’s been hallucinated by a machine, it doesn’t really matter how it’s generated, as long as when you do the scientific experiments it’s consistent with reality and actually works.”

References: arxiv.org/abs/2304.05376 and arxiv.org/abs/2304.05332