Author Details
ACN Newswire

Unearthing experimental data buried in scientific papers

BriefingWire.com, 1/07/2026 - Large language models accelerate construction of materials property databases.

TSUKUBA, Japan, Jan 8, 2026 - (ACN Newswire) - Technologies that underpin modern society, such as smartphones and automobiles, rely on a diverse range of functional materials. Materials scientists are therefore working to develop new materials and improve existing ones, but predicting material properties is no simple task. Data science is key to transforming this field, and new tools powered by artificial intelligence are expected to accelerate the exploration, collection, and management of materials property data worldwide.

The relationship between functional materials and their properties is complex. Even slight differences in composition or synthesis methods can affect electronic states and microstructures, often resulting in entirely different properties. For this reason, theoretical models alone cannot provide reliable predictions, and the intuition of researchers and engineers built on years of experience has played a significant role.

Machine learning learns empirical trends from data rather than relying on theory. Applied to experimental data in materials science, it may be able to replicate such intuition computationally. Large language models (LLMs), such as ChatGPT, now support the daily lives of many people and are capable of flexible information extraction that takes background knowledge and context into account. This opens up the possibility of automating the conversion of complex information sources, such as scientific papers, into structured data. Large-scale experimental datasets built this way are expected to give researchers a bird's-eye view of the data that can inspire new ideas, and to enable property predictions based on empirical trends using machine learning.

A team led by Dr. Yukari Katsura, a Senior Researcher at the National Institute for Materials Science (NIMS), has focused on this potential and developed two new tools to accelerate the construction of Starrydata, a materials property database built from data collected from scientific papers. This work was recently published in the journal Science and Technology of Advanced Materials: Methods.

"Graphs in the millions of papers published to date contain valuable experimental data collected by past researchers, and much of it remains untapped," says Prof. Katsura. In the Starrydata project, which she launched in 2015, data collection from papers was performed manually and supported by the independently developed Starrydata2 web system, successfully amassing an unprecedented volume of experimental data. The new tools are designed to further streamline this data collection process. "We found that by specifying a data structure and giving instructions to an LLM, we can accurately and comprehensively extract information about figures, tables, and samples from the text of paper PDFs across a wide range of fields."

Prof. Katsura added, "Many publishers prohibit the use of artificial intelligence on paper PDFs, so we are currently developing the system to target open-access papers."
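
The approach described above, specifying a data structure and giving instructions to an LLM, can be illustrated with a minimal Python sketch. It is not the team's implementation: the `FigureRecord` schema, its field names, the prompt wording, and the helper functions `build_prompt` and `parse_response` are all hypothetical, and no particular LLM service is called. The sketch only shows how a target structure might be declared, embedded in an instruction, and used to check the model's reply.

```python
import json
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class FigureRecord:
    """One figure described in a paper (hypothetical schema, for illustration)."""
    label: str                     # figure label as printed, e.g. "Fig. 1"
    x_axis: str                    # quantity plotted on the x axis
    y_axis: str                    # quantity plotted on the y axis
    sample_names: List[str] = field(default_factory=list)


def build_prompt(paper_text: str) -> str:
    """Combine the instructions, the target data structure, and the paper text."""
    keys = [f.name for f in fields(FigureRecord)]
    return (
        "Extract every figure described in the paper below.\n"
        f"Return only a JSON list of objects with exactly these keys: {keys}.\n"
        "Use an empty list for sample_names if no samples are named.\n\n"
        f"PAPER TEXT:\n{paper_text}"
    )


def parse_response(raw: str) -> List[FigureRecord]:
    """Check the model's JSON reply against the schema and build records."""
    expected = {f.name for f in fields(FigureRecord)}
    records = []
    for item in json.loads(raw):
        if set(item) != expected:
            # Reject replies that add or omit keys instead of guessing.
            raise ValueError(f"unexpected keys: {sorted(set(item) ^ expected)}")
        records.append(FigureRecord(**item))
    return records


if __name__ == "__main__":
    # Made-up reply standing in for an LLM's output; not real extracted data.
    reply = (
        '[{"label": "Fig. 1", "x_axis": "Temperature (K)", '
        '"y_axis": "Electrical resistivity", "sample_names": ["Sample A"]}]'
    )
    for record in parse_response(reply):
        print(record)
```

In such a setup, the string returned by `build_prompt` would be sent to whichever model is in use, and the reply passed to `parse_response`. Rejecting replies whose keys do not match the declared structure is one simple way to keep malformed output out of a database.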
