- Promotor: Prof. Geert Verdoolaege
- Supervisor: Jerome Alhage
- Study programs: Master of Science in Engineering Physics, Master of Science in Physics and Astronomy, Master of Science in Teaching in Science and Technology (Physics and Astronomy), European Master of Science in Nuclear Fusion and Engineering Physics, Master of Science in Information Engineering Technology
- Location: Technicum, at home
Problem setting
Worldwide research on fusion energy aims to realize a source of electricity that is limitless, clean and safe: a large-scale enterprise that mimics the power of the stars. Ghent University is involved in fusion R&D by developing data science methods and numerical models, used both to enhance the scientific understanding of magnetically confined plasmas and to advance the complex technology of fusion devices.
Fusion energy research is also a long-term collaboration of specialists with various areas of expertise: technicians, engineers, experiment designers, analysts, theoreticians and managers. The volume of published work, both internal and in research papers, is substantial, and finding specific information can be a serious effort for individual researchers. Recent developments in large language models have made systems that answer questions about documents in natural language commonplace. Since much of the scientific information in fusion research occurs in graphical form (plots, annotated images, or industry-standard diagrams), understanding image data would be a considerable asset for such a tool. These so-called multi-modal systems typically use a smaller image-to-text neural network to map figures to a common latent space. The more powerful LLM then processes this additional data to answer questions using information presented in figures, to search for images in documents using natural language, or to classify figures based on context and domain expertise.
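As a minimal sketch of that shared latent space, assuming the openly available CLIP checkpoint on Hugging Face (the model name and file path below are placeholders, not project choices), one can score a figure against candidate natural-language descriptions:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("figure.png")  # e.g. a plot extracted from a fusion paper
candidates = ["electron temperature profile", "poloidal cross-section of a tokamak"]

inputs = processor(text=candidates, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)

# Image and texts live in the same latent space; a higher probability means a
# better match, which is the basis for natural-language figure search.
print(out.logits_per_image.softmax(dim=-1))
```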
Objectives
The goal of this project is to create an expert system that can understand the figures of fusion research. Using the captions produced by this model, together with the surrounding text, a future user will be able to ask the chatbot to look for a figure and to answer questions about it. This work is in line with ongoing conversational AI projects in the fusion community [1, 2].
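To make the captioning step concrete, the sketch below runs a pretrained BLIP captioner on a single figure via Hugging Face transformers; the checkpoint and file path are illustrative assumptions, not project decisions.

```python
from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("figures/example_plot.png").convert("RGB")  # placeholder path
inputs = processor(images=image, return_tensors="pt")
ids = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(ids[0], skip_special_tokens=True))
```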
Tasks
- Create a dataset of fusion physics images and their captions.
- Fine-tune a vision-language model on this dataset and generate captions (a fine-tuning and evaluation sketch follows this list).
- Evaluate performance (e.g. sentence similarity between generated and reference captions).
- [optional] Improve generated output with image segmentation.
- [optional] Enrich generated output with OCR for annotations.
- Report on findings.
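For orientation, here is a minimal fine-tuning sketch built on the Hugging Face BLIP implementation; the dataset layout, checkpoint and hyperparameters are assumptions to be adapted to the actual fusion figure dataset.

```python
import torch
from PIL import Image
from torch.utils.data import DataLoader, Dataset
from transformers import BlipForConditionalGeneration, BlipProcessor

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

class FigureCaptionDataset(Dataset):
    """Pairs of (figure image path, reference caption) from the curated dataset."""
    def __init__(self, samples):
        self.samples = samples  # list of (image_path, caption) tuples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, caption = self.samples[idx]
        enc = processor(images=Image.open(path).convert("RGB"), text=caption,
                        padding="max_length", truncation=True, return_tensors="pt")
        return {k: v.squeeze(0) for k, v in enc.items()}  # drop the batch dim

# Hypothetical sample; replace with the real (image, caption) pairs.
train_set = FigureCaptionDataset([("figures/fig1.png", "Electron density profile vs. radius.")])
loader = DataLoader(train_set, batch_size=4, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

model.train()
for epoch in range(3):
    for batch in loader:
        # BLIP returns the captioning (cross-entropy) loss when labels are given.
        out = model(input_ids=batch["input_ids"],
                    attention_mask=batch["attention_mask"],
                    pixel_values=batch["pixel_values"],
                    labels=batch["input_ids"])
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

Sentence similarity, mentioned above as one possible evaluation, could then be computed with an off-the-shelf sentence encoder such as sentence-transformers (the model name below is an assumption):

```python
from sentence_transformers import SentenceTransformer, util

scorer = SentenceTransformer("all-MiniLM-L6-v2")  # assumed general-purpose encoder
generated = ["Time trace of the plasma current during a disruption."]
reference = ["Plasma current evolution in a disruptive discharge."]

# Cosine similarity between embeddings; values near 1 indicate close captions.
sim = util.cos_sim(scorer.encode(generated, convert_to_tensor=True),
                   scorer.encode(reference, convert_to_tensor=True))
print(sim)
```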
Practical
- You may use Google Colab or UGent’s HPC infrastructure if needed.
- The university network will give you access to research papers.
Depending on the student's background (emphasis on physics or on computer science), the thesis can focus on the chatbot's correct interpretation of physics results, or on the technical aspects of the tool. Hence, students with a limited physics background but the right computer skills are very welcome too.
Resources
- BLIP: [paper](https://arxiv.org/abs/2201.12086), [code](https://opensource.salesforce.com/LAVIS//latest/tutorial.training-example.html)
- mPLUG: [paper](https://arxiv.org/abs/2205.12005), [code](https://github.com/alibaba/AliceMind/tree/main/mPLUG#image-captioning)
- ViT: [paper](https://arxiv.org/abs/2010.11929), [code](https://github.com/TheTensorDude/vision_transformer_tf#finetuning-on-custom-dataset)
- CLIP: [paper](https://arxiv.org/abs/2103.00020), [code](https://github.com/openai/CLIP/issues/83)
References
[1] V. Mehta, Towards LLMs as Operational Copilots for Fusion Reactors (2023). [link](https://openreview.net/pdf?id=yGVChrbJ4E)
[2] F. Almuhisen, Towards Tokamak Operations Conversational AI Interface Using Multimodal Large Language Models (2024). [link](https://indico.euro-fusion.org/event/3118/contributions/12289/attachments/5768/10110/Kick-off-PRIO-IA-CEA_21_5_meeting.pdf)