Escuela Técnica Superior de Ingenieros en Topografía, Geodesia y Cartografía

Harnessing large language models to build knowledge graphs from earthquake news

Luis M. Vilches-Blázquez, a professor at our School, together with researchers from the Centro de Investigación en Computación of the Instituto Politécnico Nacional (Mexico), has published this interesting article. Congratulations!

Autores: Luis Roberto Polo-Bautista, Luis M. Vilches-Blázquez, Sandra Dinora Orantes-Jiménez

Participating institutions: Instituto Politécnico Nacional, Centro de Investigación en Computación, Mexico City, Mexico; Ontology Engineering Group, Universidad Politécnica de Madrid, Madrid, Spain

Journal: International Journal of Digital Earth

Link to this article: https://doi.org/10.1080/17538947.2025.2594950

ABSTRACT

Earthquakes are geological phenomena that have a significant impact on human lives, infrastructure, and economies. Although numerous seismic events are documented annually, only a fraction of them cause substantial damage. The media have been instrumental in collecting information on seismic events worldwide. However, existing databases often lack specific details. Consequently, scholars turn to newspaper archives to learn about aspects related to these events. Knowledge Graphs (KGs) have emerged to represent and retrieve data in a systematic way, playing a key role in artificial intelligence applications. Recent studies have explored the use of Large Language Models (LLMs) to construct KGs, but problems such as inconsistency and lack of precision persist. In this paper, a zero-shot learning approach is proposed to semi-automatically build KGs by leveraging natural language processing and semantic analysis techniques. The proposed approach aims to improve entity and relation detection while minimizing manual intervention. In addition, a set of metrics has been defined to evaluate the quality of the generated KGs, focusing on factual accuracy and semantic consistency. The effectiveness of the proposed approach is evaluated using a GDELT corpus, which contains 2,241 Spanish-language news articles on seismic events in Mexico from 2017 to 2021.
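The abstract describes building KGs from entities and relations that an LLM extracts without task-specific training (zero-shot). As a rough illustration only, here is a minimal sketch that assumes a hypothetical prompt format in which the model returns one "subject | relation | object" triple per line. The parse_triples and build_graph helpers and the sample text are invented for this example; they are not the authors' pipeline or their evaluation metrics.

```python
# Hypothetical sketch: turn zero-shot LLM output into a small knowledge graph.
# ASSUMPTION: the model was prompted to answer with one
# "subject | relation | object" triple per line. This format and all names
# here are illustrative, not the method described in the article.
from collections import defaultdict


def parse_triples(llm_output: str) -> list[tuple[str, str, str]]:
    """Parse 'subject | relation | object' lines, skipping malformed ones."""
    triples = []
    for line in llm_output.splitlines():
        parts = [p.strip() for p in line.split("|")]
        # Keep only lines with exactly three non-empty fields.
        if len(parts) == 3 and all(parts):
            triples.append((parts[0], parts[1], parts[2]))
    return triples


def build_graph(triples):
    """Build a directed graph as an adjacency map, dropping duplicate triples."""
    graph = defaultdict(set)  # subject -> {(relation, object), ...}
    nodes = set()
    for subj, rel, obj in triples:
        graph[subj].add((rel, obj))  # a set silently ignores repeated edges
        nodes.update((subj, obj))
    edge_count = sum(len(edges) for edges in graph.values())
    return graph, nodes, edge_count


if __name__ == "__main__":
    sample = (
        "Earthquake | occurred_in | Puebla\n"
        "Earthquake | has_magnitude | 7.1\n"
        "Earthquake | occurred_in | Puebla\n"  # duplicate triple
        "unstructured text"                      # malformed line
    )
    graph, nodes, edge_count = build_graph(parse_triples(sample))
    print(len(nodes), "nodes,", edge_count, "edges")  # → 3 nodes, 2 edges
```

Discarding malformed lines and duplicate triples is one simple way to cope with the inconsistent LLM output mentioned in the abstract; the approach and quality metrics actually used by the authors are described in the article itself.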

KEYWORDS

Knowledge graphs; large language models; zero-shot approach; knowledge extraction; earthquake news

To cite this article: Luis Roberto Polo-Bautista, Luis M. Vilches-Blázquez & Sandra Dinora Orantes-Jiménez (2025) Harnessing large language models to build knowledge graphs from earthquake news, International Journal of Digital Earth, 18:2, 2594950, DOI: 10.1080/17538947.2025.2594950

FIGURES

Figure 1. Semi-automatic construction process of KGs

Figure 2. Visualizing 2% of each graph. Note: (A) The Llama 3.1 subgraph comprises 106 nodes and 352 edges. (B) The OLMO subgraph consists of 186 nodes and 576 edges. (C) The Gemma 2 subgraph is made up of 341 nodes and 950 edges. (D) The GPT-4o subgraph is composed of 257 nodes and 831 edges.