Following the European Commission's review of the LUCIA project (Understanding Lung Cancer related risk factors and their Impact Assessment) this past year, carried out with the participation of external experts Ildiko Kissné Horváth (linked to the Ministry of Human Resources, Semmelweis University, and the Koranyi National Institute of Pulmonology) and Raouf Naguib (BIOCORE Research and Consultancy), we describe below some of the project's most significant aspects across its different lines of work.
In lung cancer, the most exciting advances no longer come solely from a new test, an isolated biomarker, or an algorithm that works well in the lab. What's starting to make a difference is the ability to connect elements that typically progress separately, and that's what makes the LUCIA project particularly interesting.
The project brings together in a single initiative several lines that are currently gaining importance in lung cancer research: more sophisticated risk stratification models, non-invasive technologies for screening and diagnosis, multi-omics analysis, artificial intelligence applied to longitudinal data, and a data infrastructure designed so that all of this can coexist in an interoperable and traceable way.
From Fragmented Data to a Common Architecture
One of LUCIA's pillars is its technological foundation. The project has developed a Health Data Platform (HDP) that acts as a central repository for data, models, and analytical tools.
Alongside it, a Virtual Research Environment (VRE) provides data navigation, notebooks, model inference, image modules, GeoAI components, and visual analytics of risk factors.
Here, interoperability is not considered a last-minute technical add-on, but rather a starting point. Therefore, the project has worked with OMOP-CDM to map retrospective cohorts and the unified prospective dataset, especially in areas such as diagnoses, prescriptions, and laboratory results.
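As a rough illustration of what this kind of mapping involves, the sketch below routes source records into three OMOP-CDM tables (condition_occurrence, drug_exposure, and measurement). The source record layout, the CONCEPT_MAP lookup, and the concept IDs are hypothetical placeholders, not LUCIA's actual mapping; a real ETL would resolve codes against the OMOP standardized vocabularies.

```python
from datetime import date

# Hypothetical lookup from (source vocabulary, source code) to an OMOP
# concept_id. The IDs below are placeholders, not real vocabulary entries.
CONCEPT_MAP = {
    ("ICD10", "C34.9"): 1001,
    ("ATC", "R03AC02"): 2001,
    ("LOCAL_LAB", "CRP"): 3001,
}


def to_omop(record):
    """Map one source record (hypothetical layout) to an (OMOP table, row) pair."""
    # OMOP uses concept_id 0 for source codes with no matching concept.
    concept_id = CONCEPT_MAP.get((record["vocabulary"], record["code"]), 0)
    person_id, day = record["patient"], record["date"]
    if record["kind"] == "diagnosis":
        return "condition_occurrence", {
            "person_id": person_id,
            "condition_concept_id": concept_id,
            "condition_start_date": day,
        }
    if record["kind"] == "prescription":
        return "drug_exposure", {
            "person_id": person_id,
            "drug_concept_id": concept_id,
            "drug_exposure_start_date": day,
        }
    if record["kind"] == "lab":
        return "measurement", {
            "person_id": person_id,
            "measurement_concept_id": concept_id,
            "measurement_date": day,
            "value_as_number": record["value"],
        }
    raise ValueError(f"unsupported record kind: {record['kind']}")


def load(records):
    """Group mapped rows by their target OMOP table."""
    tables = {"condition_occurrence": [], "drug_exposure": [], "measurement": []}
    for record in records:
        table, row = to_omop(record)
        tables[table].append(row)
    return tables
```

Keeping unmapped codes with concept_id 0 instead of dropping them preserves the record for later review, which is one common way to keep the mapping traceable.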
Alignment with the European Health Data Space (EHDS) has also been sought through HealthDCAT-AP. This is key for reusing data, sharing information in a structured way, and scaling this type of initiative within the European context.
Simply put: LUCIA not only generates knowledge but also creates the structure needed to organize it.
Risk Models Closer to Clinical Practice
Another important contribution of the project lies in the risk algorithms based on electronic health records. In contrast to approaches relying almost exclusively on traditional variables, LUCIA works with structured temporal data in OMOP format to predict the incidence of lung cancer within one year. In this context, different approaches are compared: Logistic Regression, RNN, STraTS-no time, and STraTS.
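To make the one-year prediction target concrete, here is a minimal sketch of how such a label could be built from dated records. The text does not specify how LUCIA defines its index date or outcome window, so the function names, the 365-day horizon, and the event layout are assumptions for illustration only.

```python
from datetime import date, timedelta


def one_year_label(index_date, diagnosis_date, horizon_days=365):
    """Return 1 if a diagnosis falls within the horizon after index_date, else 0."""
    if diagnosis_date is None:
        return 0  # no diagnosis recorded for this person
    window_end = index_date + timedelta(days=horizon_days)
    return int(index_date < diagnosis_date <= window_end)


def history_before(events, index_date):
    """Keep only events observed up to the index date, so no future data leaks in."""
    return [event for event in events if event["date"] <= index_date]
```

Whatever model is used (logistic regression, an RNN, or a Transformer), it would see only the output of history_before as input and one_year_label as the target.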
Transformers such as STraTS are being adopted in clinical research in place of RNNs because RNNs have several structural limitations when applied to electronic health records (EHRs) and complex longitudinal data, whereas event-based records of this kind adapt very well to token/event architectures.
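As we understand the published STraTS approach, a patient's history is represented as a set of (time, variable, value) observation triplets instead of a dense, regularly sampled grid. The sketch below shows only that data-shaping step, not the model itself; the event layout, the use of days before the index date as the time axis, and the default value of 1.0 for events without a numeric value are assumptions made for illustration.

```python
from datetime import date


def to_triplets(events, index_date):
    """Turn dated events into (days_before_index, variable, value) triplets, oldest first."""
    triplets = []
    for event in events:
        if event["date"] > index_date:
            continue  # skip anything recorded after the prediction point
        days_before = (index_date - event["date"]).days
        # Events without a numeric value (e.g. a diagnosis) get 1.0 as a presence flag.
        value = float(event.get("value", 1.0))
        triplets.append((days_before, event["variable"], value))
    # Larger days_before means older, so sort descending to get chronological order.
    return sorted(triplets, key=lambda triplet: -triplet[0])
```

Because each observation carries its own timestamp, irregular gaps between visits need no imputation or resampling, which is one reason this representation suits EHR data.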
Beyond the numbers, the interesting aspect is what it suggests: when the medical record is well-structured, it can reveal risk signals that do not always appear when only traditional variables such as age or smoking are considered.
Clinical, but Also Environmental and Territorial Risk
LUCIA further expands the concept of risk by incorporating a layer that is often left out of the analysis: the environmental and geospatial context. Among the factors that appear as relevant are radon, PM2.5 particles, land cover, access to green spaces, and deprivation indices.
Non-Invasive Technologies That Are Already Generating Real Data
One of the most striking sections of the project is the one dedicated to non-invasive sensors for screening and diagnosis. Here, LUCIA doesn't stop at a promising idea; it deploys concrete technologies and generates data at a scale sufficient to be analyzed with advanced models. These technologies include the Breath Analyzer (BAN), the Wide-spectrum-biomarker multi-use sensing patch (WBSP), and Spectrometry-on-card (SPOC).
Biomarkers and Multi-Omics to Refine Stratification
LUCIA is also working on another of the most active fronts of the moment: biomarkers and multi-omics analysis.
The project includes models based on proteomics, methylation, and metabolomics. In addition, there is a line of research that integrates genetic and epigenetic variation using artificial intelligence: combining methylation with genetic (DNA) signals improves performance compared to using methylation alone. In this way, multi-omics is treated as one more piece within an integrated approach to risk.
An Innovation That Also Considers Traceability and Regulation
Another strength of LUCIA is that it does not separate technological development from its future application. The platform includes components for managing the AI lifecycle, experiment tracking, model traceability, and tools to manage the entire analytical process. Furthermore, the project incorporates, from the design stage, considerations related to Trustworthy AI, the AI Act, GDPR, and CEN-CENELEC standards.
At a time when innovation in health has to demonstrate not only performance but also robustness, explainability, and regulatory compliance, this component takes on special relevance.
Our Role Within This Infrastructure
Within this architecture, we participate in the digital ecosystem that enables the capture, organization, and utilization of project information.
Specifically, the prospective study relies on the electronic case report form (eCRF) integrated into the LUCIA platform. This tool structures clinical data collection consistently and facilitates its subsequent use in advanced analyses and model development.
At BILBOMÁTICA, we also contribute extensive experience in the design and development of information systems that work with clinical data from diverse sources and of varying nature. This expertise has been built through numerous projects in the healthcare sector, not only in lung cancer but also in other areas such as breast cancer.
Our work focuses on facilitating the integration, structuring, and consistent use of this data, often generated in very different systems, for research and clinical decision-making.
A key aspect of this approach is interoperability. We have experience implementing HL7 standards, including FHIR, which are essential for the secure and understandable exchange of clinical information between systems, institutions, and research platforms.
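For readers unfamiliar with FHIR, the sketch below builds a minimal Observation resource as a plain Python dictionary and serializes it to JSON. It is a generic example of the standard's structure, not code from the LUCIA platform; the helper name, the patient identifier, and the choice of body weight as the measurement are illustrative assumptions.

```python
import json


def body_weight_observation(patient_id, kilograms, measured_on):
    """Build a minimal FHIR Observation for body weight (illustrative helper)."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {
            "coding": [
                {
                    "system": "http://loinc.org",
                    "code": "29463-7",  # LOINC code for body weight
                    "display": "Body weight",
                }
            ]
        },
        "subject": {"reference": f"Patient/{patient_id}"},
        "effectiveDateTime": measured_on,
        "valueQuantity": {
            "value": kilograms,
            "unit": "kg",
            "system": "http://unitsofmeasure.org",  # UCUM units
            "code": "kg",
        },
    }


# Hypothetical patient identifier and values, for demonstration only.
payload = json.dumps(body_weight_observation("example-1", 72.5, "2024-03-01"), indent=2)
```

Because the code, unit, and subject are expressed with shared vocabularies and references, a receiving system can interpret the measurement without knowing anything about the system that produced it.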
This expertise allows tools like the eCRF to be more than just a data capture mechanism: they become an integral part of a broader infrastructure designed to harmonize clinical information, support advanced analytical models, and facilitate data reuse in research environments.
What Makes LUCIA Different
What truly sets LUCIA apart is not a single technology but the way it connects several capabilities that rarely appear together at this level of detail: mapping to OMOP-CDM, alignment with EHDS and HealthDCAT-AP, risk algorithms like STraTS, environmental and geospatial models, non-invasive sensors analyzed with explainable deep learning, and a technological foundation designed for traceability, validation, and future deployment.
In a field where many initiatives are progressing separately, LUCIA brings something more difficult to build: a true logic of integration.
In lung cancer research, this integration can be as important as any of the individual innovations that comprise it.
The progress achieved in the LUCIA project has been made possible by the collaborative work of the entire consortium. We would like to thank all the partners, whose involvement, effort, and collaboration have been key to achieving these results.
The positive evaluation received during the review reflects not only the scientific and technological rigor of the solutions developed, but also the consortium's ability to work in a coordinated manner and move toward a common goal.
Now, as we reach the end of the project, we maintain that same spirit of collaboration. Our objective is to conclude this phase by consolidating high-quality results and contributing new knowledge that will help improve the diagnosis and understanding of lung cancer.
The LUCIA project receives funding from the European Union's Horizon Europe research and innovation programme under grant agreement 101096473.