Abstract
Decrypting our genome can bring answers to many questions concerning our existence: why we look the way we do, how we live, and how we develop diseases. For decades, scientists have searched for the origins of disease in lifestyle, environmental exposure, and the genetic code we carry and pass down through generations. Nevertheless, some diseases, while clearly hinting at a genetic basis, remain enigmas. Among them is the neuromuscular disorder Amyotrophic Lateral Sclerosis (ALS). Now that the reading of the human genetic code has been completed, research efforts focused on collecting genetic data to solve this mystery have gained more momentum than ever. Yet the genome itself is a vast and complex data sequence, too intricate to be fully understood using traditional mathematical or statistical models. This complexity calls for more sophisticated approaches, such as those offered by deep learning.
This thesis recognizes the potential of deep learning models in solving the genetic puzzle of ALS, motivates the need to employ these methods in genetics research, and decomposes the obstacles to their use. The thesis is a collection of three chapters that study these obstacles and propose methodological solutions, with the broader aim of enhancing the utility of deep learning in genome research. To ultimately shed light on how genetic factors contribute to ALS, the genome must first be transformed into manageable data representations (Chapter 2). These representations must then be constructed in a way that retains the subtle genetic signals associated with the disease, rather than capturing only the most prominent genetic characteristics of an individual (Chapter 3). Finally, predicting who has the disease may not always align with what truly causes it: such predictions can be complicated by patterns in the genetic data that are irrelevant to the actual biology of the disease. An alternative is to model what differentiates a patient from a healthy person, and what genetic patterns are shared between two patients or two healthy individuals (Chapter 4).
Original language | English |
---|---|
Qualification | Doctor of Philosophy |
Awarding Institution | |
Supervisors/Advisors | |
Award date | 25 Jun 2025 |
Place of Publication | Tilburg |
Publisher | |
Print ISBNs | 978 90 5668 773 1 |
DOIs | |
Publication status | Published - 2025 |