Abstract
This dissertation addresses the challenge of making sense of extensive datasets collected from multiple sources, such as health habits and genetic information, when studying complex issues like depression. A data analysis method known as Principal Covariate Regression (PCovR) provides a strong basis for tackling this challenge.
Yet analyzing these intricate datasets is far from straightforward. The data often contain redundant and irrelevant variables, making it difficult to extract meaningful insights. Furthermore, the outcome variable may come in different types: for instance, depression could be recorded as a score on a depression scale or as a binary (yes/no) diagnosis from a medical professional. This adds another layer of complexity.
To overcome these obstacles, this dissertation proposes novel adaptations of PCovR. The methods automatically select important variables, separate insights that originate from a single source from those that span multiple sources, and accommodate various outcome variable types. Their effectiveness is demonstrated both in predicting outcomes and in revealing the subtle relationships within data from multiple sources.
Moreover, the dissertation offers a glimpse of future directions for enhancing PCovR. The implications of extending the method so that it selects important variables are critically examined, and an algorithm with the potential to yield optimal results is suggested.
In conclusion, this dissertation proposes methods to tackle the complexity of large data from multiple sources and points to where opportunities for future research may lie.
Original language | English |
---|---|
Qualification | Doctor of Philosophy |
Supervisors/Advisors | |
Award date | 17 Nov 2023 |
Place of Publication | s.l. |
Publisher | |
Publication status | Published - 17 Nov 2023 |