Deep Dive Into Gaussian Processes and Variable Importance
When telling stories with data, regression models help explain trends and predict outcomes for new observations. Linear regression models are often used to describe relationships between outcome and predictor variables because they offer well-defined methods for measuring each feature variable’s importance and the accuracy of the model as a whole. However, the story a linear model tells may be inaccurate when the outcome has nonlinear relationships with its predictor variables. In such cases a nonlinear regression model may be more appropriate and more accurate, yet these models lack easily defined metrics of variable importance. In this paper, I will apply a recently developed variable importance operator that measures a variable’s importance on a local level, for each observation, and on a global level, for a population. I will compare the accuracy of simple linear regression, traditional one-layer Gaussian processes, and two-layer Gaussian processes. Using each of the three models, I will measure variable importance in the context of relating gene expression levels in mice (predictors) to their observed traits (outcomes). The results of this study will offer a new perspective on how to eliminate the trade-off between model accuracy and interpretability when measuring variable importance. This could benefit both data science and its applications in biomedicine, for example in assessing a gene expression’s effect on disease progression across a whole population or for individuals within specific subpopulations.
Research Area | Presenter | Title | Keywords |
---|---|---|---|
Environmental Science and Sustainability | Saunders, Sam | Deep Dive Into Gaussian Processes and Variable Importance | Regression Analysis |