NA-MIC Project Weeks

AMP SCZ Combining baseline and longitudinal information for prediction of psychosis conversion

Key Investigators

Pablo Polosecki (IBM Research, USA)
Nora Penzel (MGH, USA)
Ofer Pasternak (MGH, USA)

Presenter location: In-person

Project Description

This project is part of the AMP SCZ program, an initiative for early detection of risk for schizophrenia.

A key goal of AMP SCZ is to predict which patients who initially present with mild or sub-threshold symptoms will eventually develop psychosis. Most predictive models are based on data acquired at the patient's first medical visit (the baseline visit). An important question is how much is gained by following patients over time (longitudinal data). Moreover, what is a principled way to combine baseline and longitudinal information?

In this project, we will implement predictive models that use both baseline and longitudinal information for psychosis prediction. It builds on a previous project, in which we implemented an approach called “joint modeling” that had important limitations. This time, we will implement a model based on a combination of two approaches:

Multiple kernel learning (MKL): a simple predictive model for the fusion of multiple modalities. MKL combines kernels (i.e., similarity measures between samples) computed from different modalities. Some modalities could be baseline measures, while others could be longitudinal trajectories.
Dynamic time warping (DTW): a way to estimate the dissimilarity or distance between trajectories, regardless of differences in the number of time points, sampling rate, or the existence of delays between them. It is simple to build kernels for MKL from DTW distances.
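
As an illustration of the DTW item above, the following is a minimal sketch, not the project's implementation. It computes the classic dynamic-programming DTW distance between two multivariate trajectories that may have different numbers of visits, then converts pairwise distances into a Gaussian-type similarity matrix. The function names `dtw_distance` and `dtw_kernel` and the bandwidth parameter `gamma` are hypothetical, and the Gaussian form is an assumption. Note that matrices built this way from DTW distances are not guaranteed to be positive semidefinite.

```python
import math


def dtw_distance(a, b):
    """Classic DTW between two trajectories given as lists of feature vectors.

    The trajectories may differ in length; each visit must have the same
    number of features. The local cost is the Euclidean distance.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats a visit
                cost[i][j - 1],      # b advances, a repeats a visit
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def dtw_kernel(trajectories, gamma=1.0):
    """Pairwise similarity exp(-gamma * d^2) from DTW distances (assumed form)."""
    n = len(trajectories)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            d = dtw_distance(trajectories[i], trajectories[j])
            K[i][j] = K[j][i] = math.exp(-gamma * d ** 2)
    return K


# Toy trajectories with one feature per visit and different numbers of visits.
visits_a = [[0.0], [1.0], [2.0]]
visits_b = [[0.0], [0.0], [1.0], [2.0]]  # same shape as visits_a, with a delay
visits_c = [[5.0], [5.0]]
K = dtw_kernel([visits_a, visits_b, visits_c], gamma=0.1)
```

In this toy example, `visits_b` is a delayed copy of `visits_a`, so their DTW distance is zero and their similarity is maximal, while `visits_c` is far from both.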

Objective

Implement a Python-based version of MKL-DTW longitudinal models that follows common machine learning best practices (separate train/test sets, scikit-learn-compatible methods).
Quantify the advantage of longitudinal models over baseline-only predictors in a legacy dataset.

Approach and Plan

Write a DTW-based estimator of kernel distances in Python.
Write an extension of the MKL package MKLpy that can integrate DTW kernels for longitudinal modalities with traditional kernels for baseline modalities.
Benchmark performance on a legacy dataset.
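
To make the plan above more concrete, here is a hedged sketch of how a baseline kernel and a longitudinal kernel might be combined while keeping train and test samples separate. It does not use MKLpy or reproduce its algorithms. Instead, it weights each kernel by its kernel-target alignment on the training samples only, which is a simple heuristic stand-in for learned MKL weights. All function names, the toy data, and the identity matrix standing in for a DTW-derived kernel are assumptions made for illustration.

```python
import numpy as np


def normalize_kernel(K):
    """Cosine-normalize a square kernel so that its diagonal equals 1."""
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)


def alignment(K, y):
    """Kernel-target alignment between K and the ideal kernel y y^T (y in {-1, +1})."""
    Y = np.outer(y, y)
    return np.sum(K * Y) / (np.linalg.norm(K) * np.linalg.norm(Y))


def combine_kernels(kernels, train_idx, y_train):
    """Weight each kernel by its alignment on training samples only, then sum."""
    scores = np.array([
        max(alignment(K[np.ix_(train_idx, train_idx)], y_train), 0.0)
        for K in kernels
    ])
    if scores.sum() == 0.0:
        # Fall back to uniform weights if no kernel aligns with the labels.
        scores = np.ones(len(kernels))
    weights = scores / scores.sum()
    K_comb = sum(w * K for w, K in zip(weights, kernels))
    return K_comb, weights


# Toy data: 6 subjects labeled +1 or -1.
y = np.array([1, 1, 1, -1, -1, -1])
x_base = np.array([[1.0], [0.9], [1.1], [-1.0], [-0.9], [-1.1]])
K_base = normalize_kernel(x_base @ x_base.T)  # linear kernel on one baseline feature
K_long = np.eye(6)  # placeholder for a DTW-derived longitudinal kernel

train_idx = np.array([0, 1, 3, 4])
test_idx = np.array([2, 5])

# Weights are estimated from training subjects only, so no test labels leak in.
K_comb, weights = combine_kernels([K_base, K_long], train_idx, y[train_idx])
K_train = K_comb[np.ix_(train_idx, train_idx)]  # (n_train, n_train), for fitting
K_test = K_comb[np.ix_(test_idx, train_idx)]    # (n_test, n_train), for prediction

# Deliberately simple kernel-vote classifier: sign of label-weighted similarity.
pred = np.sign(K_test @ y[train_idx])
```

The `(n_test, n_train)` layout of `K_test` matches what scikit-learn estimators with `kernel="precomputed"` expect at prediction time, so the kernel-vote line could be replaced by such an estimator fitted on `K_train`.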

Progress and Next Steps

We implemented a number of similarity measures for multivariate longitudinal sequences.
We implemented the extension of multiple kernel learning to use these kernels in longitudinal datasets.
We curated a dataset from a semi-public source (NIH) with cross-sectional and longitudinal information.
We tried using the curated dataset to validate the new prediction method. We found some issues with the samples, which we are currently fixing.

Next steps:

Fix the issues with the proposed dataset.
Find a new dataset to make longitudinal predictions in a clinically useful scenario (e.g., few visits).
