Motivated by analyzing long-term physiological time series, we design a robust and scalable spectral embedding algorithm that we refer to as RObust and Scalable Embedding via LANdmark Diffusion (Roseland). The key is designing a diffusion process on the dataset where the diffusion is done via a small subset called the landmark set. Roseland is theoretically justified under the manifold model, and its computational complexity is comparable with that of commonly applied subsampling schemes such as the Nyström extension. Specifically, when there are $n$ data points in $\mathbb{R}^q$ and $n^{\beta}$ points in the landmark set, where $\beta \in (0, 1)$, the computational complexity of Roseland is $O(n^{1+2\beta} + qn^{1+\beta})$, while that of Nyström is $O(n^{2.81\beta} + qn^{1+2\beta})$. To demonstrate the potential of Roseland, we apply it to three datasets and compare it with several existing algorithms. First, we apply Roseland to spectral clustering on the MNIST dataset (70,000 images), achieving 85% accuracy when the dataset is clean and 78% accuracy when the dataset is noisy. Overall, Roseland performs better than the other subsampling schemes. Second, we apply Roseland to image segmentation using images from COCO. Finally, we demonstrate how to apply Roseland to explore long-term arterial blood pressure waveform dynamics during a liver transplant operation lasting 12 hours. In conclusion, Roseland is scalable and robust, and it has potential for analyzing large datasets.
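The abstract describes the landmark diffusion only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' implementation. The Gaussian kernel, the `bandwidth` and `t` parameters, the degree normalization, and the function name `landmark_diffusion_embedding` are all illustrative choices that do not appear in the text above. The sketch shows how a diffusion routed through $m$ landmarks can be computed from an $n \times m$ affinity matrix without ever forming an $n \times n$ matrix.

```python
import numpy as np


def landmark_diffusion_embedding(X, landmarks, n_components=2, bandwidth=1.0, t=1):
    """Illustrative landmark-based diffusion embedding (assumed details).

    X         : (n, q) array of data points.
    landmarks : (m, q) array of landmark points, with m much smaller than n.
    Returns the (n, n_components) embedding and the eigenvalue estimates.
    """
    # Squared Euclidean distances between every point and every landmark (n x m).
    sq_dist = (
        np.sum(X**2, axis=1)[:, None]
        - 2.0 * X @ landmarks.T
        + np.sum(landmarks**2, axis=1)[None, :]
    )
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative round-off

    # Point-to-landmark affinity; the Gaussian kernel is an assumption.
    W = np.exp(-sq_dist / bandwidth)

    # Row sums of the implied n x n affinity W @ W.T, computed via two
    # matrix-vector products so the n x n matrix is never formed.
    degree = W @ (W.T @ np.ones(X.shape[0]))

    # Symmetric normalization D^{-1/2} W, followed by a thin SVD.
    W_norm = W / np.sqrt(degree)[:, None]
    U, s, _ = np.linalg.svd(W_norm, full_matrices=False)

    # Squared singular values are eigenvalues of D^{-1} W W^T; rescaling U by
    # D^{-1/2} gives the corresponding right eigenvectors.
    eigvals = s**2
    phi = U / np.sqrt(degree)[:, None]

    # Drop the trivial constant eigenvector (eigenvalue 1) and weight the rest.
    keep = slice(1, n_components + 1)
    embedding = phi[:, keep] * eigvals[keep] ** t
    return embedding, eigvals


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
landmarks = X[rng.choice(200, size=20, replace=False)]
embedding, eigvals = landmark_diffusion_embedding(X, landmarks)
```

In this sketch, building the affinity matrix costs on the order of $qnm$ operations and the thin SVD costs on the order of $nm^2$, which for $m = n^{\beta}$ is consistent with the $O(n^{1+2\beta} + qn^{1+\beta})$ figure quoted above.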
| Original language | English (US) |
| Journal | Journal of Machine Learning Research |
| State | Published - 2022 |
ASJC Scopus subject areas
- Control and Systems Engineering
- Statistics and Probability
- Artificial Intelligence