During the summer of 2013, I directed an NSF REU project at Cornell University on high-dimensional data analysis. Six students worked with me across five projects. Here is a summary of each project, along with each student’s final presentation and any other relevant information.

Organization of Historical Economic Data
Wendy Zeng (Cornell University)

Wendy worked on historical economic data, which consisted of a large number of economic indicators (e.g., GDP) across numerous country/year pairs (e.g., USA/2003). She was interested in which countries have similar economic patterns and which economic indicators are correlated. Wendy employed diffusion-based tree structures to organize both the country/years and the indicators, leveraging one organization against the other in a back-and-forth scheme that yielded some very cool low-dimensional geometric patterns.

Wendy’s blog for this project:
Final presentation:

Diffusion Maps for Dynamical Systems
Christian Smith (Macalester College)

Christian worked on a project involving dynamical systems, in particular the Lotka-Volterra equations used in biology to model the interaction of different species. Christian obtained a low-dimensional embedding of the dynamical system that organized trajectories according to their behavior. From there he studied how the low-dimensional embeddings changed as he adjusted the parameters of the system (the parameters govern how the species interact).
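As a rough illustration of the kind of data involved (this is not Christian's code), here is a minimal sketch that generates one trajectory. It assumes the classical two-species predator-prey form of the Lotka-Volterra equations, dx/dt = alpha*x - beta*x*y and dy/dt = delta*x*y - gamma*y; the project may have used a different variant. The function names, parameter values, and the choice of a Runge-Kutta integrator are all illustrative, and the embedding step is not shown.

```python
import math

def lotka_volterra(state, alpha, beta, delta, gamma):
    """Right-hand side of the assumed two-species predator-prey system."""
    x, y = state  # x = prey population, y = predator population
    return (alpha * x - beta * x * y, delta * x * y - gamma * y)

def rk4_trajectory(state, params, dt=0.01, steps=2000):
    """Integrate with classical fourth-order Runge-Kutta; returns a list of (x, y)."""
    def shifted(s, k, h):
        # Move the state s along the slope k by step h.
        return tuple(si + h * ki for si, ki in zip(s, k))

    trajectory = [state]
    for _ in range(steps):
        k1 = lotka_volterra(state, *params)
        k2 = lotka_volterra(shifted(state, k1, 0.5 * dt), *params)
        k3 = lotka_volterra(shifted(state, k2, 0.5 * dt), *params)
        k4 = lotka_volterra(shifted(state, k3, dt), *params)
        state = tuple(
            s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)
        )
        trajectory.append(state)
    return trajectory

def invariant(state, alpha, beta, delta, gamma):
    """Quantity conserved along exact solutions; useful as a sanity check."""
    x, y = state
    return delta * x - gamma * math.log(x) + beta * y - alpha * math.log(y)

# Hypothetical parameter values (alpha, beta, delta, gamma) and initial state.
params = (1.0, 0.5, 0.2, 0.6)
trajectory = rk4_trajectory((2.0, 1.0), params)
```

Rerunning `rk4_trajectory` over a grid of parameter values and initial conditions would produce a collection of trajectories of the sort one could then feed into an embedding method.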

Final presentation: zip

Time-Coupled Diffusion Maps
Nicholas Marshall (Clarkson University)

Nicholas studied the heat equation with a time-dependent Laplacian, which governs how heat spreads over a time-evolving manifold. Just as random walks can be used to approximate heat flow on a static manifold, Nick was interested in coming up with an analogous approximation for the time-dependent case. The solution was to use time-inhomogeneous Markov chains, which can be thought of as random walks whose transition probabilities change with each step. In the same spirit as the original diffusion maps of Coifman and Lafon, one can then embed dynamic high-dimensional data into a low-dimensional space via a nonlinear mapping, which Nick calls the time-coupled diffusion map.
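To make the idea of a time-inhomogeneous Markov chain concrete, here is a minimal sketch (not Nick's implementation). It assumes each time step supplies a snapshot of the same n points, builds a row-stochastic matrix from a plain Gaussian kernel for each snapshot, and multiplies these matrices in time order. The kernel normalization is a simplification of what diffusion maps normally use, the function names and the bandwidth `eps` are illustrative, and the embedding step itself is not shown.

```python
import math

def markov_matrix(points, eps):
    """Row-stochastic Gaussian-kernel transition matrix for one snapshot."""
    kernel = [
        [math.exp(-sum((a - b) ** 2 for a, b in zip(p, q)) / eps) for q in points]
        for p in points
    ]
    return [[k / sum(row) for k in row] for row in kernel]

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def time_coupled_operator(snapshots, eps):
    """Product P_1 P_2 ... P_T of the per-snapshot transition matrices.

    Convention assumed here: distributions are row vectors, so a walker
    starting from mu has distribution mu P_1 P_2 ... P_T after T steps.
    """
    n = len(snapshots[0])
    product = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for points in snapshots:
        product = matmul(product, markov_matrix(points, eps))
    return product

# Hypothetical data: three points on a line that drift apart over two steps.
snapshots = [
    [(0.0,), (1.0,), (2.0,)],
    [(0.0,), (1.5,), (3.0,)],
]
P = time_coupled_operator(snapshots, eps=1.0)
```

Each factor is a valid one-step transition matrix, so the product is again row-stochastic. When every snapshot is identical, the product reduces to a power of a single matrix, which is the static random walk.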

Final presentation: pdf
Nick’s website:

Automatic Gating in Flow-Cytometry Data
Keyi Wu (Cornell University)

Keyi studied the use of minimum path distances to organize flow cytometry data from cell biology. Through a process called gating, flow cytometry measurements are used to organize various cell populations for the diagnosis of medical disorders. Keyi has efficiently automated this gating process by clustering the data according to a specific type of minimum path distance that can isolate strangely shaped regions of high density, which usually correspond to specific populations. 
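The summary above does not say which path distance Keyi used. As one hedged example of the general idea, the sketch below uses the minimax (bottleneck) path distance, where the cost of a path is its longest single hop and the distance between two points is the smallest such cost over all paths. That choice is an assumption for illustration, as are the function names and the threshold. The cubic-time loop is only suitable for small point sets and is not the efficient algorithm from the project.

```python
import math

def minimax_path_distances(points):
    """All-pairs minimax path distance via a Floyd-Warshall style update."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Best path through k costs the larger of its two legs.
                via_k = max(d[i][k], d[k][j])
                if via_k < d[i][j]:
                    d[i][j] = via_k
    return d

def cluster(points, threshold):
    """Group points whose minimax path distance is at most the threshold."""
    d = minimax_path_distances(points)
    labels = [-1] * len(points)
    next_label = 0
    for i in range(len(points)):
        if labels[i] == -1:
            for j in range(len(points)):
                if d[i][j] <= threshold:
                    labels[j] = next_label
            next_label += 1
    return labels

# Hypothetical data: two elongated chains of points, far apart from each other.
chain_a = [(float(i), 0.0) for i in range(4)]
chain_b = [(float(i), 5.0) for i in range(4)]
labels = cluster(chain_a + chain_b, threshold=1.5)
```

The two ends of a chain are 3 units apart in Euclidean distance but only 1 unit apart in minimax path distance, because one can hop along the chain in unit steps. This is why such distances can follow long, oddly shaped dense regions that a Euclidean ball would cut apart.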

Final presentation: ppt

Smooth Interpolation
Ariel Herbert-Voss (University of Utah) and Frederick McCollum (University of Arkansas)

Ariel and Frederick worked together on a project involving interpolation of data by smooth functions. Given function values and partial derivatives, the goal was to compute the interpolant whose derivative has minimal Lipschitz constant. It turns out that one can construct such an interpolant (but not easily), and together Ariel and Frederick translated this construction into an efficient algorithm using a range of tools, including algorithms for computing convex hulls, power diagrams, triangulations, and tree structures.
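One way to write the problem down is sketched here, under some assumptions the summary does not spell out: the data live on a finite set $E \subset \mathbb{R}^d$, the prescribed derivatives are first-order (a gradient $g(x)$ at each point), and norms are Euclidean. The symbols $E$, $f$, $g$, and $F$ are notation chosen for this sketch.

```latex
\min_{F \in C^{1,1}(\mathbb{R}^d)} \operatorname{Lip}(\nabla F)
\quad \text{subject to} \quad
F(x) = f(x), \;\; \nabla F(x) = g(x) \;\; \text{for all } x \in E,
\qquad \text{where} \quad
\operatorname{Lip}(\nabla F) = \sup_{x \neq y} \frac{|\nabla F(x) - \nabla F(y)|}{|x - y|}.
```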

Part 1: Frederick’s final presentation: zip
Part 2: Ariel’s final presentation: pdf

Frederick’s website (code available here):
Ariel’s website:

This material is based upon work supported by the National Science Foundation under grant number NSF-1156350. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

© Matthew Hirn 2013-2015