Analysis of Spatial Single-Cell Datasets

Frederick National Lab

West Lafayette, IN   National Data Mine Network  

Background:

SPAC is a modular, end-to-end, web-accessible toolkit for analyzing spatial single-cell datasets derived from multiplexed whole-slide tissue, CODEX, and imaging mass cytometry (IMC) images. SPAC enables scientists and image analysts to build and configure scalable, fault-tolerant, multi-step analysis pipelines on the web and to share them with their collaborators with a single click.

A typical SPAC analysis involves data aggregation, data sampling from multiple slides, exploratory feature expression analysis, feature preprocessing and normalization, clustering, dimensionality reduction, cluster analysis, interactive spatial plots, and spatial statistics plots. SPAC users can set up their analysis pipelines without coding experience, visualize and interpret intermediate and final results of multi-step analyses, propose and test new hypotheses, and generate figures for presentations and publications.
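To make two of the steps above concrete (feature normalization followed by clustering), here is a minimal sketch in plain Python. It is not SPAC's API: the function names, the arcsinh cofactor of 5, the farthest-first initialization, and the toy marker intensities are all assumptions chosen for illustration, and dimensionality reduction and plotting are left out.

```python
import math

def arcsinh_normalize(matrix, cofactor=5.0):
    """Apply an arcsinh transform to every intensity (the cofactor is an assumed value)."""
    return [[math.asinh(v / cofactor) for v in row] for row in matrix]

def zscore_columns(matrix):
    """Scale each marker (column) to zero mean and unit standard deviation."""
    n = len(matrix)
    cols = list(zip(*matrix))
    means = [sum(c) / n for c in cols]
    # Fall back to 1.0 when a marker is constant, to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in matrix]

def kmeans(points, k, n_iter=20):
    """Plain k-means (Lloyd's algorithm); returns one cluster label per cell."""
    # Farthest-first initialization: deterministic, used here only for reproducibility.
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(farthest))
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign each cell to its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Move each center to the mean of the cells assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy data: six cells, two hypothetical markers; the intensities are made up.
cells = [
    [120.0, 3.0], [110.0, 5.0], [130.0, 4.0],   # high in marker 1
    [4.0, 95.0], [6.0, 105.0], [5.0, 100.0],    # high in marker 2
]
features = zscore_columns(arcsinh_normalize(cells))
labels = kmeans(features, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In a real analysis these steps would run on thousands of cells and dozens of markers, which is why established, optimized implementations are preferable to a hand-written loop like this one.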

GitHub: https://github.com/FNLCR-DMAP/SCSAWorkflow/tree/fa6186e2afe2e96ef598f0e03277f095125b585f

Project Description:

The project will build upon the work conducted during the pilot project with the Data Mine and will extend the analysis methods that SPAC provides. This includes, but is not limited to, researching and implementing spatial statistics methods, statistical spatial interaction methods, programmatic phenotyping of cells, acceleration of algorithms using graphics processing units (GPUs), and interactive visualization tools. The SPAC package is a research software engineering effort that follows best practices for code development and documentation, unit testing, and continuous integration. By the conclusion of the project, the students will have contributed and delivered fifteen new medium-complexity capabilities or enhancements to the SPAC tools.
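As an illustration of what a statistical spatial interaction method can look like, the sketch below runs a simple label-permutation test: it counts how often cells of one type lie within a fixed radius of cells of another type, then compares that count with counts obtained after shuffling the cell-type labels. This is one common approach, not necessarily the method SPAC implements or will implement; the function names, the radius, the number of permutations, and the toy coordinates and cell types are hypothetical.

```python
import math
import random

def count_neighbor_pairs(coords, labels, type_a, type_b, radius):
    """Count ordered pairs (a, b): a is type_a, b is type_b, and b lies within radius of a."""
    count = 0
    for i, (pi, li) in enumerate(zip(coords, labels)):
        if li != type_a:
            continue
        for j, (pj, lj) in enumerate(zip(coords, labels)):
            if i != j and lj == type_b and math.dist(pi, pj) <= radius:
                count += 1
    return count

def interaction_permutation_test(coords, labels, type_a, type_b, radius,
                                 n_perm=500, seed=0):
    """Return the observed pair count and an empirical one-sided p-value."""
    rng = random.Random(seed)
    observed = count_neighbor_pairs(coords, labels, type_a, type_b, radius)
    shuffled = list(labels)
    at_least = 0
    for _ in range(n_perm):
        # Shuffling labels keeps cell positions and type frequencies fixed.
        rng.shuffle(shuffled)
        if count_neighbor_pairs(coords, shuffled, type_a, type_b, radius) >= observed:
            at_least += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (at_least + 1) / (n_perm + 1)
    return observed, p_value

# Toy tissue: six well-separated pairs of adjacent cells (made-up coordinates).
coords = [(x + dx, 0.0) for x in (0, 10, 20, 30, 40, 50) for dx in (0, 1)]
# Three T-cell/tumor pairs and three stroma/stroma pairs (hypothetical phenotypes).
labels = ["T", "Tumor"] * 3 + ["Stroma"] * 6

observed, p_value = interaction_permutation_test(coords, labels, "T", "Tumor", radius=2.0)
print(observed)  # 3
print(p_value)
```

The nested loop is quadratic in the number of cells, which is acceptable for a toy example; on whole-slide data a spatial index or a GPU implementation would be the natural next step.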

Rationale:

The proposed project uses open-source data, making it easily accessible to students. In addition, it has well-defined expectations for the research methods while still allowing students to be creative in the implementation. Once trained, the students will be able to work independently. The project does, however, require contributors with skills in Python, statistics (hypothesis testing), machine learning (clustering, dimensionality reduction, and classification), and Git. It will therefore be a good test of whether the Data Mine students possess the skills needed to serve other FNL projects.