FASTGenomics – Algorithms for single-cell transcriptomics at your fingertips

In this article, we present a new, free-of-charge and easy-to-use online platform for the analysis of single-cell transcriptomics data, developed by the group of cluster scientist Prof. Joachim L. Schultze in collaboration with the Bonn-based IT and Data Science Consulting company Comma Soft.

The human body consists of approximately 37 trillion cells, all of which carry essentially the same genetic information. The cells can be grouped into diverse classes, e.g. nerve cells, muscle cells, skin cells or immune cells, depending on obvious traits such as their location within the body, their morphology or their apparent function. However, to explore the true heterogeneity of the subtypes and states that make up these cellular classes, molecular resolution is required.

Where exactly does the root of this heterogeneity lie? The answer is to be found in the expression of the genetic information, which is regulated by multiple layers of complex control mechanisms. Transcription is the initial process of selectively reading out genetic information and transcribing it into RNA molecules, which serve either as messenger RNA coding for proteins or as functional entities in themselves. The entirety of the transcribed RNA constitutes a cell’s transcriptome, which holds a wealth of information about a cell’s identity and state and can be assessed using modern sequencing methods.

Now, what would happen if we had a technique to analyze the transcriptomes of thousands of individual cells in parallel? Could we identify patterns in gene expression that reveal the basic principles defining cellular identities? Could we assess the true heterogeneity of cells within different organs? Could we observe how certain genes are switched on and off, determining different cell types and states under varying conditions? The answer is yes, and the online analysis platform FASTGenomics is part of this exciting technological innovation to probe single cells.

In the last decade, numerous so-called single-cell RNA sequencing (scRNA-seq) protocols have been developed that allow the isolation of individual cells and the production of transcriptome libraries from them. Starting from a handful of single cells in 2009, the latest advances allow the analysis of up to millions of cells in a single experiment (1, 2).

Handling vast amounts of transcriptomic data presents a substantial challenge!

While the production of single-cell genomics data is becoming increasingly common, the analysis of such high-dimensional data remains a substantial bottleneck. The single-cell revolution holds great potential for gaining completely new perspectives, but it also poses unprecedented analytical challenges. Without computational power and sophisticated algorithms, there is no chance of making sense of such high-dimensional data. Just imagine sitting on the transcriptomic information of thousands of cells and trying to understand how these cells belong together and what makes them different.

When Joachim L. Schultze, Professor for Genomics and Immunoregulation at the University of Bonn, met Stephan Huthmacher, a graduate of the University of Bonn and the CEO of the Bonn-based IT and Data Science Consulting company Comma Soft, both were not only excited about the scientific achievement of scRNA-seq but also saw the opportunities that came with the big data challenge. Together, they developed the idea for a cloud-based research infrastructure that includes data management and AI-based analytics, combined with the ability to showcase and share reproducible results: the blueprint for what was soon to become FASTGenomics.

An interdisciplinary team of data scientists from Comma Soft and researchers from the Schultze lab at the University of Bonn was put together and prepared a proposal for the German Federal Ministry of Economic Affairs. In 2016, FASTGenomics was funded for three years within the Digital Technologies framework, and the work on the prototype began. After two years of extensive development and testing, FASTGenomics allows users to analyze single-cell gene expression data using a variety of pre-installed algorithms and workflows without spending hours on the setup of a local machine. Thus, the free-of-charge platform provides a ready-to-use and intuitive interface for scRNA-seq analysis, allowing non-experts in the field to explore their own data as well as a comprehensive collection of public data sets. In 2018, an introduction to the platform was published by Scholz et al. (3), and the gates to the platform were opened for the single-cell community. This led to a lively exchange with the users, whose feedback was continuously integrated into the platform.

FASTGenomics: Exploratory analysis with Jupyter notebooks

In 2019, FASTGenomics was fully set up with a comprehensive app store providing diverse methods for scRNA-seq analysis and was seeking to become part of promising new research projects such as DiscovAIR, which aims to establish the first draft of the Human Lung Cell Atlas, and LifeTime, an initiative to revolutionize health care by tracking and understanding human cells during disease.

Nevertheless, during the last months, there have been many rapid developments in the field, ranging from emerging standard data formats to updated best-practice workflows and new versions of the popular Scanpy (4) and Seurat (5) packages. The team behind FASTGenomics took the opportunity to redesign the platform from website to backend to enable instantaneous support of the newest algorithms and software tools. To this end, they integrated the widely used Jupyter notebooks, which are designed for interactive computing across dozens of programming languages, into FASTGenomics and combined them with new best-practice workflows. The result provides state-of-the-art analysis pipelines while giving users the freedom to incorporate their own tools and ideas.

FASTGenomics now offers users the possibility to freely explore their own as well as public single-cell RNA-seq data, to follow best-practice or custom workflows, to share data and analyses with cooperation partners, and to develop completely new ideas. FASTGenomics is free of charge and will remain so for academic communities. Furthermore, users can upload their data to a completely private space protected against any unwanted access. At the same time, FASTGenomics also serves as a platform to publish data and analyses if the user so desires, and it offers intuitive means for the community to conveniently explore the data. The growing community on FASTGenomics provides valuable feedback, which is indispensable for improving and updating the platform, and you are welcome to become part of it. Just visit www.fastgenomics.org.


Author: Jonas Schulte-Schrepping


Literature

  1. Svensson, V., Vento-Tormo, R. & Teichmann, S. Exponential scaling of single-cell RNA-seq in the past decade. Nat Protoc 13, 599–604 (2018). https://doi.org/10.1038/nprot.2017.149
  2. Svensson, V., da Veiga Beltrame, E. & Pachter, L. A curated database reveals trends in single-cell transcriptomics. bioRxiv (2019). https://doi.org/10.1101/742304
  3. Scholz et al. FASTGenomics: An analytical ecosystem for single-cell RNA sequencing data. bioRxiv (2018). https://doi.org/10.1101/272476
  4. Wolf, F., Angerer, P. & Theis, F. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol 19, 15 (2018). https://doi.org/10.1186/s13059-017-1382-0
  5. Butler, A., Hoffman, P., Smibert, P. et al. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat Biotechnol 36, 411–420 (2018). https://doi.org/10.1038/nbt.4096