Main

Whole-brain image volumes at micron-scale resolution are helping scientists characterize neuron-level morphology and connectivity and discover new neuronal subtypes. Extracting the rich neuronal information these volumes contain requires intensive computational processing, and image acquisition currently outpaces the availability and throughput of analysis pipelines. Analyzing these images involves registration, axon segmentation, soma detection, and visualization and analysis of the results. Tools exist for each of these steps, but they are rarely combined into a single integrated pipeline that also supports cloud-based collaboration (Tyson et al., 2022; Pisano et al., 2021). Furthermore, many existing machine-learning-based tools are highly tuned to their training data and perform poorly when they encounter out-of-distribution artifacts or signal levels (Geisa et al., 2021).

To address these challenges, we present BrainLine, an open-source, fully integrated pipeline that performs registration, axon segmentation, soma detection, visualization, and analysis on whole-brain fluorescence volumes (Figure 1a). BrainLine combines existing state-of-the-art open-source tools, such as CloudReg (Chandrashekhar et al., 2021) and ilastik (Berg et al., 2019), with brainlit, a Python package we developed for this work. The pipeline accommodates images that are hundreds of gigabytes in size and uses generalizable machine learning training schemes that adapt to out-of-distribution samples.

To let multiple institutions share and interact with the data, BrainLine stores volumes on Amazon S3 in the Neuroglancer precomputed format so that they can be viewed with Neuroglancer. Specifically, we use CloudReg (Chandrashekhar et al., 2021) to convert the stitched image into this format and to register it to the Allen atlas (Wang et al., 2020).
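CloudReg performs the format conversion in BrainLine; the sketch below only illustrates what a single-scale precomputed `info` file (the JSON metadata Neuroglancer reads at the root of a layer's path) typically contains, and how many chunk files a volume is split into. The helper names, the 128-voxel chunk size, the `uint16` data type, and the example dimensions are assumptions for illustration, not BrainLine settings.

```python
import json
import math


def make_precomputed_info(size_xyz, resolution_nm, chunk_xyz=(128, 128, 128),
                          data_type="uint16", encoding="raw"):
    """Return a single-scale Neuroglancer precomputed `info` dictionary.

    Hypothetical helper: sizes are in voxels, resolution in nanometers per voxel.
    """
    scale = {
        # A common convention is to name each scale after its resolution.
        "key": "_".join(str(r) for r in resolution_nm),
        "size": list(size_xyz),
        "resolution": list(resolution_nm),
        "voxel_offset": [0, 0, 0],
        "chunk_sizes": [list(chunk_xyz)],
        "encoding": encoding,
    }
    return {
        "@type": "neuroglancer_multiscale_volume",
        "type": "image",
        "data_type": data_type,
        "num_channels": 1,
        "scales": [scale],
    }


def count_chunks(info):
    """Number of chunk files needed to cover the first (only) scale."""
    scale = info["scales"][0]
    chunk = scale["chunk_sizes"][0]
    return math.prod(math.ceil(s / c) for s, c in zip(scale["size"], chunk))


# Hypothetical volume: 1000 x 1000 x 300 voxels at 1.8 x 1.8 x 2.0 um.
info = make_precomputed_info((1000, 1000, 300), (1800, 1800, 2000))
print(json.dumps(info, indent=2))
print(count_chunks(info))  # 8 * 8 * 3 = 192 chunks
```

Chunking is what makes the format cloud-friendly: a viewer or analysis script fetches only the chunks that overlap the region it needs rather than the whole volume.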

For axon segmentation and soma detection, we sought to leverage recent machine learning advances but faced two major constraints. First, because generating ground-truth image annotations is labor intensive, we wanted an approach that is effective with a small amount of training data. Second, images arrived sequentially, and new samples sometimes had unique artifacts or different levels of image quality (Figure 1b, c, e, f). We therefore sought a learning algorithm that could be quickly retrained on new data. Many learning algorithms assume that all training and testing data come from the same distribution and fail when this assumption is violated (Quinonero-Candela et al., 2008). With our closed-loop training paradigm in ilastik (Berg et al., 2019), however, we were able to use a single ilastik project for all samples, adding training data only occasionally when difficult samples arose.
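The closed-loop idea can be summarized as a loop that keeps one shared model and retrains it only when a newly arrived sample falls below a validation threshold. In BrainLine the annotation and retraining happen interactively in ilastik, so the callbacks below (`train`, `evaluate`, `annotate_more`) are hypothetical stand-ins for those manual steps, and the 0.8 target F-score and five-round cap are arbitrary placeholders rather than values from the text.

```python
from typing import Callable, List, Sequence, Tuple


def closed_loop_training(
    samples: Sequence[str],
    initial_training_set: List[str],
    train: Callable[[List[str]], object],
    evaluate: Callable[[object, str], float],
    annotate_more: Callable[[str], List[str]],
    target_f1: float = 0.8,   # placeholder for "satisfactory"
    max_rounds: int = 5,      # placeholder cap on annotation rounds per sample
):
    """Process samples in arrival order, retraining one shared model only when needed."""
    training_set = list(initial_training_set)  # do not mutate the caller's list
    model = train(training_set)
    history: List[Tuple[str, float, int]] = []
    for sample in samples:
        score = evaluate(model, sample)  # e.g. F-score on the sample's validation slices
        rounds = 0
        while score < target_f1 and rounds < max_rounds:
            # Annotate more subvolumes from the difficult sample and retrain.
            training_set.extend(annotate_more(sample))
            model = train(training_set)
            score = evaluate(model, sample)
            rounds += 1
        history.append((sample, score, rounds))
    return model, training_set, history


# Toy stand-ins: the "model" is just the training-set size, and it
# reaches the target once it has seen at least three subvolumes.
model, training_set, history = closed_loop_training(
    samples=["brain_a", "brain_b"],
    initial_training_set=["seed_subvol"],
    train=lambda ts: len(ts),
    evaluate=lambda m, s: 0.9 if m >= 3 else 0.5,
    annotate_more=lambda s: [f"{s}_subvol"],
)
print(history)  # [('brain_a', 0.9, 2), ('brain_b', 0.9, 0)]
```

In the toy run, the first sample triggers two rounds of extra annotation, while the second sample already passes with the shared model, mirroring the "only occasionally adding training data" behavior described above.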

We used an ilastik pixel classification workflow for both axon segmentation and soma detection; for soma detection, we additionally applied a size threshold to the connected components of the segmentation. The training approach was the same in both cases. For each new whole-brain volume, we identified a set of subvolumes (\(99^3\) voxels for axons, \(49^3\) for somas) across a variety of brain regions and annotated only a few slices in each subvolume (three for axons, five for somas) to form a validation set. This strategy is similar to that employed by Friedmann et al. (2020). If the model did not achieve a satisfactory F-score on this validation set, we annotated additional subvolumes from the sample and added them to the training set until performance was satisfactory.
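Two quantitative steps in this paragraph, turning a pixel-classification probability map into soma detections with a connected-component size threshold and scoring predictions only on the annotated slices, can be sketched with NumPy and SciPy. The probability cutoff, the minimum component size, and the voxel-wise F-score definition are assumptions; the text does not give these values or state whether somas are scored per voxel or per detection.

```python
import numpy as np
from scipy import ndimage


def detect_somas(prob, prob_threshold=0.5, min_voxels=20):
    """Threshold a probability map and keep connected components >= min_voxels.

    Returns the filtered mask and an (N, 3) array of component centroids.
    """
    mask = prob > prob_threshold
    labels, n = ndimage.label(mask)  # face-connected components by default
    sizes = np.bincount(labels.ravel(), minlength=n + 1)
    keep = sizes >= min_voxels
    keep[0] = False  # label 0 is background
    kept_mask = keep[labels]
    kept_ids = np.flatnonzero(keep)
    if kept_ids.size == 0:
        return kept_mask, np.empty((0, 3))
    centroids = ndimage.center_of_mass(mask.astype(float), labels, kept_ids)
    return kept_mask, np.asarray(centroids)


def f_score_on_slices(pred, truth, slice_indices, axis=0):
    """Voxel-wise F1 restricted to the annotated slices of a subvolume."""
    p = np.take(pred, slice_indices, axis=axis).astype(bool)
    t = np.take(truth, slice_indices, axis=axis).astype(bool)
    tp = np.count_nonzero(p & t)
    fp = np.count_nonzero(p & ~t)
    fn = np.count_nonzero(~p & t)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom


# Toy 10^3 probability map: one 27-voxel blob and one isolated voxel.
prob = np.zeros((10, 10, 10))
prob[1:4, 1:4, 1:4] = 0.9
prob[7, 7, 7] = 0.9
somas, centroids = detect_somas(prob, min_voxels=20)
print(centroids)  # one centroid at (2, 2, 2); the isolated voxel is discarded
truth = prob > 0.5
print(f_score_on_slices(somas, truth, [2, 7]))  # 2*9 / (2*9 + 0 + 1) = 18/19
```

Restricting the score to annotated slices matters because unannotated voxels carry no ground truth; counting them as background would penalize correct predictions.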

We observed that this heterogeneous training procedure (i.e., training on subvolumes from multiple brain samples) often also improved performance on other samples. In an experiment in which we controlled the number of training subvolumes, the heterogeneous approach performed at least as well as a homogeneous approach, in which all training subvolumes came from a single brain sample (Figure 1d, g).
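One way to set up such a controlled comparison is to fix a budget of training subvolumes and draw it either from a single sample (homogeneous) or evenly across samples (heterogeneous). The function below is a hypothetical sketch of that design; the round-robin allocation, the identifiers, and the seed are assumptions, since the text does not describe how subvolumes were allocated among samples.

```python
import random
from typing import Dict, List, Optional, Tuple


def sample_training_subvolumes(
    pool: Dict[str, List[str]],
    budget: int,
    heterogeneous: bool,
    source: Optional[str] = None,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Pick `budget` (sample, subvolume) pairs for a fixed-size training set.

    `pool` maps each training sample to its annotated subvolume IDs; the
    sample used for evaluation should be left out of `pool`.
    """
    rng = random.Random(seed)
    queues = {s: rng.sample(ids, len(ids)) for s, ids in sorted(pool.items())}
    if not heterogeneous:
        # Homogeneous: every training subvolume comes from one sample.
        if source is None or budget > len(queues[source]):
            raise ValueError("need a source sample with at least `budget` subvolumes")
        return [(source, sv) for sv in queues[source][:budget]]
    # Heterogeneous: take one subvolume from each sample in turn.
    selected: List[Tuple[str, str]] = []
    while len(selected) < budget:
        before = len(selected)
        for s, q in queues.items():
            if q and len(selected) < budget:
                selected.append((s, q.pop()))
        if len(selected) == before:
            raise ValueError("budget exceeds the number of available subvolumes")
    return selected


pool = {"s1": ["a", "b", "c", "d"], "s2": ["e", "f", "g", "h"]}
print(sample_training_subvolumes(pool, 4, heterogeneous=True))
print(sample_training_subvolumes(pool, 4, heterogeneous=False, source="s1"))
```

Holding the budget fixed is what makes the comparison fair: any difference in validation F-score then reflects where the subvolumes came from rather than how many there were.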

The pipeline can display axon segmentation and soma detection results in several ways, including brain-region-based bar charts accompanied by statistical tests (Fig. 1a.i), 2D plots with atlas borders (Fig. 1a.ii), and 3D visualizations using brainrender (Claudi et al., 2021) (Fig. 1a.iii). Because every experimental design is unique, we made the pipeline modular, so investigators can choose which components to incorporate into their own analyses. We also build on existing software and file formats to facilitate interoperability (Tyson et al., 2022).
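Region-based bar charts rest on a simple aggregation: look up the atlas label at each detection and count. A minimal sketch, assuming soma centroids have already been mapped into the voxel grid of an atlas annotation volume (for example, through the registration step) and that the annotation stores one integer region ID per voxel; the function name and region IDs are hypothetical, and rolling fine regions up into parent structures, as well as the statistical tests, are omitted.

```python
import numpy as np


def counts_per_region(points_vox, annotation):
    """Count points falling in each atlas region ID.

    points_vox: (N, 3) coordinates in the annotation's voxel grid.
    annotation: 3D integer array of region IDs (0 = outside the brain).
    """
    pts = np.rint(np.asarray(points_vox, dtype=float).reshape(-1, 3)).astype(int)
    # Drop points that fall outside the annotation volume.
    inside = np.all((pts >= 0) & (pts < np.array(annotation.shape)), axis=1)
    pts = pts[inside]
    region_ids = annotation[pts[:, 0], pts[:, 1], pts[:, 2]]
    ids, counts = np.unique(region_ids, return_counts=True)
    return dict(zip(ids.tolist(), counts.tolist()))


annotation = np.zeros((4, 4, 4), dtype=np.int32)
annotation[:2] = 10   # hypothetical region ID
annotation[2:] = 20   # hypothetical region ID
points = [[0, 0, 0], [1, 3, 3], [3, 0, 0], [9, 9, 9]]
print(counts_per_region(points, annotation))  # {10: 2, 20: 1}
```

The same lookup applied to segmented axon voxels instead of soma centroids would give per-region axon volume rather than cell counts.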

BrainLine accelerates the analysis of brain-wide connectivity through parallel programming, cloud-compliant file formats, and a machine learning training scheme that generalizes across brain samples. As a result, BrainLine reduces the need for investigators to build custom analysis pipelines from scratch, helping them characterize the morphology and connectivity profiles of neurons and discover new neuronal subtypes. BrainLine is available as a set of thoroughly documented notebooks and scripts in our Python package brainlit: http://brainlit.neurodata.io/.
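Much of the speedup from parallel programming comes from splitting a large volume into independent blocks. The sketch below tiles a volume into chunk-aligned bounding boxes and maps a per-chunk function over them with a thread pool; `chunk_bounds`, `process_in_parallel`, `voxel_count`, and the chunk size are hypothetical, and threads are chosen on the assumption that per-chunk work is dominated by I/O such as S3 reads (CPU-bound work would suit a process pool better). Real pipelines also often pad chunks with a halo so that objects crossing chunk borders are not split; this sketch omits that.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from math import prod


def chunk_bounds(shape, chunk):
    """Yield ((x0, x1), (y0, y1), (z0, z1)) boxes tiling a volume of `shape`."""
    starts = [range(0, s, c) for s, c in zip(shape, chunk)]
    for corner in product(*starts):
        yield tuple((o, min(o + c, s)) for o, c, s in zip(corner, chunk, shape))


def process_in_parallel(shape, chunk, process_chunk, max_workers=8):
    """Apply `process_chunk(bounds)` to every chunk concurrently."""
    bounds = list(chunk_bounds(shape, chunk))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(process_chunk, bounds))
    return dict(zip(bounds, results))


def voxel_count(bounds):
    # Stand-in for real work (e.g. download a block and classify its voxels).
    return prod(hi - lo for lo, hi in bounds)


out = process_in_parallel((10, 10, 5), (4, 4, 5), voxel_count)
print(len(out), sum(out.values()))  # 9 chunks covering 500 voxels
```

Clipping each box with `min(o + c, s)` keeps the edge chunks inside the volume, so the chunk volumes sum exactly to the full volume.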

Fig. 1

BrainLine enables efficient processing of heterogeneous whole-brain fluorescence volumes. a BrainLine combines CloudReg (Chandrashekhar et al., 2021), ilastik (Berg et al., 2019) and our package, brainlit, to produce results in both quantitative (a.i) and visual (a.ii-a.iii) formats. b Example images with fluorescently labeled axon projections; arrows point to regions with (green) and without (red) labeled axons. c Intensity histograms of 20 × 20 × 20-voxel subvolumes located at the arrows in b. d Comparison of axon segmentation performance after training on subvolumes from different samples (heterogeneous) or from the same sample (homogeneous). e Example images with fluorescently labeled cell bodies; arrows point to regions with (green) and without (red) labeled cell bodies. f Intensity histograms of 20 × 20 × 20-voxel subvolumes located at the arrows in e. g Comparison of soma detection performance after training on subvolumes from different brain samples (heterogeneous) or from a single brain sample (homogeneous).