Data Science Core - Abstract
Achieving the scientific goals of the Overall Research Strategy requires significant effort and advances in
data science for neuroscience. In particular, scientific progress depends on novel experimental design, data
collection and processing (as described in Projects 1, 2, and 3), and novel analysis and models (as described in
Projects 1, 2, and 3), which lead to general principles to be tested (as described in Projects 2 and 3). The
fundamental goal of the Data Science Core is to accelerate the process connecting the raw data collected in all
three Projects to the analyses used to obtain data derivatives, which can then be used to build models across
all three Projects, and validated via electron microscopy (EM) in Project 3. The two main challenges we face in
accelerating these links are big data and reproducibility. First, the data collected are too large to fit into memory,
or even on disk, with each experiment on the order of one terabyte (TB) and the entire dataset amounting to hundreds
of TB or more. Therefore, the classic paradigm of running all analyses in MATLAB on locally stored data is not
sufficient. The solution to this is twofold: 1) build scalable algorithms, so that different individuals can apply them
to these big data, and 2) develop cloud data management systems, so that all consortium members can quickly
access and analyze the data, and then integrate them with one another. The cloud data management system
will be built on the infrastructure developed for the Open Connectome Project [1], originally developed to host data
on institutional resources, and ZBrain 2.0, a resource we are developing to define a common coordinate space
for zebrafish brain atlasing. Second, this is a team effort, so sharing analyses and derivatives and keeping track
of metadata will be important. The solution to this is threefold: 1) build a comprehensive scientific environment
in the cloud that enables sharing of entire “digital experiments”, linking to the data and ensuring that the entire
analysis pipeline can be trivially run and extended by anyone, anywhere, 2) carefully curate data and
metadata in existing resources, and 3) facilitate the integration of different imaging datasets to improve ZBrain
2.0. Our entire system is, and will continue to be, open source, portable, and reproducible, and will use
and extend best practices of data science and FAIR (Findable, Accessible, Interoperable, and Re-usable) [2] data
management. Completing all the aims in this Data Science Core will not only enable and accelerate the scientific
progress addressed by this proposal, but will also establish new standards in data science that can be immediately
applied to all other U19 efforts, as well as many other efforts within and outside NIH and even the international
scientific community at large.
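As a minimal illustration of what a "scalable algorithm" can mean when data do not fit into memory, the Python sketch below computes a mean and variance by visiting the data one chunk at a time and merging per-chunk summaries. It is only a sketch under stated assumptions, not the Core's actual software: the function names `iter_chunks` and `streaming_mean_var` are hypothetical, and a small in-memory list stands in for chunks that would, in practice, be streamed from cloud storage.

```python
from typing import Iterable, Iterator, Sequence, Tuple


def iter_chunks(values: Sequence[float], chunk_size: int) -> Iterator[Sequence[float]]:
    """Yield consecutive slices of `values` (hypothetical stand-in for remote reads)."""
    for start in range(0, len(values), chunk_size):
        yield values[start:start + chunk_size]


def streaming_mean_var(chunks: Iterable[Sequence[float]]) -> Tuple[int, float, float]:
    """Return (count, mean, population variance) without holding all data at once.

    Each chunk is summarized on its own, then merged into the running totals
    using the standard pairwise update for mean and sum of squared deviations.
    """
    count = 0    # number of values seen so far
    mean = 0.0   # running mean
    m2 = 0.0     # running sum of squared deviations from the mean
    for chunk in chunks:
        n = len(chunk)
        if n == 0:
            continue
        chunk_mean = sum(chunk) / n
        chunk_m2 = sum((x - chunk_mean) ** 2 for x in chunk)
        delta = chunk_mean - mean
        total = count + n
        mean += delta * n / total
        m2 += chunk_m2 + delta * delta * count * n / total
        count = total
    variance = m2 / count if count else 0.0
    return count, mean, variance


if __name__ == "__main__":
    # Toy data only; real inputs would be far larger than memory.
    data = [float(v) for v in range(10)]
    count, mean, variance = streaming_mean_var(iter_chunks(data, chunk_size=3))
    print(count, mean, variance)
```

Because only one chunk and three running numbers are held at any time, memory use depends on the chunk size rather than on the size of the full dataset, and per-chunk summaries could in principle be computed on separate machines before being merged.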
Public Health Relevance Statement
The research plan we propose aims at a comprehensive multi-level understanding of how the brain and
the body interact to respond adaptively to a potentially harsh and unforgiving environment. Among the
many internal factors that are under constant homeostatic regulation, we focus on the dialogue between
the brain and the heart, since the cardiovascular system and its role in regulating oxygen delivery is one of
the most critical and universal control systems within the animal kingdom. Our approach aims to not only
explain the core mechanisms of top-down cardiac control, but also to provide first insights into what is
measured at the level of the heart itself, and what is the nature of the signals that are being sent from the
heart to the brain.
A central aspect of our approach is the generation of a complete brain and body connectome of the larval
zebrafish. This map, the first such map in any vertebrate, will be critical in both testing the validity of
many extant models, and perhaps discovering new pathways within the interconnected neural circuits that
serve the changing needs of the body during behavior.
Project Terms
3-Dimensional; Acceleration; Address; Algorithms; Anatomy; Animals; Architecture; Atlases; Behavior; Behavioral; Big Data; Biological Models; Biology; Brain; Brain imaging; Brain region; Cardiac; Cardiovascular system; Cloud Computing; Code; Collaborations; Communities; Computer software; Data; Data Collection; Data Management Resources; Data Science; Data Science Core; Data Set; Data Sources; Documentation; Ecosystem; Electron Microscopy; Ensure; Environment; Experimental Designs; FAIR principles; Face; Feedback; Generations; Goals; Heart; Image; Individual; Infrastructure; Ingestion; Institution; International; Internet; Knowledge; Label; Learning; Link; Maps; Measures; Memory; Metadata; Mind; Modality; Modeling; Modernization; Morphologic artifacts; Nature; Neural Interconnection; Neuroanatomy; Neurosciences; Ontology; Oxygen; Pathway interactions; Plants; Process; Regulation; Reproducibility; Research; Resolution; Resources; Role; Running; Science; Scientist; Services; Shapes; Signal Transduction; Slice; Software Engineering; Surveys; System; Technology; Testing; Texture; United States National Institutes of Health; Visualization; Writing; Zebrafish; analysis pipeline; collaboratory; comparative; computerized data processing; computing resources; connectome; data curation; data integration; data management; data sharing; design; digital; empowerment; experimental study; genomic data; improved; insight; laptop; member; migration; mind body interaction; model building; neural circuit; novel; open source; open source library; portability; repository; scalable algorithms; terabyte; tool; virtual machine; web platform
National Institute of Neurological Disorders and Stroke
CFDA Code
DUNS Number
082359691
UEI
LN53LCFJFL45
Project Start Date
25-September-2017
Project End Date
31-August-2027
Budget Start Date
01-September-2024
Budget End Date
31-August-2025
Project Funding Information for 2024
Total Funding
$203,065
Direct Costs
$207,464
Indirect Costs
$1,051
Year | Funding IC | FY Total Cost by IC
2024 | National Institute of Neurological Disorders and Stroke | $203,065
Sub Projects
No Sub Projects information available for 5U19NS104653-08 5832