Contact PI/Project Leader: LAWSON, JONATHAN
Awardee Organization: BROAD INSTITUTE, INC.
Description
Abstract Text
Project Summary/Abstract
Five years ago, the AnVIL was founded with a vision of creating a federated data ecosystem. Its first phase
focused on building the foundational capabilities needed to bring together data, tools, and research
communities in a cloud-based environment. Now, in this second phase, the focus must be on scientific impact.
We will pursue four Aims that emphasize growing the AnVIL data corpus, going multi-cloud, creating
analytical tools for flagship NHGRI initiatives, and increasing the user base:
● Aim 1 (Data Ingestion): Support the ingestion, curation, and management of diverse datasets so
that they are accessible to the research community. In Phase I of the AnVIL, we ingested,
wrangled, and quality-controlled more than 5 PB of data from NHGRI consortia. In Phase II, we will continue
this track record of successfully supporting consortia and extend our efforts to the long tail of
individual researchers with valuable data to contribute to the AnVIL.
● Aim 2 (Software Infrastructure): Reduce barriers to entry by supporting multiple clouds and
improving cost control. While Phase I of the AnVIL focused on establishing foundational software
infrastructure, Phase II must focus on scaling adoption of the AnVIL. We have a three-part strategy for
achieving this: (i) becoming multi-cloud, so that we support Microsoft Azure in addition to Google
Cloud; (ii) creating "AnVIL lite," a simplified, free tier of the AnVIL that lowers barriers to entry; and (iii)
exposing tools that improve billing visibility and prevent overspending.
● Aim 3 (Scientific Services): Leverage the AnVIL's datasets and platforms to accelerate scientific
research. In Phase II, we must prioritize the scientific impact of the AnVIL. Toward this end, we will
leverage: (i) an imputation service drawing on AnVIL datasets and other datasets of diverse ancestry;
(ii) a newly developed genomic variant store to support joint calling; and (iii) improved and expanded
capabilities for third-party deployment of tools and applications in the AnVIL.
● Aim 4 (User Services): Support the growth and long-term success of the research community
through user support, training, and project management. The services that comprise the AnVIL are
not only web services, but also human services. Meeting the needs of researchers everywhere requires
security, user support, training, and project governance.
The guiding principle of our efforts is that progress in genomic data science will happen most rapidly if there is
a diversity of interoperable solutions created by a plurality of groups. Toward that end, we will ensure that the
AnVIL continues to drive toward interoperability and federation by participating in NIH-led and
international efforts focused on standard setting and data sharing.
Public Health Relevance Statement
Project Narrative
In this proposal, we bring together a unified team with a strong track record of developing secure and scalable
software systems to support flagship scientific initiatives, such as the All of Us Research Program, the Human
Cell Atlas, and the first phase of the AnVIL itself. Our group will leverage these experiences, and the software
developed for them, to create an ecosystem of tools, applications, and services that will continue to serve the
needs of the NHGRI community in this second phase of the AnVIL. Importantly, our approach prioritizes the
continued creation and evolution of a federated data ecosystem, one in which the AnVIL will continue to
interoperate with other key NIH data science initiatives.
Project Terms
Acceleration; Adoption; All of Us Research Program; Atlases; Automobile Driving; Catalogs; Cells; Clinical; Clinical Data; Clinical Services; Communities; Computer software; Cost Control; Data; Data Engineering; Data Science; Data Set; Dedications; Documentation; Ecosystem; Engineering; Environment; Evolution; Genetic Diseases; Genomics; Goals; Growth; Health; Human; Individual; Infrastructure; Ingestion; International; Joints; Maps; National Human Genome Research Institute; Phase; Process; Protocols documentation; Research; Research Personnel; Resources; Secure; Security; Services; Software Engineering; System; Tail; Training; United States National Institutes of Health; Vision; analytical tool; base; cloud based; cloud platform; cohort; data ecosystem; data ingestion; data management; data modeling; data platform; data portal; data sharing; data standards; diverse data; experience; federated data; genetic variant; genomic data; improved; interoperability; large datasets; meetings; operation; outreach; prevent; service utilization; software development; software infrastructure; software systems; success; support tools; tool; web services
No Sub Projects information available for 5U24HG010262-07