SOCAL: Privacy-protecting Sharing Of Clinical Data Across Laboratories
Project Number
7R01EB031030-03
Former Number
5R01EB031030-02
Contact PI/Project Leader
KUO, TSUNG-TING
Awardee Organization
YALE UNIVERSITY
Description
Abstract Text
Project Summary
Privacy and security of personal information have become one of the grand challenges of modern society,
especially for healthcare studies. Re-identification risks and data breaches call for new policies and regulations
for data sharing across healthcare institutions and research laboratories. While policy cannot solve the problem
on its own, advanced technologies that work hand in hand with policy are important for addressing
privacy/security concerns. Predictive analytics can support quality improvement and clinical research, and can
eventually impact patient health status. Extensive clinical variable information and large volumes of data records
from multiple institutions and laboratories are necessary to further improve the performance of modeling approaches
and to identify medication-outcome associations for diseases. Nonetheless, transferring such sensitive data among
institutions/laboratories can present serious privacy risks, which can jeopardize NIH’s mission. To mitigate the
privacy problem while increasing predictive capability through cross-institutional modeling, prior studies
proposed distributed methods that exchange only the predictive models, not patient data. However, these
methods still leave several challenges in clinical cross-institutional learning unresolved: the need for
more comprehensive clinical variables and more patient records to achieve better prediction discrimination and
build more generalizable models; the need to discover and alleviate data manipulation to increase the
trustworthiness of collaboratively trained models; and the need for more validation to ensure usability.
In this proposal, we plan to develop SOCAL (Privacy-protecting Sharing Of Clinical data Across Laboratories), a
distributed framework that addresses these challenges by integrating vertical and horizontal modeling methods to
include both more complete variables and more records, by discovering/alleviating data manipulation incidents
using models recorded on a blockchain, and by conducting controlled experiments and designing/testing a web portal
with physician-researchers to increase the usability of the system. SOCAL will be evaluated on a Coronavirus
Disease 2019 (COVID-19) dataset from five University of California (UC) Health medical centers. We expect that the
knowledge/capability of collaborative modeling will be improved, that the trustworthiness of the learning process
will be enhanced, and that the framework will be ready for use. SOCAL is innovative because it will provide a new
integration methodology for vertical/horizontal modeling, novel methods for resisting data manipulation, and a
hardened prototype of a practical blockchain application. We anticipate that the SOCAL framework will have a
powerful impact, substantially reducing the privacy concerns of predictive modeling tasks for various stakeholders,
including healthcare providers, clinical researchers, and patients. Upon completion, SOCAL can accelerate the
development of methods/technologies that increase the willingness of institutions to participate in such
collaborations to improve the effectiveness of healthcare.
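For readers less familiar with the terms, the sketch below contrasts the two data splits the abstract refers to: horizontal, where sites hold different patients with the same variables, and vertical, where sites hold different variables for the same patients. It is not SOCAL's method; the toy cohort, the function names, and the choice of plain logistic-regression gradient averaging are illustrative assumptions. It shows only that, in the horizontal case, summing per-site gradients reproduces the pooled gradient, and that, in the vertical case, summing per-site partial scores reproduces the full linear predictor, so neither step moves patient-level rows between sites (the partial scores themselves still reveal some information, which practical systems must further protect).

```python
import math

# Hypothetical toy cohort (made-up values): (patient_id, [age_scaled, lab_scaled], outcome).
COHORT = [
    ("p1", [0.2, 1.0], 1),
    ("p2", [0.5, 0.1], 0),
    ("p3", [0.9, 0.8], 1),
    ("p4", [0.1, 0.3], 0),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def local_gradient(records, w):
    """Sum (not mean) of logistic-loss gradients over one site's records."""
    grad = [0.0] * len(w)
    for _, x, y in records:
        err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
        for j, xj in enumerate(x):
            grad[j] += err * xj
    return grad

def horizontal_step(sites, w, lr=0.5):
    """Horizontal split: each site sends only its gradient sum; the
    coordinator divides by the total record count and takes one step."""
    n = sum(len(site) for site in sites)
    total = [0.0] * len(w)
    for site in sites:
        for j, g in enumerate(local_gradient(site, w)):
            total[j] += g
    return [wi - lr * g / n for wi, g in zip(w, total)]

def vertical_score(site_columns, weights_per_site):
    """Vertical split: each site returns only its partial score w_k . x_k per
    patient ID; adding the partial scores gives the full linear predictor."""
    totals = {}
    for cols, w in zip(site_columns, weights_per_site):
        for pid, x in cols.items():
            partial = sum(wi * xi for wi, xi in zip(w, x))
            totals[pid] = totals.get(pid, 0.0) + partial
    return totals

# Horizontal: site A holds patients p1, p2; site B holds p3, p4.
site_a, site_b = COHORT[:2], COHORT[2:]
w = horizontal_step([site_a, site_b], [0.0, 0.0])

# Vertical: site A holds variable 0, site B holds variable 1, for every patient.
cols_a = {pid: [x[0]] for pid, x, _ in COHORT}
cols_b = {pid: [x[1]] for pid, x, _ in COHORT}
scores = vertical_score([cols_a, cols_b], [[w[0]], [w[1]]])
```

Gradient sums rather than per-site means are exchanged so the coordinator can weight sites by their record counts. The vertical part covers only prediction; training on vertically split data needs additional exchange of intermediate quantities that this sketch does not attempt.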
Public Health Relevance Statement
Project Narrative
To advance healthcare quality improvement and clinical research by integrating clinical data across institutions
in a privacy-preserving manner, we will develop a decentralized framework to compute with comprehensive
clinical variables and with numerous records, to discover and alleviate data manipulation during the learning
process, and to conduct controlled experiments and design/test a web portal with physician-researchers, on real
cross-institutional COVID-19 data. This project is innovative because it proposes a new integration methodology for
vertical/horizontal modeling, novel methods for resisting data manipulation, and a hardened prototype of a
practical blockchain application. We expect a powerful and sustainable impact on care providers, clinical
researchers, and patients, and the framework can eventually accelerate the development of methods/technologies that
increase the willingness of institutions to participate in such collaborations to improve the effectiveness of healthcare.
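The idea of recording models on a blockchain so that manipulation can be discovered can be illustrated, under heavy simplification, by a hash chain: each entry stores a digest of a shared model together with the hash of the previous entry, so altering a recorded model or an earlier entry after the fact breaks verification. This is a minimal single-process sketch, assuming SHA-256 over canonical JSON; the entry fields, site labels, and function names are hypothetical, and it omits the distributed consensus a real blockchain provides. It also detects only tampering with what was recorded, not a site that submits a model trained on manipulated data, which is a separate part of the problem the project targets.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def digest(obj):
    """SHA-256 of a canonical JSON encoding (sorted keys, compact separators)."""
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()

def append_entry(chain, site, round_no, params):
    """Append an entry recording the digest of one site's model parameters.
    Field names are hypothetical, chosen for illustration only."""
    prev = chain[-1]["hash"] if chain else GENESIS
    body = {"site": site, "round": round_no, "model_digest": digest(params), "prev": prev}
    chain.append({**body, "hash": digest(body)})

def verify(chain, models):
    """Return the index of the first inconsistent entry, or -1 if all check out.
    models[i] is the parameter list that entry i is supposed to record."""
    prev = GENESIS
    for i, entry in enumerate(chain):
        body = {k: entry[k] for k in ("site", "round", "model_digest", "prev")}
        if entry["prev"] != prev or entry["hash"] != digest(body):
            return i  # the entry itself was edited or its link is broken
        if digest(models[i]) != entry["model_digest"]:
            return i  # the stored model no longer matches what was recorded
        prev = entry["hash"]
    return -1

# Hypothetical example: three sites each record a made-up model update.
models = [[0.10, 0.20], [0.15, 0.18], [0.20, 0.25]]
chain = []
for i, (site, params) in enumerate(zip(["site-A", "site-B", "site-C"], models)):
    append_entry(chain, site, i, params)

intact = verify(chain, models)    # -1: nothing altered
models[1] = [9.90, 0.18]          # alter a shared model after it was recorded
tampered = verify(chain, models)  # 1: the mismatch is located at entry 1
```

Canonical JSON (sorted keys, fixed separators) is used so that the same parameters always produce the same digest, and linking each entry to its predecessor is what makes silent edits to earlier history detectable.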
NIH Spending Category
No NIH Spending Category available.
Project Terms
Acceleration; Address; Artificial Intelligence; COVID-19; California; Clinical; Clinical Data; Clinical Research; Collaborations; Data; Data Set; Decentralization; Discrimination; Disease; Effectiveness; Electronic Health Record; Ensure; Extravasation; Failure; Health; Health Personnel; Health Status; Health protection; Healthcare; Hospitals; Institution; Intuition; Knowledge; Laboratories; Laboratory Research; Learning; Machine Learning; Medical center; Methodology; Methods; Mission; Modeling; Modernization; Outcome; Patients; Performance; Pharmaceutical Preparations; Physicians; Policies; Predictive Analytics; Primary Care Physician; Privacy; Process; Protocols documentation; Records; Regulation; Research Personnel; Risk; Security; Site; Societies; System; Technology; Testing; Training; United States National Institutes of Health; Universities; Validation; Work; blockchain; care providers; data sharing; data warehouse; design; experimental study; health care quality; improved; innovation; learning strategy; method development; model generalizability; novel; peer; predictive modeling; privacy preservation; privacy protection; prototype; trustworthiness; usability; web portal; willingness
National Institute of Biomedical Imaging and Bioengineering
CFDA Code
286
DUNS Number
043207562
UEI
FL6GV84CKN57
Project Start Date
01-July-2024
Project End Date
30-June-2026
Budget Start Date
01-July-2024
Budget End Date
30-June-2025
Project Funding Information for 2024
Total Funding
$343,314
Direct Costs
$220,501
Indirect Costs
$122,813
Year | Funding IC | FY Total Cost by IC
2024 | National Institute of Biomedical Imaging and Bioengineering | $343,314
Sub Projects
No Sub Projects information available for 7R01EB031030-03
Publications
No Publications available for 7R01EB031030-03
Patents
No Patents information available for 7R01EB031030-03
Outcomes
No Outcomes available for 7R01EB031030-03
Clinical Studies
No Clinical Studies information available for 7R01EB031030-03