January 2026 | Accelerate Science Now Coalition Members
Date: January 13, 2026
Submitted by: Accelerate Science Now
Point of Contact: Joshua New, Director of Policy, SeedAI
Email: josh@seedai.org
1) How should DOE best mobilize National Laboratories to partner with industry sectors within the United States to form a public-private consortium to curate the scientific data of the DOE across the National Laboratory complex so that the data is structured, cleaned, and preprocessed in a way that makes it suitable for use in AI models? How can DOE anonymize and desensitize data and/or make use of privacy-preserving AI training methods to enable AI model development using sensitive or proprietary data?
Project-first approach: When considering how best to structure a public-private consortium between DOE National Labs and private sector partners, the Department should allow partners to select which project(s) they would like to support. This project-first approach may mitigate industry concerns about the commercial risk, scope creep, and resource drain associated with involvement in individual projects. It may also serve as a strong demand-side signal to National Laboratories about which scientific datasets are highest value and thus worth prioritizing. Though a project-first approach should be the norm, DOE and potential partners may of course establish agreements to provide more general-use infrastructure and resources for a wide variety of projects as appropriate.
Funding and non-monetary resources provided by external partners should be tied to individual projects rather than pooled across the consortium’s full portfolio. However, where possible, there should be flexibility to redirect funding toward more promising projects when a project is deemed unsuccessful.
For each project, the Department should work with external partners to set well-scoped targets with clear deadlines and measurable standards of success, creating accountability for all participants.
Continued external participation should be subject to review and renewal on a short-term basis, such as every one to two years, reducing medium-term risk for stakeholders that might be hesitant to participate and increasing accountability and oversight from governing bodies.
Data management: All partnerships should utilize a common set of AI-ready data standards applied across all projects. Shared expectations around data provenance, metadata, uncertainty metrics, and common ontologies would help ensure outputs remain interoperable and reusable across the consortium over time, rather than becoming siloed at the project level.
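To make the data-standards recommendation concrete, the sketch below shows what a minimal AI-ready metadata record with provenance, ontology, and uncertainty fields might look like. The field names and the validation rule are illustrative assumptions, not an established DOE schema.

```python
# Hypothetical sketch of a minimal AI-ready dataset metadata record.
# Field names are illustrative assumptions, not an established DOE schema.
from dataclasses import dataclass, field

@dataclass
class DatasetRecord:
    dataset_id: str
    provenance: str                      # originating lab/facility and collection method
    ontology_terms: list = field(default_factory=list)  # shared vocabulary for interoperability
    uncertainty: float = 0.0             # e.g., reported measurement uncertainty
    license: str = "open"

    def validate(self):
        """Return the list of fields a downstream AI pipeline would find missing."""
        missing = []
        if not self.provenance:
            missing.append("provenance")
        if not self.ontology_terms:
            missing.append("ontology_terms")
        return missing  # empty list means the record meets the shared standard

# A record with provenance and ontology terms passes; one without does not.
good = DatasetRecord("lbnl-0001", "LBNL bioreactor run, 2025", ["bioreactor", "yield"], 0.02)
bad = DatasetRecord("unk-0002", "")
```

Applied consortium-wide, even a lightweight check like this keeps project outputs interoperable rather than siloed at the project level.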
DOE and partners should use this opportunity to experiment with cutting-edge privacy-preserving technologies for training AI models on potentially sensitive or proprietary data. This would both unlock greater utility from data that might not otherwise be made available for public scientific benefit and validate how these technologies could be deployed at scale.
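As one illustration of the class of techniques referred to above, the sketch below implements a single privacy-preserving primitive: answering a counting query with calibrated Laplace noise, the basic mechanism of differential privacy. The dataset, predicate, and epsilon value are hypothetical; this is a toy sketch, not a deployment-ready mechanism.

```python
# Illustrative sketch of epsilon-differential privacy for a counting query:
# the true count is released only after adding Laplace(sensitivity/epsilon)
# noise. Inputs here are hypothetical.
import math
import random

def dp_count(records, predicate, epsilon=1.0, sensitivity=1.0):
    """Return a noisy count: true count plus Laplace-distributed noise."""
    true_count = sum(1 for r in records if predicate(r))
    scale = sensitivity / epsilon
    # Sample Laplace noise via the inverse CDF of a uniform draw in (-0.5, 0.5)
    u = random.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    noise = -scale * sign * math.log(1 - 2 * abs(u))
    return true_count + noise
```

Smaller epsilon means more noise and stronger privacy; the consortium's experiments would be about finding where on that trade-off curve sensitive DOE datasets remain scientifically useful.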
DOE should prioritize openness by default and only restrict access to scientific data when its broad dissemination poses national security or privacy risks, in accordance with federal open data requirements.¹
DOE may offer incentives to encourage industry investment and participation; however, priority access to scientific data should not be one of them. This would follow best practices established by both federal open data policy and public-private data dissemination models utilized by other scientific agencies.²
Governance: DOE should establish a board of DOE scientists and data experts responsible for setting high-level goals for data curation, as well as selecting and prioritizing projects. This governance model should draw on best practices established by the National AI Research Resource when appropriate. Execution of individual projects should be delegated to DOE Project Managers tasked with setting specific standards and deadlines and driving progress toward them.
The core group should set goals and select projects in consultation with relevant federal bodies, such as OSTP, NSF, and agencies conducting parallel research. The Department should consider creating an Interagency Task Force to optimize interagency coordination.
In addition, the U.S. government should consider creating a Federal Advisory Committee to the consortium composed of and chaired by leaders from the private sector, academia, research institutions, and other relevant bodies.
2) How should DOE best structure the public-private consortium to enable activities across a range of scientific and technical disciplines, including partnerships with industry, to develop self-improving AI models for science and engineering using DOE’s data, potentially in combination with data from other partners?
a. Would custom general-purpose AI models, for example those focused on language and reasoning capabilities, be fine-tuned using data provided by DOE and/or other partners? If so, could such fine-tuning be accomplished using existing or already-planned Application Programming Interfaces (APIs)? Alternatively, would the general-purpose AI models be improved such that custom, fine-tuned versions will be unnecessary?
Yes, custom general-purpose AI models could be fine-tuned with DOE and partner data. However, existing fine-tuning APIs typically rely on straightforward data upload mechanisms that may lack the controls for data attribution, governance, and privacy necessary under DOE's data stewardship requirements. DOE should also explore federated and privacy-preserving approaches that enable models to improve using local data without requiring centralized data transfer. These approaches include: fine-tuning by directly modifying model weights using local datasets; retrieval-augmented generation (RAG); weighted ensembles that combine locally trained or fine-tuned models; and other more advanced or emerging methods (federated RAG, federated mixture-of-experts, etc.). RAG and federated methods, in particular, offer advantages for scenarios where data cannot leave its originating site, and can be updated more dynamically than traditional fine-tuning.
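The federated approach mentioned above can be sketched minimally as federated averaging (FedAvg): each site trains on its local data and shares only model weights, which a coordinator merges in proportion to local dataset size. Model weights are represented here as plain lists of floats purely for illustration; real deployments operate on tensors and add secure aggregation.

```python
# Hedged sketch of federated averaging (FedAvg): sites share only model
# weights, never raw data, and weights are merged proportionally to the
# number of local training examples. Site data sizes are hypothetical.

def fedavg(site_weights, site_sizes):
    """Weighted average of per-site model weights; no raw data leaves any site."""
    total = sum(site_sizes)
    n_params = len(site_weights[0])
    merged = [0.0] * n_params
    for weights, size in zip(site_weights, site_sizes):
        share = size / total          # site's contribution to the global model
        for i, w in enumerate(weights):
            merged[i] += w * share
    return merged

# Two hypothetical labs: the second holds three times as much local data,
# so its weights dominate the merged model.
merged = fedavg([[1.0, 2.0], [3.0, 4.0]], site_sizes=[100, 300])  # [2.5, 3.5]
```

The same weighted-combination idea underlies the "weighted ensembles" option in the list above; the governance question is who operates the coordinator and how weight updates are audited.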
As foundation models continue to improve, the necessity for extensive fine-tuning may diminish for some use cases. However, domain-specific customization will likely remain valuable for specialized applications.
c. What areas of science and engineering are priorities for the development of the self-improving AI models? What classes and modalities of data are likely needed to address these priorities? To what extent can addressing these priorities rely on preexisting data versus depending on the collection or generation of new data?
One area that stands out as a potential priority is biomanufacturing. Unlike many scientific domains, biomanufacturing already relies heavily on automated, high-throughput experimentation, which makes it especially well suited for closed-loop systems where models can propose designs, experiments can be run at scale, and results can be fed back to improve performance over time. The iteration between models and physical experiments is critical for demonstrating real, measurable gains from self-improving AI.
The relevant data would potentially span multiple modalities, including genetic and protein sequence data, strain and pathway designs, process parameters, sensor and time-series data from bioreactors, and downstream performance metrics such as yield, purity, and stability. Much of this data already exists across DOE facilities, National Laboratories (LBNL's ABPDU, Joint Genome Institute, PNNL, and others), and industry partners, particularly from past and ongoing bioprocess optimization efforts.³ However, there is still a need for new, purpose-built datasets that are standardized, well-annotated, and explicitly designed for iterative model training and evaluation rather than retrospective analysis alone.
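The closed-loop iteration described above (model proposes, experiment runs, results feed back) can be sketched as a simple propose-evaluate-update cycle. The yield function below is a hypothetical stand-in for a real bioreactor experiment; actual systems would use trained surrogate models and high-throughput lab automation rather than random search.

```python
# Hedged sketch of a closed-loop design-build-test-learn cycle. The
# "experiment" is a hypothetical yield curve peaking at 37 degrees,
# standing in for a real automated bioreactor run.
import random

def run_experiment(temperature):
    """Stand-in for a wet-lab experiment: yield peaks at 37.0 degrees."""
    return -(temperature - 37.0) ** 2

def closed_loop(n_rounds=50, step=1.0, seed=0):
    rng = random.Random(seed)
    best_t = 30.0                       # initial process parameter guess
    best_yield = run_experiment(best_t)
    for _ in range(n_rounds):
        candidate = best_t + rng.uniform(-step, step)  # model proposes a design
        result = run_experiment(candidate)             # experiment is run
        if result > best_yield:                        # results fed back
            best_t, best_yield = candidate, result
    return best_t

best = closed_loop()  # should drift toward the 37-degree optimum
```

The measurable gain here is the gap between the initial guess and the converged parameter; in a real pilot, that gap would be the yield improvement the consortium reports against its project milestones.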
3) How should DOE best provide AI models to the scientific community through programs and infrastructure making use of cloud technologies to accelerate innovation in discovery science and engineering for new energy technologies?
Data and model security: It is particularly important that models for national security use cases, especially those capable of rapid self-improvement, are secure from theft or sabotage. At the very least, the most sensitive models should be trained and run on the U.S. government’s classified compute rather than commercial cloud. The AI Action Plan calls for the federal government to create new technical standards for high-security AI datacenters. DOE should leverage Genesis Mission activities as a proving ground to collaborate with other agencies to: develop adequate security standards for consortium datacenters used for fine-tuning and inference on national security use cases; support R&D into the security mechanisms necessary to meet those standards; and build compliant datacenters, likely starting with smaller pilot facilities for fine-tuning and inference.
AI reliability: DOE should prioritize R&D on frontier AI model reliability, as reliability is a prerequisite both to safely using self-improving AI models in national-security-relevant contexts and to ensuring that AI models powering scientific discovery are indeed supporting high-quality science. As part of the effort to develop self-improving AI models, DOE should consider automating AI reliability research itself, as this may provide the technology necessary for the U.S. government to develop and scale reliability techniques on compressed timelines.
DOE should also develop secure and clean datasets to avoid risks stemming from poisoned data. Recent research demonstrates that a small number of malicious documents can compromise even very large models, meaning that comprehensive data provenance and curation can be a critical security factor.⁴ DOE should train models on datasets with demonstrably secure data provenance when possible, particularly for models with potential national security implications.
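One basic building block of demonstrable data provenance is a cryptographic manifest: each curated record is hashed at ingestion, and training pipelines verify records against the manifest before use, so any document added or altered after curation is flagged. The records below are hypothetical placeholders; this sketch addresses tampering after curation, not poisoning introduced before it.

```python
# Illustrative sketch of hash-based provenance checking: a SHA-256 manifest
# is built at curation time, and later pipeline runs flag any record whose
# hash is absent from the trusted manifest. Record contents are hypothetical.
import hashlib

def fingerprint(record: bytes) -> str:
    """Content hash used as the record's provenance identifier."""
    return hashlib.sha256(record).hexdigest()

def build_manifest(records):
    """Set of trusted fingerprints, created once at curation time."""
    return {fingerprint(r) for r in records}

def find_untrusted(records, manifest):
    """Return records whose hashes are not in the trusted manifest."""
    return [r for r in records if fingerprint(r) not in manifest]

# A record injected after curation ("doc-x") is flagged before training.
manifest = build_manifest([b"doc-a", b"doc-b"])
flagged = find_untrusted([b"doc-a", b"doc-x"], manifest)  # [b"doc-x"]
```

Manifests of this kind complement, rather than replace, upstream curation: they prove a training set matches what was reviewed, while human and automated review still determine whether what was reviewed is clean.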
Additional Questions
2) How can DOE best develop governance models for shared data, AI models, and computing infrastructure and ensure compliance with applicable legal, regulatory, and privacy standards consistent with Pub. L. 119-21?
DOE could survey existing governance models for shared data and make recommendations on which it considers the most appropriate for consortium activities, as well as identify gaps where further work is needed.
3) How can DOE best prepare and curate scientific data at scale for AI training using existing scientific data sets and developing approaches to building and maintaining Findable, Accessible, Interoperable, and Reusable (FAIR), AI-ready data repositories?
Staffing data curation, management, and maintenance positions with permanent personnel who have both authority and budget will be essential to making DOE’s scientific data a useful resource for AI training. Federal data curation efforts are often plagued by under-resourcing and weak enforcement authority, resulting in reduced data utility and availability.
4) How can DOE best balance the benefits and challenges of using federated or distributed versus centralized data repositories?
Federated or distributed data repositories are more complex to build and operate, but they offer substantial benefits: by allowing data to remain under the control of its originating site, they can unlock significantly greater amounts of data for scientific discovery.
8) What advantages or disadvantages might incorporated consortia, Focused Research Organizations, or other less-traditional awardee structures have in the context of this consortium effort? In addition to DOE, what kinds of entities might provide investment in such awardees, and what kinds of consideration might such investors receive?
Focused Research Organizations (FROs) and similar structures offer potential advantages for consortium-based research: they can assemble dedicated, full-time teams around specific technical challenges and operate with greater flexibility and longer time horizons than traditional university grant structures. These features may be particularly valuable for foundational and ambitious research requiring sustained, coordinated effort. FROs are a relatively new model, but NSF's Tech Labs initiative reflects growing federal interest in funding dedicated teams through milestone-based mechanisms with operational autonomy.⁵
DOE should be open to partnering with a wide variety of entities potentially interested in co-investing in Genesis Mission activities. This includes philanthropic foundations, industry partners, private research organizations, state economic development agencies, and more. DOE should also explore a wide range of considerations to incentivize such partnerships, while still prioritizing open innovation and promoting public benefit.
Accelerate Science Now is a non-partisan coalition of leaders in industry, academia, civil society, and the research community, charged with igniting a new era of rapid scientific discovery and delivering the benefits to the American people. Accelerate Science Now members include: The Align Foundation, Amazon Web Services (AWS), Anthropic, Arizona State University, Arm, Astera, Bit Biome, Black Tech Street, Broad Institute, Caltech, Carnegie Mellon University, Center for Data Innovation, Cohere, Computing Research Association (CRA), Convergent Research, Digit Bio, Emerald Cloud Lab, Energy Sciences Coalition, Engineering Biology Research Consortium (EBRC), Federation of American Scientists (FAS), Foundation for American Innovation, FutureHouse, Ginkgo Bioworks, Good Science Project, Google DeepMind, Hewlett Packard Enterprise (HPE), Horizon Institute for Public Service, Inclusive Abundance, Information Technology Industry Council (ITI), Institute for AI Policy and Strategy (IAPS), Institute for Progress (IFP), Institute of Electrical and Electronics Engineers (IEEE), Intel, Klyne, Lehigh University, Medra, Meridian, Meta, Microsoft, National Applied AI Consortium (NAAIC), New Mexico A.I. Labs, New Mexico Artificial Intelligence Consortium Academia (NMAIC-Academia), New Mexico State University, NobleReach Foundation, OpenMined, Potato, RenPhil, Rice University, Roadrunner Venture Studios, Roboflow, Samsung, SeedAI, Software Information Industry Association (SIIA), Syntensor, Systems & Technology Research (STR), Tetsuwan, Transfyr, UbiQD, University of California Berkeley, University of California Irvine, University of Albany, University of Florida, University of Tennessee-Knoxville, University of Wisconsin-Madison, VentureWell
Accelerate Science Now is led by SeedAI, a non-profit, nonpartisan organization working at the forefront of artificial intelligence policy and governance.
These recommendations do not necessarily represent or reflect the official positions of all coalition members. This document should be understood as a collaborative effort to advance shared objectives, while acknowledging the diversity of viewpoints within our coalition.
