Update:
Objective 1. Continue to develop and enhance genomics-assisted breeding resources and tools to facilitate routine application in public breeding programs.
Objective 1 can be broken down into five sub-objectives for which we can report specific and significant progress during these past six months.
Sub-obj 1: Continue to genotype with genome-wide and trait-targeted markers all new breeding lines entered in the Northern Uniform Soybean Tests
Progress: 560 new NUST breeding lines were genotyped for 20 trait-targeted markers using a commercial service. These results were returned and published in the NUST reports to help breeders make selections on SCN resistance, brown stem rot resistance, IDC resistance, and more. Additionally, high-quality DNA from these 560 lines was extracted and sent to Gencove for genome-wide skim sequencing. We are awaiting return of data from the vendor. Once the data is returned, it will be filtered and deposited in Soybeanbase, where we have already made NUST genotype data publicly available.
Working with Rex Nelson at USDA, we’ve processed and uploaded the 2023 Northern Uniform Trials data to the Soybase query portal (https://www.soybase.org/ncsrp/queryportal/). This database now hosts NUST trial data from 1993-2023.
Sub-obj 2: Enable individual public breeding programs to test and use genomic prediction
This project has instigated and enabled several public programs to start using genomic prediction routinely. Below are some highlights from reports from individual programs that are part of the SOYGEN initiative.
UMN has refined its GS pipeline and tested it extensively on the UMN Preliminary Yield Trials (PYT) data. PYT 2023 progeny population lines were assayed using a 1K low-density (LD) genotyping assay, and parents of the PYT23 lines from the crossing block were assayed using a low-pass sequencing platform to generate high-density (HD) variant data. The 50K SoySNP Chip subset of the HD data set was used as the parental reference panel to impute the 1K LD set to the 50K HD set (~30K SNPs after QC). We used this imputed data to make genomic predictions using models that include GxE interaction effects. We are also designing experiments to compare the efficacy of genomic prediction with phenotypic selection. We’ve also expanded the selection of our GxE models and tested several of these using the 2023 PYT data.
ISU genotyped all lines in their prelim and advanced yield trials. A newly hired postdoctoral research associate is working on developing genomic prediction models.
NDSU is currently working to develop a genomic selection pipeline for future use in the NDSU breeding program. They have begun testing different models through a cross-validation procedure for predicting yield and maturity, using the Agriplex marker set and phenotype data from 2022 and 2023 for roughly 1,000 experimental lines. In the short term, they aim to develop training populations and assess accuracy for predicting yield and maturity from past years within their program. The Agriplex genotype data was funded entirely through the SOYGEN project, and NDSU reports that the progress made to this point would have been impossible without this support.
KSU is continuing to evaluate progeny of the rapid cycling experiment. Last year’s data was of limited quality because of terminal drought conditions. They would like to place the experiment in the field again this year, but are still determining whether they can handle it. In 2020, they set up their crossing block based on 1) GS combinations and 2) breeder combinations. F4-derived lines from those populations are in preliminary yield trials this year. About 300 entries come from populations created based on genomic predictions, and about 300 lines come from the breeder’s selections. They have genotyped all 600 entries in the rapid cycling experiment, which will provide another layer of information for examining the response to selection.
Purdue implemented the genomic selection experiment in progeny rows as part of SOYGEN. They studied the efficacy of genomic selection for yield compared with selection on phenotypes only, and added an objective combining genomic and phenomic data as well. Across two years, they genotyped and phenotyped ~10,000 progeny rows and planted ~2,000 preliminary yield trial plots across four environments. They finished this experiment in the 2023 season and are currently writing a manuscript describing the results. Preliminary results indicate that phenotypic and genomic selection for yield were equivalent, but including biomass phenotypes in genomic selection increased accuracy of yield prediction by 10%.
MSU is genotyping all breeding lines with 6K SNPs and using genomic prediction models to predict white mold and SDS resistance.
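As an illustration of the kind of cross-validation procedure several programs describe above for assessing genomic prediction accuracy, the following is a minimal sketch, not any program's actual pipeline. It assumes ridge regression on marker genotypes (an RR-BLUP-style model chosen here only for illustration), inbred lines coded 0/2, simulated data, and an arbitrary shrinkage value `lam`; all function names are hypothetical.

```python
import random


def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ridge_fit(Z, y, lam):
    """Return (intercept, marker means, marker effects) for ridge regression."""
    n, m = len(Z), len(Z[0])
    mu = sum(y) / n
    means = [sum(row[j] for row in Z) / n for j in range(m)]
    Zc = [[row[j] - means[j] for j in range(m)] for row in Z]
    # Normal equations: (Zc'Zc + lam * I) effects = Zc'(y - mu)
    lhs = [[sum(Zc[i][p] * Zc[i][q] for i in range(n)) + (lam if p == q else 0.0)
            for q in range(m)] for p in range(m)]
    rhs = [sum(Zc[i][p] * (y[i] - mu) for i in range(n)) for p in range(m)]
    return mu, means, solve(lhs, rhs)


def ridge_predict(model, Z):
    """Predict phenotypes for the lines in Z from a fitted model."""
    mu, means, eff = model
    return [mu + sum((row[j] - means[j]) * eff[j] for j in range(len(eff)))
            for row in Z]


def kfold_accuracy(Z, y, k=5, lam=10.0, seed=0):
    """Mean correlation between predicted and observed values over k folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    accs = []
    for fold in range(k):
        test = idx[fold::k]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = ridge_fit([Z[i] for i in train], [y[i] for i in train], lam)
        pred = ridge_predict(model, [Z[i] for i in test])
        accs.append(pearson(pred, [y[i] for i in test]))
    return sum(accs) / k


def simulate(n_lines=200, n_markers=20, h2=0.7, seed=1):
    """Simulated inbred genotypes (0/2) and a trait with heritability near h2."""
    rng = random.Random(seed)
    Z = [[float(rng.choice((0, 2))) for _ in range(n_markers)]
         for _ in range(n_lines)]
    effects = [rng.gauss(0, 1) for _ in range(n_markers)]
    g = [sum(z * e for z, e in zip(row, effects)) for row in Z]
    g_mean = sum(g) / n_lines
    g_var = sum((v - g_mean) ** 2 for v in g) / n_lines
    noise_sd = (g_var * (1 - h2) / h2) ** 0.5
    return Z, [v + rng.gauss(0, noise_sd) for v in g]


Z, y = simulate()
print("mean 5-fold prediction accuracy:", round(kfold_accuracy(Z, y), 3))
```

In practice the simulated matrix would be replaced by real marker calls and trial phenotypes, and the same fold structure could be reused to compare alternative models on equal footing.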
Sub-obj 3: Development of a genomic prediction R-Shiny app for easy implementation of GS for breeders.
Work on the application has continued. We have built in functionality for various types of genotype imputation, including a powerful pedigree-based approach called AlphaPlantImpute.
This will help us go from data to predictions in a streamlined, effective way using one application. We are still working on implementing the genotype-by-environment prediction models. This is getting closer, and once this is up and running, we will write a publication releasing this application to the wider public. We have met with at least four separate research groups who have expressed interest in this application, and we sent them copies for beta testing.
Sub-obj 4: Adopting and advancing BreedBase for storage of information for soybean genomic prediction.
There is little to report on this objective except for the fact that we continue to work with the USDA Breeding OnRamp team to optimize BreedBase for public soybean breeding (called “Soybeanbase”). We have met with this team periodically to make improvements to the database. At least four programs in SOYGEN are using this database for regular organization of genome-wide marker data.
Sub-obj 5: Connect target and training populations using imputation that leverages pedigree relationships and enhance this capacity by inclusion of this method in the software application.
This sub-obj was completed during this past reporting period. We have explored the use of AlphaPlantImpute and found that imputation accuracy is very high when imputing low-density SNP genotypes up to high density. This method has been incorporated into the genomic prediction R Shiny App as described above. A grad student presented two posters on this research this past reporting period and will prepare a publication.
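For readers unfamiliar with how imputation accuracy is typically summarized, the following is a minimal sketch under stated assumptions; it does not call AlphaPlantImpute and is not the analysis used in this project. It assumes genotypes coded as allele dosages (0, 1, 2), a set of known genotypes that were deliberately masked before imputation, and two common summary statistics: genotype concordance and the squared correlation between true and imputed dosages. The function name and the toy data are hypothetical.

```python
def imputation_accuracy(true_geno, imputed_geno, masked):
    """Return (concordance, r2) over the masked (line, marker) positions.

    true_geno, imputed_geno: lists of lists of allele dosages (0, 1, 2).
    masked: list of (line_index, marker_index) pairs hidden before imputation.
    """
    t = [true_geno[i][j] for i, j in masked]
    p = [imputed_geno[i][j] for i, j in masked]
    n = len(t)

    # Concordance: fraction of masked genotypes recovered exactly.
    concordance = sum(a == b for a, b in zip(t, p)) / n

    # Squared Pearson correlation between true and imputed dosages.
    mt, mp = sum(t) / n, sum(p) / n
    cov = sum((a - mt) * (b - mp) for a, b in zip(t, p))
    vt = sum((a - mt) ** 2 for a in t)
    vp = sum((b - mp) ** 2 for b in p)
    r2 = (cov * cov) / (vt * vp) if vt > 0 and vp > 0 else float("nan")
    return concordance, r2


# Toy example: three lines, four markers, six masked genotypes.
true_geno = [[0, 2, 2, 0], [2, 2, 0, 0], [0, 0, 2, 2]]
imputed_geno = [[0, 2, 2, 0], [2, 0, 0, 0], [0, 0, 2, 2]]
masked = [(0, 1), (0, 3), (1, 1), (1, 2), (2, 0), (2, 3)]

concordance, r2 = imputation_accuracy(true_geno, imputed_geno, masked)
print("concordance:", round(concordance, 3), "r2:", round(r2, 3))
```

Concordance is easy to interpret but can look high for rare alleles even when the minor allele is often missed, which is why the correlation-based measure is usually reported alongside it.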
Objective 2. Develop and test methods for predicting cultivar performance in future target environments through genomics-assisted breeding models, phenomics, and environmental characterization.
For this objective, we are conducting a multi-environment, multi-institutional coordinated performance trial of 1200 diverse breeding lines. Each breeding line will be phenotyped for several agronomic and phenological traits, and each will be genotyped using low-pass re-sequencing technologies. Detailed environmental data for each growing location in each year will be collected and analyzed. The ultimate goal is to better predict the interactions between the environment and genotype. If we are successful, we will be able to leverage genomic data, phenotype data, and environmental data to predict how new breeding lines may perform in the future environments that a producer is most likely to encounter.
The last report focused on the successful seed increases we conducted last summer. During the last reporting period, the main goal was to design entry lists for each RM Set, design field maps and field books, and package seed for shipment for planting. The grad student funded on this project organized all the logistics in terms of receiving seed, packaging seed, and sending seed back out to cooperators.
Over 1200 packs of seed were shipped to UMN, where the seed was packaged and shipped back out to nine universities that will plant multi-location yield trials. Planting will commence once weather conditions allow. While describing this feat does not take much space, it was indeed quite an undertaking for the grad student involved to receive all this seed, organize it, do a quality control check, and ship it back out for specific yield trials.
Objective 3. Discover structural variants and test whether modelling structural variants improves genomic predictions for yield and seed composition.
We have fully sequenced the NAM founders using Illumina, conducted and optimized SNP variant calling, and have now used several structural variant (SV) caller tools in tandem to identify SVs within the soybean NAM parents' dataset. Specifically, Sentieon has revealed approximately 470,000 unfiltered SVs, Delly has identified about 35,000 unfiltered SVs, and CNVnator has detected approximately 4,000 unfiltered copy number variations (CNVs). Currently, we are executing a pipeline that incorporates Manta and Smoove, aiming to uncover additional SVs. The primary objective is to isolate high-quality SVs. To achieve this, we will prioritize SVs that have been consistently identified by at least two distinct SV caller tools, ensuring the reliability of the detected variants. Once we have the full SV dataset, we will proceed with determining their effect on heritability within the soybean breeding population. The grad student funded on this project is also rewriting and improving the pipeline for better ease of use and reproducibility.
Meanwhile, we have sent 19 high-quality samples to JGI so far to begin sequencing the core of the soybean pangenome, including key North-Central founder lines such as Lincoln, current public elite lines, and the SCN indicator lines. We plan to submit 200 more samples this year as we ramp up the generation of DNA for this very large project which leverages SOYGEN funds.
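The rule of prioritizing SVs identified by at least two distinct caller tools can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's pipeline: it assumes calls have already been parsed into simple records, that two calls describe the same event when they share chromosome and SV type and have at least 50% reciprocal overlap (a commonly used threshold chosen here for illustration), and that a pairwise comparison is acceptable for a small example. The record type, function names, and coordinates are hypothetical.

```python
from typing import NamedTuple


class SV(NamedTuple):
    chrom: str
    start: int
    end: int
    svtype: str   # e.g. "DEL", "DUP", "INV"
    caller: str   # tool that produced the call


def reciprocal_overlap(a, b):
    """Smaller of the two fractions of each call covered by the other."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def consensus(calls, min_callers=2, min_overlap=0.5):
    """Keep calls supported by at least min_callers distinct tools.

    Every supported call is returned, so one event may appear once per
    caller; merging matched calls into a single record is left out here.
    """
    kept = []
    for call in calls:
        supporters = {other.caller for other in calls
                      if reciprocal_overlap(call, other) >= min_overlap}
        if len(supporters) >= min_callers:
            kept.append(call)
    return kept


# Hypothetical calls for illustration only.
calls = [
    SV("Gm01", 1000, 2000, "DEL", "delly"),
    SV("Gm01", 1050, 1980, "DEL", "sentieon"),
    SV("Gm01", 5000, 6000, "DUP", "cnvnator"),
    SV("Gm02", 1000, 2000, "DEL", "delly"),
]

for sv in consensus(calls):
    print(sv)
```

The all-against-all comparison is quadratic in the number of calls, so a dataset with hundreds of thousands of unfiltered SVs would need calls sorted by position or an interval index before applying the same rule.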
Peer-reviewed publications for this reporting period
1) Wartha, C., and A.J. Lorenz. 2024. Genomic predictions of genetic variances and correlations among traits for breeding crosses in soybean. Nature Heredity (Accepted pending revision)
2) Wang, H., X. Zhao, L. Tan, J. Zhu, D. Hyten. 2024. Crop DNA extraction with lab-made magnetic nanoparticles. PLOS ONE: doi.org/10.1371/journal.pone.0296847/
3) Mahmood, A., K.D. Bilyeu, M. Škrabišová, J. Biová, E.J. De Meyer, C.G. Meinhardt, M. Usovsky, Q. Song, A.J. Lorenz, M.G. Mitchum, G. Shannon, A.M. Scaboo. 2023. Cataloging SCN resistance loci in North American public soybean breeding programs. Frontiers in Plant Science 14: doi.org/10.3389/fpls.2023.1270546
4) Viana, J.P.G., A. Avalos, Z. Zhang, R. Nelson, M. Hudson. 2024. Common signatures of selection reveal target loci for breeding across soybean populations. The Plant Genome: doi.org/10.1002/tpg2.20426
Invited presentations
1) Lorenz, A.J., et al. 2024. Developing resources to advance the implementation of genomic prediction in soybean. BioOnRamp USDA Webinar. Feb. 23, 2024.
2) Lorenz, A.J., et al. 2024. Developing resources to advance the implementation of genomic prediction in soybean. International Institute of Tropical Agriculture Webinar. March 7, 2024.