Project Details:

Title:
Increasing soybean genetic gain for yield by developing tools, know-how and community among public breeders in the north central US

Parent Project: Increasing the rate of genetic gain for yield in soybean breeding programs
Checkoff Organization:North Central Soybean Research Program
Categories:Breeding & genetics
Organization Project Code:GRT00056023
Project Year:2020
Lead Principal Investigator:Leah McHale (The Ohio State University)
Co-Principal Investigators:
Asheesh Singh (Iowa State University)
Dechun Wang (Michigan State University)
Katy M Rainey (Purdue University)
Brian Diers (University of Illinois at Urbana-Champaign)
Matthew Hudson (University of Illinois at Urbana-Champaign)
Nicolas Frederico Martin (University of Illinois at Urbana-Champaign)
Aaron Lorenz (University of Minnesota)
Pengyin Chen (University of Missouri)
Andrew Scaboo (University of Missouri)
George Graef (University of Nebraska)
David Hyten (University of Nebraska at Lincoln)
Rex Nelson (USDA/ARS-Iowa State University)

Information and Results


Project Summary

The soybean research community has generated incredible public resources for soybean breeding, including collaborative yield trials such as the Northern Uniform Soybean Trials (NUST), which date back to 1941, and commodity board-funded genotypic data and genotyping platforms. However, these tools can be better leveraged to enhance gains for yield and seed composition in soybean. As part of our first objective, we propose to add value and utility to these resources through development of a breeding database that will be housed within SoyBase, the current community-supported USDA-ARS repository for soybean genetics and genomic data. We also propose the addition of environmental data to the NUST and addition of genotypic data to both the NUST and the SCN Regional Trials, both of which will facilitate breeding objectives for stability of both yield and seed composition.

Genomics-assisted breeding entails the use of genome-wide molecular marker data to aid in breeding decisions that make breeding programs more efficient and effective. Such applications range from the use of genomic selection, which can increase selection intensity and allow selection of parents earlier in a program, to the use of genomic data to optimally pair parents for creation of breeding populations containing more superior breeding lines and even possibly more favorable correlations between traits such as seed yield and protein. This latter application has been called “genomic mating”.

Numerous scientific articles have been published on the development and optimization of genomics-assisted plant breeding and, in part through our current NCSRP project, we have learned a lot about the optimal application of genomics-assisted breeding methods applied to soybean. The actual implementation of genomics-assisted breeding in the public plant breeding communities, however, has been minimal. Thus, Objective 2 is focused on the development and use of high-throughput genome-wide genotyping technologies that are of low cost with high-quality repeatable marker data, and making available tools for genomic data management and decisions that integrate genomic data and phenotypic data along with various analysis pipelines in a user-friendly form. The transfer and availability of these technologies to the public sector is critical to our ability to effectively train future soybean breeders, many of whom will be employed by private sector companies using these techniques.

Increases in soybean yield through breeding have been slower than growers expect. A collaborative study led by Diers of a historic set of MG II-IV varieties released from 1923 to 2008 revealed a recent rate of genetic gain of 0.43 bu/ac/yr, whereas reports of genetic gain in corn generally range from 1.0 to 1.2 bu/ac/yr. Moreover, this same study found that protein has decreased between these time periods by 1.7 percentage points, an undesirable outcome. Based on the mathematical formula for change resulting from selection, there are a number of possible targets for improving the rate of genetic gain. Objective 3 of this work focuses on the evaluation of different breeding methods each of which target one or more areas for improvement, such as selection intensity, accuracy, diversity, and the time required for each breeding cycle, and simultaneous improvement of traits that typically show negative correlations, such as yield and seed protein content. Breeders will implement and test the methods in their own breeding programs to determine which methods are most viable to improve genetic gains.
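
For reference, the formula for change resulting from selection is commonly written as the breeder's equation: gain per year = (selection intensity x selection accuracy x additive genetic standard deviation) / years per breeding cycle. The following Python sketch is illustrative only; the function name and all input values are hypothetical and are not estimates from this project. It shows how each target listed above (intensity, accuracy, diversity, cycle time) moves the expected rate of gain.

```python
def genetic_gain_per_year(intensity, accuracy, sigma_a, cycle_years):
    """Expected genetic gain per year from the breeder's equation.

    intensity   : standardized selection intensity (i)
    accuracy    : correlation between predicted and true breeding value (r)
    sigma_a     : additive genetic standard deviation, in trait units
    cycle_years : length of one breeding cycle in years (L)
    """
    if cycle_years <= 0:
        raise ValueError("cycle_years must be positive")
    return intensity * accuracy * sigma_a / cycle_years


# Hypothetical inputs, chosen only to show how the terms interact.
baseline = genetic_gain_per_year(intensity=1.4, accuracy=0.5, sigma_a=3.0, cycle_years=5)
half_cycle = genetic_gain_per_year(intensity=1.4, accuracy=0.5, sigma_a=3.0, cycle_years=2.5)

print(f"baseline: {baseline:.2f} bu/ac/yr; cycle time halved: {half_cycle:.2f} bu/ac/yr")
```

Because cycle length sits in the denominator, halving the cycle time doubles the expected gain when the other terms are held constant, which is the motivation for the rapid cycling work under Objective 3.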

The proposed activities build on the current project funded to this group by NCSRP, “Increasing the rate of genetic gain for yield in soybean breeding programs.” One main objective in that project deals with extensive evaluation of diverse soybean genotypes from the USDA Soybean Germplasm Collection over four years and 30 environments to obtain high-quality phenotype and environment data. Completion and follow-up on that is detailed under Objective 4 in this project, and it provides foundational information for tool development and implementation here. Information from that study will be leveraged in this project for Objectives 1, 2, and 3. The entire set of 750 accessions, or various subsets of those (e.g., exotic land races only, elite germplasm only, or certain geographical regions only), can be used as training sets for prediction of yield, seed composition traits, maturity, and other traits for various objectives and for other programs.
Ultimately, in this project SOYGEN (Science Optimized Yield Gains across Environments) will leverage and build upon ongoing and previously funded work to increase soybean genetic gain for yield and seed composition by developing tools, know-how and community among public breeders in the north central US.

Project Objectives

Objective 1: Elevating collaborative field trials;
Objective 2: Development of a genomic breeding facilitation suite;
Objective 3: Evaluation of soybean breeding methods that increase gain
Objective 4: Characterization and use of the USDA Soybean Germplasm Collection, a foundation for future success

Project Deliverables

1.1) Database framework for agronomic, environmental, genotypic, meta and other trait data for collaborative trials.
1.2) Database populated with historical and current data from collaborative trials, including agronomic, environmental, genotypic, meta and other trait data.
1.3) Data from the uniform tests will become more useful as it will be connected to environmental and genotypic data.
1.4) Breeders will better understand how to weight data from different environments of the Uniform Tests to know how well it will predict performance.
2.1) Streamlined public genotyping service for the public soybean breeding sector at a low enough cost to afford genomic selection on a wide scale.
2.2) Workshop on genomic selection delivered to public soybean breeding community.
3.1) Methods to improve selection of progeny rows based on genomic selection with secondary traits and/or improved spatial statistics.
3.2) Understand the potential to improve the unfavorable correlation between yield and protein in soybean through genomic mating.
3.3) Application and limitations established for rapid cycling genomic selection in soybean.
3.4) Characterization of allelic effect of putative yield alleles and markers for their selection.
4.1) Comparison of sampling methods and effective ways to efficiently sample the genotype collection, particularly for improvement of quantitative traits like yield.
4.2) Means and variances for traits in the different sampling groups (so, effect of sampling method on those estimates).
4.3) Identify loci associated with yield and other traits in this diverse panel of accessions that represents the genetic diversity in the collection, so we may ID new loci and alleles that will be useful for commercial and public breeding programs.
4.4) Provide genomic predictions for yield (done), maturity, seed protein and oil %, and other traits as appropriate, for all untested accessions in the USDA Soybean Germplasm Collection.
4.5) Investigate genotype-environment interaction effects on traits, and evaluate stability of yield and composition traits across environments.
4.6) Use data/results in implementation in Objectives 1, 2, and 3 of this project (FY20-22).
4.7) Preliminary analysis of data from the validation set of 250 entries.

Progress of Work

Updated May 7, 2020:
SOYGEN 2: Increasing soybean genetic gain for yield by developing tools, know-how and community among public breeders in the north central US


Objective 1: Elevating collaborative field trials

Task 1: Development of a database to store, query, and distribute data from collaborative field trials
Drs. Nelson and Lorenz have mapped out an overall structure of this database, with appropriate dropdown menus, etc.
Weather data will be accessed from Daymet (or a similar resource) automatically utilizing web services and displayed to the user in real time.
Task 2: Updating the Uniform Soybean Trials
(1) Continued collection of genotypic data for UST as well as the SCN regional trials.
Another 288 breeding lines from the 2019 and 2018 NUST trials were genotyped with the 6K array this past winter. A total of 2238 lines have now been genotyped. Some 2018 lines were missed the first time and we will request those seeds from breeders as soon as campuses re-open. The 2020 NUST lines were collected and delivered to UMN for genotyping, which will occur once shelter-in-place orders have been lifted.
(2) Incorporation of weather station data or use of interpolated weather data from commercial sources
At this time, we anticipate accessing weather data from Daymet automatically through web services and displaying it to the user in real time.
(3) Evaluation of suitability/representation of environments
A graduate student will be starting in the fall on this work. In the meantime, a postdoc at Univ of Illinois will be advancing these goals over the summer.

1c. Key performance indicators
(1) Standardized data input methods will be developed and will include data quality control methods.
Data for NUST will be extracted from data sheets provided by NUST coordinators. Parsing scripts will be created to automatically extract data from the Excel data sheets.
(2) Existing data from collaborative trials will be quality checked.
Data for NUST 1989-2017 have been incorporated into the database.
(3) Collection of genotypic data from the Soy6KSNP chip for UT and SCN regional trial entries.
As stated above, we have genotyped another 288 breeding lines and are making plans to genotype the 2020 NUST and SCN NUST lines once campus re-opens.
(4) Weather data will be collected for the majority of the future Uniform Test field environments.
Data will be accessed programmatically from Daymet (or a similar resource) and incorporated into report pages.
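
As an illustration of what programmatic access to Daymet might look like, the Python sketch below builds a request URL for a single trial location. The endpoint and query parameter names are assumptions based on the publicly documented Daymet single-pixel web service and should be verified against the current Daymet documentation; the coordinates and dates are hypothetical and do not identify an actual NUST site. The sketch only constructs the URL and does not contact the service.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Daymet single-pixel web service; verify before use.
DAYMET_ENDPOINT = "https://daymet.ornl.gov/single-pixel/api/data"


def build_daymet_url(lat, lon, start, end, variables=("tmax", "tmin", "prcp")):
    """Return a request URL for daily weather at one latitude/longitude.

    start and end are ISO dates (YYYY-MM-DD); variables are Daymet
    variable codes (assumed here: tmax, tmin, prcp).
    """
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("latitude or longitude out of range")
    params = {
        "lat": f"{lat:.4f}",
        "lon": f"{lon:.4f}",
        "vars": ",".join(variables),
        "start": start,
        "end": end,
    }
    # Keep commas unescaped so the variable list stays readable.
    return f"{DAYMET_ENDPOINT}?{urlencode(params, safe=',')}"


# Hypothetical location and growing-season window.
url = build_daymet_url(42.0308, -93.6319, "2019-05-01", "2019-10-15")
print(url)
```

A report page could call a helper of this kind once per site-year, using the GPS coordinates stored with each trial, and cache the returned daily records alongside the phenotypic data.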

1d. Deliverables
(1) Database framework for agronomic, environmental, genotypic, meta and other trait data for collaborative trials.
A preliminary database schema has been created to accommodate NUST data 1989-2017.
We all helped Rex develop a basic design. He is implementing this framework right now.
(2) Database populated with historical and current data from collaborative trials, including agronomic, environmental, genotypic, meta and other trait data.
Data for NUST 1989-2017 have been incorporated into the database.
(3) Data from the uniform tests will become more useful as it will be connected to environmental and genotypic data.
Incorporation of environmental data into reports will provide a way to start addressing issues of genotype by environment interactions in phenotypic values.

Objective 2: Development of a genomic breeding facilitation suite

Task 1. Genotyping methods
We plan to include optimization experiments when we receive more samples from breeders during the summer. We have done a load test to identify potential problems with running a large number of samples and are working on improving our sample tracking, the genotyping workflow, and improving the analysis pipeline.
Task 2. Imputation methods
The Hudson group is developing a haplotype map of soybean germplasm, and using this to detect signatures of selection in the U.S. Soybean Germplasm. We are using haplotype imputation in R to be able to convert haplotype location and information between the whole genome sequences, 50k array, 6k and smaller genotyping arrays. These scripts are complete, and we are developing imputation scripts for use by other labs with specific needs to convert between genotyping platforms. Since the scripts are platform-specific, this must be done in collaboration with breeders using specific platforms. So far, we have focused on 6k – 50k conversion for this purpose.
Using these methods, we have applied a combination of methodologies to investigate the biological basis of the history of soybean breeding in the US.
By analyzing the results that we have obtained so far, we have observed a partial parallelism between the conventional public elite lines and the alternative gene pool lines previously developed by Dr. Randall Nelson of USDA-ARS. Our results from haplotype-based analyses (Rsb, XP-EHH, and hapFLK) support the hypothesis that in different gene pools, unique haplotypes have been fixed or are being fixed at different rates. Among the candidate haplotypes under selection, we identified 138 haplotypes underlying QTL regions, including 46 candidate haplotypes underlying QTLs for yield-related traits. These promising results may contribute to the development of varieties with greater combinations of alleles conferring advantageous characteristics, by combining conventional and alternative gene pool germplasm.
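
To make the idea of converting between genotyping platforms concrete, the toy Python sketch below imputes untyped markers on a sparse array from a densely genotyped reference panel. It is not the Hudson group's R scripts: it swaps in a deliberately simple nearest-haplotype match over the shared markers, and the function name, marker indices, and genotype values are all hypothetical. Practical imputation works on local haplotype windows and models recombination, which this sketch ignores.

```python
import numpy as np


def impute_from_reference(target_sparse, sparse_idx, reference_dense):
    """Fill in untyped markers by copying the closest reference haplotype.

    target_sparse   : (n_targets, n_sparse) alleles typed on the small array
    sparse_idx      : positions of those markers within the dense marker set
    reference_dense : (n_refs, n_dense) reference haplotypes on the dense set
    """
    target_sparse = np.asarray(target_sparse)
    reference_dense = np.asarray(reference_dense)
    ref_at_sparse = reference_dense[:, sparse_idx]
    # Hamming distance between every target and every reference haplotype.
    mismatches = (ref_at_sparse[None, :, :] != target_sparse[:, None, :]).sum(axis=2)
    best_ref = mismatches.argmin(axis=1)
    imputed = reference_dense[best_ref].copy()
    # Observed alleles always take precedence over copied ones.
    imputed[:, sparse_idx] = target_sparse
    return imputed


# Hypothetical panel: 3 reference haplotypes typed at 6 dense markers.
reference = [[0, 0, 1, 1, 0, 1],
             [1, 1, 0, 0, 1, 0],
             [0, 1, 1, 0, 0, 0]]
shared = [0, 2, 5]                      # markers present on the sparse array
targets = [[1, 0, 0], [0, 1, 1]]        # two lines typed on the sparse array

imputed = impute_from_reference(targets, shared, reference)
print(imputed)
```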
Task 3. Genomic Prediction Facilitation Suite
A postdoc, Sushan Ru, was just hired (April) at UMN to take charge of this task. She has already fully surveyed the genomic data management systems and tools available (or lack thereof). We are currently laying plans to put together an easy-to-use workflow that will take data from the SoyBase database and provide predictions on target populations using genomic selection best practices.

2c. Key performance indicators
(1) Genotyping of 10,000 breeding lines using targeted GBS approach on 1k SNPs during first year of project.
We have genotyped three plates of PYTs through David, but technical problems persist in the bioinformatics, and the COVID-19 crisis has basically halted further data collection.
(2) Beta version of R script to impute underlying whole-genome haplotypes developed.
Several imputation scripts have been developed for use in the Hudson lab. A beta testing version of a conversion script to convert 50k data into haplotype information for the 6k and smaller arrays has been supplied to the Lorenz lab for testing.
(3) Workshop or webinar given on application of genomic selection to soybean breeding.
I gave a special workshop at the 2020 Soybean Breeders’ Workshop on March 2, 2020. Over 45 people attended this session covering an entire morning. Here is a link to the workshop materials: https://drive.google.com/open?id=16ZNJZYYaPisusKowUAX10F1KPLTEEf7n
A survey was sent out to attendees following the workshop, and all survey responses were favorable or highly favorable. This workshop will be offered again including tools and data built on the progress from the SOYGEN2 project.
(4) Genomic data management system and allied analysis tools for adoption by soybean breeding community identified.
Postdoc hired one month ago to make progress on this KPI.

2e. Deliverables
1) Streamlined public genotyping service for the public soybean breeding sector at a low enough cost to afford genomic selection on a wide scale.
We received two batches of samples from KSU for the rapid cycling project. We genotyped the 210 samples and returned the data to KSU.
2) Workshop on genomic selection delivered to public soybean breeding community.
Completed March 2, 2020.

Objective 3: Evaluation of soybean breeding methods that increase gain
Breeder participation in implementation and evaluation of selection models

3b. Brief description of proposed research
Task 1. Advanced spatial analysis
To set up protocols, in 2019, soil sampling was done on progeny rows and early stage yield trials near the Ames locations, and soil maps have been made (collaboration with Dr. B. Miller, ISU). Weather stations were also deployed in the field to have more precise measurements across the field. Drone RGB imagery was also taken in these field tests. These will be repeated to standardize the data collection and analysis pipeline in 2020.
Task 2. Development of breeding program specific genomic prediction models
Plants are being grown to produce tissue that will be used in the marker analysis needed for developing the prediction models.
To date, we have genotyped ~350 experimental lines and varieties with the Illumina 6KSNP chip technology from within our breeding program. These data represent 3 years of genotyping only the advanced lines in our breeding program, each year. Starting in 2020, we will also be genotyping experimental lines (~1500) that are advanced from the 2019 plant row stage to the first preliminary yield trials during the summer of 2020. At the end of 2020, we will have a large enough population size to start developing robust prediction models for cross validation and evaluating selection strategies based on genomic estimated breeding values in 2021.
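
The cross validation of a program-specific prediction model mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's pipeline: it uses ridge regression on marker scores (a common stand-in for genomic BLUP), simulated genotypes and phenotypes, and a hypothetical shrinkage value `lam`. Predictive ability is reported as the correlation between predicted and observed phenotypes for lines held out of training.

```python
import numpy as np


def ridge_marker_effects(X, y, lam):
    """Solve (X'X + lam*I) b = X'y for marker effects b (X and y centered)."""
    n_markers = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_markers), X.T @ y)


def cross_validated_accuracy(markers, phenotypes, lam=1.0, k=5, seed=0):
    """k-fold cross validation; returns cor(predicted, observed)."""
    rng = np.random.default_rng(seed)
    n = len(phenotypes)
    folds = np.array_split(rng.permutation(n), k)
    predicted = np.empty(n)
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        # Center using training lines only, so no test information leaks in.
        mu_x = markers[train_idx].mean(axis=0)
        mu_y = phenotypes[train_idx].mean()
        effects = ridge_marker_effects(markers[train_idx] - mu_x,
                                       phenotypes[train_idx] - mu_y, lam)
        predicted[test_idx] = mu_y + (markers[test_idx] - mu_x) @ effects
    return np.corrcoef(predicted, phenotypes)[0, 1]


# Simulated stand-in data: 200 lines, 50 markers coded 0/1/2.
rng = np.random.default_rng(42)
markers = rng.integers(0, 3, size=(200, 50)).astype(float)
true_effects = rng.normal(0.0, 1.0, 50)
phenotypes = markers @ true_effects + rng.normal(0.0, 2.0, 200)

accuracy = cross_validated_accuracy(markers, phenotypes, lam=10.0)
print(f"cross-validated predictive ability: {accuracy:.2f}")
```

With real 6K SNP data the number of markers exceeds the number of lines, which is the situation the ridge penalty is meant to handle; the appropriate `lam` would be estimated from the data rather than fixed as it is here.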
Task 3. Genomic plus secondary trait selection at the progeny row stage
Progress: We have mapped out various scenarios for implementation of this goal and discussed them in the Rainey Lab and among the project PIs. I have created detailed schematics projecting out three years. We have designed an experiment for 2020 and packaged seed to compare heritability of secondary trait phenotypes and yield in small format progeny row plots vs. replicated yield trial plots.
Dr. Martin Rainey also was the recipient of a $500,000 AFRI Plant Breeding grant (ranked ‘Outstanding’) for three years that leverages this objective, and provides additional funding to develop non-destructive drone-based biomass estimation protocols applicable to the collaborators on this project.
Task 4. Exploration of genomic prediction to reduce unfavorable correlations between seed yield and protein
The collected genotype and phenotype data on the NUST trials have been used to predict the genetic correlation between protein and yield among all possible breeding populations that could be created. As expected, the mean of all these correlations is highly negative, but a few populations are predicted to have a less severe correlation. Crosses are being designed that are predicted to produce progeny superior for both yield and protein. (Aaron)
Task 5. Rapid cycling
Create a random mating population using 13 parental lines to generate 100 to 150 F1s each intermating cycle. Genotype and use genomic selection to identify the top 30 to 40% of the F1 plants before flowering and random mate the selected F1s each season. Selection criteria will include yield, maturity, protein and oil concentrations, and genetic distance. Attempt to perform this selection and intermating process three seasons per year, one season in the field, and two seasons in the greenhouse during the fall and winter. Continue this intermating and selection process to complete three cycles of genomic selection. Following each cycle (C0, C1, C2, and C3) inbreeding of the selected and unselected F1s will be implemented for multiple generations to produce F4-derived lines of unselected (random) and selected (based on genomic selection) for each cycle of selection. These lines will be evaluated in replicated field trials to characterize the effectiveness of the genomic selection and rapid cycling methodology. (Bill)
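
The selection step in this scheme (keeping the top 30 to 40% of F1 plants on the basis of genomic predictions for several traits) can be illustrated with a simple weighted index. The Python sketch below is an assumption-laden example rather than the program's actual procedure: the index weights, the simulated predictions, and the function name are hypothetical, and only three of the criteria (yield, protein, oil) are included for brevity.

```python
import numpy as np


def select_top_fraction(gebvs, weights, fraction):
    """Rank candidates on a weighted index of standardized predictions.

    gebvs    : (n_candidates, n_traits) genomic predictions
    weights  : relative weight per trait (hypothetical values below)
    fraction : proportion of candidates to keep, e.g. 0.30
    Assumes every trait column has nonzero variance.
    Returns the sorted indices of selected candidates and the index values.
    """
    gebvs = np.asarray(gebvs, dtype=float)
    # Standardize each trait so that weights are comparable across units.
    z = (gebvs - gebvs.mean(axis=0)) / gebvs.std(axis=0)
    index = z @ np.asarray(weights, dtype=float)
    n_keep = max(1, int(round(fraction * len(index))))
    ranked = np.argsort(index)[::-1]          # best candidate first
    return np.sort(ranked[:n_keep]), index


# Simulated predictions for 140 F1 plants: yield, protein, oil.
rng = np.random.default_rng(7)
predictions = rng.normal(size=(140, 3))
selected, index = select_top_fraction(predictions, weights=[0.6, 0.3, 0.1], fraction=0.30)

print(f"{len(selected)} of {len(index)} F1 plants selected for intermating")
```

In practice the index would also need to account for maturity and genetic distance among the selected plants, which are listed as selection criteria above but are not modeled in this sketch.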
Task 6. Evaluation of putative “yield” alleles
F1s have been generated in order to develop NILs varying for one pair of putative yield alleles. However, material transfer agreements (MTAs) from the USDA were unable to be obtained for other selected parental lines. This was due to a lack of proper MTAs in place for generating these parental lines. A new set of parental lines has been selected and we are initiating crosses this summer.

3c. Key performance indicators
(4) Genotyping of advanced lines, development, and cross-validation of breeding program specific models (Task 2).
At Univ of Illinois, we started growing in a greenhouse approximately 1800 lines that were evaluated over the past two years in the University of Illinois preliminary and advanced yield tests. Tissue collected from these plants will be used in DNA extractions. Tissue was collected on about one-half of the lines until these activities were halted because university lab activities were shut down due to COVID-19. This will be completed once the university allows us to restart lab work.
At Univ of Missouri, to date, we have genotyped ~350 experimental lines and varieties with the Illumina 6KSNP chip technology from within our breeding program.
At the Ohio State University, we started growing in a greenhouse approximately 1000 lines that were evaluated in preliminary and/or advanced yield tests in 2018 and/or 2019. Isolated DNA was being quantified and normalized when all lab activities were shut down due to COVID-19. These activities will be resumed when we are able to begin lab work again.
(8) Generate crosses for 5 cross combinations based on breeder selections and 5 cross combinations based on genomic mating selections for protein and yield (Task 4).
This is being done this spring, right now.
(10) Perform crosses, genotyping, and line advancement according to rapid cycling breeding scheme (FY20-22) (Task 5).
In Nebraska, crossing and genotyping will be conducted during the 2020 season. In Kansas, during the summer of 2019, 13 elite parents ranging in maturity from mid-group III to early-group IV were intermated using a diallel design in 138 parental combinations that produced a total of 627 C0 F1s. The parents were selected phenotypically for seed yield, maturity, protein and oil concentrations, and for genetic diversity based on pedigree information. In the fall of 2019, about 140 C0 F1s were planted in the greenhouse and genotyped. Based on genomic predictions for seed yield, maturity, protein, oil and genetic distance, about 43 F1s were selected and intermated at random. Over 300 emasculations were completed and 221 C1 F1 seeds were produced. In the winter of 2020, about 140 C1 F1s were planted in the greenhouse and genotyped. Based on genomic predictions for seed yield, maturity, protein, oil and genetic distance, about 43 F1s were selected and intermated at random. Over 400 emasculations were completed, with seeds to be harvested in May 2020 for planting in the field in June. To eventually produce F4-derived lines, a random sample of F2 seed from all C0 F1s was advanced for inbreeding in the F2 and F3 generations during the winter of 2019/20. Random samples of F2 seed from C1 and C2 F1s will be advanced in the field in 2020.
(12) Develop and increase seed for NILs varying for alleles at putative yield loci (Task 6).
F1s have been generated and are currently growing for production of F2 seed to study one putative yield locus.

3d. Deliverables
(3) Application and limitations established for rapid cycling genomic selection in soybean.
For the Nebraska rapid cycling GS, we are focusing on water productivity (WP) and genetic diversity in the rapid cycling set of material. The training set for water productivity comes from results of a dissertation study on soybean response to water. This is important across the north central region in rainfed production systems, as well as for 2.5 million acres of irrigated soybean production in Nebraska.

Objective 4: Characterization and use of the USDA Soybean Germplasm Collection, a foundation for future success

4c. Key performance indicators
(1) Soybean breeding programs choose soybean accessions for use in their breeding programs based on results of this work.
The predictions for all accessions in the collection were provided to participants in early 2016. Some programs have used the original set of genomic predictions from our first stage of evaluation of the 500 germplasm accessions to select new soybean PI accessions for use in their breeding programs to increase genetic diversity and yield. We selected several PIs from the list that represented new diversity in the US soybean germplasm pool for use in our Nebraska breeding program. Other programs may provide information on their specific use of selections from this work.

4d. Deliverables
(1) Comparison of sampling methods and effective ways to efficiently sample the genotype collection, particularly for improvement of quantitative traits like yield.
The SSD method is more effective at sampling the range of genetic diversity in the germplasm collection, and so leads to more efficient sampling and evaluation of a subset of lines for quantitative traits like yield. For example, our group of about 145 SSD selections in that sample represented the range of genetic diversity in the 19,000+ accessions in the USDA Soybean Germplasm Collection as well as the entire set of 500 entries did.
(2) Means and variances for traits in the different sampling groups (so, effect of sampling method on those estimates).
Means for yield and most of the other traits measured (seed weight, plant height, lodging, shattering, and maturity) were similar across the sampling groups. Interestingly, the genotypic variance for lines in the SSD group was more than twice that for the CLU and RAN samples and for the overall set (ALL), while the environmental variance and line x environment variance estimates were similar across groups. Variance component estimates for traits in each group, and specifically for yield, show a distinct difference in the SSD sample group, with large main effects of genotype and environment but relatively small genotype x environment interaction variance. Details and implications of this finding for modeling, prediction, and parent selection for traits like yield will be discussed in the publication that is being developed.
(3) Identify loci associated with yield and other traits in this diverse panel of accessions that represents the genetic diversity in the collection, so we may ID new loci and alleles that will be useful for commercial and public breeding programs.
The genome wide association analyses for each trait and sample group show different results among sample groups, with some overlapping loci detected. It seems like the SSD group may provide a more robust analysis of the genotype-phenotype association and so shows fewer loci, whereas some of the associations in the other sample groups may be spurious. These results are being investigated further as we develop the manuscript for publication.
(4) Provide genomic predictions for yield (done), maturity, seed protein and oil %, and other traits as appropriate, for all untested accessions in the USDA Soybean Germplasm Collection.
This was done previously and provided the inputs for the current discussion.
(5) Investigate genotype-environment interaction effects on traits, and evaluate stability of yield and composition traits across environments.
This was done previously and provided the inputs for the current discussion.
(6) Use data/results in implementation in Objectives 1, 2, and 3 of this project (FY20-22).
With these results, we can start to use this information for implementation in other parts of this project.
(7) Preliminary analysis of data from the validation set of 250 entries.
Only basic analysis of yield and field results has been completed. This will be a focus later this year after submission of the manuscript that documents results from the first training set of 500 entries.


Updated November 23, 2020:
We met as a group for a conference call on Aug 13th, 2020. Our discussion covered project updates, obstacles due to COVID-19 restrictions, and project revisions that may be necessary for FY21 to accommodate budget cuts.
Objective 1: Elevating collaborative field trials
A new interface for the Uniform Trial data has been developed for SoyBase by Hriddhi Kulkarni under the direction of Rex Nelson. Suggestions and requests were made by the groups; however, the new interface meets most if not all needs. The requests included more accessible hyperlinks for tables, a strain-based query system, and “checks” to be included as a category in the strain drop-down menu.
Objective 2: Development of a genomic breeding facilitation suite
Lorenz reported good progress on the workflow (pipeline) for genomic prediction. Conversion of GIGWA relational database.
Objective 3: Evaluation of soybean breeding methods that increase gain
Most breeding programs have collected or are in the process of collecting tissue samples (hundreds to low thousands) for genotyping. These are being sent to the Hyten lab. There is concern over overwhelming the Hyten lab with samples. To avoid this, individual labs will attempt to send isolated DNA, and a secondary service from Agriplex will be assessed.
Schapaugh was able to make crosses through rapid cycling (cycle 3 F1s). This relies heavily on rapid genotyping turnaround. For the last cycle, turnaround was 2 weeks.
While the quick turnaround of < 2 weeks necessary for rapid cycling is tough, it is possible. Seed chipping may enable a longer turnaround time.
Martin Rainey sent out drone imaging protocol in June.

Final Project Results

Updated May 14, 2021:
Objective 1: Elevating collaborative field trials
This project aims to develop a database to collect/compile, store/manage, query, and publicly distribute data from collaborative field trials (Northern Uniform Soybean Trials and SCN Trials). We aim to further add value to these collaborative trials by genotyping the breeding lines tested.
1c. Key performance indicators
(1) Standardized data input methods will be developed and will include data quality control methods.
We have communicated with the Northern Uniform Soybean Trials’ collaborators to make an agreement to access GPS coordinates of test locations.
(2) Existing data from collaborative trials will be quality checked.
We have mapped out the structure of and databased the UT data set. We are currently testing the database.
(3) Collection of genotypic data from the Soy6KSNP chip for UT and SCN regional trial entries.
We collected 6K genotype data on all 2020 UT lines. The 2020 SCN UT lines will be planted in the field along with all 2021 UT and SCN UT lines for tissue collection and genotyping.
(4) Weather data will be collected for the majority of the future NUST field environments.
Weather datasets were collected for the site-years corresponding to NUST field trials by linking the geographic coordinates of the field trials with the Daymet weather data. This information, along with field trial phenotypic information, will be used to compare the year-to-year similarity of trial sites.
(5) The data from the NUST will be analyzed to determine the usefulness of test locations in predicting the performance of the experimental lines.
Weather datasets, along with field trial phenotypic information, are being used to compare the year-to-year similarity of trial sites.
1d. Deliverables
(1) Database framework for agronomic, environmental, genotypic, meta and other trait data for collaborative trials.
Database tables and draft query user interfaces have been created. Beta testing of the interface by project participants continues.
(2) Database populated with historical and current data from collaborative trials, including agronomic, environmental, genotypic, meta and other trait data.
Phenotypic data from collaborative trials from 1989 to the present have been loaded into the data tables and are accessible to project participants. Environmental data will be available through an interface to the DayMet meteorological API.
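As an illustration of how trial coordinates could be turned into a weather request, the Python sketch below builds a query URL for the Daymet single-pixel extraction service. The endpoint and the parameter names (lat, lon, vars, start, end) are assumptions based on the public Daymet documentation and should be verified before use; the coordinates are hypothetical, and the actual SoyBase interface may work differently. The sketch only builds the URL and makes no network request.

```python
from urllib.parse import urlencode

# Assumed endpoint for Daymet single-pixel extraction; verify against current docs.
DAYMET_URL = "https://daymet.ornl.gov/single-pixel/api/data"

def daymet_query_url(lat, lon, start, end, variables=("tmax", "tmin", "prcp")):
    """Build a query URL for daily weather at one trial location."""
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("coordinates out of range")
    params = {
        "lat": f"{lat:.4f}",
        "lon": f"{lon:.4f}",
        "vars": ",".join(variables),  # assumed comma-separated variable list
        "start": start,               # ISO dates, e.g. planting date
        "end": end,                   # e.g. harvest date
    }
    return DAYMET_URL + "?" + urlencode(params, safe=",")

# Hypothetical field location and growing season
example_url = daymet_query_url(40.0, -83.0, "2020-05-01", "2020-10-31")
```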
(3) Data from the uniform tests will become more useful as it will be connected to environmental and genotypic data.
GPS locations of field trials and genotypic data of breeding lines are available.
Objective 2: Development of a genomic breeding facilitation suite
Genomics-assisted breeding entails the use of genome-wide molecular marker data to aid in breeding decisions that make breeding programs faster, more efficient, and more effective. To better adopt these methods, we aim to make high-quality, inexpensive genotyping methods, decision support tools, and user-friendly analysis pipelines available.
2c. Key performance indicators
(1) Genotyping of 10,000 breeding lines using targeted GBS approach on 1k SNPs during first year of project.
We have received 7,730 DNA samples to run with the 1k SNP set. We are currently processing these samples. The first 2,592 are in the process of being sequenced.
(2) Beta version of R script to impute underlying whole-genome haplotypes developed.
A haplotype map of soybean germplasm is being developed, which will aid in transferring genotypic data across platforms. We are using haplotype imputation in R to convert haplotype location and information between whole-genome sequences, the 50k array, the 6k array, and smaller genotyping sets such as the 1k. These analysis tools are complete, and we are developing tools for use by other labs with specific needs to convert between genotyping platforms. So far, we have focused on 6k-to-50k conversion for this purpose.
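The project's R tools are not shown in this report. The basic idea of converting a low-density genotype to a higher density can nevertheless be sketched in Python under strong simplifying assumptions: given a reference panel of haplotypes typed at high density, choose the reference haplotype that best matches the sample at the shared markers and copy its alleles at the untyped markers. The marker names and alleles are hypothetical, and real imputation methods also model recombination, phasing, and genotyping error.

```python
def impute_from_reference(sample, reference_panel):
    """Fill untyped markers in `sample` from the best-matching reference haplotype.

    sample: dict marker -> allele at the low-density markers only
    reference_panel: list of dicts marker -> allele at high density
    """
    def mismatches(ref):
        # Count shared markers where the reference disagrees with the sample
        return sum(1 for marker, allele in sample.items() if ref.get(marker) != allele)

    best = min(reference_panel, key=mismatches)
    imputed = dict(best)    # start from the best-matching haplotype
    imputed.update(sample)  # observed calls always override copied ones
    return imputed

# Hypothetical panel typed at four markers; the sample is typed at only two
panel = [
    {"m1": "A", "m2": "C", "m3": "G", "m4": "T"},
    {"m1": "A", "m2": "T", "m3": "A", "m4": "T"},
]
low_density = {"m1": "A", "m3": "A"}
high_density = impute_from_reference(low_density, panel)
```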
(4) Genomic data management system and allied analysis tools for adoption by soybean breeding community identified.
During this past reporting period we installed a genome-wide marker database called GIGWA (https://gigwa.southgreen.fr/gigwa/). We have deposited our current genome-wide marker data into it, including all the genotype data collected on the UT as part of this project. We began building a workflow of software tools and scripts to combine data held in this database with phenotypic data and genomic prediction models, easing the use of genomic selection in a practical breeding context. A few steps still need to be developed, such as low-to-high marker density imputation and training population optimization. The current postdoc left for a permanent position, and we are seeking another postdoc to continue this work.
On a related front, co-PI Nelson, with input from Lorenz, is researching the adoption of a platform called BreedBase (breedbase.org). We hope it can be installed at SoyBase and made available to public breeders for depositing phenotypic and genotypic data and for facilitating the use of genome-wide marker data in breeding. This is in the early stages of development.
2e. Deliverables
(1) Streamlined public genotyping service for the public soybean breeding sector at a low enough cost to afford genomic selection on a wide scale.
This first batch of 7,000 lines is helping us to streamline our submission process and determine what parts of the genotyping process need to be improved for this summer.

Objective 3: Evaluation of soybean breeding methods that increase gain
Many alternative strategies and tools exist to improve genetic gain. With expertise in genomics, characterization of genetic diversity, analyses of spatial variation and breeding methodology, this team will be able to simultaneously attempt to improve the genetic gains in our breeding programs while also asking and answering questions to identify best breeding practices.
3c. Key performance indicators
(1) Preliminary single-site validation of spatial statistics and selection of added growth stage and/or drone-based phenotyping and soil parameter factors (Task 1).
Preliminary yield prediction models have been run on single-location progeny rows from 2019 using elastic net, ridge regression, and lasso. Preliminary results show an RMSE of 7 bu/acre and an R² of 0.69. In these models, relative maturity and pedigree information had the largest effects on yield; soil parameters and canopy area also showed some significance. Soil data are extracted from fine-scale soil maps generated in collaboration with soil scientist Dr. Miller and his postdoc Dr. Khaledian. These maps provide soil nutrient data (N, P, K, Ca, Mg, CEC, NO3, OM) as well as soil texture data at a 3 m x 3 m resolution. Further machine learning models and selection criteria are being developed with Dr. Sarkar and his graduate student Luis Riera.
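For readers less familiar with the two accuracy measures quoted above, the following Python sketch shows how RMSE and R² are commonly computed from observed and predicted yields. The yield values are made up for illustration, and R² is taken here as the coefficient of determination; the report does not say whether its R² is that quantity or a squared correlation.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error, in the same units as the trait (e.g. bu/acre)."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(observed))

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

# Hypothetical plot yields (bu/acre) and model predictions
observed_yield = [50.0, 60.0, 70.0, 80.0]
predicted_yield = [52.0, 58.0, 73.0, 77.0]
example_rmse = rmse(observed_yield, predicted_yield)      # about 2.55
example_r2 = r_squared(observed_yield, predicted_yield)   # about 0.948
```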
(3) Validation and selection of spatial statistics and added factors based on multi-location data (Task 1).
In collaboration with statistician Dr. Dutta and his graduate student, Dongjin Li, we have prepared a tutorial using the statgenSTA R package. The tutorial includes videos and an HTML notebook showing the steps from data preparation through fitting and running models to outlier analysis. The statgenSTA package allows users to fit traditional non-spatial models as well as spatial models by including row and column information along with replications. Users can fit the data with the lme4, SpATS, or ASReml packages. This tutorial will be shared with the breeding community prior to the fall season. We used the SpATS engine, which applies a penalized spline for spatial correction and so allows a more dynamic correction than the traditional moving means method. We also used this tool for spatial adjustment of our 2020 yield trials and compared it with the moving means method that we have used in the past. We have not yet validated which method gives more accurate selection results; this work is ongoing.
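For context, the traditional moving means adjustment can be sketched in Python as follows: each plot's yield is corrected by how far the mean of its neighboring plots departs from the trial mean. The one-dimensional plot layout, the window size, and the direct subtraction without a regression coefficient are simplifying assumptions made for illustration; the programs' actual implementation is not described in this report.

```python
def moving_means_adjust(yields, window=1):
    """Adjust plot yields along one row using the mean of neighboring plots.

    yields: plot yields in field order (at least two plots)
    window: number of neighbors considered on each side
    """
    grand_mean = sum(yields) / len(yields)
    adjusted = []
    for i, y in enumerate(yields):
        lo, hi = max(0, i - window), min(len(yields), i + window + 1)
        neighbors = [yields[j] for j in range(lo, hi) if j != i]
        local_mean = sum(neighbors) / len(neighbors)
        # Remove the local field effect estimated from the neighbors
        adjusted.append(y - (local_mean - grand_mean))
    return adjusted

# Hypothetical row with a steady fertility gradient from left to right
raw = [50.0, 52.0, 54.0, 56.0, 58.0]
corrected = moving_means_adjust(raw, window=1)
```

In this toy row the gradient is largely removed, leaving adjusted values of 52, 54, 54, 54, and 56. A spline-based correction differs in that it fits a smooth surface over rows and columns rather than relying on a fixed neighborhood.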
(4) Genotyping of advanced lines, development, and cross-validation of breeding program specific models (Task 2).
7,000 advanced lines have been submitted and are in the process of being genotyped (see Objective 2).
(7) Generate crosses for 5 cross combinations based on breeder selections and 5 cross combinations based on genomic mating selections for protein and yield (Task 4).
We used genomic prediction to predict the mean, variance, yield-protein correlation, and superior progeny mean of all possible crosses among 2019 and 2020 UT lines. We made this information available to all SOYGEN2 breeders for planning the crossing blocks of each breeding program.
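The superior progeny mean of a cross is often approximated as the predicted cross mean plus the selection intensity times the predicted genetic standard deviation, assuming normally distributed progeny values. The Python sketch below illustrates that approximation; whether the project used exactly this formula is not stated in the report, and the two crosses and their values are hypothetical.

```python
from statistics import NormalDist

def selection_intensity(selected_fraction):
    """Standardized mean of the top fraction of a normal distribution."""
    if not 0 < selected_fraction < 1:
        raise ValueError("selected_fraction must be between 0 and 1")
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1 - selected_fraction)
    return std_normal.pdf(threshold) / selected_fraction

def superior_progeny_mean(cross_mean, cross_sd, selected_fraction=0.10):
    """Expected mean of the best progeny: mean + intensity * standard deviation."""
    return cross_mean + selection_intensity(selected_fraction) * cross_sd

# Hypothetical crosses: (predicted mean yield, predicted genetic SD), bu/acre
cross_a = superior_progeny_mean(60.0, 2.0)  # higher mean, lower variance
cross_b = superior_progeny_mean(59.0, 3.0)  # lower mean, higher variance
```

With 10% of progeny selected, the intensity is about 1.755, so the second cross has the higher superior progeny mean despite its lower predicted mean. This is why predicted variance is useful alongside the predicted mean when planning crossing blocks.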
(10) Perform crosses, genotyping, and line advancement according to rapid cycling breeding scheme (FY20-22) (Task 5).
Crosses were made in Nebraska in summer 2020, and F1 seeds were sent to Puerto Rico to grow F1 plants from October ’20 to January ’21. Intermating among F1 plants was attempted, but virus problems in Puerto Rico prevented us from obtaining all of the F1 x F1 crosses. Instead, F2 seeds were harvested from all of the confirmed F1 plants, and we are now crossing among F1:2 lines for the second intermating.
(11) Develop and increase seed for families varying for alleles at putative yield loci (Task 6).
Because we were unable to obtain an MTA (material transfer agreement) from the USDA for many of the cultivars used in the pedigrees of these lines, we were only able to complete a single cross combination: LG09-8165 x LG11-5120. F2 seed has been generated and is being advanced.
(12) Develop markers for selection of putative yield alleles (Task 6).
Markers have been developed based on key SNPs at four potential yield loci to be tested in families derived from the LG09-8165 x LG11-5120 cross.
Objective 4: Characterization and use of the USDA Soybean Germplasm Collection, a foundation for future success
Our previous NCSRP project included an evaluation of 750 soybean accessions from the USDA Soybean Germplasm Collection in yield tests taking place in 30 environments to obtain high-quality phenotype data on this diverse panel of accessions representing the genetic variation of the entire Collection. This work has allowed us to identify best-practices for sampling of large populations. Genotype-phenotype associations have been made, identifying some key loci for selection in breeding programs. Additionally, genomic predictions for yield have been completed and provide a means to select predicted higher yielding lines from the 20,000 germplasm accessions within the full Collection.
4c. Key performance indicators
(1) Soybean breeding programs choose soybean accessions for use in their breeding programs based on results of this work.
Predictions for crosses are currently being obtained. This work has been leveraged in other projects to identify lines with predicted high yield.

Benefit to Soybean Farmers

Ultimately, in this project SOYGEN (Science Optimized Yield Gains across Environments) will leverage and build upon ongoing and previously funded work to increase soybean genetic gain for yield and seed composition by developing tools, know-how and community among public breeders in the north central US. This work will result in greater genetic gains in soybean for yield, as well as any other targeted trait. This will translate to improved cultivars which achieve higher yields and higher quality.

Performance Metrics

1.1) Standardized data input methods will be developed and will include data quality control methods.
1.2) Existing data from collaborative trials will be quality checked.
1.3) Collection of genotypic data from the Soy6KSNP chip for UT and SCN regional trial entries.
1.4) Weather data will be collected for the majority of the future Uniform Test field environments.
1.5) The data from the Uniform Tests will be analyzed to determine the usefulness of test locations in predicting the performance of the experimental lines.
2.1) Genotyping of 10,000 breeding lines using targeted GBS approach on 1k SNPs during first year of project.
2.2) Beta version of R script to impute underlying whole-genome haplotypes developed.
2.3) Workshop or webinar given on application of genomic selection to soybean breeding.
2.4) Genomic data management system and allied analysis tools for adoption by soybean breeding community identified.
3.1) Preliminary single-site validation of spatial statistics and selection of added growth stage and soil parameter factors (Task 1).
3.2) Timely report on soil parameters, yield, and growth stages for 500-1000 entries for each participating program in FY21 and FY22 (Task 1).
3.3) Validation and selection of spatial statistics and added factors based on multi-location data (Task 1).
3.4) Genotyping of advanced lines, development, and cross-validation of breeding program specific models (Task 2).
3.5) Genotyping of 2500 F4 lines in two years (Task 3).
3.6) Application of 5 different selection schemes in two years (Task 3).
3.7) Comparison of preliminary yields resulting from lines selected from 5 different schemes applied in two years (Task 3).
3.8) Generate crosses for 5 cross combinations based on breeder selections and 5 cross combinations based on genomic mating selections for protein and yield (Task 4).
3.9) Advance generation by single seed descent for crosses generated in (3.8) and perform preliminary yield trials with protein data collected by NIRS on F3 or F4 derived lines in FY22 (Task 4).
3.10) Perform crosses, genotyping, and line advancement according to rapid cycling breeding scheme (FY20-22) (Task 5).
3.11) Evaluate changes in yield, genetic diversity, and model fit across cycles of selection (Task 5).
3.12) Develop and increase seed for NILs varying for alleles at putative yield loci (Task 6).
3.13) Carry out yield tests for NILs generated in (3.12) (Task 6).
4.1) Soybean breeding programs choose soybean accessions for use in their breeding programs based on results of this work.
4.2) Information from these studies facilitates tool development and implementation across objectives in this project and in other soybean programs to improve genetic gain per year.
4.3) High-quality, multi-environment phenotype information and environment data lead to new knowledge for important traits and their interactions with each other and environmental factors.
4.4) Our work serves as a model for other germplasm collections.

Project Years

Year  Project Title (each year)
2023  SOYGEN3: Building capacity to increase soybean genetic gain in future environments for seed yield and composition through combining genomics-assisted breeding with environmental characterization
2022  SOYGEN2: Increasing soybean genetic gain for yield and seed composition by developing tools, know-how and community among public breeders in the north central US
2021  SOYGEN2: Increasing SB genetic gain for yield & seed composition by developing tools, know-how & community among public breeders in the NC US
2021  SOYGEN2: Increasing SB genetic gain for yield & seed composition by developing tools, know-how & community among public breeders in the NC US
2020  Increasing soybean genetic gain for yield by developing tools, know-how and community among public breeders in the north central US
2020  Increasing soybean genetic gain for yield by developing tools, know-how and community among public breeders in the north central US
2019  Increasing the rate of genetic gain for yield in soybean breeding programs
2018  Increasing the rate of genetic gain for yield in soybean breeding programs
2017  Increasing the rate of genetic gain for yield in soybean breeding programs