2015
Application of high throughput gene function discovery to soybean improvement
Contributor/Checkoff:
Category:
Sustainable Production
Keywords:
Genetics, Genomics
Parent Project:
This is the first year of this project.
Lead Principal Investigator:
Brian Dilkes, Purdue University
Co-Principal Investigators:
Project Code:
Contributing Organization (Checkoff):
Institution Funded:
Brief Project Summary:

For the continued improvement of soybean seed composition, it is necessary to identify and exploit new sources of genetic variation conferring desirable traits. Researchers have collected mutants that carry either reduced levels of linolenic acid or elevated levels of oleic or stearic acid. These lines can be used to produce soybean oils with added stability and greater health benefits. The project goals include developing a high-throughput genomics strategy to quickly identify the causative mutations and to enable these lines to be used in breeding programs for improved soybean oil traits.

Key Benefactors:
farmers, soybean breeders, agronomists

Information And Results
Final Project Results

Update:
refer to attached document


This project continued a collaboration between the Dilkes and Hudson labs that was initiated in June 2014, and it was awarded additional funding under the title “Application of High throughput function discovery to soybean improvement”. This is the final progress report for that project, with a focus on the products from 2016 and 2017. The project is complete, no additional work is expected, and all unexpended funds should be returned to the funder.

The long-term goal of the project was to accelerate the discovery of genes in soybean. We chose to identify the genes responsible for mutations in soybean lines with altered seed composition, particularly seed oil. To this end, we combined the genetic resources of the Hudson lab with the bioinformatics expertise of the Dilkes lab and developed bioinformatics approaches that efficiently filter out unimportant background mutations, improving the accuracy and reducing the cost of identifying causative mutations in soybean lines with altered seed composition. The project had three objectives: 1. sequence the genome of the crossing partner cultivar “Prize” to identify polymorphisms between Prize and Williams-82 that are useful for mapping; 2. sequence mutant and non-mutant siblings of Prize-mutant F2 populations to map mutant traits; and 3. identify the location of causative mutations using bioinformatics. We were successful in building the bioinformatics approach. Sequencing of the mutants and of Prize, and the calling of mutations, were also successful. For the majority of mutants identified by HPLC, we identified candidate genes in the sequencing data. Remarkably, for all of the mutants from the mutant population that showed strong Mendelian (3:1) segregation ratios, we identified strong candidates for the causative mutations. All remaining mutants lacked a strong candidate gene and, when tested for segregation, were inherited as weakly penetrant alterations of phenotype that gave a more continuous variation in oil content, rather than the categorical mutant versus wild-type changes in oil phenotype that we expected. Thus, we have now described all of the categorically heritable oil mutants identified by HPLC in the population to date.
This project found novel alleles at previously identified loci, including multiple fatty acid desaturases. A manuscript describing novel alleles of the FAD3A gene has been submitted from this project and is currently under review.
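The 3:1 segregation check mentioned above can be illustrated with a chi-square goodness-of-fit test. This is a minimal sketch, not the analysis used in the project: the function name and the F2 counts are hypothetical, and the p-value uses the standard identity for one degree of freedom, p = erfc(sqrt(chi2 / 2)).

```python
import math

def chi_square_3_to_1(n_wildtype, n_mutant):
    """Test observed F2 counts against a 3:1 wild-type:mutant expectation.

    Returns (chi-square statistic, p-value) for 1 degree of freedom.
    """
    total = n_wildtype + n_mutant
    if total == 0:
        raise ValueError("at least one F2 individual is required")
    expected_wt = 0.75 * total
    expected_mut = 0.25 * total
    chi2 = ((n_wildtype - expected_wt) ** 2 / expected_wt
            + (n_mutant - expected_mut) ** 2 / expected_mut)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts, for illustration only (not project data).
chi2_fit, p_fit = chi_square_3_to_1(74, 26)    # close to 3:1
chi2_off, p_off = chi_square_3_to_1(90, 10)    # too few mutant-class plants
print(f"74:26 -> chi2={chi2_fit:.3f}, p={p_fit:.3f}")
print(f"90:10 -> chi2={chi2_off:.3f}, p={p_off:.5f}")
```

A large p-value means the counts are compatible with a single recessive locus. A small one, as in the second toy case, is the kind of deviation that would be expected from the weakly penetrant traits described above.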

Briefly, by re-sequencing the genomes of thirteen mutants and combining these data with those from three other mutants sequenced by the Dilkes lab, we were able to identify and remove confounding sequence variation. Polymorphisms that are shared between different lines result from residual heterogeneity in the founding population or from artifacts caused by the interaction between the complex polyploid genome of soybean and modern sequencing methods. Such positions, identified in error, confuse researchers and make determining the molecular cause of variation in phenotype costly and labor intensive. Using our approach, we were able to eliminate the vast majority of variants discovered in each of these lines (up to 96%) as not possibly causing the altered soybean seed composition. Removing 96% of the variants leaves 4% as candidates, so we increased the precision of our approach by as much as 25-fold compared to sequencing of individual mutants. We then compared the possibly causative mutations with an extensive list of genes that are predicted to be involved in some aspect of oil biosynthesis or metabolism and were able to assign likely candidates to many of the lines directly. Some of this work was completed in the previous project period.
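The filtering idea described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the project's pipeline: the variant representation (chromosome, position, alternate allele), the line names, and both function names are hypothetical. It keeps only variants that are private to a single line and shows how a removed fraction translates into a fold reduction in candidates.

```python
from collections import Counter

def private_variants(variants_by_line):
    """For each line, keep only variants that appear in no other line.

    Variants shared between lines are treated as background
    (founder heterogeneity or sequencing artifacts) and dropped.
    """
    counts = Counter(
        v for variants in variants_by_line.values() for v in set(variants)
    )
    return {
        line: {v for v in set(variants) if counts[v] == 1}
        for line, variants in variants_by_line.items()
    }

def fold_reduction(fraction_removed):
    """Fold reduction in candidate variants after removing a fraction."""
    return 1.0 / (1.0 - fraction_removed)

# Hypothetical toy data: (chromosome, position, alternate allele).
toy = {
    "mutant_A": {("Chr14", 100, "T"), ("Chr02", 500, "A"), ("Chr05", 42, "G")},
    "mutant_B": {("Chr14", 100, "T"), ("Chr02", 500, "A"), ("Chr18", 7, "C")},
    "mutant_C": {("Chr14", 100, "T"), ("Chr10", 900, "A")},
}
filtered = private_variants(toy)
for line, variants in sorted(filtered.items()):
    print(line, sorted(variants))

# Removing 96% of variants leaves 4%, i.e. about a 25-fold reduction.
print(round(fold_reduction(0.96), 1))
```

In this toy case each line is left with exactly one private variant. With real data the set of surviving variants would then be intersected with a list of oil biosynthesis and metabolism genes, as the text describes.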


1. The cultivar Prize was resequenced. This is a valuable resource going forward for the development of molecular markers for genetic mapping of many traits under study in the Hudson lab, and was used for the analysis in Objective 2.

2. One aim of this project was to test the efficiency of whole-genome bulk segregant resequencing and compare it to similar methods such as genotyping by sequencing (GBS), as well as to determine optimal sample sizes and preparations for soybean populations. F2 samples from 9 populations were sequenced. DNA from F2 individuals was prepared, purified, and pooled for sequencing. From 4 populations, we sequenced DNA from two contrasting phenotypic groups, and from 5 additional populations we sequenced DNA from only the mutant outliers. One of these populations corresponded to a line for which we had whole-genome sequence from the first year of funding for the project. Through leveraged funding we obtained GBS data in parallel on these populations to aid in genetic mapping. In soybean, even with the use of winter nurseries and early phenotyping/genotyping, high-resolution genetic mapping requires multiple crosses and growing seasons and larger population sizes than were anticipated for this project. We are still fine-mapping these traits using both conventional and genomic methods. Figure 1 shows the most promising linkage data from one of the lines. The data generally indicate that the remaining uncloned mutants will require larger populations and more complicated experimental designs to determine the molecular identities of any additional genes affecting seed composition.

3. We have been able to confirm that line 1877, which has a low-linolenic acid phenotype, carries a mutation in the FAD3A gene, and we have submitted a manuscript describing this variant to Crop Science (see below). For 5 of the lines, sequencing identified strong candidate genes with known functions in oil biosynthesis or lipid metabolism (Table 1). For the other 8 lines it did not, and in these cases the inheritance of the trait was weaker, so discovering the causative variant will require larger population sizes than we had expected. In one case we were able to demonstrate a non-recessive inheritance pattern (Table 2) that will need to be taken into consideration in future experiments to identify the gene responsible for this mutant phenotype.
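For the populations in which two contrasting phenotypic pools were sequenced, the general logic of bulk segregant mapping can be sketched as a per-marker difference in allele frequency between the pools. This is a generic illustration, not the method or data from this project: the marker names, read counts, and function names are all hypothetical. For a recessive mutation, markers linked to the causative locus are expected to approach a frequency of 1 for the mutant-parent allele in the mutant pool and about 1/3 in the wild-type sibling pool, while unlinked markers stay near 0.5 in both.

```python
def allele_frequency(mutant_parent_reads, total_reads):
    """Fraction of reads carrying the mutant-parent allele at one marker."""
    if total_reads == 0:
        raise ValueError("marker has no read coverage")
    return mutant_parent_reads / total_reads

def delta_index(mutant_pool, sibling_pool):
    """Allele-frequency difference (mutant pool minus sibling pool).

    Each pool maps marker name -> (mutant-parent allele reads, total reads).
    Only markers covered in both pools are compared.
    """
    return {
        marker: allele_frequency(*mutant_pool[marker])
        - allele_frequency(*sibling_pool[marker])
        for marker in mutant_pool
        if marker in sibling_pool
    }

# Hypothetical read counts, for illustration only (not project data).
mutant_pool = {"marker_1": (21, 40), "marker_2": (38, 40), "marker_3": (19, 40)}
sibling_pool = {"marker_1": (20, 40), "marker_2": (14, 40), "marker_3": (22, 40)}

deltas = delta_index(mutant_pool, sibling_pool)
best = max(deltas, key=deltas.get)
print(best, round(deltas[best], 3))
```

When only the mutant pool is sequenced, as for the five additional populations, the same idea reduces to looking for markers whose mutant-parent allele frequency approaches 1. Weakly penetrant traits blur this signal, which is consistent with the need for larger populations noted above.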


*refer to document for tables

Under the United Soybean Research Retention policy, final reports are displayed with the project once it is completed, but working files are purged after three years and financial information after seven years. All pertinent information is in the final report. For more information, please contact the project lead at your state soybean organization or the principal investigator listed on the project.