Project Details:

Title:
Field phenotyping using machine learning tools integrated with genetic mapping to address heat and drought induced flower abortion in soybean

Parent Project: This is the first year of this project.
Checkoff Organization: North Central Soybean Research Program
Categories: Environmental stress, Breeding & genetics, Technology
Organization Project Code: 60065
Project Year: 2023
Lead Principal Investigator: Krishna Jagadish (Texas Tech University)
Co-Principal Investigators:
Doina Caragea (Kansas State University)
William Schapaugh (Kansas State University)
Gunvant Patil (Texas Tech University)
Glen Ritchie (Texas Tech University)
Hamed Sari-Sarraf (Texas Tech University)
Impa Somayanda (Texas Tech University)
Henry Nguyen (University of Missouri)
Avat Shekoofa (University of Tennessee-Institute of Agriculture)
Keywords:

Contributing Organizations

Funding Institutions

Information and Results


Project Summary

A 30 to 80% flower drop in soybeans grown across different regions of the US is an unresolved and
persistent bottleneck that has limited soybean's ability to achieve its full genetic yield potential. The major
challenge has been the lack of robust, field-based high throughput phenotyping and analysis tools to
capture temporal variation in flower abortion and pod retention across large genetically diverse
germplasm. The multi-regional (KS, MO, TN and TX) and trans-disciplinary team will develop an image-based
field phenotyping system, integrated with deep-learning tools to capture large genetic variation in
flower abortion and pod retention under different soil and climatic conditions. A genetically diverse set of
250 genotypes including late group II, group III and early group IV will be tested under natural dryland
conditions in MO and KS, and under irrigated and severe drought and heat stress conditions in TX and
TN. Currently available deep re-sequenced genotypic data will be leveraged to identify environmentally
stable and region-specific genomic regions controlling flower abortion. This fundamental knowledge will
help discover molecular switches to enhance flower and pod retention, and thereby enhance yield
potential under diverse environmental conditions. The proposed project will address Tools and
Technology for Soybean Improvement and use these to build Extreme Weather Resiliency. In
summary, the overall goal is to increase flower and pod retention by 20 to 30%, with a potential
to enhance yields by 10 to 15%, ultimately translating to an additional 400 million dollars to
the national soybean industry.

Project Objectives

Objectives (Year 1)
• Explore the genetic diversity in flower abortion under different soil moisture and climatic conditions
using a large diversity panel
• Develop an image-based field phenotyping system and deep-learning tools to precisely document
temporal dynamics in flower abortion and pod retention in genetically diverse soybeans
• Discover environmentally stable and region-specific genomic regions controlling flower abortion in
diverse soil types, moisture, and climatic conditions

Year 2 - Utilizing the findings from Year 1, we will fine-tune the high-throughput phenotyping and deep-learning tools
to validate environmentally stable and region-specific genomic regions and identify candidate genes and
metabolites that control flower abortion and pod retention under different soil and climatic conditions.

Year 3 - Initiate breeding population development using germplasm with significantly higher flower and pod
retention, identify molecular markers, and test CRISPR-based gene-edited lines with higher flower and pod
retention under controlled environments and field conditions.

Project Deliverables

- Identify novel soybean germplasm that has the potential to retain 20 to 30% more flowers, accompanied by a balanced source-sink relation to increase yield potential by 10 to 15%
- Common and regional soybean germplasm with increased flower retention identified and made available for breeding purposes
- A publicly available image-based high throughput phenotyping tool developed to track rate of flower abortion/retention to strengthen soybean breeding efforts
- Identify environmentally stable and region-specific genomic regions and molecular markers controlling flower abortion in soybean
- Identify QTLs and characterize promising genes controlling flower abortion using CRISPR-based gene editing technology
- Breeding populations to incorporate genes for increased flower and pod retention into elite germplasm for variety development

Progress of Work

Updated April 9, 2023:
Project report (first quarter, January 1, 2023 to March 31, 2023)

Project funded by North Central Soybean Research Program

Project title - Field phenotyping using machine learning tools integrated with genetic mapping
to address heat and drought induced flower abortion in soybean

Participating institutions – Texas Tech University, Kansas State University, University of Missouri, and University of Tennessee

Goals & Objectives
Long-term Goal – Develop soybean cultivars with 20 to 30% lower flower abortion under favorable to challenging environmental conditions, leading to about 10-15% increase in yield potential

Objectives (Year 1)
• Explore the genetic diversity in flower abortion under different soil moisture and climatic conditions using a large diversity panel
• Develop an image-based field phenotyping system and deep-learning tools to precisely document temporal dynamics in flower abortion and pod retention in genetically diverse soybeans
• Discover environmentally stable and region-specific genomic regions controlling flower abortion in diverse soil types, moisture, and climatic conditions

Progress achieved
Objective 1 - Explore the genetic diversity in flower abortion under different soil moisture and climatic conditions using a large diversity panel

A total of 350 diverse soybean lines were sent for winter nursery seed increase in Costa Rica in December 2022. They were planted in foundation seed increase plots (a total of 150 ft of row length for each line) to ensure that enough seed (5 lbs) is available for field planting at multiple locations in summer 2023. Among the 350 lines, 310 had good germination and plant stand in the seed multiplication field. We expect to receive sufficient seed for these lines in late April for 2023 summer planting. The panel targets genetic diversity, in terms of genetic structure, among maturity group III and IV lines.

The 310 lines represent the genetic diversity of the USDA soybean germplasm collection in maturity groups III and IV. We have whole-genome sequencing data for this set with an average sequencing coverage of 20x. Approximately 0.6 million high-quality SNPs and 0.5 million InDels are available for robust GWAS to identify genetic loci and genes regulating stress resilience and flower abortion in soybean. The combined average SNP and InDel density is about 1 marker/kbp.
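The density figure can be checked with a short calculation. This is only a sketch: the marker counts come from the paragraph above, while the genome size of roughly 1.1 Gbp is an assumed round figure that is not stated in this report.

```python
# Marker counts reported above; the genome size is an assumed round figure.
snp_count = 0.6e6        # ~0.6 million high-quality SNPs
indel_count = 0.5e6      # ~0.5 million InDels
genome_size_bp = 1.1e9   # ASSUMPTION: ~1.1 Gbp soybean genome (not stated in this report)


def markers_per_kbp(n_markers: float, genome_bp: float) -> float:
    """Average number of markers per 1,000 base pairs."""
    return n_markers / (genome_bp / 1_000)


density = markers_per_kbp(snp_count + indel_count, genome_size_bp)
print(f"{density:.2f} markers per kbp")
```

Under that assumed genome size, the combined marker set works out to about one marker per kbp, in line with the figure quoted above.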

Preparation of field trials at multiple participating locations

The experimental site at the University of Missouri for this project is located at the Bradford Research Center (Columbia, MO). A three-acre field has been reserved on the farm for this project. We will collect soil samples to identify basic soil properties. The field will be prepared for planting in April. The proposed ~310 diverse lines will be planted in mid-May to early June, depending on the local weather.

The experimental site to evaluate the diversity panel under rain-fed conditions at Kansas State University will be located at the Agronomy North Farm near Manhattan, KS. Three and one-half acres have been reserved for planting the experiment. Field preparation for planting is underway and soil samples will be taken following planting. We expect to receive seed of the panel from the winter nursery in April or early May (shared by University of Missouri colleagues) with an expected planting date in May.

The experimental site at the University of Tennessee will be located at the West TN Research and Education Center (WTREC) under rainfed conditions. We have secured a little over 2.5 acres for this study in 2023. Soil sample collection is in progress, and detailed information about the field will be documented. The burndown will be done in a couple of weeks. We will receive seed of the 310 soybean lines from University of Missouri colleagues, and planting will be done in early May.

The experiment will be conducted on the Quaker Avenue Research Farm at Texas Tech University in Lubbock, TX. The experiment will be carried out under sub-surface drip irrigation (SDI). Multiple irrigation zones have been obtained for this trial, totaling an area of 3 acres. Soil samples will be collected and analyzed, and the field history over prior years will be documented. Herbicide applications for burndown will be completed in April, followed by a pre-emergence herbicide application in mid-May, prior to planting the ~310 soybean lines.

Objective 2 - Develop an image-based field phenotyping system and deep-learning tools to precisely document temporal dynamics in flower abortion and pod retention in genetically diverse soybeans

Ahead of the field season, the team has used greenhouse-grown soybean plants and other existing datasets to develop a robust machine learning tool to detect flower number and rate of abortion under field conditions.
The team is implementing two general strategies for enumerating aborted flowers and has begun to apply them to greenhouse grown soybean plants.
1. Pre-abortion: Counting flowers on the plant and comparing the counts over time
2. Post-abortion: Collecting and counting aborted flowers over time

Strategy 1: We have developed a preliminary imaging protocol by which images of greenhouse plants are collected from multiple views and with high enough resolution (e.g., 4K x 6K) such that the smallest flowers comprise a minimum of 30 pixels. Our proposed strategy would then detect the flowers in two stages (see page 3 of the attached PDF for the image).
a) Subsample the acquired image and feed it to a node-detection network. Subsampling the original high-resolution image would make it possible for the detection network to ingest it without compromising image fidelity.
b) Having localized the nodes in the previous step, crop the original image and feed the resulting high-resolution sub-images to a flower-detection network. This ensures that even the smallest flowers comprise a sufficiently large number of pixels and yet the cropped input images are small enough for the network to ingest.
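The two stages above can be sketched in a few lines. This is an illustration under stated assumptions, not the team's pipeline: `detect_nodes` is a hypothetical stand-in for the node-detection network, and the subsampling factor of 8 and the 32-pixel padding are arbitrary example values.

```python
import numpy as np


def subsample(image: np.ndarray, factor: int) -> np.ndarray:
    """Stage (a): downsample by keeping every `factor`-th pixel along each axis."""
    return image[::factor, ::factor]


def detect_nodes(small_image: np.ndarray) -> list[tuple[int, int, int, int]]:
    """HYPOTHETICAL stand-in for the node-detection network.

    Returns boxes (x0, y0, x1, y1) in the subsampled image's pixel coordinates.
    """
    h, w = small_image.shape[:2]
    return [(w // 4, h // 4, w // 2, h // 2)]


def crop_nodes(image, boxes_small, factor, pad=32):
    """Stage (b): scale boxes back to full resolution, pad, clip, and crop."""
    h, w = image.shape[:2]
    crops = []
    for x0, y0, x1, y1 in boxes_small:
        fx0 = max(x0 * factor - pad, 0)
        fy0 = max(y0 * factor - pad, 0)
        fx1 = min(x1 * factor + pad, w)
        fy1 = min(y1 * factor + pad, h)
        # Keep the full-resolution box so later detections can be mapped back.
        crops.append(((fx0, fy0, fx1, fy1), image[fy0:fy1, fx0:fx1]))
    return crops


full = np.zeros((4000, 6000, 3), dtype=np.uint8)  # placeholder 4K x 6K image
small = subsample(full, factor=8)
crops = crop_nodes(full, detect_nodes(small), factor=8)
```

Each crop keeps its full-resolution box, so flower detections made inside the crop can later be placed back in the original image.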

Node-Detection Network: As an initial approach to detecting nodes, we have employed the Faster R-CNN architecture. We started by pre-training our model with a dataset from a 2023 study that focuses on detecting nodes on eggplant, chili, and tomato plants. (See pages 4 to 9 of the attached PDF for images related to this network approach.)

Future work includes:
1) Simplifying the annotation process for the new dataset, which contains three times more images than the previous one, by using the existing model's predictions (as shown in Figure 5) as preliminary annotations. Therefore, the annotators will primarily focus on refining the predicted bounding boxes and occasionally making additions or deletions. This approach will significantly accelerate the annotation process, which is essential for efficient model development.
2) Exploring and implementing other state-of-the-art network architectures that may be better suited and capable of achieving superior performance for our application.
3) Associating the model predictions with the ground truth flower and node data to ascertain the efficiency of the model predictions and the extent of refinement needed for models to be precise to allow for deployment under field conditions.
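The pre-annotation idea in item 1 could look roughly like the sketch below. The prediction dictionary layout, the 0.5 score threshold, and the `reviewed` flag are all assumptions made for illustration; the report does not describe the annotation tooling.

```python
def predictions_to_preannotations(predictions, score_threshold=0.5):
    """Turn confident model predictions into draft annotations for human review."""
    annotations = []
    for pred in predictions:
        if pred["score"] < score_threshold:
            continue  # low-confidence boxes are left for the annotator to add by hand
        x0, y0, x1, y1 = pred["box"]
        annotations.append({
            "bbox": [x0, y0, x1 - x0, y1 - y0],  # COCO-style [x, y, width, height]
            "category": pred["label"],
            "reviewed": False,  # annotator refines, accepts, or deletes the box
        })
    return annotations


# HYPOTHETICAL predictions, for illustration only.
example_predictions = [
    {"box": (10, 20, 50, 80), "score": 0.92, "label": "node"},
    {"box": (200, 40, 260, 90), "score": 0.31, "label": "node"},
]
drafts = predictions_to_preannotations(example_predictions)
```

Only the first box passes the threshold here, so the annotator would start from one draft box and add any missed nodes manually.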

Flower Detection Network: Similar to the node detection network, the flower detection network is also based on the Faster R-CNN architecture. Specifically, we used the Faster R-CNN implementation available in Detectron2 (a library containing state-of-the-art detection and segmentation algorithms made publicly available by Facebook AI Research). We trained an initial model based on a dataset published by Zhu et al. (2022). A summary of the dataset, the statistics on the training/validation/test subsets, and all related images and tables can be found on pages 10 to 15 of the attached PDF.

Future work includes:
1) Fine-tuning the original model trained on images from Zhu et al. (2022) on a selection of our own images to ensure the model performs well on them and is robust to variations in image resolution and other image characteristics (e.g., images with a smaller or larger number of flowers, images with more or fewer leaves, etc.)
2) Exploring and implementing other state-of-the-art network architectures (e.g., YOLOv7) that may be better suited and capable of achieving superior performance for our application.

Strategy 2: We have developed a preliminary imaging protocol by which the aborted flowers from greenhouse plants are collected, imaged, and annotated on capture plates. The approach for capturing and quantifying the aborted flowers, along with related images, can be found on pages 16 to 19 of the attached PDF.

Annotated images are used to train a network for aborted flower detection and counting. The network used is also a Faster R-CNN network available in Detectron2. To gain an understanding of what plate color may lead to best predicted counts for aborted flowers, we imaged aborted flowers on plates of three colors: Sky Blue (2 images), Deep Blue (2 images), and Black (3 images), and we trained a model for each plate color (we used one image for training and one for test). Furthermore, we trained a model based on all imaged plates regardless of the color (three images of three different colors were used for training and three images for testing). A total of 168 aborted flowers were annotated on the 7 plate images.

Future work includes:
1) Annotating more image plates and training a model that is robust to plate color/background.
2) Exploring transfer learning from a model that the team has trained in prior work for detecting sorghum seeds spread on a piece of paper.
3) Exploring and implementing other state-of-the-art network architectures (e.g., YOLOv7) that may be better suited and capable of achieving superior performance for our application.

Objective 3 - Discover environmentally stable and region-specific genomic regions controlling flower abortion in diverse soil types, moisture, and climatic conditions
Organ abscission (in this case of the pistil and flower) is an important process that regulates the detachment of flowers from the stem. However, the underlying genetic mechanism of flower abscission is largely unknown in plants. To understand flower abscission in soybean, we surveyed the key determinant genes involved in flower and floral organ abscission in Arabidopsis and identified their orthologs in the soybean genome. The majority of genes expressed in the abscission layer in model organisms are associated with hormone biosynthesis/transport and nutrient uptake. We have selected a subset of these genes (mainly transcription factors) involved in hormone regulation. We will conduct a gene-based haplotype analysis to select a group of lines and correlate the large-effect variants with the phenotypic data. The confounding effect (if any) of flowering QTLs will be compared for the selected genes.
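A gene-based haplotype comparison of the kind described above can be sketched as follows. The line names, alleles, and abortion percentages are hypothetical illustration values, not project data, and grouping by the exact allele combination is an assumed simplification of how haplotypes would be defined.

```python
from collections import defaultdict
from statistics import mean


def haplotype_means(genotypes, phenotypes):
    """Group lines by allele combination at a gene's variant sites.

    Returns {haplotype: (number of lines, mean phenotype)}.
    """
    groups = defaultdict(list)
    for line, alleles in genotypes.items():
        if line in phenotypes:  # skip lines without phenotypic data
            groups[tuple(alleles)].append(phenotypes[line])
    return {hap: (len(vals), mean(vals)) for hap, vals in groups.items()}


# HYPOTHETICAL alleles at three variant sites within one candidate gene.
genotypes = {
    "line_A": ("A", "T", "G"),
    "line_B": ("A", "T", "G"),
    "line_C": ("G", "T", "C"),
    "line_D": ("G", "T", "C"),
    "line_E": ("A", "C", "G"),
}
# HYPOTHETICAL flower abortion percentages per line.
abortion_pct = {"line_A": 40.0, "line_B": 50.0, "line_C": 70.0,
                "line_D": 60.0, "line_E": 55.0}

summary = haplotype_means(genotypes, abortion_pct)
for hap, (n, avg) in summary.items():
    print("".join(hap), n, avg)
```

In practice the per-haplotype means would be compared with a statistical test, and maturity or flowering-time loci would be included as covariates to address the confounding noted above.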

View uploaded report PDF file

Updated July 16, 2023:
Project report (second quarter, April 1, 2023 to June 30, 2023)

Project funded by North Central Soybean Research Program

Project title - Field phenotyping using machine learning tools integrated with genetic mapping
to address heat and drought induced flower abortion in soybean

Participating institutions – Texas Tech University, Kansas State University, University of Missouri, and University of Tennessee

Goals & Objectives
Long-term Goal – Develop soybean cultivars with 20 to 30% lower flower abortion under favorable to challenging environmental conditions, leading to about 10-15% increase in yield potential

Objectives (Year 1)
• Explore the genetic diversity in flower abortion under different soil moisture and climatic conditions using a large diversity panel
• Develop an image-based field phenotyping system and deep-learning tool to precisely document temporal dynamics in flower abortion and pod retention in genetically diverse soybeans
• Discover environmentally stable and region-specific genomic regions controlling flower abortion in diverse soil types, moisture, and climatic conditions

Progress achieved

Objective 1 - Explore the genetic diversity in flower abortion under different soil moisture and climatic conditions using a large diversity panel

Texas Tech University
Seed processing and field preparation activities were initiated for the 228 lines on May 19th, following the delivery of seeds from the University of Missouri on May 18th. On June 13th, the seeds were treated with HiStick (BASF) inoculant and biofungicide to promote seed emergence, growth and protection.

However, unfavorable weather conditions characterized by frequent showers posed challenges, resulting in a delay in planting. On June 16th, the soybean seeds were planted, and the plants have currently progressed to the V3 growth stage (Figure 1; see attached PDF).

To ensure effective weed control, continuous monitoring efforts have been undertaken in the field. To facilitate image phenotyping, our team is currently exploring the use of a sprayer or a tractor with a sprayer implement for installing the cameras (Figure 2; see attached PDF).

University of Missouri
Seeds of a diverse set of soybean germplasm (228 lines) in the USDA Gene Bank were successfully increased in Costa Rica. Our group distributed seeds to collaborators in Tennessee, Kansas, and Texas in May.
We planted all these entries in Columbia, MO on May 24, 2023. Germination was excellent. Field plots are well established, and plants reached V4-V5 growth stages as of June 30th (Figure 3; see attached PDF). We expect initial flowering in 10-14 days.
We are preparing an image-based field phenotyping system as instructed by the engineering group in this project, and field phenotyping is expected to start in the last week of July or the first week of August.

University of Tennessee
Plots were planted on June 7, 2023 at WTREC. All 700 plots are well maintained. The beans are at growth stage V3 to V4 (3 to 4 trifoliate leaves) (Figure 4; see attached PDF). The soybean crop will be managed according to University of Tennessee recommendations for growth regulator, pesticide applications, etc.
Rainfall and environmental data will be provided by the National Oceanic and Atmospheric Administration Global Historical Climatology Network Weather Station (GHCND: USC00404561) located immediately adjacent to the experimental field. A Ph.D. student has joined us to start his dissertation research activities on the current soybean project.

Kansas State University
The planting of the soybean plots took place on May 25th. Currently, we are actively monitoring the plots, and it is anticipated that the soybean plants will soon reach the R1 growth stage (Figure 5; see attached PDF).

To facilitate the installation of the imaging system, we have made specific modifications to a high-clearance spray vehicle (Figure 6; see attached PDF). The wheel spacing has been adjusted to straddle our 10' wide plots, and the vehicle will serve as the mounting platform for the imaging system. This modification is intended to give good coverage and access for capturing high-quality images of the soybean plants.

Objective 2 - Develop an image-based field phenotyping system and deep-learning tool to precisely document temporal dynamics in flower abortion and pod retention in genetically diverse soybeans

Texas Tech University
Five different models for node detection were evaluated, all of which were found to have comparable performances. We will hand these over to the K-State team so that they can integrate them into the flower detection pipeline and begin to process the images that will be collected at various sites in the coming weeks.
A 27-megapixel GoPro Hero11 camera was evaluated because of its ease of use for image collection. A protocol for collecting quality images, based on the GoPro camera parameters, was developed for all locations. The imaging system was tested in a greenhouse, and its ability to capture and record high-quality images at 60 frames per second was verified (Figure 7; see attached PDF). Furthermore, the captured images were used as input to the node detection model with a successful outcome (Figure 8; see attached PDF).
We expect that the respective teams at each location will innovate, assemble and implement a strategy for conveying the imaging system through the field. We will provide backstopping and help with image processing as the teams start generating field-imaging videos.

Kansas State University
Improving the quality of the flower detection model
We have fine-tuned the original Faster R-CNN flower detection model to improve its predictions. Specifically, the model was fine-tuned with a variety of images: some taken in more controlled environments and others resembling images taken in the field; some more focused and others somewhat blurred; and images taken with different imaging systems/cameras producing different resolutions and quality. The Average Precision for detections whose bounding boxes overlap by at least 50% with the ground truth bounding boxes (denoted as AP50) was 79.53 on the test images. Some sample predictions on test images are provided (Figure 9; see attached PDF), together with their corresponding ground truth annotations (the predicted and ground truth counts are also shown underneath each image).
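The 50% overlap criterion behind AP50 can be illustrated with a small matching routine. This sketch assumes overlap is measured as intersection over union (IoU), the usual convention for AP50, and uses a simple greedy match; it does not reproduce the evaluator used to obtain the reported score, and the boxes are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_at_iou(predictions, ground_truth, threshold=0.5):
    """Greedily match predictions (highest score first) to ground truth boxes.

    predictions: list of (box, score). Returns (true_pos, false_pos, false_neg).
    """
    unmatched = list(range(len(ground_truth)))
    tp = fp = 0
    for box, _score in sorted(predictions, key=lambda p: p[1], reverse=True):
        best = max(unmatched, key=lambda i: iou(box, ground_truth[i]), default=None)
        if best is not None and iou(box, ground_truth[best]) >= threshold:
            unmatched.remove(best)  # each ground truth box is matched at most once
            tp += 1
        else:
            fp += 1
    return tp, fp, len(unmatched)


# HYPOTHETICAL boxes, for illustration only.
gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [((0, 0, 10, 8), 0.9), ((21, 21, 31, 31), 0.8), ((50, 50, 60, 60), 0.7)]
tp, fp, fn = match_at_iou(preds, gt)
precision, recall = tp / (tp + fp), tp / (tp + fn)
```

Average precision then summarizes how precision changes with recall as the score threshold is lowered; AP50 is that summary computed with the 0.5 overlap threshold.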
Adding pods to the flower model
During the last reporting period, we have also enriched our model with the ability to detect pods. Specifically, we have adapted the previous Faster R-CNN model to detect pods (in addition to flowers) by fine-tuning it with 2693 annotated pod images (Table 1; see PDF). We used the Faster R-CNN implementation available in Detectron2 (a library containing state-of-the-art detection and segmentation algorithms made publicly available by Facebook AI Research).
Figure 10 (see attached PDF) shows the predicted bounding boxes alongside the ground truth annotations and the original images.
Flowers/Pods per whole plant images
To better estimate the overall prediction capability of the flower/pod detection model, we evaluated it by comparing the number of detected flowers/pods with the number of ground truth flowers/pods per whole plant image. (Note that this is different from the number of flowers/pods per plant, as some flowers/pods may not be visible in a particular image, depending on the angle of the image.) More specifically, we mapped the coordinates of the flowers/pods in each individual node image to coordinates in the whole plant image. This allows us to avoid duplicate detections. Some examples of predictions for each node in a plant image are shown in Figures 11 and 12 (see attached PDF).
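A minimal sketch of this coordinate mapping follows. The report does not say how duplicates are resolved once boxes share a coordinate frame; suppressing lower-scoring boxes that overlap a kept box by IoU of 0.5 or more is an assumption here, and all boxes and crop origins are hypothetical.

```python
def to_plant_coords(box, crop_origin):
    """Shift a box from node-crop coordinates into whole-plant image coordinates."""
    ox, oy = crop_origin
    x0, y0, x1, y1 = box
    return (x0 + ox, y0 + oy, x1 + ox, y1 + oy)


def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_detections(per_node, iou_threshold=0.5):
    """per_node: list of (crop_origin, [(box, score), ...]) for each node crop.

    Returns deduplicated (box, score) pairs in whole-plant coordinates.
    """
    mapped = [(to_plant_coords(box, origin), score)
              for origin, dets in per_node for box, score in dets]
    mapped.sort(key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in mapped:
        # ASSUMPTION: a box overlapping an already kept box is the same flower/pod.
        if all(iou(box, k[0]) < iou_threshold for k in kept):
            kept.append((box, score))
    return kept


# HYPOTHETICAL overlapping node crops that both contain the same flower.
per_node = [
    ((100, 200), [((10, 10, 30, 30), 0.9)]),
    ((90, 190), [((21, 20, 41, 40), 0.8), ((60, 60, 80, 80), 0.7)]),
]
flowers = merge_detections(per_node)
count_per_image = len(flowers)
```

Here the first two detections land on nearly the same location in the plant image, so only the higher-scoring one is counted.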

Objective 3 - Discover environmentally stable and region-specific genomic regions controlling flower abortion in diverse soil types, moisture, and climatic conditions

Texas Tech University

Floral organ abscission is an important process that regulates the detachment of flowers from the stem. Floral organ abscission in a well-characterized model species (Arabidopsis) involves four steps: initiation of the abscission zone (AZ), promotion of the AZ by ethylene, activation of separation, and deposition of a protective layer where organs have detached from the plant. We have shortlisted 6 genes in Arabidopsis (BLADE ON PETIOLE (BOP), KNAT (KNOX genes), BREVIPEDICELLUS 1 (BP1), INFLORESCENCE DEFICIENT IN ABSCISSION (IDA), HAE/HSL (leucine-rich repeat receptor-like kinases), and DNA BINDING WITH ONE FINGER 4.7 (DOF4.7)), which correspond to 27 orthologous genes in soybean involved in floral organ abscission. We also shortlisted genes that have been reported to play a role in floral organ abscission in addition to their known functions: ASYMMETRIC LEAVES1 (AS1), AGAMOUS-like 15 (AGL15), and FOREVER YOUNG FLOWER (FYF). The mutant alleles of these genes have shown significant effects on several stages of floral organ abscission. Lastly, the maturity loci E1-E4 play a significant role in the regulation of flowering in soybean. The J locus, an ortholog of AtELF3 (EARLY FLOWERING 3), is under the influence of E1. Functional analysis of mutant alleles for these genes showed an early flowering phenotype. The haplotype analysis for these genes is currently in progress; so far, it shows that some higher maturity group (MG) lines used in the current project (MG III, IV) retain one or more of the variant alleles. From this analysis, a group of lines with large-effect variants associated with flowering traits (floral initiation and flower abortion) will be selected to identify causal genomic regions and thereby the underlying genes.

View uploaded report Word file

View uploaded report 2 PDF file

Final Project Results

Benefit to Soybean Farmers

Retaining even a proportion of the 30% to 80% of flowers that are aborted under well-watered and stressful conditions, respectively, would allow a 10 to 20% increase in yield for soybean producers in the US. This advantage can be extended to different soil and water availability conditions to support a wide range of soybean producers, and it is the major rationale for testing this hypothesis across four soybean-growing states with a focus on MG III to IV. The advantage proposed through this collaboration will allow soybean producers to gain additional economic return at the same level of investment, i.e., with the same seed cost, fertilizer level and management. With a changing climate leading to higher temperatures and lower water availability, the proportion of flower drop would increase accordingly, further lowering yield and producer profits. Hence, germplasm, breeding populations, novel QTL/genes and CRISPR-edited lines developed with increased flower retention would help enhance the yield potential under current climates and retain the advantage even under future warmer and drier environments.

Performance Metrics

• Range in phenotypic variation associated with flower abortion and pod retention in different maturity
groups of soybean grown under different soil, moisture and climatic conditions determined.
• Image-based phenotyping system modified and established to count flowers and pods across all four
participating institutes.
• Deep learning tool developed that can analyze images and capture temporal changes in flower numbers
with minimal human intervention, from images collected across all four locations.
• Candidate genomic loci identified for flower abortion under favorable and drought and heat stress
conditions.

Project Years