Project Details: Field phenotyping using machine learning tools integrated with genetic mapping to address heat and drought induced flower abortion in soybean (2025)

Contributor/Checkoff:

North Central Soybean Research Program

Category:

Sustainable Production

Keywords:

Macronutritional bundle

Parent Project:

Field phenotyping using machine learning tools integrated with genetic mapping to address heat and drought induced flower abortion in soybean

Lead Principal Investigator:

Krishna Jagadish, Texas Tech University

Co-Principal Investigators:

Doina Caragea, Kansas State University
William Schapaugh, Kansas State University
Juliana Espindola, Texas Tech University
Gunvant Patil, Texas Tech University
Glen Ritchie, Texas Tech University
Hamed Sari-Sarraf, Texas Tech University
Impa Somayanda, Texas Tech University
Christopher Turner, Texas Tech University
Henry Nguyen, University of Missouri
Avat Shekoofa, University of Tennessee-Institute of Agriculture

Project Code:

60065

Contributing Organization (Checkoff):

North Central Soybean Research Program

$400,156

Institution Funded:

Texas Tech University

$400,156

Brief Project Summary:

The balance between flower production and abortion is the key determining factor that dictates the final pod number and yield in soybeans. Although soybean plants can produce an enormous number of flowers, 25-35% of flowers are aborted under favorable conditions and up to 80% under drought and heat stress conditions (Kokubun 2011). Large variation exists for flower number among US soybean cultivars (Hansen & Shibles, 1978), with flower abortion ranging from 20 to 80% for midwestern US cultivars and 37 to 61% for early maturing cultivars (McBlain and Hume, 1981). We are refining a novel and robust image-based phenotyping system to capture the genetic variation in flower abortion in a diverse set of entries including adapted cultivars to help us develop advanced breeding lines with lower flower and pod abortion.

Unique Keywords:
#farmers, #geneticists, #physiologists, #public and private soybean improvement groups

Information And Results

Project Summary

A 30 to 80% flower drop in soybeans grown across different regions in the US is an unresolved and persistent bottleneck that has limited soybean's ability to achieve its full genetic yield potential. The major challenge has been the lack of robust, field-based high-throughput phenotyping and analysis tools to capture variation in flower abortion and pod retention across genetically diverse germplasm. The multi-regional (KS, MO, TN and TX) and trans-disciplinary team has developed a novel image-based field phenotyping system, integrated with deep-learning approaches, to capture large genetic variation in flower abortion and pod retention under different climatic conditions. Currently, the field-based phenotyping tool is being improved so that it operates with minimal human intervention and can be easily used by researchers without expertise in machine learning. Wide genetic diversity in flower abortion has been captured over the last two seasons, facilitating the development of Recombinant Inbred Lines for identifying genomic regions and the use of contrasting lines for transcriptomic analysis and functional validation of novel hub genes controlling flower abortion. This fundamental knowledge will help discover molecular switches to enhance flower and pod retention and thereby enhance yield potential under diverse environmental conditions. The proposed project will address Tools and Technology for Soybean Improvement and use these to build Extreme Weather Resiliency. In summary, an increase in flower and pod retention of 20 to 30%, with the potential to enhance yields by 10 to 15%, would ultimately translate to an additional $6.07 billion in revenue across the U.S. soybean industry at the current market price.

Project Objectives

Long-term Goal – Develop soybean cultivars with 20 to 30% lower flower abortion under favorable to challenging environmental conditions, leading to about 10-15% increase in yield potential.

Objectives for Year 3:
• A novel image-based machine learning tool for quantifying flower abortion with minimal to no manual counting in soybeans grown in diverse environmental conditions.
• Develop recombinant inbred line (RIL) populations using contrasting (low and high) flower abortion lines identified from different environmental conditions.
• Identify key hub genes that regulate flower abortion using contrasting lines and functionally characterize using CRISPR/Cas9-mediated knockout (KO) technology.

By leveraging the insights from Years 1 and 2, we will optimize high-throughput phenotyping and machine learning tools, initiate breeding populations with enhanced flower and pod retention, and identify key hub genes to understand gene regulation driving flower retention and overall productivity.

Project Deliverables

• Range of genetic variation in flower abortion and pod retention in soybean grown under different environmental conditions captured using a novel high-throughput machine learning tool.
• Machine learning tool to capture flower abortion with minimal manual effort developed.
• RIL population developed to capture genomic regions controlling flower abortion in soybeans.
• Candidate genes identified and functionally validated for lower flower abortion.

The high-throughput phenotyping, image capture and analysis and molecular mechanisms associated with this dynamic and complex process will serve as foundational knowledge towards addressing the goal.

Progress Of Work

Updated July 12, 2025:
Report (January 1 to June 30 2025)

Project funded by Multiregional Soybean Checkoff Program and the United Soybean Board.

Project title - Field phenotyping using machine learning tools integrated with genetic mapping to address heat and drought induced flower abortion in soybean.

Participating institutions – Texas Tech University, Kansas State University, University of Missouri, and University of Tennessee.

Goals & Objectives

Long-term Goal – Develop soybean cultivars with 20 to 30% lower flower abortion under favorable to challenging environmental conditions, leading to about 10-15% increase in yield potential.

Objectives (Year 3)

• A novel image-based machine learning tool for quantifying flower abortion with minimal to no manual counting in soybeans grown in diverse environmental conditions.
• Investigate physiological effects of drought stress on contrasting lines in both controlled and field environments to assess tolerance to adverse conditions (new activity).
• Develop recombinant inbred line (RIL) populations using contrasting (low and high) flower abortion lines identified from different environmental conditions.
• Identify key hub genes that regulate flower abortion using contrasting lines and functionally characterize using CRISPR/Cas9-mediated knockout (KO) technology.

Note – For all graphs and images kindly refer to the PDF version.

Objective 1 - A novel image-based machine learning tool for quantifying flower abortion with minimal to no manual counting in soybeans grown in diverse environmental conditions.

Genotypes with high (CL0J17-3-6-8 and PI567638) and low (IA3023 and PI506862) flower abortion were selected from the 2023/2024 trials, plus two cultivars as checks, for the trials at all locations in 2025 (Figure 1). Planting took place on the following dates:
- Texas Tech University: June 2nd
- Kansas State University: June 9th
- University of Tennessee: June 3rd
- University of Missouri: June 24th (delayed by rain events)

All locations are preparing QR codes for plot identification. Pre-testing of video imaging for camera settings and position will start before flowering, to establish the method and improve imaging, so that we develop a robust tool that operates with minimal human involvement.

Model development

Texas Tech University – Flower count

For flower count annotations, soybean flowers were categorized into two classes, as shown in Figure 2. The total dataset used for training, validation, and testing is presented in Table 1. The flower detection model we developed uses the Faster R-CNN architecture, which we chose for its established capacity for agricultural object detection. The performance of the Faster R-CNN flower detection model was evaluated using a held-out test set comprising 352 images with 14,299 annotated flowers (7,397 old and 6,902 new). Standard object detection metrics, including precision, recall, F1-score, and Average Precision (AP), were used to assess model accuracy. The network had a precision of 89.6%, a recall of 86.5%, and a resulting F1-score of 88.0%, computed across the two flower classes. These metrics indicate that the model achieves a good balance between accuracy, sensitivity, and consistency, even under the uncontrolled and high-variance conditions seen in the field. Our Faster R-CNN detector also achieved an average precision (AP) of 86% at an IoU threshold of 0.3 (AP30) on the held-out test split, demonstrating reliable detection of soybean flowers under a wide variety of field conditions.
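As a check on how these metrics relate, the short Python sketch below computes the intersection over union (IoU) used to decide whether a detection matches an annotation, and the F1-score as the harmonic mean of precision and recall. The function names and the example boxes are illustrative assumptions and are not taken from the project code; only the precision and recall values come from this report.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Overlap is zero when the boxes do not intersect.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical boxes: a detection shifted half a box width from its annotation.
overlap = iou((0, 0, 10, 10), (5, 0, 15, 10))
print(f"IoU = {overlap:.3f}, counts as a match at threshold 0.3: {overlap >= 0.3}")

# Precision and recall as reported for the held-out test set.
print(f"F1 = {f1_score(0.896, 0.865):.3f}")  # 0.880
```

With the reported precision and recall, the harmonic mean reproduces the stated F1-score of 88.0%.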

To count flowers by tracking, we evaluated several popular general-purpose tracking algorithms (SORT, ByteTrack, OC-SORT, and DeepSORT) that have performed well on the MOTChallenge dataset (Leal-Taixé et al., 2015). The results shown in Table 3 and Figure 3 reveal an intriguing performance pattern among the algorithms evaluated. Although OC-SORT and ByteTrack achieved slightly better final flower count accuracy in terms of RMSE, SORT consistently outperformed both in terms of MOTA. Given that MOTA is a composite metric that accounts for false negatives, false positives, and ID-switches, SORT’s superior MOTA indicates a higher overall reliability in tracking flower identities across frames, even though MOTChallenge benchmarks typically favor OC-SORT and ByteTrack for their advanced occlusion handling. Our findings suggest that in the context of flower counting in agricultural fields, where camera movement is relatively constant and objects move predictably, the added complexity of algorithms designed for non-linear motion is unnecessary. Instead, SORT’s straightforward motion assumptions align well with these conditions, making it a more reliable choice.
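For readers less familiar with the two metrics compared here, the following Python sketch shows the standard definitions: MOTA as one minus the ratio of all tracking errors to the number of ground-truth objects, and RMSE between predicted and reference counts. The function names and all input numbers are hypothetical and chosen only for illustration; they are not results from Table 3.

```python
import math


def mota(false_negatives, false_positives, id_switches, num_ground_truth):
    """Multiple Object Tracking Accuracy, with errors summed over all frames."""
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_ground_truth


def rmse(predicted, observed):
    """Root mean squared error between two equal-length lists of counts."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical tracking errors over a video with 1000 ground-truth flower instances.
print(f"MOTA = {mota(50, 30, 20, 1000):.2f}")

# Hypothetical final flower counts per plot: model versus manual.
print(f"RMSE = {rmse([10, 12], [13, 8]):.2f}")
```

Because MOTA penalizes identity switches as well as missed and spurious detections, a tracker can score well on MOTA while another yields a slightly closer final count, which is the pattern described above.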

Temporal Dynamics of Model Flower Count

The model successfully captured fluctuations in both new and old flower counts throughout the reproductive period, providing detailed insights into flowering dynamics for test genotypes with high (LG02-9050), intermediate (PI556511), and low (IA3023) manual flower counts. It predicted the onset of flowering, peak activity, and cessation. For IA3023 and PI556511 (Figure 4), flower counts on August 2nd were higher than those of LG02-9050, indicating slower flowering initiation in LG02-9050. The peak in flower production was more sharply defined for IA3023 and LG02-9050 on August 12th, whereas PI556511 exhibited a broader peak spanning August 9th and 12th. The highest number of new flowers occurred on August 6th for IA3023, on both August 6th and 9th for PI556511, and on August 9th for LG02-9050. Notably, prior to the total flower count, LG02-9050 consistently exhibited higher counts of both new and old flowers compared to the other two genotypes, indicating its superior flower production capacity. This trend highlights the genotype’s greater flowering plasticity during the season.

Tracking old flower counts proved valuable in identifying the transition toward flowering cessation, as the number of old flowers began to exceed new flower counts. For all genotypes, flowering slowed down after August 20th, which shows a synchronized end of flowering. This information is important for assessing how long each genotype remains in the reproductive phase and whether this duration is influenced by environmental conditions. These temporal flowering patterns can be influenced by planting date, environmental stress, photoperiod, and pest or disease pressure, and ultimately impact yield.
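The transition toward cessation described above can be located programmatically as the first imaging date on which old flowers outnumber new flowers. The Python sketch below illustrates the idea; the function name, the dates, and the counts are hypothetical and are not data from the trials.

```python
def first_old_exceeds_new(dates, new_counts, old_counts):
    """Return the first date on which old flowers outnumber new flowers, or None."""
    for date, new, old in zip(dates, new_counts, old_counts):
        if old > new:
            return date
    return None


# Hypothetical counts for one genotype, for illustration only.
dates = ["Aug 02", "Aug 06", "Aug 09", "Aug 12", "Aug 16", "Aug 20"]
new_counts = [12, 30, 28, 22, 10, 4]
old_counts = [3, 10, 20, 25, 18, 9]

transition = first_old_exceeds_new(dates, new_counts, old_counts)
print(f"Old flowers first exceed new flowers on: {transition}")
```

Comparing this date across genotypes and environments would give one simple indicator of how long each genotype stays in the reproductive phase.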

Regarding total flower counts (Figure 5), the model successfully distinguished three distinct levels of flowering across genotypes for both new and old flower counts, providing dual validation of genotype-specific flowering performance in the field. However, before new and old flower data can be used to predict flower abortion in soybeans, enhancements are needed to ensure that new flower counts consistently exceed old flower counts, as would be expected biologically. In this study, new flower counts for genotypes IA3023 and PI556511 were lower than old flower counts, which may indicate limitations due to occlusion or insufficient field of view, given that only a single camera was used for this analysis. The camera was positioned to capture the middle section of the plant, which included a substantial portion of the canopy, but may have failed to detect flowers in the upper and lower regions because genotypes differ in height and in leaf occlusion. To mitigate occlusion and improve plant coverage, a multi-camera setup will be implemented in the 2025 trials. Enhancing the imaging platform would enable the model to detect a more comprehensive set of flowers, thereby increasing counting accuracy and improving the reliability of flower abortion predictions.

While the current model effectively distinguishes between two flower classes, our future work will extend this classification framework to include a third class for small pods. Our preliminary results testing how small pods could improve future predictions of flower abortion are shown in Figure 6. The same genotypes studied with the two-class model were quantified for new flowers, old flowers, and small pods. It is possible to track the transition from new flowers to old flowers and then to pod formation at each time point (Figure 6). This developmental sequence provides critical information for predicting the dynamics of flower abortion in soybeans under varying environmental conditions and may help identify atypical responses triggered by stress events. A paper detailing our two-class model findings has been written and is in the final stages of review for submission. A second paper, focusing on the three-class model and occlusion quantification, is currently being developed.

Kansas State University – Pod count

For pod count model development, we have further fine-tuned our prior model for pod segmentation and tracking, while also laying the groundwork for numeric model evaluation. To achieve this, we have manually annotated 13 videos from 3 locations using the CVAT tool. Each annotated video has a resolution of 608x1080 at 24 frames per second. The set of annotated videos (Table 4) represents an important resource for the project, as it allows model training and fine-tuning as well as model evaluation. To the best of our knowledge, this dataset is the largest of its kind in the literature, and it ensures diversity in terms of location, genotype, irrigation regime, pod stage, and video quality, among others, making it ideal for the task at hand. This diversity can be clearly seen in the last two columns of the table, which show the number of pods manually counted in the field and the number of pods manually counted in videos (by annotating and tracking pods manually using the CVAT tool).

The last two columns in the table also show that there is a significant difference between the two sets of counts. This can be attributed to the fact that only one side of the plant is captured in videos, so many of the pods counted in the field are occluded. To account for this, during this year of the project, the plants will be imaged on both sides. We have used the annotated videos to fine-tune our prior YOLOv8 model and to evaluate its results. Numeric evaluation shows good results in terms of the standard variants of average precision used as metrics for object detection/segmentation: AP 59.05%, AP50 74.08%, and AP75 63.05%. After the model was trained/fine-tuned for the object segmentation task, we used ByteSORT to track the pods across frames. The linked video (Soybean Pod Tracking) demonstrates how the tracking approach performs. Specifically, the model detects 1033 pods, while the manual count based on the corresponding video (IA3023 2nd replicate irrigated) is 963. The difference can be explained by the fact that the annotators were asked to count only the pods belonging to the front row facing the camera, while the model also detects and counts some pods that are further back. As a specific example of the tracking capabilities of our model, Figure 7 shows a sequence of three frames that are five frames apart from each other in the video. The pods are numbered, and each pod keeps the same number across the frames. Pods that were not detected in one frame were detected in the next frame (this behavior is due to how the parameters of the tracking algorithm are set).
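To put the difference between the model count and the manual video count in perspective, the small Python sketch below expresses it as an absolute and a relative error. The function name is an illustrative assumption; the two counts are those reported above for the IA3023 2nd replicate irrigated video.

```python
def count_error(model_count, manual_count):
    """Return the absolute and percent difference of a model count from a manual count."""
    if manual_count <= 0:
        raise ValueError("manual_count must be positive")
    difference = model_count - manual_count
    percent = 100.0 * difference / manual_count
    return difference, percent


# Counts reported for the IA3023 2nd replicate irrigated video.
difference, percent = count_error(1033, 963)
print(f"{difference:+d} pods ({percent:+.1f}%)")  # +70 pods (+7.3%)
```

The model therefore overcounts by about 7% relative to the manual video annotation, which is consistent with it also picking up some pods behind the front row.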

Currently, we are working on a paper that will document the data collected during the 2024 harvest season, the annotated dataset that we assembled, and the model we trained and evaluated using the dataset. We have all the tools in place to count soybean pods from field videos, and we believe that improving how the pods are captured in the videos (by imaging multiple sides and using multiple cameras) will further improve the results, taking us closer to the actual pod counts in the field.

Objective 2 - Investigate the physiological effects of drought stress on contrasting lines for flower abortion under controlled conditions.

This study aimed to assess flower abortion among six soybean lines (PI506862, PI567638, IA3023, CL0J7-3-6-8, PI548318, and PI80837) under progressive water-deficit stress (dry-down phase) and subsequent re-watering (recovery phase) in a greenhouse at the West Tennessee Research and Education Center, University of Tennessee (Figure 8). On April 10, 2025, seeds were sown at a 2 cm depth in a 1:1 mixture of sand and Lexington silt loam, then thinned to one plant per pot at 13 days after planting (DAP). Fertilizers were applied at 12 DAP (0.075% V/V liquid, 0–10–10) and 24 DAP (0.06% W/V water-soluble, 24–8–16). Plants were maintained under a 14-hour light/10-hour dark cycle and received 200 mL of water daily during the pre-treatment phase. At 28 DAP, when plants had 4–5 trifoliate leaves, the dry-down (DD) phase began; pots were saturated, drained to pot capacity, enclosed in 15-L plastic bags (Fig. 1) to eliminate evaporation, and fitted with watering tubes for controlled watering and for monitoring plant transpiration (water loss). Based on daily transpiration rate (TR), four pots per genotype were designated as well-watered (WW) controls and six as DD treatments. The DD plants were watered only if TR exceeded 80 g/day, following Shekoofa et al. (2013). Stress progression was monitored using normalized transpiration rate (NTR), with NTR < 0.10 indicating the endpoint of available soil water. Recovery occurred on days 36–41 after planting by re-watering DD pots with 350 mL to full capacity. Data collected daily during the stress and recovery phases included number of flowers, flower rate per day, and wilting score (0–5). The number of nodes and pods was recorded at the end of the dry-down phase and again at experiment termination (June 18, 2025). Data are being processed; graphs and outcomes will be incorporated in the final report.
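The NTR endpoint rule can be illustrated with a short Python sketch. The report does not spell out how NTR was calculated, so the sketch assumes a common two-step normalization used in dry-down experiments: the daily water loss of each DD pot is divided by the mean water loss of the WW pots on the same day, and that ratio is then divided by its own mean over the first few days, when the soil was still wet. The function names, the three-day reference window, and the example values are assumptions for illustration only.

```python
def normalized_transpiration(dd_daily, ww_mean_daily, n_reference_days=3):
    """Compute NTR for one dry-down pot (assumed two-step normalization).

    dd_daily: daily water loss (g/day) of the dry-down pot.
    ww_mean_daily: mean daily water loss (g/day) of the well-watered pots.
    n_reference_days: initial days, with wet soil, used as the reference.
    """
    if len(dd_daily) != len(ww_mean_daily) or len(dd_daily) < n_reference_days:
        raise ValueError("series must match in length and cover the reference days")
    # Step 1: express the stressed pot relative to the well-watered mean.
    ratios = [dd / ww for dd, ww in zip(dd_daily, ww_mean_daily)]
    # Step 2: scale so that the pot starts near 1.0 while the soil is wet.
    reference = sum(ratios[:n_reference_days]) / n_reference_days
    return [ratio / reference for ratio in ratios]


def endpoint_day(ntr, threshold=0.10):
    """Return the index of the first day on which NTR falls below the threshold."""
    for day, value in enumerate(ntr):
        if value < threshold:
            return day
    return None


# Hypothetical daily water loss values (g/day), for illustration only.
dd_pot = [100, 100, 100, 80, 40, 8]
ww_mean = [100, 100, 100, 100, 100, 100]

ntr = normalized_transpiration(dd_pot, ww_mean)
print([round(value, 2) for value in ntr])
print("Endpoint reached on day index:", endpoint_day(ntr))
```

In this hypothetical series the pot reaches the NTR < 0.10 endpoint on the sixth day of the dry-down, at which point it would move to the re-watering recovery phase.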

Objective 3 - Develop recombinant inbred line (RIL) populations using contrasting (low and high) flower abortion lines identified from different environmental conditions.

The greenhouse experiment (Figure 9) was planted on April 22nd, with plants grown in pots arranged in four rows, each approximately 4 feet in length and containing 8 pots. Video recordings began at the R1 growth stage. Manual counting of old and new flowers is being conducted within a 2-foot stretch in each row, in parallel with video recording, twice a week. The videos are captured using two GoPro cameras mounted on a hand-held PVC pipe frame. Student employees are being trained on the greenhouse experiment protocols prior to the commencement of the main field experiment.

We are continuing the generation advancement of crosses between putative soybean genotypes with contrasting levels of flower abortion. F2 seed is currently being harvested in the greenhouse. The seed will be inventoried, and a selected subset will be sent to a winter nursery in Puerto Rico for further advancement, with the goal of developing F4 or F5-derived lines for field evaluation in the summer of 2026.

Objective 4 - Identify key hub genes that regulate flower abortion using contrasting lines and functionally characterize using CRISPR/Cas9-mediated knockout (KO) technology.

To investigate the molecular basis of flower abortion in soybeans, we performed a comparative transcriptomic analysis of high-abortion (HA) and low-abortion (LA) genotypes across four floral developmental stages: bud, closed petals, open flower, and dry flower. Raw RNA-seq reads were quality-checked, trimmed, and aligned to the soybean reference genome. Gene-level counts were generated, followed by normalization and differential expression analysis using DESeq2. The principal component analysis (PCA) (Figure 10) revealed a clear separation of samples according to genotype and developmental stage, with PC1 accounting for 64% of the variance and effectively distinguishing the HA and LA groups.
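For context on the normalization step, DESeq2 by default estimates a size factor per sample with the median-of-ratios method. The Python sketch below is a minimal, pure-Python illustration of that idea and is not the DESeq2 implementation; the report does not state which settings were used, and the function name and the toy count matrix are assumptions for illustration only.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts with one value per sample.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        # Genes with a zero in any sample have no finite log geometric mean.
        if any(count <= 0 for count in gene):
            continue
        log_geo_mean = sum(math.log(count) for count in gene) / n_samples
        for sample, count in enumerate(gene):
            log_ratios[sample].append(math.log(count) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no gene has non-zero counts in every sample")
    return [math.exp(median(ratios)) for ratios in log_ratios]


# Toy matrix: sample 2 was sequenced exactly twice as deeply as sample 1.
raw_counts = [[10, 20], [100, 200], [30, 60], [0, 5]]

factors = size_factors(raw_counts)
normalized = [[count / f for count, f in zip(gene, factors)] for gene in raw_counts]
print([round(f, 3) for f in factors])
print([round(value, 2) for value in normalized[0]])
```

After dividing by the size factors, the first three genes have equal normalized counts in both samples, so a difference in sequencing depth alone would not be reported as differential expression.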

Furthermore, bar plot analysis of differentially expressed genes (DEGs) (Figure 11) showed comparable numbers of up- and down-regulated genes across stages in both genotypes, with subtle variations in magnitude reflecting the complexity of transcriptional responses associated with flower abortion. A comprehensive heatmap of DEGs (Figure 12) highlighted distinct gene expression clusters (C1–C6) associated with specific floral stages and abortion phenotypes, indicating stage-specific transcriptional reprogramming. We are currently advancing functional enrichment and network analyses to identify key candidate genes and pathways driving these contrasting phenotypes.

Final Project Results

Benefit To Soybean Farmers

Understanding the genetic diversity associated with flower abortion is necessary to discover untapped yield potential in soybean and increase profitability for soybean producers. Although flower abortion is a major cause of soybean yield loss in the US and elsewhere, this challenge has been largely overlooked. Currently, there is limited information available in the public domain and no systematic efforts have been initiated to address this challenge to fully benefit from soybean’s yield potential. Two major bottlenecks for exploring flower retention and yield improvement have been the complexity of the trait and the lack of robust field based high-throughput phenotyping. Therefore, to address this major knowledge gap in soybean improvement, we have assembled a team with expertise in crop physiology, agronomy, conventional and molecular breeding, soybean genetics, computer science with extensive experience in crop-related AI (artificial intelligence) systems, genomics, and molecular biology.

Aiming for a 20 to 30% increase in flower and pod retention, potentially leading to about a 10 to 15% increase in yield, provides a strong justification for addressing this challenge by exploring genetic diversity in flower retention and developing tools that will allow the advantage to be transferred into locally popular US soybean cultivars. The multi-regional team’s goal is to address the issue of flower abortion and pod retention across different environments, so the outputs generated can benefit a large share of US soybean producers. Currently, soybeans are grown on over 80 million acres in the US, with a national average yield of over 53 bu/acre. A 10% increase in yield due to higher retention of flowers and pods would raise the average yield to approximately 58.5 bu/acre, increasing total production by an additional 440 million bushels and generating an extra $6.07 billion in revenue across the U.S. soybean industry at the current market price.
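The arithmetic behind these figures can be retraced with the short Python sketch below. It uses the rounded values stated above (80 million acres, 53 and 58.5 bu/acre, $6.07 billion); the variable names are illustrative, and the price per bushel is not given in this report, so it is back-calculated here rather than quoted.

```python
# Rounded figures as stated in the text.
acres = 80e6                # planted area, acres
baseline_yield = 53.0       # national average, bu/acre
improved_yield = 58.5       # approximate yield after a 10% increase, bu/acre
extra_revenue = 6.07e9      # stated additional revenue, USD

# Additional production implied by the yield gain.
gain_per_acre = improved_yield - baseline_yield
extra_bushels = gain_per_acre * acres
print(f"Yield gain: {gain_per_acre:.1f} bu/acre")
print(f"Additional production: {extra_bushels / 1e6:.0f} million bushels")

# Market price implied by the stated revenue (back-calculated, not from the source).
implied_price = extra_revenue / extra_bushels
print(f"Implied price: ${implied_price:.2f} per bushel")
```

The stated 440 million bushels corresponds to a gain of 5.5 bu/acre; since the baseline is given as "over 53" bu/acre and the improved yield as approximate, an exact 10% of 53.0 bu/acre would give a slightly smaller figure.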

Under the United Soybean Research Retention policy, final reports will be displayed with the project once completed, but working files will be purged after three years and financial information after seven years. All pertinent information is in the final report; for more information, please contact the project lead at your state soybean organization or the principal investigator listed on the project.