Project Details: Field phenotyping using machine learning tools integrated with genetic mapping to address heat and drought induced flower abortion in soybean (2023)

Contributor/Checkoff:

North Central Soybean Research Program

Category:

Sustainable Production

Keywords:

Abiotic stress, Agriculture, Land Use, Water supply

Parent Project:

This is the first year of this project.

Lead Principal Investigator:

Krishna Jagadish, Texas Tech University

Co-Principal Investigators:

Doina Caragea, Kansas State University
William Schapaugh, Kansas State University
Gunvant Patil, Texas Tech University
Glen Ritchie, Texas Tech University
Hamed Sari-Sarraf, Texas Tech University
Impa Somayanda, Texas Tech University
Henry Nguyen, University of Missouri
Avat Shekoofa, University of Tennessee-Institute of Agriculture

Project Code:

60065

Contributing Organization (Checkoff):

North Central Soybean Research Program

$400,156

Leveraged Funding (Non-Checkoff):

None

Institution Funded:

Texas Tech University

$400,156

Brief Project Summary:

A 30 to 80% flower drop in soybeans across different U.S. regions is an unresolved and persisting bottleneck that has limited soybean's ability to achieve full genetic yield potential. The multi-regional team will develop an image-based field phenotyping system, integrated with deep-learning tools, to capture large genetic variation in flower abortion and pod retention under different soil and climatic conditions. Currently available deep re-sequenced genotypic data will be leveraged to identify environmentally stable and region-specific genomic regions controlling flower abortion. This knowledge will help discover molecular switches to enhance flower and pod retention, and enhance yield potential under diverse environmental conditions.

Key Beneficiaries:
#breeders, #farmers, #geneticists, #nematologists

Unique Keywords:
#soybean varieties, #breeding and genetics, #environmental stress, #technology

Information And Results

Project Summary

A 30 to 80% flower drop in soybeans grown across different regions in the US is an unresolved and
persisting bottleneck that has limited soybean's ability to achieve the full genetic yield potential. The major
challenge has been the lack of robust, field-based high throughput phenotyping and analysis tools to
capture temporal variation in flower abortion and pod retention across large genetically diverse
germplasm. The multi-regional (KS, MO, TN and TX) and trans-disciplinary team will develop an image-based
field phenotyping system, integrated with deep-learning tools to capture large genetic variation in
flower abortion and pod retention under different soil and climatic conditions. A genetically diverse set of
250 genotypes including late group II, group III and early group IV will be tested under natural dryland
conditions in MO and KS, and under irrigated and severe drought and heat stress conditions in TX and
TN. Currently available deep re-sequenced genotypic data will be leveraged to identify environmentally
stable and region-specific genomic regions controlling flower abortion. This fundamental knowledge will
help discover molecular switches to enhance flower and pod retention, and thereby enhance yield
potential under diverse environmental conditions. The proposed project will address Tools and
Technology for Soybean Improvement and use these to build Extreme Weather Resiliency. In
summary, the overall goal is to increase flower and pod retention by 20 to 30%, with a potential
to enhance yields by 10 to 15%, ultimately translating to an additional 400 million dollars to
the national soybean industry.

Project Objectives

Objectives (Year 1)
• Explore the genetic diversity in flower abortion under different soil moisture and climatic conditions
using a large diversity panel
• Develop an image-based field phenotyping system and deep-learning tools to precisely document
temporal dynamics in flower abortion and pod retention in genetically diverse soybeans
• Discover environmentally stable and region-specific genomic regions controlling flower abortion in
diverse soil types, moisture, and climatic conditions

Year 2 - Using the findings from Year 1, we will fine-tune the high-throughput phenotyping and deep-learning tools
to validate environmentally stable and region-specific genomic regions and identify candidate genes and
metabolites that control flower abortion and pod retention under different soil and climatic conditions.

Year 3 - Initiate development of breeding populations using germplasm with significantly higher flower and pod
retention, identify molecular markers, and test CRISPR-based gene-edited lines with higher flower and pod
retention under controlled environments and field conditions.

Project Deliverables

- Identify novel soybean germplasm that have the potential to retain 20 to 30% more flowers, accompanied with a balanced source-sink relation to increase yield potential by 10 to 15%
- Common and regional soybean germplasm with increased flower retention identified and made available for breeding purposes
- A publicly available image-based high throughput phenotyping tool developed to track rate of flower abortion/retention to strengthen soybean breeding efforts
- Identify environmentally stable and region-specific genomic regions and molecular markers controlling flower abortion in soybean
- Identify QTLs and characterize promising genes controlling flower abortion using CRISPR-based gene editing technology
- Breeding populations to incorporate genes for increased flower and pod retention into elite germplasm for variety development

Progress Of Work

Updated November 9, 2023:
Project report (First quarter: January 1, 2023 to March 31, 2023)

Project funded by North Central Soybean Research Program, sponsored by the Soy Checkoff

Project title - Field phenotyping using machine learning tools integrated with genetic mapping
to address heat and drought induced flower abortion in soybean

Participating institutions – Texas Tech University, Kansas State University, University of Missouri, and University of Tennessee

Goals & Objectives
Long-term Goal – Develop soybean cultivars with 20 to 30% lower flower abortion under favorable to challenging environmental conditions, leading to about 10-15% increase in yield potential

Objectives (Year 1)
• Explore the genetic diversity in flower abortion under different soil moisture and climatic conditions using a large diversity panel
• Develop an image-based field phenotyping system and deep-learning tools to precisely document temporal dynamics in flower abortion and pod retention in genetically diverse soybeans
• Discover environmentally stable and region-specific genomic regions controlling flower abortion in diverse soil types, moisture, and climatic conditions

Progress achieved
Objective 1 - Explore the genetic diversity in flower abortion under different soil moisture and climatic conditions using a large diversity panel

A total of 350 diverse soybean lines were sent to a winter nursery in Costa Rica for seed increase in December 2022. They were planted in foundation seed increase plots (a total of 150 ft of row length for each line) to make sure enough seed (5 lbs) is available for field planting at multiple locations in summer 2023. Among the 350 lines, 310 had good germination and plant stand in the seed multiplication field. We expect to receive sufficient seed for these lines in late April for 2023 summer planting. The panel targets genetic diversity, in terms of genetic structure, among maturity group III and IV lines.

The 310 lines represent the genetic diversity of the USDA soybean germplasm collection in maturity groups III and IV. We have whole-genome sequencing data for this set with an average sequencing coverage of 20x. Approximately 0.6 million high-quality SNPs and 0.5 million InDels are available for robust GWAS to identify genetic loci and genes regulating stress resilience and flower abortion in soybean. The combined average SNP and InDel density is about 1 marker/kbp.
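
As a quick check on the stated marker density, the arithmetic can be sketched as follows. The marker counts come from the report; the genome size of roughly 1.1 Gbp is an assumption (a commonly cited approximate figure for soybean), not a value given in this report.

```python
# Rough check of the marker density quoted in the report.
SNP_COUNT = 0.6e6        # high-quality SNPs (from the report)
INDEL_COUNT = 0.5e6      # InDels (from the report)
GENOME_SIZE_BP = 1.1e9   # ASSUMPTION: approximate soybean genome size in bp


def markers_per_kbp(n_markers: float, genome_bp: float) -> float:
    """Average number of markers per 1,000 base pairs."""
    return n_markers / (genome_bp / 1_000)


density = markers_per_kbp(SNP_COUNT + INDEL_COUNT, GENOME_SIZE_BP)
print(f"{density:.2f} markers per kbp")
```

Under that assumed genome size, the combined 1.1 million markers work out to roughly one marker per kbp, in line with the figure above.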

Preparation of field trials at multiple participating locations

The experimental site at the University of Missouri for this project is located at the Bradford Research Center (Columbia, MO). A three-acre field has been reserved on the farm for this project. We will collect soil samples to identify basic soil properties. The field will be prepared for planting in April. The proposed ~310 diverse lines will be planted in mid-May to early June, depending on the local weather.

The experimental site to evaluate the diversity panel under rain-fed conditions at Kansas State University will be located at the Agronomy North Farm near Manhattan, KS. Three and one-half acres have been reserved for planting the experiment. Field preparation for planting is underway and soil samples will be taken following planting. We expect to receive seed of the panel from the winter nursery in April or early May (shared by University of Missouri colleagues) with an expected planting date in May.

The experimental site at the University of Tennessee will be located at the West TN Research and Education Center (WTREC), under rainfed conditions. We have secured a little over 2.5 acres for this study in 2023. Soil sample collection is in progress, and detailed information about the field will be documented. The burndown will be done in a couple of weeks. We will receive seed of the 310 soybean lines from University of Missouri colleagues, and planting will be done in early May.

The experiment at Texas Tech University will be conducted on the Quaker Avenue Research Farm in Lubbock, TX, under sub-surface drip irrigation (SDI). Multiple irrigation zones, totaling 3 acres, have been obtained for this trial. Soil samples will be collected and analyzed, and the field history over prior years will be documented. Herbicide applications for burndown will be completed in April, followed by a pre-emergence herbicide application in mid-May, prior to planting of the ~310 soybean lines.

Objective 2 - Develop an image-based field phenotyping system and deep-learning tools to precisely document temporal dynamics in flower abortion and pod retention in genetically diverse soybeans

Ahead of the field season, the team has used greenhouse-grown soybean plants and other existing datasets to develop a robust machine learning tool to detect flower number and rate of abortion under field conditions.
The team is implementing two general strategies for enumerating aborted flowers and has begun to apply them to greenhouse-grown soybean plants.
1. Pre-abortion: Counting flowers on the plant and comparing the counts over time
2. Post-abortion: Collecting and counting aborted flowers over time

Strategy 1: We have developed a preliminary imaging protocol by which images of greenhouse plants are collected from multiple views and at high enough resolution (e.g., 4K x 6K) that the smallest flowers comprise a minimum of 30 pixels. Our proposed strategy would then detect the flowers in two stages (see page 3 of the attached PDF for the image).
a) Subsample the acquired image and feed it to a node-detection network. Subsampling lets the detection network ingest the whole image; the original high-resolution image is kept for the next stage, so image fidelity is not compromised where it matters.
b) Having localized the nodes in the previous step, crop the original image and feed the resulting high-resolution sub-images to a flower-detection network. This ensures that even the smallest flowers comprise a sufficiently large number of pixels, and yet the cropped input images are small enough for the network to ingest.
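
The hand-off between the two stages above can be sketched as follows. This is a minimal illustration, not the team's implementation: the function names, the subsampling factor of 4, the 32-pixel padding, and the 6000 x 4000 image size are all assumptions chosen for the example, and the node boxes stand in for the output of a real node-detection network.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def scale_box(box: Box, factor: int) -> Box:
    """Map a box found on the subsampled image back to full-resolution pixels."""
    x0, y0, x1, y1 = box
    return (x0 * factor, y0 * factor, x1 * factor, y1 * factor)


def pad_and_clip(box: Box, pad: int, width: int, height: int) -> Box:
    """Grow the box by `pad` pixels per side and keep it inside the image."""
    x0, y0, x1, y1 = box
    return (max(0, x0 - pad), max(0, y0 - pad),
            min(width, x1 + pad), min(height, y1 + pad))


def crop_windows(node_boxes_small: List[Box], factor: int,
                 full_size: Tuple[int, int], pad: int = 32) -> List[Box]:
    """Turn node boxes from the subsampled image into full-resolution crop windows."""
    width, height = full_size
    return [pad_and_clip(scale_box(b, factor), pad, width, height)
            for b in node_boxes_small]


# Hypothetical node detections on an image subsampled by a factor of 4.
nodes_small = [(100, 50, 160, 110), (1480, 980, 1500, 1000)]
windows = crop_windows(nodes_small, factor=4, full_size=(6000, 4000))
print(windows)
```

Each returned window would then be cut from the original image and passed to the flower-detection network, so small flowers keep their full pixel count.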

Node-Detection Network: As an initial approach to detecting nodes, we have employed the Faster R-CNN architecture. We started by pre-training our model on a dataset from a 2023 study that focuses on detecting nodes on eggplant, chili, and tomato plants. (See pages 4 to 9 of the attached PDF for images related to this network approach.)

Future work includes:
1) Simplifying the annotation process for the new dataset, which contains three times more images than the previous one, by using the existing model's predictions (as shown in Figure 5) as preliminary annotations. Therefore, the annotators will primarily focus on refining the predicted bounding boxes and occasionally making additions or deletions. This approach will significantly accelerate the annotation process, which is essential for efficient model development.
2) Exploring and implementing other state-of-the-art network architectures that may be better suited and capable of achieving superior performance for our application.
3) Associating the model predictions with the ground truth flower and node data to ascertain the efficiency of the model predictions and the extent of refinement needed for models to be precise to allow for deployment under field conditions.
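
One way to associate model predictions with ground truth counts, as in item 3 above, is to compute simple agreement statistics per plant or plot. The sketch below is an assumption about how that comparison could be done, not the team's method; the function name and the toy counts are invented for illustration only.

```python
from math import sqrt
from typing import Dict, Sequence


def count_agreement(predicted: Sequence[float],
                    manual: Sequence[float]) -> Dict[str, float]:
    """Bias, mean absolute error, and Pearson r between two sets of counts."""
    if len(predicted) != len(manual) or len(predicted) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    n = len(predicted)
    errors = [p - m for p, m in zip(predicted, manual)]
    mean_p = sum(predicted) / n
    mean_m = sum(manual) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, manual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in manual)
    # Pearson r is undefined when either series is constant.
    r = cov / sqrt(var_p * var_m) if var_p > 0 and var_m > 0 else float("nan")
    return {
        "bias": sum(errors) / n,                 # negative means under-counting
        "mae": sum(abs(e) for e in errors) / n,  # average size of the miss
        "pearson_r": r,
    }


# Toy example (invented numbers): model counts vs. manual counts per plant.
stats = count_agreement([10, 12, 7, 20], [11, 12, 9, 22])
print(stats)
```

A consistently negative bias would suggest that the model misses hidden or very small flowers, while a high correlation with a large bias would suggest that a simple calibration could be enough.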

Flower Detection Network: Similar to the node detection network, the flower detection network is also based on the Faster R-CNN architecture. Specifically, we used the Faster R-CNN implementation available in Detectron2 (a library containing state-of-the-art detection and segmentation algorithms made publicly available by Facebook AI Research). We trained an initial model on a dataset published by Zhu et al. (2022). A summary of the dataset, the statistics on the training/validation/test subsets, and all related images and tables can be found on pages 10 to 15 of the attached PDF.

Future work includes:
1) Fine-tuning the original model, trained on images from Zhu et al. (2022), on a selection of our own images to ensure the model performs well on them and is robust to variations in image resolution and other image properties (e.g., images with fewer or more flowers, images with more or fewer leaves, etc.)
2) Exploring and implementing other state-of-the-art network architectures (e.g., YOLOv7) that may be better suited and capable of achieving superior performance for our application.

Strategy 2: We have developed a preliminary imaging protocol by which the aborted flowers from greenhouse plants are collected, imaged, and annotated on capture plates. The approach for capturing and quantifying the aborted flowers, along with related images, can be found on pages 16 to 19 of the attached PDF.

Annotated images are used to train a network for aborted flower detection and counting. The network used is also a Faster R-CNN network available in Detectron2. To gain an understanding of what plate color may lead to best predicted counts for aborted flowers, we imaged aborted flowers on plates of three colors: Sky Blue (2 images), Deep Blue (2 images), and Black (3 images), and we trained a model for each plate color (we used one image for training and one for test). Furthermore, we trained a model based on all imaged plates regardless of the color (three images of three different colors were used for training and three images for testing). A total of 168 aborted flowers were annotated on the 7 plate images.

Future work includes:
1) Annotating more image plates and training a model that is robust to plate color/background.
2) Exploring transfer learning from a model that the team has trained in prior work for detecting sorghum seeds spread on a piece of paper.
3) Exploring and implementing other state-of-the-art network architectures (e.g., YOLOv7) that may be better suited and capable of achieving superior performance for our application.

Objective 3 - Discover environmentally stable and region-specific genomic regions controlling flower abortion in diverse soil types, moisture, and climatic conditions
Organ abscission (in this case of the pistil and flower) is an important process that regulates the detachment of flowers from the stem. However, the underlying genetic mechanism of flower abscission is largely unknown in plants. To understand flower abscission in soybean, we surveyed the key determinant genes involved in flower and floral organ abscission in Arabidopsis and identified their orthologs in the soybean genome. The majority of genes expressed in the abscission layer in model organisms are associated with hormone biosynthesis/transport and nutrient uptake. We have selected a subset of these genes (mainly transcription factors) involved in hormone regulation. We will conduct a gene-based haplotype analysis to select a group of lines and correlate the large-effect variants with the phenotypic data. The confounding effect (if any) of flowering QTLs will be compared for the selected genes.

Updated November 9, 2023:
Project report (Second quarter: April 1, 2023 to June 30, 2023)

Progress achieved

Objective 1 - Explore the genetic diversity in flower abortion under different soil moisture and climatic conditions using a large diversity panel

Texas Tech University
Seed processing and field preparation activities for the 228 lines were initiated on May 19th, following the delivery of seeds from the University of Missouri on May 18th. On June 13th, the seeds were treated with Histick (BASF), an inoculant and biofungicide, to promote seed emergence, growth, and protection.

However, unfavorable weather conditions characterized by frequent showers posed challenges, resulting in a delay in planting. On June 16th, the soybean seeds were planted, and the plants have currently progressed to the V3 growth stage (Figure 1; see attached PDF).

To ensure effective weed control, continuous monitoring efforts have been undertaken in the field. To facilitate image phenotyping, our team is currently exploring the use of a sprayer or a tractor with a sprayer implement for installing the cameras (Figure 2; see attached PDF).

University of Missouri
Seeds of a diverse set of soybean germplasm (228 lines) in the USDA Gene Bank were successfully increased in Costa Rica. Our group distributed seeds to collaborators in Tennessee, Kansas, and Texas in May.
We planted all these entries in Columbia, MO on May 24, 2023. Germination was excellent. Field plots are well established, and plants reached V4-V5 growth stages as of June 30th (Figure 3; see attached PDF). We expect initial flowering in 10-14 days.
We are preparing an image-based field phenotyping system as instructed by the engineering group in this project, and field phenotyping is expected to start in the last week of July or the first week of August.

University of Tennessee
Plots were planted on June 7, 2023 at WTREC. All 700 plots are well maintained. The beans are at growth stage V3 to V4 (3 to 4 trifoliate leaves) (Figure 4; see attached PDF). The soybean crop will be managed according to University of Tennessee recommendations for growth regulator, pesticide applications, etc.
Rainfall and environmental data will be provided by the National Oceanic and Atmospheric Administration Global Historical Climatology Network Weather Station (GHCND: USC00404561), located immediately adjacent to the experimental field. A Ph.D. student has joined us to start his dissertation research activities on the current soybean project.

Kansas State University
The planting of the soybean plots took place on May 25th. Currently, we are actively monitoring the plots, and it is anticipated that the soybean plants will soon reach the R1 growth stage (Figure 5; see attached PDF).

To facilitate the installation of the imaging system, we have made specific modifications to a high-clearance spray vehicle, which will serve as the mounting platform for the imaging system (Figure 6; see attached PDF). The wheel spacing has been adjusted to straddle our 10' wide plots. This modification provides the coverage and access needed to capture high-quality images of the soybean plants.

Objective 2 - Develop an image-based field phenotyping system and deep-learning tool to precisely document temporal dynamics in flower abortion and pod retention in genetically diverse soybeans

Texas Tech University
Five different models for node detection were evaluated, all of which were found to have comparable performances. We will hand these over to the K-State team so that they can integrate them into the flower detection pipeline and begin to process the images that will be collected at various sites in the coming weeks.
A 27-megapixel GoPro Hero11 camera was evaluated because of its ease of use and of image collection. A protocol for collecting quality images, based on the GoPro camera parameters, was developed for all locations. The imaging system was tested in a greenhouse, and its ability to capture and record high-quality images at 60 frames per second was verified (Figure 7; see attached PDF). Furthermore, the captured images were used as input to the node detection model with a successful outcome (Figure 8; see attached PDF).
We expect that the respective teams at each location will innovate, assemble, and implement a strategy for conveying the imaging system through the field. We will provide back-stopping and help with image processing as the teams start generating field-imaging videos.

Kansas State University
Improving the quality of the flower detection model
We have fine-tuned the original Faster R-CNN flower detection model to improve its predictions. Specifically, the model was fine-tuned with a variety of images: some taken in more controlled environments and others resembling images taken in the field; some sharply focused and others somewhat blurred; and images taken with different imaging systems/cameras, producing different resolutions and quality. The Average Precision for detections whose bounding boxes overlap by at least 50% with the ground truth bounding boxes (denoted as AP50) was 79.53 on the test images. Some sample predictions on test images are provided together with their corresponding ground truth annotations (Figure 9; see attached PDF); the predicted and ground truth counts are shown underneath each image.
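
The 50% overlap criterion behind AP50 is conventionally measured as intersection over union (IoU). The sketch below illustrates only the matching step that decides whether a detection counts as correct at IoU >= 0.5; the full AP50 figure additionally averages precision over recall levels, which is not shown. The function names and the toy boxes are assumptions made for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def match_at_threshold(predictions: List[Tuple[Box, float]],
                       ground_truth: List[Box],
                       threshold: float = 0.5) -> Tuple[int, int, int]:
    """Greedy matching by descending score; returns (TP, FP, FN)."""
    unmatched = list(range(len(ground_truth)))
    tp = fp = 0
    for box, _score in sorted(predictions, key=lambda p: p[1], reverse=True):
        # Pick the best still-unmatched ground truth box for this detection.
        best = max(unmatched, key=lambda i: iou(box, ground_truth[i]), default=None)
        if best is not None and iou(box, ground_truth[best]) >= threshold:
            tp += 1
            unmatched.remove(best)  # each ground truth box is matched once
        else:
            fp += 1
    return tp, fp, len(unmatched)


# Toy example with invented boxes and confidence scores.
gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [((1, 1, 11, 11), 0.9), ((50, 50, 60, 60), 0.8), ((0, 0, 10, 10), 0.7)]
result = match_at_threshold(preds, gt)
print(result)
```

In this toy case the first detection is a true positive, the second overlaps nothing, and the third is a duplicate of an already matched flower, so both count as false positives while one ground truth flower is missed.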
Adding pods to the flower model
During the last reporting period, we have also enriched our model with the ability to detect pods. Specifically, we have adapted the previous Faster R-CNN model to detect pods (in addition to flowers) by fine-tuning it with 2693 annotated pod images (Table 1; see PDF). We used the Faster R-CNN implementation available in Detectron2 (a library containing state-of-the-art detection and segmentation algorithms made publicly available by Facebook AI Research).
Figure 10 (see attached PDF) shows predicted bounding boxes alongside the ground truth annotations and the original images.
Flowers/Pods per whole plant images
To better estimate the overall prediction capability of the flower/pod detection model, we evaluated it by comparing the number of detected flowers/pods with the number of ground truth flowers/pods per whole plant image. (Note that this is different from the number of flowers/pods per plant, as some flowers/pods may not be visible in a particular image, depending on the angle of the image.) More specifically, we mapped the coordinates of the flowers/pods in each individual node image to coordinates in the whole plant image. This allows us to avoid duplicate detections. Some examples of predictions for each node in a plant image are shown in Figures 11 and 12 (see attached PDF).
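
The coordinate mapping described above can be sketched as follows. This is an illustrative assumption about how duplicates could be removed, not the team's actual procedure: each node image is treated as a crop with a known top-left origin in the whole plant image, and a detection is dropped when its center falls inside a box that was already kept. All names and numbers are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Point = Tuple[float, float]


def to_plant_coords(box: Box, crop_origin: Point) -> Box:
    """Shift a box from node-image coordinates to whole-plant-image coordinates."""
    ox, oy = crop_origin
    return (box[0] + ox, box[1] + oy, box[2] + ox, box[3] + oy)


def center(box: Box) -> Point:
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def contains(box: Box, point: Point) -> bool:
    return box[0] <= point[0] <= box[2] and box[1] <= point[1] <= box[3]


def merge_detections(per_crop: List[Tuple[Point, List[Box]]]) -> List[Box]:
    """Collect detections from overlapping node crops, skipping duplicates."""
    kept: List[Box] = []
    for origin, boxes in per_crop:
        for box in boxes:
            mapped = to_plant_coords(box, origin)
            if not any(contains(k, center(mapped)) for k in kept):
                kept.append(mapped)
    return kept


# Two overlapping node crops that both see the same flower (invented values).
crops = [
    ((100, 200), [(10, 10, 30, 30)]),
    ((90, 190), [(22, 21, 41, 39), (60, 60, 80, 80)]),
]
unique = merge_detections(crops)
print(len(unique), unique)
```

Counting the merged list, rather than summing per-node counts, gives the per-image total that is compared against the ground truth.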

Objective 3 - Discover environmentally stable and region-specific genomic regions controlling flower abortion in diverse soil types, moisture, and climatic conditions

Texas Tech University

Floral organ abscission is an important process that regulates the detachment of flowers from the stem. In the well-characterized model species Arabidopsis, floral organ abscission involves four steps: initiation of the abscission zone (AZ), promotion of the AZ by ethylene, activation of separation, and deposition of a protective layer where organs have detached from the plant. We have shortlisted 6 genes in Arabidopsis (Blade on Petiole (BOP), KNAT (KNOX genes), BREVIPEDICELLUS 1 (BP1), INFLORESCENCE DEFICIENT IN ABSCISSION (IDA), HAE/HSL (leucine-rich repeat receptor-like kinase), and DNA BINDING WITH ONE FINGER 4.7 (DOF4.7)), which correspond to 27 orthologous genes in soybean involved in floral organ abscission. We also shortlisted further genes that have been reported to play a role in floral organ abscission in addition to their known function: ASYMMETRIC LEAVES1 (AS1), AGAMOUS-like 15 (AGL15), and FOREVER YOUNG FLOWER (FYF). The mutant alleles of these genes have shown a significant effect on several stages of floral organ abscission. Lastly, the maturity loci E1-E4 play a significant role in the regulation of flowering in soybeans. The J locus, the ortholog of AtELF3 (EARLY FLOWERING 3), is under the influence of E1. Functional analysis of mutant alleles of these genes showed an early flowering phenotype. The haplotype analysis for these genes is currently in progress; so far it shows that some of the higher maturity group (MG) lines used in the current project (MG III, IV) retain one or more of the variant alleles. From this analysis, a group of lines carrying large-effect variants associated with flowering traits (floral initiation and flower abortion) will be selected to identify causal genomic regions and thereby the underlying genes.
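
A gene-based haplotype comparison of the kind described above could, in its simplest form, group lines by their alleles at the variant sites within one candidate gene and compare a phenotype across the groups. The sketch below is a hedged illustration only: the function name, line names, alleles, and phenotype values are invented placeholders, not project data, and a real analysis would also account for population structure and maturity effects.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def haplotype_means(genotypes: Dict[str, Tuple[str, ...]],
                    phenotype: Dict[str, float]) -> Dict[str, Tuple[int, float]]:
    """Group lines by haplotype at one gene; return (n_lines, mean phenotype)."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for line, alleles in genotypes.items():
        if line in phenotype:  # skip lines without a phenotype record
            groups["".join(alleles)].append(phenotype[line])
    return {hap: (len(vals), sum(vals) / len(vals)) for hap, vals in groups.items()}


# Placeholder alleles at two variant sites of a hypothetical candidate gene.
toy_genotypes = {
    "line_1": ("A", "T"),
    "line_2": ("A", "T"),
    "line_3": ("G", "T"),
    "line_4": ("G", "C"),
}
# Placeholder flower abortion percentages (invented for the example).
toy_abortion = {"line_1": 40.0, "line_2": 50.0, "line_3": 70.0, "line_4": 60.0}

summary = haplotype_means(toy_genotypes, toy_abortion)
print(summary)
```

Haplotypes whose group means differ strongly would be candidates for follow-up, provided that each group contains enough lines to support a comparison.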

Updated November 9, 2023:
Project report (Third quarter July 1, 2023, to September 30, 2023)

Progress achieved

Objective 1 - Explore the genetic diversity in flower abortion under different soil moisture and climatic conditions using a large diversity panel.

Texas Tech University
In early July, the soybean plants of all 228 lines were at the V2 developmental stage. In preparation for imaging and flower counting, several steps were taken. Labels were added to the plots, and measures were implemented for weed control. Additionally, cameras were mounted on the tractor for testing purposes (Figure 1). As the month progressed, the plants transitioned to the reproductive stage, which was delayed until around the 20th of July by stressful conditions: Lubbock had over 5 weeks of 100+ °F temperatures with no rain. Once the plants started to flower, manual flower counting and imaging began. Various aspects, including camera angles, lens types, and camera numbers, were systematically adjusted and tested on the tractor to determine the optimal position and speed for imaging. Some pictures were also taken to document the diversity of the genotypes.

To date, we have completed the 13th round of flower and pod counts for the 228 diverse genotypes. These counts were conducted every 4 to 5 days in conjunction with the imaging process. Most genotypes have now completed their flowering stage and have reached the R7 stage of development. As we approach the harvesting phase, a final round of imaging will be conducted on the dried plants to develop machine learning models for counting pods. Subsequently, each plot will be harvested manually (3 feet per genotype per rep). To ensure representative data, we will select the tagged plant used for flower counting, along with an additional four plants, for other yield-related parameters. From these five plants we will gather information on plant height, number of branches, internode size, pod count, seeds per pod, 1000-seed weight, seed size, and grain weight per plant. Data from the tagged plant will be used as ground truth for validating machine learning models for flower and pod numbers. Plants from the 3-foot row length will be used for yield determination. Lodging scores will be recorded at harvest. Figure 2 displays the current flower count progress at TTU, indicating the range in maximum flower counts across the 228 lines. Some of the very low numbers could be a result of rabbit damage.

In July 2023, Dr. Jagadish gave a radio interview on Dakota Farm Talk to highlight the project and the benefits that its progress will have for the US and global soybean industry.

On October 29, 2023, Dr. Espíndola will deliver an oral presentation titled "Advancing Phenotyping for Flower Abortion in Soybeans through Image Analysis and Machine Learning" at the 2023 annual meeting of ASA-CSSA-SSSA in St. Louis, Missouri.

University of Missouri
Since it was highly challenging to find enough people to manually count flowers on all 228 lines, all other participating locations selected a core set of 30 lines, based on genetic diversity, for manual flower counting. To date, we have completed the 7th round of flower counting for this core set of 30 lines in three replications (90 plots). These counts were conducted every 3 to 4 days in conjunction with the video imaging. Flower numbers are relatively consistent across the 3 replications of each genotype, and significant differences in flower number were observed among genotypes. All genotypes have finished flowering and have currently reached the R5 to R7 stage. As we approach the harvesting phase, a final round of imaging will be conducted on the dried plants for pod counting, together with manual pod counting. Subsequently, each plot of the 228 lines (684 plots) will be harvested manually (two center rows of 8 feet/row) to estimate seed yield. Seed harvest will start at the end of September, and the harvested seeds will be used for next year's planting at all the locations. There are about 10 lines that may not have enough seeds; these will be included in our winter nursery for seed increase.

University of Tennessee
At the University of Tennessee, soybean plants were imaged using GoPro Hero11 cameras mounted on a Traxxas Hoss® 4x4 VXL conveyor (Figure 3). The GoPro cameras were configured for FPS, image ratio, boost, high-quality video mode, white balance, camera angle, lens type, and positioning (10-12 inches distance). Field phenotyping was carried out throughout the flowering period, with flowers and pods manually counted separately every 4 to 5 days. All 690 experimental plots were labeled. Plants were identified and tagged in each plot to facilitate manual counting of flowers and pods and imaging, which was initiated on July 27, 2023. An overhead shot of the entire field, including all 228 genotypes, was taken using a UAS platform. Remarkable differences in foliage color were observed among the genotypes (Figure 4).

When most of the genotypes had completed flowering and reached the R7 developmental stage (i.e., beginning maturity), we recorded another set of images, this time of the soybean pods for the 30 core lines (a considerable number of lines were completely defoliated). For plots with plants reaching R8 (full maturity), harvesting has already been initiated (Figure 5). Plants were harvested manually from the two-row plots within 10.8 ft² (~1 m²) to calculate the final yield. To document yield components and other morphological parameters, the tagged plant used for manual counting of flowers and pods during the season was sampled along with 4 other plants in the same row. The harvested soybean plants were threshed using the USDA single plant thresher, and the seeds collected from each plot were placed in a labeled bag for quantifying yield. The growth stage of each soybean line was regularly monitored to estimate harvest time. Plant height, number of branches, number of pods, seed number per plant, 100-seed weight, total seed weight per plot (1 m²), and final yield will be determined. Lodging scores were recorded once in late August and will be recorded again for each soybean line at harvest.

We released a podcast on the UTIAg website (available on Spotify for Podcasters) about our soybean flower abortion project. The link is:
https://podcasters.spotify.com/pod/show/utiag/episodes/Culture--Agriculture-Ep--4-Research-Could-Improve-Soybean-Yield-e277htq/a-aa5c7ku

Furthermore, we worked with the UTIA communication team to create and release a video about our current research project. To see the video, click here: https://www.youtube.com/watch?v=H5CVeWbiliU

The video will be broadcast on WBBJ and Nashville TV channels in late September 2023.

Finally, we prepared a research abstract titled “Image-based field high throughput phenotyping for quantifying flower abortion in genetically diverse soybean germplasm” and submitted it to the 2023 ASA-CSSA-SSSA annual meeting. It will be presented in the CSSA section during the meeting in late October/early November 2023.

Objective 2 - Develop an image-based field phenotyping system and deep-learning tool to precisely document temporal dynamics in flower abortion and pod retention in genetically diverse soybeans.

Texas Tech University
In pursuit of Objective #2, Texas Tech has been working on dataset preparation for the flower detection model and implementing an algorithm for flower counting. Both are foundational to the successful development of our phenotyping system.
1. Dataset Preparation for Flower Detection Model Development:
1.1 Automated Video Frame Extraction
One of our initial tasks was to develop an automated pipeline to extract unique frames from videos captured at various locations. This pipeline streamlined the data collection process and ensured a consistent dataset for analysis.
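The report does not say how "unique" frames were identified, so the following is only a minimal sketch of one possible criterion: keep a frame when its mean absolute pixel difference from the last kept frame exceeds a threshold. The function names, the `min_diff` value, and the representation of a frame as a flat sequence of grayscale values are all assumptions made for illustration; in practice frames would come from a video decoder.

```python
from typing import Iterable, List, Optional, Sequence


def mean_abs_diff(a: Sequence[int], b: Sequence[int]) -> float:
    """Mean absolute pixel difference between two equally sized grayscale frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of pixels")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_unique_frames(
    frames: Iterable[Sequence[int]], min_diff: float = 10.0
) -> List[int]:
    """Return indices of frames that differ enough from the last kept frame.

    min_diff is a hypothetical threshold, not a value from the project.
    """
    kept: List[int] = []
    last: Optional[Sequence[int]] = None
    for idx, frame in enumerate(frames):
        if last is None or mean_abs_diff(frame, last) >= min_diff:
            kept.append(idx)
            last = frame
    return kept


# Toy 4-pixel "frames": frames 1 and 3 are near-duplicates of their predecessors.
toy_frames = [
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [50, 50, 50, 50],
    [52, 50, 50, 50],
    [120, 120, 120, 120],
]
print(select_unique_frames(toy_frames))  # [0, 2, 4]
```

Comparing against the last kept frame, rather than the immediately preceding one, avoids slow drift in which each frame looks similar to its neighbor while the scene has in fact changed.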
1.2 Dataset Compilation and Annotation
We compiled a new dataset consisting of 1314 images from four diverse locations, namely Missouri, Tennessee, Texas, and Kansas. These images were annotated for flower detection in collaboration with annotation teams from Texas Tech and Kansas State University. Before the annotated images can be used for model development, they need to be validated by a domain expert, which involves confirming, removing, or adding annotations to improve the dataset's quality and consistency.
To date, 1314 images (1037 from the previous dataset and 277 from the new dataset) have been validated by Dr. Espíndola, our domain expert and the post-doctoral fellow on the project, encompassing 9367 confirmed flower annotations.
Furthermore, an additional 1037 annotated images are currently undergoing validation. This expansion aims to increase dataset diversity and enable the development of models that generalize better.

2. Flower Counting Algorithm Implementation:
Accurate flower counting in captured videos is essential for Objective #2. To achieve this, we focused on tracking detected flowers across frames to prevent overcounting.
2.1 Implementation and Modification of Tracking Algorithms
We implemented and modified three state-of-the-art multi-object tracking algorithms: SORT, OC-SORT, and OC-SORT with Byte. These algorithms were selected for evaluation in the context of flower counting.
2.2 Annotation of Tracking Data
Evaluating these algorithms required annotating a series of consecutive frames in a video. This process was challenging and time-consuming, as it necessitated identifying and tracking flowers through frames, even in the presence of occlusions. We annotated 211 consecutive frames from a Kansas field video, encompassing a total of 35 flowers. This segment was chosen for its complexity, involving both long-term and short-term occlusions.
2.3 Algorithm Evaluation
We evaluated the three tracking algorithms with various parameter combinations. Surprisingly, our findings indicate that the choice of algorithm is not the critical factor for accurate tracking and counting of flowers. All three algorithms yielded accurate results when specific parameter settings were used.
To validate and consolidate our conclusions, further videos need to be annotated for tracking and subsequently used for evaluation.
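To illustrate why tracking prevents overcounting, the sketch below links detections across frames and counts distinct track identities instead of summing per-frame detections. It is a deliberately simplified greedy IoU tracker, not SORT or OC-SORT (it has no motion model and no Hungarian assignment). The function names and the `iou_thr` and `max_age` values are assumptions for illustration.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_by_tracking(
    frames: List[List[Box]], iou_thr: float = 0.3, max_age: int = 2
) -> int:
    """Count distinct objects by linking detections across consecutive frames."""
    tracks: Dict[int, Tuple[Box, int]] = {}  # id -> (last box, frames unmatched)
    next_id = 0
    for detections in frames:
        unmatched = list(range(len(detections)))
        matched_ids = set()
        # Greedy association: consider the highest-IoU pairs first.
        pairs = sorted(
            (
                (iou(box, detections[d]), tid, d)
                for tid, (box, _) in tracks.items()
                for d in unmatched
            ),
            reverse=True,
        )
        for score, tid, d in pairs:
            if score < iou_thr:
                break
            if tid in matched_ids or d not in unmatched:
                continue
            tracks[tid] = (detections[d], 0)
            matched_ids.add(tid)
            unmatched.remove(d)
        # Age unmatched tracks so short occlusions do not create new identities.
        for tid in list(tracks):
            if tid not in matched_ids:
                box, age = tracks[tid]
                if age + 1 > max_age:
                    del tracks[tid]
                else:
                    tracks[tid] = (box, age + 1)
        # Every detection left unmatched starts a new track.
        for d in unmatched:
            tracks[next_id] = (detections[d], 0)
            next_id += 1
    return next_id


# Toy sequence: flower A drifts right and is occluded in the third frame;
# flower B appears in the second frame and stays put.
toy = [
    [(0, 0, 10, 10)],
    [(1, 0, 11, 10), (50, 50, 60, 60)],
    [(50, 50, 60, 60)],
    [(2, 0, 12, 10), (50, 50, 60, 60)],
]
print(sum(len(f) for f in toy))           # 6 detections in total
print(count_by_tracking(toy))             # 2 distinct flowers
print(count_by_tracking(toy, max_age=0))  # 3: the occluded flower is recounted
```

The last line shows how a single parameter can change the count, which is in the spirit of the observation above that parameter settings mattered more than the choice of algorithm.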

Kansas State University

In this phase of the project, we made a significant shift in our approach to flower detection. Specifically, we switched from models for node detection followed by flower/pod detection to a single model that performs flower detection directly on full images or frames extracted from videos taken in the greenhouse. This shift was motivated by a preliminary exploration of the flower model on full greenhouse images, which showed that the model was capable of detecting flowers directly in those images. Furthermore, by training a flower detection model without the need for node detection, we aimed to streamline our workflow and improve efficiency.

To train an accurate model on full images, we labeled flowers in a set of 1200 images/frames based on guidelines from the TTU domain experts. The images that we labeled were taken by the K-State team in the greenhouse at the beginning of the flowering season and exhibited many buds and small flowers. We used 800 labeled images for training a new model, 300 images for development, and 100 images for testing. Some examples of images predicted by the model are shown below. As can be seen, the model can accurately detect flowers directly in the greenhouse images, without first detecting and extracting the nodes.
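A split of this kind (800 training, 300 development, 100 test) can be made reproducible with a seeded shuffle. The sketch below is illustrative only: the function name, the seed, and the file names are hypothetical, and the report does not state how the actual split was drawn.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(
    items: Sequence[str], n_train: int, n_dev: int, n_test: int, seed: int = 0
) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle once with a fixed seed, then cut into train/dev/test subsets."""
    if n_train + n_dev + n_test != len(items):
        raise ValueError("subset sizes must add up to the number of items")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]
    return train, dev, test


# Hypothetical file names standing in for the 1200 labeled greenhouse images.
images = [f"frame_{i:04d}.jpg" for i in range(1200)]
train, dev, test = split_dataset(images, 800, 300, 100)
print(len(train), len(dev), len(test))  # 800 300 100
```

When frames come from videos, neighboring frames can be near-duplicates, so splitting by video rather than by frame may give a more honest test estimate; whether that applies here is not stated in the report.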

While the model worked well on images and frames from videos taken in the greenhouse early in the flowering season, we encountered challenges when attempting to apply the model to field images (Figure 6). The model's performance suffered because the flowers presented significant differences in their characteristics (including color, shape and texture), as compared with the flowers in the greenhouse images. To account for such differences, we needed to enhance the labeled dataset by incorporating additional frames from videos taken at various flowering stages, which captured a large variety of flowers. A total of 1200 new frames sampled from videos from all four institutions were annotated and added to the original dataset. The original model was fine-tuned with the additional images and showed good performance overall in our testing as can be seen below. However, the performance on blurry frames and frames from videos taken at higher speed can still be improved.

As we are moving towards annotating pods in the next quarter of the project, and we may also need to annotate more flowers, we have also started to explore the use of large pre-trained foundation models, such as the recent Segment Anything model, to annotate images in a zero-shot setting with human-in-the-loop to improve its annotations. We have also worked on a script to identify differences between ground truth bounding boxes and predicted bounding boxes, with the goal of identifying mistakes in the human annotations as well as identifying challenging images that can help improve the robustness of the model.
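A comparison script of this kind could work roughly as sketched below: match ground-truth and predicted boxes by IoU, then list the ground-truth boxes with no matching prediction and the predictions with no matching ground truth. This is not the project's script; the function names, the greedy matching strategy, and the 0.5 threshold are assumptions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def compare_boxes(
    gt: List[Box], pred: List[Box], iou_thr: float = 0.5
) -> Tuple[List[Tuple[int, int]], List[int], List[int]]:
    """Return (matched index pairs, unmatched GT indices, unmatched prediction indices)."""
    candidates = sorted(
        ((iou(g, p), gi, pi) for gi, g in enumerate(gt) for pi, p in enumerate(pred)),
        reverse=True,
    )
    used_gt, used_pred = set(), set()
    matches: List[Tuple[int, int]] = []
    for score, gi, pi in candidates:
        if score < iou_thr:
            break
        if gi in used_gt or pi in used_pred:
            continue
        matches.append((gi, pi))
        used_gt.add(gi)
        used_pred.add(pi)
    missed = [gi for gi in range(len(gt)) if gi not in used_gt]
    extra = [pi for pi in range(len(pred)) if pi not in used_pred]
    return matches, missed, extra


# Toy example: one agreement, one missed ground-truth box, one extra prediction.
gt_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
pred_boxes = [(1, 1, 11, 11), (40, 40, 50, 50)]
print(compare_boxes(gt_boxes, pred_boxes))  # ([(0, 0)], [1], [1])
```

Unmatched predictions made with high confidence are candidates for flowers the annotator missed, while unmatched ground-truth boxes point to hard images or questionable labels; both lists would still need human review.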

University of Missouri
We used GoPro cameras to take 3 rounds of videos of the selected core set of 30 lines. Challenges included maintaining a steady walking speed with the camera, shade from soybean branches and leaves, and plant lodging. The group is designing a uniform imaging platform based on this year's field experience, aiming to standardize walking speed and avoid shade from leaves. Meanwhile, we are recording lodging scores and maturity dates for the whole set of 228 lines, which will be used to select upright genotypes (low lodging) with similar maturity dates for next year's field studies. Initial observations indicated that maturity group (MG) III lines showed significant lodging issues compared to the MG IV lines. We will discuss with the group focusing on the set of lines that had minimal lodging across locations. In this way, we can address the other 2 major issues, which are caused by plant lodging and differing peak flowering times.

Objective 3 - Discover environmentally stable and region-specific genomic regions controlling flower abortion in diverse soil types, moisture, and climatic conditions.

In our prior analysis, we selected six key genes with well-documented roles in the initiation of the abscission zone (AZ), the promotion of AZ development through ethylene signaling, the activation of tissue separation mechanisms, and the subsequent deposition of protective layers following organ detachment from the plant. We also incorporated genes known to be involved in soybean maturity and flowering. To perform gene-based clustering analysis and identify alleles with substantial effects, we used a gene-haplotype analysis framework run on the high-performance computing servers at TTU. In a preliminary study, we ran the gene-based haplotype analysis on a set of 481 lines to test our analytical pipeline using major flowering genes. This analysis identified four significant haploblocks, notably H1 and H4 (Figure 7), which showed pronounced allelic variation with substantial effects on the observed traits. While the haplotype analysis of additional genes is ongoing, our ultimate objective is to compare the lines that overlap with these haploblocks against field data, especially flower number and aborted flowers. Once data collection from all locations is complete, we will overlay the phenotypic data on the haplotype analysis to identify the most diverse accessions for further analysis.
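The idea of overlaying phenotypic data on haplotypes can be sketched as grouping lines by their haplotype at a gene and comparing a trait between groups. The example below is not the gene-haplotype framework used in the project; the function name, the line names, the allele strings, and the trait values are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def haplotype_means(
    haplotypes: Dict[str, str], phenotype: Dict[str, float], min_lines: int = 2
) -> Dict[str, float]:
    """Mean trait value per haplotype, skipping haplotypes with too few lines."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for line, hap in haplotypes.items():
        if line in phenotype:  # ignore lines without field data
            groups[hap].append(phenotype[line])
    return {hap: mean(vals) for hap, vals in groups.items() if len(vals) >= min_lines}


# Hypothetical data: haplotype = concatenated alleles at three SNPs in one gene,
# phenotype = flower abortion percentage per line.
line_haplotypes = {"L1": "AAG", "L2": "AAG", "L3": "TCG", "L4": "TCG", "L5": "TCA"}
abortion_pct = {"L1": 40.0, "L2": 50.0, "L3": 60.0, "L4": 70.0, "L5": 30.0}
print(haplotype_means(line_haplotypes, abortion_pct))  # {'AAG': 45.0, 'TCG': 65.0}
```

A real analysis would add a statistical test and account for population structure; the sketch only shows the grouping step.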

Updated December 17, 2023:
Project report (Fourth quarter October 1, 2023, to December 31, 2023)

Project funded by North Central Soybean Research Program

Project title - Field phenotyping using machine learning tools integrated with genetic mapping to address heat and drought induced flower abortion in soybean

Participating institutions – Texas Tech University, Kansas State University, University of Missouri, and University of Tennessee

Goals & Objectives

Long-term Goal – Develop soybean cultivars with 20 to 30% lower flower abortion under favorable to challenging environmental conditions, leading to about 10-15% increase in yield potential

Objectives (Year 1)

• Explore the genetic diversity in flower abortion under different soil moisture and climatic conditions using a large diversity panel

• Develop an image-based field phenotyping system and deep-learning tool to precisely document temporal dynamics in flower abortion and pod retention in genetically diverse soybeans

• Discover environmentally stable and region-specific genomic regions controlling flower abortion in diverse soil types, moisture, and climatic conditions

Progress achieved

Objective 1 - Explore the genetic diversity in flower abortion under different soil moisture and climatic conditions using a large diversity panel.

Note – All data presented on flower abortion are preliminary, as the teams have yet to thoroughly discuss the approach used to calculate and confirm the abortion percentages presented.

Texas Tech University
Soybean harvesting started on September 22nd and concluded on October 18th (Figure 1 – Please refer to the attached PDF for all figures and images). Figures 2 and 3 show the field data for flower and pod counts. Other measurements, including yield, pods per node, number of seeds per plant, and 1000-seed weight, are in progress and will be reported upon completion.
Figure 2 shows that the average flower abortion rate among the 30 genotypes is approximately 47%. Figure 3 presents the variation in abortion percentage across the 161 genotypes analyzed exclusively at Texas Tech University, which ranges from 20% to 80%. The temperatures and precipitation at the experimental farm during the trial are presented in Figure 4.
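For readers who want to relate flower and pod counts to an abortion percentage, the sketch below assumes abortion is the share of counted flowers that did not become pods. This definition is an assumption, since the teams have not yet agreed on the final approach; the function name and the sample counts are hypothetical and are not project data.

```python
def flower_abortion_pct(total_flowers: int, final_pods: int) -> float:
    """Percentage of counted flowers that did not develop into pods (assumed definition)."""
    if total_flowers <= 0:
        raise ValueError("total_flowers must be positive")
    if final_pods < 0:
        raise ValueError("final_pods must be non-negative")
    # Clamp at zero in case pods exceed counted flowers, e.g. when flowers
    # opened and dropped between two counting rounds and were never counted.
    aborted = max(total_flowers - final_pods, 0)
    return 100.0 * aborted / total_flowers


# Hypothetical counts for one tagged plant.
print(flower_abortion_pct(200, 106))  # 47.0
```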

University of Missouri
Using phylogenetic analysis, a core set of 30 lines representing the genetic diversity was selected for manual flower/pod counting and imaging. We counted flowers 9 times at intervals of 3 to 4 days during the flowering stage. We performed the final pod count in October to estimate the flower abortion rates of the selected 30 lines. Preliminary data on flower and pod counts and the related abortion percentages are presented in Figure 5. The average flower abortion rate at Missouri is approximately 50%, ranging between 37% and 62%. Videos of these 30 lines were taken at all 9 counting times during flowering and shared with the group to optimize the ML-based automatic flower counting platform.
Harvesting of the entire diverse panel of 228 lines from the field (center 2 rows) has been completed. The harvested plants will be threshed to estimate the yield of these lines. The yield data will be correlated with flower abortion rates. The temperatures and precipitation at the experimental farm during the trial are presented in Figure 6.

University of Tennessee
Harvesting of soybean plots at the University of Tennessee's West TN Res. and Edu. Center (WTREC) started on September 14th (Figure 7). Plants were harvested manually from the two-row plots within 1 m², and the total seed weight was recorded to calculate the yield in kg/acre (Figure 8). The tagged plant used for manual counting of flowers and pods, along with 4 other plants within the same row, was collected using a burlap fabric roll. Morphological characters such as plant height and number of branches, as well as yield components including number of pods per plant and number of seeds per plant, were recorded before threshing. All harvested plants were threshed using the USDA single plant thresher, and the seeds collected from each plot were weighed to determine yield. All 90 plots were harvested and data collection is near completion. Figure 9 shows the rate of flower abortion of the soybean genotypes. The abortion rate among the 30 genotypes was up to 29%. The temperature and precipitation at the experimental farm during the trial are presented in Figure 10.
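The conversion from seed weight on a 1 m² harvest area to kg/acre is simple arithmetic, sketched below. The function name and the sample seed weight are hypothetical; the conversion constants are the standard exact values.

```python
SQ_M_PER_ACRE = 4046.8564224  # exact, from the international foot definition
SQ_M_PER_SQ_FT = 0.09290304   # exact


def yield_kg_per_acre(seed_weight_g: float, harvested_area_m2: float = 1.0) -> float:
    """Scale seed weight from a small harvested area to kilograms per acre."""
    if harvested_area_m2 <= 0:
        raise ValueError("harvested_area_m2 must be positive")
    kg_per_m2 = (seed_weight_g / 1000.0) / harvested_area_m2
    return kg_per_m2 * SQ_M_PER_ACRE


# Hypothetical plot: 250 g of seed from 1 square meter.
print(round(yield_kg_per_acre(250.0), 1))  # 1011.7
# The 10.8 square-foot harvest area quoted for the plots is close to 1 square meter.
print(round(10.8 * SQ_M_PER_SQ_FT, 3))     # 1.003
```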

Kansas State University
Flower and pod counts were completed this quarter on the core set of 30 genotypes. At the end of the growing season, single plants used for flower and pod counts from this core set were harvested. Number of pods, nodes, and seeds per pod, along with total seeds, 100 seed weight and total seed weight were determined for each of the single plants. Those counts have been completed. The data is now being evaluated for quality prior to analysis. Based on a preliminary analysis of the flower and pod counts, we observed about a 40% difference in the relative flower abortion in the core set, with abortion ranging between 22% and 70%. This needs to be confirmed and compared to the results at the other locations. Seed yield, plant maturity, lodging and height were taken on the entire panel during September, October and early November. At harvest, these plots were threshed with a stationary thresher. The harvested seed is now being cleaned and weighed to measure final seed yield. Videos taken weekly of the developing plants are now being inventoried and labeled to evaluate the relationship between the flower counts in the field throughout the season, and the detection of the flowers in the videos.

Objective 2 - Develop an image-based field phenotyping system and deep-learning tool to precisely document temporal dynamics in flower abortion and pod retention in genetically diverse soybeans.

Texas Tech University
In this quarter, our focus has been on advancing the development of a customized Multi-Object Tracking (MOT) algorithm specifically tailored for counting soybean flowers in the field, aligning with the overarching objective of creating an image-based field phenotyping system.

1. Tracking Dataset Preparation, Continued

As mentioned in the previous report, preparing a dataset for the evaluation of MOT algorithms is quite challenging and time-consuming. This intricate task involves identifying and tracking individual flowers across an extensive sequence of consecutive frames, a laborious process exacerbated by the presence of long-term occlusions. This endeavor requires precision and attention to detail, as each flower needs to be tracked separately.
Our tracking dataset has expanded from one video (211 frames) to five videos (1,382 frames), comprising 22,606 individual flower annotations. These videos, hailing from diverse locations such as Kansas, Tennessee, Missouri, and Texas, enrich our dataset with varied environmental conditions and soybean varieties. This growth enhances the representativeness and applicability of our dataset for robust algorithm evaluation and refinement.
Well-constructed MOT datasets are inherently scarce given the complexities involved, and a MOT dataset tailored specifically to soybean flower counting is rarer still. Even in its current state, we consider our dataset highly valuable for this reason.
We will continue to expand the dataset. Adding more diverse videos is key to improving the accuracy of our algorithm evaluations, and this effort will continue in the upcoming phases of our research.

2. Tracking and Counting Evaluation Method
We've delved into diverse evaluation methods for our tracking and counting algorithms. Specifically, we've implemented two approaches:

1. A dedicated approach for assessing the accuracy of counting flowers.
2. An evaluation method gauging the quality of flower tracking across frames. This method incorporates various metrics, with a particular emphasis on tracking accuracy. It is inspired by the widely recognized HOTA paper by Luiten et al. (2020) and is instrumental in refining our counting performance.
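The report does not specify which metric the first approach uses to assess counting accuracy. As one plausible option, the sketch below computes the mean absolute error and mean absolute percentage error between manual and predicted flower counts per video. The function name and the sample counts are hypothetical, and this is not an implementation of HOTA.

```python
from typing import Sequence, Tuple


def counting_errors(
    true_counts: Sequence[int], predicted_counts: Sequence[int]
) -> Tuple[float, float]:
    """Return (mean absolute error, mean absolute percentage error) over videos."""
    if len(true_counts) != len(predicted_counts) or not true_counts:
        raise ValueError("need two non-empty sequences of equal length")
    if any(t <= 0 for t in true_counts):
        raise ValueError("true counts must be positive to compute percentages")
    abs_errors = [abs(t - p) for t, p in zip(true_counts, predicted_counts)]
    mae = sum(abs_errors) / len(abs_errors)
    mape = 100.0 * sum(e / t for e, t in zip(abs_errors, true_counts)) / len(abs_errors)
    return mae, mape


# Hypothetical manual vs. tracked counts for two videos.
mae, mape = counting_errors([35, 20], [33, 22])
print(mae, round(mape, 2))  # 2.0 7.86
```

A count-level metric like this can look good even when individual flowers are mismatched, with misses and double counts cancelling out, which is one reason to pair it with a tracking-quality metric as described in the second approach.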

3. Tracking for Counting Algorithm Development
Expanding on our algorithm development, we have incorporated two additional state-of-the-art tracking algorithms, ByteTrack and DeepSORT, alongside our previous implementations of SORT, OC-SORT, and OC-SORT with Byte. DeepSORT is particularly noteworthy because it introduces deep learning, in the form of a specialized neural network, into tracking. Although this work is still exploratory, our preliminary findings indicate an unexpected trend: the integration of deep learning appears to adversely affect the performance of our tracking algorithm. These initial results carry significant implications and shed light on the current state of tracking algorithms. We will continue to rigorously evaluate and investigate these outcomes, which will guide our next steps in algorithm refinement and optimization.
In summary, our investigation into tracking algorithms for counting soybean flowers is groundbreaking. While previous studies focus on detecting soybean flowers from still images, our work stands out by offering a novel solution to use detection algorithms for large-scale flower counting. This unique contribution addresses a gap in existing research, providing a practical approach to advance soybean phenotyping.

Kansas State University
Recognizing the intricate shape of soybean pods, we have opted for an instance segmentation method as opposed to bounding box object detection, for the task of identifying and counting the pods. The instance segmentation approach enables us to obtain precise segmentation masks of the pods, ensuring a more accurate representation of their complex structures.
Addressing the challenge of limited labeled data, we have devised a strategy of utilizing images extracted from field videos captured at various stages of the soybean growing phase. These images are subsequently annotated using AnyLabeling, a powerful tool driven by the Segment Anything Model (SAM) developed by Meta. This innovative tool allows us to generate precise masks by leveraging weakly supervised prompts, mitigating the need for laborious labeling of segmentation masks.
Through the utilization of AnyLabeling, we have successfully annotated approximately 300 frames of images extracted from videos recorded across all four locations. This annotation process not only facilitates the accurate identification of soybean pods, but also contributes to the enrichment of our dataset with diverse and representative samples. The incorporation of weakly supervised prompts, coupled with the efficiency of SAM, empowers our annotation process, ensuring that the generated masks accurately delineate soybean pods in their varying stages of development. This meticulous annotation approach enhances the robustness and reliability of our dataset, laying the foundation for more accurate and comprehensive analyses of soybean phenotypic traits.
The 300 annotated frames were split into three subsets, used for model training (Train), model development/hyper-parameter tuning (Valid), and model evaluation (Test), as shown below.
The performance of the current trained model is shown below in terms of average precision (AP), average precision at 50% IoU (AP50), and average precision at 75% IoU (AP75). Some samples of annotated frames are shown below (Figure 11), together with the original un-annotated frames.
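To make the AP50 and AP75 terms concrete, the sketch below computes average precision for a single image at one mask-IoU threshold. It is a simplified illustration, not the evaluation code behind the reported numbers: it uses all-point interpolation on one image, whereas COCO-style evaluation pools predictions over a whole dataset, samples the precision-recall curve at fixed recall levels, and averages AP over a range of IoU thresholds. The masks, scores, and function names are made up.

```python
from typing import List, Set, Tuple

Mask = Set[Tuple[int, int]]  # set of (row, col) pixels belonging to one instance


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two pixel masks."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def average_precision(
    gt_masks: List[Mask], preds: List[Tuple[float, Mask]], iou_thr: float
) -> float:
    """AP at one IoU threshold for one image; preds are (confidence, mask) pairs."""
    if not gt_masks:
        return 0.0
    matched = set()
    hits = []  # 1 = true positive, 0 = false positive, in confidence order
    for _, mask in sorted(preds, key=lambda p: p[0], reverse=True):
        best_iou, best_gt = 0.0, None
        for g, gt in enumerate(gt_masks):
            if g in matched:
                continue
            score = mask_iou(mask, gt)
            if score > best_iou:
                best_iou, best_gt = score, g
        if best_gt is not None and best_iou >= iou_thr:
            matched.add(best_gt)
            hits.append(1)
        else:
            hits.append(0)
    precisions, recalls, tp = [], [], 0
    for rank, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / rank)
        recalls.append(tp / len(gt_masks))
    # Make precision non-increasing (precision envelope), then integrate over recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def block(r0: int, c0: int, rows: int, cols: int) -> Mask:
    """Rectangular toy mask."""
    return {(r, c) for r in range(r0, r0 + rows) for c in range(c0, c0 + cols)}


# Two toy pods: the first prediction overlaps its pod with IoU 0.75, the second with 0.5.
gt = [block(0, 0, 4, 4), block(10, 10, 4, 4)]
predictions = [(0.9, block(0, 0, 4, 3)), (0.8, block(10, 10, 4, 2))]
print(average_precision(gt, predictions, 0.50))  # 1.0
print(average_precision(gt, predictions, 0.75))  # 0.5
```

The drop from 1.0 to 0.5 shows why AP75 is the stricter measure: the second mask covers only half of its pod, which passes at 50% IoU but fails at 75%.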

In addition to pod segmentation, we have also worked on tracking the soybean pods. We are currently using multi-object trackers and fine-tuning them so that they can be used on field-level videos (Figure 12).

Objective 3 - Discover environmentally stable and region-specific genomic regions controlling flower abortion in diverse soil types, moisture, and climatic conditions
Previously, we selected key soybean homologs involved in flower abortion, including genes for initiation of the abscission zone (AZ) and promotion of AZ by ethylene, and performed haplotype analysis. We identified two major and two minor haplotypes for one of the transcription factors (GmRNI) involved in floral organ abscission (Figure 13). The major haplotype carries two alleles and showed higher allelic diversity in wild accessions (Figure 13B). Most interestingly, this gene is expressed during the R1 flower stage in multiple soybean accessions, suggesting a critical role in flower development and probably in floral abscission (Figure 14). We are currently performing additional analysis to identify allelic variants in a subset of accessions selected from Year 1.

Final Project Results

Updated December 21, 2023:
Project Annual Report (Jan 1, 2023, to December 31, 2023)
Project funded by North Central Soybean Research Program

Project title - Field phenotyping using machine learning tools integrated with genetic mapping to address heat and drought induced flower abortion in soybean.

Summary
Major Accomplishments (Jan 1, 2023 to Dec 31, 2023)

• Flower abortion was rated in a diverse panel of soybean lines grown under field conditions in four different soil, climatic, management and environmental conditions.
• Successfully transitioned the imaging approach from greenhouse to field conditions and from static images to videos, both of which are unique and novel in the soybean research domain
• A variety of platforms were developed for imaging in the field to determine the best approach for phenotyping flower and pod counts
• Developed and trained effective deep-learning (machine learning) models for temporal and spatial tracking of soybean flowers and pods in plants grown under field conditions
• Shortlisted key genes, conducted gene-based haplotype analysis, and identified significant haploblocks to help understand the genetic factors influencing flower abortion in soybeans.

Challenges

• Manual flower counting at multiple time points (every 4 days) on a large diversity panel was limited by the need for a large workforce at each location
• Detecting and counting flowers was challenging due to the small flower size and the soybean plant architecture, further complicated by different levels of lodging across locations
• The imaging platform required customization based on specific location conditions and available resources.
• Image storage is a limitation, with most videos housed at their respective locations. In Year 2, we hope to address this for long-term storage of videos.
• Training the model for accurate flower and pod counting throughout the flowering period while handling different levels of occlusion is another hurdle to discerning precise flower abortion patterns.
• The lack of contrasting lines with differential levels of flower abortion under field conditions has limited our capacity to identify key transcripts and nodal genes that control flower abortion in soybeans

Participating institutions – Texas Tech University, Kansas State University, University of Missouri, and University of Tennessee.

Goals & Objectives

Long-term Goal – Develop soybean cultivars with 20 to 30% lower flower abortion under favorable to challenging environmental conditions, leading to about 10-15% increase in yield potential.

Objectives (Year 1)
1. Explore the genetic diversity in flower abortion under different soil moisture and climatic conditions using a large diversity panel.
2. Develop an image-based field phenotyping system and deep-learning tool to precisely document temporal dynamics in flower abortion and pod retention in genetically diverse soybeans.
3. Discover environmentally stable and region-specific genomic regions controlling flower abortion in diverse soil types, moisture, and climatic conditions.

Note – For all figures and images please refer to the attached PDF file

Objective 1 - Explore the genetic diversity in flower abortion under different soil moisture and climatic conditions using a large diversity panel.

A diverse set of 350 soybean lines was seed-increased in a winter nursery in Costa Rica, with 310 lines from the USDA soybean germplasm collection showing ideal germination and plant stand in the seed multiplication field (Figure 1). Emphasizing genetic diversity within maturity groups III and IV, 228 lines were selected for distribution across four experimental locations: Texas Tech University, University of Missouri, Kansas State University, and University of Tennessee.

Each experimental site covered approximately three acres. Texas Tech University used sub-surface drip irrigation (SDI), while the remaining locations operated under rain-fed conditions. Planting dates varied: Texas Tech University planted the crop on June 16th, University of Missouri on May 24th, Kansas State University on May 25th, and University of Tennessee on June 7th. Soybean crop management followed location-specific recommendations. At Texas Tech University, cameras (GoPro Hero11) mounted on a tractor (Figure 2A) were used to assess image quality from different angles. As soybean plants entered the reproductive stage, manual flower counting and imaging were performed on the 228 lines. Camera angles, lens types, and the number of cameras on the tractor were systematically adjusted at that time to optimize imaging. Approaching harvest, a final round of imaging on dried plants was conducted, focusing on pods, to refine the machine learning model for pod counting. Subsequently, 3 feet per genotype per rep were harvested manually for plot yield data. To ensure representative data, the tagged plant used for flower counting, along with three additional plants per plot, was selected for other yield-related parameters.

The University of Missouri, Kansas State University, and the University of Tennessee encountered challenges in securing an adequate workforce for temporal flower counting across the 228 lines. Consequently, a core set of 30 lines was selected based on genetic diversity for manual flower counting and imaging. The manual flower counting protocol aligned with the procedures established at Texas Tech and was consistently applied across all locations.

At the University of Missouri, imaging procedures involved the use of a bicycle, and adjustments were made as necessary to enhance imaging quality. The seed harvest began in September, with the harvested seeds designated for next year's planting. Sufficient seed has been produced at the University of Missouri to support Year 2 trials across all four locations.

Kansas State University utilized a modified spray vehicle (Figure 2B) with adjusted wheel spacing to straddle 10' wide plots, serving as the imaging system's mounting platform. This modification ensured optimal coverage and accessibility for capturing high-quality images of soybean plants.

The University of Tennessee employed GoPro Hero11 cameras on a Traxxas Hoss® 4x4 VXL (Figure 2C) conveyor for soybean plant imaging.

Towards the end of the season, soybean harvesting at Texas Tech University took place from September 22nd to October 18th, revealing an average flower abortion rate of approximately 47% among 30 genotypes (Figure 4). Measurements of yield per plot and on an area basis, pods per node, number of seeds per plant, 1000-seed weight, and plant height are still in progress. Flower abortion across 161 of the initial 228 genotypes was recorded only at Texas Tech University and ranged from 20% to 80%, as illustrated in Figure 5. These findings corroborate the existing literature regarding the variability in abortion rates observed in soybean plants.

At the University of Missouri, the average flower abortion rate was approximately 50%, ranging between 37% and 62% (Figure 6). Harvesting of the diverse soybean lines has been concluded, and seed threshing to estimate yield is in progress.
The University of Tennessee harvested soybean plots starting on September 14th. The average flower abortion rate at Tennessee is approximately 17%, ranging up to 29%, as shown in Figure 7.

Kansas State University completed flower and pod counts on a core set of 30 genotypes. The average flower abortion rate was approximately 40%, ranging between 22% and 70%. Data on plant maturity, lodging, height, and seed yield are being evaluated. Threshing and cleaning of harvested seed, along with video analysis, are ongoing to assess the relationship between field flower counts and video-tracked flower counts.
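The abortion rates reported for each location can be illustrated with a short calculation. The sketch below assumes that the abortion rate is the percentage of counted flowers that did not develop into pods; the report does not state its exact formula, and the function name and the counts are hypothetical, not project data.

```python
def flower_abortion_rate(flowers: int, pods: int) -> float:
    """Percent of counted flowers that did not become pods (assumed definition)."""
    if flowers <= 0:
        raise ValueError("flowers must be positive")
    if not 0 <= pods <= flowers:
        raise ValueError("pods must be between 0 and flowers")
    return 100.0 * (flowers - pods) / flowers


# Hypothetical (flowers, pods) counts for three plants of one line.
counts = [(120, 66), (95, 48), (110, 55)]
rates = [flower_abortion_rate(f, p) for f, p in counts]
mean_rate = sum(rates) / len(rates)
print(f"mean abortion rate: {mean_rate:.1f}%")
```

Averaging per-plant rates, as done here, weights every plant equally; pooling all flowers and pods before dividing is an equally plausible choice and gives a slightly different number.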

Objective 2 - Develop an image-based field phenotyping system and deep-learning tools to precisely document temporal dynamics in flower abortion and pod retention in genetically diverse soybeans.

Ahead of the field season, our team made extensive use of greenhouse-grown soybean plants to develop a robust machine learning tool (Figure 8A & B). This work targeted the primary objective: detecting flower numbers and the rate of abortion under field conditions.

Node-Detection Network: As an initial approach to detecting nodes, we have employed the Faster R-CNN architecture. We started by pre-training our model with a dataset provided by the studies in 2023 that focused on detecting nodes on Eggplant, Chili, and Tomato plants.

Inference with the existing model on this dataset indicated that it generalizes reasonably well, as it located most of the visible nodes in the new images. However, the model also produced several false positives and false negatives.

Flower detection: The same imaging protocol previously mentioned was used for the flower detection network. Like the node detection network, the flower detection network was also based on the Faster R-CNN architecture. Specifically, we used the Faster R-CNN implementation available in Detectron2 (a library containing state-of-the-art detection and segmentation algorithms made publicly available by Facebook AI Research). We trained an initial model based on a dataset published by Zhu et al. (2022).

Combining node detection and flower detection (Figure 9) showed great potential for counting soybean flowers; however, it required excessive image processing and storage. We therefore decided to use only flower detection in the field trials going forward, as the model was able to detect flowers effectively without node detection.

A 27-megapixel GoPro Hero 11 camera was evaluated because of its ease of use and image collection. A protocol for collecting high-quality images was developed based on the GoPro camera parameters and shared with all collaborators. The imaging system was tested in a greenhouse (Figure 8B), and its ability to capture and record high-quality images at 60 frames per second was verified. The captured images were then used as input to the flower detection model with successful results.

Improving the precision of the flower detection model.

We have fine-tuned the original Faster R-CNN flower detection model to improve its predictions. Specifically, the model was fine-tuned with a variety of images: some taken in more controlled environments and others resembling images taken in the field (Figure 10); some sharply focused and others somewhat blurred; and images taken with different imaging systems/cameras, producing different resolutions and quality. The Average Precision for detections whose bounding boxes overlap the ground-truth bounding boxes by at least 50% (intersection over union; denoted as AP50) was 79.53 on the test images.
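The 50% overlap criterion behind AP50 is normally measured as intersection over union (IoU). The following is a minimal sketch of that check for a single predicted box and a single ground-truth box; the function names and the corner-coordinate box format (x1, y1, x2, y2) are assumptions for illustration and are not taken from the project's code.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matches_at_50(pred_box, gt_box, threshold=0.5):
    """True if the prediction would count as a match under the AP50 criterion."""
    return iou(pred_box, gt_box) >= threshold


# Hypothetical flower boxes in pixel coordinates.
print(matches_at_50((0, 0, 10, 10), (2, 0, 12, 10)))
print(matches_at_50((0, 0, 10, 10), (5, 0, 15, 10)))
```

The full AP50 metric additionally ranks detections by confidence and integrates precision over recall; this sketch covers only the matching rule.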

Adding pods to the flower model

As a new approach, the Kansas State University team has extended the model with the ability to detect pods. We adapted the previous Faster R-CNN model to detect pods (in addition to flowers) by fine-tuning it with 2,693 annotated pod images. We used the Faster R-CNN implementation available in Detectron2.

The Texas Tech University team has been working on dataset preparation for the flower detection model and on implementing an algorithm for flower counting. Both are foundational to the successful development of our phenotyping system.

Our focus at Texas Tech University has been on developing a customized Multi-Object Tracking (MOT) algorithm for counting soybean flowers in the field, in line with the overarching objective of creating an image-based field phenotyping system. The tracking dataset has expanded to five videos (see Quarter4_Tracking_1.mp4) comprising 22,606 individual flower annotations, covering varied environmental conditions and soybean varieties. Algorithm development now incorporates two additional state-of-the-art tracking algorithms, ByteTrack and DeepSORT. Preliminary findings suggest that integrating deep learning may adversely affect tracking performance, which will guide the next steps in algorithm refinement and optimization.
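Counting flowers from a tracker's output can be sketched as counting distinct track identities across a video. The example below is only an illustration of that idea and not the team's MOT algorithm: it assumes a tracker (such as ByteTrack or DeepSORT) has already assigned an integer ID to each detection in every frame, and the function name, the `min_frames` filter, and the data are hypothetical.

```python
from collections import Counter


def count_tracked_flowers(frames, min_frames=3):
    """Count track IDs that appear in at least `min_frames` frames.

    `frames` is a list with one entry per video frame; each entry is the
    list of track IDs the tracker reported for that frame. Requiring a
    minimum track length discards short-lived, likely spurious tracks.
    """
    frames_seen = Counter()
    for track_ids in frames:
        # set() guards against an ID being reported twice in one frame.
        frames_seen.update(set(track_ids))
    return sum(1 for n in frames_seen.values() if n >= min_frames)


# Hypothetical tracker output for six frames.
frames = [[1, 2], [1, 2, 3], [1, 2], [2, 4], [2, 4], [4]]
print(count_tracked_flowers(frames))
```

Here track 3 appears in only one frame and is dropped, so three flowers are counted. The count is only as good as the tracker's identity assignment: an ID switch on one flower would inflate it.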

At Kansas State University, given the intricate shape of soybean pods, an instance segmentation method was used to identify and count pods, providing precise segmentation masks for accurate representation (Figure 11). To address the limited labeled data, images extracted from field videos were annotated using AnyLabeling, a tool driven by the Segment Anything Model (SAM). Approximately 300 frames were annotated and split into three subsets for model training, development, and evaluation. This annotation approach improves the robustness and reliability of the dataset for more accurate analyses of soybean phenotypic traits. In addition to pod segmentation, we have also worked on tracking the soybean pods. We are currently using multi-object trackers and fine-tuning them so that they can be used on field-level videos (Figure 12).
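The three-way split of annotated frames can be sketched as follows. The report does not give the split proportions, so the 70/15/15 ratio, the fixed seed, and the function name below are assumptions chosen for illustration.

```python
import random


def split_frames(frame_ids, train=0.70, dev=0.15, seed=0):
    """Shuffle frame IDs and split them into train, dev, and test lists.

    The test subset receives whatever remains after train and dev.
    """
    ids = list(frame_ids)
    random.Random(seed).shuffle(ids)  # seeded for a reproducible split
    n_train = round(len(ids) * train)
    n_dev = round(len(ids) * dev)
    return ids[:n_train], ids[n_train:n_train + n_dev], ids[n_train + n_dev:]


# Hypothetical IDs standing in for roughly 300 annotated frames.
train_ids, dev_ids, test_ids = split_frames(range(300))
print(len(train_ids), len(dev_ids), len(test_ids))
```

Because consecutive video frames are highly similar, splitting by video or by plot rather than by individual frame may give a more honest evaluation; that choice is not specified in the report.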

Objective 3 - Discover environmentally stable and region-specific genomic regions controlling flower abortion in diverse soil types, moisture, and climatic conditions.

We have shortlisted six genes in Arabidopsis (BLADE ON PETIOLE (BOP), KNAT (KNOX genes), BREVIPEDICELLUS 1 (BP1), INFLORESCENCE DEFICIENT IN ABSCISSION (IDA), HAE/HSL (leucine-rich repeat receptor-like kinases), and DNA BINDING WITH ONE FINGER 4.7 (DOF4.7)) that correspond to 27 orthologous genes in soybean involved in floral organ abscission. We also shortlisted genes reported to play a role in floral organ abscission in addition to their known functions: ASYMMETRIC LEAVES1 (AS1), AGAMOUS-like 15 (AGL15), and FOREVER YOUNG FLOWER (FYF). Mutant alleles of these genes have shown significant effects on several stages of floral organ abscission. Lastly, the maturity loci E1-E4 play a significant role in the regulation of flowering in soybeans. The J locus, an ortholog of AtELF3 (EARLY FLOWERING 3), is under the influence of E1. Functional analysis of mutant alleles for these genes showed an early flowering phenotype. Haplotype analysis for these genes is currently in progress; so far it shows that some of the higher maturity group (MG) lines used in the current project (MG III, IV) retain one or more of the variant alleles. From this analysis, a group of lines carrying large-effect variants associated with flowering traits (floral initiation and flower abortion) was selected to identify causal genomic regions and, thereby, the underlying genes.

In our prior analysis, we selected six key genes with well-documented roles in the initiation of the abscission zone (AZ), the promotion of AZ development through ethylene signaling, the activation of tissue separation, and the subsequent deposition of protective layers after organ detachment. We also included genes known to be involved in soybean maturity and flowering. To perform gene-based clustering and identify large-effect alleles, we ran a gene-haplotype analysis framework on the high-performance computing servers at TTU. In a preliminary study, we ran the gene-based haplotype analysis on 481 lines to test the analytical pipeline using major flowering genes. This analysis identified four significant haploblocks, of which H1 and H4 (Figure 13A) showed pronounced allelic variation with substantial effects on the observed traits. Haplotype analysis of additional genes is ongoing; the ultimate objective is to compare the lines that overlap with these haploblocks against field data, especially flower number and aborted flowers. After data collection from all locations, we will overlay the phenotypic data with the haplotype analysis to identify the most diverse accessions for further analysis.

Previously, we selected key soybean homologs involved in flower abortion, including those acting in the initiation of the abscission zone (AZ) and the promotion of AZ development by ethylene, and performed haplotype analysis. We identified two major and two minor haplotypes for one of the transcription factors (GmRNI) involved in floral organ abscission (Figure 13A). The major haplotype carries two alleles and showed higher allelic diversity in wild accessions (Figure 13B). Most interestingly, this gene is expressed during the R1 flowering stage in multiple soybean accessions, suggesting a critical role in flower development and probably in floral abscission (Figure 14). We are currently performing additional analysis to identify allelic variants in a subset of accessions selected from Year 1.
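Overlaying phenotypic data on haplotype assignments, as planned above, amounts to grouping lines by haplotype and comparing a trait across the groups. The sketch below shows that step in its simplest form; it is not the haplotype analysis framework used in the project, and the function name, line names, haplotype labels, and abortion values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_trait_by_haplotype(haplotypes, trait):
    """Average a trait over the lines carrying each haplotype.

    `haplotypes` maps line name -> haplotype label; `trait` maps line
    name -> measured value. Lines without a trait value are skipped.
    """
    groups = defaultdict(list)
    for line, hap in haplotypes.items():
        if line in trait:
            groups[hap].append(trait[line])
    return {hap: mean(values) for hap, values in groups.items()}


# Hypothetical haplotype calls and flower abortion rates (percent).
haplotypes = {"line_a": "H1", "line_b": "H1", "line_c": "H4",
              "line_d": "H4", "line_e": "H2"}
abortion = {"line_a": 30.0, "line_b": 40.0, "line_c": 60.0, "line_d": 70.0}
print(mean_trait_by_haplotype(haplotypes, abortion))
```

A real analysis would also test whether the group differences are statistically significant and account for population structure and location effects.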

Showcasing the Project and Presenting Results

Texas Tech University
In July 2023, Dr. Jagadish gave a radio interview on Dakota Farm Talk to highlight the project and the benefits that its progress will have for the US and global soybean industry.

On October 29th, 2023, Dr. Espíndola delivered an oral presentation titled "Advancing Phenotyping for Flower Abortion in Soybeans through Image Analysis and Machine Learning" at the 2023 annual meeting of ASA-CSSA-SSSA in St. Louis, Missouri.

University of Tennessee

The Tennessee team released a podcast on the UTIAg website (available on Spotify for Podcasters) about our soybean flower abortion project. Find the link here:
https://podcasters.spotify.com/pod/show/utiag/episodes/Culture--Agriculture-Ep--4-Research-Could-Improve-Soybean-Yield-e277htq/a-aa5c7ku

Furthermore, they worked with the UTIA communication team to create and release a video about our current research project. To see the video, click here: https://www.youtube.com/watch?v=H5CVeWbiliU

Finally, they prepared a research abstract titled “Image-based field high throughput phenotyping for quantifying flower abortion in genetically diverse soybean germplasm” and submitted it to the 2023 ASA-CSSA-SSSA annual meeting in St. Louis, Missouri.

In brief, Year 2 work plan
- Grow 50 diverse and elite lines in all four locations, in 4 row plots with 3 replications; irrigated and stress (50% irrigation) at Texas Tech University
- Obtain temporal images (once every 4 days) and manually count flowers from at least 4 plants per line
- Establish a uniform vehicle carrier for the imaging system for uniform video generation across locations
- Refine both the flower detection and the pod detection models and enhance the predictive capacity
- Identify candidate lines with increased flower and pod retention, both for each location and across locations
- Unravel molecular mechanisms (transcriptomics) that control flower abortion in contrasting lines identified from Year 1 field trials, under controlled greenhouse trials
- Identify key transcripts that determine flower abortion in soybeans


Major Accomplishments/Results from Year 1

• Flower abortion was rated in a diverse panel of soybean lines grown under field conditions in four different soil, climatic, management, and environmental conditions.
• Successfully transitioned the imaging approach from greenhouse to field conditions and from static images to videos, both of which are unique and novel in the soybean research domain
• A variety of platforms were developed for imaging in the field to determine the best approach for phenotyping flower and pod counts
• Developed and trained effective deep-learning (machine learning) models for temporal and spatial tracking of soybean flowers and pods in plants grown under field conditions
• Shortlisted key genes, conducted gene-based haplotype analysis, and identified significant haploblocks to help understand the genetic factors influencing flower abortion in soybeans.

Benefit To Soybean Farmers

Retaining even a proportion of the 30% to 80% of flowers aborted under well-watered and stressful conditions, respectively, will allow for a 10 to 20% increase in yield for soybean producers in the US. This advantage can be extended to different soil and water-availability conditions to support a wide range of soybean producers, and it is the major rationale for testing this hypothesis across four soybean-growing states with a focus on MG III to IV. The advantage proposed through this collaboration will allow soybean producers to gain additional economic return at the same level of investment, i.e., with the same seed cost, fertilizer level, and management. With a changing climate leading to higher temperatures and reduced water availability, the proportion of flower drop would increase proportionally, further lowering yield and producer profits. Hence, germplasm, breeding populations, novel QTL/genes, and CRISPR-edited lines developed with increased flower retention would help enhance yield potential under current climates and retain the advantage even under future warmer and drier environments.
