The majority of modern artificial intelligence (AI) algorithms are developed by supplying labelled data.
These labels are created by hand (every time you click on one of the “Are you a robot? Identify the cars” images, you are supplying labels to AI models). The model receives the raw images as input and the labels as output, and learns which combination of its internal parameters best maps the input to the labelled output.
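This input-to-labelled-output mapping can be sketched with a minimal toy example. The sketch below is illustrative only: it uses synthetic two-dimensional "features" in place of images and a simple logistic-regression classifier in place of a deep network, but the principle of adjusting internal parameters to match human-supplied labels is the same.

```python
import numpy as np

# Toy illustration: "images" reduced to 2 synthetic features, labels supplied by a human.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))              # raw inputs (stand-in for image features)
y = (X[:, 0] + X[:, 1] > 0).astype(float)  # human-supplied labels (0 or 1)

w, b = np.zeros(2), 0.0                    # internal parameters the model adjusts
for _ in range(500):                       # gradient descent on the logistic loss
    p = 1 / (1 + np.exp(-(X @ w + b)))     # predicted probability per sample
    grad_w = X.T @ (p - y) / len(y)
    grad_b = np.mean(p - y)
    w -= 0.5 * grad_w
    b -= 0.5 * grad_b

accuracy = np.mean((p > 0.5) == y)
print(f"training accuracy: {accuracy:.2f}")
```

After training, the parameters `w` and `b` encode the mapping from input to label; a deep network does the same with many more parameters.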
While it is relatively straightforward to provide ground-truth labels for everyday objects, such as houses, cars or kitchenware, it is much more difficult to do so for invasive plants that look very much alike, and even more so from an aerial perspective, which is unnatural for most humans.
If advanced imagery-analysis tasks such as object detection or segmentation are to be used, the objective of identifying a plant is compounded by the added task of delineating it, i.e., supplying the spatial extent of each identified object through bounding boxes or polygons. Operators who are highly familiar with the target species are usually the best people to label weed imagery, provided they are instructed in the use of the chosen labelling software. Labelling is highly labour-intensive and therefore time-consuming, and this time needs to be accounted for in any weed detection project. Where possible, assembling a team of labellers for a particular project is the best way to ensure that large numbers of images can be labelled efficiently for each species.
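A polygon label always contains at least as much information as a bounding box, so segmentation labels can be reduced to detection labels automatically. The sketch below shows this reduction for one hypothetical polygon (the coordinates are invented for illustration):

```python
# A hypothetical polygon label for one plant, as (x, y) pixel coordinates.
polygon = [(120, 80), (180, 75), (200, 140), (150, 170), (110, 130)]

def polygon_to_bbox(points):
    """Reduce a polygon label to its axis-aligned bounding box (x_min, y_min, x_max, y_max)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)

print(polygon_to_bbox(polygon))  # (110, 75, 200, 170)
```

The reverse is not possible: a box cannot be turned back into a plant outline, which is one reason polygon labelling, though slower, is the more flexible investment.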
Fig. 1. RGB image showing labelling polygons of the target plant species (Bitou Bush)
Furthermore, as data acquisition and labelling are the most cumbersome and expensive components of the process, even more so when expert labellers need to be consulted, the labelling process requires clear definition and should be driven by end-user needs.
Data quality is the key bottleneck in machine learning model development (Geiger et al., 2021; Roberts et al., 2021), and care should be taken to define data accurately and reproducibly.
For example, if treatment of the plants is only possible at certain stages of their life cycle, such as once they have matured, it may be beneficial to identify them just before that stage. Likewise, if no identification by human labellers is possible during certain phenological stages, it may be beneficial to exclude these stages from labelling.
Furthermore, the parts that require identification need to be clearly defined: whether flowers, stems, and leaves are to be included or categorically excluded, and under which circumstances they should be included. These definitions also inform the choice of tools suitable for the task.
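Such definitions can be captured in a machine-readable labelling schema, so that every labeller applies the same rules. The sketch below is a hypothetical example; the part and stage names are placeholders, not a prescribed vocabulary:

```python
# Hypothetical labelling schema: which plant parts and growth stages are in scope.
LABEL_SCHEMA = {
    "species": "Bitou Bush",
    "include_parts": ["flowers", "leaves"],
    "exclude_parts": ["roots", "seeds"],
    "label_stages": ["small plant", "adult plant"],    # stages with distinct features
    "skip_stages": ["seed", "sprout", "dying plant"],  # no reliable features
}

def should_label(part: str, stage: str) -> bool:
    """Apply the schema: label only in-scope parts at identifiable stages."""
    return (part in LABEL_SCHEMA["include_parts"]
            and stage in LABEL_SCHEMA["label_stages"])

print(should_label("flowers", "adult plant"))  # True
print(should_label("roots", "adult plant"))    # False
```

Keeping such a schema under version control alongside the imagery makes the labelling rules reproducible, rather than living only in labellers' heads.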
A third consideration is what can actually be identified from the target viewpoint. While handheld images may show stems and size differences between plants, these cues may not be visible in top-down viewpoints. The two figures below illustrate this.
Fig. 2. Side view of a plant, showing differences in height
Fig. 3. Top-down view of a plant, with indistinguishable height differences
The following tables show examples of how these definitions could be laid out, highlighting potential false-positive cases and issues.
Table 1. Features of plants by growth stage, and potential false positives
| Stages | Bitou (target) | Hibbertia (false positive) | Myoporum (false positive) | Scaevola (false positive) |
|---|---|---|---|---|
| Seed | no distinct features | | | |
| Sprout | no distinct features | | | |
| Small plant | yellow flowers | yellow flowers | yellow flowers | yellow flowers |
| Adult plant | yellow flowers | yellow flowers | yellow flowers | yellow flowers |
| Dying plant | irrelevant | | | |
Table 1 above shows the distinct plant features, the life-cycle stages at which they should be identified, and potential false positives by phenological stage.
Table 2. Distinct plant features of targets and potential false-positive species, and their suitable identification periods
| Time | Bitou | Hibbertia | Myoporum | Scaevola |
|---|---|---|---|---|
| Winter | | | | |
| Spring | yellow flowers | yellow flowers | yellow flowers | yellow flowers |
| Summer | yellow flowers | | | |
| Autumn | | | | |
Table 2 above shows the distinct plant features and when they should be identified, sorted by time of year. Tables 1 and 2 could be used interchangeably, depending on the application scenario. Importantly, the time of detection requires definition.
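A table like Table 2 can directly drive a simple check of when the target is distinguishable from its look-alikes. The sketch below transcribes the "yellow flowers" entries from Table 2 and derives the seasons in which only the target flowers:

```python
# Flowering periods transcribed from the "yellow flowers" entries in Table 2.
flowering = {
    "Bitou":     {"spring", "summer"},
    "Hibbertia": {"spring"},
    "Myoporum":  {"spring"},
    "Scaevola":  {"spring"},
}

def distinguishing_seasons(target: str) -> set:
    """Seasons in which the target flowers but no look-alike species does."""
    others = set().union(*(s for sp, s in flowering.items() if sp != target))
    return flowering[target] - others

print(distinguishing_seasons("Bitou"))  # {'summer'}
```

Here the check suggests summer as the window in which yellow flowers unambiguously indicate the target, which is exactly the kind of detection-time definition the text calls for.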
Table 3. Key plant features of targets and potential false positives, by plant component
| Components | Bitou | Hibbertia | Myoporum | Scaevola |
|---|---|---|---|---|
| Roots | | | | |
| Stems | | | | |
| Leaves | | | | |
| Flowers | yellow | yellow | yellow | yellow |
| Seeds | | | | |
Table 3 above shows the distinct plant features by plant component. This is important for delineating labelling sections, as well as for highlighting potential false positives.
Clear labelling guidelines and definitions enable consistent labels, especially for subjects that are difficult to delineate and that change between phenological stages, such as weeds.
Since supervised deep learning is a statistical process in which a model approximates a desired outcome provided by human labellers in the form of data, these labels are key to robust model performance.
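Label consistency across a team can be spot-checked by having two labellers annotate the same plant and measuring how well their delineations overlap. A common overlap measure is intersection over union (IoU); the sketch below computes it for two hypothetical bounding boxes:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))   # overlap width
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))   # overlap height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# Two labellers' boxes for the same plant (hypothetical coordinates).
labeller_1 = (100, 100, 200, 200)
labeller_2 = (110, 110, 210, 210)
print(f"agreement IoU: {iou(labeller_1, labeller_2):.2f}")  # agreement IoU: 0.68
```

Low agreement between labellers is a signal that the labelling guidelines, not the labellers, need refinement.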
There is a variety of software that can be used for labelling; the choice largely depends on the type of computer vision task to be employed to develop the model. Table 4 below lists a selection of these and their pros and cons for use in weed detection.
Table 4. Examples of labelling software, pros and cons.
| Software | Pros | Cons |
|---|---|---|
Cloud computing providers also offer a range of image annotation tools as part of their AI frameworks, such as Amazon Web Services (AWS) Rekognition, Google Cloud Vision API, and Azure Machine Learning Annotations.