Artificial intelligence method developed for classifying raw sugarcane in the presence of solid impurity

An investigation of a major issue in biorefineries, solid impurity in raw sugarcane, is presented. This industrial sector requires a high-frequency, low-cost, and noninvasive method. The developed method uses the averaged color values of ten color-scale descriptors: R (red), G (green), B (blue), their relative colors (r, g, and b), H (hue), S (saturation), V (value), and L (luminosity), obtained from digital images of 146 solid mixtures of sugarcane stalks and solid impurity, namely vegetal parts (green and dry leaves) and soil. The solid mixtures were prepared to cover desirable and undesirable scenarios for the amount of solid impurity. The best result was obtained with an artificial neural network (ANN), which achieved 100% accurate classifications for two ranges of raw sugarcane content in the samples: from 90 to 100 wt% and from 41 to 87 wt%. With its low computational cost and simple image acquisition setup, the method is a promising tool for screening solid impurity in sugarcane shipments.


Introduction
Image and color information has played an important role in analytical chemistry and can help solve many issues, mainly because of its versatility and the availability of many low-cost devices for in loco or laboratory analysis (Capitán-Vallvey et al., 2007; Diniz, 2020; Pereira and Bueno, 2007; Pereira et al., 2011).
Our research group has developed analytical methods to evaluate raw sugarcane, a material that mills and biorefineries routinely monitor as a consignment for payment purposes, in order to support their manufacturing process. The quality of raw sugarcane influences the manufacturing process, directly compromising two essential commodities: sugar and ethanol (Andrade et al., 2018; Guedes and Pereira, 2018, 2019; Guedes et al., 2020; Romera et al., 2016).
Solid impurity in raw sugarcane is defined as the presence of plant parts (tops and green, brown, and dry leaves) and soil (Eggleston et al., 2010). Its amount depends on the harvesting process, that is, on whether the cane is harvested green or burnt. Specifically, green harvesting increases the quantity of solid impurity in raw sugarcane, as reported in technical notes and the scientific literature (Lisboa et al., 2018; Norris et al., 2015).
For instance, solid impurity in raw sugarcane can be classified with chemometric techniques, such as soft independent modeling of class analogy (SIMCA), partial least squares discriminant analysis (PLS-DA), and k-nearest neighbors (kNN), applied to digital images converted into color histograms (Guedes and Pereira, 2019). In that study, raw sugarcane content between 85 and 100 wt% was accurately classified, with areas under the receiver operating characteristic (ROC) curve, which relates sensitivity and specificity, of approximately 0.97 for PLS-DA and 1 for SIMCA and kNN. Although these results were promising, the average color values were also tested, without success.
In this sense, a faster computational method can be developed with another strategy: an artificial neural network (ANN) model combined with the averaged color values. The advantage of the averaged color values is that each of the ten color-scale descriptors is a single average summarizing a color interval originally described by 256 intensities/variables. The descriptors are R (red), G (green), B (blue), their relative colors (r, g, and b), H (hue), S (saturation), V (value), and L (luminosity). Working with ten variables instead of full histograms means less running time for computational tests.
The solid impurity in raw sugarcane was previously estimated with success using an ANN model for color image data, since the data were nonlinear in nature. The parameters computed for the ANN model were very promising: the relative errors were 3%, and the predicted values were highly correlated with the reference values, reaching 0.98 for the training set (Guedes et al., 2020).
The advantages of artificial neural network methods include accurate results, ease of implementation, low computational cost, speed in obtaining results, and the ability to learn from a set of examples and provide consistent responses to new data (Braga et al., 2000; Guedes et al., 2020; Santos et al., 2019). Therefore, the main goal of this work was to classify raw sugarcane in the presence of solid impurity using the ANN method, as the last part of a series of investigations dedicated to this critical issue for sugar mills and biorefineries.

Samples and image acquisition
One hundred forty-six solid mixtures of sugarcane stalks, vegetal plant parts, and soil were prepared for digital image acquisition, as shown in Fig. 1. Each mixture was placed in a paper tray (26.5 × 21.5 cm) inside a laboratory-made setup (Guedes and Pereira, 2019) composed of a black box and a Nikon digital camera (COOLPIX S3500, Tokyo, Japan) with 20.1-megapixel resolution. The images, with a size of 1600 × 1200 pixels (width × height) and a resolution of 300 × 300 dpi (dots per inch), were recorded with the tray in a horizontal position. The camera's focal distance was 10 mm, with a maximum aperture of 3.5, and the region of interest (ROI) corresponded to 100% of the original image. During image acquisition, the automatic adjustments of the camera software were intentionally disabled. Five images were acquired per sample, and the samples were shaken after each image recording to mimic the natural conditions of shipments. The images were loaded as color data using the 'imread' function in MATLAB R2020a (MathWorks, Natick, MA, USA). Afterward, they were converted into color histograms using the 'imhist' function in MATLAB. The average color values, comprising ten color-scale descriptors: R (red), G (green), B (blue), their relative colors (r, g, and b), H (hue), S (saturation), V (value), and L (luminosity), were calculated using laboratory-made MATLAB code available in the study of Camargo et al. (2018).
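As an illustration only, the averaging of the ten descriptors can be sketched in Python with the standard library. This is not the laboratory-made MATLAB code of Camargo et al. (2018). The function name `mean_color_descriptors` is hypothetical, and two definitions are assumptions: the relative colors are taken as each channel divided by R + G + B, and the luminosity L as the plain mean of the three channels.

```python
import colorsys

def mean_color_descriptors(pixels):
    """Average ten color descriptors over an iterable of 8-bit (R, G, B) pixels."""
    names = ["R", "G", "B", "r", "g", "b", "H", "S", "V", "L"]
    sums = dict.fromkeys(names, 0.0)
    count = 0
    for R, G, B in pixels:
        total = R + G + B
        # Relative colors: each channel as a fraction of R + G + B (assumed definition).
        if total:
            r, g, b = R / total, G / total, B / total
        else:
            r = g = b = 0.0  # pure black pixel; convention chosen for this sketch
        # colorsys expects floats in 0..1 and returns H, S, V in 0..1.
        H, S, V = colorsys.rgb_to_hsv(R / 255, G / 255, B / 255)
        # Luminosity taken as the plain channel mean (assumption).
        L = total / 3
        for name, value in zip(names, (R, G, B, r, g, b, H, S, V, L)):
            sums[name] += value
        count += 1
    if count == 0:
        raise ValueError("no pixels supplied")
    return {name: sums[name] / count for name in names}

# Two-pixel toy image: one pure red and one pure green pixel.
features = mean_color_descriptors([(255, 0, 0), (0, 255, 0)])
```

Each image is thereby reduced to a vector of ten numbers, which is what makes the subsequent neural model small and fast to train.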

Neural model
The neural models were developed using the scaled conjugate gradient backpropagation algorithm (trainscg). For this, MATLAB R2018a was used with the 'nnstart' tool available in the software, choosing the pattern recognition app, in which the ANN input layer (with ten neurons representing the ten inputs: R, G, B, H, S, V, r, g, b, and L), the number of intermediate layers, and an output layer with two neurons were set manually. The number of neurons in the intermediate layer was defined by trial and error to achieve the best classification of raw sugarcane content in the solid mixtures. The classification was based on the content of raw sugarcane in wt%, denoted as number 1 in Fig. 1, in the presence of different proportions of green and dry leaves, denoted as number 2, and soil, denoted as number 3. The following division was made: 90-100 wt% was designated as class 1, appropriate (given by binary code 1 0), representing 36 samples, while 41-87 wt% was class 2, inappropriate (given by binary code 0 1), representing 110 samples.
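The binary coding of the two classes and the structure of such a network (ten inputs, one intermediate layer, two outputs) can be sketched as follows. This is a minimal Python illustration, not the MATLAB model used in this work. It assumes a hyperbolic-tangent intermediate layer and a softmax output layer; the function names and the zero-valued weights are hypothetical, and the treatment of contents between 87 and 90 wt%, which do not occur in the sample set, is also an assumption.

```python
import math

def one_hot_class(cane_wt_percent):
    """Return [1, 0] for class 1 (appropriate) or [0, 1] for class 2 (inappropriate)."""
    # The 90 wt% threshold follows the 90-100 / 41-87 wt% division in the text.
    # Sending values between 87 and 90 wt% to class 2 is an assumption.
    return [1, 0] if cane_wt_percent >= 90 else [0, 1]

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: tanh intermediate layer, softmax output layer."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    scores = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(w_out, b_out)]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical untrained network: 10 inputs, 3 intermediate neurons, 2 outputs.
n_inputs, n_hidden = 10, 3
w_hidden = [[0.0] * n_inputs for _ in range(n_hidden)]
b_hidden = [0.0] * n_hidden
w_out = [[0.0] * n_hidden for _ in range(2)]
b_out = [0.0, 0.0]
probabilities = forward([0.5] * n_inputs, w_hidden, b_hidden, w_out, b_out)
```

The predicted class is the output neuron with the larger value, which is then compared with the binary code of the expected class.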
The 146 samples were randomly divided using the 'dividerand' function, available in MATLAB, into three sets: 70% (102 samples) for training; 15% (22 samples) for validation, used to verify that the network generalizes the information and to stop the training before overfitting occurs; and the remaining 15% (22 samples) for testing, as an independent check of the generalization of the neural model.
For the training set, all 28 samples of class 1 and all 74 samples of class 2 were correctly classified. For the validation set, no sample was misclassified. Finally, for the test set, all samples were classified as members of their classes. Therefore, all 146 samples were correctly classified (100% accuracy), as shown by the confusion matrices for the training, validation, and test sets and by the overall confusion matrix in Fig. 3. Table 1 shows the responses obtained and expected by the ANN for the test set, with five samples of class 1, appropriate (90-100 wt% of raw sugarcane), and 17 samples of class 2, inappropriate (41-87 wt% of raw sugarcane).
Two other ANN models were also investigated: (i) a model for classifying samples based on the maximum desirable content of vegetal impurity (8 wt%), with the range from 0 to 8 wt% designated as class 1 (binary code 1 0), representing 50 samples, and the range from 9 to 40 wt% as class 2 (binary code 0 1), comprising 96 samples; (ii) a model for classification based on the maximum tolerated content of soil as an impurity (3 wt%), with the range from 0 to 3 wt% designated as class 1 (binary code 1 0), with 40 samples, and the range from 4 to 20 wt% as class 2 (binary code 0 1), a total of 106 samples.
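The random division of the 146 samples into training, validation, and test sets can be sketched in Python as below. This is not a reimplementation of 'dividerand': the function name `split_indices`, the fixed seed, and the rounding rule (validation and test counts rounded, remainder assigned to training) are assumptions chosen so that the set sizes match those reported.

```python
import random

def split_indices(n_samples, val_ratio=0.15, test_ratio=0.15, seed=0):
    """Randomly split sample indices into training, validation, and test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_val = round(n_samples * val_ratio)
    n_test = round(n_samples * test_ratio)
    n_train = n_samples - n_val - n_test  # remainder goes to training
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test

train, val, test = split_indices(146)
print(len(train), len(val), len(test))  # prints "102 22 22"
```

Because the split is random, the number of class 1 and class 2 samples in each set varies from run to run unless the seed is fixed.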

Results and discussion
In the ANN model for classifying vegetal impurity content, the best result was obtained with 12 neurons in the intermediate layer. In the training set, the neural model misclassified two of the 35 samples of class 1 as class 2 and only one of the 67 samples of class 2; that is, the percentage of accurate classifications for the training set was 97.1%. For the validation set, the nine samples of class 1 and the 13 samples of class 2 were all accurately classified (100%). Finally, for the test set, one of the six samples of class 1 was misclassified as class 2, and one of the 16 samples of class 2 was misclassified; that is, the percentage of accurate classifications for the test set was 90.9%. Therefore, of all 146 samples, five were misclassified, representing an overall rate of 96.6%.
For classifying soil content as a solid impurity, the best result was obtained with 18 neurons in the intermediate layer. For the training and test sets, eight and three misclassifications were observed, respectively. Therefore, of the 146 samples, 11 were misclassified, representing a 92.5% overall rate of accurate classifications.
Thus, the best ANN result was obtained for raw sugarcane content, for which the lowest cross-entropy error was achieved (0.0062). In contrast, the best models for vegetal parts and soil contents resulted in cross-entropy errors of 0.0160 and 0.0482, respectively.
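The reported percentages follow directly from the misclassification counts. The short Python check below rebuilds the confusion matrices of the vegetal impurity model from the counts given in the text (rows are true classes, columns are predicted classes); the helper name `accuracy_percent` is hypothetical. For the soil model only the total number of misclassifications is used, since the per-class breakdown is not stated.

```python
def accuracy_percent(confusion):
    """Accuracy (%) from a square confusion matrix (rows: true, columns: predicted)."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return 100.0 * correct / total

# Vegetal impurity model, counts taken from the text.
training = [[33, 2], [1, 66]]    # 35 class 1 (2 wrong), 67 class 2 (1 wrong)
validation = [[9, 0], [0, 13]]   # no misclassifications
test_set = [[5, 1], [1, 15]]     # 6 class 1 (1 wrong), 16 class 2 (1 wrong)
overall = [[a + b + c for a, b, c in zip(r1, r2, r3)]
           for r1, r2, r3 in zip(training, validation, test_set)]

vegetal = [round(accuracy_percent(m), 1)
           for m in (training, validation, test_set, overall)]
# Soil model: 11 of 146 samples misclassified.
soil_overall = round(100.0 * (146 - 11) / 146, 1)
print(vegetal, soil_overall)  # prints "[97.1, 100.0, 90.9, 96.6] 92.5"
```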
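For reference, a cross-entropy error between binary-coded targets and network outputs can be computed as in the Python sketch below. The averaging over samples and the clipping constant `eps` are assumptions; MATLAB may normalize its cross-entropy performance differently, so this sketch is not expected to reproduce the values reported here. The example outputs are hypothetical.

```python
import math

def mean_cross_entropy(targets, outputs, eps=1e-12):
    """Mean over samples of -sum(t * ln(y)) for one-hot targets t and outputs y."""
    total = 0.0
    for t_row, y_row in zip(targets, outputs):
        # Clip outputs away from zero so the logarithm stays finite.
        total -= sum(t * math.log(max(y, eps)) for t, y in zip(t_row, y_row))
    return total / len(targets)

# Hypothetical outputs for one class 1 sample and one class 2 sample.
targets = [[1, 0], [0, 1]]
outputs = [[0.99, 0.01], [0.02, 0.98]]
error = mean_cross_entropy(targets, outputs)
```

The error approaches zero as the output of the correct class approaches 1, which is why a lower value indicates a more confident and accurate classifier.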

Conclusion
The ANN method combined with averaged color values from digital images achieved the lowest cross-entropy error and 100% accurate classifications for the content of raw sugarcane in the presence of two different types of solid impurity: vegetal plant parts and soil.
Additionally, running the ANN takes a few seconds, and the digital imaging system is easy to use and can be operated in any location. Thus, the method can be implemented in sugarcane mills as a screening method for raw sugarcane shipments containing solid impurity in the form of vegetal parts of the plant itself and soil.