Estimating bulk density in leguminous grains with different traits using color parameters from digital images combined with artificial neural networks

Eclética Química Journal, vol. 45, n. 1, 2020, 11-17. ISSN: 1678-4618. DOI: 10.26850/1678-4618eqj.v45.1.2020.p11-17
ABSTRACT: Dry grains from leguminous species, such as soybeans (Glycine max L.), common beans (Phaseolus vulgaris L.), chickpeas (Cicer arietinum L.) and corn (Zea mays L.), are regularly consumed for human nutrition. These food products are good sources of carbohydrates, protein and dietary fiber, and they possess significant amounts of vitamins and minerals and a high energetic value. Estimation of the physicochemical properties of grains is challenging due to variations in shape, texture, and size and because the grain colors appear similar to the naked eye. In this work, an analytical method was developed based on digital images converted into ten color scale descriptors combined with a neural model to provide an accurate parameter for grain quality control with a low computational cost. The bulk densities of four types of grains, i.e., soybeans, beans, chickpeas and corn, were predicted using numerical data represented by the average values of color histograms of a ten color scale (red R, green G, blue B, hue H, saturation S, value V, relative RGB and luminosity L) from digital images combined with artificial neural networks (ANNs). The reference bulk densities were empirically measured. A very good correlation between the reference values and the values predicted by the ANN was achieved, and with a single ANN developed for the four grains, a correlation coefficient of 0.98 was observed for the test set. Moreover, the relative errors were between 0.01 and 5.6% for the test set. This paper thus shows that bulk density, as a quality parameter, can be estimated for four different dry grains (soybeans, common beans, chickpeas and corn) with a single model that uses the average values of color descriptors from digital images combined with an artificial neural network, at a low computational cost.
Bruna Gava Floriam, Fabíola Manhas Verbi Pereira, Érica Regina Filletti


Introduction
Soybeans (Glycine max L.), common beans (Phaseolus vulgaris L.), chickpeas (Cicer arietinum L.) and corn (Zea mays L.) are a part of most human diets, regardless of culture. These grains are leguminous species consumed as dry grains. They are remarkable sources of carbohydrates, protein and dietary fiber, and they possess significant amounts of vitamins and minerals and a high energetic value 1 .
The quality of food grains depends on several physicochemical parameters, including the bulk density (the mass of grains per unit of apparent volume). In a grain, the bulk density is more closely related to its shape than to its size 2 . Stored food materials can suffer from variations in bulk density according to the bin depth 3 . Thus, a model that can monitor this parameter would help to avoid losses in agri-food supply chains. For instance, in the study by Bart-Plange and Baryeh 4 , several laborious physicochemical methods, including the determination of bulk density, were applied to evaluate cocoa beans as the raw material for manufacturing chocolate and other food products.
Using the method presented here, the quality of four types of grains can be estimated using an accurate approach based on a relevant physicochemical parameter, i.e., the bulk density, which is related to the storage system, type of container and characteristics of the grains.
Color data from digital images are considered reliable sources of analytical information for many purposes, independent of the type of device, for example, scanners, cell phones or cameras 5 . The combination of digital images and an artificial neural network (ANN) can be adapted for applications such as the shape analysis of grains 6 and for variety identification 7 . However, in both cited studies, many complex steps were necessary to develop a predictive response model with good accuracy.
The advantage of this study is that additional information from a color histogram of a ten color scale 8,9 can be determined using a simple computational routine with fast calculations. In addition, by using the average color values instead of the entire color histogram, which includes 2560 colors 10,11 , the speed of the ANN calculations is improved, and accurate results are achieved. Therefore, ANNs are useful tools for this research because they require less computational effort than other numerical techniques.
ANNs are computational models consisting of simple processing units called neurons, which are inspired by the central nervous system of intelligent organisms that acquire knowledge through experimentation; ANNs can perform machine learning to predict parameters and recognize patterns 12,13 .
Initially, an ANN undergoes a learning phase in which some examples are presented to it during training, and it automatically extracts the necessary characteristics to represent the information learned by adjusting the synaptic weights of the neurons through an adequate learning algorithm. Then, these characteristics are used to generate answers to the problem studied. In other words, by providing input data to an ANN and reporting the desired output (response), the ANN can provide coherent results for new input data that are different from those used in training.
ANNs offer several advantages: they are easy to use and update; they tolerate data errors, because they can still respond in an acceptable way even if partially damaged; they have great freedom in adjusting the synaptic weights of the neurons owing to the bias, a special processing unit that allows the neural network to adapt better to the knowledge provided to it; and they provide precise responses at high speed [14][15][16] . The main advantage of ANNs is their ability to generalize, or learn from examples 17 ; that is, ANNs can generalize learned information to provide satisfactory results for cases not seen in training. Therefore, ANNs have been used in many fields, such as chemistry 18 , geology 19 , medicine 20 , neurocomputations 21 and biomedical engineering 22 , among others.
In this sense, after an ANN has been trained and tested, it can predict the output (desired response) of new input data in the domain covered by the training examples. For food analysis, data from digital images acquired using both a camera and a desktop scanner have been applied to predict the fermentation index of cocoa beans by ANN modeling 23 . An ANN was also combined with digital images and showed excellent potential for wheat varietal identification 24 using the morphometric characteristics of these grains for the classification of different varieties with 88% accuracy and individual varieties with 84% and 94% accuracy.
Our study shows a feasible analytical method based on digital images converted into ten color scale descriptors combined with an ANN to estimate the bulk density of leguminous grains such as soybeans, beans, chickpeas and corn.

Procurement of samples and instruments
Grains of soybeans, beans, chickpeas and corn were purchased locally. The grains were sorted to eliminate foreign material and damaged grains. For the tests, the grains were separated into small packets made of transparent plastic bags (10 × 6 cm), as shown in Fig. 1. According to the size of the grains, each plastic bag was filled with 100 soybeans, 100 beans, 50 chickpeas, or 100 kernels of corn, resulting in 56 packets of soybeans, 35 packets of beans, 30 packets of chickpeas and 50 packets of corn.
The content of each packet was weighed on an analytical balance with ±0.0001 g precision (FA-2104N, EQUIPAR, Curitiba, PR, Brazil), and the apparent volume was measured with graduated cylinders with a volume of (50.0±0.5) mL for soybeans, corn and beans and with a volume of (100±1) mL for chickpeas. Using the measured masses (m) and apparent volumes (V), the bulk densities were calculated as $\rho = m/V$. Then, each packet was digitalized using a conventional scanner (HP, LaserJet Pro 200 Color MFP M276nw, Brazil). The final size of the images was 550 × 1000 pixels with a resolution of 96 dots per inch (dpi).
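As a minimal illustration of the reference measurement, the sketch below computes the bulk density as mass divided by apparent volume (1 mL taken as 1 cm³) and propagates the instrument tolerances quoted above. The function names, the example mass and the assumption of independent uncertainties are illustrative choices, not part of the original procedure.

```python
import math


def bulk_density(mass_g, apparent_volume_ml):
    """Bulk density in g cm^-3 (1 mL = 1 cm^3)."""
    if apparent_volume_ml <= 0:
        raise ValueError("apparent volume must be positive")
    return mass_g / apparent_volume_ml


def bulk_density_uncertainty(mass_g, apparent_volume_ml, u_mass_g, u_volume_ml):
    """Propagated uncertainty of m/V, ASSUMING independent mass and volume errors."""
    rho = bulk_density(mass_g, apparent_volume_ml)
    relative = math.sqrt((u_mass_g / mass_g) ** 2
                         + (u_volume_ml / apparent_volume_ml) ** 2)
    return rho * relative


# Hypothetical packet: 37.5 g of grains occupying 50.0 mL.
rho = bulk_density(37.5, 50.0)
u_rho = bulk_density_uncertainty(37.5, 50.0, 0.0001, 0.5)
```

Under these assumptions the volume reading dominates the uncertainty, since the relative tolerance of the cylinder is several orders of magnitude larger than that of the balance.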

Image treatment
The digital images were processed and converted into the average values of ten color scale descriptors, red (R), green (G), blue (B), hue (H), saturation (S), value (V), relative red (Rr), relative green (Rg), relative blue (Rb) and luminosity (L), using the Matlab R2015b (The MathWorks, Natick, MA, USA) code available in the supplementary material of Camargo, Santos and Pereira 8 . Figure 2 shows an image of the data. Using this code, the average value of each color histogram, which includes 256 colors, is obtained; thus, each color descriptor is represented by a single value.
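The original processing used the Matlab code cited above, which is not reproduced here. The following Python sketch only illustrates the idea of reducing an image to ten averaged descriptors. The definitions of the relative colors (each channel divided by R + G + B) and of the luminosity (L = R + G + B), as well as the function name, are assumptions made for this example and may differ from the cited code.

```python
import colorsys

NAMES = ("R", "G", "B", "H", "S", "V", "Rr", "Rg", "Rb", "L")


def mean_color_descriptors(pixels):
    """Average ten color descriptors over an iterable of (R, G, B) pixels (0-255)."""
    sums = [0.0] * len(NAMES)
    count = 0
    for r, g, b in pixels:
        # Hue, saturation and value on a 0-1 scale from the standard library.
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        lum = r + g + b                      # ASSUMED luminosity definition
        if lum > 0:
            rel = (r / lum, g / lum, b / lum)  # ASSUMED relative colors
        else:
            rel = (0.0, 0.0, 0.0)            # black pixel: relative colors undefined
        for i, value in enumerate((r, g, b, h, s, v) + rel + (lum,)):
            sums[i] += value
        count += 1
    if count == 0:
        raise ValueError("no pixels given")
    return {name: acc / count for name, acc in zip(NAMES, sums)}


# Hypothetical two-pixel "image": one pure red and one pure blue pixel.
descriptors = mean_color_descriptors([(255, 0, 0), (0, 0, 255)])
```

Each packet image is thus summarized by a vector of ten numbers, which is what makes the ten-neuron input layer described later possible.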

ANN description
A feedforward neural network 25 was implemented in Matlab R2015b. Two learning algorithms were tested in the development of the ANN: error backpropagation and the Levenberg-Marquardt algorithm. For the test set, the Levenberg-Marquardt algorithm yielded a mean correlation coefficient of 0.88 and a root mean square error (RMSE) of 0.031, whereas the backpropagation algorithm yielded a correlation coefficient of 0.98 and an RMSE of 0.014. Therefore, the backpropagation algorithm provided the best results and is described briefly below 12,13 :
1) Set the initial parameters of the network, $w_{i,j}$ and $b_{i,j}$ (weights and bias), as random numbers.
2) From a training data set with pre-assigned input/output pairs, take the k-th pair $(x_k, y_k)$, calculate the output of the network for the same input, and form the new pair $(x_k, \hat{y}_k)$.
3) Calculate the error $e$ between the desired output $y_k$ and the network output $\hat{y}_k$.
4) Calculate the partial derivatives of the error $e$ with respect to the weights and bias.
5) Change the weights and bias according to the steepest descent strategy and a specified learning rate $\alpha$: $w_{i,j} \leftarrow w_{i,j} - \alpha \, \partial e / \partial w_{i,j}$ and $b_{i,j} \leftarrow b_{i,j} - \alpha \, \partial e / \partial b_{i,j}$.
6) Iterate steps 2 to 5 by successively modifying $w_{i,j}$ and $b_{i,j}$ until a defined number of learning cycles or a stopping criterion is reached.
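The six steps above can be sketched in plain Python for a network with one hidden layer. This is an illustrative sketch, not the Matlab implementation used in the study: the squared error $e = \frac{1}{2}(\hat{y}_k - y_k)^2$, the linear output neuron, the initialization range and all function names are assumptions made for the example.

```python
import math
import random


def init_network(n_in, n_hidden, rng):
    """Step 1: random initial weights and biases (range is an assumption)."""
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = rng.uniform(-0.5, 0.5)
    return w1, b1, w2, b2


def forward(net, x):
    """Step 2: network output for input x (tanh hidden layer, linear output)."""
    w1, b1, w2, b2 = net
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    y_hat = sum(w * h for w, h in zip(w2, hidden)) + b2
    return hidden, y_hat


def train_step(net, x, y, alpha):
    """Steps 3 to 5 for one (x, y) pair; returns the updated network and e."""
    w1, b1, w2, b2 = net
    hidden, y_hat = forward(net, x)
    delta_out = y_hat - y                  # de/dy_hat for the assumed squared error
    e = 0.5 * delta_out ** 2               # step 3
    # Step 4: chain rule through the tanh units, d tanh(z)/dz = 1 - tanh(z)^2.
    delta_hidden = [delta_out * w * (1.0 - h * h) for w, h in zip(w2, hidden)]
    # Step 5: steepest-descent update with learning rate alpha.
    new_w2 = [w - alpha * delta_out * h for w, h in zip(w2, hidden)]
    new_b2 = b2 - alpha * delta_out
    new_w1 = [[w - alpha * d * xi for w, xi in zip(row, x)]
              for row, d in zip(w1, delta_hidden)]
    new_b1 = [b - alpha * d for b, d in zip(b1, delta_hidden)]
    return (new_w1, new_b1, new_w2, new_b2), e


def train(data, n_hidden=3, alpha=0.1, epochs=500, seed=0):
    """Step 6: repeat steps 2 to 5 for a fixed number of epochs."""
    rng = random.Random(seed)
    net = init_network(len(data[0][0]), n_hidden, rng)
    history = []
    for _ in range(epochs):
        total = 0.0
        for x, y in data:
            net, e = train_step(net, x, y, alpha)
            total += e
        history.append(total / len(data))  # mean error seen during this epoch
    return net, history
```

The weights are updated after every sample here; a batch variant would accumulate the derivatives over the whole training set before each update. A call such as `train([([0.0], 0.2), ([0.5], 0.5), ([1.0], 0.8)])` on hypothetical data returns the trained parameters and the per-epoch error history.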
The learning rate, number of neurons per layer, number of layers, activation functions and number of epochs were varied by trial and error to obtain the best result, and early stopping of the training was applied to avoid overtraining.

Results and discussion
The ranges of bulk density measured using the reference method were in g cm⁻³ for soybean In all cases, differences among the samples were revealed, which justifies an ANN model using the 10 color descriptors as the input layer. Tests were also performed with fewer input variables; for example, omitting the H and L variables gave worse results than using all input variables. For this case, the correlation coefficients were 0.94, 0.84 and 0.92 for the training, validation and test sets, respectively. Therefore, the input variables of the ANN training set were the R, G, B, H, S, V, Rr, Rg, Rb and L color descriptors, for a total of 10 input variables for each packet of grain sample; that is, the developed ANN had 10 neurons in the input layer (Fig. 2). The averaged value of each color histogram was used because the entire histogram for the 10 colors corresponds to 2560 variables. For the ANN, 2560 values would imply 2560 neurons in the input layer, and the calculations would be slower than those with the averaged color values, without any gain in information. The information from the hue (H) and saturation (S) color scales was also important, as revealed by our previous evaluation 8 .
The ANN target outputs were the reference bulk density values. The samples were randomly divided into three sets: 70% of the samples were used for training the ANN, 15% for validation, and the remaining 15% for testing the generalizability of the ANN. The training set consisted of 119 samples (38 soybean, 19 bean, 24 chickpea and 38 corn samples), the validation set of 26 samples (8 soybean, 10 bean, 2 chickpea and 6 corn samples), and the test set of 26 samples (10 soybean, 6 bean, 4 chickpea and 6 corn samples). The size of the packet (plastic bag) was the same for all samples, so the number of samples was related to the size of the grains. Because the main goal was to analyze the entire packet, the variation in grain size determined the number of packets; this did not affect the ANN results, since the ANN did not need to know which grain was being analyzed to estimate the bulk density of each packet.
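A random 70/15/15 division of the 171 packets can be sketched as follows. The rounding rule (validation and test sizes rounded to the nearest integer, the remainder used for training) is an assumption chosen because it reproduces the 119/26/26 sizes reported above. The function name and the seed are hypothetical.

```python
import random


def split_indices(n_samples, val_frac=0.15, test_frac=0.15, seed=0):
    """Randomly split sample indices into training, validation and test sets."""
    n_val = round(n_samples * val_frac)
    n_test = round(n_samples * test_frac)
    n_train = n_samples - n_val - n_test   # remainder goes to training
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)   # seeded for reproducibility
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


# 56 soybean + 35 bean + 30 chickpea + 50 corn packets = 171 samples.
train_idx, val_idx, test_idx = split_indices(56 + 35 + 30 + 50)
```

A purely random split like this one does not guarantee any particular number of packets per grain type in each set, which is consistent with the uneven per-grain counts listed above.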
A single ANN was developed for all four types of grains (soybeans, beans, chickpeas and corn). Several ANN architectures were tested with one intermediate layer (varying the number of neurons from 10 to 20) and with two intermediate layers (with 8 and 4 neurons in each layer, respectively). The ANN that obtained the best result had only one intermediate layer with 15 neurons, and its performance was below 10⁻³ for the mean square error obtained after 1500 epochs, as shown in Fig. 3. The training was stopped at 1500 epochs to avoid excessive adjustment by the ANN, which could lead to overfitting. The activation functions were a hyperbolic tangent function in  The highest efficiency for the four types of grains together was obtained with a learning rate of 0.8. Training was also performed with higher and lower learning rates, but the network became unstable with learning rates below 0.3, meaning that it failed to provide good results for the test samples. At a learning rate of 0.8, the ANN was stable, and the training was very fast (close to 1 min). The training error was small, approximately 10⁻³, and it was stable throughout the learning process.
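To make the computational-cost argument concrete, the sketch below counts the adjustable parameters (weights plus one bias per neuron) of a fully connected feedforward network. It compares the selected network, with 10 inputs and 15 intermediate neurons, with a hypothetical network fed by the full 2560-value histograms. A single output neuron for the bulk density is assumed, and the function name is illustrative.

```python
def count_parameters(layer_sizes):
    """Weights and biases of a fully connected feedforward network."""
    return sum((n_in + 1) * n_out          # +1 accounts for the bias of each neuron
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


averaged_inputs = count_parameters([10, 15, 1])     # (10+1)*15 + (15+1)*1 = 181
full_histograms = count_parameters([2560, 15, 1])   # (2560+1)*15 + (15+1)*1 = 38431
```

Under these assumptions, using the averaged descriptors reduces the number of parameters to be trained by a factor of about 200.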
Among the four types of grains used in this work, the corn grains are the least uniform; that is, they present the most variation in density (between 0.68 and 0.85 g cm⁻³), so the estimated values were more spread out along the linear fit, as shown in Fig. 4. The densities of the bean grains (0.75-0.78 g cm⁻³) are within the range of the corn grain densities, which agrees with the results obtained for these two grains. On the other hand, soybean and chickpea grains are more uniform, with little variation in density.
The plots in Fig. 4 show a high correlation coefficient of 0.94 between the reference bulk density values and the values predicted by the ANN model for the training set (Fig. 4A) and a correlation coefficient of 0.98 for both the validation and test sets (Fig. 4B and 4C). The best fit equations for the training, validation and test data are $y = 0.86x + 0.10$, $y = 0.92x + 0.057$, and $y = 0.90x + 0.069$, respectively, where $x$ is the reference value and $y$ is the value predicted by the ANN. The mean relative error was 2.1, 1.5 and 1.4% for the training, validation and test data, respectively, with the relative errors of the ANN responses for the test set samples ranging from 0.01 to 5.6%, which is a noteworthy result. The accuracy of the results is shown by the plots in Fig. 5, where the lines $y = 0.90x + 0.10$ and $y = 0.90x + 0.036$ envelop the results obtained by the ANN calculations. These equations were obtained from the linear regression of the test set (Fig. 4C) by replacing the term $x$ with each reference value and adding to, or subtracting from, the intercept 3 times the standard deviation of the absolute errors (the differences between the reference values and those predicted by the ANN) of the 26 samples. Thus, for a given bulk density $\rho$, the response of the neural network will be in the interval $[\rho - 0.03, \rho + 0.03]$, that is, $[\rho - 3\sigma, \rho + 3\sigma]$ with $\sigma = 0.01$; a range with such a small amplitude indicates high reliability in the ANN response.
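The figures of merit discussed above can be computed with a few lines of Python. The sketch below is illustrative only: the function names and the sample values are hypothetical, and the use of the sample (n − 1) standard deviation for the absolute errors is an assumption, since the text does not say which estimator was used.

```python
import math
import statistics


def pearson_r(reference, predicted):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = statistics.mean(reference), statistics.mean(predicted)
    sxy = sum((x - mx) * (y - my) for x, y in zip(reference, predicted))
    sxx = sum((x - mx) ** 2 for x in reference)
    syy = sum((y - my) ** 2 for y in predicted)
    return sxy / math.sqrt(sxx * syy)


def relative_errors_percent(reference, predicted):
    """Relative error of each prediction, in percent of the reference value."""
    return [abs(y - x) / x * 100.0 for x, y in zip(reference, predicted)]


def sigma_abs_errors(reference, predicted):
    """Standard deviation of the absolute errors (sample estimator ASSUMED)."""
    return statistics.stdev(abs(y - x) for x, y in zip(reference, predicted))


def envelope_intercepts(intercept, sigma):
    """Intercepts of the lower and upper lines, i.e. intercept -/+ 3 sigma."""
    return intercept - 3.0 * sigma, intercept + 3.0 * sigma


# Hypothetical reference and predicted bulk densities in g cm^-3.
reference = [0.70, 0.75, 0.80]
predicted = [0.71, 0.75, 0.79]
r = pearson_r(reference, predicted)
errors = relative_errors_percent(reference, predicted)
sigma = sigma_abs_errors(reference, predicted)
lower, upper = envelope_intercepts(0.069, 0.01)   # test-set intercept, sigma = 0.01
```

With the test-set intercept of 0.069 and the rounded value $\sigma = 0.01$, the intercepts come out as 0.039 and 0.099, close to the reported 0.036 and 0.10; the small difference is consistent with $\sigma$ having been rounded in the text.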

Conclusions
The bulk densities of four different dry grains were accurately estimated using the average values of color descriptors from digital images combined with an ANN. This method is promising, considering that these grain foods have similar colors and different shapes and textures. The reference values and the values predicted by the ANN model were highly correlated ($r = 0.98$) for the test set, with low relative errors between 0.01 and 5.6%. Thus, our study showed the possibility of using an ANN to provide accurate grain quality control parameters with low computational costs.