# P I Daskalov - Improvement of healthy and Fusarium diseased corn kernels classification using robust SIMCA method

UDC 621.771.07

Daskalov P.I., Mancheva V.P., Draganova Ts.D.

Bulgaria

IMPROVEMENT OF HEALTHY AND FUSARIUM DISEASED CORN KERNELS CLASSIFICATION USING ROBUST SIMCA METHOD

A new approach to classifying healthy and Fusarium diseased corn kernels, based on the robust SIMCA method and spectral data analysis, is presented in the paper. The spectral intensity characteristics of the corn kernels (two of the most popular corn varieties in Bulgaria, Knezha 613 and Knezha 436) are obtained in the range 456 - 1140 nm. Principal component analysis is used for spectral data reduction. Healthy and diseased corn kernels are classified using ten principal components by the SIMCA and robust SIMCA (RSIMCA) methods. With RSIMCA as the classifier and raw spectral characteristics, the classification accuracy is 100% for healthy and diseased kernels of Knezha 613, and 100% for healthy and 97.5% for diseased kernels of Knezha 436. With SIMCA as the classifier and normalized spectral characteristics, the accuracy is 97.5% for healthy and 92.5% for diseased kernels of Knezha 613, and 77.5% for healthy and 95% for diseased kernels of Knezha 436.

Keywords: corn kernels, NIR spectroscopy, robust SIMCA, Fusarium disease.

Problem statement. Corn diseases are widespread in all areas where corn is cultivated and can generally be classified into two groups: non-infectious and infectious. Infectious diseases, in particular Fusarium, are the more important for production. Rapid diagnostic methods rely on the analysis of visual images, of spectral characteristics, or of hyperspectral images.

The most recent methods for diagnosing Fusarium in cereals use hyperspectral images. They identify infected areas quickly and objectively, but they are costly. Bauriegel [3] reached 87% accuracy in recognizing infected wheat. Williams [9] analyzed hyperspectral images to identify healthy and Fusarium-infected corn kernels and achieved 99.2% recognition accuracy for healthy kernels and 97.6% for infected ones. Each of the two studies refers to a single variety, of wheat and of corn respectively.

Analysis of recent research and publications. When Fusarium is assessed from external signs [1, 10, 4], the surface or the visible part of the object is analyzed. This is not always sufficient, because external signs do not always occur, so their absence does not ensure that the internal structure of the grain is completely healthy [1]. Therefore internal signs of the disease are used for Fusarium recognition [6, 7]. They are assessed using spectral analysis in the visible and near infrared regions. The changes that occur in the seeds (color and texture) are assessed by measuring the diffuse reflection.

Previous studies of healthy and Fusarium infected maize grains [2, 5] found that the identification procedure is strongly influenced by the tested varieties. Therefore a more robust method that is independent of varietal identity is needed. One such method could be a modified version of the SIMCA method, called robust SIMCA (Robust Soft Independent Modelling of Class Analogy - RSIMCA). In the classical SIMCA approach, the residuals of outliers from the PCA model can be quite small, so such objects are classified as members of a given group. To overcome the negative influence of outliers on the principal components, and thus to define the boundary of a group well, a robust version of SIMCA should be applied. While SIMCA is a "soft" classification method, because it allows objects that are not assigned to any of the established classes, its robust version RSIMCA is a "hard" classification method: each object is assigned to exactly one of the established classes. The procedure is also insensitive to outliers (including large outliers) in the samples. K. Vanden Branden and M. Hubert [8] offer a Matlab library for robust classification - LIBRA.

Aim and tasks of the paper. The aim of this paper is to compare the classification of healthy and Fusarium diseased corn kernels by the robust version of the SIMCA method and by the standard SIMCA method. To reach this aim the following tasks have to be solved:

- to obtain the NIR spectral characteristics of corn kernels from seven varieties;
- to assess the classification accuracy using two classifiers - robust SIMCA and standard SIMCA.

Materials and results of the research.

Corn samples. Seven varieties of corn kernels were examined - Knezha 308, Knezha 436, Knezha 613, Knezha 620, 26A, XM87/136 and Ruse 424. They have been certified by the Maize Institute in the town of Knezha, Bulgaria since 2008. Two samples were formed for each variety - training and test. The images of healthy and diseased corn kernels are presented in fig. 1.

Fig. 1 Healthy (a) and diseased (b) corn kernels - seven varieties

Spectral data acquisition.

Spectral characteristics were obtained with an Ocean Optics spectrophotometer in the visible and near infrared spectral range from 456 to 1140 nm. For each variety, the spectral intensity characteristics of 50 healthy and 50 infected kernels were taken on both sides - the germ side and the opposite side. This gives a total of 100 characteristics of healthy kernels (50 from the germ side and 50 from the opposite side) and 100 characteristics of infected kernels (50 from the germ side and 50 from the opposite side). The spectral characteristics of two corn kernel varieties are shown in fig. 2. The characteristics of the other five varieties look similar.

a - variety Ruse 424; b - variety Knezha 613

Fig. 2 Spectral characteristics of intensity of 50 healthy and 50 Fusarium diseased corn kernels

[Flowchart blocks of Fig. 3: Start; choice of corn kernel variety; formation of training and test samples; separation of the spectral data from the training sample into two classes (healthy and Fusarium infected); classification of the spectral data from the test sample with classification rules R1 and R2 (RSIMCA method); assessment of the classification accuracy for the data from the test sample; End]

The resulting characteristics are very similar in shape, and no regions of the spectrum can be identified that are not influenced by the variety. Accordingly, healthy and infected maize kernels cannot be identified directly from the characteristics. It is therefore necessary to establish procedures in which trained classifiers assess whether a kernel is healthy or infected. Previous studies [2, 5] found that such procedures give better recognition results. Their main disadvantage is that they are developed for each variety separately, because varietal identity influences the result. The robust SIMCA method is one option for reducing this influence.

An algorithm for separation of healthy from Fusarium infected kernels was developed (Fig. 3).

In step 1 the corn variety to be analyzed is selected. The training and test samples are formed (step 2) using the method of Kennard and Stone [5]. The training set includes 30 kernels and the test set 20 kernels for each class - healthy and infected. Since each kernel is measured on both sides, the total number of spectral characteristics for each corn kernel variety is 120 for the training set and 80 for the test set.
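The Kennard and Stone selection used in step 2 can be sketched as follows. This is a minimal illustration in plain Python, not the authors' Matlab procedure: the function name `kennard_stone`, the Euclidean distance and the toy two-value "spectra" are assumptions made for the example. The rule starts from the two most distant samples and then repeatedly adds the sample that lies farthest from everything already selected, so the training set spans the measured range.

```python
import math


def kennard_stone(samples, n_train):
    """Return indices of n_train samples chosen by the Kennard-Stone rule."""
    n = len(samples)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and the number of samples")
    # Pairwise Euclidean distances between all samples.
    dist = [[math.dist(a, b) for b in samples] for a in samples]
    # Start with the two samples that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = set(range(n)) - {first, second}
    while len(selected) < n_train:
        # Add the sample whose nearest selected neighbour is farthest away.
        nxt = max(sorted(remaining),
                  key=lambda i: min(dist[i][j] for j in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected


# Toy "spectra" with two intensity values each (hypothetical numbers).
spectra = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (5.0, 0.0)]
train_idx = kennard_stone(spectra, 3)
test_idx = [i for i in range(len(spectra)) if i not in train_idx]
print(train_idx, test_idx)
```

In the setting of the paper the same call would presumably be made per class, with 30 of the 50 kernels going to the training set and the remaining 20 to the test set.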

Fig. 3 An algorithm for separation of healthy from Fusarium infected kernels

The third step of the algorithm separates the spectral data from the training sample into two classes. The approach chosen for the classification of corn kernels, RSIMCA [8], requires all data from the training sample - healthy and infected kernels - to be collected in a single array x. This means that the number of classes in the training sample x has to be specified. Accordingly, the training spectral data in the common array x are divided into two classes:

- class 1 - Fusarium infected kernels

- class 2 - healthy kernels.

The Matlab library for robust analysis LIBRA [11] is used for spectral data classification in step 4. Robust PCA (ROBPCA) is performed for each of the classes. Then classification rules of type (1) and (2) are developed to determine the membership of new observations. The method is implemented using 10 principal components (PC = 10) of the robust PCA and values of the tuning parameter γ from 0 to 1 with step 0.1. The fifth and last step carries out the RSIMCA classification.
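The rules of type (1) and (2) are not reproduced in this text. The sketch below assumes the form commonly attributed to Vanden Branden and Hubert [8]: for each class, the orthogonal distance (OD) and the score distance (SD) of a new observation are scaled by class-specific cutoffs and combined with the weight γ, squared in R1 and linear in R2, and the observation goes to the class with the smallest combined value. All function names, distances and cutoffs here are hypothetical and do not come from LIBRA.

```python
def combined_distance(od, sd, c_od, c_sd, gamma, rule="R1"):
    """Weighted combination of scaled distances (assumed form of R1 and R2)."""
    od_scaled = od / c_od
    sd_scaled = sd / c_sd
    if rule == "R1":   # assumed: squared scaled distances
        return gamma * od_scaled ** 2 + (1 - gamma) * sd_scaled ** 2
    if rule == "R2":   # assumed: linear scaled distances
        return gamma * od_scaled + (1 - gamma) * sd_scaled
    raise ValueError("rule must be 'R1' or 'R2'")


def classify(distances, cutoffs, gamma, rule="R1"):
    """Assign the observation to the class with the smallest combined distance.

    distances: {class_label: (od, sd)} of one observation to each class model
    cutoffs:   {class_label: (c_od, c_sd)} scaling cutoffs of each class model
    """
    scores = {
        label: combined_distance(*distances[label], *cutoffs[label], gamma, rule)
        for label in distances
    }
    return min(scores, key=scores.get)


# Hypothetical distances of one kernel to class 1 (infected) and class 2 (healthy).
distances = {1: (4.0, 1.0), 2: (2.0, 5.0)}
cutoffs = {1: (2.0, 2.0), 2: (2.0, 2.0)}

# Tuning parameter from 0 to 1 with step 0.1, as in the text.
gammas = [round(0.1 * k, 1) for k in range(11)]
labels_r1 = [classify(distances, cutoffs, g, "R1") for g in gammas]
labels_r2 = [classify(distances, cutoffs, g, "R2") for g in gammas]
print(labels_r1)
print(labels_r2)
```

Under this assumed form, γ = 0 uses only the score distance and γ = 1 only the orthogonal distance, which is why the assigned class can change along the γ grid.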

Results.

Seven corn varieties were classified by both classification rules R1 and R2 over the range of the parameter γ from 0 to 1. The dependence of the percentage of correct identification on the parameter γ is shown in fig. 4 and fig. 5. The percentages of correctly identified corn kernels from the test sample are presented in tab. 1. For comparison with the RSIMCA results, the results of the standard SIMCA method obtained and presented in [2] are also given.

The obtained results show that, for every corn variety, the percentage of correct recognition of healthy and Fusarium diseased corn kernels with the robust SIMCA method is the same for the two classification rules R1 and R2.

In five of the seven varieties the best separation of the two classes occurs at γ = 0. With the exception of healthy kernels of variety Knezha 308 and infected kernels of variety Knezha 436, an accuracy of at least 90% is reached for all varieties.

The classification results with RSIMCA were compared with the results obtained in [2] with the SIMCA method for three types of spectral data pretreatment - smoothing, first derivative and second derivative. The best SIMCA results over the types of pretreatment are shown in tab. 1. RSIMCA was found to improve the accuracy of kernel identification over the SIMCA method for the following varieties:

- Knezha 308, Knezha 613, 26A and XM87/136 - for Fusarium diseased kernels;
- Knezha 436 and Knezha 620 - for healthy kernels.

For Ruse 424 (healthy and diseased kernels), Knezha 613 (healthy) and Knezha 620 (diseased) the two methods give the same results. For healthy kernels of Knezha 308, 26A and XM87/136 and for infected kernels of Knezha 436 the best results are obtained with the SIMCA method.

a - healthy kernels; b - Fusarium diseased kernels (horizontal axis: gamma; vertical axis: TP, %; curves: kn308, kn436, kn613, kn620, 26A, XM87, rs424)

Fig. 4. Percentage of correctly identified corn kernels as a function of the parameter gamma (TP = f (gamma)) for classification rule R1

a - healthy kernels; b - Fusarium diseased kernels (horizontal axis: gamma; vertical axis: TP, %; curves: kn308, kn436, kn613, kn620, 26A, XM87, rs424)

Fig. 5. Percentage of correctly identified corn kernels as a function of the parameter gamma (TP = f (gamma)) for classification rule R2

The percentage improvement of each method over the other is calculated from tab. 1, and the results are presented in tab. 2. In tab. 2, 1 denotes the class of diseased kernels and 2 the class of healthy kernels; the sign "+" marks the method with the better result and the sign "=" marks equal results.
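The calculation behind tab. 2 is a simple difference of the accuracies in tab. 1, which the following sketch reproduces. The accuracy values are copied from tab. 1 (R1 and R2 give identical RSIMCA values, so one number per cell is used); the dictionary layout and the function name `improvement` are assumptions made for the example.

```python
# (variety, class) -> (RSIMCA % correct, SIMCA % correct), copied from tab. 1
table1 = {
    ("Knezha 308", "healthy"): (72.5, 95.0),
    ("Knezha 308", "diseased"): (100.0, 97.5),
    ("Knezha 436", "healthy"): (100.0, 40.0),
    ("Knezha 436", "diseased"): (35.0, 100.0),
    ("Knezha 613", "healthy"): (100.0, 100.0),
    ("Knezha 613", "diseased"): (100.0, 97.5),
    ("Knezha 620", "healthy"): (100.0, 97.5),
    ("Knezha 620", "diseased"): (97.5, 97.5),
    ("26A", "healthy"): (90.0, 100.0),
    ("26A", "diseased"): (100.0, 95.0),
    ("XM87/136", "healthy"): (92.5, 100.0),
    ("XM87/136", "diseased"): (100.0, 97.5),
    ("Ruse 424", "healthy"): (100.0, 100.0),
    ("Ruse 424", "diseased"): (100.0, 100.0),
}


def improvement(rsimca, simca):
    """Return (better method, gain in percentage points); '=' when equal."""
    diff = rsimca - simca
    if diff > 0:
        return ("RSIMCA", diff)
    if diff < 0:
        return ("SIMCA", -diff)
    return ("=", 0.0)


table2 = {key: improvement(*values) for key, values in table1.items()}
for (variety, cls), (method, gain) in table2.items():
    print(f"{variety:10s} {cls:9s} {method:7s} {gain:+.1f}")
```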

For each kernel from the healthy and the diseased class, the score distance within the PCA subspace estimated from the training set and the orthogonal distance to that subspace are computed (Fig. 6).
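The two distances can be illustrated with the usual definitions from PCA outlier maps: the score distance is the distance of the projected point from the center, measured inside the subspace with each score scaled by its eigenvalue, and the orthogonal distance is the length of the residual left after projection. The sketch below assumes that the center, the orthonormal loadings and the eigenvalues are already available (in ROBPCA they would be robust estimates); the function name and the toy numbers are hypothetical.

```python
import math


def pca_distances(x, center, loadings, eigenvalues):
    """Score distance and orthogonal distance of x to a PCA subspace.

    loadings: list of k orthonormal direction vectors (one per component)
    eigenvalues: list of the k corresponding variances
    """
    centered = [xi - ci for xi, ci in zip(x, center)]
    # Scores: coordinates of the centered point along each loading vector.
    scores = [sum(p_d * c_d for p_d, c_d in zip(p, centered)) for p in loadings]
    # Score distance: scores scaled by the spread of each component.
    sd = math.sqrt(sum(t * t / lam for t, lam in zip(scores, eigenvalues)))
    # Reconstruction of the point inside the subspace.
    recon = [sum(t * p[d] for t, p in zip(scores, loadings))
             for d in range(len(x))]
    # Orthogonal distance: length of the residual outside the subspace.
    od = math.sqrt(sum((c - r) ** 2 for c, r in zip(centered, recon)))
    return sd, od


# Toy model: three variables, one component along the first axis.
center = [1.0, 1.0, 1.0]
loadings = [[1.0, 0.0, 0.0]]
eigenvalues = [4.0]
sd, od = pca_distances([5.0, 4.0, 5.0], center, loadings, eigenvalues)
print(sd, od)
```

A kernel with a large orthogonal distance lies far from the class subspace, while a large score distance marks a kernel that fits the subspace but sits far from the bulk of the class; both kinds show up as outliers in a plot like the one the text refers to.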

The results show that the extreme outliers, which are numbered in fig. 6a, have to be eliminated. This is recommended because they unnecessarily affect the misclassification rate.

Conclusions. 1. Outliers in the classes (healthy and Fusarium diseased) can be detected and removed using the robust SIMCA method.

2. Using the RSIMCA method to classify spectral data of healthy and Fusarium diseased corn kernels gives improved results compared to the standard SIMCA analysis for five of the seven varieties.

3. To reduce the processing time of the spectral data, it is appropriate to carry out further studies with classification rule R1 at the tuning parameter value γ = 0.5.

Table 1

Comparison of percentage correct recognition of healthy and Fusarium diseased kernels using SIMCA and RSIMCA classification methods

| Corn kernel variety | Class | RSIMCA, R1: % correct recognition | RSIMCA, R2: % correct recognition | RSIMCA: gamma | SIMCA: % correct recognition | SIMCA: kind of pretreatment |
|---|---|---|---|---|---|---|
| Knezha 308 | healthy | 72.5 | 72.5 | γ = 0.5 | 95 | second derivative |
| | diseased | 100 | 100 | | 97.5 | |
| Knezha 436 | healthy | 100 | 100 | γ = 0 | 40 | smoothing and first derivative |
| | diseased | 35 | 35 | | 100 | |
| Knezha 613 | healthy | 100 | 100 | γ = 0.4 - 1 | 100 | second derivative |
| | diseased | 100 | 100 | | 97.5 | |
| Knezha 620 | healthy | 100 | 100 | γ = 0 - 0.9 | 97.5 | smoothing and first derivative |
| | diseased | 97.5 | 97.5 | | 97.5 | |
| 26A | healthy | 90 | 90 | γ = 0 | 100 | second derivative |
| | diseased | 100 | 100 | | 95 | |
| XM87/136 | healthy | 92.5 | 92.5 | γ = 0 | 100 | second derivative |
| | diseased | 100 | 100 | | 97.5 | |
| Ruse 424 | healthy | 100 | 100 | γ = 0 - 0.3 | 100 | first and second derivative |
| | diseased | 100 | 100 | | 100 | |

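Table 2

The percentage of improvement with SIMCA and RSIMCA methods

| Variety | Class | SIMCA | RSIMCA |
|---|---|---|---|
| Knezha 308 | 2 | +22.5 | |
| Knezha 308 | 1 | | +2.5 |
| Knezha 436 | 2 | | +60 |
| Knezha 436 | 1 | +65 | |
| Knezha 613 | 2 | = | = |
| Knezha 613 | 1 | | +2.5 |
| Knezha 620 | 2 | | +2.5 |
| Knezha 620 | 1 | = | = |
| 26A | 2 | +10 | |
| 26A | 1 | | +5 |
| XM87/136 | 2 | +7.5 | |
| XM87/136 | 1 | | +2.5 |
| Ruse 424 | 2 | = | = |
| Ruse 424 | 1 | = | = |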

[Figure residue: ROBPCA plot; vertical axis: orthogonal distance (x 10^4)]