Measurement 179 (2021) 109491
Available online 3 May 2021
0263-2241/© 2021 Elsevier Ltd. All rights reserved.
A hybrid attention improved ResNet based fault diagnosis method of wind 
turbines gearbox 
Kai Zhang , Baoping Tang *, Lei Deng *, Xiaoli Liu 
The State Key Laboratory of Mechanical Transmission, Chongqing University, Chongqing 400030, China 
A R T I C L E I N F O 
Keywords: 
Wind turbines 
ResNet 
Attention mechanism 
Fault diagnosis 
Wavelet transform 
A B S T R A C T 
Improving the performance of fault diagnosis for wind turbine gearboxes is of practical significance. In this
paper, a hybrid attention improved residual network (HA-ResNet) based method is proposed to diagnose faults of
the wind turbine gearbox by highlighting the essential frequency bands of the wavelet coefficients and the fault
features of the convolution channels. First, wavelet packet transformation (WPT) is performed on the raw signal,
and the ResNet is improved with band attention to highlight features of the wavelet coefficients. Second, a fault
diagnosis framework based on channel attention is designed to improve the nonlinear feature extraction ability
of deep convolutional networks. The proposed method is verified on a simulation dataset from the drivetrain
diagnostic simulator (DDS) and on measured data from a wind farm. The results illustrate the superior
performance of the HA-ResNet based fault diagnosis method in time–frequency feature extraction of vibration
signals, frequency band information enhancement, and recognition accuracy improvement.
1. Introduction 
Wind power is valued worldwide as a clean energy source, and the
cumulative installed capacity of wind turbines is increasing year by
year. Meanwhile, among the key components of wind turbines (WTs),
fault diagnosis of the gearbox has received widespread attention. The
reason is that the wind turbine gearbox is subject to highly dynamic,
repeated fluctuating loads, which are one of the leading causes of wind
turbine failures [1,2]. Because gearbox failures cause extended
downtime and high maintenance costs, it is of great significance to
diagnose and locate faults quickly.
At present, data-driven fault diagnosis for the speed-increasing gearbox
drive system of wind turbines, which does not rely on accurate physical
models or rich signal processing experience, has increasingly become a
research hotspot in the field of fault diagnosis [3]. Qiao Wei et al.
[1,2] review the condition monitoring and traditional machine learning
(TML) based fault diagnostic methods for components and subsystems of
WTs. As shown in Fig. 1 [1,2,4], the general process of TML based
methods is data acquisition, signal processing, feature extraction,
feature selection, and pattern recognition [5]. Although some scholars
remain committed to research on TML methods and have produced excellent
work, these methods have weak nonlinear feature extraction capabilities,
and their feature extraction and feature selection rely on manual
experience, which cannot meet the need for intelligent diagnosis in
industrial applications.
Since 2006, deep learning, proposed by Hinton et al. [6], has become
increasingly widespread due to its highly abstract feature extraction. As
shown in Fig. 1, compared to the TML based fault diagnosis methods,
deep learning methods [7–9] provide a brand-new solution for fault
diagnosis of the wind turbine gearbox. On the one hand, they do not need
rich engineering experience to manually extract and filter fault
features. On the other hand, they possess powerful nonlinear feature
extraction capabilities for the coupled and time-varying features in
vibration signals. Among them, the convolutional neural network (CNN),
proposed by Yann LeCun et al. [10], has been developed into a deep
supervised learning algorithm that effectively reduces the risk of
overfitting through local receptive fields, weight sharing, and
pooling. Due to its significant advantages in feature extraction, deep
Abbreviations: WTs, Wind turbines; TML, Traditional machine learning; CNN, Convolutional neural networks; ResNet, Residual network; WPT, Wavelet packet 
transformation; HA-ResNet, Hybrid attention improved resnet; ReLU, Rectified linear unit; BN, Batch normalization; BW, Band weights; DDS, Drivetrain diagnostic 
simulator; WDCNN, Convolutional neural networks with wide first-layer kernels; SGDR, Stochastic gradient descent with warm restarts; t-SNE, t-distributed sto-
chastic neighbor embedding; CMS, Condition monitoring system. 
* Corresponding authors. 
E-mail addresses: bptang@cqu.edu.cn (B. Tang), denglei@cqu.edu.cn (L. Deng). 
Contents lists available at ScienceDirect 
Measurement 
journal homepage: www.elsevier.com/locate/measurement 
https://doi.org/10.1016/j.measurement.2021.109491 
Received 19 November 2020; Received in revised form 13 April 2021; Accepted 25 April 2021 
CNN has been widely used in fault detection and diagnosis [11–14].
Furthermore, researchers have improved the backbone of convolutional
networks to enhance their feature extraction capability [15,16]. These
works show that CNN and its extensions have an outstanding capability
to deal with fault diagnosis tasks.
However, scholars are divided on how vibration signals should be fed to
deep neural networks in fault diagnosis. As shown in Fig. 1, some hold
that the raw vibration signals should be input directly into the deep
neural networks, since pre-processing loses part of the information
[17–20]. Researchers holding this view have successfully trained deep
learning frameworks with raw vibration signals and have made various
improvements to deep neural networks for the original one-dimensional
vibration signal to enhance the model’s nonlinear feature extraction
capability. For instance, Jun Pan [13] proposed a deep learning network
(Lifting Net) to learn features adaptively from raw mechanical data
without prior knowledge; Ruonan Liu [12] designed a novel diagnosis
framework based on the characteristics of industrial vibration signals
by adding a dislocate layer; W. Zhang [11] presented a convolutional
neural network with training interference to address changes in working
load and noise from the working environment.
Conversely, another view is that it is hard to extract fault information
effectively from time-domain signals using only traditional deep
learning models, owing to the variable working conditions and weak fault
features of the wind turbine gearbox in engineering practice [21–25].
Time–frequency analysis can compensate for this deficiency by
simultaneously presenting the features of one-dimensional raw signals in
the time domain and the frequency domain. As a classic time–frequency
analysis method, the wavelet transform and its extensions have been
widely applied to rotary machines (including wind turbines) with TML
methods [4,26,27]. Early researchers have carried out related work. For
instance, Renxiang Chen et al. [21] presented intelligent fault
diagnosis combining CNN and the discrete wavelet transform for wind
turbine gearboxes; Minghang Zhao et al. [22] developed a variant of the
deep residual network (ResNet), the so-called ResNet with dynamically
weighted wavelet coefficients (DRN + DWWC), to improve diagnostic
performance; Yan Han [23] proposed a dynamic ensemble convolutional
neural network for fault diagnosis by fusing multi-level wavelet
coefficients. Other time–frequency methods, such as ensemble empirical
mode decomposition [28] and variational mode decomposition [29], can
also be used for feature pre-extraction of vibration signals. However,
there is no consensus about which frequency band contains the most
intrinsic information about the various health states of a planetary
gearbox. Moreover, the contribution of the wavelet coefficient frequency
bands varies from dataset to dataset.
The attention mechanism (AM) offers a flexible solution to this problem.
It was initially used in machine translation and image recognition
[30–33], for example to automatically (soft-)search for the relevant
parts of a source sentence. These works have laid a solid foundation for
the study of attention mechanisms in the field of fault diagnosis. In
this field, work [18] introduced an attention mechanism to help deep
networks locate information in raw data segments and extract the
discriminative features of the input. Moreover, Y. H. Chang et al. [34]
proposed a novel meta-learning network with an adaptive ability to
assess the degree of correlation between various data, realizing state
recognition of a shipborne antenna under small-sample conditions.
Inspired by those works, especially DRN + DWWC [22] and channel
attention networks [34,35], this paper proposes a hybrid attention
improved ResNet (HA-ResNet) based method to boost the performance of
ResNet and to diagnose faults of the wind turbine gearbox with high
accuracy. The proposed method highlights the useful frequency bands of
the wavelet coefficients and the important fault feature information of
the convolution kernel channels. First, wavelet packet transformation
(WPT) is performed on the original signal to highlight the weak features
of the vibration signal. A fault diagnosis framework based on channel
attention networks is then designed to improve the nonlinear feature
extraction ability of deep convolutional networks. Finally, the
framework is improved with the band attention mechanism: different
attention weights are assigned to the wavelet coefficient bands to
further enhance the model’s ability to recognize weak fault features.
The contributions of this paper are summarized as follows:
(1) Given that the contribution of frequency bands varies from dataset
to dataset, a frequency-band attention mechanism is designed in the
wavelet-ResNet fault diagnosis framework to adaptively highlight the
weak but crucial frequency bands in the wavelet coefficients.
(2) Considering the randomness of feature-map extraction across
channels, a channel attention mechanism is designed to obtain vital
channel features automatically and boost the performance of deep
networks.
The organization of the paper is as follows. Section 2 briefly presents 
the basic theory of the standard convolution module, ResNet, and the 
proposed HA-ResNet, as well as its fault diagnostic procedure. Section 3 
analyzes and discusses the experimental diagnosis results and engi-
neering application for wind turbine gearbox. Finally, Section 4 gives 
the overall conclusions. 
[Figure 1: flowchart with the boxes Data acquisition (vibration signal, sound signal, et al.), Dataset (sample, label, et al.), Signal processing (statistic analysis, STFT, WT, EMD, VMD, et al.), Feature extraction & selection, Traditional ML based (SVM, ANN, HMM, K-NN, et al.), and DL based method (SAE, DBN, CNN, ResNet, et al.), and the paths "Input raw signal directly" and "Input pre-processed signal".]
Fig. 1. The data-driven fault diagnosis process based on traditional machine learning and deep learning.
K. Zhang et al. 
2. Methods 
Since the method proposed in the paper involves related knowledge 
of convolutional neural networks and ResNet, it is necessary to give a 
brief introduction to the involved method before proposing the fault 
diagnosis framework. 
2.1. The standard convolution module 
Generally, the standard convolution block contains a convolutional
layer and a pooling layer. The convolutional layer is composed of
several convolutional kernels (or filters) that compute feature maps,
while the pooling layer is a sub-sampling operation that improves
computational efficiency and makes features more robust. A general
convolutional neural network, shown in Fig. 2 [10], is formed by
stacking convolutional layers, pooling layers, and fully connected
layers, with the Softmax activation function before the final output
layer. In the convolutional layer, units in each layer are connected
only to a subset of the units in the previous layer (following the
convolution calculation rules), namely local connections. The feature
map of the input data is calculated through a number of filters, and
the weights of each filter are the same for all neurons in the previous
layer, namely weight sharing. Mathematically, the operation of the
convolutional layer is formulated as [16]:
$$z_j^{l+1} = \sigma\left(\sum_i x_i^l * w_{ij}^l + b_j^{l+1}\right) \tag{1}$$
where $*$ is the convolution operation; $x_i^l$ is the $i$th channel of
the feature maps in the $l$th layer; $w_{ij}^l$ and $b_j^{l+1}$ denote
the $j$th kernel and bias in the corresponding layer, respectively;
$z_j^{l+1}$ represents the $j$th channel of the feature maps in the
$(l+1)$th layer; and $\sigma(\cdot)$ is the activation function that
implements the nonlinear transformation, such as the most commonly
used rectified linear unit (ReLU).
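As an illustration of Eq. (1), the following is a minimal sketch of a one-dimensional convolutional layer in plain Python. The function names (`relu`, `slide`, `conv_layer`), the toy input, and the kernel values are assumptions made for this example; the sliding dot product is the form of convolution commonly used in CNN libraries, not necessarily the exact operation of the original implementation.

```python
def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


def slide(signal, kernel):
    """'Valid' sliding dot product of a kernel over a 1-D signal."""
    steps = len(signal) - len(kernel) + 1
    return [
        sum(signal[t + k] * kernel[k] for k in range(len(kernel)))
        for t in range(steps)
    ]


def conv_layer(x, w, b):
    """Eq. (1): z_j = relu(sum_i x_i * w_ij + b_j).

    x[i]    : i-th input channel (list of floats)
    w[i][j] : kernel linking input channel i to output channel j
    b[j]    : bias of output channel j
    """
    outputs = []
    for j in range(len(b)):
        total = None
        for i, channel in enumerate(x):
            response = slide(channel, w[i][j])
            if total is None:
                total = response
            else:
                total = [a + r for a, r in zip(total, response)]
        outputs.append(relu([v + b[j] for v in total]))
    return outputs


# Toy input (hypothetical values): two channels of length 4,
# one output channel, kernel size 2.
x = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]
w = [[[1.0, -1.0]], [[0.5, 0.5]]]
b = [1.0]
z = conv_layer(x, w, b)
```

The same kernel is reused at every position of the input, which is the weight sharing described above, and each output value depends only on a short window of the input, which is the local connection.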
The network is then optimized iteratively by backpropagating the error
between the predicted label and the real label until the network
converges. A loss function constrains this training process. The most
commonly used cross-entropy loss (also used as the loss function in this
study) is expressed as follows [16]:
$$L_n = -\sum_{k=1}^{m} y_k \log\left(\hat{y}_k\right) \tag{2}$$
where $m$ is the number of samples in a mini-batch, and $n$ is the iteration
index. $y$ and $\hat{y}$ denote the true labels and predicted labels, which are
generally expressed as one-hot vectors during implementation.
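Eq. (2) can be sketched for a small mini-batch as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, the predicted labels are taken to be softmax outputs, and a small `eps` is added inside the logarithm for numerical safety, which the text does not mention.

```python
import math


def softmax(logits):
    """Normalize raw scores into probabilities."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot vector."""
    return [1.0 if k == label else 0.0 for k in range(num_classes)]


def cross_entropy(labels, logits_batch, eps=1e-12):
    """Eq. (2): sum over the mini-batch of -y_k . log(y_hat_k)."""
    loss = 0.0
    for label, logits in zip(labels, logits_batch):
        y = one_hot(label, len(logits))
        y_hat = softmax(logits)
        loss -= sum(t * math.log(p + eps) for t, p in zip(y, y_hat))
    return loss


# Two samples with uniform predictions over nine classes, and one
# sample predicted confidently and correctly (hypothetical values).
uniform = cross_entropy([0, 3], [[0.0] * 9, [0.0] * 9])
confident = cross_entropy([0], [[10.0] + [0.0] * 8])
```

With uniform predictions over nine classes each sample contributes log(9) to the loss, while a confident correct prediction contributes almost nothing.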
2.2. The ResNet module 
As the number of neural network layers grows, training the CNN model
becomes increasingly difficult. In response, K. He et al. [36] of
Microsoft Research Asia proposed the ResNet model in 2015. The ResNet
model reduces the training difficulty of deep neural networks by adding
identity mappings to the ordinary CNN, which facilitate the
backpropagation of errors and the optimization of model parameters. It
has achieved good results in computer vision tasks such as image
recognition, image segmentation, and target positioning.
Essentially, the ResNet model is an upgraded version of the CNN
model, and it is the core method used in this paper for fault diagnosis
of the wind turbine gearbox. It is usually composed of basic building
blocks, including an input layer, a series of convolutional layers
(Conv), batch normalization (BN), identity mappings, global mean
pooling, and a fully connected output layer. Similar to the standard
convolution module, ResNet also has basic residual blocks. A common
basic residual block is shown in Fig. 3 [37]: the main operation path is
“BN → ReLU → Conv → BN → ReLU → Conv”, whose output is added to the
cross-layer path (identity mapping) to form a complete basic residual block.
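The residual path described above can be sketched on a single one-dimensional channel. This is a simplified illustration, not the paper's block: a per-signal standardization stands in for batch normalization (which normally uses statistics over a mini-batch), the convolution is a zero-padded sliding dot product with an odd-length kernel, and all names and kernel values are hypothetical.

```python
def standardize(values, eps=1e-5):
    """Stand-in for BN: zero mean and unit variance over one signal."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


def relu(values):
    return [max(0.0, v) for v in values]


def conv_same(values, kernel):
    """Zero-padded sliding dot product; output length equals input length.

    Assumes an odd kernel length.
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(values) + [0.0] * pad
    return [
        sum(padded[t + k] * kernel[k] for k in range(len(kernel)))
        for t in range(len(values))
    ]


def residual_block(x, kernel_1, kernel_2):
    """BN -> ReLU -> Conv -> BN -> ReLU -> Conv, then add the identity path."""
    h = conv_same(relu(standardize(x)), kernel_1)
    h = conv_same(relu(standardize(h)), kernel_2)
    return [a + b for a, b in zip(x, h)]


x = [0.5, -1.0, 2.0, 0.0, 1.5]
# With all-zero kernels the main path contributes nothing, so the block
# reduces to the identity mapping.
identity_out = residual_block(x, [0.0, 0.0, 0.0], [0.0, 0.0, 0.0])
# With non-trivial (hypothetical) kernels the block adds a correction to x.
out = residual_block(x, [0.25, 0.5, 0.25], [0.0, 1.0, 0.0])
```

The zero-kernel case shows why the identity mapping eases training: a block can fall back to passing its input through unchanged, and the convolutions only need to learn the residual.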
2.3. The proposed hybrid attention ResNet 
2.3.1. Wavelet packet transform 
In this study, wavelet coefficients are obtained from the raw vibration
signal through WPT and then input into the subsequent neural networks.
Fig. 4 shows the process of WPT of vibration signals [26]. From the
mathematical perspective, a function in Hilbert space can be decomposed
in WPT into a scale function and a wavelet function. The scale function
can be used to construct a low-pass filter for the raw signal, and the
wavelet function can be used to build a high-pass filter. From the
perspective of signal processing, the signal is decomposed by these
filters into high-frequency components (the high-frequency sub-band)
and low-frequency components (the low-frequency sub-band). In this
method, the high-frequency sub-band is also called the detail sub-band,
and the low-frequency sub-band is also called the approximation
sub-band.
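The recursive low-pass and high-pass splitting can be sketched with the Haar wavelet, which is swapped in here because its filters are two taps long and easy to write out by hand. It is not the wavelet basis used in the experiments. The function names and the test signal are assumptions, and the signal length is assumed to be divisible by 2 raised to the number of levels.

```python
import math


def haar_step(x):
    """Split a signal into approximation (low-pass) and detail (high-pass) halves."""
    s = math.sqrt(2.0)
    approx = [(x[2 * i] + x[2 * i + 1]) / s for i in range(len(x) // 2)]
    detail = [(x[2 * i] - x[2 * i + 1]) / s for i in range(len(x) // 2)]
    return approx, detail


def wavelet_packet(x, levels):
    """Full wavelet packet tree: every node is split again at each level.

    Returns the 2**levels terminal nodes in natural (tree) order,
    which is not the same as increasing-frequency order.
    """
    nodes = [list(x)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_step(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    return nodes


# A 64-point test tone (hypothetical), decomposed over 3 levels.
signal = [math.sin(2 * math.pi * 5 * k / 64) for k in range(64)]
nodes = wavelet_packet(signal, 3)
energy_in = sum(v * v for v in signal)
energy_out = sum(v * v for node in nodes for v in node)
```

Unlike the plain wavelet transform, which keeps splitting only the approximation sub-band, the packet transform also splits each detail sub-band, so three levels give eight sub-bands of eight coefficients each. Because the Haar filters are orthonormal, the total energy of the coefficients equals that of the signal.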
[Figure 2: Input layer, Convolutional layer, Pooling layer, Convolutional layer, Pooling layer, Fully connected layer, Output layer.]
Fig. 2. The Standard Convolution Module.
[Figure 3: BN, Relu, Conv_1, BN, Relu, Conv_2, and an addition (+) node.]
Fig. 3. A common basic residual block of ResNet.
The advantage of WPT is that it can simultaneously present the
time-domain and frequency-domain feature changes in the vibration
signal. For example, the time-domain waveforms in Fig. 5 (a) and (b)
are the simulated signals of formula (3) and formula (4). The weak
frequency component that distinguishes Fig. 5 (b) from Fig. 5 (a) is
easily submerged by the other vibration components in the time domain,
but it can be seen by comparing the fast Fourier transform and WPT
plots. In contrast, Fig. 5 (c) and (d) show signals whose frequency
components remain unchanged while only the time order of the components
changes. Comparing the Fourier transform and WPT shows that the fast
Fourier transform struggles to reveal such changes over time, which are
easy to distinguish in the wavelet packet coefficients.
$$f(t) = \cos(2\pi \times 10t) + \cos(2\pi \times 25t) + \cos(2\pi \times 50t) + \cos(2\pi \times 100t) \tag{3}$$
$$f(t) = \cos(2\pi \times 10t) + \cos(2\pi \times 25t) + \cos(2\pi \times 35t)/2 + \cos(2\pi \times 50t) + \cos(2\pi \times 100t) \tag{4}$$
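The two simulated signals can be generated and compared in the frequency domain with a short sketch. The sampling rate of 256 Hz and the one-second duration are assumptions chosen so that every integer frequency falls exactly on a DFT bin; the text does not state the values used for Fig. 5. The direct DFT sum below is used only to keep the example free of external libraries.

```python
import math

FS = 256  # assumed sampling rate in Hz (not stated in the text)
N = 256   # one second of samples, so integer frequencies are exact bins


def signal_eq3(t):
    """Formula (3): four unit-amplitude cosines."""
    return sum(math.cos(2 * math.pi * f * t) for f in (10, 25, 50, 100))


def signal_eq4(t):
    """Formula (4): formula (3) plus a 35 Hz cosine of amplitude 1/2."""
    return signal_eq3(t) + math.cos(2 * math.pi * 35 * t) / 2


def amplitude_at(samples, freq_hz):
    """Single-bin DFT, scaled so a unit cosine gives amplitude 1."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * freq_hz * k / FS) for k, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq_hz * k / FS) for k, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n


x3 = [signal_eq3(k / FS) for k in range(N)]
x4 = [signal_eq4(k / FS) for k in range(N)]
extra_in_eq3 = amplitude_at(x3, 35)  # component absent from formula (3)
extra_in_eq4 = amplitude_at(x4, 35)  # weak component added in formula (4)
```

The only difference between the two spectra is the 35 Hz line at half the amplitude of the other components, which is the weak component that is hard to see in the time-domain waveform.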
2.3.2. The proposed hybrid attention ResNet 
This section presents the proposed bearing fault diagnosis method,
including the proposed network architecture, the frequency band attention
mechanism, and the channel attention mechanism.
(1) The proposed networks architecture 
The basic model architecture in this paper is a ResNet with a 34-layer
structure, as shown in Fig. 6. The main framework of the model is
composed of the input layer, frequency-band attention unit, batch
normalization, activation function, basic residual blocks, global mean
pooling, dropout, a fully connected softmax layer, and the output layer.
In the input layer, the proposed network takes the wavelet coefficients
of the raw vibration signals as inputs. Each residual stage module is
composed of several basic residual blocks. Within a stage module, the
identity mapping structure of the first basic residual block differs
slightly from that of the remaining blocks: in the first block, the
identity mapping has an additional “global pooling-convolution-batch
normalization” structure to match the number of filters, as shown in
Fig. 8 (a) and (b).
It is worth noting that the ResNet used here introduces batch
normalization [38] between the convolution and the activation function
to speed up training and prevent overfitting. In this way, equation (1)
is rewritten as follows [38]:
$$z_j^{l+1} = \sigma\left(\mathrm{BN}\left(\sum_i x_i^l * w_{ij}^l\right)\right) \tag{5}$$
where BN is the batch normalization.
Another trick that departs from the traditional CNN is that the ResNet
replaces the fully connected layer with global average pooling [39]. One
advantage of global average pooling over fully connected layers is that
it is more native to the convolution structure, enforcing
correspondences between feature maps and categories [39]. At the same
time, this strategy reduces the number of parameters of the entire network.
(2) The frequency band attention block 
The time–frequency domain coefficients of WPT are explored for
feature extraction and fault diagnosis. Not all bands in the wavelet
coefficients contribute equally to fault recognition. Moreover, unlike
the time-varying features in the time domain, different fault types
appear in relatively fixed frequency bands of the wavelet coefficients.
This is the motivation for employing the attention mechanism to locate
the informative frequency bands for the different health conditions of
gearboxes.
As Fig. 7 shows, the frequency band attention module takes the
wavelet coefficients as inputs. To locate valuable information in the
data, the input wavelet coefficients are split into $N_b$ bands, denoted as $C$:
$$C = [c_1, c_2, \cdots, c_{N_b}], \quad c_i \in \mathbb{R}^N \tag{6}$$
Then, the attention mechanism randomly initializes a set of weights $\alpha_i$
for $c_i$, which are used to evaluate the importance of the
corresponding frequency band in fault diagnosis. After that, the band
weights (BW) $\alpha_i$ are computed by an attention net $f_{att}^b$, a
simple neural network that takes $c_i$ as input and outputs a score $d_i$,
followed by a softmax function that normalizes the scores. The
mathematical description of this process is as follows:
$$d_i = f_{att}^b(c_i) \tag{7}$$
[Figure 4: wavelet packet decomposition tree. The signal S is split into the nodes A1 and D1, then AA2, DA2, AD2, and DD2, and so on down to level n (AAAn, DAAn, AADn, ..., DDDn). Axis labels: Time Domain, Frequency Domain, Time (s), Amplitude.]
Fig. 4. The process of WPT of vibration signals.
[Figure 5: panels (a) Simulation signal 1, (b) Simulation signal 2, (c) Simulation signal 3, (d) Simulation signal 4.]
Fig. 5. Comparative analysis of simulation signals in time domain, frequency domain and time–frequency domain.
$$\alpha_i = \frac{e^{d_i}}{\sum_{k=1}^{N_b} e^{d_k}} \tag{8}$$
At the same time, the original wavelet coefficients are sliced into bands
through a custom Lambda layer. Once the attention weights for the bands
are generated, the enhanced representation vector $V_b$ of the input data
is obtained as
$$V_b = [v_1, v_2, \cdots, v_{N_b}], \quad v_i = \alpha_i c_i \tag{9}$$
Afterward, as the enhanced representation of $C$, $V_b$ can be used for
further diagnosis.
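Eqs. (6)-(9) can be traced with a small sketch. The attention net is described only as a simple neural network, so a single linear scoring layer with hypothetical weights `w` and bias `b` is assumed here; the band values are also made up for illustration.

```python
import math


def split_bands(coefficients, num_bands):
    """Eq. (6): cut a flat coefficient list into num_bands equal bands."""
    size = len(coefficients) // num_bands
    return [coefficients[i * size:(i + 1) * size] for i in range(num_bands)]


def softmax(scores):
    """Eq. (8): normalize the band scores into attention weights."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def band_attention(bands, w, b=0.0):
    """Eqs. (7)-(9) with an assumed linear attention net d_i = w . c_i + b."""
    scores = [sum(wi * ci for wi, ci in zip(w, band)) + b for band in bands]
    alphas = softmax(scores)
    weighted = [[a * c for c in band] for a, band in zip(alphas, bands)]
    return alphas, weighted


# Hypothetical coefficients: three bands of two values each.
coefficients = [0.1, 0.1, 1.0, 1.0, 0.2, 0.2]
bands = split_bands(coefficients, 3)
alphas, v_b = band_attention(bands, w=[1.0, 1.0])
```

In a trained network the scoring weights are learned, so the band that receives the largest weight reflects what helps classification rather than simply which band has the largest coefficients, as it does in this toy case.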
(3) The channel attention block 
As the name suggests, convolutional neural networks rely on convolution
operations, using local receptive fields to fuse spatial information and
channel information when extracting features. Each convolutional layer
has several filters, which learn local spatial connection patterns
across all channels. In other words, the convolution filter extracts
fused spatial and channel information within the local receptive field.
Traditionally, however, the output of a convolutional layer does not
account for the varying influence of each channel. Therefore, the
channel attention mechanism is added in this paper to allow the network
to selectively enhance the informative channel features, so that
subsequent processing can make full use of them, and to suppress the
features of useless or low-effect channels.
The channel attention strategy in this paper is shown in Fig. 8
(b) and (c). Apart from the slight difference in the identity mapping
structure between the first basic residual block and the remaining
basic residual blocks, which serves to match the number of filters, the
principles are the same. The process is summarized as follows.
Assuming that the feature $V_b$ after frequency band attention is
processed by a series of channel transformations
$\{f_1^c, f_2^c, \cdots, f_{N_c}^c\}$, the signals processed by the
various feature channels are obtained as
$$U_c = [u_1, u_2, \cdots, u_{N_c}], \quad u_i = f_i^c(V_b) \tag{10}$$
Then, the global contextual information with embedded channel-wise
statistics is gathered by global average pooling across the spatial
dimensions,
$$s_c = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} G_{U_c}(i, j) \tag{11}$$
where $H$ and $W$ are the block output feature-map sizes, and the mapping
of $G_{U_c}(i, j)$ to $s_c$ is a global average pooling process. The split
weights of channel attention are given by
$$\alpha_i^c = \begin{cases} \dfrac{\exp(s_c)}{\sum_{1}^{N_c}\exp(s_c)} & \text{if } N_c > 1 \\[2ex] \dfrac{1}{1 + \exp(-s_c)} & \text{if } N_c = 1 \end{cases} \tag{12}$$
The cardinal channel representations are then concatenated along
the channel dimension,
$$V_c = \mathrm{Concat}\left\{\alpha_1^c u_1, \alpha_2^c u_2, \ldots, \alpha_{N_c}^c u_{N_c}\right\} \tag{13}$$
Afterward, $V_c$, as the optimized representation of $U_c$, can be used for
further diagnosis.
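A literal reading of Eqs. (11)-(13) can be sketched as follows. This is an illustration under assumptions: the pooled descriptors are fed directly into the softmax (or the sigmoid when there is a single channel), whereas the block in Fig. 8 appears to place further layers between the pooling and the r-Softmax, which are omitted here. The function names and feature-map values are hypothetical.

```python
import math


def global_average(feature_map):
    """Eq. (11): mean of an H x W feature map."""
    height = len(feature_map)
    width = len(feature_map[0])
    return sum(sum(row) for row in feature_map) / (height * width)


def channel_weights(descriptors):
    """Eq. (12): softmax across channels, or a sigmoid for one channel."""
    if len(descriptors) == 1:
        return [1.0 / (1.0 + math.exp(-descriptors[0]))]
    shift = max(descriptors)
    exps = [math.exp(s - shift) for s in descriptors]
    total = sum(exps)
    return [e / total for e in exps]


def channel_attention(channels):
    """Eq. (13): scale each channel by its weight and keep them side by side."""
    descriptors = [global_average(u) for u in channels]
    alphas = channel_weights(descriptors)
    scaled = [
        [[a * v for v in row] for row in u]
        for a, u in zip(alphas, channels)
    ]
    return alphas, scaled


# Two hypothetical 2 x 2 channels; the second carries larger responses.
u_c = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[3.0, 3.0], [3.0, 3.0]],
]
alphas, v_c = channel_attention(u_c)
single_alpha, _ = channel_attention([[[0.0, 0.0], [0.0, 0.0]]])
```

The channel with the larger pooled response receives the larger weight, so its features are emphasized in the concatenated output while the other channel is suppressed.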
2.4. Fault diagnosis procedure based on hybrid attention ResNet 
This paper performs fault diagnosis of the wind turbine gearbox based
on the model designed above. The detailed steps, shown in Fig. 9, are
summarized as follows:
(1) Obtain the vibration signal of the wind turbine gearbox through
the acceleration sensor;
(2) Analyze the failure reports of the wind turbines, label and slice
the corresponding vibration signals, and convert the original vibration
signal in each sample into wavelet coefficients through wavelet
packet transformation;
(3) Divide the dataset into a training set and a test set, train
the designed hybrid attention ResNet using the training set,
and select the best model as the trained model using the test
set;
(4) Collect real-time vibration signals, perform WPT with the same
parameters as for the training set, and obtain fault diagnosis results
from the trained model.
3. Experiments and results 
The gearbox is a crucial part of a wind turbine, so high-precision fault
diagnosis of it is of great significance. A fault simulation dataset from the
drivetrain diagnostic simulator (DDS) testbed, and a dataset of measured
[Figure 7: block diagram with the labels Band_attention, Split-ban, BC-1, BC-2, ..., BC-Nb, BW-1, BW-2, ..., BW-Nb, Conv, BN, Relu, Dense, Softmax, Repeat, and element-wise × and + nodes.]
Fig. 7. The band attention block.
[Figure 6: block diagram with the labels Input, Band_attention, BN, Relu, MaxPool2D, Stage (repeated; each stage contains BasicBlock1, BasicBlock2, BasicBlock3, ..., BasicBlockN), Average pooling, Dropout, Dense, Softmax, Output.]
Fig. 6. The architecture of the proposed hybrid attention ResNet.
wind turbine gearbox failures from a wind farm were built to validate the
effectiveness of the proposed method. This section explains the
experimental setup and analyzes the results.
3.1. Experimental validation 
3.1.1. Experimental setup and data description 
The DDS testbed performs a fault simulation experiment for the 
[Figure 9: flowchart with the labels Wind Farm, Training dataset preprocessing, Model training, Real-time Data, Trained model, Results. The signal panels plot Amplitude against Time, and the network diagram repeats the labels of Fig. 6.]
Fig. 9. The fault diagnosis procedure based on hybrid attention ResNet.
[Figure 8: two block diagrams, BasicBlock1 and BasicBlockN, with the labels Conv, BN, Relu, Short cut, Average pooling, Global average pooling, r-Softmax, Split-cha, SU-1, SU-2, ..., SU-Nc, SW-1, SW-2, ..., SW-Nc, Channel attention, and element-wise × and + nodes. Panels: (a) First basic block of ResNet block; (b) Rest of basic block of ResNet block.]
Fig. 8. The channel attention ResNet block.
gearbox of wind turbines. The DDS consists of a speed controller, a driving
motor, a 2-stage planetary gearbox, a 2-stage parallel shaft gearbox with
rolling bearings, and a programmable magnetic brake, as shown in
Fig. 10, and is used to test the performance of the proposed method. In this
experiment, the speed of the driving motor ranges from 20 Hz to 32.85 Hz, and
the nine health conditions (including the normal condition) are listed in
Table 1. In each condition, 36 s of vibration signal data were collected
by NI 9234 and NI 9188 at a sampling frequency of 25,600 Hz, which was
repeated four times.
To demonstrate the effectiveness and advantages of the proposed model,
CNN [10], convolutional neural networks with wide first-layer kernels
(WDCNN) [17], and ResNet [36] are compared in this paper. The
hyper-parameter settings of the proposed model are shown in Table 2, and
the specific structures and hyper-parameter settings of the comparison
models follow the original papers. The learning rate adopts the
stochastic gradient descent with warm restarts (SGDR) adaptive
adjustment strategy [40]. Its parameters are set as factor = 0.01,
patience = 5, min_lr = 0.00005, and parameters not mentioned in the
table are default values.
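For orientation, the cosine schedule with warm restarts that SGDR is generally known for can be sketched as below. This is not the paper's training code. The initial learning rate of 0.0015 and the floor of 0.00005 are taken from the reported settings, while the fixed cycle length of 10 epochs is a hypothetical choice for this example; the reported `factor` and `patience` settings have no counterpart in this formula and are not used.

```python
import math


def sgdr_learning_rate(epoch, lr_max=0.0015, lr_min=0.00005, cycle_len=10):
    """Cosine annealing from lr_max to lr_min, restarting every cycle_len epochs.

    cycle_len is an assumed value for illustration only.
    """
    t_cur = epoch % cycle_len  # epochs elapsed since the last restart
    cosine = math.cos(math.pi * t_cur / cycle_len)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cosine)


# Learning rate for each of the 30 training epochs.
schedule = [sgdr_learning_rate(epoch) for epoch in range(30)]
```

Within each cycle the rate decays smoothly toward the floor, and at every restart it jumps back to the initial value, which gives the optimizer repeated chances to leave poor regions of the loss surface.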
In addition, this paper uses the Python (3.62) language with the
TensorFlow (1.15.4) and Keras (2.3.1) frameworks to construct and
optimize the proposed model and the comparison models. In terms of
hardware configuration, an Intel i7-6700 processor and an NVIDIA
GeForce GTX 1080 GPU are used to shorten training and optimization time.
3.1.2. Results and analysis 
The collected raw vibration signal of each health state was divided into
900 non-overlapping samples (each with 4096 points; examples are
shown in Fig. 11), and the ratio of the training set to the test set is 5:4.
Gaussian noise with a signal-to-noise ratio of 6 dB is added to the raw
vibration signal of each sample to simulate the background environment
of real working conditions. Then, WPT was employed to present the
time–frequency features of the raw vibration signals. The wavelet basis
was chosen as db3, following the results of work [4]. A 6-level
decomposition was employed in this experiment, considering the
resolution balance between the time domain and the frequency domain.
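The sample count can be checked from the acquisition settings, and the noise injection can be sketched in a few lines. The arithmetic assumes that each of the four 36 s recordings is cut independently into 4096-point segments; the helper names, the 50 Hz test tone, and the random seed are hypothetical and only serve the example.

```python
import math
import random

FS = 25600         # sampling frequency in Hz
DURATION_S = 36    # seconds per recording
REPEATS = 4        # recordings per health condition
SAMPLE_LEN = 4096  # points per sample

points_per_recording = FS * DURATION_S                       # 921600
samples_per_recording = points_per_recording // SAMPLE_LEN   # 225
samples_per_condition = samples_per_recording * REPEATS      # 900
train_count = samples_per_condition * 5 // 9                 # 5:4 split
test_count = samples_per_condition - train_count
terminal_bands = 2 ** 6  # a full 6-level wavelet packet tree


def slice_signal(signal, length):
    """Cut a signal into consecutive non-overlapping segments."""
    return [signal[i:i + length] for i in range(0, len(signal) - length + 1, length)]


def add_gaussian_noise(signal, snr_db, rng):
    """Add white Gaussian noise at the requested signal-to-noise ratio."""
    signal_power = sum(v * v for v in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [v + rng.gauss(0.0, sigma) for v in signal]


def measured_snr_db(clean, noisy):
    """Estimate the SNR in dB from a clean signal and its noisy version."""
    signal_power = sum(v * v for v in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


rng = random.Random(0)  # fixed seed so the example is repeatable
clean = [math.sin(2 * math.pi * 50 * k / FS) for k in range(SAMPLE_LEN)]
noisy = add_gaussian_noise(clean, 6.0, rng)
snr = measured_snr_db(clean, noisy)
```

Under that assumption, 36 s at 25,600 Hz gives 921,600 points, which divide exactly into 225 segments of 4096 points, and four recordings give the 900 samples per condition stated above.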
After that, the pre-processed data was divided into sub-datasets of 
the raw vibration signals and wavelet coefficients. The raw dataset was 
used to compare CNN and WDCNN; then, the fault diagnosis accuracy of 
CNN, WDCNN, ResNet, and the proposed HA-ResNet were also 
[Figure 11: nine panels, (a) to (i), each plotting Amplitude against Time (s).]
Fig. 11. Raw vibration signals of (a) Normal, (b) SFB, (c) SFI, (d) SFO, (e) GTRC, (f) GTB, (g) GSA, (h) GTD, and (i) CSF.
Table 2
Hyper-parameters settings of the proposed model.
Method | Band attention | Stage1 | Stage2 | Stage3
HA-ResNet | [32 × 1, 16; 3 × 3, 16; 3 × 3, 16] | [1 × 1, 16; 3 × 3, 16; 1 × 1, 16] × 3 | [1 × 1, 32; 3 × 3, 32; 1 × 1, 32] × 4 | [1 × 1, 64; 3 × 3, 64; 1 × 1, 64] × 3
Activation function = 'relu', loss function = 'categorical crossentropy', optimizer = 'adam', initial LR = 0.0015, batch size = 32, epochs = 30, dropout = 0.5
* [] refers to a block, written as [height × width, channels] × blocks; the rest are default parameters if not specified.
** Because the proposed model involves many hyper-parameters, the max-pooling, average-pooling, batch-normalization, and other layers are not listed in the table.
[Figure 10: labeled components: Speed Controller, Driving Motor, Accelerometer, 2 Stage Planetary Gearbox, Parallel Gearbox, Magnetic Brake.]
Fig. 10. Drivetrain Diagnostics Simulator testbed.
Table 1
Health conditions of DDS simulation experiment.
Healthy condition | Labels | Description
0 | Normal | Normal condition
1 | SFB | Seeded fault on a ball in a bearing
2 | SFI | Seeded fault on the inner raceway of a bearing
3 | SFO | Seeded fault on the outer raceway of a bearing
4 | GTRC | Gear tooth root crack
5 | GTB | Gear tooth breakage
6 | GSA | Gear surface abrasion
7 | GTD | Gear tooth deficiency
8 | CSF | Composite seeded bearing
evaluated through the wavelet coefficients dataset to verify the effect of 
the hybrid attention improvement. Table 3 shows the confusion matrix 
for the binary classification, which presents the complete evaluation of 
the classification results. 
The accuracy refers to the percentage of the recognition results that 
are correctly judged, i.e., the positive ones are recognized as positive, 
and the negative ones are recognized as negative. So the accuracy is 
defined as follows. 
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FN + FP} \tag{14}$$
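Eq. (14) can be checked numerically with a few lines of Python. This is a minimal sketch: the function names and the example counts are hypothetical, and the multi-class variant (correct predictions on the diagonal divided by all predictions) is the usual generalization rather than something stated in the paper.

```python
def accuracy(tp, tn, fp, fn):
    """Binary accuracy as in Eq. (14): correct decisions over all decisions."""
    total = tp + tn + fn + fp
    if total == 0:
        raise ValueError("the confusion matrix is empty")
    return (tp + tn) / total


def multiclass_accuracy(matrix):
    """Multi-class analogue: diagonal of the confusion matrix over its grand total."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("the confusion matrix is empty")
    return correct / total


# Hypothetical counts, chosen only to exercise the formula.
binary_example = accuracy(tp=45, tn=40, fp=5, fn=10)
multi_example = multiclass_accuracy([[8, 2], [1, 9]])
print(binary_example, multi_example)
```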
The above experiment was carried out ten times, as shown in Fig. 12, and 
the average value was taken as the final result, as shown in Table 4. 
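The averaging over ten trials can be sketched as follows. The trial accuracies below are hypothetical placeholders, not the values plotted in Fig. 12, and reading the "±" figure as a sample standard deviation is an assumption; the paper does not say which spread measure it reports.

```python
from statistics import mean, stdev


def summarize(trial_accuracies):
    """Return (mean, sample standard deviation) of repeated test accuracies."""
    if len(trial_accuracies) < 2:
        raise ValueError("at least two trials are needed to estimate a spread")
    return mean(trial_accuracies), stdev(trial_accuracies)


# Hypothetical accuracies (%) for ten repeated runs of one method.
trials = [98.4, 98.9, 99.1, 98.6, 98.8, 99.0, 98.5, 98.7, 99.2, 98.8]
avg, spread = summarize(trials)
print(f"{avg:.2f} ± {spread:.2f}")
```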
It can be seen from Table 4 that on the raw vibration signal sub-dataset, 
the recognition accuracy of CNN and WDCNN is 87.13% and 94.13%, 
respectively, which shows that WDCNN performs better on the raw 
dataset. On the wavelet coefficient dataset, the accuracy of CNN rises by 
8.30 percentage points compared with the raw dataset, whereas that of 
WDCNN drops by 0.99 percentage points, indicating that on this dataset 
the wavelet packet transform can effectively improve the recognition 
rate of the two-dimensional (2D) CNN but cannot improve WDCNN. In 
summary, CNN on the wavelet coefficient dataset is 1.30 percentage 
points more accurate than WDCNN on the raw signal. 
Besides, comparing the diagnosis results on the wavelet packet 
coefficient dataset, the accuracy of HA-ResNet is 3.36, 5.65, and 2.29 
percentage points higher than that of CNN, WDCNN, and ResNet, 
respectively, which shows that the proposed HA-ResNet method extracts 
fault features from the wavelet coefficients more effectively. 
T-distributed stochastic neighbor embedding (t-SNE) was applied to 
visualize the feature extraction and recognition process of the proposed 
method more intuitively, as shown in Fig. 13. The output features of 
Stage 1 present preliminary clustering, as shown in Fig. 13 (a). 
Fig. 13 (b) and (c) show that the boundaries of the feature clusters 
become clearer in deeper network layers, although some fault boundaries 
are still not clear enough, such as those between classes 0 and 1. 
Finally, Fig. 13 (d) shows that the fault features of each category 
present well-defined boundaries, which verifies the effectiveness of the 
proposed method. 
3.2. Engineering applications 
3.2.1. Data description 
The dataset in this paper was collected from several 2 MW wind turbine 
gearboxes in a wind farm. The structure of the wind turbine transmission 
system is shown in Fig. 14 (a); it mainly includes the main shaft, a 
one-stage planetary gear train, and a two-stage parallel gear train 
(including the intermediate shaft and shaft 2). To ensure stable 
operation of the wind turbine gearboxes during long-term service and to 
find faults in time, vibration sensors were arranged on the gearboxes. 
The condition monitoring system (CMS) collected vibration data to 
monitor the health status of the wind turbine gearboxes in real time, 
saving the vibration data every 4 h. The position of the sensor used in 
this paper is shown in Fig. 14 (b). Under severe working conditions such 
as long-term variable speed and variable load, key components of the 
wind turbines, such as gears and bearings, failed. After a long period 
of fault accumulation, seven types of vibration data of wind turbine 
gearboxes in different health states were collected, as shown in Table 5. 
The vibration data under seven health conditions measured from the 
output end on the low-speed shaft (shaft 2) of the wind turbine gearbox 
are selected for model training and testing. The sampling rate is 
25,600 Hz, each sampling time is 5.12 s, and each group therefore 
contains 131,072 points. To obtain more samples, multiple groups of 
vibration data before gearbox failure are collected. Every 4096 points 
form one sample, and the total sample size for each type of health 
status is 900. This paper takes the first 200 and the first 500 samples 
as two training datasets, and 400 of the remaining samples as the test 
dataset, to conduct the method verification experiments. 
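The sample bookkeeping described above can be verified with a short calculation. This is a sketch under stated assumptions: the text does not say whether samples overlap or which 400 of the remaining samples form the test set, so the code assumes non-overlapping windows and takes the last 400 samples for testing; the names `segment` and `split` are illustrative.

```python
import math

SAMPLING_RATE_HZ = 25_600   # from the text
RECORD_SECONDS = 5.12       # from the text
SAMPLE_LENGTH = 4096        # points per sample, from the text
SAMPLES_PER_CLASS = 900     # from the text
TEST_SIZE = 400             # from the text

points_per_record = round(SAMPLING_RATE_HZ * RECORD_SECONDS)
samples_per_record = points_per_record // SAMPLE_LENGTH
sample_seconds = SAMPLE_LENGTH / SAMPLING_RATE_HZ
records_needed = math.ceil(SAMPLES_PER_CLASS / samples_per_record)


def segment(record, length=SAMPLE_LENGTH):
    """Cut one record into non-overlapping samples (an assumption; overlap is not described)."""
    return [record[i:i + length] for i in range(0, len(record) - length + 1, length)]


def split(samples, n_train, n_test=TEST_SIZE):
    """First n_train samples for training, last n_test for testing (assumed selection rule)."""
    if n_train + n_test > len(samples):
        raise ValueError("training and test sets would overlap")
    return samples[:n_train], samples[-n_test:]


class_samples = list(range(SAMPLES_PER_CLASS))  # stand-in indices for one health state
train_200, test_200 = split(class_samples, 200)
train_500, test_500 = split(class_samples, 500)
print(points_per_record, samples_per_record, sample_seconds, records_needed)
```

Taking the last 400 samples keeps the test set identical for both training sizes, which makes the 200-sample and 500-sample results directly comparable; other selection rules are equally compatible with the text.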
3.2.2. Results and analysis 
The WPT was employed to present the time–frequency features of the raw 
vibration signals, as Fig. 15 shows, with parameters consistent with 
the DDS simulation experiment settings. It can be seen from Fig. 15 that 
the coefficient values of the various frequency bands differ 
considerably, and some frequency bands may contribute little to model 
identification. Therefore, Fig. 15 also shows the wavelet coefficients 
after the frequency-band attention mechanism has applied its attention 
weights. From the 
Fig. 12. Detailed testing accuracy in ten trials of different methods. 
(Vertical axis: accuracy, 50 to 100; horizontal axis: 1st to 10th trial; series: CNN_raw, WDCNN_raw, CNN_wave, WDCNN_wave, 
Resnet_wave, HA-ResNet_wave.) 
Table 4 
Comparison of results between HA-ResNet and other methods under DDS 
simulation data. 
Input | Method | Accuracy (%) 
Raw signals | CNN | 87.13 ± 2.57 
Raw signals | WDCNN | 94.13 ± 1.39 
Wavelet coefficients | CNN | 95.43 ± 1.08 
Wavelet coefficients | WDCNN | 93.14 ± 0.61 
Wavelet coefficients | ResNet | 96.50 ± 1.26 
Wavelet coefficients | HA-ResNet (Ours) | 98.79 ± 0.34 
Table 3 
The confusion matrix for the binary classification. 
Total population | Condition positive | Condition negative 
Predicted condition positive | True positive (TP) | False positive (FP, Type I error) 
Predicted condition negative | False negative (FN, Type II error) | True negative (TN) 
visualization results of the weighted coefficient graph, it can be seen that 
part of the value information of wavelet coefficients with a small 
amplitude is highlighted, which gives the deep learning model a more 
effective ability to extract the weak fault features of the WTs. 
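To make the notion of frequency bands of wavelet packet coefficients concrete, the sketch below decomposes one 4096-point sample into bands with a full wavelet packet tree. It rests on assumptions that this section does not confirm: the Haar wavelet and a depth of 5 levels are chosen only for illustration, the test signal is synthetic, and the bands are returned in natural filter-bank order rather than sorted by frequency.

```python
import math


def haar_split(x):
    """One orthonormal Haar step: returns the (low-pass, high-pass) halves of x."""
    s = 1.0 / math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    high = [(x[i] - x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    return low, high


def wavelet_packet(signal, level):
    """Full wavelet packet tree down to `level`; returns 2**level coefficient bands.

    Unlike the plain wavelet transform, both the low-pass and the high-pass
    branch are split again at every level.
    """
    if len(signal) % (2 ** level) != 0:
        raise ValueError("signal length must be divisible by 2**level")
    bands = [list(signal)]
    for _ in range(level):
        bands = [half for band in bands for half in haar_split(band)]
    return bands


# Synthetic stand-in for one 4096-point vibration sample (not measured data).
fs = 25_600
sample = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(4096)]
bands = wavelet_packet(sample, level=5)  # level 5 is an illustrative choice

energy_in = sum(v * v for v in sample)
energy_out = sum(v * v for band in bands for v in band)
print(len(bands), len(bands[0]), energy_in, energy_out)
```

Because the Haar step is orthonormal, the total energy of the coefficients equals that of the input, so any per-band weighting applied afterwards changes the relative emphasis of the bands rather than being an artifact of the transform.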
Similarly, CNN and WDCNN on the raw dataset were compared with CNN, 
WDCNN, ResNet, and the hybrid attention improved ResNet proposed in this 
paper on the dataset after WPT, with parameters consistent with the 
settings in the DDS simulation experiment. The comparison results are 
shown in Table 6. To ensure the reliability of the results, the average 
test accuracy of 10 experiments is used as the indicator of the fault 
diagnosis effect. Moreover, to compare the results of each method more 
intuitively, Fig. 16 (a) presents a histogram of the accuracy of each 
method, and Fig. 16 (b) shows how the test accuracy of each method with 
500 training samples varies with epoch. 
In the experiment based on the raw data, it can be seen from Table 6 
that the traditional CNN has a recognition rate of 76.88% for 200 
training samples and 84.15% for 500 training samples. In comparison, 
WDCNN reaches 81.75% and 88.82%, which indicates that WDCNN, with its 
first-layer one-dimensional wide convolution, can more effectively 
extract the fault features in the raw vibration signal. After performing 
WPT on the raw data, the accuracy of CNN increases to 89.08% and 93.59%, 
and that of WDCNN increases to 87.26% and 92.90%, which shows that WPT 
pre-processing can effectively improve the recognition accuracy of deep 
learning. In particular, the accuracy of the 2D CNN increases by 12.20 
and 9.44 percentage points, respectively, and surpasses that of WDCNN. 
The accuracy of WDCNN also improves, but by a smaller margin of 5.51 and 
4.08 percentage points, respectively. The reason is that the 2D CNN can 
extract feature information of the time and frequency domains 
simultaneously. Accordingly, the above results show that the wavelet 
transform of the raw vibration signal is a practical pre-processing 
approach to improve the accuracy of the deep learning model, although 
the level of improvement varies with the model structure. 
Fig. 13. The feature visualization under wavelet coefficients dataset of the proposed method by t-SNE: (a) Stage 1 output features; 
(b) Stage 2 output features; (c) Stage 3 output features; (d) last layer output features. (Axes: Dim 1 and Dim 2.) 
In the experimental results on the dataset after WPT, the accuracy of 
ResNet increases to 92.45% for 200 training samples and 96.69% for 500 
training samples, which is 3.37 and 3.10 percentage points higher than 
that of the traditional CNN, and 5.19 and 3.79 percentage points higher 
than that of WDCNN. This shows that the backbone improvement of the 
convolutional network brings a certain degree of improvement to the 
recognition accuracy of deep learning. The HA-ResNet method goes 
further, achieving 95.97% for 200 training samples and 98.76% for 500 
training samples. This accuracy is 6.89 and 5.17 percentage points 
higher than that of the traditional CNN, 8.71 and 5.86 percentage points 
higher than that of WDCNN, and 3.52 and 2.07 percentage points higher 
than that of the traditional ResNet, respectively. 
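The accuracy gaps discussed here follow directly from the mean values in Table 6 and can be recomputed as below. This is a small arithmetic sketch: the dictionary keys and the helper `gap` are illustrative names, the values are the means printed in Table 6, and the differences are in percentage points.

```python
# Mean test accuracy (%) from Table 6: (200 training samples, 500 training samples).
TABLE_6 = {
    "CNN-raw": (76.88, 84.15),
    "WDCNN-raw": (81.75, 88.82),
    "CNN-wave": (89.08, 93.59),
    "WDCNN-wave": (87.26, 92.90),
    "ResNet-wave": (92.45, 96.69),
    "HA-ResNet-wave": (95.97, 98.76),
}


def gap(method, baseline):
    """Accuracy difference in percentage points for both training sizes."""
    return tuple(round(m - b, 2) for m, b in zip(TABLE_6[method], TABLE_6[baseline]))


# Gain from WPT pre-processing for the same network.
print("CNN, wave vs raw:", gap("CNN-wave", "CNN-raw"))
print("WDCNN, wave vs raw:", gap("WDCNN-wave", "WDCNN-raw"))

# Gaps between models on the wavelet coefficient dataset.
for baseline in ("CNN-wave", "WDCNN-wave", "ResNet-wave"):
    print("HA-ResNet-wave vs", baseline, gap("HA-ResNet-wave", baseline))
```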
Fig. 16 (a) shows the mean square error of the recognition results of 
each method. The fewer the training samples, the lower the stability of 
the model test results. Compared with CNN, the model stability of WDCNN 
is better, but ResNet and the HA-ResNet method are more stable still. In 
particular, the mean square error of the proposed method is only 0.52 
and 0.39, which shows that it is better than the comparison models not 
only in recognition accuracy but also in stability. 
To better understand the effect of the methods on the fault dataset of 
the wind turbine gearbox, this part visualizes the diagnosis results of 
each method with 500 training samples by the confusion matrix, as shown 
in Fig. 17. The 2-dimensional reduction by t-SNE [39] of the data from 
the last hidden fully-connected layer is shown in Fig. 18. Comparing (a) 
and (b) of Fig. 17, WDCNN performs better than CNN when the raw data is 
used as input for fault diagnosis. The comparison of (a) and (b) of 
Fig. 18 confirms that the feature clustering boundary of WDCNN is more 
discriminative, although some categories still show feature confusion. 
With wavelet coefficients as the input data, the incorrectly classified 
samples of both CNN and WDCNN are reduced to various degrees, as the 
comparison of Fig. 17 (a)–(d) shows, and the clarity of the feature 
boundaries is enhanced, as the comparison of Fig. 18 (a)–(d) shows. 
Moreover, Figs. 17 (e) and 18 (e) present the recognition results and 
feature clustering of the ResNet, which show that the backbone 
improvement effectively improves the model’s performance for wind 
turbine fault diagnosis. Finally, Figs. 17 (f) and 18 (f) show that the 
features of the HA-ResNet model are much more separable than those of 
the other comparative methods, which indicates that the proposed method 
can learn discriminative features from vibration data. 
4. Conclusion 
This paper presents a hybrid attention deep ResNet-based fault 
diagnosis method for wind turbine gearboxes. The proposed method 
addresses the challenge of time–frequency information processing and 
weak feature extraction of vibration signals in fault diagnosis. 
(1) Wavelet packet decomposition can effectively present the 
time–frequency information features of the vibration signal, and 
combined with 2D CNN, it can more effectively extract the 
time–frequency domain features. In the experimental results on the 
wind farm’s measured fault data, the wavelet packet decomposition 
improved the recognition accuracy of both CNN and WDCNN; the 
improvement was largest for CNN, at 12.20 and 9.44 percentage 
points for 200 and 500 training samples, respectively. 
(2) By improving the deep learning model with the frequency-band 
attention mechanism, the frequency bands that carry weak fault 
features in the wavelet coefficients are highlighted. The fault 
recognition 
Fig. 14. The schematic diagram of fault location and measuring point: (a) schematic diagram of the wind turbine transmission system and 
the location of faulty components; (b) shaft 2 output end of the gearbox (location of sensor installation). 
(Labeled components: faulty bearing, elastic supports, generator, pitch-regulated device, speed detectors, faulty gear, location of 
vibration sensor.) 
Table 5 
Labels and health status descriptions of wind turbine gearboxes. 
Label | Health status | Description 
0 | Normal | Normal status 
1 | BW | Ball wear of the front bearing of the planet carrier 
2 | BFA | Ball falling of the rear bearing of the shaft 2 large gear 
3 | RBR | Retainer breaking of the rear bearing of the shaft 2 small gear 
4 | RCR | Retainer cracking of the rear bearing of the shaft 2 large gear 
5 | PC | Pitting corrosion of intermediate shaft gear 
6 | BFL | Ball flaking of the rear bearing of the shaft 2 large gear 
Fig. 15. The visualization of wavelet coefficient band attention mechanism: (a) attention weights; (b) Normal; (c) BW; (d) BFA; (e) RBR; 
(f) RCR; (g) PC; (h) BFL. (Coefficient maps: frequency domain against time domain; waveforms: amplitude against time.) 
accuracy under the strong noise background is improved. 
Further, through the channel attention mechanism, channels are 
automatically given different attention weights through 
backpropagation, which improves the learned feature 
representations and boosts the ResNet’s performance. On the 
measured fault data from a wind farm, the proposed method reaches 
95.97% and 98.76% accuracy with 200 and 500 training samples, 
respectively, higher than CNN-raw, WDCNN-raw, CNN-wave, 
WDCNN-wave, and ResNet. 
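The idea of giving each channel its own attention weight can be illustrated with a generic squeeze-and-excitation-style gate. This is a sketch of the general technique only, not the authors' layer: the single linear gate, the sigmoid, and the fixed gate parameters are assumptions made here, whereas in a trained network the gate parameters would be learned by backpropagation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def channel_attention(feature_maps, gate_weights, gate_bias):
    """Rescale each channel by a weight in (0, 1) computed from channel statistics.

    feature_maps: list of C channels, each a 2D list (H x W).
    gate_weights: C x C matrix and gate_bias: length-C vector of the gate;
    both are fixed here for illustration.
    """
    # Squeeze: global average pooling turns every channel into one number.
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_maps]
    # Excite: one linear layer followed by a sigmoid gives one weight per channel.
    weights = [
        sigmoid(sum(w * s for w, s in zip(row, squeezed)) + b)
        for row, b in zip(gate_weights, gate_bias)
    ]
    # Rescale: every value of a channel is multiplied by that channel's weight.
    rescaled = [[[v * a for v in row] for row in ch] for ch, a in zip(feature_maps, weights)]
    return weights, rescaled


# Two hypothetical 2 x 2 channels: one active, one silent.
maps = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
identity_gate = [[1.0, 0.0], [0.0, 1.0]]
weights, rescaled = channel_attention(maps, identity_gate, [0.0, 0.0])
print(weights)
```

The same squeeze, gate, and rescale pattern applies to frequency bands if the pooling is taken over each band of wavelet coefficients instead of each convolution channel.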
In summary, the proposed method can effectively extract the 
time–frequency features of the vibration signal and highlight the weak 
fault features of frequency bands and the valuable feature information in various 
Fig. 17. The confusion matrix of test results of each method under 500 training samples: (a) CNN-raw; (b) WDCNN-raw; (c) CNN-wave; 
(d) WDCNN-wave; (e) ResNet-wave; (f) HA-ResNet-wave (Ours). 
Table 6 
Comparison of results between HA-ResNet and other methods. 
Method | Accuracy (%), 200 training samples | Accuracy (%), 500 training samples 
CNN-raw | 76.88 ± 2.29 | 84.15 ± 1.09 
WDCNN-raw | 81.75 ± 1.63 | 88.82 ± 1.46 
CNN-wave | 89.08 ± 2.92 | 93.59 ± 2.21 
WDCNN-wave | 87.26 ± 1.48 | 92.90 ± 2.50 
ResNet-wave | 92.45 ± 1.15 | 96.69 ± 0.66 
HA-ResNet-wave (Ours) | 95.97 ± 0.52 | 98.76 ± 0.39 
Fig. 16. Histogram of fault vibration results and test accuracy at 500 training samples: (a) comparison of fault diagnosis accuracy of 
each method; (b) test accuracy curve (accuracy against epochs, 0 to 30, for 500-CNN-raw, 500-WDCNN-raw, 500-CNN-wave, 
500-WDCNN-wave, 500-ResNet-wave, and 500-HA-ResNet-wave). 
channels. The experimental results on the wind farm’s measured fault 
data show that this design effectively improves the fault diagnosis 
accuracy of the wind turbine gearbox. However, the frequency band 
attention is not recommended for fault diagnosis methods that use the 
raw time-domain data, because of the time-shifted characteristics of 
the original vibration signal. 
CRediT authorship contribution statement 
Kai Zhang: Conceptualization, Software, Writing - original draft, 
Validation. Baoping Tang: Methodology, Writing - review & editing. 
Lei Deng: Methodology, Supervision, Data curation. Xiaoli Liu: Visu-
alization, Writing - review & editing. 
Declaration of Competing Interest 
The authors declare that they have no known competing financial 
interests or personal relationships that could have appeared to influence 
the work reported in this paper. 
Acknowledgments 
This research is supported by the Science and Technology Projects in 
Chongqing (cstc2019jcyj-zdxmX0026), the National Natural Science 
Foundation of China (No. 51775065), and the National Key Research 
and Development Project (2020YFB1709800). 
References 
[1] W. Qiao, D. Lu, A Survey on Wind Turbine Condition Monitoring and Fault 
Diagnosis - Part I: Components and Subsystems, IEEE Trans. Ind. Electron. 62 (10) 
(2015) 6536–6545, https://doi.org/10.1109/TIE.2015.2422112. 
[2] W. Qiao, D. Lu, A Survey on Wind Turbine Condition Monitoring and Fault 
Diagnosis - Part II: Signals and Signal Processing Methods, IEEE Trans. Ind. 
Electron. 62 (10) (2015) 6546–6557, https://doi.org/10.1109/TIE.2015.2422394. 
[3] L. Wang, Z. Zhang, H. Long, J. Xu, R. Liu, Wind Turbine Gearbox Failure 
Identification with Deep Neural Networks, IEEE Trans. Ind. Informatics. 13 (3) 
(2017) 1360–1368, https://doi.org/10.1109/TII.2016.2607179. 
[4] K. Zhang, B. Tang, Y. Qin, L. Deng, Fault diagnosis of planetary gearbox using a 
novel semi-supervised method of multiple association layers networks, Mech. Syst. 
Signal Process. 131 (2019) 243–260, https://doi.org/10.1016/j. 
ymssp.2019.05.049. 
[5] F. Chen, Y. Yang, B. Tang, B. Chen, W. Xiao, X. Zhong, Performance degradation 
prediction of mechanical equipment based on optimized multi-kernel relevant 
vector machine and fuzzy information granulation, Meas. J. Int. Meas. Confed. 151 
(2020) 107116, https://doi.org/10.1016/j.measurement.2019.107116. 
[6] G.E. Hinton, R.R. Salakhutdinov, Reducing the dimensionality of data with neural 
networks, Science 313 (2006) 504–507, https://doi.org/10.1126/science.1127647. 
[7] C. Choy, J. Gwak, S. Savarese, 4D Spatio-Temporal ConvNets: Minkowski 
Convolutional Neural Networks, 2019. http://arxiv.org/abs/1904.08755. 
[8] P. Gendron, B. Nandram, Best wavelet packet bases in a rate-distortion sense, 
J. Agric. Biol. Environ. Stat. 6 (2001) 160–175, https://doi.org/10.1111/1467- 
9876.00135. 
[9] Z. Li, J. Yang, Z. Liu, X. Yang, G. Jeon, W. Wu, Feedback network for image super- 
resolution, in: Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. 2019- 
June, 2019, pp. 3862–3871. https://doi.org/10.1109/CVPR.2019.00399. 
[10] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to 
document recognition, Proc. IEEE. 86 (1998) 2278–2323, https://doi.org/ 
10.1109/5.726791. 
[11] W. Zhang, C. Li, G. Peng, Y. Chen, Z. Zhang, A deep convolutional neural network 
with new training methods for bearing fault diagnosis under noisy environment 
and different working load, Mech. Syst. Signal Process. 100 (2018) 439–453, 
https://doi.org/10.1016/j.ymssp.2017.06.022. 
[12] R. Liu, G. Meng, B. Yang, C. Sun, X. Chen, Dislocated Time Series Convolutional 
Neural Architecture: An Intelligent Fault Diagnosis Approach for Electric Machine, 
IEEE Trans. Ind. Informatics. 13 (3) (2017) 1310–1320, https://doi.org/10.1109/ 
TII.2016.2645238. 
[13] J. Pan, Y. Zi, J. Chen, Z. Zhou, B. Wang, LiftingNet: A Novel Deep Learning 
Network with Layerwise Feature Learning from Noisy Mechanical Data for Fault 
Classification, IEEE Trans. Ind. Electron. 65 (6) (2018) 4973–4982, 
https://doi.org/10.1109/TIE.2017.2767540. 
[14] L. Jing, M. Zhao, P. Li, X. Xu, A convolutional neural network based feature 
learning and fault diagnosis method for the condition monitoring of gearbox, Meas. 
J. Int. Meas. Confed. 111 (2017) 1–10, https://doi.org/10.1016/j. 
measurement.2017.07.017. 
[15] M. Zhao, S. Zhong, X. Fu, B. Tang, M. Pecht, Deep Residual Shrinkage Networks for 
Fault Diagnosis, IEEE Trans. Ind. Informatics. 16 (7) (2020) 4681–4690, 
https://doi.org/10.1109/TII.2019.2943898. 
[16] J. Jiao, M. Zhao, J. Lin, C. Ding, Deep coupled dense convolutional network with 
complementary data for intelligent fault diagnosis, IEEE Trans. Ind. Electron. 66 
(12) (2019) 9858–9867, https://doi.org/10.1109/TIE.2019.2902817. 
[17] W. Zhang, G. Peng, C. Li, Y. Chen, Z. Zhang, A new deep learning model for fault 
diagnosis with good anti-noise and domain adaptation ability on raw vibration 
signals, Sensors (Switzerland). 17 (2) (2017) 425, https://doi.org/10.3390/s17020425. 
Fig. 18. 2-dimensional visualization under 500 training samples of each method by t-SNE: (a) CNN; (b) WDCNN; (c) CNN-wave; 
(d) WDCNN-wave; (e) ResNet-wave; (f) HA-ResNet-wave (Ours). (Axes: Dim 1 and Dim 2.) 
[18] X. Li, W. Zhang, Q. Ding, Understanding and improving deep learning-based rolling 
bearing fault diagnosis with attention mechanism, Sign. Process. 161 (2019) 
136–154, https://doi.org/10.1016/j.sigpro.2019.03.019. 
[19] D. Peng, Z. Liu, H. Wang, Y. Qin, L. Jia, A novel deeper one-dimensional CNN with 
residual learning for fault diagnosis of wheelset bearings in high-speed trains, IEEE 
Access. 7 (2019) 10278–10293, https://doi.org/10.1109/ACCESS.2018.2888842. 
[20] X.u. Chang, B. Tang, Q. Tan, L. Deng, F. Zhang, One-dimensional fully decoupled 
networks for fault diagnosis of planetary gearboxes, Mech. Syst. Signal Process. 
141 (2020) 106482, https://doi.org/10.1016/j.ymssp.2019.106482. 
[21] R. Chen, X. Huang, L. Yang, X. Xu, X. Zhang, Y. Zhang, Intelligent fault diagnosis 
method of planetary gearboxes based on convolution neural network and discrete 
wavelet transform, Comput. Ind. 106 (2019) 48–59, https://doi.org/10.1016/j. 
compind.2018.11.003. 
[22] M. Zhao, M. Kang, B. Tang, M. Pecht, Deep Residual Networks with Dynamically 
Weighted Wavelet Coefficients for Fault Diagnosis of Planetary Gearboxes, IEEE 
Trans. Ind. Electron. 65 (5) (2018) 4290–4300, https://doi.org/10.1109/ 
TIE.2017.2762639. 
[23] Y. Han, B. Tang, L. Deng, Multi-level wavelet packet fusion in dynamic ensemble 
convolutional neural network for fault diagnosis, Meas. J. Int. Meas. Confed. 127 
(2018) 246–255, https://doi.org/10.1016/j.measurement.2018.05.098. 
[24] P. Liang, C. Deng, J. Wu, Z. Yang, Intelligent fault diagnosis of rotating machinery 
via wavelet transform, generative adversarial nets and convolutional neural 
network, Meas. J. Int. Meas. Confed. 159 (2020) 107768, https://doi.org/10.1016/ 
j.measurement.2020.107768. 
[25] M. Zhao, B. Tang, L. Deng, M. Pecht, Multiple wavelet regularized deep residual 
networks for fault diagnosis, Meas. J. Int. Meas. Confed. 152 (2020) 107331, 
https://doi.org/10.1016/j.measurement.2019.107331. 
[26] R. Yan, R.X. Gao, X. Chen, Wavelets for fault diagnosis of rotary machines: A 
review with applications, Sign. Process. 96 (2014) 1–15, https://doi.org/10.1016/ 
j.sigpro.2013.04.015. 
[27] W. Teng, X. Ding, X. Zhang, Y. Liu, Z. Ma, Multi-fault detection and failure analysis 
of wind turbine gearbox using complex wavelet transform, Renew. Energy. 93 
(2016) 591–598, https://doi.org/10.1016/j.renene.2016.03.025. 
[28] I.I.E. Amarouayache, M.N. Saadi, N. Guersi, N. Boutasseta, Bearing fault 
diagnostics using EEMD processing and convolutional neural network methods, 
Int. J. Adv. Manuf. Technol. 107 (9-10) (2020) 4077–4095, https://doi.org/ 
10.1007/s00170-020-05315-9. 
[29] Y. Zhang, G. Pan, B. Chen, J. Han, Y. Zhao, C. Zhang, Short-term wind speed 
prediction model based on GA-ANN improved by VMD, Renew. Energy. 156 (2020) 
1373–1388, https://doi.org/10.1016/j.renene.2019.12.047. 
[30] D. Bahdanau, K.H. Cho, Y. Bengio, Neural machine translation by jointly learning 
to align and translate, in: 3rd Int. Conf. Learn. Represent. ICLR 2015 - Conf. Track 
Proc., 2015, pp. 1–15. 
[31] V. Mnih, N. Heess, A. Graves, K. Kavukcuoglu, Recurrent models of visual 
attention, Adv. Neural Inf. Process. Syst. 3 (2014) 2204–2212. 
[32] M.T. Luong, H. Pham, C.D. Manning, Effective approaches to attention-based 
neural machine translation, in: Conf. Proc. - EMNLP 2015 Conf. Empir. Methods 
Nat. Lang. Process., 2015, pp. 1412–1421, https://doi.org/10.18653/v1/d15- 
1166. 
[33] H. Zhang, C. Wu, Z. Zhang, Y. Zhu, Z. Zhang, H. Lin, Y. Sun, T. He, J. Mueller, R. 
Manmatha, M. Li, A. Smola, ResNeSt: Split-Attention Networks, 2020. http://arxiv. 
org/abs/2004.08955. 
[34] Y H Chang, J L Chen, S L He, Intelligent Fault Diagnosis of Satellite Communication 
Antenna via a Novel Meta-learning Network Combining with Attention 
Mechanism, J. Phys. Conf. Ser. 1510 (2020) 012026, https://doi.org/10.1088/ 
1742-6596/1510/1/012026. 
[35] Jiayi Ma, Hao Zhang, Peng Yi, Zhongyuan Wang, SCSCN: A separated channel- 
spatial convolution net with attention for single-view reconstruction, IEEE Trans. 
Ind. Electron. 67 (10) (2020) 8649–8658, https://doi.org/10.1109/TIE.2019.2950866. 
[36] K. He, X. Zhang, S. Ren, J. Sun, Deep Residual Learning for Image Recognition, 
ArXiv Prepr. ArXiv1512.03385v1. 7 (2015) 171–180. https://doi.org/10.33 
89/fpsyg.2013.00124. 
[37] K. He, X. Zhang, S. Ren, J. Sun, Identity mappings in deep residual networks, Lect. 
Notes Comput. Sci. (Including Subser. Lect. Notes Artif. Intell. Lect. Notes 
Bioinformatics). 9908 LNCS (2016) 630–645, https://doi.org/10.1007/978-3-319- 
46493-0_38. 
[38] S. Ioffe, C. Szegedy, Batch normalization: Accelerating deep network training by 
reducing internal covariate shift, 32nd Int, Conf. Mach. Learn. ICML 2015 (1) 
(2015) 448–456. 
[39] M. Lin, Q. Chen, S. Yan, Network in network, in: 2nd Int. Conf. Learn. Represent. 
ICLR 2014 - Conf. Track Proc., 2014, pp. 1–10. 
[40] I. Loshchilov, F. Hutter, SGDR: Stochastic gradient descent with warm restarts, in: 
5th Int. Conf. Learn. Represent. ICLR 2017 - Conf. Track Proc., 2017, pp. 1–16. 
	A hybrid attention improved ResNet based fault diagnosis method of wind turbines gearbox
	1 Introduction
	2 Methods
	2.1 The standard convolution module
	2.2 The ResNet module
	2.3 The proposed hybrid attention ResNet
	2.3.1 Wavelet packet transform
	2.3.2 The proposed hybrid attention ResNet
	2.4 Fault diagnosis procedure based on hybrid attention ResNet
	3 Experiments and results
	3.1 Experimental validation
	3.1.1 Experimental setup and data description
	3.1.2 Results and analysis
	3.2 Engineering applications
	3.2.1 Data description
	3.2.2 Results and analysis
	4 Conclusion
	CRediT authorship contribution statement
	Declaration of Competing Interest
	Acknowledgments
	References
