Measurement 179 (2021) 109491
Available online 3 May 2021
0263-2241/© 2021 Elsevier Ltd. All rights reserved.
A hybrid attention improved ResNet based fault diagnosis method of wind
turbines gearbox
Kai Zhang, Baoping Tang *, Lei Deng *, Xiaoli Liu
The State Key Laboratory of Mechanical Transmission, Chongqing University, Chongqing 400030, China
ARTICLE INFO
Keywords:
Wind turbines
ResNet
Attention mechanism
Fault diagnosis
Wavelet transform
ABSTRACT
Improving the fault diagnosis performance of wind turbine gearboxes is of practical significance. In this paper, a hybrid
attention improved residual network (HA-ResNet) based method is proposed to diagnose wind turbine gearbox faults
by highlighting the essential frequency bands of wavelet coefficients and the fault features of
convolution channels. First, wavelet packet transformation (WPT) is performed on the raw signal, and
the ResNet is improved with band attention to highlight features of the wavelet coefficients. Second, a fault diagnosis
framework based on channel attention is designed to improve the nonlinear feature extraction ability
of deep convolutional networks. The proposed method is verified on a simulation dataset from the drivetrain
diagnostic simulator (DDS) and on measured data from a wind farm. The results illustrate the superior
performance of the HA-ResNet based fault diagnosis method in time–frequency feature extraction of vibration
signals, frequency band information enhancement, and recognition accuracy.
1. Introduction
Wind power has been widely valued worldwide as clean energy, and
the cumulative installed capacity of wind turbines is increasing year by
year. Meanwhile, among the key components of wind turbines (WTs),
the gearbox has received widespread attention in fault diagnosis
research. The reason is that the wind turbine gearbox is
subject to highly dynamic, repeated fluctuating loads, which makes it one of the
leading causes of wind turbine failures [1,2]. Because gearbox failures
cause extended downtime and high maintenance costs,
it is of great significance to quickly diagnose and locate faults.
At present, data-driven fault diagnosis of the wind turbine
speed-increasing gearbox drive system, which does not rely on accurate
physical models or rich signal processing experience, has
increasingly become a research hotspot in the field of fault diagnosis [3].
Qiao Wei et al. [1,2] reviewed condition monitoring and traditional
machine learning (TML) based fault diagnostic methods for components
and subsystems of WTs. As shown in Fig. 1 [1,2,4], the general process
of TML based methods is data acquisition, signal processing, feature
extraction, feature selection, and pattern recognition [5]. Although some
scholars remain committed to TML methods and
have produced excellent work, these methods have weak nonlinear feature extraction
capabilities, and their feature extraction and feature selection rely on
manual experience, so they cannot meet the need for intelligent diagnosis in
industrial applications.
Since 2006, deep learning, proposed by Hinton et al. [6], has become
increasingly widespread due to its highly abstract feature extraction. As
shown in Fig. 1, compared to the TML based fault diagnosis methods,
deep learning methods [7–9] provide a brand-new solution for fault
diagnosis of wind turbine gearboxes. On the one hand, they do not
require rich engineering experience to manually extract and
filter fault features. On the other hand, these methods possess powerful
nonlinear feature extraction capabilities for the coupled and
time-varying features in vibration signals. Among them, the convolutional
neural network (CNN), proposed by Yann LeCun et al. [10], has been
developed into a deep supervised learning algorithm that effectively
reduces the risk of overfitting through local receptive fields, weight sharing,
and pooling. Due to its significant advantages in feature extraction, deep
Abbreviations: WTs, Wind turbines; TML, Traditional machine learning; CNN, Convolutional neural networks; ResNet, Residual network; WPT, Wavelet packet
transformation; HA-ResNet, Hybrid attention improved resnet; ReLU, Rectified linear unit; BN, Batch normalization; BW, Band weights; DDS, Drivetrain diagnostic
simulator; WDCNN, Convolutional neural networks with wide first-layer kernels; SGDR, Stochastic gradient descent with warm restarts; t-SNE, t-distributed sto-
chastic neighbor embedding; CMS, Condition monitoring system.
* Corresponding authors.
E-mail addresses: bptang@cqu.edu.cn (B. Tang), denglei@cqu.edu.cn (L. Deng).
Contents lists available at ScienceDirect
Measurement
journal homepage: www.elsevier.com/locate/measurement
https://doi.org/10.1016/j.measurement.2021.109491
Received 19 November 2020; Received in revised form 13 April 2021; Accepted 25 April 2021
CNN has been widely used in fault detection and diagnosis [11–14].
Furthermore, researchers have improved the backbone of convolutional
networks to enhance their feature extraction capability [15,16]. These
works show that CNN and its extensions have an outstanding
capability for fault diagnosis tasks.
However, scholars are divided on how vibration signals should be
fed to deep neural networks for fault diagnosis. As shown in Fig. 1,
some hold that the raw vibration signals should be input directly
into the deep neural networks, since pre-processing loses part of
the information [17–20]. Researchers holding this
view have successfully trained deep learning frameworks with raw
vibration signals and have made various improvements to deep neural
networks for the original one-dimensional vibration signal to enhance the
model’s nonlinear feature extraction capability. For instance, Jun Pan
[13] proposed a deep learning network (Lifting Net) to learn features
adaptively from raw mechanical data without prior knowledge; Ruonan
Liu [12] designed a novel diagnosis framework based on the
characteristics of industrial vibration signals by adding a dislocate layer; W.
Zhang [11] presented a convolutional neural network with training
interference to address changes in working load and noise from the
working environment.
Conversely, another view is that it is hard to effectively extract fault
information from time-domain signals using only traditional deep
learning models, due to the variable working conditions and weak fault
features of wind turbine gearboxes in engineering practice [21–25].
Time–frequency analysis can compensate for this deficiency by
simultaneously presenting the features of one-dimensional
raw signals in the time domain and the frequency domain. As a classic
time–frequency analysis method, the wavelet transform and its
extensions have been widely applied to rotary machines
(including wind turbines) with TML methods [4,26,27]. Early
researchers have conducted related work in this direction. For instance, Renxiang
Chen et al. [21] presented intelligent fault diagnosis combining CNN and
the discrete wavelet transform for wind turbine gearboxes; Minghang Zhao
et al. [22] developed a variant of the deep residual network (ResNet), the
so-called ResNet with dynamically weighted wavelet coefficients (DRN +
DWWC), to improve diagnostic performance; Yan Han [23] proposed a
dynamic ensemble convolutional neural network for fault diagnosis by
fusing multi-level wavelet coefficients. Other time–frequency analysis
methods, such as ensemble empirical mode
decomposition [28] and variational mode decomposition [29], can also
be used for feature pre-extraction from vibration signals.
However, there is no consensus about which frequency band contains the
most intrinsic information about a planetary gearbox’s various health
statuses. Moreover, the contribution of each wavelet coefficient frequency
band varies from dataset to dataset.
The attention mechanism (AM) offers a flexible solution to this
problem. It was initially used in machine translation and image
recognition [30–33], for example to automatically (soft-)search for the
relevant parts of a source sentence. These works have laid a solid foundation
for the study of attention mechanisms in fault diagnosis. In this field,
work [18] introduced an attention mechanism to help
deep networks locate information in the raw data segment and extract
the input’s discriminative features. In addition, Y H Chang et al. [34]
proposed a novel meta-learning network with an adaptive ability to
assess the degree of correlation between various data, realizing state
recognition of shipborne antennas under small-sample conditions.
Inspired by those works, especially DRN + DWWC [22] and channel
attention networks [34,35], this paper proposes a hybrid attention
improved ResNet (HA-ResNet) based method to boost the performance
of ResNet and to diagnose faults of wind turbine gearboxes with high
accuracy. The proposed method highlights the useful frequency bands
of wavelet coefficients and the important fault feature information of the
convolution kernel channels. First, wavelet packet
transformation (WPT) is performed on the original signal to highlight the vibration
signal’s weak features. A fault diagnosis framework based on channel
attention networks is then designed to improve the nonlinear
feature extraction ability of deep convolutional networks. Finally, the
method is further improved with the band attention mechanism: different
attention weights are assigned to the wavelet coefficient bands to
further enhance the proposed model’s ability to recognize weak fault
features. The contributions of this paper are summarized as follows:
(1) Given that the contribution of frequency bands varies from data to
data, a frequency-band attention mechanism is
designed in the wavelet-ResNet fault diagnosis framework to
adaptively highlight the weak but crucial frequency bands in the wavelet
coefficients.
(2) Considering the randomness of feature-map extraction across
channels, a channel attention mechanism is designed to automatically obtain vital
channel features and boost the performance of deep networks.
The organization of the paper is as follows. Section 2 briefly presents
the basic theory of the standard convolution module, ResNet, and the
proposed HA-ResNet, as well as its fault diagnostic procedure. Section 3
analyzes and discusses the experimental diagnosis results and engi-
neering application for wind turbine gearbox. Finally, Section 4 gives
the overall conclusions.
[Figure 1: flowchart. Data acquisition (vibration signal, sound signal, et al.) and dataset (sample, label, et al.) feed two branches: signal processing (statistical analysis, STFT, WT, EMD, VMD, et al.) with feature extraction and selection for traditional ML based methods (SVM, ANN, HMM, K-NN, et al.), and DL based methods (SAE, DBN, CNN, ResNet, et al.) that take either the raw signal directly or a pre-processed signal as input.]
Fig. 1. The data-driven fault diagnosis process based on traditional machine learning and deep learning.
2. Methods
Since the method proposed in the paper involves related knowledge
of convolutional neural networks and ResNet, it is necessary to give a
brief introduction to the involved method before proposing the fault
diagnosis framework.
2.1. The standard convolution module
Generally, the standard convolution block contains a convolutional
layer and a pooling layer: the convolutional layer is composed of
several convolutional kernels (or filters) that compute feature maps, while
the pooling layer is a sub-sampling operation that improves computation
efficiency and makes features more robust. As shown in Fig. 2 [10], a general
convolutional neural network is formed by stacking
convolutional layers, pooling layers, and fully connected
layers, with the Softmax activation function before the final output layer.
In the convolutional layer, units in each layer are connected only to a
part of the units in the previous layer (convolution calculation rules),
namely local connections. The feature maps of the input data are
calculated through a number of filters, and the weights of each filter
are the same for all neurons in the upper layer, namely weight sharing.
Mathematically, the operation of the convolutional layer is formulated
as [16]:
$$z_j^{l+1} = \sigma\left(\sum_i x_i^l * w_{ij}^l + b_j^{l+1}\right) \qquad (1)$$
where $*$ is the operation of convolution; $x_i^l$ is the $i$th channel of feature
maps in the $l$th layer; $w_{ij}^l$ and $b_j^{l+1}$ denote the $j$th kernel and bias in the
corresponding layer, respectively; $z_j^{l+1}$ represents the $j$th channel of
feature maps in the $(l+1)$-th layer; $\sigma(\cdot)$ is the activation function aimed
to implement nonlinear transformation, such as the most commonly
used rectified linear unit (ReLU).
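As an illustration of Eq. (1), the following is a minimal NumPy sketch of a one-dimensional convolutional layer with a ReLU activation. The function name `conv_layer`, the array layouts, and the use of a "valid" (unpadded) convolution are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def conv_layer(x, w, b):
    """Sketch of Eq. (1) for 1-D feature maps.

    x: (C_in, L) input feature maps x_i^l
    w: (C_in, C_out, K) kernels w_ij^l (layout assumed for this sketch)
    b: (C_out,) biases b_j^{l+1}
    Returns ReLU-activated maps z_j^{l+1} with shape (C_out, L - K + 1).
    """
    c_in, length = x.shape
    _, c_out, k = w.shape
    z = np.zeros((c_out, length - k + 1))
    for j in range(c_out):
        for i in range(c_in):
            # np.convolve flips the kernel, i.e. it is a true convolution
            z[j] += np.convolve(x[i], w[i, j], mode="valid")
        z[j] += b[j]
    return np.maximum(z, 0.0)  # sigma = ReLU
```

Deep learning frameworks usually implement cross-correlation rather than flipped-kernel convolution; since the kernels are learned, the distinction does not change what the layer can represent.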
Then, the network is optimized iteratively through error backpropagation
between the predicted label and the real label until the network
converges. A loss function constrains this training process. The most
typically used cross-entropy loss (also used as the loss function in this
study) is expressed as follows [16]:
$$L_n = -\sum_{k=1}^{m} y_k \log\left(\hat{y}_k\right) \qquad (2)$$
where $m$ is the sample number of mini-batches, and $n$ is the iteration
index. $y$ and $\hat{y}$ denote the true labels and predicted labels, which are
generally expressed as one-hot vectors during implementation.
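A small numerical sketch of Eq. (2) is given below. It sums over the mini-batch exactly as the equation is written; the clipping constant `eps` is an assumption added only to avoid taking the logarithm of zero.

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Sketch of Eq. (2): summed cross-entropy over one mini-batch.

    y_true: (m, n_classes) one-hot true labels
    y_pred: (m, n_classes) predicted class probabilities
    eps is a hypothetical guard against log(0), not part of Eq. (2).
    """
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.sum(y_true * np.log(y_pred))
```

Many frameworks report the mean over the mini-batch instead of the sum, which rescales the loss by 1/m without changing its minimizer.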
2.2. The ResNet module
As neural networks grow deeper, training the CNN model becomes
increasingly difficult. In response, K. He et al. [36]
of Microsoft Research Asia proposed the ResNet model in 2015. The
ResNet model reduces the training difficulty of deep neural networks
by adding identity mappings to the ordinary CNN, which
facilitate the backpropagation of errors and the optimization of model parameters.
It has achieved good results in computer vision tasks such as
image recognition, image segmentation, and target positioning.
Essentially, the ResNet model is an upgraded version of the CNN
model, and it is the core method used in this paper for fault diagnosis of
wind turbine gearboxes. It is usually composed of several basic components,
including an input layer, a series of convolutional layers (Conv), batch
normalization (BN), identity mappings, global mean pooling, and a fully
connected output layer. Similar to the standard convolution
module, ResNet is built from basic residual blocks. A common basic
residual block is shown in Fig. 3 [37]: the main path performs
“BN → ReLU → Conv → BN → ReLU → Conv”, and its output is added to the cross-
layer path (identity mapping) to form a complete basic residual block.
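The main path and the identity mapping described above can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions: the signals are one-dimensional, `batch_norm` normalizes each channel of a single sample without learned scale and shift, and all function names are hypothetical.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Simplification: per-channel normalization of one sample, no learned
    # scale/shift and no batch statistics.
    mean = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def relu(x):
    return np.maximum(x, 0.0)

def conv_same(x, w):
    """Zero-padded 1-D convolution. x: (C_in, L); w: (C_out, C_in, K), K odd."""
    c_out, c_in, k = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    out = np.zeros((c_out, x.shape[1]))
    for j in range(c_out):
        for i in range(c_in):
            out[j] += np.convolve(xp[i], w[j, i], mode="valid")
    return out

def residual_block(x, w1, w2):
    """BN -> ReLU -> Conv -> BN -> ReLU -> Conv, then add the identity path."""
    h = conv_same(relu(batch_norm(x)), w1)
    h = conv_same(relu(batch_norm(h)), w2)
    return x + h
```

With all-zero kernels the block reduces to the identity mapping, which illustrates why the shortcut eases optimization: the block only has to learn a residual on top of its input.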
2.3. The proposed hybrid attention ResNet
2.3.1. Wavelet packet transform
In this study, the raw vibration signal is first converted into wavelet
coefficients through WPT, and the coefficients are then input into the subsequent neural
networks. Fig. 4 shows the process of WPT of vibration signals [26]. From the
mathematical perspective, a function in Hilbert space can be decomposed in WPT into a scale
function and a wavelet function. The scale
function can be used to construct a low-pass filter for the raw signal, and
the wavelet function can be used to build a high-pass filter for the raw
signal. From the perspective of signal processing, these filters
decompose the signal into high-frequency components (high-frequency
sub-band) and low-frequency components (low-frequency sub-band).
In this method, the high-frequency sub-band is also called the
detail sub-band, and the low-frequency sub-band is also called the
approximation sub-band.
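The recursive low-pass/high-pass splitting can be sketched as follows. For brevity the sketch swaps in the Haar wavelet rather than the Daubechies basis used later in the experiments, and the function names are hypothetical. The sub-bands are returned in natural (decomposition-tree) order, not sorted by frequency.

```python
import numpy as np

def haar_step(x):
    """One level of Haar filtering: approximation (low-pass) and detail (high-pass)."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return a, d

def wavelet_packet(x, level):
    """Split every sub-band at every level, unlike the plain wavelet transform,
    which only splits the approximation branch.

    Returns an array of shape (2**level, len(x) // 2**level); len(x) must be
    divisible by 2**level.
    """
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(level):
        next_nodes = []
        for node in nodes:
            a, d = haar_step(node)
            next_nodes.extend([a, d])
        nodes = next_nodes
    return np.stack(nodes)
```

For a 4096-point sample and a 6-level decomposition this yields a 64 × 64 coefficient matrix, i.e. 64 frequency bands with 64 time positions each. In practice a wavelet library such as PyWavelets would typically be used instead of a hand-written filter bank.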
[Figure 2: schematic of a convolutional neural network with an input layer, two convolutional layers, two pooling layers, a fully connected layer, and an output layer.]
Fig. 2. The Standard Convolution Module.
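[Figure 3: block diagram with BN, ReLU, Conv_1, BN, ReLU, and Conv_2 on the main path, whose output is added (+) to the shortcut path.]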
Fig. 3. A common basic residual block of ResNet.
The advantage of WPT is that it can simultaneously present the
time-domain and frequency-domain feature changes in the vibration signal.
For example, the time-domain waveforms in Fig. 5 (a) and (b) are
the simulated signals of formula (3) and formula (4). The weak frequency
component that distinguishes Fig. 5 (a) from (b) is
easily submerged by the other vibration components in the time domain, but
it is revealed by comparing the fast Fourier
transform and WPT plots. In contrast, in Fig. 5 (c) and (d) the
frequency components remain unchanged, and only
the time sequence of the frequency components changes. Comparing
the Fourier transform with WPT shows that the fast Fourier
transform can hardly reveal such changes over time,
whereas they are easy to distinguish in the wavelet packet coefficients.
$$f(t) = \cos(2\pi \times 10t) + \cos(2\pi \times 25t) + \cos(2\pi \times 50t) + \cos(2\pi \times 100t) \qquad (3)$$
$$f(t) = \cos(2\pi \times 10t) + \cos(2\pi \times 25t) + \cos(2\pi \times 35t)/2 + \cos(2\pi \times 50t) + \cos(2\pi \times 100t) \qquad (4)$$
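The two simulated signals can be generated and compared in the frequency domain with a few lines of NumPy. The sampling rate of 1000 Hz and the 1 s duration are assumptions chosen so that every component falls exactly on an FFT bin; they are not taken from the paper.

```python
import numpy as np

fs = 1000                      # assumed sampling rate in Hz
t = np.arange(fs) / fs         # 1 s of samples, giving 1 Hz frequency resolution

# Eq. (3): four equal-amplitude components
signal_3 = sum(np.cos(2 * np.pi * f * t) for f in (10, 25, 50, 100))
# Eq. (4): the same signal plus a weak 35 Hz component of amplitude 1/2
signal_4 = signal_3 + np.cos(2 * np.pi * 35 * t) / 2

def amplitude_spectrum(x):
    """Single-sided amplitude spectrum (valid for bins other than DC and Nyquist)."""
    return 2 * np.abs(np.fft.rfft(x)) / len(x)

freqs = np.fft.rfftfreq(len(t), d=1 / fs)
spec_3 = amplitude_spectrum(signal_3)
spec_4 = amplitude_spectrum(signal_4)
```

In this setting the only difference between `spec_3` and `spec_4` is the bin at 35 Hz, which mirrors the point made in the text: the weak component is hard to see in the waveform but is separated cleanly in the frequency domain.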
2.3.2. The proposed hybrid attention ResNet
This section presents the proposed fault diagnosis method,
including the proposed network architecture, the frequency band attention
mechanism, and the channel attention mechanism.
(1) The proposed networks architecture
The basic model architecture of this paper is a ResNet with a 34-layer
structure, as shown in Fig. 6. The main framework of the model is composed of
the input layer, frequency-band attention unit, batch normalization,
activation function, basic residual blocks, global mean pooling, dropout,
fully connected softmax layer, and output layer. In the input layer,
the proposed network takes the wavelet coefficients of raw
vibration signals as inputs. Each residual stage module is composed of several
basic residual blocks. Within a stage, the identity mapping structure
of the first basic residual block differs slightly from that of the remaining blocks:
the identity mapping of the first basic residual block has an
additional “global pooling-convolution-batch normalization” structure
to match the number of filters, as shown in Fig. 8 (a) and (b).
It is worth noting that, between the convolution and the activation
function, the ResNet used introduces batch normalization[38] to speed
up training and prevent overfitting. In this way, equation (1) is rewritten
as follows[38],
$$z_j^{l+1} = \sigma\left(\mathrm{BN}\left(\sum_i x_i^l * w_{ij}^l\right)\right) \qquad (5)$$
where BN is the batch normalization.
Another difference from the traditional CNN is that the ResNet
replaces the fully connected layer with global average pooling [39]. One
advantage of global average pooling over fully connected layers is
that it is more native to the convolution structure, since it enforces
correspondences between feature maps and categories [39]. At the same time,
this strategy reduces the number of parameters of the entire network.
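The parameter saving can be made concrete with a short count. The sketch below compares a dense classifier applied to flattened feature maps with the same classifier applied after global average pooling. The 64 channels and 9 classes follow the settings reported later in the paper, while the 8 × 8 feature-map size is a hypothetical value chosen only for illustration.

```python
def dense_params(n_inputs, n_classes):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_classes + n_classes

channels, n_classes = 64, 9
height, width = 8, 8  # hypothetical final feature-map size

# Flatten -> Dense: every spatial position of every channel gets its own weights
flatten_head = dense_params(channels * height * width, n_classes)
# Global average pooling -> Dense: one value per channel reaches the classifier
gap_head = dense_params(channels, n_classes)

print(flatten_head, gap_head)
```

Under these assumptions the classifier head shrinks by a factor of roughly H × W, and the count after pooling no longer depends on the input length.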
(2) The frequency band attention block
The time–frequency domain coefficients of WPT are explored for
feature extraction and fault diagnosis. It is easy to note that not all bands
in the wavelet coefficients contribute equally to fault
recognition. In addition, unlike the time-varying features in the time
domain, the signatures of different fault types appear in relatively fixed
frequency bands of the wavelet coefficients. This is the motivation to employ the attention
mechanism to locate the informative frequency bands concerning
gearboxes’ different health conditions.
As Fig. 7 shows, the frequency band attention module takes the
wavelet coefficients as inputs. To locate valuable information in the
data, the input wavelet coefficients, denoted $C$, are split into $N_b$ bands,
$$C = [c_1, c_2, \cdots, c_{N_b}], \quad c_i \in \mathbb{R}^{N} \qquad (6)$$
Then, the attention mechanism randomly initializes a set of weights $\alpha_i$
for $c_i$, which can be used to evaluate the importance of the
corresponding frequency band in fault diagnosis. After that, the band
weights (BW) $\alpha_i$ are computed by an attention net $f_{att}^{b}$, which is a
simple neural network that takes $c_i$ as input and outputs an index $d_i$,
followed by a softmax function that normalizes $d_i$. The mathematical description of
this process is as follows,
$$d_i = f_{att}^{b}(c_i) \qquad (7)$$
[Figure 4: wavelet packet decomposition tree. The signal S (amplitude versus time in seconds) is split level by level into approximation (A) and detail (D) sub-bands, from A1 and D1 down to the level-n nodes, arranged along time-domain and frequency-domain axes.]
Fig. 4. The process of WPT of vibration signals.
[Figure 5 panels: (a) Simulation signal 1, (b) Simulation signal 2, (c) Simulation signal 3, (d) Simulation signal 4.]
Fig. 5. Comparative analysis of simulation signals in time domain, frequency domain and time–frequency domain.
$$\alpha_i = \frac{e^{d_i}}{\sum_{k=1}^{N_b} e^{d_k}} \qquad (8)$$
At the same time, the original wavelet coefficients are sliced through
a custom Lambda layer. When the attention for the bands is generated,
the enhanced representation vector $V_b$ for the input data can be obtained
as
$$V_b = [v_1, v_2, \cdots, v_{N_b}], \quad v_i = \alpha_i c_i \qquad (9)$$
Afterward, as the enhanced representation of $C$, $V_b$ can be used for
further diagnosis.
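The band attention of Eqs. (6) to (9) can be sketched in NumPy as follows. In this sketch a single linear scoring function with weights `w` and bias `b` stands in for the attention net; that stand-in, like the function names, is a hypothetical simplification and not the convolutional attention branch of the paper.

```python
import numpy as np

def softmax(d):
    e = np.exp(d - d.max())  # subtracting the maximum improves numerical stability
    return e / e.sum()

def band_attention(C, w, b=0.0):
    """Sketch of Eqs. (6)-(9).

    C: (N_b, N) wavelet coefficients, one row per frequency band c_i
    w, b: parameters of a hypothetical linear scoring net replacing f_att
    Returns the re-weighted bands V_b and the band weights alpha.
    """
    d = C @ w + b              # Eq. (7): one score d_i per band
    alpha = softmax(d)         # Eq. (8): weights normalized over the bands
    V = alpha[:, None] * C     # Eq. (9): v_i = alpha_i * c_i
    return V, alpha
```

Because the weights sum to one over the bands, raising the weight of one band necessarily lowers the others, so the mechanism expresses relative rather than absolute importance.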
(3) The channel attention block
As the name suggests, convolutional neural networks rely on
convolution operations, using local receptive fields to fuse spatial
information and channel information to extract features. Each
convolutional layer has several filters, which learn local spatial
connection patterns across all channels. In other words, the
convolution filter extracts the fused spatial and
channel information in the local receptive area. Traditionally, however, the output of
the convolutional layer does not account for the differing influence
of each channel. Therefore, in this paper, the
channel attention mechanism is added to allow the network to selectively
enhance informative channel features, so that
subsequent processing can make full use of these features, and to suppress
the features of useless or low-effect channels.
This paper’s channel attention mechanism strategy is shown in Fig. 8
(b) and (c). Apart from a slight difference in the identity mapping
structure between the first basic residual block and the remaining ones,
which is needed to match the number of filters, the
principles are the same. The process is summarized as follows.
Assume that the feature $V_b$ after frequency band attention is
processed by a series of channel transformations $\{f_1^c, f_2^c, \cdots, f_{N_c}^c\}$. The signals
processed by the various feature channels are obtained as,
$$U_c = [u_1, u_2, \cdots, u_{N_c}], \quad u_i = f_i^c(V_b) \qquad (10)$$
Then, the global contextual information with embedded channel-
wise statistics is gathered by a global average pooling across spatial
dimensions,
$$s_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} G_{U_c}(i, j) \qquad (11)$$
where $H$ and $W$ are the block output feature-map sizes, and $G_{U_c}(i, j)$ is the
mapping used in the global average pooling process. The split weights of channel attention
are given by
$$\alpha_i^c = \begin{cases} \dfrac{\exp(s_c)}{\sum_{1}^{N_c} \exp(s_c)} & \text{if } N_c > 1 \\[2ex] \dfrac{1}{1 + \exp(-s_c)} & \text{if } N_c = 1 \end{cases} \qquad (12)$$
The cardinal channel representations are then concatenated along
the channel dimension,
$$V_c = \mathrm{Concat}\left\{\alpha_1^c u_1, \alpha_2^c u_2, \ldots, \alpha_{N_c}^c u_{N_c}\right\} \qquad (13)$$
Afterward, $V_c$, as the optimized representation of $U_c$, can be used for
further diagnosis.
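Eqs. (11) to (13) can be sketched in NumPy as follows. The sketch applies the softmax (or sigmoid, for a single channel) directly to the pooled statistics; any intermediate layers between the pooling and the softmax are omitted, and the function name and array layout are assumptions of this example.

```python
import numpy as np

def channel_attention(U):
    """Sketch of Eqs. (11)-(13).

    U: (N_c, H, W) feature maps u_i, one per channel
    Returns the re-weighted maps V_c and the channel weights alpha.
    """
    s = U.mean(axis=(1, 2))                  # Eq. (11): global average pooling
    if U.shape[0] > 1:
        e = np.exp(s - s.max())              # Eq. (12), N_c > 1: softmax
        alpha = e / e.sum()
    else:
        alpha = 1.0 / (1.0 + np.exp(-s))     # Eq. (12), N_c = 1: sigmoid
    V = alpha[:, None, None] * U             # Eq. (13): weighted maps stacked by channel
    return V, alpha
```

The channel with the larger pooled response receives the larger weight, so informative channels are amplified relative to weak ones before the next layer sees them.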
2.4. Fault diagnosis procedure based on hybrid attention ResNet
This paper performs fault diagnosis of wind turbine gearboxes based
on the model designed above. The detailed steps, shown in Fig. 9, are
summarized as follows:
(1) Obtain the vibration signal of the wind turbine gearbox through
an acceleration sensor;
(2) Analyze the failure reports of the wind turbines, label and slice the
corresponding vibration signals, and convert the original vibration
signal in each sample into wavelet coefficients through wavelet
packet transformation;
(3) Divide the data set into a training data set and a test data set, train
the designed hybrid attention ResNet using the training data set,
and select the best model as the trained model using the test data
set;
(4) Collect real-time vibration signals, perform WPT with the same
parameters as for the training set, and give fault diagnosis results
through the trained model.
3. Experiments and results
The gearbox is a crucial part of a wind turbine, so its high-precision fault
diagnosis is of great significance. A fault simulation dataset from the
drivetrain diagnostic simulator (DDS) testbed, and a dataset of measured
[Figure 7: block diagram of the band attention unit. The input is split into bands BC-1, BC-2, …, BC-Nb; an attention branch built from Conv, BN, ReLU, Dense, and Softmax layers produces the band weights BW-1, BW-2, …, BW-Nb, which are repeated and multiplied (×) with the corresponding bands.]
Fig. 7. The band attention block.
[Figure 6: network diagram with the components Input, Band attention, BN, ReLU, MaxPool2D, repeated Stage modules (each containing BasicBlock1, BasicBlock2, BasicBlock3, …, BasicBlockN), Average pooling, Dropout, Dense, Softmax, and Output.]
Fig. 6. The architecture of the proposed hybrid attention ResNet.
wind turbine gearbox failure in a wind farm were built to validate the
proposed method’s effectiveness. This section will explain the experi-
mental setup and analyze the results.
3.1. Experimental validation
3.1.1. Experimental setup and data description
The DDS testbed performs a fault simulation experiment for the
[Figure 9: flowchart. Vibration signals from the wind farm (amplitude versus time) pass through training dataset preprocessing and model training of the network shown in Fig. 6; real-time data are then fed to the trained model to produce the results.]
Fig. 9. The fault diagnosis procedure based on hybrid attention ResNet.
[Figure 8: block diagrams of BasicBlock1 and BasicBlockN. Each block combines Conv, BN, and ReLU layers with a channel attention branch (split channels SU-1, SU-2, …, SU-Nc; global average pooling; Conv, BN, ReLU, Conv; r-Softmax; split weights SW-1, SW-2, …, SW-Nc; element-wise multiplication) and a shortcut that is added (+) to the output. BasicBlock1 additionally shows average pooling and BN on the shortcut.]
(a) First basic block of ResNet block (b) Rest of basic block of ResNet block
Fig. 8. The channel attention ResNet block.
gearbox of wind turbines. The DDS, which is used to test the performance
of the proposed method, consists of a speed controller, a driving
motor, a 2-stage planetary gearbox, a 2-stage parallel shaft gearbox with
rolling bearings, and a programmable magnetic brake, as shown in
Fig. 10. In this experiment,
the driving motor’s speed ranges from 20 Hz to 32.85 Hz, and the nine
health conditions (including the normal condition) are listed in
Table 1. In each condition, 36 s of vibration signal data were collected
by NI 9234 and NI 9188 at a sampling frequency of 25,600 Hz, and the
collection was repeated four times.
To prove the effectiveness and advantages of the proposed model,
CNN [10], convolutional neural networks with wide first-layer kernels
(WDCNN) [17], and ResNet [36] are compared in this paper. The
hyper-parameter settings of the proposed model are shown in Table 2, and
the specific structures and hyper-parameter settings of the
comparison models follow the original papers. The learning rate adopts the
stochastic gradient descent with warm restarts (SGDR) adaptive
adjustment strategy [40]. Its parameters are set as factor = 0.01,
patience = 5, min_lr = 0.00005, and parameters not mentioned in the
table are default values.
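The parameter names factor, patience, and min_lr are those used by plateau-based learning-rate schedulers, such as the ReduceLROnPlateau callback in Keras. The sketch below assumes that reading and shows how such a rule would behave with the listed values and the initial learning rate of 0.0015 from Table 2. It is a simplified, hypothetical re-implementation for illustration, not the warm-restart schedule of [40] and not the authors' training code.

```python
def reduce_on_plateau(losses, lr=0.0015, factor=0.01, patience=5, min_lr=0.00005):
    """Return the learning rate in effect after each epoch.

    The rate is multiplied by `factor` (but never drops below `min_lr`) once the
    monitored loss has failed to improve for `patience` consecutive epochs.
    """
    best = float("inf")
    wait = 0
    history = []
    for loss in losses:
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history
```

Under this reading, the first reduction would take the rate from 0.0015 to 0.000015, which is below min_lr, so the rate would be clipped to 0.00005 and stay there.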
Besides, this paper uses Python (3.62) language, TensorFlow
(1.15.4), and Keras (2.3.1) framework to construct and optimize the
proposed model and comparison models. In terms of hardware config-
uration, this paper uses the Intel i7-6700 processor and NVIDIA’s
GeForce GTX 1080 GPU to shorten training and optimization time.
3.1.2. Results and analysis
The collected raw vibration signal of each status was divided into
900 non-overlapping samples of 4096 points each (an example is
shown in Fig. 11), and the ratio of the training set to the test set is 5:4.
Gaussian noise with a signal-to-noise ratio of 6 dB is added to each
sample’s raw vibration signal to simulate the background environment
of real working conditions. Then, the WPT was employed to present the
time–frequency features of the raw vibration signals. The
wavelet basis db3 was chosen, following the results of work [4]. A 6-level
decomposition was employed in this experiment, considering the
resolution balance between the time domain and the frequency domain.
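The segmentation and noise injection can be sketched as follows. The helper names are hypothetical, and the noise power is derived from the measured power of each sample, which is one common way to realize a target signal-to-noise ratio; the paper does not state how the noise level was computed.

```python
import numpy as np

def segment(signal, length=4096):
    """Cut a recording into non-overlapping samples of `length` points."""
    n = len(signal) // length
    return signal[: n * length].reshape(n, length)

def add_noise(x, snr_db, rng):
    """Add white Gaussian noise so that the signal-to-noise ratio equals snr_db."""
    p_signal = np.mean(x ** 2)
    p_noise = p_signal / 10 ** (snr_db / 10)
    return x + rng.normal(0.0, np.sqrt(p_noise), size=x.shape)

rng = np.random.default_rng(0)
recording = rng.standard_normal(36 * 25600)   # stand-in for one 36 s recording
samples = segment(recording)                  # 921600 / 4096 = 225 samples
noisy = add_noise(samples[0], snr_db=6, rng=rng)
```

One 36 s recording at 25,600 Hz contains 921,600 points, i.e. exactly 225 samples of 4096 points, so the four repetitions are consistent with the 900 samples per health condition stated above.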
After that, the pre-processed data was divided into sub-datasets of
the raw vibration signals and wavelet coefficients. The raw dataset was
used to compare CNN and WDCNN; then, the fault diagnosis accuracy of
CNN, WDCNN, ResNet, and the proposed HA-ResNet was also
[Figure 11: nine time-domain waveform panels, (a) to (i); horizontal axis: Time (s), vertical axis: Amplitude.]
Fig. 11. Raw vibration signals of (a) Normal, (b) SFB, (c) SFI, (d) SFO, (e) GTRC, (f) GTB, (g) GSA, (h) GTD, and (i) CSF.
Table 2
Hyper-parameter settings of the proposed model.
Method | Band attention | Stage1 | Stage2 | Stage3
HA-ResNet | [32 × 1, 16; 3 × 3, 16; 3 × 3, 16] | [1 × 1, 16; 3 × 3, 16; 1 × 1, 16] × 3 | [1 × 1, 32; 3 × 3, 32; 1 × 1, 32] × 4 | [1 × 1, 64; 3 × 3, 64; 1 × 1, 64] × 3
Activation function = ’relu’, loss function = ’categorical crossentropy’, optimizer = ’adam’, initial LR = 0.0015, batch size = 32, epochs = 30, dropout = 0.5
* [] refers to a block, written as [height × width, channels] × blocks; the rest are default parameters if not specified;
** Because the proposed model involves too many hyper-parameters, the max-pooling, average-pooling, batch-normalization, and other layers are not listed in the
table.
[Figure 10: labeled view of the testbed showing the speed controller, driving motor, accelerometer, 2-stage planetary gearbox, parallel gearbox, and magnetic brake.]
Fig. 10. Drivetrain Diagnostics Simulator testbed.
Table 1
Health conditions of DDS simulation experiment.
Health condition | Label | Description
0 | Normal | Normal condition
1 | SFB | Seeded fault on a ball in a bearing
2 | SFI | Seeded fault on the inner raceway of a bearing
3 | SFO | Seeded fault on the outer raceway of a bearing
4 | GTRC | Gear tooth root crack
5 | GTB | Gear tooth breakage
6 | GSA | Gear surface abrasion
7 | GTD | Gear tooth deficiency
8 | CSF | Composite seeded bearing
evaluated through the wavelet coefficients dataset to verify the effect of
the hybrid attention improvement. Table 3 shows the confusion matrix
for the binary classification, which presents the complete evaluation of
the classification results.
The accuracy refers to the percentage of the recognition results that
are correctly judged, i.e., the positive ones are recognized as positive,
and the negative ones are recognized as negative. So the accuracy is
defined as follows.
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FN + FP} \tag{14}$$
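Eq. (14) translates directly into code. The following is a minimal sketch; the function name `accuracy` and the example counts are hypothetical and chosen only to illustrate the formula.

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy as in Eq. (14): correctly judged results over all results."""
    total = tp + tn + fn + fp
    if total == 0:
        raise ValueError("the confusion matrix contains no samples")
    return (tp + tn) / total


# Hypothetical counts for illustration only (not taken from the experiments).
example = accuracy(tp=45, tn=40, fp=5, fn=10)
```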
The above experiment was carried out ten times, as shown in Fig. 12, and the
average value was taken as the final result, as shown in Table 4.
It can be seen from Table 4 that on the raw vibration signal sub-dataset, the
recognition accuracy of CNN and WDCNN is 87.13% and 94.13%, respectively,
which shows that WDCNN performs better on the raw dataset. On the wavelet
coefficient dataset, the accuracy of CNN increases by 8.30% compared with the
raw dataset, whereas that of WDCNN drops by 0.99%, indicating that on this
dataset the wavelet packet transform can effectively improve the recognition
rate of the 2-dimensional CNN but cannot improve WDCNN. In summary, the
accuracy of CNN on the wavelet coefficient dataset is 1.30% higher than that
of WDCNN on the raw signal.
Besides, comparing the diagnosis results on the wavelet packet coefficient
dataset, the accuracy of HA-ResNet is 3.36%, 5.65%, and 2.29% higher than that
of CNN, WDCNN, and ResNet, respectively, which shows that the proposed
HA-ResNet method can make more effective use of the wavelet coefficients.
T-distributed stochastic neighbor embedding (t-SNE) was applied to display the
feature extraction and recognition process of the proposed method more
intuitively, as shown in Fig. 13. The output features of Stage 1 present
preliminary clustering, as shown in Fig. 13 (a). As can be seen from
Fig. 13 (b) and (c), the boundaries of the feature clusters become clearer in
deeper network layers, although some fault boundaries are still not clear
enough, such as those of classes 0 and 1. Finally, Fig. 13 (d) shows that the
fault features of each category present well-defined boundaries, which
verifies the effectiveness of the proposed method.
3.2. Engineering applications
3.2.1. Data description
The dataset used in this paper was collected from several 2 MW wind turbine
gearboxes in a wind farm. The structure of the wind turbine transmission
system in the wind farm is shown in Fig. 14 (a); it mainly includes the main
shaft, a one-stage planetary gear train, and a two-stage parallel gear train
(including the intermediate shaft and shaft 2). To ensure stable operation of
the wind turbine gearboxes during long-term service and to find faults in
time, vibration sensors were arranged on the gearboxes. The condition
monitoring system (CMS) was used to collect vibration data to monitor the
health status of the wind turbine gearboxes in real time, saving the vibration
data every 4 h. The position of the sensor used in this paper is shown in
Fig. 14 (b). Under severe working conditions such as long-term variable speed
and variable load, key components such as the gears and bearings of the wind
turbines in the wind farm failed. After a long period of fault accumulation,
vibration data of wind turbine gearboxes in seven different health states
were collected, as shown in Table 5.
The vibration data under seven health conditions measured from the output end
on the low-speed shaft (shaft 2) of the wind turbine gearbox are selected for
model training and testing. The sampling rate is 25,600 Hz and each sampling
time is 5.12 s, so each group contains 131,072 points. In order to obtain more
samples, multiple groups of vibration data recorded before gearbox failure are
collected. Every 4096 points form one sample, and the total sample size for
each type of health status is 900. The first 200 and the first 500 samples are
taken as two training datasets, and 400 of the remaining samples are taken as
the test dataset to conduct the method verification experiments.
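The sample arithmetic above can be checked with a few lines of code. This is a minimal sketch: the helper names are hypothetical, the record is a placeholder of zeros, and non-overlapping windows are an assumption, since the text does not state whether consecutive samples overlap.

```python
SAMPLING_RATE_HZ = 25_600
RECORD_SECONDS = 5.12
SAMPLE_LENGTH = 4096


def points_per_record(rate_hz, seconds):
    """Number of points in one acquisition group."""
    return round(rate_hz * seconds)


def segment(record, length):
    """Split a record into consecutive, non-overlapping windows (assumed scheme)."""
    return [record[i:i + length] for i in range(0, len(record) - length + 1, length)]


n_points = points_per_record(SAMPLING_RATE_HZ, RECORD_SECONDS)
record = [0.0] * n_points  # placeholder signal, not measured data
samples = segment(record, SAMPLE_LENGTH)
sample_duration_s = SAMPLE_LENGTH / SAMPLING_RATE_HZ
```

Under this assumption one 5.12 s group yields 32 samples of 0.16 s each, which agrees with the 0.02 to 0.16 s time axis of the raw signals in Fig. 15.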
3.2.2. Results and analysis
The WPT was employed to present the time–frequency features of the raw
vibration signals, as Fig. 15 shows, with parameters consistent with the DDS
simulation experiment settings. It can be seen from Fig. 15 that the
coefficient values of the various frequency bands are quite different, and
some frequency bands may contribute little to model identification.
Therefore, Fig. 15 also shows the wavelet coefficients after weighting by the
attention weights of the frequency-band attention mechanism. From the
Fig. 12. Detailed testing accuracy in ten trials of different methods. Vertical axis: accuracy from 50 to 100; horizontal axis: trials 1st to 10th; series: CNN_raw, WDCNN_raw, CNN_wave, WDCNN_wave, Resnet_wave, HA-ResNet_wave.
Table 4
Comparison of results between HA-ResNet and other methods under DDS simulation data.
Input | Method | Accuracy (%)
Raw signals | CNN | 87.13 ± 2.57
Raw signals | WDCNN | 94.13 ± 1.39
Wavelet coefficients | CNN | 95.43 ± 1.08
Wavelet coefficients | WDCNN | 93.14 ± 0.61
Wavelet coefficients | ResNet | 96.50 ± 1.26
Wavelet coefficients | HA-ResNet (Ours) | 98.79 ± 0.34
Table 3
The confusion matrix for the binary classification.
Total population | Condition positive | Condition negative
Predicted condition positive | True positive (TP) | False positive (FP, Type I error)
Predicted condition negative | False negative (FN, Type II error) | True negative (TN)
visualization results of the weighted coefficient graph, it can be seen that
part of the information carried by wavelet coefficients with a small
amplitude is highlighted, which gives the deep learning model a more
effective ability to extract the weak fault features of the WTs.
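The idea of weighting wavelet packet coefficients band by band can be sketched in a few lines. The code below is a toy illustration under stated assumptions, not the authors' implementation: it uses the Haar wavelet and decomposition level 5 (giving 32 bands) purely for simplicity, it returns the bands in natural rather than frequency order, and the uniform `weights` are placeholders for attention weights that the network would learn. All function names are hypothetical.

```python
import math


def haar_split(x):
    """One orthonormal Haar step: return the (low-pass, high-pass) halves of x."""
    scale = 1.0 / math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) * scale for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) * scale for i in range(0, len(x), 2)]
    return low, high


def wavelet_packet_matrix(signal, level):
    """Full packet tree down to `level`: 2**level coefficient rows, natural order."""
    if len(signal) % (2 ** level) != 0:
        raise ValueError("signal length must be divisible by 2**level")
    nodes = [list(signal)]
    for _ in range(level):
        # Unlike the plain wavelet transform, every node is split again.
        nodes = [half for node in nodes for half in haar_split(node)]
    return nodes


def apply_band_weights(coeffs, weights):
    """Scale each coefficient row (one frequency band) by its weight."""
    if len(coeffs) != len(weights):
        raise ValueError("one weight per band is required")
    return [[w * c for c in row] for row, w in zip(coeffs, weights)]


# Synthetic 4096-point test tone sampled at 25,600 Hz (not measured data).
signal = [math.sin(2 * math.pi * 200 * n / 25600) for n in range(4096)]
coeffs = wavelet_packet_matrix(signal, level=5)
weights = [1.0] * len(coeffs)  # placeholder for learned attention weights
weighted = apply_band_weights(coeffs, weights)
```

With these choices a 4096-point sample becomes a 32 × 128 coefficient matrix, and a weight above 1 for a band amplifies every coefficient in that band relative to the others.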
Similarly, CNN and WDCNN on the raw dataset, CNN and WDCNN on the dataset
after WPT, and the hybrid attention improved ResNet proposed in this paper
were compared, with parameters consistent with the settings in the DDS
simulation experiment. The comparison results are shown in Table 6. To ensure
the reliability of the results, the average test accuracy of 10 experiments is
used as the indicator to measure the fault diagnosis effect. Moreover, to
compare the results of the methods more intuitively, Fig. 16 (a) presents a
histogram of the test accuracy of each method, and Fig. 16 (b) shows how the
test accuracy of each method with 500 training samples varies with the epoch.
In the experiment based on the raw data, it can be seen from Table 6 that the
traditional CNN has a recognition rate of 76.88% for 200 training samples and
84.15% for 500 training samples. In comparison, WDCNN achieves 81.75% and
88.82%, which indicates that WDCNN, with its first-layer one-dimensional wide
convolution operation, can extract the fault features in the raw vibration
signal more effectively. After performing WPT on the raw data, the accuracy of
CNN increases to 89.08% and 93.59%, and that of WDCNN increases to 87.26% and
92.90%, which indicates that WPT pre-processing can effectively improve the
recognition accuracy of deep learning. In particular, the accuracy of the 2D
CNN increases by 12.20% and 9.44%, respectively, and surpasses that of WDCNN.
The accuracy of WDCNN also improves, but by a smaller margin of 5.51% and
4.08%, respectively. The reason is that a 2D CNN can extract feature
information of the time and frequency domains simultaneously. Accordingly, the
above results show that the wavelet transform of the raw vibration signal is a
practical pre-processing approach to improve the accuracy of deep learning
models, although the level of improvement varies with the model structure.
In the experimental results on the dataset after WPT, the accuracy of ResNet
reaches 92.45% for 200 training samples and 96.69% for 500 training samples,
which is 3.37% and 3.10% higher than that of the traditional CNN, and 5.19%
and 3.79% higher than that of WDCNN. This shows that the backbone improvement
of the convolutional network brings a certain degree of improvement to the
recognition accuracy of deep learning. The recognition accuracy of the
HA-ResNet method is higher still, reaching 95.97% for 200 training samples and
98.76% for 500 training samples. The accuracy is 6.89% and 5.17% higher than
Fig. 13. The feature visualization under wavelet coefficients dataset of the proposed method by t-SNE: (a) Stage 1 output features; (b) Stage 2 output features; (c) Stage 3 output features; (d) Last layer output features. Axes: Dim 1 and Dim 2.
that of traditional CNN, 8.71% and 5.86% higher than that of WDCNN,
and 3.52% and 2.07% higher than that of traditional ResNet,
respectively.
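The accuracy differences quoted in this comparison follow from the mean values in Table 6 by subtraction. A minimal sketch for recomputing them is given below; the dictionary layout and the helper `gain` are hypothetical, and only the mean accuracies (not the ± values) are used.

```python
# Mean test accuracy (%) from Table 6 for (200, 500) training samples.
TABLE_6 = {
    "CNN-raw": (76.88, 84.15),
    "WDCNN-raw": (81.75, 88.82),
    "CNN-wave": (89.08, 93.59),
    "WDCNN-wave": (87.26, 92.90),
    "ResNet-wave": (92.45, 96.69),
    "HA-ResNet-wave": (95.97, 98.76),
}


def gain(method, baseline, table=TABLE_6):
    """Accuracy difference (method minus baseline) for both training-set sizes."""
    return tuple(round(m - b, 2) for m, b in zip(table[method], table[baseline]))


wpt_gain_cnn = gain("CNN-wave", "CNN-raw")
wpt_gain_wdcnn = gain("WDCNN-wave", "WDCNN-raw")
ha_vs_resnet = gain("HA-ResNet-wave", "ResNet-wave")
```

The same helper can be applied to any other pair of rows, for example `gain("HA-ResNet-wave", "WDCNN-wave")`.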
Fig. 16 (a) shows the mean square error of the recognition results of each
method. The smaller the training set, the lower the stability of the model
test results. Compared with CNN, the model stability of WDCNN is better, and
ResNet and the HA-ResNet method are more stable still. In particular, the
mean square error of the proposed method is only 0.52 and 0.39, which shows
that it is better than the comparison models not only in recognition accuracy
but also in stability.
To better understand the effect of the methods on the fault dataset of the
wind turbine gearbox, the diagnosis results of each method with 500 training
samples are visualized by confusion matrices, as shown in Fig. 17. The
2-dimensional reduction by t-SNE [39] of the features of the last hidden
fully-connected layer is shown in Fig. 18. Comparing (a) and (b) of Fig. 17,
WDCNN performs better than CNN when the raw data is used as input for fault
diagnosis. The comparison of (a) and (b) of Fig. 18 confirms that the feature
clustering boundary of WDCNN is more discriminative, although some categories
still show feature confusion. With wavelet coefficients as the input data,
the incorrectly classified samples of both CNN and WDCNN are reduced to
various degrees, as the comparison of Fig. 17 (a)–(c) shows, and the clarity
of the feature boundaries is enhanced, as the comparison of Fig. 18 (a)–(c)
shows. Moreover, Figs. 17 (e) and 18 (e) present the recognition results and
feature clustering of ResNet, which show that the backbone improvement
effectively improves the model's performance for wind turbine fault
diagnosis. Finally, from Figs. 17 (f) and 18 (f), the features of the
HA-ResNet model are much more separable than those of the other comparative
methods, which indicates that the proposed method can learn discriminative
features from vibration data.
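For readers who want to reproduce the kind of multi-class confusion matrix shown in Fig. 17, a minimal sketch follows. The convention that rows are true labels and columns are predicted labels is an assumption made here, the function names are hypothetical, and the label lists are toy values for illustration, not experimental results.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Count matrix with rows as true labels and columns as predicted labels."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def overall_accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_recall(matrix):
    """Diagonal entry divided by the row sum, for every class."""
    return [row[i] / sum(row) if sum(row) else float("nan")
            for i, row in enumerate(matrix)]


# Toy labels for three classes (illustrative only).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
acc = overall_accuracy(cm)
recall = per_class_recall(cm)
```

Off-diagonal entries show which pairs of health states are confused with each other, which is the information the text reads from Fig. 17.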
4. Conclusion
This paper presents a hybrid attention deep ResNet-based fault diagnosis
method for wind turbine gearboxes. The proposed method addresses the
challenge of time–frequency information processing and weak feature
extraction of vibration signals in fault diagnosis.
(1) Wavelet packet decomposition can effectively present the time–frequency
information of the vibration signal, and combined with a 2D CNN, it can
extract the time–frequency domain features more effectively. In the
experimental results on the wind farm's measured fault data, wavelet
packet decomposition improved the recognition accuracy of both CNN and
WDCNN, with CNN in particular improving by 12.20% and 9.44%.
(2) By improving the deep learning model with the frequency-band attention
mechanism, the frequency bands carrying weak fault features in the
wavelet coefficients are highlighted. The fault recognition
Fig. 14. The schematic diagram of fault location and measuring point: (a) schematic diagram of wind turbines transmission system and the location of faulty components; (b) shaft 2 output end of gearbox (location of sensor installation). Labels in (a): faulty bearing, elastic support, generator, pitch-regulated device, speed detector, faulty gear, location of vibration sensor.
Table 5
Labels and health status descriptions of wind turbine gearboxes.
Labels | Health status | Description
0 | Normal | Normal status
1 | BW | Ball wear of the front bearing of the planet carrier
2 | BFA | Ball falling of the rear bearing of the shaft 2 large gear
3 | RBR | Retainer breaking of the rear bearing of the shaft 2 small gear
4 | RCR | Retainer cracking of the rear bearing of the shaft 2 large gear
5 | PC | Pitting corrosion of intermediate shaft gear
6 | BFL | Ball flaking of the rear bearing of the shaft 2 large gear
Fig. 15. The visualization of wavelet coefficient band attention mechanism: (a) Attention weights; (b) Normal; (c) BW; (d) BFA; (e) RBR; (f) RCR; (g) PC; (h) BFL. Raw signal panels plot amplitude against time (0.02 to 0.16 s); coefficient panels plot the frequency domain against the time domain.
accuracy under the strong noise background is improved. Further,
through the channel attention mechanism, channels are automatically
given different attention weights through back-propagation, which
improves the learned feature representations and boosts the ResNet's
performance. When applied to the measured fault data from a wind farm,
the proposed method performs much better than CNN-raw, WDCNN-raw,
CNN-wave, WDCNN-wave, and ResNet.
In a word, the proposed method can effectively extract the time–frequency
features of the vibration signal and highlight the weak fault features of
frequency bands and valuable feature information in various
Fig. 17. The confusion matrix of test results of each method under 500 training samples: (a) CNN-raw; (b) WDCNN-raw; (c) CNN-wave; (d) WDCNN-wave; (e) ResNet-wave; (f) HA-ResNet-wave (Ours).
Table 6
Comparison of results between HA-ResNet and other methods.
Methods | Accuracy (%), 200 train samples | Accuracy (%), 500 train samples
CNN-raw | 76.88 ± 2.29 | 84.15 ± 1.09
WDCNN-raw | 81.75 ± 1.63 | 88.82 ± 1.46
CNN-wave | 89.08 ± 2.92 | 93.59 ± 2.21
WDCNN-wave | 87.26 ± 1.48 | 92.90 ± 2.50
ResNet-wave | 92.45 ± 1.15 | 96.69 ± 0.66
HA-ResNet-wave (Ours) | 95.97 ± 0.52 | 98.76 ± 0.39
Fig. 16. Histogram of fault vibration results and test accuracy at 500 training samples: (a) Comparison of fault diagnosis accuracy of each method; (b) Test accuracy curve (accuracy against epochs 0 to 30 for 500-CNN-raw, 500-WDCNN-raw, 500-CNN-wave, 500-WDCNN-wave, 500-ResNet-wave, and 500-HA-ResNet-wave).
channels. The experimental results on the wind farm's measured fault data
show that this design effectively improves the fault diagnosis accuracy of
the wind turbine gearbox. However, the frequency band attention is not
recommended for fault diagnosis methods using the raw data in the time
domain, due to the time-shifted characteristics of the original vibration
signal.
CRediT authorship contribution statement
Kai Zhang: Conceptualization, Software, Writing - original draft,
Validation. Baoping Tang: Methodology, Writing - review & editing.
Lei Deng: Methodology, Supervision, Data curation. Xiaoli Liu: Visu-
alization, Writing - review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial
interests or personal relationships that could have appeared to influence
the work reported in this paper.
Acknowledgments
This research is supported by the Science and Technology Projects in
Chongqing (cstc2019jcyj-zdxmX0026), the National Natural Science
Foundation of China (No. 51775065), and the National Key Research
and Development Project (2020YFB1709800).
Fig. 18. 2-dimensional visualization under 500 training samples of each method by t-SNE: (a) CNN; (b) WDCNN; (c) CNN-wave; (d) WDCNN-wave; (e) ResNet-wave; (f) HA-ResNet-wave (Ours). Axes: Dim 1 and Dim 2.
References
[1] W. Qiao, D. Lu, A Survey on Wind Turbine Condition Monitoring and Fault
Diagnosis - Part I: Components and Subsystems, IEEE Trans. Ind. Electron. 62 (10)
(2015) 6536–6545, https://doi.org/10.1109/TIE.2015.2422112.
[2] W. Qiao, D. Lu, A Survey on Wind Turbine Condition Monitoring and Fault
Diagnosis - Part II: Signals and Signal Processing Methods, IEEE Trans. Ind.
Electron. 62 (10) (2015) 6546–6557, https://doi.org/10.1109/TIE.2015.2422394.
[3] L. Wang, Z. Zhang, H. Long, J. Xu, R. Liu, Wind Turbine Gearbox Failure
Identification with Deep Neural Networks, IEEE Trans. Ind. Informatics. 13 (3)
(2017) 1360–1368, https://doi.org/10.1109/TII.2016.2607179.
[4] K. Zhang, B. Tang, Y. Qin, L. Deng, Fault diagnosis of planetary gearbox using a
novel semi-supervised method of multiple association layers networks, Mech. Syst.
Signal Process. 131 (2019) 243–260,
https://doi.org/10.1016/j.ymssp.2019.05.049.
[5] F. Chen, Y. Yang, B. Tang, B. Chen, W. Xiao, X. Zhong, Performance degradation
prediction of mechanical equipment based on optimized multi-kernel relevant
vector machine and fuzzy information granulation, Meas. J. Int. Meas. Confed. 151
(2020) 107116, https://doi.org/10.1016/j.measurement.2019.107116.
[6] G.E. Hinton, R.R. Salakhutdinov, Reducing the dimensionality of data with neural
networks, Science 313 (2006) 504–507,
https://doi.org/10.1126/science.1127647.
[7] C. Choy, J. Gwak, S. Savarese, 4D Spatio-Temporal ConvNets: Minkowski
Convolutional Neural Networks, 2019. http://arxiv.org/abs/1904.08755.
[8] P. Gendron, B. Nandram, Best wavelet packet bases in a rate-distortion sense,
J. Agric. Biol. Environ. Stat. 6 (2001) 160–175,
https://doi.org/10.1111/1467-9876.00135.
[9] Z. Li, J. Yang, Z. Liu, X. Yang, G. Jeon, W. Wu, Feedback network for image
super-resolution, in: Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit.
2019-June, 2019, pp. 3862–3871, https://doi.org/10.1109/CVPR.2019.00399.
[10] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to
document recognition, Proc. IEEE. 86 (1998) 2278–2323,
https://doi.org/10.1109/5.726791.
[11] W. Zhang, C. Li, G. Peng, Y. Chen, Z. Zhang, A deep convolutional neural network
with new training methods for bearing fault diagnosis under noisy environment
and different working load, Mech. Syst. Signal Process. 100 (2018) 439–453,
https://doi.org/10.1016/j.ymssp.2017.06.022.
[12] R. Liu, G. Meng, B. Yang, C. Sun, X. Chen, Dislocated Time Series Convolutional
Neural Architecture: An Intelligent Fault Diagnosis Approach for Electric Machine,
IEEE Trans. Ind. Informatics. 13 (3) (2017) 1310–1320,
https://doi.org/10.1109/TII.2016.2645238.
[13] J. Pan, Y. Zi, J. Chen, Z. Zhou, B. Wang, LiftingNet: A Novel Deep Learning
Network with Layerwise Feature Learning from Noisy Mechanical Data for Fault
Classification, IEEE Trans. Ind. Electron. 65 (6) (2018) 4973–4982,
https://doi.org/10.1109/TIE.2017.2767540.
[14] L. Jing, M. Zhao, P. Li, X. Xu, A convolutional neural network based feature
learning and fault diagnosis method for the condition monitoring of gearbox, Meas.
J. Int. Meas. Confed. 111 (2017) 1–10,
https://doi.org/10.1016/j.measurement.2017.07.017.
[15] M. Zhao, S. Zhong, X. Fu, B. Tang, M. Pecht, Deep Residual Shrinkage Networks for
Fault Diagnosis, IEEE Trans. Ind. Informatics. 16 (7) (2020) 4681–4690,
https://doi.org/10.1109/TII.2019.2943898.
[16] J. Jiao, M. Zhao, J. Lin, C. Ding, Deep coupled dense convolutional network with
complementary data for intelligent fault diagnosis, IEEE Trans. Ind. Electron. 66
(12) (2019) 9858–9867, https://doi.org/10.1109/TIE.2019.2902817.
[17] W. Zhang, G. Peng, C. Li, Y. Chen, Z. Zhang, A new deep learning model for fault
diagnosis with good anti-noise and domain adaptation ability on raw vibration
signals, Sensors (Switzerland). 17 (2) (2017) 425,
https://doi.org/10.3390/s17020425.
[18] X. Li, W. Zhang, Q. Ding, Understanding and improving deep learning-based rolling
bearing fault diagnosis with attention mechanism, Sign. Process. 161 (2019)
136–154, https://doi.org/10.1016/j.sigpro.2019.03.019.
[19] D. Peng, Z. Liu, H. Wang, Y. Qin, L. Jia, A novel deeper one-dimensional CNN with
residual learning for fault diagnosis of wheelset bearings in high-speed trains, IEEE
Access. 7 (2019) 10278–10293,
https://doi.org/10.1109/ACCESS.2018.2888842.
[20] X.u. Chang, B. Tang, Q. Tan, L. Deng, F. Zhang, One-dimensional fully decoupled
networks for fault diagnosis of planetary gearboxes, Mech. Syst. Signal Process.
141 (2020) 106482, https://doi.org/10.1016/j.ymssp.2019.106482.
[21] R. Chen, X. Huang, L. Yang, X. Xu, X. Zhang, Y. Zhang, Intelligent fault diagnosis
method of planetary gearboxes based on convolution neural network and discrete
wavelet transform, Comput. Ind. 106 (2019) 48–59,
https://doi.org/10.1016/j.compind.2018.11.003.
[22] M. Zhao, M. Kang, B. Tang, M. Pecht, Deep Residual Networks with Dynamically
Weighted Wavelet Coefficients for Fault Diagnosis of Planetary Gearboxes, IEEE
Trans. Ind. Electron. 65 (5) (2018) 4290–4300,
https://doi.org/10.1109/TIE.2017.2762639.
[23] Y. Han, B. Tang, L. Deng, Multi-level wavelet packet fusion in dynamic ensemble
convolutional neural network for fault diagnosis, Meas. J. Int. Meas. Confed. 127
(2018) 246–255, https://doi.org/10.1016/j.measurement.2018.05.098.
[24] P. Liang, C. Deng, J. Wu, Z. Yang, Intelligent fault diagnosis of rotating machinery
via wavelet transform, generative adversarial nets and convolutional neural
network, Meas. J. Int. Meas. Confed. 159 (2020) 107768,
https://doi.org/10.1016/j.measurement.2020.107768.
[25] M. Zhao, B. Tang, L. Deng, M. Pecht, Multiple wavelet regularized deep residual
networks for fault diagnosis, Meas. J. Int. Meas. Confed. 152 (2020) 107331,
https://doi.org/10.1016/j.measurement.2019.107331.
[26] R. Yan, R.X. Gao, X. Chen, Wavelets for fault diagnosis of rotary machines: A
review with applications, Sign. Process. 96 (2014) 1–15,
https://doi.org/10.1016/j.sigpro.2013.04.015.
[27] W. Teng, X. Ding, X. Zhang, Y. Liu, Z. Ma, Multi-fault detection and failure analysis
of wind turbine gearbox using complex wavelet transform, Renew. Energy. 93
(2016) 591–598, https://doi.org/10.1016/j.renene.2016.03.025.
[28] I.I.E. Amarouayache, M.N. Saadi, N. Guersi, N. Boutasseta, Bearing fault
diagnostics using EEMD processing and convolutional neural network methods,
Int. J. Adv. Manuf. Technol. 107 (9-10) (2020) 4077–4095,
https://doi.org/10.1007/s00170-020-05315-9.
[29] Y. Zhang, G. Pan, B. Chen, J. Han, Y. Zhao, C. Zhang, Short-term wind speed
prediction model based on GA-ANN improved by VMD, Renew. Energy. 156 (2020)
1373–1388, https://doi.org/10.1016/j.renene.2019.12.047.
[30] D. Bahdanau, K.H. Cho, Y. Bengio, Neural machine translation by jointly learning
to align and translate, in: 3rd Int. Conf. Learn. Represent. ICLR 2015 - Conf. Track
Proc., 2015, pp. 1–15.
[31] V. Mnih, N. Heess, A. Graves, K. Kavukcuoglu, Recurrent models of visual
attention, Adv. Neural Inf. Process. Syst. 3 (2014) 2204–2212.
[32] M.T. Luong, H. Pham, C.D. Manning, Effective approaches to attention-based
neural machine translation, in: Conf. Proc. - EMNLP 2015 Conf. Empir. Methods
Nat. Lang. Process., 2015, pp. 1412–1421,
https://doi.org/10.18653/v1/d15-1166.
[33] H. Zhang, C. Wu, Z. Zhang, Y. Zhu, Z. Zhang, H. Lin, Y. Sun, T. He, J. Mueller,
R. Manmatha, M. Li, A. Smola, ResNeSt: Split-Attention Networks, 2020.
http://arxiv.org/abs/2004.08955.
[34] Y.H. Chang, J.L. Chen, S.L. He, Intelligent Fault Diagnosis of Satellite Communication
Antenna via a Novel Meta-learning Network Combining with Attention
Mechanism, J. Phys. Conf. Ser. 1510 (2020) 012026,
https://doi.org/10.1088/1742-6596/1510/1/012026.
[35] J. Ma, H. Zhang, P. Yi, Z. Wang, SCSCN: A separated channel-spatial convolution
net with attention for single-view reconstruction, IEEE Trans. Ind. Electron. 67
(10) (2020) 8649–8658, https://doi.org/10.1109/TIE.2019.2950866.
[36] K. He, X. Zhang, S. Ren, J. Sun, Deep Residual Learning for Image Recognition,
arXiv preprint arXiv:1512.03385v1, 2015.
[37] K. He, X. Zhang, S. Ren, J. Sun, Identity mappings in deep residual networks, Lect.
Notes Comput. Sci. (Including Subser. Lect. Notes Artif. Intell. Lect. Notes
Bioinformatics). 9908 LNCS (2016) 630–645,
https://doi.org/10.1007/978-3-319-46493-0_38.
[38] S. Ioffe, C. Szegedy, Batch normalization: Accelerating deep network training by
reducing internal covariate shift, in: 32nd Int. Conf. Mach. Learn. ICML 2015,
2015, pp. 448–456.
[39] M. Lin, Q. Chen, S. Yan, Network in network, in: 2nd Int. Conf. Learn. Represent.
ICLR 2014 - Conf. Track Proc., 2014, pp. 1–10.
[40] I. Loshchilov, F. Hutter, SGDR: Stochastic gradient descent with warm restarts, in:
5th Int. Conf. Learn. Represent. ICLR 2017 - Conf. Track Proc., 2017, pp. 1–16.