
Adaptive Hyperspectral Image Compression using 
the KLT and Integer KLT Algorithms 
Chafik Egho 
 Surrey Space Centre 
University of Surrey 
Guildford, UK 
C.Egho@surrey.ac.uk 
Tanya Vladimirova 
Department of Engineering 
University of Leicester 
Leicester, UK 
tv29@leicester.ac.uk
 
 
Abstract—The use of the Karhunen-Loève Transform
(KLT) for spectral decorrelation in compression of 
hyperspectral satellite images results in improved 
performance. However, the KLT algorithm consists of 
sequential processes, which are computationally intensive, 
such as the covariance matrix computation, eigenvector 
evaluation and matrix multiplications. These processes 
slow down the overall computation of the KLT transform 
significantly. The traditional KLT can only offer lossy 
compression; therefore, a reversible KLT algorithm, 
named the Integer KLT, is used for lossless compression. 
The Integer KLT includes more computational processes 
and, hence, it requires a longer processing time. The 
acceleration of these processes within the context of limited 
power and hardware budgets is the main objective of this 
paper. The computations of each of these processes are 
investigated thoroughly. Subsequently, a novel adaptive 
architecture for the computation of the KLT and the 
Integer KLT is proposed. The proposed system improves 
the traditional KLT performance compared with a 
previous architecture, and offers significant improvement 
for hyperspectral data with a larger spectral dimension. 
The experiments showed an overall improvement of up to 
4.9%, 11.8% and 18.4% for 8, 16 and 32 spectral bands, 
respectively. In addition, this paper addresses novel 
hardware aspects of the Integer KLT implementation. The 
scalability of this hardware architecture can offer a much
higher level of parallel computing than processor platforms.
Keywords—KLT, Integer KLT, Hyperspectral Image Compression
I. INTRODUCTION 
The detailed and accurate information provided by hyperspectral
images is of significant importance for a wide range of
applications. However, the large number of spectral bands 
results in an excessively large volume of imaging data. This 
requires increased memory resources to store the raw images 
on board satellites and a significant power consumption to 
transmit them to ground. Both power and memory resources 
are major design constraints in spacecraft embedded 
applications and therefore on-board hyperspectral image 
compression systems are designed to meet strict power and 
data storage capacity budgets. 
Employing spectral decorrelation improves hyperspectral 
data compression significantly. While various techniques have 
been used for spectral decorrelation, the Karhunen-Loève
transform (KLT) offers the best performance [1]. However, 
the floating-point outputs of the KLT process incur some data 
loss, which makes the KLT transform lossy. Therefore, a 
modified KLT algorithm, named Integer KLT, was proposed 
in [2]. By introducing different matrix factorizations the 
Integer KLT eliminates the floating-point output and achieves 
a lossless data compression process. Fig. 1 demonstrates the
KLT and DWT performance for lossy compression; Table I
outlines the improvements offered by the Integer KLT over
other techniques, namely Multi-Component Compression (MCP),
Linear Prediction (LP), Band Differential (BD) and the Integer
Wavelet Transform (IWT), for lossless compression.
The main objective of this research is the design of a 
spectral decorrelation subsystem as part of a hyperspectral 
image compression system on board satellites. A design aimed 
at an FPGA-based SoC platform will be introduced and 
different design parameters will be investigated: processing 
time, accuracy, hardware resources and power consumption. 
A previous related work developed a KLT computing 
system on an FPGA-based SoC platform [3], where a parallel 
approach was proposed for the mapping of the KLT algorithm. 
However, in [3] only hyperspectral data with a limited number 
of bands were considered and low power consumption was not 
targeted. In [4] and [5], the computation of the KLT and the 
Integer KLT were addressed, respectively, achieving in both 
works an acceleration factor of approximately 50%. However, 
the design was targeted at the flash based SmartFusion 
platform, which offers very limited hardware resources. In [6], 
the computation of the Integer KLT algorithm was optimised 
through its parallelization at top level. The work targeted an 
on-board embedded implementation, however, only DSP and 
multiprocessor platforms were considered. In [7], a multiplier-
less scheme of Integer KLT was presented for three 
dimensional (3-D) medical images. Various other works have 
addressed the implementation of the KLT and Integer KLT 
algorithms, such as [8], [9], and [10], however, they were 
aimed at software rather than hardware realizations. 
In this paper, a spectral decorrelation system for adaptive 
lossy/lossless hyperspectral image compression aimed at an 
FPGA system-on-a-chip platform is proposed. The proposed 
978-1-4799-5356-1/14/$31.00 ©2014 IEEE 
2014 NASA/ESA Conference on Adaptive Hardware and Systems (AHS)
architecture adds a significant improvement to the hardware
proposed in [3], by enhancing the level of parallel computing, 
especially for hyperspectral images with large spectral 
dimension. In addition, the proposed system addresses a novel 
hardware architecture for the Integer KLT computation. 
 
Fig. 1. Comparison of the KLT and the DWT Lossy compression 
performance [14] 
TABLE I. THE IMPROVEMENT OF THE RKLT OVER OTHER TECHNIQUES [1] [15]

| Image   | MCP    | BD&LP  | IWT    |
| Cuprite | 8.94%  | 7.7%   | 8.86%  |
| Jasper  | 15.04% | 11.84% | 14.46% |
II. OVERVIEW 
The Karhunen-Loève transform is an orthogonal linear
transform, which is applied to 3-D data sets to de-correlate 
data on different bands. If the 3-D data set is a hyperspectral 
image, the KLT will remove the correlations between the 
image bands, constructing a more compressible data set. Fig. 2 
depicts the computational flows of the KLT and the Integer 
KLT algorithms. It can be seen that the two algorithms share 
some of the computation processes, such as the BandMean, 
the MeanSub, the Covariance Matrix and the Eigenvectors. On 
the other hand, the differences are:
• The BandMean vector is rounded to the nearest integer in
the Integer KLT.
• A PLUS matrix factorization is applied to the
eigenvectors in the Integer KLT.
• The matrix multiplication (Eigenvectors × MeanSub) of
the KLT is replaced with a matrix lifting in the Integer
KLT.
All these processes are performed on large matrices; and 
therefore, they are computationally intensive, which slows 
down the overall computation process significantly. In the 
next section, each of these processes will be analysed and 
investigated thoroughly by breaking them down into 
primitive arithmetic operations. This is followed by a 
comprehensive analysis of the individual computations 
inspecting the possibility and feasibility of suitable 
acceleration techniques. In the mathematical expressions of
this paper, a hyperspectral image H is considered, where M
and L represent the spatial coordinates and N the number of
bands.
III. COMPUTATIONAL REQUIREMENTS
While some of the computation processes in Fig. 2 are 
simple repetitive operations, such as the BandMean and the 
MeanSub, others, such as the covariance matrix and 
eigenvector evaluations, are more complicated processes 
involving different sequential operations. 
A. Band Mean, Mean Sub and Covariance Computations
The Band Mean is the computation of the mean value of 
each spectral band, as shown in equation (1), so the output is a
vector A ∈ R^{1×N}, where N is the number of spectral bands.
Therefore, the computation of each element requires M × L
additions (accumulation) and a single division.

A = (1/(M × L)) · Σ_{i=1..M, j=1..L} H_{i,j}        (1)
The MeanSub process is the normalization of the data: it is
computed by subtracting the mean value of each spectral band
from all the data of that band. Therefore, this process is very
straightforward and only requires M × L × N subtractions. The
computation of the Covariance Matrix is shown in equation (2).
The second term of this equation is a vector multiplication of
the BandMean A ∈ R^{1×N} with its transpose, so it only
requires N(N + 1)/2 multiplications. On the other hand, the
first term of equation (2) is the matrix multiplication of the
input image with its transpose. The result of this multiplication
is a symmetric matrix in R^{N×N}, where each element requires
M × L multiply-accumulate operations. Therefore, the first term
of equation (2) is much more computationally intensive,
requiring M × L multiply-accumulate operations for each of the
N(N + 1)/2 elements.

COV = (1/(M × L)) · H × H^T − A × A^T        (2)
where COV ∈ R^{N×N}

Fig. 2. The Computation Process of the KLT and the Integer KLT
[figure: two pipelines from the hyperspectral data; KLT: Band Mean,
Mean Sub, Covariance, Eigenvectors, then Eigenvectors × Mean Sub;
Integer KLT: Band Mean, Rounding, Mean Sub, Covariance,
Eigenvectors, PLUS, then Lifting]
B. Eigenvectors Computation 
In this process, the eigenvalues and eigenvectors of the 
covariance matrix are evaluated. Since the covariance matrix 
is real and symmetric, the Jacobi algorithm will be considered. 
Designed for real and symmetric matrices, the Jacobi 
algorithm is an iterative method that outperforms other 
techniques such as the QR algorithm in terms of output
accuracy. The Jacobi algorithm computes both the eigenvalues 
and the eigenvectors. The eigenvalues are computed by 
diagonalizing the input matrix through iterative 
transformations, which will eventually accumulate the 
eigenvalues on the matrix diagonal. The eigenvectors are the 
results of the multiplications of the transformation matrices. 
Fig. 3 illustrates the Jacobi algorithm, where the 
eigenvalues accumulate on the diagonal of matrix B, while V, 
initially an identity matrix, converges to the eigenvectors. 
 
 
 
 
 
 
 
 
Fig. 3. The Computation Process of the Jacobi Algorithm
The computations in Fig. 3 are:
• The computation of θ, sin θ and cos θ to form the
transform matrix T^(k), where T^(k) ∈ R^{N×N}.
• The matrix multiplications of equations (3) and (4).

B_{n+1} = T^{(n)T} × B_n × T^{(n)}        (3)
V_{n+1} = V_n × T^{(n)}                   (4)

Theoretically, a matrix multiplication of N × N dimensions
requires N³ element multiplications and (N − 1)N² element
additions. Nevertheless, taking into consideration that T^(k) is
an identity matrix with only four non-one, non-zero
elements (T_ii, T_ij, T_ji and T_jj), the matrix multiplication in (3)
will alter only two rows and two columns, while in (4) only
two columns will be altered. Therefore, (3) requires 8N
multiplications and 4N additions, while (4) requires 4N
multiplications and 2N additions. Equations (3) and (4) can be
decomposed into (5) to (9), and (10) and (11), respectively.
B_ii^{k+1} = B_ii^k (cos θ_k)² − B_jj^k (sin θ_k)² − 2 B_ij^k cos θ_k sin θ_k        (5)
B_jj^{k+1} = B_ii^k (sin θ_k)² + B_jj^k (cos θ_k)² + 2 B_ij^k cos θ_k sin θ_k        (6)
B_ij^{k+1} = B_ji^{k+1} = (B_ii^{k+1} − B_jj^{k+1}) cos θ_k sin θ_k + B_ji^k ((cos θ_k)² − (sin θ_k)²)        (7)
for n ∈ [1, N] and n ≠ i, n ≠ j:
B_in^{k+1} = B_ni^{k+1} = B_in^k cos θ_k − B_jn^k sin θ_k        (8)
B_jn^{k+1} = B_nj^{k+1} = B_jn^k cos θ_k + B_in^k sin θ_k        (9)
for n ∈ [1, N]:
V_ni^{k+1} = V_ni^k cos θ_k − V_nj^k sin θ_k        (10)
V_nj^{k+1} = V_nj^k cos θ_k + V_ni^k sin θ_k        (11)
Theoretically, more iterations result in more accurate
eigenvalues and eigenvectors; however, in [10] it is suggested
that 10 sweeps (sweep = N(N − 1)/2 iterations) are enough to
achieve both eigenvectors and eigenvalues. Table II outlines
the computational requirements of the eigenvectors when 10
Jacobi sweeps are performed.
TABLE II. COMPUTATIONAL REQUIREMENTS OF THE EIGENVECTOR EVALUATION

| Operation      | Count              |
| Addition       | 5N(N − 1)(3 + 2N)  |
| Subtraction    | 5N(N − 1)(4 + 2N)  |
| Multiplication | 5(N − 1)(11 + 8N)  |
| Division       | 5N(N − 1)          |
| Trigonometric  | 15N(N − 1)         |
C. Matrix Multiplication (Eigenvectors by MeanSub) 
This process is also referred to as the Eigen mapping in the 
literature. As shown in equation (12), this process is the 
multiplication of the eigenvectors with the normalized data. 
Each pixel of the output data is the vector multiplication result
of an eigenvector with the set of pixels at the same spatial
coordinates. Therefore, the computation of each pixel requires
N multiply-accumulate operations, and the whole image will
require M × L × N² multiply-accumulate operations.
 
[ v_1 … v_1 ]   [ h_{1,1,1} … h_{l,m,1} ]
[  ⋮  ⋱  ⋮  ] × [     ⋮     ⋱     ⋮    ] = Output Data        (12)
[ v_n … v_n ]   [ h_{1,1,n} … h_{l,m,n} ]
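Equation (12) is a plain matrix product, so the mapping can be sketched directly. The dimensions are illustrative assumptions, and numpy's `eigh` stands in for the Jacobi stage that would supply the eigenvectors in the real flow.

```python
import numpy as np

# Illustrative sizes; in the real flow the eigenvectors come from the Jacobi stage.
M, L, N = 4, 4, 3
rng = np.random.default_rng(2)
cube = rng.standard_normal((N, M * L))      # MeanSub data, one row per band
_, eigvecs = np.linalg.eigh(cube @ cube.T)  # stand-in for the Jacobi output
klt = eigvecs.T                             # rows are the eigenvectors v_1..v_n

# Equation (12): every output pixel is the dot product of one eigenvector with
# the N-deep spectral vector at one spatial location -> M*L*N^2 MAC operations.
out = klt @ cube

# One pixel computed explicitly matches the matrix form.
px = sum(klt[1, b] * cube[b, 5] for b in range(N))
assert np.isclose(out[1, 5], px)
```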
D. PLUS Matrix Factorization 
The PLUS matrix factorization is used so that the eigenvectors
matrix is represented by four matrices rather than one. These
four matrices (P, L, U and S) maintain the same information as
the original eigenvectors matrix, so when applying the Eigen
mapping process, these matrices are used as the eigenvectors.
Therefore, by applying the Eigen mapping on these four
matrices, the output error incurred from the rounding of the
fractional numbers is eliminated; hence, the process remains
lossless. This matrix factorization is a sequentially iterative
process, as shown in Fig. 4. The required number of iterations
is (N − 1), and in each of these iterations, the following
processes are performed sequentially: pivoting, then the L, U
and S factorizations. The computational requirements of this
process are outlined in Table III.
 
Fig. 4. The PLUS Matrix Factorization Process [6] 
TABLE III. COMPUTATIONAL REQUIREMENTS OF THE PLUS FACTORIZATION

| Operation      | Count                                                          |
| Addition       | Σ_{k=1}^{N−1} (N − k)²                                         |
| Subtraction    | (N − 1) + Σ_{k=0}^{N−1} (N − k)(N − k − 1)                     |
| Multiplication | N + Σ_{k=1}^{N−1} (N − k)² + (N − K) + 2 Σ_{k=1}^{N−1} (N − k − 1) |
| Division       | (N − 1) + Σ_{k=1}^{N−1} (N − k − 1)(N − k)                     |
E. Lifting Scheme 
Instead of the matrix multiplication (MeanSub × 
Eigenvectors) in the classical KLT, a lifting technique is 
employed in the Integer KLT. In the lifting technique, the 
PLUS matrices are applied to the MeanSub data set according 
to the reversible lifting scheme presented in [2]. The lifting 
scheme can be mathematically represented as in equation (13).
In order to maintain no floating-point output, after each stage
the output elements are rounded: after multiplying the vector X
(the image data) with the S matrix, the output vector is rounded
before it is multiplied with the U matrix, and so on.

Y = round(round(round(X · S) · U) · L) · P        (13)

where P is a permutation matrix; multiplying by P is element
swapping, since its multiplications only involve ones and
zeros. Therefore, the permuting is not computationally
intensive, as it only requires a loop through the vector
swapping certain elements. The S matrix is a sparse lower
triangular matrix; therefore, by applying a zero-check
technique to the X · S multiplication, far fewer element
multiplications are required. The most computationally
intensive part of the lifting scheme is the multiplication with
the upper triangular matrix U and the lower triangular matrix L.
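The round-between-stages idea in equation (13) can be illustrated with a single reversible lifting pass. The unit lower-triangular factor and the `lift`/`unlift` names below are assumptions for illustration, not the paper's PLUS factors, which chain S, U, L and the permutation P; the principle of exact invertibility despite rounding is the same.

```python
import numpy as np

def lift(Lmat, x):
    """y_i = x_i + round(sum_{j<i} L_ij * x_j): integer in, integer out."""
    y = x.copy()
    for i in range(len(x)):
        y[i] = x[i] + int(np.round(Lmat[i, :i] @ x[:i]))
    return y

def unlift(Lmat, y):
    """Exact inverse: recover x_i in increasing order, reusing recovered values."""
    x = y.copy()
    for i in range(len(y)):
        x[i] = y[i] - int(np.round(Lmat[i, :i] @ x[:i]))
    return x

rng = np.random.default_rng(3)
n = 5
Lmat = np.tril(rng.standard_normal((n, n)), k=-1) + np.eye(n)  # unit lower triangular
x = rng.integers(0, 2**14, size=n)       # 14-bit AVIRIS-like integer samples
y = lift(Lmat, x)
assert np.array_equal(unlift(Lmat, y), x)  # the rounding loses nothing
```

Because the inverse recomputes the very same rounded quantity and subtracts it, the integer input is recovered exactly, which is what makes the Integer KLT lossless.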
IV. DESIGN CONSIDERATIONS 
A. Hardware Platform and Design Tools
The SRAM-based FPGA Cyclone IV from Altera (DE2-115
Development Board [13]) is considered in this work. The
Cyclone FPGA series is designed to meet the needs of low 
power applications, such as space applications. In addition to 
the hardware logic and the SRAM memory resources, this 
platform offers more than 500 embedded hardware multipliers
that can offer optimal performance and very low power
consumption. Moreover, the soft processor NIOS II is
provided by this platform. While the NIOS II is considered a
midrange processor in terms of performance, it will be
adequate for the proposed architecture, as it will only be
performing the high-level management and scheduling of the
computation stages rather than the actual computations, which
will be executed in the FPGA fabric.
The Altera Quartus II IDE will be used for the 
development. This IDE provides the power estimator 
PowerPlay tool, which will be used for estimating the power 
consumption. Moreover, it offers ready-made and pre-tested
intellectual property blocks named MegaFunctions; these IP
blocks are optimised (for speed or area) for Altera devices; they
support simple functions (addition, multiplication, subtraction,
etc.) and complex functions, such as trigonometric functions,
for both fixed- and floating-point data formats.
B. Input Data 
This work targets hyperspectral images with a large number 
of bands (in the order of 100s). The Airborne Visible / 
Infrared Imaging Spectrometer (AVIRIS) hyperspectral 
images [12] are used as test images since they comprise 224 
spectral bands. In particular, the Cuprite image, shown in Fig. 
5, is employed in the testing presented in this paper. The
AVIRIS Cuprite raw data consist of 14-bit unsigned integers.
As the main focus of this research is the spectral and not the
spatial compression, a portion of the AVIRIS Cuprite image,
comprising 512 × 512 pixels, will be considered in the
experimental work.
 
Fig. 5. The AVIRIS Cuprite Hypersprectral Image [12] 
V. KLT DESIGN APPROACH 
The hardware computational flow of the KLT algorithm is 
shown in Fig. 6. This flow consists of three stages: the
covariance matrix and the mean vector are computed in
parallel in Stage 1; in Stage 2, the eigenvectors and the
MeanSub (normalization) are computed in parallel; and the
Eigen mapping is performed in Stage 3.
 
 
 
 
 
 
 
 
 
 
 
Fig. 6. The KLT Computational Flow on Hardware [figure: Stage 1:
Mean-Vector and Covariance Matrix; Stage 2: Eigenvectors and
Normalization; Stage 3: Eigen Mapping]
A. Stage 1 
In this stage, both the Mean Vector and the Covariance Matrix
are computed, equations (1) and (2). The first equation only
needs accumulators, and the division can be performed by wire
shifting, as most spatial dimensions are powers of 2, i.e. 128,
256 or 512. The second equation is more intense, especially the
first term (H × H^T), which is the multiplication of a matrix
with its transpose. Therefore, the result of this matrix
multiplication is a symmetric matrix of size N. The number of
elements that need to be computed is K = N(N + 1)/2, and each
element will require M × L multiply-accumulate operation
cycles. Fig. 7 illustrates the hardware architecture for this
stage; each FIFO holds a spectral band, and both equations (1)
and (2) are executed in parallel by the accumulator (ACC) and
multiply-accumulate (MAC) units. The optimal number of
MACs is K = N(N + 1)/2, at which the covariance computation
can be performed in a single round (M × L MAC operation
cycles). However, for hyperspectral images N can be large, so
R MAC units can be utilised, where R is a divisor of K; since
the used MAC units require 2 clock cycles, the execution of
this process will require 2 × M × L × K/R clock cycles.
 
Fig. 7. Hardware Architecture of Stage 1 
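The Stage 1 timing model above can be sanity-checked numerically. The 100 MHz clock used below is an assumption made here to relate the cycle count to the millisecond figures of Table VI, not a frequency stated in the paper.

```python
# K = N(N+1)/2 covariance elements, R MAC units, 2 clock cycles per MAC
# operation: Stage 1 takes 2 * M * L * K / R clock cycles.
def stage1_cycles(M, L, N, R):
    K = N * (N + 1) // 2
    assert K % R == 0, "R should be a divisor of K"
    return 2 * M * L * (K // R)

assert 16 * 17 // 2 == 136              # optimal R for 16 spectral bands
cycles = stage1_cycles(256, 256, 16, 17)
assert cycles == 1_048_576              # K/R = 8 rounds of M*L MAC cycles
print(cycles / 100e6 * 1e3)             # ~10.5 ms at an assumed 100 MHz clock
```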
The used FPGA (Cyclone IV) can process a subset of the
hyperspectral AVIRIS data at a time, while the whole image is
saved in the on-board SDRAM. Larger FPGAs, such as the
Cyclone V or the Stratix, offer much larger memory resources;
thus, larger data subsets can be processed at a time, saving
input/output time. Nevertheless, dividing hyperspectral data
into subsets, known as clustering and tiling, does not only
reduce the on-chip memory requirements; it has also been
proven to improve the resilience of the algorithm to Single
Event Upsets (SEUs) [16] [17]. In [1], a comprehensive
analysis of the benefits introduced by clustering and tiling is
presented, where the simulations showed optimal clustering
sizes for different tilings. These simulations considered
different AVIRIS data sets and showed that the optimal
clustering varies between 16 and 32 spectral bands. While the
used Cyclone IV can process 32 spectral bands in three passes,
the Cyclone V offers 4 times larger memory resources and can
process 32 bands in a single pass.
B. Stage 2 
The normalization and eigenvectors computation are
executed in this stage; the first is a set of straightforward
subtractions that can be performed in parallel, similar to Stage
1. The eigenvectors computation is the most complicated
process of the KLT computation, and for a greater number of
spectral bands, the computation time increases exponentially.
Therefore, the acceleration of this process is very significant,
especially for hyperspectral data with a large number of
spectral bands.
Fig. 8 illustrates the proposed hardware architecture of the 
Jacobi algorithm. The processed eigenvectors and eigenvalues 
are saved in two sets of FIFOs. These FIFOs are connected to 
a multiplexing logic that manages data fetching and loading to 
and from the computing logic. The computing logic is
responsible for the main computations: the θ Computer
estimates θ, sin θ and cos θ; the Eq(5,6,7) Computer of Fig. 8
computes equations (5), (6) and (7); and 8 multipliers, 2 adders
and 2 subtractors compute equations (8), (9), (10) and (11).
All three parts of the accelerator (the FIFOs, the Mux Logic
and the computing logic) are controlled and scheduled by the
on-chip NIOS processor.
 
Fig. 8. Hardware Architecture of the Eigenvectors Computation 
Since a fixed-point implementation would result in a
significant output error for such an iterative algorithm, the
floating-point data format is considered for the eigenvectors
computation. The Altera MegaFunctions have been utilized
for most of the computation logic. Since the floating-point
multiplications and additions require 11 and 14 clock cycles
respectively, executing equations (8), (9), (10) and (11) will take
approximately 50 + 16 × (N − 2) clock cycles. On the other
hand, the θ Computer and the Eq(5,6,7) Computer will require
up to 125 clock cycles each. Therefore, a single Jacobi
iteration will require approximately 300 + 16 × (N − 2) clock cycles.
C. Matrix Reduction Technique 
In [11] an online matrix reduction technique was proposed. 
In addition to the acceleration offered by this technique, it also 
offers partial computation, so that some of the
eigenvectors/eigenvalues are computed before the complete
computation of all the eigenvectors. This can be very
valuable for the KLT computation in hardware, as the
Eigen mapping computation can be performed partially in
parallel with the eigenvectors computation. Table IV outlines
the convergence of the eigenvectors for the proposed
technique, where different data sets of the AVIRIS Cuprite
were considered. It can be noticed that some of the
eigenvectors are computed and ready for the next stage from
sweeps 4 and 5.
TABLE IV. EIGENVECTORS CONVERGENCE THROUGH 8 SWEEPS 
Sweep 8×8 16×16 32×32 64×64 
1 - - - - 
2 - - - - 
3 - - - - 
4 1 1 1 1 
5 6 4 3 1 
6 8 13 20 4 
7 8 16 29 30 
8 8 16 32 64 
D. Stage 3 
The Eigen mapping is the matrix multiplication of
equation (12), so the main operation is multiply-accumulate,
and the same resources as Stage 1 can be utilised, with a very
similar architecture. Each pixel of the output data will require
N multiply-accumulate operations; therefore, the throughput
of a single MAC is 2N + 1 clock cycles per pixel (2 cycles
for the MAC and one for saving and zeroing the accumulator).
Since R MAC units are implemented in Stage 1, the
processing time of this stage will be M × L × N(2N + 1)/R
clock cycles.
E. Experimental Results 
In order to assess the proposed architecture, it has been 
implemented on the Altera DE2-115 Board. Table V outlines 
the hardware and power requirements of the proposed 
architecture and Table VI outlines the processing time of the 
proposed architecture for different data sets from the AVIRIS
Cuprite. Since the main objective of this work is the 
acceleration of the computations, the I/O power and time were 
excluded from these results. 
When the hardware multipliers are fully used and Logic
Elements (LEs) are utilised instead (R > 136), a significant
increase in the power consumption can be seen: when R
increases from 64 to 128, the power consumption increases by
9%, while when it increases from 128 to 256, the power
consumption increases by more than 100%. This is because
the hardware multipliers consume much less power
(experimental results show that an LE-based multiplier
requires almost 9 times more power than a hardware
multiplier). The maximum R that could be realised on the
DE2-115 is 136, which is the optimal R for the covariance
computation of 16 spectral bands, so data with a larger number
of bands will require more than one round for computing the
covariance matrix. At R = 136, the power consumption was
1.05 W.
TABLE V. HARDWARE AND POWER RESOURCES OF THE KLT SYSTEM 
R Multipliers % LEs % Power (mW) 
8 324 61 49000 43 767 
16 340 64 49300 43 775 
32 372 70 50000 44 791 
36 380 71 50000 44 794 
64 436 82 51000 45 820 
128 532 100 64000 56 981 
136 532 100 71000 62 1053 
256 532 100 165000 144 2018 
 
 
TABLE VI. THE EXECUTION TIME (MS) OF THE PROPOSED KLT SYSTEM

| Data set   | R   | S 1   | S 2  | S 3  | Overlap | Total Time | Time % |
| 256×256×16 | 17  | 10.56 | 3.5  | 22   | 0.5     | 35.56      | 1.41   |
|            | 34  | 5.28  | 3.5  | 11   | 0.5     | 19.28      | 2.59   |
|            | 68  | 2.64  | 3.5  | 5.5  | 0.5     | 11.14      | 4.49   |
|            | 136 | 1.32  | 3.5  | 2.75 | 0.5     | 7.07       | 7.07   |
| 512×512×16 | 17  | 42.24 | 3.5  | 88   | 0.5     | 133.24     | 0.37   |
|            | 34  | 21.12 | 3.5  | 44   | 0.5     | 68.12      | 0.73   |
|            | 68  | 10.56 | 3.5  | 22   | 0.5     | 35.56      | 1.4    |
|            | 136 | 5.28  | 3.5  | 11   | 0.5     | 19.28      | 2.59   |
| 256×256×32 | 33  | 21.12 | 25.3 | 44   | 5.2     | 85.22      | 6.1    |
|            | 66  | 10.56 | 25.3 | 22   | 5.2     | 52.66      | 9.87   |
|            | 132 | 5.28  | 25.3 | 11   | 5.2     | 36.38      | 14.29  |
|            | 264 | 2.64  | 25.3 | 5.5  | 5.2     | 28.24      | 18.41  |
| 512×512×32 | 33  | 84.48 | 25.3 | 176  | 5.2     | 280.58     | 1.85   |
|            | 66  | 42.24 | 25.3 | 88   | 5.2     | 150.34     | 3.46   |
|            | 132 | 21.12 | 25.3 | 44   | 5.2     | 85.22      | 6.1    |
|            | 264 | 10.56 | 25.3 | 22   | 5.2     | 52.66      | 9.87   |
 
The processing time of each individual stage is stated, where
different values of R are considered. The overlap time is the
difference between the complete computation of the
eigenvectors and the point at which some of the eigenvectors
are ready for the Eigen mapping; hence, the overlap represents
the extra parallelism offered by the proposed Matrix Reduction
Technique (MRT).
The following can be noticed from Table VI:
• The processing times of Stage 1 and Stage 3 are
proportional to both the spectral and spatial
dimensions, while Stage 2 is only proportional to the
spectral dimension.
• The processing time of Stage 2 relative to the overall
processing time is much more significant for a larger
number of spectral bands (for 8 spectral bands: up to
33%; for 16 spectral bands: up to 68%; for 32 spectral
bands: up to 91%).
• The overlap time is independent of the spatial
dimensions and exponentially proportional to the
spectral dimension; hence, it is much more significant
for larger spectral bands (for 8 spectral bands: up to
4.89%; for 16 spectral bands: up to 11.82%; for 32
spectral bands: up to 18.4%).
Therefore, the experimental results have shown that the
additional level of parallel computing (overlap) is more
significant for larger spectral bands (hyperspectral rather than 
multispectral); hence, the benefit of the proposed architecture 
is more noticeable for hyperspectral data. 
VI. INTEGER KLT DESIGN APPROACH 
As shown in Fig. 2, the computation process of the Integer 
KLT is very similar to the computation process of the KLT 
algorithm. Therefore, a very similar computational flow will 
be considered, where the Eigen mapping will be performed 
through the PLUS factorization and the lifting scheme. The 
hardware computation of these 2 processes will be discussed. 
A. PLUS Factorization 
The PLUS factorization is an iterative process and offers an
insignificant level of parallel computing. Moreover, this
process requires a floating-point implementation, as the main
objective of the Integer KLT is lossless data transformation.
Therefore, it is not feasible to execute such a highly
sequential process in hardware. Table VII outlines the time
required to perform the PLUS factorization of different data
sets on the soft NIOS II and the hardwired Cortex M-3 processor.
TABLE VII. EXECUTION TIME (MS) OF THE PLUS FACTORIZATION 
 8×8 16×16 32×32 
NIOS II 6.7 71 880 
Cortex M-3 1.8 15.2 142.8 
B. Lifting Scheme 
In order to simplify the illustration, Fig. 9 depicts the lifting
computation in hardware for 4 pixels. This process is executed
over 3 phases: multiplying by S, U and L, while the
multiplication by P (pivoting) is computationally insignificant.
The output results of each phase are saved into intermediate
FIFOs, and while any of the phases executes a certain pixel
set, the other phases can execute the next or the previous
pixel sets, and so on. The execution of each phase will require
N operation cycles (computing) plus N buffering cycles
(filling the buffer of the next phase).
Since the multiply-accumulate (MAC) units used in this
design require 2 clock cycles, the execution of each phase
will require 2 × N + N = 3N clock cycles. Since the order
of each intermediate FIFO needs to be reversed, the computing
operations cannot be performed in parallel with the buffering.
Consequently, the processing throughput can reach
approximately N pixels per 3N clock cycles for each unit of
Fig. 9. Therefore, for an (M × L × N) image, the execution
time for the lifting scheme will be slightly more than 3 × M ×
L × N clock cycles. Since L and S are lower triangular
matrices and U is an upper triangular matrix, resource sharing
of the MAC units of Fig. 9 is possible. Therefore, the required
number of MACs for a lifting unit (Fig. 9) is 5N/2 − 2; more
units will lead to faster computation.
C. Experimental Results 
Table VIII outlines the hardware and power requirements
of the proposed architecture, and Table IX outlines the
processing time of the proposed architecture for different data
sets from the AVIRIS Cuprite.
The maximum R could be realised on the target FPGA is 
156, this offers 2 lifting units for 32 spectral bands and more 
than one fourth of the optimal R for the covariance 
computations; so, the computation of the covariance matrix 
will be performed in 4 rounds when 32 spectral bands are 
processes. Since the hardware multipliers offer much less 
power consumption, when the hardware multipliers are fully 
used and Logic Elements are utilised instead (R=152), a 
significant increase in the power consumption can be seen. 
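The round count quoted above can be reproduced under one hypothetical reading: that the "optimal R" for the covariance computation equals the N(N+1)/2 distinct entries of the symmetric N × N covariance matrix (an assumption of ours, not stated in the paper):

```python
from math import ceil

def covariance_rounds(n_bands, r):
    """Rounds needed to compute the covariance matrix with r parallel
    MACs, assuming (hypothetically) that the optimal R equals the
    N(N+1)/2 distinct entries of the symmetric covariance matrix."""
    optimal_r = n_bands * (n_bands + 1) // 2
    return ceil(optimal_r / r)
```

Under this reading, N = 32 gives an optimal R of 528, and R = 156 yields ceil(528/156) = 4 rounds, matching the figure in the text.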
TABLE VIII. HARDWARE AND POWER RESOURCES OF THE INTEGER KLT 
R      Multipliers   %     LEs      %    Power (mW) 
18     351           66    51000    45   809 
36     387           73    52000    46   829 
38     391           83    53000    46   841 
76     467           88    54000    47   873 
152    532           100   90000    79   1261 
156    532           100   92000    81   1282 
 
The processing time of each individual stage is stated for 
different values of R. Since the hard Cortex M-3 
showed significantly better computational performance 
than the soft NIOS II (Table VII), the Cortex M-3 is 
considered in the assessment as a conceptual approach, as if the 
proposed architecture were implemented on an FPGA 
platform incorporating a hard processor. The share of 
the PLUS factorization in the overall processing time is 
more significant for larger numbers of spectral bands. 
While the computational requirements of the PLUS 
factorization are lower than those of the other processes (eigenvectors, 
covariance and lifting), the execution time of this process can 
be an overhead, especially if implemented on a less powerful 
processor such as the soft NIOS II. Therefore, it would be 
significantly valuable to have a factorization process similar to 
PLUS but with a less sequential computation, making it 
more suitable for hardware implementation. 
TABLE IX. EXECUTION TIME OF THE PROPOSED INTEGER KLT SYSTEM 
Spectral  Spatial  R    S1      S2     S3 (PLUS)         Lift     Total time (ms) 
                                       Cortex    NIOS             Cortex    NIOS 
16        256      17   10.56   3.5    15.2      71      32.5     56.48     112.28 
16        256      34   5.28    3.5    15.2      71      16.25    37.59     93.39 
16        256      68   2.64    3.5    15.2      71      8.125    28.145    83.945 
16        512      17   42.24   3.5    15.2      71      130      169.82    225.62 
16        512      34   21.12   3.5    15.2      71      65       94.26     150.06 
16        512      68   10.56   3.5    15.2      71      32.5     56.48     112.28 
32        256      33   21.12   25.3   142       880     65       310.3     1048.3 
32        256      66   10.56   25.3   142       880     32.5     355.8     1093.8 
32        512      33   84.48   25.3   142       880     260      505.3     1243.3 
32        512      66   42.24   25.3   142       880     130      453.3     1191.3 
VII. ADAPTIVE SYSTEM 
The similarity between the KLT and the Integer KLT can be 
utilised to realise an adaptive system. The adaptive 
computation flow is shown in Fig. 10; in this flow, most of 
the adaptive overhead resides in the processor firmware, 
while the hardware overhead is present only in the Eigen 
mapping (Lifting / Matrix Multiplication), which also shares 
most of the hardware resources. 
Fig. 10. The Computation Flow of the Adaptive KLT/ Integer KLT System 
VIII. CONCLUSION 
In this paper, a novel architecture for Adaptive KLT / Integer 
KLT computation has been proposed. This architecture offers 
an improvement over previous work by introducing a further 
level of parallel computing. The experimental results have 
shown that this improvement is more significant for 
hyperspectral data with a larger number of spectral bands. The 
experiments showed an overall improvement of up to 4.9%, 
11.8% and 18.4% for 8, 16 and 32 spectral bands, respectively. 
In addition, this paper presents a novel 
hardware architecture for the computation of the Integer KLT, 
which had previously been addressed only in the context of processor 
platforms (DSP, GPU or desktop). The 
scalability of this hardware architecture can offer a much higher 
level of parallel computing than processor platforms. 
 
Fig. 9. Hardware Architecture of Eigenvectors Computation 
 