Adaptive Hyperspectral Image Compression using the KLT and Integer KLT Algorithms

Chafik Egho, Surrey Space Centre, University of Surrey, Guildford, UK, C.Egho@surrey.ac.uk
Tanya Vladimirova, Department of Engineering, University of Leicester, Leicester, UK, tv29@leicester.ac.uk

Abstract—The use of the Karhunen-Loève Transform (KLT) for spectral decorrelation in the compression of hyperspectral satellite images results in improved performance. However, the KLT algorithm consists of sequential processes that are computationally intensive, such as the covariance matrix computation, the eigenvector evaluation and matrix multiplications. These processes slow down the overall computation of the KLT significantly. The traditional KLT can only offer lossy compression; therefore, a reversible KLT algorithm, named the Integer KLT, is used for lossless compression. The Integer KLT includes more computational processes and hence requires a longer processing time. The acceleration of these processes within the context of limited power and hardware budgets is the main objective of this paper. The computations of each of these processes are investigated thoroughly. Subsequently, a novel adaptive architecture for the computation of the KLT and the Integer KLT is proposed. The proposed system improves the traditional KLT performance compared with a previous architecture, and offers a significant improvement for hyperspectral data with a larger spectral dimension. The experiments showed an overall improvement of up to 4.9%, 11.8% and 18.4% for 8, 16 and 32 spectral bands, respectively. In addition, this paper addresses novel hardware aspects of the Integer KLT implementation. The scalability of this hardware architecture can offer a much higher level of parallel computing than processor platforms.

Keywords—KLT, Integer KLT, Hyperspectral Image Compression

I.
INTRODUCTION

The detailed and accurate information in hyperspectral images is of significant importance for a wide range of applications. However, the large number of spectral bands results in an excessively large volume of imaging data. This requires increased memory resources to store the raw images on board satellites and significant power consumption to transmit them to the ground. Both power and memory resources are major design constraints in spacecraft embedded applications, and therefore on-board hyperspectral image compression systems are designed to meet strict power and data storage capacity budgets.

Employing spectral decorrelation improves hyperspectral data compression significantly. While various techniques have been used for spectral decorrelation, the Karhunen-Loève transform (KLT) offers the best performance [1]. However, the floating-point outputs of the KLT process incur some data loss, which makes the KLT lossy. Therefore, a modified KLT algorithm, named the Integer KLT, was proposed in [2]. By introducing different matrix factorizations, the Integer KLT eliminates the floating-point output and achieves a lossless compression process. Fig. 1 demonstrates the KLT and DWT performances for lossy compression; Table I outlines the improvements offered by the Integer KLT over other techniques (Multi-Component Compression MCP, Linear Prediction LP, Band Differential BD and Integer Wavelet Transform IWT) for lossless compression.

The main objective of this research is the design of a spectral decorrelation subsystem as part of a hyperspectral image compression system on board satellites. A design aimed at an FPGA-based SoC platform will be introduced and different design parameters will be investigated: processing time, accuracy, hardware resources and power consumption. A previous related work developed a KLT computing system on an FPGA-based SoC platform [3], where a parallel approach was proposed for the mapping of the KLT algorithm.
However, in [3] only hyperspectral data with a limited number of bands were considered and low power consumption was not targeted. In [4] and [5], the computation of the KLT and the Integer KLT was addressed, respectively, achieving in both works an acceleration factor of approximately 50%. However, the design was targeted at the flash-based SmartFusion platform, which offers very limited hardware resources. In [6], the computation of the Integer KLT algorithm was optimised through its parallelization at the top level. The work targeted an on-board embedded implementation; however, only DSP and multiprocessor platforms were considered. In [7], a multiplier-less scheme of the Integer KLT was presented for three-dimensional (3-D) medical images. Various other works have addressed the implementation of the KLT and Integer KLT algorithms, such as [8], [9] and [10]; however, they were aimed at software rather than hardware realizations.

In this paper, a spectral decorrelation system for adaptive lossy/lossless hyperspectral image compression aimed at an FPGA system-on-a-chip platform is proposed. The proposed architecture adds a significant improvement to the hardware proposed in [3] by enhancing the level of parallel computing, especially for hyperspectral images with a large spectral dimension. In addition, the proposed system addresses a novel hardware architecture for the Integer KLT computation.

978-1-4799-5356-1/14/$31.00 ©2014 IEEE — 2014 NASA/ESA Conference on Adaptive Hardware and Systems (AHS)

Fig. 1. Comparison of the KLT and the DWT lossy compression performance [14]

TABLE I. THE IMPROVEMENT OF THE RKLT OVER OTHER TECHNIQUES [1][15]

             Cuprite                       Jasper
  MCP      BD&LP     IWT        MCP       BD&LP     IWT
  8.94%    7.7%      8.86%      15.04%    11.84%    14.46%

II. OVERVIEW

The Karhunen-Loève transform is an orthogonal linear transform, which is applied to 3-D data sets to de-correlate the data across the different bands.
If the 3-D data set is a hyperspectral image, the KLT will remove the correlations between the image bands, constructing a more compressible data set. Fig. 2 depicts the computational flows of the KLT and the Integer KLT algorithms. It can be seen that the two algorithms share some of the computation processes, such as the BandMean, the MeanSub, the Covariance Matrix and the Eigenvectors. On the other hand, the differences are:

- The BandMean vector is rounded to the nearest integer in the Integer KLT
- A PLUS matrix factorization is applied to the Eigenvectors in the Integer KLT
- The matrix multiplication (Eigenvectors × MeanSub) of the KLT is replaced with a matrix lifting in the Integer KLT

All these processes are performed on large matrices and are therefore computationally intensive, which slows down the overall computation process significantly. In the next section, each of these processes will be analysed thoroughly by breaking it down into primitive arithmetic operations. This is followed by a comprehensive analysis of the individual computations, inspecting the possibility and feasibility of suitable acceleration techniques. In the mathematical expressions of this paper, a hyperspectral image H is considered, where M and L represent the spatial dimensions and N the number of bands.

III. COMPUTATIONAL REQUIREMENTS

While some of the computation processes in Fig. 2 are simple repetitive operations, such as the BandMean and the MeanSub, others, such as the covariance matrix and eigenvector evaluations, are more complicated processes involving different sequential operations.

A. Band Mean, Mean Sub and Covariance Computations

The Band Mean is the computation of the mean value of each spectral band, as shown in equation (1), so the output is a vector $A \in R^{1 \times N}$, where N is the number of spectral bands. Therefore, the computation of each element requires M × L additions (accumulation) and a single division.
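As a software point of reference (a NumPy sketch, not the proposed hardware design; the tiny array shape is an assumption purely for illustration), the BandMean, MeanSub and covariance computations of this section can be expressed as:

```python
import numpy as np

# Toy stand-in for a hyperspectral cube: M x L spatial pixels, N bands,
# 14-bit unsigned samples, as in the AVIRIS data used later in the paper.
rng = np.random.default_rng(0)
M, L, N = 8, 8, 4
H = rng.integers(0, 2**14, size=(M, L, N)).astype(np.float64)

X = H.reshape(M * L, N).T                  # N x (M*L): one row per spectral band
A = X.sum(axis=1) / (M * L)                # BandMean: accumulate, then a single division
mean_sub = X - A[:, None]                  # MeanSub: normalize each band by its mean
cov = X @ X.T / (M * L) - np.outer(A, A)   # covariance: N x N symmetric matrix

# The two formulations of the covariance agree:
assert np.allclose(cov, mean_sub @ mean_sub.T / (M * L))
```

The final assertion confirms that subtracting the outer product of the band means from the raw correlation term is equivalent to computing the covariance of the mean-subtracted data directly.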
$A_n = \frac{1}{M \times L} \sum_{m=1}^{M} \sum_{l=1}^{L} H_{m,l,n} \quad (1)$

The MeanSub process is the normalization of the data: the mean value of each spectral band is subtracted from all the data of that band. Therefore, this process is very straightforward and only requires M × L × N subtractions.

The computation of the covariance matrix is shown in equation (2). The second term of this equation is the multiplication of the BandMean vector $A \in R^{1 \times N}$ with its transpose, so this only requires $\frac{N(N+1)}{2}$ multiplications. On the other hand, the first term of equation (2) is the matrix multiplication of the input image with its transpose. The result of this multiplication is a symmetric matrix in $R^{N \times N}$, where each element requires M × L multiply-accumulate operations. Therefore, the first term of equation (2) is much more computationally intensive, requiring M × L multiply-accumulate operations for each of the $\frac{N(N+1)}{2}$ elements.

Fig. 2. The Computation Process of the KLT and the Integer KLT (KLT: Band Mean → Mean Sub → Covariance → Eigenvectors → Eigenvectors × Mean Sub; Integer KLT: Band Mean → Rounding → Mean Sub → Covariance → Eigenvectors → PLUS → Lifting)

$COV = \frac{1}{M \times L} H \times H^{T} - A \times A^{T} \quad (2)$

where $COV \in R^{N \times N}$.

B. Eigenvectors Computation

In this process, the eigenvalues and eigenvectors of the covariance matrix are evaluated. Since the covariance matrix is real and symmetric, the Jacobi algorithm will be considered. Designed for real and symmetric matrices, the Jacobi algorithm is an iterative method that outperforms other techniques, such as the QR algorithm, in terms of output accuracy. The Jacobi algorithm computes both the eigenvalues and the eigenvectors. The eigenvalues are computed by diagonalizing the input matrix through iterative transformations, which eventually accumulate the eigenvalues on the matrix diagonal. The eigenvectors are the result of the multiplications of the transformation matrices. Fig.
3 illustrates the Jacobi algorithm, where the eigenvalues accumulate on the diagonal of matrix B, while V, initially an identity matrix, converges to the eigenvectors.

Fig. 3. The Computation Process of the Jacobi Algorithm

The computations in Fig. 3 are:

- The computation of θ, sin θ and cos θ to form the transform matrix $T(k) \in R^{N \times N}$.
- The matrix multiplications of equations (3) and (4).

$B_{k+1} = T(k)^{T} \times B_{k} \times T(k) \quad (3)$
$V_{k+1} = V_{k} \times T(k) \quad (4)$

Theoretically, a matrix multiplication of $N \times N$ dimensions requires $N^{3}$ element multiplications and $(N-1)N^{2}$ element additions. Nevertheless, taking into consideration that $T(k)$ is an identity matrix with only four non-one, non-zero elements, $T_{ii}$, $T_{jj}$, $T_{ij}$ and $T_{ji}$, the matrix multiplication in (3) alters only two rows and two columns, while in (4) only two columns are altered. Therefore, (3) requires 8N multiplications and 4N additions, while (4) requires 4N multiplications and 2N additions. Equations (3) and (4) can be decomposed into (5) to (9), and (10) and (11), respectively.

$B_{ii}^{k+1} = B_{ii}^{k}\cos^{2}\theta + B_{jj}^{k}\sin^{2}\theta - 2B_{ij}^{k}\sin\theta\cos\theta \quad (5)$
$B_{jj}^{k+1} = B_{ii}^{k}\sin^{2}\theta + B_{jj}^{k}\cos^{2}\theta + 2B_{ij}^{k}\sin\theta\cos\theta \quad (6)$
$B_{ij}^{k+1} = B_{ji}^{k+1} = (B_{ii}^{k} - B_{jj}^{k})\sin\theta\cos\theta + B_{ij}^{k}(\cos^{2}\theta - \sin^{2}\theta) \quad (7)$

for $n \in [1, N]$, $n \neq i$, $n \neq j$:

$B_{in}^{k+1} = B_{ni}^{k+1} = B_{in}^{k}\cos\theta - B_{jn}^{k}\sin\theta \quad (8)$
$B_{jn}^{k+1} = B_{nj}^{k+1} = B_{in}^{k}\sin\theta + B_{jn}^{k}\cos\theta \quad (9)$

for $n \in [1, N]$:

$V_{ni}^{k+1} = V_{ni}^{k}\cos\theta - V_{nj}^{k}\sin\theta \quad (10)$
$V_{nj}^{k+1} = V_{ni}^{k}\sin\theta + V_{nj}^{k}\cos\theta \quad (11)$

Theoretically, more iterations result in more accurate eigenvalues and eigenvectors; however, in [10] it is suggested that 10 sweeps (i.e. $10 \times \frac{N(N-1)}{2}$ iterations) are enough to obtain both the eigenvectors and the eigenvalues. Table II outlines the computational requirements of the eigenvector evaluation when 10 Jacobi sweeps are performed.

TABLE II. COMPUTATIONAL REQUIREMENTS OF THE EIGENVECTOR EVALUATION

  Addition         5N(N − 1)(…)
  Subtraction      5N(N − 1)(4 + 2N)
  Multiplication   5N(N − 1)(11 + 8N)
  Division         5N(N − 1)
  Trigonometric    15N(N − 1)

C. Matrix Multiplication (Eigenvectors by MeanSub)

This process is also referred to as the Eigen mapping in the literature.
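Before moving to the Eigen mapping, the Jacobi iteration of equations (3) to (11) can be sketched in software. The sketch below is a reference implementation only: it uses full matrix products for clarity, whereas the reduced row/column updates of equations (5) to (11) touch far fewer elements.

```python
import numpy as np

def jacobi_eig(cov, sweeps=10):
    """Diagonalize a real symmetric matrix with cyclic Jacobi rotations.

    B accumulates the eigenvalues on its diagonal; V, initialised to the
    identity, converges to the eigenvectors (equations (3) and (4)).
    """
    B = cov.astype(np.float64).copy()
    N = B.shape[0]
    V = np.eye(N)
    for _ in range(sweeps):                  # 10 sweeps = 5N(N-1) rotations
        for i in range(N - 1):
            for j in range(i + 1, N):
                if abs(B[i, j]) < 1e-12:     # already annihilated
                    continue
                # Rotation angle chosen so that the new B[i, j] becomes zero
                theta = 0.5 * np.arctan2(2 * B[i, j], B[j, j] - B[i, i])
                c, s = np.cos(theta), np.sin(theta)
                T = np.eye(N)                # identity except four elements
                T[i, i] = T[j, j] = c
                T[i, j], T[j, i] = s, -s
                B = T.T @ B @ T              # equation (3)
                V = V @ T                    # equation (4)
    return np.diag(B), V
```

On random symmetric test matrices, 10 sweeps reproduce the eigenvalues of `numpy.linalg.eigvalsh` to machine precision, and the returned V satisfies V·diag(λ)·Vᵀ ≈ B₀.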
As shown in equation (12), this process is the multiplication of the eigenvectors with the normalized data. Each pixel of the output data is the result of the vector multiplication of an eigenvector with the set of pixels of the same spatial coordinates. Therefore, the computation of each pixel requires N multiply-accumulate operations, and the whole image requires $M \times L \times N^{2}$ multiply-accumulate operations.

$\begin{bmatrix} v_{1,1} & \cdots & v_{1,N} \\ \vdots & \ddots & \vdots \\ v_{N,1} & \cdots & v_{N,N} \end{bmatrix} \times \begin{bmatrix} h_{1,1,1} & \cdots & h_{M,L,1} \\ \vdots & \ddots & \vdots \\ h_{1,1,N} & \cdots & h_{M,L,N} \end{bmatrix} = Output\ Data \quad (12)$

D. PLUS Matrix Factorization

The PLUS matrix factorization is used to represent the eigenvectors matrix as four matrices rather than one. These four matrices (P, L, U and S) maintain the same information as the original eigenvectors matrix, so when applying the Eigen mapping process, these matrices are used as the eigenvectors. By applying the Eigen mapping on these four matrices, the output error incurred from the rounding of the fractional numbers is eliminated; hence, a lossless process is maintained. This matrix factorization is a sequentially iterative process, as shown in Fig. 4. The required number of iterations is (N − 1), and in each of these iterations the following processes are performed sequentially: pivoting and the L, U and S factorizations. The computational requirements of this process are outlined in Table III.

Fig. 4. The PLUS Matrix Factorization Process [6]

TABLE III. COMPUTATIONAL REQUIREMENTS OF THE PLUS FACTORISATION

  Addition         $\sum_{k=1}^{N-1}(N-k)$
  Subtraction      $(N-1) + \sum_{k=1}^{N-1}(N-k)(N-k-1)$
  Multiplication   $N + \sum_{k=1}^{N-1}(N-k)^{2} + \sum_{k=1}^{N-1}(N-k) + 2\sum_{k=1}^{N-1}(N-k-1)$
  Division         $(N-1) + \sum_{k=1}^{N-1}(N-k-1)(N-k)$

E. Lifting Scheme

Instead of the matrix multiplication (MeanSub × Eigenvectors) of the classical KLT, a lifting technique is employed in the Integer KLT. In the lifting technique, the PLUS matrices are applied to the MeanSub data set according to the reversible lifting scheme presented in [2].
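The reversibility that such lifting relies on can be illustrated in isolation. The sketch below demonstrates the general principle for a single unit-triangular factor and is not the exact scheme of [2]: because the factor has ones on its diagonal, each output element is the corresponding input element plus a rounded prediction formed from already-processed elements, so the decoder can regenerate the identical prediction and subtract it, recovering the integers exactly.

```python
import numpy as np

def lift(Lm, x):
    """Forward lifting with a unit lower-triangular matrix: integers in, integers out."""
    y = np.array(x, dtype=np.int64)
    for i in range(len(x)):
        # Row i predicts from elements j < i only; the prediction is rounded.
        y[i] = x[i] + int(np.rint(Lm[i, :i] @ x[:i]))
    return y

def unlift(Lm, y):
    """Exact inverse: regenerate the same rounded predictions and subtract them."""
    x = np.array(y, dtype=np.int64)
    for i in range(len(y)):          # elements j < i are already recovered
        x[i] = y[i] - int(np.rint(Lm[i, :i] @ x[:i]))
    return x

rng = np.random.default_rng(1)
N = 6
Lfac = np.eye(N) + np.tril(rng.normal(size=(N, N)), k=-1)  # unit lower-triangular
x = rng.integers(0, 2**14, size=N)
assert np.array_equal(unlift(Lfac, lift(Lfac, x)), x)      # lossless round trip
```

The same argument applies phase by phase to the S, U, L and P factors, which is what makes the Integer KLT lossless despite the rounding after every stage.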
The lifting scheme can be mathematically represented as in equation (13). In order to avoid floating-point outputs, the output elements are rounded after each stage: after multiplying the vector X (the image data) with the S matrix, the output vector is rounded before it is multiplied with the U matrix, and so on.

$Y = round(round(round(X \times S) \times U) \times L) \times P \quad (13)$

where P is a permutation matrix; hence, multiplying by P is element swapping, where multiplications are only performed by ones and zeroes. Therefore, the permuting is not computationally intensive, as it only requires a loop through the vector, swapping certain elements. The S matrix is a sparse lower-triangular matrix; therefore, by applying a zero-check technique to the X × S multiplication, far fewer element multiplications are required. The most computationally intensive part of the lifting scheme is the multiplication with the upper-triangular matrix U and the lower-triangular matrix L.

IV. DESIGN CONSIDERATIONS

A. Hardware Platform and the Design Tools

The SRAM-based Cyclone IV FPGA from Altera (DE2-115 Development Board [13]) is considered in this work. The Cyclone FPGA series is designed to meet the needs of low-power applications, such as space applications. In addition to the hardware logic and the SRAM memory resources, this platform offers more than 500 embedded hardware multipliers, which provide optimal performance and very low power consumption. Moreover, the soft processor NIOS II is provided by this platform. While the NIOS II is considered a mid-range processor in terms of performance, it is adequate for the proposed architecture, as it only performs the high-level management and scheduling of the computation stages rather than the actual computations, which are executed in the FPGA fabric. The Altera Quartus II IDE will be used for the development. This IDE provides the PowerPlay power estimator tool, which will be used for estimating the power consumption.
Moreover, it offers ready-made and pre-tested intellectual property blocks named MegaFunctions; these IP blocks are optimised (for speed or area) for Altera devices and support simple functions (addition, multiplication, subtraction, etc.) as well as complex functions, such as trigonometric functions, for both fixed- and floating-point data formats.

B. Input Data

This work targets hyperspectral images with a large number of bands (in the order of hundreds). The Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) hyperspectral images [12] are used as test images, since they comprise 224 spectral bands. In particular, the Cuprite image, shown in Fig. 5, is employed in the testing presented in this paper. The AVIRIS Cuprite raw data consist of 14-bit unsigned integers. As the main focus of this research is the spectral and not the spatial compression, a portion of the AVIRIS Cuprite image, comprising 512 × 512 pixels, will be considered in the experimental work.

Fig. 5. The AVIRIS Cuprite Hyperspectral Image [12]

V. KLT DESIGN APPROACH

The hardware computational flow of the KLT algorithm is shown in Fig. 6. This flow consists of three stages: the covariance matrix and the mean vector are computed in parallel in Stage 1; the eigenvectors and the MeanSub (normalization) are computed in parallel in Stage 2; and the Eigen mapping is performed in Stage 3.

Fig. 6. The KLT Computational Flow on Hardware

A. Stage 1

In this stage, both the mean vector and the covariance matrix are computed, equations (1) and (2). The first equation only needs accumulators, and the division can be performed by wire shifting, as most spatial dimensions are powers of 2, i.e. 128, 256 or 512. The second equation is more computationally intensive, especially the first term ($H \times H^{T}$), which is the multiplication of a matrix with its transpose. Therefore, the result of this matrix multiplication is a symmetric matrix of size N.
The number of elements that need to be computed is $K = \frac{N(N+1)}{2}$, and each element requires $M \times L$ multiply-accumulate operation cycles. Fig. 7 illustrates the hardware architecture for this stage; each FIFO holds a spectral band, and equations (1) and (2) are executed in parallel by the accumulator units (ACC) and the multiply-accumulate units (MAC). The optimal number of MACs is $K = \frac{N(N+1)}{2}$, at which the covariance computation can be performed in a single round ($M \times L$ MAC operation cycles). However, for hyperspectral images N can be large, so R MAC units can be utilised, where R is a divisor of K. Since the used MAC units require 2 clock cycles, the execution of this process requires $2 \times M \times L \times \frac{K}{R}$ clock cycles.

Fig. 7. Hardware Architecture of Stage 1

The used FPGA (Cyclone IV) can execute a subset of the hyperspectral AVIRIS data, while the whole image is saved in the on-board SDRAM. Larger FPGAs, such as the Cyclone V or the Stratix, can offer much larger memory resources; thus, larger data subsets can be processed at a time, saving input/output time. Nevertheless, dividing hyperspectral data into subsets, known as clustering and tiling, does not only reduce the on-chip memory requirements; it has also been proven to improve the resilience of the algorithm to Single Event Upsets (SEUs) [16] [17]. In [1], a comprehensive analysis of the benefits introduced by clustering and tiling is presented, where the simulations showed optimal cluster sizes for different tilings. These simulations considered different AVIRIS data sets and showed that the optimal cluster size varies between 16 and 32 spectral bands. While the used Cyclone IV can process 32 spectral bands in three passes, the Cyclone V offers four times larger memory resources and can process 32 bands in a single pass.

B. Stage 2

The normalization and the eigenvectors computation are executed in this stage; the first is a straightforward set of subtractions that can be performed in parallel, similar to Stage 1.
The eigenvectors computation is the most complicated process of the KLT computation, and the computation time increases exponentially with the number of spectral bands. Therefore, the acceleration of this process is very significant, especially for hyperspectral data with a large number of spectral bands. Fig. 8 illustrates the proposed hardware architecture of the Jacobi algorithm. The processed eigenvectors and eigenvalues are saved in two sets of FIFOs. These FIFOs are connected to multiplexing logic that manages data fetching and loading to and from the computing logic. The computing logic is responsible for the main computations: the θ Computer estimates θ, sin θ and cos θ; the Eq(5,6,7) Computer of Fig. 8 computes equations (5), (6) and (7); and 8 multipliers, 2 adders and 2 subtractors compute equations (8), (9), (10) and (11). All three parts of the accelerator (FIFOs, Mux Logic and computing logic) are controlled and scheduled by the on-chip NIOS processor.

Fig. 8. Hardware Architecture of the Eigenvectors Computation

Since a fixed-point implementation would result in a significant output error for such an iterative algorithm, a floating-point data format is considered for the eigenvectors computation. The Altera MegaFunctions have been utilized for most of the computation logic. Since the floating-point multiplications and additions require 11 and 14 clock cycles, respectively, executing equations (8), (9), (10) and (11) takes approximately 50 + 16 × (N − 2) clock cycles. On the other hand, the θ Computer and the Eq(5,6,7) Computer require up to 125 clock cycles each. Therefore, a single Jacobi iteration requires approximately 300 + 16 × (N − 2) clock cycles.

C. Matrix Reduction Technique

In [11], an online matrix reduction technique was proposed. In addition to the acceleration offered by this technique, it also offers partial computation, so that some of the eigenvectors/eigenvalues are computed before the complete computation of all the eigenvectors.
This can be very valuable for the KLT computation on hardware, as the Eigen mapping computation can be performed partially in parallel with the eigenvectors computation. Table IV outlines the convergence of the eigenvectors for the proposed technique, where different data sets of the AVIRIS Cuprite were considered. It can be noticed that some of the eigenvectors are computed and ready for the next stage from sweeps 4 and 5.

TABLE IV. EIGENVECTORS CONVERGENCE THROUGH 8 SWEEPS

  Sweep   8×8   16×16   32×32   64×64
  1       -     -       -       -
  2       -     -       -       -
  3       -     -       -       -
  4       1     1       1       1
  5       6     4       3       1
  6       8     13      20      4
  7       8     16      29      30
  8       8     16      32      64

D. Stage 3

The Eigen mapping is the matrix multiplication of equation (12), so the main operation is multiply-accumulate, and the same resources as in Stage 1 can be utilised, with a very similar architecture. Each pixel of the output data requires N multiply-accumulate operations; therefore, the throughput of a single MAC is 2N + 1 clock cycles per pixel (2 cycles per MAC operation and one for saving and zeroing the accumulator). Since R MAC units are implemented in Stage 1, the processing time of this stage is $M \times L \times N \times (2N + 1)/R$ clock cycles.

E. Experimental Results

In order to assess the proposed architecture, it has been implemented on the Altera DE2-115 board. Table V outlines the hardware and power requirements of the proposed architecture, and Table VI outlines the processing time of the proposed architecture for different data sets from the AVIRIS Cuprite. Since the main objective of this work is the acceleration of the computations, the I/O power and time were excluded from these results. When the hardware multipliers are fully used and Logic Elements (LEs) are utilised instead (R > 136), a significant increase in the power consumption can be seen: when R increases from 64 to 128, the power consumption increases by 9%, while when it increases from 128 to 256, the power consumption increases by more than 100%.
This is because the hardware multipliers consume much less power (experimental results show that an LE-based multiplier requires almost 9 times more power than a hardware multiplier). The maximum R that could be realised on the DE2-115 is 136, which is the optimal R for the covariance computation of 16 spectral bands. Data with a larger number of bands will therefore require more than one round for computing the covariance matrix. At R = 136, the power consumption was 1.05 W.

TABLE V. HARDWARE AND POWER RESOURCES OF THE KLT SYSTEM

  R     Multipliers   %     LEs      %     Power (mW)
  8     324           61    49000    43    767
  16    340           64    49300    43    775
  32    372           70    50000    44    791
  36    380           71    50000    44    794
  64    436           82    51000    45    820
  128   532           100   64000    56    981
  136   532           100   71000    62    1053
  256   532           100   165000   144   2018

TABLE VI. THE EXECUTION TIME (MS) OF THE PROPOSED KLT SYSTEM

  Data set       R     S1     S2     S3     Overlap   Overlap %   Total
  256×256×16     17    10.56  3.5    22     0.5       1.41        35.56
                 34    5.28   3.5    11     0.5       2.59        19.28
                 68    2.64   3.5    5.5    0.5       4.49        11.14
                 136   1.32   3.5    2.75   0.5       7.07        7.07
  512×512×16     17    42.24  3.5    88     0.5       0.37        133.24
                 34    21.12  3.5    44     0.5       0.73        68.12
                 68    10.56  3.5    22     0.5       1.4         35.56
                 136   5.28   3.5    11     0.5       2.59        19.28
  256×256×32     33    21.12  25.3   44     5.2       6.1         85.22
                 66    10.56  25.3   22     5.2       9.87        52.66
                 132   5.28   25.3   11     5.2       14.29       36.38
                 264   2.64   25.3   5.5    5.2       18.41       28.24
  512×512×32     33    84.48  25.3   176    5.2       1.85        280.58
                 66    42.24  25.3   88     5.2       3.46        150.34
                 132   21.12  25.3   44     5.2       6.1         85.22
                 264   10.56  25.3   22     5.2       9.87        52.66

The processing time of each individual stage is stated for different values of R. The overlap time is the difference between the complete computation of the eigenvectors and the point at which some of the eigenvectors are ready for the Eigen mapping; hence, the overlap represents the extra parallelism offered by the proposed Matrix Reduction Technique (MRT).
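The Stage 1 and Stage 3 entries of Table VI can be cross-checked against the cycle-count formulas derived in Sections V-A and V-D. The sketch below assumes a nominal 100 MHz fabric clock; this clock frequency is an illustrative assumption and is not stated above.

```python
def stage1_ms(M, L, N, R, f_mhz=100.0):
    # Covariance: K = N(N+1)/2 symmetric elements, M*L MAC operations each,
    # 2 clock cycles per MAC, R MAC units working in parallel.
    K = N * (N + 1) // 2
    return 2 * M * L * K / R / (f_mhz * 1e3)

def stage3_ms(M, L, N, R, f_mhz=100.0):
    # Eigen mapping: M*L*N output pixels, (2N + 1) cycles per pixel, R MACs.
    return M * L * N * (2 * N + 1) / R / (f_mhz * 1e3)

# 256 x 256 x 16 cube with R = 17 (Table VI lists S1 = 10.56 ms, S3 = 22 ms)
print(round(stage1_ms(256, 256, 16, 17), 2))   # 10.49
print(round(stage3_ms(256, 256, 16, 17), 2))   # 20.35
```

Under the assumed clock, the model lands close to the measured 10.56 ms and 22 ms, with the remainder plausibly attributable to scheduling and buffering overhead.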
The following can be noticed from Table VI:

- The processing times of Stage 1 and Stage 3 are proportional to both the spectral and the spatial dimensions, while Stage 2 depends only on the spectral dimension.
- The share of Stage 2 in the overall processing time is much more significant for a larger number of spectral bands (for 8 spectral bands: up to 33%; for 16 spectral bands: up to 68%; for 32 spectral bands: up to 91%).
- The overlap time is independent of the spatial dimensions and exponentially proportional to the spectral dimension; hence, it is much more significant for a larger number of spectral bands (for 8 spectral bands: up to 4.89%; for 16 spectral bands: up to 11.82%; for 32 spectral bands: up to 18.4%).

Therefore, the experimental results have shown that the additional level of parallel computing (the overlap) is more significant for larger spectral dimensions (hyperspectral rather than multispectral); hence, the benefit of the proposed architecture is more noticeable for hyperspectral data.

VI. INTEGER KLT DESIGN APPROACH

As shown in Fig. 2, the computation process of the Integer KLT is very similar to that of the KLT algorithm. Therefore, a very similar computational flow will be considered, where the Eigen mapping is performed through the PLUS factorization and the lifting scheme. The hardware computation of these two processes is discussed below.

A. PLUS Factorization

The PLUS factorization is an iterative process and offers an insignificant level of parallel computing. Moreover, this process requires a floating-point implementation, as the main objective of the Integer KLT is lossless data transformation. Therefore, it is not feasible to execute such a highly sequential process in hardware. Table VII outlines the time required to perform the PLUS factorization of different data sets on the soft NIOS II and the hard Cortex-M3 processor.

TABLE VII.
EXECUTION TIME (MS) OF THE PLUS FACTORIZATION

              8×8    16×16   32×32
  NIOS II     6.7    71      880
  Cortex-M3   1.8    15.2    142.8

B. Lifting Scheme

In order to simplify the illustration, Fig. 9 depicts the lifting computation in hardware for 4 pixels. This process is executed over three phases: multiplying by S, U and L, while the multiplication by P (pivoting) is computationally insignificant. The output results of each phase are saved into intermediate FIFOs, and while any of the phases executes a certain pixel set, the other phases can execute the next or the previous pixel sets, and so on. The execution of each phase requires N operation cycles (computing) plus N buffering cycles (filling the buffer of the next phase). Since the multiply-accumulate (MAC) units used in this design require 2 clock cycles, the execution of each phase requires (2 × N + N = 3N) clock cycles. Since the order of each intermediate FIFO needs to be reversed, the computing operations cannot be performed in parallel with the buffering. Consequently, the processing throughput can reach approximately N pixels per 3N clock cycles for each unit of Fig. 9. Therefore, for an (M × L × N) image, the execution time of the lifting scheme is slightly more than 3 × M × L × N clock cycles. Since L and S are lower-triangular matrices and U is an upper-triangular matrix, resource sharing of the MAC units of Fig. 9 is possible. Therefore, the required number of MAC units for a lifting unit (Fig. 9) is $\frac{5N}{2} - 2$; more units lead to a faster computation.

C. Experimental Results

Table VIII outlines the hardware and power requirements of the proposed architecture, and Table IX outlines the processing time of the proposed architecture for different data sets from the AVIRIS Cuprite.
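The lifting cost model above can likewise be evaluated numerically and compared against the Lift column of Table IX. The 100 MHz clock below is an illustrative assumption (the clock frequency is not stated above):

```python
def lifting_ms(M, L, N, units=1, f_mhz=100.0):
    # Each lifting unit processes N pixels per ~3N clock cycles,
    # so an M x L x N image needs about 3*M*L*N cycles per unit.
    return 3 * M * L * N / units / (f_mhz * 1e3)

def macs_per_lifting_unit(N):
    # MAC count with resource sharing across the triangular S, U and L phases.
    return 5 * N // 2 - 2

print(round(lifting_ms(256, 256, 16), 2))   # 31.46 (Table IX lists 32.5 ms)
print(macs_per_lifting_unit(16))            # 38
```

Doubling the number of lifting units halves the modelled time, matching the halving of the Lift column between successive R values in Table IX.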
The maximum R that could be realised on the target FPGA is 156; this offers two lifting units for 32 spectral bands and more than one fourth of the optimal R for the covariance computations, so the computation of the covariance matrix is performed in four rounds when 32 spectral bands are processed. Since the hardware multipliers consume much less power, a significant increase in the power consumption can be seen when the hardware multipliers are fully used and Logic Elements are utilised instead (R = 152).

TABLE VIII. HARDWARE AND POWER RESOURCES OF THE INTEGER KLT

  R     Multipliers   %     LEs     %    Power (mW)
  18    351           66    51000   45   809
  36    387           73    52000   46   829
  38    391           83    53000   46   841
  76    467           88    54000   47   873
  152   532           100   90000   79   1261
  156   532           100   92000   81   1282

The processing time of each individual stage is stated for different values of R. Since the hard Cortex-M3 showed a significant computational performance advantage over the soft NIOS II (Table VII), the Cortex-M3 is considered in the assessment as a conceptual approach, as if the proposed architecture were implemented on an FPGA platform incorporating a hard processor. The share of the PLUS factorization in the overall processing time is more significant for a larger number of spectral bands. While the computational requirements of the PLUS factorization are lower than those of the other processes (eigenvectors, covariance and lifting), the execution time of this process can be an overhead, especially if it is implemented on a less powerful processor, such as the soft NIOS II. Therefore, it would be valuable to have a factorization process similar to the PLUS but with a less sequential computation process, making it more suitable for hardware implementation.

TABLE IX.
EXECUTION TIME (MS) OF THE PROPOSED INTEGER KLT SYSTEM

  Spectral  Spatial  R    S1     S2     PLUS (Cortex)  PLUS (NIOS)  Lift    Total (Cortex)  Total (NIOS)
  16        256      17   10.56  3.5    15.2           71           32.5    56.48           112.28
                     34   5.28   3.5    15.2           71           16.25   37.59           93.39
                     68   2.64   3.5    15.2           71           8.125   28.145          83.945
            512      17   42.24  3.5    15.2           71           130     169.82          225.62
                     34   21.12  3.5    15.2           71           65      94.26           150.06
                     68   10.56  3.5    15.2           71           32.5    56.48           112.28
  32        256      33   21.12  25.3   142            880          65      310.3           1048.3
                     66   10.56  25.3   142            880          32.5    355.8           1093.8
            512      33   84.48  25.3   142            880          260     505.3           1243.3
                     66   42.24  25.3   142            880          130     453.3           1191.3

VII. ADAPTIVE SYSTEM

The similarity between the KLT and the Integer KLT can be utilised to realise an adaptive system. The adaptive computation flow is shown in Fig. 10; in this flow, most of the adaptive overhead resides in the processor firmware, while the hardware overhead resides only in the Eigen mapping (lifting/matrix multiplication), which also shares most of the hardware resources.

Fig. 10. The Computation Flow of the Adaptive KLT/Integer KLT System

VIII. CONCLUSION

In this paper, a novel architecture for adaptive KLT/Integer KLT computation has been proposed. This architecture offers an improvement over previous work by introducing a further level of parallel computing. The experimental results have shown that this improvement is more significant for hyperspectral data with a larger number of spectral bands. The experiments showed an overall improvement of up to 4.9%, 11.8% and 18.4% for 8, 16 and 32 spectral bands, respectively. In addition, the proposed architecture addresses a novel hardware architecture for the computation of the Integer KLT, which had previously been addressed only in the context of processor platforms (DSP, GPU or desktop). The scalability of this hardware architecture can offer a much higher level of parallel computing than processor platforms.

REFERENCES

[1] Rizuan Mat Noor and Tanya Vladimirova, "Integer KLT Design Space Exploration for Hyperspectral Satellite Image Compression", Proceedings of the 5th International Conference on Convergence and Hybrid Information Technology (ICHIT'11).
[2] P. Hao and Q. Shi, "Reversible integer KLT for progressive-to-lossless compression of multiple component images", Proc. IEEE Int. Conf. Image Processing, p. I-633, 2003.
[3] M. Fleury, R. P. Self, and A. C. Downton, "Development of a Fine-Grained Karhunen-Loeve Transform", Journal of Parallel and Distributed Computing, 64(4):520-535, 2004.
[4] C. Egho and T. Vladimirova, "Acceleration of the Karhunen-Loève Transform for System-on-a-Chip Platform", NASA/ESA Conference on Adaptive Hardware and Systems (AHS-2012), June 2012, Nuremberg, Germany.
[5] C. Egho and T. Vladimirova, "Hardware Acceleration of the Integer Karhunen-Loève Transform Algorithm for Satellite Image Compression", IEEE International Geoscience and Remote Sensing Symposium (IGARSS 2012), July 2012, Munich, Germany.
[6] N. R. Mat Noor and T. Vladimirova, "Parallel Implementation of Lossless Clustered Integer KLT Using OpenMP", Proc. of the 7th NASA/ESA Conference on Adaptive Hardware and Systems (AHS-2012), 25-28 June 2012, Nuremberg, Germany.
[7] L. Wang, J. Wu, L. Jiao and G. Shi, "3D medical image compression based on multiplierless low-complexity RKLT and shape-adaptive wavelet transform", Proc. ICIP, pp. 2521-2524, 2009.
[8] B. Penna, T. Tillo, E. Magli, and G. Olmo, "Transform Coding Techniques for Lossy Hyperspectral Data Compression," IEEE Trans. Geosci. Remote Sens., vol. 45, no. 5, pp. 1408-1421, May 2007.
[9] L. Xin, G. Lei, and L. Zhen, "Reversible Integer Principal Component Transform for Hyperspectral Imagery Lossless Compression," in IEEE International Conference on Control and Automation (ICCA 2007), Guangzhou, China, 2007, pp. 2968-2972.
[10] L. Galli and S. Salzo, "Lossless Hyperspectral Compression Using KLT," in IEEE International Geoscience and Remote Sensing Symposium (IGARSS 2004), Anchorage, USA, 2004.
[11] C. Egho and T. Vladimirova, "Eigenvectors Computation on a System-on-Chip Platform for Satellite On-Board Use", Proc.
of the 7th Jordanian International Electrical and Electronics Engineering Conference (JIEEEC 2011), 11-14 April 2011, pp. 145-149, Amman, Jordan.
[12] AVIRIS Airborne Visible/Infrared Imaging Spectrometer (Cuprite).
[13] www.altera.com/education/univ/materials/boards/de2-115/unv-de2-115-board.html
[14] Q. Du and J. E. Fowler, "Hyperspectral Image Compression Using JPEG2000 and Principal Component Analysis," IEEE Geoscience and Remote Sensing Letters, vol. 4, no. 2, pp. 201-205, April 2007.
[15] I. Blanes and J. Serra-Sagristà, "Clustered Reversible-KLT for Progressive Lossy-to-Lossless 3d Image Coding," Data Compression Conference (DCC '09), pp. 233-242, 16-18 March 2009.
[16] Q. Du, W. Zhu, H. Yang and J. E. Fowler, "Segmented principal component analysis for parallel compression of hyperspectral imagery", IEEE Geosci. Remote Sens. Lett., vol. 6, no. 4, pp. 713-717, 2009.
[17] T. Vladimirova, M. Meerman, and A. Curiel, "On-Board Compression of Multispectral Images for Small Satellites," in IEEE International Geoscience and Remote Sensing Symposium (IGARSS 2006), Denver, Colorado, USA, 2006, pp. 3533-3536.

Fig. 9. Hardware Architecture of the Lifting Scheme