Adaptive Hyperspectral Image Compression using the KLT and Integer KLT Algorithms

Chafik Egho, Surrey Space Centre, University of Surrey, Guildford, UK, C.Egho@surrey.ac.uk
Tanya Vladimirova, Department of Engineering, University of Leicester, Leicester, UK, tv29@leicester.ac.uk

Abstract—The use of the Karhunen-Loève Transform (KLT) for spectral decorrelation in the compression of hyperspectral satellite images results in improved performance. However, the KLT algorithm consists of sequential processes that are computationally intensive, such as the covariance matrix computation, the eigenvector evaluation and matrix multiplications. These processes slow down the overall computation of the KLT significantly. The traditional KLT can only offer lossy compression; therefore, a reversible KLT algorithm, named the Integer KLT, is used for lossless compression. The Integer KLT includes more computational processes and hence requires a longer processing time. The acceleration of these processes within the context of limited power and hardware budgets is the main objective of this paper. The computations of each of these processes are investigated thoroughly. Subsequently, a novel adaptive architecture for the computation of the KLT and the Integer KLT is proposed. The proposed system improves the traditional KLT performance compared with a previous architecture, and offers a significant improvement for hyperspectral data with a larger spectral dimension. The experiments showed an overall improvement of up to 4.9%, 11.8% and 18.4% for 8, 16 and 32 spectral bands, respectively. In addition, this paper addresses novel hardware aspects of the Integer KLT implementation. The scalability of this hardware architecture can offer a much higher level of parallel computing than processor platforms.

Keywords—KLT, Integer KLT, Hyperspectral Image Compression

I.
INTRODUCTION

The detailed and accurate information in hyperspectral images is of significant importance for a wide range of applications. However, the large number of spectral bands results in an excessively large volume of imaging data. This requires increased memory resources to store the raw images on board satellites and significant power consumption to transmit them to the ground. Both power and memory resources are major design constraints in spacecraft embedded applications, and therefore on-board hyperspectral image compression systems are designed to meet strict power and data storage capacity budgets.

Employing spectral decorrelation improves hyperspectral data compression significantly. While various techniques have been used for spectral decorrelation, the Karhunen-Loève transform (KLT) offers the best performance [1]. However, the floating-point outputs of the KLT process incur some data loss, which makes the KLT lossy. Therefore, a modified KLT algorithm, named the Integer KLT, was proposed in [2]. By introducing different matrix factorizations, the Integer KLT eliminates the floating-point output and achieves a lossless compression process. Fig. 1 demonstrates the KLT and DWT performances for lossy compression; Table I outlines the improvements offered by the Integer KLT over other techniques (Multi-Component Compression MCP, Linear Prediction LP, Band Differential BD and Integer Wavelet Transform IWT) for lossless compression.

The main objective of this research is the design of a spectral decorrelation subsystem as part of a hyperspectral image compression system on board satellites. A design aimed at an FPGA-based SoC platform will be introduced and different design parameters will be investigated: processing time, accuracy, hardware resources and power consumption. A previous related work developed a KLT computing system on an FPGA-based SoC platform [3], where a parallel approach was proposed for the mapping of the KLT algorithm.
However, in [3] only hyperspectral data with a limited number of bands were considered and low power consumption was not targeted. In [4] and [5], the computation of the KLT and the Integer KLT was addressed, respectively, achieving in both works an acceleration factor of approximately 50%. However, the design was targeted at the flash-based SmartFusion platform, which offers very limited hardware resources. In [6], the computation of the Integer KLT algorithm was optimised through its parallelization at the top level. The work targeted an on-board embedded implementation; however, only DSP and multiprocessor platforms were considered. In [7], a multiplier-less scheme of the Integer KLT was presented for three-dimensional (3-D) medical images. Various other works have addressed the implementation of the KLT and Integer KLT algorithms, such as [8], [9] and [10]; however, they were aimed at software rather than hardware realizations.

In this paper, a spectral decorrelation system for adaptive lossy/lossless hyperspectral image compression aimed at an FPGA system-on-a-chip platform is proposed. The proposed architecture adds a significant improvement to the hardware proposed in [3] by enhancing the level of parallel computing, especially for hyperspectral images with a large spectral dimension. In addition, the proposed system addresses a novel hardware architecture for the Integer KLT computation.

978-1-4799-5356-1/14/$31.00 ©2014 IEEE — 2014 NASA/ESA Conference on Adaptive Hardware and Systems (AHS)

Fig. 1. Comparison of the KLT and the DWT lossy compression performance [14]

TABLE I. THE IMPROVEMENT OF THE RKLT OVER OTHER TECHNIQUES [1][15]

             Cuprite                       Jasper
  MCP      BD&LP     IWT        MCP       BD&LP     IWT
  8.94%    7.7%      8.86%      15.04%    11.84%    14.46%

II. OVERVIEW

The Karhunen-Loève transform is an orthogonal linear transform, which is applied to 3-D data sets to de-correlate the data across the different bands.
If the 3-D data set is a hyperspectral image, the KLT will remove the correlations between the image bands, constructing a more compressible data set. Fig. 2 depicts the computational flows of the KLT and the Integer KLT algorithms. It can be seen that the two algorithms share some of the computation processes, such as the BandMean, the MeanSub, the Covariance Matrix and the Eigenvectors. On the other hand, the differences are:

- The BandMean vector is rounded to the nearest integer in the Integer KLT
- A PLUS matrix factorization is applied to the Eigenvectors in the Integer KLT
- The matrix multiplication (Eigenvectors × MeanSub) of the KLT is replaced with a matrix lifting in the Integer KLT

All these processes are performed on large matrices and are therefore computationally intensive, which slows down the overall computation process significantly. In the next section, each of these processes will be analysed thoroughly by breaking it down into primitive arithmetic operations. This is followed by a comprehensive analysis of the individual computations, inspecting the possibility and feasibility of suitable acceleration techniques. In the mathematical expressions of this paper, a hyperspectral image H is considered, where M and L represent the spatial dimensions and N the number of bands.

III. COMPUTATIONAL REQUIREMENTS

While some of the computation processes in Fig. 2 are simple repetitive operations, such as the BandMean and the MeanSub, others, such as the covariance matrix and eigenvector evaluations, are more complicated processes involving different sequential operations.

A. Band Mean, Mean Sub and Covariance Computations

The Band Mean is the computation of the mean value of each spectral band, as shown in equation (1), so the output is a vector $A \in R^{1 \times N}$, where N is the number of spectral bands. Therefore, the computation of each element requires M × L additions (accumulation) and a single division.
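As a software point of reference (a NumPy sketch, not the proposed hardware design; the tiny array shape is an assumption purely for illustration), the BandMean, MeanSub and covariance computations of this section can be expressed as:

```python
import numpy as np

# Toy stand-in for a hyperspectral cube: M x L spatial pixels, N bands,
# 14-bit unsigned samples, as in the AVIRIS data used later in the paper.
rng = np.random.default_rng(0)
M, L, N = 8, 8, 4
H = rng.integers(0, 2**14, size=(M, L, N)).astype(np.float64)

X = H.reshape(M * L, N).T                  # N x (M*L): one row per spectral band
A = X.sum(axis=1) / (M * L)                # BandMean: accumulate, then a single division
mean_sub = X - A[:, None]                  # MeanSub: normalize each band by its mean
cov = X @ X.T / (M * L) - np.outer(A, A)   # covariance: N x N symmetric matrix

# The two formulations of the covariance agree:
assert np.allclose(cov, mean_sub @ mean_sub.T / (M * L))
```

The final assertion confirms that subtracting the outer product of the band means from the raw correlation term is equivalent to computing the covariance of the mean-subtracted data directly.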
$A_n = \frac{1}{M \times L} \sum_{m=1}^{M} \sum_{l=1}^{L} H_{m,l,n} \quad (1)$

The MeanSub process is the normalization of the data: the mean value of each spectral band is subtracted from all the data of that band. Therefore, this process is very straightforward and only requires M × L × N subtractions.

The computation of the covariance matrix is shown in equation (2). The second term of this equation is the multiplication of the BandMean vector $A \in R^{1 \times N}$ with its transpose, so this only requires $\frac{N(N+1)}{2}$ multiplications. On the other hand, the first term of equation (2) is the matrix multiplication of the input image with its transpose. The result of this multiplication is a symmetric matrix in $R^{N \times N}$, where each element requires M × L multiply-accumulate operations. Therefore, the first term of equation (2) is much more computationally intensive, requiring M × L multiply-accumulate operations for each of the $\frac{N(N+1)}{2}$ elements.

Fig. 2. The Computation Process of the KLT and the Integer KLT (KLT: Band Mean → Mean Sub → Covariance → Eigenvectors → Eigenvectors × Mean Sub; Integer KLT: Band Mean → Rounding → Mean Sub → Covariance → Eigenvectors → PLUS → Lifting)

$COV = \frac{1}{M \times L} H \times H^{T} - A \times A^{T} \quad (2)$

where $COV \in R^{N \times N}$.

B. Eigenvectors Computation

In this process, the eigenvalues and eigenvectors of the covariance matrix are evaluated. Since the covariance matrix is real and symmetric, the Jacobi algorithm will be considered. Designed for real and symmetric matrices, the Jacobi algorithm is an iterative method that outperforms other techniques, such as the QR algorithm, in terms of output accuracy. The Jacobi algorithm computes both the eigenvalues and the eigenvectors. The eigenvalues are computed by diagonalizing the input matrix through iterative transformations, which eventually accumulate the eigenvalues on the matrix diagonal. The eigenvectors are the result of the multiplications of the transformation matrices. Fig.
3 illustrates the Jacobi algorithm, where the eigenvalues accumulate on the diagonal of matrix B, while V, initially an identity matrix, converges to the eigenvectors.

Fig. 3. The Computation Process of the Jacobi Algorithm

The computations in Fig. 3 are:

- The computation of θ, sin θ and cos θ to form the transform matrix $T(k) \in R^{N \times N}$.
- The matrix multiplications of equations (3) and (4).

$B_{k+1} = T(k)^{T} \times B_{k} \times T(k) \quad (3)$
$V_{k+1} = V_{k} \times T(k) \quad (4)$

Theoretically, a matrix multiplication of $N \times N$ dimensions requires $N^{3}$ element multiplications and $(N-1)N^{2}$ element additions. Nevertheless, taking into consideration that $T(k)$ is an identity matrix with only four non-one, non-zero elements, $T_{ii}$, $T_{jj}$, $T_{ij}$ and $T_{ji}$, the matrix multiplication in (3) alters only two rows and two columns, while in (4) only two columns are altered. Therefore, (3) requires 8N multiplications and 4N additions, while (4) requires 4N multiplications and 2N additions. Equations (3) and (4) can be decomposed into (5) to (9), and (10) and (11), respectively.

$B_{ii}^{k+1} = B_{ii}^{k}\cos^{2}\theta + B_{jj}^{k}\sin^{2}\theta - 2B_{ij}^{k}\sin\theta\cos\theta \quad (5)$
$B_{jj}^{k+1} = B_{ii}^{k}\sin^{2}\theta + B_{jj}^{k}\cos^{2}\theta + 2B_{ij}^{k}\sin\theta\cos\theta \quad (6)$
$B_{ij}^{k+1} = B_{ji}^{k+1} = (B_{ii}^{k} - B_{jj}^{k})\sin\theta\cos\theta + B_{ij}^{k}(\cos^{2}\theta - \sin^{2}\theta) \quad (7)$

for $n \in [1, N]$, $n \neq i$, $n \neq j$:

$B_{in}^{k+1} = B_{ni}^{k+1} = B_{in}^{k}\cos\theta - B_{jn}^{k}\sin\theta \quad (8)$
$B_{jn}^{k+1} = B_{nj}^{k+1} = B_{in}^{k}\sin\theta + B_{jn}^{k}\cos\theta \quad (9)$

for $n \in [1, N]$:

$V_{ni}^{k+1} = V_{ni}^{k}\cos\theta - V_{nj}^{k}\sin\theta \quad (10)$
$V_{nj}^{k+1} = V_{ni}^{k}\sin\theta + V_{nj}^{k}\cos\theta \quad (11)$

Theoretically, more iterations result in more accurate eigenvalues and eigenvectors; however, in [10] it is suggested that 10 sweeps (i.e. $10 \times \frac{N(N-1)}{2}$ iterations) are enough to obtain both the eigenvectors and the eigenvalues. Table II outlines the computational requirements of the eigenvector evaluation when 10 Jacobi sweeps are performed.

TABLE II. COMPUTATIONAL REQUIREMENTS OF THE EIGENVECTOR EVALUATION

  Addition         5N(N − 1)(…)
  Subtraction      5N(N − 1)(4 + 2N)
  Multiplication   5N(N − 1)(11 + 8N)
  Division         5N(N − 1)
  Trigonometric    15N(N − 1)

C. Matrix Multiplication (Eigenvectors by MeanSub)

This process is also referred to as the Eigen mapping in the literature.
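Before moving to the Eigen mapping, the Jacobi iteration of equations (3) to (11) can be sketched in software. The sketch below is a reference implementation only: it uses full matrix products for clarity, whereas the reduced row/column updates of equations (5) to (11) touch far fewer elements.

```python
import numpy as np

def jacobi_eig(cov, sweeps=10):
    """Diagonalize a real symmetric matrix with cyclic Jacobi rotations.

    B accumulates the eigenvalues on its diagonal; V, initialised to the
    identity, converges to the eigenvectors (equations (3) and (4)).
    """
    B = cov.astype(np.float64).copy()
    N = B.shape[0]
    V = np.eye(N)
    for _ in range(sweeps):                  # 10 sweeps = 5N(N-1) rotations
        for i in range(N - 1):
            for j in range(i + 1, N):
                if abs(B[i, j]) < 1e-12:     # already annihilated
                    continue
                # Rotation angle chosen so that the new B[i, j] becomes zero
                theta = 0.5 * np.arctan2(2 * B[i, j], B[j, j] - B[i, i])
                c, s = np.cos(theta), np.sin(theta)
                T = np.eye(N)                # identity except four elements
                T[i, i] = T[j, j] = c
                T[i, j], T[j, i] = s, -s
                B = T.T @ B @ T              # equation (3)
                V = V @ T                    # equation (4)
    return np.diag(B), V
```

On random symmetric test matrices, 10 sweeps reproduce the eigenvalues of `numpy.linalg.eigvalsh` to machine precision, and the returned V satisfies V·diag(λ)·Vᵀ ≈ B₀.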
As shown in equation (12), this process is the multiplication of the eigenvectors with the normalized data. Each pixel of the output data is the result of the vector multiplication of an eigenvector with the set of pixels of the same spatial coordinates. Therefore, the computation of each pixel requires N multiply-accumulate operations, and the whole image requires $M \times L \times N^{2}$ multiply-accumulate operations.

$\begin{bmatrix} v_{1,1} & \cdots & v_{1,N} \\ \vdots & \ddots & \vdots \\ v_{N,1} & \cdots & v_{N,N} \end{bmatrix} \times \begin{bmatrix} h_{1,1,1} & \cdots & h_{M,L,1} \\ \vdots & \ddots & \vdots \\ h_{1,1,N} & \cdots & h_{M,L,N} \end{bmatrix} = Output\ Data \quad (12)$

D. PLUS Matrix Factorization

The PLUS matrix factorization is used to represent the eigenvectors matrix as four matrices rather than one. These four matrices (P, L, U and S) maintain the same information as the original eigenvectors matrix, so when applying the Eigen mapping process, these matrices are used as the eigenvectors. By applying the Eigen mapping on these four matrices, the output error incurred from the rounding of the fractional numbers is eliminated; hence, a lossless process is maintained. This matrix factorization is a sequentially iterative process, as shown in Fig. 4. The required number of iterations is (N − 1), and in each of these iterations the following processes are performed sequentially: pivoting and the L, U and S factorizations. The computational requirements of this process are outlined in Table III.

Fig. 4. The PLUS Matrix Factorization Process [6]

TABLE III. COMPUTATIONAL REQUIREMENTS OF THE PLUS FACTORISATION

  Addition         $\sum_{k=1}^{N-1}(N-k)$
  Subtraction      $(N-1) + \sum_{k=1}^{N-1}(N-k)(N-k-1)$
  Multiplication   $N + \sum_{k=1}^{N-1}(N-k)^{2} + \sum_{k=1}^{N-1}(N-k) + 2\sum_{k=1}^{N-1}(N-k-1)$
  Division         $(N-1) + \sum_{k=1}^{N-1}(N-k-1)(N-k)$

E. Lifting Scheme

Instead of the matrix multiplication (MeanSub × Eigenvectors) of the classical KLT, a lifting technique is employed in the Integer KLT. In the lifting technique, the PLUS matrices are applied to the MeanSub data set according to the reversible lifting scheme presented in [2].
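The reversibility that such lifting relies on can be illustrated in isolation. The sketch below demonstrates the general principle for a single unit-triangular factor and is not the exact scheme of [2]: because the factor has ones on its diagonal, each output element is the corresponding input element plus a rounded prediction formed from already-processed elements, so the decoder can regenerate the identical prediction and subtract it, recovering the integers exactly.

```python
import numpy as np

def lift(Lm, x):
    """Forward lifting with a unit lower-triangular matrix: integers in, integers out."""
    y = np.array(x, dtype=np.int64)
    for i in range(len(x)):
        # Row i predicts from elements j < i only; the prediction is rounded.
        y[i] = x[i] + int(np.rint(Lm[i, :i] @ x[:i]))
    return y

def unlift(Lm, y):
    """Exact inverse: regenerate the same rounded predictions and subtract them."""
    x = np.array(y, dtype=np.int64)
    for i in range(len(y)):          # elements j < i are already recovered
        x[i] = y[i] - int(np.rint(Lm[i, :i] @ x[:i]))
    return x

rng = np.random.default_rng(1)
N = 6
Lfac = np.eye(N) + np.tril(rng.normal(size=(N, N)), k=-1)  # unit lower-triangular
x = rng.integers(0, 2**14, size=N)
assert np.array_equal(unlift(Lfac, lift(Lfac, x)), x)      # lossless round trip
```

The same argument applies phase by phase to the S, U, L and P factors, which is what makes the Integer KLT lossless despite the rounding after every stage.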
The lifting scheme can be mathematically represented as in equation (13). In order to avoid floating-point outputs, the output elements are rounded after each stage: after multiplying the vector X (the image data) with the S matrix, the output vector is rounded before it is multiplied with the U matrix, and so on.

$Y = round(round(round(X \times S) \times U) \times L) \times P \quad (13)$

where P is a permutation matrix; hence, multiplying by P is element swapping, where multiplications are only performed by ones and zeroes. Therefore, the permuting is not computationally intensive, as it only requires a loop through the vector, swapping certain elements. The S matrix is a sparse lower-triangular matrix; therefore, by applying a zero-check technique to the X × S multiplication, far fewer element multiplications are required. The most computationally intensive part of the lifting scheme is the multiplication with the upper-triangular matrix U and the lower-triangular matrix L.

IV. DESIGN CONSIDERATIONS

A. Hardware Platform and the Design Tools

The SRAM-based Cyclone IV FPGA from Altera (DE2-115 Development Board [13]) is considered in this work. The Cyclone FPGA series is designed to meet the needs of low-power applications, such as space applications. In addition to the hardware logic and the SRAM memory resources, this platform offers more than 500 embedded hardware multipliers, which provide optimal performance and very low power consumption. Moreover, the soft processor NIOS II is provided by this platform. While the NIOS II is considered a mid-range processor in terms of performance, it is adequate for the proposed architecture, as it only performs the high-level management and scheduling of the computation stages rather than the actual computations, which are executed in the FPGA fabric. The Altera Quartus II IDE will be used for the development. This IDE provides the PowerPlay power estimator tool, which will be used for estimating the power consumption.
Moreover, it offers ready-made and pre-tested intellectual property blocks named MegaFunctions; these IP blocks are optimised (for speed or area) for Altera devices and support simple functions (addition, multiplication, subtraction, etc.) as well as complex functions, such as trigonometric functions, for both fixed- and floating-point data formats.

B. Input Data

This work targets hyperspectral images with a large number of bands (in the order of hundreds). The Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) hyperspectral images [12] are used as test images, since they comprise 224 spectral bands. In particular, the Cuprite image, shown in Fig. 5, is employed in the testing presented in this paper. The AVIRIS Cuprite raw data consist of 14-bit unsigned integers. As the main focus of this research is the spectral and not the spatial compression, a portion of the AVIRIS Cuprite image, comprising 512 × 512 pixels, will be considered in the experimental work.

Fig. 5. The AVIRIS Cuprite Hyperspectral Image [12]

V. KLT DESIGN APPROACH

The hardware computational flow of the KLT algorithm is shown in Fig. 6. This flow consists of three stages: the covariance matrix and the mean vector are computed in parallel in Stage 1; the eigenvectors and the MeanSub (normalization) are computed in parallel in Stage 2; and the Eigen mapping is performed in Stage 3.

Fig. 6. The KLT Computational Flow on Hardware

A. Stage 1

In this stage, both the mean vector and the covariance matrix are computed, equations (1) and (2). The first equation only needs accumulators, and the division can be performed by wire shifting, as most spatial dimensions are powers of 2, i.e. 128, 256 or 512. The second equation is more computationally intensive, especially the first term ($H \times H^{T}$), which is the multiplication of a matrix with its transpose. Therefore, the result of this matrix multiplication is a symmetric matrix of size N.
The number of elements that need to be computed is $K = \frac{N(N+1)}{2}$, and each element requires $M \times L$ multiply-accumulate operation cycles. Fig. 7 illustrates the hardware architecture for this stage; each FIFO holds a spectral band, and equations (1) and (2) are executed in parallel by the accumulator units (ACC) and the multiply-accumulate units (MAC). The optimal number of MACs is $K = \frac{N(N+1)}{2}$, at which the covariance computation can be performed in a single round ($M \times L$ MAC operation cycles). However, for hyperspectral images N can be large, so R MAC units can be utilised, where R is a divisor of K. Since the used MAC units require 2 clock cycles, the execution of this process requires $2 \times M \times L \times \frac{K}{R}$ clock cycles.

Fig. 7. Hardware Architecture of Stage 1

The used FPGA (Cyclone IV) can execute a subset of the hyperspectral AVIRIS data, while the whole image is saved in the on-board SDRAM. Larger FPGAs, such as the Cyclone V or the Stratix, can offer much larger memory resources; thus, larger data subsets can be processed at a time, saving input/output time. Nevertheless, dividing hyperspectral data into subsets, known as clustering and tiling, does not only reduce the on-chip memory requirements; it has also been proven to improve the resilience of the algorithm to Single Event Upsets (SEUs) [16] [17]. In [1], a comprehensive analysis of the benefits introduced by clustering and tiling is presented, where the simulations showed optimal cluster sizes for different tilings. These simulations considered different AVIRIS data sets and showed that the optimal cluster size varies between 16 and 32 spectral bands. While the used Cyclone IV can process 32 spectral bands in three passes, the Cyclone V offers four times larger memory resources and can process 32 bands in a single pass.

B. Stage 2

The normalization and the eigenvectors computation are executed in this stage; the first is a straightforward set of subtractions that can be performed in parallel, similar to Stage 1.
The eigenvectors computation is the most complicated process of the KLT computation, and the computation time increases exponentially with the number of spectral bands. Therefore, the acceleration of this process is very significant, especially for hyperspectral data with a large number of spectral bands. Fig. 8 illustrates the proposed hardware architecture of the Jacobi algorithm. The processed eigenvectors and eigenvalues are saved in two sets of FIFOs. These FIFOs are connected to multiplexing logic that manages data fetching and loading to and from the computing logic. The computing logic is responsible for the main computations: the θ Computer estimates θ, sin θ and cos θ; the Eq(5,6,7) Computer of Fig. 8 computes equations (5), (6) and (7); and 8 multipliers, 2 adders and 2 subtractors compute equations (8), (9), (10) and (11). All three parts of the accelerator (FIFOs, Mux Logic and computing logic) are controlled and scheduled by the on-chip NIOS processor.

Fig. 8. Hardware Architecture of the Eigenvectors Computation

Since a fixed-point implementation would result in a significant output error for such an iterative algorithm, a floating-point data format is considered for the eigenvectors computation. The Altera MegaFunctions have been utilized for most of the computation logic. Since the floating-point multiplications and additions require 11 and 14 clock cycles, respectively, executing equations (8), (9), (10) and (11) takes approximately 50 + 16 × (N − 2) clock cycles. On the other hand, the θ Computer and the Eq(5,6,7) Computer require up to 125 clock cycles each. Therefore, a single Jacobi iteration requires approximately 300 + 16 × (N − 2) clock cycles.

C. Matrix Reduction Technique

In [11], an online matrix reduction technique was proposed. In addition to the acceleration offered by this technique, it also offers partial computation, so that some of the eigenvectors/eigenvalues are computed before the complete computation of all the eigenvectors.
This can be very valuable for the KLT computation on hardware, as the Eigen mapping computation can be performed partially in parallel with the eigenvectors computation. Table IV outlines the convergence of the eigenvectors for the proposed technique, where different data sets of the AVIRIS Cuprite were considered. It can be noticed that some of the eigenvectors are computed and ready for the next stage from sweeps 4 and 5.

TABLE IV. EIGENVECTORS CONVERGENCE THROUGH 8 SWEEPS

  Sweep   8×8   16×16   32×32   64×64
  1       -     -       -       -
  2       -     -       -       -
  3       -     -       -       -
  4       1     1       1       1
  5       6     4       3       1
  6       8     13      20      4
  7       8     16      29      30
  8       8     16      32      64

D. Stage 3

The Eigen mapping is the matrix multiplication of equation (12), so the main operation is multiply-accumulate, and the same resources as in Stage 1 can be utilised, with a very similar architecture. Each pixel of the output data requires N multiply-accumulate operations; therefore, the throughput of a single MAC is 2N + 1 clock cycles per pixel (2 cycles per MAC operation and one for saving and zeroing the accumulator). Since R MAC units are implemented in Stage 1, the processing time of this stage is $M \times L \times N \times (2N + 1)/R$ clock cycles.

E. Experimental Results

In order to assess the proposed architecture, it has been implemented on the Altera DE2-115 board. Table V outlines the hardware and power requirements of the proposed architecture, and Table VI outlines the processing time of the proposed architecture for different data sets from the AVIRIS Cuprite. Since the main objective of this work is the acceleration of the computations, the I/O power and time were excluded from these results. When the hardware multipliers are fully used and Logic Elements (LEs) are utilised instead (R > 136), a significant increase in the power consumption can be seen: when R increases from 64 to 128, the power consumption increases by 9%, while when it increases from 128 to 256, the power consumption increases by more than 100%.
This is because the hardware multipliers consume much less power (experimental results show that an LE-based multiplier requires almost 9 times more power than a hardware multiplier). The maximum R that could be realised on the DE2-115 is 136, which is the optimal R for the covariance computation of 16 spectral bands. Data with a larger number of bands will therefore require more than one round for computing the covariance matrix. At R = 136, the power consumption was 1.05 W.

TABLE V. HARDWARE AND POWER RESOURCES OF THE KLT SYSTEM

  R     Multipliers   %     LEs      %     Power (mW)
  8     324           61    49000    43    767
  16    340           64    49300    43    775
  32    372           70    50000    44    791
  36    380           71    50000    44    794
  64    436           82    51000    45    820
  128   532           100   64000    56    981
  136   532           100   71000    62    1053
  256   532           100   165000   144   2018

TABLE VI. THE EXECUTION TIME (MS) OF THE PROPOSED KLT SYSTEM

  Data set       R     S1     S2     S3     Overlap   Overlap %   Total
  256×256×16     17    10.56  3.5    22     0.5       1.41        35.56
                 34    5.28   3.5    11     0.5       2.59        19.28
                 68    2.64   3.5    5.5    0.5       4.49        11.14
                 136   1.32   3.5    2.75   0.5       7.07        7.07
  512×512×16     17    42.24  3.5    88     0.5       0.37        133.24
                 34    21.12  3.5    44     0.5       0.73        68.12
                 68    10.56  3.5    22     0.5       1.4         35.56
                 136   5.28   3.5    11     0.5       2.59        19.28
  256×256×32     33    21.12  25.3   44     5.2       6.1         85.22
                 66    10.56  25.3   22     5.2       9.87        52.66
                 132   5.28   25.3   11     5.2       14.29       36.38
                 264   2.64   25.3   5.5    5.2       18.41       28.24
  512×512×32     33    84.48  25.3   176    5.2       1.85        280.58
                 66    42.24  25.3   88     5.2       3.46        150.34
                 132   21.12  25.3   44     5.2       6.1         85.22
                 264   10.56  25.3   22     5.2       9.87        52.66

The processing time of each individual stage is stated for different values of R. The overlap time is the difference between the complete computation of the eigenvectors and the point at which some of the eigenvectors are ready for the Eigen mapping; hence, the overlap represents the extra parallelism offered by the proposed Matrix Reduction Technique (MRT).
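The Stage 1 and Stage 3 entries of Table VI can be cross-checked against the cycle-count formulas derived in Sections V-A and V-D. The sketch below assumes a nominal 100 MHz fabric clock; this clock frequency is an illustrative assumption and is not stated above.

```python
def stage1_ms(M, L, N, R, f_mhz=100.0):
    # Covariance: K = N(N+1)/2 symmetric elements, M*L MAC operations each,
    # 2 clock cycles per MAC, R MAC units working in parallel.
    K = N * (N + 1) // 2
    return 2 * M * L * K / R / (f_mhz * 1e3)

def stage3_ms(M, L, N, R, f_mhz=100.0):
    # Eigen mapping: M*L*N output pixels, (2N + 1) cycles per pixel, R MACs.
    return M * L * N * (2 * N + 1) / R / (f_mhz * 1e3)

# 256 x 256 x 16 cube with R = 17 (Table VI lists S1 = 10.56 ms, S3 = 22 ms)
print(round(stage1_ms(256, 256, 16, 17), 2))   # 10.49
print(round(stage3_ms(256, 256, 16, 17), 2))   # 20.35
```

Under the assumed clock, the model lands close to the measured 10.56 ms and 22 ms, with the remainder plausibly attributable to scheduling and buffering overhead.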
The following can be noticed from Table VI:

- The processing times of Stage 1 and Stage 3 are proportional to both the spectral and the spatial dimensions, while Stage 2 depends only on the spectral dimension.
- The share of Stage 2 in the overall processing time is much more significant for a larger number of spectral bands (for 8 spectral bands: up to 33%; for 16 spectral bands: up to 68%; for 32 spectral bands: up to 91%).
- The overlap time is independent of the spatial dimensions and exponentially proportional to the spectral dimension; hence, it is much more significant for a larger number of spectral bands (for 8 spectral bands: up to 4.89%; for 16 spectral bands: up to 11.82%; for 32 spectral bands: up to 18.4%).

Therefore, the experimental results have shown that the additional level of parallel computing (the overlap) is more significant for larger spectral dimensions (hyperspectral rather than multispectral); hence, the benefit of the proposed architecture is more noticeable for hyperspectral data.

VI. INTEGER KLT DESIGN APPROACH

As shown in Fig. 2, the computation process of the Integer KLT is very similar to that of the KLT algorithm. Therefore, a very similar computational flow will be considered, where the Eigen mapping is performed through the PLUS factorization and the lifting scheme. The hardware computation of these two processes is discussed below.

A. PLUS Factorization

The PLUS factorization is an iterative process and offers an insignificant level of parallel computing. Moreover, this process requires a floating-point implementation, as the main objective of the Integer KLT is lossless data transformation. Therefore, it is not feasible to execute such a highly sequential process in hardware. Table VII outlines the time required to perform the PLUS factorization of different data sets on the soft NIOS II and the hard Cortex-M3 processor.

TABLE VII.
EXECUTION TIME (MS) OF THE PLUS FACTORIZATION

              8×8    16×16   32×32
  NIOS II     6.7    71      880
  Cortex-M3   1.8    15.2    142.8

B. Lifting Scheme

In order to simplify the illustration, Fig. 9 depicts the lifting computation in hardware for 4 pixels. This process is executed over three phases: multiplying by S, U and L, while the multiplication by P (pivoting) is computationally insignificant. The output results of each phase are saved into intermediate FIFOs, and while any of the phases executes a certain pixel set, the other phases can execute the next or the previous pixel sets, and so on. The execution of each phase requires N operation cycles (computing) plus N buffering cycles (filling the buffer of the next phase). Since the multiply-accumulate (MAC) units used in this design require 2 clock cycles, the execution of each phase requires (2 × N + N = 3N) clock cycles. Since the order of each intermediate FIFO needs to be reversed, the computing operations cannot be performed in parallel with the buffering. Consequently, the processing throughput can reach approximately N pixels per 3N clock cycles for each unit of Fig. 9. Therefore, for an (M × L × N) image, the execution time of the lifting scheme is slightly more than 3 × M × L × N clock cycles. Since L and S are lower-triangular matrices and U is an upper-triangular matrix, resource sharing of the MAC units of Fig. 9 is possible. Therefore, the required number of MAC units for a lifting unit (Fig. 9) is $\frac{5N}{2} - 2$; more units lead to a faster computation.

C. Experimental Results

Table VIII outlines the hardware and power requirements of the proposed architecture, and Table IX outlines the processing time of the proposed architecture for different data sets from the AVIRIS Cuprite.
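The lifting cost model above can likewise be evaluated numerically and compared against the Lift column of Table IX. The 100 MHz clock below is an illustrative assumption (the clock frequency is not stated above):

```python
def lifting_ms(M, L, N, units=1, f_mhz=100.0):
    # Each lifting unit processes N pixels per ~3N clock cycles,
    # so an M x L x N image needs about 3*M*L*N cycles per unit.
    return 3 * M * L * N / units / (f_mhz * 1e3)

def macs_per_lifting_unit(N):
    # MAC count with resource sharing across the triangular S, U and L phases.
    return 5 * N // 2 - 2

print(round(lifting_ms(256, 256, 16), 2))   # 31.46 (Table IX lists 32.5 ms)
print(macs_per_lifting_unit(16))            # 38
```

Doubling the number of lifting units halves the modelled time, matching the halving of the Lift column between successive R values in Table IX.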
The maximum R that could be realised on the target FPGA is 156; this offers two lifting units for 32 spectral bands and more than one fourth of the optimal R for the covariance computations, so the computation of the covariance matrix is performed in four rounds when 32 spectral bands are processed. Since the hardware multipliers consume much less power, a significant increase in the power consumption can be seen when the hardware multipliers are fully used and Logic Elements are utilised instead (R = 152).

TABLE VIII. HARDWARE AND POWER RESOURCES OF THE INTEGER KLT

  R     Multipliers   %     LEs     %    Power (mW)
  18    351           66    51000   45   809
  36    387           73    52000   46   829
  38    391           83    53000   46   841
  76    467           88    54000   47   873
  152   532           100   90000   79   1261
  156   532           100   92000   81   1282

The processing time of each individual stage is stated for different values of R. Since the hard Cortex-M3 showed a significant computational performance advantage over the soft NIOS II (Table VII), the Cortex-M3 is considered in the assessment as a conceptual approach, as if the proposed architecture were implemented on an FPGA platform incorporating a hard processor. The share of the PLUS factorization in the overall processing time is more significant for a larger number of spectral bands. While the computational requirements of the PLUS factorization are lower than those of the other processes (eigenvectors, covariance and lifting), the execution time of this process can be an overhead, especially if it is implemented on a less powerful processor, such as the soft NIOS II. Therefore, it would be valuable to have a factorization process similar to the PLUS but with a less sequential computation process, making it more suitable for hardware implementation.

TABLE IX.
EXECUTION TIME (MS) OF THE PROPOSED INTEGER KLT SYSTEM

  Spectral  Spatial  R    S1     S2     PLUS (Cortex)  PLUS (NIOS)  Lift    Total (Cortex)  Total (NIOS)
  16        256      17   10.56  3.5    15.2           71           32.5    56.48           112.28
                     34   5.28   3.5    15.2           71           16.25   37.59           93.39
                     68   2.64   3.5    15.2           71           8.125   28.145          83.945
            512      17   42.24  3.5    15.2           71           130     169.82          225.62
                     34   21.12  3.5    15.2           71           65      94.26           150.06
                     68   10.56  3.5    15.2           71           32.5    56.48           112.28
  32        256      33   21.12  25.3   142            880          65      310.3           1048.3
                     66   10.56  25.3   142            880          32.5    355.8           1093.8
            512      33   84.48  25.3   142            880          260     505.3           1243.3
                     66   42.24  25.3   142            880          130     453.3           1191.3

VII. ADAPTIVE SYSTEM

The similarity between the KLT and the Integer KLT can be utilised to realise an adaptive system. The adaptive computation flow is shown in Fig. 10; in this flow, most of the adaptive overhead resides in the processor firmware, while the hardware overhead resides only in the Eigen mapping (lifting/matrix multiplication), which also shares most of the hardware resources.

Fig. 10. The Computation Flow of the Adaptive KLT/Integer KLT System

VIII. CONCLUSION

In this paper, a novel architecture for adaptive KLT/Integer KLT computation has been proposed. This architecture offers an improvement over previous work by introducing a further level of parallel computing. The experimental results have shown that this improvement is more significant for hyperspectral data with a larger number of spectral bands. The experiments showed an overall improvement of up to 4.9%, 11.8% and 18.4% for 8, 16 and 32 spectral bands, respectively. In addition, the proposed architecture addresses a novel hardware architecture for the computation of the Integer KLT, which had previously been addressed only in the context of processor platforms (DSP, GPU or desktop). The scalability of this hardware architecture can offer a much higher level of parallel computing than processor platforms.

REFERENCES

[1] Rizuan Mat Noor and Tanya Vladimirova, "Integer KLT Design Space Exploration for Hyperspectral Satellite Image Compression", Proceedings of the 5th International Conference on Convergence and Hybrid Information Technology (ICHIT'11).
[2] P. Hao and Q. Shi, "Reversible integer KLT for progressive-to-lossless compression of multiple component images", Proc. IEEE Int. Conf. Image Processing, p. I-633, 2003.
[3] M. Fleury, R. P. Self, and A. C. Downton, "Development of a Fine-Grained Karhunen-Loeve Transform", Journal of Parallel and Distributed Computing, 64(4):520-535, 2004.
[4] C. Egho and T. Vladimirova, "Acceleration of the Karhunen-Loève Transform for System-on-a-Chip Platform", NASA/ESA Conference on Adaptive Hardware and Systems (AHS-2012), June 2012, Nuremberg, Germany.
[5] C. Egho and T. Vladimirova, "Hardware Acceleration of the Integer Karhunen-Loève Transform Algorithm for Satellite Image Compression", IEEE International Geoscience and Remote Sensing Symposium (IGARSS 2012), July 2012, Munich, Germany.
[6] N. R. Mat Noor and T. Vladimirova, "Parallel Implementation of Lossless Clustered Integer KLT Using OpenMP", Proc. of the 7th NASA/ESA Conference on Adaptive Hardware and Systems (AHS-2012), 25-28 June 2012, Nuremberg, Germany.
[7] L. Wang, J. Wu, L. Jiao and G. Shi, "3D medical image compression based on multiplierless low-complexity RKLT and shape-adaptive wavelet transform", Proc. ICIP, pp. 2521-2524, 2009.
[8] B. Penna, T. Tillo, E. Magli, and G. Olmo, "Transform Coding Techniques for Lossy Hyperspectral Data Compression," IEEE Trans. Geosci. Remote Sens., vol. 45, no. 5, pp. 1408-1421, May 2007.
[9] L. Xin, G. Lei, and L. Zhen, "Reversible Integer Principal Component Transform for Hyperspectral Imagery Lossless Compression," in IEEE International Conference on Control and Automation (ICCA 2007), Guangzhou, China, 2007, pp. 2968-2972.
[10] L. Galli and S. Salzo, "Lossless Hyperspectral Compression Using KLT," in IEEE International Geoscience and Remote Sensing Symposium (IGARSS 2004), Anchorage, USA, 2004.
[11] C. Egho and T. Vladimirova, "Eigenvectors Computation on a System-on-Chip Platform for Satellite On-Board Use", Proc.
of the 7th Jordanian International Electrical and Electronics Engineering Conference (JIEEEC 2011), 11-14 April 2011, pp. 145-149, Amman, Jordan.
[12] AVIRIS Airborne Visible/Infrared Imaging Spectrometer (Cuprite).
[13] www.altera.com/education/univ/materials/boards/de2-115/unv-de2-115-board.html
[14] Q. Du and J. E. Fowler, "Hyperspectral Image Compression Using JPEG2000 and Principal Component Analysis," IEEE Geoscience and Remote Sensing Letters, vol. 4, no. 2, pp. 201-205, April 2007.
[15] I. Blanes and J. Serra-Sagristà, "Clustered Reversible-KLT for Progressive Lossy-to-Lossless 3d Image Coding," Data Compression Conference (DCC '09), pp. 233-242, 16-18 March 2009.
[16] Q. Du, W. Zhu, H. Yang and J. E. Fowler, "Segmented principal component analysis for parallel compression of hyperspectral imagery", IEEE Geosci. Remote Sens. Lett., vol. 6, no. 4, pp. 713-717, 2009.
[17] T. Vladimirova, M. Meerman, and A. Curiel, "On-Board Compression of Multispectral Images for Small Satellites," in IEEE International Geoscience and Remote Sensing Symposium (IGARSS 2006), Denver, Colorado, USA, 2006, pp. 3533-3536.

Fig. 9. Hardware Architecture of the Lifting Scheme