The increasing demand for HyperSpectral Imaging (HSI) applications has driven the appearance of new, smaller and more affordable HS cameras. Their adoption in embedded systems with real-time and energy constraints opens research questions, given the huge amounts of data involved. Traditionally, this issue is tackled with dimensionality reduction algorithms such as Principal Component Analysis (PCA). In this regard, a factor that strongly affects performance is the data layout used to store the 3D HS image in 1D memory. In this paper, the impact of the three main HS data layouts (BSQ, BIP and BIL) is analyzed in terms of performance and energy efficiency for two PCA methods on the embedded NVIDIA Jetson TX1 GPU-based accelerator. Results show the bottlenecks found for each layout, along with architectural insights that explain the observed behaviours. Based on these results, we provide a set of recommendations for selecting a suitable PCA method for best performance and energy efficiency, depending on the number of principal components required. Additionally, we observe that different orderings with similar results entail very different developer productivity, given the coding complexity involved in each case.
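
To illustrate how the three layouts map a 3D HS cube onto 1D memory, the following minimal sketch computes the linear offset of a sample under the standard definitions of BSQ (band sequential), BIP (band interleaved by pixel) and BIL (band interleaved by line). The function names and the small cube dimensions are hypothetical, chosen for illustration only; they are not taken from the implementation evaluated in this paper.

```python
# Linear offsets of sample (row r, column c, band b) in a cube of
# size rows x cols x bands, under the three common HS layouts.
# All names here are illustrative assumptions.

def bsq_index(r, c, b, rows, cols, bands):
    """Band sequential: each band is stored as a full 2D image, one after another."""
    return b * (rows * cols) + r * cols + c


def bip_index(r, c, b, rows, cols, bands):
    """Band interleaved by pixel: all bands of a pixel are contiguous."""
    return (r * cols + c) * bands + b


def bil_index(r, c, b, rows, cols, bands):
    """Band interleaved by line: for each row, one line per band is stored in turn."""
    return r * (bands * cols) + b * cols + c


if __name__ == "__main__":
    rows, cols, bands = 2, 3, 4  # tiny hypothetical cube
    for name, fn in (("BSQ", bsq_index), ("BIP", bip_index), ("BIL", bil_index)):
        # Offsets of the full spectrum of pixel (0, 1): the stride between
        # consecutive bands differs by layout.
        spectrum = [fn(0, 1, b, rows, cols, bands) for b in range(bands)]
        print(name, spectrum)
```

In this sketch, the spectrum of a single pixel is contiguous only in BIP, while a single band image is contiguous only in BSQ; BIL sits in between. Such stride differences are one reason why the memory access pattern of a PCA implementation can depend on the layout.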