Page 281 - 2024-Vol20-Issue2
277 | Alrudainy, Marzook, Hussein & Shafik
tions and contributions of the most recent existing approaches.

Over recent years, significant research has addressed real-time energy reduction. These techniques have considered single-metric optimization: mainly performance improvement within a given power budget, or power reduction under a performance constraint [14]. For instance, a real-time dynamic voltage and frequency scaling (DVFS) control method for power reduction in many-core embedded platforms has been proposed in [15, 21–23]. This method uses performance and user-experience constraints to find the minimum DVFS settings by adopting reinforcement learning and learning-transfer principles. Others proposed a power reduction method that uses real-time workload analysis to continually adjust core allocation and DVFS settings through predictive control based on multinomial logistic regression [16]. A number of research papers have also presented analytical investigations using simulation frameworks such as McPAT and gem5. These studies used task mapping, DVFS, and offline optimization to significantly reduce power dissipation under workload variations [17, 24–26]. The work in [11] presented a novel low-complexity runtime management approach based on workload classification for heterogeneous many-core platforms. This approach covers most of the configuration space of the Odroid-XU3 platform, including core types, thread allocation, and optimum dynamic voltage and frequency scaling.

A hardware-based load balancing scheme for homogeneous many-core systems is assessed in terms of power consumption and thermal behavior in [7]. In this scheme, power is minimized by powering off the dark silicon area. In [8], a power-gating-based sub-clock approach was implemented in an ARM Cortex-M0 processor to minimize static power consumption during the sub-clock cycle. In the same context, Charles et al. [9] applied per-core power gating (PCPG) to a contemporary homogeneous Intel Core i7 processor. They showed that power gating the dark silicon area (the idle cores) frees additional power headroom, which can be transferred to the active cores to boost their frequency and voltage without overstepping the thermal and power envelope. Likewise, transferring the energy saved in the dark silicon area to the enabled cores was studied in [10] on a homogeneous many-core platform, the AMD Opteron 6168. The practical outcomes of this work relied on manual adjustment of dynamic voltage scaling (DVS) settings combined with a per-core power gating approach.

III. SYSTEM ARCHITECTURE AND APPLICATIONS

The impetus for adopting heterogeneous architectures, comprising two or more types of CPU, has recently been increasing. Although these platforms provide superior performance, it is essential to ensure optimum energy consumption while exercising various types of workloads. The Odroid-XU3 board supports techniques such as thread affinity, DVFS, and manual core disabling, which are normally used to improve system operation in terms of energy consumption and performance. The Odroid-XU3 board is a small heterogeneous 8-core computational platform that can run the Android 4.4 or Ubuntu 14.04 operating systems. Its primary element is the 28 nm Exynos 5422 application processor, whose architecture is depicted in Fig. 1. This multiprocessor system-on-chip (MPSoC) is based on the ARM big.LITTLE heterogeneous architecture and comprises a low-power Cortex-A7 quad-core block, a high-performance Cortex-A15 quad-core block, 2 GB of LPDDR3 DRAM, and a Mali-T628 GPU. Further, the board includes 4 real-time current sensors that allow power consumption to be measured on 4 separate power blocks: the little (A7) CPUs, the big (A15) CPUs, the DRAM, and the GPU. In addition, there is 1 temperature sensor for the GPU and 1 temperature sensor for each of the 4 A15 CPUs. The clock frequency and supply voltage (Vdd) of each power block can be adjusted within a range of pre-defined values. For example, the low-power Cortex-A7 quad-core block supports frequencies between 200 MHz and 1400 MHz in 100 MHz steps, while the high-performance Cortex-A15 quad-core block supports frequencies between 200 MHz and 2 GHz, also in 100 MHz steps.

The PARSEC benchmark suite of real applications covers both emerging and current workloads for multiprocessing hardware [27]. It contains a diverse set of workloads from domains such as system applications and interactive animation, which mimic large-scale commercial workloads. Therefore, in this paper, PARSEC applications have been adopted and exercised on the Odroid-XU3 system-on-chip (SoC), whose

TABLE II.
CHARACTERISTICS OF THE PARSEC BENCHMARK [27]
Application     Domain              Type
ferret          Similarity Search   CPU
canneal         Engineering         CPU
bodytrack       Computer Vision     CPU+mem
streamcluster   Data Mining         mem
fluidanimate    Animation           mem
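The per-cluster DVFS ranges given above for the Odroid-XU3 can be turned into a small worked example of the configuration space that runtime managers have to search. The Python sketch below enumerates the frequency levels of each cluster exactly as stated in the text (200 to 1400 MHz for the Cortex-A7 block and 200 to 2000 MHz for the Cortex-A15 block, both in 100 MHz steps). The assumption that manual core disabling leaves between 1 and 4 cores online in each cluster is hypothetical and only illustrates how core counts multiply the space; supply voltages are not modelled because the text does not list Vdd values. All names in the code are illustrative.

```python
# Sketch only: enumerates the DVFS frequency levels stated for the Odroid-XU3
# clusters and counts a (partly hypothetical) configuration space.
from itertools import product


def freq_levels(f_min_mhz, f_max_mhz, step_mhz):
    """Return every frequency from f_min_mhz to f_max_mhz (inclusive) in step_mhz steps."""
    return list(range(f_min_mhz, f_max_mhz + step_mhz, step_mhz))


# Frequency ranges as given in the text (both clusters use 100 MHz steps).
A7_FREQS_MHZ = freq_levels(200, 1400, 100)   # little cluster: 200..1400 MHz
A15_FREQS_MHZ = freq_levels(200, 2000, 100)  # big cluster: 200..2000 MHz

# HYPOTHETICAL assumption: manual core disabling leaves 1 to 4 cores online
# in each cluster. Supply voltages (Vdd) are not modelled.
ACTIVE_CORES = range(1, 5)


def configuration_space():
    """Return an iterator over (a7_cores, a7_mhz, a15_cores, a15_mhz) tuples."""
    return product(ACTIVE_CORES, A7_FREQS_MHZ, ACTIVE_CORES, A15_FREQS_MHZ)


if __name__ == "__main__":
    freq_pairs = len(A7_FREQS_MHZ) * len(A15_FREQS_MHZ)
    total = sum(1 for _ in configuration_space())
    print(f"A7 levels: {len(A7_FREQS_MHZ)}, A15 levels: {len(A15_FREQS_MHZ)}")
    print(f"frequency pairs: {freq_pairs}")
    print(f"configurations incl. core counts: {total}")
```

Running the sketch reports 13 A7 levels and 19 A15 levels, i.e. 13 × 19 = 247 frequency pairs; under the hypothetical 1 to 4 online cores per cluster this grows to 16 × 247 = 3952 configurations, before thread allocation or voltage choices are even considered.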