Massoud Pedram

Stephen and Etta Varra Professor

Department of EE-Systems

University of Southern California


1999-2020 Projects



USC SPORT: System Power Optimization and Regulation Technologies

Project URL: SPORT Lab

We investigate power estimation and low-power design of CMOS VLSI circuits and systems at all abstraction levels. Our emphasis is on developing mathematically rigorous analysis and optimization algorithms and power-aware design methodologies for solving various problems of practical interest and import. Our most recent work has focused on energy-efficient enterprise computing, reliability-power efficiency tradeoffs in VLSI circuits and systems, design of hybrid energy storage systems, dynamic power/thermal management in chip multiprocessors, core-level voltage and frequency scaling, low-power displays, ASIC design with power gating and multiple voltage islands, and current-source-based modeling of power and timing in VLSI circuits. More details about various ongoing projects are included below.



Accurate gate modeling under variation with neural networks

Deeply scaled FinFET devices are a strong choice for low-power applications owing to their advantages over conventional MOSFET devices. However, these devices are very sensitive to process variation and exhibit non-linear timing and power behavior. Because of the large number of variation parameters involved, the conventional industrial characterization flow (e.g., lookup tables) cannot practically capture this non-linear behavior. We use neural networks to capture the non-linear timing behavior of deeply scaled circuits.
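
As a toy illustration of the approach (the delay function, the single variation parameter, and the network size below are all invented for this sketch; the actual models are trained on SPICE characterization data over many FinFET variation parameters), a small one-hidden-layer network can fit a non-linear delay curve that a coarse LUT would have to sample densely:

```python
import math
import random

random.seed(0)

# Hypothetical ground truth: gate delay as a non-linear function of a single
# normalized variation parameter (e.g., gate-length deviation).
def true_delay(x):
    return 1.0 + 0.5 * math.tanh(3.0 * x) + 0.1 * x * x

# One-hidden-layer network with tanh activation, trained by plain SGD.
H = 8
w1 = [random.uniform(-1, 1) for _ in range(H)]
b1 = [0.0] * H
w2 = [random.uniform(-1, 1) for _ in range(H)]
b2 = 0.0

def forward(x):
    h = [math.tanh(w1[i] * x + b1[i]) for i in range(H)]
    return sum(w2[i] * h[i] for i in range(H)) + b2, h

samples = [(-1 + 2 * i / 63, true_delay(-1 + 2 * i / 63)) for i in range(64)]

lr = 0.05
for _ in range(2000):
    for x, y in samples:
        yhat, h = forward(x)
        err = yhat - y
        for i in range(H):
            grad_h = err * w2[i] * (1 - h[i] ** 2)  # backprop through tanh
            w2[i] -= lr * err * h[i]
            w1[i] -= lr * grad_h * x
            b1[i] -= lr * grad_h
        b2 -= lr * err

mse = sum((forward(x)[0] - y) ** 2 for x, y in samples) / len(samples)
```

A LUT over many variation parameters grows exponentially with the parameter count, while the network's size grows only with the number of inputs, which is the motivation stated above.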

Related work:


Energy-aware Task Scheduling in Real-Time Systems with Hard Deadline Constraints

Energy efficiency is one of the most critical design criteria for modern embedded systems such as multiprocessor system-on-chips (MPSoCs). Dynamic voltage and frequency scaling (DVFS) and dynamic power management (DPM) are two major techniques for reducing energy consumption in such embedded systems. Furthermore, MPSoCs are becoming more popular for many real-time applications. One of the challenges of integrating DPM with DVFS and task scheduling of real-time applications on MPSoCs is the modeling of idle intervals on these platforms. In this project, we present a novel approach for modeling idle intervals in MPSoC platforms which leads to a mixed integer linear programming (MILP) formulation integrating DPM, DVFS, and task scheduling of periodic task graphs subject to a hard deadline. We also present a heuristic approach for solving the MILP and compare its results with those obtained from solving the MILP.
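
A drastically simplified, single-core sketch of the DVFS+DPM trade-off is shown below. The task set, frequency levels, sleep power, and the normalized energy model E = f^2 * cycles are all assumptions for illustration; the actual work formulates a per-task, multi-core MILP over periodic task graphs:

```python
def dvfs_schedule(tasks, deadline, f_levels=(0.4, 0.5, 0.6, 0.8, 1.0),
                  p_sleep=0.01):
    """Pick the lowest-energy frequency level that meets the hard deadline
    (DVFS), then power-gate the remaining slack (DPM). Normalized energy
    model: E = f^2 * cycles + p_sleep * idle_time (a first-order sketch)."""
    total_cycles = sum(c for _, c in tasks)
    best = None
    for f in f_levels:
        exec_time = total_cycles / f
        if exec_time > deadline:
            continue  # this frequency misses the hard deadline
        energy = f * f * total_cycles + p_sleep * (deadline - exec_time)
        if best is None or energy < best[1]:
            best = (f, energy)
    return best  # (frequency, energy), or None if infeasible

# Hypothetical periodic task set: (name, worst-case cycle count).
tasks = [("t1", 2e6), ("t2", 3e6), ("t3", 1e6)]
best_choice = dvfs_schedule(tasks, deadline=1.2e7)
```

The sketch shows why DVFS and DPM must be decided jointly: running slower saves quadratic dynamic energy but shrinks the idle intervals that DPM can exploit, which is exactly the interaction the MILP models.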

Related work:


Margin and yield calculation for Single Flux Quantum (SFQ) logic cells

Sponsor: Intelligence Advanced Research Projects Activity (IARPA).

In this project, novel margin calculation methods are introduced for SFQ cells. These methods calculate a set of parameter margins for each logic cell such that if all parameters lie within the boundaries of the calculated margins, the parametric yield is near one.
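
The core of such a margin calculation can be sketched as a binary search over one parameter at a time; the pass/fail predicate below is a stand-in for an actual Josephson-junction-level simulation of the SFQ cell:

```python
def upper_margin(passes, nominal, hi, tol=1e-3):
    """Binary-search the largest value of one parameter (above nominal) for
    which the cell still operates correctly. `passes` stands in for a
    circuit simulation returning pass/fail."""
    lo = nominal
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if passes(mid):
            lo = mid     # still works: margin is at least mid
        else:
            hi = mid     # fails: margin is below mid
    return lo

# Toy pass region: the cell works while the parameter stays below 1.3.
margin = upper_margin(lambda p: p < 1.3, nominal=1.0, hi=2.0)
```

A lower margin is found symmetrically, and repeating this per parameter yields the margin set referred to above.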


Collaborative Intelligence


Latency and energy consumption of DNN queries can be significantly improved by splitting the workload between the mobile and cloud. Real-time scheduling of computations between the mobile and cloud and efficient feature communication are studied in this project.

Related work:




Meta-Learning

Meta-learning focuses on learning over the task space rather than the instance space by training a general model that can quickly adapt to new, unseen tasks. Fast adaptation to unseen tasks also has implications for the traditional paradigm of neural network training. Modeling generalization, few-shot learning, and task generation are studied in this project.

Related work:


Improving the Energy Efficiency and Lifetime of Coarse-Grained Reconfigurable Architectures


The error resiliency of applications such as media processing has given designers a new technique called approximate computing (AC), which abandons exactness of computation in favor of improved efficiency. In functional approximation, which is our focus here, a simpler function than the exact design is implemented; it can be generated manually by the designer (in the case of approximate arithmetic blocks) or automatically (when an arbitrary function is given). We generated an approximate non-iterative divider that is highly accurate and energy efficient.
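
As one illustration of functional approximation (a classic Mitchell-style logarithmic divider, not necessarily the divider design developed in this project), division can be made non-iterative by subtracting approximate logarithms and taking an approximate antilog:

```python
import math

def mitchell_log2(x):
    # x = 2^k * (1 + m) with 0 <= m < 1; Mitchell approximates log2(x) by
    # k + m (hardware would use a leading-one detector rather than log2).
    k = math.floor(math.log2(x))
    m = x / (2 ** k) - 1.0
    return k + m

def mitchell_antilog2(l):
    k = math.floor(l)
    f = l - k
    return (2 ** k) * (1.0 + f)

def approx_divide(a, b):
    # Non-iterative division: subtract approximate logs, then antilog.
    return mitchell_antilog2(mitchell_log2(a) - mitchell_log2(b))
```

The result is exact when both operands are powers of two and is otherwise off by at most roughly ten percent, which is the kind of error that error-resilient applications can tolerate.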

Related work:


Towards Green Communication: Energy Efficient Solutions for the Next Generation Network


This project investigates energy efficiency problems in next-generation networks. Intelligent power management solutions are studied to maximize the utility objective, for example, dynamic switching of base stations, smart scheduling of renewable energy, content caching, and cooperative transmission.

Related work:


VINE: A Variational Inference-Based Bayesian Neural Network Engine

Sponsor: Defense Advanced Research Projects Agency.

The primary goal is to develop a Bayesian Neural Network (BNN) with an integrated Variational Inference (VI) engine to perform inference and learning under uncertain or incomplete input and output features. A secondary goal is to enable robust decision making under noise and variability in the observed data and without reference to a ground truth.

Algorithm/Hardware Co-Design of Input Dimension Reduction Module

We developed a new approximation to gradient descent optimization that is suitable for hardware implementation of input dimension reduction module using independent component analysis. The proposed approximation enables a hardware implementation that operates at an order of magnitude higher clock frequency compared to prior work and achieves two orders of magnitude improvement in throughput.

Related work:

Medical Dataset Construction with Severity Scores and Comorbidities

We investigated physiological measurements from a publicly available and de-identified medical data set called MIMIC-III, which consists of 58,976 distinct hospital admissions. Nine commonly-used severity scores (including SAPS-II, APS-III, LODS, SOFA, QSOFA, SIRS, OASIS, MLODS, and SAPS) and the Elixhauser comorbidities are extracted based on the original dataset. The extracted features are used for a mortality prediction application in a BNN and achieve competitive results.

Gaussian Random Number Generator in Hardware

One crucial component to enable hardware implementation of a BNN is the Gaussian random number generator. We developed a RAM based Linear Feedback Gaussian Random Number Generator (RLF-GRNG) inspired by the properties of binomial distribution and linear feedback logics. The proposed RLF-GRNG requires minimal and sharable extra logic for controlling and indexing and is ideal for parallel random number generation.
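
The underlying idea can be sketched in a few lines: summing bits from a maximal-length LFSR yields a binomial variate that, by the central limit theorem, approximates a Gaussian. The 16-bit LFSR taps below are a standard textbook choice, not necessarily those of the RLF-GRNG:

```python
class LFSR16:
    """16-bit maximal-length Fibonacci LFSR (taps 16, 14, 13, 11)."""
    def __init__(self, seed=0xACE1):
        self.state = seed

    def next_bit(self):
        s = self.state
        bit = (s ^ (s >> 2) ^ (s >> 3) ^ (s >> 5)) & 1
        self.state = (s >> 1) | (bit << 15)
        return bit

def gaussian_sample(lfsr, n=64):
    """Sum of n pseudo-random bits is Binomial(n, 1/2), approximately
    normal with mean n/2 and variance n/4; normalize to roughly N(0, 1)."""
    s = sum(lfsr.next_bit() for _ in range(n))
    return (s - n / 2) / (n / 4) ** 0.5

lfsr = LFSR16()
samples = [gaussian_sample(lfsr) for _ in range(2000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
```

Because the generator is just shifts, XORs, and a population count, the logic is small and easily shared across parallel instances, which is the property the RLF-GRNG exploits.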

Knowledge Transfer Framework

Knowledge transfer is widely adopted in learning applications to allow different domains, distributions, and tasks to be used during training and testing. Since a conventional BNN cannot be used for action recommendation without encountering the “no ground truth” problem (i.e., the best action is not present in the training data), we propose a knowledge transfer framework which first constructs a BNN for outcome prediction only (i.e., latent model construction) and then transfers the learned knowledge to the domain of action recommendation.


Performance and Power Efficiency of Networks-on-Chip

Sponsors: National Science Foundation (the Software and Hardware Foundations).

Improving the Quality of Service in Application Mapping on NoC-Based Multi-Core Platforms

With tens to possibly hundreds of cores integrated in current and future multiprocessor systems-on-chips (MPSoCs) and chip-multiprocessors (CMPs), multiple applications usually run concurrently on the system. However, existing mapping methods for reducing overall packet latency cannot meet the requirement of balanced on-chip latency when multiple applications are present. We address the looming issue of balancing minimized on-chip packet latency with performance-awareness in the multi-application mapping of CMPs.

The approach of adding express channels to tile-based NoCs has gained increasing attention. However, this approach also greatly changes the packet delay estimation and traffic behavior of the network, neither of which has been considered in existing mapping algorithms. Therefore, we explore the opportunities in optimizing application mapping for express channel-based on-chip networks.

Related work:

Improving the Power Efficiencies of On-Chip Networks

Compared with traditional bus structures, the relatively complex NoCs with routers and links can draw a substantial percentage of chip power. An effective approach to reduce NoC power consumption is to apply power gating techniques. We explore different power gating schemes of on-chip networks to achieve low energy consumption as well as small latency penalties.

Related work:


Near-Threshold Computing and Deeply-Scaled Devices

Sponsors: DARPA PERFECT program, and National Science Foundation (the Software and Hardware Foundations).

Device Simulation and Performance Prediction for Deeply-Scaled FinFET Devices

As the geometric dimensions of transistors scale down, FinFET devices have proven to better address the challenges facing conventional planar CMOS devices. Due to manufacturing limitations, deeply scaled FinFET devices below the 10nm feature size have not been manufactured. Nevertheless, it is crucial to investigate the performance of such devices at smaller feature sizes in order to shed light on further studies of novel process techniques and circuit structures. In our work, 7nm-gate-length FinFET structure models are built and simulated using the Synopsys TCAD tool suite. We generate the predicted performance of the FinFET devices with different design parameters, supply voltages, and die temperatures, from which SPICE-compatible compact models are extracted and used further in circuit-level design and optimization.

Related work:

Standard Cell Libraries and Circuit Synthesis for Deeply-Scaled FinFET Devices

A standard cell library containing timing and power information at different input and output conditions, i.e., input slew rates and output load capacitances, is required to enable logic synthesis and timing and power analysis with the most advanced FinFET device technology. We generate 7nm FinFET device models using the Synopsys TCAD simulator and characterize standard cells through HSPICE simulations. Multiple supply voltages ranging from the near-threshold to the super-threshold regime are supported in our 7nm FinFET technology nodes, allowing both high performance and low power usage. In addition, devices with multiple threshold voltages are supported to enable multi-threshold designs. Synthesis results demonstrate that the 7nm FinFET technology can achieve a 15X circuit speed improvement and a 350X energy consumption reduction compared with the 45nm CMOS technology. The 7nm FinFET standard cell libraries are available here.

FinFET Sizing in Sub/Near-Threshold Regimes

FinFETs have been proposed as an alternative to bulk CMOS in current and future technology nodes due to more effective channel control, reduced random dopant fluctuation, a high ON/OFF current ratio, lower energy consumption, etc. Key characteristics of FinFETs operating in the sub/near-threshold region are very different from those in the strong-inversion region. Therefore, FinFET sizing again becomes a focus of attention.

Life-Cycle Assessment of FinFET versus Bulk CMOS Technologies

The manufacturing of modern semiconductor devices involves a complex set of nanoscale fabrication processes that are energy and resource intensive. It is important to understand and reduce the environmental impacts of manufacturing and using semiconductor circuits. We presented the first life-cycle energy and inventory analysis of FinFET integrated circuits and a comparative analysis with CMOS technology. A gate-to-gate inventory analysis is provided, accounting for manufacturing, assembly, and the use phase. The functional unit used in this work is a (FinFET or CMOS) processor with the same functionality and performance level. Two types of applications are considered: high-performance servers and low-power mobile devices. The following conclusions are observed: (i) FinFET circuits achieve lower use-phase energy consumption than their CMOS counterparts, and (ii) FinFET circuits can achieve lower manufacturing and assembly energy because the effect of smaller size outweighs that of the more complex manufacturing process.

Energy Efficient Memory Designs in Deeply-scaled Technologies

The aggressive down-scaling of transistors to the sub-10nm regime exacerbates short channel effects as well as device mismatches. Under such circumstances, conventional 6T SRAM cells made of bulk CMOS devices suffer from poor read and write stabilities. Accordingly, in order to improve the cell stability, at the device level, planar CMOS transistors are replaced with FinFET devices, and, at the circuit level, more robust SRAM structures such as the 8T SRAM cell are adopted. Our research thus focuses on the design of high yield (i.e., robust against process variations) and energy efficient FinFET-based cache memories. For this purpose, we use a cross-layer design and optimization framework spanning device, circuit, and architecture levels.

Architectural Analysis of Caches in Deeply-scaled Technologies

Future memory systems in deeply-scaled technologies (i.e., sub-10nm) necessitate FinFET support and more sophisticated SRAM cell structures. Accordingly, characteristics of SRAM cells need to be analyzed in order to find a desirable SRAM cell that simultaneously achieves high stability and low leakage power. Furthermore, evaluating such memory systems at the architecture level requires modifications to the existing memory models and analysis tools. Hence, we developed P-CACTI, which enhances CACTI with several new features.

Related work:

Statistical Static Timing Analysis based on Efficient Current Source Modeling of FinFET Gates

Characteristics of FinFETs operating in the near/sub-threshold regime make it difficult to verify the timing of a circuit using conventional statistical static timing analysis (SSTA) techniques. Our work focuses on extending the CSM approach to handle VLSI circuits comprised of FinFET devices with independent gate control operating in the near/sub-threshold voltage regime and subject to process variations. In particular, we combine non-linear analytical models and low-dimensional CSM lookup tables to simultaneously achieve high modeling accuracy and time/space efficiency. The proposed model can be used to perform statistical static timing analysis (SSTA) based on the distribution of process variation parameters.

Semi-Analytical CSM for FinFET Devices with Independent Gate Control

We develop a semi-analytical approach for a FinFET CSM operating in the sub/near-threshold regime, accounting for the unique feature of independent gate control as well as process variations. The proposed technique determines all the component values in the equivalent circuit model given the applied voltages on the front-gate-controlled and back-gate-controlled fins, the output voltage, and the process variation parameters for the N-type and P-type FETs. Only 2-D LUTs are needed in our semi-analytical method, reducing the storage space requirement.

Related work:

A Scalable CSM for Multiple-Input Cells

Another advantage of CSM over traditional STA models is that it can accurately capture the multiple-input switching (MIS) effect. The conventional multiple-input switching current source model (MCSM) is not scalable, since it requires high-dimensional lookup tables to account for all input, output, and internal node voltages (e.g., for a 3-input NAND gate, a 6-D lookup table is needed). We propose to model the current through each transistor in an m-input logic gate by building 2-D lookup tables with the key being Vgs and Vds of the transistor in question. Having looked up the current through all transistors in the design, we can then calculate the new output and internal node voltages.
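
A minimal sketch of the 2-D lookup step is shown below, with an invented toy current table; each transistor's current is read from its own small I(Vgs, Vds) table by bilinear interpolation, instead of one high-dimensional table per gate:

```python
def bilinear(table, vgs_axis, vds_axis, vgs, vds):
    """Look up transistor current I(Vgs, Vds) in a 2-D table, bilinearly
    interpolating between grid points, as a CSM evaluation would."""
    def bracket(axis, v):
        for i in range(len(axis) - 1):
            if axis[i] <= v <= axis[i + 1]:
                return i
        raise ValueError("out of table range")
    i, j = bracket(vgs_axis, vgs), bracket(vds_axis, vds)
    tg = (vgs - vgs_axis[i]) / (vgs_axis[i + 1] - vgs_axis[i])
    td = (vds - vds_axis[j]) / (vds_axis[j + 1] - vds_axis[j])
    return ((1 - tg) * (1 - td) * table[i][j]
            + tg * (1 - td) * table[i + 1][j]
            + (1 - tg) * td * table[i][j + 1]
            + tg * td * table[i + 1][j + 1])

# Toy 3x3 current table (arbitrary units), axes in volts.
vgs_axis = [0.0, 0.2, 0.4]
vds_axis = [0.0, 0.2, 0.4]
table = [[0.0, 0.1, 0.1],
         [0.0, 0.5, 0.6],
         [0.0, 1.0, 1.2]]
i_mid = bilinear(table, vgs_axis, vds_axis, 0.1, 0.1)
```

For an m-input gate this keeps table storage linear in the number of transistors rather than exponential in the number of terminal voltages.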

Related work:


Therminator: a thermal simulator for smartphones

Thermal design and management of smartphones faces a new challenge: the skin temperature constraint. This constraint refers to the fact that the temperature at the device skin must not exceed a certain upper threshold. Studies have shown that most people experience a sensation of heat pain when they touch an object hotter than 45°C. Ideally, distributing the heat uniformly across the device skin results in the most effective heat dissipation. In practice, however, the majority of the heat flows vertically from the AP die, and thus hot spots with high temperatures form on the device skin.

To address this design challenge, we designed Therminator, a compact thermal-modeling-based, component-level thermal simulator targeting small form-factor mobile devices (such as smartphones). It produces temperature maps for all components, including the AP, battery, display, and other key device components, as well as the skin of the device itself, with high accuracy and fast runtime. Therminator's results have been validated against thermocouple measurements on multiple devices and against simulation results generated by Autodesk Simulation CFD. In addition, Therminator is versatile in handling different device specifications and component usage information, which allows a user to explore the impacts of different thermal designs and thermal management policies. New devices can simply be described through an input file (in XML format). Finally, Therminator implements a parallel-processing feature, allowing users to use a GPU to reduce the runtime by more than two orders of magnitude for high-resolution temperature maps.

Therminator takes two input files provided by users. The specs.xml file describes the smartphone design, including components of interest and their geometric dimensions (length, width, and thickness) and relative positions. Therminator has a built-in library storing properties of common materials (i.e., thermal conductivity, density, and specific heat) that are used to manufacture smartphones. In addition, users can override these properties or specify new materials through the specs.xml file. The power.trace file provides the usage information (power consumption) of those components that consume power and generate heat, e.g., ICs, battery, and display. The power.trace can be obtained through real measurements or other power estimation tools/methods. power.trace is a separate file so that one can easily interface a performance-power simulator with Therminator.
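
An illustrative specs.xml fragment is sketched below; the element and attribute names are hypothetical and do not reflect the actual Therminator schema, but they convey the kind of geometry and material information the file carries:

```xml
<?xml version="1.0"?>
<!-- Illustrative only: element and attribute names here are hypothetical,
     not the actual Therminator input schema. Dimensions in meters. -->
<specs>
  <component name="AP_die" material="silicon"
             length="12e-3" width="12e-3" thickness="0.5e-3"
             x="20e-3" y="60e-3" z="1.0e-3"/>
  <component name="battery" material="li_ion"
             length="60e-3" width="45e-3" thickness="4e-3"
             x="5e-3" y="5e-3" z="0.5e-3"/>
  <!-- Overriding or adding a material, as the text describes. -->
  <material name="li_ion" conductivity="1.1" density="2500"
            specific_heat="830"/>
</specs>
```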


Energy Efficiency in the Cloud and Data Centers

Request Dispatch, Resource Allocation, DVFS, and Geographically Load Balancing in Data Centers

The cloud computing paradigm is quickly gaining popularity because of its advantages in on-demand self-service, ubiquitous network access, location independent resource pooling, and transference of risk. The ever increasing demand for the cloud computing service is driving the expense of the data centers through the roof. In order to control the expense of a data center while satisfying the clients' requests specified in the service level agreements (SLAs), one must find the appropriate design and management policy. We have proposed joint optimization of request dispatch, resource allocation, dynamic voltage and frequency scaling (DVFS), and geographically load balancing among different data centers in a cloud computing system, in order to enhance the cloud computing system’s net profit, which is the revenue it receives from processing service requests minus the overall energy cost.

Related work:

Placement, capacity provisioning and request flow control for distributed cloud infrastructure

Being aware of the interdependency between placement and capacity provisioning when designing a data center and resource allocation when operating it, we propose a generalized concurrent placement, capacity provisioning, and request flow control optimization framework for a distributed cloud infrastructure. With the trend toward dynamic utility pricing, we also utilize energy storage devices such as battery cells to further lower the utility cost of a data center.

Related work:

Power-aware Control for a Mobile Device in a Cloud Computing System

Because of the widening gap between the rapidly increasing power demand of mobile devices (e.g., smartphones, tablet PCs, etc.) and the limited growth of the volumetric/gravimetric energy density of rechargeable batteries, battery lifetime has become a major concern in the design of these mobile devices. Apart from well-known techniques such as DVFS, which can balance the processing power against the power consumption of some components in the mobile device, computation offloading, a technique that transfers some local tasks to a server in the cloud, can also be used to extend the battery life of a mobile device. Our work proposes an optimization framework for a mobile device based on a semi-Markov decision process model to determine the DVFS policy, the proportion of tasks to be offloaded to the cloud, and the transmission bit rate used for offloading.

Related work:

Trace-based Workload Characterization for a Cloud Computing System

The prediction of the workload profile of the cloud service clients can help optimize the operational cost and improve the quality of service (QoS) for the cloud infrastructure provider. However, because of the complex dynamics in the users’ behavior, it is challenging to generate workload prediction results with high accuracy. By analyzing the cluster dataset released by Google, we identify the multi-fractal behavior of the workload profile, based on which we propose a prediction algorithm using fractional ordered derivatives and find that the alpha-stable distribution can be used to fit the distribution of a set of characteristics of the workload.

Related work:


Hybrid Electrical Energy Storage Systems: Design, Optimization, and Validation

Sponsor: National Science Foundation

Conventional EES systems consist of only a single type of EES element. Unfortunately, no available EES element can fulfill all the desired performance metrics of an ideal storage medium, e.g., high power/energy density, low cost/weight per unit capacity, high round-trip efficiency, and long cycle life. An obvious shortcoming of a homogeneous EES system is that its key figures of merit (normalized with respect to capacity) cannot be any better than those of its constituent EES element.

HEES System Design and Configuration

An HEES system is comprised of different types of EES elements (e.g., batteries and supercapacitors), where each type has its unique strengths and weaknesses. The HEES system can exploit the strength of each type of EES element and achieve a combination of performance metrics that is superior to that of any of its individual EES components.

Related work:

HEES System Management

Based on the properties of the HEES system and the characteristics of the power sources (or load devices), we developed charge management policies (such as charge allocation, charge replacement, charge migration, bank reconfiguration, SoH management, and so on) to operate the HEES system properly and achieve near-optimal performance. Charge allocation maximizes the charge allocation efficiency, defined as the ratio of the energy received by the EES banks to the total energy provided by the power sources over a given time period, by properly distributing the incoming power to selected destination banks. The charge replacement problem in the HEES system is to adaptively select the EES banks and determine the discharging currents, from zero to a maximum limit, and the voltage level settings on the charge transfer interconnect (CTI) so that the given load demand is met and the charge replacement efficiency is maximized. While charge allocation and replacement deal with energy exchange with the external power supply and load, charge migration is an internal energy transfer from one EES bank to another.
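
A minimal sketch of the charge allocation metric and a greedy single-destination policy is shown below; the bank efficiencies are assumed constants here, whereas the real policies model efficiency as a function of the operating point (currents, CTI voltage, state of charge):

```python
def allocation_efficiency(energy_received, energy_provided):
    """Charge allocation efficiency: total energy received by the EES banks
    divided by total energy provided by the power sources over a window."""
    return sum(energy_received) / sum(energy_provided)

def allocate(p_in, banks):
    """Greedy single-destination allocation: send the incoming power to the
    bank with the highest charging efficiency at the current operating
    point. Returns (bank name, power actually stored)."""
    name, eff = max(banks, key=lambda b: b[1])
    return name, p_in * eff

# Assumed bank efficiencies at this operating point.
name, stored = allocate(100.0, [("battery", 0.85), ("supercap", 0.95)])
```

In the actual problem the efficiencies change with the chosen currents and CTI voltage, so allocation becomes a joint optimization rather than this one-line greedy pick.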

The lifetime of EES elements is one of the most important metrics that should be considered by designers of an EES system. EES system lifetime is usually described using the state of health (SoH), defined as the ratio of the full charge capacity of a cycle-aged EES element to its designed capacity. The SoH-aware charge management problem in HEES systems is to find charging/discharging current profiles for all EES banks and the CTI voltage, aiming to improve both the cycle life of the EES arrays (mainly battery arrays) and the overall cycle efficiency of the entire system.

Related work:

HEES System Implementation

An HEES prototype has been built based on our proposed HEES system architecture. The hardware part of the prototype comprises three types of modules: the EES bank module, the CTI module, and the converter module. For the software part, the user interface (UI) is designed using LabVIEW, while control policies are implemented using LabVIEW's MathScript module.

Related work:

User interface and control unit: The user interface (UI) is designed using LabVIEW. The LabVIEW UI monitors the runtime status of the HEES prototype, including the CTI voltage and the voltage and input/output current of each EES bank, and calculates the instantaneous charging or discharging efficiency using this information.

Capital Cost-Aware Design and Control of HEES Systems

The deployment of residential HEES systems has the potential to alleviate the mismatch between electric energy generation and consumption. However, their wide application in residential units is hindered by the lack of a convincing and complete analysis of their economic feasibility.

In this project, we provide a complete cost-aware design and control flow for residential HEES systems. Specifically, we propose a two-step design and control method: first deriving daily management policies with energy buffering strategies, and then solving the global HEES specification problem based on the daily management results. We take into consideration real-life factors such as the battery's capacity degradation, the unit capital cost of EES elements, the maintenance and replacement costs of the HEES system, etc. Simulation results show that this system achieves on average 11.10% more profit than a non-buffering HEES system.

In addition, we present a design flow for HEES systems in electric vehicles (EVs). Unlike a residential HEES system, an EV HEES system is highly constrained by weight, since larger weight results in higher traction energy consumption. We propose a Li-ion battery and supercapacitor hybrid system for EVs to reduce the daily cost and achieve high efficiency.

Related work:

SIMES: A Simulator for HEES Systems

SIMES is a simulation platform targeted at fast and accurate simulation of HEES systems. SIMES models the various elements of an HEES system, including different types of energy storage elements, power conversion circuitry, charge transfer interconnects, etc. Most of the models are calibrated against measurements of actual hardware performed in our lab.

SIMES consists of three modules: Parser, Simulator, and Visualizer. Parser reads input data in the form of an XML file and constructs the HEES system model. Simulator simulates the operation of the constructed system. Visualizer is a graphical user interface that can visualize both the HEES system configuration and the simulation output.

SIMES enables end users to freely explore the design space of HEES systems, as well as test custom power management policies. In addition, SIMES provides an easy-to-use interface that allows experienced users to implement their own component models as an extension.

Related work:


Dynamic Power Management and Voltage/Frequency Scaling

Dynamic Power Management Using Model-Free Reinforcement Learning and Bayesian Classification

To cope with the variations and uncertainties that emanate from hardware and application characteristics, dynamic power management (DPM) frameworks must be able to learn about the system inputs and environment and adjust the power management policy on the fly. We present an online adaptive DPM technique based on model-free reinforcement learning (RL), which is commonly used to control stochastic dynamical systems. In particular, we employ temporal difference learning for a semi-Markov decision process (SMDP) as the model-free RL method. In addition, a novel workload predictor based on an online Bayesian classifier is presented to provide effective estimates of the workload states for the RL algorithm. In this DPM framework, power and latency tradeoffs can be precisely controlled based on a user-defined parameter. Experiments show that the average power saving (without any increase in latency) is up to 16.7% compared to a reference expert-based approach. Alternatively, the per-request latency reduction without any increase in power consumption is up to 28.6% compared to the expert-based approach.
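
A toy stand-in for the approach is sketched below: tabular Q-learning on an invented two-state workload, rather than the actual TD learning for an SMDP driven by measured power and latency. States, actions, and rewards here are all assumptions for illustration:

```python
import random

random.seed(1)

states, actions = ("idle", "busy"), ("sleep", "active")
Q = {(s, a): 0.0 for s in states for a in actions}

def reward(s, a):
    # Invented deterministic rewards: sleeping saves power when idle;
    # staying active avoids a latency penalty when busy.
    if s == "idle":
        return 1.0 if a == "sleep" else 0.0
    return 1.0 if a == "active" else -1.0

alpha, gamma, eps = 0.1, 0.9, 0.1   # learning rate, discount, exploration
s = "idle"
for _ in range(5000):
    # Epsilon-greedy action selection.
    if random.random() < eps:
        a = random.choice(actions)
    else:
        a = max(actions, key=lambda x: Q[(s, x)])
    s_next = "busy" if s == "idle" else "idle"   # toy alternating workload
    target = reward(s, a) + gamma * max(Q[(s_next, x)] for x in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    s = s_next
```

After training, the learned Q-values prefer sleeping when idle and staying active when busy, which is the policy-learning behavior the framework relies on, without any a priori model of the workload.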

We have further extended the RL-based DPM framework to (i) a hierarchical DPM framework that jointly performs component-level DPM with a CPU scheduler, and (ii) DPM of a power-managed system with a battery-based power supply or a hybrid (battery + supercapacitor) power supply.

Related work:

Accurate Modeling of the Delay and Energy Overhead of Dynamic Voltage and Frequency Scaling in Modern Microprocessors

Dynamic voltage and frequency scaling (DVFS) has been studied for well over a decade. The state-of-the-art DVFS technologies and architectures are advanced enough that they are employed in most commercial systems today. Nevertheless, existing DVFS transition overhead models suffer from significant inaccuracies, for example, by failing to correctly account for the effect of DC-DC converters, frequency synthesizers, and voltage and frequency change policies on the energy losses incurred during mode transitions. Incorrect and/or inaccurate DVFS transition overhead models prevent one from determining the precise break-even time and thus forfeit some of the energy saving that is ideally achievable. Through detailed analysis of modern DVFS setups and the voltage and frequency change policies provided by commercial vendors, we introduce accurate DVFS transition overhead models for both energy consumption and delay. In particular, we identify new contributors to the DVFS transition overhead, including underclocking-related losses in a DVFS-enabled microprocessor, additional inductor IR losses, and power losses due to discontinuous-mode DC-DC conversion. We report the transition overheads for three representative processors: the Intel Core2Duo E6850, ARM Cortex-A8, and TI MSP430. Finally, we present a compact, yet accurate, DVFS transition overhead macro model for use by high-level DVFS schedulers.
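
A first-order sketch of one contributor to the transition overhead is the energy lost in the DC-DC converter when the rail voltage steps between two levels. The capacitance, efficiency, and voltages below are illustrative, and the full macro model adds the underclocking, inductor IR, and discontinuous-mode terms on top of this:

```python
def dvfs_transition_energy(v1, v2, c_dd, eta):
    """First-order energy overhead of a DVFS voltage transition: charge
    moved on the DC-DC converter's output capacitance c_dd between rail
    voltages v1 and v2, with converter efficiency eta."""
    return (1.0 - eta) * c_dd * abs(v1 ** 2 - v2 ** 2)

# Illustrative numbers: 100 uF output capacitance, 90% efficient converter,
# stepping the rail from 1.1 V down to 0.8 V.
e_overhead = dvfs_transition_energy(1.1, 0.8, 100e-6, 0.90)
```

Comparing this per-transition energy against the power saved at the lower voltage/frequency pair is what determines the break-even time mentioned above.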

Related work:



Photovoltaic (PV) System Design, Control, and Optimization

Sponsor: National Science Foundation

PV systems have been widely deployed in electronic and electrical systems of various scales, such as embedded systems, hybrid electric vehicles, home appliances, satellites and power plants. Due to the intermittent nature of solar energy, power management techniques are imperative for maximizing the output power of a PV system.

Maximum Power Transfer Tracking for PV Systems

The PV panel exhibits a nonlinear output current-voltage (I-V) relationship, and under a given solar irradiance level there is an operating point (V, I) where the PV panel output power is maximized. We propose the maximum power transfer tracking technique, which considers the converter efficiency variation to maximize the output power of the whole PV system.
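A minimal sketch of the idea, using toy panel and converter curves (the real I-V and efficiency characteristics are measured, not assumed): perturb-and-observe hill climbing on the post-converter output rather than on the panel output.

```python
def pv_power(v):
    # Toy PV curve: current plateaus at 5 A and collapses near the
    # open-circuit voltage of ~20 V (hypothetical panel).
    return v * max(0.0, 5.0 * (1.0 - (v / 20.0) ** 8))

def converter_eff(p):
    # Toy converter efficiency curve: peaks at a mid-range load of 40 W.
    return 0.95 - 0.2 * abs(p - 40.0) / 80.0

def mptt(v=10.0, step=0.1, iters=500):
    """Perturb-and-observe on the *system* output -- panel power times
    converter efficiency -- i.e., maximum power transfer tracking."""
    direction = 1.0
    out = pv_power(v) * converter_eff(pv_power(v))
    for _ in range(iters):
        p_new = pv_power(v + direction * step)
        out_new = p_new * converter_eff(p_new)
        if out_new < out:
            direction = -direction    # overshot the optimum: reverse
        else:
            v += direction * step
            out = out_new
    return v, out
```

Classical maximum power point tracking would climb on pv_power alone; tracking the product shifts the operating point whenever the converter's efficiency varies with load.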

Related work:

Reconfigurable PV Panel to Combat Partial Shading Effect

The solar irradiance levels on the PV cells in a PV panel may differ from one another, a phenomenon known as the partial shading effect. Partial shading significantly degrades the output power of a PV panel: if one fourth of the panel is completely shaded, for example, the panel suffers a power loss of nearly 50%. To improve the output power of a PV panel under partial shading, we propose a reconfigurable PV panel structure and a dynamic programming algorithm that reconfigures the PV panel dynamically as the shading pattern changes. Simulation results demonstrate that our method can improve the PV system output power by up to 2.31X. We have also built reconfigurable PV prototypes and demonstrated the effectiveness of the PV panel reconfiguration technique.
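Both the effect and the benefit of reconfiguration can be reproduced with a toy model: each cell contributes current proportional to its irradiance, parallel groups add currents, and groups wired in series are limited by the weakest group. The exhaustive search below is a brute-force stand-in for the paper's dynamic programming algorithm and only works for small panels; the cell model and V_GROUP constant are assumptions.

```python
from itertools import permutations

V_GROUP = 0.5    # assumed volts contributed by each series-connected group

def string_power(rows):
    """Parallel groups wired in series: the string current is limited by
    the weakest group, while voltages add up across groups."""
    return min(sum(g) for g in rows) * V_GROUP * len(rows)

def best_reconfiguration(irradiance, n_rows):
    """Try every assignment of cells to groups (small panels only) and
    return the best achievable output under the given shading pattern."""
    per = len(irradiance) // n_rows
    return max(
        string_power([perm[i * per:(i + 1) * per] for i in range(n_rows)])
        for perm in permutations(irradiance)
    )
```

For a six-cell panel with two fully shaded cells, a fixed wiring that traps both shaded cells in one group yields 0.6 W in this model, while the best reconfiguration spreads them across groups for 1.8 W -- an improvement of the same flavor as the up-to-2.31X reported above.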

Related work:

Online Fault Detection and Tolerance for PV Systems

A PV system may suffer from PV cell faults caused by contact failure, wire corrosion, hail impact, moisture, etc. When some of the PV cells in a panel become defective, the result is lower output power and a shorter lifespan for the PV system. Unfortunately, manual fault detection and elimination are expensive and all but impossible for remote PV systems (e.g., PV systems on orbital or deep space missions). We present design principles and runtime control algorithms for a fault-tolerant PV panel that can detect and bypass PV cell faults in situ, without manual intervention.

Related work:

PV System on HEV/EV

On top of the regenerative braking-based battery charging scheme, a PV system mounted on an HEV/EV can collect energy to charge the vehicle batteries whenever solar irradiance is available. To make full use of the vehicle surface area, PV cells are mounted on the hood, rooftop, door panels, etc. However, due to the uneven distributions of solar irradiance and temperature across different vehicle surfaces, the actual output power of the vehicle PV system may be substantially degraded. We propose a dynamic PV panel reconfiguration algorithm, which updates the PV panel configuration as the irradiance and temperature distributions on the panel change so as to enhance the output power. Furthermore, we investigate the customization of the PV panel installation on HEV/EVs and implement a high-speed, high-voltage PV reconfiguration switch network with IGBTs and a controller. We derive the optimal reconfiguration period based on the driving profiles, taking into account the on/off delay of the IGBTs, the computation overhead, and the energy overhead.
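The trade-off behind the optimal reconfiguration period can be sketched with a simple amortization argument; the cost constants below are assumptions, and the actual derivation accounts for measured IGBT delays and driving profiles:

```python
import math

def avg_loss(period, e_overhead, drift_rate):
    """Average power lost per second: the amortized energy cost of one
    reconfiguration plus the output lost to a configuration that grows
    stale (roughly linearly) as irradiance/temperature drift."""
    return e_overhead / period + drift_rate * period / 2.0

def optimal_period(e_overhead, drift_rate):
    """Closed-form minimizer of avg_loss: T* = sqrt(2 * E_o / k)."""
    return math.sqrt(2.0 * e_overhead / drift_rate)

# Hypothetical: 2 J per reconfiguration (switching + computation),
# stale-configuration loss growing at 0.25 W/s.
t_star = optimal_period(2.0, 0.25)   # 4.0 s
```

Reconfiguring too often wastes energy on switching; too rarely, on a mismatched configuration -- the optimum balances the two.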

Related work:


Optimal Design of Power Delivery Network for System on Chip

Sponsor: Defense Advanced Research Projects Agency, National Science Foundation, and the Semiconductor Research Corporation.

Project Summary: While power modeling and dynamic power management in various VLSI platforms have been heavily investigated, one critical factor has often been overlooked: the power conversion efficiency of the power delivery network (PDN) in these platforms. The PDN provides power from a power source (e.g., a battery) to all modules in a platform. In reality, the voltage regulators (VRs), which play a pivotal role in the PDN, inevitably dissipate power, and the dissipation across all VRs can add up to a considerable power loss. This is mainly because VR efficiency can drop dramatically under adverse load conditions (i.e., out-of-range output current levels). This project aims to improve the power conversion efficiency of the PDN. We have proposed optimization methods that minimize the power loss of the PDN, thereby maximizing the system-wide power savings.

Optimizing the PDN in Smartphone Platforms

In a series of papers published in the proceedings of ISLPED-12 and in IEEE Trans. on CAD (2014), we introduced optimization methods to improve the power conversion efficiency of the PDNs in mobile (smartphone) platforms. Starting from detailed models of the VR designs, two optimization methods were presented: (i) static switch sizing to maximize the efficiency of a VR under statistical loading profiles, and (ii) dynamic switch modulation to achieve high efficiency under dynamically varying load conditions. To verify the efficacy of the optimization methods on actual smartphone platforms, we also presented a characterization procedure for the PDN. The procedure is as follows: (i) group the modules in the smartphone platform together and use profiling to estimate their average and peak power consumption levels, and (ii) build an equivalent VR model for the power delivery path from the battery source to each group of modules and use linear regression to estimate the conversion efficiency of the corresponding equivalent converter. Experimental results demonstrated the efficacy of the proposed methods.
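Step (ii) of the characterization hinges on a simple observation: for a converter with a roughly constant efficiency eta and a fixed loss term, input power is an affine function of output power, so an ordinary least-squares fit recovers both. A sketch with synthetic measurements (the real procedure fits profiled smartphone data):

```python
def fit_vr_efficiency(p_out, p_in):
    """Least-squares fit of P_in ~= P_out / eta + P_fixed: the slope of
    the regression line is 1/eta, the intercept is the fixed loss."""
    n = len(p_out)
    mx, my = sum(p_out) / n, sum(p_in) / n
    sxx = sum((x - mx) ** 2 for x in p_out)
    sxy = sum((x - mx) * (y - my) for x, y in zip(p_out, p_in))
    slope = sxy / sxx
    return 1.0 / slope, my - slope * mx   # (efficiency, fixed loss in W)

# Synthetic measurements from a converter with eta = 0.85, 0.2 W fixed loss.
p_out = [0.5, 1.0, 2.0, 3.5]
p_in = [p / 0.85 + 0.2 for p in p_out]
eta, fixed = fit_vr_efficiency(p_out, p_in)
```

With noisy measured data the fit is approximate, but the same affine structure still separates the load-dependent loss from the fixed loss.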

Related work:

Dynamic Reconfiguration of VRs in Multicore Platforms

In a DATE-14 paper, we focused on the dynamic control of the multiple VRs in chip multiprocessor (CMP) platforms that support per-core DVFS. Starting with a proposed platform featuring a reconfigurable VR-to-core power delivery network (PDN), two optimization methods were presented to maximize the system-wide energy savings: (i) reactive VR consolidation, which reconfigures the PDN to maximize the power conversion efficiency of the VRs under the pre-determined DVFS levels for the cores, and (ii) proactive VR consolidation, which determines new DVFS levels to maximize the total energy savings without any performance degradation. Results from detailed simulations based on realistic experimental setups demonstrated significant VR energy loss reduction and total energy savings.

Related work:

Architecture Development and Design Optimization of Hybrid Electrical Energy Storage Systems

Grid-scale electrical energy storage (EES) systems have already been deployed to mitigate the gap between supply and demand. In addition, most stand-alone renewable energy sources, such as solar energy, wind power, and hydropower, require an EES system. However, current EES systems are mainly homogeneous, that is, they consist of a single type of EES element, and therefore suffer from a fundamental shortcoming that plagues every homogeneous EES: key metrics (normalized with respect to capacity) of a homogeneous EES cannot be better than those of its individual storage elements. Consequently, a homogeneous approach is not viable for any system where no existing type of EES element can fulfill all the required performance metrics - such as power density, energy density, cost per unit capacity, weight per unit capacity, round-trip efficiency, cycle life, and environmental effects. This limitation is preventing the adoption of a wide range of socially and economically useful technologies, such as grid-scale EES and electric vehicles (EVs), and is causing significant inefficiencies for many others. Hence, eliminating this limitation of homogeneous EES systems is the primary motivation for our research.

Our approach for improving the performance of EES systems is to exploit different types of EES elements, each with its unique strengths and weaknesses, to design hybrid EES system architectures and control policies that dramatically improve the key performance characteristics of the storage system. This approach exploits fundamental properties that give a hybrid electrical energy storage (HEES) system the potential to achieve a combination of performance metrics superior to those of any of its individual EES components. In fact, in some cases, it is possible for a HEES system to attain values of individual metrics that are close to their respective best values across its constituent EES elements. For example, a HEES system may achieve the power density of its constituent EES component with the highest power density (which is likely to have the highest cost) and, at the same time, achieve a cost close to that of its cheapest component (which is likely to have low power density). Simply speaking, we pursue HEES because it holds the promise of providing us with the best of all worlds. Such dramatic improvements can be delivered only by a HEES system that is well designed and well controlled. Hence, the development of design and control techniques for HEES systems is our goal.

Tutorial given at the 2011 International Symposium on Quality Electronic Design, Santa Clara, CA -- Hybrid Electrical Energy Storage Systems

As of today, no single type of electrical energy storage (EES) element simultaneously offers high energy density, high power delivery capacity, low cost per unit of storage, long cycle life, low leakage, and so on. Following a review of conventional EES, we introduce a hybrid EES (HEES) system comprising heterogeneous EES elements, based on the concepts of the computer memory hierarchy. We present HEES design considerations aimed at optimal charge management under various cost metrics.


Design, Optimization, and Mapping of Quantum Algorithms to Quantum Circuit Fabrics

Quantum information processing has captivated atomic and optical physicists as well as theoretical computer scientists by promising a model of computation that can improve the complexity class of several challenging problems. To perform efficient quantum computation, one needs an efficient set of computer-aided design tools in addition to a favorable complexity class and the ability to control quantum mechanical systems with high fidelity and long coherence times. This is comparable to the classical domain, where a Turing machine, a high clock speed, and error-free switching were not adequate to design fast modern computers. Quantum circuit and layout design with algorithmic techniques and CAD tools are the focus of our research. We conduct research that spans the areas of computer programming, data structures and algorithms, and optimization while maintaining a strong relevance to quantum computing. For quantum circuit design, our research has resulted in a systematic synthesis framework with favorable results for several families of functions, including modular exponentiation [6], quantum adders and multiplexers [1], and complex Toffoli gates [2]. In quantum layout design, we proposed several techniques for designing quantum fabrics that use either MOVE [7] or SWAP [3,5] operations to change the location of quantum information, or to approximate the communication overhead [4].
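To make the layout problem concrete, here is a minimal sketch of SWAP-based qubit movement on a linear-nearest-neighbor fabric: before each two-qubit gate, the operands are brought adjacent with nearest-neighbor SWAPs, and the total SWAP count approximates the communication overhead. This illustrates the problem setting only, not the optimization techniques of the papers below.

```python
def lnn_swap_overhead(circuit, n_qubits):
    """Count the SWAPs needed to execute a list of two-qubit gates
    (pairs of logical qubit indices) on a line of wires where only
    adjacent wires may interact."""
    pos = list(range(n_qubits))    # pos[q] = wire currently holding qubit q
    swaps = 0
    for a, b in circuit:
        while abs(pos[a] - pos[b]) > 1:
            step = 1 if pos[b] > pos[a] else -1
            neighbor = pos.index(pos[a] + step)   # qubit on the next wire
            pos[a], pos[neighbor] = pos[neighbor], pos[a]
            swaps += 1
    return swaps
```

Minimizing this count by reordering gates or choosing the initial qubit placement is the flavor of interaction-distance optimization studied in the work cited below.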

For more information, see the following papers.

  • Afshin Abdollahi, Mehdi Saeedi, and Massoud Pedram, "Reversible Logic Synthesis by Quantum Rotation Gates," Quantum Information and Computation , Vol. 13, No. 9-10, pp. 0771-0792, 2013 (arXiv:1302.5382).

  • Mehdi Saeedi and Massoud Pedram, "Linear-Depth Quantum Circuits for n-qubit Toffoli Gates with no Ancilla" (arXiv:1303.3557), 2013.

  • Mehdi Saeedi, Alireza Shafaei, and Massoud Pedram, "Constant-Factor Optimization of Quantum Adders on 2D Quantum Architectures," to appear in 5th Conference on Reversible Computation (RC), 2013 (arXiv:1304.0432).

  • Mohammad Javad Dousti and Massoud Pedram, "LEQA: Latency Estimation for a Quantum Algorithm Mapped to a Quantum Circuit Fabric," to appear in Proc. of the 50th Design Automation Conf. (DAC), Jun. 2013.

  • Alireza Shafaei, Mehdi Saeedi, and Massoud Pedram, "Optimization of Quantum Circuits for Interaction Distance in Linear Nearest Neighbor Architectures," to appear in Proc. of the 50th Design Automation Conf. (DAC), Jun. 2013.

  • Alireza Shafaei, Mehdi Saeedi, and Massoud Pedram, "Reversible Logic Synthesis of k-Input, m-Output Lookup Tables," Design Automation and Test in Europe (DATE), Mar. 2013.

  • Mohammad Javad Dousti and Massoud Pedram, "Minimizing the Latency of Quantum Circuits during Mapping to the Ion-Trap Circuit Fabric," Design Automation and Test in Europe (DATE), Mar. 2012.


    Designing Reliable and Power-Efficient VLSI Circuits and Systems

    Digital information management is the key enabler of the unparalleled rise in productivity and efficiency experienced by the world's economies. Computing and information processing systems are important elements of the world's digital infrastructure, providing ever-present and ever-increasing general-purpose and data-driven processing and storage capabilities for both wired and mobile users. As such, they are also significant drivers of economic growth and social change. However, the continued expansion of computing and information processing systems is now hindered by their unsustainable and rising power needs, with the associated electrical energy costs and peak power draw requirements. Moreover, governments, people, and corporations are becoming increasingly concerned about the environmental impact of these systems, i.e., their carbon footprint. Separately from all this, with the increasing levels of variability in the characteristics of nanoscale CMOS devices and on-chip interconnects, and continued uncertainty in the operating conditions of VLSI circuits, achieving power efficiency and high performance in computing and information processing systems under process, voltage, and temperature variations as well as interconnect wear-out and device aging has become a daunting, yet vital, task.

    Keynote speech given at the 2011 International Symp. on Physical Design , Santa Barbara, CA -- Robust Design of Power-Efficient VLSI Circuits

    It is against the backdrop of rising power demands and energy costs as well as increased device- and circuit-level variability and aging effects that I present a number of best practices and methods for improving the power-performance efficiency of VLSI circuits and systems. The reviewed techniques range from dynamic power management to design of power-aware circuits, and from power/clock gating to leakage power minimization. A key issue to be addressed is how to deal with process and environment-induced variability of circuit parameters through statistical modeling and robust optimization and how to manage uncertainty about the workload and input data characteristics through observations and closed feedback loop control.


    Green Computing: Reducing Energy Cost and Carbon Footprint of Information Processing Systems

    Our research aims to develop technical approaches for improving energy efficiency in enterprise computing systems and data centers, ranging from server-level power/thermal management to energy balancing and HVAC control in the data center to application software with built-in power tuning levers. This is a critically important topic with many different beneficiaries and players and excellent opportunities for research and development.

    Keynote speech given at the 2010 International Workshop on IT and Future Society, Jeju Island, South Korea -- Energy Efficient Enterprise Computing and Green Datacenters

    Datacenters provide the supporting infrastructure for a wide range of economic activities based on digital information. As such, they are extremely important drivers of economic growth. They are also at the center of societal changes enabling new media for cyber-social interactions. However, the continued growth of datacenters is now hindered by their unsustainable and rising energy needs. Apart from datacenter energy consumption and associated costs, corporations and governments are also concerned about the environmental impact of datacenters, in terms of their CO2 footprint. In my talk I will describe a number of techniques for improving the energy efficiency of enterprise computing platforms and datacenters ranging from task scheduling and server consolidation to combined power and cooling optimizations and adaptive control algorithms built on a variety of mathematical optimization frameworks.

    Lecture given at the 2009 SIGDA Design Automation Summer School -- Energy-Efficient Computing

    The increasing demand for higher processing power and storage capacity, along with the shift to high-density computing, is driving the energy expenses of data centers through the roof. A data center is a facility used to house computer systems and associated components, such as telecommunications and storage systems. Data centers sit at the center of the ICT ecosystem. Indeed, by the end of 2009, energy costs will emerge as the second-highest operating cost (behind labor) in 70% of data center facilities worldwide. According to an Environmental Protection Agency report, data centers in the US alone consumed about 61 billion kilowatt-hours in 2006 for a total electricity cost of about $4.5 billion. If current trends continue, this demand will double by 2012. In an energy-constrained world, this level of consumption is unsustainable and comes at increasingly unacceptable social and environmental costs. Data center energy efficiency has thus become a public policy concern, and it is imperative that data centers implement efficient methods to minimize their energy use.


    Stochastic Approaches for Dynamic Thermal Management in High Performance Microprocessor Chips

    Sponsor: National Science Foundation - Computer Systems Research

    Project Summary: Peak power dissipation and the resulting temperature rise have become the dominant limiting factors to processor performance and a significant component of its design cost. Expensive packaging and heat removal solutions are needed to achieve acceptable substrate and interconnect temperatures in high-performance microprocessors. Current thermal solutions are designed to limit the peak processor power dissipation to ensure reliable operation under worst-case scenarios. However, the peak power and the ensuing peak temperature are hardly ever observed. Dynamic thermal management (DTM) has been proposed as a class of micro-architectural solutions and software strategies to achieve the highest processor performance under a peak temperature limit. When the chip approaches its thermal limit, a DTM controller initiates hardware reconfiguration, slow-down, or shutdown to lower the chip temperature. Possible response mechanisms include micro-architectural adaptations, e.g., fetch toggling, register file resizing, and issue width reduction, and/or on-the-fly performance adjustment, e.g., dynamic voltage and frequency scaling and functional unit shut-down. The proposed research aims to develop a new DTM solution that takes a global, predictive approach based on constructing and utilizing a continuous-time Markovian decision process model of the microprocessor chip and the application programs. The offline algorithms developed in this framework are provably optimal, whereas the online versions of these algorithms are easily deployable and highly flexible. The project thus produces temperature-aware policies and techniques for ensuring that microprocessor chips operate within the allowed temperature zone with the maximum possible performance while not being over-designed.
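The decision-theoretic core of the approach can be illustrated with a toy discrete MDP solved by value iteration; the states, rewards, and transition probabilities below are invented for illustration, whereas the project uses a continuous-time Markovian decision process identified from the chip and its applications:

```python
# States: discretized temperature levels 0..4 (4 = thermal limit).
# Actions trade performance reward against the probability of heating.
REWARD = {'fast': 2.0, 'slow': 1.0}      # performance reward per step
HEAT_P = {'fast': 0.8, 'slow': 0.2}      # prob. temperature rises one level
LIMIT_PENALTY = -100.0                   # reaching the limit is catastrophic
GAMMA = 0.9                              # discount factor

def q_value(s, a, V, n):
    up, down = min(s + 1, n - 1), max(s - 1, 0)
    r = LIMIT_PENALTY if s == n - 1 else REWARD[a]
    return r + GAMMA * (HEAT_P[a] * V[up] + (1 - HEAT_P[a]) * V[down])

def value_iteration(n=5, iters=300):
    V = [0.0] * n
    for _ in range(iters):
        V = [max(q_value(s, a, V, n) for a in REWARD) for s in range(n)]
    policy = [max(REWARD, key=lambda a: q_value(s, a, V, n)) for s in range(n)]
    return V, policy
```

Even this toy policy learns to throttle just below the thermal limit, which is the qualitative behavior an optimal DTM controller should exhibit.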

    A Stochastic Local Hot Spot Alerting Technique -- In an ASPDAC-08 conference paper, we addressed the questions of how and when to identify and issue a hot spot alert in a microprocessor. These are important questions since temperature reports by thermal sensors may be erroneous, noisy, or arrive too late to enable effective application of thermal management mechanisms to avoid chip failure. More precisely, we presented a stochastic technique for identifying and reporting local hot spots under probabilistic conditions induced by uncertainty in the chip junction temperature and the system power state. In particular, we introduced a stochastic framework for estimating the chip temperature and the power state of the system based on a combination of Kalman filtering (KF) and a Markovian decision process (MDP) model. Experimental results demonstrated the effectiveness of the framework and showed that the proposed technique alerts about thermal threats accurately and in a timely fashion in spite of noisy or occasionally erroneous readings by the temperature sensor.
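The filtering half of the framework can be sketched as a scalar Kalman filter tracking junction temperature through sensor noise; the noise variances here are assumed, and the actual framework couples the filter with an MDP over power states:

```python
def kalman_temperature(readings, q=0.01, r=4.0, t0=50.0, p0=10.0):
    """Scalar Kalman filter smoothing noisy on-chip sensor readings.
    q: process noise variance (how fast true temperature drifts),
    r: sensor noise variance. Returns the filtered estimates."""
    t, p = t0, p0
    estimates = []
    for z in readings:
        p = p + q                 # predict: uncertainty grows over time
        k = p / (p + r)           # Kalman gain
        t = t + k * (z - t)       # update with measurement z
        p = (1 - k) * p
        estimates.append(t)
    return estimates

# A hot-spot alert can then fire when the filtered *estimate* crosses the
# threshold, instead of reacting to raw, noisy sensor samples.
```

This is why the technique tolerates erroneous readings: an outlier sample moves the estimate by only a fraction k of the error.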

    Continuous Frequency Adjustment Technique Based on Dynamic Workload Prediction -- In a VLSI Design-08 conference paper, we presented a technique for continuous frequency adjustment (CFA) which enables one to adjust the frequency values of various functional blocks in the system at a very fine granularity so as to minimize energy while meeting a performance constraint. A key feature of the proposed technique is that the workload characteristics of the functional blocks are effectively captured at runtime to generate a frequency value that is continuously adjusted, thereby eliminating the delay and energy penalties incurred by transitions between power-saving modes. The workload prediction is accomplished by solving an initial value problem (IVP). Applying CFA to a real-time system in a 65nm CMOS technology, we demonstrated the effectiveness of the proposed technique, reporting a 13.6% energy saving under a performance constraint.

    A Unified Framework for System-level Design: Modeling and Performance Optimization of Scalable Networking Systems -- In an ISQED-07 conference paper, we presented a new unified modeling framework, called the extended queuing Petri net (EQPN), which combines extended stochastic Petri net and G/M/1 queuing models to support the design of reliable systems at design time, while improving the accuracy and robustness of power and temperature optimization for high-speed scalable networking systems. The EQPN model is employed to represent the performance behavior of the system and to minimize its power consumption under performance constraints through mathematical programming formulations. Modeling the system with the EQPN enables designers to arrive at a reliable and optimized system at the beginning of the design cycle. The proposed system model was compared with existing stochastic models using real simulation data.

    Minimizing Power Dissipation during Write Operations to Register Files -- In an ISLPED-07 conference paper, we introduced a power reduction mechanism for the write operation in register files (RegFiles), which adds a conditional charge-sharing structure to the pair of complementary bit-lines in each column of the RegFile. Because the read and write ports of the RegFile are implemented separately, it is possible to avoid pre-charging the bit-line pair for consecutive writes. More precisely, when writing the same values to some cells in the same column of the RegFile, the energy consumed in pre-charging the bit-line pair can be eliminated. Likewise, when writing opposite values to some cells in the same column of the RegFile, the energy consumed in charging the bit-line pair can be reduced thanks to charge sharing. Motivated by these observations, we modified the bit-line structure of the write ports in the RegFile, removing the per-cycle bit-line pre-charging and employing conditional, data-dependent charge sharing. Experimental results on a set of SPEC2000INT/MediaBench benchmarks showed an average of 61.5% power savings with a 5.1% area overhead and a 16.2% increase in write access delay. Lower power dissipation also resulted in a lower substrate temperature in the RegFile.

    Active Bank Switching for Temperature Control of the Register File in a Microprocessor -- In a GLS-VLSI-07 paper, we described an effective thermal management scheme, called active bank switching, for temperature control in the register file of a microprocessor. The idea is to divide the physical register file into two equal-sized banks and to alternate between the two banks when allocating new registers to the instruction operands. Experimental results show that this periodic active bank switching scheme achieves a 3.4°C steady-state temperature reduction with a mere 0.75% average performance penalty.

    Dynamic Thermal Management for MPEG-2 Decoding -- In an ISLPED-06 paper, we presented an effective dynamic thermal management (DTM) scheme for MPEG-2 decoding that allows some degree of spatiotemporal quality degradation. Given a target MPEG-2 decoding time, we dynamically select either an intra-frame spatial degradation or an inter-frame temporal degradation strategy in order to ensure that the microprocessor chip stays in a thermally safe state of operation, albeit with a certain amount of image/video quality loss. For our experiments, we used the MPEG-2 decoder program of MediaBench and modified/combined Wattch and HotSpot for the power and thermal simulations and measurements, respectively. Our experimental results demonstrated that we can achieve a thermally safe state with a spatial quality degradation of 0.12 RMSE and a frame drop rate of 12.5% on average.

    Stochastic Dynamic Thermal Management: A Markovian Decision-based Approach -- In an ICCD-06 paper, we introduced a stochastic DTM technique for high-performance VLSI systems, with special attention to the uncertainty in temperature observations. More specifically, we presented a stochastic thermal management framework to improve the accuracy of decision making in DTM, which performs dynamic voltage and frequency scaling to minimize total power dissipation and on-chip temperature. Multi-objective optimization with the aid of a mathematical programming solver was used to reduce the operating temperature. Experimental results with a 32-bit embedded RISC processor demonstrated the effectiveness of the technique and showed that the proposed algorithm ensures thermal safety under performance constraints.


    System-Wide Dynamic Voltage Scaling and Power Management in Battery-Powered Embedded Systems

    Sponsor: National Science Foundation - Computer Systems Research

    Project Summary: One of the key problems confronting computer system designers is the management and conservation of energy sources. This challenge is evident in a number of ways. The goal may be to extend the battery lifetime in a computer system comprising a processor and a number of memory modules, I/O cores, and bridges. This is especially important in light of the fact that power consumption in a typical portable electronic system is increasing rapidly whereas the gravimetric energy density of its battery source is improving at a much slower pace. Other goals may be to limit the cooling requirements of a computer system or to reduce the financial burden of operating a large computing facility. The objective of this research is to develop system-wide power optimization algorithms and techniques that eliminate waste or overhead and allow energy-efficient use of the various memory and I/O devices while meeting an overall performance requirement. More precisely, this project tackles two related problems: dynamic voltage and frequency scaling targeting the minimization of the total system energy dissipation, and global power management in a system comprising modules that are potentially managed by their own local power management policies, yet must closely interact with one another in order to yield maximum system-wide energy efficiency. The broader impacts of this project include the development of energy-aware computer systems as the key to cost-effective realization of a large number of high-performance applications running on battery-powered portable platforms, and the education and training of young researchers and engineers to be able to address the complex and intertwined energy-efficiency/performance challenges that arise in the context of designing next-generation information technology products and services.

    Flow-Through-Queue based Power Management for Gigabit Ethernet Controllers -- Computer networking is beginning to support multi-gigabit data transfer rates. In an ASPDAC-07 paper we presented an energy-efficient packet interface architecture and a power management technique for gigabit Ethernet controllers, in which low latency and high bandwidth are achieved to meet the pressing demands of extremely high frame-rate data. More specifically, we presented a predictive-flow-queue (PFQ) based packet interface architecture to adjust the operating frequencies of various functional blocks in the system at a fine granularity so as to minimize the total system energy dissipation while meeting the performance constraints. A key feature of the proposed architecture is that runtime workload prediction of the network traffic is implemented so as to generate an operating frequency value that is continually adjusted, thereby eliminating the delay and energy penalties incurred by transitions between power-saving modes. Furthermore, a modeling approach based on Markov processes and queuing models is employed, which allows one to apply mathematical programming formulations for energy optimization. Experimental results with a designed 65nm gigabit Ethernet controller show that the proposed energy-efficient architecture and power management technique can achieve system-wide energy savings under tight performance constraints.

    Dynamic Voltage and Frequency Management Based on Variable Update Intervals or Frequency Setting -- In an ICCAD-06 paper, we developed an efficient adaptive method to perform dynamic voltage and frequency management (DVFM) for minimizing the energy consumption of microprocessor chips. Instead of using a fixed update interval, our DVFM system makes use of adaptive update intervals for optimal frequency and voltage scheduling. The optimization enables the system to rapidly track the workload changes so as to meet soft real-time deadlines. The method, which is based on introducing the concept of an effective deadline, utilizes the correlation between consecutive values of the workload. Since in real situations the frequency and voltage update rates are dynamically set based on variable update interval lengths, voltage fluctuations on the power network are also minimized. The technique, which may be implemented by simple hardware and is completely transparent from the application, leads to power savings of up to 60% for highly correlated workloads compared to DVFM systems based on fixed update intervals.
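A minimal sketch of the variable-update-interval idea; the EWMA predictor, error threshold, and normalized workload model are simplified assumptions, not the paper's algorithm:

```python
def dvfm_schedule(workloads, f_max=1.0, base=1, max_interval=8, alpha=0.5):
    """Adaptive-interval DVFM sketch: predict each interval's workload
    (normalized cycles in 0..f_max) with an EWMA, run just fast enough
    to cover the prediction, and double the update interval while the
    prediction error stays small (shrink it otherwise)."""
    pred, interval = workloads[0], base
    settings, i = [], 0
    while i < len(workloads):
        chunk = workloads[i:i + interval]
        freq = min(f_max, pred)
        settings.append((i, interval, freq))   # (start, length, frequency)
        i += interval
        actual = max(chunk)
        err = abs(actual - pred) / max(actual, 1e-9)
        interval = (min(max_interval, interval * 2) if err < 0.1
                    else max(base, interval // 2))
        pred = alpha * actual + (1 - alpha) * pred   # EWMA update
    return settings
```

With a highly correlated workload the interval quickly grows to its cap, cutting the number of voltage transitions -- the source of the reported savings over fixed-interval DVFM.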

    Power-Aware Scheduling and Voltage Setting for Tasks Running on a Hard Real-Time System -- In an ASPDAC-06 paper, we presented a solution to the problem of minimizing energy consumption of a computer system performing periodic hard real-time tasks with precedence constraints. In the proposed approach, dynamic power management and voltage scaling techniques are combined to reduce the energy consumption of the CPU and devices. The optimization problem is initially formulated as an integer programming problem. Next, a three-phase heuristic solution, which integrates power management, task scheduling and task voltage assignment, is provided. Experimental results show that the proposed approach outperforms existing methods by an average of 18% in terms of the system-wide energy savings.

    Hierarchical Power Management with Application to Scheduling -- In an ISLPED-05 paper, we presented a hierarchical power management (HPM) architecture which aims to facilitate power-awareness in an energy-managed computer (EMC) system with multiple self-power-managed components. The proposed architecture divides the PM function into two layers: system-level and component-level. Although the system-level PM has detailed information about the global state of the EMC and its various computational and memory resources, it cannot directly control the power management policies of the constituent components, which are typically designed and manufactured by different IC vendors. In particular, the system-level PM resorts to adaptive service request flow regulation and online application scheduling to force the component-level PMs to function in such a way as to minimize the total system energy dissipation while meeting an overall performance target. Preliminary experimental results show that HPM achieves a 25% reduction in the total system energy compared to the "best" component-level PM policies.

    Dynamic Voltage and Frequency Scaling for Energy-Efficient System Design -- This talk, which was given at NSTU, Taiwan in 2005, summarizes the results of our research in the area of dynamic voltage and frequency scaling (DVFS). More precisely, the first part of the talk describes an intra-process DVFS technique targeted toward non-real-time applications running on an embedded system platform. The key idea is to make use of runtime information about the external memory access statistics in order to perform CPU voltage and frequency scaling with the goal of minimizing the energy consumption while transparently controlling the performance penalty. The proposed DVFS technique relies on dynamically-constructed regression models that allow the CPU to calculate the expected workload and slack time for the next time slot, and thus adjust its voltage and frequency in order to save energy while meeting soft timing constraints. This is in turn achieved by estimating and exploiting the ratio of the total off-chip access time to the total on-chip computation time. The proposed technique has been implemented on an XScale-based embedded system platform, and actual energy savings have been calculated by current measurements in hardware. The second part of the talk describes a DVFS technique that minimizes the total system energy consumption for performing a task while satisfying a given execution time constraint. We first show that, in order to guarantee minimum energy for task execution using DVFS, it is essential to divide the system power into fixed, idle, and active power components. Next, we present a new DVFS technique which considers not only the active power, but also the idle and fixed power components of the system. This is in sharp contrast to previous DVFS techniques, which only consider the active power component. The fixed plus idle components of the system power are measured by monitoring the system power when it is idle.
The active component of the system power is estimated at run time by a technique known as workload decomposition, whereby the workload of a task is decomposed into on-chip and off-chip components based on statistics reported by a performance monitoring unit (PMU). We have implemented the proposed DVFS technique on the BitsyX platform, an Intel PXA255-based platform manufactured by ADS Inc., and performed detailed energy measurements.
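
    The workload-decomposition step admits a compact sketch (illustrative only; the interface and constants are assumptions, not the BitsyX implementation): off-chip access time does not scale with the CPU frequency, so only the on-chip work needs to fit into the slack that remains after subtracting the off-chip time from the execution time constraint.

```python
def optimal_frequency(w_oncpu_cycles, t_offchip, deadline, f_max=2.0e9):
    """Workload-decomposition DVFS sketch: total time is roughly
    w_oncpu_cycles / f + t_offchip, so the lowest feasible frequency is
    the one that fits the on-chip cycles into the remaining slack.
    (Hypothetical helper, not the paper's exact formulation.)"""
    slack = deadline - t_offchip
    if slack <= 0:
        return f_max        # memory-bound: off-chip time alone fills the deadline
    return min(f_max, w_oncpu_cycles / slack)
```

For a task with 10^7 on-chip cycles, 5 ms of off-chip time, and a 15 ms deadline, the sketch returns 1 GHz; a memory-bound task falls back to the maximum frequency.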


    Hardware/Software Support and Algorithms for Dynamic Backlight Scaling in TFT LCDs

    Sponsor: National Science Foundation - Computing Processes and Artifacts

    Project Summary: Display components have become a key focus of efforts to maximize the battery lifetime in a wide range of portable, display-equipped, microelectronic systems and products. A particularly effective technique for reducing the power consumption of all kinds of displays is dynamic backlight scaling, where the intensity of the backlight lamp and the LCD transmittance function are changed concurrently and in proportion so that the same visual perception is created in the human eye at much lower levels of power consumption. This research therefore aims to develop spatiotemporal and/or color-aware backlight scaling techniques for pixel transformation of the displayed still images or video streams so as to maximize the energy saving in a target platform. The new techniques, which take advantage of the human visual system characteristics to minimize distortion between the original and backlight-scaled images/videos, will be implemented and demonstrated on the Apollo Testbed II hardware platform. The broader impact of the research is to significantly reduce the power consumption of typical handheld devices, increasing their discharge-cycle lifetime and thereby enabling more widespread and convenient use of such devices. The backlight dimming technology can also be applied in AC-powered systems, where the key concern is the energy cost to the individual user as well as society at large. This technology has the potential to reduce the typical energy bill of a desktop computer by 30% or so (when the system is being used). This research, if successful, will expedite the introduction of advanced display technologies (such as LED-based backlighting for LCDs, or organic LED-based displays) since it will reduce their power cost without sacrificing quality.

    LCD (Liquid Crystal Display) TVs are becoming the mainstream of the FPD (Flat Panel Display) market. In spite of their superb performance (e.g., vivid image representation and high native resolution) compared to other types of TVs such as PDP (Plasma Display Panel), LCDs suffer from a number of well-known shortcomings such as the motion blur artifact, low contrast ratio, and low brightness. Furthermore, backlighting for modern LCD panels is typically done with the aid of a 2-D array of individually luminance-controlled white LED's, each of which serves as the backlight for a fixed-size region on the LCD panel. We are currently investigating dimming and scanning of the 2-D LED array with the aid of appropriately time-shifted and duty-cycle-adjusted Pulse Width Modulation (PWM) signals. The goal is to minimize the total power dissipation of the LED array drivers while improving the static contrast ratio and eliminating the motion blur artifact in LCD TVs. More precisely, we are developing a 2-D PWM-driven backlight dimming technique which simultaneously dims certain regions of the LCD screen and sets the pixel values by applying an optimal pixel value transformation function. In addition, we are investigating a 2-D backlight scanning technique which determines a new duty cycle for the PWM signal of each white LED driver so as to preserve the original backlight intensity for the LED while ensuring that the LED can be completely turned off for a period of time during each frame. This off time, which is about 8ms in the target display system, greatly reduces the motion blur. At the same time, if the pixel value updates due to the refresh operation take place during this off time, the viewer will only see the changed pixel values corresponding to the new frame and will not be subjected to effects arising from pixel value transitions while the pixels are being exposed to the backlight.
Both of the proposed ideas are being implemented in a Xilinx FPGA (Spartan 3E) and tested on a Samsung 40-inch LCD TV.
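
    The duty-cycle adjustment behind backlight scanning can be illustrated as follows (a hedged sketch; the roughly 16 ms frame and 8 ms off time mirror the numbers in the text, the rest is assumed): forcing the LED off for part of each frame shrinks the window in which the PWM signal may be high, so the duty cycle within that window must grow proportionally to preserve the original average intensity.

```python
def scanned_pwm_duty(duty, frame_ms=16.7, off_ms=8.0):
    """Backlight-scanning sketch (illustrative, not the FPGA design).
    The LED is forced off for off_ms of each frame_ms frame; packing the
    same on-time into the remaining window preserves the average
    intensity. Returns the new duty within that window, capped at 1.0
    (fully on, i.e. the original intensity can no longer be preserved)."""
    window = frame_ms - off_ms
    return min(1.0, duty * frame_ms / window)
```

With a 16 ms frame and an 8 ms off time, a 30% duty cycle must double to 60% inside the remaining half-frame window.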

    B2Sim: A Fast Micro-Architecture Simulator Based on Basic Block Characterization -- State-of-the-art architectural simulators support cycle-accurate pipeline execution of application programs. However, it takes days or even weeks to complete the simulation of even a moderate-size program. During the execution of a program, program behavior does not change randomly but changes over time in a predictable/periodic manner. This behavior provides the opportunity to limit the use of a pipeline simulator. More precisely, in a CODES-06 paper, we presented a hybrid simulation engine, named B2Sim for (cycle-characterized) Basic Block based Simulator, in which a fast cache simulator (e.g., sim-cache) and a slow pipeline simulator (e.g., sim-outorder) are employed together. B2Sim reduces the runtime of architectural simulation engines by making use of the instruction behavior within executed basic blocks. We integrated B2Sim into SimpleScalar and achieved an average speedup of 3.3x on the SPEC2000 benchmark and MediaBench programs compared to a conventional pipeline simulator, while maintaining the accuracy of the simulation results with less than 1% CPI error on average.
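
    The caching idea behind B2Sim can be sketched in a few lines (a simplification: the real engine also re-characterizes a block when its cache behavior changes, which this toy version omits). The first occurrence of a basic block pays for the slow cycle-accurate model; repeats reuse the cached characterization.

```python
# Hybrid-simulation sketch in the spirit of B2Sim (illustrative interface):
# slow_cycles_of_block stands in for the cycle-accurate pipeline simulator.

def simulate(trace, slow_cycles_of_block):
    cycle_cache = {}            # basic-block id -> characterized cycle count
    total, slow_calls = 0, 0
    for bb in trace:
        if bb not in cycle_cache:
            cycle_cache[bb] = slow_cycles_of_block(bb)   # slow pipeline model
            slow_calls += 1
        total += cycle_cache[bb]                         # fast cached path
    return total, slow_calls
```

On a trace such as A, B, A, A, B the slow model runs only twice while the cycle total still covers all five block executions, which is where the speedup comes from.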

    Backlight Dimming in Power-Aware Mobile Displays -- In a DAC-06 paper, we introduced a temporally-aware backlight scaling technique for video streams. The goal is to maximize energy saving in the display system by means of dynamic backlight dimming subject to a video distortion tolerance. The video distortion comprises (1) an intra-frame (spatial) distortion component due to frame-sensitive backlight scaling and transmittance function tuning and (2) an inter-frame (temporal) distortion component due to large-step backlight dimming across frames, modulated by the psychophysical characteristics of the human visual system. The proposed backlight scaling technique is capable of efficiently computing the flickering effect online and subsequently using a measure of the temporal distortion to appropriately adjust the slack on the intra-frame spatial distortion, thereby achieving a good balance between the two sources of distortion while maximizing the backlight dimming-driven energy saving in the display system and meeting an overall video quality figure of merit.
    The proposed dynamic backlight scaling approach is amenable to highly efficient hardware realization and has been implemented on the Apollo Testbed II. Actual current measurements demonstrate the effectiveness of the proposed technique compared to previous backlight dimming techniques, which have ignored the temporal distortion effect.

    DTM: Dynamic Tone Mapping for Backlight Scaling -- In a DAC-05 paper, we presented an approach for pixel transformation of the displayed image to increase the potential energy saving of the backlight scaling method. The proposed approach takes advantage of human visual system (HVS) characteristics and tries to minimize distortion between the perceived brightness values of the individual pixels in the original image and those of the backlight-scaled image. This is in contrast to previous backlight scaling approaches which simply match the luminance values of the individual pixels in the original and backlight-scaled images. Moreover, the proposed dynamic backlight scaling approach, which is based on tone mapping, is amenable to highly efficient hardware realization because it does not need information about the histogram of the displayed image. Experimental results show that the dynamic tone mapping for backlight scaling method results in about 35% power saving with an effective distortion rate of 5% and 55% power saving for a 20% distortion rate.

    HEBS: Histogram Equalization for Backlight Scaling -- In a DATE-05 paper, we presented a method for finding a pixel transformation function that minimizes the backlight intensity while maintaining a pre-specified image distortion level for a liquid crystal display. This is achieved by first finding a pixel transformation function which maps the original image histogram to a new histogram with a lower dynamic range. Next, the contrast of the transformed image is enhanced so as to compensate for the brightness loss that arises from backlight dimming. The proposed approach relies on an accurate definition of the image distortion, which accounts for both the pixel value differences and a model of the human visual system, and is amenable to highly efficient hardware realization. Experimental results show that histogram equalization for backlight scaling results in about 45% power saving with an effective distortion rate of 5%, and 65% power saving for a 20% distortion rate. These power savings are higher than those of previously reported dynamic backlight scaling approaches.
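
    A stripped-down version of the idea might look like this (illustrative only: distortion here is just the fraction of saturated pixels, whereas HEBS uses an HVS-aware distortion measure over the transformed histogram): scan backlight levels downward and keep the lowest level at which compensating the pixel values still clips no more than the tolerated fraction of pixels.

```python
def min_backlight(hist, tol):
    """HEBS-flavored sketch. hist[v] = count of pixels with value v
    (0..255, non-empty image assumed). Returns the smallest backlight
    factor beta in (0, 1] such that compensating each pixel by v/beta
    saturates at most a fraction `tol` of the pixels."""
    total = sum(hist)
    best = 1.0
    for step in range(100, 0, -1):          # beta = 1.00, 0.99, ..., 0.01
        beta = step / 100
        clipped = sum(c for v, c in enumerate(hist) if v / beta > 255)
        if clipped / total <= tol:
            best = beta                      # still within the distortion budget
        else:
            break                            # dimming further only clips more
    return best
```

For an image whose pixels all sit at value 128, a zero-clipping budget allows the backlight to drop to roughly half intensity, mirroring the intuition that dark images permit aggressive dimming.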


    Design Techniques and Tools to Enable and Enhance Coarse-Grain Power Gating in ASIC Designs

    Sponsor: National Science Foundation - Computing Processes and Artifacts

    Project Summary: The semiconductor industry's $261 B in 2006 revenue does not accurately reflect its crucial role in enabling a $47 T ($61 T on a PPP basis) world economy to thrive and grow. This industry underpins the systems and technologies on which the people and governments of the world rely for future prosperity. The industry is currently facing some extraordinary challenges, including variability of nano devices as well as excessive power dissipation in circuits and systems. In order for the industry to continue to expand and prosper, it is critical to address these challenges head on. The proposed research takes on one of these two fundamental challenges, i.e., the "power crisis". More precisely, this project focuses on coarse-grain power gating in ASIC designs, which switches entire blocks/rows of standard cells. This choice is due to the lower cost and greater leakage savings of coarse-grain power gating compared to its fine-grain counterpart, which inserts the header or footer in each standard cell in the ASIC design library. The project results are expected to include the following: (i) Distributed sleep transistor placement and sizing; (ii) Sleep signal scheduling to minimize the peak current demand on wakeup; (iii) Mode transition energy minimization to enable more frequent mode transitions; (iv) Local sleep signal generation for autonomous power gating; and (v) Power gating to enable multiple power modes. This project aims to address each of these tasks by developing algorithmic or mathematical programming solutions for each step and by developing a design flow and prototype software tools that enable widespread adoption of this very interesting and important technology in ASIC design.

    Coarse-Grain MTCMOS Sleep Transistor Sizing Using Delay Budgeting -- Current state-of-the-art sleep transistor sizing algorithms minimize the total sleep transistor width subject to a maximum IR voltage drop on the virtual node of each MTCMOS switch cell. In these approaches, the DC noise constraint for the virtual node of a switch cell is only loosely related to the tolerable delay increase in the circuit. Using a single maximum IR voltage drop value on all virtual nodes over-constrains the problem. Instead, we would like to set the DC noise constraint for the virtual node of each MTCMOS switch based on the minimum tolerable delay increase (i.e., the positive timing slack) of any logic cell in the corresponding module. The voltage drop allocation on the virtual nodes of the MTCMOS switches should thus be closely related to the timing slack allocation to individual cells in the circuit. In a DATE-08 paper, we introduced a new approach for minimizing the total sleep transistor width of a coarse-grain MTCMOS circuit assuming a given standard cell and sleep transistor placement. Our algorithm takes a maximum allowed circuit slowdown factor and produces the sizes of the various sleep transistors in the standard cell layout while considering the DC parasitics of the virtual ground net. We showed that the problem can be formulated as a sizing-with-delay-budgeting problem and solved efficiently using a heuristic sizing algorithm which implicitly performs maximum current calculation through the sleep transistors while accounting for different current flow paths in the virtual ground net through adjacent sleep transistors. This technique uses at least 40% less total sleep transistor width than other approaches.
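
    The core sizing relation can be illustrated with a toy model (an assumed linear-region conductance constant, and none of the virtual-ground parasitic coupling that the DATE-08 algorithm accounts for): a larger tolerable virtual-ground drop, i.e. more timing slack in the module, permits a proportionally narrower sleep transistor.

```python
def size_sleep_transistors(modules, k=5e-4):
    """Delay-budgeting sketch (hypothetical model). Each module maps to
    (i_max, dv): peak sleep-mode-exit current in A and the tolerable
    virtual-ground drop in V derived from the module's timing slack.
    A sleep transistor in its linear region acts as a resistor, so the
    required conductance is i_max / dv and the width (in um) follows as
    W = i_max / (k * dv), with k the assumed per-um conductance (S/um)."""
    return {name: i_max / (k * dv) for name, (i_max, dv) in modules.items()}
```

For instance, a module drawing at most 1 mA whose slack tolerates a 100 mV drop needs a 20 um device under these assumed constants; halving the tolerable drop would double the width, which is why per-module slack allocation saves total width over a single worst-case drop limit.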

    Sizing and Placement of Charge Recycling Transistors in MTCMOS Circuits -- In an ICCAD-07 paper, we showed that the sizing and placement problems of charge-recycling transistors in charge-recycling multi-threshold CMOS (CR-MTCMOS) can be formulated as a linear programming problem, and hence, can be efficiently solved using standard mathematical programming packages. The proposed sizing and placement techniques allow us to employ the CR-MTCMOS solution in large row-based standard cell layouts while achieving nearly the full potential of this power-gating architecture, i.e., we achieve 44% saving in switching energy due to the mode transition in CR-MTCMOS compared to standard MTCMOS.

    Charge Recycling in MTCMOS Circuits: Concept and Analysis -- Design of a suitable power gating (e.g., multi-threshold CMOS or super cutoff CMOS) structure is an important and challenging task in sub-90nm VLSI circuits where leakage currents are significant. In designs where the mode transitions are frequent, a significant amount of energy is consumed to turn on or off the power gating structure. It is thus desirable to develop a power gating solution that minimizes the energy consumed during mode transitions. In a DAC-06 paper and an IEEE SSCS DLP talk in October 2006, we described such a solution by recycling charge between the virtual power and ground rails immediately after entering the sleep mode and just before wakeup. The proposed method can save up to 43% of the dynamic energy wasted during mode transition while maintaining the wake up time of the original circuit. It also reduces the peak negative voltage value and the settling time of the ground bounce.


    Statistical Static Timing Analysis and Circuit Optimization: A Current Source Model-Based Approach

    Sponsor: Semiconductor Research Corp.

    Project Summary: The down-scaling of layout geometries to 45nm and below has resulted in a significant increase in the packing density and the operational frequency of VLSI circuits. Conventional static timing analysis (STA) techniques model signal transitions as saturated ramps with known arrival and transition times and propagate these timing parameters from the circuit primary inputs to the primary outputs. However, different waveforms with identical arrival and transition (slew) times applied to the input of a logic cell or an interconnect line can result in very different propagation delays through the component, depending on the exact shape of the applied signal waveform. In addition, as we move toward 45nm and smaller minimum feature sizes for the devices, process variations are becoming an ever-increasing concern for the design of high-performance integrated circuits. Process variations can cause excessive uncertainty in timing calculation, which in turn calls for sophisticated analysis techniques to reduce this uncertainty.

    Recent Results of the Current Source Model-Based Approach for Timing Analysis -- Our work focuses on the development of an accurate current source model of a CMOS logic cell with extensions to handle multiple input switching and statistical parameter variability. The work also includes development of efficient methods to generate the CSMs of logic cells, which are typically present in a standard cell library. The work addresses integration of CSMs of logic cells with a waveform propagation engine in order to produce a highly efficient and robust CSM-based static timing analyzer.
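
    The CSM evaluation loop can be sketched as follows (illustrative: a real CSM tabulates the output current over (v_in, v_out) pairs from SPICE characterization, while this toy uses an analytic inverter-like stand-in with assumed constants). The cell output is modeled as a nonlinear current source driving the load capacitance, and the output waveform follows from numerically integrating C * dv_out/dt = i(v_in, v_out), which is what lets a CSM handle arbitrary input waveform shapes rather than just saturated ramps.

```python
VDD = 1.0

def i_csm(v_in, v_out, g=1e-3):
    """Stand-in for the tabulated CSM current surface (hypothetical):
    an inverter drives its output toward the rail opposite the input."""
    target = VDD if v_in < VDD / 2 else 0.0
    return g * (target - v_out)

def propagate(v_in_wave, c_load=1e-15, dt=1e-13):
    """Forward-Euler integration of C * dv/dt = i(v_in, v_out), one
    input-waveform sample per time step of length dt."""
    v_out, wave = 0.0, []
    for v_in in v_in_wave:
        v_out += dt * i_csm(v_in, v_out) / c_load
        wave.append(v_out)
    return wave
```

Feeding a constant-low input drives the modeled output exponentially toward VDD with time constant C/g; a constant-high input holds it at ground, matching inverter behavior.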


    Optimal Design of Power Delivery Network for System on Chip

    Partial support from the National Science Foundation

    Project Summary: Utilizing multiple voltage domains (also known as voltage islands) is one of the most effective techniques to minimize the overall power dissipation - both dynamic and leakage - while meeting a performance constraint. In a system designed with multiple voltage domains, the power delivery network (PDN) is responsible for delivering power at appropriate voltage levels to the different functional blocks (FB's) on the chip. Voltage regulator modules (VRM's), which are in charge of voltage conversion and regulation, are indispensable components of this network. The selection of appropriate VRM's plays a critical role in the power efficiency of the PDN.

    Design of an Efficient Power Delivery Network in an SoC to Enable Dynamic Power Management -- In an ISLPED-07 paper, we introduced a new technique to design the power delivery network of an SoC so as to support dynamic voltage scaling. In this technique the power delivery network is composed of two layers. In the first layer, DC-DC converters with fixed output voltages are used to generate all voltage levels that are needed by the different loads in the SoC design. In the second layer of the power delivery network, a power switch network is used to dynamically connect the power supply terminals of each load to the appropriate DC-DC converter output in the first layer. Experimental results demonstrate the efficacy of this technique.

    Optimal Selection of Voltage Regulator Modules in a Power Delivery Network -- Typically a star configuration of the VRM's, where only one VRM resides between the power supply and each FB, is used to deliver currents with appropriate voltage levels to the different loads in the circuit. In a DAC-07 paper, we showed that using a tree topology of suitably chosen VRM's between the power source and the FB's yields higher power efficiency in the PDN. We formulated the problem of selecting the best set of VRM's in a tree topology as a dynamic program and solved it efficiently.
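
    A small numeric sketch (hypothetical loads and efficiency figures; the actual paper solves the selection by dynamic programming) shows why a tree can beat the star: routing the loads through a shared step-down stage followed by point-of-load VRM's can incur less total conversion loss than giving each load its own wide-ratio converter from the source rail, because the efficiency product along the tree path exceeds the single wide-ratio efficiency.

```python
# Input power = load power / product of converter efficiencies on its path.
# All voltages, wattages, and efficiencies below are made-up illustrations.

loads = {"cpu": (1.2, 5.0), "mem": (1.8, 3.0)}   # block: (volts, watts)

def star_power(eff_from_12v=0.80):
    """Star: each block has its own 12 V -> load converter (wide ratio,
    hence the assumed low 80% efficiency)."""
    return sum(p / eff_from_12v for _, p in loads.values())

def tree_power(eff_12_to_3v3=0.92, eff_pol=0.90):
    """Tree: a shared 12 V -> 3.3 V stage feeds cheap point-of-load
    converters; the path efficiency is the product 0.92 * 0.90 = 0.828."""
    return sum(p / (eff_12_to_3v3 * eff_pol) for _, p in loads.values())
```

Under these assumed numbers the star draws 10 W from the source while the tree draws about 9.66 W for the same 8 W of load, illustrating the efficiency gain the VRM-tree formulation searches for.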


    Power Efficient SRAM Cell and Array Design

    Partial support from the National Science Foundation

    Project Summary: In many modern microprocessors, caches occupy a large portion of the die. For example, in Intel's Itanium 2 Montecito processor, more than 80% of the die is dedicated to caches. Since the leakage power dissipation is roughly proportional to the area of a circuit, the leakage power of caches is one of the major sources of power consumption in high performance microprocessors. Our research on SRAM design focuses on leakage reduction in such memory structures and on judicious use of multiple Vth and multiple tox transistors in a large SRAM array and power-ground-gated, data-retentive SRAM cells.

    Low-Leakage SRAM Design in Deep Submicron Technologies -- This January-2008 presentation has two parts. In the first part, a method based on dual-Vt and dual-Tox assignment is presented to reduce the total leakage power dissipation of SRAMs while maintaining their performance. The proposed method is based on the observation that the read and write delays of a memory cell in an SRAM block depend on the physical distance of the cell from the sense amplifier and the decoder. Thus, the idea is to deploy different configurations of six-transistor SRAM cells corresponding to different threshold voltage and oxide thickness assignments for the transistors. Unlike other techniques for low-leakage SRAM design, the proposed technique incurs neither area nor delay overhead. In addition, it results in only a minor change in the SRAM design flow. The leakage saving achieved by using this technique is a function of the values of the high threshold voltage and the oxide thickness, as well as the number of rows and columns in the cell array. Simulation results with a 65nm process demonstrate that this technique can reduce the total leakage power dissipation of a 64×512 SRAM array by 33% and that of a 32×512 SRAM array by 40%. In the second part, a gated-supply, gated-ground data retention technique for CMOS SRAM cells to enable the design of robust and ultra low-power caches in very deep submicron CMOS technologies is presented. We show that, given a fixed value of the voltage difference on the power rails of the SRAM cell during the standby mode, the proposed power-ground-gating (PG-gating) solution achieves significantly higher leakage power savings compared to either power supply (P) gating or ground (G) gating techniques while improving the static noise margin and soft error rate. In particular, it is shown that optimum ground and supply voltage levels exist for which the SRAM cell leakage is minimized subject to a hold static noise margin constraint.
When the PG-gated cell is not accessed for read/write operations, it is biased to the optimum values of ground and supply voltages, resulting in minimum leakage power consumption. Simulation results demonstrate that the PG-gating technique has a higher hold and read static noise margin, lower soft error rate, and also higher leakage saving compared to single P or G gating techniques at the expense of an increase in the area overhead. Moreover, the PG-gated cell exhibits less leakage variability under process and temperature variations compared to single P or G gating techniques. Moreover, its hold static noise margin is more robust to process variations. For a 64Kb SRAM array designed in 130nm CMOS technology with Vdd=1.3V and a 180mV hold static noise margin, the leakage power of PG-gated design is 60% lower than that of a low power G-gated design.




    Minimizing Leakage Power in CMOS Designs

    Support from miscellaneous sources

    Project Summary: In many new designs, the leakage component of power consumption is comparable to the dynamic component. Many reports indicate that, at sub-65 nm CMOS technology nodes, 40% or more of the total power consumption is due to the leakage of transistors, and this percentage will increase with technology scaling unless effective techniques are used to bring leakage under control. This research focuses on minimizing leakage in CMOS VLSI circuits.

    Minimizing Leakage Power in CMOS: Technology and Design Issues -- This tutorial, given at EPFL in July 2008, focuses on circuit techniques and design methods to accomplish this goal. The first part of the presentation provides an overview of the basic physics, technology, and scaling trends that have resulted in the significant increase in sub-threshold and gate leakage currents. This part also provides an in-depth description of multiple-Vdd, multiple-Vt, and multiple-Tox techniques for leakage minimization in light of process variations and substrate temperature changes. The second part of the presentation describes a number of design optimization techniques for controlling leakage current, including state assignment, technology mapping, and precomputation-based signal guarding. It also presents runtime mechanisms for leakage control, including body bias control, transition to a minimum leakage state, and power gating.

    Circuit and Design Automation Techniques for Leakage Minimization of CMOS VLSI Circuits -- This tutorial, given at Samsung Research in October 2006, focuses on circuit techniques and design methods to accomplish leakage minimization in CMOS VLSI circuits. The first part of the presentation provides an overview of the basic physics, technology, and scaling trends that have resulted in the significant increase in sub-threshold and gate leakage currents. This part provides an in-depth description of multiple-Vdd, multiple-Vt, and multiple-Tox techniques for leakage minimization in light of process variations and substrate temperature changes, and also addresses the use of high-permittivity gate dielectrics, metal gates, novel device structures, and circuit-based techniques for controlling the gate tunneling current. The second part of the presentation describes a number of design optimization techniques for controlling leakage current, including state assignment, technology mapping, and precomputation-based signal guarding. It also presents runtime mechanisms for leakage control, including body bias control, transition to a minimum leakage state, and power gating.


    Battery Aware Hierarchical Wireless Sensor Network for Distributed Data Collection

    Project Summary: Wireless sensor networks (WSN) have gained considerable attention in applications where spatially distributed events are to be monitored. Recent technological advances have led to the emergence of small battery-powered sensors with considerable processing and communication capabilities. We consider a distributed, hierarchical wireless sensor network of energy-constrained nodes. Each node in this network has limited computation and storage resources, wireless communication capability, and a limited energy source in the form of a battery. This network of autonomous nodes performs collaborative problem solving, such as providing situational and tactical awareness to first responders in an emergency situation, carrying out automatic intrusion detection/deterrence, or object recognition and tracking. The problem of interest is maximizing the network lifetime while providing a minimum quality of service subject to some performance constraints (e.g., the response time). Energy is considered a key network resource that must be allocated and dispensed properly to maximize the network lifetime. We analyze network and wireless link properties and develop protocols that compensate/account for the effects of extreme variations in wireless link dependability, the many-to-one nature of communication in a mixed multi-tier WSN, local high-contention nodes in the network, and the relatively high cost of maintenance. This research addresses battery awareness of a monitoring sensor network as an intrinsic aspect of the distributed data collection task. This project will produce battery-aware algorithms and techniques for wireless sensor network design and deployment as the key enabler for cost-effective realization of many applications.
The broader impact of this project will be to assist in the critical ongoing efforts to deploy networks of energy-constrained sensors and distribution/collection nodes for environmental, medical and security applications.

    Lifetime-Aware Hierarchical Wireless Sensor Network Architecture with Mobile Overlays -- With power efficiency and lifetime awareness becoming critical design concerns, we focus on energy-aware design of the different layers of the WSN protocol stack. In a RAW-07 conference paper, we presented and analyzed a hierarchical wireless sensor network with mobile overlays, along with a mobility-aware multi-hop routing scheme, in order to optimize the network lifetime, delay, and local storage size. Furthermore, we showed how certain physical layer attributes may affect the overall network lifetime. More specifically, we investigated how certain adaptive modulation schemes may affect overall energy balancing in the network and hence its lifetime. Finally, we investigated new lifetime models which can be used to obtain more practical design criteria for energy-aware system design.


    Controlling Uncertainty and Handling Variability in System-Level Dynamic Power Management

    Project Summary: Variability represents diversity or heterogeneity in a well-characterized population. Fundamentally a property of Nature, variability is usually not reducible through further measurement or study. For example, different dies have different leakage power dissipations, no matter how carefully we measure them. Uncertainty represents partial ignorance or lack of perfect information about poorly-characterized phenomena or models. Fundamentally a property of the observer, uncertainty is usually reducible through further measurement or study. For example, even though an observer may not know the leakage power dissipation of every die coming out of a manufacturing plant, he or she can surely take more samples to gain additional (albeit still imperfect) information about the leakage power distribution. With the increasing levels of variability in the characteristics of nanoscale CMOS devices and VLSI interconnects and continued uncertainty in the operating conditions of VLSI circuits, achieving power efficiency and high performance in electronic systems under process, voltage, and temperature variations as well as current stress, device aging, and interconnect wear-out phenomena has become a daunting, yet vital, task. This research tackles the problem of system-level dynamic power management (DPM) in systems which are manufactured in nanoscale CMOS technologies and are operated under widely varying conditions over the lifetime of the system. Such systems are greatly affected by increasing levels of process variations typically materializing as random or systematic sources of variability in device and interconnect characteristics, and widely varying workloads and temperature fluctuations usually appearing as sources of uncertainty. At the system level this variability and uncertainty is beginning to undermine the effectiveness of traditional DPM approaches. 
It is thus critically important that we develop the mathematical basis and practical applications of a variability-aware, uncertainty-reducing DPM approach.

    Improving the Efficiency of Power Management Techniques by Using Bayesian Classification -- In an ISQED-08 paper, we presented a supervised-learning-based dynamic power management (DPM) framework for a multicore processor, where a power manager (PM) learns to predict the system performance state from some readily available input features (such as the state of service queue occupancy and the task arrival rate) and then uses this predicted state to look up the optimal power management action from a pre-computed policy lookup table. The motivation for utilizing supervised learning in the form of a Bayesian classifier is to reduce the overhead of the PM, which has to recurrently determine and issue voltage-frequency setting commands to each processor core in the system. Experimental results reveal that the proposed Bayesian classification based DPM technique ensures system-wide energy savings under rapidly and widely varying workloads.
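
    The classify-then-look-up structure can be sketched with a naive Bayes classifier (all priors, likelihoods, feature names, and policy entries below are made up for illustration; the ISQED-08 classifier is trained offline from observed traces): the PM scores each candidate state by prior times feature likelihoods, picks the most probable state, and reads the precomputed V/f setting from the policy table instead of re-solving the policy online.

```python
PRIORS = {"low": 0.6, "high": 0.4}
LIKELIHOOD = {                       # P(feature is true | state), assumed values
    "low":  {"queue_short": 0.8, "arrivals_slow": 0.7},
    "high": {"queue_short": 0.2, "arrivals_slow": 0.3},
}
POLICY = {"low": (0.9, 0.6e9), "high": (1.2, 2.0e9)}   # state -> (V, f), assumed

def classify(features):
    # Naive Bayes: posterior score = prior * product of feature likelihoods.
    def post(state):
        p = PRIORS[state]
        for name, present in features.items():
            like = LIKELIHOOD[state][name]
            p *= like if present else (1 - like)
        return p
    return max(PRIORS, key=post)

def next_setting(features):
    # Cheap table lookup replaces online policy optimization.
    return POLICY[classify(features)]
```

A short queue with a slow arrival rate classifies as the low-demand state and maps to the low V/f pair; the opposite observations select the high setting.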

    Resilient Dynamic Power Management under Uncertainty In a DATE-08 paper, we presented a stochastic framework to improve the accuracy of decision making during dynamic power management, while considering manufacturing process and/or design induced uncertainties. More precisely, the uncertainties are captured by a partially observable semi-Markov decision process and the policy optimization problem is formulated as a mathematical program based on this model. Experimental results with a RISC processor in 65nm technology demonstrate the effectiveness of the technique and show that the proposed uncertainty-aware power management technique ensures system-wide energy savings under statistical circuit parameter variations.


    Design Methodologies and Techniques for Optimizing Power Consumption and Performance in Pipeline Circuits

    Project Summary: Excessive power dissipation and the resulting temperature rise have become key limiting factors to processor performance and a significant component of its cost. In modern microprocessors, expensive packaging and heat removal solutions are required to achieve acceptable substrate and interconnect temperatures. Due to their high utilization, the pipeline circuits of a high-performance microprocessor are major contributors to the overall power consumption of the processor and, consequently, one of the main sources of heat generation on the chip. Our research is expected to propose techniques to minimize power consumption in pipeline circuits at different design levels and, at the same time, produce guidelines and tools for optimizing their power dissipation.

    A Mathematical Solution to Power Optimal Pipeline Design by Utilizing Soft Edge Flip Flops -- In an ISLPED-08 paper, we presented a technique to address the problem of reducing the power consumption in a synchronous linear pipeline, based on the idea of utilizing soft-edge flip-flops (SEFF) for time borrowing and voltage scaling in the pipeline stages. We described a unified methodology for optimally selecting the supply voltage level of a linear pipeline and optimizing the transparency window of the SEFF so as to achieve the minimum power consumption subject to a total computation time constraint. We formulated the problem as a quadratic program that can be solved optimally in polynomial time. Our experimental results demonstrated that this technique is quite effective in reducing the power consumption of a pipeline circuit under a performance constraint. Next, we will improve the pipeline stages by using optimally designed flip-flops. We will also consider the effect of higher-order constraints, such as the interdependency between setup and hold times, and then generalize the problem to non-linear pipelines with multi-stage feed-forward and feedback paths.
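    A greatly simplified feasibility sweep conveys the time-borrowing idea behind SEFF-based voltage scaling. This is not the paper's quadratic program: the delay model, the discrete voltage levels, and a single uniform borrowing window `w_max` are illustrative assumptions:

```python
def min_feasible_vdd(stage_delays_at, vdd_levels, t_clk, w_max):
    """Sweep candidate supply voltages from lowest (least power) up,
    and return the first one at which every pipeline stage meets
    timing once each stage may borrow up to w_max from the next
    stage's transparency window (and must repay what the previous
    stage borrowed). Returns None if no level is feasible."""
    for vdd in sorted(vdd_levels):
        delays = stage_delays_at(vdd)
        borrow, feasible = 0.0, True
        for i, d in enumerate(delays):
            slack = t_clk - (d + borrow)   # after repaying previous borrow
            last = (i == len(delays) - 1)
            if slack >= 0:
                borrow = 0.0
            elif not last and -slack <= w_max:
                borrow = -slack            # borrow from the next stage
            else:
                feasible = False
                break
        if feasible:
            return vdd
    return None
```

    In the actual formulation, the per-stage window widths and the supply voltage are continuous variables of a quadratic program rather than a discrete sweep; the sketch only shows why wider transparency windows admit lower supply voltages.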


    Performance and Reliability Analysis and Optimization in Sub-45nm CMOS Circuits

    Project Summary: With CMOS technology in the nanometer regime, reliability is becoming a major design concern. In the future, designers will need to make power-performance-reliability tradeoffs at all levels of VLSI circuit and system design. In this area, our current research focuses on building accurate, fast, and easy-to-use fault and reliability device models and incorporating these models into CAD tools. Because of reliability concerns, physical scaling of CMOS has already slowed. Many emerging nanotechnologies are an order of magnitude smaller than CMOS, but all of them fall far below CMOS in terms of reliability. Our current research also focuses on discovering new hybrid architectures that promise continued VLSI scaling at the system level in future technologies.

    Probabilistic Error Propagation in a Logic Circuit Using the Boolean Difference Calculus -- A gate-level probabilistic error propagation model is presented which takes as input the Boolean function of the gate, the signal probabilities (i.e., the probability of each input being "1"), the error probabilities at the gate inputs, and the gate error probability, and generates the error probability at the output of the gate. The model uses the Boolean difference calculus and can be efficiently applied to the problem of calculating the error probability at the primary outputs of a multi-level Boolean circuit with a time complexity that is linear in the number of gates in the circuit. This is done by starting from the primary inputs and moving toward the primary outputs using a post-order (reverse DFS) traversal. Experimental results demonstrate the accuracy and efficiency of the proposed approach compared with other known methods for error calculation in VLSI circuits.
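    For a single gate, the model's inputs and output can be made concrete by exact enumeration. The sketch below is a brute-force check of what the Boolean-difference expressions compute for one gate, not the linear-time circuit traversal; the gate function and probabilities in the test are illustrative:

```python
from itertools import product

def gate_output_error(f, sig_probs, err_probs, gate_err):
    """Exact output error probability of a single gate.
    f          -- Boolean function mapping a tuple of input bits to 0/1
    sig_probs  -- P(input i == 1) for the error-free value of each input
    err_probs  -- P(input i is flipped by an upstream error)
    gate_err   -- probability that the gate itself inverts its output
    """
    n = len(sig_probs)
    p_out_err = 0.0
    for bits in product((0, 1), repeat=n):        # error-free input values
        p_bits = 1.0
        for b, p in zip(bits, sig_probs):
            p_bits *= p if b else (1 - p)
        for flips in product((0, 1), repeat=n):   # which inputs are flipped
            p_fl = 1.0
            for fl, e in zip(flips, err_probs):
                p_fl *= e if fl else (1 - e)
            faulty = tuple(b ^ fl for b, fl in zip(bits, flips))
            propagated = f(faulty) != f(bits)     # input error reaches output?
            # the gate's own error XORs with the propagated input error
            p_err = (1 - gate_err) if propagated else gate_err
            p_out_err += p_bits * p_fl * p_err
    return p_out_err
```

    For a NAND gate with only input `a` erroneous, the result reduces to `P(err_a) * P(b = 1)`, i.e., the input error probability weighted by the probability that the Boolean difference with respect to `a` is 1 -- exactly what the Boolean-difference formulation expresses in closed form.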

    Apollo Testbed

    We research three major areas in low power design of VLSI circuits and systems: software and system level power prediction and optimization, architectural/behavioral power estimation and optimization, and system-level dynamic power management.

    We investigate the problem of simultaneous scheduling and mapping of the computational and communication processes in a generalized task flow graph to HW/SW resources on a VLSI chip so as to minimize the energy dissipation while satisfying a given deadline and/or throughput constraint. As part of this research we examine the problem of modeling energy-latency characteristics of a given application program (for example, specified in a standard programming language such as C/C++) which is to be mapped to custom hardware and/or run on an embedded processor. We develop efficient, yet accurate, estimators at this high level of design abstraction without having to do detailed compilation of the application program into the hardware and/or software components. This capability is in turn essential in achieving effective power-aware hardware/software co-design. At the same time, we develop optimization techniques for a power-conscious compiler targeting the StrongARM microprocessor.

    We research a number of problems related to power analysis and optimization at the behavioral/architectural level. In particular, we address early power estimation for combinational and sequential logic blocks. Examples include power estimation of a finite state machine circuit prior to state encoding, or of a combinational logic circuit before logic synthesis and mapping. We also develop power characterization of Intellectual Property (IP) cores at the architectural level and an automatic clock-gating tool for HDL descriptions.

    We consider dynamic power management techniques, which exploit the idleness of system components, and study the problem of determining optimal management policies for a variety of system models. In particular, we focus on operating system (OS) directed control policies and seek to develop realistic models of the hardware and software components and the system environment.

    The key research results include development of prototype software programs that perform power prediction of C/C++ and HDL descriptions of complex applications and systems, provide system-level component modeling and characterization for power, and optimize the application software (C/C++ or HDL) and the system software (OS) to achieve low power dissipation.


    Analysis and Design Techniques for Battery-Powered Digital CMOS Circuits

    In the past, the major concerns of the VLSI designer were area, speed, and cost; power consideration was typically of secondary importance. In recent years, however, this has begun to change and, increasingly, power is being given comparable weight to other design considerations. Several factors have contributed to this trend, including the remarkable success and growth of the class of battery-powered, personal computing devices and wireless communications systems that demand high-speed computation and complex functionality with low power consumption. In these applications, extending the battery service life is a critical design concern. There also exists a significant pressure for producers of high-end products to reduce their power consumption. The main driving factors for lower power dissipation in these products are the cost associated with packaging and cooling as well as the circuit reliability.

    Our research focuses on the problem of maximizing the battery service life in battery-powered CMOS circuits. In particular, we recently proposed an integrated model of the VLSI hardware and the battery sub-system that powers it. We showed that, under this model and for a fixed operating voltage, the battery efficiency (or utilization factor) decreases as the average discharge current from the battery increases. The implication is that the battery life is a super-linear function of the average discharge current. Furthermore, even if the average discharge current remains the same, different discharge current profiles (distributions) may result in very different battery lifetimes. The maximum battery life is achieved when the variance of the discharge current distribution is minimized. Finally, we demonstrated that accounting for the dependence of battery capacity on the average discharge current changes the shape of the energy-delay trade-off curve and hence the value of the operating voltage that results in the optimum energy-delay product for the target circuit. Consequently, we proposed a more accurate metric (i.e., the battery discharge rate times delay product as opposed to the energy-delay product) for comparing various low power optimization methodologies and techniques targeted toward battery-powered electronics. Analytical derivations as well as simulation results demonstrate the importance of correct modeling of the battery-hardware system as a whole.
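    The super-linear dependence of battery life on discharge current can be illustrated with a Peukert-style capacity model. This is an illustrative textbook model with made-up parameter values, not the integrated battery-hardware model of the research:

```python
def battery_lifetime(currents, dt=1.0, capacity=1000.0, k=1.3):
    """Simulated service life under a discharge-current profile.
    With Peukert exponent k > 1, each step drains effective charge
    super-linearly in the instantaneous current, so a high-variance
    profile exhausts the battery sooner than a constant profile with
    the same mean current. All parameter values are illustrative."""
    charge, t = capacity, 0.0
    for i in currents:
        charge -= (i ** k) * dt     # super-linear effective drain
        if charge <= 0:
            break
        t += dt
    return t
```

    Comparing a constant 10-unit draw against an alternating 5/15-unit draw (same mean, higher variance) shows the variance effect the project identifies: the constant profile lasts longer, which is why minimizing the variance of the discharge current distribution maximizes battery life.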

    Our research has far-reaching implications for the design of battery-powered electronics by shifting the focus from power and energy minimization to battery service life maximization. It also brings up a number of new and exciting research problems, including, but not limited to, static and dynamic voltage scaling rules to maximize the battery service life subject to performance constraints, optimal choice of battery cells for a given VLSI circuit, circuit and architectural design of the VLSI system hardware to match the output characteristics of the battery cells that power it, use of multiple battery cells and dynamic power management schemes to maximize the service life of the battery subsystem, and even integrated on-chip battery-hardware design (micro-batteries for micro-electronics).

    Portable electronic devices tend to be much more complex than a single VLSI chip; they contain many components, ranging from digital and analog to electro-mechanical and electro-chemical. Hence, reducing power consumption only in the digital VLSI circuits is insufficient. System designers have started to respond to the requirements of power-constrained system design with a combination of technological advances and architectural improvements. Dynamic power management, which refers to the selective shut-off or slow-down of system components that are idle or underutilized, has proven to be a particularly effective technique. Incorporating an effective dynamic power management scheme in the design of an already-complex system is a difficult process that may require many design iterations and careful debugging and validation. The goal of a dynamic power management policy is to reduce the power consumption of an electronic system by putting system components into different states, each representing a certain performance and power consumption level. The policy determines the type and timing of these transitions based on the system history, workload, and performance constraints.

    Our research focuses on the development of an abstract stochastic model of a power-managed electronic system and formulating the problem of system-level power management as a stochastic optimization problem based on the theories of continuous-time Markov decision processes and stochastic networks. This problem will be solved exactly and efficiently using a "policy iteration" approach. Extensions to more complex systems, non-stationary system behavior and non-Markovian decision making will be considered.
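    A minimal sketch of the policy-iteration step follows, on a discrete-time MDP rather than the continuous-time Markov decision process of the research; the two-state power-managed system (idle/busy states, sleep/run actions) and its rewards are hypothetical:

```python
import numpy as np

def policy_iteration(P, R, gamma=0.95):
    """Policy iteration on a discrete MDP (a simplified discrete-time
    analogue of the CTMDP formulation). P[a] is the S x S transition
    matrix under action a; R[a] is the per-state reward (e.g., the
    negative of power cost) under action a."""
    n_actions, n_states = len(P), P[0].shape[0]
    policy = np.zeros(n_states, dtype=int)
    while True:
        # policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly
        P_pi = np.array([P[policy[s]][s] for s in range(n_states)])
        R_pi = np.array([R[policy[s]][s] for s in range(n_states)])
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        # policy improvement: act greedily with respect to V
        Q = np.array([R[a] + gamma * P[a] @ V for a in range(n_actions)])
        new_policy = Q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy
```

    Each iteration evaluates the current policy exactly (a linear solve) and then improves it greedily; the loop terminates at an optimal policy in finitely many steps, which is the appeal of the policy-iteration approach over value iteration for small state spaces.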


    Design Methodologies and Techniques for Temperature-dependent Reliability, Performance and Signal Integrity Analysis and Optimization of VLSI Interconnects

    Due to the ever-increasing failure rates in DSM interconnects, interconnect reliability has become a critical design concern in today's VLSI circuits. However, interconnect reliability and performance (i.e., speed) are tightly coupled and any approach to improve one metric has to consider its effect on the other. Temperature plays a very important role in determining both circuit reliability and performance. The proposed research focuses on detailed yet efficient characterization and quantification of electromigration (EM) and thermomigration (TM) induced failures in VLSI interconnect as well as design automation techniques to combat and control these failures. These techniques will work in a two-dimensional tradeoff space of performance and reliability (PR-space). The proposed research is expected to advance our understanding of EM and especially TM-induced failures in integrated circuits (IC's) and, at the same time, produce guidelines, algorithms, and tools for achieving a non-dominated operating point in the PR-space.

    Our work also focuses on the analysis and modeling of non-uniform chip temperature profile and the study of its effects on different aspects of signal integrity in very high performance VLSI interconnects. First, we will develop computationally efficient methods to calculate the thermal profile of VLSI interconnect lines. A temperature-dependent distributed RC interconnect delay model will be developed next. The model can be applied to a wide variety of interconnect layouts and temperature distributions to quantify the impact of these thermal non-uniformities on signal integrity issues. Using this model, we will show that global nets (including clock and power/ground distribution networks as well as long busses and set/reset lines) are the nets that are the most vulnerable to the thermal non-uniformities in the substrate. We will therefore develop computer-aided design techniques for constructing a thermally-driven zero skew clock routing tree, a power/ground distribution network, optimal buffer insertion in long interconnect lines, and, more generally, chip-level dynamic thermal management policies.
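    To see how a non-uniform temperature profile enters a delay estimate, the sketch below evaluates an Elmore delay for a line split into equal segments whose per-segment resistance follows a linear temperature-coefficient model. The TCR value and uniform segmentation are illustrative assumptions, not the project's distributed RC model:

```python
def elmore_delay_nonuniform_temp(r0, c, temps, beta=0.0039, t_ref=25.0):
    """Elmore delay of an RC line split into len(temps) segments.
    r0, c  -- per-segment resistance and capacitance at t_ref
    temps  -- local temperature of each segment (driver to load)
    beta   -- linear temperature coefficient of resistance (assumed)
    Each segment's resistance charges all downstream capacitance, so
    hot spots near the driver hurt delay more than hot spots near the
    load."""
    n = len(temps)
    delay = 0.0
    for i, T in enumerate(temps):
        r_i = r0 * (1 + beta * (T - t_ref))
        delay += r_i * c * (n - i)   # downstream capacitance count
    return delay
```

    A uniform profile at the reference temperature recovers the standard Elmore sum r0*c*n(n+1)/2, while any heating of the line increases the delay, which is the coupling between the thermal profile and signal integrity that the project quantifies.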


    Power-Aware Memory Bus Encoding

    This research develops encoding techniques to minimize the switching activity on a time-multiplexed Dynamic RAM (DRAM) address bus. The DRAM switching activity can be classified either as external (between two consecutive addresses) or internal (between the row and column addresses of the same address). For external switching activity in a sequential access pattern, we will develop an optimal encoding, the PYRAMID code. Extensions of the basic code address different types of DRAM devices and bus architectures, and explore static vs. dynamic coding schemes. To minimize internal switching activity, we propose scattered paging and redundant coding techniques for both random and sequential access patterns. The proposed codes are expected to reduce power dissipation on the memory bus by a factor of two or more.

    We also develop encoding techniques for minimizing the switched capacitance on a non-multiplexed address bus between the processor and static memory. More precisely, we have developed the ALBOZ code, which is constructed based on transition signaling and limited-weight codes; with enhancements to make it adaptive and irredundant, it results in up to an 87% reduction in bus switching activity at the expense of a small area overhead for realizing the encoder/decoder circuitry. Furthermore, building on the T0 and Offset-Xor encoding techniques, we have developed three irredundant bus-encoding techniques that decrease switching activity on the memory address bus by up to 83% without the need for redundant bus lines. The power dissipation of the encoder and decoder circuitry has also been calculated and shown to be small in comparison with the power savings on the memory address bus itself.
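    For context, the classic bus-invert code (Stan and Burleson) is the textbook example of a redundant scheme with a single INVERT line; the sketch below shows it only as a point of reference for how encoding trades one extra line for fewer toggles -- it is not the PYRAMID or ALBOZ code:

```python
def switching_activity(stream):
    """Total number of bus-line transitions across a word stream."""
    transitions, prev = 0, 0
    for w in stream:
        transitions += bin(prev ^ w).count("1")
        prev = w
    return transitions

def bus_invert_encode(stream, width=8):
    """Bus-invert coding: if sending a word as-is would toggle more
    than half the lines (counting the INVERT line, kept here as bit
    `width`), send the bitwise complement with INVERT raised instead.
    The decoder complements the payload whenever INVERT is set."""
    mask = (1 << width) - 1
    out, prev = [], 0
    for word in stream:
        plain = word & mask
        inverted = (~word & mask) | (1 << width)
        if bin(plain ^ prev).count("1") <= bin(inverted ^ prev).count("1"):
            out.append(plain)
        else:
            out.append(inverted)
        prev = out[-1]
    return out
```

    On a worst-case alternating pattern such as 0x00/0xFF, the raw 8-bit bus toggles all 8 lines per transfer, while the encoded bus toggles only the INVERT line; in general the redundant line bounds the per-transfer toggles to at most half the lines, rounded up.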


    Apollo: Adaptive Power Optimization and Control for the Land Warrior

    Project URL: Apollo Testbed

    The Apollo project aims at significantly reducing power dissipation of next-generation mobile DoD computing and communication systems by means of operating system-directed power management, power-aware software compilation, and system-level synthesis and optimization of the integrated hardware/software platform subject to performance and quality-of-service constraints.

    We consider dynamic power management techniques and study the problem of determining optimal management policies for a variety of system models. In particular, we focus on operating system (OS) directed control policies and seek to develop realistic models of the hardware and software components and the system environment in the Land Warrior System (LWS). We characterize power consumption of common arithmetic logic and memory blocks and develop instruction-level power macro-models for the StrongARM microprocessor and TI's TMS320C5410 digital signal processor in addition to the major subsystems in the (next-generation) LWS.

    We investigate the problem of developing techniques for power-conscious architectural organization and optimization techniques targeting a StrongARM-based hardware platform that we are constructing based on Intel's Assabet and Neponset boards plus a number of external devices. This platform is called the Apollo Testbed (AT). We also develop system and application software for the AT. This task will include development of the ARMLinux drivers for all external devices, the "map" application, and the utility software needed for the AT usage scenario that is provided to us by the IPM team of the Army CECOM.

    We develop encoding techniques to minimize the switching activity on a time-multiplexed Dynamic RAM (DRAM) address bus. We develop redundant (i.e., with INVERT bit) memory bus encoding techniques that reduce the switching activity on the bus between the FLASH memory and the processor. The proposed codes are expected to reduce power dissipation on the memory bus by a factor of two or more. We develop algorithms and techniques for power optimization of the FLASH and main memory hierarchy in the AT. More precisely, we explore use of different data representations for the images stored in the map database so as to reduce power-consuming accesses to the FLASH memory (which acts as the secondary storage in the AT) at the expense of more intensive computations on the SA 1110. We study and analyze the impact of various architectural optimization techniques on the power saving of the AT. Such techniques include power optimization and control for the LCD, the camcorder, and the network (wireless LAN) interface card.

    This work is done in collaboration with Prof. Niraj Jha of Princeton University. Dr. Jha will tackle both periodic and aperiodic task graphs, automatically generate and transform task graphs from the system specification, estimate system power, and synthesize low-power system architectures. The system synthesis tools that will be developed include all supporting databases and simulation engines. The tools will synthesize a given system specification written in C or a Hardware Description Language (HDL) into a low-power system architecture. He will analyze, model, and optimize the power consumed by a real-time operating system (RTOS). He will develop behavioral synthesis tools for low-power application-specific integrated circuits (ASICs). The work will be implemented on top of Princeton University's synthesis system, IMPACT. Additional research topics are common-case computation, leakage power optimization, and run-time adaptation in behavioral synthesis for low power.


    Low-Power Fanout Optimization

    Low-Power Fanout Optimization Using MTCMOS and Multi-Vt Techniques

    Although much research has been done to address the fanout optimization problem in VLSI circuits, there is little work on low-power fanout optimization. More specifically, since both the capacitive and leakage power dissipation of a fanout chain are proportional to its area, it has been widely accepted that power minimization of the fanout tree is equivalent to its area optimization. We have shown that, due to short-circuit power dissipation, minimizing area does not necessarily yield a minimum-power solution. In particular, an area-optimized fanout tree may dissipate excessive short-circuit power. We formulate the problem of minimizing the power dissipation of a fanout chain and show how to build a fanout tree out of these power-optimized chains. Additionally, to suppress the leakage power dissipation in a fanout tree, we use multi-channel-length (multi-LGate) and multi-Vt techniques. In the presence of multi-LGate and multi-Vt options, we accurately model the delay and power dissipation of inverters as posynomials; therefore, our problem formulation results in a convex mathematical program comprising a posynomial objective function with posynomial inequality constraints, which can be solved efficiently.
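    The convexity claim can be demonstrated on a toy posynomial sizing problem: after the substitution x = exp(y), posynomials become convex in y, so a local solver finds the global optimum. The two-stage inverter chain, load, and delay budget below are illustrative stand-ins, not the multi-LGate/multi-Vt formulation above:

```python
import numpy as np
from scipy.optimize import minimize

def size_buffer_chain(c_load=64.0, d_max=16.0):
    """Geometric-programming sketch: minimize total device size
    (a proxy for power) of a two-stage buffer chain driving c_load,
    subject to a posynomial delay budget d_max. Stage delay is
    modeled as proportional to its fanout (illustrative model).
    Optimizing in log-variables y = log(x) makes both the objective
    and the delay constraint convex."""
    def power(y):                       # x1 + x2
        return np.exp(y).sum()
    def delay(y):                       # x1/1 + x2/x1 + c_load/x2
        x1, x2 = np.exp(y)
        return x1 + x2 / x1 + c_load / x2
    res = minimize(power, np.log([4.0, 16.0]), method="SLSQP",
                   constraints=[{"type": "ineq",
                                 "fun": lambda y: d_max - delay(y)}])
    return np.exp(res.x), delay(res.x)
```

    Starting from the feasible point (4, 16), the solver tightens the delay constraint and cuts the total size roughly in half, illustrating why a posynomial formulation is preferable to ad hoc sizing heuristics.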