Project
This page describes the project for System Simulation.
Data centers consume a large and growing amount of energy (see
this report and
this article).
Much of the energy consumed is wasted when servers operate at very low utilization
levels. Generally, server machines consume roughly the same amount of
energy whether lightly or highly utilized. This notion, and its implications
for the energy use of data centers, is described in
this paper.
A key technology to reduce energy waste in data centers is virtualization (see
this paper).
With virtualization, the work of multiple physical machines can be consolidated
onto one machine when the offered load to the data center is low. This allows
unused machines to be powered down, which reduces energy consumption. Previous
work suggests that even for large data centers, power management is most
effective at the rack or cluster level (see
this paper).
A key challenge is to develop a policy to consolidate machines such that the
performance criteria specified by a Service Level Agreement (SLA) can still
be met. Thus, the trade-off is one of energy consumption versus response
time.
For this project you will build a simulation model of a server cluster and
experiment with policies to power-up and power-down machines (assuming that
virtualization is used to allow such consolidation to occur - you will not be
modeling virtualization). You will use both synthetic request arrivals (Poisson
arrivals) and a trace of request interarrival times taken from a real web server
as your workload. Several simplifications will be made to make this problem
tractable.
System specification
The specification of the system is:
- The cluster has 10 server machines, each of which can be modeled as a
single-server queue.
- The request service time is deterministic and is 200 milliseconds for
each machine in the cluster.
- A load balancer controls the cluster. The load balancer distributes
arriving requests to powered-up machines in a round-robin fashion.
- This load balancer also executes an algorithm (or policy) to power-up
and power-down server machines in response to measurements taken at the machines.
One possible policy is described below.
- A powered-up machine consumes 200 W and a powered-down machine consumes
5 W.
- Powering a machine up or down is instantaneous.
- A machine must remain in its current power state (powered-up or
powered-down) for a minimum of 1 minute before it can change state.
A figure of the system is here.
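As a starting point, the single-server queue for one machine can be sketched as below. This is a minimal sketch and not the required model structure: the function names `queue_response_times` and `md1_mean_response` are hypothetical, FIFO service is assumed, and the analytical check uses the standard M/D/1 mean response time formula, which applies only when Poisson arrivals go to a single powered-up machine.

```c
#include <assert.h>
#include <stddef.h>

#define SERVICE_TIME 0.200  /* seconds, from the system specification */

/*
 * Response times for one machine modeled as a FIFO single-server queue
 * with deterministic service (assumed discipline: first come, first served).
 * arrival[] holds nondecreasing arrival times in seconds.
 * response[] receives the response time of each request.
 * Returns the mean response time.
 */
double queue_response_times(const double *arrival, double *response, size_t n)
{
    double prev_departure = 0.0;  /* departure time of the previous request */
    double sum = 0.0;
    size_t i;

    assert(n > 0);
    for (i = 0; i < n; i++) {
        /* Service starts when the request arrives or the server frees up */
        double start = (arrival[i] > prev_departure) ? arrival[i]
                                                     : prev_departure;
        prev_departure = start + SERVICE_TIME;
        response[i] = prev_departure - arrival[i];
        sum += response[i];
    }
    return sum / (double)n;
}

/*
 * Analytical M/D/1 mean response time, usable as a validation check:
 * T = S + rho * S / (2 * (1 - rho)), with rho = lambda * S.
 */
double md1_mean_response(double lambda, double service)
{
    double rho = lambda * service;  /* server utilization */

    assert(rho >= 0.0 && rho < 1.0);
    return service + (rho * service) / (2.0 * (1.0 - rho));
}
```

For example, `md1_mean_response(1.245, 0.200)` gives about 0.233 seconds, which a simulation with Poisson arrivals and exactly one powered-up machine should approach. With round-robin over more than one machine, each machine no longer sees Poisson arrivals, so this formula is not a valid check for that case.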
Service Level Agreement (SLA)
The SLA for the system is:
- The server cluster must maintain an SLA based on measured response time.
The SLA states that the mean response time must not exceed 250 milliseconds
and that the 99th percentile of response time must not exceed 500 milliseconds.
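One way to test the SLA against the response times collected from a run is sketched below. The names `percentile` and `sla_met` are hypothetical, and the nearest-rank definition of the percentile is an assumption. The SLA above does not say which percentile estimator to use, so state your choice in your paper.

```c
#include <assert.h>
#include <stdlib.h>

#define SLA_MEAN_LIMIT 0.250  /* seconds, from the SLA */
#define SLA_P99_LIMIT  0.500  /* seconds, from the SLA */

/* Comparison function for qsort() on doubles (ascending order) */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;

    return (x > y) - (x < y);
}

/*
 * Nearest-rank percentile (assumed definition): the value at rank
 * ceil(pct * n / 100) in the sorted sample. Sorts x[] in place.
 * Integer arithmetic is used for the rank to avoid rounding surprises.
 */
double percentile(double *x, size_t n, unsigned pct)
{
    size_t rank;

    assert(n > 0 && pct >= 1 && pct <= 100);
    qsort(x, n, sizeof(double), cmp_double);
    rank = ((size_t)pct * n + 99) / 100;  /* integer ceiling */
    return x[rank - 1];
}

/* Returns 1 if both SLA conditions hold for the sample, otherwise 0 */
int sla_met(double *response, size_t n)
{
    double sum = 0.0;
    size_t i;

    assert(n > 0);
    for (i = 0; i < n; i++)
        sum += response[i];
    return (sum / (double)n <= SLA_MEAN_LIMIT) &&
           (percentile(response, n, 99) <= SLA_P99_LIMIT);
}
```

Note that `sla_met` reorders the array it is given, so pass a copy if the original order of the response times matters elsewhere in your model.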
Given power-up/power-down policy
The given load balancer power-up/power-down policy is as follows:
Do forever
  Wait for a 1 minute sample period
  Collect statistics to determine the utilization for the last sample period from all powered-up machines
  Determine the grand mean utilization for all powered-up machines
  If the grand mean is greater than a high threshold then power-up one additional machine
  If the grand mean is less than a low threshold then power-down one additional machine
Note that the number of powered-up machines cannot be greater than 10 or less
than 1 at any time.
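The decision made at the end of each sample period could be coded along the following lines. This is a sketch only: the function name `policy_step` and the parameter names `low` and `high` are hypothetical, and it assumes per-machine utilizations are expressed as fractions between 0 and 1.

```c
#include <assert.h>

#define MIN_MACHINES 1   /* lower limit on powered-up machines */
#define MAX_MACHINES 10  /* upper limit on powered-up machines */

/*
 * One pass of the given policy at the end of a sample period.
 * util[]  - utilization of each powered-up machine over the last period
 * num_up  - number of machines currently powered up
 * low     - low utilization threshold (a factor to be studied)
 * high    - high utilization threshold (a factor to be studied)
 * Returns the number of machines that should be powered up next period.
 */
int policy_step(const double *util, int num_up, double low, double high)
{
    double grand_mean = 0.0;
    int i;

    assert(num_up >= MIN_MACHINES && num_up <= MAX_MACHINES);
    assert(low < high);

    /* Grand mean utilization over all powered-up machines */
    for (i = 0; i < num_up; i++)
        grand_mean += util[i];
    grand_mean /= (double)num_up;

    /* Change by at most one machine, staying within the limits */
    if (grand_mean > high && num_up < MAX_MACHINES)
        return num_up + 1;
    if (grand_mean < low && num_up > MIN_MACHINES)
        return num_up - 1;
    return num_up;
}
```

Keeping the decision in a small function with no side effects makes it easy to swap in your own policy later and to test both policies against the same utilization samples.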
Workload
You are to study the performance of this system with two workloads.
- Poisson arrivals with a rate of 1.245 requests per second.
- A trace from a real production web server (the trace is
here).
The trace (in ASCII text format) contains one month of interarrival times to a
real production server at a small business.
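Both workloads reduce to a sequence of interarrival times, so the same summary statistics can be applied to each. The sketch below computes the mean, the implied arrival rate, and the squared coefficient of variation (variance divided by mean squared). The struct and function names are hypothetical, and the values are in whatever time unit the input uses.

```c
#include <assert.h>
#include <stddef.h>

/* Summary statistics for a sequence of interarrival times */
struct iat_stats {
    double mean;  /* mean interarrival time */
    double rate;  /* arrival rate = 1 / mean */
    double scv;   /* squared coefficient of variation */
};

/* Two-pass computation using the sample variance (n - 1 divisor) */
struct iat_stats characterize(const double *iat, size_t n)
{
    struct iat_stats s;
    double sum = 0.0;
    double sq_dev = 0.0;
    size_t i;

    assert(n > 1);

    /* First pass: mean */
    for (i = 0; i < n; i++)
        sum += iat[i];
    s.mean = sum / (double)n;
    assert(s.mean > 0.0);

    /* Second pass: sum of squared deviations from the mean */
    for (i = 0; i < n; i++) {
        double d = iat[i] - s.mean;
        sq_dev += d * d;
    }

    s.rate = 1.0 / s.mean;
    s.scv = (sq_dev / (double)(n - 1)) / (s.mean * s.mean);
    return s;
}
```

For Poisson arrivals the interarrival times are exponentially distributed, so the squared coefficient of variation is 1 and, at 1.245 requests per second, the mean interarrival time is about 0.803 seconds. A value well above 1 for the trace would indicate burstier arrivals than the Poisson workload.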
What you are to do (and grading)
You are to model the system described above and its power-up/power-down
policy and study the effects of key parameters (factors) on response time
performance. You are to determine the best possible parameter values that minimize
energy use while still meeting the SLA. You are also to invent, describe, model,
and evaluate your own policy to try to improve on the given policy. It is OK if
your policy is not better than the given policy, provided that the policy is
based on good engineering judgement and its evaluation is complete.
You are to do the following:
- Characterize the server trace (10 points)
- Develop the simulation model for the above system (and policy) and validate
it (10 points)
- Describe the factors and the experiment design (10 points)
- Determine the best possible parameter values for the Poisson workload
and determine the energy savings (20 points)
- Determine the best possible parameter values for the server
trace workload and determine the energy savings (20 points)
- Invent and describe your own policy (10 points)
- Evaluate your own policy and compare it to the given policy for
both the Poisson and server trace workloads (10 points)
- Complete a related work literature review (10 points)
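Energy use and energy savings can be accounted for as sketched below. The function names are hypothetical, and the baseline of all 10 machines powered up for the entire run is an assumption. The project description does not define the reference point for savings, so state the baseline you use in your paper.

```c
#include <assert.h>

#define NUM_MACHINES 10     /* machines in the cluster */
#define POWER_UP_W   200.0  /* watts consumed by a powered-up machine */
#define POWER_DOWN_W 5.0    /* watts consumed by a powered-down machine */

/*
 * Energy in joules used by the whole cluster over an interval during
 * which num_up machines are powered up and the rest are powered down.
 */
double interval_energy(int num_up, double seconds)
{
    assert(num_up >= 0 && num_up <= NUM_MACHINES);
    assert(seconds >= 0.0);
    return ((double)num_up * POWER_UP_W +
            (double)(NUM_MACHINES - num_up) * POWER_DOWN_W) * seconds;
}

/*
 * Fractional energy savings of a policy relative to the assumed
 * baseline of all machines powered up for the same duration.
 */
double energy_savings(double policy_joules, double seconds)
{
    double baseline = interval_energy(NUM_MACHINES, seconds);

    assert(baseline > 0.0);
    return 1.0 - policy_joules / baseline;
}
```

Summing `interval_energy` over every sample period of a run gives the total energy for a policy. With only one machine powered up the cluster draws 245 W instead of 2000 W, so under this baseline no policy can save more than 87.75 percent.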
You are to document your findings in a properly formatted IEEE-style paper
of maximum length 5 pages.
Up to 30 points can be subtracted for a poorly written or poorly formatted paper
and source code. Up to 20 points of extra credit are available for a particularly
insightful policy that yields much better performance (that is, uses less energy
while meeting the SLA) than the given policy described above.
Project submission
Please submit your project as follows. Email me one zip file whose
filename is your last name followed by your first name (e.g.,
ChristensenKen.zip). In the zip file, include your paper named as
your last name followed by your first name followed by "_paper" (e.g.,
ChristensenKen_paper.pdf) and your model source code (i.e., the
.c files). Please name your source files with your last name and first
name as well (e.g., ChristensenKen.c). Include a readme in your zip
file if you have more than one source code file. Please do not email your
submission multiple times. You will receive a "Got it" email from me when
I have received your submission.
Miscellaneous
Some miscellaneous items are:
- The template for IEEE-style papers is
here.
- The coding style guide you are to follow is
here.
Last update on June 25, 2013