Tiger Video Fileserver Review
Review By: Troy Ronda
Video data is both large and computationally expensive to serve. Delivering
high-quality video over a network presents many challenges, from providing
sufficient service in the network to handling time-sensitive data. This
paper is concerned with providing a file server that can meet the demanding
timing requirements of video distribution to many clients. Video servers
succeed when they consistently meet deadlines, scale well, and deal
appropriately with component failures. Tiger uses a cluster of standard
desktop workstations connected via a backplane. The authors choose this
design because a single video stream is small compared to the available I/O
bandwidth of a workstation. Tiger distributes chunks of each video across
all of the machines in the cluster, along with a small amount of
redundancy. The video data is split into blocks that each hold a fixed
number of seconds of video, and the blocks are assigned round-robin to all
of the disks in the cluster.
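To make the layout concrete, here is a minimal sketch of the round-robin
placement as I understand it; the function names, and the assumption that a
block's mirror simply lands on the next disk, are my own illustration rather
than code from the paper:

    # Sketch of round-robin block placement (my illustration, not Tiger's
    # code). Assumes every disk in the cluster participates in every file
    # and that a file's blocks are laid out starting from the disk holding
    # its block 0.

    def disk_for_block(file_start_disk: int, block_index: int,
                       num_disks: int) -> int:
        """Disk holding the primary copy of the given block."""
        return (file_start_disk + block_index) % num_disks

    def mirror_disk_for_block(primary_disk: int, num_disks: int) -> int:
        """Assumed mirror placement: the next disk in round-robin order
        keeps the redundant copy (in its slow zone, per the paper)."""
        return (primary_disk + 1) % num_disks

    # Example: block 7 of a file whose block 0 lives on disk 2, 5 disks.
    primary = disk_for_block(file_start_disk=2, block_index=7, num_disks=5)
    mirror = mirror_disk_for_block(primary, num_disks=5)  # primary=4, mirror=0

With this placement, consecutive blocks of one stream touch every disk in
turn, which is what spreads the load of a popular file across the whole
cluster.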
Interestingly, they observe that different parts of a disk transfer data at
different rates; hence the redundant mirrored data is stored in the slow
region of each disk, while the primary video data is stored in the fast
region. Because Tiger streams video data, it must provide resource
guarantees. This is accomplished through a schedule of viewers for the
bottleneck resource (network or disk). One observation is that scalability
is limited by the number of disks (and therefore the number of
workstations) that hold a particular file. Another is that maintaining a
global schedule is not scalable. Therefore, each workstation maintains its
own, possibly outdated, local schedule; that is, a workstation only
maintains the part of the schedule near where its own disks are currently
processing. Workstations forward schedule information to the next
workstation in line (following the round-robin distribution of blocks).
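Here is a rough sketch of how I picture this decentralized scheduling; the
class and method names are my own invention, and the real protocol in the
paper is more involved:

    # Rough sketch of the local-schedule idea (invented names, not the
    # paper's protocol). Each workstation keeps only the slots near its own
    # disks and hands them to the next workstation in ring order; also
    # sending to the second successor is what tolerates a failed node.

    class Workstation:
        def __init__(self, node_id, cluster):
            self.node_id = node_id
            self.cluster = cluster       # all workstations, in ring order
            self.alive = True
            self.local_schedule = []     # viewer slots this node serves next

        def receive_schedule(self, slots):
            # May be slightly out of date; only the slots near this node's
            # disks matter, so staleness elsewhere is tolerable.
            self.local_schedule = list(slots)

        def forward_schedule(self):
            n = len(self.cluster)
            for step in (1, 2):          # next successor and second successor
                succ = self.cluster[(self.node_id + step) % n]
                if succ.alive:
                    succ.receive_schedule(self.local_schedule)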
Tiger performs disk reads ahead of the schedule when possible, which smooths
over performance variations. In the unfailed experiment, all components are
alive and the loss rate is 1 in 180,000 blocks; in the failed experiment,
one machine fails and the loss rate is approximately 1 in 40,000 blocks.
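The read-ahead itself could be as simple as the following toy loop; the
buffer depth of three blocks is my invention, not a number from the paper:

    # Toy version of read-ahead: when the disk has spare time, pull the next
    # few scheduled blocks into memory so a transient slowdown later does
    # not cause a missed deadline. READ_AHEAD_DEPTH is an invented value.

    READ_AHEAD_DEPTH = 3

    def service_viewer(due_blocks, buffered, read_block):
        """due_blocks: block ids owed to this viewer, soonest first.
        buffered: dict mapping block id -> data already in memory.
        read_block: function that actually reads a block from disk."""
        for block_id in due_blocks[:READ_AHEAD_DEPTH]:
            if block_id not in buffered:
                buffered[block_id] = read_block(block_id)  # read ahead of need
        return buffered.pop(due_blocks[0])                 # block due right now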
The paper is well written but perhaps too long for what it describes; I do
not need all of the “implementation” detail provided. The authors make the
correct observation that bandwidth, not disk space, is the limiting factor.
Primary data is stored in the fastest area of the disk, while mirrored data
is stored in the slow areas of the disk.
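One way to picture this placement is to partition each disk's block range
into a fast outer zone for primaries and a slow inner zone for mirrors; the
capacities below are made up for illustration:

    # Illustration only: split a disk's capacity so primary blocks sit in
    # the fast outer zone and mirrored blocks in the slow inner zone. The
    # block counts are invented, not taken from the paper.

    DISK_BLOCKS = 1_000_000
    PRIMARY_ZONE_START = 0         # fast outer tracks hold primary data
    MIRROR_ZONE_START = 600_000    # slow inner tracks hold mirrored data

    def physical_block(is_primary: bool, logical_index: int) -> int:
        """Map a logical block to an on-disk block number in its zone."""
        base = PRIMARY_ZONE_START if is_primary else MIRROR_ZONE_START
        return base + logical_index

    print(physical_block(True, 0))    # 0       -> fast (primary) zone
    print(physical_block(False, 0))   # 600000  -> slow (mirror) zone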
Failure is mitigated by passing schedules to both the next successor and
the second successor. The entire system is built out of cheap
components. It is good that the authors recognized the requirements of
their application and did not deviate from them throughout the paper.
For example, the real requirement of their system is meeting deadlines
(even under failure), with maximizing hardware utilization as a secondary
goal. It is also good that the single point of failure represented by the
controller can be mitigated. I think the main strength of this work is load
balancing, which is accomplished by striping data. The authors assume that
demand for content is dynamic, so this property is important. The
non-failure mode, of course, works very well: CPU load increases linearly
with the number of streams, making the system scalable with respect to CPU
usage versus streams. The time to deliver the first block in this
experiment also looks acceptable in figure 10.
It would be nice to see some memory usage data, since the Tiger system
reads ahead of the schedule. It would be nice to see how well the system
performs under less-ideal circumstances. For example, how does it
perform on the Internet? This isn’t really important for this paper,
however. Figure 9 shows heavy CPU usage even with only one failure. This
concerns me a great deal. There seem to be two main assumptions in this
work. The first is that two consecutive nodes will not fail. The second,
perhaps, is that only one (or a very small number of) nodes fail, although
this is certainly not clear. Why show only one failure? I want
to see when the system collapses. I also want comparisons to the related
servers given in section 6. I realize that the difference is that Tiger
better responds to changing load demands. However, what about the other
systems that use main memory for caching video streams? For example, a
server dedicated to one stream, with plenty of main memory, might not have
to do many disk reads, and it would be better able to meet scheduling
demands. What about replicating content to other servers “on the fly” based
on demand? There is, however, a serious problem with failures in this
dedicated-server model. I think these are important
comparisons that are left out, especially when one of the main purposes
of the system is meeting scheduling demands.