review: Tiger

From: Jing Su <jingsu at cs.toronto.edu>
Date: Thu, 1 Dec 2005 10:51:45 -0500

The goal of Tiger is to spread video content over several cluster
systems in order to provide high quality video streams to many
clients. Its approach is to stripe videos in units of time across
many cluster servers (which it calls cubs). Because the striping is
in units of time, cubs advance their slots in lockstep with the
viewers. By having a loosely consistent view of the play schedule,
load can be distributed across all cubs and the system as a whole is
able to judge whether adding more clients would overburden the system.
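The striping scheme described above can be sketched in a few lines. This is a hypothetical illustration, not Tiger's actual code: it assumes fixed-duration blocks placed round-robin across cubs, so a viewer consuming one block per timeslice is serviced by successive cubs in lockstep (all names are mine).

```python
def cub_for_block(start_cub: int, block_index: int, num_cubs: int) -> int:
    """Block i of a stream striped round-robin lands on this cub."""
    return (start_cub + block_index) % num_cubs

# A viewer starting on cub 2 of a 5-cub system is serviced by
# cubs 2, 3, 4, 0, 1, ... as playback advances, one per timeslice.
schedule = [cub_for_block(2, i, 5) for i in range(7)]
```

Because every viewer walks the cubs in the same order at the same rate, load spreads evenly as long as start slots are spaced out, which is what lets the system estimate whether one more client fits.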

The description of Tiger found in this paper makes the most sense for a
CBR video stream. The authors describe a modified scheduling system for
handling VBR files, but provide no implementation of it. In the worst
case, however, a VBR system can behave like a CBR system, at the cost of
wasted idle resources. In their description, VBR block scheduling must
take extra parameters into account so that no single cub is overburdened
during particular timeslices.
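The worst-case argument above can be made concrete with a toy calculation (the numbers and function are illustrative, not from the paper): serving a VBR stream with CBR-style reservations means reserving the peak block size every timeslice, and the gap between the reservation and the actual block size is idle, wasted capacity.

```python
def cbr_waste(block_sizes, reserve=None):
    """Per-slot idle capacity when every slot reserves the peak block size."""
    if reserve is None:
        reserve = max(block_sizes)  # CBR reservation at the VBR peak
    return [reserve - b for b in block_sizes]

# Five timeslices of a VBR stream with block sizes 4..9 (arbitrary units);
# reserving the peak (9) every slot wastes the difference in each slot.
waste = cbr_waste([4, 9, 5, 9, 2])
```

The service guarantee is preserved, but the idle slack is exactly what the proposed VBR scheduler tries to reclaim by considering per-slot block sizes.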

I don't understand why the authors tried to coin a new term,
especially one that isn't particularly catchy or intuitive. What they
call a hallucination is simply a globally shared data structure -- one
of which no single entity holds a full copy. As they mention in the
related work, DNS can be seen as a "hallucination".

The schedule is implemented like a token ring with forwarding. Because
video files are linear, each cub only has to maintain the slots it is
servicing now, and it can determine which slots must be serviced
next. At each timestep, cubs forward this viewer information to the
next cub.
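The hand-off described above can be sketched as follows. This is a minimal, hypothetical model (names and structure are mine): each cub holds the viewer slots it is servicing in the current timeslice, and at the end of the slice forwards every viewer record to the next cub around the ring.

```python
def advance_schedule(slots_per_cub):
    """Rotate each cub's viewer slots one cub forward, as at the end
    of a timeslice: cub i's viewers become cub (i+1) % N's viewers."""
    n = len(slots_per_cub)
    nxt = [[] for _ in range(n)]
    for cub, viewers in enumerate(slots_per_cub):
        nxt[(cub + 1) % n] = list(viewers)
    return nxt

# Three cubs; alice is on cub 0 and bob on cub 1 this timeslice.
state = [["alice"], ["bob"], []]
state = advance_schedule(state)  # alice moves to cub 1, bob to cub 2
```

No cub ever needs the global schedule; only the current slice plus the forwarding rule, which is the loosely consistent "hallucinated" schedule the paper describes.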

The biggest weakness of this work is the assumed homogeneity of the
system and its limited ability to tolerate faults. Tiger only
tolerates the failure of a single disk or cub within the whole
system at any one time (tolerate meaning that service does not degrade).
One can imagine that such a limited failure model would not
suffice for an eBlockbuster store. This also raises many unanswered
questions about how the system might be upgraded in the future. I
also don't understand how new content is added to the Tiger system,
and whether the whole system must be stopped in order to add content.

This failure-model limitation matters because the biggest cost of
running such a cluster is human maintenance, not hardware. How does
the performance of Tiger compare to the "big iron" solutions offered
by SGI and the like? Does commodity hardware plus the estimated human
cost actually beat buying big, high-quality hardware?