Tutorial 6: Assignment 2 discussion

Question 2

This question is straightforward if you're comfortable reasoning about probabilities. Most students had an intuitive sense of what the probabilities should be but were unclear about how to argue these formally. Any argument about probability is must be expressed within the framework of probabilistic reasoning (which is to say, using the axioms of probability)

We are given a family of functions $\mathcal{H}$, each operating on $\mathbb{Z}_m$, and told that for any triple of distinct elements $x_1, x_2, x_3$ and any elements $y_1$, $y_2$, $y_3$, we have that \begin{equation} \label{eq:measure_q2} \Pr_{h \in_R \mathcal{H}} [ (h(x_1),h(x_2),h(x_3)) = (y_1,y_2,y_3) ] = \frac{1}{m^3} \end{equation}

Let's fix $x_1$, $x_2$, $x_3$ and define the random variable $T$ as follows:

\begin{equation} \label{eq:rv_T} T = (h(x_1),h(x_2),h(x_3)) \mbox{ where $h \in_R \mathcal{H}$} \end{equation}

$T$ takes values over $\mathbb{Z}_m^3$. $T$ is described by a distribution $D_T$, supported over $\mathbb{Z}_m^3$ whose measure is given by \eqref{eq:measure_q2}.

A Warmup

You might be interested in the distribution $h(x_1)$ for $h \in_R \mathcal{H}$. i.e. for a fixed value of $y$, you might be interested in the probability \begin{equation} \label{eq:marginal} \Pr_h[h(x_1) = y] \end{equation} You might guess that this probability is $1/m$ but we must reason that this is the case.

The only information we have about the distribution of outputs $\mathcal{H}$ is \eqref{eq:measure_q2}, so we know the probability that $T$ takes any particular triple of values. However this is fairly detailed information, and we can use this to deduce the the probability that $T_1$, the first coordinate of $T$, takes some particular value by noting that this less specific event can be expressed in terms of the more specific events we know. Specifically, \[ [T_1 = y] = \bigcup_{y_2,y_3} [T = (y, y_2, y_3)] \] Note further that the events on the right hand side are all disjoint. This allows us to apply the additivity of $\Pr$ to deduce \begin{align*} \Pr[h(x_1) = y] &= \sum_{y_2,y_3} \Pr[ (h(x_1),h(x_2),h(x_3)) = (y_1,y_2,y_3)] \end{align*} There are $m^2$ summands each of value $m^{-3}$, which gives us \begin{equation} \label{eq:marginal_q2} \Pr[h(x_1) = y] = \frac{1}{m} \end{equation} The distribution of $h(x_1)$ is one of the marginals of $D_T$.

The marginals of $D_T$ don't give sufficient information to prove that $\mathcal{H}$ is universal; this computation was done to give a rigorous demonstration of probabilistic reasoning, however it does contain all of the ingredients of our solution.

That \mathcal{H} is Universal

To show that $\mathcal{H}$ is universal, we will decompose the event of a collision of $h$ in terms of the probabilities of triples. Specifically, \begin{equation*} [h(x_1) = h(x_2)] = \bigcup_{y_1, y_2} [ (h(x_1),h(x_2),h(x_2)) = (y_1,y_1,y_2) ] \end{equation*} By reasoning identical to the above, we have that \begin{equation} \Pr_h[h(x_1) = h(x_2)] = \frac{1}{m} \end{equation} and so $\mathcal{H}$ is a universal family.

CSC 265 Fall 2013