Today's tutorial concerns the first mathematical concept of the course, pseudorandom generators. We work out how some obviously biased generators fail to satisfy our definition of pseudorandomness in an effort to help the reader develop a feeling for the definition.

A number generator $G$ is a length-increasing function mapping $\{0,1\}^*$ to $\{0,1\}^*$ such that $|G(s)| = \ell(|s|) > |s|$ for some monotonic function $\ell$ and every string $s$.

A number generator $G$ is pseudo-random if for all polytime distinguishing algorithms $D$, the function $$ \begin{align} A_{\mathcal{D}}(n) = \left| \Pr_{s \sim U_n}[ D(G(s)) = 1] - \Pr_{\alpha \sim U_{\ell(n))}}[ D(\alpha) = 1] \right| \label{eq:pr_verbose} \end{align} $$ vanishes faster than the inverse of every polynomial; i.e. \begin{align} \forall c \exists n_c \forall n>n_c A_{\mathcal{D}}(n) < 1/n^c \label{eq:sufficiently_small_verbose} \end{align}

A function that's not pseudorandom

Let $G$ be the following number generator: $$ \begin{equation} \label{eq:biased_generator_1} G(x_1 \ldots x_n) = \begin{cases} x_1 \ldots x_n x_1 & n > 0 \\ 0 & \mbox{otherwise} \end{cases} \end{equation} $$ $G$ is not pseudorandom.

I assume the reader will agree that this generator, by repeating a bit, is definitely biased. We will see now that this generator does not satisfy the definition of pseudorandomnes given above (thereby showing you that the definition above aligns with your intuition at least in this case; hopefully in seeing how the definition works to rule out this generator, you'll agree it's a good definition).

We'll now prove this proposition.

We must show that it is not the case that \eqref{eq:sufficiently_small_verbose} is satisfied, i.e.

\begin{equation*} \neg (\forall \mathcal{D}, c \ \exists n_c \ \forall n>n_c \quad A_{\mathcal{D}}(n) < 1/n^c) \end{equation*} i.e. that

$$ \exists \mathcal{D}, c \ \forall n_c \ \exists n>n_C \quad A_{\mathcal{D}}(n) \geq 1/n^c $$ So we must give a particular distinguisher that behaves differently on pseudorandom strings than on random strings. Consider the following distinguisher:

\begin{equation*} D(\alpha_1 \ldots \alpha_n) = \begin{cases} 1 & \alpha_1 = \alpha_n \lor n=0 \\ 0 & \mbox{otherwise} \end{cases} \end{equation*}

The following facts are immediate from the definition of $G$ and $D$:

\begin{align*} \Pr_{x \in U_n}[ D(G(s)) = 1] &= 1 \\ \Pr_{\alpha \in U_{\ell(n)}}[D(\alpha) = 1] &= 1/2 \end{align*} This gives us that \begin{equation} | \Pr_{x \in U_n}[ D(G(s)) = 1] - \Pr_{\alpha \in U_{\ell(n)}}[D(\alpha) = 1] | \geq 1/2 \label{eq:proposition} \end{equation} and \eqref{eq:proposition} is greater than $1/n^c$ for $c=1, n>3$.

The following proposition is left as an exercise to the reader:

Let $G$ be the following number generator: \[ G(s_1 \ldots s_n) = \begin{cases} 0 & n=0 \\ s_1, s_1 \oplus s_2, \ldots, s_{n-1} \oplus s_n, s_n & \mbox{otherwise} \end{cases} \] $G$ is not pseudorandom.

Miscellaneous Notes

The reader may find it somewhat unfortunate that in \eqref{eq:biased_generator_1} even with so simple a generator, a the special case for $n=0$ was needed. This is necessary as we defined number generators to be both length-increasing and acting on the empty string. Had we restricted the domain of number generators to $\{0,1\}^{\geq 1}$ we wouldn't have needed the special case above however this domain is arguably less natural. However you slice it, there will be messy bits somewhere.
The definition of pseudorandomness can be made more concise if we're willing to borrow a few concepts.
- The concept of little omega, i.e. $\omega$, can be used to capture our notion of asymptotically small. A function $f(n)$ is $\omega(1)$ if it grows faster than every constant, thus if $g(n)$ is $n^{-\omega(1)}$, it shrinks faster than the inverse of every polynomial and thus satisfies condition of \eqref{eq:sufficiently_small_verbose}.
- The condition \eqref{eq:pr_verbose} describes the difference between two distributions: the output of $D$ on $G(U_n)$, and the output of $D$ on samples from $U_{\ell(n)}$. The difference between probability distributions is a general concept and is sometimes defined separately. The notion of statistical distance natural for us is sometimes called Total Variation Distance. If $X$ and $Y$ are distributions over a set $S$, we denote their statistical distance as $\Delta(X,Y)$ which is the quantity. $$ \Delta(X,Y) = \sum_{x \in S} | \Pr_{x \sim X}[y = s] - \Pr_{y \sim Y}[y = s] | $$ Given this notion, \eqref{eq:pr_verbose} is simply \[ A_{\mathcal{D}}(n) = \Delta\left(D(G(U_n)), D(U_{\ell(n)})\right)/2 \] The factor of two comes from that $\Delta$ requires a sum over the support of the distribution, whereas since our distinguisher only output zero's and one's, knowing the measure of $D(G(U_n)) = 1$ is

CSC 422 Fall 2013

Tutorial 1: Pseudorandom Generators

A function that's not pseudorandom

Miscellaneous Notes