Problem Set 12

1 Alien Linguistics and Compression

It is the future and humans and aliens are beginning to coexist. Aliens have a different language than ours, their language uses only 3 characters. Although they are written differently, let’s represent those characters as $\Sigma = \{a, b, c\}$

Let $n \in \mathbb{N}$ be any natural number. $\Sigma^n$ denotes the set of all length $n$ strings over $\Sigma$.

Note that $\Sigma^0 = \{\epsilon\}$

Let $\Sigma^{\leq n} = \Sigma^0 \cup \Sigma^1 \cup ... \cup \Sigma^n$ be the set of all sequence of characters with length at most $n$.

1.1

What is $|\Sigma^{n}|$? No explanation necessary.

Solution

$3^n$

1.2

Prove by induction $|\Sigma^{\leq n}| = \frac{3^{n+1} - 1}{2}$.

Solution

By induction.

Base case. $\Sigma^{\leq n} = \{\epsilon\}$, which has size $1 = (3^{1} - 1) / 2$

Inductive step. Let $k \in \mathbb{N}$ be any natural number and and assume $|\Sigma^{\leq k}| = \frac{3^{k+1} - 1}{2}$. Then

$\Sigma^{\leq k+1} = \Sigma^{\leq k} \cup \Sigma^{k+1}$

Note that $\Sigma^{\leq k}$ and $\Sigma^{k+1}$ do not share any elements, so

\[ \begin{aligned} |\Sigma^{\leq k+1}| & = |\Sigma^{\leq k}| + |\Sigma^{k+1}| \\ & = \frac{3^{k+1} - 1}{2} + |\Sigma^{k+1}| & \text{(IH)} \\ & = \frac{3^{k+1} - 1}{2} + 3^{k+1} & \text{(part a.)} \\ & = \frac{3^{k+2} - 1}{2} \end{aligned} \]

as required.

1.3

A compression algorithm on length $n$ strings is a function $C$ that compresses all strings of length exactly $n$ into strictly shorter strings.

Let $C$ be a compression function. What is the domain and codomain of $C$?

Solution

The domain is $\Sigma^n$. The codomain is $\Sigma^{\leq n-1}$.

1.4

A compression algorithm $C$ is lossless if for any string $x$, it is possible to fully recover $x$ from $C(x)$.

Let $A$, $B$ be the domain and codomain you found in the previous question.

Select the property below such that for every function $C: A \to B$, $C$ has the property if and only if $C$ is a lossless compression algorithm.

Symmetric
Injective
Surjective
Increasing
Bijective

Solution

Injective

1.5

The aliens ask you to help them design a lossless compression algorithm on length $100$ strings.

Prove that this is in fact an impossible task!

Solution

We need an injective function from $\Sigma^{100} \to \Sigma^{\leq 99}$.

The domain has size $3^{100}$, the codomain has size $\frac{3^{100} - 1}{2} < 3^{100}$. By the Pigeonhole Principle, there is no such function.

1.6

One alien language is \[S = \{w \in \{a, b, c\}^*: \text{none of } aa, bb, cc \text{ are substrings of w}\}\]

I.e. $S$ contains all strings where no consecutive letters are the same.

Show that $S$ is regular.

If you choose to draw a DFA for $S$, use at most 5 states. You do not have to prove that your DFA works.

Solution

Define * $A = \{w \in \{a, b, c\}^*: aa \text{ is a substring of } w\}$ * $B = \{w \in \{a, b, c\}^*: bb \text{ is a substring of } w\}$ * $C = \{w \in \{a, b, c\}^*: cc \text{ is a substring of } w\}$

$A$ has regular expression $\Sigma^*aa\Sigma^*$, $B$ has regular expression $\Sigma^*bb\Sigma^*$, and $C$ has regular expression $\Sigma^*cc\Sigma^*$.

Then $S = \overline{A} \cap \overline{B} \cap \overline{C}$. Thus $S$ is regular since the complement of a regular language is regular, and the intersection of regular languages is regular.

1.7

Let $S_n\subseteq\Sigma^n$ be the subset of $S$ containing strings of length exactly $n$ I.e. for all $n \in \mathbb{N}$, $S_n = \{w \in S: |w| = n\}$.

Show that $S_n$ is regular.

Solution

$S_n$ is finite and thus regular.

1.8

Another alien language allows consecutive $b$s and $c$s but not $a$s. Let $X = \{w \in \Sigma^*: w \text{ does not have consecutive $a$s} \}$. Again, let $X_n = \{w \in X: |w| = n\}$, be the subset containing strings of length $n$.

Find a recurrence (including the base case(s)) for $|X_n|$, i.e. write

Solution

\[ |X_n| = \begin{cases} 1 & n = 0 \\ 3 & n = 1 \\ 2|X_{n-1}| + 2|X_{n-2}| & n > 1 \end{cases} \]

2 Sandwich

Let $\mathrm{Chars}= \{a, b, c, ..., z\}$ and use $\epsilon$ to denote the empty string (which is “” in Python).
Let $\mathrm{Strings}= \{\epsilon, a, b,..., aa,ab,...\}$ be the set of all strings over the alphabet $\mathrm{Chars}$. I.e. it is the set of sequences of $0$ or more characters in $\mathrm{Chars}$.
Let $\mathrm{Sandwich}: \mathrm{Strings}\times \mathrm{Strings}\to \mathrm{Strings}$ be the function that maps $(x, y) \to xyx$. For example, \[\mathrm{Sandwich}(\text{`ice'}, \text{`cream'}) = \text{`icecreamice'}.\]

2.1

Is $\mathrm{Sandwich}$ injective? Prove your answer

Solution

Not injective: $\mathrm{Sandwich}(\text{a}, \epsilon) = aa = \mathrm{Sandwich}(\epsilon, aa)$.

2.2

Is $\mathrm{Sandwich}$ surjective? Prove your answer

Solution

Surjective. Let $w \in \mathrm{Strings}$. $w = \mathrm{Sandwich}(\epsilon, w)$.

2.3

Let $B = \mathrm{Chars}\cup \{\epsilon\}$, and let $X$ be the set generated from $B$ by $\{\mathrm{Sandwich}\}$.

Prove that $\forall w \in X$, at most $1$ character in $\mathrm{Chars}$ appears an odd number of times.

Solution

By Structural Induction.

Base Case. Every string in the $B$ has length $0$ or $1$. Thus, there it is impossible for more than one character to appear an odd number of times.

Inductive Step. For any $\sigma \in \mathrm{Chars}$, and $\varphi \in \mathrm{Strings}$ let $\mathrm{Count}(\sigma, \varphi)$ be the number of times a character $\sigma$ appears in $\varphi$.

Let $\alpha, \beta \in X$, and suppose for the inductive hypothesis that at most one character appears an odd number of times in $\alpha$, and $\beta$. We’ll now show that the same is true for $\mathrm{Sandwich}(\alpha, \beta) = \alpha\beta\alpha$.

Let $\sigma$ be any character. Then we have $\mathrm{Count}(\sigma, \alpha\beta\alpha) = 2\mathrm{Count}(\sigma, \alpha) + \mathrm{Count}(\sigma, \beta)$. Note that this is odd if and only if $\mathrm{Count}(\sigma, \beta)$ is odd. By the inductive hypothesis, $\mathrm{Count}(\sigma, \beta)$ is odd for at most one character. Therefore, $\mathrm{Count}(\sigma, \alpha\beta\alpha)$ is odd for at most one character.

2.4

Let $\mathrm{Reverse}: \mathrm{Strings}\to \mathrm{Strings}$ be the function that maps a string to its reverse. For example, \[\mathrm{Reverse}(\text{`hello'}) = \text{`olleh'}.\]
Let $P = \{w \in \mathrm{Strings}: w = \mathrm{Reverse}(w) \}$. Note that this is the set of Palindromes.

Show that $X = P$ (i.e. $X$ is the set from the previous problem).

Solution

First we’ll show $X \subseteq P$, namely, that $\forall x \in X. x \in P$. By structural induction.

Base Case. For the base case, we have that $\epsilon$, and all length $1$ strings are equal to their reverse, and hence in $P$.

Inductive Step. Let $\alpha, \beta \in X$, and suppose $\alpha, \beta \in P$, that is $\alpha=\mathrm{Reverse}(\alpha)$, and $\beta = \mathrm{Reverse}(\beta)$. We’ll show that $\mathrm{Sandwich}(\alpha, \beta) = \alpha\beta\alpha \in P$. We have $\mathrm{Reverse}(\alpha\beta\alpha) = \mathrm{Reverse}(\alpha)\mathrm{Reverse}(\beta)\mathrm{Reverse}(\alpha) = \alpha\beta\alpha$, Thus, $\alpha\beta\alpha \in P$.

Now, we’ll show that $P \subseteq X$. By complete induction on the length of the string.

Base Case. We’ll show all length $0$ and length $1$ palindromes are in $X$. The length $0$ and length $1$ palindromes are exactly $\epsilon$ and $a$ for $a \in \mathrm{Chars}$, which conveniently is the base set for $X$. Hence, $X$ contains them all.

Inductive Step. Let $k \in \mathbb{N}$ with $k \geq 1$, and assume that all strings of length at most $k$ in $P$, are also in $X$. Let $\alpha \in P$ be any string of length $k+1$. Since $\alpha \in P$, we have $\alpha = \mathrm{Reverse}(\alpha)$. Therefore, the first and last character must be equal, thus, we can write $\alpha = \sigma \alpha' \sigma$, where $\sigma = \alpha[0]$, and $\alpha' = \alpha[1:-1]$ (using Python notation). Note that since $\alpha = \mathrm{Reverse}(\alpha)$, we have $\alpha' = \mathrm{Reverse}(\alpha')$, and hence $\alpha' \in P$. By the inductive hypothesis, since $\alpha'$ is a string of length $k-1$, $\alpha' \in P$. From the base case, $\sigma$ is also in $P$. Thus $\mathrm{Sandwich}(\sigma, \alpha') = \sigma \alpha ' \sigma \in P$, and of course, $\sigma \alpha' \sigma = \alpha$.

2.5

Suppose we changed the domain and codomain of the $\mathrm{Sandwich}$ function. Each row of the table below represents one possiblility. For a given row, write a ‘Y’ in the injective column if the function is injective on that domain/codomain, and a ‘N’ otherwise. Same for the surjective column.

Domain	Codomain	Injective?	Surjective?
$P \times P$	$P$
$\mathrm{Chars}\times \mathrm{Strings}$	$\mathrm{Strings}$
$\mathrm{Chars}\cup \{\epsilon\} \times \mathrm{Strings}$	$\mathrm{Strings}$
$P \times \mathrm{Chars}\cup \{\epsilon\}$	$P$

Solution

Domain	Codomain	Injective?	Surjective?
$P \times P$	$P$	N	Y
$\mathrm{Chars}\times \mathrm{Strings}$	$\mathrm{Strings}$	Y	N
$\mathrm{Chars}\cup \{\epsilon\} \times \mathrm{Strings}$	$\mathrm{Strings}$	N	Y
$P \times \mathrm{Chars}\cup \{\epsilon\}$	$P$	Y	N

Domain	Codomain	Injective?	Surjective?
\(P \times P\)	\(P\)
\(\mathrm{Chars}\times \mathrm{Strings}\)	\(\mathrm{Strings}\)
\(\mathrm{Chars}\cup \{\epsilon\} \times \mathrm{Strings}\)	\(\mathrm{Strings}\)
\(P \times \mathrm{Chars}\cup \{\epsilon\}\)	\(P\)

Domain	Codomain	Injective?	Surjective?
\(P \times P\)	\(P\)	N	Y
\(\mathrm{Chars}\times \mathrm{Strings}\)	\(\mathrm{Strings}\)	Y	N
\(\mathrm{Chars}\cup \{\epsilon\} \times \mathrm{Strings}\)	\(\mathrm{Strings}\)	N	Y
\(P \times \mathrm{Chars}\cup \{\epsilon\}\)	\(P\)	Y	N