Problem Set 12

1 Alien Linguistics and Compression

It is the future and humans and aliens are beginning to coexist. Aliens have a different language than ours, their language uses only 3 characters. Although they are written differently, let’s represent those characters as \(\Sigma = \{a, b, c\}\)

Let \(n \in \mathbb{N}\) be any natural number. \(\Sigma^n\) denotes the set of all length \(n\) strings over \(\Sigma\).

Note that \(\Sigma^0 = \{\epsilon\}\)

Let \(\Sigma^{\leq n} = \Sigma^0 \cup \Sigma^1 \cup ... \cup \Sigma^n\) be the set of all sequence of characters with length at most \(n\).

1.1

What is \(|\Sigma^{n}|\)? No explanation necessary.

Solution \(3^n\)

1.2

Prove by induction \(|\Sigma^{\leq n}| = \frac{3^{n+1} - 1}{2}\).

Solution

By induction.

Base case. \(\Sigma^{\leq n} = \{\epsilon\}\), which has size \(1 = (3^{1} - 1) / 2\)

Inductive step. Let \(k \in \mathbb{N}\) be any natural number and and assume \(|\Sigma^{\leq k}| = \frac{3^{k+1} - 1}{2}\). Then

\(\Sigma^{\leq k+1} = \Sigma^{\leq k} \cup \Sigma^{k+1}\)

Note that \(\Sigma^{\leq k}\) and \(\Sigma^{k+1}\) do not share any elements, so

\[ \begin{aligned} |\Sigma^{\leq k+1}| & = |\Sigma^{\leq k}| + |\Sigma^{k+1}| \\ & = \frac{3^{k+1} - 1}{2} + |\Sigma^{k+1}| & \text{(IH)} \\ & = \frac{3^{k+1} - 1}{2} + 3^{k+1} & \text{(part a.)} \\ & = \frac{3^{k+2} - 1}{2} \end{aligned} \]

as required.

1.3

A compression algorithm on length \(n\) strings is a function \(C\) that compresses all strings of length exactly \(n\) into strictly shorter strings.

Let \(C\) be a compression function. What is the domain and codomain of \(C\)?

Solution

The domain is \(\Sigma^n\). The codomain is \(\Sigma^{\leq n-1}\).

1.4

A compression algorithm \(C\) is lossless if for any string \(x\), it is possible to fully recover \(x\) from \(C(x)\).

Let \(A\), \(B\) be the domain and codomain you found in the previous question.

Select the property below such that for every function \(C: A \to B\), \(C\) has the property if and only if \(C\) is a lossless compression algorithm.

  • Symmetric

  • Injective

  • Surjective

  • Increasing

  • Bijective

Solution

Injective

1.5

The aliens ask you to help them design a lossless compression algorithm on length \(100\) strings.

Prove that this is in fact an impossible task!

Solution

We need an injective function from \(\Sigma^{100} \to \Sigma^{\leq 99}\).

The domain has size \(3^{100}\), the codomain has size \(\frac{3^{100} - 1}{2} < 3^{100}\). By the Pigeonhole Principle, there is no such function.

1.6

One alien language is \[S = \{w \in \{a, b, c\}^*: \text{none of } aa, bb, cc \text{ are substrings of w}\}\]

I.e. \(S\) contains all strings where no consecutive letters are the same.

Show that \(S\) is regular.

If you choose to draw a DFA for \(S\), use at most 5 states. You do not have to prove that your DFA works.

Solution

Define * \(A = \{w \in \{a, b, c\}^*: aa \text{ is a substring of } w\}\) * \(B = \{w \in \{a, b, c\}^*: bb \text{ is a substring of } w\}\) * \(C = \{w \in \{a, b, c\}^*: cc \text{ is a substring of } w\}\)

\(A\) has regular expression \(\Sigma^*aa\Sigma^*\), \(B\) has regular expression \(\Sigma^*bb\Sigma^*\), and \(C\) has regular expression \(\Sigma^*cc\Sigma^*\).

Then \(S = \overline{A} \cap \overline{B} \cap \overline{C}\). Thus \(S\) is regular since the complement of a regular language is regular, and the intersection of regular languages is regular.

1.7

Let \(S_n\subseteq\Sigma^n\) be the subset of \(S\) containing strings of length exactly \(n\) I.e. for all \(n \in \mathbb{N}\), \(S_n = \{w \in S: |w| = n\}\).

Show that \(S_n\) is regular.

Solution

\(S_n\) is finite and thus regular.

1.8

Another alien language allows consecutive \(b\)s and \(c\)s but not \(a\)s. Let \(X = \{w \in \Sigma^*: w \text{ does not have consecutive $a$s} \}\). Again, let \(X_n = \{w \in X: |w| = n\}\), be the subset containing strings of length \(n\).

Find a recurrence (including the base case(s)) for \(|X_n|\), i.e. write

Solution

\[ |X_n| = \begin{cases} 1 & n = 0 \\ 3 & n = 1 \\ 2|X_{n-1}| + 2|X_{n-2}| & n > 1 \end{cases} \]

2 Sandwich

  • Let \(\mathrm{Chars}= \{a, b, c, ..., z\}\) and use \(\epsilon\) to denote the empty string (which is “” in Python).

  • Let \(\mathrm{Strings}= \{\epsilon, a, b,..., aa,ab,...\}\) be the set of all strings over the alphabet \(\mathrm{Chars}\). I.e. it is the set of sequences of \(0\) or more characters in \(\mathrm{Chars}\).

  • Let \(\mathrm{Sandwich}: \mathrm{Strings}\times \mathrm{Strings}\to \mathrm{Strings}\) be the function that maps \((x, y) \to xyx\). For example, \[\mathrm{Sandwich}(\text{`ice'}, \text{`cream'}) = \text{`icecreamice'}.\]

2.1

Is \(\mathrm{Sandwich}\) injective? Prove your answer

Solution

Not injective: \(\mathrm{Sandwich}(\text{a}, \epsilon) = aa = \mathrm{Sandwich}(\epsilon, aa)\).

2.2

Is \(\mathrm{Sandwich}\) surjective? Prove your answer

Solution

Surjective. Let \(w \in \mathrm{Strings}\). \(w = \mathrm{Sandwich}(\epsilon, w)\).

2.3

Let \(B = \mathrm{Chars}\cup \{\epsilon\}\), and let \(X\) be the set generated from \(B\) by \(\{\mathrm{Sandwich}\}\).

Prove that \(\forall w \in X\), at most \(1\) character in \(\mathrm{Chars}\) appears an odd number of times.

Solution

By Structural Induction.

Base Case. Every string in the \(B\) has length \(0\) or \(1\). Thus, there it is impossible for more than one character to appear an odd number of times.

Inductive Step. For any \(\sigma \in \mathrm{Chars}\), and \(\varphi \in \mathrm{Strings}\) let \(\mathrm{Count}(\sigma, \varphi)\) be the number of times a character \(\sigma\) appears in \(\varphi\).

Let \(\alpha, \beta \in X\), and suppose for the inductive hypothesis that at most one character appears an odd number of times in \(\alpha\), and \(\beta\). We’ll now show that the same is true for \(\mathrm{Sandwich}(\alpha, \beta) = \alpha\beta\alpha\).

Let \(\sigma\) be any character. Then we have \(\mathrm{Count}(\sigma, \alpha\beta\alpha) = 2\mathrm{Count}(\sigma, \alpha) + \mathrm{Count}(\sigma, \beta)\). Note that this is odd if and only if \(\mathrm{Count}(\sigma, \beta)\) is odd. By the inductive hypothesis, \(\mathrm{Count}(\sigma, \beta)\) is odd for at most one character. Therefore, \(\mathrm{Count}(\sigma, \alpha\beta\alpha)\) is odd for at most one character.

2.4

  • Let \(\mathrm{Reverse}: \mathrm{Strings}\to \mathrm{Strings}\) be the function that maps a string to its reverse. For example, \[\mathrm{Reverse}(\text{`hello'}) = \text{`olleh'}.\]

  • Let \(P = \{w \in \mathrm{Strings}: w = \mathrm{Reverse}(w) \}\). Note that this is the set of Palindromes.

Show that \(X = P\) (i.e. \(X\) is the set from the previous problem).

Solution

First we’ll show \(X \subseteq P\), namely, that \(\forall x \in X. x \in P\). By structural induction.

Base Case. For the base case, we have that \(\epsilon\), and all length \(1\) strings are equal to their reverse, and hence in \(P\).

Inductive Step. Let \(\alpha, \beta \in X\), and suppose \(\alpha, \beta \in P\), that is \(\alpha=\mathrm{Reverse}(\alpha)\), and \(\beta = \mathrm{Reverse}(\beta)\). We’ll show that \(\mathrm{Sandwich}(\alpha, \beta) = \alpha\beta\alpha \in P\). We have \(\mathrm{Reverse}(\alpha\beta\alpha) = \mathrm{Reverse}(\alpha)\mathrm{Reverse}(\beta)\mathrm{Reverse}(\alpha) = \alpha\beta\alpha\), Thus, \(\alpha\beta\alpha \in P\).

Now, we’ll show that \(P \subseteq X\). By complete induction on the length of the string.

Base Case. We’ll show all length \(0\) and length \(1\) palindromes are in \(X\). The length \(0\) and length \(1\) palindromes are exactly \(\epsilon\) and \(a\) for \(a \in \mathrm{Chars}\), which conveniently is the base set for \(X\). Hence, \(X\) contains them all.

Inductive Step. Let \(k \in \mathbb{N}\) with \(k \geq 1\), and assume that all strings of length at most \(k\) in \(P\), are also in \(X\). Let \(\alpha \in P\) be any string of length \(k+1\). Since \(\alpha \in P\), we have \(\alpha = \mathrm{Reverse}(\alpha)\). Therefore, the first and last character must be equal, thus, we can write \(\alpha = \sigma \alpha' \sigma\), where \(\sigma = \alpha[0]\), and \(\alpha' = \alpha[1:-1]\) (using Python notation). Note that since \(\alpha = \mathrm{Reverse}(\alpha)\), we have \(\alpha' = \mathrm{Reverse}(\alpha')\), and hence \(\alpha' \in P\). By the inductive hypothesis, since \(\alpha'\) is a string of length \(k-1\), \(\alpha' \in P\). From the base case, \(\sigma\) is also in \(P\). Thus \(\mathrm{Sandwich}(\sigma, \alpha') = \sigma \alpha ' \sigma \in P\), and of course, \(\sigma \alpha' \sigma = \alpha\).

2.5

Suppose we changed the domain and codomain of the \(\mathrm{Sandwich}\) function. Each row of the table below represents one possiblility. For a given row, write a ‘Y’ in the injective column if the function is injective on that domain/codomain, and a ‘N’ otherwise. Same for the surjective column.

Domain Codomain Injective? Surjective?
\(P \times P\) \(P\)
\(\mathrm{Chars}\times \mathrm{Strings}\) \(\mathrm{Strings}\)
\(\mathrm{Chars}\cup \{\epsilon\} \times \mathrm{Strings}\) \(\mathrm{Strings}\)
\(P \times \mathrm{Chars}\cup \{\epsilon\}\) \(P\)
Solution
Domain Codomain Injective? Surjective?
\(P \times P\) \(P\) N Y
\(\mathrm{Chars}\times \mathrm{Strings}\) \(\mathrm{Strings}\) Y N
\(\mathrm{Chars}\cup \{\epsilon\} \times \mathrm{Strings}\) \(\mathrm{Strings}\) N Y
\(P \times \mathrm{Chars}\cup \{\epsilon\}\) \(P\) Y N