2025-08-06
To show a language is regular, we can construct a DFA, an NFA, or a regular expression for it.
To show a language is not regular, use the Myhill-Nerode Theorem.
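Find an infinite set of strings \(S \subseteq \Sigma^*\) such that every pair of distinct strings \(x, y \in S\) is distinguishable with respect to the language, i.e. there is some \(z \in \Sigma^*\) such that exactly one of \(xz\) and \(yz\) is in the language.
Intuition: a language is regular if membership can be decided using a fixed, finite amount of memory.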
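Here is a small sanity check of the Myhill-Nerode recipe on an example language of our own choosing, \(L = \{0^n1^n : n \in \mathbb{N}\}\), with \(S = \{0^n : n \in \mathbb{N}\}\) and the suffix \(z = 1^i\) separating \(0^i\) from \(0^j\). The helper names are hypothetical, and the code only tests finitely many pairs, so it is a check of the idea, not a proof.

```python
def in_L(w: str) -> bool:
    """Membership in the example language L = {0^n 1^n : n in N}."""
    n = len(w) // 2
    return len(w) % 2 == 0 and w == "0" * n + "1" * n


def distinguishes(x: str, y: str, z: str) -> bool:
    """z distinguishes x and y iff exactly one of xz, yz is in L."""
    return in_L(x + z) != in_L(y + z)


def check_pairs(N: int) -> bool:
    """For S = {0^i}, check that z = 1^i separates 0^i from 0^j for all i != j < N."""
    return all(
        distinguishes("0" * i, "0" * j, "1" * i)
        for i in range(N)
        for j in range(N)
        if i != j
    )
```

In the written proof, the same choice of \(z = 1^i\) is argued for arbitrary \(i \neq j\): \(0^i1^i \in L\) but \(0^j1^i \notin L\).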
Similar to stuff you have seen in homework, lecture and tutorial.
The exam is cumulative, with a slight emphasis on Formal Language Theory.
The exam is not easy. It’s designed to be challenging in order to give you a chance to demonstrate your understanding of the material. I look over the exams before submitting the final grades, so if I see that people seem to have a better understanding of the course material than their grade reflects, I may make adjustments.
Here are the main question types that we have seen in the course and my recommendations for how to approach them in an exam and how to study for them.
Write out the formal first-order logic definitions for what you’re trying to prove. Let that guide your solution.
Study tips
Remember the intuition for injective/surjective/bijective. This might help you come up with a solution.
Remember the formal definitions for injective/surjective/bijective. This will help you write the proof.
Writing it up
Explicitly define your graph \(G = (V, E)\).
Tell me exactly what the vertex set \(V\) is.
Tell me exactly what the edge set \(E\) is.
State the corresponding graph problem.
Explain why a solution to the graph problem is a solution to the problem in question and vice versa.
Study tips
Recognizing when to do something by induction.
When the problem is something like \(\forall n \in \mathbb{N}. P(n)\).
When I give you an inductively/recursively defined set - that’s almost always structural induction.
Writing it up
When proving a statement, translate it into \(\forall n \in \mathbb{N}. P(n)\) (or, for structural induction, \(\forall x \in X. P(x)\), where \(X\) is an inductively defined set). This step will clarify your proof process.
Clearly label the base case and inductive step.
For the inductive step:
Clearly state your inductive hypothesis.
State what you’re trying to prove, i.e. \(P(k+1)\).
Clearly state where you are applying the inductive hypothesis.
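As a sketch of this write-up structure, here is a worked proof of an example statement of our own choosing, \(P(n): \sum_{i=0}^{n} i = \frac{n(n+1)}{2}\), for all \(n \in \mathbb{N}\).
Base case (\(n = 0\)): \(\sum_{i=0}^{0} i = 0 = \frac{0 \cdot 1}{2}\), so \(P(0)\) holds.
Inductive step: let \(k \in \mathbb{N}\) and assume \(P(k)\), i.e. \(\sum_{i=0}^{k} i = \frac{k(k+1)}{2}\) (IH). We want to prove \(P(k+1)\):
\[ \begin{align*} \sum_{i=0}^{k+1} i &= \sum_{i=0}^{k} i + (k+1) \\ &= \frac{k(k+1)}{2} + (k+1) && \text{(by the IH)} \\ &= \frac{(k+1)(k+2)}{2}. \end{align*} \]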
Study tips
Deciding on a method
Can it be done using the Master Method? If so, use it!
Otherwise, draw the recursion tree to form a guess, and then use the substitution method.
State clearly that the master theorem applies in this case. “This is a standard form recurrence with parameters \(a = ...\), \(b = ...\), \(f(n) = ...\)”
Calculate the leaf work, \(n^{\log_b(a)}\), and determine the case split.
In the leaf-heavy or root-heavy case, find an explicit \(\epsilon > 0\), like \(0.0000001\).
In the root heavy case don’t forget the regularity condition!
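The case split can be sketched in Python for the common special case \(f(n) = n^d\). `master_polynomial` is a hypothetical helper of our own; it assumes \(a \geq 1\), \(b > 1\), and a polynomial \(f\), so it does not cover general \(f(n)\) such as \(n \log n\). It reports an explicit \(\epsilon\) in the unbalanced cases and checks the regularity condition in the root-heavy case.

```python
import math


def master_polynomial(a: int, b: int, d: float) -> str:
    """Classify T(n) = a*T(n/b) + n^d using the master theorem.

    Hypothetical helper: only handles f(n) = n^d, not general f(n).
    """
    if a < 1 or b <= 1:
        raise ValueError("need a >= 1 and b > 1")
    crit = math.log(a, b)  # leaf work is n^{log_b a}
    if math.isclose(d, crit):
        return f"balanced: Theta(n^{crit:g} log n)"
    if d < crit:
        eps = crit - d  # f(n) = O(n^{log_b a - eps})
        return f"leaf heavy (eps = {eps:g}): Theta(n^{crit:g})"
    eps = d - crit  # f(n) = Omega(n^{log_b a + eps})
    # Regularity: a*f(n/b) = (a / b**d) * f(n), and a / b**d < 1 because d > log_b a.
    assert a / b**d < 1
    return f"root heavy (eps = {eps:g}): Theta(n^{d:g})"
```

For example, merge sort's recurrence \(T(n) = 2T(n/2) + n\) corresponds to `master_polynomial(2, 2, 1)`, which lands in the balanced case. On an exam you would still write out the parameters and the comparison by hand.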
Study tips
Draw recursion tree for guess.
Write out your guess \(f(n)\).
Write out the recurrence \(T(n) = T(...) + ...\)
Substitute all the \(T(...)\)s on the RHS with \(c\cdot f(...)\)s, and replace the \(=\) with \(\leq\) for Big O or \(\geq\) for Big \(\Omega\)
You want the RHS to then be \(\leq cf(n)\) (or \(\geq cf(n)\) for Big \(\Omega\)).
Rearrange to find an appropriate value of \(c\).
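A worked instance of these steps on an example recurrence of our own choosing: \(T(n) = T(n-1) + n\) with guess \(f(n) = n^2\), aiming for Big O. Substituting \(T(n-1) \leq c(n-1)^2\) on the RHS gives
\[ \begin{align*} T(n) &\leq c(n-1)^2 + n \\ &= cn^2 - (2c-1)n + c, \end{align*} \]
which is \(\leq cn^2\) exactly when \((2c-1)n \geq c\). Taking \(c = 1\), this becomes \(n \geq 1\), so \(c = 1\) works for all \(n \geq 1\) (the base case, e.g. \(T(0) = 0\), still has to be checked separately).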
Correctness is precondition implies postcondition.
Do complete induction on the size of the input.
The IH is that precondition implies postcondition for smaller instances.
To apply the IH, you need to prove the precondition holds for the recursive step, AND that the recursive step is indeed a smaller instance (and thus captured in the IH)
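As an illustration of where each piece of the argument goes, here is recursive binary search (an example of our own, not from the course materials) with the precondition, postcondition, and "smaller instance" checks written as comments. The size measure is \(n = hi - lo\).

```python
def bsearch(A, x, lo, hi):
    """Precondition: 0 <= lo <= hi <= len(A) and A[lo:hi] is sorted.
    Postcondition: returns True iff x occurs in A[lo:hi]."""
    if lo >= hi:
        # Base case: size hi - lo == 0, the range is empty.
        return False
    mid = (lo + hi) // 2  # lo <= mid < hi
    if A[mid] == x:
        return True
    if A[mid] < x:
        # Everything in A[lo:mid+1] is <= A[mid] < x, so x can only be in A[mid+1:hi].
        # Precondition still holds, and size hi - (mid+1) < hi - lo, so the IH applies.
        return bsearch(A, x, mid + 1, hi)
    # A[mid] > x: x can only be in A[lo:mid], whose size mid - lo < hi - lo.
    return bsearch(A, x, lo, mid)
```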
For iterative algorithms, do NOT try to do induction on the size of the input.
Define a loop invariant.
Trace the algorithm on some example inputs to get an idea of what each variable corresponds to.
Use that intuition to come up with a loop invariant.
It’s almost always useful to include: after the \(k\)th iteration \(i_k = ...\) where \(i\) is the iteration variable.
Sketch initialization, maintenance, and termination BEFORE writing it up. If everything seems to work out ok, you can start writing it up. Otherwise, you found something that didn’t work, so you need to update your LI.
Explicitly state your loop invariant: “\(P(n):\) After the \(n\)th iteration ...”
Prove initialization/maintenance/termination
Termination: show that the loop terminates, using either the loop invariant or the descending sequence strategy, and prove that the LI after the last iteration implies the postcondition.
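A small example of our own showing the shape of a loop invariant "After the \(k\)th iteration ..." for a summing loop. The runtime `assert`s check the LI on each run; they are a testing aid while you look for the right invariant, not a replacement for the written proof.

```python
def sum_list(A):
    """Precondition: A is a list of integers.
    Postcondition: returns A[0] + A[1] + ... + A[len(A) - 1]."""
    s, i = 0, 0
    k = 0  # iteration counter, only used to state the LI
    # LI P(k): after the k-th iteration, i == k and s == A[0] + ... + A[k - 1].
    assert i == k and s == sum(A[:i])  # initialization: P(0)
    while i < len(A):
        s += A[i]
        i += 1
        k += 1
        assert i == k and s == sum(A[:i])  # maintenance: P(k - 1) implies P(k)
    # Termination: len(A) - i is a natural number that strictly decreases each
    # iteration, so the loop ends, and then i == len(A). Together with the LI,
    # s == A[0] + ... + A[len(A) - 1], which is the postcondition.
    return s
```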
| | Pros | Cons |
|---|---|---|
| Regex | Fast, great for matching substrings/beginnings/endings | Easy to match too many things |
| NFA | Powerful, great for counting # of occurrences mod N | Slower than regex, and can be harder to check than DFAs |
| DFA | Safe. Once you write a DFA, it is easier to check (you don’t need to follow different sequences of choices) | Takes longer to write |
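To make the comparison concrete, here is a sketch for an example language of our own choosing, binary strings whose number of 1s is divisible by 3: a DFA given as a transition table, and a regular expression for the same language, both cross-checked against the definition on all short strings. Checking your answer like this against brute force is a good habit when practising.

```python
import re
from itertools import product

# DFA: state = (number of 1s read so far) mod 3; start and accept in state 0.
DELTA = {(q, "0"): q for q in range(3)}
DELTA.update({(q, "1"): (q + 1) % 3 for q in range(3)})
START, ACCEPTING = 0, {0}


def dfa_accepts(w: str) -> bool:
    q = START
    for ch in w:
        q = DELTA[(q, ch)]  # exactly one move per symbol: no choices to follow
    return q in ACCEPTING


# Regex for the same language: leading 0s, then blocks with exactly three 1s each.
REGEX = re.compile(r"0*(10*10*10*)*")


def regex_accepts(w: str) -> bool:
    return REGEX.fullmatch(w) is not None


# Cross-check both against the definition on every binary string of length <= 8.
for n in range(9):
    for bits in product("01", repeat=n):
        w = "".join(bits)
        expected = w.count("1") % 3 == 0
        assert dfa_accepts(w) == expected == regex_accepts(w)
```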
Study tips
For all the languages I have given you in lectures, tutorials, and homeworks, come up with DFAs/NFAs and regular expressions for them.
Get a lot of practice. This is particularly important for these kinds of problems since it increases the likelihood of you being able to relate a language on the exam to something you’ve seen before.
Questions of the form, suppose \(A\), \(B\) are regular, then so is (some transformation of \(A, B\))
Assume you have DFAs for \(A\), and \(B\).
Define an NFA based on the DFAs for \(A\) and \(B\).
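One instance of this recipe, sketched in Python for the transformation \(A \cup B\) (our choice of example): the NFA has a fresh start state with \(\epsilon\)-transitions to the start states of both DFAs, and its states are tagged `"A"`/`"B"` so the two state sets stay disjoint. The representation (dictionaries, `""` as the \(\epsilon\) label) and the example DFAs are assumptions made for illustration.

```python
from itertools import product

EPS = ""  # label used for epsilon transitions


def union_nfa(dfa_a, dfa_b):
    """Build an NFA for A ∪ B from DFAs given as (start, accepting, delta),
    where delta maps (state, symbol) -> state."""
    (sa, fa, da), (sb, fb, db) = dfa_a, dfa_b
    delta = {("new", EPS): {("A", sa), ("B", sb)}}  # fresh start state "new"
    for tag, d in (("A", da), ("B", db)):
        for (q, ch), r in d.items():
            delta[((tag, q), ch)] = {(tag, r)}
    accepting = {("A", q) for q in fa} | {("B", q) for q in fb}
    return "new", accepting, delta


def eps_closure(states, delta):
    """All states reachable from `states` using only epsilon transitions."""
    stack, closure = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, EPS), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure


def nfa_accepts(nfa, w):
    start, accepting, delta = nfa
    current = eps_closure({start}, delta)
    for ch in w:
        step = {r for q in current for r in delta.get((q, ch), set())}
        current = eps_closure(step, delta)
    return bool(current & accepting)


# Example DFAs: A = even number of 0s, B = ends in 1.
EVEN_ZEROS = (0, {0}, {(0, "0"): 1, (0, "1"): 0, (1, "0"): 0, (1, "1"): 1})
ENDS_IN_1 = ("n", {"y"}, {("n", "0"): "n", ("n", "1"): "y",
                          ("y", "0"): "n", ("y", "1"): "y"})

UNION = union_nfa(EVEN_ZEROS, ENDS_IN_1)
for n in range(7):
    for bits in product("01", repeat=n):
        w = "".join(bits)
        assert nfa_accepts(UNION, w) == (w.count("0") % 2 == 0 or w.endswith("1"))
```

In the written answer, the same construction is stated as a formal 5-tuple, followed by an argument that the NFA accepts exactly the strings in \(A \cup B\).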
To show \(A\) is not regular,
\(G_{236} = (V, E)\)
\(V\) is the set of things we studied.
\(\{u, v\} \in E\) if and only if \(u\) and \(v\) are “directly related”.
We want to be able to decode from many errors.
There should be minimal overhead. I.e. \(c\) shouldn’t be much longer than \(m\) itself.
\(\mathrm{Enc}, \mathrm{Dec}\) should be fast.
QR Codes
Hard drives
\[ \begin{align*} m &= \texttt{hello} \\ c = \mathrm{Enc}(m) &= \texttt{hellohellohello} \\ \tilde{c} &= \texttt{jellohillahelgo} \end{align*} \]
How do you correct the errors?
When will this strategy work or fail? How many errors can I guarantee to correct if I send \(5\) copies instead of just \(3\)? How about \(2a+1\) copies in general, for \(a \in \mathbb{N}\)?
Performance summary
Size blow up: \(3\) (\(2a+1\))
Errors decodable from: \(1\) (\(a\))
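A minimal sketch of the repetition code, assuming decoding by majority vote at each position (one natural answer to the question above). With \(2a+1\) copies and at most \(a\) errors in total, every position still has a correct majority.

```python
from collections import Counter


def rep_encode(m: str, copies: int = 3) -> str:
    """Send `copies` back-to-back copies of the message."""
    return m * copies


def rep_decode(c: str, copies: int = 3) -> str:
    """Split into `copies` blocks and take a majority vote at each position."""
    n = len(c) // copies
    blocks = [c[i * n:(i + 1) * n] for i in range(copies)]
    return "".join(
        Counter(block[j] for block in blocks).most_common(1)[0][0]
        for j in range(n)
    )
```

On the received word from the example, the three copies vote h/e/l/l/o position by position, recovering \(\texttt{hello}\).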
A size blow up of \(3\) to correct just a single error doesn’t seem that worth it! Can we do better?
\[ m \in \{0,1\}^4 \]
\[ c = m_1m_2m_3m_4(m_2\oplus m_3 \oplus m_4)(m_1 \oplus m_3 \oplus m_4)(m_1 \oplus m_2 \oplus m_4) \]
E.g. \(m = 1001\)
\(c = 1001100\)
Claim: we can correct any single error!
What is the message if this was the received codeword (with one error)
\[ \tilde{c} = 1110011 \]
Performance summary
Size blow up: \(7/4\)
Errors decodable from: \(1\)
We get the same error correction performance, but the blow up in the size of the message is only \(7/4\) instead of \(3\)!
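A sketch of this code in Python. `encode` follows the definition of \(c\) above; `decode` is a simple brute-force decoder of our own (not syndrome decoding) that tries all \(16\) messages and returns the one whose codeword differs from the received word in the fewest bits. Because any two distinct codewords differ in at least \(3\) positions, a single flipped bit always leaves the original codeword as the unique closest one. Note that it answers the question about \(\tilde{c} = 1110011\) above.

```python
from itertools import product


def encode(m: str) -> str:
    """m = m1 m2 m3 m4, followed by the three parity bits defined above."""
    m1, m2, m3, m4 = (int(b) for b in m)
    p1 = m2 ^ m3 ^ m4
    p2 = m1 ^ m3 ^ m4
    p3 = m1 ^ m2 ^ m4
    return m + f"{p1}{p2}{p3}"


def distance(x: str, y: str) -> int:
    """Number of positions where x and y differ (Hamming distance)."""
    return sum(a != b for a, b in zip(x, y))


def decode(received: str) -> str:
    """Brute force: the message whose codeword is closest to `received`."""
    messages = ("".join(bits) for bits in product("01", repeat=4))
    return min(messages, key=lambda m: distance(encode(m), received))
```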
What is the optimal tradeoff between the size blow up and the number of errors you can correct from?
How can we construct codes that get a good tradeoff that also have good encoding and decoding algorithms?
\[ \begin{align*} m &= \texttt{hello}, \quad k = 3 \\ c &= \texttt{khoor} \end{align*} \]
Suppose some eavesdropper who doesn’t know the key sees the ciphertext \(\texttt{khoor}\). What can they learn about the message?
They actually get to learn a lot about the message! For example, they know that the third and fourth characters must have been the same in the message and can rule out messages like \(\texttt{peach}\). In fact, of all the \(26^5\) possible sequences of letters that could have been the message, they can narrow it down to just \(26\) possible messages.
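A short sketch of the eavesdropper's attack, assuming the scheme is the shift (Caesar) cipher \(c_i = m_i + k \bmod 26\) on lowercase letters. `shift` and `all_decryptions` are hypothetical helper names. Trying every key yields exactly \(26\) candidate messages, and every candidate keeps the letter-to-letter differences of the real message (so none of them can be \(\texttt{peach}\)).

```python
import string

ALPHABET = string.ascii_lowercase


def shift(text: str, k: int) -> str:
    """Shift each lowercase letter k places forward, wrapping around (k may be negative)."""
    return "".join(ALPHABET[(ALPHABET.index(ch) + k) % 26] for ch in text)


def all_decryptions(ciphertext: str) -> list:
    """What an eavesdropper can do without the key: try all 26 shifts."""
    return [shift(ciphertext, -k) for k in range(26)]


candidates = all_decryptions(shift("hello", 3))
```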
For security, we want the eavesdropper to learn nothing about the real message!
How can we mathematically define “learns nothing”?
Since we don’t have a way of checking all algorithms, we need to make some assumptions of the form: “There is no efficient algorithm to solve \(X\)” i.e. \(X\) is a hard problem.
We then use problem \(X\) to design an encryption scheme. To prove the security of that scheme, we prove that if there is an adversary that can distinguish between the encryptions of two messages, then they can solve problem \(X\) efficiently, which is impossible under our assumption.
What are some hard problems you know that can be used?
Factoring large numbers is a common one. Finding discrete logarithms is another.
Interestingly, these problems are not hard for sufficiently powerful quantum computers :)
Zero Knowledge Proofs
Secret Sharing
Private Information Retrieval
Homomorphic Encryption
We looked at time as a valuable resource for computation. What are some other resources you can think of?
Once we pick a resource R, we can ask...
How much R do we need to solve problem \(X\)?
How much extra time do I need to simulate random or nondeterministic computation?
How much time can I buy with extra space?
For DFAs, we saw that adding nondeterminism gave us no additional power.
We can simulate an NFA using a DFA!
An analogous fact is not known for more general computation.
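The simulation of an NFA by a DFA is the subset construction: each DFA state is the set of NFA states the NFA could currently be in. The sketch below assumes an NFA without \(\epsilon\)-transitions, given as a dictionary from (state, symbol) to a set of states; the example NFA (third-from-last symbol is \(1\)) is our own. Only reachable subsets are built, which here gives \(8\) DFA states from a \(4\)-state NFA.

```python
from itertools import product


def subset_construction(start, accepting, delta, alphabet):
    """Convert an NFA without epsilon moves into a DFA.

    delta maps (state, symbol) -> set of states; a missing key means no move.
    Each DFA state is a frozenset of NFA states (only reachable ones are built).
    """
    d_start = frozenset([start])
    d_delta, seen, todo = {}, {d_start}, [d_start]
    while todo:
        S = todo.pop()
        for a in alphabet:
            T = frozenset(r for q in S for r in delta.get((q, a), ()))
            d_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    # A subset is accepting iff it contains at least one accepting NFA state.
    d_accepting = {S for S in seen if S & accepting}
    return d_start, d_accepting, d_delta


def dfa_accepts(dfa, w):
    q, accepting, delta = dfa
    for ch in w:
        q = delta[(q, ch)]
    return q in accepting


# Example NFA: guess the position of the third-from-last symbol, which must be 1.
NFA_DELTA = {
    (0, "0"): {0}, (0, "1"): {0, 1},
    (1, "0"): {2}, (1, "1"): {2},
    (2, "0"): {3}, (2, "1"): {3},
}
DFA = subset_construction(0, {3}, NFA_DELTA, "01")

for n in range(7):
    for bits in product("01", repeat=n):
        w = "".join(bits)
        assert dfa_accepts(DFA, w) == (len(w) >= 3 and w[-3] == "1")
```

The price is size: in the worst case the DFA can need exponentially many states in the number of NFA states.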
\(P\) is the set of formal languages that can be decided deterministically in polynomial time.
\(NP\) is the set of formal languages that can be decided nondeterministically in polynomial time.
\(NP\) is equivalent to the set of formal languages for which we can verify a proposed solution deterministically in polynomial time.
For example,
\[ \mathrm{Sudoku} = \{S: S \text{ is a sudoku puzzle with a solution}\} \]
Given a sudoku puzzle \(S\), and proposed solution \(S'\), I can quickly check whether or not \(S'\) is indeed a solution for \(S\) and hence \(S \in \mathrm{Sudoku}\)
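A sketch of such a checker for standard \(9 \times 9\) sudoku, assuming both grids are lists of lists of integers with \(0\) marking a blank in the puzzle (`is_valid_solution` is a hypothetical name). It only compares cells and checks each row, column, and box once, so for an \(n^2 \times n^2\) generalization the same approach runs in time polynomial in the grid size. Finding the solution in the first place is the part with no known polynomial-time algorithm for the generalized problem.

```python
def is_valid_solution(puzzle, solution):
    """puzzle, solution: 9x9 lists of ints, 0 marks a blank in the puzzle.
    Returns True iff the solution keeps every given clue and every row,
    column, and 3x3 box contains each of 1..9 exactly once."""
    digits = set(range(1, 10))
    for r in range(9):
        for c in range(9):
            if puzzle[r][c] != 0 and puzzle[r][c] != solution[r][c]:
                return False  # a given clue was changed
    rows = [set(solution[r]) for r in range(9)]
    cols = [{solution[r][c] for r in range(9)} for c in range(9)]
    boxes = [
        {solution[3 * br + i][3 * bc + j] for i in range(3) for j in range(3)}
        for br in range(3)
        for bc in range(3)
    ]
    # 9 cells whose set of values equals {1..9} means each digit appears exactly once.
    return all(group == digits for group in rows + cols + boxes)


# Example: a known valid grid pattern, and a puzzle made by blanking half its cells.
SOLUTION = [[(3 * r + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
PUZZLE = [[v if (r + c) % 2 else 0 for c, v in enumerate(row)]
          for r, row in enumerate(SOLUTION)]
```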
\(NP\) represents the class of problems we can reasonably hope to solve. If we can’t check whether a proposed solution is correct, how can we be confident about any answer?
Does this sudoku have a solution?
Is this a valid solution to the sudoku puzzle?
Is \(P = NP\)?
I.e., if I can check a solution to a problem deterministically in polynomial time, can I also solve the problem deterministically in polynomial time?
Intuition: No! It’s much easier to check a solution than to solve the problem!
This is one of the 7 Millennium Prize Problems, each of which carries a US$1,000,000 prize. So far, only one of the 7 problems has been solved.
Depending on what you liked, here are some topics to explore next! If you liked...
Mathematical foundations...
MAT377 - Mathematical Probability
MAT344 - Introduction to Combinatorics
MAT332 - Introduction to Graph Theory
MAT221/223/224 - Linear Algebra
Algorithms...
CSC263 - Data Structures and Analysis
CSC373 - Algorithm Design, Analysis & Complexity
CSC473 - Advanced Algorithm Design
CSC324 - Principles of Programming Languages
Formal Languages...
CSC448 - Formal Languages and Automata
CSC463 - Computational Complexity and Computability
It’s been a pleasure being your instructor this term! I hope you enjoyed the course.
Best of luck on the exam, and beyond!
CSC236 Summer 2025