Writing Resources
Computer Science Writing Advice
The following is an incomplete list of writing tips that I find useful when writing about computer science research in English. Please keep in mind that this article is full of my opinions and preferences. Slides from a talk I give on CS writing are also available.
First, a bit about my bias. Strunk and White’s “The Elements of Style” is a great starting point for CS writers. Yes, it is prescriptive. Nonetheless, you need to know the rules, even the “small” rules about when to capitalize and where to put commas. When you deviate from expected rules, you draw a reader’s attention. A reader’s attention is a precious gift, not to be wasted. In scientific writing, you want your ideas to receive the attention, not your writing.
This advice is very specific to English CS writing. For an insightful essay on how good writing in English differs from other languages, I recommend Zinsser’s essay “Writing English as a Second Language”. Many of Zinsser’s tips for journalists apply to computer scientists. Simple words and simple sentences are powerful!
Some other favorite resources are Simon Peyton Jones’s slides and video on how to write a great research paper, and Margo Seltzer’s slides on how to give a good talk.
Rules of Thumb
- Make definite assertions. Avoid tame, colorless, hesitating, non-committal language.
- Simple sentences are effective. Avoid long, convoluted, perhaps even warped, phrases that twist, turn and distract from the point one might wish, in this ever-so literary of worlds, to make in the most elegant way possible. Do not use long sentences with complex grammar just to show that you can. Particularly in technical writing, simple sentence structure enhances clarity. (See Strunk’s Elements of Style.)
- Be consistent in your writing style. The style should not distract the reader from the concepts you are trying to communicate. Never switch between different notations or conventions within one paper.
Using Citations
- A citation is an annotation for a sentence. It is not part
of the sentence and should play no grammatical role in the
sentence. In other words, if you remove the citation, the
sentence should still be grammatically correct and complete.
- Wrong “Thirty-second quasi-lunar normal form is defined in [AO72].”
- Wrong “[AO72] contains a definition of…”
- Right “Alpha and Omega defined thirty-second quasi-lunar normal form [AO72].”
- OK “Many researchers have studied these normal forms [AO72,ABC00,XYZ+80].”
I recommend avoiding lists of citations like this. It should be clear to the reader why you are citing a work and what connection it has to your work.
- Unless a journal or conference requires numeric citations, use something more informative than cryptic numbers. It is hard for a reader to remember the meaning of Reference 42. If you use the common convention of author initials and year (e.g., [AO72]), it is much easier to remember references that recur.
- The Latin phrase et al. (meaning ‘and others’) is
used for a set of three or more authors (never for one or
two authors). Note that it is an abbreviation of et alia, so
there is a period after the second word but not the first.
- Wrong “Alpha et al. defined thirty-second quasi-lunar normal form [AO72].”
- Right “Alpha and Omega defined thirty-second quasi-lunar normal form [AO72].”
- Right “In follow-on work, Alpha et al. defined thirty-third quasi-lunar normal form [ABC00].” (This indicates that the publication has three (or more) authors, the first of whom is Alpha.)
- The phrase et al. may be italicized because it is a foreign phrase. Foreign words may be italicized in English writing, though this is usually done only for words that are likely to be unfamiliar to the reader. One can certainly argue that et al. is so common in computer science research that it need not be italicized, and it often is not.
Abbreviations, Initialisms, Acronyms
- The abbreviation e.g. means “for example”, while i.e. means “that is”. Do not mix them up. Both abbreviations are commonly followed by a comma; that is, you should punctuate them as you would the equivalent English phrase.
- Since e.g. indicates a partial list, many feel it is redundant to add “etc.” at the end of a list introduced by this abbreviation.
- My preference is to not start a sentence with an abbreviation. “I.E., I would prefer not to see this sentence.”
- Since computer scientists love initialisms and acronyms, we should understand how to use them properly. Here’s a good article on this: http://www.dailywritingtips.com/initialisms-and-acronyms/.
Good Mathematical Writing
A thorough resource is the book Mathematical Writing by Knuth, Larrabee, and Roberts. A few brief pointers on common mistakes are given below.
- Do not start sentences with symbols, even capitalized symbols.
- Wrong R1 and R2 are disjoint. f is a total function.
- Right The relations R1 and R2 are disjoint. Function f is total.
- Do not punctuate math symbols. This can be very misleading if
a reader interprets the punctuation as mathematical notation.
- Wrong Relation R1 is incomplete Figure 1. f leaves … Does this refer to a Figure 1.f that leaves, or does one sentence end with Figure 1 while the next one begins with f?
- Wrong There are 23⁵ other left-deep plans that differ only in the order that tables are joined.
This is a real example from “Database Management Systems”, Ramakrishnan and Gehrke, 3rd Edition, Page 415. Any database researcher worth her salt knows there are only 23 other left-deep plans. So why did Ramakrishnan and Gehrke claim there are a whopping 6,436,343 (that is, 23⁵) plans?
Answer: they didn’t; they annotated a number (23) with a footnote (5). They had me confused!
- Right There are 23 other left-deep plans that differ only in the order that tables are joined.⁵
If you are expressing one idea per sentence, then a footnote belongs at the end of the sentence, after the period (with no space between the period and the footnote mark).
- Avoid using notation with multiple, or (horrors!) nested, sub- or superscripts.
- Do not use notation for the sake of notation. Often, it is clearer to use prose.
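As a small illustration, the “Right” examples above might look like this in LaTeX source. This is a sketch. The footnote text is a placeholder, not the book’s actual footnote. The \footnote command follows the period directly on the same line, because a line break before it would act as a space and separate the mark from the period.

```latex
\documentclass{article}
\begin{document}
% Start each sentence with a word, not a symbol.
The relations $R_1$ and $R_2$ are disjoint. Function $f$ is total.

% The footnote mark comes right after the period, with no space,
% so it cannot be mistaken for an exponent on the number 23.
There are 23 other left-deep plans that differ only in the order
that tables are joined.\footnote{Placeholder footnote text.}
\end{document}
```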
Common Mistakes
Gerunds and infinitives are not interchangeable! Both can be used as objects, but some verbs take gerunds, some take infinitives, and some can take infinitives with an agent.
For example, “The experiment required to implement” is
wrong, but “The experiment required the user to implement…”
is fine, as is “The experiment required the implementation
of…” Also, “The experiment involves to implement…” is
wrong, while “The experiment involves implementing…” is
correct.
In contrast, “I managed to implement…” is correct,
while “I managed implementing…” is not.
For more information, search for “gerunds and infinitives” on the web.
Uncountable Nouns.
Some nouns (like information, knowledge,
research, and advice) are considered uncountable in English,
meaning you cannot count them by writing one information, two
informations, three informations, etc.
Most nouns are countable, so I can say one algorithm or two
algorithms. A rule of thumb is that for uncountable nouns,
you do not make the plural (“knowledges” is wrong), you do not
use them with the article a (“a knowledge” is wrong), and you
use them with singular verbs even when you are using them in a
collectively plural sense (“knowledge are elusive” is wrong).
- Wrong An advice I follow… (or Advices I follow…)
- Wrong Research have shown… (or Researches have shown…)
- Right Advice I follow… (or The best advice I have received is to…)
- Right Research has shown (or shows)…
The same comment applies to the word ‘work’ when it refers to a body of research. Note that when work refers to a single unit of something, such as a work of art, it is countable. The sentence “We saw two works of art by Rodin” is correct.
- Awkward “There are numerous works on data integration….”
- My preference “There is considerable work on data integration…”
Notice that I changed the adjective. Since work is uncountable in the context above, use an adjective that is not associated with enumeration (unlike numerous). Saying “We saw numerous works of art by Rodin” is certainly correct.
There are lists of uncountable nouns on the web. Note that in modern usage, data (like dust) is uncountable. In the past, you might read “a datum is” or “many data are”, but in modern use we have adopted the (formerly plural) word data as an uncountable noun that is used with a singular verb (like information).
See a favorite essay of mine on Big Data, which explains the adoption of data as a mass noun.
Compound modifiers and compound nouns should be
hyphenated appropriately to avoid ambiguity.
- eerie-blue eyes (the eyes are an eerie shade of blue, so the color is eerie)
- eerie blue eyes (the eyes are both blue and eerie, so the eyes themselves are eerie)
Here is another example for computer scientists.
- A binary-data structure is a structure for holding binary data. For example, a bitmap index holds binary data.
- A binary data-structure is a data structure that is binary. For example, a binary tree is a binary data-structure and can of course hold data that is not binary. It can contain strings or integers.
See www.dailywritingtips.com or search for “compound modifiers and hyphens”.
Bullet lists are overused by many CS writers (including in this blog).
They can be effective for drawing a reader’s attention to a
set of important statements. However, they are not an excuse
for writing abbreviated or sloppy prose.
Bullet lists
should be punctuated consistently. You should use consistent sentence or
phrase structure in each item. If the list contains full
sentences, it should not be started with a colon.
- The following is wrong:
- First, we prove that quasi-lunar is equivalent to semi-solar.
- We also consider solar flare designs showing them to be
- frequent; and
- proving such designs are numerous.
- The following is a correct statement of our
contributions.
- First, we prove that quasi-lunar is equivalent to semi-solar.
- Second, we show that solar flare designs are both frequent and numerous.
If each item in your bullet list is a single phrase, then the list can be introduced by a colon and the items punctuated with semicolons.
- Asymmetric quasi-linear hashing has the following properties:
- a sweet fragrance, one that attracts users;
- a vibrant purple and orange color; and
- a linear complexity unless used in an inverted manner.
If the items in your bullet list contain more than one sentence, then the list should be introduced by a sentence ending in a period, and each item should consist of full sentences (each starting with a capital letter and ending in a period). In this case, there is no “and” before the last bullet. My advice is to use a similar construction in each bullet.
Enumerated nouns (nouns followed by a number or symbol, such as Figure 1) should either all be capitalized, which is the convention preferred in most CS forums, or none of them should be. Do not switch back and forth on a whim. Some examples include the following.
- See Figure 1 in Appendix A.
- We will use Function f in Equation 32a.
- In our experiments, Iguana 17 performed very well.
A reason for this is that the capitalization is a visual clue that the number or symbol following the noun is part of that noun. This makes the sentence easier to parse for a reader.
- …in figure 2 mice are white… Does this mean that there are 2 white mice in the figure, or that in the second figure the mice are (all) white?
If you feel that such capitalization is unwarranted, more power to you. However, this conviction is not a license for being inconsistent. If you refuse to capitalize Iguana 17, then why are you capitalizing Section 2?
Note that section and chapter are not special words in English that are always capitalized. They follow the same capitalization rules as regular nouns. We often see them capitalized in science writing because they are frequently enumerated.
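One way to keep this capitalization consistent in LaTeX is to define a macro for each kind of enumerated noun and use it everywhere. The names \figref and \secref below are hypothetical choices for illustration, not part of standard LaTeX. If a class or package you use already defines one of them, \newcommand will report an error and you should pick another name. If you later decide not to capitalize, you edit one line instead of searching the whole paper.

```latex
\documentclass{article}
% Hypothetical helper macros: the capitalization and the non-breaking
% space are fixed in one place, so every reference looks the same.
\newcommand{\figref}[1]{Figure~\ref{#1}}
\newcommand{\secref}[1]{Section~\ref{#1}}
\begin{document}
\section{Results}\label{sec:results}
\begin{figure}[h]
  \centering
  \fbox{placeholder}
  \caption{An example figure.}\label{fig:example1}
\end{figure}
As \figref{fig:example1} shows, the results in \secref{sec:results} hold.
\end{document}
```

Run LaTeX twice so that the references resolve to “Figure 1” and “Section 1”.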
How and what are not interchangeable. How, unlike what, is not a pronoun. Hence, how cannot be used to represent a noun or noun phrase.
- Right “what it looks like” and “how it looks”.
- Wrong “how it looks like”. For example, “We now describe how the process will look like in this situation.”
Some minor pet peeves of mine (but I may be old-fashioned).
- A preposition is not something to end a sentence (or a phrase) with. (Remember that ‘to’ is often used as a preposition unless one knows more than one ought to.)
- So-called means “commonly named” or more often “improperly named” (and thus carries a negative connotation). It does not just mean “named”.
- On the other hand is a rather pedestrian phrase that should be used sparingly in scientific writing. It should never be used to present a set of three (or more) alternatives unless you come from a planet of three-handed people.
LaTeX advice for my students
- In LaTeX, “~” has the same semantics as a space (“ ”),
except that it prevents a line break. So I recommend using a tilde
(instead of a space) before a citation or reference. It is
also useful after a “.” that is not a period (for example,
a “.” used in an abbreviation). This is because most
typesetting systems put more space after a period than they
do after a “.” used for another purpose. So the “~” in
LaTeX indicates that only a normal amount of space should be
used (not the larger “period” amount of space). This is
easier for old farts to remember, since in our typing
classes we learned to put two spaces after a period and
only one after an abbreviation.
- “Mendelzon~\cite{Men98} presented” will be formatted as “Mendelzon [Men98] presented”, but the citation will always appear on the same line as Mendelzon.
- “Figure~\ref{fig:example1} shows” will be formatted as “Figure 1 shows”, but the 1 will never appear starting a new line (which would look very odd and be hard to read).
- “Prof.~Li gave the first lecture. She was very articulate.” In the LaTeX output, there will be more space after “lecture.” than between “Prof.” and “Li”.
- The All project in svn contains an example paper with some useful macros for reducing space in lists, among other things. It also contains an example bibliography file.
- Try to avoid widows and orphans (see the Wikipedia page on these editing terms), especially if you are trying to fit your paper into a page limit. Often this will mean rewording a sentence or removing an unnecessary word. But you’ll be surprised how many words are unnecessary…
- Section and subsection titles should fit on a single line, so in a two-column paper they have to be short.
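A minimal, self-contained file that exercises the tilde examples above might look like the following. This is a sketch. The figure and the Men98 bibliography entry are placeholders added only so the file compiles on its own. In a real paper, they come from your own figures and .bib file. Run LaTeX twice so that the citation and the figure reference resolve.

```latex
\documentclass{article}
\begin{document}
\begin{figure}[h]
  \centering
  \fbox{placeholder}
  \caption{An example figure.}\label{fig:example1}
\end{figure}

% "~" is an interword space that LaTeX never breaks across lines, so the
% citation stays with the name and the number stays with "Figure".
Mendelzon~\cite{Men98} presented the approach, and
Figure~\ref{fig:example1} shows an example.

% After an abbreviation, "~" gives a normal interword space instead of
% the wider space LaTeX uses after a sentence-ending period.
Prof.~Li gave the first lecture. She was very articulate.

% Placeholder bibliography entry so that \cite{Men98} resolves to [Men98].
\begin{thebibliography}{Men98}
  \bibitem[Men98]{Men98} Placeholder entry for Men98.
\end{thebibliography}
\end{document}
```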