John DiMarco on IT (and occasionally other things)
I welcome comments by email to jdd at cs.toronto.edu.

Sun 06 Jun 2021 13:39

The Covid19 Blues

Man playing a guitar
Image by lucasvieirabr from Pixabay

The arts find inspiration in times of trouble, none more so than the sort of music known as the blues. Blues are creative and emotional, sometimes raw, but never fake. Blues are not about superstars and megahits, blues are about the endurance and hope of ordinary people. As Covid19 drags on, endurance and hope are needed more than ever. Here are pointers to a few Covid19-inspired blues tracks that I appreciate.

Enjoy!

/misc permanent link

Thu 31 Dec 2020 22:57

What's Wrong With Passwords on the Internet Anyway?

Completed Login Prompt
Image by Gerd Altmann from Pixabay
More than fifteen years ago, Bill Gates predicted that use of traditional passwords would dwindle. This has happened to a certain extent, but a login and password is still the most commonly used credential for computing authentication. It is increasingly problematic. According to Verizon's 2020 Data Breach Investigations Report (p. 7), 37% of all breaches involved the theft of credentials or the use of stolen credentials. What is the root cause of the problem?

Put in simple terms, a login and password is what a system relies on to know who is who. Your password is secret: only you know what it is, and the system has some way of checking that it is correct. If someone connects to the system with your login and password, the system checks that the password is the right one for your login. If it is, the system concludes that you are the person trying to connect, and lets you in. If you are the only one who knows the password, this approach works, since you are the only person who can provide the correct password. But if criminals know your password too, and use it, the system will think the criminals are you, and will give them access to your account and all your data. The only way to fix this is to change your password to something new that only you know, but by then the damage may well be done.
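To make the mechanics concrete, here is a minimal sketch (in Python, using only the standard library; the function names are my own, purely illustrative) of how a system can check a password without ever storing the password itself:

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None):
    """Derive a salted hash; the system stores (salt, digest), never the password."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)
    return salt, digest

def verify_password(password, salt, stored_digest):
    """Recompute the hash from the supplied password and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)
    return hmac.compare_digest(candidate, stored_digest)
```

Note what the check actually establishes: anyone who can supply the correct password passes it. The system has no way to tell you apart from a criminal who has learned your password.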

Unfortunately, criminals have a pretty effective technique for finding out your login and password: they trick you into telling it to them. "Wait a minute!", you might say, "I won't ever tell a criminal my password. I don't even tell my family my password!" But you tell the system your password every time you log in. So if criminals set up a fake system that looks like the real one, and trick you into trying it, when you tell their fake system your password, the criminals will learn what it is.

This was not a common problem in the past, because it was difficult for criminals to successfully set up fake systems that look convincing. But on the Internet today, it is easy to set up a web site that looks like another site. The only thing that's hard to fake is the first part of the link, the hostname section that comes immediately after the double slash (//) and before the first single slash (/), because that part of the link is used to direct the request to the right system on the Internet. But given that the Internet is available in hundreds of countries, each with its own set of internet service providers, it is often not too difficult for criminals to find somewhere on the Internet where they can register a hostname that is similar-looking to the real thing.
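A rough illustration of why the hostname is the part to scrutinize: everything else in a link is trivially fakeable, but the hostname determines where the request actually goes. A small sketch using Python's standard library (`looks_like` is my own illustrative helper, not a real security check):

```python
from urllib.parse import urlparse

def looks_like(url, expected_host):
    """True only if the link's hostname is expected_host or a subdomain of it."""
    host = urlparse(url).hostname or ""
    return host == expected_host or host.endswith("." + expected_host)

# The path and query prove nothing; only the hostname matters:
looks_like("https://www.walmart.com/deals", "walmart.com")           # genuine
looks_like("https://walmart.com.evil.example/deals", "walmart.com")  # look-alike
```

The second link begins with "walmart.com", but its hostname is actually walmart.com.evil.example, registered by whoever controls evil.example.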

Worse, the rise of messages containing embedded links makes it very easy for criminals to send a fake message (e.g. an email or text) with a link that seems legitimate but really directs you to a fake site. This is called "phishing". Because of the way the web's markup language (HTML) works, it is easy to set up a link that seems to point to one site, but actually points to another. For example, https://www.walmart.com is a link that seems to point to Walmart but really points to Amazon. Most web browsers will let you "hover" over a link to see where it really goes. But do people check every link carefully each time they use it?
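The trick is easy to demonstrate: in HTML, the visible text of a link and its actual destination (the href attribute) are independent. Here is a sketch using Python's standard-library HTML parser to flag links whose visible text is itself a URL that doesn't match the destination (the class name is my own):

```python
from html.parser import HTMLParser

class LinkAuditor(HTMLParser):
    """Collect links whose visible text looks like a URL but points elsewhere."""
    def __init__(self):
        super().__init__()
        self.href = None
        self.text = ""
        self.mismatches = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.href = dict(attrs).get("href", "")
            self.text = ""
    def handle_data(self, data):
        if self.href is not None:
            self.text += data
    def handle_endtag(self, tag):
        if tag == "a" and self.href is not None:
            shown = self.text.strip()
            if shown.startswith("http") and shown != self.href:
                self.mismatches.append((shown, self.href))
            self.href = None

auditor = LinkAuditor()
# The link text says Walmart; the href says Amazon.
auditor.feed('<a href="https://www.amazon.com">https://www.walmart.com</a>')
```

After feeding that one anchor, the auditor's mismatch list contains the deceptive pair, which is exactly what "hovering" over the link would have revealed.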

The problem is made worse by the proliferation of legitimate messages with embedded links to all sorts of cloud services. I recently saw a message from a large organization to its staff, about their pensions. The message contained links to an external site whose name had no resemblance to the organization's name. The message invited the staff to click on those links to see information about their pensions. The message was legitimate: the organization had contracted with an external cloud provider to provide an online pension calculator for staff. But the message said nothing about the cloud provider: it merely contained a link to the calculator. If criminals had sent a similar message containing a malicious link to a fake system somewhere on the Internet, one that prompted staff to enter their login and password, no doubt many staff would have thought it legitimate. How could staff be expected to be able to tell the difference?

A good way to combat the password capturing problem is to require more than just a password to use a system. This is called "two-factor" or "multi-factor" authentication. Your password is one factor, and something else is a second factor, and you must provide both factors to prove to the system that it is you. This helps because the criminals must have both your password and your second factor in order to access your account and data. To ease the authentication burden for users, systems can ask for two factors only sometimes, such as when logging in for the first time in a while, or logging in from a new machine or a new location.

Ideally the second factor should be something that is hard for criminals to capture and use. One problem with a password is that it is a secret that can be used from anywhere on the Internet. With almost 60% of the world's population on the Internet, which now reaches every country in the world, the Internet can hardly be considered a "safe place". A second password, as easily used from anywhere on the Internet as the first, would not be much of an improvement. Worse would be the answers to some personal question about yourself, such as your mother's maiden name or the name of your first school: not only is such information just as easily used as a password, it is information that people may be able to find out in various ways. Answers to personal questions, while sometimes used for authentication, typically do not make a good second factor.

A better second factor is a message sent via a communication channel that goes only to you: for example, an email to your email address, or a text to your cell phone number. When you attempt to log in, the system sends a unique one-time code to you through that channel, and asks you to enter it. The assumption is that criminals won't have access to your email or your cell number, so they won't know the one-time code that the system sent to you, and won't be able to enter it. This is usually a good assumption. But criminals can try to get access to your email or your phone number, and sometimes they succeed. For example, in the case of a cell number, one thing they could try is to call your cell phone provider, tell them they are you and that your phone has been stolen, and request that your phone number be transferred to their new phone.
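The server side of such a scheme is simple enough to sketch in a few lines (Python standard library; the function names are my own, purely illustrative): generate an unguessable short-lived code, send it over the side channel, then compare what the user types against it.

```python
import secrets
import time

def issue_code(digits=6, ttl=300):
    """Generate a random one-time code and its expiry time (ttl in seconds)."""
    code = "".join(secrets.choice("0123456789") for _ in range(digits))
    return code, time.time() + ttl

def check_code(entered, issued, expires_at):
    """Accept only an unexpired, exactly matching code (constant-time compare)."""
    return time.time() < expires_at and secrets.compare_digest(entered, issued)
```

The short expiry matters: even if criminals phish a one-time code, it is only useful to them for a few minutes, unlike a stolen password.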

An even better second factor is a physical device in your possession. This could be a hardware security token that you plug into your computer or that displays a unique, frequently changing code. Or it could be an app on your cell phone that is tied to your unique device. A physical device is an excellent second factor, because most criminals on the Internet are physically distant. To successfully pretend to be you, a criminal would need direct physical access to a device that would likely be located in your purse or pocket.
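Tokens and authenticator apps that display a changing code commonly implement TOTP (RFC 6238): the device and the server share a secret key, and each independently derives the current code from that key and the current time. A compact sketch using only Python's standard library:

```python
import base64
import hashlib
import hmac
import struct
import time

def totp(secret_b32, t=None, period=30, digits=6):
    """Time-based one-time password (RFC 6238, HMAC-SHA1 variant)."""
    key = base64.b32decode(secret_b32)
    # Count of whole time-steps since the Unix epoch.
    counter = int((time.time() if t is None else t) // period)
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: pick 4 bytes at an offset given by the last nibble.
    offset = digest[-1] & 0x0F
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)
```

Because the code depends on the current 30-second window, a captured code is useless moments later, and the shared secret itself never crosses the network at login time.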

Relying on a device in purse or pocket as well as a password in your head is an improvement in security, but it has its drawbacks. It makes that device essential for you to use the system: if it is broken, lost or stolen, you're locked out, even if you know the password. While locking out people who don't have the device is exactly the point, that doesn't help when it is keeping you from legitimately using the system. Moreover, if that device is your smartphone, it changes your phone from a convenience to a necessity. While a smartphone has become a necessity already for some, it is a potentially consequential thing for it to become a requirement for everyone. A hybrid approach is perhaps best: a hardware security token for those who prefer it, a smartphone for those who for their own reasons carry one around anyway, and for many, both: a smartphone for convenience, with a hardware security token as backup in case of smartphone loss or damage.

Perhaps there is an even more secure option? What if your second factor wasn't a device, but an actual physical part of your body, such as a finger (for a fingerprint), eye (for a retinal scan), face, or even heartbeat (as measured by e.g. a Nymi Band)? Would that be better still? After all, if it is hard for a criminal to get access to someone's things without being noticed, it is even harder to get access to someone's body. This is indeed possible: the technique is called "biometrics", and it can be an effective second factor. Unfortunately, there are a couple of issues with biometrics. First, injuries or health issues can change your body; a cut on your finger may affect your fingerprint, for instance. Second, biometrics have a "revocation" problem. This comes from the fact that a biometric is a unique measurement of your body part: a fingerprint, retinal scan, facial image, or ECG. But measurements are data, and biometric data, like any other data, can be and has been breached. If this happens, what will you do? Passwords can be changed, hardware security tokens can be replaced, but how are you going to change your fingerprint, your face, your eye, your heartbeat? While biometrics do have a place in authentication, most commonly to unlock a local device such as a smartphone or a laptop, the lack of revocability makes biometrics less suitable as a second factor for Internet-accessible services.

Regardless of what is chosen for a second factor, the inconvenience of using more than one factor is something that has to be considered. Passwords, especially ones that are easy to remember, are quite convenient. Requiring more than this can make authentication more difficult. If it becomes too difficult, the difficulty becomes a disincentive to use the system. For systems protecting highly sensitive data, some difficulty may be warranted, given the risk. For lower-risk systems, things are less clear. Yet for Internet-accessible systems, due to the prevalence of phishing, something more secure than just passwords seems increasingly necessary. I think Bill Gates is right: like it or not, the traditional password will become increasingly rare on the Internet, for good reason.

/it permanent link

Mon 23 Nov 2020 00:00

Thoughts on Covid19

Visual representation of Covid19 viruses under electron microscope
Image by PIRO4D from Pixabay
I recently reread a blog entry I wrote more than a year ago on intentionality about blog posting. After writing it, I lived it: I wrote several additional blog entries throughout the year. But then along came the Covid19 pandemic, and it illustrated a problem with intentionality: intentionality requires priority. When Covid19 hit Ontario in March, the pandemic required substantial changes in how I live and work, and that drove a reprioritization of my efforts, both in my job as Director responsible for computing at the University of Toronto's Computer Science department, and at home, as a parent of teenagers in high school. In the face of the challenges of Covid19, blogging seemed not sufficiently important, and of course, it wasn't. So I didn't write, I worked. I am grateful to have work, in fact: I know of others who couldn't work because the sort of work they did couldn't be done from home. I consider myself fortunate to work in the computing field, which has not been so badly affected. In fact, in many ways, computing has been part of the solution (networking, videoconferencing, cloud computing, medical informatics, etc.) and has been boosted rather than impaired. In my job, I and my staff, and my department, found ourselves not without work, but with too much. This is not necessarily a bad situation to be in, but it doesn't lend itself to blogging.

Another reason is that Covid19 didn't just affect me professionally, it affected me personally: I lost a parent to Covid19 this summer. While I am not in any way unique in having lost someone to this disease, I was not really in a good state to blog, for quite some time.

There is still another factor, though, one that also kept me from blogging. I am no epidemiologist. Still, as a thinking person, I seek to understand what is going on, why, and what can be done about it. Seeking to understand is, for me, therapeutic: it helps me deal with stress, anxiety, grief, and loss.

First, I looked for good sources of information about the pandemic itself. The Centers for Disease Control and Prevention (CDC) in the US has plenty of good material about it. One thing I found particularly helpful was an analysis in mid-May of a choir practice in Washington state with 61 attendees, one that led to most becoming infected. It resulted in three hospitalizations and two deaths. The CDC report is a very helpful example of rigorous statistical data analysis set in a small, understandable real-world context. As an illustration of what the Covid19 virus is like, I find it very helpful. For instance, it suggested airborne spread before that became generally realized.

Secondly, I looked for information about previous pandemics. Again, the Centers for Disease Control and Prevention in the US has a very good past pandemics page, put together before the Covid19 pandemic started, covering the horrifying 1918 influenza pandemic that killed fifty million people around the world, and the later influenza pandemics of 1957, 1968, and 2009. Together, these provide a helpful general picture: each pandemic has a timeframe that is typically greater than one year but less than two; transmission decreases in the summer but increases in the fall and winter due to indoor crowding and decreased relative humidity; and mass vaccination can be an effective way to ward off a disaster on the scale of the 1918 pandemic.

One problem with this current pandemic is that, unlike the pandemics of 1957, 1968, and 2009, the virus is not influenza, but a coronavirus. There are four coronaviruses that have been circulating widely for years (229E, NL63, OC43, and HKU1), but they typically don't cause serious illness. Two others (SARS-CoV and MERS-CoV) emerged in the early 21st century, both quite dangerous and certainly serious enough to warrant vaccination were they to spread widely, but due to a great deal of diligence and effort, and not a little good fortune, both of these were kept from spreading through the world population. The current Covid19 pandemic, caused by yet another coronavirus, SARS-CoV-2, is the first caused by a coronavirus both serious enough and widespread enough to warrant a vaccine. Unlike for influenza, a coronavirus vaccine had never been produced before, so development has taken longer than it would have if this pandemic had been influenza. Only now, as we approach the one-year mark of the virus's first emergence, are we seeing some likely vaccine candidates. It will still take some time to produce and distribute suitable vaccines.

In the meantime, while efforts continue to design, test, produce and distribute a suitable vaccine, the challenge is to keep Covid19 from spreading far and fast. While at first it was believed that Covid19 spreads primarily through surface contact, there is increasing evidence for aerosol spread (fine droplets in the air). So methods are needed to hinder the passing of the virus from one person to another. There are two main approaches: keeping people further apart, and putting physical barriers (e.g. masks) and processes (e.g. handwashing) in place so that the virus can't easily pass from one person to another.

The best way to hinder the transmission of Covid19 is to find out who may be contagious (through testing and contact-tracing), and keep them away from everyone else (quarantine) until they are no longer contagious. One challenge is that it can sometimes be very hard to detect when someone has Covid19 and is spreading the virus. There is a wide variation in how Covid19 affects people who have it. For many, it can take days for symptoms to emerge (presymptomatic), and for some, Covid19 can be mostly or completely asymptomatic, yet asymptomatic and presymptomatic Covid19 patients can spread the disease. If those who may have Covid19 can be identified (through testing and thorough contact tracing), then those individuals alone can be quarantined until they are no longer contagious. If they cannot be identified, then the only way to hinder the spread of the disease is to assume that almost anyone might have Covid19. This requires such things as requiring everyone to wear masks, and, despite severe social and economic cost, lockdowns, which are a sort of semi-quarantine for everyone. As I write this, Covid19 has been spreading quite quickly in my city, Toronto, despite a mask mandate, and so Toronto is going into lockdown.

How will it all end? In the struggle between pessimism and hope, I choose hope. I hope that I will not lose any more family members to this disease. I hope that effective vaccines will soon be available in the necessary quantities. I hope that the measures taken to hinder the spread will be effective. I think it is reasonable to expect that we will see the widespread distribution of effective vaccines in 2021, and this pandemic will be over sometime next year. Will everything be the same? No, I think not. Some businesses (tourism and travel, for example) will have a massive economic hole to climb out of, and some companies will not survive, but people will travel again. Working from home, and technology in support of it, will be more widely accepted. Cheek-to-jowl "open-concept" offices, handshaking, and other close-quarters working practices will be less readily accepted. There will be a greater consciousness of viral hygiene, and a greater acceptance of masks. But life will go on. Covid19 will no longer command the attention it is getting now. Other things will seem important again. And there will be many worthwhile things to blog about.

/misc permanent link

Mon 24 Feb 2020 10:19

Some Clarity on Public Cloud Cybersecurity

Break in clouds, revealing clear skies
Image by Sabrina Corana from Pixabay
I've been thinking about public cloud cybersecurity for some years now, as I've watched adoption of the public cloud grow from a trickle to a flood. Early on, most of the reasons I heard for public cloud adoption made a great deal of sense to me: the need to rapidly scale up and down the size of a service, the desire to leverage the expertise of a large technical partner with resources in network and computing infrastructure exceeding one's own, the desire to leverage geographically diverse, redundant datacentres, the desire to fund computing from operating rather than capital budgets, and the desire to build adaptable, scriptable services with better connectivity to the Internet than one could otherwise provide for oneself. But in the last year or two, as anxiety about cybersecurity increases, I've been hearing more and more people refer to cybersecurity as their primary reason for their adoption of the public cloud. I'm not so sure what I think of this reasoning. I can understand why someone might want to pass to a third party a task that makes them anxious. In situations involving strong emotions, such as anxiety, there is risk of "confirmation bias": believing something is true because you want it to be true. But is it? Ceteris paribus (all other things being equal), is the public cloud intrinsically more secure than on-premise datacentres?

Some argue yes. Eplexity calls cloud computing "an established best practice for businesses" and claims "your data is typically safer in the public cloud than in an on-premises data centre". In 2016, Sara Patrick of Clutch, guest-writing for Tripwire.com, claimed to have "four reasons why the Cloud is more secure than Legacy Systems". In 2017, Quentin Hardy of the New York Times claimed that cloud data is "probably more secure than conventionally stored data." In 2018, David Linthicum, writing for InfoWorld, claimed "your information is actually safer in the cloud than it is in your own data centre".

One reason given for the claim is that public cloud providers offer greater technical expertise than what is possible on-premise. Eplexity writes:

Unless your company is already in the business of IT security, spending time and effort on securing your on-premises data distracts from your core functions. Most organizations likely don't have a robust, experienced team of cybersecurity professionals at their disposal to properly protect their on-premises data. ... As such, cloud providers may employ hundreds or thousands of developers and IT professionals.
This is an argument from size and scale. Cloud providers are bigger than you, and have arguably more IT expertise than you, so they can do a better job than you. But sadly, size and IT expertise are no guarantee of security. Yahoo was a large Internet company, valued at one time at $125 billion. It employed thousands of developers and IT professionals. Yet it was subject to a cybersecurity breach of three billion user accounts in 2013/14; the breach was not disclosed until the fall of 2016, and the full impact was not known until October 2017. The damage to Yahoo's business was significant: Verizon acquired Yahoo in 2017 for less than $5 billion, a deal that was nearly derailed by the disclosure of the breaches.

I think we must conclude from the Yahoo story that size and expertise alone are no guarantee of cybersecurity. Naturally, major cloud providers like Amazon, Microsoft and Google are aware of the Yahoo situation and its consequences. No doubt it illustrated for them the negative impact that a major breach would have on their business. I cannot imagine that they would take the threat lightly.

Yet there have been close calls. Microsoft, a major cloud provider, in December 2019 accidentally disclosed to the world a cloud database on Azure with 250 million entries of customer support data. Happily, a security researcher spotted and reported it, and Microsoft fixed it soon after. Moreover, Zak Doffman, writing for Forbes, reported in Jan 2020 that Check Point Software Technologies, a cybersecurity vendor, had discovered in 2019 a serious flaw in Microsoft Azure's infrastructure that allowed users of the service to access other users' data. While Check Point reported it immediately to Microsoft, who fixed it quickly, had the flaw been discovered by criminals instead of cybersecurity researchers, a great many things running on Azure could have been compromised. Doffman quotes Yaniv Balmas of Check Point:

...the take away here is that the big cloud concept of security free from vulnerabilities is wrong. That's what we showed. It can happen there as well. It's just software and software has bugs. The fact I can then control the infrastructure gives me unlimited power.
In the Check Point research article describing the flaw, Balmas concludes:
The cloud is not a magical place. Although it is considered safe, it is ultimately an infrastructure that consists of code that can have vulnerabilities - just as we demonstrated in this article.

What, then, is the right answer? Well, there isn't one. Neither the public cloud nor on-premise datacentres are magic; neither is "safe". Cybersecurity is a challenge that has to be met, no matter where the service is, or what infrastructure it is using. Happily, this is finally being recognized. Even Gartner Research, a long-time proponent of the public cloud, which as recently as mid-2019 predicted that public cloud infrastructure as a service (IaaS) workloads would suffer at least 60% fewer security incidents than those in traditional data centers, has recently taken a more nuanced view. In the fall of 2019, this prediction of fewer security incidents in the cloud disappeared from Gartner's website, and was replaced by this:

Through 2024, the majority of enterprises will continue to struggle with appropriately measuring cloud security risks.
Questions around the security of public cloud services are valid, but overestimating cloud risks can result in missed opportunities. Yet, while enterprises tended to overestimate cloud risk in the past, there's been a recent shift - many organizations are now underestimating cloud risks. This can prove just as detrimental, if not more so, than an overestimation of risk. A well-designed risk management strategy, aligned with the overarching cloud strategy, can help organizations determine where public cloud use makes sense and what actions can be taken to reduce risk exposure.

So does "public cloud use make sense"? Yes, of course it does, for a great many things. But it's not because the public cloud is intrinsicly more secure. The public cloud has its own set of cybersecurity issues. There is no "free pass". As always, carefully assess your risks and make an informed decision.

/it permanent link

Fri 24 Jan 2020 20:02

Does AI Help or Hinder Cybersecurity?

Hooded figure with glowing circuit-board visage
Image by Gerd Altmann from Pixabay
Both AI and cybersecurity have become increasingly prominent in recent years. AI's prominence has been driven by advances in machine learning and the very real improvements it has made in the ability of computer systems to do things that previously seemed possible only to human beings. Cybersecurity's prominence has been driven by a number of developments, including increasing nation-state conflict on the Internet, and a dramatic rise in organized cyber-crime. It is inevitable that the two will combine: AI will be and is being applied to the cybersecurity space, through the development of machine learning techniques for breaking into and defending systems.

One view on this is that machine learning, as a powerful technique that enables computer systems to take on tasks previously reserved only for humans, will empower cyberattackers to breach computer security in new ways, or at least in ways more effective than before. I know there is a great deal of anxiety about this. This past fall, I had a conversation with a CIO of a large university, who told me that his university was migrating its internet services to Amazon precisely because he believed that new AI-powered cyberattacks were coming, and he thought Amazon would be better able to fend them off. I'm not sure what I think of this defensive strategy, but that is not the important question here. The key question is this: are AI-powered cyberattacks going to overwhelm cyberdefence?

No doubt AI-powered cyberattacks are a reality. Machine learning is a powerful computer science technique, especially for automation. Cyberattackers, especially sophisticated, well-funded cyberattackers, will use it and I am confident are already using it. But highly automated cyberattacks are nothing new: cyberattackers have been automating their attacks for decades. Smarter automated cyberattacks are certainly something to worry about, but will they be transformative? Maybe. After all, in cybersecurity, the advantage is to the attacker, who needs to find only one hole in the defences, while the defender needs to block all of them. Anything that boosts the effectiveness of the attacker would seem to make the situation worse.

To really see the full picture, it's important to look at the defender too. Machine learning makes the situation worse only if it benefits the attacker more than it benefits the defender. But does it?

I don't have a complete answer to this question: there is a great deal of work still to be done on the application of machine learning to cybersecurity. But I suspect that the answer is a qualified No: rather, all other things being equal, machine learning will likely shift the balance of power towards the defender. The reason is data.

Machine learning is a technique where computer systems, instead of being programmed by programmers, learn what to do from data. But the quality of the learning depends on the quality and in particular the quantity of data. Machine learning is a technique that is most effective when trained with large amounts of data. ImageNet, for instance, a standard training dataset used to train machine learning applications to recognize images, contains about 14.2 million images. But who is more likely to have access to large amounts of good data about a system: the attacker or the defender? Of course, it depends, but it seems to me that, very generally speaking, the defender is more likely to have access to good system data than the attacker. The attacker is trying to get in; the defender is already in.

Of course, this is the broadest of generalizations. The effectiveness of machine learning in the cybersecurity space depends on a great many things. But I am cautiously optimistic. I realize I may be bucking what seems to be becoming a prevailing trend of ever-increasing anxiety about cybersecurity, but I believe here that machine learning has more potential to help than to harm. I look forward to seeing what will emerge in this space over the next few years.

/it permanent link

Mon 30 Sep 2019 06:33

What's all the fuss about AI anyway?

Brain-shaped Network
Image by Gordon Johnson from Pixabay
A great deal has been written in the past five years about Artificial Intelligence (AI). But there's a lot of confusion about what AI actually is, and why it is of special interest now. Let's clear up some of that confusion. In ordinary language, what is this fuss about AI all about?

AI, broadly understood, is a term used to describe a set of computing techniques that allow computers to do things that human beings use intelligence to do. This is not to say that the computer is intelligent, but rather that the computer is doing something that, if done by a person, would be considered evidence of that person's intelligence. Contrary to widespread opinion, this is not the same thing as an artificial person. In fact, there have been for a long time many things that humans use intelligence to do, that computers do better, whether it be remembering and recalling items, doing arithmetic, or playing chess. But computers do these things using different techniques than humans do. For example, Deep Blue, a custom chess computer built by IBM, beat Garry Kasparov, the then-reigning world chess champion, in 1997, but Deep Blue played chess in a very different way than Garry. Garry relied on his human intelligence, while Deep Blue used programming and data.

However, some computer scientists, noting that people can do things that computers can't, thought long and hard about ways that people do it, and how computers might be programmed to do the same. One such technique, deep learning, a neural network technique modelled after the human brain, has been worked on since the 1980s, with slow but steady improvement, but computer power was limited and error rates were often high, and for many years, most computer scientists seemed to feel that other techniques would yield better results. But a few kept at it, knowing that the computers of the day were inadequate, but that advances in computing would make things possible that weren't possible before.

This all changed in 2012, when one such researcher, Geoff Hinton, and his students, working here at the University of Toronto, published a seminal deep learning paper that cut error rates dramatically. I remember supporting Geoff's group's research computing at that time. It was a bit challenging: we were using multiple GPUs per machine to train machine learning models at a time when GPU computing was still rather new and somewhat unreliable. But GPUs were absolutely necessary: without them, instead of days of computing time to train a model, months would be required. One of our staff, Relu Patrascu, a computer scientist and skilled system administrator working hand-in-glove with the researchers, tuned and configured and babysat those machines as if they were sick children. But it worked! Suddenly deep learning could produce results closer to what people could do, and that was only the beginning. Since then, deep learning has produced terrific results in all sorts of domains, some exceeding what people can do, and we've not even scratched the surface of what is possible.

But what does deep learning actually do? It is a computer science data classification technique. It's used to take input data and classify it: give it a thing and it will figure out what the thing is. But it classifies things in a way that's different and more useful than traditional computer science methods for classification, such as computer programming, or data storage and retrieval (databases). As such, it can be used to do a lot more than computers previously had been able to do.

To see this, consider traditional computer science methods: for example, computer programming. This approach requires a person to write code that explicitly considers the different cases. For example, imagine that you want to classify two-dimensional figures, deciding whether each is a regular polygon. You could write a computer program that encodes a precise definition of a regular polygon and checks each characteristic of an input shape to see whether or not it matches that definition. Such a program, when given a square, will notice that it is a polygon, that it has four sides, and that those sides are equal in length. Since the programmer put into the program a detailed definition of what a regular polygon is, and since the program checks each feature explicitly, it can tell whether or not a shape is a regular polygon, even if the program has never seen that particular shape before.
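For illustration, here is a minimal sketch of such a rule-based classifier. The representation of a shape as lists of side lengths and interior angles is my own invention for the example, not anything standard:

```python
import math

def is_regular_polygon(side_lengths, angles_deg):
    """Explicit, rule-based classifier: a shape is a regular polygon
    if it has at least three sides, all sides equal, and all interior
    angles equal. Every rule is spelled out by the programmer."""
    n = len(side_lengths)
    if n < 3 or len(angles_deg) != n:
        return False
    sides_equal = all(math.isclose(s, side_lengths[0]) for s in side_lengths)
    angles_equal = all(math.isclose(a, angles_deg[0]) for a in angles_deg)
    return sides_equal and angles_equal

# A unit square: four equal sides, four 90-degree angles.
print(is_regular_polygon([1, 1, 1, 1], [90, 90, 90, 90]))  # True
# A 3-4-5 right triangle: the sides differ, so not regular.
print(is_regular_polygon([3, 4, 5], [90, 53.13, 36.87]))   # False
```

Note that a circle cannot even be represented here: handling it would require a programmer to write more explicit code.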

But what about exceptional cases? Is a circle a regular polygon? It is, after all, the limit of a regular N-gon as N goes to infinity. This is an "edge case", and programs need to consider edge cases explicitly: a programmer has to anticipate the case and write it into the program. Moreover, if you wanted to consider some other type of shape, a programmer would have to rewrite the code accordingly. There's no going from a bunch of examples to working code without a programmer to write it. Programming is certainly a useful technique, but it has its limits. Wouldn't it be nice to be able to learn from a bunch of examples, without a person having to write all that code?

One way to do that would be data storage and retrieval: for example, a database. Consider the shape classifier problem again. You might put a bunch of shapes into a database, recording for each whether or not it is a regular polygon. Once the database is populated, classifying a shape simply becomes looking it up. The database will say whether or not it is a regular polygon.

But what if it's not there? A database has the advantage of being able to learn from examples. But it has a big disadvantage: if it hasn't seen an example before, and is asked about it, it has no idea what the right answer is. So while data storage and retrieval is a very useful computing technique, and it is the backbone of most of our modern information systems, it has its limits. Wouldn't it be nice if a classifier system could provide a useful answer for input data that it's never seen before, without a programmer to tell it how?
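A toy sketch of the database approach and its blind spot. The canonical-key scheme here is invented purely for illustration:

```python
# Hypothetical shape "database": keys are canonical shape descriptions,
# values record whether the shape is a regular polygon.
shape_db = {}

def canonical(side_lengths):
    # Normalize the description so equivalent shapes share one key.
    return tuple(sorted(round(s, 6) for s in side_lengths))

def store(side_lengths, is_regular):
    shape_db[canonical(side_lengths)] = is_regular

def lookup(side_lengths):
    key = canonical(side_lengths)
    if key in shape_db:
        return shape_db[key]
    return None  # never seen before: the database has no idea

store([1, 1, 1, 1], True)    # a square, recorded as regular
store([3, 4, 5], False)      # a scalene triangle, recorded as not
print(lookup([1, 1, 1, 1]))  # True  (seen before)
print(lookup([2, 2, 2]))     # None  (unseen: no answer at all)
```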

Deep learning does exactly this. Like data storage and retrieval, it learns from examples, through training. Very roughly, a neural network, when trained, is given some input data, and is told what output data it should produce when it sees that data in future. These input and output constraints propagate forward and backward through the network, and are used to modify internal values such that when the network next sees input like that, it will produce the matching output.

The key advantage of this technique is that if it sees data that is similar to, but not the same as data it has been trained on, it will produce output similar to the trained output. This is very important, because like programming, it can work on input it has never seen, but like databases, it can learn from examples and need not be coded by a programmer anticipating all the details in advance. For our shape example, if trained with many examples of regular polygons, the neural network will be able to figure out whether or not a given input is a regular polygon, and perhaps even more interestingly, it will be able to note that a circle is very like a regular polygon, even if it had never been trained on a circle.
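A heavily simplified sketch of the idea, using a single artificial neuron rather than a deep network. The toy data, learning rate, and number of passes are arbitrary choices for illustration:

```python
import math

# A single neuron trained by gradient descent to classify 2-D points
# as above (1) or below (0) the line y = x.
w1, w2, b = 0.0, 0.0, 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Training examples: (x, y, label), where label 1 means "above the line".
examples = [(0.0, 1.0, 1), (1.0, 2.0, 1), (0.5, 1.5, 1),
            (2.0, 0.0, 0), (1.5, 0.5, 0), (2.0, 1.0, 0)]

for _ in range(5000):                 # repeated passes over the examples
    for x, y, target in examples:
        out = sigmoid(w1 * x + w2 * y + b)
        err = out - target            # how far off the prediction was
        w1 -= 0.1 * err * x           # nudge each weight to reduce the error
        w2 -= 0.1 * err * y
        b -= 0.1 * err

# Inputs never seen in training, but similar to the training examples:
print(round(sigmoid(w1 * 0.2 + w2 * 1.8 + b)))  # 1 (clearly "above")
print(round(sigmoid(w1 * 1.8 + w2 * 0.2 + b)))  # 0 (clearly "below")
```

The network was never shown the points (0.2, 1.8) or (1.8, 0.2), yet because they resemble the training examples, it classifies them sensibly, with no programmer-written rules about lines or points.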

Moreover, a deep learning neural network can learn from its own results. This is called reinforcement learning. This technique uses a neural network to derive output data from some input data; the results are tested to see how well they work, and the neural network is retrained accordingly. This way a neural network can "learn from its own mistakes", training itself iteratively to classify better. For example, a model of a walking human, with some simple programming to teach it the laws of physics, can, using reinforcement learning, teach itself how to walk. A few years ago, some of the researchers in our department did exactly that. Another example: Google got a lot of attention a few years ago when deep learning researchers there built a deep learning system that used reinforcement learning to become a champion at the game of Go, a game very hard to computerize using traditional techniques, and proved it by beating the reigning Go world champion.
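The learn-from-outcomes idea can be shown in miniature with tabular Q-learning, a much simpler relative of the deep reinforcement learning described above. The toy track, rewards, and parameters are all invented for the example:

```python
import random

random.seed(1)

# An agent on a 1-D track of six cells learns, purely by trial and
# error, to walk right to the goal at cell 5.
N_STATES, GOAL = 6, 5
ACTIONS = (-1, +1)                        # step left, step right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for episode in range(500):
    s = 0
    while s != GOAL:
        # Explore sometimes; otherwise act greedily on current estimates.
        if random.random() < 0.2:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), N_STATES - 1)
        reward = 1.0 if s2 == GOAL else 0.0
        # Update the estimate from the outcome of the agent's own move:
        # this is the "learning from its own mistakes" step.
        best_next = max(Q[(s2, act)] for act in ACTIONS)
        Q[(s, a)] += 0.5 * (reward + 0.9 * best_next - Q[(s, a)])
        s = s2

# After training, the greedy policy is to step right from every cell.
policy = [max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(GOAL)]
print(policy)  # [1, 1, 1, 1, 1]
```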

It seems clear to me at this point that deep learning is as fundamental a computing technique as computer programming and databases in building practical computer systems. It is enormously powerful, and is causing a great deal of legitimate excitement. Like all computer science techniques, it has its advantages and drawbacks, but its strengths are where other computer science techniques have weaknesses, and so it is changing computer science (and data science more generally) in dramatic ways. It's an interesting time to be a computer scientist, and I can't even begin to imagine the many things that bright and innovative people will be able to do with it in the future.

/it permanent link

Mon 02 Sep 2019 20:14

Existential threats from AI?

Nuclear explosion
Image by Alexander Antropov from Pixabay
Plenty has been written about the possible threats to humanity from Artificial Intelligence (AI). This is an old concern, a staple of science fiction since at least the 1950s. The usual story: a machine achieves sentience and pursues its own agenda, harmful to people. The current successes of machine learning have revived this idea. The late Stephen Hawking warned the BBC in 2014 that "the development of full artificial intelligence could spell the end of the human race". He feared that "it would take off on its own, and re-design itself at an ever increasing rate." He worried that human beings, "who are limited by slow biological evolution, couldn't compete, and would be superseded." Henry Kissinger, in a thoughtful essay in The Atlantic last year, worried that "AI, by mastering certain competencies more rapidly and definitively than humans, could over time diminish human competence and the human condition itself as it turns it into data." Elon Musk, in a debate last month with Alibaba's Jack Ma, reported by WIRED, argued that "there's just a smaller and smaller corner of what of intellectual pursuits that humans are better than computers. And that every year, it gets smaller and smaller, and soon will be far far surpassed in every single way. Guaranteed. Or civilization will end."

Are they right? Is there an existential threat to humanity from AI? Well, yes, I think there actually is one, but not quite in the way Musk, Kissinger, or Hawking fear. Computers have been better than humans for a long time in many cognitive domains. Computers remember things more accurately, process things faster, and scale better than humans in many tasks. AI, particularly machine learning, increases the number of skills where computers are better than humans. Given that humanity has been spending the last couple of generations getting used to a certain arrangement where computers are good at some things and humans are good at others, it can be a bit disconcerting to have this upended by computers suddenly getting good at things they weren't good at before. I understand how this can make some people feel insecure, especially highly accomplished people who define themselves by their skills. Kissinger, Musk and Hawking fear a world in which computers are better at many things than humans. But we have been living in such a world for decades. AI simply broadens the set of skills in question.

As a computer scientist, I am not particularly worried about the notion of computers replacing people. Yes, computers are developing new useful skills, and it will take some getting used to. But I see no imminent danger of AI resulting in an artificial person, and even if it did, I don't think an artificial person is an intrinsic danger to humans. Yet I agree that there are real existential threats to humanity posed by AI. But these are not so much long term or philosophical, to me they're eminently practical and immediate.

The first threat is the same sort of threat as posed by nuclear physics: AI can be used to create weapons that can cause harm to people on a massive scale. Unlike nuclear bombs, AI weapons do not do their harm through sheer energy discharge. Rather, machine learning, coupled with advances in miniaturization and mass production, can be used to create horrific smart weapons that learn, swarms of lethal adaptive drones that seek out and destroy people relentlessly. A deep commitment to social responsibility, plus a healthy respect for the implications of such weapons, will be needed to offset this danger.

The second threat, perhaps even more serious, comes not from AI itself but from the perceptions it creates. AI's successes are transforming human work: because of machine learning, more and more jobs, even white-collar ones requiring substantial training, can be replaced by computers. It's unclear yet to what extent jobs eliminated by AI will be offset by new jobs created by AI, but if AI results in a widespread perception that most human workers are no longer needed, this perception may itself become an existential threat to humanity. The increasingly obvious fact of anthropogenic climate change has already fueled the idea that humanity itself can be viewed as an existential threat to the planet. If AI makes it possible for some to think that they can have the benefits of society without keeping many people around to do the work, I worry we may see serious consideration of ways to reduce the human population to much smaller numbers. This to me is a dangerous and deeply troubling idea, and I believe a genuine appreciation for the intrinsic value of all human beings, not just those who are useful at the moment, will be needed to forestall it. Moreover, a good argument from future utility can also be made: we cannot accurately predict which humans will be the great inventors and major contributors of the future, the very people we need to address anthropogenic climate change and many other challenges. If we value all people, and build a social environment in which everyone can flourish, many innovators of the future will emerge, even from unexpected quarters.

Threats notwithstanding, I don't think AI or machine learning can go back into Pandora's box, and as a computer scientist who has been providing computing support for machine learning since long before it became popular, I would not want it to. AI is a powerful tool, and like all powerful tools, it can be used for many good things. Let us build a world together in which it is used for good, not harm.

/it permanent link

Mon 26 Aug 2019 06:51

Why we thought for a while Pluto was a planet, but it never was.
Pluto

More than a decade after Pluto's demotion from the rank of planet, some still do not accept it. I can sympathize. Like many of us, I grew up memorizing in school the nine planets of the Solar system, the last of which was Pluto: icy, distant and mysterious. I remember as a child poring over a diagram of the solar system, marvelling at the concentric ellipses of the planetary orbits, and wondering why Pluto's orbit was so odd. For odd it was: all the other planets orbited the sun in more or less concentric ellipses, but Pluto was eccentric: its orbit was at an unusual angle, and it even briefly came closer to the sun than Neptune. None of the other planets had orbits like this: why Pluto? But I didn't question that it was a planet. It had been recognized as a planet since Clyde Tombaugh discovered it before my parents were born. For me, Pluto was weird, but it was still a planet: the astronomical equivalent of a sort of odd uncle who behaved strangely and kept to himself, but was still family.

But the idea of Pluto as a planet started to become problematic in the early 1990s. In 1992, Jewitt and Luu discovered another object beyond Neptune: Albion, much smaller than Pluto, and also with an odd orbit. Because it was a small object, it was pretty clearly not a planet, so Pluto's status was not yet in question, but it was only the first of many. By 2000, more than seventy such objects had been discovered. Most of these were very small, but some were not so small. And the discoveries continued. In 2003, with the discovery of Eris, a trans-Neptunian body more massive than Pluto itself, the problem became acute. No longer was Pluto the odd uncle of the planets: now there were on the order of 100 odd uncles and aunts, and at least one of them, Eris, aptly named after the Greek goddess of discord, had a better claim to planethood than Pluto itself. Something had to be done. This bunch of odd objects, odd in the same way as Pluto, were either all planets, or they were none of them planets. There was no reasonable distinction that could be made that would keep Pluto a planet but deny planethood to Eris and many of her siblings. To do so would be arbitrary: we would be saying that Pluto was a planet simply because we discovered it first and it took us a long time to discover the others. What to do?

Happily, there was a precedent: this sort of thing had come up before. In 1801, Giuseppe Piazzi discovered Ceres, a body orbiting between Mars and Jupiter. This was a big deal. Only twenty years before, a new planet had been discovered for the first time in recorded history: Uranus, found by accident by William Herschel in 1781. Now, twenty years later, Piazzi had found a second. And this one was not out beyond Saturn, it was nearer than Jupiter. But Piazzi's share of the limelight was soon to lessen. His planet had a rival: a year later, Heinrich Wilhelm Olbers discovered Pallas, another body between Jupiter and Mars. Two years later, in 1804, Karl Harding discovered another: Juno. Not to be outdone, Olbers in 1807 discovered yet another, Vesta. By the middle of the 19th century, fifteen bodies orbiting between Mars and Jupiter were known, and while none of them were anywhere near as large as Ceres, one of them, Vesta, had nearly a third of Ceres' mass. Were there really many small planets between Mars and Jupiter, or were these something else? When in 1846 the planet Neptune was discovered beyond Uranus, it became clear that some decision about these bodies between Mars and Jupiter needed to be made. A consensus emerged: Ceres and other such objects were not planets. They were called "asteroids", a name coined in 1802 by William Herschel. It was a good call: there are now well over 100,000 known asteroids, far too many for schoolchildren to memorize.

With Pluto, a similar situation was now occurring. While we weren't yet at 100,000 Pluto-like bodies, we knew about quite a few more than fifteen. And Pluto, unlike Ceres, wasn't even the most massive: Eris was, and quite possibly, bigger ones would be found. There was no denying the facts. Pluto, like Ceres, could not be a planet. It must be something else.

Of course this was quite controversial. People had been calling Pluto a planet for the better part of a century. Generations of schoolchildren had memorized it as part of the list of planets. But the choice was clear: either the schoolchildren would have to start memorizing longer lists, much much longer ones, or Pluto would have to be demoted. Well, not demoted, exactly, but newly recognized for what it really was all along: something different. In the summer of 2006, the International Astronomical Union (IAU) declared that Pluto is not a planet but a dwarf planet. While this designation is a little confusing (if a dwarf planet isn't a planet, why is it called a dwarf planet?), one thing was now clear: Pluto is not the same sort of thing as Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus and Neptune; it, and Eris, and probably a couple of other larger trans-Neptunian bodies discovered since the 1990s, are something different. But guess what: Ceres, too, fits IAU's definition of dwarf planet, the only asteroid that does. Two centuries after its discovery, Ceres, first-born of the non-planets and largest of the asteroids, was deemed a dwarf planet, and Piazzi, its discoverer, though not the second person in recorded history to discover a new planet, was recognized as the very first to discover a dwarf one.

/misc permanent link

Fri 19 Jul 2019 16:13

Ross Anderson's Security Engineering
Security Engineering - Second Edition

Until recently, I had not read Ross Anderson's Security Engineering, despite hearing good things about it. I'm not sure why: I think I was put off a bit by the title. I had a vague and confused impression that a book about "Security Engineering" would be yet another how-to book about making computers secure. I should have known better. In this case, I was wrong, very much so, and much to my detriment. I should have read this book long ago.

Why had I not read it? I have no excuse. The book has been out for a while: it is in its second edition, which came out in 2008 (Anderson is writing a third edition, expected next year). So I certainly had the opportunity. Moreover, since 2012, the book has been free for the reading (and downloading) from his website. So I certainly had the means. I just didn't, until a few weeks ago, when I stumbled across it again. I read a little from the website, then a little more. Before long, I was well and thoroughly hooked.

Security Engineering is a classic, comprehensive book about information security: eminently readable, clear and thorough, it covers information security in pretty much every aspect one might encounter it, from the usual (cryptography, access controls, protocols, biometrics) to the not quite so day-to-day (nuclear weapons launch protocols, counterfeiting, even spying by analyzing the RF emissions from computers). Each chapter is a clear elucidation of a particular aspect of information security, focusing on the essential issues. Each chapter provides enough detail to understand the essential elements, yet not so much as to overwhelm the reader. His writing is a classic illustration of the difference between an expert and a master. An expert knows a great deal about a topic and provides an abundance of information. A master knows the key elements, those things that are most important, on which everything else hangs, and focuses exactly on these. This book is mastery, in clear, understandable and engaging language. It has become my favourite book in information security already, and I haven't yet finished it.

I look forward to the third edition sometime next year. I can't wait.

/it permanent link

Mon 04 Mar 2019 12:04

Externality and Information Security
It was a hot midsummer weekend, and I was traveling back to Toronto with friends. We were on the expressway (the name here in Ontario for the sort of road that Americans call freeways and Brits call motorways). Traffic was very slow: a classic traffic jam. After about thirty minutes, we reached the cause of the problem. It was not a collision. Nor was it highway construction. Instead, by the side of the roadway, a minivan was parked, back gate open, and a family was having a picnic on the nearby grass. I don't know if they realized they were causing a traffic jam, but they were. People had slowed to look, which caused traffic behind to slow too, and because of the traffic volume, this led to a traffic jam over a considerable distance.

I don't know why the family having the picnic had chosen that spot for it, and I don't know whether they realized the problem they were causing. But their picnic went on, unaffected by the traffic problems around them. In other words, the traffic jam was not their problem. It was an externality: a cost imposed on others, not borne by those who cause it.

Externalities happen in life all the time. Large organizations (companies, countries, institutions) suffer significantly when their decision-makers make decisions that are good for themselves but not good for the organization. Rules to make this less likely are put in place: rules against bribery, rules concerning conflict of interest, rules imposing due process. But rules only work to a certain extent: there are plenty of situations where the rules are followed yet still externalities happen. Moreover, rules come with costs, sometimes significant ones. Rules may be necessary, but they are not sufficient, and they need to be accompanied by buy-in.

Let's consider traffic again. Driving is governed by all sorts of rules. Some of these rules work well: at traffic lights, go when the light is green, stop when it is red. Rarely broken, this rule makes traffic work in dense situations where otherwise there would be chaos. Most of the time, this rule is followed even in the absence of external enforcement. When enforcement does occur, it is well regarded: hardly anyone will dispute that a person running a red light is a safety hazard and should be ticketed. In practice, you can stand for hours beside a busy traffic signal in a typical Ontario city, and despite the absence of police presence, not find a single driver running a red light.

Sadly, other driving rules don't work quite so well, such as speed limits on expressways here in Ontario. These limits are often broken, with some drivers following them and others not. Often, on an uncongested expressway, unless enforcement is likely (i.e. police are present) there will be some people driving over the speed limit. Enforcement is viewed cynically: speeding tickets are often viewed more as revenue generation than as a safety measure. Obeying speed limits is often viewed by drivers as an externality: not my problem, unless there is a police officer around to make it one. In practice, at any place on any uncongested Ontario expressway, you will be hard-pressed to find a five-minute period in which no passing driver has exceeded the speed limit.

I have been thinking a lot about information security lately. In information security, we have a situation similar in many respects to driving. Just as driving is a matter of traveling safely, information security is a matter of computing safely. When we compute, we may be processing information that is sensitive, confidential, private. Harm can occur when it is exposed. Steps need to be taken to ensure that it is not: persons handling information will have to handle it securely. But do we want this process to look like speed limits? Or traffic lights? I think the answer is clear: if we want information to actually be secure, we want good security practice to be followed like the rules for traffic lights are followed: broadly and consistently, without the need for the constant threat of enforcement.

In recent years, an information security profession has arisen. The increasing demands of the profession have made it increasingly rare that an information security professional has spent much time actually running a substantial IT operation. Certifications abound, and a multiplicity of complex and large security standards have been created, each requiring professionals to interpret. A great deal of money is being spent on information security. Much of this is good and necessary: information security needs attention, codification, dissemination, and championship. But the professionalization of information security comes with big risks, too: the risk that information security will become the responsibility only of specialists, the risk that these specialists will come up with all-encompassing codexes of security standards to impose, the risk that these standards will be treated as externalities by IT practitioners, the risk that the information security profession will respond with enforcement, and hence the risk we will find ourselves in the expressway speed limit situation with respect to information security.

The fact is, information security is an aspect of good IT practice: if an implementation is not secure, it is broken, just as much as if it were not reliable. Security is the responsibility of all IT practitioners: it needs to be internalized, not externalized.

For this to happen, it is important that information security rules be simple and understandable, to ensure buy-in. Just as traffic light rules address the obvious risk of traffic accidents, so should security rules address clear risks in a visibly appropriate way. In most cases, it's not so important that rules be part of a comprehensive codex that addresses all possible areas of risk: the more complex the rule and the more extensive the system of rules, the more likely it will all be treated as an externality. What we really want are not rules for their own sake, but genuinely secure IT.

If we want secure IT, we need to recognize that there is another potential externality at work. Genuine information security and the good of the information security profession may not always align. Just as expressway speed limits employ more police than traffic lights, an enforcement approach will employ more information security professionals than an internalized one. But the internalized approach is what gives us secure computing. This is not something that can be left to the information security profession alone. To get there, we will need collaborative effort from all of us, particularly those with long experience running substantial IT operations. We will all need to make a true commitment to a practical approach, one that seeks to make computing genuinely more secure in the real world.

/it permanent link

Tue 26 Feb 2019 06:27

Intentionality

I spent all of 2018 intending to blog, and not doing it. Sadly, this is an all too human situation. We intend to do things, when we can, when time permits, but we can't; time doesn't permit. Or at least this is one of those stories we tell ourselves. The truth is a little simpler: throughout 2018, my intention to blog was not strong enough for me to re-prioritize things in my day so that I would do it.

I had plenty to say. I continue to have plenty to say. I had plenty of important things to do, and that also continues to be true. Despite my other responsibilities, I am making time now, and I will continue to make time, every so often, to say things in this blog. I am being intentional about it.

To be intentional about something means to be deliberately purposeful: to make one's actions a directly chosen consequence of one's thoughtful decisions. For most people, myself included, life is full of input, distractions, demands, requests. It is easy to fill time without much effort. But if I am not intentional, it will be filled with reaction, not action: things that circumstances and prior commitments have chosen for me, not things I have chosen for myself.

Reaction is fine, even good and necessary. Many people, myself included, build up throughout their lives various important responsibilities: responsibilities to family, work, friends, communities. Responsibilities carry with them a commitment to react to the needs of others. This is well and good. But it is not enough, at least not for me. I realize that to be authentic, I have to consider carefully what is important to me, decide what to do about it, and then act on it. This is intentionality. I've decided to be intentional about blogging. Look for more blog entries in the coming weeks.

/misc permanent link

Tue 12 Dec 2017 13:07

A Way to Visualize Relative Masses of Things in the Solar System
Every so often we hear things in the news about the solar system: a mission to a planet or asteroid, talk of manned missions to Mars, arguments about whether Pluto is a planet or not. We tend to have pretty sketchy ideas of what most bodies in the solar system are like compared to Earth. The fact is that they're more wildly different in size and mass than we might think.

Let's look at mass. Imagine you decide to row across San Francisco bay in a 12-foot aluminum rowboat. You pack a couple of suitcases, your 15 inch MacBook Pro (can't go without connectivity) and your iPad mini, you get in your rowboat and start rowing. As you row, you get hungry, so you pull out a Snickers bar. Now imagine that the USS Nimitz, a massive nuclear-powered aircraft carrier, passes by. There you are, in a rowboat with your two suitcases, your MacBook Pro, your iPad, and your Snickers bar, alongside a huge supercarrier.

Well, the mass of the sun compared to the earth is like that aircraft carrier compared to you and your boat. The mass of Mars is like your two suitcases. The mass of the moon is like your 15 inch MacBook Pro, and the mass of Pluto is like your iPad mini. As for the Snickers bar, it's like Ceres, the largest of the asteroids.

Now let's suppose the massive wake of the aircraft carrier tips over your rowboat and leaves you in the water. Along comes a rich tech founder in his 70 foot yacht, and fishes you out. That yacht is like Jupiter, the largest planet.
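The rough arithmetic behind the analogy can be checked directly. The planetary masses below are approximate textbook values, and the 100,000-tonne (1e8 kg) figure for the carrier is an assumed round number, so the mapping is only approximate:

```python
# Scale the solar system so the Sun weighs as much as an aircraft
# carrier, then see what everything else maps to.
masses = {                  # approximate masses in kg
    "Sun":     1.989e30,    # -> the aircraft carrier
    "Jupiter": 1.898e27,    # -> the 70-foot yacht
    "Earth":   5.972e24,    # -> you and your rowboat
    "Mars":    6.417e23,    # -> your two suitcases
    "Moon":    7.342e22,    # -> your MacBook Pro
    "Pluto":   1.303e22,    # -> your iPad mini
    "Ceres":   9.38e20,     # -> your Snickers bar
}
scale = 1e8 / masses["Sun"]   # carrier kilograms per solar-system kilogram

for body, mass in masses.items():
    print(f"{body:8s} {mass * scale:12.3f} kg")
```

On this scale Earth comes out around 300 kg, Mars around 32 kg, the Moon under 4 kg, Pluto well under a kilogram, Ceres at about 47 grams (remarkably close to an actual Snickers bar), and Jupiter at roughly 95 tonnes.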

So forget any mental images you might have of planets being something like the Sun, only a bit smaller and cooler. The sizes of things in the solar system are really quite different, and there is nothing, absolutely nothing, in the solar system that is anything quite like the Sun.

/misc permanent link

Mon 11 Dec 2017 14:02

Bitcoin, Cryptocurrency and Blockchain

As the price of Bitcoin goes up and up, talk increases about Bitcoin and other cryptocurrencies, like Litecoin, Monero, ZCash, Ethereum, and many others. Plenty is being said, and it can be a bit confusing.

But there is no need to be confused. Bitcoin and other cryptocurrencies are basically simple. They are not coins. They are simply lists. Each cryptocurrency has a master list. The list typically contains information about who holds what (i.e. which holders have which amounts). The list is designed in a clever way, using computer software, so that people all over the world can have identical copies of the list and keep it up to date, without someone having to be the holder of the "master copy". But it is still just a list.

The sort of list used for cryptocurrencies is called a "blockchain", and it has some special properties. One particularly clever property is that you can't normally just add anything you want to the list; there is a scheme to control that. Instead, you need to arrange with someone already on the list to give up (some of) their place on the list to you.
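A toy sketch of why copies of the list can be trusted: each entry contains the hash of the previous entry, so tampering with any copy is detectable. The field names and structure here are invented for illustration and are not real Bitcoin data structures:

```python
import hashlib
import json

def entry_hash(entry):
    # Hash a list entry deterministically (sorted keys for stable JSON).
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()

# The "master list": each entry records who holds what, plus the hash
# of the previous entry, chaining the list together.
chain = [{"holder": "alice", "amount": 50, "prev": "0" * 64}]

def append(holder, amount):
    chain.append({"holder": holder, "amount": amount,
                  "prev": entry_hash(chain[-1])})

def valid(chain):
    # Anyone holding a copy can verify every link in the chain.
    return all(chain[i]["prev"] == entry_hash(chain[i - 1])
               for i in range(1, len(chain)))

append("bob", 30)            # alice gives up part of her place to bob
append("carol", 20)
print(valid(chain))          # True
chain[1]["amount"] = 3000    # tampering with any copy...
print(valid(chain))          # ...breaks the chain: False
```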

So when someone says they bought some Bitcoin and they're going to make a lot of money, what they mean (whether they realize it or not) is that they paid somebody some money to put them on a list, and they hope that someone later will pay them even more money to get off it.

As for me, I haven't "bought" any. As I write this, cryptocurrency prices are rising fast. But I think what is happening is a kind of run-away positive feedback loop: people are buying in because it is going up, and it is going up because people are buying in. Eventually it will run out of people to buy in, and it will stop going up. Then some people will sell, causing the feedback loop to go the other way: people will sell because it is going down, and it will go down because people are selling.

That being said, one thing in particular about cryptocurrency is making me grumpy about it, even though I don't "own" any. Recall I wrote that you can't normally make yourself a new entry on a blockchain list, but there is a way. You can do an enormous number of computations on a computer in an attempt to find new special numbers that can be used to create new entries on the list. This process is misnamed "mining", but it's more a sort of computerized brute-force mathematical searching. Those computations take a long time and use a lot of electricity. Moreover, even the ordinary transactions generated by people "buying" and "selling" a cryptocurrency are a computational burden, since there are so many copies of the list around the world. Each list is very big: Bitcoin's is more than 100GB, and every copy needs to be updated. This uses electricity too. In fact, digiconomist.net estimates that Bitcoin computations alone presently use up enough electricity to power more than three million US households. Furthermore, the "mining" computers use GPUs that are really good for graphics and machine learning, but because cryptocurrency "miners" are buying them all up, those GPUs are getting harder to find for a good price. Personally, I am not happy with the challenges I am having in finding enough GPU resources for our computer scientists, who are hungry for GPUs for machine learning. While high demand for GPUs is maybe good for GPU manufacturers (for example, according to fortune.com, Nvidia made US$150M in one quarter in 2017 selling GPUs to cryptocurrency "miners"), surely all those GPUs, and all that electricity, can be used for something more useful than cryptocurrency.
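To make the "brute-force mathematical searching" concrete, here is a toy proof-of-work loop. The difficulty and encoding are invented for the example; real Bitcoin mining differs in detail, but the shape is the same: try numbers until a hash happens to satisfy an arbitrary condition:

```python
import hashlib

def mine(block_data, difficulty=4):
    """Brute-force search for a nonce whose hash has the required
    number of leading zero hex digits. Nothing clever: just try the
    next number, over and over."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}:{nonce}".encode()).hexdigest()
        if digest.startswith("0" * difficulty):
            return nonce, digest
        nonce += 1

nonce, digest = mine("example block")
print(digest[:8])   # begins with "0000"
```

At difficulty 4 this takes tens of thousands of hashes on average; each extra zero digit multiplies the work by sixteen, which is where the electricity goes.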

/it permanent link

Thu 09 Mar 2017 12:58

A closer look at topuniversities.com's 2017 rankings for Computer Science.

The QS World University Rankings for 2017 are out, including the subject rankings. For the subject "Computer Science & Information Systems", the University of Toronto does very well, placing tenth.

A closer look at the top ten shows some expected leaders (MIT, Stanford, CMU, UC Berkeley) but some less expected ones, such as Oxford and Cambridge. These are superb Universities with good Computer Science programs, but are their CS programs really among the ten best in the world?

A closer look at how the score is computed sheds some light on this question. The Overall Score is a combination of Academic Reputation, Citations per Paper, Employer Reputation, and H-index Citations. Academic Reputation and Employer Reputation are, in essence, the opinions of professors and employers respectively. While (hopefully) they are reasonably well founded opinions, this is a subjective, not an objective, metric. On the other hand, Citations per Paper and H-index Citations are objective. So I looked at Citations per Paper and H-index Citations for the top forty schools on the 2017 QS Computer Science & Information Systems ranking.

By Citations per Paper, top five of those forty are:

  1. Princeton
  2. Stanford
  3. UT Austin
  4. Washington
  5. UC Berkeley

No MIT? This seems off. So let's look at the top five by H-Index Citations:

  1. Stanford
  2. MIT
  3. UC Berkeley
  4. UI Urbana-Champaign
  5. UT Austin

That looks more reasonable. So let's look at the top twenty by H-Index Citations:

  1. Stanford
  2. MIT
  3. UC Berkeley
  4. UI Urbana-Champaign
  5. UT Austin
  6. Georgia IT
  7. CMU
  8. Tsinghua
  9. Nanyang
  10. ETH Zurich
  11. Washington
  12. Princeton
  13. UBC
  14. Toronto
  15. Waterloo
  16. NU Singapore
  17. UC London
  18. Cornell
  19. UCLA
  20. CU Hong Kong

That's a list that makes more sense to me. While it puts my department 14th instead of 10th, I think I have more confidence in the objectivity of this ordering than I do in the QS Overall Score ordering.
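
For those unfamiliar with the metric: an h-index of h means that h papers have each been cited at least h times. It is straightforward to compute from a list of citation counts; here is a small Python sketch (the function name and sample numbers are mine, not QS's):

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:   # this paper still supports an index of `rank`
            h = rank
        else:
            break
    return h

h_index([10, 8, 5, 4, 3])   # -> 4: four papers have at least 4 citations each
```

Note that the h-index rewards a sustained body of well-cited work rather than one blockbuster paper, which is one reason it ranks departments differently than raw Citations per Paper does.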

/misc permanent link

Thu 02 Feb 2017 13:35

Program Source Code Should be Readable by Human Beings By Definition
Version 3 of the Python programming language made a seemingly innocuous change: tabs and spaces may no longer be mixed for indentation; either tabs must be used exclusively, or spaces. Hence the following is not a valid Python 3 program:

def hello():
	print("Hello")
        print("World")
hello()

If I run it, here's what I get:

% python3 testme.py
  File "testme.py", line 3
    print("World")
                 ^
TabError: inconsistent use of tabs and spaces in indentation

However, the following is a valid Python 3 program:

def hello():
        print("Hello")
        print("World")
hello()

% python3 testme.py
Hello
World

and so is the following:

def hello():
	print("Hello")
	print("World")
hello()

% python3 testme.py
Hello
World

Confused yet?

As you can, or perhaps more to the point, can't see, the problem here is that the first program uses a tab to indent the first print statement, and spaces to indent the second print statement. The second program uses spaces to indent both, and the third program uses tabs to indent both. But because tabs and spaces are both rendered as whitespace, it is difficult or impossible to visually distinguish a correct Python 3 program from an incorrect one by inspecting the source code. This breaks the basic definition of source code: human-readable computer instructions.
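
Since the offending whitespace is invisible to the eye, a mechanical check is the practical remedy (Python itself ships a tabnanny module that can be run as "python -m tabnanny file.py"). Here is a minimal sketch of such a checker; it flags a file that mixes tab-indented and space-indented lines, a deliberate simplification of Python 3's actual per-block rule:

```python
def indent_style(line):
    """Classify the leading whitespace of a single line."""
    stripped = line.lstrip(" \t")
    indent = line[: len(line) - len(stripped)]
    if not indent or not stripped:
        return None                          # blank or unindented line
    if "\t" in indent and " " in indent:
        return "mixed"
    return "tab" if "\t" in indent else "space"

def check_consistency(source):
    """Return (per-line styles, True if the file sticks to one style).

    Simpler than Python's real rule, which compares indentation block
    by block via the tokenizer, but it catches the example above."""
    styles = [(n, s)
              for n, s in ((n, indent_style(l))
                           for n, l in enumerate(source.splitlines(), 1))
              if s is not None]
    kinds = {s for _, s in styles}
    return styles, (kinds <= {"tab"} or kinds <= {"space"})
```

Run against the first program above, this reports line 2 as "tab" and line 3 as "space", making the invisible difference visible.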

No doubt the Python 3 designers had good intentions: to help Python programmers be consistent about indentation. But to me, it seems unreasonable to have a programming language in which syntactically or semantically important distinctions are not clearly visible in the source code.

/it permanent link

Wed 23 Nov 2016 09:48

Slow Windows Update on Windows 7 again? Install two Windows Update patches first.
Back in May, I wrote about Windows Update for Windows 7 taking many hours or even days; the fix then was to install two patches manually first.

The problem has returned. Even if you install the two patches I mentioned in May, you may experience very slow updates on Windows 7.

Happily, again there's a workaround: grab two patches, different from the ones before, and manually install them. Get KB3172605 and its prerequisite KB3020369 from the Microsoft Download Center, and install them manually in numeric order, before running Windows Update. If making a fresh Windows 7 installation, simply install Windows 7 SP1, followed by KB3020369, then KB3172605, and only then run Windows Update. These two patches seem to address the slowness issues: after they were installed on some of our systems here, Windows Update ran in a reasonable amount of time.

/it permanent link

Wed 26 Oct 2016 10:41

Remembering Kelly Gotlieb

On October 16th, 2016, Kelly Gotlieb, founder of the Department of Computer Science at the University of Toronto, passed away in his 96th year. I had the privilege of knowing him. Kelly was a terrific person: brilliant, kind, and humble. He was always willing to make time for people. He was a great thinker: his insights, particularly in the area of computing and society, were highly influential. I never fully realized how influential he was until we, here at the department of Computer Science, created a blog, http://socialissues.cs.toronto.edu, in honour of the 40th anniversary of Social Issues in Computing, the seminal textbook he and Allan Borodin wrote in 1973 in the area of computers and society. I served as editor of the blog, and solicited contributions from the top thinkers in the field. So many of them responded, explaining to me how influential his ideas had been to them, and the blog was filled with insightful articles building in various ways upon the foundation that he and Allan had laid so many years before. I interviewed Kelly for the blog, and he was terrific: even in his nineties, he was full of insights. His mind active and enthusiastic, he was making cogent observations on the latest technologies, ranging from self-driving cars to automated medical diagnosis and treatment.

To me, Kelly epitomized the truth about effective teaching that is all too often missed: teaching is not just about information, teaching is about inspiration. Kelly was a truly inspiring teacher and thinker. He was completely authentic in everything he did, he was full of enthusiasm, and that enthusiasm was infectious. Conversations with Kelly so often left me energized and inspired, thinking along new directions of thought that something he said had triggered, or leaping past obstacles that had previously seemed insurmountable. That is true teaching. Information without inspiration is simply fodder for forgetfulness, but teaching that inspires leads to new insights, integration of ideas, genuine understanding, and a better, clearer and sharper window on the world. Kelly inspired so many people for so many years. We are truly blessed that he was among us. He will be remembered.

/misc permanent link

Sun 16 Oct 2016 18:02

The Price of Google
I am a Canadian still living in the city in which I was born. I love living in Canada, but life in Canada has its price. Al Purdy, the late 20th century Canadian poet, once wrote about Canada as a country where everyone knows, but nobody talks about, the fact that you can die from simply being outside. It is true, of course: almost everywhere in Canada, the winter is cold enough that a sufficient number of hours outside without protection can lead to death by exposure. But this basic fact is designed into pretty much everything in Canadian life, it is simply accepted as a given by well over thirty million Canadians, and we cope: we wear the right winter clothes, we heat and insulate our buildings in winter, we equip our cars with the right tires, and life goes on. Despite the Canadian winter, Canada is a great place to live.

Google offers a lot of very good free web services: it is "a great place to live" on the Internet, and their services are used by hundreds of millions of people all over the world. While Google seems about as far removed from a Canadian winter as you can imagine, there's something in their Terms of Service that people rarely seem to talk about, something that might have a bit of a chilling effect on one's initial ardor.

Google, to its credit, has a very clear and easy-to-read Terms of Service document. Here's an excerpt from the version of April 14, 2014, which is the most current version at the time I write this.

When you upload, submit, store, send or receive content to or through our Services, you give Google (and those we work with) a worldwide license to use, host, store, reproduce, modify, create derivative works (such as those resulting from translations, adaptations or other changes we make so that your content works better with our Services), communicate, publish, publicly perform, publicly display and distribute such content. The rights you grant in this license are for the limited purpose of operating, promoting, and improving our Services, and to develop new ones. This license continues even if you stop using our Services (for example, for a business listing you have added to Google Maps).
Let me pull out for closer examination the most important bits. For readability, I've omitted ellipses.
When you submit content to our Services, you give Google (and those we work with) a worldwide license to use such content for the purpose of our Services. This continues even if you stop using our Services.

As you can see, this is pretty broad. You are granting Google and their partners the right to use your content for Google's Services (present and future) anywhere in the world, forever. While it does say that it must be used for the purpose of their Services, it doesn't limit itself to existing Services and it doesn't constrain what a "Service" might be. Since developing and offering Services, broadly understood, pretty much covers the gamut of what Google does as a company, the answer is Yes: by submitting content to their services, you are granting Google and their partners the right to use your content anywhere in the world, forever, for a broadly unconstrained set of purposes.

So does this mean nobody should use Google? Does the Canadian winter mean that nobody should live in Canada? After all, as Al Purdy writes, in Canada you can die from simply being outside.

Well, no, of course not. While Google has the right to do broadly unconstrained things with the content we submit to them, their self-interest is typically aligned with ours: they want us to entrust our content to them, because they use it to earn the money they operate on. Therefore, to persuade us to keep submitting content, they will work hard to protect and secure the content they already have, in ways they think we consider important. For this reason, I think it's not unreasonable to trust Google with some of my content: I believe they are likely to protect it in sensible ways. Other content I choose not to submit to Google. Just as I am prepared for a Canadian winter, knowing it is the price I pay to live in Canada, I continue to use some Google services, knowing that they will keep and use my content. Many Google services are very good and well worth using, much of my content is not very sensitive, and I trust Google enough to share content with them.

I do wonder, however, how many Google users really understand the rights they are granting to Google. Canada has been around for centuries: the Canadian winter is no secret. But the implications of Google's broad right to use our content are not quite so obvious. It's not really so clear how Google is using the content or might use it in the future, and even if we trust Google, can we trust all those who might put pressure on Google? Quite frankly, we really don't know yet how Google's massive repository of our collective content can be used. We can envision wonderful outcomes: historians a century or two hence coming to insightful conclusions about early twenty-first century society, for example. But we can also envision outcomes not quite so sanguine: for example, a twenty-first century version of Orwell's 1984, a dystopian world of "thought-crimes" and "doublespeak" where content is scanned for dissent from a prevailing ideology. A certain degree of caution is warranted: in the case of Google, unlike Canada, we may not yet have seen how severe winter can be. Yes, use Google, but use it knowing what you are doing.

One last thing to be said: I focus on Google here, but the same issues hold for Facebook, Twitter, Yahoo and other purveyors of free services over the Internet. Read their Terms of Service to learn what rights you are granting by your use of their services, and decide on the basis of that knowledge how to use their services, and even whether you use their services at all. After all, even Canadians sometimes choose to spend winter in Florida, Mexico, or Arizona.

/it permanent link

Mon 16 May 2016 20:29

The Sun-Managers Mailing list: a Knowledge Sharing Success Story
Sun-Managers was an email mailing list for system administrators of computers made by Sun Microsystems, Inc. The list operated from mid-1989 to the fall of 2014, and I was privileged to be part of it for almost all of its history. Sun-Managers was founded in May of 1989 by William (Bill) LeFebvre, at Northwestern University. At the time, Bill ran Sun-Spots, a digest-format mailing list for system administrators of Sun systems, but the digest format made it difficult for people to ask questions and get a timely response. He created Sun-Managers, an unmoderated mailing list intended for short-turnaround-time questions. This was an immediate success: so much so that by the fall of 1989, the sheer number of messages on the list was swamping mailboxes. In November 1989, Bill instituted a simple policy: if someone asks a question on the list, other list members were expected to reply by email directly to the person asking the question, not to the list. The person asking the question, in turn, was expected to summarize the answers received, and send the summary to the list.

I joined the list about this time: I had started a new job at the University of Toronto's Computer Science department, a role that included the administration of a number of Sun workstations and servers. I was looking for resources to help me with my Sun system administration tasks, and this list was an excellent one. Because of this summary policy, the list volume was manageable enough that I could keep up, yet the turnaround time on questions was short. I mostly "lurked" at first, reading but not replying. I felt too inexpert to answer many questions, and too shy to ask. However, I learned a great deal from what I read. Moreover, the summaries were archived, and this archive became a resource in itself, a knowledge-base of practical information about administering Sun systems.

The list grew very rapidly: 343 summaries in 1990, and over 1000 in 1991. In August of that year, it was noted that certain questions were being asked often, and rather than waste effort answering the same question several times, a "Frequently Asked Questions" (FAQ) file was instituted. The first version was created by a list member from Boston University, and quickly grew to dozens of answers.

By November of 1992, the list had grown to thousands of members, and the workload of managing the list, editing the FAQ and coaching list members on how to follow the list policy had become significant. Many list members were not individuals, but "mail exploders": email addresses that themselves were mailing lists going to multiple individuals at a given site. This made handling list membership issues more complex. Bill LeFebvre decided to hand the list over to others. Two list members stepped up: Gene Rackow from Argonne National Laboratory to run the list software, and me, to handle the FAQ and policy work. By this time, I had benefitted from the list for a while, and I felt it was time to "give back". At the time, I wasn't in a position to actually run the list: I'd just taken on a new role as system manager of the University of Toronto Computer Science Department's teaching laboratories, and had my hands full, but I could certainly help with content. I was really glad to work together with Gene, a seasoned system administrator, on this rapidly growing list, which we moved to a system at Argonne National Labs, where Gene worked.

The list continued to grow through the 1990s. During this time, Sun Microsystems was quietly supportive, helping Gene with hardware (a Sparcstation 1) as the list grew. By 1996, over two thousand summaries a year were being produced, peaking at 2243 in 2002. In May of 1998, Gene Rackow handed over list management to Rob Montjoy from the University of Cincinnati, who in turn handed over list management to Bill Bradford in November of 2000. The list was moved from Argonne National Labs to a system in Austin run by Bill. I continued to manage the list policy and edit list information files, such as a "think before posting" reminder and the FAQ which had grown to 79 questions by December 2000. This had become a bit too large, and so 19 questions deemed less frequently asked were trimmed. A further trim was made in 2005, reducing a 65-question FAQ to one under 60.

By 2002, the list had reached over five thousand members and the workload of running the list software and managing the list subscriptions had become too much for one person. Dan Astoorian, my colleague at the University of Toronto, stepped in to help, and he was sorely needed. Moreover, the list server hardware was feeling the strain: by mid-2001, list members were being asked to contribute used equipment to upgrade the server. This was resolved in April 2003, when the list was migrated to a machine at the University of Toronto that had been donated to the University by Sun Microsystems.

But times were changing. Linux was growing rapidly and Sun's business was being affected. The web provided more resources for people seeking help administering their systems, and fewer were relying on mailing lists. The list fell below 2000 summaries per year in 2003, under 1200 in 2004, and dropped below 1000 in 2005. By 2008, summaries per year had fallen to about 300, fewer than in any full-year period previously. Sun Microsystems ran into significant difficulties during the economic downturn that year, and was sold to Oracle the following year. As for the list, in 2009, there were just over 200 summaries, declining to fewer than 100 in 2011. More disturbingly, the ratio of summaries to questions was steadily declining, from over 24% in 2001 to less than 16% by 2010: for some reason, list members were becoming less diligent in summarizing responses back to the list. Summaries and list traffic in general continued to decline rapidly: there were just over 50 summaries in 2012, and fewer than a dozen in 2013. In 2014, there were only three by October, when a hardware failure provided a good excuse to retire the list.

The Sun-Managers mailing list, over its twenty-five year lifetime, provided help to many thousands of system administrators, producing over 29000 summaries, an archive of which continues to be available. Special thanks is due to the superb people I was privileged to work together with on the list over the years: William LeFebvre, Gene Rackow, Rob Montjoy, Bill Bradford, and Dan Astoorian. Gratitude, also, is due to the thousands of list members who so freely shared their knowledge and expertise with others.

The list summary archive, and an account of the list's history (on which this blog entry is based) is available at http://sunmanagers.cs.toronto.edu. The list's official web page, http://www.sunmanagers.org, continues to be maintained by Bill Bradford.

/it permanent link

Mon 09 May 2016 10:54

Slow Windows Update on Windows 7? Install two Windows Update patches first.
Recently, I noticed Windows Update taking many hours or even days on Windows 7, especially for new installs/reinstalls. Task manager shows svchost.exe exhibiting large memory usage (suggestive of a memory leak) and/or sustained 100% CPU.

Happily, there's a workaround: grab a couple of patches to Windows Update itself, and manually install them. Get KB3050265 and KB3102810 from the Microsoft Download Center, and install them manually in that order, before running Windows update. These two patches seem to address the issues: after they were installed on some of our systems here, Windows Update ran in a reasonable amount of time (an hour or two perhaps on slow systems when many updates are needed, but not days).

/it permanent link

Fri 04 Mar 2016 10:25

Apple vs FBI: it is about setting a precedent.
There seems to be lots of confusion about Apple's current dispute with the FBI, despite Apple's message to their customers of Feb 16, 2016, where they tried to explain the issue. Here's the issue in a nutshell.

The FBI has an Apple iPhone that was the work-phone of a now-dead terrorist. The FBI wants to read what is on that phone. But the phone is encrypted, and runs a secure version of iOS. The FBI wants Apple to make an insecure version of iOS to run on that phone, so that the FBI can break into the phone and read the contents. Apple has, so far, refused.

This issue will no doubt be addressed in the US courts and legislatures. What is at stake is the precedent it sets. The essential question is this: to what extent should law enforcement be able to compel others to assist them with an investigation? Should software developers be expected to make insecure versions of their software, so that law enforcement can "break in"? It will be very interesting to see how this plays out.

/it permanent link

Fri 13 Mar 2015 11:08

Apple's new MacBook laptop: like a tablet?

I rarely write about Apple's products because they have no shortage of press already: Apple has superb marketing, and many of their products are remarkable in one way or another, often for excellent design and engineering. Their new super-thin MacBook laptop is no exception: it's very thin and light, has a superb high-resolution screen, a carefully redesigned trackpad and keyboard, and is very power-efficient. New to this machine is the fact that it has only a single USB-C port for power, data, and video (it also has a headphone port for audio). Most laptops have many more ports than this. A USB port used for both power and data, and a headphone port, but nothing else, is more typical of a tablet than of a laptop. Indeed, some of the press seems to have really latched onto this "tablet" comparison. Brooke Crothers of Foxnews/Tech claims that the MacBook is "almost a tablet" and states that the MacBook "is an iPad with a keyboard" while Lily Hay Newman of Slate claims that "you should think of the new macbook as a tablet". So how true is this? Is the new MacBook like a tablet?

Well, no, it's not. The MacBook's screen is not touch-capable, and is not capable of being used like a tablet screen. The keyboard and touchpad are an integral part of the machine: they are not optional or detachable. It runs a desktop/laptop operating system (Mac OS X), not a tablet operating system such as iOS. The device is not a tablet, it is not "almost a tablet", it is not even like a tablet. It's a small, light, power-efficient laptop. If it must be compared to something, perhaps it can be compared to a netbook, though it has a much better keyboard, touchpad and screen, and is much more expensive.

Then what about the single I/O port? That's simply a consequence of the new USB 3.1 and USB Type-C specifications, which finally allow a USB connection to deliver enough power to run a laptop, and which provide, alongside the USB data lines, "alternate mode" data lines that can carry display protocols like DisplayPort. This makes it possible for Apple to build multiport adapters for the MacBook that provide video (e.g. HDMI), data (USB-A) and charging ports, making it unnecessary to provide all those ports separately in the laptop itself.

So does this make the MacBook "like a tablet"? While it is true that tablets have been using single connectors for power and data for a long time, this doesn't make the MacBook tablet-like. It's not the presence of a single shared power/data connector that makes something like a tablet, it's the interactive screen. Yes, a horse has four legs and is often sat upon, but a horse is not anything like a chair.

So will I be getting one of the new MacBooks? Probably not: like a fine thoroughbred, the new MacBook is lovely but rather too expensive for me. The need to buy the multiport adapter separately makes the already high cost of acquisition even higher. The high price doesn't stop me from admiring the design and engineering of this new laptop, but it does keep me from buying one.

/it permanent link

Sat 05 Oct 2013 17:03

What's wrong with Blackberry? (and some ideas about how to fix it)
Blackberry is in the news a fair bit these days, and the news seems to be all bad. As the firm reports close to a billion dollars in quarterly losses, a Gartner analyst recommends that enterprise customers find alternatives to Blackberry over the next six months. What's the problem?

Basically, fewer and fewer people want to buy Blackberry phones. The problem isn't so much that Blackberries don't do what they're supposed to, it's that people now perceive iPhones and various Android phones as much better choices, and are buying those instead. Why? The reason is that an iPhone or an Android phone isn't the same sort of phone as a traditional Blackberry. An iPhone or an Android phone is a true smartphone, i.e. an "app" phone, a platform that runs a whole "ecosystem" of third party software. A traditional Blackberry is a "messaging" phone, a device that specializes in effective messaging, such as email. Yes, it can run applications too, but that's not its primary function, and it shows.

To illustrate, consider email. Sending email requires the ability to type quickly. A physical keyboard works best for this, one that stretches across the short side of the phone. The screen, located above the keyboard, then becomes roughly square: it can't be very wide, because the phone would then become too wide to hold easily or to fit in one's pocket, and it can't be very tall or the phone will become too long. A square screen is fine for messaging, but for other things that a smartphone might like to do, such as displaying video, one wants a screen that is significantly wider than it is tall. A smartphone handles this by having a rectangular screen: when doing messaging, one holds the phone vertically: the bottom half of the screen then turns into a keyboard, and the top half turns into a roughly square messaging display. When watching media, such as videos, the phone is held horizontally, allowing a screen that is wider than it is tall. Hence the smartphone is useful in a broader set of ways: it is not just a messaging device. Smartphones have become good enough at messaging that many people do not feel they need a dedicated messaging device. Once the smartphone is the only device that people feel they need to carry, there's much less demand for a messaging phone.

Blackberry realized the problem, and tried to create a smartphone of its own. For instance, in 2008, it released the Blackberry Storm. But it became clear that Blackberry's phone OS was not as well suited for general smartphone use as iOS and Android. The Storm was not a commercial success because it did not work as well as competing phones. In response, in 2010 Blackberry bought a company called QNX that had a powerful OS, and started building devices to use it: first the Playbook, released in spring 2011, and then the Z10 phone in early 2013, followed a few months later by the Q10 and other phone models.

The new Blackberry OS works better than the old at delivering smartphone apps, but it was not very mature in 2011, and was available only on a tablet (the Blackberry Playbook). Unfortunately, the Playbook did not sell particularly well because Blackberry badly misrepresented it, calling it the "best professional-grade tablet in the industry" though it lacked many features of the market-leading iPad, including key messaging features such as a standalone email client. While it could have been a market success if marketed as a Blackberry phone accessory, a role it could play effectively, at release it was clearly not a true general-purpose tablet like the iPad. So it accumulated few apps, while Apple's iOS and Google's Android accumulated many. Blackberry realized this fairly quickly, and released an Android application emulation environment for its OS in early 2012, which allowed many Android apps to be easily moved over to the new OS. But few Android developers bothered to make Blackberry versions of their Android apps, given the relatively few Playbooks sold.

In the meanwhile, Blackberry did itself no favours by making it clear that there was no future for its existing phones, while failing to deliver a phone running its new OS for more than a year. This merely encouraged Blackberry users and app developers alike to switch to another platform. When the Z10 phone finally came out in 2013, the bulk of its apps were those that had been written for or ported to the Playbook, a far less rich set of applications than any Android or iOS phone. And while the Z10 is a decent phone that comes with some very nice messaging features, Blackberry did not do an effective job of touting the unique features of the Z10 that iPhones and Android phones do not have. Moreover, the price was set high (about the same as an iPhone or high end Android phone) and Blackberry produced a huge number, expecting to sell a great many. Some sold, but many didn't, and Blackberry's recent $1B loss was due primarily to writing down the value of unsold Z10s.

Blackberry sits today in a difficult position. No, it is not about to go out of business: the company is debt-free and has a couple of billion dollars in the bank. But its smartphone is not selling. What should it do now?

Blackberry's best chance at this point to make its smartphone platform viable is to take its large inventories of written-down Z10 phones and sell them cheaply, using a renewed marketing campaign that focuses on the unique features of the phone's software. The Z10 hardware is really no different than the various Android and iPhone models out there: if the phone is to sell, it has to be on the basis of what makes it unique, and that's the Blackberry OS software. For instance, Blackberry should show everyone the clever virtual keyboard that supports fast one-handed typing, the unique messaging hub, and the "Blackberry Balance" software that lets you separate work items from personal items on the phone. Blackberry needs to hire the best marketing people in the world to help get the message out. This is a "make or break" situation for the platform.

Secondly, Blackberry should modify the OS to run Android apps natively, without repackaging. Android app developers are not going to repackage their apps for Blackberry. Blackberry needs to recognize this and make sure that Android apps will appear automatically on Blackberry devices. Blackberry will need to find a way to get Google Play (the Android app store) ported to the platform. It is too late to build a separate app ecosystem around the Blackberry OS: it has to leverage an existing ecosystem, or die. Android is really the only viable option for Blackberry right now.

Finally, Blackberry needs to recognize that a niche market for dedicated messaging devices exists, and continue making devices that are the best messaging phones available, while tapping into an existing app ecosystem. Blackberry needs to be careful not to compromise the devices' effectiveness for messaging: it should pay attention to how people use the devices in the real world, and address quickly whatever issues they have. If Blackberry can't find a way of building such messaging devices using its own OS, it should switch to Android. Blackberry knows how to make superb messaging phones, and it should find a way to continue to do what it does best.

/it permanent link

Tue 20 Aug 2013 22:45

Cloud Computing: Everything Old is New Again
There is a great deal of hype about Cloud Computing at the moment. It's no wonder: when firms such as Netflix, with a market capitalization of over U$15B, use cloud computing to deliver streaming video services to nearly forty million customers around the world, and when the US Central Intelligence Agency spends U$600M for cloud computing services, people take notice. But what is it all about?

Cloud computing is not really a new thing, it's a variation of a very old idea, with a new name. In the 1960s, when computers were large and expensive, not everyone could afford their own. Techniques for sharing computers were developed, and firms arose whose business was selling time on computers to other firms. This was most commonly described as "timesharing". IBM released its VM virtualization environment in 1972, which allowed a mainframe computer to be divided up into virtual computers, each for a different workload. A timesharing vendor could buy and operate an IBM computer, then rent to their customers "virtual computers" that ran on that machine. From the customer's perspective, it was a way to obtain access to computing without buying one's own computer. From the vendor's perspective, it was a way of "renting out" one's investment in computer infrastructure, as a viable business.

Today, cloud computing, as did timesharing in the past, involves the renting of virtual computers to customers. The name has changed: then, it was called "timesharing"; now, "cloud computing". The type of physical machine has changed: then, a mainframe was used to provide computing services; now, a grid computer. The interconnection has changed: then, leased data lines were typically used; now, the internet. But the basic concept is the same: a vendor rents virtual computers to customers, who then use the virtual computers for their computing, rather than buying their own physical computers.

The advantages and disadvantages of today's cloud computing echo the pros and cons of yesterday's timesharing. Advantages include risk sharing, the ability to pay for just the amount of computing needed, the option to scale up or down quickly, the option to obtain computing resources without having to develop and maintain expertise in operating and maintaining those resources, and the ability to gain access to computing resources in very large or very small quantities very quickly and easily. Moreover, cloud computing vendors can develop economies of scale in running physical computers and data centres, economies that they can leverage to decrease the cost of computing for their customers. Disadvantages of cloud computing include possibly higher unit costs for resources (for example, cloud data storage and data transfer can be very expensive, especially in large quantities), a critical dependence on the cloud computing vendor, variable computing performance, substantial security and privacy issues, greater legal complexity, and so on. These tradeoffs are neither surprising nor particularly new: in fact, many are typical of "buy" vs. "rent" decisions in general.

Then why does cloud computing seem so new? That, I think, is an artifact of history. In the 1970s and early 1980s, computers were expensive and timesharing was popular. In the 1990s and early 2000s, computers became increasingly cheaper, and running one's own became enormously popular. Timesharing faded away as people bought and ran their own computers. Now the pendulum is swinging back, not driven so much by the cost of computers themselves as by the cost of the datacentres to house them. A few years ago, Amazon Inc. saw a business opportunity in making virtual machines available for rental: it was building grid computers (and datacentres to house them) for its own operations anyway; why not rent out some of those computing resources to other firms? In so doing, Amazon developed an important new line of business. At the same time, a huge number of new internet firms arose, such as Netflix, whose operations consist predominantly or exclusively of providing various computer-related services over the internet, and it made a great deal of sense for such firms to use Amazon's service. After all, when a company's operations are primarily or exclusively serving customers on the internet, why not make use of computing resources that are already on the internet, rather than build private datacentres (which takes time, money and expertise)? These new internet firms, with lines of business that were not even possible a decade or two ago, and Amazon's service, also only a few years old, have lent their sheen of newness to the notion of "cloud computing" itself, making it appear fresh, inventive, novel. But is it? The name is new, yes. But in truth, the concept is almost as old as commercial computing itself: it has merely been reinvented for the internet.

Of course, the computing field, because of its inventiveness, high rate of change and increasing social profile, is rather at risk of falling into trendiness, and cloud computing certainly has become a significant trend. The danger of trendiness is that some will adopt cloud computing not on its own merits, but solely because it seems to be the latest tech tsunami: they want to ride the wave, not be swamped by it. But cloud computing is complex, with many pros and cons; it is certainly a legitimate choice, as was timesharing before it, but it is not necessarily the best thing for everyone. It's easier to see this, I think, if we look beyond the name, beyond the trend, and see that the "rent or buy" question for computing has been with us for decades, and the decision between renting virtual machines and buying physical ones has often been complex, a balance of risks, opportunities, and resources. For an internet firm whose customers are exclusively on the internet, renting one's computing assets on the internet may make a great deal of sense. For other firms, it may not make sense at all. Deciding which is true for one's own firm takes wisdom and prudence; a healthy dose of historical perspective is unlikely to hurt, and may help cut through the hype.

/it permanent link

Tue 23 Apr 2013 12:56

Handling Unsolicited Commercial Email

My email address is all over the web: at the time of writing this, a search on google for my email address produces about 15,800 results. So anyone who wants to find my email address can do so easily. Many people or companies who want to sell me something send me email out of the blue. I get a great deal of such unsolicited commercial email, too much to read or pay adequate attention to. I simply delete them. Unfortunately, many sources of such email persist. So for some time now, I've enlisted the help of technology. I process my incoming email using procmail, a powerful piece of software that lets me script what happens to my email. When I receive unsolicited commercial email, if it is from a vendor or organization I don't have a relationship with, I will often add a procmail rule to discard, unseen, all future email messages from that vendor. I've got about 400 organizations (mostly vendors) in my discard list so far, and the list slowly grows. Am I still getting unsolicited commercial email from these sources? I am, but I am not seeing it. It's the same effect, really, as manual deletion (i.e. the message is deleted, unread), but it's easier for me, because I am not interrupted. But of course I think it would be better still if the email were not sent at all.
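For illustration, a minimal discard rule of the sort I describe looks something like the recipe below, placed in ~/.procmailrc. The vendor domain here is hypothetical; real rules often match Return-Path or other headers as well, since From: lines vary.

```
:0
* ^From:.*@vendor\.example\.com
/dev/null
```

The `:0` starts a recipe, the `*` line is a regular-expression condition on the message headers, and delivering to /dev/null silently discards the match.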

If you are a vendor with whom I do not have a pre-existing relationship, and you want to send me email introducing your products, please don't. I do not accept cold salescalls either. Instead, advertise effectively on the web, so that if I am looking for a product like yours, I can find you. If you must contact me directly, send me something by postal mail, where, unlike email, the communication does not have an interruptive aspect.

/misc permanent link

Thu 29 Nov 2012 00:00

A closer look at the University of Toronto's international ranking in Computer Science.

International rankings of universities seem to be all the rage these days. The interest seems to be fed by three rankings of particular prominence that have emerged in the past decade. These are Shanghai Jiao Tong University's Academic Ranking of World Universities (sometimes known as AWRU, or simply as the "Shanghai Ranking"), Quacquarelli Symonds' QS World University Rankings, and the Times Higher Education World University Rankings. Part of the attractiveness of these rankings is that they can become a way of "keeping score", of seeing how one institution does in comparison to others.

My employer, the University of Toronto, does quite well in these rankings, particularly my department, Computer Science. The subject area of Computer Science is not ranked separately in the Times Higher Education World University Rankings (it's bundled together with Engineering), but in the other two, Toronto has consistently ranked in the top ten in the world each year in Computer Science, with only one exception.

This exception is recent, however, and worth a closer look. In the QS World University Rankings for Computer Science and Information Systems, Toronto dropped from 10th in 2011 to 15th in 2012. This big drop immediately raises all sorts of questions: has the quality of Toronto's Computer Science programme suddenly plummeted? Has the quality of Computer Science programmes at other universities suddenly soared? Or has the QS World University Rankings changed its methodology?

To answer this question, let's look at how other universities have changed from 2011 to 2012 on this ranking. Many (MIT, Stanford, Berkeley, Harvard, Oxford, Cornell, and others) stayed where they were. Others dropped precipitously: Cambridge University dropped from 3rd to 7th, UCLA from 8th to 12th, and Caltech plummeted from 7th to 27th. Some other universities went up: Carnegie Mellon University (CMU) went from 9th to 3rd, ETH Zurich from 11th to 8th, the National University of Singapore (NUS) from 12th to 9th, and the Hong Kong University of Science and Technology (HKUST) soared from 26th to 13th. Surely these curious and significant changes reflect a methodology change? But what?

The QS university rankings website, in the Methodology section, Academic subsection, reveals something of interest:

	NEW FOR 2012 - Direct Subject Responses

	Until 2010, the survey could only infer specific opinion on
	subject strength by aggregating the broad faculty area opinions
	of academics from a specific discipline. From the 2011 survey
	additional questions have been asked to gather specific opinion
	in the respondent's own narrow field of expertise. These responses
	are given a greater emphasis from 2012.
To understand this change, it needs to be recognized that the QS rankings rely heavily on the opinions of academics. A large number of academics around the world are surveyed: the QS rankings website indicates that in 2012, 46079 academic responses were received, of which 7.5% addressed Computer Science. Given its impact on the 2012 results for Computer Science, the seemingly modest change made in 2012, to weigh more heavily the opinions of academics about their own field, leads one to wonder about the regional distribution of academics in Computer Science in comparison to academics in other disciplines. One significant factor may be China.

In 1999, courses in the fundamentals of computer science became required in most Chinese universities, and by the end of 2007, China had nearly a million undergraduates studying Computer Science. While the QS rankings do not indicate regional distribution by discipline for the academics whose opinions they consult, the surge in the number of Chinese computer scientists worldwide in the past decade almost certainly must have an effect on the regional distribution of academics in Computer Science as compared to other disciplines. As such, is it any surprise to see universities with strong Computer Science programmes that are prominent in China (such as HKUST and NUS) climb significantly in the rankings, and others less prominent in China plummet? But if a world ranking of universities is so affected by regional shifts in those whose opinion is being solicited, how reliable is it as an objective gauge of the real quality of a given university?

Perhaps a more reliable gauge of quality can be found in the Shanghai ranking, which is not opinion-based, but relies on concrete indicators and metrics. On the Shanghai ranking, the University of Toronto ranked 10th in the world in Computer Science in each of 2010, 2011, and 2012. But what does this mean, concretely?

To answer this question, we need to grapple with an important fact: in Computer Science, the US dominates. As a nation, the US has been enormously supportive of Computer Science ever since the field first existed, and as a result, it has become pre-eminent in computing. Nine of the top ten schools in the Shanghai ranking, and twenty of the top twenty-five, are in the US. For the University of Toronto to be one of the handful of universities outside the US to break into the top twenty-five, and the only one to break into the top ten, is a significant accomplishment. A chart is illustrative:

Of course, the University of Toronto is in Canada, so a comparison to other schools in Canada is also illustrative. For Computer Science, on the Shanghai ranking, there seems to be no close Canadian rival. In 2012, UBC comes closest, falling only a few points short of breaking into the top 25, but all other Canadian schools rank well back:

Even compared to other disciplines that have Shanghai rankings (only science, social science, and related disciplines seem to be ranked), Toronto's pre-eminence in Computer Science in Canada is striking:

From a score-keeping perspective, I think we can conclude that the University of Toronto is doing very well in Computer Science with respect to other universities in Canada, and it is one of the few non-US schools that can keep up with the US in this field.

But all this needs to be put into perspective. After all, rankings are not a full picture: they're aggregations of metrics of varying value, they represent a formulaic approach to something (university education) that cannot always be so conveniently summarized, and they reflect methodologies chosen by the producers of the rankings, methodologies that may not always best reflect objective quality. Of course, if the University of Toronto were to climb to fifth, I'd be pleased, and if it were to drop to fifteenth, I'd be disappointed: surely the score-keeper in me can be allowed this much. But in the overall scheme of things, what matters most for Computer Science at Toronto is not our score on a ranking system, but the objective quality of our programme, the learning outcomes of our students, and the impact of our research, and these things, not our score on rankings, must always remain our top priorities.

/misc permanent link

Wed 22 Aug 2012 14:07

Intel desktop CPU price-performance: Hyperthreading not helping?
Typically, CPU prices follow performance. Faster CPUs command higher prices; slower CPUs are available for less. Recent Intel desktop CPUs continue to show this general pattern, but there appears to be more to the story than usual.

At first glance, everything seems to be what you would expect. Using current pricing in US$ at time of writing from newegg.com, we get:
Processor      PassMark   Price   PassMark/$   Price-Performance vs G640
Pentium G640       2893     $79         36.6   100%
i3-2120            4222    $125         33.8   92.2%
i5-3570            7684    $215         35.7   97.6%
i7-3770           10359    $310         33.4   91.3%
The PassMark (http://www.cpubenchmark.net/) to dollar ratio is pretty consistent across all these processors, roughly 35 ± 2.

But what happens if we look at a more real-life benchmark? Consider SPEC CPU 2006 Integer (CINT2006) Baseline. For each CPU, I used the CINT2006 Baseline results from the most recently reported Intel reference system, as reported on spec.org. In the case of the G640, no Intel reference system was reported, so I used the results for a Fujitsu Primergy TX140 S1p.
Processor      CINT2006 Base   Price   CINT/$   Price-Performance vs G640
Pentium G640            34.4     $79     0.44   100%
i3-2120                 36.9    $125     0.30   67.8%
i5-3570                 48.5    $215     0.23   51.8%
i7-3770                 50.5    $310     0.16   37.4%
When looking at CINT2006 Baseline, we see the price-performance ratio drop off dramatically as the processor price increases. We would expect this from the i3 to the i5, since SPEC CINT2006 is a single-job benchmark and the i3-to-i5 step is a transition from two to four cores, but it's curious to see the dropoff in the price-performance ratio between the G640 and the i3 (both dual-core CPUs), and the i5 and the i7 (both quad-core CPUs). What might be going on?

A look at hyperthreading may provide some answers. Intel hyperthreading is a feature of some Intel CPUs that allows each physical core to represent itself to the OS as two different "cores". If those two "cores" simultaneously run code that happens to use different parts of the physical core, they can proceed in parallel. If not, one of the "cores" will block. The i3 and i7 CPUs offer hyperthreading, the Pentium G and i5 do not. It turns out that the PassMark benchmark sees significant speedups when hyperthreading is turned on. SPEC CINT2006, and many ordinary applications, do not.

What about SPEC CINT2006 Rate Baseline, then? The SPEC CPU Rate benchmarks measure throughput, not just single-job performance, so maybe hyperthreading helps more here? Let's see:
Processor      CINT2006 Rate Base   Price   Rate Base/$   Price-Performance vs G640
Pentium G640                 61.7     $79          0.78   100%
i3-2120                      78.8    $125          0.63   80.7%
i5-3570                       146    $215          0.68   87.0%
i7-3770                       177    $310          0.57   73.1%
If we look at the transition from two to four cores (by comparing the i3 to the i5), we now see that the price-performance of the i5 is better than the i3: this is no surprise, since we are now measuring throughput, and from the i3 to the i5, we go from two to four cores. But there still is a dropoff in price-performance between the Pentium G and the i3, and again between the i5 and the i7. It's not as extreme as before, but it is still significant. This suggests that hyperthreading may help with throughput, but not as much as the increase in price would suggest.
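The per-dollar figures in the tables above are easy to recompute. A short Python sketch, using the prices and benchmark scores quoted above (minor rounding differences from the tables are possible):

```python
# Prices (US$) and benchmark scores as quoted in the tables above.
cpus = {
    "Pentium G640": {"price": 79,  "passmark": 2893,  "cint": 34.4, "rate": 61.7},
    "i3-2120":      {"price": 125, "passmark": 4222,  "cint": 36.9, "rate": 78.8},
    "i5-3570":      {"price": 215, "passmark": 7684,  "cint": 48.5, "rate": 146},
    "i7-3770":      {"price": 310, "passmark": 10359, "cint": 50.5, "rate": 177},
}

def relative_ratios(metric):
    """Per-dollar score for each CPU, as a percentage of the G640's ratio."""
    base = cpus["Pentium G640"][metric] / cpus["Pentium G640"]["price"]
    return {name: round(100 * (d[metric] / d["price"]) / base, 1)
            for name, d in cpus.items()}

# PassMark per dollar stays near the G640's; CINT and Rate fall off:
print(relative_ratios("passmark"))
print(relative_ratios("cint"))
print(relative_ratios("rate"))
```

Running this reproduces the pattern discussed above: the PassMark-per-dollar percentages cluster in the 90s, while the CINT2006 percentages fall away steeply with price.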

What does this mean, then? It suggests that the price increase from a non-hyperthreaded to a hyperthreaded Intel desktop processor may reflect an increase in PassMark performance more than an increase in real-world performance. Hyperthreading may have a positive effect, it seems, but typically not as much as PassMark suggests. At present, for best real-world price-performance in Intel desktop CPUs, I would consider models without hyperthreading.

/it permanent link

Tue 26 Jun 2012 16:56

How to avoid being fooled by "phishing" email.
A "phishing" email is an email message that tries to convince you to reveal your passwords or other personal details. Most often, it tries to send you to a website that looks like the real thing (e.g. your bank or your email provider) but is really a clever duplicate of the real website that's set up by crooks to steal your information. Often the pretence looks authentic. If you fall for it and give your password or other personal details, criminals may steal your identity, clean out your bank account, send junk email from your email account, use your online trading account to buy some penny stock you never heard of, send email to all the people in your address book telling them you're stranded in a foreign country and need them to wire money immediately, or do any number of other bad things.

But there's a really easy way to avoid being fooled by phishing messages. If you get a message that asks you to confirm or update your account details, never, ever go to the website using a link that is in the email message itself. Remember, anyone can send you a message with any sort of fraudulent claim, containing any number of links that pretend to go to one place, but really go to another. So if you feel you must check, go to the website that you know for sure is the real thing: use your own bookmark (or type in the URL yourself), not the link in the message.
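To see how a link can pretend to go to one place but really go to another, consider what's actually inside the HTML of a phishing message: the visible text of a link and its real destination are independent. This short Python sketch (the addresses are made up for illustration) extracts both:

```python
from html.parser import HTMLParser

class LinkAuditor(HTMLParser):
    """Collect (real destination, visible text) pairs for each link."""
    def __init__(self):
        super().__init__()
        self.links = []      # list of (href, visible text)
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href", "")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

# A link whose visible text claims one destination but points to another:
email_body = '<p>Please log in at <a href="http://evil.example.net/login">www.mybank.com</a></p>'
auditor = LinkAuditor()
auditor.feed(email_body)
for href, text in auditor.links:
    print(f"link text says {text!r}, but it actually goes to {href!r}")
```

The link reads "www.mybank.com" but leads to evil.example.net, which is exactly why typing the address yourself, or using your own bookmark, is the safe habit.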

/it permanent link

Thu 15 Dec 2011 15:14

Dealing with unsolicited salescalls (cold calls).

For many years, I've been plagued by unsolicited salescalls. It's not very hard to find my phone number, and various people (mostly in the IT realm) call me up out of the blue hoping to sell me something. The interruption is unwelcome, even if the product isn't.

For some years now, my policy is to explain to the caller that I don't accept unsolicited salescalls, sincerely apologize, and end the call. Occasionally, I am then asked how I am to be contacted. I explain that I prefer to do the contacting myself: when I have a need, I am not too shy to contact likely vendors and make inquiries about their products.

Occasionally I run into someone who is offended by my unwillingness to take their unsolicited salescall. I do feel more than a little sympathy for the salesperson when this happens: I imagine they may think I objected to something they did, or to their manner. The fact is, I handle all unsolicited salescalls this way. As for whether it is intrinsically offensive to reject unsolicited salescalls out of hand, I don't think it is. Indeed, it is natural for a salesperson to want their salescall, even if unsolicited, to be better accepted. But it is unreasonable for any salesperson to expect that unsolicited sales inquiries to strangers will always be welcome. Still, I do apologize, each time, and in general, when I so quickly end telephone conversations with salespersons who call me out of the blue.

Dear reader, if you are a salesperson, and you are tempted to contact me to sell me something, please do not call. Instead, just advertise generally (and if you must, send me some mail in the post). Trust me to find you when the need arises. I frequently do.

/misc permanent link

Tue 26 Jul 2011 17:15

Gigabit ethernet, and Category 5, 5e cabling.
There seems to be lots of folklore that says that Category 5 (Cat5) cabling can't run gigabit ethernet. Contrary to widespread belief, that's mostly false. Here's the situation. Cat5 has the bandwidth to run 1000baseT. But early experience with 1000baseT showed that 1000baseT was pickier about certain cabling issues that weren't specified in the Cat5 standard, such as crosstalk and delay skew, so the Cat5 standard was enhanced for 1000baseT to enforce limits on these. This enhanced standard is called Cat5e. But the fact is that most Cat5 installations already perform to the Cat5e spec.

If someone tells you to rip out a Cat5 installation because it can't support 1000baseT, you're being prompted to do something that is expensive and probably unnecessary. All you generally need to do is test the existing cables to the Cat5e standard (using a Cat5e cable tester) and replace the ones that fail. Often, most if not all the cables will be fine. Or just use the cables for 1000baseT and replace any that exhibit problems.

Cat6 and Cat6a are a different matter. Cat6 supports a spectral bandwidth of 250 MHz, up from Cat5/Cat5e's 100 MHz, while Cat6a supports 500 MHz. Cat6 cabling will run ten gigabit ethernet (10GbaseT) to 37-55m, while Cat6a will run 10GbaseT to 100m. So it's worth choosing Cat6 or Cat6a over Cat5e for new cabling, if the cost increment isn't too high, so that the cabling can support 10GbaseT, even if it's not needed today.

/it permanent link

Mon 30 May 2011 21:26

Einstein's special relativity isn't as complicated as many people seem to think.

I run into people who think that special relativity is some sort of mysterious thing that only Einstein and physicists can understand. But it's not. It's a bit weird, but it's no weirder than the earth being a globe.

Originally people thought that light moved like any other moving object. Einstein thought about this and wondered: what would happen if you followed some light and sped up until you travelled at the same speed as it? Then light would look to you like it was stopped. But stopped light (light "standing still") didn't (and still doesn't) make sense. So Einstein thought: what if light travels at the same speed no matter how fast you're going? What would this mean?

Well, what does it mean to travel "at the same speed"? It means light covers the same amount of distance in a given amount of time. Or, put another way, light takes the same amount of time to cover a given distance. So if the distance is short, light takes less time to go the distance. If the distance is longer, light takes proportionally more time to cover it.

So Einstein thought: OK, if light travels at the same speed for everyone no matter how fast they're going, what would that mean for someone going very fast? Imagine they're going nearly the speed of light, and are being chased by a beam of light. Clearly the light isn't going to get closer to that person as quickly as it would get closer to someone who was standing still. Ordinarily, you would think that light was moving "slower" for the person who is moving away from it. But if light moves at the same speed for everyone, then something else must be going "slower" for that person. The only possibility is time.

Put it this way: light covers a certain distance in a second. To someone watching, the pursuing light isn't making up the distance quite so fast between it and the moving person, because the person is moving away so fast. But for the moving person, light is moving as fast as it always does, it is the second that takes longer.

This sounds a little bit crazy since we aren't used to thinking of time moving faster for some people and slower for others. But it does. The reason we don't notice is that the speed of light is very fast and we can't easily go at speeds close to it.

It's the same sort of thing as the world being round (i.e. a globe). It looks flat to us, but only because it is so big that we can't see enough of it at once to see it curve. Go high enough and we can see the curve of the earth's surface easily enough.

Similarly with special relativity. Time moves slower for those who move fast. It's not obvious to us because we usually don't move very fast, so at the speeds we move, the time differences are too small to notice. But in 1971, Joseph Hafele and Richard Keating took some very accurate (cesium atomic) clocks aboard commercial airliners and flew around the world. They compared their clocks to the very accurate clocks at the US Naval Observatory: the clocks were indeed different, and showed the results that Einstein had predicted.
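The size of the effect is easy to estimate. A rough back-of-the-envelope sketch in Python, using an assumed airliner cruise speed of 250 m/s and an assumed two days in the air (these are illustrative round numbers, not Hafele and Keating's actual flight parameters, and the real experiment also involves the Earth's rotation and general relativity):

```python
import math

c = 299_792_458.0          # speed of light, m/s
v = 250.0                  # rough airliner cruise speed, m/s (assumption)
flight_time = 48 * 3600.0  # two days aloft, s (assumption)

# Special relativity: a clock moving at speed v runs slow by a factor 1/gamma.
gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
lag = flight_time * (1.0 - 1.0 / gamma)   # seconds the moving clock falls behind
print(f"the moving clock falls behind by roughly {lag * 1e9:.0f} nanoseconds")
```

The answer comes out to a few tens of nanoseconds: far too small to notice in daily life, but easily measurable with a cesium clock, which is why the experiment worked.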

What does this mean? Well, if you can wrap your head around the concept of the world being a globe, you can wrap your head around the concept of time moving more slowly for those who move fast. And that's it, right?

Well, not really. There's also general relativity (and it affects Hafele and Keating's results too). But that's a bit more complicated, and I'm not going to get into it now.

/misc permanent link

Wed 23 Feb 2011 11:10

Exchanging files in docx format may lead to problems
When Microsoft came out with Office 2007, the default save format for files was switched to a new format based on XML. For Microsoft Word, for example, instead of files being saved in .doc format by default, they were now saved in .docx format. If you use Microsoft Word 2007 or 2010, you'll notice that when you save a Word document, it saves it as document.docx instead of document.doc.

Unfortunately, now there seems to be an incompatibility between how Word 2007 and Word 2010 interpret .docx files. Apparently, possibly depending on how one's printer is configured, when users of Word 2007 and Word 2010 share files in .docx format, some spaces (seemingly random) between words in the file are dropped.

This has been reported on various places on the net including the CBS Interactive Business Network, cNET.com, and Microsoft's own user forums.

For now, I suggest using the older .doc format for users of different versions of Microsoft Word to exchange documents. For publishing documents, instead of using a native Word format, I suggest using a widely-used open document standard like PDF. CutePDF is a useful free Windows printer driver that lets you create PDF files from any Windows application by simply printing to a CutePDF printer.

/it permanent link

Fri 03 Dec 2010 21:52

What's right about ikiwiki?
Chris Siebenmann pointed me today at ikiwiki. It's a wiki that can also function as a blog. It's potentially interesting, he said. And he was right: to me, it seems definitely interesting. I've only started looking at it, but there's something about it that I like very much, something it does right that most web 2.0 applications seem to do wrong: ikiwiki uses the right sort of tool to store the wiki text. What's the right tool? In my opinion, it's a revision control system (well, to be more exact, a filesystem coupled with a revision control system).

Why is this the right tool? Well, what's wiki data? It's a collection of edited text documents. Databases, such as those used by most wikis and blogs, are designed for large collections of data records, not documents. Yes, they can handle documents, but using them for a collection of documents is like using a tractor-trailer for a trip to the beach. Yes, you can do it, but it's a bit excessive, and you may end up stuck in the sand. Rather, it seems to me that a filesystem, not a database, is the appropriate tool for document storage, and a revision control system, not a database, is the tool of choice to keep track of document edits.

Then why do so many wiki and blog implementations use databases such as mysql or postgres as their back-end? I don't know. I suspect it's a lack of imagination: when you're holding a hammer, everything looks like a nail. In fact, "lite" versions of these databases (e.g. sqlite) have been created to take advantage of the fact that the full power of these database systems is not needed by many systems that use them. But "lite" databases for wiki/blog back-ends seem to me to be like cardboard tractor-trailers: still the wrong tool, but with some of the overkill stripped out.

Even more to ikiwiki's credit than the fact that it has what I think is the right sort of backend, it also allows you to use a wide array of different revision control systems (svn, git, cvs, mercurial, etc.), or even no revision control system at all. I like this. Revision control systems seem to be a matter of widely varying preference, and ikiwiki's agnosticism in this regard makes it appealing to a wider array of users.

I've only started looking at ikiwiki, and it may be that in the end, I'll decide I don't like it for some reason or another, but whether I end up liking it or not, or whether we use it or not, I think ikiwiki is right in using a revision control system instead of a database for its backend. I wish it were not so rare in this respect.

/it permanent link

Tue 04 May 2010 14:51

Adding logout to Firefox: making HTTP authentication more useful.

The HTTP protocol (on which the world wide web is based) offers two forms of simple authentication that are built into pretty much every web browser: Basic authentication and Digest Authentication. For both these authentication mechanisms, the web browser obtains authentication information from the user and retains it to submit to the web site on the user's behalf. A set of authentication information retained for a site by a running web browser is called an authenticated session.
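For Basic authentication, the retained credential is simply the login and password, which the browser sends base64-encoded in an Authorization header on every request to the site. A minimal Python sketch of how that header is constructed (the credentials are the well-known example pair from the HTTP Basic authentication specification):

```python
import base64

def basic_auth_header(user: str, password: str) -> str:
    """Build the Authorization header value a browser sends for HTTP Basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"

# The classic example credentials from the HTTP spec:
print(basic_auth_header("Aladdin", "open sesame"))
# Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```

Note that base64 is an encoding, not encryption: anyone who sees this header can trivially recover the password, and as long as the browser keeps resending it, the authenticated session persists.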

Unfortunately, in most web browsers, including Firefox, there is no easy way to delete that information. Hence once you are authenticated to a web site as a particular user, your web browser will continue to authenticate you to that web site as that user until you exit your browser. It's easy to see this in action: simply go to a site that requires basic or digest authentication, authenticate, browse somewhere else, then return to that site. Did it ask you to enter your password again? No, it remembered who you had authenticated as before, and connected you immediately as that user.

This is often what you want, but not always. Sometimes you might want to log out as one user and log in as a different user. You can't easily do this in most web browsers without exiting and restarting the browser. Or perhaps you want to let someone else use your web browser without giving them your access to certain sites. It would be useful to be able to clear your authenticated sessions.

Some web browsers, such as Firefox, permit clearing all internal authentication and identification information: cached data, cookies and authenticated sessions. In more recent versions of Firefox, the feature is called private browsing, and is focused primarily on browsing without leaving privacy information behind. But this is a pretty blunt instrument: it clears all potentially sensitive data, such as cookies, not just authenticated sessions. What if all you want to do is log out?

My HTTP logout add-on for Firefox is intended to change this. It adds two menu options to Firefox, one on the Tools menu, and the other on the menu you get when you right-click on the background. In each case, the menu option is called HTTP logout all, and if you select it, it will clear all authenticated sessions in your running web browser. You can easily try it: after installing the add-on, go to a site that requires basic or digest authentication, and authenticate. Now choose "HTTP logout all", and reload/refresh that page. It will not recognize you as the person who had logged in before, and will ask you to log in again.

I'm not the only person who wants the ability to log out when using HTTP authentication. Many of us who have implemented web sites using Basic or Digest authentication have been asked by users, "How do I log out?" On this topic, the Apache foundation writes:

        Since browsers first started implementing basic authentication,
        website administrators have wanted to know how to let the user log
        out. Since the browser caches the username and password with the
        authentication realm, as described earlier in this tutorial, this
        is not a function of the server configuration, but is a question
        of getting the browser to forget the credential information, so
        that the next time the resource is requested, the username and
        password must be supplied again. There are numerous situations in
        which this is desirable, such as when using a browser in a public
        location, and not wishing to leave the browser logged in, so that
        the next person can get into your bank account.  

        However, although this is perhaps the most frequently asked question
        about basic authentication, thus far none of the major browser
        manufacturers have seen this as being a desirable feature to put
        into their products.  

        Consequently, the answer to this question is, you can't. Sorry.

        - Apache 1.3 documentation.
Now at least Firefox users can.

/it permanent link

Fri 08 Jan 2010 16:02

Startssl: a better approach to SSL certificates
Perhaps one of the highest profit-margin businesses on the internet is the provisioning of domain SSL certificates. The reason: prices for domain SSL certificates are often very high (up to hundreds of dollars for a one-year domain certificate), but the cost of producing them is very low: generally, all that is needed is a simple automated web site that authenticates via email. Typically no human being needs to be involved. Then why do they cost so much money? Probably because only a few certificate vendors are trusted by default in the major web browsers. Nobody wants to use a certificate that is not trusted by default in all the major web browsers, because that would mean a person using one of those browsers will, by default, see scary messages whenever they try to access the site.

Traditionally, SSL certificate vendors have competed by advertising, each attempting to convince customers that it is more trustworthy than the other guy and thus worth paying more for. But this is generally irrelevant: if the browser trusts the SSL certificate by default, the site will work out of the box, without any scary messages, and the only people who will even notice which vendor was used are those who stop to examine the SSL certificate in detail. Few do.

It would be nice (for SSL certificate customers at least) if SSL certificate vendors would start to compete more by price instead. There has been some of that in recent years, but the price of a one year simple domain SSL certificate is still upwards of U$10, with prices most often several times that amount. This is a lot of money for something that is pretty close to zero-cost to create.

Recently, things have started to change. In 2009, Startcom became trusted as a certificate authority by all the major browsers (IE, Firefox, Safari, Chrome). But Startcom is not a traditional SSL certificate vendor. Instead of charging per certificate, Startcom's Certification Authority gives away certificates for free, and charges instead for various authentication services. Simple authentication (the sort that can be done automatically by sending email to a known address and asking the person to enter the secret code that was sent into a webpage) is free, because it can be fully automated and thus done cheaply. Once authenticated, a person can generate an unlimited number of the most common sort of domain SSL certificate (1yr, single domain name). More extensive authentication, the sort that requires a human being to verify a person's identity documents, costs a modest amount of money (U$40/yr). Once authenticated at this higher level, the person can generate as many of the less common sorts of domain SSL certificates (e.g. 2yr, or wildcard) as necessary. Still more extensive authentication services are available at additional cost. Thus Startcom charges for the services that are intrinsically expensive (those that require the attention of a human being, such as extended authentication), and not for automated services performed entirely by computer (such as certificate generation). This seems much fairer to the customer.

Is this the future of SSL certificates? I suspect most of the SSL certificate vendors would prefer it not to be: SSL certificate generation is quite profitable at the moment. But it is better economics: the price being charged more closely approximates the cost to offer the service. So if the market for SSL certificates is to more closely approximate a free market, Startcom's approach seems quite promising.

/it permanent link

Fri 04 Sep 2009 14:57

Assessing H1N1 risk
What sort of risk does H1N1 (Swine Flu) present this flu season? To assess this, it might be helpful to estimate some H1N1 risks and then compare them to risks with which we are more familiar.

So let's look at some numbers. The worldwide case fatality rate of H1N1 (the number of people who have died of H1N1, divided by the number of people who have contracted it) has been estimated to be 0.45%. Unlike seasonal flu, roughly 80% of those who have died of H1N1 are less than 65 years old (typically 90% of seasonal flu fatalities are 65 or over). If we assume a 15% probability of getting H1N1 this flu season, the likelihood of someone under the age of 65 dying of H1N1 this season is 0.15 x 0.0045 x 0.80, i.e. 0.054%, or 1 in 1852. This is a little less than the one-year general odds of death due to external causes in the US, approximately 1 in 1681.
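The arithmetic above can be written out explicitly (the inputs are the estimates and assumptions from the paragraph, not precise epidemiological figures):

```python
# Rough H1N1 risk arithmetic using the figures above.
case_fatality = 0.0045   # estimated worldwide case fatality rate (0.45%)
under_65_share = 0.80    # fraction of H1N1 deaths among those under 65
p_infection = 0.15       # assumed probability of catching H1N1 this season

p_death_under_65 = p_infection * case_fatality * under_65_share
print(f"{p_death_under_65:.3%}")                      # → 0.054%
print(f"about 1 in {round(1 / p_death_under_65)}")    # → about 1 in 1852
```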

/misc permanent link

Fri 14 Aug 2009 16:45

What's Good About Twitter?
Twitter has a mixed reputation. Negative views generally hold that Twitter is pretty much useless, or a massive waste of time. Indeed, there is no shortage of evidence for this view. What is the usefulness of knowing that someone is brushing their teeth, or having cereal for breakfast? Probably not much. The problem is that "What are you doing?", the question a tweet is allegedly supposed to answer, is often not very interesting. What one is thinking, what one has stumbled across, or what one wants to tell the world could be much more interesting.

One very useful purpose Twitter serves is to announce new articles, blog entries, events, or news items when they appear. Twitterfeed makes this easy: it will check an Atom or RSS feed periodically, and automatically tweet the titles and URLs of new articles to Twitter, allowing anyone following the tweeter to be made aware of the new material. For example, my department now uses Twitter to disseminate its news and events.

So is Twitter a waste of time? Is Twitter useless? Only if one takes Twitter's "What are you doing?" too literally. Indeed, some seem to feel the need to tell others whenever they're yawning, scratching an itch or drinking coffee. Clearly this is not the most interesting of material. But if one uses Twitter to follow information sources (people or organizations) with high information content, and/or to disseminate such information oneself, it can be very useful indeed.

/it permanent link

Wed 10 Jun 2009 13:43

How well do Java Virtual Machines Scale Up?
Java seems to be a popular language for small to medium-sized applications and its use at that scale is well understood. But what happens when you scale it up to something quite large? It seems that very large Java Virtual Machines (JVMs) are still rather rare. Blackboard is a Java-based learning management system (LMS) now in use at the University of Toronto. The University is rather large, with a great many courses and students, and its Blackboard JVM is correspondingly huge. It turns out that an ordinary off-the-shelf JVM suffered some unusual and unpleasant performance issues (mostly related to garbage collection) when scaled this large. The university and Sun Microsystems eventually resolved the issues quite nicely (using the new and still somewhat experimental Java G1 garbage collector) but it was an eventful journey. John Calvin of the University has put together a rather interesting talk about this, which will be given at the university on June 23rd, and later this summer at BBWorld 2009.
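For reference, G1 was still an experimental collector in the JDK 6 era and had to be unlocked explicitly on the command line. Enabling it looked something like the following (the heap sizes and jar name are placeholders for illustration, not Blackboard's actual settings):

```shell
# Enable the (then-experimental) G1 garbage collector on a Sun JDK 6u14+ JVM.
# -Xms/-Xmx values here are placeholders; a large production heap would be
# tuned to the installation, not copied from this sketch.
java -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC \
     -Xms8g -Xmx8g \
     -jar your-application.jar
```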

/it permanent link

Tue 07 Apr 2009 14:43

Understanding Portable Displays
Perhaps the most important thing about a portable computer, be it a notebook, netbook, PDA, smartphone, or internet tablet, is what it provides you versus what it demands from you. One of the most important things a portable machine provides is logical screen area, or screen resolution: the amount of data it can show you on the screen at one time. And one of the most important things it demands is weight: what does it take to carry it?

Screen resolution is measured as a grid of pixels (dots) in width x height format, e.g. 1024x768 means a screen that is 1024 dots wide and 768 dots high. Weight is of course not the only thing that determines portability: size is important too, but generally larger machines are heavier and smaller ones are lighter, so weight is a good shorthand for "weight and size".

A quick way to weigh the costs and benefits of a portable computer is to compute the ratio of a benefit (screen resolution) to a portability cost (weight). The resulting pixel to weight ratio gives a quick assessment: a machine with a high ratio may compare favourably to one with a lower ratio. I've written a little tool to compute this information (in units of pixels per milligram, i.e. ppmg), at http://www.cs.toronto.edu/~jdd/screenspec.cgi.
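The calculation is simple enough to sketch (this is my restatement of the ratio described above, not the linked tool's actual code):

```python
def pixels_per_mg(width_px, height_px, weight_grams):
    """Pixel-to-weight ratio in pixels per milligram (ppmg):
    total screen pixels divided by device weight in milligrams."""
    return (width_px * height_px) / (weight_grams * 1000)

# Apple iPod Touch: 480x320 screen, 115 g
print(round(pixels_per_mg(480, 320, 115), 1))  # → 1.3

# Nokia E90 Communicator: 800x352 screen, 210 g
print(round(pixels_per_mg(800, 352, 210), 1))  # → 1.3
```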

Pixel to weight ratio isn't quite enough, though, because there are limits to human sight: a portable computer is of no use if the pixels are so small that they cannot be easily seen. "Too small" depends on how far the screen is from one's eyes. I tend to use devices like cellphones and PDAs at a distance of 18 inches, and laptops at 24 inches. Generally, viewing distance multiplied by the maximum comfortable pixels per inch (ppi) is roughly constant. For example, I'm quite comfortable with 170 ppi at 24 inches, but beyond that I feel some eyestrain; at 18 inches, that works out to (170 x 24) / 18 = 227 ppi. In my (anecdotal) experience, many people seem comfortable with 125ppi at 24 inches and 167ppi at 18. Of course, there is much more to this than a simple ratio: tolerance for high pixel densities varies with what the person is doing, what fonts are being used, and many other things.
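The distance rule is just an inverse proportion, and can be sketched as follows (using my own comfort numbers from above as inputs):

```python
def comfortable_ppi(known_ppi, known_distance, new_distance):
    """Scale a known comfortable pixel density to a different viewing
    distance, assuming distance x comfortable ppi is roughly constant."""
    return known_ppi * known_distance / new_distance

# 170 ppi is comfortable at 24 inches; the equivalent at 18 inches:
print(round(comfortable_ppi(170, 24, 18)))  # → 227

# Likewise 125 ppi at 24 inches scales to 18 inches:
print(round(comfortable_ppi(125, 24, 18)))  # → 167
```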

Still, a pixel to weight approach lets one compare machines in interesting ways: for example, an Apple iPod Touch has a 3.5" 480x320 screen and weighs 115g; that's 164 ppi and a pixel to weight ratio of 1.3. This is comparable to a Nokia E90 Communicator, which has a 4" 800x352 screen and a weight of 210g; its ppi is 218 and its pixel to weight ratio is 1.34. But now consider a Nokia N810 Internet Tablet: its 4.13" 800x480 screen (225 ppi) and weight of 226g give a significantly higher pixel to weight ratio of 1.69. But with a ppi around 220 versus the iPod's 164, either Nokia device may cause eyestrain where the iPod Touch does not.

Now look at some notebooks. A (large and heavy) Dell Vostro 2510 notebook weighing 5.72lbs with a 15.4" WUXGA (1920x1200) screen offers 147ppi and a pixel to weight ratio of 0.9, which is (perhaps surprisingly) higher than that of a (small and light) netbook, the Dell Mini 10 with the new high-resolution 10" 1366x768 screen (155ppi); its weight of 2.86lbs results in a lower pixel to weight ratio of 0.8 (at a slightly higher ppi, too). Compare this to a MacBook Air: with a 13.3" 1280x800 screen and a weight of 3lbs, its pixel to weight ratio is 0.75. Unlike the other two, though, the MacBook Air has an easier-to-read ppi of 113.

Of course, this doesn't mean that one should pick portable computers based solely (or even mostly) on pixel to weight ratios, or ppi for that matter. It is merely one possibly useful way to compare portable machines, and should be at most only one criterion among many, when making a decision.

/it permanent link

Mon 09 Feb 2009 13:59

Why a netbook?
What is a netbook anyway? It's a new type of low-cost ($350-$600) notebook that is particularly small and light: it typically has a 7-10" screen and weighs not much more than two pounds. Small and light notebooks are not new, but for years they have been quite expensive, marketed to busy executives who want something small and light to use when travelling and are willing to pay for the privilege. But an inexpensive small and light notebook is new, so new that it has been given its own name: netbook. The rationale behind the name is that it is meant to be an "internet" device: its primary role is web browsing, with office productivity applications secondary. Such a device relies on wireless: wifi for now, but increasingly 3G and other forms of cellular data service.

Why buy one? It's affordable, by notebook standards. It's also very portable: while it's too large for a pocket, it can easily slip into a handbag. And while it may be designed for internet connectivity, it can run modest office productivity applications. But it is limited in various ways: the small screen, while generally bright and visible on most models, does not offer much screen real-estate, typically 1024x600 or less. RAM and hard drive space are generally smaller than on most notebooks or desktops, and RAM in particular is limited to a maximum of 1GB or 1.5GB, depending on the model: enough for Linux or Windows XP, but generally not enough to run Microsoft Vista quickly. It lacks any form of CD/DVD drive. And the internal CPU (generally an Intel Atom or a VIA C3) is slow and single-core. A low-end laptop can be bought for as little as $500-$600 with a much larger screen, more memory, a built-in DVD-writer and a more powerful CPU, but it will be quite a bit larger and heavier. That, in the end, is the key question: is the portability of a netbook worth its tradeoffs? Sometimes yes: if one's computing needs are modest but one wants one's computer wherever one goes, then portability is paramount. Sometimes no: those with more than modest computing needs will quickly run into the netbook's limitations. But whether a netbook or a notebook is the better fit, it is nice to have the choice, for a reasonable price.

/it permanent link


Blosxom