Sun 16 Oct 2016 18:02
The Price of Google
I am a Canadian still living in the city in which I was born. I love living in Canada, but life in Canada has its price. Al Purdy, the late 20th century Canadian poet, once wrote about Canada as a country where everyone knows, but nobody talks about, the fact that you can die from simply being outside. It is true, of course: almost everywhere in Canada, the winter is cold enough that a sufficient number of hours outside without protection can lead to death by exposure. But this basic fact is designed into pretty much everything in Canadian life, it is simply accepted as a given by well over thirty million Canadians, and we cope: we wear the right winter clothes, we heat and insulate our buildings in winter, we equip our cars with the right tires, and life goes on. Despite the Canadian winter, Canada is a great place to live.
Google offers a lot of very good free web services: it is "a great place to live" on the Internet, and their services are used by hundreds of millions of people all over the world. While Google seems about as far removed from a Canadian winter as you can imagine, there's something in their Terms of Service that people seem to rarely talk about, something that might have a bit of a chilling effect on one's initial ardor.
Google, to its credit, has a very clear and easy-to-read Terms of Service document. Here's an excerpt from the version of April 14, 2014, which is the most current version at the time I write this.
When you upload, submit, store, send or receive content to or through our Services, you give Google (and those we work with) a worldwide license to use, host, store, reproduce, modify, create derivative works (such as those resulting from translations, adaptations or other changes we make so that your content works better with our Services), communicate, publish, publicly perform, publicly display and distribute such content. The rights you grant in this license are for the limited purpose of operating, promoting, and improving our Services, and to develop new ones. This license continues even if you stop using our Services (for example, for a business listing you have added to Google Maps).
Let me pull out for closer examination the most important bits. For readability, I've omitted ellipses.
When you submit content to our Services, you give Google (and those we work with) a worldwide license to use such content for the purpose of our Services. This continues even if you stop using our Services.
As you can see, this is pretty broad. You are granting Google and their partners the right to use your content for Google's Services (present and future) anywhere in the world, forever. While it does say that it must be used for the purpose of their Services, it doesn't limit itself to existing Services and it doesn't constrain what a "Service" might be. Since developing and offering Services, broadly understood, pretty much covers the gamut of what Google does as a company, the answer is Yes: by submitting content to their services, you are granting Google and their partners the right to use your content anywhere in the world, forever, for a broadly unconstrained set of purposes.
So does this mean nobody should use Google? Does the Canadian winter mean that nobody should live in Canada? After all, as Al Purdy writes, in Canada you can die from simply being outside.
Well, no, of course not. While Google has the right to do broadly unconstrained things with the content we submit to them, their self-interest is typically aligned with ours: they want us to entrust our content to them, because they use it to earn money to operate. Therefore, to persuade us to keep submitting content to them, they will work hard to protect and secure the content they already have, in ways they think we consider important. For this reason, I think it's not unreasonable to trust Google with some of my content: I believe they are likely to protect it in sensible ways. Other content I choose not to submit to Google. Just as I am prepared for a Canadian winter, knowing it is the price I pay to live in Canada, I continue to use some Google services, knowing that they will keep and use my content. Many Google services are very good and well worth using, much of my content is not very sensitive, and I trust Google enough to share content with them.
I do wonder, however, how many Google users really understand the rights they are granting to Google. Canada has been around for centuries: the Canadian winter is no secret. But the implications of Google's broad right to use our content are not quite so obvious. It's not really so clear how Google is using the content or might use it in the future, and even if we trust Google, can we trust all those who might put pressure on Google? Quite frankly, we really don't know yet how Google's massive repository of our collective content can be used. We can envision wonderful outcomes: historians a century or two hence coming to insightful conclusions about early twenty-first century society, for example, but we can also envision outcomes not quite so sanguine: for example, a twenty-first century version of Orwell's 1984, a dystopian world of "thought-crimes" and "doublespeak" where content is scanned for dissent from a prevailing ideology. A certain degree of caution is warranted: in the case of Google, unlike Canada, we may not have yet seen how severe winter can be. Yes, use Google, but use it knowing what you are doing.
One last thing to be said: I focus on Google here, but the same issues hold for Facebook, Twitter, Yahoo and other purveyors of free services over the Internet. Read their Terms of Service to learn what rights you are granting by your use of their services, and decide on the basis of that knowledge how to use their services, and even whether you use their services at all. After all, even Canadians sometimes choose to spend winter in Florida, Mexico, or Arizona.
/it permanent link
Mon 16 May 2016 20:29
The Sun-Managers Mailing list: a Knowledge Sharing Success Story
Sun-Managers was an email mailing list for system administrators of computers made by Sun Microsystems, Inc. The list operated from mid-1989 to the fall of 2014, and I was privileged to be part of it for almost all of its history. Sun-Managers was founded in May of 1989 by William (Bill) LeFebvre, at Northwestern University. At the time, Bill ran Sun-Spots, a digest-format mailing list for system administrators of Sun systems, but the digest format made it difficult for people to ask questions and get a timely response. He created Sun-Managers, an unmoderated mailing list intended for short-turnaround time questions. This was an immediate success: so much so that by the fall of 1989, the sheer number of messages on the list was swamping mailboxes. In Nov 1989, Bill instituted a simple policy: if someone asked a question on the list, other list members were expected to reply by email directly to the person asking the question, not to the list. The person asking the question, in turn, was expected to summarize the answers received, and send the summary to the list.
I joined the list about this time: I had started a new job at the University of Toronto's Computer Science department, a role that included the administration of a number of Sun workstations and servers. I was looking for resources to help me with my Sun system administration tasks, and this list was an excellent one. Because of this summary policy, the list volume was manageable enough that I could keep up, yet the turnaround time on questions was short. I mostly "lurked" at first, reading but not replying. I felt too inexpert to answer many questions, and too shy to ask. However, I learned a great deal from what I read. Moreover, the summaries were archived, and this archive became a resource in itself, a knowledge-base of practical information about administering Sun systems.
The list grew very rapidly: 343 summaries in 1990, and over 1000 in 1991. In August of that year, it was noted that certain questions were being asked often, and rather than waste effort answering the same question several times, a "Frequently Asked Questions" (FAQ) file was instituted. The first version was created by a list member from Boston University, and quickly grew to dozens of answers.
By November of 1992, the list had grown to thousands of members, and the workload of managing the list, editing the FAQ and coaching list members on how to follow the list policy had become significant. Many list members were not individuals, but "mail exploders": email addresses that themselves were mailing lists going to multiple individuals at a given site. This made handling list membership issues more complex. Bill LeFebvre decided to hand the list over to others. Two list members stepped up: Gene Rackow from Argonne National Laboratory to run the list software, and me, to handle the FAQ and policy work. By this time, I had benefitted from the list for a while, and I felt it was time to "give back". At the time, I wasn't in a position to actually run the list: I'd just taken on a new role as system manager of the University of Toronto Computer Science Department's teaching laboratories, and had my hands full, but I could certainly help with content. I was really glad to work together with Gene, a seasoned system administrator, on this rapidly growing list, which we moved to a system at Argonne National Labs, where Gene worked.
The list continued to grow through the 1990s. During this time, Sun Microsystems was quietly supportive, helping Gene with hardware (a Sparcstation 1) as the list grew. By 1996, over two thousand summaries a year were being produced, peaking at 2243 in 2002. In May of 1998, Gene Rackow handed over list management to Rob Montjoy from the University of Cincinnati, who in turn handed over list management to Bill Bradford in November of 2000. The list was moved from Argonne National Labs to a system in Austin run by Bill. I continued to manage the list policy and edit list information files, such as a "think before posting" reminder and the FAQ which had grown to 79 questions by December 2000. This had become a bit too large, and so 19 questions deemed less frequently asked were trimmed. A further trim was made in 2005, reducing a 65-question FAQ to one under 60.
By 2002, the list had reached over five thousand members and the workload of running the list software and managing the list subscriptions had become too much for one person. Dan Astoorian, my colleague at the University of Toronto, stepped in to help, and he was sorely needed. Moreover, the list server hardware was feeling the strain: by mid-2001, list members were being asked to contribute used equipment to upgrade the server. This was resolved in April 2003, when the list was migrated to a machine at the University of Toronto that had been donated to the University by Sun Microsystems.
But times were changing. Linux was growing rapidly and Sun's business was being affected. The web provided more resources for people seeking help administering their systems, and fewer were relying on mailing lists. The list fell below 2000 summaries per year in 2003, under 1200 in 2004, and dropped below 1000 in 2005. By 2008, summaries per year had fallen to about 300, fewer than in any full-year period previously. Sun Microsystems ran into significant difficulties during the economic downturn that year, and was sold to Oracle the following year. As for the list, in 2009, there were just over 200 summaries, declining to less than 100 in 2011. More disturbingly, the ratio of summaries to questions was steadily declining, from over 24% in 2001 to less than 16% by 2010: for some reason, list members were becoming less diligent in summarizing responses back to the list. Summaries and list traffic in general continued to decline rapidly: there were just over 50 summaries in 2012, and less than a dozen in 2013. In 2014, there were only three by October, when a hardware failure provided a good excuse to retire the list.
The Sun-Managers mailing list, over its twenty-five year lifetime, provided help to many thousands of system administrators, producing over 29000 summaries, an archive of which continues to be available. Special thanks is due to the superb people I was privileged to work together with on the list over the years: William LeFebvre, Gene Rackow, Rob Montjoy, Bill Bradford, and Dan Astoorian. Gratitude, also, is due to the thousands of list members who so freely shared their knowledge and expertise with others.
The list summary archive, and an account of the list's history (on which this blog entry is based), are available at http://sunmanagers.cs.toronto.edu. The list's official web page, http://www.sunmanagers.org, continues to be maintained by Bill Bradford.
/it permanent link
Mon 09 May 2016 10:54
Slow Windows Update on Windows 7? Install two Windows Update patches first.
Recently, I noticed Windows Update taking many hours or even days on Windows 7, especially for new installs/reinstalls. Task Manager shows svchost.exe exhibiting large memory usage (suggestive of a memory leak) and/or sustained 100% CPU.
Happily, there's a workaround: grab a couple of patches to Windows Update itself, and manually install them. Get KB3050265 and KB3102810 from the Microsoft Download Center, and install them manually in that order, before running Windows Update. These two patches seem to address the issues: after they were installed on some of our systems here, Windows Update ran in a reasonable amount of time (an hour or two perhaps on slow systems when many updates are needed, but not days).
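For anyone doing this on more than a couple of machines, the manual step can be scripted. What follows is only a minimal sketch in Python, not something I have tested on every system: it assumes both packages have already been downloaded into one folder, that the files follow the usual Windows6.1-KBnnnnnnn-x64.msu naming (an assumption; check the names you actually downloaded), and that exit codes 0 and 3010 (installed, reboot required) from wusa.exe both count as success. The function names are my own, made up for illustration.

```python
import subprocess
from pathlib import Path

# Install order from the post: KB3050265 first, then KB3102810.
PATCHES = ["KB3050265", "KB3102810"]

# Exit codes treated as success (assumption): 0 = installed,
# 3010 = installed, reboot required.
OK_CODES = {0, 3010}


def msu_name(kb, arch="x64"):
    """Assumed file name for a downloaded Windows 7 update package."""
    return f"Windows6.1-{kb}-{arch}.msu"


def build_commands(download_dir, arch="x64"):
    """Return one wusa.exe command line per patch, in install order."""
    folder = Path(download_dir)
    return [
        ["wusa.exe", str(folder / msu_name(kb, arch)), "/quiet", "/norestart"]
        for kb in PATCHES
    ]


def install_all(download_dir, arch="x64", run=subprocess.call):
    """Run each command in order and stop at the first failure.

    Returns the list of exit codes seen, one per command attempted.
    """
    codes = []
    for cmd in build_commands(download_dir, arch):
        code = run(cmd)
        codes.append(code)
        if code not in OK_CODES:
            break
    return codes
```

On a Windows 7 machine, calling install_all with the download folder from an elevated prompt would run the two installs back to back. The run parameter exists only so the logic can be exercised without actually invoking wusa.exe.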
/it permanent link
Fri 04 Mar 2016 10:25
Apple vs FBI: it is about setting a precedent.
There seems to be lots of confusion about Apple's current dispute with the FBI, despite Apple's message to their customers of Feb 16, 2016, where they tried to explain the issue. Here's the issue in a nutshell.
The FBI has an Apple iPhone that was the work-phone of a now-dead terrorist. The FBI wants to read what is on that phone. But the phone is encrypted, and runs a secure version of iOS. The FBI wants Apple to make an insecure version of iOS to run on that phone, so that the FBI can break into the phone and read the contents. Apple has, so far, refused.
This issue will no doubt be addressed in the US courts and legislatures. What is at stake is the precedent it sets. The essential question is this: to what extent should law enforcement be able to compel others to assist them with an investigation? Should software developers be expected to make insecure versions of their software, so that law enforcement can "break in"? It will be very interesting to see how this plays out.
/it permanent link
Fri 13 Mar 2015 11:08
Apple's new MacBook laptop: like a tablet?
I rarely write about Apple's products because they have no shortage of press already: Apple has superb marketing, and many of their products are remarkable in one way or another, often for excellent design and engineering. Their new super-thin MacBook laptop is no exception: it's very thin and light, has a superb high-resolution screen, a carefully redesigned trackpad and keyboard, and is very power-efficient. New to this machine is the fact that it has only a single USB-C port for power, data, and video (it also has a headphone port for audio). Most laptops have many more ports than this. A USB port used for both power and data, and a headphone port, but nothing else, is more typical of a tablet, not a laptop. Indeed, some of the press seems to have really latched onto this "tablet" comparison. Brooke Crothers of Foxnews/Tech claims that the MacBook is "almost a tablet" and states that the MacBook "is an iPad with a keyboard" while Lily Hay Newman of Slate claims that "you should think of the new macbook as a tablet". So how true is this? Is the new MacBook like a tablet?
Well, no, it's not. The MacBook's screen is not touch-capable, and is not capable of being used like a tablet screen. The keyboard and touchpad are an integral part of the machine: they are not optional or detachable. It runs a desktop/laptop operating system (Mac OS X), not a tablet operating system such as iOS. The device is not a tablet, it is not "almost a tablet", it is not even like a tablet. It's a small, light, power-efficient laptop. If it must be compared to something, perhaps it can be compared to a netbook, though it has a much better keyboard, touchpad and screen, and is much more expensive.
Then what about the single I/O port? That's simply the consequence of the new USB 3.1 specification, which finally allows a USB connection to deliver enough power to power a laptop, and defines the USB-C connector, which in addition to USB data lines, provides "alternate mode" data lines that can be used for display protocols like DisplayPort. This makes it possible for Apple to build multiport adapters for the MacBook that provide video (e.g. HDMI), data (USB-A) and charging ports, making it unnecessary to provide all those ports separately in the laptop itself.
So does this make the MacBook "like a tablet"? While it is true that tablets have been using single connectors for power and data for a long time, this doesn't make the MacBook tablet-like. It's not the presence of a single shared power/data connector that makes something like a tablet, it's the interactive screen. Yes, a horse has four legs and is often sat upon, but a horse is not anything like a chair.
So will I be getting one of the new MacBooks? Probably not: like a fine thoroughbred, the new MacBook is lovely but rather too expensive for me. The need to buy the multiport adapter separately makes the already high cost of acquisition even higher. The high price doesn't stop me from admiring the design and engineering of this new laptop, but it does keep me from buying one.
/it permanent link
Sat 05 Oct 2013 17:03
What's wrong with Blackberry? (and some ideas about how to fix it)
Blackberry is in the news a fair bit these days, and the news seems to be all bad. As the firm reports close to a billion dollars in quarterly losses, a Gartner analyst recommends that enterprise customers find alternatives to Blackberry over the next six months. What's the problem?
Basically, fewer and fewer people want to buy Blackberry phones. The problem isn't so much that Blackberries don't do what they're supposed to, it's that people now perceive iPhones and various Android phones as much better choices, and are buying those instead. Why? The reason is that an iPhone or an Android phone isn't the same sort of phone as a traditional Blackberry. An iPhone or an Android phone is a true smartphone, i.e. an "app" phone, a platform that runs a whole "ecosystem" of third party software. A traditional Blackberry is a "messaging" phone, a device that specializes in effective messaging, such as email. Yes, it can run applications too, but that's not its primary function, and it shows.
To illustrate, consider email. Sending email requires the ability to type quickly. A physical keyboard works best for this, one that stretches across the short side of the phone. The screen, located above the keyboard, then becomes roughly square: it can't be very wide, because the phone will then become too wide to hold easily or to fit in one's pocket, and it can't be very tall or the phone will become too long. A square screen is fine for messaging, but for other things that a smartphone might like to do, such as displaying video, one wants a screen that is significantly wider than it is tall. A smartphone handles this by having a rectangular screen: when doing messaging, one holds the phone vertical: the bottom half of the screen then turns into a keyboard, and the top half turns into a roughly square messaging display. When watching media, such as videos, the phone is held horizontal, allowing a screen that is wider than it is tall. Hence the smartphone is useful in a broader set of ways: it is not just a messaging device. Smartphones have become good enough at messaging that many people do not feel they need a dedicated messaging device. Once the smartphone is the only device that people feel they need to carry, there's much less demand for a messaging phone.
Blackberry realized the problem, and tried to create a smartphone of its own. For instance, in 2008, it released the Blackberry Storm. But it became clear that Blackberry's phone OS was not as well suited for general smartphone use as iOS and Android. The Storm was not a commercial success because it did not work as well as competing phones. In response, in 2010 Blackberry bought a company called QNX that had a powerful OS, and started building devices to use it: first the Playbook, released in spring 2011, and then the Z10 phone in early 2013, followed a few months later by the Q10 and other phone models.
The new Blackberry OS works better than the old in delivering smartphone apps, but it was not very mature in 2011, and was available only on a tablet (the Blackberry Playbook). Unfortunately, the Playbook did not sell particularly well because Blackberry badly misrepresented it, calling it the "best professional-grade tablet in the industry" though it lacked many features of the market-leading iPad, including key messaging features such as a standalone email client. While it could have been a market success if it were marketed as a Blackberry phone accessory, a role it could effectively play, at release it was clearly not a true general-purpose tablet like the iPad. So it accumulated few apps, while Apple's iOS and Google's Android accumulated many. Blackberry realized this fairly quickly, and released an Android application emulation environment for their OS in early 2012, which allowed many Android apps to be easily moved over to the new OS. But few Android developers bothered to make Blackberry versions of their Android apps, given the relatively few Playbooks sold.
In the meanwhile, Blackberry did itself no favours by making it clear that there was no future for its existing phones, while failing to deliver a phone running its new OS for more than a year. This merely encouraged Blackberry users and app developers alike to switch to another platform. When the Z10 phone finally came out in 2013, the bulk of its apps were those that had been written for or ported to the Playbook, a far less rich set of applications than any Android or iOS phone. And while the Z10 is a decent phone that comes with some very nice messaging features, Blackberry did not do an effective job of touting the unique features of the Z10 that iPhones and Android phones do not have. Moreover, the price was set high (about the same as an iPhone or high end Android phone) and Blackberry produced a huge number, expecting to sell a great many. Some sold, but many didn't, and Blackberry's recent $1B loss was due primarily to writing down the value of unsold Z10s.
Blackberry sits today in a difficult position. No, it is not about to go out of business: the company is debt-free and has a couple of billion dollars in the bank. But its smartphone is not selling. What should it do now?
Blackberry's best chance at this point to make its smartphone platform viable is to take its large inventories of written-down Z10 phones and sell them cheaply, using a renewed marketing campaign that focuses on the unique features of the phone's software. The Z10 hardware is really no different than the various Android and iPhone models out there: if the phone is to sell, it has to be on the basis of what makes it unique, and that's the Blackberry OS software. For instance, Blackberry should show everyone the clever virtual keyboard that supports fast one-handed typing, the unique messaging hub, and the "Blackberry Balance" software that lets you separate work items from personal items on the phone. Blackberry needs to hire the best marketing people in the world to help get the message out. This is a "make or break" situation for the platform.
Secondly, Blackberry should modify the OS to run Android apps natively, without repackaging. Android app developers are not going to repackage their apps for Blackberry. Blackberry needs to recognize this and make sure that Android apps will appear automatically on Blackberry devices. Blackberry will need to find a way to get Google Play (the Android app store) ported to the platform. It is too late to build a separate app ecosystem around the Blackberry OS: it has to leverage an existing ecosystem, or die. Android is really the only viable option for Blackberry right now.
Finally, Blackberry needs to recognize that a niche market for dedicated messaging devices exists, and continue making devices that are the best messaging phones available, while tapping into an existing app ecosystem. Blackberry needs to be careful not to compromise the devices' effectiveness for messaging: it should pay attention to how people use the devices in the real world, and address quickly whatever issues they have. If Blackberry can't find a way of building such messaging devices using its own OS, it should switch to Android. Blackberry knows how to make superb messaging phones, and it should find a way to continue to do what it does best.
/it permanent link
Tue 20 Aug 2013 22:45
Cloud Computing: Everything Old is New Again
There is a great deal of hype about Cloud Computing at the moment, and it's getting a great deal of attention. It's no wonder: when firms such as Netflix, with a market capitalization of over U$15B, use cloud computing to deliver streaming video services to nearly forty million customers around the world, and when the US Central Intelligence Agency spends U$600M for cloud computing services, people take notice. But what is it all about?
Cloud computing is not really a new thing, it's a variation of a very old idea, with a new name. In the 1960s, when computers were large and expensive, not everyone could afford their own. Techniques for sharing computers were developed, and firms arose whose business was selling time on computers to other firms. This was most commonly described as "timesharing". IBM released its VM virtualization environment in 1972, which allowed a mainframe computer to be divided up into virtual computers, each for a different workload. A timesharing vendor could buy and operate an IBM computer, then rent to their customers "virtual computers" that ran on that machine. From the customer's perspective, it was a way to obtain access to computing without buying one's own computer. From the vendor's perspective, it was a way of "renting out" one's investment in computer infrastructure, as a viable business.
Today, cloud computing, as did timesharing in the past, involves the renting of virtual computers to customers. The name has changed: then, it was called "timesharing"; now, "cloud computing". The type of physical machine has changed: then, a mainframe was used to provide computing services; now, a grid computer. The interconnection has changed: then, leased data lines were typically used; now, the internet. But the basic concept is the same: a vendor rents virtual computers to customers, who then use the virtual computers for their computing, rather than buying their own physical computers.
The advantages and disadvantages of today's cloud computing echo the pros and cons of yesterday's timesharing. Advantages include risk sharing, the ability to pay for just the amount of computing needed, the option to scale up or down quickly, the option to obtain computing resources without having to develop and maintain expertise in operating and maintaining those resources, and the ability to gain access to computing resources in very large or very small quantities very quickly and easily. Moreover, cloud computing vendors can develop economies of scale in running physical computers and data centres, economies that they can leverage to decrease the cost of computing for their customers. Disadvantages of cloud computing include possibly higher unit costs for resources (for example, cloud data storage and data transfer can be very expensive, especially in large quantities), a critical dependence on the cloud computing vendor, variable computing performance, substantial security and privacy issues, greater legal complexity, and so on. These tradeoffs are neither surprising nor particularly new: in fact, many are typical of "buy" vs. "rent" decisions in general.
Then why does cloud computing seem so new? That, I think, is an artifact of history. In the 1970s and early 1980s, computers were expensive and timesharing was popular. In the 1990s and early 2000s, computers became increasingly cheap, and running one's own became enormously popular. Timesharing faded away as people bought and ran their own computers. Now the pendulum is swinging back, driven not so much by the cost of computers themselves as by the costs of datacentres to house them. A few years ago, Amazon Inc. saw a business opportunity in making virtual machines available for rental: it was building grid computers (and datacentres to house them) for its own operations anyway; why not rent out some of those computing resources to other firms? In so doing, Amazon developed an important new line of business. At the same time, a huge number of new internet firms arose, such as Netflix, whose operations are dominantly or exclusively that of providing various computer-related services over the internet, and it made a great deal of sense for such firms to use Amazon's service. After all, when a company's operations are primarily or exclusively serving customers on the internet, why not make use of computing resources that are already on the internet, rather than build private datacentres (which takes time, money and expertise)? These new internet firms, with lines of business that were not even possible a decade or two ago, and Amazon's service, also only a few years old, have lent their sheen of newness to the notion of "cloud computing" itself, making it appear fresh, inventive, novel. But is it? The name is new, yes. But in truth, the concept is almost as old as commercial computing itself: it has merely been reinvented for the internet.
Of course, the computing field, because of its inventiveness, high rate of change and increasing social profile, is rather at risk of falling into trendiness, and cloud computing certainly has become a significant trend. The danger of trendiness is that some will adopt cloud computing not on its own merits, but solely because it seems to be the latest tech tsunami: they want to ride the wave, not be swamped by it. But cloud computing is complex, with many pros and cons; it is certainly a legitimate choice, as was timesharing before it, but it is not necessarily the best thing for everyone. It's easier to see this, I think, if we look beyond the name, beyond the trend, and see that the "rent or buy" question for computing has been with us for decades, and the decision between renting virtual machines and buying physical ones has often been complex, a balance of risks, opportunities, and resources. For an internet firm whose customers are exclusively on the internet, renting one's computing assets on the internet may make a great deal of sense. For other firms, it may not make sense at all. Deciding which is true for one's own firm takes wisdom and prudence; a healthy dose of historical perspective is unlikely to hurt, and may help cut through the hype.
/it permanent link
Tue 23 Apr 2013 12:56
Handling Unsolicited Commercial Email
My email address is all over the web: at the time of writing this, a search on Google for my email address produces about 15,800 results. So anyone who wants to find my email address can do so easily. Many people or companies who want to sell me something send me email out of the blue. I get a great deal of such unsolicited commercial email, too much to read or pay adequate attention to. I simply delete them. Unfortunately, many sources of such email persist. So for some time now, I've elicited the help of technology. I process my incoming email using procmail, a powerful piece of software that lets me script what happens to my email. When I receive unsolicited commercial email, if it is from a vendor or organization I don't have a relationship with, I will often add a procmail rule to discard, unseen, all future email messages from that vendor. I've got about 400 organizations (mostly vendors) in my discard list so far, and the list slowly grows. Am I still getting unsolicited commercial email from these sources? I am, but I am not seeing it. It's the same effect, really, as manual deletion (i.e. the message is deleted, unread), but it's easier for me, because I am not interrupted. But of course I think it would be better still if the email were not sent at all.
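To give a rough idea of what such a discard rule can look like, here is a small Python sketch that writes out procmail recipes from a list of sender domains. It is an illustration under stated assumptions, not my actual setup: the function names and the example domains are hypothetical, and matching on the From: header by domain is just one simple way to identify a vendor. Each generated recipe has the standard three parts: the ":0" line that starts a recipe, a "*" condition line holding a regular expression, and an action line, here /dev/null, which throws the message away.

```python
def discard_recipe(domain):
    """Return a procmail recipe that drops mail whose From: header
    contains an address at the given domain (hypothetical helper)."""
    # Escape dots so they match literally in procmail's regular expression.
    pattern = domain.replace(".", r"\.")
    return f":0\n* ^From:.*@{pattern}\n/dev/null\n"


def discard_rules(domains):
    """Return recipes for all domains, de-duplicated and sorted,
    separated by blank lines, ready to paste into a .procmailrc."""
    return "\n".join(discard_recipe(d) for d in sorted(set(domains)))


# Example with made-up domains:
# print(discard_rules(["vendor.example", "sales.example"]))
```

A rule this simple only matches addresses directly at the listed domain; a sender using a subdomain would need its own entry or a looser pattern.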
If you are a vendor with whom I do not have a pre-existing relationship, and you want to send me email introducing your products, please don't. I do not accept cold salescalls either. Instead, advertise effectively on the web, so that if I am looking for a product like yours, I can find you. If you must contact me directly, send me something by postal mail, where, unlike email, the communication does not have an interruptive aspect.
/misc permanent link
Thu 29 Nov 2012 00:00
A closer look at the University of Toronto's international ranking in Computer Science.
International rankings of universities seem to be all the rage these days. The interest seems to be fed by three rankings of particular prominence that have emerged in the past decade. These are Shanghai Jiao Tong University's Academic Ranking of World Universities (sometimes known as ARWU, or simply as the "Shanghai Ranking"), Quacquarelli Symonds' QS World University Rankings, and the Times Higher Education World University Rankings. Part of the attractiveness of these rankings is that they can become a way of "keeping score", of seeing how one institution does in comparison to others.
My employer, the University of Toronto, does quite well in these rankings, particularly my department, Computer Science. The subject area of Computer Science is not ranked separately in the Times Higher Education World University Rankings (it's bundled together with Engineering), but in the other two, Toronto has consistently ranked in the top ten in the world each year in Computer Science, with only one exception.
This exception is recent, however, and worth a closer look. In the QS World University Rankings for Computer Science and Information Systems, Toronto dropped from 10th in 2011 to 15th in 2012. This big drop immediately raises all sorts of questions: has the quality of Toronto's Computer Science programme suddenly plummeted? Has the quality of Computer Science programmes at other universities suddenly soared? Or has the QS World University Rankings changed its methodology?
To answer these questions, let's look at how other universities have changed from 2011 to 2012 on this ranking. Many (MIT, Stanford, Berkeley, Harvard, Oxford, Cornell, and others) stayed where they were. Others dropped precipitously: Cambridge University dropped from 3rd to 7th, UCLA from 8th to 12th, and Caltech plummeted from 7th to 27th. Some other universities went up: Carnegie Mellon University (CMU) went from 9th to 3rd, ETH Zurich from 11th to 8th, the National University of Singapore (NUS) from 12th to 9th, and the Hong Kong University of Science and Technology (HKUST) soared from 26th to 13th. Surely these curious and significant changes reflect a methodology change? But what?
The QS university rankings website, in the Methodology section, Academic subsection, reveals something of interest:
NEW FOR 2012 - Direct Subject Responses: Until 2010, the survey could only infer specific opinion on subject strength by aggregating the broad faculty area opinions of academics from a specific discipline. From the 2011 survey additional questions have been asked to gather specific opinion in the respondent's own narrow field of expertise. These responses are given a greater emphasis from 2012.
To understand this change, it needs to be recognized that the QS rankings rely heavily on the opinions of academics. A large number of academics around the world are surveyed: the QS rankings website indicates that in 2012, 46079 academic responses were received, of which 7.5% addressed Computer Science. The seemingly modest change made in 2012, to weigh more heavily the opinions of academics in a field about their own field, given its impact on the 2012 results for Computer Science, leads one to wonder about the regional distribution of academics in Computer Science in comparison to academics in other disciplines. One significant factor may be China.
In 1999, courses in the fundamentals of computer science became required in most Chinese universities, and by the end of 2007, China had nearly a million undergraduates studying Computer Science. While QS rankings does not indicate regional distribution by discipline for the academics whose opinions it consults, the surge in the number of Chinese computer scientists worldwide in the past decade almost certainly must have an effect on the regional distribution of academics in Computer Science as compared to other disciplines. As such, is it any surprise to see world universities prominent in China that possess strong Computer Science programmes (such as HKUST and NUS) climb significantly in the rankings, and others less prominent in China plummet? But if a world ranking of universities is so affected by regional shifts in those whose opinion is being solicited, how reliable is it as an objective gauge of the real quality of a given university?
Perhaps a more reliable gauge of quality can be found in the Shanghai ranking, which is not opinion-based, but relies on concrete indicators and metrics. On the Shanghai ranking, the University of Toronto consistently ranked 10th in the world in Computer Science in 2010, 2011, and 2012. But what does this mean, concretely?
To answer this question, we need to grapple with an important fact: in Computer Science, the US dominates. As a nation, the US has been enormously supportive of Computer Science ever since the field first existed, and as a result, it has become pre-eminent in computing. Nine of the top ten schools in the Shanghai ranking, and twenty of the top twenty-five, are in the US. For the University of Toronto to be one of the handful of universities outside the US to break into the top twenty-five, and the only one to break into the top ten, is a significant accomplishment. A chart is illustrative:
Of course, the University of Toronto is in Canada, so a comparison to other schools in Canada is also illustrative. For Computer Science, on the Shanghai ranking, there seems to be no close Canadian rival. In 2012, UBC comes closest, being only a few points short of breaking into the top 25, but all other Canadian schools rank well back:
Even compared to other disciplines that have Shanghai rankings (only science, social science, and related disciplines seem to be ranked), Toronto's pre-eminence in Computer Science in Canada is striking:
From a score-keeping perspective, I think we can conclude that the University of Toronto is doing very well in Computer Science with respect to other universities in Canada, and it is one of the few non-US schools that can keep up with the US in this field.
But all this needs to be put into perspective. After all, rankings are not a full picture, they're aggregations of metrics of varying value, they represent a formulaic approach to something (university education) that cannot always be so conveniently summarized, and they reflect methodologies chosen by the producers of the rankings, methodologies that may not always best reflect objective quality. Of course, if the University of Toronto were to climb to fifth, I'd be pleased, and if it were to drop to fifteenth, I'd be disappointed: surely the score-keeper in me can be allowed this much. But in the overall scheme of things, what matters most for Computer Science at Toronto is not our score on a ranking system, but the objective quality of our programme, the learning outcomes of our students, and the impact of our research, and these things, not our score on rankings, must always remain our top priorities.
/misc permanent link
Wed 22 Aug 2012 14:07
Intel desktop CPU price-performance: Hyperthreading not helping?
Typically, CPU prices follow performance. Faster CPUs command higher prices; slower CPUs are available for less. Recent Intel desktop CPUs continue to show this general pattern, but there appears to be more to the story than usual.
At first glance, everything seems to be what you would expect. Using current pricing in US$ at time of writing from newegg.com, we get:
|Processor||PassMark||Price||PassMark/$||Price-Performance vs G640|
But what happens if we look at a more real-life benchmark? Consider SPEC CPU 2006 Integer (CINT2006) Baseline. For each CPU, I used the CINT2006 Baseline results from the most recently reported Intel reference system, as reported on spec.org. In the case of the G640, no Intel reference system was reported, so I used the results for a Fujitsu Primergy TX140 S1p.
|Processor||CINT2006 Base||Price||CINT/$||Price-Performance vs G640|
A look at hyperthreading may provide some answers. Intel hyperthreading is a feature of some Intel CPUs that allows each physical core to present itself to the OS as two different "cores". If those two "cores" simultaneously run code that happens to use different parts of the physical core, they can proceed in parallel. If not, one of the "cores" will block. The i3 and i7 CPUs offer hyperthreading; the Pentium G and i5 do not. It turns out that the PassMark benchmark sees significant speedups when hyperthreading is turned on. SPEC CINT2006, and many ordinary applications, do not.
What about SPEC CINT2006 Rate Baseline, then? The SPEC CPU Rate benchmarks measure throughput, not just single-job performance, so maybe hyperthreading helps more here? Let's see:
|Processor||CINT2006 Rate Base||Price||Rate Base/$||Price-Performance vs G640|
What does this mean, then? It suggests the increase in price from a non-hyperthreaded to a hyperthreaded Intel desktop processor may reflect more an increase in PassMark performance than an increase in real performance. Hyperthreading may have a positive effect, it seems, but typically not as much as PassMark suggests. At present, for best real-world price-performance in Intel desktop CPUs, I would consider models without hyperthreading.
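The last two columns of the tables above (score per dollar, and price-performance relative to the G640) are simple ratios. Here is a minimal Python sketch of that arithmetic. The function names, CPU labels, scores and prices are all invented for illustration; they are not the benchmark results or newegg.com prices used in this post.

```python
def price_performance(score: float, price: float) -> float:
    """Benchmark score per dollar."""
    return score / price


def relative_to_baseline(cpus: dict, baseline: str) -> dict:
    """Price-performance of each CPU as a multiple of the baseline CPU's."""
    base = price_performance(*cpus[baseline])
    return {name: price_performance(score, price) / base
            for name, (score, price) in cpus.items()}


# Made-up (score, price in US$) pairs, purely for illustration.
example_cpus = {
    "cpu_a": (2000.0, 80.0),    # 25 points per dollar
    "cpu_b": (6000.0, 300.0),   # 20 points per dollar
}
ratios = relative_to_baseline(example_cpus, "cpu_a")
```

In this invented example, `cpu_b` is three times faster than `cpu_a` but delivers only 0.8 times the price-performance, which is the kind of comparison the "vs G640" column expresses.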
/it permanent link
Tue 26 Jun 2012 16:56
How to avoid being fooled by "phishing" email.
A "phishing" email is an email message that tries to convince you to reveal your passwords or other personal details. Most often, it tries to send you to a website that looks like the real thing (e.g. your bank or your email provider) but is really a clever duplicate of the real website that's set up by crooks to steal your information. Often the pretence looks authentic. If you fall for it and give your password or other personal details, criminals may steal your identity, clean out your bank account, send junk email from your email account, use your online trading account to buy some penny stock you never heard of, send email to all the people in your address book telling them you're stranded in a foreign country and need them to wire money immediately, or do any number of other bad things.
But there's a really easy way to avoid being fooled by phishing messages. If you get a message that asks you to confirm or update your account details, never, ever go to the website using a link that is in the email message itself. Remember, anyone can send you a message with any sort of fraudulent claim, containing any number of links that pretend to go to one place, but really go to another. So if you feel you must check, go to the website that you know for sure is the real thing: use your own bookmark (or type in the URL yourself), not the link in the message.
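To make concrete how a link can "pretend to go to one place, but really go to another", here is a small Python sketch that compares the address a link displays with the address it actually points to. The class and function names and the sample domains are hypothetical, and a check like this is only an illustration: it does not replace the advice above to use your own bookmark.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def host_of(url: str) -> str:
    """Hostname of a URL, accepting bare names such as www.example.com."""
    if "//" not in url:
        url = "//" + url
    return (urlparse(url).hostname or "").lower()


def mismatched_links(html: str) -> list:
    """Links whose visible text looks like an address on a different host."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    suspicious = []
    for href, text in parser.links:
        # Only compare when the visible text itself looks like a web address.
        looks_like_address = "." in text and " " not in text
        if looks_like_address and host_of(text) != host_of(href):
            suspicious.append((text, href))
    return suspicious
```

A link that displays `www.mybank.example` but points at `http://evil.example/login` would be reported; a link whose text and target share a host would not.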
/it permanent link
Thu 15 Dec 2011 15:14
Dealing with unsolicited sales calls (cold calls).
For many years, I've been plagued by unsolicited sales calls. It's not very hard to find my phone number, and various people (mostly in the IT realm) call me up out of the blue hoping to sell me something. The interruption is unwelcome, even if the product isn't.
For some years now, my policy has been to explain to the caller that I don't accept unsolicited sales calls, sincerely apologize, and end the call. Occasionally, I am then asked how I am to be contacted. I explain that I prefer to do the contacting myself: when I have a need, I am not too shy to contact likely vendors and make inquiries about their products.
Occasionally I run into someone who is offended by my unwillingness to take their unsolicited sales call. I do feel more than a little sympathy for the salesperson when this happens: I imagine they may think I objected to something they did, or to their manner. The fact is, I handle all unsolicited sales calls this way. As for whether it is intrinsically offensive to reject unsolicited sales calls out of hand, I don't think it is. It is natural for a salesperson to want their sales call, even if unsolicited, to be better accepted, but it is unreasonable for any salesperson to expect that unsolicited sales inquiries to strangers will always be welcome. Still, I do apologize, each time, when I so quickly end telephone conversations with salespersons who call me out of the blue.
Dear reader, if you are a salesperson, and you are tempted to contact me to sell me something, please do not call. Instead, just advertise generally (and if you must, send me some mail in the post). Trust me to find you when the need arises. I frequently do.
/misc permanent link
Tue 26 Jul 2011 17:15
Gigabit ethernet, and Category 5, 5e cabling.
There seems to be lots of folklore that says that Category 5 (Cat5) cabling can't run gigabit ethernet. Contrary to widespread belief, that's mostly false. Here's the situation. Cat5 has the bandwidth to run 1000baseT. But early experience with 1000baseT showed that 1000baseT was pickier about certain cabling issues that weren't specified in the Cat5 standard, such as crosstalk and delay skew, so the Cat5 standard was enhanced for 1000baseT to enforce limits on these. This enhanced standard is called Cat5e. But the fact is that most Cat5 installations already perform to the Cat5e spec.
If someone tells you to rip out a Cat5 installation because it can't support 1000baseT, you're being prompted to do something that is expensive and probably unnecessary. All you generally need to do is test the existing cables to the Cat5e standard (using a Cat5e cable tester) and replace the ones that fail. Often, most if not all the cables will be fine. Or just use the cables for 1000baseT and replace any that exhibit problems.
Cat6 and Cat6a are a different matter. Cat6 supports a spectral bandwidth of 250MHz, up from Cat5/Cat5e's 100MHz, while Cat6a supports 500MHz. Cat6 cabling will run ten gigabit ethernet (10GbaseT) to 37-55m, while Cat6a will run 10GbaseT to 100m. So it's worth choosing Cat6 or Cat6a over Cat5e for new cabling, if the cost increment isn't too high, so that the cabling can support 10GbaseT, even if it's not needed today.
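The 10GbaseT distance limits above can be written down as a small lookup, sketched here in Python. The table and function names are made up for the example, and for Cat6 the sketch assumes the conservative end (37m) of the 37-55m range quoted above.

```python
# Maximum 10GbaseT run length in metres, from the figures quoted above.
# For Cat6 the conservative end of the 37-55m range is assumed.
MAX_10GBASET_RUN_M = {"cat6": 37, "cat6a": 100}


def supports_10gbaset(category: str, run_length_m: float) -> bool:
    """True if a run of this cable category and length is within the limit."""
    limit = MAX_10GBASET_RUN_M.get(category.lower())
    return limit is not None and run_length_m <= limit
```

Under these assumptions a 90m Cat6a run qualifies, while a 50m Cat6 run is treated as doubtful and any Cat5e run is rejected.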
/it permanent link
Mon 30 May 2011 21:26
Einstein's special relativity isn't as complicated as many people seem to think.
I run into people who think that special relativity is some sort of mysterious thing that only Einstein and physicists can understand. But it's not. It's a bit weird, but it's no weirder than the earth being a globe.
Originally people thought that light moved like any other moving object. Einstein thought about this and wondered: what would happen if you followed some light and sped up until you travelled at the same speed as it? Then light would look to you like it was stopped. But stopped light (light "standing still") didn't (and still doesn't) make sense. So Einstein thought: what if light travels at the same speed no matter how fast you're going? What would this mean?
Well, what does it mean to travel "at the same speed"? It means light covers the same amount of distance in a given amount of time. Or, put another way, light takes the same amount of time to cover a given distance. So if the distance is short, light takes less time to go the distance. If the distance is longer, light takes proportionally more time to cover it.
So Einstein thought: OK, if light travels at the same speed for everyone no matter how fast they're going, what would that mean for someone going very fast? Imagine they're going nearly the speed of light, and are being chased by a beam of light. Clearly the light isn't going to get closer to that person as quickly as it would get closer to someone who was standing still. Ordinarily, you would think that light was moving "slower" for the person who is moving away from it. But if light moves at the same speed for everyone, then something else must be going "slower" for that person. The only possibility is time.
Put it this way: light covers a certain distance in a second. To someone watching, the pursuing light isn't making up the distance quite so fast between it and the moving person, because the person is moving away so fast. But for the moving person, light is moving as fast as it always does; it is the second that takes longer.
This sounds a little bit crazy since we aren't used to thinking of time moving faster for some people and slower for others. But it does. The reason we don't notice is that the speed of light is very fast and we can't easily go at speeds close to it.
It's the same sort of thing as the world being round (i.e. a globe). It looks flat to us, but only because it is so big that we can't see enough of it at once to see it curve. Go high enough and we can see the curve of the earth's surface easily enough.
Similarly with special relativity. Time moves slower for those who move fast. It's not obvious to us because we usually don't move very fast, so at the speeds we move, the time differences are too small to notice. But in 1971, Joseph Hafele and Richard Keating took some very accurate (cesium atomic) clocks aboard commercial airliners and flew around the world. They compared their clocks to the very accurate clocks in the US Naval Observatory: the clocks were indeed different, and showed the results that Einstein had predicted.
What does this mean? Well, if you can wrap your head around the concept of the world being a globe, you can wrap your head around the concept of time moving more slowly for those who move fast. And that's it, right?
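To get a feel for how small the effect is at everyday speeds, here is a short Python sketch using the standard special-relativity factor 1/sqrt(1 - v²/c²), which this post does not otherwise write out. The airliner speed of 250 m/s is an assumed round figure, and the sketch deliberately ignores general relativity and the earth's rotation, both of which matter for the real Hafele and Keating measurements.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second (exact by definition)
SECONDS_PER_DAY = 86_400.0


def lorentz_factor(speed: float) -> float:
    """How many times longer a moving clock's second appears to a stationary observer."""
    beta = speed / SPEED_OF_LIGHT
    return 1.0 / math.sqrt(1.0 - beta * beta)


def lag_per_day(speed: float) -> float:
    """Seconds a moving clock falls behind a stationary one per stationary day.

    Uses 1 - sqrt(1 - b) == b / (1 + sqrt(1 - b)), which avoids subtracting
    two nearly equal numbers when the speed is tiny compared with light.
    """
    beta_sq = (speed / SPEED_OF_LIGHT) ** 2
    return SECONDS_PER_DAY * beta_sq / (1.0 + math.sqrt(1.0 - beta_sq))


airliner_speed = 250.0  # m/s, an assumed typical cruising speed
airliner_lag = lag_per_day(airliner_speed)
```

At the assumed airliner speed the lag works out to a few tens of nanoseconds per day, which is why it takes atomic clocks to notice it. At 60% of the speed of light, by contrast, `lorentz_factor` gives 1.25: each of the traveller's seconds lasts a second and a quarter for the observer.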
Well, not really. There's also general relativity (and it affects Hafele and Keating's results too). But that's a bit more complicated, and I'm not going to get into it now.
/misc permanent link
Wed 23 Feb 2011 11:10
Exchanging files in docx format may lead to problems
When Microsoft came out with Office 2007, the default save format for files was switched to a new format based on XML. For Microsoft Word, for example, instead of files being saved in .doc format by default, they were now saved in .docx format. If you use Microsoft Word 2007 or 2010, you'll notice that when you save a Word document, it saves it as document.docx instead of document.doc.
Unfortunately, now there seems to be an incompatibility between how Word 2007 and Word 2010 interpret .docx files. Apparently, possibly depending on how one's printer is configured, when users of Word 2007 and Word 2010 share files in .docx format, some spaces (seemingly random) between words in the file are dropped.
This has been reported in various places on the net, including the CBS Interactive Business Network, CNET.com, and Microsoft's own user forums.
For now, I suggest using the older .doc format for users of different versions of Microsoft Word to exchange documents. For publishing documents, instead of using a native Word format, I suggest using a widely-used open document standard like PDF. CutePDF is a useful free Windows printer driver that lets you create PDF files from any Windows application by simply printing to a CutePDF printer.
/it permanent link
Fri 03 Dec 2010 21:52
What's right about ikiwiki?
Chris Siebenmann pointed me today at ikiwiki. It's a wiki that can also function as a blog. It's potentially interesting, he said. And he was right: to me, it seems definitely interesting. I've only started looking at it, but there's something about it that I like very much, something it does right that most web 2.0 applications seem to do wrong: ikiwiki uses the right sort of tool to store the wiki text. What's the right tool? In my opinion, it's a revision control system (well, to be more exact, a filesystem coupled with a revision control system).
Why is this the right tool? Well, what's wiki data? It's a collection of edited text documents. Databases, such as those used by most wikis and blogs, are designed for large collections of data records, not documents. Yes, they can handle documents, but using them for a collection of documents is like using a tractor-trailer for a trip to the beach. Yes, you can do it, but it's a bit excessive, and you may end up stuck in the sand. It seems to me that a filesystem, not a database, is the appropriate tool for document storage, and a revision control system, not a database, is the tool of choice to keep track of document edits.
Then why do so many wiki and blog implementations use databases such as MySQL or PostgreSQL as their back-end? I don't know. I suspect it's a lack of imagination: when you're holding a hammer, everything looks like a nail. In fact, "lite" versions of these databases (e.g. SQLite) have been created to take advantage of the fact that the full power of these database systems is not needed by many systems that use them. But "lite" databases for wiki/blog back-ends seem to me to be like cardboard tractor-trailers: still the wrong tool, but with some of the overkill stripped out.
Even more to ikiwiki's credit than the fact that it has what I think is the right sort of backend, it also allows you to use a wide array of different revision control systems (svn, git, cvs, mercurial, etc.), or even no revision control system at all. I like this. Revision control systems seem to be a matter of widely varying preference, and ikiwiki's agnosticism in this regard makes it appealing to a wider array of users.
I've only started looking at ikiwiki, and it may be that in the end, I'll decide I don't like it for some reason or another, but whether I end up liking it or not, or whether we use it or not, I think ikiwiki is right in using a revision control system instead of a database for its backend. I wish it were not so rare in this respect.
/it permanent link
Tue 04 May 2010 14:51
Adding logout to Firefox: making HTTP authentication more useful.
The HTTP protocol (on which the world wide web is based) offers two forms of simple authentication that are built into pretty much every web browser: Basic authentication and Digest Authentication. For both these authentication mechanisms, the web browser obtains authentication information from the user and retains it to submit to the web site on the user's behalf. A set of authentication information retained for a site by a running web browser is called an authenticated session.
Unfortunately, in most web browsers, including Firefox, there is no easy way to delete that information. Hence once you are authenticated to a web site as a particular user, your web browser will continue to authenticate you to that web site as that user until you exit your browser. It's easy to see this in action: simply go to a site that requires basic or digest authentication, authenticate, browse somewhere else, then return to that site. Did it ask you to enter your password again? No, it remembered who you had authenticated as before, and connected you immediately as that user.
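For Basic authentication, the information the browser retains is simply the username and password, which it re-encodes and sends in an `Authorization` header with every request to the site. A minimal Python sketch of that encoding follows; the function name is made up, and the credentials are the well-known example pair from the HTTP authentication specification, not real ones.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Value of the Authorization header a browser sends for Basic authentication."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


# Example credentials taken from the HTTP authentication specification.
header_value = basic_auth_header("Aladdin", "open sesame")
```

Because the credentials travel with each request, there is no session on the server to end: "logging out" can only mean getting the browser to forget what it has retained.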
This is often what you want, but not always. Sometimes you might want to log out as one user and log in as a different user. You can't easily do this in most web browsers without exiting and restarting the browser. Or perhaps you may want to allow someone else to use your web browser, and you don't want to give them your access to certain sites. It would be useful to be able to clear your authenticated sessions.
Some web browsers, such as Firefox, permit clearing all internal authentication and identification information: cached data, cookies and authenticated sessions. In more recent versions of Firefox, the feature is called private browsing, and is focused primarily on browsing without leaving privacy information behind. But this is a pretty blunt instrument: all potentially sensitive data is cleared, such as cookies, not just authenticated sessions. What if all you want to do is log out?
My HTTP logout add-on for Firefox is intended to change this. It adds two menu options to Firefox, one on the Tools menu, and the other on the menu you get when you right-click on the background. In each case, the menu option is called HTTP logout all, and if you select it, it will clear all authenticated sessions in your running web browser. You can easily try it: after installing the add-on, go to a site that requires basic or digest authentication, and authenticate. Now choose "HTTP logout all", and reload/refresh that page. It will not recognize you as the person who had logged in before, and will ask you to log in again.
I'm not the only person who wants the ability to log out when using HTTP authentication. Many of us who have implemented web sites using Basic or Digest authentication have often been asked by users "How do I log out?" On this topic, the Apache foundation writes:
Since browsers first started implementing basic authentication, website administrators have wanted to know how to let the user log out. Since the browser caches the username and password with the authentication realm, as described earlier in this tutorial, this is not a function of the server configuration, but is a question of getting the browser to forget the credential information, so that the next time the resource is requested, the username and password must be supplied again. There are numerous situations in which this is desirable, such as when using a browser in a public location, and not wishing to leave the browser logged in, so that the next person can get into your bank account. However, although this is perhaps the most frequently asked question about basic authentication, thus far none of the major browser manufacturers have seen this as being a desirable feature to put into their products. Consequently, the answer to this question is, you can't. Sorry.
- Apache 1.3 documentation.
Now at least Firefox users can.
/it permanent link
Fri 08 Jan 2010 16:02
Startssl: a better approach to SSL certificates
Perhaps one of the highest profit-margin businesses on the internet is the provisioning of domain SSL certificates. The reason is simple: prices for domain SSL certificates are often very high (up to hundreds of dollars for a 1yr domain certificate), but the cost of producing them is often very low. Generally, all that is needed is a simple automated web site that authenticates via email. Typically no human being needs to be involved. Then why do they cost so much money? Probably because only a few certificate vendors are trusted by default in the major web browsers. Nobody wants to use a certificate that is not trusted by default in all the major web browsers, because that would mean a person using one of those browsers will, by default, see scary messages whenever (s)he tries to access the site.
Traditionally, SSL certificate vendors have competed by advertising, each attempting to convince customers that it is more trustworthy than the other guy and thus worth paying more for. But this is generally irrelevant: if the browser trusts the SSL certificate by default, the site will work out of the box, without any scary messages, and the only people who are going to even notice which vendor is used are those who stop to examine the SSL certificate in detail. Few do.
It would be nice (for SSL certificate customers at least) if SSL certificate vendors would start to compete more by price instead. There has been some of that in recent years, but the price of a one year simple domain SSL certificate is still upwards of US$10, with prices most often several times that amount. This is a lot of money for something that is pretty close to zero-cost to create.
Recently, things have started to change. In 2009, Startcom became trusted as a certificate authority by all the major browsers (IE, Firefox, Safari, Chrome). But Startcom is not a traditional SSL certificate vendor. Instead of charging per certificate, Startcom's Certification Authority gives away certificates for free, and charges instead for various authentication services. Simple authentication (the sort that can be done automatically through sending email to a known address and then asking the person to enter into a webpage the secret code that was sent) is free, because it can be fully automated, and thus done cheaply. Once authenticated, the person can generate an unlimited number of the most common sort of domain SSL certificates (1 yr single domain name). More extensive authentication, the sort that requires the participation of a human being to verify a person's identity documents, costs a modest amount of money (US$40/yr). Once authenticated at this higher level, the authenticated person can generate as many as necessary of the less common sort of domain SSL certificates (e.g. 2yr, or wildcard). More extensive authentication services are available, at additional cost. Thus Startcom charges for the sort of services that are more intrinsically expensive (e.g. services that require the attention of a human being, such as extended authentication), and not for automated services that are entirely performed by computer (such as certificate generation). This seems much fairer to the customer.
Is this the future of SSL certificates? I suspect most of the SSL certificate vendors would prefer it not to be: SSL certificate generation is quite profitable at the moment. But it is better economics: the price being charged more closely approximates the cost to offer the service. So if the market for SSL certificates is to more closely approximate a free market, Startcom's approach seems quite promising.
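The "simple authentication" step described above (email a secret code, then ask the person to type it into a web page) is easy to automate, which is the point. The following Python sketch shows the general shape of such a check; it is a hypothetical illustration, not Startcom's actual system, and the class and method names are invented.

```python
import hmac
import secrets


class EmailCodeCheck:
    """Hypothetical sketch of validating control of an email address by secret code."""

    def __init__(self):
        self._pending = {}  # address -> code that was sent to it

    def start(self, address: str) -> str:
        """Create a one-time code for this address.

        A real service would email the code to the address; it is
        returned here only so the sketch can be exercised.
        """
        code = secrets.token_urlsafe(16)
        self._pending[address] = code
        return code

    def confirm(self, address: str, submitted: str) -> bool:
        """True only if the submitted code matches the one issued for the address."""
        expected = self._pending.get(address)
        if expected is None:
            return False
        # Constant-time comparison, so timing does not leak the code.
        if not hmac.compare_digest(expected.encode(), submitted.encode()):
            return False
        del self._pending[address]  # each code can be used only once
        return True
```

No human being appears anywhere in this loop, which is why a service of this kind can be offered at close to zero cost.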
/it permanent link
Fri 04 Sep 2009 14:57
Assessing H1N1 risk
What sort of risk does H1N1 (Swine Flu) present this flu season? To assess this, it might be helpful to estimate some H1N1 risks and then compare them to risks with which we are more familiar.
So let's look at some numbers. The worldwide case fatality rate of H1N1 (the number of people who have died of H1N1, divided by the number of people who have gotten H1N1) has been estimated to be 0.45%. Unlike seasonal flu, roughly 80% of those who have died of H1N1 are less than 65 years old (typically 90% of seasonal flu fatalities are 65 years old or over). If we assume a 15% probability of getting H1N1 this flu season, the likelihood of someone under the age of 65 dying of H1N1 this season is thus 0.15 x 0.0045 x 0.80, i.e. 0.054% or 1 in 1852. This is a little less than the one-year general odds of death due to external causes in the US, approximately 1 in 1681.
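The arithmetic above can be checked in a few lines of Python. The three inputs are the figures given in the paragraph; the variable names are just labels chosen for this sketch.

```python
p_infection = 0.15           # assumed chance of getting H1N1 this season
case_fatality_rate = 0.0045  # estimated worldwide case fatality rate (0.45%)
share_under_65 = 0.80        # share of H1N1 deaths among those under 65

risk = p_infection * case_fatality_rate * share_under_65
one_in = round(1 / risk)

external_causes_risk = 1 / 1681  # one-year odds quoted above for the US

print(f"{risk:.3%} (about 1 in {one_in})")  # prints: 0.054% (about 1 in 1852)
```

Here `risk` comes out slightly below `external_causes_risk`, matching the comparison made above.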
/misc permanent link
Fri 14 Aug 2009 16:45
What's Good About Twitter?
Twitter has a mixed reputation. Negative views generally express the notion that Twitter is pretty much useless, or is a massive waste of time. Indeed, there is no shortage of evidence for this view. What is the usefulness of knowing that someone is brushing their teeth, or having cereal for breakfast? Probably not much. The problem is that "What are you doing?", the question that a tweet is allegedly supposed to answer, is often not very interesting. What one is thinking, what one has stumbled across, or what one wants to tell the world, could be much more interesting.
One very useful purpose Twitter serves is to announce new articles, blog entries, events, or news items when they appear. Twitterfeed makes this easy: it will check an Atom or RSS feed periodically, and automatically tweet the titles and URLs of new articles to Twitter, allowing anyone following the tweeter to be made aware of the new material. For example, my department now uses Twitter to disseminate its news and events.
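The feed-checking half of this idea is simple enough to sketch in Python. This is not how Twitterfeed itself is implemented; it is a minimal illustration that assumes a plain RSS 2.0 feed, and the function name and sample feed are made up. Posting the resulting text to Twitter is left out.

```python
import xml.etree.ElementTree as ET


def new_items(rss_xml: str, seen_links: set) -> list:
    """Return 'title link' strings for feed items not seen before.

    seen_links is updated in place, so calling this again with the
    same feed yields nothing new.
    """
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if link and link not in seen_links:
            fresh.append(f"{title} {link}")
            seen_links.add(link)
    return fresh


# A made-up feed, for illustration only.
SAMPLE_FEED = """<rss version="2.0"><channel><title>News</title>
<item><title>First post</title><link>http://news.example/1</link></item>
<item><title>Second post</title><link>http://news.example/2</link></item>
</channel></rss>"""
```

Run periodically against the same feed with the same `seen_links` set, a function like this produces one short message per new article and nothing otherwise.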
So is Twitter a waste of time? Is Twitter useless? Only if one takes Twitter's "What are you doing?" too literally. Indeed, some seem to feel the need to tell others whenever they're yawning, scratching an itch or drinking coffee. Clearly this is not the most interesting of material. But, on the other hand, if one uses Twitter to follow information sources (people or organizations) with a high information content, and/or to disseminate such information oneself, it can be very useful indeed.
/it permanent link
Wed 10 Jun 2009 13:43
How well do Java Virtual Machines Scale Up?
Java seems to be a popular language for small to medium-sized applications and its use at that scale is well understood. But what happens when you scale it up to something quite large? It seems that very large Java Virtual Machines (JVMs) are still rather rare. Blackboard is a Java-based learning management system (LMS) now in use at the University of Toronto. The University is rather large, with a great many courses and students, and its Blackboard JVM is correspondingly huge. It turns out that an ordinary off-the-shelf JVM suffered some unusual and unpleasant performance issues (mostly related to garbage collection) when scaled this large. The university and Sun Microsystems eventually resolved the issues quite nicely (using the new and still somewhat experimental Java G1 garbage collector) but it was an eventful journey. John Calvin of the University has put together a rather interesting talk about this, which will be given at the university on June 23rd, and later this summer at BBWorld 2009.
/it permanent link
Tue 07 Apr 2009 14:43
Understanding Portable Displays
Perhaps the most important thing about a portable computer, be it a notebook, netbook, PDA, smartphone, or internet tablet, is what it provides you versus what it demands from you. One of the most important things a portable machine provides is logical screen area or screen resolution: the amount of data it can show you on the screen at one time. But one of the most important things a portable machine demands is weight: what does it take to carry it?
Screen resolution is measured as a grid of pixels (dots) in width x height format, e.g. 1024x768 means a screen that is 1024 dots wide and 768 dots high. Weight is of course not the only thing that determines portability: size is important too, but generally larger machines are heavier and smaller ones are lighter, so weight is a good shorthand for "weight and size".
A quick way to approach the costs and benefits of a portable computer is to compute the ratio of its benefits (e.g. screen resolution) to its portability cost (e.g. weight). So a quick assessment of a portable computer is to compute its pixel to weight ratio: a machine with a higher pixel to weight ratio may compare favourably to one with a lower ratio. I've written a little tool to compute this information (in units of pixels per milligram, i.e. ppmg), at http://www.cs.toronto.edu/~jdd/screenspec.cgi.
Pixel to weight ratio isn't quite enough, though, because there are limits to human sight: a portable computer is of no use if the pixels are so small that they cannot be easily seen. "Too small" depends on the distance the screen is from one's eyes. I tend to use devices like cellphones and PDAs at a distance of 18 inches from my eyes, and laptops at 24 inches. Generally, for a given viewer, the highest comfortable pixels per inch multiplied by the viewing distance is roughly a constant. For example, for me, I'm quite comfortable with 170 ppi at 24 inches, but beyond that, I feel some eyestrain. At 18 inches, that works out to (170 x 24) / 18 = 227 ppi. In my (anecdotal) experience, many people seem comfortable with 125 ppi at 24 inches and 167 ppi at 18. Of course, there is much more to this than a simple ratio: tolerance for high pixel densities varies depending on what the person is doing, what fonts are being used, and many other things.
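The distance scaling is easy to sketch in Python. The function name is my own invention, and the whole thing rests on the rough rule of thumb above (distance times ppi staying constant for a given viewer), which is an anecdotal approximation and not a vision-science result.

```python
def comfortable_ppi(ppi_at_reference, reference_inches, viewing_inches):
    """Scale a comfortable pixel density to another viewing distance,
    assuming distance x ppi stays constant for a given viewer."""
    return ppi_at_reference * reference_inches / viewing_inches

my_limit_at_18 = round(comfortable_ppi(170, 24, 18))      # my own limit
typical_limit_at_18 = round(comfortable_ppi(125, 24, 18)) # many other people

print(my_limit_at_18, typical_limit_at_18)
```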
Still, a pixel to weight approach lets one compare machines in interesting ways: for example, an Apple iPod Touch has a 3.5" 480x320 screen and weighs 115g; that's 164 ppi and a pixel to weight ratio of 1.3. This is comparable to a Nokia E90 Communicator, which has a 4" 800x352 screen and a weight of 210g; its ppi is 218 and pixel to weight ratio is 1.34. But now consider a Nokia N810 Internet tablet: its 4.13" 800x480 screen (ppi is 225) and weight of 226g gives a significantly higher pixel to weight ratio of 1.69. But with ppi around 220 vs. the iPod's 164, either Nokia device may result in eyestrain where the iPod Touch does not.
Now look at some notebooks. A (large and heavy) Dell Vostro 2510 notebook weighing 5.72lbs with a 15.4" WUXGA (1920x1200) screen offers 147ppi and a pixel to weight ratio of 0.9, which is (perhaps surprisingly) a higher pixel to weight ratio than a (small and light) netbook, the Dell Mini 10 with the new high-resolution 10" 1366x768 screen (ppi of 155); its weight of 2.86lbs results in a lower pixel to weight ratio of 0.8 (at a slightly higher ppi, too). Compare this to a MacBook Air: with a 13.3" 1280x800 screen, it weighs 3 lbs; its pixel to weight ratio is 0.75. Unlike the other two, though, the MacBook Air has an easier-to-read ppi of 113.
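The two measures used in these comparisons can be sketched in a few lines of Python. This is not the code behind the screenspec.cgi tool mentioned above; the function names and the pound-to-gram constant are my own, and the results can differ from the quoted figures in the last digit because of rounding.

```python
import math

GRAMS_PER_POUND = 453.592

def ppi(width_px, height_px, diagonal_inches):
    """Pixels per inch, measured along the screen diagonal."""
    return math.hypot(width_px, height_px) / diagonal_inches

def pixels_per_mg(width_px, height_px, weight_grams):
    """Pixel to weight ratio in pixels per milligram (ppmg)."""
    return (width_px * height_px) / (weight_grams * 1000)

# (name, width, height, diagonal in inches, weight in grams), from the text.
devices = [
    ("iPod Touch", 480, 320, 3.5, 115),
    ("Nokia E90", 800, 352, 4.0, 210),
    ("Nokia N810", 800, 480, 4.13, 226),
    ("Dell Vostro 2510", 1920, 1200, 15.4, 5.72 * GRAMS_PER_POUND),
    ("MacBook Air", 1280, 800, 13.3, 3 * GRAMS_PER_POUND),
]

for name, w, h, diag, grams in devices:
    print(f"{name}: {ppi(w, h, diag):.0f} ppi, "
          f"{pixels_per_mg(w, h, grams):.2f} ppmg")
```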
Of course, this doesn't mean that one should pick portable computers based solely (or even mostly) on pixel to weight ratios, or ppi for that matter. It is merely one possibly useful way to compare portable machines, and should be at most only one criterion among many, when making a decision.
/it permanent link
Mon 09 Feb 2009 13:59
Why a netbook?
What is a netbook anyway? It's a new type of low-cost ($350-$600) notebook that is particularly small and light: it typically has a 7-10" screen, and a weight not much more than two pounds. Small and light notebooks are not new, but they have for years been quite expensive, marketed to busy executives who want something small and light to use when travelling, and are willing to pay for the privilege. But an inexpensive small and light notebook is new, so new that it has been given its own name: netbook. The rationale behind the name is that this is meant to be an "internet" device: its primary role is web browsing, and office productivity applications are secondary. Such a device relies on wireless: wifi for now, but increasingly 3G and other forms of cellular data service.
Why buy one? It's affordable, by notebook standards. It's also very portable: while it's too large for a pocket, it can easily slip into a handbag. And while it may be designed for internet connectivity, it can run modest office productivity applications. But it is limited in various ways: the small screen, while generally bright and visible on most models, does not have a great deal of screen real-estate; typically 1024x600 or less. RAM and hard drive space are generally less than on most notebooks or desktops, and RAM in particular is limited to a maximum of 1GB or 1.5GB, depending on the model, enough for Linux or Windows XP, but not generally enough to run Microsoft Vista quickly. It lacks any form of CD/DVD drive. And the internal CPU (generally an Intel Atom or a VIA C3) is slow, and single-core. A low-end laptop can be bought for as little as $500-$600 with a much larger screen, more memory, a built-in DVD-writer and a more powerful CPU. But it will be quite a bit larger and heavier. That in the end is the key question: is the portability of a netbook worth its tradeoffs? Sometimes yes: if one's computing needs are modest but one wants one's computer wherever one goes, then portability is paramount. Sometimes no: those with more than modest computing needs will quickly run into a netbook's limitations. But whether a netbook or a notebook is the better fit, it is nice to have the choice, for a reasonable price.
/it permanent link
Tue 23 Sep 2008 21:52
How to buy a Computer
For years I have been asked for my advice about buying computers. My advice has changed over the years, because computers have changed, but one thing seems to be constant: a great many people seem to be very insecure about buying computers. This leads to a great deal of angst, and sometimes to purchases that are much more expensive than they need to be. But there are a few common-sense principles that are generally constant:
1. Think carefully about how the computer is going to be used.
This is the key principle that overrides all others. A computer is a tool. Tools are useful only when they can be used effectively. Do not choose a computer that does not fit with the way you use computers. For example, if you are a small person and like to work in many different places, a large and heavy laptop, or worse, a desktop, will not be a good choice: it is worth investing in something small, light and easily carried. If you are a gamer, particularly if you plan to invest a great deal of time playing games that require high-performance video, you'd best invest in a desktop with high performance graphics, even if it is expensive. Playing a demanding game on a cheaper machine with poor video performance will be frustrating. But if you merely browse the web and run productivity applications like spreadsheets and word processors, investing in high-performance gaming computing is a waste of your money.
2. If a better option is available for a lot more money, choose it only if you know you need it.
Insecurity about buying computers prompts people to pay a great deal more money for things that they think they might need: particularly a fast CPU (the computer's processing unit) or a high-end computer model instead of a lower-end one. The price difference can be significant: a high-end model can cost 3-4x the price of a lower-end model, and a high-end CPU can more than double the cost again. For example, the base configuration of the lowest-end home desktop with the lowest-end CPU on Dell Canada's web site is currently $329; the highest-end base configuration with the highest-end CPU is $3150. That is an order of magnitude difference in price. Put another way, the high-end configuration is the price of a formal dining-room suite. The low-end configuration is the price of a single chair in that dining-room suite. If you are paying the high-end price, make sure you need what you are paying for.
3. If a better option is available for only a little more money, choose it unless you know you don't need it.
If it only costs $20 to get a little more memory, a bit faster CPU, or a potentially useful device like a webcam, a fax modem, or a media card reader, why not get it, especially if it's much more money and less convenient to get it separately later? An integrated webcam is a $20 option on many laptops; adding later an external webcam of comparable quality that clips onto your laptop may cost you as much as $90. Or, for example, a fax modem may sound like obsolete technology, and it is, but it can be very convenient to send a fax from your computer by simply printing to the "fax" printer and typing a phone number. The one exception here is to watch out for large price increments for tiny increases in hard drive size: the price difference between a 250G and a 320G hard drive should be on the order of $10, not $60-70. While one may argue that there is perhaps some value in paying a bit extra for the convenience of ensuring that your computer comes with a decently large hard disk, even a small hard disk these days is quite large. Another thing to consider: if the price difference between a notebook and a desktop is fairly small, and there is no compelling reason to choose a desktop over a notebook, just get a notebook.
4. Assess carefully your need for extended warranties.
Extended warranties can be expensive. However, if you are accident-prone (coffee over the keyboard, dropping your laptop), anxious or risk-averse, an extended warranty may be worthwhile, particularly the sort that covers accidental damage. Note, though, that on average one spends much less over the lifetime of the computer to repair it (often $0) than one would pay for an extended warranty. Such warranties are often bought out of insecurity, and are highly profitable for computer vendors and technology stores. If, however, you do not expect to have free funds to handle an unexpected repair, especially if the computer is particularly expensive, an extended warranty may be worthwhile as a form of insurance.
5. Don't panic. Most of the available options are reasonable.
Most computers are quite acceptable: there are few bad choices. Choosing a computer is most often a matter of choosing the best choice from among good choices. So relax: even if you miss the best choice, you'll probably end up with a perfectly good computer.
6. Don't forget backups.
The most valuable part of your computer is your data. Make sure you have backups of it, so that if something bad happens to your computer, you will not lose your data. You can always replace a computer. Can you replace your data? The easiest way to back up data is to buy an external hard disk and copy your data to it. Buy that external hard disk when you buy your computer. Yes, you can back up to writeable DVDs if you want, or copy to flash memory of some sort, but it can be a lot of work to divide up your data into DVD-sized chunks, and backups that are a lot of work often turn into backups that are not done.
/it permanent link
Tue 26 Aug 2008 09:56
Why own a Desktop computer?
The thirty-year reign of the desktop computer may be coming to an end. According to various news reports, since about the mid-2000s, notebook (or laptop) computers have been outselling desktops. More surprisingly, perhaps, miniature notebook computers like the Asus EEE PC, with small screens and low-power CPUs that are no more powerful than mainstream CPUs of a half-decade ago, are becoming increasingly popular, with a flurry of new low-cost (about $500) models. The reasons are intriguing: few productivity applications such as personal databases, spreadsheets, word processors and presentation tools need more than a small fraction of the power of today's fastest CPUs. Thus the sort of CPU tradeoffs that need to be made to ensure long battery life in a notebook are less and less noticeable in practice. Other tradeoffs are also diminishing in importance: notebook screens can be large and bright, more and more rivalling desktop screens, notebook hard drives can be spacious and increasingly fast, and the rise of USB peripherals has made a portable notebook with a couple of USB ports as easily expandable in practice as any desktop computer. While notebooks are still pricier than desktops, the price difference is steadily diminishing as manufacturing economies of scale begin to weigh in. Even many who are in the habit of using their computer in one spot most of the time are realizing that an external screen, keyboard and mouse can be added to a notebook to make it function as if it were a desktop for general use, but when necessary, the notebook can be used elsewhere, providing the convenience of having one's computer along (with all its data and software) when needed, without the fuss of copying data and worrying about different versions of applications.
Moreover, notebooks have been improving in those areas where they offer abilities not found in desktops: battery life has steadily increased from the one to two hours common a few years ago to three or four hours. Lightweight notebooks are increasingly available, and not all of them are expensive. Most importantly, various forms of wireless networking are becoming ubiquitous, providing internet connectivity to notebooks without the need for wires. As such, it is no surprise that notebook computers are being widely purchased, and many people's first computer is now a notebook, not a desktop.
There are still some good reasons to buy desktops. The lowest-cost computers are still desktops, not notebooks. The very best CPU and graphics hardware is available only for desktops, and many modern games use as much of these resources as they can get. Hence desktops suit hardcore gamers much better than notebooks. Finally, Microsoft Windows Vista generally requires much more CPU and memory than most other operating systems, and the introduction of Vista has put some pressure on computing resources; because of this, some of the less powerful notebooks are now shipping with versions of Linux or with a previous version of Microsoft Windows, such as Windows XP. Nevertheless, it seems clear that given the increasing attractiveness of notebooks in comparison to desktops, a sensible way to approach buying a computer is to simply buy a notebook unless one has some concrete reason to need a desktop.
/it permanent link
Fri 11 Jul 2008 20:46
Web shell scripting
It is a very handy thing to be able to write a quick script. UNIX and its derivatives have long been superb at making this possible: they possess a great many utilities that are designed to be used both from the command line and within scripts, and they possess shells that have all the control structures one might expect from any programming language. In fact, the traditional UNIX philosophy is to write small programs that do one thing well, and then combine them using scripts into rich and powerful applications. Indeed, the UNIX scripting environment is a rich one. But it is difficult to write shell scripts for the web. The UNIX scripting environment is designed for files, not web forms, the contents of which are encoded as url-encoded or multipart-encoded data. Hence, while UNIX shell scripts are sometimes used for web applications (cgi scripts), they are relatively rare, and generally frowned upon. The reason is no surprise: url-encoded and multipart-encoded data is complex to parse, and shell scripts that parse such data using sed, awk, etc. tend to be slow and hard to get right.
But this is easily fixed. If UNIX shell scripts like files, then they should be fed files. Hence I've written a small program (in C and lex), urldecode (ftp:/ftp.cs.toronto.edu/pub/jdd/urldecode) that parses url-encoded and multipart-encoded data, and converts the data into files. No complex file encoding is used. urldecode reads url-encoded or multipart-encoded data, creates a directory, then populates it with files such that each filename is a variable name, and the file contains the variable value. So all a web shell script needs to do to parse url-encoded data is to run urldecode on the data received from a web form, then read the results out of suitably named files. While this is hardly a replacement for PHP or .NET, it does provide a surprisingly simple and straightforward way to script for the web, because it allows all the handy UNIX utilities in the UNIX shell script environment to be leveraged to process web data. That's useful.
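To illustrate the one-file-per-variable idea, here is a small Python sketch. It is not urldecode itself (which is written in C and lex) and it makes no claim about urldecode's actual interface: the function name and sample form data are hypothetical, only url-encoded input is handled (not multipart), and a repeated variable name simply overwrites the earlier file.

```python
import os
import tempfile
from urllib.parse import parse_qsl

def decode_to_files(form_data, directory):
    """Write each url-encoded form variable to a file named after it."""
    os.makedirs(directory, exist_ok=True)
    for name, value in parse_qsl(form_data, keep_blank_values=True):
        # Keep only the final path component so that a hostile variable
        # name cannot escape the target directory.
        safe_name = os.path.basename(name)
        if safe_name in ("", ".", ".."):
            continue
        path = os.path.join(directory, safe_name)
        with open(path, "w", encoding="utf-8") as f:
            f.write(value)

# Made-up form submission, decoded into a scratch directory.
outdir = os.path.join(tempfile.mkdtemp(), "form")
decode_to_files("name=Jane+Doe&comment=Hello%2C+world%21", outdir)

with open(os.path.join(outdir, "comment"), encoding="utf-8") as f:
    comment = f.read()
print(sorted(os.listdir(outdir)), comment)
```

Once the variables are sitting in files like this, a shell script can pick them up with nothing more exotic than cat.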
/it permanent link
Thu 19 Jun 2008 16:34
CRT to LCD computer lab replacement: how much of a difference?
We are replacing all the remaining CRTs in our Computer Science teaching labs this summer with LCD panels, a total of 84 units (we replaced about fifty last summer). It's well known that LCD panels generally use less power when displaying than CRTs do, but the question is: roughly how much power/carbon emissions will we be saving through this summer's CRT replacement?
Using a Kill-A-Watt electricity consumption meter, we measured the power consumption of our CRTs (19" Dell P992) and our new LCD panels (19" Samsung 920BM and 22" Samsung 2253BW). When displaying an image, the CRTs draw between 85W and 110W of power, depending on how bright/white the image is. In comparison, the 19" LCD draws 35W of power, and the 22" draws 41W. If we assume an average CRT power draw (when displaying) of 97.5W (the mean), that's a power savings of 62.5W for the 19" LCDs, and 56.5W for the 22". We are replacing 48 CRTs with the 19", and 36 with the 22", for a total power savings of about five kilowatts.
What does this mean over time? If we assume the machines in our labs are displaying for an average of one hour out of eight (our labs are open to students on a seven-day twenty-four hour basis), and if we consider our projected equipment lifetime of four years, this implies about twenty-two thousand kilowatt-hours saved over this period. If we multiply this by a carbon intensity ratio estimate for grid electricity of 0.0453 kgC/kWh, this suggests a savings of about a metric tonne of carbon over the expected lifetime of these LCD panels.
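The estimate can be reproduced with a short Python sketch. All the inputs are the measurements and assumptions stated above (one hour displaying out of eight, a four-year lifetime, 0.0453 kgC/kWh); the variable names are mine, and leap days are ignored.

```python
# Measured power draw while displaying, in watts.
crt_watts = (85 + 110) / 2       # midpoint of the measured CRT range: 97.5 W
saving_19 = crt_watts - 35       # per 19" LCD: 62.5 W
saving_22 = crt_watts - 41       # per 22" LCD: 56.5 W

# 48 CRTs replaced by 19" panels and 36 by 22" panels.
total_kw = (48 * saving_19 + 36 * saving_22) / 1000

# Displaying one hour in eight, around the clock, for four years.
hours_displaying = 4 * 365 * 24 / 8
kwh_saved = total_kw * hours_displaying

# Assumed carbon intensity of grid electricity, in kgC per kWh.
carbon_kg = kwh_saved * 0.0453

print(f"{total_kw:.2f} kW, {kwh_saved:.0f} kWh, {carbon_kg:.0f} kg of carbon")
```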
/it permanent link
Fri 06 Jun 2008 20:42
IT Support and human nature
IT is not about computers, but about people. This may be surprising: after all, when we think about technology, we generally think about equipment, gear, gadgets, code. But this gear doesn't exist for itself alone. Quite frankly, an unused computer is nothing more than a combination of space-heater and white noise generator. The I in IT is information, and that information is generated by, used by, and valued by people. For IT to be effective, it needs to be used effectively. Technology is a tool: powerful and complex, and like all powerful and complex tools, it takes time, effort and a certain amount of talent to learn to use the tool effectively. People are social beings, and so we learn to use tools in a social context: people who "know how" teach and help those who don't. This, broadly speaking, is the logic of IT support, which, ultimately, is a social construct to ensure that those who know how to use IT tools are available to help those who need help to effectively use them.
Human beings live in the tension between the collective and the individual. This is a fancy way of saying that people live by interacting with other people in ways that range from the genuinely interpersonal to impersonal embodiments of complex social constructions. Consider the difference between "I love you" on one extreme and "One Way, Do Not Enter" on the other. Both the collective and the interpersonal elements of human interaction are present in IT support: the nature of the technology imposes the need to interact with complex technical systems, while the nature of the human beings who use the technology requires one-on-one personal interaction. Indeed, IT support fails when it becomes too much like the notion of a "computer centre", too removed from the individual and the person-to-person act of helping and receiving help. But it also fails when it becomes too individualized, because of simple economics: there are many fewer IT experts than there are people in need of their help, and the one-to-one dynamic begins to fail when there are so few on one side of the equation and so many on the other. Effective IT support requires a balance between the two.
One way to maintain this balance, if there is a "critical mass" of IT needs and resources, is to make the commitment to do both at the same time. At the Department of Computer Science at the University of Toronto, we have found an effective way to do this for research computing support. We have a broad and diverse community of researchers, divided up into research groups. They have access to a core IT infrastructure of technical services, equipment and highly skilled staff to run it. But the department also has dedicated IT support staff who partner with specific groups: each group has their own person, their own IT expert, to call upon, and this person knows the people in the group and their research. We call such staff "points of contact", or POCs. Research IT support in the department is not a matter of contacting an anonymous service centre in one's moment of need, in the desperate hope of finding a sympathetic stranger with the requisite skills. Instead, it becomes an interaction between people who know each other, people who have been able to build a trust-relationship over time. Yet the economics of purely individualized support have been overcome: this organization "scales", because POCs do not need to do everything themselves. They and their groups have access to a complete infrastructure that offers common services across the entire department: secure and reliable file storage, web services, email, and more, and the expertise of the skilled technical staff that run it. POCs are freed to focus more fully on the unique, individualized needs of the research groups they serve.
Sounds idyllic, doesn't it? It is, in theory. In practice, there are plenty of challenges. Communication is key: POCs need to communicate well with their groups, and with other POCs and the core infrastructure staff. And the groups themselves need to be responsible for communicating with their POC: in human relationships, even those of IT support, there is both benefit and burden in knowing and understanding the other. A POC who is "shut out" of the research activities of the group is hampered in any effort to provide support that is well-tuned to those activities. That does not mean that poor support will result: even generic IT support, with a human face, can be superior to that offered by an anonymous service centre. But it does mean that the full benefit of having a POC will not be realized. But if a group and a POC fully commit to regular communication, the quality of IT support can be significantly greater than anything from a large service centre, because the POC who is offering that support has the potential to be a creative participant in the group's mission, the very mission that the group's use of IT is intended to serve.
/it permanent link
Fri 30 May 2008 16:35
Blogging: Keeping It Simple
When I decided it was time to start blogging about information technology and information communication issues, I needed to choose some suitable blog software, something that would provide good results but also be easy to use. Open source was preferable. So I decided to do a quick web search and see what I could find. Most blog software seems to be pretty heavyweight: a database on the back-end to hold the blog entries, plus some sort of complex PHP web front end to display them. But this makes no sense to me: why use a database for blog entries? Blog entries are simple bits of text data that are organized in simple ways. There's no need for powerful query languages, transactional locking, and the various good things databases provide. These things are not free: databases have overhead to set up and run - I'm not worried so much about the computational resource overhead, but rather the human overhead: the time and energy required to configure, maintain, and back up databases. Do we really need such a thing for a blog?
Happily, I found blosxom, a piece of open-source blogging software that consists of a simple Perl CGI that uses a simple filesystem backend (a directory hierarchy) to hold all the blog entries. This is a nicely designed piece of software: simple, straightforward, low overhead, and very quick and easy to get going. It's also quite customizable. Clever simplifying details abound: for example, the date of the blog entry is simply the timestamp of the file, and you create a new blog entry by simply creating a new file of the right name wherever you wish in the blog directory hierarchy, and writing something into it with your favourite text editor. You can organize your blog entries by topic if you want, and blosxom gets it all right. RSS seems all the rage these days, and blosxom does that too: just add /index.rss to the end of the URL. For me, the only annoying bit of this software so far is the spelling of its name: I keep typing "bloxsom" for some reason.
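To show how little machinery the filesystem approach needs, here is a Python sketch of the same idea. It is not blosxom's code (blosxom is a Perl CGI), and the details are assumptions for illustration: entries are taken to be .txt files whose first line is the title, the category is the directory path, and the function name and demo files are invented.

```python
import os
import tempfile
import time

def list_entries(blog_root):
    """Return (mtime, category, title) for each .txt entry, newest first."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(blog_root):
        for filename in filenames:
            if not filename.endswith(".txt"):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8") as f:
                title = f.readline().strip()    # first line is the title
            category = os.path.relpath(dirpath, blog_root)
            # The entry's date is nothing more than the file's timestamp.
            entries.append((os.path.getmtime(path), category, title))
    return sorted(entries, reverse=True)

# Build a tiny demo blog with two entries under an "it" category.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "it"))
for i, title in enumerate(["Older post", "Newer post"]):
    path = os.path.join(root, "it", f"post{i}.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(title + "\nBody text.\n")
    os.utime(path, (1000000000 + i * 86400,) * 2)   # distinct timestamps

for mtime, category, title in list_entries(root):
    stamp = time.strftime("%a %d %b %Y", time.localtime(mtime))
    print(stamp, "/" + category, title)
```

There is no schema, no connection string and nothing to back up beyond the directory itself, which is the whole point.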
Why is blosxom so good? Because it leverages what is already present on most systems: the filesystem, rather than introduce a complex, powerful and costly tool (a relational database) when it's not really needed. Kudos to Rael Dornfest, who, instead of taking the most popular or obvious approach, took the time to understand the problem being solved and the tools already available to solve it. This is an example of sensible economizing: human time and effort is a valuable commodity, the use of powerful tools (e.g. relational databases) uses up some of that commodity, and so such tools should be avoided unless they are really needed. If you think this sounds a little like "green" or "environmental" thinking, you're quite right: conserving energy to preserve the environment is very similar to conserving human energy to preserve the human environment. Just as the natural environment suffers strains from excessive energy consumption, so the human environment suffers from excessive demands on each person's time and energy. In both realms, economical thinking at design time is a prerequisite to good technology.
/it permanent link