John DiMarco on Computing (and occasionally other things)
I welcome comments by email to jdd at cs.toronto.edu.

Wed 17 Aug 2022 10:54

Innovation vs Control: Finding the Right Balance for Computing

Image by Gerd Altmann from Pixabay
In computing, there is a constant tension between the need to exercise proper control over a system, to ensure system security, reliability, and resiliency, and the need to make room for innovation: the imagining, testing, and implementing of new ideas and approaches. There is tension because the things that are typically implemented to ensure control, such as the imposition of checks and constraints, conflict with the things needed for innovation: the removal of constraints, the use of things in ways different from how originally envisioned, and the construction and testing of experimental devices and software programs that almost certainly are not yet properly understood or fully tested.

Some organizations address this conflict by freely choosing control over innovation, turning it into a competitive advantage. Consider Starbucks, Tim Hortons, McDonalds: these are all large companies whose competitive advantage is critically dependent on the consistent implementation of a central vision across a multitude of disparate locations, many of which are managed by franchise partners. Essentially all of the organization's computing is focused on this mission of consistency. And it works. Who hasn't travelled with small children in a car on a road trip, and after many hours on the road, spotted, with some relief, a McDonalds or a Tim Hortons en route? The relief is in the fact that even when travelling in a strange place, here is a familiar restaurant where you know what to expect from the food, where things will be much the same as the Tim Hortons or the McDonalds near home.

Other organizations have no choice about where they stand on the matter. For the modern bank, computers, rather than vaults, are where wealth is stored and managed. Whether they want to innovate or not, banks cannot risk the use of computing that is not fully controlled, audited, and de-risked. The same holds in general for most financial institutions, where the constant efforts, sometimes successful, of would-be thieves to exploit computers to gain unauthorized access to wealth make it unreasonably risky for a financial organization's computers to be anything but fully locked down and fully controlled. Even non-financial institutions, when sufficiently large, will often have substantial financial computing activity because of the size and scale of their operations: this computing, too, needs to be properly controlled, protected and audited.

Yet other organizations are forced into the opposite extreme. Start-up companies can be severely resource-constrained, making it difficult for them to make the sort of investments in highly controlled computing that financial institutions are capable of making. Start-ups innovating in the computing space, such as tech start-ups, may not be able to consider such investments at all. Highly controlled computer systems can have very restrictive designs, and when these restrictions hinder the innovation needed to implement the company's product, the company will have no choice but to pursue some other form of computing. After all, the company rises or falls on the success of its innovation. That is not to say that controlled enterprise computing is unimportant for such companies: quite the contrary. The success of a start-up is highly dependent on a viable ecosystem that provides known pathways to innovation while still moving towards operating in the suitably controlled, production-ready way that any successful business requires. But for a technology start-up, enterprise computing can never come at the expense of technological innovation. The basic existence of the start-up depends on its ability to innovate: without innovation, there can be no company. In general, this truth will hold in some form for any technology company, even well beyond the start-up stage.

The tension between innovation and control comes to the fore in a different way at research-intensive universities, which are large organizations with complex missions that need enterprise computing to carry out their task of educating students on a broad scale, but are also organizations committed to research, an activity that is, by its very nature, an exploration into things not yet fully understood. This conflict is particularly acute in units within such universities that do research into computing itself, such as computer science and computer engineering departments, because in such places, the computer must serve both as the locus of research and experimentation and as a tool for implementing institutional and departmental processes and exercising legitimate control.

I've had the privilege of working in such a department, Computer Science, at such a university (the University of Toronto) for more than three decades now, most of that time in a computing leadership role, and I know this tension all too well. It is sometimes exhausting, but at the same time, it can also be a source of creative energy: yes, it is a barrier, like a mountain athwart your path, but also, as a mountain to a mountain-climber, a challenge to be overcome with determination, planning, insight, and endurance. This challenge can be successfully overcome at a good university, because in addition to a typical large organization's commitment to basic values such as accountability, equity, reliability and security, the university is equally committed to fundamental academic values such as creativity, innovation and excellence. I look for ways to achieve both. Over the years, I have had some successes. My department has produced some groundbreaking research using academic computing that my technical staff have been able to provide, and the department has been able to operate (and successfully interoperate) in good cooperation with enterprise computing at the divisional level, and with the central university as well.

Yet I believe even more is possible. I have lived the tension in both directions: to our researchers, I have at times had to play the regulator, imposing constraints on computing to try to ensure acceptable reliability, accountability and security. To our central university computing organizations, I have at times had to advocate for looser controls to create more room to innovate, sometimes in opposition to proposals intended to increase reliability, security and accountability. When things went badly, it was because one side or the other decided that the other's concern was not its problem, and tried to force or sidestep the issue. But when things went well, and most often they did, it was because both sides genuinely recognized that at a research-intensive institution, everyone needs to work within the tension between the need to innovate and the need to regulate. As a body needs both a skeleton and flesh, so too does a research university need both regulation and innovation: without one, it collapses into a puddle of jelly; without the other, into a heap of dry bones.

With both being needed, one challenge to overcome is the fact that those responsible for enterprise computing cannot be the same people responsible for innovative research computing, and that is necessarily so. The skill-sets vary, the domains vary, the user-base is quite different, and the scale varies. If the university were to entrust computing innovation for computer science or computer engineering to the same groups that provide enterprise computing for the entire university, one of two things would happen. Either the control necessary for a large enterprise would be diminished in order to make room for innovation, or, more likely, innovation would be stifled by the need to provide sufficiently controlled enterprise computing at a suitable scale for the entire university. Thus, necessarily, those who support unit research computing, where the innovation takes place, will be different people from those who support enterprise computing. But that can be a strength, not a weakness. Rather than seeing each other as rivals, the two groups can partner, embracing the tension by recognizing each other's expertise and each other's importance for the University as a whole. Partnership brings many potential benefits: if innovation becomes needed in new areas, for example, as the rise of data science increasingly drives computing innovation outside of the traditional computer science and computer engineering domains, the partnership can be there to support it. Similarly, as the computing landscape shifts and new controls and new regulation become needed to address, for example, emergent threats in information security, the partnership can be there to support it.
There is perhaps no organization better suited for such a partnership than a large research university, which, unlike a financial institution, is profoundly committed to research and innovation through its academic mission, but which, unlike a start-up, is also a large and complex institution with deep and longstanding responsibilities to its students, faculty and community, obligated to carry out the enterprise computing mission of accountability, reliability and security.

So what might a partnership look like? It can take a number of different forms, but in my view, whatever form it takes, it should have three key characteristics:

Locality means that the computing people responsible for research computing must stay close to the researchers who are innovating. This is necessary for strictly practical reasons: all the good will in the world is not enough to make up for a lack of knowledge of what is needed most by researchers at a particular time. For example, deep learning is the dominant approach in Artificial Intelligence today because a few years ago, our technical staff who supported research computing worked very closely with researchers who were pursuing deep learning research, customizing the computing as necessary to meet the research needs. This not only meant that we turned graphics cards into computation engines at a time when this was not at all common and not yet up to enterprise standards of reliability, it even meant that at one point we set up a research computer in a researcher's bedroom so that he could personally watch over a key computing job running day and night for the better part of a week. While this sort of customizability is not always needed, and sometimes is not even possible (one could never run a large computer centre this way), being able to do it if necessary is a key research asset. A university will never be able to fully support research computing solely from a central vantage-point. A commitment to ensuring local presence and support of research computing operating at the researcher level is necessary.

Respectful Listening means that the computing people responsible for research computing at the unit level, where research actually happens, and the people responsible for enterprise computing divisionally and centrally must communicate frequently, with an up-front commitment to hear what the other is saying and take it into account. When problems arise, respectful listening means that those problems will not be "solved" by simply overruling or ignoring the other, to pursue a simplistic solution that suits only one side. It also means a profound commitment to stepping away from traditional organizational authority structures: just because the innovative computing is situated in a department and the enterprise computing is led from the centre should not mean the centre should force its view on the department, just because it can. Similarly, just because unit research computing is driven by research faculty who enjoy substantial autonomy and academic freedom, their research computing group at the unit level should not simply ignore or sidestep what the enterprise is saying, just because it can. Rather, both sides need to respect the other, listening to, not disregarding, the other.

Practical Collaboration means that enterprise computing and unit research computing need to work together in a collaborative way that respects and reflects the timelines and resource constraints of each side. Centrally offered computing facilities should support and empower research where they can, but in a practical way: it may not be possible to make a central facility so flexible and customizable that all research can be pursued. It is acceptable to capture some research needs without feeling an obligation to support the entire "long tail" of increasingly customized research projects. Unit research computing will need to recognize that the need to scale a centralized computing service may constrain the amount of customizability that is possible. Similarly, unit research computing should use, rather than duplicate, central services where it makes sense, and run its own services where that makes sense. Both central and unit research computing should recognize that there is a legitimate middle ground where some duplication of services is going to occur: sometimes the effort required to integrate a large scalable central service into a smaller customizable research service is too great, and sometimes the research advantages of having a locally-run standardized service on which experiments can more easily be built can more than outweigh any economies of scale that getting rid of the unit service in favour of a central service could theoretically provide. Hence the collaboration must be practical: rather than slavishly pursue principles, it must be realistic, grounded, balanced, sensible. It should recognize that one size does not always fit all, and responsibly and collaboratively allocate resources in order to preserve the good of the research mission.

It is that research mission, the ability to innovate, that can make computing so transformative at a research university. Yet while innovative computing can indeed produce transformative change, it cannot be just any change, nor at any cost. Computing is a change agent, yes, but it is also a critical component in the maintenance of an organization's commitment to reliability, accountability, equity, and good operation. Success is found in the maintenance of a suitable balance between the need to innovate and the need to control. When an organization critically depends on both factors, as a research university invariably does, I believe collaborative partnerships between the respective computing groups are the best way to maintain the balance necessary for success.
