John DiMarco on Computing (and occasionally other things)

I welcome comments by email to jdd at cs.toronto.edu.

Wed 23 Nov 2022 10:31

Data Classification and Information Security Standards

[Image: white ball with scattered inscribed zeros and ones in columns, seen through a blurred semi-transparent foreground of scattered zeros and ones in columns]

Not all data requires equal amounts of information security protection. It can be helpful to classify data by the amount of protection it needs. We do this naturally when we talk about data being "public" or "private".

Public data is data meant to be disclosed. It still needs some protection against being altered, deleted or defaced, but it does not need to be protected against disclosure. In contrast, private data is not meant to be disclosed to anyone other than those who are authorized to access it.

Private data varies in sensitivity. Some data is private only because it hasn't yet been made public. At a university, much research data is in this category. While the research is underway, the data is not made public because the research has not yet been published, but it is destined for eventual publication. The same is true for much teaching material. While it is being worked on, it is not yet public, but when it is complete, it will be disclosed as part of the teaching process.

Other private data is much more sensitive. Identifiable personal information about living or recently deceased persons is a common case. At a university, some research may involve data like this, and most administration will involve personal information. Student grades and personnel records are personal information, as is some financial data. Unless appropriate permission to disclose personal information has been granted by the people whose data it is, the university will have an obligation to maintain their privacy by ensuring that the information is not disclosed inappropriately. In Ontario, where the University of Toronto is located, privacy protection for personal information is defined and regulated by the Freedom of Information and Protection of Privacy Act (FIPPA).

Some private data is even more sensitive, such as patient medical records. In Ontario, such records are considered personal health information (PHI), which is regulated by the Personal Health Information Protection Act (PHIPA). PHIPA imposes some fairly significant requirements on the handling of PHI: for instance, it requires a detailed electronic audit log of all accesses to electronically stored PHI. The University of Toronto does significant amounts of teaching and research in areas of health, so it is worthwhile for the University to consider in general how it will handle such data.

For these reasons, the University defines four levels of data sensitivity as part of its Data Classification system. Level 4 is for highly sensitive data such as PHI as defined by PHIPA. Level 3 is for personal information as defined by FIPPA. Level 2 is for private data not classified at higher levels, and Level 1 is for public data.
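As a rough illustration, the four levels can be modelled as an ordered enumeration. This is only a sketch: the names DataLevel and classify, and the three yes/no questions used to pick a level, are assumptions made for this example and are not part of the University's Data Classification system, which may weigh other factors.

```python
from enum import IntEnum


class DataLevel(IntEnum):
    """Four sensitivity levels; a higher number means more sensitive data."""
    PUBLIC = 1            # data meant to be disclosed
    PRIVATE = 2           # private data not classified at a higher level
    PERSONAL = 3          # personal information as defined by FIPPA
    HIGHLY_SENSITIVE = 4  # highly sensitive data such as PHI under PHIPA


def classify(is_public: bool, has_personal_info: bool, has_phi: bool) -> DataLevel:
    """Hypothetical helper: return the highest level that applies to the data."""
    if has_phi:
        return DataLevel.HIGHLY_SENSITIVE
    if has_personal_info:
        return DataLevel.PERSONAL
    if is_public:
        return DataLevel.PUBLIC
    # Not public, but nothing more sensitive either: ordinary private data.
    return DataLevel.PRIVATE


if __name__ == "__main__":
    # Unpublished research data with no personal information in it.
    print(classify(is_public=False, has_personal_info=False, has_phi=False).name)
```

Because the levels are ordered, checking the most sensitive category first means that data containing both personal information and PHI is classified at the higher level.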

This four-tier system roughly parallels the different types of computer systems that the University uses to handle data. Some systems, such as digital signage systems or public-facing web servers, are designed to disseminate public information (level 1). Other systems, suitable for up to level 2 data, exist mostly at the departmental level in support of academic activities such as research computing and/or the development of teaching materials. An astronomer may, for instance, analyze telescope data, a botanist may model nutrient flow in plant cells, a chemist may use software to visualize molecular bonds, while an economist may use broad financial indicators to calculate the strength of national economies. Still other systems, suitable for up to level 3 data, are used for administration, such as the processing of student records. These include smaller systems used, for example, by business officers in departmental units, as well as large institution-wide systems such as ROSI or AMS. Most general-purpose University systems used for data storage or messaging, such as the University's Microsoft 365 service, would typically be expected to hold some level 3 data, because personal information is quite widespread at a university. After all, a university educates students, and so various types of personal information about students are frequently part of the university's business. This is not normally the case, though, for level 4 data. Systems designed for level 4 data are much rarer at the University, and generally come into play only in situations where, for example, University research involves the health records of identifiable individuals. These systems will benefit from greater data security protection to address the greater risks associated with this sort of data.

A key advantage of the University's four levels of data classification is that the University can establish an Information Security Standard that is tiered accordingly. Systems designed to handle lower risk data (such as level 1 or 2) can be held to a less onerous and costly set of data security requirements, while systems designed to handle higher risk data (especially level 4) can be held to more protective, though more costly, requirements. The University's Information Security Standard is designed so that for each control (a system restriction or requirement), the standard indicates whether it is optional, recommended, or mandatory for systems handling a particular level of data. If a system is designed to handle data up to that level, the standard indicates both the set of controls to be considered, and whether those controls can, should, or must be adopted.
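One way to picture a tiered standard is as a table with one row per control and one rating per data level. The sketch below assumes exactly that shape. The three control names and every rating in the table are invented for illustration; they are not taken from the University's actual Information Security Standard.

```python
from enum import Enum


class Requirement(Enum):
    OPTIONAL = "optional"        # the control can be adopted
    RECOMMENDED = "recommended"  # the control should be adopted
    MANDATORY = "mandatory"      # the control must be adopted


O, R, M = Requirement.OPTIONAL, Requirement.RECOMMENDED, Requirement.MANDATORY

# Hypothetical controls and ratings, keyed by the highest data level (1 to 4)
# that a system is designed to handle. None of this is the real standard.
STANDARD = {
    "integrity_protection": {1: M, 2: M, 3: M, 4: M},
    "encryption_at_rest":   {1: O, 2: R, 3: M, 4: M},
    "access_audit_log":     {1: O, 2: O, 3: R, 4: M},
}


def controls_for(system_level: int) -> dict[str, Requirement]:
    """Return the rating of every control for a system of the given level."""
    if system_level not in (1, 2, 3, 4):
        raise ValueError(f"unknown data level: {system_level}")
    return {name: ratings[system_level] for name, ratings in STANDARD.items()}


def mandatory_controls(system_level: int) -> list[str]:
    """Return the names of the controls that must be adopted, sorted by name."""
    return sorted(
        name
        for name, rating in controls_for(system_level).items()
        if rating is Requirement.MANDATORY
    )


if __name__ == "__main__":
    for level in (1, 2, 3, 4):
        print(level, mandatory_controls(level))
```

In this made-up table the list of mandatory controls grows with the level, which is the point of tiering: the costlier requirements fall only on the systems that handle riskier data.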

An obvious question here is what to do when someone puts data on a system that is of greater sensitivity (a higher data classification) than the system is designed to handle. Most likely, nobody will try to use a digital signage system to handle personnel records, but it is quite plausible that professors might find it convenient to use research computers, designed for level 2 data, to process student marks (level 3 data) in courses they are teaching. Similarly, someone handling medical records may wish to make use of the University's general-purpose Microsoft 365 service because of its convenience, but it is a service that is not designed for data of such sensitivity and may well not provide the detailed electronic audit log required by Ontario law. For this reason, clear communication and user training will be required. Handling data appropriately is everyone's responsibility. Training need not be complicated. It is not normally difficult to explain, or to understand, that one should not put patient medical records into email, for example, or use local research computers for personnel records or student marks. For people handling the most sensitive types of data (level 4), more training will be needed, but the number of people at the University who need to handle such data regularly is comparatively small.
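The rule that users need to learn amounts to a single comparison: data may go on a system only if the data's level does not exceed the highest level the system is designed for. A minimal sketch follows, assuming each system is labelled with that highest level. The system identifiers and the function name may_hold are hypothetical, and the levels assigned to them simply follow the examples in the text above.

```python
# Hypothetical system identifiers mapped to the highest data level each one
# is designed to handle, following the examples given in the text.
SYSTEM_MAX_LEVEL = {
    "digital_signage": 1,
    "research_computer": 2,
    "microsoft_365": 3,
    "phi_research_system": 4,
}


def may_hold(system: str, data_level: int) -> bool:
    """Return True if data of this level is appropriate for the named system."""
    if data_level not in (1, 2, 3, 4):
        raise ValueError(f"unknown data level: {data_level}")
    # A system may hold data at or below its designed level, never above it.
    return data_level <= SYSTEM_MAX_LEVEL[system]


if __name__ == "__main__":
    # Student marks (level 3) on a research computer designed for level 2.
    print(may_hold("research_computer", 3))
    # Patient medical records (level 4) on the general-purpose messaging service.
    print(may_hold("microsoft_365", 4))
```

A check like this is easy to state, but it only helps if people know the level of the data in front of them, which is why training matters more than tooling here.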

The underlying motivation for the University's approach is to protect riskier data with greater, more costly, protections, without having to pay the costs of applying those protections everywhere. The university's resources are thus applied strategically: they are deployed where they matter most, but not where the risk does not warrant the expense. This approach is not meant to preclude additional protections where they make sense. If there are risks of academic or industrial espionage, for example, or some other risk beyond the classification of the data being used, one may choose to impose more restrictions on a system than the university's Information Security Standard requires. But the general principle remains: the riskiness of the data on a system should guide and inform what needs to be done to protect it.
