Learning and Using POMDP Models of Patient-Caregiver Interactions During Activities of Daily Living

Older adults living with cognitive disabilities (such as Alzheimer's disease or other forms of dementia) have difficulty completing activities of daily living (ADLs). They forget the proper sequence of steps needed to complete a task, or lose track of which steps they have already completed. The current solution is for a human caregiver to assist the patient at all times, prompting them through each step or reminding them of their situation. This dependence on a caregiver is difficult for the patient and can lead to feelings of anger and helplessness, particularly for private ADLs such as using the washroom (LoPresti, et al., 2004).

A cognitive orthosis is a system that automates this caregiving process, providing an alternative for patients and reducing caregiver burden (LoPresti, et al., 2004). Such a system would monitor the patient non-invasively, step in with verbal or visual prompts when necessary, and help ensure the patient's health and safety. Computer vision is well suited to this role because it is non-invasive and can generalise across tasks. By contrast, more invasive or interactive monitoring tools such as bracelets, specialised sensors, or call devices may require the patient to ask for help, may need to be carried by or attached to the patient, and may need to be re-engineered for each task.

The ultimate goal of a computer-vision-based cognitive orthosis for assisting dementia patients during ADLs is to choose a prompting strategy that maximises some notion of utility over the possible outcomes, given visual observations of the patient.
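To illustrate what "maximising utility given observations" means in POMDP terms, the following is a minimal, hypothetical sketch rather than the model developed in this work. It assumes a two-state handwashing task ("dirty" or "clean"), two actions ("wait" or "prompt"), a coarse visual observation, and hand-picked transition, observation, utility, and cost values. The belief over the hidden task state is updated with Bayes' rule after each observation, and the action is chosen by a one-step (myopic) expected-utility lookahead.

```python
# Hypothetical two-state handwashing POMDP; all names and numbers are illustrative.
STATES = ["dirty", "clean"]
ACTIONS = ["wait", "prompt"]

# T[a][s][s2] = P(s2 | s, a): prompting is assumed to make progress more likely.
T = {
    "wait": {"dirty": {"dirty": 0.9, "clean": 0.1},
             "clean": {"dirty": 0.0, "clean": 1.0}},
    "prompt": {"dirty": {"dirty": 0.6, "clean": 0.4},
               "clean": {"dirty": 0.0, "clean": 1.0}},
}
# O[s2][o] = P(o | s2): a coarse (assumed) visual observation of the patient.
O = {"dirty": {"at_sink": 0.7, "away": 0.3},
     "clean": {"at_sink": 0.2, "away": 0.8}}
UTILITY = {"dirty": 0.0, "clean": 10.0}  # utility of the resulting task state
COST = {"wait": 0.0, "prompt": 1.0}      # assumed cost of interrupting the patient


def belief_update(b, a, o):
    """Bayes filter: b'(s2) is proportional to P(o | s2) * sum_s P(s2 | s, a) * b(s)."""
    unnorm = {s2: O[s2][o] * sum(T[a][s][s2] * b[s] for s in STATES)
              for s2 in STATES}
    z = sum(unnorm.values())
    return {s: p / z for s, p in unnorm.items()}


def expected_utility(b, a):
    """One-step expected utility of action a under belief b, minus the action's cost."""
    return sum(b[s] * T[a][s][s2] * UTILITY[s2]
               for s in STATES for s2 in STATES) - COST[a]


def myopic_action(b):
    """Choose the action with the highest one-step expected utility."""
    return max(ACTIONS, key=lambda a: expected_utility(b, a))


b = belief_update({"dirty": 0.5, "clean": 0.5}, "wait", "at_sink")  # dirty ≈ 0.74
print(myopic_action(b))  # prompt
```

The one-step lookahead is a deliberate simplification: a full POMDP policy would plan over a longer horizon, trading off the cost of prompting now against its effect on future task progress.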

Our research focusses on learning models of ADL behavior. The principal benefit of the model we describe is that it does not require patient behaviors to be labeled in video sequences. Instead, the learning method discovers the classes of behavior present in the training data and how they relate to the task state. This reduces the burden on the human experts who train the system, since they need only provide intermittent annotations of a small number of variables. For handwashing, for example, these variables describe whether the patient's hands are wet, soapy, dirty or clean. After training, the model can infer the task state from unlabeled data by inferring which behaviors are taking place and how they advance (or retard) the task state. In future work, this inference will be used to select appropriate prompting actions. From a computer vision perspective, the features we use must generalise across tasks, contexts, and individuals. We therefore do not want to engineer features for each ADL, such as skin color for detecting hands during handwashing (Mihailidis et al., 2004). Instead, we want features that can be learned from training data such that the recognised behaviors are most useful for predicting task state or value.
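The inference step described above, in which hidden behaviors are inferred from visual data and used to track how the task state advances, can be sketched as forward filtering over a joint (behavior, task state) variable. This is a minimal illustration under strong assumptions, not the authors' model. The behavior classes ("use_soap", "rinse", "other"), the three task states, and all probabilities are hypothetical and fixed by hand, whereas in the approach described here they would be learned from data (that learning step is not shown). Each observation is assumed to be a discrete cluster index standing in for learned visual features, and an intermittent annotation simply clamps the task state at that time step.

```python
# Hypothetical task states and behavior classes; in the described approach the
# behavior classes and all probabilities below would be learned, not hand-set.
STATES = ["dirty", "soapy", "clean"]
BEHAVIORS = ["use_soap", "rinse", "other"]

# P(behavior | current task state)
P_BEH = {
    "dirty": {"use_soap": 0.6, "rinse": 0.1, "other": 0.3},
    "soapy": {"use_soap": 0.1, "rinse": 0.6, "other": 0.3},
    "clean": {"use_soap": 0.05, "rinse": 0.05, "other": 0.9},
}
# P(visual feature cluster | behavior); clusters 0..2 stand in for learned features
P_OBS = {
    "use_soap": [0.8, 0.1, 0.1],
    "rinse": [0.1, 0.8, 0.1],
    "other": [0.1, 0.1, 0.8],
}


def next_state(s, beh):
    """Assumed deterministic effect of a behavior on the task state."""
    if s == "dirty" and beh == "use_soap":
        return "soapy"
    if s == "soapy" and beh == "rinse":
        return "clean"
    return s


def filter_sequence(obs, annotations=None, prior=None):
    """Return, per time step, (P(task state | data so far), P(behavior | data so far))."""
    annotations = annotations or {}
    belief = prior or {"dirty": 1.0, "soapy": 0.0, "clean": 0.0}
    history = []
    for t, o in enumerate(obs):
        joint = {}  # (behavior, next state) -> unnormalised probability
        for s, p_s in belief.items():
            for beh in BEHAVIORS:
                key = (beh, next_state(s, beh))
                joint[key] = joint.get(key, 0.0) + p_s * P_BEH[s][beh] * P_OBS[beh][o]
        if t in annotations:  # an intermittent label clamps the task state
            joint = {k: w for k, w in joint.items() if k[1] == annotations[t]}
        z = sum(joint.values())
        belief = {s: sum(w for (_, s2), w in joint.items() if s2 == s) / z
                  for s in STATES}
        beh_post = {b: sum(w for (b2, _), w in joint.items() if b2 == b) / z
                    for b in BEHAVIORS}
        history.append((belief, beh_post))
    return history


# A "soap" cluster followed by a "rinse" cluster should leave the hands most likely clean.
final_belief, _ = filter_sequence([0, 1])[-1]
print(max(final_belief, key=final_belief.get))  # clean
```

If an annotation contradicts everything the model considers possible, the normaliser `z` becomes zero; a real implementation would need to handle that case, for example by smoothing the probability tables.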