==============================================
Continuous Profile Model (CPM) Matlab Toolbox
Author: Jennifer Listgarten
October 11th, 2006.
==============================================
Analysis of Sibling Time Series Data: Alignment and Difference Detection
Jennifer Listgarten, Ph.D. Thesis, Department of Computer Science,
University of Toronto, 2006.
Conditions of Use: This toolbox is available free for educational and research
use. All use of this software is at the user's own risk.
Please cite my thesis if using this toolbox.
Things you need to use the code:
------------------------------------------------
- Add the CPM toolbox directory to your Matlab path. You can do
this either interactively with the GUI, or with the commands
addpath and genpath provided by Matlab. Make sure to include all
subdirectories of the toolbox recursively (genpath does this).
- For EM-CPM only: Compile the MEX function 'maxSparseMEX.c' found in
./cpm_functions/MEX/maxSparseMEX.c
(On my Linux machine and PC, I did this using
the command 'mex /u/jenn/phd/MS/matlabCode/functions/MEX/maxSparseMEX.c'
inside of Matlab; substitute the path to your own copy of the file.)
- Put Carl Rasmussen's (newest! 2006) version of minimize.m on your path
(http://www.kyb.tuebingen.mpg.de/bs/people/carl/code/minimize/minimize.m)
******You may want to remove the fprintf statements from this code so that******
******it doesn't dump all over the standard output.                       ******
- For HB-CPM only (THIS CODE IS NOT CURRENTLY READY FOR USE):
Install Tom Minka's lightspeed and fastfit toolboxes
(http://research.microsoft.com/~minka/)
[specifically, I use the functions: (to be filled in at some point)]
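
The setup steps above could be scripted roughly as follows. This is only a
sketch: 'cpmRoot' and 'minimizeDir' are placeholder names for wherever you
unpacked the toolbox and saved minimize.m (they are not defined by the
toolbox), and placing the compiled MEX file next to its source is my own
assumption about a convenient location.

```matlab
% Placeholder locations (hypothetical); adjust to your own installation.
cpmRoot     = fullfile(pwd, 'cpm');       % assumed toolbox root directory
minimizeDir = fullfile(pwd, 'minimize');  % assumed directory holding minimize.m

% Add the toolbox and all of its subdirectories to the Matlab path.
if exist(cpmRoot, 'dir')
    addpath(genpath(cpmRoot));
end

% EM-CPM only: compile the MEX function, writing the binary next to the source.
mexSrc = fullfile(cpmRoot, 'cpm_functions', 'MEX', 'maxSparseMEX.c');
if exist(mexSrc, 'file')
    mex('-outdir', fileparts(mexSrc), mexSrc);
end

% Put minimize.m on the path and confirm that Matlab can see it.
if exist(minimizeDir, 'dir')
    addpath(minimizeDir);
end
if exist('minimize', 'file') ~= 2
    warning('minimize.m was not found on the Matlab path.');
end
```

The existence checks simply let the script run without errors when a
directory or file is missing, so it is easy to see which step still needs
attention.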
Miscellaneous Notes
-----------------------------------------------
- I have been using MATLAB Version 7.1.0.183 (R14) Service Pack 3, on
a Unix operating system. The toolbox should work on any operating system,
but may not work on significantly older versions of Matlab.
- The 'EM-CPM' refers to the model presented in the paper, "Multiple
alignment of continuous time series" (NIPS 2004), with some small changes
(and also extensions), documented in my PhD thesis (forthcoming), while the
'HB-CPM' is the model presented in "Bayesian Detection of Infrequent
Differences in Sets of Time Series with Shared Structure" (NIPS 2006)
(also reported in my thesis).
- numBins/numFeatures -- the dimensionality of each time point.
The computations scale roughly linearly with numBins. Generally, one need
not use all dimensions, but can use some reduced dimensionality
version of the data, and then 'unroll' the recovered alignment
to the desired feature space (i.e. with full dimensionality).
On data sets that I have tried, I frequently ran into numerical problems
when using numBins greater than 24, so you might expect the same.
- lambda (amount of smoothing/regularization on latent trace) -- Although one
can in principle use no smoothing, if there are any parts of the latent trace
that do not get mapped to (for example, the ends), then these can be completely
unspecified by the data. The code should handle this, but it
might nevertheless be wise to generally include a very tiny amount of
smoothing. ('tiny' is of course relative to the data set being used.)
- Although the time complexity of the EM-CPM scales roughly linearly with the length
of the observed time series, the memory usage of my implementation scales much
worse than linearly, so memory could become the limiting factor. To date I have
never worked with time series longer than about 1000 time points.
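
As a rough illustration of the numBins note above (fit on a reduced
dimensionality version of the data, then 'unroll' the recovered alignment to
the full feature space), here is a small sketch on toy data. Everything in it
is an assumption made for illustration: the variable names, the reduction by
summing contiguous blocks of features, the hand-written alignment vector
(which in practice would come from fitting the model to the reduced data),
and the use of linear interpolation to apply that alignment. None of these
are functions or conventions of the toolbox itself.

```matlab
% Toy data (hypothetical): T time points by D features.
T = 6;  D = 8;
X = reshape(1:T*D, T, D);

% Reduce the D features to numBins by summing contiguous blocks of columns.
numBins = 4;                      % must divide D evenly in this sketch
binSize = D / numBins;
Xreduced = zeros(T, numBins);
for b = 1:numBins
    cols = (b-1)*binSize + (1:binSize);
    Xreduced(:, b) = sum(X(:, cols), 2);
end

% Hypothetical alignment: alignedTime(t) is the latent-time position of
% observed time point t. Here it is written by hand; in practice it would
% be recovered by fitting the model to Xreduced.
alignedTime = [1 2 4 5 7 8];

% 'Unroll' the alignment to the full feature space: resample every
% full-dimensional feature (column of X) on a common latent-time grid.
latentGrid = 1:8;
Xaligned = interp1(alignedTime(:), X, latentGrid(:), 'linear');
```

Because the alignment is a mapping over time only, it can be estimated from
the cheaper T-by-numBins matrix and then applied unchanged to all D columns.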