==============================================
Continuous Profile Model (CPM) Matlab Toolbox
Author: Jennifer Listgarten
October 11th, 2006.
==============================================

Analysis of Sibling Time Series Data: Alignment and Difference Detection
Jennifer Listgarten, Ph.D. Thesis,
Department of Computer Science, University of Toronto, 2006.

Conditions of Use:
This toolbox is available free for educational and research use.
All use of this software is at the user's own risk.
Please cite my thesis if using this toolbox.

Things you need to use the code:
------------------------------------------------

- Add the CPM toolbox path to your Matlab path. You can do this either
  interactively with the GUI, or with the Matlab commands addpath and
  genpath. Make sure to recurse down into the directory structure.

- For EMCPM only: Compile the MEX function 'maxSparseMEX.c' found in
  ./cpm_functions/MEX/maxSparseMEX.c
  (On my Linux machine and PC, I did this using the command
  'mex /u/jenn/phd/MS/matlabCode/functions/MEX/maxSparseMEX.c'
  inside of Matlab)

- Put Carl Rasmussen's (newest! 2006) version of minimize.m on your path
  (http://www.kyb.tuebingen.mpg.de/bs/people/carl/code/minimize/minimize.m)
  ******You may want to remove the fprintf statements from this code so that******
  ******it doesn't dump all over the standard output.                       ******

- For HBCPM only (THIS CODE IS NOT CURRENTLY READY FOR USE):
  Install Tom Minka's lightspeed and fastfit toolboxes
  (http://research.microsoft.com/~minka/)
  [specifically, I use the functions: (to be filled in at some point)]

Miscellaneous Notes
-----------------------------------------------

- I have been using MATLAB Version 7.1.0.183 (R14) Service Pack 3, on a
  unix operating system. The toolbox should work on any operating
  system, but may not work on significantly older versions of Matlab.

- The 'EM-CPM' refers to the model presented in the paper, "Multiple
  alignment of continuous time series" (NIPS 2004), with some small
  changes (and also extensions), documented in my PhD thesis
  (forthcoming), while the 'HB-CPM' is the model presented in "Bayesian
  Detection of Infrequent Differences in Sets of Time Series with Shared
  Structure" (NIPS 2006) (also reported in my thesis).

- numBins/numFeatures -- the dimensionality of each time point. The
  computations scale roughly linearly with numBins. Generally, one need
  not use all dimensions, but can use a reduced-dimensionality version
  of the data, and then 'unroll' the recovered alignment to the desired
  feature space (i.e. with full dimensionality). On the data sets that I
  have tried, I frequently ran into numerical problems when using more
  than 24 numBins, so you might expect the same.

- lambda (amount of smoothing/regularization on the latent trace) --
  Although one can in principle use no smoothing, any parts of the
  latent trace that do not get mapped to (for example, the ends) are
  then completely unspecified by the data. The code should handle this,
  but it might nevertheless be wise to generally include a very tiny
  amount of smoothing. ('Tiny' is of course relative to the data set
  being used.)

- Although the time complexity of the EM-CPM scales roughly linearly
  with the length of the observed time series, the memory usage of my
  implementation scales much worse than that, so this could become a
  limitation. To date, I have never worked with time series longer than
  about 1000 time points.
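
Setup sketch
-----------------------------------------------

The steps listed under "Things you need to use the code" could be
scripted roughly as follows. This is only a sketch and is not part of
the toolbox: the variable names (cpmRoot, mexDir, mexSrc, hasMinimize)
and the install location are assumptions made for illustration, so
adjust them to wherever you unpacked the code. It uses only the
standard Matlab commands addpath, genpath, mex and exist, and it skips
any step whose files it cannot find.

```matlab
% ASSUMPTION: hypothetical install location; change this to your own.
cpmRoot = fullfile(pwd, 'CPM');

% 1. Add the toolbox and all of its subdirectories to the Matlab path.
if exist(cpmRoot, 'dir') == 7
    addpath(genpath(cpmRoot));
else
    warning('CPM toolbox directory not found: %s', cpmRoot);
end

% 2. EMCPM only: compile the MEX function.
mexDir = fullfile(cpmRoot, 'cpm_functions', 'MEX');
mexSrc = fullfile(mexDir, 'maxSparseMEX.c');
if exist(mexSrc, 'file') == 2
    % Write the compiled binary next to its source, which step 1
    % already placed on the path.
    mex('-outdir', mexDir, mexSrc);
else
    warning('MEX source not found: %s', mexSrc);
end

% 3. Check that minimize.m can be found (it must be downloaded separately).
hasMinimize = (exist('minimize', 'file') == 2);
if ~hasMinimize
    warning('minimize.m is not on the Matlab path.');
end
```

Running this once per Matlab session covers the path setup; the MEX
compilation only needs to succeed once per machine.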