alphabet = 'abcdefghijklmnopqrstuvwxyz ,.' trainenglish={ 'we describe a way of modelling high dimensional datavectors, such as images, by using an unsupervised, non linear, multilayer neural network in which the activity of each neuron like unit makes an additive contribution to a global energy score that indicates how surprised the network is by the data vector. units whose activities represent violations of learned constraints contribute positively to the global energy and units whose activities represent the presence of familiar features contribute negatively. the connection weights which determine how the activity of each unit depends on the activities in earlier layers are learned by minimizing the energy assigned to data vectors that are actually observed and maximizing the energy assigned to confabulations. the confabulations are generated by perturbing an observed data vector in a direction that decreases its energy under the current model. this learning rule eliminates any systematic differences between the data and the confabulations. backpropagation of energy derivatives through the multilayer network is used both for computing the derivatives that are needed to adjust the weights and for computing how to perturb an observed data vector to produce a confabulation that has lower energy. the backpropagation algorithm trains the units in the intermediate layers of a feedforward neural net to represent features of the input vector that are useful for predicting the desired output. this is achieved by propagating information about the discrepancy between the actual output and the desired output backwards through the net to compute how to change the connection weights in a direction that reduces the discrepancy. in this paper we show how to use backpropagation to learn features and constraints when each input vector is not accompanied by a supervision signal that specifies the desired output. when no desired output is specified, it is not immediately obvious what the goal of learning should be. we assume here that the aim is to characterize the observed data in terms of many different features and constraints that can be interpreted as hidden factors. these hidden factors could be used for subsequent decision making or they could be used to detect highly improbable data vectors by using the globalenergy. we define the probabilty that the network assigns to a data vector in terms of its global energy, run moginit to create training and validation datasets from ten random gaussians. now, using performance on the validation data, determine the optimal number of gaussians to fit to the data. present your results as a graph that plots both the validation density and the training density as a function of the number of gaussians. please do not change the random seeds in moginit to get different data. you can do this later if you want to just play around and get a feel for how the algorithm behaves. change moginit to use only five cases per gaussian and repeat the experiment above without the random seeds.'}; K=10; %% YOU NEED TO PICK THE NUMBER OF HIDDEN STATES IN THE HMM cyc=100; %% YOU NEED TO PICK THE NUMBER OF EPOCHS tol=0.000001; [E,P,Pi,LL]=dhmm2(trainenglish,alphabet,K,cyc,tol); P=0.999*P + (1/K)*0.001*ones(size(P)); E=0.999*E + (1/size(alphabet,2))*0.001*ones(size(E)); Pi=0.999*Pi + (1/K)*0.001*ones(size(Pi));