Contact information
-
- Instructor: Frank Rudzicz
- Office: 550 University Ave., rm 12-175, Toronto ON, M5G 2A2
- Office hours: ad hoc
- Office phone: 416 597 3422 x7971
- Email: frank@cs.toronto.[EDUCATION] (fix the suffix)
Course outline
-
This is a graduate course covering a broad range of topics in speech processing by machine, including digital signal processing, automatic speech recognition, and speech synthesis. This year's theme is speech in healthcare and assistive technologies, which will include automatic dictation of speech for medical records, analysis of speech in language pathologies (e.g., in cerebral palsy, Parkinson's disease, and Alzheimer's disease), and assistive technologies such as text-to-speech (with and without brain-computer interfaces) for people with limited speech ability.
News and announcements
-
- LECTURE CANCELLED: 22 September. If you wish to discuss your project proposal, please contact me directly.
Lecture materials
-
8 Sep.
- Introduction to speech signal processing. Speaker: Frank Rudzicz
15 Sep.
- Introduction to clinical and biomedical aspects of speech. Speaker: Frank Rudzicz
29 Sep.
- (1 hour) B.N. Pasley, S.V. David, N. Mesgarani, A. Flinker, S.A. Shamma, N.E. Crone, R.T. Knight, E.F. Chang (2012) Reconstructing Speech from Human Auditory Cortex. PLoS Biology, 10(1):1-13. Speaker: Alex Francois-Nienaber
- (1/2 hour) H-Y Wu, M. Rubinstein, E. Shih, J. Guttag, F. Durand, W.T. Freeman (2012) Eulerian video magnification for revealing subtle changes in the world. ACM Transactions on Graphics, 31(4). Speaker: Orion Buske
6 Oct.
- (1 hour) L. Feenaughty, K. Tjaden, J. Sussman (2014) Relationship between acoustic measures and judgments of intelligibility in Parkinson's disease: A within-speaker approach. Clinical Linguistics & Phonetics, pages 1-22. Speaker: Teresa Valenzano
20 Oct.
- (1/2 hour) K.L. Lansford, J.M. Liss (2014) Vowel acoustics in dysarthria: Speech disorder diagnosis and classification. Journal of Speech, Language, and Hearing Research, 57, pages 57-67. Speaker: Gillian DeBoer
- (1/2 hour) A. Temko, C. Nadeu, W. Marnane, G. Boylan, G. Lightbody (2011) EEG Signal Description with Spectral-Envelope-Based Speech Recognition Features for Detection of Neonatal Seizures. IEEE Transactions on Information Technology in Biomedicine, 15(6):839-847. Speaker: Ladislav Rampasek
- TBD. Speaker: Narges Norouzi
27 Oct.
- (1/2 hour) K. Brigham, B.V.K.V. Kumar (2010) Imagined Speech Classification with EEG Signals for Silent Communication: A Preliminary Investigation into Synthetic Telepathy. Proceedings of IEEE International Conference on Bioinformatics and Biomedical Engineering (iCBBE), pages 1-4. Speaker: Peter Hamilton
- (1/2 hour) C.S. DaSalla, H. Kambara, M. Sato, Y. Koike (2009) Single-trial classification of vowel speech imagery using common spatial patterns. Neural Networks, 22(9):1334-1339. Speaker: Peter Hamilton
- TBD. Speaker: Kuan-Chieh Wang
3 Nov.
- (1/2 hour) J. Lee, K.C. Hustad, G. Weismer (2014) Predicting Speech Intelligibility With a Multiple Speech Subsystems Approach in Children With Cerebral Palsy. Journal of Speech, Language, and Hearing Research, preprint. Speaker: Gillian DeBoer
- (1/2 hour) R. Patel (2002) Prosodic Control in Severe Dysarthria. Journal of Speech, Language, and Hearing Research, 45(5):858-870. Speaker: Aryan Arbabi
- (1/2 hour) A.J. Sporka, T. Felzer, S.H. Kurniawan, O. Poláček, P. Haiduk, I.S. MacKenzie (2011) CHANTI: predictive text entry using non-verbal vocal input. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 2463-2472. Speaker: Aryan Arbabi
10 Nov.
- (1 hour) S. Petrik, C. Drexel, L. Fessler, J. Jancsary, A. Klein, G. Kubin, J. Matiasek, F. Pernkopf, H. Trost (2011) Semantic and phonetic automatic reconstruction of medical dictations. Computer Speech & Language, 25(2):363-385. Speaker: Arjun Subramanian
24 Nov.
- (1 hour) Y. Yunusova, J.S. Rosenthal, K. Rudy, M. Baljko, J. Daskalogiannakis (2012) Positional targets for lingual consonants defined using electromagnetic articulography. Journal of the Acoustical Society of America, 132(2):1027-1038. Speaker: Rojin Majd
- (1/2 hour) D. Bone, T. Chaspari, K. Audhkhasi, J. Gibson, A. Tsiartas, M. Van Segbroeck, M. Li, S. Lee, S. Narayanan (2013) Classifying language-related developmental disorders from speech cues: the promise and the potential confounds. In Proceedings of INTERSPEECH 2013, pages 182-186. Speaker: Ladislav Rampasek
- (1/2 hour) P.O. Kristensson, K. Vertanen (2012) The Potential of Dwell-Free Eye-Typing for Fast Assistive Gaze Communication. Proceedings of ETRA 2012, pages 241-244, Santa Barbara CA. Speaker: Orion Buske
1 Dec.
- (1 hour) A.B. Kain, J.-P. Hosom, X. Niu, J.P.H. van Santen, M. Fried-Oken, J. Staehely (2007) Improving the intelligibility of dysarthric speech. Speech Communication, 49(9):743-759. Speaker: Stacey June Oue
- (1 hour) E.W. Healy, S.E. Yoho, Y. Wang, D. Wang (2013) An algorithm to improve speech recognition in noise for hearing-impaired listeners. Journal of the Acoustical Society of America, 134(4):3029-38. Speaker: Sara Sabour Rouh Aghdam
8 Dec.
- (1/2 hour) A. Tsanas, M.A. Little, P.E. McSharry, J. Spielman, L.O. Ramig (2012) Novel speech signal processing algorithms for high-accuracy classification of Parkinson's disease. IEEE Transactions on Biomedical Engineering, 59(5):1264-1271. Speaker: Maria Yancheva
- (1/2 hour) D. Hakkani-Tur, D. Vergyri, G. Tur (2010) Speech-based automated cognitive status assessment. Proceedings of Interspeech 2010, pages 1-4. Speaker: Maria Yancheva
- (1 hour) T. Nose and T. Kobayashi (2011) Speaker-independent HMM-based voice conversion using adaptive quantization of the fundamental frequency. Speech Communication, 53(7):973-985. Speaker: Moritz Stiefel
Suggested readings
You are strongly encouraged to select readings from the remaining list below to present. Papers are preceded by the length of their talks in hours.
Speech recognition in healthcare
- (1/2 hour) H.P. Kang, S.J. Sirintrapun, R.J. Nestler, A.V. Parwani (2010) Experience With Voice Recognition in Surgical Pathology at a Large Academic Multi-Institutional Center. American Journal of Clinical Pathology, 133:156-159.
- (1/2 hour) L. Galescu, J. Allen, G. Ferguson, J. Quinn, M. Swift (2009) Speech Recognition in a Dialog System for Patient Health Monitoring. Proceedings of IEEE International Conference on Bioinformatics and Biomedicine (BIBM09) Workshop on NLP Approaches for Unmet Information Needs in Health Care, pages 1-4.
Speech-based communication aids
- (1/2 hour) A.R. Toth, A.W. Black. (2007) Using articulatory position data in voice transformation. In Proceedings of SSW, pages 1-6.
Speech-based diagnosis
- All suggested papers taken!
Clinically-relevant features of speech & other
- (1/2 hour) K.-h. Chang, D. Fisher, J. Canny (2011) Ammon: A speech analysis library for analyzing affect, stress, and mental health on mobile phones. Proceedings of PhoneSense 2011.
Optional readings - general introduction
-
- Optional: Foundations of Statistical Natural Language Processing, C. Manning and H. Schutze
- Optional: Speech and Language Processing, D. Jurafsky and J.H. Martin
- Optional: Spoken Language Processing: A Guide to Theory, Algorithm, and System Development, X. Huang, A. Acero, and H.-W. Hon
Evaluation policies
- General
- You will be graded on a 1-hour in-class presentation (or two half-hour presentations), overall participation, and a final project report. The relative proportions of these grades are as follows:
  - Class presentation/participation: 20%
  - Final project: 80%
- Collaboration and plagiarism
- No collaboration or plagiarism in either the class presentation or the project is permitted. The work you submit must be your own. 'Collaboration' in this context includes but is not limited to sharing of source code, correction of another's source code, or uncited copying of a previous work. See Academic integrity at the University of Toronto.
- Course project
- Although you will be expected to submit all source code, and possibly be called upon to give a demonstration, you will be marked on typical factors in academic publications, namely 1) originality, 2) sufficient survey of existing work, 3) technical correctness, 4) empirical methods, 5) overall presentation. You will submit a report in the style of an academic publication according to one of:
Calendar
-
- 8 September 2014: First lecture
- 22 September 2014: Last day to add CSC 2518
- 27 October 2014: Last day to drop CSC 2518
- TBD: Last lecture
- 15 December 2014: Final project due
Old website
Here is the website for the iteration of this course offered in 2011, with additional handouts: CSC2518 2011 webpage