Contact information

Instructor: Frank Rudzicz
Office: 550 University Ave., rm 12-175, Toronto ON, M5G 2A2
Office hours: ad hoc
Office phone: 416 597 3422 x7971
Email: frank@cs.toronto.[EDUCATION]

Meeting times

Lectures: Mondays, 11h00-13h00, in BA 1200

Course outline

This is a graduate course on the machine processing of speech, broadly covering digital signal processing, automatic speech recognition, and speech synthesis. This year's theme is speech in healthcare and assistive technologies. Topics include automatic dictation for medical records, analysis of speech in language pathologies (e.g., in cerebral palsy, Parkinson's disease, and Alzheimer's disease), and assistive technologies such as text-to-speech (with and without brain-computer interfaces) for people with limited ability to speak.


News and announcements

  • LECTURE CANCELLED: 22 September. If you wish to discuss your project proposal, please contact me directly.


Lecture materials

Week | Title | Speaker | Supplemental material
8 Sep. | Introduction to speech signal processing | Frank Rudzicz |
15 Sep. | Introduction to clinical and biomedical aspects of speech | Frank Rudzicz |
29 Sep.
  Title:
    1. (1 hour) B.N. Pasley, S.V. David, N. Mesgarani, A. Flinker, S.A. Shamma, N.E. Crone, R.T. Knight, E.F. Chang (2012) Reconstructing Speech from Human Auditory Cortex. PLoS Biology, 10(1):1-13.
    2. (1/2 hour) H.-Y. Wu, M. Rubinstein, E. Shih, J. Guttag, F. Durand, W.T. Freeman (2012) Eulerian video magnification for revealing subtle changes in the world. ACM Transactions on Graphics, 31(4).
  Speaker:
    1. Alex Francois-Nienaber
    2. Orion Buske
  Supplemental material:
    1. Alex's slides.
      Sound reconstructed from the brain.
    2. Orion's slides.
      Website.
6 Oct.
  Title:
    1. (1 hour) L. Feenaughty, K. Tjaden, J. Sussman (2014) Relationship between acoustic measures and judgments of intelligibility in Parkinson's disease: A within-speaker approach. Clinical Linguistics & Phonetics, pages 1-22.
  Speaker:
    1. Teresa Valenzano
  Supplemental material:
    1. Teresa's slides.
20 Oct.
  Title:
    1. (1/2 hour) K.L. Lansford, J.M. Liss (2014) Vowel acoustics in dysarthria: Speech disorder diagnosis and classification. Journal of Speech, Language, and Hearing Research, 57, pages 57-67.
    2. (1/2 hour) A. Temko, C. Nadeu, W. Marnane, G. Boylan, G. Lightbody (2011) EEG Signal Description with Spectral-Envelope-Based Speech Recognition Features for Detection of Neonatal Seizures. IEEE Transactions on Information Technology in Biomedicine, 15(6):839-847.
    3. TBD
  Speaker:
    1. Gillian DeBoer
    2. Ladislav Rampasek
    3. Narges Norouzi
  Supplemental material:
    1. Gillian's slides.
    2. Ladislav's slides.
27 Oct.
  Title:
    1. (1/2 hour) K. Brigham, B.V.K.V. Kumar (2010) Imagined Speech Classification with EEG Signals for Silent Communication: A Preliminary Investigation into Synthetic Telepathy. Proceedings of IEEE International Conference on Bioinformatics and Biomedical Engineering (iCBBE), pages 1-4.
    2. (1/2 hour) C.S. DaSalla, H. Kambara, M. Sato, Y. Koike (2009) Single-trial classification of vowel speech imagery using common spatial patterns. Neural Networks, 22(9):1334-1339.
    3. TBD
  Speaker:
    1. Peter Hamilton
    2. Peter Hamilton
    3. Kuan-Chieh Wang
  Supplemental material:
    1. Peter's slides.
3 Nov.
  Title:
    1. (1/2 hour) J. Lee, K.C. Hustad, G. Weismer (2014) Predicting Speech Intelligibility With a Multiple Speech Subsystems Approach in Children With Cerebral Palsy. Journal of Speech, Language, and Hearing Research, preprint.
    2. (1/2 hour) R. Patel (2002) Prosodic Control in Severe Dysarthria. Journal of Speech, Language, and Hearing Research, 45(5):858-870.
    3. (1/2 hour) A.J. Sporka, T. Felzer, S.H. Kurniawan, O. Poláček, P. Haiduk, I.S. MacKenzie (2011) CHANTI: predictive text entry using non-verbal vocal input. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 2463-2472.
  Speaker:
    1. Gillian DeBoer
    2. Aryan Arbabi
    3. Aryan Arbabi
  Supplemental material:
    1. Gillian's slides.
    2. Aryan's slides.
10 Nov.
  Title:
    1. (1 hour) S. Petrik, C. Drexel, L. Fessler, J. Jancsary, A. Klein, G. Kubin, J. Matiasek, F. Pernkopf, H. Trost (2011) Semantic and phonetic automatic reconstruction of medical dictations. Computer Speech & Language, 25(2):363-385.
  Speaker:
    1. Arjun Subramanian
  Supplemental material:
    1. Arjun's slides.
24 Nov.
  Title:
    1. (1 hour) Y. Yunusova, J.S. Rosenthal, K. Rudy, M. Baljko, J. Daskalogiannakis (2012) Positional targets for lingual consonants defined using electromagnetic articulography. Journal of the Acoustical Society of America, 132(2):1027-1038.
    2. (1/2 hour) D. Bone, T. Chaspari, K. Audhkhasi, J. Gibson, A. Tsiartas, M. Van Segbroeck, M. Li, S. Lee, S. Narayanan (2013) Classifying language-related developmental disorders from speech cues: the promise and the potential confounds. Proceedings of INTERSPEECH 2013, pages 182-186.
    3. (1/2 hour) P.O. Kristensson, K. Vertanen (2012) The Potential of Dwell-Free Eye-Typing for Fast Assistive Gaze Communication. Proceedings of ETRA 2012, pages 241-244, Santa Barbara, CA.
  Speaker:
    1. Rojin Majd
    2. Ladislav Rampasek
    3. Orion Buske
  Supplemental material:
    1. Rojin's slides.
    2. Ladislav's slides.
    3. Orion's slides.
1 Dec.
  Title:
    1. (1 hour) A.B. Kain, J.-P. Hosom, X. Niu, J.P.H. van Santen, M. Fried-Oken, J. Staehely (2007) Improving the intelligibility of dysarthric speech. Speech Communication, 49(9):743-759.
    2. (1 hour) E.W. Healy, S.E. Yoho, Y. Wang, D. Wang (2013) An algorithm to improve speech recognition in noise for hearing-impaired listeners. Journal of the Acoustical Society of America, 134(4):3029-3038.
  Speaker:
    1. Stacey June Oue
    2. Sara Sabour Rouh Aghdam
  Supplemental material:
    1. Stacey's slides.
    2. Sara's slides.
8 Dec.
  Title:
    1. (1/2 hour) A. Tsanas, M.A. Little, P.E. McSharry, J. Spielman, L.O. Ramig (2012) Novel speech signal processing algorithms for high-accuracy classification of Parkinson's disease. IEEE Transactions on Biomedical Engineering, 59(5):1264-1271.
    2. (1/2 hour) D. Hakkani-Tur, D. Vergyri, G. Tur (2010) Speech-based automated cognitive status assessment. Proceedings of Interspeech 2010, pages 1-4.
    3. (1 hour) T. Nose, T. Kobayashi (2011) Speaker-independent HMM-based voice conversion using adaptive quantization of the fundamental frequency. Speech Communication, 53(7):973-985.
  Speaker:
    1. Maria Yancheva
    2. Maria Yancheva
    3. Moritz Stiefel

Suggested readings

You are strongly encouraged to choose the papers you present from the remaining list below. Each paper is preceded by the expected length of its talk, in hours.

Speech recognition in healthcare

Speech-based communication aids

Speech-based diagnosis

  • All suggested papers taken!

Clinically-relevant features of speech & other

Optional readings - general introduction

Optional: Foundations of Statistical Natural Language Processing, C. Manning and H. Schütze
Optional: Speech and Language Processing, D. Jurafsky and J.H. Martin
Optional: Spoken Language Processing: A Guide to Theory, Algorithm, and System Development, X. Huang, A. Acero, and H.-W. Hon

Evaluation policies

General
You will be graded on a 1-hour in-class presentation (or two half-hour presentations), overall participation, and a final project report. The relative proportions of these grades are as follows:
Class presentation and participation: 20%
Final project: 80%
Collaboration and plagiarism
No collaboration or plagiarism is permitted in either the class presentation or the project. The work you submit must be your own. In this context, 'collaboration' includes, but is not limited to, sharing source code, correcting another student's source code, and copying previous work without citation. See Academic integrity at the University of Toronto.
Course project
Although you will be expected to submit all source code, and may be asked to give a demonstration, you will be marked on the criteria typical of academic publications, namely (1) originality, (2) a sufficient survey of existing work, (3) technical correctness, (4) empirical methods, and (5) overall presentation. You will submit a report in the style of an academic publication according to one of:


Calendar

8 September 2014: First lecture
22 September 2014: Last day to add CSC 2518
27 October 2014: Last day to drop CSC 2518
TBD: Last lecture
15 December 2014: Final project due

See Dates for graduate students.


Old website

Here is the website for the iteration of this course offered in 2011, with additional handouts: CSC2518 2011 webpage
