These are the Spanish speech corpora and associated data available from LDC. There appear to be duplications, so we should work back from the later publications.
- LDC2014T23 Fisher and CALLHOME Spanish–English Speech Translation
  not on server
- LDC2010T04 Fisher Spanish – Transcripts
  /projects/ldc/ldc-standard-license/2010/LDC2010T04
- LDC2010S01 Fisher Spanish Speech
  /projects/ldc/ldc-standard-license/2010/LDC2010S01
- LDC2006S37 West Point Heroico Spanish Speech
  not on server
- 1997 HUB5 Spanish Transcripts
  not on server
- LDC2002S25 1997 HUB5 Spanish Evaluation
- LDC2001T61 CALLHOME Spanish Dialogue Act Annotation
- LDC98S74 1997 Spanish Broadcast News Speech (HUB4-NE)
- 1997 Spanish Broadcast News Transcripts (HUB4-NE)
- LDC98T29 HUB5 Spanish Telephone Speech Corpus
- LDC98T27 HUB5 Spanish Transcripts
- LDC96S57 CALLFRIEND Spanish-Caribbean Dialect
- LDC96S58 CALLFRIEND Spanish-Non-Caribbean Dialect
- LDC96L16 CALLHOME Spanish Lexicon
  /projects/ldc/ldc-standard-license/1996/LDC96L16
- LDC96S35 CALLHOME Spanish Speech
  /projects/ldc/ldc-standard-license/1996/LDC96S35
- LDC96T17 CALLHOME Spanish Transcripts
  /projects/ldc/ldc-standard-license/1996/LDC96T17
- LDC96S57 CALLFRIEND Spanish-Caribbean Dialect
- LDC96S58 CALLFRIEND Spanish-Non-Caribbean Dialect
- LDC95S28 LATINO-40 Spanish Read News
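
To double-check which of the corpora listed here are actually on the server, something like the following rough Python sketch might help. It assumes every corpus sits under `/projects/ldc/ldc-standard-license/<year>/<catalog ID>`, which is only inferred from the paths in this list and may not hold for every release. The function names (`catalog_year`, `expected_path`, `find_on_server`) are made up for illustration.

```python
import re
from pathlib import Path

# Root taken from the paths in the list; the <year>/<catalog ID> layout
# below it is an assumption inferred from those same paths.
LDC_ROOT = Path("/projects/ldc/ldc-standard-license")


def catalog_year(catalog_id: str) -> int:
    """Return the four-digit year encoded in an LDC catalog ID."""
    match = re.fullmatch(r"LDC(\d{4}|\d{2})[A-Z]\d+", catalog_id)
    if match is None:
        raise ValueError(f"unrecognized catalog ID: {catalog_id}")
    digits = match.group(1)
    # Assumption: two-digit years are 1990s releases, as in this list.
    return int(digits) if len(digits) == 4 else 1900 + int(digits)


def expected_path(catalog_id: str, root: Path = LDC_ROOT) -> Path:
    """Build the directory where the corpus would be, if the layout holds."""
    return root / str(catalog_year(catalog_id)) / catalog_id


def find_on_server(catalog_ids, root: Path = LDC_ROOT) -> dict:
    """Map each catalog ID to True if its expected directory exists."""
    return {cid: expected_path(cid, root).is_dir() for cid in catalog_ids}


if __name__ == "__main__":
    ids = ["LDC2014T23", "LDC2010T04", "LDC2010S01", "LDC96L16", "LDC95S28"]
    for cid, present in find_on_server(ids).items():
        status = "found" if present else "not found at expected path"
        print(f"{cid}: {status}")
```

A "not found" result only means the directory is not at the guessed location, not that the corpus is definitely missing.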
I would like to use LDC95S28 LATINO-40 Spanish Read News, because it is scripted speech, which should be easier to start with. However, I went over the documentation and it does not mention a phonetic dictionary, so I think too much time would have to be spent on writing one. What do you say?
The Spanish CALLHOME set (LDC96L16, LDC96S35, LDC96T17) would seem like a good option, although it is not scripted speech. Both the transcripts and a lexicon are readily available, and the lexicon includes phonetic transcriptions along with other potentially useful information, such as POS.
Let me know…