Angela Krak
December 15, 2021
This is Angela Krak's final project repo for Data Science (LING 2340), Fall 2021. The data consist of transcriptions of sociolinguistic interviews with speakers from Seville, Spain. The purpose of this project is to conduct frequency analyses that explore common themes within the data, both across the data set as a whole and by interview question.
At the start of the project, the data consisted of 24 sound files, which I personally collected in the summer of 2019. For this project, I transcribed 22 of the 24 files and divided the speech into .txt files by interview question. There were five questions in total, so each speaker is associated with five files. While I cannot share the data set, I have uploaded a sample of .txt files from one speaker, which can be viewed here:
I have linked the most important parts of the repo below.
The final_report.md file contains a summary of the project and the most important findings from the frequency analyses.
The code used in this project can be viewed in either Sevilla Transcription Rmd or Sevilla Transcription md.
The images of the frequency graphs can be found in the folder titled images.
Thanks for visiting and feel free to contact me with any questions!