Kuanysh asked me how I was going to incorporate the API sounds into text, which is impossible. That meant there already needs to be an audio version of the book, on top of which the sounds will be layered. He suggested I record myself reading an excerpt of the novel and use an algorithm to transcribe the speech to text. At this point, I was thinking that all of my previous work with PDFs seemed useless. However, speech to text is not always entirely accurate, so double-checking the transcript against the PDF/actual text version of the book makes a lot of sense.
Once that is accomplished, I would have to run my code that compares the sound words list I created earlier to the audio (by then it would be in text format), and only then can I begin integrating sound bank APIs.
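To make the comparison step concrete, here is a minimal sketch of how a transcript could be checked against a list of sound words. This is not my actual code: the function name `find_sound_words` and the sample sentence and word list are made up for illustration.

```python
import re

def find_sound_words(transcript, sound_words):
    """Return (position, word) pairs for every transcript token in sound_words."""
    vocabulary = {word.lower() for word in sound_words}
    # Lowercase the transcript and split it into simple word tokens.
    tokens = re.findall(r"[a-z']+", transcript.lower())
    return [(i, tok) for i, tok in enumerate(tokens) if tok in vocabulary]

# Hypothetical sample data, not taken from the novel or from my real list.
sample_transcript = "The door slammed and the wind began to hiss."
sample_sound_words = ["hiss", "slammed", "roar"]

print(find_sound_words(sample_transcript, sample_sound_words))
# [(2, 'slammed'), (8, 'hiss')]
```

The positions are word indexes in the transcript, which is roughly what would be needed later to decide where in the audio a sound effect should go.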
I felt relieved, since I was starting to get a sense of the direction this project was going to take. Grabbing my iPhone, I recorded myself reading a paragraph from Fahrenheit 451, as a nod to the very first book I analyzed.
As with everything I do when it comes to data science, issues arose even before I moved on to the implementation stage. Because I was using Voice Memos, the format of the audio was mp4. For the speech recognition to work, however, the file has to be WAV. I thought this wasn't going to be a problem and just renamed the file from "Fahrenheit 451.mp4" to "Fahrenheit 451.wav", assuming this was going to cut it. It did not: renaming changes only the extension, not the encoding of the audio inside, so nothing in my Jupyter Notebook worked. I had to use the Online Audio Converter to turn it into a real WAV file.
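In hindsight, a quick check of the file's first bytes would have shown why renaming could not work. The sketch below is an illustration only, and the helper name `looks_like_wav` is made up. It relies on the fact that a WAV file starts with the bytes `RIFF` and carries `WAVE` at bytes 8 to 12, while MP4-family files carry `ftyp` near the start. The "mp4" bytes here are a hand-written stand-in, not my real recording.

```python
import io
import wave

def looks_like_wav(data: bytes) -> bool:
    """True if the bytes begin with a RIFF/WAVE header, whatever the file is called."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"

# Build a tiny, real WAV file in memory: 160 frames of 16-bit mono silence.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as writer:
    writer.setnchannels(1)
    writer.setsampwidth(2)
    writer.setframerate(16000)
    writer.writeframes(b"\x00\x00" * 160)
real_wav = buffer.getvalue()

# Stand-in for the start of an MP4/M4A file (hypothetical bytes, not a real recording).
fake_mp4 = b"\x00\x00\x00\x18ftypM4A " + b"\x00" * 16

print(looks_like_wav(real_wav))   # True
print(looks_like_wav(fake_mp4))   # False, and renaming the file would not change these bytes
```

A real conversion has to decode the audio and write it out again as WAV, which is what the online converter did for me.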
Surprisingly, the code from the tutorial I found on PythonCode was immaculate: it printed out every word I said accurately. The only catch was that it transcribed only the first minute of the audio.
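I have not confirmed why the transcript stopped after the first minute. One common workaround for long recordings is to transcribe them in shorter windows and join the pieces. The sketch below assumes the SpeechRecognition package (`speech_recognition`) and its Google Web Speech backend, which needs an internet connection. The function names and the 30-second window are my own choices for illustration, not something taken from the tutorial.

```python
def chunk_windows(total_seconds, chunk_seconds=30.0):
    """Return (offset, duration) pairs that cover the recording end to end."""
    windows = []
    offset = 0.0
    while offset < total_seconds:
        duration = min(chunk_seconds, total_seconds - offset)
        windows.append((offset, duration))
        offset += chunk_seconds
    return windows

def transcribe_in_chunks(wav_path, chunk_seconds=30.0):
    """Transcribe a WAV file window by window (assumes SpeechRecognition is installed)."""
    # Imported here so chunk_windows stays usable without the third-party package.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    pieces = []
    with sr.AudioFile(wav_path) as source:
        # Each record() call continues reading where the previous one stopped.
        for _offset, duration in chunk_windows(source.DURATION, chunk_seconds):
            audio = recognizer.record(source, duration=duration)
            try:
                pieces.append(recognizer.recognize_google(audio))
            except sr.UnknownValueError:
                # Nothing intelligible in this window (for example, silence).
                pass
    return " ".join(pieces)

print(chunk_windows(95))
# [(0.0, 30.0), (30.0, 30.0), (60.0, 30.0), (90.0, 5.0)]
```

With a recording on disk, the call would look like `transcribe_in_chunks("Fahrenheit 451.wav")`. Cutting at fixed times can split a word in half, so splitting on pauses instead might give cleaner text.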