3/2/2024

Mac whisper text to speech

The awesome MacWhisper app for macOS recently got updated with a speed boost on Macs with Apple silicon. The new version is said to be two to three times faster at transcribing audio into text with machine learning on the Mac. How are these gains achieved through an app update? The GPU. Leveraging the graphics processing unit can make speech-to-text operations run faster. Given the GPU's role in AI advancements, has GPU become an outdated term?

Other changes recently added include better keyboard shortcut support: You can now play and pause by pressing the spacebar. Added support for Undo and Redo in the segments view. You can now navigate between selected segments with the arrow keys on your keyboard. You can now select multiple segments and then hit ⌘+C to copy them as a whole to your clipboard.

You're in for a treat if you've been using MacWhisper, and this update unlocks those performance gains. If you haven't tried MacWhisper yet, download it for free and throw some audio at it to see Apple silicon at work.

A few days ago OpenAI publicly released Whisper, their speech recognition model, which is unlike anything we've seen before, so we created a free tool for Resolve called StoryToolkitAI that transcribes Timelines into SRT subtitles that can be imported back into Resolve. Whisper recognizes speech in 97 languages and can translate them into English. So far, we've tried it on footage in English, Spanish, German, and Chinese, and it's really impressive.

StoryToolkitAI can be downloaded for free (it's written in Python, so some knowledge is required to install it). Once you install the necessary Python packages on your machine, you can simply go to the timeline in Resolve, press a button in the tool, and wait for the transcription to be processed. When done, you get all the phrases in a JSON file and the subtitles as SRT. Everything is done locally without the need for additional accounts or even an Internet connection, once you have all the packages installed.

We may add other machine learning models like GPT-3 or CLIP in the future to integrate other cool features like content summarization, automated markers, etc. Again, this is a free tool that we have used in our editing process for almost a week now, but it's quite raw and may be buggy! If you're interested in contributing to this open source project, just get in touch!

Keep it coming, we're really interested in how people approach editing in Resolve and how to develop a tool that is actually helpful. I'm pushing almost a daily update right now on GitHub, so make sure you're using the latest version. Things are still super raw and maybe buggy here and there, but we use the tool constantly in our editing room at this point. Having these locally generated high-quality transcripts has really improved our workflow and is changing our editing pipeline.

I'll try to push these updates as soon as possible: markers in Resolve via transcript selection; automatic opening of transcripts on timeline change; and batch transcription of multiple Timelines (right now you can only render one at a time, although you can queue up transcriptions). I wish that the Resolve API weren't so limited in certain aspects, so that we could implement some of these ideas faster without trying to hack into it too much, but I guess there's no other option but to work with what we have.

I recently tried it myself and it seems pretty up there. There were a few errors, especially when I was the speaker. That was expected, though, as I tend to stutter and stumble through my words, which would lead the program to get confused. It ended up processing 220 minutes of audio in 26 minutes via CUDA on the medium model, though, which is impressive. I am having a lot of trouble getting it to work with DaVinci Resolve Studio on Windows, but I think that is a user issue, as I am very new to Python.

One thing I would like to see in the program, and I'm not sure whether it has it yet, is the ability to add parts of the transcript to DaVinci's title editor to place and modify as we see fit. I suppose we can kind of do that already, though, by matching the title's length up with the markers from the transcript. Something else I would love to see, and again it might already be there, is for the transcript to designate who is saying what. I'm already trying that by adding (Person 1) and (Person 2) into the prompt, and I'll update later if it works.

I'm also testing out the large model to see whether and how it differs, but that'll take significantly longer, as I have to use my CPU because it needs 20 GB of memory and my GPU maxes out at 8 GB.
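To put the reported figure in perspective (220 minutes of audio processed in 26 minutes on the medium model via CUDA), the speed relative to real time can be worked out directly. The sketch below is plain arithmetic; the function name `realtime_factor` is only an illustrative choice and is not part of Whisper or StoryToolkitAI.

```python
def realtime_factor(audio_minutes: float, processing_minutes: float) -> float:
    """Return how many minutes of audio are transcribed per minute of processing."""
    if processing_minutes <= 0:
        raise ValueError("processing_minutes must be positive")
    return audio_minutes / processing_minutes


# Figures reported in the post: 220 minutes of audio, 26 minutes of processing.
factor = realtime_factor(220, 26)
print(f"{factor:.1f}x faster than real time")  # prints "8.5x faster than real time"
```

The same helper can be used to compare a CPU run of the large model once its timing is known.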
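The StoryToolkitAI post says the tool writes the transcribed phrases to a JSON file and the subtitles to SRT. As a rough illustration of how timed phrases map onto the SRT format, here is a minimal sketch. It assumes each phrase is a dict with `start` and `end` times in seconds and a `text` string (the shape Whisper's Python package uses for its segments); the helper names and the sample data are made up for this example, and this is not StoryToolkitAI's actual code.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Turn a list of timed phrases into SRT text with numbered cues."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        # Each cue: running number, time range, text, then a blank line.
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


# Hypothetical sample phrases, standing in for the contents of the JSON file.
sample = [
    {"start": 0.0, "end": 2.5, "text": " Hello and welcome."},
    {"start": 2.5, "end": 6.04, "text": " Let's look at the timeline."},
]
print(segments_to_srt(sample))
```

Saving the returned string to a `.srt` file gives something that can be imported as subtitles; speaker labels such as (Person 1) would simply be part of each phrase's text.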