2 free, simple tools for speech to text (an 'ethical shoestring' evaluation) 🗣️
My search for, and evaluation of, free, ethical tools to automatically convert audio podcast recordings to text transcripts (audio; 9:30)
Our 6 'P's in AI Pods audio interviews for AI, Software, & Wetware have taken off and I’m having so much fun learning from my guests! One downside is that creating the text posts from the audio recordings has taken a lot of my time.
This is a quick capture of some investigations I did this past week to find a free and preferably ethical tool for converting the audio to transcripts for the posts. Let me know if you find it useful.
TL;DR Here are the tools I found in my searches (some details below).
Tool Candidates - free: Conversations, ConverterApp, Fathom, Fireflies.ai, oTranscribe, Restream, SpeakToText.
Tool Candidates - not free, or free plan not suitable: Audiotype, Descript, HappyScribe, Kapwing, Media.io, Notta, Otter.ai, Talknotes, Turboscribe, Veed, Yescribe.
Bottom line: Try ConverterApp or Restream if you need to convert MP3s to text transcripts. They’re fast, pretty accurate for audio from native English speakers, private, and truly free - no accounts or calendar integrations required. If you don’t mind calendar integration and want a free tool, check out Fathom and Fireflies.ai.
Caveat: I haven’t yet surfaced any info about their underlying technologies or ethics. But at least what I do know about ConverterApp and Restream doesn’t raise any red flags.
Tool Candidates - free:
Conversations - https://highlight.ing/apps/conversations
Built on the Highlight open source platform
Free - downloadable Windows app
Appears to focus on real-time listening and conversion - not an ideal fit for my needs.
🌟ConverterApp - https://converter.app/audio-to-text/
Can distinguish between multiple speakers - good for sessions with interviewees.
Automatically filters out filler words.
Free to use via website without an account or giving an email address.
Lots of ads on the bottom of the page, but they don’t interfere with use.
Tested with a 28:46 interview recording (26 MB) with 2 speakers. Transcription completed in 2 minutes, and the text was over 99% accurate (missed words: kiosk, EULA, a name with an unconventional spelling) 👍.
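One way to put a number on a claim like "over 99% accurate" is word error rate (WER): the word-level edit distance between a hand-corrected reference transcript and the tool's output, divided by the number of reference words. Here is a minimal Python sketch, assuming both texts are available as plain strings. The function name and the sample sentences are made up for illustration and are not taken from the test recording.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] holds the edit distance between the reference words seen so far
    # (one fewer than in the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the output
                curr[j - 1] + 1,     # extra word in the output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences only (hypothetical, not from the actual transcript).
reference = "please accept the eula at the kiosk"
hypothesis = "please accept the ula at the key ask"
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.1%}, accuracy: {1 - wer:.1%}")
```

This simple version treats punctuation attached to a word as part of the word, so stripping punctuation first would give a fairer comparison between tools.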
Fathom - https://fathom.video/
Free account requires sign-in with Google or Microsoft, and calendar integration. An active Zoom account is also required. (Has Zoom app plugin - does it work with free Zoom accounts?)
Their privacy policy and terms of service do not mention AI or training. However, they openly claim AI-based features. They state that their products are GDPR and HIPAA compliant.
Has paid tiers, but the free version may be good enough for my needs
“Pricing: Free with no usage limitations.”
Fireflies.ai: https://fireflies.ai/ (not to be confused with Adobe Firefly for images)
The free plan allows “unlimited transcription” for meetings joined by their bot, with limited AI summaries, if:
“Auto-join calendar meetings” is set to ON, and
“Send email recap to” is set to “Everyone on the invite”.
Sign-in is only supported via Google Auth (calendar integration) or Outlook auth. Unlimited free transcriptions require:
a Google Meet call recorded with the Chrome extension, or
the Fireflies AI notetaker being allowed to join a meeting as a participant, with the settings above.
Their data processing policy is stated to be GDPR compliant.
Can upload past meetings as mp3, mp4, wav, or m4a files to get them transcribed, but 1 transcription credit is consumed per uploaded file (new users get between 3 and 10 credits), and users need to upgrade to a paid plan to download transcripts, audio, or video.
❌ oTranscribe - https://otranscribe.com/
Free and based on open source (MIT license) - “A free web app to take the pain out of transcribing recorded interviews.”
I started to test oTranscribe with a 25:17 MP3 file. It doesn’t actually transcribe. It just provides a better UI than switching between an MP3 player app and a text editing app. The help says “oTranscribe makes the manual task of transcribing audio a lot less painful. But you still have to do the transcription.”
Private - in browser - “your audio file and transcript never leave your computer”
Interoperability - limited; provides a link to media.io to help with conversions if needed
🌟 Restream - https://restream.io/tools/transcribe-audio-to-text
Free and private - no account required: “You can feel confident that your data is safe with us when you use our audio transcription tool. You don’t have to download any software, we don’t show ads, and you don’t need to create an account. We also wipe your data from our servers after you’re done transcribing.”
Interoperability - “Our audio transcription tool supports audio files like MP3, WAV, FLAC, AAC and more.” (no m4a? need to test)
Accuracy - Site claims 99% transcription accuracy for English.
Other features - Plans appear oriented to video and live streaming. Outputs from the free plan appear to include a (video) watermark.
I tested Restream on the same 28 min file as ConverterApp. Once the file was uploaded, the transcription finished in seconds. It doesn’t identify different speakers, but it does separate them nicely in the file with blank lines to improve readability. The transcript includes ‘mmhmm’ and other filler words, which is a plus if you want to find them in the audio and edit them out. 👍
SpeakToText - https://speaktotext.io/
Truly free: “This free online tool transcribes your audio files into text, applying high-precision methods backed by OpenAI's Whisper model. Unlike other services, this tool does not ask for your email address, offers mass transcription and accepts files up to 25MB in size.”
Ethical?: Likely no. ❌ The tool relies on OpenAI’s Whisper model. The safest assumption is that anything from OpenAI has not been fairly trained.
Private: They also promise that the content will not be mined: “At SpeakToText.io, we prioritize your privacy and security. None of your audio files are stored on our servers, ensuring your data's confidentiality. For data sent through our OpenAI-powered API, OpenAI's policy as of March 1, 2023, reinforces our commitment to your privacy. OpenAI no longer uses data submitted through our API for model training or improvement.”
Interoperable: Supported formats include mp3, mp4, wav, m4a, and more. Accepting the m4a format that Zoom provides is a plus, although I have to convert to mp3 anyway for Substack (if Audacity could take m4a, I could skip that step).
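For the m4a-to-mp3 step, one option is to script the conversion with ffmpeg. This is a minimal sketch, assuming ffmpeg is installed and on the PATH. The file name `interview.m4a` is hypothetical, and the quality setting is just a reasonable guess for speech, not a tested recommendation.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(source: Path, target: Path) -> list[str]:
    """Build an ffmpeg command that re-encodes an audio file as MP3."""
    return [
        "ffmpeg",
        "-i", str(source),         # input file (e.g. a Zoom .m4a recording)
        "-vn",                     # ignore any video stream
        "-codec:a", "libmp3lame",  # encode audio with the LAME MP3 encoder
        "-qscale:a", "4",          # variable bitrate quality (0 = best, 9 = smallest)
        str(target),
    ]


def convert_to_mp3(source: Path) -> Path:
    """Convert `source` to an MP3 in the same folder and return the new path."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    target = source.with_suffix(".mp3")
    subprocess.run(build_ffmpeg_command(source, target), check=True)
    return target


if __name__ == "__main__":
    # Hypothetical file name; replace with your own recording.
    recording = Path("interview.m4a")
    if recording.exists():
        print(convert_to_mp3(recording))
```

Keeping the command builder separate from the function that runs it makes it easy to inspect the exact ffmpeg call before converting anything.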
Tool Candidates - not free, or free plan not suitable:
Audiotype, Descript, HappyScribe, Kapwing, Media.io, Notta, Otter.ai, Talknotes, Turboscribe, Veed, Yescribe
(audio voiceover stops here and is available only in Substack, not in podcasts)
Audiotype - https://www.audiotype.org/transcribe/audio/
Not free unless your content is under 1 minute long. Free trial does not allow downloading.
Pricing is based on the number of minutes, e.g. 60 minutes for 9 euros.
Speed - “With Audiotype, it takes a third of the file duration to transcribe speech to text”
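To put those numbers in context for one episode, here is a rough back-of-the-envelope calculation in Python. It assumes the "60 minutes for 9 euros" example rate can be applied per minute (Audiotype may only sell fixed bundles, so treat the cost as a prorated estimate) and uses the 28:46 interview length from the ConverterApp test as the sample recording.

```python
def parse_duration(mm_ss: str) -> float:
    """Convert an 'MM:SS' string to minutes."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) + int(seconds) / 60


EUROS_PER_MINUTE = 9 / 60  # from the "60 minutes for 9 euros" example (prorating is an assumption)
SPEED_FACTOR = 1 / 3       # "a third of the file duration"

length_min = parse_duration("28:46")
print(f"Estimated processing time: {length_min * SPEED_FACTOR:.1f} min")
print(f"Estimated prorated cost: {length_min * EUROS_PER_MINUTE:.2f} euros")
```

By that estimate, a half-hour interview would take a little under 10 minutes to process and use up roughly half of a 60-minute purchase.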
Descript - https://www.descript.com/transcription
Seems more focused on video than audio.
Pricing: Free plan offers 1 hour/month of recording time. Hobbyist plan is $12/mo, Creator plan is $24/mo, Business plan is $40/mo, and they offer an enterprise plan.
HappyScribe - https://www.happyscribe.com/free-transcription-software
Free 85% accurate transcription tool for up to 10 minutes (not downloadable), as a way to upsell 99% accurate human transcription services or a paid transcription plan
Kapwing - https://www.kapwing.com/tools/audio-to-text
Free for 10 minutes of transcription per month, then paid plans are available
Potential ethical concern: “Repurpose content from the internet by pasting a link.” - this seems to open the door to using content the user does not own.
Bonus feature: edit the audio by editing the transcript (cut out unwanted parts)
Media.io - https://www.media.io/sound-to-text.html
Free plan includes 5 min of automatic subtitling. Basic plan is $6.99/mo paid yearly. Premium plan is $12.99/mo and includes AI features like background noise removal.
Offers a ‘free’ Windows app download (to be tested to see what it does)
Doesn’t support Firefox
Notta - https://www.notta.ai/en/tools/audio-to-text-converter
Free plan allows 120 min/month but limits to max 3 min per conversation - not viable for me
Otter.ai -
Pricing: Free plan offers 300 monthly transcription minutes; 30 minutes per conversation; allows importing and transcribing of three audio or video files in a lifetime. Annual paid plans start at $10 per month.
Talknotes - https://talknotes.io/tools/transcribe-to-text
Free trial: can transcribe a 2 min recording for free via website. Base plan allows unlimited notes up to 20 min of recording time for $12/mo or $69/yr. Pro plan allows up to 25 2-hour recordings for $49/mo or $499/yr.
Home page says it offers privacy protections and does not use your content for training AI.
No information is offered on the underlying foundation models they use.
Turboscribe - https://turboscribe.ai
Private: Their FAQ says “We don't use any third-party transcription APIs or services. We run all audio/video transcription in-house on machines we own or control.” and “We don't train AI or machine learning models on your media files or transcripts.”
3 free transcripts per day with account (email required)
Pricing: $20/mo or $120/year for unlimited transcripts up to 10 hours long (each)
Interoperability: “TurboScribe supports the vast majority of common audio and video formats, including MP3, M4A, MP4, MOV, AAC, WAV, OGG, OPUS, MPEG, WMA, WMV, AVI, FLAC, AIFF, ALAC, 3GP, MKV, WEBM, VOB, RMVB, MTS, TS, QuickTime, and DivX.”
Accuracy: Site claims 99.8% accuracy. Bonus feature: it recognizes different speakers, e.g. for interviews (my primary intended purpose).
Veed - https://www.veed.io/tools/audio-to-text
Free plan includes a watermark and 15 min/month of speech-to-text transcription
Pricing -
Yescribe - https://yescribe.ai
Free plan offers 3 files/day, up to 30 min each and identifies speakers
Pricing: Basic plan allows 10 files/day up to 5 hours each for $4.90/month billed yearly. Pro plan allows unlimited files per day at $9.90/month billed yearly.
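The free-plan limits quoted for the tools in this section can be checked against a recording length with a short script. This is only a sketch: the limits are the ones listed in this post, monthly or daily quotas and download restrictions are ignored, and the `FREE_PLAN_MAX_MINUTES` table and `plans_that_fit` function are illustrative names.

```python
# Longest single recording (in minutes) each free plan or trial accepts,
# as quoted in this post. Quotas per day/month and download limits are not modeled.
FREE_PLAN_MAX_MINUTES = {
    "Audiotype": 1,
    "Descript": 60,
    "HappyScribe": 10,
    "Kapwing": 10,
    "Media.io": 5,
    "Notta": 3,
    "Otter.ai": 30,
    "Talknotes": 2,
    "Veed": 15,
    "Yescribe": 30,
}


def plans_that_fit(recording_minutes: float) -> list[str]:
    """Return the tools whose free per-file limit covers the recording."""
    return sorted(
        tool
        for tool, limit in FREE_PLAN_MAX_MINUTES.items()
        if recording_minutes <= limit
    )


recording_minutes = 28 + 46 / 60  # the 28:46 test interview
print(plans_that_fit(recording_minutes))
```

Passing this check does not make a plan usable in practice. For example, Otter.ai's free plan allows importing only three files in a lifetime, and Descript's 1 hour/month would cover just two interviews of this length.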
References
https://www.descript.com/blog/article/best-free-audio-to-text-converters:
Descript - Transcribing audio or video files using a computer (free for 60 min/month)
Otter.ai - Transcribing virtual meetings - only 3 free lifetime downloads
Fathom - Sales teams and customer service reps (worth a look though!)
MacWhisper - Mac users only
Google Docs Voice Typing - Google Suite users, not suited for interviews
Windows Voice Typing - Windows users - could try with ‘virtual audio cable’ for converting MP3 to text (see tutorial)