Behind the scenes yt-dlp downloads the subs in .vtt format and then uses ffmpeg to convert them to .srt. Depending on your situation the original .vtt format might be fine.
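If the .vtt is fine for you, you can skip the conversion step entirely. A rough sketch (the flags are real yt-dlp options; the URL is just a placeholder):

```shell
# Grab only the English subtitles, keeping the original .vtt
# (no ffmpeg conversion involved):
yt-dlp --write-subs --sub-langs en --skip-download \
  "https://www.youtube.com/watch?v=VIDEO_ID"

# Or ask for the .srt conversion explicitly (this one needs ffmpeg):
yt-dlp --write-subs --sub-langs en --convert-subs srt --skip-download \
  "https://www.youtube.com/watch?v=VIDEO_ID"
```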
Or even better, yt-whisper, which uses OpenAI's Whisper speech-to-text. I guess it'd be better to first check whether the video has captions before Whispering, so maybe both your command and this one could be used together.
I am not a fan of this pattern - if I'm understanding correctly, you would have to part with all of yt-dlp's niceties like playlist/channel handling, quality selection, file naming, logging config, etc.
Why not just use the whisper cli on yt-dlp CLI's output for videos with bad or no subtitles?
Sure, you could do that too. yt-whisper uses yt-dlp underneath, so there might be a way to pass arguments through to the inner yt-dlp instance. If not, you can modify the source directly; it seems to be a simple wrapper. Or, as you said, use the Whisper CLI. All good options, I just mentioned this one since it's easier when I just want to download a video with subs.
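The "Whisper CLI on yt-dlp's output" route could look something like this (flags shown are real for both tools; the URL and filename are placeholders):

```shell
# 1. Extract only the audio with yt-dlp:
yt-dlp -x --audio-format mp3 -o audio.mp3 \
  "https://www.youtube.com/watch?v=VIDEO_ID"

# 2. Transcribe it with OpenAI's whisper CLI, writing an .srt file:
whisper audio.mp3 --model small --output_format srt
```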
I apologize for the question, but I'm not entirely clear on where "split_sentences" is. Is it a separate script? I've been looking for something with that sort of functionality for a while, very often for this very purpose: splitting transcripts.
Sorry for the late answer, but yes, it would have to be a separate script or command. It is purely fictional: I made it up because the joke made more sense with it, and people might have pointed out that my grep would have filtered out too much context, so I had to add it.
I'm sure there are many unix-y tools for this purpose, but I don't know of any. If you're looking for something that's installed everywhere, maybe a very big awk or sed regex with multiline wizardry could do the trick for most easy-to-parse Latin-script languages, and you'd just have to copy-paste it around. It probably becomes harder for regexes once you start working with right-to-left languages like Arabic, or languages with different punctuation, so it might not be i18n-friendly.
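For what it's worth, a bare-bones version of the fictional `split_sentences` can be a single sed substitution. This sketch assumes GNU sed (for `\n` in the replacement) and only handles easy Latin-script punctuation; it will happily mangle abbreviations like "Mr. Smith", and it makes no attempt at RTL or non-Latin punctuation:

```shell
# Hypothetical split_sentences: put a newline after ., !, or ?
# whenever it is followed by a space. Naive on purpose.
split_sentences() {
  sed 's/\([.!?]\) /\1\n/g'
}

echo "First sentence. Second one! A third?" | split_sentences
# → three lines: "First sentence." / "Second one!" / "A third?"
```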