I'm working on a project and would like to know if there is a way to obtain the timestamps at which each speaker talks in a YouTube video (that is, who spoke when).
For example, if a video involves three people speaking back and forth, the method/function could return something like:
{
    'speaker1': ['00:09-01:04', '01:15-05:12'],
    'speaker2': ['08:00-09:02', '15:15-20:12'],
    'speaker3': ['10:19-11:04', '25:10-35:17'],
}
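To make the target format concrete, here is a minimal sketch of the final formatting step only. It assumes the hard part (detecting who speaks when) has already produced a list of (speaker, start_seconds, end_seconds) tuples; that input shape and the names `format_timestamp` and `group_segments_by_speaker` are my own placeholders, not part of any existing library.

```python
from collections import defaultdict


def format_timestamp(seconds):
    """Format a number of seconds as MM:SS (minutes are not wrapped into hours)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def group_segments_by_speaker(segments):
    """Group (speaker, start, end) tuples into {speaker: ['MM:SS-MM:SS', ...]}.

    `segments` is a hypothetical input: the output of whatever
    speaker-detection step ends up being used, with times in seconds.
    """
    grouped = defaultdict(list)
    # Sort by start time so each speaker's ranges come out in chronological order.
    for speaker, start, end in sorted(segments, key=lambda seg: seg[1]):
        grouped[speaker].append(f"{format_timestamp(start)}-{format_timestamp(end)}")
    return dict(grouped)


# Made-up example segments, in seconds.
segments = [
    ("speaker1", 9, 64),
    ("speaker2", 480, 542),
    ("speaker1", 75, 312),
]
result = group_segments_by_speaker(segments)
print(result)
```

What I'm missing is the step that produces those segments from the video's audio in the first place.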
A solution in Python would be appreciated. Thanks.