You can run my code in this Google Colab notebook: https://colab.research.google.com/drive/1Tfoa5y13GPLxbS-wFNmZpvtQDogyh1Rg?usp=sharing
I wrote a script that takes the video ID of a YouTube video, for example:
VideoID = '3c584TGG7jQ'
Based on this video ID, my script returns a list of dictionaries containing the YouTube transcript (the video content), like:
data = [{'text': 'Hello World', 'start': 0.19, 'duration': 4.21}, ...]
Finally, I wrote a function that takes the user's input (the word or sentence to search for) and returns the timestamp of each match along with the corresponding hyperlink.
def search_dictionary(user_input, dictionary):
    MY_CODE_SEE_GOOGLE_COLAB_NOTEBOOK

search_dictionary(user_input, dictionary)

Input: "stolen"
Output:
the 2 million packages that are stolen... 0.0 min und 39.0 sec :: https://youtu.be/3c584TGG7jQ?t=38s
stolen and the fifth is this outer... 3.0 min und 13.0 sec :: https://youtu.be/3c584TGG7jQ?t=192s
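For readers who don't want to open the notebook, here is a stripped-down sketch of what a single-video search could look like. It is not the exact code from the notebook: the name `search_transcript`, the tuple layout of the result, and the second entry's `start` value in the sample data are assumptions made up for illustration.

```python
def search_transcript(query, transcript, video_id):
    """Return (text, minutes, seconds, url) for each entry containing query.

    Hypothetical helper, not the notebook's search_dictionary.
    """
    hits = []
    for entry in transcript:
        # Case-insensitive substring match on the transcript text
        if query.lower() in entry["text"].lower():
            start = int(entry["start"])  # whole seconds for the ?t= parameter
            minutes, seconds = divmod(start, 60)
            url = f"https://youtu.be/{video_id}?t={start}s"
            hits.append((entry["text"], minutes, seconds, url))
    return hits


# Illustrative sample data in the same shape as `data` above
# (the start time of the second entry is invented for this example)
sample = [
    {"text": "Hello World", "start": 0.19, "duration": 4.21},
    {"text": "the 2 million packages that are stolen", "start": 38.5, "duration": 3.0},
]

hits = search_transcript("stolen", sample, "3c584TGG7jQ")
for text, minutes, seconds, url in hits:
    print(f"{text}... {minutes} min {seconds} sec :: {url}")
```

Returning the hits as a list instead of printing them directly makes it easier to reuse the function later, for instance when combining results from several videos.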
Now for my question: how can I apply this to a list of video IDs? For example:
list_of_video_ids = ['pXDx6DjNLDU', '8HEfIJlcFbs', '3c584TGG7jQ', ...]
Expected Output:
Title_0, timestamp, hyperlink
Title_0, timestamp, hyperlink
Title_1, timestamp, hyperlink
Title_2, timestamp, hyperlink
Title_2, timestamp, hyperlink
Title_2, timestamp, hyperlink
Title_2, timestamp, hyperlink

In other words, I want every mention across all the video IDs, not just the mentions within a single video.
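To make the goal concrete, one possible shape for this is a loop over the IDs that collects all hits in a single list. The following is only a rough sketch under assumptions: `search_videos`, the `fetch_transcript` callable, and the `titles` mapping are hypothetical names, and the in-memory `fake_transcripts` stands in for whatever actually downloads the transcripts and titles.

```python
def search_videos(query, video_ids, fetch_transcript, titles=None):
    """Search several transcripts; return (title, timestamp, url) rows.

    fetch_transcript: hypothetical callable mapping a video ID to a list of
    {'text', 'start', 'duration'} dictionaries.
    titles: optional hypothetical mapping of video ID to title.
    """
    titles = titles or {}
    results = []
    for video_id in video_ids:
        try:
            transcript = fetch_transcript(video_id)
        except Exception:
            # Assumption: skip videos whose transcript cannot be fetched
            continue
        title = titles.get(video_id, video_id)  # fall back to the ID
        for entry in transcript:
            if query.lower() in entry["text"].lower():
                start = int(entry["start"])
                minutes, seconds = divmod(start, 60)
                results.append((
                    title,
                    f"{minutes} min {seconds} sec",
                    f"https://youtu.be/{video_id}?t={start}s",
                ))
    return results


# Made-up transcripts so the sketch runs without network access
fake_transcripts = {
    "pXDx6DjNLDU": [{"text": "nothing relevant here", "start": 1.0, "duration": 2.0}],
    "3c584TGG7jQ": [
        {"text": "packages that are stolen", "start": 38.2, "duration": 3.0},
        {"text": "stolen and the fifth", "start": 192.7, "duration": 2.5},
    ],
}

results = search_videos(
    "stolen",
    ["pXDx6DjNLDU", "8HEfIJlcFbs", "3c584TGG7jQ"],
    fake_transcripts.__getitem__,  # raises KeyError for the missing ID
    titles={"3c584TGG7jQ": "Title_2"},
)
for row in results:
    print(", ".join(row))
```

Passing the transcript fetcher in as an argument keeps the search logic independent of how transcripts are downloaded, so the same loop would work with the notebook's existing fetching code.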