I want to fine-tune a model on transcripts from specific YouTube videos. However, fine-tuning typically requires a question-answer dataset, and I don’t want to build one by hand every time I add new videos as a knowledge source.
Here’s what I’ve tried so far:
- Extracting transcripts from YouTube videos using tools like youtube-transcript-api.
- Manually creating a small Q&A dataset from the transcript for fine-tuning.
The manual process is time-consuming, and I’m looking for a way to automate the generation of a question-answering dataset from the transcripts.
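To make the goal concrete, here is a rough sketch of the pipeline I have in mind, not a working solution. It assumes the transcript arrives as a list of dicts with "text", "start" and "duration" keys (the shape I get from youtube-transcript-api; hard-coded below so the sketch runs offline). The names chunk_transcript, build_dataset, to_jsonl and placeholder_generate_qa are hypothetical helpers of my own, and placeholder_generate_qa is only a stand-in for the automatic question generation step I am asking about.

```python
import json
from typing import Callable, Dict, List, Tuple

QAPair = Tuple[str, str]


def chunk_transcript(snippets: List[Dict], max_words: int = 200) -> List[str]:
    """Group consecutive snippets into passages of roughly max_words words.

    A snippet is never split, so a single very long snippet can exceed the limit.
    """
    chunks, current, count = [], [], 0
    for snippet in snippets:
        words = snippet["text"].split()
        # Start a new passage when adding this snippet would exceed the limit.
        if current and count + len(words) > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.extend(words)
        count += len(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


def placeholder_generate_qa(passage: str) -> List[QAPair]:
    """Stand-in only: a real version would ask a model to write Q&A pairs."""
    return [("What is discussed in this passage?", passage)]


def build_dataset(chunks: List[str],
                  generate_qa: Callable[[str], List[QAPair]]) -> List[Dict]:
    """Run the (hypothetical) generator on each passage and collect records."""
    records = []
    for passage in chunks:
        for question, answer in generate_qa(passage):
            records.append(
                {"question": question, "answer": answer, "context": passage}
            )
    return records


def to_jsonl(records: List[Dict]) -> str:
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


if __name__ == "__main__":
    # Hard-coded sample in the assumed transcript shape.
    sample = [
        {"text": "Welcome to the video.", "start": 0.0, "duration": 2.0},
        {"text": "Today we cover fine-tuning.", "start": 2.0, "duration": 3.0},
    ]
    print(to_jsonl(build_dataset(chunk_transcript(sample), placeholder_generate_qa)))
```

The part I cannot automate yet is whatever replaces placeholder_generate_qa; the chunking and JSONL steps are the easy bits.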
How can I automatically generate a Q&A dataset from YouTube video transcripts to use for fine-tuning my model? Are there any tools, libraries, or techniques that can help with this?