I am trying to use Google's YouTube API to scrape comments from videos. Since videos can have many comments, I have decided to break this job into many smaller jobs.
Ideally, the workflow would be:
- scrape x comments from a YouTube video
- keep track of the timestamp of the latest comment processed and save it to a database (DynamoDB)
- when scraping is started again for the same video, only process comments newer than the latest timestamp
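
To make the workflow above concrete, here is a rough sketch of what I have in mind. It filters on the client side, which is exactly the part I would like to avoid if the API supports it. Everything in it is an assumption on my part: `fetch_page` is a hypothetical wrapper around the API call, the items are flattened so that each one carries a `publishedAt` field directly, and the results are assumed to arrive newest first.

```python
from datetime import datetime
from typing import Callable, Optional


def parse_ts(value: str) -> datetime:
    """Parse an RFC 3339 timestamp such as '2023-01-05T00:00:00Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def collect_new(fetch_page: Callable[[Optional[str]], dict],
                last_seen: Optional[datetime]):
    """Return (new_comments, new_checkpoint).

    fetch_page(page_token) is a hypothetical wrapper around the API call.
    It is assumed to return {"items": [...], "nextPageToken": ...} with
    items ordered newest first and each item carrying "publishedAt".
    """
    comments = []
    checkpoint = last_seen
    token = None
    while True:
        page = fetch_page(token)
        for item in page.get("items", []):
            published = parse_ts(item["publishedAt"])
            if last_seen is not None and published <= last_seen:
                # Everything from here on was handled in an earlier run.
                return comments, checkpoint
            comments.append(item)
            if checkpoint is None or published > checkpoint:
                checkpoint = published
        token = page.get("nextPageToken")
        if not token:
            # No more pages: the whole comment list has been read.
            return comments, checkpoint
```

The returned checkpoint is the value I would write to DynamoDB after each run. The sketch does not handle replies or edited comments, and it still has to page through results until it reaches the stored timestamp.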
However, it is not clear whether the API can return only comments newer than a given timestamp, so that comments already processed are not scraped again. Is there a way to do this?
For reference, I'm currently trying to implement a solution similar to:
But I am not sure how that solution goes about avoiding comments that have already been processed.