I am trying to retrieve all the comments for a specific channel. Before pulling the comments, I checked the statistics of all the videos of the channel to see the number of comments per video I should expect. However, when I pulled the comments using commentThreads, the total number of comments I got did not equal the total number of comments I was expecting. To illustrate, I have a sample of 3 videos. I wrote a function to get the following:
comment_counter = number of comments as per video statistics
top_comments_count = actual number of top-level comments retrieved/counted
reply_count_info = number of replies as per top-level comment snippet
reply_counted = actual number of replies retrieved/counted
[Image: dataframe of comment counts per video]
It looks like I am not getting all the replies to top-level comments. Any idea why this is happening?
Here is the code I used to get these numbers:
def count_video_comments(youtube, video_id):
    comment_counter = 0     # number of comments as per video statistics
    top_comments_count = 0  # actual number of top-level comments retrieved/counted
    reply_count_info = 0    # number of replies as per top-level comment snippet
    reply_counted = 0       # actual number of replies retrieved/counted

    # First page of comment threads for the video
    request = youtube.commentThreads().list(
        part="snippet,replies",
        videoId=video_id,
        order='time',
        maxResults=100)
    response = request.execute()

    # Video statistics, used for the expected comment count
    request2 = youtube.videos().list(
        part="statistics",
        id=video_id)
    response2 = request2.execute()

    for video in response2['items']:
        comment_counter += int(video['statistics']['commentCount'])

    top_comments_count += len(response['items'])

    for comment in response['items']:
        reply_count_info += comment['snippet']['totalReplyCount']

    for comment in response['items']:
        if comment['snippet']['totalReplyCount'] != 0:
            reply_counted += len(comment['replies']['comments'])

    # Remaining pages of comment threads
    next_page_token = response.get('nextPageToken')

    while next_page_token is not None:
        request = youtube.commentThreads().list(
            part="snippet,replies",
            videoId=video_id,
            maxResults=100,
            order='time',
            pageToken=next_page_token)
        response = request.execute()

        top_comments_count += len(response['items'])

        for comment in response['items']:
            reply_count_info += comment['snippet']['totalReplyCount']

        for comment in response['items']:
            if comment['snippet']['totalReplyCount'] != 0:
                reply_counted += len(comment['replies']['comments'])

        next_page_token = response.get('nextPageToken')

    return (comment_counter, top_comments_count, reply_count_info, reply_counted)
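For comparison, here is a minimal sketch of counting replies a different way. It assumes the `replies` part returned by `commentThreads` embeds only a subset of each thread's replies, so it pages through `comments().list` with `parentId` instead. `count_all_replies` is a hypothetical helper name of my own, and `youtube` is assumed to be the same service object passed to the function above.

```python
def count_all_replies(youtube, parent_id):
    """Count the replies to one top-level comment by paging through comments.list.

    Hypothetical helper: `youtube` is assumed to be a YouTube Data API v3
    service object, and `parent_id` the ID of a top-level comment.
    """
    reply_total = 0
    page_token = None

    while True:
        # Build the request parameters; only send pageToken once we have one.
        params = {
            "part": "id",
            "parentId": parent_id,
            "maxResults": 100,
        }
        if page_token is not None:
            params["pageToken"] = page_token

        response = youtube.comments().list(**params).execute()

        # Each item on the page is one reply to the parent comment.
        reply_total += len(response.get("items", []))

        page_token = response.get("nextPageToken")
        if page_token is None:
            break

    return reply_total
```

Inside the loops above, this could be called as `count_all_replies(youtube, comment['snippet']['topLevelComment']['id'])` for each thread with a nonzero `totalReplyCount`, at the cost of at least one extra request per such thread.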