I'm hoping someone can help me solve this problem or, failing that, explain why it doesn't work. For practice, I'm trying to get all the comments and replies from a YouTube video, and I hope to expand the code later to do sentiment and network analysis for entire channels. The problem is that my code doesn't appear to get all the replies, and I'm unsure whether it even gets all the comments. A similar issue was discussed in "How to get comments from videos using YouTube API v3 and Python?", but no solution was given.

```python
import os

import googleapiclient.discovery
import pandas as pd


def main():
    # Disable OAuthlib's HTTPS verification when running locally.
    # *DO NOT* leave this option enabled in production.
    os.environ["OAUTHLIB_INSECURE_TRANSPORT"] = "1"

    api_service_name = "youtube"
    api_version = "v3"
    DEVELOPER_KEY = "yourapikey"  # <--- insert API key here

    youtube = googleapiclient.discovery.build(
        api_service_name, api_version, developerKey=DEVELOPER_KEY)

    request = youtube.commentThreads().list(
        part="snippet,replies",
        order="time",
        maxResults=100,
        textFormat="plainText",
        videoId="aCjyqziVSMA"
    )
    response = request.execute()
    full = pd.json_normalize(response, record_path=['items'])

    # Keep requesting pages until the API stops returning a nextPageToken.
    while 'nextPageToken' in response:
        response = youtube.commentThreads().list(
            part="snippet,replies",
            maxResults=100,
            textFormat='plainText',
            order='time',
            videoId='aCjyqziVSMA',
            pageToken=response['nextPageToken']
        ).execute()
        df2 = pd.json_normalize(response, record_path=['items'])
        full = pd.concat([full, df2], ignore_index=True)

    return full
```

After running the function above, I can break down the replies to comments using this block of code. For simplicity, I only took the first comment that has replies.
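
```python
test = main()

# Keep only the comment threads that have at least one reply.
df2 = test[test['snippet.totalReplyCount'] > 0].reset_index(drop=True)

# Expand the replies embedded in the first such thread.
pd.json_normalize(df2['replies.comments'][0])
```

For the first comment with replies, this returned only 4 of the 166 replies.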
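To see how many threads are affected, something like the following might help. It is only a sketch: it assumes the column names that `pd.json_normalize` produces for a `commentThreads` response (`snippet.totalReplyCount` and `replies.comments`), and both the helper name `reply_coverage` and the sample data are made up for illustration.

```python
import pandas as pd


def reply_coverage(threads: pd.DataFrame) -> pd.DataFrame:
    """Compare embedded replies with the reported reply count per thread.

    Assumes `threads` has the columns 'snippet.totalReplyCount' and
    'replies.comments' (a list of reply dicts, or NaN when there are none).
    """
    out = pd.DataFrame(index=threads.index)
    out["expected"] = threads["snippet.totalReplyCount"].astype(int)
    # Rows without replies hold NaN rather than a list, so count those as 0.
    out["embedded"] = threads["replies.comments"].apply(
        lambda c: len(c) if isinstance(c, list) else 0
    )
    out["missing"] = out["expected"] - out["embedded"]
    return out


# Hypothetical data shaped like the normalized API response.
sample = pd.DataFrame({
    "snippet.totalReplyCount": [166, 0, 2],
    "replies.comments": [
        [{"id": "a"}] * 4,
        float("nan"),
        [{"id": "b"}, {"id": "c"}],
    ],
})
coverage = reply_coverage(sample)
print(coverage[coverage["missing"] > 0])
```

Any row with a positive `missing` value is a thread whose embedded replies are incomplete.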
What am I doing wrong? Is this a limitation for the API call itself?
Edit 1
According to the documentation (https://developers.google.com/youtube/v3/docs/commentThreads), there seems to be another way to do this:
To retrieve all of the replies for the top-level comment, you need to call the comments.list method and use the parentId request parameter to identify the comment for which you want to retrieve replies.
However, I'm not very good with API calls yet and I do not know how to make the specific call. If someone gives me the method to do that and it works, I'll accept that as the answer as well :)
Edit 2 (2022-05-23)
Here is the solution I came up with. This function grabs all the replies it can for each comment using the Google API and puts them into a data frame. You can then join this data frame onto the original comments data frame to get every comment and every reply (assuming you don't run out of API quota).

```python
import os

import googleapiclient.discovery
import pandas as pd


# function
def repliesto(parentId):
    # Disable OAuthlib's HTTPS verification when running locally.
    # *DO NOT* leave this option enabled in production.
    os.environ["OAUTHLIB_INSECURE_TRANSPORT"] = "1"

    api_service_name = "youtube"
    api_version = "v3"
    DEVELOPER_KEY = DevKey  # your dev key (DevKey must be defined beforehand)

    youtube = googleapiclient.discovery.build(
        api_service_name, api_version, developerKey=DEVELOPER_KEY)

    request = youtube.comments().list(
        part="snippet",
        maxResults=100,
        parentId=parentId,
        textFormat="plainText"
    )
    response = request.execute()
    replies = pd.json_normalize(response, record_path=['items'])

    # Follow nextPageToken until the last page of replies has been fetched.
    while 'nextPageToken' in response:
        response = youtube.comments().list(
            part="snippet",
            maxResults=100,
            parentId=parentId,
            textFormat="plainText",
            pageToken=response['nextPageToken']
        ).execute()
        df2 = pd.json_normalize(response, record_path=['items'])
        replies = pd.concat([replies, df2], sort=False)

    return replies


# Get the ID of every top-level comment that has replies.
# `full` is what I called the data frame that contains the comments.
replyto = full.loc[full['snippet.totalReplyCount'] > 0,
                   'snippet.topLevelComment.id'].tolist()

# Create an empty data frame to hold all the replies, then use a for loop
# to pass each ID in the replyto list to the function defined above.
replies = pd.DataFrame()
for reply in replyto:
    df = repliesto(reply)
    replies = pd.concat([replies, df], ignore_index=True)
```
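For the join mentioned above, here is a minimal sketch. Each reply returned by `comments.list` should carry the ID of its top-level comment in `snippet.parentId`, so the two data frames can be merged on that. The function name `attach_replies`, the `reply.` column prefix, and the sample rows are assumptions for illustration, not part of the API.

```python
import pandas as pd


def attach_replies(full: pd.DataFrame, replies: pd.DataFrame) -> pd.DataFrame:
    """Left-join replies onto their top-level comments.

    Assumes `full` has 'snippet.topLevelComment.id' and `replies` has
    'snippet.parentId'. Reply columns get a 'reply.' prefix so they do not
    clash with the comment columns of the same name.
    """
    prefixed = replies.add_prefix("reply.")
    return full.merge(
        prefixed,
        how="left",
        left_on="snippet.topLevelComment.id",
        right_on="reply.snippet.parentId",
    )


# Hypothetical rows standing in for the two normalized API responses.
sample_full = pd.DataFrame({
    "snippet.topLevelComment.id": ["c1", "c2"],
    "snippet.totalReplyCount": [2, 0],
})
sample_replies = pd.DataFrame({
    "id": ["c1.r1", "c1.r2"],
    "snippet.parentId": ["c1", "c1"],
    "snippet.textDisplay": ["first reply", "second reply"],
})
combined = attach_replies(sample_full, sample_replies)
print(combined[["snippet.topLevelComment.id", "reply.id"]])
```

A left join keeps comments that have no replies (their `reply.*` columns are empty), so the result has one row per reply plus one row per comment without replies.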
