I use the Python code below to get data from the YouTube Data API v3.
First there is a function that collects the view, like, dislike and comment counts for a video; then there is a second, larger function that collects details such as the video id, title and upload date; finally everything is put into a pandas DataFrame.
It works fine with some YouTube channels, but with other channels it fails to get the number of comments. I hit this error in my Python notebook:
KeyError                                  Traceback (most recent call last)
<ipython-input-46-de3cf6032d33> in <module>()
      5 df2 = pd.DataFrame(columns=['video_id','video_title','upload_date','view_count','like_count','dislike_count','comment_count'])
      6 
----> 7 df2 = get_videos(df2)

1 frames
<ipython-input-44-90ae6e5b0155> in get_video_details(video_id)
      8     like_count = response_video_stats['items'][0]['statistics']['likeCount']
      9     dislike_count = response_video_stats['items'][0]['statistics']['dislikeCount']
---> 10     comment_count = response_video_stats['items'][0]['statistics']['commentCount']
     11 
     12     #return view_count

KeyError: 'commentCount'
I imagine an if/else statement could do the trick, but that probably wouldn't be the best option anyway.
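For what it's worth, this is roughly what I had in mind inside get_video_details, guarding the lookup so a missing key falls back to a default (the 0 fallback is just a placeholder value I made up, not something the API returns):

statistics = response_video_stats['items'][0]['statistics']

# 'commentCount' can be missing from the statistics object,
# for example when comments are disabled on the video
if 'commentCount' in statistics:
    comment_count = statistics['commentCount']
else:
    comment_count = 0  # placeholder default, my assumption

# or the same idea in one line with dict.get()
comment_count = statistics.get('commentCount', 0)

Is there a cleaner or more robust way to handle this than sprinkling checks like that over every field?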
Here's the code in full:
# Import libraries
import requests
import pandas as pd
import time

# Keys
API_KEY = 'xxx'
CHANNEL_ID = 'xxx'

def get_video_details(video_id):
    # collecting view, like, dislike, comment counts
    url_video_stats = 'https://www.googleapis.com/youtube/v3/videos?id='+video_id+'&part=statistics&key='+API_KEY
    response_video_stats = requests.get(url_video_stats).json()

    view_count = response_video_stats['items'][0]['statistics']['viewCount']
    like_count = response_video_stats['items'][0]['statistics']['likeCount']
    dislike_count = response_video_stats['items'][0]['statistics']['dislikeCount']
    comment_count = response_video_stats['items'][0]['statistics']['commentCount']

    return view_count, like_count, dislike_count, comment_count

def get_videos(df):
    pageToken = ''
    while 1:
        url = 'https://www.googleapis.com/youtube/v3/search?key='+API_KEY+'&channelId='+CHANNEL_ID+'&part=snippet,id&order=date&maxResults=10000&'+pageToken
        response = requests.get(url).json()
        time.sleep(1)  # give it a second before starting the for loop

        for video in response['items']:
            if video['id']['kind'] == "youtube#video":
                video_id = video['id']['videoId']
                video_title = video['snippet']['title']
                video_title = str(video_title).replace('&','')
                upload_date = video['snippet']['publishedAt']
                upload_date = str(upload_date).split("T")[0]

                #view_count = get_video_details(video_id)
                view_count, like_count, dislike_count, comment_count = get_video_details(video_id)

                df = df.append({'video_id':video_id,
                                'video_title':video_title,
                                'upload_date':upload_date,
                                'view_count':view_count,
                                'like_count':like_count,
                                'dislike_count':dislike_count,
                                'comment_count':comment_count}, ignore_index=True)

        try:
            if response['nextPageToken'] != None:  # if None, it means it reached the last page, so break out
                pageToken = 'pageToken=' + response['nextPageToken']
        except:
            break

    return df

# build our dataframe
df2 = pd.DataFrame(columns=['video_id','video_title','upload_date','view_count','like_count','dislike_count','comment_count'])

df2 = get_videos(df2)