I'm trying to retrieve video details from YouTube Shorts using YouTube API v3. My goal is to create a dataframe where each row corresponds to a Shorts video uploaded by a specific channel, with columns identifying the video details.
Problem: YouTube seems to cap the number of extracted videos at 3,000 per channel, even though my quota limit hasn't been reached. This makes me suspect an internal cap on how many videos can be extracted from a single channel. When I try to continue fetching from the last stored page token, it only returns the same 50 results from that token, rather than retrieving older videos.
Context: Since the YouTube Data API v3 doesn't handle Shorts specifically, I learned from a StackOverflow thread that Shorts can be treated as a playlist. Using the channel ID, I can derive a "playlist ID" that identifies all Shorts videos uploaded by that channel.
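For concreteness, the derivation is just a prefix swap: channel IDs start with `UC`, and replacing that prefix with `UUSH` gives the Shorts playlist ID (the `UU` uploads-playlist convention with an `SH` suffix). A minimal sketch, using an arbitrary channel ID as an example:

```python
def get_shorts_playlist_id(channel_id):
    # Channel IDs start with "UC"; swapping that prefix for "UUSH"
    # yields the ID of the channel's Shorts playlist
    if not channel_id.startswith("UC"):
        raise ValueError("Invalid channel ID format. It should start with 'UC'.")
    return "UUSH" + channel_id[2:]

print(get_shorts_playlist_id("UC_x5XG1OV2P6uZZ5FSM9Ttw"))
# UUSH_x5XG1OV2P6uZZ5FSM9Ttw
```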
Steps Taken:
- I run the initial extraction using the get_playlist_videos_and_store_last_token function.
- I store the last valid page token and plan to continue the extraction the next day using the continue_from_last_token function.
- However, after running the second function, it doesn't retrieve any older videos.
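To make the intended store-and-resume behaviour concrete, here is a minimal sketch of the pagination loop against a fake in-memory paginated source (`PAGES`, `fetch_all`, and the token file name are stand-ins I made up for illustration, not the real API):

```python
# Fake paginated source: each "page" returns items plus the next page
# token, mimicking playlistItems().list(pageToken=...)
PAGES = {
    "": (["vid1", "vid2"], "tok1"),
    "tok1": (["vid3", "vid4"], "tok2"),
    "tok2": (["vid5"], None),
}

def fetch_all(start_token="", token_file="last_page_token.txt"):
    token = start_token
    collected = []
    while True:
        items, next_token = PAGES[token]
        collected.extend(items)
        if next_token is None:
            break
        # Persist the token for the NEXT page, so a resumed run starts
        # on the first page that has not been fetched yet
        with open(token_file, "w") as f:
            f.write(next_token)
        token = next_token
    return collected
```

Note that if a completed run leaves the token of the final page in the file, resuming from that stored token fetches that same final page again, which matches the symptom I'm seeing.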
Questions:
- Is there a limit to how many videos can be extracted from a single playlist or channel in YouTube API v3?
- Could there be an issue with how I'm handling pagination and tokens in the second function?
- Any suggestions for how to bypass this cap or retrieve the remaining videos?
Below is the Python code I'm using for this task; I run it in Google Colab.
```python
import time

import pandas as pd
from googleapiclient.discovery import build

# YouTube API setup
youtube = build('youtube', 'v3', developerKey=API_KEY)

# Fetch detailed information for a list of video IDs, batching 50 IDs
# per request (the maximum videos().list accepts)
def get_video_details(video_ids):
    videos = []
    for i in range(0, len(video_ids), 50):
        request = youtube.videos().list(
            part='snippet,statistics,contentDetails',
            id=','.join(video_ids[i:i + 50])
        )
        response = request.execute()
        for item in response['items']:
            video_info = {
                'video_id': item['id'],
                'title': item['snippet']['title'],
                'description': item['snippet']['description'],
                'published_at': item['snippet']['publishedAt'],
                'tags': item['snippet'].get('tags', 'No tags'),
                'viewCount': item['statistics'].get('viewCount', 'No data'),
                'duration': item['contentDetails'].get('duration', 'No data'),
            }
            videos.append(video_info)
        time.sleep(0.5)
    return videos

# Convert a channel ID to its Shorts playlist ID
def get_shorts_playlist_id(channel_id):
    if channel_id.startswith("UC"):
        return channel_id.replace('UC', 'UUSH', 1)
    raise ValueError("Invalid channel ID format. It should start with 'UC'.")

# Fetch videos from a playlist, persisting the latest page token so a
# later run can resume from it
def get_playlist_videos_and_store_last_token(playlist_id, max_results=50):
    video_ids = []
    next_page_token = ""
    while True:
        response = youtube.playlistItems().list(
            part='snippet',
            playlistId=playlist_id,
            maxResults=max_results,
            pageToken=next_page_token
        ).execute()
        for item in response.get('items', []):
            video_ids.append(item['snippet']['resourceId']['videoId'])
        next_page_token = response.get('nextPageToken')
        if not next_page_token:
            break
        # Store the token for the next page, so a resumed run starts on
        # the first page not yet fetched
        with open("last_page_token.txt", "w") as file:
            file.write(next_page_token)
        time.sleep(0.5)
    videos = get_video_details(video_ids) if video_ids else []
    return pd.DataFrame(videos)

# Continue fetching videos from the stored page token
def continue_from_last_token(playlist_id, max_results=50):
    try:
        with open("last_page_token.txt", "r") as file:
            next_page_token = file.read().strip()
    except FileNotFoundError:
        next_page_token = ""
    video_ids = []
    while True:
        response = youtube.playlistItems().list(
            part='snippet',
            playlistId=playlist_id,
            maxResults=max_results,
            pageToken=next_page_token
        ).execute()
        for item in response.get('items', []):
            video_ids.append(item['snippet']['resourceId']['videoId'])
        next_page_token = response.get('nextPageToken')
        if not next_page_token:
            break
        time.sleep(0.5)
    videos = get_video_details(video_ids) if video_ids else []
    return pd.DataFrame(videos)
```