Channel: Active questions tagged youtube-api - Stack Overflow

Issue with Pagination in YouTube Shorts Using YouTube API v3

I'm trying to retrieve video details from YouTube Shorts using YouTube API v3. My goal is to create a dataframe where each row corresponds to a Shorts video uploaded by a specific channel, with columns identifying the video details.

Problem: YouTube seems to cap the number of extracted videos at 3000 for a channel, even though my quota limit hasn't been reached. This makes me suspect an internal cap on the number of videos extractable from a single channel. When I try to continue fetching from the last stored page token, the API returns the same 50 results as the last page instead of older videos.
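One way to confirm the repeated-page symptom is to check whether a page contains only video IDs that were already seen on earlier pages. The following is a minimal sketch; `find_repeated_pages` is a hypothetical helper, and the IDs are made up for illustration.

```python
from typing import Iterable, List, Set


def find_repeated_pages(pages: Iterable[List[str]]) -> List[int]:
    """Return the indexes of pages whose video IDs were all seen on earlier pages."""
    seen: Set[str] = set()
    repeated: List[int] = []
    for index, page_ids in enumerate(pages):
        # A non-empty page made up entirely of known IDs adds nothing new
        if page_ids and all(video_id in seen for video_id in page_ids):
            repeated.append(index)
        seen.update(page_ids)
    return repeated


# Made-up IDs for illustration: the third page repeats the second
example_pages = [["a", "b"], ["c", "d"], ["c", "d"]]
print(find_repeated_pages(example_pages))
```

Feeding it the list of IDs from each `playlistItems` response, in request order, shows at which page the listing stops advancing.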

Context: Since the YouTube API v3 doesn't specifically handle Shorts, I learned from a Stack Overflow thread that Shorts can be treated as a playlist. Using the channel ID, I can recreate a "playlist ID" to identify all Shorts videos uploaded by that channel.

Steps Taken:

  • I run the initial extraction using the get_playlist_videos_and_store_last_token function.
  • I store the last valid token and plan to continue extraction the next day using the continue_from_last_token function.
  • However, after running the second function, it doesn't seem to retrieve any older videos.
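The store-and-resume steps above can be sketched as a loop that writes the next page token to disk before requesting that page. This is a minimal sketch under assumptions, not the code from the question: `paginate_with_checkpoint` and its `fetch_page` parameter are hypothetical names, and `fetch_page` stands in for a call to `youtube.playlistItems().list(...).execute()` so the loop can be exercised without network access.

```python
from pathlib import Path
from typing import Callable, List, Optional

TOKEN_FILE = Path("last_page_token.txt")  # same file name the question's code reads


def paginate_with_checkpoint(
    fetch_page: Callable[[Optional[str]], dict],
    token_file: Path = TOKEN_FILE,
    resume: bool = False,
) -> List[str]:
    """Collect video IDs page by page, checkpointing the token of the next unfetched page."""
    token: Optional[str] = None
    if resume and token_file.exists():
        token = token_file.read_text().strip() or None

    video_ids: List[str] = []
    while True:
        response = fetch_page(token)
        for item in response.get("items", []):
            video_ids.append(item["snippet"]["resourceId"]["videoId"])
        token = response.get("nextPageToken")
        if not token:
            # No further page is offered, so there is nothing left to resume from
            token_file.unlink(missing_ok=True)
            break
        # Save the token of the page NOT fetched yet, so a resume starts there
        token_file.write_text(token)
    return video_ids


# Possible wiring to the real client (assumes `youtube` and `playlist_id` exist):
# fetch = lambda tok: youtube.playlistItems().list(
#     part="snippet", playlistId=playlist_id, maxResults=50, pageToken=tok
# ).execute()
# ids = paginate_with_checkpoint(fetch, resume=True)
```

Because the token is saved before the page it points to is fetched, a resumed run starts at the first page not yet collected rather than repeating the last one. When the final response has no `nextPageToken`, the checkpoint file is removed, since token-based paging offers no further page to resume from.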

Questions:

  • Is there a limit to how many videos can be extracted from a single playlist or channel in YouTube API v3?
  • Could there be an issue with how I'm handling pagination and tokens in the second function?
  • Any suggestions for how to bypass this cap or retrieve the remaining videos?
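To help narrow down the first question, the number of unique videos collected can be compared with the total that the API itself reports. The sketch below assumes each `playlistItems.list` response carries a `pageInfo.totalResults` field, as the API reference describes; `summarize_listing` is a hypothetical helper, and whether the reported total is exact for very large playlists is not something this sketch can establish.

```python
from typing import Iterable, Optional


def summarize_listing(responses: Iterable[dict]) -> dict:
    """Compare the total reported by the API with the unique video IDs actually returned."""
    reported_total: Optional[int] = None
    unique_ids = set()
    for response in responses:
        if reported_total is None:
            # Take the figure from the first response that provides one
            reported_total = response.get("pageInfo", {}).get("totalResults")
        for item in response.get("items", []):
            unique_ids.add(item["snippet"]["resourceId"]["videoId"])
    return {"reported_total": reported_total, "unique_collected": len(unique_ids)}
```

A large gap between `reported_total` and `unique_collected` would suggest that the listing stopped early, while matching numbers would suggest the playlist itself holds fewer items than the channel has Shorts.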

Below is the code I'm using for this task in Python. I use Google Colab to run it.

import time

import pandas as pd
from googleapiclient.discovery import build

# YouTube API setup
API_KEY = "YOUR_API_KEY"  # placeholder: replace with your own key
youtube = build('youtube', 'v3', developerKey=API_KEY)

# Function to fetch detailed information for a list of video IDs
def get_video_details(video_ids):
    videos = []
    # Request details in batches of 50 IDs
    for i in range(0, len(video_ids), 50):
        request = youtube.videos().list(
            part='snippet,statistics,contentDetails',
            id=','.join(video_ids[i:i+50])
        )
        response = request.execute()
        for item in response['items']:
            video_info = {
                'video_id': item['id'],
                'title': item['snippet']['title'],
                'description': item['snippet']['description'],
                'published_at': item['snippet']['publishedAt'],
                'tags': item['snippet'].get('tags', 'No tags'),
                'viewCount': item['statistics'].get('viewCount', 'No data'),
                'duration': item['contentDetails'].get('duration', 'No data')
            }
            videos.append(video_info)
        time.sleep(0.5)
    return videos

# Function to convert channel ID to Shorts playlist ID
def get_shorts_playlist_id(channel_id):
    if channel_id.startswith("UC"):
        return channel_id.replace('UC', 'UUSH', 1)
    else:
        raise ValueError("Invalid channel ID format. It should start with 'UC'.")

# Function to fetch videos from a playlist
def get_playlist_videos_and_store_last_token(playlist_id, max_results=50):
    videos = []
    video_ids = []
    next_page_token = None  # None requests the first page
    last_valid_token = None
    while True:
        response = youtube.playlistItems().list(
            part='snippet',
            playlistId=playlist_id,
            maxResults=max_results,
            pageToken=next_page_token
        ).execute()
        for item in response.get('items', []):
            video_ids.append(item['snippet']['resourceId']['videoId'])
        next_page_token = response.get('nextPageToken')
        if not next_page_token:
            break
        last_valid_token = next_page_token
        time.sleep(0.5)
    # Assumed step (not in the original listing): persist the last token so that
    # continue_from_last_token can read it. This token points at the final page
    # already fetched above, so resuming from it returns that same page again.
    if last_valid_token:
        with open("last_page_token.txt", "w") as file:
            file.write(last_valid_token)
    if video_ids:
        videos = get_video_details(video_ids)
    return pd.DataFrame(videos)

# Function to continue fetching videos using the last token
def continue_from_last_token(playlist_id, max_results=50):
    try:
        with open("last_page_token.txt", "r") as file:
            start_token = file.read().strip() or None
    except FileNotFoundError:
        start_token = None  # no stored token: start from the first page
    videos = []
    video_ids = []
    next_page_token = start_token
    while True:
        response = youtube.playlistItems().list(
            part='snippet',
            playlistId=playlist_id,
            maxResults=max_results,
            pageToken=next_page_token
        ).execute()
        for item in response.get('items', []):
            video_ids.append(item['snippet']['resourceId']['videoId'])
        next_page_token = response.get('nextPageToken')
        if not next_page_token:
            break
        time.sleep(0.5)
    if video_ids:
        videos = get_video_details(video_ids)
    return pd.DataFrame(videos)
