
Why is my YouTube transcripts API only working in non-prod, but not in prod?


In my non-production environment, I can fetch YouTube transcripts with the youtube-transcript-api library.

In my production environment, the same call fails, and extensive debugging and logging have not revealed why. Here are the logs:

2024-08-20T07:41:29.989747260Z [ANONYMIZED_IP] - - [20/Aug/2024:07:41:29 +0000] "GET /generate/youtubeSummary/ HTTP/1.1" 200 30723 "https://[ANONYMIZED_DOMAIN]/dashboard/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:129.0) Gecko/20100101 Firefox/129.0"
2024-08-20T07:41:45.777775814Z Custom form options: {}
2024-08-20T07:41:45.778110014Z Form data debug: {'grade_level': '', 'video_url': 'https://www.youtube.com/watch?v=[ANONYMIZED_VIDEO_ID]', 'summary_length': None}
2024-08-20T07:41:45.778131714Z INFO 2024-08-20 07:41:45,777 views Generating summary for video URL: https://www.youtube.com/watch?v=[ANONYMIZED_VIDEO_ID]
2024-08-20T07:41:45.781101714Z INFO 2024-08-20 07:41:45,780 views YouTube IP address: [ANONYMIZED_IP]
2024-08-20T07:41:45.979793308Z INFO 2024-08-20 07:41:45,979 views YouTube connection status: 200
2024-08-20T07:41:45.980433708Z INFO 2024-08-20 07:41:45,980 views Attempting to connect to: www.youtube.com
2024-08-20T07:41:45.980820208Z INFO 2024-08-20 07:41:45,980 views Extracted video ID: [ANONYMIZED_VIDEO_ID]
2024-08-20T07:41:46.463787194Z INFO 2024-08-20 07:41:46,463 _universal Request URL: 'https://[ANONYMIZED_DOMAIN]/v2.1/track'
2024-08-20T07:41:46.463807494Z Request method: 'POST'
2024-08-20T07:41:46.463815194Z Request headers:
2024-08-20T07:41:46.463822194Z     'Content-Type': 'application/json'
2024-08-20T07:41:46.463830194Z     'Content-Length': '2373'
2024-08-20T07:41:46.463840094Z     'Accept': 'application/json'
2024-08-20T07:41:46.463847194Z     'x-ms-client-request-id': '[ANONYMIZED_REQUEST_ID]'
2024-08-20T07:41:46.463853694Z     'User-Agent': 'azsdk-python-azuremonitorclient/unknown Python/3.9.19 (Linux-5.15.158.2-1.cm2-x86_64-with-glibc2.28)'
2024-08-20T07:41:46.463863894Z A body is sent with the request
2024-08-20T07:41:46.485539393Z INFO 2024-08-20 07:41:46,485 _universal Response status: 200
2024-08-20T07:41:46.485558093Z Response headers:
2024-08-20T07:41:46.485565793Z     'Transfer-Encoding': 'chunked'
2024-08-20T07:41:46.485600593Z     'Content-Type': 'application/json; charset=utf-8'
2024-08-20T07:41:46.485609693Z     'Server': 'Microsoft-HTTPAPI/2.0'
2024-08-20T07:41:46.485616293Z     'Strict-Transport-Security': 'REDACTED'
2024-08-20T07:41:46.485622793Z     'X-Content-Type-Options': 'REDACTED'
2024-08-20T07:41:46.485629293Z     'Date': 'Tue, 20 Aug 2024 07:41:45 GMT'
2024-08-20T07:41:46.486316593Z INFO 2024-08-20 07:41:46,485 _base Transmission succeeded: Item received: 2. Items accepted: 2
2024-08-20T07:41:46.515040992Z ERROR 2024-08-20 07:41:46,513 views Error generating YouTube summary:
2024-08-20T07:41:46.515060292Z Could not retrieve a transcript for the video https://www.youtube.com/watch?v=[ANONYMIZED_VIDEO_ID]! This is most likely caused by:
2024-08-20T07:41:46.515068192Z
2024-08-20T07:41:46.515203692Z Subtitles are disabled for this video
2024-08-20T07:41:46.515214292Z
2024-08-20T07:41:46.515348292Z If you are sure that the described cause is not responsible for this error and that a transcript should be retrievable, please create an issue at https://github.com/jdepoix/youtube-transcript-api/issues. Please add which version of youtube_transcript_api you are using and provide the information needed to replicate the error. Also make sure that there are no open issues which already describe your problem!
2024-08-20T07:41:46.515375792Z Traceback (most recent call last):
2024-08-20T07:41:46.515384092Z   File "/tmp/[ANONYMIZED_PATH]/theDashboard/views.py", line 2002, in generate_youtube_summary
2024-08-20T07:41:46.515390492Z     transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
2024-08-20T07:41:46.515396292Z   File "/tmp/[ANONYMIZED_PATH]/antenv/lib/python3.9/site-packages/youtube_transcript_api/_api.py", line 71, in list_transcripts
2024-08-20T07:41:46.515401992Z     return TranscriptListFetcher(http_client).fetch(video_id)
2024-08-20T07:41:46.515407392Z   File "/tmp/[ANONYMIZED_PATH]/antenv/lib/python3.9/site-packages/youtube_transcript_api/_transcripts.py", line 48, in fetch
2024-08-20T07:41:46.515413192Z     self._extract_captions_json(self._fetch_video_html(video_id), video_id),
2024-08-20T07:41:46.515418692Z   File "/tmp/[ANONYMIZED_PATH]/antenv/lib/python3.9/site-packages/youtube_transcript_api/_transcripts.py", line 62, in _extract_captions_json
2024-08-20T07:41:46.515424292Z     raise TranscriptsDisabled(video_id)
2024-08-20T07:41:46.515429592Z youtube_transcript_api._errors.TranscriptsDisabled:
2024-08-20T07:41:46.515434992Z Could not retrieve a transcript for the video https://www.youtube.com/watch?v=[ANONYMIZED_VIDEO_ID]! This is most likely caused by:
2024-08-20T07:41:46.515440592Z
2024-08-20T07:41:46.515446092Z Subtitles are disabled for this video
2024-08-20T07:41:46.515451692Z
2024-08-20T07:41:46.515457192Z If you are sure that the described cause is not responsible for this error and that a transcript should be retrievable, please create an issue at https://github.com/jdepoix/youtube-transcript-api/issues. Please add which version of youtube_transcript_api you are using and provide the information needed to replicate the error. Also make sure that there are no open issues which already describe your problem!
2024-08-20T07:41:46.517604192Z [ANONYMIZED_IP] - - [20/Aug/2024:07:41:46 +0000] "POST /generate/youtubeSummary/ HTTP/1.1" 200 789 "https://[ANONYMIZED_DOMAIN]/generate/youtubeSummary/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:129.0) Gecko/20100101 Firefox/129.0"
2024-08-20T07:41:51.462730956Z INFO 2024-08-20 07:41:51,462 _universal Request URL: 'https://[ANONYMIZED_DOMAIN]/v2.1/track'
2024-08-20T07:41:51.462753956Z Request method: 'POST'
2024-08-20T07:41:51.462761756Z Request headers:
2024-08-20T07:41:51.462838355Z     'Content-Type': 'application/json'
2024-08-20T07:41:51.462846755Z     'Content-Length': '1124'
2024-08-20T07:41:51.462853355Z     'Accept': 'application/json'
2024-08-20T07:41:51.462870155Z     'x-ms-client-request-id': '[ANONYMIZED_REQUEST_ID]'
2024-08-20T07:41:51.462877055Z     'User-Agent': 'azsdk-python-azuremonitorclient/unknown Python/3.9.19 (Linux-5.15.158.2-1.cm2-x86_64-with-glibc2.28)'
2024-08-20T07:41:51.462883755Z A body is sent with the request
2024-08-20T07:41:51.467871320Z INFO 2024-08-20 07:41:51,467 _universal Request URL: 'https://[ANONYMIZED_DOMAIN]/v2.1/track'
2024-08-20T07:41:51.467899520Z Request method: 'POST'
2024-08-20T07:41:51.467909320Z Request headers:
2024-08-20T07:41:51.467952720Z     'Content-Type': 'application/json'
2024-08-20T07:41:51.467962219Z     'Content-Length': '2397'
2024-08-20T07:41:51.467968519Z     'Accept': 'application/json'
2024-08-20T07:41:51.467974719Z     'x-ms-client-request-id': '[ANONYMIZED_REQUEST_ID]'
2024-08-20T07:41:51.467981319Z     'User-Agent': 'azsdk-python-azuremonitorclient/unknown Python/3.9.19 (Linux-5.15.158.2-1.cm2-x86_64-with-glibc2.28)'
2024-08-20T07:41:51.467987919Z A body is sent with the request
2024-08-20T07:41:51.472131390Z INFO 2024-08-20 07:41:51,471 _universal Response status: 200
2024-08-20T07:41:51.472146690Z Response headers:
2024-08-20T07:41:51.472154290Z     'Transfer-Encoding': 'chunked'
2024-08-20T07:41:51.472160590Z     'Content-Type': 'application/json; charset=utf-8'
2024-08-20T07:41:51.472167190Z     'Server': 'Microsoft-HTTPAPI/2.0'
2024-08-20T07:41:51.472195890Z     'Strict-Transport-Security': 'REDACTED'
2024-08-20T07:41:51.472204590Z     'X-Content-Type-Options': 'REDACTED'
2024-08-20T07:41:51.472210890Z     'Date': 'Tue, 20 Aug 2024 07:41:50 GMT'
2024-08-20T07:41:51.472633987Z INFO 2024-08-20 07:41:51,472 _base Transmission succeeded: Item received: 2. Items accepted: 2
2024-08-20T07:41:51.479943736Z INFO 2024-08-20 07:41:51,479 _universal Response status: 200
2024-08-20T07:41:51.479965136Z Response headers:
2024-08-20T07:41:51.479973036Z     'Transfer-Encoding': 'chunked'
2024-08-20T07:41:51.479980236Z     'Content-Type': 'application/json; charset=utf-8'
2024-08-20T07:41:51.479987236Z     'Server': 'Microsoft-HTTPAPI/2.0'
2024-08-20T07:41:51.479995336Z     'Strict-Transport-Security': 'REDACTED'
2024-08-20T07:41:51.480004735Z     'X-Content-Type-Options': 'REDACTED'
2024-08-20T07:41:51.480012935Z     'Date': 'Tue, 20 Aug 2024 07:41:50 GMT'
2024-08-20T07:41:51.480649231Z INFO 2024-08-20 07:41:51,480 _base Transmission succeeded: Item received: 1. Items accepted: 1
2024-08-20T07:43:41  No new trace in the past 1 min(s).
2024-08-20T07:44:41  No new trace in the past 2 min(s).

I believe my code is fine, since the same function works in non-production:

def generate_youtube_summary(video_url, custom_form_options=None):
    logger.info(f"Generating summary for video URL: {video_url}")
    connectivity_results = test_youtube_connectivity()
    if not (connectivity_results["dns_resolution"] and connectivity_results["connection_status"]):
        logger.error("YouTube connectivity check failed. Details: %s", connectivity_results)
        return "Unable to connect to YouTube. Please check your internet connection and try again."
    logger.info(f"Attempting to connect to: {urlparse(video_url).netloc}")
    video_id = None
    if 'youtu.be/' in video_url:
        video_id = video_url.split('youtu.be/')[1]
    elif 'youtube.com/watch?v=' in video_url:
        video_id = video_url.split('v=')[1]
    elif 'youtube.com/embed/' in video_url:
        video_id = video_url.split('embed/')[1]
    logger.info(f"Extracted video ID: {video_id}")
    if video_id:
        try:
            transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
            logger.info(f"Retrieved transcript list for video ID: {video_id}")
            transcript = transcript_list.find_transcript(['en'])
            logger.info("Found English transcript")
            transcript_data = transcript.fetch()
            logger.info("Fetched transcript data")
            transcript_text = ''.join([entry['text'] for entry in transcript_data])
            logger.info(f"Extracted transcript text (first 100 chars): {transcript_text[:100]}...")
            summary_prompt = f"""<role>YouTube Video Summarizer</role> """
            logger.info("Sending summary prompt to process_text function")
            summary = process_text(summary_prompt)
            logger.info(f"Received summary from process_text (first 100 chars): {summary[:100]}...")
            return summary
        except Exception as e:
            logger.error(f"Error generating YouTube summary: {str(e)}", exc_info=True)
            if "TranscriptsDisabled" in str(e):
                return "Unable to generate summary. Subtitles are disabled for this video."
            elif "NoTranscriptFound" in str(e):
                return "No transcript found for this video. It may not have subtitles available."
            else:
                return f"Failed to generate video summary. Error: {str(e)}"
    else:
        logger.warning(f"Invalid YouTube video URL: {video_url}")
        return "Invalid YouTube video URL. Please provide a valid URL."
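One thing worth ruling out is the video ID itself. The `split`-based extraction in the function above keeps any trailing query parameters (for example `&t=10s`), so a different URL shape in production could yield a different ID. The following is a minimal sketch of a stricter extraction using only the standard library. The helper name `extract_video_id` and the list of accepted hosts are assumptions for illustration, not part of the original code.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hosts accepted by this sketch (an assumption; extend as needed).
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(video_url: str) -> Optional[str]:
    """Return the video ID for common YouTube URL shapes, or None."""
    parsed = urlparse(video_url)
    host = (parsed.hostname or "").lower()

    if host == "youtu.be":
        # Short links: the ID is the first path segment.
        return parsed.path.lstrip("/").split("/")[0] or None

    if host in YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            # Watch links: the ID is the "v" query parameter only.
            return parse_qs(parsed.query).get("v", [None])[0]
        if parsed.path.startswith("/embed/"):
            # Embed links: the ID is the segment right after /embed/.
            return parsed.path[len("/embed/"):].split("/")[0] or None

    return None
```

Logging the result of this helper next to the existing "Extracted video ID" line in both environments would show whether the two IDs actually match.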

Given the logs (the connectivity check against YouTube returns status 200) and the fact that my OpenAI API calls work from the same environment, this does not look like a networking issue.
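A 200 status only shows that www.youtube.com is reachable. It does not show that the watch page served to the production IP has the same content as the one served locally. A rough way to compare the two is to save the watch-page HTML in each environment and classify it. The sketch below is a hypothetical helper, and the marker strings are assumptions about what the `_extract_captions_json` step in the traceback looks for. They should be checked against the installed version of youtube_transcript_api.

```python
def classify_watch_page(html: str) -> str:
    """Roughly classify a YouTube watch page by the markers it contains.

    The marker strings are assumptions, not a documented contract.
    """
    if '"captions":' in html:
        # Caption metadata is embedded in the page.
        return "captions_present"
    if 'class="g-recaptcha"' in html:
        # The page looks like a CAPTCHA challenge, not a normal watch page.
        return "captcha_challenge"
    if '"playabilityStatus":' not in html:
        # No player data at all, e.g. a consent or error page.
        return "no_player_data"
    # Player data is present but carries no caption metadata.
    return "captions_missing"
```

If the same video is classified as `captions_present` locally but as something else in production, the difference lies in what YouTube returns to that environment and not in the application code.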

Besides a fix, I would like to understand why this happens.

What I have tried so far: debugging and logging, and checking the networking settings.

Note that the error says subtitles are disabled for this video. I can confirm they are not; this seems to be a blanket error message raised for any video.
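A side note on the error handling: the message text in the logs does not contain the exception's class name, so a check such as `"TranscriptsDisabled" in str(e)` is unlikely to match, and the function falls through to its generic branch. The sketch below illustrates the difference with stand-in exception classes. They are hypothetical placeholders defined here so the example is self-contained; the real class appears in the traceback as `youtube_transcript_api._errors.TranscriptsDisabled`.

```python
class TranscriptsDisabled(Exception):
    """Hypothetical stand-in for the library's exception class."""


class NoTranscriptFound(Exception):
    """Hypothetical stand-in for the library's exception class."""


def user_message(exc: Exception) -> str:
    """Map an exception to a user-facing message by type, not by message text."""
    if isinstance(exc, TranscriptsDisabled):
        return "Unable to generate summary. Subtitles are disabled for this video."
    if isinstance(exc, NoTranscriptFound):
        return "No transcript found for this video. It may not have subtitles available."
    return f"Failed to generate video summary. Error: {exc}"


err = TranscriptsDisabled("Subtitles are disabled for this video")
# The class name is not part of the message, so a substring check fails.
print("TranscriptsDisabled" in str(err))  # prints False
print(user_message(err))
```

Matching on the type keeps the branches working regardless of how the library words its messages.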

