In my non-production environment, I can fetch YouTube transcripts with the youtube_transcript_api package.
In my production environment the same call fails, and after much debugging and logging I still cannot get it to work. Here are the logs:
2024-08-20T07:41:29.989747260Z [ANONYMIZED_IP] - - [20/Aug/2024:07:41:29 +0000] "GET /generate/youtubeSummary/ HTTP/1.1" 200 30723 "https://[ANONYMIZED_DOMAIN]/dashboard/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:129.0) Gecko/20100101 Firefox/129.0"
2024-08-20T07:41:45.777775814Z Custom form options: {}
2024-08-20T07:41:45.778110014Z Form data debug: {'grade_level': '', 'video_url': 'https://www.youtube.com/watch?v=[ANONYMIZED_VIDEO_ID]', 'summary_length': None}
2024-08-20T07:41:45.778131714Z INFO 2024-08-20 07:41:45,777 views Generating summary for video URL: https://www.youtube.com/watch?v=[ANONYMIZED_VIDEO_ID]
2024-08-20T07:41:45.781101714Z INFO 2024-08-20 07:41:45,780 views YouTube IP address: [ANONYMIZED_IP]
2024-08-20T07:41:45.979793308Z INFO 2024-08-20 07:41:45,979 views YouTube connection status: 200
2024-08-20T07:41:45.980433708Z INFO 2024-08-20 07:41:45,980 views Attempting to connect to: www.youtube.com
2024-08-20T07:41:45.980820208Z INFO 2024-08-20 07:41:45,980 views Extracted video ID: [ANONYMIZED_VIDEO_ID]
2024-08-20T07:41:46.463787194Z INFO 2024-08-20 07:41:46,463 _universal Request URL: 'https://[ANONYMIZED_DOMAIN]/v2.1/track'
2024-08-20T07:41:46.463807494Z Request method: 'POST'
2024-08-20T07:41:46.463815194Z Request headers:
2024-08-20T07:41:46.463822194Z 'Content-Type': 'application/json'
2024-08-20T07:41:46.463830194Z 'Content-Length': '2373'
2024-08-20T07:41:46.463840094Z 'Accept': 'application/json'
2024-08-20T07:41:46.463847194Z 'x-ms-client-request-id': '[ANONYMIZED_REQUEST_ID]'
2024-08-20T07:41:46.463853694Z 'User-Agent': 'azsdk-python-azuremonitorclient/unknown Python/3.9.19 (Linux-5.15.158.2-1.cm2-x86_64-with-glibc2.28)'
2024-08-20T07:41:46.463863894Z A body is sent with the request
2024-08-20T07:41:46.485539393Z INFO 2024-08-20 07:41:46,485 _universal Response status: 200
2024-08-20T07:41:46.485558093Z Response headers:
2024-08-20T07:41:46.485565793Z 'Transfer-Encoding': 'chunked'
2024-08-20T07:41:46.485600593Z 'Content-Type': 'application/json; charset=utf-8'
2024-08-20T07:41:46.485609693Z 'Server': 'Microsoft-HTTPAPI/2.0'
2024-08-20T07:41:46.485616293Z 'Strict-Transport-Security': 'REDACTED'
2024-08-20T07:41:46.485622793Z 'X-Content-Type-Options': 'REDACTED'
2024-08-20T07:41:46.485629293Z 'Date': 'Tue, 20 Aug 2024 07:41:45 GMT'
2024-08-20T07:41:46.486316593Z INFO 2024-08-20 07:41:46,485 _base Transmission succeeded: Item received: 2. Items accepted: 2
2024-08-20T07:41:46.515040992Z ERROR 2024-08-20 07:41:46,513 views Error generating YouTube summary:
2024-08-20T07:41:46.515060292Z Could not retrieve a transcript for the video https://www.youtube.com/watch?v=[ANONYMIZED_VIDEO_ID]! This is most likely caused by:
2024-08-20T07:41:46.515068192Z
2024-08-20T07:41:46.515203692Z Subtitles are disabled for this video
2024-08-20T07:41:46.515214292Z
2024-08-20T07:41:46.515348292Z If you are sure that the described cause is not responsible for this error and that a transcript should be retrievable, please create an issue at https://github.com/jdepoix/youtube-transcript-api/issues. Please add which version of youtube_transcript_api you are using and provide the information needed to replicate the error. Also make sure that there are no open issues which already describe your problem!
2024-08-20T07:41:46.515375792Z Traceback (most recent call last):
2024-08-20T07:41:46.515384092Z File "/tmp/[ANONYMIZED_PATH]/theDashboard/views.py", line 2002, in generate_youtube_summary
2024-08-20T07:41:46.515390492Z transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
2024-08-20T07:41:46.515396292Z File "/tmp/[ANONYMIZED_PATH]/antenv/lib/python3.9/site-packages/youtube_transcript_api/_api.py", line 71, in list_transcripts
2024-08-20T07:41:46.515401992Z return TranscriptListFetcher(http_client).fetch(video_id)
2024-08-20T07:41:46.515407392Z File "/tmp/[ANONYMIZED_PATH]/antenv/lib/python3.9/site-packages/youtube_transcript_api/_transcripts.py", line 48, in fetch
2024-08-20T07:41:46.515413192Z self._extract_captions_json(self._fetch_video_html(video_id), video_id),
2024-08-20T07:41:46.515418692Z File "/tmp/[ANONYMIZED_PATH]/antenv/lib/python3.9/site-packages/youtube_transcript_api/_transcripts.py", line 62, in _extract_captions_json
2024-08-20T07:41:46.515424292Z raise TranscriptsDisabled(video_id)
2024-08-20T07:41:46.515429592Z youtube_transcript_api._errors.TranscriptsDisabled:
2024-08-20T07:41:46.515434992Z Could not retrieve a transcript for the video https://www.youtube.com/watch?v=[ANONYMIZED_VIDEO_ID]! This is most likely caused by:
2024-08-20T07:41:46.515440592Z
2024-08-20T07:41:46.515446092Z Subtitles are disabled for this video
2024-08-20T07:41:46.515451692Z
2024-08-20T07:41:46.515457192Z If you are sure that the described cause is not responsible for this error and that a transcript should be retrievable, please create an issue at https://github.com/jdepoix/youtube-transcript-api/issues. Please add which version of youtube_transcript_api you are using and provide the information needed to replicate the error. Also make sure that there are no open issues which already describe your problem!
2024-08-20T07:41:46.517604192Z [ANONYMIZED_IP] - - [20/Aug/2024:07:41:46 +0000] "POST /generate/youtubeSummary/ HTTP/1.1" 200 789 "https://[ANONYMIZED_DOMAIN]/generate/youtubeSummary/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:129.0) Gecko/20100101 Firefox/129.0"
2024-08-20T07:41:51.462730956Z INFO 2024-08-20 07:41:51,462 _universal Request URL: 'https://[ANONYMIZED_DOMAIN]/v2.1/track'
2024-08-20T07:41:51.462753956Z Request method: 'POST'
2024-08-20T07:41:51.462761756Z Request headers:
2024-08-20T07:41:51.462838355Z 'Content-Type': 'application/json'
2024-08-20T07:41:51.462846755Z 'Content-Length': '1124'
2024-08-20T07:41:51.462853355Z 'Accept': 'application/json'
2024-08-20T07:41:51.462870155Z 'x-ms-client-request-id': '[ANONYMIZED_REQUEST_ID]'
2024-08-20T07:41:51.462877055Z 'User-Agent': 'azsdk-python-azuremonitorclient/unknown Python/3.9.19 (Linux-5.15.158.2-1.cm2-x86_64-with-glibc2.28)'
2024-08-20T07:41:51.462883755Z A body is sent with the request
2024-08-20T07:41:51.467871320Z INFO 2024-08-20 07:41:51,467 _universal Request URL: 'https://[ANONYMIZED_DOMAIN]/v2.1/track'
2024-08-20T07:41:51.467899520Z Request method: 'POST'
2024-08-20T07:41:51.467909320Z Request headers:
2024-08-20T07:41:51.467952720Z 'Content-Type': 'application/json'
2024-08-20T07:41:51.467962219Z 'Content-Length': '2397'
2024-08-20T07:41:51.467968519Z 'Accept': 'application/json'
2024-08-20T07:41:51.467974719Z 'x-ms-client-request-id': '[ANONYMIZED_REQUEST_ID]'
2024-08-20T07:41:51.467981319Z 'User-Agent': 'azsdk-python-azuremonitorclient/unknown Python/3.9.19 (Linux-5.15.158.2-1.cm2-x86_64-with-glibc2.28)'
2024-08-20T07:41:51.467987919Z A body is sent with the request
2024-08-20T07:41:51.472131390Z INFO 2024-08-20 07:41:51,471 _universal Response status: 200
2024-08-20T07:41:51.472146690Z Response headers:
2024-08-20T07:41:51.472154290Z 'Transfer-Encoding': 'chunked'
2024-08-20T07:41:51.472160590Z 'Content-Type': 'application/json; charset=utf-8'
2024-08-20T07:41:51.472167190Z 'Server': 'Microsoft-HTTPAPI/2.0'
2024-08-20T07:41:51.472195890Z 'Strict-Transport-Security': 'REDACTED'
2024-08-20T07:41:51.472204590Z 'X-Content-Type-Options': 'REDACTED'
2024-08-20T07:41:51.472210890Z 'Date': 'Tue, 20 Aug 2024 07:41:50 GMT'
2024-08-20T07:41:51.472633987Z INFO 2024-08-20 07:41:51,472 _base Transmission succeeded: Item received: 2. Items accepted: 2
2024-08-20T07:41:51.479943736Z INFO 2024-08-20 07:41:51,479 _universal Response status: 200
2024-08-20T07:41:51.479965136Z Response headers:
2024-08-20T07:41:51.479973036Z 'Transfer-Encoding': 'chunked'
2024-08-20T07:41:51.479980236Z 'Content-Type': 'application/json; charset=utf-8'
2024-08-20T07:41:51.479987236Z 'Server': 'Microsoft-HTTPAPI/2.0'
2024-08-20T07:41:51.479995336Z 'Strict-Transport-Security': 'REDACTED'
2024-08-20T07:41:51.480004735Z 'X-Content-Type-Options': 'REDACTED'
2024-08-20T07:41:51.480012935Z 'Date': 'Tue, 20 Aug 2024 07:41:50 GMT'
2024-08-20T07:41:51.480649231Z INFO 2024-08-20 07:41:51,480 _base Transmission succeeded: Item received: 1. Items accepted: 1
2024-08-20T07:43:41 No new trace in the past 1 min(s).
2024-08-20T07:44:41 No new trace in the past 2 min(s).
I believe my code is fine, since the same code works in non-production:
def generate_youtube_summary(video_url, custom_form_options=None):
    logger.info(f"Generating summary for video URL: {video_url}")

    connectivity_results = test_youtube_connectivity()
    if not (connectivity_results["dns_resolution"] and connectivity_results["connection_status"]):
        logger.error("YouTube connectivity check failed. Details: %s", connectivity_results)
        return "Unable to connect to YouTube. Please check your internet connection and try again."

    logger.info(f"Attempting to connect to: {urlparse(video_url).netloc}")

    video_id = None
    if 'youtu.be/' in video_url:
        video_id = video_url.split('youtu.be/')[1]
    elif 'youtube.com/watch?v=' in video_url:
        video_id = video_url.split('v=')[1]
    elif 'youtube.com/embed/' in video_url:
        video_id = video_url.split('embed/')[1]
    logger.info(f"Extracted video ID: {video_id}")

    if video_id:
        try:
            transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
            logger.info(f"Retrieved transcript list for video ID: {video_id}")
            transcript = transcript_list.find_transcript(['en'])
            logger.info("Found English transcript")
            transcript_data = transcript.fetch()
            logger.info("Fetched transcript data")
            transcript_text = ''.join([entry['text'] for entry in transcript_data])
            logger.info(f"Extracted transcript text (first 100 chars): {transcript_text[:100]}...")
            summary_prompt = f"""<role>YouTube Video Summarizer</role> """
            logger.info("Sending summary prompt to process_text function")
            summary = process_text(summary_prompt)
            logger.info(f"Received summary from process_text (first 100 chars): {summary[:100]}...")
            return summary
        except Exception as e:
            logger.error(f"Error generating YouTube summary: {str(e)}", exc_info=True)
            if "TranscriptsDisabled" in str(e):
                return "Unable to generate summary. Subtitles are disabled for this video."
            elif "NoTranscriptFound" in str(e):
                return "No transcript found for this video. It may not have subtitles available."
            else:
                return f"Failed to generate video summary. Error: {str(e)}"
    else:
        logger.warning(f"Invalid YouTube video URL: {video_url}")
        return "Invalid YouTube video URL. Please provide a valid URL."
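As a side note that is probably unrelated to the production failure: the string splitting in the function above keeps any trailing query parameters (for example `&t=30s`) as part of the video ID. A minimal sketch of a stricter extraction using only the standard library is shown here; `extract_video_id` is a hypothetical helper name, not part of my code or of youtube_transcript_api.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(video_url: str) -> Optional[str]:
    """Return the video ID from common YouTube URL shapes, or None.

    Hypothetical helper for illustration; handles youtu.be short links,
    /watch?v=... URLs and /embed/... URLs only.
    """
    parsed = urlparse(video_url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Non-empty path segments, e.g. "/embed/ID" -> ["embed", "ID"]
    path_parts = [part for part in parsed.path.split("/") if part]

    if host == "youtu.be":
        return path_parts[0] if path_parts else None
    if host in ("youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            values = parse_qs(parsed.query).get("v")
            return values[0] if values else None
        if len(path_parts) >= 2 and path_parts[0] == "embed":
            return path_parts[1]
    return None
```

Parsing the query string instead of splitting on `v=` means extra parameters such as `t` or `si` never end up in the ID passed to `list_transcripts`.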
The logs show a 200 response from www.youtube.com, and my OpenAI API calls work from the same environment, so this does not look like a basic networking issue.
Besides a fix, I would like to understand why this happens only in production.
What I have tried so far: debugging and extra logging, and checking the networking settings.
Note that the error says subtitles are disabled for this video. I can confirm they are not; production returns the same message for every video I try, so it seems to be a blanket error message.
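One way to test the blanket-error suspicion is to look at the watch-page HTML that production actually receives. The traceback shows `TranscriptsDisabled` being raised inside `_extract_captions_json`, which parses that HTML. The sketch below rests on an assumption about the library's internals (based on the 0.6.x source, not verified for every version): the error is the fallback whenever the page contains no `"captions":` key, which would also happen if YouTube served the production server a consent, sign-in, or bot-check page instead of the normal one. The marker strings and both helper names are illustrative, not library API.

```python
import urllib.request

# Marker strings assumed from youtube_transcript_api 0.6.x internals.
CAPTIONS_MARKER = '"captions":'
RECAPTCHA_MARKER = 'class="g-recaptcha"'
PLAYABILITY_MARKER = '"playabilityStatus":'


def classify_watch_page(html: str) -> str:
    """Roughly classify a watch page by the markers it contains."""
    if CAPTIONS_MARKER in html:
        return "captions_present"
    if RECAPTCHA_MARKER in html:
        return "captcha_page"
    if PLAYABILITY_MARKER not in html:
        return "no_player_data"
    # Player data exists but no captions key: the case that I assume
    # surfaces as TranscriptsDisabled.
    return "player_data_without_captions"


def fetch_watch_page(video_id: str, timeout: float = 10.0) -> str:
    """Fetch the raw watch-page HTML (needs network access).

    Note: urllib sends a different User-Agent than the library's HTTP
    client, so the page returned may not be identical.
    """
    url = f"https://www.youtube.com/watch?v={video_id}"
    request = urllib.request.Request(url, headers={"Accept-Language": "en-US"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Logging `classify_watch_page(fetch_watch_page(video_id))` in both environments for the same video would show whether production receives a different page. If non-production reports `captions_present` and production does not, the difference lies in what YouTube returns to the production IP rather than in the video's subtitle settings.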