Please bear with me, as this is basically my university capstone project. In short, the goal is to build a program that fetches 3 random comments from a YouTube link entered by the user, then gives each comment a sentiment rating using two models we trained: a BERT model and a Naive Bayes model. I am debugging a problem where, once the user enters the link of the video they want, a TypeError is raised. This runs on Python 3.10, by the way.
Before reading the errors and the program, please understand that I am very new to programming and this is my first real project. A detailed answer would be very helpful. I also understand that this may be a specific area that only a few people know much about.
This is the error:
```
Traceback (most recent call last):
  File "C:\Users\william\Documents\GitHub\working app v2\.venv\lib\site-packages\flask\app.py", line 2213, in __call__
    return self.wsgi_app(environ, start_response)
  File "C:\Users\william\Documents\GitHub\working app v2\.venv\lib\site-packages\flask\app.py", line 2193, in wsgi_app
    response = self.handle_exception(e)
  File "C:\Users\william\Documents\GitHub\working app v2\.venv\lib\site-packages\flask\app.py", line 2190, in wsgi_app
    response = self.full_dispatch_request()
  File "C:\Users\william\Documents\GitHub\working app v2\.venv\lib\site-packages\flask\app.py", line 1486, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "C:\Users\william\Documents\GitHub\working app v2\.venv\lib\site-packages\flask\app.py", line 1484, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\william\Documents\GitHub\working app v2\.venv\lib\site-packages\flask\app.py", line 1469, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "c:\Users\william\Documents\GitHub\working app v2\app.py", line 96, in get_comments
    sentiment = analyze_sentiment(comment_text)
  File "c:\Users\william\Documents\GitHub\working app v2\app.py", line 62, in analyze_sentiment
    input_ids = tokenizer.encode(
  File "C:\Users\william\Documents\GitHub\working app v2\.venv\lib\site-packages\transformers\tokenization_utils_base.py", line 2332, in encode
    encoded_inputs = self.encode_plus(
  File "C:\Users\william\Documents\GitHub\working app v2\.venv\lib\site-packages\transformers\tokenization_utils_base.py", line 2740, in encode_plus
    return self._encode_plus(
  File "C:\Users\william\Documents\GitHub\working app v2\.venv\lib\site-packages\transformers\tokenization_utils_fast.py", line 497, in _encode_plus
    batched_output = self._batch_encode_plus(
  File "C:\Users\william\Documents\GitHub\working app v2\.venv\lib\site-packages\transformers\tokenization_utils_fast.py", line 425, in _batch_encode_plus
    encodings = self._tokenizer.encode_batch(
TypeError: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]
```

I won't add the model training scripts here, as they would make the post even longer, but I will post the main file where we run the program. If the scripts for generating the models are needed, I will post them in a separate reply.
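If I am reading the traceback right, `tokenizer.encode()` is receiving the output of `tokenizer.tokenize()`, which is a *list* of token strings, while the fast tokenizer's `TextEncodeInput` only accepts a single string (or a pair of strings). A minimal illustration of the type difference, using a plain-Python stand-in instead of the real tokenizer:

```python
comment_text = "This video was really helpful, thanks!"

# Stand-in for tokenizer.tokenize(comment_text); the real call likewise
# returns a list of token strings, e.g. ['this', 'video', 'was', ...]
tokenized_text = comment_text.lower().split()

print(type(comment_text).__name__)    # str  -> the kind of input encode() accepts
print(type(tokenized_text).__name__)  # list -> what analyze_sentiment passes in
```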
Here is the main app.py file:
```python
from flask import Flask, render_template, request
import requests
import json
import re
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer
from nltk.tokenize import word_tokenize
import string
import pickle
import numpy as np
import tensorflow as tf
from transformers import TFAutoModelForSequenceClassification, AutoTokenizer
from transformers import TFBertModel
import joblib

app = Flask(__name__)

nltk.download('vader_lexicon')  # Download the VADER lexicon

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
tf.keras.utils.register_keras_serializable(TFBertModel)
bert_model = tf.keras.models.load_model(
    r'C:\Users\william\Documents\GitHub\working app v2\bert_model.h5',
    custom_objects={'TFBertModel': TFBertModel}
)

# Load the Naive-Bayes model
naive_bayes_model = joblib.load('naive_bayes_model.pkl')


def preprocess_text(text):
    # Convert to lowercase
    text = text.lower()
    # Remove special characters and links
    text = re.sub(r'http\S+|www\S+|https\S+', '', text)
    text = re.sub(r'[%s]' % re.escape(string.punctuation), '', text)
    text = re.sub(r'[^a-zA-z\s]', '', text)
    # Remove emojis
    emoji_pattern = re.compile(
        "["
        u"\U0001F600-\U0001F64F"  # emoticons
        u"\U0001F300-\U0001F5FF"  # symbols & pictographs
        u"\U0001F680-\U0001F6FF"  # transport & map symbols
        u"\U00002702-\U000027B0"
        u"\U000024C2-\U0001F251"
        "]+",
        flags=re.UNICODE)
    text = emoji_pattern.sub(r'', text)
    # Tokenize and POS tag the cleaned text
    tokenized_text = word_tokenize(text)
    tokenized_text = nltk.pos_tag(tokenized_text)
    return tokenized_text


def extract_features(text):
    features = {}
    for word, pos in text:
        features[word] = pos
    return features


def analyze_sentiment(text):
    tokenized_text = tokenizer.tokenize(text)
    input_ids = tokenizer.encode(
        tokenized_text,
        add_special_tokens=True,
        padding='longest',
        truncation=True,
        max_length=512,
        return_tensors='tf'
    )
    logits = bert_model.predict(input_ids)[0]
    sentiment = np.argmax(logits)
    return sentiment


@app.route('/')
def index():
    return render_template('index.html')


@app.route('/get_comments', methods=['POST'])
def get_comments():
    video_link = request.form['video_link']
    video_id = extract_video_id(video_link)
    api_key = 'AIzaSyA3MDCEyMo2s4ltzkx1X2sCqV75secjOOQ'

    video_url = f'https://www.googleapis.com/youtube/v3/videos?part=snippet&id={video_id}&key={api_key}'
    video_response = requests.get(video_url)
    video_data = json.loads(video_response.text)
    video_title = video_data['items'][0]['snippet']['title']

    comments_url = f'https://www.googleapis.com/youtube/v3/commentThreads?part=snippet&videoId={video_id}&key={api_key}'
    comments_response = requests.get(comments_url)
    comments_data = json.loads(comments_response.text)

    comments = []
    for item in comments_data['items'][:3]:
        comment_text = item['snippet']['topLevelComment']['snippet']['textDisplay']
        sentiment = analyze_sentiment(comment_text)
        comments.append({'comment': comment_text, 'sentiment': sentiment})

    return render_template('index.html', video_title=video_title, comments=comments)


def extract_video_id(video_link):
    video_id = None
    patterns = [
        r'youtu\.be/([^\?]+)',
        r'youtube\.com/watch\?v=([^\&]+)',
        r'youtube\.com/embed/([^\?]+)',
        r'youtube\.com/v/([^\?]+)',
        r'youtube\.com/user/[^\?]+/\?v=([^\&]+)',
        r'youtube\.com/[^\?]+\?v=([^\&]+)'
    ]
    for pattern in patterns:
        match = re.search(pattern, video_link)
        if match:
            video_id = match.group(1)
            break
    return video_id


if __name__ == '__main__':
    app.run(debug=True)
```

Thank you for reading this very long post. Sorry if it was a massive wall of text.
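To narrow things down, I checked that `extract_video_id` behaves as expected by running it on its own. Below is a trimmed copy of it (only the first few patterns) with made-up video IDs, and it returns the right ID for both URL shapes, so I believe the problem is in `analyze_sentiment`, not the URL parsing:

```python
import re

def extract_video_id(video_link):
    # Same first patterns as in app.py above
    patterns = [
        r'youtu\.be/([^\?]+)',
        r'youtube\.com/watch\?v=([^\&]+)',
        r'youtube\.com/embed/([^\?]+)',
        r'youtube\.com/v/([^\?]+)',
    ]
    for pattern in patterns:
        match = re.search(pattern, video_link)
        if match:
            return match.group(1)
    return None

print(extract_video_id('https://www.youtube.com/watch?v=abc123XYZ90'))  # abc123XYZ90
print(extract_video_id('https://youtu.be/abc123XYZ90'))                 # abc123XYZ90
```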
So far, I have tried modifying the `analyze_sentiment` function, but nothing I changed has fixed it.
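Based on the traceback, the next change I plan to try is passing the raw comment string straight into `tokenizer.encode()` and dropping the manual `tokenizer.tokenize()` call, since (as far as I understand) the fast tokenizer does its own tokenization internally. The sketch below uses stub stand-ins for the tokenizer and model that I wrote myself, just so the shape of the change is runnable here; in the real app these would be the HuggingFace tokenizer and the loaded Keras model:

```python
# Stub stand-ins so this sketch runs without transformers/tensorflow installed.
class StubTokenizer:
    def encode(self, text, **kwargs):
        # Mimics the rule the fast tokenizer appears to enforce: str input only.
        if not isinstance(text, str):
            raise TypeError("TextEncodeInput must be Union[TextInputSequence, "
                            "Tuple[InputSequence, InputSequence]]")
        return [[101, 2023, 2003, 102]]  # fake input ids

class StubModel:
    def predict(self, input_ids):
        return [[0.1, 0.9]]  # fake logits: class 1 wins

tokenizer = StubTokenizer()
bert_model = StubModel()

def analyze_sentiment(text):
    # Key change: pass the raw string; do NOT call tokenizer.tokenize() first.
    input_ids = tokenizer.encode(
        text,
        add_special_tokens=True,
        truncation=True,
        max_length=512,
    )
    logits = bert_model.predict(input_ids)[0]
    return max(range(len(logits)), key=logits.__getitem__)  # like np.argmax

print(analyze_sentiment("great video, thanks!"))  # 1 with these stub logits
```

Does this look like the right direction, or is the actual cause something else entirely?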