Channel: Active questions tagged youtube-api - Stack Overflow

Having problems in displaying youtube comment sentiment analysis


Okay, please bear with me, as this is my capstone project at university. The objective is to build a program that fetches 3 random comments from a YouTube link entered by the user, then has two models we trained (a BERT model and a Naive Bayes model) give each of those comments a sentiment rating. I am currently debugging a problem where, once the user enters the link of the video they want, a TypeError is raised. This runs on Python 3.10, by the way.

Before reading the errors and the program, please understand that I am very new to programming and this is my first real project, so a detailed answer would be helpful. I also understand that this may be a fairly specific area that only a few people have knowledge about.

This is the error:

Traceback (most recent call last):
  File "C:\Users\william\Documents\GitHub\working app v2\.venv\lib\site-packages\flask\app.py", line 2213, in __call__
    return self.wsgi_app(environ, start_response)
  File "C:\Users\william\Documents\GitHub\working app v2\.venv\lib\site-packages\flask\app.py", line 2193, in wsgi_app
    response = self.handle_exception(e)
  File "C:\Users\william\Documents\GitHub\working app v2\.venv\lib\site-packages\flask\app.py", line 2190, in wsgi_app
    response = self.full_dispatch_request()
  File "C:\Users\william\Documents\GitHub\working app v2\.venv\lib\site-packages\flask\app.py", line 1486, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "C:\Users\william\Documents\GitHub\working app v2\.venv\lib\site-packages\flask\app.py", line 1484, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\william\Documents\GitHub\working app v2\.venv\lib\site-packages\flask\app.py", line 1469, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "c:\Users\william\Documents\GitHub\working app v2\app.py", line 96, in get_comments
    sentiment = analyze_sentiment(comment_text)
  File "c:\Users\william\Documents\GitHub\working app v2\app.py", line 62, in analyze_sentiment
    input_ids = tokenizer.encode(
  File "C:\Users\william\Documents\GitHub\working app v2\.venv\lib\site-packages\transformers\tokenization_utils_base.py", line 2332, in encode
    encoded_inputs = self.encode_plus(
  File "C:\Users\william\Documents\GitHub\working app v2\.venv\lib\site-packages\transformers\tokenization_utils_base.py", line 2740, in encode_plus
    return self._encode_plus(
  File "C:\Users\william\Documents\GitHub\working app v2\.venv\lib\site-packages\transformers\tokenization_utils_fast.py", line 497, in _encode_plus
    batched_output = self._batch_encode_plus(
  File "C:\Users\william\Documents\GitHub\working app v2\.venv\lib\site-packages\transformers\tokenization_utils_fast.py", line 425, in _batch_encode_plus
    encodings = self._tokenizer.encode_batch(
TypeError: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]

I will not be adding the training scripts here, as they would lengthen the post; I am only including the main file where we run the program. If the scripts for generating the models are needed, I will post them in a separate reply.

Here is the main app.py file:

from flask import Flask, render_template, request
import requests
import json
import re
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer
from nltk.tokenize import word_tokenize
import string
import pickle
import numpy as np
import tensorflow as tf
from transformers import TFAutoModelForSequenceClassification, AutoTokenizer
from transformers import TFBertModel
import joblib

app = Flask(__name__)

nltk.download('vader_lexicon')  # Download the VADER lexicon

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
tf.keras.utils.register_keras_serializable(TFBertModel)
bert_model = tf.keras.models.load_model(
    r'C:\Users\william\Documents\GitHub\working app v2\bert_model.h5',
    custom_objects={'TFBertModel': TFBertModel})

# Load the Naive-Bayes model
naive_bayes_model = joblib.load('naive_bayes_model.pkl')

def preprocess_text(text):
    # Convert to lowercase
    text = text.lower()
    # Remove special characters and links
    text = re.sub(r'http\S+|www\S+|https\S+', '', text)
    text = re.sub(r'[%s]' % re.escape(string.punctuation), '', text)
    text = re.sub(r'[^a-zA-z\s]', '', text)
    # Remove emojis
    emoji_pattern = re.compile("["
        u"\U0001F600-\U0001F64F"  # emoticons
        u"\U0001F300-\U0001F5FF"  # symbols & pictographs
        u"\U0001F680-\U0001F6FF"  # transport & map symbols
        u"\U00002702-\U000027B0"
        u"\U000024C2-\U0001F251"
        "]+", flags=re.UNICODE)
    text = emoji_pattern.sub(r'', text)
    # Tokenize and POS tag the cleaned text
    tokenized_text = word_tokenize(text)
    tokenized_text = nltk.pos_tag(tokenized_text)
    return tokenized_text

def extract_features(text):
    features = {}
    for word, pos in text:
        features[word] = pos
    return features

def analyze_sentiment(text):
    tokenized_text = tokenizer.tokenize(text)
    input_ids = tokenizer.encode(
        tokenized_text,
        add_special_tokens=True,
        padding='longest',
        truncation=True,
        max_length=512,
        return_tensors='tf'
    )
    logits = bert_model.predict(input_ids)[0]
    sentiment = np.argmax(logits)
    return sentiment

@app.route('/')
def index():
    return render_template('index.html')

@app.route('/get_comments', methods=['POST'])
def get_comments():
    video_link = request.form['video_link']
    video_id = extract_video_id(video_link)
    api_key = 'AIzaSyA3MDCEyMo2s4ltzkx1X2sCqV75secjOOQ'
    video_url = f'https://www.googleapis.com/youtube/v3/videos?part=snippet&id={video_id}&key={api_key}'
    video_response = requests.get(video_url)
    video_data = json.loads(video_response.text)
    video_title = video_data['items'][0]['snippet']['title']
    comments_url = f'https://www.googleapis.com/youtube/v3/commentThreads?part=snippet&videoId={video_id}&key={api_key}'
    comments_response = requests.get(comments_url)
    comments_data = json.loads(comments_response.text)
    comments = []
    for item in comments_data['items'][:3]:
        comment_text = item['snippet']['topLevelComment']['snippet']['textDisplay']
        sentiment = analyze_sentiment(comment_text)
        comments.append({
            'comment': comment_text,
            'sentiment': sentiment
        })
    return render_template('index.html', video_title=video_title, comments=comments)

def extract_video_id(video_link):
    video_id = None
    patterns = [
        r'youtu\.be/([^\?]+)',
        r'youtube\.com/watch\?v=([^\&]+)',
        r'youtube\.com/embed/([^\?]+)',
        r'youtube\.com/v/([^\?]+)',
        r'youtube\.com/user/[^\?]+/\?v=([^\&]+)',
        r'youtube\.com/[^\?]+\?v=([^\&]+)'
    ]
    for pattern in patterns:
        match = re.search(pattern, video_link)
        if match:
            video_id = match.group(1)
            break
    return video_id

if __name__ == '__main__':
    app.run(debug=True)
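For what it's worth, the extract_video_id helper can be exercised on its own, separate from Flask and the models, to rule out link parsing as the cause. Here is a standalone sketch using the same patterns as above (the video IDs in the examples are made up, only the URL shapes matter):

```python
import re

def extract_video_id(video_link):
    # Same patterns as in app.py: try each known YouTube URL shape in turn
    patterns = [
        r'youtu\.be/([^\?]+)',
        r'youtube\.com/watch\?v=([^\&]+)',
        r'youtube\.com/embed/([^\?]+)',
        r'youtube\.com/v/([^\?]+)',
        r'youtube\.com/user/[^\?]+/\?v=([^\&]+)',
        r'youtube\.com/[^\?]+\?v=([^\&]+)'
    ]
    for pattern in patterns:
        match = re.search(pattern, video_link)
        if match:
            return match.group(1)
    return None

# Made-up IDs, real URL shapes
print(extract_video_id('https://www.youtube.com/watch?v=abc123&t=10s'))  # abc123
print(extract_video_id('https://youtu.be/abc123?t=10'))                  # abc123
```

In my testing the parsing side behaves as expected for these shapes, which points the problem at the tokenizer call instead.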

Thank you for reading this very long post. Sorry if it was a massive wall of text.

So far, I have tried modifying the analyze_sentiment function, but that did not fix it.
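For reference, one variant I experimented with, based on reading the traceback, passes the raw string straight to tokenizer.encode instead of the output of tokenizer.tokenize, since the error seems to come from the fast tokenizer backend rejecting a list of subword strings. This is only a sketch (I pulled the tokenizer and model out into parameters so I could poke at it outside Flask, which differs from the app.py version above, and I am not sure it is the right fix):

```python
import numpy as np

def analyze_sentiment(text, tokenizer, bert_model):
    # Pass the raw string directly; the fast tokenizer expects a str
    # (or a pair of strings), not a list of subword tokens.
    input_ids = tokenizer.encode(
        text,
        add_special_tokens=True,
        padding='longest',
        truncation=True,
        max_length=512,
        return_tensors='tf'
    )
    logits = bert_model.predict(input_ids)[0]
    return int(np.argmax(logits))
```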

