I downloaded JSON responses from YouTube API v3 calls to the `videos` endpoint (see data structure example) for around 7,500 videos; I made around 150 calls to the `list` method with 50 videos each.
```r
yt_jsons_ <- purrr::map(list.files('./yt_calls_wget/', full.names = TRUE), fromJSON)
```

I get a large list in R containing all the JSON calls merged.
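To show the shape of one parsed call (from inspecting my own objects; each call is a list with `$kind`, `$etag`, `$items`, etc., and `$items` is the per-video table that `fromJSON` has simplified):

```r
# Print the top two levels of the first parsed response;
# deeper nesting (snippet, statistics, ...) is elided by max.level
str(yt_jsons_[[1]], max.level = 2)
```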
```r
View(yt_jsons_[[1]][["items"]])
```

where `1` is the first of the 150 calls, containing 50 video observations. The data I'm looking for is in these fields:
```r
yt_jsons_[[1]][["items"]][["snippet"]][["publishedAt"]]
 [1] "2022-03-02T14:24:12Z" "2022-03-04T15:50:12Z" "2022-02-20T18:20:08Z"
 [4] "2021-11-17T16:43:40Z" "2022-03-24T17:33:27Z" "2022-03-06T16:14:41Z"
 [7] "2022-03-24T13:50:27Z" "2021-11-20T19:13:58Z" "2022-02-06T15:00:41Z"
...
[43] "2022-01-22T14:41:10Z" "2022-02-11T20:00:05Z"

yt_jsons_[[1]][["items"]][["statistics"]]
   viewCount likeCount favoriteCount commentCount
1       3715       160             0           17
2      14313       876             0           49
3      17376       274             0          127
4       7584       338             0           64
5      13422       508             0           43
6    1535743     33215             0         1464
7       8493       752             0           78
8       <NA>        25             0            0
...
44     27904      1189             0           98

yt_jsons_[[1]][["items"]][["topicDetails"]]
                                        topicCategories
1  https://en.wikipedia.org/wiki/Action-adventure_game, https://en.wikipedia.org/wiki/Action_game, https://en.wikipedia.org/wiki/Video_game_culture
2  NULL
3  https://en.wikipedia.org/wiki/Lifestyle_(sociology), https://en.wikipedia.org/wiki/Vehicle
4  https://en.wikipedia.org/wiki/Society
...
44 https://en.wikipedia.org/wiki/Lifestyle_(sociology), https://en.wikipedia.org/wiki/Vehicle
```

(44 rows per call; output truncated.) I tried
```r
# this did not work..
SOtry <- yt_jsons_ %>% map(unlist) %>% map(as_tibble) %>% bind_rows()
SOtry
# A tibble: 401,564 × 1
   value
   <chr>
 1 youtube#videoListResponse
 2 5OfTreCNJKiXdr9MzkeSZpaSubk
 3 youtube#video
 4 youtube#video
 5 youtube#video
...
# ℹ 401,554 more rows
# ℹ Use `print(n = ...)` to see more rows
```

but it did not work: everything gets flattened into a single `value` column. I used this block for the Reddit data (see data structure example JSON files) and there it worked, but those were one post per file, while each of the YouTube JSON files holds a list of posts.
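A minimal illustration of why the `unlist()` route collapses everything (a made-up two-field item, not my real data):

```r
library(tibble)

# unlist() flattens the entire nested response into one named character
# vector, so as_tibble() can only ever yield a single column
x <- list(kind = "youtube#videoListResponse",
          items = list(list(id = "a1", viewCount = "3715")))
unlist(x)
# named character vector with names: kind, items.id, items.viewCount
```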
I then tried this
```r
# not yet...
yt_not_quite <- as.data.frame(t(sapply(yt_jsons_, function(x) c(x$items[["statistics"]], x$items[["snippet"]]))))
```

and I do get all the columns I'd like, but
```r
head(yt_not_quite$viewCount)
[[1]]
 [1] "3715"    "14313"   "17376"   "7584"    "13422"   "1535743" "8493"
 [8] NA        "104"     "50342"   "126"     "189"     "25"      "258485"
...
[43] "1652"    "27904"

[[2]]
 [1] "473"     "9255"    "229601"  "757"     "13341"   "173"
...

typeof(yt_not_quite)
[1] "list"
```

(Each list element is the `viewCount` vector for one call; output truncated.) I still have each JSON call 'encapsulated' by row: the number of rows equals the number of calls I imported, and each column's cells are vectors holding one entry per video queried in that call.
I can extract each call individually like this:
```r
View(as.data.frame(yt_not_quite[1,1], col.names = colnames(yt_not_quite[1])))
as.data.frame(yt_not_quite[1,1], col.names = colnames(yt_not_quite[1]))
   viewCount
1       3715
2      14313
3      17376
4       7584
5      13422
6    1535743
...
44     27904
```

But I would have to apply this to every column and across all n rows, then `rbind` the results together, to get all of the ~7,500 videos.
Anyway, I wonder if there is a better way to convert these to a data frame. I am trying to keep peak memory usage under 1 GB for shinyapps.io.
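One direction I am considering (an untested sketch; it relies on `fromJSON` having already simplified each call's `$items` into a data frame whose `snippet`/`statistics` columns are themselves nested data frames, as in the output above):

```r
library(purrr)

# jsonlite::flatten() promotes the nested data-frame columns of $items
# to top-level columns (snippet.publishedAt, statistics.viewCount, ...);
# map_dfr() then row-binds the ~150 calls into one data frame,
# filling NA where a call is missing a column
yt_df <- map_dfr(yt_jsons_, function(call) jsonlite::flatten(call$items))
```

List columns such as `topicDetails$topicCategories` would stay as list columns, which may or may not be acceptable for my Shiny use.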