
[QUOTE="percocet, post: 402156, member: 71898"]
I'm trying to make a Python app that extracts the titles of all of a YouTube channel's videos. I'm currently attempting to do it using Selenium.

[code]
def getVideoTitles():
    driver = webdriver.Chrome("/Users/{username}/PycharmProjects/YoutubeChannelVideos/chromedriver")
    driver.get(googleYoutubePage())
    titleElement = driver.find_element_by_class_name("yt-lockup-content")
    print(titleElement.text) #it prints out title, + views, hours ago, and "CC"
    #I suck at selenium so lets just store the title and cut everything after it
[/code]

yt-lockup-content is the class name for each video on a YouTube channel's /videos page. With the code above I am able to get the title of the first video on that page. But I want to iterate through all of the titles (in other words, through every single yt-lockup-content element) in order to store the .text (which contains the title of the video).

What I was wondering is how to access yt-lockup-content[2], per se, which would be the second video on that page. Every video on the page has the same class name.

Here is my full code. Play with it if you'd like.

Cheers,

[code]
'''
'''
import selenium
from selenium import webdriver

def getChannelName():
    print("Please enter the channel that you would like to scrape video titles...")
    channelName = input()
    googleSearch = "https://www.google.ca/search?q=%s+youtube&oq=%s+youtube&aqs=chrome..69i57j0l5.2898j0j4&sourceid=chrome&ie=UTF-8#q=%s+youtube&*" %(channelName, channelName, channelName)
    print(googleSearch)
    return googleSearch

def googleYoutubePage():
    driver = webdriver.Chrome("/Users/{username}/PycharmProjects/YoutubeChannelVideos/chromedriver")
    driver.get(getChannelName())
    element = driver.find_element_by_class_name("s") #this is where the link to the proper youtube page lives
    keys = element.text #this grabs the link to the youtube page + other crap that will be cut
    splitKeys = keys.split(" ") #this needs to be split, because aside from the link it grabs the page description, which we need to truncate
    linkToPage = splitKeys[0] #this is where the link lives
    extraCrapStartsHere = len(linkToPage) #default: keep the whole string if no newline is found
    for index, char in enumerate(linkToPage): #this loops over the link to find where the stuff beside the link begins (which is unnecessary)
        if char == "\n":
            extraCrapStartsHere = index #it starts here, we know everything beyond here can be cut
    link = ""
    for i in range(extraCrapStartsHere): #the official link will be everything in the linkToPage up to where we found suitable to cut
        link = link + linkToPage[i]
    videosPage = link + "/videos"
    print(videosPage)
    return videosPage

def getVideoTitles():
    driver = webdriver.Chrome("/Users/{username}/PycharmProjects/YoutubeChannelVideos/chromedriver")
    driver.get(googleYoutubePage())
    titleElement = driver.find_element_by_class_name("yt-lockup-content")
    print(titleElement.text) #it prints out title, + views, hours ago, and "CC"
    #I suck at selenium so lets just store the title and cut everything after it

def main():
    getVideoTitles()

main()
[/code]
[doublepost=1488486148,1488428358][/doublepost]
Thanks to everyone who may have tried to answer this question.

The answer lay mainly in finding the right element that held all of the titles, which was a lot more work than it seems, considering how obfuscated YouTube's web page is.

What I had to do was loop through every element, like so:

[code]
#find_element_by_class_name never returns False: it returns an element or raises NoSuchElementException
while driver.find_element_by_class_name("yt-uix-button") is not False:
    for title in driver.find_elements_by_class_name("yt-uix-tile-link"):
        print(title.text)
[/code]

That code goes in my getVideoTitles function, in place of the titleElement variable initialization.

Cheers,
[doublepost=1488486171][/doublepost]
The thread can be closed by the mods, since the answer has been found.
[/QUOTE]
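
In case it helps anyone who finds this later, here is a minimal sketch of the indexing part of the question. find_elements (plural) returns a list of every match, so the second video on the page is simply index 1 of that list. The helper names first_line and collect_titles are my own, not from the post, and I'm assuming the title is the first line of each tile's .text (the post says the title comes first, followed by views and age). The class name is taken from the post and may well have changed on YouTube's side since then.

```python
# "class name" is the locator string that selenium's By.CLASS_NAME stands for;
# using the plain string keeps this sketch importable without selenium installed.
CLASS_NAME = "class name"


def first_line(text):
    """Return the text before the first newline (assumed to be the title)."""
    return text.split("\n", 1)[0].strip()


def collect_titles(driver, class_name="yt-lockup-content"):
    """Return the first text line of every element that carries class_name."""
    # find_elements returns a list, which is empty when nothing matches
    tiles = driver.find_elements(CLASS_NAME, class_name)
    # skip tiles whose text is empty or whitespace only
    return [first_line(tile.text) for tile in tiles if tile.text.strip()]


# Usage with a real browser (needs selenium and a chromedriver; videos_page_url
# is a placeholder for the channel's /videos URL):
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   driver.get(videos_page_url)
#   titles = collect_titles(driver)
#   second_title = titles[1]  # what the post calls yt-lockup-content[2]
```

Taking the first line with split("\n", 1)[0] also covers the "cut everything after the title" step without a character-by-character loop.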
