How Do I Get Around Instagram Blocking Web Crawlers?
<blockquote data-quote="Mikee" data-source="post: 418124" data-attributes="member: 76567">
[CODE]
# Instagram Thief
# Scrape the top page every hour, check every photo on the top page, and compare their likes.
# Download the photo with the most likes. Do this every hour, and keep track of the tags it used.
# Upload the downloaded photo to my account with the same tags.
# After 24 hours, record which photos got the most likes, and record their tags in a JSON file.
# Do this every day.
from bs4 import BeautifulSoup
import urllib3


class InstagramPhoto(object):
    top_page_text = None  # HTML of the top page, filled in by get_top_page()

    def __init__(self):
        self.data = None

    @classmethod
    def get_top_page(cls):
        """Fetch the explore page, store its HTML on the class, and print it."""
        try:
            http = urllib3.PoolManager()
            r = http.request("GET", "https://www.instagram.com/explore/")
            cls.top_page_text = r.data.decode("utf-8")
            print(cls.top_page_text)
        except urllib3.exceptions.HTTPError as e:
            print("\nAn Error With urllib3 Has Occurred...\n\n", e, "\n")
        return cls.top_page_text

    '''
    def find_top_photo(self):
        if self.top_page_text is None:
            print("Whoops, The Top Instagram Page Was Not Yet Accessed!")
            return
        text = "<div class = _mck9w _gvoze _f2mse> hey we have some text"
        soup = BeautifulSoup(self.top_page_text, "html.parser")
        print(soup.prettify())
    '''


def main():
    InstagramPhoto.get_top_page()
    '''
    new_photo = InstagramPhoto()  # create an instance for the photo we want to get
    new_photo.find_top_photo()
    '''


if __name__ == "__main__":
    main()
[/CODE]

<p>The printed output doesn't contain the full page source. It skips the contents of <body>, which is the part I need. Does anyone know how I can get around this?</p>
<p>Thanks.</p>
<p>I've also tried the requests module, but it does exactly the same thing.</p>
</blockquote>
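A possible explanation, not confirmed anywhere in the post: pages like this often ship a nearly empty body and fill it in with JavaScript after load, so a plain HTTP client such as urllib3 or requests never sees the rendered markup. The sketch below is one way to check that on a saved response using only the standard library. `BodyProbe`, `probe_body`, and the sample HTML are hypothetical names and data for illustration, not part of the original script and not real Instagram output.

```python
from html.parser import HTMLParser


class BodyProbe(HTMLParser):
    """Count <script> tags and visible text characters inside <body>."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.in_script = False
        self.script_count = 0
        self.visible_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag == "script":
            self.in_script = True
            if self.in_body:
                self.script_count += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Only text outside <script> counts as visible content.
        if self.in_body and not self.in_script:
            self.visible_chars += len(data.strip())


def probe_body(html_text):
    """Return (script_count, visible_chars) for the <body> of html_text."""
    probe = BodyProbe()
    probe.feed(html_text)
    probe.close()
    return probe.script_count, probe.visible_chars


# Hypothetical sample standing in for a fetched page (assumption, not real output).
sample = (
    "<html><head><title>t</title></head>"
    "<body><span id='root'></span>"
    "<script>window._data = {};</script>"
    "<script src='app.js'></script></body></html>"
)
print(probe_body(sample))  # prints (2, 0)
```

If the real response gives a high script count and close to zero visible characters, the body is probably being built client-side, which would explain why switching from urllib3 to requests changes nothing.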