[PY] Data Mining Script
[QUOTE="Brackson, post: 242975, member: 34747"]
Here's a looping version of this script with proxy support, so you don't get firewalled.

[code]
#!/usr/bin/env python

import urllib2, time, random

# Load the proxy list: one proxy per line, blank lines skipped.
proxies = [line.strip() for line in open('proxies.txt', 'r') if line.strip()]

def store_into_file(proxies):
    random_proxy = random.choice(proxies)
    proxy = urllib2.ProxyHandler({'http': random_proxy})
    opener = urllib2.build_opener(proxy)  # The handler only takes effect through an opener.

    url = 'http://google.com/'  # URL that you want to mine.
    data = opener.open(url).read()  # Get the HTML source of the URL through the proxy.

    current_time = time.strftime('%H:%M:%S', time.localtime())  # Get the current time so we can use it for the .txt filename.

    r = open('%s.txt' % current_time, 'w')  # Create the file.
    r.write(data)  # Put the source in the file.
    r.close()  # Close the file.

while True:
    store_into_file(proxies)
[/code]
[SIZE=1][URL='http://pastie.org/8451768'](with syntax formatting)[/URL][/SIZE]

Put a 'proxies.txt' file in the same directory as the script, with one proxy per line. With a good proxy list, you shouldn't get firewalled.
[/QUOTE]
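For anyone on Python 3, where urllib2 no longer exists, here is a rough sketch of the same idea using urllib.request. It is not Brackson's code: the function names (load_proxies, snapshot_filename, fetch_via_proxy, mine) and the delay, rounds and timeout parameters are my own assumptions, added for illustration. The sketch builds an opener from the ProxyHandler so the request really goes through the chosen proxy, skips proxies that fail instead of crashing, and pauses between requests.

```python
import random
import time
import urllib.request


def load_proxies(path):
    """Read one proxy per line, skipping blank lines."""
    with open(path, 'r') as handle:
        return [line.strip() for line in handle if line.strip()]


def snapshot_filename(moment=None):
    """Name the output file after the local time (dashes, since colons are invalid on Windows)."""
    return time.strftime('%H-%M-%S', moment or time.localtime()) + '.txt'


def fetch_via_proxy(url, proxy, timeout=10):
    """Fetch url through the given HTTP proxy and return the raw bytes."""
    handler = urllib.request.ProxyHandler({'http': proxy})
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=timeout) as response:
        return response.read()


def mine(url, proxies, delay=5, rounds=None):
    """Fetch url through random proxies, saving each response; rounds=None loops forever."""
    done = 0
    while rounds is None or done < rounds:
        proxy = random.choice(proxies)
        try:
            data = fetch_via_proxy(url, proxy)
        except OSError as error:  # URLError and socket timeouts are OSError subclasses.
            print('proxy %s failed: %s' % (proxy, error))
        else:
            with open(snapshot_filename(), 'wb') as out:
                out.write(data)  # Raw bytes, so the file is opened in binary mode.
        done += 1
        time.sleep(delay)
```

To run it the same way as the quoted script, call mine('http://google.com/', load_proxies('proxies.txt')). Files are named by the second, so keep delay at one second or more if you don't want snapshots to overwrite each other.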