DZone Forums
  (#1 (permalink)) Old
Member
 
Posts: 2
Thanks: 0
Thanked 0 Times in 0 Posts
Join Date: Jul 2009
Question Screen Scraping Https/JavaScript? - 07-22-2009, 08:17 PM

I want to screen scrape volume and quote data from an online broker where I have an account, for a hobby trading system I'm building. The quote/volume screen uses frames, and I must log in to get to it. Also, when I try to access the quotes/volumes screen directly by URL, the site logs me out and takes me to the homepage; it appears to be using JavaScript to prevent programs from hitting links directly and scraping the data.

I need a Java framework or utility that will let me write a program to log in to the site and navigate step by step to the quotes/volumes screen, as a human user with a browser would, so I can scrape the screen. Can someone please recommend a utility that can do this?
  (#2 (permalink)) Old
Forum Leader
 
glennji's Avatar
 
Posts: 58
Thanks: 0
Thanked 1 Time in 1 Post
Join Date: Feb 2008
Location: London
Default 07-24-2009, 07:53 AM

It really sounds like they're going out of their way to stop things like that -- so even if you get it working you're probably going to be breaking some user agreement, and you're at the mercy of them changing their app and breaking yours...

Most brokers have a quotestream service (for a price), and if it's just for a "game" or "hobby" you might be able to find time-delayed quotestreams for free.

Sorry if that's not what you wanna hear ;-)


Quote:
Happy hacking!
  (#3 (permalink)) Old
Member
 
Posts: 5
Thanks: 0
Thanked 0 Times in 0 Posts
Join Date: Mar 2009
Default 07-30-2009, 05:22 AM

You might be able to do it using HttpComponents (see the HttpComponents Overview) or, more specifically, the previous stable version, HttpClient (see the HttpClient Home page).
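
As a rough illustration of the general idea (keeping a logged-in session across requests), here is a minimal sketch. It uses the JDK's built-in `java.net.http.HttpClient` (Java 11+) rather than the Apache libraries named above, purely so that it has no external dependencies. The class name `SessionSketch`, the helper names, and the form field names in the usage note are all hypothetical; a real site's login form will differ.

```java
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper class: not part of any library mentioned in this thread.
public class SessionSketch {

    // Builds a client that remembers cookies between requests and follows
    // ordinary redirects, so a session cookie set at login is sent afterwards.
    static HttpClient newSessionClient() {
        return HttpClient.newBuilder()
                .cookieHandler(new CookieManager(null, CookiePolicy.ACCEPT_ALL))
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
    }

    // Encodes form fields as application/x-www-form-urlencoded text.
    static String formEncode(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // Builds (but does not send) a POST request carrying the login form.
    static HttpRequest loginRequest(String loginUrl, Map<String, String> fields) {
        return HttpRequest.newBuilder(URI.create(loginUrl))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formEncode(fields)))
                .build();
    }
}
```

To use it, you would create one client with `newSessionClient()`, send the login request with `client.send(request, HttpResponse.BodyHandlers.ofString())`, and then issue further GET requests through the same client so the cookies carry over. Note that a plain HTTP client does not execute JavaScript, so a page that depends on scripts running in the browser may still not behave the way it does for a human user.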
  (#4 (permalink)) Old
Member
 
Posts: 1
Thanks: 0
Thanked 0 Times in 0 Posts
Join Date: Aug 2009
Default 08-01-2009, 03:54 AM

Try Web-Harvest. It's an open source, Java-based scripting tool that lets you write XML configurations to log in to secure websites and scrape data. It can also do further processing on the data, such as saving it to a database. It is written entirely in Java.

Hope that helps you.
  (#5 (permalink)) Old
Moderator
 
jwenting's Avatar
 
Posts: 99
Thanks: 0
Thanked 8 Times in 8 Posts
Join Date: Feb 2008
Default 08-03-2009, 02:25 AM

Contact the data owners. Stealing their data is NOT the way to go (and screen scraping amounts to stealing), and is a violation of the law.

All owners of such data have APIs available (usually at a price; that data doesn't come cheap).
  (#6 (permalink)) Old
Member
 
Posts: 2
Thanks: 0
Thanked 0 Times in 0 Posts
Join Date: Jul 2009
Default 08-05-2009, 08:42 PM

I decided to just scrape everything from Yahoo Finance and live with the 15-minute delay on the options data. I'm manually parsing the HTML; it only took about one night to write a parser.

Separate question: any suggestions on tuning my parser? It takes about 90 minutes to pull the options quote pages, parse them, and insert the data for 5000 symbols into MySQL. I guess this isn't too bad, but I'd be happy if I could improve the speed. I haven't done much tuning beyond using Eclipse's profiling tools. Any suggestions on how to approach tuning this?
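
One possible direction, sketched under an assumption the profiler would need to confirm: if most of the 90 minutes is spent waiting on HTTP responses rather than parsing, then fetching several symbols at once with a small fixed thread pool may help. The class name `ParallelFetchSketch` and the `fetchAndParse` function are hypothetical placeholders for whatever the existing per-symbol code does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: runs an existing per-symbol "fetch and parse" step
// on a bounded number of worker threads.
public class ParallelFetchSketch {

    // Applies fetchAndParse to every symbol using at most `threads` workers.
    // Results come back in the same order as the input list.
    static <R> List<R> processAll(List<String> symbols,
                                  Function<String, R> fetchAndParse,
                                  int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Queue one task per symbol; the pool limits how many run at once.
            List<Future<R>> pending = new ArrayList<>();
            for (String symbol : symbols) {
                pending.add(pool.submit(() -> fetchAndParse.apply(symbol)));
            }
            // Collect results in submission order. A failure in any task
            // surfaces here as an ExecutionException.
            List<R> results = new ArrayList<>();
            for (Future<R> future : pending) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

A call might look like `processAll(symbols, s -> parse(download(s)), 8)`, where `parse` and `download` stand in for the existing code and 8 is an arbitrary starting point to tune. A small thread count also keeps the load on the remote site modest. If the database inserts turn out to be the slow part instead, JDBC's `PreparedStatement.addBatch()` and `executeBatch()` are worth trying before anything more elaborate.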
  (#7 (permalink)) Old
Member
 
Posts: 1
Thanks: 0
Thanked 0 Times in 0 Posts
Join Date: Aug 2009
Default Try Selenium - 08-14-2009, 04:25 AM

The Selenium test framework has Java APIs. It may be possible to use it to screen scrape, though it would do so through your browser (you would see your browser pop up and go and retrieve the data).

I dunno about the legal issues, and you are at the mercy of the site owner changing their layout...
  (#8 (permalink)) Old
Moderator
 
jwenting's Avatar
 
Posts: 99
Thanks: 0
Thanked 8 Times in 8 Posts
Join Date: Feb 2008
Default 09-08-2009, 07:40 AM

The legal issues are simple: it's almost always against the TOS of a site to redistribute its content.

That may not be a criminal case in most countries, but it's certainly a civil case of breach of contract.
  (#9 (permalink)) Old
Member
 
Posts: 3
Thanks: 1
Thanked 0 Times in 0 Posts
Join Date: Feb 2009
Default 09-24-2009, 04:01 AM

Quote:
Originally Posted by mikebanderson View Post
I want to screen scrape volume and quote data from an online broker [...] i.e. it is using JavaScript to prevent systems from hitting links directly and scraping the data. [...] so I can scrape the screen.
Do you read anything wrong here?
Tags
screen scrape https

Copyright 1997-2009, DZone, Inc.