Box requirements for Selenium automated script
Hey guys,
I have a Java script that needs to run automatically every 10 minutes.
The script uses Selenium WebDriver to open Iceweasel, browse a couple of websites, pull some data from them, store it in a CSV, and send it to my other box.
I already have this script working in Debian inside VirtualBox, and it works like a charm. However, I have no idea what kind of VPS specs I would need for it. Does anyone have any experience with that? Or how could I measure the requirements for this purpose?
Thanks everyone!
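For the every-10-minutes part, a standard cron entry is the usual approach on Debian; the script path and log path below are placeholders:

```shell
# crontab -e  --  run the scraper every 10 minutes (placeholder paths)
*/10 * * * * /home/user/run-scraper.sh >> /home/user/scraper.log 2>&1
```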
Comments
stupid question: what is the spec of the vm you are using locally?
1 core of a Late 2013 MacBook Pro and 512MB RAM.
Is tweaking these settings a reliable way to find the limit?
Browsers tend to eat lots of RAM. And selenium just fires up browsers.
I'd say 1-2GB RAM.
1-2GB RAM would be enough, but I think you may also need a decent CPU.
Why do you need a web browser at all? You could just script the whole thing, and you'd need far fewer resources.
Why not use Selenium with the PhantomJS driver?
https://github.com/detro/ghostdriver
I have it running on a Raspberry Pi and it doesn't eat much RAM.
Regards
Selenium itself doesn't eat memory; Firefox does: roughly 300MB for basic usage, if your 'target' sites don't carry a lot of Java/Flash/other junk. So X + Firefox + Selenium works on a 1GHz CPU with 512MB RAM. Even better, create a low-spec virtual machine and see how fast it runs (I don't know if VirtualBox is the best way to compare, but it's fine). I used Python + Selenium + Firefox and it worked well for me; if a 500ms-3s delay when opening a new Firefox instance is not an issue, you're fine.
About PhantomJS: I tried it before the regular browsers, but the websites I need to visit don't work correctly with it; maybe they have a script that blocks headless browsing...
Now I'll try to find the bottleneck. If anyone is interested, I'll report back.
Yes, there are fingerprints for PhantomJS, but that's something that can be overcome. I'd agree a headless browser might be fine for the task; only the OP would know.
OP, I had a project that automated 100 Firefox profiles concurrently, and 300MB is a good ballpark, though you can tweak your .ini file to get it down; it tends to reach 300MB on heavy pages like FB or after you've been browsing a while. You'll likely need closer to 200MB if the pages aren't JS-heavy. I'd simply use curl/wget whenever possible, or MozRepl within Firefox when it's not (which is what Selenium uses in the case of Firefox).
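On the tweaking point: Firefox's per-profile settings actually live in prefs.js/user.js rather than an .ini file. A few commonly cited memory-related prefs, as a hedged example (verify they still exist and behave this way in your Firefox/Iceweasel version):

```js
// user.js in the profile directory
user_pref("browser.cache.disk.enable", false);            // skip the disk cache
user_pref("browser.cache.memory.capacity", 16384);        // cap the memory cache (KiB)
user_pref("browser.sessionhistory.max_entries", 5);       // shorter back/forward history
user_pref("browser.sessionhistory.max_total_viewers", 0); // don't cache rendered pages
```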
@VCT, when I used Selenium I read somewhere that when Firefox's 'webdriver' is used, a public JavaScript variable is created which cannot be modified or deleted (something like that), so the webdriver can be detected (it's a standard thing, if I remember right). But I used Selenium without any detection issues.
@ricardo, curl/wget wouldn't be of much help, I think, because I need to perform multiple logins through weird JavaScript forms. Or is that just a lack of skill on my part? LOL
@getvps, the problem isn't with Selenium, I believe. The problem is with PhantomJS. Selenium doesn't get detected when I use the Firefox driver.
Most of the time curl/wget is fine. If the data you're after is obfuscated too much by JS/ajax (hardly ever) or there's some bot blocking/security on the site (sometimes), then perhaps a browser is the path of least resistance.
In general you just observe each request/step you need to get at the data. 99% of the time it's a) post user/pass b) accept/store cookie c) load up page(s)
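Those three steps can be sketched in stdlib Python; all URLs, form-field names, and function names below are made up, and the real field names come from inspecting the target site's login form:

```python
import csv
import io
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

def make_session():
    # b) an opener that accepts and stores cookies across requests
    return urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar()))

def build_login_payload(username, password):
    # a) form-encode the credentials; field names are hypothetical
    return urllib.parse.urlencode(
        {"username": username, "password": password}).encode()

def rows_to_csv(rows, header=("page", "value")):
    # serialize the scraped rows to CSV text, ready to send to the other box
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()

def scrape(opener, login_url, data_urls, username, password):
    # a) POST user/pass; b) the session cookie rides along automatically;
    # c) load up the data page(s)
    opener.open(login_url, data=build_login_payload(username, password))
    return [(url, opener.open(url).read()) for url in data_urls]
```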
I am using PhantomJS in Python with Selenium's DesiredCapabilities and a custom user_agent, because many sites drop the connection if no user_agent is found. This is an example of my code:
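It looks roughly like this (a sketch: the user-agent string is a placeholder, and actually launching the driver needs selenium plus the phantomjs binary on PATH, so that part is kept in a separate helper):

```python
def phantom_config(user_agent):
    """Build the PhantomJS desired capabilities and service args."""
    caps = {
        "browserName": "phantomjs",
        # inject a custom user agent so sites don't drop the connection
        "phantomjs.page.settings.userAgent": user_agent,
    }
    # needed for many HTTPS sites
    service_args = ["--ssl-protocol=any", "--ignore-ssl-errors=true"]
    return caps, service_args

def launch(user_agent):
    # requires `pip install selenium` and the phantomjs binary on PATH
    from selenium import webdriver
    caps, args = phantom_config(user_agent)
    return webdriver.PhantomJS(desired_capabilities=caps,
                               service_args=args)
```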
Please pay attention to the service_args=['--ssl-protocol=any','--ignore-ssl-errors=true']: if you use PhantomJS to connect over HTTPS, you must specify them to avoid failures on many HTTPS connections.
Regards.
That's very interesting, @Juanako.
Thanks!