Box requirements for Selenium automated script

VCTVCT Member

Hey guys,
I have a Java script that needs to be run every 10 minutes automatically.
This script uses Selenium WebDriver to open Iceweasel, browse a couple of websites, get some data from them, store it in a CSV file and send it to my other box.
I already have this script working on Debian inside VirtualBox and it works like a charm. However, I have no idea what kind of VPS specs I would need for it. Does anyone have any experience with that? Or how could I measure the requirements?

Thanks everyone!

Comments

  • tehdantehdan Member

    stupid question: what is the spec of the vm you are using locally?

    Thanked by 1VCT
  • VCTVCT Member

    @tehdan said:
    stupid question: what is the spec of the vm you are using locally?

    1 core on a Late 2013 Macbook Pro and 512MB RAM.
    Is tweaking these settings a reliable way to find the limit?

  • jcalebjcaleb Member

    Browsers tend to eat lots of RAM. And Selenium just fires up browsers.

    I'd say 1-2GB of RAM.

    Thanked by 1VCT
  • 1-2GB RAM would be enough. But I think you may need a good CPU

    Thanked by 1VCT
  • Why do you need a web browser when you could just script the whole thing and use far fewer resources?

    Thanked by 1VCT
  • Why not use Selenium with the PhantomJS driver?

    https://github.com/detro/ghostdriver

    I have it running on a Raspberry Pi and it doesn't eat much RAM.

    Regards

    Thanked by 2cfgguy VCT
  • getvpsgetvps Member

    :) Selenium doesn't eat memory. Firefox does: ~300MB for basic usage, if your 'target' sites don't have a lot of Java/Flash/other shit. So X + Firefox + Selenium works with 1GHz and 512MB RAM, but it's better to create a low-spec virtual machine to see how fast it runs (I don't know if VirtualBox is the best way to compare, but it's fine). I used Python + Selenium + Firefox and it was fine for me. If a 500ms-3s delay for opening a new Firefox browser is not an issue, you're fine :)

    Thanked by 1VCT
  • VCTVCT Member

    About PhantomJS, I tried using it before regular browsers, but the websites I need to visit do not work correctly with PhantomJS, maybe they have a blocking script for headless browsing...

    Now I'll try to find a bottleneck for this. If anyone is interested, I'll report back.
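
    One rough way to look for a memory limit is to run the job once and ask the OS for the peak memory of its child processes. This is a minimal sketch, assuming a Unix-like box (Python's `resource` module is not available on Windows). The function name `peak_child_rss_mb` is made up for illustration. Note that `ru_maxrss` reports the largest single child, not the sum of the whole process tree, so with a JVM plus a browser it is only a lower bound.

```python
import resource
import subprocess
import sys


def peak_child_rss_mb(command):
    """Run `command` to completion and return the peak resident set size
    (in MB) of the largest child process waited for so far."""
    subprocess.run(command, check=True)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return usage.ru_maxrss / divisor
```

    For example, `peak_child_rss_mb(["java", "-jar", "scraper.jar"])` (a hypothetical jar name) would run the scraper once and report the figure. Watching `free -m` or `top` inside the VM while the script runs gives the total for the whole tree.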

  • ricardoricardo Member
    edited July 2015

    maybe they have a blocking script for headless browsing

    Yes there are fingerprints for phantomjs but it's something that can be overcome. I'd agree a headless browser might be fine for the task, only the OP would know.

    OP, I had a project that automated 100 Firefox profiles concurrently. 300MB is a good ballpark, though you can tweak your .ini file to get it down; it tends to reach 300MB for heavy pages like FB or when you've been browsing a while. You'll likely need closer to 200MB if the pages aren't JS heavy. I'd simply use curl/wget when at all possible, or mozrepl within Firefox when it's not (which is what Selenium uses in the case of Firefox).
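
    The per-browser figures above turn into a quick back-of-the-envelope estimate. A sketch only: the 300MB default comes from the ballpark in this thread, while `base_overhead_mb` (OS, X server, JVM and so on) is an assumed placeholder you should measure yourself.

```python
def estimate_ram_mb(instances, per_browser_mb=300, base_overhead_mb=256):
    """Rough RAM estimate: one browser footprint per concurrent instance
    plus a fixed overhead for everything else on the box (assumed value)."""
    return instances * per_browser_mb + base_overhead_mb
```

    With these assumptions a single Firefox instance lands a little over 512MB, which fits the 1GB suggestions earlier in the thread.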

    Thanked by 1VCT
  • getvpsgetvps Member

    @VCT, when I used Selenium, I read somewhere about Firefox's 'webdriver': when it is used, a public JavaScript variable is created which cannot be modified or deleted (something like that), so WebDriver can be detected (it's a standard thing, if I remember right). But I used Selenium without any detection issues.

    Thanked by 1VCT
  • VCTVCT Member

    @ricardo, curl/wget wouldn't be of much help, I think, because I need to perform multiple logins through weird JavaScript forms. Or is this a problem of low skill on my part? LOL

    @getvps, the problem isn't with selenium, I believe. The problem is with PhantomJS. Selenium doesn't get detected when I use the Firefox driver.

  • curl/wget wouldn't be of much help, I think, because I need to perform multiple logins through weird javascript forms

    Most of the time curl/wget is fine. If the data you're after is obfuscated too much by JS/ajax (hardly ever) or there's some bot blocking/security on the site (sometimes), then perhaps a browser is the path of least resistance.

    In general you just observe each request/step you need to get at the data. 99% of the time it's a) post user/pass b) accept/store cookie c) load up page(s)
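
    The a/b/c steps above can be sketched without a browser using only Python's standard library. This is an illustration under assumptions, not a recipe for any particular site: the form field names `username` and `password`, and the function names, are hypothetical, and a real site may need extra hidden fields or headers that you'd find by watching the requests in the browser's network tab.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return an opener that keeps cookies between requests, plus its jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(login_url, username, password):
    """Build the POST request for step a). Field names are assumptions."""
    form = urllib.parse.urlencode({"username": username, "password": password})
    return urllib.request.Request(login_url, data=form.encode("ascii"), method="POST")


def fetch_after_login(login_url, data_url, username, password):
    """a) post user/pass, b) let the opener store the cookie, c) load the page."""
    opener, _jar = make_session()
    with opener.open(build_login_request(login_url, username, password)):
        pass  # the response body is not needed; the session cookie is now in the jar
    with opener.open(data_url) as response:
        return response.read()
```

    If the login only works once the page's JavaScript has run, this approach won't be enough and a real browser is the easier path, as said above.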

  • @VCT said:
    About PhantomJS, I tried using it before regular browsers, but the websites I need to visit do not work correctly with PhantomJS, maybe they have a blocking script for headless browsing...

    Now I'll try to find a bottleneck for this. If anyone is interested, I'll report back.

    I am using PhantomJS from Python, with Selenium DesiredCapabilities and a custom user_agent, because many websites drop the connection if no user agent is found. This is an example of my code:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.support.ui import Select
    from selenium.common.exceptions import NoSuchElementException
    from selenium.common.exceptions import NoAlertPresentException
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

    # Path to the PhantomJS binary
    phantomjs_path = '/home/user/phantomjs-linux-armv6l-master/phantomjs-1.9.0-linux-armv6l/bin/phantomjs'

    # Present a regular desktop browser user agent
    user_agent = (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) " +
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.57 Safari/537.36"
    )

    # Copy the default PhantomJS capabilities and override the user agent
    dcap = dict(DesiredCapabilities.PHANTOMJS)
    dcap["phantomjs.page.settings.userAgent"] = user_agent

    driver = webdriver.PhantomJS(
        executable_path=phantomjs_path,
        desired_capabilities=dcap,
        service_args=['--ssl-protocol=any', '--ignore-ssl-errors=true'],
    )

    driver.get("my_https_web")
    ...
    

    Please pay attention to service_args=['--ssl-protocol=any','--ignore-ssl-errors=true']: if you use PhantomJS to connect over HTTPS, you must specify them to avoid failures on many HTTPS connections.

    Regards.

    Thanked by 1VCT
  • VCTVCT Member

    That's very interesting, @Juanako.
    Thanks!
