Easy way to extract IP Addresses from multi-page list of VPN Gate Volunteers?
Here is the URL http://www.vpngate.net/en/volunteer_servers.aspx
There are 170+ pages, and the list is updated regularly. I want to extract all the IP addresses listed on those pages. Is there an easy way to do this quickly, so that I can get a full list of IP addresses from all 170+ pages?
Comments
regex + python + bs4
It's easy, but I suspect you will be doing something evil there, so no free script for you.
Nothing evil. I just want to block all of them from my site because I am tired of all those spammers/abusers using Korean/Japanese/Vietnamese/Thai and other East Asian IP addresses through VPN Gate/SoftEther.
How do they spam/abuse you? I still don't get it as a valid reason.
I run a community site with chat rooms. If a user breaks the rules again and again, we have to kick and sometimes ban them, but some of them come back using the VPN Gate/SoftEther VPN to abuse and spam the chat rooms. There are 30,000+ IP addresses in VPN Gate, which makes it hard to block them.
If they want to come back, they can also choose something like a commercial VPN with a free trial, a free proxy, or Tor (or VPN Gate + Tor). Are you going to ban all of them?
By the way, this is the first time I have seen someone other than a censorship authority who wants to extract and ban VPN Gate IPs.
Most of those commercial VPN services are on networks like OVH, Hetzner, DO, etc., and I have already blocked most of them. Now my biggest worry is VPN Gate, because most of those abusers are coming through it using East Asian IP addresses associated with VPN Gate/SoftEther.
My sites have no content for that region, so I am not going to lose any real traffic. After blocking ASNs like OVH, Hetzner, etc., I saw a 3 to 5% decrease in traffic, but things are much better now compared with the past, and they will hopefully get even better once I get rid of these VPN Gate addresses.
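For anyone maintaining this kind of blocklist by hand, here is a minimal sketch of checking a visitor's address against a mix of single IPs and CIDR ranges with Python's standard `ipaddress` module. The function names `build_blocklist` and `is_blocked` are hypothetical, the networks are reserved documentation ranges rather than real provider ranges, and the sketch assumes IPv4 entries only.

```python
from ipaddress import collapse_addresses, ip_address, ip_network


def build_blocklist(entries):
    """Turn IPv4 strings (single IPs or CIDR ranges) into merged networks."""
    # strict=False tolerates entries such as "192.0.2.7/24" with host bits set.
    networks = [ip_network(entry, strict=False) for entry in entries]
    # collapse_addresses merges adjacent and overlapping ranges.
    return list(collapse_addresses(networks))


def is_blocked(ip, blocklist):
    """Return True if ip falls inside any network of the blocklist."""
    address = ip_address(ip)
    return any(address in network for network in blocklist)


# Placeholder entries from the reserved documentation ranges.
blocklist = build_blocklist(["192.0.2.0/25", "192.0.2.128/25", "198.51.100.7"])
print(is_blocked("192.0.2.200", blocklist))
print(is_blocked("203.0.113.1", blocklist))
```

A linear scan like this is fine for a few thousand entries. With tens of thousands of addresses, a firewall-level set (or a set of integers for exact-IP matches) would likely be a better fit.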
How do you know that it's VPN Gate IPs that are harassing you?
http://www.linuxjournal.com/content/downloading-entire-web-site-wget
http://superuser.com/questions/202818/what-regular-expression-can-i-use-to-match-an-ip-address
ubuntu@xxx:~$ cat * | grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'
96.41.5.5
96.41.5.5
96.50.101.31
96.50.101.31
96.51.233.188
96.51.233.188
96.51.61.185
96.51.61.185
97.106.251.108
97.106.251.108
97.112.175.149
97.112.175.149
97.117.88.196
97.117.88.196
97.121.102.187
....
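The same extraction can be sketched in Python, which also makes it easy to drop duplicates and to reject matches that are not valid addresses (for example an octet above 255). This is only an illustration of the regex approach, not the script used above; `extract_ips` is a hypothetical helper name, and the input is assumed to be the text of pages that were already downloaded.

```python
import re
from ipaddress import AddressValueError, IPv4Address

# Dots are escaped so they match a literal "." and not any character.
# The lookarounds stop a match from starting or ending inside a longer number.
IPV4_RE = re.compile(r"(?<!\d)(?:\d{1,3}\.){3}\d{1,3}(?!\d)")


def extract_ips(text):
    """Return the unique, valid IPv4 addresses in text, in numeric order."""
    found = set()
    for candidate in IPV4_RE.findall(text):
        try:
            found.add(IPv4Address(candidate))
        except AddressValueError:
            # Looks like an address but is not one, e.g. 999.1.1.1
            continue
    return [str(ip) for ip in sorted(found)]


sample = "<td>96.41.5.5</td><td>96.41.5.5</td> 999.1.1.1 97.106.251.108"
print(extract_ips(sample))
```

Sorting on `IPv4Address` objects rather than strings keeps the list in numeric order, which a plain text sort would not.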
You shouldn't crawl their website.
It's not that hard to find out. There are multiple methods, and one of them is to search Google for the IP address restricted to their site, because they list many of those IP addresses on their server list pages. For example, try:
site:www.vpngate.net "42.113.194.181"
That was really helpful! Thanks, I was able to find 30,000+ IP addresses within a few minutes using these two methods.
That's true. I just found that those lists are not up to date, because most of the current IP addresses I see in their VPN client can't be found in those lists.
Current IP addresses? Lol, how do you compare and know the current IP addresses without using them?
I can't believe you are googling 30k+ IPs :P
btw I'm done here.
By current IP addresses I mean the fresh (active) list of servers that users can use through their VPN client.
I am posting a screenshot from their VPN client to give you an idea of how all those spammers/abusers can use these VPN Gate IP addresses. Every time you click the refresh button you see more fresh servers (and the list is long; see the scroll bar).
By the way, I am just curious: do I need a certificate from you that it's not for anything evil? The person who wanted to help already did so without even asking a stupid question.
People on USENET give out "Useless Use of cat" demerits and this would qualify. There is no need for cat in that command line. grep can work with files/wildcards just fine.
Applying regex to html is almost always the wrong approach. In this case, that would probably work because the HTML makes it easy. But if there were, say, no line breaks in that HTML or if it was a complex table, regex would be tough. Regardless, it's brittle.
The proper way is to use the HTML parsing library of whatever language you like and either get a stream of tags and attributes or delve selectively.
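As a rough sketch of the "stream of tags" style, the following uses Python's standard-library `html.parser` (bs4, mentioned earlier in the thread, would work along the same lines). It assumes the addresses appear as whitespace-separated text inside `<td>` cells, which is a guess about the page layout and not something verified here; `CellIPCollector` and `ips_from_html` are hypothetical names.

```python
from html.parser import HTMLParser
from ipaddress import ip_address


class CellIPCollector(HTMLParser):
    """Collect IP addresses found in the text of <td> cells."""

    def __init__(self):
        super().__init__()
        self._in_cell = False
        self._chunks = []
        self.ips = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_cell:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag != "td" or not self._in_cell:
            return
        self._in_cell = False
        # Join with spaces so text from nested tags does not run together.
        for token in " ".join(self._chunks).split():
            try:
                ip_address(token)
            except ValueError:
                continue  # hostname, country name, or other cell text
            if token not in self.ips:
                self.ips.append(token)


def ips_from_html(html):
    """Parse an HTML string and return the IPs found in table cells."""
    collector = CellIPCollector()
    collector.feed(html)
    collector.close()
    return collector.ips


page = "<table><tr><td><b>host-a</b><br><span>96.41.5.5</span></td><td>Japan</td></tr></table>"
print(ips_from_html(page))
```

Unlike a line-oriented regex, this keeps working if the markup has no line breaks, since it only depends on where the cells start and end.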
VPNGate provides a CSV list, which is a lot easier to parse, so you do not need to grab the raw HTML. As mentioned earlier, they do not show all VPNGate relays regardless of how many times you refresh, and they will provide "fake" relays if they think you're trying to get a list of their relays. This means they would effectively have direct access to your firewall rules (they could get you to ban whatever IP they want), so it's not recommended.
Yeah, in the end I found it's a waste of time. It's not only VPNGate; there are services like HideMyAss with multiple servers in 200 countries and 200k different IPs that allow unlimited server/IP switching across countries/cities/servers/subnets, so it's impossible to stop it!
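A minimal sketch of pulling one column out of such a CSV with Python's `csv` module is below. The thread does not show the file layout, so the details here are assumptions: a header row that may start with `#`, marker lines starting with `*` that should be skipped, and a column named `IP`. The helper name `ips_from_csv` and the sample rows are made up for illustration. Values that do not parse as addresses are dropped, but that does nothing against the "fake" relays described above.

```python
import csv
import io
from ipaddress import ip_address


def ips_from_csv(text, column="IP"):
    """Return the valid IP addresses found in the named CSV column."""
    lines = []
    for line in text.splitlines():
        # Assumed layout: blank lines and "*" marker lines carry no data.
        if not line.strip() or line.startswith("*"):
            continue
        if not lines:
            # Assumed layout: the header row may be prefixed with "#".
            line = line.lstrip("#")
        lines.append(line)

    ips = []
    for row in csv.DictReader(io.StringIO("\n".join(lines))):
        value = (row.get(column) or "").strip()
        try:
            ip_address(value)
        except ValueError:
            continue  # empty or malformed cell
        ips.append(value)
    return ips


# Made-up sample data in the assumed layout.
sample = "*vpn_servers\n#HostName,IP,Score\nexample-a,96.41.5.5,100\nexample-b,not-an-ip,50\n*\n"
print(ips_from_csv(sample))
```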
Use getipintel.net's API if you want a free solution or maxmind.com/blocked.com for a paid solution.
I don't think maxmind / blocked can detect vpngate though I'm not 100% sure.
Time-consuming and difficult, but not impossible.
I can check Maxmind's database if somebody can provide some IPs (VPNGate is blocked by my proxy so I can't get them myself).
I PM'd you a couple of IPs.