How to Cache or copy a website?
Tenshi_420
Member
I am trying to cache or copy an entire website's contents to save as a backup. I am doing this out of self-preservation, not to clone the site or set up phishing. How can I go about this?
The site contains mostly text with images. Nothing too fancy, just cache/copy.
If something happens, I want to be able to offer copies in an emergency. It's a bit of a secret and very time-sensitive project that I am trying to accomplish as soon as I can.
Thanks for taking time to read. (:
P.S. I do not own or have access to this website other than its public www.
Comments
HTTrack should do the trick: http://www.httrack.com/
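If you prefer the command line, HTTrack also ships a CLI. A rough example of a mirror run; the URL, output directory, and filter pattern below are only placeholders:
httrack "http://www.domain.com/" -O "$HOME/mirrors/domain" "+*.domain.com/*" -v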
Wget will do the job
wget \
  --recursive \
  --no-clobber \
  --page-requisites \
  --html-extension \
  --convert-links \
  --restrict-file-names=windows \
  --domains www.domain.com \
  --no-parent \
  http://www.domain.com/folder/
Just a friendly reminder from the admin of a website that gets "copied" every day by people who want to "archive" the content: it's a pain in the ass, can put heavy load on the server, and causes quite a bit of trouble for the admin. Be gentle...
For the others, thanks. I'll try it when I have more time. And:
Oh, thanks for the heads up. I think I would cache (when I can) a page every 9 seconds, maybe write a cheesy but workable bash script and cron it (with supervision).
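For reference, a minimal sketch of such a script, assuming wget is used; the URL, output path, and cron schedule are placeholders, and --wait=9 gives the roughly one-request-per-9-seconds pacing mentioned above:
#!/usr/bin/env bash
# backup-site.sh -- hypothetical example; URL and output directory are placeholders
set -euo pipefail

URL="http://www.domain.com/"
DEST="$HOME/site-backups/$(date +%F)"   # one dated folder per run
mkdir -p "$DEST"

# --wait=9 pauses ~9 seconds between requests to keep the load on the server low
wget \
  --recursive \
  --page-requisites \
  --convert-links \
  --no-parent \
  --wait=9 \
  --directory-prefix="$DEST" \
  "$URL"

# example crontab entry to run it nightly at 02:00 (supervise the first few runs):
#   0 2 * * * /home/user/backup-site.sh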
What site do you run O.O
I am responsible for the technical part (not the content) of a niche adult gallery. 15,000+ images and there are people who try to "cache" them all locally on a daily basis...
@Amitz can you share a link please?
A link would be better.
No, sorry.
I do not want the site to get associated (publicly) with me. I just take care of the backend, not the content.
Use wget, as suggested. There are lots of command-line options to do what you want, including "mirror".
And there's a "wait" parameter (-w) to insert a delay between requests so you don't create an issue like this. Use a healthy wait and queue the job up before going to bed... Next morning you're good to go.
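For example, something along these lines (the domain and delay values are only illustrative):
wget --mirror --convert-links --page-requisites --no-parent --wait=10 --random-wait http://www.domain.com/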
easy way to do it on your home desktop:
WinHTTrack
Download a World Wide Web site from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer.
Personally I block it on my servers for the abuse issues that @Amitz raised above.
Create a weekly archive by cron (it may create some load on the server, but it's only once a week). Use nginx on an unmetered server to host the archive. :P
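A rough sketch of that setup, assuming the mirror command lives in a script like the one sketched earlier; the schedule, paths, and hostname are placeholders:
# crontab: refresh the archive every Sunday at 03:00
0 3 * * 0 /home/user/backup-site.sh

# minimal nginx server block to serve the archived copy
server {
    listen 80;
    server_name archive.example.com;
    root /home/user/site-backups;
    autoindex on;   # simple directory listing of the dated snapshots
}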