How to download a list of files and export dead links

sonic Veteran

I have a txt file containing 24k image links.
I use the command: wget -i files.txt to download them.

But there are some dead links, and I want to export them to a txt file. Please let me know how to do this, many thanks!

Comments

  • Verelox Member
    edited April 2015

    Here you go! This is a bash script which uses ping to check whether the host is reachable or not; you may adjust the values "files.txt" (input) and "deadlinks.txt" (output) as well:

    #!/bin/bash
    INPUT="files.txt"
    OUTPUT="deadlinks.txt"
    
    while read -r line
    do
        if [ ! -f "$INPUT" ]; then
            echo "The input file couldn't be located"
            break
        fi
        # ping expects a hostname, not a full URL, so strip the line down to its host first
        host=$(echo "$line" | awk -F/ '{print $3}')
        if ! ping -c1 -W1 "$host" > /dev/null 2>&1
        then
            echo "$line" >> "$OUTPUT"
        fi
    done < "$INPUT"

    Edit: added more code to stop if the file is no longer reachable during the loop (thanks to @Amitz)

  • Amitz Member

    Great so far, but what if the host is reachable, but the file no longer exists?

    Thanked by: sonic
  • ehab Member
    edited April 2015

    curl's return status can be handy here: 200 is OK, anything else goes to the bin.
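
    A minimal sketch of that idea, assuming the same "files.txt"/"deadlinks.txt" names as above: curl's -w "%{http_code}" prints only the HTTP status code, and any URL that doesn't come back as 200 is treated as dead.

    #!/bin/bash
    INPUT="files.txt"
    OUTPUT="deadlinks.txt"
    
    while read -r url
    do
        # -o /dev/null discards the body, -sL follows redirects silently,
        # -w "%{http_code}" prints just the numeric status code
        status=$(curl -o /dev/null -sL -w "%{http_code}" "$url")
        if [ "$status" != "200" ]; then
            echo "$url" >> "$OUTPUT"
        fi
    done < "$INPUT"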

    Thanked by: Amitz, sonic
  • Bochi Member

    wget has some nice return codes: http://www.gnu.org/software/wget/manual/html_node/Exit-Status.html
    Code 8 might be what you need; just use it in a tiny shell script.
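
    A minimal sketch along those lines, assuming the same "files.txt"/"deadlinks.txt" names: each URL is downloaded with wget, and exit status 8 (server issued an error response, e.g. a 404) sends it to the dead-link list.

    #!/bin/bash
    INPUT="files.txt"
    OUTPUT="deadlinks.txt"
    
    while read -r url
    do
        wget -q "$url"
        # Exit status 8 means the server issued an error response such as 404
        if [ $? -eq 8 ]; then
            echo "$url" >> "$OUTPUT"
        fi
    done < "$INPUT"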

    Thanked by: Amitz, sonic
  • sonic Veteran

    @Verelox said:
    Here you go! This is a bash script which uses ping to check whether the host is reachable or not [...]

    Does this bash script only export the broken links? How can I also download the files and then export the broken links?

    BTW, many thanks for all the help!

  • Verelox Member

    @sonic said:
    Does this bash script only export the broken links? How can I also download the files and then export the broken links?

    Sorry, I just re-read your main post and realized that you would like to validate whether the URL returns a valid image or not. I have modified the script to do the following:

    • Read the file named in the $INPUT variable line by line
    • Output any line that returns a "404 Not Found" status code
    • Output any line that doesn't return a 404 but isn't an image either

    Here you go:

    #!/bin/bash
    INPUT="files.txt"
    OUTPUT="deadlinks.txt"
    
    while read -r line
    do
        if [ ! -f "$INPUT" ]; then
            echo "The input file couldn't be located"
            break
        fi
        # Fetch only the headers with a single request, following redirects
        headers=$(curl -sIL "$line")
        if echo "$headers" | grep -q "404 Not Found"
        then
            echo "$line" >> "$OUTPUT"
        elif ! echo "$headers" | grep -qi "Content-Type: image"
        then
            echo "$line" >> "$OUTPUT"
        fi
    done < "$INPUT"
    
  • joepie91 Member, Patron Provider
    edited April 2015

    For what it's worth, wget has a logging flag. You can just enable that, and parse out the 404s afterwards. That would save you from doing requests more than once.
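
    A minimal sketch of that approach, assuming a recent wget with English-language log output and the same file names as above: download everything once while keeping a log, then pull the URLs of the 404s out of the log afterwards.

    #!/bin/bash
    # Download the whole list once, writing wget's progress messages to a log file
    wget -i files.txt -o wget.log
    
    # In the log, each request's URL sits on a line starting with "--<timestamp>--",
    # and a failed request is followed by an "ERROR 404: Not Found." line;
    # remember the last URL seen and print it whenever a 404 error shows up
    awk '/^--/ { url = $NF } /ERROR 404/ { print url }' wget.log > deadlinks.txt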
