Youtube to MP3 Site Implementation Review/Thinktank

daxterfellowes · March 2016

I wanted to get some input on how I'm implementing a Python-based YouTube to MP3 site. My stack isn't running at the moment since I had some implementation issues and was moving providers, but I'd like to hear feedback on how I could improve it.

Two VMs:
One handles front end
One handles the 'API'/queue

Currently I have a Django front end that takes an input URL (a single video URL) and queries the API, which runs the URL through youtube-dl and returns whether it's a YT link or not.

First step: we record an entry (for analytical purposes) and check the DB to see if there is already an entry for the video (via YT ID). If so, we grab the URL reference identification. The URL reference corresponds to the filename located within the CDN and a small amount of metadata. We then immediately serve the file from the CDN location.
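A minimal sketch of that lookup step, assuming the ID can be pulled from the URL locally instead of asking youtube-dl. The `CONVERTED` dict is a hypothetical stand-in for the DB table, and the ID and filename in it are made up.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical stand-in for the DB table: YouTube ID -> CDN filename.
CONVERTED = {"abc123XYZ_-": "a1b2c3.mp3"}

YOUTUBE_HOSTS = ("youtube.com", "www.youtube.com", "m.youtube.com")
SHORT_HOSTS = ("youtu.be", "www.youtu.be")


def extract_video_id(url):
    """Return the video ID from a watch or youtu.be URL, or None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host in SHORT_HOSTS:
        # Short links carry the ID as the path: https://youtu.be/<id>
        return parsed.path.lstrip("/") or None
    if host in YOUTUBE_HOSTS and parsed.path == "/watch":
        # Regular links carry the ID in the "v" query parameter.
        return parse_qs(parsed.query).get("v", [None])[0]
    return None


def lookup(url, table=CONVERTED):
    """Return the CDN filename for an already-converted video, else None."""
    video_id = extract_video_id(url)
    if video_id is None:
        raise ValueError("not a recognised YouTube URL")
    return table.get(video_id)
```

A `None` from `lookup` is the cache miss that would trigger the conversion task.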

If not, we then send a cURL request to the API with the URL (URL-encoded) and YouTube ID (as an identifier), using Celery/RabbitMQ to handle callbacks on task completion and to manage a queue. This part is a bit hazy and poorly executed, but I couldn't figure out a better task management system (issue #1). The task then grabs the video, downloads it to temp storage, transcodes it to MP3, uploads it to managed CDN storage (issue #2, limited space), and then calls back to the front-end server to record the filename (since we don't know how it'll be stored on the filesystem, issue #3 for blackbox syndrome).
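One possible way around issue #3, sketched under the assumption that youtube-dl is invoked as a command-line tool with ffmpeg available: name the output after the YouTube ID via an output template, so the filename is known before the task runs. The function names and the `workdir` parameter are illustrative only.

```python
import os
import subprocess


def build_command(url, video_id, workdir):
    """Build a youtube-dl command that extracts audio and transcodes to MP3."""
    # youtube-dl fills in %(ext)s itself; everything else is fixed by us.
    template = os.path.join(workdir, video_id + ".%(ext)s")
    return [
        "youtube-dl",
        "--no-playlist",          # single video only
        "--extract-audio",        # drop the video stream
        "--audio-format", "mp3",  # transcode via ffmpeg
        "--output", template,
        url,
    ]


def convert(url, video_id, workdir):
    """Run the download and transcode, returning the expected MP3 path."""
    subprocess.run(build_command(url, video_id, workdir), check=True)
    return os.path.join(workdir, video_id + ".mp3")
```

With the path predictable, the callback would only need to report success or failure rather than the filename.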

As you can see, it can get quite cluttered, and it's probably not the best way of checking or storing. Any ideas for improvement would be greatly appreciated. I know there are OOTB solutions for this, but I was looking to roll my own for the experience.

To add, I was thinking of serving the content from the API server as the origin server, utilizing the CDN for what it actually is. This would cut down on the time it takes to upload and reduce managed content costs.

nonissue · March 2016

1) Why not just use youtube-dl to download the video's audio? Read the man page.
2) Why do you want to do this? You open yourself up to a ton of liability and will be shut down almost immediately.

Frecyboy · March 2016

There are already way too many youtube-dl sites, and I guess most people prefer using one download manager for all their stuff anyway.

theroyalstudent · March 2016

Erm... https://github.com/Rudloff/alltube

daxterfellowes · March 2016

@nonissue said:
1) Why not just use youtube-dl to download the video's audio? Read the man page.
2) Why do you want to do this? You open yourself up to a ton of liability and will be shut down almost immediately.

I do use youtube-dl on the backend to handle the conversion process and metadata checking.

@theroyalstudent said:
Erm... https://github.com/Rudloff/alltube

As I said, I know there are many out-of-the-box solutions, but I was looking to build my own for the experience.

@Frecyboy said:
There are already way too many youtube-dl sites, and I guess most people prefer using one download manager for all their stuff anyway.

I never expected this to get big or anything, as there are so many players already in a saturated market. I just wanted to build my own for personal use.

lybxlpsv · March 2016

@nonissue said:
1) Why not just use youtube-dl to download the video's audio? Read the man page.

This. I don't get this whole YouTube to MP3 thing: it re-encodes YouTube's M4A/AAC to MP3, which decreases quality, and those online converters grab the whole video and then convert the audio to MP3. I'd rather get the M4A/AAC/Opus stream directly from YouTube, which is faster than waiting for an online converter to grab the video and then convert it.

What kind of audio player doesn't support AAC anyway? Even my old Nokia phone supports HE-AAC v2.

Not converting to MP3 also saves you CPU time and, depending on bitrate, bandwidth.
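A rough sketch of what skipping the re-encode could look like, assuming youtube-dl is called from the command line. The format selector asks for the best M4A audio stream and falls back to whatever the best audio stream is. The function name and `output_template` parameter are illustrative.

```python
def build_native_audio_command(url, output_template):
    """Build a youtube-dl command that fetches an audio stream as served."""
    return [
        "youtube-dl",
        "--no-playlist",
        # Prefer M4A audio; otherwise take the best audio stream available.
        # No --extract-audio/--audio-format, so nothing is transcoded.
        "--format", "bestaudio[ext=m4a]/bestaudio",
        "--output", output_template,
        url,
    ]
```

Because no transcode happens, the file extension depends on what YouTube serves, so the template should keep `%(ext)s`.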

rincewind · March 2016

I think you are using the message broker incorrectly. The component that submits a fetch request should not have to wait or get called back on completion. Here is one design, based on QUEUE/PIPELINE:

Front-end submits a FETCH request to broker, makes entry in DB.
A bunch of fetcher threads look for FETCH entries and attempt to download. On success, they submit a TRANSCODE request to the broker.
Transcoder threads subscribe to transcode requests. On completion, submit UPLOAD request.
Uploader waits on broker for UPLOAD requests, uploads, updates DB and informs front-end.

The message broker is the backbone, and individual components look for special types of requests and hand over ownership once they are done with their part. Issue #3 is non-existent: you shouldn't care about the filename until the upload is complete anyway.

You can handle failures and retries. For instance, if FETCH/DOWNLOAD fails, you can resubmit it to the broker and increment a failed_attempts field as part of the request. If failed_attempts > max_retries, for instance, discard request and inform front-end.
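To make the hand-over and retry idea concrete, here is a single-process sketch that uses the standard library's `queue.Queue` as a stand-in for a real broker. The message shape, the stub handlers and `MAX_RETRIES = 3` are assumptions for illustration, and a real setup would run each stage as separate workers.

```python
import queue

MAX_RETRIES = 3  # assumed limit; pick whatever suits the workload


def fetch(msg):
    return "TRANSCODE"   # stub: download succeeded, hand over to transcoder


def transcode(msg):
    return "UPLOAD"      # stub: transcode succeeded, hand over to uploader


def upload(msg):
    return None          # stub: nothing left to do after upload


HANDLERS = {"FETCH": fetch, "TRANSCODE": transcode, "UPLOAD": upload}


def run_pipeline(broker, handlers, max_retries=MAX_RETRIES):
    """Drain the broker, routing each message to the handler for its type.

    A handler returns the next request type, or None when the job is done.
    A handler that raises gets the message resubmitted with failed_attempts
    incremented, until failed_attempts exceeds max_retries.
    """
    done, discarded = [], []
    while True:
        try:
            msg = broker.get_nowait()
        except queue.Empty:
            break
        try:
            next_type = handlers[msg["type"]](msg)
        except Exception:
            msg["failed_attempts"] += 1
            if msg["failed_attempts"] > max_retries:
                discarded.append(msg)  # here the front end would be informed
            else:
                broker.put(msg)        # resubmit for another attempt
            continue
        if next_type is None:
            done.append(msg)
        else:
            msg["type"] = next_type    # hand ownership to the next stage
            broker.put(msg)
    return done, discarded
```

Putting `{"type": "FETCH", "video_id": ..., "failed_attempts": 0}` on the queue and calling `run_pipeline(broker, HANDLERS)` walks the request through all three stages.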

Take a look at alternate brokers like NATS or Nanomsg.

Think of storage in terms of a multi-level cache design. Your most popular files end up on your fastest CDN storage. Gather stats to figure out what's popular, and move files around.
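As a rough illustration of the tiering idea, assuming access stats arrive as a plain list of file IDs (one entry per download) and the fast tier holds a fixed number of files. The function name and `fast_slots` parameter are made up for this sketch.

```python
from collections import Counter


def assign_tiers(hits, fast_slots):
    """Split file IDs into (fast, slow) tiers by download count."""
    # most_common() orders by count, highest first.
    ranked = [file_id for file_id, _ in Counter(hits).most_common()]
    return ranked[:fast_slots], ranked[fast_slots:]


fast, slow = assign_tiers(["a", "b", "a", "c", "a", "b"], fast_slots=2)
# fast == ["a", "b"], slow == ["c"]
```

Re-running this periodically and moving files whose tier changed would give the "move files around" behaviour.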

daxterfellowes · March 2016

@rincewind said:

Awesome! Appreciate the insight. Due to how youtube-dl works, I may have to condense the steps.
