New project: Streaming Pastebin API (also, more details about NixOS)

joepie91 Member, Patron Provider
edited July 2017 in General

So I actually built this a while ago, but I figured that it'd be a good opportunity to get my feet wet with NixOps as a deployment management tool, so it took me a while to get around to getting it running. Either way, I've just set it up :)

In short: it's a free WebSocket-based API that streams out all (public) pastes made to Pastebin, including content, with a delay of at most a minute. It doesn't require API keys to use (although you're expected to use it responsibly), and it can be accessed directly from a page on another domain, meaning it's fairly trivial to build projects on top of it. Of course, any non-browser WebSocket client will work too.
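For anyone wanting to consume the feed, here is a minimal client-side sketch. It assumes that each WebSocket message is a JSON-encoded object describing one paste; the post does not document the message format, so treat the `Paste` type, the `parsePasteMessage` helper, and the `STREAM_URL` placeholder as hypothetical names for illustration only.

```typescript
// Assumption: one JSON object per message; field names are not specified in the post.
type Paste = Record<string, unknown>;

// Turn a raw WebSocket message payload into a paste object, or null if it
// does not look like a JSON object.
function parsePasteMessage(data: unknown): Paste | null {
  if (typeof data !== "string") return null; // ignore binary frames
  try {
    const parsed: unknown = JSON.parse(data);
    if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
      return parsed as Paste;
    }
    return null; // valid JSON, but not an object
  } catch {
    return null; // not valid JSON
  }
}

// Hypothetical wiring in a browser; STREAM_URL is a placeholder, not the real endpoint:
// const socket = new WebSocket(STREAM_URL);
// socket.addEventListener("message", (event) => {
//   const paste = parsePasteMessage(event.data);
//   if (paste !== null) console.log(paste);
// });
```

Keeping the parsing separate from the socket makes it easy to reuse the same helper with a non-browser WebSocket client.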

Naturally, as with all my things, it's open-source. Keep in mind that it may take a moment for the preview box to start moving, as new pastes are only scraped once every minute.

Some technical details about the software: There's not terribly much to it, really. It's a Node.js project that uses bhttp (an HTTP client library of my own making) to retrieve new pastes every minute from Pastebin's scraping API - which requires a paid account to use - and then uses an internal task queue to retrieve the actual contents of a paste approximately every second, streaming out the result over a WebSocket. It uses some custom abstractions for tracking clients and pastes.
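The "list every minute, fetch roughly one paste per second" pattern described above can be sketched as a small throttled queue. This is not the project's actual code: `ThrottledQueue`, `enqueue` and `tick` are hypothetical names, and the tasks are synchronous here to keep the sketch short, whereas real fetches would be asynchronous.

```typescript
// Tasks arrive in bursts (e.g. once per minute) but are run one at a time,
// one per call to tick(), so the upstream API is never hit in bursts.
type Task<T> = () => T;

class ThrottledQueue<T> {
  private pending: Task<T>[] = [];
  private onResult: (result: T) => void;

  constructor(onResult: (result: T) => void) {
    this.onResult = onResult;
  }

  // Queue a batch of tasks, e.g. one per newly listed paste.
  enqueue(...tasks: Task<T>[]): void {
    this.pending.push(...tasks);
  }

  // Run at most one queued task; returns false when the queue was empty.
  tick(): boolean {
    const task = this.pending.shift();
    if (task === undefined) return false;
    this.onResult(task());
    return true;
  }

  // Number of tasks still waiting to run.
  get size(): number {
    return this.pending.length;
  }
}

// Hypothetical wiring: drain one task per second.
// const queue = new ThrottledQueue<string>((body) => broadcast(body));
// setInterval(() => queue.tick(), 1000);
```

Here `broadcast` stands in for whatever sends the result to the connected WebSocket clients.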

The frontend code is a fairly simple and boring page, containing a Riot component for rendering a preview of the feed. No abstractions like Socket.IO are used; it's just a plain WebSocket, using the browser's native API for that. Webpack is used to bundle the frontend code into a single .js file.

(Obligatory note: The above description is just to give an idea of how things are put together - it doesn't mean that you should use all of those tools for your project. I picked them specifically as tools that are suitable for this particular project.)

Some technical details about the deployment: Deployment is handled using NixOps, to a NixOS system (a DigitalOcean droplet that's created on-the-fly for staging, and an Afterburst KVM VPS for production). The 'expression' for the Pastebin Stream API itself - basically a package definition - is generated using node2nix.

For those not familiar with NixOS, which I suspect will be most of you: in short, it's a Linux distribution that uses Nix as a package manager. Nix is a package manager that produces reproducible and isolated package installations. NixOS has no 'global' environment (eg. /lib doesn't exist), and every single installed application explicitly specifies its dependencies, meaning a few things:

  1. You don't have to deal with dependency conflicts, since it's completely valid to have many versions (or customized variants) of the same library installed on the same system - after all, every dependency is explicitly referenced from whichever software uses it.
  2. All installations are reproducible; ie. if an installed package works on one system, it's guaranteed to work on another system with the same architecture as well, no matter what the other differences between the systems are.
  3. The system gets 'rebuilt' when you change its configuration; that is, every changed version of a package is installed as a new package, and the old packages eventually get garbage-collected. That means you're not left with outdated or left-behind configuration files and such. Essentially, your system is just as 'fresh and clean' after a few years, as it was when you first installed it.
  4. Because of the above, upgrades are almost entirely painless. You just change the branch you're on, rebuild your system, and you're done.
  5. You can roll back to older versions of the system configuration. Since there's nothing 'global' on the system, everything is built around virtual environments, which means that you can trivially boot into an older version of your environment (and install packages as a non-privileged user without security risks, and create actual virtual environments for specific projects, and...)

TL;DR: It makes systems a lot more reliable and easier to test and manage than a "change global stuff as you go along" distribution like Debian, CentOS, Ubuntu, and so on. There are a lot more goodies to it (eg. booting a new configuration in a created-on-the-fly QEMU VM to test it before applying it to your main system), but I won't go into them too much here unless people are particularly interested. This and this article are good reads on the topic.

One other thing about NixOS (but not available with Nix on other distros/platforms) that I do want to mention, since it's pretty critical to how I've set up the Pastebin Stream API, is that it provides declarative configuration.

This is a simple example of somebody's laptop's NixOS configuration, to give you an idea, but you will find similar configuration in my NixOps configuration repository as well. Essentially, you can specify all configuration for all software in one place, and have it generate the appropriate configuration files.

Anyhow, I'm curious to see what people here can build with the API :)


EDIT: Also, to point this out explicitly: while NixOS is great from a technical perspective and very interesting, do not expect to just be able to jump in overnight. The documentation and tooling usability are still severely lacking, and you'll essentially have to relearn Linux to use it today.

I'll be happy to answer any questions about it, but keep in mind that I'm not an expert on it yet either :)

Comments

  • TL;DR

    What would this be used for as a newbie just trying to see the benefits of having this streaming api?

  • joepie91 Member, Patron Provider

    @ljseals said:
    TL;DR

    What would this be used for as a newbie just trying to see the benefits of having this streaming api?

    Could be many things, really. Archiving all public pastes, scanning them for specific kinds of data, collecting spam statistics, visualizing pastes, and so on. Anything for which a feed of pastes is useful, including stuff I haven't thought of :)

    Thanked by 2: ljseals, BlaZe
  • WSS Member

    Reminds me of MetaSpy. Only oldies need apply.

  • joepie91 Member, Patron Provider

    @WSS said:
    Reminds me of MetaSpy. Only oldies need apply.

    Heh. I just read up on that, and I can't believe that didn't cause more of a ruckus in its time. Seems to have a high creepiness factor...

  • What information are you getting from pastes that you want to archive it?

  • WSS Member

    @joepie91 said:

    @WSS said:
    Reminds me of MetaSpy. Only oldies need apply.

    Heh. I just read up on that, and I can't believe that didn't cause more of a ruckus in its time. Seems to have a high creepiness factor...

    Well, they had a "filter" by default that was pretty good at not showing people looking for porn.

    Back then we didn't even bother thinking about monetization; I don't believe we even had the ability to "opt out" of searches being shared, but we didn't even get cookies for using MetaCrawler; it was pretty much just a direct interface to their system which forked to other services and provided us with the most "content rich" sites. It wasn't very accurate, but it was "Google" before Google for searches at the time.

    Overall, we just thought "Hey, this is what people are actually looking for", and we'd pepper it with odd searches to see if we could get noticed and mentioned on IRC. Yep, it was a different time.
