Flat-file based blog performance with a lot of posts

edan (Member)
edited December 2015 in General

Many people assume that a flat-file based blog will definitely be slow if you have a lot of posts (more than 2k posts), or that rebuilding the content will take a long time. Well, it depends on the algorithm used to access or list the content by date, category, tag, or author.

I put the date, tags, and slug in the filename (separated by underscores), and use the author, category, and content type as folder names:

content/username/blog/category/type/2014-01-31-12-56-40_tag1,tag2,tag3_databaseless-blogging-platform-flat-file-blog.md
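
As an illustration only, here is a minimal sketch of how a path in this scheme could be split back into its fields. It is written in Python rather than the PHP the platform uses, and `PostMeta` and `parse_post_path` are hypothetical names, not part of the platform.

```python
from dataclasses import dataclass
from datetime import datetime
from pathlib import PurePosixPath


@dataclass(frozen=True)
class PostMeta:
    author: str
    category: str
    content_type: str
    published: datetime
    tags: tuple
    slug: str


def parse_post_path(path: str) -> PostMeta:
    """Split content/<author>/blog/<category>/<type>/<date>_<tags>_<slug>.md into fields."""
    p = PurePosixPath(path)
    # The folder names carry the author, category and content type.
    _, author, _, category, content_type = p.parts[-6:-1]
    # The filename carries date, tags and slug, separated by underscores.
    date_part, tags_part, slug = p.stem.split("_", 2)
    return PostMeta(
        author=author,
        category=category,
        content_type=content_type,
        published=datetime.strptime(date_part, "%Y-%m-%d-%H-%M-%S"),
        tags=tuple(tags_part.split(",")),
        slug=slug,
    )
```

Everything needed to filter or sort a post is recovered from the path alone, so the post file itself never has to be opened for listing.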

With this layout we can match what a MySQL query would do. I just glob the directory with a specific wildcard, collect the filenames, serialize them, and save the result as an index. Later, if we need, say, tag A, we open the index and examine it, and only open the files that match. So the bottleneck is opening the index file (not a big deal). On my local dev machine, the index file holding the filenames for 6k posts is only around 600 KB. When we edit or add a post, we just recreate the index (how fast that is depends on PHP's glob).
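
A rough sketch of that index idea, assuming the folder layout shown earlier. It uses Python's `glob` and JSON in place of PHP's glob and serialization, and the function names `build_index` and `find_by_tag` are made up for illustration.

```python
import glob
import json
import os


def build_index(content_dir: str, index_path: str) -> list:
    """Glob all post files and save their paths as a JSON index."""
    pattern = os.path.join(content_dir, "*", "blog", "*", "*", "*.md")
    # The timestamp leads the filename, so sorting by basename orders by date.
    files = sorted(glob.glob(pattern), key=os.path.basename, reverse=True)
    with open(index_path, "w", encoding="utf-8") as fh:
        json.dump(files, fh)
    return files


def find_by_tag(index_path: str, tag: str) -> list:
    """Return indexed paths whose tag field contains `tag`, without opening any post."""
    with open(index_path, encoding="utf-8") as fh:
        files = json.load(fh)
    matches = []
    for path in files:
        parts = os.path.basename(path).split("_", 2)
        if len(parts) == 3 and tag in parts[1].split(","):
            matches.append(path)
    return matches
```

The only file read during a tag lookup is the index itself; the matching posts are opened later, when they are actually displayed.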

I made a temporary test blog on OpenShift (free account) and filled it with about 3k posts and hundreds of tags. The page generation time is only around 0.023-0.035 seconds (view the HTML source) with archive, recent posts, popular posts, and popular tags on the page, so it is quite fast!

Lowendbox has around 2300-2400 posts and its page generation time is 1.238 seconds (http://lowendbox.com/page/231/; I don't know the hardware there).

So for a simple blog, a flat-file based blog is okay. If you are choosing one and plan to put a lot of content in it, first check how its algorithm accesses the content.

Note: I am the creator of the blog platform used by the demo site linked above, so I'm a little biased.

Thanked by 1: GoatSeller

Comments

  • Loads amazingly fast.

  • edan said: Many people assume that a flat-file based blog will definitely be slow if you have a lot of posts (more than 2k posts)

    you kidding right, plain html (js, etc) slow? :D

    Who? I have 10k posts with Pelican.

    Thanked by 1: ehab
  • Sounds cool, but what about comments?

  • edan (Member)
    edited December 2015

    darknessends said: Loads amazingly fast.

    That's when using Twenty Fifteen. When using the ported version of Vapor (Ghost), the page generation time goes down to around 0.0012-0.0022 seconds (on OpenShift) with that many posts.

    tommy said: you kidding right, plain html (js, etc) slow? :D

    Well, I mean people who made the wrong choice and then want to, say, list the content by a specific category, featured posts, etc. I won't mention any product here since that would be bad :D

    2bb3 said: Sounds cool, but what about comments?

    So far only using Disqus and FB comments.

  • No matter what I do, it is amazingly fast.

    Thanked by 1: edan
  • vRozenSch00n (Member)
    edited December 2015

    I've used miniPortail (now freeguppy) since version 2.0 and it's quite fast. There is a flat file based CMS called HTMLy, it's also a good one.

    Addition:
    One of the keys to a flat-file based CMS: don't use a script to generate the menu, use a cached menu file.

  • @vRozenSch00n said:
    There is a flat file based CMS called HTMLy, it's also a good one.

    OP's blog is built on HTMLy :)

    There is also MonstraCMS - http://monstra.org/ which I use for my blog.

  • vRozenSch00n (Member)
    edited December 2015

    Hidden_Refuge said: OP's blog is built on HTMLy :)

    He he he, he made it ;)

    Addition:
    I need to try Monstra.

  • What's the address to your blog?

    @Hidden_Refuge said:
    There is also MonstraCMS - http://monstra.org/ which I use for my blog.

  • @k0nsl said:
    What's the address to your blog?

    https://hiddenrefuge.eu.org/

    I have almost nothing on it, though, but MonstraCMS is pretty easy to handle and works well.

    Thanked by 1: k0nsl
  • darknessends said: No matter what I do, it is amazingly fast.

    This is because it depends on CPU power to examine the data: the faster the CPU, the faster the query. With everything stored in the filenames and folder names, we only need to open a file when everything matches. If we set 10 posts per page, we only need to open 10 files per page.
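
The "10 files per page" point can be sketched as below. This is an illustrative Python version, not the platform's PHP code; `page_of` and `render_page` are hypothetical names, and `paths` is assumed to be the already-filtered list of filenames from the index.

```python
def page_of(paths: list, page: int, per_page: int = 10) -> list:
    """Return the slice of already-filtered index paths shown on a 1-based page."""
    start = (page - 1) * per_page
    return paths[start:start + per_page]


def render_page(paths: list, page: int, per_page: int = 10, read=None) -> list:
    """Read only the files on the requested page; the rest are never opened."""
    if read is None:
        def read(path):
            with open(path, encoding="utf-8") as fh:
                return fh.read()
    return [read(path) for path in page_of(paths, page, per_page)]
```

Filtering and slicing work on strings in memory, so the number of file reads per request stays at the page size no matter how many posts exist.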

    vRozenSch00n said: He he he, he made it ;)

    The demo is running the latest version with many improvements (not documented yet) :D

  • I first used IkiWiki, and now use Sculpin and Everstatic (the latter's my own purely static CMS).

    Note that using a wiser directory structure (do not put all XXk files into a single directory, use nested folders) and tmpfs or similar caching does wonders.
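
One possible way to nest folders, sketched in Python. It assumes filenames start with a `YYYY-MM-...` timestamp as in the opening post; the tools named in this comment may organise their files differently, and `nested_path` is a hypothetical helper.

```python
import os


def nested_path(root: str, filename: str) -> str:
    """Place a date-prefixed file under root/YYYY/MM/ instead of one flat folder."""
    year, month = filename.split("-", 2)[:2]
    return os.path.join(root, year, month, filename)
```

With this layout each directory only holds one month of posts, so no single folder grows to tens of thousands of entries.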

  • Linux page cache will make it fast for the most common URLs right?

  • raindog308 (Administrator, Veteran)

    edan said: Many people assume that a flat-file based blog will definitely be slow if you have a lot of posts (more than 2k posts), or that rebuilding the content will take a long time.

    Two different things.

    For #1 ("slow") I don't know why anyone would think lots of posts = slow. It's just a static web server.

    For #2 (rebuilding), yes, I'd expect that if you have tons of pages to be rebuilt, it's slow. But I think some of the static generators operate more like "make" - only rebuilding what's changed.
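
A minimal sketch of the "make"-style check, assuming each source file maps to one output file and comparing modification times. It is written in Python with hypothetical helper names and does not describe how any particular generator decides what to rebuild.

```python
import os


def needs_rebuild(source: str, output: str) -> bool:
    """True if the output is missing or older than its source, as `make` would decide."""
    if not os.path.exists(output):
        return True
    return os.path.getmtime(source) > os.path.getmtime(output)


def stale_sources(pairs):
    """Filter (source, output) pairs down to the ones that need rebuilding."""
    return [(src, out) for src, out in pairs if needs_rebuild(src, out)]
```

After editing one post, only that post's pair comes back from `stale_sources`, so the rebuild cost follows the number of changes rather than the size of the site.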

    Most of the static blogs I've seen use Disqus or something like that for comments.

    Thanked by 2: 4n0nx, vRozenSch00n
  • @edan said:
    Note: I am the creator of the blog platform used by the demo site linked above, so I'm a little biased.

    You need a "platform" to serve up flat files? I call the process I used to build my blog RAGtag, but I don't pretend it's a fancy platform.

  • I don't understand why static files would take longer to load when there are more of them on a server.

  • raindog308 said: For #1 ("slow") I don't know why anyone would think lots of posts = slow. It's just a static web server.

    A flat-file based blog is not static content, it's dynamic.

    raindog308 said: For #2 (rebuilding), yes, I'd expect that if you have tons of pages to be rebuilt, it's slow. But I think some of the static generators operate more like "make" - only rebuilding what's changed.

    Do you know Hugo (https://gohugo.io/)? It uses a directory structure similar to mine, so as a static website generator it should be fast. I don't rebuild content, since I just collect the filenames (not static).

    impossiblystupid said: You need a "platform" to serve up flat files?

    It's dynamic content :)

    Thanked by 1: 4n0nx
  • edan (Member)
    edited December 2015

    Master_Bo said: I first used IkiWiki, and now use Sculpin and Everstatic (the latter's my own purely static CMS).

    Well, mine has content types like post, image, video, audio, link, and quote, plus categories and tags, static pages, and sub static pages, so it's pretty dynamic content. With that directory structure we can easily list, say, the latest images, videos, audio, quotes, or links (not implemented yet since nobody has requested it, but it's planned).

    vimalware said: Linux page cache will make it fast for the most common URLs right?

    4n0nx said: I don't understand why static files would take longer to load when there are more of them on a server.

    It's actually dynamic, since it uses the PHP interpreter and such, but it uses a different method from most flat-file based blogs out there.

    Thanked by 1: 4n0nx
  • exception0x876 (Member, Host Rep, LIR)

    this is nice for systems with very low RAM (like 64MB VPS), and I guess this is why you posted it on LET :)
    However on systems with enough RAM it would be a challenge to outperform any decent and properly indexed database with this approach.

  • exception0x876 said: this is nice for systems with very low RAM (like 64MB VPS), and I guess this is why you posted it on LET :) However on systems with enough RAM it would be a challenge to outperform any decent and properly indexed database with this approach.

    Yes :)

    Versus a database? Yes, that's an interesting challenge, and only with a proper test can we conclude which one is faster.

  • edan said: The demo is running the latest version with many improvements (not documented yet) :D

    Very good work, bro. I'm proud of you.

    btw, you got dan.ovh I got den.ovh :D so we are related :P

  • If you have too many posts on a flat-file based blog wouldn't it cause I/O lag if you receive high traffic?

  • Thanks. Developing it in my spare time.

    Nice domain even though it's .ovh :)

  • edan (Member)
    edited December 2015

    @Ruriko said:
    If you have too many posts on a flat-file based blog wouldn't it cause I/O lag if you receive high traffic?

    It should not. I also cache the output, similar to what WP Super Cache does: a request from a non-logged-in user or a bot generates the cache itself. Currently it is only a file-based cache. I plan to put the index and the page cache into APCu/Memcache or another in-memory caching system, but that doesn't seem to be needed for now.
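
A file-based page cache along those lines could look roughly like this. It is a Python sketch, not the platform's PHP code; the names `cache_path` and `serve`, the hashed filenames, and the rule that logged-in users bypass the cache are all assumptions made for the example.

```python
import hashlib
import os


def cache_path(cache_dir: str, url: str) -> str:
    """Map a URL to a cache filename via a hash, so any URL is a safe filename."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return os.path.join(cache_dir, digest + ".html")


def serve(url: str, cache_dir: str, render, logged_in: bool = False) -> str:
    """Return cached HTML for anonymous visitors, rendering and storing it on a miss."""
    if logged_in:
        return render(url)  # assumption: logged-in users always get a fresh page
    path = cache_path(cache_dir, url)
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            return fh.read()
    html = render(url)
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(html)
    return html
```

The first anonymous request for a URL pays the rendering cost and writes the file; later anonymous requests only read it. Cache invalidation when a post changes is left out of the sketch.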

    A database is more memory intensive, and without caching (WP, for example) it will kill your server immediately if the blog receives high traffic.

  • vRozenSch00n (Member)
    edited December 2015

    My Guppy installation is not original. I tweaked it a lot and added some functionality, among other things the use of https://github.com/pear/Cache_Lite, and some of those additions use SQLite as the backend.

    Added:

    Here's an old but nice article https://mahtonu.wordpress.com/2009/09/25/cache-php-output-for-high-traffic-websites-pear-cache_lite/

    Thanked by 1: edan
  • @vRozenSch00n said:
    My Guppy installation is not original. I tweaked it a lot and added some functionality, among other things the use of https://github.com/pear/Cache_Lite, and some of those additions use SQLite as the backend.

    Added:

    Here's an old but nice article https://mahtonu.wordpress.com/2009/09/25/cache-php-output-for-high-traffic-websites-pear-cache_lite/

    Thanks. The current cache is very simple, but it works.

    Thanked by 1: vRozenSch00n
  • Ruriko said: If you have too many posts on a flat-file based blog wouldn't it cause I/O lag if you receive high traffic?

    Eh, it takes a lot of users to saturate even the 20k+ IOPS / 200 MB/s read of a single cheap SSD. Millions. HDD... still five digits at least, with something like 300 IOPS. RAM can likely serve any realistic number your network card can handle, i.e. network speed will be your limit by far.

    Thanked by 1: edan
  • William said: RAM can likely serve any realistic number your network card can handle, i.e. network speed will be your limit by far.

    If you can cache a bit, that would work the same with a database, wouldn't it?

  • Flat-file sounds like a great idea, assuming you're nowhere near your inode limits.

    Thanked by 1: edan
  • @4n0nx said:
    I don't understand why static files would take longer to load when there are more of them on a server.

    Sadly enough, some file systems (most?) do linear scans of directories to get files, so opening a file in a heavily filled directory can be slow. I'm working on a flat-file comment system and was surprised to find how slow opening a file can be with tens of thousands of files in a directory.

    @Ruriko said:
    If you have too many posts on a flat-file based blog wouldn't it cause I/O lag if you receive high traffic?

    Depending on how much data you have it'll be cached in RAM by the OS and it ends up just being syscall overhead. I'm storing each individual comment in its own file, so 1000 comments would be 1000 syscalls, cached or not. It's still very fast by most standards, but I'm just adding my own simple caching layer to keep the comments in memory.
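
The caching layer itself is not described here, so the following is only one possible shape for it, in Python. It assumes one directory per thread with one file per comment, and reuses the in-memory list while the directory's modification time is unchanged; `ThreadCommentCache` is a hypothetical name.

```python
import os


class ThreadCommentCache:
    """Keep a thread's comments in memory until its directory changes."""

    def __init__(self):
        self._cache = {}  # thread_dir -> (directory mtime in ns, list of comment texts)

    def comments(self, thread_dir: str) -> list:
        # On a hit, one stat call on the directory replaces one open per comment.
        mtime = os.stat(thread_dir).st_mtime_ns
        hit = self._cache.get(thread_dir)
        if hit is not None and hit[0] == mtime:
            return hit[1]
        texts = []
        for name in sorted(os.listdir(thread_dir)):
            with open(os.path.join(thread_dir, name), encoding="utf-8") as fh:
                texts.append(fh.read())
        self._cache[thread_dir] = (mtime, texts)
        return texts
```

A directory's modification time changes when files are added or removed but not when an existing file is edited in place, so a system that allows comment edits would need an extra invalidation step.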

    Thanked by 1: edan