Split a large directory into smaller chunks

I wanted to archive a 130GB directory containing various files to Blu-ray discs. As Blu-ray burning seems to be "highly experimental" under Linux, none of the GUI programs I was willing to install could write to such a disc. On the command line it works with dvd+rw-tools, which expects a directory whose contents get written to the disc.

The code below takes an input directory 'basedir' and hardlinks its content into smaller directories (vol1, vol2, ...) under 'targetdir', each with a maximum size of 'volumesize' bytes.

    #!/usr/bin/python
    # burns with: growisofs -udf -iso-level 3 -allow-limited-size -Z /dev/sr0 vol1/

    import os

    basedir = '/media/Blackbox/More_Stuff/Software'
    targetdir = '/media/Blackbox/More_Stuff/Software-bkp'
    volumesize = 48440016896  # Size from dvd+rw-mediainfo

    print('Creating hard-linked directory structure from\n%s\nto\n%s\nwith a volume size of %d bytes\n' % (basedir, targetdir, volumesize))


    def createroot(root):
        # Create the directory (and its parents) unless it already exists.
        # print('createroot on path %s' % root)
        if not os.path.exists(root):
            os.makedirs(root)


    def makelink(relname):
        # Hard-link basedir/relname into the current volume directory.
        src = '%s/%s' % (basedir, relname)
        target = '%s/%s' % (volume_dir, relname)
        # print('Linking %s -> %s\n' % (src, target))
        try:
            os.link(src, target)
        except OSError as error:
            print('File probably not readable for us: %s (%s)' % (src, error))


    size = 0
    vol = 1
    volume_dir = '%s/vol%d' % (targetdir, vol)
    os.makedirs(volume_dir)

    for root, dirs, files in os.walk(basedir):
        for name in files:
            fname = os.path.join(root, name)
            fsize = os.path.getsize(fname)
            relname = os.path.relpath(fname, basedir)
            if (size + fsize) > volumesize:
                print('Volume %d: %s bytes.' % (vol, size))
                # New volume
                vol += 1
                size = fsize
                volume_dir = '%s/vol%d' % (targetdir, vol)
                os.makedirs(volume_dir)
            else:
                size += fsize
            newroot = os.path.dirname(os.path.join(volume_dir, relname))
            createroot(newroot)
            makelink(relname)

    print('Volume %d: %s bytes.' % (vol, size))
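If you want to see how many discs a directory will need before touching the file system, the splitting rule from the script above can be pulled out into a small function. This is only a sketch: the name plan_volumes is made up for this example, and unlike the script it never leaves the first volume empty when the very first file is already larger than a disc.

```python
def plan_volumes(sizes, volumesize):
    """Group file sizes into consecutive volumes of at most volumesize bytes.

    Files are taken in the given order (no reordering or bin packing),
    just like the os.walk loop in the script. A file that is larger than
    volumesize ends up alone in an oversized volume.
    """
    volumes = [[]]
    used = 0
    for fsize in sizes:
        # Start a new volume if this file does not fit into the current
        # one, unless the current one is still empty.
        if used + fsize > volumesize and volumes[-1]:
            volumes.append([])
            used = 0
        volumes[-1].append(fsize)
        used += fsize
    return volumes


if __name__ == '__main__':
    for number, volume in enumerate(plan_volumes([40, 30, 50, 10, 70], 100), 1):
        print('Volume %d: %d bytes.' % (number, sum(volume)))
```

Because files are never reordered, the volumes are usually not filled completely. That was good enough for me, since it keeps files from the same directory together on one disc.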


Day 44

Trinidad Scorpion seedling Ghost Pepper seedling Fatalii seedling

Three chili plants and all of them look different. The Scorpion (left one) has somewhat longish leaves while the others are shaped more classically. The Ghost Pepper is also different as its stem is coloured slightly red while the others are completely green. The Habaneros, too, have hints of red in their colours. They are also growing their first real leaves, so it's time to repot them in the next few days. They will have to leave their heated greenhouse, but the indoor temperature should be enough.

New (and probably final) leaderboard:

  • 3x Jolokia
  • 3x Habanero
  • 2x Fatalii
  • 1x Cayenne
  • 1x Trinidad Scorpion

Day 42

Finally! A Trinidad Scorpion has sprouted. While it is in last place on our leaderboard, I hope it'll take first place when it comes to heat. Well, two more Scorpions to go. I hope they, too, will grow.

Day 39

The Scorpion remains tenacious. Still no sign, so I used three new seeds. Unfortunately it looks like the seeds are bad or the Scorpion wants it even hotter, but I doubt that. I also sowed two new Cayennes since only one of the original seeds sprouted.

The new leaderboard:

  • 3x Jolokia
  • 3x Habanero
  • 2x Fatalii
  • 1x Cayenne
  • 0x Trinidad Scorpion

Nine plants should be a good start. I originally had 12 last year and gave four away once they bore fruit.

The Ghost Pepper

Chilis in a Greenhouse

While the Cayenne and Scorpions take their time (they got 30 days!) the first of the Jolokias is growing the first real leaf. The middle one in the picture above.

The little greenhouse has artificial lighting, because at about 16:30 it's already dark outside. It is a ~2W LED light that did the job just fine last year. It might be a better idea to have a light source that also emits UV. Last year the poor plants were kind of shocked when I moved them from inside behind the window to the balcony into the direct sunlight. That's not a pretty sight if you leave them in the direct sun for too long. Luckily they all adapted after a while and around August all of them could be left standing in the brutal midday sun and they seemed to enjoy that. ;) (As long as enough water was supplied)

First Little Chilis '14

The first chilis have already sprouted. The Jolokias took the lead. I planted three of each type, so that would be 15 chili plants if every one of them grows. Last year only 2 of 16 did not sprout.

Ghost Pepper seedling Habanero seedling Fatalii seedling

The leaderboard is now as follows:

  • 3x Jolokia
  • 1x Fatalii
  • 1x Habanero
  • 0x Cayenne
  • 0x Trinidad Scorpion

Hopefully the rest of them will follow soon. They are all kept inside a little greenhouse with a heating mat below on the window sill. It's (kind of) winter outside and this will keep them at a nice 25-30°C.

Season '14

This year's chili season just started with:

The Scorpion is new this year; everything else I already cultivated last year. Unfortunately last year the Habaneros didn't turn out to be very hot. Interestingly some of the Fataliis smelled almost dangerous, so it couldn't have been lack of sun. Let's see if the Trinidad Scorpion can live up to its reputation. :)


Just when you think you know all… most… well, a lot of MySQL's strange ways to fuck up your day, there is always something new to learn.

This time it was a regular SQL dump that had been generated with mysqldump on a totally ordinary Linux system. And when mysqldump just works on a database of several gigabytes, something must be strange.

# mysql --one-database DB < db.sql
ERROR 1054 (42S22) at line 1199: Unknown column 'NUDL' in 'field list'

What the noodle? Inspecting the SQL dump reveals the following INSERT:


Apparently the value should be NULL, but somehow mysqldump managed to mangle that into NUDL.
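One thing that is easy to check is how far apart NULL and NUDL really are. The little sketch below (differing_bits is a helper made up for this example) counts the bits that differ between the two ASCII strings. 'L' is 0x4C and 'D' is 0x44, so the two words differ in exactly one bit. That would fit a single flipped bit somewhere between mysqldump and the disk, but that is only a guess on my part and nothing the dump itself proves.

```python
def differing_bits(a, b):
    """Count the bits that differ between two equal-length ASCII strings."""
    if len(a) != len(b):
        raise ValueError('strings must have the same length')
    # XOR each pair of character codes and count the set bits.
    return sum(bin(ord(x) ^ ord(y)).count('1') for x, y in zip(a, b))


if __name__ == '__main__':
    print(differing_bits('NULL', 'NUDL'))  # prints 1
```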

The Internet is for por... spam!

It took less than a week, maybe only three days, for spammers to discover the new wolfpaper greeting cards I added to the archive. One day I noticed a spam card being posted, and a couple of days later another one, with very meaningful URLs like (not an actual URL, I hope) and a completely invalid email address. I added a protection against the pattern that was used.

And then there was silence. 

A legitimate card was created some time later, so I was reminded that everything actually works. Out of curiosity I checked the server's log to see how much junk I had (not) missed. My guess was one, maybe two, every other day. Well, not quite.

In last week's access log there were about one hundred POST requests. Every access from the spam script (it's too stupid to be human but you never know!) consists of two POSTs. So about 50 attempts in one week. That may not sound like much, but at the time this started the feature had been live for merely a week and nobody had mentioned it on any site, board or weblog.

I would really like to know how the spammers, or better "spam script" as I still pretend they cannot be human, discovered it so quickly. 

File formats that do not rock: CS...X!?

Comma separated XML – also known as “parse this!”

<data>
funny;column;description;goes into this field
oh;and;here;is;some;other; column description
dataset;1;2;3
<moredata>
some id;123;43;653;314;fubar
some id;325;31;434;143;blah
some id;343;fu;---;;bar
</moredata>
<evenmoredata>
interesting;description;for;some;columns;below or above
blah;fasel;kek;blubb;10;432441
foo;bar;baz;<3;2352;23
</evenmoredata>
</data>


  1. Find the dataset lines matching the definitions.
  2. Parse a single document containing 68000 such datasets.
  3. Write all of the data into an SQL database using sensible data types (text for each column does not count!) succeeding on the first try.
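For what it's worth, here is a rough first stab at the first two tasks. A real XML parser is out of the question, since a field like "<3" is not well-formed XML, so the sketch goes line by line. It assumes that tags always sit alone on their own line, which the sample suggests but which nobody promised, and the name parse_csx is made up for this example. Telling column descriptions apart from datasets, and picking sensible data types, is still left as an exercise.

```python
import re

# A line that consists of nothing but an opening or closing tag.
TAG = re.compile(r'^<(/?)(\w+)>$')


def parse_csx(text):
    """Return a list of (section_path, fields) for every non-tag line.

    section_path is the chain of open tags joined by '/', for example
    'data/moredata'. fields is the line split on ';'.
    """
    stack = []
    rows = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = TAG.match(line)
        if match:
            closing, name = match.groups()
            if closing:
                if not stack or stack[-1] != name:
                    raise ValueError('unexpected closing tag: %s' % line)
                stack.pop()
            else:
                stack.append(name)
            continue
        # Anything else is a ';' separated line, even if it contains '<'.
        rows.append(('/'.join(stack), line.split(';')))
    if stack:
        raise ValueError('unclosed tag: %s' % stack[-1])
    return rows


if __name__ == '__main__':
    sample = '<data>\ndataset;1;2;3\n<moredata>\nsome id;343;fu;---;;bar\n</moredata>\n</data>'
    for section, fields in parse_csx(sample):
        print('%s: %r' % (section, fields))
```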
