Sunday, December 15, 2013

Disk Full: How to Find and Effectively Delete Large Directories Using Linux

There are 4 steps to effectively free disk space when a partition is full on a Linux machine:
  1. Find the partition that is full (with df -h)
  2. Find the largest directories (with du -kx /mountpoint | sort -n | less)
  3. Delete files
  4. Make sure no more file handles exist (with ls -l /proc/*/fd | less)

Find the partition that is full


The df command shows you how much space is available on each partition. The -h option formats the output in a way that's easier to read, which looks as follows:

$ df -h
Filesystem                   Size  Used Avail Use% Mounted on
/dev/sda5                    147G   99G   41G  72% /
udev                          12G  4.0K   12G   1% /dev
tmpfs                        4.7G  976K  4.7G   1% /run
none                         5.0M     0  5.0M   0% /run/lock
none                          12G  156K   12G   1% /run/shm
/home/tsteffens/.Private     147G   99G   41G  72% /home/...

The 'Use%' column shows you which partition is too full, and the 'Mounted on' column shows you where it is mounted.

 

Find the largest directories


The command du allows you to print out all directories together with the sum of the sizes of all files in each directory (including subdirectories). To determine the largest directories for a single partition, you can use:

du -m -x /mountpoint | sort -n

This will produce an output as follows:

$ du -m -x /home/tsteffens/VirtualBox\ VMs/ | sort -n
...
1670    /home/tsteffens/VirtualBox VMs/vagrant_sar_1386255963
13175   /home/tsteffens/VirtualBox VMs/vagrant_db_1385708315
15936   /home/tsteffens/VirtualBox VMs/vagrant_db_1386167954
30780   /home/tsteffens/VirtualBox VMs/

The -m parameter tells du to print out the size in megabytes. -x skips directories on a different partition. sort -n sorts the lines by the first number in each line.

You might want to pipe the result to less for a better overview:

du -mx /mountpoint | sort -n -r | less

With the -r option sort reverses the order and starts with the highest value first.

Delete files


Once you have selected the directories you want to delete, you should have a look at the content using ls (-l tells ls to output the long format containing additional information like size and owner):

$ ls -l /whatever-directory
total 16317360
drwx------ 2 tsteffens tsteffens        4096 Dec  5 11:02 Logs
-rw------- 1 tsteffens tsteffens 16708927488 Dec  5 18:29 box-di...
-rw------- 1 tsteffens tsteffens        7617 Dec  5 18:29 vagran...
-rw------- 1 tsteffens tsteffens        7617 Dec  5 11:02 vagran...

This will show you which files are the biggest ones - in case you do not want to delete the whole directory. But there's another piece of information that is easily overlooked: the number in the second column. It tells you how many hard links exist to that file.

If there is more than one hard link to that file (i.e. the number in the second column is greater than 1), you need to make sure all other hard links to that file are deleted as well, because the space is only freed once the last link is gone. Although this is a rather rare case, you can encounter it especially when working with version control or backup systems.
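To see the link count in action, here is a minimal sketch in Node.js. It assumes a POSIX-style filesystem that supports hard links in the temp directory; the function name hardLinkDemo and the file names are made up for illustration. It creates a file, adds a second hard link, deletes the first name and shows that the data is still reachable through the second one.

```javascript
var fs = require("fs");
var os = require("os");
var path = require("path");

// Hypothetical helper: creates a file plus one extra hard link in a
// throwaway directory and records the link count at each step.
function hardLinkDemo() {
  var dir = fs.mkdtempSync(path.join(os.tmpdir(), "hardlink-demo-"));
  var original = path.join(dir, "original.txt");
  var alias = path.join(dir, "alias.txt");
  var counts = {};

  fs.writeFileSync(original, "some data");
  counts.single = fs.statSync(original).nlink;      // one name, one link

  fs.linkSync(original, alias);                     // second name for the same file
  counts.linked = fs.statSync(original).nlink;

  fs.unlinkSync(original);                          // "delete" the first name
  counts.afterDelete = fs.statSync(alias).nlink;
  counts.survivingContent = fs.readFileSync(alias, "utf8");

  // Clean up: only now is the last link gone and the space released.
  fs.unlinkSync(alias);
  fs.rmdirSync(dir);
  return counts;
}

var hardLinkCounts = hardLinkDemo();
console.log(hardLinkCounts);
```

The nlink value is the same number that ls -l prints in its second column.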

You can find all hard links to a file as follows:

find /mountpoint -xdev -samefile someFile

The -xdev option tells find NOT to traverse directories on other partitions. If you really want to delete all of them you can use the following (-print0 and xargs -0 keep file names containing spaces intact):

find /mountpoint -xdev -samefile someFile -print0 | xargs -0 rm

For directories it is absolutely fine to have more than one hard link: the . link in the directory itself (which you can see when appending -a to an ls call) always increases the hard link count by one, and so does each .. link in a subdirectory.

If you don't find any additional hard links, just go ahead with deletion using rm for files or rm -rf for directories.

Make sure no more file handles exist


After you have deleted some files, you may call df -h again to check how much space you have gained. Don't panic if this is much less than you expected. Unlike Windows (where a file that is open in a program usually cannot be deleted), Linux does not prevent deletion of files currently in use! But as long as a process still has the file open, the space for that file will not be freed.

To find out about that, you can make use of the proc file system as follows:

ls -l /proc/*/fd | less

This will list all open file handles of all processes (you may need root privileges to see the handles of processes owned by other users). Now you can search in less for the file you have deleted (using / [forward search] or ? [backward search]). If a process still uses it, you will find something like:

/proc/5150/fd:
total 0
lrwx------ 1 tsteffens tsteffens 64 Dec 15 18:08 0 -> /dev/...
lrwx------ 1 tsteffens tsteffens 64 Dec 15 18:08 1 -> /dev/...
lrwx------ 1 tsteffens tsteffens 64 Dec 15 18:08 2 -> /dev/...
lr-x------ 1 tsteffens tsteffens 64 Dec 15 18:08 3 -> /dev/tty
lr-x------ 1 tsteffens tsteffens 64 Dec 15 18:08 4 -> /home/tsteffens/tmp/test (deleted)

Now you know which process still holds a handle on that file - the process id can be found after /proc/ in the first line.
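If you'd rather not page through the listing by hand, the same lookup can be scripted. The following is only a rough sketch in Node.js, assuming a Linux machine with procfs; the function names isDeletedTarget and findDeletedOpenFiles are my own. It walks /proc/<pid>/fd, reads each symlink and keeps the ones whose target ends in " (deleted)", just like the last line of the listing above.

```javascript
var fs = require("fs");
var path = require("path");

// True if a /proc/<pid>/fd link target carries the " (deleted)" marker.
function isDeletedTarget(target) {
  return / \(deleted\)$/.test(target);
}

// Returns [{ pid, fd, target }] for every deleted-but-still-open file
// that the current user is allowed to inspect below procRoot.
function findDeletedOpenFiles(procRoot) {
  var found = [];
  var pids;
  try {
    pids = fs.readdirSync(procRoot).filter(function (name) {
      return /^\d+$/.test(name); // numeric directories are process ids
    });
  } catch (e) {
    return found; // no proc file system here
  }
  pids.forEach(function (pid) {
    var fdDir = path.join(procRoot, pid, "fd");
    var fds;
    try {
      fds = fs.readdirSync(fdDir);
    } catch (e) {
      return; // process ended meanwhile or belongs to another user
    }
    fds.forEach(function (fd) {
      try {
        var target = fs.readlinkSync(path.join(fdDir, fd));
        if (isDeletedTarget(target)) {
          found.push({ pid: Number(pid), fd: Number(fd), target: target });
        }
      } catch (e) {
        // handle was closed between listing and reading it
      }
    });
  });
  return found;
}

console.log(findDeletedOpenFiles("/proc"));
```

Each entry tells you the process id to look at next. Without root privileges the result only covers your own processes.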

To get more information you can either look at the other proc fs entries in the process's directory (like /proc/5150/cmdline) or use the ps command:


$ ps aux | grep 5150
1000      5150  0.0  0.0  19436   996 pts/2    S+   18:08   0:00 less test
1000      5527  0.0  0.0  15048   620 pts/0    S+   18:47   0:00 grep ...

That tells you that in this case a less command still uses the file.

You can free the disk space by ending the appropriate process. For this you have (at least) 3 options:
  1. Stop the process "the way it should be done" (e.g. go to the terminal where the less command is still open and press q, or restart the service with service ... restart etc.)
  2. If that is not possible for any reason (e.g. it is not a service and it is not open in a terminal), you can use kill processId (inserting the id of the process for processId). This sends the process a signal asking it to terminate.
  3. If that does not work either, you can fall back to the rough way using kill -9 processId, which terminates the process immediately. Note that the process will not be able to do any cleanup before shutting down.
You can check if the process was ended properly using

ps aux | grep processId

Final Remarks


Please let me know if you should have any questions or suggestions! Many thanks to Andreas Krüger who gave a talk on this topic, which gave me a lot of input for this post!

Wednesday, November 6, 2013

JavaScript WTFs: Scoping and Invocation Context for Methods

Scoping of variables in JavaScript is weird. The invocation context for methods is even weirder!

Function Scope and Lexical Scope


Being used to languages like Java, C++, C#, Pascal etc., the first thing that brought a lot of WTFs to my mind when learning JavaScript was scoping.

The following code sample makes me want to cry out: This is wrong! This will cause a compilation error! a is not in scope, when it is accessed!

function someFunction() {
  if (true) {
     var a = 5;
  }
  console.log(a);
}


Well, in JavaScript this is not true. In JavaScript all variables declared with var in a function are visible throughout the whole function, no matter whether they were declared in a block or not. So this code will actually execute and output 5 when someFunction() is called. This is called function scoping. The JavaScript Scope Quiz (where this example was taken from) shows pretty well how the scoping of variables works and where it probably differs from your expectations.
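One consequence of function scoping that is worth seeing once: the declaration is known from the very start of the function, but the value only arrives when the assignment runs. A small sketch (hoistingDemo and observed are names I made up for this illustration):

```javascript
function hoistingDemo() {
  var observed = [];

  // a is already declared for the whole function, but not assigned yet.
  observed.push(typeof a);

  if (true) {
    var a = 5;
  }

  // After the block, a is still in scope and holds the assigned value.
  observed.push(a);
  return observed;
}

var hoistingResult = hoistingDemo();
console.log(hoistingResult); // first entry is "undefined", second is 5
```

So accessing a before the block does not fail, it just yields undefined.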

Reading about scoping in JavaScript I found the following:
"Like most modern programming languages, JavaScript uses lexical scoping. This means that functions are executed using the variable scope that was in effect when they were defined, not the variable scope that is in effect when they are invoked."
('JavaScript: The Definitive Guide, Sixth Edition', D. Flanagan, O'Reilly 2011)

With that in mind, it totally makes sense that the following example (also taken from the JavaScript Scope Quiz) will output 12:

function getFunc() {
    var a = 7;
    return function(b) {
        console.log(a+b);
    }
}
var f = getFunc();
f(5);
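A related sketch, not taken from the quiz: the returned function does not get a copy of the outer variables, it keeps the scope itself alive, and every call of the outer function creates a fresh scope. The name makeCounter is just an illustration.

```javascript
function makeCounter() {
  var count = 0; // lives on as long as the returned function is reachable
  return function () {
    count = count + 1;
    return count;
  };
}

var first = makeCounter();
var second = makeCounter();

// first and second each have their own count variable.
var counterResults = [first(), first(), second()];
console.log(counterResults); // [ 1, 2, 1 ]
```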


So far so good. Although it might seem strange that the variables of a function that has already returned (and whose frame is gone from the call stack) are still available, this is a consistent approach which you can get used to. But wait, there's a big WTF waiting - have a look!

Scoping and Invocation Context for Methods


Consider this:

var someObject = {
  a: 1,
  displayValue: function() {
    console.log(this.a);
  } 
}

var someOtherObject = {
  a: 2,
  listener: null,
  registerListener: function(listener) {
    this.listener = listener;
  },
  callListener: function() {
    this.listener();
  }
}

someOtherObject.registerListener(someObject.displayValue);
someOtherObject.callListener();
 
someObject has a method to display the value of a. This method is registered as a callback via someOtherObject.registerListener(...) and invoked via someOtherObject.callListener().

From the things you read so far, you might be tempted to think that someOtherObject.callListener() will call someObject.displayValue() and thus, using the context of where displayValue was defined, the output would be 1 ... wrong! Go ahead and paste it into the JavaScript console of your browser. The console log will display 2.

Let's see what happened here. First someObject.displayValue is registered via someOtherObject.registerListener(...). When someOtherObject.callListener() is invoked, it calls that function as this.listener(). Surprisingly, a is not retrieved from someObject, but from someOtherObject!

What do we learn from this? The this keyword does not refer to the object in which a method was defined, but to the object on which it is invoked - here someOtherObject, because the call is this.listener(). WTF? Yes, this is inconsistent with lexical scoping, and yes, this is error-prone because methods can have really nasty side effects depending on the context in which they are invoked.
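To isolate the effect, here is a stripped-down sketch (whoAmI, alice and bob are hypothetical names): one and the same function is attached to two objects, and the result depends only on which object stands in front of the dot when it is called.

```javascript
function whoAmI() {
  return this.name;
}

var alice = { name: "alice", whoAmI: whoAmI };
var bob = { name: "bob", whoAmI: whoAmI };

var viaAlice = alice.whoAmI(); // invoked on alice
var viaBob = bob.whoAmI();     // same function, invoked on bob

// Copying the reference carries no memory of where it came from.
bob.borrowed = alice.whoAmI;
var viaBorrowed = bob.borrowed();

console.log(viaAlice, viaBob, viaBorrowed); // alice bob bob
```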

But of course, besides turning to the gods and crying "How could you let this happen? How could you do this to us?", there are some workarounds to get this fixed, two of which I want to show you.

Make Object Context Explicit


Luckily you can provide the invocation context in an explicit way by using the call function. This is shown in the following example:

var someObject = {
  a: 1,
  displayValue: function() {
    console.log(this.a);
  }
}

var someOtherObject = {
  a: 2,
  listenerObject: null,
  listenerMethod: null,
  registerListener: function(listenerObject, listenerMethod) {
    this.listenerObject = listenerObject;
    this.listenerMethod = listenerMethod;
  },
  callListener: function() {
    this.listenerMethod.call(this.listenerObject);
  }
}

someOtherObject.registerListener(someObject, someObject.displayValue);
someOtherObject.callListener();


Now we get the 1 that we expected before!

For registering a listener we provide a listenerObject in addition to the function to be called. Invoking call(...) on listenerMethod() with listenerObject as first parameter invokes listenerMethod() in the context of listenerObject - meaning this will be someObject and not someOtherObject.

You might want to make sure that the object passed as invocation context (someObject in our example) is never null, undefined or a primitive value: in non-strict code, null or undefined would be replaced by the global object and a primitive value by a wrapper object. By the way: any parameters passed to call(...) after the first are passed on to listenerMethod().
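Both points can be sketched in a few lines. The guard assertUsableContext and the function formatValue are hypothetical helpers of mine, not part of the example above. The guard rejects null, undefined and primitives before they ever reach call(...), and formatValue shows how the additional parameters arrive.

```javascript
// Hypothetical guard: only real objects (or functions) may serve as context.
function assertUsableContext(candidate) {
  var type = typeof candidate;
  if (candidate === null || (type !== "object" && type !== "function")) {
    throw new TypeError(
      "invocation context must be an object, got " +
      (candidate === null ? "null" : type)
    );
  }
  return candidate;
}

// Everything after the first argument of call(...) ends up here.
function formatValue(prefix, suffix) {
  return prefix + this.a + suffix;
}

var contextObject = { a: 1 };
var formatted = formatValue.call(assertUsableContext(contextObject), "<", ">");
console.log(formatted); // <1>
```

A check like this could sit at the top of registerListener, so that a bad listenerObject fails at registration time and not later when the callback fires.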

This workaround is nice as long as you have control over how the methods on your object are invoked. But if you don't - e.g. because an external library will invoke your method as a callback - or if you just forget to do it the right way, this will most certainly lead to unexpected program behavior.

Make this a Local Variable in a Function


There is another way to avoid this problem, which seems to be commonly used. This time we use a little trick to remember the desired context without explicitly having to pass it when invoking a method:

var someObject = {
  a: 1,
  createDisplayValueCallback: function() {
    var that = this;
    return function() {
       console.log(that.a);
    }
  }
}

var someOtherObject = {
  a: 2,
  listener: null,
  registerListener: function(listener) {
    this.listener = listener;
  },
  callListener: function() {
    this.listener();
  }
}

someOtherObject.registerListener(someObject.createDisplayValueCallback());
someOtherObject.callListener();


As you see this time the code for someOtherObject is entirely unchanged. Instead we created the method createDisplayValueCallback() which returns a function that does what displayValue() did before. Notice that it operates on that (rather than this), which is defined as a local variable in createDisplayValueCallback() and assigned the value of this. This time the lexical scoping does work as expected, so when the function returned by createDisplayValueCallback() is called, that will be the same as when assigned in createDisplayValueCallback().

You could even add the following method to someObject, if you wanted to leave the possibility to call displayValue() as before:

  displayValue: function() {
    this.createDisplayValueCallback()();
  }


Note that this will again only work if called with the correct invocation context.
 
Of course, there are cases (like this example) where this approach does decrease understandability. But it might be the only chance to get things to work the way you want if you have to rely on callbacks from other libraries. And there are also cases where it looks a little better - like this snippet:

...
  init: function() {
    var that = this;
    this.listenee.registerListener(function() {
      that.doSomething();
    });
  },

  doSomething: function() {
    ...
  } 
...
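Here is a runnable version of that snippet as a sketch. The listenee object with its fire() method and the widget object are stand-ins I made up for "some library that stores a callback". I also added a variant using Function.prototype.bind (available since ECMAScript 5), which is not one of the two workarounds described above but achieves the same thing without the that variable.

```javascript
// Hypothetical stand-in for a library that stores a callback and calls it later.
var listenee = {
  listener: null,
  registerListener: function (listener) {
    this.listener = listener;
  },
  fire: function () {
    this.listener(); // invoked with listenee as this
  }
};

var widget = {
  listenee: listenee,
  calls: 0,

  // The approach from the snippet above: remember this in a local variable.
  init: function () {
    var that = this;
    this.listenee.registerListener(function () {
      that.doSomething();
    });
  },

  // Alternative: bind returns a new function whose this is fixed to widget.
  initWithBind: function () {
    this.listenee.registerListener(this.doSomething.bind(this));
  },

  doSomething: function () {
    this.calls = this.calls + 1;
  }
};

widget.init();
listenee.fire();
widget.initWithBind();
listenee.fire();
console.log(widget.calls); // 2
```

In both cases doSomething() runs on widget, although the call itself happens inside listenee.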

Conclusion


Be aware of scoping in JavaScript - especially when working with objects. And of course: make sure you didn't do it the wrong way by writing meaningful unit tests with a high coverage!

Did you find this blog post useful? Did I write crap? Any other solutions? Questions? Please let me know!

Friday, November 1, 2013

Preserving Knowledge

During my years of working as a software developer I found that having up-to-date knowledge is invaluable. It's quite a challenge to keep an eye on the rapidly evolving technologies and to select and actually learn what is important for you. But there's another challenge that is easily overlooked: preserving the knowledge. In my experience, things you learn but don't reactivate regularly - for example by using them, talking or thinking about them - tend to vanish more quickly than you'd like them to.

In this blog I want to write about the things I learned, having the following aims in mind:

  1. Persisting Knowledge. Writing something down might help me structure and therefore remember things. If that doesn't work I can still look up whatever I wrote!
  2. Sharing Thoughts. Maybe you will find this blog useful or inspiring.
  3. Getting Feedback. Let me know what you are thinking - I appreciate any new input!