No more corruption on the RPi

I talked in a previous post about corruption problems on my RPi’s SD-Card. I was told that 4.65V is very low for the RPi and that it was probably the cause of the frequent corruption of the SD-Card. Unfortunately each new peripheral drops the voltage rather quickly on the RPi. Here are the voltages after plugging in each one (cumulative):

  • None: 4.80V
  • Ethernet (internal – smsc95xx): 4.77V
  • Ethernet (external – asix): 4.66V
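To make the cost of each peripheral explicit, here is a small Python sketch that turns the cumulative readings above into per-step drops. The 4.75V threshold is the lower bound of the usual 5V ±5% USB supply tolerance; treating it as the limit for the RPi is an assumption on my part.

```python
# Cumulative readings from the list above, in volts.
readings = [
    ("None", 4.80),
    ("Ethernet (internal, smsc95xx)", 4.77),
    ("Ethernet (external, asix)", 4.66),
]

# Assumption: lower bound of the 5V +/-5% USB supply tolerance.
MIN_VOLTS = 4.75


def drops(readings):
    """Return (label, volts, drop since the previous step) for each reading."""
    result = []
    previous = None
    for label, volts in readings:
        drop = 0.0 if previous is None else round(previous - volts, 2)
        result.append((label, volts, drop))
        previous = volts
    return result


if __name__ == "__main__":
    for label, volts, drop in drops(readings):
        status = "ok" if volts >= MIN_VOLTS else "below minimum"
        print(f"{label}: {volts:.2f}V, drop {drop:.2f}V, {status}")
```

The external adapter alone costs 0.11V, almost four times as much as the internal one, and it is the step that pushes the board under the assumed minimum.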

So my first idea was to limit the stress on the SD-Card. I measured the accumulated IO of various tasks with iotop -aoP. It turns out that most tasks accounted for only a handful to a hundred kilobytes. These tasks are sporadic (dhcpd, rsyslogd and jbd for the most part) and at the end of the day accounted for less than 20MB of reads/writes on the SD-Card.
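If iotop is not at hand, a rough equivalent can be sketched in Python by reading the per-process counters that Linux exposes in /proc/<pid>/io. This is not how iotop itself works, and the counters run since each process started rather than since the measurement began, so take it as an approximation. It assumes a kernel with task IO accounting enabled and needs root to see other users' processes.

```python
import os


def parse_proc_io(text):
    """Parse the contents of /proc/<pid>/io into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters


def disk_io_by_pid(proc_root="/proc"):
    """Return {pid: (name, read_bytes, write_bytes)} for readable processes."""
    totals = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "io")) as f:
                io = parse_proc_io(f.read())
            with open(os.path.join(base, "comm")) as f:
                name = f.read().strip()
        except OSError:
            # The process exited, or we lack permission to read its counters.
            continue
        totals[int(entry)] = (name, io.get("read_bytes", 0), io.get("write_bytes", 0))
    return totals


if __name__ == "__main__":
    # Show the ten heaviest writers to storage.
    rows = sorted(disk_io_by_pid().values(), key=lambda r: r[2], reverse=True)
    for name, read_bytes, write_bytes in rows[:10]:
        print(f"{name:<20} read {read_bytes // 1024:>10} KiB  write {write_bytes // 1024:>10} KiB")
```

The read_bytes and write_bytes fields count traffic that actually reached the storage layer, which is what matters for SD-Card wear.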

On the other hand an update (apt-get) accounts for around 80MB of writes. That’s a lot, combined with increased CPU usage that further contributes to the voltage drop. With apticron running an update at least once a day, it’s no wonder that the SD-Card got corrupted so quickly.

So the first fix was to put apt's state and cache into a tmpfs, with the following in /etc/fstab:

tmpfs /var/lib/apt   tmpfs noatime,nosuid,nodev,noexec,mode=755 0 0
tmpfs /var/cache/apt tmpfs noatime,nosuid,nodev,noexec,mode=755 0 0

We also don’t want packages to fill the cache, so the cache is emptied after each package installation / upgrade. This goes in /etc/apt/apt.conf.d/70no-cache:

DPkg::Post-Invoke { "/bin/rm -f /var/cache/apt/archives/*.deb || true"; };

This way an apt-get update no longer touches the SD-Card. As a bonus, updates are also faster. However there are two disadvantages with this solution:

  • You cannot upgrade right after a reboot, because the tmpfs starts out empty. You need to rebuild the cache with apt-get update first. But that is not a problem as apticron does so automatically once a day.
  • The number of packages you can install / upgrade at once depends on their size and the size of the tmpfs. But it is OK if you frequently upgrade your system on stable.
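The second limitation can be checked before a large upgrade. Below is a minimal Python sketch that compares a download size against the free space left in the tmpfs. The function names and the 10% safety margin are hypothetical choices of mine; the download size would be the figure apt-get prints in its "Need to get ... of archives" line.

```python
import shutil


def fits(needed_bytes, free_bytes, margin=0.10):
    """True if needed_bytes fits in free_bytes while keeping `margin` spare."""
    return needed_bytes <= free_bytes * (1.0 - margin)


def cache_has_room(needed_bytes, path="/var/cache/apt"):
    """Check the free space of the filesystem holding `path` (the tmpfs here)."""
    return fits(needed_bytes, shutil.disk_usage(path).free)


if __name__ == "__main__":
    needed = 80 * 1024 ** 2  # e.g. an 80MB upgrade
    print("fits" if cache_has_room(needed) else "upgrade in smaller batches")
```

Keep in mind that a tmpfs defaults to at most half of the RAM and that its contents live in memory, so a big download competes with everything else running on the board. The size= mount option can set a tighter cap.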

But that is probably not enough to avoid corruption on the SD-Card. Or at least this is what I thought. So the other solution was to find a way to raise the voltage from 4.66V to a reasonable value. The F3 polyfuse that protects the board has a noticeable resistance causing a voltage drop of ~0.2V.

The F3 polyfuse (green) is located at the bottom right of the board, next to the zero ohm resistor.

The F3 polyfuse (green) is located at the back bottom right of the board.

I soldered over the polyfuse, bypassing it, and the voltage rose to 4.85V. I have not had any corruption problem for more than a month. Fantastic!
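As a sanity check, the numbers line up with Ohm's law. The Python sketch below assumes that the 4.66V reading is the baseline and that the board draws about 0.7A; the current is an illustrative guess, not something I measured, so the resulting resistance is only an order of magnitude.

```python
def fuse_resistance(drop_volts, current_amps):
    """Ohm's law: R = V / I."""
    return drop_volts / current_amps


def voltage_after_bypass(measured_volts, fuse_drop_volts):
    """Board voltage expected once the fuse no longer drops anything."""
    return measured_volts + fuse_drop_volts


if __name__ == "__main__":
    ASSUMED_CURRENT = 0.7  # amps, hypothetical
    print(f"fuse resistance: {fuse_resistance(0.2, ASSUMED_CURRENT):.2f} ohm")
    print(f"expected voltage: {voltage_after_bypass(4.66, 0.2):.2f} V")
```

Removing a ~0.2V drop from 4.66V predicts about 4.86V, which is close to the 4.85V I measured.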

However remember that the fuse is there for a reason. It limits the maximum amount of current powering the board. Without the polyfuse the RPi can draw more current than the PSU is rated for (which can happen for example if you short the GPIOs). So it might be a better idea to try another power supply or USB cable. I just like to live dangerously. I also protected those RPis with a case. Note that the RPi B+ and newer have new power supply circuitry with a lower voltage drop. So all of this may not be needed.

Constant SD-Card corruption on the RPi

Our home servers broke. Here we are again.

I spent weeks of my time, countless evenings up to 4AM and entire weekends over months, trying to design and configure our reborn home-servers and gateways.

And it was neat.

  • DNSSEC all the way down
  • RPC across the nodes
  • Easy configuration
  • Caching and stuff
  • Automatic tests

It took me a lot of time to assemble all of this in something that I liked. And to document everything so that we could easily install a new node from scratch.

I installed two nodes and it worked well for several weeks. Then, a week ago or so, I started to see corruption on the first node. And by corruption I mean random garbage in a lot of binaries and libraries. Exec format error at every corner. At this point it was completely broken and useless so the only option was to reinstall it.

So I used a new SD-Card, changed the power supply and reinstalled everything last weekend. Just finished today and also fixed bugs in some of our scripts. Had to search for a package on the second node which at this point was still in a pretty good shape.

$ apt-cache
zsh: exec format error: apt-cache
$ su
zsh: exec format error: su

Dang! So there goes another weekend that I will spend reinstalling the thing. And who knows how long until the first node gets corrupted again.

Checked the TP1-TP2 voltage, 4.65V, probably because of the second USB Ethernet adapter. I tried to limit the amount of writes on the SD-Card. No heavy writers, no swapping, no overclocking.

So I must be doing something wrong, right? Right?! The Raspberry Pi can't be that unreliable. I wonder how many power supplies and SD-Cards I will have to buy and try until, by sheer luck, I do not have to reinstall everything in the following three months or so.

I ran into this problem years ago. And now it seems that I will run into the same problem over and over again. Any recommendation is welcome of course. Though to be honest, for now, I just want to fly the damn thing across the room.