
In Part 1, we had a quick introduction to tapes and tape drives and why you would choose one for your backups. In this part, we talk about actually using tapes to create a backup strategy using simple scripting.

Of course, there are plenty of readily available tools that might suit your needs. Writing your own does have a few advantages though: first, it keeps things simple and highly customized, and second, when things go wrong, you'll probably have a better idea of why the damn thing isn't working!

A look at how tapes work

If this is your first time dealing with tapes (as it was for me), there are a few prerequisites.

Tools

Tape operations are carried out via the mt tool. Data is written with tar. Both of these are probably already installed on your system.

Writing data to tapes

Do you remember the good old days of cassette players? Tapes are similar. A magnetic head reads and writes data from a magnetic ribbon spooled in an enclosure. With that in mind, there are a few operations that you would do frequently (a combined example follows the list):

  1. rewind: rewinds (of course!) the tape and points the tape head to the beginning of the magnetic ribbon. Example command: mt -f /dev/nst0 rewind

  2. forward: moves the tape forward count file marks. Every time data is written, a file mark is set at the end. Let's say you write some data to the beginning of the tape: tar cvf /dev/nst0 backupdir. Rewind the tape as above. Now, to move forward past one file mark: mt -f /dev/nst0 fsf 1

  3. erase: a full erase is a really slow process and could take hours (if not days) for larger tapes. However, you can also do a short erase: mt -f /dev/nst0 erase 0.
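
Putting these together, a typical session could look like the sketch below. This assumes your drive shows up as /dev/nst0 (the non-rewinding device node); backupdir and anotherdir are just example directories.

# Write an archive at the current tape position (the non-rewinding
# device keeps its position between commands)
tar -cvf /dev/nst0 backupdir

# Go back to the beginning of the tape
mt -f /dev/nst0 rewind

# Skip past the first archive's file mark, e.g. to append a second archive
mt -f /dev/nst0 fsf 1
tar -cvf /dev/nst0 anotherdir

# To read the first archive back later: rewind, then list (or extract) it
mt -f /dev/nst0 rewind
tar -tvf /dev/nst0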

Tar and incremental backups

tar has a handy feature that lets you do incremental backups and the workings are really simple. Let's look at an example:

  1. tar -C /home --listed-incremental=diff.snar -clpMvf /dev/nst0 data. This is what we call a Level 0 backup. diff.snar is special: it records the state of all the files that were added to the archive.

  2. Next, let's say you add file.txt to the folder data and run the above command again. The only file that would be added to the archive is file.txt. Moreover, diff.snar would be overwritten to reflect the new state of the folder. This would be a Level 1 archive.

Obviously, if you want to keep a record of all the backups, you wouldn't want to overwrite diff.snar but rather have something like this (a concrete sketch follows the list):

  • diff0.snar: level 0 backup
  • diff1.snar: level 1 backup, and so on...
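
Concretely, that could look like the sketch below (flags trimmed to the essentials; the /home prefix and the data folder are carried over from the example above):

# Level 0: archive everything under /home/data and record its state in diff0.snar
tar -C /home --listed-incremental=diff0.snar -cpvf /dev/nst0 data

# Level 1: start from a copy of the level 0 snapshot so diff0.snar is preserved,
# then archive only what changed since level 0
cp diff0.snar diff1.snar
tar -C /home --listed-incremental=diff1.snar -cpvf /dev/nst0 data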

Backup Strategy

With all this quick preliminary information, we can try an incremental backup strategy as follows:

  1. Maintain two sets of full backup tapes and two sets of incremental backup tapes.
  2. Create a full backup at the start of every cycle: a cycle could be a month, two months, a quarter or whatever you prefer.
  3. Until the beginning of the next cycle, perform incremental backups.
  4. At any point in time, you should always have a backup set that contains the last cycle's full backup as well as its incremental backup tape(s).

The tape utility script illustrates the idea. To perform a full backup, you would run something like:

tapeutility.sh -d /dev/nst0 -F -p /etc/tapeutility/folders.txt where “-F” does a full backup of the folders listed in “folders.txt”.

For the next run, to create an incremental backup, you would run: tapeutility.sh -d /dev/nst0 -I -p /etc/tapeutility/folders.txt where “-I” does an incremental backup.

Please take a look at the script to see how the metadata file is determined for incremental backups, as well as the other basic tape-maintenance features it offers.
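
For reference, the core of such a script can be quite small. Below is a hypothetical sketch of the idea, not the actual tapeutility.sh: it assumes snapshot files live under /etc/tapeutility, that each run appends one archive to the end of the tape, and that a full backup starts a fresh cycle at the beginning of the tape.

#!/bin/bash
# Hypothetical sketch of a full/incremental tape backup wrapper.
# Usage: tapeutility.sh -d /dev/nst0 -F|-I -p folders.txt
set -euo pipefail

MODE="" DEVICE="" LIST=""
SNARDIR=/etc/tapeutility

while getopts "d:p:FI" opt; do
    case "$opt" in
        d) DEVICE="$OPTARG" ;;
        p) LIST="$OPTARG" ;;
        F) MODE=full ;;
        I) MODE=incr ;;
    esac
done

if [ "$MODE" = full ]; then
    # New cycle: drop old snapshots, start a fresh level 0 at the start of the tape
    SNAR="$SNARDIR/diff0.snar"
    rm -f "$SNARDIR"/diff*.snar
    mt -f "$DEVICE" rewind
else
    # Next level = number of existing snapshots; copy the latest one as its base
    level=$(ls "$SNARDIR"/diff*.snar | wc -l)
    SNAR="$SNARDIR/diff${level}.snar"
    cp "$SNARDIR/diff$((level - 1)).snar" "$SNAR"
    # Position the head after the archives already on the tape
    mt -f "$DEVICE" rewind
    mt -f "$DEVICE" fsf "$level"
fi

# Back up every folder listed in the file as a single tar archive
xargs -a "$LIST" tar --listed-incremental="$SNAR" -cpvf "$DEVICE"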

Wrap-up

I presented a simple way to use tapes for backups. Using a combination of full and incremental backups, and maintaining two sets of tapes, we get a reliable backup of our data that you could combine with a RAID-style setup for long-term, reliable data storage.

#backups #bash #diy

Part 2 of my Poor man's UPS has not shown up and probably never will; so, let me write about something I did recently.

Unless you were born around the time when people were still talking about the differences between the PC-AT and PC-XT and BBC Micros were the cutest thing around, tape storage must seem pretty antiquated to you. Allow me to contradict you, though, and tell you how cool tape storage is!

Why tapes?

Tape storage is common sense: it's how you would design a storage solution if you started with nothing. What does that mean? Well, data is pushed down a pipe sequentially; the imaginary picture of your data you have in your mind is exactly how tape stores it. Of course, magnetic media needs to be handled with care, but tape storage is designed to be scalable and stable, and you can have a functional backup system with simple UNIX tools. There are exceptions too, such as LTFS, which treats tape storage just like a regular file system. It's impressive, but it's also overkill to treat everything as a regular file system. Just like programming languages – everybody wants to write their own :)

LTO

If you are getting into tapes, you definitely want LTO (Linear Tape-Open) tapes and drives. The format is backed by a few big names and the technology still sees revisions; the latest is LTO-9, which can store a whopping 45 TB of compressed data on a single cartridge.

Now, when you are buying a tape drive, look for the latest LTO generation that you can afford (drives are backwards compatible). Also, the big-name drives are usually manufactured by IBM and rebranded (such as the Dell PowerVault that I own and that this post covers), and they are known to be very reliable. You will also notice that internal-mount drives are usually cheaper than external drives; but beware, internal drives need proper ventilation and cooling. A safe bet is to always go for a drive with an external enclosure.

Connectivity

Almost every external tape drive I have seen in the wild and on the used market uses SAS. So, it's a must that you have an unused SAS port on your system. Nothing exotic; something like an LSI SAS 3008 would do. Older cards should be fine too: they are cheaper, but maybe a bit less reliable.

Getting a used tape drive

Now that the basics are done, it's time to move on to getting one. Your best bet is eBay auctions of enterprise equipment, but there are two things I would stress: first, try to get your hands on a good brand (Dell/IBM or maybe HP), and second, unless there's a test report included, make sure there's a return period within which you can test the system.

Testing it out

So, you have got one and set it up; it's time to make sure everything's OK.

Self tests

All drives should have some form of self test. You will most likely need an empty cartridge for the tests. They can be run either via a combination of button presses or via an Ethernet interface (the PowerVault, at least, has one). Try to run all of the diagnostics and make sure everything passes.

ITDT

IBM's Tape Diagnostic Tool (ITDT) is also handy for running diagnostics and firmware upgrades. There's a good chance your drive is supported (remember how I mentioned that IBM is the OEM for most drives?) and, if it is, you can run many of the diagnostics that the tool offers.

Firmware upgrades

Dell, for example, provides firmware for many of its drives, but the requirements for running the executables are somewhat esoteric. However, as long as you have a way to extract the firmware image, and ITDT supports your drive, you can use it to upgrade the firmware. For example, with Dell firmware, you can do something like:

./Tape-DrivesFirmware64VG4LNM571_A07.BIN --extract firmware

which will extract the firmware image to the firmware/payload directory.

Once you have the firmware file (*.fmrz), you can use ITDT to install it.

Using your tape drive

And that's it! Now you can use standard mt and tar commands to read and write data from/to your tape drive. You can also use well-known backup tools if you wish. In the next part of this post, we will script a scalable backup strategy for our data using readily available UNIX commands such as tar, mt and family. Stay tuned!

As a follow-up to this post that I wrote a while back, one of the things I have been thinking of doing is to add a reliable uninterruptible power supply. The setup is powered by a typical run-of-the-mill power bank that supports passthrough. However, these batteries typically rely on a mechanical relay, which introduces a short break when the power switches between battery and mains supply. The unfortunate outcome is a hard power cycle of the RPi during power cuts, which are pretty common in this part of the world! So, without further ado, let's look at our options.

A diode setup

Let's consider this simple circuit.

Diode as a forward switch

V1 simulates a pulse to show a sudden voltage drop to 0 (a blackout). D1 and D2 are regular silicon diodes with a forward voltage of 0.7 V. While the circuit protects the battery from getting damaged when mains is powering the RPi, the forward drop brings the output voltage down significantly. We could replace these with germanium or Schottky diodes, which have lower forward voltage drops. However, those come at the expense of higher reverse leakage currents and lower stability with temperature variations. Let's try something else.

A single MOSFET setup

A MOSFET can act as a switch with a lower forward voltage drop. Let's modify our original circuit and include a P-channel MOSFET.

MOSFET as a switch

There are two issues here. First, our diode problem still remains, and second, M1's drain-to-source path will try to charge the battery, which may be undesirable. To understand why we need the diode, let's take a look at how the MOSFET operates. The P-channel MOSFET stops conducting when a positive gate voltage is applied. Now, if V1 were to turn off, M1 turns on and OUT is now sourced from the battery. In the absence of the diode, the gate would be at the same potential as OUT, which would turn M1 off!

Could we replace the diode D1 with another MOSFET? Let's take a look at a simplified circuit that does that.

Rotated MOSFET setup

There's an important thing to point out: the MOSFETs are rotated, meaning the source is connected to the point where the drain would normally be connected and vice versa. So, current always flows from drain to source. In other words, the semiconductor acts more as an off switch and simulates an ideal diode. But does it really work? When V1 is on, there's a positive gate voltage at M2, so current cannot flow into V2 and damage it. When V1 is 0, M2 is on and conducts in both directions.

We are approaching ideal-diode behavior, but there's still a minor hiccup. When V2 > V1, the battery will start discharging even if V1 is on! The solution is to add another MOSFET back-to-back with M2, but rotated. Yet another issue in the previous circuit is that M1 is always on, which might let current flow into it from V2, potentially damaging V1. The solution to that is to turn M1 on only when V1 is powering the circuit, which can easily be achieved with the help of a differential pair. The final circuit reflects these changes.

Final Circuit

As mentioned above, M2 and M3 are the MOSFETs connected back-to-back, and Q1 and Q2 form a differential pair. When V1 is active, Q1 conducts and M3 is off. This prevents current from flowing out of V2. When V1 is off, Q2 conducts first, which in turn turns off M1. The battery now powers the circuit. Let's take a look at a few use cases:

  1. V1 = 5 V > V2 = 4.8 V (Graph 1): here, Vout is V1 minus the forward voltage drop, so we are good.

  2. V1 = 4.8 V < V2 = 5 V (Graph 2): even though V1 < V2, V1 still takes precedence.

  3. V1 simulates a blackout, on/off/on (Graph 3): when V1 is on, it drives the output. V2 takes over from t=2 until t=6.

In the next part, we will take this circuit out for a drive in the real world and/or investigate existing solutions that already do this job, such as the CAT6500 (now obsolete!).

#tech #diy #electronics #mosfets

A while back, my rusty Sans Digital TowerRAID gave up. Honestly, it had not been a very expensive investment, presumably at the cost of reliability. Nevertheless, I got a few good years out of it. From the looks of it, the power supply had failed, and although I could have replaced the power supply board, I decided to venture out and future-proof my storage requirements.

Upgrading from a 4-slot JBOD enclosure to an 8-disk enclosure

Pretty much everything out there costs more than $500 for an 8-slot JBOD. Most of them don't have decent reviews, and the ones that do are usually more expensive. That led me to the other option.

DIY

I wanted to explore this option before I splurged on a brand-name enclosure. Luckily, there were many helpful resources^1^2 available that led me to believe this is indeed a possibility. Below, you will find a BOM of what went into my DIY JBOD. The heart of the device is a RAID expander. Of course, you also need to invest in a decent enclosure that houses everything.

RAID Expander ~$60

The item we are looking at is a discontinued Intel RES2SV240, which you can still find on eBay and in some other stores. It was more than enough for my needs: it supports SAS-2 and has 24 ports. Four ports (one socket) connect to the cable that, in turn, connects to the SAS initiator. The rest can be connected to disks, so you can theoretically plug in 20 disks.

Power Board ~$70

This one's optional in my opinion, but it does make the whole setup a little more polished. The one that I used is a SuperMicro CSE-PTJBOD-CB2, again pretty easily available on eBay. What it does is let you use the enclosure's power switch to control power to the system, which would not have been possible otherwise without a motherboard.

Mini SAS SFF-8088 to SFF-8087 Adapter ~ $25

This will be our portal to the outside world. The SFF-8088 cable (which I already have) connects the expander to the initiator on the server. The one that I got (CableDeconn) conveniently fits into a full-height PCI slot on the enclosure.

SFF-8087 to 4 SATA ~$20

This goes from the RAID expander to the backplane in the enclosure that we will use. Since I plan to use 8 disks, I got two of these cables.

SFF-8087 to SFF-8087 cable ~$8

This cable connects the expander on one end and the SFF-8088 to SFF-8087 adapter on the other end.

Power supply ~$50

Nothing special here; I used a 430 W 80+ ATX supply, which is more than you would need.

Enclosure ~$160

This was the most expensive buy of the project, but it's worth it. I decided on a SilverStone CS380B, which doesn't have stellar reviews; to be honest, most complaints were about unsatisfactory ventilation, but I was sure I would be fine because I wouldn't be installing a motherboard in it.

Fitting everything together

The enclosure already has a backplane for the disks. The RAID expander card as well as the SFF-8087 to SFF-8088 adapter both went into slots on the enclosure where full-height cards would usually go. I had to drill some holes so that the power board could stay in place.

Here's a pic of the innards after everything has been fixed in place:

Enclosure

Total cost and troubleshooting

The total cost comes out to ~$400, which is still a good price for a system that can house more than 8 disks (the SilverStone has internal bays for a few more).

There's not much here that could go wrong; everything's pretty much plug and play. The only thing worth noting is that the expander card has been discontinued and there are probably not a lot of them out there, so you might end up getting a dead card. If things don't work out as expected, just blame it on the card and get a replacement! :)
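
If you do want a quick sanity check before blaming the card, a couple of stock commands go a long way. A minimal sketch (device names will differ on your system):

# Did the kernel see the HBA and the expander behind it?
dmesg | grep -i -E "mpt|sas|expander"

# List SCSI devices: the expander shows up as an enclosure ("enclosu") entry
# and each disk behind it gets its own disk entry
lsscsi -g

# Spot-check the health of one of the disks (/dev/sdb is just an example)
smartctl -a /dev/sdb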

My setup has been going strong for a few months now. I am glad I went this route!

#tech #diy #jbod

My Dad had a specific set of requirements for a security camera he wanted for our home back in India. While researching the options, on whether to buy one or to build something, I stumbled upon many builds based on the Raspberry Pi. Most successful builds run Motion on top of an RPi board, or maybe even motionEye for a friendlier UI. This post summarizes the issues that I/you are likely to face and what I did about them.

Underwhelming hardware

I used an RPi 3 B board, which has a 1.2 GHz quad-core ARM processor. For processing a video stream and running the motion detection daemon, it's not really very capable, and you would end up with stuck/unusable frames on your stream. One of the things that makes a huge difference is the incoming stream's frame rate and resolution. I got the best results by sliding the incoming frame rate down to as low as 10 fps on the camera that I am using.

B vs B+

The B+'s advantage is more on the I/O side, and it really doesn't make much of a difference in processing power when it comes to the video stream. On the other hand, the B is more battery friendly, which was a major requirement in my setup owing to the frequent power cuts associated with Indian summers. Overclocking, too, isn't worth it if you consider the battery drain (as much as 20% faster) compared to any noticeable performance gain.

Backup power

As mentioned above, this was an important requirement. I used a 20000 mAh battery that supports passthrough. On the downside, when passthrough triggers, there's a momentary disconnect in power that restarts the camera and the RPi, which is undesirable, but the small downtime is acceptable.

Network

One of the requirements was failover to a backup network, but jumping back to the main network once it's back up. A reverse tunnel to a public IP takes care of SSH and HTTP access and can easily be scripted as well (see the sketch below). HTTPS is achieved by setting up an nginx reverse proxy on the public-facing system and integrating it with Let's Encrypt.
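
Here's a minimal sketch of what that tunnel script could look like. The host name, user and ports are assumptions, and it assumes motionEye is listening on its default port 8765; autossh is one convenient way to keep the tunnel alive across network switches.

#!/bin/bash
# Keep a reverse SSH tunnel up so the Pi stays reachable from a box with a
# public IP, regardless of which local network the Pi is currently on:
#   public-host:2222 -> the Pi's sshd (port 22)
#   public-host:8765 -> motionEye's web interface (default port 8765)
exec autossh -M 0 -N \
    -o "ServerAliveInterval 30" \
    -o "ServerAliveCountMax 3" \
    -o "ExitOnForwardFailure yes" \
    -R 2222:localhost:22 \
    -R 8765:localhost:8765 \
    tunnel@public-host.example.com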

Motion detection

False positives are a major challenge, and I got to a good compromise with a mix of a few things:

  • Setting up a manual mask. This is easy to do with the motionEye HTTP interface.
  • Using a despeckle filter. Take a look at this article for a nice write-up by the author. After experimenting with several combinations, EedDl gave the best results (which also happens to be the recommended starting point).
  • Experimenting with thresholds. I used the threshold_maximum parameter to cap the maximum pixel change. A script changes the threshold value based on input from an LDR, similar to this setup; a sketch of the idea follows.
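
The sketch below illustrates the last point. It assumes Motion's webcontrol interface is enabled on localhost:8080 and allowed to change configuration, and read_ldr.py is a placeholder for however you sample the LDR on the Pi; the actual script I use differs in the details.

#!/bin/bash
# Pick a motion detection threshold based on ambient light and push it to
# Motion via its webcontrol interface (assumed to be listening on :8080).
# read_ldr.py is hypothetical: it should print a number that grows with darkness.

light=$(python3 read_ldr.py)

if [ "$light" -gt 1000 ]; then
    threshold=4000    # dark, noisy frames: be less sensitive
else
    threshold=1500    # daylight: be more sensitive
fi

curl -s "http://localhost:8080/0/config/set?threshold=${threshold}" > /dev/null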

Usability

The system is easy to use/configure with the motionEye HTTP interface, but to make it a little more interesting, I used some NFC tags to enable/disable motion detection. This can easily be done with Tasker along with its NFC plugin. This script takes care of syncing up the config file with the current state of motion detection.

#thoughts #tech #diy #rpi #bash

Large file transfers with the Media Transfer Protocol (MTP) are very mysterious. The most common search results are about Android users having difficulties transferring large files, and while this has nothing to do with the MTP specification itself and more to do with the largely deficient Android MTP implementation, MTP typically bears the users' wrath. I believe that part of the stigma associated with MTP has to do with Microsoft being instrumental in its inception. There are other reasons too; for example, the name could have been better! (A person, apparently a storage expert, was actually unaware that the Media Transfer Protocol, while focused on media files, can handle other file types as well!) As to why Android decided to make MTP the default transfer protocol is something we can get into in another post, but for now, let's get back to the curious case of large files.

The MTP spec could use some clarity in the section on writing large files. The fundamental issue with writes is that the MTP object metadata uses a 32-bit unsigned integer for the file size, which caps the size it can express. The spec mentions that for a file size > 4 GB, the size field should be set to 0xFFFFFFFF. However, it doesn't clearly specify how to manage the transfer itself (when/how to stop!).

So, can you transfer a > 4 GB file with MTP? Yes, you can, and there are two ways to implement it. The first is to use an ObjectPropList, specifically, to support SendObjectPropList. This operation has an 8-byte field for the object size. However, ObjectPropLists may not be supported by a lot of initiators, and even if they are, initiators may decide not to use them for writes.

The second method relies on the transport mechanism that MTP itself mostly relies on: USB. It arguably even makes the file size field in the object metadata optional. USB guarantees that the end of the data phase will be marked by what is called a short packet: a data packet whose size is less than the wMaxPacketSize specified by the endpoint descriptor, basically the size of the data buffer in the endpoint. The snippet to handle the write operation in the responder could be something like:

  total_size += incoming_datasize;
  if (incoming_datasize < wMaxPacketSize) {
    /* This is the last packet in this data phase */
    write_buffer_to_file();
    free_buffer();
  }

There's a small hiccup here, though. Typically, USB controllers will try to fill up their buffer before handing the data over, so the above if condition may become true even before the actual end of the data phase and mark an incorrect end. Fixing this is easy and relies on the fact that the incoming data size will be a multiple of 64 unless it's the end of the data phase:

  total_size += incoming_datasize;
  if ((incoming_datasize % 64) != 0) {
    /* This is the last packet in this data phase */
    write_buffer_to_file();
    free_buffer();
  }

#tech #programming #qemu #mtp

As a newbie programmer, recursion took some time getting used to, and once something worked, it usually felt like magic. So, when working on an evening Lisp project, nostalgia kicked in and encouraged a little bit of exploration!

Let's consider a program that computes the nth power of a number. I usually tend to think in C (just like my thought process is mostly in my mother tongue, which is not English), and it looks as simple as:

int nth_power(int number, int n)
{
    if (n == 1)
        return number;
    else
        return number * nth_power(number, n - 1);
}

The Lisp (elisp) version is easy enough and almost identical:

(defun nth-power (number n)
  (let* ((result number))
    (if (eq n 1)
        number
      (setq result (* result (nth-power number (- n 1)))))
    result))

This works for small numbers but let's try a large number:

(nth-power 200 10000)
Debugger entered--Lisp error: (error "Lisp nesting exceeds `max-lisp-eval-depth`")
...

Ok, elisp gives up but what about SBCL:

(nth-power 200 10000)
(SB-KERNEL::CONTROL-STACK-EXHAUSTED-ERROR)

However, if we slightly modify the program to look like this:

(defun nth-power (number n acc)
  (let ((result (* number acc)))
    (if (eq n 1)
        result
      (nth-power number (- n 1) result))))

The result, calling (nth-power 200 10000 1), is:

Some huge number

The point is, there's no more stack overflow. How so?

Notice how the recursive call is the last statement in the above definition. That enables the compiler to do a tail-call optimization. In simple terms, it means that the stack frame from the previous call can be reused, and the stack won't grow over time.

A disassembly makes it clearer. With the first example program, we have:

0] (disassemble 'nth-power)
; disassembly for NTH-POWER
					; Size: 130 bytes. Origin: #x1001F8EBC9
...
; C18:       488B0559FFFFFF   MOV RAX, [RIP-167]              ; #<SB-KERNEL:FDEFN NTH-POWER>
; C1F:       B904000000       MOV ECX, 4
; C24:       48892B           MOV [RBX], RBP
; C27:       488BEB           MOV RBP, RBX
; C2A:       FF5009           CALL QWORD PTR [RAX+9]
; C2D:       480F42E3         CMOVB RSP, RBX
; C31:       488B75E0         MOV RSI, [RBP-32]
; C35:       488BFA           MOV RDI, RDX
; C38:       488BD6           MOV RDX, RSI
; C3B:       41BB80030020     MOV R11D, 536871808             ; GENERIC-*
; C41:       41FFD3           CALL R11
; C44:       488BF2           MOV RSI, RDX
; C47:       EB98             JMP L0
; C49:       CC10             BREAK 16                        ; Invalid argument count trap

Notice how recursion is implemented with the call instruction on line C2A. This signifies switching to a new stack frame. In contrast, with our modified nth-power program, we have:

0] (disassemble 'nth-power)

; disassembly for NTH-POWER
; Size: 105 bytes. Origin: #x1001E67663
...
; 98: L0:   488B55E8         MOV RDX, [RBP-24]
; 9C:       BF02000000       MOV EDI, 2
; A1:       41BB10030020     MOV R11D, 536871696              ; GENERIC--
; A7:       41FFD3           CALL R11
; AA:       488BFA           MOV RDI, RDX
; AD:       488B5DD8         MOV RBX, [RBP-40]
; B1:       488B55F0         MOV RDX, [RBP-16]
; B5:       488BF3           MOV RSI, RBX
; B8:       488B0549FFFFFF   MOV RAX, [RIP-183]               ; #<SB-KERNEL:FDEFN NTH-POWER>
; BF:       B906000000       MOV ECX, 6
; C4:       FF7508           PUSH QWORD PTR [RBP+8]
; C7:       FF6009           JMP QWORD PTR [RAX+9]
; CA:       CC10             BREAK 16                         ; Invalid argument count trap

The call instruction has been replaced with a jmp on line C7, which means the stack won't keep growing as it did with the call instruction.

Note that gcc, when optimizing, will also try to optimize tail calls. It's easy to experiment with the optimization switches (-O0, -O1, -O2) and check the generated assembly code.

#lisp #programming

P2P and ready to be stolen!

Sometime in 2012, I made up my mind that I was done with my Bitcoin stash. I had a small collection, which at the time I thought wasn't much. Those were the good old days; I remember websites giving you free coins just for visiting them. Back to the point, I said to myself: “If I need them later, I will get them again, but for now, there's no sense in holding onto them.” So, off I went, transferring my coins to a [presumably] very reputable exchange to sell them for cash. Alas, before I could blink, I saw my account getting drained in small increments. It was almost like a scene from a popular CSI episode that nerds would make fun of for being too good to be true! Well, money was lost, and I tried to wipe the experience from my memory, somehow believing that it was just a matter of time before cryptocurrencies would be dead anyway. Boy, was I wrong!

Not only have many alt-coins of varied popularity sprung up with large communities thriving around them, they are also getting unusual attention, both good and bad, from researchers, financial organizations and oppressive governments. And while their fate is still a bit uncertain, news of breaches and stolen bitcoins is still not uncommon, which begs the question: are cryptocurrencies really solving one problem and introducing another?

Researchers are often so engrossed in solving a specific problem that they seldom look at the potential advantages of the bad-by-design (no pun intended) problem. And the cryptocurrency solution seems to fall into this category. Of course, anonymity and decentralization sound like a step forward, and they might be inherently desirable, but do they always work for the better? Probably not. One of the many reasons why you don't hear about major breaches at your local bank is that, by design, paper currency and the economy that encompasses it make transactions trackable. So, while they do expose your identity, which might be undesirable in the greater scheme of things, maybe a medium of exchange mostly benefits from being easily trackable and controlled by a central authority for it to work fairly. It would be difficult to ignore the evidence that paper money's long and successful history provides in support of this theory.

On the other hand, while the anonymity that digital currencies offer does sound appealing, it's worth pondering whether it also makes them more difficult to protect. Allow me to explain with an example: a regular bank transaction, by definition, requires an identifiable account holder. Even assuming that someone does hack into your account, there's a certain degree of responsibility that the exchange (read: the bank) has when it comes to protecting your investment. Of course, once alt-coins get widespread recognition and there are standards in place to protect users, this would be applicable to them too, but that's exactly my point! With conventional currency, it's impossible for just anyone to set up an exchange without proper security protocols in place to protect users. That's because the creator of the currency (the government, in most cases) has authority over its usage and disbursement. With the decentralized nature of digital currencies, on the other hand, light regulation has led to rampant fraudulent activity. If there's any consolation, it's that the attacks are on the associated services and not on the core software and protocol. But to reiterate my point, it's worth considering whether the large number of unpleasant heists associated with bitcoins is a direct consequence of the design intended to make them decentralized.

Understandably, it's easy to blame this line of thought on conservative bias. I do wonder, however, whether research and innovation and their applications are two entirely separate beasts, each worth analyzing very carefully through differently colored eyeglasses. With our constant desire to improve our lives, I believe this ideology should be universally applicable. As a different example, let's consider autonomous driving too! Advances in machine learning are all good, but can a machine's logic really replace the empathetic decision-making of a human at the time of a crisis? That's a topic for some other time!

#thoughts #tech #crypto
