2023-08-19

Two logical interfaces on one physical, with Netplan

In my home network, I have a server which I want to appear under two hostnames, mainly so I can later move the functionality associated with one of them around to other hosts. I'm just using my ISP-supplied broadband router/modem to manage the network, but it doesn't exactly bristle with configuration options to make this directly possible with, say, a DNS alias. Nevertheless, I want to stick with it, as other solutions might involve duplicating a lot of its functionality, or splitting it across multiple hosts, both of which introduce their own risks.

The router provides local DNS resolution (in the .home domain), and it honours the hostnames specified by DHCP requests. By presenting two interfaces to it, a single host can get two IP addresses and so two distinct names. Yes, it's ugly and hacky, but it's a solution within the constraints.

Approach

In this specific example, enp3s0 is the physical interface, and the second hostname is media-centre. The approach is to create two virtual interface pairs (faux0-faux0br and faux1-faux1br), connect one end of each (faux0br and faux1br) to a virtual bridge (br0), and connect this to the physical interface enp3s0. The other two ends of the pairs (faux0 and faux1) are now on the same Ethernet network, and running DHCP on them causes them to acquire distinct IP addresses, and registers them under distinct DNS names.
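
For illustration only (and glossing over the fact that enslaving enp3s0 to the bridge will disturb any address it already has), the transient equivalent with iproute2 looks roughly like this; the persistent configuration via systemd-networkd and Netplan follows below:

# One-off sketch, lost on reboot; the .netdev and Netplan files below are the persistent version.
sudo ip link add faux0 type veth peer name faux0br
sudo ip link add faux1 type veth peer name faux1br
sudo ip link add br0 type bridge
sudo ip link set faux0br master br0
sudo ip link set faux1br master br0
sudo ip link set enp3s0 master br0
sudo ip link set br0 up
sudo ip link set faux0br up
sudo ip link set faux1br up
sudo ip link set faux0 up
sudo ip link set faux1 up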

IPv4 ARP

For IPv4, it's essential to prevent the two interfaces from stepping on each other's toes over ARP, and a Server Fault answer shows how. Put this in your /etc/sysctl.d/local.conf (or create a numbered file for it, say 99-dualiface.conf):

net.ipv4.conf.all.arp_ignore=1
net.ipv4.conf.all.arp_announce=2
net.ipv4.conf.all.rp_filter=2

That will apply on boot, but you can apply it immediately with sudo sysctl -p /etc/sysctl.d/local.conf.

Creating virtual interface pairs

At the time of writing, and as far as I can tell, Netplan can set up bridges, but not the veth pairs used in the previous solution. This Ask Ubuntu answer explains how to do it another way. For our case specifically, create /etc/systemd/network/25-faux0.netdev:

[NetDev]
Name=faux0
Kind=veth
[Peer]
Name=faux0br

Create /etc/systemd/network/25-faux1.netdev similarly:

[NetDev]
Name=faux1
Kind=veth
[Peer]
Name=faux1br
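
Assuming systemd-networkd is the renderer (the default for Netplan on Ubuntu Server), you can get it to pick up the new .netdev files and check that the pairs exist, or just rely on the reboot suggested later:

sudo systemctl restart systemd-networkd
networkctl list | grep faux
ip -br link show type veth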

Connecting with a bridge

We create and define the bridge in the /network/bridges section of a YAML file in /etc/netplan/. I've called this one 99-bridgehack.yaml:

network:
  ethernets:
    enp3s0:
      dhcp4: false
    faux0:
      dhcp4: true
    faux0br: {}
    faux1:
      dhcp4: true
      dhcp4-overrides:
        hostname: media-centre
    faux1br: {}
  bridges:
    br0:
      interfaces:
        - faux0br
        - faux1br
        - enp3s0

We enable DHCP on faux0 and faux1. The former announces itself using the server's own name by default, but we set the name explicitly for the latter. Note that we also disable DHCP on our original interface enp3s0, overriding the setting in /etc/netplan/00-installer-config.yaml:

# This is the network config written by 'subiquity'
network:
  ethernets:
    enp3s0:
      dhcp4: true
  version: 2

The section /network/bridges/br0/interfaces binds the backends of the veth pairs together with the physical interface. faux0br and faux1br must have some presence in /network/ethernets in order to be referenced here, so they are given empty definitions.

Deployment

With /etc/netplan/99-bridgehack.yaml in place, you just need to tell Netplan about it. Any remote network reconfiguration risks losing the very connection you're using to perform it, so this is best done on the server's console:

sudo netplan generate
sudo netplan apply

Maybe I did something wrong, but I would often find that Netplan would create new entities as requested, but not tear down old ones. A reboot ensures you're starting from a clean slate. If you make a mistake, you can always rename 99-bridgehack.yaml to disable it.
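
After the reboot, it's worth confirming that the bridge has the expected members and that both virtual interfaces have picked up their own leases (interface names as in the example above):

ip link show master br0
ip -br -4 addr show | grep faux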

Déjà vu

I did this before, but without Netplan: I turned Netplan off, and enabled the legacy ifupdown functionality still available in Ubuntu 18.04. However, it's less clear how to do that on 22.04, so I had to find a way with Netplan. There was no need to mess with /etc/dhcp/dhclient.conf this time, which is good, as it didn't seem to make any difference. (Is dhclient even being used any more?) The IPv4/ARP advice remains largely the same.

2022-08-25

The Brexit Song

To the tune of “Thank you for the music” by ABBA:

Thank you for the Brexit that keeps on giving
To the EU. You've lost your living.
Thank you for the workforce,
The jobs and all your money,
For sovereignty,
And for some bad trade deals you will see
Aren't worth the loss of all your farming,
Fishing and industry.

Feel free to develop.

2021-04-06

BT email rules not working

So, I just spent the evening rejigging my parents' email rules with BT. They seemed to stop working sometime in January 2021, and I've just worked out why.

BT have changed how comparisons like is and ends with work on the From: field (and possibly others). Previously, the email address was extracted from the field, so it didn't matter whether the whole text of the field read any of these ways:

From: j.bloggs@example.com
From: Joe Bloggs <j.bloggs@example.com>
From: "Joe Bloggs" j.bloggs@example.com

Can't be certain that I've remembered that third form correctly; it's in an RFC somewhere anyway. However, I don't think I've seen it for a long time, so I'm going to assume it's fallen out of favour, and focus on the other two.

Under the new mechanism, From: is j.bloggs@example.com will only match the first form. You'll now also need a From: contains <j.bloggs@example.com> to guarantee a match. You can't use multiple operators like is and contains on the same field in the same rule, so you must duplicate the rule, and maintain it. You could, of course, match both j.bloggs@example.com and <j.bloggs@example.com> in the same rule with contains, and you'll probably get away with it, but you'll be left scratching your head when bob.j.bloggs@example.computing.invalid ends up in the same place. Also, if they change it back without notice, your is rule will continue to work.

From: ends with @example.com will also fail to match the second form. You need From: ends with @example.com> too now. Fortunately, you can do that with an extra entry in the same rule; you don't need a duplicate rule. However, bear in mind that you can only have 15 From: entries in a single rule.

To: and CC: can have multiple addresses. Some experimentation is required to determine whether they are automatically split and tested separately.

While I'm in gripe mode, BT rules could do with a few other features:

  • Match on List-Id: to pick out mailing-list posts unambiguously.
  • Filter out those damn subject-line tags like [zarquon users] that needn't pollute mailing lists when they've already been sorted into the right folder.
  • Mark messages as read.

2021-01-07

“Wrong __data_start/_end pair” work-around

I was getting Wrong __data_start/_end pair from my Mokvino Web scripts when converting ODG to SVG, since upgrading to Ubuntu 20.04 (though I've used Mokvino Web so little lately, I can't be sure that that's the start of the problem). It was an inkscape command that was failing. When I ran the command manually, I got no error. I found few differences in environment variables between running directly and running via make, and when I forced them to be the same in the script as in the console, it still failed within make and worked in the console.

A StackExchange question pointed towards a work-around. I checked the resource limit for the stack size (ulimit -s), and it was unlimited when run from make, but 8192 in the console. I bunged in a ulimit -s 8192 before the command, and it worked!

$ ulimit -s unlimited 
$ inkscape -z --query-all "example.svg" | head -2
Wrong __data_start/_end pair
$ ulimit -s 8192
$ inkscape -z --query-all "example.svg" | head -2
svg805,5848,8815,14472,4305.111
rect2,0,0,29700,21000
$ 
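
Since each line of a make recipe runs in its own shell, the limit and the command have to go on the same line for the work-around to take effect inside make; a sketch, with a made-up file name:

ulimit -s 8192 && inkscape -z --query-all "example.svg"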

Can't say I understand what's happening here; just hope it helps.

2020-05-25

Weather station woes and fixes

I've set up some Raspberry Pis to export data from various FineOffset weather stations, using WeeWX as the server software, which periodically downloads records and presents them graphically on a web page. Each weather station comes with a dedicated console to receive transmissions from the outdoor sensors every minute or so, and the console has a USB socket to allow a host to configure it and extract data.

This console is known to have a “USB lock-up” bug, whereby it refuses to talk to the host after some random period (from a few days to a couple of months), even though it had been interacting successfully prior to that point. The only robust work-around is to power-cycle the console, which is not easy to automate. Here's what I had to do.

Detection

The lock-up bug now appears as Operation timed out in the WeeWX log, as given by sudo /bin/systemctl status weewx:

fousb: get_records failed: [Errno 110] Operation timed out

You can get essentially the same lines from grep weewx /var/log/syslog. Four of these appear (about 45 seconds apart), and then WeeWX seems to reconnect in vain, and gets another four, and so on. This cycle lasts about 3½ minutes.

Note that the WeeWX documentation on the matter identifies a different error:

could not detach kernel driver from interface

Maybe that means this isn't really the lock-up bug I'm getting, but the symptoms and treatment seem to be the same.

Recovery

You have to take the batteries out of the console, and ensure it is disconnected over USB. You can run the console off USB power alone, so for an unattended power cycle, you just need a USB hub that can depower its sockets. The big disadvantage of leaving the batteries out is that no readings are taken during a power cut; with the batteries in, you could at least pull the stored readings off the console when power returned, as it can store several days' worth.

Here are the Pis I'm using with each weather station:

Hostname  Host model  Weather station model  WeeWX version
fish      RPi 3B+     WH3083                 3.9.2
ruscoe    RPi 3B      WH3083                 3.9.2
kettley   RPi 3B      WH1080                 3.9.1

All are running some version of Raspbian.

I've used uhubctl to check for and invoke the power-cycling feature, and can confirm that both the RPi 3B and 3B+ can control the power on their USB sockets. I also tried an RPi Zero W, which would have had the ideal amount of grunt for the task, but it's unable to control power on its sockets. Since I've not seen the problem on the WH1080, the Zero W could be used there, or indeed on any similar set-up with a different type of weather station. I was using an older RPi model at some point (with no built-in Wi-Fi); it could power-cycle its entire USB hub, although this included the USB Wi-Fi chip!

The output of sudo uhubctl looks something like this (on a 3B; it's marginally different on the 3B+):

$ sudo uhubctl 
Current status for hub 1-1 [0424:9514]
  Port 1: 0503 power highspeed enable connect [0424:ec00]
  Port 2: 0100 power
  Port 3: 0100 power
  Port 4: 0303 power lowspeed enable connect [1941:8021]
  Port 5: 0100 power

1941:8021 is the weather station console:

$ lsusb 
Bus 001 Device 014: ID 1941:8021 Dream Link WH1080 Weather Station / USB Missile Launcher
Bus 001 Device 013: ID 0424:ec00 Standard Microsystems Corp. SMSC9512/9514 Fast Ethernet Adapter
Bus 001 Device 002: ID 0424:9514 Standard Microsystems Corp. SMC9514 Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

This means that a command of the following form will power-cycle the console:

sudo uhubctl -l 1-1 -p 4 -a 2 -d 30 -R

Update: On fish, I had to set -p to one less than the reported port! sudo uhubctl said it was in port 3, but the command only worked on port 2. It helps to have someone looking at the console to confirm when you're doing it remotely!

  • -l 1-1 -p 4 are taken from the output of uhubctl, identifying the hub and port.
  • -a 2 causes a power cycle, rather than switching on or off.
  • -d 30 keeps it off for a generous 30 seconds. That maybe could be trimmed a bit.
  • -R resets the hub, forcing devices to re-associate. I found this to be essential, and wonder if it would be effective without the power cycle. Update: It isn't; you must remove the batteries.

Putting it together

A script, in ~/.local/bin/check-weather-station:

#!/bin/bash

# Count recent "timed out" errors in the WeeWX log entries, resetting the
# count if the service has since been stopped (e.g., by an earlier run of
# this script that is still in progress).
count=0
while read -r line ; do
    if [[ "$line" == *"get_records failed: [Errno 110] Operation timed out"* ]] ; then
        ((count++))
    elif [[ "$line" == *"Stopping LSB: weewx weather system"* ]] ; then
        count=0
    fi
done < <(grep weewx /var/log/syslog | tail -n 50)

# Four consecutive errors indicate the lock-up: stop WeeWX, power-cycle the
# console, wait out the sensor-learning period, reconfigure, and restart.
if [ "$count" -ge 4 ] ; then
    printf >&2 'Fault detected, power-cycling...\n'
    echo >&2 'Stopping station software'
    sudo /bin/systemctl stop weewx
    echo >&2 'Power-cycling hub'
    sudo /usr/sbin/uhubctl -l 1-1 -p 4 -a 2 -d 30 -R
    echo >&2 'Waiting for end of sensor-learning period'
    sleep 180
    echo >&2 'Setting time'
    sudo /usr/bin/wee_device -y --set-time
    echo >&2 'Setting interval'
    sudo /usr/bin/wee_device -y --set-interval=5
    echo >&2 'Restarting station software'
    sudo /bin/systemctl start weewx
fi

A cron job then checks every few minutes:

*/3 * * * * $HOME/.local/bin/check-weather-station

That should pick up the fault within one 3½-minute cycle.
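
Before leaving it to cron, remember to make the script executable, and give it a trial run by hand (the -x trace shows which branch it takes):

chmod +x ~/.local/bin/check-weather-station
bash -x ~/.local/bin/check-weather-station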

Other aspects of the script:

  • Four “timed out” messages are awaited. Maybe I could get away with two, or even one!

  • The weewx service is suspended during the reset. This ensures there's no interaction with the console shortly after it comes back on.

  • While the service is suspended, we don't want overlapping invocations of the script to do anything. This is achieved by resetting the message count whenever we see that the service has been stopped. Only “timed out” messages that are not followed by a “stopping” message are counted.

    (There's a potential race condition here, but it's not going to happen unless parsing the log and stopping the service take more than 3 minutes.)

  • Waiting three minutes after the reset ensures that the console's sensor-learning mode is not jeopardized by external activity. The weather-station manual warns about key activity on the console during this time, and I suspect it actually extends to USB activity too. I've been very cautious, so it might be possible to trim the timing a bit.

  • The console's time is synchronized with the host's. This can only be done while the service is stopped. (Unfortunately, this does not seem to update the clock displayed in the console.)

  • The logging interval is set to 5 minutes. Apparently, the console can sometimes forget this after a power cycle, but this setting is thought to reduce the likelihood of lock-ups.

Other issues

  • One Pi wouldn't come back on after a power cut. Changing the power supply fixed that.

  • Another Pi would sometimes lose its Internet connection, but continue gathering data from the console. As it's headless, the simplest thing for a non-technical person to do is to power-cycle the Pi, but that's a bit drastic, and undermines the goal of unattended operation. I tried the following to prevent the Wi-Fi from going to sleep, but it still happened:

    sudo iw dev wlan0 set power_save off
    

    I've resorted to pinging the router once a day, roughly as sketched below.
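
As a rough sketch of that, a crontab entry something like this would do; 192.168.1.254 is just a stand-in for the router's actual address:

# Once a day, poke the router to keep the Wi-Fi link alive.
17 6 * * * ping -c 3 192.168.1.254 > /dev/null 2>&1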

Results

With a cruder detection mechanism (one that took about four or five log cycles to get lucky), I've seen the script work twice in just a couple of days. I'm now trying out the new detection mechanism above, which should be safe to run as often as every three minutes, and so should catch the fault within the first cycle. I'll update this article as things develop.

2020-02-02

Removing variable prefixes and suffixes from other variables in Bash

Just been bitten by this…

If you have a variable txt in Bash, you can strip a given prefix or suffix from it like this:

$ txt=a/b.d/c.jpg
$ echo "${txt%.*}"
a/b.d/c
$ echo "${txt##*.}"
jpg
$ echo "${txt%%.*}"
a/b
$ echo "${txt#*.}"
d/c.jpg

The % operator strips off the shortest matching suffix, and .* matches .jpg, so that gets removed. %% strips off the longest matching suffix. Similarly, # and ## strip off the shortest and longest matching prefix, respectively. Asterisks, square brackets and other characters are special, probably following the same rules as Pattern Matching in the Bash manual page.

You can also use literal strings as the patterns, i.e., no special characters:

$ txt=a/b.d/c.jpg
$ echo "${txt%.jpg}"
a/b.d/c
$ echo "${txt%.png}"
a/b.d/c.jpg
$ echo "${txt#a/b.d/}"
c.jpg
$ echo "${txt#c/b.d/}"
a/b.d/c.jpg

Note that, if the prefix or suffix doesn't match (whether you use special characters or not), you get the whole string returned.

These operations are useful for traversing pathnames:

$ path="/home/john/file.jpg"
$ echo Leaf is "${path##*/}"
Leaf is file.jpg
$ echo Dir is "${path%/*}"
Dir is /home/john

You have to be careful if your input doesn't contain the separator:

$ input1=path/to/file.jpg
$ input2=file.jpg
$ echo Input 1 dir "[${input1%/*}]" leaf "[${input1##*/}]"
Input 1 dir [path/to] leaf [file.jpg]
$ echo Input 2 dir "[${input2%/*}]" leaf "[${input2##*/}]"
Input 2 dir [file.jpg] leaf [file.jpg]

To avoid this special case, I thought I could do this:

input1=path/to/file.jpg
input2=file.jpg
input1leaf="${input1##*/}"
input1dir="${input1%${input1leaf}}"
input2leaf="${input2##*/}"
input2dir="${input2%${input2leaf}}"
echo "[${input1dir}]" "[${input1leaf}]"
echo "[${input2dir}]" "[${input2leaf}]"

…which leads to this:

[path/to/] [file.jpg]
[] [file.jpg]

However, I hadn't noticed that special characters in the expanded text are still interpreted as a pattern:

input3="path/to/file [2002].jpg"
input3leaf="${input3##*/}"
input3dir="${input3%${input3leaf}}"
echo "[${input3dir}]" "[${input3leaf}]"

The square brackets are taken as a pattern character class, and so fail to match the literal value:

[path/to/file [2002].jpg] [file [2002].jpg]

The trick is to quote again:

input3="path/to/file [2002].jpg"
input3leaf="${input3##*/}"
input3dir="${input3%"${input3leaf}"}"
echo "[${input3dir}]" "[${input3leaf}]"

Now you get the intended result:

[path/to/] [file [2002].jpg]

An alternative technique would be to use the length of your prefix/suffix in a substring operation, but it's less convenient and more error-prone if you want to make small adjustments to a prefix or suffix before applying it.
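
For completeness, here's a minimal sketch of that substring approach, reusing input3 from above:

$ input3="path/to/file [2002].jpg"
$ input3leaf="${input3##*/}"
$ dirlen=$(( ${#input3} - ${#input3leaf} ))
$ input3dir="${input3:0:dirlen}"
$ echo "[${input3dir}]" "[${input3leaf}]"
[path/to/] [file [2002].jpg]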

Anyway, in summary, if you're going to use Bash's prefix/suffix removal with a computed pattern, put the result in quotes!

Bash redirection with descriptor in variable, and locking

A recommended way to acquire a lock in Bash is to open the lock file for a group command, and call flock on the open descriptor before doing anything dangerous:

{
  echo waiting
  flock -x 9
  echo in
  sleep 10
  echo done
} 9> /tmp/lock

Try it in two independent terminals. The second invocation will proceed only once the first finishes.

However, one should never have to pick an arbitrary file descriptor (9 in this case). Fortunately, you can get Bash to choose an available descriptor, using {var} in place of the literal descriptor number:

unset lfd
{
  echo waiting
  flock -x $lfd
  echo in
  sleep 10
  echo done
} {lfd}> /tmp/lock

Problem solved!

No, wait. The descriptor doesn't get closed at the end of the group command, so your second invocation will hang indefinitely. Once the first terminal has finished, if you manually close the descriptor, the second proceeds:

exec {lfd}>&-

Looks like you have to do things more explicitly (and the group command is no longer useful):

echo waiting
unset lfd
exec {lfd}> /tmp/lock
flock -x $lfd
echo in
sleep 10
echo done
exec {lfd}>&-

This is inconvenient if you want to break or continue out of an enclosing loop, something the group-command form copes with naturally:

for i in $(seq 1 10)
do
  {
    echo waiting
    flock -x 9
    echo in
    sleep 5
    if something_went_wrong ; then continue ; fi
    sleep 5
    echo done
  } 9> /tmp/lock
done

Is this a bug, a feature, or a mistake on my part? (Bash version 4.4.20(1)-release.)
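
In the meantime, one compromise (just a sketch, reusing the made-up something_went_wrong from the loop above) is to put the locked section in a function that opens and closes the descriptor explicitly, and let its return status drive continue or break in the caller:

locked_step() {
  local lfd
  exec {lfd}> /tmp/lock     # Bash allocates a free descriptor into lfd
  flock -x "$lfd"
  echo in
  sleep 5
  if something_went_wrong ; then
    exec {lfd}>&-           # release the lock before reporting failure
    return 1
  fi
  sleep 5
  echo done
  exec {lfd}>&-
}

for i in $(seq 1 10)
do
  echo waiting
  locked_step || continue
done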