2024-05-27

Reading pressure from a QMP6988

I got hold of an envIII sensor for indoor humidity, temperature and pressure readings, and bunged it on an RPi via a Grove HAT. This device incorporates an SHT30 for humidity and temperature, and a QMP6988 for pressure (though the latter also measures temperature, which it needs in order to compensate the pressure reading). I had no trouble interpreting the SHT30's datasheet, and got the readings out with two I²C calls. The procedure for the QMP6988 is a bit more involved, and its datasheet required some guesswork, so I'm documenting the steps I took in case someone else is having trouble.

Reading the raw coefficients

To perform compensation, you need to read 12 raw integer coefficients, then scale and translate them as real numbers, before combining them with the raw pressure/temperature readings. The raw coefficients are held in 25 one-byte, read-only registers within the device (ten 16-bit and two 20-bit values make 200 bits). They are constant, so you only need to fetch them once, even if you're going to take multiple readings. For each register in turn, I used the I2C_RDWR ioctl to request it, read the value, cancel the request, and confirm the cancellation. Each call (re-)used a single buffer:

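#include <cerrno>           // errno, EIO
#include <cstdint>          // uint8_t
#include <system_error>     // std::system_error
#include <sys/ioctl.h>      // ioctl()
#include <linux/i2c.h>      // struct i2c_msg, I2C_M_RD
#include <linux/i2c-dev.h>  // I2C_RDWR, struct i2c_rdwr_ioctl_data

uint8_t buf;                // the single byte written or read
struct i2c_msg msg = {
  .addr = addr,             // the device's I²C address
  .len = 1,
  .buf = &buf,
};
struct i2c_rdwr_ioctl_data pyld = {
  .msgs = &msg,
  .nmsgs = 1,
};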

With fd open on the I²C device, I could request register reg_idx like this:

buf = reg_idx;
msg.flags = 0; // write
if (ioctl(fd, I2C_RDWR, &pyld) < 0)
  throw std::system_error(errno, std::generic_category());

To read, set msg.flags = I2C_M_RD, and call ioctl again. I kept reading as long as ioctl returned negative with errno == EIO.

My understanding of the datasheet is that one should then request register 0xff (as if to cancel the prior request), and keep reading until one gets 0. In fact, my code stopped if it got EIO or a zero, though I don't think I've seen the latter:

// After requesting register 0xff, read until EIO or a zero byte.
buf = 0;
msg.flags = I2C_M_RD;
do {
  if (ioctl(fd, I2C_RDWR, &pyld) < 0) {
    if (errno == EIO) break;   // EIO counts as confirmation
    throw std::system_error(errno, std::generic_category());
  }
} while (buf != 0x00);
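The whole per-register sequence (request, read with EIO retries, cancel via 0xff, confirm) can be modelled in Python as below. This is a sketch, not the code I ran: write_byte and read_byte are hypothetical stand-ins for the two kinds of I2C_RDWR call, each expected to raise OSError on failure (on a Pi you could wrap smbus2's i2c_rdwr with i2c_msg.write and i2c_msg.read). The start address of the coefficient registers is deliberately left as a parameter.

```python
import errno
from typing import Callable

CANCEL_REG = 0xFF  # "register" requested to cancel the previous request


def read_until_ok(read_byte: Callable[[], int]) -> int:
    """Retry a one-byte read for as long as the bus reports EIO."""
    while True:
        try:
            return read_byte()
        except OSError as e:
            if e.errno != errno.EIO:
                raise  # anything other than EIO is a real failure


def confirm_cancel(read_byte: Callable[[], int]) -> None:
    """Read until the device answers with EIO or a zero byte."""
    while True:
        try:
            if read_byte() == 0x00:
                return
        except OSError as e:
            if e.errno == errno.EIO:
                return
            raise


def read_register(write_byte: Callable[[int], None],
                  read_byte: Callable[[], int], reg: int) -> int:
    """Request `reg`, read its value, then cancel and confirm."""
    write_byte(reg)
    value = read_until_ok(read_byte)
    write_byte(CANCEL_REG)
    confirm_cancel(read_byte)
    return value


def read_block(write_byte, read_byte, first_reg: int, count: int) -> bytes:
    """Read `count` consecutive registers one at a time (25 for the
    coefficients; first_reg must come from the datasheet)."""
    return bytes(read_register(write_byte, read_byte, first_reg + i)
                 for i in range(count))
```

Keeping the transport separate means the retry and cancel logic can be exercised against a fake device before touching the bus.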

Coefficients' signedness

Ten of the coefficients are 16-bit integers, and the other two are 20-bit. I couldn't find anything in the datasheet about their signedness, but I only get reasonable readings if they are treated as signed. I composed the value from bytes in a wider unsigned type, converted it to the corresponding signed type, then subtracted 2¹⁶ if the ‘top’ bit was set:

uint_fast32_t val = low_byte;
val |= high_byte << 8;
int_fast32_t ival = val;
if (val & 0x8000)
  ival -= 0x10000;
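Here is the same sign handling as a generic Python helper, which also covers the 20-bit coefficients (test bit 19, i.e. 0x80000, and subtract 0x100000). It's a sketch: coeff16 assumes the high/low byte order of the C++ above, and how the 20-bit values are assembled from their bytes is left to the datasheet.

```python
def to_signed(value: int, bits: int) -> int:
    """Treat the low `bits` bits of `value` as two's complement."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):   # the 'top' bit is set
        value -= 1 << bits          # e.g. subtract 0x10000 for 16 bits
    return value


def coeff16(high_byte: int, low_byte: int) -> int:
    """A 16-bit coefficient from its two register bytes."""
    return to_signed((high_byte << 8) | low_byte, 16)


def coeff20(raw20: int) -> int:
    """A 20-bit coefficient, once its 20 bits have been assembled."""
    return to_signed(raw20, 20)     # top bit is 0x80000, subtract 0x100000
```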

Scaling and translating the coefficients

Each of the 16-bit integers must be divided by an integer constant, then multiplied by a real constant, and then offset by another real. In the datasheet, these real constants are provided under Conversion factor in a table, and a general equation shows how to use them. However, the information for the 20-bit coefficients looks potentially contradictory. In the corresponding table, the Conversion factor column says Offset value (20Q16), while the equation simply says to divide by 16 (so no offset?). I haven't found any definition of this notation, but I think it implies that the original value is 20 bits, with the unit being 1/16. In other words, all you have to do is divide the signed integer by 16, as the equation states.
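A sketch of the two conversions, following the description above. The A and S values for each coefficient have to be copied from the Conversion factor table (they aren't reproduced here), and the default divisor of 32767 is my recollection of the datasheet's integer constant, so treat it as an assumption and check it against the general equation in your copy.

```python
def scale16(raw: int, a: float, s: float, divisor: int = 32767) -> float:
    """Signed 16-bit coefficient -> real: divide by the integer constant,
    multiply by the real constant S, then offset by the real constant A.
    A and S come from the datasheet's table; divisor=32767 is an assumption."""
    return a + s * raw / divisor


def scale20(raw: int) -> float:
    """Signed 20-bit coefficient -> real, reading "20Q16" as a unit of 1/16."""
    return raw / 16
```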

Taking the raw readings

I use a one-off write to one of the registers to initialize the device (a 2-byte <register, value> message), and send another 2-byte message to force each reading. After waiting a moment, I read each of the 6 bytes separately, in the same way as reading the coefficients (request, read, cancel, confirm).

The datasheet states that each 24-bit reading should have 2²³ subtracted from it, but at 24bits[sic] output mode. I thought maybe this meant that the result should be masked with 0xffffff, but that would create a considerable discontinuity, and indeed it does not yield correct results. Simply treat the raw 24-bit value as unsigned, convert it to a signed value (with no sign extension), and do the subtraction.
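In Python, the conversion is just this. The MSB-first byte order is my assumption about how the three bytes are laid out; the point is that the 24-bit value is built unsigned and 2²³ is subtracted, with no masking or sign extension.

```python
def raw_reading(msb: int, csb: int, lsb: int) -> int:
    """One 24-bit pressure or temperature reading, offset by 2**23.
    MSB-first byte order is an assumption; check the register map."""
    unsigned = (msb << 16) | (csb << 8) | lsb  # 0 .. 0xFFFFFF, no sign extension
    return unsigned - (1 << 23)                # -8388608 .. 8388607, no masking
```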

Units

After applying compensation, the pressure is expressed in Pa, which is stated in the datasheet. Divide by 100 to get hPa or mbar.

The datasheet mentions 256 degreeC as the unit for the compensated temperature. I got meaningful readings by dividing by 256, so I guess it means that the unit is one 256th of a degree C. When you use the compensated temperature to compensate the pressure, just use the value as is; don't divide.
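A trivial sketch of the unit handling; the function names are my own. Note that it's the raw, undivided compensated temperature that goes into the pressure compensation.

```python
def pressure_hpa(pr: float) -> float:
    """Compensated pressure is in Pa; 1 hPa = 1 mbar = 100 Pa."""
    return pr / 100


def temperature_c(tr: float) -> float:
    """Compensated temperature is in 1/256 degC. Feed the undivided tr,
    not this result, into the pressure compensation."""
    return tr / 256
```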

WS3085 wind speed codes

I've been examining the raw signals from several Aercus Instruments weather stations, mainly the WS3085 and similar. Two bytes of the long (80-bit) messages appear to carry wind speed, one for the average, and one for gust.

By recording the signals and simultaneously observing the console, I could get a mapping between the signal and reported wind speed. Here are some plain speeds:

Byte 1 (wind speed, bits 32-39)   Console speed (km/hr)
00000000                          0.0
00000001                          1.1 (corrected signal after possible misreading)
00000010                          2.5
00000011                          3.6
00000100                          5.0
00000101                          6.1
00000110                          7.2
00000111                          8.6

Here are some gust speeds (on a windier day):

Byte 2 (gust speed, bits 40-47)   Console gust speed (km/hr)
00000110                          7.2
00001000                          9.7
00001001                          11.2
00001110                          17.3
00001111                          18.4
00010001                          20.9
00010010                          22.0
00011101                          35.6
00100000                          39.2

Where they overlap, gust speeds and plain wind speeds appear to use the same representation, and larger numbers correspond to greater speeds, so I'm going to assume that they indeed use the same representation. However, the recordings above show no consistent ratio, though it has always (so far) been between 1.1 and 1.25. The mean is ~1.218, which works closely for codes 5 and 8, but over-reports for 1, 3 and 6, and under-reports for 2, 4, 7, 9, 14, 15, 17, 18, 29 and 32. Perhaps using different units would have yielded a more consistent ratio, e.g., the code is first multiplied and rounded to get the speed in another unit, then multiplied again and rounded again to get the speed in km/hr. Other units are m/s (÷3.6), mi/hr (÷1.609) and knots (÷1.852), and none of these are going to yield a nicer ratio.
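The ratio figures above are easy to recompute from the recorded pairs. In this sketch, "works closely" is my own choice of a 0.05 km/hr tolerance; with it, the codes fall into the same three groups as described.

```python
# (code, console km/hr) pairs recorded above, including the gusts
OBSERVED = [
    (1, 1.1), (2, 2.5), (3, 3.6), (4, 5.0), (5, 6.1), (6, 7.2), (7, 8.6),
    (8, 9.7), (9, 11.2), (14, 17.3), (15, 18.4), (17, 20.9), (18, 22.0),
    (29, 35.6), (32, 39.2),
]


def ratio_summary(observed, tol=0.05):
    """Min, max and mean of speed/code, and where the mean ratio misfits."""
    ratios = [speed / code for code, speed in observed]
    mean = sum(ratios) / len(ratios)
    fit = {"close": [], "over": [], "under": []}
    for code, speed in observed:
        predicted = code * mean
        if abs(predicted - speed) < tol:
            fit["close"].append(code)
        elif predicted > speed:
            fit["over"].append(code)
        else:
            fit["under"].append(code)
    return min(ratios), max(ratios), mean, fit


lo, hi, mean, fit = ratio_summary(OBSERVED)
print(f"ratio {lo:.3f}..{hi:.3f}, mean {mean:.3f}")
for kind, codes in fit.items():
    print(kind, codes)
```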

To get a more intuitive understanding, here's a plot of speeds against raw values, but with a couple of anticipated scales subtracted:

Those drops are all by the same amount. The increments aren't, but some are similar. What's going on?

Here's the Gnuplot script:

set title 'Wind ratio'
set datafile sep ','
set xlabel 'signal'
set ylabel 'speed (km/hr)'
set term pdf monochrome linewidth 0.1
set output 'windratio.pdf'
set key left bottom
set grid xtics
set xtics 1
show grid
plot 'windratio.csv' using 1:($2-$1*1.25) with linespoints title 'observed - 1.25x', \
  'windratio.csv' using 1:($2-$1*1.225) with linespoints title 'observed - 1.225x'

And here's windratio.csv:

0,0
1,1.1
2,2.5
3,3.6
4,5.0
5,6.1
6,7.2
7,8.6
8,9.7
9,11.2
14,17.3
15,18.4
17,20.9
18,22
29,35.6
32,39.2

Looks like you can reproduce that table with something like this:

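def conv(i):
    # 1.1 per code, plus 0.3 each time the code reaches 2 or 4 (mod 5),
    # plus 0.1 from code 9 (a period of 25 is guessed)
    return i * 1.1 + \
        ((i + 3) // 5 + (i + 1) // 5) * 0.3 + \
        ((i + 16) // 25) * 0.1

for i in range(0, 33):
    print('%2d: %4.2f' % (i, conv(i)))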

In other words, add 1.1 per unit, then add 0.3 every 5 units, starting at codes 2 and 4, and add a further 0.1 at 9 (and I'm guessing that's every 25 units, but the period must be at least 24).

According to Kevin, just multiply by 0.34, and round to the nearest tenth, to get metres per second. Converting to km/h and rounding again gives all the reported values. Try the following, and you'll see all the reported values matching:

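def conv(i):
    # the inferred expression, with the 0.1 step recurring every 24 codes
    return i * 1.1 + \
        ((i + 3) // 5 + (i + 1) // 5) * 0.3 + \
        ((i + 15) // 24) * 0.1

def conv2(i):
    # Kevin: code x 0.34, rounded to the nearest 0.1 m/s, then converted to km/h
    return int(i * 3.4 + 0.5) / 10 * 3600 / 1000

for i in range(0, 33):
    print('%2d: %4.1f %4.1f' % (i, conv(i), conv2(i)))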
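Combining Kevin's rule with the byte positions given earlier, a decoder might look like the sketch below. The bit numbering (bit 0 as the most significant bit of the first byte) and the 10-byte input are my assumptions; adjust the indices if your capture counts bits differently. The final round to 0.1 mirrors what the console shows, rather than relying on %4.1f formatting.

```python
def kmh_from_code(code: int) -> float:
    """Kevin's rule: code x 0.34 m/s rounded to the nearest 0.1, then
    converted to km/hr and rounded to 0.1 as the console displays it."""
    ms = int(code * 3.4 + 0.5) / 10
    return round(ms * 3.6, 1)


def wind_from_message(msg: bytes) -> tuple[float, float]:
    """(average, gust) in km/hr from one 80-bit message.

    Assumes bit 0 is the most significant bit of msg[0], so bits 32-39
    and 40-47 are msg[4] and msg[5]."""
    if len(msg) != 10:
        raise ValueError("expected a 10-byte (80-bit) message")
    return kmh_from_code(msg[4]), kmh_from_code(msg[5])
```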

[2024-06-10 Minor corrections to table; inferred expression]
[2024-06-12 Linked to Kevin's post with "the answer"; corrected bit positions]

2023-08-19

Two logical interfaces on one physical, with Netplan

In my home network, I have a server which I want to appear under two hostnames, mainly so I can later move the functionality associated with one of them around to other hosts. I'm just using my ISP-supplied broadband router/modem to manage the network, but it doesn't exactly bristle with configuration options to make this directly possible with, say, a DNS alias. Nevertheless, I want to stick with it, as other solutions might involve duplicating a lot of its functionality, or splitting it across multiple hosts, both of which introduce their own risks.

The router provides local DNS resolution (in the .home domain), and it honours the hostnames specified by DHCP requests. By presenting two interfaces to it, a single host can get two IP addresses and so two distinct names. Yes, it's ugly and hacky, but it's a solution within the constraints.

Approach

In this specific example, enp3s0 is the physical interface, and the second hostname is media-centre. The approach is to create two virtual interface pairs (faux0-faux0br and faux1-faux1br), connect one end of each (faux0br and faux1br) to a virtual bridge (br0), and connect this to the physical interface enp3s0. The other two ends of the pairs (faux0 and faux1) are now on the same Ethernet network, and running DHCP on them causes them to acquire distinct IP addresses and to be registered under distinct DNS names.

IPv4 ARP

For IPv4, it's essential to prevent the two interfaces stepping on each other's toes regarding ARPs, and a Server Fault answer shows how. Put this in your /etc/sysctl.d/local.conf (or create a numbered file for it, say 99-dualiface.conf):

net.ipv4.conf.all.arp_ignore=1
net.ipv4.conf.all.arp_announce=2
net.ipv4.conf.all.rp_filter=2

That will apply on boot, but you can apply it immediately with sudo sysctl -p/etc/sysctl.d/local.conf.

Creating virtual interface pairs

At the time of writing, and as far as I can tell, Netplan can set up bridges, but not the veth pairs used in the previous solution. This Ask Ubuntu answer explains how to do it another way. For our case specifically, create /etc/systemd/network/25-faux0.netdev:

[NetDev]
Name=faux0
Kind=veth
[Peer]
Name=faux0br

Create /etc/systemd/network/25-faux1.netdev similarly:

[NetDev]
Name=faux1
Kind=veth
[Peer]
Name=faux1br

Connecting with a bridge

We create and define the bridge in the /network/bridges section of a YAML file in /etc/netplan/. I've called this one 99-bridgehack.yaml:

network:
  ethernets:
    enp3s0:
      dhcp4: false
    faux0:
      dhcp4: true
    faux0br: {}
    faux1:
      dhcp4: true
      dhcp4-overrides:
        hostname: media-centre
    faux1br: {}
  bridges:
    br0:
      link-local: []
      interfaces:
        - faux0br
        - faux1br
        - enp3s0

We enable DHCP on faux0 and faux1. The former announces itself using the server's own name by default, but we set the name explicitly for the latter. Note that we also disable DHCP on our original interface enp3s0, overriding the setting in /etc/netplan/00-installer-config.yaml:

# This is the network config written by 'subiquity'
network:
  ethernets:
    enp3s0:
      dhcp4: true
  version: 2

The section /network/bridges/br0/interfaces binds the backends of the veth pairs together with the physical interface. faux0br and faux1br must have some presence in /network/ethernets in order to reference them here, so they are set empty.

[Edit 2024-12-07] /network/bridges/br0/link-local is set to an empty list to prevent IPv6 addresses being assigned to the bridge. This isn't vital, but it might save you some head scratching about strange entries in your router's network device list.

Deployment

With /etc/netplan/99-bridgehack.yaml in place, you just need to tell Netplan about it. Any remote network reconfiguration risks you losing the very connection you're using to do it over, so this is best done on the server's console:

sudo netplan generate
sudo netplan apply

Maybe I did something wrong, but I often found that Netplan created new entities as requested without tearing down old ones. A reboot ensures you're starting from a clean slate. If you make a mistake, you can always rename 99-bridgehack.yaml to disable it.

Déjà vu

I did this before, but without Netplan. I turned it off, and enabled legacy ifupdown functionality still available in Ubuntu 18.04. However, it's less clear how to do that on 22.04, so I had to find a way with Netplan. There was no need to mess with /etc/dhcp/dhclient.conf this time, which is good, as it didn't seem to make any difference. (Is dhclient being used any more?) The IPv4/ARP advice remains largely the same.

2022-08-25

The Brexit Song

To the tune of “Thank you for the music” by ABBA:

Thank you for the Brexit that keeps on giving
To the EU. You've lost your living.
Thank you for the workforce,
The jobs and all your money,
For sovereignty,
And for some bad trade deals you will see
Aren't worth the loss of all your farming,
Fishing and industry.

Feel free to develop.

2021-04-06

BT email rules not working

So, I just spent the evening rejigging my parents' email rules with BT. They seemed to stop working sometime in January 2021, and I've just worked out why.

BT have changed how comparisons like is and ends with work on the From: field (and possibly others). Previously, the email address was extracted from the field, so it didn't matter whether the whole text of the field read any of these ways:

From: j.bloggs@example.com
From: Joe Bloggs <j.bloggs@example.com>
From: "Joe Bloggs" j.bloggs@example.com

Can't be certain that I've remembered that third form correctly; it's in an RFC somewhere anyway. However, I don't think I've seen it for a long time, so I'm going to assume it's fallen out of favour, and focus on the other two.

Under the new mechanism, From: is j.bloggs@example.com will only match the first form. You'll now also need a From: contains <j.bloggs@example.com> to guarantee a match. You can't use multiple operators like is and contains on the same field in the same rule, so you must duplicate the rule, and maintain it. You could, of course, match both j.bloggs@example.com and <j.bloggs@example.com> in the same rule with contains, and you'll probably get away with it, but you'll be left scratching your head when bob.j.bloggs@example.computing.invalid ends up in the same place. Also, if they change it back without notice, your is rule will continue to work.

From: ends with @example.com will also fail to match the second form. You need From: ends with @example.com> too now. Fortunately, you can do that with an extra entry in the same rule; you don't need a duplicate rule. However, bear in mind that you can only have 15 From: entries in a single rule.
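To make the change concrete, here's a toy Python model of the new behaviour as I understand it: comparisons now apply to the raw text after From:, and several entries in one rule are alternatives. It's my guess at the semantics, not BT's implementation; the 15-entry limit is the one mentioned above.

```python
import re

# Toy model of the new rule semantics (my guess, not BT's code): each
# comparison applies to the raw text after "From: ".
OPS = {
    "is": lambda text, value: text == value,
    "contains": lambda text, value: value in text,
    "ends with": lambda text, value: text.endswith(value),
}
MAX_FROM_ENTRIES = 15


def rule_matches(text: str, op: str, values: list[str]) -> bool:
    """One rule: a single operator, with any of its From: entries matching."""
    if len(values) > MAX_FROM_ENTRIES:
        raise ValueError("at most 15 From: entries per rule")
    return any(OPS[op](text, v) for v in values)


def old_address(text: str) -> str:
    """Roughly the old behaviour: compare against the bare address."""
    m = re.search(r"<([^<>]+)>", text)
    return m.group(1) if m else text.strip()


PLAIN = "j.bloggs@example.com"
NAMED = "Joe Bloggs <j.bloggs@example.com>"
IMPOSTOR = "bob.j.bloggs@example.computing.invalid"

# Old: extracting the address first made both forms match 'is'.
assert rule_matches(old_address(NAMED), "is", [PLAIN])
# New: 'is' only matches the plain form, so a second rule is needed.
assert not rule_matches(NAMED, "is", [PLAIN])
assert rule_matches(NAMED, "contains", ["<" + PLAIN + ">"])
# Folding both into one 'contains' rule also catches the impostor.
assert rule_matches(IMPOSTOR, "contains", [PLAIN, "<" + PLAIN + ">"])
# 'ends with' just needs a second entry in the same rule.
assert not rule_matches(NAMED, "ends with", ["@example.com"])
assert rule_matches(NAMED, "ends with", ["@example.com", "@example.com>"])
```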

To: and CC: can have multiple addresses. Some experimentation is required to determine whether they are automatically split and tested separately.

While I'm in gripe mode, BT rules could do with a few other features:

  • Match on List-Id: to pick out mailing-list posts unambiguously.
  • Filter out those damn subject-line tags like [zarquon users] that needn't pollute mailing lists when they've already been sorted into the right folder.
  • Mark messages as read.

2021-01-07

“Wrong __data_start/_end pair” work-around

I was getting Wrong __data_start/_end pair from my Mokvino Web scripts when converting ODG to SVG, since upgrading to Ubuntu 20.04 (though I've used Mokvino Web so little lately, I can't be sure that that's the start of the problem). It was an inkscape command that was failing. When I ran the command manually, I got no error. I found few differences in environment variables between running directly and running via make, and when I forced them to be the same in the script as in the console, it still failed within make and worked in the console.

A StackExchange question pointed towards a work-around. I checked the resource limit for the stack size (ulimit -s), and it was unlimited when run from make, but 8192 in the console. I bunged in a ulimit -s 8192 before the command, and it worked!

$ ulimit -s unlimited 
$ inkscape -z --query-all "example.svg" | head -2
Wrong __data_start/_end pair
$ ulimit -s 8192
$ inkscape -z --query-all "example.svg" | head -2
svg805,5848,8815,14472,4305.111
rect2,0,0,29700,21000
$ 
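If the command is launched from a Python script rather than from make, the same work-around can be applied to just the child process. This is a sketch I haven't tried with Mokvino Web: it lowers the soft RLIMIT_STACK to 8192 KiB (matching ulimit -s 8192) just before exec, leaving the hard limit and the parent's own limit alone. The helper names are hypothetical, and preexec_fn isn't safe in multi-threaded programs.

```python
import resource
import subprocess

STACK_KIB = 8192  # what 'ulimit -s' reported in my console; ulimit -s counts KiB


def _limit_stack() -> None:
    """Runs in the child just before exec: lower the soft stack limit only."""
    _, hard = resource.getrlimit(resource.RLIMIT_STACK)
    soft = STACK_KIB * 1024
    if hard != resource.RLIM_INFINITY:
        soft = min(soft, hard)
    resource.setrlimit(resource.RLIMIT_STACK, (soft, hard))


def run_with_stack_limit(cmd: list[str]) -> subprocess.CompletedProcess:
    """Like the shell's 'ulimit -s 8192; cmd', without touching our own limit."""
    return subprocess.run(cmd, preexec_fn=_limit_stack,
                          capture_output=True, text=True, check=True)
```

For example, run_with_stack_limit(["inkscape", "-z", "--query-all", "example.svg"]).stdout would hold the query output.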

Can't say I understand what's happening here; just hope it helps.

2020-05-25

Weather station woes and fixes

I've set up some Raspberry Pis to export data from various FineOffset weather stations, using WeeWX as the server software periodically downloading records and presenting them graphically in a webpage. Each weather station comes with a dedicated console to receive transmissions from the outdoor sensors every minute or so, and the console has a USB socket to allow a host to configure it and extract data.

This console is known to have a “USB lock-up” bug, whereby it refuses to talk to the host after some random period (from a few days to a couple of months), even though it had been interacting successfully prior to that point. The only robust work-around is to power-cycle the console, which is not easy to automate. Here's what I had to do.

Detection

The lock-up bug now appears as Operation timed out in the WeeWX log, as given by sudo /bin/systemctl status weewx:

fousb: get_records failed: [Errno 110] Operation timed out

You can get essentially the same lines from grep weewx /var/log/syslog. Four of these appear (about 45 seconds apart), and then WeeWX seems to reconnect in vain, and gets another four, and so on. This cycle lasts about 3½ minutes.

Note that the WeeWX documentation on the matter identifies a different error:

could not detach kernel driver from interface

Maybe that means this isn't really the lock-up bug I'm getting, but the symptoms and treatment seem to be the same.

Recovery

You have to take the batteries out of the console, and ensure it also loses its USB power. You can run the console off USB power alone, so for an unattended power cycle, you just need a USB hub that can depower its sockets. The big disadvantage of leaving the batteries out is that no readings are taken during a power cut; with the batteries in, you could at least pull the readings off the console when power returned, as it can store several days' worth.

Here are the Pis I'm using with each weather station:

Hostname  Host model  Weather station model  WeeWX version
fish      RPi 3B+     WH3083                 3.9.2
ruscoe    RPi 3B      WH3083                 3.9.2
kettley   RPi 3B      WH1080                 3.9.1

All are running some version of Raspbian.

I've used uhubctl to check for and invoke the power-cycling feature, and can confirm that both the RPi 3B and 3B+ can control the power on their USB sockets. I also tried an RPi Zero W, which would have had the ideal amount of grunt for the task, but it's unable to control power on its sockets. Since I've not seen the problem on the WH1080, it could be used there, or indeed on any similar set-up with a different type of weather station. I was using an older RPi model at some point (with no built-in Wi-Fi); it could power-cycle its entire USB hub, although this included the USB Wi-Fi chip!

The output of sudo uhubctl looks something like this (on a 3B; it's marginally different on the 3B+):

$ sudo uhubctl 
Current status for hub 1-1 [0424:9514]
  Port 1: 0503 power highspeed enable connect [0424:ec00]
  Port 2: 0100 power
  Port 3: 0100 power
  Port 4: 0303 power lowspeed enable connect [1941:8021]
  Port 5: 0100 power

1941:8021 is the weather station console:

$ lsusb 
Bus 001 Device 014: ID 1941:8021 Dream Link WH1080 Weather Station / USB Missile Launcher
Bus 001 Device 013: ID 0424:ec00 Standard Microsystems Corp. SMSC9512/9514 Fast Ethernet Adapter
Bus 001 Device 002: ID 0424:9514 Standard Microsystems Corp. SMC9514 Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

This means that a command of the following form will power-cycle the console:

sudo uhubctl -l 1-1 -p 4 -a 2 -d 30 -R

Update: On fish, I had to set -p to one less than the reported port! sudo uhubctl said it was in port 3, but the command only worked on port 2. It helps to have someone looking at the console to confirm when you're doing it remotely!

  • -l 1-1 -p 4 are taken from the output of uhubctl, identifying the hub and port.
  • -a 2 causes a power cycle, rather than switching on or off.
  • -d 30 keeps it off for a generous 30 seconds. That could perhaps be trimmed a bit.
  • -R resets the hub, forcing devices to re-associate. I found this to be essential, and wonder if it would be effective without the power cycle. Update: It isn't; you must remove the batteries.
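If you have several Pis, it may help to derive the -l and -p values from uhubctl's output rather than copying them by hand. This parser is a sketch written against the sample output above (other uhubctl versions may format lines differently), and given the port discrepancy I saw on fish, check the result against the console before trusting it.

```python
import re

HUB_RE = re.compile(r"^Current status for hub (\S+)")
PORT_RE = re.compile(r"^\s*Port (\d+):.*\[([0-9a-fA-F]{4}:[0-9a-fA-F]{4})")
CONSOLE_ID = "1941:8021"  # the weather-station console, as shown by lsusb


def find_port(uhubctl_output: str, usb_id: str = CONSOLE_ID):
    """Return (hub, port) for the first port showing usb_id, else None."""
    hub = None
    for line in uhubctl_output.splitlines():
        m = HUB_RE.match(line)
        if m:
            hub = m.group(1)
            continue
        m = PORT_RE.match(line)
        if m and hub is not None and m.group(2).lower() == usb_id.lower():
            return hub, int(m.group(1))
    return None


def power_cycle_cmd(hub: str, port: int) -> list[str]:
    """The uhubctl invocation used above: cycle, 30 s off, then reset the hub."""
    return ["sudo", "uhubctl", "-l", hub, "-p", str(port),
            "-a", "2", "-d", "30", "-R"]
```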

Putting it together

A script, in ~/.local/bin/check-weather-station:

#!/bin/bash

count=0
while read -r line ; do
    if [[ "$line" == *"get_records failed: [Errno 110] Operation timed out"* ]] ; then
        ((count++))
    elif [[ "$line" == *"Stopping LSB: weewx weather system"* ]] ; then
        count=0
    fi
done < <(grep weewx /var/log/syslog | tail -50)

if [ $count -ge 4 ] ; then
    printf >&2 'Fault detected, power-cycling...\n'
    echo >&2 'Stopping station software'
    sudo /bin/systemctl stop weewx
    echo >&2 'Power-cycling hub'
    sudo /usr/sbin/uhubctl -l 1-1 -p 4 -a 2 -d 30 -R
    echo >&2 'Waiting for end of sensor-learning period'
    sleep 180
    echo >&2 'Setting time'
    sudo /usr/bin/wee_device -y --set-time
    echo >&2 'Setting interval'
    sudo /usr/bin/wee_device -y --set-interval=5
    echo >&2 'Restarting station software'
    sudo /bin/systemctl start weewx
fi

A cron job then checks every few minutes:

*/3 * * * * $HOME/.local/bin/check-weather-station

That should pick up the fault within one 3½-minute cycle.

Other aspects of the script:

  • Four “timed out” messages are awaited. Maybe I could get away with two, or even one!

  • The weewx service is suspended during the reset. This ensures there's no interaction with the console shortly after it comes back on.

  • While the service is suspended, we don't want overlapping invocations of the script to do anything. This is detected by resetting the message count whenever we see that the service has been stopped. Only “timed out” messages that are not followed by a “stopping” message are counted.

    (There's a potential race condition here, but it's not going to happen unless parsing the log and stopping the service take more than 3 minutes.)

  • Waiting three minutes after the reset ensures that the console's sensor-learning mode is not jeopardized by external activity. The weather-station manual warns about key activity on the console during this time, and I suspect it actually extends to USB activity too. I've been very cautious, so it might be possible to trim the timing a bit.

  • The console's time is synchronized with the host's. This can only be done while the service is stopped. (Unfortunately, this does not seem to update the clock displayed in the console.)

  • The logging interval is set to 5 minutes. Apparently, the console can sometimes forget this after a power cycle, and a 5-minute interval is thought to reduce the likelihood of lock-ups.
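The counting rule is the subtle part of the script, so here's the same logic as a small Python function that can be checked in isolation. It's a re-expression for illustration rather than what runs on the Pis, and it assumes the lines have already been filtered on weewx, as the grep does.

```python
TIMEOUT = "get_records failed: [Errno 110] Operation timed out"
STOPPING = "Stopping LSB: weewx weather system"


def should_power_cycle(weewx_lines, threshold=4, window=50):
    """Same rule as check-weather-station: count timeouts among the last
    `window` weewx log lines, restarting the count whenever the service
    was seen stopping (i.e. a reset is already in progress)."""
    count = 0
    for line in list(weewx_lines)[-window:]:
        if TIMEOUT in line:
            count += 1
        elif STOPPING in line:
            count = 0
    return count >= threshold
```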

Other issues

  • One Pi wouldn't come back on after a power cut. Changing the power supply fixed that.

  • Another Pi seemed to lose its Internet connection while continuing to gather data from the console. As the Pi is headless, the simplest thing for a non-technical person to do is to power-cycle it, but that's a bit drastic, and undermines the goal of unattended operation. I tried the following to prevent the Wi-Fi from going to sleep, but it still happened:

    sudo iw dev wlan0 set power_save off
    

    I've resorted to pinging the router once a day.

Results

With a cruder detection mechanism (one that took about four or five log cycles to get lucky), I've seen the script work twice in just a couple of days. I'm trying out this new detection mechanism above, which should be safe to use as often as every three minutes, and so it should be able to detect the first cycle. I'll update this article as things develop.