Henk Postma

Tuesday, November 10, 2015

So you want to measure something? - noise, sampling, and filtering

The math in this post does not render well on some browsers. In that case, please look at the PDF version of this post instead.

Manual readings

Figure 1

Suppose there is a signal that we want to measure, and it is a little bit noisy. It could be a small voltage $V$ across a resistor that has a current running through it. We are not interested in the fluctuations, but only the average value. Let us assume that we have taken great care to remove line interference (at multiples of the line frequency of $50$ or $60$ Hz). We take one reading, call it $V_1$, and while we're at it, we take a few more. We get \[ V_1 = 1.3423\mbox{ V} ; \quad V_2 = 0.403999 \mbox{ V}; \quad V_3 = 0.51914 \mbox{ V} \] and the average is $ \left< V_{1..3} \right> = 0.75513 $ V. Just to be sure, we repeat the three readings again, and we get a new average $ \left< V_{4..6} \right> = 1.1404 $ V (figure 1). We see that there is a significant difference between the readings, and we are not happy with that. So we take the average of all $6$ numbers, and get $\left< V_{1..6} \right> = 0.94779 $ V.

Figure 2

Computer readings

By this time, we are ready to hook up a computer to the signal and take some of this work off our hands, and we go all out: we take $1000$ readings and get $ \left< V_{1..1000} \right> = 0.97996 $ V. Our sample rate is $100$ Hz, so this reading takes $10$ s. To verify that our readings are converging to the actual average value, we run a short script in MATLAB/GNU Octave, and arrive at the following plot of the average voltage $V_n$ as a function of the number of readings $n$ that we average over (figure 2).

We see that indeed the average values fluctuate quite a lot in the beginning when we're averaging over a low number of readings, but converge to an average value of $\sim 1$ V.
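
The original script was written in MATLAB/GNU Octave and is not reproduced here. As a rough stand-in, here is a minimal Python sketch of the same idea. The simulated signal (1 V plus Gaussian noise with a 0.5 V standard deviation) and the function name `running_average` are assumptions made for illustration, not the actual measurement or script.

```python
import random


def running_average(readings):
    """Return the cumulative means: element n-1 is the mean of the first n readings."""
    averages = []
    total = 0.0
    for n, value in enumerate(readings, start=1):
        total += value
        averages.append(total / n)
    return averages


# Assumed stand-in for the measured signal: 1 V plus Gaussian noise (0.5 V standard deviation).
rng = random.Random(0)  # fixed seed so the run is repeatable
readings = [rng.gauss(1.0, 0.5) for _ in range(1000)]
averages = running_average(readings)

# Print the average after a few different numbers of readings.
for n in (3, 6, 100, 1000):
    print(n, round(averages[n - 1], 4))
```

With only a handful of readings the printed averages still jump around, while the later ones should settle close to the assumed 1 V level.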

Bandwidth and aliasing

When we were doing readings by hand, we took perhaps a few seconds to read the value, record it in our notebook, and then we calculated the average. While we were doing our recordings, however, the signal kept fluctuating.

Figure 3

Look at the blue signal in figure 1, and then look at the red marked readings that we took by hand. The fluctuations happened on a time scale that was shorter than the time between our readings. To illustrate, look at the Fourier transform of the final readings we were doing by computer in figure 3. Indeed, we have high-frequency fluctuations that got folded into our measurements even though we were not reading at that high a frequency. That is known as `aliasing'.
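
To make the folding concrete, here is a small Python sketch. It assumes a pure 90 Hz sine as the fast fluctuation (a made-up example, not the measured signal) and samples it at the 100 Hz rate used above. At the sample instants it is indistinguishable from an inverted 10 Hz sine.

```python
import math

SAMPLE_RATE = 100.0  # Hz, the sample rate used in the text


def sample_sine(frequency, n_samples, sample_rate=SAMPLE_RATE):
    """Sample sin(2*pi*frequency*t) at the instants t = k / sample_rate."""
    return [math.sin(2 * math.pi * frequency * k / sample_rate)
            for k in range(n_samples)]


fast = sample_sine(90.0, 50)  # assumed 90 Hz fluctuation, above half the sample rate
slow = sample_sine(10.0, 50)  # 10 Hz sine for comparison

# At every sample instant the 90 Hz samples equal minus the 10 Hz samples,
# so the sum of the two sequences vanishes up to rounding error.
max_difference = max(abs(a + b) for a, b in zip(fast, slow))
print(max_difference)
```

The printed value is at the level of floating-point rounding: once sampled, the 90 Hz fluctuation shows up as if it were a 10 Hz one.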

Anti aliasing

In order to get a better reading, we should discard these higher frequencies, because even if we're reading at a low frequency, our readings still contain the effect of higher-frequency fluctuations. Removing the frequencies that are too fast for our sampling rate to follow (strictly, those above half the sampling rate, the Nyquist frequency) before we sample is known as applying an `anti aliasing' filter. There are many ways you can accomplish this. We could put a low-pass filter in front of the measurement, with a cut-off frequency $f_0$ just above the frequencies we are interested in and below half the rate we will be sampling at. For instance, we could use a simple first-order RC low-pass filter with a transfer function \[ |H(f)| = \frac{1}{\sqrt{1 + f^2/f_0^2} } \qquad , f_0 = \frac{1}{2\pi RC} \qquad . \] This approach requires a-priori knowledge of our measurement frequency and maybe some soldering. Alternatively, we could measure the signal as quickly as possible with the computer, and average it in software. Let's say, as in our experiment, we sample the signal at $100$ Hz, then load all that data into the computer for averaging.
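
As a quick illustration of the RC filter formulas above, here is a short Python sketch. The component values ($R = 16$ k$\Omega$, $C = 1$ $\mu$F) and the function names are made up for the example. They give a cut-off frequency of about $10$ Hz.

```python
import math


def rc_cutoff_frequency(resistance_ohm, capacitance_farad):
    """Cut-off frequency f0 = 1 / (2*pi*R*C) of a first-order RC low-pass filter."""
    return 1.0 / (2 * math.pi * resistance_ohm * capacitance_farad)


def rc_gain(frequency, cutoff):
    """Magnitude of the transfer function, |H(f)| = 1 / sqrt(1 + f^2/f0^2)."""
    return 1.0 / math.sqrt(1.0 + (frequency / cutoff) ** 2)


# Hypothetical component values, chosen only for illustration.
f0 = rc_cutoff_frequency(16e3, 1e-6)
print("f0 =", round(f0, 2), "Hz")

# Gain at a few frequencies: close to 1 well below f0, falling off above it.
for f in (1.0, 10.0, 50.0, 100.0):
    print(f, "Hz ->", round(rc_gain(f, f0), 3))
```

Note that a first-order filter rolls off slowly: at ten times the cut-off frequency the amplitude is only reduced by about a factor of ten.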

How averaging works

The signal has a specific power spectral density $S_V$, in $\mbox{V}^2/\mbox{Hz}$, and when we read the signal, we get a root-mean-square (RMS) level of fluctuations in the reading given by \[ V_{RMS}^2 = \int_0^\infty S_V (f) \mathrm{d}f \] If we limit the signal to a specific bandwidth $B$, we basically terminate the integral before $f$ reaches infinity \[ V_{RMS}^2 = \int_0^B S_V (f) \mathrm{d}f \] and the RMS signal is smaller, i.e. we have fewer fluctuations. More accurately, when we filter, we modify $S_V$ itself because we send it through a filter: the spectral density gets multiplied by $|H(f)|^2$. If it is a simple RC low-pass filter as suggested above, it has a transfer function with absolute value \[ |H(f)| = |V_{out}/V_{in}| = 1/\sqrt{1 + f^2/f_0^2} \quad , \] where $f_0$ is the cutoff frequency of the filter. If the noise is white, i.e. $S_V(f) = S_V$, a frequency-independent level of fluctuations, the RMS signal becomes \[ V_{RMS}^2 = \int_0^\infty S_V \frac{1}{1 + f^2/f_0^2} \mathrm{d}f = S_V f_0 \pi/2 \] The lower we make the cutoff frequency $f_0$, the smaller the level of fluctuations is.
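
The closed form $S_V f_0 \pi/2$ for white noise through the RC filter can be checked numerically. The sketch below is only a sanity check under assumed values ($S_V = 10^{-6}$ $\mbox{V}^2/\mbox{Hz}$, $f_0 = 10$ Hz, both made up). It integrates with a simple midpoint rule and cuts the integral off at $1000 f_0$, so a small deficit from the missing tail is expected.

```python
import math


def filtered_noise_power(s_v, f0, f_max_factor=1000.0, steps=200_000):
    """Midpoint-rule estimate of the integral of s_v / (1 + f^2/f0^2)
    from f = 0 up to f = f_max_factor * f0."""
    f_max = f_max_factor * f0
    df = f_max / steps
    total = 0.0
    for k in range(steps):
        f = (k + 0.5) * df  # midpoint of the k-th frequency bin
        total += s_v / (1.0 + (f / f0) ** 2)
    return total * df


s_v = 1e-6  # V^2/Hz, hypothetical white-noise level
f0 = 10.0   # Hz, hypothetical cut-off frequency

numeric = filtered_noise_power(s_v, f0)
analytic = s_v * f0 * math.pi / 2
print(numeric, analytic, numeric / analytic)
```

The ratio should come out just below 1, with the small shortfall due to stopping the integral at a finite frequency.
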

If we average the signal, we can describe that as integrating the signal over the averaging time. That, in turn, can be described as convolving the signal with a scaled rectangular function in the time domain, \[ V_{av} (t) = \int V(t^\prime) h(t-t^\prime) \; \mathrm{d}t^\prime \qquad . \] If the duration of the average is $\tau$, we are convolving the signal with a modified rectangular function $h(t)$ \[ h(t) = \mathrm{rect}^\prime(t) = \left\lbrace \begin{array}{ll} 0 & \mbox{if } |t| > \tau/2 \\ \frac{1}{2\tau} & \mbox{if } |t| = \tau/2 \\ 1/\tau & \mbox{if } |t| < \tau/2 \\ \end{array} \right. \] and its Fourier transform is \[ H(f) = \int_{-\infty}^\infty \mathrm{rect}^\prime (t) \mathrm{e}^{-2\pi i f t} \mathrm{d}t = \int_{-\tau/2}^{\tau/2} \frac{\mathrm{e}^{-2\pi i f t}}{\tau} \mathrm{d}t = \frac{\sin(\pi f \tau)}{\pi f \tau} = \mathrm{sinc} (\pi f\tau) \] where $\mathrm{sinc}(x) = \sin(x)/x$. Therefore, if the noise is white, the RMS level is (substituting $x = \pi f \tau$ and using $ \int_0^\infty \mathrm{sinc}^2(x) \; \mathrm{d}x = \frac{\pi}{2}$) \[ V_{RMS}^2 = S_V \int_0^\infty \left( \frac{\sin(\pi f \tau)}{\pi f \tau} \right)^2 \mathrm{d}f = \frac{S_V}{\pi\tau} \cdot \frac{\pi}{2} = \frac{S_V}{2\tau} \] The longer we average, the smaller the fluctuations become, and the RMS level scales like \[ V_{RMS} \propto \tau^{-1/2} \quad . \] This behavior is very similar to the discrete case, where the standard deviation of the mean is $\sigma \propto n^{-1/2}$.

Let's look again at how the average reading gets better and better the longer we average. We expect the deviation of an average over $n$ numbers ($V_{1..n}$) from the final average value $V_\infty$ to shrink as $\tau^{-1/2}$, where $\tau$ is the total averaging time (proportional to $n$ at a fixed sample rate). But the deviation could be positive as well as negative. Therefore, if we square the deviation, we get \[ (V_n - V_\infty)^2 \propto \frac{1}{\tau} \] If we look at the bottom of figure 2, we see that indeed it appears to follow that behavior.
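
The $n^{-1/2}$ scaling of the discrete case is easy to check with a quick simulation. The Python sketch below is an illustration under assumptions: it uses zero-mean Gaussian white noise with unit standard deviation (not the measured signal), and the function name `rms_of_averages` is made up. It averages $n$ samples at a time and estimates the RMS of those averages over many trials.

```python
import math
import random


def rms_of_averages(n_per_average, n_trials, rng, sigma=1.0):
    """RMS value of the mean of n_per_average white-noise samples,
    estimated from n_trials independent averages."""
    total_sq = 0.0
    for _ in range(n_trials):
        mean = sum(rng.gauss(0.0, sigma) for _ in range(n_per_average)) / n_per_average
        total_sq += mean * mean
    return math.sqrt(total_sq / n_trials)


rng = random.Random(1)  # fixed seed so the run is repeatable
results = {n: rms_of_averages(n, 2000, rng) for n in (25, 100, 400)}

# If the RMS scales as n^(-1/2), then rms * sqrt(n) stays roughly constant.
for n, rms in results.items():
    print(n, round(rms, 4), round(rms * math.sqrt(n), 3))
```

Each fourfold increase in $n$ should roughly halve the RMS, so the last column stays near the assumed noise level of 1.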

Summary

Measure as fast as you can so you can get many readings in. If you cannot measure as quickly as your signal is varying, filter out the fluctuations above half your sample frequency before you sample. If you do not have access to a computer that can read the signal quickly, you can filter the signal yourself. Or, you can use a digital multimeter that allows you to increase the integration time.

Thursday, November 8, 2012

Write your own web browser in 9 lines flat

This web browser in 9 lines of code comes to you courtesy of WebKitCtrl, the awesome Python bindings to the WebKit renderer on Mac OS X. One simply defines a window and drops the WebKit renderer in, done!

First, make sure you have installed wxPython.

Then drop this code into a file called browser.py:

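import sys
import wx
import wx.webkit

theApp = wx.PySimpleApp(0)                          # create the wx application object
theFrame = wx.Frame(None, -1, "", size=(640,480))   # top-level window, 640x480, empty title
w = wx.webkit.WebKitCtrl(theFrame, -1)              # drop the WebKit renderer into the window
w.LoadURL(sys.argv[1])                              # load the URL given on the command line
theFrame.Show()                                     # make the window visible
theApp.MainLoop()                                   # hand control to the GUI event loop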

You launch the browser from the command line, passing the URL to go to as argument, like so:

python_32 browser.py http://www.google.com/

The last bit needed is the script python_32, which ensures Python will run in 32-bit mode, because the 64-bit mode is broken unless you have installed the wxPython Cocoa libraries. Place these lines in the file 'python_32':

#!/bin/bash
export VERSIONER_PYTHON_PREFER_32_BIT=yes
/usr/bin/python "$@"

and place that file in the same directory as 'browser.py'. Now, also make sure that this script is executable:

chmod a+x python_32

Nitpickers take note. One might say that this is no different from calling a browser from the command line directly, which, according to the line counting above, would give you a browser in 0 lines of code:

open http://www.google.com

However, I would not call that 'your own browser' in the sense that you cannot wrap your own controls around it and interact with the browser contents as you can with the script above.

Friday, July 20, 2012

Computer-Supported Collaborative Science, Summer 2012

It was an honor to work with the CSCS team again this summer in the yearly workshop for middle-school science teachers. In the first week of this workshop, we develop new science curriculum using the CSCS methodology: Computer-Supported Collaborative Science.

The teachers then proceed to teach these new lessons to middle-school students that participate in the SAEP (Summer Academic Enrichment Program). This provides instant feedback and allows rapid changes to the lessons based on real classroom experience.

For this workshop, I created two youtube screencasts:

Thursday, May 3, 2012

... or I shall replace you with a small shell script

So, this rather long description that I posted a while ago on how to get Sparkleshare (a cloud storage solution where you host your own cloud) working on Ubuntu can now be replaced with a simple

sudo apt-get install sparkleshare

or select it in the package manager. This is all possible due to it being available in the repository in the latest Ubuntu: 12.04, aka "Precise Pangolin". Awesome.

Pretty soon I will be able to

sudo apt-get install write_grant write_paper win_nobel

Wednesday, March 21, 2012

Sparkleshare: not quite Dropbox, but better in ways that matter to me

So. You have been using Dropbox for some time to make sure all your files are synced between different systems. Perhaps you have shared folders with collaborators and this is how you work together on documents. It has become part of your workflow.

But, there are things that bother you. First, you don't have ownership over where the data is stored. You're not paranoid, but you wonder if trusting all your precious data to an unknown location in the cloud is the best thing. Second, you really cannot afford to pay for more storage, but it would be sooo useful if you could just put everything in the cloud and be done with it, never having to decide whether something needs to go into Dropbox or not.

Enter Sparkleshare. It works on Linux, Mac, and Windows 7/Vista (sorry, no XP). It is open source. It is free. And you can store the data on your own system. And it uses Git, major geek cred there. That also means you can use Sparkleshare to always have the latest snapshots of repositories on GitHub etc. on your local drive.

The following steps are not too difficult to follow, but it is definitely not as easy as Dropbox.

1. Preparation. As my Linux box is running Ubuntu Lucid, I had to upgrade Git by subscribing to this repository

2. Install Sparkleshare, either by subscribing to this repository or installing the client from the website

3. Set up your own data store

4. ?

5. Profit!

Downsides

1. As it uses Git on the backend, it is not very good at dealing with large binary files. I haven't found any issues yet, but that is what Google tells me

2. As it uses Git on the backend, forget about putting a Git repository in a Sparkleshare folder

3. No iPhone/Android/webOS clients. Although you may run a web frontend to the Git repository so you can get to the files that way too.

Saturday, February 25, 2012

Would You Like to Know the Truth?

Spread the Word, people. PDF is here http://www.csun.edu/~hpostma/thetruth.pdf

Monday, December 26, 2011

Getting Things Done with Thunderbird

Introduction

Let me ask you a question: have you ever used todo lists? I have. Lots of them. And I don't like them. Some require signing on to an online service, make collaboration clunky, or cost money. And then they require your team members to use the same service, let alone people outside of your organization. Some are just downright too basic, and don't sync between multiple computers. But there is a bigger issue, so let me tell you what I do instead and why.

Many of my tasks (stuff on my plate) come to me through email. So, if you have a todo list that is separate from email, you will end up copying and pasting a lot back and forth. Yes, I know you can create todo items from email, but you still need to make sure your todo list stays in sync with your email.

Yes, I have read David Allen's "Getting Things Done", and I love the book. I have tried the paper approach he describes, but I ended up having to write down things that come in through email. And then when the piece of paper shows up on top of my list, I have to dig through the email and locate the item so I can work on it. In all fairness, David's book is not so much about using paper, it is more about the process.

My Approach

So, a far easier approach is to overlay the todo list with your emails. To do so, I am using labels, and I do that in Thunderbird. (There are a few tutorials on the web, e.g. see here.)

I have been using this setup since the Summer of 2011 and haven't modified it since the initial setup.

I have renamed the standard Thunderbird labels to:

'DoNow' and 'Do'. I really see no need for more than two categories of urgency. You either have to do it NOW or do it a little later. DoNow versus Do only refers to time; both are important. I do not believe there are unimportant things you need to do. If they are unimportant, you don't need to do them. If you need to do them, they are important.

'WF' : Waiting For. I use it to mark emails that I have sent to someone asking for some action. I also use it to track purchases I have made when the order confirmation email comes in, or when the confirmation of a paper submission comes in.

'Someday/Maybe' : I use it for keeping interesting things around for later. Say: a funding opportunity that may be interesting in the future, or a new journal asking for submissions.

Email Triage

I have 'new mail' notifications turned off, so I only check my email when I'm done with something. Yes, I split up my work in 15 minute chunks, so I never miss out on something really urgent. Most of the triaging goes on in the morning though.

So, once I open email, I hit 'N' to go to the next unread email.

If an email comes in and I can do what is being requested in less than 2 minutes, I do it right away. This is one of the most powerful things in the David Allen strategy.

Examples:

So, an email comes in requesting if I can review a paper. I decide whether I want to do it, and if so, I hit the key "2", the shortcut for applying the label 'Do' in my label setup.

An email comes in asking for some information, I hit '2'

An email comes in saying we are out of gold for our evaporator, I hit '1' for 'DoNow'

An email comes in with a confirmation that I purchased something, I hit '5' : WF

I ask somebody to do something, I mark that email with 'WF'

Actually Doing Stuff

Now the time comes to do something. I use the Quick Filter bar, select 'DoNow', and do whatever is there. Afterwards, I select 'Do'. I also have search folders set up, but prefer filtering my inbox instead. That way, if there was more communication on that task, I can quickly navigate the message thread.

WF : Waiting For. At the end of the day, I see what I have been waiting for. If things have been completed, I clear the label. If there has been no response, I choose to either send a reminder, or wait a little.

Someday/Maybe. Every once in a while, I revisit this list (about every week), and see if there are things that I want to work on now. In that case, I change the label to 'Do' or 'DoNow'.

Closing The Loop, Capturing Off-Line Action Items. So, what do you do if there is something on your plate that did not have an email that originated it? Send yourself an email! Like I said, most of my action items originate in email, so this rarely happens. But, if someone asked you in the hall to do something, ask them to send you an email, or, what I do, is send them an email with a summary of what you discussed and what actions were agreed upon. And apply the appropriate label. WF for if they need to do something, or 'Do'/'DoNow' if I have to.

Downsides.

What was the action? The problem with using 'email as todo list' is that you need to remember what action was required based on an email, or you'll have to reread the email every time you see it. This is a little bit out of line with the Zen approach to todo lists: only put things on there that are actionable, and use language that is action oriented ("Follow up", "Read", "Think about", "Do", "Write", "Google"), but it works for me. Besides, this is not too much of an issue, as long as you remember what action item was required. As a workaround, you could respond by sending an email to yourself, where you write what action is required and apply the label to that mail.

Dates: No capturing of due dates. If it is really important, and requires long stretches of work with not too many little tasks, use your calendar.

Final thoughts

And it's portable! Since I use Thunderbird to access my Google mail through IMAP, Thunderbird on my Mac and on my Linux machines is always in sync, and that includes the labels. Just make sure you name the labels the same on all Thunderbird installs. Sometimes I need to restart the application to force reloading labels, especially when coming to work, when I have been applying and changing labels on my laptop, and want to continue on my desktop.

Threading provides history and context. One of the greatest advantages of using this method is that you can always use the threaded message view to see the history and context of that action item. In addition, that also helps if you forgot to clear the WF label, because you see the follow-up messages right below it.

Thread hijacking and multiple action items per email. Sometimes threads get hijacked, or you get a message with multiple action items associated with it. In that case, forwarding separate emails to yourself with separate action items may be advisable.

Cross-platform and -organization. As all my communication is through email, the collaboration and communication aspect of this todo list comes for free and it works across organizations!

No context. A lot of David Allen's framework is about contexts: At Home/At Work/Behind Computer. I don't use them, as I almost always have my computer with me, or I know I can only do a certain task when I'm physically at work or at home. If you want, you could add a third label 'AtHome' (yes, you can have multiple labels on one email) and use that to provide some context, but I don't. I do minor triaging on my cell phone, adding stars to things I need to apply labels to when I'm behind the computer. Yeah, it is not ideal, but my silly phone email does not do mail labels and I'm not in the market for a new phone just yet.

No associated files. I wish there was a nice way to link files and folders to action items. So when you start work on a certain task, you automatically open the right file and folder. This would be especially useful for actions that have a long lead time, and documents don't get revisited until a few months later. If possible, I keep the attachments around and work off that.

Finally. Always be aware of the 'next thing'. When you clear a label, always ask yourself if there is a next action item that is supposed to follow after this one. Rinse, Repeat, Enjoy :)