Category Archive: Computation
Acoustics &Computation &Science Wesley R. Elsberry on 31 Mar 2012
Python and the STFT
I've been going through biosonar data, and while the specgram method (from Matplotlib, available through pylab) is serviceable, I was interested in a short-time Fourier transform (STFT) implementation. There are a couple of ad hoc routines on Stack Overflow and the like, but I've started off with the Google Code PyTFD module. There are others out there as well: at least two projects that include an STFT implementation are aimed at extracting time and frequency data from musical recordings. I may have a look at one or both of those at some point.
In any case, installing PyTFD involves downloading the code via Subversion and then running the setup.py script.
Since I spent more time than I think was absolutely necessary getting a couple of examples done with the STFT, let me run through an example in the hopes that helps somebody.
# Imports
from __future__ import division
from pytfd.stft import *
from pytfd import windows

import numpy as np
import numpy.fft as nf
import matplotlib
matplotlib.use('Agg')
import scipy
import scipy.signal as spsig
import pylab
from pylab import *

# [...]

w = windows.rectangular(8)
Y_stft = stft(clkdata, w)
extt = [0, Y_stft.shape[0]*1e-6, 0, 5e5]
pylab.imshow(abs(Y_stft)[len(Y_stft)//2:],
             extent=extt,
             aspect="auto",
             origin="upper")
OK, so there's a fair number of things to import along the way. The first three import lines (lines 2 to 4) are specifically for setting up access to PyTFD's STFT method. After the elided section, the call to windows.rectangular sets up the window function to use in the STFT. The call to stft actually does the work, returning a multidimensional Numpy array with the STFT result given a Numpy array input and the window.
The extt line sets up the extent array to express the size of the X range and the Y range covered by the STFT. The imshow call puts the result in a subplot. There are some issues there. The STFT results are essentially a whole series of Fourier transforms; those have both negative and positive frequencies, and are complex values to boot. So the "abs" function provides a magnitude for each point. The slice yields just the positive frequency range. Then the extent gets set to the range represented by the STFT. The "aspect" parameter is set to "auto" so that Matplotlib can scale the X and Y ranges independently. The "origin" is set to "upper" to put the frequencies in the expected orientation.
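For anyone who wants to see the moving parts without installing PyTFD, here is a minimal NumPy-only sketch of the same idea: window a frame, take its FFT, keep the magnitude of the positive-frequency half. It is not PyTFD's implementation. The function name simple_stft, the hop size, the 1 MHz sample rate, and the 125 kHz test tone are all assumptions made for illustration. Note that in NumPy's FFT ordering the first half of the bins holds the non-negative frequencies; which half to keep from PyTFD's output depends on its layout, which this sketch does not assume.

```python
import numpy as np

def simple_stft(x, window, hop=1):
    """Return a (frequency, time) array of complex short-time spectra.

    Hypothetical helper for illustration; not part of PyTFD.
    """
    n = len(window)
    starts = range(0, len(x) - n + 1, hop)
    # One windowed frame per row.
    frames = np.array([x[s:s + n] * window for s in starts])
    # FFT along each frame, then transpose so rows are frequency bins.
    return np.fft.fft(frames, axis=1).T

fs = 1.0e6                           # assumed sample rate, Hz
t = np.arange(2048) / fs
x = np.sin(2 * np.pi * 125e3 * t)    # synthetic 125 kHz test tone
w = np.hanning(64)

S = simple_stft(x, w, hop=16)
n_bins = S.shape[0]

# Complex values become magnitudes; the first half of the bins are the
# non-negative frequencies in NumPy's FFT ordering.
mag = np.abs(S[:n_bins // 2])
freqs = np.fft.fftfreq(n_bins, d=1.0 / fs)[:n_bins // 2]
peak_hz = freqs[mag.mean(axis=1).argmax()]
```

The mag array can be handed to pylab.imshow with an extent built from the frame times and freqs, in the same way as the PyTFD result above.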
Here are a couple of the outputs from the PyTFD example:


Computation Wesley R. Elsberry on 11 Mar 2012
Raspberry Pi: The Shopping List
I ordered a Raspberry Pi Model B computer from Newark, so now I'm waiting for stock to catch up with the truly phenomenal initial demand.
If you are wondering what the Raspberry Pi is, it is a small computer board based on a Broadcom System On Chip (SoC). The SoC is ARM-based, so the operating systems offered so far are Linux distributions. The board has a CPU, GPU, 256MB of RAM, an SD card interface, a USB host interface, audio output, Ethernet network port, and video output via composite or HDMI interfaces. And it costs $35.
The Raspberry Pi is the brainchild of a United Kingdom non-profit organization that aims to make a low-cost programming platform to re-invigorate interest in computer science among students. Since computers have turned into consumer devices rather than primarily being programming platforms, students don't have a low-cost way to spark an interest in programming itself. Until, the Raspberry Pi folks hope, now.
But the Raspberry Pi is just now starting to be distributed in quantity. As the device comes, it is just a computer board a little bigger than a credit card. It doesn't even have a case. So there is some shopping to be done to trick out your Raspberry Pi once you order it.
The first order of business is power. The way I've seen this discussed is to get a powered USB hub and micro-USB power cable. The Raspberry Pi's power plug is a micro-USB interface. That port only hooks up power, so there's no problem hooking that into a powered USB hub that you'll also use for peripherals.
You'll need an SD card and a downloaded image of an operating system to run. I see people talking about 4GB or larger SD cards. I found a SanDisk Class 4 8GB for less than $3:
Getting something on-screen requires either an RCA composite video card and monitor, or something that you can hook up via the HDMI output. I've got two DVI-equipped monitors here, so I'm looking to link my Raspberry Pi with a cable that goes from HDMI to DVI.
The rest of the items are peripherals that hook up via the powered USB hub discussed above. If you don't have a USB keyboard and mouse, or if you prefer a trackball, or if you want to add audio recording capability, that all happens by adding USB devices. Here are some items I located:
I've put the Amazon links in the sidebar.
Computation &General Wesley R. Elsberry on 11 Mar 2012
Updating the Modular CV
Some time ago, I wrote about making a modular curriculum vitae in LaTeX. Since that time, I've had to update the contents. Things change. Colleagues request current CVs to include in grant proposals, and given the current state of public sector employment it is no bad thing to have the CV ready to go.
But I'm now fighting a problem of separating content and presentation. There are different rules for formatting CVs and resumes, and I've done the wrong thing previously: I've copied and modified sections like employment history in order to change how the presentation happens. This is bad, because now any time I change something in my employment history, I need to make sure that every relevant copy gets changed. I needed some way to make it so there would be one and only one place where each piece of information would be kept, and apply that to different pieces of presentation code in the source.
The solution I found today is the datatool package for LaTeX. This is a package that allows one to generate, read, and manipulate data stored in CSV (comma-separated-values) files. There is a lot of functionality in the package that I'm not using yet, but the ability to get data out of CSV and format it as needed is a big step forward for me.
I've created two CSV files so far, one to hold my education data and another to hold my employment data. The CSV files have more columns than will often go into an output format. For example, my education CSV has columns for my advisor name and my thesis title, even though those don't appear anywhere in an output yet. This will allow me to keep all associated data together, whether or not it is currently used. Previously, I simply used comments to add this kind of information close to what it relates to in my source.
I'm using various sources of good resume formatting to get ideas. Here's the code to show my three degrees:
\usepackage{datatool}

[...]

\def\dtledu{
  \vskip 0.125in
  \dtlverbosetrue%
  \DTLloaddb{edu}{wreeducation.csv}%
  \DTLforeach{edu}{%
    \graddate=GradDate,\degree=Degree,\major=Major,\university=Institution,\place=Location}{%
    \noindent\textbf{\degree, \major:} \graddate, \university, \place\\
  }
}
The "usepackage" line happens in the header. The "datatool" commands are only valid within the bounds of a document environment. I'm defining a macro "dtledu" to use in conditional statements. Within the macro, I skip an eighth of an inch down the page. I set the "datatool" package to emit a lot of debug information. The "DTLloaddb" command actually pulls in the contents of a CSV file. I first tried to use tab-delimiting, which is in the "datatool" documentation, but I couldn't get it to work. I eventually went with all default formatting: commas for separating fields, and double-quotes for delimiting them. That means that any text that has a comma must go in double-quotes.
The actual work happens in the "DTLforeach" command. It uses the data that was read in. One line holds the assignments of data columns to macros. Then a block appears where I can use those macros in conjunction with markup. Each line from my CSV is iterated over and formatted as I've defined it.
So this gives me a way to keep one place where my education information exists, and just one place for my employment information to exist. That information can be read in and formatted in different ways as needed for getting just the right output I'm looking for.
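The same one-source, many-presentations idea can be sketched outside LaTeX. The Python below is only an analogy, not part of the CV build. The column names follow the post, while the row values and the two formatter functions are placeholders invented for illustration. It also shows the quoting rule in action: a field containing a comma survives because it is wrapped in double-quotes.

```python
import csv
import io

# Hypothetical rows: column names follow the post, values are placeholders.
CSV_TEXT = '''GradDate,Degree,Major,Institution,Location
2001,Ph.D.,Example Major,Example University,"Sometown, ST"
1990,B.S.,Another Major,Another College,"Othertown, ST"
'''

def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

def cv_line(row):
    """One presentation: mirrors the field order used in the LaTeX block."""
    return "{Degree}, {Major}: {GradDate}, {Institution}, {Location}".format(**row)

def resume_line(row):
    """A second, shorter presentation of the same data."""
    return "{Degree} in {Major}, {Institution} ({GradDate})".format(**row)

rows = load_rows(CSV_TEXT)
cv_lines = [cv_line(r) for r in rows]
resume_lines = [resume_line(r) for r in rows]
```

Changing a value in the CSV text changes both outputs, which is the property I was after with the CSV files.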
Computation &General Wesley R. Elsberry on 22 Feb 2012
Losing Revenue?
Have a look at this article on the BBC site.
There's just so much wrong. The mobile telephony companies are toting up projections of profit from SMS and MMS messaging and seeing a shortfall as a "loss".
"I think it's a growing threat which is manageable through the right tariffs and the right costing," Mr Barford added.
"People are still using the mobile networks to communicate - and they're willing to pay for that."
Yes, people are willing to pay to communicate, but they are also going to look to find the best available methods. That evaluation is going to include cost. And the pricing telcos have artificially placed on SMS and MMS messaging simply is uncompetitive with other technologies now.
The buggy-whip manufacturers experienced a "loss of revenue" with the advent of the automobile. That doesn't mean they deserved to continue getting it, no matter what "tariffs and right costing" they might have contemplated. The only difference here is that the buggy-whip manufacturers were not also the only people selling and servicing automobiles.
Acoustics &Computation &Science Wesley R. Elsberry on 19 Feb 2012
Some Data Analysis and Visualization
As noted here before, I'm working through refreshing archived data, mostly from CD-ROM media. I've run into a whole batch of CD-ROM disks that are in good physical condition, but which mostly cannot be read. I'm trying some tools that I've seen recommended, but would be open to suggestions.
But the whole point of getting the archived data refreshed is to do something with it. And that's what I will aim to discuss here this time.
Over several years, there were a number of different technologies I was using to collect bioacoustic data. This means that I don't have one single type of data of interest. I have data that was recorded on audio cassette tape. I have data from a Racal Store V data recorder that was transferred to cassette tape. I have digital data from Keithley-Metrabyte DAS-1800 DAQ, Tucker Davis Technologies DAQ, and a couple of different National Instruments DAQ boards multiplied by at least two different multichannel scenarios. Plus, there's digital data transferred off of a Racal Storeplex unit via SCSI. There are mixed-endian byte-order issues, among other things.
I have a good software solution for two of these particular data acquisition scenarios. I wrote that between 1999 and 2001 using Borland's Delphi 5. In all, there are about 60,000 lines of code for data acquisition, reduction, analysis, and visualization. The original can handle multi-channel recordings taken from a single National Instruments board. A variant works on digitized audio recordings. That includes interactive data reduction with an automated click-picker whose choices can be refined with changes in parameters or by interaction with an oscillogram graph.
That still leaves a lot of data waiting for analysis. During my time at Michigan State University, I got into Python programming. There are a number of nice things about going after the rest of the data with Python. A big one is that Python is free, open-source software. I can have colleagues install it and not have to worry about breaking their budgets, which is a concern when one considers the well-established science and engineering scripting platform, MATLAB. While Python doesn't yet have all the "toolbox" capability of MATLAB, it has enough to move ahead with. For the scientific programmer, there are the Numpy, Scipy, and Pylab modules (I installed the Python(x,y) package on my Windows laptop, which includes those and more besides.) Numpy extends Python with a fast array and matrix manipulation capability. Scipy includes a variety of analysis tools. Pylab looks to put a wrapper on those two, plus the Matplotlib graphics module and the Ipython interactive shell.
I recently wanted to extract spectral information about dolphin clicks from one of the datasets that I hadn't previously examined. So I turned to Python to do that. The data was stored as raw binary, 16 bit signed integer samples. Reading that data was simply:
fd = open(fn, 'rb')
read_data = np.fromfile(file=fd, dtype=np.int16)
fd.close()
where "fn" is a filename pulled from the directory of interest. The "np" reference above resolves to "numpy". The three lines say to get an open file object, fd, by opening a file, fn, for binary read. Then, a Numpy array containing the data is returned by the Numpy function, fromfile, given the file object and the specification of the data type as signed 16 bit integers. The third line closes the file object. If I had a problem with endian issues, there are at least a couple of ways to address that in Numpy. (Getting the wrong byte order should be obvious on visualization, but I've seen a professor merrily tout a new processing method for dolphin clicks when his slides clearly showed that he had a byte-order problem with his dataset.)
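Two of the ways to deal with byte order in Numpy can be sketched as follows. The file here is a small temporary one written by the example itself, standing in for real recorder output; its name and sample values are made up for illustration.

```python
import os
import tempfile
import numpy as np

samples = np.array([1, -2, 300, -400], dtype=np.int16)

# Write a small big-endian test file (a stand-in for real recorder output).
path = os.path.join(tempfile.mkdtemp(), "demo_be.raw")
samples.astype(">i2").tofile(path)

# Option 1: state the byte order in the dtype when reading.
be_data = np.fromfile(path, dtype=">i2")

# Option 2: read as little-endian, then swap the two bytes of each sample.
swapped = np.fromfile(path, dtype="<i2").byteswap()

# What the mistake looks like: big-endian data read as little-endian.
wrong = np.fromfile(path, dtype="<i2")
```

Here the value 1 comes back as 256 in the wrong array, which is the kind of distortion that turns a clean waveform into hash on a plot.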
While it is better to handle DC offset problems at the time of data collection, sometimes you just have to deal with it at analysis time. This dataset handed me that problem. This problem is one where a time-varying signal should be centered at zero volts input, but instead centers at some non-zero voltage. Fortunately, it was a fixed offset, so a pretty simple approach worked nicely: find the mean value across the dataset, and subtract that value from each sample.
shiftdata = read_data - np.average(read_data)
The use of a Numpy array for the data means that the one line above handles the element-wise subtraction. The resulting Numpy array on the left is a floating-point array instead of an integer array.
My Delphi program had a click-picking algorithm that took a while to craft. I haven't ported it yet, so I just went with a very simple approach in Python. That looks at chunks of the data, where the chunksize was selected to be a bit larger than the maximum click width, but a good deal smaller than the interval between clicks. Within each chunk, the maximum value and minimum value are found. If the maximum and minimum are outside a defined noise level, consider it a found feature.
chunkmin = np.min(cary)
chunkmax = np.max(cary)
if (chunkmin < -noiseband) and (chunkmax > noiseband):
    # Found a click! Or a transient, at least.
    chunkmaxloc = cary.argmax()
Using the Numpy routines to find the min, max, and max location is pretty snappy.
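Wrapped in a loop, the chunk test above might look something like this. It is a sketch rather than the script I ran: the function name find_clicks, the chunk size, the noise band, and the synthetic data are all assumptions chosen so the example runs on its own.

```python
import numpy as np

def find_clicks(data, chunksize, noiseband):
    """Return the sample index of the positive peak in every chunk whose
    minimum and maximum both fall outside +/- noiseband."""
    peaks = []
    for start in range(0, len(data) - chunksize + 1, chunksize):
        cary = data[start:start + chunksize]
        if cary.min() < -noiseband and cary.max() > noiseband:
            # Convert the within-chunk location to a whole-record index.
            peaks.append(start + int(cary.argmax()))
    return peaks

# Synthetic record: low-level noise plus three large bipolar transients.
rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, 4000)
for loc in (500, 1700, 3100):
    data[loc] = 50.0
    data[loc + 1] = -50.0

clicks = find_clicks(data, chunksize=200, noiseband=10.0)
```

One limitation of fixed chunks is that a click straddling a chunk boundary can be split and missed, so this is only suitable as a first pass.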
Then, for each "click" located, I ran an FFT to get a power spectral density, and plotted that. I just used example code to add this functionality. (For underwater acoustics where pressure is measured, though, the conversion to decibels uses a factor of 20 rather than 10.)
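The factor-of-20 point is easy to check numerically: 20 times the log of the amplitude spectrum is the same thing as 10 times the log of the power spectrum. The sketch below uses a synthetic windowed tone burst in place of a real click, and the 500 kHz sample rate and 100 kHz tone are assumptions; the dB values are relative and uncalibrated.

```python
import numpy as np

fs = 500e3                    # assumed sample rate, Hz
n = 256
t = np.arange(n) / fs
# Windowed tone burst standing in for a click (synthetic, not real data).
click = np.hanning(n) * np.sin(2 * np.pi * 100e3 * t)

spectrum = np.fft.rfft(click)
freqs = np.fft.rfftfreq(n, d=1.0 / fs)

# Floor the amplitude so log10 never sees zero.
amplitude = np.maximum(np.abs(spectrum), 1e-12)
power = amplitude ** 2

db_from_amplitude = 20.0 * np.log10(amplitude)   # pressure-like quantity
db_from_power = 10.0 * np.log10(power)           # power-like quantity

peak_hz = freqs[np.argmax(db_from_amplitude)]
```

Both dB arrays agree to floating-point precision, and the peak lands in the bin nearest 100 kHz.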
So, for a quick and dirty script of less than three hundred lines total, I was able to:
* get a directory listing
* match to filename features to identify files to analyze
* remove DC offsets
* save new versions of the data
* scale the data according to field notes
* locate "clicks" in the data
* generate a PSD for each "click"
* collect PSD data
* generate and save oscillogram/PSD plots
* rank "clicks" on spectral features
* copy off plots of the highest-ranked clicks to a directory
My 2.4GHz dual-core Ubuntu workstation ran this script on 230 megabytes of data, producing over 1,400 graphs, and did it in eight minutes time. I've just located a calibration sheet on the hydrophone used, so once I've digitized that and applied it, I'll post an example with real dB numbers on the axis.
Acoustics &Computation &Science Wesley R. Elsberry on 12 Feb 2012
The Weekend
I don't know what other people got up to this weekend, but mine has been pretty well filled with computing projects.
I've been working with my friend Marc to try to get to the bottom of the Verizon FIOS connection foul-up. We each ran TCPDUMP on our respective machines while making a request that could be fulfilled (a small static HTML page) and one that could not be fulfilled (a dynamic page for webmail). We've sent the logs off to a networking guru friend of ours to see if he has any ideas. While I fully expect that this is a problem in Verizon's gear and processes, we are continuing to test any possibility that a fault in our gear could be an issue.
As I've mentioned previously here, I have data stretching back to the mid-1990s on CD-ROM. I've made a chunk of progress toward refreshing the archive by copying various of those to hard disk. It takes time, and needs manual attention every five minutes or so to unmount the last disk, load the new disk, mount it, and set up a copy process. Fortunately, most of the disks simply copy without error. I'm using ddrescue to go after the few files that won't copy cleanly.
I've also been going through some of the packed boxes to locate more disks to be refreshed. Along the way, I've been reminded that I also have a pile of video and acoustic recordings on tape to digitize as well. I do have a cassette tape deck set up to digitize to my laptop, but I haven't gotten my desk set up nicely to incorporate the video digitizing machine into a smooth workflow. From left to right, I have a Macbook Pro, a Viewsonic 24" LED monitor for a second screen for a laptop, a Gateway MT6458 laptop running Win7, an Optiquest 15" monitor for a desktop machine, plus keyboard and mouse for a desktop. Under the desk itself, I've got the video digitizing machine and the workstation/file server box. The video digitizing machine was built as state-of-the-art in 2001. It runs Windows XP, since the digitizing card doesn't work under anything more recent. It still does a nice job of pulling in analog sources in a DV video stream. The file server is much more recent, being built in 2007. It runs Ubuntu Linux 11.10. There's 4 terabytes of hard disk storage in that machine, which we use for our project files, personal files, multimedia, photos, and data. We're coming up to the limits on that, especially after this weekend's work.
I found a box of pocket notebooks, several of which have notes from our research data collection. But I did find one that has notes from the 1997 Discovery Institute conference on "Naturalism, Theism, and the Scientific Enterprise". I see from my notes that Michael Ruse classed approaches to "religion v. science" into "conflict", "accommodation", and "separation". I don't think "accommodation" was used by Ruse in exactly the same way that more recent commentary has gone, but I thought it interesting to see the word there, anyway.
I'm also working on some Python programming and a PHP/MySQL project. Between these things, that pretty well soaks up the time.
Computation &General Wesley R. Elsberry on 31 Jan 2012
Verizon FIOS Continues to Not Talk to Verizon FIOS
I have two new trouble tickets with Verizon FIOS as the connectivity situation continues to be nearly completely non-functional, as it has been since January 10th. The one entered from the Verizon Business FIOS side of things is TXP08R8CY. During the hour-and-a-half tech support conference call needed to get that one going, I happened to inquire about my previously-entered trouble tickets, and was told that they had been closed. Since the problem continues, I insisted that the tech set one up for my residential FIOS account, too. That one is FLCP08R8EN.
If you are a Verizon customer and have difficulty getting through to this site or other sites I run, be sure to reference the above tickets when you put your complaint in. Thanks.
Computation &General Wesley R. Elsberry on 16 Jan 2012
Verizon FIOS Doesn’t Talk to Verizon FIOS?
I have a bit more information about the connection difficulties I've been having with my ISP, Verizon FIOS. I have a residential account in Palmetto, FL with Verizon FIOS. Mostly, it works fine. I can get to a host of web sites without difficulty, and the transfer speeds are great.
I do remote system administration on two servers in the Dallas-Fort Worth Metroplex. Those servers get their connection via a Verizon FIOS Business plan link. (Yes, Verizon, the servers are on an account where serving is usual and expected.) One server provides my regular email, the other serves a whole bunch of web sites via virtual hosting. And things there are mostly working, where the outside world can merrily get pages served on demand.
But...
As of sometime early last Tuesday morning, January 10th, Verizon FIOS stopped reliably talking to Verizon FIOS. I can tell the approximate time of the outage as the last email message my computer here picked up from the server there was at 1:09 AM CT. The problem is very likely to have manifested within a very few minutes after that. And the problem's characteristics are just plain weird. One expects most 'problems' with connections to be user error. Certainly that's the primary basis of Verizon FIOS's residential account tech support, who are ready to quit if the problem isn't solved by having the user clear their browser cache or resetting the router. This problem, though, is more complex and is not localized to my particular account. First, not all connectivity is gone, just *most* connectivity. I can use SSH to log in remotely and use commands that return small amounts of information. Once I try a command that would return a page or more of text, the connection drops with a 'Broken pipe' message. There's a web page that is static and is only a few hundred characters in size that I can successfully retrieve. But none of the web sites that rely on web applications (Drupal, WordPress, and IkonBoard) do anything but spin forever while the browser displays 'Waiting on ...'.
So let me jot down some things I've learned about this so far.
* It isn't a DNS issue, as 'nslookup' finds any of the domain names and returns the correct IP address quite rapidly.
* It isn't a single port failure. Ports 22, 25, 80, and 587 are, at a minimum, included in the affected list.
* It isn't a complete break, as connections on the scale of a single packet of data at a time work.
* Using traceroute for other websites shows three hops taken within the Verizon routing center in Tampa. Traceroute for the affected servers shows two hops taken similarly, but the third times out.
* My parents live in Lakeland, Florida, a goodly distance away from where I live, and have Verizon FIOS as their ISP. I visited there this past weekend and asked my dad if he had been able to check this blog recently. He said no, not for about the past week. I tried traceroute from their connection, and it behaved the same way as from my home connection. The problem is not localized, it affects other Verizon FIOS customers.
* I've heard from Texas where another Verizon FIOS user of the email system cannot connect to the email server. I don't have a traceroute result from them to compare.
I have two open tickets on this problem with Verizon, FLCP08NT6J and FLDQ090SXY. There are some other people who have posted to the web saying that they are having network difficulties with Verizon FIOS in the same time frame, but I haven't seen a report that exactly matches what I am seeing. I'm writing this post by the expedient of using a proxy for my browser, which is a nuisance. (While it is on, my Google search results tend to come back in German, which I can't read.) It's a bit of a Catch-22, since I'd like to get feedback from Verizon FIOS users, but if the problem is of the nationwide scale that I expect it is, this post will be inaccessible to them from that account. On the other hand, if it is accessible via Verizon FIOS elsewhere, that would be useful information to have. If you are a Verizon FIOS user, I would appreciate it if you could run traceroute from the Verizon account to baywing.net and copy the results into a comment here. I'll copy my traceroute results into a comment here shortly.
How to invoke traceroute:
Under Windows, open a command prompt. In the command prompt, type in the following:
tracert baywing.net > tr_baywing.txt
It will take a few minutes to complete if you also have the problem I'm having. The result will be in a text file, 'tr_baywing.txt', in that directory. Copy and paste the text in a comment here if you aren't seeing the problem, or contact me if you are having the problem.
On Mac or FreeBSD, open a terminal window. At the command prompt, type in:
traceroute baywing.net > tr_baywing.txt
On Ubuntu Linux, open a terminal window. At the command prompt, type:
tracepath baywing.net > tr_baywing.txt
Here's my email, if you can't leave a comment here (remove spaces and convert to symbols as indicated): w e l s b e r r at b a y w i n g dot n e t
Computation &General Wesley R. Elsberry on 14 Jan 2012
Connection Issues
My connection to the servers in Texas from my home systems is unreliable. For the moment, my only reliable link to various of my web sites and my usual email is via my Android phone. Fortunately, I'm grandfathered into an unlimited data plan and have a Bluetooth keyboard. But that is still not a long-term solution. I have a trouble ticket in for my Verizon FIOS ISP that has been active since this past Wednesday without resolution. I just got a call from Marc saying that another email user is having much the same connection problem, so he's also putting in a trouble report from his side. The servers run on a Verizon FIOS business plan, so connection outages are a concern on that basis, too.
Computation Wesley R. Elsberry on 22 Sep 2011
Refreshing Data, Part Two
Some time back, I mentioned getting data off CD-ROM and putting it on hard disk with a second hard disk for back-up. As time passes, this gets more critical. I think archivists start getting antsy about CD-ROM after a decade or so, and I have media that go back to 1996.
And I have run into CD-ROM data disks with various reading errors.
So I thought that I would mention a freeware tool for Windows that addresses getting what can be gotten from a CD-ROM with problems. This is Roadkil's Unstoppable Copier (RUC). Fortunately, you can stop it in bad circumstances by killing the process in Task Manager. I've done this after setting it to work on a CD-ROM with an obvious, visible blemish. In its default setup, RUC will attempt multiple reads of bad sectors in order to recover as much of a file as possible. This leads to it taking a long, lllllooooonnnnngggg, time to get through a patch of damage. Longer than I was willing to wait, anyway. So in the "Settings" tab, I set it to "Auto Skip Damaged Files". This copies off all the undamaged files from the CD-ROM, and it does so fairly expeditiously. For some CDs, I may decide to let it trundle for a few days to analyze things, but first I want to get as much of the good stuff secured as I can. This tool looks to be a help in that regard.
The lengthy recovery process is probably most useful for large text files, where recovering a majority of a file is preferable to losing all of it due to a possibly small section that is damaged. For binary files, this may not be universally useful. The data files I have are raw integer data, so as long as the reconstituted file preserves the same length, I can recognize the bad patches and leave them out of analysis. That may not hold true for ZIP files and other compressed archives, JPG images, and the like.
Acoustics &Computation &Science &Wildlife Wesley R. Elsberry on 14 Aug 2011
Multiple Sound Sources in the Bottlenose Dolphin
It's been a long time coming, but the paper on evidence for multiple sound sources in the bottlenose dolphin appears in the October 15th issue of the Journal of Experimental Marine Biology and Ecology. I've been told that the PDF will be freely available soon, hopefully in the next week or so.
The abstract is:
Indirect evidence for multiple sonar signal generators in odontocetes exists within the published literature. To explore the long-standing controversy over the site of sonar signal generation, direct evidence was collected from three trained bottlenose dolphins (Tursiops truncatus) by simultaneously observing nasal tissue motion, internal nasal cavity pressure, and external acoustic pressure. High-speed video endoscopy revealed tissue motion within both sets of phonic lips, while two hydrophones measured acoustic pressure during biosonar target recognition. Small catheters measured air-pressure changes at various locations within the nasal passages and in the basicranial spaces. Video and acoustic records demonstrate that acoustic pulses can be generated along the phonic fissure by vibrating the phonic labia within each set of phonic lips. The left and right phonic lips are capable of operating independently or simultaneously. Air pressure in both bony nasal passages rose and fell synchronously, even if the activity patterns of the two phonic lips were different. Whistle production and increasing sound pressure levels are generally accompanied by increasing intranarial air pressure. One acoustic “click” occurred coincident with one oscillatory cycle of the phonic labia. Changes in the click repetition rate and cycles of the phonic labia were simultaneous, indicating that these events are coupled. Structural similarity in the nasal apparatus across the Odontoceti suggests that all extant toothed whales generate sonar signals using the phonic lips and similar biomechanical processes.
This was a big undertaking, requiring the coordinated effort of a lot of talented and busy people.
Diane Blackwood designed and implemented our acoustic recording layout and the dolphin stationing device and biteplate, and made sure the amplifying equipment was operational and protected from incident. (Incidents with electronics in proximity to sea water are all too common.) I designed and wrote the software that acted as a multichannel digital data recorder, the data reduction program, and the analysis program. Bill van Bonn was our veterinarian who spent our data recording sessions lying prone on the dock as he placed, checked, and positioned the endoscopes and pressure catheters. Our principal investigator, Ted Cranford, operated the video side of things, including the high-speed video capturing the endoscope views. Sam Ridgway and Don Carder consulted with us, helping us with the use of the pressure catheters (which had previously been used in two prior studies they authored). Monica Chaplin and Jennifer Jeffress were the dolphin trainers on the spot during data recording. Tricia Kamolnick and Mark Todd were trainers who helped get the subjects prepared for our data collection process, and Mark Todd implemented the regular video system. It took between two and three hours each data collection day for us to set up, test, and calibrate all the equipment. Breaking down took somewhat less time, but I would still have to run a custom program to demux the data, produce images visualizing the data for each trial, and then shift the day's data off the hard disk and on to CD-ROM media.
Update: The Marine Mammal Center has put up the PDF of the paper.
Computation &Philosophy Wesley R. Elsberry on 25 Sep 2010
The Turing Test as Gender Discrimination
I jumped into discussion of a comment by Greg Laden on Facebook that touched on the Turing test.
There was a comment by Dan Fincke that got me interested:
indeed, at this point I'm generally more impressed when I'm convinced a girl talking to me online is NOT a robot
My reply:
Dan,
Ironically enough, the Turing test as presented by Turing in his famous paper is not as generic as most people think. A male observer compares the conversation coming from a female correspondent and a program, and is supposed to pick out which is the female. So your comment fits right into Turing's original test conditions.
One can speculate that Turing's own gender identity issues had something to do with him casting it in that way. The difference between the somewhat-mysterious other gender and a program trying to imitate a female may have been considered by Turing as more difficult for a male observer to discern, and thus a slightly lower bar as a sufficient criterion for intelligence in a computer program. Or it could be that he just forgot to clarify that the imitation game with the computer involved could be gender-neutral if we wanted to do so.
Greg Laden jumped back into the discussion:
Dan, given a recent conversation on facebook, I'm impressed when a girl talking to me on line is NOT some guy who is pretending to be a girl but really looks like ZZ Top.
Wesley, interesting. I think but I could be wrong, that most of the first order (or should I say first generation) descriptions of the Turin test do not say that. I'm not sure if I've ever read the original (I think the first place I saw it was in something written by Gardner).
And my response:
Greg, you are right that most descriptions of the Turing test after Turing are phrased more generically. In the original paper, Turing says:
The new form of the problem can be described in terms of a game which we call the 'imitation game'. It is played with three people, a man (A), a woman (B), and an interrogator (C) who may be of either sex. The interrogator stays in a room apart from the other two. The object of the game for the interrogator is to determine which of the other two is the man and which is the woman. He knows them by labels X and Y, and at the end of the game he says either 'X is A and Y is B' or 'X is B and Y is A'. The interrogator is allowed to put questions to A and B thus:
C: Will X please tell me the length of his or her hair?
Now suppose X is actually A, then A must answer. It is A's object in the game to try and cause C to make the wrong identification. His answer might therefore be
'My hair is shingled, and the longest strands are about nine inches long.'
In order that tones of voice may not help the interrogator the answers should be written, or better still, typewritten. The ideal arrangement is to have a teleprinter communicating between the two rooms. Alternatively the question and answers can be repeated by an intermediary. The object of the game for the third player (B) is to help the interrogator. The best strategy for her is probably to give truthful answers. She can add such things as 'I am the woman, don't listen to him!' to her answers, but it will avail nothing as the man can make similar remarks.
We now ask the question, 'What will happen when a machine takes the part of A in this game?' Will the interrogator decide wrongly as often when the game is played like this as he does when the game is played between a man and a woman? These questions replace our original, 'Can machines think?'
Turing does himself later refer to the conditions of the game he introduced more generically, which probably licenses everybody else to treat the test as gender-neutral.
It might be urged that when playing the 'imitation game' the best strategy for the machine may possibly be something other than imitation of the behaviour of a man. This may be, but I think it is unlikely that there is any great effect of this kind. In any case there is no intention to investigate here the theory of the game, and it will be assumed that the best strategy is to try to provide answers that would naturally be given by a man.
But I still find it interesting that Turing's explicit description of the "imitation game" is a *gender-discrimination* test, even with the computer in play.
(I'll note here that my recall of the original did not include the note that the observer could be either male or female, and that vitiates part of my speculation from the first comment I made above.)
Computation &Science Wesley R. Elsberry on 04 Aug 2010
New Scientist Article on Evolving Programs
This New Scientist article discusses some really cool results coming out of the Devolab at Michigan State University. Coming in for particular attention was my colleague, Laura Grabowski, who defended her dissertation on memory evolving in Avidians shortly before I left MSU. She is now a professor at the University of Texas - Pan American in Edinburg, Texas, continuing her work on artificial life.
Rob Pennock and Jeff Clune also got attention in the article, and a paper of mine (with Laura and Rob) published last year got a link in the article.
Computation Wesley R. Elsberry on 14 Jul 2010
Toyota, WSJ, and Computers
I heard a segment on NPR this evening about the Toyota sudden uncontrolled acceleration problem (I'll just call it SUAP). They were following the lead of the Wall Street Journal, who said:
The U.S. Department of Transportation has analyzed dozens of data recorders from Toyota Motor Corp. vehicles involved in accidents blamed on sudden acceleration and found that at the time of the crashes, throttles were wide open and the brakes were not engaged, people familiar with the findings said.
The results suggest that some drivers who said their Toyota and Lexus vehicles surged out of control were mistakenly flooring the accelerator when they intended to jam on the brakes. But the findings don't exonerate Toyota from two known issues blamed for sudden acceleration in its vehicles: sticky accelerator pedals and floor mats that can trap accelerator pedals to the floor.
What the WSJ reported, though, doesn't exonerate Toyota of anything.
NPR had a commentator on who said something to the effect that 100% of the cases examined showed the same thing, and that one would be hard pressed to argue that the computers got it wrong every time. Not at all, Mr. Non-programmer dude on the radio; all it shows is that the fault is upstream of the black-box recorder and not downstream of it. And it isn't just the driver who is upstream; there is a lot of Toyota software and hardware there, too. If the Toyotas have an intermittent fault that causes the brake to be recognized as if it were the accelerator, it would explain the data far better than the "all those drivers forgot which pedal is the brake pedal, some of them for minutes at a time" conjecture. That's just one way in which the problem might occur. In any case, it appears that the data recorders do tell us what the computer controlling the car operated upon, which is full-throttle acceleration and no attention to brakes whatsoever, which corresponds neatly with the survivors' reports of what happened to them.
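To make the "upstream of the recorder" point concrete, here is a toy sketch. It is purely hypothetical: the names `PedalState`, `read_pedals`, and `record` are made up for illustration and say nothing about how Toyota's actual hardware or software is organized. It only shows that a recorder logging what the controller receives will faithfully report "full throttle, no brake" whenever a fault sits between the pedals and the controller.

```python
from dataclasses import dataclass


@dataclass
class PedalState:
    throttle: float  # 0.0 (released) to 1.0 (floored)
    brake: bool      # True when the brake is applied


def read_pedals(driver: PedalState, fault_active: bool) -> PedalState:
    """Hypothetical input stage sitting between the pedals and the controller.

    When the (assumed) intermittent fault is active, a brake press is
    misread as a floored accelerator.
    """
    if fault_active and driver.brake:
        return PedalState(throttle=1.0, brake=False)
    return driver


def record(controller_input: PedalState) -> dict:
    """The recorder logs what the controller saw, not what the driver did."""
    return {
        "throttle_pct": round(controller_input.throttle * 100),
        "brake_applied": controller_input.brake,
    }


# The driver is standing on the brake in both cases.
driver = PedalState(throttle=0.0, brake=True)
log_ok = record(read_pedals(driver, fault_active=False))
log_fault = record(read_pedals(driver, fault_active=True))
print(log_ok)
print(log_fault)
```

In the faulty case the log is internally consistent and still disagrees with what the driver actually did, which is all the argument above requires.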
I'm thinking when all is said and done, this is going to be discovered to be a software fault in Toyota's control program. I'm hoping the commentator on NPR gets 30 seconds of airtime to make an abject apology to the survivors when that happens.
Update: I found the NPR All Things Considered transcript, and the fellow whose name I didn't recall is Mike Ramsey of the Wall Street Journal.
NORRIS: How many data recorders were analyzed? And of those, how many of these accidents were found to have been caused by driver error?
Mr. RAMSEY: Well, we have been saying several dozen, all of them that were - fit the criteria, were found to have the brake not depressed and the accelerator wide open. So 100 percent of the incidents where it fit that criteria, that's what was found.
NORRIS: One hundred percent?
Mr. RAMSEY: Yes.
NORRIS: It sounds like, upon hearing that, that the government might be on its way toward exonerating Toyota.
Mr. RAMSEY: Well, when it comes to incidents where people are claiming electronic throttle control, the government has already said they have no evidence of it. This set of data, what it does is it completes the other side of it, which is if it's not that, then what is it, right? It's probably driver error. So the government has been hesitant to say that so far.
[...]
I totally understand the position of these people. And if you hear many of these anecdotes, it's incredibly compelling to hear them and all of their evidence. That said, when you have dozens of incidents that are similar where people say they were stepping on the brake and the car accelerated anyway and hit and that all of these incidents show virtually the same findings, that's difficult to believe that the computer was wrong and, you know, they had a special instance.
(Emphasis added.)
Mike, the data recorder can say what it says and the survivors still be right. Try doing some embedded programming sometime. You haven't come up with anything that in the least puts their accounts in a bad light, at least not to those who know something about computer control systems.
And be scripting your apology.
Update 2: I've marked in bold a particularly interesting piece of information from Ramsey. We have dozens of incidents that show exactly the same thing: no depression of brakes ever, and full depression of the accelerator throughout. This pattern is not what one would expect of humans behaving either in panic, where accidental touching of the brake would be likely, or in Mr. Ramsey's alternative of confusion of pedals. Pumping the brake is common, so if people were confusing the accelerator with the brake, we'd expect to see some fraction of those incidents showing variation in the accelerator control, and according to Mr. Ramsey, we never see that. That's pretty damning for Toyota, I think. Having absolutely the same data pattern across dozens of drivers when some of those incidents went on for a significant amount of time doesn't speak to mass confusion of drivers; it says "computer screw-up" to me.
Computation Wesley R. Elsberry on 10 Jul 2010
Fun with Email
For a while after moving in here and getting our new ISP, we were able to send our email through our server in Texas using port 25. That stopped working, so it was time to deal with the joys of managing with an ISP blocking port 25.
The first step was getting Postfix on our email server in Texas to use the submission port, port 587. There are about six lines in Postfix's "master.cf" configuration that have to be uncommented, after which Postfix has to be restarted; it is also worth making sure /etc/services has port 587 uncommented.
I tested things out using my Thunderbird email client, and things went fine, with just a dialog about accepting the SSL certificate from the email server. That made me feel good.
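The same check can be scripted rather than done from a mail client. What follows is a minimal sketch using Python's standard smtplib, assuming the server offers STARTTLS and password authentication on port 587. The function names, host name, and addresses are placeholders of mine, not anything from our actual setup.

```python
import smtplib
import ssl
from email.message import EmailMessage


def build_test_message(sender, recipient):
    """Build a small plain-text message for the test send."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Submission port test"
    msg.set_content("Sent via port 587 with STARTTLS.")
    return msg


def send_via_submission(host, user, password, msg, port=587,
                        smtp_factory=smtplib.SMTP):
    """Connect to the submission port, upgrade to TLS, log in, and send.

    smtp_factory is only a hook so the function can be exercised
    without a live server; it defaults to the real smtplib.SMTP.
    """
    context = ssl.create_default_context()  # verifies the server certificate
    with smtp_factory(host, port, timeout=30) as server:
        server.starttls(context=context)
        server.login(user, password)
        server.send_message(msg)


# Example call (placeholder host and credentials, not run here):
# msg = build_test_message("me@example.org", "you@example.org")
# send_via_submission("mail.example.org", "me", "secret", msg)
```

Because the default SSL context verifies the certificate, a script like this fails at the STARTTLS step if the server's certificate has expired, instead of offering a dialog to accept it.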
Then I tried to get Diane's antique installation of Eudora to connect up. My mood went down. Trying to add ":587" to the SMTP server name resulted in Eudora not figuring out where the server was, despite various places online where Qualcomm says appending ":587" would fix things up. Another round of searching turned up an odd procedure: copy "esoteric.epi" up to the main Eudora directory, restart Eudora, then set the port for SMTP in the new "Ports" section of the Options part of the menu. That brought me to the next stop: SSL negotiation failed because the certificate had expired. Last year, Jeff handled getting the certificate set up, so now I got to work on the SSL certificate. But things did eventually fall into place, and our email now flows in its accustomed channels once again.
Computation Wesley R. Elsberry on 20 May 2010
Trying to Find a Market
Following up on a comment from Dick Hoppe, I expanded upon the data compilation I wrote about earlier concerning the Manatee County 2010 Tax Certificate Auction. Now I'm pulling in data from three additional pages and have it all tidily summarized in the resulting CSV file. I made a short demo CSV file with three of the entries so people could pull it into a spreadsheet and see how it works. I made a page to explain what I had and why an investor ought to want to have it here, and that includes PayPal links for people to pick either the MS-DOS/Windows or the Unix/Mac OS X version.
My biggest problem is that the market for this is small, and I don't really have a good way to make potential buyers aware that there is an alternative to doing all their information look-ups manually. I tried making a posting to Craigslist, but all the responses I've gotten so far are spam.
Anybody else have experience with time-limited, targeted market information compilation marketing?
Computation &Law and Politics Wesley R. Elsberry on 15 May 2010
Beautiful Soup and Tax Certificates
Manatee County offers tax certificates to bidders. When property owners fail to pay their taxes, and that is happening a lot right now, the county gets other people to pay the taxes and gives them a tax certificate, which is a lien against the property. Each year, an auction happens where people can bid to get these. The bid amounts are in percent interest, and range from 18% at the high end down to 0%. The person bidding the lowest percent interest gets the tax certificate, after, of course, they pay the county the outstanding taxes.
Today, there was a practice auction. This is all handled online now. The page included the option to download data on the 9,000+ properties in CSV, XLS, or XML formats.
Diane is interested in the process and specifically in the land just to the south of our property. It currently has unpaid taxes, and if the executors of the former owner's estate don't pay up by June 1st, it will be included in the tax certificate auction. But she is also interested in what else is available out there.
That brings up an interesting problem. The downloaded data is minimal, giving just a parcel ID, outstanding tax balance, and some auction-related attributes. On the other hand, Diane would like information that is available online from another county office, that of the Property Appraiser.
I worked on a Python script to handle the job of getting additional information on acreage, zoning, the address, and bits like that. I hadn't done anything with Python regular expressions to date, and started looking at that and getting less enthused by the minute. The issue is getting data out of an HTML page downloaded from the Property Appraiser. I could have it done in Perl right offhand, but wanted to develop my Python skills a bit.
On the other hand, getting the job done is the top priority, so while looking stuff up, I ran across the BeautifulSoup module for Python. The web site sounded promising, and a number of other people seemed to have found it useful. Very useful.
BeautifulSoup is an HTML/XML parser. It aims to not only handle clean XHTML, but also to do reasonable things with the sort of HTML people were writing when the Web was young, in other words, bad HTML.
I downloaded the module distribution, and got it uncompressed. Setup is simply
python setup.py install
My usage so far is to pluck values out of adjacent cells in a table. I can load a BeautifulSoup object with the HTML in question, then ask it to find the label I'm looking for in text. Then I just ask it to retrieve the next text in the document, and that is the stuff I'm looking for.
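Here is a minimal sketch of that label-then-next-text pattern. It assumes the current `bs4` packaging of BeautifulSoup, which is newer than the distribution I installed, and the HTML fragment, the labels, and the helper name `value_after_label` are invented for illustration. The real Property Appraiser pages look different.

```python
from bs4 import BeautifulSoup

# Made-up fragment standing in for a Property Appraiser profile page.
SAMPLE_HTML = """
<table>
  <tr><td>Acreage:</td>
      <td>5.25</td></tr>
  <tr><td>Zoning:</td>
      <td>A-1</td></tr>
</table>
"""


def value_after_label(soup, label):
    """Return the first non-blank text that follows the label, or None."""
    label_node = soup.find(string=label)
    if label_node is None:
        return None
    # Skip whitespace-only strings sitting between the table cells.
    value_node = label_node.find_next(string=lambda s: s.strip() != "")
    return value_node.strip() if value_node is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
acreage = value_after_label(soup, "Acreage:")
zoning = value_after_label(soup, "Zoning:")
print(acreage, zoning)
```

The match on the label is exact here. A page with stray whitespace inside the label cell would need a looser match, such as a compiled regular expression passed as `string`.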
Anytime one gets started with a library to do a job, it can take a while to get going with it. BeautifulSoup let me get my job done without a lot of effort on the initial learning curve. Right now, my script is about halfway through getting the additional data wanted for those 9,000+ properties. We'll be able to look it over in the morning. The whole script I'm using is less than a hundred lines of code, and that reads in a CSV file, traverses that, gets the associated profile page from the Property Appraiser for each property, parses that with BeautifulSoup, adds the additional fields of info to the original, and writes out a new CSV file with the more complete data set.
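The overall shape of such a script can be sketched in a few lines. This is not my actual script. The column names, the `augment_rows` helper, and the stand-in `fake_lookup` function are assumptions for illustration; in real use the lookup would download and parse the profile page for each parcel.

```python
import csv
import io


def augment_rows(in_file, out_file, lookup, extra_fields, key="parcel_id"):
    """Copy a CSV, appending extra columns returned by lookup(row[key]).

    lookup must return a dict; missing fields are written as empty strings.
    Returns the number of data rows written.
    """
    reader = csv.DictReader(in_file)
    fieldnames = list(reader.fieldnames) + list(extra_fields)
    writer = csv.DictWriter(out_file, fieldnames=fieldnames)
    writer.writeheader()
    count = 0
    for row in reader:
        extra = lookup(row[key])
        for name in extra_fields:
            row[name] = extra.get(name, "")
        writer.writerow(row)
        count += 1
    return count


def fake_lookup(parcel_id):
    """Stand-in for fetching and parsing one property's profile page."""
    return {"acreage": "5.25", "zoning": "A-1"}


# In-memory files stand in for the downloaded and augmented CSV files.
source = io.StringIO("parcel_id,balance\n12345,1500.00\n67890,320.10\n")
result = io.StringIO()
written = augment_rows(source, result, fake_lookup, ["acreage", "zoning"])
print(result.getvalue())
```

Passing the lookup in as a function keeps the CSV handling separate from the page scraping, so each half can be tried out on its own.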
Computation Wesley R. Elsberry on 11 Apr 2010
New ISP for Us: Verizon FIOS
We got our new ISP activated on Saturday, and we had selected Verizon FIOS. On a dollars per bandwidth unit basis, it was by far the most effective way to spend the money. The choices where we are were Verizon DSL, Bright House cable, and Verizon FIOS.
I had priced the DSL a couple of months ago, and Verizon was offering 1 Mbps for $19.99/month and 1.5 Mbps for $29.99/month. We were considering the 1 Mbps DSL service simply on the cheapskate basis. However, when I checked again last week, the prices had been raised sharply. The 1 Mbps was $29.99/month, and the 1.5 Mbps was $39.99/month. I happened to have a chat session going with a Verizon representative, and part of it went something like this:
Me: So what additional value has been added to the DSL options to make them worth $10 more a month now than back in January?
[2 minute pause]
Verizon Rep: I'm sorry, I don't have any information available about that.
Me: Good answer.
While we didn't really want to reward Verizon for the predatory pricing structure they've created on DSL, the bandwidth available with Verizon FIOS was just too tempting. The FIOS Internet service starts at 15 Mbps downstream and 3 Mbps upstream at $54.95/month. It's more than we wanted to be budgeting for our internet, but we really do use it.
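The dollars-per-bandwidth comparison is simple arithmetic on the downstream rates and current prices quoted above. A short sketch follows; Bright House is left out because I haven't quoted a price for it here, and the plan labels are just my shorthand.

```python
# (label, downstream Mbps, monthly price in dollars), as quoted above
plans = [
    ("Verizon DSL 1 Mbps", 1.0, 29.99),
    ("Verizon DSL 1.5 Mbps", 1.5, 39.99),
    ("Verizon FIOS 15 Mbps", 15.0, 54.95),
]


def dollars_per_mbps(monthly_price, downstream_mbps):
    """Monthly cost of each Mbps of downstream bandwidth."""
    return monthly_price / downstream_mbps


costs = {
    label: round(dollars_per_mbps(price, mbps), 2)
    for label, mbps, price in plans
}

for label, cost in costs.items():
    print(f"{label}: ${cost:.2f} per Mbps per month")
```

That works out to roughly $30 and $27 per Mbps for the two DSL tiers against under $4 per Mbps for FIOS.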
A Verizon service person called last Thursday to discuss access to our driveway. It's a mere 663' long. His job was to get the fiber optic cable laid down to the house. We found out that they had to put in a splice; they've marked that patch of ground with flags and recommended that we don't plan to extend our driveway over that spot. I had informed them about the long driveway in the chat session, and they get their fiber optic cable in 1000' lengths, so they should have had plenty to manage to get there without a splice.
The actual install went fairly smoothly. Verizon says installation may take between four and eight hours, but our install was done in about three.
I did a Speakeasy bandwidth test, and the gear delivered a bit more than advertised, so that's to the good. We've been using Bright House cable to access the internet since last August, and we've had a variety of annoying lapses in what we've been able to do. For instance, we use email on a server located in Texas. We have not been able to send email through that server for several months. That has now been remedied.
The next step is to get our home internal network set up again. Right now, FIOS does look like it will help us get done the things we need done on the internet.
Update: OK, I found a slightly annoying thing here: poor DNS resolution. Apparently, the FIOS router defaults to a set of not-so-hot nameservers. Fortunately, I can specify better ones on my individual computers. See this page for an explication of the problem and the fix.
Computation &Science Wesley R. Elsberry on 05 Feb 2010
Refreshing Data Storage
I have data on Compact Disks (CDs) from past projects. The technology was getting toward being affordable around 1996. CD writers dropped under $100 for the first time somewhere around there, and media started selling for less than $5 a disk. The amount of storage space on a CD was comparable to the size of hard disks available at the time, and optical storage seemed far better than tape as a medium. So now I have cases, drawers, and spindles of CDs dating right back to 1996.
No storage medium is perfect, so archived data is a commitment and not just a static collection. Last month, Sam asked me what I would like for my birthday. I said I wanted a disk for backing up data. After having a look at off-the-shelf external hard drives, it seemed that all the models I looked at had warranties of 1 year or shorter. However, if you buy an internal hard disk and a separate USB enclosure, the warranty on the drive can be much, much longer. Sam and I visited the Newegg site and picked out a Western Digital 1.5 terabyte drive and a Rosewill USB enclosure. The drive comes with a 5-year warranty. I can pair this with another 1.5 terabyte disk so that I can copy off my data from the CDs, then copy to the second hard disk.
Back when I was about to move from California to Michigan, I had a chat with a fellow who works for the Internet Archive. That is a project whose modest aim is to store the World Wide Web. All of it. You can browse sites as they were in 1995. Well, with a few caveats. My acquaintance said that the Internet Archive's data storage was based on consumer-grade IDE drives. You can get them cheap and in quantity, and if you store things on multiple disks, the redundancy will help. That's because disks fail. With an organization like the Internet Archive, they rack up lots of failures. They have to be swapping out bad drives and attempting to restore content from remaining copies on other drives. And they couldn't, he said, quite keep up with the failures. Some data does get lost because failures occur before the redundancy can be exploited to restore some sites.
I figure for my purposes, the data I have is a copy of what my colleagues have, and for the hard disk copy, I aim to have two of those. I think that should be sufficiently paranoid. The process or workflow takes about six to seven minutes per CD to create a directory, copy the files, and mark the CD as copied. I'm working on the third page out of 32 pages in a CD case now. This will take some effort, but then I invested years of my life getting that data in the first place.
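The per-CD steps lend themselves to a small script. The following is only a sketch of one way to do it, not the procedure I'm actually using. The `archive_disc` name, the `.copied` marker file, and the directory layout are all assumptions, and the demo uses temporary directories in place of a real CD mount point and backup drive.

```python
import shutil
import tempfile
from pathlib import Path


def archive_disc(disc_root, archive_root, label):
    """Copy one disc's tree to archive_root/label and leave a marker file.

    The marker (archive_root/label.copied) lists every file copied,
    which doubles as a record that this disc has been done.
    """
    dest = Path(archive_root) / label
    shutil.copytree(disc_root, dest)  # creates dest; fails if it already exists
    copied = sorted(
        p.relative_to(dest).as_posix() for p in dest.rglob("*") if p.is_file()
    )
    (dest.parent / f"{label}.copied").write_text("\n".join(copied) + "\n")
    return copied


# Demo: temporary directories stand in for the CD and the backup drive.
with tempfile.TemporaryDirectory() as tmp:
    disc = Path(tmp) / "cdrom"
    (disc / "trial01").mkdir(parents=True)
    (disc / "trial01" / "data.bin").write_bytes(b"\x00\x01")
    backup = Path(tmp) / "backup"
    files = archive_disc(disc, backup, "disc_0001")
    marker_exists = (backup / "disc_0001.copied").exists()

print(files, marker_exists)
```

Because `copytree` refuses to overwrite an existing destination, accidentally running the same label twice raises an error instead of silently mixing two discs together.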
Computation Wesley R. Elsberry on 27 Jan 2010
Students and the Apple iPad
Apple announced its iPad tablet computer today. The device seems to be mostly a large-screen iPod Touch. The intriguing aspects of the iPad, at least to me, were that Apple says that for the 3G versions ($130 extra over the WiFi-only versions) these devices will be unlocked, and that Apple has arrangements with textbook publishers for EPUB content. It seems that Apple was able to wring some few concessions from AT&T concerning the unlocking and the two tiers of data plans. While the data plan costs are not cheap, they manage not to be exorbitant.
I saw that some other commentators were perplexed about the time taken in the announcement to show Apple's iWork applications as they are ported to the iPad. I think, though, that a major market for the iPad might just turn out to be among high school and college students. Consider the points made and that market:
- Light enough to carry around in the backpack (If a student can skip carrying even one textbook and carry an iPad instead, they will be lightening their load.)
- 10 hour battery life, good enough for the school day
- Low cost applications that will be good enough for note-taking and in-class analysis
- Capable of holding and displaying full textbook content in color plus supplemental multimedia
- Cost low enough that it is compatible with current budgets for textbooks
- WiFi for on-campus connectivity and research
The fact that it also does a bunch of multimedia service plus gaming will be seen as a plus, at least by the students if not their parents.




