Plotting Arduino Temperature Logs #1 - Arduino Yun logging minute data for a whole year

Last year I put together an Arduino Yun with DHT22 sensors and an LCD as a clock that also shows the indoor and outdoor temperature (and humidity) at my apartment. I've been surprised at how much better I've become at guessing the temperature after having this by my desk for a year.
Arduino Yun Clock + Temperature Logger 

The system:
  • Log indoor & outdoor temperatures every minute to SD card
  • Timestamped log messages using an RTC
  • Display current indoor/outdoor temperature on LCD screen
  • Ability to pop out SD card and get data without crashing Arduino
Every minute it logs to a file, and that's essentially the gist of the Arduino code:

// Log data every minute
if (minute() != last_minute) {
  last_minute = minute();
  // ... read the sensors and append a timestamped line to the day's log file
}

Plotting a single log file using d3

I decided to use d3 to plot the logged data. The Arduino logs a reading every minute to a text file, starting a new file each day. A few entries look like this:

2015/01/18 00:24 78.4 14.1 37.0 17.0 78.7 19.6
2015/01/18 00:26 78.3 14.1 37.0 17.0 78.7 19.6
2015/01/18 00:27 78.2 14.1 36.0 17.0 79.1 19.7
2015/01/18 00:29 78.0 14.2 36.0 17.0 78.8 19.8

The first column is the date, followed by the time in PST. The next six columns are humidity and temperature (Celsius) for the outdoor DHT22 sensor, the indoor DHT11 sensor, and the indoor DHT22 sensor, respectively. I happened to have a DHT11 sensor connected as well, for no important reason.
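As a rough illustration of that column layout, here is a minimal Python sketch that turns one log line into a typed record. The `Reading` class, its field names, and `parse_line` are names I made up for this example, and it assumes the fields are separated by whitespace as they appear above.

```python
from datetime import datetime
from typing import NamedTuple


class Reading(NamedTuple):
    """One log entry; field names are illustrative, not from the logger."""
    timestamp: datetime
    outdoor_humidity: float
    outdoor_temp_c: float
    dht11_humidity: float
    dht11_temp_c: float
    indoor_humidity: float
    indoor_temp_c: float


def parse_line(line: str) -> Reading:
    """Parse one log line, assuming whitespace-separated fields."""
    fields = line.split()
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}: {line!r}")
    # Date and time are the first two fields, e.g. "2015/01/18" and "00:24".
    timestamp = datetime.strptime(f"{fields[0]} {fields[1]}", "%Y/%m/%d %H:%M")
    # The remaining six fields are humidity/temperature pairs per sensor.
    return Reading(timestamp, *(float(x) for x in fields[2:]))


reading = parse_line("2015/01/18 00:24 78.4 14.1 37.0 17.0 78.7 19.6")
print(reading.timestamp, reading.outdoor_temp_c)
```

Raising on a wrong field count makes truncated lines (say, from pulling the SD card mid-write) show up immediately instead of silently shifting columns.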

Since the data is logged every minute, there are 1,440 entries a day and roughly half a million in a year (1,440 × 365 = 525,600), though I had it turned off for a month or two here and there.

So what's the best way to plot half a million data points? For quick-and-dirty analysis we can just pass everything into matplotlib or similar, and it'll work okay. But that wastes computation: our screens don't have half a million pixels across, and humans can't really parse that much data anyway. It'd be much nicer to build a web-based visualization that can take subsets of the data and aggregate them by hour, day, week, or month.
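To make the aggregation idea concrete, here is a small Python sketch that averages per-minute samples into one value per hour. The function name `hourly_means` and the sample values are invented for illustration; the same bucketing approach should extend to days or months by truncating the timestamp differently.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def hourly_means(samples):
    """Average (timestamp, value) samples into one value per hour."""
    buckets = defaultdict(list)
    for timestamp, value in samples:
        # Truncate each timestamp to the start of its hour.
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    # Return the hours in chronological order.
    return {hour: mean(values) for hour, values in sorted(buckets.items())}


# Made-up samples: two readings in the first hour, one in the second.
samples = [
    (datetime(2015, 1, 18, 0, 24), 14.0),
    (datetime(2015, 1, 18, 0, 26), 15.0),
    (datetime(2015, 1, 18, 1, 5), 16.0),
]
result = hourly_means(samples)
for hour, value in result.items():
    print(hour, value)
```

Aggregating by hour would cut a full day of 1,440 readings down to 24 points, which is much closer to what a plot can usefully show.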

There are a few straightforward options out there. Flot is a fun one that I've used in previous internships, but I've heard good things about d3, especially for real-time data, and it'd be fun to learn something new.

Cleaning and plotting a single day's worth of temperature data

I considered pushing the data into MongoDB, but for just half a million entries that's a bit overkill. Instead, as a first small goal, let's just clean up the individual day logs into a CSV format that d3 can load. In the process we can aggregate the data as well. The data is simple enough that we can just parse it line by line:

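# Open both files in one with-statement so each is closed when we're done
with open(raw_filepath, 'r') as raw, open(csv_filepath, 'w') as f:
  f.write("%s\n" % ",".join(labels))  # First line contains the column labels
  for line in raw:
    linedata = line.strip().split('\t')  # Raw log fields are tab-separated
    linedata[0] = formatDate(linedata[0])  # Reformat the timestamp column
    f.write("%s\n" % ",".join(linedata))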
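The loop above relies on `labels` and `formatDate`, which aren't shown here. As a guess at what they could look like, here is a hypothetical sketch: it assumes the date and time share the first tab-separated field (e.g. `2015/01/18 00:24`) and rewrites it as an ISO-style timestamp. The column names are placeholders, not the ones from the actual script.

```python
from datetime import datetime

# Hypothetical header names, one per tab-separated column in the raw log.
labels = [
    "date",
    "outdoor_humidity", "outdoor_temp",
    "dht11_humidity", "dht11_temp",
    "indoor_humidity", "indoor_temp",
]


def formatDate(raw):
    """Convert '2015/01/18 00:24' to '2015-01-18T00:24' (assumed format)."""
    parsed = datetime.strptime(raw.strip(), "%Y/%m/%d %H:%M")
    return parsed.strftime("%Y-%m-%dT%H:%M")


print(formatDate("2015/01/18 00:24"))
```

An ISO-style timestamp is a reasonable choice because it sorts correctly as plain text and JavaScript's `Date` can generally parse it, but any format that the plotting code can parse would do.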

We output a CSV file with a header row, since d3 has the handy function d3.csv to parse such a file. Finally, using d3 we can make a bar plot like so:
Plot of outdoor temperature over one day - Gist used to generate plot
Following a really nice d3 tutorial, along with some rummaging on Google, I added axes and a real-time mouseover overlay to the plot. The next step is to deal with aggregate data.
