Posts

Showing posts from December, 2015

Parsing Text from the ground up by building a Tokenizer, Parser, AST generator and walker in C++

I'd previously read a text file line by line, and used regex to split it up into words to do frequency analysis with Ruby . Nowadays every major programming language provides libraries for parsing text files, but how do we get from reading in a stream of characters to more abstract concepts like words, lines, or sentences? This happens in something like two parts, the first is  Lexical analysis  with the  lexing phase, where we read a stream of characters from a file, and chunk it up into tokens . These tokens can be whatever we want, words, objects, concepts etc. In programming, open and close brackets are special tokens for example. Once we have a stream of tokens, we can then parse  them, where the order of the tokens can be used to understand something, for example, a left bracket token and a right bracket token group tokens within the two from those without. We can turn these tokens into everything from English sentences to programming languages, HTML web pages or say she

Basic Frequency Analysis with Ruby

Ruby is a pretty fun language, previously I've used it for N-body simulations following the Maya Open Lab Development series . I decided to do a short brush-up project by writing a command-line ruby script that does some basic word statistics on text files. For example, looking at the number of occurrences of words of a certain length in a file. On the side we can use the optparse and benchmark libraries to make the interface a little cleaner and do some timing. How can we find the most common words in a text file? Well, first we'd parse the text file on a word chunk basis, this can be done using a lexical parser, which converts a sequence of characters into a sequence of tokens, where the tokens are words in our case. Luckily regex can do this handily for us instead of rewriting our own, again . Then, when we have this list of tokens, we want to detect duplicates, and keep track of the number of times each word appears. A way to do this is to keep a memory of words seen,

Building a Pi Camera Mini-Tankbot with a Raspberry Pi, Arduino, and ROS

Image
Building toy robots is always fun. I had a Raspberry Pi and a  Pi camera  lying around, the small form factor, power and weight with a linux core was just begging to be roboticized. This miniature tankbot takes advantage of ROS and rosbridge  to get communication up quick and allow tele-operation over the internet using either a web GUI, android app, or a google glass app. The camera also has pan-tilt control using 2 servos. Mini Robotic Tank toy We could do everything with the raspberry pi, however I had 2 micrometal gear motors that run on 5V and comes with a Zumo kit designed for an Arduino. Arduino with the kit it is, used as the embedded real-time controller. Architecture The 2 motors have optical encoders soldered on. The Arduino keeps track of the encoder counts for wheel velocity as well as runs the motors. It also controls two servo motors that holds the raspberry pi camera, and 1 LED as a flashlight. The Arduino is connected to the Raspberry pi over USB and

ASCII-based Conway's Game of Life in C++

Image
Conway's Game of Life is a specific rule in cellular automation that makes a for a fun organic system that is both Turing complete and allows for 'evolution' over time, meaning that organized structures can and will grow out of chaos. From wiki - A system creating 'gliders' The system is pretty simple. We have a 2D grid (though other dimensions can be done) that contains binary values 1 or 0, alive or dead. Each step, we check a cell, and it's 8 neighbors. If the cell is currently alive, it dies unless there are exactly 2 or 3 neighbors, so no under-population or over-population allowed. If the cell is currently dead, it lives if there are exactly 3 neighbors, so reproduction. This can be pretty short in C++, implemented like this for example: bool Life::checkLife ( int x, int y) { int n = neighbors (x,y); // neighbors return n== 3 || (world[x][y] && n== 2 ); } We can  seed  the system with some initial pattern of

Learning SVD by doing PCA Image Compression using Matlab

Image
We can use Singular Value Decomposition (SVD) and Principal Component Analysis (PCA) to poorly compress an image of a rubbery ducky and learn some concepts about eigenvectors and values at the same time. Using Matlab I wrote a little script to generate an approximate image for all possible subsets and put it together into a little gif. SVD breaks an NxM matrix up into 3 matrices (called  UΣV*), which are: U - An MxM unitary matrix. Each column is an eigenvector. Σ (or S) - An MxN diagonal matrix, representing the eigenvalues. V* - (complex conjugate of V), An NxN unitary matrix. Each row is an eigenvector. Which when multipled together give you back the original matrix. Matlab is a decent choice of language since it's literally two lines to get the SVD. %% Do SVD on grayscale NxN matrix img = imread ( ' Rubber_Ducky.jpg ' ); [U,S,V] = svd ( rgb2gray (img)); It wasn't immediately clear to me what this means, or why it's useful. One of my fr

Relearning web development by writing a clicking game using React, Npm and Browserify

Image
I had some free time this vacation, and came across this interesting little sandbox game called  Pokemon or BigData by pixelastic . The aesthetics of the code are clean and simple. It uses React , which seems to be used for real-time web UIs. It also uses  npm  as a package manager. My previous forays into async calls in web dev were not nearly as pretty, so this seems like a good opportunity to learn something new. I decided a fun first start would be a simple game, and the simplest game I could think of was clicking a button to make a count go up. To give it some real-time feedback, I added some upgrade buttons that automatically click for you, a common incremental game technique. The buttons don't show up until there are enough clicks, and are grayed out if there's not enough to further upgrade. I set it to run at 50Hz to feel responsive. The heart of the code is basically a state machine that contains the current number of clicks and the number of upgrades