Thursday, October 22, 2009

ICM Class Notes on Text Processing and Serial - Oct 14, 2009

Last week’s ICM class focused on parsing strings, picking up where we left off the week before, and ended with a short overview of serial communications. Here is a list of the topics that we covered:
  • Processing and XML
  • Additional functions for processing text
  • Serial communications
XML Libraries in Processing
XML has a hierarchical structure similar to a tree. This makes XML files easier to navigate programmatically than unstructured text. There are several ways to read XML documents in Processing. Here is an overview of these options:
  1. Standard string parsing functions in Processing, like the ones outlined below and in my post from last week.
  2. Existing Processing XML libraries. Examples include simpleML, XMLElement, and proXML. These libraries let Processing navigate the structure of an XML file, finding a data element by moving through its child or parent elements.
  3. Application programming interfaces (APIs) from the data source. Examples of sites with APIs include Flickr, Google Maps, etc.
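To make the tree idea from option 2 concrete, here is a minimal sketch in plain Java rather than in any of the Processing libraries named above. It uses the DOM parser that ships with Java as a stand-in; the sample document, the class name XmlTreeSketch, and the method itemTitles are all made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class XmlTreeSketch {
    // Hypothetical sample document, invented for this example.
    static final String SAMPLE =
        "<feed><item><title>First</title></item><item><title>Second</title></item></feed>";

    // Walks root -> <item> children -> <title> grandchildren and collects the title text.
    static List<String> itemTitles(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element root = doc.getDocumentElement();
            List<String> titles = new ArrayList<>();
            NodeList children = root.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Skip whitespace/text nodes and any element that is not an <item>.
                if (child.getNodeType() != Node.ELEMENT_NODE
                        || !child.getNodeName().equals("item")) {
                    continue;
                }
                NodeList grandchildren = child.getChildNodes();
                for (int j = 0; j < grandchildren.getLength(); j++) {
                    Node grandchild = grandchildren.item(j);
                    if (grandchild.getNodeType() == Node.ELEMENT_NODE
                            && grandchild.getNodeName().equals("title")) {
                        titles.add(grandchild.getTextContent());
                    }
                }
            }
            return titles;
        } catch (Exception e) {
            // Malformed XML or parser setup problems end up here.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(itemTitles(SAMPLE));
    }
}
```

The Processing libraries wrap the same parent/child navigation in shorter calls, so treat this only as a picture of what happens underneath.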
Using Tokens to Parse Text
This week we were introduced to the concept of tokens and splits. Tokens are the delimiter characters that mark where a string should be cut (each token is only one character long). Here is a list of functions that use tokens or splits:
  • split(String, SplitStringIdentifier); – This function returns an array of strings. The input “String” is split at each instance of SplitStringIdentifier, and the SplitStringIdentifier itself is removed from the resulting strings.
  • splitTokens(String, SplitStringIdentifiers); – This function works the same way as split(); however, it can accept multiple SplitStringIdentifiers. These are passed all together within one pair of quotation marks. For example, using “_.” would split the string wherever either “_” or “.” appears.
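The difference between the two functions can be sketched in plain Java. The helpers below are rough analogs, not Processing's own implementations; the names splitOn and splitOnAny and the sample string are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;
import java.util.regex.Pattern;

class SplitSketch {
    // Rough analog of split(): cut at every occurrence of one literal delimiter.
    static String[] splitOn(String text, String delimiter) {
        // String.split expects a regex, so Pattern.quote keeps characters like "." literal.
        // The -1 limit keeps trailing empty strings instead of dropping them.
        return text.split(Pattern.quote(delimiter), -1);
    }

    // Rough analog of splitTokens(): any single character in `delimiters` ends a piece.
    static String[] splitOnAny(String text, String delimiters) {
        StringTokenizer tokenizer = new StringTokenizer(text, delimiters);
        List<String> pieces = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            pieces.add(tokenizer.nextToken());
        }
        return pieces.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // One delimiter: only "_" cuts the string.
        System.out.println(Arrays.toString(splitOn("sensor_12.csv", "_")));
        // Two delimiters: both "_" and "." cut the string.
        System.out.println(Arrays.toString(splitOnAny("sensor_12.csv", "_.")));
    }
}
```

In both cases the delimiters themselves are dropped from the resulting pieces.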
Comparing Text
  • equals(); – This function compares the text contained in a string to another piece of text. For example, string.equals(“stringData”) plays the same role for strings that “intVar == 3” plays for numbers. We cannot reliably use the comparison operator “==” to match strings, because it checks whether two variables refer to the same String object rather than whether they contain the same characters. That is why the equals() function is needed. To compare words regardless of whether they are lower or upper case, we can first convert both with the string.toLowerCase() function.
  • Regular Expressions - Regular expressions are special text strings for describing search patterns. They enable more sophisticated parsing of content than is possible with split() and splitTokens() alone, which makes them well suited to complex data parsing and cleaning. We did not cover this in class, so I will need to research it further.
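Both ideas in this list can be tried out in plain Java, which Processing is built on. This is only a sketch: the class name CompareSketch, the helpers sameWord and findNumbers, and the sample strings are invented for illustration, and the regular expression shown is just one simple pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CompareSketch {
    // True when both strings hold the same characters, ignoring upper/lower case.
    static boolean sameWord(String a, String b) {
        return a.toLowerCase().equals(b.toLowerCase());
    }

    // Uses the regular expression \d+ to collect every run of digits in the text.
    // Assumes each run is small enough to fit in an int.
    static List<Integer> findNumbers(String text) {
        List<Integer> numbers = new ArrayList<>();
        Matcher matcher = Pattern.compile("\\d+").matcher(text);
        while (matcher.find()) {
            numbers.add(Integer.parseInt(matcher.group()));
        }
        return numbers;
    }

    public static void main(String[] args) {
        // A separate String object that happens to contain the same characters.
        String typed = new String("hello");
        System.out.println(typed == "hello");      // compares object identity
        System.out.println(typed.equals("hello")); // compares the characters
        System.out.println(sameWord("Hello", "hELLO"));
        System.out.println(findNumbers("temp 21, humidity 40"));
    }
}
```

The first two printed lines differ, which is the reason equals() is needed for text.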
Serial Overview
There are two main approaches to creating protocols for serial communication. The first is called punctuation, and it entails adding tokens to the data being communicated so that the receiving computer can parse the data once received. The second approach, called handshaking, entails having each device wait to send a message until it has received data from the other connected device.
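Here is a minimal sketch of both approaches in plain Java, with no real serial port involved. The message format (comma-separated readings ended by a newline), the request character 'A', and the names SerialProtocolSketch, receive, and replyTo are all assumptions chosen for illustration, not a protocol from the class.

```java
import java.util.Arrays;

class SerialProtocolSketch {
    // --- Punctuation ---
    // Characters collected so far for the message currently being received.
    private final StringBuilder buffer = new StringBuilder();

    // Feed one incoming character. Returns the parsed readings once a newline
    // completes a message, and null while the message is still incomplete.
    int[] receive(char c) {
        if (c != '\n') {
            buffer.append(c);
            return null;
        }
        String message = buffer.toString().trim(); // trim also drops a stray '\r'
        buffer.setLength(0);                       // start fresh for the next message
        if (message.isEmpty()) {
            return new int[0];
        }
        String[] fields = message.split(",");      // commas separate the readings
        int[] values = new int[fields.length];
        for (int i = 0; i < fields.length; i++) {
            values[i] = Integer.parseInt(fields[i].trim());
        }
        return values;
    }

    // --- Handshaking ---
    // Hypothetical sender: it stays silent unless it has just received the request 'A'.
    static String replyTo(char request, int[] readings) {
        if (request != 'A') {
            return "";
        }
        StringBuilder reply = new StringBuilder();
        for (int i = 0; i < readings.length; i++) {
            if (i > 0) {
                reply.append(',');
            }
            reply.append(readings[i]);
        }
        return reply.append('\n').toString();
    }

    public static void main(String[] args) {
        SerialProtocolSketch receiver = new SerialProtocolSketch();
        // The receiver asks first; the sender answers with one punctuated message.
        String reply = replyTo('A', new int[] {512, 87, 1023});
        for (char c : reply.toCharArray()) {
            int[] values = receiver.receive(c);
            if (values != null) {
                System.out.println(Arrays.toString(values));
            }
        }
    }
}
```

The two approaches combine naturally: handshaking decides when a message is sent, and punctuation tells the receiver where that message ends and how to split it.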

I will not review these approaches here in detail because I have covered them in my posts from the Intro to Physical Computing class. The punctuation method is described here. I will soon add a link to the post regarding the handshake method (which is currently a work in progress).

Related notes and concepts:
  • Callbacks are methods that are called by other code when a certain event takes place. mousePressed() and serialEvent() are examples of callback (or event) functions available in Processing.
  • To create a new serial object, you must always import the Serial library first (it ships with Processing but is not loaded by default). Then, once you’ve declared the object, instantiate it with the following syntax: “myPort = new Serial(this, portName, baudRate);” where portName is a string naming the serial port. The “this” argument tells the serial object that it is setting up communication between the serial port and this specific sketch.
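The callback idea can be illustrated in plain Java without any hardware. FakePort below is a hypothetical stand-in for a serial port, not the Processing Serial class; it only shows how code that owns an event source calls back into code you registered earlier, which is the role serialEvent() plays in a sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class CallbackSketch {
    // Hypothetical stand-in for a port: it calls the registered callback when data "arrives".
    static class FakePort {
        // Default callback does nothing, so incoming data is safely ignored until one is set.
        private Consumer<String> onData = data -> { };

        void setCallback(Consumer<String> callback) {
            onData = callback;
        }

        // Pretends that `data` just came in over the wire and notifies the callback.
        void simulateIncoming(String data) {
            onData.accept(data);
        }
    }

    public static void main(String[] args) {
        List<String> received = new ArrayList<>();
        FakePort port = new FakePort();
        // Register the callback: every incoming message is appended to the list.
        port.setCallback(received::add);
        port.simulateIncoming("42");
        System.out.println(received);
    }
}
```

The code in main never calls the callback directly; the port does, whenever its event takes place.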
