Wednesday, February 26, 2014

Learning Go with a Martini - The Basics

Intro

I’m working with my son to build a system to manage all sorts of data, files and devices. Our apps will need to run on OS X, Windows, Linux, and possibly Raspberry Pi machines. Besides running on multiple platforms, our apps need to be fast, because we will eventually be working with near real-time data. So I started looking for a language that would fit the bill, and Go caught my eye. Why? For one, it compiles. I mean it really compiles down to an actual native executable, not bytecode. Another reason is that some of the speed benchmarks I saw put Go second only to C. Also, if I can’t get Go to be fast enough, I can call C libraries through cgo. Lastly, Go appears to have a vibrant third-party package ecosystem. So I thought I’d give Go a try. Since most of my day job is web development, I thought I’d learn Go while building out a web service and share my experience with others who may be curious about Go.

Goal

In this post we will start building out an API to manage attributes. Every resource in our system will have attributes, and these attributes will differ for each type of resource. The API will provide the functionality needed to assign, remove and list the attributes assigned to a particular resource. In this post we will set up the Go development environment and create the basics of the GET /attributes/:resource route.

Setup

Installing and Configuring Go

Installing Go is pretty straightforward. Just follow the instructions on golang.org’s install page. To make sure Go is set up correctly in your development environment, run go version. You should see output similar to this: go version go1.1.2 darwin/386

After installing Go, I need to do a few tasks to set up my Go workspace. The Go tools are designed to work with a particular directory structure: a ‘home’ directory that contains three subdirectories. These are src, where my source files (and the source of any third-party packages I fetch) will go; pkg, where compiled package objects will live; and bin, where any executables I install will live. Once the directories have been created within the ‘home’ directory, set the GOPATH environment variable to that directory. In my environment, $GOPATH is set to ~/src/go. For a more detailed overview of the workspace layout, read the “How to Write Go Code” page on the golang.org site.

Editor Setup

I use emacs to write the vast majority of my code. I would imagine that my editor choice is not the norm for most of you reading this post so I’m going to add a few other editors that have Go support.

Eclipse goclipse
Emacs go-mode.el
Sublime go plugin
Vim go language vim support

Follow the instructions and your editing environment will be ready to ‘Go’.

Installing Third Party Packages

The last step before we start writing code is to install the lone third-party package that I will use to create the attributes API. Go makes it easy to install packages by providing the go get tool. As you can see below, you just add the package you wish to install after go get. When the install completes, the package source is under $GOPATH/src. You can see the compiled package by running ls $GOPATH/pkg in a terminal window.

go get github.com/codegangsta/martini

Now that I have Go installed, my workspace and editor set up, and Martini installed, I’m ready to start coding. I will be writing the code for this blog series in the $GOPATH/src/github.com/rippinrobr/martini-to-go-posts directory.

Writing the Code

A Basic HTTP Server

One of the reasons I chose to get familiar with Martini is that it lets you get a functional web server up and running with about 10 lines of code. To make sure I have everything in place and can respond to an HTTP GET / request, I’m going to start with a very basic app. This app will respond to GET / by returning the string "Where are the attributes?!?!". Here’s the code:

package main

// loading in the Martini package
import "github.com/codegangsta/martini"

func main() {
  // if you are new to Go the := is a short variable declaration
  m := martini.Classic()

  // func() string { ... } is an anonymous function that returns a string
  m.Get("/", func() string {
    return "Where are the attributes?!?!"
  })

  m.Run()
}

Before you run it, let’s go over the code. The first line declares the package, and all Go code must belong to a package. This code belongs to the main package, which is special in Go: every executable Go program must have its main function there. Once I’ve declared the package, I need to tell Go which packages I want to import. In this code I am only importing the martini package. This statement makes martini's exported functions, structs, and interfaces available to my code. To call anything in this package, I need to prefix the call with martini. In this example I’m only using one function from martini, martini.Classic().

The martini.Classic() call creates the classic Martini object that I will use to declare the supported routes and start the service. This line uses the ‘short variable declaration’ syntax. The := operator infers the type from the expression on the right side and declares a variable of that type on the left side. The := syntax can only be used within the body of a function.
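To make the := rule concrete, here is a small standalone sketch that is not part of the attributes server. It places a short variable declaration next to its long var form, and also shows the package-level case where := is not allowed. The names describe, greeting and port are made up for illustration.

```go
package main

import "fmt"

// port is declared at package level, where := is not allowed,
// so the long var form is required here.
var port = 3000

// describe puts a short variable declaration next to its long form.
func describe() string {
	// := infers the type (string) from the right-hand side.
	greeting := "Where are the attributes?!?!"

	// The equivalent long form spells the type out explicitly.
	var sameGreeting string = greeting

	// %T prints the type, %q the quoted value.
	return fmt.Sprintf("%T %q %d", sameGreeting, greeting, port)
}

func main() {
	fmt.Println(describe()) // string "Where are the attributes?!?!" 3000
}
```

Both variables end up with type string. The long form only earns its keep outside functions, or when you want a type other than the one Go would infer.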

Now that I have created my Martini object, I can set up a handler for the HTTP GET / request. Adding a route is pretty straightforward. For this simple example I declare the route I want to respond to, “/”, and use an anonymous function to handle the requests. If you are new to Go, the string that follows func() is the function's return type.

m.Get("/", func() string {
  return "Where are the attributes?!?!"
})

Since this is the only route I’ve declared, a request for any other route will result in a 404 error. The last bit of code is the Run() call, which starts the HTTP server. By default the server listens on port 3000. If you want to change that port, set the PORT environment variable to the new value and restart the server. Martini automatically looks for the PORT variable.

The next step is to actually see the code in action. The easiest way to do that is by calling go run.

go run attr-server.go

The go run command will compile and run the application. If there are no errors you should see a message [martini] listening on host:port :8000. My server is running on port 8000 because I've set my PORT environment variable to 8000.

Now, to make sure that the response is what I expect, I'm going to run:

curl http://localhost:8000/

And I should see the string Where are the attributes?!?! returned. As you can see, it is pretty simple to get a basic HTTP server up and running.

GET /attributes/:resource

OK, now that I’ve shown you the basics of Martini, it's time to build out our first ‘real’ route. Remember, the goal of this service is to track a resource’s attributes. Resources can be anything: a TV, a scoreboard, an ad board, etc. Each of these will have its own set of attributes. For this blog post the /attributes/:resource route needs to do the following:

  1. If the resource requested is a TV, then we will return a JSON object with all the attributes assigned to a TV and the HTTP status code 200.
  2. If the resource is not a TV, then we will return a JSON error object that we will define and a status code of 404.

New Packages

I am going to need to add a few more packages to the code to meet my needs. The first new import is the net/http package. I’m importing it so I can use http.StatusOK instead of the number 200, which makes the code a little more readable. The next new package is strings. I’m using it to convert the requested resource to lower case, so that my string comparison always compares the input in the same case as my test string.

import (
  "net/http" // this will allow us to use http.StatusOK and http.StatusNotFound instead of 200 and 404
  "strings"  // I’m adding this so I can ensure that we are comparing lower case strings.

  "github.com/codegangsta/martini"
)

Notice that the import statement has changed. When there are multiple packages to import, you can group them together as I have above, or you can use a separate import statement for each one. Either way works, but I believe the grouped form is the more idiomatic Go way.

The New GET Handler

The next change is that I’ve replaced the m.Get call we had previously with this one:

m.Get("/attributes/:resource", func(params martini.Params) (int, string) {
  resource := strings.ToLower(params["resource"])

  if resource == "tv" {
    return http.StatusOK, "a TV attributes object will be returned here"
  } else {
    return http.StatusNotFound, "JSON Object here"
  }
})

The new m.Get call has several new parts. The first argument, /attributes/:resource, tells Martini what route to look for. The :resource segment tells Martini to store whatever value appears there in the params map. The value is stored under the key ‘resource’; notice that the key does not have the leading colon. This handler’s function has one parameter, params, which contains all of the route parameters. Next is the return value declaration. This version of the handler returns two values: the HTTP status code and a string.

The guts of the function determine whether the resource being requested is a TV. If it is, the handler returns OK; if not, it returns a not-found error. Right now the code has placeholder strings, but soon they will be string representations of JSON objects. If you are new to Go like me, the if statement looks a little naked: there are no parentheses around the condition. The return statements are also a little different from what I’m used to seeing. Remember that this function has two return values, so on the return lines the values are separated by a comma.
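Martini injects params and treats the (int, string) return pair as the status code and body. Because of that, the decision logic can be pulled out into a plain function and exercised without starting a server. The following is a minimal sketch; resourceResponse is a hypothetical name, and Martini is deliberately left out.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// resourceResponse holds the logic from the handler above: normalise the
// requested resource to lower case, then choose a status code and body.
func resourceResponse(resource string) (int, string) {
	resource = strings.ToLower(resource)

	// No parentheses around the condition, but the braces are mandatory.
	if resource == "tv" {
		return http.StatusOK, "a TV attributes object will be returned here"
	}
	return http.StatusNotFound, "JSON Object here"
}

func main() {
	for _, r := range []string{"TV", "tvs"} {
		// Multiple return values are received as a comma-separated list.
		status, body := resourceResponse(r)
		fmt.Println(r, status, body)
	}
}
```

Dropping the else after an early return is a common Go style. The handler itself would then just be `return resourceResponse(params["resource"])`.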

Now that we’ve talked it to death, if you want to see it in action, download the code from here https://gist.github.com/rippinrobr/9084362 and run it using:

go run attr-server.go

In a separate terminal, run the following curl command and you should see similar output.

curl -v http://localhost:3000/attributes/tvs

* Adding handle: conn: 0x7f911c004400
* Adding handle: send: 0
* Adding handle: recv: 0
* Curl_addHandleToPipeline: length: 1
* - Conn 0 (0x7f911c004400) send_pipe: 1, recv_pipe: 0
* About to connect() to localhost port 3000 (#0)
*   Trying ::1...
* Connected to localhost (::1) port 3000 (#0)
> GET /attributes/tvs HTTP/1.1
> User-Agent: curl/7.30.0
> Host: localhost:3000
> Accept: */*
>
< HTTP/1.1 404 Not Found
< Content-Type: text/plain; charset=utf-8
< Content-Length: 16
< Date: Wed, 19 Feb 2014 01:27:00 GMT 
<
* Connection #0 to host localhost left intact
JSON Object here

The -v flag makes curl display enough output that you can see just about everything that happened during the request. I’m using it here so you can see that the call above does in fact return a 404 status code in addition to the error message. To see what happens when you request a TV, rerun the command after removing the trailing s from tvs. The HTTP/1.1 status should now be 200.

Sending the JSON Object

Now that I have the basic logic in place, it's time to start building out the infrastructure to support resource attributes. To help model the relationship between resources and attributes, I am introducing two new structs, Attribute and ResourceAttributes.

type Attribute struct {
  Name string `json:"name"`
  DataType string `json:"type"`
  Description string `json:"description"`
  Required bool `json:"required"`
}

type ResourceAttributes struct {
  ResourceName string `json:"resourceName"`
  Attributes []Attribute `json:"attributes"`
}

The Attribute struct contains all of the information I want to store about each attribute. The ResourceAttributes struct represents the relationship between a resource and its attributes. These structs will be converted to JSON, and I want the field names to follow JSON naming conventions. So I’m using each field's tag (the backquoted json:"..." string) to rename the fields to lower camel case during the JSON conversion process.
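This sketch shows what the tags do. It uses encoding/json to marshal an Attribute, and reflect.StructTag to show how a tag is parsed. A tag written with a space after the colon, such as `json: "resourceName"`, cannot be parsed. encoding/json then silently falls back to the Go field name. The attribute values below are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

type Attribute struct {
	Name        string `json:"name"`
	DataType    string `json:"type"`
	Description string `json:"description"`
	Required    bool   `json:"required"`
}

// jsonTagName returns the name encoding/json would read from a struct tag.
func jsonTagName(tag string) string {
	return reflect.StructTag(tag).Get("json")
}

// marshal returns v as a JSON string, or "" if encoding fails.
func marshal(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	// The tags rename the exported Go fields to lower-case JSON keys.
	fmt.Println(marshal(Attribute{Name: "screenSize", DataType: "int", Description: "diagonal size in inches", Required: true}))

	// With a space after the colon the tag cannot be parsed, so the name is
	// empty and encoding/json would use the Go field name instead.
	fmt.Printf("%q\n", jsonTagName(`json: "resourceName"`)) // ""
	fmt.Printf("%q\n", jsonTagName(`json:"resourceName"`))  // "resourceName"
}
```

Fields must also be exported (capitalised) for encoding/json to see them at all. That is why the Go names are capitalised and the tags supply the lower-case JSON names.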

The updated handler will return a string representation of the JSON object, so I need a way to turn each of my types into a JSON string. I want to send both of my structs back to the client as JSON objects, so I need to create two String() methods. The method below is for the ResourceAttributes struct. It uses the encoding/json package, so that package needs to be added to the imports.

func (ra ResourceAttributes) String() (s string) {
  jsonObj, err := json.Marshal(ra)

  if err != nil {
    s = ""
  } else {
    s = string( jsonObj )
  }

  return
}

There are two differences between the declaration of this method and the functions I declared earlier. The first is that right after the func keyword there is what looks like a parameter declaration. It declares the method's 'receiver' type, which means any ResourceAttributes value can call the String() method. The second difference is the way the return value is declared. This method uses Go's named result parameters: whatever value the variable s holds when the method returns is the value the method returns.

First, the method converts the receiver into JSON. If there are no errors returned during the conversion process then the JSON representation is converted to a string and stored in s. If an error occurs then s is set to an empty string.

To send all of the structs I declared as JSON, I would have to create a String() method for each type. The methods would be exactly the same except for the receiver. Not exactly keeping the code DRY. Thankfully, shortly after writing the code for this part of the post, I reached a section on interfaces in The Go Programming Language Phrasebook. I was happy to see that interfaces would let me DRY up the String() methods.

A Go interface is a set of methods. Any type that has all of the methods in the interface declaration implements the interface; there is no implements keyword. Interfaces can have 1, 10, or no methods. So I decided to try an empty interface declaration that would stay within my main package.

type jsonConvertible interface { }

Since the name of this type starts with a lowercase letter, it is only visible within the package it was declared in. Because the interface has no methods, every type implements jsonConvertible, not just the structs in the main package. After creating the interface, I moved away from methods back to a normal function, which I named JsonString. JsonString has a single parameter of type jsonConvertible, so it accepts any value. Now I can have one function to convert all my structs into a JSON string.

func JsonString(obj jsonConvertible) (s string) {
  jsonObj, err := json.Marshal(obj)

  if err != nil {
    s = ""
  } else {
    s = string(jsonObj)
  }

  return
}
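To see the empty interface at work, here is a self-contained sketch. JsonString is called with two different struct types and with a value encoding/json cannot handle. A channel has no JSON representation, so json.Marshal returns an error and the function falls back to "". The struct contents are trimmed for brevity.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Every Go type has all zero methods of an empty interface,
// so every value satisfies jsonConvertible.
type jsonConvertible interface{}

type Attribute struct {
	Name string `json:"name"`
}

type ResourceAttributes struct {
	ResourceName string      `json:"resourceName"`
	Attributes   []Attribute `json:"attributes"`
}

// JsonString converts any value to a JSON string, returning "" on failure.
func JsonString(obj jsonConvertible) string {
	jsonObj, err := json.Marshal(obj)
	if err != nil {
		return ""
	}
	return string(jsonObj)
}

func main() {
	fmt.Println(JsonString(Attribute{Name: "screenSize"}))             // {"name":"screenSize"}
	fmt.Println(JsonString(ResourceAttributes{ResourceName: "tv"}))    // {"resourceName":"tv","attributes":null}
	fmt.Printf("%q\n", JsonString(make(chan int)))                    // "" because channels cannot be encoded
}
```

Note that a nil slice is encoded as null, not []. Newer Go versions (1.18 and later) also spell interface{} as any.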

If you want to see this version of the code in action you can grab it in this gist and go run it.

Setting the Content-type to application/json

When you run the latest and greatest, you'll see that the server does send back a JSON string. If you look at the headers, though, you can see that the Content-Type is set to text/plain. I want the Content-Type to be application/json, so I need to set it before I send the response. Luckily, Martini makes this as easy as adding a new parameter to my handler function, writer http.ResponseWriter, which I can use to set the correct Content-Type.

writer.Header().Set("Content-Type", "application/json")

Now whichever object is returned, it will have the correct Content-Type set. To see for yourself, clone the repository, checkout the 1st-post branch and run it. You’ll see in the headers that I now have the correct Content-Type set.

Summary

With that, I’ve completed everything that I set out to do in this post. I showed you where to get Go and how to set up your environment. I’ve walked you through how to create structs, respond to HTTP GET calls and return a JSON string. In addition, I introduced you to interfaces in Go.

In my next post, I will add CRUD functionality using MongoDB, showing you how to add middleware to pass along database connectivity to the request handlers. By the end of the second post the attr-server will retrieve all available attributes assigned to a TV from the database using our /attributes/:resource route.

Resources

Editors

Eclipse goclipse
Emacs go-mode.el
Sublime go plugin
Vim go language vim support

Go Language

GitHub & Gists

Blogs & Books

Thursday, November 14, 2013

Intro to the MEAN Stack - Part 1 - The Data

I recently changed jobs to join a startup. One of the many reasons I took the job is that the software is built with the MEAN stack: MongoDB, Express.js, AngularJS and Node.js. Before joining the company I had dabbled in each part of the stack, but I hadn’t used any of them on a ‘real’ project. So, to reinforce what I’m learning during the day, I'm using the MEAN stack to build an app to view the stats for the New York/San Francisco Giants. I thought I would share my experiences in the hope of helping someone else who's learning the MEAN stack. This is the first post in a three-part series that will walk you through building my app. The planned posts are:

Part 1 – The Data: Converts the MySQL data model to a suitable model for MongoDB.

Part 2 – The API: Building out a Node.js based API that will allow us to retrieve the stats.

Part 3 – The UI: Covers building out an AngularJS UI

Goal

The goal for this post is to have the data modeled and loaded into a MongoDB database so we can use it in the next post.

Setup

If you want to 'follow along' with this post, you will need to download and install MongoDB and Node.js. Once you have installed mongo and node, you can download the part 1 code. Keep in mind that before you run the load scripts, you will need to install the node packages. To do that, simply change into the <YOUR CODE DIR>/post-1-the-data/scripts directory and run:

npm install

npm is the Node package manager. The install option tells npm to read the package.json file and install any requirements that have not already been installed. If you plan on loading the Giants data I have already parsed out, you are good to go. However, if you want to start from the beginning yourself, you will need to download the data from the Baseball Databank project. The most up-to-date branch is the 2012update branch. After cloning the repository, read scripts/README.md. You will then be ready to generate your own Giants data or any other team’s data.

My Two Second MongoDB Introduction

Before I dive into the meat of the post, I want to give you a very brief MongoDB intro. We will be storing data in collections, which are analogous to tables. Each collection will contain documents, which you can think of as rows in a relational database. As the NoSQL term implies, we will not be using SQL to retrieve the data. Instead we will use JavaScript.

The Data

As a kid who grew up reading box scores every morning while I ate my breakfast, I am very thankful that there is an open source project dedicated to providing the statistics for Major League Baseball. It is distributed as 24 CSV files, each of which maps to the MySQL table that was used to generate it. All 24 data files describe either a manager, a player or a team. So as we build out the database, we will create and load three collections: managers, players, and seasons.

Each collection will house documents designed so that one document represents one player, one manager or one season. This will make developing the API much easier, because for the most part one call should retrieve everything we need. To give you an idea of what data the documents will contain, I’ve created a map between the collections and files. Remember, each file is a table in the Baseball Databank database. In some cases you’d have to write some pretty complicated joins to get the data. In Mongo, our queries will be straightforward.

Managers Collection (4 tables) => AwardsManagers, Managers, ManagersHalf, Master

The Players Collection (11 tables) => AllStars, Appearances, AwardsPlayers, Batting, BattingPost, Fielding, FieldingPost, Master, Pitching, PitchingPost, Salaries

The Seasons Collection (3 tables) => SeriesPost, Teams, TeamsHalf

If you’d like a description of the tables, check out the Baseball Databank README.

The MongoDB Side

For the rest of this post I am going to walk you through an example of a document that is stored in each collection. The description will also have examples of how to retrieve the data from the mongo console app. We will start with the simplest of the collections, the managers.


Managers

Any manager who has managed at least one game for either the New York or San Francisco Giants will have a document in this collection. A manager's document has demographic information, a managerial record, plus any awards they may have won. Our example manager document is Rogers Hornsby’s. He managed the New York Giants in 1927.

The first property of the document is the _id property. By default, each document gets an _id field holding an ObjectId that MongoDB generates automatically during the insert. Here's what an ObjectId looks like:

ObjectId("528398bb3b06760000000004")
In some cases that may work fine, but for this collection I am using the Baseball Databank managerID value. This lets me take advantage of the built-in unique constraint on the _id index. The document's properties are self-explanatory; however, I would like to discuss the record property.

The record property is an array of JSON objects. Each entry in the array represents a full or partial season with the Giants. Since Hornsby only managed part of one season for the Giants he only has one item in the array. You can think of the entries in the record array as a row in a record table in a relational database. Using an array allows you to keep all data related to managers in one document making it easy to retrieve all of the manager's data when needed. We will make use of arrays in all of our documents. If the Giants had made it to the playoffs or if Hornsby had won any managerial awards his document would have two more array properties, playoffs and awards.

You may not have caught that last bit, but documents in the same collection do not have to have the same properties. In all of our collections, the documents will have about 95% of their properties in common. I will show you how to check for the existence of a property when querying the database in a bit. Stay tuned.

You might be wondering how I retrieved the Hornsby document. Let me walk you through the queries I used.

At a command prompt fire up the mongo client by running:

  mongo giants

This will connect you to a local instance of MongoDB and switch you into the giants database. You could also run just mongo to connect and then run use giants at the mongo prompt to do the same thing. For more options, read the mongo shell documentation.

Now that I'm in the giants database I can retrieve the Hornsby document by running:

   db.managers.find( { nameLast: 'Hornsby' }).pretty()

What the query says is: in the current database, search the managers collection for documents that have a nameLast property equal to 'Hornsby'. The find call returns a cursor over all of the documents that match our query, which the shell iterates to display them. The pretty function formats the results of the find in a more human-readable fashion. Try the find call with and without pretty and you will see what I mean.

If you look closely at the Hornsby document, you will see that the only entry in his record array has the inseason property set to 2. This means there were at least two managers for the 1927 Giants. To find out who the other managers were, we can run the following query.

db.managers.find( { 'record.yearID' : 1927} )

This will return two documents, one for John McGraw and one for Hornsby. Look closely at the query and you will see that I'm using 'dot notation' to find the managers for the 1927 season. Like the previous find, it returns the matching documents. Unlike the previous query, this one looks inside the record array. Each object in the record array has its yearID property compared to 1927, and if at least one entry has that yearID, the document is returned. Going back to my analogy of each entry in the record array being like a row in a relational database table, you can almost think of the dot notation as a join. One thing to keep in mind is that whenever you use dot notation, you must quote the property name as I have done. Failing to do so causes a syntax error in the mongo shell.
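The matching rule behind { 'record.yearID' : 1927 } is "at least one array element matches". The following in-memory Go sketch spells that rule out. It is only an illustration of the semantics, not how MongoDB executes queries, and it does not use a MongoDB driver; matchArrayField and the "Example" manager are hypothetical.

```go
package main

import "fmt"

// doc stands in for a MongoDB document.
type doc = map[string]interface{}

// matchArrayField mimics a query like {'record.yearID': 1927}: a document
// matches if ANY element of the array under arrayKey has field == want.
func matchArrayField(docs []doc, arrayKey, field string, want interface{}) []doc {
	var out []doc
	for _, d := range docs {
		entries, ok := d[arrayKey].([]doc)
		if !ok {
			continue // no array under that key, so no match
		}
		for _, e := range entries {
			if e[field] == want {
				out = append(out, d)
				break // one matching entry is enough
			}
		}
	}
	return out
}

func main() {
	managers := []doc{
		{"nameLast": "McGraw", "record": []doc{{"yearID": 1926}, {"yearID": 1927}}},
		{"nameLast": "Hornsby", "record": []doc{{"yearID": 1927}}},
		{"nameLast": "Example", "record": []doc{{"yearID": 1950}}}, // hypothetical
	}
	for _, m := range matchArrayField(managers, "record", "yearID", 1927) {
		fmt.Println(m["nameLast"])
	}
}
```

The whole document is returned, not just the matching array entry. That is why the shell shows McGraw's full record even though only his 1927 entry matched.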

Remember I said I would show you how to test for the existence of a property? We will get a count of all the managers who have the playoffs property in their document.

db.managers.find({playoffs: {$exists:true}}).count()

This query simply says: find all the documents in managers that have the playoffs property. I am using $exists, which is one of the built-in query operators. To see what other operators are available, check out the MongoDB Operators page.
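$exists tests whether a key is present, not what it holds. In Go terms that is the comma-ok map lookup. The sketch below mimics find({playoffs: {$exists: true}}).count() over in-memory documents. It is an illustration only; countWithField is a hypothetical name.

```go
package main

import "fmt"

// countWithField mimics find({field: {$exists: true}}).count(): it counts
// documents in which the key is present, whatever its value (even nil).
func countWithField(docs []map[string]interface{}, field string) int {
	n := 0
	for _, d := range docs {
		if _, ok := d[field]; ok { // comma-ok checks presence, not value
			n++
		}
	}
	return n
}

func main() {
	managers := []map[string]interface{}{
		{"nameLast": "A", "playoffs": []interface{}{"example"}},
		{"nameLast": "B"},
		{"nameLast": "C", "playoffs": nil},
	}
	fmt.Println(countWithField(managers, "playoffs")) // 2
}
```

A key set to null still counts as existing, which mirrors how $exists: true treats null values.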

The last bit of work I need to do is to setup the indexes on the managers collection. Since most of the queries I run will be on either the last name or to look for a particular season I will add an index on nameLast and record.yearID. Here’s how to create an index in mongo:

db.managers.ensureIndex({ nameLast: 1 })
db.managers.ensureIndex({ 'record.yearID': 1 })

These two calls create two separate indexes on the nameLast and record.yearID properties. Notice that I can use the dot notation when declaring an index also. The 1 indicates that we want the index created using ascending ordering. To create an index that uses the descending order swap out the 1 for a -1. Now our managers collection has three indexes: one on the _id property, one on the nameLast property and one on the record.yearID property. To see what indexes are on a collection you can run:

db.managers.getIndexes();

For more information on ensureIndex and getIndexes visit: http://docs.mongodb.org/manual/reference/method/db.collection.ensureIndex/ http://docs.mongodb.org/manual/reference/method/db.collection.getIndexes/


Players

Just like the managers, every player that has stepped onto the diamond in a New York or San Francisco Giants uniform will have a document in this collection. A player’s document will contain demographics, statistics, and appearances. If the player has been an all-star, won an award or has been inducted into the hall of fame his document will have additional properties. Below is the document for Eddie ‘Hotshot’ Mayo who played for the New York Giants in 1936.

The players document is considerably larger than the managers document. I chose this design because it lets me retrieve a player's Giants history with a single query. Even though I've chosen a player-centric design, it is still relatively easy to find roster-related information. As an example, let's say we want to see who else played third for the Giants during the 1936 season. I could run the following query:

The query returns a total of four full player documents, which makes it a little hard to pick out the names of the players. All I really want to see is the nameLast, nameFirst and fieldingStats.G values for the players who played third. I can reduce the output to only those values by using a projection. I am also renaming the properties to something I find a little nicer to read. Now when I run the query, I should get four much easier-to-read results. The updated query and results are below.

That returns the following:

Now that we have the data in a readable layout I would like to sort the players so that the man who played the most games at third will be listed first. Sorting is as easy as adding the $sort operator. Here's what the query looks like with the sort call added.

Notice that I used the new name that I created in the $project call. The -1 indicates we want to sort the games in a descending fashion. The results of the updated query are below.

If you’ve been paying attention, you noticed that I was using a function called aggregate instead of find. The aggregate function lets us chain stages together into the aggregation pipeline, which we can use to ‘filter’ and reshape our data. It works by passing the results of one stage to the next, as illustrated by the $project and $sort stages. I used $project to rename the fieldingStats.G property to just games, and then used the new name, games, to sort by. Let’s walk through the last query to get a better picture of what's going on.

$unwind

{ $unwind : "$fieldingStats" }, 

What $unwind does is create a new document for each element of an array. That means a copy of the demographics is paired with each entry in the fieldingStats array. So if a player has 10 entries, there will be 10 documents with the same demographic information, each with a single entry in its fieldingStats property. I chose to $unwind on the fieldingStats property because I am only interested in third basemen. Notice that fieldingStats has a $ in front of it. Remember, that means you want to use the value of the fieldingStats property in the command. If I executed the query now with only the $unwind stage, I would receive the following message.

aggregation result exceeds maximum document size (16MB)

The message brings up something I haven't mentioned yet: all documents must be less than 16MB in size, and, as the error shows, the aggregation result itself counts as a document. Remember, $unwind creates many new documents. The players collection has 1675 documents in it. If each player had 5 years' worth of stats at 3 different positions, that would be 1675 × 15 = 25,125 documents, so you can see how quickly the result set grows. Thankfully, in my case I'm filtering the results of the $unwind stage, so the 16MB limit is not a problem. In my three months of working with MongoDB, I have yet to have the size limit cause any issues.
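To make the $unwind stage concrete, here is an in-memory Go sketch of what it does to one document. Each array element yields a copy of the document with the array replaced by that single element. It is an illustration only, not MongoDB code. The player data is hypothetical, and unlike MongoDB this sketch simply skips documents whose field is not an array of objects.

```go
package main

import "fmt"

// doc stands in for a MongoDB document.
type doc = map[string]interface{}

// unwind mimics {$unwind: "$<field>"}: every element of the array under
// field produces one output document that copies all other properties and
// holds that single element in place of the array.
func unwind(docs []doc, field string) []doc {
	var out []doc
	for _, d := range docs {
		items, ok := d[field].([]doc)
		if !ok {
			continue // this sketch skips documents without an array here
		}
		for _, item := range items {
			copied := make(doc, len(d))
			for k, v := range d {
				copied[k] = v
			}
			copied[field] = item // one array entry per output document
			out = append(out, copied)
		}
	}
	return out
}

func main() {
	players := []doc{{
		"nameLast": "Example", // hypothetical player
		"fieldingStats": []doc{
			{"yearID": 1936, "POS": "3B"},
			{"yearID": 1936, "POS": "SS"},
		},
	}}
	for _, d := range unwind(players, "fieldingStats") {
		fmt.Println(d["nameLast"], d["fieldingStats"])
	}
}
```

The output grows with the total number of array entries, which is exactly why an unfiltered $unwind can exceed the 16MB limit.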

$match

{ $match : {"fieldingStats.POS": "3B",
             "fieldingStats.yearID" : 1936 }},

The output of the $unwind stage is passed as input to the $match stage. $match searches the input documents for those that match the given criteria. In this case, anyone who played third base during the 1936 season will be returned. The number of documents has gone from the thousands to four. These four complete player documents are passed to the $project stage.

$project

{ $project : { _id : "$_id",
          lastName : "$nameLast",
         firstName : "$nameFirst",
             games : "$fieldingStats.G"}}

I've already gone over what the $project call does so I won't go into it again.

$sort

{ $sort : { games: -1 } } 

Since I have already gone over the $sort call, I won't do it again here.
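To tie the remaining stages together, here is a hedged in-memory sketch of $match, $project and $sort applied to documents that have already been unwound. It only mimics the pipeline's logic for illustration; it is not MongoDB code. thirdBasemen is a hypothetical name, and it assumes G is stored as an int.

```go
package main

import (
	"fmt"
	"sort"
)

// doc stands in for a MongoDB document.
type doc = map[string]interface{}

// thirdBasemen mimics the $match, $project and $sort stages above, applied
// to already-unwound documents (fieldingStats holds a single object).
func thirdBasemen(unwound []doc, year int) []doc {
	var out []doc
	for _, d := range unwound {
		fs, ok := d["fieldingStats"].(doc)
		// $match: POS must be "3B" and yearID must equal the requested year.
		if !ok || fs["POS"] != "3B" || fs["yearID"] != year {
			continue
		}
		// $project: keep _id and rename the other fields.
		out = append(out, doc{
			"_id":       d["_id"],
			"lastName":  d["nameLast"],
			"firstName": d["nameFirst"],
			"games":     fs["G"],
		})
	}
	// $sort {games: -1}: most games first (assumes games holds an int).
	sort.SliceStable(out, func(i, j int) bool {
		return out[i]["games"].(int) > out[j]["games"].(int)
	})
	return out
}

func main() {
	unwound := []doc{ // hypothetical players
		{"_id": "a", "nameLast": "A", "nameFirst": "X", "fieldingStats": doc{"POS": "3B", "yearID": 1936, "G": 10}},
		{"_id": "b", "nameLast": "B", "nameFirst": "Y", "fieldingStats": doc{"POS": "3B", "yearID": 1936, "G": 40}},
		{"_id": "c", "nameLast": "C", "nameFirst": "Z", "fieldingStats": doc{"POS": "SS", "yearID": 1936, "G": 99}},
	}
	for _, p := range thirdBasemen(unwound, 1936) {
		fmt.Println(p["lastName"], p["games"])
	}
}
```

The sort uses the renamed games key, just as the real pipeline sorts on the name introduced by $project.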

Indexes

There will be a few more player API calls so I am going to create a few more ensureIndex calls. Here are the players collection indexes.


Seasons

Each season the New York/San Francisco Giants have played in professional baseball is represented by a document in the seasons collection.
Each document in the collection will have the team’s regular season and playoff records, team statistics, the roster, and a list of managers. The document below represents the 2012 season, when the Giants won their second World Series championship in 3 years.

To get the 2012 season document, I used another query function, findOne. It is similar to find, but it returns a single document. When more than one document matches, findOne returns the first one in ‘natural order’, which reflects the order in which documents are stored on disk.
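The difference between find and findOne comes down to stopping at the first match. In this in-memory Go sketch, slice order stands in for natural order. It is an illustration, not MongoDB code; findOne here is a hypothetical helper, and the season documents are made up.

```go
package main

import "fmt"

// doc stands in for a MongoDB document.
type doc = map[string]interface{}

// findOne mimics db.seasons.findOne({key: want}): scan documents in their
// stored order and return the first match, or nil and false if none match.
func findOne(docs []doc, key string, want interface{}) (doc, bool) {
	for _, d := range docs {
		if d[key] == want {
			return d, true
		}
	}
	return nil, false
}

func main() {
	seasons := []doc{{"yearID": 2011}, {"yearID": 2012}} // hypothetical data
	if d, ok := findOne(seasons, "yearID", 2012); ok {
		fmt.Println("found", d["yearID"])
	}
}
```

When several documents match, only the earliest one in storage order comes back. A unique key such as yearID avoids relying on that order.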

The seasons document is similar to the players and managers documents in that it has a ‘core’ set of data that pertains to the team’s season, plus arrays that store information about the players who were on the team that year and about the managers.

For the seasons collection I will add the indexes below. The API will make use of these indexes as you’ll see in the next post.

Summary

I have taken the data from 18 database tables and stored it in three collections in my MongoDB database. The new schema lets us make the fewest possible calls to the database when retrieving player, managerial or season data. Throughout the post I showed you how to run queries in the mongo client using find, findOne and the aggregation pipeline. I hope this post helped illustrate some ways MongoDB can store data so that using it is easier.

Resources

MongoDB Doc Links

Baseball Sites